

How to Filter in GATK4

I'm using GATK version 4. After running VariantFiltration, each variant's FILTER field is annotated with "PASS" or a filter name. Next, I want to remove the variants that did not pass. I tried SelectVariants, but I can't filter out the non-PASS variants.
Please tell me how to use the -select argument.

Thank you

My command is below:
hkmac2017:exome_bam hirokikimura$ java -jar /Users/hirokikimura/exome_bam/gatk-4.1.2-2.0/gatk-package- SelectVariants -R human_g1k_v37_decoy.fasta -V combined_genotyped_filtered_snps_indels_mixed.vcf -select "FILTER == PASS" -O combined_genotyped_filtered_snps_indels_mixed.PASS.vcf
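For reference, the usual way to drop non-PASS records with GATK4 SelectVariants is the `--exclude-filtered` flag rather than a JEXL string comparison on FILTER (the JEXL equivalent would be `-select 'vc.isNotFiltered()'`). A sketch, reusing the file names from the command above:

```shell
# Drop every record whose FILTER field is set to anything other
# than PASS or '.'; no JEXL expression is needed for this case.
gatk SelectVariants \
    -R human_g1k_v37_decoy.fasta \
    -V combined_genotyped_filtered_snps_indels_mixed.vcf \
    --exclude-filtered \
    -O combined_genotyped_filtered_snps_indels_mixed.PASS.vcf
```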

Best Answers


  • meganejin Member
    Thanks to your suggestion, I was able to do that.
  • meganejin Member
    I'm now analyzing BAM files, and I'm about to run GATK HaplotypeCaller.
    But after samtools sort, Picard MarkDuplicates, and GATK base recalibration, I find there are many QC-failed reads in the BAM file.
    Could you tell me whether I should remove the QC-failed reads from the BAM file before running HaplotypeCaller?

  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @meganejin

    Take a look at our recommended best practices for data preprocessing: https://software.broadinstitute.org/gatk/best-practices/workflow?id=11165

  • meganejin Member
    Thank you for the reply! It's very simple!
    Actually, I received multiple BAM files from my collaborator, already mapped to the reference with BWA.
    So I performed MarkDuplicates with Picard and BaseRecalibrator with GATK4 on each BAM file.
    The result is in the attached file.
    I asked because I have seen many examples of BAM files that don't contain QC-failed reads.
    So, is it OK to run HaplotypeCaller in GATK4 despite the many QC-failed reads?
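As background on what "QC-failed" means here: it is SAM FLAG bit 0x200 (decimal 512), "read fails platform/vendor quality checks", which `samtools view -c -f 512 file.bam` will count. A minimal sketch of the bit test itself (plain shell, no samtools required):

```shell
# SAM FLAG bit 0x200 (decimal 512) marks a read that fails
# platform/vendor quality checks ("QC-failed").
qc_failed() {
    # $1: the integer FLAG field from a SAM record
    if [ $(( $1 & 512 )) -ne 0 ]; then echo yes; else echo no; fi
}

qc_failed 512    # yes: only the QC-fail bit is set
qc_failed 1024   # no: 0x400 is the duplicate bit, not QC-fail
```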
  • meganejin Member

    Thank you very much for your detailed advice!!
    One more question: I want to analyze 700 BAM files, so I want to reduce the analysis time per sample.
    For each sample, so far I have run samtools sort, Picard FixMateInformation, and Picard MarkDuplicates.
    However, in the GATK Best Practices you showed me last week, only MarkDuplicatesSpark is needed to cover those three steps.

    What I want to ask is: if I run GATK4 MarkDuplicatesSpark, do I no longer need to run samtools sort or Picard FixMateInformation?
    If so, I'm glad, because a lot of time will be saved.

    Best regards
  • meganejin Member
    One more question.
    As you suggested, I ran MarkDuplicatesSpark on my BAM file using the command below.
    java -jar /Users/hirokikimura/exome_bam/gatk-4.1.2-2.0/gatk-package- MarkDuplicatesSpark -I NC412.bam -O NC412.marked_duplicates.bam
    But I can't view the marked_duplicates.bam file, because opening it fails:
    samtools flagstat NC412.marked_duplicates.bam
    [E::hts_hopen] Failed to open file NC412.marked_duplicates.bam
    [E::hts_open_format] Failed to open file NC412.marked_duplicates.bam
    Furthermore, I can't run GATK4 BaseRecalibrator on the above NC412.marked_duplicates.bam, because of the following error:
    java -jar /Users/hirokikimura/exome_bam/gatk-4.1.2-2.0/gatk-package- BaseRecalibrator -I NC412.marked_duplicates.bam -R human_g1k_v37_decoy.fasta --known-sites dbsnp_138.b37.vcf -known-sites Mills_and_1000G_gold_standard.indels.b37.vcf -O NC412_spark_recal.table

    A USER ERROR has occurred: Input files reference and reads have incompatible contigs: No overlapping contigs found.
    reference contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT, GL000207.1, GL000226.1, GL000229.1, GL000231.1, GL000210.1, GL000239.1, GL000235.1, GL000201.1, GL000247.1, GL000245.1, GL000197.1, GL000203.1, GL000246.1, GL000249.1, GL000196.1, GL000248.1, GL000244.1, GL000238.1, GL000202.1, GL000234.1, GL000232.1, GL000206.1, GL000240.1, GL000236.1, GL000241.1, GL000243.1, GL000242.1, GL000230.1, GL000237.1, GL000233.1, GL000204.1, GL000198.1, GL000208.1, GL000191.1, GL000227.1, GL000228.1, GL000214.1, GL000221.1, GL000209.1, GL000218.1, GL000220.1, GL000213.1, GL000211.1, GL000199.1, GL000217.1, GL000216.1, GL000215.1, GL000205.1, GL000219.1, GL000224.1, GL000223.1, GL000195.1, GL000212.1, GL000222.1, GL000200.1, GL000193.1, GL000194.1, GL000225.1, GL000192.1, NC_007605, hs37d5]
    reads contigs = []

    Is there any error in the way I am using MarkDuplicatesSpark?

    Best regards,
  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin


    1) MarkDuplicatesSpark can replace the MarkDuplicates and SortSam steps, but not Picard FixMateInformation.
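For example, the streamlined step might look like this (a sketch, reusing the file names from the question above; MarkDuplicatesSpark accepts unsorted or queryname-sorted input and writes duplicate-marked, coordinate-sorted output, so a separate sort step is not needed):

```shell
# One step replaces samtools sort + Picard MarkDuplicates:
# MarkDuplicatesSpark takes unsorted or queryname-sorted input
# and emits a coordinate-sorted, duplicate-marked BAM.
gatk MarkDuplicatesSpark \
    -I NC412.bam \
    -O NC412.marked_duplicates.bam
# FixMateInformation, if required, still has to be run separately.
```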


    I can't view the "marked_duplicates.bam" file, because opening it fails.

    Please validate your BAM file using this tool: https://software.broadinstitute.org/gatk/documentation/tooldocs/current/picard_sam_ValidateSamFile.php


    A USER ERROR has occurred: Input files reference and reads have incompatible contigs: No overlapping contigs found.

    Can you please confirm that the reference build you used to align the reads matches the reference build used in BaseRecalibrator?
    Please post the GATK versions and the exact commands you used for preprocessing.
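A quick way to check the second point is to compare the @SQ contig names in the BAM header against the reference index; "reads contigs = []" usually means the BAM header declares no @SQ lines at all, i.e. the earlier step produced a truncated or empty file. A sketch, assuming samtools is available:

```shell
# List the contig names the BAM header declares; an empty result
# matches the "reads contigs = []" error above.
samtools view -H NC412.marked_duplicates.bam | grep '^@SQ'

# Compare against the contig names in the reference FASTA index.
cut -f1 human_g1k_v37_decoy.fasta.fai | head
```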
