GATK v3.7.0 SelectVariants discards variant

Hi

I've found an unusual case where a variant disappears during the hard-filtering workflow.

#Joint genotyping
/share/apps/jre-distros/jre1.8.0_101/bin/java -Djava.io.tmpdir=/state/partition1/tmpdir -Xmx16g -jar /share/apps/GATK-distros/GATK_3.7.0/GenomeAnalysisTK.jar \
-T GenotypeGVCFs \
-R /state/partition1/db/human/gatk/2.8/b37/human_g1k_v37.fasta \
-V GVCFs.list \
-L /data/diagnostics/pipelines/GermlineEnrichment/GermlineEnrichment-"$version"/"$panel"/"$panel"_ROI_b37.bed \
-o "$seqId"_variants.vcf \
-ped "$seqId"_pedigree.ped \
-dt NONE

After joint genotyping, GATK outputs a HET true-positive SNV 11:2906165A>G and a deletion in the same multi-allelic row:

11 2906165 . AGCCGGGGCCGGG GGCCGGGGCCGGG,A 5324.68 . AC=16,1;AF=0.364,0.023;AN=44;BaseQRankSum=0.786;ClippingRankSum=0.00;DP=707;ExcessHet=1.2164;FS=0.760;InbreedingCoeff=0.1366;MLEAC=16,1;MLEAF=0.364,0.023;MQ=60.20;MQRankSum=0.00;QD=13.31;ReadPosRankSum=-1.033e+00;SOR=0.643 GT:AD:DP:GQ:PGT:PID:PL 0/1:21,26,0:47:99:.:.:507,0,364,569,442,1011

I select SNVs for hard filtering with:

#Select SNPs
/share/apps/jre-distros/jre1.8.0_101/bin/java -Djava.io.tmpdir=/state/partition1/tmpdir -Xmx4g -jar /share/apps/GATK-distros/GATK_3.7.0/GenomeAnalysisTK.jar \
-T SelectVariants \
-R /state/partition1/db/human/gatk/2.8/b37/human_g1k_v37.fasta \
-V "$seqId"_variants.lcr.vcf \
-selectType SNP \
-L /data/diagnostics/pipelines/GermlineEnrichment/GermlineEnrichment-"$version"/"$panel"/"$panel"_ROI_b37.bed \
-o "$seqId"_snps.vcf \
-dt NONE

and Indels with:

#Select INDELs
/share/apps/jre-distros/jre1.8.0_101/bin/java -Djava.io.tmpdir=/state/partition1/tmpdir -Xmx16g -jar /share/apps/GATK-distros/GATK_3.7.0/GenomeAnalysisTK.jar \
-T SelectVariants \
-R /state/partition1/db/human/gatk/2.8/b37/human_g1k_v37.fasta \
-V "$seqId"_variants.lcr.vcf \
-selectType INDEL \
-L /data/diagnostics/pipelines/GermlineEnrichment/GermlineEnrichment-"$version"/"$panel"/"$panel"_ROI_b37.bed \
-o "$seqId"_indels.vcf \
-dt NONE

This row is absent from both output VCF files. See attached screenshot (order: all, SNPs, Indels).

Thanks
Matt

Answers

  • CardiffBioinf (Cardiff, Member)
    Accepted Answer

    My oversight, sorry. I found this answer:

    @Kurt said:
    That would be -selectType MIXED I believe.
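
For anyone hitting the same thing, here is a rough sketch of why the row above is neither SNP nor INDEL. It assumes a simplified, length-based rule similar to the one htsjdk's VariantContext appears to apply: an ALT allele of the same length as REF is a SNP (length 1) or MNP, a different length is an INDEL, and a row whose ALT alleles disagree on type is MIXED. The function name classify_variant is made up for illustration, and symbolic or spanning-deletion alleles are not handled.

```shell
#!/usr/bin/env bash
# Simplified, length-based classification of a VCF row from its REF and ALT fields.
# Assumption: approximates the typing rule used by GATK/htsjdk; it does not
# handle symbolic alleles (<DEL>), spanning deletions (*) or missing ALT (.).
classify_variant() {
    local ref=$1 alts=$2 record_type="" allele_type alt
    local IFS=','                      # ALT alleles are comma-separated
    for alt in $alts; do
        if [ "${#alt}" -ne "${#ref}" ]; then
            allele_type=INDEL          # length differs from REF
        elif [ "${#alt}" -eq 1 ]; then
            allele_type=SNP            # single-base substitution
        else
            allele_type=MNP            # same length, more than one base
        fi
        if [ -z "$record_type" ]; then
            record_type=$allele_type   # first ALT allele sets the type
        elif [ "$record_type" != "$allele_type" ]; then
            record_type=MIXED          # ALT alleles disagree on type
        fi
    done
    echo "$record_type"
}

classify_variant AGCCGGGGCCGGG GGCCGGGGCCGGG,A   # row from this thread, prints MIXED
classify_variant A G                             # prints SNP
classify_variant AGCC A                          # prints INDEL
```

If that rule holds, the 11:2906165 row is typed MIXED, so neither -selectType SNP nor -selectType INDEL picks it up. SelectVariants accepts -selectType more than once, so passing -selectType INDEL -selectType MIXED in the indel step should keep such rows with the indels.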

  • CardiffBioinf (Cardiff, Member)

    Here is the workaround I'm planning:

    #Select SNPs
    /share/apps/jre-distros/jre1.8.0_101/bin/java -Djava.io.tmpdir=/state/partition1/tmpdir -Xmx4g -jar /share/apps/GATK-distros/GATK_3.7.0/GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R /state/partition1/db/human/gatk/2.8/b37/human_g1k_v37.fasta \
    -V "$seqId"_variants.lcr.vcf \
    -selectType SNP \
    -L /data/diagnostics/pipelines/GermlineEnrichment/GermlineEnrichment-"$version"/"$panel"/"$panel"_ROI_b37.bed \
    -o "$seqId"_snps.vcf \
    -dt NONE

    #Filter SNPs
    /share/apps/jre-distros/jre1.8.0_101/bin/java -Djava.io.tmpdir=/state/partition1/tmpdir -Xmx4g -jar /share/apps/GATK-distros/GATK_3.7.0/GenomeAnalysisTK.jar \
    -T VariantFiltration \
    -R /state/partition1/db/human/gatk/2.8/b37/human_g1k_v37.fasta \
    -V "$seqId"_snps.vcf \
    --filterExpression "QUAL < 30.0" \
    --filterName "LowQual" \
    --filterExpression "QD < 2.0" \
    --filterName "QD" \
    --filterExpression "FS > 60.0" \
    --filterName "FS" \
    --filterExpression "SOR > 3.0" \
    --filterName "SOR" \
    --filterExpression "MQ < 40.0" \
    --filterName "MQ" \
    --filterExpression "MQRankSum < -12.5" \
    --filterName "MQRankSum" \
    --filterExpression "ReadPosRankSum < -8.0" \
    --filterName "ReadPosRankSum" \
    -L /data/diagnostics/pipelines/GermlineEnrichment/GermlineEnrichment-"$version"/"$panel"/"$panel"_ROI_b37.bed \
    -o "$seqId"_snps_filtered.vcf \
    -dt NONE

    #Select non-snps (INDEL, MIXED, MNP, SYMBOLIC, NO_VARIATION)
    /share/apps/jre-distros/jre1.8.0_101/bin/java -Djava.io.tmpdir=/state/partition1/tmpdir -Xmx16g -jar /share/apps/GATK-distros/GATK_3.7.0/GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R /state/partition1/db/human/gatk/2.8/b37/human_g1k_v37.fasta \
    -V "$seqId"_variants.lcr.vcf \
    --selectTypeToExclude SNP \
    -L /data/diagnostics/pipelines/GermlineEnrichment/GermlineEnrichment-"$version"/"$panel"/"$panel"_ROI_b37.bed \
    -o "$seqId"_non_snps.vcf \
    -dt NONE

    #Filter non-snps (INDEL, MIXED, MNP, SYMBOLIC, NO_VARIATION)
    /share/apps/jre-distros/jre1.8.0_101/bin/java -Djava.io.tmpdir=/state/partition1/tmpdir -Xmx4g -jar /share/apps/GATK-distros/GATK_3.7.0/GenomeAnalysisTK.jar \
    -T VariantFiltration \
    -R /state/partition1/db/human/gatk/2.8/b37/human_g1k_v37.fasta \
    -V "$seqId"_non_snps.vcf \
    --filterExpression "QUAL < 30.0" \
    --filterName "LowQual" \
    --filterExpression "QD < 2.0" \
    --filterName "QD" \
    --filterExpression "FS > 200.0" \
    --filterName "FS" \
    --filterExpression "SOR > 10.0" \
    --filterName "SOR" \
    --filterExpression "ReadPosRankSum < -20.0" \
    --filterName "ReadPosRankSum" \
    --filterExpression "InbreedingCoeff < -0.8" \
    --filterName "InbreedingCoeff" \
    -L /data/diagnostics/pipelines/GermlineEnrichment/GermlineEnrichment-"$version"/"$panel"/"$panel"_ROI_b37.bed \
    -o "$seqId"_non_snps_filtered.vcf \
    -dt NONE

    #Combine filtered VCF files
    /share/apps/jre-distros/jre1.8.0_101/bin/java -Djava.io.tmpdir=/state/partition1/tmpdir -Xmx4g -jar /share/apps/GATK-distros/GATK_3.7.0/GenomeAnalysisTK.jar \
    -T CombineVariants \
    -R /state/partition1/db/human/gatk/2.8/b37/human_g1k_v37.fasta \
    --variant "$seqId"_snps_filtered.vcf \
    --variant "$seqId"_non_snps_filtered.vcf \
    -o "$seqId"_combined_filtered.vcf \
    -genotypeMergeOptions UNSORTED \
    -dt NONE

    @Geraldine_VdAuwera if this sounds nuts please let me know

    Thanks
    Matt
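
A quick sanity check along these lines might catch rows that go missing between the split steps, like the one that started this thread. It is a minimal sketch, assuming uncompressed VCF files and that every SelectVariants call uses the same -L intervals. count_records and check_split are hypothetical helper names, not GATK tools.

```shell
#!/usr/bin/env bash
# Count data rows (lines that are not header lines) in an uncompressed VCF.
count_records() {
    # grep exits non-zero when the count is 0, so ignore its exit status.
    grep -vc '^#' "$1" || true
}

# Check that the SNP and non-SNP subsets together account for every input row.
# Example call (file names taken from the workaround above):
#   check_split "$seqId"_variants.lcr.vcf "$seqId"_snps.vcf "$seqId"_non_snps.vcf
check_split() {
    local input=$1 snps=$2 non_snps=$3
    local n_in n_snps n_non
    n_in=$(count_records "$input")
    n_snps=$(count_records "$snps")
    n_non=$(count_records "$non_snps")
    if [ "$n_in" -ne $((n_snps + n_non)) ]; then
        echo "Record count mismatch: input=$n_in snps=$n_snps non_snps=$n_non" >&2
        return 1
    fi
    echo "OK: $n_in records accounted for"
}
```

Run against the original -selectType SNP / -selectType INDEL pair, a check like this would have returned non-zero, because the MIXED row lands in neither output.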

  • CardiffBioinf (Cardiff, Member)

    @Sheila Thanks, that's really helpful!
