GATK v3.7.0 SelectVariants discards variant

Hi

I've found an unusual case where a variant disappears during the hard-filtering workflow.

#Joint genotyping
/share/apps/jre-distros/jre1.8.0_101/bin/java -Djava.io.tmpdir=/state/partition1/tmpdir -Xmx16g -jar /share/apps/GATK-distros/GATK_3.7.0/GenomeAnalysisTK.jar \
-T GenotypeGVCFs \
-R /state/partition1/db/human/gatk/2.8/b37/human_g1k_v37.fasta \
-V GVCFs.list \
-L /data/diagnostics/pipelines/GermlineEnrichment/GermlineEnrichment-"$version"/"$panel"/"$panel"_ROI_b37.bed \
-o "$seqId"_variants.vcf \
-ped "$seqId"_pedigree.ped \
-dt NONE

After joint genotyping, GATK outputs a heterozygous true-positive SNV (11:2906165A>G) and a deletion in the same record:

11 2906165 . AGCCGGGGCCGGG GGCCGGGGCCGGG,A 5324.68 . AC=16,1;AF=0.364,0.023;AN=44;BaseQRankSum=0.786;ClippingRankSum=0.00;DP=707;ExcessHet=1.2164;FS=0.760;InbreedingCoeff=0.1366;MLEAC=16,1;MLEAF=0.364,0.023;MQ=60.20;MQRankSum=0.00;QD=13.31;ReadPosRankSum=-1.033e+00;SOR=0.643 GT:AD:DP:GQ:PGT:PID:PL 0/1:21,26,0:47:99:.:.:507,0,364,569,442,1011

I select SNVs for hard filtering with:

#Select SNPs
/share/apps/jre-distros/jre1.8.0_101/bin/java -Djava.io.tmpdir=/state/partition1/tmpdir -Xmx4g -jar /share/apps/GATK-distros/GATK_3.7.0/GenomeAnalysisTK.jar \
-T SelectVariants \
-R /state/partition1/db/human/gatk/2.8/b37/human_g1k_v37.fasta \
-V "$seqId"_variants.lcr.vcf \
-selectType SNP \
-L /data/diagnostics/pipelines/GermlineEnrichment/GermlineEnrichment-"$version"/"$panel"/"$panel"_ROI_b37.bed \
-o "$seqId"_snps.vcf \
-dt NONE

and Indels with:

#Select INDELs
/share/apps/jre-distros/jre1.8.0_101/bin/java -Djava.io.tmpdir=/state/partition1/tmpdir -Xmx16g -jar /share/apps/GATK-distros/GATK_3.7.0/GenomeAnalysisTK.jar \
-T SelectVariants \
-R /state/partition1/db/human/gatk/2.8/b37/human_g1k_v37.fasta \
-V "$seqId"_variants.lcr.vcf \
-selectType INDEL \
-L /data/diagnostics/pipelines/GermlineEnrichment/GermlineEnrichment-"$version"/"$panel"/"$panel"_ROI_b37.bed \
-o "$seqId"_indels.vcf \
-dt NONE

This row is absent from both output VCF files. See attached screenshot (order: all, SNPs, Indels).

Thanks
Matt

Answers

  • CardiffBioinf (Cardiff, Member)
    Accepted Answer

    My oversight, sorry. I found this answer:

    @Kurt said:
    That would be -selectType MIXED I believe.
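
To illustrate why the record at 11:2906165 matches neither `-selectType SNP` nor `-selectType INDEL`, here is a minimal sketch of a length-based allele classification. It is a simplification and not GATK's actual implementation, which handles more cases (symbolic alleles, spanning deletions, and so on). The function name `record_type` is hypothetical. The first ALT allele is the SNV written against the padded 13-base REF and the second is the deletion, so the two ALT alleles are of different types and the record as a whole is MIXED.

```shell
# record_type is a hypothetical helper, not a GATK command.
# It classifies a record from its REF and comma-separated ALT alleles
# using allele lengths only (a simplified assumption).
record_type() {
    awk -v ref="$1" -v alts="$2" 'BEGIN {
        n = split(alts, alt, ",")
        result = ""
        for (i = 1; i <= n; i++) {
            if (length(alt[i]) != length(ref)) type = "INDEL"   # lengths differ
            else if (length(ref) == 1)          type = "SNP"     # single-base substitution
            else                                type = "MNP"     # equal length, longer than 1
            if (result == "")        result = type
            else if (result != type) result = "MIXED"            # ALT alleles disagree
        }
        print result
    }'
}

# REF and ALT taken from the record in the question
record_type AGCCGGGGCCGGG GGCCGGGGCCGGG,A   # prints MIXED
```

Under this sketch, selecting only SNP and only INDEL leaves such a record in neither subset, which matches the behaviour reported above.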

  • CardiffBioinf (Cardiff, Member)

    Here is the workaround I'm planning:

    #Select SNPs
    /share/apps/jre-distros/jre1.8.0_101/bin/java -Djava.io.tmpdir=/state/partition1/tmpdir -Xmx4g -jar /share/apps/GATK-distros/GATK_3.7.0/GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R /state/partition1/db/human/gatk/2.8/b37/human_g1k_v37.fasta \
    -V "$seqId"_variants.lcr.vcf \
    -selectType SNP \
    -L /data/diagnostics/pipelines/GermlineEnrichment/GermlineEnrichment-"$version"/"$panel"/"$panel"_ROI_b37.bed \
    -o "$seqId"_snps.vcf \
    -dt NONE

    #Filter SNPs
    /share/apps/jre-distros/jre1.8.0_101/bin/java -Djava.io.tmpdir=/state/partition1/tmpdir -Xmx4g -jar /share/apps/GATK-distros/GATK_3.7.0/GenomeAnalysisTK.jar \
    -T VariantFiltration \
    -R /state/partition1/db/human/gatk/2.8/b37/human_g1k_v37.fasta \
    -V "$seqId"_snps.vcf \
    --filterExpression "QUAL < 30.0" \
    --filterName "LowQual" \
    --filterExpression "QD < 2.0" \
    --filterName "QD" \
    --filterExpression "FS > 60.0" \
    --filterName "FS" \
    --filterExpression "SOR > 3.0" \
    --filterName "SOR" \
    --filterExpression "MQ < 40.0" \
    --filterName "MQ" \
    --filterExpression "MQRankSum < -12.5" \
    --filterName "MQRankSum" \
    --filterExpression "ReadPosRankSum < -8.0" \
    --filterName "ReadPosRankSum" \
    -L /data/diagnostics/pipelines/GermlineEnrichment/GermlineEnrichment-"$version"/"$panel"/"$panel"_ROI_b37.bed \
    -o "$seqId"_snps_filtered.vcf \
    -dt NONE

    #Select non-snps (INDEL, MIXED, MNP, SYMBOLIC, NO_VARIATION)
    /share/apps/jre-distros/jre1.8.0_101/bin/java -Djava.io.tmpdir=/state/partition1/tmpdir -Xmx16g -jar /share/apps/GATK-distros/GATK_3.7.0/GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R /state/partition1/db/human/gatk/2.8/b37/human_g1k_v37.fasta \
    -V "$seqId"_variants.lcr.vcf \
    --selectTypeToExclude SNP \
    -L /data/diagnostics/pipelines/GermlineEnrichment/GermlineEnrichment-"$version"/"$panel"/"$panel"_ROI_b37.bed \
    -o "$seqId"_non_snps.vcf \
    -dt NONE

    #Filter non-snps (INDEL, MIXED, MNP, SYMBOLIC, NO_VARIATION)
    /share/apps/jre-distros/jre1.8.0_101/bin/java -Djava.io.tmpdir=/state/partition1/tmpdir -Xmx4g -jar /share/apps/GATK-distros/GATK_3.7.0/GenomeAnalysisTK.jar \
    -T VariantFiltration \
    -R /state/partition1/db/human/gatk/2.8/b37/human_g1k_v37.fasta \
    -V "$seqId"_non_snps.vcf \
    --filterExpression "QUAL < 30.0" \
    --filterName "LowQual" \
    --filterExpression "QD < 2.0" \
    --filterName "QD" \
    --filterExpression "FS > 200.0" \
    --filterName "FS" \
    --filterExpression "SOR > 10.0" \
    --filterName "SOR" \
    --filterExpression "ReadPosRankSum < -20.0" \
    --filterName "ReadPosRankSum" \
    --filterExpression "InbreedingCoeff < -0.8" \
    --filterName "InbreedingCoeff" \
    -L /data/diagnostics/pipelines/GermlineEnrichment/GermlineEnrichment-"$version"/"$panel"/"$panel"_ROI_b37.bed \
    -o "$seqId"_non_snps_filtered.vcf \
    -dt NONE

    #Combine filtered VCF files
    /share/apps/jre-distros/jre1.8.0_101/bin/java -Djava.io.tmpdir=/state/partition1/tmpdir -Xmx4g -jar /share/apps/GATK-distros/GATK_3.7.0/GenomeAnalysisTK.jar \
    -T CombineVariants \
    -R /state/partition1/db/human/gatk/2.8/b37/human_g1k_v37.fasta \
    --variant "$seqId"_snps_filtered.vcf \
    --variant "$seqId"_non_snps_filtered.vcf \
    -o "$seqId"_combined_filtered.vcf \
    -genotypeMergeOptions UNSORTED \
    -dt NONE

    @Geraldine_VdAuwera if this sounds nuts please let me know

    Thanks
    Matt
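
One way to guard against records silently dropping out of a split like the one above is to compare record counts before and after. The sketch below assumes that every input record should land in exactly one of the two subsets (SNP or non-SNP) and that all three files were restricted to the same intervals. The helper names `count_records` and `check_split` are hypothetical and are not part of GATK.

```shell
# count_records (hypothetical helper): number of non-header lines in a VCF.
count_records() {
    grep -v '^#' "$1" | wc -l | tr -d ' '
}

# check_split (hypothetical helper): verify that the SNP and non-SNP subsets
# together contain as many records as the input they were selected from.
check_split() {
    all=$(count_records "$1")
    snps=$(count_records "$2")
    non_snps=$(count_records "$3")
    if [ "$all" -eq $((snps + non_snps)) ]; then
        echo "OK: $all records accounted for"
    else
        echo "MISMATCH: $all input records, $((snps + non_snps)) after splitting"
        return 1
    fi
}

# Example call with the file names used in this thread:
#   check_split "$seqId"_variants.lcr.vcf "$seqId"_snps.vcf "$seqId"_non_snps.vcf
```

Run against the original SNP and INDEL outputs, a check like this would have reported a mismatch for the MIXED record at 11:2906165.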

  • CardiffBioinf (Cardiff, Member)

    @Sheila Thanks, that's really helpful!
