GATK v3.7.0 SelectVariants discards variant

Hi

I've found an unusual case where a variant disappears during the hard-filtering workflow.

#Joint genotyping
/share/apps/jre-distros/jre1.8.0_101/bin/java -Djava.io.tmpdir=/state/partition1/tmpdir -Xmx16g -jar /share/apps/GATK-distros/GATK_3.7.0/GenomeAnalysisTK.jar \
-T GenotypeGVCFs \
-R /state/partition1/db/human/gatk/2.8/b37/human_g1k_v37.fasta \
-V GVCFs.list \
-L /data/diagnostics/pipelines/GermlineEnrichment/GermlineEnrichment-"$version"/"$panel"/"$panel"_ROI_b37.bed \
-o "$seqId"_variants.vcf \
-ped "$seqId"_pedigree.ped \
-dt NONE

After joint genotyping, GATK outputs a HET true-positive SNV (11:2906165A>G) and a deletion together in a single multi-allelic record:

11 2906165 . AGCCGGGGCCGGG GGCCGGGGCCGGG,A 5324.68 . AC=16,1;AF=0.364,0.023;AN=44;BaseQRankSum=0.786;ClippingRankSum=0.00;DP=707;ExcessHet=1.2164;FS=0.760;InbreedingCoeff=0.1366;MLEAC=16,1;MLEAF=0.364,0.023;MQ=60.20;MQRankSum=0.00;QD=13.31;ReadPosRankSum=-1.033e+00;SOR=0.643 GT:AD:DP:GQ:PGT:PID:PL 0/1:21,26,0:47:99:.:.:507,0,364,569,442,1011

I select SNVs for hard filtering with:

#Select SNPs
/share/apps/jre-distros/jre1.8.0_101/bin/java -Djava.io.tmpdir=/state/partition1/tmpdir -Xmx4g -jar /share/apps/GATK-distros/GATK_3.7.0/GenomeAnalysisTK.jar \
-T SelectVariants \
-R /state/partition1/db/human/gatk/2.8/b37/human_g1k_v37.fasta \
-V "$seqId"_variants.lcr.vcf \
-selectType SNP \
-L /data/diagnostics/pipelines/GermlineEnrichment/GermlineEnrichment-"$version"/"$panel"/"$panel"_ROI_b37.bed \
-o "$seqId"_snps.vcf \
-dt NONE

and Indels with:

#Select INDELs
/share/apps/jre-distros/jre1.8.0_101/bin/java -Djava.io.tmpdir=/state/partition1/tmpdir -Xmx16g -jar /share/apps/GATK-distros/GATK_3.7.0/GenomeAnalysisTK.jar \
-T SelectVariants \
-R /state/partition1/db/human/gatk/2.8/b37/human_g1k_v37.fasta \
-V "$seqId"_variants.lcr.vcf \
-selectType INDEL \
-L /data/diagnostics/pipelines/GermlineEnrichment/GermlineEnrichment-"$version"/"$panel"/"$panel"_ROI_b37.bed \
-o "$seqId"_indels.vcf \
-dt NONE

This row is absent from both output VCF files. See attached screenshot (order: all, SNPs, Indels).

Thanks
Matt

Answers

  • CardiffBioinf Cardiff Member
    Accepted Answer

    My oversight, sorry. I found this answer:

    @Kurt said:
    That would be -selectType MIXED I believe.
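
A rough way to see why the record matches neither SNP nor INDEL is to classify each ALT allele against REF. The sketch below is a simplified approximation of how the underlying htsjdk library appears to assign a variant type, not GATK's actual code. `classify_vcf_record` is a hypothetical helper, and it assumes tab-separated input with no symbolic or spanning-deletion alleles.

```shell
# classify_vcf_record: hypothetical helper, not part of GATK.
# Reads tab-separated VCF lines on stdin and prints "CHROM:POS TYPE".
# Assumption: no symbolic (<DEL>) or spanning-deletion (*) ALT alleles.
classify_vcf_record() {
  awk -F'\t' '
    /^#/ { next }                       # skip header lines
    $5 == "." { print $1 ":" $2, "NO_VARIATION"; next }
    {
      n = split($5, alts, ",")
      type = ""
      for (i = 1; i <= n; i++) {
        # Compare each ALT allele with REF by length only
        if (length(alts[i]) != length($4)) t = "INDEL"
        else if (length($4) == 1)          t = "SNP"
        else                               t = "MNP"
        # Two different per-allele types make the whole record MIXED
        if (type == "") type = t
        else if (type != t) { type = "MIXED"; break }
      }
      print $1 ":" $2, type
    }'
}

# The record from the question, reduced to its first eight columns
printf '11\t2906165\t.\tAGCCGGGGCCGGG\tGGCCGGGGCCGGG,A\t5324.68\t.\t.\n' | classify_vcf_record
```

For the record in question this prints `11:2906165 MIXED`: the first ALT has the same length as the padded REF, the second is shorter, so the two alleles disagree on type. That is consistent with the row being skipped by both `-selectType SNP` and `-selectType INDEL`. As far as I know, `-selectType` can be given more than once in GATK 3, so `-selectType INDEL -selectType MIXED` should keep such rows in the indel file.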

  • CardiffBioinf Cardiff Member

    Here is the workaround I'm planning:

    #Select SNPs
    /share/apps/jre-distros/jre1.8.0_101/bin/java -Djava.io.tmpdir=/state/partition1/tmpdir -Xmx4g -jar /share/apps/GATK-distros/GATK_3.7.0/GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R /state/partition1/db/human/gatk/2.8/b37/human_g1k_v37.fasta \
    -V "$seqId"_variants.lcr.vcf \
    -selectType SNP \
    -L /data/diagnostics/pipelines/GermlineEnrichment/GermlineEnrichment-"$version"/"$panel"/"$panel"_ROI_b37.bed \
    -o "$seqId"_snps.vcf \
    -dt NONE

    #Filter SNPs
    /share/apps/jre-distros/jre1.8.0_101/bin/java -Djava.io.tmpdir=/state/partition1/tmpdir -Xmx4g -jar /share/apps/GATK-distros/GATK_3.7.0/GenomeAnalysisTK.jar \
    -T VariantFiltration \
    -R /state/partition1/db/human/gatk/2.8/b37/human_g1k_v37.fasta \
    -V "$seqId"_snps.vcf \
    --filterExpression "QUAL < 30.0" \
    --filterName "LowQual" \
    --filterExpression "QD < 2.0" \
    --filterName "QD" \
    --filterExpression "FS > 60.0" \
    --filterName "FS" \
    --filterExpression "SOR > 3.0" \
    --filterName "SOR" \
    --filterExpression "MQ < 40.0" \
    --filterName "MQ" \
    --filterExpression "MQRankSum < -12.5" \
    --filterName "MQRankSum" \
    --filterExpression "ReadPosRankSum < -8.0" \
    --filterName "ReadPosRankSum" \
    -L /data/diagnostics/pipelines/GermlineEnrichment/GermlineEnrichment-"$version"/"$panel"/"$panel"_ROI_b37.bed \
    -o "$seqId"_snps_filtered.vcf \
    -dt NONE

    #Select non-snps (INDEL, MIXED, MNP, SYMBOLIC, NO_VARIATION)
    /share/apps/jre-distros/jre1.8.0_101/bin/java -Djava.io.tmpdir=/state/partition1/tmpdir -Xmx16g -jar /share/apps/GATK-distros/GATK_3.7.0/GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R /state/partition1/db/human/gatk/2.8/b37/human_g1k_v37.fasta \
    -V "$seqId"_variants.lcr.vcf \
    --selectTypeToExclude SNP \
    -L /data/diagnostics/pipelines/GermlineEnrichment/GermlineEnrichment-"$version"/"$panel"/"$panel"_ROI_b37.bed \
    -o "$seqId"_non_snps.vcf \
    -dt NONE

    #Filter non-snps (INDEL, MIXED, MNP, SYMBOLIC, NO_VARIATION)
    /share/apps/jre-distros/jre1.8.0_101/bin/java -Djava.io.tmpdir=/state/partition1/tmpdir -Xmx4g -jar /share/apps/GATK-distros/GATK_3.7.0/GenomeAnalysisTK.jar \
    -T VariantFiltration \
    -R /state/partition1/db/human/gatk/2.8/b37/human_g1k_v37.fasta \
    -V "$seqId"_non_snps.vcf \
    --filterExpression "QUAL < 30.0" \
    --filterName "LowQual" \
    --filterExpression "QD < 2.0" \
    --filterName "QD" \
    --filterExpression "FS > 200.0" \
    --filterName "FS" \
    --filterExpression "SOR > 10.0" \
    --filterName "SOR" \
    --filterExpression "ReadPosRankSum < -20.0" \
    --filterName "ReadPosRankSum" \
    --filterExpression "InbreedingCoeff < -0.8" \
    --filterName "InbreedingCoeff" \
    -L /data/diagnostics/pipelines/GermlineEnrichment/GermlineEnrichment-"$version"/"$panel"/"$panel"_ROI_b37.bed \
    -o "$seqId"_non_snps_filtered.vcf \
    -dt NONE

    #Combine filtered VCF files
    /share/apps/jre-distros/jre1.8.0_101/bin/java -Djava.io.tmpdir=/state/partition1/tmpdir -Xmx4g -jar /share/apps/GATK-distros/GATK_3.7.0/GenomeAnalysisTK.jar \
    -T CombineVariants \
    -R /state/partition1/db/human/gatk/2.8/b37/human_g1k_v37.fasta \
    --variant "$seqId"_snps_filtered.vcf \
    --variant "$seqId"_non_snps_filtered.vcf \
    -o "$seqId"_combined_filtered.vcf \
    -genotypeMergeOptions UNSORTED \
    -dt NONE

    @Geraldine_VdAuwera if this sounds nuts please let me know

    Thanks
    Matt
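
Since the original problem was a record silently falling between two selections, one cheap sanity check on a split like the one above is to confirm that the SNP and non-SNP files together hold as many records as the input. This is only a sketch: `count_records` and `check_partition` are hypothetical helper names, and it assumes all three files were restricted to the same `-L` intervals and that the selection step neither splits nor merges rows.

```shell
# Hypothetical helpers, not part of GATK.

# Count data (non-header) lines in a VCF file.
count_records() {
  awk '!/^#/ { n++ } END { print n + 0 }' "$1"
}

# check_partition INPUT.vcf SNPS.vcf NON_SNPS.vcf
# Succeeds only if the two subsets add up to the input record count.
check_partition() {
  total=$(count_records "$1")
  snps=$(count_records "$2")
  non_snps=$(count_records "$3")
  if [ "$total" -eq $((snps + non_snps)) ]; then
    echo "OK: $total records accounted for"
  else
    echo "MISMATCH: $total in input, $((snps + non_snps)) after the split" >&2
    return 1
  fi
}
```

In the workflow above it could be called as `check_partition "$seqId"_variants.lcr.vcf "$seqId"_snps.vcf "$seqId"_non_snps.vcf` before filtering. A matching count does not prove the right rows ended up in each file, only that none went missing.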

  • CardiffBioinf Cardiff Member

    @Sheila Thanks, that's really helpful!
