Annotate variants with frequency and then filter/flag variants

Hi,

I am planning to annotate VCF files with external frequency information such as 1KGP. So I used VariantAnnotator walker to do that and got a VCF file with OneKGP.AF in the INFO tag which is good. However, when I try to VariantFiltration this new VCF file, GATK actually filter/flag my variants using frequency information from "AF" instead of "OneKGP.AF" due to the dot in the name of "OneKGP.AF".

-- It is a great function for users to be able to annotate variant with external information and filter variants. I wonder if there is a way to get around of the problem I encounter here. Thanks

1 866511 rs60722469 C CCCCT 42628.59 PASS AC=42;AF=0.808;AN=52;AS_InbreedingCoeff=0.5047;BaseQRankSum=-1.570e-01;ClippingRankSum=-3.100e-02;DB;DP=1404;ExcessHet=0.0547;FS=1.709;FractionInformativeReads=0.796;GC=68.32;GQ_MEAN=90.42;GQ_STDDEV=16.42;HRun=0;HW=13.2;InbreedingCoeff=0.5047;MLEAC=42;MLEAF=0.808;MQ=58.15;MQRankSum=0.226;NCC=0;NDA=1;OneKGP.AF=0.148962;QD=32.93;RPA=3,4;RU=CCCT;ReadPosRankSum=0.361;SOR=1.037;STR;;VQSLOD=2.08;VariantType=INSERTION.NumRepetitions_3.EventLength_4;culprit=FS GT:AD:DP:GQ:MLPSAC:MLPSAF:PL

The commandline i used to annotate the original VCF file is

Annotate VCF with global AF from 1KGP

${PathToJava}/java -jar ${PatToGATK}/GenomeAnalysisTK.jar \
-R $Ref \
-T VariantAnnotator \
-V CONTROLS_PLUS_CGC_PedTest4.VQSR.ANNOTATED.VARIANT_ONLY.PASS.vcf \
-o CGC_PedTest4.VQSR.PASS.AnnoFreq.vcf \
--resource:OneKGP ALL.autosomes.phase3_shapeit2_mvncall_integrated_v5.20130502.sites.vcf \
-E OneKGP.AF

The commandline i used to Filter/Flag variants

Filter VCF file based on 1KGP.AF information:

${PathToJava}/java -jar ${PatToGATK}/GenomeAnalysisTK.jar \
-R $Ref \
-T VariantFiltration \
-o CGC_PedTest4.VQSR.PASS.AnnoFreqAll.Filter.vcf \
--variant CGC_PedTest4.VQSR.PASS.AnnoFreqAll.vcf \
--filterExpression "OneKGP.AF>0.1" \
--filterName "OneKGP.AF_Pct10"

As the issue coming from the dot in the name of "OneKPG.AF", I tried to replace " -E OneKGP.AF" with "-E OneKGP_AF" in the annotation step and apparently GATK doesn't allow me to do that as shown below.

ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 3.3-0-g37228af):
ERROR
ERROR This means that one or more arguments or inputs in your command are incorrect.
ERROR The error message below tells you what is the problem.
ERROR
ERROR If the problem is an invalid argument, please check the online documentation guide
ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
ERROR
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
ERROR
ERROR MESSAGE: Invalid command line: Argument OneKGP_AF has a bad value: it should be in rodname.value format
ERROR ------------------------------------------------------------------------------------------

Answers

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @HeidiLing
    Hi,

    I just tested this with a later version and it worked for me. Can you please try upgrading to the latest GATK version (3.5)?

    Thanks,
    Sheila

  • HeidiLingHeidiLing Member ✭✭

    I do not think it gets fixed in GATK3.5. I am using 3.5.0 now. This variant was annotated to have low OneKGP.AF, but you can see there was not even OneKGP.AF key in the INFO tag. It is actually AF<0.1 that cause this variant being picked up as having low freq.

    1 909247 rs72631892 C T 2012.3 OneKGP.AF_Pct10
    ABHet=0.492;ABHom=1.00;AC=4;AF=0.077;AN=52;AS_InbreedingCoeff=-0.0833;BaseQRankSum=-1.197e+00;ClippingRankSum=0.747;DB;DP=1055;ExcessHet=3.5480;FS=9.000;FractionInformativeReads=0.997;GC=66.34;GQ_MEAN=91.73;GQ_STDDEV=12.18;HRun=0;HW=0.0;InbreedingCoeff=-0.0833;MLEAC=4;MLEAF=0.077;MQ=22.77;MQRankSum=0.055;NCC=0;NDA=1;QD=13.51;ReadPosRankSum=0.916;SOR=1.370;Samples=NA12891-1,NA12891-2,NA12891-500,NA12891-short;VQSLOD=-1.281e-01;VariantType=SNP;culprit=FS

    There is another example, that AF<0.1 but OneKGP.AF>0.1.
    1 897738 rs6696971 C T 10135.18 OneKGP.AF_Pct10 ABHet=0.488;ABHom=0.999;AC=3;AF=0.058;AN=52;AS_InbreedingCoeff=-0.0612;BaseQRankSum=-8.264e+00;ClippingRankSum=-3.480e-01;DB;DP=6393;ExcessHet=3.2736;FS=1.107;FractionInformativeReads=1.000;GC=61.39;GQ_MEAN=99.00;GQ_STDDEV=0.00;HRun=2;HW=0.0;InbreedingCoeff=-0.0612;MLEAC=3;MLEAF=0.058;MQ=19.60;MQRankSum=0.126;NCC=0;NDA=1;OND=9.620e-04;OneKGP.AF=0.102037;OneKGP.AFR_AF=0.18;OneKGP.AMR_AF=0.0994;OneKGP.EAS_AF=0.0774;OneKGP.EUR_AF=0.0507;OneKGP.SAS_AF=0.0767;POSITIVE_TRAIN_SITE;QD=14.90;ReadPosRankSum=0.724;SOR=0.777;Samples=NA18503,NA18504,NA18505;VQSLOD=4.02;VariantType=SNP;culprit=MQ

    This is the commandline I used
    java -jar GenomeAnalysisTK-3.5-0/GenomeAnalysisTK.jar -R /isilon/sequencing/GATK_resource_bundle/1.5/b37/human_g1k_v37_decoy.fasta -T VariantFiltration -o /VCF/CGC_PedTest4.VQSR.PASS.AnnoFreqAll.Filter.vcf --variant /CGC_PedTest4.VQSR.PASS.AnnoFreqAll.vcf --filterExpression "OneKGP.AF<0.1" --filterName "OneKGP.AF_Pct10"

  • HeidiLingHeidiLing Member ✭✭

    Here is an example where OneKGP.AF<0.1 but AF>0.1. This variant was considered to be PASS

    1 879317 rs7523549 C T 20468.57 PASS ABHet=0.508;ABHom=0.999;AC=12;AF=0.231;AN=52;AS_InbreedingCoeff=-0.3000;BaseQRankSum=-7.700e-02;ClippingRankSum=-1.800e-01;DB;DP=2809;ExcessHet=10.1846;FS=0.536;FractionInformativeReads=0.999;GC=68.32;GQ_MEAN=99.00;GQ_STDDEV=0.00;HRun=0;HW=4.2;InbreedingCoeff=-0.3000;MLEAC=12;MLEAF=0.231;MQ=41.55;MQRankSum=0.025;NCC=0;NDA=1;OND=6.660e-04;OneKGP.AF=0.091254;OneKGP.AFR_AF=0.2156;OneKGP.AMR_AF=0.0778;OneKGP.EAS_AF=0.0665;OneKGP.EUR_AF=0.0398;OneKGP.SAS_AF=0.0112;POSITIVE_TRAIN_SITE;QD=15.23;ReadPosRankSum=0.207;SOR=0.748;Samples=NA00857-1Bl-1-3,NA00857-1Bl-2,NA00857-1CMSC-1-3,NA00857-1CMSC-2,NA00857-500Bl-1-3,NA00857-500Bl-2,NA00857-500CMSC-2,NA18503,NA18504,NA18505,NA24631-1,NA24695-1;VQSLOD=4.75;VariantType=SNP;culprit=FS

  • HeidiLingHeidiLing Member ✭✭

    In addition, if we replace OneKGP.AF with OneKGP_AF the filtering will be done correctly. We can do the global search and replacement in VCF file. But I am worried that there will be some unknown replacement occur which we do not want. Hope this can be solved/fixed.

  • HeidiLingHeidiLing Member ✭✭

    @Sheila said:
    @HeidiLing
    Hi,

    I just tested this with a later version and it worked for me. Can you please try upgrading to the latest GATK version (3.5)?

    Thanks,
    Sheila

    Hi Sheila,

    see my post above. The issue is not solved yet.

Sign In or Register to comment.