selectvariants DP for each sample

Hello,

I run the command
java -jar 3.4.0/GenomeAnalysisTK.jar -R ref.fasta -T SelectVariants --file.vcf -o filt_DP.vcf -select "DP > 2000.0" (just to check)

and I noticed that the variants are selected based on the total DP across samples (29 in my case). How is it possible to select the variants based on the individual DP of each sample? Ideally, I would like all the samples to have DP>=7.0.

Also, how is the total DP calculated? I checked and it is not the sum of the DP of all samples.

Here is one example of my file
chr1 1650845 rs1059831 G A 25434.21 PASS AC=29;AF=0.500;AN=58;BaseQRankSum=-6.170e-01;ClippingRankSum=-3.620e-01;DB;DP=2317;FS=0.000;InbreedingCoeff=-1.0000;MLEAC=29;MLEAF=0.500;MQ=60.00;MQRankSum=0.054;POSITIVE_TRAIN_SITE;QD=11.07;ReadPosRankSum=1.06;SOR=0.672;VQSLOD=33.43;culprit=MQ GT:AD:DP:GQ:PL 0/1:36,42:78:99:1033,0,851 0/1:59,43:102:99:982,0,2194 0/1:42,55:97:99:1347,0,1036 0/1:43,49:92:99:1123,0,1634 0/1:12,21:33:99:520,0,332 0/1:43,38:81:99:947,0,971 0/1:69,61:130:99:1447,0,1688 0/1:37,28:65:99:601,0,942 0/1:32,7:39:90:90,0,799 0/1:45,32:77:99:651,0,1722 0/1:41,22:63:99:541,0,1016 0/1:47,34:81:99:732,0,1138 0/1:40,46:86:99:1109,0,996 0/1:30,25:55:99:572,0,746 0/1:63,44:107:99:900,0,2450 0/1:54,31:85:99:705,0,1991 0/1:40,57:97:99:1372,0,935 0/1:37,36:73:99:810,0,909 0/1:41,30:71:99:687,0,1626 0/1:54,69:123:99:1556,0,1129 0/1:40,34:74:99:778,0,1006 0/1:34,29:63:99:715,0,820 0/1:27,50:77:99:1133,0,605 0/1:46,47:93:99:1205,0,1150 0/1:34,27:61:99:593,0,789 0/1:36,31:67:99:725,0,834 0/1:22,27:49:99:651,0,532 0/1:49,42:91:99:896,0,1163 0/1:40,48:88:99:1114,0,941

Issue · Github
by Sheila

Issue Number
517
State
closed
Last Updated
Assignee
Array
Milestone
Array
Closed By
vdauwera

Answers

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @alejandra
    Hi,

    I am asking the team how to do this. We will get back to you soon.

    -Sheila

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    It's not possible to select directly on genotype-level annotations.

    One way to go about this would be to apply genotype-level filters using VariantFiltration then perform a selection based on number or fraction of filtered genotypes using SelectVariants.

    A different approach would be to dump annotations to table using VariantsToTable, run your preferred data analysis software to identify rows with the desired property, then use that information to produce a list of positions to include or exclude using SelectVariants.

    The first method has the advantage of being pure-GATK and easy to automate in a workflow. The second is more flexible if you decide to experiment with your filtering criteria; among other things you can also plot distributions of values etc while you're at it, which can be helpful for determining the thresholds you want to apply.

  • alejandraalejandra spainMember

    What I did is the following:

    I select the variants with DP>=5.0 for sample1 and then I use the output file as the input in the second command to select the variants with DP>=5.0 for sample2. I continue to do that for all the 29 samples that I have.

    For example
    java -jar gatk/3.4.0/GenomeAnalysisTK.jar -R ref.fasta -T SelectVariants --variant filtered.vcf -o output1.vcf -select 'vc.getGenotype("sample1").getDP() >= 5.0'

    java -jar gatk/3.4.0/GenomeAnalysisTK.jar -R ref.fasta -T SelectVariants --variant output1.vcf -o output2.vcf -select 'vc.getGenotype("sample2").getDP() >= 5.0'

    Do you think it is correct?

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @alejandra
    Hi,

    No that won't work. Have a look at what Geraldine said above. You can use VariantFiltration to filter out sites that have FORMAT DP less than some value. https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_filters_VariantFiltration.php#--genotypeFilterExpression
    Then, you can use SelectVariants to select the sites that are not filtered by VariantFiltration.

    -Sheila

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Well, what I meant is that we don't have a pre-baked argument for selection on genotype properties, but that JEXL expression might actually work (@alejandra, does the command run successfully?). But even so it would be a really painful/inefficient process. If it was me I would go through the VariantsToTable route, if only because if you change your mind about criteria you can adjust quickly, instead of having to repeat that tedious iterative process.

  • alejandraalejandra spainMember

    Hi Geraldine and Sheila,

    The command runs successfully but I would like to confirm the result with the other way too.
    I run the following:
    java -jar GenomeAnalysisTK.jar -R ref.fasta -T VariantFiltration -o output.vcf --variant filt_QUAL50.vcf --genotypeFilterExpression "DP < 5" --genotypeFilterName "FT"

    so in the output I have the FT filter for the variants that have DP<5.

    **But how do I select the variants that are not filtered with FT? **

    I tried java -jar GenomeAnalysisTK.jar -R ref.fasta -T SelectVariants --variant output.vcf -select 'vc.hasAttribute("DP")' -o final_filtered.vcf

    but it didn't work as all the variants have the DP attribute in the genotype filtering.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin
    edited February 2016
  • alejandraalejandra spainMember

    Hello,

    I have the version of gatk 3.4.0.
    I run the following command:

    java -jar gatk/3.4.0/GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T SelectVariants --variant output.vcf --setFilteredGtToNocall

    and I get the error ERROR MESSAGE: Argument with name 'setFilteredGtToNocall' isn't defined.

    What exactly is the problem?

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @alejandra
    Hi,

    It looks like that argument was introduced in 3.4-46.

    -Sheila

  • alejandraalejandra spainMember

    I also tried:

    java -jar gatk/3.4.0/GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T VariantFiltration -o output1.vcf --variant filt_QUAL50_vqslod_PASS.vcf --genotypeFilterExpression "DP >= 5" --genotypeFilterName "FT"

    java -jar gatk/3.4.0/GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T SelectVariants --variant output1.vcf -select 'vc.hasAttribute("FT")' -o final_filtered.vcf

    but it didn't work either.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    What does "didn't work" mean? Did it produce an error or did it run but the result is not what you wanted?

Sign In or Register to comment.