Weird behaviour of SelectVariants

Dear all,

I need to split a multi-sample VCF file into single-sample files. I used SelectVariants with:

 --num_threads 4 -R /data/resources/chr_hg19.fa -T SelectVariants --variant /datos/samples.vcf -o SID158.vcf -sn SID158

to select, in this case, the data for sample SID158. But I get weird results:

1) Allele depths (AD) are changed incorrectly, for example from 0,8 to 0,0 (1/1:0,8:8:24:319,24,0 --> 1/1:0,0:8:24:319,24,0).
I can't find any reference to SelectVariants changing allele depths, and I don't understand why these values should change. Could you explain, or give me a link where I can learn about this?

a) Before SelectVariants (multi-sample input)

chr7 35293972 . A G 286.14 PASS AC=40;AF=0.870;AN=46;DP=182;Dels=0.00;HRun=0;MQ0=0;set=variant-variant2-variant5-variant6-variant8-variant11-variant14-variant15-variant18-variant23-variant25-variant26-variant28-variant30-variant31-variant33-variant34-variant35-variant44-variant45-variant46-variant49-variant51 GT:AD:DP:GQ:PL 1/1:0,8:8:24:319,24,0 1/1:0,8:8:24:321,24,0 ./. ./. 1/1:0,9:9:27:343,27,0 1/1:0,2:2:6:73,6,0 ./. 1/1:0,2:2:6:80,6,0 ./. ./.

b) After SelectVariants (SID158 only)

chr7 35293972 . A G 286.14 PASS AC=2;AF=1.00;AN=2;DP=8;Dels=0.00;HRun=0;MQ0=0;set=variant-variant2-variant5-variant6-variant8-variant11-variant14-variant15-variant18-variant23-variant25-variant26-variant28-variant30-variant31-variant33-variant34-variant35-variant44-variant45-variant46-variant49-variant51 GT:AD:DP:GQ:PL 1/1:0,0:8:24:319,24,0

2) As above, the allele depths seem to be recalculated to a wrong value, and this time the variant is written twice:
0/1:3684,3369:7054:99:103887,0,114881 --> 0/1:0,7:7054:99:103887,0,114881

Before SelectVariants:

chr2 179417938 . G A 103886.87 PASS AB=0.522;AC=1;AF=0.500;AN=2;BaseQRankSum=4.370;DP=7063;Dels=0.00;FS=5.537;HRun=1;HaplotypeScore=173.4470;MQ=59.48;MQ0=0;MQRankSum=0.468;QD=14.71;ReadPosRankSum=0.312;SB=-52336.28;set=variant5 GT:AD:DP:GQ:PL ./. ./. ./. ./. 0/1:3684,3369:7054:99:103887,0,114881 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./.

After SelectVariants:

chr2 179417938 . G A 103886.87 PASS AB=0.522;AC=1;AF=0.500;AN=2;BaseQRankSum=4.370;DP=7054;Dels=0.00;FS=5.537;HRun=1;HaplotypeScore=173.4470;MQ=59.48;MQ0=0;MQRankSum=0.468;QD=14.71;ReadPosRankSum=0.312;SB=-52336.28;set=variant5 GT:AD:DP:GQ:PL 0/1:0,7:7054:99:103887,0,114881

chr2 179417938 . G A 103886.87 PASS AB=0.522;AC=1;AF=0.500;AN=2;BaseQRankSum=4.370;DP=7054;Dels=0.00;FS=5.537;HRun=1;HaplotypeScore=173.4470;MQ=59.48;MQ0=0;MQRankSum=0.468;QD=14.71;ReadPosRankSum=0.312;SB=-52336.28;set=variant5 GT:AD:DP:GQ:PL 0/1:0,7:7054:99:103887,0,114881

3) For substitutions, the sum of the allele depths does not equal the DP value, for example: 1/1:0,3:4:9:103,9,0

Thanks a lot

David.
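
The three symptoms above can be checked mechanically. The following is a minimal Python sketch, not part of GATK: the function names are made up for illustration, and it assumes the FORMAT string is GT:AD:DP:GQ:PL as in the records pasted in the question. It reports which FORMAT keys differ between an input and an output sample field, compares the sum of the AD values with DP, and lists sites that occur more than once. It only reports differences; whether a given AD/DP mismatch is expected behaviour is a separate question.

```python
from collections import Counter


def parse_sample(sample_field, format_keys="GT:AD:DP:GQ:PL"):
    """Pair FORMAT keys with one sample's colon-separated values."""
    return dict(zip(format_keys.split(":"), sample_field.split(":")))


def changed_keys(before_field, after_field):
    """List the FORMAT keys whose values differ between two sample fields."""
    before = parse_sample(before_field)
    after = parse_sample(after_field)
    return [key for key in before if before[key] != after.get(key)]


def ad_sum_and_dp(sample_field):
    """Return (sum of the AD values, DP) for one sample field."""
    sample = parse_sample(sample_field)
    ad_total = sum(int(depth) for depth in sample["AD"].split(","))
    return ad_total, int(sample["DP"])


def duplicated_sites(record_lines):
    """Return the (CHROM, POS, REF, ALT) keys that occur more than once."""
    counts = Counter()
    for line in record_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.split()  # tabs in a real VCF, spaces as pasted in the post
        counts[(fields[0], fields[1], fields[3], fields[4])] += 1
    return [site for site, n in counts.items() if n > 1]


if __name__ == "__main__":
    # Sample fields copied from items 1 and 3 of the question.
    print(changed_keys("1/1:0,8:8:24:319,24,0", "1/1:0,0:8:24:319,24,0"))  # ['AD']
    print(ad_sum_and_dp("1/1:0,3:4:9:103,9,0"))  # (3, 4)
```

Run on the values from the question, only the AD key differs between input and output, which matches what was observed by eye.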

Answers

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie

    Hi David, that is strange. Can you tell me what version of GATK you are using?

  • Hi Geraldine, I'm using GATK 2.2-15.

  • igcocole (Member)

    Hi David,

    It seems this bug is coming from the multithreaded mode. Same for CombineVariants.
    Try running without the -nt option. Does it reproduce?
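
One way to answer "does it reproduce?" is to compare the records written by a multithreaded run with those written by a single-threaded run. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, both outputs are read as lists of text lines, and the records in the example are shortened toy lines (the second one is invented purely to show a difference, not an actual result).

```python
from collections import Counter


def data_records(lines):
    """Keep only the variant records, dropping blank lines and '#' headers."""
    return [line.rstrip("\n") for line in lines
            if line.strip() and not line.startswith("#")]


def compare_outputs(threaded_lines, single_lines):
    """Return (records only in the threaded run, records only in the single run).

    Counter subtraction keeps multiplicity, so a record written twice by one
    run and once by the other still shows up as a difference.
    """
    threaded = Counter(data_records(threaded_lines))
    single = Counter(data_records(single_lines))
    only_threaded = sorted((threaded - single).elements())
    only_single = sorted((single - threaded).elements())
    return only_threaded, only_single


if __name__ == "__main__":
    header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSID158"
    # Toy records: the first mirrors the duplicated line from the question,
    # the second is a made-up single-threaded result for illustration only.
    threaded_rec = "chr2\t179417938\t.\tG\tA\t.\tPASS\t.\tGT:AD:DP\t0/1:0,7:7054"
    single_rec = "chr2\t179417938\t.\tG\tA\t.\tPASS\t.\tGT:AD:DP\t0/1:3684,3369:7054"
    extra, missing = compare_outputs([header, threaded_rec, threaded_rec],
                                     [header, single_rec])
    print(len(extra), len(missing))  # 2 1
```

If both lists come back empty, the two runs wrote the same records and the problem did not reproduce.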

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie

    Hi David, @igcocole seems to be onto something -- can you do as he suggests (run without -nt) and let us know what happens?

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie

    Hi there,

    As noted here, we're unable to reproduce this behavior; in our hands these tools work perfectly fine in multithreaded mode. It looks like it might be an issue with your filesystem not handling the multithreading operations properly. We're looking at ways to test for this problem in order to be able to issue a warning to users when this happens, but we unfortunately don't foresee being able to fix it. At this point all we can say is that if you're experiencing this problem you should run the tools without -nt.

    If anyone else experiences these issues, let us know. If we get more cases, we may be able to find out what they have in common and pinpoint the precipitating conditions.
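
For anyone scripting their runs, the workaround amounts to dropping the threading option from the argument list. The following is a minimal Python sketch, assuming -nt is the short form of the --num_threads option used in the original command; the helper name is hypothetical, and the argument string is the fragment exactly as posted in the question (the part of the command line before it was not shown and is not reconstructed here).

```python
import shlex

THREAD_FLAGS = ("-nt", "--num_threads")


def without_threads(args):
    """Return a copy of args with the threading flag and its value removed."""
    cleaned = []
    skip_next = False
    for arg in args:
        if skip_next:
            skip_next = False  # this is the thread count that followed the flag
            continue
        if arg in THREAD_FLAGS:
            skip_next = True
            continue
        cleaned.append(arg)
    return cleaned


if __name__ == "__main__":
    posted = ("--num_threads 4 -R /data/resources/chr_hg19.fa -T SelectVariants "
              "--variant /datos/samples.vcf -o SID158.vcf -sn SID158")
    print(" ".join(without_threads(shlex.split(posted))))
```

The remaining arguments are left in their original order, so the result can be passed to the same GATK invocation as before.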

  • Hi igcocole and Geraldine, thank you so much for the information. I will try what you suggest when I have a minute and send you all the information I can. My apologies for not answering sooner. I hope to send you the information soon. Thank you again.
