The current GATK version is 3.6-0

How to output all the variants that are unique in my dataset

rzeng · Houston · Posts: 18 · Member

I hope this question is not a duplicate; I could not find a solution elsewhere.

Suppose I have a variant dataset that contains variants from only ONE sample. If I also have other, outside datasets (not my test dataset), I can output the variants that are unique to my test call set with the --discordance argument, like this, with no problem:

$ java -Xmx2g -jar GenomeAnalysisTK.jar \
    -R ref.fasta \
    -T SelectVariants \
    --variant myCalls.vcf \
    --discordance outerdatasets.vcf \
    -o unique_in_my_set.vcf

However, if my dataset includes one clinically affected sample and three controls, and I want to output all the variants of the affected sample that are unique within this dataset (i.e. the discordance of the affected sample compared with the three controls), what tools or commands can I use?

Thank you,

Answers

  • Geraldine_VdAuwera · Posts: 10,469 · Administrator, Dev admin

    You can pass multiple datasets with the --discordance argument; I think that will do what you want.

    Geraldine Van der Auwera, PhD

  • rzeng · Houston · Posts: 18 · Member
    edited December 2013

    Thank you, Geraldine! I may not have explained it clearly.

    I have just one variant dataset, which contains three samples (affected, control1 and control2). I want to generate the variants that are present only in the affected sample and absent from both control samples.

    For example, my VCF variant lines are:

    CHROM   POS  ID  REF  ALT  QUAL  FILTER  INFO  FORMAT          affected     1            2
    chr4    x    x   A    C    78    pass    xx    GT:AD:DP:GQ:PL  0/1:x:x:x:x  0/1:x:x:x:x  1/1:x:x:x:x
    chr8    x    x   A    T    1444  pass    xx    GT:AD:DP:GQ:PL  0/1:x:x:x:x  0/0:x:x:x:x  0/0:x:x:x:x
    chr10   x    x   T    C    230   pass    xx    GT:AD:DP:GQ:PL  1/1:x:x:x:x  0/0:x:x:x:x  0/0:x:x:x:x
    

    The new variant file should look like this:

    CHROM   POS  ID  REF  ALT  QUAL  FILTER  INFO  FORMAT          affected
    chr8    x    x   A    T    1444  pass    xx    GT:AD:DP:GQ:PL  0/1:x:x:x:x
    chr10   x    x   T    C    230   pass    xx    GT:AD:DP:GQ:PL  1/1:x:x:x:x
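A minimal sketch of the selection described above, written in plain awk rather than as a GATK feature. It assumes an uncompressed, tab-delimited VCF with a #CHROM header line and GT as the first FORMAT subfield; the function name unique_to_sample is hypothetical. It keeps a site only when the named sample carries a non-reference allele and every other sample is called homozygous reference, then prints the first nine columns plus that sample's column.

```shell
# Hypothetical helper (not part of GATK): from a multi-sample VCF on stdin, keep
# sites where the named sample carries a non-reference allele and every other
# sample is called homozygous reference, and print only the first nine columns
# plus that sample's column.
# Assumes an uncompressed, tab-delimited VCF with a #CHROM header line and GT
# as the first FORMAT subfield.
unique_to_sample() {
  awk -v s="$1" 'BEGIN { FS = OFS = "\t" }
    /^##/ { print; next }                 # meta-information lines pass through
    /^#CHROM/ {
      for (i = 10; i <= NF; i++) if ($i == s) col = i
      if (!col) { print "sample not found: " s > "/dev/stderr"; exit 1 }
      print $1, $2, $3, $4, $5, $6, $7, $8, $9, $col
      next
    }
    {
      split($col, g, ":")                 # g[1] is the selected sample GT
      if (!is_nonref(g[1])) next          # sample must carry an ALT allele
      for (i = 10; i <= NF; i++) {
        if (i == col) continue
        split($i, o, ":")
        if (!is_homref(o[1])) next        # any non-0/0 control (incl. ./.) drops the site
      }
      print $1, $2, $3, $4, $5, $6, $7, $8, $9, $col
    }
    # True when every allele in the genotype is the reference allele 0.
    function is_homref(gt,    a, n, k) {
      n = split(gt, a, "[/|]")
      for (k = 1; k <= n; k++) if (a[k] != "0") return 0
      return 1
    }
    # True when at least one called allele is non-reference.
    function is_nonref(gt,    a, n, k) {
      n = split(gt, a, "[/|]")
      for (k = 1; k <= n; k++) if (a[k] != "0" && a[k] != ".") return 1
      return 0
    }'
}
```

Usage, under the same assumptions: unique_to_sample affected < myfile.vcf > unique_in_affected.vcf. On the example rows this keeps chr8 and chr10 and drops chr4. INFO fields such as AC and AN are passed through unchanged, so they still describe all samples, and a site with a missing control genotype (./.) is dropped rather than kept.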
    
  • rzeng · Houston · Posts: 18 · Member
    edited December 2013

    I tried the following command, but it returned the sites where the affected sample differs from the reference genome, not the sites where it differs from the two controls.

    Select a sample and exclude non-variant loci and filtered loci:
    java -Xmx2g -jar GenomeAnalysisTK.jar \
    -R ref.fasta \
    -T SelectVariants \
    --variant myfile.vcf \
    -o output.vcf \
    -sn affected \
    -env \
    -ef
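The result described above is consistent with -env being applied after the file has been subset to the single -sn sample: once the control columns are gone, every site where the affected sample is non-reference looks variant. The awk sketch below only mimics that subset-then-filter behavior; it is not GATK itself, the name select_sample_nonref is hypothetical, and the -ef (exclude filtered) check is not reproduced. It assumes a tab-delimited VCF with a #CHROM header line and GT as the first FORMAT subfield.

```shell
# Hypothetical illustration (not GATK itself): subset a VCF on stdin to one
# sample, then drop sites where that sample has no non-reference allele.
# The other samples' genotypes are never consulted.
# Assumes a tab-delimited VCF with a #CHROM header line and GT as the first
# FORMAT subfield; FILTER is ignored, so -ef is not reproduced.
select_sample_nonref() {
  awk -v s="$1" 'BEGIN { FS = OFS = "\t" }
    /^##/ { print; next }
    /^#CHROM/ {
      for (i = 10; i <= NF; i++) if ($i == s) col = i
      if (!col) { print "sample not found: " s > "/dev/stderr"; exit 1 }
      print $1, $2, $3, $4, $5, $6, $7, $8, $9, $col
      next
    }
    {
      split($col, g, ":")                 # g[1] is the selected sample GT
      n = split(g[1], a, "[/|]")
      for (k = 1; k <= n; k++)
        if (a[k] != "0" && a[k] != ".") { # any ALT allele keeps the site
          print $1, $2, $3, $4, $5, $6, $7, $8, $9, $col
          next
        }
    }'
}
```

On the example rows this keeps all three sites, including chr4, which both controls also carry, because the controls are never looked at.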

  • Geraldine_VdAuwera · Posts: 10,469 · Administrator, Dev admin

    Try -sn <sample> -env together with the JEXL expression 'AC == 1' (passed with -select).

    Geraldine Van der Auwera, PhD
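As a rough illustration of what an AC == 1 condition tests: AC is the number of non-reference alleles among the called genotypes at a site. The awk sketch below is not GATK (the function name site_ac is hypothetical); it recomputes that count from the GT field of every sample in the file, assuming a tab-delimited VCF with GT as the first FORMAT subfield. Counted over all three samples in the example rows, chr4 gives 4, chr8 gives 1 and chr10 gives 2, so AC == 1 on its own would keep the heterozygous chr8 site but not the homozygous-alternate chr10 site.

```shell
# Hypothetical helper (not a GATK command): print CHROM, POS and the allele
# count recomputed from the GT subfield of every sample, i.e. the number of
# called non-reference alleles at the site, summed over all ALT alleles.
# Assumes a tab-delimited VCF on stdin with GT as the first FORMAT subfield.
site_ac() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^#/ { next }                              # skip header lines
    {
      ac = 0
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")                      # f[1] is the GT subfield
        n = split(f[1], a, "[/|]")
        for (k = 1; k <= n; k++)
          if (a[k] != "0" && a[k] != ".") ac++ # count each non-ref allele
      }
      print $1, $2, ac
    }'
}
```

Usage, under the same assumptions: site_ac < myfile.vcf | awk -F'\t' '$3 == 1' lists the sites with exactly one non-reference allele across all samples.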
