
How to output all the variants that are unique in my dataset

rzeng (Houston), Posts: 18, Member

I hope I am not duplicating a question; I could not find an existing solution.

Suppose I have a variant dataset that includes variants from only ONE sample. If I have another, external dataset (not my test dataset), I can get the variants that are unique to my test call set by using the --discordance argument, like this, with no problem:

$ java -Xmx2g -jar GenomeAnalysisTK.jar \
    -R ref.fasta \
    -T SelectVariants \
    --variant myCalls.vcf \
    --discordance outerdatasets.vcf \
    -o unique_in_my_set.vcf
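To make the idea of discordance concrete, here is a minimal Python sketch of "records in my call set that are absent from another call set". It is not how SelectVariants is implemented: the matching key (CHROM, POS, REF, ALT) is an assumption made for illustration, the function names are hypothetical, and the demo records are made up.

```python
def variant_key(line):
    """Build a (CHROM, POS, REF, ALT) key from one tab-separated VCF record."""
    fields = line.rstrip("\n").split("\t")
    return (fields[0], fields[1], fields[3], fields[4])


def is_record(line):
    """True for data lines; False for blank lines and '#' header lines."""
    return bool(line.strip()) and not line.startswith("#")


def discordant_records(my_lines, other_lines):
    """Return records from my_lines whose key does not occur in other_lines.

    Assumption: two records are "the same variant" when CHROM, POS, REF and
    ALT match exactly. Real tools may compare variants differently.
    """
    other_keys = {variant_key(l) for l in other_lines if is_record(l)}
    return [
        l.rstrip("\n")
        for l in my_lines
        if is_record(l) and variant_key(l) not in other_keys
    ]


# Made-up demo records (not taken from the thread).
my_calls = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tC\t50\tPASS\t.",
    "chr2\t200\t.\tG\tT\t60\tPASS\t.",
]
outer = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tC\t99\tPASS\t.",
]
for record in discordant_records(my_calls, outer):
    print(record)  # only the chr2 record is absent from the outer set
```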

However, if my dataset includes one clinically affected sample and three controls, and I want to output all the variants of the affected sample that are unique within this dataset (the discordance of the affected sample compared with the three controls), what tools or commands can I use?

Thank you,


Answers

  • Geraldine_VdAuwera, Posts: 6,412, Administrator, GATK Developer

    You can pass multiple datasets with the --discordance argument; I think that will do what you want.

    Geraldine Van der Auwera, PhD

  • rzeng (Houston), Posts: 18, Member
    edited December 2013

    Thank you, Geraldine! I may not have explained it clearly.

    I have just one variant dataset, which contains three samples (affected, control1 and control2). I want to output the variants that are present only in the affected sample and not in either of the 2 control samples.

    For example, my VCF variant lines are:

    CHROM   POS   ID   REF   ALT   QUAL   FILTER   INFO   FORMAT           affected      1             2
    chr4    x     x    A     C     78     pass     xx     GT:AD:DP:GQ:PL   0/1:x:x:x:x   0/1:x:x:x:x   1/1:x:x:x:x
    chr8    x     x    A     T     1444   pass     xx     GT:AD:DP:GQ:PL   0/1:x:x:x:x   0/0:x:x:x:x   0/0:x:x:x:x
    chr10   x     x    T     C     230    pass     xx     GT:AD:DP:GQ:PL   1/1:x:x:x:x   0/0:x:x:x:x   0/0:x:x:x:x
    

    The new variant file should be like this:

    CHROM   POS   ID   REF   ALT   QUAL   FILTER   INFO   FORMAT           affected
    chr8    x     x    A     T     1444   pass     xx     GT:AD:DP:GQ:PL   0/1:x:x:x:x
    chr10   x     x    T     C     230    pass     xx     GT:AD:DP:GQ:PL   1/1:x:x:x:x
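The selection described in this post can be sketched in plain Python, independent of GATK. This is only an illustration under stated assumptions: the function names are hypothetical, the control columns are named control1 and control2 as in the description above, GT is taken to be the first FORMAT key, and a missing genotype (./.) in a control is treated as "does not carry the variant".

```python
def has_alt(sample_field):
    """True if the GT entry (first colon-separated value) has a non-reference allele."""
    gt = sample_field.split(":")[0].replace("|", "/")
    # Assumption: "." (missing) is treated like reference, i.e. not a carrier.
    return any(allele not in ("0", ".") for allele in gt.split("/"))


def unique_to_sample(vcf_lines, target):
    """Return records where only `target` carries an alt allele.

    Each returned record keeps the 9 fixed VCF columns plus the target's column.
    """
    idx = None
    kept = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            idx = fields[9:].index(target)  # position of the target sample column
            continue
        if idx is None:
            raise ValueError("no #CHROM header line seen before the first record")
        genotypes = fields[9:]
        others = [g for i, g in enumerate(genotypes) if i != idx]
        if has_alt(genotypes[idx]) and not any(has_alt(g) for g in others):
            kept.append("\t".join(fields[:9] + [genotypes[idx]]))
    return kept


# The three example lines from the post, with "x" placeholders kept as-is.
example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\taffected\tcontrol1\tcontrol2",
    "chr4\tx\tx\tA\tC\t78\tpass\txx\tGT:AD:DP:GQ:PL\t0/1:x:x:x:x\t0/1:x:x:x:x\t1/1:x:x:x:x",
    "chr8\tx\tx\tA\tT\t1444\tpass\txx\tGT:AD:DP:GQ:PL\t0/1:x:x:x:x\t0/0:x:x:x:x\t0/0:x:x:x:x",
    "chr10\tx\tx\tT\tC\t230\tpass\txx\tGT:AD:DP:GQ:PL\t1/1:x:x:x:x\t0/0:x:x:x:x\t0/0:x:x:x:x",
]
for record in unique_to_sample(example, "affected"):
    print(record)  # keeps the chr8 and chr10 records, drops chr4
```

Whether missing control genotypes should count as "not a carrier" is a judgment call; a stricter filter would require every control to be explicitly 0/0.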
    
  • rzeng (Houston), Posts: 18, Member
    edited December 2013

    I tried the following command, but it reports the variants where the affected sample differs from the reference genome, NOT from the two controls.

    Select a sample and exclude non-variant loci and filtered loci:

    java -Xmx2g -jar GenomeAnalysisTK.jar \
        -R ref.fasta \
        -T SelectVariants \
        --variant myfile.vcf \
        -o output.vcf \
        -sn affected \
        -env \
        -ef

  • Geraldine_VdAuwera, Posts: 6,412, Administrator, GATK Developer

    Try -sn with your sample name and -env, together with the JEXL expression 'AC == 1'.

    Geraldine Van der Auwera, PhD
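To see what an allele-count condition means for the three example lines posted earlier in the thread, here is a small Python sketch that counts alt alleles across the genotype columns. It assumes biallelic sites and counts over all three samples; how SelectVariants computes AC when combined with -sn is not stated in this thread, so treat this as arithmetic on the example only. The function name is hypothetical.

```python
def alt_allele_count(sample_fields):
    """Count non-reference alleles across genotype fields (biallelic sites assumed)."""
    count = 0
    for field in sample_fields:
        gt = field.split(":")[0].replace("|", "/")
        count += sum(1 for allele in gt.split("/") if allele not in ("0", "."))
    return count


# Genotype columns (affected, control, control) from the example lines in the thread.
example = {
    "chr4": ["0/1:x:x:x:x", "0/1:x:x:x:x", "1/1:x:x:x:x"],
    "chr8": ["0/1:x:x:x:x", "0/0:x:x:x:x", "0/0:x:x:x:x"],
    "chr10": ["1/1:x:x:x:x", "0/0:x:x:x:x", "0/0:x:x:x:x"],
}
for chrom, fields in example.items():
    print(chrom, alt_allele_count(fields))  # chr4 4, chr8 1, chr10 2
```

Under these assumptions, a count of exactly 1 matches the heterozygous chr8 site, while the homozygous chr10 site has a count of 2, so a site that is homozygous in the affected sample may need a separate condition.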
