from: Mahyar

mahyarhey | Boston | Posts: 50 | Member
edited September 2013 in Ask the GATK team

I ran UnifiedGenotyper on my 42 samples (BAM files) to get a VCF file. After almost 48 hours I got the VCF file, but it contained no 0/0 (homozygous reference) genotypes.
How can I get this genotype information? I need it for eQTL analyses. I want to see all three genotype classes in the VCF file (0/0, 0/1, and 1/1). Thanks

Answers

  • mahyarhey | Boston | Posts: 50 | Member

    Thanks, Geraldine, for the answer. I used the "Emit_All_Sites" option and the program is still running. Is there a way to get the output faster, without waiting 48 hours? Thanks

  • Geraldine_VdAuwera | Posts: 7,566 | Administrator, GATK Developer

    To accelerate processing, you can use parallelism. See this documentation article for details: http://www.broadinstitute.org/gatk/guide/article?id=1988

    Geraldine Van der Auwera, PhD

  • mahyarhey | Boston | Posts: 50 | Member

    Thanks, Geraldine, for the article. It is very useful. By the way, is there a practical example of how to use -nt and -nct with UnifiedGenotyper?
    Thanks

  • Geraldine_VdAuwera | Posts: 7,566 | Administrator, GATK Developer

    We don't currently provide detailed guidelines, but you'll find a few pointers here: http://www.broadinstitute.org/gatk/guide/article?id=1975

    Geraldine Van der Auwera, PhD
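
For readers looking for a concrete starting point, here is a minimal sketch of how a UnifiedGenotyper command line with -nt and -nct might be assembled. It is written in Python only to build and print the argument list; nothing is executed. The file names, the jar path, and the thread counts are hypothetical placeholders, and the flag spellings follow GATK 2.x-era usage as far as I know, so check them against the documentation for the version you run.

```python
from typing import List, Optional


def build_unified_genotyper_cmd(
    reference: str,
    bams: List[str],
    out_vcf: str,
    data_threads: int = 4,
    cpu_threads_per_data_thread: int = 1,
    dbsnp: Optional[str] = None,
    gatk_jar: str = "GenomeAnalysisTK.jar",  # hypothetical path to the GATK jar
) -> List[str]:
    """Assemble (but do not run) a UnifiedGenotyper command line."""
    cmd = ["java", "-jar", gatk_jar, "-T", "UnifiedGenotyper", "-R", reference]
    for bam in bams:
        cmd += ["-I", bam]  # one -I per input BAM file
    cmd += ["-o", out_vcf]
    # Emit every site, so homozygous-reference (0/0) calls are reported too.
    cmd += ["--output_mode", "EMIT_ALL_SITES"]
    # -nt: data threads; -nct: CPU threads per data thread.
    cmd += ["-nt", str(data_threads), "-nct", str(cpu_threads_per_data_thread)]
    if dbsnp is not None:
        cmd += ["--dbsnp", dbsnp]  # known sites used to fill the ID column
    return cmd


# Placeholder inputs for illustration only.
example_cmd = build_unified_genotyper_cmd(
    reference="ref.fasta",
    bams=["sample1.bam", "sample2.bam"],
    out_vcf="all_sites.vcf",
    data_threads=4,
    cpu_threads_per_data_thread=2,
)
print(" ".join(example_cmd))
```

The right values for -nt and -nct depend on the machine and on memory, so treat the numbers above as something to tune, not as a recommendation.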

  • mahyarhey | Boston | Posts: 50 | Member

    Hi Geraldine,
    Following your comment, I used "Emit_ALL_SITES" to get all genotype classes (0/0, 0/1, 1/1). After 72 hours I got the output for the 42 samples. However, there is a "." in the SNP-ID column. What is the problem? My output looks like this:

    chr1 17915 . C . 151.23 . AN=2;DP=250;DS;MQ=21.05;MQ0=1 GT:DP 0/0:238
    chr1 17916 . T . 148.23 . AN=2;DP=250;DS;MQ=21.05;MQ0=1 GT:DP 0/0:238
    chr1 17917 . G . 157.23 . AN=2;DP=250;DS;MQ=21.28;MQ0=1 GT:DP 0/0:237
    chr1 17918 . C . 154.23 . AN=2;DP=249;DS;MQ=21.09;MQ0=1 GT:DP 0/0:237
    chr1 17919 . A . 145.23 . AN=2;DP=250;DS;MQ=21.05;MQ0=1 GT:DP 0/0:237
    chr1 17920 . G . 148.23 . AN=2;DP=247;DS;MQ=20.69;MQ0=1 GT:DP 0/0:235
    chr1 17921 . G . 139.23 . AN=2;DP=250;DS;MQ=20.07;MQ0=1 GT:DP 0/0:237
    chr1 17922 . G . 142.23 . AN=2;DP=250;DS;MQ=19.82;MQ0=1 GT:DP 0/0:236

  • Geraldine_VdAuwera | Posts: 7,566 | Administrator, GATK Developer

    That is described in the VCF specification.

    Geraldine Van der Auwera, PhD

  • mahyarhey | Boston | Posts: 50 | Member

    Thanks for the article, Geraldine. I read it but did not get the point. Could you please give me a clue on how to fix this problem?
    I have no value for "SNP-ID" anywhere in my VCF file, which seems very odd to me! Thanks

  • Geraldine_VdAuwera | Posts: 7,566 | Administrator, GATK Developer

    @mahyarhey, this is really something you should be able to find out yourself. This field tells you if a variant in your data is present in a database of known sites that you pass using the --dbsnp / -D argument. If you didn't provide a dbSNP file (or similar) the field will be empty throughout your file.

    Geraldine Van der Auwera, PhD
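
To make the column layout concrete, the following sketch splits one of the records quoted earlier in this thread into the fixed VCF columns. In the VCF format, "." is the missing-value marker, so a "." in the third column simply means no identifier was assigned. The helper names are made up for illustration. Splitting on any whitespace is an assumption that suits the space-separated text as pasted in the forum; real VCF files are tab-delimited.

```python
from typing import Dict, List, Union

# The nine fixed columns of a VCF record that carries genotype data.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]


def parse_vcf_record(line: str) -> Dict[str, Union[str, List[str]]]:
    """Split one VCF data line into named fixed columns plus sample columns."""
    fields = line.split()  # tabs in a real file, spaces in the forum paste
    record: Dict[str, Union[str, List[str]]] = dict(zip(VCF_COLUMNS, fields[:9]))
    record["SAMPLES"] = fields[9:]  # one entry per sample
    return record


def has_known_id(record: Dict[str, Union[str, List[str]]]) -> bool:
    """True when the ID column holds something other than the missing marker."""
    return record["ID"] != "."


line = "chr1 17915 . C . 151.23 . AN=2;DP=250;DS;MQ=21.05;MQ0=1 GT:DP 0/0:238"
record = parse_vcf_record(line)
print(record["ID"], record["ALT"], has_known_id(record))
```

In this record ALT and FILTER are also ".", which is consistent with a site where no alternate allele was called and no filter was applied.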

  • mahyarhey | Boston | Posts: 50 | Member

    Hi, I ran UnifiedGenotyper on my 42 samples and got a huge VCF file (Emit_All_sites). I want to extract the information for each sample separately from this VCF file. Do you know how I can do that? Thanks

  • Geraldine_VdAuwera | Posts: 7,566 | Administrator, GATK Developer

    You can use SelectVariants to extract individual samples. Please see the Tech Docs: http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_variantutils_SelectVariants.html

    Geraldine Van der Auwera, PhD
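
As a rough illustration of the SelectVariants suggestion, the sketch below reads sample names from a VCF "#CHROM" header line and builds one SelectVariants command per sample. Nothing is executed. The file names, the jar path, and the output naming scheme are hypothetical, and the flags (-V for the input VCF, -sn for the sample name) reflect GATK 2.x-era usage as I understand it, so verify them in the Tech Docs for your version.

```python
from typing import List


def sample_names_from_header(header_line: str) -> List[str]:
    """Return the sample names listed after the FORMAT column of a #CHROM line."""
    fields = header_line.rstrip("\n").split("\t")
    if not fields or fields[0] != "#CHROM":
        raise ValueError("expected the VCF column header line starting with #CHROM")
    return fields[9:]  # the first nine columns are the fixed VCF fields


def build_select_variants_cmds(
    reference: str,
    input_vcf: str,
    samples: List[str],
    gatk_jar: str = "GenomeAnalysisTK.jar",  # hypothetical path to the GATK jar
) -> List[List[str]]:
    """Assemble (but do not run) one SelectVariants command per sample."""
    cmds = []
    for sample in samples:
        cmds.append([
            "java", "-jar", gatk_jar,
            "-T", "SelectVariants",
            "-R", reference,
            "-V", input_vcf,
            "-sn", sample,             # keep only this sample's genotype column
            "-o", sample + ".vcf",     # hypothetical output naming scheme
        ])
    return cmds


# Made-up header with two samples, for illustration only.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2"
for cmd in build_select_variants_cmds("ref.fasta", "all_sites.vcf", sample_names_from_header(header)):
    print(" ".join(cmd))
```

With 42 samples this produces 42 commands, which can then be run one at a time or handed to whatever job scheduler is available.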
