

# from: Mahyar

Boston | Posts: 37 | Member
edited September 2013

I ran UnifiedGenotyper on my 42 samples (BAM files) to get a VCF file. After almost 48 hours I got the VCF file, but it contained no 0/0 (homozygous reference) genotypes. How can I get this genotype information? I need it for eQTL analyses. I want to see all three classes of genotype in the VCF file (0/0, 0/1, and 1/1). Thanks


## Answers

• Boston | Posts: 37 | Member

Thanks Geraldine for the answer. I used the "EMIT_ALL_SITES" option and the program is still running. Is there a way to get the output faster, rather than waiting 48 hours? Thanks

• Posts: 6,462 | Administrator, GATK Developer

To accelerate processing, you can use parallelism. See this documentation article for details: http://www.broadinstitute.org/gatk/guide/article?id=1988

Geraldine Van der Auwera, PhD

• Boston | Posts: 37 | Member

Thanks Geraldine for the article. It is very useful. By the way, is there a practical example of how to use -nt and -nct with UnifiedGenotyper? Thanks

• Posts: 6,462 | Administrator, GATK Developer

We don't currently provide detailed guidelines, but you'll find a few pointers here: http://www.broadinstitute.org/gatk/guide/article?id=1975

Geraldine Van der Auwera, PhD
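
As a rough illustration of the -nt / -nct question above, the following Python sketch only assembles a GATK 3 UnifiedGenotyper command line; it does not run anything. The helper name `build_ug_command`, the file names, and the thread counts are hypothetical placeholders rather than recommended values, and the linked articles remain the reference for choosing them.

```python
import shlex

def build_ug_command(bams, reference, out_vcf,
                     data_threads=4, cpu_threads_per_data_thread=2):
    """Assemble (but do not run) a GATK 3 UnifiedGenotyper command line.

    All paths and thread counts are illustrative assumptions.
    """
    cmd = ["java", "-jar", "GenomeAnalysisTK.jar",
           "-T", "UnifiedGenotyper",
           "-R", reference]
    # One -I argument per input BAM file.
    for bam in bams:
        cmd += ["-I", bam]
    cmd += ["--output_mode", "EMIT_ALL_SITES",       # also emit 0/0 sites
            "-nt", str(data_threads),                 # number of data threads
            "-nct", str(cpu_threads_per_data_thread), # CPU threads per data thread
            "-o", out_vcf]
    return cmd

cmd = build_ug_command(["sample1.bam", "sample2.bam"],
                       "reference.fasta", "all_sites.vcf")
print(" ".join(shlex.quote(part) for part in cmd))
```

Building the command as a list keeps each argument separate, so it could be handed to `subprocess.run` unchanged once the placeholder paths are replaced with real ones.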

• Boston | Posts: 37 | Member

Hi Geraldine, following your comment, I used "EMIT_ALL_SITES" to get all classes of genotype (0/0, 0/1, 1/1). After 72 hours I got the output for the 42 samples. However, there is a "." in the SNP-ID column. What is the problem? My output looks like this:

    chr1 17915 . C . 151.23 . AN=2;DP=250;DS;MQ=21.05;MQ0=1 GT:DP 0/0:238
    chr1 17916 . T . 148.23 . AN=2;DP=250;DS;MQ=21.05;MQ0=1 GT:DP 0/0:238
    chr1 17917 . G . 157.23 . AN=2;DP=250;DS;MQ=21.28;MQ0=1 GT:DP 0/0:237
    chr1 17918 . C . 154.23 . AN=2;DP=249;DS;MQ=21.09;MQ0=1 GT:DP 0/0:237
    chr1 17919 . A . 145.23 . AN=2;DP=250;DS;MQ=21.05;MQ0=1 GT:DP 0/0:237
    chr1 17920 . G . 148.23 . AN=2;DP=247;DS;MQ=20.69;MQ0=1 GT:DP 0/0:235
    chr1 17921 . G . 139.23 . AN=2;DP=250;DS;MQ=20.07;MQ0=1 GT:DP 0/0:237
    chr1 17922 . G . 142.23 . AN=2;DP=250;DS;MQ=19.82;MQ0=1 GT:DP 0/0:236

• Posts: 6,462 | Administrator, GATK Developer

That is described in the VCF specification.

Geraldine Van der Auwera, PhD

• Boston | Posts: 37 | Member

Thanks for the article, Geraldine. I read it but I did not get the point. Could you please give me a clue how to fix this problem? I have no value for "SNP-ID" anywhere in my VCF file, which seems very odd to me. Thanks

• Posts: 6,462 | Administrator, GATK Developer

@mahyarhey, this is really something you should be able to find out yourself. This field tells you whether a variant in your data is present in a database of known sites that you pass using the --dbsnp / -D argument. If you didn't provide a dbSNP file (or similar), the field will be empty (".") throughout your file.

Geraldine Van der Auwera, PhD
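
To make the point about the ID field concrete, here is a minimal Python sketch that splits one of the records posted above into the fixed VCF columns and checks whether the ID column carries a known-site identifier. The helper names `parse_record` and `has_known_id` are hypothetical, and the sketch splits on any whitespace because the records in the post are space-separated (real VCF files are tab-delimited).

```python
# The fixed VCF columns that precede the per-sample genotype columns.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT",
               "QUAL", "FILTER", "INFO", "FORMAT"]

def parse_record(line):
    """Split one VCF data line into a dict of fixed columns plus samples."""
    fields = line.split()  # whitespace split; assumes no spaces inside fields
    record = dict(zip(VCF_COLUMNS, fields[:9]))
    record["SAMPLES"] = fields[9:]
    return record

def has_known_id(record):
    """A '.' in the ID column means no identifier was assigned."""
    return record["ID"] != "."

line = "chr1 17915 . C . 151.23 . AN=2;DP=250;DS;MQ=21.05;MQ0=1 GT:DP 0/0:238"
record = parse_record(line)
print(record["ID"], has_known_id(record))  # prints ". False"
```

In this record the ALT column is also ".", which is what a homozygous-reference site looks like when all sites are emitted: there is no alternate allele to report.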

• Boston | Posts: 37 | Member

Hi, I ran UnifiedGenotyper on my 42 samples and got a huge VCF file (EMIT_ALL_SITES). I want to split this VCF file into one file per sample. Do you know how I can do that? Thanks

• Posts: 6,462 | Administrator, GATK Developer

You can use SelectVariants to extract individual samples. Please see the Tech Docs: http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_variantutils_SelectVariants.html

Geraldine Van der Auwera, PhD
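
As a sketch of the per-sample extraction suggested above, the following Python snippet assembles one GATK 3 SelectVariants command per sample without running it. The helper name `build_select_commands`, the sample names, and all file names are hypothetical; the real sample names are the ones in the header line of the multi-sample VCF, and the linked Tech Docs remain the authority on the available arguments.

```python
def build_select_commands(samples, reference, multi_sample_vcf):
    """Return {sample: argv list} for extracting each sample with SelectVariants.

    Paths and sample names are illustrative assumptions.
    """
    commands = {}
    for sample in samples:
        commands[sample] = [
            "java", "-jar", "GenomeAnalysisTK.jar",
            "-T", "SelectVariants",
            "-R", reference,
            "-V", multi_sample_vcf,   # the multi-sample input VCF
            "-sn", sample,            # keep only this sample's genotype column
            "-o", f"{sample}.vcf",    # one output file per sample
        ]
    return commands

commands = build_select_commands(["sampleA", "sampleB"],
                                 "reference.fasta", "all_sites.vcf")
for sample, argv in commands.items():
    print(sample, "->", " ".join(argv))
```

Each command reads the full input file, so with 42 samples and an all-sites VCF this amounts to 42 passes over a large file; running the commands in parallel is one way to shorten the wall-clock time.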
