Bug Bulletin: we have identified a bug that affects indexing when producing gzipped VCFs. This will be fixed in the upcoming 3.2 release; in the meantime you need to reindex gzipped VCFs using Tabix.

UnifiedGenotyper gives empty output on test set

JetseJetse Posts: 2Member
edited June 2013 in Ask the team

Command java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -dcov 1 -R smallRefGenome.fa -I testWh.bam -o test.vcf

Gives me this output:

INFO  15:48:37,070 HelpFormatter - --------------------------------------------------------------------------------
INFO  15:48:37,074 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.5-2-gf57256b, Compiled 2013/05/01 09:27:02
INFO  15:48:37,074 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO  15:48:37,074 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO  15:48:37,081 HelpFormatter - Program Args: -T UnifiedGenotyper -dcov 1 -R smallRefGenome.fa -I testWh.bam -o test.vcf
INFO  15:48:37,081 HelpFormatter - Date/Time: 2013/06/11 15:48:37
INFO  15:48:37,081 HelpFormatter - --------------------------------------------------------------------------------
INFO  15:48:37,081 HelpFormatter - --------------------------------------------------------------------------------
INFO  15:48:37,220 GenomeAnalysisEngine - Strictness is SILENT
INFO  15:48:37,331 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1
INFO  15:48:37,342 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  15:48:37,366 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
INFO  15:48:37,504 GenomeAnalysisEngine - Creating shard strategy for 1 BAM files
INFO  15:48:37,527 GenomeAnalysisEngine - Done creating shard strategy
INFO  15:48:37,528 ProgressMeter -        Location processed.sites  runtime per.1M.sites completed total.runtime remaining
INFO  15:48:38,635 ProgressMeter -            done        1.20e+03    1.0 s       15.3 m     99.9%         1.0 s     0.0 s
INFO  15:48:38,636 ProgressMeter - Total runtime 1.11 secs, 0.02 min, 0.00 hours
INFO  15:48:38,636 MicroScheduler - 0 reads were filtered out during traversal out of 418 total (0.00%)
INFO  15:48:45,809 GATKRunReport - Uploaded run statistics report to AWS S3

The reference genome is 1200 nt long, all 418 reads map between position 100 and 1100 of this reference genome and are 100nt long. The reads are generated by Illumina and mapped with BWA. The bam file contains paired end data, but none are properly paired. The output looks like everything worked, with -dcov 1 I have to find many SNPs...

All steps I did before GATK, after mapping:

Convert sam to bam samtools view -u -S -b test.sam > test.bam

sort the bam file samtools sort test.bam testSorted

Add the missing header line in the bam file java -jarAddOrReplaceReadGroups.jar I=testSorted.bam O=testWh.bam LB=test PL=illumina PU=lane SM=samplename

Index the bam file samtools index ../testFiles/output//test/testWh.bam

Does anyone see where I made a mistake?

Is there another setting which I have to set for finding a SNP with a mall dataset on a small reference genome?

Post edited by Jetse on

Best Answer


Sign In or Register to comment.