The current GATK version is 3.6-0
Examples: Monday, today, last week, Mar 26, 3/26/04

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Powered by Vanilla. Made with Bootstrap.
Last chance to register for the GATK workshop next week in Basel, Switzerland!

UnifiedGenotyper gives empty output on test set

JetseJetse Posts: 2Member
edited June 2013 in Ask the GATK team

java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -dcov 1 -R smallRefGenome.fa -I testWh.bam -o test.vcf

Gives me this output:

INFO  15:48:37,070 HelpFormatter - --------------------------------------------------------------------------------
INFO  15:48:37,074 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.5-2-gf57256b, Compiled 2013/05/01 09:27:02
INFO  15:48:37,074 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO  15:48:37,074 HelpFormatter - For support and documentation go to
INFO  15:48:37,081 HelpFormatter - Program Args: -T UnifiedGenotyper -dcov 1 -R smallRefGenome.fa -I testWh.bam -o test.vcf
INFO  15:48:37,081 HelpFormatter - Date/Time: 2013/06/11 15:48:37
INFO  15:48:37,081 HelpFormatter - --------------------------------------------------------------------------------
INFO  15:48:37,081 HelpFormatter - --------------------------------------------------------------------------------
INFO  15:48:37,220 GenomeAnalysisEngine - Strictness is SILENT
INFO  15:48:37,331 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1
INFO  15:48:37,342 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  15:48:37,366 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
INFO  15:48:37,504 GenomeAnalysisEngine - Creating shard strategy for 1 BAM files
INFO  15:48:37,527 GenomeAnalysisEngine - Done creating shard strategy
INFO  15:48:37,528 ProgressMeter -        Location processed.sites  runtime per.1M.sites completed total.runtime remaining
INFO  15:48:38,635 ProgressMeter -            done        1.20e+03    1.0 s       15.3 m     99.9%         1.0 s     0.0 s
INFO  15:48:38,636 ProgressMeter - Total runtime 1.11 secs, 0.02 min, 0.00 hours
INFO  15:48:38,636 MicroScheduler - 0 reads were filtered out during traversal out of 418 total (0.00%)
INFO  15:48:45,809 GATKRunReport - Uploaded run statistics report to AWS S3

The reference genome is 1200 nt long, all 418 reads map between position 100 and 1100 of this reference genome and are 100nt long. The reads are generated by Illumina and mapped with BWA. The bam file contains paired end data, but none are properly paired.
The output looks like everything worked, with -dcov 1 I have to find many SNPs...

All steps I did before GATK, after mapping:

Convert sam to bam
samtools view -u -S -b test.sam > test.bam

sort the bam file
samtools sort test.bam testSorted

Add the missing header line in the bam file
java -jarAddOrReplaceReadGroups.jar I=testSorted.bam O=testWh.bam LB=test PL=illumina PU=lane SM=samplename

Index the bam file
samtools index ../testFiles/output//test/testWh.bam

Does anyone see where I made a mistake?

Is there another setting which I have to set for finding a SNP with a mall dataset on a small reference genome?

Best Answer


  • JetseJetse Posts: 2Member

    Thank you for your quick response. I took a look at the bam files in IGV and there where no SNPs.
    This is just for testing so I don't need any significant SNPs, just some random SNPs will work too. But then I think the best option will be creating a real SNP in the bam file and check If I can find this SNP back.

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 10,469Administrator, Dev admin

    That sounds good. We do provide test data in our resource bundle if you want to play with some real SNPs.

    Geraldine Van der Auwera, PhD

Sign In or Register to comment.