We've moved!
This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!

UnifiedGenotyper gives empty output on test set

JetseJetse Member
edited June 2013 in Ask the GATK team

java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -dcov 1 -R smallRefGenome.fa -I testWh.bam -o test.vcf

Gives me this output:

INFO 15:48:37,070 HelpFormatter - -------------------------------------------------------------------------------- INFO 15:48:37,074 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.5-2-gf57256b, Compiled 2013/05/01 09:27:02 INFO 15:48:37,074 HelpFormatter - Copyright (c) 2010 The Broad Institute INFO 15:48:37,074 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk INFO 15:48:37,081 HelpFormatter - Program Args: -T UnifiedGenotyper -dcov 1 -R smallRefGenome.fa -I testWh.bam -o test.vcf INFO 15:48:37,081 HelpFormatter - Date/Time: 2013/06/11 15:48:37 INFO 15:48:37,081 HelpFormatter - -------------------------------------------------------------------------------- INFO 15:48:37,081 HelpFormatter - -------------------------------------------------------------------------------- INFO 15:48:37,220 GenomeAnalysisEngine - Strictness is SILENT INFO 15:48:37,331 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1 INFO 15:48:37,342 SAMDataSource$SAMReaders - Initializing SAMRecords in serial INFO 15:48:37,366 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02 INFO 15:48:37,504 GenomeAnalysisEngine - Creating shard strategy for 1 BAM files INFO 15:48:37,527 GenomeAnalysisEngine - Done creating shard strategy INFO 15:48:37,527 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] INFO 15:48:37,528 ProgressMeter - Location processed.sites runtime per.1M.sites completed total.runtime remaining INFO 15:48:38,635 ProgressMeter - done 1.20e+03 1.0 s 15.3 m 99.9% 1.0 s 0.0 s INFO 15:48:38,636 ProgressMeter - Total runtime 1.11 secs, 0.02 min, 0.00 hours INFO 15:48:38,636 MicroScheduler - 0 reads were filtered out during traversal out of 418 total (0.00%) INFO 15:48:45,809 GATKRunReport - Uploaded run statistics report to AWS S3

The reference genome is 1200 nt long, all 418 reads map between position 100 and 1100 of this reference genome and are 100nt long. The reads are generated by Illumina and mapped with BWA. The bam file contains paired end data, but none are properly paired.
The output looks like everything worked, with -dcov 1 I have to find many SNPs...

All steps I did before GATK, after mapping:

Convert sam to bam
samtools view -u -S -b test.sam > test.bam

sort the bam file
samtools sort test.bam testSorted

Add the missing header line in the bam file
java -jarAddOrReplaceReadGroups.jar I=testSorted.bam O=testWh.bam LB=test PL=illumina PU=lane SM=samplename

Index the bam file
samtools index ../testFiles/output//test/testWh.bam

Does anyone see where I made a mistake?

Is there another setting which I have to set for finding a SNP with a mall dataset on a small reference genome?

Best Answer


  • JetseJetse Member

    Thank you for your quick response. I took a look at the bam files in IGV and there where no SNPs.
    This is just for testing so I don't need any significant SNPs, just some random SNPs will work too. But then I think the best option will be creating a real SNP in the bam file and check If I can find this SNP back.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    That sounds good. We do provide test data in our resource bundle if you want to play with some real SNPs.

Sign In or Register to comment.