The current GATK version is 3.8-0
Examples: Monday, today, last week, Mar 26, 3/26/04

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Get notifications!


You can opt in to receive email notifications, for example when your questions get answered or when there are new announcements, by following the instructions given here.

Got a problem?


1. Search using the upper-right search box, e.g. using the error message.
2. Try the latest version of tools.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

Did we ask for a bug report?


Then follow instructions in Article#1894.

Formatting tip!


Wrap blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks ( ``` ) each to make a code block as demonstrated here.

Jump to another community
Download the latest Picard release at https://github.com/broadinstitute/picard/releases.
GATK version 4.beta.3 (i.e. the third beta release) is out. See the GATK4 beta page for download and details.

HaplotypeCaller only generates header when called with recalibrated/dedupped BAM files

Hello,

I am having trouble calling variants using Haplotype Caller on simulated exome reads. I have been able to call reasonable-looking variants on the exome (simulated with dwgsim) with HaplotypeCaller before running it through the Best Practices Pre-Processing pipeline. The pre-processed data worked fine with UnifiedGenotyper but with HaplotypeCaller, though it runs without errors and seems to walk across the genome, only outputs a VCF header. I have tried calling variants with and without using -L to provide the exome regions (as recommended in this forum post: http://gatkforums.broadinstitute.org/discussion/1681/expected-file-size-haplotype-caller) but this hasn't made a difference - when we run the command with the pre-processed BAMs, we only get a VCF header. Everything has been tested with both 2.4-7 and 2.4-9.

Any help or guidance would be greatly appreciated!

Command Used for HaplotypeCaller:

java -Xmx4g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ucsc.hg19.fasta -I exome.realigned.dedup.recal.bam -o exome.raw.vcf -D dbsnp_137.hg19.vcf -stand_emit_conf 10 -rf BadCigar -L Illumin_TruSeq.bed --logging_level DEBUG

Commands Used for pre-processing (run in sequence using a Perl script):

java -Xmx16g -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -nt 8 -R ucsc.hg19.fasta -I exome.bam -o exome.intervals -known dbsnp_137.hg19.vcf

>

java -Xmx4g -jar GenomeAnalysisTK.jar -T IndelRealigner -R ucsc.hg19.fasta -I exome.bam -o exome.realigned.bam -targetIntervals intervals.bam -known dbsnp_137.hg19.vcf

>

java -Xmx16g -jar MarkDuplicates.jar I=exome.realigned.bam METRICS_FILE=exome.dups O=exome.realigned.dedup.bam

samtools index exome.realigned.dedup

java -Xmx4g -jar GenomeAnalysisTK.jar -T BaseRecalibrator -nct 8 -R ucsc.hg19.fasta -I exome.realigned.dedup.bam -o exome.recal_data.grp -knownSites dbsnp_137.hg19.vcf -cov ReadGroupCovariate -cov ContextCovariate -cov CycleCovariate -cov QualityScoreCovariate

java -Xmx4g -jar GenomeAnalysisTK.jar -T PrintReads -nct 8 -R ucsc.hg19.fasta -I exome.realigned.dedup.bam -BQSR exome.recal_data.grp -baq CALCULATE_AS_NECESSARY -o exome.realigned.dedup.recal.bam

Best Answer

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Hmm, we do have a known issue related to using both -baq and -BQSR together, as you are in the last step of pre-processing: the BAQ'ing is being done first in the engine (on the original bases), but in fact it should be done after recalibration with the BQSR. Try again using only -BQSR with your PrintReads step and then use -BAQ in your variant calling step. Let me know if that solves your problem.

    FYI this baq/bqsr issue is fixed in our internal dev version, but the fix will only be available when we release version 2.5 because it's too complex to patch onto 2.4.

  • Thanks @Geraldine_VdAuwera for the quick response!

    I tried eliminating the -baq from the Print Reads command, instead calling:

    java -Xmx4g -Djava.io.tmpdir=tmpdir/ -jar $gatk_dir/GenomeAnalysisTK.jar -T PrintReads -nct 8 -R ucsc.hg19.fasta -I exome.dedup.bam -BQSR exome.recal_data.grp -o exome.processed.bam

    I then tried a few iterations of the HaplotypeCaller invocation, first including the -baw parameter:

    java -Xmx4g -Djava.io.tmpdir=tmpdir/ -jar $gatk_dir/GenomeAnalysisTK.jar -T HaplotypeCaller -R ucsc.hg19.fasta -I exome.processed.bam -o ./variants.raw.vcf -D dbsnp_137.hg19.vcf -stand_emit_conf 10 -rf BadCigar -baq CALCULATE_AS_NECESSARY -L Illumina_TruSeq.bed

    This output the error:

    ERROR MESSAGE: Invalid command line: Argument baq has a bad value: Walker cannot accept BAQ'd base qualities, and yet BAQ mode CALCULATE_AS_NECESSARY was requested.

    I then tried calling it without the -baq parameter, testing both with and without the -L parameter:

    java -Xmx4g -Djava.io.tmpdir=tmpdir/ -jar $gatk_dir/GenomeAnalysisTK.jar -T HaplotypeCaller --R ucsc.hg19.fasta -I exome.processed.bam -o ./variants.raw.vcf -D dbsnp_137.hg19.vcf -stand_emit_conf 10 -rf BadCigar -L Illumina_TruSeq.bed

    java -Xmx4g -Djava.io.tmpdir=tmpdir/ -jar $gatk_dir/GenomeAnalysisTK.jar -T HaplotypeCaller --R ucsc.hg19.fasta -I exome.processed.bam -o ./variants.raw.vcf -D dbsnp_137.hg19.vcf -stand_emit_conf 10 -rf BadCigar

    Both of those runs completed with the same problem (only a header output).

  • Hi @Geraldine_VdAuwera! It looks like we fixed our problem; it was cased by our simulator, dwgsim.

    Dwgsim creates a set of random reads (rather than reads mutated from the reference) where the contig is given as "rand". These are included to help evaluate the aligner and they are supposed to be ignored in the remaining steps but it seems one the pre-processing steps makes them look like viable reads to HaplotypeCaller (and coincidentally Dindel). We fixed this problem by removing all the "rand" labeled reads in the sam file and now HaplotypeCaller is working properly.

    Thank you for your help- it was running Picard ValidateSamFile validation as a quality check that helped us identify this issue.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Glad to hear your problem is solved! And thanks for reporting your solution, as it may prove useful to other users in future. Validating files is always a good troubleshooting reflex :-)

Sign In or Register to comment.