




Boston Children's HospitalMember

Hello,

I am new to the field, so please forgive me if this has a simple answer. I am attempting to call variants on 6 whole exome sequences. The best practices documentation suggested using 30 or more samples, so I downloaded 24 bam files from the 1000genomes database to use with mine. However, whenever I attempted to use ReduceReads on the files, I received the following error right near 100% completion:

##### ERROR MESSAGE: BUG: requested unknown contig=NC_007605 index=-1

I am using the latest b37 reference file from the bundle, and I have tried re-indexing it and re-forming the .dict file. Here is the stack trace from the above error. What is causing this, and how do I fix it?

##### ERROR stack trace

org.broadinstitute.sting.utils.exceptions.ReviewedStingException: BUG: requested unknown contig=NC_007605 index=-1
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$2.next(TraverseReadsNano.java:139)
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$2.next(TraverseReadsNano.java:128)


Which file(s) did this error happen with? Can you post your command lines?

• Boston Children's HospitalMember

It happened with the 24 bam files I got from the 1000genomes project, but not with the 6 samples I have. My command line was:

for f in *.bam ; do echo "$f" ; java -jar /opt/GenomeAnalysisTK-2.7-4/GenomeAnalysisTK.jar -T ReduceReads -R ../References/human_g1k_v37.fa -I "$f" -o "$f".reduced.bam ; done

I'm using the 1000genome exome bams (such as HG00116.mapped.ILLUMINA.bwa.GBR.exome.20120522.bam, for example)

Hmm, it looks like the 1000G exomes include some extra contigs compared to our reference (NC_007605 seems to be Epstein-Barr virus, a herpesvirus). Can you check the headers of the 1000G bam files for the reference info?
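For anyone wanting to do this check from the command line, here is a rough sketch. It assumes the BAM header has been dumped to a text file (for example with `samtools view -H`) and that the reference `.dict` file is in the usual tab-separated SAM header layout. The function name `extra_contigs` and the file names in the comments are made up for illustration.

```shell
# List contigs named in a BAM header that are missing from a reference
# sequence dictionary. Both inputs are SAM-style header text files with
# tab-separated @SQ lines.
#   $1 = reference .dict file
#   $2 = BAM header dumped to text
extra_contigs() {
  awk -F'\t' '
    /^@SQ/ {
      # Pull the contig name out of the SN: field.
      for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) name = substr($i, 4)
      if (FNR == NR) ref[name] = 1           # first file: remember reference contigs
      else if (!(name in ref)) print name    # second file: report unknown contigs
    }' "$1" "$2"
}

# Hypothetical usage (file names are placeholders):
#   samtools view -H sample.bam > sample.header.txt
#   extra_contigs ../References/human_g1k_v37.dict sample.header.txt
```

Any name it prints is a contig the BAM was aligned against that the chosen reference does not know about.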

• Boston Children's HospitalMember

Ah, I see that now. They included a couple of different things. I copied the lines that differed from the reference file:

@SQ SN:NC_007605 LN:171823 M5:6743bd63b3ff2b5b8985d8933c53290a UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz AS:NCBI37 SP:Human
@SQ SN:hs37d5 LN:35477943 M5:5b6a4b3a81a2d3c134b7d14bf6ad39f1 UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz AS:NCBI37 SP:Human

So I should download and use the hs37d5.fa file for reducing reads instead of the b37 build I found in the GATK bundle, correct? If I do that though, will I need to realign all of my samples as well? Or can I plug them into HaplotypeCaller along with the 24 samples from the 1000genomes project even though the latter apparently contain some extra contigs?

I think @Kurt is correct -- our bundle's *_decoy reference should work with the 1000G files. If you use that as a reference you shouldn't need to realign your bam files.
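As a sketch of what that change might look like in the loop posted earlier: the only difference is the `-R` argument. The file name `human_g1k_v37_decoy.fasta` and both paths are assumptions here, so check them against your own copy of the bundle. The helper `reduce_cmd` is a made-up name that only prints each command, so it can be inspected before anything is run.

```shell
# Assumed locations; adjust to your own setup.
GATK_JAR=/opt/GenomeAnalysisTK-2.7-4/GenomeAnalysisTK.jar
DECOY_REF=../References/human_g1k_v37_decoy.fasta   # assumed bundle file name

# Print (do not run) the ReduceReads command for one BAM file.
reduce_cmd() {
  printf 'java -jar %s -T ReduceReads -R %s -I %s -o %s\n' \
    "$GATK_JAR" "$DECOY_REF" "$1" "$1.reduced.bam"
}

for f in *.bam; do
  [ -e "$f" ] || continue   # skip when no BAM files match the glob
  reduce_cmd "$f"
done
```

Once the printed commands look right, the same arguments can be passed to `java` directly inside the loop.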

• Boston Children's HospitalMember

Excellent, thank you both so much!