Badly formed genome location

I am trying to compute depth of coverage with GATK's DepthOfCoverage tool for CNV detection, but I consistently get a "Badly formed genome location" error. Here is the command I am using:

java -Xmx3072m -jar GenomeAnalysisTK.jar -T DepthOfCoverage -nt 10 -I /home/MM_Data/cnv_data/group1.READS.bam.list -L /home/MM_Data/cnv_data/nexterarapidcapture_expandedexome_targetedregions.interval_list -R /home/refs/ucsc.hg19.fasta -dt BY_SAMPLE -dcov 5000 -l INFO --omitDepthOutputAtEachBase --omitLocusTable --minBaseQuality 10 --minMappingQuality 20 --start 1 --stop 5000 --nBins 200 --includeRefNSites --countType COUNT_FRAGMENTS -o /home/MM_Data/cnv_data/group1.DATA

And the error message is given below:

##### ERROR MESSAGE: Badly formed genome location: Contig 'chr1 14362 14829' does not match any contig in the GATK sequence dictionary derived from the reference; are you sure you are using the correct reference fasta file?

I am using the same UCSC hg19 reference FASTA file that I used throughout my pipeline for NGS processing and variant evaluation. I downloaded the exome interval list from here, as I was told to use the exome intervals for the NXTR Rapid Cap Expand EXM kit (Cat. no. FC-140-1005). GATK CountLoci works well with my BAM files and reports the number of loci. I don't know what I missed. Any suggestions?

Thanks.

Answers

  • Nanda, Canada, Member
    edited May 16

    Hi Vivek,

    Did you create the .dict, .sa, .fai, .pac, etc. files?

    Also, check the chromosome names in your ucsc.hg19.fasta file against those in your interval list. I suspect your interval list might have "1" instead of "chr1".
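
The name check suggested above could be sketched in Python roughly as follows. This is only an illustration, not something posted in the thread: the helper names `dict_contigs` and `unknown_interval_contigs` are hypothetical, the sample lines are made up, and the sketch assumes the .dict file carries standard `@SQ` lines with `SN:` tags and that the interval list is tab-separated with the contig name in the first column. Under those assumptions, a line whose fields are separated by spaces instead of tabs shows up as one long unknown "contig", similar to the name quoted in the error message.

```python
def dict_contigs(dict_lines):
    """Collect contig names from the SN: tags of @SQ lines, in file order."""
    names = []
    for line in dict_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def unknown_interval_contigs(interval_lines, known):
    """Return first-column values of interval lines that are not known contigs."""
    known = set(known)
    unknown = []
    for line in interval_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):  # skip blank and header lines
            continue
        contig = line.split("\t")[0]  # assumption: fields are tab-separated
        if contig not in known and contig not in unknown:
            unknown.append(contig)
    return unknown


# Hypothetical sample lines for illustration only (not the poster's files).
sample_dict = [
    "@HD\tVN:1.4\n",
    "@SQ\tSN:chrM\tLN:16571\n",
    "@SQ\tSN:chr1\tLN:249250621\n",
]
sample_intervals = [
    "chr1\t14362\t14829\n",   # tab-separated, contig is known
    "1\t100\t200\n",          # missing the "chr" prefix
    "chr1 14362 14829\n",     # space-separated, read as a single field
]
print(unknown_interval_contigs(sample_intervals, dict_contigs(sample_dict)))
```

To run this on real files, the two functions can be given open file handles instead of the sample lists.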

  • @Nanda : Thanks for the reply. I have all the necessary files (.dict, .sa, .fai, .pac, etc.). I have also checked my reference file: chrM is used in both the reference file and the interval list. The only difference I found is that the interval list I downloaded from the Illumina website starts from chr1, whereas ucsc.hg19.fasta starts from chrM. I think that should not create any problem.

  • Yiming, Member

    Hi Vivek,

    I do not know the solution to your question; however, the location chr1 1436214829 seems to exceed the total length of chromosome 1, which is 249,250,621 bp in hg19.

  • @Yiming : Sorry for the confusion. Actually, it is chr1 14362 14829. It may be unclear (for me too) because the text wraps onto the next line in the question.

  • Yiming, Member

    I see, the right format appears after I resize my browser :D My mistake.
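
For reference, the length check discussed in this exchange amounts to a simple bounds comparison. The sketch below is illustrative only: `fits_on_contig` is a hypothetical helper, and it assumes 1-based inclusive coordinates.

```python
CHR1_LENGTH_HG19 = 249_250_621  # chr1 length in hg19, as quoted in the thread


def fits_on_contig(start, stop, contig_length):
    """True if the 1-based inclusive interval [start, stop] lies on the contig."""
    return 1 <= start <= stop <= contig_length


# The interval actually reported in the error message.
print(fits_on_contig(14362, 14829, CHR1_LENGTH_HG19))            # True
# The value obtained when the two numbers are misread as one.
print(fits_on_contig(1436214829, 1436214829, CHR1_LENGTH_HG19))  # False
```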

  • Sheila, Broad Institute, Member, Broadie, Moderator
    edited May 20

    @vivekruhela
    Hi,

    The only difference I found is that the interval list I downloaded from the Illumina website starts from chr1, whereas ucsc.hg19.fasta starts from chrM. I think that should not create any problem.

    I think that actually is the problem. Have a look at this article. You should be able to use ReorderSam to fix this.

    -Sheila

    EDIT: I realize ReorderSam will not help in this case, as it is your interval list that is not ordered in the same way. I spoke too soon. Can you post the FASTA dict file and the BAM header (I need to see the @SQ lines)? Thanks.
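
A side-by-side comparison of the `@SQ` lines in the two files could be sketched as below. This is an illustration under assumptions rather than an official GATK procedure: the helper names `sq_names` and `compare_contig_order` are hypothetical, the sample headers are made up, and the sketch assumes both headers are available as plain text (for a BAM file, for example, the output of `samtools view -H`).

```python
def sq_names(header_lines):
    """Return contig names in the order their @SQ lines appear."""
    names = []
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        fields = line.rstrip("\n").split("\t")[1:]
        # Split each TAG:value field on the first colon only, so values
        # that contain colons (such as UR:file:/...) stay intact.
        tags = dict(f.split(":", 1) for f in fields if ":" in f)
        if "SN" in tags:
            names.append(tags["SN"])
    return names


def compare_contig_order(first_lines, second_lines):
    """Summarize differences in contig names and ordering between two headers."""
    first, second = sq_names(first_lines), sq_names(second_lines)
    first_set, second_set = set(first), set(second)
    return {
        "only_in_first": [n for n in first if n not in second_set],
        "only_in_second": [n for n in second if n not in first_set],
        "same_order": first == second,  # identical names in identical order
    }


# Hypothetical headers for illustration only (not the attached files).
dict_header = [
    "@HD\tVN:1.4\tSO:unsorted\n",
    "@SQ\tSN:chrM\tLN:16571\tUR:file:/home/refs/ucsc.hg19.fasta\n",
    "@SQ\tSN:chr1\tLN:249250621\n",
]
bam_header = [
    "@SQ\tSN:chr1\tLN:249250621\n",
    "@SQ\tSN:chrM\tLN:16571\n",
]
print(compare_contig_order(dict_header, bam_header))
```

In this made-up example both headers name the same contigs, so the two `only_in_*` lists are empty, while `same_order` is `False` because chrM and chr1 appear in different positions.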

  • @Sheila : Thanks for the reply. I have attached the BAM header and the FASTA dict file, both in .txt format because .dict was not an accepted format for file uploads in this forum. Thanks.
