Badly formed genome location

I am trying to get depth of coverage using the DepthOfCoverage tool of GATK for CNV detection, but I consistently get a "Badly formed genome location" error. Here is the command I am using:

java -Xmx3072m -jar GenomeAnalysisTK.jar -T DepthOfCoverage -nt 10 -I /home/MM_Data/cnv_data/group1.READS.bam.list -L /home/MM_Data/cnv_data/nexterarapidcapture_expandedexome_targetedregions.interval_list -R /home/refs/ucsc.hg19.fasta -dt BY_SAMPLE -dcov 5000 -l INFO --omitDepthOutputAtEachBase --omitLocusTable --minBaseQuality 10 --minMappingQuality 20 --start 1 --stop 5000 --nBins 200 --includeRefNSites --countType COUNT_FRAGMENTS -o /home/MM_Data/cnv_data/group1.DATA

The error message is given below:

##### ERROR MESSAGE: Badly formed genome location: Contig 'chr1 14362 14829' does not match any contig in the GATK sequence dictionary derived from the reference; are you sure you are using the correct reference fasta file?

I am using the same UCSC hg19 reference FASTA file that I used throughout the NGS processing and variant evaluation pipeline. I downloaded the exome interval list from here, as I was told to use the exome intervals for the NXTR Rapid Cap Expand EXM kit (Cat. no. FC-140-1005). GATK CountLoci works well with my BAM files and reports the number of loci. I don't know what I missed. Any suggestions?

Thanks.

Answers

  • Nanda (Canada, Member)
    edited May 16

    Hi Vivek,

    Did you create the .dict, .sa, .fai, .pac, etc. files?

    Also, compare the chromosome names in your ucsc.hg19.fasta file with those in your interval list. I suspect your interval list might have "1" instead of "chr1".
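
The name comparison suggested above can be scripted. The following is only a minimal sketch: the function names and the sample lines are hypothetical, and it assumes the sequence dictionary has `@SQ` lines with an `SN:` field and that each interval line starts with the contig name, followed by whitespace or a colon.

```python
import re

def dict_contigs(dict_lines):
    """Collect contig names from the SN: field of @SQ lines."""
    names = []
    for line in dict_lines:
        if line.startswith("@SQ"):
            match = re.search(r"\bSN:(\S+)", line)
            if match:
                names.append(match.group(1))
    return names

def unknown_contigs(interval_lines, known):
    """Return contig names used in the intervals but absent from `known`."""
    known = set(known)
    missing = []
    for line in interval_lines:
        line = line.strip()
        # Skip blank lines and header/comment lines.
        if not line or line.startswith(("@", "#", "track", "browser")):
            continue
        # The contig is everything before the first whitespace or colon.
        contig = re.split(r"[\s:]", line, maxsplit=1)[0]
        if contig not in known and contig not in missing:
            missing.append(contig)
    return missing

# Hypothetical sample data for illustration only.
dict_lines = [
    "@HD\tVN:1.0\tSO:unsorted",
    "@SQ\tSN:chrM\tLN:16571",
    "@SQ\tSN:chr1\tLN:249250621",
]
interval_lines = [
    "chr1\t14362\t14829\tWASH5P-chr1-14363-14829",
    "1\t69090\t70008\tOR4F5",
]
print(unknown_contigs(interval_lines, dict_contigs(dict_lines)))
```

An empty result means every contig name in the interval file also appears in the dictionary, in which case the naming is not the cause of the error.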

  • @Nanda : Thanks for the reply. I have all the necessary files (.dict, .sa, .fai, .pac, etc.). I have also checked my reference file. The "chr" prefix (e.g. chrM) is used in both the reference file and the interval list. The only difference I found is that the interval list I downloaded from the Illumina website starts at chr1, whereas ucsc.hg19.fasta starts at chrM. I think that should not create any problem.

  • Yiming (Member)

    Hi Vivek,

    I do not know the solution to your question; however, the location chr1 1436214829 seems to exceed the total length of chromosome 1, which is 249,250,621 bp in hg19.

  • @Yiming : Sorry for the confusion. It is actually chr1 14362 14829. It may have been unclear (for me too) because the text wraps onto the next line in the question.

  • Yiming (Member)

    I see, the right format appears after I resize my browser :D My mistake.

  • Sheila (Broad Institute; Member, Broadie, Moderator)
    edited May 20

    @vivekruhela
    Hi,

    "The only difference I found is that the interval list I downloaded from the Illumina website starts at chr1, whereas ucsc.hg19.fasta starts at chrM. I think that should not create any problem."

    I think that actually is the problem. Have a look at this article. You should be able to use ReorderSam to fix this.

    -Sheila

    EDIT: I realize ReorderSam will not help in this case, as it is your interval list that is not ordered in the same way. I spoke too soon. Can you post the FASTA dict file and the BAM header (I need to see the @SQ lines)? Thanks.

  • @Sheila : Thanks for the reply. I have attached the BAM header and the FASTA dict file, both in .txt format because .dict is not an accepted format for file uploads in this forum. Thanks.

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @vivekruhela
    Hi,

    I am not sure what is going on. Can you confirm the tool runs without -nt or -L?

    Thanks,
    Sheila

  • vivekruhela (Member)
    edited May 25

    @Sheila : No, I haven't tried this without -nt 10 and -L (because that makes it too slow and I need to evaluate depth at specific intervals). Let me try this and I will update you. Thanks.

  • @Sheila : Sorry for the late response. I was busy with other things. I tried without -nt and -L. The tool runs, but the output is not suitable for my pipeline. I am trying to detect CNVs using the XHMM tool (because I don't have control data), and the later stages of XHMM have problems with that output. So I can drop -nt, but I have to use the -L argument to get interval-specific coverage depth. If I use -L, the error is the same as in my question. Any suggestions? Thanks.

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @vivekruhela
    Hi,

    Okay, so it is an issue with your interval file. Can you confirm other GATK tools run to completion with the same interval list? Can you also post the first few lines of the interval list and the FASTA .dict file?

    Thanks,
    Sheila

  • @Sheila : I haven't tried this interval list with other GATK tools. This is the first time I am using it, for CNV detection. According to Illumina, they provide three interval lists: nexterarapidcapture_expandedexome_targetedregions.bed, nexterarapidcapture_exome_vs_expandedexome_overlap.bed, and nexterarapidcapture_expandedexome_uniqueintervals.bed.
    The first few lines of targetedregions.bed are as follows:
    chr1 14362 14829 WASH5P-chr1-14363-14829
    chr1 14969 15038 WASH5P-chr1-14970-15038
    chr1 15795 15947 WASH5P-chr1-15796-15947
    chr1 16606 16765 WASH5P-chr1-16607-16765
    chr1 16857 17055 WASH5P-chr1-16858-17055
    chr1 17232 17368 WASH5P-chr1-17233-17368
    chr1 17605 17742 WASH5P-chr1-17606-17742
    chr1 69090 70008 OR4F5-chr1-69091-70008
    chr1 661139 665184 LOC100133331-chr1-661140-665184

    The first few lines of overlap.bed are as follows:
    chr1 69091 70008 CEX-chr1-69089-70010
    chr1 664484 665108 CEX-chr1-664485-665108
    chr1 762079 762571 CEX-chr1-762080-762571
    chr1 861319 861393 CEX-chr1-861320-861395
    chr1 865535 865716 CEX-chr1-865533-865718
    chr1 866419 866469 CEX-chr1-866417-866471

    The first few lines of uniqueintervals.bed are as follows:
    chr1 14363 14829 WASH5P-chr1-14363-14829
    chr1 14970 15038 WASH5P-chr1-14970-15038
    chr1 15796 15947 WASH5P-chr1-15796-15947
    chr1 16607 16765 WASH5P-chr1-16607-16765
    chr1 16858 17055 WASH5P-chr1-16858-17055
    chr1 17233 17368 WASH5P-chr1-17233-17368
    chr1 17606 17742 WASH5P-chr1-17606-17742

    And, finally, the first few lines of ucsc.hg19.dict are as follows:
    @HD VN:1.0 SO:unsorted
    @SQ SN:chrM LN:16571 M5:d2ed829b8a1628d16cbeee88e88e39eb UR:file:///mnt/storage/Vivek/refs/ucsc.hg19.fasta
    @SQ SN:chr1 LN:249250621 M5:1b22b98cdeb4a9304cb5d48026a85128 UR:file:///mnt/storage/Vivek/refs/ucsc.hg19.fasta
    @SQ SN:chr2 LN:243199373 M5:a0d9851da00400dec1098a9255ac712e UR:file:///mnt/storage/Vivek/refs/ucsc.hg19.fasta
    @SQ SN:chr3 LN:198022430 M5:641e4338fa8d52a5b781bd2a2c08d3c3 UR:file:///mnt/storage/Vivek/refs/ucsc.hg19.fasta
    @SQ SN:chr4 LN:191154276 M5:23dccd106897542ad87d2765d28a19a1 UR:file:///mnt/storage/Vivek/refs/ucsc.hg19.fasta
    @SQ SN:chr5 LN:180915260 M5:0740173db9ffd264d728f32784845cd7 UR:file:///mnt/storage/Vivek/refs/ucsc.hg19.fasta

    Thanks.

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @vivekruhela
    Hi,

    I am not sure if this will help, but can you try inputting just the first interval from the bed file? You can make a new .bed file with only the first interval and remove the last column, which holds the name of the interval. According to this article, each line of the bed file should contain only the chr, start and stop fields, separated by tabs.

    -Sheila

  • @Sheila

    I tried that too. I created a separate interval list with only the chromosome name, start and stop. That didn't work; I got the same error.
    Thanks.

  • @Sheila

    Thanks for the reply. It finally works after changing the format from chrX <start> <end> to chrX:<start>-<end>. But why does this happen? The same file type has two different formats: Illumina uses the first format and GATK expects the second, although both are .bed files.

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @vivekruhela
    Hi,

    I thought all GATK tools took bed files, but sorry I did not catch this earlier. Some tools must only accept chr:pos style files. Hopefully you can easily convert the other interval file.

    -Sheila
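
The conversion that resolved the error can be sketched in a few lines of Python. This is a minimal illustration, not a confirmed GATK recipe: the function name is hypothetical, and it assumes whitespace-separated BED input. It also assumes the usual coordinate conventions, where BED starts are 0-based and chr:start-end intervals are 1-based and inclusive, so the start is shifted by one. The name column of the Illumina file agrees with this (the line `chr1 14362 14829` is labelled `WASH5P-chr1-14363-14829`). A plain reformat that keeps the BED start unchanged would make every interval one base too long at its start.

```python
def bed_to_gatk_intervals(bed_lines):
    """Convert BED records to chr:start-end strings (1-based, inclusive)."""
    intervals = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        contig, start, end = fields[0], int(fields[1]), int(fields[2])
        # BED start is 0-based; the end is already the last base in 1-based terms.
        intervals.append(f"{contig}:{start + 1}-{end}")
    return intervals

# First two records of targetedregions.bed as posted in this thread.
bed = [
    "chr1\t14362\t14829\tWASH5P-chr1-14363-14829",
    "chr1\t14969\t15038\tWASH5P-chr1-14970-15038",
]
for interval in bed_to_gatk_intervals(bed):
    print(interval)
```

A possible explanation for the original error, not confirmed in this thread, is that GATK 3 chooses its interval parser from the file extension. The command passes the BED content under the name `.interval_list`, so each whole line may have been read as a single interval string, which would explain why the contig was reported as 'chr1 14362 14829'. Under that assumption, passing the unmodified file with its original `.bed` extension might also work.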
