BQSR non-overlapping contigs error

Hi,

I am trying to run BQSR and I'm getting the "non-overlapping contigs" error, which seems to come from a mismatch between the reference and the known-sites VCF. My command is below. What I'm trying to figure out is how to tell which files match each other. I tried the gnomAD exome VCF, which I understand to be hg19, but it apparently uses a different format. My reference genome is from UCSC. I have also downloaded dbSNP 151, but I expect it will be in NCBI format, so I'm not sure it will help. Can you offer some guidance on which files match and how to tell?

Also, on the documentation side, it would be helpful if the tool documentation mentioned that the known-sites VCF needs to be indexed.
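Since the known-sites VCF must be indexed, a quick check before running BaseRecalibrator can save a failed run. This is a minimal sketch, assuming GATK looks for a tabix index (`.tbi`) next to a bgzip-compressed VCF and a Tribble index (`.idx`) next to a plain-text VCF; `has_feature_index` is a hypothetical helper name and the file name in the comment is a placeholder. GATK's IndexFeatureFile tool can also create these indexes.

```shell
#!/usr/bin/env bash
# Sketch: report whether a VCF has the kind of index GATK expects next to it.
# Assumption: .tbi for bgzip-compressed VCFs (.gz/.bgz), .idx for plain VCFs.
has_feature_index() {
  case $1 in
    *.gz|*.bgz) [ -e "$1.tbi" ] ;;   # tabix index for block-compressed files
    *)          [ -e "$1.idx" ] ;;   # Tribble index for uncompressed VCF
  esac
}

# Example usage (placeholder file name), indexing with htslib's tabix if needed:
#   has_feature_index known_sites.vcf.bgz || tabix -p vcf known_sites.vcf.bgz
```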

gatk BaseRecalibrator \
-I $SCRATCH/active/memtest2/SRR112728.marked_duplicates.sorted.bam \
-R $WORK/hg19.fa \
--known-sites $WORK/gnomad.exomes.r2.1.1.sites.vcf.bgz \
-O $SCRATCH/active/memtest2/SRR112728_recal_data.table

Thanks very much,
Nancy

Answers

  • SChaluvadi (Member, Broadie, Moderator admin)

    Hello @nparmalee: if you are using the hg19 reference genome, that build uses the "chr1" ("chrM") notation to label chromosomes. Make sure that the BAM file was also aligned to the same hg19 reference, and that the .vcf.bgz file uses the same chromosome naming convention. This document lists the areas to double-check when you see an incompatible contigs error. In general, your BAM file, your reference, and your VCF should all match.

    If you can share a small snippet of your reference, your BAM, and your VCF showing how each one labels its chromosomes, we may be able to find the mismatched file.
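To see which inputs disagree, one approach is to list the contig names each file uses and compare them. This is a sketch that assumes samtools and htslib's tabix are installed; `bam_sq_names` and `shared_contigs` are hypothetical helper names, and the file names in the comments are placeholders. An empty result from `shared_contigs` corresponds to the "No overlapping contigs found" condition in GATK's error.

```shell
#!/usr/bin/env bash
# Sketch: compare contig names across the reference, BAM and known-sites VCF.

# Read a SAM/BAM header on stdin and print the SN: value of every @SQ line.
bam_sq_names() {
  awk -F'\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }'
}

# Print the names present in both files (one name per line in each file).
shared_contigs() {
  grep -Fxf "$1" "$2" | sort -u
}

# Example usage (placeholder file names):
#   cut -f1 hg19.fa.fai > ref_contigs.txt                   # .fai column 1 = contig name
#   samtools view -H sample.bam | bam_sq_names > bam_contigs.txt
#   tabix -l known_sites.vcf.bgz > vcf_contigs.txt          # names in the tabix index
#   shared_contigs ref_contigs.txt vcf_contigs.txt          # empty => no overlap
```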

  • nparmalee (Member)

    Hi SChaluvadi, thanks for your reply. I can see that I have a mismatch: the hg19 reference uses the chr10 notation, while the .vcf.bgz simply uses 10 for the chromosome. Which files do you recommend for known sites? Since these two don't match, can you give some guidance on where to find a file that does? What does Broad use for known sites?

    Thanks,
    Nancy

  • nparmalee (Member)

    This is the error message that seems to encapsulate the problem:

    A USER ERROR has occurred: Input files reference and features have incompatible contigs: No overlapping contigs found.
    reference contigs = [chr10, chr11, chr11_gl000202_random, chr12, chr13, chr14, chr15, chr16, chr17_ctg5_hap1, chr17, chr17_gl000203_random, chr17_gl000204_random, chr17_gl000205_random, chr17_gl000206_random, chr18, chr18_gl000207_random, chr19, chr19_gl000208_random, chr19_gl000209_random, chr1, chr1_gl000191_random, chr1_gl000192_random, chr20, chr21, chr21_gl000210_random, chr22, chr2, chr3, chr4_ctg9_hap1, chr4, chr4_gl000193_random, chr4_gl000194_random, chr5, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7, chr7_gl000195_random, chr8, chr8_gl000196_random, chr8_gl000197_random, chr9, chr9_gl000198_random, chr9_gl000199_random, chr9_gl000200_random, chr9_gl000201_random, chrM, chrUn_gl000211, chrUn_gl000212, chrUn_gl000213, chrUn_gl000214, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000221, chrUn_gl000222, chrUn_gl000223, chrUn_gl000224, chrUn_gl000225, chrUn_gl000226, chrUn_gl000227, chrUn_gl000228, chrUn_gl000229, chrUn_gl000230, chrUn_gl000231, chrUn_gl000232, chrUn_gl000233, chrUn_gl000234, chrUn_gl000235, chrUn_gl000236, chrUn_gl000237, chrUn_gl000238, chrUn_gl000239, chrUn_gl000240, chrUn_gl000241, chrUn_gl000242, chrUn_gl000243, chrUn_gl000244, chrUn_gl000245, chrUn_gl000246, chrUn_gl000247, chrUn_gl000248, chrUn_gl000249, chrX, chrY]
    features contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y]
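The two lists in this error differ only by the "chr" prefix on the primary chromosomes, so a known-sites VCF in the 1, 2, ..., X, Y style can in principle be renamed to match a UCSC hg19 reference. A minimal sketch, assuming bcftools and tabix are installed; `make_chr_map` is a hypothetical helper and the file names are placeholders. The map deliberately omits MT, because the hg19 chrM and GRCh37 MT sequences differ, while the nuclear chromosomes are the same sequence in both builds. Using a known-sites resource built for the same reference, as recommended in this thread, avoids this extra step.

```shell
#!/usr/bin/env bash
# Sketch: rename GRCh37-style contigs (1..22, X, Y) to UCSC style (chr1..chr22, chrX, chrY).

# Print one "old new" pair per line, the format bcftools annotate --rename-chrs reads.
# MT is intentionally left out (hg19 chrM is a different sequence from GRCh37 MT).
make_chr_map() {
  for c in $(seq 1 22) X Y; do
    printf '%s chr%s\n' "$c" "$c"
  done
}

# Example usage (placeholder file names):
#   make_chr_map > chr_map.txt
#   bcftools annotate --rename-chrs chr_map.txt -Oz -o known_sites.chr.vcf.gz known_sites.vcf.bgz
#   tabix -p vcf known_sites.chr.vcf.gz
```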

  • SChaluvadi (Member, Broadie, Moderator admin)

    @nparmalee I am looking into the known sites file for you and will reply soon!

  • nparmalee (Member)

    Thank you! That will be very helpful!

  • SChaluvadi (Member, Broadie, Moderator admin)

    @nparmalee
    Here is a screenshot of a table listing the known-sites resources we recommend:

    Here is a link to the resource bundle: https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0

    Please let us know if this does not help.

  • hdhiman (Member)

    Hi,

    I am trying to run a second pass of BaseRecalibrator with version 4.1.1.0, but the -BQSR option, which I used to supply the data table created in the first round of recalibration, doesn't seem to work.
    I also tried -bqsr and --BQSR, which were options in some previous versions, but those didn't help either. I looked for the correct way to do this but couldn't find any reference in the current documentation.

    "A USER ERROR has occurred: b is not a recognized option"

    Could you please let me know whether there is a separate tool for a second (repeat) recalibration round, or which option supplies the data table (.grp file) from the first round?

    Thanks!!
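GATK4's BaseRecalibrator does not appear to accept a recalibration table as input (consistent with the "not a recognized option" error). A common way to get a before/after comparison is to apply the first table with ApplyBQSR, rerun BaseRecalibrator on the recalibrated BAM, and compare the two tables with AnalyzeCovariates. A minimal sketch of that sequence, assuming the GATK 4.1 argument names shown (`--bqsr-recal-file`, `-before`, `-after`, `-plots`); `bqsr_second_pass` is a hypothetical wrapper and its output file-name scheme is an assumption.

```shell
#!/usr/bin/env bash
# Sketch of a GATK4 before/after BQSR comparison.
# Usage: bqsr_second_pass REF BAM KNOWN_SITES PREFIX
# Set GATK=echo to print the commands instead of running them.
bqsr_second_pass() {
  local ref=$1 bam=$2 known=$3 prefix=$4
  local gatk=${GATK:-gatk}
  # Pass 1: build the recalibration model from the original BAM.
  "$gatk" BaseRecalibrator -R "$ref" -I "$bam" --known-sites "$known" \
    -O "${prefix}.recal1.table" || return
  # Apply the first-pass table; ApplyBQSR takes it via --bqsr-recal-file.
  "$gatk" ApplyBQSR -R "$ref" -I "$bam" --bqsr-recal-file "${prefix}.recal1.table" \
    -O "${prefix}.recal.bam" || return
  # Pass 2: rerun BaseRecalibrator on the recalibrated BAM.
  "$gatk" BaseRecalibrator -R "$ref" -I "${prefix}.recal.bam" --known-sites "$known" \
    -O "${prefix}.recal2.table" || return
  # Compare the two tables and write plots.
  "$gatk" AnalyzeCovariates -before "${prefix}.recal1.table" \
    -after "${prefix}.recal2.table" -plots "${prefix}.AnalyzeCovariates.pdf"
}
```

For example, `GATK=echo bqsr_second_pass hg19.fa sample.bam known_sites.vcf.gz sample` (placeholder paths) prints the four commands for review. The `${GATK:-gatk}` default also lets you point at a specific gatk launcher. The plotting step in AnalyzeCovariates typically relies on an R installation being available.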

  • nparmalee (Member)

    Thank you, @SChaluvadi! I really appreciate your help!

    Nancy
