Bug in GATK exome pipeline - VerifyBamID step

Hi, in the CheckContamination task of the gatk-workflows exome pipeline, VerifyBamID sometimes fails with the error below, perhaps because our exome inputs are a bit smaller than what the Broad usually produces. VerifyBamID stops after the warning without writing the .selfSM file, so the task then fails on the missing required output.
NOTICE - Process chr22:50683836-50683836...
NOTICE - Process chr22:50694325-50694325...
NOTICE - Process chr22:50745507-50745507...
NOTICE - Process chr22:50774185-50774185...
NOTICE - Number of marker in Reference Matrix:99976
NOTICE - Number of marker shared with input file:9565
NOTICE - Mean Depth:26.386618
NOTICE - SD Depth:43.449975
NOTICE - 9364 SNP markers remained after sanity check.

WARNING -
Insufficient Available markers, check input bam depth distribution in output pileup file after specifying --OutputPileup
2019/08/31 09:54:55 Starting delocalization.
2019/08/31 09:54:57 Delocalizing output /cromwell_root/UDP-1103.exome.preBqsr.selfSM -> gs://fc-secure-da2df7c1-77ef-4ed0-95ab-20cbed757a2a/95aed782-1446-4314-bb03-e9388219d197/ExomeGermlineSingleSample/b5e52cca-926e-44c1-8ddc-d55571b087ab/call-UnmappedBamToAlignedBam/UnmappedBamToAlignedBam/b6b3104e-8119-4636-8f04-7c601578ddf0/call-CheckContamination/UDP-1103.exome.preBqsr.selfSM
Required file output '/cromwell_root/UDP-1103.exome.preBqsr.selfSM' does not exist.

Answers

  • bshifaw (Member, Broadie, Moderator, admin)

    @GER Thanks for reporting this.

    Here is what I found out from my team:

    “It [VerifyBamID] reports warning if the number of polymorphic markers are less than 1,000 or less than 10% of provided marker. I guess you provided genome-wide marker data but the sequence data is exome (is that right?). It is better to create exome-only VCF as your reference for more accurate inference.”

    The marker data in question are the pipeline inputs contamination_sites_ud, contamination_sites_mu, and contamination_sites_bed.
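
As a quick sanity check, the quoted rule can be applied to the counts in the log from the question. This is only a sketch: the 1,000-marker and 10% thresholds are taken from the quote above and have not been verified against the VerifyBamID source, and the function name is made up for illustration.

```python
# Counts copied from the VerifyBamID log in the question.
markers_in_reference = 99976        # "Number of marker in Reference Matrix"
markers_after_sanity_check = 9364   # "SNP markers remained after sanity check"

# Thresholds as described in the quoted reply (assumed, not taken from the source code).
MIN_MARKERS = 1000
MIN_FRACTION = 0.10


def has_insufficient_markers(remaining, provided,
                             min_markers=MIN_MARKERS,
                             min_fraction=MIN_FRACTION):
    """Return True if the remaining markers fall below either quoted threshold."""
    return remaining < min_markers or remaining < min_fraction * provided


fraction = markers_after_sanity_check / markers_in_reference
insufficient = has_insufficient_markers(markers_after_sanity_check,
                                        markers_in_reference)
print(f"{fraction:.2%} of provided markers remain; insufficient={insufficient}")
```

Under these assumptions the run is well above 1,000 markers but just under 10% of the roughly 100,000 provided, which is consistent with the genome-wide marker set being the cause.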

    They also suggested a possible workaround: use bedtools intersect to remove the markers that fall outside the exome targets, then rerun the pipeline to see whether it works.
    I'll inform the pipeline developers of the issue.
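
To illustrate what the suggested intersection does, here is a minimal pure-Python sketch of the same idea: keep only the marker rows that overlap an exome target interval. It is not a replacement for bedtools. It assumes both inputs are BED-like (chromosome, 0-based start, exclusive end in the first three columns), and all function names are hypothetical. The returned row indices may be useful for subsetting the companion UD and mu files, but only if those files list markers in the same order as the BED file, which should be confirmed before relying on it.

```python
import bisect
from collections import defaultdict


def load_merged_intervals(bed_lines):
    """Parse BED lines into sorted, merged (start, end) intervals per chromosome."""
    by_chrom = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:  # overlapping or adjacent: extend the last interval
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def overlaps_target(merged, chrom, start, end):
    """Return True if [start, end) overlaps any merged interval on chrom."""
    intervals = merged.get(chrom, [])
    starts = [s for s, _ in intervals]
    # Intervals before index i begin before `end`; only the last of them can overlap.
    i = bisect.bisect_left(starts, end)
    return i > 0 and intervals[i - 1][1] > start


def filter_markers(marker_lines, merged):
    """Return (row_index, line) pairs for markers that overlap a target interval."""
    kept = []
    for index, line in enumerate(marker_lines):
        chrom, start, end = line.split()[:3]
        if overlaps_target(merged, chrom, int(start), int(end)):
            kept.append((index, line))
    return kept


# Tiny made-up example: two overlapping chr22 targets and one chr1 target.
exome_targets = ["chr22\t100\t200", "chr22\t150\t300", "chr1\t10\t20"]
markers = [
    "chr22\t99\t100\tA\tG",   # just before the target
    "chr22\t100\t101\tC\tT",  # first base of the target
    "chr22\t299\t300\tG\tA",  # last base of the merged target
    "chr22\t300\t301\tT\tC",  # just after the target
    "chr1\t15\t16\tA\tC",     # inside the chr1 target
    "chr2\t15\t16\tA\tC",     # chromosome with no targets
]
kept = filter_markers(markers, load_merged_intervals(exome_targets))
kept_indices = [index for index, _ in kept]
print(kept_indices)
```

In practice, something like `bedtools intersect -u -a markers.bed -b exome_targets.bed` (file names are placeholders) should give the same filtered marker list for the BED file.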
