Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

FilterMutectCalls fails on some samples using gatk-4.1.2.0

ivan108ivan108 SFMember

java version "1.8.0_45"
Getting an error on some samples:

Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/jocostello/shared/LG3_Pipeline_HIDE/tools/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar FilterMutectCalls --intervals /home/jocostello/repositories/UCSF-Costello-Lab/LG3_Pipeline/resources/SeqCap_EZ_Exome_v3_capture.interval_list --interval-padding 0 --contamination-table Z00600t10-contamination.table --tumor-segmentation Z00600t10-segments.table --orientation-bias-artifact-priors Z00600t10-artifact-prior-table.tar.gz --verbosity ERROR --variant NOR-Z00599t10__TUM-Z00600t10.m2.obmm.vcf.gz --output NOR-Z00599t10__TUM-Z00600t10.m2.obmm.cc.vcf.gz --stats Z00600t10-M2FilteringStats.tsv --contamination-estimate 0 --reference /home/jocostello/repositories/UCSF-Costello-Lab/LG3_Pipeline/resources/UCSC_hg19/hg19.fa
[May 10, 2019 8:07:44 PM PDT] org.broadinstitute.hellbender.tools.walkers.mutect.filtering.FilterMutectCalls done. Elapsed time: 0.07 minutes.
Runtime.totalMemory()=2468872192
java.lang.IllegalArgumentException: log10p: Log10-probability must be 0 or less
at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:724)
at org.broadinstitute.hellbender.utils.MathUtils.log10BinomialProbability(MathUtils.java:917)
at org.broadinstitute.hellbender.utils.MathUtils.binomialProbability(MathUtils.java:910)
at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.ContaminationFilter.calculateErrorProbability(ContaminationFilter.java:56)
at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.Mutect2VariantFilter.errorProbability(Mutect2VariantFilter.java:15)
at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.ErrorProbabilities.lambda$new$1(ErrorProbabilities.java:19)
at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.ErrorProbabilities$$Lambda$131/23218037.apply(Unknown Source)
[...]

Switching back to gatk-4.1.1.0 solves the problem, but it would be nice to be able to use latest version...

Thanks!
Ivan

Answers

  • davidbendavidben BostonMember, Broadie, Dev ✭✭✭

    @ivan108 Could you paste the contents of your contamination table input: Z00600t10-contamination.table?

  • ivan108ivan108 SFMember

    Here it is:

    sample contamination error
    Z00600t10 NaN 1.0

  • davidbendavidben BostonMember, Broadie, Dev ✭✭✭

    @ivan108 The problem is that your contamination estimate is a NaN. Could you share the GATK version and command lines you are using for GetPileupSummaries and CalculateContamination? Also, is this an exome?

  • ivan108ivan108 SFMember

    Yes, this is exome.

    Here is GetPileupSummaries for Normal:

    Using GATK jar /home/jocostello/shared/LG3_Pipeline_HIDE/tools/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx160g -jar /home/jocostello/shared/LG3_Pipeline_HIDE/tools/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar GetPileupSummaries -V /home/GenomeData/gnomAD_hg19/mutect2/mutect2-contamination-var.biall.vcf.gz -L /home/GenomeData/gnomAD_hg19/mutect2/mutect2-contamination-var.biall.vcf.gz --verbosity ERROR -I /costellolab/data1/jocostello/LG3_TEST/exomes_recal/Patient157t10/Z00599t10.bwa.realigned.rmDups.recal.bam --interval-set-rule INTERSECTION -O Z00600t10-normal_pileups.table
    [May 23, 2019 10:59:36 AM PDT] org.broadinstitute.hellbender.tools.walkers.contamination.GetPileupSummaries done. Elapsed time: 0.39 minutes.
    Runtime.totalMemory()=4842848256
    Tool returned:
    SUCCESS

    Same for Tumor:

    Using GATK jar /home/jocostello/shared/LG3_Pipeline_HIDE/tools/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx160g -jar /home/jocostello/shared/LG3_Pipeline_HIDE/tools/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar GetPileupSummaries -V /home/GenomeData/gnomAD_hg19/mutect2/mutect2-contamination-var.biall.vcf.gz -L /home/GenomeData/gnomAD_hg19/mutect2/mutect2-contamination-var.biall.vcf.gz --verbosity ERROR -R /home/jocostello/repositories/UCSF-Costello-Lab/LG3_Pipeline/resources/UCSC_hg19/hg19.fa -I /costellolab/data1/jocostello/LG3_TEST/exomes_recal/Patient157t10/Z00600t10.bwa.realigned.rmDups.recal.bam --interval-set-rule INTERSECTION -O Z00600t10-pileups.table
    [May 23, 2019 11:00:03 AM PDT] org.broadinstitute.hellbender.tools.walkers.contamination.GetPileupSummaries done. Elapsed time: 0.39 minutes.
    Runtime.totalMemory()=5135400960
    Tool returned:
    SUCCESS

    CalculateContamination:

    Using GATK jar /home/jocostello/shared/LG3_Pipeline_HIDE/tools/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx160g -jar /home/jocostello/shared/LG3_Pipeline_HIDE/tools/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar CalculateContamination --verbosity ERROR -I Z00600t10-pileups.table -O Z00600t10-contamination.table --tumor-segmentation Z00600t10-segments.table --matched-normal Z00600t10-normal_pileups.table
    [May 23, 2019 11:00:15 AM PDT] org.broadinstitute.hellbender.tools.walkers.contamination.CalculateContamination done. Elapsed time: 0.14 minutes.
    Runtime.totalMemory()=2363490304
    Tool returned:
    SUCCESS

    Thanks!

  • davidbendavidben BostonMember, Broadie, Dev ✭✭✭

    It's very suspicious that GetPileupSummaries is taking less than a minute. That means that it is only inputting a very small number of reads for some reason. What is the origin of the file mutect2-contamination-var.biall.vcf.gz? Is it the small_exac_common.vcf from the GATK best practices bucket or something else? If it's something else, how was it generated?

  • yyjyyj Member
    @davidben
    I also meet the same problem.this is exome,I use gatk-4.1.2.0
    This is the command lines:
    for sample in *.bam;
    do base=${sample%%.*};
    gatk4 GetPileupSummaries
    -I $sample
    -V ../../reference/somatic-hg38_af-only-gnomad.hg38.SNP_biallelic.vcf.gz
    -L ../../reference/S07604514_hs_hg38/S07604514_hs_hg38/S07604514_Padded.bed
    -O ./contamination/"$base"_getpileupsummaies.table;done
    The first 30 lines of getpileupsummaies.table:
    contig position ref_count alt_count other_alt_count allele_frequency
    chr1 12882 0 0 0 0.021
    chr1 13110 0 0 0 0.149
    chr1 13143 0 0 0 0.05
    chr1 13149 0 0 0 0.011
    chr1 13178 0 0 0 0.061
    chr1 13273 0 0 0 0.115
    chr1 13281 0 0 0 0.04
    chr1 13418 0 0 0 0.183
    chr1 13613 0 0 0 0.02
    chr1 13621 0 0 0 0.011
    chr1 13649 0 0 0 0.054
    chr1 13684 0 0 0 0.027
    chr1 13752 0 0 0 0.018
    chr1 13757 0 0 0 0.021
    chr1 14522 0 0 0 0.05
    chr1 14542 0 0 0 0.072
    chr1 14574 0 0 0 0.101
    chr1 14590 0 0 0 0.097
    chr1 14599 0 0 0 0.123
    chr1 14604 0 0 0 0.125
    chr1 14610 0 0 0 0.128
    chr1 14626 0 0 0 0.011
    chr1 14671 0 0 0 0.011
    chr1 14677 0 0 0 0.058
    chr1 14773 0 0 0 0.016
    chr1 14843 0 0 0 0.018
    chr1 14933 0 0 0 0.158
    chr1 14948 0 0 0 0.055
  • davidbendavidben BostonMember, Broadie, Dev ✭✭✭

    @yyj The best practices are to use this file: gs://gatk-best-practices/somatic-hg38/small_exac_common_3.hg38.vcf.gz, not gnomAD, for the contamination pipeline, and the -L argument should in most cases be the same as the -V argument. That being said I think thi sissue might be something else. Does your pileup summaries table have zeroes all the way down?

  • ivan108ivan108 SFMember

    For testing purposes I am using a small downsampled dataset with only ~700,000 read pairs and most of the reads are from just two chromosomes to make it run faster..

    The mutect2-contamination-var.biall.vcf.gz file was produced from https://storage.googleapis.com/gnomad-public/release/2.1/vcf/exomes/gnomad.exomes.r2.1.sites.vcf.bgz
    by applying a bash script, which is an implementation of Broad script:
    https://github.com/broadinstitute/gatk/blob/master/scripts/mutect2_wdl/mutect_resources.wdl

    Where can I get small_exac_common.vcf? I don't have access to the cloud..

    Thanks!

  • yyjyyj Member
    Hi @ davidben
    My pileup summaries table don't have zeroes all the way down,many like this:
    chr1 8330981 36 0 0 0.019
    chr1 8335500 139 0 0 0.144
    chr1 8339614 104 0 0 0.015
    chr1 8344063 32 0 0 0.041
    chr1 8355230 38 0 0 0.186
    chr1 8355622 27 0 0 0.034
    chr1 8355740 8 0 0 0.016
    chr1 8358159 22 0 0 0.0
    chr1 8358650 149 0

    I use small_exac_common_3.hg38.vcf.gz as the input of -L and -V,six sample only one get getpilupsummaries.table file,the other 5 sample all show that mang reads filtered by : (((((((MappingQualityAvailableReadFilter AND MappingQualityNotZeroReadFilter) AND MappedReadFilter) AND PrimaryLineReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter) AND NonZeroReferenceLengthAlignmentReadFilter) AND MateOnSameContigOrNoMappedMateReadFilter)


    The first 50 lines of the only one getpileupsummaies.table:
    #<METADATA>SAMPLE=s0020487998-N
    contig position ref_count alt_count other_alt_count allele_frequency
    chr1 17365 6 0 0 0.136
    chr1 17385 7 1 0 0.122
    chr1 69761 0 0 0 0.113
    chr1 187887 0 0 0 0.136
    chr1 187907 0 0 0 0.122
    chr1 494515 87 0 0 0.065
    chr1 786377 0 0 0 0.051
    chr1 803762 3 0 0 0.073
    chr1 962358 656 0 0 0.08
    chr1 963087 317 0 0 0.052
    chr1 972242 379 0 0 0.053
    chr1 973034 106 0 0 0.058
    chr1 973862 75 0 0 0.059
    chr1 974039 35 49 1 0.147
    chr1 976536 186 0 0 0.155
    chr1 981210 67 0 0 0.1
    chr1 982112 58 0 0 0.145
    chr1 1047464 400 0 0 0.057
    chr1 1055000 33 0 0 0.195
    chr1 1175611 127 0 0 0.076
    chr1 1176209 50 0 0 0.065
    chr1 1179288 110 0 0 0.069
    chr1 1180614 250 0 0 0.141
    chr1 1180851 133 0 0 0.111
    chr1 1182895 219 0 0 0.104
    chr1 1184927 141 0 0 0.159
    chr1 1197405 9 22 0 0.139
    chr1 1197558 51 55 0 0.143
    chr1 1197586 45 47 0 0.131
    chr1 1197874 22 13 0 0.1
    chr1 1197893 0 32 0 0.152
    chr1 1203822 173 0 0 0.084
    chr1 1211917 36 35 0 0.059
    chr1 1217733 126 162 2 0.113
    chr1 1228424 134 157 1 0.147
    chr1 1242865 261 0 1 0.123
    chr1 1243102 111 0 1 0.091
    chr1 1243545 167 0 0 0.061
    chr1 1243929 106 0 2 0.095
    chr1 1281201 47 0 0 0.064
    chr1 1281631 141 0 0 0.085
    chr1 1281640 137 0 0 0.068
    chr1 1285574 0 124 0 0.199
    chr1 1288005 42 55 0 0.095
    chr1 1290722 0 158 0 0.13
    chr1 1290912 61 57 0 0.082
    chr1 1291377 104 0 0 0.079

    This sample's contamination.table alse as:
    contamination error
    NaN 1.0

    Thanks!
  • davidbendavidben BostonMember, Broadie, Dev ✭✭✭

    @yyj Thanks for thinking to mention the read filters! I wonder if the MateOnSameContig filter, which we disabled in Mutect2 itself, is the culprit here. It's purely an oversight that GetPileupSummaries still has this filter. Then again, we do test on hg38 and it's odd that we haven't seen this before. Although perhaps the issue was present before and it's just that recent changes to FilterMutectCalls are causing an error to be thrown (which is what we want -- if GetPileupSummaries is giving bad output we don't want it to pass silently). Anyway, this gives me a lead to work with. I will let you know if I need more.

  • davidbendavidben BostonMember, Broadie, Dev ✭✭✭

    Where can I get small_exac_common.vcf? I don't have access to the cloud

    @ivan108 To access our resource files in google buckets you just need to install gsutil and then run the command gsutil cp gs://___bucket path___ <local path> to download files. I myself don't know how to "use the cloud" (ie spin up a VM, run code on the VM, download results -- never done it!) but I find gsutil cp doable.

  • davidbendavidben BostonMember, Broadie, Dev ✭✭✭

    @yyj @ivan108 We recently fixed one issue in the master branch but it's not in a release yet. You could build the master branch of the gatk repo or try out a patch jar here: gs://broad-dsde-methods-davidben/gatk-builds/contamination-patch-5-27-2019.jar.

    This resolved the issue in another forum thread, https://gatkforums.broadinstitute.org/gatk/discussion/comment/58918, although it's possible that here there are additional complications with hg38.

  • Polar_bearPolar_bear FreedomMember
    edited June 21

    @davidben said:
    @yyj @ivan108 We recently fixed one issue in the master branch but it's not in a release yet. You could build the master branch of the gatk repo or try out a patch jar here: gs://broad-dsde-methods-davidben/gatk-builds/contamination-patch-5-27-2019.jar.

    This resolved the issue in another forum thread, https://gatkforums.broadinstitute.org/gatk/discussion/comment/58918, although it's possible that here there are additional complications with hg38.

    Dear David @davidben
    I encountered the same error with 4.1.2.0 for #####java.lang.IllegalArgumentException: log10p: Log10-probability must be 0 or less. Could you plz give me another download link for contamination-patch-5-27-2019.jar to test my problem? The gs:// link seems invalid for have no right to access it in google cloud.
    Chris

  • RishabhRishabh IndiaMember
    @davidben
    I also got the same result. The command i use were

    java -jar /PathToGatk4.1.2.0 GetPileupSummaries
    -I tumor.bam
    -V output.vcf
    -L V6.bed
    -O tumor_getpileupsummaries.table

    I generated output.vcf by this command (is this approach right, if not can you tell me the command used to generate this file )
    COMM : bcftools query -H -f '%CHROM %POS %ID %REF %ALT %QUAL %FILTER AF=[%AF]\t \n' normal1.vcf.gz > filtered.vcf
    and then i added the header to this file and filtered bialleles by select variants (command i used is).

    java -jar /PathToGatk SelectVariants
    -R Hg19_Reference.fa
    -L V6.bed
    -V filter.vcf.gz
    --restrict-alleles-to BIALLELIC
    -O output2.vcf

    OUTPUT
    #SAMPLE=Hg19_P1
    contig position ref_count alt_count other_alt_count allele_frequency
    chr1 981727 0 0 0 0.076
    chr1 986538 0 0 0 0.115
    chr1 986541 0 0 0 0.154
    chr1 986545 0 0 0 0.148
    chr1 986549 0 0 0 0.143
    chr1 986557 0 0 0 0.154

    Command used for generating Calculate Contamination
    java -jar /PathToGatk CalculateContamination
    -I tumor_getpileupsummaries.table
    -tumor-segmentation segments.table
    -O tumor_calculatecontamination.table

    OUTPUT (segments.table)
    #SAMPLE=Hg19_P1
    contig start end minor_allele_fraction

    OUTPUT (tumor_calculatecontamination.table)
    sample contamination error
    Hg19_P1 NaN 1.0

    I am stuck at this, no values are comming, except 0.
  • Tiffany_at_BroadTiffany_at_Broad Cambridge, MAMember, Administrator, Broadie, Moderator admin

    @Polar_bear - did you ever get the jar? I am trying to find a public google bucket to link you to with the jar soon.

  • Tiffany_at_BroadTiffany_at_Broad Cambridge, MAMember, Administrator, Broadie, Moderator admin

    Hey @Rishabh -
    With regard to your question about generating the vcf, it should be a common germline variant sites VCF, derived from the gnomAD resource.
    I grabbed this bit of text and script from this tutorial as it may be helpful:
    "This WDL script mutect_resources.wdl takes a large gnomAD VCF or other typical cohort VCF and from it prepares both a simplified germline resource for Mutect2 and a common biallelic variants resource for use in GetPileUpSummaries. The script first generates a sites-only VCF and in the process removes all extraneous annotations except for AF allele frequencies. We recommend this simplification as the unburdened VCF allows Mutect2 to run much more efficiently. To generate the common biallelic variants resource, the script then selects the biallelic sites from the sites-only VCF."

    Let me know if this is useful!

  • davidbendavidben BostonMember, Broadie, Dev ✭✭✭

    @Rishabh The resource Tiffany mentioned is available in our best practices google bucket: gs://gatk-best-practices/somatic-b37/small_exac_common_3.vcf. Note that, as the tutorial she pointed to mentions, you need to use this resource for both the -V and -L arguments of GetPileupSummaries. Since this resource is already restricted to biallelic SNPs, the SelectVariants step is unnecessary.

  • 29043594952904359495 Member

    I aslo meet this.
    Runtime.totalMemory()=2228224000
    java.lang.IllegalArgumentException: log10p: Log10-probability must be 0 or less

    But 4.1.1.0 is ok to run that

  • Tiffany_at_BroadTiffany_at_Broad Cambridge, MAMember, Administrator, Broadie, Moderator admin

    We have uploaded a copy of the path jar David refers to in a public folder here
    for those not using gsutil to copy it- gs://broad-dsde-methods-davidben/gatk-builds/contamination-patch-5-27-2019.jar

    To access our resource files in google buckets you just need to install gsutil and then run the command gsutil cp gs://___bucket path___ <local path> to download files

  • RishabhRishabh IndiaMember
    edited August 8

    @Tiffany_at_Broad
    Hello,
    Thankyou for replying as you said i have looked into the tutorial and script . As i have taken some other sample data with Hg19 as reference, just using the same parameters as of this tutorial and also tutorial provide by you "https://gatkforums.broadinstitute.org/gatk/discussion/24057/how-to-call-somatic-mutations-using-gatk4-mutect2#latest"
    I am having the same question as of generating the small_exac_common_3.vcf file, what were the input sample and parameters used to generate this file. Please do let me know if i am not clear and have to be more descriptive. If you need I can share all the command and file that i generated.
    Thankyou

Sign In or Register to comment.