ContEst with GATK4

Hi everyone,

We would like to calculate intra-individual contamination in our data with ContEst, and I've been trying the whole day to use ContEst with GATK4. However, it looks like ContEst isn't in GATK4 anymore, and I can't find it in the list of tools in the "Diagnostics and Quality Control" section.

I tried many command lines with the example dataset, but none of them worked.

For this command line I get an error:

java -Xmx2g -jar \
/Users/tools/gatk- \
-T ContEst.jar \
-I ContEst_example_data/chr20_sites.bam \
-R human_g1k_v37.fasta \
-B:pop,vcf hg18_population_stratified_af_hapmap_3.3.vcf \
-T Contamination \
-B:genotypes,vcf ContEst_example_data/hg00142.vcf \
-BTI genotypes \
-o contamination_results_chr20.txt

"A USER ERROR has occurred: '-T' is not a valid command."

I also tried the command line from this page:

java -jar ContEst.jar -T Contamination -h

but I get errors:
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.(
at org.broadinstitute.sting.gatk.CommandLineExecutable.(
at org.broadinstitute.sting.gatk.CommandLineGATK.(
at org.broadinstitute.sting.gatk.CommandLineGATK.main(
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: could not create class file from P11KeyAgreement$AllowKDF.class
at org.reflections.Reflections.scan(
at org.reflections.Reflections.(
at org.broadinstitute.sting.utils.classloader.PluginManager.(
... 4 more

I guess ContEst.jar is looking for GATK...

I also tried this, with the same error:

java -Xmx2g -jar ContEst.jar \
-I ContEst_example_data/chr20_sites.bam \
-R human_g1k_v37.fasta \
-B:pop,vcf hg19_population_stratified_af_hapmap_3.3.vcf \
-T Contamination \
-B:genotypes,vcf hg00142.vcf \
-BTI genotypes \
-o contamination_results_chr20.txt

On this page, I've seen this command line:

java -jar GenomeAnalysisTK.jar \
-T ContEst \
-R hs37d.fa \
-I tumor.bam \
--genotypes Panel_of_normal.vcf \
--popfile hg19_population_stratified_af_hapmap_3.3.FIX.vcf.gz \
-L target.bed \
-isr INTERSECTION -o contamination_out.txt

Here GenomeAnalysisTK.jar is used, with ContEst passed as an option, which is different.
But I've been unable to find any GenomeAnalysisTK.jar, only gatk-package-. Are these two the same?

Anyway, it still doesn't work.

So if you could help me by giving me the correct command-line syntax, that would be great.

Thank you and have a nice day


  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)
    edited April 2018


    ContEst is no longer the way to calculate cross-sample contamination in tumor samples. Two new tools, GetPileupSummaries and CalculateContamination, replace it. Have a look at the hands-on Mutect2 tutorial we present at workshops for more information; the workshop materials are in the Presentations section. Also, you may find this tutorial helpful.
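A minimal sketch of how the two replacement tools chain together, for readers coming from the GATK3 commands above. GATK4 takes the tool name as the first argument, which is why `-T` is rejected. The function name and all file names here are placeholders, not something from this thread; the sites file is assumed to be an indexed VCF of common biallelic variants carrying population allele frequencies (AF) in its INFO field.

```shell
# Two-step contamination estimate with GATK4 (sketch; file names are placeholders).
# Assumes the `gatk` wrapper script is on PATH.
estimate_contamination() {
  bam="$1"      # coordinate-sorted, indexed BAM
  sites="$2"    # indexed VCF of common biallelic sites with AF in INFO
  prefix="$3"   # prefix for the two output tables

  # Step 1: count ref/alt reads at each known site.
  gatk GetPileupSummaries \
    -I "$bam" \
    -V "$sites" \
    -L "$sites" \
    -O "${prefix}.pileups.table"

  # Step 2: estimate the contaminating fraction from those counts.
  gatk CalculateContamination \
    -I "${prefix}.pileups.table" \
    -O "${prefix}.contamination.table"
}
```

It would be called as, for example, `estimate_contamination sample.bam common_sites.vcf.gz sample`. Passing the same file to `-V` and `-L` restricts the traversal to the known sites rather than the whole genome.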


  • dprat (Member)

    @Sheila Thanks for your answer !

    After hours of research, I finally managed to make it work with GATK3 :smiley:
    However, it doesn't work on my data, and I only get this in my output file:

    Warning: We're throwing out lane META since it has fewer than 500 read bases at genotyped positions
    name population population_fit contamination confidence_interval_95_width confidence_interval_95_low confidence_interval_95_high sites

    I guess I don't have enough depth in my data.
    I haven't been able to find a solution for this so far; do you have any ideas?

    So I tried the two new tools, GetPileupSummaries and CalculateContamination, but I'm not sure which VCF I'm supposed to use: the one from gnomAD, or my own VCF file?

    My data are low-depth WGS, around 1-2X. Should I download all the VCFs separately from here:
    and then join them with a "cat" command?
    Or is there a single file for the entire genome?

    Thank you

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)


    1-2X coverage is very low. I think ContEst does need to see more coverage than that.

    I also think the new workflow tools do a better job on low coverage. Have a look at this article for more information on which VCF to use.

    Let us know how things go :smile:
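On the question of joining per-chromosome VCFs: a plain `cat` would repeat each file's header block in the middle of the output, which is not a valid VCF. One possible alternative, sketched below, is the Picard MergeVcfs tool bundled with GATK4. The function name and file names are hypothetical, and the sketch assumes the inputs are indexed and share the same contig definitions in their headers.

```shell
# Merge per-chromosome site VCFs into a single file (sketch; names are hypothetical).
# Assumes the `gatk` wrapper script is on PATH.
merge_site_vcfs() {
  merged="$1"   # output VCF
  shift         # remaining arguments are the input VCFs

  # Rewrite the argument list in place: each input "x.vcf.gz" becomes "-I x.vcf.gz".
  for vcf in "$@"; do
    set -- "$@" -I "$vcf"
    shift
  done

  gatk MergeVcfs "$@" -O "$merged"
}
```

For example: `merge_site_vcfs all_sites.vcf.gz chr1.vcf.gz chr2.vcf.gz chr3.vcf.gz`.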


  • dprat (Member)


    So I tested the two new tools, GetPileupSummaries and CalculateContamination.
    I used the VCF files from gnomAD; so far I have tested chr1, 2 and 3 separately,
    and I got 0% contamination in all 3 cases.

    I also tried concatenating chr1, 2 and 3 into a single file, but I still got 0%.

    I sorted my BAM by chromosome, but I still get 0%.

    Should I try all the chromosomes?

    Or maybe the sensitivity is too high for my data?

    I don't know what to do to solve the problem...

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)


    Can you check on the entire BAM file?
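When the estimate comes back as 0%, it may help to check how much evidence the pileup table actually holds before rerunning on more data. The sketch below summarises a GetPileupSummaries output table. It assumes the tab-separated column layout contig, position, ref_count, alt_count, other_alt_count, allele_frequency, with optional `#` metadata lines; check this against your own file, since the layout is an assumption here. The function name is hypothetical.

```shell
# Summarise a GetPileupSummaries table: site count, mean depth, sites with alt reads.
# Assumed columns: contig, position, ref_count, alt_count, other_alt_count, allele_frequency.
summarise_pileups() {
  awk -F '\t' '
    /^#/ || $1 == "contig" { next }        # skip metadata and header lines
    {
      n++
      depth += $3 + $4 + $5                # all reads observed at this site
      if ($4 > 0) alt_sites++              # site shows at least one alt read
    }
    END {
      if (n == 0) { print "sites=0"; exit }
      printf "sites=%d mean_depth=%.2f sites_with_alt=%d\n", n, depth / n, alt_sites
    }
  ' "$1"
}
```

It would be run as `summarise_pileups sample.pileups.table`. At 1-2X coverage the mean depth per site will be around one or two reads, so very few sites can show both alleles.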


  • dprat (Member)

    Hi @Sheila, like I said here, I'm working on hg19 and not hg38, so I have to modify all the gnomAD VCFs to make it work, and I don't have enough disk space to do that right now.
    I'm trying to find a way to download all of them and uncompress them.

    I'm actually trying to make this work on my data, which are NIPT data. We are working on a way to calculate the amount of foetal DNA circulating in the mother's blood. Our idea was to apply your program, which can detect "intra-individual" contamination, to our case, where the foetal DNA would correspond to the contamination.

    For now, though, it's not working.

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)


    dprat said: "i'm working on hg19 and not hg38, so i have to modify all the VCF of gnomAD to make it work, and i don't have enough space right now to do it."

    I am not sure if it will help, but can you try running per-chromosome? Or, do you have a cluster or access to the cloud that you can run on? Also, FireCloud is giving free credits if you sign up :smiley:

