ContEst with GATK4

Hi everyone,

We would like to estimate intra-individual contamination in our data with ContEst, and I've spent the whole day trying to run ContEst with GATK4. It looks like ContEst is no longer part of GATK4, though: I can't find it in the list of tools in the "Diagnostics and Quality Control" section.

I tried many command lines with the example dataset, but none of them worked...

With this command line I get an error:

java -Xmx2g -jar \
/Users/tools/gatk-4.0.3.0/gatk-package-4.0.3.0-local.jar \
-T ContEst.jar \
-I ContEst_example_data/chr20_sites.bam \
-R human_g1k_v37.fasta \
-B:pop,vcf hg18_population_stratified_af_hapmap_3.3.vcf \
-T Contamination \
-B:genotypes,vcf ContEst_example_data/hg00142.vcf \
-BTI genotypes \
-o contamination_results_chr20.txt

"A USER ERROR has occurred: '-T' is not a valid command."

I also tried the command line from this page: http://archive.broadinstitute.org/cancer/cga/contest_run

java -jar ContEst.jar -T Contamination -h

but I get errors:
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.(GenomeAnalysisEngine.java:150)
at org.broadinstitute.sting.gatk.CommandLineExecutable.(CommandLineExecutable.java:60)
at org.broadinstitute.sting.gatk.CommandLineGATK.(CommandLineGATK.java:50)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:86)
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: could not create class file from P11KeyAgreement$AllowKDF.class
at org.reflections.Reflections.scan(Reflections.java:166)
at org.reflections.Reflections.(Reflections.java:91)
at org.broadinstitute.sting.utils.classloader.PluginManager.(PluginManager.java:60)
... 4 more

I guess ContEst.jar is looking for GATK...

I also tried this, with the same error:

java -Xmx2g -jar ContEst.jar \
-I ContEst_example_data/chr20_sites.bam \
-R human_g1k_v37.fasta \
-B:pop,vcf hg19_population_stratified_af_hapmap_3.3.vcf \
-T Contamination \
-B:genotypes,vcf hg00142.vcf \
-BTI genotypes \
-o contamination_results_chr20.txt

On this page: https://gatkforums.broadinstitute.org/gatk/discussion/8363/contest-without-normal-bam#latest
I've seen this command line:

java -jar GenomeAnalysisTK.jar \
-T ContEst \
-R hs37d.fa \
-I tumor.bam \
--genotypes Panel_of_normal.vcf \
--popfile hg19_population_stratified_af_hapmap_3.3.FIX.vcf.gz \
-L target.bed \
-isr INTERSECTION -o contamination_out.txt

Here GenomeAnalysisTK.jar is the jar and ContEst is passed as an option, which is different...
But I haven't been able to find any GenomeAnalysisTK.jar, only gatk-package-4.0.3.0-local.jar. Are these two the same?

Anyway, it still doesn't work...

So if you could help me by giving me the correct command-line syntax, that would be great.

Thank you and have a nice day

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator)
    edited April 6

    @dprat
    Hi,

    ContEst is no longer the recommended way to calculate cross-sample contamination in tumor samples. Two new tools, GetPileupSummaries and CalculateContamination, replace it. Have a look at the hands-on Mutect2 tutorial we present at workshops for more information; the workshop materials are in the Presentations section. Also, you may find this tutorial helpful.

    -Sheila
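
For readers who hit the same "'-T' is not a valid command" error: GATK4 no longer takes the GATK3-style `-T ToolName` argument; the tool name comes first on the command line instead. Below is a minimal sketch of the two-step workflow named above. The file names (`sample.bam`, `common_biallelic_sites.vcf.gz`, the `.table` outputs) are placeholders rather than files from this thread, and `run` is a hypothetical helper that only prints each command unless `DRY_RUN=0` is set. Some GATK4 releases also require an intervals argument (`-L`) for GetPileupSummaries, so check `--help` for your version.

```shell
#!/bin/sh
# Hypothetical helper: print the command by default, execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

BAM=sample.bam                        # placeholder: your aligned reads
SITES=common_biallelic_sites.vcf.gz   # placeholder: population VCF carrying allele frequencies

# Step 1: tabulate ref/alt read counts at known common variant sites.
run gatk GetPileupSummaries -I "$BAM" -V "$SITES" -L "$SITES" -O pileups.table

# Step 2: estimate the contamination fraction from the pileup table.
run gatk CalculateContamination -I pileups.table -O contamination.table
```

With the packaged jar from the question, `gatk` can be replaced by `java -jar gatk-package-4.0.3.0-local.jar`.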

  • dprat (Member)

    @Sheila Thanks for your answer !

    After hours of research, I finally managed to make it work with GATK3 :smiley:
    However, it doesn't work on my data, and I only get this in my output file:

    Warning: We're throwing out lane META since it has fewer than 500 read bases at genotyped positions
    name population population_fit contamination confidence_interval_95_width confidence_interval_95_low confidence_interval_95_high sites

    I guess I don't have enough depth in my data...
    I haven't found a solution for this so far. Do you have any ideas?

    So I tried the two new tools, GetPileupSummaries and CalculateContamination, but I'm not sure which VCF I'm supposed to use: the one from gnomAD, or my own VCF file?

    My data are low-depth WGS, around 1-2X. Should I download all the VCFs separately from here: http://gnomad.broadinstitute.org/downloads
    and then join them with a "cat" command?
    Or is there a single file for the entire genome?

    Thank you

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @dprat
    Hi,

    1-2X coverage is very low. I think ContEst does need to see more coverage than that.

    I also think the new workflow tools do a better job on low coverage. Have a look at this article for more information on which VCF to use.

    Let us know how things go :smile:

    -Sheila
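
On the earlier question about joining per-chromosome VCFs with `cat`: plain concatenation repeats every file's header in the middle of the output, which breaks the VCF format. A hedged sketch using MergeVcfs (the Picard tool bundled with GATK4) is shown below. The `gnomad.chr<N>.vcf.gz` naming is a hypothetical placeholder for whatever per-chromosome files were downloaded, and `run` and `merge_vcfs` are illustrative helpers; `run` only prints the command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Hypothetical helper: print the command by default, execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# merge_vcfs OUTPUT CHR [CHR ...]
# Builds one "-I <file>" pair per chromosome and merges them into OUTPUT.
merge_vcfs() {
  out=$1
  shift
  args=""
  for chr in "$@"; do
    args="$args -I gnomad.chr${chr}.vcf.gz"   # placeholder file naming
  done
  # $args is left unquoted on purpose so it splits into separate arguments;
  # that is safe here because the generated names contain no spaces.
  run gatk MergeVcfs $args -O "$out"
}

merge_vcfs gnomad.chr1-3.vcf.gz 1 2 3
```

MergeVcfs expects the inputs to share a compatible sequence dictionary, so files built against different reference builds should not be mixed.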

  • dprat (Member)

    Hi,

    So I tested the two new tools, GetPileupSummaries and CalculateContamination.
    I used the VCF files from gnomAD; so far I have tested chr1, 2 and 3 separately,
    and I got 0% contamination in all three cases.

    I also tried concatenating chr1, 2 and 3 into a single file, but I still got 0%.

    I sorted my BAM by chromosome, but I still get 0%.

    Should I try all the chromosomes?

    Or maybe the sensitivity for my data is too high?

    I don't know what to do to solve the problem...

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @dprat
    Hi,

    Can you check on the entire BAM file?

    -Sheila

  • dprat (Member)

    Hi @Sheila, like I said here, I'm working on hg19 and not hg38, so I have to modify all the gnomAD VCFs to make it work, and I don't have enough space right now to do it.
    I'm trying to find a way to download all of them and uncompress them...

    I'm actually trying to make this work on my data, which are NIPT data. We are working on a way to calculate the amount of foetal DNA circulating in the mother's blood. Our idea was to adapt your program, which can detect "intra-individual" contamination, to our case, where the foetal DNA would correspond to the contamination.

    So far, though, it's not working.

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @dprat
    Hi,

    "i'm working on hg19 and not hg38, so i have to modify all the VCF of gnomAD to make it work, and i don't have enough space right now to do it."

    I am not sure if it will help, but can you try running per-chromosome? Or, do you have a cluster or access to the cloud that you can run on? Also, FireCloud is giving free credits if you sign up :smiley:

    -Sheila
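
The per-chromosome suggestion above could be sketched as a simple loop, which keeps only one chromosome's VCF in play at a time. Everything named here is an assumption for illustration: `sample.bam`, the `gnomad.chr<N>.vcf.gz` naming, the output table names, and the `run` and `per_chrom_contamination` helpers (`run` only prints each command unless `DRY_RUN=0` is set). Contig names passed to `-L` must match the BAM and reference, for example `1` for b37-style references versus `chr1` for UCSC-style ones.

```shell
#!/bin/sh
# Hypothetical helper: print the command by default, execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

BAM=sample.bam   # placeholder: your aligned reads

# per_chrom_contamination CHR [CHR ...]
# Runs the two-step contamination estimate separately for each chromosome.
per_chrom_contamination() {
  for chr in "$@"; do
    sites="gnomad.chr${chr}.vcf.gz"   # placeholder per-chromosome population VCF
    run gatk GetPileupSummaries -I "$BAM" -V "$sites" -L "$chr" \
      -O "pileups.chr${chr}.table"
    run gatk CalculateContamination -I "pileups.chr${chr}.table" \
      -O "contamination.chr${chr}.table"
  done
}

per_chrom_contamination 1 2 3
```

Later GATK4 releases appear to include a GatherPileupSummaries tool for combining per-chromosome pileup tables into one before running CalculateContamination; check your version's tool list before relying on it.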
