GATK4's CalculateContamination reports no hom alt sites found

I have been trying to use GATK4's CalculateContamination but the output is not as expected:

level   contamination   error
whole_bam   0.0 1.0

The GATK log contained warnings that there were not enough data points to segment and that no hom alt sites were found.

Using GATK jar /mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/share/gatk4-4.0.4.0-0/gatk-package-4.0.4.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx16g -jar /mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/share/gatk4-4.0.4.0-0/gatk-package-4.0.4.0-local.jar CalculateContamination -I out/BC002-03042014_A_getpileupsummaries.table -O out/BC002-03042014_A_calculatecontamination.table
Picked up _JAVA_OPTIONS: -XX:+UseSerialGC
09:46:05.758 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/share/gatk4-4.0.4.0-0/gatk-package-4.0.4.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
09:46:05.872 INFO  CalculateContamination - ------------------------------------------------------------
09:46:05.872 INFO  CalculateContamination - The Genome Analysis Toolkit (GATK) v4.0.4.0
09:46:05.872 INFO  CalculateContamination - For support and documentation go to https://software.broadinstitute.org/gatk/
09:46:05.872 INFO  CalculateContamination - Executing as [email protected] on Linux v2.6.32-431.el6.x86_64 amd64
09:46:05.872 INFO  CalculateContamination - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_102-b14
09:46:05.873 INFO  CalculateContamination - Start Date/Time: May 14, 2018 9:46:05 AM SGT
09:46:05.873 INFO  CalculateContamination - ------------------------------------------------------------
09:46:05.873 INFO  CalculateContamination - ------------------------------------------------------------
09:46:05.873 INFO  CalculateContamination - HTSJDK Version: 2.14.3
09:46:05.873 INFO  CalculateContamination - Picard Version: 2.18.2
09:46:05.873 INFO  CalculateContamination - HTSJDK Defaults.COMPRESSION_LEVEL : 2
09:46:05.873 INFO  CalculateContamination - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
09:46:05.873 INFO  CalculateContamination - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
09:46:05.873 INFO  CalculateContamination - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
09:46:05.873 INFO  CalculateContamination - Deflater: IntelDeflater
09:46:05.874 INFO  CalculateContamination - Inflater: IntelInflater
09:46:05.874 INFO  CalculateContamination - GCS max retries/reopens: 20
09:46:05.874 INFO  CalculateContamination - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
09:46:05.874 INFO  CalculateContamination - Initializing engine
09:46:05.874 INFO  CalculateContamination - Done initializing engine
09:46:05.935 WARN  KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (2) to segment; using all data points to calculate kernel matrix.
09:46:05.961 WARN  KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (2).  Local changepoint costs will not be calculated for this window size.
09:46:05.961 WARN  KernelSegmenter - No changepoint candidates were found.  The specified window sizes may be inappropriate, or there may be insufficient data points
09:46:06.083 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
09:46:06.090 WARN  KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (3) to segment; using all data points to calculate kernel matrix.
09:46:06.090 WARN  KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (3).  Local changepoint costs will not be calculated for this window size.
09:46:06.090 WARN  KernelSegmenter - No changepoint candidates were found.  The specified window sizes may be inappropriate, or there may be insufficient data points
09:46:06.091 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
09:46:06.091 WARN  KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (2) to segment; using all data points to calculate kernel matrix.
09:46:06.092 WARN  KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (2).  Local changepoint costs will not be calculated for this window size.
09:46:06.092 WARN  KernelSegmenter - No changepoint candidates were found.  The specified window sizes may be inappropriate, or there may be insufficient data points
09:46:06.092 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
09:46:06.093 WARN  KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (1) to segment; using all data points to calculate kernel matrix.
09:46:06.093 WARN  KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (1).  Local changepoint costs will not be calculated for this window size.
09:46:06.093 WARN  KernelSegmenter - No changepoint candidates were found.  The specified window sizes may be inappropriate, or there may be insufficient data points
09:46:06.093 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
09:46:06.113 WARN  CalculateContamination - No hom alt sites found!  Perhaps GetPileupSummaries was run on too small of an interval, or perhaps the sample was extremely inbred or haploid.
09:46:06.116 WARN  CalculateContamination - No hom alt sites found!  Perhaps GetPileupSummaries was run on too small of an interval, or perhaps the sample was extremely inbred or haploid.
09:46:06.117 WARN  CalculateContamination - No hom alt sites found!  Perhaps GetPileupSummaries was run on too small of an interval, or perhaps the sample was extremely inbred or haploid.

To get the pileup file required for CalculateContamination, I ran GetPileupSummaries with -L restricted to a BED file containing 77 genes of interest. The pileup file looks normal and contains 311 variants. Is this not enough for CalculateContamination? Can CalculateContamination not be run on small targeted sequencing panels? I would appreciate any help!

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @phu5ion
    Hi,

    It looks like you don't have enough data to run the contamination tools. Can you try running on your entire BAM file, or are the 77 genes the entire BAM file? Have a look at this tutorial and the hands-on tutorials in the Presentations section for more information.

    -Sheila
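
One way to check whether a targeted panel gives CalculateContamination enough to work with is to count the sites in the GetPileupSummaries table where nearly all reads carry the alt allele. The sketch below is only a rough proxy, not the tool's own logic. It assumes the table is tab-separated with the columns contig, position, ref_count, alt_count, other_alt_count and allele_frequency; the function name and both thresholds (`min_alt_fraction`, `min_depth`) are hypothetical choices for illustration.

```python
import csv


def count_candidate_hom_alt(table_text, min_alt_fraction=0.8, min_depth=10):
    """Return (total_sites, candidate_hom_alt_sites) for a pileup summary table.

    Assumption: columns are named ref_count, alt_count and other_alt_count.
    The thresholds are illustrative and are not the ones GATK uses.
    """
    # Drop blank lines and '#'-prefixed metadata/comment lines before parsing.
    lines = [
        line for line in table_text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    reader = csv.DictReader(lines, delimiter="\t")

    total_sites = 0
    candidate_sites = 0
    for row in reader:
        ref = int(row["ref_count"])
        alt = int(row["alt_count"])
        other = int(row["other_alt_count"])
        depth = ref + alt + other
        total_sites += 1
        # A site counts as a candidate when it is deep enough and mostly alt.
        if depth >= min_depth and alt / depth >= min_alt_fraction:
            candidate_sites += 1
    return total_sites, candidate_sites
```

Calling it on the contents of the pileup file (for example `count_candidate_hom_alt(open(path).read())`, with `path` pointing at the table) gives the number of sites and the number of mostly-alt sites. A second number at or near zero would be consistent with the "No hom alt sites found" warning.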

  • mine (Member)
    edited June 25

    Hello,

    I have used the pipeline given for GATK3 for my RNA-seq samples: https://gatkforums.broadinstitute.org/gatk/discussion/3892/the-gatk-best-practices-for-variant-calling-on-rnaseq-in-full-detail

    I saw on your forum that SplitNCigarReads in GATK3 is not available in GATK4, so I am not sure whether I must use GATK4 for all of my steps.

    I tried to run GetPileupSummaries and CalculateContamination on the tumor BAM file that I obtained from the GATK3 RNA-seq pipeline. I used all the VCF files downloaded from gnomAD when running GetPileupSummaries, but I hit a similar problem with CalculateContamination. Only the error column is different and it is 0.0, as I give below.

    level contamination error
    whole_bam 0.0 1.0

    Of course, FilterMutectCalls doesn't execute because of my contamination table.

    What can I do to handle this problem?

    Thank you so much.

  • mine (Member)
    edited June 26

    Hello,

    Sorry for my previous post. I realized that I had used the wrong gnomAD file. I have now downloaded the exomes file from gnomAD and rerun GetPileupSummaries and CalculateContamination.

    The output of the contamination run is in the file I gave, liftover.txt.

    And the contamination table is:

    level contamination error
    whole_bam 0.0 0.0

    Could you please help me find what is wrong with this file?

    Thank you so much.

  • mine (Member)

    I also forgot to mention that I have no matched normal samples. I created a PoN file from my normal samples and created the VCF file using only the tumor sample. Can I still use GetPileupSummaries, CalculateContamination and FilterMutectCalls?

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @mine
    Hi,

    The error message says you "hit memory limit at least once during execution. This may or may not result in some failure." It looks like you need to give more memory to the tools. https://software.broadinstitute.org/gatk/documentation/article?id=11050

    -Sheila

  • mine (Member)

    I didn't notice the error at the end of the file. I know what this error means. I am really sorry for taking up your time. Thank you so much.

  • manolis (Member ✭✭)
    edited November 15

    Hi, I have a similar problem related to the first part of this thread.

    GATK 4.0.11.0, linux server, WES

    I read the other threads 1, 2, 3 but did not find a solution for how to change the setup.

    I used the entire BAM file; it is from an old WES run. I have an output file, and I used it in the FilterMutectCalls step without any error.

    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -XX:ConcGCThreads=1 -XX:ParallelGCThreads=2 -Xmx5g -jar /share/apps/bio/gatk-4.0.11.0/gatk-package-4.0.11.0-local.jar CalculateContamination -I /home/manolis/GATK4/2.BQSR/TCRBOA1_T_getpileupsummaries.table -O /home/manolis/GATK4/2.BQSR/TCRBOA1_T_TCRBOA1_N_contamination.table --tumor-segmentation /home/manolis/GATK4/2.BQSR/segments.table -matched /home/manolis/GATK4/2.BQSR/TCRBOA1_N_getpileupsummaries.table --tmp-dir /home/manolis/GATK4/tmp/
    12:12:41.326 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/share/apps/bio/gatk-4.0.11.0/gatk-package-4.0.11.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    12:12:43.079 INFO  CalculateContamination - ------------------------------------------------------------
    12:12:43.080 INFO  CalculateContamination - The Genome Analysis Toolkit (GATK) v4.0.11.0
    12:12:43.080 INFO  CalculateContamination - For support and documentation go to https://software.broadinstitute.org/gatk/
    12:12:43.080 INFO  CalculateContamination - Executing as [email protected] on Linux v4.4.0-138-generic amd64
    12:12:43.080 INFO  CalculateContamination - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_121-b15
    12:12:43.080 INFO  CalculateContamination - Start Date/Time: November 15, 2018 12:12:41 PM CET
    12:12:43.081 INFO  CalculateContamination - ------------------------------------------------------------
    12:12:43.081 INFO  CalculateContamination - ------------------------------------------------------------
    12:12:43.081 INFO  CalculateContamination - HTSJDK Version: 2.16.1
    12:12:43.081 INFO  CalculateContamination - Picard Version: 2.18.13
    12:12:43.081 INFO  CalculateContamination - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    12:12:43.081 INFO  CalculateContamination - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    12:12:43.081 INFO  CalculateContamination - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    12:12:43.082 INFO  CalculateContamination - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    12:12:43.082 INFO  CalculateContamination - Deflater: IntelDeflater
    12:12:43.082 INFO  CalculateContamination - Inflater: IntelInflater
    12:12:43.082 INFO  CalculateContamination - GCS max retries/reopens: 20
    12:12:43.082 INFO  CalculateContamination - Requester pays: disabled
    12:12:43.082 INFO  CalculateContamination - Initializing engine
    12:12:43.082 INFO  CalculateContamination - Done initializing engine
    12:12:44.071 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:44.139 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:44.142 WARN  KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (11) to segment; using all data points to calculate kernel matrix.
    12:12:44.143 WARN  KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (11).  Local changepoint costs will not be calculated for this window size.
    12:12:44.143 WARN  KernelSegmenter - No changepoint candidates were found.  The specified window sizes may be inappropriate, or there may be insufficient data points
    12:12:44.144 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:44.185 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:44.250 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:44.318 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:44.321 WARN  KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (59) to segment; using all data points to calculate kernel matrix.
    12:12:44.329 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:44.359 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:44.399 INFO  KernelSegmenter - Found 3 changepoints after applying the changepoint penalty.
    12:12:44.423 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:44.461 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:44.488 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:44.517 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:44.554 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:44.556 WARN  KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (82) to segment; using all data points to calculate kernel matrix.
    12:12:44.570 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:44.598 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:44.619 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:44.620 WARN  KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (72) to segment; using all data points to calculate kernel matrix.
    12:12:44.629 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:44.659 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:44.681 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:44.701 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:44.726 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:44.754 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:44.827 INFO  CalculateContamination - In 311 homozygous variant sites we find 60 reference reads due to contamination and 16 due to to sequencing error out of a total 20322 reads.
    12:12:44.827 INFO  CalculateContamination - Based on population data, we would expect 17620 reference reads in a contaminant with equal depths at these sites.
    12:12:44.828 INFO  CalculateContamination - Therefore, we estimate a contamination of 0.003.
    12:12:44.828 INFO  CalculateContamination - The error bars on this estimate are 0.00044.
    12:12:44.864 INFO  CalculateContamination - In 310 homozygous variant sites we find 55 reference reads due to contamination and 16 due to to sequencing error out of a total 20279 reads.
    12:12:44.864 INFO  CalculateContamination - Based on population data, we would expect 17584 reference reads in a contaminant with equal depths at these sites.
    12:12:44.864 INFO  CalculateContamination - Therefore, we estimate a contamination of 0.003.
    12:12:44.864 INFO  CalculateContamination - The error bars on this estimate are 0.00042.
    12:12:44.904 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:44.934 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:44.935 WARN  KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (9) to segment; using all data points to calculate kernel matrix.
    12:12:44.936 WARN  KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (9).  Local changepoint costs will not be calculated for this window size.
    12:12:44.936 WARN  KernelSegmenter - No changepoint candidates were found.  The specified window sizes may be inappropriate, or there may be insufficient data points
    12:12:44.936 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:44.959 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:44.986 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:45.018 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:45.019 WARN  KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (60) to segment; using all data points to calculate kernel matrix.
    12:12:45.026 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:45.049 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:45.075 INFO  KernelSegmenter - Found 3 changepoints after applying the changepoint penalty.
    12:12:45.095 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:45.123 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:45.145 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:45.169 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:45.201 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:45.203 WARN  KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (80) to segment; using all data points to calculate kernel matrix.
    12:12:45.213 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:45.238 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:45.257 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:45.257 WARN  KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (71) to segment; using all data points to calculate kernel matrix.
    12:12:45.265 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:45.291 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:45.310 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:45.328 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:45.351 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:45.376 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    12:12:45.414 INFO  CalculateContamination - In 288 homozygous variant sites we find 69 reference reads due to contamination and 16 due to to sequencing error out of a total 19792 reads.
    12:12:45.414 INFO  CalculateContamination - Based on population data, we would expect 17188 reference reads in a contaminant with equal depths at these sites.
    12:12:45.425 INFO  CalculateContamination - Therefore, we estimate a contamination of 0.004.
    12:12:45.425 INFO  CalculateContamination - The error bars on this estimate are 0.00048.
    12:12:45.428 INFO  CalculateContamination - Shutting down engine
    [November 15, 2018 12:12:45 PM CET] org.broadinstitute.hellbender.tools.walkers.contamination.CalculateContamination done. Elapsed time: 0.07 minutes.
    Runtime.totalMemory()=1306525696
    Tool returned:
    SUCCESS
    
  • shlee (Cambridge; Member, Broadie, Moderator, admin)

    Hi @manolis,

    Can you please describe your problem? It looks like your CalculateContamination run ran fine and you have a contamination of 0.004.
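
The numbers in the log can be checked by hand. As a worked sketch: dividing the reference reads attributed to contamination by the reference reads expected from a contaminant reproduces the three logged estimates, and dividing the square root of that read count by the same expectation happens to reproduce the logged error bars. The second formula is an assumption inferred from these three log lines, not a statement of how GATK computes its error, and the function name is hypothetical.

```python
import math


def contamination_from_log(contaminant_ref_reads, expected_ref_reads):
    """Recompute the estimate and an approximate error from logged counts.

    Assumption: the error is taken as sqrt(count) / expected, which matches
    the logged error bars here but is not confirmed to be GATK's formula.
    """
    estimate = contaminant_ref_reads / expected_ref_reads
    error = math.sqrt(contaminant_ref_reads) / expected_ref_reads
    return estimate, error


# (reference reads due to contamination, expected reference reads) per log block
logged_counts = [(60, 17620), (55, 17584), (69, 17188)]

for reads, expected in logged_counts:
    estimate, error = contamination_from_log(reads, expected)
    print(f"contamination {estimate:.3f}, error {error:.5f}")

# prints:
# contamination 0.003, error 0.00044
# contamination 0.003, error 0.00042
# contamination 0.004, error 0.00048
```

The last line corresponds to the 0.004 reported at the end of the run.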

  • manolis (Member ✭✭)

    Sorry! It is about all the warnings.

    12:12:44.142 WARN KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (11) to segment; using all data points to calculate kernel matrix.
    12:12:44.143 WARN KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (11). Local changepoint costs will not be calculated for this window size.
    12:12:44.143 WARN KernelSegmenter - No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points
    12:12:44.321 WARN KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (59) to segment; using all data points to calculate kernel matrix.
    12:12:44.556 WARN KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (82) to segment; using all data points to calculate kernel matrix.
    etc.

    How can I fix them?

    Many thanks!
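
To see how widespread these warnings are in a long log, the data-point counts they report can be tallied. The sketch below assumes the warning wording shown above ("exceeds the number of data points (N) to segment") and uses 100 as the kernel dimension because that is the value the messages print; the function names are hypothetical, and the script only summarizes the log without changing how the tool behaves.

```python
import re

# Matches the "kernel approximation" warning and captures the data-point count.
POINTS_RE = re.compile(r"exceeds the number of data points \((\d+)\) to segment")


def data_points_in_warnings(log_text):
    """Return the data-point counts reported by the kernel approximation warnings."""
    return [int(match.group(1)) for match in POINTS_RE.finditer(log_text)]


def summarize_warnings(log_text, kernel_dimension=100):
    """Summarize how many warnings there are and how small the inputs were."""
    points = data_points_in_warnings(log_text)
    return {
        "warnings": len(points),
        "min_points": min(points) if points else None,
        "max_points": max(points) if points else None,
        "below_dimension": sum(p < kernel_dimension for p in points),
    }
```

Running `summarize_warnings(open(log_path).read())`, with `log_path` pointing at a saved log, shows how many segmentation calls had fewer data points than the kernel dimension.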

  • manolis (Member ✭✭)

    Hi @shlee, as always thank you very much for your time!
