Not getting any output from GetPileupSummaries

Hi,

GetPileupSummaries completes, seemingly without error, but the output table has no records; it contains only the header row. I have tried several things to resolve this but have not yet found the problem. Below I show the contents of the output file, the contents of the error log file, and the first few rows of the input VCF file. The input BAM is an exome-sequencing BAM. I checked and found that over 29,000 sites in the input file fall within the capture kit region, so I would expect many records in the output. Please let me know if you have any suggestions for getting this to work. If not, could you suggest an alternative approach for creating the input tables for CalculateContamination? Thanks.

$ cat ../output/GetPileupSummaries_sample.table

SAMPLE=sample

contig position ref_count alt_count other_alt_count allele_frequency
$ cat ../logs/GetPileupSummaries_sample.e1715758
01:51:01.533 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/research/gatk-4.1.1.0/gatk-package-4.1.1.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Apr 15, 2019 1:51:03 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
01:51:03.228 INFO GetPileupSummaries - ------------------------------------------------------------
01:51:03.228 INFO GetPileupSummaries - The Genome Analysis Toolkit (GATK) v4.1.1.0
01:51:03.228 INFO GetPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/
01:51:03.229 INFO GetPileupSummaries - Executing as [email protected] on Linux v2.6.32-754.6.3.el6.x86_64 amd64
01:51:03.229 INFO GetPileupSummaries - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_111-b14
01:51:03.229 INFO GetPileupSummaries - Start Date/Time: April 15, 2019 1:51:01 AM CDT
01:51:03.229 INFO GetPileupSummaries - ------------------------------------------------------------
01:51:03.229 INFO GetPileupSummaries - ------------------------------------------------------------
01:51:03.229 INFO GetPileupSummaries - HTSJDK Version: 2.19.0
01:51:03.229 INFO GetPileupSummaries - Picard Version: 2.19.0
01:51:03.229 INFO GetPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2
01:51:03.230 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
01:51:03.230 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
01:51:03.230 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
01:51:03.230 INFO GetPileupSummaries - Deflater: IntelDeflater
01:51:03.230 INFO GetPileupSummaries - Inflater: IntelInflater
01:51:03.230 INFO GetPileupSummaries - GCS max retries/reopens: 20
01:51:03.230 INFO GetPileupSummaries - Requester pays: disabled
01:51:03.230 WARN GetPileupSummaries -

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Warning: GetPileupSummaries is a BETA tool and is not yet ready for use in production

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

01:51:03.230 INFO GetPileupSummaries - Initializing engine
01:51:03.532 INFO FeatureManager - Using codec VCFCodec to read file file:///research/GATKresources/gatk-test-data__wgs_ubam__HCC1143T/af-only-gnomad.hg38.common.bialleliconly.canonicalonly.AFonly.vcf.gz
01:51:03.628 INFO FeatureManager - Using codec VCFCodec to read file file:///research/GATKresources/gatk-test-data__wgs_ubam__HCC1143T/af-only-gnomad.hg38.common.bialleliconly.canonicalonly.AFonly.vcf.gz
01:51:09.675 INFO IntervalArgumentCollection - Processing 3893864 bp from intervals
01:51:09.815 INFO GetPileupSummaries - Done initializing engine
01:51:09.815 INFO ProgressMeter - Starting traversal
01:51:09.816 INFO ProgressMeter - Current Locus Elapsed Minutes Loci Processed Loci/Minute
01:51:25.428 INFO ProgressMeter - chr1:6440859 0.3 1000 3843.2
01:51:35.668 INFO ProgressMeter - chr1:143476552 0.4 11000 25529.9
01:51:46.273 INFO ProgressMeter - chr1:216684973 0.6 19000 31269.7
01:51:56.664 INFO ProgressMeter - chr2:111011916 0.8 31000 39702.9
01:52:08.579 INFO ProgressMeter - chr3:52786965 1.0 44000 44926.2
01:52:18.730 INFO ProgressMeter - chr4:53788564 1.1 57000 49627.1
01:52:30.410 INFO ProgressMeter - chr5:148065222 1.3 73000 54346.5
01:52:41.138 INFO ProgressMeter - chr6:118615131 1.5 86000 56504.0
01:52:51.745 INFO ProgressMeter - chr7:99050742 1.7 98000 57687.2
01:53:01.813 INFO ProgressMeter - chr8:102360515 1.9 109000 58394.4
01:53:12.806 INFO ProgressMeter - chr10:3339720 2.0 122000 59517.0
01:53:22.994 INFO ProgressMeter - chr11:5425116 2.2 134000 60370.3
01:53:33.642 INFO ProgressMeter - chr12:8242332 2.4 146000 60907.4
01:53:44.178 INFO ProgressMeter - chr13:27900338 2.6 158000 61414.1
01:53:54.586 INFO ProgressMeter - chr15:34863204 2.7 171000 62268.6
01:54:05.829 INFO ProgressMeter - chr16:24964104 2.9 181000 61700.3
01:54:17.241 INFO ProgressMeter - chr17:39213163 3.1 191000 61144.8
01:54:27.416 INFO ProgressMeter - chr19:2408074 3.3 202000 61336.3
01:54:37.846 INFO ProgressMeter - chr19:52439481 3.5 211000 60856.9
01:54:48.734 INFO ProgressMeter - chr22:23928590 3.6 223000 61118.8
01:54:57.895 INFO GetPileupSummaries - 4313566 read(s) filtered by: (((((((((MappingQualityAvailableReadFilter AND MappingQualityNotZeroReadFilter) AND MappedReadFilter) AND PrimaryLineReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter) AND NonZeroReferenceLengthAlignmentReadFilter) AND MateOnSameContigOrNoMappedMateReadFilter) AND GoodCigarReadFilter) AND WellformedReadFilter)
4313566 read(s) filtered by: ((((((((MappingQualityAvailableReadFilter AND MappingQualityNotZeroReadFilter) AND MappedReadFilter) AND PrimaryLineReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter) AND NonZeroReferenceLengthAlignmentReadFilter) AND MateOnSameContigOrNoMappedMateReadFilter) AND GoodCigarReadFilter)
4313566 read(s) filtered by: (((((((MappingQualityAvailableReadFilter AND MappingQualityNotZeroReadFilter) AND MappedReadFilter) AND PrimaryLineReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter) AND NonZeroReferenceLengthAlignmentReadFilter) AND MateOnSameContigOrNoMappedMateReadFilter)
4296314 read(s) filtered by: ((((((MappingQualityAvailableReadFilter AND MappingQualityNotZeroReadFilter) AND MappedReadFilter) AND PrimaryLineReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter) AND NonZeroReferenceLengthAlignmentReadFilter)
4296314 read(s) filtered by: (((((MappingQualityAvailableReadFilter AND MappingQualityNotZeroReadFilter) AND MappedReadFilter) AND PrimaryLineReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter)
4296314 read(s) filtered by: ((((MappingQualityAvailableReadFilter AND MappingQualityNotZeroReadFilter) AND MappedReadFilter) AND PrimaryLineReadFilter) AND NotDuplicateReadFilter)
592640 read(s) filtered by: (((MappingQualityAvailableReadFilter AND MappingQualityNotZeroReadFilter) AND MappedReadFilter) AND PrimaryLineReadFilter)
583953 read(s) filtered by: ((MappingQualityAvailableReadFilter AND MappingQualityNotZeroReadFilter) AND MappedReadFilter)
583953 read(s) filtered by: (MappingQualityAvailableReadFilter AND MappingQualityNotZeroReadFilter)
583953 read(s) filtered by: MappingQualityNotZeroReadFilter
8687 read(s) filtered by: PrimaryLineReadFilter
3703674 read(s) filtered by: NotDuplicateReadFilter
17252 read(s) filtered by: MateOnSameContigOrNoMappedMateReadFilter

01:54:57.895 INFO ProgressMeter - chrX:143216908 3.8 231568 60917.8
01:54:57.895 INFO ProgressMeter - Traversal complete. Processed 231568 total loci in 3.8 minutes.
01:54:57.904 INFO GetPileupSummaries - Shutting down engine
[April 15, 2019 1:54:57 AM CDT] org.broadinstitute.hellbender.tools.walkers.contamination.GetPileupSummaries done. Elapsed time: 3.94 minutes.
Runtime.totalMemory()=3257925632
$ zcat /research/GATKresources/gatk-test-data__wgs_ubam__HCC1143T/af-only-gnomad.hg38.common.bialleliconly.canonicalonly.AFonly.vcf.gz | grep -v ^## | head -n 20

#CHROM POS ID REF ALT QUAL FILTER INFO

chr1 10583 rs58108140 G A 1052610 InbreedingCoeff AF=0.229
chr1 12783 . G A 10065800 InbreedingCoeff AF=0.556
chr1 13116 rs201725126 T G 21488400 InbreedingCoeff AF=0.532
chr1 13118 rs200579949 A G 21440300 InbreedingCoeff AF=0.531
chr1 13868 . A G 1610370 PASS AF=0.204
chr1 13896 rs201696125 C A 1115790 PASS AF=0.207
chr1 14464 . A T 3542820 PASS AF=0.204
chr1 14653 rs375086259 C T 836949 InbreedingCoeff AF=0.252
chr1 14699 rs372910670 C G 2774380 InbreedingCoeff AF=0.373
chr1 14907 rs79585140 A G 23517400 InbreedingCoeff AF=0.497
chr1 14930 rs75454623 A G 23449200 InbreedingCoeff AF=0.495
chr1 15118 rs71252250 A G 10335000 InbreedingCoeff AF=0.442
chr1 15190 rs200030104 G A 2612030 InbreedingCoeff AF=0.287
chr1 15688 . C T 1120870 InbreedingCoeff AF=0.223
chr1 16068 rs372319358 T C 2962330 InbreedingCoeff AF=0.466
chr1 16103 rs200358166 T G 7153330 InbreedingCoeff AF=0.535
chr1 16288 rs200736374 C G 1007750 InbreedingCoeff AF=0.278
chr1 16298 rs200451305 C T 8569780 InbreedingCoeff AF=0.538
chr1 16378 rs148220436 T C 27199800 InbreedingCoeff AF=0.535
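One thing that may be worth checking is the allele-frequency range of the input sites. As far as the GATK 4.1 documentation goes, GetPileupSummaries only considers sites whose population allele frequency lies between `--minimum-population-allele-frequency` and `--maximum-population-allele-frequency`, with defaults of 0.01 and 0.2; treat those defaults as an assumption and confirm them with `gatk GetPileupSummaries --help`. All 19 records shown above have AF greater than 0.2. The sketch below counts how many records of a VCF fall inside a given AF range. The function name `count_af_in_range` is hypothetical, and it assumes one `AF=` value per record and no whitespace inside the INFO column.

```shell
# Hypothetical helper: count VCF records whose INFO AF lies within [min_af, max_af].
# Reads an uncompressed VCF on stdin. Assumes a single AF= value per record and
# whitespace-free INFO fields, so awk's default field splitting is sufficient.
count_af_in_range() {
    min_af="$1"
    max_af="$2"
    awk -v lo="$min_af" -v hi="$max_af" '
        /^#/ { next }                              # skip meta and header lines
        {
            # find AF=<value> in the INFO column (column 8)
            if (match($8, /(^|;)AF=[^;]+/)) {
                af = substr($8, RSTART, RLENGTH)
                sub(/^;?AF=/, "", af)              # keep only the numeric part
                if (af + 0 >= lo && af + 0 <= hi) n++
            }
        }
        END { print n + 0 }                        # print 0 when nothing matched
    '
}
```

For example, `zcat "$vcf" | count_af_in_range 0.01 0.2`, where `$vcf` is the path to the gnomAD resource. If this prints 0 (or a very small number), the tool would have few or no sites left to report, and raising the maximum allele frequency or using a resource with lower-frequency sites may help.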

Answers

  • dbecker (Munich; Member)

    Hi,

    GetPileupSummaries needs a BAM file, not a VCF. I do it like this:

    gatk GetPileupSummaries \
                -V af_from_gnomad.vcf \
                -I sample.bam \
                -O sample_pileup.table
    

    Best,
    Daniel

  • bhanuGandham (Cambridge, MA; Member, Administrator, Broadie, Moderator)

    Thank you @dbecker for your input.

  • Rishabh (India; Member)

    @dbecker
    Hello, could you please explain how you generated the af_from_gnomad.vcf file? Thank you.

  • vpt (Member)

    @dbecker said:
    Hi,

    GetPileupSummaries needs a BAM file, not a VCF. I do it like this:

    gatk GetPileupSummaries \
                -V af_from_gnomad.vcf \
                -I sample.bam \
                -O sample_pileup.table
    

    Best,
    Daniel

    I did provide a BAM file as input. This should be evident from the log extract, which shows the number of reads removed by the various read filters, for example this line:

    "01:54:57.895 INFO GetPileupSummaries - 4313566 read(s) filtered by: (((((((((MappingQualityAvailableReadFilter AND MappingQualityNotZeroReadFilter) AND MappedReadFilter) AND PrimaryLineReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter) AND NonZeroReferenceLengthAlignmentReadFilter) AND MateOnSameContigOrNoMappedMateReadFilter) AND GoodCigarReadFilter) AND WellformedReadFilter)"

    Such read filtering could not occur for a VCF.

  • bhanuGandham (Cambridge, MA; Member, Administrator, Broadie, Moderator)

    Hi @vpt

    Please post the version of GATK and the exact command you used to help us debug this issue.
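The GATK version is recorded in the tool's own log (the question's log shows the line "The Genome Analysis Toolkit (GATK) v4.1.1.0"), so it can be pulled from there. A minimal sketch, assuming the log format shown in the question; the function name `gatk_version_from_log` is hypothetical:

```shell
# Hypothetical helper: print the GATK version recorded in a tool log file.
# Assumes the log contains a line like
#   "... The Genome Analysis Toolkit (GATK) v4.1.1.0"
gatk_version_from_log() {
    # In a basic regular expression the parentheses around GATK are literal;
    # \( ... \) captures the version string that follows the "v".
    sed -n 's/.*The Genome Analysis Toolkit (GATK) v\([^ ]*\).*/\1/p' "$1" | head -n 1
}
```

For the log in the question this would be called as `gatk_version_from_log ../logs/GetPileupSummaries_sample.e1715758`. The exact command line still has to be copied from the job script or shell history, since it does not appear in the log extract above.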
