Mandatory `--intervals` in GetPileupSummaries 4.0.8.1

dariober, Cambridge UK, Member
edited August 2018 in Ask the GATK team

Hello, I noticed that in GetPileupSummaries from GATK 4.0.8.1 the argument --intervals has become mandatory (it was optional in, at least, 4.0.4.0). What is the reason for this change? I cannot see it in the release notes. Is there any chance this is a mishap during updates?

As always, thanks a lot for the feedback!

Dario

gatk GetPileupSummaries 
Using GATK jar /home/db291g/applications/gatk/gatk-4.0.8.1/gatk-package-4.0.8.1-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/db291g/applications/gatk/gatk-4.0.8.1/gatk-package-4.0.8.1-local.jar GetPileupSummaries


**BETA FEATURE - WORK IN PROGRESS**

USAGE: GetPileupSummaries [arguments]

Tabulates pileup metrics for inferring contamination
Version:4.0.8.1


Required Arguments:

--input,-I:String             BAM/SAM/CRAM file containing reads  This argument must be specified at least once.
                              Required. 

--intervals,-L:String         One or more genomic intervals over which to operate  This argument must be specified at
                              least once. Required. 

--output,-O:File              The output table  Required. 

--variant,-V:FeatureInput     A VCF file containing variants and allele frequencies  Required. 

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @dariober
    Hi Dario,

    Have you tried running the tool without --intervals? Does it not run? This may very well be a mistake, but I need to check with the team.

    -Sheila

  • dariober, Cambridge UK, Member

    Hi Sheila,

    Yes, it seems to me the --intervals option is mandatory:

    gatk GetPileupSummaries -I WW00285a.bam -O tmp.out -V gnomad.genomes.r2.0.2.sites.hg38.simple.vcf.gz
    
    Using GATK jar /home/db291g/applications/gatk/gatk-4.0.8.1/gatk-package-4.0.8.1-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/db291g/applications/gatk/gatk-4.0.8.1/gatk-package-4.0.8.1-local.jar GetPileupSummaries -I medusa/WW00285a/bwa/WW00285a.bam -O tmp.out -V /scratch/dberaldi/ref/GRCh38/gnomad.genomes.r2.0.2.sites.hg38.simple.vcf.gz
    
    
    **BETA FEATURE - WORK IN PROGRESS**
    
    USAGE: GetPileupSummaries [arguments]
    
    Tabulates pileup metrics for inferring contamination
    Version:4.0.8.1
    
    
    Required Arguments:
    
    --input,-I:String             BAM/SAM/CRAM file containing reads  This argument must be specified at least once.
                                  Required. 
    
    --intervals,-L:String         One or more genomic intervals over which to operate  This argument must be specified at
                                  least once. Required. 
    
    --output,-O:File              The output table  Required. 
    
    --variant,-V:FeatureInput     A VCF file containing variants and allele frequencies  Required. 
    
    
    Optional Arguments:
    
    --add-output-sam-program-record,-add-output-sam-program-record:Boolean
                                  If true, adds a PG tag to created SAM/BAM/CRAM files.  Default value: true. Possible
                                  values: {true, false} 
    
    --add-output-vcf-command-line,-add-output-vcf-command-line:Boolean
                                  If true, adds a command line header line to created VCF files.  Default value: true.
                                  Possible values: {true, false} 
    
    --arguments_file:File         read one or more arguments files and add them to the command line  This argument may be
                                  specified 0 or more times. Default value: null. 
    
    --cloud-index-prefetch-buffer,-CIPB:Integer
                                  Size of the cloud-only prefetch buffer (in MB; 0 to disable). Defaults to
                                  cloudPrefetchBuffer if unset.  Default value: -1. 
    
    --cloud-prefetch-buffer,-CPB:Integer
                                  Size of the cloud-only prefetch buffer (in MB; 0 to disable).  Default value: 40. 
    
    --create-output-bam-index,-OBI:Boolean
                                  If true, create a BAM/CRAM index when writing a coordinate-sorted BAM/CRAM file.  Default
                                  value: true. Possible values: {true, false} 
    
    --create-output-bam-md5,-OBM:Boolean
                                  If true, create a MD5 digest for any BAM/SAM/CRAM file created  Default value: false.
                                  Possible values: {true, false} 
    
    --create-output-variant-index,-OVI:Boolean
                                  If true, create a VCF index when writing a coordinate-sorted VCF file.  Default value:
                                  true. Possible values: {true, false} 
    
    --create-output-variant-md5,-OVM:Boolean
                                  If true, create a a MD5 digest any VCF file created.  Default value: false. Possible
                                  values: {true, false} 
    
    --disable-bam-index-caching,-DBIC:Boolean
                                  If true, don't cache bam indexes, this will reduce memory requirements but may harm
                                  performance if many intervals are specified.  Caching is automatically disabled if there
                                  are no intervals specified.  Default value: false. Possible values: {true, false} 
    
    --disable-read-filter,-DF:String
                                  Read filters to be disabled before analysis  This argument may be specified 0 or more
                                  times. Default value: null. Possible Values: {GoodCigarReadFilter, MappedReadFilter,
                                  MappingQualityAvailableReadFilter, MappingQualityNotZeroReadFilter,
                                  MateOnSameContigOrNoMappedMateReadFilter, NonZeroReferenceLengthAlignmentReadFilter,
                                  NotDuplicateReadFilter, PassesVendorQualityCheckReadFilter, PrimaryLineReadFilter,
                                  WellformedReadFilter}
    
    --disable-sequence-dictionary-validation,-disable-sequence-dictionary-validation:Boolean
                                  If specified, do not check the sequence dictionaries from our inputs for compatibility.
                                  Use at your own risk!  Default value: false. Possible values: {true, false} 
    
    --exclude-intervals,-XL:StringOne or more genomic intervals to exclude from processing  This argument may be specified 0
                                  or more times. Default value: null. 
    
    --gatk-config-file:String     A configuration file to use with the GATK.  Default value: null. 
    
    --gcs-max-retries,-gcs-retries:Integer
                                  If the GCS bucket channel errors out, how many times it will attempt to re-initiate the
                                  connection  Default value: 20. 
    
    --help,-h:Boolean             display the help message  Default value: false. Possible values: {true, false} 
    
    --interval-exclusion-padding,-ixp:Integer
                                  Amount of padding (in bp) to add to each interval you are excluding.  Default value: 0. 
    
    --interval-merging-rule,-imr:IntervalMergingRule
                                  Interval merging rule for abutting intervals  Default value: ALL. Possible values: {ALL,
                                  OVERLAPPING_ONLY} 
    
    --interval-padding,-ip:IntegerAmount of padding (in bp) to add to each interval you are including.  Default value: 0. 
    
    --interval-set-rule,-isr:IntervalSetRule
                                  Set merging approach to use for combining interval inputs  Default value: UNION. Possible
                                  values: {UNION, INTERSECTION} 
    
    --lenient,-LE:Boolean         Lenient processing of VCF files  Default value: false. Possible values: {true, false} 
    
    --maxDepthPerSample,-maxDepthPerSample:Integer
                                  Maximum number of reads to retain per sample per locus. Reads above this threshold will be
                                  downsampled. Set to 0 to disable.  Default value: 0. 
    
    --maximum-population-allele-frequency,-max-af:Double
                                  Maximum population allele frequency of sites to consider.  Default value: 0.2. 
    
    --min-mapping-quality,-mmq:Integer
                                  Minimum read mapping quality  Default value: 50. 
    
    --minimum-population-allele-frequency,-min-af:Double
                                  Minimum population allele frequency of sites to consider.  A low value increases accuracy
                                  at the expense of speed.  Default value: 0.01. 
    
    --QUIET:Boolean               Whether to suppress job-summary info on System.err.  Default value: false. Possible
                                  values: {true, false} 
    
    --read-filter,-RF:String      Read filters to be applied before analysis  This argument may be specified 0 or more
                                  times. Default value: null. Possible Values: {AlignmentAgreesWithHeaderReadFilter,
                                  AllowAllReadsReadFilter, AmbiguousBaseReadFilter, CigarContainsNoNOperator,
                                  FirstOfPairReadFilter, FragmentLengthReadFilter, GoodCigarReadFilter,
                                  HasReadGroupReadFilter, LibraryReadFilter, MappedReadFilter,
                                  MappingQualityAvailableReadFilter, MappingQualityNotZeroReadFilter,
                                  MappingQualityReadFilter, MatchingBasesAndQualsReadFilter, MateDifferentStrandReadFilter,
                                  MateOnSameContigOrNoMappedMateReadFilter, MetricsReadFilter,
                                  NonZeroFragmentLengthReadFilter, NonZeroReferenceLengthAlignmentReadFilter,
                                  NotDuplicateReadFilter, NotOpticalDuplicateReadFilter, NotSecondaryAlignmentReadFilter,
                                  NotSupplementaryAlignmentReadFilter, OverclippedReadFilter, PairedReadFilter,
                                  PassesVendorQualityCheckReadFilter, PlatformReadFilter, PlatformUnitReadFilter,
                                  PrimaryLineReadFilter, ProperlyPairedReadFilter, ReadGroupBlackListReadFilter,
                                  ReadGroupReadFilter, ReadLengthEqualsCigarLengthReadFilter, ReadLengthReadFilter,
                                  ReadNameReadFilter, ReadStrandFilter, SampleReadFilter, SecondOfPairReadFilter,
                                  SeqIsStoredReadFilter, ValidAlignmentEndReadFilter, ValidAlignmentStartReadFilter,
                                  WellformedReadFilter}
    
    --read-index,-read-index:String
                                  Indices to use for the read inputs. If specified, an index must be provided for every read
                                  input and in the same order as the read inputs. If this argument is not specified, the
                                  path to the index for each input will be inferred automatically.  This argument may be
                                  specified 0 or more times. Default value: null. 
    
    --read-validation-stringency,-VS:ValidationStringency
                                  Validation stringency for all SAM/BAM/CRAM/SRA files read by this program.  The default
                                  stringency value SILENT can improve performance when processing a BAM file in which
                                  variable-length data (read, qualities, tags) do not otherwise need to be decoded.  Default
                                  value: SILENT. Possible values: {STRICT, LENIENT, SILENT} 
    
    --reference,-R:String         Reference sequence  Default value: null. 
    
    --seconds-between-progress-updates,-seconds-between-progress-updates:Double
                                  Output traversal statistics every time this many seconds elapse  Default value: 10.0. 
    
    --sequence-dictionary,-sequence-dictionary:String
                                  Use the given sequence dictionary as the master/canonical sequence dictionary.  Must be a
                                  .dict file.  Default value: null. 
    
    --sites-only-vcf-output:Boolean
                                  If true, don't emit genotype fields when writing vcf file output.  Default value: false.
                                  Possible values: {true, false} 
    
    --TMP_DIR:File                Undocumented option  This argument may be specified 0 or more times. Default value: null. 
    
    --use-jdk-deflater,-jdk-deflater:Boolean
                                  Whether to use the JdkDeflater (as opposed to IntelDeflater)  Default value: false.
                                  Possible values: {true, false} 
    
    --use-jdk-inflater,-jdk-inflater:Boolean
                                  Whether to use the JdkInflater (as opposed to IntelInflater)  Default value: false.
                                  Possible values: {true, false} 
    
    --verbosity,-verbosity:LogLevel
                                  Control verbosity of logging.  Default value: INFO. Possible values: {ERROR, WARNING,
                                  INFO, DEBUG} 
    
    --version:Boolean             display the version number for this tool  Default value: false. Possible values: {true,
                                  false} 
    
    
    Advanced Arguments:
    
    --disable-tool-default-read-filters,-disable-tool-default-read-filters:Boolean
                                  Disable all tool default read filters (WARNING: many tools will not function correctly
                                  without their default read filters on)  Default value: false. Possible values: {true,
                                  false} 
    
    --showHidden,-showHidden:Boolean
                                  display hidden arguments  Default value: false. Possible values: {true, false} 
    
    
    ***********************************************************************
    
    A USER ERROR has occurred: Argument intervals was missing: Argument 'intervals' is required.
    
    ***********************************************************************
    Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
    
  • Tiffany_at_Broad, Cambridge, MA (Member, Broadie, Moderator, admin)
    edited September 2018

    Hi @dariober, I ran into this as well. When I passed the VCF file to --intervals (-L), the tool completed. I raised a ticket so that the documentation in the tool gets updated: it says --intervals needs a string, but I passed a file.

    For example:
    gatk GetPileupSummaries \
    -I bams/tumor.bam \
    -V resources/chr17_small_exac_common_3_grch38.vcf.gz \
    -L resources/chr17_small_exac_common_3_grch38.vcf.gz \
    -O sandbox/7_tumor_getpileupsummaries.table
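
The workaround above can be wrapped in a small shell helper. This is a minimal sketch, not an official GATK recipe: the function name `pileup_cmd` is hypothetical, it uses only the -I, -V, -L and -O flags shown in this thread, and it only prints the command (a dry run) so that it can be inspected before anything is executed.

```shell
# Hypothetical helper (not part of GATK): print a GetPileupSummaries command
# that passes the same sites VCF to both -V and -L, as in the workaround above.
# It only prints the command; nothing is executed.
pileup_cmd() {
    bam=$1   # input BAM/SAM/CRAM
    vcf=$2   # sites VCF, reused as the interval source
    out=$3   # output table
    printf 'gatk GetPileupSummaries -I %s -V %s -L %s -O %s\n' \
        "$bam" "$vcf" "$vcf" "$out"
}

pileup_cmd bams/tumor.bam \
    resources/chr17_small_exac_common_3_grch38.vcf.gz \
    sandbox/7_tumor_getpileupsummaries.table
```

Printing rather than running keeps the sketch safe to try without GATK installed. The printed line can be copied into a terminal once the paths look right; the helper does no quoting, so it assumes paths without spaces or shell metacharacters.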

  • FPBarthel, Houston, Member

    I noticed this change as well. It says here that -L and -V can be identical?

  • bhanuGandham (Member, Administrator, Broadie, Moderator, admin)

    Hi @FPBarthel

    I am sorry, but I didn't quite catch the question you are asking.
    But here is an explanation that might help:

    Although the sites (-L) and variants (-V) resources will often be identical, this need not be the case. For example,
    gatk GetPileupSummaries \
    -I normal.bam \
    -V gnomad.vcf.gz \
    -L common_snps.interval_list \
    -O pileups.table

    attempts to get pileups at a list of common SNPs and emits output for those sites that are present in gnomAD, using the allele frequencies from gnomAD. Note that the sites may be a subset of the variants, the variants may be a subset of the sites, or the two may overlap partially. In all cases, pileup summaries are emitted for the overlap and nowhere else. The most common use case in which sites and variants differ is when the variants resource is a large file and the sites resource is an interval list subset from that file.

    I hope this helps.

    Regards
    Bhanu
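
The overlap rule described above can be illustrated with a toy sketch. This is not how GATK implements it. It assumes each resource has been reduced to a plain text file with one `contig:position` key per line (a simplification, since real -L inputs can be ranges), and the function name `overlap_sites` and the file names are hypothetical.

```shell
# Toy illustration only: emit the keys present in BOTH the sites list (-L)
# and the variants list (-V). Each file holds one "contig:position" per line.
overlap_sites() {
    sites=$1      # stand-in for the -L resource
    variants=$2   # stand-in for the -V resource
    # -F fixed strings, -x whole-line match, -f read patterns from a file
    grep -Fxf "$sites" "$variants"
}

tmp=$(mktemp -d)
printf 'chr17:100\nchr17:200\nchr17:300\n' > "$tmp/sites.txt"
printf 'chr17:200\nchr17:300\nchr17:400\n' > "$tmp/variants.txt"
overlap_sites "$tmp/sites.txt" "$tmp/variants.txt"   # prints chr17:200 and chr17:300
rm -r "$tmp"
```

Only the two positions shared by both files are printed, which mirrors the statement that output is produced for the overlap and nowhere else, whichever resource is the larger one.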
