
ASEReadCounter error

Hi, I am having a problem similar to the one in this thread.
I am running ASEReadCounter on RNA-seq data and I get this error:

  • gatk ASEReadCounter -I /mnt/beegfs/Steph_WKDIR/1XXXXXXX_Single.bam -V filtered_Phased_1831.vcf.gz -R /mnt/XXXXXXs/Genomes/genome_hg19/hg19.fa -O ASE_1831.csv
    14:56:58.684 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/XXXXXXX/tools/gatk-4.1.3.0/gatk-package-4.1.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Oct 25, 2019 2:57:00 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    14:57:00.552 INFO ASEReadCounter - ------------------------------------------------------------
    14:57:00.553 INFO ASEReadCounter - The Genome Analysis Toolkit (GATK) v4.1.3.0
    14:57:00.554 INFO ASEReadCounter - For support and documentation go to https://software.broadinstitute.org/gatk/
    14:57:00.555 INFO ASEReadCounter - Executing as [email protected] on Linux v3.10.0-514.el7.x86_64 amd64
    14:57:00.556 INFO ASEReadCounter - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_102-b14
    14:57:00.557 INFO ASEReadCounter - Start Date/Time: October 25, 2019 2:56:58 PM PDT
    14:57:00.558 INFO ASEReadCounter - ------------------------------------------------------------
    14:57:00.559 INFO ASEReadCounter - ------------------------------------------------------------
    14:57:00.560 INFO ASEReadCounter - HTSJDK Version: 2.20.1
    14:57:00.561 INFO ASEReadCounter - Picard Version: 2.20.5
    14:57:00.562 INFO ASEReadCounter - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    14:57:00.563 INFO ASEReadCounter - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    14:57:00.563 INFO ASEReadCounter - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    14:57:00.564 INFO ASEReadCounter - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    14:57:00.565 INFO ASEReadCounter - Deflater: IntelDeflater
    14:57:00.566 INFO ASEReadCounter - Inflater: IntelInflater
    14:57:00.570 INFO ASEReadCounter - GCS max retries/reopens: 20
    14:57:00.571 INFO ASEReadCounter - Requester pays: disabled
    14:57:00.571 INFO ASEReadCounter - Initializing engine
    WARNING: BAM index file /mnt/XXXXE_Single.bai is older than BAM /mnt/XXXXX_WKDIR/1831_CD4_NAIVE_Single.bam
    14:57:01.356 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/XXXXX/filtered_Phased_1831.vcf.gz
    14:57:01.521 INFO ASEReadCounter - Done initializing engine
    14:57:01.523 INFO ProgressMeter - Starting traversal
    14:57:01.524 INFO ProgressMeter - Current Locus Elapsed Minutes Loci Processed Loci/Minute
    14:57:06.958 INFO ASEReadCounter - Shutting down engine
    [October 25, 2019 2:57:06 PM PDT] org.broadinstitute.hellbender.tools.walkers.rnaseq.ASEReadCounter done. Elapsed time: 0.14 minutes.
    Runtime.totalMemory()=3250061312

A USER ERROR has occurred: More then one variant context at position: chr1:11125729


Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
Using GATK jar /......and so on. Please do not pay attention to the paths :)

1- I am sure that my VCF does not have duplicate variants. I used SelectVariants to remove multiallelic variants and awk to remove variants duplicated by position. For example, this is my output for this location:
$ zcat myvcf.vcf.gz |grep 11125729
chr1 11125729 rs2039841:11125729:T:C T C . PASS . GT 0|1

2- I get ASE output up to this position, so that might rule out a formatting problem?
In fact, this error first came up at a different position
(chr1 9355278 rs4080311:9355278:T:A T A . PASS . GT 0|1),
which I removed before rerunning on the new VCF. This led to more output, and the run then stopped at this locus. I have whole human genomes, so it is not practical to remove loci manually.
Please help. Any input is welcome
Thanks

Best Answers

  • Steph_UC
    edited October 2019 Accepted Answer

    Thanks for your feedback, bhanuGandham.
    Update: here is an overview of the solution.
    Removing duplicates by position worked, but the error persisted because this SNP overlaps an indel: the deletion that starts at chr1:11125727 spans position 11125729, as this query shows:
    gatk SelectVariants -R XXX/hg19/hg19.fa -V XXX_hg19.recode.vcf.gz -sn Mysample -L chr1:11125729 -O XXX/chr1_11125729.vcf

    CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Mysample
    chr1 11125727 rs138698709:11125727:TTTAG:T TTTAG T . PASS AC=1;AF=0.500;AN=2 GT 0|1
    chr1 11125729 rs2039841:11125729:T:C T C . PASS AC=1;AF=0.500;AN=2 GT 0|1
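
The overlap shown above can be searched for across a whole VCF rather than one locus at a time. The following is a minimal sketch, not part of GATK: `find_overlapped_sites` is a hypothetical helper name, and it assumes a tab-delimited, position-sorted VCF on standard input. It treats each record as covering POS through POS + length(REF) - 1 and reports any later record on the same contig whose POS falls inside that span. True same-position duplicates are reported as well.

```shell
# Hypothetical helper (not a GATK tool): list records whose POS lies inside the
# REF span of an earlier record on the same contig.
# Assumes a tab-delimited, position-sorted VCF on stdin.
find_overlapped_sites() {
  awk -F'\t' '
    /^#/ { next }   # skip header lines
    {
      chrom = $1
      pos = $2 + 0
      # Flag the record if an earlier record on this contig still covers POS.
      if (chrom == prev_chrom && pos <= prev_end)
        print chrom "\t" pos "\t" $4 "\t" $5 "\toverlapped_by\t" prev_desc
      # Remember the record that reaches farthest right: end = POS + len(REF) - 1.
      end = pos + length($4) - 1
      if (chrom != prev_chrom || end > prev_end) {
        prev_chrom = chrom
        prev_end = end
        prev_desc = chrom ":" pos ":" $4 ":" $5
      }
    }'
}

# Example use (file name taken from the command in the question):
#   zcat filtered_Phased_1831.vcf.gz | find_overlapped_sites
```

For the two records above, this would report the SNP at chr1:11125729 as overlapped by the TTTAG>T deletion at chr1:11125727, which is exactly the pair that a grep on the position alone does not reveal.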

    I am now running SelectVariants with the -select-type SNP option on the whole dataset before ASEReadCounter, and will update if that works.
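
The SNP-only filtering step mentioned above could look roughly like the sketch below. The wrapper name `select_snps_only` and its three positional arguments are hypothetical. `--select-type-to-include` is the long form of the `-select-type` option of SelectVariants. Whether this step resolves the ASEReadCounter error is not confirmed in this thread.

```shell
# Hypothetical wrapper around SelectVariants; all three paths are supplied by
# the caller. Keeps only SNP records, so indels that span a SNP position are
# dropped from the VCF before it is passed to ASEReadCounter.
select_snps_only() {
  ref=$1      # reference FASTA
  in_vcf=$2   # input VCF
  out_vcf=$3  # SNP-only output VCF
  gatk SelectVariants \
    -R "$ref" \
    -V "$in_vcf" \
    --select-type-to-include SNP \
    -O "$out_vcf"
}

# Example use (placeholder paths):
#   select_snps_only hg19.fa input.vcf.gz snps_only.vcf.gz
```

Note that this removes every indel from the file, not only those that overlap a SNP, which is acceptable here because ASEReadCounter counts reads at SNP sites.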

