Holiday Notice:
The Frontline Support team will be slow to respond December 17-18 due to an institute-wide retreat and offline December 22- January 1, while the institute is closed. Thank you for your patience during these next few weeks. Happy Holidays!

about ASEReadCounter

BogdanBogdan Palo Alto, CAMember ✭✭

Dear all,

i am using ASEReadCounter in order to count the number of reads per variant in a BAM file, and somehow related to a previous post (below), I am encountering a similar error :

"MESSAGE: More then one variant context at position: chr19:125517"

i.e. in the vcf file, there are 2 entries for the same position :

chr19 125517 . A G 42.01 . AC1=1;AF1=0.5;BQB=0.950129;DP=43;DP4=12,7,4,1;FQ=45.0154;MQ=46;MQ0F=0;MQB=0.984335;MQSB=0.998127;PV4=0.631094,1,1,1;RPB=1;SGB=-0.590765;VDB=0.233642 GT:PL 0/1:72,0,255

chr19 125517 . AA AAGAGA 5.79 . AC1=1;AF1=0.499984;DP=43;DP4=7,5,5,0;FQ=8.19012;IDV=2;IMF=0.0444444;INDEL;MQ=45;MQ0F=0;MQSB=0.99446;PV4=0.244505,1,0.0559047,0.273348;SGB=-0.590765;VDB=0.125771 GT:PL 0/1:42,0,151

The question would be : is there any way in GATK to remove these sites ? Of course, i could do it with a simple script outside GATK, although doing it outside GATK may complicate a bit the pipeline. Thank you very much !

-- bogdan

ps : the previous post was :
http://gatkforums.broadinstitute.org/gatk/discussion/comment/30752#Comment_30752

Best Answer

Answers

  • BogdanBogdan Palo Alto, CAMember ✭✭

    Oh, well, ;ve done it with the options:

    -selectType SNP \
    -restrictAllelesTo BIALLELIC

    and the full command is :smile:

    $GATK \
    -T SelectVariants \
    -R $REFERENCE_HG38 \
    -select 'vc.getGenotype("7G").isHet()' \
    -V 7G.GRCh38p5M.MD_samtools.germline.HETERO.chr19.vcf \
    -o 7G.GRCh38p5M.MD_samtools.germline.HETERO.chr19.vcf.HET.SNP.BIALLELIC.vcf \
    -selectType SNP \
    -restrictAllelesTo BIALLELIC

  • BogdanBogdan Palo Alto, CAMember ✭✭

    Good morning. if i may add another question, please : would it be perfectly legitimate to ASEReadCounter in order to compute the allele counts in DNA (and WGS), beside RNA-seq. I know that ASEReadCounter has been developed originally for RNA-seq allele analysis.

    thank you very much !

  • BogdanBogdan Palo Alto, CAMember ✭✭

    Dear Sheila, thank you, and have a good weekend !

  • melnueschmelnuesch UruguayMember
    edited May 4

    Hello all, I also got that error message plus a lot of warning messages that say "Ignoring site: variant is not het at position: chr1:(numbers)". I googled the message but I did not find anything about this particular one. Any thoughts? I am doing allele specific expression analysis in Illumina RNA-seq data from two NK-cell subtypes cell lines (so, two samples, in separated vcfs.. I already did variant calling). I am using the vcf produced in the variant calling (HaplotypeCaller), and the input bam is the one produced after the BaseRecalibrator step (following the GATK best practices pipeline for variant calling in RNA-seq; basically, I followed that pipeline, for you to have an idea of what I did to my data). My GATK version is v4.0.3.0. Thank you very much!

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @melnuesch
    Hi,

    ASEReadCounter ignores any sites that are not het, as it is not useful to output allele specific expression in homozygous sites.

    -Sheila

  • vsvintivsvinti Member ✭✭

    Hi @Sheila

    I'm posting my question here since the thread title relates very generally to ASEReadCounter.

    I would like to run ASEReadCounter on a set of genotypes I have from genotyping arrays (in addition to the vcf calls generated from the rnaseq data itself).

    I have bgens with imputed genotypes (and also dosages), which I can convert into vcf-like format, with the exception that the alleles are not REF and ALT, but they mapped to alleleA and alleleB on the reference panels. I would prefer to get counts for the alleles in this format - as it is consistent with our gwas results.

    Does ASEReadCounter require the alleles to be REF and ALT, in line with vcf specification, or can it calculate counts for whatever alleles I give it?

    Thanks
    Vicky

  • bhanuGandhambhanuGandham Member, Administrator, Broadie, Moderator admin

    HI @vsvinti

    ASEReadCounter requires the alleles to be REF and ALT, in line with vcf specification. The output headers look like this: contig position variantID refAllele altAllele refCount altCount totalCount lowMAPQDepth lowBaseQDepth rawDepth otherBases improperPairs

    Hope this helps.

    Regards
    Bhanu

Sign In or Register to comment.