The frontline support team will be slow on the forum because we are occupied with the GATK Workshop on March 21st and 22nd 2019. We will be back and more available to answer questions on the forum on March 25th 2019.

Questions about the RNAseq variant discovery workflow



  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin


    Have a look at this article.


  • YogeshYogesh south koreaMember

    Hi ,

    I am working on SNP detection on multiple pair end RNAseq library:

    after indel realignment, I want perform base quality score recalibration by using these two steps, I do not have any information about known SNP,INDEL.VCF. If I do not provide this option is it ok?

    java -jar GenomeAnalysisTK.jar \
    -T BaseRecalibrator \
    -R refgenome.fasta\
    -knownSites known_snps_indels.vcf \
    -I sample1.sorted.dedup.realigned.fixmate.bam \
    -o sample1.sorted.dedup.realigned.fixmate.recal_data.table \
    -cov ReadGroupCovariate \
    -cov QualityScoreCovariate \
    -cov CycleCovariate

    java -jar GenomeAnalysisTK.jar \
    -T PrintReads \
    -R refgenome.fasta \
    -BQSR sample1.sorted.dedup.realigned.fixmate.recal_data.table \
    -I sample1.sorted.dedup.realigned.fixmate.bam \
    -o sample1.sorted.dedup.realigned.fixmate.recal.bam

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin


    You should not use BQSR if you do not have known variant sites. This is because BQSR considers all differences in your reads from the reference to be errors. If those differences are indeed common variants in the population, they will be seen as errors and penalized. This will cause the scores to be lowered for no reason. If you input a known sites file, those sites will be masked and not considered as errors.

    If you do not have a known sites file, you can try bootstrapping. Have a look under "I'm working on a genome that doesn't really have a good SNP database yet. I'm wondering if it still makes sense to run base quality score recalibration without known SNPs." here.


  • claushclaush BaselMember

    Many thanks for this impressive workflow.
    May I kindly ask if testing on how the tools perform on RNAseq for multiple samples have continued and what is the outcome?
    Looking forward to your reply.
    Kind regards,

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    Hi Claus,

    I think testing halted for a while, but we plan to publish new workflows in GATK4.


  • claushclaush BaselMember

    Hello Sheila,
    Many thanks for this clarification.

  • picard_gatk_mjpicard_gatk_mj Unconfirmed

    @Geraldine_VdAuwera said:
    Hi David,

    By default HaplotypeCaller only outputs variant sites. To also output non-variant sites, you need to use the -ERC GVCF mode (for compressed/banded GVCF output) or -ERC BP_RESOLUTION (for per-site output) (see the FAQ doc on GVCFs for details).

    but u did not tell us to use in gvcf, so is it properly, use HaplotypeCaller and then java -jar GenomeAnalysisTK.jar \
    -T GenotypeGVCFs \
    -R reference.fasta \
    --variant sample1.g.vcf \
    --variant sample2.g.vcf \
    -o output.vcf

    in rna-seq;

    looking forwarding to your fast reply

    java -jar GenomeAnalysisTK.jar -T java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ref.fasta -I input.bam -dontUseSoftClippedBases -stand_call_conf 20.0 -o output.vcf -R ref.fasta -I input.bam -dontUseSoftClippedBases -stand_call_conf 20.0 -o output.vcf

  • picard_gatk_mjpicard_gatk_mj Unconfirmed

    @Mikebesanski said:
    Thanks a lot @ami‌.
    I'm currently running the pipeline on RNA-seq data in a "two step manner" (HC + GenotypeGVCFs) and without any downsampling.
    As soon as I perform some comparisons (with the "one step manner" with HC only and/or with downsampling), it will be a pleasure to let you know.

    I'm currently running the pipeline on RNA-seq data in a "two step manner" (HC + GenotypeGVCFs) and without any downsampling.

    hi , is this suitable? thanks

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin


    Are you asking how to get all sites (both variant and non-variant) in your output VCF? If so, you need to run GenotypeGVCFs with --includeNonVariantSites.

    Note, we have not validated RNA seq pipeline with GVCF workflow, but some people have used it. Please make sure to properly validate your results when using it.


  • Zea1nfOZea1nfO Member

    when i use haplotypecaller to call snps and indel with rna-seq,and i get several samples data.
    should i use the -ERC GVCF ,and then combine them together?

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    Hi @Zea1nfO

    We currently only support variant calling on one rna seq sample at a time and do not support joint calling for rna seq variant calling.


Sign In or Register to comment.