Questions about the RNAseq variant discovery workflow

Comments

  • Sheila (Broad Institute) Member, Broadie, Moderator admin

    @picasa1983
    Hi,

    Have a look at this article.

    -Sheila

  • Yogesh (South Korea) Member

    Hi,

    I am working on SNP detection in multiple paired-end RNA-seq libraries.

    After indel realignment, I want to perform base quality score recalibration using the two steps below, but I do not have a known-sites VCF (known SNPs and indels). Is it OK to omit the -knownSites option?

    java -jar GenomeAnalysisTK.jar \
    -T BaseRecalibrator \
    -R refgenome.fasta \
    -knownSites known_snps_indels.vcf \
    -I sample1.sorted.dedup.realigned.fixmate.bam \
    -o sample1.sorted.dedup.realigned.fixmate.recal_data.table \
    -cov ReadGroupCovariate \
    -cov QualityScoreCovariate \
    -cov CycleCovariate

    java -jar GenomeAnalysisTK.jar \
    -T PrintReads \
    -R refgenome.fasta \
    -BQSR sample1.sorted.dedup.realigned.fixmate.recal_data.table \
    -I sample1.sorted.dedup.realigned.fixmate.bam \
    -o sample1.sorted.dedup.realigned.fixmate.recal.bam

  • Sheila (Broad Institute) Member, Broadie, Moderator admin

    @Yogesh
    Hi,

    You should not use BQSR if you do not have known variant sites. This is because BQSR considers all differences in your reads from the reference to be errors. If those differences are indeed common variants in the population, they will be seen as errors and penalized. This will cause the scores to be lowered for no reason. If you input a known sites file, those sites will be masked and not considered as errors.

    If you do not have a known sites file, you can try bootstrapping. Have a look under "I'm working on a genome that doesn't really have a good SNP database yet. I'm wondering if it still makes sense to run base quality score recalibration without known SNPs." here.

    -Sheila
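
The bootstrapping approach mentioned above can be sketched as a shell function. This is only a minimal sketch under stated assumptions, not a validated procedure: the file names, the round numbering, and the FS/QD thresholds are illustrative, and `run` and `bootstrap_round` are hypothetical helpers. With DRY_RUN=1 (the default here) the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch of one bootstrapping round for BQSR when no known-sites file exists.
# DRY_RUN=1 (default) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"
GATK="java -jar GenomeAnalysisTK.jar"
REF="refgenome.fasta"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# bootstrap_round ORIG_BAM CALL_BAM ROUND
#   ORIG_BAM: the original, not-yet-recalibrated BAM
#   CALL_BAM: the BAM to call variants on (ORIG_BAM in round 1)
#   ROUND:    round number, used only to name the output files
bootstrap_round() {
    orig_bam="$1"
    call_bam="$2"
    prefix="round$3"

    # 1. Call variants with the RNA-seq settings used in this thread.
    run $GATK -T HaplotypeCaller -R "$REF" -I "$call_bam" \
        -dontUseSoftClippedBases -stand_call_conf 20.0 -o "$prefix.raw.vcf"

    # 2. Flag low-confidence calls (thresholds are assumptions, tune them).
    run $GATK -T VariantFiltration -R "$REF" -V "$prefix.raw.vcf" \
        -filterName FS -filter "FS > 30.0" \
        -filterName QD -filter "QD < 2.0" -o "$prefix.flagged.vcf"

    # 3. Keep only the calls that passed the filters.
    run $GATK -T SelectVariants -R "$REF" -V "$prefix.flagged.vcf" \
        --excludeFiltered -o "$prefix.known.vcf"

    # 4. Build the recalibration table, masking the bootstrapped sites.
    run $GATK -T BaseRecalibrator -R "$REF" -I "$orig_bam" \
        -knownSites "$prefix.known.vcf" -o "$prefix.recal_data.table"

    # 5. Apply the recalibration to the original BAM.
    run $GATK -T PrintReads -R "$REF" -I "$orig_bam" \
        -BQSR "$prefix.recal_data.table" -o "$prefix.recal.bam"
}

# Round 1: call and recalibrate on the same original BAM.
bootstrap_round sample1.sorted.dedup.realigned.fixmate.bam \
    sample1.sorted.dedup.realigned.fixmate.bam 1
```

A second round would call variants on round1.recal.bam while still recalibrating the original BAM, for example `bootstrap_round sample1.sorted.dedup.realigned.fixmate.bam round1.recal.bam 2`, and the rounds would be repeated until the recalibration tables stop changing noticeably.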

  • claush (Basel) Member

    Hello,
    Many thanks for this impressive workflow.
    May I kindly ask whether testing of how the tools perform on RNA-seq with multiple samples has continued, and what the outcome was?
    Looking forward to your reply.
    Kind regards,
    Claus

  • Sheila (Broad Institute) Member, Broadie, Moderator admin

    @claush
    Hi Claus,

    I think testing halted for a while, but we plan to publish new workflows in GATK4.

    -Sheila

  • claush (Basel) Member

    Hello Sheila,
    Many thanks for this clarification.
    Best,
    Claus

  • picard_gatk_mj Unconfirmed

    @Geraldine_VdAuwera said:
    Hi David,

    By default HaplotypeCaller only outputs variant sites. To also output non-variant sites, you need to use the -ERC GVCF mode (for compressed/banded GVCF output) or -ERC BP_RESOLUTION (for per-site output) (see the FAQ doc on GVCFs for details).

    But you did not tell us to use GVCF mode. For RNA-seq, is it correct to run HaplotypeCaller and then GenotypeGVCFs like this?

    java -jar GenomeAnalysisTK.jar \
    -T GenotypeGVCFs \
    -R reference.fasta \
    --variant sample1.g.vcf \
    --variant sample2.g.vcf \
    -o output.vcf

    Looking forward to your reply.

    java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ref.fasta -I input.bam -dontUseSoftClippedBases -stand_call_conf 20.0 -o output.vcf

  • picard_gatk_mj Unconfirmed

    @Mikebesanski said:
    Thanks a lot @ami‌.
    I'm currently running the pipeline on RNA-seq data in a "two step manner" (HC + GenotypeGVCFs) and without any downsampling.
    As soon as I perform some comparisons (with the "one step manner" with HC only and/or with downsampling), it will be a pleasure to let you know.

    Hi, is this approach suitable? Thanks.

  • Sheila (Broad Institute) Member, Broadie, Moderator admin

    @picard_gatk_mj
    Hi,

    Are you asking how to get all sites (both variant and non-variant) in your output VCF? If so, you need to run GenotypeGVCFs with --includeNonVariantSites.

    Note that we have not validated the RNA-seq pipeline with the GVCF workflow, although some people have used it. Please make sure to validate your results properly when using it.

    -Sheila
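
The two-step route discussed above (HaplotypeCaller in GVCF mode, then GenotypeGVCFs with --includeNonVariantSites) could look roughly like the following. It is a sketch only, keeping in mind that this combination is not validated for RNA-seq: the sample names are placeholders, `run`, `call_gvcf`, and `joint_genotype` are hypothetical helpers, and DRY_RUN=1 (the default here) prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: per-sample GVCFs, then genotyping with non-variant sites included.
# DRY_RUN=1 (default) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"
GATK="java -jar GenomeAnalysisTK.jar"
REF="reference.fasta"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# call_gvcf SAMPLE: reads SAMPLE.bam and writes SAMPLE.g.vcf.
call_gvcf() {
    run $GATK -T HaplotypeCaller -R "$REF" -I "$1.bam" \
        -dontUseSoftClippedBases -stand_call_conf 20.0 \
        -ERC GVCF -o "$1.g.vcf"
}

# joint_genotype OUTPUT_VCF SAMPLE...: genotypes all SAMPLE.g.vcf files.
# Sample names must not contain spaces, because the --variant arguments
# are assembled in a plain string and split on whitespace.
joint_genotype() {
    output_vcf="$1"
    shift
    variant_args=""
    for sample in "$@"; do
        variant_args="$variant_args --variant $sample.g.vcf"
    done
    run $GATK -T GenotypeGVCFs -R "$REF" $variant_args \
        --includeNonVariantSites -o "$output_vcf"
}

call_gvcf sample1
call_gvcf sample2
joint_genotype output.vcf sample1 sample2
```

Dropping --includeNonVariantSites from `joint_genotype` gives a VCF with variant sites only.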

  • Zea1nfO Member

    When I use HaplotypeCaller to call SNPs and indels on RNA-seq data from several samples, should I use -ERC GVCF and then combine the results?

  • bhanuGandham (Cambridge MA) Member, Administrator, Broadie, Moderator admin

    Hi @Zea1nfO

    We currently support variant calling on only one RNA-seq sample at a time; joint calling is not supported for RNA-seq variant calling.

    Regards
    Bhanu
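
With several RNA-seq samples, the supported one-sample-at-a-time approach amounts to running HaplotypeCaller separately on each BAM. A minimal sketch, assuming one BAM per sample and output VCFs named after the BAM files; `run` and `call_each_sample` are hypothetical helpers, and DRY_RUN=1 (the default here) prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: call variants on each RNA-seq sample independently (no joint calling).
# DRY_RUN=1 (default) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"
GATK="java -jar GenomeAnalysisTK.jar"
REF="ref.fasta"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# call_each_sample BAM...: one HaplotypeCaller run and one VCF per BAM.
call_each_sample() {
    for bam in "$@"; do
        # Name the output after the BAM file, without directory or extension.
        sample=$(basename "$bam" .bam)
        run $GATK -T HaplotypeCaller -R "$REF" -I "$bam" \
            -dontUseSoftClippedBases -stand_call_conf 20.0 -o "$sample.vcf"
    done
}

call_each_sample sample1.bam sample2.bam
```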

  • Begali (Germany) Member

    @Geraldine_VdAuwera
    @bhanuGandham
    @Sheila
    @shlee
    Hello,

    I have two questions:
    Q1/ Can I use the workflow for calling germline variants in RNA-seq (https://software.broadinstitute.org/gatk/documentation/article.php?id=3891) on miRNA-seq data? The data come from skin biopsies of childhood cancer patients, but were taken from their healthy tissue, in order to investigate their germline genetic variants.

    Q2/ Can I generate GVCFs for the whole population with -ERC GVCF, or is that not yet supported, as discussed above in Nov 2018?
    With best regards

  • Begali (Germany) Member

    @Geraldine_VdAuwera
    @bhanuGandham
    @Sheila
    @shlee
    @SkyWarrior
    Hello,
    One more question, adding to my previous post:
    Alternatively, since I also have DNA-seq data for these children, I could call germline variants on all the DNA-seq samples. Is there then a method to extract only those germline variants that are located in introns or UTRs (miRNA)?
    Any information on that would be appreciated.

    With best regards

  • bhanuGandham (Cambridge MA) Member, Administrator, Broadie, Moderator admin

    Hi @Begali

    Another Q adding to my previous post

    Which previous post?

    Or as also have DNA-seq for those childhood, I can call germline GVs for all DNA-Seq then there is any method to extracted only those Germline GVs that are located at introns or UTR(miRNA)..

    I don't quite understand. Would you please elaborate?
