To celebrate the release of GATK 4.0, we are giving away free credits for running the GATK4 Best Practices pipelines in FireCloud, our secure online analysis portal. It’s first come first serve, so sign up now to claim your free credits worth $250. Sponsored by Google Cloud. Learn more at https://software.broadinstitute.org/firecloud/documentation/freecredits

About the haplotypes in hg19.fasta and the GTF file

Dear all,

I am new to GATK and want to call variants for RNA-seq according to the best practice (https://software.broadinstitute.org/gatk/documentation/article.php?id=3891). I have three questions.

  1. The UCSC.hg19.fasta downloaded from ftp://ftp.broadinstitute.org/bundle/hg19/ contains several haplotypes: chr4_ctg9_hap1, chr6_apd_hap1, chr6_ssto_hap7, chr17_ctg5_hap1, ... while the STAR manual (2.5.3a) says "> Generally, patches and alternative haplotypes should not be included in the genome." (section 2.2.1, page 5). So should I use the hg19 fasta without halotypes downloaded from UCSC as instead?

  2. The STAR manual also points out that "using annotations is highly recommended whenever they are available. " The current 2-pass mapping part of best practice does not refer to GTF file. I searched the forum and found the relevant topic (https://gatkforums.broadinstitute.org/gatk/discussion/comment/33853#Comment_33853); I think the GTF file should be involved to improve the mapping quality.

  3. In the step of "Split'N'Trim and reassign mapping qualities" and "variant calling" the "ref.fasta" are used:

    java -jar GenomeAnalysisTK.jar -T SplitNCigarReads -R ref.fasta -I dedupped.bam -o split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS

    java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ref.fasta -I input.bam -dontUseSoftClippedBases -stand_call_conf 20.0 -o output.vcf

Do they mean the "hg19.fasta"?

Many thanks,
Stanley

Tagged:

Issue · Github
by Sheila

Issue Number
2886
State
closed
Last Updated
Assignee
Array
Milestone
Array
Closed By
sooheelee

Best Answer

Answers

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @swlx08
    Hi Stanley,

    I need to check with the team on 1 and 2.

    3) Yes, the ref.fasta is used to represent the reference FASTA file you are using. Sometimes people use hg38 or b37, so we just use "ref.fasta"

    -Sheila

  • Dear Sheila,

    Thank you very much for your answer. May I ask one more question?

    The current 2-pass mapping steps used the default options; however, to analyze different expression I need to use the command as below:

    STAR --runMode alignReads --runThreadN 4 --genomeDir ~/hg19/star_ref/ --readFilesIn a_1.fq a_2.fq --quantMode TranscriptomeSAM GeneCounts --outSAMtype BAM Unsorted --outSAMreadID Number --outSAMattrIHstart 0 --outSAMattributes All --outFilterMismatchNoverReadLmax 0.05 --outSAMstrandField intronMotif --outFilterIntronMotifs RemoveNoncanonical --outFilterMultimapNmax 1 --outReadsUnmapped Fastx --outFileNamePrefix ./a

    So may I use my self_defined options instead of the default ones in both the 1st and 2nd mapping?

    Best regards,
    Stanley

Sign In or Register to comment.