We've moved!
This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!

About the haplotypes in hg19.fasta and the GTF file

Dear all,

I am new to GATK and want to call variants for RNA-seq according to the best practice (https://software.broadinstitute.org/gatk/documentation/article.php?id=3891). I have three questions.

  1. The UCSC.hg19.fasta downloaded from ftp://ftp.broadinstitute.org/bundle/hg19/ contains several haplotypes: chr4_ctg9_hap1, chr6_apd_hap1, chr6_ssto_hap7, chr17_ctg5_hap1, ... while the STAR manual (2.5.3a) says "> Generally, patches and alternative haplotypes should not be included in the genome." (section 2.2.1, page 5). So should I use the hg19 fasta without halotypes downloaded from UCSC as instead?

  2. The STAR manual also points out that "using annotations is highly recommended whenever they are available. " The current 2-pass mapping part of best practice does not refer to GTF file. I searched the forum and found the relevant topic (https://gatkforums.broadinstitute.org/gatk/discussion/comment/33853#Comment_33853); I think the GTF file should be involved to improve the mapping quality.

  3. In the step of "Split'N'Trim and reassign mapping qualities" and "variant calling" the "ref.fasta" are used:

    java -jar GenomeAnalysisTK.jar -T SplitNCigarReads -R ref.fasta -I dedupped.bam -o split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS

    java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ref.fasta -I input.bam -dontUseSoftClippedBases -stand_call_conf 20.0 -o output.vcf

Do they mean the "hg19.fasta"?

Many thanks,


Issue · Github
by Sheila

Issue Number
Last Updated
Closed By

Best Answer


  • SheilaSheila Broad InstituteMember, Broadie ✭✭✭✭✭

    Hi Stanley,

    I need to check with the team on 1 and 2.

    3) Yes, the ref.fasta is used to represent the reference FASTA file you are using. Sometimes people use hg38 or b37, so we just use "ref.fasta"


  • Dear Sheila,

    Thank you very much for your answer. May I ask one more question?

    The current 2-pass mapping steps used the default options; however, to analyze different expression I need to use the command as below:

    STAR --runMode alignReads --runThreadN 4 --genomeDir ~/hg19/star_ref/ --readFilesIn a_1.fq a_2.fq --quantMode TranscriptomeSAM GeneCounts --outSAMtype BAM Unsorted --outSAMreadID Number --outSAMattrIHstart 0 --outSAMattributes All --outFilterMismatchNoverReadLmax 0.05 --outSAMstrandField intronMotif --outFilterIntronMotifs RemoveNoncanonical --outFilterMultimapNmax 1 --outReadsUnmapped Fastx --outFileNamePrefix ./a

    So may I use my self_defined options instead of the default ones in both the 1st and 2nd mapping?

    Best regards,

Sign In or Register to comment.