About the haplotypes in hg19.fasta and the GTF file
I am new to GATK and want to call variants in RNA-seq data following the Best Practices workflow (https://software.broadinstitute.org/gatk/documentation/article.php?id=3891). I have three questions.
The UCSC.hg19.fasta downloaded from ftp://ftp.broadinstitute.org/bundle/hg19/ contains several alternative haplotypes: chr4_ctg9_hap1, chr6_apd_hap1, chr6_ssto_hap7, chr17_ctg5_hap1, ... However, the STAR manual (2.5.3a) says "Generally, patches and alternative haplotypes should not be included in the genome." (section 2.2.1, page 5). So should I instead use the hg19 FASTA without haplotypes downloaded from UCSC?
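To make the first question concrete, here is a minimal sketch of how the haplotype contigs could be stripped from the bundle FASTA instead of downloading a separate file. The function name `drop_hap_contigs` and the file names in the usage comment are hypothetical, and the sketch assumes every haplotype contig can be recognized by `_hap` followed by a digit in its header, as in the names listed above.

```shell
# Hypothetical helper: read a FASTA on stdin and write it to stdout,
# skipping every record whose header contains "_hap<digit>"
# (e.g. chr6_apd_hap1). All other records pass through unchanged.
drop_hap_contigs() {
  # On each header line, decide whether to keep the record;
  # every line (header or sequence) is printed only while keep is true.
  awk '/^>/ { keep = ($0 !~ /_hap[0-9]/) } keep'
}

# Example usage (placeholder file names):
#   drop_hap_contigs < ucsc.hg19.fasta > ucsc.hg19.no_hap.fasta
```

A FASTA derived this way would need its own .fai index and .dict sequence dictionary before GATK could use it as a reference.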
The STAR manual also points out that "using annotations is highly recommended whenever they are available." The 2-pass mapping section of the Best Practices currently does not mention a GTF file. I searched the forum and found a relevant topic (https://gatkforums.broadinstitute.org/gatk/discussion/comment/33853#Comment_33853); I think a GTF file should be supplied to improve the mapping quality. Is that right?
In the "Split'N'Trim and reassign mapping qualities" and "variant calling" steps, "ref.fasta" is used as the reference:
java -jar GenomeAnalysisTK.jar -T SplitNCigarReads -R ref.fasta -I dedupped.bam -o split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS
java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ref.fasta -I input.bam -dontUseSoftClippedBases -stand_call_conf 20.0 -o output.vcf
Does "ref.fasta" in these commands refer to the same hg19 FASTA that was used for the STAR alignment?