About the haplotypes in hg19.fasta and the GTF file
I am new to GATK and want to call variants on RNA-seq data following the Best Practices workflow (https://software.broadinstitute.org/gatk/documentation/article.php?id=3891). I have three questions.
The UCSC.hg19.fasta downloaded from ftp://ftp.broadinstitute.org/bundle/hg19/ contains several haplotypes: chr4_ctg9_hap1, chr6_apd_hap1, chr6_ssto_hap7, chr17_ctg5_hap1, ..., while the STAR manual (2.5.3a) says "Generally, patches and alternative haplotypes should not be included in the genome." (section 2.2.1, page 5). So should I instead use an hg19 FASTA without haplotypes, downloaded from UCSC?
The STAR manual also points out that "using annotations is highly recommended whenever they are available." The current 2-pass mapping part of the Best Practices does not mention a GTF file. I searched the forum and found a relevant topic (https://gatkforums.broadinstitute.org/gatk/discussion/comment/33853#Comment_33853); I think a GTF file should be supplied to improve the mapping quality.
The "Split'N'Trim and reassign mapping qualities" and "variant calling" steps both use a "ref.fasta":
java -jar GenomeAnalysisTK.jar -T SplitNCigarReads -R ref.fasta -I dedupped.bam -o split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS
java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ref.fasta -I input.bam -dontUseSoftClippedBases -stand_call_conf 20.0 -o output.vcf
Does "ref.fasta" in these commands refer to the hg19 FASTA file?
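
On the first question: if the haplotype contigs did turn out to need dropping, one way might be to filter the FASTA by header name. This is only a rough sketch. It assumes every haplotype contig name ends in _hap followed by a number, as in the names listed above, and the function name drop_hap_contigs is made up for illustration.

```shell
# Hypothetical helper: print a FASTA file without contigs whose name ends in _hapN.
# Assumption: haplotype contigs are named like chr6_apd_hap1 (suffix _hap + digits).
drop_hap_contigs() {
  # On each header line, decide whether to keep the record that follows;
  # every line (header or sequence) is printed only while keep is true.
  awk '/^>/ { keep = ($1 !~ /_hap[0-9]+$/) } keep' "$1"
}
```

It could be called as drop_hap_contigs UCSC.hg19.fasta > hg19.nohap.fasta (the output filename is arbitrary). The filtered file would need its own .fai index and .dict sequence dictionary before GATK would accept it with -R, and the same FASTA would have to be used for both the STAR genome index and the GATK steps so that the contig names in the BAM match the reference.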