I'm trying to detect SNPs using GATK. I have 8 RNA samples representing two different families, low and high, with 4 samples each. I ran the steps below and do not know what to do next. Please help.
These are the steps I ran:
1. STAR to align the reads to the reference genome
2. Picard tool AddOrReplaceReadGroups, where I assigned a different read group to each sample. I assigned 4 samples RGSM="low" and RGPU="unit1", and the other 4 RGSM="high" and RGPU="unit2"
3. Picard tool SortSam
4. Picard tool MarkDuplicates
5. Picard tool BuildBamIndex
6. GenomeAnalysisTK.jar -T SplitNCigarReads with the following parameters: -RMQF 255 -RMQT 60
7. GenomeAnalysisTK.jar -T RealignerTargetCreator
8. GenomeAnalysisTK.jar -T IndelRealigner
9. GenomeAnalysisTK.jar -T HaplotypeCaller with the following parameters: -stand_call_conf 20 -stand_emit_conf 20
10. GenomeAnalysisTK.jar -T VariantFiltration with the following parameters: -window 35 -cluster 3 -filterExpression "QD > 2 || FS > 30 || MQ > 40 || MQRankSum < -1.5 || ReadPosRankSum"
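To make the per-sample commands concrete, here is a rough sketch of steps 2 and 6 to 9 as a small Python script that only builds and prints the command lines (it does not run anything). The jar paths, reference name, file names, sample IDs, and the RGID/RGLB/RGPL values are placeholders, not my real ones. The -R, -I, -o, and -targetIntervals arguments are assumed from typical GATK 3 usage, and -rf ReassignOneMappingQuality is also an assumption, added because -RMQF and -RMQT are options of that read filter.

```python
from typing import List

PICARD_JAR = "picard.jar"          # placeholder path
GATK_JAR = "GenomeAnalysisTK.jar"  # placeholder path
REFERENCE = "reference.fasta"      # placeholder path


def read_group_cmd(sample_id: str, family: str, unit: str) -> List[str]:
    """Step 2: Picard AddOrReplaceReadGroups for one aligned sample."""
    return [
        "java", "-jar", PICARD_JAR, "AddOrReplaceReadGroups",
        f"I={sample_id}.bam",        # placeholder name for the STAR output
        f"O={sample_id}.rg.bam",
        f"RGID={sample_id}",         # assumed: one read-group ID per sample
        "RGLB=lib1",                 # placeholder value
        "RGPL=illumina",             # placeholder value
        f"RGPU={unit}",              # "unit1" or "unit2", as in step 2
        f"RGSM={family}",            # "low" or "high", as in step 2
    ]


def gatk_cmd(tool: str, in_bam: str, out_path: str, extra: List[str]) -> List[str]:
    """Generic GATK 3 style invocation: -T tool -R ref -I input -o output."""
    base = ["java", "-jar", GATK_JAR, "-T", tool,
            "-R", REFERENCE, "-I", in_bam, "-o", out_path]
    return base + extra


def per_sample_cmds(sample_id: str, family: str, unit: str) -> List[List[str]]:
    """Steps 2 and 6 to 9 for one sample; steps 3 to 5 are assumed done."""
    dedup = f"{sample_id}.dedup.bam"       # assumed result of steps 3 to 5
    split = f"{sample_id}.split.bam"
    intervals = f"{sample_id}.intervals"
    realigned = f"{sample_id}.realigned.bam"
    vcf = f"{sample_id}.vcf"
    return [
        read_group_cmd(sample_id, family, unit),
        gatk_cmd("SplitNCigarReads", dedup, split,
                 ["-rf", "ReassignOneMappingQuality",  # assumed read filter
                  "-RMQF", "255", "-RMQT", "60"]),
        gatk_cmd("RealignerTargetCreator", split, intervals, []),
        gatk_cmd("IndelRealigner", split, realigned,
                 ["-targetIntervals", intervals]),
        gatk_cmd("HaplotypeCaller", realigned, vcf,
                 ["-stand_call_conf", "20", "-stand_emit_conf", "20"]),
    ]


# Hypothetical sample IDs; the real run has 4 "low" and 4 "high" samples.
SAMPLES = {"low1": ("low", "unit1"), "high1": ("high", "unit2")}

for sample_id, (family, unit) in SAMPLES.items():
    for cmd in per_sample_cmds(sample_id, family, unit):
        print(" ".join(cmd))
```

With this layout each sample ends up with its own VCF, which is the situation I describe below.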
As mentioned above, I ran each sample separately through these steps, and each one gives me about 60,000 SNPs on average. My questions are: should I group these samples together, and if so, how and with which tool? And is my pipeline correct? Any help will be appreciated. If you need more information, let me know.