Is this procedure reasonable for SNP calling on RNA-seq data with GATK4?
Hi, I have RNA-seq samples from different individuals of the same species. I would like to combine these data into one BAM file for species-level SNP calling with GATK4, but I am not sure whether this is reasonable. (I used the GATK3 Best Practices as a reference.)
Here are my steps:
1. STAR for alignment
2. Add read groups, sort, mark duplicates, and create index
#java -jar picard-tools-1.95/AddOrReplaceReadGroups.jar I=out.sam O=f.bam RGID=MN_S1B RGLB=library RGPL=platform RGPU=machine RGSM=S1B SORT_ORDER=coordinate
#java -jar picard-tools-1.95/MarkDuplicates.jar I=f.bam O=dedupped.bam CREATE_INDEX=true VALIDATION_STRINGENCY=SILENT M=output.metrics
(I do not understand why output.metrics is needed, because the next steps do not use it, and they do not use this index either.)
3. Merge (how should the parameters be set to merge data from different individuals?)
#samtools merge [-nr] [-h] out.bam 1.bam 2.bam ...
4. Create index (I am not sure whether the input file should be the genome data or not)
#samtools index genomic.fna
5. faidx (I am not sure whether the input file should be the genome data or not)
#samtools faidx genomic.fa
6. SplitNCigarReads
#gatk SplitNCigarReads -R genomic.fna -I out.bam -O split.bam
7. Variant calling
8. Variant filtering
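The steps above can be sketched as a dry-run shell script that only prints the planned commands, so it can be read and checked without any tools installed. This is a minimal sketch under stated assumptions, not a validated pipeline: the sample names (S1A, S1B), the `run` and `plan_pipeline` helpers, and the intermediate file names are hypothetical, and the read-group values other than RGID and RGSM are the placeholders from the post. It assumes that `samtools faidx` and `gatk CreateSequenceDictionary` are run on the reference FASTA, that `samtools index` is run on the merged BAM, and that each individual keeps its own RGSM so the samples stay distinguishable after merging. Variant filtering is left out because the post gives no command for it.

```shell
#!/bin/sh
# Dry-run sketch: prints the planned commands instead of running them.
# Set DRY_RUN=0 to execute them (needs java/Picard, samtools and gatk).
set -eu

DRY_RUN="${DRY_RUN:-1}"
REF="genomic.fna"            # reference genome FASTA, name taken from the post
SAMPLES="S1A S1B"            # hypothetical sample names, one per individual
PICARD="picard-tools-1.95"   # Picard directory as written in the post

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

plan_pipeline() {
    # Reference preparation: FASTA index (.fai) and sequence dictionary (.dict).
    run samtools faidx "$REF"
    run gatk CreateSequenceDictionary -R "$REF"

    merge_inputs=""
    for s in $SAMPLES; do
        # Assumption: a distinct RGID and RGSM per individual.
        run java -jar "$PICARD/AddOrReplaceReadGroups.jar" I="$s.sam" O="$s.rg.bam" \
            RGID="$s" RGLB=library RGPL=platform RGPU=machine RGSM="$s" \
            SORT_ORDER=coordinate
        # M= (METRICS_FILE) is a required MarkDuplicates argument.
        run java -jar "$PICARD/MarkDuplicates.jar" I="$s.rg.bam" O="$s.dedup.bam" \
            CREATE_INDEX=true VALIDATION_STRINGENCY=SILENT M="$s.metrics"
        merge_inputs="$merge_inputs $s.dedup.bam"
    done

    # Word splitting of $merge_inputs is intentional here.
    run samtools merge merged.bam $merge_inputs
    run samtools index merged.bam
    run gatk SplitNCigarReads -R "$REF" -I merged.bam -O split.bam
    run gatk HaplotypeCaller -R "$REF" -I split.bam -O raw.vcf.gz
}

plan_pipeline
```

Running the script as is only lists the commands in order; changing SAMPLES to the real sample names and setting DRY_RUN=0 would execute them, assuming the tools are on the PATH.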
In summary, I have the following questions:
2. Is this procedure reasonable?
3. (Step 2) Why is output.metrics needed if the next steps do not use it? The index created here is not used in the next steps either.
4. (Step 3) How should the parameters be set to merge data from different individuals?
5. (Steps 4 & 5) I am not sure whether the input file should be the genome data or not. Maybe a BAM file?
6. I did not do Base Quality Score Recalibration (BQSR) because of the lack of known sites for this species. Does skipping this step affect the results? Can I use known sites from a similar species instead?