Errors with "Value was put into PairInfoMap more than once" and "READ_NAME_REGEX"
I am trying to call SNPs from my multi-sample RNA-seq data.
The RNA-seq data come from paired-end sequencing and look like this:
sampleA.r1.fastq.gz , sampleA.r2.fastq.gz
sampleB.r1.fastq.gz , sampleB.r2.fastq.gz
sampleC.r1.fastq.gz , sampleC.r2.fastq.gz
My idea for calling SNPs was:
1) Use the Linux command "cat" to combine sampleA.r1.fastq.gz, sampleB.r1.fastq.gz and sampleC.r1.fastq.gz into one file named "R1.fastq.gz".
2) Use the Linux command "cat" to combine sampleA.r2.fastq.gz, sampleB.r2.fastq.gz and sampleC.r2.fastq.gz into one file named "R2.fastq.gz".
3) Use R1.fastq.gz and R2.fastq.gz to call SNPs with the workflow below.
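The merge in steps 1 and 2 can be sketched in Python as below. This is only an illustration of what "cat" does to the files, not the exact commands that were run; the helper name `concat_files` is hypothetical, and the file names are the ones listed in this post.

```python
import shutil
from pathlib import Path


def concat_files(inputs, output):
    """Append the raw bytes of each input file to output, like `cat a b c > output`."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


if __name__ == "__main__":
    samples = ["sampleA", "sampleB", "sampleC"]
    # Use the same sample order for both mates so read pairs stay aligned line by line.
    for mate, merged in (("r1", "R1.fastq.gz"), ("r2", "R2.fastq.gz")):
        inputs = [f"{s}.{mate}.fastq.gz" for s in samples]
        # Only merge when all inputs are present; otherwise do nothing.
        if all(Path(p).exists() for p in inputs):
            concat_files(inputs, merged)
```

Concatenating gzip files byte by byte gives a valid multi-member gzip file, so the result can be read like a single compressed FASTQ. Note that after merging, the reads no longer carry any record of which sample they came from.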
Here are my steps:
a) Use HISAT2 to align R1.fastq.gz and R2.fastq.gz to the reference sequence, which gives a SAM file.
b) Use samtools to convert the SAM file into a BAM file.
c) Use AddOrReplaceReadGroups to add read group information to the BAM file.
d) Use SortSam to sort the BAM file by coordinate.
e) Use MarkDuplicates to mark duplicate reads.
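For reference, steps a) to e) could be written out roughly as follows. This is a sketch under several assumptions, since the exact commands are not given in this post: Picard is assumed to be run as `java -jar picard.jar` with its KEY=VALUE argument style, and the index name, output prefix, read group values and the helper `build_pipeline` are hypothetical placeholders. The script only prints the commands; it does not run them.

```python
import shlex


def build_pipeline(r1, r2, index, prefix, sample, picard_jar="picard.jar"):
    """Return the five commands for steps a) to e) as argument lists."""
    picard = ["java", "-jar", picard_jar]
    sam = f"{prefix}.sam"
    bam = f"{prefix}.bam"
    rg_bam = f"{prefix}.rg.bam"
    sorted_bam = f"{prefix}.rg.sorted.bam"
    dedup_bam = f"{prefix}.rg.sorted.dedup.bam"
    return [
        # a) align the paired reads with HISAT2 and write a SAM file
        ["hisat2", "-x", index, "-1", r1, "-2", r2, "-S", sam],
        # b) convert SAM to BAM
        ["samtools", "view", "-b", "-o", bam, sam],
        # c) add read group information (all RG values here are placeholders)
        picard + ["AddOrReplaceReadGroups", f"I={bam}", f"O={rg_bam}",
                  "RGID=1", "RGLB=lib1", "RGPL=ILLUMINA", "RGPU=unit1",
                  f"RGSM={sample}"],
        # d) sort by coordinate
        picard + ["SortSam", f"I={rg_bam}", f"O={sorted_bam}",
                  "SORT_ORDER=coordinate"],
        # e) mark duplicates and write a metrics file
        picard + ["MarkDuplicates", f"I={sorted_bam}", f"O={dedup_bam}",
                  f"M={prefix}.dup_metrics.txt"],
    ]


if __name__ == "__main__":
    for cmd in build_pipeline("R1.fastq.gz", "R2.fastq.gz",
                              "ref_index", "merged", "merged_sample"):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check the file names at each step before anything is run.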
Then I got the two errors mentioned in the title.
I tried to find a solution in the forums, but without success.
I think the problem comes from my first step (combining the samples with "cat"), but I don't know how to fix it.
Could you please tell me how to fix it? Thanks a lot.