I have whole-genome resequencing Illumina reads from two contrasting genotypes.
I have a few questions regarding GATK analysis.
Objective: I want to identify the homozygous SNPs and indels between these two genotypes by mapping the raw reads against the reference genome.
Which pre-filtering parameters do I need to take care of before starting the GATK pipeline?
I have already removed the adapters and low-quality bases from the reads. Do I also need to remove repetitive reads? If yes, please suggest how to do it. What other read pre-filtering parameters should I look at?
In the GATK pipeline, why do we create a sequence dictionary, and where is it used? What is the role of assigning read groups? How do I assign a read group: does it need specific fields, or can I put in any arbitrary name?
java -jar ~/bin/picard-tools-1.8.5/CreateSequenceDictionary.jar REFERENCE=reference.fasta OUTPUT=reference.dict
bwa mem -R "@RG\tID:FLOWCELL1.LANE1\tPL:ILLUMINA\tLB:test\tSM:PA01" reference.fasta R1.fastq.gz R2.fastq.gz > aln.sam
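
To make the read group question concrete, here is a rough sketch of how the ID in the command above could be built from the data instead of typed by hand. It assumes the reads use the Casava 1.8+ Illumina header layout (instrument:run:flowcell:lane:tile:x:y). The header string below is a made-up example, and the LB and SM values are the placeholders from my command, not real library or sample names.

```shell
# Hypothetical first header line of R1; on real data it could be read with:
#   header=$(gzip -dc R1.fastq.gz | head -n 1)
header='@HWI-ST1234:100:C0ABCACXX:3:1101:1234:2000 1:N:0:ATCACG'

# Fields 3 and 4 of the colon-separated header are the flowcell ID and the lane.
flowcell=$(printf '%s\n' "$header" | cut -d: -f3)
lane=$(printf '%s\n' "$header" | cut -d: -f4)

# Keep \t as a literal backslash-t; bwa mem turns it into a tab itself.
# LB and SM are placeholders taken from the command above.
rg="@RG\tID:${flowcell}.${lane}\tPL:ILLUMINA\tLB:test\tSM:PA01"

printf '%s\n' "$rg"
```

The resulting string would then be passed as bwa mem -R "$rg", with a different SM value for each of the two genotypes so that the samples stay distinguishable downstream.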