Adapter removal and variant calling in samples with different library prep/pre-processing
This question is part good practice and part conceptual doubt. I have a cohort of approximately 100 animals from a non-model organism. Forty were sequenced at 10x depth on an Illumina 2500 machine, and the remaining 60 were sequenced at 30x depth on an Illumina 4000. The 30x samples had their adapters removed during the bcl2fastq conversion stage. Unfortunately, the 10x samples did not. FastQC found adapters in the 10x samples, but except for one or two samples, the adapter lines did not reach the red zone.
I used BWA-MEM for alignment. In theory, the adapters present in the 10x samples get soft-clipped by default because they won't match the reference genome, so I did not remove them. My aim is to understand the genetic variation among these samples, so I followed the germline variant discovery pipeline (SNPs + indels). My questions are:
1) HaplotypeCaller does local reassembly, throws away MAPQ information, and uses soft-clipped bases for realignment unless '--dontUseSoftClippedBases' is set. Since the adapter sequences technically won't align during realignment either, will HaplotypeCaller call SNPs or indels from those regions?
2) Joint genotyping is done at a later stage. When a genotype is called at a region where adapter is present in a 10x sample, the adapter won't be found at that region in the 30x samples. Will that locus, with an SNP or possibly an indel, therefore receive a lower genotype quality score? I will set genotypes to missing when GQ is less than 40, which may reduce wrongly assigned variants/genotypes.
3) Should I have removed the adapters before performing variant calling? I wanted to keep the pipeline the same for all samples, and based on the understanding above, I did not remove adapters from the 10x samples.
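To make the GQ filter mentioned in question 2 concrete, here is a minimal sketch of the masking step in plain Python. It is only an illustration under stated assumptions, not the GATK implementation: the function name `mask_low_gq` and the example record are hypothetical, the input is assumed to be a single tab-separated VCF data line with GT and GQ in the FORMAT column, and treating an absent GQ as failing the threshold is a choice made for this sketch.

```python
def mask_low_gq(vcf_line, min_gq=40):
    """Return the VCF line with GT set to missing where GQ < min_gq.

    Hypothetical helper for illustration only. Header lines and records
    without both GT and GQ in FORMAT are returned unchanged.
    """
    line = vcf_line.rstrip("\n")
    fields = line.split("\t")
    # Columns 0-8 are fixed VCF columns; samples start at column 9.
    if line.startswith("#") or len(fields) < 10:
        return line
    keys = fields[8].split(":")
    if "GT" not in keys or "GQ" not in keys:
        return line
    gt_i, gq_i = keys.index("GT"), keys.index("GQ")
    for col in range(9, len(fields)):
        values = fields[col].split(":")
        # Trailing FORMAT fields may be dropped, so GQ can be absent.
        gq = values[gq_i] if gq_i < len(values) else "."
        # Assumption of this sketch: a missing GQ fails the filter.
        if gq == "." or int(gq) < min_gq:
            # Keep the ploidy and the phasing separator of the original call.
            sep = "|" if "|" in values[gt_i] else "/"
            ploidy = len(values[gt_i].replace("|", "/").split("/"))
            values[gt_i] = sep.join(["."] * ploidy)
            fields[col] = ":".join(values)
    return "\t".join(fields)


# Made-up record: one low-GQ sample, one high-GQ sample.
record = "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP:GQ\t0/1:12:35\t1/1:30:99"
print(mask_low_gq(record))
```

With the default threshold, the first sample's genotype becomes `./.` while the second is left untouched. Only GT is rewritten, so depth and GQ stay available for later inspection.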