Ambiguity in the reads at the same position
I am quite new to the field of metagenomic analysis, so please excuse me if my questions are oddly phrased!
I am working on a metagenomic dataset, and I am interested in adaptation in a specific species, the alga Bathycoccus prasinos. I first aligned my reads against my Bathycoccus reference genome, then created a VCF file using GATK UnifiedGenotyper.
Here is my command line:
java -jar ../apps/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar -R ../genome/Bathycoccus_genome_FINAL_RELEASE.fasta -T UnifiedGenotyper -glm BOTH -I ../BAMfile/ReadsConcat_Bathy_VerySensitiveLocal_bowtie_sorted_readsgroup.bam -o output_GATK_test
I have a few questions:
I am not sure I understand the -dbsnp parameter. Is it a problem that I did not use it in my command line? From what I understand, it is a database listing commonly found SNPs that helps distinguish real SNPs from sequencing errors; is that right? Is it specific to each species? Will my VCF results change if I don't use it?
I searched for this information everywhere online without success. What does the program do when the reads are ambiguous at a given position? Suppose many reads align to the same position and show bases that differ both from the reference and from one another. What happens in that case? Is the SNP ignored? Does it choose the most abundant base? Does it rely on the quality scores?
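To make the question concrete, here is a toy Python sketch of the situation I mean. The pileup values, the `summarize` helper and the Q20 cutoff are all made up for illustration. It only tallies the competing bases at one position and is not meant to represent what UnifiedGenotyper actually does.

```python
from collections import Counter

# Hypothetical pileup at a single reference position: one (base, Phred quality)
# pair per aligned read. These values are invented, not taken from my data.
REF_BASE = "A"
pileup = [("A", 35), ("G", 30), ("G", 32), ("T", 12), ("C", 8), ("G", 28), ("A", 33)]


def summarize(pileup, min_qual=0):
    """Count the bases observed at a position, skipping calls below min_qual."""
    return Counter(base for base, qual in pileup if qual >= min_qual)


all_counts = summarize(pileup)              # every base call, whatever its quality
hq_counts = summarize(pileup, min_qual=20)  # only calls with Phred quality >= 20

print("reference base:", REF_BASE)
print("all base calls:", dict(all_counts))
print("Q>=20 calls only:", dict(hq_counts))
```

Here three different non-reference bases (G, T and C) appear at the same position, and two of them only in low-quality calls. This is the kind of case where I do not know how the genotyper decides.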
Thanks a lot for your answer. I am quite stuck right now because I am not sure I can trust my VCF file.