About panel sequencing
Hi, dear GATK team:
I am analysing a dataset covering the exons of 50 genes from human samples, generated on Ion Torrent, with an average depth of ~1000x. I marked duplicates (which Life Technologies advises against), and skipped realignment and base recalibration, since the regions are very small (mainly <200bp) and the depth is high. I called SNPs with UnifiedGenotyper:
java -Djava.io.tmpdir=$tmp_dir -Xmx20G -jar $gatk_dir/GenomeAnalysisTK.jar -T UnifiedGenotyper -L $region -R $ref -glm SNP -mte -nct $thread_num --sample_ploidy $ploidy -I $bamfile --output_mode EMIT_VARIANTS_ONLY --dbsnp $db_vcf_file -o $gatk_vcf
In one of our cases, over 1000 raw SNPs were called from a normal sample, which is abnormal.
The read quality and the mapping quality were both fine.
I checked some of the low-scoring SNPs in IGV; they have far lower coverage than the average (<200x), and some have less than 10x.
Why did the caller call so many SNPs? What options or commands should I use to deal with this problem?
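To put a number on the low-coverage calls mentioned above, here is a minimal sketch of how they could be tallied from the output VCF. It assumes the INFO column carries a `DP` annotation; the function name `low_depth_calls` and the 200x cutoff are only illustrative, not anything taken from GATK.

```python
import re
import sys


def low_depth_calls(vcf_lines, min_depth=200):
    """Return (chrom, pos, depth) for records whose INFO DP is below min_depth.

    Assumption: depth is read from the INFO field's DP key; records
    without a DP annotation are skipped rather than guessed at.
    """
    flagged = []
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # A VCF data line has at least 8 fixed columns; INFO is the 8th.
        if len(fields) < 8:
            continue
        match = re.search(r"(?:^|;)DP=(\d+)", fields[7])
        if match is None:
            continue
        depth = int(match.group(1))
        if depth < min_depth:
            flagged.append((fields[0], int(fields[1]), depth))
    return flagged


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python low_depth_calls.py calls.vcf
    with open(sys.argv[1]) as handle:
        for chrom, pos, depth in low_depth_calls(handle):
            print(f"{chrom}\t{pos}\t{depth}")
```

Running it on the raw VCF would list the calls that fall below the cutoff, which should show whether the excess SNPs are concentrated in poorly covered positions.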