Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Multi-sample SNP calling with UnifiedGenotyper
I've been scouring the forums, but I fear that my question is so basic that I am alone:
I have whole genome sequences of 6 samples (and so 6 .bam files) of a non-model organism and I am trying to compare SNPs for downstream population genetics analyses. I attempted this using UnifiedGenotyper (I realize that HaplotypeCaller is better, but UG finished first, while HC has been running for days at the time of this writing.) Here is what I entered:
java -jar GenomeAnalysisTK.jar -R reference.fasta -T UnifiedGenotyper -I sample01.bam -I sample02.bam -I sample03.bam -I sample04.bam -I sample05.bam -I sample06.bam -o output.raw.snps.indels.vcf
Unless I am mis-reading the output VCF file (first few lines are pasted below), it seems to contain only a single sample (based on the fact that there is only a single column for ALT, rather than one per sample). I tried to use this file in SNPHYLO, but it errors because "There are no SNPs", which seems to confirm this.
What am I doing wrong? Thanks, and apologies for any redundancies.
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT TU114
223232 3 . T C 450.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-2.751;DP=245;Dels=0.00;ExcessHet=3.0103;FS=0.966;HaplotypeScore=4.7345;MLEAC=1;MLEAF=0.500;MQ=18.74;MQ0=51;MQRankSum=-1.838;QD=2.12;ReadPosRankSum=-1.234;SOR=0.965 GT:AD:DP:GQ:PL 0/1:163,50:245:99:479,0,1492
223232 19 . C T 529.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.273;DP=250;Dels=0.00;ExcessHet=3.0103;FS=5.648;HaplotypeScore=10.3851;MLEAC=1;MLEAF=0.500;MQ=21.77;MQ0=32;MQRankSum=0.356;QD=2.12;ReadPosRankSum=1.028;SOR=1.757 GT:AD:DP:GQ:PL 0/1:198,52:250:99:558,0,2443