GenotypeGVCFs with dbSNP ROD in vcf: output vcf contains ROD variants as well as .g.vcf samples
Not a question, more of a bug report: I'm using the new v3.1 Best Practices pipeline, and I had successfully produced my per-sample gVCFs (n=23 in total) with no worries.
Then I used GenotypeGVCFs to combine them as follows, including a dbSNP ROD (the Sanger Mouse Genome Project's SNP calls for 17 samples, in vcf format... not gvcf!):
java -Djava.io.tmpdir=/tmp -Xmx28g -jar ./tmp/GenomeAnalysisTK_3.1-1/GenomeAnalysisTK.jar -T GenotypeGVCFs -nt 8 -R ./mm10.fa --dbsnp ./tmp/mgp.v3.SNPs.indels/ftp-mouse.sanger.ac.uk/REL-1303-SNPs_Indels-GRCm38/mgp.v3.snps.rsIDdbSNPv137.vcf.ordinalsorted.vcf -V GenotypeGVCFs.run1.sample.list -o ./CombinedGenotyping.run1.vcf -A InbreedingCoeff -A FisherStrand -A QualByDepth -A ChromosomeCounts
So I was very surprised to see that my output CombinedGenotyping VCF has 40 samples in it, not 23, and of course 23 + 17 = 40. Checking the VCF header itself confirms that the 17 Sanger strains have been included as samples in the output VCF, genotype calls and all, not just the rsIDs as intended(?). I'm guessing that this merging of the .g.vcf samples with the samples from the dbSNP ROD isn't the expected behaviour...!
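For anyone wanting to run the same header check on their own output, here is a minimal Python sketch of one way to do it. It is not part of GATK: the function names and the demo sample names (sampleA, sampleB, strainX) are made up for illustration, and it assumes an uncompressed, tab-delimited VCF whose sample columns start at column 10 of the #CHROM line.

```python
import io


def vcf_sample_names(handle):
    """Return the sample names listed on the #CHROM header line of a VCF."""
    for line in handle:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # Columns 1-8 are fixed, column 9 is FORMAT, samples start at column 10.
            return fields[9:]
        if not line.startswith("#"):
            # Reached the records without finding a #CHROM line.
            break
    return []


def unexpected_samples(handle, expected):
    """List samples present in the VCF header but absent from `expected`."""
    expected = set(expected)
    return [name for name in vcf_sample_names(handle) if name not in expected]


# Tiny in-memory demo with hypothetical sample names.
demo = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\tsampleB\tstrainX\n"
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t0/0\t1/1\n"
)
extra = unexpected_samples(demo, ["sampleA", "sampleB"])
print(extra)
```

With a real file you would pass open("./CombinedGenotyping.run1.vcf") instead of the in-memory demo, along with the 23 sample names you expect; anything returned would be a sample that came from somewhere else, such as the dbSNP ROD.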