GenotypeGVCFs with dbSNP ROD in vcf: output vcf contains ROD variants as well as .g.vcf samples


Not a question, more of a bug report - I'm using the new v3.1 Best Practices pipeline, and I'd successfully produced my per-sample gVCFs (n=23 in total) with no worries.

Then I used GenotypeGVCFs to combine them as follows, including a dbSNP ROD (the Sanger Mouse Genome Project's SNP calls for 17 samples, in vcf format... not gvcf!):

java -Djava.io.tmpdir=/tmp -Xmx28g \
  -jar ./tmp/GenomeAnalysisTK_3.1-1/GenomeAnalysisTK.jar \
  -T GenotypeGVCFs -nt 8 \
  -R ./mm10.fa \
  --dbsnp ./tmp/mgp.v3.SNPs.indels/ftp-mouse.sanger.ac.uk/REL-1303-SNPs_Indels-GRCm38/mgp.v3.snps.rsIDdbSNPv137.vcf.ordinalsorted.vcf \
  -V GenotypeGVCFs.run1.sample.list \
  -o ./CombinedGenotyping.run1.vcf \
  -A InbreedingCoeff -A FisherStrand -A QualByDepth -A ChromosomeCounts

So I was very surprised to see that my output CombinedGenotyping vcf has 40 samples in it, not 23 - and of course, 23 + 17 = 40. Checking the VCF header itself confirms that the genotype columns for the 17 Sanger strains have been included in the output vcf, not just the rsIDs as intended(?). I'm guessing that combining the .g.vcfs with an extra ROD like this isn't the expected behaviour...!
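For anyone wanting to run the same header check on their own output, here is a minimal Python sketch. It is not part of GATK: the function names are made up for illustration, and the demo header is a tiny hand-written example that borrows three sample names from this thread. It assumes a standard tab-delimited VCF in which the first nine columns of the `#CHROM` line are fixed and everything after them is a sample name.

```python
def sample_names(vcf_lines):
    """Return the sample names from the '#CHROM' header line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # The first nine columns are fixed (CHROM..FORMAT); the rest are samples.
            return line.rstrip("\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached the records without seeing a '#CHROM' line
    raise ValueError("no #CHROM header line found")


def unexpected_samples(vcf_lines, expected):
    """Return samples present in the VCF header that are not in `expected`."""
    expected = set(expected)
    return [name for name in sample_names(vcf_lines) if name not in expected]


# Hand-written demo header (illustrative only): two intended samples plus one extra.
demo = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tBEH_02\tBEL_02\t129P2\n",
]
print(unexpected_samples(demo, ["BEH_02", "BEL_02"]))
```

With a real file you would pass an open file handle instead of `demo`, plus the list of sample names you expected to genotype; anything returned came from somewhere else.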



Best Answer


  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    By the way, I forgot to ask you: does it look like the dbsnp samples just get copy/pasted into the list of samples, or are the genotype-level fields actually different?

  • mfletcher (DE; Member)

    Hi @Geraldine_VdAuwera,

    The samples from the ROD have been filled with empty (./.) fields:

    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 129P2 129S1 129S5 AJ AKRJ BALBcJ BEH_02 BEL_02 C3HHeJ C57BL6NJ CASTEiJ CBAJ CD1 DAH DBA2J DUH_23 DUK01M DUK01W DUK02M DUK02W DUK03M DUK03W DUK04M DUK04W DUK05M DUK05W FVBNJ LPJ MUH_06 MUL_09 NODShiLtJ NZOHlLtJ PHI_17 PLI_20 PWKPhJ RAH ROH_12 ROL_14 SPRETEiJ WSBEiJ
    chr1 3000019 . G GA 247.54 . AC=4;AF=1.00;AN=4;DP=5;FS=0.000;MLEAC=4;MLEAF=1.00;MQ=35.98;MQ0=0;QD=34.24 GT:AD:DP:GQ:PL ./. ./. ./. ./. ./. ./. ./.:.:0 ./.:.:0 ./. ./. ./. ./. ./.:.:0 ./.:.:0 ./. ./.:.:0 ./.:.:0 ./.:.:0 ./.:.:0 ./.:.:0 ./.:.:0 ./.:.:0 ./.:.:0 ./.:.:0 ./.:.:0 ./.:.:0 ./. ./. ./.:.:0 ./.:.:0 ./. ./. ./.:.:0 1/1:0,2:2:9:135,9,0 ./. 1/1:0,3:3:9:135,9,0 ./.:.:0 ./.:.:0 ./. ./.

  • pdexheimer (Member ✭✭✭✭)

    Just as a sanity check - the dbSNP vcf didn't end up as an entry in GenotypeGVCFs.run1.sample.list, did it?

  • mfletcher (DE; Member)

    Hi @pdexheimer - no, definitely not (I double-checked, actually!); it was only referenced in the --dbsnp option.

  • thibault (Broad Institute; Member, Broadie, Dev, admin)

    This should be fixed in our nightly builds and will be in release 3.2.

    Nightly builds can be found at https://www.broadinstitute.org/gatk/nightly .
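On the earlier question of whether the extra samples' genotype-level fields actually differ: in the record mfletcher posted, some columns are a bare `./.`, others are `./.:.:0`, and two are real calls. Below is a rough Python sketch for tallying that per sample. The helper names are made up for illustration, and the demo uses only four of the forty sample columns from that record. It splits on any whitespace because that is how the forum rendered the line; on a real tab-delimited file, `split("\t")` would be the stricter choice.

```python
def classify(genotype_field):
    """Classify one per-sample field of a VCF record by its GT subfield."""
    gt, _, rest = genotype_field.partition(":")
    if set(gt) <= set("./|"):
        # Every allele in GT is missing ('.'), with '/' or '|' as separators.
        return "no-call with extra fields" if rest else "bare no-call"
    return "called"


def summarize(header_line, record_line):
    """Map each sample name to the classification of its field in one record."""
    names = header_line.split()[9:]    # sample names follow the nine fixed columns
    fields = record_line.split()[9:]   # per-sample fields, in the same order
    return {name: classify(field) for name, field in zip(names, fields)}


# Four of the forty sample columns from the record posted in this thread.
header = "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 129P2 BEH_02 PLI_20 RAH"
record = (
    "chr1 3000019 . G GA 247.54 . "
    "AC=4;AF=1.00;AN=4;DP=5;FS=0.000;MLEAC=4;MLEAF=1.00;MQ=35.98;MQ0=0;QD=34.24 "
    "GT:AD:DP:GQ:PL ./. ./.:.:0 1/1:0,2:2:9:135,9,0 1/1:0,3:3:9:135,9,0"
)
for name, kind in summarize(header, record).items():
    print(name, kind)
```

In this subset, the bare `./.` column is one that mfletcher attributes to the ROD, which fits the description of those samples being filled with empty fields rather than carrying their own genotype data.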
