GenotypeGVCFs number of

Hi all,

I'm very new to GATK. I'm trying to map an EMS mutation in Arabidopsis. I have FASTQ files of a wt M3 bulk and a mut M3 bulk (both offspring of the same parent). The strategy is to call SNPs with HaplotypeCaller and then joint-genotype the samples into a single file with GenotypeGVCFs. That ran successfully (I think). The next step is to look for SNPs that are homozygous alt (1/1) in the mut bulk and either heterozygous (0/1) or homozygous reference (0/0) in the wt bulk. I used this command:

grep -v '^##' "$line.genotype10.vcf" | awk 'BEGIN{FS="\t"; OFS=" "} $10~/^1\/1/ && ($11~/^0\/1/ || $11~/^0\/0/) {$3=$7=""; print $0}' | tr -s ' ' > file.txt

That also worked pretty well.
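A minimal, self-contained sketch of the same genotype filter, run on a hypothetical four-site toy VCF so its behaviour can be checked before applying it to the real file. Like the command above, it assumes column 10 is the mut bulk and column 11 the wt bulk; `filter_hom_mut` and `toy_vcf` are illustrative names, not GATK tools. Instead of a prefix match, it compares the whole GT subfield and treats phased calls (`1|1`) like unphased ones.

```shell
# Assumptions (hypothetical layout): column 10 = mut bulk, column 11 = wt bulk,
# and GT is the first FORMAT subfield, as the VCF specification requires.

# Print CHROM, POS, REF, ALT, mut GT and wt GT for sites where the mut bulk
# is 1/1 and the wt bulk is 0/1 or 0/0. Phased "|" is normalised to "/".
filter_hom_mut() {
  grep -v '^#' |
    awk 'BEGIN { FS = OFS = "\t" }
         {
           split($10, m, ":"); mut = m[1]; gsub(/[|]/, "/", mut)
           split($11, w, ":"); wt = w[1]; gsub(/[|]/, "/", wt)
           if (mut == "1/1" && (wt == "0/1" || wt == "0/0"))
             print $1, $2, $4, $5, mut, wt
         }'
}

# Hypothetical toy input: one header line and four sites.
toy_vcf() {
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tmut\twt\n'
  printf 'Chr1\t100\t.\tG\tA\t50\tPASS\t.\tGT:DP\t1/1:12\t0/1:10\n'
  printf 'Chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t1/1:9\t1/1:11\n'
  printf 'Chr1\t300\t.\tG\tA\t50\tPASS\t.\tGT:DP\t0/1:8\t0/0:14\n'
  printf 'Chr2\t400\t.\tC\tT\t50\tPASS\t.\tGT:DP\t1|1:7\t0|0:13\n'
}

toy_vcf | filter_hom_mut
```

On the toy input only Chr1:100 and Chr2:400 pass; on the real data the call would look like `filter_hom_mut < "$line.genotype10.vcf" > file.txt`.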
I noticed that the HaplotypeCaller (HC) output has ~150,000 records (SNPs or indels), but after joint genotyping the files with GenotypeGVCFs I'm left with only a few thousand records. The same happens if I first merge the files with CombineGVCFs (which keeps ~150,000 records) and then run GenotypeGVCFs.
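One thing worth checking, assuming the HC files were written in GVCF mode (`-ERC GVCF`): many GVCF records are reference blocks whose ALT is only `<NON_REF>` (with an `END=` position in INFO) rather than SNPs or indels, and GenotypeGVCFs by default reports only variant sites. The sketch below uses a hypothetical helper, `count_gvcf_records`, and toy data to split a GVCF's records into those two types.

```shell
# Assumption: input is a HaplotypeCaller GVCF, where reference blocks have
# ALT == "<NON_REF>" and candidate variants list real alleles before it.

# Count records by type; header lines (starting with "#") are skipped.
count_gvcf_records() {
  grep -v '^#' |
    awk 'BEGIN { FS = "\t"; ref = 0; var = 0 }
         { if ($5 == "<NON_REF>") ref++; else var++ }
         END { print "reference_blocks=" ref, "variant_records=" var }'
}

# Hypothetical toy GVCF: two reference blocks around one candidate SNP.
toy_gvcf() {
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tmut\n'
  printf 'Chr1\t1\t.\tA\t<NON_REF>\t.\t.\tEND=99\tGT:DP\t0/0:10\n'
  printf 'Chr1\t100\t.\tG\tA,<NON_REF>\t50\t.\t.\tGT:DP\t1/1:12\n'
  printf 'Chr1\t101\t.\tT\t<NON_REF>\t.\t.\tEND=250\tGT:DP\t0/0:11\n'
}

toy_gvcf | count_gvcf_records
```

On a real file this would be run as `count_gvcf_records < sample.g.vcf` (the file name is a placeholder). If most of the ~150,000 records turn out to be reference blocks, a much smaller GenotypeGVCFs output is what one would expect.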

The problem is that with so few records left, I can't find a genomic region that meets the requirement of being homozygous in the mut bulk and het/ref in the wt bulk.
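To see whether the qualifying sites cluster in one region, one simple option is to count them per fixed-size window along each chromosome. This is a minimal sketch under stated assumptions: `count_per_window` is a hypothetical helper, its input is whitespace-separated with CHROM and POS in the first two columns, and the window size is a free parameter (1000 below only suits the toy data).

```shell
# Count sites per fixed-size window. Input: whitespace-separated lines with
# CHROM in column 1 and POS in column 2; lines starting with "#" are skipped.
# Output: CHROM, 1-based window start, number of sites (tab-separated).
count_per_window() {
  awk -v w="$1" 'BEGIN { OFS = "\t" }
       !/^#/ {
         start = int(($2 - 1) / w) * w + 1   # 1-based start of the window
         n[$1 OFS start]++
       }
       END { for (k in n) print k, n[k] }' |
    sort -k1,1 -k2,2n                        # order by chromosome, then window
}

# Hypothetical toy input: four qualifying sites on two chromosomes.
printf 'Chr1\t150\nChr1\t900\nChr1\t1200\nChr2\t50\n' | count_per_window 1000
```

On the filtered output this could be run as, e.g., `count_per_window 1000000 < file.txt`; windows with many more qualifying sites than their neighbours would be the candidate region.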
My questions are:

  1. Why does GenotypeGVCFs reduce the number of records?
  2. If anyone has other suggestions, that would be great.

Thanks a lot,
