
Why is a variant site listed in a GVCF run on a single sample, with no reads showing an ALT variant?

aush Member
edited April 2019 in Ask the GATK team
I have read a couple of documents on GVCF but still can't understand how it works. Here is one example from the GVCF file I got by running HaplotypeCaller on a single BAM file with the `-ERC GVCF` option:
```
chr22 10718959 . T <NON_REF> . . END=10718959 GT:DP:GQ:MIN_DP:PL 0/0:1:3:1:0,3,42
chr22 10718960 . T <NON_REF> . . END=10718997 GT:DP:GQ:MIN_DP:PL 0/0:1:0:1:0,0,0
chr22 10718998 . C <NON_REF> . . END=10719058 GT:DP:GQ:MIN_DP:PL 0/0:2:3:2:0,3,45
```
When I look at the original BAM file around position 10718959, I see that there is indeed 1 read (as indicated in the `DP` field), but its sequence matches the reference, with no variation! Why is it listed as a potential variant site at all?
Another example of the same kind:
```
chr22 12602453 . G <NON_REF> . . END=12602461 GT:DP:GQ:MIN_DP:PL 0/0:33:99:33:0,99,1038
chr22 12602462 . A <NON_REF> . . END=12602462 GT:DP:GQ:MIN_DP:PL 0/0:36:96:36:0,96,1440
chr22 12602463 . G <NON_REF> . . END=12602464 GT:DP:GQ:MIN_DP:PL 0/0:37:99:37:0,99,1485
```
The genotype quality (`GQ`) is very high, and in the BAM file I do see 33-37 reads at this position, but again, all of them match the reference.

I would be very grateful if you could point me to any reference or resource detailed enough to explain this sort of thing. So far I have read the
[GVCF - Genomic Variant Call Format](software.broadinstitute.org/gatk/documentation/article?id=11004) document, [FAQ on GVCF](software.broadinstitute.org/gatk/documentation/article.php?id=4017), and VCFv4.2 specs.

Answers

  • aush Member
    Thank you @bhanuGandham! That video was useful. (What I hadn't realized, in fact, was the concept of the blocks.)
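
For anyone else who hits the same confusion, here is a minimal Python sketch of the block concept, using the records quoted in the question. It assumes whitespace-separated columns exactly as pasted above, and it treats a record whose only ALT allele is `<NON_REF>` and whose INFO field carries an `END` tag as a homozygous-reference block covering `POS..END`, not as a variant call. The names `RefBlock` and `parse_ref_block` are hypothetical, made up for this illustration; they are not part of GATK or any library.

```python
from typing import NamedTuple, Optional


class RefBlock(NamedTuple):
    """One hom-ref block from a GVCF (hypothetical helper type)."""
    chrom: str
    start: int
    end: int
    genotype: str
    min_dp: int
    gq: int

    @property
    def length(self) -> int:
        # POS and END are both inclusive, 1-based coordinates.
        return self.end - self.start + 1


def parse_ref_block(line: str) -> Optional[RefBlock]:
    """Return a RefBlock for a <NON_REF>-only record with END, else None."""
    chrom, pos, _id, _ref, alt, _qual, _filt, info, fmt, sample = line.split()[:10]
    if alt != "<NON_REF>":
        return None  # a real ALT allele is present: this is a variant record
    info_map = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    if "END" not in info_map:
        return None
    # Pair each FORMAT key (GT, DP, GQ, ...) with the sample's value.
    sample_map = dict(zip(fmt.split(":"), sample.split(":")))
    return RefBlock(
        chrom=chrom,
        start=int(pos),
        end=int(info_map["END"]),
        genotype=sample_map["GT"],
        min_dp=int(sample_map["MIN_DP"]),
        gq=int(sample_map["GQ"]),
    )


# The first three records quoted in the question.
RECORDS = [
    "chr22 10718959 . T <NON_REF> . . END=10718959 GT:DP:GQ:MIN_DP:PL 0/0:1:3:1:0,3,42",
    "chr22 10718960 . T <NON_REF> . . END=10718997 GT:DP:GQ:MIN_DP:PL 0/0:1:0:1:0,0,0",
    "chr22 10718998 . C <NON_REF> . . END=10719058 GT:DP:GQ:MIN_DP:PL 0/0:2:3:2:0,3,45",
]

blocks = [b for b in map(parse_ref_block, RECORDS) if b is not None]

for b in blocks:
    print(f"{b.chrom}:{b.start}-{b.end} ({b.length} bp) "
          f"GT={b.genotype} MIN_DP={b.min_dp} GQ={b.gq}")
```

Read this way, each line is a stretch of consecutive positions genotyped as `0/0`, and the REF base shown is only the base at the first position of the block. The three records above are adjacent: the first ends at 10718959, the second starts at 10718960 and covers 38 positions, and the third picks up at 10718998. A new block starts when the confidence changes, here a `GQ` of 3, then 0, then 3.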