Compare VCF files for SVs

dummygenome Member
edited January 30 in Ask the GATK team
Hello there,

I'd like to compare two VCF files using GATK GenotypeConcordance.

Input example:
```
22 17307808 22:17307808_TAATTAATTTATTTATTTATTTATTTATTTATTTATTTATTATTT_TAATTAATTTATTTATTTATTTATTTATTATTT TAATTAATTTATTTATTTATTTATTTATTTATTTATTTATTATTT TAATTAATTTATTTATTTATTTATTTATTATTT
22 17307808 22:17307808_TAATTAATTTATTTATTTATTTATTTATTTATTTATTTATTATTT_TAATTTATTTATTTATTTATTTATTTATTTATTTATTATTT TAATTTATTTATTTATTTATTTATTTATTTATTTATTATTT TAATTAATTTATTTATTTATTTATTTATTTATTTATTTATTATTT
22 17307808 22:17307808_TAATTAATTTATTTATTTATTTATTTATTTATTTATTTATTATTT_TTATTTATTTATTTATTTATTTATTTATTTATTTATTATTT TAATTAATTTATTTATTTATTTATTTATTTATTTATTTATTATTT TTATTTATTTATTTATTTATTTATTTATTTATTTATTATTT
22 17307808 22:17307808_TAATTAATTTATTTATTTATTTATTTATTTATTTATTTATTATTT_TAATTTATTTATTTATTTATTTATTTATTTATTTATTAATTATTT TAATTAATTTATTTATTTATTTATTTATTTATTTATTTATTATTT TAATTTATTTATTTATTTATTTATTTATTTATTTATTAATTATTT
22 17307808 22:17307808_TAATTAATTTATTTATTTATTTATTTATTTATTTATTTATTATTT_TAATTTATTTATTTATTTATTTATTTATTTATTTATTTATTATTT TAATTAATTTATTTATTTATTTATTTATTTATTTATTTATTATTT TAATTTATTTATTTATTTATTTATTTATTTATTTATTTATTATTT
22 17307808 22:17307808_TAATTAATTTATTTATTTATTTATTTATTTATTTATTTATTATTT_TAATTTATTTATTTATTTATTTATTTATTTATTTATTTATTTATTATTT TAATTAATTTATTTATTTATTTATTTATTTATTTATTTATTATTT TAATTTATTTATTTATTTATTTATTTATTTATTTATTTATTTATTATTT
```

This is an example of a position with multiple alternate alleles in one of the input VCFs; each alternate allele is on its own line.

When I run GATK, I get the following warnings:

```
INFO 10:58:41,954 GenotypeConcordance - Eval or Comp Rod at position 22:17236819 has multiple records. Resolving.
INFO 10:58:41,967 GenotypeConcordance - Eval or Comp Rod at position 22:17246374 has multiple records. Resolving.
WARN 10:58:41,973 GenotypeConcordance - Eval or Comp Rod at position 22:17278678 has multiple records of the same type. This locus will be skipped.
WARN 10:58:41,983 GenotypeConcordance - Eval or Comp Rod at position 22:17278679 has multiple records of the same type. This locus will be skipped.
WARN 10:58:42,000 GenotypeConcordance - Eval or Comp Rod at position 22:17279066 has multiple records of the same type. This locus will be skipped.
```

```
# GATK 3.7 GenotypeConcordance: compare the -eval calls against the -comp calls
java -jar ~/softwares/GenomeAnalysisTK-3.7/GenomeAnalysisTK.jar \
-T GenotypeConcordance \
-R $REF \
-o test.out \
-eval normalized_CHR22_STR.vcf \
-comp plink_STRs_68_0.8_versionFixed.vcf
```
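The warnings above are emitted for positions that occur on more than one line of a VCF, as in the input example where each alternate allele at 22:17307808 is a separate record. A minimal sketch, assuming an uncompressed, tab-delimited VCF (the function name `dup_positions` is hypothetical, not part of GATK), lists those positions and their record counts so you can see which loci are affected before running GenotypeConcordance:

```shell
#!/bin/sh
# Sketch: print every CHROM:POS that appears on more than one data line of an
# uncompressed, tab-delimited VCF, followed by the number of records there.
# GenotypeConcordance logs such positions as having "multiple records".
# dup_positions is a hypothetical helper name, not a GATK command.
dup_positions() {
  awk -F'\t' '
    /^#/ { next }                     # skip ## meta lines and the #CHROM header
    { count[$1 ":" $2]++ }            # tally records per CHROM:POS
    END {
      for (pos in count)
        if (count[pos] > 1) print pos "\t" count[pos]
    }
  ' "$1" | sort -t: -k1,1 -k2,2n     # awk for-in order is unspecified, so sort
}

# Usage (file name from the command above):
#   dup_positions plink_STRs_68_0.8_versionFixed.vcf
```

Running it on both the `-eval` and `-comp` files shows whether the duplicated positions come from one input or both.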

How can I compare the two files when one of the VCFs contains multi-allelic sites?

Yours,
dummygenome


PS: I can't get the backticks to work for formatting my code and input data.

Answers

  • AdelaideR Unconfirmed, Member, Broadie, Moderator admin

    @dummygenome

    I think you need to hit return after typing the three backticks to start a code block. For inline code you can just use one backtick.

    As for your question, according to the documentation found here:

    "Site-level allelic concordance

    For strictly bi-allelic VCFs, only the ALLELES_MATCH, EVAL_ONLY, TRUTH_ONLY fields will be populated, but where multi-allelic sites are involved counts for EVAL_SUBSET_TRUTH and EVAL_SUPERSET_TRUTH will be generated."
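To illustrate what those site-level categories mean, here is a simplified sketch that compares only the sets of ALT alleles from an eval record and a truth (comp) record at a single site. It is not GATK's implementation: it ignores REF, genotypes, and filters, the function name `site_concordance` is hypothetical, and the fallback label `ALLELES_DO_NOT_MATCH` for partially overlapping allele sets is an assumption about how that remaining case is reported.

```shell
#!/bin/sh
# Hypothetical sketch, NOT GATK's implementation: classify one site by
# comparing the ALT allele sets of the eval and truth (comp) records.
# $1 = eval ALT alleles, $2 = truth ALT alleles, each comma-separated;
# an empty string means that file has no record at the site.
site_concordance() {
  awk -v eval_alts="$1" -v truth_alts="$2" 'BEGIN {
    ne = split(eval_alts, ev, ",")
    nt = split(truth_alts, tv, ",")
    if (ne == 0 && nt == 0) exit 1          # no record in either file
    if (nt == 0) { print "EVAL_ONLY"; exit 0 }
    if (ne == 0) { print "TRUTH_ONLY"; exit 0 }
    for (i = 1; i <= ne; i++) in_eval[ev[i]] = 1
    for (i = 1; i <= nt; i++) in_truth[tv[i]] = 1
    eval_extra = 0; truth_extra = 0         # alleles unique to each side
    for (a in in_eval)  if (!(a in in_truth)) eval_extra++
    for (a in in_truth) if (!(a in in_eval))  truth_extra++
    if (!eval_extra && !truth_extra) print "ALLELES_MATCH"
    else if (!truth_extra)           print "EVAL_SUPERSET_TRUTH"
    else if (!eval_extra)            print "EVAL_SUBSET_TRUTH"
    else                             print "ALLELES_DO_NOT_MATCH"  # assumed label
  }'
}
```

For example, `site_concordance "C,A" "A"` prints `EVAL_SUPERSET_TRUTH`, because the eval record carries every truth allele plus one more. With strictly bi-allelic records each side has a single ALT, so only a match or a one-sided record is possible, which is consistent with the documentation's note that the subset/superset counts only appear at multi-allelic sites.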
