Compare VCF files for SVs

dummygenome Member
edited January 30 in Ask the GATK team
Hello there,

I'd like to compare two VCF files using the GATK GenotypeConcordance tool.

Input example:
```
22 17307808 22:17307808_TAATTAATTTATTTATTTATTTATTTATTTATTTATTTATTATTT_TAATTAATTTATTTATTTATTTATTTATTATTT TAATTAATTTATTTATTTATTTATTTATTTATTTATTTATTATTT TAATTAATTTATTTATTTATTTATTTATTATTT
22 17307808 22:17307808_TAATTAATTTATTTATTTATTTATTTATTTATTTATTTATTATTT_TAATTTATTTATTTATTTATTTATTTATTTATTTATTATTT TAATTTATTTATTTATTTATTTATTTATTTATTTATTATTT TAATTAATTTATTTATTTATTTATTTATTTATTTATTTATTATTT
22 17307808 22:17307808_TAATTAATTTATTTATTTATTTATTTATTTATTTATTTATTATTT_TTATTTATTTATTTATTTATTTATTTATTTATTTATTATTT TAATTAATTTATTTATTTATTTATTTATTTATTTATTTATTATTT TTATTTATTTATTTATTTATTTATTTATTTATTTATTATTT
22 17307808 22:17307808_TAATTAATTTATTTATTTATTTATTTATTTATTTATTTATTATTT_TAATTTATTTATTTATTTATTTATTTATTTATTTATTAATTATTT TAATTAATTTATTTATTTATTTATTTATTTATTTATTTATTATTT TAATTTATTTATTTATTTATTTATTTATTTATTTATTAATTATTT
22 17307808 22:17307808_TAATTAATTTATTTATTTATTTATTTATTTATTTATTTATTATTT_TAATTTATTTATTTATTTATTTATTTATTTATTTATTTATTATTT TAATTAATTTATTTATTTATTTATTTATTTATTTATTTATTATTT TAATTTATTTATTTATTTATTTATTTATTTATTTATTTATTATTT
22 17307808 22:17307808_TAATTAATTTATTTATTTATTTATTTATTTATTTATTTATTATTT_TAATTTATTTATTTATTTATTTATTTATTTATTTATTTATTTATTATTT TAATTAATTTATTTATTTATTTATTTATTTATTTATTTATTATTT TAATTTATTTATTTATTTATTTATTTATTTATTTATTTATTTATTATTT
```

This is an example of a single position with multiple alternate alleles from one of the input VCFs, where each alternate allele is stored as a separate record.
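The example repeats position 22:17307808 on six lines, one per alternate allele. A quick way to list every position that carries more than one record (the situation the GenotypeConcordance warnings below appear to be about) is to count data lines per CHROM/POS. This is a minimal sketch, assuming a tab-delimited VCF read as plain text lines; the function name is hypothetical and not part of GATK.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def positions_with_multiple_records(vcf_lines: Iterable[str]) -> dict[tuple[str, int], int]:
    """Return {(CHROM, POS): record_count} for positions seen more than once.

    Assumes tab-delimited VCF data lines; header/meta lines ('#') and
    blank lines are skipped. Only the first two columns are inspected.
    """
    counts: Counter = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # not a data record
        chrom, pos = line.split("\t", 2)[:2]
        counts[(chrom, int(pos))] += 1
    return {key: n for key, n in counts.items() if n > 1}


# Usage with a tiny hypothetical VCF excerpt
records = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "22\t17307808\t.\tTAAT\tT",
    "22\t17307808\t.\tTAAT\tTAATAAT",
    "22\t17307900\t.\tA\tG",
]
print(positions_with_multiple_records(records))  # {('22', 17307808): 2}
```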

When I run GATK (command below), I get the following messages:

```
INFO 10:58:41,954 GenotypeConcordance - Eval or Comp Rod at position 22:17236819 has multiple records. Resolving.
INFO 10:58:41,967 GenotypeConcordance - Eval or Comp Rod at position 22:17246374 has multiple records. Resolving.
WARN 10:58:41,973 GenotypeConcordance - Eval or Comp Rod at position 22:17278678 has multiple records of the same type. This locus will be skipped.
WARN 10:58:41,983 GenotypeConcordance - Eval or Comp Rod at position 22:17278679 has multiple records of the same type. This locus will be skipped.
WARN 10:58:42,000 GenotypeConcordance - Eval or Comp Rod at position 22:17279066 has multiple records of the same type. This locus will be skipped.
```

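```
# GATK 3.7: -eval is the callset being evaluated, -comp is the comparison (truth) callset
java -jar ~/softwares/GenomeAnalysisTK-3.7/GenomeAnalysisTK.jar \
-T GenotypeConcordance \
-R $REF \
-o test.out \
-eval normalized_CHR22_STR.vcf \
-comp plink_STRs_68_0.8_versionFixed.vcf
```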

How can I run the comparison when one of the VCFs contains multi-allelic sites?
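If the warnings mean what they say, GenotypeConcordance skips a locus when one file holds several records of the same type there, which is exactly how the example stores its alternate alleles. One possible workaround is to join such split records into a single multi-allelic record before running the tool; `bcftools norm -m +any` does this join properly, including genotypes. The Python below is only a minimal sketch of the idea under stated assumptions: tab-delimited input, records joined only when CHROM, POS and REF all agree, and every column after ALT (including genotypes) dropped. Note that in some of the example lines the two allele columns appear in the opposite order from the ID, so those records would not share a REF and would need normalizing first. The function name is hypothetical.

```python
from __future__ import annotations

from typing import Iterable


def join_split_records(vcf_lines: Iterable[str]) -> list[str]:
    """Join data lines sharing CHROM, POS and REF into one multi-allelic line.

    Sketch only: keeps CHROM, POS, ID, REF and ALT, drops all other columns
    (including genotypes) and skips header lines.
    """
    ids: dict[tuple[str, str, str], list[str]] = {}
    alts: dict[tuple[str, str, str], list[str]] = {}
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # header, meta and blank lines are not records
        chrom, pos, vid, ref, alt = line.rstrip("\n").split("\t")[:5]
        key = (chrom, pos, ref)
        key_ids = ids.setdefault(key, [])
        if vid != "." and vid not in key_ids:
            key_ids.append(vid)
        key_alts = alts.setdefault(key, [])
        for allele in alt.split(","):
            if allele not in key_alts:
                key_alts.append(allele)  # first-seen order, no duplicates
    merged = []
    for key in alts:  # dicts keep insertion order, so input order is preserved
        chrom, pos, ref = key
        record_id = ";".join(ids[key]) or "."  # VCF uses "." for a missing ID
        merged.append("\t".join([chrom, pos, record_id, ref, ",".join(alts[key])]))
    return merged


# Usage with hypothetical records (IDs id1/id2 are made up)
records = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "22\t17307808\tid1\tTAAT\tT",
    "22\t17307808\tid2\tTAAT\tTAATAAT",
    "22\t17307900\t.\tA\tG",
]
print(join_split_records(records))
# ['22\t17307808\tid1;id2\tTAAT\tT,TAATAAT', '22\t17307900\t.\tA\tG']
```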

Yours,
dummygenome


PS: I was unable to get the backticks working to format my code and input data.

Answers

  • AdelaideR Unconfirmed, Member, Broadie, Moderator admin

    @dummygenome

    I think you need to press Return after typing the three backticks to start a code block. For inline code you can just use one backtick.

    As for your question, according to the documentation found here:

    "Site-level allelic concordance

    For strictly bi-allelic VCFs, only the ALLELES_MATCH, EVAL_ONLY, TRUTH_ONLY fields will be populated, but where multi-allelic sites are involved counts for EVAL_SUBSET_TRUTH and EVAL_SUPERSET_TRUTH will be generated."
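To make the quoted categories concrete, here is a minimal sketch of how a single site could be bucketed by comparing the alternate-allele sets of the eval and truth records. This is our reading of the category names, not GATK's implementation; the function name is hypothetical, and the catch-all label `ALLELES_DO_NOT_MATCH` for partially overlapping or disjoint allele sets is an assumption.

```python
from __future__ import annotations

from typing import Optional


def site_concordance(eval_alts: Optional[set[str]], truth_alts: Optional[set[str]]) -> str:
    """Bucket one site by comparing eval vs truth alternate alleles.

    None means the site is absent from that callset. Names follow the quoted
    documentation; ALLELES_DO_NOT_MATCH is an assumed catch-all bucket.
    """
    if eval_alts is None and truth_alts is None:
        raise ValueError("site must be present in at least one callset")
    if truth_alts is None:
        return "EVAL_ONLY"
    if eval_alts is None:
        return "TRUTH_ONLY"
    if eval_alts == truth_alts:
        return "ALLELES_MATCH"
    if eval_alts < truth_alts:  # proper subset: truth has extra alleles
        return "EVAL_SUBSET_TRUTH"
    if eval_alts > truth_alts:  # proper superset: eval has extra alleles
        return "EVAL_SUPERSET_TRUTH"
    return "ALLELES_DO_NOT_MATCH"


print(site_concordance({"T"}, {"T", "TAATAAT"}))  # EVAL_SUBSET_TRUTH
print(site_concordance({"T"}, None))              # EVAL_ONLY
```

With one alternate allele on each side, two sets can only be equal or different, never a proper subset or superset, which is consistent with the documentation's note that the subset/superset counts only appear when multi-allelic sites are involved.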
