GenomicsDBImport returns duplicate field error from SelectVariants output

Hi,

I successfully ran Mutect2 in tumor-only mode with a PoN and germline resource. However, I didn't set the flag '--max-mnp-distance 0', so I get MNPs in my output. This results in GenomicsDBImport failing, as it doesn't support MNPs.
OK, so I then ran SelectVariants with '--select-type-to-exclude MNP' to remove them and used this as input to GenomicsDBImport instead. However, it errors out with the message that duplicate field names exist (see below).

What should/can I do to correct this? I'd prefer not to have to run Mutect2 again if possible.

Answers

  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)

    Hi @foxyjohn

    "However, it errors out with the message that duplicate field names exist (see below)."

    Looks like you have not added the error message here.

  • Arrrgghh, sorry my bad. I didn't save the log output either ...

    At any rate, SelectVariants added the line '##FORMAT=<ID=AF, ...>' to the header of its output VCF. But the input VCF already had a separate header line '##INFO=<ID=DP, ...>' generated by Mutect2.
    These two lines conflicted, causing GenomicsDBImport to error out with the duplicate field error.

    For now, I have hacked out the FORMAT line from the header as it wasn't even found in the data, so not sure why SelectVariants was putting it there. Not a pretty solution but ....

    I'd love to know how to efficiently avoid this problem in the future.

  • Apologies - that should have read '##INFO=<ID=AF, ...>'
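As a way to spot this kind of clash before running GenomicsDBImport, here is a minimal Python sketch that lists field IDs declared under both INFO and FORMAT in a VCF header. The function name and the example header lines are hypothetical, made up for illustration. Since the log was not saved, it is an assumption that a shared ID such as AF is exactly what GenomicsDBImport rejected here.

```python
import re
from collections import defaultdict

# Matches structured header lines such as ##INFO=<ID=AF,...> or ##FORMAT=<ID=AF,...>
HEADER_RE = re.compile(r"^##(INFO|FORMAT)=<ID=([^,>]+)")


def shared_field_ids(header_lines):
    """Return the IDs declared under both INFO and FORMAT, sorted."""
    kinds_by_id = defaultdict(set)
    for line in header_lines:
        match = HEADER_RE.match(line)
        if match:
            kind, field_id = match.groups()
            kinds_by_id[field_id].add(kind)
    return sorted(fid for fid, kinds in kinds_by_id.items() if len(kinds) > 1)


# Hypothetical header lines, not taken from the poster's actual VCF.
example_header = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=AF,Number=A,Type=Float,Description="hypothetical INFO field">',
    '##FORMAT=<ID=AF,Number=A,Type=Float,Description="hypothetical FORMAT field">',
    '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="hypothetical depth field">',
]

print(shared_field_ids(example_header))  # ['AF']
```

For a real file, the header lines could be collected by reading the VCF until the first line that does not start with '##'.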

  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)

    Hi @foxyjohn

    The best way to avoid this problem in the future is to follow the best practices and set '--max-mnp-distance 0' when running Mutect2. That option does not remove MNPs; it breaks them up into SNPs in individual records, whereas SelectVariants with '--select-type-to-exclude MNP' removes the MNP records entirely. In addition, there are several kinds of variant: INDEL, SNP, MIXED, MNP, SYMBOLIC and NO_VARIATION. With SelectVariants you are only removing the MNPs but not the MIXED variants, which can still cause the error. Hence the best way to do this is to follow the best practices.
    I hope this helps.
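To illustrate how a record can be MIXED rather than MNP, here is a rough Python sketch that classifies a record from its REF and ALT columns. It is a simplified approximation written for this thread, not GATK's own logic: the function names are hypothetical, and spanning-deletion ('*') and breakend alleles are not handled.

```python
def allele_type(ref, alt):
    """Classify a single ALT allele against REF (simplified rules)."""
    if alt in (".", ""):
        return "NO_VARIATION"
    if alt.startswith("<"):
        # Symbolic alleles such as <DEL> or <NON_REF>
        return "SYMBOLIC"
    if len(ref) == len(alt):
        # Same length: one base is a SNP, several bases an MNP
        return "SNP" if len(ref) == 1 else "MNP"
    return "INDEL"


def record_type(ref, alt_field):
    """Classify a record; ALT alleles of differing types make it MIXED."""
    types = {allele_type(ref, alt) for alt in alt_field.split(",")}
    return types.pop() if len(types) == 1 else "MIXED"


print(record_type("AC", "GT"))    # MNP
print(record_type("AC", "GT,A"))  # MIXED
```

In this sketch the second record carries a multi-base substitution alongside a deletion, so it counts as MIXED and an MNP-only exclusion would not drop it.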