Filter Error: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '10'

THP (Athens, GA) · Member

Hi!
I'm using GATK to call SNPs in an F2 mapping population genotyped by GBS, so my reference is derived from all my GBS lines. Running GATK's UnifiedGenotyper worked great: it generated perfectly valid output and a VCF I could view in Excel and IGV. However, I'm now trying to filter for just biallelic SNPs using SelectVariants, and I keep getting this error:

ERROR MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '10'

I'm using the same reference files (.fa, .fai, .dict, etc.) as for the UnifiedGenotyper run, and haven't modified them in any way. They've never been processed by Windows software, and I can't find any non-standard formatting or data (at least on a cursory look). Any thoughts?

Thanks!
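The error means the reference reader hit a byte it does not accept as a base, and a cursory look in an editor can easily miss such bytes. Below is a minimal Python sketch that scans every sequence line byte by byte. It assumes the accepted alphabet is the IUPAC nucleotide codes in either case; that set is an illustrative assumption, not GATK's exact rule. It strips only the trailing newline, so Windows carriage returns show up as byte 13, and it reports blank lines, since the newline (byte 10) itself never appears inside a stripped line.

```python
import sys
from collections import Counter

# Assumption: the reader accepts IUPAC nucleotide codes in either case.
# This set is illustrative and is not taken from GATK's source code.
IUPAC = frozenset(b"ACGTURYKMSWBDHVNacgturykmswbdhvn")


def scan_fasta(path):
    """Return (problems, bad_byte_counts) for the sequence lines of a FASTA file.

    problems is a list of (line_number, description) tuples, one per offending line.
    """
    problems = []
    bad_counts = Counter()
    with open(path, "rb") as handle:
        for lineno, raw in enumerate(handle, start=1):
            # Strip only '\n' so a Windows '\r' stays visible as byte 13.
            line = raw.rstrip(b"\n")
            if line.startswith(b">"):
                continue  # header line; sequence names are not checked
            if not line:
                problems.append((lineno, "blank line"))
                continue
            bad = [byte for byte in line if byte not in IUPAC]
            if bad:
                bad_counts.update(bad)
                problems.append((lineno, f"non-IUPAC byte values {sorted(set(bad))}"))
    return problems, bad_counts


if __name__ == "__main__" and len(sys.argv) > 1:
    found, counts = scan_fasta(sys.argv[1])
    for lineno, what in found[:20]:  # print only the first few offenders
        print(f"line {lineno}: {what}")
    print(f"{len(found)} problem line(s); byte counts: {dict(counts)}")
```

Saved under a hypothetical name such as scan_fasta.py, it would be run as python scan_fasta.py FM_ref.fa. Reading in binary mode keeps the raw byte values, which is what the GATK message reports.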


Answers

  • Sheila (Broad Institute) · Member, Broadie, Moderator, admin

    @THP
    Hi,

    Can you tell me what version of GATK you are using and the exact command you ran for SelectVariants?

    Also, this thread may have some helpful hints: http://gatkforums.broadinstitute.org/discussion/2917/error-message-bad-input-we-encountered-a-non-standard-non-iupac-base-in-the-provided-reference-13

    -Sheila

  • THP (Athens, GA) · Member

    Hi Sheila,
    I did look at that discussion thread and have tried a few of the things suggested. I've searched for extraneous symbols and empty lines, regenerated my reference index and .dict, and double-checked my naming scheme.

    I'm using GATK v3.4-0 (-g7e26428)

    My command is:
    java -jar /usr/local/gatk/latest/GenomeAnalysisTK.jar -T SelectVariants -R FM_ref.fa -o biallelic_FM_SNPsZCL.vcf --variant raw_SNPs_FM.vcf -restrictAllelesTo BIALLELIC

    Let me know if I've done something foolish.

    Thanks for your help!

    Tom

  • Sheila (Broad Institute) · Member, Broadie, Moderator, admin

    @THP
    Hi Tom,

    Is there any reason you are using UnifiedGenotyper over HaplotypeCaller? We recommend using HaplotypeCaller instead of UnifiedGenotyper. Can you try validating your VCF with ValidateVariants? https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_ValidateVariants.php

    -Sheila

  • THP (Athens, GA) · Member

    Thanks for your help, Sheila.

    We've used UnifiedGenotyper…mostly due to lab culture? In this case, we're working with a tetraploid plant, and my impression was that UG dealt with ploidy better than HC (at least historically). No longer true?

    I ran ValidateVariants, as well as another filter (by allele frequency, using VariantFiltration), and got exactly the same error:

    ERROR MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '10'

    Should I rerun GATK with HaplotypeCaller? (I realize this is likely the answer... it just took two weeks last time.)

    Thanks!

    Tom

  • Geraldine_VdAuwera (Cambridge, MA) · Member, Administrator, Broadie, admin

    "UG dealt with ploidy better than HC"

    Historically yes, but that is no longer true. Best to re-run with HC, sorry! But the results will be better.

    Not that this will fix your reference problem, though... Do you have a backed-up copy of your reference, in case this is the result of random file corruption?

  • pdexheimer · Member ✭✭✭✭

    I know you said that your reference hasn't been modified, but I think it must have been in some way. If you get this error with one GATK tool, you'll get it with all of them - the same code processes the reference the same way, no matter which individual analysis you run.

    What happens when you run grep -c '[[:cntrl:]]' FM_ref.fa? Character 10, at least in ASCII, is a non-printable control character, so I wouldn't expect a cursory look to find it.
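For context on the grep check: in ASCII, byte 10 is the line feed (newline). grep works line by line, so grep -c '[[:cntrl:]]' counts lines containing other control characters such as carriage returns or tabs, but it cannot match the newline itself. One plausible way for a newline to be read as a base (an assumption, not confirmed in this thread) is a FASTA whose lines are not a uniform width within a sequence. Readers that use the .fai index locate 0-based position pos at byte offset + (pos // linebases) * linewidth + pos % linebases, which only works if every line except the last in a sequence has the same length. The sketch below flags records that break that assumption. The function names are hypothetical, and the sketch is not part of GATK or samtools.

```python
import sys


def _record_issues(name, lengths):
    """Return problems for one record, given the length of each sequence line."""
    issues = []
    if not lengths:
        return issues
    if 0 in lengths:
        issues.append((name, "blank line inside or after the record"))
    width = lengths[0]
    # Every line except the last must match the first; the last may be shorter.
    if any(n != width for n in lengths[:-1]) or lengths[-1] > width:
        issues.append((name, f"irregular line lengths {sorted(set(lengths))}"))
    return issues


def check_line_widths(path):
    """Flag FASTA records whose sequence lines do not have a uniform width."""
    issues, name, lengths = [], None, []
    with open(path, "rb") as handle:
        for raw in handle:
            line = raw.rstrip(b"\r\n")  # line endings are not counted as bases
            if line.startswith(b">"):
                if name is not None:
                    issues.extend(_record_issues(name, lengths))
                # Use the first word of the header as the record name.
                name = (line[1:].split() or [b"?"])[0].decode(errors="replace")
                lengths = []
            elif name is not None:
                lengths.append(len(line))
    if name is not None:
        issues.extend(_record_issues(name, lengths))
    return issues


if __name__ == "__main__" and len(sys.argv) > 1:
    for contig, problem in check_line_widths(sys.argv[1]):
        print(f"{contig}: {problem}")
```

Any record this reports is a candidate for rewrapping the FASTA to a fixed width and then rebuilding the .fai and .dict, since those were generated from the irregular file.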
