Filter Error: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '10'

THP, Athens, GA (Member)

Hi!
I'm using GATK to call SNPs for an F2 mapping population design using GBS, so my reference is derived from all my GBS lines. Running GATK with UnifiedGenotyper worked great, generating perfectly valid output and a VCF I can view in Excel and IGV. However, I'm now trying to filter for just biallelic SNPs using SelectVariants, and I keep getting this error:

ERROR MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '10'

I'm using the same reference files (.fa, .fai, .dict, etc.) as for the GATK run, and I haven't modified them in any way. They've never been processed by Windows software, and I can't find any non-standard formatting or data (at least on a cursory look). Any thoughts?

Thanks!

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @THP
    Hi,

    Can you tell me what version of GATK you are using and the exact command you ran for SelectVariants?

    Also, this thread may have some helpful hints: http://gatkforums.broadinstitute.org/discussion/2917/error-message-bad-input-we-encountered-a-non-standard-non-iupac-base-in-the-provided-reference-13

    -Sheila

  • THP, Athens, GA (Member)

    Hi Sheila,
    I did look at that discussion thread and have tried a few of the things suggested. I've searched for any extraneous symbols or empty lines, rebuilt my reference index and .dict, and double-checked my naming scheme.

    I'm using GATK v3.4-0 (-g7e26428)

    My command is:
    java -jar /usr/local/gatk/latest/GenomeAnalysisTK.jar -T SelectVariants -R FM_ref.fa -o biallelic_FM_SNPsZCL.vcf --variant raw_SNPs_FM.vcf -restrictAllelesTo BIALLELIC

    Let me know if I've done something foolish.

    Thanks for your help!

    Tom
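For anyone repeating the "empty lines" check described in this post, here is a minimal Python sketch of one way to automate it. It flags blank lines and uneven line widths inside a FASTA record, since FASTA indexers such as samtools faidx expect every sequence line in a record except the last to have the same length. The function name check_fasta_layout is hypothetical, and whether this kind of layout problem is what triggers the error in this thread is an assumption, not something confirmed here.

```python
def check_fasta_layout(fasta_path):
    """Return (line_number, problem) tuples for blank lines and uneven line widths."""
    problems = []
    width = None        # length of the first sequence line in the current record
    short_line = None   # line number of a shorter line that may be the record's last
    with open(fasta_path, "rb") as handle:
        for line_no, raw in enumerate(handle, start=1):
            line = raw.rstrip(b"\r\n")
            if line.startswith(b">"):
                # New record: reset the expected width.
                width, short_line = None, None
                continue
            if not line:
                problems.append((line_no, "blank line"))
                continue
            if short_line is not None:
                # A short line is only acceptable as the last line of a record.
                problems.append((short_line, "short line before end of record"))
                short_line = None
            if width is None:
                width = len(line)
            elif len(line) > width:
                problems.append((line_no, "line longer than first line of record"))
            elif len(line) < width:
                short_line = line_no
    return problems
```

Calling check_fasta_layout("FM_ref.fa") on a cleanly wrapped reference should return an empty list. If it reports anything, re-wrapping the FASTA and regenerating the .fai and .dict files would be the natural next step.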

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @THP
    Hi Tom,

    Is there any reason you are using UnifiedGenotyper over HaplotypeCaller? We recommend using HaplotypeCaller instead of UnifiedGenotyper. Can you try validating your VCF with ValidateVariants? https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_ValidateVariants.php

    -Sheila

  • THP, Athens, GA (Member)

    Thanks for your help, Sheila.

    We've used UnifiedGenotyper…mostly due to lab culture? In this case, we're working with a tetraploid plant, and my impression was that UG dealt with ploidy better than HC (at least historically). No longer true?

    I ran ValidateVariants, as well as other filters (by allele frequency using VariantFiltration), and got the exact same error:

    ERROR MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '10'

    Should I rerun the GATK with HaplotypeCaller? (I realize this is likely the answer…it just took 2 weeks last time)

    Thanks!

    Tom

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    "UG dealt with ploidy better than HC"

    Historically yes, no longer true. Best to re-run with HC -- sorry! But results will be better.

    Not that this will fix your reference problem, though... Do you have a backed up copy of your reference just in case this is the result of a random file corruption?

  • I know you said that your reference hasn't been modified, but I think it must have been in some way. If you get this error with one GATK tool, you'll get it with all of them - the same code processes the reference the same way, no matter which individual analysis you run.

    What happens when you run grep -c '[[:cntrl:]]' FM_ref.fa? Character 10, at least in ASCII, is a non-printable control character (the line feed), so I wouldn't expect a cursory look to find it.
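As a complement to the grep check, here is a minimal Python sketch that reports exactly where unexpected bytes sit in the reference, using the same numeric byte values that appear in the GATK error message. The function name find_bad_bytes and the limit parameter are hypothetical, and the allowed set (upper- and lower-case IUPAC nucleotide codes only) is an assumption about what counts as a valid base.

```python
# IUPAC nucleotide codes in both cases; anything else in a sequence line is reported.
IUPAC = set(b"ACGTUNRYKMSWBDHVacgtunrykmswbdhv")

def find_bad_bytes(fasta_path, limit=20):
    """Return up to `limit` (line_number, column, byte_value) tuples for non-IUPAC bytes."""
    hits = []
    with open(fasta_path, "rb") as handle:
        for line_no, raw in enumerate(handle, start=1):
            # Strip only the trailing LF so a stray CR (byte 13) is still reported.
            line = raw.rstrip(b"\n")
            if line.startswith(b">"):
                continue  # header lines may contain arbitrary text
            for col, byte in enumerate(line, start=1):
                if byte not in IUPAC:
                    hits.append((line_no, col, byte))
                    if len(hits) >= limit:
                        return hits
    return hits
```

Running find_bad_bytes("FM_ref.fa") returns an empty list when every sequence line contains only IUPAC codes. Because the file is split on line feeds, byte 10 itself never appears in this output; if the scan comes back clean, the line layout of the file (blank lines, uneven line lengths) and the freshness of the .fai index would be the next things to look at.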
