
Filter Error: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '10'

THP (Athens, GA), Member

I'm using GATK to call SNPs for an F2 mapping population design using GBS, so my reference is derived from all my GBS lines. Running GATK with UnifiedGenotyper worked great, generating perfectly valid output and a VCF that I can view in Excel and IGV. However, I'm now trying to filter for just biallelic SNPs using SelectVariants, and I keep getting this error:

ERROR MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '10'

I'm using the same reference files (.fa, .fai, .dict, etc.) as for the original GATK run, and haven't modified them in any way. They've never been processed by Windows software, and I can't seem to find any non-standard formatting or data (at least in a cursory look). Any thoughts?




  • Sheila (Broad Institute), Member, Broadie admin


    Can you tell me what version of GATK you are using and the exact command you ran for SelectVariants?

    Also, this thread may have some helpful hints:


  • THP (Athens, GA), Member

    Hi Sheila,
    I did look at that discussion thread, and have tried a few of the things suggested. I've searched for any extraneous symbols or empty lines, regenerated my reference index & .dict, and double-checked my naming scheme.

    I'm using GATK v3.4-0 (-g7e26428)

    My command is:
    java -jar /usr/local/gatk/latest/GenomeAnalysisTK.jar -T SelectVariants -R FM_ref.fa -o biallelic_FM_SNPsZCL.vcf --variant raw_SNPs_FM.vcf -restrictAllelesTo BIALLELIC

    Let me know if I've done something foolish.

    Thanks for your help!


  • Sheila (Broad Institute), Member, Broadie admin

    Hi Tom,

    Is there any reason you are using UnifiedGenotyper over HaplotypeCaller? We recommend using HaplotypeCaller instead of UnifiedGenotyper. Can you try validating your VCF with ValidateVariants?


  • THP (Athens, GA), Member

    Thanks for your help, Sheila.

    We've used UnifiedGenotyper…mostly due to lab culture? In this case, we're working with a tetraploid plant, and my impression was that UG dealt with ploidy better than HC (at least historically). No longer true?

    I ran ValidateVariants, as well as other filters (by allele frequency using VariantFiltration), and got the exact same error:

    ERROR MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '10'

    Should I rerun the GATK with HaplotypeCaller? (I realize this is likely the answer…it just took 2 weeks last time)



  • Geraldine_VdAuwera (Cambridge, MA), Member, Administrator, Broadie admin

    "UG dealt with ploidy better than HC"

    Historically yes, no longer true. Best to re-run with HC -- sorry! But results will be better.

    Not that this will fix your reference problem, though... Do you have a backed up copy of your reference just in case this is the result of a random file corruption?
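One way to test the corruption theory, if a backup exists, is to compare checksums of the two files. The following is a minimal Python sketch, not part of GATK; the helper names and the backup filename in the usage comment are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks so a
    large reference does not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a, path_b):
    """True if both files have identical bytes (compared by SHA-256)."""
    return sha256_of(path_a) == sha256_of(path_b)


# Usage (the backup filename is an assumption, not from the thread):
#   same_contents("FM_ref.fa", "FM_ref.backup.fa")
```

If the digests differ, the working copy has changed since the backup was made, and rebuilding the .fai and .dict from the backup copy would be a reasonable next step.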

  • pdexheimer, Member ✭✭✭✭

    I know you said that your reference hasn't been modified, but I think it must have been in some way. If you get this error with one GATK tool, you'll get it with all of them - the same code processes the reference the same way, no matter which individual analysis you run.

    What happens when you run grep -c '[[:cntrl:]]' FM_ref.fa? Character 10, at least in ASCII, is a non-printable control character - so I wouldn't expect a cursory look to find it.
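A count from grep says whether odd bytes exist but not where they are. As a complement, here is a minimal Python sketch that reports the line, column, and byte value of anything in a sequence line that is not an IUPAC nucleotide code. It is not how GATK itself reads the reference. The function name is hypothetical, the set of accepted letters is an assumption (GATK's own list may differ), and flagging blank lines as byte 10 (the ASCII line feed) is a guess at one way a parser could meet that byte where it expects a base.

```python
# IUPAC nucleotide codes, upper and lower case.
# Assumption: this is the "standard" set; GATK's accepted set may differ.
IUPAC = set(b"ACGTURYSWKMBDHVNacgturyswkmbdhvn")


def find_bad_bases(path, limit=10):
    """Return up to `limit` (line_number, column, byte_value) tuples for
    bytes in FASTA sequence lines that are not IUPAC nucleotide codes."""
    hits = []
    with open(path, "rb") as handle:
        for line_number, raw in enumerate(handle, start=1):
            if raw.startswith(b">"):
                continue  # header line, not sequence
            # Strip only the trailing "\n" so a stray "\r" stays visible.
            line = raw[:-1] if raw.endswith(b"\n") else raw
            if not line:
                # Blank line: the only byte on it is the line feed (10).
                hits.append((line_number, 1, 10))
            for column, value in enumerate(line, start=1):
                if value not in IUPAC:
                    hits.append((line_number, column, value))
            if len(hits) >= limit:
                return hits[:limit]
    return hits


# Usage: find_bad_bases("FM_ref.fa")
```

Each returned tuple can be checked against the file directly, for example by opening the reported line in an editor that shows control characters. A value of 13 would point to Windows-style line endings, and 9 to a tab.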
