
Error: Input files reference and features have incompatible cont; while running GATK SelectVariants

Hi All,

While running SelectVariants to split SNPs and indels from the 1000GP phase 3 VCF files, I get the error "A USER ERROR has occurred: Input files reference and features have incompatible contigs: No overlapping contigs found."

Files used to run SelectVariants:

1. VCF file downloaded from http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/GRCh38_positions/ (this genotype VCF is lifted over to hg38)

2. Reference file downloaded from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa

3. Gatk version: GATK_4.1.2.0

Command used:

gatk SelectVariants -R /data1/ref-gen/GRCh38_full_analysis_set_plus_decoy_hla.fa -V ALL.chr22_GRCh38.genotypes.20170504.vcf.gz --select-type-to-include SNP -O outputChr22.vcf

I see a difference between the headers: the 1000GP VCF file has "<ID=1", whereas dbSNP and other VCFs have "<ID=chr1". Could this be causing the problem?

1000GP VCF:
##source=1000GenomesPhase3Pipeline
##contig=<ID=1,assembly=b37,length=249250621>
##contig=<ID=2,assembly=b37,length=243199373>
##contig=<ID=3,assembly=b37,length=198022430>
##contig=<ID=4,assembly=b37,length=191154276>
##contig=<ID=5,assembly=b37,length=180915260>
##contig=<ID=6,assembly=b37,length=171115067>
##contig=<ID=7,assembly=b37,length=159138663>
##contig=<ID=8,assembly=b37,length=146364022>
##contig=<ID=9,assembly=b37,length=141213431>
------------------------------------------------------------------------------
Other VCFs have:
##contig=<ID=chr1,length=248956422>
##contig=<ID=chr2,length=242193529>
##contig=<ID=chr3,length=198295559>
##contig=<ID=chr4,length=190214555>
##contig=<ID=chr5,length=181538259>
##contig=<ID=chr6,length=170805979>

Could anyone help me solve this error and explain exactly why it occurs?

Thanks,
Nitha
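
The header difference described in the question can be checked directly before running GATK. The following is a minimal sketch, not a GATK feature: the function names are hypothetical, and it assumes the reference has a FASTA index (.fai) next to it and that every header line has the form `##contig=<ID=NAME,...>`, as in the excerpts above.

```shell
# Print the contig IDs declared in a VCF header read from standard input.
# Assumes each line has the form ##contig=<ID=NAME,...>, as in the headers above.
vcf_contigs() {
    awk '!/^#/ { exit }                      # stop at the first data record
         /^##contig=<ID=/ {
             sub(/^##contig=<ID=/, "")       # drop everything before the name
             sub(/[,>].*/, "")               # drop everything after the name
             print
         }' | LC_ALL=C sort -u
}

# Print the contig names in a FASTA index (.fai) read from standard input;
# the name is the first tab-separated column.
fai_contigs() {
    cut -f1 | LC_ALL=C sort -u
}

# Print the names present in both files.
# $1 is an uncompressed VCF, $2 is a .fai index (both paths are placeholders).
shared_contigs() {
    vcf_names=$(mktemp) || return 1
    ref_names=$(mktemp) || return 1
    vcf_contigs < "$1" > "$vcf_names"
    fai_contigs < "$2" > "$ref_names"
    LC_ALL=C comm -12 "$vcf_names" "$ref_names"
    rm -f "$vcf_names" "$ref_names"
}
```

For a compressed file, `gzip -dc ALL.chr22_GRCh38.genotypes.20170504.vcf.gz | vcf_contigs` lists the VCF side. If `shared_contigs` prints nothing, the two files have no contig name in common, which is the situation the "No overlapping contigs found" message describes. When the names differ only by the `chr` prefix, renaming the VCF contigs (for example with `bcftools annotate --rename-chrs`) is a commonly suggested approach; whether that alone is sufficient for these particular files is not confirmed in this thread.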

Answers

  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)

    Hi @nitha

    This error has been resolved a few times on this forum. Take a look at older threads and you should find some useful information there.

  • nitha (India; Member)
    edited December 2019

    @bhanuGandham Thanks. I will search it.

  • nitha (India; Member)

    Hi @bhanuGandham

    I tried all the options suggested on the forum, but my problem is still not fixed. I am also posting my LiftoverVcf issues in this same post, to show the other approaches I have tried.

    Initially, I ran SelectVariants with the latest GRCh38 reference genome and got the error. After reading various GATK posts, I downloaded the reference genome from the GATK resource bundle, but the error persisted. I then downloaded the same reference that 1kGP used to lift over the phase 3 VCF files, and the error is still not fixed.

    Following your suggestion ("This error has been resolved a few times on this forum. Take a look at older threads and you should find some useful information there."), I went through the posts below, which relate to my error and include support from @Sheila and @Geraldine_VdAuwera, and tried their fixes. None of them resolved my error.

    1) https://gatkforums.broadinstitute.org/gatk/discussion/11359/input-files-reference-and-features-have-incompatible-contigs
    
    2) https://gatkforums.broadinstitute.org/gatk/discussion/63/errors-about-input-files-having-missing-or-incompatible-contigs
    
    3) https://gatkforums.broadinstitute.org/gatk/discussion/2604/input-files-known-and-reference-have-incompatible-contigs
    
    4) https://gatkforums.broadinstitute.org/gatk/discussion/13468/baserecalibrator-a-user-error-has-occurred-input-files-reference-and-features-have-incompatible
    

    Download sources for the files:

    1. 1000GP VCF files (lifted over to hg38): http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/GRCh38_positions/
    2. 1000GP reference file: http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/
    
    A) Using these files, I ran gatk SelectVariants (gatk SelectVariants -R /data1/ref-gen/GRCh38_full_analysis_set_plus_decoy_hla.fa -V ALL.chr22_GRCh38.genotypes.20170504.vcf.gz --select-type-to-include SNP -O 1k_Chr22_SNPs.vcf)
    
    **Note:** I tried both compressed (.vcf.gz) and decompressed (.vcf) input files.
    
    I still get the error "A USER ERROR has occurred: Input files reference and features have incompatible contigs: No overlapping contigs found."
    
    B) When running "gatk SelectVariants" on the recently updated 1kGP VCF files with their own reference file "GRCh38_full_analysis_set_plus_decoy_hla.fa", I get a new error: "Error initializing feature reader for path ALL.chr22.shapeit2_integrated_v1a.GRCh38.20181129.phased.vcf.gz: Caused by: htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: Your input file has a malformed header: VCFv4.3 is not a supported version, for input source:"
    How can I solve this VCF version problem in GATK? Many other posts, such as https://www.biostars.org/p/138514/, just suggest changing the version number in the VCF file. After doing this, no output is returned, and the log file shows "htsjdk.tribble.TribbleException: Contig 22 does not have a length field." How can I correct it?
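
The second exception points at the header itself, so it can help to look at what the header declares before and after editing the version line. This is a small sketch with hypothetical function names; it only reports the declared version and the `##contig` lines that carry no `length=` field, and it assumes those lines start with `##contig=<ID=`.

```shell
# Print the VCF version declared on the ##fileformat line (read from stdin).
vcf_version() {
    awk '/^##fileformat=/ { sub(/^##fileformat=/, ""); print; exit }'
}

# Print the IDs of ##contig header lines that have no length= field (read from stdin).
contigs_without_length() {
    awk '!/^#/ { exit }                          # stop at the first data record
         /^##contig=<ID=/ && !/length=/ {
             sub(/^##contig=<ID=/, "")           # drop everything before the name
             sub(/[,>].*/, "")                   # drop everything after the name
             print
         }'
}
```

For example, `gzip -dc ALL.chr22.shapeit2_integrated_v1a.GRCh38.20181129.phased.vcf.gz | contigs_without_length` would list the contigs the parser is complaining about. Changing only the version string leaves those lines untouched, which is consistent with the "Contig 22 does not have a length field" message appearing afterwards.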
    

    --2nd issue: LiftoverVcf--

    I used the older 1kGP phase 3 VCF, which is aligned to GRCh37 (b37), and tried two ways: with the UCSC reference and chain, and with the GATK resource bundle reference and chain. LiftoverVcf worked, but I got some warnings.
    
    Please explain these warnings and how I should clear them. Will these warnings cause problems in my results?
    
    ***********Warning**************
    INFO    2019-12-16 11:58:21     LiftOver        Interval 22:16050654-16063474 failed to match chain 135 because intersection length 12817 < minMatchSize 12821.0 (0.999688 < 1.0)
    INFO    2019-12-16 11:58:31     LiftOver        Interval 22:16877156-16877157 failed to match chain 23 because intersection length 1 < minMatchSize 2.0 (0.5 < 1.0)
    ----
    ----
    INFO    2019-12-16 12:00:11     LiftoverVcf     1651 variants failed to liftover.
    INFO    2019-12-16 12:00:11     LiftoverVcf     1805 variants lifted over but had mismatching reference alleles after lift over.
    INFO    2019-12-16 12:00:11     LiftoverVcf     0.3132% of variants were not successfully lifted over and written to the output.
    INFO    2019-12-16 12:00:11     LiftoverVcf     liftover success by source contig:
    INFO    2019-12-16 12:00:11     LiftoverVcf     22: 1100091 / 1103547 (99.6868%)
    ---
    ---
    WARNING 2019-12-16 12:00:11     LiftoverVcf     1759 variants with a swapped REF/ALT were identified, but were not recovered.  See RECOVER_SWAPPED_REF_ALT and associated caveats.
    INFO    2019-12-16 12:00:13     LiftoverVcf     Writing out sorted records to final VCF.
    *********************************
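
The numbers in this log are consistent with each other, and reproducing them shows what each message measures. The sketch below uses hypothetical helper names and only the values printed above: an interval fails when the fraction of it covered by a chain is below the minimum match fraction shown in the log (1.0 here, which in Picard's LiftoverVcf is set by the `LIFTOVER_MIN_MATCH` argument), and the reported percentage is the sum of the two failure counts divided by the total for the contig.

```shell
# Fraction of an interval covered by a chain.
# $1: interval start, $2: interval end (inclusive), $3: intersection length
match_fraction() {
    awk -v s="$1" -v e="$2" -v n="$3" 'BEGIN { printf "%.6f\n", n / (e - s + 1) }'
}

# Percentage of variants not lifted over.
# $1: failed to lift over, $2: mismatching reference alleles, $3: total variants
failed_percent() {
    awk -v a="$1" -v b="$2" -v t="$3" 'BEGIN { printf "%.4f\n", 100 * (a + b) / t }'
}

match_fraction 16050654 16063474 12817   # prints 0.999688, below the 1.0 threshold
match_fraction 16877156 16877157 1       # prints 0.500000
failed_percent 1651 1805 1103547         # prints 0.3132
```

The interval 22:16050654-16063474 spans 12821 bases, of which 12817 fall inside chain 135, so it misses the threshold by four bases. The 1651 + 1805 = 3456 rejected variants are exactly the difference between 1103547 and the 1100091 reported as lifted over.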
    

    Please guide me in fixing these errors.

    Thanks!

  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)

    Hi @nitha

  • nitha (India; Member)
    edited December 2019

    @bhanuGandham, thank you. As you can see in my previous post, I had already listed this link and tried both possibilities: 1) running SelectVariants with the reference files from the GATK resource bundle, and 2) using the same reference files that the 1000GP files use. The same error still persists: **"Caused by: htsjdk.tribble.TribbleException$InvalidHeader: Your input file has a malformed header: VCFv4.3 is not a supported version"**

    So now I am using the updated 1KGP phase 3 files from http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/GRCh38_positions/. Since you said not to worry about the warnings from LiftoverVcf, for the time being I will proceed with the old phase 3 VCF files from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ and lift them over to hg38.
