Basic question about CombineVariants and the ESP 6500 exomes

Sorry for my ignorance, but the ESP release contains 24 VCF files in total. I tried to pass them all to CombineVariants, but I get an error.
I have also tried to merge the 24 files into one, with both vcftools and Picard, and both failed.

Does anybody know how to do this?

Thank you!


Best Answer


  • myoglu Member

    ERROR MESSAGE: Input files ESP6500SI-V2-SSA137.dbSNP138-rsIDs.snps_indels.vcf/ESP6500SI-V2-SSA137.updatedRsIds.chr1.snps_indels.vcf and reference have incompatible contigs: No overlapping contigs found.

  • myoglu Member

    I figured it out!

    Use this:

    perl -pe 's/^([^#])/chr$1/' file.vcf > out.vcf

    This adds the "chr" prefix to every non-header line. My reference names its contigs chr1, chr2, etc., while the ESP files use only 1, 2, etc.
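For readers who prefer not to rely on a regex one-liner, the same renaming can be sketched in Python. This is a minimal sketch, not the poster's method: the function names `add_chr_prefix` and `convert` are hypothetical, and it assumes the only difference between the two naming schemes is the "chr" prefix (it does not map names such as MT to chrM). Unlike the one-liner, it also renames `##contig` header lines and leaves blank lines alone.

```python
import re

# Matches the ID field of a "##contig=<ID=...>" header line.
_CONTIG_ID = re.compile(r"^(##contig=<ID=)([^,>]+)")


def _rename(name, prefix):
    """Add the prefix unless the contig name already carries it."""
    return name if name.startswith(prefix) else prefix + name


def add_chr_prefix(line, prefix="chr"):
    """Return one VCF line with the prefix added to its contig name."""
    if line.startswith("##contig=<ID="):
        # Header contig declarations should match the renamed records.
        return _CONTIG_ID.sub(
            lambda m: m.group(1) + _rename(m.group(2), prefix), line, count=1
        )
    if line.startswith("#") or not line.strip():
        # Other header lines and blank lines pass through untouched.
        return line
    chrom, sep, rest = line.partition("\t")
    return _rename(chrom, prefix) + sep + rest


def convert(src_path, dst_path, prefix="chr"):
    """Copy a VCF file, renaming contigs line by line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(add_chr_prefix(line, prefix))
```

A call such as `convert("file.vcf", "out.vcf")` would mirror the command above. As noted later in the thread, this is only safe when both files were produced against equivalent reference builds.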

  • myoglu Member

    Sorry for all the comments, but I now get weird errors on some files:

    ERROR MESSAGE: Sequence name contains invalid character: chr13

    The same happens for chr14, chr16 and many more.

    And this one:

    ERROR GenomeAnalysisTK-2.7-2-g6bda569/ESP6500SI-V2-SSA137.dbSNP138-rsIDs.snps_indels.vcf/ESP6500SI-V2-SSA137.updatedRsIds.chr10.snps_indels_FIX.vcf contigs = [chr10]
    ERROR reference contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10,

    And this one:

    ERROR MESSAGE: Line 32: there aren't enough columns for line chr15 (we expected 9 tokens, and saw 1 )

    The same happens for chr10.
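Errors like the ones above (a contig the reference does not know, or a record with too few columns) can be located before handing the file to GATK. The following is a rough sketch under stated assumptions: `check_vcf_lines` is a hypothetical helper, the reference contig names are supplied by the caller, and `min_columns=8` reflects the eight fixed VCF columns (GATK's own message expected 9 here, so adjust as needed).

```python
def check_vcf_lines(lines, reference_contigs, min_columns=8):
    """Return (line_number, problem) tuples for suspicious VCF records."""
    allowed = set(reference_contigs)
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if text.startswith("#"):
            continue  # header lines are not checked here
        fields = text.split("\t")
        if len(fields) < min_columns:
            # Blank lines and space-separated records end up here.
            problems.append((number, f"only {len(fields)} tab-separated column(s)"))
            continue
        if fields[0] not in allowed:
            problems.append((number, f"contig {fields[0]!r} not in reference"))
    return problems
```

Passing an open file handle as `lines` works, since the function only iterates over it. The reported line numbers can then be inspected by hand.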

  • myoglu Member

    @Geraldine_VdAuwera said:
    myoglu, when the contig names are different it's a strong indication that the variants and alignments were generated using different reference builds. There can be important differences between reference builds that will affect your results. You can only use the solution you posted if you are absolutely sure that the references are equivalent.

    I have googled for hours now; it seems many people are struggling with the same problem.

    The reference builds are the same, so it should be OK.

    I had to manually recheck all chromosome names, replacing e.g. "chr11" with the EXACT SAME THING (chr11).
    Sometimes I also had to manually fix weird spacing errors. I also had to replace all "A" with "A" in the GATK output file (from VariantAnnotator). Very weird: they look the same, but were still not found by the "find" function in TextEdit.

    Finally I ran the VCFtools validation and got these messages for each of the 24 ESP files:


    The header tag 'reference' not present. (Not required but highly recommended.)
    The header tag 'contig' not present for CHROM=chrY. (Not required but highly recommended.)

    So the files look OK, but I still get the famous error when running CombineVariants:

    ERROR MESSAGE: Invalid command line: No tribble type was provided on the command line and the type of the file could not be determined dynamically. Please add an explicit type tag :NAME listing the correct type from among the supported types:

    So I checked the VCF version of the ESP files: they are VCF 4.1, which should be fine. But I'm stuck.

    What is wrong with these ESP files?

    Sorry to bother the GATK team, but do any of you other users know what causes this? I've seen more people struggling.

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    This sounds like a file encoding error -- are you using a Windows PC, or did you get the files from someone using Windows? We've seen encoding issues pop up when files are written or edited on a Windows machine.
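Characters that look identical but do not match in a search, as described above, are usually invisible or non-ASCII bytes. A small scan can show where they sit. This is only a sketch: `find_odd_bytes` is a hypothetical helper, and it assumes a clean VCF of this kind should contain nothing but printable ASCII, tabs and newlines, which will not hold for files with legitimate non-ASCII annotations.

```python
def find_odd_bytes(data):
    """Return (line_number, column, byte_value) for each suspicious byte."""
    hits = []
    for line_number, line in enumerate(data.split(b"\n"), start=1):
        for column, value in enumerate(line, start=1):
            printable = 0x20 <= value <= 0x7E
            # Tab (0x09) is the VCF field separator, so it is allowed.
            if not printable and value != 0x09:
                hits.append((line_number, column, value))
    return hits
```

Reading the file in binary mode, for example `find_odd_bytes(open(path, "rb").read())`, avoids any decoding step hiding the problem. A value of 13 would indicate a carriage return from Windows-style line endings; values of 128 and above indicate non-ASCII bytes.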

  • myoglu Member

    No, only OS X Mountain Lion all the way.

    I finally figured it out:

    Something was off with the VariantAnnotator VCF from GATK. I re-ran it and used the new file, and I also deleted the old index file.

    The manual correction of the ESP VCFs works! It's a bit tedious, but it does the job.

    Hope this helps anyone else getting this error!
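Since deleting the old index file was part of the fix, it may help to list index files that are older than the VCF they belong to. This is a sketch under assumptions: `stale_indexes` is a hypothetical helper, and it assumes the index sits next to the VCF with `.idx` appended to the file name and that modification times are trustworthy.

```python
import os


def stale_indexes(vcf_paths, index_suffix=".idx"):
    """Return index files that were last modified before their VCF."""
    stale = []
    for vcf in vcf_paths:
        index = vcf + index_suffix
        if os.path.exists(index) and os.path.getmtime(index) < os.path.getmtime(vcf):
            stale.append(index)
    return stale
```

The function only reports candidates; whether to delete them is left to the user.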

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Hmm, that's weird, but let's put it down to an unspecified system glitch. Thanks for reporting your solution.
