

Basic question about CombineVariants and the ESP 6500 exomes

myoglu Member Posts: 39

Hi!
Sorry for my ignorance, but the ESP release contains 24 VCF files in total. I tried to include them all in CombineVariants, but I get an error.
I have tried to merge the 24 files into one, with both vcftools and Picard, and both failed.

Does anybody know what is going wrong?

Thank you!

Answers

  • myoglu Member Posts: 39

    ERROR MESSAGE: Input files ESP6500SI-V2-SSA137.dbSNP138-rsIDs.snps_indels.vcf/ESP6500SI-V2-SSA137.updatedRsIds.chr1.snps_indels.vcf and reference have incompatible contigs: No overlapping contigs found.
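A quick way to see what "No overlapping contigs found" refers to is to compare the contig names used in the VCF with those in the reference index. The following is a minimal sketch, assuming the reference has a samtools-style `.fai` index whose first tab-separated column is the contig name; the function names are hypothetical and not part of GATK.

```python
def fai_contigs(fai_lines):
    """Return contig names from the lines of a FASTA index (.fai)."""
    return [line.split("\t")[0] for line in fai_lines if line.strip()]


def vcf_contigs(vcf_lines):
    """Return the distinct CHROM values of the VCF data lines, in order of appearance."""
    seen = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom = line.split("\t", 1)[0]
        if chrom not in seen:
            seen.append(chrom)
    return seen


def compare_contigs(vcf_lines, fai_lines):
    """Split the VCF contig names into (found in reference, missing from reference)."""
    reference = set(fai_contigs(fai_lines))
    names = vcf_contigs(vcf_lines)
    shared = [name for name in names if name in reference]
    missing = [name for name in names if name not in reference]
    return shared, missing


# Hypothetical usage with real files:
# with open("file.vcf") as vcf, open("reference.fasta.fai") as fai:
#     print(compare_contigs(vcf, fai))
```

If the first list comes back empty, none of the names match. That is consistent with one file using `1` and the other `chr1`, although matching names alone do not prove that the reference builds are equivalent.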

  • myoglu Member Posts: 39

    I figured it out!

    Use this:

    perl -pe 's/^([^#])/chr$1/' file.vcf > out.vcf

    This adds "chr" to the start of every data line. My reference uses chr1 etc., while the ESP files use only 1 etc.
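The one-liner above only touches data lines, so any `##contig=<ID=...>` header lines would keep the old names. Here is a hedged Python sketch of the same renaming that also rewrites those header lines. It assumes the only difference is the missing "chr" prefix; it does not handle cases such as `MT` versus `chrM`, and the function names are made up for illustration.

```python
import re

# Matches the ID value inside a ##contig=<...> header line.
CONTIG_HEADER = re.compile(r"^(##contig=<.*?\bID=)([^,>]+)")


def prefixed(name):
    """Return the contig name with a 'chr' prefix, leaving already-prefixed names alone."""
    return name if name.startswith("chr") else "chr" + name


def add_chr_prefix(line):
    """Rename the contig on one VCF line; other header lines pass through unchanged."""
    if line.startswith("##contig="):
        return CONTIG_HEADER.sub(
            lambda match: match.group(1) + prefixed(match.group(2)), line, count=1
        )
    if line.startswith("#") or not line.strip():
        return line
    chrom, separator, rest = line.partition("\t")
    return prefixed(chrom) + separator + rest


# Hypothetical usage with real files:
# with open("file.vcf") as source, open("out.vcf", "w") as target:
#     for line in source:
#         target.write(add_chr_prefix(line))
```

Writing the result to a new file rather than editing in place keeps the original download available for comparison.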

  • myoglu Member Posts: 39

    Sorry for the many comments, but I get weird errors on some of the files:

    ERROR MESSAGE: Sequence name contains invalid character: chr13

    This is true for chr14, chr16 and a lot more.

    And this one:

    ERROR GenomeAnalysisTK-2.7-2-g6bda569/ESP6500SI-V2-SSA137.dbSNP138-rsIDs.snps_indels.vcf/ESP6500SI-V2-SSA137.updatedRsIds.chr10.snps_indels_FIX.vcf contigs = [chr10]
    ERROR reference contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10,

    And this one:

    ERROR MESSAGE: Line 32: there aren't enough columns for line chr15 (we expected 9 tokens, and saw 1 )

    This is also true for chr10.
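Errors like "invalid character" in a name that looks fine, or "expected 9 tokens, and saw 1", often point to lines that are no longer cleanly tab-separated. The sketch below is one way to locate such lines before handing the file to GATK. The allowed-character pattern is a deliberately conservative assumption rather than the rule GATK itself applies, and the minimum of 8 columns reflects the fixed VCF columns for a sites-only file.

```python
import re

# Conservative assumption: contig names use only letters, digits, '_', '.' and '-'.
VALID_NAME = re.compile(r"[0-9A-Za-z_.\-]+")


def check_vcf_lines(lines, min_columns=8):
    """Return (line_number, message) pairs for data lines that look malformed."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if text.startswith("#") or text == "":
            continue  # header and blank lines are not checked here
        fields = text.split("\t")
        if len(fields) < min_columns:
            problems.append(
                (number, "expected at least %d tab-separated columns, found %d"
                 % (min_columns, len(fields)))
            )
        if not VALID_NAME.fullmatch(fields[0]):
            problems.append(
                (number, "unexpected character in contig name %r" % fields[0])
            )
    return problems


# Hypothetical usage with a real file:
# with open("out.vcf", newline="") as handle:
#     for number, message in check_vcf_lines(handle):
#         print(number, message)
```

Printing the contig name with `%r` makes hidden characters such as `\r` or `\xa0` visible, which a text editor's search box may not show. Opening the file with `newline=""` keeps carriage returns from being translated away.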

  • myoglu Member Posts: 39

    @Geraldine_VdAuwera said:
    myoglu, when the contig names are different it's a strong indication that the variants and alignments were generated using different reference builds. There can be important differences between reference builds that will affect your results. You can only use the solution you posted if you are absolutely sure that the references are equivalent.

    I have googled for hours now, and it seems many are struggling with the same problem.

    The reference builds are the same, so that should be OK.

    I had to manually recheck all chromosome names, replacing e.g. "chr11" with what looks like the EXACT SAME THING (chr11).
    Sometimes I also had to manually fix weird spacing errors. I also had to replace all "A" with "A" in the GATK output file (from VariantAnnotator). Very weird: they look the same, but the "find" function in TextEdit still did not match them.

    Finally I ran VCFtools validation and got these messages for each of the 24 ESP files:

    e.g.

    The header tag 'reference' not present. (Not required but highly recommended.)
    The header tag 'contig' not present for CHROM=chrY. (Not required but highly recommended.)

    So, it looks ok, but still I get the famous error when running CombineVariants:

    ERROR MESSAGE: Invalid command line: No tribble type was provided on the command line and the type of the file could not be determined dynamically. Please add an explicit type tag :NAME listing the correct type from among the supported types:

    So, I checked the VCF version of the ESP files. They are VCF4.1, which should be good. But I'm stuck.

    What is wrong with these ESP files?

    Sorry to bother the GATK team, but do any of you other users know this? I've seen other people struggling with it.
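The "No tribble type was provided" message asks for an explicit type tag on each input. As a rough illustration, the sketch below assembles a GATK 3-style CombineVariants command with the `:VCF` tag attached to every `--variant` argument. The jar name and all file names are placeholders, and the tag only tells GATK how to parse the file; it will not help if the file itself is malformed or its index is out of date.

```python
def combine_variants_command(reference, vcf_paths, output, jar="GenomeAnalysisTK.jar"):
    """Build an argument list for CombineVariants with an explicit VCF type tag per input."""
    command = ["java", "-jar", jar, "-T", "CombineVariants", "-R", reference]
    for path in vcf_paths:
        # ':VCF' is the explicit type tag that the error message asks for.
        command += ["--variant:VCF", path]
    command += ["-o", output]
    return command


# Hypothetical usage; the file names are placeholders.
# import subprocess
# subprocess.run(
#     combine_variants_command("reference.fasta", ["chr1.vcf", "chr2.vcf"], "combined.vcf"),
#     check=True,
# )
```

Building the arguments as a list avoids shell quoting problems when looping over all 24 per-chromosome files.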

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie Posts: 11,669 admin

    This sounds like a file encoding error -- are you using a Windows PC, or did you get the files from someone using Windows? We've seen encoding issues pop up when files are written or edited on a Windows machine.

    Geraldine Van der Auwera, PhD
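One way to test the encoding hypothesis is to look at the raw bytes rather than the rendered text. The sketch below flags a UTF-8 byte order mark, carriage returns, and any byte outside printable ASCII (tabs excepted). It assumes the VCF is expected to be plain ASCII, which may not hold for files with non-ASCII annotation text; the function name is made up for illustration.

```python
def scan_bytes(data):
    """Return (line_number, description) pairs for suspicious bytes in raw file content."""
    findings = []
    if data.startswith(b"\xef\xbb\xbf"):
        findings.append((1, "UTF-8 byte order mark at start of file"))
    for number, line in enumerate(data.split(b"\n"), start=1):
        if b"\r" in line:
            findings.append((number, "carriage return (Windows or classic Mac line ending)"))
        # Anything above '~' or below space is suspicious, except tab and CR (reported above).
        odd = sorted({b for b in line if b > 0x7E or (b < 0x20 and b not in (0x09, 0x0D))})
        if odd:
            findings.append(
                (number, "unexpected bytes: " + " ".join("0x%02X" % b for b in odd))
            )
    return findings


# Hypothetical usage with a real file:
# with open("out.vcf", "rb") as handle:
#     for number, description in scan_bytes(handle.read()):
#         print(number, description)
```

A non-breaking space, for example, shows up as `0xA0 0xC2`. That kind of character would explain two strings that look identical on screen but do not match in a search.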

  • myoglu Member Posts: 39

    No, only OS X Mountain Lion all the way.

    I finally figured it out:

    Something was off with the VariantAnnotator VCF from GATK. I re-ran it and used the new file, and I also deleted the old index file.

    The manual correction of the ESP VCFs works! It's a bit tedious, but it does the job.

    Hope this helps anyone else getting this error!
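Since deleting the old index file was part of the fix, it can be worth checking for indexes that are older than the VCF they belong to, for instance after editing a file by hand. This is a small sketch that assumes the index sits next to the VCF with an `.idx` suffix appended, as GATK of that era typically writes it; the function name is hypothetical.

```python
import os


def stale_indexes(vcf_paths, suffix=".idx"):
    """Return index files whose modification time is older than that of their VCF."""
    stale = []
    for vcf in vcf_paths:
        index = vcf + suffix
        if os.path.exists(index) and os.path.getmtime(index) < os.path.getmtime(vcf):
            stale.append(index)
    return stale


# Hypothetical usage: list stale indexes, then delete them after a manual check.
# for index in stale_indexes(["chr1.vcf", "chr2.vcf"]):
#     print("stale:", index)
#     # os.remove(index)
```

The removal is left commented out so that nothing is deleted without a look at the list first.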

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie Posts: 11,669 admin

    Hmm, that's weird, but let's put it down to an unspecified system glitch. Thanks for reporting your solution.

    Geraldine Van der Auwera, PhD
