It looks like you're new here. If you want to get involved, click one of these buttons!
Contents |
This script converts a VCF file from one reference build to another. It runs 3 modules within our toolkit that are necessary for lifting over a VCF.
1. LiftoverVariants walker
2. sortByRef.pl to sort the lifted-over file
3. Filter out records whose ref field no longer matches the new reference
The liftOverVCF.pl script is available in our public source repository under the 'perl' directory. Instructions for pulling down our source are available here.
./liftOverVCF.pl -vcf calls.b36.vcf \ -chain b36ToHg19.broad.over.chain \ -out calls.hg19.vcf \ -gatk /humgen/gsa-scr1/ebanks/Sting_dev -newRef /seq/references/Homo_sapiens_assembly19/v0/Homo_sapiens_assembly19 -oldRef /humgen/1kg/reference/human_b36_both -tmp /broad/shptmp [defaults to /tmp]
Running the script with no arguments will show the usage:
Usage: liftOverVCF.pl
-vcf <input vcf>
-gatk <path to gatk trunk>
-chain <chain file>
-newRef <path to new reference prefix; we will need newRef.dict, .fasta, and .fasta.fai>
-oldRef <path to old reference prefix; we will need oldRef.fasta>
-out <output vcf>
-tmp <temp file location; defaults to /tmp>
Chain files from b36/hg18 to hg19 are located here within the Broad:
/humgen/gsa-hpprojects/GATK/data/Liftover_Chain_Files/
External users can get them off our ftp site:
location: ftp.broadinstitute.org username: gsapubftp-anonymous path: Liftover_Chain_Files
Comments
Where can I get the liftOverVCF.pl script? I am not able to find the "perl" dir ftp.broadinstitute.org/distribution. Thanks.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •It's available in our source code through github (instructions for downloading source are available on this forum).
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Got it, thanks!
- Spam
- Abuse
- Troll
0 • Off Topic 1Disagree Agree Like WTF •Is the liftOverVCF.pl doing the right thing with the arguments for LiftoverVariants? When I'm using it according to the help, it fails complaining "Input files variant and reference have incompatible contigs: Found contigs with the same name but different lengths".
I think what happens is that it passes to LiftoverVariants old version of reference genome instead of a new one (in LiftoverVariants documentation refrence genome is not listed?) and then it clashes with the dict file. Below is what gets invoked by my command:
Am I right or is something wrong with my input files?
thanks,
Ania
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Something is wrong with your input files.
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •location: ftp.broadinstitute.org username: gsapubftp-anonymous path: Liftover_Chain_Files
This path no longer exists. Is there an updated location for the Chain files? I'm looking for mm9 to mm10.
Thanks, Phil
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi Phil,
I just checked and it's there. If you're accessing the ftp through a shell connection, try using a gui client instead. People have reported having problems with shell access; the ftp setup is not under our control so I can only recommend working around the problem.
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •You can also find these files in our public repository (see Downloads page for link).
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi I was wondering what went wrong with my command to lift a VCF file: The script stops with a message:
Re-sorting the vcf... Unknown option: tmp
The sorting step failed. Please correct the necessary errors before retrying.
So I did sorting using vcfsorter.pl, then ran again, but still getting the same error message. could I get some advise on how to fix this issue please? Thanks, -Charles
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Can you tell me what version you are using and what is your command line?
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •GATK version is (version 2.2-8-gec077cd) and the command line as follows:
perl liftOverVCF.pl -vcf my.vcf \ -chain Liftover_Chain_Files/b37tohg19.chain \ -out mynew.vcf \ -gatk /usr/local/GATK \ -newRef hg19_canonical \ -oldRef human_g1k_v37 \ -tmp temporary_folder
Thanks, -Charles
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •My guess is that you don't have the sortByRef.pl program in $gatk/public/perl/
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •liftOverVCF.pl and sortByRef.pl is under the same directory as gatk directory I used.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Can you try without specifying a tmp folder?
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •without specifying a tmp folder still gives the same error...
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •This is a weird error. I'll put this on my to-do bugfix list, but I won't be able to get to it until Tuesday at the earliest, sorry.
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Is there a liftover chain file called 'b36ToHg19.broad.over.chain' as you see in the example? I have looked in the 'Liftover Chain Files' and dont see it.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Some of the liftover chain files may only be accessible internally at the Broad, sorry.
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Okay so how can I go about lifting over from b36 to hg19 in that case?
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •I'll see if we can make the other liftover files publicly available. In the meantime I'm afraid I can't offer you any solutions, sorry.
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Are the
.chainfiles different to those provided by the UCSC genome browser, e.g. http://hgdownload.cse.ucsc.edu/goldenPath/mm9/liftOver/mm9ToMm10.over.chain.gz? If so, are you able to make the mm9 to mm10 files publicly available? Thanks- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •I believe they are the same format, so the UCSC ones should work with our tool if that's what you're asking.
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Thanks, Geraldine. That's indeed what I was asking. It might be worth noting that in the documentation.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •I would like to liftover from hg19 to GRCh37, but there is no chain file for this conversion on the FTP server. Is it possible to get access to such a file?
If I have understood this correctly, the difference is that hg19 uses 0-based coordinates and GRCh37 uses 1-based coordinates, as well as 'chr1' instead of '1' for the contig names. If I were to just write a script that removes all the 'chr' from the contig names and adds one to the coordinates, would that do the job correctly?
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Sure, I just added it to the ftp site (and we'll add it to the codebase on github too).
Incidentally, hg19 does NOT use 0-based coordinates. Also, your renaming proposal wouldn't work with some of the contigs (e.g. chrM -> MT).
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
1 • Off Topic Disagree Agree 1Like WTF •Oh right, thank you for enlightening me. :) And thanks a lot for providing the file!
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hm, I get a strange error when I try to do the liftover:
I checked the file and line 1 does indeed start with 'chain'. I compared with the b37tohg19 chain file and the only difference is the spacing; single space in hg19tob37 and tab space in b37tohg19. I replaced single space to tab, since b37tohg19 worked for me before, but it didn't help.
Full command:
What is wrong?
EDIT:
I figured out what was wrong: the file was stored with some strange encoding, when I read the first line of 'hg19tob37.chain' in Python I got this: '\xff\xfec\x00h\x00a\x00i\x00n\x00 \x001\x00 \x00c\x00h\x00r\x001\x00 \x002\x004\x009\x002\x005\x000\x006\x002\x001\x00 \x00+\x00 \x000\x00 \x002\x004\x009\x002\x005\x000\x006\x002\x001\x00 \x001\x00 \x002\x004\x009\x002\x005\x000\x006\x002\x001\x00 \x00+\x00 \x000\x00 \x002\x004\x009\x002\x005\x000\x006\x002\x001\x00 \x001\x00\n'
And when I opened the file in GEdit (Ubuntu 12.04) it looked normal. I simply copy-pasted the contents into a new text-file and saved it in UTF-8 encoding so now the liftover works. I hope I didn't mess anything up by doing this, but so far everything seems to have worked out well.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Thanks for the diagnosis. I have updated the file in the ftp site with a version that should work.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •