To celebrate the release of GATK 4.0, we are giving away free credits for running the GATK4 Best Practices pipelines in FireCloud, our secure online analysis portal. It’s first come first serve, so sign up now to claim your free credits worth $250. Sponsored by Google Cloud. Learn more at https://software.broadinstitute.org/firecloud/documentation/freecredits

Picard liftover from build 38 to 37 fails on alternate contigs

tommycarstensentommycarstensen United KingdomMember
edited November 2017 in Ask the GATK team

I am trying to lift a VCF from build 38 to build 37 using Picard liftover.

I'm using this chain file:

http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToHg19.over.chain.gz

I get this error:

ERROR   2017-11-08 22:32:01 LiftoverVcf Encountered a contig, chr9_gl000199_random that is not part of the target reference.

I am deleting these problematic contigs from the chain file (using curl -s http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToHg19.over.chain.gz | gunzip -c | sed '/random/,/^$/d' hg38ToHg19.over.chain > hg38ToHg19.over.modified.chain). Is that an OK approach?

However, now I get this error:

INFO    2017-11-08 23:35:39 LiftOver    Interval chr18:15781697-15781698 failed to match chain 1430656 because intersection length 1 < minMatchSize 2.0 (0.5 < 1.0)

I'm getting the same error as described in this thread while trying to lift over in the opposite direction.

Is it possible to achieve what I want? Thanks Sheila, Geraldine and "shlee".

Best Answer

Answers

  • shleeshlee CambridgeMember, Broadie, Moderator
    edited November 2017

    Hi again @tommycarstensen,

    First, are you using the latest Picard? LiftoverVcf got a facelift so to speak around July (v2.10.2). Looks like version 2.14.1 (released two days ago) has additional performance improvements.

    That being said, I have seen a similar error with v2.10.2+ Picard:

    ERROR   2017-08-11 14:01:57 LiftoverVcf Encountered a contig, 1 that is not part of the target reference.
    

    Yes, this has to do with contig incompatibilities.

    The INFO line on failed to match chain informs you of regions that the tool could not liftover based on the CHAIN you provided. It's not clear to me, did your run that gives

    INFO    2017-11-08 23:35:39 LiftOver    Interval chr18:15781697-15781698 failed to match chain 1430656 because intersection length 1 < minMatchSize 2.0 (0.5 < 1.0)
    

    complete? Variants for these regions should be in your REJECT file.

    If it didn't complete, what was the exact command you ran? Did you change any parameters?

    Soo Hee

    P.S. Contigs not in the target reference seems like something our tool should handle. It's possible we've only been forward thinking. Do you have an expectation on how these should be handled so you don't have to go through the trouble of a sed command? We can put in a feature request in the Picard repo. I'm aware that the ALT contigs represent variation that is hard to represent in a VCF record. However, chr9_gl000199_random is a primary assembly contig and I should hope this could be handled.

  • shleeshlee CambridgeMember, Broadie, Moderator
  • tommycarstensentommycarstensen United KingdomMember

    @shlee said:
    First, are you using the latest Picard? LiftoverVcf got a facelift so to speak around July (v2.10.2). Looks like version 2.14.1 (released two days ago) has additional performance improvements.

    Thanks Soo Hee @shlee! I did indeed use version2.14.1:

    wget https://github.com/broadinstitute/picard/releases/download/2.14.1/picard.jar
    

    @shlee said:
    The INFO line on failed to match chain informs you of regions that the tool could not liftover based on the CHAIN you provided. It's not clear to me, did your run [...] complete? Variants for these regions should be in your REJECT file.

    The job fails. Some variants are written to the REJECT file, but the O output file is empty.

    @shlee said:
    If it didn't complete, what was the exact command you ran? Did you change any parameters?

    I also tried lifting some 1000G build 37 files over to build 38. That job completed, but all files were written to the REJECT file. Here is my command in each case:

    $java -jar picard.jar LiftoverVcf \
         I=$I \
         O=$O \
         CHAIN=$CHAIN \
         REJECT=$REJECT \
         R=$R \
    

    Case 1 (38 to 37):
    $I generated with GATK GenotypeGVCFs3.8
    $CHAIN generated with the sed command above.
    $R=Homo_sapiens.GRCh38_full_analysis_set_plus_decoy_hla.fa

    Case 2 (37 to 38):
    $I downloaded from ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/hd_genotype_chip/ALL.chip.omni_broad_sanger_combined.20140818.snps.genotypes.vcf.gz
    $CHAIN downloaded from http://hgdownload.cse.ucsc.edu/goldenPath/hg19/liftOver/hg19ToHg38.over.chain.gz
    $R=human_g1k_v37.fasta

    @shlee said:
    P.S. Contigs not in the target reference seems like something our tool should handle. It's possible we've only been forward thinking. Do you have an expectation on how these should be handled so you don't have to go through the trouble of a sed command? We can put in a feature request in the Picard repo. I'm aware that the ALT contigs represent variation that is hard to represent in a VCF record. However, chr9_gl000199_random is a primary assembly contig and I should hope this could be handled.

    I don't do lift-overs frequently (and I'm guessing other people don't either) and I'm not really an expert on the reference genome. I don't mind the sed command at all. I previously lifted over the 1000G VCF mentioned above using the UCSC liftover tool after conversion to PLINK binary ped and then converted back to VCF. But that caused REF/ALT errors according to a 1000G team member, which are currently being fixed with the fixref plugin of bcftools. It would be great to have a liftover tool working on VCF files. It will probably be relevant despite infrequent usage.

  • tommycarstensentommycarstensen United KingdomMember
    edited November 2017

    Thanks @shlee! I got my hands on this chain file:

    ftp://ftp.ensembl.org/pub/assembly_mapping/homo_sapiens/GRCh37_to_GRCh38.chain.gz
    

    Now I get this error message, which I don't quite know how to interpret:

    Runtime.totalMemory()=11312562176
    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
    Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 198022930
        at java.lang.String.checkBounds(String.java:385)
        at java.lang.String.<init>(String.java:324)
        at htsjdk.samtools.util.StringUtil.bytesToString(StringUtil.java:301)
        at picard.vcf.LiftoverVcf.tryToAddVariant(LiftoverVcf.java:362)
        at picard.vcf.LiftoverVcf.doWork(LiftoverVcf.java:306)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:268)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)
    

    I tried java -Xmx128000m, but it still fails. Possibly it's still an issue with the chain and/or reference. I noticed the rejected output file being populated, whereas the lifted over output file remains empty.

    If you got it to work, then I should be able to get it to work. I'll post my solution here. I just need to work on something else for a few weeks.

    Thanks!

    OK, I just checked the input file. It contains this variant. Not sure what is wrong with it. It's towards the end of the q arm of the chromosome. It's probably still an issue of not having the right chain and reference file.

    3   198018337   SNP3-199229605  G   A   .   PASS    AC=66;AF=0.014;AN=4556;set=Intersection
    
    Post edited by tommycarstensen on
  • shleeshlee CambridgeMember, Broadie, Moderator

    @tommycarstensen,

    I hope you figure it out. One thing you could do is try to rule out whether it is the input VCF versus the chain file by trying other chains or the online NCBI remap service. I actually opened up a few chain files with a text editor to see if the naming nomenclature match expectations.

    Let us know how it goes.

Sign In or Register to comment.