Attention:
The frontline support team will be slow on the forum because we are occupied with the GATK Workshop on March 21st and 22nd 2019. We will be back and more available to answer questions on the forum on March 25th 2019.

LiftoverVcf no output

Hi!

I'm using LiftoverVcf to make a hg38 version of the gnomad file from the GATK resource bundle
The gnomad file is in b37 version, as there isn't a chain file for b37 -> hg38, I have to do b37 -> hg19, and then hg19 -> hg38 (even if I'm not very enthusiast to do so).
Anyway, I performed the first "conversion step" with the following command line:

java -Xmx8g -jar picard.jar LiftoverVcf 
I=~/af-only-gnomad.raw.sites.b37.vcf.gz 
O=~/af-only-gnomad.raw.sites_LIFTOVERhg19.vcf 
CHAIN=~/b37tohg19.chain 
REJECT=~/af-only-gnomad.raw.sites_LIFTrejected19.vcf 
R=~/hg19_M_rCRS.fasta

The result was an empty VCF file, I thought that something went wrong and looked at the "rejected" file, but it was empty as well...it only contained the Header (I don't even know if it is normal that the header is rejected)

the log on the terminal was this:

INFO    2017-12-06 13:20:39 LiftoverVcf Loading up the target reference genome.
INFO    2017-12-06 13:20:45 LiftoverVcf Lifting variants over and sorting.
INFO    2017-12-06 13:20:51 LiftoverVcf read     1.000.000 records.  Elapsed time: 00:00:05s.  Time for last 1.000.000:    5s.  Last read position: 1:9.514.996
INFO    2017-12-06 13:20:55 LiftoverVcf read     2.000.000 records.  Elapsed time: 00:00:09s.  Time for last 1.000.000:    4s.  Last read position: 1:19.801.042
[...]
INFO    2017-12-06 13:37:29 LiftoverVcf read   267.000.000 records.  Elapsed time: 00:16:44s.  Time for last 1.000.000:    3s.  Last read position: X:131.552.026
INFO    2017-12-06 13:37:33 LiftoverVcf read   268.000.000 records.  Elapsed time: 00:16:48s.  Time for last 1.000.000:    3s.  Last read position: X:146.780.876
INFO    2017-12-06 13:37:36 LiftoverVcf Processed 268545311 variants.
INFO    2017-12-06 13:37:36 LiftoverVcf 0 variants failed to liftover.
INFO    2017-12-06 13:37:36 LiftoverVcf 0 variants lifted over but had mismatching reference alleles after lift over.
INFO    2017-12-06 13:37:36 LiftoverVcf 0,0000% of variants were not successfully lifted over and written to the output.
INFO    2017-12-06 13:37:36 LiftoverVcf Writing out sorted records to final VCF.
[Wed Dec 06 13:37:36 CET 2017] picard.vcf.LiftoverVcf done. Elapsed time: 16,96 minutes.
Runtime.totalMemory()=7299137536
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.NumberFormatException: For input string: "30,35"
    at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
    at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
    at java.lang.Double.parseDouble(Double.java:538)
    at java.lang.Double.valueOf(Double.java:502)
    at htsjdk.variant.vcf.AbstractVCFCodec.parseQual(AbstractVCFCodec.java:518)
    at htsjdk.variant.vcf.AbstractVCFCodec.parseVCFLine(AbstractVCFCodec.java:322)
    at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(AbstractVCFCodec.java:284)
    at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:262)
    at htsjdk.variant.vcf.VCFRecordCodec.decode(VCFRecordCodec.java:53)
    at htsjdk.variant.vcf.VCFRecordCodec.decode(VCFRecordCodec.java:18)
    at htsjdk.samtools.util.SortingCollection$FileRecordIterator.advance(SortingCollection.java:497)
    at htsjdk.samtools.util.SortingCollection$FileRecordIterator.<init>(SortingCollection.java:469)
    at htsjdk.samtools.util.SortingCollection$MergingIterator.<init>(SortingCollection.java:407)
    at htsjdk.samtools.util.SortingCollection.iterator(SortingCollection.java:273)
    at picard.vcf.LiftoverVcf.doWork(LiftoverVcf.java:329)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:268)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)

According to it, it seems that 0 variants failed to liftover, but they weren't printed in the output file.
I also don't actually know what the last lines of the log mean...I'm talking about "Exception in thread etc. etc. ..."

Answers

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @Puputnik
    Hi,

    I am not sure if this will make a difference, but can you try unzipping your input VCF?

    Thanks,
    Sheila

  • PuputnikPuputnik ItalyMember

    hi
    @Sheila

    Unfortunately it still doesn't work even with unzipped VCF

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @Puputnik
    Hi,

    What happens if you run ValidateVariants on your input VCF?

    -Sheila

  • Hi @Puputnik and @Sheila

    Did you solve this issue? I would love to get my hands on a hg38-version of gnomAD as I am currently calling somatic variants on matched tumor/normal samples using Mutect2 GATK4 Beta. We have up until now used hg38 as reference, but as you know Mutect2 can utilize gnomAD as --germline_resource.

    Do you happen to have any insights on using b37 with the gnomAD resource in Mutect2 versus not using it at all and sticking with hg38? It is probably 'just' a trade-off. Intuition tells me the newest build is overall better to use, but please, let me know if I am overestimating the significance of this.

    Thank you for your help!
    -Nanna

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭
    edited December 2017

    Hi @nanna,

    You can liftover to GRCh38 the sites-only gnomAD file that we provide for GRCh37 in the GATK Resource bundle. It is under beta>Mutect2. Similarly, you'll find the common germline resource file (GRCh37) for contamination estimation under beta>GetPileupSummaries.

    I have taken some time to explore liftovers so that I can generate GRCh38 equivalent files for use in writing a Mutect2 tutorial. However, I use these for educational purposes (to demonstrate software tool features) and need not go through the vetting that those in the research field should be subjecting their data transformations. All that I can offer you is to share my experience in lifting over.

    I have also asked the gnomAD folks when we can expect GRCh38 resources. They are also using lifted over resources as of now. Perhaps those in the research community could ask the 1000 Genomes Project and gnomAD folks directly for a small equivalent germline resource, e.g. generated by reverting and realigning (to GRCh38) some 1000-2500 samples.

    Sorry we cannot be more helpful.

  • nannananna Member

    @shlee
    No worries, thank for for your elaborate response.

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭

    You're welcome @nanna. Fyi, I hear that there will soon be GRCh38 calls from the gnomAD folks. Stay tuned!

Sign In or Register to comment.