Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
I am having a problem with picard's LiftoverVcf.
I am trying to Liftover hapmap files (downloaded plink files from hapmap and converted to vcf using plink) from ncbi36 to hg38. I was able to do this with GATK LiftoverVariants. My problem came when I had to merge the hapmap.hg38 with some genotype files (that I liftover from hg19 to hg38 using GATK LiftoverVariants). I am merging them so that I can run population stratification using plink. I used vcf-merge but it complained that a SNP has different reference allele in both files: rs3094315, should be reference allele G (which was correct in the genotype.hg38 files but in the hapmap.hg38 files it was wrong). I also first tried to lift hapmap.ncbi36 to hg19 then to hg38 but the offending allele was still there. So I decided to try and lift the hapmap.ncbi36 using LiftoverVCF from picard.
- I downloaded the newest picard build (20 hours old) picard-tools-1.138.
- Used the command: java -jar -Xmx6000m ../../../tools/picard-tools-1.138/picard.jar LiftoverVcf I=all_samples_hapmap3_r3_b36_fwd.qc.poly.tar.vcf O=all_samples_hapmap3_r3_b36_fwd.qc.poly.tar.picard.hg38.vcf C=../../../tools/liftover/chain_files/hg18ToHg38.over.chain REJECT=all_samples_hapmap3_r3_b36_fwd.qc.poly.tar.picard.hg38.reject.vcf R=../../../data/assemblies/hg38/hg38.fa VERBOSITY=ERROR
Here is the run:
[Thu Aug 13 00:43:40 CEST 2015] picard.vcf.LiftoverVcf INPUT=all_samples_hapmap3_r3_b36_fwd.qc.poly.tar.vcf OUTPUT=all_samples_hapmap3_r3_b36_fwd.qc.poly.tar.picard.hg38.vcf CHAIN=......\tools\liftover\chain_files\hg18ToHg38.over.chain REJECT=all_samples_hapmap3_r3_b36_fwd.qc.poly.tar.picard.hg38.reject.vcf REFERENCE_SEQUENCE=......\data\assemblies\hg19\assemble\hg38.fa VERBOSITY=ERROR QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
Here is the error:
Exception in thread "main" java.lang.IllegalStateException: Allele in genotype A* not in the variant context [T*, C]
- I have no idea which SNP is the problem.
- I do not know what T* means (does not seem to exist in the file).
- I am new to picard so I thought VERBOSE=ERROR will give me something more but nothing more appeared.
- Given that lifting hapmap.ncbi36 to hg19 then to hg38 produced the same erroneous reference allele I suppose lifting will not fix this and I will have to work with dnsnp to correct my file. Do you know how I can change reference allele in a vcf? Is there a tool for this? Is there a liftover tool for dbsnp?
- As a side note I want to make picard work because I read that you will be deprecating the GATK liftover and will support the picard liftover (at some point in the future) so help with this tool will be appreciated.