When will the bundle of supporting files for the hg38 release be available on the FTP site?

sourav8888 Member
edited August 2014 in Ask the GATK team

Hi,
It is becoming more and more difficult for those of us who are analyzing data during this transition period. Since the public release of hg38, all the databases have been updated to the new assembly. Those of us who follow the GATK Best Practices now face a real problem matching our data to those databases. My basic question is: when will the supporting files for hg38 be available as a bundle on your FTP site?

I may have missed it, but can you please tell me which release of GRCh37 (p13 or another) is used in the 2.5 and 2.8 bundles?
Thanks in advance.

Answers

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Hi there,

    It's difficult for us too: releasing a new version of the bundle for the latest resources takes a lot of validation work. I will ask the devs where we are in that process, but please understand that we have other priorities that demand our attention, so we cannot deliver this quickly.

  • Hjorleifur Landspítalinn, Reykjavík Member

    Hi, I am facing the same issue as sourav8888. Is it possible to get an estimate of when the bundle might be updated for hg38? Are we talking about weeks, or longer?

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    @Hjorleifur, I expect the timeline will be on the scale of months, not weeks. We just cannot devote any resources to that at this time, sorry.

  • bioSG Member

    I've been trying to "convert" the latest hg19 bundle to hg38 with liftover. It's getting more and more difficult: whenever I think I have a well-formed, filtered VCF, I run into issues like this:

    ERROR MESSAGE: The provided VCF file is malformed at approximately line number 10590: Duplicate allele added to VariantContext: G

    I've been trying to find a tool to filter out "not validating" positions from the VCF, but I can't find an automated way to do this.

  • dmyersturnbull Stanford University Member

    @bioSG said:
    I've been trying to "convert" the latest hg19 bundle to hg38 with liftover. It's getting more and more difficult: whenever I think I have a well-formed, filtered VCF, I run into issues like this:

    ERROR MESSAGE: The provided VCF file is malformed at approximately line number 10590: Duplicate allele added to VariantContext: G

    I've been trying to find a tool to filter out "not validating" positions from the VCF, but I can't find an automated way to do this.

    I had the same issue after running CrossMap. It just means that the reference allele is also listed as one of the variant (ALT) alleles, which is caused by an update to the reference allele in GRCh38/hg38. To solve it, I wrote a script that removes a variant allele if it matches the ref allele; you don't even need to filter these lines out.

    When running CrossMap, I also had the problem that it removed the 'chr' prefix from my chromosome names, so I had to re-add it.
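
The script itself was not posted, so the following is only a minimal sketch of the two fixes described above, not the original code. It assumes a plain-text, tab-delimited, sites-only VCF (no genotype columns). The function names `fix_vcf_line` and `fix_vcf` are hypothetical. It does not adjust per-allele INFO fields such as AC or AF, does not rewrite `##contig` header lines, and does not handle naming differences such as MT versus chrM. One choice made here that the post does not specify: if every ALT allele matches REF, the record is dropped, because no variant allele would remain.

```python
def fix_vcf_line(line):
    """Return a repaired VCF line, or None if no ALT allele survives.

    Assumption: sites-only VCF, so there are no genotype indices to remap.
    """
    # Header and meta lines are passed through unchanged.
    if line.startswith("#"):
        return line

    fields = line.rstrip("\n").split("\t")
    chrom, ref, alt = fields[0], fields[3], fields[4]

    # Re-add the 'chr' prefix if the liftover tool stripped it.
    if not chrom.startswith("chr"):
        fields[0] = "chr" + chrom

    # Drop any ALT allele identical to REF (case-insensitive comparison).
    kept = [a for a in alt.split(",") if a.upper() != ref.upper()]
    if not kept:
        # Assumption: a record with no remaining ALT allele is discarded.
        return None

    fields[4] = ",".join(kept)
    return "\t".join(fields) + "\n"


def fix_vcf(in_path, out_path):
    """Stream a VCF from in_path to out_path, repairing each record."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            fixed = fix_vcf_line(line)
            if fixed is not None:
                dst.write(fixed)
```

For a lifted-over file this could be called as `fix_vcf("lifted.vcf", "lifted.fixed.vcf")`, where both file names are placeholders. Running a VCF validator on the output afterwards is still advisable, since this sketch only addresses the two problems mentioned in this thread.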
