Hi, I downloaded the file from the GATK Google Cloud bucket, but it seems the file is corrupted: only sites on chr1-chr15 are present.
You're correct that the file only goes up to chr15; we're now looking into why. It may have been used as part of a workaround for a technical problem at some point in the past, but I can't find this file being used anywhere in our pipelines right now.
What were you planning to use it for, @csittz?
Are you sure that the transfer completed and was not interrupted, e.g. by a network problem?
I'm sure the transfer from Google Cloud Platform completed; the file size matches. Running tail on the file, or piping it through cut -f1 | sort | uniq, shows that it only contains sites from chr1 to chr15.
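For anyone who wants to repeat this check without shell tools, here is a minimal Python sketch that counts data lines per contig (the first VCF column), roughly what grep -v '^#' | cut -f1 | sort | uniq -c does. It assumes the resource is a plain or gzip/bgzip-compressed VCF; the function name is hypothetical and not part of GATK.

```python
import gzip
from collections import Counter


def count_sites_per_contig(path: str) -> Counter:
    """Count VCF data lines per contig (column 1).

    Works on plain .vcf files and on .vcf.gz files; bgzip output is
    gzip-compatible, so the standard gzip module can read it.
    """
    opener = gzip.open if path.endswith(".gz") else open
    counts: Counter = Counter()
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip ## meta-information lines and the #CHROM column header.
            if line.startswith("#"):
                continue
            contig = line.split("\t", 1)[0]
            counts[contig] += 1
    return counts
```

Because Counter keeps insertion order, iterating over the result lists contigs in the order they first appear in the file, so a truncated file shows up as the list simply ending early (for example at chr15).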
Can you post the bucket address you're using? I'd like to check the original and I want to make sure we're talking about the same file/location.
Here's the public link that I'm using.
So is this the correct link?
Any update on this?
I was trying to use it for contamination estimation, as the population file.
It seems to me that GRCh38.p11 was used to construct this file, whereas GRCh38 should have been used.
All of our resource files should be aligned to the GRCh38 major release. Why do you think this file is aligned to a patched version?
Sorry, I got this wrong. It turns out that chr15 has been skipped, chr16 has replaced chr15, and chr17 has been used twice.
@SQ SN:chr15 LN:101991189
@SQ SN:chr16 LN:90338345
@SQ SN:chr17 LN:83257441
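A mix-up like this can be checked mechanically by comparing contig lengths. Below is a minimal Python sketch, assuming the VCF declares its contigs in ##contig=<ID=...,length=...> header lines (not every VCF does) and that the reference lengths come from @SQ lines like the ones above. All function names are hypothetical helpers, not GATK tools, and the attribute parsing is deliberately simple (it assumes no quoted values containing commas).

```python
from typing import Dict, Iterable, List, Optional, Tuple


def parse_dict_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map contig name -> length from @SQ lines (e.g. a reference .dict file)."""
    lengths: Dict[str, int] = {}
    for line in lines:
        if not line.startswith("@SQ"):
            continue
        # Tags look like SN:chr15 and LN:101991189; split on tabs or spaces.
        tags: Dict[str, str] = {}
        for token in line.split()[1:]:
            tag, _, value = token.partition(":")
            tags.setdefault(tag, value)
        lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def parse_vcf_contig_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map contig ID -> length from ##contig=<ID=...,length=...> header lines."""
    lengths: Dict[str, int] = {}
    for line in lines:
        if line.startswith("#CHROM"):
            break  # end of the meta-information header; no need to read sites
        if not line.startswith("##contig=<"):
            continue
        body = line.strip()[len("##contig=<"):].rstrip(">")
        # Simple split on commas; assumes no quoted values containing commas.
        attrs = dict(kv.split("=", 1) for kv in body.split(",") if "=" in kv)
        if "ID" in attrs and "length" in attrs:
            lengths[attrs["ID"]] = int(attrs["length"])
    return lengths


def find_length_mismatches(
    reference: Dict[str, int], vcf: Dict[str, int]
) -> List[Tuple[str, int, Optional[int]]]:
    """Return (contig, reference length, VCF length or None) for each disagreement."""
    return [
        (name, ref_len, vcf.get(name))
        for name, ref_len in reference.items()
        if vcf.get(name) != ref_len
    ]
```

Both parsers take any iterable of lines, so an open .dict file and an open VCF (gzip.open(path, "rt") for a compressed one) can be passed in directly. A contig whose declared length matches a different chromosome's @SQ length, or two contigs sharing one length, would point to the kind of shift described above.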
Unfortunately, replacing this file is not a high priority for us right now. Perhaps you can download the file from the 1000 Genomes website. We are hoping to make changes to the bundle after the GATK4 release.