Hi, I downloaded the file from the GATK Google Cloud bucket, but it seems to be corrupted: only sites on chr1 to chr15 are present.
You're correct that the file only goes up to chr15; we're looking into why. It may have been created as part of a workaround for a technical problem at some point in the past, but I can't find this file being used anywhere in our current pipelines.
What were you planning to use it for, @csittz?
Are you sure that the transfer completed and was not interrupted, e.g. by a network problem?
I'm sure the transfer from the Google Cloud platform completed; the file size matches. Running tail, or cut -f1 | sort | uniq, shows that it only contains sites from chr1 to chr15.
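For anyone who wants to repeat this check on their own copy, here is a minimal Python sketch of the same idea as cut -f1 | sort | uniq -c: it counts data records per CHROM value and skips header lines starting with "#". It assumes the file is a plain or gzip/bgzip-compressed VCF; the script name count_contigs.py is hypothetical.

```python
import gzip
import sys
from collections import Counter


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text (gzip files start with 0x1f 0x8b)."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_records_per_contig(lines):
    """Count data records per CHROM (first tab-separated column), skipping '#' header lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        chrom = line.split("\t", 1)[0]
        counts[chrom] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python count_contigs.py resource.vcf.gz
    with open_vcf(sys.argv[1]) as vcf:
        # Contigs are printed in the order they first appear in the file.
        for chrom, n in count_records_per_contig(vcf).items():
            print(f"{chrom}\t{n}")
```

Printing in first-seen order (rather than sorting) makes it easy to spot where the file stops or where a contig appears out of place.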
Can you post the bucket address you're using? I'd like to check the original and I want to make sure we're talking about the same file/location.
Here's the public link that I'm using.
So is this the correct link?
Any update on this?
I was trying to use it as the population file for contamination estimation.
It seems to me that GRCh38.p11 was used to construct this file, whereas GRCh38 should have been used.
All of our resource files should be aligned to the GRCh38 major release. Why do you think this file is aligned to a patched version?
Sorry, I got this wrong. It turns out that chr15 has been skipped: chr16 replaced chr15, and chr17 has been used twice.
@SQ SN:chr15 LN:101991189
@SQ SN:chr16 LN:90338345
@SQ SN:chr17 LN:83257441
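A header problem like this can be checked mechanically. The sketch below assumes the file's header declares contigs with standard VCF ##contig=<ID=...,length=...> lines, and compares them against expected lengths; only the three lengths quoted above are filled in, so the EXPECTED table is an assumption you would extend from your reference .dict file. It reports contigs that are missing, declared more than once, or declared with the wrong length. The comma-splitting parser is a simplification and would not handle quoted values containing commas.

```python
import re
from collections import Counter

# Expected GRCh38 lengths, taken from the @SQ lines quoted in this thread.
# Extend with the remaining contigs from your reference .dict file.
EXPECTED = {
    "chr15": 101991189,
    "chr16": 90338345,
    "chr17": 83257441,
}

CONTIG_RE = re.compile(r"^##contig=<(.*)>\s*$")


def parse_contig_lines(lines):
    """Return (ID, length) pairs from ##contig header lines, in file order."""
    contigs = []
    for line in lines:
        if not line.startswith("#"):
            break  # the header ends at the first data record
        m = CONTIG_RE.match(line)
        if not m:
            continue
        # Simplified key=value parsing; assumes no commas inside values.
        fields = dict(kv.split("=", 1) for kv in m.group(1).split(",") if "=" in kv)
        length = int(fields["length"]) if "length" in fields else None
        contigs.append((fields.get("ID"), length))
    return contigs


def check_contigs(contigs, expected):
    """Report duplicated, wrong-length, and missing contigs."""
    problems = []
    ids = Counter(cid for cid, _ in contigs)
    for cid, n in ids.items():
        if n > 1:
            problems.append(f"{cid}: declared {n} times")
    for cid, length in contigs:
        if cid in expected and length is not None and length != expected[cid]:
            problems.append(f"{cid}: length {length}, expected {expected[cid]}")
    for cid in expected:
        if cid not in ids:
            problems.append(f"{cid}: missing")
    return problems
```

To run it on a compressed file, pass a text handle such as gzip.open(path, "rt") to parse_contig_lines and feed the result to check_contigs together with EXPECTED.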
Unfortunately, replacing this file is not a high priority for us right now. Perhaps you can download the file from the 1000 Genomes website. We are hoping to make changes to the bundle after the GATK4 release.
@john156 These files are provided 'as is'; they are a mirror of a few reference files from the 1000 Genomes Project, and the Broad does not provide any curation or documentation for them. I looked at the link, and the two lines in the file are still incorrect.
A few suggestions:
1.) Individual chromosome files can be found here: http://www.internationalgenome.org/data-portal/data-collection/phase-3
2.) There is contact information for the people who generated the data in the first place; you might try contacting them.
If you have any questions about the phase3 data or any other aspect of the project please email [email protected]
Also, this may just be an issue with the header; lines beginning with "##" are header (meta-information) lines rather than data records. I looked through the many forum discussions about this and was not able to determine whether only the header is wrong or the entire chr16 is in error.
So, basically, use it with the caveat that it is not being maintained.