Resource files are missing

Hi, I am trying to download the index files for gold standard indels from the resource package, but found that both the b36 and corresponding b37 versions are empty. The complete link to one of these files is:
ftp://[email protected]/bundle/2.8/b36/Mills_and_1000G_gold_standard.indels.b36.vcf.idx.gz. Can someone please provide me with both index files? I need them for my analyses.

Thanks,
Siddharth Prakash, UT-Houston

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    You can regenerate the index files from the original vcf yourself. In fact, the GATK will do this for you automatically if the index file is not provided for a vcf.

  • sprakash (Member)

    I tried to do so, but GATK does not automatically regenerate the index file. Instead, I receive the following error: "I/O error loading or writing tribble index file for Z:\gold_indels.b36.vcf." What do I need to do to regenerate the index file?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hmm, are you sure your VCF file is valid? This suggests it might be corrupted.

  • sprakash (Member)

    It's your own .vcf file from the resource package. I'm running the RealignerTargetCreator command suggested in your Best Practices, like this:

    java -jar GATK.jar -T RealignerTargetCreator -R human_b36.fasta -known gold_indels.b36.vcf -I GVD_170.89235.bam -o target_intervals.txt

    The .fasta and indels files were renamed from the files in the b36 resources package. Any suggestions?

  • tommycarstensen (United Kingdom; Member ✭✭✭)

    Are md5 sums available? Have you checked them? Have you checked the file size is correct? Have you done a tail on the file? Have you validated the file in other ways? Do you have write permission to your disk?
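The size and integrity checks asked about above can be sketched in a few lines of shell. This is only an illustration: `check_gz` is a hypothetical helper name, not something shipped with GATK or the resource bundle, and it assumes `bash` and `gzip` are on the path (as they are in the Cygwin session later in this thread).

```shell
#!/usr/bin/env bash
# Sketch: basic sanity checks on a downloaded .gz file.
# check_gz is a hypothetical helper, not part of GATK.
check_gz() {
    local f="$1"
    # -s is true only if the file exists and is larger than zero bytes.
    if [ ! -s "$f" ]; then
        echo "missing or empty: $f"
        return 1
    fi
    # gzip -t exits non-zero for a truncated file or one that is not gzip at all.
    if ! gzip -t "$f" 2>/dev/null; then
        echo "not a valid gzip file: $f"
        return 1
    fi
    echo "ok: $f"
}
```

Calling `check_gz Mills_and_1000G_gold_standard.indels.b36.vcf.idx.gz` prints one line and returns a non-zero status when either check fails, so it can gate later steps in a script.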

  • sprakash (Member)

    Here is the link to one of the resource files I'm talking about:
    ftp://[email protected]/bundle/2.8/b36/Mills_and_1000G_gold_standard.indels.b36.vcf.idx.gz
    When I download this file, the archive is empty. Can you explain?

  • pdexheimer (Member ✭✭✭✭)

    I don't have any problems with it:

    $ curl ftp://[email protected]/bundle/2.8/b36/Mills_and_1000G_gold_standard.indels.b36.vcf.idx.gz -o Mills_and_1000G_gold_standard.indels.b36.vcf.idx.gz
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100  411k  100  411k    0     0  70218      0  0:00:06  0:00:06 --:--:-- 70224
    $ ls -oh Mills_and_1000G_gold_standard.indels.b36.vcf.idx.gz
    -rw-r--r-- 1 dexzb9 412K Mar 30 16:22 Mills_and_1000G_gold_standard.indels.b36.vcf.idx.gz
    $ gzip -dc Mills_and_1000G_gold_standard.indels.b36.vcf.idx.gz | wc -c
    1434349
    $ gzip -t Mills_and_1000G_gold_standard.indels.b36.vcf.idx.gz
    $

  • sprakash (Member)

    Here's what happened when I tried this:

    [email protected] /cygdrive/z
    $ ls -oh Mills_and_1000G_gold_standard.indels.b36.vcf.idx.gz
    -rw-r--r-- 1 sprakash 1.5K Mar 30 15:27 Mills_and_1000G_gold_standard.indels.b36.vcf.idx.gz

    [email protected] /cygdrive/z
    $ gzip -dc Mills_and_1000G_gold_standard.indels.b36.vcf.idx.gz | wc -c

    gzip: Mills_and_1000G_gold_standard.indels.b36.vcf.idx.gz: not in gzip format
    0

    [email protected] /cygdrive/z
    $ gzip -t Mills_and_1000G_gold_standard.indels.b36.vcf.idx.gz

    gzip: Mills_and_1000G_gold_standard.indels.b36.vcf.idx.gz: not in gzip format
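When gzip reports "not in gzip format", it can help to look at what was actually saved. A gzip stream always begins with the two bytes 0x1f 0x8b, so a minimal check is to read the first two bytes of the file. The sketch below assumes `bash`, `head -c`, and `od` are available; `is_gzip` is a hypothetical helper name used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: test whether a file starts with the gzip magic bytes 1f 8b.
# is_gzip is a hypothetical helper name.
is_gzip() {
    local magic
    # od -An -tx1 prints the bytes as hex without offsets; tr strips spacing.
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# If the check fails, peeking at the start of the file may show what the
# download client saved instead, for example:
#   head -c 200 Mills_and_1000G_gold_standard.indels.b36.vcf.idx.gz
```

An empty file also fails this check, because there are no bytes to compare.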

  • pdexheimer (Member ✭✭✭✭)

    That's very different from being an empty archive. That's a corrupt download: your download got interrupted for some reason, before it even got very far. I believe that there are md5s in the bundle directory that you can use to validate your downloads.
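Validating a download against a published md5 could look roughly like the sketch below. It rests on assumptions not confirmed in this thread: that the checksum file's first whitespace-separated field is the hash, and that GNU `md5sum` is installed. `verify_md5` and the `.md5` file name in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare a file's md5 with the hash stored in a checksum file.
# Assumes the first field of the checksum file is the hash (an assumption
# about the bundle's format) and that GNU md5sum is available.
verify_md5() {
    local file="$1" md5file="$2"
    local expected actual
    expected=$(awk '{print $1; exit}' "$md5file")
    actual=$(md5sum "$file" | awk '{print $1}')
    if [ "$expected" = "$actual" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (expected $expected, got $actual)"
        return 1
    fi
}
```

For example, `verify_md5 file.vcf.idx.gz file.vcf.idx.gz.md5` returns non-zero when the hashes differ, which is what a truncated or substituted download would produce.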

  • sprakash (Member)

    I don't understand why there are no problems with the other 4 files I downloaded from the same directory at the same time. I will try "curling" it after I install curl in Cygwin.

  • pdexheimer (Member ✭✭✭✭)

    <shrug> Sometimes downloads fail. I'd probably try grabbing it the same way a second or third time before I went to the trouble of installing a new download client
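Repeating a download a few times, and accepting it only once it passes an integrity test, can be scripted. The sketch below is one possible shape, not a recommended procedure from the GATK team: `retry` and `fetch_gz` are hypothetical helper names, `fetch_gz` assumes `curl` is installed, and `$URL` stands in for the bundle link quoted earlier in the thread.

```shell
#!/usr/bin/env bash
# Sketch: run a command up to N times until it succeeds.
# retry and fetch_gz are hypothetical helper names.
retry() {
    local max="$1"
    shift
    local attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max failed" >&2
        attempt=$((attempt + 1))
    done
    return 1
}

# Download a .gz file and accept it only if gzip can read it end to end.
fetch_gz() {
    local url="$1" out="$2"
    curl --fail --silent --show-error -o "$out" "$url" && gzip -t "$out"
}

# Example (URL is a placeholder for the bundle link):
#   retry 3 fetch_gz "$URL" Mills_and_1000G_gold_standard.indels.b36.vcf.idx.gz
```

If every attempt fails in the same way, as reported in the next reply, the cause is more likely something systematic (the client, a proxy, or the destination drive) than a random interruption.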

  • sprakash (Member)

    I have tried 10 times, same result.
