# How can I access the GSA public FTP server?

We make various files available for public download from the GSA FTP server, such as the GATK resource bundle and presentation slides. We also maintain a public upload feature for processing bug reports from users.

location: ftp.broadinstitute.org


Geraldine Van der Auwera, PhD

Hi Geraldine,

I have checked all the main directories on ftp.broadinstitute.org, but I cannot find the bundle anymore. Can you tell me exactly where it is?

Thanks Eva

Hi Eva, there should be a bundle/ directory right at the root of the FTP server.

Geraldine Van der Auwera, PhD

Hi Geraldine,

I also cannot find the bundle/ directory.

Thanks

Make sure you are connecting as user gsapubftp-anonymous (for downloads). If you connect as user gsapubftp (which is for uploads), you will not see the bundle.

Geraldine Van der Auwera, PhD
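
For reference, the two logins can be kept apart with a small helper. This is only a sketch: the function name `gsa_ftp_url` is made up for illustration, and the commented-out listing command assumes `curl` is installed and that the server accepts the download login without a password, as described in this thread.

```shell
#!/bin/sh
# Hypothetical helper: build an FTP URL for the GSA server.
#   download -> gsapubftp-anonymous (this login can see bundle/)
#   upload   -> gsapubftp           (bug-report uploads; bundle/ is not visible)
gsa_ftp_url() {
    mode=$1
    path=${2:-}
    case $mode in
        download) user=gsapubftp-anonymous ;;
        upload)   user=gsapubftp ;;
        *) echo "usage: gsa_ftp_url download|upload [path]" >&2; return 2 ;;
    esac
    printf 'ftp://%s@ftp.broadinstitute.org/%s\n' "$user" "$path"
}

# Example (needs network access): list the top level as the download user.
# curl --silent --list-only "$(gsa_ftp_url download)"
```

If bundle/ does not show up in the listing, the first thing to check is which of the two user names the client actually sent.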

I also cannot find the bundle/ directory, and there is no prompt to connect using a username and password. Thanks for any help.

@Ick, I'm not sure what you mean by "no prompt". What program are you using?

Geraldine Van der Auwera, PhD

I'm just using Internet Explorer 8.0, which I've often used in the past to download components of the resource bundle.

If I enter the address as ftp://ftp.broadinstitute.org/, it brings up a list of directories and files (distribution/, incoming/, outgoing/, ftp, pub and welcome.msg). I don't know how to ask it to let me log in using a username/password. Is there another way I should be doing this? Thanks!

I would recommend using a separate program like FileZilla, which will make it much easier for you to set up and manage your file transfers.

Geraldine Van der Auwera, PhD

Thanks, I'll do that.

Hi Geraldine, thanks for the explanation. I found the bundle now. I think our confusion came from the fact that until one or two weeks ago one could access the bundle with an internet browser simply by clicking on ftp://ftp.broadinstitute.org/

It seems like the reference ucsc.hg19.fasta in the hg19 folder is not sorted correctly. It throws an error when I run the program.

Are you sure it's not your bam that is mis-sorted? That seems the more likely explanation (unless you can show us otherwise)...

Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

Hi Eric, does it need to be sorted in the order of 1,2,3,4,...X,Y,MT as described in the introduction?

No, the official ordering for UCSC is a little different: chrM, chr1, chr2, ..., chrX, chrY - which is exactly the ordering in the fasta in our bundle.

Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT
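
To see which file is out of step, the contig order in the reference index can be compared with the @SQ lines of the BAM header. The sketch below only compares names and their order; it is not GATK's own compatibility check, the function names are invented for this example, and the file names in the usage comment are placeholders.

```shell
#!/bin/sh
# Print contig names from a FASTA index (.fai): first column, in file order.
fai_contigs() {
    cut -f1 "$1"
}

# Print contig names from SAM header text: the SN: field of each @SQ line.
header_contigs() {
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)
    }' "$1"
}

# Succeed only if both files list the same contigs in the same order.
same_contig_order() {
    [ "$(fai_contigs "$1")" = "$(header_contigs "$2")" ]
}

# Example (placeholder file names):
#   samtools view -H sample.bam > sample.header.sam
#   same_contig_order ucsc.hg19.fasta.fai sample.header.sam || echo "contig order differs"
```

With the bundle's UCSC-style reference, the header would be expected to start with chrM, then chr1, chr2, and so on.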

I see. Thanks!

Hi, I was wondering if there's a way to do this from the command line. I tried ftp but the cluster I'm logged in to doesn't allow it.

Or to grab all the bundle/1.5/ files in unix:

wget -r ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/1.5/*

Sure -- just keep in mind that when we update the bundle, you'll want to change the version number, and you might not know you need to do that unless you look at what's in the bundle.

Geraldine Van der Auwera, PhD

#!/bin/sh

RESOURCE="hg19"

# List the version directories under bundle/ and keep the highest one
list=$(curl --silent --list-only ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/)
highest=1
for e in $list
do
    # \> is a string comparison; an unescaped > would be treated as a redirection
    if [ "$e" \> "$highest" ]
    then
        highest=$e
    fi
done
VERSION=$highest

echo "Downloading bundle version: $VERSION"
echo "Of the resource: $RESOURCE"

mkdir "broad_bundle_${RESOURCE}_v${VERSION}"
cd "broad_bundle_${RESOURCE}_v${VERSION}" || exit 1

wget --no-directories --user=gsapubftp-anonymous -r "ftp://ftp.broadinstitute.org/bundle/$VERSION/$RESOURCE/"

# Check each archive against its .md5 file, then unpack it
for fn in *.gz
do
    md5ftp=$(awk '{print $1}' "$fn.md5")
    md5loc=$(md5sum "$fn" | awk '{print $1}')
    if [ "$md5ftp" = "$md5loc" ]
    then
        echo "md5 PASS"
    else
        echo "md5 FAIL"
        exit 1
    fi
    gzip -d "$fn"
    rm "$fn.md5"
    echo "$fn finished."
done

It requires that curl, gzip, md5sum and wget are installed. To download a bundle other than hg19, change the RESOURCE variable.

Better (and last) version of the script above:

#!/bin/sh

RESOURCE="hg19"

list=$(curl --silent --list-only ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/)

# Pick the highest version directory using a version-aware sort
VERSION=$(echo "$list" | tr " " "\n" | sort -V | tail -n1)

echo "Downloading bundle version: $VERSION"
echo "Of the resource: $RESOURCE"

mkdir "broad_bundle_${RESOURCE}_v${VERSION}"
cd "broad_bundle_${RESOURCE}_v${VERSION}" || exit 1

wget --no-directories --user=gsapubftp-anonymous -r "ftp://ftp.broadinstitute.org/bundle/$VERSION/$RESOURCE/"

# Check each archive against its .md5 file, then unpack it
for fn in *.gz
do
    md5ftp=$(awk '{print $1}' "$fn.md5")
    md5loc=$(md5sum "$fn" | awk '{print $1}')
    if [ "$md5ftp" = "$md5loc" ]
    then
        echo "md5 PASS"
    else
        echo "md5 FAIL"
        exit 1
    fi
    gzip -d "$fn"
    rm "$fn.md5"
    echo "$fn finished."
done

Thanks! It works perfectly.

Cheers, Fernando

Hi Geraldine,

Would it be possible that, in the future, the resource bundle could be made accessible using rsync? This would shrink the data transfer load when updating.

Hi Didier,

That's an interesting idea; we don't have the resources to work on that right now but I'll try to get that on our TODO list of improvements for the future. Can't guarantee when that'll get done though, it could be a while -- that TODO list is already pretty long!

Geraldine Van der Auwera, PhD

I am sure if you tell your network administrator that you can shrink the network data transfer load, this will climb up your list pretty fast ;-).

The idea is to:

1) Create a link called "latest" to the bundle's latest version directory
2) gzip the data with the parameter --rsyncable (make rsync-friendly archive)
3) Install rsync and allow rsync connections

The users have to keep the gz version of the files. If the users are interested in versioning, they can make a copy of the bundle files before rsyncing.

Hah, I'm sure it will climb high on the network admin's list; but it's my todo list I'm worried about ;)

Geraldine Van der Auwera, PhD

Can I set read/write permissions on files I upload to the ftp server? For instance, if there is a file I want to share just with the GATK team.

Hi there,

It's not necessary; anything you upload using the upload-specific login gets put in a directory that only we can take data from. Anybody else can only browse the directory and see filenames, but they cannot download the files. So your data will be protected.

Geraldine Van der Auwera, PhD

It appears that I can't do an "ls" on the download server.
Viz: alanmac:~ alan$ ftp gsapubftp-anonymous@ftp.broadinstitute.org

220 ProFTPD 1.3.3g Server (Broad Institute of MIT and Harvard) [69.173.80.251]

230 Anonymous access granted, restrictions apply

Remote system type is UNIX.

Using binary mode to transfer files.

ftp> ls

421 Service not available, remote server timed out. Connection closed.

257 "/" is the current directory

ftp>

Am I doing something wrong?

The server has some restrictions on what you can and can't do in a session which aren't under our control. We recommend you simply use a GUI client to access it and you'll be fine.

Geraldine Van der Auwera, PhD

Hi I'm getting the following error. There should be no password right???

       => ftp.broadinstitute.org/bundle/hg19'


Logging in as gsapubftp-anonymous ...

Thanks,
MC

MC

As mentioned previously, the ftp server doesn't respond well to shell access, so we recommend you simply use a GUI client to access it. Sorry for the inconvenience.

Geraldine Van der Auwera, PhD

Thanks for the clarification Geraldine!

Thanks,
MC

MC

I did use the gsapubftp-anonymous username but I don't see a bundle directory. However, I found distribution/gsa/gatk_resources.tgz, so I ended up running wget ftp://ftp.broadinstitute.org/distribution/gsa/gatk_resources.tgz and it's downloading now. Is this the correct file?

No, what you want is the bundle directory. It should be at the root of the dir you end up in when you log on.

Geraldine Van der Auwera, PhD

I am getting multiple failed transfers (>=43) and the rest successful. The reason for failed transfers is listed as "Incorrect password". Are there some files there that aren't meant to be downloadable or is something wrong?

Edit/Update: I selected the failed transfers and shoved them back into queue, second time around they worked. (╯°□°)╯︵ ┻━┻

Our FTP server is a little temperamental, sorry... (loving the text meme btw!)

Geraldine Van der Auwera, PhD
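
Since re-queuing the failed transfers was enough here, command-line users can get the same effect with a small retry wrapper. This is a generic sketch, not something the GATK team provides: the `retry` function is invented for this example, and the attempt count and pause in the usage comment are arbitrary.

```shell
#!/bin/sh
# Run a command up to $1 times, pausing $2 seconds between attempts.
# Returns 0 as soon as one attempt succeeds, 1 if every attempt fails.
retry() {
    attempts=$1
    pause=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $n of $attempts failed: $*" >&2
        n=$((n + 1))
        # Only sleep if another attempt is still to come
        [ "$n" -le "$attempts" ] && sleep "$pause"
    done
    return 1
}

# Example (needs network access; each attempt restarts the file from scratch):
# retry 3 30 wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/2.3/b37/Mills_and_1000G_gold_standard.indels.b37.vcf.gz
```

Each attempt is a fresh download rather than a resume, which is the simpler choice when a server is flaky about partial transfers.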

Hi! I can't connect to the FTP server. Well, I can, but I lose the connection after the first command. I would like to download the latest version of GATK-Lite; is there another way to do it? Thanks in advance, cheers!

Geraldine Van der Auwera, PhD

Thank you Geraldine. I'm compiling using 'ant' and I have configured build.xml to build the "public" scripts only.

And I see

--2013-03-06 17:07:07-- ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/2.3/b37/Mills_and_1000G_gold_standard.indels.b37.vcf.gz
=> Mills_and_1000G_gold_standard.indels.b37.vcf.gz'
Resolving ftp.broadinstitute.org... 69.173.80.251
Connecting to ftp.broadinstitute.org|69.173.80.251|:21... connected.
Logging in as gsapubftp-anonymous ... Logged in!
==> SYST ... done.
==> PWD ... done.
==> TYPE I ... done.
==> CWD /bundle/2.3/b37 ... done.
==> SIZE Mills_and_1000G_gold_standard.indels.b37.vcf.gz ... 19868212
==> PASV ... done.
==> RETR Mills_and_1000G_gold_standard.indels.b37.vcf.gz ... done.
Mills_and_1000G_gold_standard.indels.b37.vcf.gz: Permission denied

Using a graphical client from my desktop, I seem to be able to download files, but with the above command I get 'permission denied'. What am I doing wrong? I'd like to download the larger files directly to our compute cluster, so using the graphical client is not a good option for anything except testing.

Thanks for any help!

Unfortunately right now you cannot bypass using a GUI client. This is not under our direct control; I will try to look into a better solution with our IT infrastructure folks but I can't guarantee a solution will be forthcoming. Sorry for the inconvenience.

Geraldine Van der Auwera, PhD

Hi @Clare

Quick follow-up on your question -- can you confirm that you have write permissions on the destination directory (local) from which you're opening the connection? Just want to rule out a client-side error.

Geraldine Van der Auwera, PhD
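
A quick way to rule out the client-side cause is to test the destination directory before starting the transfer. The helper below is a sketch with an invented name (`can_write_to`); it only checks the local directory and says nothing about the server side.

```shell
#!/bin/sh
# Succeed if the given directory (default: current directory) exists
# and the current user may write to it.
can_write_to() {
    dir=${1:-.}
    [ -d "$dir" ] && [ -w "$dir" ]
}

# Example: refuse to start a download into a directory we cannot write to.
# can_write_to . || { echo "no write permission in $(pwd)" >&2; exit 1; }
```

If this check passes and wget still reports "Permission denied", the problem is more likely on the server or network side.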

Hi Geraldine,

I could not connect to the GSA FTP server. Could you check whether the server has been shut down?

Hi Geraldine

I can't access the FTP download server. I have tried Cyberduck, FileZilla, the Chrome browser and the terminal. With all of them I get the following error.

Status: Resolving address of ftp.broadinstitute.org
Status: Connecting to 69.173.80.251:21...
Status: Connection established, waiting for welcome message...
Response: 220 ProFTPD 1.3.3g Server (Broad Institute of MIT and Harvard) [69.173.80.251]
Command: USER gsapubftp-anonymous
Response: 331 Anonymous login ok, send your complete email address as your password
Command: PASS **************
Response: 530 Login incorrect.
Error: Critical error
Error: Could not connect to server

Doesn't matter if I leave password blank as in the instructions on this site or if I put in my email address.

Any help?

Thanks
TS

TS

Hi folks,

We are experiencing issues with the FTP server. We're in touch with our IT support and trying to get service resumed as quickly as possible. We're very sorry for any inconvenience this may cause you.

Geraldine Van der Auwera, PhD

UPDATE: the FTP issues have been resolved and it is working normally again. Thanks for your patience!

Geraldine Van der Auwera, PhD

Hi,

I am trying to download hg19.fasta and dbsnp137 from the FTP server. After it downloads around 180 MB, the connection is closed by the server, stating a 600-second idle timeout. I am using FileZilla. I have tried multiple times but the same issue continues. What could be the problem?

Hello,

The bundle directory holds example fasta files as well as hg19 and hg18 files. But what are the 'b' directories? They look like the same files as hg19 and hg18.

Joe White MEEI joseph_white@meei.harvard.edu

Hi Joe,

The b directories contain the Broad versions of the human genome reference. They're very similar but the chromosome names don't have the "chr" part prepended, and there are a few sequence differences. What's really important is to always use the resources that were generated with the same reference as was used to align your data. Most of our resources derive from b37 so if you haven't aligned your reads yet I'd recommend using b37, at least if you want to take advantage of our resources.

Geraldine Van der Auwera, PhD
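
To tell which naming convention a reference or resource file follows, it is enough to look at its contig names. The sketch below classifies a list of names by the "chr" prefix only; the function name and the labels it prints are invented for this example, and matching names do not prove that the underlying sequences are identical.

```shell
#!/bin/sh
# Classify contig names (one per line on stdin) by naming style:
#   ucsc  - every name starts with "chr" (hg18/hg19 style)
#   broad - no name starts with "chr"    (b36/b37 style)
#   mixed - both styles present, or no names at all
contig_style() {
    awk '
        /^chr/        { prefixed++ }
        !/^chr/ && NF { bare++ }
        END {
            if (prefixed && !bare)      print "ucsc"
            else if (bare && !prefixed) print "broad"
            else                        print "mixed"
        }'
}

# Example (placeholder file name):
#   cut -f1 ucsc.hg19.fasta.fai | contig_style
```

Running the same check on the contig names from a BAM header shows quickly whether the alignments and the chosen resources use the same convention.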

Can you please post the full error message that you get?

Geraldine Van der Auwera, PhD

Thank you for the prompt reply. I get the following error:

Response: Proxy reply: HTTP/1.0 200 Connection established
Response: 150 Opening BINARY mode data connection for ucsc.hg19.fasta.gz (948729977 bytes)
Response: 421 Idle timeout (600 seconds): closing control connection
Error: Connection closed by server

Response: 227 Entering Passive Mode (69,173,80,251,247,131).
Command: REST 170561368
Response: 502 Command REST not allowed by policy.
Error: File transfer failed
Status: Starting download of /bundle/2.3/hg19/dbsnp_137.hg19.vcf.gz
Error: Connection closed by server
Error: File transfer failed after transferring 180,294,824 bytes in 900 seconds

This looks like a firewall issue, I'll ask our IT department to look into it. You may need to consult your own IT support as well, to ask if they can do anything about this part of the error: "502 Command REST not allowed by policy"

Geraldine Van der Auwera, PhD

Hi @ashwinipatil, can you try setting your FTP client to active mode? In Filezilla you can find this setting in the Preferences. See screenshot attached.

Geraldine Van der Auwera, PhD

Hello Geraldine,

I'm trying to remotely FTP to ftp.broadinstitute.org, via an ssh into a remote machine cluster.

The bundle is to be used on this cluster, and I'd like to save a lot of time on down-then-upload (the cluster is in a different country from where I am) by FTPing directly to BROAD, from the machine I'm ssh'ed into. It would save me days!

Do you think that terminal command "ftp ftp.broadinstitute.org" should work? I get a connection refused. Normal ftp from my own computer (using filezilla) works fine, of course.

Hi Laurent,

We've had reports that terminal FTP does not work for many users. We are looking into a cloud-based solution to replace the FTP, but for now I'm afraid you'll need to use a GUI such as FileZilla.

Geraldine Van der Auwera, PhD

FYI I just had success using the command line program wget like this:

I first used the GUI tool fetch to log in and browse the FTP directory, then right clicked on the file I wanted and selected "copy fetch address" which gave me the above ftp address with the associated username. Works fine with wget which is a very easy-to-install command line tool.

Hi, I'm trying to upload a bug report but I don't seem to be able to transfer the file. I used FileZilla as recommended, and checked that it is in active mode. I checked the username/password multiple times, and the connection itself is successfully established. However, I get an "access denied" when trying to transfer the file.

Status:    Resolving address of ftp.broadinstitute.org
Status: Connecting to 69.173.80.251:21...
Status: Connection established, waiting for welcome message...
Response:   220 ProFTPD 1.3.3g Server (Broad Institute of MIT and Harvard) [69.173.80.251]
Command:    USER gsapubftp
Response:   331 Password required for gsapubftp
Command:    PASS ********
Response:   230 Anonymous access granted, restrictions apply
Command:    OPTS UTF8 ON
Response:   550 Access is denied.
Status: Connected
Command:    CWD /
Response:   250 CWD command successful
Command:    PWD
Response:   257 "/" is the current directory
Status: Retrieving directory listing...
Command:    TYPE I
Response:   200 Type set to I
Command:    PASV
Response:   227 Entering Passive Mode (69,173,80,251,246,127).
Command:    MLSD
Response:   550 Access is denied.
Command:    SIZE recalibrator_startOverStop.tar.gz
Response:   550 Access is denied.
Command:    PASV
Response:   227 Entering Passive Mode (69,173,80,251,239,42).
Command:    STOR recalibrator_startOverStop.tar.gz
Response:   550 Access is denied.
Error:  Critical file transfer error

Any suggestions? thanks!

Sounds like a firewall issue -- have you tried uploading from a different network, eg from home?

Geraldine Van der Auwera, PhD

Identify target regions for realignment (Genome Analysis Toolkit) ->Realign BAM to get better Indel calling (Genome Analysis Toolkit) ->Reindex the realigned BAM (SAM Tools) ->Call Indels (Genome Analysis Toolkit) ->Call SNPs (Genome Analysis Toolkit)->View aligned reads in BAM/BAI (Integrated Genome Viewer)

Please let me know if this looks correct or not. The VCF files from the 1kG and the DBSNP are already there in compressed form in the bundle repository of the GATK website which I am currently downloading and I can use them directly after unzipping them.

Hi @vivekdas,

For the record, we do prefer that you do not repost the same comment content. If you have additional information/questions to post, that is fine, but the parts that are repeated should be left out. Keep in mind also that we respond to all comments and questions, but we are mainly active during business hours (Monday to Friday, 9 am to 5 pm US Eastern Standard Time), so if you post outside of those hours, you may not get an answer for a little while.

Regarding the bundle, you should get the latest version (2.5 presently). You don't need to put it in the same directory as the GATK program, as long as you pass in the location correctly in your command line. If you have any trouble with this you should ask your IT department for help.

In the workflow you outlined, you are missing the base recalibration step. Also, you don't need to reindex the bam file after realignment; GATK will automatically generate an up-to-date index file for every bam or vcf file it produces. Additionally, you can call SNPs and Indels at the same time by specifying -glm BOTH with UnifiedGenotyper; or HaplotypeCaller will automatically call both at the same time. Finally, you should also plan to filter variants, either using hard-filtering or variant recalibration. Again, see the Best Practices documentation for more details.

Geraldine Van der Auwera, PhD
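
The revised order of steps can be written down as a dry run that prints one command per step without executing anything. This is a sketch only: the tool and argument names follow the GATK 2.x/3.x command-line conventions and should be checked against the documentation for your version, every file name is a placeholder, and the final variant filtering step is left out.

```shell
#!/bin/sh
# Dry run: print (do not execute) one GATK command per workflow step.
# Arguments: reference FASTA, input BAM, known-sites VCF (all placeholders).
print_workflow() {
    ref=$1
    bam=$2
    known=$3
    gatk="java -jar GenomeAnalysisTK.jar"
    echo "$gatk -T RealignerTargetCreator -R $ref -I $bam -known $known -o targets.intervals"
    echo "$gatk -T IndelRealigner -R $ref -I $bam -targetIntervals targets.intervals -known $known -o realigned.bam"
    # Base recalibration, the step missing from the original outline
    echo "$gatk -T BaseRecalibrator -R $ref -I realigned.bam -knownSites $known -o recal.table"
    echo "$gatk -T PrintReads -R $ref -I realigned.bam -BQSR recal.table -o recal.bam"
    # SNPs and indels in one pass; no separate re-indexing step is needed
    echo "$gatk -T UnifiedGenotyper -R $ref -I recal.bam -glm BOTH -o raw.vcf"
}

# Example:
# print_workflow ucsc.hg19.fasta sample.bam dbsnp_137.hg19.vcf
```

Printing the commands first makes it easy to review the order of steps and the file hand-offs before committing compute time.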

Hi,

I wasn't able to connect to the FTP server using Filezilla, not even after forcing "Active Mode".

This is the error I get:

Response:   220 ProFTPD 1.3.3g Server (Broad Institute of MIT and Harvard) [69.173.80.251]
Command:    USER gsapubftp-anonymous
Command:    PASS
Error:  Critical error
Error:  Could not connect to server


Thanks, Carlos

Did you type something in the password field? To clarify, you should leave that field blank, i.e. not type anything there.

Geraldine Van der Auwera, PhD

A day after my comment here the FTP server connection started working without me changing anything as far as I can tell. I believe this was a temporary issue. Please disregard.

Thanks, Carlos

Great, I love problems that go away on their own :)

Geraldine Van der Auwera, PhD

Hello there - I go through the following most times so perhaps it will help some: