(howto) Install all software packages required to follow the GATK Best Practices.

Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,423Administrator, GATK Developer admin
edited September 15 in Tutorials

Objective

Install all software packages required to follow the GATK Best Practices.

Prerequisites

To follow these instructions, you will need a basic understanding of the following terms and command-line operations. If you are unfamiliar with any of them, consult a more experienced colleague or your systems administrator, if you have one. There are also many good online tutorials that cover these notions.

  • Basic Unix environment commands
  • Binary / Executable
  • Compiling a binary
  • Adding a binary to your path
  • Command-line shell, terminal or console
  • Software library

You will also need to have access to an ANSI compliant C++ compiler and the tools needed for normal compilations (make, shell, the standard library, tar, gunzip). These tools are usually pre-installed on Linux/Unix systems. On MacOS X, you may need to install the MacOS Xcode tools. See https://developer.apple.com/xcode/ for relevant information and software downloads.

Starting with version 2.6, the GATK requires Java Runtime Environment version 1.7. All Linux/Unix and MacOS X systems should have a JRE pre-installed, but the version may vary. To test your Java version, run the following command in the shell:

java -version 

This should return a message along the lines of java version "1.7.0_25" as well as some details on the Runtime Environment (JRE) and Virtual Machine (VM). If you have a version other than 1.7.x, be aware that you may run into trouble with some of the more advanced features of the Picard and GATK tools. The simplest solution is to install an additional JRE and specify which you want to use at the command-line. To find out how to do so, you should seek help from your systems administrator.
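As a rough illustration of the version check described above, the sketch below pulls the quoted version string out of `java -version` style output and reports whether it is a 1.7.x release. The function names `java_version_of` and `check_java_17` are made up for this example, and the sample string stands in for real output.

```shell
# Extract the quoted version string from `java -version` style output.
# (Hypothetical helper; not part of Java or GATK.)
java_version_of() {
  printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Report whether the detected version is a 1.7.x release.
check_java_17() {
  case "$(java_version_of "$1")" in
    1.7.*) echo "OK: Java 1.7 detected" ;;
    "")    echo "Could not find a version string" ;;
    *)     echo "Warning: found a version other than 1.7.x" ;;
  esac
}

# Sample input standing in for real output.
check_java_17 'java version "1.7.0_25"'

# On a real system (java prints its version to stderr):
#   check_java_17 "$(java -version 2>&1)"
```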

Software packages

  1. BWA
  2. SAMtools
  3. HTSlib (optional)
  4. Picard
  5. Genome Analysis Toolkit (GATK)
  6. IGV
  7. RStudio IDE and R libraries ggplot2 and gsalib

1. BWA

Read the overview of the BWA software on the BWA project homepage, then download the latest version of the software package.

  • Installation

Unpack the tar file using:

tar xvjf bwa-0.7.5a.tar.bz2 

This will produce a directory called bwa-0.7.5a containing the files necessary to compile the BWA binary. Move to this directory and compile using:

cd bwa-0.7.5a
make

The compiled binary is called bwa. You should find it within the same folder (bwa-0.7.5a in this example). You may also find other compiled binaries; at time of writing, a second binary called bwamem-lite is also included. You can disregard this file for now. Finally, just add the BWA binary to your path to make it available on the command line. This completes the installation process.
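If adding a binary to your path is new to you, the following is a minimal sketch of one way to do it in a Bourne-style shell. The location in `BWA_DIR` is only an assumption for illustration; use the directory where you actually unpacked and compiled BWA. To make the change permanent you would typically put these lines in your shell startup file.

```shell
# Hypothetical install location; replace with your own directory.
BWA_DIR="$HOME/tools/bwa-0.7.5a"

# Append the directory to PATH only if it is not already listed.
case ":$PATH:" in
  *":$BWA_DIR:"*) ;;                 # already present, nothing to do
  *) PATH="$PATH:$BWA_DIR" ;;
esac
export PATH
```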

  • Testing

Open a shell and run:

bwa 

This should print out some version and author information as well as a list of commands. As the Usage line states, to use BWA you will always build your command lines like this:

bwa <command> [options] 

This means you first make the call to the binary (bwa), then you specify which command (method) you wish to use (e.g. index) then any options (i.e. arguments such as input files or parameters) used by the program to perform that command.


2. SAMtools

Read the overview of the SAMtools software on the SAMtools project homepage, then download the latest version of the software package.

  • Installation

Unpack the tar file using:

tar xvjf samtools-0.1.19.tar.bz2 

This will produce a directory called samtools-0.1.19 containing the files necessary to compile the SAMtools binary. Move to this directory and compile using:

cd samtools-0.1.19 
make 

The compiled binary is called samtools. You should find it within the same folder (samtools-0.1.19 in this example). Finally, add the SAMtools binary to your path to make it available on the command line. This completes the installation process.

  • Testing

Open a shell and run:

samtools 

This should print out some version information as well as a list of commands. As the Usage line states, to use SAMtools you will always build your command lines like this:

samtools <command> [options] 

This means you first make the call to the binary (samtools), then you specify which command (method) you wish to use (e.g. index) then any options (i.e. arguments such as input files or parameters) used by the program to perform that command. This is the same convention as used by BWA.


3. HTSlib (optional)

Read the overview of the HTSlib software on the HTSlib project homepage, then download the latest version of the software package.

  • Installation

Unpack the zip file using:

unzip htslib-master.zip 

This will produce a directory called htslib-master containing the files necessary to compile the HTSlib binary. Move to this directory and compile using:

cd htslib-master 
make 

The compiled binary is called htscmd. You should find it within the same folder (htslib-master in this example). Finally, add the HTSlib binary to your path to make it available on the command line. This completes the installation process.
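The packages in this guide are distributed in different archive formats, and the unpacking command depends on the file extension. The sketch below shows one way to pick the command; `unpack_command` is a made-up helper that only prints the command it would use, so it is safe to try without any archives present.

```shell
# Print the unpack command that matches an archive's extension.
# (Hypothetical helper for illustration; it does not unpack anything.)
unpack_command() {
  case "$1" in
    *.tar.bz2)      echo "tar xjf $1" ;;   # bzip2-compressed tar
    *.tar.gz|*.tgz) echo "tar xzf $1" ;;   # gzip-compressed tar
    *.zip)          echo "unzip $1" ;;     # zip archive
    *) echo "unknown archive type: $1" >&2; return 1 ;;
  esac
}

unpack_command bwa-0.7.5a.tar.bz2
unpack_command htslib-master.zip
```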

  • Testing

Open a shell and run:

htscmd 

This should print out some version information as well as a list of commands. As the Usage line states, to use HTSlib you will always build your command lines like this:

htscmd <command> [options] 

This means you first make the call to the binary (htscmd), then you specify which command (method) you wish to use (e.g. index) then any options (i.e. arguments such as input files or parameters) used by the program to perform that command. This is the same convention as used by BWA and SAMtools.
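Once the compiled tools are installed, a quick way to confirm that the shell can find each of them is sketched below. The helper name `on_path` is an assumption for this example; it relies only on the standard `command -v` shell builtin.

```shell
# Return success if the named program can be found on PATH.
on_path() {
  command -v "$1" >/dev/null 2>&1
}

# Check the three compiled tools covered so far.
for tool in bwa samtools htscmd; do
  if on_path "$tool"; then
    echo "$tool: found at $(command -v "$tool")"
  else
    echo "$tool: not found on PATH"
  fi
done
```

A "not found" line usually means the directory containing that binary has not been added to your path in the current shell session.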


4. Picard

Read the overview of the Picard software on the Picard project homepage, then download the latest version of the software package.

  • Installation

Unpack the zip file using:

unzip picard-tools-1.94.zip 

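This will produce a directory called picard-tools-1.94 containing the Picard jar files. Picard tools are distributed as pre-compiled Java executables (jar files) so there is no need to compile them. Note that java -jar does not look up jar files on your path, so adding the Picard directory to your path is not enough to make the tools available from any directory. Either run the tools from within the Picard directory or give the full location of the jar file on the command line, for example through an environment variable that points to the directory where the jars live. This completes the installation process.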
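The environment-variable approach suggested in the comments below can be sketched as follows. The variable name `PICARD_HOME` follows the convention mentioned in that discussion; the directory it points to and the helper `picard_jar` are assumptions made for this example.

```shell
# Hypothetical location of the unpacked Picard directory; adjust to your system.
PICARD_HOME="$HOME/tools/picard-tools-1.94"
export PICARD_HOME

# Build the full path to a Picard jar from its tool name.
# (Illustrative helper, not part of Picard.)
picard_jar() {
  printf '%s/%s.jar\n' "$PICARD_HOME" "$1"
}

picard_jar AddOrReplaceReadGroups

# With Picard installed, the tool can then be run from any directory:
#   java -jar "$(picard_jar AddOrReplaceReadGroups)" -h
# or equivalently:
#   java -jar "$PICARD_HOME/AddOrReplaceReadGroups.jar" -h
```

Keeping the location in a variable also makes it easy to switch between several Picard versions installed on the same machine.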

  • Testing

Open a shell, move to the picard-tools-1.94 directory and run:

java -jar AddOrReplaceReadGroups.jar -h 

This should print out some version and usage information about the AddOrReplaceReadGroups.jar tool. At this point you will have noticed an important difference between BWA and Picard tools. To use BWA, we called on the BWA program and specified which of its internal tools we wanted to apply. To use Picard, we called on Java itself as the main program, then specified which jar file to use, knowing that one jar file = one tool. This applies to all Picard tools; to use them you will always build your command lines like this:

java -jar <ToolName.jar> [options] 

Next we will see that GATK tools are called in yet another way. The reasons for how tools in a given software package are organized and invoked are largely due to the preferences of the software developers. They generally do not reflect strict technical requirements, although they can have an effect on speed and efficiency.


5. Genome Analysis Toolkit (GATK)

Hopefully if you're reading this, you're already acquainted with the purpose of the GATK, so go ahead and download the latest version of the software package.

In order to access the downloads, you need to register for a free account on the GATK support forum. You will also need to read and accept the license agreement before downloading the GATK software package. Note that if you intend to use the GATK for commercial purposes, you will need to purchase a license from our commercial partner, Appistry. See Appistry's GATK FAQ page for an overview of the commercial licensing conditions.

  • Installation

Unpack the tar file using:

tar xjf GenomeAnalysisTK-2.6-4.tar.bz2 

This will produce a directory called GenomeAnalysisTK-2.6-4-g3e5ff60 containing the GATK jar file, which is called GenomeAnalysisTK.jar, as well as a directory of example files called resources. GATK tools are distributed as a single pre-compiled Java executable so there is no need to compile them. Note that java -jar does not search your path for the jar file, so either run GATK from within this directory or give the full location of GenomeAnalysisTK.jar on the command line (an environment variable pointing to the directory is a convenient shortcut). This completes the installation process.

  • Testing

Open a shell, move to the GenomeAnalysisTK-2.6-4-g3e5ff60 directory and run:

java -jar GenomeAnalysisTK.jar -h 

This should print out some version and usage information, as well as a list of the tools included in the GATK. As the Usage line states, to use GATK you will always build your command lines like this:

java -jar GenomeAnalysisTK.jar -T <ToolName> [arguments] 

This means you first make the call to Java itself as the main program, then specify the GenomeAnalysisTK.jar file, then specify which tool you want, and finally you pass whatever other arguments (input files, parameters etc.) are needed for the analysis.
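To see how the pieces of a GATK command line fit together, the sketch below assembles one as text without running it (a dry run). The `GATK_HOME` location and the helper `gatk_command` are assumptions for this example; the CountReads invocation and the example file names are taken from the discussion in the comments below.

```shell
# Hypothetical location of the unpacked GATK directory; adjust to your system.
GATK_HOME="$HOME/tools/GenomeAnalysisTK-2.6-4-g3e5ff60"

# Print (without running) a GATK command line for the given tool and arguments.
# (Illustrative helper, not part of GATK.)
gatk_command() {
  tool="$1"
  shift
  printf 'java -jar %s/GenomeAnalysisTK.jar -T %s %s\n' "$GATK_HOME" "$tool" "$*"
}

# Dry run of a simple test on the example data shipped with the download.
gatk_command CountReads -R exampleFASTA.fasta -I exampleBAM.bam
```

Running the printed command for real goes one step further than the -h check, since it exercises an actual tool on actual input files.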

So this way of calling the program and selecting which tool to run is a little like a hybrid of how we called BWA and how we called Picard tools. To put it another way, if BWA is a standalone game device that comes preloaded with several games, Picard tools are individual game cartridges that plug into the Java console, and GATK is a single cartridge that also plugs into the Java console but contains many games.


6. IGV

The Integrative Genomics Viewer is a genome browser that allows you to view BAM, VCF and other genomic file information in context. It has a graphical user interface that is very easy to use, and can be downloaded for free (though registration is required) from this website.


7. RStudio IDE and R libraries ggplot2 and gsalib

Download the latest version of RStudio IDE. The webpage should automatically detect what platform you are running on and recommend the version most suitable for your system.

  • Installation

Follow the installation instructions provided. Binaries are provided for all major platforms; typically they just need to be placed in your Applications (or Programs) directory. Open RStudio and type the following command in the console window:

install.packages("ggplot2") 

This will download and install the ggplot2 library as well as any other library packages that ggplot2 depends on for its operation. Note that some users have reported having to install one additional package themselves, called reshape, which you can do as follows:

install.packages("reshape")

Finally, do the same thing to install the gsalib library:

install.packages("gsalib")

This will download and install the gsalib library.

Important note

If you are using a recent version of ggplot2 and a version of GATK older than 3.2, you may encounter an error when trying to generate the BQSR or VQSR recalibration plots. This is because until recently our scripts were still using an older version of certain ggplot2 functions. This has been fixed in GATK 3.2, so you should either upgrade your version of GATK (recommended) or downgrade your version of ggplot2. If you experience further issues generating the BQSR recalibration plots, please see this tutorial.


Geraldine Van der Auwera, PhD

Comments

  • haseleyhaseley Broad InstitutePosts: 1Member
    edited July 2013

    Hello,

    I'm having an issue getting picard tools configured to work in any directory. I've downloaded and unpacked the picard zip file and added the picard-tools-1.94 directory to my path, however when I run:

    java -jar AddOrReplaceReadGroups.jar -h
    

    I get the following error: Error: Unable to access jarfile AddOrReplaceReadGroups.jar

    The command works if I am in the picard-tools-1.94 directory, making me think that something is wrong with my path variable but when I echo my path variable and copy the relevant path directly into a cd command I move to the correct directory (so there are no typos) and the command works (so I should be adding the correct directory). Any suggestions? Here is the value of my PATH variable:

    bash:tin:~ 53 $ echo $PATH
    /idi/hunglabusers/GenomeAnalysisTK-2.6-4-g3e5ff60/:/idi/hunglabusers/SalmonellaRNAseq/picard/picard-tools-1.94/:/idi/hunglabusers/GATK_workshop/htslib-master/:/broad/software/free/Linux/redhat_5_x86_64/pkgs/oracle-java-jdk_1.7.0-17_x86_64/bin:/broad/software/free/Linux/redhat_5_x86_64/pkgs/bwa_0.7.4:/broad/software/free/Linux/redhat_5_x86_64/pkgs/samtools/samtools_0.1.19/bin:/home/unix/haseley/bin:/home/unix/haseley/bin:/broad/tools/NoArch/pkgs/local:/usr/lib64/qt-3.3/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin
    

    The relevant path is the second one listed. Below is the version information for java:

    bash:node1379:~ 53 $ java -version
    java version "1.7.0_17"
    Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
    Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
    

    Thanks!

    Nathan

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,423Administrator, GATK Developer admin

    What I do is set up an environment variable that points to the directory where the jars live, so I can do something like $picardDir/AddOrReplaceReadGroups.jar

    Geraldine Van der Auwera, PhD

  • briankweinerbriankweiner Broad Institute of MIT & HarvardPosts: 2Member

    I'm having some trouble with the very last bit when trying to install gsalib2 in R. If you are using the most recent version of R (3.0.1) for Mac OS X, you'll receive the following error:

    install.packages("gsalib2")
    Warning in install.packages : package ‘gsalib2’ is not available (for R version 3.0.1)

    You can however install "gsalib" if that will work.

  • briankweinerbriankweiner Broad Institute of MIT & HarvardPosts: 2Member

    Also, for any other Mac users who are frustrated with the inability of the command line interface to recognize the most recent Java install, you can correct this problem by going to this webpage: http://stackoverflow.com/questions/12757558/installed-java-7-on-mac-os-x-but-terminal-is-still-using-version-6

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,423Administrator, GATK Developer admin

    @briankweiner, thanks for pointing out the gsalib2 typo -- I've corrected the name of the gsalib library in the article.

    And thanks for linking to that article! It is sure to be helpful for others.

    Geraldine Van der Auwera, PhD

  • sourav8888sourav8888 Posts: 3Member
    edited July 2013

    Hi. I am new to GATK. While installing everything went well except installation of ggplot2 and gsalib. I am getting error msg as : Warning in install.packages : unable to connect to 'cran.rstudio.com' on port 80.

    Is it a problem of network only or something else I have to do. Thanks in advance.

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,423Administrator, GATK Developer admin

    Hi @sourav8888, that is a network error that has nothing to do with GATK, so we can't help you with that. You should ask for help from a colleague or your IT department.

    Geraldine Van der Auwera, PhD

  • sourav8888sourav8888 Posts: 3Member

    Thanks a lot Geraldine_VdAuwera. Yes I will contact them.

  • JPCJPC Posts: 2Member
    edited October 2013

    Hello, when I make htslib I don't get an htscmd binary in /htslib-master. Looking back through the install, I see the following:

    vcfcheck.c:703:39: warning: format specifies type 'long' but the argument has type 'uint64_t' (aka 'unsigned long long') [-Wformat]
    printf("\t%ld\t%f\n", stats->dp.vals[i], stats->dp.vals[i]*100./sum);
    ~~~ ^~~~~~~~~~~~~~~~~
    %llu
    1 warning generated.

    and at the end:

    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    make: *** [htscmd] Error 1

    I don't understand the error sorry

    JPC

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,423Administrator, GATK Developer admin

    I'm sorry, neither do I -- I recommend you contact the makers of htslib; they will be better able to help you.

    Geraldine Van der Auwera, PhD

  • psanchez820psanchez820 Mexico CityPosts: 2Member

    JPC:

    I also get the same error, did you manage to get an answer for this?

    @JPC said: Hello, when I make htslib I don't get an htscmd binary in /htslib-master. Looking back through the install, I see the following: JPC

    Thanks!

  • alastair_kerralastair_kerr Posts: 3Member

    R-dependencies: I found that all the following were needed to run the later AnalyzeCovariates example: ggplot2, gplots, reshape, grid, tools, gsalib

    It would be useful if these were noted here

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,423Administrator, GATK Developer admin

    Hi Alastair, we only list ggplot2 and gsalib because the rest are dependencies of ggplot2 and should get installed automatically when you install ggplot2.

    Geraldine Van der Auwera, PhD

  • alastair_kerralastair_kerr Posts: 3Member

    Hi Geraldine, this was not the case in my install, perhaps because ggplot2 had been installed on my system for a few years. It took me a while to figure out the problem and I would save others such inconvenience if the full list were included.

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,423Administrator, GATK Developer admin

    The problem is that we would then have to update the dependencies every time the developer of another library changes their package, and that's just too much burden on us. As it is now, it is your responsibility to keep your software up to date. If you have some software that has been installed for several years, one of the first things you should think of if you run into problems is to update everything.

    Geraldine Van der Auwera, PhD

  • alastair_kerralastair_kerr Posts: 3Member

    Sorry I was not clear. The libraries were completely up to date, they just did not have the additional packages as dependencies.

  • mkasiedumkasiedu Mayo Clinic, Rochester, MinnesotaPosts: 2Member

    @Geraldine_VdAuwera said: What I do is set up an environment variable that points to the directory where the jars live, so I can do something like $picardDir/AddOrReplaceReadGroups.jar

    Geraldine, I am having the same problem as haseley above, but I am not sure how to set up the environment variable using the information you provided. I am new to Linux. Can you send me a command line to run? Do I have to run "$picardDir/AddOrReplaceReadGroups.jar" before running "java -jar AddOrReplaceReadGroups.jar -h"?

    This is the Java version:

    [michael@asl158 ~]$ java -version
    java version "1.7.0_45"
    OpenJDK Runtime Environment (fedora-2.4.3.0.fc19-x86_64 u45-b15)
    OpenJDK 64-Bit Server VM (build 24.45-b08, mixed mode)

    and the PATH is:

    [michael@asl158 ~]$ echo $PATH
    /usr/lib64/qt-3.3/bin:/usr/lib64/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/home/michael/Downloads/htslib-master:/home/michael/Downloads/vcftools_0.1.11:/home/michael/Downloads/samtools-0.1.19:/home/michael/Downloads/bwa-0.7.5a:/home/michael/Downloads/picard-tools-1.105:/home/michael/Downloads/GenomeAnalysisTK-2.8-1-g932cd3a:/home/michael/.local/bin:/home/michael/bin:/home/michael/Downloads/htslib-master:/home/michael/Downloads/vcftools_0.1.11:/home/michael/Downloads/samtools-0.1.19:/home/michael/Downloads/bwa-0.7.5a:/home/michael/Downloads/picard-tools-1.105:/home/michael/Downloads/GenomeAnalysisTK-2.8-1-g932cd3a

    Thanks,

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,423Administrator, GATK Developer admin

    Hi @mkasiedu,

    I recommend you look for an online tutorial that covers working with environment variables. There are many fine tutorials for Linux beginners, and I think this will be more useful to you in the long run than giving you a set of commands to run. Good luck!

    Geraldine Van der Auwera, PhD

  • adaywilladaywill BostonPosts: 4Member

    Hi,

    The most current branch of htslib is the develop branch. Is there a reason to install the master branch, which looks like it is no longer being developed and has been merged into the develop branch?

    Thanks,

    Aaron

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,423Administrator, GATK Developer admin

    Hi @adaywill,

    That's a fair point, but we've only tested the "master" package (in keeping with the usual Earth-logic software naming convention; not sure what the htslib devs are doing merging master into develop...), so proceed with "develop" at your own risk.

    Geraldine Van der Auwera, PhD

  • mkasiedumkasiedu Mayo Clinic, Rochester, MinnesotaPosts: 2Member

    Hi Nathan, Were you able to resolve the java issue with picard? I am having the same problem and not making any progress resolving it. Will appreciate the help.

    Thanks,

    Michael

  • virshuvirshu USAPosts: 3Member

    Hi,

    I am sysadmin helping our scientists set up GATK environment. I am following the instructions in this post, and some of them cause doubts - at least on Linux (don't know much about Mac).

    Step 4 (Picard installations) tells to "add the Picard directory to your path to make the tools available on the command line". As somebody else already mentioned, this doesn't make sense - jar invocation doesn't use PATH to find jar file. So, java -jar AddOrReplaceReadGroups.jar -h doesn't work, unless AddOrReplaceReadGroups.jar is in the current directory. I assume, GATK itself relies on Picard archives; the typical way is to add jars to the CLASSPATH. Should I add all of them? That doesn't seem right? Could you please correct the instructions.

    Step 5. The trivial invocation (with -h flag) works; however, any ToolName throws an error that the tool is not found. I don't know if it's related to Picard jars missing, or there is some other reason - but some verification that goes beyond just -h would be very helpful.

    Step 6. Most of our Linux servers don't have GUI installed. Scientists use R Studio Server for all their R development and modelling needs. However, you recommend installing R Studio IDE, which is a client-based software. Do you want me to install it on the server? It won't work without X. Or you want the scientists to install it on their workstations (many of them have it already). Then I would need some instructions as to how to integrate such client installation with GATK. Or I misunderstood this whole step?

    Thanks.

  • pdexheimerpdexheimer Posts: 360Member, GSA Collaborator ✭✭✭

    Hi @virshu -

    I can try to help with a couple of these.

    For Picard, we define an environment variable PICARD_HOME, and then invoke with java -jar $PICARD_HOME/AddOrReplaceReadGroups.jar. Actually, we do the same thing for GATK.

    For GATK, I assume that you tried tool names that should exist (like PrintReads or UnifiedGenotyper)? If you build the jars yourself, it's possible to mess things up and not compile in the tools, but the downloadable jar for distribution (at least v2.8-1) doesn't have this problem.

    You don't need an R IDE, I suspect that the recommendation was made just for ease of installation. Just make sure that Rscript is on the path and that the ggplot2 package (and all dependencies) is installed

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,423Administrator, GATK Developer admin

    Hi there,

    Step 4: Apologies for the confusion; what it means is what @pdexheimer outlines: create an environment variable to use as shortcut to the directory where you store the jars. We typically have several versions on the same machine so it's easier to control what we're using that way rather than using classpaths. We'll try to clarify the doc.

    Step 5: Not sure what you mean -- could you please post the command line you tried that didn't work, and what result or error message you got?

    Step 6: The IDE is not required, it's just a recommendation for people who don't already work with R, as it can help make installing the libraries easier for them.

    Geraldine Van der Auwera, PhD

  • virshuvirshu USAPosts: 3Member

    Wow, thanks for such speedy reply!

    Step 4: I assume that GATK needs to somehow know where Picard jar files are located, right? So, whether I use directory name or environment variable - it doesn't let GATK know about it. Or there is no dependency, and the scientist is supposed to invoke Picard jars independently of GATK? Then I don't have any questions.

    Step 5: As I said, I am sysadmin (although hanging around the scientists for a long time). And our lawyers asked scientists to stay out of the system while they are finalizing contract. In short, I don't have the tools "that should exist". That's exactly my question - can somebody suggest a command (beyond just help screen) that should work.

    I didn't build jars myself; as Step 5 instructs, the jars are pre-built in the download... The goal is to package Amazon AMIs for the scientists, and I want to make sure that all the pieces work correctly before I start packaging.

    Thanks again...

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,423Administrator, GATK Developer admin

    We're here to help :)

    Step 4: there is no direct dependency; GATK does not make calls to Picard, if that's what you mean. We just ask users to get Picard because there is some data preprocessing that needs to be done with Picard before the data can be input to GATK.

    Step 5: Oh I see. Well you could run one of the simple analysis tools on the example data that is provided with the download (if I remember correctly, in the resources subdirectory). E.g. you would do:

    java -jar GenomeAnalysisTK.jar -T CountReads -R exampleFASTA.fasta -I exampleBAM.bam
    

    Let me know if you have any trouble with that.

    Geraldine Van der Auwera, PhD

  • virshuvirshu USAPosts: 3Member

    YES! Thank you so much (on both steps)! Step 4: that certainly clarifies. and Step 5: The results are much more comforting than just help screen! I got "CountReads - CountReads counted 33 reads in the traversal" and "0 reads were filtered out during the traversal" which looks really great! Thanks a lot

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,423Administrator, GATK Developer admin

    You're welcome! We'll look into providing some more helpful quick-start examples along those lines.

    Geraldine Van der Auwera, PhD

  • adaywilladaywill BostonPosts: 4Member

    @Geraldine_VdAuwera said: Hi adaywill,

    That's a fair point, but we've only tested the "master" package (in keeping with the usual Earth-logic software naming convention; not sure what the htslib devs are doing merging master into develop...), so proceed with "develop" at your own risk.

    Hi Geraldine,

    Thanks. With the newer version of htslib you can just recompile samtools with the new htslib library and all the functionality is available from samtools.

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,423Administrator, GATK Developer admin

    Oh, that is very cool. Thanks for reporting back on this, @adaywill.

    Geraldine Van der Auwera, PhD

  • FabriceBesnardFabriceBesnard ParisPosts: 23Member

    Dear Geraldine,

    I'm having some trouble setting the path of the picard tools... I have the same issue as Nathan and Mickaël: it works if I move into the "picard" dir, but not from somewhere else.

    I read forums explaining what an environment variable is. As you suggested, I modified my .bashrc as follows:

    - I created an environment variable "picardDir": export picardDir="$HOME/picard.tools-1.110"
    - I added it to my path: export PATH="$PATH:$picardDir"

    By typing "env" I could verify that both environment variables "picardDir" and "PATH" were modified correctly.

    Then I ran:

    java -jar $picardDir/AddOrRpeplaceReadGroups.jar -h

    (tested also with quotes: java -jar "$picardDir/AddOrRpeplaceReadGroups.jar" -h)

    However, I still get the same error message:

    Error: Unable to access jarfile /home/fabrice/picard-tools-1.110/AddOrRpeplaceReadGroups.jar

    And if I type the absolute path in the command rather than my "picardDir" environment variable:

    java -jar /home/fabrice/picard-tools-1.110/AddOrRpeplaceReadGroups.jar -h

    I get the same error message.

    So would you know what I am doing wrong ?

    Thanks a lot for your help, Fabrice

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,423Administrator, GATK Developer admin

    Hi Fabrice,

    There's a typo in the name of the program you're trying to call... an extra 'p' after the 'R'

    Geraldine Van der Auwera, PhD

  • FabriceBesnardFabriceBesnard ParisPosts: 23Member

    Hey,

    I am trying to install GATK and all required packages on MacOS 10.8.5. I installed Xtools (+ command lines) 5.1.

    I successfully installed bwa and samtools, but I failed compiling htslib-master. I get this error message:

    5 warnings generated.
    gcc -c -g -Wall -Wc++-compat -O2 -Ihtslib vcfnorm.c -o vcfnorm.o
    gcc -c -g -Wall -Wc++-compat -O2 -Ihtslib vcfgtcheck.c -o vcfgtcheck.o
    gcc -g -Wall -Wc++-compat -O2 -o htscmd main.o samview.o vcfview.o bamidx.o bcfidx.o bamshuf.o bam2fq.o tabix.o abreak.o bam2bed.o vcfcheck.o vcfisec.o vcfmerge.o vcfquery.o vcffilter.o vcfnorm.o vcfgtcheck.o -Lhtslib -lhts -lpthread -lz -lm
    Undefined symbols for architecture x86_64:
      "_bcf_gt_type", referenced from:
          _main_vcfcheck in vcfcheck.o
    ld: symbol(s) not found for architecture x86_64
    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    make: *** [htscmd] Error 1

    Would you have any idea how to fix this issue?

    Thanks a lot, Fabrice

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,423Administrator, GATK Developer admin

    Hi Fabrice,

    I'm not sure -- htslib is under active development and is maintained by others (not us) so I can't really comment on compilation issues. You'd have to ask the developers of htslib for support.

    However I can tell you that htslib is not really required in order to run GATK, so you can skip it unless you need the ability to revert a bam file to FastQ (which is what we use it for in the tutorial).

    Geraldine Van der Auwera, PhD

  • FabriceBesnardFabriceBesnard ParisPosts: 23Member

    Ok thanks Geraldine for your answer, I haven't fixed the issue so I am running GATK without htslib and that's fine for what I want to do. However, I will let you know if ever I find a fix to this issue...

  • mayaabmayaab IsraelPosts: 24Member ✭✭

    Hello, I'm running a virtual machine, and get the following error while compiling bwa: fatal error: stdio.h: no such file or directory

    the command: g++ -v returns an error

    I guess I should install g++, but can't find something on the web to help me with this. do you know how can I install it?

    Maya

  • SheilaSheila Broad InstitutePosts: 540Member, GATK Developer, Broadie, Moderator admin
    edited June 24

    @mayaab

    Hi Maya,

    Unfortunately, this is not something we can help with.

    -Sheila

  • shangzhong0619shangzhong0619 La JollaPosts: 3Member

    I just have a comment on the part 7. R packages. It seems that ggplot2 and gsalib are not enough. I installed another package called 'reshape', then it worked for generating the figures in the BQSR step.

  • engrasif09engrasif09 PakistanPosts: 1Member

    Absolutely amazing. Going great till now. Will seek your help if required!
