Installing Linux tools
Introduction
OK. So you just read the latest issue of Bioinformatics (or did a Google search) and have discovered some new pieces of software that promise to slice and dice your data in new, interesting, and useful ways. Most often, these tools will be designed to run in a Linux environment. Unfortunately, the helpful support staff at TACC may not have had time to test these tools and make a proper module out of them (or maybe they didn't want to make 1,000+ modules for every piece of bioinformatics software out there). Perhaps there is a TACC module, but it was made a month or two back when the software was at version 1.01 and now it's at version 1.03, which has a bug fix or some nifty new bell and whistle.
The bottom line is that you are going to find yourself in a situation where module spider comes up empty and you're on your own installing a piece of software that you are dying to try out on TACC.
Unfortunately, there is no double-click installer for TACC. Fortunately, a majority of the better and more mature programs out there (but by no means all bioinformatics software) can be fairly easily installed. If these instructions fail, you might need to find your nearest Linux guru. Or, you might try to consult Google and tinker with things a bit.
The overall steps for installing a program on a Linux system are:
- Download the executable or source code
- Compile or make the project (if installing from source code)
- Set up your $PATH to find the new executable
Note: Most Linux installs will work similarly on Mac OS X, with just a few additional preliminaries (install Xcode, maybe some extra libraries, etc.). With more work, it is possible to set up a Linux-like environment in Windows as well. Both of these topics are outside the scope of what we are going to cover here.
Case 1: Installing a precompiled binary (executable)
For programs that are already compiled (converted from high-level source code in a language like C into machine-specific code), you are often offered several downloads and need to pick the one built for the CPU architecture of your machine.
You can get your CPU architecture with this command:
login1$ arch
Output might be something like i386 (for my MacBook) or x86_64 (for Lonestar).
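As a rough sketch of how that output maps onto a download choice, the snippet below classifies an architecture string. The function name pick_build and the labels "64-bit" and "32-bit" are made up for illustration; real download pages use their own naming, so treat this as a reading aid rather than a rule.

```shell
# pick_build is a hypothetical helper: it maps an architecture string
# (as printed by `arch` or `uname -m`) onto a generic build label.
pick_build() {
    case "$1" in
        x86_64|amd64)        echo "64-bit" ;;
        i386|i486|i586|i686) echo "32-bit" ;;
        *)                   echo "unknown" ;;
    esac
}

# Classify the machine you are logged in to.
pick_build "$(uname -m)"
```

If the result is "unknown", check the download page for a build that names your architecture explicitly.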
Example: Install SSAHA2 precompiled binary
The website for the SSAHA2 read mapper has links to download executables compiled for several different architectures. Using commands that you have learned in earlier lessons, download the correct one to Lonestar and place it under the directory $HOME/local/bin.
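The file-handling part of this step might look like the sketch below. To keep it harmless to run, it works in a scratch directory (demo_home, a stand-in for $HOME) and fabricates a tiny script in place of the real download; both are assumptions for illustration only.

```shell
# Stand-in for $HOME so the sketch does not touch your real files.
demo_home="$(mktemp -d)"

# Pretend this small script is the binary you just downloaded.
download="$demo_home/ssaha2"
printf '#!/bin/sh\necho "stand-in for ssaha2"\n' > "$download"

mkdir -p "$demo_home/local/bin"            # create the install directory if needed
mv "$download" "$demo_home/local/bin/"     # move the download into place
chmod +x "$demo_home/local/bin/ssaha2"     # make sure it is marked executable

# Run it by its full path to confirm that it works.
"$demo_home/local/bin/ssaha2"
```

With a real download you would use $HOME in place of demo_home. The chmod step matters because downloaded files often arrive without the executable bit set.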
How the shell finds executables: $PATH
Now, you might want to tell your login shell that it should look for executable files in this new directory $HOME/local/bin. This will allow you to use the executable as a one-word command like you are used to:
login1$ ssaha2
Instead of writing out the entire path to the executable to run it, like in one of these examples:
login1$ /home1/01502/jbarrick/local/bin/ssaha2
login1$ $HOME/local/bin/ssaha2
Assuming you are using the bash shell, you can do this by editing your $HOME/.profile or $HOME/.profile_user configuration file. These files are basically just bash scripts that are run whenever you log in. You want to add a line that looks like this to the top of $HOME/.profile:
export PATH="$HOME/local/bin:$PATH"
This sets the environment variable PATH to its old value with your new directory prepended (the : separates multiple paths). This means the shell will look for executables in this new location first, then in all of the standard locations after that. For more information on environment variables see the Bash Beginner's Guide.
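One optional refinement, sketched here, is to add the directory only when it is not already listed, so that re-reading the configuration file does not keep lengthening PATH. The helper name prepend_path is hypothetical and not something TACC provides.

```shell
# prepend_path is a hypothetical helper: it puts a directory at the front
# of PATH unless that directory is already listed somewhere in it.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present: leave PATH alone
        *) PATH="$1:$PATH" ;;        # otherwise put it in front
    esac
}

prepend_path "$HOME/local/bin"
export PATH
```

Wrapping PATH in colons before matching lets one pattern catch the directory whether it sits at the start, middle, or end of the list.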
Important! In order to have this change take effect, you must log out and log in again to force the shell to re-read the $HOME/.profile_user file. Alternatively, you can use either of these commands to re-read it at any time:
login1$ . $HOME/.profile_user
login1$ source $HOME/.profile_user
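To see what re-reading a file does without touching your real configuration, here is a small sketch that uses a throwaway file. The file and the variable DEMO_TOOL_DIR are invented for the demonstration.

```shell
# Write a one-line stand-in for a profile file to a temporary location.
demo_profile="$(mktemp)"
echo 'export DEMO_TOOL_DIR="$HOME/local/bin"' > "$demo_profile"

# "." (or "source" in bash) runs the file inside the current shell,
# so the variable it sets is still defined afterwards.
. "$demo_profile"
echo "$DEMO_TOOL_DIR"

rm -f "$demo_profile"
```

Running the file as a separate script instead would set the variable only in a child process, which is why sourcing is needed here.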
If your path is not working or you're curious about where else your shell is looking for commands and the order, then you might want to see the value of your $PATH.
login1$ echo $PATH
login1$ env
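A long PATH is easier to read with one directory per line, in search order. A minimal sketch follows; the function name list_path is made up here.

```shell
# list_path is a hypothetical helper: it prints each entry of a
# colon-separated list on its own line. With no argument it uses $PATH.
list_path() {
    printf '%s\n' "${1:-$PATH}" | tr ':' '\n'
}

list_path
```

The first line printed is the first place the shell looks, so your $HOME/local/bin should appear at or near the top.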
Warning! If you forget to include $PATH on the right side in the above example, then you will tell your shell to not look in the usual places for executables any more. This means that ls, cd, and other common commands will no longer work without typing out their whole paths, e.g. /bin/ls. This can be extremely confusing!
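The failure can be reproduced safely inside a subshell, as in the sketch below. The directory /nonexistent/local/bin and the function name broken_path_demo are placeholders, and the sketch assumes ls lives at /bin/ls as in the text above.

```shell
# broken_path_demo is a hypothetical helper. Its body is wrapped in
# parentheses, so it runs in a subshell and the bad PATH is discarded
# as soon as the function finishes.
broken_path_demo() (
    PATH="/nonexistent/local/bin"        # the mistake: ":$PATH" left off
    ls / >/dev/null 2>&1 || echo "ls: not found by name"
    /bin/ls / >/dev/null 2>&1 && echo "/bin/ls: still works by full path"
)

broken_path_demo
```

If this happens in your real login shell, fix the line in your configuration file and log in again.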
Handling multiple versions
If you install for yourself a newer version of a command that is already available on TACC, then you might get confused about which version you are running when you type the command. You can see the whole path to the executable that will be run when you type a one-word command using the which command.
login1$ which ssaha2
Many tools will also have a -v or --version flag, or output their version information in a header when they are run. This can help you be sure that you are running the version that you think you are.
login1$ ssaha2 -v
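When several copies of a tool are installed, it can help to list every copy on your PATH rather than only the first. The sketch below does that with a hypothetical helper named find_all; it uses ls as the example because ssaha2 may not be installed where you try it.

```shell
# find_all is a hypothetical helper: it prints every executable named $1
# found on $PATH, in the order the shell would search.
find_all() {
    old_ifs="$IFS"
    IFS=":"                       # split $PATH on colons
    for dir in $PATH; do
        if [ -x "$dir/$1" ] && [ ! -d "$dir/$1" ]; then
            echo "$dir/$1"
        fi
    done
    IFS="$old_ifs"                # restore normal word splitting
}

find_all ls    # substitute the tool you installed, e.g. ssaha2
```

The first path listed is the copy that runs when you type the bare command; any later ones are shadowed by it.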
Case 2: Install from the source code
Note on TACC compilers
There are multiple compilers available on TACC:
- intel or icc - the default compiler. Preferred for optimizing speed of compiled executables.
- gcc - the GNU compiler collection. Tends to be more compatible.
Be aware that if you compile libraries and then programs that link to them, you generally must compile all of the components with the same compiler.
If you run into an error during compilation, try the gcc compiler by loading its module. You may get a message like this:
login1$ module load gcc
Error: You can only have one compiler module loaded at time.
You already have intel loaded.
To correct the situation, please enter the following command:
module swap intel gcc/4.4.5
Please submit a consulting ticket if you require additional assistance.
So, follow the directions:
login1$ module swap intel gcc/4.4.5
You will need to do this to get breseq to compile in the next example.
Example: Install breseq from a source code archive
breseq is a tool developed by the Barrick lab. You might use it in a later lesson. It is a good example of a tool that can be downloaded and compiled.
breseq web page
breseq download page
breseq uses the common GNU build system install sequence. If you install other GNU tools then the same ./configure; make; make install command sequence will often be used.
login1$ cdw
login1$ wget http://breseq.googlecode.com/files/breseq-0.19.tar.gz
login1$ tar -xvzf breseq-0.19.tar.gz
login1$ cd breseq-0.19
login1$ ./configure --prefix=$HOME/local
login1$ make
login1$ make install
The extra option --prefix to ./configure sets where the executable and any other files associated with the program will be installed. If you leave off this flag, then it will try to install them in a system-wide location. You must have administrator privileges to do this and would generally have to substitute sudo make install for the last step to get this to work. That won't work on TACC! (sudo means "super-user do".)
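If you find yourself repeating this sequence for several tools, it can be wrapped in a small function. The sketch below is one possible shape, not part of breseq or TACC: the name install_from_tarball is hypothetical, and it assumes the archive is a gzipped tarball that unpacks into a single top-level directory containing a configure script.

```shell
# install_from_tarball ARCHIVE [PREFIX]   (hypothetical helper)
# Unpacks ARCHIVE in the current directory, then runs the usual
# configure / make / make install sequence with --prefix set to PREFIX.
install_from_tarball() {
    archive="$1"
    prefix="${2:-$HOME/local}"    # default to the directory used above

    [ -f "$archive" ] || { echo "no such archive: $archive" >&2; return 1; }

    # Name of the top-level directory in the archive, e.g. breseq-0.19
    srcdir="$(tar -tzf "$archive" | head -n 1 | cut -d/ -f1)"
    [ -n "$srcdir" ] || return 1

    tar -xzf "$archive" || return 1

    # Build in a subshell so your working directory is unchanged afterwards;
    # the && chain stops at the first step that fails.
    (
        cd "$srcdir" &&
        ./configure --prefix="$prefix" &&
        make &&
        make install
    )
}
```

For the example above, the call would be install_from_tarball breseq-0.19.tar.gz "$HOME/local" after downloading the archive.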
For some other tools, the instructions may tell you to skip straight to make, or you might also have to install some other programs or libraries that the tool you want to use needs to run. Generally, you can find this information in the online documentation or an INSTALL file in the root of the downloaded code.
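As a rough first check before reading the documentation, you can look at which build files a source directory contains. The helper below, guess_build_steps, is a made-up name and only a heuristic; the tool's own instructions always take precedence.

```shell
# guess_build_steps [DIR]   (hypothetical helper)
# Suggests a likely build sequence based on the files present in DIR.
guess_build_steps() {
    dir="${1:-.}"
    if [ -x "$dir/configure" ]; then
        echo "./configure --prefix=\$HOME/local && make && make install"
    elif [ -f "$dir/Makefile" ] || [ -f "$dir/makefile" ]; then
        echo "make"
    else
        echo "read README or INSTALL"
    fi
}

guess_build_steps .
```

A Makefile without a configure script usually corresponds to the "skip straight to make" case described above.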
More Examples
Example: Install the latest version of Bowtie2
There is a newer version of Bowtie2 available than the one loaded into a module on TACC. You might want to use it because it includes some new bug fixes. You can download either a source code version to compile using the above instructions or a binary version of bowtie2. Try to get this running on your own.
Other Cases
In other lessons we'll cover various deviations and elaborations on these two procedures in order to install specific programs, R modules, Perl modules, Python modules, etc.