Packages and file system
Packages
You can think of a package as a collection of functions. There are three main sources of packages: CRAN (Comprehensive R Archive Network), Bioconductor, and GitHub.
The base installation of R comes with many useful packages as standard. These packages will contain many of the functions you will use on a daily basis. However, as you start using R for more diverse projects (and as your own use of R evolves) you will find that there comes a time when you will need to extend R’s capabilities.
Packages are installed in a different way from each source. Most of the packages we need are already present in your RStudio environment.
CRAN packages
To install a package from CRAN you can use the install.packages() function. For example, if you want to install the remotes package, enter the following code into the Console window of RStudio (note: you will need a working internet connection to do this):
install.packages('remotes', dependencies = TRUE)
You may be asked to select a CRAN mirror. If so, select ‘0-Cloud’ or a mirror near your location. The dependencies = TRUE argument ensures that additional packages that are required will also be installed.
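A common pattern is to install a package only when it is missing, so that re-running a script does not download it again. The sketch below is one way to do this. The helper name missing_packages() is our own invention, not part of R, and the install line is left commented out because it needs an internet connection.

```r
# Hypothetical helper: return the names in `pkgs` that are not installed.
# requireNamespace() returns TRUE/FALSE and, as a side effect, loads the
# namespace of any package it finds (without attaching it).
missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

to_install <- missing_packages(c("stats", "remotes"))
to_install  # "stats" ships with R, so it never appears here

# Install whatever is missing (uncomment to run; needs internet access)
# if (length(to_install) > 0) install.packages(to_install, dependencies = TRUE)
```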
It’s good practice to occasionally update your previously installed packages to get access to new functionality and bug fixes. To update CRAN packages you can use the update.packages() function.
update.packages(ask = FALSE)
The ask = FALSE argument avoids having to confirm every package download.
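Before or after updating, it can help to check which version of a package you actually have. Here is a minimal sketch using packageVersion() from the built-in utils package. We query the stats package only because it is always installed, so substitute any package name you care about.

```r
# Look up the installed version of a package (here the built-in stats package)
v <- packageVersion("stats")
v

# Versions can be compared directly against a version string
v >= "3.0.0"

# old.packages() lists installed packages that have a newer version available
# (uncomment to run; needs internet access)
# old.packages()
```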
Bioconductor packages
To install packages from Bioconductor the process is a little different. You first need to install the BiocManager package. You only need to do this once unless you subsequently reinstall or upgrade R.
install.packages('BiocManager', dependencies = TRUE)
Once the BiocManager package has been installed you can either install all of the ‘core’ Bioconductor packages with
BiocManager::install()
or install specific packages such as the ‘GenomicRanges’ and ‘edgeR’ packages.
BiocManager::install(c("GenomicRanges", "edgeR"))
To update Bioconductor packages, just use the BiocManager::install() function again.
BiocManager::install(ask = FALSE)
Again, you can use the ask = FALSE argument to avoid having to confirm every package download.
GitHub packages
There are multiple options for installing packages hosted on GitHub. Perhaps the most efficient method is to use the install_github() function from the remotes package (you installed this package previously). Before you use the function you will need to know the GitHub username of the repository owner and also the name of the repository. For example, the development version of dplyr from Hadley Wickham is hosted on the tidyverse GitHub account and has the repository name ‘dplyr’ (just Google ‘github dplyr’). To install this version from GitHub, use
remotes::install_github('tidyverse/dplyr')
The safest way (that we know of) to update a package installed from GitHub is to just reinstall it using the above command.
Using packages
Once you have installed a package onto your computer it is not immediately available for you to use. To use a package you first need to load it with the library() function. For example, to load the remotes package you previously installed:
library(remotes)
You will need to load the packages you will be using every time you start a new RStudio session. Place all library() statements required for analysis near the top of your .R or .Rmd file to make them easily accessible and obvious. If you try to use a function from a package without first loading that package, you will receive an error message.
Try the following in a new session, before running library(remotes):
install_github('tidyverse/dplyr')
Sometimes it can be useful to use a function without first using the library() function. If, for example, you will only be using one or two functions in your script and don’t want to load all of the other functions in a package, then you can access the function directly by specifying the package name followed by two colons and then the function name. This is why we used the following command above to install from GitHub:
remotes::install_github('tidyverse/dplyr')
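To see the difference between the two approaches without installing anything, the sketch below uses file_ext() from the tools package, which ships with R but is usually not loaded in a fresh session. The file name is only an illustration.

```r
# Call a function directly with package::function, without library()
ext <- tools::file_ext("raw_data/mydata.txt")
ext  # "txt"

# After library(), the same function can be called by its bare name
library(tools)
file_ext("raw_data/mydata.txt")
```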
This may seem a bit confusing at first, but managing packages will become second nature after using R for a short while. In addition, if you are following a manual or a vignette (a short tutorial) for a specific tool, the packages will all be listed at the top, and there is no extra knowledge needed about packages in order to get going with analysis (in most cases).
File Orientation
Working directories
The working directory is the default location where R will look for files you want to load and where it will put any files you save. You can check the file path of your working directory by looking at the bar at the top of the Console pane.
If we were using a project, the home directory would be associated with the project name. We will see that later in the class. For now, I'm not going to have you work with projects, since we are just getting oriented to the RStudio IDE. In the meantime, you can print the working directory with getwd(), change it with setwd(), and then read files using paths relative to it:
getwd()
setwd("/Users/nhy163/Documents/Alex/Teaching/first_project/")
dataf <- read.table("raw_data/mydata.txt", header = TRUE, sep = "\t")
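The path in setwd() above only exists on one particular computer, so here is a self-contained sketch of the same idea that should run anywhere. It builds a throwaway folder inside R's temporary directory, writes a tiny tab-separated file there, and reads it back with a relative path. The folder name first_project_demo and the toy data are made up for illustration.

```r
# Remember where we started so we can come back at the end
start_dir <- getwd()

# Build a throwaway project folder with a raw_data subfolder
demo_dir <- file.path(tempdir(), "first_project_demo")  # hypothetical name
dir.create(file.path(demo_dir, "raw_data"), recursive = TRUE, showWarnings = FALSE)

# Write a tiny tab-separated file to stand in for raw_data/mydata.txt
toy <- data.frame(sample = c("a", "b"), count = c(10, 20))
write.table(toy, file.path(demo_dir, "raw_data", "mydata.txt"),
            sep = "\t", row.names = FALSE, quote = FALSE)

# Relative paths are resolved against the working directory
setwd(demo_dir)
dataf <- read.table("raw_data/mydata.txt", header = TRUE, sep = "\t")
dataf

# Return to the original working directory
setwd(start_dir)
```

Building paths with file.path() rather than pasting strings together keeps the separators consistent across operating systems.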
Here are some examples of ways that R can interact with the underlying file system.
# list files in the current directory
list.files()
# create a directory and move into it
dir.create('test_dir')
setwd('test_dir')
# return to the parent directory
setwd('..')
# send commands to the terminal (mkdir and touch are Unix commands, so this part assumes macOS or Linux)
cmd1 <- 'mkdir another_test_dir'
cmd2 <- 'touch another_test_dir/a_test_file'
system(cmd1); system(cmd2)
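The system() calls above depend on commands that exist on macOS and Linux. As a rough alternative, the same steps can be done with R's own file functions, which should also work on Windows. This sketch works inside R's temporary directory so it leaves nothing behind.

```r
# Do everything inside R's per-session temporary directory
new_dir <- file.path(tempdir(), "another_test_dir")
new_file <- file.path(new_dir, "a_test_file")

dir.create(new_dir, showWarnings = FALSE)  # like mkdir
file.create(new_file)                      # like touch, but it empties a file that already exists

created <- file.exists(new_file)
created
list.files(new_dir)

# Clean up: remove the directory and everything in it
unlink(new_dir, recursive = TRUE)
removed <- !dir.exists(new_dir)
removed
```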
Naming R files
Although there’s not really a recognised standard approach to naming files, there are a couple of things to bear in mind.
First, avoid using spaces in file names by replacing them with underscores or hyphens. One reason is that some command line software (especially many bioinformatic tools) won’t recognise a file name that contains a space, which creates numerous problems. For the same reason, also avoid using special characters (e.g. @ £ $ % ^ & * ( : /) in your file names.
Never use the word final in any file name - it never is!
Whatever file naming convention you decide to use, try to adopt it early, stick with it and be consistent.
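If you inherit file names that contain spaces or special characters, the advice above can be applied with a small helper. This is only a sketch: clean_file_name() is a hypothetical function of our own, and the set of characters it keeps (letters, digits, dot, underscore, hyphen) is one reasonable choice rather than a standard.

```r
# Hypothetical helper: replace each run of problem characters with one underscore
clean_file_name <- function(x) {
  gsub("[^A-Za-z0-9._-]+", "_", x)
}

clean_file_name("raw data#2.txt")  # "raw_data_2.txt"
```

Because gsub() is vectorised, the same function can be applied to the whole output of list.files() before renaming with file.rename().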