...
But we will not be using R from the terminal; we will be using the RStudio IDE.
RStudio Setup
You should have RStudio up and running in a browser. If not, go to the home page for the course, and follow the instructions.
Orientation
The large window (aka pane) on the left is the Console window. If you open or create a file, it is placed in the Source window, generally to the top left. The window on the top right is the Environments window (Environment / History / Connections) and the bottom right window is the Output window (Files / Plots / Packages / Help / Viewer). We will discuss each of these panes in turn below.
You can customise the location of each pane by clicking on the ‘Tools’ menu then selecting Global Options –> Pane Layout. You can resize the panes by clicking and dragging the middle of the window borders in the direction you want.
| Info |
|---|
To set up RStudio to run the current line by pressing < |
Console
The Console is where R evaluates all the code you write. You can type R code directly into the Console at the command line prompt, >.
Environment/History/Connections
The Environment / History / Connections window shows you lots of useful information. You can access each component by clicking on the appropriate tab in the pane.
· The ‘Environment’ tab displays all the objects you have created in the current (global) environment. Drop-down/list
· The ‘History’ tab contains a list of all the commands you have entered into the R Console.
· The ‘Connections’ tab allows you to connect to various data sources such as external databases.
Files/Plots/Packages/Help/Viewer
· The ‘Files’ tab lists all external files and directories in the current working directory on your computer. It works like file explorer (Windows) or Finder (Mac). You can open, copy, rename, move and delete files listed in the window.
· The ‘Plots’ tab is where all the plots you create in R are displayed (unless you tell R otherwise). Zoom/scroll/export
· The ‘Packages’ tab lists all of the packages that you have installed on your computer. You can also install new packages and update existing packages by clicking on the ‘Install’ and ‘Update’ buttons respectively.
· The ‘Help’ tab displays the R help documentation for any function.
Packages
You can think of a package as a collection of functions. There are three main sources of packages: CRAN (Comprehensive R Archive Network), Bioconductor, and GitHub.
The base installation of R comes with many useful packages as standard. These packages will contain many of the functions you will use on a daily basis. However, as you start using R for more diverse projects (and as your own use of R evolves) you will find that there comes a time when you will need to extend R’s capabilities.
There are different ways to install packages from each source. Most of the packages we need are already present in your RStudio environment.
CRAN packages
To install a package from CRAN you can use the install.packages() function. For example if you want to install the remotes package enter the following code into the Console window of RStudio (note: you will need a working internet connection to do this)
| Code Block |
|---|
install.packages('remotes', dependencies = TRUE) |
You may be asked to select a CRAN mirror; just select ‘0-Cloud’ or a mirror near your location. The dependencies = TRUE argument ensures that additional packages that are required will also be installed.
It’s good practice to occasionally update your previously installed packages to get access to new functionality and bug fixes. To update CRAN packages you can use the update.packages() function.
| Code Block |
|---|
update.packages(ask = FALSE) |
The ask = FALSE argument avoids having to confirm every package download.
Bioconductor packages
To install packages from Bioconductor the process is a little different. You first need to install the BiocManager package. You only need to do this once unless you subsequently reinstall or upgrade R.
| Code Block |
|---|
install.packages('BiocManager', dependencies = TRUE) |
Once the BiocManager package has been installed you can either install all of the ‘core’ Bioconductor packages with
| Code Block |
|---|
BiocManager::install() |
or install specific packages such as the ‘GenomicRanges’ and ‘edgeR’ packages.
| Code Block |
|---|
BiocManager::install(c("GenomicRanges", "edgeR")) |
To update Bioconductor packages just use the BiocManager::install() function again.
| Code Block |
|---|
BiocManager::install(ask = FALSE) |
Again, you can use the ask = FALSE argument to avoid having to confirm every package download.
GitHub packages
There are multiple options for installing packages hosted on GitHub. Perhaps the most efficient method is to use the install_github() function from the remotes package (you installed this package previously). Before you use the function you will need to know the GitHub username of the repository owner and also the name of the repository. For example, the development version of dplyr from Hadley Wickham is hosted on the tidyverse GitHub account and has the repository name ‘dplyr’ (just Google ‘github dplyr’). To install this version from GitHub, use
| Code Block |
|---|
remotes::install_github('tidyverse/dplyr') |
The safest way (that we know of) to update a package installed from GitHub is to just reinstall it using the above command.
Using packages
Once you have installed a package onto your computer it is not immediately available for you to use. To use a package you first need to load the package by using the library() function. For example, to load the remotes package you previously installed:
| Code Block |
|---|
library(remotes) |
You will need to load the packages you will be using every time you start a new RStudio session. Place all library() statements required for analysis near the top of your .R or .Rmd file to make them easily accessible and obvious. If you try to use a specific package function without first loading the package library, you will receive an error message.
Try:
| Code Block |
|---|
install_github('tidyverse/dplyr') |
Sometimes it can be useful to use a function without first using the library() function. If, for example, you will only be using one or two functions in your script and don’t want to load all of the other functions in a package then you can access the function directly by specifying the package name followed by two colons and then the function name. This is why we used the following command above to install from GitHub:
| Code Block |
|---|
remotes::install_github('tidyverse/dplyr') |
This may seem a bit confusing at first, but managing packages will become second nature after using R for a short while. In addition, if you are following a manual or a vignette (a short tutorial) for a specific tool, the packages will all be listed at the top, and there is no extra knowledge needed about packages in order to get going with analysis (in most cases).
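As a small illustration of the two styles described above, here is a sketch using the tools package. This package is chosen only as an assumption for the example: it ships with R (so nothing needs to be installed) but is not attached by default.

```r
# tools ships with R but is not attached by default,
# so its functions are reached with the package::function form.
ext_direct <- tools::file_ext("raw_data/mydata.txt")

# After library(), the bare function name works as well.
library(tools)
ext_attached <- file_ext("raw_data/mydata.txt")

# Both calls run the same function, so the results match.
identical(ext_direct, ext_attached)
```

The package::function form also makes it obvious to a reader which package a function comes from.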
File Orientation
Working directories
The working directory is the default location where R will look for files you want to load and where it will put any files you save. You can check the file path of your working directory by looking at the bar at the top of the Console pane.
If we were using a project, the home directory would be associated with the project name. We will see that later in the class. For now, I'm not going to have you work with projects, since we are just getting oriented to the RStudio IDE.
| Code Block |
|---|
getwd()
setwd("/Users/nhy163/Documents/Alex/Teaching/first_project/")
dataf <- read.table("raw_data/mydata.txt", header = TRUE, sep = "\t")
|
Examples of ways that R can interact with the underlying file system.
| Code Block |
|---|
#to list files in current directory
list.files()
#to perform operations in the file system
dir.create('test_dir')
setwd('test_dir')
# return to parent directory
setwd('..')
#send commands to the terminal
cmd1 <- 'mkdir another_test_dir'
cmd2 <- 'touch another_test_dir/a_test_file'
system(cmd1); system(cmd2) |
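The system() calls above rely on the Unix commands mkdir and touch. As a hedged alternative, the same steps can be sketched with base R functions only, which should behave the same on any operating system. The directory and file names mirror the example above, and everything is created inside a temporary directory (an assumption made so the sketch does not touch your own files).

```r
# Work inside a temporary directory so nothing in your project is touched.
base_dir <- tempfile("fs_demo_")
dir.create(base_dir)

# file.path() builds paths with the correct separator for the operating system.
new_dir  <- file.path(base_dir, "another_test_dir")
new_file <- file.path(new_dir, "a_test_file")

dir.create(new_dir)      # base R equivalent of mkdir
file.create(new_file)    # base R equivalent of touch

list.files(base_dir, recursive = TRUE)  # show what was created
file.exists(new_file)                   # check that the file is really there
```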
Naming R files
Although there’s not really a recognised standard approach to naming files, there are a couple of things to bear in mind.
First, avoid using spaces in file names by replacing them with underscores or hyphens. One reason is that some command line software (especially many bioinformatic tools) won’t recognise a file name containing a space, which creates numerous problems. For the same reason, also avoid using special characters (e.g. @£$%^&*(:/) in your file names.
Never use the word final in any file name - it never is!
Whatever file naming convention you decide to use, adopt it early, stick with it and be consistent.
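If you inherit files with awkward names, the advice above can be applied programmatically. The sketch below uses a hypothetical helper, clean_file_name() (not part of R or of this course), that replaces every run of characters other than letters, digits, dots, underscores and hyphens with a single underscore.

```r
# Hypothetical helper: make a file name safe for command line tools.
clean_file_name <- function(x) {
  # replace each run of "unsafe" characters with one underscore
  gsub("[^A-Za-z0-9._-]+", "_", x)
}

clean_file_name("walrus sounds@2024.txt")   # "walrus_sounds_2024.txt"
```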
Coding in R
Some guidelines:
· R is case sensitive, i.e. uppercase and lowercase letters are treated as different characters.
· Anything that follows a # symbol is interpreted as a comment and ignored by R. Annotate heavily! Your future self will love you for it.
· In R, commands are generally separated by a new line. You can also use a semicolon ; to separate your commands, but this is rarely used.
· If a continuation prompt + appears in the console after you execute your code, this means that your command is incomplete. This often happens if you forget to close a bracket. Either finish the command on the new line, or hit escape on your keyboard until the console resets.
· In general, R allows extra spaces in your code; in fact, using spaces is actively encouraged. However, spaces should not be inserted into operators, i.e. <- should not read < - (note the space).
· If your console ‘hangs’ and becomes unresponsive, try pressing the escape key (esc) or clicking on the stop icon in the top right of your console. This will terminate the current operation in most cases.
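A few of these guidelines can be seen in action in the short sketch below. The object names are made up for illustration.

```r
# R is case sensitive: these are two different objects.
my_value <- 1
My_Value <- 2
my_value == My_Value   # FALSE

# Two commands on one line, separated by a semicolon (legal but rarely used).
a <- 3; b <- 4

# Extra spaces around operators are fine ...
total <- a   +   b

# ... but a space inside <- changes the meaning:
# 'total < - 7' compares total with -7 instead of assigning 7 to it.
total < - 7            # FALSE, because 7 is not less than -7
```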
R as a calculator
| Code Block |
|---|
2+2
log(1) # log base e
log2(1) # log base 2
exp(1) # e^x
sqrt(4) # square root
4^2 # 4 to the power of 2
pi # not a function but useful
17%%6 # modulo operator |
Objects in R
Objects are the central concept that unites R code. Everything in R is an object.
Objects can be, for example, a single number, a character string (like a word), a vector, a data table, or a highly complex plot or function. Understanding how you create objects and assign values to objects is key to understanding R.
To create an object we simply give the object a name. We can then assign a value to this object using the assignment operator <- (sometimes called the gets operator). The assignment operator is a composite symbol comprised of a ‘less than’ symbol < and a hyphen - .
| Code Block |
|---|
my_obj <- 5 |
In the code above, we created an object called my_obj and assigned it a value of the number 5 using the assignment operator. You can also use = instead of <- to assign values. Some people do this, but it is considered bad practice.
To view the value of the object, type the name of the object and execute it, either in the Console or from the Source window with <CTRL><ENTER>.
| Code Block |
|---|
my_obj |
| Code Block |
|---|
## [1] 5 |
Check ‘Environment’ tab for the object.
If you click on the down arrow on the ‘List’ icon in the same pane and change to ‘Grid’ view RStudio will show you a summary of the objects including the type (numeric - it’s a number), the length (only one value in this object), its ‘physical’ size and its value (5 in this case).
Naming objects
Naming your objects in R is important. Good object names should be short and informative. If you need to create objects with multiple words in their name then use either an underscore or a dot between words, or capitalise the first letter of new words. I prefer using underscores (snake case).
| Code Block |
|---|
input_argument_last <- "cell type 1" |
| Code Block |
|---|
input.argument.last <- "cell type 1" |
| Code Block |
|---|
InputArgumentLast <- "cell type 1" |
| Panel | ||||||||
|---|---|---|---|---|---|---|---|---|
| ||||||||
Code break: Create some additional objects Write some code in the Source window, and execute it < |
| Code Block |
|---|
#Examples:
num_object_1 <- 33
a_char_object <- 'don\'t think so!'
num_object_2 <- num_object_1 %% 3
num_object_3 <- num_object_1 / num_object_2
num_object_3 <- num_object_1 + a_char_object |
When learning R, understanding errors and warnings can be frustrating (what’s an argument or a ‘binary operator’ ?). To find out more information about a particular error, Google a version of the error message, e.g. ‘non-numeric argument to binary operator error + r’ .
‘Base R’ Functions
There are many functions, operators, and objects already available in R distributions. These are referred to as ‘base R’ functions, ‘base R’ operators, etc. Functions are R objects that take an argument, carry out some operations, and typically return a value.
For example, the log() call made above used the log() function. It also takes other arguments, such as the base of the logarithm that you want to use.
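As a quick illustration of passing an extra argument, the sketch below calls log() with and without its base argument, alongside the log2() and log10() shortcuts.

```r
log(8)               # natural logarithm (base e), the default
log(8, base = 2)     # 3, since 2^3 = 8
log(100, base = 10)  # 2, since 10^2 = 100
log2(8)              # shortcut for base 2
log10(100)           # shortcut for base 10
```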
To get help for a function, type a question mark before the name of the function and execute it.
| Code Block |
|---|
?mean |
Our first R-specific object, the vector.
The c() function is short for concatenate and we use it to join together a series of values and store them in a data structure called a vector.
| Code Block |
|---|
my_vec <- c(2, 3, 1, 6, 4, 3, 3, 7) |
A vector is essentially a one-dimensional container that holds a sequence of elements of the same type. Although it’s really a data structure, it's also considered a basic object because many other data structures in R consist of vectors.
Now that we’ve created a vector we can use other functions to do useful stuff with this object. For example, we can calculate the mean, variance, standard deviation and number of elements in our vector by using the mean(), var(), sd() and length() functions.
| Panel | ||||||||
|---|---|---|---|---|---|---|---|---|
| ||||||||
Code Break: Introduction to vectors Some code to play with vectors is given below. Don’t cut and paste!! Write it out in your source window on your own, and play around with the objects to become familiar. |
| Code Block |
|---|
my_vec <- c(2, 2, 3, 9, 4, 5)
typeof(my_vec)
mean(my_vec) # returns the mean of my_vec
var(my_vec) # returns the variance of my_vec
sd(my_vec) # returns the standard deviation of my_vec
length(my_vec) # returns the number of elements in my_vec
vec_mean <- mean(my_vec)
|
| Panel | ||||||||
|---|---|---|---|---|---|---|---|---|
| ||||||||
Code Break: Creating vectors with seq(), rep(), and sample() Some code for creating vectors is given below. Play with the code, or make your own and try to understand what is happening. If you want to read about the functions, type a question mark followed by the function name. |
| Code Block |
|---|
#The seq() function
seq2 <- seq(1,20)
seq3 <- seq(1,20,.1)
length(seq3)
seq4 <- c(seq(1,10),seq(20,30))
#The rep() function
seq4 <- c( rep(1, 10), rep(2,10) )
seq5 <- rep("abc", 10)
#see if you can figure out how to use the sample function
|
The next section focuses on extracting or altering elements of vectors.
Vectors and extracting elements
...
By positional indices
| Code Block |
|---|
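my_vec <- c(2, 2, 3, 9, 4, 5)
my_vec[3]               # the third element
my_vec[c(1, 5, 6, 8)]   # several elements; position 8 does not exist, so R returns NA for it
my_vec[3:8]             # a range of positions; positions 7 and 8 come back as NA
my_vec[-3:-1]           # negative indices drop elements 1 to 3 |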
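The same bracket notation can be used to alter elements, not only to extract them. A minimal sketch, reusing the my_vec values from above (the replacement values are arbitrary):

```r
my_vec <- c(2, 2, 3, 9, 4, 5)

my_vec[4] <- 500            # replace the 4th element
my_vec[c(1, 2)] <- c(0, 1)  # replace the first two elements
my_vec                      # 0 1 3 500 4 5

shorter <- my_vec[-1]       # a negative index drops that position
length(shorter)             # 5
```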
Logical operators
...
| Panel | ||||||||
|---|---|---|---|---|---|---|---|---|
| ||||||||
Code break:
|
Examples (create your own)
| Code Block |
|---|
#examples:
my_vec < 5
my_vec <= 5
my_vec1 <- c(1:10)
my_vec2 <- c(5:14)
my_vec1 < my_vec2 |
Extraction of elements using logical operators
Extraction of elements can be carried out using boolean vectors resulting from the above types of logical ‘tests’. Here is an example. Try to predict what the sub_vector will look like, before executing that line.
| Code Block |
|---|
a_vector <- c(1:3, 10:15, 5:20) # note another way to create a vector !
sub_vector <- a_vector[a_vector < 12]
sub_vector |
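Logical tests can also be combined before they are used for extraction. The sketch below reuses a_vector; the object names middle and extremes are made up for illustration. Try to predict each result before running it.

```r
a_vector <- c(1:3, 10:15, 5:20)

# Combine tests with & (and) and | (or).
middle   <- a_vector[a_vector > 5 & a_vector < 12]
extremes <- a_vector[a_vector < 3 | a_vector > 18]

# which() returns the positions where a test is TRUE.
which(a_vector == 10)

# sum() of a logical vector counts the TRUE values.
sum(a_vector > 18)
```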
Vectorization
One of the great things about R functions is that most of them are vectorized. This means that the function will operate on all elements of a vector without needing to apply the function on each element separately. For example, to multiply each element of a vector by 5 we can simply write my_vec * 5.
When you’re operating on vectors, if vectorization makes sense with the operation, then the operation will likely be carried out in a fully vectorized manner.
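A short sketch of vectorization, reusing the my_vec values from earlier. The explicit loop is included only for comparison; in practice the vectorized form is the one to use.

```r
my_vec <- c(2, 2, 3, 9, 4, 5)

my_vec * 5        # every element multiplied by 5
my_vec + my_vec   # element-wise addition of two vectors
sqrt(my_vec)      # the function is applied to each element

# The same multiplication written as an explicit loop.
times_five <- numeric(length(my_vec))
for (i in seq_along(my_vec)) {
  times_five[i] <- my_vec[i] * 5
}
identical(times_five, my_vec * 5)   # TRUE
```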
Re-Entry into R
At this point, we have come into contact with a few object types in R: numeric, character, vector.
We’ve also learned how to interact with a vector by extracting elements, using base R functions, and using logical operators.
Until now, however, we’ve been interacting with these objects without fully appreciating what these types of objects represent in the R language. This was intentional, because gaining some familiarity with objects and how they interact from a more intuitive standpoint is a great background to have for what comes next.
At this point, I would now like to take a more detailed approach to introducing more complex aspects of R, namely, a fuller range of its objects, its data types, as well as some useful packages and functions.
Objects | Data Structures |
|---|---|
|
|
We will start our coding in this section with the vector, and we will encounter all of the basic object types as individual elements of these vectors. We will then discuss lists, and move on to dataframes as quickly as possible, since it is the dataframe where the power of R is best leveraged.
There are four primary types of atomic vectors: logical, integer, double, and character.
Have a go at creating some of them.
Code:
| Code Block |
|---|
num_1 <- as.integer(c(1:10))
num_2 <- as.double(c(1:10))
char_1 <- 'coding region'
boolean_1 <- c(T,F,F) |
| Note |
|---|
Warning: Recycling We saw how boolean vectors can interact with numeric vectors, for example. Try indexing the above num_1 vector with boolean_1. Figure out what is going on when the boolean vector is shorter than the vector it is indexing. |
| Expand | ||
|---|---|---|
| ||
Recycling! The boolean vector is recycled as many times as needed to reach the end of the vector it is indexing |
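The recycling behaviour can be checked directly. In this sketch, num_1 and boolean_1 are the objects from the code block above; the arithmetic example at the end is an added illustration.

```r
num_1     <- as.integer(c(1:10))
boolean_1 <- c(TRUE, FALSE, FALSE)

# boolean_1 is reused as T F F T F F T F F T, so the 1st, 4th, 7th
# and 10th elements are kept.
num_1[boolean_1]    # 1 4 7 10

# Recycling also applies to arithmetic. R only warns when the longer
# length is not a multiple of the shorter one.
1:6 + c(10, 20)     # 11 22 13 24 15 26
```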
Coercion
Even though R’s vectors have a specific type, it’s quite easy to convert them to another type. This is called coercion. As a language for data analysis, this flexibility works mostly to our advantage. It’s why we generally don’t stress out over integer versus double in R. It’s why we can compute a proportion as the mean of a logical vector (we exploit automatic coercion to integer in this case). But unexpected coercion is a rich source of programming puzzles, so always consider this possibility when coding.
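The proportion trick mentioned above can be sketched as follows. The values are made up; the point is that TRUE is treated as 1 and FALSE as 0 when a logical vector is passed to sum() or mean().

```r
values <- c(4, 8, 15, 16, 23, 42)

above_10 <- values > 10   # logical vector
sum(above_10)             # number of values above 10 (4)
mean(above_10)            # proportion of values above 10 (4 out of 6)
```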
For explicit coercion, use the as.*() functions.
| Code Block | ||
|---|---|---|
| ||
v_log <- c(T,F,F,T)
as.integer(v_log)
as.numeric(v_log)
as.character(v_log) |
Note: logical vectors can be expressed equally well by the integers 0 for FALSE and 1 for TRUE.
| Code Block |
|---|
as_logical <- as.logical(c(1,1,0,0,0))
as_logical
|
| Note |
|---|
Warning: implicit coercion Coercion can also be triggered by other actions, such as assigning a scalar of the wrong type into an existing vector. |
| Code Block | ||
|---|---|---|
| ||
num_vect <- c(1:10)
num_vect <- c(num_vect, 'string')
num_vect |
Our numeric vector was silently coerced to character. Notice that R did this quietly, without warning. Always pay attention to the question: Is this object of the type I think it is? How sure am I about that?
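One way to answer those questions is to ask R directly. The sketch below checks the type of the coerced vector and shows what happens when it is converted back; suppressWarnings() is used here only to keep the output tidy, and the name recovered is made up.

```r
num_vect <- c(1:10, "string")

typeof(num_vect)       # "character"
is.numeric(num_vect)   # FALSE

# Converting back: entries that are not numbers become NA (normally with a warning).
recovered <- suppressWarnings(as.numeric(num_vect))
recovered
sum(is.na(recovered))  # 1
```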
Some loose ends: NA, NaN, Inf, -Inf
NA is the object that R uses to indicate ‘not available’, or ‘missing’. It will incorporate itself into a vector and take the same data type as the elements around it. Therefore, it is necessary to know for certain if the data contains NAs by testing using the function is.na().
| Code Block |
|---|
a_vect <- c(1,-3,5,NA,7, NA)
typeof(a_vect[4])
is.na(a_vect)
sum(is.na(a_vect)) |
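Once you know a vector contains NAs, you usually need to decide how to handle them. A minimal sketch using the same a_vect: many summary functions accept na.rm = TRUE, and is.na() can be used to filter.

```r
a_vect <- c(1, -3, 5, NA, 7, NA)

mean(a_vect)                 # NA: a missing value makes the result unknown
mean(a_vect, na.rm = TRUE)   # 2.5: mean of the four observed values
a_vect[!is.na(a_vect)]       # keep only the non-missing elements
```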
NaN is the object produced when there is no possible mathematical result for an operation, for example, when dividing 0 by 0, or when taking the square root of a negative number.
| Code Block |
|---|
a_vect/0
sqrt(a_vect)
is.nan(sqrt(a_vect)) |
We can see above how Inf is produced when dividing a non-zero number by zero. The common way that -Inf is produced is when taking the logarithm of zero.
| Code Block |
|---|
log(rep(0, 5)) |
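R provides a test function for each of these special values. The sketch below builds one example of each (the vector x is made up for illustration) and shows which tests respond to which value.

```r
x <- c(1, 0, -1, NA) / 0   # Inf, NaN, -Inf, NA

is.finite(x)     # FALSE for all four
is.infinite(x)   # TRUE only for Inf and -Inf
is.nan(x)        # TRUE only for NaN
is.na(x)         # TRUE for both NaN and NA
```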
Lists
Lists are needed for holding objects that violate the constraints imposed by an atomic vector, in other words, if one or both of the following is true:
· An individual element to be stored has length greater than 1.
· The individual objects to be collected together are not of the same type.
Lists are a more general form of a vector. Whereas a vector must contain elements that are all the same object type, the elements of a list may be various types. Lists are also commonly created while providing names for each element.
Three common ways to create a list from scratch:
Provide the element names together with the elements of the list
| Code Block |
|---|
list1 <- list(element1 = 1, element2 = c(2,3,4), element3 = T) |
Create a list and then provide the names as a vector.
| Code Block |
|---|
list1 <- list(1, c(2,3,4), T)
names(list1) <- c("element1", "element2", "element3") |
Convert some other object to a list
| Code Block |
|---|
a_vect <- c(11:20)
a_list <- as.list(a_vect) |
Indexing a list:
With vectors, we saw that there was one way of indexing elements: using brackets.
With lists, the elements can be indexed in three ways, all of which are important to understand.
1. Single brackets (returns a list of the selected element or elements)
| Code Block |
|---|
list1 <- list(name1 = 1, name2 = c(2,3,4), name3 = T, name4 = NA)
list1[2] |
2. Double brackets (returns the element indexed)
| Code Block |
|---|
list1 <- list(element1 = 1, element2 = c(2,3,4), element3 = T)
list1[[2]] |
3. Using $<element name>
| Code Block |
|---|
list1$element2 |
| Panel | ||||||||
|---|---|---|---|---|---|---|---|---|
| ||||||||
Question: how do we identify a list from what R outputs to the console in the above 3 cases. Hint: In 2. above, use |
| Expand | ||
|---|---|---|
| ||
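When R prints a list, each element is introduced by its name (for example $element2) or, if the list has no names, by its position in double brackets (for example [[2]]). |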
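Besides reading the printed output, you can ask R what each form of indexing returns. A small sketch using class() on the three forms described above:

```r
list1 <- list(element1 = 1, element2 = c(2, 3, 4), element3 = TRUE)

class(list1[2])          # "list": single brackets keep the list wrapper
class(list1[[2]])        # "numeric": double brackets return the element itself
class(list1$element2)    # "numeric": same element, selected by name

# Because [[ ]] and $ return the vector itself, it can be used directly.
mean(list1[["element2"]])   # 3
```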
By the way, a vector can also have element names! However, note the format when R prints out the object.
| Code Block |
|---|
a_named_vect <- c(5:10)
names(a_named_vect) <- letters[1:6]
a_named_vect |
Dataframes
R is designed for handling data in the form of large tables, so this section is where we’ll see the full power of R for data analysis. The underlying structure of a dataframe is just a list of vectors of equal length, so we already understand much about how to handle them. However, we are going to need a few tools in order to work effectively with dataframes.
stringr: Library containing many functions for string operations.
| Code Block |
|---|
library(stringr)
str <- c("Hello", "hello there", "hi", "ahoy", "nope")
str_detect(str, 'he') #tests whether elements contain a pattern
str_split_i(str, 'll', 1) #splits each element at a pattern,and selects the first split
str_split_i(str, 'll', 2) #splits each element at a pattern,and selects the second split
str_c( rep('vector', 10), rep('_', 10), c(1:10) ) # pastes together strings |
table(): Quickly count instances of elements occurring in a vector
| Code Block |
|---|
number_vect <- sample(1:10, 1000, replace = TRUE) # replace = TRUE is required to draw more values than the 10 available
table(number_vect) |
rnorm(): sample from a normal distribution
| Code Block |
|---|
?rnorm
rnorm(5,10,1)
|
order(): Used to sort a vector numerically or alphabetically. Note: it returns the indices that would put the vector in sorted order, not the sorted values themselves; use those indices to rearrange the vector.
| Code Block |
|---|
number_vect <- sample(1:10, 10)
order(number_vect, decreasing = T)
number_vect[order(number_vect)] |
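Because order() returns indices, it can be used to rearrange one vector according to the values in another, which is exactly what is needed for sorting dataframe rows later. A sketch with made-up names and weights (all values here are hypothetical):

```r
walrus  <- c("Jocko", "Pip", "Antje")
weights <- c(900, 650, 1200)

idx <- order(weights)   # positions of the smallest, next smallest, ...
idx                     # 2 1 3
weights[idx]            # same as sort(weights)
walrus[idx]             # names rearranged to match the sorted weights
```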
%in%: This strange-looking operator returns a boolean vector based on whether the elements of the first vector are found in the second.
| Code Block |
|---|
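vect_1 <- c(1:20)
vect_2 <- c(7:11)
vect_1[vect_1 %in% vect_2]    # elements of vect_1 that are found in vect_2
vect_1[!vect_1 %in% vect_2]   # elements of vect_1 that are not found in vect_2 |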
Functions: We won’t spend much time with writing functions, but you will need to be able to create some simple functions of the type shown below. We can spend some time here if people are not familiar with function creation.
| Code Block | ||
|---|---|---|
| ||
#A function that adds two values
add_two_values <- function(x,y) { return(x+y) }
#A function that samples a normal distribution using the value x as the mean,
#returning a vector of length y
sample_normal <- function(x,y) { return( rnorm(y, mean=x) ) } |
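Once defined, a function is called like any base R function. The sketch below repeats the sample_normal() definition so that it runs on its own; set.seed() is an addition that simply makes the random draws repeatable.

```r
sample_normal <- function(x, y) { return(rnorm(y, mean = x)) }

set.seed(1)                              # makes the random draw repeatable
draws <- sample_normal(10, 5)            # arguments matched by position
length(draws)                            # 5

draws2 <- sample_normal(y = 5, x = 10)   # arguments matched by name, in any order
length(draws2)                           # 5
```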
lapply: lapply is a convenient way to generate a list of vectors, for example. It is also a convenient way to modify each element of a list. There are other ‘apply’ functions in R (e.g., apply, sapply, mapply), but we will stick with lapply for this course. It is best to understand one function before trying to use them all.
| Code Block |
|---|
generate_vect <- function(x){ return( rnorm(100, mean = x, sd = 3) ) }
list_of_vectors <- lapply(c(3,7,20), FUN = generate_vect) |
| Expand | ||
|---|---|---|
| ||
|
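To illustrate the second use mentioned above (modifying each element of a list), the sketch below applies lapply() to the list it has just created. The names list_of_means and list_of_first5 are made up for this example.

```r
generate_vect <- function(x) { return(rnorm(100, mean = x, sd = 3)) }
list_of_vectors <- lapply(c(3, 7, 20), FUN = generate_vect)

length(list_of_vectors)   # 3: one vector per input value

# Apply an existing function to every element; the result is again a list.
list_of_means <- lapply(list_of_vectors, FUN = mean)

# An anonymous function can be supplied in place of a named one.
list_of_first5 <- lapply(list_of_vectors, FUN = function(v) v[1:5])
```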
Creating a dataframe:
Method 1: from vectors, by column
| Code Block |
|---|
#generate vectors
vect1 <- c(1:10)
vect2 <- sample(c(20:25), 10, replace=T)
vect3 <- str_c(LETTERS[1:10], letters[1:10])
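#create dataframe; cbind() first builds a character matrix, so all columns start out as text
df <- as.data.frame(cbind(vect1, vect2, vect3))
#convert the two numeric columns back to numbers
df$vect1 <- as.numeric(df$vect1)
df$vect2 <- as.numeric(df$vect2) |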
Method 2: from vectors, by row
| Code Block |
|---|
vect1 <- c(1:10)
vect2 <- sample(c(20:25), 10, replace=T)
vect3 <- str_c(LETTERS[1:10], letters[1:10])
df <- as.data.frame(rbind(vect1, vect2, vect3)) |
Method 3: from a named list, by column
| Code Block |
|---|
#generate list of vectors
generate_vect <- function(x){ return( rnorm(100, mean = x, sd = 3) ) }
list_of_vectors <- lapply(c(3,7,20), FUN = generate_vect)
#provide names for the list (one name per element, so three names)
names(list_of_vectors) <- str_c(rep('vect', 3), rep('_', 3), c(1:3))
#create dataframe
df <- as.data.frame(list_of_vectors) |
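As a hedged alternative to the methods above, data.frame() can be called on the vectors directly, which keeps each column's type so no conversion step is needed. This sketch uses only base R; vect2 is filled with fixed values and paste0() stands in for str_c() so the block runs without extra packages.

```r
vect1 <- c(1:10)
vect2 <- rep(c(20, 25), 5)
vect3 <- paste0(LETTERS[1:10], letters[1:10])

df_direct <- data.frame(vect1, vect2, vect3)

str(df_direct)    # shows the type of each column
dim(df_direct)    # 10 rows, 3 columns
```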
Slicing Dataframes:
Slicing a dataframe means extracting specific rows and columns. This can be done in a number of ways in base R:
numerical row and column indices:
| Code Block |
|---|
dataf <- read.table("data/walrus_sounds.tsv", header = F, sep = "\t")
dataf[3,2]
dataf[,2]
dataf[1,]
dataf[1:2,1:2] |
Using column names and row names
| Code Block |
|---|
#Read in dataframe, define column names and row names
dataf <- read.table("data/walrus_sounds.tsv", header = F, sep = "\t")
colnames(dataf) <- c('name', 'sound_type', 'time_min_sec')
row_names_vect <- str_c(expand.grid(letters, letters)$Var2, expand.grid(letters, letters)$Var1)
rownames(dataf) <- row_names_vect[1:dim(dataf)[1]]
#slice using column and row names
dataf[c('aa','ab'), c('name', 'time_min_sec')] |
| Panel | ||||||||
|---|---|---|---|---|---|---|---|---|
| ||||||||
Code challenge: Convert the dataframe ‘time_min_sec’ column to total seconds in a new vector called ‘sec’. |
| Expand | ||
|---|---|---|
| ||
|
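One possible solution to the challenge, sketched with base R on made-up values. It assumes the column holds strings of the form minutes:seconds; in your own code you would use dataf$time_min_sec in place of the example vector.

```r
# Made-up values in the assumed "minutes:seconds" format.
time_min_sec <- c("1:30", "0:45", "12:05")

minutes <- as.numeric(sub(":.*$", "", time_min_sec))   # text before the colon
seconds <- as.numeric(sub("^.*:", "", time_min_sec))   # text after the colon

sec <- minutes * 60 + seconds
sec   # 90 45 725
```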
Now, add the new vector to our dataframe in a column named ‘total_sec’ as follows:
| Code Block |
|---|
dataf$total_sec = sec |
| Panel | ||||||||
|---|---|---|---|---|---|---|---|---|
| ||||||||
Let’s try that again but round to the nearest minute this time. Problem: We can’t use the round() function on seconds, because round only understands decimal numbers. What to do? |
Slicing dataframes using logical vectors
Above, we have been selecting rows and columns using vectors containing the indices of the columns or rows or the names of the columns or rows.
Another and perhaps more important way to manipulate dataframes is to use logical vectors, whereby selection occurs only where TRUE appears in the vector.
Selecting rows using a test on a column vector
| Code Block |
|---|
dataf_Jocko <- dataf[dataf$name == "Jocko",]
dataf_Jocko_chort <- dataf[dataf$name == "Jocko" & dataf$sound_type == 'chortle',]
mean(dataf$total_sec[dataf$name == "Jocko" & dataf$sound_type == 'chortle'])
|
| Panel | ||||||||
|---|---|---|---|---|---|---|---|---|
| ||||||||
Question: What is the following code accomplishing in extracting elements of the dataframe? |
| Code Block |
|---|
mean(dataf$total_sec[dataf$name == "Jocko" & (dataf$sound_type == 'chortle' | dataf$sound_type == 'gong')]) |
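To experiment with this kind of selection without the walrus file, here is a sketch on a small made-up dataframe. All names and values in toy are hypothetical. It also shows that several | tests on the same column can be written more compactly with %in%.

```r
# Made-up stand-in for the walrus data; the values are illustrative only.
toy <- data.frame(
  name       = c("Jocko", "Jocko", "Jocko", "Pip"),
  sound_type = c("chortle", "gong", "whistle", "gong"),
  total_sec  = c(10, 20, 60, 30)
)

# Build the logical vector first, then use it to select rows.
keep <- toy$name == "Jocko" & toy$sound_type %in% c("chortle", "gong")
keep                        # TRUE TRUE FALSE FALSE
toy[keep, ]                 # rows where both conditions hold
mean(toy$total_sec[keep])   # 15
```

Storing the test in its own object, as done with keep here, makes long conditions easier to read and to check.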
Writing dataframes to files
It’s important to take analysis results from R and save them in a readable format. It is also important to use functions whenever possible in order to make your code modular.
Let’s look at how to create a function to analyze dataframes, where this function also writes the results to new files. In this example, we will generate separate dataframes for each walrus, and write the dataframes to separate files in a new directory named for each walrus. We will be writing a single function to do all of this, so it will take some time, but it’s a good example of a useful function.
The input to the function will be the walrus name, and the function will automatically do the slicing, creation of new column, and sorting in the previous manner. We will then include writing of the data to the files. We will be using the base R function write.table(), which is the companion to read.table.
| Code Block |
|---|
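# Writes one sorted table per walrus into a directory named <walrus>_output
separate_and_write <- function(w_name, dff = dataf) {
# keep only the rows for this walrus
dataf_out <- dff[dff$name == w_name,]
# convert "minutes:seconds" strings to total seconds
dataf_out$total_sec <- as.numeric(str_split_i(dataf_out$time_min_sec, ":",1))*60 + as.numeric(str_split_i(dataf_out$time_min_sec, ":",2))
# sort from longest to shortest
dataf_out <- dataf_out[order(dataf_out$total_sec, decreasing = T),]
# create the output directory (dir.create() works on any operating system)
dir.create(str_c(w_name, "_output"), showWarnings = FALSE)
write.table(dataf_out, str_c(w_name, "_output/", w_name, '_table.txt' ),
quote = F, sep = '\t', col.names = T, row.names = F)
} |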
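To see write.table() and read.table() working as companions, here is a minimal round trip on a made-up dataframe. The file is written to R's temporary directory (an assumption made so the sketch leaves no files behind), and the object names toy, out_file and toy_back are hypothetical.

```r
toy <- data.frame(name = c("Jocko", "Pip"), total_sec = c(90, 45))

# Write a tab-separated file with a header row and no row names.
out_file <- file.path(tempdir(), "toy_table.txt")
write.table(toy, out_file, quote = FALSE, sep = "\t",
            col.names = TRUE, row.names = FALSE)

# Read it back in; header = TRUE tells R the first line holds the column names.
toy_back <- read.table(out_file, header = TRUE, sep = "\t")
toy_back
```

With the function defined above, one call per walrus could then be made with lapply(unique(dataf$name), separate_and_write), assuming dataf has been read in and its columns named as described earlier.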
Other functions that write dataframes:
fwrite (from data.table package)
More information/Resources
...