But we will not be using R from the terminal; we will be using the RStudio IDE.

RStudio Setup

You should have RStudio up and running in a browser. If not, go to the home page for the course, and follow the instructions.

Course Home

Orientation

The large window (aka pane) on the left is the Console window. If you open or create a file, it is placed in the Source window, generally to the top left. The window on the top right is the Environments window (Environment / History / Connections) and the bottom right window is the Output window (Files / Plots / Packages / Help / Viewer). We will discuss each of these panes in turn below.

You can customise the location of each pane by clicking on the ‘Tools’ menu then selecting Global Options –> Pane Layout. You can resize the panes by clicking and dragging the middle of the window borders in the direction you want.

Info
To set up RStudio to run the current line by pressing <`CTRL`><`ENTER`>, go to Tools -> Global Options -> Code. I recommend this!

Console

The Console is where R evaluates all the code you write. You can type R code directly into the Console at the command line prompt, >.

Environment/History/Connections

The Environment / History / Connections window shows you lots of useful information. You can access each component by clicking on the appropriate tab in the pane.

· The ‘Environment’ tab displays all the objects you have created in the current (global) environment. A drop-down menu lets you switch how the objects are listed.

· The ‘History’ tab contains a list of all the commands you have entered into the R Console.

· The ‘Connections’ tab allows you to connect to various data sources such as external databases.

Files/Plots/Packages/Help/Viewer

· The ‘Files’ tab lists all external files and directories in the current working directory on your computer. It works like file explorer (Windows) or Finder (Mac). You can open, copy, rename, move and delete files listed in the window.

· The ‘Plots’ tab is where all the plots you create in R are displayed (unless you tell R otherwise). From here you can zoom in on a plot, scroll through earlier plots and export them.

· The ‘Packages’ tab lists all of the packages that you have installed on your computer. You can also install new packages and update existing packages by clicking on the ‘Install’ and ‘Update’ buttons respectively.

· The ‘Help’ tab displays the R help documentation for any function.

Packages

You can think of a package as a collection of functions. There are three main sources of packages: CRAN (Comprehensive R Archive Network), Bioconductor and GitHub.

The base installation of R comes with many useful packages as standard. These packages will contain many of the functions you will use on a daily basis. However, as you start using R for more diverse projects (and as your own use of R evolves) you will find that there comes a time when you will need to extend R’s capabilities.

Packages are installed differently from each of these sources. Most of the packages we need are already present in your RStudio environment.

CRAN packages

To install a package from CRAN you can use the install.packages() function. For example if you want to install the remotes package enter the following code into the Console window of RStudio (note: you will need a working internet connection to do this)

Code Block
install.packages('remotes', dependencies = TRUE)

You may be asked to select a CRAN mirror; if so, select ‘0-cloud’ or a mirror near to your location. The dependencies = TRUE argument ensures that additional packages that are required will also be installed.

It’s good practice to occasionally update your previously installed packages to get access to new functionality and bug fixes. To update CRAN packages you can use the update.packages() function.

Code Block
update.packages(ask = FALSE)

The ask = FALSE argument avoids having to confirm every package download.
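Reinstalling a package every time a script runs is slow, so a common pattern is to install only when the package is missing. The sketch below is one way to do that; install_if_missing() is a hypothetical helper written for this example, not a function that ships with R. It relies on requireNamespace(), which reports whether a package can be loaded.

```r
# Hypothetical helper (not part of base R): install a CRAN package only if it is missing.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, dependencies = TRUE)
  }
  # Report whether the package can now be loaded
  requireNamespace(pkg, quietly = TRUE)
}

# 'stats' ships with R, so this call downloads nothing
stats_available <- install_if_missing("stats")
stats_available
```

Calling install_if_missing('remotes') would behave the same way, but would need an internet connection the first time.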

Bioconductor packages

To install packages from Bioconductor the process is a little different. You first need to install the BiocManager package. You only need to do this once unless you subsequently reinstall or upgrade R.

Code Block
install.packages('BiocManager', dependencies = TRUE)

Once the BiocManager package has been installed you can either install all of the ‘core’ Bioconductor packages with

Code Block
BiocManager::install()

or install specific packages such as the ‘GenomicRanges’ and ‘edgeR’ packages.

Code Block
BiocManager::install(c("GenomicRanges", "edgeR"))

To update Bioconductor packages just use the BiocManager::install() function again.

Code Block
BiocManager::install(ask = FALSE)

Again, you can use the ask = FALSE argument to avoid having to confirm every package download.

GitHub packages

There are multiple options for installing packages hosted on GitHub. Perhaps the most efficient method is to use the install_github() function from the remotes package (you installed this package previously). Before you use the function you will need to know the GitHub username of the repository owner and also the name of the repository. For example, the development version of dplyr from Hadley Wickham is hosted on the tidyverse GitHub account and has the repository name ‘dplyr’ (just Google ‘github dplyr’). To install this version from GitHub, use

Code Block
remotes::install_github('tidyverse/dplyr')

The safest way (that we know of) to update a package installed from GitHub is to just reinstall it using the above command.

Using packages

Once you have installed a package onto your computer it is not immediately available for you to use. To use a package you first need to load it with the library() function. For example, to load the remotes package you previously installed:

Code Block
library(remotes)

You will need to load the packages you will be using every time you start a new RStudio session. Place all library() statements required for analysis near the top of your .R or .Rmd file to make them easily accessible and obvious. If you try to use a specific package function without first loading the package library, you will receive an error message.

Try:

Code Block
install_github('tidyverse/dplyr')

Sometimes it can be useful to use a function without first using the library() function. If, for example, you will only be using one or two functions in your script and don’t want to load all of the other functions in a package then you can access the function directly by specifying the package name followed by two colons and then the function name. This is why we used the following command above to install from GitHub:

Code Block
remotes::install_github('tidyverse/dplyr')

This may seem a bit confusing at first, but managing packages will become second nature after using R for a short while. In addition, if you are following a manual or a vignette (a short tutorial) for a specific tool, the packages will all be listed at the top, and there is no extra knowledge needed about packages in order to get going with analysis (in most cases).
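As a small illustration of the package::function form, the sketch below uses the tools package, which ships with R but is not attached by default. The file path is just an example string; no file needs to exist.

```r
# 'tools' is installed with R but not attached by default,
# so we reach its file_ext() function with the :: operator.
ext <- tools::file_ext("raw_data/mydata.txt")
ext   # "txt"
```

No library(tools) call was needed, and the rest of the package stays out of the way.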

File Orientation

Working directories

The working directory is the default location where R will look for files you want to load and where it will put any files you save. You can check the file path of your working directory by looking at the bar at the top of the Console pane.

If we were using a project, the home directory would be associated with the project name. We will see that later in the class. For now, I'm not going to have you work with projects, since we are just getting oriented to the RStudio IDE.

Code Block
getwd()
setwd("/Users/nhy163/Documents/Alex/Teaching/first_project/")
dataf <- read.table("raw_data/mydata.txt", header = TRUE, sep = "\t")

Examples of ways that R can interact with the underlying file system.

Code Block

#to list files in current directory
list.files()

#to perform operations in the file system
dir.create('test_dir')
setwd('test_dir')

# return to parent directory
setwd('..')

#send commands to the terminal 
cmd1 <- 'mkdir another_test_dir'
cmd2 <- 'touch another_test_dir/a_test_file'
system(cmd1); system(cmd2)
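The mkdir and touch commands above depend on a Unix-style shell. As a possible alternative, base R has its own file functions that work without the shell. The sketch below repeats the same steps inside R's temporary directory (an assumption made here so that nothing is left behind in your working directory).

```r
# Build paths with file.path() so the separators are handled for us
base_dir <- file.path(tempdir(), "another_test_dir")
dir.create(base_dir, showWarnings = FALSE)    # like 'mkdir'

test_file <- file.path(base_dir, "a_test_file")
file.create(test_file)                        # like 'touch'

file.exists(test_file)                        # TRUE
list.files(base_dir)                          # "a_test_file"
```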

Naming R files

Although there’s not really a recognised standard approach to naming files, there are a couple of things to bear in mind.

First, avoid using spaces in file names by replacing them with underscores or hyphens. One reason is that some command line software (especially many bioinformatic tools) won’t recognise a file name with a space, which creates numerous problems. For the same reason, also avoid using special characters (e.g. @£$%^&*(:/) in your file names.

Never use the word final in any file name - it never is!

Whatever file naming convention you decide to use, try to adopt early, stick with it and be consistent.

Coding in R

Some guidelines:

R is case sensitive, i.e. A is not the same as a.
Anything that follows a # symbol is interpreted as a comment and ignored by R. Annotate heavily!! Your future self will love you for it.
In R, commands are generally separated by a new line. You can also use a semicolon ; to separate your commands but this is rarely used.
If a continuation prompt + appears in the console after you execute your code this means that you haven’t completed your code correctly. This often happens if you forget to close a bracket. Either try to finish the command on the new line, or hit escape on your keyboard until the console resets.
In general, R allows extra spaces inserted into your code, in fact using spaces is actively encouraged. However, spaces should not be inserted into operators i.e. <- should not read < - (note the space)
If your console ‘hangs’ and becomes unresponsive, try pressing the escape key (esc) or clicking on the stop icon in the top right of your console. This will terminate the current operation in most cases.

R as a calculator

Code Block
2+2
log(1)   # log base e
log2(1)  # log base 2
exp(1)   # e^x
sqrt(4)  # square root
4^2      # 4 to the power of 2
pi       # not a function but useful
17%%6    # modulo operator

Objects in R

Objects are the central concept that unites R code. Everything in R is an object.

Objects can be, for example, a single number, a character string (like a word), a vector, a data table, or a highly complex plot or function. Understanding how you create objects and assign values to objects is key to understanding R.

To create an object we simply give the object a name. We can then assign a value to this object using the assignment operator <- (sometimes called the gets operator). The assignment operator is a composite symbol comprised of a ‘less than’ symbol < and a hyphen - .

Code Block
my_obj <- 5

In the code above, we created an object called my_obj and assigned it a value of the number 5 using the assignment operator. You can also use = instead of <- to assign values. Some people do this, but it is considered bad practice.

To view the value of the object type the name of the object, or execute it from the IDE <CTRL><ENTER>.

Code Block
my_obj

Code Block
## [1] 5

Check ‘Environment’ tab for the object.

If you click on the down arrow on the ‘List’ icon in the same pane and change to ‘Grid’ view RStudio will show you a summary of the objects including the type (numeric - it’s a number), the length (only one value in this object), its ‘physical’ size and its value (5 in this case).

Naming objects

Naming your objects in R is important. Good object names should be short and informative. If you need to create objects with multiple words in their name then use either an underscore or a dot between words, or capitalise the first letter of new words. I prefer using underscores (snake case).

Code Block
input_argument_last <- "cell type 1"

Code Block
input.argument.last <- "cell type 1"

Code Block
InputArgumentLast <- "cell type 1"

Code break: Create some additional objects

Write some code in the Source window, and execute it <CTRL><ENTER>. Don’t copy, but write your own objects, and execute the code.

Code Block
#Examples:
num_object_1 <- 33
a_char_object <- 'don\'t think so!'
num_object_2 <- num_object_1 %% 3
num_object_3 <- num_object_1 / num_object_2
num_object_3 <- num_object_1 + a_char_object

When learning R, understanding errors and warnings can be frustrating (what’s an argument or a ‘binary operator’ ?). To find out more information about a particular error, Google a version of the error message, e.g. ‘non-numeric argument to binary operator error + r’ .

‘Base R’ Functions

There are many functions, operators, and objects already available in R distributions. These are referred to as ‘base R’ functions, ‘base R’ operators, etc. Functions are R objects that take an argument, carry out some operations, and typically return a value.

For example, the log() call made above used the log() function. It also takes other arguments, such as the base of the logarithm that you want to use.
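A brief sketch of passing the base argument to log() is shown below. The input values are arbitrary and chosen so the answers are whole numbers.

```r
log(8)                               # natural logarithm (base e)
log_base2  <- log(8, base = 2)       # 3, because 2^3 = 8
log_base10 <- log(1000, base = 10)   # 3, because 10^3 = 1000
log_base2
log_base10
```

Naming the argument (base = 2) makes the call easier to read than relying on argument position.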

To get help for a function, type a question mark before the name of the function and execute it.

Code Block
?mean

Our first R-specific object, the vector.

The c() function is short for concatenate and we use it to join together a series of values and store them in a data structure called a vector.

Code Block
my_vec <- c(2, 3, 1, 6, 4, 3, 3, 7)

A vector is essentially a one-dimensional container that holds a sequence of elements of the same type. Although it’s really a data structure, it's also considered a basic object because many other data structures in R consist of vectors.

Now that we’ve created a vector we can use other functions to do useful stuff with this object. For example, we can calculate the mean, variance, standard deviation and number of elements in our vector by using the mean(), var(), sd() and length() functions.

Code Break: Introduction to vectors

Some code to play with vectors is given below. Don’t cut and paste!! Write it out in your source window on your own, and play around with the objects to become familiar.

Code Block

my_vec <- c(2, 2, 3, 9, 4, 5)
typeof(my_vec)

mean(my_vec)    # returns the mean of my_vec
var(my_vec)     # returns the variance of my_vec
sd(my_vec)      # returns the standard deviation of my_vec
length(my_vec)  # returns the number of elements in my_vec
vec_mean <- mean(my_vec)

Code Break: Creating vectors with seq(), rep(), and sample()

Some code for creating vectors is given below. Play with the code, or make your own and try to understand what is happening. If you want to read about the functions, type a question mark followed by the function name.

Code Block

#The seq() function
seq2 <- seq(1,20)
seq3 <- seq(1,20,.1)
length(seq3)
seq4 <- c(seq(1,10),seq(20,30))

#The rep() function
seq4 <- c( rep(1, 10), rep(2,10) )
seq5 <- rep("abc", 10)    

#see if you can figure out how to use the sample function

The next section focuses on extracting or altering elements of vectors.

Vectors and extracting elements

...

By positional indices

Code Block
my_vec <- c(2, 2, 3, 9, 4, 5)
my_vec[3]
my_vec[c(1, 5, 6, 8)]
my_vec[3:8]
my_vec[-3:-1]
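Two behaviours of positional indexing are easy to miss: asking for a position past the end of the vector returns NA rather than an error, and negative indices drop elements instead of selecting them. The sketch below stores each result so it can be inspected, and also shows an element being altered in place. The object names are made up for this example.

```r
my_vec <- c(2, 2, 3, 9, 4, 5)

third   <- my_vec[3]               # 3
beyond  <- my_vec[c(1, 5, 6, 8)]   # 2 4 5 NA (there is no 8th element)
dropped <- my_vec[-3:-1]           # 9 4 5 (the first three elements are removed)

# Indexing on the left of <- alters an element
my_vec[2] <- 10
my_vec                             # 2 10 3 9 4 5
```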

Logical operators

...

Code break:

Test my_vec with the above operators using a number, and look at the output.
Create another vector and see how the operators behave when used on two vectors.
(You must use vectors of the same length.)

Examples (create your own)

Code Block
#examples:
my_vec < 5
my_vec <= 5
my_vec1 <- c(1:10)
my_vec2 <- c(5:14)
my_vec1 < my_vec2

Extraction of elements using logical operators

Extraction of elements can be carried out using boolean vectors resulting from the above types of logical ‘tests’. Here is an example. Try to predict what the sub_vector will look like, before executing that line.

Code Block
a_vector = c(1:3, 10:15, 5:20) # note another way to create a vector !
sub_vector = a_vector[a_vector < 12]
sub_vector

Vectorization

One of the great things about R functions is that most of them are vectorized. This means that the function will operate on all elements of a vector without needing to apply the function on each element separately. For example, to multiply each element of a vector by 5 we can simply use my_vec * 5.

When you’re operating on vectors, if vectorization makes sense with the operation, then the operation will likely be carried out in a fully vectorized manner.
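A minimal sketch of vectorized operations is given below. The second vector, other_vec, is invented for this example.

```r
my_vec <- c(2, 2, 3, 9, 4, 5)

times_five <- my_vec * 5        # 10 10 15 45 20 25
roots      <- sqrt(my_vec)      # square root of every element

# Two vectors of the same length are combined element by element
other_vec <- c(1, 1, 1, 2, 2, 2)
summed    <- my_vec + other_vec # 3 3 4 11 6 7
```

No loop is written anywhere; R applies each operation across the whole vector.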

Re-Entry into R

At this point, we have come into contact with a few object types in R: numeric, character, vector.

We’ve also learned how to interact with a vector by extracting elements, using base R functions, and using logical operators.

Until now, however, we’ve been interacting with these objects without fully appreciating what these types of objects represent in the R language. This was intentional, because gaining some familiarity with objects and how they interact from a more intuitive standpoint is a great background to have for what comes next.

At this point, I would like to take a more detailed approach to introducing more complex aspects of R, namely, a fuller range of its objects, its data types, as well as some useful packages and functions.

Objects	Data Structures
numeric (integer, double), character, logical, factor, complex	vector, list, dataframe, matrix, array

We will start our coding in this section with the vector, and we will encounter all of the basic object types as individual elements of these vectors. We will then discuss lists, and move on to dataframes as quickly as possible, since it is the dataframe where the power of R is best leveraged.

There are four primary types of atomic vectors: logical, integer, double, and character.

Have a go at creating some of them.

Code:

Code Block
num_1 <- as.integer(c(1:10))
num_2 <- as.double(c(1:10))
char_1 <- 'coding region'
boolean_1 <- c(T,F,F)

Warning: Recycling

We saw how boolean vectors can interact with numeric vectors, for example. Try indexing the above num_1 vector with boolean_1. Figure out what is going on when the boolean vector is shorter than the vector it is indexing.

Answer:

Recycling! The boolean vector is recycled as many times as needed to reach the end of the vector it is indexing.
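To make the recycling visible, the sketch below builds the full-length logical vector by hand with rep() and confirms that it selects the same elements. The name recycled is introduced only for this example.

```r
num_1     <- as.integer(c(1:10))
boolean_1 <- c(TRUE, FALSE, FALSE)

# What R does behind the scenes: repeat the pattern until it has 10 values
recycled <- rep(boolean_1, length.out = length(num_1))

num_1[boolean_1]                                # 1 4 7 10
identical(num_1[boolean_1], num_1[recycled])    # TRUE
```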

Coercion

Even though R’s vectors have a specific type, it’s quite easy to convert them to another type. This is called coercion. As a language for data analysis, this flexibility works mostly to our advantage. It’s why we generally don’t stress out over integer versus double in R. It’s why we can compute a proportion as the mean of a logical vector (we exploit automatic coercion to integer in this case). But unexpected coercion is a rich source of programming puzzles, so always consider this possibility when coding.

For explicit coercion, use the as.*() functions.

Code Block

v_log <- c(T,F,F,T)
as.integer(v_log)
as.numeric(v_log)
as.character(v_log)

Note: logical vectors can be expressed equally well by the integers 0 for FALSE and 1 for TRUE.

Code Block
as_logical <- as.logical(c(1,1,0,0,0))
as_logical
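Because TRUE counts as 1 and FALSE as 0, sum() counts how many elements pass a test and mean() gives the proportion that pass. The sketch below uses made-up measurements to show both.

```r
measurements <- c(2.5, 7.1, 4.8, 9.3, 1.2)   # made-up values

above_four <- measurements > 4     # FALSE TRUE TRUE TRUE FALSE
n_above    <- sum(above_four)      # 3 values are above 4
prop_above <- mean(above_four)     # 0.6, i.e. 3 out of 5
```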

Warning: implicit coercion

Coercion can also be triggered by other actions, such as assigning a scalar of the wrong type into an existing vector.

Code Block

num_vect <- c(1:10)
num_vect <- c(num_vect, 'string')
num_vect

Our numeric vector was silently coerced to character. Notice that R did this quietly, without warning. Always pay attention to the question: Is this object of the type I think it is? How sure am I about that?
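One way to answer the "is this the type I think it is?" question is to ask R directly with typeof(). The sketch below checks the type before and after the coercion, then shows what happens when converting back. The names mixed_vect and back are introduced for this example.

```r
num_vect <- c(1:10)
typeof(num_vect)                     # "integer"

mixed_vect <- c(num_vect, 'string')
typeof(mixed_vect)                   # "character"

# Converting back turns anything that is not a number into NA.
# R normally warns about this; the warning is suppressed here to keep the output short.
back <- suppressWarnings(as.numeric(mixed_vect))
sum(is.na(back))                     # 1
```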

Some loose ends: NA, NaN, Inf, -Inf

NA is the object that R uses to indicate ‘not available’, or ‘missing’. It will incorporate itself into a vector and take the same data type as the elements around it. Therefore, it is necessary to know for certain if the data contains NAs by testing using the function is.na().

Code Block
a_vect <- c(1,-3,5,NA,7, NA)
typeof(a_vect[4])
is.na(a_vect)
sum(is.na(a_vect))

NaN is the object produced when there is no possible mathematical result for an operation, for example, when dividing 0 by 0, or when taking the square root of a negative number.

Code Block
a_vect/0
sqrt(a_vect)
is.nan(sqrt(a_vect))

We can see above how Inf is produced when dividing by zero. The most common way to produce -Inf is to take the logarithm of zero.

Code Block
log(rep(0, 5))
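Missing values propagate through most calculations, so a single NA turns a mean into NA. The sketch below shows two common ways of dealing with that, using the same a_vect as above: the na.rm argument, and filtering with is.na().

```r
a_vect <- c(1, -3, 5, NA, 7, NA)

mean(a_vect)                               # NA, because of the missing values
mean_no_na <- mean(a_vect, na.rm = TRUE)   # 2.5, NAs are removed first

complete <- a_vect[!is.na(a_vect)]         # 1 -3 5 7

# is.finite() is FALSE for NA, NaN, Inf and -Inf alike
finite_check <- is.finite(c(1, NA, NaN, Inf, -Inf))
finite_check                               # TRUE FALSE FALSE FALSE FALSE
```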

Lists

Lists are needed for holding objects that violate the constraints imposed by an atomic vector: in other words, if one or both of the following is true.

The individual objects to be collected have length greater than 1.
The individual objects to be collected together are not of the same type.

Lists are a more general form of a vector. Whereas a vector must contain elements that are all the same object type, the elements of a list may be various types. Lists are also commonly created while providing names for each element.
Three common ways to create a list from scratch:

Provide the element names together with the elements of the list

Code Block
list1 <- list(element1 = 1, element2 = c(2,3,4), element3 = T)

Create a list and then provide the names as a vector.

Code Block
list1 <- list(1, c(2,3,4), T)
names(list1) = c("element1", "element2", "element3")

Convert some other object to a list

Code Block
a_vect <- c(11:20)
a_list <- as.list(a_vect)

Indexing a list:
With vectors, we saw that there was one way of indexing elements: using brackets.
With lists, the elements can be indexed in three ways, all of which are important to understand.
1. Single brackets (returns a list of the selected element or elements)

Code Block
list1 <- list(name1 = 1, name2 = c(2,3,4), name3 = T, name4 = NA)
list1[2]

2. Double brackets (returns the element indexed)

Code Block
list1 <- list(element1 = 1, element2 = c(2,3,4), element3 = T)
list1[[2]]

3. Using $<element name>

Code Block
list1$element2
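The difference between the three forms can also be checked programmatically. The sketch below stores the result of each and tests it with is.list(); the names by_single, by_double and by_dollar are made up for this example.

```r
list1 <- list(element1 = 1, element2 = c(2, 3, 4), element3 = TRUE)

by_single <- list1[2]          # still a list, of length 1
by_double <- list1[[2]]        # the numeric vector itself
by_dollar <- list1$element2    # same as the double-bracket result

is.list(by_single)                 # TRUE
is.list(by_double)                 # FALSE
identical(by_double, by_dollar)    # TRUE
names(by_single)                   # "element2"
```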

Question: how do we identify a list from what R outputs to the console in the above 3 cases?

Hint: In 2. above, use : to select a range of elements

Answer:

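When R prints a list, it labels each element before showing its contents: $name for a named element, or [[1]], [[2]], and so on for unnamed ones. A plain vector is printed directly, starting with [1].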

By the way, a vector can also have element names! However, note the format when R prints out the object.

Code Block
a_named_vect <- c(5:10)
names(a_named_vect) <- letters[1:6]
a_named_vect

Dataframes

R is designed for handling data in the form of large tables, so this section is where we’ll see the full power of R for data analysis. The underlying structure of a dataframe is just a list of vectors, so we already understand much about how to handle them. However, we are going to need a few tools in order to work effectively with dataframes.

stringr: Library containing many functions for string operations.

Code Block

library(stringr)
str <- c("Hello", "hello there", "hi", "ahoy", "nope")
str_detect(str, 'he')     #tests whether elements contain a pattern
str_split_i(str, 'll', 1) #splits each element at a pattern,and selects the first split
str_split_i(str, 'll', 2) #splits each element at a pattern,and selects the second split
str_c( rep('vector', 10), rep('_', 10), c(1:10) ) # pastes together strings

table(): Quickly count instances of elements occurring in a vector

Code Block
number_vect <- sample(1:10, 1000, replace = TRUE)
table(number_vect)

rnorm(): sample from a normal distribution

Code Block
?rnorm
rnorm(5,10,1)

order(): Sort a vector numerically or alphabetically. Note: returns the indexes of the vector used to sort the vector.

Code Block
number_vect <- sample(1:10, 10)
order(number_vect, decreasing = T)
number_vect[order(number_vect)]
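Because order() returns positions rather than sorted values, the same positions can be used to rearrange a second vector in step with the first. This is what makes it useful for sorting dataframe rows later. The sketch below uses small made-up vectors so the result is easy to follow.

```r
scores <- c(30, 10, 20)
labels <- c("c", "a", "b")   # labels[i] belongs to scores[i]

idx <- order(scores)         # 2 3 1: positions of the smallest to largest score
scores[idx]                  # 10 20 30
labels[idx]                  # "a" "b" "c", reordered to match
```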

%in%: This strange-looking operator returns a boolean vector based on whether the elements of the first vector are found in the second.

Code Block
vect_1 <- c(1:20)
vect_2 <- c(7:11)
vect_1[vect_1 %in% vect_2]
vect_1[!vect_1 %in% vect_2]

Functions: We won’t spend much time with writing functions, but you will need to be able to create some simple functions of the type shown below. We can spend some time here if people are not familiar with function creation.

Code Block

# A function that adds two values.
add_two_values <- function(x,y) { return(x+y) }
# A function that takes in a value x, and samples a normal distribution
# using x as the mean, returning a vector of length y.
sample_normal <- function(x,y) { return( rnorm(y, mean=x) ) }
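Once defined, these functions are called like any base R function. The sketch below repeats the two definitions so that it runs on its own, then calls each one. set.seed() is added here only to make the random draws repeatable.

```r
add_two_values <- function(x, y) { return(x + y) }
sample_normal  <- function(x, y) { return(rnorm(y, mean = x)) }

total <- add_two_values(2, 3)
total                      # 5

set.seed(1)                # makes the random numbers repeatable
draws <- sample_normal(x = 10, y = 5)
length(draws)              # 5 values, centred around 10
```

Naming the arguments in the call (x = 10, y = 5) helps avoid mixing up the mean and the length.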

lapply: lapply is a convenient way to generate a list of vectors, for example. It is also a convenient way to modify each element of a list. There are other ‘apply’ functions in R (e.g., apply, sapply, mapply), but we will stick with lapply for this course. It is best to understand one function before trying to use them all.

Code Block
generate_vect <- function(x){ return( rnorm(100, mean = x, sd = 3) ) }
list_of_vectors <- lapply(c(3,7,20), FUN = generate_vect)
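The text above also mentions using lapply to modify each element of a list. As a sketch of that second use, the code below regenerates the list and then applies mean() to every element. The name list_of_means and the use of set.seed() are additions for this example.

```r
set.seed(42)   # makes the random numbers repeatable
generate_vect <- function(x) { return(rnorm(100, mean = x, sd = 3)) }
list_of_vectors <- lapply(c(3, 7, 20), FUN = generate_vect)

# Apply mean() to each of the three vectors; the result is again a list
list_of_means <- lapply(list_of_vectors, FUN = mean)
length(list_of_means)    # 3
list_of_means            # each mean should be close to 3, 7 and 20
```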

Mapply (more advanced)

Code Block
generate_vect <- function(x,y){ return( rnorm(100, mean = x, sd = y) ) }
output <- mapply(generate_vect, c(1,10,100), c(.1, 1, 10))

Creating a dataframe:

Method 1: from vectors, by column

Code Block

#generate vectors
vect1 <- c(1:10)
vect2 <- sample(c(20:25), 10, replace=T)
vect3 <- str_c(LETTERS[1:10], letters[1:10])
#create dataframe
df <- as.data.frame(cbind(vect1, vect2, vect3))
df$vect1 <- as.numeric(df$vect1)
df$vect2 <- as.numeric(df$vect2)

Method 2: from vectors, by row

Code Block
vect1 <- c(1:10)
vect2 <- sample(c(20:25), 10, replace=T)
vect3 <- str_c(LETTERS[1:10], letters[1:10])
df <- as.data.frame(rbind(vect1, vect2, vect3))

Method 3: from a named list, by column

Code Block

#generate list of vectors
generate_vect <- function(x){ return( rnorm(100, mean = x, sd = 3) ) }
list_of_vectors <- lapply(c(3,7,20), FUN = generate_vect)
#provide names for the list
names(list_of_vectors) <- str_c(rep('vect', 3), rep('_', 3), c(1:3))
#create dataframe
df <- as.data.frame(list_of_vectors)
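In Method 1, cbind() first builds a matrix, which can hold only one type, so the numbers become text and have to be converted back. A different route, not one of the methods above, is to pass the vectors straight to data.frame(), which keeps each column's own type. The sketch below uses base R's paste0() in place of str_c() so that it runs without loading stringr.

```r
vect1 <- c(1:10)
vect3 <- paste0(LETTERS[1:10], letters[1:10])

# Each argument becomes a column and keeps its type
df_direct <- data.frame(vect1 = vect1, vect3 = vect3, stringsAsFactors = FALSE)

typeof(df_direct$vect1)   # "integer"
typeof(df_direct$vect3)   # "character"
dim(df_direct)            # 10 rows, 2 columns
```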

Slicing Dataframes:

Slicing a dataframe means extracting specific rows and columns. This can be done in a number of ways in base R:

numerical row and column indices:

Code Block
dataf <- read.table("data/walrus_sounds.tsv", header = F, sep = "\t")
dataf[3,2]
dataf[,2]
dataf[1,]
dataf[1:2,1:2]

Using column names and row names

Code Block

#Read in dataframe, define column names and row names
dataf <- read.table("data/walrus_sounds.tsv", header = F, sep = "\t")
colnames(dataf) <- c('name', 'sound_type', 'time_min_sec')
row_names_vect <- str_c(expand.grid(letters, letters)$Var2, expand.grid(letters, letters)$Var1)
rownames(dataf) <- row_names_vect[1:dim(dataf)[1]]
#slice using column and row names
dataf[c('aa','ab'), c('name', 'time_min_sec')]

Code challenge:

Convert the dataframe ‘time_min_sec’ column to total seconds in a new vector called ‘sec’.

Solution:

Code Block
sec = as.numeric(str_split_i(dataf$time_min_sec, ":",1))*60 + as.numeric(str_split_i(dataf$time_min_sec, ":",2))

Now, add the new vector to our dataframe in a column named ‘total_sec’ as follows:

Code Block
dataf$total_sec = sec

Let’s try that again but round to the nearest minute this time. Problem: We can’t use the round() function on seconds, because round only understands decimal numbers. What to do?

Slicing dataframes using logical vectors

Above, we have been selecting rows and columns using vectors containing the indices of the columns or rows or the names of the columns or rows.

Another and perhaps more important way to manipulate dataframes is to use logical vectors, whereby selection occurs only where TRUE appears in the vector.

Selecting rows using a test on a column vector

Code Block

dataf_Jocko <- dataf[dataf$name == "Jocko",]
dataf_Jocko_chort <- dataf[dataf$name == "Jocko" & dataf$sound_type == 'chortle',]
mean(dataf$total_sec[dataf$name == "Jocko" & dataf$sound_type == 'chortle'])
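The same pattern can be tried without the walrus file. The sketch below builds a tiny stand-in dataframe, with values invented for this example, so that each logical vector can be inspected before it is used to select rows.

```r
# Toy data standing in for the walrus table (all values are made up)
toy <- data.frame(
  name       = c("Jocko", "Jocko", "Pablo", "Jocko"),
  sound_type = c("chortle", "gong", "chortle", "chortle"),
  total_sec  = c(30, 45, 20, 50),
  stringsAsFactors = FALSE
)

is_jocko <- toy$name == "Jocko"     # TRUE TRUE FALSE TRUE
jocko_rows <- toy[is_jocko, ]       # logical vector goes BEFORE the comma to pick rows

# Combine two tests with & (both must be TRUE)
jocko_chortle <- toy[is_jocko & toy$sound_type == "chortle", ]
mean(jocko_chortle$total_sec)       # 40
```

Storing the test in its own object (is_jocko) makes it easy to check that it has one value per row.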

Question: What is the following code accomplishing in extracting elements of the dataframe?

Code Block
mean(dataf$total_sec[dataf$name == "Jocko" & (dataf$sound_type == 'chortle' | dataf$sound_type == 'gong')])

Writing dataframes to files

It’s important to take analysis results from R and save them in a readable format. It is also important to use functions whenever possible in order to make your code modular.

Let’s look at how to create a function to analyze dataframes, where this function also writes the results to new files. In this example, we will generate separate dataframes for each walrus, and write the dataframes to separate files in a new directory named for each walrus. We will be writing a single function to do all of this, so it will take some time, but it’s a good example of a useful function.

The input to the function will be the walrus name, and the function will automatically do the slicing, creation of new column, and sorting in the previous manner. We will then include writing of the data to the files. We will be using the base R function write.table(), which is the companion to read.table.

Code Block

separate_and_write = function(w_name, dff = dataf) {
  dataf_out <- dff[dff$name == w_name,]
  dataf_out$total_sec <- as.numeric(str_split_i(dataf_out$time_min_sec, ":",1))*60 + as.numeric(str_split_i(dataf_out$time_min_sec, ":",2))
  dataf_out <- dataf_out[order(dataf_out$total_sec, decreasing = T),]
  system(str_c('mkdir ', w_name, "_output"))
  
  write.table(dataf_out, str_c(w_name, "_output/", w_name, '_table.txt' ), 
              quote = F, sep = '\t', col.names = T, row.names = F) 
}
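Before running a function like this on real data, it can help to check the write and read steps on their own. The sketch below is a simplified round trip under a few assumptions: it uses a toy dataframe with made-up values, writes into R's temporary directory, and creates the folder with dir.create() instead of a shell command.

```r
# Toy dataframe standing in for the walrus data (values are made up)
toy <- data.frame(name = c("Jocko", "Jocko", "Pablo"),
                  total_sec = c(90, 30, 45),
                  stringsAsFactors = FALSE)

# Create the output directory inside the temporary directory
out_dir <- file.path(tempdir(), "Jocko_output")
dir.create(out_dir, showWarnings = FALSE)
out_file <- file.path(out_dir, "Jocko_table.txt")

# Write only Jocko's rows, tab-separated, with a header and no row names
write.table(toy[toy$name == "Jocko", ], out_file,
            quote = FALSE, sep = "\t", col.names = TRUE, row.names = FALSE)

# Read the file back in to confirm it contains what we expect
check <- read.table(out_file, header = TRUE, sep = "\t")
check
```

With the real data, the function above could be run once per walrus, for example with lapply(unique(dataf$name), separate_and_write).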

Other functions that write dataframes:

fwrite (from data.table package)

More information/Resources

...

Version	Old Version 1	New Version 2
Changes made by	Matt Bramble	Matt Bramble
Saved on	May 27, 2025	May 29, 2025
