
Ryan Clement, Data Services Librarian: go/ryan/

Wendy Shook, Science Data Librarian: go/wshook/

Jonathan Kemp, Telescope & Scientific Computing Specialist: go/jkemp/
June 21, 2021, 1:00-3:00 PM EDT

RStudio extends what R can do, and makes it easier to write R code and interact with R.
dir.create("data")
dir.create("data_output")
dir.create("fig_output")
download.file("https://ndownloader.figshare.com/files/11492171",
"data/SAFI_clean.csv", mode = "wb")
install.packages() function
Using all you’ve learned so far, make sure that you have the tidyverse package installed, and then install two more packages: here and lubridate.
When you’re done, please give a :thumbs-up: or a :green-check: in the Zoom reactions.
What do you think is the current content of the object area_acres?
123.5 or 6.175?
Go to PollEv.com/ryanclement191 to respond!
Create two variables, r_length and r_width, and assign them values. Note that because length is a built-in R function, RStudio might add “()” after you type length; if you leave the parentheses in, you will get unexpected results. This is why you might see other programmers abbreviate common words. Create a third variable, r_area, and give it a value based on the current values of r_length and r_width. Show that changing the value of either r_length or r_width does not affect the value of r_area.
When you’re done, please give a :thumbs-up: or a :green-check: in the Zoom reactions.
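A minimal sketch of the idea behind this exercise, using arbitrary values (4, 5, and 10 are made up for illustration): R evaluates the right-hand side once, at assignment time, and stores the result, so r_area does not follow later changes to its inputs.

```r
# Arbitrary illustrative values; any numbers would do.
r_length <- 4
r_width <- 5

# The product is computed once, here, and the result is stored.
r_area <- r_length * r_width
area_before <- r_area        # 20

# Reassigning an input does not recompute r_area.
r_length <- 10
area_after_change <- r_area  # still 20

# To update the area, run the assignment again.
r_area <- r_length * r_width
r_area                       # 50
```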
Type in ?round at the console and then look at the output in the Help pane. What other functions exist that are similar to round? How do you use the digits parameter in the round function?
When you’re done, please give a :thumbs-up: or a :green-check: in the Zoom reactions.
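As a rough illustration of what the help page describes, the sketch below calls round() with and without the digits argument and tries the related functions documented on the same page. The input values are arbitrary.

```r
x <- 3.14159

# digits defaults to 0, so this rounds to a whole number.
r0 <- round(x)                       # 3

# digits can be named or matched by position.
r2 <- round(x, digits = 2)           # 3.14
r2_pos <- round(x, 2)                # 3.14

# Negative digits round to the left of the decimal point.
r_neg <- round(1234.567, digits = -2)  # 1200

# Related functions from the same help page.
up <- ceiling(x)                     # 4
down <- floor(x)                     # 3
cut_off <- trunc(-x)                 # -3 (drops the fractional part)
sig <- signif(x, digits = 3)         # 3.14 (three significant digits)

# args() prints a function's arguments and their defaults.
args(round)
```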
We’ve seen that atomic vectors can be of type character, numeric (or double), integer, and logical. But what happens if we try to mix these types in a single vector?
What will happen in each of these examples? (hint: use class() to check the data type of your objects):
num_char <- c(1, 2, 3, "a")
num_logical <- c(1, 2, 3, TRUE)
char_logical <- c("a", "b", "c", TRUE)
tricky <- c(1, 2, 3, "4")
Why do you think this happens?
When you’re done, please give a :thumbs-up: or a :green-check: in the Zoom reactions.
You’ve probably noticed that objects of different types get converted into a single, shared type within a vector. In R, we call converting objects from one class into another class coercion. These conversions happen according to a hierarchy, whereby some types get preferentially coerced into other types. Can you draw a diagram that represents the hierarchy of how these data types are coerced?
When you’re done, please give a :thumbs-up: or a :green-check: in the Zoom reactions.
logical < integer < numeric < character
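One way to check each step of this hierarchy is to mix two adjacent types at a time and inspect the result with class(). The vectors below are made up for illustration and are not the ones from the exercise.

```r
# logical + integer -> integer (TRUE becomes 1L)
lgl_int <- c(TRUE, 2L)
class(lgl_int)   # "integer"

# integer + double -> numeric
int_dbl <- c(2L, 3.5)
class(int_dbl)   # "numeric"

# double + character -> character (numbers become text)
dbl_chr <- c(3.5, "a")
class(dbl_chr)   # "character"

# logical + character -> character, skipping the steps in between
lgl_chr <- c(TRUE, "a")
lgl_chr          # "TRUE" "a"

# Coercion in the other direction must be requested explicitly, and
# values that cannot be converted become NA (with a warning).
back <- suppressWarnings(as.numeric(c("1", "2", "a")))
back             # 1 2 NA
```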
Using this vector of rooms, create a new vector with the NAs removed.
rooms <- c(1, 2, 1, 1, NA, 3, 1, 3, 2, 1, 1, 8, 3, 1, NA, 1)
When you’re done, please give a :thumbs-up: or a :green-check: in the Zoom reactions.
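The general pattern for dropping missing values can be sketched on a small made-up vector (sizes is hypothetical and is not the rooms data). The same approach should carry over to any vector containing NA.

```r
# Hypothetical vector with two missing values.
sizes <- c(2, NA, 5, 3, NA, 4)

# is.na() flags the missing positions.
is.na(sizes)   # FALSE TRUE FALSE FALSE TRUE FALSE

# Keep only the positions that are NOT missing.
sizes_complete <- sizes[!is.na(sizes)]
sizes_complete # 2 5 3 4

# na.omit() returns the same values, plus an attribute that records
# which positions were dropped.
sizes_omitted <- na.omit(sizes)

# Many summary functions return NA if any value is missing,
# unless na.rm = TRUE is supplied.
mean_with_na <- mean(sizes)                 # NA
mean_without_na <- mean(sizes, na.rm = TRUE)  # 3.5
```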
Use the function median() to calculate the median of the rooms vector.
Go to PollEv.com/ryanclement191 to respond!
Use R to figure out how many households in the set (rooms) use more than 2 rooms for sleeping.
Go to PollEv.com/ryanclement191 to respond!
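Counting how many elements meet a condition usually comes down to a logical comparison followed by sum(), because TRUE counts as 1 and FALSE as 0. A small sketch on a made-up vector (people and the threshold 4 are hypothetical):

```r
# Hypothetical household sizes.
people <- c(3, 7, 2, 9, 4, 7)

# The comparison returns one TRUE/FALSE per element.
is_large <- people > 4
is_large                 # FALSE TRUE FALSE TRUE FALSE TRUE

# Summing a logical vector counts the TRUE values.
n_large <- sum(is_large) # 3

# Equivalent: subset first, then measure the length.
n_large_alt <- length(people[people > 4])  # 3

# If the vector contains NA, the comparison yields NA as well,
# so either remove the NAs first or pass na.rm = TRUE to sum().
with_na <- c(3, NA, 7)
n_with_na <- sum(with_na > 4, na.rm = TRUE)  # 1
```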
Data frames can be created by hand, but most commonly they are generated by the functions read_csv() or read_table(); in other words, when importing spreadsheets from your hard drive (or the web). We will now demonstrate how to import tabular data using read_csv().
dim(interviews) - returns a vector with (rows, columns)
nrow(interviews) - returns the number of rows
ncol(interviews) - returns the number of columns
head(interviews) - shows the first 6 rows
tail(interviews) - shows the last 6 rows
names(interviews) - returns the column names
str(interviews) - structure + information about the class, length, and content of each column
summary(interviews) - summary statistics for each column
glimpse(interviews) - returns dimensions of the tibble, name/class and preview of each column
Note: most of these functions are “generic.” They can be used on other types of objects besides data frames or tibbles.
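To try these functions without the workshop file, the sketch below builds a small data frame by hand with base R's data.frame() rather than importing a tibble with read_csv(). The object villages and its columns are invented for illustration. glimpse() is left out because it needs the tidyverse loaded.

```r
# Hypothetical toy data, created by hand.
villages <- data.frame(
  village = c("A", "B", "C", "D", "E", "F", "G", "H"),
  members = c(3, 7, 10, 7, 7, 3, 6, 12),
  stringsAsFactors = FALSE
)

dim(villages)           # 8 2  (rows, columns)
nrow(villages)          # 8
ncol(villages)          # 2
names(villages)         # "village" "members"

first_rows <- head(villages)        # first 6 rows by default
last_rows <- tail(villages, n = 2)  # n changes how many rows are shown

str(villages)           # class and a preview of each column
summary(villages)       # summary statistics per column
```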
Create a tibble (interviews_100) containing only the data in row 100 of the interviews dataset.
Notice how nrow() gave you the number of rows in the tibble? Use that number to pull out just that last row in the tibble. Create a new tibble (interviews_last) using the nrow() instead of the row number. Compare that with what you see as the last row using tail() to make sure it’s meeting expectations.
Using the number of rows in the interviews dataset that you found in question 2, extract the row that is in the middle of the dataset. Store the content of this middle row in an object named interviews_middle. (hint: This dataset has an odd number of rows, so finding the middle is a bit trickier than dividing n_rows by 2. Use the median() function and what you’ve learned about sequences in R to extract the middle row!)
Combine nrow() with the - notation above to reproduce the behavior of head(interviews), keeping just the first through 6th rows of the interviews dataset.
When you’re done, please give a :thumbs-up: or a :green-check: in the Zoom reactions.
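The indexing patterns these exercises rely on can be sketched on a small made-up data frame (toy, with 9 rows, is hypothetical). The same [row, column] notation should apply to the interviews tibble.

```r
# Hypothetical data frame with an odd number of rows.
toy <- data.frame(
  id = 1:9,
  score = c(12, 15, 9, 20, 18, 11, 14, 17, 13)
)

# A single row by position; the empty column slot keeps all columns.
third_row <- toy[3, ]

# Using nrow() instead of a hard-coded number selects the last row.
n <- nrow(toy)
last_row <- toy[n, ]

# median() of the sequence 1..n gives the middle position when n is odd.
middle_index <- median(1:n)   # 5
middle_row <- toy[middle_index, ]

# Negative indices drop rows: removing rows 4 through n keeps rows 1 to 3.
first_three <- toy[-(4:n), ]
```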
R has a special data class, called factor, to deal with categorical data that you may encounter when creating plots or doing statistical analyses. Factors are very useful and actually contribute to making R particularly well suited to working with data. So we are going to spend a little time introducing them.
Factors represent categorical data. They are stored as integers associated with labels and they can be ordered (ordinal) or unordered (nominal). Factors create a structured relation between the different levels (values) of a categorical variable, such as days of the week or responses to a question in a survey. This can make it easier to see how one element relates to the other elements in a column. While factors look (and often behave) like character vectors, they are actually treated as integer vectors by R. So you need to be very careful when treating them as strings.
Once created, factors can only contain a pre-defined set of values, known as levels. By default, R always sorts levels in alphabetical order.
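A short sketch of these points, using made-up values (floor_type and years are hypothetical examples): it shows the default alphabetical levels, the integer codes underneath, how to set the level order yourself, and the pitfall of converting a number-like factor.

```r
# Hypothetical categorical data.
floor_type <- factor(c("earth", "cement", "cement", "earth", "burntbricks"))

levels(floor_type)       # "burntbricks" "cement" "earth" (alphabetical)
nlevels(floor_type)      # 3
codes <- as.integer(floor_type)
codes                    # 3 2 2 3 1  (position of each value in levels)

# The level order can be set explicitly.
floor_reordered <- factor(floor_type,
                          levels = c("earth", "cement", "burntbricks"))
reordered_codes <- as.integer(floor_reordered)
reordered_codes          # 1 2 2 1 3

# Pitfall: converting a number-like factor returns the integer codes,
# not the labels. Convert to character first.
years <- factor(c("1990", "1983", "1977", "1998", "1990"))
wrong <- as.numeric(years)                 # 3 2 1 4 3
right <- as.numeric(as.character(years))   # 1990 1983 1977 1998 1990
```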
When you’re done, please give a :thumbs-up: or a :green-check: in the Zoom reactions.
One of the most common issues that new (and experienced!) R users have is converting date and time information into a variable that is appropriate and usable during analyses. As a reminder, the best practice for dealing with date data is to ensure that each component of your date is stored as a separate variable. In our dataset, we have a column interview_date which contains information about the year, month, and day that the interview was conducted. Let’s convert those dates into three separate columns.
We are going to use the package lubridate, which is installed along with the tidyverse. Versions of the tidyverse before 2.0.0 do not load it with library(tidyverse), so we load it explicitly with library(lubridate).
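A minimal sketch of the conversion, assuming lubridate is installed. The date strings below are made up and stand in for the interview_date column, and parts is a hypothetical name for the result.

```r
library(lubridate)

# Hypothetical dates in year-month-day order.
interview_date <- c("2016-11-17", "2017-04-26", "2017-06-04")

# ymd() parses text in year-month-day order into Date objects.
dates <- ymd(interview_date)
class(dates)   # "Date"

# Each component can then be pulled out into its own column.
parts <- data.frame(
  date = dates,
  day = day(dates),      # 17 26 4
  month = month(dates),  # 11 4 6
  year = year(dates)     # 2016 2017 2017
)
parts
```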
To sign up for more sessions: go/summer-data-workshops/
Assessment survey: go/summer-data-assessment/
Ryan’s contact info: go/ryan/