Opening an R Markdown file in R Studio

Open up Rstudio
Create a new Markdown file and while creating it choose PDF in the pop-up
Save the file
R Markdown is saved as a .Rmd file
knitr() then converts to Pandoc which then converts it to a Word/PDF/Html etc.

Writing functions within R Markdown

Code chunks start with the following line: ```{r} and end with ```.
You can quickly insert chunks into your R Markdown file with the keyboard shortcut Cmd + Option + I (Windows Ctrl + Alt + I).
Let’s try to import our dataset by typing the command within the code chunk

Your code should look something like this:

getwd()
setwd("/Users/rithika/Google\ Drive/Penn/TA/Intro\ to\ DS/Rk_Recitation/Data") 
#insert the path of the directory of your choice
country.profile <- read.csv("country_profile_variables.csv")

Other things to know about R Markdown

In case you don’t want the code chunk to be visible in your document you can use the following syntax: ```{r, echo = FALSE} and end with ```
Using # creates a new section while ## creates a sub-sub section Eg. # Answer 1.1 –> this will create a new section within your document
Once you are done writing, you must knit your document using the Knit button at the top. Click on it and knit to PDF.
Let’s now go over the example on the screen to better understand how this is done.

Download the data

Download the data lfp1.csv and *lfp2.csv from the Canvas website
Set your working directory and make sure the file you just downloaded is there
Load them into R Studio

setwd("/Users/rithika/Google\ Drive/Penn/TA/Intro\ to\ DS/Rk_Recitation/Data") 

lfp1 <- read.csv("lfp1.csv")
lfp2 <- read.csv("lfp2.csv")

Our goal: Merge these two datasets

We want to see if there is a unique ID that links both these datasets

head(lfp1)
head(lfp2)

Merging the data

It looks like the first column in these two datasets is the same. The next thing we want to do is check whether these variables have the same class using class().

class(lfp1)
class(lfp2)

In order to merge the data, we need to ensure that the unique identifier has the same name in both datasets and for this we need to rename one of them

# Let's rename the one in lfp2

lfp2$id <- lfp2$id.no

Checking the dimensions

# Verifying that the number of unique IDs is similar to the number of rows
length(unique(lfp2$id))==dim(lfp2)[1]

# How many passengers in this dataset?
dim(lfp2)[1]

# Verifying that the number of unique IDs is similar to the number of rows
length(unique(lfp1$id))==dim(lfp1)[1]

# How many passengers in this dataset?
dim(lfp1)[1]

However, we now see that number of unique IDs in lfp1 is lesser than that in lfp2. This will result in NAs when we merge Which might be needed to be deleted after the merge

Merging the data!!

library(dplyr)
# Merging using merge()
lfp.merge <- merge(lfp1,lfp2,by="id")

# Merging using right_join()
lfp.merge <- right_join(lfp1,lfp2,by="id")

# Checking the dimensions of the new dataset
dim(lfp.merge)

# Remove old datasets (if needed)
#rm(lfp1)
#rm(lfp2)

Merged data

We have successfully merged both datasets! Let’s take a look at the content of our new dataset.

Before that let’s clear all the NAs

# Removing all rows with missing values using the na.omit() function (we call this "listwise deletion")
lfp.clean <- na.omit(lfp.merge)

# Looking at the content of the first five rows using head()
head(lfp.merge,5)

As you see, we have all of the information about our respondents within one dataframe rather than it being split over two separate df

Using the lfp1 and lfp2 datasets, identify if they have a unique ID between them (you already did this if you followed recitation)
Use the left_join() command to merge lfp1 and lfp2 and identify the difference in this dataset from the dataset formed using right_join. Hint: the summary() command might provide some insights
Similarly, use the full_join() and inner_join() to merge lfp1 and lfp2 and get the dimensions of the resulting datasets.

Merging Data

Recitation 3

Goal for today

Reviewing R Markdown