Rithika Kumar
September 19, 2019
knitr then runs the code and produces a Markdown file, which Pandoc converts to Word, PDF, HTML, etc.
Code chunks begin with ```{r} and end with ```. Your code should look something like this:
If you don’t want the code chunk to be visible in your document, begin the chunk with ```{r, echo = FALSE} and end with ```. The code is hidden, but its output still appears.
Using # creates a new section, while ## creates a subsection. E.g., # Answer 1.1 creates a new section titled "Answer 1.1" within your document.
Once you are done writing, you must knit your document using the Knit button at the top. Click on it and knit to PDF.
Let’s now go over the example on the screen to better understand how this is done.
# Set the working directory to the data folder (spaces inside a quoted path need no backslash)
setwd("/Users/rithika/Google Drive/Penn/TA/Intro to DS/Rk_Recitation/Data")
lfp1 <- read.csv("lfp1.csv")
lfp2 <- read.csv("lfp2.csv")
Our goal: merge these two datasets.
We want to see if there is a unique ID that links both these datasets
It looks like the first column in these two datasets is the same. The next thing we want to do is check whether these variables have the same class using class().
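As a minimal sketch of this check, the snippet below uses two small made-up data frames in place of lfp1 and lfp2. The names df_a and df_b, the column names, and the values are all assumptions for illustration. It lists the column names and compares the class of the first column in each.

```r
# Toy stand-ins for lfp1 and lfp2 (hypothetical columns and values, not the real data)
df_a <- data.frame(ID = c(1L, 2L, 3L), educ = c(12, 16, 10))
df_b <- data.frame(id = c(1L, 2L, 3L, 4L), wage = c(3.5, 4.2, 2.9, 5.1))

# Look at the column names to spot a candidate ID column
print(names(df_a))
print(names(df_b))

# Compare the class of the first column in each data frame
class_a <- class(df_a[[1]])
class_b <- class(df_b[[1]])
same_class <- identical(class_a, class_b)
print(same_class)
```

If the classes differ (say, character in one and integer in the other), converting one of them before merging avoids mismatched keys.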
To merge the data, the unique identifier must have the same name in both datasets, so we need to rename one of them.
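One way to do the renaming in base R is sketched below on toy data. The original column name ID is hypothetical; substitute whatever name the first column actually has in your data.

```r
# Toy stand-ins for lfp1 and lfp2 (hypothetical columns and values)
df_a <- data.frame(ID = c(1L, 2L, 3L), educ = c(12, 16, 10))
df_b <- data.frame(id = c(1L, 2L, 3L, 4L), wage = c(3.5, 4.2, 2.9, 5.1))

# Rename the column called "ID" in df_a so that it matches "id" in df_b
names(df_a)[names(df_a) == "ID"] <- "id"

# Confirm which column names the two data frames now share
shared <- intersect(names(df_a), names(df_b))
print(shared)
```

Selecting the column by its current name, rather than by position, keeps the rename correct even if the column order changes.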
# Verify that the number of unique IDs equals the number of rows
length(unique(lfp2$id)) == nrow(lfp2)
# How many respondents are in this dataset?
nrow(lfp2)
# Verify that the number of unique IDs equals the number of rows
length(unique(lfp1$id)) == nrow(lfp1)
# How many respondents are in this dataset?
nrow(lfp1)
However, we now see that the number of unique IDs in lfp1 is smaller than that in lfp2. This will produce NAs when we merge, and those rows may need to be removed after the merge.
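To see why unmatched IDs turn into NAs, here is a small sketch using base R's merge() on toy data. The data frames small and large and their values are made up, and the choice of merge() with all.y = TRUE is an assumption for illustration; it is not necessarily the call used in recitation.

```r
# Toy data: "small" is missing id 4, which is present in "large" (hypothetical values)
small <- data.frame(id = c(1L, 2L, 3L), educ = c(12, 16, 10))
large <- data.frame(id = c(1L, 2L, 3L, 4L), wage = c(3.5, 4.2, 2.9, 5.1))

# Keep every row of "large"; ids with no match in "small" get NA in small's columns
merged <- merge(small, large, by = "id", all.y = TRUE)
print(merged)

# Count how many rows ended up with a missing value for educ
n_missing <- sum(is.na(merged$educ))
print(n_missing)
```

Here id 4 has no partner in small, so its educ value is NA in the merged result.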
We have successfully merged both datasets! Let’s take a look at the content of our new dataset.
Before that, let’s remove all the NAs.
# Remove all rows with missing values using na.omit() (this is called "listwise deletion")
lfp.clean <- na.omit(lfp.merge)
As you can see, all of the information about our respondents is now in one data frame rather than split across two.
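Listwise deletion can silently drop many rows, so it is worth checking how much data is lost. The sketch below does this on a made-up merged data frame; the names merged and cleaned and all values are assumptions for illustration.

```r
# Toy merged data with one incomplete row (hypothetical values)
merged <- data.frame(
  id   = 1:4,
  educ = c(12, 16, 10, NA),
  wage = c(3.5, 4.2, 2.9, 5.1)
)

# Count missing values per column before deleting anything
print(colSums(is.na(merged)))

# Drop every row that has at least one NA
cleaned <- na.omit(merged)

# Report how many rows were removed
n_dropped <- nrow(merged) - nrow(cleaned)
print(n_dropped)
```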
Using the lfp1 and lfp2 datasets, identify if they have a unique ID between them (you already did this if you followed recitation)
Use the left_join() command to merge lfp1 and lfp2, and identify how the resulting dataset differs from the one formed using right_join(). Hint: the summary() command might provide some insights.
Similarly, use full_join() and inner_join() to merge lfp1 and lfp2, and get the dimensions of the resulting datasets.
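Before trying the exercises on the real data, it may help to see how the four dplyr joins behave on a toy example. The data frames a and b below are made up (each has one id the other lacks), so the row counts shown apply only to this sketch and not to lfp1 and lfp2.

```r
library(dplyr)

# Toy data: id 5 appears only in a, id 4 appears only in b (hypothetical values)
a <- data.frame(id = c(1L, 2L, 3L, 5L), educ = c(12, 16, 10, 14))
b <- data.frame(id = c(1L, 2L, 3L, 4L), wage = c(3.5, 4.2, 2.9, 5.1))

left  <- left_join(a, b, by = "id")   # keeps all rows of a
right <- right_join(a, b, by = "id")  # keeps all rows of b
full  <- full_join(a, b, by = "id")   # keeps all rows of both
inner <- inner_join(a, b, by = "id")  # keeps only ids present in both

# Compare the dimensions of each result
print(dim(left))
print(dim(right))
print(dim(full))
print(dim(inner))

# summary() reports the number of NAs per column, which shows where matches were missing
print(summary(left))
```

In this toy case the left and right joins have the same number of rows but differ in which column contains the NA, which is the kind of difference summary() makes visible.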