class: center, middle, inverse, title-slide

.title[
# Advanced quantitative data analysis
]
.subtitle[
## Reshaping and joining
]
.author[
### Mengni Chen
]
.institute[
### Department of Sociology, University of Copenhagen
]

---

<style type="text/css">
.remark-slide-content {
  font-size: 20px;
  padding: 20px 80px 20px 80px;
}
.remark-code, .remark-inline-code {
  background: #f0f0f0;
}
.remark-code {
  font-size: 14px;
}
</style>

# Let's get ready

```r
install.packages("skimr") # provides descriptive information about your data
install.packages("splitstackshape") # to transform wide-format data into long-format data
```

```r
library(tidyverse) # Add the tidyverse package to my current library.
library(haven) # Import data.
library(janitor) # Clean data.
library(ggplot2) # Allows us to create nice figures.
library(skimr) # Descriptive overview of a dataset.
library(splitstackshape) # Reshape wide data to long data.
```

---
# Prepare the data

[Click here and copy the code to get your data ready for today.](https://rpubs.com/fancycmn/1236392)

---
# Second, clean data

**Now let us look at the cleaned data using the function `skim()` from the package "skimr".**

```r
skimr::skim(wave1a)
```

<img src="https://github.com/fancycmn/slide-7/blob/main/S7_Pic10.PNG?raw=true" width="100%" style="display: block; margin: 30px;">

---
# Third, combine to generate panel data

<img src="https://github.com/fancycmn/24-Session8/blob/main/Figure1a.JPG?raw=true" width="75%" style="display: block; margin: 30px;">

---
# Third, combine to generate panel data

- To generate long data, use `rbind()`.
- `rbind()` combines datasets by stacking their rows

```r
panel_long <- rbind(wave1a, wave2a, wave3a, wave4a, wave5a, wave6a) # rbind() combines datasets by rows
```

<img src="https://github.com/fancycmn/slide-7/blob/main/S7_Pic11.PNG?raw=true" width="60%" style="display: block; margin: 30px;">

---
# Generate wide data

In this case, it is easy to generate wide data

<img src="https://github.com/fancycmn/24-Session8/blob/main/Figure2.JPG?raw=true" width="70%" style="display: block; margin: 30px;">

---
# Generate wide data

In this case, there are several ways to generate wide data

<img src="https://github.com/fancycmn/24-Session8/blob/main/Figure3.JPG?raw=true" width="100%" style="display: block; margin: 30px;">

---
# Generate wide data

In this case, there are several ways to generate wide data

**Way 1: inner join**, keeping only those who participate in all three waves

<img src="https://github.com/fancycmn/24-Session8/blob/main/Figure4.JPG?raw=true" width="65%" style="display: block; margin: 30px;">

---
# Generate wide data

In this case, there are several ways to generate wide data

**Way 2: left join**, focusing on those who participate in wave 1 and following them over time

<img src="https://github.com/fancycmn/24-Session8/blob/main/Figure5.JPG?raw=true" width="65%" style="display: block; margin: 30px;">

---
# Generate wide data

In this case, there are several ways to generate wide data

**Way 3: right join**, focusing on those who participate in wave 3 and tracing them back in time

<img src="https://github.com/fancycmn/24-Session8/blob/main/Figure6.JPG?raw=true" width="65%" style="display: block; margin: 30px;">

---
# Generate wide data

In this case, there are several ways to generate wide data

**Way 4: full join**, following all respondents over time, no matter in which wave they participate

<img src="https://github.com/fancycmn/24-Session8/blob/main/Figure7.JPG?raw=true" width="60%" style="display: block;
margin: 30px;">

---
# Generate wide data, use the join functions

An example of full join, using `full_join()`

```r
wave1b <- wave1a %>%
  rename(wave.1=wave, age.1=age, sex.1=sex, relstat.1=relstat, hlt.1=hlt, sat.1=sat) # rename variables with the suffix .1 to mark them as wave 1
wave2b <- wave2a %>%
  rename(wave.2=wave, age.2=age, sex.2=sex, relstat.2=relstat, hlt.2=hlt, sat.2=sat) # rename variables with the suffix .2 to mark them as wave 2
wave3b <- wave3a %>%
  rename(wave.3=wave, age.3=age, sex.3=sex, relstat.3=relstat, hlt.3=hlt, sat.3=sat)
wave4b <- wave4a %>%
  rename(wave.4=wave, age.4=age, sex.4=sex, relstat.4=relstat, hlt.4=hlt, sat.4=sat)
wave5b <- wave5a %>%
  rename(wave.5=wave, age.5=age, sex.5=sex, relstat.5=relstat, hlt.5=hlt, sat.5=sat)
wave6b <- wave6a %>%
  rename(wave.6=wave, age.6=age, sex.6=sex, relstat.6=relstat, hlt.6=hlt, sat.6=sat)

panel_wide <- full_join(wave1b, wave2b, by = "id") %>% # Full join wave1b and wave2b
  full_join(wave3b, by = "id") %>% # Full join with wave3b
  full_join(wave4b, by = "id") %>% # Full join with wave4b
  full_join(wave5b, by = "id") %>% # Full join with wave5b
  full_join(wave6b, by = "id") # Full join with wave6b
```

---
# Generate wide data, use the join functions

An example of inner join, using `inner_join()`

```r
panel_wide_inner <- inner_join(wave1b, wave2b, by = "id") %>% # Inner join wave1b and wave2b
  inner_join(wave3b, by = "id") %>% # Inner join with wave3b
  inner_join(wave4b, by = "id") %>% # Inner join with wave4b
  inner_join(wave5b, by = "id") %>% # Inner join with wave5b
  inner_join(wave6b, by = "id") # Inner join with wave6b
```

Overview of the two joined datasets

```r
skimr::skim(panel_wide)
skimr::skim(panel_wide_inner)
```

---
# Reshape data long <---> wide

<img src="https://github.com/fancycmn/24-Session8/blob/main/Figure8.JPG?raw=true" width="95%" style="display: block; margin: 30px;">

---
# Reshape data from long to wide

- Long to wide, using `pivot_wider()` from the package "tidyr" (part of "tidyverse")

```r
panel_towide <- pivot_wider(
  panel_long, # dataset to transform
  id_cols = id, # the column that identifies each respondent, here "id"
  names_from = wave, # use "wave" to create one set of columns per wave
  values_from = c(age, sex, relstat, hlt, sat) # the columns whose values fill the new wave-specific columns
)
```

Data of panel_long

<img src="https://github.com/fancycmn/24-Session8/blob/main/Figure9.JPG?raw=true" width="50%" style="display: block; margin: 30px;">

---
# Reshape data from long to wide: what does it look like

Data of panel_long

<img src="https://github.com/fancycmn/24-Session8/blob/main/Figure9.JPG?raw=true" width="60%" style="display: block; margin: 30px;">

Data of panel_towide

<img src="https://github.com/fancycmn/24-Session8/blob/main/Figure10.JPG?raw=true" width="90%" style="display: block; margin: 30px;">

---
# Reshape data from wide to long

- Wide to long, using `merged.stack()` from the package "splitstackshape"

```r
panel_tolong <- merged.stack(panel_wide, # dataset to transform
                             var.stubs = c("age", "sex", "relstat", "hlt", "sat", "wave"),
                             # var.stubs specifies the prefixes of the variable groups
                             sep = ".") # sep specifies the character that separates the "variable name" from the "times" in the source
```

Data of panel_wide

<img src="https://github.com/fancycmn/24-Session8/blob/main/Figure11.JPG?raw=true" width="100%" style="display: block; margin: 30px;">

---
# Reshape data from wide to long: what does it look like

Data of panel_wide

<img src="https://github.com/fancycmn/24-Session8/blob/main/Figure11.JPG?raw=true" width="100%" style="display: block; margin: 30px;">

Data of panel_tolong

<img src="https://github.com/fancycmn/slide-7/blob/main/S7_Pic15.PNG?raw=true" width="80%" style="display: block; margin: 10px;">

---
# Reshape data from wide to long

- Wide to long, using `merged.stack()` from the package "splitstackshape"

```r
panel_tolong <- merged.stack(panel_wide, # dataset to transform
                             var.stubs = c("age", "sex", "relstat", "hlt", "sat", "wave"),
                             # var.stubs specifies the prefixes of the variable groups
                             sep = ".") %>% # sep specifies the character that separates the "variable name" from the "times" in the source
  drop_na(wave) # drop person-wave rows where the respondent did not take part in that wave
```

<img src="https://github.com/fancycmn/slide-7/blob/main/S7_Pic16.PNG?raw=true" width="50%" style="display: block; margin: 10px;">

---
# Take home

- Understand what panel data looks like
- Make long data (person-period data)
- Make wide data (person-level data)
- Reshape long data to wide data
- Reshape wide data to long data
- Important code:
  - `skimr::skim()` to view descriptive statistics
  - `rbind()` to append data
  - join functions, e.g. `inner_join()`, `full_join()`
  - `pivot_wider()` to transform long data to wide data
  - `merged.stack()` to transform wide data to long data

---
class: center, middle

# [Exercise](https://rpubs.com/fancycmn/1236439)
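
---
# Appendix: a toy example

To see how the row counts differ between the approaches above, here is a minimal sketch on made-up data. The datasets `toy1` and `toy2`, their values, and the `names_prefix = "sat_"` argument are assumptions for illustration only and do not come from the course data.

```r
library(dplyr) # joins and rename()
library(tidyr) # pivot_wider()

# Hypothetical toy waves: respondent 3 drops out after wave 1,
# respondent 4 enters in wave 2
toy1 <- tibble(id = c(1, 2, 3), wave = 1, sat = c(7, 5, 8))
toy2 <- tibble(id = c(1, 2, 4), wave = 2, sat = c(6, 5, 9))

# Long data: stack the waves by rows (one row per person-wave)
toy_long <- rbind(toy1, toy2)

# Wide data via joins: rename first so wave-specific columns do not clash
toy1b <- toy1 %>% rename(wave.1 = wave, sat.1 = sat)
toy2b <- toy2 %>% rename(wave.2 = wave, sat.2 = sat)

toy_inner <- inner_join(toy1b, toy2b, by = "id") # keeps only ids present in both waves
toy_full  <- full_join(toy1b, toy2b, by = "id")  # keeps every id, NA where a wave is missing

# Wide data via reshaping the long data
toy_towide <- pivot_wider(
  toy_long,
  id_cols = id,
  names_from = wave,
  values_from = sat,
  names_prefix = "sat_" # with a single value column, add the prefix ourselves
)

nrow(toy_long)   # person-wave rows
nrow(toy_inner)  # respondents observed in both waves
nrow(toy_full)   # respondents observed in at least one wave
names(toy_towide)
```

Comparing `nrow(toy_inner)` with `nrow(toy_full)` is a quick way to check how many respondents are lost when only complete cases are kept.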