Task
Create an Example. Using one or more TidyVerse packages, and any dataset from fivethirtyeight.com or Kaggle, create a programming sample “vignette” that demonstrates how to use one or more of the capabilities of the selected TidyVerse package with your selected dataset. (25 points)
Extend an Existing Example. Using one of your classmate’s examples (as created above), extend his or her example with additional annotated code. (15 points)
The dataset: The data source for this project is from Kaggle using the link below: https://www.kaggle.com/theworldbank/sustainable-development-goals
I'm familiar with this dataset. I intend to analyze it and perform data munging, data transformation and visualization with the Tidyverse package.
The data consist of the following scenarios:
SSP1 = low challenges for both climate change adaptation and mitigation resulting from income growth which does not rely heavily on natural resources and technological change, coupled with low fertility rate and high educational attainment.
SSP2 = benchmark scenario and assumes the continuation of current global socioeconomic trends at the global level.
SSP3 = low economic growth coupled with low educational attainment levels and high population growth at the global level are the main elements of the narrative, which is characterized by high mitigation and adaptation challenges.
SSP4 = narrative of worldwide polarization, with high income countries exhibiting relatively high growth rates of income, while developing economies present low levels of education, high fertility and economic stagnation.
SSP5 = high economic growth coupled with high demand for fossil energy from developing economies, thus increasing global carbon dioxide emissions.
Load the Tidyverse Package
library(tidyverse)
## -- Attaching packages ------------------------------------------------------------------------ tidyverse 1.2.1 --
## v ggplot2 3.2.1 v purrr 0.3.2
## v tibble 2.1.3 v dplyr 0.8.3
## v tidyr 0.8.3 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## -- Conflicts --------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Importing the data from my GitHub
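# stringsAsFactors = TRUE reproduces the factor columns shown below on R >= 4.0
SDG <- read.csv("https://raw.githubusercontent.com/Emahayz/Data-607-Class/master/data_poverty_gdppc.csv", header = TRUE, sep = ",", stringsAsFactors = TRUE)
Let's view the structure of the data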
str(SDG)
## 'data.frame': 15980 obs. of 6 variables:
## $ ccode : Factor w/ 188 levels "AFG","AGO","ALB",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ year : int 2015 2015 2015 2015 2015 2016 2016 2016 2016 2016 ...
## $ ssp : Factor w/ 5 levels "SSP1","SSP2",..: 1 2 3 4 5 1 2 3 4 5 ...
## $ hc.source : Factor w/ 2 levels "regression","survey": 1 1 1 1 1 1 1 1 1 1 ...
## $ extreme.Poor: int 12698235 12698235 12698235 12698235 12698235 12888301 12888301 12888301 12888301 12888301 ...
## $ gdp.capita : num 1791 1791 1791 1791 1791 ...
head(SDG)
## ccode year ssp hc.source extreme.Poor gdp.capita
## 1 AFG 2015 SSP1 regression 12698235 1790.51
## 2 AFG 2015 SSP2 regression 12698235 1790.51
## 3 AFG 2015 SSP3 regression 12698235 1790.51
## 4 AFG 2015 SSP4 regression 12698235 1790.51
## 5 AFG 2015 SSP5 regression 12698235 1790.51
## 6 AFG 2016 SSP1 regression 12888301 1780.16
Some Visualization: Scatter Plot
ggplot(SDG, aes(x = gdp.capita, y = extreme.Poor, shape = ssp, color = ssp)) +
  geom_point() +
  labs(
    title = "Global Poverty Projection",
    x = "GDP per Capita", y = "Extreme Poverty"
  )
The scatter plot shows that as GDP per capita decreases, extreme poverty increases, most visibly under scenarios SSP3 to SSP5.
This data is presented in long format, with one row per country, year and scenario. I will transform it to wide format, with one column per scenario.
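One way to check the direction of this relationship numerically, rather than by eye, is a rank correlation per scenario. The sketch below does not use the Kaggle data: `toy` is a small made-up tibble that only borrows the column names of `SDG`, so the values and the resulting correlations are illustrative assumptions.

```r
library(dplyr)

# Hypothetical toy data with the same column names as SDG (values are made up)
toy <- tibble(
  ssp          = rep(c("SSP1", "SSP3"), each = 4),
  gdp.capita   = c(1000, 2000, 4000, 8000, 1000, 2000, 4000, 8000),
  extreme.Poor = c(900, 500, 200, 50, 950, 700, 400, 300)
)

# Spearman rank correlation between GDP per capita and extreme poverty, per scenario
toy_cor <- toy %>%
  group_by(ssp) %>%
  summarise(rho = cor(gdp.capita, extreme.Poor, method = "spearman"))

toy_cor
```

A negative `rho` means poverty falls as GDP per capita rises. Swapping `toy` for `SDG` should apply the same check to the real data.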
SDGWide <- SDG %>% spread(ssp, extreme.Poor)
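In tidyr 1.0.0 and later, `spread()` is superseded by `pivot_wider()`. The session above loads tidyr 0.8.3, so the sketch below assumes a newer tidyr is installed. It uses a made-up tibble, `toy_long`, that borrows the column names of `SDG`; the values are illustrative only.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format rows mimicking SDG (values are illustrative)
toy_long <- tibble(
  ccode        = rep("AFG", 4),
  year         = rep(c(2015, 2016), each = 2),
  ssp          = rep(c("SSP1", "SSP2"), times = 2),
  extreme.Poor = c(100, 110, 120, 130)
)

# One column per scenario, filled with the extreme-poverty headcount
toy_wide <- toy_long %>%
  pivot_wider(names_from = ssp, values_from = extreme.Poor)

toy_wide
```

Here `names_from` plays the role of the `key` argument of `spread()` and `values_from` the role of `value`.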
Cleaning the data by removing the missing values
SDGWide_New <- na.omit(SDGWide)
View the data again
head(SDGWide_New)
## ccode year hc.source gdp.capita SSP1 SSP2 SSP3 SSP4
## 1 AFG 2015 regression 1790.51 12698235 12698235 12698235 12698235
## 2 AFG 2016 regression 1780.16 12888301 12888301 12888301 12888301
## 3 AFG 2017 regression 1790.68 12789616 12789616 12789616 12789616
## 4 AFG 2018 regression 1812.69 12532599 12532599 12532599 12532599
## 5 AFG 2019 regression 1845.41 12149180 12149180 12149180 12149180
## 6 AFG 2020 regression 1888.56 11666924 11666924 11666924 11666924
## SSP5
## 1 12698235
## 2 12888301
## 3 12789616
## 4 12532599
## 5 12149180
## 6 11666924
The average GDP per capita is about $19,106, just under $20,000
summary(SDGWide_New$gdp.capita)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 429.8 3841.4 11977.2 19106.0 27007.5 138404.0
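The mean (19,106) sits well above the median (11,977), which suggests a right-skewed distribution. The same two statistics can be computed with dplyr's `summarise()`. The sketch below is a minimal illustration on a made-up vector, `toy_gdp`, not on `SDGWide_New`.

```r
library(dplyr)

# Hypothetical stand-in for SDGWide_New$gdp.capita (values are made up)
toy_gdp <- tibble(gdp.capita = c(500, 4000, 12000, 27000, 138000))

# Mean and median in one summary row; a mean far above the median hints at right skew
toy_stats <- toy_gdp %>%
  summarise(
    mean_gdp   = mean(gdp.capita),
    median_gdp = median(gdp.capita)
  )

toy_stats
```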
Histogram showing GDP per Capita
hist(SDGWide_New$gdp.capita)
Question: What percentage of these projections were made using a regression model?
table(SDGWide_New$hc.source)
##
## regression survey
## 185 1364
round(prop.table(table(SDGWide_New$hc.source)) * 100, digits = 1)
##
## regression survey
## 11.9 88.1
About 12% of the extreme poverty projections (185 of 1,549 rows) were made using a regression model, while 88% were survey based.
Compare the annual GDP per capita of Nigeria (NGA) and Uganda (UGA). First I need to subset the data and create a data frame for Nigeria and Uganda.
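The same percentages can be obtained the Tidyverse way with `count()` and `mutate()`. The sketch below assumes nothing about the full dataset: it rebuilds a stand-in column, `toy_src`, from the two counts printed above (185 and 1364).

```r
library(dplyr)

# Rebuild the hc.source column from the counts printed above
toy_src <- tibble(
  hc.source = rep(c("regression", "survey"), times = c(185, 1364))
)

# Count rows per source, then convert the counts to percentages
src_pct <- toy_src %>%
  count(hc.source) %>%
  mutate(pct = round(n / sum(n) * 100, 1))

src_pct
```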
SDGdata1 <- subset(SDGWide_New, SDGWide_New$ccode == "NGA" | SDGWide_New$ccode == "UGA")
SDGdata1
## ccode year hc.source gdp.capita SSP1 SSP2 SSP3 SSP4
## 6412 NGA 2015 survey 5638.93 70203439 70203439 70203439 70203439
## 6413 NGA 2016 survey 5409.98 76901167 76901167 76901167 76901167
## 6414 NGA 2017 survey 5317.09 80989339 80989339 80989339 80989339
## 6415 NGA 2018 survey 5282.29 83881602 83881602 83881602 83881602
## 6416 NGA 2019 survey 5247.83 86845070 86845070 86845070 86845070
## 6417 NGA 2020 survey 5211.18 89949154 89949154 89949154 89949154
## 6418 NGA 2021 survey 5176.03 93102799 93102799 93102799 93102799
## 6419 NGA 2022 survey 5142.19 96313552 96313552 96313552 96313552
## 9043 UGA 2015 survey 1928.27 12490438 12490438 12490438 12490438
## 9044 UGA 2016 survey 1953.63 12638075 12638075 12638075 12638075
## 9045 UGA 2017 survey 1985.99 12712471 12712471 12712471 12712471
## 9046 UGA 2018 survey 2033.55 12624153 12624153 12624153 12624153
## 9047 UGA 2019 survey 2090.75 12432557 12432557 12432557 12432557
## 9048 UGA 2020 survey 2155.42 12162143 12162143 12162143 12162143
## 9049 UGA 2021 survey 2234.61 11748949 11748949 11748949 11748949
## 9050 UGA 2022 survey 2341.04 11084169 11084169 11084169 11084169
## SSP5
## 6412 70203439
## 6413 76901167
## 6414 80989339
## 6415 83881602
## 6416 86845070
## 6417 89949154
## 6418 93102799
## 6419 96313552
## 9043 12490438
## 9044 12638075
## 9045 12712471
## 9046 12624153
## 9047 12432557
## 9048 12162143
## 9049 11748949
## 9050 11084169
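The same subset can be written with dplyr's `filter()` and `%in%`, which avoids repeating the data frame name. The sketch below runs on a four-row stand-in, `toy_wide`, whose values are copied from the outputs above; it is not the full `SDGWide_New`.

```r
library(dplyr)

# Four rows copied from the outputs above (a stand-in for SDGWide_New)
toy_wide <- tibble(
  ccode      = c("AFG", "NGA", "UGA", "NGA"),
  year       = c(2015, 2015, 2015, 2016),
  gdp.capita = c(1790.51, 5638.93, 1928.27, 5409.98)
)

# Keep only Nigeria and Uganda
toy_two <- toy_wide %>%
  filter(ccode %in% c("NGA", "UGA"))

toy_two
```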
Some Visualization: Bar Plot of Annual GDP per Capita of Nigeria and Uganda
ggplot(SDGdata1, aes(x = ccode, y = gdp.capita, color = ccode)) +
  geom_bar(stat = "identity", fill = "white") +
  facet_wrap(~year) +
  labs(
    title = "Annual GDP per Capita of Nigeria and Uganda",
    x = "Country Code", y = "GDP per Capita"
  )
The largest differences in poverty headcount and poverty rates across scenarios appear for Sub-Saharan Africa, where the projections for the most optimistic scenario imply over 300 million individuals living in extreme poverty in 2030. The analysis indicates that about 647 million people live in extreme poverty. This implies that the bulk of the poverty reduction challenge is expected to be in Africa, where progress is expected to be slow.
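Headcount totals like these come from summing the extreme poverty column by year and scenario on the long-format data. The sketch below shows that aggregation on a made-up tibble, `toy_long`, with hypothetical country codes `AAA` and `BBB`; the numbers are illustrative and do not reproduce the figures quoted above.

```r
library(dplyr)

# Hypothetical long-format data (country codes and values are made up)
toy_long <- tibble(
  ccode        = rep(c("AAA", "BBB"), each = 4),
  year         = rep(rep(c(2015, 2030), each = 2), times = 2),
  ssp          = rep(c("SSP1", "SSP3"), times = 4),
  extreme.Poor = c(10, 10, 4, 9, 20, 20, 12, 25)
)

# Total extreme-poverty headcount per year and scenario
totals <- toy_long %>%
  group_by(year, ssp) %>%
  summarise(total_poor = sum(extreme.Poor)) %>%
  ungroup()

totals
```

To restrict the totals to one region, a `filter()` on a list of country codes could be added before `group_by()`.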
dat <- as_tibble(read.csv('https://raw.githubusercontent.com/amberferger/DATA607_Masculinity/master/raw-responses.csv'))
str(dat)
## Classes 'tbl_df', 'tbl' and 'data.frame': 1615 obs. of 98 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ StartDate : Factor w/ 1378 levels "5/10/18 10:04",..: 3 4 5 6 7 8 9 1 2 19 ...
## $ EndDate : Factor w/ 1377 levels "5/10/18 10:11",..: 3 4 5 6 7 8 9 1 2 20 ...
## $ q0001 : Factor w/ 5 levels "No answer","Not at all masculine",..: 4 4 5 5 5 5 4 4 5 4 ...
## $ q0002 : Factor w/ 5 levels "No answer","Not at all important",..: 4 4 3 3 5 4 3 4 2 4 ...
## $ q0004_0001 : Factor w/ 2 levels "Father or father figure(s)",..: 2 1 1 1 2 1 1 1 1 1 ...
## $ q0004_0002 : Factor w/ 2 levels "Mother or mother figure(s)",..: 2 2 2 1 2 2 1 2 2 2 ...
## $ q0004_0003 : Factor w/ 2 levels "Not selected",..: 1 1 1 2 2 1 2 1 1 1 ...
## $ q0004_0004 : Factor w/ 2 levels "Not selected",..: 2 1 1 1 1 1 1 2 1 2 ...
## $ q0004_0005 : Factor w/ 2 levels "Friends","Not selected": 2 2 2 2 2 2 1 1 1 2 ...
## $ q0004_0006 : Factor w/ 2 levels "Not selected",..: 1 1 2 1 1 1 1 1 1 2 ...
## $ q0005 : Factor w/ 3 levels "No","No answer",..: 3 3 1 1 3 3 1 3 1 1 ...
## $ q0007_0001 : Factor w/ 6 levels "Never, and not open to it",..: 4 5 6 5 6 2 6 5 6 6 ...
## $ q0007_0002 : Factor w/ 6 levels "Never, and not open to it",..: 4 6 6 5 5 6 6 5 6 5 ...
## $ q0007_0003 : Factor w/ 6 levels "Never, and not open to it",..: 4 2 6 6 1 6 1 1 6 1 ...
## $ q0007_0004 : Factor w/ 6 levels "Never, and not open to it",..: 4 5 5 5 2 5 5 2 6 5 ...
## $ q0007_0005 : Factor w/ 6 levels "Never, and not open to it",..: 1 1 2 5 2 2 5 1 2 1 ...
## $ q0007_0006 : Factor w/ 6 levels "Never, and not open to it",..: 1 5 4 4 6 4 5 4 4 4 ...
## $ q0007_0007 : Factor w/ 6 levels "Never, and not open to it",..: 4 1 1 1 1 1 1 1 1 1 ...
## $ q0007_0008 : Factor w/ 6 levels "Never, and not open to it",..: 6 4 5 1 4 6 6 6 4 4 ...
## $ q0007_0009 : Factor w/ 6 levels "Never, and not open to it",..: 6 1 6 5 5 4 6 6 6 5 ...
## $ q0007_0010 : Factor w/ 6 levels "Never, and not open to it",..: 1 6 5 1 2 2 1 2 1 2 ...
## $ q0007_0011 : Factor w/ 6 levels "Never, and not open to it",..: 4 3 1 1 6 1 5 5 5 6 ...
## $ q0008_0001 : Factor w/ 2 levels "Not selected",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ q0008_0002 : Factor w/ 2 levels "Not selected",..: 1 2 1 1 2 1 2 2 2 2 ...
## $ q0008_0003 : Factor w/ 2 levels "Not selected",..: 2 1 1 1 1 1 1 1 2 1 ...
## $ q0008_0004 : Factor w/ 2 levels "Not selected",..: 1 1 1 1 1 1 1 2 1 2 ...
## $ q0008_0005 : Factor w/ 2 levels "Appearance of your genitalia",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ q0008_0006 : Factor w/ 2 levels "Not selected",..: 1 1 1 1 1 1 1 1 1 2 ...
## $ q0008_0007 : Factor w/ 2 levels "Not selected",..: 1 1 1 1 1 1 1 1 2 2 ...
## $ q0008_0008 : Factor w/ 2 levels "Not selected",..: 1 2 1 1 1 1 1 1 1 1 ...
## $ q0008_0009 : Factor w/ 2 levels "Not selected",..: 2 2 2 1 1 1 1 1 2 2 ...
## $ q0008_0010 : Factor w/ 2 levels "Not selected",..: 2 1 1 1 1 1 2 1 2 1 ...
## $ q0008_0011 : Factor w/ 2 levels "Not selected",..: 1 1 1 1 1 1 2 1 2 1 ...
## $ q0008_0012 : Factor w/ 2 levels "None of the above",..: 2 2 2 1 2 1 2 2 2 2 ...
## $ q0009 : Factor w/ 7 levels "Employed, working full-time",..: 6 4 1 4 1 1 1 4 1 1 ...
## $ q0010_0001 : Factor w/ 2 levels "Men make more money",..: NA NA 2 NA 2 2 2 NA 2 2 ...
## $ q0010_0002 : Factor w/ 2 levels "Men are taken more seriously",..: NA NA 2 NA 2 2 2 NA 2 1 ...
## $ q0010_0003 : Factor w/ 2 levels "Men have more choice",..: NA NA 2 NA 2 2 2 NA 2 2 ...
## $ q0010_0004 : Factor w/ 2 levels "Men have more promotion/professional development opportunities",..: NA NA 2 NA 2 2 2 NA 2 1 ...
## $ q0010_0005 : Factor w/ 2 levels "Men are explicitly praised more often",..: NA NA 2 NA 2 2 2 NA 2 2 ...
## $ q0010_0006 : Factor w/ 2 levels "Men generally have more support from their managers",..: NA NA 2 NA 2 2 2 NA 2 1 ...
## $ q0010_0007 : Factor w/ 2 levels "None of the above",..: NA NA 1 NA 1 2 1 NA 1 2 ...
## $ q0010_0008 : Factor w/ 2 levels "Not selected",..: NA NA 1 NA 1 2 1 NA 1 1 ...
## $ q0011_0001 : Factor w/ 2 levels "Managers want to hire and promote women",..: NA NA 1 NA 2 2 2 NA 2 2 ...
## $ q0011_0002 : Factor w/ 2 levels "Greater risk of being accused of sexual harassment",..: NA NA 2 NA 1 2 2 NA 2 1 ...
## $ q0011_0003 : Factor w/ 2 levels "Greater risk of being accused of being sexist or racist",..: NA NA 2 NA 1 1 2 NA 2 2 ...
## $ q0011_0004 : Factor w/ 2 levels "None of the above",..: NA NA 2 NA 2 2 1 NA 1 2 ...
## $ q0011_0005 : Factor w/ 2 levels "Not selected",..: NA NA 1 NA 1 1 1 NA 1 1 ...
## $ q0012_0001 : Factor w/ 2 levels "Confronted the accused person",..: NA NA 2 NA 2 1 2 NA 2 2 ...
## $ q0012_0002 : Factor w/ 2 levels "Contacted the HR department",..: NA NA 2 NA 2 2 2 NA 2 2 ...
## $ q0012_0003 : Factor w/ 2 levels "Contacted the manager of the accused person",..: NA NA 2 NA 2 2 2 NA 2 2 ...
## $ q0012_0004 : Factor w/ 2 levels "Not selected",..: NA NA 1 NA 1 1 2 NA 1 1 ...
## $ q0012_0005 : Factor w/ 2 levels "Did not respond at all",..: NA NA 2 NA 2 2 2 NA 2 2 ...
## $ q0012_0006 : Factor w/ 2 levels "Never witnessed sexual harassment",..: NA NA 2 NA 1 2 2 NA 1 1 ...
## $ q0012_0007 : Factor w/ 2 levels "Not selected",..: NA NA 2 NA 1 1 1 NA 1 1 ...
## $ q0013 : Factor w/ 6 levels "No answer","Other (please specify)",..: NA NA NA NA NA NA NA NA NA NA ...
## $ q0014 : Factor w/ 5 levels "A lot","No answer",..: NA NA 1 NA 1 4 3 NA 5 5 ...
## $ q0015 : Factor w/ 3 levels "No","No answer",..: NA NA 1 NA 3 1 NA NA 1 1 ...
## $ q0017 : Factor w/ 3 levels "No","No answer",..: 3 1 3 3 1 1 3 3 1 3 ...
## $ q0018 : Factor w/ 6 levels "Always","Never",..: 6 5 6 1 1 1 6 4 1 1 ...
## $ q0019_0001 : Factor w/ 2 levels "It's the right thing to do",..: NA NA NA 1 2 1 NA 2 1 1 ...
## $ q0019_0002 : Factor w/ 2 levels "Not selected",..: NA NA NA 1 1 1 NA 1 1 1 ...
## $ q0019_0003 : Factor w/ 2 levels "Not selected",..: NA NA NA 1 1 1 NA 1 2 2 ...
## $ q0019_0004 : Factor w/ 2 levels "Not selected",..: NA NA NA 1 2 1 NA 2 1 2 ...
## $ q0019_0005 : Factor w/ 2 levels "Not selected",..: NA NA NA 2 1 1 NA 2 2 2 ...
## $ q0019_0006 : Factor w/ 2 levels "Not selected",..: NA NA NA 1 1 1 NA 2 1 1 ...
## $ q0019_0007 : Factor w/ 2 levels "Not selected",..: NA NA NA 1 1 1 NA 1 1 1 ...
## $ q0020_0001 : Factor w/ 2 levels "Not selected",..: 2 1 1 1 1 1 2 2 2 1 ...
## $ q0020_0002 : Factor w/ 2 levels "Ask for a verbal confirmation of consent",..: 1 2 2 2 1 1 1 2 2 1 ...
## $ q0020_0003 : Factor w/ 2 levels "Make a physical move to see how they react",..: 1 2 2 2 2 2 1 2 1 1 ...
## $ q0020_0004 : Factor w/ 2 levels "Every situation is different",..: 1 2 1 2 2 2 1 1 1 1 ...
## $ q0020_0005 : Factor w/ 2 levels "It isn't always clear how to gauge someone's interest",..: 1 2 2 2 2 2 2 1 2 2 ...
## $ q0020_0006 : Factor w/ 2 levels "Not selected",..: 1 2 1 1 1 1 1 1 1 1 ...
## $ q0021_0001 : Factor w/ 2 levels "Not selected",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ q0021_0002 : Factor w/ 2 levels "Not selected",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ q0021_0003 : Factor w/ 2 levels "Contacted a past sexual partner to ask whether you went too far in any of you sexual encounters.",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ q0021_0004 : Factor w/ 2 levels "None of the above",..: 1 1 1 2 1 1 1 1 1 1 ...
## $ q0022 : Factor w/ 3 levels "No","No answer",..: 1 1 1 2 1 1 1 3 1 1 ...
## $ q0024 : Factor w/ 6 levels "Divorced","Married",..: 3 6 2 2 3 2 3 2 2 2 ...
## $ q0025_0001 : Factor w/ 2 levels "Not selected",..: 1 1 1 1 1 1 1 1 2 1 ...
## $ q0025_0002 : Factor w/ 2 levels "Not selected",..: 1 2 2 2 1 2 2 1 1 1 ...
## $ q0025_0003 : Factor w/ 2 levels "No children",..: 1 2 2 2 1 2 2 1 2 1 ...
## $ q0026 : Factor w/ 5 levels "Bisexual","Gay",..: 2 5 5 3 5 5 2 5 5 5 ...
## $ q0028 : Factor w/ 5 levels "Asian","Black",..: 3 5 5 5 5 5 4 5 3 5 ...
## $ q0029 : Factor w/ 6 levels "Associate's degree",..: 2 6 2 6 2 5 5 2 6 5 ...
## $ q0030 : Factor w/ 51 levels "Alabama","Alaska",..: 33 36 23 15 36 15 12 33 5 38 ...
## $ q0034 : Factor w/ 11 levels "$0-$9,999","$10,000-$24,999",..: 1 9 9 9 9 7 8 5 3 5 ...
## $ q0035 : Factor w/ 9 levels "East North Central",..: 3 1 1 1 1 1 8 3 6 6 ...
## $ q0036 : Factor w/ 5 levels "Android Phone / Tablet",..: 5 2 5 5 5 5 5 5 2 2 ...
## $ race2 : Factor w/ 2 levels "Non-white","White": 1 2 2 2 2 2 1 2 1 2 ...
## $ racethn4 : Factor w/ 4 levels "Black","Hispanic",..: 2 4 4 4 4 4 3 4 2 4 ...
## $ educ3 : Factor w/ 3 levels "College or more",..: 1 3 1 3 1 1 1 1 3 1 ...
## $ educ4 : Factor w/ 4 levels "College or more",..: 1 4 1 4 1 3 3 1 4 3 ...
## $ age3 : Factor w/ 3 levels "18 - 34","35 - 64",..: 2 3 2 3 2 3 1 3 2 2 ...
## $ kids : Factor w/ 2 levels "Has children",..: 2 1 1 1 2 1 1 2 1 2 ...
## $ orientation: Factor w/ 4 levels "Gay/Bisexual",..: 1 4 4 2 4 4 1 4 4 4 ...
## $ weight : num 1.714 1.247 0.516 0.601 1.033 ...
head(dat)
## # A tibble: 6 x 98
## X StartDate EndDate q0001 q0002 q0004_0001 q0004_0002 q0004_0003
## <int> <fct> <fct> <fct> <fct> <fct> <fct> <fct>
## 1 1 5/10/18 ~ 5/10/1~ Some~ Some~ Not selec~ Not selec~ Not selec~
## 2 2 5/10/18 ~ 5/10/1~ Some~ Some~ Father or~ Not selec~ Not selec~
## 3 3 5/10/18 ~ 5/10/1~ Very~ Not ~ Father or~ Not selec~ Not selec~
## 4 4 5/10/18 ~ 5/10/1~ Very~ Not ~ Father or~ Mother or~ Other fam~
## 5 5 5/10/18 ~ 5/10/1~ Very~ Very~ Not selec~ Not selec~ Other fam~
## 6 6 5/10/18 ~ 5/10/1~ Very~ Some~ Father or~ Not selec~ Not selec~
## # ... with 90 more variables: q0004_0004 <fct>, q0004_0005 <fct>,
## # q0004_0006 <fct>, q0005 <fct>, q0007_0001 <fct>, q0007_0002 <fct>,
## # q0007_0003 <fct>, q0007_0004 <fct>, q0007_0005 <fct>,
## # q0007_0006 <fct>, q0007_0007 <fct>, q0007_0008 <fct>,
## # q0007_0009 <fct>, q0007_0010 <fct>, q0007_0011 <fct>,
## # q0008_0001 <fct>, q0008_0002 <fct>, q0008_0003 <fct>,
## # q0008_0004 <fct>, q0008_0005 <fct>, q0008_0006 <fct>,
## # q0008_0007 <fct>, q0008_0008 <fct>, q0008_0009 <fct>,
## # q0008_0010 <fct>, q0008_0011 <fct>, q0008_0012 <fct>, q0009 <fct>,
## # q0010_0001 <fct>, q0010_0002 <fct>, q0010_0003 <fct>,
## # q0010_0004 <fct>, q0010_0005 <fct>, q0010_0006 <fct>,
## # q0010_0007 <fct>, q0010_0008 <fct>, q0011_0001 <fct>,
## # q0011_0002 <fct>, q0011_0003 <fct>, q0011_0004 <fct>,
## # q0011_0005 <fct>, q0012_0001 <fct>, q0012_0002 <fct>,
## # q0012_0003 <fct>, q0012_0004 <fct>, q0012_0005 <fct>,
## # q0012_0006 <fct>, q0012_0007 <fct>, q0013 <fct>, q0014 <fct>,
## # q0015 <fct>, q0017 <fct>, q0018 <fct>, q0019_0001 <fct>,
## # q0019_0002 <fct>, q0019_0003 <fct>, q0019_0004 <fct>,
## # q0019_0005 <fct>, q0019_0006 <fct>, q0019_0007 <fct>,
## # q0020_0001 <fct>, q0020_0002 <fct>, q0020_0003 <fct>,
## # q0020_0004 <fct>, q0020_0005 <fct>, q0020_0006 <fct>,
## # q0021_0001 <fct>, q0021_0002 <fct>, q0021_0003 <fct>,
## # q0021_0004 <fct>, q0022 <fct>, q0024 <fct>, q0025_0001 <fct>,
## # q0025_0002 <fct>, q0025_0003 <fct>, q0026 <fct>, q0028 <fct>,
## # q0029 <fct>, q0030 <fct>, q0034 <fct>, q0035 <fct>, q0036 <fct>,
## # race2 <fct>, racethn4 <fct>, educ3 <fct>, educ4 <fct>, age3 <fct>,
## # kids <fct>, orientation <fct>, weight <dbl>
This data has several missing values. I will use the drop_na() function from tidyr (part of the Tidyverse package) to exclude rows with missing values. Only 36 of the 1615 rows are complete across all 98 variables.
dat %>% drop_na()
## # A tibble: 36 x 98
## X StartDate EndDate q0001 q0002 q0004_0001 q0004_0002 q0004_0003
## <int> <fct> <fct> <fct> <fct> <fct> <fct> <fct>
## 1 11 5/11/18 ~ 5/11/1~ Very~ Some~ Father or~ Mother or~ Other fam~
## 2 22 5/11/18 ~ 5/11/1~ Some~ Some~ Not selec~ Not selec~ Other fam~
## 3 55 5/11/18 ~ 5/11/1~ Some~ Not ~ Not selec~ Not selec~ Not selec~
## 4 120 5/13/18 ~ 5/13/1~ Some~ Not ~ Not selec~ Not selec~ Not selec~
## 5 127 5/13/18 ~ 5/13/1~ Some~ Very~ Not selec~ Not selec~ Not selec~
## 6 150 5/14/18 ~ 5/14/1~ Very~ Some~ Father or~ Not selec~ Other fam~
## 7 155 5/14/18 ~ 5/14/1~ Very~ Some~ Father or~ Mother or~ Other fam~
## 8 177 5/14/18 ~ 5/14/1~ Some~ Some~ Father or~ Not selec~ Not selec~
## 9 267 5/15/18 ~ 5/15/1~ Not ~ Not ~ Father or~ Mother or~ Not selec~
## 10 450 5/16/18 ~ 5/16/1~ Some~ Some~ Father or~ Not selec~ Not selec~
## # ... with 26 more rows, and 90 more variables: q0004_0004 <fct>,
## # q0004_0005 <fct>, q0004_0006 <fct>, q0005 <fct>, q0007_0001 <fct>,
## # q0007_0002 <fct>, q0007_0003 <fct>, q0007_0004 <fct>,
## # q0007_0005 <fct>, q0007_0006 <fct>, q0007_0007 <fct>,
## # q0007_0008 <fct>, q0007_0009 <fct>, q0007_0010 <fct>,
## # q0007_0011 <fct>, q0008_0001 <fct>, q0008_0002 <fct>,
## # q0008_0003 <fct>, q0008_0004 <fct>, q0008_0005 <fct>,
## # q0008_0006 <fct>, q0008_0007 <fct>, q0008_0008 <fct>,
## # q0008_0009 <fct>, q0008_0010 <fct>, q0008_0011 <fct>,
## # q0008_0012 <fct>, q0009 <fct>, q0010_0001 <fct>, q0010_0002 <fct>,
## # q0010_0003 <fct>, q0010_0004 <fct>, q0010_0005 <fct>,
## # q0010_0006 <fct>, q0010_0007 <fct>, q0010_0008 <fct>,
## # q0011_0001 <fct>, q0011_0002 <fct>, q0011_0003 <fct>,
## # q0011_0004 <fct>, q0011_0005 <fct>, q0012_0001 <fct>,
## # q0012_0002 <fct>, q0012_0003 <fct>, q0012_0004 <fct>,
## # q0012_0005 <fct>, q0012_0006 <fct>, q0012_0007 <fct>, q0013 <fct>,
## # q0014 <fct>, q0015 <fct>, q0017 <fct>, q0018 <fct>, q0019_0001 <fct>,
## # q0019_0002 <fct>, q0019_0003 <fct>, q0019_0004 <fct>,
## # q0019_0005 <fct>, q0019_0006 <fct>, q0019_0007 <fct>,
## # q0020_0001 <fct>, q0020_0002 <fct>, q0020_0003 <fct>,
## # q0020_0004 <fct>, q0020_0005 <fct>, q0020_0006 <fct>,
## # q0021_0001 <fct>, q0021_0002 <fct>, q0021_0003 <fct>,
## # q0021_0004 <fct>, q0022 <fct>, q0024 <fct>, q0025_0001 <fct>,
## # q0025_0002 <fct>, q0025_0003 <fct>, q0026 <fct>, q0028 <fct>,
## # q0029 <fct>, q0030 <fct>, q0034 <fct>, q0035 <fct>, q0036 <fct>,
## # race2 <fct>, racethn4 <fct>, educ3 <fct>, educ4 <fct>, age3 <fct>,
## # kids <fct>, orientation <fct>, weight <dbl>
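Dropping every row that has a missing value in any of the 98 columns keeps only 36 rows. `drop_na()` also accepts column names, so the filter can be limited to the variables an analysis actually uses. The sketch below shows the difference on a made-up five-row tibble, `toy_survey`; the column names echo the survey but the values are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical mini survey with scattered missing answers
toy_survey <- tibble(
  id     = 1:5,
  q0001  = c("Very", "Somewhat", NA, "Very", "Not very"),
  q0014  = c(NA, "A lot", "Some", NA, "None"),
  weight = c(1.2, 0.8, 1.0, 0.6, 1.4)
)

# Complete cases across all columns
all_complete <- toy_survey %>% drop_na()

# Complete cases for q0001 only; missing values elsewhere are kept
q1_complete <- toy_survey %>% drop_na(q0001)

nrow(all_complete)
nrow(q1_complete)
```

Note that `drop_na()` returns a new tibble, so the result has to be assigned for later steps to use it.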
Some Visualization with ggplot
ggplot(dat, aes(x = weight)) +
  # use the kernel bandwidth estimated by density() as the histogram bin width
  geom_histogram(aes(y = ..density..), binwidth = density(dat$weight)$bw) +
  geom_density(fill = "red", alpha = 0.2) +
  labs(title = "Histogram for Weight", x = "Weight", y = "Density")