The fertility data set contains data on fertility and labor supply from the 1980 US Census. It covers married women aged 21-35 with two or more children and has 254,654 observations.
fertility <- read.csv("https://raw.githubusercontent.com/gc521/RWorshhops/Assignment2/Fertility.csv") #Read CSV from GitHub
summary(fertility)
## X morekids gender1 gender2
## Min. : 1 Length:254654 Length:254654 Length:254654
## 1st Qu.: 63664 Class :character Class :character Class :character
## Median :127328 Mode :character Mode :character Mode :character
## Mean :127328
## 3rd Qu.:190991
## Max. :254654
## age afam hispanic other
## Min. :21.00 Length:254654 Length:254654 Length:254654
## 1st Qu.:28.00 Class :character Class :character Class :character
## Median :31.00 Mode :character Mode :character Mode :character
## Mean :30.39
## 3rd Qu.:33.00
## Max. :35.00
## work
## Min. : 0.00
## 1st Qu.: 0.00
## Median : 5.00
## Mean :19.02
## 3rd Qu.:44.00
## Max. :52.00
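For the character columns, summary() reports only Length/Class/Mode, so it says nothing about how many answers are yes or no. The sketch below converts the yes/no columns (morekids, afam, hispanic, other) to logical so that summary() counts them. It assumes these columns hold only the strings "yes" and "no", as the output above suggests. The small data frame toy is hypothetical stand-in data, not rows from the file.

```r
# Hypothetical stand-in with the same columns as `fertility` (not real rows)
toy <- data.frame(
  morekids = c("yes", "no", "no", "yes"),
  gender1  = c("male", "female", "male", "female"),
  age      = c(31, 28, 35, 30),
  afam     = c("no", "no", "yes", "no"),
  hispanic = c("no", "yes", "no", "no"),
  other    = c("no", "no", "no", "no"),
  work     = c(0, 52, 38, 0),
  stringsAsFactors = FALSE
)

# Columns stored as "yes"/"no" strings
yes_no_cols <- c("morekids", "afam", "hispanic", "other")

# Replace each of them with TRUE/FALSE
toy[yes_no_cols] <- lapply(toy[yes_no_cols], function(v) v == "yes")

# summary() now reports FALSE/TRUE counts for these columns
summary(toy)
```

On the real data, the same two lines would be applied to fertility in place of toy.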
tail(fertility, 10)
## X morekids gender1 gender2 age afam hispanic other work
## 254645 254645 yes female male 31 no no no 0
## 254646 254646 yes male female 30 no no no 0
## 254647 254647 yes male female 31 no no no 52
## 254648 254648 yes female male 34 no no no 0
## 254649 254649 yes male male 28 no no no 0
## 254650 254650 yes female female 35 no no no 0
## 254651 254651 yes male male 29 no no no 0
## 254652 254652 yes female male 34 no no no 38
## 254653 254653 yes female female 30 no no no 26
## 254654 254654 yes female female 35 no no no 0
The gt package provides functions for building display tables. knitr::kable() would also work.
library(dplyr)
library(gt)
# Mean and median of age and work, copied from summary(fertility) above
mean_med_tib <- tibble(Attributes = c("Age", "Work"), Expected_Value = c(30.39, 19.02), Median = c(31.00, 5.00))
gt_tbl <- gt(mean_med_tib)
gt_tbl <- gt_tbl %>%
  tab_header(
    title = md("**Mean and Median**"),
    subtitle = md("Age and Work Attributes"))
# Show the gt Table
gt_tbl
| Attributes | Expected_Value | Median |
|------------|----------------|--------|
| Age        | 30.39          | 31     |
| Work       | 19.02          | 5      |
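The values in mean_med_tib were typed in by hand from the summary() output. The sketch below computes them directly instead, so the table stays in sync with the data. It uses base R only. It runs here on hypothetical vectors; with the real data, fertility$age and fertility$work would replace them. Rounding to two decimals is an assumption chosen to match the table above.

```r
# Hypothetical stand-in vectors; with the real data use fertility$age and fertility$work
age  <- c(31, 30, 31, 34, 28)
work <- c(0, 0, 52, 0, 0)

vars <- list(Age = age, Work = work)

# One row per attribute: mean rounded to 2 decimals, and median
mean_med <- data.frame(
  Attributes       = names(vars),
  Expected_Value   = round(sapply(vars, mean), 2),
  Median           = sapply(vars, median),
  row.names        = NULL,
  stringsAsFactors = FALSE
)

print(mean_med)
```

gt() accepts a plain data frame as well as a tibble, so mean_med could be passed to gt() in place of the hand-typed mean_med_tib.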
Next, drop the X column (a row number) and gender2, then rename the remaining columns.
# Drop X (row number) and gender2 (gender of the second child)
df <- subset(fertility, select = -c(X,gender2))
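# Rename columns (AA = African-American, K = kid). Names are assigned by position:
# morekids -> '<2Kids' (yes = more than two children), gender1 -> 'GenderK1',
# age -> 'Years', afam -> 'MomAA?', hispanic -> 'MomHispanic?',
# other -> 'Mom!=AA|Mom!=Hispanic', work -> 'WeeksWorked'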
colnames(df) <- c('<2Kids', 'GenderK1', 'Years', 'MomAA?', 'MomHispanic?', 'Mom!=AA|Mom!=Hispanic', 'WeeksWorked')
tail(df, 10)
## <2Kids GenderK1 Years MomAA? MomHispanic? Mom!=AA|Mom!=Hispanic
## 254645 yes female 31 no no no
## 254646 yes male 30 no no no
## 254647 yes male 31 no no no
## 254648 yes female 34 no no no
## 254649 yes male 28 no no no
## 254650 yes female 35 no no no
## 254651 yes male 29 no no no
## 254652 yes female 34 no no no
## 254653 yes female 30 no no no
## 254654 yes female 35 no no no
## WeeksWorked
## 254645 0
## 254646 0
## 254647 52
## 254648 0
## 254649 0
## 254650 0
## 254651 0
## 254652 38
## 254653 26
## 254654 0
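colnames(df) <- c(...) assigns names by position, so it would silently mislabel columns if the column order ever changed. The sketch below renames columns by their current names instead, using a named lookup vector. The two-row data frame df_toy and the lookup new_names are hypothetical illustrations.

```r
# Hypothetical two-row stand-in for the data before renaming
df_toy <- data.frame(morekids = c("yes", "no"), age = c(31, 28), work = c(0, 52),
                     stringsAsFactors = FALSE)

# Lookup: current name -> new name (its order need not match the columns)
new_names <- c(work = "WeeksWorked", age = "Years")

# Rename only the columns that appear in the lookup, matched by name
hit <- names(df_toy) %in% names(new_names)
names(df_toy)[hit] <- unname(new_names[names(df_toy)[hit]])

print(names(df_toy))
```

dplyr's rename() does the same job with new_name = old_name pairs. Names such as <2Kids or MomAA? are not syntactic R names, so they must be quoted with backticks when used with $ (for example df$`MomAA?`).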
The same table, now using the renamed attributes:
# Same values, labeled with the new column names
mean_med_tib <- tibble(Attributes = c("Years", "WeeksWorked"), Expected_Value = c(30.39, 19.02), Median = c(31.00, 5.00))
gt_tbl <- gt(mean_med_tib)
gt_tbl <- gt_tbl %>%
  tab_header(
    title = md("**Mean and Median**"),
    subtitle = md("Years and WeeksWorked Attributes"))
# Show the gt Table
gt_tbl
| Attributes  | Expected_Value | Median |
|-------------|----------------|--------|
| Years       | 30.39          | 31     |
| WeeksWorked | 19.02          | 5      |
Next, recode values within a column: the GenderK1 labels male and female become M and F.
# Shorten the first child's gender labels to single letters
df$GenderK1[df$GenderK1 == 'male'] <- 'M'
df$GenderK1[df$GenderK1 == 'female'] <- 'F'
tail(df, 10)
## <2Kids GenderK1 Years MomAA? MomHispanic? Mom!=AA|Mom!=Hispanic
## 254645 yes F 31 no no no
## 254646 yes M 30 no no no
## 254647 yes M 31 no no no
## 254648 yes F 34 no no no
## 254649 yes M 28 no no no
## 254650 yes F 35 no no no
## 254651 yes M 29 no no no
## 254652 yes F 34 no no no
## 254653 yes F 30 no no no
## 254654 yes F 35 no no no
## WeeksWorked
## 254645 0
## 254646 0
## 254647 52
## 254648 0
## 254649 0
## 254650 0
## 254651 0
## 254652 38
## 254653 26
## 254654 0
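Each of the two assignments above recodes one label. The sketch below does the whole recode in one step by indexing a named lookup vector with the original labels. The vector gender is a hypothetical stand-in for df$GenderK1.

```r
# Hypothetical stand-in for df$GenderK1 before recoding
gender <- c("female", "male", "male", "female")

# Lookup: original label -> short code
gender_codes <- c(male = "M", female = "F")

# Index the lookup by the labels; unname() drops the names that indexing carries over
gender <- unname(gender_codes[gender])

print(gender)
```

Unlike the two-assignment version, which leaves unlisted labels unchanged, this version turns any label missing from the lookup into NA. That makes unexpected values easy to spot.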