library(wooldridge)
library(dplyr)
data("bwght")
total_women <- nrow(bwght)
total_women
## [1] 1388
smoker <- sum(bwght$cigs != 0)
smoker
## [1] 212
There are 1388 observations in the dataset, corresponding to 1388 women in the sample.
Of these, 212 women reported smoking during pregnancy.
avg_cigs_per_day <- mean(bwght$cigs)
avg_cigs_per_day
## [1] 2.087176
The average number of cigarettes smoked per day is 2.087. This is not a good measure of the “typical” woman in this case because the majority of women in the sample (1176 of 1388) did not smoke during pregnancy, so the typical value is zero.
avg_cigs_smoked_women <- sum(bwght$cigs) / smoker
avg_cigs_smoked_women
## [1] 13.66509
Among women who smoked during pregnancy, the average number of cigarettes smoked per day is 13.67. This is higher than the number obtained in part (ii) because women who did not smoke are excluded from the calculation.
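The two averages are linked by the share of smokers: since non-smokers contribute zero cigarettes, the overall mean should equal the smokers' share times the smokers-only mean. A quick arithmetic check using only the figures reported above (the variable names are illustrative and not part of the original analysis):

```r
# Figures copied from the output above
n_total     <- 1388      # women in the sample
n_smokers   <- 212       # women with cigs != 0
avg_smokers <- 13.66509  # mean cigarettes per day among smokers

# Non-smokers contribute 0, so:
# overall mean = share of smokers * smokers-only mean
share_smokers <- n_smokers / n_total
implied_overall_mean <- share_smokers * avg_smokers
implied_overall_mean  # about 2.087, in line with mean(bwght$cigs)
```

Roughly 15% of the women smoked, which is why the overall average is so much lower than the smokers-only average.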
avg_fatheduc <- mean(bwght$fatheduc, na.rm = TRUE)
avg_fatheduc
## [1] 13.18624
The average of father’s years of education is 13.19 years. Only 1192 observations were used to compute this number because fatheduc has 196 NA values, which reduces the sample size from 1388.
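The counts of missing and used observations can be obtained with `is.na()`. The sketch below uses a small hypothetical vector rather than the real data, so its numbers are illustrative only. Applying the same calls to `bwght$fatheduc` is one way to confirm the 196 and 1192 figures.

```r
# Hypothetical toy vector (not from bwght): 5 entries, 2 of them missing
x <- c(12, NA, 16, 14, NA)

n_missing <- sum(is.na(x))          # number of NA values
n_used    <- sum(!is.na(x))         # observations that enter the mean
avg_x     <- mean(x, na.rm = TRUE)  # mean over the non-missing values only

n_missing  # 2
n_used     # 3
avg_x      # 14
```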
# faminc is recorded in thousands of dollars, so multiply by 1000
avg_inc <- scales::dollar(mean(bwght$faminc) * 1000)
avg_inc
## [1] "$29,026.66"
sd_inc <- scales::dollar(sd(bwght$faminc) * 1000)
sd_inc
## [1] "$18,739.28"
The average family income is $29,026.66.
The standard deviation of family income is $18,739.28.
data("meap01")
range(meap01$math4)
## [1] 0 100
The smallest and largest values of math4 are 0 and 100, respectively. This range makes sense because math4 is a pass rate measured in percent, so 0 and 100 are its lowest and highest possible values.
perfct_count <- sum(meap01$math4 == 100)
perfct_count
## [1] 38
perfct_pct <- perfct_count / nrow(meap01)
perfct_pct
## [1] 0.02084476
38 schools have a perfect pass rate on the math test, accounting for 2.08% of the sample.
fifty_count <- sum(meap01$math4 == 50)
fifty_count
## [1] 17
17 schools have a pass rate of exactly 50%.
avg_math_pass <- mean(meap01$math4)
avg_math_pass
## [1] 71.909
avg_read_pass <- mean(meap01$read4)
avg_read_pass
## [1] 60.06188
The average pass rates for the math and reading tests are 71.91% and 60.06%, respectively. Because the average pass rate is lower for reading, the reading test is harder to pass on average.
cor(meap01$math4, meap01$read4)
## [1] 0.8427281
The correlation between math4 and read4 is 0.84, which indicates a strong positive linear relationship between the two variables. Schools with high pass rates on the math test also tend to have high pass rates on the reading test.
# exppp is already measured in dollars per pupil, so no rescaling is needed
avg_exp <- mean(meap01$exppp)
avg_exp
## [1] 5194.865
sd_exp <- sd(meap01$exppp)
sd_exp
## [1] 1091.89
The average per-pupil spending is about $5,195 and the standard deviation is about $1,092. The standard deviation is roughly 21% of the mean, which suggests wide variation in per-pupil spending across schools.
per_stu_exp_a <- 6000
per_stu_exp_b <- 5000
pct_dif <- (per_stu_exp_a - per_stu_exp_b) / per_stu_exp_b *100
pct_dif
## [1] 20
approx_pct <- (log(per_stu_exp_a) - log(per_stu_exp_b)) * 100
approx_pct
## [1] 18.23216
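The gap between the exact figure (20%) and the log approximation (18.23%) reflects the size of the change: 100 times the difference in logs tracks the exact percentage change closely only when the change is small. A minimal sketch of the comparison follows. The helper names `exact_pct` and `log_pct` are hypothetical and not part of the original analysis.

```r
# Exact percentage change from b to a
exact_pct <- function(a, b) (a - b) / b * 100

# Log approximation: 100 * (log(a) - log(b))
log_pct <- function(a, b) (log(a) - log(b)) * 100

# Large change (5000 -> 6000): the approximation is noticeably off
exact_pct(6000, 5000)  # 20
log_pct(6000, 5000)    # about 18.23

# Small change (5000 -> 5050): the two measures nearly coincide
exact_pct(5050, 5000)  # 1
log_pct(5050, 5000)    # about 0.995
```

For an increase, the log difference is always below the exact percentage change, because log(1 + x) is less than x for any positive x.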