library(wooldridge)
library(dplyr)

C2: “Birth weight” dataset

(i)

data("bwght")
total_women <- nrow(bwght)
total_women
## [1] 1388
smoker <- sum(bwght$cigs != 0)
smoker
## [1] 212

There are 1388 observations in the dataset, corresponding to 1388 women in the sample.

Of these, 212 women reported smoking during pregnancy.

(ii)

avg_cigs_per_day <- mean(bwght$cigs)
avg_cigs_per_day
## [1] 2.087176

The average number of cigarettes smoked per day is 2.087. This is not a good measure of the “typical” woman, because the majority of women in this sample (1176 of 1388) did not smoke during pregnancy.
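One way to check the claim about the “typical” woman is to look at the median and the share of non-smokers alongside the mean. This is a minimal sketch, assuming the wooldridge package is installed; the names median_cigs and share_nonsmokers are introduced here for illustration only.

```r
library(wooldridge)
data("bwght")

# Median of daily cigarettes: not pulled upward by the minority of smokers
median_cigs <- median(bwght$cigs)

# Share of women reporting zero cigarettes per day
share_nonsmokers <- mean(bwght$cigs == 0)

median_cigs
share_nonsmokers
```

Because more than half of the sample reports zero cigarettes, the median is 0, which describes the typical woman better than the mean does.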

(iii)

avg_cigs_smoked_women <- sum(bwght$cigs) / smoker
avg_cigs_smoked_women
## [1] 13.66509

Among women who smoked during pregnancy, the average number of cigarettes smoked per day is 13.67. This is higher than the number obtained in part (ii) because women who did not smoke are excluded from the calculation.
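The same average can also be obtained by subsetting to smokers directly, which avoids relying on a separately computed count. This is a sketch under the assumption that the wooldridge package is installed; smokers_cigs and avg_cigs_smokers are illustrative names.

```r
library(wooldridge)
data("bwght")

# Keep only women who reported a positive number of cigarettes per day
smokers_cigs <- bwght$cigs[bwght$cigs > 0]

# Mean among smokers; equivalent to sum(cigs) / number of smokers
avg_cigs_smokers <- mean(smokers_cigs)
avg_cigs_smokers
```

Since cigs is never negative, filtering on cigs > 0 selects the same women as cigs != 0.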

(iv)

avg_fatheduc <- mean(bwght$fatheduc, na.rm = TRUE)
avg_fatheduc
## [1] 13.18624

The average of fathers’ years of education is 13.19 years. Only 1192 observations were used to compute this number, because fatheduc has 196 NA values that na.rm = TRUE drops from the calculation.
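The observation counts quoted above can be verified by counting missing and non-missing values directly. A minimal sketch, assuming the wooldridge package is installed; n_missing and n_used are names chosen here for illustration.

```r
library(wooldridge)
data("bwght")

# Number of women with no recorded father's education
n_missing <- sum(is.na(bwght$fatheduc))

# Number of observations that actually enter the mean
n_used <- sum(!is.na(bwght$fatheduc))

n_missing
n_used
```

The two counts should add up to the full sample size returned by nrow(bwght).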

(v)

avg_inc <- scales::dollar(mean(bwght$faminc) * 1000)
avg_inc
## [1] "$29,026.66"
sd_inc <- scales::dollar(sd(bwght$faminc) * 1000)
sd_inc
## [1] "$18,739.28"

The average family income is $29,026.66.
The standard deviation of family income is $18,739.28.

C3: Dataset for the state of Michigan in the year 2001

(i)

data("meap01")
range(meap01$math4)
## [1]   0 100

The smallest and largest values of math4 are 0 and 100, respectively. This range makes sense because math4 is a pass rate expressed as a percentage, so 0 and 100 are the lowest and highest possible values.

(ii)

perfct_count <- sum(meap01$math4 == 100)
perfct_count
## [1] 38
perfct_pct <- perfct_count / nrow(meap01)
perfct_pct
## [1] 0.02084476

38 schools have a perfect pass rate on the math test, accounting for 2.08% of the sample.

(iii)

fifty_count <- sum(meap01$math4 == 50)
fifty_count
## [1] 17

17 schools have a pass rate of exactly 50%.

(iv)

avg_math_pass <- mean(meap01$math4)
avg_math_pass
## [1] 71.909
avg_read_pass <- mean(meap01$read4)
avg_read_pass
## [1] 60.06188

The average pass rates for the math test and the reading test are 71.91% and 60.06%, respectively. On average, the reading test is harder to pass.

(v)

cor(meap01$math4, meap01$read4)
## [1] 0.8427281

The correlation between math4 and read4 is 0.84, which indicates a strong positive relationship between the two variables. Because the data are at the school level, this means that schools with high pass rates on the math test also tend to have high pass rates on the reading test.

(vi)

avg_exp <- scales::dollar(mean(meap01$exppp))
avg_exp
## [1] "$5,194.87"
sd_exp <- scales::dollar(sd(meap01$exppp))
sd_exp
## [1] "$1,091.89"

The average per-pupil spending is $5,194.87 and the standard deviation is $1,091.89. The variable exppp is already measured in dollars, so no rescaling is needed. The standard deviation is about 21% of the mean, which suggests wide variation in per-pupil spending.
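The size of the standard deviation relative to the mean can be made explicit with the coefficient of variation, which does not depend on the units of exppp. A minimal sketch, assuming the wooldridge package is installed; cv_exppp is a name introduced here for illustration.

```r
library(wooldridge)
data("meap01")

# Coefficient of variation: standard deviation as a fraction of the mean
cv_exppp <- sd(meap01$exppp) / mean(meap01$exppp)
cv_exppp
```

A value of roughly 0.21 means a typical school deviates from the average spending level by about a fifth of that average.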

(vii)

per_stu_exp_a <- 6000
per_stu_exp_b <- 5000
pct_dif <- (per_stu_exp_a - per_stu_exp_b) / per_stu_exp_b * 100
pct_dif
## [1] 20
approx_pct <- (log(per_stu_exp_a) - log(per_stu_exp_b)) * 100
approx_pct
## [1] 18.23216
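
The two calculations above compare the exact percentage difference with its log approximation, 100 * [log(a) - log(b)]. The sketch below wraps both in small helper functions and evaluates them for several spending gaps. The names pct_exact and pct_log are hypothetical, and the School A spending levels other than 6000 are made up for illustration.

```r
# Exact percentage by which a exceeds b
pct_exact <- function(a, b) (a - b) / b * 100

# Log approximation to the same percentage difference
pct_log <- function(a, b) (log(a) - log(b)) * 100

# School B fixed at 5000; School A at several illustrative levels
spend_a <- c(5100, 5500, 6000, 7500)
comparison <- data.frame(
  spend_a = spend_a,
  exact   = pct_exact(spend_a, 5000),
  approx  = pct_log(spend_a, 5000)
)

# How far the approximation falls short of the exact figure
comparison$gap <- comparison$exact - comparison$approx
comparison
```

For the 6000 versus 5000 case this reproduces the 20 and 18.23 reported above. The gap column grows with the size of the difference, so the log approximation is most reliable for small percentage changes.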