C2. Use the data in BWGHT to answer this question.
install.packages("wooldridge")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
library(wooldridge)
data<-wooldridge::bwght
C2.i. How many women are in the sample, and how many report smoking during pregnancy?
women <- nrow(data)
women
## [1] 1388
install.packages("dplyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
nocigaretes <-data %>% filter(cigs==0)
nonsmoking <- nrow(nocigaretes)
nonsmoking
## [1] 1176
C2.ii. What is the average number of cigarettes smoked per day? Is the average a good measure of the “typical” woman in this case? Explain.
average<- mean(data$cigs)
average
## [1] 2.087176
histogram <- hist(data$cigs)
If you compare the average cigaretes per day to the histogram of the
data you’ll get the conclusion that the average is not a suitable
measurement for a typical women. C2.iii.Among women who smoked during
pregnancy, what is the average number of cigarettes smoked per day? How
does this compare with your answer from part (ii), and why?
av_cigar <- data %>% filter(cigs != 0)
mean(av_cigar$cigs)
## [1] 13.66509
There is a big difference between the answers for ii and iii because the total number of women and the number of smokers differ largely. C2.iv.Find the average of fatheduc in the sample. Why are only 1,192 observations used to compute this average?
av_feduc <- mean(data$fatheduc, na.rm= TRUE)
av_feduc
## [1] 13.18624
na_feduc <- sum(is.na(data$fatheduc))
na_feduc
## [1] 196
Because 196 fathers education data is Not Answered. C2.v. Report the average family income and its standard deviation in dollars.
av_faminc <- mean(data$faminc)
std_faminc <- sd(data$faminc)
av_faminc
## [1] 29.02666
std_faminc
## [1] 18.73928
C3.The data in MEAP01 are for the state of Michigan in the year 2001. Use these data to answer the following questions.
data2 <- wooldridge::meap01
C3.i.Find the largest and smallest values of math4. Does the range make sense?
max(data2$math4)
## [1] 100
min(data2$math4)
## [1] 0
This range difference seems too much to be true. C3.ii. How many schools have a perfect pass rate on the math test? What percentage is this of the total sample?
pass100 <- data2 %>% filter(math4==100)
rate100 <- nrow(pass100)
perpass100 <- nrow(pass100)/nrow(data2)*100
rate100
## [1] 38
perpass100
## [1] 2.084476
C3.iii.How many schools have math pass rates of exactly 50%?
pass50 <- data2 %>% filter(math4==50)
rate50 <- nrow(pass50)
rate50
## [1] 17
C3.iv. Compare the average pass rates for the math and reading scores. Which test is harder to pass?
mean(data2$math4)
## [1] 71.909
mean(data2$read4)
## [1] 60.06188
71.909 > 60.06188 Math is harder to pass. C3.v.Find the correlation between math4 and read4. What do you conclude?
cor(data2$math4,data2$read4)
## [1] 0.8427281
There is a strong correlation between math4 and read4. C3.vi.The variable exppp is expenditure per pupil. Find the average of exppp along with its standard deviation. Would you say there is wide variation in per pupil spending?
mean(data2$exppp)
## [1] 5194.865
sd(data2$exppp)
## [1] 1091.89
his <- hist(data2$exppp)
his
## $breaks
## [1] 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 12000
##
## $counts
## [1] 6 20 144 696 594 262 75 17 4 2 3
##
## $density
## [1] 3.291278e-06 1.097093e-05 7.899067e-05 3.817883e-04 3.258365e-04
## [6] 1.437191e-04 4.114098e-05 9.325288e-06 2.194185e-06 1.097093e-06
## [11] 1.645639e-06
##
## $mids
## [1] 1500 2500 3500 4500 5500 6500 7500 8500 9500 10500 11500
##
## $xname
## [1] "data2$exppp"
##
## $equidist
## [1] TRUE
##
## attr(,"class")
## [1] "histogram"
I would say the variation in per pupil spending is not wide. C3.vii. Suppose School A spends $6,000 per student and School B spends $5,500 per student. By what percentage does School A’s spending exceed School B’s? Compare this to 100 • [log(6,000) - log(5,500), which is the approximation percentage difference in the natural logs. if 5,500=100% 6,000=109.09% then 9.1% 100 • [log(6,000) - log(5,500)=3.78% 9.1% > 3.78% —