The data in meap01 are for the state of Michigan in the
year \(2001\). Use these data to answer
the following questions.
Find the largest and smallest values of \(math4\). Does the range make sense? Explain.
How many schools have a perfect pass rate on the math test? What percentage is this of the total sample?
How many schools have math pass rates of exactly \(50\%\)?
Compare the average pass rates for the math and reading scores. Which test is harder to pass?
Find the correlation between \(math4\) and \(read4\). What do you conclude?
The variable \(exppp\) is expenditure per pupil. Find the average of \(exppp\) along with its standard deviation. Would you say there is wide variation in per pupil spending?
Suppose School A spends \(\$6,000\) per student and School B spends \(\$5,500\) per student. By what percentage does School A’s spending exceed School B’s? Compare this to \(100\cdot [\log(6000) - \log(5500)]\), which is the approximate percentage difference based on the difference in the natural logs.
head(meap01$math4)
## [1] 83.3 90.3 61.9 85.7 77.3 85.2
max(meap01$math4)
## [1] 100
min(meap01$math4)
## [1] 0
max(meap01$math4)-min(meap01$math4)
## [1] 100
Yes, the range makes sense. Because \(math4\) is the percentage of students who pass the 4th grade math test, it must lie between 0 and 100, and the sample reaches both bounds: in at least one Michigan school in 2001 every student passed, and in at least one school no student passed. The questions that follow look at the distribution in more detail.
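val_tot <- length(meap01$math4) # total number of schools in the sample
perf_val <- sum(meap01$math4 == 100) # number of schools with a 100% pass rate
perf_val
100 * perf_val/val_tot # share of the total sample, in percent
Here perf_val is the number of schools with a perfect pass rate in math, and the last line expresses that count as a percentage of the 1823 schools in the sample.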
val_req <- sum(meap01$math4 == 50) # number of schools with a pass rate of exactly 50%
val_req
Here val_req is the number of schools whose math pass rate is exactly 50%.
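A count of observations that meet a condition can be obtained by summing a logical vector, because R treats TRUE as 1 and FALSE as 0. The following is a minimal sketch on a small made-up vector; the values in `rates` are hypothetical and are not taken from meap01.

```r
# Hypothetical pass rates, for illustration only (not from meap01)
rates <- c(83.3, 100, 50, 61.9, 100, 77.3, 50, 100)

n_total   <- length(rates)       # number of observations
n_perfect <- sum(rates == 100)   # the comparison gives TRUE/FALSE; sum() counts the TRUEs
n_half    <- sum(rates == 50)    # same idea for a pass rate of exactly 50
pct_perfect <- 100 * n_perfect / n_total  # share of the sample, in percent

n_perfect
n_half
pct_perfect
```

By contrast, an expression such as `rates[100]` selects the element at position 100 rather than the elements equal to 100, so its length is always 1.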
avg1 <- mean(meap01$math4)
avg2 <- mean(meap01$read4)
avg1
## [1] 71.909
avg2
## [1] 60.06188
min(avg1,avg2, na.rm = FALSE)
## [1] 60.06188
Thus the reading test is harder to pass: its average pass rate (about 60.1%) is well below that of the math test (about 71.9%).
# Calculating
# Correlation coefficient
# Using cor() method
result = cor(meap01$math4, meap01$read4, method = "pearson")
result
## [1] 0.8427281
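The correlation coefficient is 0.8427. This indicates a strong positive linear association between the two pass rates: schools with a high pass rate on the math test tend to have a high pass rate on the reading test as well.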
avg1 <- mean(meap01$exppp)
std_exp <- sd(meap01$exppp)
avg1 #average
## [1] 5194.865
std_exp #standard deviation
## [1] 1091.89
Yes, there is fairly wide variation in per pupil spending: the standard deviation of about \(\$1,092\) is roughly 21% of the average of about \(\$5,195\).
actual <- (6000-5500)/5500
log_val <-(log(6000)-log(5500))
actual
## [1] 0.09090909
log_val
## [1] 0.08701138
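Converted to percentages, School A's spending exceeds School B's by 9.09%, while the log approximation gives 8.70%. The gap between the two is:
9.09-8.7
## [1] 0.39
so the log difference understates the exact percentage difference by only about 0.4 percentage points and is a good approximation for a change of this size.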
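The quality of the log approximation depends on the size of the change being approximated. The sketch below compares the exact and approximate percentage differences for a few spending levels relative to 5500; only the value 6000 comes from the question, while 5600 and 8000 are illustrative assumptions, and the helper names `pct_exact` and `pct_log` are hypothetical.

```r
# Exact percentage by which a exceeds b
pct_exact <- function(a, b) 100 * (a - b) / b
# Log approximation to the same quantity
pct_log <- function(a, b) 100 * (log(a) - log(b))

b <- 5500
a <- c(5600, 6000, 8000)  # 6000 is from the question; 5600 and 8000 are illustrative

# Approximation error in percentage points (exact minus approximate)
gap <- pct_exact(a, b) - pct_log(a, b)

round(data.frame(a, exact = pct_exact(a, b), approx = pct_log(a, b), gap), 2)
```

The gap grows with the size of the change, which is why the log difference is usually reserved for small percentage changes.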
In the linear consumption function \[ \widehat{cons} = \hat{\beta}_0 + \hat{\beta}_1inc \] the (estimated) marginal propensity to consume (MPC) out of income is simply the slope, \(\hat{\beta}_1\), while the average propensity to consume (APC) is \(\hat{\beta}_0 / inc + \hat{\beta}_1\). Using observations for 100 families on annual income and consumption (both measured in INR), the following equation is obtained:
\[\begin{eqnarray} \widehat{cons} &=& -10361.72 + 0.8756 \cdot inc \\ n &=& 100, \qquad R^2 = 0.731 \end{eqnarray}\]
Interpret the intercept in this equation, and comment on its sign and magnitude.
What is the predicted monthly consumption when monthly family income is INR 45,000?
With \(inc\) on the \(x\)-axis, draw a graph of the estimated
\(MPC\) and \(APC\). You will need to simulate \(MPC\) and \(APC\) for a range of \(inc\) values and then plot these in
R.
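The intercept is -10361.72: a family with zero income is predicted to consume INR -10,361.72. Negative consumption is impossible, so the sign indicates that this linear function is a poor predictor of consumption at very low income levels. The magnitude is not negligible either: predicted consumption only becomes positive once income exceeds 10361.72/0.8756, which is about INR 11,834.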
inc=45000
cons <- (-10361.72) + 0.8756*inc
cons
## [1] 29040.28
Thus the predicted monthly consumption is INR 29,040.28 when monthly family income is INR 45,000.
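MPC <- 0.8756 # slope of the consumption function, constant in inc
inc1 <- seq(15000, 100000, by = 1000) # simulated range of inc values (INR)
APC <- (-10361.72/inc1) + 0.8756 # APC = intercept/inc + slope, as given in the question
plot(inc1, APC, type = "l", col = "blue", ylim = c(0, 1), xlab = "inc", ylab = "MPC and APC")
abline(h = MPC, col = "red", lty = 2) # MPC is a horizontal line at the slope
legend("bottomright", legend = c("APC", "MPC"), col = c("blue", "red"), lty = c(1, 2))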
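As a concrete point on the APC curve, the estimates can be evaluated at a single income level. The worked sketch below reuses INR 45,000 from the prediction in the previous part; any other positive income could be substituted.

\[ APC(45000) = \frac{-10361.72}{45000} + 0.8756 \approx -0.2303 + 0.8756 = 0.6453, \qquad MPC = 0.8756 \]

Because the intercept is negative, the APC lies below the MPC at every positive income level and rises toward it as \(inc\) grows, since \(\hat{\beta}_0/inc \to 0\).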
Consider the savings function \[ sav = \beta_0 + \beta_1inc + u; \qquad u = \sqrt{inc}\cdot e \] where \(e\) is a random variable with \(E(e) = 0\) and \(Var(e) = \sigma^2_e\). Assume that \(e\) is independent of \(inc\).
To show: \(E(u | inc) = 0\), given that \(e\) is a random variable with \(E(e) = 0\) that is independent of \(inc\).
Conditional on \(inc\), the factor \(\sqrt{inc}\) in \[ u = \sqrt{inc}\cdot e \] is a constant and can be taken outside the expectation. Because \(e\) and \(inc\) are independent, \(E(e | inc) = E(e) = 0\). Therefore \(E(u | inc) = E(\sqrt{inc}\cdot e | inc) = \sqrt{inc}\cdot E(e | inc) = \sqrt{inc}\cdot 0 = 0\), so the zero conditional mean assumption (Assumption SLR.4) holds.
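Similarly, conditional on \(inc\), \(\sqrt{inc}\) is a constant, and a constant factor comes out of a variance squared.
So \(Var(u | inc) = Var(\sqrt{inc}\cdot e | inc) = (\sqrt{inc})^2 \cdot Var(e | inc) = \sigma^2_e \cdot inc\),
using \(Var(e | inc) = Var(e) = \sigma^2_e\), which follows from the independence of \(e\) and \(inc\) and the given variance of \(e\).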
The variance result says that the spread of savings around its conditional mean grows with income. This is plausible: families with low income must spend most of it on necessities, so they have little discretion and their savings cannot vary much. Families with higher income have more discretion over how much to consume and how much to save, so savings differ more widely across them. Thus the variance of savings increases with family income, and predictions of savings are less precise for high-income families.
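The two conditional moments can also be checked numerically. The sketch below is a simulation under stated assumptions: \(e\) is drawn from a normal distribution with standard deviation `sigma_e = 2`, and the income levels 1, 4 and 9 are arbitrary illustrative units; none of these choices come from the exercise, which only specifies \(E(e) = 0\) and \(Var(e) = \sigma^2_e\).

```r
set.seed(1)                 # for reproducibility
sigma_e <- 2                # assumed standard deviation of e (hypothetical)
n <- 100000                 # number of draws per income level
inc_levels <- c(1, 4, 9)    # illustrative income levels (arbitrary units)

# For each fixed income level, draw e independently of inc and build u = sqrt(inc) * e
sim <- sapply(inc_levels, function(inc) {
  e <- rnorm(n, mean = 0, sd = sigma_e)
  u <- sqrt(inc) * e
  c(mean_u = mean(u), var_u = var(u))
})
colnames(sim) <- paste0("inc=", inc_levels)

theory_var <- sigma_e^2 * inc_levels  # sigma_e^2 * inc from the derivation

sim          # simulated conditional means and variances
theory_var   # theoretical conditional variances for comparison
```

The simulated means should be close to zero at every income level, while the simulated variances should scale roughly in proportion to income.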
What is the relationship between the math pass rate, \(math10\), and spending per student, \(expend\)? Let us use the
meap93 data set to find out.
Do you think each additional dollar spent has the same effect on the pass rate, or does a diminishing effect seem more appropriate? Explain.
In the population model \[ math10 = \beta_0 + \beta_1\log(expend) + u \] argue that \(\beta_1/10\) is the percentage point change in \(math10\) given a \(10\%\) increase in \(expend\).
Use the data in meap93 to estimate the model from
part (ii). Report the estimated equation in the usual way, including the
sample size and R-squared.
How big is the estimated spending effect? Namely, if spending increases by \(10\%\), what is the estimated percentage point increase in \(math10\)?
One might worry that regression analysis can produce \(fitted values\) for math10 that are greater than 100. Why is this not much of a worry in this data set?
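A diminishing effect seems more appropriate. At low spending levels an additional dollar is likely to go to basic resources that matter a great deal for test performance, while at high spending levels an additional dollar buys comparatively less improvement (diminishing marginal returns). The pass rate is also bounded above at 100, so it cannot keep rising by the same amount for every extra dollar.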
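This is a level-log model: only the explanatory variable is in logs, while \(math10\) is a percentage measured in levels. Holding \(u\) fixed, \(\Delta math10 = \beta_1 \Delta\log(expend)\), and since \(100\cdot\Delta\log(expend) \approx \%\Delta expend\), we have \(\Delta math10 \approx (\beta_1/100)\cdot\%\Delta expend\). Setting \(\%\Delta expend = 10\) gives a change of \(\beta_1/10\) percentage points in \(math10\).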
summary(lm(math10 ~ lexpend, data= meap93))
##
## Call:
## lm(formula = math10 ~ lexpend, data = meap93)
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.343 -7.100 -0.914 6.148 39.093
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -69.341 26.530 -2.614 0.009290 **
## lexpend 11.164 3.169 3.523 0.000475 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.35 on 406 degrees of freedom
## Multiple R-squared: 0.02966, Adjusted R-squared: 0.02727
## F-statistic: 12.41 on 1 and 406 DF, p-value: 0.0004752
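The estimated equation is \[\begin{eqnarray} \widehat{math10} &=& -69.34 + 11.16 \cdot \log(expend) \\ n &=& 408, \qquad R^2 = 0.0297 \end{eqnarray}\] where the sample size is the 406 residual degrees of freedom plus the 2 estimated parameters.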
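The estimated coefficients also make the diminishing effect of spending concrete. The sketch below evaluates the fitted equation at a few spending levels and compares the gain from the same extra 500 dollars at each one; the coefficients are copied from the regression output, while the spending levels in `base`, the increment of 500 and the helper name `fit` are illustrative assumptions.

```r
b0 <- -69.341   # intercept from the regression output
b1 <- 11.164    # coefficient on log(expend) from the regression output

# Fitted pass rate at a given spending level
fit <- function(expend) b0 + b1 * log(expend)

base <- c(4000, 5000, 6000, 7000)    # illustrative spending levels (assumed)
gain <- fit(base + 500) - fit(base)  # effect of the same extra 500 at each level

data.frame(base, gain = round(gain, 3))
```

The gain shrinks as the base spending level rises, which is the pattern a level-log specification builds in.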
The estimated effect of a 10% increase in spending is \(\hat{\beta}_1/10\) percentage points:
b1 <- 11.164 # estimated coefficient on lexpend
b1/10 # approximate effect of a 10% increase in expend
## [1] 1.1164
b1*log(1.1) # exact change in the fitted value when expend rises by 10%
## [1] 1.064043
So a 10% increase in spending raises the predicted pass rate by about 1.1 percentage points, which is a modest effect.
A fitted value is the value of \(math10\) predicted by the estimated equation for a given school. Since \(math10\) is a percentage, a fitted value above 100 would be nonsensical. That is not much of a worry in this data set: the coefficient on \(\log(expend)\) is modest, and over the range of spending in the sample the fitted values stay far below 100, with the largest about 30.2.
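One way to see how remote a fitted value of 100 is would be to ask what spending level the estimated equation requires to reach it. The sketch below solves \(100 = \hat{\beta}_0 + \hat{\beta}_1\log(expend)\) for \(expend\), using the coefficients from the regression output; the variable name `expend_needed` is hypothetical.

```r
b0 <- -69.341   # intercept from the regression output
b1 <- 11.164    # coefficient on log(expend) from the regression output

# Solve 100 = b0 + b1 * log(expend) for expend
expend_needed <- exp((100 - b0) / b1)
expend_needed
```

The result is on the order of several million dollars per student, far outside any plausible spending level, so the upper bound of 100 does not bind for this model.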