Linear regression equation: \(y = b_0 + b_1 \times parity = 120.07 - 1.93 \times x\)
The slope is negative, so y decreases by 1.93 for each unit increase in x (parity) and increases by the same amount when x decreases.
Given the \(Pr_{parity}\) = 0.1052 in the table, we fail to reject \(H_0\) since \(Pr_{parity}\) > 0.05.
It doesn’t seem that the average birth weight and parity have a relationship.
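As a minimal sketch of the two steps above, the fitted line can be evaluated and the p-value compared with the 5% level in R. The names `predict_weight`, `alpha`, and `reject_h0` are illustrative, and the 0/1 coding of parity is an assumption that the text does not state.

```r
# Coefficients of the fitted line reported above
b0 <- 120.07
b1 <- -1.93

# Predicted average birth weight for a given parity value
# (assumption: parity is coded 0 or 1)
predict_weight <- function(parity) b0 + b1 * parity

weight_parity0 <- predict_weight(0)
weight_parity1 <- predict_weight(1)

# Decision rule: reject H0 only when the p-value is below alpha
p_value <- 0.1052
alpha <- 0.05
reject_h0 <- p_value < alpha

print(c(weight_parity0, weight_parity1))
print(reject_h0)
```

Since `p_value` is larger than `alpha`, `reject_h0` is `FALSE`, which matches the conclusion that the data do not show a relationship.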
Researchers interested in the relationship between absenteeism from school and certain demographic characteristics of children collected data from 146 randomly sampled students in rural New South Wales, Australia, in a particular school year. Below are three observations from this data set.
The summary table below shows the results of a linear regression model for predicting the average number of days absent based on ethnic background (eth: 0 - aboriginal, 1 - not aboriginal), sex (sex: 0 - female, 1 - male), and learner status (lrn: 0 - average learner, 1 - slow learner).
Linear regression equation: \(y = b_0 + b_1 \times eth + b_2 \times sex + b_3 \times lrn = 18.93 - 9.11 \times x_1 + 3.10 \times x_2 + 2.15 \times x_3\)
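eth: “ethnic” - if the student is not aboriginal, the average # of absent days decreases by 9.11, holding the other variables constant
sex: “sex” - if the student is male, the average # of absent days increases by 3.1, holding the other variables constant
lrn: “learner” - if the student is a slow learner, the average # of absent days increases by 2.15, holding the other variables constant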
Given:
\(y_{missed}\) = 2
\(x_1\) = 0 (aboriginal = 0 | not-aboriginal = 1)
\(x_2\) = 1 (female = 0 | male = 1)
\(x_3\) = 1 (average-learner = 0 | slow-learner = 1)
y = 18.93 - 9.11 * x_1 + 3.10 * x_2 + 2.15 * x_3
y = 18.93 - 9.11 * 0 + 3.10 * 1 + 2.15 * 1
y = 24.18
residual = \(y_{missed}\) - y = 2 - 24.18 = -22.18
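The prediction and residual can be checked with a short R sketch. The vector names `b` and `x` are illustrative, and the leading 1 in `x` stands for the intercept term.

```r
# Coefficients from the regression equation above
b <- c(intercept = 18.93, eth = -9.11, sex = 3.10, lrn = 2.15)

# Student: aboriginal (eth = 0), male (sex = 1), slow learner (lrn = 1)
x <- c(intercept = 1, eth = 0, sex = 1, lrn = 1)

# Predicted number of days absent
y_hat <- sum(b * x)

# Residual = observed value minus predicted value
y_missed <- 2
residual <- y_missed - y_hat

print(y_hat)
print(residual)
```

The residual is negative because the model overestimates the number of days this student missed.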
\(R^2\) = \(1 - \frac{Var(residual)}{Var(y_i)}\)
\(R^2\) = \(1 - \frac{240.57}{264.17}\)
\(R^2\) = 0.0893364
adjusted:
\(R^2_{adj}\) = \(1 - \frac{Var(residual)}{Var(y_i)} \times \frac{n-1}{n-k-1}\)
\(R^2_{adj}\) = \(1 - \frac{240.57}{264.17} \times \frac{146-1}{146-3-1}\) = 1 - 0.9106636 * 1.0211268
\(R^2_{adj}\) = 0.070097
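The same arithmetic can be reproduced in R as a quick check. This sketch assumes n = 146 students and k = 3 predictors, as given in the problem statement; the variable names are illustrative.

```r
# Variances reported for the model
var_resid <- 240.57   # variance of the residuals
var_y     <- 264.17   # variance of the observed days absent

n <- 146  # number of students
k <- 3    # number of predictors (eth, sex, lrn)

# R^2 = 1 - Var(residual) / Var(y)
r2 <- 1 - var_resid / var_y

# Adjusted R^2 penalizes the residual variance ratio by (n - 1) / (n - k - 1)
r2_adj <- 1 - (var_resid / var_y) * (n - 1) / (n - k - 1)

print(round(r2, 7))
print(round(r2_adj, 6))
```

The penalty factor (n - 1) / (n - k - 1) is greater than 1, so the adjusted value is always below the unadjusted one.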
library(knitr)
Model<-c("Full_model","No_ethnicity","No_sex","No_learner_status")
Adjusted_R_squared<-c(0.0701,-0.0033,0.0676,0.0723)
df<-data.frame(Model,Adjusted_R_squared)
kable(df[rev(order(df$Adjusted_R_squared)),])
| | Model | Adjusted_R_squared |
|---|---|---|
| 4 | No_learner_status | 0.0723 |
| 1 | Full_model | 0.0701 |
| 3 | No_sex | 0.0676 |
| 2 | No_ethnicity | -0.0033 |
We notice that 3 out of 4 models have an \(R^2_{adj}\) between 6.76% and 7.23%: \(6.76\)% \(\le R^2_{adj} \le\) \(7.23\)%, while the model without ethnicity drops to about -0.33%.
Therefore, we can conclude from the list sorted in decreasing order that the “No learner status” model has the highest \(R^2_{adj}\) value (0.0723), which is above the full model (0.0701), so learner status is the variable that should be removed first.
library(ggplot2)
Temperature <- c(53,57,58,63,66,67,67,67,68,69,70,70,70,70,72,73,75,75,76,76,78,79,81)
Damaged <- c(5,1,1,1,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0)
qplot(x = Temperature, y = factor(Damaged), geom = "point") +
geom_jitter(height = 0.4, alpha = 0.5)
By examining the data and looking at the scatterplot, we see that temperature behaves as a predictor of damaged O-rings. Damage is less frequent when the temperature is above 65 degrees, and there are no failures above 75 degrees.
Temperature: for each one-degree increase in temperature, the log-odds of O-ring damage decrease by 0.2162, with a p-value of approximately 0.
\(y = log(\frac{p_i}{1 - p_i}) = b_0 + b_1 \times Temperature\)
y = 11.6630 - 0.2162 * Temperature
\(p_{51}\) = 0.6540297 = 65.4 %
\(p_{53}\) = 0.5509228 = 55.09 %
\(p_{55}\) = 0.4432456 = 44.32 %
Temperature <- c(51,53,55,57,59,61,63,65,67,69,71)
Model_estimated_probabilities <- c(0.654,0.551,0.443,0.341,0.251,0.179,0.124,0.084,0.056,0.037,0.024)
qplot(x = Temperature, y = Model_estimated_probabilities, geom = "point") +
stat_smooth(method = "glm", method.args = list(family = "binomial"), se = FALSE)
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!