physical_data = read.table(url("https://gksmyth.github.io/ozdasl/oz/physical.txt"), header = T)
physical_data = physical_data[c("Height", "Chest", "Waist", "Fore", "Mass")]
head(physical_data)
## Height Chest Waist Fore Mass
## 1 178 100 85.0 28.5 77.0
## 2 187 107 90.5 29.5 85.5
## 3 175 94 80.5 25.0 63.0
## 4 183 104 91.5 28.5 80.5
## 5 174 107 92.0 28.5 79.5
## 6 180 112 101.0 30.5 94.0
Answer:
pd_model = lm( Mass ~ Height+ Chest+ Waist+ Fore, data = physical_data )
summary(pd_model)
##
## Call:
## lm(formula = Mass ~ Height + Chest + Waist + Fore, data = physical_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8991 -1.2602 0.5069 1.4681 4.0036
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -112.75051 17.08728 -6.599 4.52e-06 ***
## Height 0.27917 0.09668 2.887 0.01023 *
## Chest 0.12336 0.13574 0.909 0.37616
## Waist 0.69605 0.11557 6.023 1.37e-05 ***
## Fore 2.34065 0.51371 4.556 0.00028 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.508 on 17 degrees of freedom
## Multiple R-squared: 0.9576, Adjusted R-squared: 0.9477
## F-statistic: 96.05 on 4 and 17 DF, p-value: 1.955e-11
Answer: The coefficient for Waist in your model indicates the expected change in Mass for a one-unit change in Waist, while holding other factors constant, reflecting its potential impact on body weight.
Answer: The slope coefficient in your regression model indicates how much the Mass of a male subject is expected to change with a one-unit increase in the corresponding physical measurement, holding other measurements constant. This helps to understand the relative importance of each measurement in predicting body mass.
Answer:
# Load the data
physical_data <- read.table(url("https://gksmyth.github.io/ozdasl/oz/physical.txt"), header = TRUE)
physical_data <- physical_data[c("Height", "Chest", "Waist", "Fore", "Mass")]
# Fit the linear regression model
model <- lm(Mass ~ Height + Chest + Waist + Fore, data = physical_data)
# Display the summary of the model
summary(model)
##
## Call:
## lm(formula = Mass ~ Height + Chest + Waist + Fore, data = physical_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8991 -1.2602 0.5069 1.4681 4.0036
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -112.75051 17.08728 -6.599 4.52e-06 ***
## Height 0.27917 0.09668 2.887 0.01023 *
## Chest 0.12336 0.13574 0.909 0.37616
## Waist 0.69605 0.11557 6.023 1.37e-05 ***
## Fore 2.34065 0.51371 4.556 0.00028 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.508 on 17 degrees of freedom
## Multiple R-squared: 0.9576, Adjusted R-squared: 0.9477
## F-statistic: 96.05 on 4 and 17 DF, p-value: 1.955e-11
# Get the residual standard error
residual_standard_error <- summary(model)$sigma
# Calculate the error variance
error_variance <- residual_standard_error^2
# Print the results
cat("Estimated Error Standard Deviation:", residual_standard_error, "\n")
## Estimated Error Standard Deviation: 2.508251
cat("Estimated Error Variance:", error_variance, "\n")
## Estimated Error Variance: 6.291321
Answer:
# Load the data
physical_data <- read.table(url("https://gksmyth.github.io/ozdasl/oz/physical.txt"), header = TRUE)
physical_data <- physical_data[c("Height", "Chest", "Waist", "Fore", "Mass")]
# Fit the linear regression model
model <- lm(Mass ~ Height + Chest + Waist + Fore, data = physical_data)
# Display the summary of the model
model_summary <- summary(model)
print(model_summary)
##
## Call:
## lm(formula = Mass ~ Height + Chest + Waist + Fore, data = physical_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8991 -1.2602 0.5069 1.4681 4.0036
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -112.75051 17.08728 -6.599 4.52e-06 ***
## Height 0.27917 0.09668 2.887 0.01023 *
## Chest 0.12336 0.13574 0.909 0.37616
## Waist 0.69605 0.11557 6.023 1.37e-05 ***
## Fore 2.34065 0.51371 4.556 0.00028 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.508 on 17 degrees of freedom
## Multiple R-squared: 0.9576, Adjusted R-squared: 0.9477
## F-statistic: 96.05 on 4 and 17 DF, p-value: 1.955e-11
# Get R-squared and Adjusted R-squared
r_squared <- model_summary$r.squared
adjusted_r_squared <- model_summary$adj.r.squared
# Print the results
cat("R-squared:", r_squared, "\n")
## R-squared: 0.9576283
cat("Adjusted R-squared:", adjusted_r_squared, "\n")
## Adjusted R-squared: 0.9476585
Answer: \(H_0\): β 1=β 2=β 3=β 4 =0 \(H_A\): At least one βi not equal 0
Fail to reject the null hypothesis.
# Load the data
physical_data <- read.table(url("https://gksmyth.github.io/ozdasl/oz/physical.txt"), header = TRUE)
physical_data <- physical_data[c("Height", "Chest", "Waist", "Fore", "Mass")]
# Fit the linear regression model
model <- lm(Mass ~ Height + Chest + Waist + Fore, data = physical_data)
# Display the summary of the model
model_summary <- summary(model)
print(model_summary)
##
## Call:
## lm(formula = Mass ~ Height + Chest + Waist + Fore, data = physical_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8991 -1.2602 0.5069 1.4681 4.0036
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -112.75051 17.08728 -6.599 4.52e-06 ***
## Height 0.27917 0.09668 2.887 0.01023 *
## Chest 0.12336 0.13574 0.909 0.37616
## Waist 0.69605 0.11557 6.023 1.37e-05 ***
## Fore 2.34065 0.51371 4.556 0.00028 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.508 on 17 degrees of freedom
## Multiple R-squared: 0.9576, Adjusted R-squared: 0.9477
## F-statistic: 96.05 on 4 and 17 DF, p-value: 1.955e-11
Answer: \(H_0\): βChest =0 \(H_A\): βChestnot =0. p-value: 1.955e-11; Fail to reject hypothesis.
# Load the data
physical_data <- read.table(url("https://gksmyth.github.io/ozdasl/oz/physical.txt"), header = TRUE)
physical_data <- physical_data[c("Height", "Chest", "Waist", "Fore", "Mass")]
# Fit the linear regression model
model <- lm(Mass ~ Height + Chest + Waist + Fore, data = physical_data)
# Display the summary of the model
model_summary <- summary(model)
print(model_summary)
##
## Call:
## lm(formula = Mass ~ Height + Chest + Waist + Fore, data = physical_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8991 -1.2602 0.5069 1.4681 4.0036
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -112.75051 17.08728 -6.599 4.52e-06 ***
## Height 0.27917 0.09668 2.887 0.01023 *
## Chest 0.12336 0.13574 0.909 0.37616
## Waist 0.69605 0.11557 6.023 1.37e-05 ***
## Fore 2.34065 0.51371 4.556 0.00028 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.508 on 17 degrees of freedom
## Multiple R-squared: 0.9576, Adjusted R-squared: 0.9477
## F-statistic: 96.05 on 4 and 17 DF, p-value: 1.955e-11
Answer:
# Load the data
physical_data <- read.table(url("https://gksmyth.github.io/ozdasl/oz/physical.txt"), header = TRUE)
physical_data <- physical_data[c("Height", "Chest", "Waist", "Fore", "Mass")]
# Fit the linear regression model
model <- lm(Mass ~ Height + Chest + Waist + Fore, data = physical_data)
# Display the summary of the model
model_summary <- summary(model)
print(model_summary)
##
## Call:
## lm(formula = Mass ~ Height + Chest + Waist + Fore, data = physical_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8991 -1.2602 0.5069 1.4681 4.0036
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -112.75051 17.08728 -6.599 4.52e-06 ***
## Height 0.27917 0.09668 2.887 0.01023 *
## Chest 0.12336 0.13574 0.909 0.37616
## Waist 0.69605 0.11557 6.023 1.37e-05 ***
## Fore 2.34065 0.51371 4.556 0.00028 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.508 on 17 degrees of freedom
## Multiple R-squared: 0.9576, Adjusted R-squared: 0.9477
## F-statistic: 96.05 on 4 and 17 DF, p-value: 1.955e-11
# Calculate the 95% confidence intervals
conf_intervals <- confint(model, level = 0.95)
print(conf_intervals)
## 2.5 % 97.5 %
## (Intercept) -148.80152106 -76.6994969
## Height 0.07518816 0.4831559
## Chest -0.16302096 0.4097352
## Waist 0.45221337 0.9398799
## Fore 1.25682394 3.4244712
Answer: Yes, multicollinearity in the following data.
# Load the data
physical_data <- read.table(url("https://gksmyth.github.io/ozdasl/oz/physical.txt"), header = TRUE)
physical_data <- physical_data[c("Height", "Chest", "Waist", "Fore", "Mass")]
# Calculate the correlation matrix for the predictor variables
correlation_matrix <- cor(physical_data[c("Height", "Chest", "Waist", "Fore")])
print(correlation_matrix)
## Height Chest Waist Fore
## Height 1.0000000 0.2016911 0.3512769 0.3536844
## Chest 0.2016911 1.0000000 0.7153804 0.7670703
## Waist 0.3512769 0.7153804 1.0000000 0.7577464
## Fore 0.3536844 0.7670703 0.7577464 1.0000000
Answer: No multicollinearlity in the following data.
# Load the data
physical_data <- read.table(url("https://gksmyth.github.io/ozdasl/oz/physical.txt"), header = TRUE)
physical_data <- physical_data[c("Height", "Chest", "Waist", "Fore", "Mass")]
# Fit the linear regression model
model <- lm(Mass ~ Height + Chest + Waist + Fore, data = physical_data)
# Calculate VIF
vif_values <-(model)
print(vif_values)
##
## Call:
## lm(formula = Mass ~ Height + Chest + Waist + Fore, data = physical_data)
##
## Coefficients:
## (Intercept) Height Chest Waist Fore
## -112.7505 0.2792 0.1234 0.6960 2.3406
Answer: Yes the residuals follow normality. The points on the QQ-plot lie approximately along the reference line, it indicates that the residuals follow a normal distribution.
# Load the data
physical_data <- read.table(url("https://gksmyth.github.io/ozdasl/oz/physical.txt"), header = TRUE)
physical_data <- physical_data[c("Height", "Chest", "Waist", "Fore", "Mass")]
# Fit the linear regression model
model <- lm(Mass ~ Height + Chest + Waist + Fore, data = physical_data)
# QQ-plot of residuals
qqnorm(residuals(model))
qqline(residuals(model), col = "red") # Add a reference line
Answer: p-value = 0.5661
# Fit a linear regression model
model <- lm(Mass ~ Height + Chest + Waist + Fore, data = physical_data)
# Extract residuals
residuals <- resid(model)
# Perform the Shapiro-Wilk test
shapiro_test <- shapiro.test(residuals)
# Print the result of the Shapiro-Wilk test
print(shapiro_test)
##
## Shapiro-Wilk normality test
##
## data: residuals
## W = 0.96364, p-value = 0.5661
rabbit_data = read.table(url("http://www.statsci.org/data/oz/rabbit.txt"), header = T)
head(rabbit_data)
## Age Lens
## 1 15 21.66
## 2 15 22.75
## 3 15 22.30
## 4 18 31.25
## 5 28 44.79
## 6 29 40.55
Answer:
# Load the data
rabbit_data <- read.table(url("http://www.statsci.org/data/oz/rabbit.txt"), header = TRUE)
# Create the scatterplot
plot(rabbit_data$Age, rabbit_data$Lens,
main = "Scatterplot of Age vs. Lens Weight",
xlab = "Age (days)",
ylab = "Lens Weight (mg)",
pch = 19,
col = "blue")
Answer:
# Load the data
rabbit_data <- read.table(url("http://www.statsci.org/data/oz/rabbit.txt"), header = TRUE)
# Fit the linear regression model
linear_model <- lm(Lens ~ Age, data = rabbit_data)
# Display the summary of the model
summary(linear_model)
##
## Call:
## lm(formula = Lens ~ Age, data = rabbit_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -64.122 -26.517 5.146 27.214 51.219
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 80.17337 5.83405 13.74 <2e-16 ***
## Age 0.26820 0.01812 14.80 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 32.19 on 69 degrees of freedom
## Multiple R-squared: 0.7605, Adjusted R-squared: 0.757
## F-statistic: 219.1 on 1 and 69 DF, p-value: < 2.2e-16
# Create the residuals vs. fitted plot
plot(linear_model$fitted.values, linear_model$residuals,
main = "Residuals vs. Fitted Values",
xlab = "Fitted Values",
ylab = "Residuals",
pch = 19,
col = "blue")
# Add a horizontal line at y = 0
abline(h = 0, col = "red", lty = 2)
Answer:
# Load the data
rabbit_data <- read.table(url("http://www.statsci.org/data/oz/rabbit.txt"), header = TRUE)
# Fit the polynomial regression model (2nd degree)
polynomial_model <- lm(Lens ~ Age + I(Age^2), data = rabbit_data)
# Display the summary of the polynomial model
summary(polynomial_model)
##
## Call:
## lm(formula = Lens ~ Age + I(Age^2), data = rabbit_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -38.090 -10.583 1.431 9.929 38.668
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.558e+01 3.841e+00 9.264 1.13e-13 ***
## Age 6.978e-01 2.786e-02 25.050 < 2e-16 ***
## I(Age^2) -5.783e-04 3.579e-05 -16.156 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.74 on 68 degrees of freedom
## Multiple R-squared: 0.9505, Adjusted R-squared: 0.949
## F-statistic: 652.9 on 2 and 68 DF, p-value: < 2.2e-16
Answer: Linear = 0.757 Polynomial = 0.949. here our first model is the preferred second by following adjusted \(R^2\).
summary(polynomial_model)
##
## Call:
## lm(formula = Lens ~ Age + I(Age^2), data = rabbit_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -38.090 -10.583 1.431 9.929 38.668
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.558e+01 3.841e+00 9.264 1.13e-13 ***
## Age 6.978e-01 2.786e-02 25.050 < 2e-16 ***
## I(Age^2) -5.783e-04 3.579e-05 -16.156 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.74 on 68 degrees of freedom
## Multiple R-squared: 0.9505, Adjusted R-squared: 0.949
## F-statistic: 652.9 on 2 and 68 DF, p-value: < 2.2e-16
Cowles_data <- read.csv("https://raw.githubusercontent.com/ejvanholm/DataProjects/master/Cowles.csv")
Cowles_data = Cowles_data[c("neuroticism", "extraversion", "volunteer")]
Cowles_data$volunteer = as.numeric(Cowles_data$volunteer == "yes")
Cowles_data[431:437, ]
## neuroticism extraversion volunteer
## 431 10 18 0
## 432 10 14 0
## 433 5 14 0
## 434 8 14 0
## 435 15 16 1
## 436 12 13 1
## 437 9 10 1
Answer: Logistic regression is the appropriate choice for this analysis because it handles the binary nature of the response variable appropriately, adheres to the assumptions required for valid inference, and provides meaningful interpretations in the context of predicting probabilities.
Answer:
logistic_model = glm(volunteer ~ neuroticism + extraversion, data = Cowles_data, family = "binomial")
summary(logistic_model)
##
## Call:
## glm(formula = volunteer ~ neuroticism + extraversion, family = "binomial",
## data = Cowles_data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.28106 0.23682 -5.409 6.32e-08 ***
## neuroticism 0.01078 0.01115 0.967 0.333
## extraversion 0.06705 0.01423 4.712 2.45e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1933.5 on 1420 degrees of freedom
## Residual deviance: 1910.5 on 1418 degrees of freedom
## AIC: 1916.5
##
## Number of Fisher Scoring iterations: 4
Answer: logit(P)=−1.5+0.03×neuroticism+0.02×extraversion
Answer:
# Load the data
Cowles_data <- read.csv("https://raw.githubusercontent.com/ejvanholm/DataProjects/master/Cowles.csv")
# Clean the data for our use
Cowles_data = Cowles_data[c("neuroticism", "extraversion", "volunteer")]
Cowles_data$volunteer = as.numeric(Cowles_data$volunteer == "yes")
# Fit the initial logistic regression model
full_model = glm(volunteer ~ neuroticism + extraversion, data = Cowles_data, family = "binomial")
# Show the summary of the full model
summary(full_model)
##
## Call:
## glm(formula = volunteer ~ neuroticism + extraversion, family = "binomial",
## data = Cowles_data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.28106 0.23682 -5.409 6.32e-08 ***
## neuroticism 0.01078 0.01115 0.967 0.333
## extraversion 0.06705 0.01423 4.712 2.45e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1933.5 on 1420 degrees of freedom
## Residual deviance: 1910.5 on 1418 degrees of freedom
## AIC: 1916.5
##
## Number of Fisher Scoring iterations: 4
# Perform backward elimination
final_model <- step(full_model, direction = "backward")
## Start: AIC=1916.55
## volunteer ~ neuroticism + extraversion
##
## Df Deviance AIC
## - neuroticism 1 1911.5 1915.5
## <none> 1910.5 1916.5
## - extraversion 1 1933.3 1937.3
##
## Step: AIC=1915.48
## volunteer ~ extraversion
##
## Df Deviance AIC
## <none> 1911.5 1915.5
## - extraversion 1 1933.5 1935.5
# Display the summary of the final model
summary(final_model)
##
## Call:
## glm(formula = volunteer ~ extraversion, family = "binomial",
## data = Cowles_data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.13942 0.18538 -6.146 7.93e-10 ***
## extraversion 0.06561 0.01414 4.640 3.49e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1933.5 on 1420 degrees of freedom
## Residual deviance: 1911.5 on 1419 degrees of freedom
## AIC: 1915.5
##
## Number of Fisher Scoring iterations: 4
Answer:
# Load necessary library
# install.packages("MASS") # Uncomment if you need to install the package
library(MASS)
# Load the dataset
Cowles_data <- read.csv("https://raw.githubusercontent.com/ejvanholm/DataProjects/master/Cowles.csv")
Cowles_data = Cowles_data[c("neuroticism", "extraversion", "volunteer")]
Cowles_data$volunteer = as.numeric(Cowles_data$volunteer == "yes")
# Fit the full model
full_model <- glm(volunteer ~ neuroticism + extraversion, data = Cowles_data, family = binomial)
# Display the summary of the full model
summary(full_model)
##
## Call:
## glm(formula = volunteer ~ neuroticism + extraversion, family = binomial,
## data = Cowles_data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.28106 0.23682 -5.409 6.32e-08 ***
## neuroticism 0.01078 0.01115 0.967 0.333
## extraversion 0.06705 0.01423 4.712 2.45e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1933.5 on 1420 degrees of freedom
## Residual deviance: 1910.5 on 1418 degrees of freedom
## AIC: 1916.5
##
## Number of Fisher Scoring iterations: 4
# Perform backward selection
final_model <- step(full_model, direction = "backward")
## Start: AIC=1916.55
## volunteer ~ neuroticism + extraversion
##
## Df Deviance AIC
## - neuroticism 1 1911.5 1915.5
## <none> 1910.5 1916.5
## - extraversion 1 1933.3 1937.3
##
## Step: AIC=1915.48
## volunteer ~ extraversion
##
## Df Deviance AIC
## <none> 1911.5 1915.5
## - extraversion 1 1933.5 1935.5
# Display the summary of the final model
summary(final_model)
##
## Call:
## glm(formula = volunteer ~ extraversion, family = binomial, data = Cowles_data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.13942 0.18538 -6.146 7.93e-10 ***
## extraversion 0.06561 0.01414 4.640 3.49e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1933.5 on 1420 degrees of freedom
## Residual deviance: 1911.5 on 1419 degrees of freedom
## AIC: 1915.5
##
## Number of Fisher Scoring iterations: 4
Courtesy: I am thankful to Dr. Eric van Holm’s Introduction to Research Methods.↩︎