library(MASS)
# Load the dataset
data(cpus)
# Explore the structure of the dataset
str(cpus)
## 'data.frame': 209 obs. of 9 variables:
## $ name : Factor w/ 209 levels "ADVISOR 32/60",..: 1 3 2 4 5 6 8 9 10 7 ...
## $ syct : int 125 29 29 29 29 26 23 23 23 23 ...
## $ mmin : int 256 8000 8000 8000 8000 8000 16000 16000 16000 32000 ...
## $ mmax : int 6000 32000 32000 32000 16000 32000 32000 32000 64000 64000 ...
## $ cach : int 256 32 32 32 32 64 64 64 64 128 ...
## $ chmin : int 16 8 8 8 8 8 16 16 16 32 ...
## $ chmax : int 128 32 32 32 16 32 32 32 32 64 ...
## $ perf : int 198 269 220 172 132 318 367 489 636 1144 ...
## $ estperf: int 199 253 253 253 132 290 381 381 749 1238 ...
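# Perform linear regression on all columns (this includes `name`, a factor
# with one level per machine, so the fit below is saturated)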
model <- lm(estperf ~ ., data = cpus)
# Summarize the regression results
summary(model)
##
## Call:
## lm(formula = estperf ~ ., data = cpus)
##
## Residuals:
## ALL 209 residuals are 0: no residual degrees of freedom!
##
## Coefficients: (7 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 199 NaN NaN NaN
## nameAMDAHL 470/7A 54 NaN NaN NaN
## nameAMDAHL 470V/7 54 NaN NaN NaN
## nameAMDAHL 470V/7B 54 NaN NaN NaN
## nameAMDAHL 470V/7C -67 NaN NaN NaN
## nameAMDAHL 470V/8 91 NaN NaN NaN
## nameAMDAHL 580 5880 1039 NaN NaN NaN
## nameAMDAHL 580-5840 182 NaN NaN NaN
## nameAMDAHL 580-5850 182 NaN NaN NaN
## nameAMDAHL 580-5860 550 NaN NaN NaN
## nameAPOLLO DN320 -176 NaN NaN NaN
## nameAPOLLO DN420 -175 NaN NaN NaN
## nameBASF 7/65 -129 NaN NaN NaN
## nameBASF 7/68 -82 NaN NaN NaN
## nameBTI 5000 -184 NaN NaN NaN
## nameBTI 8000 -135 NaN NaN NaN
## nameBURROUGHS B1955 -176 NaN NaN NaN
## nameBURROUGHS B2900 -170 NaN NaN NaN
## nameBURROUGHS B2925 -177 NaN NaN NaN
## nameBURROUGHS B4955 -75 NaN NaN NaN
## nameBURROUGHS B5900 -164 NaN NaN NaN
## nameBURROUGHS B5920 -160 NaN NaN NaN
## nameBURROUGHS B6900 -159 NaN NaN NaN
## nameBURROUGHS B6925 -154 NaN NaN NaN
## nameC.R.D. 68/10-80 -171 NaN NaN NaN
## nameC.R.D. UNIVERSE 2203T -178 NaN NaN NaN
## nameC.R.D. UNIVERSE 68 -171 NaN NaN NaN
## nameC.R.D. UNIVERSE 68/05 -177 NaN NaN NaN
## nameC.R.D. UNIVERSE 68/137 -171 NaN NaN NaN
## nameC.R.D. UNIVERSE 68/37 -172 NaN NaN NaN
## nameCAMBEX 1636-1 -169 NaN NaN NaN
## nameCAMBEX 1636-10 -158 NaN NaN NaN
## nameCAMBEX 1641-1 -125 NaN NaN NaN
## nameCAMBEX 1641-11 -125 NaN NaN NaN
## nameCAMBEX 1651-1 -125 NaN NaN NaN
## nameCDC CYBER 170/750 -97 NaN NaN NaN
## nameCDC CYBER 170/760 -97 NaN NaN NaN
## nameCDC CYBER 170/815 -125 NaN NaN NaN
## nameCDC CYBER 170/825 -125 NaN NaN NaN
## nameCDC CYBER 170/835 -61 NaN NaN NaN
## nameCDC CYBER 170/845 -63 NaN NaN NaN
## nameCDC OMEGA 480-I -176 NaN NaN NaN
## nameCDC OMEGA 480-II -170 NaN NaN NaN
## nameCDC OMEGA 480-III -155 NaN NaN NaN
## nameDEC DECSYS 10 1091 -145 NaN NaN NaN
## nameDEC DECSYS 20 2060 -158 NaN NaN NaN
## nameDEC MICROVAX-1 -181 NaN NaN NaN
## nameDEC VAX 11/730 -171 NaN NaN NaN
## nameDEC VAX 11/750 -163 NaN NaN NaN
## nameDEC VAX 11/780 -161 NaN NaN NaN
## nameDG ECLIPSE C/350 -165 NaN NaN NaN
## nameDG ECLIPSE M/600 -180 NaN NaN NaN
## nameDG ECLIPSE MV/1000 -127 NaN NaN NaN
## nameDG ECLIPSE MV/4000 -163 NaN NaN NaN
## nameDG ECLIPSE MV/6000 -169 NaN NaN NaN
## nameDG ECLIPSE MV/8000 -143 NaN NaN NaN
## nameDG ECLIPSE MV/8000 II -157 NaN NaN NaN
## nameFORMATION F4000/100 -165 NaN NaN NaN
## nameFORMATION F4000/200 -165 NaN NaN NaN
## nameFORMATION F4000/200AP -165 NaN NaN NaN
## nameFORMATION F4000/300 -165 NaN NaN NaN
## nameFORMATION F4000/300AP -165 NaN NaN NaN
## nameFOUR PHASE 2000/260 -180 NaN NaN NaN
## nameGOULD CONCEPT 32/8705 -124 NaN NaN NaN
## nameGOULD CONCEPT 32/8750 -86 NaN NaN NaN
## nameGOULD CONCEPT 32/8780 -42 NaN NaN NaN
## nameHARRIS 100 -176 NaN NaN NaN
## nameHARRIS 300 -174 NaN NaN NaN
## nameHARRIS 500 -147 NaN NaN NaN
## nameHARRIS 600 -172 NaN NaN NaN
## nameHARRIS 700 -149 NaN NaN NaN
## nameHARRIS 80 -181 NaN NaN NaN
## nameHARRIS 800 -146 NaN NaN NaN
## nameHONEYWELL DPS 6/35 -176 NaN NaN NaN
## nameHONEYWELL DPS 6/92 -169 NaN NaN NaN
## nameHONEYWELL DPS 6/96 -126 NaN NaN NaN
## nameHONEYWELL DPS 7/35 -179 NaN NaN NaN
## nameHONEYWELL DPS 7/45 -174 NaN NaN NaN
## nameHONEYWELL DPS 7/55 -171 NaN NaN NaN
## nameHONEYWELL DPS 7/65 -170 NaN NaN NaN
## nameHONEYWELL DPS 8/20 -167 NaN NaN NaN
## nameHONEYWELL DPS 8/44 -167 NaN NaN NaN
## nameHONEYWELL DPS 8/49 -24 NaN NaN NaN
## nameHONEYWELL DPS 8/50 -142 NaN NaN NaN
## nameHONEYWELL DPS 8/52 -18 NaN NaN NaN
## nameHONEYWELL DPS 8/62 -18 NaN NaN NaN
## nameHP 3000/30 -181 NaN NaN NaN
## nameHP 3000/40 -179 NaN NaN NaN
## nameHP 3000/44 -171 NaN NaN NaN
## nameHP 3000/48 -166 NaN NaN NaN
## nameHP 3000/64 -152 NaN NaN NaN
## nameHP 3000/88 -145 NaN NaN NaN
## nameHP 3000/III -179 NaN NaN NaN
## nameIBM 3033 S -117 NaN NaN NaN
## nameIBM 3033 U -28 NaN NaN NaN
## nameIBM 3081 162 NaN NaN NaN
## nameIBM 3081 D 151 NaN NaN NaN
## nameIBM 3083 B 21 NaN NaN NaN
## nameIBM 3083 E -86 NaN NaN NaN
## nameIBM 370/125-2 -184 NaN NaN NaN
## nameIBM 370/148 -178 NaN NaN NaN
## nameIBM 370/158-3 -164 NaN NaN NaN
## nameIBM 38/3 -181 NaN NaN NaN
## nameIBM 38/4 -179 NaN NaN NaN
## nameIBM 38/5 -179 NaN NaN NaN
## nameIBM 38/7 -171 NaN NaN NaN
## nameIBM 38/8 -154 NaN NaN NaN
## nameIBM 4321 -181 NaN NaN NaN
## nameIBM 4331-1 -182 NaN NaN NaN
## nameIBM 4331-11 -173 NaN NaN NaN
## nameIBM 4331-2 -171 NaN NaN NaN
## nameIBM 4341 -171 NaN NaN NaN
## nameIBM 4341-1 -168 NaN NaN NaN
## nameIBM 4341-10 -168 NaN NaN NaN
## nameIBM 4341-11 -157 NaN NaN NaN
## nameIBM 4341-12 -123 NaN NaN NaN
## nameIBM 4341-2 -123 NaN NaN NaN
## nameIBM 4341-9 -173 NaN NaN NaN
## nameIBM 4361-4 -140 NaN NaN NaN
## nameIBM 4361-5 -134 NaN NaN NaN
## nameIBM 4381-1 -98 NaN NaN NaN
## nameIBM 4381-2 -83 NaN NaN NaN
## nameIBM 8130 A -181 NaN NaN NaN
## nameIBM 8130 B -179 NaN NaN NaN
## nameIBM 8140 -179 NaN NaN NaN
## nameIPL 4436 -169 NaN NaN NaN
## nameIPL 4443 -155 NaN NaN NaN
## nameIPL 4445 -155 NaN NaN NaN
## nameIPL 4446 -117 NaN NaN NaN
## nameIPL 4460 -117 NaN NaN NaN
## nameIPL 4480 -71 NaN NaN NaN
## nameMAGNUSON M80/30 -162 NaN NaN NaN
## nameMAGNUSON M80/31 -153 NaN NaN NaN
## nameMAGNUSON M80/32 -153 NaN NaN NaN
## nameMAGNUSON M80/42 -119 NaN NaN NaN
## nameMAGNUSON M80/43 -111 NaN NaN NaN
## nameMAGNUSON M80/44 -111 NaN NaN NaN
## nameMICRODATA SEQ.MS/3200 -166 NaN NaN NaN
## nameNAS AS/3000 -153 NaN NaN NaN
## nameNAS AS/3000 N -170 NaN NaN NaN
## nameNAS AS/5000 -146 NaN NaN NaN
## nameNAS AS/5000 E -146 NaN NaN NaN
## nameNAS AS/5000 N -158 NaN NaN NaN
## nameNAS AS/6130 -113 NaN NaN NaN
## nameNAS AS/6150 -104 NaN NaN NaN
## nameNAS AS/6620 -92 NaN NaN NaN
## nameNAS AS/6630 -82 NaN NaN NaN
## nameNAS AS/6650 -80 NaN NaN NaN
## nameNAS AS/7000 -79 NaN NaN NaN
## nameNAS AS/7000 N -151 NaN NaN NaN
## nameNAS AS/8040 -73 NaN NaN NaN
## nameNAS AS/8050 67 NaN NaN NaN
## nameNAS AS/8060 71 NaN NaN NaN
## nameNAS AS/9000 DPC 227 NaN NaN NaN
## nameNAS AS/9000 N -48 NaN NaN NaN
## nameNAS AS/9040 68 NaN NaN NaN
## nameNAS AS/9060 404 NaN NaN NaN
## nameNCR V8535 II -180 NaN NaN NaN
## nameNCR V8545 II -178 NaN NaN NaN
## nameNCR V8555 II -173 NaN NaN NaN
## nameNCR V8565 II -164 NaN NaN NaN
## nameNCR V8565 II E -158 NaN NaN NaN
## nameNCR V8575 II -152 NaN NaN NaN
## nameNCR V8585 II -137 NaN NaN NaN
## nameNCR V8595 II -121 NaN NaN NaN
## nameNCR V8635 -57 NaN NaN NaN
## nameNCR V8635 -119 NaN NaN NaN
## nameNCR V8650 -119 NaN NaN NaN
## nameNCR V8665 82 NaN NaN NaN
## nameNCR V8670 -9 NaN NaN NaN
## nameNIXDORF 8890/30 -178 NaN NaN NaN
## nameNIXDORF 8890/50 -174 NaN NaN NaN
## nameNIXDORF 8890/70 -132 NaN NaN NaN
## namePERKIN-ELMER 3205 -175 NaN NaN NaN
## namePERKIN-ELMER 3210 -175 NaN NaN NaN
## namePERKIN-ELMER 3230 -135 NaN NaN NaN
## namePRIME 50-2250 -174 NaN NaN NaN
## namePRIME 50-250 II -179 NaN NaN NaN
## namePRIME 50-550 II -170 NaN NaN NaN
## namePRIME 50-750 II -156 NaN NaN NaN
## namePRIME 50-850 II -146 NaN NaN NaN
## nameSIEMENS 7.521 -180 NaN NaN NaN
## nameSIEMENS 7.531 -177 NaN NaN NaN
## nameSIEMENS 7.536 -168 NaN NaN NaN
## nameSIEMENS 7.541 -158 NaN NaN NaN
## nameSIEMENS 7.551 -152 NaN NaN NaN
## nameSIEMENS 7.561 -100 NaN NaN NaN
## nameSIEMENS 7.865-2 -132 NaN NaN NaN
## nameSIEMENS 7.870-2 -118 NaN NaN NaN
## nameSIEMENS 7.872-2 -50 NaN NaN NaN
## nameSIEMENS 7.875-2 -16 NaN NaN NaN
## nameSIEMENS 7.880-2 76 NaN NaN NaN
## nameSIEMENS 7.881-2 183 NaN NaN NaN
## nameSPERRY 1100/61 H1 -143 NaN NaN NaN
## nameSPERRY 1100/81 -17 NaN NaN NaN
## nameSPERRY 1100/82 28 NaN NaN NaN
## nameSPERRY 1100/83 142 NaN NaN NaN
## nameSPERRY 1100/84 161 NaN NaN NaN
## nameSPERRY 1100/93 720 NaN NaN NaN
## nameSPERRY 1100/94 779 NaN NaN NaN
## nameSPERRY 80/3 -175 NaN NaN NaN
## nameSPERRY 80/4 -175 NaN NaN NaN
## nameSPERRY 80/5 -175 NaN NaN NaN
## nameSPERRY 80/6 -175 NaN NaN NaN
## nameSPERRY 80/8 -162 NaN NaN NaN
## nameSPERRY 90/80 MODEL 3 -149 NaN NaN NaN
## nameSTRATUS 32 -158 NaN NaN NaN
## nameWANG VS 90 -174 NaN NaN NaN
## nameWANG VS10 -152 NaN NaN NaN
## syct NA NA NA NA
## mmin NA NA NA NA
## mmax NA NA NA NA
## cach NA NA NA NA
## chmin NA NA NA NA
## chmax NA NA NA NA
## perf NA NA NA NA
##
## Residual standard error: NaN on 0 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: NaN
## F-statistic: NaN on 208 and 0 DF, p-value: NA
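The NaN standard errors above arise because `name` has one level per row, so the model has as many parameters as observations. The following is a minimal check, assuming only the `cpus` data from MASS; the object names `n_obs`, `n_levels`, and `model_no_name` are illustrative and do not appear elsewhere in this report.

```r
library(MASS)
data(cpus)

n_obs    <- nrow(cpus)           # number of machines
n_levels <- nlevels(cpus$name)   # distinct machine names

# One dummy per name (minus the baseline) plus the intercept uses up
# every degree of freedom, leaving none for the residuals.
c(observations = n_obs, name_levels = n_levels)

# Dropping the identifier column leaves a model that can be estimated.
model_no_name <- lm(estperf ~ . - name, data = cpus)
df.residual(model_no_name)
```

With `name` removed, the seven numeric predictors and the intercept leave positive residual degrees of freedom, which is the starting point used for the stepwise selection later on.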
summary(cpus)
## name syct mmin mmax
## ADVISOR 32/60 : 1 Min. : 17.0 Min. : 64 Min. : 64
## AMDAHL 470/7A : 1 1st Qu.: 50.0 1st Qu.: 768 1st Qu.: 4000
## AMDAHL 470V/7 : 1 Median : 110.0 Median : 2000 Median : 8000
## AMDAHL 470V/7B: 1 Mean : 203.8 Mean : 2868 Mean :11796
## AMDAHL 470V/7C: 1 3rd Qu.: 225.0 3rd Qu.: 4000 3rd Qu.:16000
## AMDAHL 470V/8 : 1 Max. :1500.0 Max. :32000 Max. :64000
## (Other) :203
## cach chmin chmax perf
## Min. : 0.00 Min. : 0.000 Min. : 0.00 Min. : 6.0
## 1st Qu.: 0.00 1st Qu.: 1.000 1st Qu.: 5.00 1st Qu.: 27.0
## Median : 8.00 Median : 2.000 Median : 8.00 Median : 50.0
## Mean : 25.21 Mean : 4.699 Mean : 18.27 Mean : 105.6
## 3rd Qu.: 32.00 3rd Qu.: 6.000 3rd Qu.: 24.00 3rd Qu.: 113.0
## Max. :256.00 Max. :52.000 Max. :176.00 Max. :1150.0
##
## estperf
## Min. : 15.00
## 1st Qu.: 28.00
## Median : 45.00
## Mean : 99.33
## 3rd Qu.: 101.00
## Max. :1238.00
##
library(corrplot)
## corrplot 0.92 loaded
# Select numeric variables from the dataset
numeric_cps <- cpus[sapply(cpus, is.numeric)]
# Compute correlation matrix
cor_matrix <- cor(numeric_cps)
# Create a correlation heatmap
corrplot(cor_matrix, method = "color", type = "upper", order = "hclust")
# Load necessary library
library(MASS)
# Load the dataset
data(cpus)
# Fit linear regression model with stepwise variable selection
model_stepwise <- lm(estperf ~ . - name, data = cpus)
# Perform stepwise regression with AIC as the criterion
final_model <- step(model_stepwise, direction = "both", trace = 0)
# Summarize the final model
summary(final_model)
##
## Call:
## lm(formula = estperf ~ syct + mmin + mmax + cach + chmax + perf,
## data = cpus)
##
## Residuals:
## Min 1Q Median 3Q Max
## -117.541 -9.553 2.867 15.213 182.172
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.424e+01 4.713e+00 -7.264 8.01e-12 ***
## syct 3.778e-02 9.403e-03 4.018 8.27e-05 ***
## mmin 5.476e-03 1.100e-03 4.980 1.36e-06 ***
## mmax 3.375e-03 3.959e-04 8.526 3.55e-15 ***
## cach 1.238e-01 7.473e-02 1.656 0.09920 .
## chmax 3.443e-01 1.223e-01 2.814 0.00537 **
## perf 5.770e-01 3.708e-02 15.563 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 31.62 on 202 degrees of freedom
## Multiple R-squared: 0.9595, Adjusted R-squared: 0.9582
## F-statistic: 796.7 on 6 and 202 DF, p-value: < 2.2e-16
Loading Data: We load the cpus dataset from the MASS package.
Initial Model: We create an initial linear regression model (model_stepwise) including all predictors (.) except the name variable.
Stepwise Regression: We use the step() function to perform stepwise selection with AIC as the criterion (direction = "both" allows both forward and backward steps). The trace = 0 argument suppresses the step-by-step output.
Final Model: The resulting final_model keeps syct, mmin, mmax, cach, chmax, and perf; chmin is the only predictor that the stepwise procedure dropped.
Summarize Model: We use summary() to display details of the final model, including coefficients, standard errors, t-values, and p-values.
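To see what the stepwise search actually did, one option is to inspect the `anova` component that step() attaches to its result and to compare AIC before and after. This is a sketch under the same model specification as above; `full_model` and `selected_model` are illustrative names.

```r
library(MASS)
data(cpus)

full_model     <- lm(estperf ~ . - name, data = cpus)
selected_model <- step(full_model, direction = "both", trace = 0)

# step() records the sequence of removals and additions in `anova`
selected_model$anova

# AIC of the starting and selected models; lower is better
AIC(full_model, selected_model)
```

Note that step() ranks models with extractAIC(), which differs from AIC() by a constant for linear models fitted to the same data, so the ordering of the two models is the same either way.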
# Obtain residuals from the final model
residuals <- residuals(final_model)
# Create a residual plot
plot(x = fitted(final_model), y = residuals,
xlab = "Fitted Values", ylab = "Residuals",
main = "Residual Plot")
# Add a horizontal line at y = 0 for reference
abline(h = 0, col = "red", lty = 2)
The residual plot shows residuals centered on the zero line, but their spread is not constant across the fitted values, which points to heteroscedasticity. A few observations also have large residuals (the extremes are about -118 and 182) and may warrant investigation as potential outliers. The model is a reasonable first fit, but its standard errors and p-values should be interpreted with caution until the non-constant variance is addressed, for example by transforming the response.
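A visual impression of non-constant variance can be backed up with a scale-location plot and a rough numeric check. The sketch below refits the selected model and uses only base R; the rank correlation is an informal diagnostic chosen for illustration, not a formal test of heteroscedasticity, and `spread_check` is an illustrative name.

```r
library(MASS)
data(cpus)

fit <- lm(estperf ~ syct + mmin + mmax + cach + chmax + perf, data = cpus)

# Scale-location plot: sqrt(|standardized residuals|) against fitted values.
# A trend in the smoothed line suggests the variance changes with the fit.
plot(fit, which = 3)

# Informal check: rank correlation between |residuals| and fitted values.
spread_check <- cor(abs(residuals(fit)), fitted(fit), method = "spearman")
spread_check
```

A value well away from zero would be consistent with the spread of the residuals depending on the fitted value.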
# Log-transform the response variable
cpus$log_estperf <- log(cpus$estperf)
# Fit a new linear regression model with the transformed response variable
model_transformed <- lm(log_estperf ~ . - name, data = cpus)
# Check the residual plot for heteroscedasticity
residuals_transformed <- residuals(model_transformed)
plot(x = fitted(model_transformed), y = residuals_transformed,
xlab = "Fitted Values", ylab = "Residuals",
main = "Residual Plot (Transformed Model)")
abline(h = 0, col = "red", lty = 2)
summary(model_transformed)
##
## Call:
## lm(formula = log_estperf ~ . - name, data = cpus)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.34248 -0.07284 0.01788 0.06913 0.34862
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.030e+00 1.961e-02 154.526 < 2e-16 ***
## syct -1.834e-04 3.618e-05 -5.068 9.13e-07 ***
## mmin 6.660e-05 4.374e-06 15.225 < 2e-16 ***
## mmax 7.997e-05 1.710e-06 46.774 < 2e-16 ***
## cach 8.782e-03 2.879e-04 30.507 < 2e-16 ***
## chmin 4.237e-03 1.669e-03 2.538 0.0119 *
## chmax 2.741e-03 4.834e-04 5.671 4.92e-08 ***
## perf 9.006e-04 2.034e-04 4.427 1.57e-05 ***
## estperf -4.724e-03 2.603e-04 -18.148 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.117 on 200 degrees of freedom
## Multiple R-squared: 0.985, Adjusted R-squared: 0.9844
## F-statistic: 1642 on 8 and 200 DF, p-value: < 2.2e-16
The residual standard error of 0.117 is on the log scale: a typical prediction is off by about 0.12 log units, or roughly 12% of estperf. The R-squared (0.985) and adjusted R-squared (0.9844) indicate that the model explains nearly all of the variability in log estimated performance, and the F-statistic (1642 on 8 and 200 DF, p < 2.2e-16) shows that the predictors are jointly significant. Note, however, that the formula log_estperf ~ . - name keeps the untransformed estperf among the predictors, so part of this fit comes from regressing the outcome on itself; estperf should be dropped before the model is used for inference.
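The summary above includes estperf as a predictor of its own logarithm, because the `.` in the formula picks up every remaining column. A hedged sketch of a specification that avoids this is shown below; `model_log` is an illustrative name, and no output is reproduced here because this model was not fitted in the original analysis.

```r
library(MASS)
data(cpus)

# Log model with the predictors written out explicitly, so the
# untransformed outcome cannot slip into the right-hand side.
model_log <- lm(log(estperf) ~ syct + mmin + mmax + cach + chmin + chmax + perf,
                data = cpus)
summary(model_log)
```

Writing the predictors out is more verbose than `. - name`, but it makes the design choice visible and is robust to extra columns being added to the data frame.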
###Interpret all variables in your final model using complete sentences, making sure to account for the fact that this may be a multivariable model. Give interpretations in terms of as meaningful of units as possible (it may not be possible to use seconds for cycle time - the answer is too large, but you may use MB instead of kB, for instance). Adjust interpretations as needed, both for units, and the fact that our outcome has been log transformed (how do we get to the raw data values from a log transformation? Start by thinking: what is the inverse of the log function???)
Intercept:
The intercept is approximately 3.030, the expected log estimated performance when every predictor is zero. Applying the inverse of the log, exp(3.030) ≈ 20.7, gives the corresponding value on the original scale. No machine has all predictors equal to zero, so this is an extrapolation rather than a meaningful prediction.
Because the outcome is log-transformed, each coefficient b acts multiplicatively: a c-unit increase in a predictor multiplies the estimated performance by exp(b * c), holding all other variables constant.
syct (cycle time, in nanoseconds): The coefficient is -1.834e-04. Each additional 100 ns of cycle time is associated with a decrease of about 1.8% in estimated performance (exp(-0.01834) ≈ 0.982), holding all other variables constant.
mmin (minimum main memory, in kB): The coefficient is 6.660e-05. Each additional 1,000 kB (about 1 MB) of minimum main memory is associated with an increase of about 6.9% in estimated performance, holding all other variables constant.
mmax (maximum main memory, in kB): The coefficient is 7.997e-05. Each additional 1,000 kB (about 1 MB) of maximum main memory is associated with an increase of about 8.3% in estimated performance, holding all other variables constant.
cach (cache size, in kB): The coefficient is 8.782e-03. Each additional 10 kB of cache is associated with an increase of about 9.2% in estimated performance, holding all other variables constant.
chmin (minimum number of channels): The coefficient is 4.237e-03. Each additional channel is associated with an increase of about 0.4% in estimated performance, holding all other variables constant.
chmax (maximum number of channels): The coefficient is 2.741e-03. Each additional channel is associated with an increase of about 0.3% in estimated performance, holding all other variables constant.
perf (published performance): The coefficient is 9.006e-04. Each additional 100 units of published performance is associated with an increase of about 9.4% in estimated performance, holding all other variables constant.
estperf (estimated performance): The coefficient is -4.724e-03, which taken at face value means that each one-unit increase in estperf lowers the fitted log performance by 0.004724 (about 0.5%). This predictor is the untransformed outcome itself, so the coefficient has no useful interpretation; it signals that estperf should be removed from the right-hand side of the model.
The step sizes used above (100 ns, 1,000 kB, 10 kB, 100 performance units) were chosen only to give readable magnitudes; any other step c can be converted in the same way with exp(b * c).
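The back-transformation from log-scale coefficients to percentage changes can be sketched as follows. The coefficients are copied from the summary of model_transformed above; the step sizes in `step_size` are assumptions chosen for readability (for example 1,000 kB as roughly 1 MB) and are not part of the original analysis.

```r
# Coefficients copied from the model summary (log scale)
coefs <- c(syct = -1.834e-04, mmin = 6.660e-05, mmax = 7.997e-05,
           cach = 8.782e-03, chmin = 4.237e-03, chmax = 2.741e-03,
           perf = 9.006e-04)

# Illustrative step for each predictor, in the predictor's own units
step_size <- c(syct = 100, mmin = 1000, mmax = 1000,
               cach = 10, chmin = 1, chmax = 1, perf = 100)

# Percent change in estperf for one step, other predictors held constant
pct_change <- 100 * (exp(coefs * step_size) - 1)
round(pct_change, 1)

# Intercept on the original scale
exp(3.030)
```

For small coefficients the percent change is close to 100 times the coefficient times the step, which is why the log scale is often read directly as a proportional change.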
# Load necessary library
library(car)
## Loading required package: carData
# Compute VIFs for the log-transformed model
vif_values <- vif(model_transformed)
# Print VIF values
print(vif_values)
## syct mmin mmax cach chmin chmax perf estperf
## 1.347415 4.374518 6.108787 2.078872 1.967216 2.399655 16.268406 24.663399
Based on these VIF values: syct (1.35), cach (2.08), chmin (1.97), and chmax (2.40) show little multicollinearity. mmin (4.37) and mmax (6.11) show a moderate level. perf (16.27) and estperf (24.66) exceed the common rule-of-thumb threshold of 10, indicating high multicollinearity. This is expected: perf and estperf measure nearly the same quantity, and estperf is the untransformed version of the outcome.
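A VIF has a simple definition that can be checked by hand: regress one predictor on all the others and compute 1 / (1 - R²). The sketch below does this for perf using base R only; `aux_fit` and `vif_perf` are illustrative names.

```r
library(MASS)
data(cpus)

# Auxiliary regression: perf on the other predictors of the log model
aux_fit  <- lm(perf ~ syct + mmin + mmax + cach + chmin + chmax + estperf,
               data = cpus)

# VIF = 1 / (1 - R^2) of the auxiliary regression
vif_perf <- 1 / (1 - summary(aux_fit)$r.squared)
vif_perf
```

The result should agree with the perf entry reported by vif() above, since the VIF depends only on the predictors and not on the response.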
# Extract residuals, fitted values, and Cook's distances from the log-transformed model
residuals <- residuals(model_transformed)
fitted_values <- fitted(model_transformed)
cooksd <- cooks.distance(model_transformed)
# Plot standardized residuals vs. fitted values
plot(x = fitted_values, y = rstandard(model_transformed),
xlab = "Fitted Values", ylab = "Standardized Residuals",
main = "Standardized Residuals vs. Fitted Values")
# Add a horizontal line at y = 0 for reference
abline(h = 0, col = "red", lty = 2)
# Identify influential observations using Cook's distance
influential_obs <- which(cooksd > 4 / length(fitted_values))
influential_obs
## 1 9 10 31 32 83 91 92 96 97 138 153 154 157 166 167 198 199 200
## 1 9 10 31 32 83 91 92 96 97 138 153 154 157 166 167 198 199 200
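Yes. Nineteen observations have a Cook's distance above the 4/n cutoff (4/209 ≈ 0.019), so there are potentially influential values in this dataset. The cutoff is a rule of thumb; flagged points deserve inspection rather than automatic removal.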
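Row numbers alone say little about which machines are influential. A possible follow-up, sketched here with the same model specification as model_transformed, ranks the flagged observations and attaches the machine names; `fit`, `threshold`, `flagged`, and `top` are illustrative names.

```r
library(MASS)
data(cpus)

# Same specification as model_transformed above
cpus$log_estperf <- log(cpus$estperf)
fit <- lm(log_estperf ~ . - name, data = cpus)

cooksd    <- cooks.distance(fit)
threshold <- 4 / nrow(cpus)            # rule of thumb used above
flagged   <- cooksd[cooksd > threshold]

# Largest Cook's distances first, labelled by machine name
top <- sort(flagged, decreasing = TRUE)
data.frame(machine = cpus$name[as.integer(names(top))],
           cooks_d = round(unname(top), 3))
```

Sorting by Cook's distance helps separate a handful of dominant points from observations that only just cross the cutoff.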
# Load necessary library
library(MASS)
# Load the dataset
data(birthwt)
# View the structure of the dataset
str(birthwt)
## 'data.frame': 189 obs. of 10 variables:
## $ low : int 0 0 0 0 0 0 0 0 0 0 ...
## $ age : int 19 33 20 21 18 21 22 17 29 26 ...
## $ lwt : int 182 155 105 108 107 124 118 103 123 113 ...
## $ race : int 2 3 1 1 1 3 1 3 1 1 ...
## $ smoke: int 0 0 1 1 1 0 0 0 1 1 ...
## $ ptl : int 0 0 0 0 0 0 0 0 0 0 ...
## $ ht : int 0 0 0 0 0 0 0 0 0 0 ...
## $ ui : int 1 0 0 1 1 0 0 0 0 0 ...
## $ ftv : int 0 3 1 2 0 0 1 1 1 0 ...
## $ bwt : int 2523 2551 2557 2594 2600 2622 2637 2637 2663 2665 ...
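# Fit logistic regression model on all columns (bwt is included, and low is
# defined from bwt, so the classes are perfectly separated; see the warnings below)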
logistic_model <- glm(low ~ ., data = birthwt, family = binomial)
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
# Summarize the model
summary(logistic_model)
##
## Call:
## glm(formula = low ~ ., family = binomial, data = birthwt)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.161e+03 2.074e+05 0.006 0.996
## age 3.223e-01 1.787e+03 0.000 1.000
## lwt -1.733e-01 3.202e+02 -0.001 1.000
## race 6.494e-01 3.165e+04 0.000 1.000
## smoke -1.746e+01 7.668e+04 0.000 1.000
## ptl 1.267e+02 3.406e+05 0.000 1.000
## ht 3.636e+01 1.237e+05 0.000 1.000
## ui -6.183e+01 7.547e+04 -0.001 0.999
## ftv -8.925e+00 1.624e+04 -0.001 1.000
## bwt -4.466e-01 6.468e+01 -0.007 0.994
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2.3467e+02 on 188 degrees of freedom
## Residual deviance: 1.0537e-07 on 179 degrees of freedom
## AIC: 20
##
## Number of Fisher Scoring iterations: 25
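The convergence warnings, the enormous standard errors, and the near-zero residual deviance are typical of perfect separation. The MASS documentation describes low as an indicator of birth weight below 2.5 kg, so bwt alone determines the outcome. A minimal check, with `separated` as an illustrative name:

```r
library(MASS)
data(birthwt)

# low is the indicator for birth weight below 2.5 kg
separated <- all(birthwt$low == as.integer(birthwt$bwt < 2500))
separated

# Range of bwt within each outcome group: the two ranges do not overlap
tapply(birthwt$bwt, birthwt$low, range)
```

Leaving bwt out of the predictors, as the later models in this report do, removes the separation.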
# Fit discriminant analysis model
discriminant_model <- lda(low ~ ., data = birthwt)
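# summary() on an lda object only lists its components; print the object
# itself to see the priors, group means, and discriminant coefficients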
summary(discriminant_model)
## Length Class Mode
## prior 2 -none- numeric
## counts 2 -none- numeric
## means 18 -none- numeric
## scaling 9 -none- numeric
## lev 2 -none- character
## svd 1 -none- numeric
## N 1 -none- numeric
## call 3 -none- call
## terms 3 terms call
## xlevels 0 -none- list
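The component listing above does not show how well the discriminant analysis classifies. One way to get an informative summary is sketched below; it leaves bwt out of the predictors because bwt defines low, and the names `lda_fit`, `lda_pred`, and `confusion` are illustrative.

```r
library(MASS)
data(birthwt)

# bwt is excluded because it determines low exactly
lda_fit <- lda(low ~ age + lwt + race + smoke + ptl + ht + ui + ftv,
               data = birthwt)
print(lda_fit)   # prior probabilities, group means, discriminant coefficients

# In-sample classification table (optimistic: the same data are used
# to fit the model and to assess it)
lda_pred  <- predict(lda_fit)$class
confusion <- table(observed = birthwt$low, predicted = lda_pred)
confusion
```

An honest estimate of accuracy would use held-out data rather than the training observations.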
# Load necessary libraries
library(MASS)
# Load the dataset
data(birthwt)
summary(birthwt)
## low age lwt race
## Min. :0.0000 Min. :14.00 Min. : 80.0 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:19.00 1st Qu.:110.0 1st Qu.:1.000
## Median :0.0000 Median :23.00 Median :121.0 Median :1.000
## Mean :0.3122 Mean :23.24 Mean :129.8 Mean :1.847
## 3rd Qu.:1.0000 3rd Qu.:26.00 3rd Qu.:140.0 3rd Qu.:3.000
## Max. :1.0000 Max. :45.00 Max. :250.0 Max. :3.000
## smoke ptl ht ui
## Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.00000 Median :0.0000
## Mean :0.3915 Mean :0.1958 Mean :0.06349 Mean :0.1481
## 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.0000
## Max. :1.0000 Max. :3.0000 Max. :1.00000 Max. :1.0000
## ftv bwt
## Min. :0.0000 Min. : 709
## 1st Qu.:0.0000 1st Qu.:2414
## Median :0.0000 Median :2977
## Mean :0.7937 Mean :2945
## 3rd Qu.:1.0000 3rd Qu.:3487
## Max. :6.0000 Max. :4990
# Visualizations for categorical variables
# (race, smoke, ht, and ui are stored as integers in birthwt, so is.factor()
# would not detect them; the pairs are therefore listed by hand)
pairs_for_mosaic <- list(c("race", "smoke"), c("race", "ht"), c("race", "ui"), c("smoke", "ht"), c("smoke", "ui"), c("ht", "ui"))
# Set the layout before drawing so that all six plots share one device
par(mfrow = c(2, 3))
# mosaicplot() draws as a side effect and returns NULL, so a plain loop is
# enough and nothing needs to be stored or printed
for (pair in pairs_for_mosaic) {
  mosaicplot(table(birthwt[, c(pair, "low")]),
             main = paste("Low Birthweight vs", pair[1], "and", pair[2]))
}
# Load necessary libraries
library(MASS)
library(ggplot2)
# Load the dataset
data(birthwt)
# Visualizations for categorical variables
par(mfrow = c(1, 2))
# Mosaic plot for race vs low
mosaicplot(table(birthwt$race, birthwt$low), main = "Low Birthweight vs Race")
# Mosaic plot for smoke vs low
mosaicplot(table(birthwt$smoke, birthwt$low), main = "Low Birthweight vs Smoke")
# Visualizations for quantitative variables
par(mfrow = c(2, 2))
# Box plot for age vs low
boxplot(age ~ low, data = birthwt, main = "Low Birthweight vs Age")
# Box plot for lwt vs low
boxplot(lwt ~ low, data = birthwt, main = "Low Birthweight vs LWT")
# Box plot for ptl vs low
boxplot(ptl ~ low, data = birthwt, main = "Low Birthweight vs PTL")
# Box plot for bwt vs low
boxplot(bwt ~ low, data = birthwt, main = "Low Birthweight vs BWT")
Race vs. Low Birthweight: The mosaic plot shows that race category 1 (white, per the MASS documentation) has a lower proportion of low birthweights than categories 2 (black) and 3 (other).
Smoke vs. Low Birthweight: The mosaic plot indicates that smoking during pregnancy (smoke = 1) is associated with a higher proportion of low birthweights than not smoking (smoke = 0).
Box Plots:
Age vs. Low Birthweight: The box plot shows a slight tendency for mothers of low-birthweight babies to be younger than mothers of normal-birthweight babies.
LWT (mother's weight in pounds at last menstrual period) vs. Low Birthweight: The box plot suggests that lower maternal weight is associated with a higher likelihood of low birthweight.
PTL (number of previous premature labours) vs. Low Birthweight: The box plot suggests that a history of premature labour may be associated with an increased likelihood of low birthweight. Because ptl is a count that is zero for most mothers, the boxes are compressed and a table is easier to read than a box plot.
BWT (birth weight in grams) vs. Low Birthweight: The two groups do not overlap, because low is defined as a birth weight below 2.5 kg. This plot confirms the coding of the outcome rather than revealing a relationship, and it is the reason bwt must not be used as a predictor of low.
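Readings taken from mosaic and box plots are easy to get wrong, so it may help to back them with numbers. The sketch below tabulates the proportion of low-birthweight births within each group and the median of two continuous predictors by outcome; `low_by_race` and `low_by_smoke` are illustrative names.

```r
library(MASS)
data(birthwt)

# Proportion of normal (0) and low (1) birthweight within each group
low_by_race  <- prop.table(table(race = birthwt$race, low = birthwt$low),
                           margin = 1)
low_by_smoke <- prop.table(table(smoke = birthwt$smoke, low = birthwt$low),
                           margin = 1)
round(low_by_race, 3)
round(low_by_smoke, 3)

# Median age and maternal weight by outcome
aggregate(cbind(age, lwt) ~ low, data = birthwt, FUN = median)
```

Each row of the proportion tables sums to one, so the second column can be read directly as the share of low-birthweight births in that group.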
# Load necessary libraries
library(MASS)
# Load the dataset
data(birthwt)
# View the structure of the dataset
str(birthwt)
## 'data.frame': 189 obs. of 10 variables:
## $ low : int 0 0 0 0 0 0 0 0 0 0 ...
## $ age : int 19 33 20 21 18 21 22 17 29 26 ...
## $ lwt : int 182 155 105 108 107 124 118 103 123 113 ...
## $ race : int 2 3 1 1 1 3 1 3 1 1 ...
## $ smoke: int 0 0 1 1 1 0 0 0 1 1 ...
## $ ptl : int 0 0 0 0 0 0 0 0 0 0 ...
## $ ht : int 0 0 0 0 0 0 0 0 0 0 ...
## $ ui : int 1 0 0 1 1 0 0 0 0 0 ...
## $ ftv : int 0 3 1 2 0 0 1 1 1 0 ...
## $ bwt : int 2523 2551 2557 2594 2600 2622 2637 2637 2663 2665 ...
# Fit logistic regression model
logistic_model <- glm(low ~ age + lwt + race + smoke + ptl + ht + ui + ftv,
data = birthwt,
family = binomial)
# Display the summary of the model
summary(logistic_model)
##
## Call:
## glm(formula = low ~ age + lwt + race + smoke + ptl + ht + ui +
## ftv, family = binomial, data = birthwt)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.078975 1.276254 -0.062 0.95066
## age -0.035845 0.036472 -0.983 0.32569
## lwt -0.012387 0.006614 -1.873 0.06111 .
## race 0.453424 0.215294 2.106 0.03520 *
## smoke 0.937275 0.398458 2.352 0.01866 *
## ptl 0.542087 0.346168 1.566 0.11736
## ht 1.830720 0.694135 2.637 0.00835 **
## ui 0.721965 0.463174 1.559 0.11906
## ftv 0.063461 0.169765 0.374 0.70854
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 234.67 on 188 degrees of freedom
## Residual deviance: 204.19 on 180 degrees of freedom
## AIC: 222.19
##
## Number of Fisher Scoring iterations: 4
Coefficients for ptl and ftv:
ptl (number of previous premature labours): Estimate = 0.542087, Std. Error = 0.346168, z value = 1.566, p-value = 0.11736
ftv (number of physician visits during the first trimester): Estimate = 0.063461, Std. Error = 0.169765, z value = 0.374, p-value = 0.70854
Interpretation:
ptl: Holding other variables constant, each additional previous premature labour increases the estimated log odds of a low birthweight baby by 0.542087, an odds ratio of exp(0.542087) ≈ 1.72. Because ptl is a count, this is a per-event effect rather than a comparison of mothers with and without any such history. The p-value of 0.11736 is not significant at the conventional 0.05 level.
ftv: Holding other variables constant, each additional physician visit increases the estimated log odds of a low birthweight baby by 0.063461, an odds ratio of exp(0.063461) ≈ 1.07. The p-value of 0.70854 is not significant at the conventional 0.05 level.
Considerations for Model Interpretation: Both estimates are positive, but neither is distinguishable from zero in this model, so the data do not support a conclusion about the direction or size of either effect. Both variables are counts with few mothers at the higher values (the third quartile of ptl is 0 and that of ftv is 1), which motivates the binary versions created next.
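Log odds are hard to read directly, so the coefficients are often exponentiated into odds ratios. The sketch below refits the same logistic model and adds Wald 95% intervals through confint.default(), which relies on the normal approximation; `logit_fit` and `odds_ratios` are illustrative names.

```r
library(MASS)
data(birthwt)

logit_fit <- glm(low ~ age + lwt + race + smoke + ptl + ht + ui + ftv,
                 data = birthwt, family = binomial)

# Odds ratios with Wald 95% confidence intervals
odds_ratios <- exp(cbind(OR = coef(logit_fit), confint.default(logit_fit)))
round(odds_ratios[c("ptl", "ftv"), ], 2)
```

An interval that contains 1 corresponds to a coefficient that is not significant at the 0.05 level, which matches the p-values discussed above.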
# Create new variable ptl2
birthwt$ptl2 <- ifelse(birthwt$ptl > 0, 1, 0)
# View the updated dataset
head(birthwt)
## low age lwt race smoke ptl ht ui ftv bwt ptl2
## 85 0 19 182 2 0 0 0 1 0 2523 0
## 86 0 33 155 3 0 0 0 0 3 2551 0
## 87 0 20 105 1 1 0 0 0 1 2557 0
## 88 0 21 108 1 1 0 0 1 2 2594 0
## 89 0 18 107 1 1 0 0 1 0 2600 0
## 91 0 21 124 3 0 0 0 0 0 2622 0
# Create new variable ftv2
birthwt$ftv2 <- ifelse(birthwt$ftv > 0, 1, 0)
# View the updated dataset
head(birthwt)
## low age lwt race smoke ptl ht ui ftv bwt ptl2 ftv2
## 85 0 19 182 2 0 0 0 1 0 2523 0 0
## 86 0 33 155 3 0 0 0 0 3 2551 0 1
## 87 0 20 105 1 1 0 0 0 1 2557 0 1
## 88 0 21 108 1 1 0 0 1 2 2594 0 1
## 89 0 18 107 1 1 0 0 1 0 2600 0 0
## 91 0 21 124 3 0 0 0 0 0 2622 0 0
# Create a table summarizing low birthweight probabilities by levels of ftv2
table_summary <- table(birthwt$ftv2, birthwt$low)
# Add row and column names for clarity
rownames(table_summary) <- c("No Physician Visit", "At Least One Physician Visit")
colnames(table_summary) <- c("Normal Birthweight", "Low Birthweight")
# Display the table summary
table_summary
##
## Normal Birthweight Low Birthweight
## No Physician Visit 64 36
## At Least One Physician Visit 66 23
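The table above holds counts; the probabilities of low birthweight follow from dividing each row by its total. A minimal sketch, with the counts copied from the table and `visit_table` and `row_props` as illustrative names:

```r
# Counts copied from the table above (filled column by column)
visit_table <- matrix(c(64, 66, 36, 23), nrow = 2,
                      dimnames = list(c("No Physician Visit",
                                        "At Least One Physician Visit"),
                                      c("Normal Birthweight",
                                        "Low Birthweight")))

# Row proportions: share of each outcome within each visit group
row_props <- prop.table(visit_table, margin = 1)
round(row_props, 3)
```

The unadjusted proportion of low birthweight is 36/100 for mothers with no visit and 23/89 for mothers with at least one visit.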
# Fit logistic regression model with ptl2 and ftv2
updated_logistic_model <- glm(low ~ age + lwt + race + smoke + ptl2 + ht + ui + ftv2,
data = birthwt,
family = binomial)
# Display the summary of the updated model
summary(updated_logistic_model)
##
## Call:
## glm(formula = low ~ age + lwt + race + smoke + ptl2 + ht + ui +
## ftv2, family = binomial, data = birthwt)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.132302 1.325225 0.100 0.92048
## age -0.041760 0.037844 -1.103 0.26981
## lwt -0.011848 0.006755 -1.754 0.07943 .
## race 0.404931 0.224863 1.801 0.07174 .
## smoke 0.816451 0.416669 1.959 0.05006 .
## ptl2 1.249407 0.465305 2.685 0.00725 **
## ht 1.795749 0.702141 2.558 0.01054 *
## ui 0.657830 0.468951 1.403 0.16069
## ftv2 -0.103561 0.373413 -0.277 0.78152
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 234.67 on 188 degrees of freedom
## Residual deviance: 199.45 on 180 degrees of freedom
## AIC: 217.45
##
## Number of Fisher Scoring iterations: 4
Interpretation:
ptl2 (Prior Preterm Labor):
Estimate: 1.249407, Std. Error: 0.465305, z value: 2.685, p-value: 0.00725 (statistically significant at the conventional 0.05 level).
Holding the other variables constant, the log odds of a low birthweight baby are significantly higher for mothers with a history of preterm labor (ptl2) than for those without.
ftv2 (Physician Visits):
Estimate: -0.103561, Std. Error: 0.373413, z value: -0.277, p-value: 0.78152 (not statistically significant at the conventional 0.05 level).
Holding the other variables constant, the log odds of a low birthweight baby for mothers with at least one physician visit (ftv2) are not significantly different from those for mothers with no visits.
Comments:
ptl2 (Prior Preterm Labor): The positive coefficient indicates higher log odds of low birthweight for mothers with a history of preterm labor, and the p-value (0.00725) shows that this effect is statistically significant.
ftv2 (Physician Visits): The negative coefficient points toward lower log odds of low birthweight for mothers with at least one physician visit, but the estimate is well within sampling error of zero (p-value = 0.78152).
Conclusion: ptl2 is statistically significant and associated with a higher likelihood of low birthweight, whereas this model provides no evidence that physician visits (ftv2) change the log odds of low birthweight.
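Log odds are hard to read directly, so it can help to exponentiate the coefficients into odds ratios. The sketch below re-enters the two estimates and standard errors from the summary output by hand (the vectors `est` and `se` are illustrative names, not objects from the analysis) and computes Wald 95% intervals on the odds scale. The odds ratio for ptl2 comes out at roughly 3.5 and the one for ftv2 at roughly 0.9.

```r
# Estimates and standard errors copied from the summary output above
est <- c(ptl2 = 1.249407, ftv2 = -0.103561)
se  <- c(ptl2 = 0.465305, ftv2 = 0.373413)

# Wald 95% confidence limits on the log-odds scale, then exponentiated
z <- qnorm(0.975)
odds_ratios <- data.frame(
  OR    = exp(est),
  Lower = exp(est - z * se),
  Upper = exp(est + z * se)
)
round(odds_ratios, 2)
```

An interval that excludes 1 matches a coefficient that is significant at the 0.05 level. With the fitted model available, `exp(coef(updated_logistic_model))` gives the same point estimates for every term.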
# Load necessary libraries
library(MASS)
library(caret)
## Loading required package: lattice
# Set seed
set.seed(123)
# Step 1: Split data into training and test sets
splitIndex <- createDataPartition(birthwt$low, p = 0.8, list = FALSE)
train_data <- birthwt[splitIndex, ]
test_data <- birthwt[-splitIndex, ]
# Step 2: Fit Logistic Regression model using the training set
logistic_model_train <- glm(low ~ age + lwt + race + smoke + ptl2 + ht + ui + ftv2,
                            family = binomial, data = train_data)
# Step 3: Predictions for logistic regression on the test set
predicted_probs <- predict(logistic_model_train, newdata = test_data, type = "response")
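# Convert predicted probabilities to class labels (0.5 cutoff)
predicted_labels <- ifelse(predicted_probs > 0.5, 1, 0)
# Confusion matrix for logistic regression; fixing the factor levels keeps the
# table 2 x 2 even if one class is never predicted
conf_matrix_logistic <- table(factor(predicted_labels, levels = c(0, 1)),
                              factor(test_data$low, levels = c(0, 1)))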
# Sensitivity, specificity, and accuracy for logistic regression
sensitivity_logistic <- conf_matrix_logistic[2, 2] / sum(test_data$low == 1)
specificity_logistic <- conf_matrix_logistic[1, 1] / sum(test_data$low == 0)
accuracy_logistic <- sum(diag(conf_matrix_logistic)) / sum(conf_matrix_logistic)
# Display results
performance <- data.frame(
  Model = "Logistic Regression",
  Sensitivity = sensitivity_logistic,
  Specificity = specificity_logistic,
  Accuracy = accuracy_logistic
)
performance
##                 Model Sensitivity Specificity  Accuracy
## 1 Logistic Regression   0.1818182   0.8461538 0.6486486
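The sensitivity, specificity, and accuracy calculation is repeated for each model in this section, so it can be wrapped in a small helper. The function below is a sketch, not part of the original analysis: `classification_metrics` is a hypothetical name and the `actual` and `predicted` vectors are made-up toy data. Fixing the factor levels keeps the confusion matrix 2 x 2 even when one class is never predicted.

```r
# Hypothetical helper: metrics from 0/1 actual and predicted labels
classification_metrics <- function(actual, predicted) {
  lv <- c(0, 1)
  cm <- table(Predicted = factor(predicted, levels = lv),
              Actual    = factor(actual, levels = lv))
  data.frame(
    Sensitivity = cm["1", "1"] / sum(cm[, "1"]),  # true positives / actual positives
    Specificity = cm["0", "0"] / sum(cm[, "0"]),  # true negatives / actual negatives
    Accuracy    = sum(diag(cm)) / sum(cm)         # correct predictions / all cases
  )
}

# Toy data: 3 actual positives (1 detected), 5 actual negatives (4 detected)
actual    <- c(1, 1, 0, 0, 0, 1, 0, 0)
predicted <- c(1, 0, 0, 0, 1, 0, 0, 0)
toy_metrics <- classification_metrics(actual, predicted)
toy_metrics
```

In the analysis above, the equivalent call would be `classification_metrics(test_data$low, predicted_labels)`.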
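Model f (updated_logistic_model):
# Step 1: Split data into training and test sets
# Note: the seed is not reset here, so this split differs from the one above
splitIndex <- createDataPartition(birthwt$low, p = 0.8, list = FALSE)
train_data <- birthwt[splitIndex, ]
test_data <- birthwt[-splitIndex, ]
# Step 2: Predictions for updated logistic regression on the test set
# Note: updated_logistic_model was fit on the full birthwt data, so these test
# rows were also used for fitting and the metrics below are optimistic
predicted_probs_updated <- predict(updated_logistic_model, newdata = test_data, type = "response")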
# Convert predicted probabilities to class labels
predicted_labels_updated <- ifelse(predicted_probs_updated > 0.5, 1, 0)
# Confusion matrix for updated logistic regression
conf_matrix_updated <- table(predicted_labels_updated, test_data$low)
# Sensitivity, specificity, and accuracy for updated logistic regression
sensitivity_updated <- conf_matrix_updated[2, 2] / sum(test_data$low == 1)
specificity_updated <- conf_matrix_updated[1, 1] / sum(test_data$low == 0)
accuracy_updated <- sum(diag(conf_matrix_updated)) / sum(conf_matrix_updated)
# Display results for updated logistic regression
performance_updated <- data.frame(
  Model = "Updated Logistic Regression",
  Sensitivity = sensitivity_updated,
  Specificity = specificity_updated,
  Accuracy = accuracy_updated
)
performance_updated
##                         Model Sensitivity Specificity  Accuracy
## 1 Updated Logistic Regression   0.3846154   0.8333333 0.6756757
Model b (logistic_model):
# Step 1: Split data into training and test sets
# Note: the seed is not reset here, so this is again a different split
splitIndex <- createDataPartition(birthwt$low, p = 0.8, list = FALSE)
train_data <- birthwt[splitIndex, ]
test_data <- birthwt[-splitIndex, ]
# Step 2: Predictions for logistic_model on the test set
predicted_probs_updated <- predict(logistic_model, newdata = test_data, type = "response")
# Convert predicted probabilities to class labels
predicted_labels_updated <- ifelse(predicted_probs_updated > 0.5, 1, 0)
# Confusion matrix for logistic_model
conf_matrix_updated <- table(predicted_labels_updated, test_data$low)
# Sensitivity, specificity, and accuracy for logistic_model
sensitivity_updated <- conf_matrix_updated[2, 2] / sum(test_data$low == 1)
specificity_updated <- conf_matrix_updated[1, 1] / sum(test_data$low == 0)
accuracy_updated <- sum(diag(conf_matrix_updated)) / sum(conf_matrix_updated)
# Display results for logistic_model
performance_updated <- data.frame(
  Model = "Logistic Regression (model b)",
  Sensitivity = sensitivity_updated,
  Specificity = specificity_updated,
  Accuracy = accuracy_updated
)
performance_updated
##                           Model Sensitivity Specificity  Accuracy
## 1 Logistic Regression (model b)   0.2222222   0.8928571 0.7297297
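The three evaluations above each draw a new split without resetting the seed, and `updated_logistic_model` was fit on all of `birthwt`, so the three sets of metrics are not directly comparable. The sketch below shows one way to make the comparison like for like: a single stratified split in base R, with every model refit on the training rows only. Several things here are assumptions, since the original definitions are not shown in this section: the recoding of `ptl2` and `ftv2`, the idea that model b uses the raw `ptl` and `ftv` columns, and the names `bw`, `formulas`, and `evaluate`.

```r
library(MASS)  # provides the birthwt data

# Assumed recoding of the two indicator variables (not shown in the original)
bw <- birthwt
bw$ptl2 <- as.numeric(bw$ptl > 0)  # any prior preterm labor
bw$ftv2 <- as.numeric(bw$ftv > 0)  # at least one physician visit

# One stratified 80/20 split: sample 80% of the rows within each outcome class
set.seed(123)
train_idx <- unlist(lapply(split(seq_len(nrow(bw)), bw$low),
                           function(idx) sample(idx, floor(0.8 * length(idx)))))
train <- bw[train_idx, ]
test  <- bw[-train_idx, ]

# Candidate models (the first formula is an assumed stand-in for model b)
formulas <- list(
  "Original (ptl, ftv)"  = low ~ age + lwt + race + smoke + ptl + ht + ui + ftv,
  "Updated (ptl2, ftv2)" = low ~ age + lwt + race + smoke + ptl2 + ht + ui + ftv2
)

# Fit on the training rows only, then score on the shared test rows
evaluate <- function(f) {
  fit  <- glm(f, family = binomial, data = train)
  prob <- predict(fit, newdata = test, type = "response")
  pred <- factor(as.integer(prob > 0.5), levels = c(0, 1))
  cm   <- table(Predicted = pred, Actual = factor(test$low, levels = c(0, 1)))
  c(Sensitivity = cm["1", "1"] / sum(cm[, "1"]),
    Specificity = cm["0", "0"] / sum(cm[, "0"]),
    Accuracy    = sum(diag(cm)) / sum(cm))
}
results <- t(sapply(formulas, evaluate))
round(results, 3)
```

With a test set this small, differences between the models can easily come from the particular split, so repeated splits or cross-validation would give a steadier comparison.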
# Display the summary of the logistic regression model
summary(logistic_model_train)
##
## Call:
## glm(formula = low ~ age + lwt + race + smoke + ptl2 + ht + ui +
## ftv2, family = binomial, data = train_data)
##
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.852906   1.604201  -0.532   0.5950
## age         -0.010328   0.041957  -0.246   0.8056
## lwt         -0.011783   0.008076  -1.459   0.1445
## race         0.564538   0.267157   2.113   0.0346 *
## smoke        0.965524   0.496599   1.944   0.0519 .
## ptl2         1.168134   0.504784   2.314   0.0207 *
## ht           1.822660   0.852966   2.137   0.0326 *
## ui           0.794484   0.508886   1.561   0.1185
## ftv2        -0.446293   0.427504  -1.044   0.2965
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 189.59 on 151 degrees of freedom
## Residual deviance: 156.77 on 143 degrees of freedom
## AIC: 174.77
##
## Number of Fisher Scoring iterations: 4
The logistic regression model for predicting low birthweight (variable ‘low’) from the covariates age, lwt, race, smoke, ptl2, ht, ui, and ftv2, fit on the training set, is summarized as follows. Each coefficient is interpreted holding the other predictors constant.
Model Coefficients:
Intercept: -0.853 (p-value = 0.5950) The estimated log-odds of low birthweight when all predictors are zero is -0.853. Because age and lwt are never zero for a real mother, the intercept has no direct practical interpretation, and it is not statistically significant.
Age: -0.0103 (p-value = 0.8056) For each additional year of maternal age, the log-odds of low birthweight decrease by 0.0103. The p-value indicates that age is not statistically significant.
Lwt: -0.0118 (p-value = 0.1445) For each additional pound of maternal weight at the last menstrual period, the log-odds of low birthweight decrease by 0.0118. The p-value indicates that lwt is not statistically significant.
Race: 0.5645 (p-value = 0.0346) In birthwt, race is coded 1 = white, 2 = black, 3 = other, and it enters this model as a number, so the coefficient is the change in log-odds per one-step increase in that code. The estimate is statistically significant at the 0.05 level, but because the codes are categories rather than a scale, entering race as a factor would give a more interpretable comparison between groups.
Smoke: 0.9655 (p-value = 0.0519) For a smoker compared with a non-smoker, the log-odds of low birthweight increase by 0.9655. The p-value is just above the 0.05 threshold, so the effect is borderline.
Ptl2: 1.1681 (p-value = 0.0207) For mothers with a history of preterm labor compared with those without, the log-odds of low birthweight increase by 1.1681. The p-value indicates that ptl2 is statistically significant.
Ht: 1.8227 (p-value = 0.0326) For mothers with a history of hypertension compared with those without, the log-odds of low birthweight increase by 1.8227. The p-value indicates that ht is statistically significant.
Ui: 0.7945 (p-value = 0.1185) For mothers with uterine irritability compared with those without, the log-odds of low birthweight increase by 0.7945. The p-value is above the 0.05 threshold, so ui is not statistically significant.
Ftv2: -0.4463 (p-value = 0.2965) For mothers with at least one physician visit compared with those with none, the log-odds of low birthweight decrease by 0.4463. The p-value indicates that ftv2 is not statistically significant.
Model Fit:
Null Deviance: 189.59 (on 151 degrees of freedom)
Residual Deviance: 156.77 (on 143 degrees of freedom)
AIC: 174.77
Comments: Relative to the intercept-only model, the eight predictors reduce the deviance by 189.59 - 156.77 = 32.82 on 8 degrees of freedom. AIC is meaningful only when comparing models fit to the same data, where a lower value indicates a better balance between fit and complexity. At the 0.05 level, race, ptl2, and ht are statistically significant, and smoke is borderline (p-value = 0.0519).
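The gap between the null and residual deviance can be turned into a likelihood-ratio test of whether the predictors jointly improve on the intercept-only model. The sketch below re-enters the deviances and degrees of freedom from the summary output by hand (the variable names are illustrative) and also checks that the reported AIC equals the residual deviance plus twice the number of estimated coefficients.

```r
# Values copied from the summary output above
null_dev  <- 189.59; null_df  <- 151
resid_dev <- 156.77; resid_df <- 143

# Likelihood-ratio statistic: drop in deviance, compared with a chi-squared
# distribution whose degrees of freedom equal the number of added predictors
lr_stat <- null_dev - resid_dev
lr_df   <- null_df - resid_df
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
c(statistic = lr_stat, df = lr_df, p_value = p_value)

# AIC for a binomial GLM on 0/1 data: residual deviance + 2 * (number of coefficients)
n_coef    <- 9  # intercept plus eight predictors
aic_check <- resid_dev + 2 * n_coef
aic_check
```

With the fitted object available, the same statistic can be read from `logistic_model_train$null.deviance - logistic_model_train$deviance`, with degrees of freedom `logistic_model_train$df.null - logistic_model_train$df.residual`.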