The data set contains information about the consumer price index (CPI), which summarizes the cost of a representative market basket of goods. The CPI can be used to set employee salaries or to measure inflation. Each index in this data set compares a location to New York City. For example, if Rent.Index is 125, then rent in that city is about 25% higher than in New York City. An excerpt of the explanation from the source website is below:
“The Consumer Price Index (CPI) summarizes the cost of a representative market basket of goods that includes groceries, restaurants, transportation, utilities, and medical care. Global companies often use the CPI to determine living allowances and salaries for employees. Inflation is often measured by how much the CPI changes from year to year. Relative CPIs can be found for different cities. We have data giving CPI components relative to New York City. For New York City, each index is 100(%).”
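The relative index described in the excerpt can be written as a simple ratio. The notation below is my own shorthand for that description, not a formula given by the source, so it should be read as an assumption about how the indices are scaled.

```latex
\text{Index}_{\text{city}} = 100 \times \frac{\text{cost}_{\text{city}}}{\text{cost}_{\text{NYC}}},
\qquad
\text{Rent.Index} = 125 \;\Rightarrow\; \frac{\text{rent}_{\text{city}}}{\text{rent}_{\text{NYC}}} = 1.25
```

Under this reading, New York City has a value of 100 on every index, and a value of 80 would mean costs about 20% lower than in New York City.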
The variables in the data set and their descriptions are below:
City: Location by city. U.S. states are included for all U.S. cities.
Consumer.Price.Index: The relative prices of consumer goods like groceries, restaurants, transportation, and utilities. It excludes rent and mortgage.
Rent.Index: Estimates the prices of renting apartments in a city.
Consumer.Price.Plus.Rent.Index: Estimates the consumer goods prices, including rent.
Groceries.Index: Estimation of grocery prices in a city.
Restaurant.Price.Index: The relative prices of meals and drinks in restaurants and bars.
Local.Purchasing.Power.Index: The relative purchasing power in a given city based on the average net salary. A lower purchasing power index means that residents with an average salary can afford fewer goods and services than residents of NYC with an average salary.
Source: https://dasl.datadescription.com/datafiles/?_sf_s=CPI_Worldwide&_sfm_cases=4+59943
In this assignment, I will create two multiple linear regression models. The first model will use the predictors chosen by variable selection. The second model will use one quadratic term, one dichotomous term, and one dichotomous vs. quantitative interaction term. After creating these models, I will interpret the coefficients and conduct residual analysis. Lastly, I will determine whether the models are appropriate for the data.
cpi_data <- read.csv("https://raw.githubusercontent.com/juliaDataScience-22/cuny-spring-24/main/hw12file.csv")
The City column is split into city and country. The United States rows are then copied into their own data frame, with state and city separated.
library(stringr)
# To split into City and Country by the last comma
# Some rows have two commas because of states, and some only have one comma
cpi_data[c('City', 'Country')] <- str_split_fixed(cpi_data$City, ', (?=[^,]+$)', 2)
# Create a data frame of just US data
us_data <- cpi_data[cpi_data$Country == "United States", ]
# Separate to City and State
# Note: Washington, DC is split into city and district
# DC is put into the state category
us_data[c('City', 'State')] <- str_split_fixed(us_data$City, ', ', 2)
# Country is identical for every row, so the column is unnecessary and is removed
us_data$Country <- NULL
fifty_states_and_dc <- c("AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "DC", "FL", "GA", "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ", "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC", "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY")
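The last-comma split used above can be checked on a couple of strings before trusting it on the whole column. This is a minimal sketch: the two example strings are hand-written for illustration and are not rows copied from the data file.

```r
library(stringr)

# Hand-written example strings (illustrative only, not taken from the data)
examples <- c("New York, NY, United States", "London, United Kingdom")

# Split at the last comma only: the lookahead (?=[^,]+$) requires that
# no further comma follows the ", " being matched
parts <- str_split_fixed(examples, ", (?=[^,]+$)", 2)
colnames(parts) <- c("City", "Country")
print(parts)
```

Because the lookahead rejects any ", " that still has a comma after it, a U.S. entry keeps its state attached to the city, which is why a second split is needed for us_data.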
First, the correlation plot shows how different numerical variables are correlated. This is to help determine how to avoid collinearity problems. Local.Purchasing.Power.Index appears to be the least correlated to other variables.
I will create a linear model using the other numerical variables to predict Local.Purchasing.Power.Index. To choose the predictors, I will use stepwise selection, which adds or drops one variable at a time and keeps the change that lowers the AIC the most. The MASS library provides the function stepAIC for this.
Linear model:
Local.Purchasing.Power.Index = 44.8935 - 1.3413(Consumer.Price.Index) + 1.3379(Groceries.Index) + 1.1470(Restaurant.Price.Index)
library(corrplot)
## Warning: package 'corrplot' was built under R version 4.3.3
## corrplot 0.92 loaded
library(MASS)
## Warning: package 'MASS' was built under R version 4.3.3
#Correlation plot
corrplot(cor(cpi_data[c(2,3,4,5,6,7)]), method = "color")
# This only includes the numerical variables
full_model <- lm(Local.Purchasing.Power.Index ~ ., data = cpi_data[c(2,3,4,5,6,7)])
step_model <- stepAIC(full_model, direction = "both")
## Start: AIC=3441.9
## Local.Purchasing.Power.Index ~ Consumer.Price.Index + Rent.Index +
## Consumer.Price.Plus.Rent.Index + Groceries.Index + Restaurant.Price.Index
##
## Df Sum of Sq RSS AIC
## - Rent.Index 1 6.8 493759 3439.9
## - Consumer.Price.Plus.Rent.Index 1 6.8 493759 3439.9
## - Consumer.Price.Index 1 7.9 493760 3439.9
## <none> 493752 3441.9
## - Groceries.Index 1 23352.4 517104 3462.9
## - Restaurant.Price.Index 1 29373.3 523125 3468.6
##
## Step: AIC=3439.9
## Local.Purchasing.Power.Index ~ Consumer.Price.Index + Consumer.Price.Plus.Rent.Index +
## Groceries.Index + Restaurant.Price.Index
##
## Df Sum of Sq RSS AIC
## - Consumer.Price.Plus.Rent.Index 1 223.4 493982 3438.1
## <none> 493759 3439.9
## + Rent.Index 1 6.8 493752 3441.9
## - Consumer.Price.Index 1 8176.6 501935 3446.1
## - Groceries.Index 1 23392.6 517151 3460.9
## - Restaurant.Price.Index 1 29483.6 523242 3466.7
##
## Step: AIC=3438.13
## Local.Purchasing.Power.Index ~ Consumer.Price.Index + Groceries.Index +
## Restaurant.Price.Index
##
## Df Sum of Sq RSS AIC
## <none> 493982 3438.1
## + Consumer.Price.Plus.Rent.Index 1 223.4 493759 3439.9
## + Rent.Index 1 223.4 493759 3439.9
## - Consumer.Price.Index 1 8393.4 502376 3444.5
## - Groceries.Index 1 24524.4 518506 3460.2
## - Restaurant.Price.Index 1 29664.9 523647 3465.1
summary(step_model)
##
## Call:
## lm(formula = Local.Purchasing.Power.Index ~ Consumer.Price.Index +
## Groceries.Index + Restaurant.Price.Index, data = cpi_data[c(2,
## 3, 4, 5, 6, 7)])
##
## Residuals:
## Min 1Q Median 3Q Max
## -90.613 -21.007 -2.974 17.235 128.952
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 44.8935 5.5014 8.160 2.80e-15 ***
## Consumer.Price.Index -1.3413 0.4634 -2.894 0.00397 **
## Groceries.Index 1.3379 0.2704 4.947 1.03e-06 ***
## Restaurant.Price.Index 1.1470 0.2108 5.441 8.35e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 31.65 on 493 degrees of freedom
## Multiple R-squared: 0.4627, Adjusted R-squared: 0.4594
## F-statistic: 141.5 on 3 and 493 DF, p-value: < 2.2e-16
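The AIC values in the stepwise trace above appear to follow the usual residual-sum-of-squares form for linear models. As a check, here is the final step worked by hand, where n = 497 is the number of cities (493 residual degrees of freedom plus 4 estimated coefficients) and k = 4 is the number of coefficients. The symbols n and k are my own notation.

```latex
\mathrm{AIC} = n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2k
             = 497 \ln\!\left(\frac{493982}{497}\right) + 2(4)
             \approx 3430.13 + 8 = 3438.13
```

This agrees with the reported AIC=3438.13. Dropping a variable lowers the 2k penalty by 2, so a removal is accepted whenever it raises the first term by less than 2.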
I will also build a similar linear model manually, step by step, to demonstrate another method. Starting from the full model, I remove one non-significant variable (one with a high p-value) at a time and refit. After each removal, I check whether any other variable should be removed.
full_model <- lm(Local.Purchasing.Power.Index ~ ., data = cpi_data[c(2,3,4,5,6,7)])
summary(full_model)
##
## Call:
## lm(formula = Local.Purchasing.Power.Index ~ ., data = cpi_data[c(2,
## 3, 4, 5, 6, 7)])
##
## Residuals:
## Min 1Q Median 3Q Max
## -89.44 -20.96 -2.30 17.22 129.00
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 45.5438 5.6958 7.996 9.28e-15 ***
## Consumer.Price.Index -19.2696 217.2637 -0.089 0.929
## Rent.Index -16.5071 201.0479 -0.082 0.935
## Consumer.Price.Plus.Rent.Index 34.4839 418.3289 0.082 0.934
## Groceries.Index 1.3192 0.2737 4.819 1.93e-06 ***
## Restaurant.Price.Index 1.1431 0.2115 5.405 1.02e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 31.71 on 491 degrees of freedom
## Multiple R-squared: 0.4629, Adjusted R-squared: 0.4575
## F-statistic: 84.64 on 5 and 491 DF, p-value: < 2.2e-16
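The very large standard errors for Consumer.Price.Index, Rent.Index, and Consumer.Price.Plus.Rent.Index in the full model (each above 200) are consistent with near-exact collinearity, which would be expected if the combined index is essentially a weighted mix of the other two. That is my interpretation rather than something the source states. One way to check it is the variance inflation factor (VIF). The helper vif_manual below is hypothetical, and the data are simulated purely for illustration.

```r
# Hypothetical helper: variance inflation factor for each column of a
# numeric data frame of predictors, computed as 1 / (1 - R^2_j), where
# R^2_j comes from regressing predictor j on all the other predictors
vif_manual <- function(predictors) {
  sapply(names(predictors), function(v) {
    others <- setdiff(names(predictors), v)
    fit <- lm(reformulate(others, response = v), data = predictors)
    1 / (1 - summary(fit)$r.squared)
  })
}

# Simulated illustration (made-up numbers, not the CPI data):
# "combined" is almost an exact weighted sum of the other two columns
set.seed(1)
sim <- data.frame(a = rnorm(200, 60, 15), b = rnorm(200, 30, 10))
sim$combined <- 0.5 * sim$a + 0.5 * sim$b + rnorm(200, sd = 0.01)
sim$unrelated <- rnorm(200)
vifs <- vif_manual(sim)
print(round(vifs, 1))
```

On the real data the same helper could be applied to the five predictor columns of cpi_data selected by name, or car::vif(full_model) could be used instead. A common rule of thumb treats values above about 10 as a sign of problematic collinearity.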
full_model_minus_consumer <- lm(Local.Purchasing.Power.Index ~ Rent.Index +
Consumer.Price.Plus.Rent.Index + Groceries.Index + Restaurant.Price.Index,
data = cpi_data)
summary(full_model_minus_consumer)
##
## Call:
## lm(formula = Local.Purchasing.Power.Index ~ Rent.Index + Consumer.Price.Plus.Rent.Index +
## Groceries.Index + Restaurant.Price.Index, data = cpi_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -89.592 -20.949 -2.349 17.215 128.835
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 45.5632 5.6858 8.013 8.16e-15 ***
## Rent.Index 1.3242 0.4640 2.854 0.00450 **
## Consumer.Price.Plus.Rent.Index -2.6185 0.8963 -2.922 0.00364 **
## Groceries.Index 1.3198 0.2734 4.828 1.85e-06 ***
## Restaurant.Price.Index 1.1440 0.2111 5.420 9.36e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 31.68 on 492 degrees of freedom
## Multiple R-squared: 0.4629, Adjusted R-squared: 0.4585
## F-statistic: 106 on 4 and 492 DF, p-value: < 2.2e-16
second_model_minus_consumer_plus_rent <- lm(Local.Purchasing.Power.Index ~ Rent.Index +
Groceries.Index + Restaurant.Price.Index,
data = cpi_data)
summary(second_model_minus_consumer_plus_rent)
##
## Call:
## lm(formula = Local.Purchasing.Power.Index ~ Rent.Index + Groceries.Index +
## Restaurant.Price.Index, data = cpi_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -100.44 -21.66 -2.55 19.44 131.83
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.51749 3.94499 8.496 2.33e-16 ***
## Rent.Index 0.03101 0.14005 0.221 0.825
## Groceries.Index 0.61555 0.12993 4.738 2.83e-06 ***
## Restaurant.Price.Index 0.60932 0.10596 5.751 1.56e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 31.92 on 493 degrees of freedom
## Multiple R-squared: 0.4536, Adjusted R-squared: 0.4503
## F-statistic: 136.4 on 3 and 493 DF, p-value: < 2.2e-16
second_model_minus_rent <- lm(Local.Purchasing.Power.Index ~ Consumer.Price.Plus.Rent.Index +
Groceries.Index + Restaurant.Price.Index,
data = cpi_data)
summary(second_model_minus_rent)
##
## Call:
## lm(formula = Local.Purchasing.Power.Index ~ Consumer.Price.Plus.Rent.Index +
## Groceries.Index + Restaurant.Price.Index, data = cpi_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -101.312 -21.350 -2.031 19.610 131.745
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.4207 3.7995 8.796 < 2e-16 ***
## Consumer.Price.Plus.Rent.Index -0.1778 0.2704 -0.658 0.511
## Groceries.Index 0.7140 0.1735 4.114 4.55e-05 ***
## Restaurant.Price.Index 0.6645 0.1287 5.163 3.53e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 31.91 on 493 degrees of freedom
## Multiple R-squared: 0.454, Adjusted R-squared: 0.4507
## F-statistic: 136.7 on 3 and 493 DF, p-value: < 2.2e-16
only_groceries_and_restaurant <- lm(Local.Purchasing.Power.Index ~ Groceries.Index + Restaurant.Price.Index,
data = cpi_data)
summary(only_groceries_and_restaurant)
##
## Call:
## lm(formula = Local.Purchasing.Power.Index ~ Groceries.Index +
## Restaurant.Price.Index, data = cpi_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -100.862 -21.694 -2.695 19.232 131.862
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.2788 3.7912 8.778 < 2e-16 ***
## Groceries.Index 0.6288 0.1153 5.453 7.83e-08 ***
## Restaurant.Price.Index 0.6142 0.1035 5.935 5.55e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 31.89 on 494 degrees of freedom
## Multiple R-squared: 0.4535, Adjusted R-squared: 0.4513
## F-statistic: 205 on 2 and 494 DF, p-value: < 2.2e-16
# That did not improve the adjusted R squared, so we go back to the full model and remove Consumer.Price.Plus.Rent.Index instead
full_minus_consumer_plus_rent <- lm(Local.Purchasing.Power.Index ~ Consumer.Price.Index + Rent.Index +
Groceries.Index + Restaurant.Price.Index,
data = cpi_data)
summary(full_minus_consumer_plus_rent)
##
## Call:
## lm(formula = Local.Purchasing.Power.Index ~ Consumer.Price.Index +
## Rent.Index + Groceries.Index + Restaurant.Price.Index, data = cpi_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -89.580 -20.945 -2.346 17.215 128.847
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 45.56261 5.68549 8.014 8.14e-15 ***
## Consumer.Price.Index -1.36001 0.46549 -2.922 0.00364 **
## Rent.Index 0.06581 0.13950 0.472 0.63728
## Groceries.Index 1.31980 0.27336 4.828 1.84e-06 ***
## Restaurant.Price.Index 1.14398 0.21106 5.420 9.34e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 31.68 on 492 degrees of freedom
## Multiple R-squared: 0.4629, Adjusted R-squared: 0.4585
## F-statistic: 106 on 4 and 492 DF, p-value: < 2.2e-16
minus_rent <- lm(Local.Purchasing.Power.Index ~ Consumer.Price.Index +
Groceries.Index + Restaurant.Price.Index,
data = cpi_data)
summary(minus_rent)
##
## Call:
## lm(formula = Local.Purchasing.Power.Index ~ Consumer.Price.Index +
## Groceries.Index + Restaurant.Price.Index, data = cpi_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -90.613 -21.007 -2.974 17.235 128.952
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 44.8935 5.5014 8.160 2.80e-15 ***
## Consumer.Price.Index -1.3413 0.4634 -2.894 0.00397 **
## Groceries.Index 1.3379 0.2704 4.947 1.03e-06 ***
## Restaurant.Price.Index 1.1470 0.2108 5.441 8.35e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 31.65 on 493 degrees of freedom
## Multiple R-squared: 0.4627, Adjusted R-squared: 0.4594
## F-statistic: 141.5 on 3 and 493 DF, p-value: < 2.2e-16
I will now compare the adjusted R squared values of all the models to determine the best one from my own step-by-step analysis. All the models are very similar, each explaining roughly 45% to 46% of the variance. The best is the last model, with only Consumer.Price.Index, Groceries.Index, and Restaurant.Price.Index as variables. This last model matches the one that stepAIC selected in the first example.
all_values <- c(summary(full_model)$adj.r.squared, summary(full_model_minus_consumer)$adj.r.squared,
summary(second_model_minus_consumer_plus_rent)$adj.r.squared, summary(second_model_minus_rent)$adj.r.squared,
summary(only_groceries_and_restaurant)$adj.r.squared, summary(full_minus_consumer_plus_rent)$adj.r.squared,
summary(minus_rent)$adj.r.squared)
compare <- data.frame(modelName = c("full_model", "full_model_minus_consumer", "second_model_minus_consumer_plus_rent",
"second_model_minus_rent", "only_groceries_and_restaurant", "full_minus_consumer_plus_rent",
"minus_rent"),
adjustedRSquared = all_values)
compare
## modelName adjustedRSquared
## 1 full_model 0.4574535
## 2 full_model_minus_consumer 0.4585475
## 3 second_model_minus_consumer_plus_rent 0.4502717
## 4 second_model_minus_rent 0.4506989
## 5 only_groceries_and_restaurant 0.4513300
## 6 full_minus_consumer_plus_rent 0.4585487
## 7 minus_rent 0.4594025
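The adjusted R squared values in this table penalize the ordinary R squared for the number of predictors, which is why the smaller models can score higher. As a worked check for the minus_rent model, with n = 497 observations and p = 3 predictors (n and p are my own notation):

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
          = 1 - (1 - 0.4627)\,\frac{496}{493}
          \approx 1 - 0.5406 = 0.4594
```

This matches the reported value of 0.4594. The full model has a slightly higher R squared (0.4629) but pays for five predictors, so its adjusted value drops to 0.4575.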
The second model uses one quadratic term, one dichotomous term, and one dichotomous vs. quantitative interaction term. For the quadratic term I used Rent.Index raised to the power of 0.01; strictly speaking, this is a power transformation rather than a squared term. I plotted it below against Local.Purchasing.Power.Index. The correlation was about 0.6 with the transformed term, compared with about 0.5 for the untransformed Rent.Index. The dichotomous term is based on Country: us_indicator is 0 for the United States and 1 for all other countries. The interaction term is the product of those two terms. I included one more variable, Restaurant.Price.Index, because it gave the best adjusted R squared value when I tested different models. To keep the focus on the final model I do not show the intermediate models, but I arrived at it with the same step-by-step analysis used in the first part of this assignment.
The final model:
Local.Purchasing.Power.Index = 2606 - 2426(Rent.Index^0.01) - 3560(us_indicator) + 3407(Rent.Index^0.01:us_indicator) + 0.7545(Restaurant.Price.Index)
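For comparison, a conventional quadratic specification uses a squared term written with I(). The sketch below shows that syntax together with a 0/1 dummy and a dummy-by-quantitative interaction. It runs on simulated data with made-up coefficients, so the variable names rent and non_us are stand-ins and the output says nothing about the CPI data.

```r
# Simulated illustration only (made-up data, not the CPI file)
set.seed(42)
n <- 300
rent <- runif(n, 5, 100)        # stand-in for a rent index
non_us <- rbinom(n, 1, 0.7)     # stand-in for the 0/1 country dummy
y <- 40 + 1.2 * rent - 0.008 * rent^2 - 10 * non_us +
  0.3 * rent * non_us + rnorm(n, sd = 5)
sim <- data.frame(y, rent, non_us)

# I(rent^2) adds a squared (quadratic) term; rent:non_us adds the
# dummy-by-quantitative interaction
quad_fit <- lm(y ~ rent + I(rent^2) + non_us + rent:non_us, data = sim)
print(round(coef(quad_fit), 3))
```

Without the I() wrapper, the ^ operator inside a formula is interpreted as term crossing rather than arithmetic, which is one reason the model in this assignment stores the transformed variable in its own column first.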
# Original
plot(cpi_data$Rent.Index, cpi_data$Local.Purchasing.Power.Index)
new <- lm(Local.Purchasing.Power.Index ~ Rent.Index, cpi_data)
abline(new)
plot(cpi_data$Rent.Index^0.01, cpi_data$Local.Purchasing.Power.Index)
cpi_data$Quadratic <- cpi_data$Rent.Index^0.01
new <- lm(Local.Purchasing.Power.Index ~ Quadratic, cpi_data)
abline(new)
# Note: us_indicator is 0 for the United States and 1 for every other country
cpi_data$us_indicator <- ifelse(cpi_data$Country == "United States", 0, 1)
partTwoLM <- lm(Local.Purchasing.Power.Index ~ Quadratic + us_indicator + Quadratic:us_indicator +
Restaurant.Price.Index, data = cpi_data)
summary(partTwoLM)
##
## Call:
## lm(formula = Local.Purchasing.Power.Index ~ Quadratic + us_indicator +
## Quadratic:us_indicator + Restaurant.Price.Index, data = cpi_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -88.097 -20.189 -2.781 17.388 131.348
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.606e+03 8.671e+02 3.005 0.00279 **
## Quadratic -2.426e+03 8.373e+02 -2.898 0.00393 **
## us_indicator -3.560e+03 8.881e+02 -4.009 7.04e-05 ***
## Restaurant.Price.Index 7.545e-01 9.187e-02 8.213 1.92e-15 ***
## Quadratic:us_indicator 3.407e+03 8.569e+02 3.977 8.04e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 30.14 on 492 degrees of freedom
## Multiple R-squared: 0.5138, Adjusted R-squared: 0.5098
## F-statistic: 130 on 4 and 492 DF, p-value: < 2.2e-16
library(car)
## Warning: package 'car' was built under R version 4.3.3
## Loading required package: carData
## Warning: package 'carData' was built under R version 4.3.3
library(cowplot)
## Warning: package 'cowplot' was built under R version 4.3.3
# First model
par(mfrow=c(2, 2))
plot(cpi_data$Consumer.Price.Index, cpi_data$Local.Purchasing.Power.Index,
xlab = "Consumer Price Index", ylab = "Local Purchasing Power Index",
main = "Scatter Plot of Local Purchasing Power Index vs Consumer Price Index")
plot(cpi_data$Groceries.Index, cpi_data$Local.Purchasing.Power.Index,
xlab = "Groceries Index", ylab = "Local Purchasing Power Index",
main = "Scatter Plot of Local Purchasing Power Index vs Groceries Index")
plot(cpi_data$Restaurant.Price.Index, cpi_data$Local.Purchasing.Power.Index,
xlab = "Restaurant Price Index", ylab = "Local Purchasing Power Index",
main = "Scatter Plot of Local Purchasing Power Index vs Restaurant Price Index")
crPlots(step_model)
# Diagnostic plots for the first model
par(mfrow=c(2, 2))
plot(step_model)
library(ggplot2)
# Second model
par(mfrow=c(2, 2))
plot(cpi_data$Quadratic, cpi_data$Local.Purchasing.Power.Index,
xlab = "Rent Index ^ 0.01", ylab = "Local Purchasing Power Index",
main = "Scatter Plot of Local Purchasing Power Index vs Rent Index ^ 0.01")
plot(cpi_data$us_indicator, cpi_data$Local.Purchasing.Power.Index,
xlab = "U.S. Indicator", ylab = "Local Purchasing Power Index",
main = "Scatter Plot of Local Purchasing Power Index vs U.S. Indicator")
plot(cpi_data$Restaurant.Price.Index, cpi_data$Local.Purchasing.Power.Index,
xlab = "Restaurant Price Index", ylab = "Local Purchasing Power Index",
main = "Scatter Plot of Local Purchasing Power Index vs Restaurant Price Index")
plot(cpi_data$Quadratic * cpi_data$us_indicator, cpi_data$Local.Purchasing.Power.Index,
xlab = "Rent Index ^ 0.01 : U.S. Indicator", ylab = "Local Purchasing Power Index",
main = "Scatter Plot of Local Purchasing Power Index vs. Rent Index ^ 0.01 : U.S. Indicator")
cpi_data$residuals <- resid(partTwoLM)
plot1 <- ggplot(cpi_data, aes(x = Quadratic, y = residuals)) +
geom_point() +
labs(x = "Rent Index ^ 0.01", y = "Residuals") +
theme_minimal()
plot2 <- ggplot(cpi_data, aes(x = us_indicator, y = residuals)) +
geom_point() +
labs(x = "U.S. Indicator", y = "Residuals") +
theme_minimal()
plot3 <- ggplot(cpi_data, aes(x = Restaurant.Price.Index, y = residuals)) +
geom_point() +
labs(x = "Restaurant Price Index", y = "Residuals") +
theme_minimal()
plot4 <- ggplot(cpi_data, aes(x = Quadratic, y = residuals, color = factor(us_indicator))) +
geom_point() +
labs(x = "Rent Index ^ 0.01", y = "Residuals", color = "U.S. Indicator") +
theme_minimal()
plot_grid(plot1, plot2, plot3, plot4, ncol = 2)
par(mfrow=c(2, 2))
plot(partTwoLM)
## Statistical Analysis for Both Models
Local.Purchasing.Power.Index = 44.8935 - 1.3413(Consumer.Price.Index) + 1.3379(Groceries.Index) + 1.1470(Restaurant.Price.Index)
In this model, the coefficients show how each index relates to the predicted local purchasing power index. For example, holding the other two indices constant, a location whose consumer price index is higher by 1 has a predicted local purchasing power index that is lower by 1.3413. Overall, a higher consumer price index is associated with lower local purchasing power, while higher groceries and restaurant price indices are associated with higher local purchasing power. All the coefficients are statistically significant at the 0.01 level, so each predictor contributes to the model after accounting for the others. Significance alone does not mean the relationships are strong.
Although the coefficients have significant p-values, the adjusted R squared value shows only a moderate relationship. The value is 0.4594, so the model explains about 46% of the variation in local purchasing power. As a rule of thumb, a strong relationship would be above 0.7 and a weak relationship would be below 0.3. Since this value falls in between, the linear model provides a moderate explanation of the data.
The residuals for the first model are displayed in two ways. The component-plus-residual plots from crPlots show the partial residuals against each independent variable, and they do not show consistent variability. The final array of plots for this first model shows the standard diagnostic plots, including residuals vs. fitted values and a Q-Q plot. These residual plots show more consistent variability, but the spread still varies enough that the constant-variance assumption of a linear model is not met. The Q-Q plot shows that the residuals are approximately normally distributed, so that assumption is met.
Local.Purchasing.Power.Index = 2606 - 2426(Rent.Index^0.01) - 3560(us_indicator) + 3407(Rent.Index^0.01:us_indicator) + 0.7545(Restaurant.Price.Index)
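In this model, the coefficients describe how each term relates to the predicted local purchasing power index. Because the model includes an interaction, the coefficients for Rent.Index^0.01 and us_indicator cannot be read in isolation. The coefficient of -2426 on Rent.Index^0.01 applies to U.S. cities (us_indicator = 0), where a higher rent index is associated with lower predicted purchasing power. The us_indicator coefficient of -3560 is the gap between non-U.S. and U.S. cities only when Rent.Index^0.01 equals zero, which would require a rent index of zero, so by itself it does not mean that non-U.S. cities have lower purchasing power. The positive interaction coefficient of 3407 means the rent slope for non-U.S. cities is -2426 + 3407 = 981, so outside the U.S. a higher rent index is associated with higher predicted purchasing power. Holding the other terms constant, each one-point increase in the restaurant price index is associated with an increase of about 0.75 in the predicted local purchasing power index. All the coefficients are statistically significant at the 0.01 level, although significance alone does not indicate a strong relationship.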
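Substituting the two values of us_indicator into the fitted equation gives one equation per group, which makes the interaction easier to read. The abbreviations LPP, R (Rent.Index), and Rest (Restaurant.Price.Index) are my own shorthand, and the coefficients are the rounded values from the summary output.

```latex
\text{U.S. } (\text{us\_indicator} = 0):\quad
\widehat{\text{LPP}} = 2606 - 2426\,R^{0.01} + 0.7545\,\text{Rest}

\text{Non-U.S. } (\text{us\_indicator} = 1):\quad
\widehat{\text{LPP}} = (2606 - 3560) + (-2426 + 3407)\,R^{0.01} + 0.7545\,\text{Rest}
                     = -954 + 981\,R^{0.01} + 0.7545\,\text{Rest}
```

The large magnitudes arise because R^{0.01} varies over a very narrow range close to 1, so small changes in the transformed variable correspond to large changes in the rent index itself.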
Although the coefficients have significant p-values, the adjusted R squared value shows only a moderate relationship. The value is 0.5098, so the model explains about 51% of the variation in local purchasing power. As a rule of thumb, a strong relationship would be above 0.7 and a weak relationship would be below 0.3. Since this value falls in between, the linear model provides a moderate explanation of the data. It is slightly better than the first model, which had an adjusted R squared value of 0.4594.
The residuals for the second model are also displayed in two ways. The first set of plots shows the residuals against each independent variable. Rent Index ^ 0.01 appears to have fairly consistent variability. The other graphs are not as consistent, and of those three, the residuals for the restaurant price index appear to be the best. These residual plots do not show consistent variability overall, so this assumption of the linear model is not met. The final array of plots for this second model shows the standard diagnostic plots, including residuals vs. fitted values and a Q-Q plot. These residual plots show more consistent variability than the individual graphs, but I would still say that they do not appear to meet the assumptions of a linear model. The Q-Q plot shows that the residuals are approximately normally distributed, so that assumption is met. It does have a tail, but the tail appears to be caused by a few outliers and is not a serious departure.
Overall, these models do not seem to be appropriate for the data. Neither model showed a strong relationship, and the assumptions of linear regression were not fully met for either model. The second model, which contained one quadratic term, one dichotomous term, and one dichotomous vs. quantitative interaction term, had a slightly higher adjusted R squared value, but it was not high enough to be considered strong.
The us_data data frame created at the beginning of this assignment can be used for future analysis. Different states can be compared to see whether regional differences have any effect on the results.
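The visual checks for both models could be backed up with formal tests. The sketch below is one possible approach, not something done in this assignment: the helper check_residuals is hypothetical, it computes a studentized Breusch-Pagan statistic by hand for constant variance and uses shapiro.test for normality, and it is demonstrated on simulated data where the error spread is made to grow with x.

```r
# Hypothetical helper: formal companions to the residual and Q-Q plots
check_residuals <- function(model) {
  e <- resid(model)
  X <- model.matrix(model)   # design matrix, including the intercept column

  # Studentized (Koenker) Breusch-Pagan statistic: n * R^2 from regressing
  # the squared residuals on the model's predictors
  aux <- lm.fit(X, e^2)
  r2 <- 1 - sum(aux$residuals^2) / sum((e^2 - mean(e^2))^2)
  bp_stat <- length(e) * r2
  bp_df <- ncol(X) - 1

  list(
    bp_statistic = bp_stat,
    bp_p_value = pchisq(bp_stat, df = bp_df, lower.tail = FALSE),
    shapiro_p_value = shapiro.test(e)$p.value
  )
}

# Simulated illustration (made-up data): the error spread grows with x
set.seed(7)
x <- runif(500, 1, 10)
y <- 2 + 3 * x + rnorm(500, sd = 2 * x)
res <- check_residuals(lm(y ~ x))
print(res)
```

Applied to this assignment, the calls would be check_residuals(step_model) and check_residuals(partTwoLM). A small Breusch-Pagan p-value would support the conclusion that the constant-variance assumption is not met, and a small Shapiro-Wilk p-value would argue against normal residuals.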