The data file Weeklylab7data.xlsx contains mock data on relationship satisfaction (measured on a scale from 0 to 100) for 200 individuals. Other variables include Sex (0 male, 1 female), age (continuous), shared house work (0 no, 1 yes), nights spent together on average per week (0 to 7), and financial security measured on a scale from 0 (heavily in debt) to 10 (very secure). Your goal is to build a model that predicts satisfaction using these variables. Summarize your findings and include a graph.
library(readxl)
WeeklyLab7=read_excel("C:/Users/jcolu/OneDrive/Documents/Harrisburg/Summer 2018/ANLY 510/WeeklyLab7Data.xlsx")
WeeklyLab7
## # A tibble: 199 x 6
## RelationshipSatisfaction Sex Age ShareInHouseWork NightsTogether
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 17.0 1.00 19.0 0 4.00
## 2 63.0 1.00 48.0 1.00 2.00
## 3 48.0 1.00 49.0 0 2.00
## 4 45.0 1.00 46.0 1.00 4.00
## 5 53.0 1.00 18.0 1.00 5.00
## 6 41.0 0 41.0 0 2.00
## 7 41.0 0 23.0 1.00 4.00
## 8 6.00 0 22.0 0 2.00
## 9 10.0 0 25.0 0 5.00
## 10 71.0 1.00 30.0 1.00 4.00
## # ... with 189 more rows, and 1 more variable: FinancialSecurity <dbl>
Initial Data Analysis - Determine if factorizing is needed
str(WeeklyLab7)
## Classes 'tbl_df', 'tbl' and 'data.frame': 199 obs. of 6 variables:
## $ RelationshipSatisfaction: num 17 63 48 45 53 41 41 6 10 71 ...
## $ Sex : num 1 1 1 1 1 0 0 0 0 1 ...
## $ Age : num 19 48 49 46 18 41 23 22 25 30 ...
## $ ShareInHouseWork : num 0 1 0 1 1 0 1 0 0 1 ...
## $ NightsTogether : num 4 2 2 4 5 2 4 2 5 4 ...
## $ FinancialSecurity : num 1 1 3 10 5 9 3 0 4 10 ...
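The str() output shows that Sex and ShareInHouseWork are stored as numeric 0/1 codes. A minimal sketch of how they could be turned into labelled factors is given below. The data frame `demo` is a hypothetical stand-in built from the first six rows of the printout, and the labels "Male"/"Female" and "No"/"Yes" are taken from the coding described in the task, not from the file itself.

```r
# Hypothetical stand-in for WeeklyLab7: first six rows copied from the printout,
# restricted to three columns for brevity.
demo <- data.frame(
  RelationshipSatisfaction = c(17, 63, 48, 45, 53, 41),
  Sex                      = c(1, 1, 1, 1, 1, 0),
  ShareInHouseWork         = c(0, 1, 0, 1, 1, 0)
)

# Convert the 0/1 indicators to labelled factors so that plots and model
# output show category names instead of numbers.
demo$Sex <- factor(demo$Sex, levels = c(0, 1), labels = c("Male", "Female"))
demo$ShareInHouseWork <- factor(demo$ShareInHouseWork,
                                levels = c(0, 1), labels = c("No", "Yes"))

str(demo)
```

For a two-level 0/1 variable, lm() gives the same fit whether the column is numeric or a factor, so factorizing here mainly improves the readability of plots and coefficient names.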
Primary Data Analysis - use a density plot to check the shape of the distribution and look for skewness.
plot(density(WeeklyLab7$RelationshipSatisfaction))
The distribution of satisfaction scores looks roughly symmetric; no obvious skew is visible.
Secondary data analysis - Skewness
library(moments)
agostino.test(WeeklyLab7$RelationshipSatisfaction)
##
## D'Agostino skewness test
##
## data: WeeklyLab7$RelationshipSatisfaction
## skew = -0.14113, z = -0.83724, p-value = 0.4025
## alternative hypothesis: data have a skewness
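The D’Agostino test does not reject the null hypothesis of zero skewness (skew = -0.14, p-value = 0.4025), so there is no evidence that the data are skewed.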
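To make the "skew" statistic in the test output more concrete, the sketch below computes a moment-based sample skewness by hand. It uses only the first ten satisfaction scores from the tibble printout, so the value will differ from the full-data result; the variable names are illustrative, and this is assumed (not verified here) to be the same definition the moments package uses.

```r
# First ten satisfaction scores from the tibble printout (illustration only).
x <- c(17, 63, 48, 45, 53, 41, 41, 6, 10, 71)

# Moment-based sample skewness: third central moment divided by the
# second central moment raised to the power 3/2.
m2 <- mean((x - mean(x))^2)
m3 <- mean((x - mean(x))^3)
skew <- m3 / m2^1.5

skew  # negative values indicate a longer left tail, positive a longer right tail
```

A skewness near zero together with a p-value above the usual 0.05 threshold means the test does not reject symmetry.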
Third data analysis - Visual representation
boxplot(WeeklyLab7$RelationshipSatisfaction~WeeklyLab7$Sex,main="Relationship Satisfaction per Gender", ylab="Relationship Satisfaction (0-100)", xlab="Gender", col=c("blue","pink"))
The boxplot suggests that women report higher relationship satisfaction than men.
plot(WeeklyLab7$RelationshipSatisfaction,WeeklyLab7$Age, main ="Relationship Satisfaction & Age", ylab="Age", xlab="Relationship Satisfaction")
abline(lm(WeeklyLab7$Age~WeeklyLab7$RelationshipSatisfaction))
Older people appear to be slightly more satisfied.
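A visual impression of a slight trend can be backed up with a correlation test. The sketch below uses cor.test() on the first ten rows of the printout only, so its numbers are not the full-data result; the vectors `age` and `satisfaction` are illustrative names. It also draws the scatterplot with the predictor (age) on the x-axis, which is the more common orientation when satisfaction is the outcome.

```r
# First ten rows from the tibble printout (illustration only).
age          <- c(19, 48, 49, 46, 18, 41, 23, 22, 25, 30)
satisfaction <- c(17, 63, 48, 45, 53, 41, 41, 6, 10, 71)

# Pearson correlation with a test of the null hypothesis of zero correlation.
ct <- cor.test(age, satisfaction)
ct$estimate  # sample correlation
ct$p.value   # p-value for the test

# Scatterplot with the predictor on the x-axis and a fitted regression line.
plot(age, satisfaction, xlab = "Age", ylab = "Relationship Satisfaction")
abline(lm(satisfaction ~ age))
```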
Fourth Data Analysis - Multiple linear regression
Model=lm(RelationshipSatisfaction~Sex+Age+ShareInHouseWork+NightsTogether+FinancialSecurity, data = WeeklyLab7)
summary(Model)
##
## Call:
## lm(formula = RelationshipSatisfaction ~ Sex + Age + ShareInHouseWork +
## NightsTogether + FinancialSecurity, data = WeeklyLab7)
##
## Residuals:
## Min 1Q Median 3Q Max
## -25.5757 -6.8926 0.6291 7.2475 23.0836
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.7333 3.1873 -0.858 0.392
## Sex 12.7091 1.4750 8.616 2.46e-15 ***
## Age 0.2761 0.0683 4.043 7.63e-05 ***
## ShareInHouseWork 23.6837 1.4814 15.987 < 2e-16 ***
## NightsTogether 1.6254 0.3204 5.073 9.15e-07 ***
## FinancialSecurity 1.7714 0.2199 8.056 7.99e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.33 on 193 degrees of freedom
## Multiple R-squared: 0.6954, Adjusted R-squared: 0.6875
## F-statistic: 88.1 on 5 and 193 DF, p-value: < 2.2e-16
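The coefficient table above can be read as a prediction equation. As a worked example, the sketch below plugs in a hypothetical person (female, age 30, shares house work, 4 nights together per week, financial security of 5); this person is invented for illustration and is not a row from the data.

```r
# Coefficient estimates copied from the summary(Model) output.
coefs <- c(Intercept = -2.7333, Sex = 12.7091, Age = 0.2761,
           ShareInHouseWork = 23.6837, NightsTogether = 1.6254,
           FinancialSecurity = 1.7714)

# Hypothetical person; the leading 1 multiplies the intercept.
person <- c(Intercept = 1, Sex = 1, Age = 30,
            ShareInHouseWork = 1, NightsTogether = 4,
            FinancialSecurity = 5)

# Predicted satisfaction is the sum of coefficient times predictor value.
predicted <- sum(coefs * person)
predicted  # about 57.3 on the 0-100 scale
```

With the fitted model object available, predict(Model, newdata = ...) would give the same kind of result without copying coefficients by hand.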
The adjusted R-squared of 0.6875 indicates that the model explains about 69% of the variation in relationship satisfaction.
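The adjusted R-squared can be reproduced from the other numbers in the summary: it penalizes the multiple R-squared for the number of predictors. A short check, using n = 199 observations and p = 5 predictors as reported in the output:

```r
r2 <- 0.6954  # Multiple R-squared from the summary output
n  <- 199     # number of observations
p  <- 5       # number of predictors (excluding the intercept)

# Adjusted R-squared = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 4)  # about 0.6875, matching the summary output
```

The denominator n - p - 1 = 193 is the residual degrees of freedom shown in the summary.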
qqnorm(Model$residuals)
The Q-Q plot of the residuals suggests that they are approximately normally distributed.
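A reference line and a formal test can make the normality check easier to judge. The sketch below uses simulated residuals as a stand-in (199 normal draws with the residual standard error of 10.33 from the summary), because the fitted model is not available in a self-contained example; with the real model, residuals(Model) would take the place of `res`.

```r
set.seed(1)  # for reproducibility of the simulated stand-in

# Hypothetical residuals: simulated, NOT the actual model residuals.
res <- rnorm(199, mean = 0, sd = 10.33)

# Q-Q plot with a reference line; points close to the line suggest normality.
qqnorm(res)
qqline(res)

# Shapiro-Wilk test: a large p-value gives no evidence against normality.
sw <- shapiro.test(res)
sw$p.value
```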