The data file Weeklylab7data.xlsx contains mock data on relationship satisfaction (measured on a scale from 0 to 100) for 200 individuals. Other variables include Sex (0 male, 1 female), age (continuous), shared house work (0 no, 1 yes), nights spent together on average per week (0 to 7), and financial security measured on a scale from 0 (heavily in debt) to 10 (very secure). Your goal is to build a model that predicts satisfaction using these variables. Summarize your findings and include a graph.

library(readxl)
WeeklyLab7=read_excel("C:/Users/jcolu/OneDrive/Documents/Harrisburg/Summer 2018/ANLY 510/WeeklyLab7Data.xlsx")
WeeklyLab7
## # A tibble: 199 x 6
##    RelationshipSatisfaction   Sex   Age ShareInHouseWork NightsTogether
##                       <dbl> <dbl> <dbl>            <dbl>          <dbl>
##  1                    17.0   1.00  19.0             0              4.00
##  2                    63.0   1.00  48.0             1.00           2.00
##  3                    48.0   1.00  49.0             0              2.00
##  4                    45.0   1.00  46.0             1.00           4.00
##  5                    53.0   1.00  18.0             1.00           5.00
##  6                    41.0   0     41.0             0              2.00
##  7                    41.0   0     23.0             1.00           4.00
##  8                     6.00  0     22.0             0              2.00
##  9                    10.0   0     25.0             0              5.00
## 10                    71.0   1.00  30.0             1.00           4.00
## # ... with 189 more rows, and 1 more variable: FinancialSecurity <dbl>

Initial Data Analysis - Determine whether any variables need to be converted to factors

str(WeeklyLab7)
## Classes 'tbl_df', 'tbl' and 'data.frame':    199 obs. of  6 variables:
##  $ RelationshipSatisfaction: num  17 63 48 45 53 41 41 6 10 71 ...
##  $ Sex                     : num  1 1 1 1 1 0 0 0 0 1 ...
##  $ Age                     : num  19 48 49 46 18 41 23 22 25 30 ...
##  $ ShareInHouseWork        : num  0 1 0 1 1 0 1 0 0 1 ...
##  $ NightsTogether          : num  4 2 2 4 5 2 4 2 5 4 ...
##  $ FinancialSecurity       : num  1 1 3 10 5 9 3 0 4 10 ...
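
The str() output shows that Sex and ShareInHouseWork are stored as numeric 0/1 codes. A minimal sketch of converting them to labelled factors follows. It uses a small placeholder data frame (the name demo and its four rows are made up for illustration and are not the lab data), and the labels are taken from the coding described in the assignment text.

```r
# Placeholder rows that mimic the 0/1 coding shown by str(); not the lab data.
demo <- data.frame(
  Sex              = c(1, 1, 0, 0),
  ShareInHouseWork = c(0, 1, 0, 1)
)

# Convert the 0/1 codes to factors with readable labels.
demo$Sex <- factor(demo$Sex, levels = c(0, 1), labels = c("Male", "Female"))
demo$ShareInHouseWork <- factor(demo$ShareInHouseWork,
                                levels = c(0, 1), labels = c("No", "Yes"))

str(demo)
```

For a two-level 0/1 variable, lm() should give the same fit whether the column is numeric or a factor, so the conversion mainly helps with plot labels and readable output.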

Primary Data Analysis - Use a density plot to check the shape of the distribution and look for skewness.

plot(density(WeeklyLab7$RelationshipSatisfaction))

The density of RelationshipSatisfaction looks roughly symmetric, with no obvious skew.

Secondary data analysis - Skewness

library(moments)
agostino.test(WeeklyLab7$RelationshipSatisfaction)
## 
##  D'Agostino skewness test
## 
## data:  WeeklyLab7$RelationshipSatisfaction
## skew = -0.14113, z = -0.83724, p-value = 0.4025
## alternative hypothesis: data have a skewness

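The D’Agostino test does not detect significant skewness (skew = -0.14, p-value = 0.4025). Because the p-value is well above 0.05, we fail to reject the null hypothesis that the data are not skewed, which agrees with the density plot.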
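
The skew value in the test output is a sample skewness, which can be computed by hand as the third central moment divided by the second central moment raised to the power 3/2. The sketch below shows that calculation in base R. The function name sample_skewness and the two example vectors are made up for illustration; my understanding is that moments::skewness() computes the same quantity, but treat that as an assumption.

```r
# Sample skewness: third central moment over (second central moment)^(3/2).
# sample_skewness is a hypothetical helper name, not part of any package.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]          # drop missing values
  d <- x - mean(x)           # deviations from the mean
  mean(d^3) / mean(d^2)^(3 / 2)
}

symmetric    <- c(1, 2, 3, 4, 5)    # made-up symmetric values
right_skewed <- c(1, 1, 1, 2, 10)   # made-up values with a long right tail

sample_skewness(symmetric)      # zero for perfectly symmetric data
sample_skewness(right_skewed)   # positive for a long right tail
```

A value near zero, such as the -0.14 reported above, points to an approximately symmetric distribution; the sign gives the direction of any tail.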

Third data analysis - Visual representation

boxplot(WeeklyLab7$RelationshipSatisfaction~WeeklyLab7$Sex,main="Relationship Satisfaction per Gender", ylab="Relationship Satisfaction (0-100)", xlab="Gender", col=c("blue","pink"))

The boxplot suggests that women (Sex = 1) report higher relationship satisfaction than men (Sex = 0).

plot(WeeklyLab7$RelationshipSatisfaction,WeeklyLab7$Age, main ="Relationship Satisfaction & Age", ylab="Age", xlab="Relationship Satisfaction")
abline(lm(WeeklyLab7$Age~WeeklyLab7$RelationshipSatisfaction))

Older people appear to be slightly more satisfied, although the trend is weak.

Fourth data analysis - Multiple linear regression model

Model=lm(RelationshipSatisfaction~Sex+Age+ShareInHouseWork+NightsTogether+FinancialSecurity, data = WeeklyLab7)
summary(Model)
## 
## Call:
## lm(formula = RelationshipSatisfaction ~ Sex + Age + ShareInHouseWork + 
##     NightsTogether + FinancialSecurity, data = WeeklyLab7)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -25.5757  -6.8926   0.6291   7.2475  23.0836 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        -2.7333     3.1873  -0.858    0.392    
## Sex                12.7091     1.4750   8.616 2.46e-15 ***
## Age                 0.2761     0.0683   4.043 7.63e-05 ***
## ShareInHouseWork   23.6837     1.4814  15.987  < 2e-16 ***
## NightsTogether      1.6254     0.3204   5.073 9.15e-07 ***
## FinancialSecurity   1.7714     0.2199   8.056 7.99e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.33 on 193 degrees of freedom
## Multiple R-squared:  0.6954, Adjusted R-squared:  0.6875 
## F-statistic:  88.1 on 5 and 193 DF,  p-value: < 2.2e-16
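
The coefficient table above can be read as a prediction equation. As a worked example, the sketch below plugs in a hypothetical person (a 30-year-old woman who shares house work, spends 4 nights together per week, and rates financial security at 5). The coefficients are copied from the summary output; the person's values and the names coefs and person are made up for illustration.

```r
# Coefficients copied from the summary(Model) output above.
coefs <- c(Intercept         = -2.7333,
           Sex               = 12.7091,
           Age               =  0.2761,
           ShareInHouseWork  = 23.6837,
           NightsTogether    =  1.6254,
           FinancialSecurity =  1.7714)

# Hypothetical person (illustrative values only); Intercept = 1 keeps the constant.
person <- c(Intercept         = 1,
            Sex               = 1,    # female
            Age               = 30,
            ShareInHouseWork  = 1,    # shares house work
            NightsTogether    = 4,
            FinancialSecurity = 5)

# Predicted satisfaction = sum of coefficient * value, matched by name.
predicted <- sum(coefs * person[names(coefs)])
predicted   # about 57.3 on the 0-100 scale
```

With the fitted object available, predict(Model, newdata = ...) should return the same value up to rounding of the printed coefficients.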

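The adjusted R-squared of 0.6875 indicates that the model explains about 69% of the variation in relationship satisfaction. All five predictors are significant (p < 0.001). Holding the other variables constant, sharing house work is associated with a 23.7-point increase in satisfaction and being female with a 12.7-point increase, while each additional year of age, night together per week, and point of financial security adds about 0.28, 1.63, and 1.77 points respectively.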
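
The adjusted R-squared in the summary output can be reproduced from the multiple R-squared, the number of observations, and the number of predictors. The sketch below applies the standard adjustment formula to the values printed above (R-squared 0.6954, 199 observations, 5 predictors); the variable names are chosen here for illustration.

```r
# Values taken from the summary(Model) output above.
r2 <- 0.6954   # multiple R-squared
n  <- 199      # observations (193 residual df + 5 predictors + intercept)
p  <- 5        # predictors, not counting the intercept

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 4)   # matches the 0.6875 reported by summary()
```

Because the penalty is small here (5 predictors against 199 observations), the adjusted and unadjusted values differ by less than one percentage point.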

qqnorm(Model$residuals)

The Q-Q plot suggests that the residuals are approximately normally distributed, which supports the normality assumption of the linear model.
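
A Q-Q plot is easier to judge with a reference line, and a formal test can back up the visual impression. The sketch below shows one way to do both with base R functions (qqline() and shapiro.test()). It runs on simulated placeholder data, not the lab data; the seed, the variables x and y, and the model fit are all made up for illustration.

```r
# Simulated placeholder data; NOT the lab data.
set.seed(510)                                   # arbitrary seed for reproducibility
x <- runif(199, min = 0, max = 10)
y <- 2 + 1.5 * x + rnorm(199, sd = 10)

fit <- lm(y ~ x)
res <- residuals(fit)

# Q-Q plot with a reference line through the first and third quartiles.
qqnorm(res, main = "Normal Q-Q plot of residuals")
qqline(res)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal.
sw <- shapiro.test(res)
sw$p.value
```

Applied to the lab model, the same calls would take Model$residuals in place of res. A large Shapiro-Wilk p-value would be consistent with normal residuals, while a small one would call for a closer look at the plot.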