INFO 3010 - Assignment 7
by Lingzi Hong
Instructions
- This is an R Markdown format used for publishing markdown documents
to GitHub. When you click the Knit button, all R code
chunks are run and a markdown file (.md) suitable for publishing to
GitHub is generated.
- Please download the snowstorm.json from Canvas. Fill in the code
chunks for the following questions and submit this R Markdown file to the
assignment on Canvas. Make sure when you save that you have run all
cells, so the outputs display between the cells.
- Make sure to replace the directoryID in the filename with your
student ID.
Q1.
(5pts) Read the file “2015.csv” into a dataframe. Produce a statistical
summary of the dataset.
df <- read.csv("/Users/ahzenthu/Downloads/2015.csv")
summary(df)
## Country Region Happiness_Rank Happiness_Score
## Length:158 Length:158 Min. : 1.00 Min. :2.839
## Class :character Class :character 1st Qu.: 40.25 1st Qu.:4.526
## Mode :character Mode :character Median : 79.50 Median :5.232
## Mean : 79.49 Mean :5.376
## 3rd Qu.:118.75 3rd Qu.:6.244
## Max. :158.00 Max. :7.587
## Lower_Confidence_Interval Upper_Confidence_Interval GDP_per_Capita
## Min. :0.01848 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.03727 1st Qu.:0.5458 1st Qu.:0.8568
## Median :0.04394 Median :0.9102 Median :1.0295
## Mean :0.04788 Mean :0.8461 Mean :0.9910
## 3rd Qu.:0.05230 3rd Qu.:1.1584 3rd Qu.:1.2144
## Max. :0.13693 Max. :1.6904 Max. :1.4022
## Family Life_Expectancy Freedom Government_Corruption
## Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.0000
## 1st Qu.:0.4392 1st Qu.:0.3283 1st Qu.:0.06168 1st Qu.:0.1506
## Median :0.6967 Median :0.4355 Median :0.10722 Median :0.2161
## Mean :0.6303 Mean :0.4286 Mean :0.14342 Mean :0.2373
## 3rd Qu.:0.8110 3rd Qu.:0.5491 3rd Qu.:0.18025 3rd Qu.:0.3099
## Max. :1.0252 Max. :0.6697 Max. :0.55191 Max. :0.7959
## Generosity
## Min. :0.3286
## 1st Qu.:1.7594
## Median :2.0954
## Mean :2.0990
## 3rd Qu.:2.4624
## Max. :3.6021
Q2.
(5pts) Draw a box plot of Happiness_Score for the countries in
each Region group. Write 2-3 sentences to describe your findings.
boxplot(Happiness_Score~Region, data = df)

# We see that Australia has some strange data:
# there isn't really a clear median. There is a clear maximum, but overall the results are strange.
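A box plot can look odd when a group holds only a few observations, so it may help to check the group sizes and medians numerically. The following is a minimal sketch, not part of the original answer: `region_summary` is a hypothetical helper, and `toy` is a made-up stand-in for `df` that only shares its column names.

```r
# Hypothetical helper: number of rows and median Happiness_Score per Region.
region_summary <- function(data) {
  counts <- table(data$Region)
  medians <- tapply(data$Happiness_Score, data$Region, median)
  data.frame(
    n = as.vector(counts),
    median = as.vector(medians),
    row.names = names(counts)
  )
}

# Made-up toy data for illustration only; these are NOT values from 2015.csv.
toy <- data.frame(
  Region = c("A", "A", "A", "B", "B"),
  Happiness_Score = c(5.0, 6.0, 7.0, 4.0, 4.5)
)

toy_summary <- region_summary(toy)
print(toy_summary)
```

Assuming `df` has been loaded as in Q1, calling `region_summary(df)` would list how many countries each box is built from.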
Q3.
(10pts) Draw a scatter plot matrix for variables: Happiness_Score,
Family, Freedom, Government_Corruption. What is the relation between
Happiness_Score and Government_Corruption? What is the relation between
Happiness_Score and Freedom?
pairs(~Happiness_Score+Family+Freedom+Government_Corruption, data = df, main = "big scatterplot", cex = .25)

# The relation between Happiness_Score and Government_Corruption is close to no correlation, with most of the points on the left.
# For Happiness_Score and Freedom, the points cluster towards the bottom, with a few in the top left; there could be a small positive correlation.
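Reading correlations off a scatter plot matrix by eye is imprecise, so a correlation matrix can back up the description with numbers. This is a sketch under assumptions: `cor_matrix` is a hypothetical helper, and `toy` is simulated data that only reuses the column names from `df`, so its correlations say nothing about the real dataset.

```r
# Columns used in the scatter plot matrix for Q3.
vars <- c("Happiness_Score", "Family", "Freedom", "Government_Corruption")

# Hypothetical helper: Pearson correlations between the Q3 variables,
# rounded to two decimals, ignoring rows with missing values.
cor_matrix <- function(data) {
  round(cor(data[, vars], use = "complete.obs"), 2)
}

# Simulated stand-in for df; these are NOT values from 2015.csv.
set.seed(1)
toy <- data.frame(
  Happiness_Score = runif(30, 3, 7.5),
  Family = runif(30, 0, 1.4),
  Freedom = runif(30, 0, 0.7),
  Government_Corruption = runif(30, 0, 0.5)
)

cm <- cor_matrix(toy)
print(cm)
```

With the real data, `cor_matrix(df)["Happiness_Score", ]` would give the correlation of Happiness_Score with each of the other three variables.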
Q4.
(10pts) Build a linear regression model to predict Life_Expectancy with
Happiness_Score, GDP_per_Capita, Family and Government_Corruption.
model <- lm(Life_Expectancy~Happiness_Score+GDP_per_Capita+Family+Government_Corruption, data = df)
summary(model)
##
## Call:
## lm(formula = Life_Expectancy ~ Happiness_Score + GDP_per_Capita +
## Family + Government_Corruption, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.35494 -0.08702 0.00841 0.07806 0.30047
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.03073 0.04757 -0.646 0.519
## Happiness_Score 0.06982 0.01520 4.594 9.01e-06 ***
## GDP_per_Capita 0.04067 0.05145 0.790 0.430
## Family -0.05690 0.05516 -1.032 0.304
## Government_Corruption 0.33536 0.07560 4.436 1.74e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1177 on 153 degrees of freedom
## Multiple R-squared: 0.4057, Adjusted R-squared: 0.3901
## F-statistic: 26.11 on 4 and 153 DF, p-value: < 2.2e-16
Q5.
(10pts) Check model details. Answer the following questions outside the
code box: What is the adjusted R-squared value? Is the linear relation
significant?
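# The adjusted R-squared value is 0.3901.
# The linear relations for Happiness_Score and Government_Corruption are significant because of the *** next to them (p < 0.001).
# The overall model is also significant (F-statistic p-value < 2.2e-16).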
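The adjusted R-squared and the overall F-test p-value can also be pulled out of the summary object instead of being read off the printed output. The sketch below fits the same formula as Q4 on simulated data; `toy` and `fit` are illustrative names, and the simulated values are not from 2015.csv.

```r
# Simulated stand-in for df; these are NOT values from 2015.csv.
set.seed(1)
toy <- data.frame(
  Happiness_Score = runif(50, 3, 7.5),
  GDP_per_Capita = runif(50, 0, 1.5),
  Family = runif(50, 0, 1.4),
  Government_Corruption = runif(50, 0, 0.5)
)
toy$Life_Expectancy <- 0.1 * toy$Happiness_Score + rnorm(50, sd = 0.1)

# Same model formula as in Q4, fitted on the toy data.
fit <- lm(Life_Expectancy ~ Happiness_Score + GDP_per_Capita + Family +
            Government_Corruption, data = toy)
s <- summary(fit)

# Adjusted R-squared as stored in the summary object.
adj_r2 <- s$adj.r.squared

# Overall F-test p-value, computed from the stored F statistic
# and its numerator/denominator degrees of freedom.
f <- s$fstatistic
overall_p <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

# Per-coefficient p-values (the column that the stars are derived from).
coef_p <- coef(s)[, "Pr(>|t|)"]

print(adj_r2)
print(overall_p)
print(coef_p)
```

Applied to the Q4 model, `summary(model)$adj.r.squared` should match the 0.3901 reported in the printed summary.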
Q6.
(10pts) Draw diagnostic plots for the model in Q4.
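One common way to draw diagnostic plots for a linear model is to call `plot()` on the fitted `lm` object, which by default produces four plots: residuals vs fitted, normal Q-Q, scale-location, and residuals vs leverage. The sketch below is one possible approach rather than the required answer; it uses simulated data (`toy`, `fit`) with the Q4 formula so that it runs on its own.

```r
# Simulated stand-in for df; these are NOT values from 2015.csv.
set.seed(1)
toy <- data.frame(
  Happiness_Score = runif(50, 3, 7.5),
  GDP_per_Capita = runif(50, 0, 1.5),
  Family = runif(50, 0, 1.4),
  Government_Corruption = runif(50, 0, 0.5)
)
toy$Life_Expectancy <- 0.1 * toy$Happiness_Score + rnorm(50, sd = 0.1)

# Same model formula as in Q4, fitted on the toy data.
fit <- lm(Life_Expectancy ~ Happiness_Score + GDP_per_Capita + Family +
            Government_Corruption, data = toy)

# Arrange the four default diagnostic plots in a 2 x 2 grid,
# then restore the previous graphics settings.
old_par <- par(mfrow = c(2, 2))
plot(fit)
par(old_par)
```

With the model from Q4 in the workspace, replacing `fit` with `model` in the `plot()` call would draw the same four plots for the real data.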



