INFO 3010 - Assignment 7

by Lingzi Hong

Instructions

  1. This is an R Markdown document used for publishing Markdown documents to GitHub. When you click the Knit button, all R code chunks are run and a Markdown file (.md) suitable for publishing to GitHub is generated.
  2. Please download snowstorm.json from Canvas. Fill in the code chunks for the following questions and submit this R Markdown file to the assignment on Canvas. Before saving, make sure you have run all cells so the outputs display between the cells.
  3. Make sure to replace directoryID in the filename with your student ID.

Q1. (5pts) Read the file “2015.csv” into a data frame. Produce a statistical summary of the dataset.

```r
# Read 2015.csv from the working directory (adjust the path if the file is elsewhere)
df <- read.csv("2015.csv")
summary(df)
```

```
##    Country             Region          Happiness_Rank   Happiness_Score
##  Length:158         Length:158         Min.   :  1.00   Min.   :2.839  
##  Class :character   Class :character   1st Qu.: 40.25   1st Qu.:4.526  
##  Mode  :character   Mode  :character   Median : 79.50   Median :5.232  
##                                        Mean   : 79.49   Mean   :5.376  
##                                        3rd Qu.:118.75   3rd Qu.:6.244  
##                                        Max.   :158.00   Max.   :7.587  
##  Lower_Confidence_Interval Upper_Confidence_Interval GDP_per_Capita  
##  Min.   :0.01848           Min.   :0.0000            Min.   :0.0000  
##  1st Qu.:0.03727           1st Qu.:0.5458            1st Qu.:0.8568  
##  Median :0.04394           Median :0.9102            Median :1.0295  
##  Mean   :0.04788           Mean   :0.8461            Mean   :0.9910  
##  3rd Qu.:0.05230           3rd Qu.:1.1584            3rd Qu.:1.2144  
##  Max.   :0.13693           Max.   :1.6904            Max.   :1.4022  
##      Family       Life_Expectancy     Freedom        Government_Corruption
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.00000   Min.   :0.0000       
##  1st Qu.:0.4392   1st Qu.:0.3283   1st Qu.:0.06168   1st Qu.:0.1506       
##  Median :0.6967   Median :0.4355   Median :0.10722   Median :0.2161       
##  Mean   :0.6303   Mean   :0.4286   Mean   :0.14342   Mean   :0.2373       
##  3rd Qu.:0.8110   3rd Qu.:0.5491   3rd Qu.:0.18025   3rd Qu.:0.3099       
##  Max.   :1.0252   Max.   :0.6697   Max.   :0.55191   Max.   :0.7959       
##    Generosity    
##  Min.   :0.3286  
##  1st Qu.:1.7594  
##  Median :2.0954  
##  Mean   :2.0990  
##  3rd Qu.:2.4624  
##  Max.   :3.6021
```
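summary() reports quartiles and the mean but no measure of spread such as the standard deviation. A minimal sketch that adds the mean and standard deviation of every numeric column; `numeric_spread` is a hypothetical helper name, and it is demonstrated on a small made-up data frame rather than the real 2015 data:

```r
# Sketch: mean and standard deviation of every numeric column,
# complementing the quartiles reported by summary().
numeric_spread <- function(d) {
  num <- d[vapply(d, is.numeric, logical(1))]  # keep numeric columns only
  data.frame(
    variable = names(num),
    mean = vapply(num, mean, numeric(1), na.rm = TRUE),
    sd = vapply(num, sd, numeric(1), na.rm = TRUE),
    row.names = NULL
  )
}

# Hypothetical three-row data frame, for illustration only
toy <- data.frame(
  Country = c("A", "B", "C"),
  Happiness_Score = c(4, 5, 6),
  Freedom = c(0.1, 0.2, 0.3)
)
res <- numeric_spread(toy)
print(res)
```

On the assignment data, the same call would be `numeric_spread(df)`. Character columns such as Country and Region are skipped automatically.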

Q2. (5pts) Draw box plots of Happiness_Score for the countries in each Region group. Write 2-3 sentences to describe your findings.

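```r
# Region names are long, so rotate the axis labels and widen the bottom margin
par(mar = c(12, 4, 2, 1))
boxplot(Happiness_Score ~ Region, data = df, las = 2, cex.axis = 0.7,
        xlab = "", ylab = "Happiness_Score")
```

The Australia and New Zealand region stands out: its box has no clearly distinguishable median and looks unusual compared with the other regions, likely because the region contains very few countries. Overall, the results for that region are hard to interpret from the box plot alone.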
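A box plot does not show how many countries fall into each region, which matters when a group looks degenerate. A minimal sketch, assuming the Region and Happiness_Score columns from Q1, that counts countries and computes the median score per region; `region_stats` is a hypothetical helper name, demonstrated on made-up data:

```r
# Sketch: number of countries and median Happiness_Score per region.
region_stats <- function(d) {
  counts <- table(d$Region)                               # countries per region
  medians <- tapply(d$Happiness_Score, d$Region, median)  # median score per region
  out <- data.frame(
    Region = names(counts),
    n = as.integer(counts),
    median_score = as.numeric(medians[names(counts)]),
    row.names = NULL
  )
  out[order(out$median_score, decreasing = TRUE), ]       # highest median first
}

# Hypothetical data: two regions with three and two countries
toy <- data.frame(
  Region = c("R1", "R1", "R1", "R2", "R2"),
  Happiness_Score = c(5, 6, 7, 3, 4)
)
res <- region_stats(toy)
print(res)
```

Running `region_stats(df)` on the assignment data would show whether a region's box is built from only a handful of countries.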

Q3. (10pts) Draw a scatter plot matrix for variables: Happiness_Score, Family, Freedom, Government_Corruption. What is the relation between Happiness_Score and Government_Corruption? What is the relation between Happiness_Score and Freedom?

```r
pairs(~ Happiness_Score + Family + Freedom + Government_Corruption,
      data = df, main = "Scatter plot matrix of happiness variables", cex = 0.25)
```

Happiness_Score and Government_Corruption: there is little to no visible correlation, and most of the points cluster on the left side of the panel.

Happiness_Score and Freedom: most points cluster toward the bottom of the panel, with a few in the upper left, so there may be at most a weak positive correlation.
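Reading correlation from a scatter plot is subjective. A minimal sketch that quantifies it with Pearson's correlation coefficient between Happiness_Score and each other variable in the matrix; `happiness_cor` is a hypothetical helper name, and the demo data are made up so that the expected values are exactly 1, -1, and 0:

```r
# Sketch: Pearson correlation of Happiness_Score with each listed variable.
happiness_cor <- function(d, vars = c("Family", "Freedom", "Government_Corruption")) {
  # use = "complete.obs" drops rows with missing values before computing r
  sapply(vars, function(v) cor(d$Happiness_Score, d[[v]], use = "complete.obs"))
}

# Hypothetical data: perfectly positive, perfectly negative, and uncorrelated columns
toy <- data.frame(
  Happiness_Score = 1:5,
  Family = c(2, 4, 6, 8, 10),
  Freedom = c(5, 4, 3, 2, 1),
  Government_Corruption = c(1, 3, 2, 3, 1)
)
r <- happiness_cor(toy)
print(round(r, 3))
```

Applied as `happiness_cor(df)`, values near 0 would support "little to no correlation" and small positive values a "weak positive correlation".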

Q4. (10pts) Build a linear regression model to predict Life_Expectancy from Happiness_Score, GDP_per_Capita, Family, and Government_Corruption.

```r
model <- lm(Life_Expectancy ~ Happiness_Score + GDP_per_Capita + Family + Government_Corruption,
            data = df)
summary(model)
```

```
## 
## Call:
## lm(formula = Life_Expectancy ~ Happiness_Score + GDP_per_Capita + 
##     Family + Government_Corruption, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.35494 -0.08702  0.00841  0.07806  0.30047 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           -0.03073    0.04757  -0.646    0.519    
## Happiness_Score        0.06982    0.01520   4.594 9.01e-06 ***
## GDP_per_Capita         0.04067    0.05145   0.790    0.430    
## Family                -0.05690    0.05516  -1.032    0.304    
## Government_Corruption  0.33536    0.07560   4.436 1.74e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1177 on 153 degrees of freedom
## Multiple R-squared:  0.4057, Adjusted R-squared:  0.3901 
## F-statistic: 26.11 on 4 and 153 DF,  p-value: < 2.2e-16
```
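Reading the Estimate column of the summary, the fitted model can be written as follows (coefficients as printed, rounded to five decimals):

$$
\begin{aligned}
\widehat{\text{Life\_Expectancy}} = {} & -0.03073 + 0.06982\,\text{Happiness\_Score} + 0.04067\,\text{GDP\_per\_Capita} \\
& - 0.05690\,\text{Family} + 0.33536\,\text{Government\_Corruption}
\end{aligned}
$$

For example, holding GDP_per_Capita, Family, and Government_Corruption fixed, a one-unit increase in Happiness_Score is associated with an increase of about 0.070 in predicted Life_Expectancy.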

Q5. (10pts) Check model details. Answer the following questions outside the code box: What is the adjusted R-squared value? Is the linear relation significant?

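The adjusted R-squared value is 0.3901, so after adjusting for the number of predictors the model explains roughly 39% of the variance in Life_Expectancy.

The linear relation is significant overall: the F-statistic is 26.11 on 4 and 153 degrees of freedom, with a p-value < 2.2e-16. Among the individual predictors, Happiness_Score (p = 9.01e-06) and Government_Corruption (p = 1.74e-05) are significant at the 0.001 level (marked ***), while GDP_per_Capita (p = 0.430) and Family (p = 0.304) are not significant.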
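The Q5 numbers can also be extracted programmatically instead of read off the printed summary. A minimal sketch using only base R accessors on `summary.lm`; `model_report` is a hypothetical helper name, and it is demonstrated on the built-in `mtcars` data, not the happiness data:

```r
# Sketch: pull adjusted R-squared, the overall F-test p-value, and the
# significant terms out of a fitted lm object.
model_report <- function(fit, alpha = 0.05) {
  s <- summary(fit)
  f <- s$fstatistic                                  # named: value, numdf, dendf
  p_model <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
  p_coef <- s$coefficients[, "Pr(>|t|)"]             # per-term p-values
  list(
    adj_r_squared = s$adj.r.squared,
    model_p_value = p_model,
    significant_terms = names(p_coef)[p_coef < alpha]
  )
}

# Illustration on built-in data
fit <- lm(mpg ~ wt + hp, data = mtcars)
report <- model_report(fit)
print(report)
```

Based on the printed output for Q4, `model_report(model)` should report an adjusted R-squared of about 0.3901 and list Happiness_Score and Government_Corruption as the significant terms at alpha = 0.05.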

Q6. (10pts) Draw diagnostic plots for the model in Q4.

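```r
par(mfrow = c(2, 2))  # show the four diagnostic plots in a 2 x 2 grid
plot(model)           # residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
```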
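The residuals vs leverage panel shows Cook's distance contours, but it can help to list the influential observations explicitly. A minimal sketch that flags points whose Cook's distance exceeds 4/n, a common rule of thumb rather than a formal test; `influential_points` is a hypothetical helper name, demonstrated on the built-in `mtcars` data:

```r
# Sketch: indices of observations whose Cook's distance exceeds a cutoff
# (default: the 4/n rule of thumb, where n is the number of observations).
influential_points <- function(fit, cutoff = 4 / nobs(fit)) {
  d <- cooks.distance(fit)
  which(d > cutoff)
}

# Illustration on built-in data
fit <- lm(mpg ~ wt + hp, data = mtcars)
flagged <- influential_points(fit)
print(flagged)
```

For the assignment, `influential_points(model)` would return the row indices in `df` worth inspecting, for example via `df[influential_points(model), "Country"]`.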