Presentation
Social Data Science

Adam Ingwersen, GQR701

January 28, 2016

Introduction

Purpose of Group 15’s project

Key findings

Limitations of found data

Limitations of data gathering approach

Key findings

What the data showed

Improvements

Increasing sample?

Improvements

Methodology?

Extended Analysis

What we did after handing in the project

Packages used

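# "readr" and "countrycode" were each listed twice; every package now appears once
packagesused <- c("MASS", "ggplot2", "plyr", "sandwich", "lmtest", "readr", "tidyr",
                  "lubridate", "dplyr", "stringr", "rpart", "rpart.plot",
                  "class", "countrycode", "rvest", "maps")
# Attach each package; invisible() suppresses the list that lapply() would print
invisible(lapply(packagesused, library, character.only = TRUE))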

Linear Modelling

Stepwise Backward Selection

## Start:  AIC=812.01
## decay ~ Injuried + Category + Region + Dead
## 
##            Df Sum of Sq   RSS    AIC
## - Region    3   191.101 20052 807.61
## - Category  1     4.592 19865 810.05
## - Dead      1    86.342 19947 810.74
## - Injuried  1   134.117 19995 811.13
## <none>                  19861 812.01
## 
## Step:  AIC=807.61
## decay ~ Injuried + Category + Dead
## 
##            Df Sum of Sq   RSS    AIC
## - Dead      1    59.294 20111 806.10
## - Category  1    61.338 20113 806.12
## - Injuried  1   207.317 20259 807.33
## <none>                  20052 807.61
## 
## Step:  AIC=806.1
## decay ~ Injuried + Category
## 
##            Df Sum of Sq   RSS    AIC
## - Category  1     49.64 20161 804.51
## <none>                  20111 806.10
## - Injuried  1    532.73 20644 808.47
## 
## Step:  AIC=804.51
## decay ~ Injuried
## 
##            Df Sum of Sq   RSS    AIC
## <none>                  20161 804.51
## - Injuried  1    497.64 20658 806.59
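
The trace above has the shape produced by backward elimination on AIC: starting from the full model, the term whose removal lowers AIC the most is dropped, and the search stops once `<none>` has the lowest AIC. A minimal sketch of that procedure follows. It assumes simulated data, since the project's `newdf` is not reproduced here: the data frame `toy`, its values, and its factor levels are hypothetical, and only the column names mirror the output. Base R's `step()` is used; the deck loads `MASS`, so `MASS::stepAIC()` may have been the function actually called.

```r
# Hypothetical stand-in data: simulated values, column names taken from the output above
set.seed(1)
n <- 200
toy <- data.frame(
  Injuried = rpois(n, 30),
  Dead     = rpois(n, 10),
  Category = factor(sample(c("Natural disasters", "Terrorism"), n, replace = TRUE)),
  Region   = factor(sample(c("Asia", "Europe", "North America", "South America"),
                           n, replace = TRUE))
)
# Assumed data-generating process: only Injuried affects decay
toy$decay <- -6 - 0.03 * toy$Injuried + rnorm(n, sd = 11)

# Full model with all four candidate regressors
full <- lm(decay ~ Injuried + Category + Region + Dead, data = toy)

# Backward elimination on AIC; set trace = 1 to print a step-by-step table
reduced <- step(full, direction = "backward", trace = 0)

print(formula(reduced))
print(c(full = extractAIC(full)[2], reduced = extractAIC(reduced)[2]))
```

Note that `step()` requires the same rows to be used in every candidate model, so with missing values the data should be restricted to complete cases before the search.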

Linear Modelling

Descriptive Statistics

## 
## Call:
## lm(formula = decay ~ Injuried, data = newdf)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -26.021  -7.921   1.916   5.579  39.556 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -6.64533    0.92048  -7.219 1.53e-11 ***
## Injuried    -0.02709    0.00949  -2.855  0.00483 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.02 on 175 degrees of freedom
##   (228 observations deleted due to missingness)
## Multiple R-squared:  0.04449,    Adjusted R-squared:  0.03903 
## F-statistic: 8.149 on 1 and 175 DF,  p-value: 0.00483
  1. Backward selection retains a single regressor: the number of injured in the related event (Injuried) is the only factor in our data that explains variation in decay rates. A Breusch-Pagan test shows no apparent signs of heteroskedasticity.
  2. The result is intuitive to some extent. However, it is not very interesting (R-squared is only about 0.04), and we believe more interesting results remain to be found.
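
The Breusch-Pagan check mentioned above can be sketched as follows. The studentized (Koenker) form of the test regresses the squared residuals on the regressors and compares n times the auxiliary R-squared against a chi-squared distribution. The data here are simulated and hypothetical (`toy` is not the project's `newdf`), so the resulting p-value says nothing about the real data. If `lmtest` is installed, `lmtest::bptest()` is called for comparison; its default studentized statistic should match the manual one.

```r
# Hypothetical stand-in data with homoskedastic errors by construction
set.seed(2)
n <- 177
toy <- data.frame(Injuried = rpois(n, 30))
toy$decay <- -6.6 - 0.027 * toy$Injuried + rnorm(n, sd = 11)

fit <- lm(decay ~ Injuried, data = toy)

# Auxiliary regression: squared residuals on the original regressor
toy$u2 <- resid(fit)^2
aux <- lm(u2 ~ Injuried, data = toy)

# LM statistic = n * R^2 of the auxiliary regression, chi-squared with
# as many degrees of freedom as there are regressors (here 1)
bp_stat <- n * summary(aux)$r.squared
bp_df   <- length(coef(aux)) - 1
bp_p    <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)
print(c(statistic = bp_stat, df = bp_df, p.value = bp_p))

# Packaged equivalent, run only if lmtest is available
if (requireNamespace("lmtest", quietly = TRUE)) {
  print(lmtest::bptest(fit))
}
```

A large p-value means the null hypothesis of homoskedasticity is not rejected, which is the sense in which the slide reports "no apparent signs" of heteroskedasticity.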

Linear Modelling

Changing regressand

## 
## Call:
## lm(formula = increase ~ Region + Category, data = newdf)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -49.312 -15.816  -5.222  14.590  68.778 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                 15.026     17.458   0.861   0.3904  
## RegionAsia                  -3.188      3.378  -0.944   0.3463  
## RegionEurope                12.902      6.239   2.068   0.0398 *
## RegionNorth America         14.590     16.280   0.896   0.3711  
## RegionSouth America          7.812     25.131   0.311   0.7562  
## CategoryNatural disasters   11.162     20.157   0.554   0.5803  
## CategoryTerrorism            6.385     17.267   0.370   0.7119  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.73 on 216 degrees of freedom
##   (182 observations deleted due to missingness)
## Multiple R-squared:  0.04477,    Adjusted R-squared:  0.01823 
## F-statistic: 1.687 on 6 and 216 DF,  p-value: 0.1253

Linear Modelling

Changing regressand pt. 2

Machine Learning

Decision Trees

Machine Learning

Interpretation

Implications

Omitted factors

Internet accessibility

EXTRA

Map: Rate of Decay

EXTRA

Map: Recorded Events

EXTRA

Pruned Decision Tree including “percentage”