Does being happy make you cancer-free?

Exploring Cancer Cases and Happiness Globally

Jason Perez - S3747946

Last updated: 27 October, 2018

Introduction



Introduction Cont.





Problem Statement






Data


Data Cont.

Happiness Data

Cancer Incidence Data

Data Preprocessing

Descriptive Statistics

format(summary(cancer_happiness$Cases), big.mark = ",", trim = TRUE)
##        Min.     1st Qu.      Median        Mean     3rd Qu.        Max. 
##     "1,990"    "17,524"    "41,510"   "313,177"   "184,899" "8,359,517"
round(summary(cancer_happiness$Happiness_Rating),2) 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2.90    4.56    5.34    5.43    6.29    7.66

Visualisation

library(ggplot2)    # ggplot(), geom_histogram()
library(gridExtra)  # grid.arrange()
hist_cancer <- ggplot(cancer_happiness, aes(x = log(Cases))) + geom_histogram(fill = "red", color = "blue") + xlab("log(Number of Cancer Cases)") + ggtitle("Histogram of Cancer Cases")
hist_happiness <- ggplot(cancer_happiness, aes(x = log(Happiness_Rating))) + geom_histogram(fill = "green", color = "blue") + xlab("log(Happiness Rating)") + ggtitle("Histogram of Happiness Rating")
grid.arrange(hist_cancer, hist_happiness, ncol = 2)

Visualisation Cont.

The plot below shows the relationship between happiness rating and cancer incidence for the sampled countries, with one line per year.

# Axis labels match the mapped variables: cases on x, happiness on y (both on a log scale)
ggplot(cancer_happiness, aes(x = log(Cases), y = log(Happiness_Rating), colour = factor(Year))) + geom_line() + labs(x = "log(Cancer Cases)", y = "log(Happiness Rating)", colour = "Year") + ggtitle("Plot of Happiness and Cancer Cases")

Visualisation Cont.

The scatter plot below shows cancer cases against happiness rating for the sampled countries, coloured by year.

# Jittered points with a smoothed trend per year; both axes are on a log scale
ggplot(cancer_happiness, aes(x = log(Happiness_Rating), y = log(Cases), colour = factor(Year))) + geom_jitter(alpha = 1) + geom_smooth(lwd = 0.1, alpha = 0.1) + ggtitle("Scatter Plot of Happiness and Cancer Cases") + labs(x = "log(Happiness Rating)", y = "log(Cancer Cases)", colour = "Year")

Visualisation Cont.

library(dplyr)      # %>%, filter(), mutate()
library(googleVis)  # gvisGeoChart(), gvisTable(), gvisMotionChart(), gvisMerge()
# World map coloured by happiness rating
data_geomap <- gvisGeoChart(cancer_happiness, "Country", "Happiness_Rating", options = list(width = 200, height = 150))
# Table of the 2016 records
cancer_happiness2 <- cancer_happiness %>% filter(Year == 2016)
cancer_happiness2$Year <- as.factor(cancer_happiness2$Year)
data_table <- gvisTable(cancer_happiness2, options = list(width = 200, height = 270))
# Motion chart of log-transformed happiness and cancer cases over time
cancer_happiness_normal <- cancer_happiness %>% mutate(Happiness_norm = log(Happiness_Rating), Cancer_norm = log(Cases))
data_motion <- gvisMotionChart(cancer_happiness_normal, idvar = "Country", timevar = "Year", xvar = "Happiness_norm", yvar = "Cancer_norm", sizevar = "Cases")
# Stack the map above the table, then place the motion chart beside them
map_table <- gvisMerge(data_geomap, data_table, horizontal = FALSE)
map_table_motion <- gvisMerge(data_motion, map_table, horizontal = TRUE, tableOptions = "bgcolor=\"#CCCCCC\" cellspacing = 10")
plot(map_table_motion)

Hypothesis Testing

This study asks a simple question: is there a linear association between a country’s happiness rating and its cancer incidence? We test whether a simple linear regression of cancer cases on happiness rating fits the data. The hypotheses are:

\(H_0:\) The Country’s Happiness and Cancer Cases data do not fit the linear regression model (the slope is zero, \(\beta_1 = 0\))

\(H_A:\) The Country’s Happiness and Cancer Cases data fit the linear regression model (the slope is non-zero, \(\beta_1 \neq 0\))

Assumptions made:

  1. Linearity of the model - linear relationship is present
  2. Normality - this is tested by using the transformed/scaled variables
  3. Independence - assumed to be present when the providers conducted the survey
  4. Homoscedasticity - assumed on the model
  5. Significance level - set at \(\alpha = 0.05\)
  6. Causality - outside the scope of the study, so the model makes no claim that a change in the independent variable causes a change in the dependent variable.
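
The normality and homoscedasticity assumptions above can be checked on the residuals of a fitted model. The following is a minimal sketch, assuming simulated data in place of cancer_happiness (which is not reproduced here). The data frame sim, the simulated coefficients, and the choice of a Shapiro-Wilk test and a Spearman check of residual spread are illustrative assumptions, not the procedure used in this study.

```r
# Simulated stand-in for cancer_happiness (hypothetical values, not the study data)
set.seed(42)
n <- 200
sim <- data.frame(Happiness_Rating = runif(n, min = 2.9, max = 7.7))
sim$Cases <- exp(8 + 0.5 * sim$Happiness_Rating + rnorm(n))

# Fit on the log scale, mirroring the log transform used in the plots
fit <- lm(log(Cases) ~ Happiness_Rating, data = sim)
res <- residuals(fit)

# Normality of residuals: Shapiro-Wilk test
# (a large p-value means no evidence against normality)
normality <- shapiro.test(res)

# Homoscedasticity: absolute residuals should not trend with the fitted values
spread <- cor.test(abs(res), fitted(fit), method = "spearman")

# Compare both p-values with the significance level
alpha <- 0.05
checks <- c(normality = normality$p.value > alpha,
            constant_variance = spread$p.value > alpha)
print(checks)
```

With the real data, the same two checks could be run on the residuals of the fitted model; plot(fit, which = 1:2) gives the matching residual and Q-Q plots.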

Interpretation

The model summary produced by the lm() function is shown below:

results <- lm(Cases ~ Happiness_Rating, data = cancer_happiness)
results %>% summary()
## 
## Call:
## lm(formula = Cases ~ Happiness_Rating, data = cancer_happiness)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -566268 -314423 -167451  -39867 8062280 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -478820     191408  -2.502   0.0126 *  
## Happiness_Rating   145875      34534   4.224 2.77e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 947100 on 603 degrees of freedom
## Multiple R-squared:  0.02874,    Adjusted R-squared:  0.02713 
## F-statistic: 17.84 on 1 and 603 DF,  p-value: 2.77e-05
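
To make the fitted coefficients concrete, the sketch below plugs two happiness ratings from the descriptive statistics (the mean, 5.43, and the minimum, 2.90) into the estimated line, Cases = -478820 + 145875 x Happiness_Rating. The helper predict_cases is a hypothetical name introduced for illustration; the coefficients are copied from the summary above.

```r
# Coefficients copied from the lm summary
intercept <- -478820
slope <- 145875

# Hypothetical helper: predicted cases for a given happiness rating
predict_cases <- function(happiness) intercept + slope * happiness

at_mean <- predict_cases(5.43)  # 313281.25, close to the sample mean of 313,177 cases
at_min  <- predict_cases(2.90)  # -55782.5, a negative count

print(c(at_mean = at_mean, at_min = at_min))
```

A least-squares line passes through the point of means, which is why the first value nearly matches the mean case count; the small gap comes from rounding the mean rating to 5.43. The negative prediction at the minimum rating suggests the untransformed linear fit describes the low end of the scale only roughly.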

Interpretation Cont.

The Pearson correlation coefficient results are shown below:

cor.test(cancer_happiness$Cases, cancer_happiness$Happiness_Rating, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  cancer_happiness$Cases and cancer_happiness$Happiness_Rating
## t = 4.2241, df = 603, p-value = 2.77e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.09104636 0.24591814
## sample estimates:
##       cor 
## 0.1695287
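
For a regression with a single predictor, the Pearson test and the lm summary are two views of the same calculation. The sketch below uses only the numbers printed above to recompute R-squared, the t statistic, the F statistic, and the p-value from the correlation coefficient; the variable names are chosen for illustration.

```r
r  <- 0.1695287   # sample correlation reported by cor.test
df <- 603         # residual degrees of freedom (n - 2)

r_squared <- r^2                           # about 0.02874, the Multiple R-squared
t_stat    <- r * sqrt(df) / sqrt(1 - r^2)  # about 4.224, the t value of the slope
f_stat    <- t_stat^2                      # about 17.84, the F-statistic
p_value   <- 2 * pt(-abs(t_stat), df)      # two-sided p-value

print(round(c(r_squared = r_squared, t = t_stat, F = f_stat), 4))
print(signif(p_value, 3))
```

On these figures, happiness rating accounts for roughly 2.9% of the variance in case counts: the association is statistically significant at the 0.05 level but weak.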


Interpretation Cont.






Discussion



Proposed Action for Future Investigation



Final Conclusion



References