Introduction:
According to the CDC, the number of suicides is on the rise. The main objective of this project is to explore the reasons why people end their lives. The working assumption is that unhappiness is a root cause of suicide, so this project also examines where that unhappiness comes from.
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.4.2
## Warning: package 'ggplot2' was built under R version 4.4.2
## Warning: package 'tibble' was built under R version 4.4.2
## Warning: package 'stringr' was built under R version 4.4.2
## Warning: package 'lubridate' was built under R version 4.4.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(ggthemes)
library(stringr)
library(Hmisc)
## Warning: package 'Hmisc' was built under R version 4.4.2
##
## Attaching package: 'Hmisc'
##
## The following objects are masked from 'package:dplyr':
##
## src, summarize
##
## The following objects are masked from 'package:base':
##
## format.pval, units
library(ggplot2)
library(corrplot)
## corrplot 0.95 loaded
library(RColorBrewer)
library(countrycode)
## Warning: package 'countrycode' was built under R version 4.4.2
library(remotes)
library(caret)
## Warning: package 'caret' was built under R version 4.4.2
## Loading required package: lattice
##
## Attaching package: 'caret'
##
## The following object is masked from 'package:purrr':
##
## lift
library("ranger")
## Warning: package 'ranger' was built under R version 4.4.2
suicide_data_from_1985_2001<- read.csv("https://raw.githubusercontent.com/asadny82/Data607/refs/heads/main/suicide_data_from_1985_2001.csv")
head(suicide_data_from_1985_2001)
## country year sex age suicides_no population suicides.100k.pop
## 1 Albania 1987 male 15-24 years 21 312900 6.71
## 2 Albania 1987 male 35-54 years 16 308000 5.19
## 3 Albania 1987 female 15-24 years 14 289700 4.83
## 4 Albania 1987 male 75+ years 1 21800 4.59
## 5 Albania 1987 male 25-34 years 9 274300 3.28
## 6 Albania 1987 female 75+ years 1 35600 2.81
## country.year HDI.for.year gdp_for_year.... gdp_per_capita.... generation
## 1 Albania1987 NA 2,156,624,900 796 Generation X
## 2 Albania1987 NA 2,156,624,900 796 Silent
## 3 Albania1987 NA 2,156,624,900 796 Generation X
## 4 Albania1987 NA 2,156,624,900 796 G.I. Generation
## 5 Albania1987 NA 2,156,624,900 796 Boomers
## 6 Albania1987 NA 2,156,624,900 796 G.I. Generation
colnames(suicide_data_from_1985_2001) <- c("country", "year","sex","age","suicides_no","population","Suicide_rate","country.year","HDI.for.year","gdp_for_year","gdp_per_capita","generation")
# Data cleaning: convert the first four columns to factors, then check the data types
suicide_data_from_1985_2001[c(1,2,3,4)] <- lapply(suicide_data_from_1985_2001[c(1,2,3,4)],factor)
suicide_data_from_1985_2001$year <- factor(suicide_data_from_1985_2001$year, ordered = TRUE)
sapply(suicide_data_from_1985_2001, class)
## $country
## [1] "factor"
##
## $year
## [1] "ordered" "factor"
##
## $sex
## [1] "factor"
##
## $age
## [1] "factor"
##
## $suicides_no
## [1] "integer"
##
## $population
## [1] "integer"
##
## $Suicide_rate
## [1] "numeric"
##
## $country.year
## [1] "character"
##
## $HDI.for.year
## [1] "numeric"
##
## $gdp_for_year
## [1] "character"
##
## $gdp_per_capita
## [1] "integer"
##
## $generation
## [1] "character"
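The class listing above shows gdp_for_year stored as character, because the raw values contain thousands separators (for example 2,156,624,900). A minimal sketch of one way to convert such strings to numbers with base R; the vector gdp_raw is a hypothetical stand-in for the column, not an object from this analysis:

```r
# Hypothetical stand-in for the gdp_for_year column (value as printed by head()).
gdp_raw <- c("2,156,624,900", "2,156,624,900")

# Strip the thousands separators, then convert to numeric.
gdp_num <- as.numeric(gsub(",", "", gdp_raw, fixed = TRUE))
gdp_num
```

The same expression could be applied inside mutate() if total GDP were needed as a numeric variable.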
data <- suicide_data_from_1985_2001 %>% mutate(age = str_remove(age,'years'))
data <- data %>% mutate(age=str_remove(age," "))
head(data$age, n=6)
## [1] "15-24" "35-54" "15-24" "75+" "25-34" "75+"
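The two str_remove() calls above could also be collapsed into a single step. A sketch using base R's sub() with a regular expression, shown on a hypothetical vector (age_raw) in the same format as the age column:

```r
# Hypothetical values in the same format as the age column.
age_raw <- c("15-24 years", "35-54 years", "75+ years")

# Remove the trailing "years" together with any whitespace before it.
age_clean <- sub("\\s*years$", "", age_raw)
age_clean
```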
In this project, I look for the root causes of suicide: in which countries people die by suicide, and what reasons lie behind suicide in each country. First, I pulled the data from the Kaggle website and added it to my GitHub repository. The data are then read from GitHub.
World_Happiness_Report <-read.csv("https://raw.githubusercontent.com/asadny82/Data607/refs/heads/main/World_Happiness_Report_2015.csv")
glimpse(World_Happiness_Report)
## Rows: 155
## Columns: 12
## $ Country <chr> "Norway", "Denmark", "Iceland", "Switzer…
## $ Happiness.Rank <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1…
## $ Happiness.Score <dbl> 7.537, 7.522, 7.504, 7.494, 7.469, 7.377…
## $ Whisker.high <dbl> 7.594445, 7.581728, 7.622030, 7.561772, …
## $ Whisker.low <dbl> 7.479556, 7.462272, 7.385970, 7.426227, …
## $ Economy..GDP.per.Capita. <dbl> 1.616463, 1.482383, 1.480633, 1.564980, …
## $ Family <dbl> 1.533524, 1.551122, 1.610574, 1.516912, …
## $ Health..Life.Expectancy. <dbl> 0.7966665, 0.7925655, 0.8335521, 0.85813…
## $ Freedom <dbl> 0.6354226, 0.6260067, 0.6271626, 0.62007…
## $ Generosity <dbl> 0.36201224, 0.35528049, 0.47554022, 0.29…
## $ Trust..Government.Corruption. <dbl> 0.31596384, 0.40077007, 0.15352656, 0.36…
## $ Dystopia.Residual <dbl> 2.277027, 2.313707, 2.322715, 2.276716, …
happiness_rank_data <- World_Happiness_Report %>%
rename(Happyness_Rank = Happiness.Rank, Happiness_Score= Happiness.Score,Life_Expectancy = Health..Life.Expectancy.
,Trust = Trust..Government.Corruption.,Whisker_high=Whisker.high, Whisker_low=Whisker.low, Economy_GDP_per_Capita=Economy..GDP.per.Capita., Country=Country) %>%
select(-Whisker_high, -Whisker_low, -'Economy_GDP_per_Capita',-Generosity)%>%
group_by(Country)
head(happiness_rank_data)
## # A tibble: 6 × 8
## # Groups: Country [6]
## Country Happyness_Rank Happiness_Score Family Life_Expectancy Freedom Trust
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Norway 1 7.54 1.53 0.797 0.635 0.316
## 2 Denmark 2 7.52 1.55 0.793 0.626 0.401
## 3 Iceland 3 7.50 1.61 0.834 0.627 0.154
## 4 Switzerla… 4 7.49 1.52 0.858 0.620 0.367
## 5 Finland 5 7.47 1.54 0.809 0.618 0.383
## 6 Netherlan… 6 7.38 1.43 0.811 0.585 0.283
## # ℹ 1 more variable: Dystopia.Residual <dbl>
data %>%
  ggplot(aes(population))+geom_histogram(fill="deepskyblue2",color="navy")+
  labs(y="Absolute Frequency",x="Population")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
data %>%
  ggplot(aes(suicides_no))+geom_histogram(fill="deepskyblue2",color="navy")+
  labs(y="Absolute Frequency",x="Number of suicides")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Frequency distribution of the suicide rate
data %>%
  ggplot(aes(Suicide_rate))+geom_histogram(fill="deepskyblue2",color="navy")+
  labs(y="Absolute Frequency",x="Suicide rate per 100,000 population")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
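The stat_bin() messages above suggest choosing a bin width explicitly. One common heuristic is the Freedman-Diaconis rule, 2 * IQR / n^(1/3). The sketch below is only an illustration: the helper fd_binwidth is hypothetical (it is not part of ggplot2), and the input is simulated rather than taken from the suicide data.

```r
# Freedman-Diaconis bin width: 2 * IQR(x) / n^(1/3), ignoring missing values.
fd_binwidth <- function(x) {
  x <- x[!is.na(x)]
  2 * IQR(x) / length(x)^(1/3)
}

set.seed(1)
x <- rexp(1000, rate = 1 / 10)  # simulated right-skewed values, for illustration only
bw <- fd_binwidth(x)
bw
```

The result could then be passed to the plot as geom_histogram(binwidth = bw).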
data %>%
  ggplot(aes(sex))+geom_bar(fill="deepskyblue2",color="navy")+
  labs(y="Absolute Frequency",x="Sex")
Suicide rate by year, from 1985 to 2016:
Suicide_rate_visualize <- data %>%
ggplot(aes(year,Suicide_rate))+
geom_point()+
geom_smooth(method="lm", se=FALSE)+
labs( title="", x="Year", y="Suicide Rate per 100,000")
Suicide_rate_visualize
## `geom_smooth()` using formula = 'y ~ x'
# Generation pie chart
generation_count <- data %>% count(generation)
pie(generation_count$n,labels = generation_count$generation, radius=1,col = c("orange", "green","yellow","blue"),main="Generation")
# Compare suicide rates across countries (top 25)
suicide_data_from_1985_2001 %>% group_by(country) %>% summarise(country_suicide_rate_=sum(suicides_no)*100000/sum(population))%>%top_n(25)%>%
ggplot(aes(reorder(country,country_suicide_rate_),country_suicide_rate_))+
geom_bar(stat="identity",fill="red",color="navy")+
coord_flip()+
labs(x="country", y="Suicide rate per 100000 population")+
ggtitle("Suicide rates by country")
## Selecting by country_suicide_rate_
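The country-level rate above is a pooled rate, sum(suicides_no) * 100000 / sum(population), which weights each row by its population. Averaging the per-row rates with mean(Suicide_rate), as the continent plot further down does, weights every row equally and can give a very different number. A small worked example with made-up values (the data frame toy is hypothetical):

```r
# Made-up values for two demographic groups in one country.
toy <- data.frame(
  suicides_no = c(10, 1),
  population  = c(100000, 1000000)
)

# Per-row rate per 100,000 population.
toy$rate <- toy$suicides_no * 100000 / toy$population

# Pooled rate: total suicides over total population.
pooled_rate <- sum(toy$suicides_no) * 100000 / sum(toy$population)

# Unweighted mean of the per-row rates.
mean_rate <- mean(toy$rate)

pooled_rate  # 1
mean_rate    # 5.05
```

Here the small, high-rate group dominates the unweighted mean, while the pooled rate reflects the population as a whole.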
suicide_data_from_1985_2001$continent <- countrycode(sourcevar = suicide_data_from_1985_2001[,"country"],origin = "country.name",destination = "continent")
suicide_data_from_1985_2001 %>% group_by(country,continent)%>%
summarise(avg_suicide_rate=mean(Suicide_rate))%>%
ggplot(aes(continent,avg_suicide_rate))+
geom_boxplot(fill="red",color="blue")+
labs(x="continent",y="Suicide rate per 100000 population")+
ggtitle("Suicide rate by continent")
## `summarise()` has grouped output by 'country'. You can override using the
## `.groups` argument.
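The message above appears because summarise() drops only the last grouping variable by default. A sketch of making the choice explicit with the .groups argument, shown on a made-up data frame (toy and its values are hypothetical):

```r
library(dplyr)

# Made-up rows: two for country A, one for country B.
toy <- data.frame(
  country      = c("A", "A", "B"),
  continent    = c("Europe", "Europe", "Asia"),
  Suicide_rate = c(5, 7, 3)
)

# .groups = "drop" returns an ungrouped result and silences the message.
avg <- toy %>%
  group_by(country, continent) %>%
  summarise(avg_suicide_rate = mean(Suicide_rate), .groups = "drop")

avg
```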
suicide_data_from_1985_2001 %>%group_by(country,year)%>%
summarise(pop=mean(population),Suicide_rate=sum(suicides_no)*100000/sum(population),pop=sum(pop)) %>%ungroup()%>%
group_by(country)%>%
summarise(pop=sum(pop),Suicide_rate=mean(Suicide_rate))%>%
ggplot(aes(Suicide_rate,pop))+
geom_point(fill="red",color="blue")+
geom_text(data=. %>%filter(Suicide_rate>35 | pop >40000000),
aes(label = country, col=country),
position = "dodge")+stat_smooth(method = "lm",color="green",size=1)+
theme(legend.position = "none")+
labs(x="Suicide rate",y="Population")+
ggtitle("suicide Rate by population size")
## `summarise()` has grouped output by 'country'. You can override using the
## `.groups` argument.
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Width not defined
## ℹ Set with `position_dodge(width = ...)`
# Effect of national wealth on suicide rate.
suicide_data_from_1985_2001 %>% group_by(country)%>%
summarise(Suicide_rate=sum(suicides_no)*100000/sum(population),
gdp_per_capita=mean(gdp_per_capita),
pop=sum(as.numeric(population)))%>%
arrange(desc(gdp_per_capita))%>%
ggplot(aes(gdp_per_capita,Suicide_rate))+
geom_point(fill="red",color="navy")+
stat_smooth(method = "lm", color="green",size=1)+
geom_text(data=.%>% filter(gdp_per_capita>64000| Suicide_rate>40),aes(gdp_per_capita,Suicide_rate, label = country,col=country))+
ggtitle("gdp per capita vs suicide Rate")+
theme(legend.position = "none")
## `geom_smooth()` using formula = 'y ~ x'
# Suicide rate by age group.
# Order the age groups from youngest to oldest so the boxplot axis is sorted.
age_levels <- c("5-14 years","15-24 years","25-34 years","35-54 years","55-74 years","75+ years")
suicide_data_from_1985_2001$age <- factor(as.character(suicide_data_from_1985_2001$age), levels = age_levels)
suicide_data_from_1985_2001 %>% group_by(age,country) %>%
summarise(Suicide_rate=sum(suicides_no)*100000/sum(population))%>%
ggplot(aes(age,Suicide_rate))+
geom_boxplot(fill="deepskyblue2",col="green")+
labs(x="age group",y="Suicide rate")+
ggtitle("Suicide Rate by age group")+
theme(axis.text.x = element_text(angle=30))
## `summarise()` has grouped output by 'age'. You can override using the `.groups`
## argument.
suicide_data_from_1985_2001 %>% group_by(sex) %>%
  summarise(Suicide_rate=sum(suicides_no)*100000/sum(population))%>%
  ggplot(aes(reorder(sex,Suicide_rate),Suicide_rate,fill=sex))+
  geom_col(color="green")+
  ggtitle("Suicide Rate by sex")+
  scale_color_manual(values = c("deepskyblue2","navyblue"),
                     aesthetics = c("color","fill"))+
  labs(x="sex",y="Suicide rate per 100000 population", fill="sex")
# The next plot shows how happiness rank and happiness score are related. The rankings of national happiness are based on a happiness measurement survey. Nationally representative samples of respondents are asked to think of a ladder, with the best possible life for them being a 10 and the worst possible life being a 0. They are then asked to rate their own current lives on that 0 to 10 scale. The report correlates the life evaluation results with various life factors. Happiness score and happiness rank move in opposite directions: the higher the score, the lower (better) the rank number.
happiness_rank_data %>%
ggplot(aes(Happyness_Rank, Happiness_Score))+
geom_point()+
geom_smooth(method="lm", se=FALSE)+
labs(title = "", x = "Happyness_Rank", y = "Happiness_Score")
## `geom_smooth()` using formula = 'y ~ x'
# The plot above shows that happiness rank and happiness score are strongly negatively correlated.
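Assuming the rank is simply the ordering of countries by score, the relationship is monotone rather than strictly linear, and a rank correlation captures it directly. A quick check on the first six rank and score values printed by glimpse() earlier:

```r
# First six ranks and scores as shown in the glimpse() output.
happiness_rank  <- 1:6
happiness_score <- c(7.537, 7.522, 7.504, 7.494, 7.469, 7.377)

# Spearman correlation compares the orderings of the two variables.
rho <- cor(happiness_rank, happiness_score, method = "spearman")
rho
```

A value of -1 means the two orderings are exact opposites, which is what one would expect if rank 1 always goes to the highest score.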
# Now we will look at the happiness score by country and continent.
First, I create vectors that assign each country to a continent.
happiness_rank_data <- World_Happiness_Report %>%
mutate(Continent = case_when(
Country %in% c("Afghanistan","Azerbaijan","United Arab Emirates", "Singapore", "Thailand", "Taiwan Province of China", "Qatar","Turkey", "Saudi Arabia", "Kuwait", "Bahrain", "Malaysia", "Uzbekistan", "Japan", "South Korea", "Turkmenistan", "Kazakhstan", "Hong Kong S.A.R., China","Israel", "Philippines", "Jordan", "China", "Pakistan", "Indonesia", "Lebanon", "Vietnam", "Tajikistan", "Bhutan", "Kyrgyzstan", "Nepal", "Mongolia", "Palestinian Territories", "Iran", "Bangladesh", "Myanmar", "Iraq", "Sri Lanka", "Armenia", "India", "Georgia", "Cambodia", "Yemen", "Syria") ~ "Asia",
Country %in% c( "Finland","Switzerland","Norway","Bulgaria", "Denmark", "Iceland", "Netherlands", "Sweden", "Austria", "Ireland", "Germany", "Belgium", "Luxembourg", "United Kingdom", "Czech Republic", "Malta", "France", "Spain", "Slovakia", "Poland", "Italy", "Russia", "Lithuania", "Latvia", "Moldova", "Romania", "Slovenia", "North Cyprus", "Cyprus", "Estonia", "Belarus", "Serbia", "Hungary", "Croatia", "Kosovo", "Montenegro", "Greece", "Portugal", "Bosnia and Herzegovina", "Macedonia", "Albania", "Ukraine") ~ "Europe",
Country %in% c("United States","Canada", "Costa Rica", "Mexico", "Panama","Trinidad and Tobago", "El Salvador", "Belize", "Guatemala", "Jamaica", "Nicaragua", "Dominican Republic", "Honduras", "Haiti") ~ "North America",
Country %in% c("Chile", "Argentina", "Uruguay", "Colombia", "Ecuador", "Bolivia", "Peru", "Paraguay", "Venezuela","Brazil") ~ "South America",
Country %in% c("New Zealand", "Australia") ~ "Australia",
TRUE ~ "Africa")) %>%
mutate(Continent = as.factor(Continent)) %>%
select(Country, Continent, everything())
glimpse(happiness_rank_data)
## Rows: 155
## Columns: 13
## $ Country <chr> "Norway", "Denmark", "Iceland", "Switzer…
## $ Continent <fct> Europe, Europe, Europe, Europe, Europe, …
## $ Happiness.Rank <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1…
## $ Happiness.Score <dbl> 7.537, 7.522, 7.504, 7.494, 7.469, 7.377…
## $ Whisker.high <dbl> 7.594445, 7.581728, 7.622030, 7.561772, …
## $ Whisker.low <dbl> 7.479556, 7.462272, 7.385970, 7.426227, …
## $ Economy..GDP.per.Capita. <dbl> 1.616463, 1.482383, 1.480633, 1.564980, …
## $ Family <dbl> 1.533524, 1.551122, 1.610574, 1.516912, …
## $ Health..Life.Expectancy. <dbl> 0.7966665, 0.7925655, 0.8335521, 0.85813…
## $ Freedom <dbl> 0.6354226, 0.6260067, 0.6271626, 0.62007…
## $ Generosity <dbl> 0.36201224, 0.35528049, 0.47554022, 0.29…
## $ Trust..Government.Corruption. <dbl> 0.31596384, 0.40077007, 0.15352656, 0.36…
## $ Dystopia.Residual <dbl> 2.277027, 2.313707, 2.322715, 2.276716, …
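The manual lookup above could also be cross-checked against the countrycode package, which the suicide analysis already uses. The sketch below runs on three country names only. Note that countrycode's continent destination follows a different scheme (for example a single Americas value rather than separate North and South America), and some names in the report may not match and would need manual handling.

```r
library(countrycode)

# Three example names taken from the lists above.
countries <- c("Norway", "Japan", "Brazil")

# Map English country names to continents.
continents <- countrycode(countries,
                          origin = "country.name",
                          destination = "continent")

data.frame(Country = countries, Continent = continents)
```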
happiness_rank_data %>%
select(-Happiness.Rank, -Happiness.Score,-Country, -Continent) %>%
describe()
## .
##
## 9 Variables 155 Observations
## --------------------------------------------------------------------------------
## Whisker.high
## n missing distinct Info Mean pMedian Gmd .05
## 155 0 155 1 5.452 5.451 1.287 3.684
## .10 .25 .50 .75 .90 .95
## 3.962 4.608 5.370 6.195 6.986 7.364
##
## lowest : 2.86488 3.07469 3.46143 3.54303 3.58443
## highest: 7.52754 7.56177 7.58173 7.59444 7.62203
## --------------------------------------------------------------------------------
## Whisker.low
## n missing distinct Info Mean pMedian Gmd .05
## 155 0 155 1 5.256 5.25 1.316 3.448
## .10 .25 .50 .75 .90 .95
## 3.680 4.375 5.193 6.007 6.868 7.231
##
## lowest : 2.52112 2.73531 3.23657 3.26033 3.39596
## highest: 7.38597 7.41046 7.42623 7.46227 7.47956
## --------------------------------------------------------------------------------
## Economy..GDP.per.Capita.
## n missing distinct Info Mean pMedian Gmd .05
## 155 0 155 1 0.9847 1 0.4802 0.2415
## .10 .25 .50 .75 .90 .95
## 0.3687 0.6634 1.0646 1.3180 1.4860 1.5479
##
## lowest : 0 0.0226432 0.0916226 0.0921023 0.119042
## highest: 1.62634 1.63295 1.69228 1.74194 1.87077
## --------------------------------------------------------------------------------
## Family
## n missing distinct Info Mean pMedian Gmd .05
## 155 0 155 1 1.189 1.22 0.3106 0.6213
## .10 .25 .50 .75 .90 .95
## 0.7814 1.0426 1.2539 1.4143 1.4856 1.5215
##
## lowest : 0 0.396103 0.431883 0.4353 0.512569
## highest: 1.5482 1.54897 1.55112 1.55823 1.61057
## --------------------------------------------------------------------------------
## Health..Life.Expectancy.
## n missing distinct Info Mean pMedian Gmd .05
## 155 0 155 1 0.5513 0.5673 0.2677 0.1118
## .10 .25 .50 .75 .90 .95
## 0.1925 0.3699 0.6060 0.7230 0.8273 0.8448
##
## lowest : 0 0.00556475 0.0187727 0.0411347 0.0486422
## highest: 0.888961 0.900214 0.913476 0.943062 0.949492
## --------------------------------------------------------------------------------
## Freedom
## n missing distinct Info Mean pMedian Gmd .05
## 155 0 155 1 0.4088 0.4187 0.1691 0.1179
## .10 .25 .50 .75 .90 .95
## 0.2007 0.3037 0.4375 0.5166 0.5874 0.6133
##
## lowest : 0 0.0149959 0.0303699 0.0599008 0.0815394
## highest: 0.626007 0.627163 0.633376 0.635423 0.658249
## --------------------------------------------------------------------------------
## Generosity
## n missing distinct Info Mean pMedian Gmd .05
## 155 0 155 1 0.2469 0.2378 0.1482 0.05149
## .10 .25 .50 .75 .90 .95
## 0.08534 0.15411 0.23154 0.32376 0.42829 0.48970
##
## lowest : 0 0.0101647 0.0288068 0.03221 0.0437854
## highest: 0.500005 0.572123 0.574731 0.611705 0.838075
## --------------------------------------------------------------------------------
## Trust..Government.Corruption.
## n missing distinct Info Mean pMedian Gmd .05
## 155 0 155 1 0.1231 0.1025 0.1047 0.02072
## .10 .25 .50 .75 .90 .95
## 0.03213 0.05727 0.08985 0.15330 0.28256 0.33724
##
## lowest : 0 0.0043879 0.00896482 0.0100913 0.0110515
## highest: 0.384399 0.40077 0.439299 0.45522 0.464308
## --------------------------------------------------------------------------------
## Dystopia.Residual
## n missing distinct Info Mean pMedian Gmd .05
## 155 0 155 1 1.85 1.853 0.5526 1.056
## .10 .25 .50 .75 .90 .95
## 1.316 1.591 1.833 2.145 2.488 2.731
##
## lowest : 0.377914 0.419389 0.540061 0.554633 0.62113
## highest: 2.80781 2.83715 2.89389 2.89864 3.11748
## --------------------------------------------------------------------------------
happiness_Correlation <- cor(happiness_rank_data[c(3:10)])
corrplot(happiness_Correlation, method = "pie", type = "upper", order = "FPC",
col = brewer.pal(n = 7, name = "GnBu"),
tl.col = "black", cl.align = "r", cl.ratio = 0.3)
# The plot shows which factors are related to people's happiness.
happiness_rank_data %>%
ggplot(aes(Continent, Happiness.Score, color = Continent)) +
geom_violin() +
theme_fivethirtyeight() +
theme(legend.position = "none", plot.title = element_text(hjust = 0.5, vjust = 0.3)) +
labs(title = "Happiness Score by Continent",
x = " ",
y = "Happiness Score")
happiness_rank_data %>%
ggplot(aes(Family, Happiness.Score)) +
geom_point(aes(color = Continent), size = 3, alpha = 0.8) +
geom_smooth(aes(color = Continent, fill = Continent), method = "lm", fullrange = TRUE) +
facet_wrap(~ Continent) +
theme_fivethirtyeight() +
ggtitle("Family")
## `geom_smooth()` using formula = 'y ~ x'
## Warning in qt((1 - level)/2, df): NaNs produced
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
happiness_rank_data %>%
ggplot(aes(Health..Life.Expectancy., Happiness.Score)) +
geom_point(aes(color = Continent), size = 3, alpha = 0.8) +
geom_smooth(aes(color = Continent, fill = Continent), method = "lm", fullrange = TRUE) +
facet_wrap(~ Continent) +
theme_fivethirtyeight() +
ggtitle("Health (Life Expectancy)")
## `geom_smooth()` using formula = 'y ~ x'
## Warning in qt((1 - level)/2, df): NaNs produced
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
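The "NaNs produced" warnings above most likely come from the Australia panel, which contains only two countries (New Zealand and Australia). A straight line through two points leaves zero residual degrees of freedom, so no confidence band can be computed. A sketch with two made-up points (the values are hypothetical, not taken from the report):

```r
# Two made-up observations, standing in for a facet with only two countries.
two_points <- data.frame(x = c(1.40, 1.50), y = c(7.30, 7.28))

fit <- lm(y ~ x, data = two_points)

# Two observations minus two coefficients leaves no residual degrees of freedom.
df.residual(fit)

# The t quantile used for the confidence band is undefined for df = 0.
t_crit <- suppressWarnings(qt(0.975, df = df.residual(fit)))
is.nan(t_crit)
```

Setting se = FALSE in geom_smooth() is one way to avoid the warning, at the cost of hiding the confidence bands in all panels.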
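# Note: both branches of case_when() return "United States", so every row is
# relabelled and the plots below pool all 155 countries in a single panel.
happiness_rank_data <- World_Happiness_Report %>%
  mutate(Country= case_when(
    Country %in% c("United States") ~"United States" ,
    TRUE ~ "United States"))%>%
  mutate(Country = as.factor(Country))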
glimpse(happiness_rank_data)
## Rows: 155
## Columns: 12
## $ Country <fct> United States, United States, United Sta…
## $ Happiness.Rank <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1…
## $ Happiness.Score <dbl> 7.537, 7.522, 7.504, 7.494, 7.469, 7.377…
## $ Whisker.high <dbl> 7.594445, 7.581728, 7.622030, 7.561772, …
## $ Whisker.low <dbl> 7.479556, 7.462272, 7.385970, 7.426227, …
## $ Economy..GDP.per.Capita. <dbl> 1.616463, 1.482383, 1.480633, 1.564980, …
## $ Family <dbl> 1.533524, 1.551122, 1.610574, 1.516912, …
## $ Health..Life.Expectancy. <dbl> 0.7966665, 0.7925655, 0.8335521, 0.85813…
## $ Freedom <dbl> 0.6354226, 0.6260067, 0.6271626, 0.62007…
## $ Generosity <dbl> 0.36201224, 0.35528049, 0.47554022, 0.29…
## $ Trust..Government.Corruption. <dbl> 0.31596384, 0.40077007, 0.15352656, 0.36…
## $ Dystopia.Residual <dbl> 2.277027, 2.313707, 2.322715, 2.276716, …
happiness_rank_data %>%
ggplot(aes(Family, Happiness.Score)) +
geom_point(aes(color = Country), size = 1, alpha = 0.8) +
geom_smooth(aes(color = Country, fill = Country), method = "lm", fullrange = TRUE) +
facet_wrap(~Country) +
theme_fivethirtyeight() +
ggtitle("Family")
## `geom_smooth()` using formula = 'y ~ x'
happiness_rank_data %>%
ggplot(aes(Economy..GDP.per.Capita., Happiness.Score)) +
geom_point(aes(color = Country), size = 1, alpha = 0.8) +
geom_smooth(aes(color = Country, fill = Country), method = "lm", fullrange = TRUE) +
facet_wrap(~Country) +
theme_fivethirtyeight() +
ggtitle("Economy (GDP per Capita)")
## `geom_smooth()` using formula = 'y ~ x'
happiness_rank_data %>%
ggplot(aes(Health..Life.Expectancy., Happiness.Score)) +
geom_point(aes(color = Country), size = 1, alpha = 0.8) +
geom_smooth(aes(color = Country, fill = Country), method = "lm", fullrange = TRUE) +
facet_wrap(~Country) +
theme_fivethirtyeight() +
ggtitle("Health (Life Expectancy)")
## `geom_smooth()` using formula = 'y ~ x'
happiness_rank_data %>%
ggplot(aes(Trust..Government.Corruption., Happiness.Score)) +
geom_point(aes(color = Country), size = 1, alpha = 0.8) +
geom_smooth(aes(color = Country, fill = Country), method = "lm", fullrange = TRUE) +
facet_wrap(~Country) +
theme_fivethirtyeight() +
ggtitle("Trust (Government Corruption)")
## `geom_smooth()` using formula = 'y ~ x'
happiness_rank_data %>%
ggplot(aes(Freedom, Happiness.Score)) +
geom_point(aes(color = Country), size = 1, alpha = 0.8) +
geom_smooth(aes(color = Country, fill = Country), method = "lm", fullrange = TRUE) +
facet_wrap(~Country) +
theme_fivethirtyeight() +
ggtitle("Freedom")
## `geom_smooth()` using formula = 'y ~ x'
Conclusion: In this research I used two data sets. One was about suicide statistics and the other was about possible reasons why people end their lives. I downloaded the data from the Kaggle website, uploaded it to my GitHub repository, and read it from there. Then I cleaned and tidied the data. In my analysis, I examined which age groups and which sex have higher suicide rates, and in which continents and countries suicide rates are higher. I found that suicide rates are highest among people aged 75 and over and among men. Several factors appear to make people unhappy, such as family relationships, the economy, and freedom.