Introduction:

According to the CDC, the number of suicides is on the rise. The main objective of this project is to explore the reasons why people want to end their lives. The starting assumption is that unhappiness is a root cause of suicide, so this project also tries to find out where that unhappiness comes from.

library(tidyverse)

## Warning: package 'tidyverse' was built under R version 4.4.2

## Warning: package 'ggplot2' was built under R version 4.4.2

## Warning: package 'tibble' was built under R version 4.4.2

## Warning: package 'stringr' was built under R version 4.4.2

## Warning: package 'lubridate' was built under R version 4.4.2

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(dplyr)
library(ggthemes)
library(stringr)
library(Hmisc)

## Warning: package 'Hmisc' was built under R version 4.4.2

## 
## Attaching package: 'Hmisc'
## 
## The following objects are masked from 'package:dplyr':
## 
##     src, summarize
## 
## The following objects are masked from 'package:base':
## 
##     format.pval, units

library(ggplot2)
library(corrplot)

## corrplot 0.95 loaded

library(RColorBrewer)
library(countrycode)

## Warning: package 'countrycode' was built under R version 4.4.2

library(remotes)
library(caret)

## Warning: package 'caret' was built under R version 4.4.2

## Loading required package: lattice
## 
## Attaching package: 'caret'
## 
## The following object is masked from 'package:purrr':
## 
##     lift

library("ranger")

## Warning: package 'ranger' was built under R version 4.4.2

suicide_data_from_1985_2001<- read.csv("https://raw.githubusercontent.com/asadny82/Data607/refs/heads/main/suicide_data_from_1985_2001.csv")
head(suicide_data_from_1985_2001)

##   country year    sex         age suicides_no population suicides.100k.pop
## 1 Albania 1987   male 15-24 years          21     312900              6.71
## 2 Albania 1987   male 35-54 years          16     308000              5.19
## 3 Albania 1987 female 15-24 years          14     289700              4.83
## 4 Albania 1987   male   75+ years           1      21800              4.59
## 5 Albania 1987   male 25-34 years           9     274300              3.28
## 6 Albania 1987 female   75+ years           1      35600              2.81
##   country.year HDI.for.year gdp_for_year.... gdp_per_capita....      generation
## 1  Albania1987           NA    2,156,624,900                796    Generation X
## 2  Albania1987           NA    2,156,624,900                796          Silent
## 3  Albania1987           NA    2,156,624,900                796    Generation X
## 4  Albania1987           NA    2,156,624,900                796 G.I. Generation
## 5  Albania1987           NA    2,156,624,900                796         Boomers
## 6  Albania1987           NA    2,156,624,900                796 G.I. Generation

Rename the columns of the data frame

 colnames(suicide_data_from_1985_2001) <- c("country", "year","sex","age","suicides_no","population","Suicide_rate","country.year","HDI.for.year","gdp_for_year","gdp_per_capita","generation")
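
Assigning `colnames()` by position only works while the CSV keeps the same column order. A minimal sketch of renaming by name with `dplyr::rename()` instead, on a small made-up data frame (the `toy` object and its values are assumptions for illustration only):

```r
library(dplyr)

# Toy data frame that mimics three of the raw column names (values are made up)
toy <- data.frame(
  suicides.100k.pop  = c(6.71, 5.19),
  gdp_for_year....   = c("2,156,624,900", "2,156,624,900"),
  gdp_per_capita.... = c(796L, 796L)
)

# rename(new = old): each column is matched by name, not by position
toy_renamed <- toy %>%
  rename(
    Suicide_rate   = suicides.100k.pop,
    gdp_for_year   = gdp_for_year....,
    gdp_per_capita = gdp_per_capita....
  )

names(toy_renamed)
```

If a column is missing or misspelled, `rename()` stops with an error instead of silently mislabelling the data.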

#Data cleaning: check data types

suicide_data_from_1985_2001[c(1,2,3,4)] <- lapply(suicide_data_from_1985_2001[c(1,2,3,4)],factor)
suicide_data_from_1985_2001$year <- factor(suicide_data_from_1985_2001$year, ordered = TRUE)
sapply(suicide_data_from_1985_2001, class)

## $country
## [1] "factor"
## 
## $year
## [1] "ordered" "factor" 
## 
## $sex
## [1] "factor"
## 
## $age
## [1] "factor"
## 
## $suicides_no
## [1] "integer"
## 
## $population
## [1] "integer"
## 
## $Suicide_rate
## [1] "numeric"
## 
## $country.year
## [1] "character"
## 
## $HDI.for.year
## [1] "numeric"
## 
## $gdp_for_year
## [1] "character"
## 
## $gdp_per_capita
## [1] "integer"
## 
## $generation
## [1] "character"
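
The class listing shows `gdp_for_year` as character, because the raw values contain thousands separators. A minimal sketch of converting such strings to numbers, assuming the commas are the only non-numeric characters (the second value below is made up for illustration):

```r
# Example strings in the same format as the gdp_for_year column
gdp_text <- c("2,156,624,900", "63,067,077,179")

# Strip the thousands separators, then convert. as.numeric (double) is used
# because values this large do not fit in R's 32-bit integer type.
gdp_numeric <- as.numeric(gsub(",", "", gdp_text, fixed = TRUE))

gdp_numeric
class(gdp_numeric)
```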

Remove the word "years" from the age column

data <- suicide_data_from_1985_2001 %>% mutate(age = str_remove(age,'years'))
 data <- data %>% mutate(age=str_remove(age," "))
 head(data$age, n=6)

## [1] "15-24" "35-54" "15-24" "75+"   "25-34" "75+"

In this project I look for the root causes of suicide: in which countries people die by suicide, and what reasons may lie behind suicide in each country. First, I downloaded the data from the Kaggle website and added it to my GitHub repository. I then read the data from GitHub.

World_Happiness_Report <-read.csv("https://raw.githubusercontent.com/asadny82/Data607/refs/heads/main/World_Happiness_Report_2015.csv")
glimpse(World_Happiness_Report)

## Rows: 155
## Columns: 12
## $ Country                       <chr> "Norway", "Denmark", "Iceland", "Switzer…
## $ Happiness.Rank                <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1…
## $ Happiness.Score               <dbl> 7.537, 7.522, 7.504, 7.494, 7.469, 7.377…
## $ Whisker.high                  <dbl> 7.594445, 7.581728, 7.622030, 7.561772, …
## $ Whisker.low                   <dbl> 7.479556, 7.462272, 7.385970, 7.426227, …
## $ Economy..GDP.per.Capita.      <dbl> 1.616463, 1.482383, 1.480633, 1.564980, …
## $ Family                        <dbl> 1.533524, 1.551122, 1.610574, 1.516912, …
## $ Health..Life.Expectancy.      <dbl> 0.7966665, 0.7925655, 0.8335521, 0.85813…
## $ Freedom                       <dbl> 0.6354226, 0.6260067, 0.6271626, 0.62007…
## $ Generosity                    <dbl> 0.36201224, 0.35528049, 0.47554022, 0.29…
## $ Trust..Government.Corruption. <dbl> 0.31596384, 0.40077007, 0.15352656, 0.36…
## $ Dystopia.Residual             <dbl> 2.277027, 2.313707, 2.322715, 2.276716, …

Now I will rename the columns and tidy the happiness data

happiness_rank_data <- World_Happiness_Report %>%
  rename(Happyness_Rank = Happiness.Rank, Happiness_Score= Happiness.Score,Life_Expectancy = Health..Life.Expectancy.
         ,Trust = Trust..Government.Corruption.,Whisker_high=Whisker.high, Whisker_low=Whisker.low, Economy_GDP_per_Capita=Economy..GDP.per.Capita., Country=Country) %>%
  select(-Whisker_high, -Whisker_low, -'Economy_GDP_per_Capita',-Generosity)%>%
  group_by(Country)
head(happiness_rank_data)

## # A tibble: 6 × 8
## # Groups:   Country [6]
##   Country    Happyness_Rank Happiness_Score Family Life_Expectancy Freedom Trust
##   <chr>               <int>           <dbl>  <dbl>           <dbl>   <dbl> <dbl>
## 1 Norway                  1            7.54   1.53           0.797   0.635 0.316
## 2 Denmark                 2            7.52   1.55           0.793   0.626 0.401
## 3 Iceland                 3            7.50   1.61           0.834   0.627 0.154
## 4 Switzerla…              4            7.49   1.52           0.858   0.620 0.367
## 5 Finland                 5            7.47   1.54           0.809   0.618 0.383
## 6 Netherlan…              6            7.38   1.43           0.811   0.585 0.283
## # ℹ 1 more variable: Dystopia.Residual <dbl>

data %>%
  ggplot(aes(population))+geom_histogram(fill="deepskyblue2",color="navy")+
  labs(y="Absolute frequency",x="Population")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
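
The `stat_bin()` message appears because no bin count or bin width was given. A minimal sketch of setting the bins explicitly, on simulated data rather than the real `population` column (the log-normal values, `bins = 40`, and the log scale are all assumptions chosen for illustration):

```r
library(ggplot2)

# Simulated, right-skewed values standing in for population counts
set.seed(1)
toy <- data.frame(population = round(rlnorm(500, meanlog = 13, sdlog = 1.5)))

# An explicit `bins` value silences the message; the log scale spreads out
# the many small values that would otherwise pile up in the first bar
p <- ggplot(toy, aes(population)) +
  geom_histogram(bins = 40, fill = "deepskyblue2", color = "navy") +
  scale_x_log10() +
  labs(x = "Population (log scale)", y = "Absolute frequency")

p
```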

data %>%
  ggplot(aes(suicides_no))+geom_histogram(fill="deepskyblue2",color="navy")+
  labs(y="Absolute frequency",x="Number of suicides")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#Frequency distribution for suicide rate

data %>%
  ggplot(aes(Suicide_rate))+geom_histogram(fill="deepskyblue2",color="navy")+
  labs(y="Absolute frequency",x="Suicide rate per 100,000")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

data %>%
  ggplot(aes(sex))+geom_bar(fill="deepskyblue2",color="navy")+
  labs(y="Absolute frequency",x="Sex")

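Suicide rate by year, 1985 to 2016

# year was converted to an ordered factor above, so it is turned back into a
# number here; with a factor on the x axis geom_smooth() fits each year
# separately and cannot draw one trend line across years
Suicide_rate_visualize <- data %>%
  ggplot(aes(as.numeric(as.character(year)),Suicide_rate))+
  geom_point()+
  geom_smooth(method="lm", se=FALSE)+
  labs( title="", x="Year", y="Suicide rate per 100,000")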

Suicide_rate_visualize

## `geom_smooth()` using formula = 'y ~ x'

#Generation pie chart.

generation_count <- data %>% count(generation)
pie(generation_count$n,labels = generation_count$generation, radius=1,col = c("orange", "green","yellow","blue"),main="Generation")

#Suicide rate comparison between countries

suicide_data_from_1985_2001 %>% group_by(country) %>% summarise(country_suicide_rate_=sum(suicides_no)*100000/sum(population))%>%top_n(25)%>%
  ggplot(aes(reorder(country,country_suicide_rate_),country_suicide_rate_))+
  geom_bar(stat="identity",fill="red",color="navy")+
  coord_flip()+
  labs(x="country", y="Suicide rate per 100000 population")+
  ggtitle("Suicide rates by country")

## Selecting by country_suicide_rate_
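
The country rate above is a pooled rate: total suicides times 100,000, divided by total population. This is not the same as averaging the per-row rates, which is what `mean(Suicide_rate)` does in the continent summary. A small worked sketch on made-up numbers (the `toy` data frame is an assumption for illustration):

```r
library(dplyr)

# Country A has one large group and one very small group with a high rate
toy <- data.frame(
  country     = c("A", "A", "B", "B"),
  suicides_no = c(10, 1, 5, 5),
  population  = c(100000, 1000, 50000, 50000)
)

rates <- toy %>%
  group_by(country) %>%
  summarise(
    # pooled rate: every person counts equally
    pooled_rate   = sum(suicides_no) * 100000 / sum(population),
    # mean of row rates: every row counts equally, whatever its population
    mean_of_rates = mean(suicides_no * 100000 / population)
  )

rates
```

For country A the pooled rate is 11 * 100000 / 101000, about 10.9, while the mean of the two row rates (10 and 100) is 55. For country B both measures equal 10, because its two groups have the same rate.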

suicide_data_from_1985_2001$continent <- countrycode(sourcevar = suicide_data_from_1985_2001[,"country"],origin = "country.name",destination = "continent")

suicide_data_from_1985_2001 %>% group_by(country,continent)%>%
summarise(avg_suicide_rate=mean(Suicide_rate))%>%
  ggplot(aes(continent,avg_suicide_rate))+
  geom_boxplot(fill="red",color="blue")+
  labs(x="continent",y="Suicide rate per 100000 population")+
  ggtitle("Suicide rate by continent")

## `summarise()` has grouped output by 'country'. You can override using the
## `.groups` argument.
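
The `countrycode()` call above maps country names to continents. A small sketch of what it returns for a few example names (the `countries` vector is an illustration, not taken from the data):

```r
library(countrycode)

# Example country names, chosen only to show the continent coding
countries <- c("Albania", "Japan", "Brazil", "Australia")

continent <- countrycode(countries,
                         origin = "country.name",
                         destination = "continent")

data.frame(countries, continent)
```

The `continent` destination groups North and South America together as "Americas" and places Australia in "Oceania", so these are the category names to expect on the x axis of the box plot.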

Suicide rate by population size

suicide_data_from_1985_2001 %>%group_by(country,year)%>%
  summarise(pop=mean(population),Suicide_rate=sum(suicides_no)*100000/sum(population)) %>%ungroup()%>%
  group_by(country)%>%
  summarise(pop=sum(pop),Suicide_rate=mean(Suicide_rate))%>%
  ggplot(aes(Suicide_rate,pop))+
  geom_point(fill="red",color="blue")+
  # label only the countries with a very high rate or a very large population
  geom_text(data=. %>%filter(Suicide_rate>35 | pop >40000000),
            aes(label = country, col=country))+
  stat_smooth(method = "lm",color="green",linewidth=1)+
  theme(legend.position = "none")+
  labs(x="Suicide rate",y="Population")+
  ggtitle("Suicide rate by population size")

## `summarise()` has grouped output by 'country'. You can override using the
## `.groups` argument.

## `geom_smooth()` using formula = 'y ~ x'

#Effect of national wealth on suicide rate

suicide_data_from_1985_2001 %>% group_by(country)%>%
  summarise(Suicide_rate=sum(suicides_no)*100000/sum(population),
            gdp_per_capita=mean(gdp_per_capita),
            pop=sum(as.numeric(population)))%>%
  arrange(desc(gdp_per_capita))%>%
  ggplot(aes(gdp_per_capita,Suicide_rate))+
  geom_point(fill="red",color="navy")+
  stat_smooth(method = "lm", color="green",linewidth=1)+
  geom_text(data=.%>% filter(gdp_per_capita>64000| Suicide_rate>40),aes(gdp_per_capita,Suicide_rate, label = country,col=country))+
  ggtitle("GDP per capita vs suicide rate")+
  theme(legend.position = "none")

## `geom_smooth()` using formula = 'y ~ x'
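
The green line in the plot is a linear fit, but the plot alone does not say how strong the relationship is. A minimal sketch of how the slope and correlation could be read off numerically, run here on simulated data (the `toy` values, including the built-in slope, are made up and say nothing about the real data):

```r
# Simulated country-level data, for illustration only
set.seed(42)
toy <- data.frame(gdp_per_capita = runif(50, min = 1000, max = 60000))
toy$Suicide_rate <- 10 + 0.00005 * toy$gdp_per_capita + rnorm(50, sd = 3)

# Same model that stat_smooth(method = "lm") draws
fit <- lm(Suicide_rate ~ gdp_per_capita, data = toy)

# Slope, standard error and p-value of the fitted line
coef(summary(fit))

# Pearson correlation between the two variables
cor(toy$gdp_per_capita, toy$Suicide_rate)
```

On the real data, the same two calls would go on the per-country summary table rather than on `toy`.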

# Suicide rate by age group.

# Put the age groups in their natural order (youngest to oldest) so the plots
# do not sort them alphabetically
age_levels <- c("5-14 years","15-24 years","25-34 years","35-54 years","55-74 years","75+ years")
suicide_data_from_1985_2001$age <- factor(as.character(suicide_data_from_1985_2001$age), levels = age_levels)

Suicide rate by age group

suicide_data_from_1985_2001 %>% group_by(age,country) %>%
  summarise(Suicide_rate=sum(suicides_no)*100000/sum(population))%>%
  ggplot(aes(age,Suicide_rate))+
  geom_boxplot(fill="deepskyblue2",col="green")+
  labs(x="age group",y="Suicide rate")+
  ggtitle("Suicide Rate by age group")+
  theme(axis.text.x = element_text(angle=30))

## `summarise()` has grouped output by 'age'. You can override using the `.groups`
## argument.
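
The `summarise()` message above appears whenever a table grouped by more than one column is summarised without saying what should happen to the remaining grouping. A minimal sketch of making that choice explicit with `.groups = "drop"`, on made-up rows (the `toy` data frame is an assumption for illustration):

```r
library(dplyr)

toy <- data.frame(
  age         = c("15-24 years", "15-24 years", "75+ years"),
  country     = c("A", "B", "A"),
  suicides_no = c(2, 3, 4),
  population  = c(1000, 2000, 1000)
)

by_age_country <- toy %>%
  group_by(age, country) %>%
  summarise(Suicide_rate = sum(suicides_no) * 100000 / sum(population),
            .groups = "drop")   # return an ungrouped table, no message

is_grouped_df(by_age_country)
```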

Suicide rate by sex

suicide_data_from_1985_2001 %>% group_by(sex) %>%
  summarise(Suicide_rate=sum(suicides_no)*100000/sum(population))%>%
  ggplot(aes(reorder(sex,Suicide_rate),Suicide_rate,fill=sex))+
  # geom_col() draws bars from values that are already computed
  geom_col(color="green")+
  ggtitle("Suicide Rate by sex")+
  scale_fill_manual(values = c("deepskyblue2","navyblue"))+
  labs(x="sex",y="Suicide rate per 100000 population", fill="sex")

The next plot shows how happiness rank and happiness score are related. The rankings of national happiness are based on a survey: nationally representative samples of respondents are asked to think of a ladder, with the best possible life for them being a 10 and the worst possible life being a 0, and then to rate their own current lives on that 0 to 10 scale. The report correlates the life evaluation results with various life factors. Because rank 1 goes to the country with the highest score, the rank number gets smaller as the score gets larger, so the two are negatively related.

happiness_rank_data %>%
  ggplot(aes(Happyness_Rank, Happiness_Score))+
  geom_point()+
  geom_smooth(method="lm", se=FALSE)+
  labs(title = "", x = "Happiness rank", y = "Happiness score")

## `geom_smooth()` using formula = 'y ~ x'

# The plot above shows that happiness rank and happiness score are correlated
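
Since the rank is simply the ordering of the scores, the relationship is perfectly monotone. A short sketch that checks this with a Spearman correlation, using the first six scores shown in the `glimpse()` output earlier (the vector names are chosen here for illustration):

```r
# First six happiness scores from the data preview
happiness_score <- c(7.537, 7.522, 7.504, 7.494, 7.469, 7.377)

# Rank 1 goes to the highest score, so rank the negated scores
happiness_rank <- rank(-happiness_score)

# Spearman correlation compares orderings rather than raw values
cor(happiness_rank, happiness_score, method = "spearman")
```

The result is -1: a higher score always means a smaller rank number.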

#Now we will find the happiness score by countries and continents.

First, I assign each country to a continent using vectors of country names.

happiness_rank_data <- World_Happiness_Report %>%
  mutate(Continent = case_when( 
  Country %in% c("Afghanistan","Azerbaijan","United Arab Emirates", "Singapore", "Thailand", "Taiwan Province of China", "Qatar","Turkey", "Saudi Arabia", "Kuwait", "Bahrain", "Malaysia", "Uzbekistan", "Japan", "South Korea", "Turkmenistan", "Kazakhstan", "Hong Kong S.A.R., China","Israel", "Philippines", "Jordan", "China", "Pakistan", "Indonesia", "Lebanon", "Vietnam", "Tajikistan", "Bhutan", "Kyrgyzstan", "Nepal", "Mongolia", "Palestinian Territories", "Iran", "Bangladesh", "Myanmar", "Iraq", "Sri Lanka", "Armenia", "India", "Georgia", "Cambodia", "Yemen", "Syria") ~ "Asia",
  
  Country %in%  c( "Finland","Switzerland","Norway","Bulgaria", "Denmark", "Iceland", "Netherlands", "Sweden", "Austria", "Ireland", "Germany", "Belgium", "Luxembourg", "United Kingdom", "Czech Republic", "Malta", "France", "Spain", "Slovakia", "Poland", "Italy", "Russia", "Lithuania", "Latvia", "Moldova", "Romania", "Slovenia", "North Cyprus", "Cyprus", "Estonia", "Belarus", "Serbia", "Hungary", "Croatia", "Kosovo", "Montenegro", "Greece", "Portugal", "Bosnia and Herzegovina", "Macedonia", "Albania", "Ukraine") ~ "Europe",
  
  
  Country %in%  c("United States","Canada", "Costa Rica", "Mexico", "Panama","Trinidad and Tobago", "El Salvador", "Belize", "Guatemala", "Jamaica", "Nicaragua", "Dominican Republic", "Honduras", "Haiti") ~ "North America", 
  
  
  Country %in%  c("Chile", "Argentina", "Uruguay", "Colombia", "Ecuador", "Bolivia", "Peru", "Paraguay", "Venezuela","Brazil") ~ "South America",
  Country %in%  c("New Zealand", "Australia") ~ "Australia",
  
  TRUE ~ "Africa")) %>%
  mutate(Continent = as.factor(Continent)) %>%
  select(Country, Continent, everything()) 

glimpse(happiness_rank_data)

## Rows: 155
## Columns: 13
## $ Country                       <chr> "Norway", "Denmark", "Iceland", "Switzer…
## $ Continent                     <fct> Europe, Europe, Europe, Europe, Europe, …
## $ Happiness.Rank                <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1…
## $ Happiness.Score               <dbl> 7.537, 7.522, 7.504, 7.494, 7.469, 7.377…
## $ Whisker.high                  <dbl> 7.594445, 7.581728, 7.622030, 7.561772, …
## $ Whisker.low                   <dbl> 7.479556, 7.462272, 7.385970, 7.426227, …
## $ Economy..GDP.per.Capita.      <dbl> 1.616463, 1.482383, 1.480633, 1.564980, …
## $ Family                        <dbl> 1.533524, 1.551122, 1.610574, 1.516912, …
## $ Health..Life.Expectancy.      <dbl> 0.7966665, 0.7925655, 0.8335521, 0.85813…
## $ Freedom                       <dbl> 0.6354226, 0.6260067, 0.6271626, 0.62007…
## $ Generosity                    <dbl> 0.36201224, 0.35528049, 0.47554022, 0.29…
## $ Trust..Government.Corruption. <dbl> 0.31596384, 0.40077007, 0.15352656, 0.36…
## $ Dystopia.Residual             <dbl> 2.277027, 2.313707, 2.322715, 2.276716, …

Summarise the factors behind the happiness score

happiness_rank_data %>%
  select(-Happiness.Rank, -Happiness.Score,-Country, -Continent) %>%
  describe()

## . 
## 
##  9  Variables      155  Observations
## --------------------------------------------------------------------------------
## Whisker.high 
##        n  missing distinct     Info     Mean  pMedian      Gmd      .05 
##      155        0      155        1    5.452    5.451    1.287    3.684 
##      .10      .25      .50      .75      .90      .95 
##    3.962    4.608    5.370    6.195    6.986    7.364 
## 
## lowest : 2.86488 3.07469 3.46143 3.54303 3.58443
## highest: 7.52754 7.56177 7.58173 7.59444 7.62203
## --------------------------------------------------------------------------------
## Whisker.low 
##        n  missing distinct     Info     Mean  pMedian      Gmd      .05 
##      155        0      155        1    5.256     5.25    1.316    3.448 
##      .10      .25      .50      .75      .90      .95 
##    3.680    4.375    5.193    6.007    6.868    7.231 
## 
## lowest : 2.52112 2.73531 3.23657 3.26033 3.39596
## highest: 7.38597 7.41046 7.42623 7.46227 7.47956
## --------------------------------------------------------------------------------
## Economy..GDP.per.Capita. 
##        n  missing distinct     Info     Mean  pMedian      Gmd      .05 
##      155        0      155        1   0.9847        1   0.4802   0.2415 
##      .10      .25      .50      .75      .90      .95 
##   0.3687   0.6634   1.0646   1.3180   1.4860   1.5479 
## 
## lowest : 0         0.0226432 0.0916226 0.0921023 0.119042 
## highest: 1.62634   1.63295   1.69228   1.74194   1.87077  
## --------------------------------------------------------------------------------
## Family 
##        n  missing distinct     Info     Mean  pMedian      Gmd      .05 
##      155        0      155        1    1.189     1.22   0.3106   0.6213 
##      .10      .25      .50      .75      .90      .95 
##   0.7814   1.0426   1.2539   1.4143   1.4856   1.5215 
## 
## lowest : 0        0.396103 0.431883 0.4353   0.512569
## highest: 1.5482   1.54897  1.55112  1.55823  1.61057 
## --------------------------------------------------------------------------------
## Health..Life.Expectancy. 
##        n  missing distinct     Info     Mean  pMedian      Gmd      .05 
##      155        0      155        1   0.5513   0.5673   0.2677   0.1118 
##      .10      .25      .50      .75      .90      .95 
##   0.1925   0.3699   0.6060   0.7230   0.8273   0.8448 
## 
## lowest : 0          0.00556475 0.0187727  0.0411347  0.0486422 
## highest: 0.888961   0.900214   0.913476   0.943062   0.949492  
## --------------------------------------------------------------------------------
## Freedom 
##        n  missing distinct     Info     Mean  pMedian      Gmd      .05 
##      155        0      155        1   0.4088   0.4187   0.1691   0.1179 
##      .10      .25      .50      .75      .90      .95 
##   0.2007   0.3037   0.4375   0.5166   0.5874   0.6133 
## 
## lowest : 0         0.0149959 0.0303699 0.0599008 0.0815394
## highest: 0.626007  0.627163  0.633376  0.635423  0.658249 
## --------------------------------------------------------------------------------
## Generosity 
##        n  missing distinct     Info     Mean  pMedian      Gmd      .05 
##      155        0      155        1   0.2469   0.2378   0.1482  0.05149 
##      .10      .25      .50      .75      .90      .95 
##  0.08534  0.15411  0.23154  0.32376  0.42829  0.48970 
## 
## lowest : 0         0.0101647 0.0288068 0.03221   0.0437854
## highest: 0.500005  0.572123  0.574731  0.611705  0.838075 
## --------------------------------------------------------------------------------
## Trust..Government.Corruption. 
##        n  missing distinct     Info     Mean  pMedian      Gmd      .05 
##      155        0      155        1   0.1231   0.1025   0.1047  0.02072 
##      .10      .25      .50      .75      .90      .95 
##  0.03213  0.05727  0.08985  0.15330  0.28256  0.33724 
## 
## lowest : 0          0.0043879  0.00896482 0.0100913  0.0110515 
## highest: 0.384399   0.40077    0.439299   0.45522    0.464308  
## --------------------------------------------------------------------------------
## Dystopia.Residual 
##        n  missing distinct     Info     Mean  pMedian      Gmd      .05 
##      155        0      155        1     1.85    1.853   0.5526    1.056 
##      .10      .25      .50      .75      .90      .95 
##    1.316    1.591    1.833    2.145    2.488    2.731 
## 
## lowest : 0.377914 0.419389 0.540061 0.554633 0.62113 
## highest: 2.80781  2.83715  2.89389  2.89864  3.11748 
## --------------------------------------------------------------------------------

The visualization below shows which factors are most strongly related to happiness.

happiness_Correlation <- cor(happiness_rank_data[c(3:10)])
corrplot(happiness_Correlation, method = "pie", type = "upper", order = "FPC",
         col = brewer.pal(n = 7, name = "GnBu"),
         tl.col = "black", cl.align = "r", cl.ratio = 0.3)
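
The correlation plot can be complemented by a sorted list of each factor's correlation with the happiness score. A minimal sketch on simulated data (the `toy` columns reuse three factor names from the report, but all values and effect sizes are made up):

```r
# Simulated factors and a score that depends on two of them
set.seed(7)
n <- 100
toy <- data.frame(Family = rnorm(n), Freedom = rnorm(n), Generosity = rnorm(n))
toy$Happiness.Score <- 5 + 0.8 * toy$Family + 0.3 * toy$Freedom + rnorm(n, sd = 0.5)

# Take the Happiness.Score column of the correlation matrix,
# drop the score's correlation with itself, and sort
cors <- cor(toy)[, "Happiness.Score"]
factor_cors <- sort(cors[names(cors) != "Happiness.Score"], decreasing = TRUE)

factor_cors
```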

# The plot shows which factors are related to people's happiness

happiness_rank_data %>%
 ggplot(aes(Continent, Happiness.Score, color = Continent)) +
  geom_violin() +
  theme_fivethirtyeight() +
  theme(legend.position = "none", plot.title = element_text(hjust = 0.5, vjust = 0.3)) +
  labs(title = "Happiness Score by Continent", 
       x = " ",
       y = "Happiness Score")

Family relation

happiness_rank_data %>%
  ggplot(aes(Family, Happiness.Score)) +
  geom_point(aes(color = Continent), size = 3, alpha = 0.8) +
  geom_smooth(aes(color = Continent, fill = Continent), method = "lm", fullrange = TRUE) +
  facet_wrap(~ Continent) + 
  theme_fivethirtyeight() +
  ggtitle("Family")

## `geom_smooth()` using formula = 'y ~ x'

## Warning in qt((1 - level)/2, df): NaNs produced

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

happiness_rank_data %>%
  ggplot(aes(Health..Life.Expectancy., Happiness.Score)) +
  geom_point(aes(color = Continent), size = 3, alpha = 0.8) +
  geom_smooth(aes(color = Continent, fill = Continent), method = "lm", fullrange = TRUE) +
  facet_wrap(~ Continent) + 
  theme_fivethirtyeight() +
  ggtitle("Health (Life Expectancy)")

## `geom_smooth()` using formula = 'y ~ x'

## Warning in qt((1 - level)/2, df): NaNs produced

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

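# Note: both branches of case_when() return "United States", so every row is
# relabelled and the plots below pool all 155 countries under one facet label
happiness_rank_data <- World_Happiness_Report %>%
  mutate(Country= case_when(
           Country %in%  c("United States") ~"United States" ,
                      TRUE ~ "United States"))%>%
  mutate(Country = as.factor(Country))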

glimpse(happiness_rank_data)

## Rows: 155
## Columns: 12
## $ Country                       <fct> United States, United States, United Sta…
## $ Happiness.Rank                <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1…
## $ Happiness.Score               <dbl> 7.537, 7.522, 7.504, 7.494, 7.469, 7.377…
## $ Whisker.high                  <dbl> 7.594445, 7.581728, 7.622030, 7.561772, …
## $ Whisker.low                   <dbl> 7.479556, 7.462272, 7.385970, 7.426227, …
## $ Economy..GDP.per.Capita.      <dbl> 1.616463, 1.482383, 1.480633, 1.564980, …
## $ Family                        <dbl> 1.533524, 1.551122, 1.610574, 1.516912, …
## $ Health..Life.Expectancy.      <dbl> 0.7966665, 0.7925655, 0.8335521, 0.85813…
## $ Freedom                       <dbl> 0.6354226, 0.6260067, 0.6271626, 0.62007…
## $ Generosity                    <dbl> 0.36201224, 0.35528049, 0.47554022, 0.29…
## $ Trust..Government.Corruption. <dbl> 0.31596384, 0.40077007, 0.15352656, 0.36…
## $ Dystopia.Residual             <dbl> 2.277027, 2.313707, 2.322715, 2.276716, …

happiness_rank_data %>%
  ggplot(aes(Family, Happiness.Score)) +
  geom_point(aes(color = Country), size = 1, alpha = 0.8) +
  geom_smooth(aes(color = Country, fill = Country), method = "lm", fullrange = TRUE) +
  facet_wrap(~Country) + 
  theme_fivethirtyeight() +
  ggtitle("Family")

## `geom_smooth()` using formula = 'y ~ x'

happiness_rank_data %>%
  ggplot(aes(Economy..GDP.per.Capita., Happiness.Score)) +
  geom_point(aes(color = Country), size = 1, alpha = 0.8) +
  geom_smooth(aes(color = Country, fill = Country), method = "lm", fullrange = TRUE) +
  facet_wrap(~Country) + 
  theme_fivethirtyeight() +
  ggtitle("Economy (GDP per Capita)")

## `geom_smooth()` using formula = 'y ~ x'

happiness_rank_data %>%
  ggplot(aes(Health..Life.Expectancy., Happiness.Score)) +
  geom_point(aes(color = Country), size = 1, alpha = 0.8) +
  geom_smooth(aes(color = Country, fill = Country), method = "lm", fullrange = TRUE) +
  facet_wrap(~Country) + 
  theme_fivethirtyeight() +
  ggtitle("Health (Life Expectancy)")

## `geom_smooth()` using formula = 'y ~ x'

happiness_rank_data %>%
  ggplot(aes(Trust..Government.Corruption., Happiness.Score)) +
  geom_point(aes(color = Country), size = 1, alpha = 0.8) +
  geom_smooth(aes(color = Country, fill = Country), method = "lm", fullrange = TRUE) +
  facet_wrap(~Country) + 
  theme_fivethirtyeight() +
  ggtitle("Trust (Government Corruption)")

## `geom_smooth()` using formula = 'y ~ x'

happiness_rank_data %>%
  ggplot(aes(Freedom, Happiness.Score)) +
  geom_point(aes(color = Country), size = 1, alpha = 0.8) +
  geom_smooth(aes(color = Country, fill = Country), method = "lm", fullrange = TRUE) +
  facet_wrap(~Country) + 
  theme_fivethirtyeight() +
  ggtitle("Freedom")

## `geom_smooth()` using formula = 'y ~ x'

Conclusion: In this research I used two data sets. One is about suicide and the other is about possible reasons why people end their lives. I downloaded the data from the Kaggle website, uploaded it to my GitHub repository, and read it from there. Then I cleaned the data and made it tidy. In my analysis, I looked at which age groups and which sex have higher suicide rates, and at which continents and countries have higher suicide rates. I found that suicide rates are higher among people aged 75 and over and among men. Several factors appear to be related to how happy people are, such as family relationships, the economy, and freedom.

Source:

https://www.kaggle.com/datasets/unsdsn/world-happiness/data

final_project

Md Asaduzzaman

2024-12-14