Analysis of Mental Health with R (source: Kaggle)

CONTENTS

->Mental health treatment by occupation

->Impact of Family History

->Self-Employment and Mental Health

->Mental Health and Lifestyle

->Awareness of Care Options

->Mental Health and Job Interviews

->Geographical Differences

->Chi-Square Test of Independence

->Logistic Regression

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.5
## ✔ ggplot2   3.4.4     ✔ stringr   1.5.1
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
df <- read_csv("https://kiycoesgtwolawkyqacm.supabase.co/storage/v1/object/public/public/output/cleaned-data-07b327.csv")
## Rows: 287162 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (17): Timestamp, Gender, Country, Occupation, self_employed, family_hist...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df <- na.omit(df)
table(df$Occupation, df$treatment)
##            
##                No   Yes
##   Business  24307 24945
##   Corporate 30278 29879
##   Housewife 32388 32785
##   Others    25538 26355
##   Student   29907 30780
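
Raw counts are hard to compare across occupations because the groups differ in size. As a minimal sketch, the counts printed above can be retyped into a matrix and converted to row proportions with `prop.table()`. The object names `counts` and `row_share` are illustrative and not part of the original analysis. On the full data, `prop.table(table(df$Occupation, df$treatment), margin = 1)` should give the same result.

```r
# Counts copied by hand from the table(df$Occupation, df$treatment) output above
counts <- matrix(
  c(24307, 24945,
    30278, 29879,
    32388, 32785,
    25538, 26355,
    29907, 30780),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    Occupation = c("Business", "Corporate", "Housewife", "Others", "Student"),
    treatment  = c("No", "Yes")
  )
)

# margin = 1 divides each cell by its row total, so every row sums to 1
row_share <- prop.table(counts, margin = 1)
print(round(row_share, 3))
```

In every occupation the share of respondents in treatment is within about one percentage point of 50%, so occupation alone appears to separate the two groups only weakly.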

Mental Health Treatment by Occupation

ggplot(df, aes(x=Occupation, fill=treatment)) +
  geom_bar(position="dodge") +
  labs(x="Occupation", y="Count", 
       fill="Treatment", title="Fig 1.0: Mental Health by Occupation")

Impact of Family History

# Group the data by 'family_history' and 'treatment' and count the number of occurrences
family_history_impact <- df %>%
  group_by(family_history, treatment) %>%
  summarise(counts = n())
## `summarise()` has grouped output by 'family_history'. You can override using
## the `.groups` argument.
# Create a bar plot using ggplot2
ggplot(family_history_impact, aes(x=family_history, y=counts, fill=treatment)) +
  geom_bar(stat="identity", position=position_dodge()) +
  labs(title="Fig 2.0: Impact of Family History on Mental Health Treatment",
       x="Family History",
       y="Count",
       fill="Treatment")

This plot shows how family history relates to the likelihood of an individual undergoing treatment for mental health issues.

Comparing the paired bars within each group shows whether respondents with a family history of mental health issues are in treatment more often than those without one. If the two groups differ in size, compare within-group proportions rather than raw counts.

Self-Employment and Mental Health

# Group the data by 'self_employed' and 'treatment' and count the number of occurrences
self_employed_impact <- df %>%
  group_by(self_employed, treatment) %>%
  summarise(counts = n())
## `summarise()` has grouped output by 'self_employed'. You can override using the
## `.groups` argument.
# Create a bar plot using ggplot2
ggplot(self_employed_impact, aes(x=self_employed, y=counts, fill=treatment)) +
  geom_bar(stat="identity", position=position_dodge()) +
  labs(title="Fig 3.0: Impact of Self-Employment on Mental Health Treatment",
       x="Self-Employed",
       y="Count",
       fill="Treatment")

The plot above shows how self-employment relates to the likelihood of an individual undergoing treatment for mental health issues. It helps us see whether self-employed individuals are more or less likely to seek treatment than those who are not self-employed.

Mental Health and Lifestyle

To analyze the impact of lifestyle on mental health, we can use the ‘Days_Indoors’ and ‘Growing_Stress’ columns as lifestyle indicators. The R code below uses ggplot2 to plot days spent indoors against mental health treatment; ‘Growing_Stress’ can be examined the same way.

# Group the data by 'Days_Indoors' and 'treatment' and count the number of occurrences
indoors_impact <- df %>%
  group_by(Days_Indoors, treatment) %>%
  summarise(counts = n())
## `summarise()` has grouped output by 'Days_Indoors'. You can override using the
## `.groups` argument.
# Create a bar plot using ggplot2
ggplot(indoors_impact, aes(x=Days_Indoors, y=counts, fill=treatment)) +
  geom_bar(stat="identity", position=position_dodge()) +
  labs(title="Fig 4.0: Impact of Days Indoors on Mental Health Treatment",
       x="Days Indoors",
       y="Count",
       fill="Treatment")
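
The same group-count-plot pattern is repeated for each column, and the text also names ‘Growing_Stress’ as a lifestyle indicator without plotting it. A minimal sketch of a reusable helper is shown below. The function name `plot_treatment_by` and the small `toy` data frame are hypothetical and only there to make the example runnable on its own; the real analysis would pass `df` instead.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: counts rows per (var, treatment) pair and draws dodged bars.
# {{ var }} lets the caller pass a bare column name, as in dplyr verbs.
plot_treatment_by <- function(data, var, title = NULL) {
  data %>%
    count({{ var }}, treatment, name = "counts") %>%
    ggplot(aes(x = {{ var }}, y = counts, fill = treatment)) +
    geom_col(position = position_dodge()) +
    labs(title = title, y = "Count", fill = "Treatment")
}

# Made-up toy rows, NOT taken from the survey data
toy <- data.frame(
  Growing_Stress = c("Yes", "No", "Maybe", "Yes", "No", "Yes"),
  treatment      = c("Yes", "No", "No", "Yes", "Yes", "No")
)

p <- plot_treatment_by(toy, Growing_Stress,
                       title = "Growing Stress and Treatment (toy data)")
print(p)
```

With the real data, `plot_treatment_by(df, Growing_Stress)` or `plot_treatment_by(df, Days_Indoors)` would be the corresponding calls, assuming those columns are present as described above.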

Awareness of Care Options

# Group the data by 'care_options' and count the number of occurrences
care_options <- df %>%
  group_by(care_options) %>%
  summarise(counts = n())

# Create a bar plot using ggplot2
ggplot(care_options, aes(x=care_options, y=counts)) +
  geom_bar(stat="identity", fill="steelblue") +
  labs(title="Fig 5.0: Awareness of Care Options",
       x="Care Options",
       y="Count")

Mental Health and Job Interviews

# Group the data by 'mental_health_interview' and count the number of occurrences
interview_mental_health <- df %>%
  group_by(mental_health_interview) %>%
  summarise(counts = n())

# Create a bar plot using ggplot2
ggplot(interview_mental_health, aes(x=mental_health_interview, y=counts)) +
  geom_bar(stat="identity", fill="steelblue") +
  labs(title="Mental Health and Job Interviews",
       x="Willingness to Discuss Mental Health",
       y="Count")

By looking at the plot, we can see the distribution of individuals’ willingness to discuss mental health issues in a job interview. This can provide insights into the stigma associated with mental health in the workplace and how comfortable individuals feel discussing these issues with potential employers.

Geographical Differences

# Group the data by 'Country' and count the number of occurrences
country_distribution <- df %>%
  group_by(Country) %>%
  summarise(counts = n())

# Create a bar plot using ggplot2
ggplot(country_distribution, aes(x=Country, y=counts)) +
  geom_bar(stat="identity", fill="steelblue") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title="Geographical Distribution",
       x="Country",
       y="Count")

Chi-Square Test of Independence

The test below checks whether ‘Country’ and ‘treatment’ are independent. A significant result indicates an association, not causation; further investigation would be needed to understand the nature of any association found.

# chisq.test() is part of base R (stats package), so no extra library needs to be loaded


# Create a contingency table
contingency_table <- table(df$Country, df$treatment)

# Perform the Chi-Square test
chi_squared_test <- chisq.test(contingency_table)

# Print the test result
print(chi_squared_test)
## 
##  Pearson's Chi-squared test
## 
## data:  contingency_table
## X-squared = 19833, df = 34, p-value < 2.2e-16

The Chi-Square statistic is approximately 19833 on 34 degrees of freedom, and the p-value is below 2.2e-16. This indicates a statistically significant association between ‘Country’ and ‘treatment’. In other words, the likelihood of seeking treatment may depend on the country.
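
With roughly 287,000 rows, even weak associations produce tiny p-values, so an effect size is a useful companion to the test. One common choice is Cramér's V, defined as sqrt(X² / (n · (min(rows, cols) − 1))). The sketch below is an addition to the original analysis: the helper name `cramers_v` and the toy tables are illustrative, and `v_reported` simply plugs in the statistic and row count reported above (two treatment levels, so the min term is 1).

```r
# Cramér's V for a two-way contingency table (hypothetical helper)
cramers_v <- function(tab) {
  # correct = FALSE: no continuity correction, so the plain Pearson statistic is used
  test <- suppressWarnings(chisq.test(tab, correct = FALSE))
  n <- sum(tab)
  k <- min(nrow(tab), ncol(tab)) - 1
  unname(sqrt(test$statistic / (n * k)))
}

# Made-up toy tables to show the range of the measure
weak_tab    <- matrix(c(30, 10, 20, 40), nrow = 2)
perfect_tab <- matrix(c(10, 0, 0, 10), nrow = 2)
print(cramers_v(weak_tab))
print(cramers_v(perfect_tab))

# Same formula applied to the values reported above: X-squared = 19833, n = 287162
v_reported <- sqrt(19833 / (287162 * 1))
print(round(v_reported, 3))
```

For the reported values this works out to roughly 0.26, which suggests a modest rather than overwhelming association despite the very small p-value. On the full data, `cramers_v(contingency_table)` would be the direct call.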

Logistic Regression

# Convert categorical variables to factors
df$Country <- as.factor(df$Country)
df$self_employed <- as.factor(df$self_employed)
df$family_history <- as.factor(df$family_history)
# Recode 'treatment' to be binary
df$treatment <- ifelse(df$treatment == 'Yes', 1, 0)
# Fit the model
model <- glm(treatment ~ Country + self_employed + family_history, data = df, family = binomial)

# Summary of the model
summary(model)
## 
## Call:
## glm(formula = treatment ~ Country + self_employed + family_history, 
##     family = binomial, data = df)
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                    -0.301949   0.028567 -10.570  < 2e-16 ***
## CountryBelgium                -17.264119 137.820075  -0.125 0.900313    
## CountryBosnia and Herzegovina -18.852700 200.328980  -0.094 0.925023    
## CountryBrazil                  -1.071497   0.054504 -19.659  < 2e-16 ***
## CountryCanada                  -0.069314   0.032741  -2.117 0.034253 *  
## CountryColombia               -17.264119 200.328980  -0.086 0.931324    
## CountryCosta Rica             -17.264119 200.328980  -0.086 0.931324    
## CountryCroatia                 17.623498 200.328980   0.088 0.929898    
## CountryCzech Republic         -18.852700 200.328980  -0.094 0.925023    
## CountryDenmark                 16.034918 141.653982   0.113 0.909874    
## CountryFinland                -17.264119 200.328980  -0.086 0.931324    
## CountryFrance                 -17.971241  73.297069  -0.245 0.806314    
## CountryGeorgia                -17.264119 200.328980  -0.086 0.931324    
## CountryGermany                 -0.264151   0.042199  -6.260 3.86e-10 ***
## CountryGreece                 -17.264119 141.653982  -0.122 0.902998    
## CountryIndia                   -0.887406   0.051812 -17.127  < 2e-16 ***
## CountryIreland                 -0.493588   0.040506 -12.185  < 2e-16 ***
## CountryIsrael                 -17.264119 100.164493  -0.172 0.863156    
## CountryItaly                  -17.264119 100.164493  -0.172 0.863156    
## CountryMexico                 -17.508639 200.328981  -0.087 0.930354    
## CountryMoldova                 17.868017 200.328980   0.089 0.928928    
## CountryNetherlands             -1.168502   0.040891 -28.576  < 2e-16 ***
## CountryNew Zealand              1.240342   0.064968  19.092  < 2e-16 ***
## CountryNigeria                -17.264119 200.328980  -0.086 0.931324    
## CountryPhilippines            -17.264119 200.328980  -0.086 0.931324    
## CountryPoland                  17.755593 137.538185   0.129 0.897282    
## CountryPortugal               -17.264119 200.328980  -0.086 0.931324    
## CountryRussia                 -17.264119 141.653982  -0.122 0.902998    
## CountrySingapore              -17.264119 141.653982  -0.122 0.902998    
## CountrySouth Africa            -0.216576   0.057304  -3.779 0.000157 ***
## CountrySweden                  -1.168519   0.052493 -22.261  < 2e-16 ***
## CountrySwitzerland             -0.969019   0.059879 -16.183  < 2e-16 ***
## CountryThailand               -17.508639 200.328981  -0.087 0.930354    
## CountryUnited Kingdom          -0.229255   0.029867  -7.676 1.64e-14 ***
## CountryUnited States           -0.211023   0.028808  -7.325 2.39e-13 ***
## self_employedYes                0.244520   0.014631  16.713  < 2e-16 ***
## family_historyYes               1.588580   0.008741 181.735  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 398072  on 287161  degrees of freedom
## Residual deviance: 335596  on 287125  degrees of freedom
## AIC: 335670
## 
## Number of Fisher Scoring iterations: 16
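
The coefficients above are on the log-odds scale, which is hard to read directly. Exponentiating them gives odds ratios. The sketch below retypes two estimates and their standard errors from the summary output and adds approximate 95% Wald intervals (estimate ± 1.96 × SE). The object names are illustrative; on the fitted model, `exp(coef(model))` would give the odds ratios for all terms.

```r
# Estimates and standard errors copied by hand from the summary(model) output above
est <- c(self_employedYes = 0.244520, family_historyYes = 1.588580)
se  <- c(self_employedYes = 0.014631, family_historyYes = 0.008741)

# Odds ratios with approximate 95% Wald confidence intervals
odds_ratios <- data.frame(
  odds_ratio = exp(est),
  lower_95   = exp(est - 1.96 * se),
  upper_95   = exp(est + 1.96 * se)
)
print(round(odds_ratios, 2))
```

Holding the other predictors fixed, a family history of mental illness multiplies the odds of treatment by about 4.9, and self-employment by about 1.28. The country terms with estimates near ±17 and standard errors above 70 should not be interpreted this way: values like these usually indicate levels with very few respondents who all share the same outcome, so their coefficients are not reliably estimated.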
# Visualize the model