Abstract

In this analysis, I use data from the Socio-Economic Panel (SOEP) to explore the link between TV consumption and happiness. I explore descriptive statistics and plot a heatmap. The analysis shows that people who watch TV more frequently are, on average, happier than those who watch it less. However, the observed link does not imply causality because of possible confounding factors.

Introduction

In this analysis, I use data from the Socio-Economic Panel (SOEP) to examine the relationship between happiness and how often respondents watch television.¹ The motivation for the analysis is that happiness is widely regarded as a central goal of human existence (Cummins, 2012; Singh et al., 2023).

Humans derive happiness from a diverse range of activities and life situations. The variables associated with happiness include mental, emotional, and physical health (Zhang & Chen, 2019), work-life balance, nurturing social relationships for self and others, and harmony with one’s culture, traditions, community, religion, and environment (Singh et al., 2023). The extent to which each of these variables affects happiness varies across individuals (Ali et al., 2020).

Code

## Install required packages 
if (!requireNamespace("pacman", quietly = TRUE)) {
  install.packages("pacman")
}

pacman::p_load(
  tidyverse, performance,
  janitor, GGally, psych,
  skimr, gt, kableExtra, styler,
  haven, doParallel, naniar,
  ggthemes, Amelia, plotly,
  nnet, patchwork
)

## Create a parallel computing cluster
cl <- makePSOCKcluster(3)
registerDoParallel(cl)

## Set display options
options(scipen = 999)
options(digits = 3)

## Set a theme for plots
theme_set(ggthemes::theme_clean() +
  theme(axis.text = element_text()))

At the macro level, the drivers of happiness captured in the World Happiness Report 2023 include social support, income (per capita income), health (access to quality health care), independence (personal freedoms), generosity, and lack of corruption (Clark et al., 2017; Helliwell et al., 2023). The report ranks Scandinavian countries as the happiest, with poor, conflict-prone, and corruption-prone countries such as Afghanistan and Somalia at the bottom.²

I define the terms happiness and TV consumption as follows.

Happiness: I follow the Oxford dictionary definition of happiness: the state of being satisfied that something is good or right. Happiness is hence treated as synonymous with satisfaction.
TV Consumption: How often a respondent watches television. This measure is categorical, ranging from watching TV daily to never watching TV.

Main

In this section, I load and clean the data and perform data exploration and analysis.

Tip

Skills & Technologies Applied: R, GGPLOT2, Quarto, Data Science.

Data

The data comes from SOEP, from which I pick the variables of interest. Because the analysis explores the relationship between happiness and television consumption, these two are the primary variables. I also include some control variables. I define all of these variables next.

Variables

I select the variables defined in Table 1 below based on a review of the literature. Researchers have linked happiness to mental, emotional, and physical health (Zhang & Chen, 2019), work-life balance, nurturing social relationships for self and others, and harmony with one’s culture, traditions, community, religion, and environment (Singh et al., 2023).

Code

tribble(
  ~Variable, ~Description,
  "happy", "Frequency of being happy in the last 4 weeks from 1: very rare to 5: very often",
  "tv", "Frequency of watching TV from 1: Daily to 5: Never",
  "mcs", "Summary Scale Mental: Mental Health Indicator",
  "pcs", "Summary Scale Physical: Physical Health Indicator",
  "sf_nbs", "Social functioning: Indicator of the state of social life",
  "sex_v1", "Gender of Individual from 1: Male to 2: Female",
  "income", "Gross salary as employee Amount previous year"
) %>%
  kbl(caption = "Variables Description") %>%
  kable_classic(full_width = FALSE) %>%
  kableExtra::add_footnote(label = "Source: https://paneldata.org/soep-core/",
               notation = "alphabet")

Variables Description
Variable	Description
happy	Frequency of being happy in the last 4 weeks from 1: very rare to 5: very often
tv	Frequency of watching TV from 1: Daily to 5: Never
mcs	Summary Scale Mental: Mental Health Indicator
pcs	Summary Scale Physical: Physical Health Indicator
sf_nbs	Social functioning: Indicator of the state of social life
sex_v1	Gender of Individual from 1: Male to 2: Female
income	Gross salary as employee Amount previous year
^a Source: https://paneldata.org/soep-core/

Data Management

I read into R the following data sets that contain the variables of interest.

jugendl.
health.
pl.

Code

jugendl <- read_dta("jugendl.dta")
health <- read_dta("health.dta")
pl <- read_dta("pl.dta")

Next, I clean the data by recoding missing values and extracting the relevant values. I start with the jugendl data.

Code

dt1 <- jugendl %>%
  select(pid, jl0383, sex_v1, jl0088) %>%
  ## Extract codes from happiness and sex
  mutate(jl0383 = str_extract(jl0383, "^-?\\d{1}$")) %>%
  mutate(sex_v1 = str_extract(sex_v1, "^-?\\d{1}$")) %>%
  ## Rename jl0383 to happy and jl0088 to tv1
  rename(happy = jl0383, tv1 = jl0088) %>%
  ## Convert to numeric
  mutate(
    pid = as.numeric(pid),
    happy = as.numeric(happy),
    sex_v1 = as.numeric(sex_v1)
  ) %>%
  ## Code missing values
  mutate(sex_v1 = case_when(
    sex_v1 %in% c(1, 2) ~ sex_v1,
    .default = NA
  )) %>%
  mutate(happy = case_when(
    happy >= 1 ~ happy,
    .default = NA
  )) %>%
  ## Fill sex within each person in both directions, as it barely changes
  group_by(pid) %>%
  fill(sex_v1, .direction = "updown") %>%
  ungroup() %>%
  ## Filter for non-missing data
  filter(happy >= 1) %>%
  ## Convert variables to factors
  mutate(sex_v1 = factor(sex_v1,
    levels = c(1, 2),
    labels = c("Male", "Female")
  )) %>%
  mutate(happy = factor(happy,
    levels = 1:5,
    labels = c(
      "Very Rare", "Rare",
      "Sometime", "Often",
      "Very Often"
    )
  ))

head(dt1) %>%
  kbl(caption = "Jugendl Dataset Extract") %>%
  kable_classic(full_width = FALSE) %>%
  add_footnote(label = "Source: https://paneldata.org/soep-core/",
               notation = "alphabet")

Jugendl Dataset Extract
pid	happy	sex_v1	tv1
13904	Often	NA	-8
13903	Very Often	NA	-8
893403	Often	Female	-8
950105	Often	NA	-8
950104	Very Often	NA	-8
19205	Very Often	NA	-8
^a Source: https://paneldata.org/soep-core/
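
The cleaning step above keeps a value only when the whole string is a single digit with an optional minus sign, and then treats codes below 1 as missing. The following base R sketch illustrates that logic on made-up strings. The vector `raw` and the helper objects are my own illustrative names, not part of the SOEP data or of the pipeline above.

```r
## Made-up raw values; the real columns come from the SOEP files
raw <- c("4", "-8", "12", "5", "abc")

## TRUE only when the whole string is one digit with an optional minus sign
## ("[0-9]" plays the role of "\\d" in the pattern used above)
is_code <- grepl("^-?[0-9]$", raw)

## Convert matching strings to numbers and leave everything else as NA
codes <- rep(NA_real_, length(raw))
codes[is_code] <- as.numeric(raw[is_code])

## Treat codes below 1 as missing, as the cleaning step does
codes[!is.na(codes) & codes < 1] <- NA

codes
```

Note that a two-digit value such as "12" does not match the pattern and therefore ends up as missing.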

Next, I extract some variables from the health data.

Code

dt2 <- health %>%
  select(pid, mcs, pcs, sf_nbs, gh_nbs)

head(dt2) %>%
  kbl(caption = "Health Dataset Extract") %>%
  kable_classic(full_width = FALSE) %>%
  add_footnote(label = "Source: https://paneldata.org/soep-core/",
               notation = "alphabet")

Health Dataset Extract
pid	mcs	pcs	sf_nbs	gh_nbs
901	-8.0	-8.0	-8.0	-8.0
901	-8.0	-8.0	-8.0	-8.0
901	-8.0	-8.0	-8.0	-8.0
901	-8.0	-8.0	-8.0	-8.0
901	-8.0	-8.0	-8.0	-8.0
901	68.2	29.7	57.1	45.6
^a Source: https://paneldata.org/soep-core/
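
The extract above still carries negative values such as -8. SOEP reserves negative values for missing-data codes, so one possible extra cleaning step is to recode them to NA before computing any summary. The sketch below uses a small made-up data frame (`toy_health` is hypothetical) rather than the real health data, and it is not part of the pipeline that produced the tables in this report.

```r
## Toy stand-in for the health extract; values are illustrative only
toy_health <- data.frame(
  pid = c(901, 901, 902),
  mcs = c(-8, 68.2, -1),
  pcs = c(-8, 29.7, 55.4)
)

## Recode every negative value in the score columns to NA
score_cols <- c("mcs", "pcs")
toy_health[score_cols] <- lapply(
  toy_health[score_cols],
  function(x) replace(x, !is.na(x) & x < 0, NA)
)

toy_health

## Means now ignore the former missing codes
colMeans(toy_health[score_cols], na.rm = TRUE)
```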

Next, I extract TV consumption and income from the pl data.

Code

## TV usage per week ----
dt3 <- pl %>%
  select(pid, pli0083, plb0471_h) %>%
  rename(tv = pli0083, income = plb0471_h) %>%
  ## Extract the code
  mutate(tv = str_extract(tv, "^-?\\d{1}$")) %>%
  ## Extract the salary
  mutate(income = str_extract(income, "^-?\\d{1,5}$")) %>%
  ## Convert tv to numeric
  mutate(tv = parse_number(tv)) %>%
  mutate(
    pid = as.numeric(pid),
    income = as.numeric(income)
  ) %>%
  mutate(tv = case_when(
    tv < 1 ~ NA,
    .default = tv
  )) %>%
  mutate(income = case_when(
    income < 1 ~ NA,
    .default = income
  )) %>%
  ## Keep rows with a valid TV code (codes below 1 are already NA)
  filter(!is.na(tv)) %>%
  ## Convert to factor
  mutate(tv = factor(tv,
    levels = 1:5,
    labels = c(
      "Daily",
      "Weekly",
      "Monthly",
      "Rarely",
      "Never"
    )
  ))

head(dt3) %>%
  kbl(caption = "Pl Dataset Extract") %>%
  kable_classic(full_width = FALSE) %>%
  add_footnote(label = "Source: https://paneldata.org/soep-core/",
               notation = "alphabet")

Code

final_data <- dt2 %>%
  left_join(dt3, by = join_by(pid)) %>%
  left_join(dt1, by = join_by(pid)) %>%
  drop_na(tv, happy)

# head(final_data)
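
Because the join above matches on pid only, a person with several rows in more than one extract contributes every pairing of those rows. If the extracts kept a survey-year column (assumed here to be called `syear`; the selects above do not keep one), joining on person and year would give one row per person-year. The base R sketch below shows the difference on made-up data; all object names are hypothetical.

```r
## Hypothetical person-year extracts; `syear` is an assumed column name
toy_health <- data.frame(
  pid = c(1, 1, 2),
  syear = c(2019, 2020, 2020),
  mcs = c(50.1, 48.3, 55.0)
)
toy_pl <- data.frame(
  pid = c(1, 1, 2),
  syear = c(2019, 2020, 2020),
  tv = c("Daily", "Weekly", "Daily")
)

## Joining on pid alone pairs every health row with every pl row of a person
by_pid <- merge(toy_health, toy_pl, by = "pid", all.x = TRUE)

## Joining on pid and syear keeps one row per person-year
by_pid_year <- merge(toy_health, toy_pl, by = c("pid", "syear"), all.x = TRUE)

nrow(by_pid)
nrow(by_pid_year)
```

With dplyr, the equivalent of the second join would be a `left_join()` with `by = join_by(pid, syear)`.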

Code

final_data %>%
  Amelia::missmap()

Sample

I restrict the sample to observations with complete cases for both TV consumption and happiness. These are the two main variables, so it is sensible to analyze only observations where both are observed.

Data Visualizations and Data Analysis

Summary Statistics

In this section, I summarize the data, starting with summary statistics for the entire data set.

Numeric Summary

Code

final_data %>%
  select(
    where(is.numeric), -pid,
    -starts_with("syear")
  ) %>%
  skimr::skim_without_charts() %>%
  select(-n_missing, -skim_type) %>%
  rename(
    Mean = numeric.mean,
    SD = numeric.sd,
    Min = numeric.p0,
    Q1 = numeric.p25,
    Median = numeric.p50,
    Q3 = numeric.p75,
    Max = numeric.p100,
    Variable = skim_variable
  ) %>%
  kbl(caption = "Summary Statistics") %>%
  kable_classic(full_width = FALSE) %>%
  add_footnote(label = "Source: https://paneldata.org/soep-core/",
               notation = "alphabet")

Summary Statistics
Variable	complete_rate	Mean	SD	Min	Q1	Median	Q3	Max
mcs	1.000	17.5	27.6	-8	-8	-1	48.6	68.5
pcs	1.000	20.2	30.3	-8	-8	-1	55.4	73.8
sf_nbs	1.000	18.3	28.4	-8	-8	-1	57.1	57.1
gh_nbs	1.000	19.8	30.2	-8	-8	-1	56.0	66.4
income	0.452	1344.8	1034.2	30	580	950	2000.0	6000.0
tv1	1.000	-8.0	0.0	-8	-8	-8	-8.0	-8.0
^a Source: https://paneldata.org/soep-core/

Non-Numeric Summary

Code

final_data %>%
  select(where(is.factor)) %>%
  skimr::skim_without_charts() %>%
  select(-n_missing, -skim_type) %>%
  kbl(caption = "Summary Statistics") %>%
  kable_classic(full_width = FALSE) %>%
  add_footnote(label = "Source: https://paneldata.org/soep-core/",
               notation = "alphabet")

Summary Statistics
skim_variable	complete_rate	factor.ordered	factor.n_unique	factor.top_counts
tv	1.000	FALSE	5	Dai: 8657, Wee: 3129, Mon: 518, Rar: 424
happy	1.000	FALSE	5	Oft: 6857, Ver: 3319, Som: 2060, Rar: 422
sex_v1	0.089	FALSE	2	Fem: 641, Mal: 506
^a Source: https://paneldata.org/soep-core/
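
The summary reports factor.ordered as FALSE for tv and happy, although both are ordinal scales. Should the ordering matter for a later comparison or model, the levels can be declared as ordered. A minimal sketch with made-up codes follows; `happy_codes` and `happy_ordered` are illustrative names only.

```r
## Made-up happiness codes on the 1-5 scale used in this report
happy_codes <- c(4, 5, 3, 1, 4)

happy_ordered <- factor(
  happy_codes,
  levels = 1:5,
  labels = c("Very Rare", "Rare", "Sometime", "Often", "Very Often"),
  ordered = TRUE
)

is.ordered(happy_ordered)

## Ordered factors support comparisons between levels
happy_ordered >= "Often"

table(happy_ordered)
```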

Summary Chart

Code

final_data %>%
  select(
    -pid, -tv1,
    -starts_with("syear")
  ) %>%
  GGally::ggpairs(mapping = aes(fill = happy))

Next, I examine how happiness and TV consumption are distributed in the sample. Far more respondents watch TV daily than never watch it. Proportionately, respondents who watch TV more frequently report being happy often or very often at a higher rate. Among respondents who are happy often, the large majority watch TV daily and very few never watch it. The bottom two panels (C and D) tell the same story, with happier respondents watching TV more frequently.

Code

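## Panel A: number of respondents by TV frequency
p_tv_count <- final_data %>%
  ggplot(mapping = aes(x = tv)) +
  geom_bar(show.legend = FALSE) +
  labs(
    x = "", y = "Frequency",
    title = "Watching TV"
  )

## Panel B: happiness shares within each TV frequency
p_tv_share <- final_data %>%
  ggplot(mapping = aes(x = tv, fill = happy)) +
  geom_bar(position = "fill") +
  labs(
    x = "", y = "",
    title = "Watching TV"
  ) +
  scale_fill_colorblind()

## Panel C: number of respondents by happiness level
p_happy_count <- final_data %>%
  ggplot(mapping = aes(x = happy)) +
  geom_bar(show.legend = FALSE) +
  labs(
    x = "", y = "Frequency",
    title = "Happiness"
  )

## Panel D: TV frequency shares within each happiness level
p_happy_share <- final_data %>%
  ggplot(mapping = aes(x = happy, fill = tv)) +
  geom_bar(position = "fill") +
  labs(
    x = "", y = "",
    title = "Happiness"
  ) +
  scale_fill_colorblind()

## Arrange the four panels in a 2 x 2 grid with patchwork
(p_tv_count + p_tv_share) / (p_happy_count + p_happy_share)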

Happiness vs TV Consumption

The heat map and numerical summary show the relationship between watching TV and happiness. The chart shows that people who report more happiness tend to watch more TV than people who report less. Conversely, people who report less happiness tend to watch less TV. This relationship does not imply causality, as underlying factors could drive or weaken it.
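
One way to check whether such a pattern could plausibly arise by chance is a chi-squared test of independence. The sketch below is an illustrative addition of my own, not part of the original analysis. It re-enters the counts from the table of happiness by TV consumption in this section and uses a simulated p-value because some cells are small. A small p-value still speaks only to association, not causation.

```r
## Counts copied from the happiness by TV consumption table
counts <- matrix(
  c(
    67, 78, 7, 4, 6,
    252, 83, 47, 29, 11,
    1329, 557, 75, 89, 10,
    4688, 1687, 282, 179, 21,
    2321, 724, 107, 123, 44
  ),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    happy = c("Very Rare", "Rare", "Sometime", "Often", "Very Often"),
    tv = c("Daily", "Weekly", "Monthly", "Rarely", "Never")
  )
)

## Monte Carlo p-value avoids relying on the large-sample approximation
set.seed(123)
res <- chisq.test(counts, simulate.p.value = TRUE, B = 2000)

res$statistic
res$p.value
```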

Chart

Code

final_data %>%
  count(happy, tv) %>%
  ggplot(mapping = aes(
    y = happy, x = tv,
    fill = n
  )) +
  geom_tile() +
  scale_fill_gradient(
    low = "blue",
    high = "red"
  ) +
  labs(
    y = "Happiness",
    x = "Watching TV",
    title = "Happiness and Watching TV",
    subtitle = "The graph shows that happy people watch more TV.\nThis does not imply causality.",
    caption = "Data Source: https://paneldata.org/soep-core/"
  )

Table

Code

final_data %>%
  count(happy, tv) %>%
  kbl(caption = "Happiness and TV Consumption") %>%
  kable_classic(full_width = FALSE) %>%
  footnote(general = "Source: Author's Construction Using Data from https://paneldata.org/soep-core/")

Happiness and TV Consumption
happy	tv	n
Very Rare	Daily	67
Very Rare	Weekly	78
Very Rare	Monthly	7
Very Rare	Rarely	4
Very Rare	Never	6
Rare	Daily	252
Rare	Weekly	83
Rare	Monthly	47
Rare	Rarely	29
Rare	Never	11
Sometime	Daily	1329
Sometime	Weekly	557
Sometime	Monthly	75
Sometime	Rarely	89
Sometime	Never	10
Often	Daily	4688
Often	Weekly	1687
Often	Monthly	282
Often	Rarely	179
Often	Never	21
Very Often	Daily	2321
Very Often	Weekly	724
Very Often	Monthly	107
Very Often	Rarely	123
Very Often	Never	44
Note:
Source: Author's Construction Using Data from https://paneldata.org/soep-core/
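
Raw counts are dominated by the large number of daily viewers, so shares within each happiness level are easier to compare. The sketch below re-enters the counts from the table above into a matrix (the object names are my own) and computes row proportions with base R.

```r
## Counts copied from the table above: rows are happiness, columns are TV
counts <- matrix(
  c(
    67, 78, 7, 4, 6,
    252, 83, 47, 29, 11,
    1329, 557, 75, 89, 10,
    4688, 1687, 282, 179, 21,
    2321, 724, 107, 123, 44
  ),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    happy = c("Very Rare", "Rare", "Sometime", "Often", "Very Often"),
    tv = c("Daily", "Weekly", "Monthly", "Rarely", "Never")
  )
)

## Share of each TV category within each happiness level (rows sum to 1)
shares <- prop.table(counts, margin = 1)

## Express the shares as percentages with one decimal
round(100 * shares, 1)
```

On these counts, the share of daily viewers rises from about 41% among respondents who are very rarely happy to about 70% among those who are very often happy.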

Conclusion

In this analysis, I have examined the relationship between happiness and how frequently people watch TV. The analysis shows that people who watch more TV are, on average, happier than those who watch less. However, this relationship does not imply causality, as other underlying factors could be behind it. Only an experimental approach could reasonably establish causality.

Code

# styler::style_file("Final_juan_david.Rmd")
## Release the parallel workers started at the top of the report
stopCluster(cl)
Sys.info()

                                                           sysname 
                                                           "Linux" 
                                                           release 
                                          "6.4.6-76060406-generic" 
                                                           version 
"#202307241739~1692717645~22.04~5597803 SMP PREEMPT_DYNAMIC Tue A" 
                                                          nodename 
                                                          "pop-os" 
                                                           machine 
                                                          "x86_64" 
                                                             login 
                                                        "karuitha" 
                                                              user 
                                                        "karuitha" 
                                                    effective_user 
                                                        "karuitha"

References

Ali, S., Murshed, S. M., & Papyrakis, E. (2020). Happiness and the resource curse. Journal of Happiness Studies, 21, 437–464.

Clark, A. E., Flèche, S., Layard, R., Powdthavee, N., & Ward, G. (2017). The key determinants of happiness and misery.

Cummins, R. A. (2012). The determinants of happiness. International Journal of Happiness and Development, 1(1), 86–101. https://doi.org/10.1504/IJHD.2012.050833

Helliwell, J. F., Layard, R., Sachs, J. D., De Neve, J.-E., Aknin, L. B., & Wang, S. (2023). World happiness report. United Nations.

Singh, S., Kshtriya, S., & Valk, R. (2023). Health, hope, and harmony: A systematic review of the determinants of happiness across cultures and countries. International Journal of Environmental Research and Public Health, 20(4), 3306.

Zhang, Z., & Chen, W. (2019). A systematic review of the relationship between physical activity and happiness. Journal of Happiness Studies, 20(4), 1305–1322.

Footnotes

¹ The description of the variables is available at http://companion.soep.de/Topics%20of%20SOEPcore/Family%20and%20Social%20Networks.html
² The country-level happiness ranking is available at https://www.theglobaleconomy.com/rankings/happiness/