Introduction:

Recent work suggests that sedentary activity, such as TV watching, is associated with negative changes in many aspects of health, including cardiovascular health, bone health, and cellular function [2]. Television use in particular has been linked with greater risk of obesity, type 2 diabetes, and heart disease, lower life satisfaction, less frequent social and physical interaction, and increased risk of dementia [2,3].

The researchers behind these findings concluded that raising public awareness of alternatives to TV watching, and reducing barriers to activities that are more socially and physically engaging, could reduce TV use by older people and diminish the potential for associated negative health effects [3].

But is it really the case that older people watch more TV? Do the GSS data support this claim? This study tries to answer the question of whether retired people watch more TV than people who are not retired.

Data:

Data Collection

The data were collected by the General Social Survey (GSS) between 1972 and 2012 by means of a questionnaire. Most surveys were administered in person. Computer-assisted interviews have been used since 2002, and some interviews were also conducted by phone [1]. The target sample size for each year in which the survey was administered is 1500. Since 1994 the GSS has been administered to two samples.

Cases

Each case is the result of an interview with one resident of the United States (the observational unit). The units of observation are adults living in households within the United States.

Variables

This study is investigating the relationship between the following two variables of the GSS dataset:

  1. tvhours - average hours per day watching TV (numerical variable)

  2. wrkstat - labor force status (categorical variable)

The variable wrkstat comprises 8 levels. For the purpose of this study, these levels were summarized to two levels (retired, not_retired) of a new categorical variable: retirementStatus.

Type of Study

This is an observational study based on the dataset of the General Social Survey (GSS). For this study, 200 retired and 200 not-retired individuals were selected randomly from about 3600 respondents surveyed between 2000 and 2005 who have a recorded tvhours value (tvhours >= 0, excluding NAs).

Scope of inference

The population of interest for this study is the population of the United States in the years 2000 to 2005.

The findings from this study can be generalized to that population, because:

  1. the GSS surveys from 2000 to 2005 were taken on randomly selected households in the United States

  2. two stratified subsets of the GSS data (by retirement status) were used for this study

  3. 200 cases were randomly selected from each of the subsets

There is a possible non-response bias, indicated by the value ‘NA’ for the variables ‘wrkstat’ and ‘tvhours’. The variable wrkstat has only a few ‘NA’ values, which can be neglected. But the variable tvhours has over 23000 ‘NA’ values in the original dataset, which could influence the results of this study. Further studies may be necessary.

As this is an observational study, the relationship between the two variables can show only an association, not a causal relationship (e.g. that retirement causes more TV consumption).

Exploratory data analysis:

Exploring original dataset:

An early inspection found many ‘NA’ values for the tvhours variable in the GSS dataset. How are the ‘NA’ values distributed across the levels of wrkstat? Below is the proportion of ‘NA’ values for each wrkstat level:

tapply(gss$tvhours, gss$wrkstat, function(x) sum(is.na(x))/length(x))
## Working Fulltime Working Parttime Temp Not Working Unempl, Laid Off 
##        0.4046513        0.3943855        0.4072547        0.4025627 
##          Retired           School    Keeping House            Other 
##        0.4068307        0.4283267        0.4172792        0.4045936

We find that ‘NA’ values are evenly distributed over the wrkstat levels: each level has about the same proportion (roughly 40%) of ‘NA’ values. The rate of missing tvhours values is therefore about the same for retired and not-retired respondents.

Filtering data

We are only interested in two variables of the GSS dataset and in the years 2000 to 2005. Therefore we select wrkstat (labor force status) and tvhours (hours per day watching TV) for the cases from 2000 to 2005, and we omit cases with ‘NA’ values for these variables.

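# Keep 2000-2005; tvhours >= 0 is NA for missing values, and subset() drops rows where the condition is NA
gss2 <- subset(gss, year >= 2000 & year <= 2005 & tvhours >= 0 & !is.na(wrkstat),
               select = c(wrkstat, tvhours))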
dim(gss2)
## [1] 3633    2

The filtered dataset comprises about 3600 cases.

Next we create a new categorical variable ‘retirementStatus’ with only two levels, specifying whether a respondent is retired or not.

gss2[,"retirementStatus"] <- "not_retired"
gss2[gss2$wrkstat=="Retired","retirementStatus"] <- "retired"
gss2$retirementStatus <- as.factor(gss2$retirementStatus)
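
A quick cross-tabulation is one way to confirm that a recoding like the one above maps only the "Retired" level to "retired". The sketch below uses a small, made-up data frame (`toy`, hypothetical, not GSS data) in place of gss2, and `ifelse()` as an equivalent one-step recode.

```r
# Hypothetical stand-in for gss2: a handful of made-up rows, not GSS data
toy <- data.frame(
  wrkstat = factor(c("Retired", "Working Fulltime", "Keeping House",
                     "Retired", "School")),
  tvhours = c(5, 2, 4, 3, 1)
)

# One-step recode: "Retired" -> "retired", everything else -> "not_retired"
toy$retirementStatus <- factor(
  ifelse(toy$wrkstat == "Retired", "retired", "not_retired")
)

# Cross-tabulate original levels against the new variable
check <- table(toy$wrkstat, toy$retirementStatus)
print(check)
```

Every row of the table other than "Retired" should have a zero count in the "retired" column.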

Stratified sampling

Next, stratified sampling is performed. From the filtered dataset two subsets are created:

  • set of not-retired persons (about 3100 cases)
  • set of retired persons (about 500 cases)

From each set 200 cases are then selected at random. The two samples are merged into the resulting dataset for this study (400 cases).

set.seed(1010101)
gss2n <- gss2[gss2$retirementStatus=="not_retired",]
gss2y <- gss2[gss2$retirementStatus=="retired",]
gss3n <- gss2n[sample(nrow(gss2n),200 ), ]
gss3y <- gss2y[sample(nrow(gss2y),200 ), ]
gss3 <- rbind(gss3n,gss3y)

Some Statistics

Next we calculate some basic statistics (median, mean and SD in hours) for both groups (retired and not-retired cases).

         Not-Retired   Retired
Median         2         3.5
Mean           2.675     4.08
SD             2.336     2.877
n            200       200

This suggests that the average daily TV consumption of retired people is indeed higher than that of not-retired people. But the standard deviation is large for both groups (mean - 2*SD is negative), so there is some uncertainty.
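
The report does not show the code behind these group statistics. A minimal sketch of one way to compute them follows; it assumes a data frame with the columns tvhours and retirementStatus, and uses a tiny made-up data frame (`toy`, hypothetical) in place of gss3 so that it runs on its own.

```r
# Hypothetical stand-in for gss3: eight made-up rows, not GSS data
toy <- data.frame(
  tvhours = c(0, 2, 2, 4, 1, 3, 5, 7),
  retirementStatus = factor(rep(c("not_retired", "retired"), each = 4))
)

# Median, mean, SD and group size of tvhours for each retirement status
group_stats <- sapply(
  split(toy$tvhours, toy$retirementStatus),
  function(x) c(Median = median(x), Mean = mean(x), SD = sd(x), n = length(x))
)
print(round(group_stats, 3))
```

Replacing `toy` with gss3 would give a table with the same layout as the one in this section.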

Exploring group data

Next we examine both groups: retired and not-retired persons.

We first create summary statistics of the variable tvhours:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   2.000   3.000   3.378   4.000  20.000

We observe an average of about 3.4 hours and a maximum of 20 hours for the daily TV consumption of all (400) cases.

The following figure shows the histograms of TV hours for both groups. For both groups the distribution is strongly right skewed with outliers. But the sample size of 200 for each group is large enough to satisfy the normality condition for the hypothesis test that follows.

The side-by-side boxplot illustrates TV consumption versus retirement status. It shows that both distributions are right skewed with some outliers. The median for retired persons is substantially higher than for not-retired persons, and the variability for retired persons is also higher (wider IQR).
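
The plotting code is not included in the report. The sketch below shows one way such histograms and a side-by-side boxplot could be drawn with base R graphics. The data frame `toy` is simulated placeholder data (hypothetical, not GSS data, and less skewed than the real sample); the column names match those of gss3.

```r
# Simulated placeholder data standing in for gss3 (NOT GSS data)
set.seed(1)
toy <- data.frame(
  tvhours = c(rpois(50, 2.7), rpois(50, 4.1)),
  retirementStatus = factor(rep(c("not_retired", "retired"), each = 50))
)

# One histogram per group, side by side, with one bin per whole hour
op <- par(mfrow = c(1, 2))
for (g in levels(toy$retirementStatus)) {
  hist(toy$tvhours[toy$retirementStatus == g],
       breaks = seq(-0.5, max(toy$tvhours) + 0.5, by = 1),
       main = g, xlab = "TV hours per day")
}
par(op)

# Side-by-side boxplot of TV hours by retirement status
bp <- boxplot(tvhours ~ retirementStatus, data = toy,
              xlab = "Retirement status", ylab = "TV hours per day")
```

Using the same breaks for both histograms keeps the two panels directly comparable.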

Suggestion

The exploratory analysis suggests that there is a relationship between retirement status and TV consumption: while the average daily TV consumption is about 2.7 hours for not-retired respondents, it is higher (about 4.1 hours) for retired respondents. But the uncertainty is relatively high. Therefore a hypothesis test will be performed.

Inference:

Hypothesis

For this study we measure and compare the difference in the average daily TV consumption between retired and not-retired respondents. This is an association between a numerical variable (tvhours) and a categorical variable (retirementStatus). As we compare two means, the inference is done via a hypothesis test and a confidence interval.

For the hypothesis test we state both hypotheses:

  • Null hypothesis (H0): there is no difference in the average daily TV consumption between retired (\(\mu_{rt}\)) and not-retired (\(\mu_{nrt}\)) persons.
  • Alternative hypothesis (HA): the average daily TV consumption of retired persons is higher than that of not-retired persons.

H0: \(\mu_{rt} - \mu_{nrt} = 0\)

HA: \(\mu_{rt} - \mu_{nrt} > 0\) (one-sided)

The observed difference in the average daily TV consumption between retired and not-retired persons is 1.405 hours.

Check conditions

Before performing a hypothesis test we must be sure that the conditions for inference for comparing two independent means are met.

  • Independence within groups is given because the sampled cases in the GSS survey were randomly selected (without replacement) and the sample size is less than 10% of the population.
  • Independence between the two groups is given because the cases of one group are independent of the cases of the other (not paired).
  • The sample size of 200 for each group is large enough, even considering the strong skewness of both distributions.

As all conditions are met we can continue performing the hypothesis test.

Methods

We have two independent groups (retired and not-retired residents of the US) and want to compare the average daily TV consumption (tvhours) of both groups.

We are interested in the difference of the average daily TV consumption for all US residents who are retired and those not retired. As the point estimate we use the average difference of the daily TV consumption between two sampled groups of US residents who are retired and not-retired.

We will perform a hypothesis test and a confidence interval check to estimate the difference and margin of error.

  • Significance level for the hypothesis test: 5%
  • Level for the confidence interval: 95%
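
For reference, the quantities involved can be written out as follows. This is a sketch that assumes the normal approximation for the difference of two independent sample means (the inference function reports a Z statistic); here \(\bar{x}\), \(s\) and \(n\) denote the sample mean, sample standard deviation and sample size of the retired (rt) and not-retired (nrt) groups, and \(z^{*} \approx 1.96\) is the critical value for a 95% confidence level.

```latex
\[
SE = \sqrt{\frac{s_{rt}^{2}}{n_{rt}} + \frac{s_{nrt}^{2}}{n_{nrt}}},
\qquad
Z = \frac{(\bar{x}_{rt} - \bar{x}_{nrt}) - 0}{SE},
\qquad
\text{CI} = (\bar{x}_{rt} - \bar{x}_{nrt}) \pm z^{*} \cdot SE
\]
```

The 0 in the numerator of \(Z\) is the null value, i.e. the difference assumed under H0.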

Hypothesis test

For the hypothesis test we want to compare the means of the variable tvhours for both levels of the variable retirementStatus. The null value is 0 (no difference); the alternative is one-sided (tvhours(retired) > tvhours(not-retired)). The significance level is 5%. The test is done with the inference function of the ‘statistics_lab_resources_inference.R’ script.

Given the null hypothesis, how large is the probability of observing a difference at least as large as the one in the sample dataset (1.405 hours)?

inference(y = gss3$tvhours, x = gss3$retirementStatus, 
          est = "mean", type = "ht", null = 0, alternative = "greater", siglevel = 0.05,
          method = "theoretical", order = c("retired","not_retired"),
          eda_plot = FALSE, inf_plot = FALSE)
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_retired = 200, mean_retired = 4.08, sd_retired = 2.8766
## n_not_retired = 200, mean_not_retired = 2.675, sd_not_retired = 2.3359
## Observed difference between means (retired-not_retired) = 1.405
## H0: mu_retired - mu_not_retired = 0 
## HA: mu_retired - mu_not_retired > 0 
## Standard error = 0.262 
## Test statistic: Z =  5.362 
## p-value =  0

The p-value is reported as 0, i.e. it is nearly zero. If there were no difference in the average daily TV consumption between retired and not-retired persons, there would be almost no chance of obtaining random samples of 200 retired and 200 not-retired persons in which the difference in average daily TV consumption is at least 1.405 hours.

Therefore we reject the null hypothesis (H0) in favor of the alternative hypothesis, that the average daily TV consumption of retired persons is higher than that of not-retired persons.

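A very small p-value does not eliminate the possibility of error. Because we rejected H0, a Type 2 error (failing to reject a false H0) cannot have occurred, but a Type 1 error (rejecting a true H0) remains possible. The 5% significance level is the probability of a Type 1 error when H0 is true; the very small p-value only indicates that data this extreme would be very unlikely under H0.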
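
As a cross-check, the standard error, test statistic and p-value can be recomputed by hand from the summary statistics printed in the inference output. The sketch below assumes the normal approximation (the output reports a Z statistic); the variable names are chosen here for illustration.

```r
# Summary statistics copied from the inference() output above
n_rt  <- 200; mean_rt  <- 4.08;  sd_rt  <- 2.8766
n_nrt <- 200; mean_nrt <- 2.675; sd_nrt <- 2.3359

obs_diff <- mean_rt - mean_nrt                   # observed difference in means
se <- sqrt(sd_rt^2 / n_rt + sd_nrt^2 / n_nrt)    # standard error of the difference
z  <- (obs_diff - 0) / se                        # test statistic under H0 (null value 0)
p_value <- pnorm(z, lower.tail = FALSE)          # one-sided (upper-tail) p-value

cat("difference:", round(obs_diff, 3), "\n")
cat("SE:", round(se, 3), " Z:", round(z, 3), "\n")
cat("p-value:", signif(p_value, 2), "\n")
```

The recomputed p-value is tiny but strictly positive (on the order of 1e-8); the 0 in the printed output is a result of rounding.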

Confidence interval

In a second step we want to estimate the uncertainty of the observed difference. We calculate the confidence interval for the difference in average daily TV consumption (in hours) between retired and not-retired persons. The confidence level is 95%.

inference(y = gss3$tvhours, x = gss3$retirementStatus, 
          est = "mean", type = "ci", null = 0, alternative = "greater", conflevel = 0.95,
          method = "theoretical", order = c("retired","not_retired"),
          eda_plot = FALSE, inf_plot = FALSE)
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_retired = 200, mean_retired = 4.08, sd_retired = 2.8766
## n_not_retired = 200, mean_not_retired = 2.675, sd_not_retired = 2.3359
## Observed difference between means (retired-not_retired) = 1.405
## Standard error = 0.262 
## 95 % Confidence interval = ( 0.8914 , 1.9186 )

We are 95% confident that retired persons in the US watch on average 0.89 to 1.92 hours more TV per day than not-retired persons. The interval does not include 0, which is in accordance with the hypothesis test: there is a significant difference in the daily TV consumption of the two groups.
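
The interval above can be reproduced from the summary statistics as the observed difference plus or minus a margin of error. The sketch assumes a two-sided 95% interval based on the normal distribution, which matches the reported bounds up to rounding; the variable names are illustrative.

```r
# Summary statistics copied from the inference() output above
obs_diff <- 1.405                                  # observed difference in means
se <- sqrt(2.8766^2 / 200 + 2.3359^2 / 200)        # standard error of the difference

z_star <- qnorm(0.975)                             # critical value for a 95% level
margin <- z_star * se                              # margin of error
ci <- obs_diff + c(-1, 1) * margin                 # lower and upper bound

print(round(margin, 4))
print(round(ci, 4))
```

The margin of error is roughly half an hour, so the interval spans about one hour in total.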

Conclusion

Some scientists suggest that TV watching is associated with negative changes in many aspects of health, and it is said that older persons watch more TV [2,3]. In this study we wanted to check whether the GSS dataset supports this statement and to quantify the difference in daily TV consumption between retired and not-retired persons.

We learned that the average daily TV consumption in the sample is about 4.1 hours for retired persons versus 2.7 hours for not-retired persons, with some persons in both groups watching TV for more than 8 hours per day (strongly skewed distributions).

The inference performed gave strong evidence that retired persons watch more TV than not-retired persons, with a 95% confidence interval of about 0.9 to 1.9 hours more per day. This is a significant difference and supports the efforts to reduce the TV consumption of older persons to improve their health.

However, there is a concern: the proportion of missing tvhours values is relatively high (about 40% across all levels of wrkstat). Responses from the non-responders could influence the study results. But because the missing proportion is similar across all levels and the p-value is very low (nearly 0), it is not very probable that this would change our conclusion.

Future studies could concentrate on investigating alternative activities that are more socially and physically engaging, and on finding out which activities best reduce passive TV consumption.

References

  1. Smith, Tom W., Michael Hout, and Peter V. Marsden. General Social Survey, 1972-2012 [Cumulative File]. ICPSR34802-v1. Storrs, CT: Roper Center for Public Opinion Research, University of Connecticut / Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributors], 2013-09-11. doi:10.3886/ICPSR34802.v1, Persistent URL: http://doi.org/10.3886/ICPSR34802.v1
  2. Older adults watch more TV than younger people, enjoy it less, ScienceDaily, June 29, 2010
  3. Too Much TV Linked With Disease and Early Death, TIME, June 15, 2011

Appendix

The table below shows one page of the data used for this study with the original variables (wrkstat, tvhours) and the derived variable (retirementStatus).

##                wrkstat tvhours retirementStatus
## 39912 Working Fulltime       0      not_retired
## 45242 Working Fulltime       2      not_retired
## 42306          Retired       2          retired
## 42931          Retired       2          retired
## 45992          Retired       6          retired
## 39177          Retired      13          retired
## 42011          Retired       5          retired
## 46427          Retired       1          retired
## 41450    Keeping House       4      not_retired
## 38512 Working Parttime       4      not_retired
## 41828          Retired       3          retired
## 41593          Retired       5          retired
## 42697    Keeping House       4      not_retired
## 39595    Keeping House       3      not_retired
## 40500          Retired       6          retired
## 41523          Retired       7          retired
## 41005          Retired       8          retired
## 40399 Working Fulltime       1      not_retired
## 42070            Other       3      not_retired
## 38620          Retired       3          retired
## 45899    Keeping House       1      not_retired
## 42314          Retired       4          retired
## 40883          Retired       5          retired
## 45419          Retired       3          retired
## 39577 Working Fulltime       3      not_retired
## 43281          Retired       5          retired
## 41763 Working Fulltime       2      not_retired
## 46099          Retired      20          retired
## 40925          Retired       3          retired
## 40872 Working Parttime       2      not_retired
## 41182 Working Parttime       1      not_retired
## 41267 Working Fulltime       2      not_retired
## 41236          Retired       3          retired
## 44974 Working Fulltime       2      not_retired
## 40217          Retired       0          retired
## 39930 Working Fulltime       2      not_retired
## 40263          Retired       4          retired
## 45947 Working Fulltime       1      not_retired
## 46190          Retired       3          retired
## 44837          Retired       5          retired
## 38612 Working Fulltime       1      not_retired
## 45894          Retired       2          retired
## 46146          Retired       1          retired
## 38801    Keeping House       2      not_retired
## 43075    Keeping House       1      not_retired