Work was done by:

Team “CandyCrash”


Gugnina Daria

Korolik Irina

Sokolova Anna

Country: Sweden

Topic: “Social inclusion and the level of happiness (desired and subjectively available) among men and women of different ages”


Loading packages

# libraries for tables

# for spss

# for select

# library for graphs

# for Levene's test

# to run post hoc tests

Project 1

Let’s start with the selection of variables relevant to our topic:

#set new directory
## [1] "E:/gfgrf/ESS8SE_spss"
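# to.data.frame = TRUE is assumed here (the original call is cut off); select() needs a data frame
ESS <- read.spss("ESS8SE.sav", use.value.labels = TRUE, to.data.frame = TRUE)
ESS1 <- select(ESS, c("happy", "sclmeet", "sclact", "inprdsc",
                      "gndr", "ipgdtim", "yrbrn"))
ESS1 <- na.omit(ESS1)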

Table of variables:

| Variable | Meaning | Qualitative or quantitative | Level of measurement | Discrete or continuous |
|---|---|---|---|---|
| Gndr | Gender | Qualitative | Nominal | Discrete |
| Yrbrn | Year of birth | Quantitative | Interval | Continuous |
| Happy | How happy are you | Qualitative | Ordinal | Discrete |
| Sclmeet | How often socially meet with friends, relatives or colleagues | Qualitative | Ordinal | Discrete |
| Sclact | Take part in social activities compared to others of same age | Qualitative | Ordinal | Discrete |
| Inprdsc | How many people with whom you can discuss intimate and personal matters | Qualitative | Ordinal | Discrete |
| Ipgdtim | Important to have a good time | Qualitative | Ordinal | Discrete |

Recode our variables:

ESS1$yrbrn <-as.numeric(ESS1$yrbrn) 
## [1] "double"
## [1] "numeric"
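A caveat on this recoding step: if `read.spss()` returned `yrbrn` as a factor (an assumption, since the class before conversion is not shown), `as.numeric()` yields the internal level codes 1, 2, 3, ... rather than the years themselves. The sketch below illustrates the difference on toy data; `yrbrn_toy` and `survey_year` are hypothetical names and values, not part of the ESS file.

```r
# Toy factor standing in for a year-of-birth column read with value labels
yrbrn_toy <- factor(c("1950", "1988", "1988", "2001"))

# as.numeric() on a factor returns the level codes, not the labels
codes <- as.numeric(yrbrn_toy)

# Going through as.character() recovers the years themselves
years <- as.numeric(as.character(yrbrn_toy))

# Age can then be derived from an assumed survey year (hypothetical value)
survey_year <- 2016
age <- survey_year - years

print(data.frame(codes, years, age))
```

If the column is already numeric, the conversion through `as.character()` is harmless, so it may be a safe default when the class is uncertain.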

Describe single variables using measures of central tendency (CTM)



We find the mode of each variable; unlike the median and the mean, it is defined for all of our variables:

# Returns the most frequent value of x (the first one in case of ties)
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

## [1] 28
## [1] Somewhat like me
## 6 Levels: Very much like me Like me Somewhat like me ... Not like me at all
## [1] 8
## Levels: Extremely unhappy 1 2 3 4 5 6 7 8 9 Extremely happy
## [1] About the same
## 5 Levels: Much less than most Less than most ... Much more than most
## [1] Several times a week
## 7 Levels: Never Less than once a month ... Every day
## [1] 4-6
## Levels: None 1 2 3 4-6 7-9 10 or more
## [1] Female
## Levels: Male Female
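The calls that produced these modes are not shown. One possible way to apply the helper to every column at once is sketched below; the data frame `toy` and its values are hypothetical, and the helper is repeated only so that the example runs on its own.

```r
# Same mode helper as above, repeated so the sketch is self-contained
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Toy data frame (hypothetical values, not the ESS sample)
toy <- data.frame(
  gndr  = factor(c("Female", "Male", "Female", "Female")),
  happy = factor(c("8", "7", "8", "9")),
  yrbrn = c(28, 45, 28, 61)
)

# lapply() keeps each column's type; as.character() lets us print all modes as one vector
modes <- sapply(lapply(toy, Mode), as.character)
print(modes)
```

Because `which.max()` returns the first maximum, a variable with several equally frequent values reports only the one that appears first in the data.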


We also look at the median of our quantitative variable:

median(as.numeric(ESS1$yrbrn), na.rm = T)
## [1] 45

For our only quantitative variable (“Year of birth”) we looked at the minimum and maximum, the first and third quartiles, the median and the mean:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00   30.00   45.00   45.64   61.00   82.00
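The call behind this output is not shown; it was presumably something like `summary()`. A minimal sketch on hypothetical data (`age_toy` is an invented vector) shows how the same six numbers can be obtained and cross-checked with `quantile()`:

```r
# Toy values (hypothetical), standing in for ESS1$yrbrn after recoding
age_toy <- c(18, 25, 30, 45, 52, 61, 70)

# summary() reports Min., 1st Qu., Median, Mean, 3rd Qu. and Max. in one call
s <- summary(age_toy)

# quantile() gives the quartiles directly (default interpolation, type 7)
q <- quantile(age_toy, c(0.25, 0.5, 0.75))

print(s)
print(q)
```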

Table by CTM

To make the results easier to read, we summarized the CTM of our variables in a table:

| Variable | Meaning | Mode | Median | Mean |
|---|---|---|---|---|
| Gndr | Gender | Female | | |
| Yrbrn | Year of birth | 28 | 45 | 45.64 |
| Happy | How happy are you | 8 | | |
| Sclmeet | How often socially meet with friends, relatives or colleagues | Several times a week | | |
| Sclact | Take part in social activities compared to others of same age | About the same | | |
| Inprdsc | How many people with whom you can discuss intimate and personal matters | 4-6 | | |
| Ipgdtim | Important to have a good time | Somewhat like me | | |

We also decided to illustrate some of our variables with barplots:

# Mean, median and mode of the recoded variable, shown side by side
year <- c(mean(ESS1$yrbrn, na.rm = TRUE), median(ESS1$yrbrn, na.rm = TRUE), Mode(ESS1$yrbrn))
barplot(year, xlab = "CTM", ylab = "Age of respondents",
        col = c("slateblue1", "turquoise2", "plum2"),
        names.arg = c("mean by age", "median by age", "mode by age"))

plot(ESS1$gndr, xlab="Gender", ylab="Number of people", col=c("paleturquoise1", "rosybrown1"), names=c("Male", "Female"))

The first graph shows the mean, the median and the mode of the respondents' age. The mean age (45.64) is the largest of the three, and the median (45) is only slightly below it. The mode does not even reach thirty: the most common age among respondents is 28.

According to the second graph, the number of women in our sample only slightly exceeds the number of men: 777 women and 773 men.

Next we turned to the distribution of our variables:

Distribution of respondents by age:

The graph shows that the largest group of respondents is aged 28. Respondents younger than twenty are less numerous than respondents of other ages. Among respondents over twenty, the smallest group is at the age of about 75, with about 27 respondents.

Variable distribution - How happy are you?

This chart reflects the level of happiness that respondents attribute to themselves. Respondents were offered a scale where 0 is “extremely unhappy” and 10 is “extremely happy”. Most of the answers are concentrated in the right part of the graph, so we can conclude that most informants rate their happiness quite high.

Variable distribution - How often socially meet with friends, relatives or colleagues?

The informants were asked “How often do you meet socially with friends, relatives or work colleagues?”. The frequency of meetings was recorded on a scale where 1 is “never” and 7 is “every day”. As the histogram shows, the fewest respondents chose values from 1 to 3, which correspond to the answers from “never” to “once a month”. A much larger proportion chose 4 (several times a month) and 5 (once a week). However, the most popular answers were “several times a week” and “every day”. This picture indicates high social activity among the respondents.

Variable distribution - Take part in social activities compared to others of same age?

The graph shows how often respondents take part in social activities, in their own opinion, compared with other people of their age. On the proposed scale, 1 is “much less than most” and 5 is “much more than most”. The two extreme options, “much less” and “much more”, are the rarest answers. The most frequent responses are close to the middle of the scale, at 3, “about the same”. In general, respondents believe that they interact with people a little less often than, or as often as, their peers.

Variable distribution - How many people with whom you can discuss intimate and personal matters?

Here the respondents were asked to indicate the number of people with whom they can discuss personal, intimate topics, where 0 is “none” and 6 is “10 or more”. The most frequently chosen value was 4, which corresponds to 4 to 6 people with whom the respondent can share confidential information. The option “none” is the least popular, so we can say that most respondents have someone they can confide in. Hence, the level of trust in people among the respondents is high.

Variable distribution - Important to have a good time

For this item the respondents were given the following prompt: “Please listen to each description and tell me how much each person is or is not like you. Use this card for your answer. Having a good time is important to her / him. She / he likes to ‘spoil’ herself / himself.” The respondent marked the level of similarity on a scale where 1 is “very much like me” and 6 is “not like me at all”. The most common answers are 2 and 3, “Like me” and “Somewhat like me”.
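The code for the distribution charts in this section is not shown. A minimal sketch of how the distribution of an ordinal variable could be tabulated and drawn is given below; `meet_toy` and its four answer categories are hypothetical and much shorter than the real ESS scales.

```r
# Toy ordinal answers (hypothetical), with the scale order fixed via `levels`
lv <- c("Never", "Once a month", "Once a week", "Every day")
meet_toy <- factor(c("Once a week", "Every day", "Once a week", "Never",
                     "Every day", "Once a week"),
                   levels = lv, ordered = TRUE)

# Absolute and relative frequencies, kept in scale order (empty levels included)
counts <- table(meet_toy)
shares <- round(prop.table(counts), 2)

# Bar chart of the distribution; las = 2 turns the labels so they fit
barplot(counts, xlab = "Answer", ylab = "Number of respondents", las = 2)

print(counts)
print(shares)
```

Declaring the levels explicitly keeps the bars in scale order and shows categories that nobody chose, which makes the charts easier to compare with the questionnaire.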

Plot pairs of variables with different measurement scales

Construct a scatterplot for two continuous variables

People of all ages tend to meet with friends and colleagues several times a week. People under the age of twenty do not tend to leave the question about the frequency of meetings with friends. The smallest group of respondents never meets friends, relatives or colleagues; there are practically no such observations. The answer “several times a month” is given by people from about 14 to 72 years old; respondents younger than 14 or older than 72 do not choose it.

Construct a boxplot for continuous and categorical (binary) variables

Men and women answer the question about the importance of having a good time in the same way: in both groups the typical answer is that the described person is somewhat like them.
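The plotting code for this comparison is not shown. As a rough sketch, a boxplot of a numerically coded answer by a binary factor can be drawn with the formula interface, and the group medians can be checked with `tapply()`; the data frame `toy` and its values are hypothetical.

```r
# Toy data (hypothetical): numeric answer codes and a binary grouping factor
toy <- data.frame(
  gndr    = factor(c("Male", "Male", "Male", "Female", "Female", "Female")),
  ipgdtim = c(2, 3, 3, 3, 3, 4)
)

# One box per gender; the thick line inside each box is the group median
boxplot(ipgdtim ~ gndr, data = toy,
        xlab = "Gender", ylab = "Important to have a good time (code)")

# The same medians as numbers, to back up the visual comparison
medians <- tapply(toy$ipgdtim, toy$gndr, median)
print(medians)
```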

Project 2

We build contingency tables for the variables that we will include in the hypotheses and test with the chi-square test:

table(as.character(ESS1$gndr), as.character(ESS1$sclact))
##          About the same Less than most More than most Much less than most
##   Female            374            195            127                  39
##   Male              348            213            133                  42
##          Much more than most
##   Female                  26
##   Male                    19
table(as.character(ESS1$gndr), as.character(ESS1$ipgdtim))
##          A little like me Like me Not like me Not like me at all
##   Female              157     205          99                 12
##   Male                131     224          86                 10
##          Somewhat like me Very much like me
##   Female              214                74
##   Male                217                87
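Note that wrapping the variables in `as.character()` makes `table()` sort the categories alphabetically, which is why the columns above do not follow the order of the scale. A small sketch on hypothetical data (`sclact_toy`, `gndr_toy`) shows the difference when the factors are passed directly:

```r
# Scale order of the answer categories
lv <- c("Much less than most", "Less than most", "About the same",
        "More than most", "Much more than most")

# Toy answers (hypothetical), stored as factors with explicit level order
sclact_toy <- factor(c("About the same", "Less than most", "More than most",
                       "About the same"), levels = lv)
gndr_toy <- factor(c("Male", "Female", "Female", "Male"),
                   levels = c("Male", "Female"))

# Character input: alphabetical order, unused categories disappear
tab_sorted <- table(as.character(gndr_toy), as.character(sclact_toy))

# Factor input: scale order is kept, unused categories appear with zero counts
tab_ordered <- table(gndr_toy, sclact_toy)

print(tab_sorted)
print(tab_ordered)
```

The chi-square statistic does not depend on the order of rows or columns, so this mainly affects readability; categories with zero counts should be dropped (for example with `droplevels()`) before testing.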


Chi-square test

We want to test the following hypotheses with the chi-square test:

H0: the answers do not depend on gender (there are no differences between men and women)

The alternative is:

H1: the answers depend on gender (there are some differences between men and women)

chisq.test(as.character(ESS1$gndr), as.character(ESS1$sclact))
##  Pearson's Chi-squared test
## data:  as.character(ESS1$gndr) and as.character(ESS1$sclact)
## X-squared = 3.0452, df = 4, p-value = 0.5503
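To see where these numbers come from, the statistic can be recomputed by hand from the gender by `sclact` counts printed above. The sketch below uses only base R; the object names (`obs`, `expected`, `x2`) are chosen for illustration.

```r
# Observed counts copied from the gender x sclact table above
obs <- matrix(c(374, 195, 127, 39, 26,
                348, 213, 133, 42, 19),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Female", "Male"),
                              c("About the same", "Less than most",
                                "More than most", "Much less than most",
                                "Much more than most")))

# Expected counts under independence: row total * column total / N
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson statistic, degrees of freedom and upper-tail p-value
x2 <- sum((obs - expected)^2 / expected)
df <- (nrow(obs) - 1) * (ncol(obs) - 1)
p  <- pchisq(x2, df, lower.tail = FALSE)

print(round(c(X2 = x2, df = df, p = p), 4))
```

Every expected count here is well above 5, so the chi-square approximation used by `chisq.test()` is reasonable for this table.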

Conclusion: since the p-value is greater than .05 (p-value = 0.5503, X-squared = 3.0452, df = 4), we cannot reject H0, so there are no significant differences in social