Description of the dataset

This dataset was obtained as part of the course “The Impact of Social Environment on Health”, taught by Marijn Stok at the University of Konstanz. The students were divided into groups of five. Each group investigated a question and designed an experiment. The research question behind this dataset was how norms influence eating behavior. There is evidence that norms communicated in an injunctive form cause reactance and may thus increase unhealthy eating choices (Stok et al., 2014). A descriptive norm, by contrast, might promote healthy eating behavior.

We formulated two norms on two posters: “Eat healthier! How? Nutrient experts recommend making fruits and vegetables to a bigger part of your diet: a healthy diet should include at least 3 portions of fruits and vegetables per day” (injunctive) vs. “Join other students and eat healthier! How? Make fruits and vegetables be a bigger part of your diet: 67% of students at the University of Konstanz eat at least 3 portions of fruits and vegetables per day” (descriptive). Each participant saw one of the two norms. We wanted to see whether participants in the descriptive condition would choose the healthy snack (mandarin) more often and whether participants in the injunctive condition would choose the unhealthy snack (Lebkuchen) more often.

Additional distractor posters were designed and shown to participants in both conditions. These posters stated that healthy eating helps to protect the environment, to save money, or to protect one's health. Research has shown that people who are given a list of reasons for a behavior of interest, with descriptive and injunctive norms among these reasons, usually underestimate the impact of norms on their behavior and rank them as least influential. At the same time, norms appear to affect actual behavior more than the other information does.
People tend not to recognize the guiding impact of norms on behavior (Nolan et al., 2008). We wanted to see whether participants would rank the distractor posters higher in impact than the norm posters. Additionally, we asked participants about their intention to eat healthier to see whether norms can help to overcome the intention-behavior gap. And if no behavioral difference were found between conditions, we could at least see whether norms influence the intention to eat healthier.

The data were collected at the university. The investigators approached students in the foyer and asked them to fill out a questionnaire about eating habits on a tablet. A snack (mandarin or cookie) was offered as a reward. Participants first answered some general questions. They were then asked to rank 4 posters (poster with a norm, health poster, save-money poster and environment poster) according to the impact each made on them. To make sure that participants took a closer look at the norm poster, they were asked to rate this poster in more detail (color, message, font, general impression). Participants were led to believe that this specific poster had been randomly selected. After filling out the questionnaire, participants could choose a snack.

There are 50 columns and 68 rows in the dataset. Names of columns:

Lfdn: actual number of each participant who was finally included in the dataset
Lastpage: the ID of the condition
Duration: how long it took to fill out the questionnaire (seconds)
tn_number: number of the participant, from 1 to 68 for the dataset
agree: informed consent

General questions about health and eating:
health: self-reported health
eating: self-reported healthy eating
importance: how important healthy eating is for the participant
fruit: consumed portions of fruit per week
veg: consumed portions of vegetables per week
diet: “Are you on a diet?” y/n
vegetarian: diet y/n
vegan: diet y/n
religion: diet (exclusion of products) due to religion
allergies: diet y/n
allergies_kind: specifying allergy
others: any other diet
others_kind: specifying

Ranking of posters:
Poster_condition: poster with either descriptive or injunctive norm
Poster_environment: poster with a statement that healthy eating helps to protect the environment
Poster_money: money statement
Poster_health: statement about protecting health
Back: rating of poster background
Color: rating of poster color
Message: rating of poster message
Text: rating of text quality
Font: rating of text font
Composition: rating of poster composition
Overall: overall impression

Measurement of intention:
Want: I want to eat healthy
Intend: I intend to eat healthy
Plan: I plan to eat healthy
Will: I will eat healthy
Fruit_intention: I intend to eat more fruit
Veg_intention: I intend to eat more vegetables
Check: estimation of how healthy the participant eats at the moment

Demographic variables: Gender, Nationality, Age, Occupation, Height, Weight

Reward: which reward a participant chose - mandarin or Lebkuchen
Session_id
Ats: code of questionnaire
Datetime: date and time
Date_of_last_access
Condition: condition, injunctive or descriptive

Questions:

1 Do people rate posters with a norm statement as less influential than the distractor posters?

2 Do people rank the impact of the descriptive poster lower than the impact of the injunctive poster?

3 Does a poster with a norm statement have any impact on intention?

4 Does a poster with a descriptive norm increase healthy snack choice compared to the injunctive norm condition?

5 Do participants show an intention-behavior gap?

First I open an SPSS file in R

library(memisc)
## Warning: package 'memisc' was built under R version 3.2.3
## Loading required package: lattice
## Loading required package: MASS
## 
## Attaching package: 'memisc'
## 
## The following objects are masked from 'package:stats':
## 
##     contr.sum, contr.treatment, contrasts
## 
## The following object is masked from 'package:base':
## 
##     as.array
library(rmarkdown)
library(yarrr)        # provides apa(), used below
library(RColorBrewer)

my.data <- as.data.set(spss.system.file('C:/Users/Elena/Downloads/Dataset_group lebkuchen_2015_12_22.sav'))

Question 1. Now I test the first question: Do people rate posters with a norm-statement as less influential in comparison to other distractor-posters?

To find out, I look at the mean ranking of each poster under each condition (!!FINDING A MEAN - ONE OF THE ASSIGNMENTS!!)

mean(my.data$poster_condition[my.data$condition == "injunctive"])
## [1] 2.058824
mean(my.data$poster_health[my.data$condition == "injunctive"])
## [1] 2.764706
mean(my.data$poster_environment[my.data$condition == "injunctive"])
## [1] 2.323529
mean(my.data$poster_money[my.data$condition == "injunctive"])
## [1] 2.852941
mean(my.data$poster_condition[my.data$condition == "descriptive"])
## [1] 2.235294
mean(my.data$poster_money[my.data$condition == "descriptive"])
## [1] 2.970588
mean(my.data$poster_health[my.data$condition == "descriptive"])
## [1] 2.705882
mean(my.data$poster_environment[my.data$condition == "descriptive"])
## [1] 2.088235

A lower mean means a higher rank, i.e. the poster was judged as more influential. In the injunctive condition the norm poster has the lowest mean rank of all four posters (2.06), so it was rated as the most influential. In the descriptive condition the norm poster comes second (2.24), just behind the environment poster (2.09). This doesn’t confirm the prediction that people will rate the norm posters as less influential than the distractor posters.
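The eight separate mean() calls can also be collapsed into one aggregate() call that returns a small table of mean ranks per condition. The sketch below is only an illustration: the data frame `toy` and all of its values are made up, and only the column names are borrowed from the dataset.

```r
# Toy data with made-up ranks; only the column names mirror the dataset
toy <- data.frame(
  condition          = rep(c("descriptive", "injunctive"), each = 4),
  poster_condition   = c(1, 2, 3, 2, 1, 1, 2, 4),
  poster_health      = c(2, 3, 4, 3, 3, 2, 4, 3),
  poster_environment = c(3, 1, 1, 1, 2, 3, 1, 1),
  poster_money       = c(4, 4, 2, 4, 4, 4, 3, 2)
)

# One row per condition, one column of mean ranks per poster
mean_ranks <- aggregate(
  cbind(poster_condition, poster_health, poster_environment, poster_money) ~ condition,
  data = toy,
  FUN  = mean
)
print(mean_ranks)
```

With the real data, the same call would presumably work on a plain data frame version of my.data; that is an assumption, as it was not tried here.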

Let’s look at the standard deviation and the median of the column poster_condition

sd(my.data$poster_condition)
## [1] 1.096326
median(my.data$poster_condition)
## [1] 2

I conduct several t-tests to check whether the mean ranking of each poster differs significantly between the two conditions

library(survival)

lapply(my.data[,c("poster_condition", "poster_money", "poster_health", "poster_environment")], function(x) t.test(x ~ my.data$condition, var.equal = TRUE))
## $poster_condition
## 
##  Two Sample t-test
## 
## data:  x by my.data$condition
## t = 0.66088, df = 66, p-value = 0.511
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.3566579  0.7095991
## sample estimates:
## mean in group descriptive  mean in group injunctive 
##                  2.235294                  2.058824 
## 
## 
## $poster_money
## 
##  Two Sample t-test
## 
## data:  x by my.data$condition
## t = 0.49556, df = 66, p-value = 0.6218
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.3563402  0.5916343
## sample estimates:
## mean in group descriptive  mean in group injunctive 
##                  2.970588                  2.852941 
## 
## 
## $poster_health
## 
##  Two Sample t-test
## 
## data:  x by my.data$condition
## t = -0.21349, df = 66, p-value = 0.8316
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.6089300  0.4912829
## sample estimates:
## mean in group descriptive  mean in group injunctive 
##                  2.705882                  2.764706 
## 
## 
## $poster_environment
## 
##  Two Sample t-test
## 
## data:  x by my.data$condition
## t = -0.88021, df = 66, p-value = 0.3819
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.7690103  0.2984221
## sample estimates:
## mean in group descriptive  mean in group injunctive 
##                  2.088235                  2.323529
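The t-tests above compare each poster between the two conditions. Whether the norm poster is ranked differently from the distractor posters is a within-participant comparison, and one common option for ranks of this kind is a Friedman test. This test was not part of the original analysis; the sketch below only shows the mechanics on made-up ranks, and the matrix `ranks` is hypothetical.

```r
# Made-up ranks: one row per participant, one column per poster
ranks <- rbind(
  c(1, 2, 3, 4),
  c(2, 3, 1, 4),
  c(1, 3, 2, 4),
  c(1, 2, 4, 3),
  c(2, 4, 1, 3),
  c(1, 3, 2, 4)
)
colnames(ranks) <- c("norm", "health", "environment", "money")

# friedman.test() treats rows as blocks (participants)
# and columns as groups (posters)
result <- friedman.test(ranks)
print(result)
```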

Question 2: Now let’s see if there is any significant difference in the rankings of the norm posters between conditions. This was actually answered in the previous step, but I will conduct a separate t-test and report it in APA format (!!T-TEST AND APA FORMAT - ASSIGNMENT)

with(my.data, t.test(poster_condition ~ condition))
## 
##  Welch Two Sample t-test
## 
## data:  poster_condition by condition
## t = 0.66088, df = 64.491, p-value = 0.511
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.3568908  0.7098319
## sample estimates:
## mean in group descriptive  mean in group injunctive 
##                  2.235294                  2.058824
t.test.poster<-with(my.data, t.test(poster_condition ~ condition))
apa(t.test.poster)
## [1] "mean difference = -0.18, t(64.49) = 0.66, p = 0.51 (2-tailed)"

There is no significant difference between the rankings of the descriptive and the injunctive poster, t(64.49) = 0.66, p = .51.

I create a boxplot for this result (!!ASSIGNMENT)

with(my.data, boxplot(poster_condition ~ condition,
                       ylab = "Ranking",
                       xlab = "Condition",
                       main = "Ranking of posters according to condition",
                    col="paleturquoise2"))

I would like to see where the median ranking of the norm posters lies in each condition. I do this with the aggregate function (!!ASSIGNMENT)

with(my.data, aggregate(poster_condition ~ condition, FUN = median))
##   condition poster_condition
## 1         1                2
## 2         2                2

Question 3: Does a poster with a norm statement have any impact on intention? Intention was measured with 4 items (columns 31-34).

First of all I need to recode the columns, as they are in SPSS format. Then I calculate the mean intention from these 4 columns and add it to a separate column, intention.total.

Recoding the columns (!!ASSIGNMENT)

recode.v <- function(original.vector,
                     old.values,
                     new.values,
                     others = NULL) {

  # Start from the original values, or from a vector filled with `others`
  if (is.null(others)) {
    new.vector <- original.vector
  } else {
    new.vector <- rep(others, length(original.vector))
  }

  # Replace each old value with its new value, skipping NAs
  for (i in seq_along(old.values)) {
    change.log <- new.vector == old.values[i] & !is.na(new.vector)
    new.vector[change.log] <- new.values[i]
  }

  return(new.vector)
}
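For a plain character vector, the kind of mapping that recode.v performs can also be written as a lookup in a named vector. The sketch below uses made-up answers; `lookup`, `answers` and `scores` are illustrative names that do not appear in the dataset. Unlike recode.v with others = NULL, labels that are not listed in the lookup come back as NA instead of staying unchanged.

```r
# Named vector: the names are the old labels, the values are the new codes
lookup <- c("very much" = 2, "much" = 1, "neutral" = 0)

answers <- c("much", "very much", "neutral", "not at all", NA)

# Labels missing from the lookup (and NA itself) come back as NA
scores <- unname(lookup[answers])
print(scores)
```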


#recode each column

my.data$want <- as.character(my.data$want)
my.data$want <- recode.v(original.vector = my.data$want,
                                 old.values = c("very much", "much", "neutral"),
                                 new.values = c(2, 1, 0)
)
my.data$want <-as.numeric(my.data$want)


my.data$intend <- as.character(my.data$intend)
my.data$intend <- recode.v(original.vector = my.data$intend,
                         old.values = c("very much", "much", "neutral"),
                         new.values = c(2, 1, 0)
)
my.data$intend <-as.numeric(my.data$intend)



my.data$plan <- as.character(my.data$plan)
my.data$plan <- recode.v(original.vector = my.data$plan,
                         old.values = c("very much", "much", "neutral"),
                         new.values = c(2, 1, 0)
)
my.data$plan <-as.numeric(my.data$plan)
## Warning: NAs introduced by coercion
my.data$will <- as.character(my.data$will)
my.data$will <- recode.v(original.vector = my.data$will,
                         old.values = c("very likely", "likely", "neutral"),
                         new.values = c(2, 1, 0)
)
my.data$will <-as.numeric(my.data$will)

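# create a new column for total intention (mean of the four items)
# the parentheses make the division apply to the whole sum; the outputs shown
# below stem from the original expression want + will + intend + plan/4,
# in which only plan was divided by 4
my.data$intention.total <- (my.data$want + my.data$will + my.data$intend + my.data$plan) / 4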
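In R, division binds more tightly than addition, so in an expression of the form a + b + c + d / 4 only d is divided by 4. The sketch below compares that form with a parenthesized mean and with rowMeans() on made-up item scores; the data frame `items` is hypothetical.

```r
# Made-up item scores for three participants
items <- data.frame(
  want   = c(2, 1, 0),
  intend = c(2, 1, 1),
  plan   = c(2, 0, 0),
  will   = c(2, 2, 1)
)

# Without parentheses only `plan` is divided by 4
unparenthesized <- with(items, want + will + intend + plan / 4)

# With parentheses (or rowMeans) all four items are averaged
item_mean <- with(items, (want + will + intend + plan) / 4)
row_mean  <- rowMeans(items[, c("want", "intend", "plan", "will")])

print(unparenthesized)
print(item_mean)
```

A mean of four items that are each scored 0 to 2 cannot exceed 2, which makes for a quick sanity check on a column like intention.total.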

Now let’s see if there was any impact of condition on intention

with(my.data, t.test(intention.total ~ condition))
## 
##  Welch Two Sample t-test
## 
## data:  intention.total by condition
## t = -0.28968, df = 63.931, p-value = 0.773
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.9253752  0.6910002
## sample estimates:
## mean in group descriptive  mean in group injunctive 
##                  3.757812                  3.875000

There is no significant difference in intention between the conditions. The descriptive and the injunctive norm didn’t differ in their impact on intention.

Question 4: Does a poster with a descriptive norm increase healthy snack choice compared to the injunctive norm condition?

First of all I need to clean the data, because the reward (snack) was written in multiple ways. I want only 3 versions: mandarin, lebkuchen, nothing. (!!ASSIGNMENT: RECODE COLUMN USING VECTORING)

my.data$reward <- recode.v(original.vector = my.data$reward,
                         old.values = c("Landwirten", "lebkuch", "leiblichen", "man darin", "mandarine","mit ging", "nichts" ),
                         new.values = c("lebkuchen", "lebkuchen", "lebkuchen", "mandarin", "mandarin", "mandarin", "nothing")
)

my.data$reward[my.data$reward == "a"] <- NA

table(my.data$reward)
## 
## lebkuchen  mandarin   nothing 
##        17        38        12
# Now I will conduct a chi-squared test

library(MASS)
tbl <- table(my.data$reward, my.data$condition)
tbl
##            
##             descriptive injunctive
##   lebkuchen           7         10
##   mandarin           19         19
##   nothing             8          4
chisq.test(tbl)
## 
##  Pearson's Chi-squared test
## 
## data:  tbl
## X-squared = 1.8482, df = 2, p-value = 0.3969

There is no significant difference in reward selection between the conditions, X-squared(2) = 1.85, p = .40.
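To see where the chi-squared statistic comes from, the test can be rerun on the printed counts alone and the expected counts inspected. The sketch below rebuilds the table by hand from the output above, so it does not need the SPSS file; `tbl_counts` and `chi_result` are illustrative names.

```r
# Counts copied from the reward-by-condition table above
tbl_counts <- matrix(
  c(7, 19, 8,    # descriptive
    10, 19, 4),  # injunctive
  nrow = 3,
  dimnames = list(
    reward    = c("lebkuchen", "mandarin", "nothing"),
    condition = c("descriptive", "injunctive")
  )
)

chi_result <- chisq.test(tbl_counts)

# Expected counts under independence: row total * column total / N
print(round(chi_result$expected, 2))
print(chi_result$statistic)
```

The largest gaps between observed and expected counts are in the "nothing" row (8 vs. about 6.1 and 4 vs. about 5.9), while the mandarin counts are almost exactly as expected.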

Question 5: Do participants show an intention-behavior gap? I would like to see if there is a correlation between what they intend and how they actually behave. For this I compare the column fruit_intention with fruit (intention to eat fruit and actual reported fruit portions) and veg_intention with veg (intention to eat vegetables and actual amount of vegetable portions) (!!ASSIGNMENT - CORRELATION)

# I have to convert the columns to numeric before computing the correlations.
my.data$fruit_intention<-as.character(my.data$fruit_intention)
my.data$fruit_intention<-as.numeric(my.data$fruit_intention)
## Warning: NAs introduced by coercion
my.data$fruit<-as.numeric(my.data$fruit)
## Warning in .nextMethod(x = x, mode = mode): NAs introduced by coercion
cor.fruit<-with(my.data, cor.test(fruit_intention, fruit))

apa(cor.fruit)
## [1] "r = 0.46, t(65) = 4.17, p < 0.01 (2-tailed)"
my.data$veg_intention<-as.numeric(my.data$veg_intention)

my.data$veg<-as.numeric(my.data$veg)

cor.veg<-with(my.data, cor.test(veg_intention, veg))
apa(cor.veg)
## [1] "r = 0.95, t(66) = 24.7, p < 0.01 (2-tailed)"

It looks like the participants are very motivated people: their intention correlates positively with their self-reported behavior, moderately for fruit (r = 0.46) and very strongly for vegetables (r = 0.95).

Now let’s take a look at the assignments I have already done: mean, sd, and median; t-test written in APA format; boxplot; aggregate; recoding a column; correlation in APA format.

The following assignments are still to be done: regression; scatterplot + abline; histogram + reference lines for mean and median; function; loop.

Regression

# Can intention to eat healthier be explained through importance of the topic "healthy eating" for a person?

regression<-lm(intention.total ~ importance,
   data = my.data)

summary(regression)
## 
## Call:
## lm(formula = intention.total ~ importance, data = my.data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5000 -0.5000 -0.5000  0.7216  2.7500 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           4.7500     0.3645  13.031  < 2e-16 ***
## importanceimportant  -1.0000     0.4385  -2.280 0.025978 *  
## importanceneutral    -2.1364     0.5815  -3.674 0.000495 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.503 on 63 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.1784, Adjusted R-squared:  0.1523 
## F-statistic:  6.84 on 2 and 63 DF,  p-value: 0.00205
#importance is a significant predictor for intention

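Scatterplot with a reference line for the relationship between importance and intention

# importance is stored as a categorical variable, so plot() may draw
# one box per category rather than single points
plot(x = my.data$importance,
     y = my.data$intention.total,
     pch = 16,
     col = "blue",
     xlab = "Importance",
     ylab = "Intention",
     main = "Distribution")

# overlay the intention scores, plotted against the row index
points(my.data$intention.total,
       pch = 16,
       col = "orange")

# dashed identity line (intercept 0, slope 1) as a visual reference;
# this is not the fitted regression line
abline(a = 0,
       b = 1,
       lwd = 2,
       lty = 2)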
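abline(a = 0, b = 1) draws the line y = x. To draw a fitted line instead, abline() can be given an lm object that has a single numeric predictor. The sketch below shows this on simulated data; `x`, `y` and `fit` are placeholders and not columns of the dataset, where importance is categorical and would first need a numeric coding.

```r
set.seed(1)

# Simulated numeric data; x and y are placeholders, not dataset columns
x <- runif(50, min = 0, max = 4)
y <- 1 + 0.5 * x + rnorm(50, sd = 0.5)

fit <- lm(y ~ x)

plot(x, y,
     pch = 16,
     col = "blue",
     xlab = "Predictor",
     ylab = "Outcome")

# abline() accepts a fitted simple regression and draws its line
abline(fit, lwd = 2, lty = 2)
```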

Histogram: how much the participants weigh, with a mean line and a median line

# again recode a column to numeric
my.data$weight<-as.numeric(my.data$weight)
 
#now histogram:
 
hist(my.data$weight,
     xlim = c(30,150),
     ylim = c(0, 50),
     xlab = "Weight",
     ylab = "Persons",
     main = "Weight",
     cex.main = .7,
     col = "chartreuse"
)

text(mean(my.data$weight), 38,
     labels = paste("Mean\n", round(mean(my.data$weight), 2), sep = ""),
     adj = 0,
     pos = 4
)
abline(v = mean(my.data$weight), lty = 2)

text(median(my.data$weight), 38,
     labels = paste("Median\n", round(median(my.data$weight), 2), sep = ""),
     adj = 0,
     pos = 1
)
abline(v = median(my.data$weight), lty = 2)

Function: some students study arts, some study science. I will code arts as 0 and science as 1 in a separate column. So that I don't forget it, I will write a function that reminds me what 1 and 0 mean.

# first of all i have to clean data and assign the courses to the same labels
my.data$student <- recode.v(original.vector = my.data$student,
  old.values = c("Biologe, French", "Business education", "ecenomics", "Germanistik", "jura","Law", "Lehramt", "lehramt"),
  new.values = c("biology", "economics", "economics", "german literature", "law", "law", "teacher", "teacher")
  )

my.data$student <- recode.v(original.vector = my.data$student,
  old.values = c("econimics", "educational sciences", "Linguistics", "Math,English", "Physik","Sprachwissenschaft", "wiwi", "Wirtschaftswissenschaften"),
  new.values = c("economics", "teacher", "linguistics", "math", "physic", "linguistics", "economics", "economics")
  )

my.data$student <- recode.v(original.vector = my.data$student,
  old.values = c("politics and public Administration", "Politics and Public Administration", "politics and puplic administration", "politics public administration", "politisch and public administration","psychologie", "Psychologie"),
  new.values = c("politics", "politics", "politics", "politics", "politics", "psychology", "psychology")
  )

my.data$student <- recode.v(original.vector = my.data$student,
  old.values = c("politicalscience", "Psychology", "school", "sportwissenschaft", "wirtschaftspadagogik","philosophy, german", "sport"),
  new.values = c("politics", "psychology", "teacher", "sport", "economics", "philosophy", "sports")
  )

my.data$student[my.data$student == "-99"] <- NA

table(my.data$student)
## 
##                 biology               chemistry               economics 
##                       2                       2                      10 
##                  french       german literature Information Engineering 
##                       1                       2                       1 
##                     law             linguistics                    math 
##                       8                       3                       1 
##              philosophy                  physic                politics 
##                       1                       1                       8 
##              psychology               sociology                  sports 
##                      10                       2                       2 
##                 teacher 
##                       4
# now I create a new column with arts = 0, science = 1
# note: "politics" is in neither list, so those participants stay NA
my.data$study.type <- NA

my.data$study.type[my.data$student %in% c("biology", "chemistry", "Information Engineering ", "math", "physic", "psychology", "economics")] <- 1

my.data$study.type[my.data$student %in% c("french", "german literature", "law", "linguistics", "philosophy", "sociology", "teacher", "sports")] <- 0

# now I write a function
# it returns NA for codes other than 0 or 1 instead of failing
study.section <- function(x) {
  if (is.na(x)) return(NA_character_)
  if (x == 0) return("arts")
  if (x == 1) return("science")
  NA_character_
}

study.section(1)
## [1] "science"
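study.section() handles one code at a time. To label a whole column in one step, the codes can be turned into a factor with labels. The sketch below uses a made-up vector; `study_type` and `study_label` are illustrative names.

```r
# Made-up codes: 0 = arts, 1 = science, NA = not classified
study_type <- c(1, 0, NA, 1, 0)

# factor() attaches the labels to the codes for the whole vector at once
study_label <- factor(study_type,
                      levels = c(0, 1),
                      labels = c("arts", "science"))

print(study_label)
print(table(study_label, useNA = "ifany"))
```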

A loop that counts the invalid (missing) answers per participant

my.data$invalid.answers <- NA

for (row.i in 1:nrow(my.data)) {

  data.temp <- my.data[row.i, ]

  # subtract 1 because the new invalid.answers cell of this row is still NA
  n.na <- sum(is.na(data.temp)) - 1

  my.data$invalid.answers[row.i] <- n.na

}
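The same count can be obtained without a loop, because is.na() works on a whole data frame and rowSums() adds up the TRUE values per row. The sketch below uses a made-up data frame `answers`; since the new column is created only after counting, no correction by 1 is needed.

```r
# Made-up answers with some missing values
answers <- data.frame(
  health = c(3, NA, 4),
  fruit  = c(NA, NA, 2),
  veg    = c(5, 1, 7)
)

# is.na() gives a logical matrix; rowSums() counts the TRUEs per row
answers$invalid.answers <- rowSums(is.na(answers))
print(answers)
```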