Notes from the R-Workshop on Monday, 12 June (Peter Maurer)
Loading packages
We load an SPSS dataset from a survey of politicians (perceived media influence, friends among journalists, satisfaction with democracy), keep only the variables we need and inspect the data.
library(haven)
library(dplyr)
library(ltm)
library(psych)
library(infer)
mydata <- read_sav("\\\\nas-a1/redirected$/petemaur/My Documents/Rworkshop/Data/politiciandata.sav")
mydata <- select(mydata, SYS01a:SYS01f, SELF04c, IN02f, SYS12, DEM9, country)
summary(mydata)
SYS01a SYS01b SYS01c SYS01d SYS01e SYS01f SELF04c IN02f SYS12 DEM9
Min. :1.000 Min. :1.000 Min. :2.000 Min. :1.00 Min. :1.000 Min. :1.000 Min. : 0.000 Min. :1.000 Min. :1.000 Min. :1.000
1st Qu.:3.000 1st Qu.:3.000 1st Qu.:4.000 1st Qu.:2.00 1st Qu.:2.000 1st Qu.:2.000 1st Qu.: 0.000 1st Qu.:3.000 1st Qu.:2.000 1st Qu.:1.000
Median :3.000 Median :4.000 Median :4.000 Median :3.00 Median :3.000 Median :3.000 Median : 2.000 Median :3.000 Median :3.000 Median :1.000
Mean :3.303 Mean :3.754 Mean :4.205 Mean :3.17 Mean :3.044 Mean :3.139 Mean : 5.857 Mean :3.413 Mean :3.104 Mean :1.239
3rd Qu.:4.000 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:4.00 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.: 5.000 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:1.000
Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.00 Max. :5.000 Max. :5.000 Max. :99.000 Max. :5.000 Max. :5.000 Max. :2.000
NA's :2 NA's :3 NA's :2 NA's :77 NA's :10 NA's :1 NA's :2 NA's :1
country
Min. :1.000
1st Qu.:1.000
Median :2.000
Mean :1.595
3rd Qu.:2.000
Max. :2.000
mydata <- rename(mydata, press = SYS01a,
tabloid = SYS01b,
public = SYS01c,
commercial = SYS01d,
online = SYS01e,
sns = SYS01f,
polls = IN02f,
friends = SELF04c,
satisfaction = SYS12,
gender = DEM9,
land = country)
glimpse(mydata)
Rows: 185
Columns: 11
$ press <dbl+lbl> 3, 5, 5, 4, 3, 4, 3, 4, 4, 5, 5, 4, 4, 3, 3, 4, 2, 4, 5, 3, 5, 3, 3, 2, 2, 5, 4, 5, 3, 5, 4, 3, 5, 4, 3, 2, 4, 3, 3, 3, 5, 3, 3, 3, 4, 3, 1, 2, …
$ tabloid <dbl+lbl> 4, 5, 2, 2, 2, 5, 2, 3, 2, 3, 2, 5, 2, 2, 4, 5, 3, 2, 5, 4, 4, 2, 4, 5, 3, 4, 4, 5, 2, 5, 5, 4, 3, 5, 4, 1, 5, 1, 4, 2, 2, 4, 3, 1, 3, 4, 1, 1, …
$ public <dbl+lbl> 3, 3, 4, 4, 3, 5, 4, 3, 5, 4, 4, 5, 4, 4, 4, 4, 3, 4, 5, 5, 5, 5, 5, 3, 3, 5, 3, 5, 4, 5, 4, 3, 5, 4, 5, 4, 3, 3, 5, 4, 5, 3, 3, 5, 4, 4, 4, 4, …
$ commercial <dbl+lbl> 4, 2, 4, 3, 3, 3, 4, 4, 3, 3, 4, 5, 3, 3, 5, 4, 4, 4, 2, 4, 4, 5, 4, 5, 3, 4, NA, 3, 4, 4, 4, 4, 2, 3, 3, 3, …
$ online <dbl+lbl> 2, 4, 3, 3, 3, 4, 3, 5, 2, 4, 3, 4, 3, 2, 4, 3, 2, 4, 5, 3, 5, 2, 3, 3, 2, 4, 2, 4, 3, 4, 4, 3, 5, 2, 5, 2, 3, 3, 5, 3, 2, 1, 4, 3, 5, 3, 2, 4, …
$ sns <dbl+lbl> 2, 4, 3, 2, 2, 4, 3, 4, 3, 2, 2, 4, 2, 2, 5, 3, 2, 2, 5, 2, 2, 3, 2, 5, 2, 3, 2, 4, 4, 3, 3, 3, 3, 3, 3, 2, 3, 2, 5, 2, 3, 4, 3, 3, 4, 2, 3, 4, …
$ friends <dbl> 0, 3, 4, 5, 0, 0, 3, 5, 0, 5, 4, 3, 0, 0, 3, 4, 0, 2, 0, 2, 1, 2, 4, 20, 0, 0, 2, 5, 4, 0, 1, 3, 0, 10, 0, 0, 0, 4, 0, 5, 2, 0, 0, 2, 0, 1, 0, 2, 4,…
$ polls <dbl+lbl> 3, 3, 5, 2, 2, 5, 4, 5, 4, 3, 4, 5, 2, 4, 2, 5, 3, 5, 5, 4, 4, 3, 5, 2, 3, 4, 4, 4, 3, 5, 3, 2, 4, 3, 4, 3, 4, 4, 5, 3, 2, 4, 2, 4, 3, 4, 5, 4, …
$ satisfaction <dbl+lbl> 3, 3, 4, 4, 4, 3, 4, 2, 2, 4, 2, 4, 3, 4, 4, 2, 4, 4, 2, 4, 4, 4, 2, 5, 4, 4, 2, 4, 3, 3, 3, 4, 4, 3, 3, 3, 4, 2, 1, 2, 2, 3, 4, 2, 3, 2, 4, 4, …
$ gender <dbl+lbl> 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ land <dbl+lbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, …
We add an ID variable, discover that ‘99’ is used as a missing-value code in the variable “friends” and recode it to NA. We then check whether it worked (yes, the 99s are gone).
mydata <- mydata %>%
mutate(ID = row_number()) %>%
relocate(ID)
select(mydata, friends) %>%
arrange(desc(friends)) %>%
print(n = 185)
mydata$friends <- na_if(mydata$friends, 99)
select(mydata, friends) %>%
arrange(desc(friends)) %>%
print(n = 185)
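Instead of printing all 185 rows, the recoding can also be checked by counting. The following is a minimal sketch on a made-up toy tibble (the values in `toy` are hypothetical, not the workshop data); it assumes only that dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data standing in for the "friends" column
toy <- tibble(friends = c(0, 3, 99, 5, NA, 20))

# How often does the missing-value code 99 occur before recoding?
n_before <- sum(toy$friends == 99, na.rm = TRUE)

# Replace 99 with NA, as in the workshop code
toy <- toy %>%
  mutate(friends = na_if(friends, 99))

# After recoding, no 99 should be left and the NA count should have grown
n_after   <- sum(toy$friends == 99, na.rm = TRUE)
n_missing <- sum(is.na(toy$friends))

c(before = n_before, after = n_after, missing = n_missing)
```

Comparing the counts before and after gives the same reassurance as scanning the sorted column, and it scales to larger datasets.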
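We recode the numerical variable “friends” into a categorical variable (high, medium, low) and change the order of the variables so that the old and the new friends variables are next to each other. Rows with missing values on “friends” are dropped first, so that they do not end up in the ‘medium’ category.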
mydata <- mydata %>%
filter(!is.na(friends)) %>%
mutate(frcat = case_when(friends == 0 ~ 'low',
friends > 9 ~ 'high',
TRUE ~ 'medium')) %>%
print(n = 185)
mydata <- mydata %>%
relocate(frcat, .after = friends)
print(mydata, n = 185)
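If the rows with missing “friends” should be kept rather than filtered out, the recoding needs an explicit branch for NA, because the catch-all `TRUE ~ 'medium'` would otherwise label them ‘medium’. A minimal sketch on hypothetical toy data (the tibble `toy` and its values are made up for illustration), which also stores the result as an ordered factor:

```r
library(dplyr)

# Hypothetical toy data, including a missing value
toy <- tibble(friends = c(0, 2, 10, NA, 9, 25))

toy <- toy %>%
  mutate(
    frcat = case_when(
      is.na(friends) ~ NA_character_,  # keep missing values missing
      friends == 0   ~ "low",
      friends > 9    ~ "high",
      TRUE           ~ "medium"
    ),
    # An ordered factor keeps the categories in a meaningful order
    frcat = factor(frcat, levels = c("low", "medium", "high"), ordered = TRUE)
  )

toy
```

The cut-offs (0 and 9) are the ones used in the workshop code; whether they are sensible for the real data is a substantive decision.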
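We calculate the mean, standard deviation and median of the number of friends and of the perceived influence of the press and of social network sites, first for the whole sample and then separately for two groups (“land” is the grouping variable). The last call shows that colMeans gives the same mean for “friends”.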
mydata %>%
summarise_at(c("friends", "press", "sns"), list(mean, sd, median), na.rm = T)
mydata %>%
group_by(land) %>%
summarise_at(c("friends", "press", "sns"), list(mean = mean, sd = sd, median = median), na.rm = T)
mydata %>%
select(friends) %>%
colMeans(na.rm = T)
friends
4.232558
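In current dplyr versions, `summarise_at()` is superseded by `across()`. The grouped summary could be sketched as follows; the toy tibble is hypothetical and only two statistics are shown to keep the output small.

```r
library(dplyr)

# Hypothetical toy data with a grouping variable
toy <- tibble(
  land    = c(1, 1, 2, 2),
  friends = c(0, 4, 2, NA),
  press   = c(3, 5, 4, 2)
)

res <- toy %>%
  group_by(land) %>%
  summarise(across(
    c(friends, press),
    list(mean = ~ mean(.x, na.rm = TRUE),
         sd   = ~ sd(.x, na.rm = TRUE)),
    .names = "{.col}_{.fn}"   # e.g. friends_mean, press_sd
  ))

res
```

Naming the functions in the list is what produces readable column names; an unnamed list yields numbered suffixes instead.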
We compute, for each respondent, the mean across the six variables measuring the perceived influence of different media types, and then sort the respondents by this mean.
mydata <- mydata %>%
rowwise() %>%
mutate(suminf_mean = mean(c(press, tabloid, public, commercial, online, sns), na.rm = T)) %>%
ungroup()
mydata
mydata %>%
arrange(desc(suminf_mean))
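An alternative to `rowwise()` is base R's `rowMeans()`, which is usually faster on larger datasets. A minimal sketch with hypothetical toy data and only three of the items:

```r
# Hypothetical toy data with three of the influence items
toy <- data.frame(
  press   = c(3, 5, NA),
  tabloid = c(4, NA, NA),
  public  = c(5, 3, NA)
)

items <- c("press", "tabloid", "public")

# Row-wise mean over the selected columns, ignoring missing values
toy$suminf_mean <- rowMeans(toy[items], na.rm = TRUE)

toy
```

Note that a respondent with missing values on all items gets NaN rather than NA; `mean(..., na.rm = TRUE)` inside `rowwise()` behaves the same way.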
We compute Spearman correlations among the perceived influences (plus polls and friends) and test them for significance.
mydata %>%
select(tabloid, sns) %>%
cor(method = 'spearman', use = 'complete.obs')
tabloid sns
tabloid 1.0000000 0.2813735
sns 0.2813735 1.0000000
cordata <- mydata%>%
select(tabloid, sns, online, polls, friends)
library(psych)
test<- corr.test(cordata, method = 'spearman')
test
Call:corr.test(x = cordata, method = "spearman")
Correlation matrix
        tabloid  sns online polls friends
tabloid    1.00 0.28   0.13  0.04    0.16
sns        0.28 1.00   0.46  0.11    0.00
online     0.13 0.46   1.00  0.24   -0.11
polls      0.04 0.11   0.24  1.00   -0.14
friends    0.16 0.00  -0.11 -0.14    1.00
Sample Size
        tabloid sns online polls friends
tabloid     170 104    168   169     170
sns         104 105    103   104     105
online      168 103    170   169     170
polls       169 104    169   171     171
friends     170 105    170   171     172
Probability values (Entries above the diagonal are adjusted for multiple tests.)
        tabloid  sns online polls friends
tabloid    0.00 0.03   0.53  1.00    0.27
sns        0.00 0.00   0.00  0.76    1.00
online     0.11 0.00   0.00  0.02    0.60
polls      0.58 0.25   0.00  0.00    0.42
friends    0.04 0.98   0.15  0.07    0.00
To see confidence intervals of the correlations, print with the short=FALSE option
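For a single pair of variables, the same kind of test is available in base R via `cor.test()`, and `p.adjust()` shows what an adjustment for multiple tests does to a set of p-values. The sketch below uses made-up vectors; the Holm method is used because, as far as we know, it is the default adjustment in `corr.test()`, which should be checked against the psych documentation.

```r
# Hypothetical toy vectors (no ties)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2, 1, 4, 3, 6, 5, 8, 7)

# Spearman rank correlation with a significance test
res <- cor.test(x, y, method = "spearman", exact = FALSE)
res$estimate   # rho
res$p.value    # unadjusted p-value

# Holm adjustment of three hypothetical raw p-values
raw_p  <- c(0.01, 0.04, 0.03)
holm_p <- p.adjust(raw_p, method = "holm")
holm_p
```

The adjusted values are never smaller than the raw ones, which is why the entries above the diagonal in the output are larger than their counterparts below it.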
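We do a t-test with the summarized mean for perceived influence as the dependent variable and a two-level factor as the independent or grouping variable. Both country (land) and gender are converted to factors; the call below groups by gender. To compare the two countries instead, use suminf_mean ~ land in the formula.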
mydata$land <- as.factor(mydata$land)
mydata$gender <- as.factor(mydata$gender)
t_test(x = mydata,
formula = suminf_mean ~ gender,
alternative = "two-sided")
Warning: The statistic is based on a difference or ratio; by default, for difference-based statistics, the explanatory variable is subtracted in the order "1" - "2", or divided in the order "1" / "2" for ratio-based statistics. To specify this order yourself, supply `order = c("1", "2")`.
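The same comparison can be run without the infer package using base R's `t.test()`, which also performs a Welch test (unequal variances) by default. A minimal sketch on simulated data; the group sizes, means and standard deviations are arbitrary assumptions, not estimates from the survey.

```r
set.seed(1)

# Simulated toy data: two groups with slightly different means
toy <- data.frame(
  gender      = factor(rep(c("1", "2"), times = c(12, 8))),
  suminf_mean = c(rnorm(12, mean = 3.4, sd = 0.6),
                  rnorm(8,  mean = 3.6, sd = 0.6))
)

# Welch two-sample t-test, two-sided by default
res <- t.test(suminf_mean ~ gender, data = toy)

res$estimate   # group means
res$statistic  # t value
res$p.value
res$conf.int   # confidence interval for the difference in means
```

With the formula interface, the difference is taken as first factor level minus second, which corresponds to the order "1" - "2" mentioned in the warning.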
We calculate the reliability (Cronbach’s alpha) for a battery of five items. As we can see, it is low (standardized alpha = 0.568).
sel <- mydata %>%
select(press, tabloid, public, commercial, online)
cronbach.alpha(sel, CI = TRUE, standardized = TRUE, na.rm = T)
Standardized Cronbach's alpha for the 'sel' data-set
Items: 5
Sample units: 172
alpha: 0.568
Bootstrap 95% CI based on 1000 samples
2.5% 97.5%
0.452 0.656
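To see where such a value comes from, the standardized alpha can be written as a function of the number of items k and the mean inter-item correlation. The helper functions below are hypothetical illustrations, not part of ltm, and the handling of missing values (pairwise) may differ from what `cronbach.alpha()` does.

```r
# Standardized alpha from the mean inter-item correlation:
# alpha = k * r_bar / (1 + (k - 1) * r_bar)
std_alpha <- function(items) {
  k     <- ncol(items)
  R     <- cor(items, use = "pairwise.complete.obs")
  r_bar <- mean(R[lower.tri(R)])   # mean of the off-diagonal correlations
  k * r_bar / (1 + (k - 1) * r_bar)
}

# Inverse: which mean correlation corresponds to a given alpha?
mean_r_from_alpha <- function(alpha, k) {
  alpha / (k - (k - 1) * alpha)
}

# Hypothetical toy items that are perfectly correlated
toy <- data.frame(a = c(1, 2, 3, 4, 5),
                  b = c(2, 4, 6, 8, 10),
                  c = c(2, 3, 4, 5, 6))
std_alpha(toy)

# Mean inter-item correlation implied by alpha = 0.568 with 5 items
mean_r_from_alpha(0.568, 5)
```

Read this way, an alpha of 0.568 for five items corresponds to a mean inter-item correlation of only about 0.21, which explains why the scale looks weak.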
We regress (linear multiple regression) satisfaction with democracy on the perceived influence of different media and opinion polls.
lmfit <- lm(satisfaction ~ press + tabloid + public + commercial + polls, data = mydata)
summary(lmfit)
Call:
lm(formula = satisfaction ~ press + tabloid + public + commercial +
polls, data = mydata)
Residuals:
     Min       1Q   Median       3Q      Max
-2.44988 -0.86638 -0.01921  0.78551  2.17586
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.80360    0.51522   7.382 8.03e-12 ***
press       -0.15775    0.08296  -1.902   0.0590 .
tabloid      0.05159    0.06655   0.775   0.4393
public       0.01593    0.10083   0.158   0.8747
commercial   0.03107    0.07140   0.435   0.6641
polls       -0.14537    0.07709  -1.886   0.0611 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.9981 on 160 degrees of freedom
(6 observations deleted due to missingness)
Multiple R-squared: 0.05016, Adjusted R-squared: 0.02048
F-statistic: 1.69 on 5 and 160 DF, p-value: 0.1399
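Beyond `summary()`, confidence intervals for the coefficients and the R-squared can be pulled out of a fitted model directly. The sketch below fits a smaller model on simulated data; the coefficients used to generate `satisfaction` are arbitrary assumptions chosen for illustration, not results from the survey.

```r
set.seed(42)
n <- 100

# Simulated toy data on 1-5 scales
toy <- data.frame(
  press = sample(1:5, n, replace = TRUE),
  polls = sample(1:5, n, replace = TRUE)
)
toy$satisfaction <- 3.8 - 0.15 * toy$press - 0.15 * toy$polls + rnorm(n)

fit <- lm(satisfaction ~ press + polls, data = toy)

# 95% confidence intervals for all coefficients
ci <- confint(fit, level = 0.95)
ci

# Share of explained variance
r2 <- summary(fit)$r.squared
r2
```

Applied to `lmfit`, `confint(lmfit)` would show directly whether the intervals for press and polls include zero, which is what the p-values just above 0.05 suggest.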
To be continued…