R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

There are many different kinds of surveys and a lot of documentation on how to conduct them, from sampling to drawing conclusions. For example, the SMART Methodology is based on the two most vital and basic public health indicators in humanitarian crises: nutrition and mortality. Additionally, many standard questions and survey templates are available on the IFRC website and within the UNHCR Assessment and Monitoring Resource Centre. There are also XLSForms that can be uploaded directly into Kobo.

This document focuses on some examples of scaling. Different types of scales are used in research, such as the Likert scale and the social distance scale. These scales are often used to measure attitudes or opinions. Another approach to measuring social constructs, like levels of racism or job satisfaction, is to use a set of questions that together cover the concept and so give a more complete picture of the construct.

Likert scale

A Likert scale is a psychometric scale commonly used in surveys to measure attitudes, opinions, or perceptions. It typically consists of a series of statements, and respondents indicate their level of agreement or disagreement on a symmetric agree-disagree scale. For example, a 5-point Likert scale might include options like: Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree.

Converting to an Ordinal Scale

A Likert scale is inherently ordinal because it represents ordered categories. However, when we talk about converting it to an ordinal scale, we often mean treating the responses as ordinal data in statistical analyses. This involves recognizing that the responses have a meaningful order but the intervals between them are not necessarily equal.
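As a minimal sketch of treating Likert responses as ordinal data in R, the example below stores them as an ordered factor and derives the codes 1 to 5 from the level order. The responses and the names `responses`, `likert_levels` and `scores` are made up for illustration and are not taken from the survey.

```r
# Hypothetical responses to one 5-point Likert item (not the survey data).
responses <- c("Agree", "Neutral", "Strongly Agree", "Disagree", "Agree", NA)

likert_levels <- c("Strongly Disagree", "Disagree", "Neutral",
                   "Agree", "Strongly Agree")

# An ordered factor keeps the ranking without assuming equal spacing.
likert <- factor(responses, levels = likert_levels, ordered = TRUE)

# Integer codes 1 to 5 follow the level order; a missing answer stays NA.
scores <- as.integer(likert)

# Order-based summaries such as the median are safe for ordinal data.
median_score <- median(scores, na.rm = TRUE)
median_label <- likert_levels[median_score]
print(median_label)
```

Keeping the ordered factor next to the integer codes makes it easy to switch between labelled tables and numeric summaries.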

Besides the Likert scale there are other scales, such as the social distance scale and the Guttman scale.

The Epidemiologist R Handbook contains an example that presents all eight questions at once (q1 to q8). The exact wording of the survey questions is not given.

Tables

The first table presents the results for question q1, and the next table shows a summary of questions 1 to 5 from the survey. There is no subdivision into the three groups shown in the first plot.

| Q1A | Frequency | Percent |
|---|---|---|
| very dissatisfied | 1 | 2.6 |
| dissatisfied | 1 | 2.6 |
| neutral | 2 | 5.3 |
| satisfied | 16 | 42.1 |
| very satisfied | 15 | 39.5 |
|  | 3 | 7.9 |
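A frequency table like the one above could be built along these lines in base R. This is only a sketch: the answers are invented, and the names `levels_sat`, `q` and `freq_table` are placeholders rather than the objects used for this report.

```r
# Hypothetical answers to one satisfaction question; NA marks a missing answer.
levels_sat <- c("very dissatisfied", "dissatisfied", "neutral",
                "satisfied", "very satisfied")
q <- factor(c("satisfied", "very satisfied", "neutral", "satisfied",
              NA, "dissatisfied", "very satisfied", "satisfied"),
            levels = levels_sat)

# useNA = "always" adds a row for the missing answers.
counts <- table(q, useNA = "always")

# Percentages are taken over all respondents, including missing answers.
freq_table <- data.frame(
  answer    = names(counts),
  frequency = as.integer(counts),
  percent   = round(100 * as.integer(counts) / length(q), 1)
)
print(freq_table)
```

Using a factor with fixed levels keeps categories with zero answers in the table, so every question gets the same rows.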

| question | very_poor | poor | neutral | good | very_good | not_applicable |
|---|---|---|---|---|---|---|
| Q1A | 3 (7.9%) | 1 (2.6%) | 1 (2.6%) | 2 (5.3%) | 16 (42.1%) | 15 (39.5%) |
| Q2A | 1 (2.6%) | 1 (2.6%) | 3 (7.9%) | 1 (2.6%) | 12 (31.6%) | 20 (52.6%) |
| Q3A | 1 (2.6%) | 0 (0%) | 4 (10.5%) | 1 (2.6%) | 17 (44.7%) | 15 (39.5%) |
| Q4A | 2 (5.3%) | 16 (42.1%) | 2 (5.3%) | 1 (2.6%) | 13 (34.2%) | 4 (10.5%) |
| Q5A | 17 (44.7%) | 1 (2.6%) | 10 (26.3%) | 0 (0%) | 8 (21.1%) | 2 (5.3%) |

Pie Charts

Other presentation

I changed the labels of questions q1 to q5 into concrete questions about ice cream flavours that I found on the internet. This section also shows another way of presenting the Likert scale.

Statistics

When a Likert scale is converted into an ordinal scale with values from 1 to 5, it is possible to determine the mean or the sum of one question or all the questions.

| Item | low | neutral | high | mean | sd |
|---|---|---|---|---|---|
| Ice cream is my favorite food | 2.9 | 0.0 | 97.1 | 4.4 | 0.6 |
| I think about eating ice cream daily | 10.8 | 0.0 | 89.2 | 4.3 | 1.0 |
| My favorite brand is Ben & Jerry's | 10.8 | 0.0 | 89.2 | 4.2 | 0.9 |
| It was a mistake to retire 'Jam Core' | 50.0 | 0.0 | 50.0 | 2.7 | 1.6 |
| Ice cream is better than sorbet | 52.4 | 0.0 | 47.6 | 3.0 | 1.2 |

Sum per question

| sum_value1 | sum_value2 | sum_value3 | sum_value4 | sum_value5 |
|---|---|---|---|---|
| 153 | 160 | 155 | 96 | 63 |
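Once the answers are coded 1 to 5, summaries like the ones above can be computed with a few base R calls. The sketch below uses invented scores. The grouping into low (1 to 2), neutral (3) and high (4 to 5) is an assumption about how such a table is usually built, and the names `df`, `pct` and `summary_table` are hypothetical.

```r
# Hypothetical 1-5 scores for three items; rows are respondents.
df <- data.frame(
  item1 = c(5, 4, 4, 5, NA),
  item2 = c(4, 5, 2, 5, 4),
  item3 = c(1, 2, 5, 4, 1)
)

# Percentage of non-missing answers that fall in a set of values.
pct <- function(x, values) round(100 * mean(x[!is.na(x)] %in% values), 1)

summary_table <- data.frame(
  low     = sapply(df, pct, values = 1:2),   # assumed grouping: 1-2
  neutral = sapply(df, pct, values = 3),     # assumed grouping: 3
  high    = sapply(df, pct, values = 4:5),   # assumed grouping: 4-5
  mean    = round(colMeans(df, na.rm = TRUE), 1),
  sd      = round(sapply(df, sd, na.rm = TRUE), 1),
  sum     = colSums(df, na.rm = TRUE)
)
print(summary_table)
```

Means and sums treat the codes as if they were equally spaced, so it can be useful to report the low, neutral and high shares next to them.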

Reliability - Cronbach’s Alpha

When you want to measure a perception or opinion, you often need multiple questions to fully capture the construct. For example, to measure “perceived stress,” you might include questions about different stressors, physical symptoms, and emotional responses. Cronbach’s alpha helps ensure that these questions together provide a reliable measure of the overall perception of stress.

Cronbach’s alpha indicates how well a set of questions (items) measures a single construct. Its value usually lies between 0 and 1, and a higher value indicates a higher degree of internal consistency. Sometimes removing a question (item) improves the alpha score, because that question does not fit well with the overall construct being measured or adds noise to the measurement. A Cronbach’s alpha above 0.7 is commonly regarded as acceptable reliability.
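To make the idea concrete, Cronbach’s alpha can be written out from its textbook formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score), where k is the number of items. The sketch below is a hand-rolled illustration on invented scores, not the psych::alpha() implementation. The function `cronbach_alpha()` is hypothetical, and it drops incomplete rows, which is an assumption that can differ from how psych handles missing values.

```r
# Textbook Cronbach's alpha for a data frame of numeric items.
cronbach_alpha <- function(items) {
  items <- na.omit(items)              # assumption: listwise deletion
  k <- ncol(items)                     # number of items
  item_var  <- sapply(items, var)      # variance of each item
  total_var <- var(rowSums(items))     # variance of the summed score
  k / (k - 1) * (1 - sum(item_var) / total_var)
}

# Hypothetical scores: a, b and c move together, d does not.
scores <- data.frame(
  a = c(4, 5, 3, 4, 2, 5),
  b = c(4, 4, 3, 5, 2, 5),
  c = c(5, 5, 2, 4, 1, 4),
  d = c(1, 5, 4, 2, 5, 3)
)

alpha_all <- cronbach_alpha(scores)

# Alpha recomputed with each item left out in turn.
alpha_if_dropped <- sapply(names(scores), function(item) {
  cronbach_alpha(scores[setdiff(names(scores), item)])
})

print(round(alpha_all, 2))
print(round(alpha_if_dropped, 2))
```

An item whose removal raises alpha noticeably is a candidate for not belonging to the construct.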

I used questions q1 to q4 and relabelled them as questions about ice cream. The reliability test is purely hypothetical, because no respondents actually answered questions about ice cream. Still, the section “Reliability if an item is dropped:” (the third table of the output) shows that alpha can be improved by deleting question q4: raw alpha rises from 0.41 to 0.70.

## 
## Reliability analysis   
## Call: psych::alpha(x = df10[c("Q1A", "Q2A", "Q3A", "Q4A")], check.keys = TRUE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N  ase mean   sd median_r
##       0.41      0.54    0.54      0.23 1.2 0.16  4.1 0.68      0.2
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.03  0.41  0.66
## Duhachek  0.09  0.41  0.73
## 
##  Reliability if an item is dropped:
##      raw_alpha std.alpha G6(smc) average_r  S/N alpha se  var.r med.r
## Q1A       0.31      0.43    0.43      0.20 0.75    0.201 0.0961 0.025
## Q2A       0.19      0.33    0.30      0.14 0.49    0.213 0.0446 0.025
## Q3A       0.19      0.32    0.29      0.14 0.48    0.217 0.0466 0.015
## Q4A-      0.70      0.70    0.62      0.44 2.38    0.077 0.0099 0.386
## 
##  Item statistics 
##       n raw.r std.r r.cor r.drop mean   sd
## Q1A  35  0.54  0.69 0.509  0.326  4.4 0.65
## Q2A  37  0.68  0.75 0.684  0.373  4.3 1.03
## Q3A  37  0.67  0.76 0.687  0.400  4.2 0.91
## Q4A- 36  0.64  0.40 0.027  0.022  3.3 1.62
## 
## Non missing response frequency for each item
##        1    2    4    5 miss
## Q1A 0.00 0.03 0.54 0.43 0.08
## Q2A 0.03 0.08 0.32 0.57 0.03
## Q3A 0.00 0.11 0.49 0.41 0.03
## Q4A 0.44 0.06 0.39 0.11 0.05

Principal component analysis

Principal component analysis (PCA) is used when several questions or variables reflect a common underlying factor, allowing them to be combined into a single variable. PCA can be employed to reduce the dimensionality of the data by collapsing different variables (or questions) into one principal component, which represents the combined effect of the original variables. This technique helps to simplify the data while retaining as much of the original information as possible.

Performing all the steps of a component analysis is beyond the scope of this document. The plot shows that question 4 has little relation to the other questions and therefore does not belong to the same construct or component. This supports the Cronbach’s alpha analysis, where removing question 4 improved the outcome.
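As a rough illustration of how an unrelated question shows up in a PCA, the sketch below simulates four items and runs prcomp() from base R. The data are simulated, not the survey data: q1 to q3 share a common factor by construction and q4 is pure noise. All object names are hypothetical.

```r
set.seed(1)                       # simulated data, reproducible
n <- 200
latent <- rnorm(n)                # shared underlying factor

items <- data.frame(
  q1 = latent + rnorm(n, sd = 0.5),
  q2 = latent + rnorm(n, sd = 0.5),
  q3 = latent + rnorm(n, sd = 0.5),
  q4 = rnorm(n)                   # unrelated to the factor by construction
)

# Standardise the items so that each contributes equally.
pca <- prcomp(items, center = TRUE, scale. = TRUE)

# Share of the total variance explained by each component.
explained <- pca$sdev^2 / sum(pca$sdev^2)

# Loadings on the first component: q4 should load much less than q1 to q3.
loadings_pc1 <- pca$rotation[, "PC1"]

print(round(explained, 2))
print(round(loadings_pc1, 2))
```

A small loading on the first component, together with a separate component dominated by one item, is the kind of pattern that suggests the item measures something else.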

Spurious relations

Both Cronbach’s alpha and principal component analysis (PCA) rely on the correlation between items. It is important to consider carefully whether a correlation is meaningful, because these methods do not protect against spurious relationships. The best safeguard is to use existing, validated questionnaires and constructs.

A well-known example of a spurious relationship is the correlation between storks and babies. In some regions, there appears to be a positive correlation between the number of storks and the number of babies born. However, this relationship is not causal. Instead, it is influenced by a third variable, such as rural areas having more storks and higher birth rates compared to urban areas. This example highlights the importance of critically evaluating correlations to avoid drawing incorrect conclusions.

Another example is the correlation between ice cream sales and the number of children drowning. During the summer months, both ice cream sales and drowning incidents tend to increase. However, this does not mean that buying ice cream causes drowning. Instead, the increase in both is due to a third variable: warmer weather, which leads to more people buying ice cream and more people swimming.
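The ice cream example can be mimicked with simulated data. In the sketch below, temperature drives both ice cream sales and drownings, and the two outcomes have no direct link. All numbers and variable names are invented for illustration. Correlating the residuals after regressing each outcome on temperature is one simple way to compute a partial correlation.

```r
set.seed(42)                                    # simulated data, reproducible
n <- 1000
temperature <- rnorm(n, mean = 20, sd = 5)      # the third variable

# Both outcomes depend on temperature only, not on each other.
ice_cream <- 10 * temperature + rnorm(n, sd = 20)
drownings <- 0.3 * temperature + rnorm(n, sd = 1)

# Raw correlation: strong, although there is no causal link.
raw_cor <- cor(ice_cream, drownings)

# Partial correlation: correlate what is left after removing temperature.
res_ice   <- resid(lm(ice_cream ~ temperature))
res_drown <- resid(lm(drownings ~ temperature))
partial_cor <- cor(res_ice, res_drown)

print(round(c(raw = raw_cor, partial = partial_cor), 2))
```

The raw correlation is clearly positive, while the partial correlation stays close to zero once temperature is accounted for.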