2. How many of them responded “yes” in the variable “c1.3”?
Answer: 45
library(dplyr)  # provides %>%, group_by(), summarise(), n(), and mutate()
Lyra %>%
  group_by(c1.3) %>%
  summarise(Frequency = n()) %>%
  mutate(Percentage = round(Frequency / sum(Frequency) * 100, 2))
# A tibble: 2 × 3
  c1.3  Frequency Percentage
  <chr>     <int>      <dbl>
1 no           55         55
2 yes          45         45
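As a cross-check on the dplyr pipeline above, the same frequency-and-percentage table can be built in base R. This is a minimal sketch on a small hypothetical vector ("answers" stands in for Lyra$c1.3; its values are invented): table() counts each category and prop.table() turns the counts into proportions.

```r
# Hypothetical responses standing in for Lyra$c1.3 (invented values)
answers <- c("yes", "no", "no", "yes", "no")

freq <- table(answers)                      # counts per category (sorted alphabetically)
pct  <- round(prop.table(freq) * 100, 2)    # percentages rounded to 2 decimals

freq_table <- data.frame(
  category   = names(freq),
  Frequency  = as.integer(freq),
  Percentage = as.numeric(pct),
  stringsAsFactors = FALSE
)
print(freq_table)
```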
3. Provide the frequency and percentage of the categories in the
variable “NewYears.”
Lyra1 %>%
  group_by(NewYears = case_when(
    NewYears <= 12 ~ "12 years and below",
    NewYears >= 13 & NewYears <= 15 ~ "13 to 15 years",
    NewYears >= 16 ~ "16 years and above"
  )) %>%
  summarise(Frequency = n()) %>%
  mutate(Percentage = round(Frequency / sum(Frequency) * 100, 2))
# A tibble: 3 × 3
  NewYears           Frequency Percentage
  <chr>                  <int>      <dbl>
1 12 years and below        44         44
2 13 to 15 years            29         29
3 16 years and above        27         27
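The case_when() binning above can be cross-checked with base R's cut(). This sketch assumes NewYears holds whole numbers: with right-closed intervals, breaks at 12 and 15 reproduce the three categories. For a non-integer value such as 12.5, no case_when() branch matches (giving NA), while cut() would place it in the middle bin, so the two approaches agree only for integer data. The "years" vector is hypothetical.

```r
# Hypothetical values standing in for Lyra1$NewYears (whole numbers assumed)
years <- c(10, 12, 13, 15, 16, 18)

groups <- cut(
  years,
  breaks = c(-Inf, 12, 15, Inf),   # intervals (-Inf,12], (12,15], (15,Inf]
  labels = c("12 years and below", "13 to 15 years", "16 years and above"),
  right  = TRUE                    # each interval includes its upper break
)

print(table(groups))
```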
4. Provide the mean and standard deviation of the variable “Score”
when grouped according to the categories of the variable
“NewYears.”
Lyra1 %>%
  group_by(NewYears = case_when(
    NewYears <= 12 ~ "12 years and below",
    NewYears >= 13 & NewYears <= 15 ~ "13 to 15 years",
    NewYears >= 16 ~ "16 years and above"
  )) %>%
  summarize(
    Mean_Score = mean(Score),
    SD_Score = sd(Score)
  )
# A tibble: 3 × 3
  NewYears           Mean_Score SD_Score
  <chr>                   <dbl>    <dbl>
1 12 years and below       43      0.889
2 13 to 15 years           42.7    1.03
3 16 years and above       42.9    1.17
5. What is the strength of association between the variables “Score”
and “ScoreB”?
Answer: Pearson correlation coefficient: 0.0549
A correlation coefficient of 0.0549 (0.05485635 unrounded) indicates a
negligible positive linear relationship between “Score” and “ScoreB.”
Because the value is so close to 0, knowing one score gives almost no
information about the other in linear terms.
correlation <- cor(Lyra$Score, Lyra$ScoreB)
print(correlation)
[1] 0.05485635
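To make the coefficient concrete: Pearson's r is the covariance of the two variables divided by the product of their standard deviations. The sketch below computes it by hand on two small hypothetical vectors and checks the result against cor(). Squaring r gives the share of variance the two variables have in common, which for the reported r = 0.0549 is only about 0.3%.

```r
# Hypothetical vectors standing in for Lyra$Score and Lyra$ScoreB
x <- c(42, 44, 43, 41, 45)
y <- c(2.1, 2.5, 2.2, 2.6, 2.4)

# r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)), using deviations from the means
dx <- x - mean(x)
dy <- y - mean(y)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

r_builtin <- cor(x, y)   # Pearson is cor()'s default method
print(c(manual = r_manual, builtin = r_builtin))

# Shared variance (r squared) for the coefficient reported above
print(0.05485635^2)
```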
6. For the “Gender” variable, classify the values as boy, girl, and
others. Provide the frequency distribution table, that is, the
frequency and percentage of each category of the variable
“Gender.”
Lyra1 %>%
  group_by(Gender = case_when(
    Gender == "boy" ~ "Boy",
    Gender == "girl" ~ "Girl",
    TRUE ~ "Others"
  )) %>%
  summarise(Frequency = n()) %>%
  mutate(Percentage = round(Frequency / sum(Frequency) * 100, 2))
# A tibble: 3 × 3
  Gender Frequency Percentage
  <chr>      <int>      <dbl>
1 Boy           35         35
2 Girl          36         36
3 Others        29         29
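Because the case_when() conditions above use exact, case-sensitive comparisons, any value other than the lowercase strings "boy" and "girl" (including a capitalised "Boy" or a missing value) falls through to the TRUE ~ "Others" branch. The base-R sketch below, on hypothetical values, mimics that rule with %in%, which, like case_when(), treats NA as a non-match. Checking the raw categories first with table(Gender, useNA = "ifany") helps confirm that "Others" contains only what is intended.

```r
# Hypothetical raw values standing in for Lyra1$Gender
gender <- c("boy", "girl", "Boy", "non-binary", NA)

# Same rule as the case_when() above: exact matches only, everything else -> "Others"
recoded <- ifelse(gender %in% "boy", "Boy",
           ifelse(gender %in% "girl", "Girl", "Others"))

print(data.frame(gender, recoded))
```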
7. Provide the same output as given below.
# Summary statistics (get_summary_stats() comes from the rstatix package)
library(rstatix)
Lyra2 %>%
  group_by(ScoreABC) %>%
  get_summary_stats(Scorevalues, type = "mean_sd")
# A tibble: 3 × 5
  ScoreABC variable        n  mean    sd
  <fct>    <fct>       <dbl> <dbl> <dbl>
1 ScoreA   Scorevalues   100  2.55 0.446
2 ScoreB   Scorevalues   100  2.31 0.472
3 ScoreC   Scorevalues   100  2.36 0.479
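The summary above relies on Lyra2 being in long format, with one column (ScoreABC) naming the original score variable and another (Scorevalues) holding its values; the reshaping step itself does not appear here. As a rough base-R sketch of that idea, assuming three wide columns ScoreA, ScoreB, and ScoreC in a small hypothetical data frame, stack() produces the long layout and aggregate() returns n, mean, and sd per group.

```r
# Hypothetical wide data with the three score columns (invented values)
wide <- data.frame(
  ScoreA = c(2.5, 3.0, 2.0),
  ScoreB = c(2.0, 2.5, 2.5),
  ScoreC = c(3.0, 2.0, 2.5)
)

# stack() returns columns 'values' and 'ind' (source column name as a factor)
long <- stack(wide)
names(long) <- c("Scorevalues", "ScoreABC")

# One row per group; the Scorevalues column becomes a matrix with n, mean, sd
summary_stats <- aggregate(
  Scorevalues ~ ScoreABC, data = long,
  FUN = function(v) c(n = length(v), mean = mean(v), sd = sd(v))
)
print(summary_stats)
```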
8. Is there a significant difference in “ScoreA” values between M and
F in the variable “Sex,” using the following methods:
t-test
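# Welch two-sample t-test (the t.test() default); the result is stored as
# t_result so that it does not share a name with the t.test() function
t_result <- t.test(Lyra3$ScoreA ~ Lyra3$Sex, alternative = "two.sided")
print(t_result)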
Welch Two Sample t-test
data: Lyra3$ScoreA by Lyra3$Sex
t = 0.62143, df = 86.183, p-value = 0.536
alternative hypothesis: true difference in means between group F and group M is not equal to 0
95 percent confidence interval:
-0.1246377 0.2380032
sample estimates:
mean in group F mean in group M
2.574074 2.517391
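The Welch t-test (t = 0.62, df = 86.18, p = 0.536) suggests no
significant difference in “ScoreA” between the two “Sex” groups (M and
F). Because the p-value is higher than 0.05, there is not enough
evidence to reject the null hypothesis that the two group means are
equal. Consistent with this, the 95 percent confidence interval for the
difference (-0.125 to 0.238) includes 0.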
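Welch's test, which t.test() uses by default (var.equal = FALSE), does not assume equal variances in the two groups. Its statistic divides the difference in means by sqrt(s1^2/n1 + s2^2/n2), and its degrees of freedom come from the Welch-Satterthwaite approximation, which is why the output above reports a non-integer df (86.183). The sketch below reproduces both quantities on two small hypothetical samples and checks them against t.test().

```r
# Hypothetical ScoreA values for two groups (not the Lyra3 data)
f <- c(2.6, 2.9, 2.4, 2.8, 2.5, 2.7)
m <- c(2.3, 2.8, 2.2, 2.6, 2.4)

v1 <- var(f) / length(f)   # squared standard error of the mean, group F
v2 <- var(m) / length(m)   # squared standard error of the mean, group M

t_manual  <- (mean(f) - mean(m)) / sqrt(v1 + v2)
# Welch-Satterthwaite degrees of freedom
df_manual <- (v1 + v2)^2 / (v1^2 / (length(f) - 1) + v2^2 / (length(m) - 1))
p_manual  <- 2 * pt(-abs(t_manual), df_manual)   # two-sided p-value

res <- t.test(f, m)        # Welch two-sample t-test by default
print(c(t = t_manual, df = df_manual, p = p_manual))
print(c(res$statistic, res$parameter, p = res$p.value))
```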
Mann-Whitney U Test
mwu.test <- wilcox.test(ScoreA ~ Sex, data = Lyra3)
print(mwu.test)
Wilcoxon rank sum test with continuity correction
data: ScoreA by Sex
W = 1346, p-value = 0.4584
alternative hypothesis: true location shift is not equal to 0
With a p-value of 0.4584, which is higher than 0.05, there is not
enough evidence to conclude that “ScoreA” values differ between the two
“Sex” groups (M and F). This nonparametric test therefore agrees with
the t-test: the difference between the groups is not statistically
significant.
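The W statistic reported by wilcox.test() is the rank sum of the first group minus its smallest possible value, n1(n1 + 1)/2, with ranks taken over both groups combined and tied values given average ranks. With the formula interface above, the first group is the first factor level of “Sex” (F). The sketch below computes W by hand on hypothetical data containing ties and checks it against wilcox.test(); exact = FALSE selects the normal approximation with continuity correction, as in the output above.

```r
# Hypothetical ScoreA values (not the Lyra3 data); the ties are deliberate
f <- c(2.6, 2.9, 2.4, 2.8, 2.5)
m <- c(2.3, 2.8, 2.2, 2.6, 2.4, 2.1)

r  <- rank(c(f, m))        # average ranks over the pooled sample
n1 <- length(f)
W_manual <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2

res <- wilcox.test(f, m, exact = FALSE)   # normal approximation, continuity correction
print(c(manual = W_manual, builtin = unname(res$statistic)))
```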
9. Is there a significant difference in “ScoreC” among the categories
(Boy, Girl, and Others) using an F-test?
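# model1 is assumed to be a one-way ANOVA fit of ScoreC on the recoded
# Gender factor (Boy, Girl, Others), created with aov()
summary(model1)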
            Df Sum Sq Mean Sq F value Pr(>F)
Gender       2  1.662  0.8309   3.823 0.0252 *
Residuals   97 21.085  0.2174
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Because the p-value (0.0252) is less than 0.05, there is evidence of a
significant difference in “ScoreC” among the categories (Boy, Girl,
and Others). The F-test shows that at least one group mean differs from
the others; it does not by itself show which groups differ.
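The F value in the ANOVA table is the ratio of the between-group mean square to the residual mean square, where each mean square is its sum of squares divided by its degrees of freedom. The sketch below recomputes F and its p-value from the rounded numbers printed above, so it matches only to the displayed precision.

```r
# Values copied from the ANOVA table above (rounded as printed)
ss_gender <- 1.662;  df_gender <- 2
ss_resid  <- 21.085; df_resid  <- 97

ms_gender <- ss_gender / df_gender   # about 0.831
ms_resid  <- ss_resid / df_resid     # about 0.2174
f_value   <- ms_gender / ms_resid    # about 3.823

# Upper-tail probability of the F(2, 97) distribution
p_value <- pf(f_value, df_gender, df_resid, lower.tail = FALSE)
print(c(F = f_value, p = p_value))
```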
10. Considering the variables “ScoreA,” “ScoreB,” and “ScoreC” as
independent variables, which of these significantly predicts the
variable “Score”?
model <- lm(Score ~ ScoreA + ScoreB + ScoreC, data = Lyra)
summary(model)
Call:
lm(formula = Score ~ ScoreA + ScoreB + ScoreC, data = Lyra)
Residuals:
     Min       1Q   Median       3Q      Max
-2.29078 -0.58529  0.07604  0.72523  2.27412
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 43.78202    0.75029  58.353  < 2e-16 ***
ScoreA      -0.66041    0.21978  -3.005  0.00339 **
ScoreB      -0.04561    0.24431  -0.187  0.85229
ScoreC       0.37603    0.24136   1.558  0.12253
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.969 on 96 degrees of freedom
Multiple R-squared: 0.1037, Adjusted R-squared: 0.07567
F-statistic: 3.702 on 3 and 96 DF, p-value: 0.01435
From these results, only ScoreA significantly predicts “Score”
(p = 0.00339, below 0.01). Its coefficient is negative (-0.66): holding
ScoreB and ScoreC constant, a one-unit increase in ScoreA is associated
with a decrease of about 0.66 points in “Score.” ScoreB (p = 0.852) and
ScoreC (p = 0.123) are not significant, as their p-values exceed the
usual 0.05 threshold. The model as a whole is significant (F = 3.702 on
3 and 96 DF, p = 0.01435) but explains only about 10% of the variance
in “Score” (R-squared = 0.1037).
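Each t value in the coefficient table is the estimate divided by its standard error, and its p-value comes from a t distribution with the residual degrees of freedom (96 here, that is, 100 observations minus 4 estimated coefficients). The adjusted R-squared penalises R-squared for the number of predictors. The sketch below recomputes these for ScoreA from the rounded values printed above, so the results agree only to the displayed precision.

```r
# Values copied from the regression output above (rounded as printed)
est <- -0.66041; se <- 0.21978
n <- 100; k <- 3                            # observations and predictors
df_resid <- n - k - 1                       # 96 residual degrees of freedom

t_value <- est / se                         # about -3.005
p_value <- 2 * pt(-abs(t_value), df_resid)  # two-sided p, about 0.0034

r2     <- 0.1037
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid # about 0.0757
print(c(t = t_value, p = p_value, adj_r2 = adj_r2))
```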