2. How many of them responded “yes” in the variable “c1.3”?
Answer: 45
Lyra %>%
  group_by(c1.3) %>%
  summarise(Frequency = n()) %>%
  mutate(Percentage = round(Frequency / sum(Frequency) * 100, 2))
# A tibble: 2 × 3
  c1.3  Frequency Percentage
  <chr>     <int>      <dbl>
1 no           55         55
2 yes          45         45
3. Provide the frequency and percentage of the categories in the
variable “NewYears.”
Lyra1 %>%
  group_by(NewYears = case_when(
    NewYears <= 12 ~ '12 years and below',
    NewYears >= 13 & NewYears <= 15 ~ '13 to 15 years',
    NewYears >= 16 ~ '16 years and above'
  )) %>%
  summarise(Frequency = n()) %>%
  mutate(Percentage = round(Frequency / sum(Frequency) * 100, 2))
# A tibble: 3 × 3
  NewYears           Frequency Percentage
  <chr>                  <int>      <dbl>
1 12 years and below        44         44
2 13 to 15 years            29         29
3 16 years and above        27         27
4. Provide the mean and standard deviation of the variable “Score”
when the observations are grouped according to the categories of the
variable “NewYears.”
Lyra1 %>%
  group_by(NewYears = case_when(
    NewYears <= 12 ~ '12 years and below',
    NewYears >= 13 & NewYears <= 15 ~ '13 to 15 years',
    NewYears >= 16 ~ '16 years and above'
  )) %>%
  summarize(
    Mean_Score = mean(Score),
    SD_Score = sd(Score)
  )
# A tibble: 3 × 3
  NewYears           Mean_Score SD_Score
  <chr>                   <dbl>    <dbl>
1 12 years and below       43      0.889
2 13 to 15 years           42.7    1.03
3 16 years and above       42.9    1.17
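Questions 3 and 4 apply the same case_when() binning twice. As a minimal sketch, the bins could be stored once as a column and reused for both tables. The toy data frame and the column name YearsGroup are hypothetical; the real Lyra1 data are not reproduced here.

```r
library(dplyr)

# Hypothetical toy data standing in for Lyra1 (not the real data set).
toy <- tibble(
  NewYears = c(10, 12, 13, 15, 16, 18),
  Score    = c(43, 44, 42, 43, 41, 44)
)

# Bin NewYears once; YearsGroup is an illustrative column name.
binned <- toy %>%
  mutate(YearsGroup = case_when(
    NewYears <= 12 ~ "12 years and below",
    NewYears >= 13 & NewYears <= 15 ~ "13 to 15 years",
    NewYears >= 16 ~ "16 years and above"
  ))

# Frequency and percentage per bin (as in question 3).
freq <- binned %>%
  count(YearsGroup, name = "Frequency") %>%
  mutate(Percentage = round(Frequency / sum(Frequency) * 100, 2))

# Mean and standard deviation of Score per bin (as in question 4).
stats <- binned %>%
  group_by(YearsGroup) %>%
  summarise(Mean_Score = mean(Score), SD_Score = sd(Score))

print(freq)
print(stats)
```

Storing the bins in a column keeps the cut points in one place, so the two tables cannot drift apart if the categories change.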
5. What is the strength of association between the variables “Score”
and “ScoreB”?
Answer: Correlation: 0.0549 (Pearson's r, the default method of cor()), a very weak positive association.
correlation <- cor(Lyra$Score, Lyra$ScoreB)
print(correlation)
[1] 0.05485635
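cor() reports only the coefficient. A minimal sketch of also testing whether the correlation differs from zero with cor.test() follows; the simulated vectors score and score_b are hypothetical stand-ins for Lyra$Score and Lyra$ScoreB, so the printed numbers will not match the result above.

```r
# Hypothetical simulated vectors standing in for Lyra$Score and Lyra$ScoreB.
set.seed(1)
score   <- rnorm(100, mean = 43, sd = 1)
score_b <- rnorm(100, mean = 2.3, sd = 0.5)

# cor() returns the coefficient only (Pearson by default).
r <- cor(score, score_b)

# cor.test() adds a test of H0: true correlation = 0, plus a confidence interval.
result <- cor.test(score, score_b)

print(r)
print(result$p.value)
print(result$conf.int)
```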
6. For the “Gender” variable, classify the values as boy, girl, and
others. Provide the frequency distribution table, that is, a table
showing the frequency and percentage of each category of the variable
“Gender.”
Lyra1 %>%
  group_by(Gender = case_when(
    Gender == "boy" ~ "Boy",
    Gender == "girl" ~ "Girl",
    TRUE ~ "Others"
  )) %>%
  summarise(Frequency = n()) %>%
  mutate(Percentage = round(Frequency / sum(Frequency) * 100, 2))
# A tibble: 3 × 3
  Gender Frequency Percentage
  <chr>      <int>      <dbl>
1 Boy           35         35
2 Girl          36         36
3 Others        29         29
7. Reproduce the output given below.
# Summary statistics
# get_summary_stats() comes from the rstatix package
Lyra2 %>%
  group_by(ScoreABC) %>%
  get_summary_stats(Scorevalues, type = "mean_sd")
# A tibble: 3 × 5
  ScoreABC variable        n  mean    sd
  <fct>    <fct>       <dbl> <dbl> <dbl>
1 ScoreA   Scorevalues   100  2.55 0.446
2 ScoreB   Scorevalues   100  2.31 0.472
3 ScoreC   Scorevalues   100  2.36 0.479
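For comparison, the same n, mean, and sd columns can be computed with plain dplyr verbs. This is a sketch on a hypothetical long-format toy table (toy_long) that only imitates the shape of Lyra2; it does not produce the extra "variable" column that get_summary_stats() adds.

```r
library(dplyr)

# Hypothetical long-format toy data shaped like Lyra2 (not the real data).
toy_long <- tibble(
  ScoreABC    = factor(rep(c("ScoreA", "ScoreB", "ScoreC"), each = 4)),
  Scorevalues = c(2, 3, 2, 3,  1, 2, 3, 2,  2, 2, 3, 3)
)

# Count, mean, and standard deviation per score type.
summary_tbl <- toy_long %>%
  group_by(ScoreABC) %>%
  summarise(
    n    = n(),
    mean = mean(Scorevalues),
    sd   = sd(Scorevalues)
  )

print(summary_tbl)
```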
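8. Is there a significant difference in the “ScoreA” values between
M and F in the variable “Sex” using the following methods?
Answer: No. At the 0.05 level, neither test shows a significant difference (Welch t-test: p = 0.536; Mann-Whitney U test: p = 0.4584).
t-test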
t.test<-t.test(Lyra3$ScoreA ~ Lyra3$Sex, alternative = "two.sided")
mwu.test<-wilcox.test(ScoreA ~ Sex, data=Lyra3)
print(t.test)
Welch Two Sample t-test
data: Lyra3$ScoreA by Lyra3$Sex
t = 0.62143, df = 86.183, p-value = 0.536
alternative hypothesis: true difference in means between group F and group M is not equal to 0
95 percent confidence interval:
-0.1246377 0.2380032
sample estimates:
mean in group F mean in group M
       2.574074        2.517391
Mann-Whitney U Test
print(mwu.test)
Wilcoxon rank sum test with continuity correction
data: ScoreA by Sex
W = 1346, p-value = 0.4584
alternative hypothesis: true location shift is not equal to 0
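9. Is there a significant difference among the categories (boy, girl,
and others) in the variable “ScoreC” using the F-test?
Answer: Yes. At the 0.05 level, the F-test shows a significant difference among the categories (F = 3.823, df = 2 and 97, p = 0.0252).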
summary(model1)
            Df Sum Sq Mean Sq F value Pr(>F)
Gender       2  1.662  0.8309   3.823 0.0252 *
Residuals   97 21.085  0.2174
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
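The call that created model1 is not shown. The layout of the table matches what summary() prints for a one-way aov() fit, so the sketch below assumes that form. The toy data frame and its values are hypothetical, and the TukeyHSD() follow-up is an optional extra step, not part of the original answer.

```r
# Hypothetical toy data; Gender and ScoreC mirror the column names in the question.
set.seed(2)
toy <- data.frame(
  Gender = factor(rep(c("Boy", "Girl", "Others"), each = 20)),
  ScoreC = c(rnorm(20, 2.2, 0.45), rnorm(20, 2.5, 0.45), rnorm(20, 2.3, 0.45))
)

# Assumed form of the model: one-way ANOVA of ScoreC across Gender categories.
model1 <- aov(ScoreC ~ Gender, data = toy)

# summary() returns a list; its first element is the ANOVA table.
anova_table <- summary(model1)[[1]]
print(anova_table)

# If the F-test is significant, Tukey's HSD indicates which pairs of groups differ.
posthoc <- TukeyHSD(model1)
print(posthoc)
```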
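10. Considering the variables “ScoreA”, “ScoreB”, and “ScoreC” as
independent variables, which of these significantly predicts the
variable “Score”?
Answer: Only ScoreA is a significant predictor at the 0.05 level (estimate = -0.66041, p = 0.00339); ScoreB (p = 0.85229) and ScoreC (p = 0.12253) are not.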
summary(model)
Call:
lm(formula = Score ~ ScoreA + ScoreB + ScoreC, data = Lyra)
Residuals:
     Min       1Q   Median       3Q      Max
-2.29078 -0.58529  0.07604  0.72523  2.27412
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 43.78202    0.75029  58.353  < 2e-16 ***
ScoreA      -0.66041    0.21978  -3.005  0.00339 **
ScoreB      -0.04561    0.24431  -0.187  0.85229
ScoreC       0.37603    0.24136   1.558  0.12253
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.969 on 96 degrees of freedom
Multiple R-squared: 0.1037, Adjusted R-squared: 0.07567
F-statistic: 3.702 on 3 and 96 DF, p-value: 0.01435
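The significant predictors can also be read off programmatically from the coefficient table. A minimal sketch follows, using the same formula as the Call line above but fitted to hypothetical simulated data, so its estimates will differ from the output shown. The 0.05 cutoff (alpha) is an assumed convention.

```r
# Hypothetical simulated data with the same column names as in the model formula.
set.seed(3)
n <- 100
toy <- data.frame(
  ScoreA = rnorm(n, 2.5, 0.45),
  ScoreB = rnorm(n, 2.3, 0.47),
  ScoreC = rnorm(n, 2.4, 0.48)
)
toy$Score <- 44 - 0.7 * toy$ScoreA + rnorm(n, sd = 1)

# Same formula as in the Call line of the summary output.
model <- lm(Score ~ ScoreA + ScoreB + ScoreC, data = toy)

# Coefficient table as a matrix: Estimate, Std. Error, t value, Pr(>|t|).
coefs <- coef(summary(model))

# Keep the predictors (intercept dropped) whose p-value is below alpha.
alpha <- 0.05
p_values <- coefs[-1, "Pr(>|t|)"]
significant <- names(p_values)[p_values < alpha]
print(significant)
```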