1. Provide the minimum, 1st quartile, median, mean, 3rd quartile and maximum of the variable “Years”.

summary(Lyra$Years)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   9.00   11.00   13.00   13.77   16.00   30.00 

2. How many respondents answered “yes” in the variable “c1.3”?

Answer: 45

Lyra%>%
  group_by(c1.3)%>%
  summarise(Frequency=n())%>%
  mutate(Percentage =round(Frequency/sum(Frequency)*100, 2))
# A tibble: 2 × 3
  c1.3  Frequency Percentage
  <chr>     <int>      <dbl>
1 no           55         55
2 yes          45         45
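
The frequency and percentage columns above can also be produced without dplyr. The following is a minimal base R sketch; the `responses` vector is a hypothetical stand-in built from the counts reported above, not the actual “c1.3” column.

```r
# Hypothetical stand-in for the c1.3 column (the real Lyra data is not shown).
responses <- c(rep("no", 55), rep("yes", 45))

freq <- table(responses)                   # count per category
pct  <- round(prop.table(freq) * 100, 2)   # share of the total, in percent

freq_table <- data.frame(
  c1.3       = names(freq),
  Frequency  = as.vector(freq),
  Percentage = as.vector(pct)
)
print(freq_table)
```

Because the percentage is Frequency / sum(Frequency) * 100 and the total here is 100, each percentage equals its frequency.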

3. Provide the frequency and percentage of the categories in the variable “NewYears.”

Lyra1%>%
  group_by(NewYears = case_when(
    NewYears<=12 ~ '12 years and below',
    NewYears>=13 & NewYears<=15 ~ '13 to 15 years',
    NewYears>=16 ~ '16 years and above'
  ))%>%
  summarise(Frequency=n())%>%
  mutate(Percentage =round(Frequency/sum(Frequency)*100, 2))
# A tibble: 3 × 3
  NewYears           Frequency Percentage
  <chr>                  <int>      <dbl>
1 12 years and below        44         44
2 13 to 15 years            29         29
3 16 years and above        27         27
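
As an alternative to `case_when()`, the same three age bands can be built with base R's `cut()`. This is only a sketch: the `years` vector is hypothetical, and the break points assume the bands are meant to cover every value (a non-integer value such as 12.5 would fall in “13 to 15 years” here, whereas the `case_when()` call above would return NA for it).

```r
# Hypothetical ages; the real NewYears column is not shown.
years <- c(9, 12, 13, 15, 16, 30)

groups <- cut(
  years,
  breaks = c(-Inf, 12, 15, Inf),   # intervals (-Inf, 12], (12, 15], (15, Inf)
  labels = c("12 years and below", "13 to 15 years", "16 years and above")
)

# Frequency per band
print(table(groups))
```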

4. Provide the mean and standard deviation of the variable “Score” when grouped according to the categories of the variable “NewYears.”

Lyra1%>%
  group_by(NewYears = case_when(
    NewYears<=12 ~ '12 years and below',
    NewYears>=13 & NewYears<=15 ~ '13 to 15 years',
    NewYears>=16 ~ '16 years and above'
  ))%>%
  summarize(
    Mean_Score = mean(Score),
    SD_Score = sd(Score)
  )
# A tibble: 3 × 3
  NewYears           Mean_Score SD_Score
  <chr>                   <dbl>    <dbl>
1 12 years and below       43      0.889
2 13 to 15 years           42.7    1.03 
3 16 years and above       42.9    1.17 
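
For comparison, grouped means and standard deviations can be computed in base R with `tapply()`. The vectors below are hypothetical illustration data, not values from Lyra1.

```r
# Hypothetical scores and group labels for illustration only.
score <- c(43, 44, 42, 41, 43, 45)
group <- c("A", "A", "A", "B", "B", "B")

group_means <- tapply(score, group, mean)   # mean of score within each group
group_sds   <- tapply(score, group, sd)     # sample standard deviation (n - 1)

print(data.frame(
  Group      = names(group_means),
  Mean_Score = as.numeric(group_means),
  SD_Score   = as.numeric(group_sds)
))
```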

5. What is the strength of association between the variables “Score” and “ScoreB”?

Answer: Pearson Correlation Coefficient: 0.0549

A correlation coefficient of 0.0549 (0.05485635 before rounding) is very close to zero, which indicates little to no linear relationship between “Score” and “ScoreB.”

correlation <- cor(Lyra$Score, Lyra$ScoreB)

print(correlation)
[1] 0.05485635
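
To make the coefficient less of a black box, the sketch below computes Pearson's r directly from its definition and compares it with `cor()`. The vectors `x` and `y` are hypothetical; `cor.test()` is shown only as one possible way to check whether a correlation differs from zero, and it was not part of the original answer.

```r
# Hypothetical paired observations for illustration only.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

dx <- x - mean(x)   # deviations from the mean of x
dy <- y - mean(y)   # deviations from the mean of y

# Pearson's r: sum of cross-products over the root of the product of sums of squares
r_manual  <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_builtin <- cor(x, y)

print(c(manual = r_manual, builtin = r_builtin))

# Optional significance test of the correlation (an addition, not in the original answer)
print(cor.test(x, y)$p.value)
```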

6. For the “Gender” variable, classify the values as Boy, Girl, and Others. Provide the frequency distribution table, that is, the frequency and percentage of each category of the variable “Gender.”

Lyra1%>%
  group_by(Gender = case_when(
    Gender=="boy" ~ "Boy",
    Gender=="girl" ~ "Girl",
    TRUE ~ "Others"
  ))%>%
  summarise(Frequency=n())%>%
  mutate(Percentage =round(Frequency/sum(Frequency)*100, 2))
# A tibble: 3 × 3
  Gender Frequency Percentage
  <chr>      <int>      <dbl>
1 Boy           35         35
2 Girl          36         36
3 Others        29         29

7. Provide the same output as given below.

# Summary statistics (get_summary_stats() comes from the rstatix package)
Lyra2%>%
  group_by(ScoreABC)%>%
  get_summary_stats(Scorevalues, type = "mean_sd")
# A tibble: 3 × 5
  ScoreABC variable        n  mean    sd
  <fct>    <fct>       <dbl> <dbl> <dbl>
1 ScoreA   Scorevalues   100  2.55 0.446
2 ScoreB   Scorevalues   100  2.31 0.472
3 ScoreC   Scorevalues   100  2.36 0.479

8. Is there a significant difference in the “ScoreA” values between M and F in the variable “Sex” using the following methods:

t-test

# Store the result under a name that does not mask the t.test() function
t.result<-t.test(Lyra3$ScoreA ~ Lyra3$Sex, alternative = "two.sided")

print(t.result)

    Welch Two Sample t-test

data:  Lyra3$ScoreA by Lyra3$Sex
t = 0.62143, df = 86.183, p-value = 0.536
alternative hypothesis: true difference in means between group F and group M is not equal to 0
95 percent confidence interval:
 -0.1246377  0.2380032
sample estimates:
mean in group F mean in group M 
       2.574074        2.517391 

The Welch t-test (t = 0.62143, df = 86.183, p = 0.536) shows no significant difference in mean “ScoreA” between F and M. Because the p-value is greater than 0.05 and the 95 percent confidence interval (-0.1246 to 0.2380) includes 0, there is not enough evidence to reject the null hypothesis that the two group means are equal.
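
The t and df values in the output come from the Welch formulas, which do not assume equal variances. The sketch below reproduces them by hand and compares the result with `t.test()`. The two vectors are hypothetical; the helper `welch_t()` is written for illustration and is not part of R.

```r
# Hypothetical scores for two groups; the real Lyra3 data is not shown.
group_f <- c(2.6, 2.9, 2.4, 3.1, 2.2, 2.7)
group_m <- c(2.5, 2.3, 2.8, 2.1, 2.6)

# Illustrative helper: Welch t statistic, degrees of freedom, two-sided p-value
welch_t <- function(a, b) {
  se2_a <- var(a) / length(a)   # squared standard error of each group mean
  se2_b <- var(b) / length(b)
  t_stat <- (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (se2_a + se2_b)^2 /
    (se2_a^2 / (length(a) - 1) + se2_b^2 / (length(b) - 1))
  p_value <- 2 * pt(-abs(t_stat), df)
  c(t = t_stat, df = df, p = p_value)
}

manual  <- welch_t(group_f, group_m)
builtin <- t.test(group_f, group_m)   # var.equal = FALSE by default, i.e. Welch

print(manual)
print(c(builtin$statistic, builtin$parameter, p = builtin$p.value))
```

The non-integer df in the output above (86.183) is a consequence of this approximation.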

Mann-Whitney U Test

mwu.test<-wilcox.test(ScoreA ~ Sex, data=Lyra3)

print(mwu.test)

    Wilcoxon rank sum test with continuity correction

data:  ScoreA by Sex
W = 1346, p-value = 0.4584
alternative hypothesis: true location shift is not equal to 0

The p-value of 0.4584 is greater than 0.05, so there is not enough evidence to conclude that “ScoreA” values differ between M and F. Based on this test, the difference between the groups is not statistically significant.
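
The W statistic reported by `wilcox.test()` is the rank sum of the first group minus its smallest possible value, n1(n1 + 1)/2. A minimal sketch with hypothetical data (no ties) shows the calculation:

```r
# Hypothetical values for two groups; the real Lyra3 data is not shown.
x <- c(2.1, 3.4, 2.8, 3.9)   # first group
y <- c(1.9, 2.5, 3.0)        # second group

ranks <- rank(c(x, y))             # ranks in the pooled sample
rank_sum_x <- sum(ranks[seq_along(x)])

# W = rank sum of the first group minus n1 * (n1 + 1) / 2
w_manual  <- rank_sum_x - length(x) * (length(x) + 1) / 2
w_builtin <- wilcox.test(x, y)$statistic

print(c(manual = w_manual, builtin = unname(w_builtin)))
```

With the formula interface used above (ScoreA ~ Sex), the “first group” is the first level of Sex, which is F in this output.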

9. Is there a significant difference among the categories (Boy, Girl, and Others) in the variable “ScoreC” using the F-test?

summary(model1)
            Df Sum Sq Mean Sq F value Pr(>F)  
Gender       2  1.662  0.8309   3.823 0.0252 *
Residuals   97 21.085  0.2174                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Because the p-value (0.0252) is less than 0.05, the F-test (F = 3.823 on 2 and 97 degrees of freedom) indicates a statistically significant difference in mean “ScoreC” among the categories (Boy, Girl, and Others).
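
The call that created `model1` is not shown above. One plausible form, offered only as an assumption, is a one-way ANOVA fitted with `aov()`. The sketch below uses simulated data with the group sizes from the Gender table (35, 36, 29); the data frame `demo` and the object `model1_sketch` are hypothetical, so the F and p values will not match the reported output.

```r
set.seed(42)

# Simulated stand-in data; only the group sizes are taken from the Gender table.
demo <- data.frame(
  Gender = factor(rep(c("Boy", "Girl", "Others"), times = c(35, 36, 29))),
  ScoreC = rnorm(100, mean = 2.36, sd = 0.48)
)

# One-way ANOVA: does mean ScoreC differ across the Gender categories?
model1_sketch <- aov(ScoreC ~ Gender, data = demo)

anova_table <- summary(model1_sketch)[[1]]   # the table printed by summary()
print(anova_table)

# Degrees of freedom: groups - 1 and n - groups
print(anova_table[["Df"]])
```

With 3 groups and 100 observations, the degrees of freedom are 2 and 97, which agrees with the Df column in the output above.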

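10. Considering the variables “ScoreA”, “ScoreB”, and “ScoreC” as independent variables, which of these significantly predicts the variable “Score”?

Answer: Only “ScoreA” significantly predicts “Score” (estimate = -0.66041, p = 0.00339). “ScoreB” (p = 0.85229) and “ScoreC” (p = 0.12253) are not significant at the 0.05 level.
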
summary(model)

Call:
lm(formula = Score ~ ScoreA + ScoreB + ScoreC, data = Lyra)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.29078 -0.58529  0.07604  0.72523  2.27412 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 43.78202    0.75029  58.353  < 2e-16 ***
ScoreA      -0.66041    0.21978  -3.005  0.00339 ** 
ScoreB      -0.04561    0.24431  -0.187  0.85229    
ScoreC       0.37603    0.24136   1.558  0.12253    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.969 on 96 degrees of freedom
Multiple R-squared:  0.1037,    Adjusted R-squared:  0.07567 
F-statistic: 3.702 on 3 and 96 DF,  p-value: 0.01435
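
Reading significance off the coefficient table can also be done programmatically. The sketch below fits the same formula to simulated data and pulls out the predictors whose p-value is below 0.05. The data frame `demo`, the object `fit`, and the simulated effect sizes are all assumptions for illustration, so the numbers will differ from the output above.

```r
set.seed(7)
n <- 100

# Simulated predictors; means and SDs loosely follow the summary table in item 7.
demo <- data.frame(
  ScoreA = rnorm(n, mean = 2.55, sd = 0.45),
  ScoreB = rnorm(n, mean = 2.31, sd = 0.47),
  ScoreC = rnorm(n, mean = 2.36, sd = 0.48)
)
# Hypothetical outcome that depends on ScoreA only, plus noise
demo$Score <- 44 - 0.7 * demo$ScoreA + rnorm(n, sd = 1)

fit   <- lm(Score ~ ScoreA + ScoreB + ScoreC, data = demo)
coefs <- coef(summary(fit))   # matrix: Estimate, Std. Error, t value, Pr(>|t|)

# Drop the intercept row and keep predictors with p < 0.05
p_values    <- coefs[-1, "Pr(>|t|)"]
significant <- names(p_values)[p_values < 0.05]

print(round(p_values, 5))
print(significant)
```

Applied to the reported table, the same rule selects “ScoreA” only, since 0.00339 is the only predictor p-value below 0.05.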