Group 2 Fourth Markdown Project

Raphael Lee, Javier Bolong, Allen Abel, Tricia Pulmano

2021-08-06

Introduction

In the Philippines, bullying remains a prevalent issue. A study was conducted in selected schools to see how many students were victims of bullying. Of the 340 sixth-grade students surveyed (male and female), 40%, or 136 students, said they had experienced it at least once (Sanapo, 2017). Though this is not the majority of those surveyed, it is still a significant number, so action needs to be taken to reduce it. However, to solve this problem, we first need to understand the nature of bullying. In this paper, we look into the escalation of bullying, specifically whether a seemingly small act such as insults can lead to more severe actions such as spreading rumors. Although we use statistics from the United States, the findings can still be useful in implementing solutions in this country.

Source: Sanapo, M. S. (2017). When kids hurt other kids: Bullying in Philippine schools. Psychology, 8(14), 2469.

Methodology

We will use multiple linear regression. We first graph the values and examine the relationship between them. Afterward, we use R functions such as lm to fit a linear regression model. We can also obtain other values such as \(\sigma^2\) and the standard errors of the regression coefficients.

Multiple Linear Regression


Getting the Linear Regression

Graphs

First, we must graph the values to see if there is a relationship between them.

Graphs are shown below through the use of R.

Let’s display the values imported from the Excel table:

Grade   Being made fun of (%)   Spreading rumors (%)
6       21.4                    17.7
7       18.6                    12.9
8       15.6                    13.1
9       12.5                    10.6
10      12.6                    12.9
11      8.8                     10.2
12      6.2                     10.8

The values above show the percentage of students in each grade level who reported each type of bullying.

We use these graphs to examine whether there is a relationship between the variables. These graphs will also help us in forming a linear regression model.
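As a minimal sketch of this step, the values from the table can be entered into a data frame and the pairwise correlations computed as a numerical companion to the graphs. The data frame name `bullying` is an assumption made for illustration; the column names `Grade`, `Fun`, and `Rumor` follow the variables used in the model fitted later in this report.

```r
# Data from the table above: percentage of students bullied, by grade level.
# The data frame name `bullying` is hypothetical.
bullying <- data.frame(
  Grade = 6:12,
  Fun   = c(21.4, 18.6, 15.6, 12.5, 12.6, 8.8, 6.2),    # being made fun of (%)
  Rumor = c(17.7, 12.9, 13.1, 10.6, 12.9, 10.2, 10.8)   # spreading rumors (%)
)

# Pairwise Pearson correlations summarise the linear trends the graphs show.
cors <- cor(bullying)
print(round(cors, 3))

# pairs(bullying) would draw the corresponding scatterplot matrix.
```

A correlation close to -1 or 1 suggests a strong linear relationship between two variables, while a value near 0 suggests little linear relationship.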

Multiple Linear Regression

We can use the multiple linear regression model: \[ Y=\beta_0+\beta_1x_1+\beta_2x_2+\epsilon \]

Here \(Y\) is the grade level, \(x_1\) is the percentage of students being made fun of, and \(x_2\) is the percentage of students who are the subject of rumors.

Before we can use that equation, we need the least squares estimates \(\hat{\beta}_0\), \(\hat{\beta}_1\), and \(\hat{\beta}_2\), which satisfy the normal equations:

\[n{\hat{\beta}}_0 + {\hat{\beta}}_1\sum_{i=1}^{n}x_{i1}+ {\hat{\beta}}_2\sum_{i=1}^{n}x_{i2}= \sum_{i=1}^{n}y_{i}\]

\[{\hat{\beta}}_0\sum_{i=1}^{n}x_{i1} + {\hat{\beta}}_1\sum_{i=1}^{n}x_{i1}^2+ {\hat{\beta}}_2\sum_{i=1}^{n}x_{i1}x_{i2}= \sum_{i=1}^{n}x_{i1}y_{i}\]

\[{\hat{\beta}}_0\sum_{i=1}^{n}x_{i2} + {\hat{\beta}}_1\sum_{i=1}^{n}x_{i1}x_{i2}+ {\hat{\beta}}_2\sum_{i=1}^{n}x_{i2}^2 = \sum_{i=1}^{n}x_{i2}y_{i}\]

We can use R to solve this system:

summaries1 <- lm(Grade ~ Fun + Rumor)
summary(summaries1)
## 
## Call:
## lm(formula = Grade ~ Fun + Rumor)
## 
## Residuals:
##        1        2        3        4        5        6        7 
## -0.14026  0.19170 -0.19007 -0.28050  0.47802  0.09592 -0.15480 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.61266    0.77850  17.486 6.28e-05 ***
## Fun         -0.45228    0.04698  -9.627 0.000651 ***
## Rumor        0.12466    0.09749   1.279 0.270161    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.329 on 4 degrees of freedom
## Multiple R-squared:  0.9845, Adjusted R-squared:  0.9768 
## F-statistic: 127.4 on 2 and 4 DF,  p-value: 0.000239
confint(summaries1)
##                  2.5 %     97.5 %
## (Intercept) 11.4512104 15.7741168
## Fun         -0.5827245 -0.3218369
## Rumor       -0.1460156  0.3953267

From the R output, the estimates are \(\hat{\beta}_0=13.61266\), \(\hat{\beta}_1=-0.45228\), and \(\hat{\beta}_2=0.12466\).

Finally, after substituting, the fitted linear regression equation is: \[\begin{align} &y=(13.61266)+(-0.45228)(x_1)+(0.12466)(x_2) \end{align}\]
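The three normal equations can also be solved directly in matrix form, \((X'X)\hat{\beta}=X'y\), which is what lm does in effect. The sketch below is illustrative only: it re-enters the data from the table in a data frame named `bullying` (a hypothetical name) and checks that both routes give the same estimates.

```r
# Data from the table in the Graphs section; `bullying` is a hypothetical name.
bullying <- data.frame(
  Grade = 6:12,
  Fun   = c(21.4, 18.6, 15.6, 12.5, 12.6, 8.8, 6.2),
  Rumor = c(17.7, 12.9, 13.1, 10.6, 12.9, 10.2, 10.8)
)

# Design matrix with a leading column of ones for the intercept.
X <- cbind(1, bullying$Fun, bullying$Rumor)
y <- bullying$Grade

# Normal equations in matrix form: (X'X) beta = X'y.
# crossprod(X) is X'X and crossprod(X, y) is X'y.
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Same model fitted with lm for comparison.
fit <- lm(Grade ~ Fun + Rumor, data = bullying)
print(cbind(normal_eq = as.vector(beta_hat), lm = unname(coef(fit))))
```

The two columns should agree up to floating-point error, since lm computes the same least squares solution.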



Estimating σ².


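To estimate \(\sigma^2\), we can use the following formula: \[ \hat{\sigma}^2=\frac{SS_E}{n-p} \] where \(SS_E\) is the residual sum of squares, \(n=7\) is the number of observations, and \(p=3\) is the number of regression coefficients.

We can again use R to calculate it for us, using the following code: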

summaries2 <- lm(Grade ~ Fun + Rumor)
anova(summaries2)
## Analysis of Variance Table
## 
## Response: Grade
##           Df  Sum Sq Mean Sq F value    Pr(>F)    
## Fun        1 27.3902 27.3902 253.087 9.125e-05 ***
## Rumor      1  0.1769  0.1769   1.635    0.2702    
## Residuals  4  0.4329  0.1082                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

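In the ANOVA table, the Residuals row gives \(SS_E=0.4329\) on \(n-p=4\) degrees of freedom, so \(\sigma^2\) is estimated to be \(0.4329/4\approx0.1082\), the value in the Mean Sq column.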
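The same estimate can be reproduced from the residuals of the fitted model. This is a small sketch under the assumption that the data are stored in a data frame called `bullying` (a name chosen here for illustration).

```r
# Data from the table in the Graphs section; `bullying` is a hypothetical name.
bullying <- data.frame(
  Grade = 6:12,
  Fun   = c(21.4, 18.6, 15.6, 12.5, 12.6, 8.8, 6.2),
  Rumor = c(17.7, 12.9, 13.1, 10.6, 12.9, 10.2, 10.8)
)
fit <- lm(Grade ~ Fun + Rumor, data = bullying)

n   <- nrow(bullying)         # number of observations
p   <- length(coef(fit))      # number of regression coefficients
sse <- sum(residuals(fit)^2)  # residual sum of squares, SS_E

# Estimate of sigma^2: SS_E / (n - p)
sigma2_hat <- sse / (n - p)
print(sigma2_hat)

# The residual standard error reported by summary() is the square root.
print(sqrt(sigma2_hat))
```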



Computing the standard errors of the regression coefficients.


To compute the standard errors of the regression coefficients, we can use the equation: \[ se(\hat{\beta}_j)=\sqrt{\hat{\sigma}^2C_{jj}} \] where \(C_{jj}\) is the \(j\)th diagonal element of \((X'X)^{-1}\).

Using the data from the table, we can construct \(X\) and \(X'\), which are shown below, respectively:

##      [,1] [,2] [,3]
## [1,]    1 21.4 17.1
## [2,]    1 18.6 12.9
## [3,]    1 15.6 13.1
## [4,]    1 12.5 10.6
## [5,]    1 12.6 12.9
## [6,]    1  8.8 10.2
## [7,]    1  6.2 10.8
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,]  1.0  1.0  1.0  1.0  1.0  1.0  1.0
## [2,] 21.4 18.6 15.6 12.5 12.6  8.8  6.2
## [3,] 17.1 12.9 13.1 10.6 12.9 10.2 10.8

Then we can compute \(X'X\):

##      [,1]    [,2]    [,3]
## [1,]  7.0   95.70   87.60
## [2,] 95.7 1478.17 1262.00
## [3,] 87.6 1262.00 1129.88

With \(X'X\) computed we can now look for its inverse \((X'X)^{-1}\):

##            [,1]        [,2]        [,3]
## [1,]  7.0732968  0.22099252 -0.79522902
## [2,]  0.2209925  0.02148160 -0.04112713
## [3,] -0.7952290 -0.04112713  0.10847568

The diagonal elements of \((X'X)^{-1}\) give \(C_{00}=7.0732968\), \(C_{11}=0.02148160\), and \(C_{22}=0.10847568\). Taking \(\hat{\sigma}^2=0.3721\), we can now solve for \(se(\hat{\beta}_0)\), \(se(\hat{\beta}_1)\), and \(se(\hat{\beta}_2)\): \[ \begin{align} &se(\hat{\beta}_0)=\sqrt{0.3721(7.0732968)}=1.622336\\ &se(\hat{\beta}_1)=\sqrt{0.3721(0.02148160)}=0.08940528\\ &se(\hat{\beta}_2)=\sqrt{0.3721(0.10847568)}=0.2009074\\ \end{align} \]

The same computations in R:

b0 <- sqrt(0.3721*7.0732968)
b0
## [1] 1.622336
b1 <- sqrt(0.3721*0.02148160)
b1
## [1] 0.08940528
b2 <- sqrt(0.3721*0.10847568)
b2
## [1] 0.2009074

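The summary table from earlier reports the standard errors directly in its Std. Error column: 0.77850, 0.04698, and 0.09749. These are smaller than the hand-computed values above because summary uses \(\hat{\sigma}^2=0.1082\) and the Grade 6 spreading rumors value of 17.7 from the data table, whereas the hand computation uses 0.3721 and an \(X\) matrix whose first row lists 17.1.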

summaries1 <- lm(Grade ~ Fun + Rumor)
summary(summaries1)
## 
## Call:
## lm(formula = Grade ~ Fun + Rumor)
## 
## Residuals:
##        1        2        3        4        5        6        7 
## -0.14026  0.19170 -0.19007 -0.28050  0.47802  0.09592 -0.15480 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.61266    0.77850  17.486 6.28e-05 ***
## Fun         -0.45228    0.04698  -9.627 0.000651 ***
## Rumor        0.12466    0.09749   1.279 0.270161    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.329 on 4 degrees of freedom
## Multiple R-squared:  0.9845, Adjusted R-squared:  0.9768 
## F-statistic: 127.4 on 2 and 4 DF,  p-value: 0.000239
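As a cross-check of the formula \(se(\hat{\beta}_j)=\sqrt{\hat{\sigma}^2C_{jj}}\), the sketch below builds \(X\) from the fitted model itself, so that the design matrix and \(\hat{\sigma}^2\) come from the same data that lm used. The data frame name `bullying` is an assumption, and the Grade 6 spreading rumors value is taken as 17.7, as listed in the data table.

```r
# Data from the table in the Graphs section; `bullying` is a hypothetical name.
bullying <- data.frame(
  Grade = 6:12,
  Fun   = c(21.4, 18.6, 15.6, 12.5, 12.6, 8.8, 6.2),
  Rumor = c(17.7, 12.9, 13.1, 10.6, 12.9, 10.2, 10.8)
)
fit <- lm(Grade ~ Fun + Rumor, data = bullying)

X <- model.matrix(fit)    # design matrix, including the intercept column
C <- solve(crossprod(X))  # (X'X)^{-1}

# sigma^2 estimate: SS_E divided by the residual degrees of freedom (n - p)
sigma2_hat <- sum(residuals(fit)^2) / df.residual(fit)

# Standard errors: square root of sigma^2 times the diagonal of (X'X)^{-1}
se <- sqrt(sigma2_hat * diag(C))

print(cbind(by_hand = se,
            summary = summary(fit)$coefficients[, "Std. Error"]))
```

When the same data and the same \(\hat{\sigma}^2\) are used, the two columns should match the Std. Error column of the summary table.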



Using the multiple linear regression model we have obtained, we can predict certain values of the data set.

\[\begin{align} &y=(13.61266)+(-0.45228)(x_1)+(0.12466)(x_2) \end{align}\]

In our data set, \(y\) in the multiple linear regression model is the grade level.

Let’s say we want to predict the grade level if the percentage of students being made fun of is 16.3% and the percentage of students being gossiped about is 14.0%.

Substituting it into our regression model:

\[\begin{align} &y=(13.61266)+(-0.45228)(16.3)+(0.12466)(14.0) \end{align}\]

## [1] 7.985736

Thus, the predicted value is 7.985736.

Rounding the number down, this corresponds to approximately Grade 7.
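The substitution above can also be done with R's predict function, which avoids retyping rounded coefficients. This is a sketch only; the names `bullying` and `new_obs` are assumptions introduced for illustration.

```r
# Data from the table in the Graphs section; `bullying` is a hypothetical name.
bullying <- data.frame(
  Grade = 6:12,
  Fun   = c(21.4, 18.6, 15.6, 12.5, 12.6, 8.8, 6.2),
  Rumor = c(17.7, 12.9, 13.1, 10.6, 12.9, 10.2, 10.8)
)
fit <- lm(Grade ~ Fun + Rumor, data = bullying)

# New observation: 16.3% made fun of, 14.0% subject of rumors.
new_obs <- data.frame(Fun = 16.3, Rumor = 14.0)

# Predicted grade level from the fitted model.
pred <- predict(fit, newdata = new_obs)
print(pred)

# Rounding down, as done in the text.
print(floor(pred))
```

Because predict uses the unrounded coefficients, its result may differ from the hand substitution in the last few decimal places.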

Discussion

Results from linear regression indicate that there is a relationship among the variables being studied (grade level and the two types of bullying experienced: being made fun of and spreading of rumors), as seen in the best-fit lines shown in the graphs.

It can be seen that there is a general downward trend in the second and third graphs, indicating that bullying decreases as the grade level increases or as the students get older. This is supported by a study stating that verbal bullying is the most common form of bullying and that the majority of bullying occurs in secondary schools.

The trend of younger students experiencing more bullying than higher level students may be explained by power imbalances in relationships between people. In instances of bullying, the bully tends to be the person with higher position or power compared to the one being bullied. This power imbalance is also what causes bullies to repeatedly bully, having the power to do so.

This information tells us that there is a need for greater attention to bullying, especially bullying that involves younger kids. On school grounds, students must always be supervised by teachers in order to prevent occurrences of bullying. Doing so will be a step towards creating a better environment for children to learn and grow.

Conclusion

To conclude, bullying is a big problem, especially in modern day schooling. Bullying normally starts from something small and evolves into bigger and physically harmful deeds. Identifying the relationship between two types of bullying can help people realize how bullying can get worse if there is no intervention.

The upward trend between the two types of bullying in the data set shows that grade levels with a higher percentage of students being made fun of also tend to have a higher percentage of students who are the subject of rumors. As the bullying escalates, the number of people being bullied also increases.

There is a noticeable decline in the number of students bullied as grade levels increase. On paper, this indicates less bullying as students get older. However, the researchers believe that the decrease in reported bullying cases is due to students not wanting to admit that they are being bullied in school. This is understandable since some people refuse to openly share their experience of being bullied.

As for the values of \(\sigma^2\) and the standard errors of the regression coefficients, we can use these in further testing or to support our present data. In the first summary table, the low p-value of the being made fun of variable (0.000651) tells us that this variable is significant, while the p-value of the spreading rumors variable (0.270161) is above 0.05.