What is Simple Linear Regression

Linear Regression models the relationship between a dependent (response) variable and one or more independent (predictor) variables; Simple Linear Regression uses exactly one independent variable. Because these variables can be drawn from almost any data set, the technique is used across a wide range of career fields.

  • A Few Possible Uses
    • Generation of Predictions
    • Evaluation of Relationships
    • Analysis of Risk
    • Pattern of Life Analysis
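For reference, Simple Linear Regression fits a straight line to n pairs (x_i, y_i), where y is the dependent variable and x the independent variable. The ordinary least squares (OLS) estimates below are the standard ones, and they are what R's lm() computes for a single-predictor formula:

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n
\\
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
\\
\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x
```

The slope \hat{\beta}_1 is the predicted change in y for a one-unit change in x, and the intercept \hat{\beta}_0 is the predicted y at x = 0.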

Example of Financial Loss from DDoS Attacks in the USA

Linear Regression of One Variable

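A single linear regression model can reveal patterns of interest, such as whether countermeasures appear to be working. In the case of DDoS attacks against the USA, a decreasing trend in losses could cue an analyst to look into causes such as improved cyber security measures or a decrease in the value of the targeted assets. Using the data below, analysts can then attempt to model and predict what future losses from DDoS attacks may look like. In a linear model, the intercept and slope coefficients define the line of best fit: the slope gives the change in the y-value for each one-unit change in the x-value. Note that lm() places the response on the left of the ~, so the formula below (Year ~ Financial.Loss..in.Million...) models Year as a function of financial loss; to predict loss from year, the formula would be written Financial.Loss..in.Million... ~ Year. The slope in this output is also not statistically significant (p = 0.139, R-squared of about 0.037), so the apparent trend is weak.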

Call:
lm(formula = Year ~ Financial.Loss..in.Million..., data = ddosAttacks)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.0396 -1.9445 -0.2565  1.9766  4.9446 

Coefficients:
                                Estimate Std. Error t value Pr(>|t|)    
(Intercept)                   2019.74940    0.70312  2872.6   <2e-16 ***
Financial.Loss..in.Million...   -0.01700    0.01134    -1.5    0.139    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.697 on 58 degrees of freedom
Multiple R-squared:  0.03733,   Adjusted R-squared:  0.02073 
F-statistic: 2.249 on 1 and 58 DF,  p-value: 0.1391
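Reading the Estimate column as a fitted line: because the formula is Year ~ Financial.Loss..in.Million..., the line gives Year as a function of the loss x (in millions, per the column name). The loss of 100 used below is a hypothetical input chosen only to show the arithmetic:

```latex
\widehat{\text{Year}} = 2019.74940 - 0.01700\,x
\\
x = 100: \quad \widehat{\text{Year}} = 2019.74940 - 0.01700 \times 100 = 2019.74940 - 1.70000 = 2018.04940
```

Each additional million in loss lowers the fitted year by 0.017, which is the sense in which the slope describes the change in y for a one-unit change in x.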

Escalation of Attacks vs. Year

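library(dplyr)
library(ggplot2)

# Count attacks per year for each attack type
sumOfAttacks <- df %>%
  group_by(Year, Attack.Type) %>%
  summarise(count = n(), .groups = "drop")

# Scatter plot of yearly counts, one colour per attack type
yearsAttacked <- ggplot(data = sumOfAttacks,
                        aes(x = Year, y = count, color = Attack.Type)) +
  geom_point(alpha = 1/2)

# Add a separate least-squares line for each attack type (no confidence band)
yearsAttacked + geom_smooth(method = "lm", se = FALSE)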
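The plot shows one trend line per attack type but not the slopes themselves. Below is a minimal base-R sketch of fitting the same per-type least-squares lines (geom_smooth(method = "lm") fits count ~ Year within each colour group) and reading off each slope, whose sign indicates a rising or falling trend. The data frame exampleCounts and its numbers are hypothetical, invented for illustration; with the real data it would be replaced by sumOfAttacks.

```r
# Hypothetical yearly counts, invented for illustration only; they are
# NOT the counts behind the plot in this article.
exampleCounts <- data.frame(
  Year        = rep(2015:2024, times = 2),
  Attack.Type = rep(c("DDoS", "Ransomware"), each = 10),
  count       = c(60, 58, 59, 55, 54, 52, 53, 50, 48, 47,   # DDoS
                  30, 33, 35, 38, 37, 41, 44, 45, 48, 50)   # Ransomware
)

# Fit count ~ Year separately for each attack type and keep each slope
# (change in yearly count per additional year).
slopes <- sapply(split(exampleCounts, exampleCounts$Attack.Type), function(d) {
  coef(lm(count ~ Year, data = d))[["Year"]]
})

print(round(slopes, 3))
```

A negative slope corresponds to a downward-trending line on the plot and a positive slope to an upward-trending one.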

Linear Regression of Multiple Classes

As the prior example noted, Linear Regression can be used for pattern of life analysis; here it is applied to several categories at once, fitting a separate trend line for each attack type. For these cyber security threats, the trend lines show which threats may be rising. Specifically, Ransomware attacks appear to be on the rise while DDoS attacks are trending downward. This could tip analysts toward examining the Defense Mechanisms or related categories. The output below is the linear model for DDoS attacks alone, and its coefficients again define the fitted line. As before, the formula (Year ~ count) treats Year as the response, and the slope is not statistically significant (p = 0.727, R-squared of about 0.016), so the downward trend should be treated as tentative.

Call:
lm(formula = Year ~ count, data = ddosSum)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.7829 -1.7810 -0.0529  2.1724  4.5821 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 2024.34550   13.45625 150.439 4.26e-15 ***
count         -0.09125    0.25270  -0.361    0.727    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.185 on 8 degrees of freedom
Multiple R-squared:  0.01604,   Adjusted R-squared:  -0.107 
F-statistic: 0.1304 on 1 and 8 DF,  p-value: 0.7274
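Rather than copying numbers out of the printed summary, the coefficients, p-values and R-squared can be pulled from the fitted object and used directly. Below is a minimal sketch on hypothetical data: ddosExample is an invented stand-in, not the article's ddosSum, and it puts count on the left of the ~ so that the line predicts counts from years.

```r
# Hypothetical DDoS counts per year, invented for illustration only.
ddosExample <- data.frame(
  Year  = 2015:2024,
  count = c(60, 58, 59, 55, 54, 52, 53, 50, 48, 47)
)

# Response on the left of ~, predictor on the right: count as a function of Year.
fit <- lm(count ~ Year, data = ddosExample)
s <- summary(fit)

b0 <- coef(fit)[["(Intercept)"]]
b1 <- coef(fit)[["Year"]]
slopeP <- s$coefficients["Year", "Pr(>|t|)"]  # p-value for the slope

# Plugging a new year into b0 + b1 * Year gives the same value as predict().
nextYear <- data.frame(Year = 2025)
manual <- b0 + b1 * nextYear$Year
auto <- unname(predict(fit, newdata = nextYear))

cat(sprintf("slope = %.3f per year, p = %.3g, R^2 = %.3f\n", b1, slopeP, s$r.squared))
cat(sprintf("predicted count for 2025 = %.2f\n", manual))
```

Checking slopeP and s$r.squared before extrapolating guards against reading a trend into a slope that is indistinguishable from zero, as with the DDoS output above.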

3D Modeling of Linear Regression

Viewing data in a 3D model lets the user spot insights and comparisons that would be harder to see in a 2D plot. For linear regression, the third axis allows an additional independent variable, which turns the model into multiple linear regression (a fitted plane rather than a line) and can help shape a hypothesis for prediction and analysis.
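Adding a second predictor turns the fitted line into a plane, which is what a 3D view of a linear regression displays. Below is a minimal sketch, on invented data, of fitting that plane with lm(); the variable names x1, x2 and example3d are hypothetical, and drawing the plane in 3D would need a plotting package (for example plotly or scatterplot3d), which is left out here.

```r
# Hypothetical data with two predictors, invented for illustration only.
set.seed(1)
n  <- 50
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)
y  <- 5 + 2 * x1 - 1.5 * x2 + rnorm(n, sd = 1)
example3d <- data.frame(x1, x2, y)

# With two predictors the fitted model is a plane: y-hat = b0 + b1*x1 + b2*x2.
fit3d <- lm(y ~ x1 + x2, data = example3d)
b <- coef(fit3d)

# Height of the fitted plane at one (x1, x2) point, by hand and via predict().
point  <- data.frame(x1 = 4, x2 = 7)
manual <- b[["(Intercept)"]] + b[["x1"]] * point$x1 + b[["x2"]] * point$x2
auto   <- unname(predict(fit3d, newdata = point))

print(round(b, 3))
```

Each coefficient is now the change in y for a one-unit change in its own predictor while the other predictor is held fixed.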

Final Thoughts for Simple Linear Regression

Simple Linear Regression is a fundamental tool for data analysts across many organizations. It helps find patterns in data that can be used to predict events and describe how variables behave. Although each simple model relates only one independent variable to one dependent variable, it can be applied to many different pairs of variables to explore potential relationships, and it can be paired with other statistical tools to produce better predictions or a clearer picture of a relationship.

While Simple Linear Regression is a useful tool, remember that correlation does not imply causation: even when two variables appear to be related, neither may have any influence over the other.
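The caution above can be made concrete with a small simulation. It is an invented illustration of one common way correlation arises without causation (a shared confounding variable z, a hypothetical name), not an analysis of the attack data.

```r
# Hypothetical simulation: a hidden variable z drives both x and y,
# while x has no effect on y at all.
set.seed(123)
n <- 500
z <- rnorm(n)
x <- z + rnorm(n, sd = 0.3)
y <- 2 * z + rnorm(n, sd = 0.3)

# x and y are strongly correlated, and a simple regression of y on x
# finds a large slope ...
r <- cor(x, y)
naiveSlope <- coef(lm(y ~ x))[["x"]]

# ... but once z is included, the coefficient on x collapses toward zero.
adjustedSlope <- coef(lm(y ~ x + z))[["x"]]

cat(sprintf("cor(x, y) = %.2f\n", r))
cat(sprintf("slope of y on x alone = %.2f\n", naiveSlope))
cat(sprintf("slope of x after adding z = %.2f\n", adjustedSlope))
```

A simple regression of y on x alone cannot distinguish this situation from one where x truly drives y; only additional variables or domain knowledge can.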