Objective

Betas for individual stocks are determined by simple linear regression. The dependent variable is the total return for the stock and the independent variable is the total return for the stock market. For this case problem we will use the S&P 500 index as the measure of the total return for the stock market, and an estimated regression equation will be developed using monthly data. The beta for the stock is the slope of the estimated regression equation (b1). The data contained in the file named Beta provide the total return (capital appreciation plus dividends) over 36 months for eight widely traded common stocks and the S&P 500.
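
As a check on the definition above, the slope of a simple linear regression equals the sample covariance of the two return series divided by the sample variance of the market return. The sketch below uses simulated returns, not the data in the Beta file; the names `market` and `stock` and the true slope of 1.4 are assumptions made for illustration.

```r
# Simulated monthly returns (assumption: NOT the data from the Beta file)
set.seed(1)
market <- rnorm(36, mean = 0.01, sd = 0.03)
stock  <- 0.002 + 1.4 * market + rnorm(36, sd = 0.04)

# Beta as the slope (b1) of the estimated regression equation
fit   <- lm(stock ~ market)
b1_lm <- unname(coef(fit)["market"])

# The same slope from the closed-form expression cov(x, y) / var(x)
b1_formula <- cov(market, stock) / var(market)

# The two agree up to floating-point tolerance
all.equal(b1_lm, b1_formula)
```

Because the estimate comes from only 36 noisy observations, `b1_lm` will generally differ from the value of 1.4 used to simulate the data.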

The value of beta for the stock market will always be 1; thus, stocks that tend to rise and fall with the stock market will also have a beta close to 1. Betas greater than 1 indicate that the stock is more volatile than the market, and betas less than 1 indicate that the stock is less volatile than the market. For instance, if a stock has a beta of 1.4, it is 40% more volatile than the market, and if a stock has a beta of 0.4, it is 60% less volatile than the market.
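
The arithmetic behind this reading of beta can be sketched in a few lines. The helper `relative_volatility` is a hypothetical name introduced here for illustration; it simply expresses beta minus 1 as a percentage.

```r
# Hypothetical helper: percent by which a stock with the given beta is
# more (positive) or less (negative) volatile than the market.
relative_volatility <- function(beta) round((beta - 1) * 100, 1)

relative_volatility(1.4)  # 40: 40% more volatile than the market
relative_volatility(0.4)  # -60: 60% less volatile than the market
relative_volatility(1)    # 0: moves with the market
```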

Descriptive Statistics

     # Get data: 36 months of total returns for eight stocks and the S&P 500
     df <- read.csv("~/Desktop/BSTAT_14_Case1_Beta.csv")

     summary(df)

     ##     Month             Microsoft          Exxon.Mobil        Caterpillar      
     ##  Length:36          Min.   :-0.082010   Min.   :-0.11646   Min.   :-0.10060  
     ##  Class :character   1st Qu.:-0.037648   1st Qu.:-0.00926   1st Qu.:-0.03042  
     ##  Mode  :character   Median : 0.004000   Median : 0.01278   Median : 0.04081  
     ##                     Mean   : 0.005026   Mean   : 0.01664   Mean   : 0.03010  
     ##                     3rd Qu.: 0.043075   3rd Qu.: 0.03911   3rd Qu.: 0.06871  
     ##                     Max.   : 0.088830   Max.   : 0.23217   Max.   : 0.21847  
     ##  Johnson...Johnson     McDonald.s          Sandisk            Qualcomm       
     ##  Min.   :-0.059170   Min.   :-0.11443   Min.   :-0.28331   Min.   :-0.12170  
     ##  1st Qu.:-0.017570   1st Qu.:-0.02685   1st Qu.:-0.06935   1st Qu.:-0.04827  
     ##  Median :-0.001475   Median : 0.03701   Median : 0.07414   Median : 0.03871  
     ##  Mean   : 0.005296   Mean   : 0.02447   Mean   : 0.06926   Mean   : 0.02836  
     ##  3rd Qu.: 0.026353   3rd Qu.: 0.05877   3rd Qu.: 0.16625   3rd Qu.: 0.07992  
     ##  Max.   : 0.103340   Max.   : 0.18257   Max.   : 0.50165   Max.   : 0.21055  
     ##  Procter...Gamble      S.P.500        
     ##  Min.   :-0.05365   Min.   :-0.03429  
     ##  1st Qu.:-0.01240   1st Qu.:-0.01305  
     ##  Median : 0.01333   Median : 0.01034  
     ##  Mean   : 0.01059   Mean   : 0.01010  
     ##  3rd Qu.: 0.02772   3rd Qu.: 0.02167  
     ##  Max.   : 0.08783   Max.   : 0.08104

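From the summary statistics, Sandisk has the highest mean monthly return (about 6.9%) and Microsoft has the lowest (about 0.5%), just below Johnson & Johnson. Sandisk also has the widest range of monthly returns, from -28.3% to 50.2%.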
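
The ranking of mean returns can be checked directly. The sketch below types in the means reported by `summary(df)` for the eight stocks; the vector name `mean_return` is an assumption made for illustration.

```r
# Mean monthly returns copied by hand from the summary(df) output
mean_return <- c(
  Microsoft         = 0.005026,
  Exxon.Mobil       = 0.01664,
  Caterpillar       = 0.03010,
  Johnson...Johnson = 0.005296,
  McDonald.s        = 0.02447,
  Sandisk           = 0.06926,
  Qualcomm          = 0.02836,
  Procter...Gamble  = 0.01059
)

# Rank the stocks from highest to lowest mean monthly return
sort(mean_return, decreasing = TRUE)

highest <- names(which.max(mean_return))  # "Sandisk"
lowest  <- names(which.min(mean_return))  # "Microsoft"
```

With the data frame loaded, the same means could be computed from the numeric columns, for example with `colMeans(df[, -1])`, assuming `Month` is the first column as in the summary.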

Regression

     lm1 <- lm(Microsoft ~ S.P.500, data = df)
     summary(lm1)

     ## 
     ## Call:
     ## lm(formula = Microsoft ~ S.P.500, data = df)
     ## 
     ## Residuals:
     ##       Min        1Q    Median        3Q       Max 
     ## -0.079550 -0.038259  0.005656  0.025712  0.080186 
     ## 
     ## Coefficients:
     ##              Estimate Std. Error t value Pr(>|t|)
     ## (Intercept) 0.0003984  0.0079355   0.050    0.960
     ## S.P.500     0.4583448  0.2848864   1.609    0.117
     ## 
     ## Residual standard error: 0.04438 on 34 degrees of freedom
     ## Multiple R-squared:  0.07075,    Adjusted R-squared:  0.04341 
     ## F-statistic: 2.588 on 1 and 34 DF,  p-value: 0.1169

     lm2 <- lm(Exxon.Mobil ~ S.P.500, data = df)
     summary(lm2)

     ## 
     ## Call:
     ## lm(formula = Exxon.Mobil ~ S.P.500, data = df)
     ## 
     ## Residuals:
     ##       Min        1Q    Median        3Q       Max 
     ## -0.112751 -0.030479 -0.003176  0.017337  0.209095 
     ## 
     ## Coefficients:
     ##             Estimate Std. Error t value Pr(>|t|)  
     ## (Intercept) 0.009259   0.009414   0.983   0.3323  
     ## S.P.500     0.730907   0.337966   2.163   0.0377 *
     ## ---
     ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
     ## 
     ## Residual standard error: 0.05264 on 34 degrees of freedom
     ## Multiple R-squared:  0.1209, Adjusted R-squared:  0.09507 
     ## F-statistic: 4.677 on 1 and 34 DF,  p-value: 0.03769

     lm3 <- lm(Caterpillar ~ S.P.500, data = df)
     summary(lm3)

     ## 
     ## Call:
     ## lm(formula = Caterpillar ~ S.P.500, data = df)
     ## 
     ## Residuals:
     ##       Min        1Q    Median        3Q       Max 
     ## -0.099586 -0.030686 -0.000617  0.031065  0.179221 
     ## 
     ## Coefficients:
     ##             Estimate Std. Error t value Pr(>|t|)    
     ## (Intercept)  0.01502    0.01019   1.474 0.149664    
     ## S.P.500      1.49320    0.36588   4.081 0.000256 ***
     ## ---
     ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
     ## 
     ## Residual standard error: 0.05699 on 34 degrees of freedom
     ## Multiple R-squared:  0.3288, Adjusted R-squared:  0.3091 
     ## F-statistic: 16.66 on 1 and 34 DF,  p-value: 0.0002565

     lm4 <- lm(Johnson...Johnson ~ S.P.500, data = df)
     summary(lm4)

     ## 
     ## Call:
     ## lm(formula = Johnson...Johnson ~ S.P.500, data = df)
     ## 
     ## Residuals:
     ##      Min       1Q   Median       3Q      Max 
     ## -0.06423 -0.02289 -0.00657  0.02123  0.09806 
     ## 
     ## Coefficients:
     ##             Estimate Std. Error t value Pr(>|t|)
     ## (Intercept) 0.005207   0.006326   0.823    0.416
     ## S.P.500     0.008757   0.227098   0.039    0.969
     ## 
     ## Residual standard error: 0.03537 on 34 degrees of freedom
     ## Multiple R-squared:  4.373e-05,  Adjusted R-squared:  -0.02937 
     ## F-statistic: 0.001487 on 1 and 34 DF,  p-value: 0.9695

     lm5 <- lm(McDonald.s ~ S.P.500, data = df)
     summary(lm5)

     ## 
     ## Call:
     ## lm(formula = McDonald.s ~ S.P.500, data = df)
     ## 
     ## Residuals:
     ##       Min        1Q    Median        3Q       Max 
     ## -0.116819 -0.032679  0.003738  0.032409  0.151472 
     ## 
     ## Coefficients:
     ##             Estimate Std. Error t value Pr(>|t|)    
     ## (Intercept) 0.009299   0.010054   0.925 0.361536    
     ## S.P.500     1.503201   0.360942   4.165 0.000201 ***
     ## ---
     ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
     ## 
     ## Residual standard error: 0.05622 on 34 degrees of freedom
     ## Multiple R-squared:  0.3378, Adjusted R-squared:  0.3183 
     ## F-statistic: 17.34 on 1 and 34 DF,  p-value: 0.0002015

     lm6 <- lm(Sandisk ~ S.P.500, data = df)
     summary(lm6)

     ## 
     ## Call:
     ## lm(formula = Sandisk ~ S.P.500, data = df)
     ## 
     ## Residuals:
     ##     Min      1Q  Median      3Q     Max 
     ## -0.4180 -0.1311 -0.0068  0.1427  0.3261 
     ## 
     ## Coefficients:
     ##             Estimate Std. Error t value Pr(>|t|)  
     ## (Intercept)  0.04297    0.03320   1.294   0.2043  
     ## S.P.500      2.60484    1.19176   2.186   0.0358 *
     ## ---
     ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
     ## 
     ## Residual standard error: 0.1856 on 34 degrees of freedom
     ## Multiple R-squared:  0.1232, Adjusted R-squared:  0.09741 
     ## F-statistic: 4.777 on 1 and 34 DF,  p-value: 0.03582

     lm7 <- lm(Qualcomm ~ S.P.500, data = df)
     summary(lm7)

     ## 
     ## Call:
     ## lm(formula = Qualcomm ~ S.P.500, data = df)
     ## 
     ## Residuals:
     ##      Min       1Q   Median       3Q      Max 
     ## -0.24311 -0.05192  0.01269  0.04835  0.13106 
     ## 
     ## Coefficients:
     ##             Estimate Std. Error t value Pr(>|t|)   
     ## (Intercept)  0.01409    0.01410   0.999  0.32494   
     ## S.P.500      1.41389    0.50632   2.793  0.00852 **
     ## ---
     ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
     ## 
     ## Residual standard error: 0.07887 on 34 degrees of freedom
     ## Multiple R-squared:  0.1866, Adjusted R-squared:  0.1626 
     ## F-statistic: 7.798 on 1 and 34 DF,  p-value: 0.008524

     lm8 <- lm(Procter...Gamble ~ S.P.500, data = df)
     summary(lm8)

     ## 
     ## Call:
     ## lm(formula = Procter...Gamble ~ S.P.500, data = df)
     ## 
     ## Residuals:
     ##       Min        1Q    Median        3Q       Max 
     ## -0.062278 -0.023855  0.000239  0.017048  0.078124 
     ## 
     ## Coefficients:
     ##             Estimate Std. Error t value Pr(>|t|)  
     ## (Intercept) 0.005475   0.006275   0.873   0.3890  
     ## S.P.500     0.506533   0.225268   2.249   0.0311 *
     ## ---
     ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
     ## 
     ## Residual standard error: 0.03509 on 34 degrees of freedom
     ## Multiple R-squared:  0.1295, Adjusted R-squared:  0.1039 
     ## F-statistic: 5.056 on 1 and 34 DF,  p-value: 0.03113
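
The eight regressions above share one pattern, so the slopes could also be collected in a single pass. The sketch below is one possible way to do that, not the method used in this report: `fit_betas` is a hypothetical helper, and the data frame `sim` holds simulated returns (not the Beta file) so that the block runs on its own.

```r
# Hypothetical helper: regress every stock column on the market column
# and return the estimated slopes (betas) as a named vector.
fit_betas <- function(data, market = "S.P.500", skip = "Month") {
  stocks <- setdiff(names(data), c(market, skip))
  sapply(stocks, function(s) {
    fit <- lm(reformulate(market, response = s), data = data)
    unname(coef(fit)[market])
  })
}

# Simulated data with the same layout as the Beta file (assumption)
set.seed(42)
sim <- data.frame(Month = 1:36, S.P.500 = rnorm(36, mean = 0.01, sd = 0.03))
sim$StockA <- 0.5 * sim$S.P.500 + rnorm(36, sd = 0.02)
sim$StockB <- 1.5 * sim$S.P.500 + rnorm(36, sd = 0.02)

betas <- fit_betas(sim)
betas
```

With the real data loaded, `fit_betas(df)` would be expected to return the same eight slopes as the `S.P.500` rows of the coefficient tables, assuming the column names shown in the summary.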

Summary and interpretation

Volatility

The most volatile stocks (beta greater than 1):

Sandisk: Beta = 2.60
McDonald's: Beta = 1.50
Caterpillar: Beta = 1.49
Qualcomm: Beta = 1.41

The least volatile stocks (beta less than 1):

Exxon Mobil: Beta = 0.73
P & G: Beta = 0.51
Microsoft: Beta = 0.46
Johnson & Johnson: Beta = 0.009
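
The grouping by volatility can be reproduced from the slope estimates in the `S.P.500` rows of the coefficient tables. The sketch below types those estimates in by hand; the vector name `beta` is an assumption, and the cutoff at 1 follows the interpretation given in the Objective section.

```r
# Slope estimates copied by hand from the S.P.500 rows of the
# coefficient tables (not the intercepts)
beta <- c(
  Microsoft         = 0.4583448,
  Exxon.Mobil       = 0.730907,
  Caterpillar       = 1.49320,
  Johnson...Johnson = 0.008757,
  McDonald.s        = 1.503201,
  Sandisk           = 2.60484,
  Qualcomm          = 1.41389,
  Procter...Gamble  = 0.506533
)

# Sort from most to least volatile relative to the market
beta_sorted <- sort(beta, decreasing = TRUE)

more_volatile <- names(beta_sorted[beta_sorted > 1])
less_volatile <- names(beta_sorted[beta_sorted < 1])

round(beta_sorted, 2)
```

With the fitted models in memory, each value could instead be read with `coef(lm1)["S.P.500"]` and so on, which avoids copying the intercept by mistake.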

R-squared for each regression

     # Microsoft ~ S.P.500
     summary(lm1)$r.squared

     ## [1] 0.07074523

     # Exxon.Mobil ~ S.P.500
     summary(lm2)$r.squared

     ## [1] 0.1209275

     # Caterpillar ~ S.P.500
     summary(lm3)$r.squared

     ## [1] 0.3288034

     # Johnson...Johnson ~ S.P.500
     summary(lm4)$r.squared

     ## [1] 4.373112e-05

     # McDonald.s ~ S.P.500
     summary(lm5)$r.squared

     ## [1] 0.3378055

     # Sandisk ~ S.P.500
     summary(lm6)$r.squared

     ## [1] 0.123198

     # Qualcomm ~ S.P.500
     summary(lm7)$r.squared

     ## [1] 0.1865655

     # Procter...Gamble ~ S.P.500
     summary(lm8)$r.squared

     ## [1] 0.1294575
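
R-squared is the share of the variation in a stock's return that is explained by the market's return, and in a simple linear regression it equals the squared correlation between the two series. The sketch below illustrates this and collects R-squared from several models at once; it uses simulated returns rather than the Beta file, and the names `market`, `stock_a`, `stock_b`, and `models` are assumptions made for illustration.

```r
# Simulated monthly returns (assumption: NOT the data from the Beta file)
set.seed(7)
market  <- rnorm(36, mean = 0.01, sd = 0.03)
stock_a <- 0.5 * market + rnorm(36, sd = 0.04)
stock_b <- 1.5 * market + rnorm(36, sd = 0.04)

# Keep the fitted models in a named list so they can be processed together
models <- list(
  stock_a = lm(stock_a ~ market),
  stock_b = lm(stock_b ~ market)
)

# Extract R-squared from every model in one call
r2 <- sapply(models, function(m) summary(m)$r.squared)
r2

# In simple regression, R-squared equals the squared correlation
all.equal(unname(r2["stock_a"]), cor(stock_a, market)^2)
```

The same `sapply` call would work on `list(lm1, lm2, ..., lm8)` once those models have been fitted.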

DATA_4100

Aayush Sethi

4/8/2021