1

With a decreasing significance level (α), the power of a statistical test decreases: the lower the significance level, the lower the power of the test. If you reduce the significance level (e.g. from 0.05 to 0.01), the region of acceptance gets bigger. As a result, you are less likely to reject the null hypothesis, including when it is false, so you are more likely to make a Type II error.

2

In short, the power of the test is reduced when you reduce the significance level, and vice versa.
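A quick way to see this in R is `power.t.test()` from base R's `stats` package: holding the sample size, true difference and standard deviation fixed, lowering `sig.level` from 0.05 to 0.01 lowers the power. The values of `n`, `delta` and `sd` below are arbitrary illustrative assumptions, not taken from any question in this document.

```r
# Illustrative values only (assumed): 20 per group, true difference 1, sd 2
n_per_group <- 20
true_diff   <- 1
sd_assumed  <- 2

# Power of a two-sample t-test at alpha = 0.05 and at alpha = 0.01
power_05 <- power.t.test(n = n_per_group, delta = true_diff, sd = sd_assumed,
                         sig.level = 0.05)$power
power_01 <- power.t.test(n = n_per_group, delta = true_diff, sd = sd_assumed,
                         sig.level = 0.01)$power

# Lowering alpha enlarges the acceptance region, so power drops
# (and the Type II error rate, beta = 1 - power, rises)
print(c(alpha_0.05 = power_05, alpha_0.01 = power_01))
```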

Type I error: A Type I error occurs when the researcher rejects a null hypothesis when it is true. The probability of committing a Type I error is called the significance level and is often denoted by α (alpha).

Type II error: A Type II error occurs when the researcher fails to reject (accepts) a null hypothesis that is false. Its probability is often denoted by β (beta). The probability of not committing a Type II error, 1 - β, is called the power of the test.

The null hypothesis is that the burger contains 100 g of meat. A Type II error has occurred because the consumer group accepted the null hypothesis when in reality it is false: the true mean weight is less than 100 g.

3

4

In a cause-and-effect relationship, the independent variable is the cause and the dependent variable is the effect. Least squares linear regression is a method for predicting the value of the dependent variable Y (e.g. GPA) from the independent variable X (e.g. how much you study).

In regression analysis, the difference between the observed value of the dependent variable (y) and the predicted value (ŷ) is called the residual (e). Each data point has one residual.

Therefore, in a scatterplot showing a linear regression line through the data points, the residuals of a regression model are the vertical distances between the actual y-values and the predicted y-values on the line.
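As a small check of this definition, the sketch below fits a line to R's built-in `cars` data (used here only as a stand-in example) and confirms that observed minus predicted values reproduce the residuals stored in the model.

```r
# Fit a least squares line to R's built-in 'cars' data (stopping distance vs speed)
fit <- lm(dist ~ speed, data = cars)

# Residual = observed y - predicted y, one per data point
manual_resid <- cars$dist - fitted(fit)

# This matches the residuals R stores in the fitted model
all.equal(unname(manual_resid), unname(residuals(fit)))  # TRUE
length(residuals(fit))                                   # 50, one per observation
```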

5

The median of a sample of numerical values is the value that divides the sample into a higher and a lower half: it is the middle value of the sorted vector (or the average of the two middle values when the sample size is even).
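Both cases can be seen with R's `median()` on two made-up vectors:

```r
# Odd number of values: the median is the middle value after sorting
median(c(7, 1, 4))        # sorted: 1 4 7    -> 4
# Even number of values: the median is the mean of the two middle values
median(c(7, 1, 4, 10))    # sorted: 1 4 7 10 -> (4 + 7) / 2 = 5.5
```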

6

Diagnostic plots show the residuals in four different ways:

1 Residuals vs Fitted. The first plot depicts residuals versus fitted values. Residuals are measured as follows:

residual = observed y - model-predicted y

The plot of residuals versus predicted values is useful for checking the assumptions of linearity and homoscedasticity. If the linearity assumption is not met, the residuals show a systematic pattern (for example a curve) instead of scattering randomly around 0. Residuals that lie far from 0 (standardized values less than -2 or greater than 2) are deemed problematic and may be outliers. To assess whether the homoscedasticity assumption is met, we check that there is no pattern in the residuals (such as a funnel shape) and that they are equally spread around the y = 0 line.

2 Normal Q-Q. Used to examine whether the residuals are normally distributed. It's good if the residual points follow the straight dashed line.

3 Scale-Location (or Spread-Location): the square root of the absolute standardized residuals vs. predicted values. This is useful for checking the assumption of homoscedasticity; in this plot we check whether there is a pattern in the residuals. It's good if you see a horizontal line with equally (randomly) spread points.

4 Residuals vs Leverage (with Cook's distance). Used to identify influential cases, that is, extreme values that might change the regression results when included in or excluded from the analysis.

The diagnostic plot that allows the variance homogeneity assumption to be evaluated is the plot of residuals vs fitted values (the Scale-Location plot is designed for the same check).
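In R, calling `plot()` on an `lm` object draws these four panels by default, and `rstandard()` returns the standardized residuals used in the ±2 rule of thumb. The built-in `cars` data is used here purely as a stand-in model.

```r
fit <- lm(dist ~ speed, data = cars)

# Draw the four default diagnostic plots in a 2 x 2 grid, then restore settings
old_par <- par(mfrow = c(2, 2))
plot(fit)   # Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage
par(old_par)

# Standardized residuals; values beyond +/- 2 are flagged as potentially problematic
std_res <- rstandard(fit)
flagged <- which(abs(std_res) > 2)
flagged
```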

7

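Although the correlation coefficient r is moderately high (0.51), the p-value of 0.12 is above 0.05, so the correlation between anxiety and exam performance is not statistically significant. We therefore have no evidence of a correlation (which is not the same as proving there is none).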
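The p-value of a Pearson correlation depends on the sample size as well as on r, through t = r·sqrt(n - 2) / sqrt(1 - r²) with n - 2 degrees of freedom (the test that `cor.test()` performs). The anxiety data and its sample size are not given here, so the helper `r_pvalue()` below is hypothetical and the values of n are illustrative assumptions.

```r
# Hypothetical helper: two-sided p-value for a Pearson correlation r from n pairs
r_pvalue <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  2 * pt(-abs(t_stat), df = n - 2)
}

# The same r = 0.51 with different (assumed) sample sizes
r_pvalue(0.51, n = 10)    # small sample: not significant at 0.05
r_pvalue(0.51, n = 100)   # large sample: highly significant
```

This is why a moderately large r can still be non-significant in a small study.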

8

Response Variable | Explanatory Variable(s) | Statistical Model
--- | --- | ---
Continuous | Categorical | ANOVA
Continuous | Continuous | Linear regression
Continuous | Continuous & categorical | ANCOVA
Discrete | Continuous & categorical | GLM
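Each row of the table corresponds to a standard R model-fitting call. The sketch below uses built-in datasets chosen only for illustration (`PlantGrowth`, `cars`, `mtcars`); the binomial family for the GLM is one assumed example of a discrete (0/1) response.

```r
# ANOVA: continuous response, categorical explanatory variable
m_anova  <- aov(weight ~ group, data = PlantGrowth)

# Linear regression: continuous response, continuous explanatory variable
m_lm     <- lm(dist ~ speed, data = cars)

# ANCOVA: continuous response, continuous + categorical explanatory variables
m_ancova <- lm(mpg ~ wt + factor(am), data = mtcars)

# GLM: discrete response (binary 0/1 here), continuous + categorical explanatory variables
m_glm    <- glm(am ~ wt + factor(vs), family = binomial, data = mtcars)
```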

9

R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression.

R-squared = explained variation / total variation = 1 - RSS / TSS

The more variance that is accounted for by the regression model, the closer the data points will fall to the fitted regression line.

The residual sum of squares (RSS) is the sum of the squared residuals (the deviations of the predicted values from the actual observed values).

The total sum of squares (TSS) is the sum, over all observations, of the squared differences of each observation from the overall mean.

Since R² is a proportion, it is always a number between 0 and 1 (for a least squares fit with an intercept).
If R² = 1, all of the data points fall perfectly on the regression line, so the residual sum of squares is 0. The predictor x accounts for all of the variation in y.
If R² = 0, the estimated regression line is perfectly horizontal. The predictor x accounts for none of the variation in y.
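The formula R² = 1 - RSS / TSS can be checked by hand against the value R reports. The built-in `cars` data is an illustrative stand-in.

```r
fit <- lm(dist ~ speed, data = cars)
y   <- cars$dist

rss <- sum(residuals(fit)^2)      # residual sum of squares
tss <- sum((y - mean(y))^2)       # total sum of squares about the mean
r2  <- 1 - rss / tss              # proportion of variation explained

r2
summary(fit)$r.squared            # R's own value, for comparison
```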

10

power.anova.test(groups = 4, n = 15, within.var = 500, sig.level = 0.05, power = 0.8)
## 
##      Balanced one-way analysis of variance power calculation 
## 
##          groups = 4
##               n = 15
##     between.var = 129.9197
##      within.var = 500
##       sig.level = 0.05
##           power = 0.8
## 
## NOTE: n is number in each group

With 4 groups of 15, a within-group variance of 500, a significance level of 0.05 and a power of 0.8, the minimum between-group variance that can be detected is about 130.
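One way to sanity-check this answer is to pass the reported `between.var` back into `power.anova.test()` and leave `power` unspecified, so the function solves for power instead.

```r
# Plug the detectable between-group variance back in and solve for power
check <- power.anova.test(groups = 4, n = 15, between.var = 129.9197,
                          within.var = 500, sig.level = 0.05)
check$power   # should come back at approximately 0.8
```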

11

The regression equation is written as Y = a + bX + e

Y is the value of the dependent variable (Y), what is being predicted or explained
a or alpha, a constant, equals the value of Y when the value of X = 0
b or Beta, the coefficient of X; the slope of the regression line; how much Y changes for each one unit change in X
X is the independent variable (X), what is predicting or explaining the value of Y
e is the error term, the error in predicting the value of Y, given the value of X

In a regression analysis, the fitted regression line gives the estimated (predicted) values of Y, and you compare these with the actual values. The vertical distance between each observed value and the line is that observation's residual. Hence the residuals estimate e in this equation, the model error.
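The roles of a, b and e can be read directly off a fitted model. This sketch uses the built-in `cars` data as an assumed example, with `speed` as X and `dist` as Y.

```r
fit <- lm(dist ~ speed, data = cars)
a <- coef(fit)[["(Intercept)"]]   # a: predicted Y when X = 0
b <- coef(fit)[["speed"]]         # b: change in predicted Y per one-unit change in X

pred <- predict(fit, newdata = data.frame(speed = c(0, 1)))
pred[1]            # equals a
pred[2] - pred[1]  # equals b

# Each observed Y decomposes into fitted value plus residual: Y = a + bX + e
all.equal(unname(cars$dist), unname(fitted(fit) + residuals(fit)))
```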

12a

The interaction allows for separate slopes for each level of the categorical variable z. It tests whether the level of z affects the effect of x on y.

12b

ANCOVA would be used because the response variable (y) is continuous and the explanatory variables are a combination of a continuous variable (x) and a categorical variable (z).

lm(y ~ x * z)
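As an illustration of separate slopes, the sketch below assumes `mtcars` as example data, with `mpg` as y, `wt` as x and transmission type `am` as the categorical z. The slope for the second level is the base slope plus the interaction coefficient, and an F-test comparing the additive and interaction models tests whether the slopes differ.

```r
dat <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))

# Interaction model: separate intercept and slope of wt for each level of am
m_int <- lm(mpg ~ wt * am, data = dat)
# Additive model: common slope, separate intercepts
m_add <- lm(mpg ~ wt + am, data = dat)

cf <- coef(m_int)
slope_automatic <- cf[["wt"]]
slope_manual    <- cf[["wt"]] + cf[["wt:ammanual"]]
c(automatic = slope_automatic, manual = slope_manual)

# F-test of whether the slopes differ between the levels of am
anova(m_add, m_int)
```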

Part 2

13

The answer is C; the working is below.

data(swiss)
?swiss
summary(swiss)
##    Fertility      Agriculture     Examination      Education    
##  Min.   :35.00   Min.   : 1.20   Min.   : 3.00   Min.   : 1.00  
##  1st Qu.:64.70   1st Qu.:35.90   1st Qu.:12.00   1st Qu.: 6.00  
##  Median :70.40   Median :54.10   Median :16.00   Median : 8.00  
##  Mean   :70.14   Mean   :50.66   Mean   :16.49   Mean   :10.98  
##  3rd Qu.:78.45   3rd Qu.:67.65   3rd Qu.:22.00   3rd Qu.:12.00  
##  Max.   :92.50   Max.   :89.70   Max.   :37.00   Max.   :53.00  
##     Catholic       Infant.Mortality
##  Min.   :  2.150   Min.   :10.80   
##  1st Qu.:  5.195   1st Qu.:18.15   
##  Median : 15.140   Median :20.00   
##  Mean   : 41.144   Mean   :19.94   
##  3rd Qu.: 93.125   3rd Qu.:21.70   
##  Max.   :100.000   Max.   :26.60
fert <- swiss[swiss$Infant.Mortality < 20, ]
fert
##              Fertility Agriculture Examination Education Catholic
## Aigle             64.1        62.0          21        12     8.52
## Aubonne           66.9        67.5          14         7     2.27
## Cossonay          61.7        69.3          22         5     2.82
## La Vallee         54.3        15.2          31        20     2.15
## Morges            65.5        59.8          22        10     5.23
## Nyone             56.6        50.9          22        12    15.14
## Orbe              57.4        54.1          20         6     4.20
## Paysd'enhaut      72.0        63.5           6         3     2.56
## Rolle             60.5        60.8          16        10     7.72
## Conthey           75.5        85.9           3         2    99.71
## Entremont         69.3        84.9           7         6    99.68
## Herens            77.3        89.7           5         2   100.00
## Martigwy          70.5        78.2          12         6    98.96
## St Maurice        65.0        75.9           9         9    99.06
## Sierre            92.2        84.6           3         3    99.46
## Sion              79.3        63.1          13        13    96.83
## Le Locle          72.7        16.7          22        13    11.22
## ValdeTravers      67.6        18.7          25         7     8.65
## V. De Geneve      35.0         1.2          37        53    42.34
## Rive Droite       44.7        46.6          16        29    50.43
## Rive Gauche       42.8        27.7          22        29    58.33
##              Infant.Mortality
## Aigle                    16.5
## Aubonne                  19.1
## Cossonay                 18.7
## La Vallee                10.8
## Morges                   18.0
## Nyone                    16.7
## Orbe                     15.3
## Paysd'enhaut             18.0
## Rolle                    16.3
## Conthey                  15.1
## Entremont                19.8
## Herens                   18.3
## Martigwy                 19.4
## St Maurice               17.8
## Sierre                   16.3
## Sion                     18.1
## Le Locle                 18.9
## ValdeTravers             19.5
## V. De Geneve             18.0
## Rive Droite              18.2
## Rive Gauche              19.3
median(fert[ ,"Fertility"])# not a as fertility has a median of 65.5
## [1] 65.5
iff <- swiss[swiss$Infant.Mortality > 15, ]
iff
##              Fertility Agriculture Examination Education Catholic
## Courtelary        80.2        17.0          15        12     9.96
## Delemont          83.1        45.1           6         9    84.84
## Franches-Mnt      92.5        39.7           5         5    93.40
## Moutier           85.8        36.5          12         7    33.77
## Neuveville        76.9        43.5          17        15     5.16
## Porrentruy        76.1        35.3           9         7    90.57
## Broye             83.8        70.2          16         7    92.85
## Glane             92.4        67.8          14         8    97.16
## Gruyere           82.4        53.3          12         7    97.67
## Sarine            82.9        45.2          16        13    91.38
## Veveyse           87.1        64.5          14         6    98.61
## Aigle             64.1        62.0          21        12     8.52
## Aubonne           66.9        67.5          14         7     2.27
## Avenches          68.9        60.7          19        12     4.43
## Cossonay          61.7        69.3          22         5     2.82
## Echallens         68.3        72.6          18         2    24.20
## Grandson          71.7        34.0          17         8     3.30
## Lausanne          55.7        19.4          26        28    12.11
## Lavaux            65.1        73.0          19         9     2.84
## Morges            65.5        59.8          22        10     5.23
## Moudon            65.0        55.1          14         3     4.52
## Nyone             56.6        50.9          22        12    15.14
## Orbe              57.4        54.1          20         6     4.20
## Oron              72.5        71.2          12         1     2.40
## Payerne           74.2        58.1          14         8     5.23
## Paysd'enhaut      72.0        63.5           6         3     2.56
## Rolle             60.5        60.8          16        10     7.72
## Vevey             58.3        26.8          25        19    18.46
## Yverdon           65.4        49.5          15         8     6.10
## Conthey           75.5        85.9           3         2    99.71
## Entremont         69.3        84.9           7         6    99.68
## Herens            77.3        89.7           5         2   100.00
## Martigwy          70.5        78.2          12         6    98.96
## Monthey           79.4        64.9           7         3    98.22
## St Maurice        65.0        75.9           9         9    99.06
## Sierre            92.2        84.6           3         3    99.46
## Sion              79.3        63.1          13        13    96.83
## Boudry            70.4        38.4          26        12     5.62
## La Chauxdfnd      65.7         7.7          29        11    13.79
## Le Locle          72.7        16.7          22        13    11.22
## Neuchatel         64.4        17.6          35        32    16.92
## Val de Ruz        77.6        37.6          15         7     4.97
## ValdeTravers      67.6        18.7          25         7     8.65
## V. De Geneve      35.0         1.2          37        53    42.34
## Rive Droite       44.7        46.6          16        29    50.43
## Rive Gauche       42.8        27.7          22        29    58.33
##              Infant.Mortality
## Courtelary               22.2
## Delemont                 22.2
## Franches-Mnt             20.2
## Moutier                  20.3
## Neuveville               20.6
## Porrentruy               26.6
## Broye                    23.6
## Glane                    24.9
## Gruyere                  21.0
## Sarine                   24.4
## Veveyse                  24.5
## Aigle                    16.5
## Aubonne                  19.1
## Avenches                 22.7
## Cossonay                 18.7
## Echallens                21.2
## Grandson                 20.0
## Lausanne                 20.2
## Lavaux                   20.0
## Morges                   18.0
## Moudon                   22.4
## Nyone                    16.7
## Orbe                     15.3
## Oron                     21.0
## Payerne                  23.8
## Paysd'enhaut             18.0
## Rolle                    16.3
## Vevey                    20.9
## Yverdon                  22.5
## Conthey                  15.1
## Entremont                19.8
## Herens                   18.3
## Martigwy                 19.4
## Monthey                  20.2
## St Maurice               17.8
## Sierre                   16.3
## Sion                     18.1
## Boudry                   20.3
## La Chauxdfnd             20.5
## Le Locle                 18.9
## Neuchatel                23.0
## Val de Ruz               20.0
## ValdeTravers             19.5
## V. De Geneve             18.0
## Rive Droite              18.2
## Rive Gauche              19.3
sd(iff[ , "Education"]) # not b as standard deviation is 9.626152
## [1] 9.626152
popu <- swiss[swiss$Education <= 10, ]
popu
##              Fertility Agriculture Examination Education Catholic
## Delemont          83.1        45.1           6         9    84.84
## Franches-Mnt      92.5        39.7           5         5    93.40
## Moutier           85.8        36.5          12         7    33.77
## Porrentruy        76.1        35.3           9         7    90.57
## Broye             83.8        70.2          16         7    92.85
## Glane             92.4        67.8          14         8    97.16
## Gruyere           82.4        53.3          12         7    97.67
## Veveyse           87.1        64.5          14         6    98.61
## Aubonne           66.9        67.5          14         7     2.27
## Cossonay          61.7        69.3          22         5     2.82
## Echallens         68.3        72.6          18         2    24.20
## Grandson          71.7        34.0          17         8     3.30
## Lavaux            65.1        73.0          19         9     2.84
## Morges            65.5        59.8          22        10     5.23
## Moudon            65.0        55.1          14         3     4.52
## Orbe              57.4        54.1          20         6     4.20
## Oron              72.5        71.2          12         1     2.40
## Payerne           74.2        58.1          14         8     5.23
## Paysd'enhaut      72.0        63.5           6         3     2.56
## Rolle             60.5        60.8          16        10     7.72
## Yverdon           65.4        49.5          15         8     6.10
## Conthey           75.5        85.9           3         2    99.71
## Entremont         69.3        84.9           7         6    99.68
## Herens            77.3        89.7           5         2   100.00
## Martigwy          70.5        78.2          12         6    98.96
## Monthey           79.4        64.9           7         3    98.22
## St Maurice        65.0        75.9           9         9    99.06
## Sierre            92.2        84.6           3         3    99.46
## Val de Ruz        77.6        37.6          15         7     4.97
## ValdeTravers      67.6        18.7          25         7     8.65
##              Infant.Mortality
## Delemont                 22.2
## Franches-Mnt             20.2
## Moutier                  20.3
## Porrentruy               26.6
## Broye                    23.6
## Glane                    24.9
## Gruyere                  21.0
## Veveyse                  24.5
## Aubonne                  19.1
## Cossonay                 18.7
## Echallens                21.2
## Grandson                 20.0
## Lavaux                   20.0
## Morges                   18.0
## Moudon                   22.4
## Orbe                     15.3
## Oron                     21.0
## Payerne                  23.8
## Paysd'enhaut             18.0
## Rolle                    16.3
## Yverdon                  22.5
## Conthey                  15.1
## Entremont                19.8
## Herens                   18.3
## Martigwy                 19.4
## Monthey                  20.2
## St Maurice               17.8
## Sierre                   16.3
## Val de Ruz               20.0
## ValdeTravers             19.5
mean(popu[, "Agriculture"]) # c is correct: the mean % of males working in agriculture, in provinces where 10% or less of draftees had education beyond primary school, is 60.71
## [1] 60.71
church <- swiss[swiss$Fertility > 70, ]
church
##              Fertility Agriculture Examination Education Catholic
## Courtelary        80.2        17.0          15        12     9.96
## Delemont          83.1        45.1           6         9    84.84
## Franches-Mnt      92.5        39.7           5         5    93.40
## Moutier           85.8        36.5          12         7    33.77
## Neuveville        76.9        43.5          17        15     5.16
## Porrentruy        76.1        35.3           9         7    90.57
## Broye             83.8        70.2          16         7    92.85
## Glane             92.4        67.8          14         8    97.16
## Gruyere           82.4        53.3          12         7    97.67
## Sarine            82.9        45.2          16        13    91.38
## Veveyse           87.1        64.5          14         6    98.61
## Grandson          71.7        34.0          17         8     3.30
## Oron              72.5        71.2          12         1     2.40
## Payerne           74.2        58.1          14         8     5.23
## Paysd'enhaut      72.0        63.5           6         3     2.56
## Conthey           75.5        85.9           3         2    99.71
## Herens            77.3        89.7           5         2   100.00
## Martigwy          70.5        78.2          12         6    98.96
## Monthey           79.4        64.9           7         3    98.22
## Sierre            92.2        84.6           3         3    99.46
## Sion              79.3        63.1          13        13    96.83
## Boudry            70.4        38.4          26        12     5.62
## Le Locle          72.7        16.7          22        13    11.22
## Val de Ruz        77.6        37.6          15         7     4.97
##              Infant.Mortality
## Courtelary               22.2
## Delemont                 22.2
## Franches-Mnt             20.2
## Moutier                  20.3
## Neuveville               20.6
## Porrentruy               26.6
## Broye                    23.6
## Glane                    24.9
## Gruyere                  21.0
## Sarine                   24.4
## Veveyse                  24.5
## Grandson                 20.0
## Oron                     21.0
## Payerne                  23.8
## Paysd'enhaut             18.0
## Conthey                  15.1
## Herens                   18.3
## Martigwy                 19.4
## Monthey                  20.2
## Sierre                   16.3
## Sion                     18.1
## Boudry                   20.3
## Le Locle                 18.9
## Val de Ruz               20.0
var(church[, "Catholic"]) # d is not correct: the variance of % Catholic in provinces where Fertility is over 70 is 1977.215
## [1] 1977.215
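The same four statistics can be computed without printing each subset, by indexing the column directly with a logical condition. This is a more compact equivalent of the working above, not a different method.

```r
data(swiss)

# a) median Fertility where Infant.Mortality < 20
median(swiss$Fertility[swiss$Infant.Mortality < 20])   # 65.5
# b) standard deviation of Education where Infant.Mortality > 15
sd(swiss$Education[swiss$Infant.Mortality > 15])        # 9.626152
# c) mean Agriculture where Education <= 10
mean(swiss$Agriculture[swiss$Education <= 10])          # 60.71
# d) variance of Catholic where Fertility > 70
var(swiss$Catholic[swiss$Fertility > 70])               # 1977.215
```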

14

qnorm(p = 0.1, mean = 35, sd = 8, lower.tail = T)
## [1] 24.74759
# answer: about 24.7 cm, i.e. roughly 25 cm
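The same percentile can be obtained from the standard normal z-score, x = mean + z × sd, which shows what `qnorm()` is doing:

```r
# The 10th percentile via the z-score: x = mean + z * sd
z <- qnorm(0.1)       # about -1.2816
35 + z * 8            # same as qnorm(p = 0.1, mean = 35, sd = 8)

# Check: the probability below that value is 0.1
pnorm(35 + z * 8, mean = 35, sd = 8)
```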

15a

pnorm(q = 60, mean = 50, sd = 5, lower.tail = F)
## [1] 0.02275013
0.02275013*100
## [1] 2.275013
# answer is 2.3%

15b

(pnorm(q = 60, mean = 50, sd = 5, lower.tail = F))*220
## [1] 5.005029
# I would expect to be late about 5 times out of 220 classes
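A simulation gives an independent check of 15a and 15b, assuming (as those answers do) that the times are normally distributed with mean 50 and sd 5. The seed and number of draws are arbitrary choices.

```r
set.seed(1)                              # arbitrary seed, for reproducibility
sims <- rnorm(1e6, mean = 50, sd = 5)    # simulated times, assumed normal as in 15a
mean(sims > 60)                          # close to the exact 0.02275
mean(sims > 60) * 220                    # close to the expected 5 late arrivals
```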

16

before <- c(2.98,   2.70,   2.60,   2.94,   2.55,   2.92,   2.94,   2.94,   2.50,   3.41,   2.22,   3.07)
after <- c(2.63,    2.43,   2.34,   2.41,   2.28,   2.44,   2.45,   2.44,   2.26,   2.96,   2.07,   2.79)
boxplot(before, after)

t.test(before, after, mu = 0, paired = TRUE)
## 
##  Paired t-test
## 
## data:  before and after
## t = 9.6658, df = 11, p-value = 1.037e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.2748072 0.4368595
## sample estimates:
## mean of the differences 
##               0.3558333

The null hypothesis in this scenario is that the mean difference between the before and after LDL levels is zero. The alternative hypothesis is that the mean difference is not zero. With t = 9.6658 (df = 11) and p < 0.001, we reject the null hypothesis: the 3-month treatment produced a statistically significant decrease in blood LDL levels (mean decrease 0.356 g L-1, 95% CI 0.275 to 0.437 g L-1).

A paired t-test was used, because the same subjects were measured before and after the treatment.
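A paired t-test is equivalent to a one-sample t-test on the within-subject differences against a mean of 0, which the following check confirms using the data above:

```r
before <- c(2.98, 2.70, 2.60, 2.94, 2.55, 2.92, 2.94, 2.94, 2.50, 3.41, 2.22, 3.07)
after  <- c(2.63, 2.43, 2.34, 2.41, 2.28, 2.44, 2.45, 2.44, 2.26, 2.96, 2.07, 2.79)

paired  <- t.test(before, after, paired = TRUE)
one_smp <- t.test(before - after, mu = 0)   # one-sample test on the differences

# Identical t statistics (9.6658) and p-values
c(paired = unname(paired$statistic), one_sample = unname(one_smp$statistic))
```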

16 b

A Type I error in this scenario would be to falsely reject the null hypothesis and incorrectly conclude that a 3-month intake of the drug has a statistically significant effect on blood LDL levels when in fact it has none.

16 c

t.test(before, after, mu = 0, paired = TRUE, conf.level = 0.90)
## 
##  Paired t-test
## 
## data:  before and after
## t = 9.6658, df = 11, p-value = 1.037e-06
## alternative hypothesis: true difference in means is not equal to 0
## 90 percent confidence interval:
##  0.2897204 0.4219463
## sample estimates:
## mean of the differences 
##               0.3558333
#90 percent confidence interval: 0.2897204 0.4219463 g L-1
#DO NOT FORGET UNITS

16 d

yy <- before - after
sd(yy)
## [1] 0.127526
(0.127526/0.02)^2
## [1] 40.6572
# round up: you will need 41 people
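The calculation above follows from SE = sd / sqrt(n), rearranged to n = (sd / SE)². The sketch below assumes that 0.02 g L-1 is the target standard error of the mean difference (the question text is not shown here) and uses `ceiling()` so n is rounded up to a whole number of people.

```r
before <- c(2.98, 2.70, 2.60, 2.94, 2.55, 2.92, 2.94, 2.94, 2.50, 3.41, 2.22, 3.07)
after  <- c(2.63, 2.43, 2.34, 2.41, 2.28, 2.44, 2.45, 2.44, 2.26, 2.96, 2.07, 2.79)

diffs     <- before - after
target_se <- 0.02                                  # assumed target SE (g L-1)
n_needed  <- ceiling((sd(diffs) / target_se)^2)    # round up to whole people
n_needed                                           # 41
```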

17

The extra tenderness gained with each day of curing is 5, as this is the coefficient (slope).

100-15
## [1] 85
85/5
## [1] 17
# 17 days are required to get a rating of 100
15 + 5*0
## [1] 15
#expected tenderness value of meat is 15 if not cured at all
15 + 5*30
## [1] 165
#expected tenderness value of 165 if cured for 30 days
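The fitted line used above, tenderness = 15 + 5 × days, can be wrapped in two small functions, one for predicting a rating and one for inverting the line to find the days needed. Both function names are hypothetical helpers for illustration.

```r
# Hypothetical helper for the fitted line: tenderness = 15 + 5 * days
tenderness <- function(days) 15 + 5 * days
# Hypothetical inverse: days of curing needed for a given rating
days_for <- function(rating) (rating - 15) / 5

tenderness(c(0, 30))   # 15 165
days_for(100)          # 17
```

Predictions for curing times well outside the range used to fit the line are extrapolations and may not hold.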

18

data("faithful")
faithful
##     eruptions waiting
## 1       3.600      79
## 2       1.800      54
## 3       3.333      74
## 4       2.283      62
## 5       4.533      85
## 6       2.883      55
## 7       4.700      88
## 8       3.600      85
## 9       1.950      51
## 10      4.350      85
## 11      1.833      54
## 12      3.917      84
## 13      4.200      78
## 14      1.750      47
## 15      4.700      83
## 16      2.167      52
## 17      1.750      62
## 18      4.800      84
## 19      1.600      52
## 20      4.250      79
## 21      1.800      51
## 22      1.750      47
## 23      3.450      78
## 24      3.067      69
## 25      4.533      74
## 26      3.600      83
## 27      1.967      55
## 28      4.083      76
## 29      3.850      78
## 30      4.433      79
## 31      4.300      73
## 32      4.467      77
## 33      3.367      66
## 34      4.033      80
## 35      3.833      74
## 36      2.017      52
## 37      1.867      48
## 38      4.833      80
## 39      1.833      59
## 40      4.783      90
## 41      4.350      80
## 42      1.883      58
## 43      4.567      84
## 44      1.750      58
## 45      4.533      73
## 46      3.317      83
## 47      3.833      64
## 48      2.100      53
## 49      4.633      82
## 50      2.000      59
## 51      4.800      75
## 52      4.716      90
## 53      1.833      54
## 54      4.833      80
## 55      1.733      54
## 56      4.883      83
## 57      3.717      71
## 58      1.667      64
## 59      4.567      77
## 60      4.317      81
## 61      2.233      59
## 62      4.500      84
## 63      1.750      48
## 64      4.800      82
## 65      1.817      60
## 66      4.400      92
## 67      4.167      78
## 68      4.700      78
## 69      2.067      65
## 70      4.700      73
## 71      4.033      82
## 72      1.967      56
## 73      4.500      79
## 74      4.000      71
## 75      1.983      62
## 76      5.067      76
## 77      2.017      60
## 78      4.567      78
## 79      3.883      76
## 80      3.600      83
## 81      4.133      75
## 82      4.333      82
## 83      4.100      70
## 84      2.633      65
## 85      4.067      73
## 86      4.933      88
## 87      3.950      76
## 88      4.517      80
## 89      2.167      48
## 90      4.000      86
## 91      2.200      60
## 92      4.333      90
## 93      1.867      50
## 94      4.817      78
## 95      1.833      63
## 96      4.300      72
## 97      4.667      84
## 98      3.750      75
## 99      1.867      51
## 100     4.900      82
## 101     2.483      62
## 102     4.367      88
## 103     2.100      49
## 104     4.500      83
## 105     4.050      81
## 106     1.867      47
## 107     4.700      84
## 108     1.783      52
## 109     4.850      86
## 110     3.683      81
## 111     4.733      75
## 112     2.300      59
## 113     4.900      89
## 114     4.417      79
## 115     1.700      59
## 116     4.633      81
## 117     2.317      50
## 118     4.600      85
## 119     1.817      59
## 120     4.417      87
## 121     2.617      53
## 122     4.067      69
## 123     4.250      77
## 124     1.967      56
## 125     4.600      88
## 126     3.767      81
## 127     1.917      45
## 128     4.500      82
## 129     2.267      55
## 130     4.650      90
## 131     1.867      45
## 132     4.167      83
## 133     2.800      56
## 134     4.333      89
## 135     1.833      46
## 136     4.383      82
## 137     1.883      51
## 138     4.933      86
## 139     2.033      53
## 140     3.733      79
## 141     4.233      81
## 142     2.233      60
## 143     4.533      82
## 144     4.817      77
## 145     4.333      76
## 146     1.983      59
## 147     4.633      80
## 148     2.017      49
## 149     5.100      96
## 150     1.800      53
## 151     5.033      77
## 152     4.000      77
## 153     2.400      65
## 154     4.600      81
## 155     3.567      71
## 156     4.000      70
## 157     4.500      81
## 158     4.083      93
## 159     1.800      53
## 160     3.967      89
## 161     2.200      45
## 162     4.150      86
## 163     2.000      58
## 164     3.833      78
## 165     3.500      66
## 166     4.583      76
## 167     2.367      63
## 168     5.000      88
## 169     1.933      52
## 170     4.617      93
## 171     1.917      49
## 172     2.083      57
## 173     4.583      77
## 174     3.333      68
## 175     4.167      81
## 176     4.333      81
## 177     4.500      73
## 178     2.417      50
## 179     4.000      85
## 180     4.167      74
## 181     1.883      55
## 182     4.583      77
## 183     4.250      83
## 184     3.767      83
## 185     2.033      51
## 186     4.433      78
## 187     4.083      84
## 188     1.833      46
## 189     4.417      83
## 190     2.183      55
## 191     4.800      81
## 192     1.833      57
## 193     4.800      76
## 194     4.100      84
## 195     3.966      77
## 196     4.233      81
## 197     3.500      87
## 198     4.366      77
## 199     2.250      51
## 200     4.667      78
## 201     2.100      60
## 202     4.350      82
## 203     4.133      91
## 204     1.867      53
## 205     4.600      78
## 206     1.783      46
## 207     4.367      77
## 208     3.850      84
## 209     1.933      49
## 210     4.500      83
## 211     2.383      71
## 212     4.700      80
## 213     1.867      49
## 214     3.833      75
## 215     3.417      64
## 216     4.233      76
## 217     2.400      53
## 218     4.800      94
## 219     2.000      55
## 220     4.150      76
## 221     1.867      50
## 222     4.267      82
## 223     1.750      54
## 224     4.483      75
## 225     4.000      78
## 226     4.117      79
## 227     4.083      78
## 228     4.267      78
## 229     3.917      70
## 230     4.550      79
## 231     4.083      70
## 232     2.417      54
## 233     4.183      86
## 234     2.217      50
## 235     4.450      90
## 236     1.883      54
## 237     1.850      54
## 238     4.283      77
## 239     3.950      79
## 240     2.333      64
## 241     4.150      75
## 242     2.350      47
## 243     4.933      86
## 244     2.900      63
## 245     4.583      85
## 246     3.833      82
## 247     2.083      57
## 248     4.367      82
## 249     2.133      67
## 250     4.350      74
## 251     2.200      54
## 252     4.450      83
## 253     3.567      73
## 254     4.500      73
## 255     4.150      88
## 256     3.817      80
## 257     3.917      71
## 258     4.450      83
## 259     2.000      56
## 260     4.283      79
## 261     4.767      78
## 262     4.533      84
## 263     1.850      58
## 264     4.250      83
## 265     1.983      43
## 266     2.250      60
## 267     4.750      75
## 268     4.117      81
## 269     2.150      46
## 270     4.417      90
## 271     1.817      46
## 272     4.467      74
?faithful

plot(eruptions ~ waiting, data = faithful, ylim = c(0, 6), xlim = c(0, 120))

pewpew <- lm(eruptions ~ waiting, data = faithful)

18b

par(mfrow = c(2, 2))

plot(pewpew)

The residual points on the Normal Q-Q plot follow the straight dashed line, so the residuals are approximately normally distributed. Variance homogeneity is relatively well met; however, the residuals vs fitted plot shows essentially two clouds of data and a red line that is not horizontal, suggesting some violation of variance homogeneity.

18c

predict(pewpew, newdata = data.frame(waiting = 60), interval = "confidence" )
##        fit      lwr      upr
## 1 2.663661 2.587644 2.739678

18d(i)

The proportion of variation explained by a linear model is given by its R² value. The adjusted R² of 0.8108 means that the linear model explains about 81% of the variation in eruption duration.

summary(pewpew)
## 
## Call:
## lm(formula = eruptions ~ waiting, data = faithful)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.29917 -0.37689  0.03508  0.34909  1.19329 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.874016   0.160143  -11.70   <2e-16 ***
## waiting      0.075628   0.002219   34.09   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4965 on 270 degrees of freedom
## Multiple R-squared:  0.8115, Adjusted R-squared:  0.8108 
## F-statistic:  1162 on 1 and 270 DF,  p-value: < 2.2e-16

The large F-statistic and small associated p-value indicate that the linear model is a significant improvement over the intercept-only model (F1,270 = 1162, P < 0.001).

18 e

Intercept and slope are both statistically significant. However, the negative eruption time at 0 minutes waiting time implied by the intercept (-1.87) does not make any sense in the real world. Therefore, the intercept should be removed from the model to force the regression line through the origin.
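In R formula syntax, adding `0 +` (or `- 1`) drops the intercept. The sketch below fits the through-origin model and also prints the observed range of waiting times, which shows how far 0 minutes lies from the data.

```r
fit_origin <- lm(eruptions ~ 0 + waiting, data = faithful)   # "0 +" drops the intercept
coef(fit_origin)
range(faithful$waiting)   # observed waiting times: 43 to 96 minutes
```

Note that `summary()` computes R² for a no-intercept model about zero rather than about the mean, so it is not directly comparable with the 0.8115 reported for the model with an intercept.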

19

data("PlantGrowth")
PlantGrowth
##    weight group
## 1    4.17  ctrl
## 2    5.58  ctrl
## 3    5.18  ctrl
## 4    6.11  ctrl
## 5    4.50  ctrl
## 6    4.61  ctrl
## 7    5.17  ctrl
## 8    4.53  ctrl
## 9    5.33  ctrl
## 10   5.14  ctrl
## 11   4.81  trt1
## 12   4.17  trt1
## 13   4.41  trt1
## 14   3.59  trt1
## 15   5.87  trt1
## 16   3.83  trt1
## 17   6.03  trt1
## 18   4.89  trt1
## 19   4.32  trt1
## 20   4.69  trt1
## 21   6.31  trt2
## 22   5.12  trt2
## 23   5.54  trt2
## 24   5.50  trt2
## 25   5.37  trt2
## 26   5.29  trt2
## 27   4.92  trt2
## 28   6.15  trt2
## 29   5.80  trt2
## 30   5.26  trt2
?PlantGrowth
# weight is the response variable and treatment group is the explanatory variable, so weight goes first (left of ~)
weed <- aov(weight ~ group, data = PlantGrowth )
weed
## Call:
##    aov(formula = weight ~ group, data = PlantGrowth)
## 
## Terms:
##                    group Residuals
## Sum of Squares   3.76634  10.49209
## Deg. of Freedom        2        27
## 
## Residual standard error: 0.6233746
## Estimated effects may be unbalanced
plot(weed)

The residuals vs fitted plot shows equally spread points with a relatively straight red line, indicating homogeneity of variance. The points on the normal Q-Q plot lie on or close to the 1:1 line, indicating normality. The Cook's distance (residuals vs leverage) plot shows no observations that are highly influential.

summary.lm(weed)
## 
## Call:
## aov(formula = weight ~ group, data = PlantGrowth)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0710 -0.4180 -0.0060  0.2627  1.3690 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   5.0320     0.1971  25.527   <2e-16 ***
## grouptrt1    -0.3710     0.2788  -1.331   0.1944    
## grouptrt2     0.4940     0.2788   1.772   0.0877 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6234 on 27 degrees of freedom
## Multiple R-squared:  0.2641, Adjusted R-squared:  0.2096 
## F-statistic: 4.846 on 2 and 27 DF,  p-value: 0.01591
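The last line of the `summary.lm()` output above is the overall F-test. The conventional ANOVA table, and pairwise comparisons between the three groups, can be obtained with `summary()` and `TukeyHSD()` from base R's `stats`. Tukey's HSD is offered here as one common post-hoc choice, not necessarily the one the question expects.

```r
weed <- aov(weight ~ group, data = PlantGrowth)
summary(weed)    # one-way ANOVA table: F = 4.846 on 2 and 27 df, p = 0.0159
TukeyHSD(weed)   # pairwise group comparisons with family-wise adjusted p-values
```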

20

Value of the intercept

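t-value = parameter estimate / standard error, so intercept = t-value x SE = 40.516 x 6.167 = 249.86

t-value of the slope = slope / SE = 21.326 / 1.018 = 20.949 (t takes the sign of the slope estimate, so it is -20.949 only if the slope is negative)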
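The arithmetic can be checked in R using the numbers given above:

```r
# t = estimate / SE, so estimate = t * SE
t_intercept  <- 40.516
se_intercept <- 6.167
t_intercept * se_intercept    # about 249.86

slope    <- 21.326
se_slope <- 1.018
slope / se_slope              # about 20.95 (t takes the sign of the slope estimate)
```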