As the significance level decreases, so does the power of a statistical test: the lower the significance level, the lower the power. If you reduce the significance level (e.g. from 0.05 to 0.01), the region of acceptance gets bigger. As a result, you are less likely to reject the null hypothesis, including when it is false, so you are more likely to make a Type II error.
In short, the power of the test is reduced when you reduce the significance level, and vice versa.
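As a quick sketch of this trade-off, base R's power.t.test can be compared at two significance levels; the sample size, effect size and standard deviation below are arbitrary, assumed values.
# Illustration only: identical design, two significance levels (assumed numbers)
power.t.test(n = 20, delta = 1, sd = 2, sig.level = 0.05)$power  # higher power
power.t.test(n = 20, delta = 1, sd = 2, sig.level = 0.01)$power  # lower power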
Type I error: a Type I error occurs when a researcher rejects a null hypothesis that is true. The probability of committing a Type I error is called the significance level and is often denoted by alpha.
Type II error: a Type II error occurs when the researcher accepts (fails to reject) a null hypothesis that is false. Its probability is often denoted by beta. The probability of not committing a Type II error is called the power of the test.
The hypothesis is that the burger contains 100 g of meat. A Type II error has occurred: the consumer group has accepted the null hypothesis when in reality it is false and the mean weight is less than 100 g.
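A small simulation sketch of this situation (the true mean of 97 g, sd of 5 g and sample size of 10 are made-up assumptions): the proportion of samples that fail to reject H0: mu = 100 g estimates the Type II error rate.
# Hypothetical numbers: true mean 97 g, sd 5 g, n = 10, testing H0: mu = 100 g
set.seed(1)
fail_to_reject <- replicate(5000, t.test(rnorm(10, mean = 97, sd = 5), mu = 100)$p.value > 0.05)
mean(fail_to_reject)  # estimated Type II error rate (beta) under these assumptions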
In a cause-and-effect relationship, the independent variable is the cause and the dependent variable is the effect. Least-squares linear regression is a method for predicting the value of the dependent variable Y (GPA) based on the independent variable X (how much you study).
In regression analysis, the difference between the observed value of the dependent variable (y) and the predicted value (ŷ) is called the residual (e). Each data point has one residual.
Therefore, in a scatterplot showing a linear regression line through data points, the residuals of the regression model are computed as the difference between the actual y-values and the predicted y-values.
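A minimal sketch with made-up x and y values, computing the residuals by hand and checking them against residuals():
# Made-up data purely for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)
fit <- lm(y ~ x)
res_manual <- y - predict(fit)  # residual = observed y - predicted y
all.equal(unname(res_manual), unname(residuals(fit)))  # TRUE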
The median of a sample of numerical values is the value that divides the sample into a lower half and an upper half; it is the middle value of the sorted vector.
Diagnostic plots show the residuals in four different ways; a minimal R sketch for producing all four appears after the list below.
1 Residuals vs Fitted. The first plot depicts residuals versus fitted values. Residuals are measured as follows:
residual = observed y - model-predicted y
The plot of residuals versus predicted values is useful for checking the assumption of linearity and homoscedasticity. If the model does not meet the linear model assumption, we would expect to see residuals that are very large (big positive value or big negative value). To assess the assumption of linearity we want to ensure that the residuals are not too far away from 0 (standardized values less than -2 or greater than 2 are deemed problematic). To assess if the homoscedasticity assumption is met we look to make sure that there is no pattern in the residuals and that they are equally spread around the y = 0 line.
2 Normal Q-Q. Used to examine whether the residuals are normally distributed. It's good if the residual points follow the straight dashed line.
3 Scale-Location (or Spread-Location): square-rooted standardized residuals vs. fitted values. This is useful for checking the assumption of homoscedasticity; in this particular plot we are checking to see whether there is a pattern in the residuals. It's good if you see a horizontal line with equally (randomly) spread points.
4 Residuals vs Leverage (Cook's distance). Used to identify influential cases, that is, extreme values that might influence the regression results when included in or excluded from the analysis.
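A minimal sketch of how all four plots are produced for a fitted model (the built-in cars data set is used here only as a stand-in example):
# Stand-in example model using the built-in cars data
fit <- lm(dist ~ speed, data = cars)
par(mfrow = c(2, 2))  # 2 x 2 layout so all four plots appear together
plot(fit)             # Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage
par(mfrow = c(1, 1))  # reset the layout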
The diagnostic plot that allows for the evaluation of the variance-homogeneity assumption is the plot of residuals vs fitted values.
While the R value is quite high (0.51), the P value is also quite high at 0.12, indicating that the relationship between anxiety and exam performance is not statistically significant; therefore, no correlation can be concluded.
| Response Variable | Explanatory Variable(s) | Statistical Model |
| ----------------- | ----------------------------- | ----------------- |
| Continuous | Categorical | ANOVA |
| Continuous | Continuous | Linear Regression |
| Continuous | Continuous & categorical | ANCOVA |
| Discrete | Continuous and/or categorical | GLM |
R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination (or the coefficient of multiple determination for multiple regression).
R-squared = Explained variation / Total variation = 1 - RSS/TSS
The more variance that is accounted for by the regression model, the closer the data points will fall to the fitted regression line.
The residual sum of squares (RSS) is the sum of the squared residuals (the deviations of the predicted values from the actual empirical values of the data).
The total sum of squares (TSS) is the sum, over all observations, of the squared differences of each observation from the overall mean.
Since r2 is a proportion, it is always a number between 0 and 1.
If r2 = 1, all of the data points fall perfectly on the regression line. The predictor x accounts for all of the variation in y!
If r2 = 0, the estimated regression line is perfectly horizontal. The predictor x accounts for none of the variation in y!
If r2 = 1, all of the data points fall perfectly on the regression line, meaning that the residual sum of squares will be 0.
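As a sketch, R-squared can be recovered from RSS and TSS and checked against summary(); the model below, again on the built-in cars data, is only an example.
fit <- lm(dist ~ speed, data = cars)          # example model only
rss <- sum(residuals(fit)^2)                  # residual sum of squares
tss <- sum((cars$dist - mean(cars$dist))^2)   # total sum of squares
1 - rss / tss                                 # matches summary(fit)$r.squared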
power.anova.test(groups = 4, n = 15, within.var = 500, sig.level = 0.05, power = 0.8)
##
## Balanced one-way analysis of variance power calculation
##
## groups = 4
## n = 15
## between.var = 129.9197
## within.var = 500
## sig.level = 0.05
## power = 0.8
##
## NOTE: n is number in each group
The minimum between-group variance that allows you to detect a potential difference between the groups (with 4 groups of 15, a within-group variance of 500, a significance level of 0.05 and a power of 0.8) is approximately 130.
The regression equation is written as Y = a + bX + e
Y is the value of the dependent variable (Y), what is being predicted or explained
a, or alpha, is a constant: the value of Y when X = 0 (the intercept)
b or Beta, the coefficient of X; the slope of the regression line; how much Y changes for each one unit change in X
X is the independent variable (X), what is predicting or explaining the value of Y
e is the error term, the error in predicting the value of Y, given the value of X
In a regression analysis you plot a regression line (the estimated values) and compare the estimated values with the actual values; the distances between them are the errors. Hence e in this equation indicates the residuals, which estimate the model error
The interaction allows a separate slope for each level of the categorical variable. This indicates that the level of z affects x's effect on y.
ANCOVA would be used, as the response variable (y) is continuous and the explanatory variables are a combination of continuous (x) and categorical (z).
lm(y ~ x * z)
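A minimal sketch with simulated data (all names and numbers are assumptions) showing that the interaction term gives each level of z its own slope for x:
# Simulated illustration: z shifts both the intercept and the slope of x
set.seed(42)
x <- runif(60, 0, 10)
z <- factor(rep(c("A", "B"), each = 30))
y <- ifelse(z == "A", 2 + 1.0 * x, 5 + 0.3 * x) + rnorm(60, sd = 1)
fit <- lm(y ~ x * z)  # equivalent to y ~ x + z + x:z
coef(fit)             # the x:zB term estimates how much the slope differs for level B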
The answer is C; the working is below.
data(swiss)
?swiss
summary(swiss)
## Fertility Agriculture Examination Education
## Min. :35.00 Min. : 1.20 Min. : 3.00 Min. : 1.00
## 1st Qu.:64.70 1st Qu.:35.90 1st Qu.:12.00 1st Qu.: 6.00
## Median :70.40 Median :54.10 Median :16.00 Median : 8.00
## Mean :70.14 Mean :50.66 Mean :16.49 Mean :10.98
## 3rd Qu.:78.45 3rd Qu.:67.65 3rd Qu.:22.00 3rd Qu.:12.00
## Max. :92.50 Max. :89.70 Max. :37.00 Max. :53.00
## Catholic Infant.Mortality
## Min. : 2.150 Min. :10.80
## 1st Qu.: 5.195 1st Qu.:18.15
## Median : 15.140 Median :20.00
## Mean : 41.144 Mean :19.94
## 3rd Qu.: 93.125 3rd Qu.:21.70
## Max. :100.000 Max. :26.60
fert <- swiss[swiss$Infant.Mortality < 20, ]
fert
## Fertility Agriculture Examination Education Catholic
## Aigle 64.1 62.0 21 12 8.52
## Aubonne 66.9 67.5 14 7 2.27
## Cossonay 61.7 69.3 22 5 2.82
## La Vallee 54.3 15.2 31 20 2.15
## Morges 65.5 59.8 22 10 5.23
## Nyone 56.6 50.9 22 12 15.14
## Orbe 57.4 54.1 20 6 4.20
## Paysd'enhaut 72.0 63.5 6 3 2.56
## Rolle 60.5 60.8 16 10 7.72
## Conthey 75.5 85.9 3 2 99.71
## Entremont 69.3 84.9 7 6 99.68
## Herens 77.3 89.7 5 2 100.00
## Martigwy 70.5 78.2 12 6 98.96
## St Maurice 65.0 75.9 9 9 99.06
## Sierre 92.2 84.6 3 3 99.46
## Sion 79.3 63.1 13 13 96.83
## Le Locle 72.7 16.7 22 13 11.22
## ValdeTravers 67.6 18.7 25 7 8.65
## V. De Geneve 35.0 1.2 37 53 42.34
## Rive Droite 44.7 46.6 16 29 50.43
## Rive Gauche 42.8 27.7 22 29 58.33
## Infant.Mortality
## Aigle 16.5
## Aubonne 19.1
## Cossonay 18.7
## La Vallee 10.8
## Morges 18.0
## Nyone 16.7
## Orbe 15.3
## Paysd'enhaut 18.0
## Rolle 16.3
## Conthey 15.1
## Entremont 19.8
## Herens 18.3
## Martigwy 19.4
## St Maurice 17.8
## Sierre 16.3
## Sion 18.1
## Le Locle 18.9
## ValdeTravers 19.5
## V. De Geneve 18.0
## Rive Droite 18.2
## Rive Gauche 19.3
median(fert[ , "Fertility"]) # not (a), as Fertility here has a median of 65.5
## [1] 65.5
iff <- swiss[swiss$Infant.Mortality > 15, ]
iff
## Fertility Agriculture Examination Education Catholic
## Courtelary 80.2 17.0 15 12 9.96
## Delemont 83.1 45.1 6 9 84.84
## Franches-Mnt 92.5 39.7 5 5 93.40
## Moutier 85.8 36.5 12 7 33.77
## Neuveville 76.9 43.5 17 15 5.16
## Porrentruy 76.1 35.3 9 7 90.57
## Broye 83.8 70.2 16 7 92.85
## Glane 92.4 67.8 14 8 97.16
## Gruyere 82.4 53.3 12 7 97.67
## Sarine 82.9 45.2 16 13 91.38
## Veveyse 87.1 64.5 14 6 98.61
## Aigle 64.1 62.0 21 12 8.52
## Aubonne 66.9 67.5 14 7 2.27
## Avenches 68.9 60.7 19 12 4.43
## Cossonay 61.7 69.3 22 5 2.82
## Echallens 68.3 72.6 18 2 24.20
## Grandson 71.7 34.0 17 8 3.30
## Lausanne 55.7 19.4 26 28 12.11
## Lavaux 65.1 73.0 19 9 2.84
## Morges 65.5 59.8 22 10 5.23
## Moudon 65.0 55.1 14 3 4.52
## Nyone 56.6 50.9 22 12 15.14
## Orbe 57.4 54.1 20 6 4.20
## Oron 72.5 71.2 12 1 2.40
## Payerne 74.2 58.1 14 8 5.23
## Paysd'enhaut 72.0 63.5 6 3 2.56
## Rolle 60.5 60.8 16 10 7.72
## Vevey 58.3 26.8 25 19 18.46
## Yverdon 65.4 49.5 15 8 6.10
## Conthey 75.5 85.9 3 2 99.71
## Entremont 69.3 84.9 7 6 99.68
## Herens 77.3 89.7 5 2 100.00
## Martigwy 70.5 78.2 12 6 98.96
## Monthey 79.4 64.9 7 3 98.22
## St Maurice 65.0 75.9 9 9 99.06
## Sierre 92.2 84.6 3 3 99.46
## Sion 79.3 63.1 13 13 96.83
## Boudry 70.4 38.4 26 12 5.62
## La Chauxdfnd 65.7 7.7 29 11 13.79
## Le Locle 72.7 16.7 22 13 11.22
## Neuchatel 64.4 17.6 35 32 16.92
## Val de Ruz 77.6 37.6 15 7 4.97
## ValdeTravers 67.6 18.7 25 7 8.65
## V. De Geneve 35.0 1.2 37 53 42.34
## Rive Droite 44.7 46.6 16 29 50.43
## Rive Gauche 42.8 27.7 22 29 58.33
## Infant.Mortality
## Courtelary 22.2
## Delemont 22.2
## Franches-Mnt 20.2
## Moutier 20.3
## Neuveville 20.6
## Porrentruy 26.6
## Broye 23.6
## Glane 24.9
## Gruyere 21.0
## Sarine 24.4
## Veveyse 24.5
## Aigle 16.5
## Aubonne 19.1
## Avenches 22.7
## Cossonay 18.7
## Echallens 21.2
## Grandson 20.0
## Lausanne 20.2
## Lavaux 20.0
## Morges 18.0
## Moudon 22.4
## Nyone 16.7
## Orbe 15.3
## Oron 21.0
## Payerne 23.8
## Paysd'enhaut 18.0
## Rolle 16.3
## Vevey 20.9
## Yverdon 22.5
## Conthey 15.1
## Entremont 19.8
## Herens 18.3
## Martigwy 19.4
## Monthey 20.2
## St Maurice 17.8
## Sierre 16.3
## Sion 18.1
## Boudry 20.3
## La Chauxdfnd 20.5
## Le Locle 18.9
## Neuchatel 23.0
## Val de Ruz 20.0
## ValdeTravers 19.5
## V. De Geneve 18.0
## Rive Droite 18.2
## Rive Gauche 19.3
sd(iff[ , "Education"]) # not (b), as the standard deviation is 9.626152
## [1] 9.626152
popu <- swiss[swiss$Education <= 10, ]
popu
## Fertility Agriculture Examination Education Catholic
## Delemont 83.1 45.1 6 9 84.84
## Franches-Mnt 92.5 39.7 5 5 93.40
## Moutier 85.8 36.5 12 7 33.77
## Porrentruy 76.1 35.3 9 7 90.57
## Broye 83.8 70.2 16 7 92.85
## Glane 92.4 67.8 14 8 97.16
## Gruyere 82.4 53.3 12 7 97.67
## Veveyse 87.1 64.5 14 6 98.61
## Aubonne 66.9 67.5 14 7 2.27
## Cossonay 61.7 69.3 22 5 2.82
## Echallens 68.3 72.6 18 2 24.20
## Grandson 71.7 34.0 17 8 3.30
## Lavaux 65.1 73.0 19 9 2.84
## Morges 65.5 59.8 22 10 5.23
## Moudon 65.0 55.1 14 3 4.52
## Orbe 57.4 54.1 20 6 4.20
## Oron 72.5 71.2 12 1 2.40
## Payerne 74.2 58.1 14 8 5.23
## Paysd'enhaut 72.0 63.5 6 3 2.56
## Rolle 60.5 60.8 16 10 7.72
## Yverdon 65.4 49.5 15 8 6.10
## Conthey 75.5 85.9 3 2 99.71
## Entremont 69.3 84.9 7 6 99.68
## Herens 77.3 89.7 5 2 100.00
## Martigwy 70.5 78.2 12 6 98.96
## Monthey 79.4 64.9 7 3 98.22
## St Maurice 65.0 75.9 9 9 99.06
## Sierre 92.2 84.6 3 3 99.46
## Val de Ruz 77.6 37.6 15 7 4.97
## ValdeTravers 67.6 18.7 25 7 8.65
## Infant.Mortality
## Delemont 22.2
## Franches-Mnt 20.2
## Moutier 20.3
## Porrentruy 26.6
## Broye 23.6
## Glane 24.9
## Gruyere 21.0
## Veveyse 24.5
## Aubonne 19.1
## Cossonay 18.7
## Echallens 21.2
## Grandson 20.0
## Lavaux 20.0
## Morges 18.0
## Moudon 22.4
## Orbe 15.3
## Oron 21.0
## Payerne 23.8
## Paysd'enhaut 18.0
## Rolle 16.3
## Yverdon 22.5
## Conthey 15.1
## Entremont 19.8
## Herens 18.3
## Martigwy 19.4
## Monthey 20.2
## St Maurice 17.8
## Sierre 16.3
## Val de Ruz 20.0
## ValdeTravers 19.5
mean(popu[, "Agriculture"]) # (c) is correct: this is the mean % of males working in agriculture in regions where 10% or less of the population had higher education
## [1] 60.71
church <- swiss[swiss$Fertility > 70, ]
church
## Fertility Agriculture Examination Education Catholic
## Courtelary 80.2 17.0 15 12 9.96
## Delemont 83.1 45.1 6 9 84.84
## Franches-Mnt 92.5 39.7 5 5 93.40
## Moutier 85.8 36.5 12 7 33.77
## Neuveville 76.9 43.5 17 15 5.16
## Porrentruy 76.1 35.3 9 7 90.57
## Broye 83.8 70.2 16 7 92.85
## Glane 92.4 67.8 14 8 97.16
## Gruyere 82.4 53.3 12 7 97.67
## Sarine 82.9 45.2 16 13 91.38
## Veveyse 87.1 64.5 14 6 98.61
## Grandson 71.7 34.0 17 8 3.30
## Oron 72.5 71.2 12 1 2.40
## Payerne 74.2 58.1 14 8 5.23
## Paysd'enhaut 72.0 63.5 6 3 2.56
## Conthey 75.5 85.9 3 2 99.71
## Herens 77.3 89.7 5 2 100.00
## Martigwy 70.5 78.2 12 6 98.96
## Monthey 79.4 64.9 7 3 98.22
## Sierre 92.2 84.6 3 3 99.46
## Sion 79.3 63.1 13 13 96.83
## Boudry 70.4 38.4 26 12 5.62
## Le Locle 72.7 16.7 22 13 11.22
## Val de Ruz 77.6 37.6 15 7 4.97
## Infant.Mortality
## Courtelary 22.2
## Delemont 22.2
## Franches-Mnt 20.2
## Moutier 20.3
## Neuveville 20.6
## Porrentruy 26.6
## Broye 23.6
## Glane 24.9
## Gruyere 21.0
## Sarine 24.4
## Veveyse 24.5
## Grandson 20.0
## Oron 21.0
## Payerne 23.8
## Paysd'enhaut 18.0
## Conthey 15.1
## Herens 18.3
## Martigwy 19.4
## Monthey 20.2
## Sierre 16.3
## Sion 18.1
## Boudry 20.3
## Le Locle 18.9
## Val de Ruz 20.0
var(church[, "Catholic"]) # (d) is not correct, as the variance of the Catholic percentage in provinces where fertility was over 70 is 1977.215
## [1] 1977.215
qnorm(p = 0.1, mean = 35, sd = 8, lower.tail = T)
## [1] 24.74759
# answer is 25cm
pnorm(q = 60, mean = 50, sd = 5, lower.tail = F)
## [1] 0.02275013
0.02275013*100
## [1] 2.275013
# answer is 2.3%
(pnorm(q = 60, mean = 50, sd = 5, lower.tail = F))*220
## [1] 5.005029
# I would be late to class 5 times out of 220 classes
before <- c(2.98, 2.70, 2.60, 2.94, 2.55, 2.92, 2.94, 2.94, 2.50, 3.41, 2.22, 3.07)
after <- c(2.63, 2.43, 2.34, 2.41, 2.28, 2.44, 2.45, 2.44, 2.26, 2.96, 2.07, 2.79)
boxplot(before, after)
t.test(before, after, mu = 0, paired = TRUE)
##
## Paired t-test
##
## data: before and after
## t = 9.6658, df = 11, p-value = 1.037e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.2748072 0.4368595
## sample estimates:
## mean of the differences
## 0.3558333
The null hypothesis in this scenario is that there is no difference between the samples; the alternative hypothesis is that the mean before treatment does not equal the mean after treatment, i.e. that there is a difference. A t value of 9.6658 and a p value of less than 0.001 indicate that the 3-month treatment has a statistically significant lowering effect on blood LDL levels.
A paired t-test was used.
A Type I error in this scenario would be to falsely reject the null hypothesis and incorrectly conclude that a 3-month intake of the drug has a statistically significant effect on blood LDL levels when in reality it does not.
t.test(before, after, mu = 0, paired = TRUE, conf.level = 0.90)
##
## Paired t-test
##
## data: before and after
## t = 9.6658, df = 11, p-value = 1.037e-06
## alternative hypothesis: true difference in means is not equal to 0
## 90 percent confidence interval:
## 0.2897204 0.4219463
## sample estimates:
## mean of the differences
## 0.3558333
#90 percent confidence interval: 0.2897204 0.4219463 g L-1
#DO NOT FORGET UNITS
yy <- before - after
sd(yy)
## [1] 0.127526
(0.127526/0.02)^2
## [1] 40.6572
# since SE = sd/sqrt(n), n = (sd / 0.02)^2 ≈ 40.7 for a target standard error of 0.02; rounding up, you will need 41 people
The extra tenderness gained with each day of curing is 5, as this is the coefficient (slope).
100-15
## [1] 85
85/5
## [1] 17
# 17 days are required to get a rating of 100
15 + 5*0
## [1] 15
#expected tenderness value of meat is 15 if not cured at all
15 + 5*30
## [1] 165
#expected tenderness value of 165 if cured for 30 days
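The same arithmetic can be wrapped in small helper functions (a sketch; the intercept 15 and slope 5 are taken from the answers above):
tenderness <- function(days) 15 + 5 * days   # fitted line: tenderness = 15 + 5 * days
days_for   <- function(target) (target - 15) / 5
tenderness(0)   # 15, not cured at all
tenderness(30)  # 165, cured for 30 days
days_for(100)   # 17 days to reach a rating of 100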
data("faithful")
faithful
## eruptions waiting
## 1 3.600 79
## 2 1.800 54
## 3 3.333 74
## 4 2.283 62
## 5 4.533 85
## 6 2.883 55
## 7 4.700 88
## 8 3.600 85
## 9 1.950 51
## 10 4.350 85
## 11 1.833 54
## 12 3.917 84
## 13 4.200 78
## 14 1.750 47
## 15 4.700 83
## 16 2.167 52
## 17 1.750 62
## 18 4.800 84
## 19 1.600 52
## 20 4.250 79
## 21 1.800 51
## 22 1.750 47
## 23 3.450 78
## 24 3.067 69
## 25 4.533 74
## 26 3.600 83
## 27 1.967 55
## 28 4.083 76
## 29 3.850 78
## 30 4.433 79
## 31 4.300 73
## 32 4.467 77
## 33 3.367 66
## 34 4.033 80
## 35 3.833 74
## 36 2.017 52
## 37 1.867 48
## 38 4.833 80
## 39 1.833 59
## 40 4.783 90
## 41 4.350 80
## 42 1.883 58
## 43 4.567 84
## 44 1.750 58
## 45 4.533 73
## 46 3.317 83
## 47 3.833 64
## 48 2.100 53
## 49 4.633 82
## 50 2.000 59
## 51 4.800 75
## 52 4.716 90
## 53 1.833 54
## 54 4.833 80
## 55 1.733 54
## 56 4.883 83
## 57 3.717 71
## 58 1.667 64
## 59 4.567 77
## 60 4.317 81
## 61 2.233 59
## 62 4.500 84
## 63 1.750 48
## 64 4.800 82
## 65 1.817 60
## 66 4.400 92
## 67 4.167 78
## 68 4.700 78
## 69 2.067 65
## 70 4.700 73
## 71 4.033 82
## 72 1.967 56
## 73 4.500 79
## 74 4.000 71
## 75 1.983 62
## 76 5.067 76
## 77 2.017 60
## 78 4.567 78
## 79 3.883 76
## 80 3.600 83
## 81 4.133 75
## 82 4.333 82
## 83 4.100 70
## 84 2.633 65
## 85 4.067 73
## 86 4.933 88
## 87 3.950 76
## 88 4.517 80
## 89 2.167 48
## 90 4.000 86
## 91 2.200 60
## 92 4.333 90
## 93 1.867 50
## 94 4.817 78
## 95 1.833 63
## 96 4.300 72
## 97 4.667 84
## 98 3.750 75
## 99 1.867 51
## 100 4.900 82
## 101 2.483 62
## 102 4.367 88
## 103 2.100 49
## 104 4.500 83
## 105 4.050 81
## 106 1.867 47
## 107 4.700 84
## 108 1.783 52
## 109 4.850 86
## 110 3.683 81
## 111 4.733 75
## 112 2.300 59
## 113 4.900 89
## 114 4.417 79
## 115 1.700 59
## 116 4.633 81
## 117 2.317 50
## 118 4.600 85
## 119 1.817 59
## 120 4.417 87
## 121 2.617 53
## 122 4.067 69
## 123 4.250 77
## 124 1.967 56
## 125 4.600 88
## 126 3.767 81
## 127 1.917 45
## 128 4.500 82
## 129 2.267 55
## 130 4.650 90
## 131 1.867 45
## 132 4.167 83
## 133 2.800 56
## 134 4.333 89
## 135 1.833 46
## 136 4.383 82
## 137 1.883 51
## 138 4.933 86
## 139 2.033 53
## 140 3.733 79
## 141 4.233 81
## 142 2.233 60
## 143 4.533 82
## 144 4.817 77
## 145 4.333 76
## 146 1.983 59
## 147 4.633 80
## 148 2.017 49
## 149 5.100 96
## 150 1.800 53
## 151 5.033 77
## 152 4.000 77
## 153 2.400 65
## 154 4.600 81
## 155 3.567 71
## 156 4.000 70
## 157 4.500 81
## 158 4.083 93
## 159 1.800 53
## 160 3.967 89
## 161 2.200 45
## 162 4.150 86
## 163 2.000 58
## 164 3.833 78
## 165 3.500 66
## 166 4.583 76
## 167 2.367 63
## 168 5.000 88
## 169 1.933 52
## 170 4.617 93
## 171 1.917 49
## 172 2.083 57
## 173 4.583 77
## 174 3.333 68
## 175 4.167 81
## 176 4.333 81
## 177 4.500 73
## 178 2.417 50
## 179 4.000 85
## 180 4.167 74
## 181 1.883 55
## 182 4.583 77
## 183 4.250 83
## 184 3.767 83
## 185 2.033 51
## 186 4.433 78
## 187 4.083 84
## 188 1.833 46
## 189 4.417 83
## 190 2.183 55
## 191 4.800 81
## 192 1.833 57
## 193 4.800 76
## 194 4.100 84
## 195 3.966 77
## 196 4.233 81
## 197 3.500 87
## 198 4.366 77
## 199 2.250 51
## 200 4.667 78
## 201 2.100 60
## 202 4.350 82
## 203 4.133 91
## 204 1.867 53
## 205 4.600 78
## 206 1.783 46
## 207 4.367 77
## 208 3.850 84
## 209 1.933 49
## 210 4.500 83
## 211 2.383 71
## 212 4.700 80
## 213 1.867 49
## 214 3.833 75
## 215 3.417 64
## 216 4.233 76
## 217 2.400 53
## 218 4.800 94
## 219 2.000 55
## 220 4.150 76
## 221 1.867 50
## 222 4.267 82
## 223 1.750 54
## 224 4.483 75
## 225 4.000 78
## 226 4.117 79
## 227 4.083 78
## 228 4.267 78
## 229 3.917 70
## 230 4.550 79
## 231 4.083 70
## 232 2.417 54
## 233 4.183 86
## 234 2.217 50
## 235 4.450 90
## 236 1.883 54
## 237 1.850 54
## 238 4.283 77
## 239 3.950 79
## 240 2.333 64
## 241 4.150 75
## 242 2.350 47
## 243 4.933 86
## 244 2.900 63
## 245 4.583 85
## 246 3.833 82
## 247 2.083 57
## 248 4.367 82
## 249 2.133 67
## 250 4.350 74
## 251 2.200 54
## 252 4.450 83
## 253 3.567 73
## 254 4.500 73
## 255 4.150 88
## 256 3.817 80
## 257 3.917 71
## 258 4.450 83
## 259 2.000 56
## 260 4.283 79
## 261 4.767 78
## 262 4.533 84
## 263 1.850 58
## 264 4.250 83
## 265 1.983 43
## 266 2.250 60
## 267 4.750 75
## 268 4.117 81
## 269 2.150 46
## 270 4.417 90
## 271 1.817 46
## 272 4.467 74
?faithful
plot(eruptions ~ waiting, data = faithful, ylim = c(0, 6), xlim = c(0, 120))
pewpew <- lm(eruptions ~ waiting, data = faithful)
par(mfrow = c(2, 2))  # set a 2 x 2 layout first so all four diagnostic plots appear together
plot(pewpew)
The residual points on the Normal Q-Q plot follow the straight dashed line, so the residuals are normally distributed. Variance homogeneity is relatively well met; however, in the Residuals vs Fitted plot there are essentially two clouds of data, meaning there is some violation of variance homogeneity, as the red line is not horizontal.
predict(pewpew, newdata = data.frame(waiting = 60), interval = "confidence" )
## fit lwr upr
## 1 2.663661 2.587644 2.739678
Variation explained by a linear model is given by the R-squared value; the adjusted R-squared of 0.8108 means that the linear model explains about 81% of the variation seen in the raw data.
summary(pewpew)
##
## Call:
## lm(formula = eruptions ~ waiting, data = faithful)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.29917 -0.37689 0.03508 0.34909 1.19329
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.874016 0.160143 -11.70 <2e-16 ***
## waiting 0.075628 0.002219 34.09 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4965 on 270 degrees of freedom
## Multiple R-squared: 0.8115, Adjusted R-squared: 0.8108
## F-statistic: 1162 on 1 and 270 DF, p-value: < 2.2e-16
The large F-statistic and small associated P-value indicate that the linear model is a significant improvement over the intercept-only model (F1,270 = 1162, P < 0.001).
The intercept and slope are both statistically significant. However, a negative eruption time at a waiting time of 0 minutes, as suggested by the intercept (-1.87), does not make any sense in the real world. Therefore, the intercept should be removed from the model to force the regression line through the origin.
data("PlantGrowth")
PlantGrowth
## weight group
## 1 4.17 ctrl
## 2 5.58 ctrl
## 3 5.18 ctrl
## 4 6.11 ctrl
## 5 4.50 ctrl
## 6 4.61 ctrl
## 7 5.17 ctrl
## 8 4.53 ctrl
## 9 5.33 ctrl
## 10 5.14 ctrl
## 11 4.81 trt1
## 12 4.17 trt1
## 13 4.41 trt1
## 14 3.59 trt1
## 15 5.87 trt1
## 16 3.83 trt1
## 17 6.03 trt1
## 18 4.89 trt1
## 19 4.32 trt1
## 20 4.69 trt1
## 21 6.31 trt2
## 22 5.12 trt2
## 23 5.54 trt2
## 24 5.50 trt2
## 25 5.37 trt2
## 26 5.29 trt2
## 27 4.92 trt2
## 28 6.15 trt2
## 29 5.80 trt2
## 30 5.26 trt2
?PlantGrowth
# weight is the response variable; treatment group is the explanatory variable; the response goes first in the formula
weed <- aov(weight ~ group, data = PlantGrowth )
weed
## Call:
## aov(formula = weight ~ group, data = PlantGrowth)
##
## Terms:
## group Residuals
## Sum of Squares 3.76634 10.49209
## Deg. of Freedom 2 27
##
## Residual standard error: 0.6233746
## Estimated effects may be unbalanced
plot(weed)
The Residuals vs Fitted plot is equally spread, with the red line being relatively straight, indicating homogeneity; the points in the Normal Q-Q plot lie on or close to the 1:1 line, indicating normality. The Cook's distance contours in the Residuals vs Leverage plot indicate that there are no observations that are highly influential.
summary.lm(weed)
##
## Call:
## aov(formula = weight ~ group, data = PlantGrowth)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.0710 -0.4180 -0.0060 0.2627 1.3690
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.0320 0.1971 25.527 <2e-16 ***
## grouptrt1 -0.3710 0.2788 -1.331 0.1944
## grouptrt2 0.4940 0.2788 1.772 0.0877 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6234 on 27 degrees of freedom
## Multiple R-squared: 0.2641, Adjusted R-squared: 0.2096
## F-statistic: 4.846 on 2 and 27 DF, p-value: 0.01591
Value of the intercept: since t-value = parameter estimate / standard error, the intercept estimate = t-value × SE = 40.516 × 6.167 = 249.86
Slope t-value = slope estimate / SE = -21.326 / 1.018 = -20.949
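In the document's own style, the same back-calculations can be done in R; the t-values and standard errors come from the summary table referred to above.
40.516 * 6.167   # intercept estimate = t-value * SE = 249.86
-21.326 / 1.018  # slope t-value = estimate / SE = -20.95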