Use pnorm(q, mean, sd) to find cumulative probabilities for a normal distribution.
pnorm(q=70,mean=75,sd=5,lower.tail = TRUE)
## [1] 0.1586553
pnorm(70,75,5)
## [1] 0.1586553
pnorm(q=85,mean=75,sd=5,lower.tail = FALSE)
## [1] 0.02275013
pnorm(85,75,5,lower.tail = FALSE)
## [1] 0.02275013
The qnorm function can be used to calculate quantiles (percentiles).
Let's find the first quartile (Q1) for this variable.
qnorm(p=0.25,mean=75,sd=5,lower.tail = TRUE)
## [1] 71.62755
qnorm(0.25,75,5,lower.tail = TRUE)
## [1] 71.62755
Use seq(from, to, by) to create a sequence of x values.
Use dnorm to compute the density, then plot(x, density).
To connect the points with a line, use plot(x, density, type = "l").
You can add labels and a title with:
plot(x, density, type = "l", main = "X~Normal: Mean= 75, sd=5", xlab = "x", ylab = "Probability Density")
You can add a vertical line at the mean with abline(v = mean).
x<-seq(from=55, to=95, by=0.25)
x
## [1] 55.00 55.25 55.50 55.75 56.00 56.25 56.50 56.75 57.00 57.25 57.50 57.75
## [13] 58.00 58.25 58.50 58.75 59.00 59.25 59.50 59.75 60.00 60.25 60.50 60.75
## [25] 61.00 61.25 61.50 61.75 62.00 62.25 62.50 62.75 63.00 63.25 63.50 63.75
## [37] 64.00 64.25 64.50 64.75 65.00 65.25 65.50 65.75 66.00 66.25 66.50 66.75
## [49] 67.00 67.25 67.50 67.75 68.00 68.25 68.50 68.75 69.00 69.25 69.50 69.75
## [61] 70.00 70.25 70.50 70.75 71.00 71.25 71.50 71.75 72.00 72.25 72.50 72.75
## [73] 73.00 73.25 73.50 73.75 74.00 74.25 74.50 74.75 75.00 75.25 75.50 75.75
## [85] 76.00 76.25 76.50 76.75 77.00 77.25 77.50 77.75 78.00 78.25 78.50 78.75
## [97] 79.00 79.25 79.50 79.75 80.00 80.25 80.50 80.75 81.00 81.25 81.50 81.75
## [109] 82.00 82.25 82.50 82.75 83.00 83.25 83.50 83.75 84.00 84.25 84.50 84.75
## [121] 85.00 85.25 85.50 85.75 86.00 86.25 86.50 86.75 87.00 87.25 87.50 87.75
## [133] 88.00 88.25 88.50 88.75 89.00 89.25 89.50 89.75 90.00 90.25 90.50 90.75
## [145] 91.00 91.25 91.50 91.75 92.00 92.25 92.50 92.75 93.00 93.25 93.50 93.75
## [157] 94.00 94.25 94.50 94.75 95.00
density<-dnorm(x,mean=75,sd=5)
plot(x,density)
plot(x,density,type="l")
plot(x,density,type="l", main="X~Normal: Mean= 75, sd=5",xlab="x",ylab = "Probability Density")
abline(v=75)
To draw a random sample from a normally distributed population,
use rnorm(n, mean, sd).
rnorm(n=40,75,5)
## [1] 76.20173 68.23328 77.76532 76.12049 72.56559 68.57126 85.41460 72.15540
## [9] 71.66060 76.80773 78.39429 75.23548 76.04482 85.11366 70.06132 75.98531
## [17] 76.31632 52.34074 76.70232 69.28436 79.61299 74.69519 78.98842 74.55272
## [25] 81.14561 72.69119 81.61999 76.47443 80.40794 76.58560 73.55685 71.08607
## [33] 64.97134 76.54887 73.81189 74.13570 64.98633 77.04111 75.23523 81.80966
For the standard normal distribution, find P(Z > 1):
pnorm(q=1,mean=0,sd=1,lower.tail = FALSE)
## [1] 0.1586553
pnorm(1,0,1,lower.tail = FALSE)
## [1] 0.1586553
X ~ Bin(n = 20, p = 1/6)
Let's find the probability that X is exactly 3, i.e. P(X = 3), using dbinom(x, size, prob).
dbinom(x=3,size=20,prob = 1/6)
## [1] 0.2378866
dbinom(3,20,1/6)
## [1] 0.2378866
dbinom(x=0:3,size=20,prob=1/6)
## [1] 0.02608405 0.10433621 0.19823881 0.23788657
dbinom(0:3,20,1/6)
## [1] 0.02608405 0.10433621 0.19823881 0.23788657
sum(dbinom(x=0:3,size=20,prob=1/6))
## [1] 0.5665456
Use pbinom to evaluate the cumulative distribution function.
Let's find P(X <= 3) using pbinom(q, size, prob).
pbinom(q=3,size=20,prob = 1/6)
## [1] 0.5665456
pbinom(3,20,1/6)
## [1] 0.5665456
Use rbinom to take a random sample from a binomial distribution.
rbinom(n=1,size=10,prob=1/2)
## [1] 6
rbinom(n=10,size=10,prob = 1/2)
## [1] 5 5 4 5 1 6 6 3 5 4
rbinom(10,10,0.5)
## [1] 5 3 6 5 5 4 7 4 6 6
Use qbinom to find the quantiles of a binomial distribution.
qbinom(p=0.25,10,prob=1/2,lower.tail = TRUE)
## [1] 4
qbinom(p=0.25,100,prob=1/2,lower.tail = TRUE)
## [1] 47
Given a t-statistic of 2.3 with df = 25, find the one-sided p-value P(t > 2.3).
One-sided p-value:
pt(q=2.3,df=25, lower.tail=FALSE)
## [1] 0.01503675
pt(2.3,25,lower.tail = FALSE)
## [1] 0.01503675
Two-sided p-value:
pt(q=2.3,df=25, lower.tail=FALSE)+ pt(q=-2.3,df=25, lower.tail=TRUE)
## [1] 0.03007351
pt(q=2.3,df=25, lower.tail=FALSE)*2
## [1] 0.03007351
Constructing a confidence interval:
Find the t critical value for a 95% confidence interval with df = 25, i.e. the value of t with 2.5% in each tail.
qt(p=0.025,df=25,lower.tail = TRUE)
## [1] -2.059539
View(mtcars)
head(mtcars)
summary(mtcars$mpg)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.40 15.43 19.20 20.09 22.80 33.90
#install.packages("psych")
library(psych)
describe(mtcars)
boxplot(mtcars$mpg,horizontal = TRUE)
qqnorm(mtcars$mpg)
qqline(mtcars$mpg,col=3)
hist(mtcars$mpg,breaks = 4)
Use the t distribution to estimate a population mean when the population standard deviation is unknown.
If the t value is 2.7 and n = 10 (df = 9), use pt(q, df) to find the area below that t value.
pt(2.7,9)
## [1] 0.9878032
Margin of error: m = t*s/sqrt(n), or m = z*sigma/sqrt(n) when sigma is known
Standard error of the mean: SEM = s/sqrt(n)
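A minimal sketch of these formulas, assuming a hypothetical sample with mean 75, standard deviation 4, and n = 10:
xbar <- 75                            # hypothetical sample mean
s <- 4                                # hypothetical sample standard deviation
n <- 10                               # hypothetical sample size
SEM <- s/sqrt(n)                      # standard error of the mean
t_crit <- qt(p = 0.975, df = n - 1)   # t value with 2.5% in the upper tail (df = 9)
m <- t_crit*SEM                       # margin of error m = t*s/sqrt(n)
c(xbar - m, xbar + m)                 # 95% confidence interval for the mean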
par(mfrow=c(2,2))
plot(mtcars$hp,mtcars$mpg)
model=lm(mtcars$mpg~mtcars$cyl+mtcars$disp+mtcars$hp)
abline(model,col=4)
## Warning in abline(model, col = 4): only using the first two of 4 regression
## coefficients
summary(model)
##
## Call:
## lm(formula = mtcars$mpg ~ mtcars$cyl + mtcars$disp + mtcars$hp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.0889 -2.0845 -0.7745 1.3972 6.9183
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.18492 2.59078 13.195 1.54e-13 ***
## mtcars$cyl -1.22742 0.79728 -1.540 0.1349
## mtcars$disp -0.01884 0.01040 -1.811 0.0809 .
## mtcars$hp -0.01468 0.01465 -1.002 0.3250
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.055 on 28 degrees of freedom
## Multiple R-squared: 0.7679, Adjusted R-squared: 0.743
## F-statistic: 30.88 on 3 and 28 DF, p-value: 5.054e-09
plot(model$residuals)
abline(0,0)
hist(model$residuals)
qqnorm(model$residuals)
qqline(model$residuals)
anova(model)
Look at the correlations between the variables, rounded to two digits.
If the correlations between predictors are large, this may cause multicollinearity.
round(cor(mtcars),2)
## mpg cyl disp hp drat wt qsec vs am gear carb
## mpg 1.00 -0.85 -0.85 -0.78 0.68 -0.87 0.42 0.66 0.60 0.48 -0.55
## cyl -0.85 1.00 0.90 0.83 -0.70 0.78 -0.59 -0.81 -0.52 -0.49 0.53
## disp -0.85 0.90 1.00 0.79 -0.71 0.89 -0.43 -0.71 -0.59 -0.56 0.39
## hp -0.78 0.83 0.79 1.00 -0.45 0.66 -0.71 -0.72 -0.24 -0.13 0.75
## drat 0.68 -0.70 -0.71 -0.45 1.00 -0.71 0.09 0.44 0.71 0.70 -0.09
## wt -0.87 0.78 0.89 0.66 -0.71 1.00 -0.17 -0.55 -0.69 -0.58 0.43
## qsec 0.42 -0.59 -0.43 -0.71 0.09 -0.17 1.00 0.74 -0.23 -0.21 -0.66
## vs 0.66 -0.81 -0.71 -0.72 0.44 -0.55 0.74 1.00 0.17 0.21 -0.57
## am 0.60 -0.52 -0.59 -0.24 0.71 -0.69 -0.23 0.17 1.00 0.79 0.06
## gear 0.48 -0.49 -0.56 -0.13 0.70 -0.58 -0.21 0.21 0.79 1.00 0.27
## carb -0.55 0.53 0.39 0.75 -0.09 0.43 -0.66 -0.57 0.06 0.27 1.00
The variance inflation factors (VIFs) are the diagonal elements of the inverse of the correlation matrix; VIF values above about 10 suggest multicollinearity.
VIF=diag(solve(cor(mtcars)))
VIF
## mpg cyl disp hp drat wt qsec vs
## 7.634507 15.382159 22.194360 10.287988 3.411846 17.942416 7.980373 4.971265
## am gear carb
## 4.980880 5.406599 7.930553
Eigenvalues of the correlation matrix that are close to zero are another sign of multicollinearity.
eigen.stuff=eigen(cor(mtcars))
eigen.stuff$values
## [1] 6.60840025 2.65046789 0.62719727 0.26959744 0.22345110 0.21159612
## [7] 0.13526199 0.12290143 0.07704665 0.05203544 0.02204441
#eigen.stuff$vectors
round(eigen.stuff$vectors,2)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
## [1,] 0.36 -0.02 -0.23 -0.02 -0.10 -0.11 0.37 0.75 0.24 0.14 -0.12
## [2,] -0.37 -0.04 -0.18 0.00 -0.06 0.17 0.06 0.23 0.05 -0.85 -0.14
## [3,] -0.37 0.05 -0.06 0.26 -0.39 -0.34 0.21 0.00 0.20 0.05 0.66
## [4,] -0.33 -0.25 0.14 -0.07 -0.54 0.07 0.00 0.22 -0.58 0.25 -0.26
## [5,] 0.29 -0.27 0.16 0.85 -0.08 0.24 0.02 -0.03 -0.05 -0.10 -0.04
## [6,] -0.35 0.14 0.34 0.25 0.08 -0.46 -0.02 0.01 0.36 0.09 -0.57
## [7,] 0.20 0.46 0.40 0.07 0.16 -0.33 0.05 0.23 -0.53 -0.27 0.18
## [8,] 0.31 0.23 0.43 -0.21 -0.60 0.19 -0.27 -0.03 0.36 -0.16 0.01
## [9,] 0.23 -0.43 -0.21 -0.03 -0.09 -0.57 -0.59 0.06 -0.05 -0.18 0.03
## [10,] 0.21 -0.46 0.29 -0.26 -0.05 -0.24 0.61 -0.34 0.00 -0.21 -0.05
## [11,] -0.21 -0.41 0.53 -0.13 0.36 0.18 -0.17 0.40 0.17 0.07 0.32
H0: μ1 = μ2, Ha: μ1 ≠ μ2 (here we are testing whether the two samples' means differ)
# t.test(df_primed$Preference,df_nonprimed$Preference)
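Since df_primed and df_nonprimed are not loaded in this document, here is a minimal sketch of the same two-sample t test on simulated data; the sample sizes, means, and standard deviations below are made-up values for illustration:
set.seed(1)
primed    <- rnorm(n = 30, mean = 77, sd = 5)   # hypothetical primed group
nonprimed <- rnorm(n = 30, mean = 75, sd = 5)   # hypothetical non-primed group
t.test(primed, nonprimed)                       # Welch two-sample t test (two-sided by default)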
Is there significant evidence at the 5% level that conductors consume more calories per day than drivers do? Use the two-sample t method to obtain a p-value, and then assess significance.
H0: μ_drivers = μ_conductors, Ha: μ_drivers < μ_conductors
Example code (below):
t = (x_conductor - x_driver)/SE, where SE = sqrt((s_conductor^2)/83 + (s_driver^2)/98)
x_conductor = 2844, x_driver = 2821, s_conductor = 437.3008, s_driver = 435.5778, SE = 65.11528
Hence the t value (df = 82) is 0.3532197 and the corresponding p-value is 0.3624151. Since the p-value is greater than 0.05, we cannot reject the null hypothesis.
SE_cal=sqrt(((437.3008^2)/83)+ ((435.5778)^2)/98)
SE_cal
## [1] 65.11528
t_col= (2844-2821)/ SE_cal
t_col
## [1] 0.3532197
pt(0.3532197,82,lower.tail = FALSE)
## [1] 0.3624151
Give a 99% confidence interval for the difference in mean daily alcohol consumption between drivers and conductors.
x_driver = 0.24, x_conductor = 0.39, s_driver = 0.5939697, s_conductor = 1.002148
Confidence interval = (x_driver - x_conductor) ± t*SE
where x_driver - x_conductor = -0.15, t = 2.637123, SE = 0.1252997
Hence the 99% CI is (-0.4804306, 0.1804306).
qt(0.005,df=82)
## [1] -2.637123
SE_alc=sqrt(((1.002148^2)/83)+ ((0.5939697)^2)/98)
SE_alc
## [1] 0.1252997
UpperCI_dif= -0.15+ 2.637123*SE_alc
UpperCI_dif
## [1] 0.1804306
LowerCI_dif= -0.15- 2.637123*SE_alc
LowerCI_dif
## [1] -0.4804306