Probability is the likelihood of an outcome. e.g. {any combination of two dice}
Event is a collection of outcomes. e.g. {both dice show the same face}
The outcome space is all the possible outcomes. e.g. {all the possible outcomes die show}
binomial variable\[ P(E) = \frac{number\ of\ outcomes\ in\ E}{number\ of\ possible\ outcomes} \]
continuous varable\[ P\begin{pmatrix}a\leq X \leq b \end{pmatrix} = \int_a^b f(x) dx \] Probabilities of continuous random variables (X) are the area under the curve. The probability of any value is always zero. when X = k,
\[ P\begin{pmatrix}X = k \end{pmatrix} = 0 \]
union probability, addition rule + \[
\begin{aligned}
&P(A \cup B)=P(A)+P(B)-P(A \cap B) \\
&P(A \cup B)=P(A)+P(B) \quad \text { if } \mathrm{A} \text { and }
\mathrm{B} \text { are mutually exclusive }
\end{aligned}
\] joint probability, multiple rule x \[
\begin{aligned}
&P(A \cap B)=P(A \mid B) P(B)=P(B \mid A) P(A) \\
&P(A \cap B)=P(A) P(B) \quad \text { if } \mathrm{A} \text { and }
\mathrm{B} \text { are independent }
\end{aligned}
\] Marginal Probability is without reference to any
other event or events \[
P(A) or P(B)
\]
conditional Probability \[
P(A\mid B)=\dfrac{P(A \: \cap\: B)}{P(B)}
\] p-values are conditional probabilities.
multiple law \[ P(A \cap B) = P(A\ |\ B) P(B) = P(B\ |\ A) P(A) \]
bayes’s formula
\[ P(B_j\ |\ A) = \frac{P(A\ |\ B_j) P(B_j)}{P(A)}\ \]
\[ P(A) = P(A\ |\ B_1) P(B_1) + \cdots + P(A\ |\ B_n) P(B_n).\notag \]
Random variable takes on different values determined by chance. we can use random variables’ mathematical (distribution) function to find their probability.
joint distribution, discrete variables \[
P\left(X=x_{i}, Y=y_{j}\right)=p_{i j}, i, j=1,2, \ldots
\] Marginal distribution, discrete variables \[
P\left(X=x_{i}\right)=\sum_{j=1}^{\infty} p_{i j}=p_{i}, i=1,2, \ldots
\] conditional probability, discrete variables \[
P\left(Y=y_{j} \mid X=x_{i}\right)=\frac{p_{i j}}{p_{i}}, j=1,2, \ldots
\]
Conditional expectation is the mathematical expectation of a conditional distribution.
The discrete variable
\[ E(Y|X_i)=\sum_{i=1}^{N}{(Y_i|X_i)}\cdot p(Y_i|X_i) \]
The continuous variable
\[ E(Y|X)=\int{(Y|X)}\cdot g(Y|X)dY \]
expectation formula, discrete variable \[
\mu=E(X)=\sum x_if(x_i)
\]
variance formula, discrete variable \[
\sigma^2=\text{Var}(X)=\sum (x_i-\mu)^2f(x_i)
\]
Conditional expectation and conditional variance exist and can be estimated by regression models.
We make inferences about the population based on the sample (inference) after summarizing data (description).
The error is resulting from using a sample characteristic (statistic) to estimate a population characteristic (parameter).
Standard Error \[
SD(\bar{X})=SE(\bar{X})=\dfrac{\sigma}{\sqrt{n}}
\]
Central limit theorem and law of large numbers
For a large sample size, x mean is approximately normally distributed, regardless of the distribution of the population one samples from. so, the population parameter can be estimated using the sample.
With large samples, The mean of the sampling distribution is very close to the population mean.
The higher the confidence level, the wider the width of the interval and thus the poorer the precision.
\[
\text{point estimate }\pm M\times \hat{SE}(\text{estimate})
\] the margin of error \[
E=z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}
\]
-Set up the hypotheses and decide on the significance level.
-Construct and calculate the test statistic.
-Calculate probability value (p-value).
-Make a decision and state an overall conclusion.
integrate(function(x){x^2},0,2)$value
## [1] 2.666667
fxy = expression(2*x^2+y+3*x*y^2)
dxy = deriv(fxy, c("x", "y"), func = TRUE)
dxy(1,2)
## [1] 16
## attr(,"gradient")
## x y
## [1,] 16 13
dnorm(0)# density at a number
## [1] 0.3989423
pnorm(1.28)# cumulative possibility
## [1] 0.8997274
qnorm(0.95)# quantile
## [1] 1.644854
rnorm(10)# random numbers
## [1] -0.88088511 0.49039771 1.07386519 1.16713589 -1.01172581 0.70453443
## [7] -0.84719449 -0.53398655 0.03326851 1.60801428
using covariance matrix to generate Gaussian multiple variables
library(MASS)
Sigma <- matrix(c(10,3,3,2),2,2)
mvrnorm(n=20, rep(0, 2), Sigma)
## [,1] [,2]
## [1,] 1.5187331 0.62621665
## [2,] 1.1050862 -0.49826048
## [3,] -1.9292320 0.04056535
## [4,] -1.4664689 1.57938123
## [5,] 0.2476428 -0.02949476
## [6,] 4.1485181 1.22342349
## [7,] 3.9925071 1.94807455
## [8,] -1.2674697 -1.35126479
## [9,] -1.6721104 -0.97817620
## [10,] 0.3848645 -0.19130117
## [11,] 1.3799726 -0.36814938
## [12,] 2.4352271 2.12898251
## [13,] -3.1106685 -0.95956219
## [14,] 1.3631895 1.59052657
## [15,] -1.6499431 0.22257692
## [16,] 3.2571223 0.80981086
## [17,] 2.1421184 0.96733243
## [18,] -3.5372718 -1.16167758
## [19,] 0.2839805 1.75396696
## [20,] -0.6402299 -2.44016838
pnorm(1.96, 0,1)
## [1] 0.9750021
qnorm(0.025, 0,1)
## [1] -1.959964
pchisq(3.84,1,lower.tail=F)
## [1] 0.05004352
mean(rchisq(10000,1)>3.84) #simulation
## [1] 0.0482
seq(1,10, 2)
## [1] 1 3 5 7 9
x=rep(1:3,6)
x
## [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
y=rep(1:3, each = 6)
y
## [1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3
x+y
## [1] 2 3 4 2 3 4 3 4 5 3 4 5 4 5 6 4 5 6
x-y
## [1] 0 1 2 0 1 2 -1 0 1 -1 0 1 -2 -1 0 -2 -1 0
x*y
## [1] 1 2 3 1 2 3 2 4 6 2 4 6 3 6 9 3 6 9
x/y
## [1] 1.0000000 2.0000000 3.0000000 1.0000000 2.0000000 3.0000000 0.5000000
## [8] 1.0000000 1.5000000 0.5000000 1.0000000 1.5000000 0.3333333 0.6666667
## [15] 1.0000000 0.3333333 0.6666667 1.0000000
x%*%y
## [,1]
## [1,] 72
x[c(2,3)]
## [1] 2 3
x[-1]
## [1] 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
x[x>2]
## [1] 3 3 3 3 3 3
x[x==2]
## [1] 2 2 2 2 2 2
# substitute
x[x == 2] <- 0.5
x
## [1] 1.0 0.5 3.0 1.0 0.5 3.0 1.0 0.5 3.0 1.0 0.5 3.0 1.0 0.5 3.0 1.0 0.5 3.0
matrix(1:10,2,5)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 3 5 7 9
## [2,] 2 4 6 8 10
matrix(1:10,5,2)
## [,1] [,2]
## [1,] 1 6
## [2,] 2 7
## [3,] 3 8
## [4,] 4 9
## [5,] 5 10
a <- matrix(12:20,3,3)
a[2,]
## [1] 13 16 19
a[,2]
## [1] 15 16 17
a[-2,]
## [,1] [,2] [,3]
## [1,] 12 15 18
## [2,] 14 17 20
a[,-2]
## [,1] [,2]
## [1,] 12 18
## [2,] 13 19
## [3,] 14 20
a[2,1]=21
a
## [,1] [,2] [,3]
## [1,] 12 15 18
## [2,] 21 16 19
## [3,] 14 17 20
a<-matrix(c(11,21,31,21,32,43,12,32,54),3,3)
solve(a)
## [,1] [,2] [,3]
## [1,] -1.9775281 3.471910 -1.6179775
## [2,] 0.7977528 -1.247191 0.5617978
## [3,] 0.5000000 -1.000000 0.5000000
det(a)
## [1] -178
solve(a)*det(a)
## [,1] [,2] [,3]
## [1,] 352 -618 288
## [2,] -142 222 -100
## [3,] -89 178 -89
t(a)
## [,1] [,2] [,3]
## [1,] 11 21 31
## [2,] 21 32 43
## [3,] 12 32 54
eigen(a)
## eigen() decomposition
## $values
## [1] 91.6892193 5.6541299 -0.3433491
##
## $vectors
## [,1] [,2] [,3]
## [1,] -0.2573423 -0.7530908 -0.9049786
## [2,] -0.5253459 -0.1712782 0.3538153
## [3,] -0.8110405 0.6352306 0.2362806
name<-c('A','B','C')
chinese<-c(92,96,95)
math<-c(86, 85, 92)
score<-data.frame(name, chinese, math)
score
## name chinese math
## 1 A 92 86
## 2 B 96 85
## 3 C 95 92
score[2]
## chinese
## 1 92
## 2 96
## 3 95
score$math
## [1] 86 85 92
for loop
sim<-10000
p<-numeric(sim)
# numeric=NULL
for (i in 1:sim){
p[i]<- abs(mean(rnorm(10,20,sqrt(3)))-mean(rnorm(15,20,sqrt(3))))<0.1
}
mean(p)
## [1] 0.1079
using replicate
mean(replicate(10000,abs(mean(rnorm(10,20,sqrt(3)))-mean(rnorm(15,20,sqrt(3))))<0.1))
## [1] 0.1175
using apply function
A<-matrix(rnorm(250000, 20,sqrt(3)),10000,25)
head(A)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 19.38996 17.17169 21.68928 21.53605 20.65358 22.17426 21.38341 21.84597
## [2,] 24.19254 16.41728 17.38523 19.59159 21.04825 21.72394 19.09154 20.05932
## [3,] 18.78697 19.26176 21.30118 17.42785 23.48094 21.43805 21.67202 19.02939
## [4,] 18.28735 20.88258 18.94152 22.03726 17.28934 19.28584 19.93725 21.10955
## [5,] 24.01023 23.40581 20.63495 17.79877 23.24974 20.12610 16.93461 20.70022
## [6,] 17.85909 20.47354 25.19677 21.99099 17.31272 19.55983 20.59263 19.42591
## [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16]
## [1,] 18.51713 20.84713 19.57668 21.89467 17.67141 21.88368 22.82527 20.65207
## [2,] 19.05246 15.62194 19.61946 21.59534 23.62048 18.71575 18.46270 19.78524
## [3,] 23.72749 20.49721 21.37010 22.58976 20.56298 19.15932 21.93338 21.54959
## [4,] 23.46802 22.11435 18.20004 19.95149 18.15870 22.65712 21.45752 19.10825
## [5,] 19.29525 19.80707 18.80652 18.82406 19.71545 18.61493 19.71839 18.63235
## [6,] 21.56731 20.30297 20.69190 17.73491 21.27104 16.61442 19.46909 20.05049
## [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## [1,] 19.64516 19.75267 21.85729 20.35406 18.78100 21.15610 17.62467 23.93520
## [2,] 19.32782 18.94165 24.30256 20.28542 21.06995 19.14494 20.84978 20.83973
## [3,] 19.14005 18.49508 17.92618 19.37619 19.72210 19.60864 21.69689 20.09254
## [4,] 17.61147 21.91340 17.84243 19.13915 20.35991 21.18536 22.54633 20.92910
## [5,] 20.90751 18.38613 21.18468 18.76978 20.82689 22.03421 17.90602 19.40751
## [6,] 19.25169 16.92567 20.75857 17.92523 18.96401 18.55858 20.27221 20.00757
## [,25]
## [1,] 21.18032
## [2,] 18.92239
## [3,] 21.01482
## [4,] 22.71097
## [5,] 20.41447
## [6,] 19.36075
f<-function(x) {abs(mean(x[1:10])-mean(x[11:25]))}
# solve the mean by apply
mean(apply(A,1,f)>0.1)
## [1] 0.8859
using probability method
pnorm(0.1,0,sqrt(0.5))-pnorm(-0.1,0,sqrt(0.5))
## [1] 0.1124629
choose(10,2)
## [1] 45
# combn(10,2)
factorial(10)
## [1] 3628800
prod(1:10)
## [1] 3628800
a<-c(1,2,3,5,0,9)
which(a==min(a))
## [1] 5
sum(a)
## [1] 20
unique(a)
## [1] 1 2 3 5 0 9
length(a)
## [1] 6
min(a)
## [1] 0
max(a)
## [1] 9
all(c(3,4) %in% a)
## [1] FALSE
plot and find the range of solve
f<-function(x){x^2-exp(x)}
uniroot(f,c(-0.8,-0.6)) $root
## [1] -0.7034781
f2<-function(x){abs(x^2-exp(x))}
optimize (f2,lower=-0.8,upper=-0.6)$minimum
## [1] -0.703479
questions: randomly select 3 numbers out of 1:10, the sum is 9.
badge<-1:10
sim<-10000
p<-numeric(sim)
for (i in 1:sim){ a<-sample(badge,3,replace=F)
p[i]<-sum(a)==9 }
mean(p)
## [1] 0.0278
questions: eat three flavors tangyuan.
Tangyuan<-c(rep('A',8),rep('B',8),rep('C',8))
sim<-10000
p<-numeric(sim)
# how to do it according to the condition
for (i in 1:sim){
a<-sample(Tangyuan,24,replace=F)
p[i]<-(length(unique(a[1:6]))==3)&(length(unique(a[7:12]))==3)&(length(unique(a[13:18]))==3)&(length(unique(a[19:24]))==3)
}
mean(p)
## [1] 0.4916
question: select 2 balls when they are the same color.
box1<-c(rep('white',5), rep("black",11), rep('red',8))
box2<-c(rep('white',10), rep("black",8), rep('red',6))
sim<-10000
p<-numeric(sim)
for (i in 1:sim){
a<-sample(box1, 1)
b<-sample(box2, 1)
p[i]<- a==b
}
mean(p)
## [1] 0.3242
select after putting them back
box<-c(rep("white",4),rep("red",2))
sim<-10000
t<-numeric(sim)
for (i in 1:sim){
a<-sample(box, 2 ,replace=T)
# there are two white balls
t[i]<-length(a[a=="white"])==2
}
mean(t)
## [1] 0.4539
question: two students have the same birthday out of 30 students
n<-30
sim<-10000
t<-numeric(sim)
for (i in 1:sim){
a<-sample(1:365, n, replace=T)
t[i]<-n-length(unique(a))
}
1-mean(t==0)
## [1] 0.709
# probability
1-prod(365:(365-30+1))/365^30
## [1] 0.7063162
An event is a set of outcomes. You can describe certain events using random variables (x, distribution). the distribution of random variable function.
question: choose correct one out of four answers
x<-0:5
y<-dbinom(x,5,1/4)
plot(x,y,col=2,type='h')
the probability of shooting is 0.02, what is the most likelihood of hit with 400 shootingsk<-0:400
p<-dbinom(k,400,0.02)
plot(k,p,type='h',col=2)
plot(k,p,type='h',col=2,xlim=c(0,20))
dbinom(7,400,0.02)
## [1] 0.1406443
dbinom(8,400,0.02)
## [1] 0.1410031
question: lifetime of a light (lamda=1/2000)
integrate(dexp,rate=1/2000,1000,Inf)$value
## [1] 0.6065307
f<-function(x){dexp(x,rate=1/2000)}
integrate(f,1000,Inf) $value
## [1] 0.6065307
1-pexp(1000,rate=1/2000)
## [1] 0.6065307
mean(rexp(10000,rate=1/2000)>1000)
## [1] 0.6059
continue variable
x<-seq(-3,3,0.01)
plot(x, dnorm(x,mean=0, sd=2),type="l",xlab="x",ylab = "f(x)", col=1,lwd=2,ylim=c(0,1)) #density function
lines(x, dnorm(x,mean=0, sd=1),lty=2, col=2,lwd=2)
lines(x, dnorm(x,mean=0, sd=0.5), lty=3,col=3,lwd=2)
exbeta<-c(expression(paste(mu,"=0,", sigma,"=2")), expression(paste(mu,"=0,",sigma,"=1")), expression(paste(mu,"=0,", sigma,"=0.5")))
legend("topright", exbeta, lty = c(1, 2,3),col=c(1,2,3),lwd=2)
x<-seq(-3,3,0.01)
plot(x, dnorm(x,mean=-1, sd=1),type="l",xlab="x",ylab = "f(x)", col=1,lwd=2,ylim=c(0,0.6))
lines(x, dnorm(x,mean=0, sd=1),lty=2, col=2,lwd=2)
lines(x, dnorm(x,mean=1, sd=1), lty=3,col=3,lwd=2)
exbeta<-c(expression(paste(mu,"=-1,", sigma,"=1")), expression(paste(mu,"=0,",sigma,"=1")), expression(paste(mu,"=1,", sigma,"=1")))
legend("topright", exbeta, lty = c(1, 2,3),col=c(1,2,3),lwd=2)
question: solve sigma using nomoral distribution
sigma<-1
repeat{
sigma<-sigma+0.01
if (pnorm(200,160,sigma)-pnorm(120,160,sigma)<0.80) break
}
sigma
## [1] 31.22
# alternative
sigma<-1
while( pnorm(200,160,sigma)-pnorm(120,160,sigma)>=0.80){sigma<-sigma+0.01}
sigma
## [1] 31.22
qestion: x^2 and 2x distributions
x<-c(-1,0,1,2,2.5)
weight<-c(0.2,0.1,0.1,0.3,0.3)
toss<-sample(x,10000,replace=T,weight)
table(toss^2)/length(toss^2)
##
## 0 1 4 6.25
## 0.0981 0.2995 0.2998 0.3026
table(2*toss)/length(2*toss)
##
## -2 0 2 4 5
## 0.2031 0.0981 0.0964 0.2998 0.3026
quetsion: continous vairable density
x <- seq(0,5,0.01)
truth<-rep(0,length(x))
truth[0<=x&x<1]<-2/3
truth[1<=x&x<2]<-1/3
plot(density(abs(runif(1000000,-1,2))),main=NA, ylim=c(0,1),lwd=3,lty=3)
lines(x,truth,col="red",lwd=2)
legend("topright",c("True Density","Estimated Density"), col=c("red","black"),lwd=3,lty=c(1,3))
question: x is randomly selected from 1:4, y randomly select from x
p<-function(x,y) {
sim<-10000
t<-numeric(sim)
for (i in 1:sim) {
a<-sample(1:4,1)
b<-sample(1:a,1)
t[i]<-(a==x)&(b==y) }
mean(t)
}
PF<-matrix(0,4,4)
for (i in 1:4) {
for (j in 1:4) {
PF[i,j]<-p(i, j) } }
PF
## [,1] [,2] [,3] [,4]
## [1,] 0.2412 0.0000 0.0000 0.0000
## [2,] 0.1264 0.1267 0.0000 0.0000
## [3,] 0.0893 0.0827 0.0844 0.0000
## [4,] 0.0629 0.0645 0.0612 0.0594
apply(PF,1,sum)
## [1] 0.2412 0.2531 0.2564 0.2480
apply(PF,2,sum)
## [1] 0.5198 0.2739 0.1456 0.0594
2 discrete variables distribution
x<-sample(1:4, 10000, replace=T, prob=c(1/4, 1/4, 1/4, 1/4))
y<-numeric(10000)
for(i in 1:10000) {
if(x[i]==1) {y[i]<-sample(1:4,1,replace=T, prob=c(1,0,0,0))}
if(x[i]==2) {y[i]<-sample(1:4,1,replace=T, prob=c(1/2,1/2,0,0))}
if(x[i]==3) {y[i]<-sample(1:4,1,replace=T, prob=c(1/3,1/3,1/3,0))}
if(x[i]==4) {y[i]<-sample(1:4,1,replace=T, prob=c(1/4,1/4,1/4,1/4))}
}
z1<-x+y
table(z1)/length(z1)
## z1
## 2 3 4 5 6 7 8
## 0.2463 0.1292 0.2081 0.1469 0.1471 0.0622 0.0602
z2<-x*y
table(z2)/length(z2)
## z2
## 1 2 3 4 6 8 9 12 16
## 0.2463 0.1292 0.0815 0.1894 0.0841 0.0632 0.0839 0.0622 0.0602
z3<-pmax(x,y)
table(z3)/length(z3)
## z3
## 1 2 3 4
## 0.2463 0.2558 0.2495 0.2484
z4<-x/y
table(z4)/length(z4)
## z4
## 1 1.33333333333333 1.5 2
## 0.5170 0.0622 0.0841 0.1924
## 3 4
## 0.0815 0.0628
two normal distributions
X~N(0,1),Y~N(0,1), Z=X+Y, therefre Z~N(0,2)
Z<-function(n){
x<-seq(-4,4,0.01)
truth<-dnorm(x,0,sqrt(2))
plot(density(rnorm(n)+rnorm(n)),main="Density Estimate of the Normal Addition Model",ylim=c(0,0.4),lwd=2,lty=2)
lines(x,truth,col="red",lwd=2)
legend("topright",c("True","Estimated"),col=c("red","black"),lwd=2,lty=c(1,2))
}
Z(10000)
D={(x,y)|x^2 + y^2 <= 1}
x<-runif(10000,-1,1)
y<-runif(10000,-1,1)
a<-x[x^2+y^2<=1]
b<-y[x^2+y^2<=1]
plot(a,b,col=4)
oval
a<-3
b<-1
x<-runif(10000,-a,a)
y<-runif(10000,-b,b)
x1<-x[x^2/a^2+y^2/b^2<=1]
y1<-y[x^2/a^2+y^2/b^2<=1]
plot(x1,y1,col=3)
discrete variable
question: the benefit of products is different.
sim<-10000
t<-numeric(sim)
for (i in 1:sim) {
Y<-1500
X<-rexp(1,rate=1/10)
Y[1<X&X<=2]<-2000
Y[2<X&X<=3]<-2500
Y[3<X]<-3000
t[i]<-Y
}
mean(t)
## [1] 2731.7
continue variable
###Central Limit Theorem for Expotential distribution
layout(matrix(c(1,3,2,4 ),ncol=2))
r<-1000
lambda<-1/100
for (n in c(1,5,10,30)){
mu<-1/lambda
xbar<-numeric(r)
sxbar<-1/(sqrt(n)*lambda)
for(i in 1:r){
xbar[i]<-mean(rexp(n,rate=lambda))
}
hist(xbar,prob=T,main=paste('SampDist.Xbar,n=',n),col=gray(.8))
Npdf<-dnorm(seq(mu-3*sxbar,mu+3*sxbar,0.01),mu,sxbar)
lines(seq(mu-3*sxbar,mu+3*sxbar,0.01),Npdf,lty=2,col=2)
box()
}
#####The central limit theorem for uniform distribution
layout(matrix(c(1,3,2,4),ncol=2))
r<-10000
mu<-5
sigma<-10/sqrt(12)
for (n in c(1,5,10,30)){
xbar<-numeric(r)
sxbar<-sigma/sqrt(n)
for (i in 1:r){
xbar[i]<-mean(runif(n,0,10))}
hist(xbar,prob=T,main=paste('SampDist.Xbar,n=',n),col=gray(0.8),ylim=c(0,1/(sqrt(1*pi)*sxbar)))
XX<-seq(mu-3*sxbar,mu+3*sxbar,0.01)
Npdf<-dnorm(XX,mu,sxbar)
lines(XX,Npdf,lty=2,col=2)
box()}
N <- 5000
set.seed(123)
x <- sample(1:10, N, replace = T)
s <- cumsum(x)
r.avg <- s/(1:N)
options(scipen = 10)
plot(r.avg, ylim=c(1, 10), type = "l", xlab = "Observations"
,ylab = "Probability", lwd = 2)
lines(c(0,N), c(5.5,5.5),col="red", lwd = 2)
x<-c(-2,-1.2,1.5,2.3,3.5)
plot(ecdf(x),col=2)
abline(v=0,col=3)
question: three numbers are from N(2,9), and what is the prob of their mean >3?
A<-matrix(rnorm(30000,2,3),10000,3)
mean(apply(A,1,mean)>3)
## [1] 0.2789
question: eatimate mean and variance
sample<-c(1.38, 3.96, -0.16, 8.12, 6.30, 2.61, -1.35, 0.03, 3.94, 1.11)
n<-length(sample)
muhat<-mean(sample)
sigsqhat<-sum((sample-muhat)^2)/n
muhat
## [1] 2.594
sigsqhat
## [1] 8.133884
loglike<-function(theta){
a<--n/2*log(2*pi)-n/2*log(theta[2])-sum((sample-theta[1])^2)/(2*theta[2])
return(-a)
}
optim(c(2,2),loglike,method="BFGS")$par
## [1] 2.593942 8.130340
n<-30
x<-seq(-6,6,0.01)
y<-seq(-6,6,1)
Truth<-df(x,1,n)
plot(density(rt(10000,n)^2),main="PDF",ylim=c(0,1),lty=2,xlim=c(-6,6)) #simulation
lines(x, dt(x,n), col=3) #t dist
lines(x, dchisq(x,2), col=4) #chisq dist
lines(x,Truth,col=2) #f dist
abline (v=0 ,col=7)
points(y,dbinom(y, size=12, prob=0.2),col=1) #binomial dist
points(y,dpois(y, 6),col=2) #poisson dist
lines(y,dunif(-6:6,min=-6,max=6 ),col=5) #uniform