1.1 Linear programming

In mathematics, linear programming is a method of optimising operations under given constraints. The main objective of linear programming is to maximize or minimize a numerical value (the objective function). It consists of linear functions that are subject to constraints in the form of linear equations or inequalities. Linear programming is considered an important technique for finding the optimum use of resources. The term “linear programming” consists of two words, “linear” and “programming”. The word “linear” indicates that the relationships among the variables are of degree one. The word “programming” refers to the process of selecting the best solution from various alternatives.
1.2 Characteristics of the linear programming problem

Constraints: The limitations on the resources should be expressed in mathematical form.

Objective Function: The objective of the problem should be specified in a quantitative way.

Linearity: The relationship between two or more variables in the function must be linear; that is, each variable has degree one.

Finiteness: There should be a finite number of input and output values. If the function has infinitely many factors, an optimal solution is not feasible.

Non-negativity: The variable values should be positive or zero, not negative.

Decision Variables: The decision variables determine the output and give the ultimate solution of the problem. For any problem, the first step is to identify the decision variables.
1.3 Methods to Solve Linear Programming Problems

A linear programming problem can be solved using different methods, such as the graphical method or the simplex method, or with software tools such as R, OpenSolver, etc.
Solve the following problem: \[Maximize, z = 5x_1 + 7x_2\] Subject to, \[x_1 \leq 16\] \[2x_1 + 3x_2 \leq 19\] \[x_1 + x_2 \leq 8\] \[x_1, x_2 \geq 0\]
#install.packages("lpSolve")
library(lpSolve)
obj<-c(5,7)                                        # objective function coefficients
cons<-matrix(c(1,0,2,3,1,1),nrow = 3,byrow = T)    # constraint coefficients, one row per constraint
dir<-c("<=","<=","<=")                             # directions of the constraints
rhs<-c(16,19,8)                                    # right-hand sides
lp(direction = "max",objective.in = obj,const.mat = cons,const.dir = dir,const.rhs = rhs)
## Success: the objective function is 46
r<-lp(direction = "max",objective.in = obj,const.mat = cons,const.dir = dir,const.rhs = rhs)
r$objval
## [1] 46
r$solution
## [1] 5 3
Solve the following problem: \[Maximize,Z = 2x_1 + x_2\] Subject to, \[x_2 \leq 10\] \[2x_1 + 5x_2 \leq 60\] \[x_1 + x_2 \leq 18\] \[3x_1 + x_2 \leq 44\] and \[x_1 \geq 0, x_2 \geq 0\]
obj<-c(2,1)
cons<-matrix(c(0,1,2,5,1,1,3,1),nrow = 4,byrow = T)
dir<-c("<=","<=","<=","<=")
rhs<-c(10,60,18,44)
r<-lp(direction = "max",objective.in = obj,const.mat = cons,const.dir = dir,const.rhs = rhs)
r$objval
## [1] 31
r$solution
## [1] 13 5
Solve the following problem: \[Minimize, Z = 15x_1 + 20x_2\] subject to \[x_1 + 2x_2 ≥ 10\] \[2x_1 − 3x_2 ≤ 6\] \[x_1 + x_2 ≥ 6\] and \[x_1 ≥ 0, x_2 ≥ 0\]
obj<-c(15,20)
cons<-matrix(c(1,2,2,-3,1,1),nrow = 3,byrow = T)
dir<-c(">=","<=",">=")
rhs<-c(10,6,6)
r<-lp(direction = "min",objective.in = obj,const.mat = cons,const.dir = dir,const.rhs = rhs)
r$objval
## [1] 110
r$solution
## [1] 2 4
Ralph Edmund loves steaks and potatoes. Therefore, he has decided to go on a steady diet of only these two foods (plus some liquids and vitamin supplements) for all his meals. Ralph realizes that this isn't the healthiest diet, so he wants to make sure that he eats the right quantities of the two foods to satisfy some key nutritional requirements. He has obtained the following nutritional and cost information:
| Ingredient | Grams per serving of steak | Grams per serving of potatoes | Daily requirement (grams) |
|---|---|---|---|
| Carbohydrates | 5 | 15 | \(\geq 50\) |
| Protein | 20 | 5 | \(\geq 40\) |
| Fat | 15 | 2 | \(\leq 60\) |
| Cost per serving | \(\$4\) | \(\$2\) |  |
Ralph wishes to determine the number of daily servings (may be fractional) of steak and potatoes that will meet these requirements at a minimum cost.
obj<-c(4,2)                                         # cost per serving: steak $4, potatoes $2
cons<-matrix(c(5,15,20,5,15,2),nrow = 3,byrow = T)  # carbohydrate, protein and fat content per serving
dir<-c(">=",">=","<=")
rhs<-c(50,40,60)
r<-lp(direction = "min",objective.in = obj,const.mat = cons,const.dir = dir,const.rhs = rhs)
r$objval
## [1] 10.90909
r$solution
## [1] 1.272727 2.909091
The Apex Television Company has to decide on the number of 27-inch and 20-inch sets to be produced at one of its factories. Market research indicates that at most 40 of the 27-inch sets and 10 of the 20-inch sets can be sold per month. The maximum number of work-hours available is 500 per month. A 27-inch set requires 20 work-hours and a 20-inch set requires 10 work-hours. Each 27-inch set sold produces a profit of $120 and each 20-inch set produces a profit of $80. A wholesaler has agreed to purchase all the television sets produced if the numbers do not exceed the maximum indicated by the market research.
obj<-c(120,80)
cons<-matrix(c(1,0,0,1,20,10),nrow = 3,byrow = T)
dir<-c("<=","<=","<=")
rhs<-c(40,10,500)
r<-lp(direction = "max",objective.in = obj,const.mat = cons,const.dir = dir,const.rhs = rhs)
r$objval
## [1] 3200
r$solution
## [1] 20 10
1 Life table

A life table is a statistical tool that summarizes the mortality experience of a population and yields information about longevity and life expectancy. Although it is generally used for studying mortality, the life table format can be used to summarize any duration variable, such as duration of marriage, duration of contraceptive use, etc.
2 Columns of a life table

A typical life table contains several columns, each with a unique interpretation. The columns are:
x : Age at exact (single) year (For complete life table) or at interval of years (For abridged life table)
qx : Probability of death between exact age (x, x + 1) (For complete life table) or (x, x + n) (For abridged life table denoted by nqx).
lx : Number of persons surviving to age x.
dx : Number of deaths between age (x, x + 1) (For complete life table) or (x, x + n) (For abridged life table denoted by ndx).
Lx : Number of person years lived by the cohort between age (x, x + 1) (For complete life table) or (x, x + n) (For abridged life table denoted by nLx) i.e, Time (in year/day/month/week/any other unit) contribution by each member of the cohort between age (x, x + 1) or (x, x + n). The formula of Lx (for complete life table) is: \[L_x = l_x − \frac{d_x}{2}\] or \[L_x = \frac{(l_x + l_{x+1})}{2}\]
For abridged life table,
\[ _nL_x=\frac{n(l_x+l_{x+n})}{2} \]
Tx : Number of person-years lived beyond age x. The formula of Tx is: \[T_x = L_x + L_{x+1} + \cdots + L_{\text{end of the table}} = L_x + T_{x+1}\]
ex : Expectation of life at age x, i.e., how many more years a person may live given that his/her current age is x. The formula of ex is: \[e_x =\frac {T_x}{l_x}\]

3 Notations in life table
| Name Of The Function | For Complete Life table | For Abridged Life Table |
|---|---|---|
| Age | x | x −→ (x + n) |
| Death Prob. | \(q_x\) | \(_nq_x\) |
| No. of deaths | \(d_x\) | \(_nd_x\) |
| Survived persons | \(ℓ_x\) | \(ℓ_x\) |
| Survived person year | \(L_x\) | \(_nL_x\) |
| Survival Time beyond x | \(T_x\) | \(T_x\) |
| Expectation Of Life At Age x | \(e_x\) | \(e_x\) |
4 Fergany’s method of constructing a life table

In this section we construct an ordinary life table from data on age-specific death rates, based on a simple method suggested by Fergany (1971). In this method the age-specific death rate (nmx) is converted into the proportion dying in the age interval (nqx) using a simple formula: \[_nq_x = 1 − e^{−n \cdot {}_nm_x}\] Once nqx is calculated from the age-specific death rates, the remaining columns of the life table are easily calculated using the following relationships: \[l_0 = 100000\] \[_nd_x = l_x \cdot {}_nq_x\] \[l_{x+n} = l_x − {}_nd_x\] \[_nL_x =\frac {_nd_x}{_nm_x}\] \[T_x = \sum_x^{\text{end of table}} ({}_nL_x)\] \[e_x^0 = \frac {T_x}{l_x}\]
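These relationships translate directly into R. Below is a minimal sketch, assuming a vector of age-group start ages and hypothetical age-specific death rates (for real data, use a source such as life-table-practice.csv):

x <- c(0, 1, 5, 15, 45, 65)                        # start of each age interval (hypothetical)
n <- c(diff(x), Inf)                               # interval widths; the last interval is open-ended
nmx <- c(0.060, 0.009, 0.002, 0.004, 0.015, 0.080) # hypothetical age-specific death rates
nqx <- 1 - exp(-n * nmx)                           # proportion dying in each interval (Fergany's formula)
nqx[length(nqx)] <- 1                              # everyone eventually dies in the open-ended interval
lx <- 100000 * cumprod(c(1, 1 - head(nqx, -1)))    # survivors at exact age x (radix 100000)
ndx <- lx * nqx                                    # deaths in each interval
nLx <- ndx / nmx                                   # person-years lived in each interval
Tx <- rev(cumsum(rev(nLx)))                        # person-years lived beyond age x
ex <- Tx / lx                                      # expectation of life at age x
data.frame(x, nmx, nqx, lx, ndx, nLx, Tx, ex)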
Practice problem: (Please find the file life-table-practice.csv in Google Classroom)

Further reading: https://www.measureevaluation.org/resources/training/online-courses-and-resources/noncertificate-courses-and-mini-tutorials/multiple-decrement-life-tables/lesson-3.html
Consider the following two vectors: x = [1, 3, 2] y = [−2, 1,−1]
x<-c(1,3,2)
y<-c(-2,1,-1)
#length of x:
m<-t(x)%*%x
l_1<-sqrt(m)
sqrt(t(x)%*%x)
## [,1]
## [1,] 3.741657
#length of y:
n<-t(y)%*%y
l_2<-sqrt(n)
sqrt(t(y)%*%y)
## [,1]
## [1,] 2.44949
angle<-acos(t(x)%*%y/l_1/l_2)
# install.packages("REdaS")
library(REdaS)
## Warning: package 'REdaS' was built under R version 4.3.2
## Loading required package: grid
rad2deg(angle)
## [,1]
## [1,] 96.26395
# REdaS (already loaded above) also provides deg2rad() for the reverse conversion
library(REdaS)
# Calling deg2rad() function
deg2rad(0)
## [1] 0
deg2rad(30)
## [1] 0.5235988
deg2rad(50)
## [1] 0.8726646
Consider the following matrix & provide answer to the given questions: \[A =\left(\begin{array} {cc}1 & -5 \\ -5 & 1 \end{array}\right)\]
a<-matrix(data = c(1,-5,-5 ,1),nrow = 2,byrow = T)
eig<-eigen(a)
#(i)
eig$values
## [1] 6 -4
eig$vectors
## [,1] [,2]
## [1,] -0.7071068 -0.7071068
## [2,] 0.7071068 -0.7071068
#(ii)
det(a)
## [1] -24
prod(eig$values)
## [1] -24
1 Spectral Decomposition

Let A be a square n × n matrix with n linearly independent eigenvectors qi (where i = 1, . . . , n). Then A can be factorized as \[A = QΛQ^{−1}\] where Q is the square n × n matrix whose i-th column is the eigenvector qi of A, and Λ is the diagonal matrix whose diagonal elements are the corresponding eigenvalues, \[Λ_{ii} = λ_i\]. The terminology “spectral decomposition” derives from the fact that the set of eigenvalues of a matrix is also called the “spectrum” of the matrix.
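As a quick numeric check of the factorization, here is a small sketch with an arbitrary non-symmetric matrix (so that \(Q^{-1}\), not \(Q^⊤\), is required):

A <- matrix(c(2, 1, 0, 3), nrow = 2, byrow = TRUE)  # arbitrary matrix with distinct eigenvalues
e <- eigen(A)
Q <- e$vectors                # eigenvectors as columns
Lambda <- diag(e$values)      # eigenvalues on the diagonal
Q %*% Lambda %*% solve(Q)     # reproduces A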
1.1 Computation of power series of matrices using spectral decomposition

\[A^2 =(QΛQ^{−1})(QΛQ^{−1})= QΛ(Q^{−1}Q)ΛQ^{−1} = QΛ^2Q^{−1}\] \[A^n = QΛ^nQ^{−1}\]
1.2 Decomposition of real symmetric matrices

For every real symmetric matrix A there exists an orthogonal matrix Q and a diagonal matrix Λ such that \[A = QΛQ^⊤\] where Q is an orthogonal matrix whose columns are the (real and orthonormal) eigenvectors of A, and Λ is a diagonal matrix whose entries are the eigenvalues of A.
Consider the following matrix & provide the appropriate answer to the given questions: \[A =\left(\begin{array}{ccc} 13 & -4 & 2\\ -4 & 13 & -2\\ 2 & -2 & 10 \end{array}\right) \]
b<-matrix(data = c(13, -4, 2,-4, 13, -2,2, -2, 10),nrow = 3,byrow = T)
eigen<-eigen(b)
Q<-eigen$vectors
delta<-diag(x = eigen$values)
Q
## [,1] [,2] [,3]
## [1,] 0.6666667 -0.7453560 0.0000000
## [2,] -0.6666667 -0.5962848 0.4472136
## [3,] 0.3333333 0.2981424 0.8944272
delta
## [,1] [,2] [,3]
## [1,] 18 0 0
## [2,] 0 9 0
## [3,] 0 0 9
Q %*% delta %*% t(Q)
## [,1] [,2] [,3]
## [1,] 13 -4 2
## [2,] -4 13 -2
## [3,] 2 -2 10
Q<-eigen$vectors
delta<-diag(x = eigen$values^-1)
Q
## [,1] [,2] [,3]
## [1,] 0.6666667 -0.7453560 0.0000000
## [2,] -0.6666667 -0.5962848 0.4472136
## [3,] 0.3333333 0.2981424 0.8944272
delta
## [,1] [,2] [,3]
## [1,] 0.05555556 0.0000000 0.0000000
## [2,] 0.00000000 0.1111111 0.0000000
## [3,] 0.00000000 0.0000000 0.1111111
Q %*% delta %*% t(Q)
## [,1] [,2] [,3]
## [1,] 0.08641975 0.02469136 -0.01234568
## [2,] 0.02469136 0.08641975 0.01234568
## [3,] -0.01234568 0.01234568 0.10493827
Q<-eigen$vectors
delta<-diag(x = (eigen$values)^3)
Q
## [,1] [,2] [,3]
## [1,] 0.6666667 -0.7453560 0.0000000
## [2,] -0.6666667 -0.5962848 0.4472136
## [3,] 0.3333333 0.2981424 0.8944272
delta
## [,1] [,2] [,3]
## [1,] 5832 0 0
## [2,] 0 729 0
## [3,] 0 0 729
Q %*% delta %*% t(Q)
## [,1] [,2] [,3]
## [1,] 2997 -2268 1134
## [2,] -2268 2997 -1134
## [3,] 1134 -1134 1296
Q<-eigen$vectors
delta<-diag(x = (eigen$values)^10)
Q
## [,1] [,2] [,3]
## [1,] 0.6666667 -0.7453560 0.0000000
## [2,] -0.6666667 -0.5962848 0.4472136
## [3,] 0.3333333 0.2981424 0.8944272
delta
## [,1] [,2] [,3]
## [1,] 3.570467e+12 0 0
## [2,] 0.000000e+00 3486784401 0
## [3,] 0.000000e+00 0 3486784401
Q %*% delta %*% t(Q)
## [,1] [,2] [,3]
## [1,] 1.588811e+12 -1.585325e+12 792662320494
## [2,] -1.585325e+12 1.588811e+12 -792662320494
## [3,] 7.926623e+11 -7.926623e+11 399817944648
Birth interval data

You are given a dataset in SPSS format, BDHS-birth-interval-data.sav. The data contain information on the birth intervals of 5607 children of age under 5 years, with associated background characteristics. The variables available in this dataset are given below:
| Variable Name | Variable label |
|---|---|
| caseid | case identification |
| v013 | age in 5-year groups |
| v024 | region |
| v025 | type of place of residence |
| v106 | highest educational level |
| v130 | religion |
| v190 | wealth index |
| v447a | women’s age in years |
| bord | birth order number |
| b4 | sex of child |
| b5 | child is alive |
| b8 | current age of child |
| b11 | preceding birth interval (months) |
| b12 | succeeding birth interval (months) |
library(haven)
BDHS_birth_interval_data <-
read_sav("D:/3rd Year/AST-332 batch-27/BDHS-birth-interval-data.sav")
#View(BDHS_birth_interval_data)
d <- BDHS_birth_interval_data
hist(d$b11)
qqnorm(d$b11)
qqline(d$b11)
summary(d$b11)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.00 31.00 47.00 53.84 70.00 246.00
#quantile(x = d$b11,probs = c(.25,.75))
sd(d$b11)
## [1] 30.29951
tapply(X = d$b11,INDEX = d$v106,FUN = mean)
## 0 1 2 3
## 51.93180 53.23751 55.08089 59.07197
tapply(X = d$b11,d$v190,mean)
## 1 2 3 4 5
## 47.57227 51.00567 55.96158 57.47862 60.39755
#aggregate(x = d$b11, list(d$v106),mean)
tapply(X = d$b11,d$v106,mean)
## 0 1 2 3
## 51.93180 53.23751 55.08089 59.07197
tapply(X = d$b11,d$v106,summary)
## $`0`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.00 29.00 44.00 51.93 67.00 214.00
##
## $`1`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.00 31.00 47.00 53.24 68.00 246.00
##
## $`2`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.00 33.00 50.00 55.08 72.00 191.00
##
## $`3`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 11.00 35.75 54.50 59.07 74.25 172.00
tapply(X = d$b11,d$v190,summary)
## $`1`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.00 29.00 41.00 47.57 59.00 225.00
##
## $`2`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.00 29.00 45.00 51.01 67.00 214.00
##
## $`3`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.00 32.00 49.00 55.96 72.00 246.00
##
## $`4`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.00 33.00 52.00 57.48 74.00 186.00
##
## $`5`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.00 35.25 55.00 60.40 78.00 209.00
t.test(x = d$b11,mu = 48,alternative = "greater")
##
## One Sample t-test
##
## data: d$b11
## t = 13.978, df = 5256, p-value < 2.2e-16
## alternative hypothesis: true mean is greater than 48
## 95 percent confidence interval:
## 53.15386 Inf
## sample estimates:
## mean of x
## 53.84135
data_sylet<-d[d$v024==7,]
#data_sylet
t.test(x =data_sylet$b11,mu = 48,alternative = "less")
##
## One Sample t-test
##
## data: data_sylet$b11
## t = -4.7807, df = 970, p-value = 1.009e-06
## alternative hypothesis: true mean is less than 48
## 95 percent confidence interval:
## -Inf 45.42214
## sample estimates:
## mean of x
## 44.06797
#d$v106
data_edu<-d[d$v106 ==3 &
d$v447a >=19 & d$v447a <=35,]
#View(data_edu)
t.test(x =data_edu$b11,mu = 60,alternative = "greater")
##
## One Sample t-test
##
## data: data_edu$b11
## t = -2.503, df = 221, p-value = 0.9935
## alternative hypothesis: true mean is greater than 60
## 95 percent confidence interval:
## 52.29857 Inf
## sample estimates:
## mean of x
## 55.36036
table(d$b5)
##
## 0 1
## 5033 224
data_urban<-d[d$v025 ==1,]
data_rural<-d[d$v025 ==2,]
t.test(x = data_urban$b11,
y = data_rural$b11,
paired = F,
alternative = "greater",
var.equal = F
)
##
## Welch Two Sample t-test
##
## data: data_urban$b11 and data_rural$b11
## t = 6.8243, df = 2459.5, p-value = 5.544e-12
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 5.017497 Inf
## sample estimates:
## mean of x mean of y
## 58.59797 51.98625
data_dhaka<-d[d$v024 ==3,]
data_chittagong<-d[d$v024 ==2,]
t.test(x = data_dhaka$b11,y=data_chittagong$b11,paired = F,alternative = "two.sided",var.equal = F)
##
## Welch Two Sample t-test
##
## data: data_dhaka$b11 and data_chittagong$b11
## t = 4.5309, df = 1644.1, p-value = 6.296e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 3.432866 8.673743
## sample estimates:
## mean of x mean of y
## 54.15284 48.09954
data_poorest<-d[d$v190 ==1,]
data_richest<-d[d$v190 ==5,]
t.test(x = data_poorest$b11,
y = data_richest$b11,
paired = F,
alternative = "less",
var.equal = F)
##
## Welch Two Sample t-test
##
## data: data_poorest$b11 and data_richest$b11
## t = -9.8727, df = 1607.7, p-value < 2.2e-16
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -10.68727
## sample estimates:
## mean of x mean of y
## 47.57227 60.39755
t.test(x = d$b11,y = d$b12,paired = T,alternative = "two.sided")
##
## Paired t-test
##
## data: d$b11 and d$b12
## t = 12.56, df = 683, p-value < 2.2e-16
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## 11.21813 15.37544
## sample estimates:
## mean difference
## 13.29678
table(d$b5,d$v106)
##
## 0 1 2 3
## 0 1288 1708 1777 260
## 1 61 94 65 4
#table(d$v106, d$b5)
prop.table(table(d$b5,d$v106),margin = 2)
##
## 0 1 2 3
## 0 0.95478132 0.94783574 0.96471227 0.98484848
## 1 0.04521868 0.05216426 0.03528773 0.01515152
#round(pi,5)
round(100*prop.table(table(d$b5,d$v106),margin = 2),0)
##
## 0 1 2 3
## 0 95 95 96 98
## 1 5 5 4 2
#Does the data you extracted from BDHS suggest any evidence of association between mother’s education and child mortality?
chisq.test(d$b5,d$v106)
##
## Pearson's Chi-squared test
##
## data: d$b5 and d$v106
## X-squared = 11.558, df = 3, p-value = 0.009063
Calculate the power of the following test considering level of significance α = 0.05, σ² = 64, n = 16, and the true mean μ = 102. Additionally, create a power curve while varying the difference between the means within a range of up to 10 in both directions. \[H_0 : μ = 100\] \[H_a : μ \neq 100\]
powerfun<-function(alpha=.05,sigma,n,mu,pmean){
a=qnorm(alpha/2)*sigma/sqrt(n)+mu
b=-qnorm(alpha/2)*sigma/sqrt(n)+mu
l=(a-pmean)/(sigma/sqrt(n))
u=(b-pmean)/(sigma/sqrt(n))
beta=pnorm(u)-pnorm(l)
power<-1-beta
return(power)
}
powerfun(alpha = .05,sigma = 8,n = 16,mu = 100,pmean = 102)
## [1] 0.170075
#############################################
# power curve when the true mean varies from 90 to 110 (pmean = 90:110)
plot(x=90:110,y = powerfun(alpha = .05,sigma = 8,n = 16,mu = 100,pmean = 90:110)
,type = "l")
#############################################
# power curve when the sample size varies from 15 to 100
plot(x=15:100,y = powerfun(alpha = .05,sigma = 8,n = 15:100,mu = 100,pmean = 105)
,type = "l")
#############################################
# power curve when the population standard deviation varies from 50 to 80
plot(x=50:80,y = powerfun(alpha = .05,sigma = 50:80,n = 16,mu = 100,pmean = 102)
,type = "l")
#############################################
# power curve when the level of significance varies from 0.01 to 0.1
alpha=seq(.01,.1,by=.01)
plot(x=alpha,y = powerfun(alpha = alpha,sigma = 8,n = 16,mu = 100,pmean = 102)
,type = "l")
# by hand: P(xbar below the lower critical value when the true mean is 102)
b<-(-1.96*2+100-102)/2
pnorm(q = (-1.96*2+100-102)/2)
## [1] 0.001538195
power <- function(sigma, n, mu_a) {
l=(-1.96*sigma/sqrt(n) +100-mu_a)/(sigma/sqrt(n))
u=(1.96*sigma/sqrt(n) +100-mu_a)/(sigma/sqrt(n))
beta=pnorm(u)-pnorm(l)
1-beta
}
power(sigma = 8,n = 16,mu_a = 102)
## [1] 0.1700658
mu_a<-90:110
store<-NULL
for(i in 1:length(mu_a)){
store[i]<- power(sigma = 8,n = 16,mu_a = mu_a[i])
}
store
## [1] 0.99881711 0.99445738 0.97932484 0.93821985 0.85083040 0.70540558
## [7] 0.51599091 0.32302820 0.17006580 0.07909189 0.04999579 0.07909189
## [13] 0.17006580 0.32302820 0.51599091 0.70540558 0.85083040 0.93821985
## [19] 0.97932484 0.99445738 0.99881711
plot(mu_a,store,type = "l")
power <- function(alpha =.05, sigma, n, mu_a) {
# one-sided (greater-than) alternative: reject H0 when xbar > 100 - qnorm(alpha)*sigma/sqrt(n)
u=(-qnorm(alpha)*sigma/sqrt(n) +100-mu_a)/(sigma/sqrt(n))
beta=pnorm(u)
1-beta
}
power(sigma = 8,n = 16,mu_a = 102)
## [1] 0.259511
mu_a<-90:110
store<-NULL
for(i in 1:length(mu_a)){
store[i]<- power(sigma = 8,n = 16,mu_a = mu_a[i])
}
plot(mu_a,store,type = "l")
Do the same as Question 4, for a less-than type alternative hypothesis. \[H_0 : μ = 100\] \[H_a : μ < 100\]
powerfun1<-function(alpha=.05,sigma,n,mu,pmean){
a=qnorm(alpha)*sigma/sqrt(n)+mu
l=(a-pmean)/(sigma/sqrt(n))
beta=1-pnorm(l)
power<-1-beta
return(power)
}
powerfun1(alpha = .05,sigma = 8,n = 16,mu = 100,pmean = 102)
## [1] 0.004086313
Do the same as Question 4, for a greater-than type alternative hypothesis. \[H_0 : μ = 100\] \[H_a : μ > 100\]
powerfun<-function(alpha=.05,sigma,n,mu,pmean){
b=-qnorm(alpha)*sigma/sqrt(n)+mu
u=(b-pmean)/(sigma/sqrt(n))
beta=pnorm(u)
power<-1-beta
return(power)
}
powerfun(alpha = .05,sigma = 8,n = 16,mu = 100,pmean = 102)
## [1] 0.259511
https://online.stat.psu.edu/stat505/lesson/7/7.1/7.1.3
A company wants to assess whether there is a significant difference in the purchasing behavior of its customers across multiple product categories. They collect data from a random sample of 80 customers and record their average spending in three different product categories: electronics (X1), clothing (X2), and home appliances (X3). The data are contained in the expenditure.csv file. The company wants to determine if the average spending is significantly different from the hypothesized population mean vector [300, 100, 250]. Perform the appropriate test to answer this.
hote_one<-function(x,mu=rep( 0,times= ncol(x))){
n<-nrow(x)
p<-ncol(x)
x_bar<-apply(X = x,MARGIN = 2,FUN = mean)          # sample mean vector
sigma_inv<-solve(var(x))                           # inverse of the sample covariance matrix
t2<-n*t(x_bar-mu) %*% sigma_inv %*% (x_bar-mu)     # Hotelling's T2 statistic
f<- (n-p)/(p*(n-1))*t2                             # F transformation of T2
p_val<-pf(q = f,df1 = p,df2 = n-p,lower.tail = F)
return(list(t2=t2,F=f,p_value=p_val))
}
library(mvtnorm)
## Warning: package 'mvtnorm' was built under R version 4.3.2
n <- 100 # observations
p <- 18 # parameter dimension
mu <- rep(0,p) # no signal: mu=0
x <- rmvnorm(n = n, mean = mu)
dim(x)
## [1] 100 18
hote_one(x)
## $t2
## [,1]
## [1,] 32.26588
##
## $F
## [,1]
## [1,] 1.484738
##
## $p_value
## [,1]
## [1,] 0.1170151
#install.packages("DEoptimR")
#install.packages("rrcov")
library(rrcov)
## Warning: package 'rrcov' was built under R version 4.3.2
## Loading required package: robustbase
## Warning: package 'robustbase' was built under R version 4.3.2
## Scalable Robust Estimators with High Breakdown Point (version 1.7-4)
library(DEoptimR)
#DescTools::HotellingsT2Test(x = x)
rrcov::T2.test(x)
##
## One-sample Hotelling test
##
## data: x
## T2 = 32.2659, F = 1.4847, df1 = 18, df2 = 82, p-value = 0.117
## alternative hypothesis: true mean vector is not equal to (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)'
##
## sample estimates:
## [,1] [,2] [,3] [,4] [,5]
## mean x-vector 0.01369284 -0.01057615 0.1205584 -0.01158277 0.004766602
## [,6] [,7] [,8] [,9] [,10] [,11]
## mean x-vector 0.1472291 0.1494278 -0.1201253 -0.23926 -0.02874162 -0.01802641
## [,12] [,13] [,14] [,15] [,16] [,17]
## mean x-vector -0.03766983 0.1811258 0.1902189 -0.2508392 0.03109334 -0.01059385
## [,18]
## mean x-vector -0.02673035
# Ques 1.
library(readxl)
## Warning: package 'readxl' was built under R version 4.3.2
expenditure <- read_excel("D:/3rd Year/AST-332 batch-27/expenditure.xlsx")
#View(expenditure)
hote_one(expenditure[,-1], mu=c(300,100,250))
## $t2
## [,1]
## [1,] 3.216412
##
## $F
## [,1]
## [1,] 1.044995
##
## $p_value
## [,1]
## [1,] 0.3775634
rrcov::T2.test(expenditure[,-1], mu=c(300,100,250))
##
## One-sample Hotelling test
##
## data: expenditure[, -1]
## T2 = 3.2164, F = 1.0450, df1 = 3, df2 = 77, p-value = 0.3776
## alternative hypothesis: true mean vector is not equal to (300, 100, 250)'
##
## sample estimates:
## X1 (Electronics) X2 (Clothing) X3 (Home Appliances)
## mean x-vector 290.15 98.15 252
https://online.stat.psu.edu/stat505/lesson/7/7.1/7.1.12
Test whether the mean differs between the following two groups:
| Group 01 |  | Group 02 |  |
|---|---|---|---|
| x11 | x12 | x21 | x22 |
| 6 | 27 | 25 | 15 |
| 6 | 23 | 28 | 13 |
| 18 | 64 | 36 | 22 |
| 8 | 44 | 35 | 29 |
| 11 | 30 | 15 | 31 |
| 34 | 75 | 44 | 64 |
| 28 | 26 | 42 | 30 |
| 71 | 124 | 54 | 64 |
| 43 | 54 | 34 | 56 |
| 33 | 30 | 29 | 20 |
| 20 | 14 | 39 | 21 |
Help: Consider the following hypotheses to be tested: \[\begin{array}{c} H_0:\left[\begin{array}{l} \mu_{11} \\ \mu_{12} \end{array}\right]-\left[\begin{array}{l} \mu_{21} \\ \mu_{22} \end{array}\right]=\left[\begin{array}{l} \delta_1 \\ \delta_2 \end{array}\right]=\left[\begin{array}{l} 0 \\ 0 \end{array}\right] \\ H_1:\left[\begin{array}{l} \mu_{11} \\ \mu_{12} \end{array}\right]-\left[\begin{array}{l} \mu_{21} \\ \mu_{22} \end{array}\right]=\left[\begin{array}{l} \delta_1 \\ \delta_2 \end{array}\right] \neq\left[\begin{array}{l} 0 \\ 0 \end{array}\right] \end{array}\]
The corresponding test statistic is given by \[F = \frac{(n_1 + n_2 − p − 1) × T^2}{(n_1 + n_2 − 2) × p}∼ F_{p,n_1+n_2−p−1}\] where \[(1) T^2 = (\bar x −\bar y − δ)′ × S_p^{-1} × (\bar x −\bar y − δ)\] \[(2) S_p = \left( \frac {1}{n_1} + \frac {1}{n_2} \right)× S_{pooled}\] \[(3) S_{pooled} = \frac {(n_1−1)S_x+(n_2−1)S_y} {n_1+n_2−2}\]
hote_Two <- function(data1, data2, delta) {
data1 <- as.matrix(data1)
data2 <- as.matrix(data2)
p <- dim(data1)[2]
n1 <- dim(data1)[1]
n2 <- dim(data2)[1]
x1_bar = as.matrix(apply(X = data1, MARGIN = 2, FUN = mean))
x2_bar = as.matrix(apply(X = data2, MARGIN = 2, FUN = mean))
s1 = as.matrix(cov(data1))
s2 = as.matrix(cov(data2))
s_pool = as.matrix((s1 * (n1 - 1) + s2 * (n2 - 1)) / (n1 + n2 - 2))
m <- x1_bar - x2_bar - as.matrix(delta)
t2 <- t(m) %*% solve(s_pool * ((1 / n1) + (1 / n2))) %*% m
f = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
p_val = pf(
q = f,
df1 = p,
df2 = n1 + n2 - p - 1,
lower.tail = F
)
return(c(
T2 = t2,
F_stat = f,
p_value = p_val
))
}
#1st Population
x11 <- c(6, 6, 18, 8, 11, 34, 28, 71, 43, 33, 20)
x12 <- c(27, 23, 64, 44, 30, 75, 26, 124, 54, 30, 14)
#2nd Population
x21 <- c(25, 28, 36, 35, 15, 44, 42, 54, 34, 29, 39)
x22 <- c(15, 13, 22, 29, 31, 64, 30, 64, 56, 20, 21)
data1 <- data.frame(x11, x12)
data2 <- data.frame(x21, x22)
delta <- rep(0, 2)
hote_Two(data1 = data1,data2 = data2,delta = delta)
## T2 F_stat p_value
## 12.664804984 6.015782367 0.009462901
rrcov::T2.test(x = data1,y = data2)
##
## Two-sample Hotelling test
##
## data: data1 and data2
## T2 = 12.6648, F = 6.0158, df1 = 2, df2 = 19, p-value = 0.009463
## alternative hypothesis: true difference in mean vectors is not equal to (0,0)
## sample estimates:
## x11 x12
## mean x-vector 25.27273 46.45455
## mean y-vector 34.63636 33.18182
# When the data is in proper (long) format
dat <- data.frame(group=factor(rep(1:2, each=11)),
x1= c(x11, x21),
x2= c(x12, x22))
rrcov::T2.test(cbind(x1,x2)~group, data=dat)
##
## Two-sample Hotelling test
##
## data: cbind(x1, x2) by group
## T2 = 12.6648, F = 6.0158, df1 = 2, df2 = 19, p-value = 0.009463
## alternative hypothesis: true difference in mean vectors is not equal to (0,0)
## sample estimates:
## x1 x2
## mean x-vector 25.27273 46.45455
## mean y-vector 34.63636 33.18182
Test whether there exists a mean difference between two species of flower, ‘setosa’ and ‘versicolor’, for the built-in data set ‘iris’.
data("iris")
data1 <- iris[iris$Species == "setosa", ]
data1
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5.0 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
## 11 5.4 3.7 1.5 0.2 setosa
## 12 4.8 3.4 1.6 0.2 setosa
## 13 4.8 3.0 1.4 0.1 setosa
## 14 4.3 3.0 1.1 0.1 setosa
## 15 5.8 4.0 1.2 0.2 setosa
## 16 5.7 4.4 1.5 0.4 setosa
## 17 5.4 3.9 1.3 0.4 setosa
## 18 5.1 3.5 1.4 0.3 setosa
## 19 5.7 3.8 1.7 0.3 setosa
## 20 5.1 3.8 1.5 0.3 setosa
## 21 5.4 3.4 1.7 0.2 setosa
## 22 5.1 3.7 1.5 0.4 setosa
## 23 4.6 3.6 1.0 0.2 setosa
## 24 5.1 3.3 1.7 0.5 setosa
## 25 4.8 3.4 1.9 0.2 setosa
## 26 5.0 3.0 1.6 0.2 setosa
## 27 5.0 3.4 1.6 0.4 setosa
## 28 5.2 3.5 1.5 0.2 setosa
## 29 5.2 3.4 1.4 0.2 setosa
## 30 4.7 3.2 1.6 0.2 setosa
## 31 4.8 3.1 1.6 0.2 setosa
## 32 5.4 3.4 1.5 0.4 setosa
## 33 5.2 4.1 1.5 0.1 setosa
## 34 5.5 4.2 1.4 0.2 setosa
## 35 4.9 3.1 1.5 0.2 setosa
## 36 5.0 3.2 1.2 0.2 setosa
## 37 5.5 3.5 1.3 0.2 setosa
## 38 4.9 3.6 1.4 0.1 setosa
## 39 4.4 3.0 1.3 0.2 setosa
## 40 5.1 3.4 1.5 0.2 setosa
## 41 5.0 3.5 1.3 0.3 setosa
## 42 4.5 2.3 1.3 0.3 setosa
## 43 4.4 3.2 1.3 0.2 setosa
## 44 5.0 3.5 1.6 0.6 setosa
## 45 5.1 3.8 1.9 0.4 setosa
## 46 4.8 3.0 1.4 0.3 setosa
## 47 5.1 3.8 1.6 0.2 setosa
## 48 4.6 3.2 1.4 0.2 setosa
## 49 5.3 3.7 1.5 0.2 setosa
## 50 5.0 3.3 1.4 0.2 setosa
data1 <- data1[,-5]
data2 <- iris[iris$Species == "versicolor", ]
data2 <- data2[,-5]
delta <- rep(0, 4)
hote_Two(data1, data2, delta)
## T2 F_stat p_value
## 2.580839e+03 6.254583e+02 2.664857e-67
rrcov::T2.test(x = data1, y=data2)
##
## Two-sample Hotelling test
##
## data: data1 and data2
## T2 = 2580.84, F = 625.46, df1 = 4, df2 = 95, p-value < 2.2e-16
## alternative hypothesis: true difference in mean vectors is not equal to (0,0,0,0)
## sample estimates:
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## mean x-vector 5.006 3.428 1.462 0.246
## mean y-vector 5.936 2.770 4.260 1.326
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.3.2
## Warning: package 'purrr' was built under R version 4.3.2
## Warning: package 'lubridate' was built under R version 4.3.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
data("iris")
data1 <- iris[iris$Species == "setosa", ]
data1 <- data1 %>% select(Sepal.Length,Petal.Length)
data2 <- iris[iris$Species == "versicolor", ]
data2 <- data2[,c(1,3)]
delta <- rep(0, 2)
rrcov::T2.test(x = data1, y=data2)
##
## Two-sample Hotelling test
##
## data: data1 and data2
## T2 = 1918.32, F = 949.37, df1 = 2, df2 = 97, p-value < 2.2e-16
## alternative hypothesis: true difference in mean vectors is not equal to (0,0)
## sample estimates:
## Sepal.Length Petal.Length
## mean x-vector 5.006 1.462
## mean y-vector 5.936 4.260
https://online.stat.psu.edu/stat505/lesson/8
ANOVA: testing mean differences among more than two groups (i.e., treatments) for a single variable. MANOVA: testing mean differences among more than two groups (i.e., treatments) for at least two variables. Consider the following data set of 3 different groups with 2 variables. Test whether a mean difference exists among these 3 groups.
| K1 |  | K2 |  | K3 |  |
|---|---|---|---|---|---|
| y1 | y2 | y1 | y2 | y1 | y2 |
| 2 | 3 | 4 | 8 | 7 | 6 |
| 3 | 4 | 5 | 6 | 8 | 7 |
| 5 | 4 | 5 | 7 | 10 | 8 |
| 2 | 5 |  |  | 9 | 5 |
|  |  |  |  | 7 | 6 |
| \(\bar y_{11}=3\) | \(\bar y_{21}=4\) | \(\bar y_{12}=4.67\) | \(\bar y_{22}=7\) | \(\bar y_{13}=8.2\) | \(\bar y_{23}=6.4\) |
\[H_0 : μ_1 = μ_2 = μ_3\] \[H_1 : \text{at least one pair of means is unequal}\]

Assumptions in MANOVA: the observations are independent, each group is sampled from a multivariate normal distribution, and the groups share a common covariance matrix.
group <- as.factor(rep(1:3,c(4,3,5)))
y1 <- c(2,3,5,2, 4,5,5, 7,8,10,9,7)
y2 <- c(3,4,4,5, 8,6,7, 6,7,8,5,6)
manova.data <- data.frame(y1,y2,group)
manova.data
## y1 y2 group
## 1 2 3 1
## 2 3 4 1
## 3 5 4 1
## 4 2 5 1
## 5 4 8 2
## 6 5 6 2
## 7 5 7 2
## 8 7 6 3
## 9 8 7 3
## 10 10 8 3
## 11 9 5 3
## 12 7 6 3
analysis <- manova(cbind(y1,y2)~group, manova.data)
summary(test = "Wilks", analysis)
## Df Wilks approx F num Df den Df Pr(>F)
## group 2 0.077761 10.344 4 16 0.0002475 ***
## Residuals 9
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
1.1 Ques 1

Let’s take the built-in iris data set. Suppose we want to know if there is any important difference, in sepal and petal length, between the different species. How can we test it?
#iris$Species
summary(aov(iris$Sepal.Length~iris$Species))
## Df Sum Sq Mean Sq F value Pr(>F)
## iris$Species 2 63.21 31.606 119.3 <2e-16 ***
## Residuals 147 38.96 0.265
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
library(tidyverse)
iris%>%
group_by(Species)%>%
summarise(a=mean(Sepal.Length))
## # A tibble: 3 × 2
## Species a
## <fct> <dbl>
## 1 setosa 5.01
## 2 versicolor 5.94
## 3 virginica 6.59
summarise(group_by(iris,Species),a=mean(Sepal.Length))
## # A tibble: 3 × 2
## Species a
## <fct> <dbl>
## 1 setosa 5.01
## 2 versicolor 5.94
## 3 virginica 6.59
aggregate(iris,by = list(iris$Species),FUN = mean)
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Group.1 Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 setosa 5.006 3.428 1.462 0.246 NA
## 2 versicolor 5.936 2.770 4.260 1.326 NA
## 3 virginica 6.588 2.974 5.552 2.026 NA
analysis1 <- manova(cbind(Sepal.Length, Petal.Length)~Species, iris )
summary(test = "Wilks", analysis1)
## Df Wilks approx F num Df den Df Pr(>F)
## Species 2 0.039878 292.56 4 292 < 2.2e-16 ***
## Residuals 147
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
When is the idea of “Multivariate Linear Regression” used?

1. Fit a multivariate straight-line regression model using the following data set:
| z1 | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| y1 | 1 | 4 | 3 | 8 | 9 |
| y2 | −1 | −1 | 2 | 3 | 2 |
z1 <- 0:4
y1 <- c(1,4,3,8,9)
y2 <- c(-1,-1,2,3,2)
reg.data <- data.frame(z1,y1,y2)
reg.data
## z1 y1 y2
## 1 0 1 -1
## 2 1 4 -1
## 3 2 3 2
## 4 3 8 3
## 5 4 9 2
analysis <- lm(cbind(y1,y2)~z1, data = reg.data)
summary(analysis)
## Response y1 :
##
## Call:
## lm(formula = y1 ~ z1, data = reg.data)
##
## Residuals:
## 1 2 3 4 5
## 0 1 -2 1 0
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.0000 1.0954 0.913 0.4286
## z1 2.0000 0.4472 4.472 0.0208 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.414 on 3 degrees of freedom
## Multiple R-squared: 0.8696, Adjusted R-squared: 0.8261
## F-statistic: 20 on 1 and 3 DF, p-value: 0.02084
##
##
## Response y2 :
##
## Call:
## lm(formula = y2 ~ z1, data = reg.data)
##
## Residuals:
## 1 2 3 4 5
## -1.11e-16 -1.00e+00 1.00e+00 1.00e+00 -1.00e+00
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.0000 0.8944 -1.118 0.3450
## z1 1.0000 0.3651 2.739 0.0714 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.155 on 3 degrees of freedom
## Multiple R-squared: 0.7143, Adjusted R-squared: 0.619
## F-statistic: 7.5 on 1 and 3 DF, p-value: 0.07142
fit1 <- lm(y1~z1, data = reg.data)
summary(fit1)
##
## Call:
## lm(formula = y1 ~ z1, data = reg.data)
##
## Residuals:
## 1 2 3 4 5
## 0 1 -2 1 0
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.0000 1.0954 0.913 0.4286
## z1 2.0000 0.4472 4.472 0.0208 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.414 on 3 degrees of freedom
## Multiple R-squared: 0.8696, Adjusted R-squared: 0.8261
## F-statistic: 20 on 1 and 3 DF, p-value: 0.02084
fit2 <- lm(y2~z1, data = reg.data)
summary(fit2)
##
## Call:
## lm(formula = y2 ~ z1, data = reg.data)
##
## Residuals:
## 1 2 3 4 5
## -1.11e-16 -1.00e+00 1.00e+00 1.00e+00 -1.00e+00
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.0000 0.8944 -1.118 0.3450
## z1 1.0000 0.3651 2.739 0.0714 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.155 on 3 degrees of freedom
## Multiple R-squared: 0.7143, Adjusted R-squared: 0.619
## F-statistic: 7.5 on 1 and 3 DF, p-value: 0.07142
vcov(analysis)
## y1:(Intercept) y1:z1 y2:(Intercept) y2:z1
## y1:(Intercept) 1.2000000 -0.40000000 -0.4000000 0.13333333
## y1:z1 -0.4000000 0.20000000 0.1333333 -0.06666667
## y2:(Intercept) -0.4000000 0.13333333 0.8000000 -0.26666667
## y2:z1 0.1333333 -0.06666667 -0.2666667 0.13333333
vcov(fit1)
## (Intercept) z1
## (Intercept) 1.2 -0.4
## z1 -0.4 0.2
vcov(fit2)
## (Intercept) z1
## (Intercept) 0.8000000 -0.2666667
## z1 -0.2666667 0.1333333
We get cov(B1, B2) from multivariate regression, which may be required to test hypotheses involving both B1 and B2. cov(B1, B2) is not obtainable from the separate univariate regressions.
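For instance, a Wald-type comparison of the two z1 slopes needs their covariance from the joint fit. A minimal sketch using the vcov(analysis) output above (this particular test is illustrative, not part of the original exercise):

V <- vcov(analysis)              # joint covariance matrix of all coefficients
est <- coef(analysis)["z1", ]    # z1 slopes for y1 and y2
se <- sqrt(V["y1:z1", "y1:z1"] + V["y2:z1", "y2:z1"] - 2 * V["y1:z1", "y2:z1"])
(est["y1"] - est["y2"]) / se     # Wald-type statistic for H0: the two slopes are equal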
Population pyramid

A population pyramid (age structure diagram) or “age-sex pyramid” is a graphical illustration of the distribution of a population (typically that of a country or region of the world) by age groups and sex; it typically takes the shape of a pyramid when the population is growing. Males are usually shown on the left and females on the right, and they may be measured in absolute numbers or as a percentage of the total population.
The Three Basic Shapes of Population Pyramids
Expansive Expansive population pyramids are used to describe populations that are young and growing. They are often characterized by their typical ‘pyramid’ shape, which has a broad base and narrow top. Expansive population pyramids show a larger percentage of the population in the younger age cohorts, usually with each age cohort smaller in size than the one below it. These types of populations are typically representative of developing nations, whose populations often have high fertility rates and lower than average life expectancies.
Constrictive Constrictive population pyramids are used to describe populations that are elderly and shrinking. Constrictive pyramids can often look like beehives and typically have an inverted shape with the graph tapering in at the bottom. Constrictive pyramids have smaller percentages of people in the younger age cohorts and are typically characteristic of countries with higher levels of social and economic development, where access to quality education and health care is available to a large portion of the population.
Stationary Stationary, or near stationary, population pyramids are used to describe populations that are not growing. They are characterized by their rectangular shape, displaying somewhat equal percentages across age cohorts that taper off toward the top. These pyramids are often characteristic of developed nations, where birth rates are low and overall quality of life is high.
https://www.visualcapitalist.com/population-pyramids-compared/
Sheet1 of lec11-population-pyramid.xlsx shows the single-year population of the USA by sex.
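A pyramid can be drawn by plotting one sex with negated counts. A minimal sketch, assuming sheet1 has columns named Age, Male and Female (adjust the path and column names to the actual file):

library(readxl)
library(ggplot2)
library(tidyr)
pop <- read_excel("lec11-population-pyramid.xlsx", sheet = 1)  # columns assumed: Age, Male, Female
pyr <- pivot_longer(pop, c(Male, Female), names_to = "Sex", values_to = "Population")
pyr$Population <- ifelse(pyr$Sex == "Male", -pyr$Population, pyr$Population)  # males on the left
ggplot(pyr, aes(x = Age, y = Population, fill = Sex)) +
  geom_col(width = 1) +
  coord_flip() +                      # age on the vertical axis
  scale_y_continuous(labels = abs) +  # show magnitudes, not signs
  labs(title = "Population pyramid, USA", x = "Age", y = "Population")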
# load library MASS
library(MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
# set seed and create data vectors
set.seed(98989)
sample_size <- 100
sample_meanvector <- c(10, 5)
sample_covariance_matrix <- matrix(c(10, 5, 5, 9),
ncol = 2)
# create bivariate normal distribution
sample_distribution <- mvrnorm(n = sample_size,
mu = sample_meanvector,
Sigma = sample_covariance_matrix)
# print top of distribution
head(sample_distribution)
## [,1] [,2]
## [1,] 8.458875 6.1000172
## [2,] 9.914924 5.4988621
## [3,] 8.376009 4.8306038
## [4,] 8.720649 0.1349441
## [5,] 7.858077 8.4643691
## [6,] 10.283415 -0.3931910
plot(sample_distribution, xlab = expression(X[1]), ylab = expression(X[2]))
### Question 1

The pdf of a Bivariate Normal (BVN) distribution is given by
\[f_{X_1,X_2}(x_1,x_2)=\frac{1}{2\pi|\Sigma|^{1/2}}\exp\left\{-\frac{1}{2(1-\rho_{12}^2)}\left[\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)^2+\left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)^2-2\rho_{12}\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)\left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)\right]\right\}\]
\[=\frac{1}{2\pi\sqrt{\sigma_{11}\sigma_{22}(1-\rho_{12}^2)}}\exp\left\{-\frac{1}{2(1-\rho_{12}^2)}\left[\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)^2+\left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)^2-2\rho_{12}\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)\left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)\right]\right\}\]
Consider that X1 & X2 follow the given Bivariate Normal distribution: \[X_1,X_2 ∼ N_2 (μ_1 = 0, μ_2 = 0, σ_{11} = 1, σ_{22} = 1, ρ_{12} = 0.75)\]
Write a function which takes X1 & X2 as input and produces the BVN density for X1 = X2 = 1. Then plot the BVN density curve.
Find the marginal distribution for both X1 & X2.
Bivariate <- function(x1, x2){
mu1<- 0
mu2 <- 0
sig11 <- 1
sig22 <- 1
rho <- 0.75
term1 <- 1/(2*pi*sqrt(sig11*sig22*(1 - rho**2)))
term2 <- 1/(2*(1 - rho**2))
term3 <- ((x1 - mu1)/sqrt(sig11))**2
term4 <- ((x2 - mu2)/sqrt(sig22))**2
term5 <- 2*rho*((x1 - mu1)/sqrt(sig11))*((x2 - mu2)/sqrt(sig22))
z <- term1*exp(-term2*(term3+term4-term5))
return(z)
}
#i
Bivariate(x1 = 1, x2 = 1)
## [1] 0.1358823
mvtnorm::dmvnorm(x = c(1,1),
mean = c(0,0),
sigma = matrix(c(1, 0.75, 0.75, 1), ncol = 2) )
## [1] 0.1358823
#### x1 and x2 are created arbitrarily
x1 <- seq(-3, 3, length.out = 100)
x2 <- seq(-3, 3, length.out = 100)
# outer() evaluates the function at every possible combination of the values of x1 and x2
z <- outer(x1,x2,Bivariate)
library(mvtnorm)
#z <- outer(x1,x2,mvtnorm::dmvnorm)
#head(z)[1:5,1:5]
persp(x1,x2,z, col = "red", theta = 45, phi = 0.20)
title(main = "Bivariate Normal Distribution")
#ii
par(mfrow = c(2, 1))
fx1 <- rowSums(z)  # proportional to the marginal density of X1 (rescale by the grid spacing to normalize)
plot(x1,fx1, type = "l", col = 2, lwd = 3, ann = F)
title(main = "Marginal Distribution of X1", xlab = "x", ylab = "f(x)")
fx2 <- colSums(z)  # proportional to the marginal density of X2
plot(x2,fx2, type = "l", col = 4, lwd = 3, ann = F)
title(main = "Marginal Distribution of X2", xlab = "x", ylab = "f(x)")
mvtnorm::dmvnorm
## function (x, mean = rep(0, p), sigma = diag(p), log = FALSE,
## checkSymmetry = TRUE)
## {
## if (is.vector(x))
## x <- matrix(x, ncol = length(x))
## p <- ncol(x)
## if (!missing(mean)) {
## if (!is.null(dim(mean)))
## dim(mean) <- NULL
## if (length(mean) != p)
## stop("x and mean have non-conforming size")
## }
## if (!missing(sigma)) {
## if (p != ncol(sigma))
## stop("x and sigma have non-conforming size")
## if (checkSymmetry && !isSymmetric(sigma, tol = sqrt(.Machine$double.eps),
## check.attributes = FALSE))
## stop("sigma must be a symmetric matrix")
## }
## dec <- tryCatch(base::chol(sigma), error = function(e) e)
## if (inherits(dec, "error")) {
## x.is.mu <- colSums(t(x) != mean) == 0
## logretval <- rep.int(-Inf, nrow(x))
## logretval[x.is.mu] <- Inf
## }
## else {
## tmp <- backsolve(dec, t(x) - mean, transpose = TRUE)
## rss <- colSums(tmp^2)
## logretval <- -sum(log(diag(dec))) - 0.5 * p * log(2 *
## pi) - 0.5 * rss
## }
## names(logretval) <- rownames(x)
## if (log)
## logretval
## else exp(logretval)
## }
## <bytecode: 0x0000026c75763618>
## <environment: namespace:mvtnorm>
#install.packages("rrcov")
library(rrcov)
rrcov::T2.test
## function (x, ...)
## UseMethod("T2.test")
## <bytecode: 0x0000026c71788550>
## <environment: namespace:rrcov>
outer(1:5,1:5,FUN = "*")
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 2 3 4 5
## [2,] 2 4 6 8 10
## [3,] 3 6 9 12 15
## [4,] 4 8 12 16 20
## [5,] 5 10 15 20 25
outer(1:5,1:5,FUN = "+")
## [,1] [,2] [,3] [,4] [,5]
## [1,] 2 3 4 5 6
## [2,] 3 4 5 6 7
## [3,] 4 5 6 7 8
## [4,] 5 6 7 8 9
## [5,] 6 7 8 9 10
outer(1:5,1:5,FUN = "paste")
## [,1] [,2] [,3] [,4] [,5]
## [1,] "1 1" "1 2" "1 3" "1 4" "1 5"
## [2,] "2 1" "2 2" "2 3" "2 4" "2 5"
## [3,] "3 1" "3 2" "3 3" "3 4" "3 5"
## [4,] "4 1" "4 2" "4 3" "4 4" "4 5"
## [5,] "5 1" "5 2" "5 3" "5 4" "5 5"
# load library MASS
library(MASS)
# set seed and create data vectors
set.seed(98989)
sample_size <- 1000
sample_meanvector <- c(10, 5, 7, 9, 20)
sample_covariance_matrix <- matrix(c(5, 4, 3, 2, 1, 4, 5, 4, 3, 2,
3, 4, 5, 4, 3, 2, 3, 4, 5, 4, 1,
2, 3, 4, 5), ncol = 5)
# create multivariate normal distribution
sample_distribution <- mvrnorm(n = sample_size,
mu = sample_meanvector,
Sigma = sample_covariance_matrix)
# print top of distribution
head(sample_distribution)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 9.059425 4.412918 6.918059 9.690694 19.78744
## [2,] 9.267290 5.569379 7.575157 8.664539 20.46564
## [3,] 9.986655 3.189981 5.898267 8.745072 20.25941
## [4,] 4.859023 3.346945 5.312794 6.979162 19.99343
## [5,] 10.964113 4.544509 5.589856 10.234550 21.97238
## [6,] 7.663888 2.803528 4.822008 7.335070 20.30340
H0: The distribution of the data is multivariate normal.
H1: The distribution of the data is not multivariate normal.
https://www.statology.org/multivariate-normality-test-r/
#install.packages("QuantPsyc")
library("QuantPsyc")
## Warning: package 'QuantPsyc' was built under R version 4.3.2
## Loading required package: boot
##
## Attaching package: 'boot'
## The following object is masked from 'package:robustbase':
##
## salinity
##
## Attaching package: 'QuantPsyc'
## The following object is masked from 'package:base':
##
## norm
#create dataset
set.seed(0)
data <- data.frame(x1 = rnorm(50),
x2 = rnorm(50),
x3 = rnorm(50))
#perform Multivariate normality test
QuantPsyc::mult.norm(data)$mult.test
## Beta-hat kappa p-val
## Skewness 1.630474 13.5872843 0.1926626
## Kurtosis 13.895364 -0.7130395 0.4758213
#install.packages("energy")
library(energy)
## Warning: package 'energy' was built under R version 4.3.2
# create dataset
sample_size <- 1000
sample_meanvector <- c(10, 5, 7, 9, 20)
sample_covariance_matrix <- matrix(c(5, 4, 3, 2, 1, 4, 5, 4, 3, 2,
3, 4, 5, 4, 3, 2, 3, 4, 5, 4, 1,
2, 3, 4, 5), ncol = 5)
# create multivariate normal distribution
data <- rmvnorm(n = sample_size, mean =
sample_meanvector, sigma =
sample_covariance_matrix)
#perform Multivariate normality test
energy::mvnorm.etest(data, R=100)
##
## Energy test of multivariate normality: estimated parameters
##
## data: x, sample size 1000, dimension 5, replicates 100
## E-statistic = 1.1783, p-value = 0.26
data <- iris[1:50, 1:4]
mvnormtest::mshapiro.test(t(data))
##
## Shapiro-Wilk normality test
##
## data: Z
## W = 0.95878, p-value = 0.07906
1 Question

Data collected on 22 U.S. public utility companies for the year 1975 include the following variables:

• X1: Fixed-charge coverage ratio (income/debt)
• X2: Rate of return on capital
• X3: Cost per KW capacity in place
• X4: Annual load factor
• X5: Peak kWh demand growth from 1974 to 1975
• X6: Sales (kWh use per year)
• X7: Percent nuclear
• X8: Total fuel costs (cents per kWh)

Using the utility data, address the following questions.

Compute the mean vector, covariance matrix, and correlation matrix (i.e., find \(\bar x\), \(S_n\) and \(R\)).
my_data<-read.csv("D:/3rd Year/AST-332 batch-27/AST 332 Lec 13 utility.csv")
my_data
## X1 X2 X3 X4 X5 X6 X7 X8 X9
## 1 1.06 9.2 151 54.4 1.6 9077 0.0 0.628 Arizona
## 2 0.89 10.3 202 57.9 2.2 5088 25.3 1.555 Boston
## 3 1.43 15.4 113 53.0 3.4 9212 0.0 1.058 Central
## 4 1.02 11.2 168 56.0 0.3 6423 34.3 0.700 Common
## 5 1.49 8.8 192 51.2 1.0 3300 15.6 2.044 Consolid
## 6 1.32 13.5 111 60.0 -2.2 11127 22.5 1.241 Florida
## 7 1.22 12.2 175 67.6 2.2 7642 0.0 1.652 Hawaiian
## 8 1.10 9.2 245 57.0 3.3 13082 0.0 0.309 Idaho
## 9 1.34 13.0 168 60.4 7.2 8406 0.0 0.862 Kentucky
## 10 1.12 12.4 197 53.0 2.7 6455 39.2 0.623 Madison
## 11 0.75 7.5 173 51.5 6.5 17441 0.0 0.768 Nevada
## 12 1.13 10.9 178 62.0 3.7 6154 0.0 1.897 NewEngla
## 13 1.15 12.7 199 53.7 6.4 7179 50.2 0.527 Northern
## 14 1.09 12.0 96 49.8 1.4 9673 0.0 0.588 Oklahoma
## 15 0.96 7.6 164 62.2 -0.1 6468 0.9 1.400 Pacific
## 16 1.16 9.9 252 56.0 9.2 15991 0.0 0.620 Puget
## 17 0.76 6.4 136 61.9 9.0 5714 8.3 1.920 SanDiego
## 18 1.05 12.6 150 56.7 2.7 10140 0.0 1.108 Southern
## 19 1.16 11.7 104 54.0 -2.1 13507 0.0 0.636 Texas
## 20 1.20 11.8 148 59.9 3.5 7287 41.1 0.702 Wisconsi
## 21 1.04 8.6 204 61.0 3.5 6650 0.0 2.116 United
## 22 1.07 9.3 174 54.3 5.9 10093 26.6 1.306 Virginia
data<-my_data[-9]
data
## X1 X2 X3 X4 X5 X6 X7 X8
## 1 1.06 9.2 151 54.4 1.6 9077 0.0 0.628
## 2 0.89 10.3 202 57.9 2.2 5088 25.3 1.555
## 3 1.43 15.4 113 53.0 3.4 9212 0.0 1.058
## 4 1.02 11.2 168 56.0 0.3 6423 34.3 0.700
## 5 1.49 8.8 192 51.2 1.0 3300 15.6 2.044
## 6 1.32 13.5 111 60.0 -2.2 11127 22.5 1.241
## 7 1.22 12.2 175 67.6 2.2 7642 0.0 1.652
## 8 1.10 9.2 245 57.0 3.3 13082 0.0 0.309
## 9 1.34 13.0 168 60.4 7.2 8406 0.0 0.862
## 10 1.12 12.4 197 53.0 2.7 6455 39.2 0.623
## 11 0.75 7.5 173 51.5 6.5 17441 0.0 0.768
## 12 1.13 10.9 178 62.0 3.7 6154 0.0 1.897
## 13 1.15 12.7 199 53.7 6.4 7179 50.2 0.527
## 14 1.09 12.0 96 49.8 1.4 9673 0.0 0.588
## 15 0.96 7.6 164 62.2 -0.1 6468 0.9 1.400
## 16 1.16 9.9 252 56.0 9.2 15991 0.0 0.620
## 17 0.76 6.4 136 61.9 9.0 5714 8.3 1.920
## 18 1.05 12.6 150 56.7 2.7 10140 0.0 1.108
## 19 1.16 11.7 104 54.0 -2.1 13507 0.0 0.636
## 20 1.20 11.8 148 59.9 3.5 7287 41.1 0.702
## 21 1.04 8.6 204 61.0 3.5 6650 0.0 2.116
## 22 1.07 9.3 174 54.3 5.9 10093 26.6 1.306
#qu1
x_bar<-apply(data, MARGIN = 2,FUN = mean)
x_bar
## X1 X2 X3 X4 X5 X6
## 1.114091 10.736364 168.181818 56.977273 3.240909 8914.045455
## X7 X8
## 12.000000 1.102727
#alternative
colMeans(data)
## X1 X2 X3 X4 X5 X6
## 1.114091 10.736364 168.181818 56.977273 3.240909 8914.045455
## X7 X8
## 12.000000 1.102727
## covariance
covariance<-cov(data)
covariance
## X1 X2 X3 X4 X5
## X1 0.034044372 0.2661299 -0.7812554 -6.752165e-02 -0.14908009
## X2 0.266129870 5.0357576 -32.1259740 -8.643723e-01 -1.82012987
## X3 -0.781255411 -32.1259740 1696.7272727 1.843290e+01 55.92077922
## X4 -0.067521645 -0.8643723 18.4329004 1.990184e+01 0.46573593
## X5 -0.149080087 -1.8201299 55.9207792 4.657359e-01 9.72348485
## X6 -99.346385281 -76.6160173 4092.5151515 -4.560037e+03 1952.87424242
## X7 0.138809524 7.9676190 79.3095238 -1.229762e+01 -1.00142857
## X8 -0.001372165 -0.4088848 0.1195758 1.204446e+00 -0.01236926
## X6 X7 X8
## X1 -9.934639e+01 1.388095e-01 -1.372165e-03
## X2 -7.661602e+01 7.967619e+00 -4.088848e-01
## X3 4.092515e+03 7.930952e+01 1.195758e-01
## X4 -4.560037e+03 -1.229762e+01 1.204446e+00
## X5 1.952874e+03 -1.001429e+00 -1.236926e-02
## X6 1.260239e+07 -2.227602e+04 -1.106557e+03
## X7 -2.227602e+04 2.819686e+02 -1.728324e+00
## X8 -1.106557e+03 -1.728324e+00 3.092451e-01
cor(data)
## X1 X2 X3 X4 X5 X6
## X1 1.00000000 0.642744766 -0.102793192 -0.08203019 -0.259111089 -0.151671159
## X2 0.64274477 1.000000000 -0.347550467 -0.08634194 -0.260111168 -0.009617468
## X3 -0.10279319 -0.347550467 1.000000000 0.10030926 0.435367718 0.027987098
## X4 -0.08203019 -0.086341943 0.100309264 1.00000000 0.033479746 -0.287935594
## X5 -0.25911109 -0.260111168 0.435367718 0.03347975 1.000000000 0.176415568
## X6 -0.15167116 -0.009617468 0.027987098 -0.28793559 0.176415568 1.000000000
## X7 0.04480188 0.211444212 0.114661857 -0.16416254 -0.019125318 -0.373689523
## X8 -0.01337310 -0.327655318 0.005220183 0.48550006 -0.007133152 -0.560526327
## X7 X8
## X1 0.04480188 -0.013373101
## X2 0.21144421 -0.327655318
## X3 0.11466186 0.005220183
## X4 -0.16416254 0.485500063
## X5 -0.01912532 -0.007133152
## X6 -0.37368952 -0.560526327
## X7 1.00000000 -0.185085916
## X8 -0.18508592 1.000000000
cov2cor(covariance)
## X1 X2 X3 X4 X5 X6
## X1 1.00000000 0.642744766 -0.102793192 -0.08203019 -0.259111089 -0.151671159
## X2 0.64274477 1.000000000 -0.347550467 -0.08634194 -0.260111168 -0.009617468
## X3 -0.10279319 -0.347550467 1.000000000 0.10030926 0.435367718 0.027987098
## X4 -0.08203019 -0.086341943 0.100309264 1.00000000 0.033479746 -0.287935594
## X5 -0.25911109 -0.260111168 0.435367718 0.03347975 1.000000000 0.176415568
## X6 -0.15167116 -0.009617468 0.027987098 -0.28793559 0.176415568 1.000000000
## X7 0.04480188 0.211444212 0.114661857 -0.16416254 -0.019125318 -0.373689523
## X8 -0.01337310 -0.327655318 0.005220183 0.48550006 -0.007133152 -0.560526327
## X7 X8
## X1 0.04480188 -0.013373101
## X2 0.21144421 -0.327655318
## X3 0.11466186 0.005220183
## X4 -0.16416254 0.485500063
## X5 -0.01912532 -0.007133152
## X6 -0.37368952 -0.560526327
## X7 1.00000000 -0.185085916
## X8 -0.18508592 1.000000000
1 %in% c(1:10)
## [1] TRUE
1 %in% c(5:10)
## [1] FALSE
Considering the variables X1 and X2, compute the statistical distance between Boston Edison Co. and Texas Utilities Co. (Assume the coordinates X1 and X2 are independent)
##b
#statistical distance
my_data[my_data$X9 %in% c("Boston","Texas"),c("X1","X2")]
## X1 X2
## 2 0.89 10.3
## 19 1.16 11.7
var(my_data$X1)
## [1] 0.03404437
var(my_data$X2)
## [1] 5.035758
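The pieces above combine into the statistical distance \(d = \sqrt{(x_{1B}-x_{1T})^2/s_{11} + (x_{2B}-x_{2T})^2/s_{22}}\). A minimal sketch completing the computation (the object `pair` is introduced here for illustration):

pair <- my_data[my_data$X9 %in% c("Boston","Texas"), c("X1","X2")]
d <- sqrt((pair$X1[1] - pair$X1[2])^2 / var(my_data$X1) +
            (pair$X2[1] - pair$X2[2])^2 / var(my_data$X2))
d  # approximately 1.5908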
Plot the scatter diagram and marginal dot diagram for each variable. Comment on the appearance of the diagrams.
plot(my_data$X1, my_data$X2)
#install.packages("unrepx")
unrepx::dot.plot(data$X1,
spacing = 1,
pch = 19,
col = 2)
Plot the 3D scatter diagram for first three variables.
scatterplot3d::scatterplot3d(data$X1,data$X2,data$X3)
scatterplot3d::scatterplot3d(data[c("X1","X2","X3")],type= "h",angle = 60,main = "3D plot",xlab = "value of x1",ylab = "value of x2",zlab = "value of x3")
Represent the public utility companies as stars. Visually group the companies into four or five clusters.
Represent the public utility companies as Chernoff faces with assignments of variables to facial characteristics.
## (e) and (f)
label <- my_data$X9
stars(my_data[,-9], labels = label, col.stars = 1:22)
#install.packages("aplpack")
library(aplpack)
aplpack::faces(my_data[1:22,-9], labels = label)
## effect of variables:
## modified item Var
## "height of face " "X1"
## "width of face " "X2"
## "structure of face" "X3"
## "height of mouth " "X4"
## "width of mouth " "X5"
## "smiling " "X6"
## "height of eyes " "X7"
## "width of eyes " "X8"
## "height of hair " "X1"
## "width of hair " "X2"
## "style of hair " "X3"
## "height of nose " "X4"
## "width of nose " "X5"
## "width of ear " "X6"
## "height of ear " "X7"
label <- as.character(my_data[,9])
stars(my_data[,-9], labels = label, col.stars = 1:22)
aplpack::faces(my_data[1:22,-9], labels = label)
## effect of variables:
## modified item Var
## "height of face " "X1"
## "width of face " "X2"
## "structure of face" "X3"
## "height of mouth " "X4"
## "width of mouth " "X5"
## "smiling " "X6"
## "height of eyes " "X7"
## "width of eyes " "X8"
## "height of hair " "X1"
## "width of hair " "X2"
## "style of hair " "X3"
## "height of nose " "X4"
## "width of nose " "X5"
## "width of ear " "X6"
## "height of ear " "X7"
Let x = Vitamin A and y = Vitamin C.

Objective: minimize z = 8x + 10y

Subject to: 2x + y ≥ 50 and x + 2y ≥ 70

### 1.b
#install.packages("lpSolve")
library(lpSolve)
z<-c(8,10)
sub<-matrix(c(2,1,1,2),nrow = 2,byrow = T)
sign<-c(">=",">=")
cost<-c(50,70)
lp<-lp("min",z,sub,sign,cost)
lp$objval
## [1] 380
lp$solution
## [1] 10 30
a<-matrix(c(1,5,-2,5,4,5,-2,5,1),nrow = 3,byrow = T)
#a
eigen_value<-eigen(a)$value
eigen_value
## [1] 9 3 -6
eigen_vector<-eigen(a)$vector
eigen_vector
## [,1] [,2] [,3]
## [1,] 0.4082483 7.071068e-01 0.5773503
## [2,] 0.8164966 1.110223e-16 -0.5773503
## [3,] 0.4082483 -7.071068e-01 0.5773503
### a inverse
v<-diag(eigen_value^(-1))
a_inverse<-eigen_vector %*% v %*% t(eigen_vector)
a_inverse
## [,1] [,2] [,3]
## [1,] 0.12962963 0.09259259 -0.20370370
## [2,] 0.09259259 0.01851852 0.09259259
## [3,] -0.20370370 0.09259259 0.12962963
## a^10
v10<-diag(eigen_value^10)
a_10<-eigen_vector %*% v10 %*% t(eigen_vector)
a_10
## [,1] [,2] [,3]
## [1,] 601315650 1142106075 601256601
## [2,] 1142106075 2344678326 1142106075
## [3,] 601256601 1142106075 601315650
x_bar<-matrix(c(6.84,5.06,4.18,3.41),ncol=1)
sigma<-matrix(c(10.1,2.0,1.1,0.2,
2.0,6.5,2.1,0.5,
1.1,2.1,4.9,1.1,
0.2,0.5,1.1,1.4),nrow = 4,byrow = T)
n<-7
p<-4
null<-matrix(c(0,0,0,0))
t2=n*t(x_bar-null) %*% solve(sigma) %*% (x_bar-null)
t2
## [,1]
## [1,] 94.40542
f<-(n-p)/(p*(n-1))*t2
p_val<-pf(f,p,n-p,lower.tail = F)
f
## [,1]
## [1,] 11.80068
p_val
## [,1]
## [1,] 0.03521062
x<-rmvnorm(n = n,mean = x_bar,sigma = sigma)
rrcov::T2.test(x)
##
## One-sample Hotelling test
##
## data: x
## T2 = 514.707, F = 64.338, df1 = 4, df2 = 3, p-value = 0.003071
## alternative hypothesis: true mean vector is not equal to (0, 0, 0, 0)'
##
## sample estimates:
## [,1] [,2] [,3] [,4]
## mean x-vector 8.114213 3.735761 4.001301 3.22251
pow<-function(alpha,sigma,n,null,p_mean){
a=qnorm(alpha/2)*sigma/sqrt(n)+null
b=-qnorm(alpha/2)*sigma/sqrt(n)+null
l=(a-p_mean)/(sigma/sqrt(n))
u=(b-p_mean)/(sigma/sqrt(n))
beta<-pnorm(u)-pnorm(l)
p<-1-beta
return(power =p)
}
pow(alpha = .05,sigma = 20,n=50,null = 300,p_mean = 290)
## [1] 0.9424375