Consider the seven pairs of measurements \((x_{1}, x_{2})\) plotted in Figure 1.1:
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## x1 3 4.0 2 6 8 2 5.0
## x2 5 5.5 4 7 10 5 7.5
Calculate the sample means \(\bar{x}_{1}\) and \(\bar{x}_{2}\), the sample variances \(s_{11}\) and \(s_{22}\), and the sample covariance \(s_{12}\).
The 10 largest U.S. industrial corporations yield the following data:
Problem source: Johnson, R. A. and Wichern, D. W. 2002. Applied Multivariate Statistical Analysis, 5th edition. New Jersey: Prentice Hall, pp. 38-39.
Calculate the sample means \(\bar{x}_{1}\) and \(\bar{x}_{2}\), the sample variances \(s_{11}\) and \(s_{22}\), and the sample covariance \(s_{12}\).
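For reference, the statistics requested here follow the textbook's Chapter 1 definitions, which use the divisor \(n\). This is the same divisor used by the `varian.kovarian()` function in the code. They can be checked by hand with the identity \(\sum_{j}(x_{j}-\bar{x})^{2} = \sum_{j}x_{j}^{2} - (\sum_{j}x_{j})^{2}/n\), its cross-product analogue, and the raw sums of the seven pairs: \(\sum x_{1} = 30\), \(\sum x_{2} = 44\), \(\sum x_{1}^{2} = 158\), \(\sum x_{2}^{2} = 301.5\), and \(\sum x_{1}x_{2} = 214.5\).

```latex
\begin{aligned}
\bar{x}_{k} &= \frac{1}{n}\sum_{j=1}^{n} x_{jk}, \qquad
s_{ik} = \frac{1}{n}\sum_{j=1}^{n} \left(x_{ji}-\bar{x}_{i}\right)\left(x_{jk}-\bar{x}_{k}\right)
\quad (s_{kk} \text{ when } i = k),\\[4pt]
\bar{x}_{1} &= \tfrac{30}{7} \approx 4.2857, \qquad \bar{x}_{2} = \tfrac{44}{7} \approx 6.2857,\\
s_{11} &= \tfrac{1}{7}\left(158 - \tfrac{30^{2}}{7}\right) = \tfrac{206}{49} \approx 4.2041,\\
s_{22} &= \tfrac{1}{7}\left(301.5 - \tfrac{44^{2}}{7}\right) = \tfrac{174.5}{49} \approx 3.5612,\\
s_{12} &= \tfrac{1}{7}\left(214.5 - \tfrac{30 \cdot 44}{7}\right) = \tfrac{181.5}{49} \approx 3.7041.
\end{aligned}
```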
x1 <- c(3, 4, 2, 6, 8, 2, 5)            # Create vector x1
x2 <- c(5, 5.5, 4, 7, 10, 5, 7.5)       # Create vector x2
data1.1 <- data.frame(x1, x2)
xbar1.1 <- colMeans(data1.1)            # Compute the mean of each column of data1.1
print("Mean vector: ")
print(xbar1.1)
varian.kovarian <- function(x) {        # Variance-covariance matrix with divisor n (biased)
  A <- as.matrix(x)
  n <- nrow(A)
  satu <- rep(1, n)                     # Vector of ones
  matriks1 <- satu %*% t(satu)          # n x n matrix with every element equal to 1
  a <- A - matriks1 %*% A / n           # Deviations from the column means
  ata <- t(a) %*% a                     # Sums of squares and cross-products
  Sn <- ata / n                         # Divisor n; replacing n with n - 1 gives the
                                        # unbiased matrix, equivalent to cov(x)
  return(Sn)
}
matriks1.1 <- varian.kovarian(data1.1)  # Compute the variance-covariance matrix
print("Variance-covariance matrix: ")
print(matriks1.1)                       # Variance-covariance matrix
## [1] "Mean vector: "
## x1 x2
## 4.285714 6.285714
## [1] "Variance-covariance matrix: "
## x1 x2
## x1 4.204082 3.704082
## x2 3.704082 3.561224
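Calculation results: \(\bar{x}_{1} = 4.2857\), \(\bar{x}_{2} = 6.2857\), \(s_{11} = 4.2041\), \(s_{22} = 3.5612\), and \(s_{12} = 3.7041\).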
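The comment inside `varian.kovarian()` notes that the divisor-\(n\) matrix differs from R's unbiased `cov()` only by the factor \((n-1)/n\). The sketch below is a minimal check of that relationship. It rebuilds the divisor-\(n\) matrix with base R's `scale()` and `crossprod()` instead of the ones-matrix construction. This is a swapped-in but equivalent way to centre the data, chosen here for brevity.

```r
# Seven (x1, x2) pairs from the exercise
x1 <- c(3, 4, 2, 6, 8, 2, 5)
x2 <- c(5, 5.5, 4, 7, 10, 5, 7.5)
X <- cbind(x1, x2)
n <- nrow(X)

# Centre each column, then form the sums of squares and cross-products
centered <- scale(X, center = TRUE, scale = FALSE)
S_n <- crossprod(centered) / n          # divisor n, as in varian.kovarian()

# cov() uses divisor n - 1, so rescaling by (n - 1) / n should reproduce S_n
S_from_cov <- cov(X) * (n - 1) / n

print(S_n)
print(all.equal(S_n, S_from_cov, check.attributes = FALSE))
```

Centring with `scale()` avoids building the \(n \times n\) matrix of ones. That matrix is harmless for seven rows but grows quadratically with the sample size.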
(a.) Plot the scatter diagram and marginal dot diagrams for variables \(x_{1}\) and \(x_{2}\) . Comment on the appearance of the diagrams.
sales <- c(126974,96933,86656,63438,55264,50976,39069,36156,35209,32416) #X1
profits <- c(4224,3835,3510,3758,3939,1809,2946,359,2480,2413) #X2
data1.4 <- data.frame(sales,profits)
# Plot script adapted from:
## https://www.stat.ncsu.edu/people/bloomfield/courses/st731/dotplots
## Define a layout. Here we divide the plotting area into four subareas.
## For details, read the help file for the function "layout".
## Panel 1 (scatter plot) is top right, panel 2 (x1 dot diagram) bottom right,
## panel 3 (x2 dot diagram) top left; the bottom-left cell (0) stays empty.
nf <- layout(matrix(c(3,1,0,2),2,2,byrow=TRUE), c(1,5), c(5,1), TRUE)
x1 <- data1.4$sales
x2 <- data1.4$profits
## Plot scatter plot first.
## par(mar = c()) is used to leave proper margins around plots.
par(mar=c(5,4,2,2))
plot(x1, x2, xlim=c(33000,127000), ylim=c(350,4300),
xlab="X1, Sales, millions of dollars",
ylab="X2, Profits, millions of dollars",
main = "Scatter Diagram and Marginal Dot Diagrams")
## Then plot dot diagram under the scatter plot, and dot diagram to the left of it.
## Some axis settings are only there to make the plots look better.
par(mar=c(3,4,2,2))
plot(x1, rep(1,10), xlim=c(33000,127000), ylim=c(1,1), xlab="", ylab="", axes = FALSE)
axis(side = 3,at = seq(from = 33000, to = 127000, by = 3000))
par(mar=c(5,3,2,2))
plot(rep(1,10), x2, xlim=c(1,1), ylim=c(350,4300), xlab="", ylab="", axes = FALSE)
axis(side = 4,at = seq(from = 350, to = 4300, by = 300))
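You can inspect the panel arrangement that `layout()` creates with `layout.show()`, which outlines and numbers each region on the current device. The sketch below is a minimal example that re-creates the same layout matrix. It uses named arguments for readability; they are equivalent to the positional ones in the original call.

```r
# Layout matrix from the plotting code: 3 = x2 dot diagram, 1 = scatter plot,
# 2 = x1 dot diagram, 0 = unused cell
m <- matrix(c(3, 1, 0, 2), nrow = 2, ncol = 2, byrow = TRUE)
print(m)

# Narrow left column and short bottom row; respect = TRUE makes one unit of
# width equal one unit of height
nf <- layout(m, widths = c(1, 5), heights = c(5, 1), respect = TRUE)

# Draw the outline and number of each panel region
layout.show(nf)
```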
Data for \(x_{1}\) and \(x_{2}\):
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## x1 126974 96933 86656 63438 55264 50976 39069 36156 35209 32416
## x2 4224 3835 3510 3758 3939 1809 2946 359 2480 2413
library(ggplot2)   # ggplot(), aes(), geom_point(), theme(), ggtitle()
library(ggExtra)   # ggMarginal()
padat <- data.frame(x1,x2)
p <- ggplot(padat, aes(x=x1, y=x2, color=1, size=2)) +
geom_point() +
theme(legend.position="none") +
ggtitle("Scatter Diagram and Marginal Density Diagrams")
# Add marginal density diagrams to the scatter plot
p1 <- ggMarginal(p, type="density", fill = "LightSalmon")
p1
Interpretation of the scatter and marginal dot diagrams:
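The scatter diagram shows a positive relationship between sales (\(x_{1}\)) and profits (\(x_{2}\)): companies with larger sales tend to have larger profits. The marginal dot diagram for \(x_{1}\) is right-skewed (positive skewness). Most companies cluster at the low end, while a few very large companies stretch the right tail; the largest has sales of 126,974 (millions of dollars). The marginal dot diagram for \(x_{2}\), by contrast, is left-skewed (negative skewness). One company's unusually low profit of 359 stretches the left tail. Marginal density diagrams were also drawn to show the shapes of the \(x_{1}\) and \(x_{2}\) distributions more clearly.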
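The skewness reading can also be checked numerically. The sketch below is a minimal check that assumes the moment-based sample skewness \(g_{1} = m_{3}/m_{2}^{3/2}\) and compares each mean with its median. This skewness measure is a choice made here for illustration; the analysis above judges skewness from the plots only. The helper name `sample_skewness` is hypothetical.

```r
# Sales (x1) and profits (x2), millions of dollars
sales   <- c(126974, 96933, 86656, 63438, 55264, 50976, 39069, 36156, 35209, 32416)
profits <- c(4224, 3835, 3510, 3758, 3939, 1809, 2946, 359, 2480, 2413)

# Hypothetical helper: moment-based sample skewness g1 = m3 / m2^(3/2)
sample_skewness <- function(x) {
  d <- x - mean(x)
  m2 <- mean(d^2)
  m3 <- mean(d^3)
  m3 / m2^(3 / 2)
}

# Right skew: g1 > 0 and mean > median; left skew: g1 < 0 and mean < median
summary_table <- data.frame(
  variable = c("sales", "profits"),
  mean     = c(mean(sales), mean(profits)),
  median   = c(median(sales), median(profits)),
  skewness = c(sample_skewness(sales), sample_skewness(profits))
)
print(summary_table)
```

A positive \(g_{1}\) for sales and a negative one for profits would agree with the reading of the marginal diagrams. With only ten companies, these are descriptive summaries rather than formal tests.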
(b.) Compute \(\bar{x}_{1}\), \(\bar{x}_{2}\), \(s_{11}\), \(s_{22}\), \(s_{12}\), and \(r_{12}\). Interpret \(r_{12}\).
sales <- c(126974,96933,86656,63438,55264,50976,39069,36156,35209,32416)
profits <- c(4224,3835,3510,3758,3939,1809,2946,359,2480,2413)
data1.4 <- data.frame(sales,profits)
xbar1.4 <- colMeans(data1.4)
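xbar1.4 # Mean vector
## sales profits
## 62309.1 2927.3
varian.kovarian(data1.4) # Variance-covariance matrix with divisor n (function defined above)
## sales profits
## sales 900458202 23018040
## profits 23018040 1287018
cor(data1.4) # Correlation matrix
## sales profits
## sales 1.0000000 0.6761519
## profits 0.6761519 1.0000000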
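Calculation results: \(\bar{x}_{1} = 62309.1\), \(\bar{x}_{2} = 2927.3\), \(s_{11} = 900458202\), \(s_{22} = 1287018\), \(s_{12} = 23018040\), and \(r_{12} = 0.6762\).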
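The correlation can also be computed directly from the divisor-\(n\) variances and covariance as \(r_{12} = s_{12}/\sqrt{s_{11}s_{22}}\). Because the divisor \(n\) appears in both the numerator and the denominator, it cancels, so the result should match R's `cor()`. The sketch below is a minimal example of this calculation.

```r
sales   <- c(126974, 96933, 86656, 63438, 55264, 50976, 39069, 36156, 35209, 32416)
profits <- c(4224, 3835, 3510, 3758, 3939, 1809, 2946, 359, 2480, 2413)
n <- length(sales)

# Divisor-n variances and covariance
s11 <- sum((sales - mean(sales))^2) / n
s22 <- sum((profits - mean(profits))^2) / n
s12 <- sum((sales - mean(sales)) * (profits - mean(profits))) / n

# Sample correlation; the common divisor n cancels in the ratio
r12 <- s12 / sqrt(s11 * s22)
print(c(s11 = s11, s22 = s22, s12 = s12, r12 = r12))
```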
Interpretation of \(r_{12}\):
The correlation between sales and profits is 0.6762, which indicates a fairly strong, positive linear relationship between the two variables: companies with higher sales tend to earn higher profits.