Create an R Function to Compute Weighted Multivariate Descriptive
Statistics
weighted_multi_desc <- function(X, w) {
if(nrow(X) != length(w))
stop("Number of rows in X must equal length of w")
n <- nrow(X)
p <- ncol(X)
ones <- matrix(1, nrow = n, ncol = 1)
W <- diag(w)
n_w <- as.numeric(t(w) %*% ones)
w_mean <- (1/n_w) * t(X) %*% W %*% ones
D <- X - ones %*% t(w_mean)
w_cov <- (1/(n_w-1)) * t(D) %*% W %*% D
w_sd <- sqrt(diag(w_cov))
return(list( W = W,
w_mean = as.numeric(w_mean),
w_cov = w_cov,
w_std = w_sd))
}
data <- read.csv("C:/Users/cacaf/OneDrive/Desktop/caca/Kuliah ITS/komstat/quiz/data_quiz.csv")
X <- as.matrix(data[, c("Income", "Expenditure", "HH_Size")])
w <- data$Weight
print(data)
Kecamatan Income Expenditure HH_Size Weight
1 Asemrowo 3.1 2.5 4.2 5
2 Benowo 3.8 3.0 4.0 7
3 Bubutan 4.2 3.4 3.5 10
4 Bulak 3.5 2.8 4.1 5
5 Dukuh Pakis 6.5 5.2 3.0 6
6 Gayungan 5.8 4.6 3.1 4
7 Genteng 5.5 4.4 2.8 6
8 Gubeng 5.9 4.7 2.9 13
9 Gunung Anyar 4.8 3.8 3.6 6
10 Jambangan 4.3 3.4 3.8 5
11 Karangpilang 4.5 3.6 3.7 7
12 Kenjeran 3.0 2.4 4.5 18
13 Krembangan 3.4 2.7 4.3 11
14 Lakarsantri 4.0 3.2 3.9 6
15 Mulyorejo 5.2 4.1 3.2 9
16 Pabean Cantikan 3.3 2.6 4.4 7
17 Pakal 3.6 2.9 4.1 6
18 Rungkut 5.0 4.0 3.3 12
19 Sambikerep 4.1 3.3 3.8 7
20 Sawahan 3.7 3.0 4.2 20
21 Semampir 2.8 2.3 4.7 18
22 Simokerto 3.2 2.6 4.3 9
23 Sukolilo 5.6 4.5 3.0 11
24 Sukomanunggal 4.6 3.7 3.5 10
25 Tambaksari 3.5 2.8 4.4 23
26 Tandes 4.0 3.2 3.9 9
27 Tegalsari 4.1 3.3 3.7 10
28 Tenggilis Mejoyo 5.3 4.2 3.1 6
29 Wiyung 4.7 3.7 3.6 7
30 Wonocolo 4.9 3.9 3.4 8
31 Wonokromo 4.4 3.5 3.8 16
result <- weighted_multi_desc(X, w)
print(result$w_mean)
[1] 4.186869 3.350505 3.832997
print(result$w_cov)
Income Expenditure HH_Size
Income 0.8703337 0.6840776 -0.4882477
Expenditure 0.6840776 0.5384541 -0.3843073
HH_Size -0.4882477 -0.3843073 0.2926238
print(result$w_sd)
NULL
A function is created to compute weighted multivariate statistics,
including the weighted mean, covariance matrix, and standard deviation
using matrix operations. The results show the average values considering
weights, the relationships between variables, and the spread of the
data, helping to better understand the pattern between income,
expenditure, and household size.
plot(data$Income, data$Expenditure,
main = "Income vs Expenditure",
xlab = "Income",
ylab = "Expenditure",
pch = 19,
col = "steelblue",
cex = 1.5)
abline(lm(Expenditure ~ Income, data = data),
col = "red", lwd = 2)
