Name: Violetta Ammara Irianti Agung | Student Number: 5003251126 | Class: Statistics K
Trimmed Mean Function:
trimmed_mean <- function(x, alpha) {
n <- length(x)
k <- floor(n*alpha)
x_sorted <- sort(x)
x_trimmed <- x_sorted[(k+1):(n-k)]
total <- 0
for(i in 1:length(x_trimmed)){
total <- total+x_trimmed[i]
}
result <- total/(n-2*k)
return(result)
}
Set Up the Data:
x <- c(850, 920, 1100, 1250, 1300, 1400, 1500, 1550, 1600, 1750, 2100, 8500)
Ordinary Mean (\(\alpha = 0\)):
ordinary_mean <- trimmed_mean(x, alpha = 0)
cat("Ordinary Mean:", ordinary_mean, "\n")
## Ordinary Mean: 1985
10% Trimmed Mean (\(\alpha =0.10\)):
mean_trimmed <- trimmed_mean(x, alpha = 0.10)
cat("10% Trimmed Mean:", mean_trimmed, "\n")
## 10% Trimmed Mean: 1447
Boxplot Visualization to Detect Outlier:
boxplot(x,
main = "Boxplot of Monthly Expenditure",
ylab = "Expenditure (thousand IDR)",
col = "lightblue")
Barplot Visualization to Compare Means:
means <- c(ordinary_mean, mean_trimmed)
names(means) <- c("Ordinary Mean", "10% Trimmed Mean")
barplot(means,
main = "Comparison of Means",
ylab = "Value",
col = "lightgreen")
Interpretation of Question 1:
The ordinary mean is significantly affected by the extreme value 8500, which acts as an outlier. The 10% trimmed mean removes extreme values from both ends, making it more robust. The boxplot clearly shows 8500 as an outlier, far from the rest of the data. Therefore, the trimmed mean provides a more reliable measure of central tendency for this dataset.
Conclusion of Question 1:
The trimmed mean is more robust than the ordinary mean when outliers are present. The dataset clearly contains an outlier (8500), making trimmed mean more appropriate.
Weighted Multivariate Descriptive Function:
weighted_multi_desc <- function(X, w) {
n <- nrow(X)
p <- ncol(X)
one <- matrix(1, nrow = n, ncol = 1)
W <- diag(w)
n_w <- as.numeric(t(w) %*% one)
xbar_w <- (1/n_w)*t(X) %*% W %*% one
D <- X - one %*% t(xbar_w)
S_w <- (1/(n_w - 1))* t(D) %*% W %*% D
s_w <- sqrt(diag(S_w))
return(list(W = W,
xbar_w = as.numeric(xbar_w),
S_w = S_w,
s_w = s_w))
}
Set Up the Data:
data <- read.csv("data_quiz.csv")
X <- as.matrix(data[, c("Income", "Expenditure", "HH_Size")])
w <- data$Weight
result <- weighted_multi_desc(X, w)
Weighted Mean Vector:
names(result$xbar_w) <- colnames(X)
print(result$xbar_w)
## Income Expenditure HH_Size
## 4.186869 3.350505 3.832997
Weighted Covariance Matrix:
print(result$S_w)
## Income Expenditure HH_Size
## Income 0.8703337 0.6840776 -0.4882477
## Expenditure 0.6840776 0.5384541 -0.3843073
## HH_Size -0.4882477 -0.3843073 0.2926238
Weighted Standard Deviation Vector:
names(result$s_w) <- colnames(X)
print(result$s_w)
## Income Expenditure HH_Size
## 0.9329168 0.7337943 0.5409471
Pairwise Scatter Plot Matrix Visualization:
pairs(X,
main = "Scatter Plot Matrix",
pch = 19,
col = "darkblue")
Correlation Matrix:
cor_matrix <- cor(X)
print(cor_matrix)
## Income Expenditure HH_Size
## Income 1.0000000 0.9992739 -0.9594547
## Expenditure 0.9992739 1.0000000 -0.9606524
## HH_Size -0.9594547 -0.9606524 1.0000000
Heatmap of Correlation Matrix Visualization:
heatmap(cor_matrix,
main = "Correlation Heatmap")
Interpretation of Question 2:
The weighted mean vector reflects the average values of income, expenditure, and household size, adjusted by population weights, making it more representative. The weighted covariance matrix shows relationships between variables where positive covariance indicates variables increase together. Based on the scatter plot, income and expenditure show a positive relationship (higher income → higher spending) and household size may show weaker relationships with the other variables. The correlation matrix and heatmap confirms that there is strong positive correlation between income and expenditure. While the weighted standard deviation indicates variability such as higher values mean more variation across districts.
Conclusion of Question 2:
Weighted multivariate statistics are essential when observations have different importance. Visualizations helps to understand relationships between variables.