Descriptive Statistics and Distributions
Lab assistant: Fida Fariha A
Descriptive Statistics
To present and summarize data with descriptive statistics, we use the Marriage dataset from the ‘mosaicData’ package, and the ‘ggplot2’ package to create the charts.
library(mosaicData)  # provides the Marriage dataset
library(ggplot2)     # used for all ggplot() charts below
Presenting and Summarizing Categorical Data
For categorical data, the variable ‘race’ in the Marriage dataset is used as the example for building the charts.
library(dplyr)  # provides %>%, count(), arrange() and mutate()
Frequency Table
kategorik <- Marriage %>% count(race)  # one row per race, with its count in column n
kategorik
##              race  n
## 1 American Indian  1
## 2           Black 22
## 3        Hispanic  1
## 4           White 74
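Raw counts are often easier to compare as relative frequencies. Below is a minimal base-R sketch that does not use dplyr. It rebuilds a stand-in for `Marriage$race` from the counts in the table above, then converts it with `table()` and `prop.table()`. The vector `race` is illustrative and is not the dataset itself.

```r
# Stand-in for Marriage$race, rebuilt from the frequency table above
race <- rep(c("American Indian", "Black", "Hispanic", "White"),
            times = c(1, 22, 1, 74))

freq <- table(race)            # absolute frequencies
rel  <- prop.table(freq)       # relative frequencies (sum to 1)
pct  <- round(100 * rel, 1)    # percentages, one decimal

data.frame(race = names(freq),
           n    = as.vector(freq),
           pct  = as.vector(pct))
```

With the real column, the corresponding call would be `prop.table(table(Marriage$race))`.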
Bar Chart
ggplot(kategorik,
       aes(x = race,
           y = n)) +
  # stat = "identity" uses the precomputed counts in n as bar heights
  geom_bar(stat = "identity", fill = "cornflowerblue",
           color = "black") +
  # vjust = -0.5 places each count label just above its bar
  geom_text(aes(label = n),
            vjust = -0.5) +
  labs(x = "Race",
       y = "Frequency",
       title = "Participants by race")
Pie Chart
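# recount, order the categories, and add percentages plus label positions
kategorik <- Marriage %>%
  count(race) %>%
  arrange(desc(race)) %>%
  mutate(prop = round(n*100/sum(n), 1),          # percentage, one decimal
         lab.ypos = cumsum(prop) - 0.5*prop)     # midpoint of each slice
# slice label: race name, then the rounded percentage on a new line
kategorik$label <- paste0(kategorik$race, "\n",
                          round(kategorik$prop), "%")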
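The `lab.ypos` column is meant to put each label at the middle of its slice. It takes the cumulative percentage up to and including the slice and subtracts half of that slice's own percentage. The base-R sketch below repeats the same arithmetic on the counts from the frequency table, in the `desc(race)` order used above. It also shows a side effect of rounding: once each share is rounded to one decimal, the shares sum to 99.9 rather than exactly 100.

```r
# Counts from the frequency table, in the desc(race) order used above
race <- c("White", "Hispanic", "Black", "American Indian")
n    <- c(74, 1, 22, 1)

prop     <- round(n * 100 / sum(n), 1)   # 75.5, 1.0, 22.4, 1.0
lab.ypos <- cumsum(prop) - 0.5 * prop    # centre of each slice

data.frame(race, n, prop, lab.ypos)
sum(prop)  # 99.9 because of rounding, not 100
```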
ggplot(kategorik,
       aes(x = "",
           y = prop,
           fill = race)) +
  geom_bar(width = 1,
           stat = "identity",
           color = "black") +
  geom_text(aes(y = lab.ypos, label = label),
            color = "black") +
  # wrap the single stacked bar into a circle
  coord_polar("y",
              start = 0,
              direction = -1) +
  theme_void() +
  theme(legend.position = "none") +  # the slice labels already name each race
  labs(title = "Participants by race")
Presenting and Summarizing Numerical Data
For numerical data, the variable ‘age’ in the Marriage dataset is used as the example for building the charts.
Histogram
ggplot(Marriage, aes(x = age)) +
  geom_histogram(fill = "cornflowerblue",
                 color = "white") +
  labs(title = "Participants by age",
       x = "Age")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
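The message above says that `geom_histogram()` fell back to its default of 30 bins, which is an arbitrary choice for 98 observations. The original does not use it, but one common alternative is the Sturges rule, k = ceiling(log2(n) + 1), available in base R as `nclass.Sturges()`. Another is the Freedman-Diaconis rule, `nclass.FD()`. The sketch below uses a hypothetical vector `ages` in place of `Marriage$age`.

```r
# Hypothetical ages standing in for Marriage$age (98 values, like the real data)
set.seed(1)
ages <- round(runif(98, min = 16, max = 75), 1)

k  <- nclass.Sturges(ages)      # ceiling(log2(98) + 1) = 8 bins
bw <- diff(range(ages)) / k     # the matching bin width in years

k
bw
# In ggplot2 either value can be passed explicitly:
# geom_histogram(bins = k)  or  geom_histogram(binwidth = bw)
```

With the real data this would read `geom_histogram(bins = nclass.Sturges(Marriage$age))`.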
Box Plot
boxplot(Marriage$age,
        main = "Age of Marriage Boxplot",
        ylab = "Age (years)",
        col = "blue",
        border = "black")
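The box runs from the lower hinge to the upper hinge, which are Tukey's hinges and are close to Q1 and Q3. The line inside the box is the median. Any point lying more than 1.5 box lengths beyond the box is drawn individually as an outlier. `boxplot.stats()` returns the numbers that `boxplot()` draws. Here is a sketch on a small hypothetical vector with one extreme value:

```r
# Small hypothetical sample with one extreme value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)

bs <- boxplot.stats(x)   # coef = 1.5 by default
bs$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out    # points outside the fences

# The fences behind the outlier rule: hinges -/+ 1.5 * (upper hinge - lower hinge)
h <- bs$stats[c(2, 4)]
fences <- c(h[1] - 1.5 * diff(h), h[2] + 1.5 * diff(h))
fences    # -4.5 15.5
```

Here the upper whisker stops at 9, the largest value inside the fences, and 50 is reported as an outlier.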
Stem-and-Leaf Plot
stem(Marriage$age)
##
## The decimal point is 1 digit(s) to the right of the |
##
## 1 | 67888889999999
## 2 | 00000111111333344555556677778999
## 3 | 0122344555788889999
## 4 | 001122233334445569
## 5 | 002234567
## 6 | 88
## 7 | 1134
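How to read the display: "the decimal point is 1 digit(s) to the right of the |" means each stem is a tens digit and each leaf is a units digit. So `1 | 6` is an age of about 16, shown to whole years. The sketch below turns the leaves printed above back into approximate ages. It checks that the plot accounts for all 98 participants, and that its median is close to the exact median of 31.90 reported in the summary.

```r
# Leaves copied from the stem-and-leaf output above; names are the stems (tens)
rows <- c("1" = "67888889999999",
          "2" = "00000111111333344555556677778999",
          "3" = "0122344555788889999",
          "4" = "001122233334445569",
          "5" = "002234567",
          "6" = "88",
          "7" = "1134")

# Each leaf digit becomes stem * 10 + leaf (approximate age in whole years)
ages_approx <- unlist(lapply(names(rows), function(s) {
  leaves <- as.integer(strsplit(rows[[s]], "")[[1]])
  as.integer(s) * 10 + leaves
}))

length(ages_approx)   # 98
range(ages_approx)    # 16 74
median(ages_approx)   # 32
```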
Summary Statistics
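rataan <- mean(Marriage$age)          # mean
st.dev <- sd(Marriage$age)            # standard deviation
ragam  <- var(Marriage$age)           # variance
q1 <- quantile(Marriage$age, 0.25)    # first quartile
q2 <- quantile(Marriage$age, 0.5)     # second quartile (median)
q3 <- quantile(Marriage$age, 0.75)    # third quartile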
rataan
## [1] 34.51197
st.dev
## [1] 14.40441
ragam
## [1] 207.4871
q1
##      25%
## 21.66096
q2
##      50%
## 31.90274
q3
##      75%
## 42.82192
summary(Marriage$age)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   16.27   21.66   31.90   34.51   42.82   74.25
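`var()` and `sd()` use the sample formulas, with n - 1 in the denominator: s^2 = sum((x_i - xbar)^2) / (n - 1). `quantile()` uses R's default type 7 rule, and `summary()` reports the same quartiles. Under type 7, the p-th quantile sits at position h = (n - 1)p + 1 in the sorted data, with linear interpolation between the two neighbouring values. The sketch below recomputes these statistics by hand on a small hypothetical vector (not the Marriage data) and checks them against the built-in functions.

```r
# Small hypothetical sample (not the Marriage data)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Mean, sample variance (denominator n - 1), and standard deviation by hand
xbar <- sum(x) / n                    # 5
s2   <- sum((x - xbar)^2) / (n - 1)   # 32 / 7
s    <- sqrt(s2)

# Type 7 quantile: position h = (n - 1) * p + 1 in the sorted data,
# then linear interpolation between the two neighbouring order statistics
q7 <- function(x, p) {
  x  <- sort(x)
  h  <- (length(x) - 1) * p + 1
  lo <- floor(h)
  hi <- ceiling(h)
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

c(mean = xbar, var = s2, sd = s)
sapply(c(0.25, 0.5, 0.75), q7, x = x)   # 4.0 4.5 5.5
quantile(x, c(0.25, 0.5, 0.75))         # same values (type = 7)
```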
Inferential Statistics
In this simulation, data are generated from the normal, uniform, and chi-square distributions.
Normal Distribution
hist(sebaran_normal, main = "Histogram of Normal Distribution",
     xlab = "Values", ylab = "Density", col = "blue", border = "black", freq = FALSE)
lines(density(sebaran_normal), col = "red", lwd = 2)  # kernel density estimate
## [1] 0.03440355
## [1] 0.8572352
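The step that creates `sebaran_normal` is not shown in this section. Below is a minimal sketch of how a normal sample can be generated and checked. It assumes a standard normal (mean 0, sd 1), n = 1000 and `set.seed(123)`, all of which are illustrative choices and not taken from the source. The vector name `sim_normal` is hypothetical. Adding the theoretical density with `curve(dnorm(...), add = TRUE)` next to the kernel estimate shows how closely the sample follows the model.

```r
set.seed(123)                             # assumed seed, for reproducibility
n <- 1000                                 # assumed sample size
sim_normal <- rnorm(n, mean = 0, sd = 1)  # assumed N(0, 1) parameters

# Sample statistics should be close to the theoretical mean 0 and variance 1
c(mean = mean(sim_normal), var = var(sim_normal))

# Density-scale histogram, kernel estimate (red), theoretical N(0, 1) (dashed)
hist(sim_normal, freq = FALSE, col = "blue", border = "black",
     main = "Histogram of Normal Distribution", xlab = "Values", ylab = "Density")
lines(density(sim_normal), col = "red", lwd = 2)
curve(dnorm(x, mean = 0, sd = 1), add = TRUE, lwd = 2, lty = 2)
```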
Uniform Distribution
hist(sebaran_seragam, main = "Histogram of Uniform Distribution",
     xlab = "Values", ylab = "Density", col = "green", border = "black", freq = FALSE)
lines(density(sebaran_seragam), col = "red", lwd = 2)  # kernel density estimate
## [1] 0.520091
## [1] 0.08660557
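For a uniform distribution on [a, b], the theoretical mean is (a + b)/2 and the variance is (b - a)^2/12. On [0, 1] these are 0.5 and 1/12, which is about 0.0833. The sketch below compares a simulated sample with those values. It assumes a = 0, b = 1, n = 1000 and `set.seed(123)`, which are illustrative choices and not taken from the source. The name `sim_uniform` is hypothetical.

```r
set.seed(123)            # assumed seed
a <- 0; b <- 1           # assumed interval [a, b]
n <- 1000                # assumed sample size
sim_uniform <- runif(n, min = a, max = b)

theory    <- c(mean = (a + b) / 2, var = (b - a)^2 / 12)
empirical <- c(mean = mean(sim_uniform), var = var(sim_uniform))
rbind(theory, empirical)
```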