Normality plays an important role because many statistical methods assume a normal distribution. This short guide covers how to check normality.
In this guide, we go through three ways to assess normality in R. First, we cover visual methods for assessing normality. Then, we present the best-known normality tests available in R. Finally, we discuss the case of large sample sizes.
In this part, we use the iris data set available in R. First, let's create the data we will work with: the sepal lengths of the setosa species (one of the iris species).
data <- iris$Sepal.Length[1:50]
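The index range 1:50 works because the first 50 rows of iris are the setosa flowers. As a minimal sketch, the same vector can be selected by species name, which does not depend on row order. The names setosa_by_index and setosa_by_name are illustrative only.

```r
# Select setosa sepal lengths two ways and check that they agree.
setosa_by_index <- iris$Sepal.Length[1:50]                      # relies on row order
setosa_by_name  <- iris$Sepal.Length[iris$Species == "setosa"]  # relies on the label

# TRUE when both selections return exactly the same values in the same order.
same_selection <- identical(setosa_by_index, setosa_by_name)
print(same_selection)

data <- setosa_by_name
```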
In this part, we go through visual methods for assessing normality in R. There are two main plots for assessing a distribution: the density plot and the Q-Q plot.
A density plot gives a visual indication of whether the distribution is bell-shaped. We use the ggdensity() function available in the ggpubr package (Kassambara, 2020).
ggpubr::ggdensity(data, fill = "lightgray", add = "mean", xlab = "Sepal Length of Setosa Type")
## Warning: `geom_vline()`: Ignoring `mapping` because `xintercept` was provided.
## Warning: `geom_vline()`: Ignoring `data` because `xintercept` was provided.
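If ggpubr is not installed, a similar picture can be sketched with base R graphics. This version is an assumption-level alternative, not part of the ggpubr workflow: it adds a normal curve with the sample mean and standard deviation so the bell shape can be compared directly.

```r
# Base-R sketch of a density plot with a fitted normal curve for comparison.
data <- iris$Sepal.Length[1:50]    # setosa sepal lengths

dens <- density(data)              # kernel density estimate (Gaussian kernel by default)
plot(dens, main = "Sepal Length of Setosa Type", xlab = "Sepal length")
abline(v = mean(data), lty = 2)    # dashed vertical line at the sample mean

# Overlay the normal density that has the same mean and standard deviation.
curve(dnorm(x, mean = mean(data), sd = sd(data)), add = TRUE, col = "red")
```

The closer the black kernel estimate follows the red curve, the more bell-shaped the sample looks.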
A Q-Q plot (quantile-quantile plot) plots the quantiles of the given sample against the quantiles of a normal distribution. A reference line is also drawn to show how closely the sample values follow the normal distribution. To draw the Q-Q plot, we use the ggqqplot() function available in the ggpubr package (Kassambara, 2020).
ggpubr::ggqqplot(data)
## Warning: The following aesthetics were dropped during statistical transformation: sample
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
## The following aesthetics were dropped during statistical transformation: sample
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
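The same idea can be sketched in base R with qqnorm() and qqline(). As an extra, hedged summary, the correlation between the theoretical and sample quantiles gives a rough numeric sense of how straight the points are. The name qq_cor is illustrative.

```r
# Base-R sketch of a normal Q-Q plot.
data <- iris$Sepal.Length[1:50]    # setosa sepal lengths

# qqnorm() draws the plot and invisibly returns the plotted coordinates:
# x = theoretical normal quantiles, y = sample values.
qq <- qqnorm(data, main = "Normal Q-Q plot: setosa sepal length")

# qqline() adds a reference line through the first and third quartiles.
qqline(data, col = "red")

# Correlation of the plotted points; values near 1 mean the points
# lie close to a straight line.
qq_cor <- cor(qq$x, qq$y)
print(qq_cor)
```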
In this part, we go through seven well-known normality tests in R: Shapiro-Wilk, Jarque-Bera, Anderson-Darling, Cramer-von Mises, Lilliefors, Pearson chi-square, and Shapiro-Francia.
We use the shapiro.test() function to test the normality of the data with the Shapiro-Wilk test.
shapiro.test(data)
##
## Shapiro-Wilk normality test
##
## data: data
## W = 0.9777, p-value = 0.4595
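According to the Shapiro-Wilk test, there is not enough evidence to reject the null hypothesis (H0: the data are normally distributed) because the p-value (0.4595) is larger than alpha (0.05). Failing to reject H0 does not prove normality, but the data are consistent with a normal distribution, so we treat them as normally distributed.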
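The decision can also be made in code rather than by reading the printed output. The sketch below assumes a significance level of 0.05, as in the text; the names alpha, sw, and reject_h0 are illustrative.

```r
# Extract the Shapiro-Wilk statistic and p-value and apply the decision rule.
data  <- iris$Sepal.Length[1:50]   # setosa sepal lengths
alpha <- 0.05                      # significance level used in this guide

sw <- shapiro.test(data)           # returns an object of class "htest"
w_stat  <- unname(sw$statistic)    # the W statistic
p_value <- sw$p.value

reject_h0 <- p_value < alpha       # TRUE means evidence against normality
if (reject_h0) {
  message("Reject H0: evidence against normality")
} else {
  message("Do not reject H0: no evidence against normality")
}
```

The other tests in this part return the same kind of object, so the same pattern applies to them.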
We use the jarque.bera.test() function, available in the tseries package (Trapletti and Hornik, 2019), to check the normality of the data with the Jarque-Bera test.
tseries::jarque.bera.test(data)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
##
## Jarque Bera Test
##
## data: data
## X-squared = 0.36208, df = 2, p-value = 0.8344
The Jarque-Bera test suggests that there is not enough evidence to reject the null hypothesis (H0: the data are normally distributed) since the p-value (0.8344) is larger than alpha (0.05). We therefore treat the data as normally distributed.
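To see what the test measures, here is a minimal sketch that computes the Jarque-Bera statistic by hand from the sample skewness and kurtosis, using moments with divisor n. It is an illustration of the standard formula, not the tseries source code, and the variable names are illustrative. It should give values close to the output above.

```r
# Hand computation of the Jarque-Bera statistic from sample moments.
data <- iris$Sepal.Length[1:50]    # setosa sepal lengths
n    <- length(data)

dev <- data - mean(data)
m2  <- sum(dev^2) / n              # second central moment (divisor n)
m3  <- sum(dev^3) / n              # third central moment
m4  <- sum(dev^4) / n              # fourth central moment

skewness <- m3 / m2^(3 / 2)        # 0 for a normal distribution
kurtosis <- m4 / m2^2              # 3 for a normal distribution

# JB = n/6 * (S^2 + (K - 3)^2 / 4), compared against chi-square with 2 df.
jb_stat <- n / 6 * (skewness^2 + (kurtosis - 3)^2 / 4)
jb_p    <- pchisq(jb_stat, df = 2, lower.tail = FALSE)

print(c(statistic = jb_stat, p_value = jb_p))
```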
To use the Anderson-Darling test for assessing normality in R, we apply the ad.test() function available in the nortest package (Gross and Ligges, 2015).
nortest::ad.test(data)
##
## Anderson-Darling normality test
##
## data: data
## A = 0.40799, p-value = 0.3352
According to the Anderson-Darling test, there is not enough evidence to reject the null hypothesis (H0: the data are normally distributed) since the p-value (0.3352) is larger than alpha (0.05). We therefore treat the data as normally distributed.
The Cramer-von Mises test assesses normality with the cvm.test() function available in the nortest package (Gross and Ligges, 2015).
nortest::cvm.test(data)
##
## Cramer-von Mises normality test
##
## data: data
## W = 0.071753, p-value = 0.2597
According to the Cramer-von Mises test, there is not enough evidence to reject the null hypothesis (H0: the data are normally distributed) since the p-value (0.2597) is larger than alpha (0.05). We therefore treat the data as normally distributed.
The Lilliefors test assesses normality with the lillie.test() function available in the nortest package (Gross and Ligges, 2015).
nortest::lillie.test(data)
##
## Lilliefors (Kolmogorov-Smirnov) normality test
##
## data: data
## D = 0.11486, p-value = 0.09693
The Lilliefors test indicates that there is not enough evidence to reject the null hypothesis (H0: the data are normally distributed) because the p-value (0.09693) is larger than alpha (0.05). We therefore treat the data as normally distributed, although this is the smallest p-value among the tests shown here.
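The D statistic is a Kolmogorov-Smirnov distance: the largest gap between the empirical distribution of the sample and a normal distribution whose mean and standard deviation are estimated from the same sample. The sketch below computes D by hand under that definition. It covers only the statistic, not the p-value, and the variable names are illustrative.

```r
# Hand computation of the Lilliefors (Kolmogorov-Smirnov) D statistic.
data <- iris$Sepal.Length[1:50]    # setosa sepal lengths
x    <- sort(data)
n    <- length(x)

# Normal CDF evaluated at the standardized, sorted observations.
p <- pnorm((x - mean(x)) / sd(x))

# Largest gap with the empirical CDF just after and just before each point.
d_plus  <- max(seq_len(n) / n - p)
d_minus <- max(p - (seq_len(n) - 1) / n)

D <- max(d_plus, d_minus)
print(D)
```

Because the mean and standard deviation are estimated from the data, the usual Kolmogorov-Smirnov p-value does not apply, which is why a dedicated function such as lillie.test() is used for the p-value.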
The Pearson chi-square test assesses normality with the pearson.test() function available in the nortest package (Gross and Ligges, 2015).
nortest::pearson.test(data)
##
## Pearson chi-square normality test
##
## data: data
## P = 9.2, p-value = 0.2386
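According to the Pearson chi-square test, there is not enough evidence to reject the null hypothesis (H0: the data are normally distributed) since the p-value (0.2386) is larger than alpha (0.05). We therefore treat the data as normally distributed.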
The Shapiro-Francia test assesses normality with the sf.test() function available in the nortest package (Gross and Ligges, 2015).
nortest::sf.test(data)
##
## Shapiro-Francia normality test
##
## data: data
## W = 0.9817, p-value = 0.5357
According to the Shapiro-Francia test, there is not enough evidence to reject the null hypothesis (H0: the data are normally distributed) since the p-value (0.5357) is larger than alpha (0.05). We therefore treat the data as normally distributed.
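When several tests are of interest, their p-values can be collected in one pass. The sketch below is one possible arrangement, not a recommended procedure: it always runs shapiro.test() and adds the nortest and tseries tests only if those packages are installed. The list name tests and its element names are illustrative.

```r
# Collect the p-values of the available normality tests in one named vector.
data <- iris$Sepal.Length[1:50]    # setosa sepal lengths

tests <- list(shapiro_wilk = shapiro.test)   # base R, always available

if (requireNamespace("nortest", quietly = TRUE)) {
  tests <- c(tests, list(
    anderson_darling = nortest::ad.test,
    cramer_von_mises = nortest::cvm.test,
    lilliefors       = nortest::lillie.test,
    pearson_chisq    = nortest::pearson.test,
    shapiro_francia  = nortest::sf.test
  ))
}
if (requireNamespace("tseries", quietly = TRUE)) {
  tests <- c(tests, list(jarque_bera = tseries::jarque.bera.test))
}

# Each test returns an "htest" object with a p.value element.
p_values <- vapply(tests, function(f) unname(f(data)$p.value), numeric(1))
print(round(p_values, 4))
```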
According to the Central Limit Theorem, whatever the underlying distribution is, the sampling distribution of the mean tends toward normality when the sample is large enough (a common rule of thumb is n ≥ 30). Note that the sample size in each group must be large enough. In that case, the data themselves do not need to be normally distributed, since the sampling distribution of the mean is approximately normal.
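A small simulation can illustrate this. The sketch below draws many samples of size 30 from an exponential distribution, which is strongly right-skewed, and looks at the distribution of their means. The choice of distribution, the seed, and the number of replicates are assumptions made for illustration.

```r
# Simulation sketch: means of skewed samples look approximately normal.
set.seed(123)                      # arbitrary seed, for reproducibility
n     <- 30                        # size of each sample
n_rep <- 5000                      # number of simulated samples

# Exponential(rate = 1) has mean 1 and standard deviation 1.
sample_means <- replicate(n_rep, mean(rexp(n, rate = 1)))

theoretical_se <- 1 / sqrt(n)      # standard deviation of the mean under the CLT

hist(sample_means, breaks = 40, freq = FALSE,
     main = "Means of 5000 exponential samples (n = 30)",
     xlab = "Sample mean")
# Normal curve predicted by the Central Limit Theorem.
curve(dnorm(x, mean = 1, sd = theoretical_se), add = TRUE, col = "red")

print(c(mean = mean(sample_means), sd = sd(sample_means), se = theoretical_se))
```

The histogram is much closer to the normal curve than the exponential distribution itself, although some right skew typically remains at n = 30.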
Gross, J., Ligges, U. (2015). nortest: Tests for Normality. R package version 1.0-4.
Kassambara, A. (2020). ggpubr: ‘ggplot2’ Based Publication Ready Plots. R package version 0.4.0.
Trapletti, A., Hornik, K. (2019). tseries: Time Series Analysis and Computational Finance. R package version 0.10-47.
Source: https://universeofdatascience.com/how-to-assess-normality-in-r/