Data Normality

Normality matters because many parametric statistical methods assume that the data follow a normal distribution. The following is a concise guide to checking normality.

In this guide, we walk through three ways of assessing normality in R. First, we cover visual methods for assessing normality. Then, we present the best-known normality tests available in R. Finally, we discuss the case of large sample sizes.

Throughout this guide, we use the iris dataset that ships with R. First, let us create the data we will work with: the sepal length of the setosa species (one of the iris species in the dataset).

data <- iris$Sepal.Length[1:50]
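As a quick sanity check, the sketch below confirms that the first 50 rows of iris are all setosa, so the index range 1:50 selects exactly that species. It assumes the built-in iris dataset and uses only base R; the variable names all_setosa and by_species are illustrative.

```r
# Rebuild the vector used in this guide.
data <- iris$Sepal.Length[1:50]

# Check that rows 1 to 50 are all setosa (assumption: built-in iris row order).
all_setosa <- all(iris$Species[1:50] == "setosa")

# An equivalent selection that does not rely on row order.
by_species <- iris$Sepal.Length[iris$Species == "setosa"]

print(all_setosa)
print(identical(data, by_species))
print(length(data))
```

Selecting by species is a little more robust than selecting by position if the rows are ever reordered.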

Visual Methods for Checking Normality

In this section, we go through visual methods for assessing normality in R. There are two main plots for assessing a distribution: the density plot and the Q-Q plot.

1. Density Plot

A density plot gives a visual impression of whether the distribution is bell-shaped. We use the ggdensity() function from the ggpubr package (Kassambara, 2020).

ggpubr::ggdensity(data, fill = "lightgray", add = "mean", xlab = "Sepal Length of Setosa Type")
## Warning: `geom_vline()`: Ignoring `mapping` because `xintercept` was provided.
## Warning: `geom_vline()`: Ignoring `data` because `xintercept` was provided.
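If ggpubr is not available, a similar picture can be sketched with base R alone. The example below is only an illustration, not the method used above: it overlays a normal curve with the sample mean and standard deviation on the kernel density estimate, so the two shapes can be compared directly. The variable names dens and normal_curve are illustrative.

```r
data <- iris$Sepal.Length[1:50]

# Kernel density estimate of the sample.
dens <- density(data)

# Normal density with the sample mean and standard deviation, on the same grid.
normal_curve <- dnorm(dens$x, mean = mean(data), sd = sd(data))

plot(dens, main = "Sepal Length of Setosa Type", xlab = "Sepal Length",
     ylim = range(0, dens$y, normal_curve))
lines(dens$x, normal_curve, lty = 2)   # dashed: fitted normal curve
abline(v = mean(data), lty = 3)        # dotted: sample mean
```

The closer the solid and dashed curves lie to each other, the more bell-shaped the sample looks.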

2. Q-Q Plot

A Q-Q plot (quantile-quantile plot) plots the quantiles of the given sample against those of a normal distribution. A reference line is also drawn to show how closely the sample values follow a normal distribution. To draw the Q-Q plot, we use the ggqqplot() function from the ggpubr package (Kassambara, 2020).

ggpubr::ggqqplot(data)
## Warning: The following aesthetics were dropped during statistical transformation: sample
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: sample
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
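A base R counterpart, shown here only as a sketch, uses qqnorm() and qqline(). Note that qqline() draws its reference line through the first and third quartiles by default. The correlation computed at the end is an informal summary added for illustration, not a formal test.

```r
data <- iris$Sepal.Length[1:50]

# qqnorm() plots sample quantiles against theoretical normal quantiles
# and invisibly returns the plotted coordinates.
qq <- qqnorm(data, main = "Normal Q-Q Plot: Sepal Length of Setosa Type")

# qqline() adds a reference line through the first and third quartiles.
qqline(data)

# Correlation between theoretical and sample quantiles; values near 1
# indicate points lying close to a straight line.
qq_cor <- cor(qq$x, qq$y)
print(qq_cor)
```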

Normality Tests

In this part, we go through seven well-known normality tests in R, listed below.

1. Shapiro-Wilk Test

We use the shapiro.test() function to test the normality of the data with the Shapiro-Wilk test.

shapiro.test(data)
## 
##  Shapiro-Wilk normality test
## 
## data:  data
## W = 0.9777, p-value = 0.4595

According to the Shapiro-Wilk test, there is not enough evidence to reject the null hypothesis (H0: the data are normally distributed), since the p-value (0.4595) is larger than alpha (0.05). The data are therefore consistent with a normal distribution.
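The same decision rule can be written out in code. The sketch below assumes a significance level of 0.05, as used in this guide, and reads the statistic and p-value from the object returned by shapiro.test(); the names alpha, w_stat, p_val, and reject_h0 are illustrative.

```r
data <- iris$Sepal.Length[1:50]
alpha <- 0.05   # significance level used in this guide

result <- shapiro.test(data)

# shapiro.test() returns an "htest" object; the statistic and p-value
# are stored as list elements.
w_stat <- unname(result$statistic)
p_val  <- result$p.value

# Reject H0 (normality) only when the p-value falls below alpha.
reject_h0 <- p_val < alpha
cat(sprintf("W = %.4f, p-value = %.4f, reject H0: %s\n", w_stat, p_val, reject_h0))
```

The other tests in this section also return "htest" objects, so the same pattern applies to them.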

2. Jarque-Bera Test

We use the jarque.bera.test() function from the tseries package (Trapletti and Hornik, 2019) to check the normality of the data with the Jarque-Bera test.

tseries::jarque.bera.test(data)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## 
##  Jarque Bera Test
## 
## data:  data
## X-squared = 0.36208, df = 2, p-value = 0.8344

The Jarque-Bera test suggests that there is not enough evidence to reject the null hypothesis (H0: the data are normally distributed), since the p-value (0.8344) is larger than alpha (0.05). The data are therefore consistent with a normal distribution.
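To see where the statistic comes from, the sketch below computes it by hand in base R using the textbook form JB = n/6 * (S^2 + (K - 3)^2 / 4), where S is the sample skewness and K the sample kurtosis, both based on moments with divisor n. This is an illustration of the formula rather than a replacement for the package function; all variable names are illustrative.

```r
data <- iris$Sepal.Length[1:50]
n <- length(data)

# Central moments with divisor n.
m  <- mean(data)
m2 <- mean((data - m)^2)
m3 <- mean((data - m)^3)
m4 <- mean((data - m)^4)

skewness <- m3 / m2^1.5
kurtosis <- m4 / m2^2

# JB = n/6 * (S^2 + (K - 3)^2 / 4), compared with a chi-squared
# distribution on 2 degrees of freedom.
jb    <- n / 6 * (skewness^2 + (kurtosis - 3)^2 / 4)
p_val <- pchisq(jb, df = 2, lower.tail = FALSE)

print(c(JB = jb, p_value = p_val))
```

The test penalizes departures of skewness from 0 and of kurtosis from 3, the values expected under normality.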

3. Anderson-Darling Test

To assess normality with the Anderson-Darling test in R, we apply the ad.test() function from the nortest package (Gross and Ligges, 2015).

nortest::ad.test(data)
## 
##  Anderson-Darling normality test
## 
## data:  data
## A = 0.40799, p-value = 0.3352

According to the Anderson-Darling test, there is not enough evidence to reject the null hypothesis (H0: the data are normally distributed), since the p-value (0.3352) is larger than alpha (0.05). The data are therefore consistent with a normal distribution.

4. Cramer-von Mises Test

The Cramer-von Mises test can be used to assess normality with the cvm.test() function from the nortest package (Gross and Ligges, 2015).

nortest::cvm.test(data)
## 
##  Cramer-von Mises normality test
## 
## data:  data
## W = 0.071753, p-value = 0.2597

According to the Cramer-von Mises test, there is not enough evidence to reject the null hypothesis (H0: the data are normally distributed), since the p-value (0.2597) is larger than alpha (0.05). The data are therefore consistent with a normal distribution.

5. Lilliefors (Kolmogorov-Smirnov) Test

Normality can be assessed with the Lilliefors test using the lillie.test() function from the nortest package (Gross and Ligges, 2015).

nortest::lillie.test(data)
## 
##  Lilliefors (Kolmogorov-Smirnov) normality test
## 
## data:  data
## D = 0.11486, p-value = 0.09693

The Lilliefors test indicates that there is not enough evidence to reject the null hypothesis (H0: the data are normally distributed), since the p-value (0.09693) is larger than alpha (0.05). The data are therefore consistent with a normal distribution.

6. Pearson Chi-square Test

The Pearson chi-square test can be used to assess normality with the pearson.test() function from the nortest package (Gross and Ligges, 2015).

nortest::pearson.test(data)
## 
##  Pearson chi-square normality test
## 
## data:  data
## P = 9.2, p-value = 0.2386

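According to the Pearson chi-square test, there is not enough evidence to reject the null hypothesis (H0: the data are normally distributed), since the p-value (0.2386) is larger than alpha (0.05). The data are therefore consistent with a normal distribution.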

7. Shapiro-Francia Test

Normality can be assessed with the Shapiro-Francia test using the sf.test() function from the nortest package (Gross and Ligges, 2015).

nortest::sf.test(data)
## 
##  Shapiro-Francia normality test
## 
## data:  data
## W = 0.9817, p-value = 0.5357

According to the Shapiro-Francia test, there is not enough evidence to reject the null hypothesis (H0: the data are normally distributed), since the p-value (0.5357) is larger than alpha (0.05). The data are therefore consistent with a normal distribution.
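The results of the tests above can be gathered in one table for side-by-side comparison. The sketch below is one possible way to do this: collect_p_values() is a hypothetical helper written for this example, not part of any package, and the tseries and nortest tests are added only if those packages are installed.

```r
data <- iris$Sepal.Length[1:50]

# Hypothetical helper: apply each test function and keep its p-value.
collect_p_values <- function(x, tests, alpha = 0.05) {
  p <- vapply(tests, function(f) f(x)$p.value, numeric(1))
  data.frame(test = names(tests), p_value = unname(p),
             reject_h0 = unname(p) < alpha)
}

# shapiro.test() is in base R and is always available.
tests <- list("Shapiro-Wilk" = shapiro.test)

# Add the package-based tests only when the packages are installed.
if (requireNamespace("tseries", quietly = TRUE)) {
  tests[["Jarque-Bera"]] <- tseries::jarque.bera.test
}
if (requireNamespace("nortest", quietly = TRUE)) {
  tests[["Anderson-Darling"]] <- nortest::ad.test
  tests[["Cramer-von Mises"]] <- nortest::cvm.test
  tests[["Lilliefors"]]       <- nortest::lillie.test
  tests[["Pearson"]]          <- nortest::pearson.test
  tests[["Shapiro-Francia"]]  <- nortest::sf.test
}

summary_table <- collect_p_values(data, tests)
print(summary_table)
```

A table like this makes it easy to see whether the tests agree; here none of the seven rejects normality at the 0.05 level.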

Large Sample Sizes

According to the Central Limit Theorem, whatever the underlying distribution (provided its variance is finite), the sampling distribution of the mean tends toward normality as the sample size grows; n ≥ 30 is a common rule of thumb. Note that the sample size in each group must be large enough. In that case, the data themselves need not be normally distributed for methods that rely on the sample mean, because the sampling distribution of the mean is approximately normal.
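A small simulation can make this concrete. The sketch below is an illustration under assumed settings (an exponential distribution with rate 1, samples of size 30, 5000 replications, and an arbitrary seed), none of which come from the analysis above. It draws repeated samples from a strongly skewed distribution and looks at the distribution of their means.

```r
set.seed(1)   # arbitrary seed, for reproducibility only

n     <- 30     # size of each sample
n_rep <- 5000   # number of simulated samples

# Draw n_rep samples of size n from Exp(1), a right-skewed distribution
# with mean 1 and standard deviation 1, and keep the mean of each sample.
sample_means <- replicate(n_rep, mean(rexp(n, rate = 1)))

# The CLT predicts a mean close to 1 and a standard deviation close to 1/sqrt(n).
mean_of_means <- mean(sample_means)
sd_of_means   <- sd(sample_means)
print(c(mean = mean_of_means, sd = sd_of_means, predicted_sd = 1 / sqrt(n)))

# Histogram of the sample means with the predicted normal curve overlaid.
hist(sample_means, breaks = 40, freq = FALSE,
     main = "Sampling Distribution of the Mean (n = 30)", xlab = "Sample mean")
curve(dnorm(x, mean = 1, sd = 1 / sqrt(n)), add = TRUE, lty = 2)
```

Even though the individual observations are far from normal, the histogram of the means is close to the dashed normal curve, although some right skew can remain at n = 30.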

References

Gross, J., Ligges, U. (2015). nortest: Tests for Normality. R package version 1.0-4.

Kassambara, A. (2020). ggpubr: ‘ggplot2’ Based Publication Ready Plots. R package version 0.4.0.

Trapletti, A., Hornik, K. (2019). tseries: Time Series Analysis and Computational Finance. R package version 0.10-47.

Source: https://universeofdatascience.com/how-to-assess-normality-in-r/