Procedure for computing the Winsorized Pearson coefficient in R
Step 1. Set the working directory where the data are stored.
setwd("D:/INVESTIGACIONES/3. ARTICULOS PENDIENTES/2020/4. Pearson Robusto") # this depends on where you saved the file
# Data obtained from https://ourworldindata.org/hiv-aids
Step 2. Load the dataset
# Example 1
library(readxl)
Data_Aplicacion3 <- read_excel("Data OMS/Data_Aplicacion3.xlsx",
sheet = "Hoja2")
Step 3. Review the variable labels
labels(Data_Aplicacion3)
## [[1]]
## [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14"
## [15] "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28"
##
## [[2]]
## [1] "Entity"
## [2] "Code"
## [3] "Year"
## [4] "Meningitis_deaths"
## [5] "Lowerrespiratoryinfections_deaths"
## [6] "Intestinalinfectiousdiseases_deaths"
## [7] "Protein-energymalnutrition_deaths"
## [8] "Terrorism_deaths"
## [9] "Cardiovascular_diseases_deaths"
## [10] "Dementia_deaths"
## [11] "Kidney_disease_deaths"
## [12] "Respiratory_diseases_deaths"
## [13] "Liver_diseases_deaths"
## [14] "Digestive_diseases_deaths"
## [15] "Hepatitis_deaths"
## [16] "Cancers_deaths"
## [17] "Parkinson_disease_deaths"
## [18] "Fire_deaths"
## [19] "Malaria_deaths"
## [20] "Drowning_deaths"
## [21] "Homicide_deaths"
## [22] "HIV_AIDS_deaths"
## [23] "Drug_use_disorders_deaths"
## [24] "Tuberculosis_deaths"
## [25] "Road_injuries_deaths"
## [26] "Maternal_disorders_deaths"
## [27] "Neonatal_disorders_deaths"
## [28] "Alcohol_use_disorders_deaths"
## [29] "Natural_disasters_deaths"
## [30] "Diarrheal_diseases_deaths"
## [31] "Heat__hot_and_cold_exposure_deaths"
## [32] "Nutritional_deficiencie_deaths"
## [33] "Suicide_deaths"
## [34] "Conflict_deaths"
## [35] "Diabetes_deaths"
## [36] "Poisonings_deaths"
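labels() returns both the row labels and the column labels, as the output above shows. If only the column names are needed, names() is a shorter route. The following is a minimal sketch on a small hypothetical data frame (the object toy and its values are invented for illustration and are not part of the dataset):

```r
# Hypothetical toy data frame standing in for Data_Aplicacion3
toy <- data.frame(
  Entity = c("A", "B", "C"),
  Year = c(1990, 1991, 1992),
  HIV_AIDS_deaths = c(10, 20, 30),
  Tuberculosis_deaths = c(50, 40, 35)
)

# names() returns only the column names, without the row labels
col_names <- names(toy)
print(col_names)

# grep() keeps the names that match a pattern, here those ending in "_deaths"
death_cols <- grep("_deaths$", col_names, value = TRUE)
print(death_cols)
```

The same two calls should work on Data_Aplicacion3, since read_excel() returns a data frame-like object.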
Step 4. Select only two variables
data_tbc1 <- subset(Data_Aplicacion3, select = c("HIV_AIDS_deaths", "Tuberculosis_deaths"))
Step 5. Now let us check for outliers.
library(MVN)
result1 <- mvn(data = data_tbc1, mvnTest = "mardia",
               multivariatePlot = "qqplot", multivariateOutlierMethod = "adj",
               showOutliers = TRUE, showNewData = TRUE)

result1$multivariateOutliers
##    Observation Mahalanobis Distance Outlier
## 1            1              119.198    TRUE
## 2            2              117.797    TRUE
## 3            3              117.743    TRUE
## 4            4               98.085    TRUE
## 5            5               97.588    TRUE
## 6            6               86.892    TRUE
## 7            7               63.437    TRUE
## 8            8               44.304    TRUE
## 9            9               27.107    TRUE
## 10          10               12.178    TRUE
result1$multivariateNormality
##              Test         Statistic            p value Result
## 1 Mardia Skewness  10.9876818574254 0.0267028026625831     NO
## 2 Mardia Kurtosis -1.07193018553917  0.283751390949153    YES
## 3             MVN              <NA>               <NA>     NO
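To make the idea behind the outlier table more concrete, the sketch below computes squared Mahalanobis distances by hand in base R and flags points above a chi-square cutoff. It is only an illustration under stated assumptions: it uses the classical mean and covariance with a fixed 0.975 quantile, so it does not reproduce the adjusted method requested in the mvn() call, and the data (toy) are simulated with one planted extreme point.

```r
set.seed(1)

# Hypothetical two-column data: 30 ordinary points plus one planted extreme point
toy <- cbind(
  x = c(rnorm(30), 8),
  y = c(rnorm(30), -8)
)

# Classical (non-robust) estimates of location and scatter
center <- colMeans(toy)
covmat <- cov(toy)

# Squared Mahalanobis distance of every row from the center
d2 <- mahalanobis(toy, center, covmat)

# Assumed cutoff: 0.975 quantile of a chi-square with as many df as variables
cutoff <- qchisq(0.975, df = ncol(toy))

# Row indices whose distance exceeds the cutoff
flagged <- which(d2 > cutoff)
print(flagged)
```

Because a single extreme point also inflates the classical covariance, robust estimates are usually preferred in practice, which is one reason to rely on a dedicated method such as the one used in this step.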
Step 6. Let us look at a correlation plot
library(ggplot2)
ggplot(data = data_tbc1, aes(x = HIV_AIDS_deaths, y=Tuberculosis_deaths)) +
geom_point() +
geom_smooth(method = "lm", color = "black")

Step 7. Compute the Pearson coefficient
r1 <- cor(data_tbc1, method = "pearson")
r1 <- round(r1, 2) # rounding to two decimal places
r1
##                     HIV_AIDS_deaths Tuberculosis_deaths
## HIV_AIDS_deaths                1.00               -0.92
## Tuberculosis_deaths           -0.92                1.00
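As a reminder of what cor() computes here, Pearson's coefficient is the covariance of the two variables divided by the product of their standard deviations. The short sketch below checks this on hypothetical toy vectors (x and y are invented for illustration):

```r
# Hypothetical toy vectors, not taken from the dataset
x <- c(2, 4, 6, 8, 10, 12)
y <- c(11, 9, 8, 6, 3, 1)

# Pearson's r: covariance divided by the product of the standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))

# cor() with method = "pearson" should agree with the manual value
r_builtin <- cor(x, y, method = "pearson")

print(c(manual = r_manual, builtin = r_builtin))
```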
Step 8. Compute the Winsorized Pearson coefficient
library(WRS2)
rw1 <- winall(data_tbc1, tr = 0.20) # tr: amount of Winsorization
rw1 <- round(rw1$cor, 2) # rounding to two decimal places
rw1
##                     HIV_AIDS_deaths Tuberculosis_deaths
## HIV_AIDS_deaths                1.00               -0.96
## Tuberculosis_deaths           -0.96                1.00
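The idea behind the Winsorized coefficient can be sketched in base R: each variable is Winsorized separately (the most extreme values in each tail are pulled in to the nearest retained value) and the ordinary Pearson coefficient is then computed on the Winsorized values. The helper winsorize() below is a hypothetical illustration written for this note, not a WRS2 function, and it assumes the cut points are the order statistics at positions floor(tr * n) + 1 and n - floor(tr * n). The data are invented.

```r
# Hypothetical helper: Winsorize a numeric vector by a proportion tr in each tail
winsorize <- function(v, tr = 0.20) {
  n <- length(v)
  sorted <- sort(v)
  g <- floor(tr * n)        # number of values replaced in each tail
  lower <- sorted[g + 1]    # smallest value that is kept
  upper <- sorted[n - g]    # largest value that is kept
  pmin(pmax(v, lower), upper)
}

# Hypothetical toy data: a positive trend plus one extreme pair at the end
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
y <- c(2, 1, 4, 3, 6, 5, 8, 7, 10, -50)

# Winsorize each variable on its own
xw <- winsorize(x, tr = 0.20)
yw <- winsorize(y, tr = 0.20)

# Ordinary Pearson versus Pearson on the Winsorized values
r_plain <- cor(x, y)
r_w <- cor(xw, yw)

print(c(pearson = r_plain, winsorized = r_w))
```

In this toy example the single extreme pair is enough to turn the ordinary coefficient negative, while the Winsorized coefficient stays positive, which is the kind of sensitivity the comparison in this procedure is meant to reveal.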
Step 9. Difference between the two correlations
r1-rw1
##                     HIV_AIDS_deaths Tuberculosis_deaths
## HIV_AIDS_deaths                0.00                0.04
## Tuberculosis_deaths            0.04                0.00