Procedure for computing the Winsorized Pearson coefficient in R

Step 1. Set the working directory to the folder that contains the data.

setwd("D:/INVESTIGACIONES/3. ARTICULOS PENDIENTES/2020/4. Pearson Robusto") # this path depends on where you saved the file
# Data obtained from https://ourworldindata.org/hiv-aids

Step 2. Load the dataset

# Example 1
library(readxl)
Data_Aplicacion3 <- read_excel("Data OMS/Data_Aplicacion3.xlsx", 
                               sheet = "Hoja2")

Step 3. Review the variable labels

labels(Data_Aplicacion3)
## [[1]]
##  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14"
## [15] "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28"
## 
## [[2]]
##  [1] "Entity"                             
##  [2] "Code"                               
##  [3] "Year"                               
##  [4] "Meningitis_deaths"                  
##  [5] "Lowerrespiratoryinfections_deaths"  
##  [6] "Intestinalinfectiousdiseases_deaths"
##  [7] "Protein-energymalnutrition_deaths"  
##  [8] "Terrorism_deaths"                   
##  [9] "Cardiovascular_diseases_deaths"     
## [10] "Dementia_deaths"                    
## [11] "Kidney_disease_deaths"              
## [12] "Respiratory_diseases_deaths"        
## [13] "Liver_diseases_deaths"              
## [14] "Digestive_diseases_deaths"          
## [15] "Hepatitis_deaths"                   
## [16] "Cancers_deaths"                     
## [17] "Parkinson_disease_deaths"           
## [18] "Fire_deaths"                        
## [19] "Malaria_deaths"                     
## [20] "Drowning_deaths"                    
## [21] "Homicide_deaths"                    
## [22] "HIV_AIDS_deaths"                    
## [23] "Drug_use_disorders_deaths"          
## [24] "Tuberculosis_deaths"                
## [25] "Road_injuries_deaths"               
## [26] "Maternal_disorders_deaths"          
## [27] "Neonatal_disorders_deaths"          
## [28] "Alcohol_use_disorders_deaths"       
## [29] "Natural_disasters_deaths"           
## [30] "Diarrheal_diseases_deaths"          
## [31] "Heat__hot_and_cold_exposure_deaths" 
## [32] "Nutritional_deficiencie_deaths"     
## [33] "Suicide_deaths"                     
## [34] "Conflict_deaths"                    
## [35] "Diabetes_deaths"                    
## [36] "Poisonings_deaths"

Step 4. Select only the two variables of interest

data_tbc1 <- subset(Data_Aplicacion3, select = c("HIV_AIDS_deaths", "Tuberculosis_deaths"))

Step 5. Now let's check for outliers and for multivariate normality.

library(MVN)
result1 <- mvn(data = data_tbc1, mvnTest = "mardia",
               multivariatePlot = "qqplot",
               multivariateOutlierMethod = "adj",
               showOutliers = TRUE, showNewData = TRUE)

result1$multivariateOutliers
##    Observation Mahalanobis Distance Outlier
## 1            1              119.198    TRUE
## 2            2              117.797    TRUE
## 3            3              117.743    TRUE
## 4            4               98.085    TRUE
## 5            5               97.588    TRUE
## 6            6               86.892    TRUE
## 7            7               63.437    TRUE
## 8            8               44.304    TRUE
## 9            9               27.107    TRUE
## 10          10               12.178    TRUE
result1$multivariateNormality
##              Test         Statistic            p value Result
## 1 Mardia Skewness  10.9876818574254 0.0267028026625831     NO
## 2 Mardia Kurtosis -1.07193018553917  0.283751390949153    YES
## 3             MVN              <NA>               <NA>     NO
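To get a feel for what a Mahalanobis distance table like the one above contains, here is a minimal base-R sketch. It uses the classical sample mean and covariance with a fixed chi-square cutoff on a small made-up data frame (`toy` is hypothetical, not the study data). The "adj" option in MVN applies an adjusted quantile cutoff, so its distances and flags need not match what this simplified version would give.

```r
# Hypothetical toy data with one extreme pair (not the study data)
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 40),
  y = c(2, 4, 5, 4, 6, 7, 8, 9, 10, 1)
)

# Squared Mahalanobis distance of each row from the sample mean,
# using the classical (non-robust) sample covariance matrix
d2 <- mahalanobis(toy, center = colMeans(toy), cov = cov(toy))

# Classical cutoff: 97.5% quantile of a chi-square distribution
# with as many degrees of freedom as there are variables
cutoff <- qchisq(0.975, df = ncol(toy))

# Summary table: one row per observation
data.frame(observation = seq_len(nrow(toy)),
           distance = round(d2, 3),
           outlier = d2 > cutoff)
```

Sorting this table by `distance` in decreasing order gives a layout comparable to the MVN output.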

Step 6. Let's look at a scatter plot of the two variables with a fitted regression line

library(ggplot2)
ggplot(data = data_tbc1, aes(x = HIV_AIDS_deaths, y=Tuberculosis_deaths)) + 
  geom_point() +
  geom_smooth(method = "lm", color = "black")

Step 7. Compute the Pearson coefficient

r1 <- cor(data_tbc1, method = "pearson")
r1 <- round(r1, 2) # round to two decimals
r1
##                     HIV_AIDS_deaths Tuberculosis_deaths
## HIV_AIDS_deaths                1.00               -0.92
## Tuberculosis_deaths           -0.92                1.00
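As a reminder of what `cor()` computes here, the Pearson coefficient is the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations. The sketch below checks this by hand on two short made-up vectors (`x` and `y` are hypothetical, not the study data).

```r
# Hypothetical toy vectors (not the study data)
x <- c(2, 4, 6, 8, 10)
y <- c(9, 7, 6, 3, 1)

# Deviations from the mean
dx <- x - mean(x)
dy <- y - mean(y)

# Pearson r: cross-products over the root of the product of squared deviations
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# The manual value should agree with the built-in function
all.equal(r_manual, cor(x, y, method = "pearson"))
```

Because every observation enters the sums with its full deviation, a few extreme pairs can dominate the result, which is the motivation for a robust alternative.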

Step 8. Compute the Winsorized Pearson coefficient

library(WRS2)
rw1 <- winall(data_tbc1, tr = 0.20) # tr = amount of Winsorization
rw1 <- round(rw1$cor, 2) # round to two decimals
rw1
##                     HIV_AIDS_deaths Tuberculosis_deaths
## HIV_AIDS_deaths                1.00               -0.96
## Tuberculosis_deaths           -0.96                1.00
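The idea behind a Winsorized correlation can be sketched in base R: in each variable, the most extreme values in both tails are replaced by the nearest retained value, and the ordinary Pearson coefficient is then computed on the modified data. The helper `winsorize()` and the toy vectors below are hypothetical and only illustrate the principle. This is not the WRS2 implementation, and small differences from `winall()` are possible.

```r
# Hypothetical helper: pull the lowest and highest `tr` fraction of values
# in to the nearest retained order statistic
winsorize <- function(x, tr = 0.20) {
  n <- length(x)
  g <- floor(tr * n)          # number of values replaced in each tail
  sorted <- sort(x)
  lower <- sorted[g + 1]      # smallest retained value
  upper <- sorted[n - g]      # largest retained value
  pmin(pmax(x, lower), upper)
}

# Hypothetical toy data with one extreme pair (not the study data)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
y <- c(10, 9, 8, 7, 6, 5, 4, 3, 2, 50)

r_plain <- cor(x, y)                        # ordinary Pearson
r_wins  <- cor(winsorize(x), winsorize(y))  # Pearson on Winsorized values

c(pearson = r_plain, winsorized = r_wins)
```

In this toy example the single extreme pair makes the ordinary coefficient positive, while the Winsorized version recovers the negative trend of the remaining points.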

Step 9. Difference between the two correlations

r1-rw1
##                     HIV_AIDS_deaths Tuberculosis_deaths
## HIV_AIDS_deaths                0.00                0.04
## Tuberculosis_deaths            0.04                0.00