Practica Correlacion y regresion
Variable ordinal- Categorias
Variable intervalos- Numerica
Ambas variables deben ser de una escala numerica y tener una distribucion normal.
Para visualizar un plot de correlacion tengo el comando pairs(x,y ). Para analizar este plot se debe observar que entre mas alto sea el valor de r mas va a tener una tendencia a verse eliptico, y entre mas se acerque la correlacion a 1 o -1 va a ser mas lineal.
peso <- c(51, 59, 49, 54, 50, 55, 48, 53, 52, 57)
long <- c(33.5, 38, 32, 37.5, 31.5, 33, 31, 36.5, 34, 35)
pairs(long ~ peso)
Otra forma de analizar la correlacion es realizar un otro plot que se llama chart.Correlation. En este plot puedo visualizar el nivel de significancia de la correlacion con los asteriscos, entre mas salen mas significativa es.
** Pasos para realizar este plot**
1- Activar el paquete de datos Performance Analytics
2- Hacer un data frame de las variables y darle un nombre
3-aplicar el comando chart.Correlation y visualizar el plot
## Warning: package 'PerformanceAnalytics' was built under R version 3.4.2
## Loading required package: xts
## Warning: package 'xts' was built under R version 3.4.2
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.4.2
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
##
## legend
## Warning in plot.window(...): "method" is not a graphical parameter
## Warning in plot.window(...): "exact" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "method" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "exact" is not a graphical parameter
## Warning in title(...): "method" is not a graphical parameter
## Warning in title(...): "exact" is not a graphical parameter
## Warning in plot.window(...): "method" is not a graphical parameter
## Warning in plot.window(...): "exact" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "method" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "exact" is not a graphical parameter
## Warning in title(...): "method" is not a graphical parameter
## Warning in title(...): "exact" is not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "method" is
## not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "exact" is not
## a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "method" is
## not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "exact" is not
## a graphical parameter
## Warning in plot.window(...): "method" is not a graphical parameter
## Warning in plot.window(...): "exact" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "method" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "exact" is not a graphical parameter
## Warning in title(...): "method" is not a graphical parameter
## Warning in title(...): "exact" is not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "method" is
## not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "exact" is not
## a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "method" is
## not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "exact" is not
## a graphical parameter
## Warning in plot.xy(xy.coords(x, y), type = type, ...): "method" is not a
## graphical parameter
## Warning in plot.xy(xy.coords(x, y), type = type, ...): "exact" is not a
## graphical parameter
## Warning in plot.window(...): "method" is not a graphical parameter
## Warning in plot.window(...): "exact" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "method" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "exact" is not a graphical parameter
## Warning in title(...): "method" is not a graphical parameter
## Warning in title(...): "exact" is not a graphical parameter
luego de esto se debe revisar la correlacion y la significancia de esto son los siguientes comandos
cor(peso, long) #me da
## [1] 0.7794691
cor.test(peso, long)#Significancia y debe de llevar las dos variables a las que queremos analizar
##
## Pearson's product-moment correlation
##
## data: peso and long
## t = 3.5194, df = 8, p-value = 0.007853
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2942560 0.9452104
## sample estimates:
## cor
## 0.7794691
En este caso las dos variables son normales por lo tnto se utiliza el metodo de Pearson, si las variables fueran no normales se deberia de usar el metodo no parametrico, que puede ser el de spearman o kendall esto lo debemos de indicar
cor(x, y,method = c(“pearson”, “kendall”, “spearman”))
r2 es el coeficiente de determinacion que se hace elevando al cuadrado r (el % de correlacion) es en base a 10,entonces esto nos da como resultado el % que si esta relacionado y lo otro que hace falta para llegar a 10 es el % que no tiene relacion
0.7794691^2 #r2 determina el & de aceptacion, es en base a 10 por lo que le resta es el % que no aceptacion
## [1] 0.6075721
Practica de la guia
data(women)
women
## height weight
## 1 58 115
## 2 59 117
## 3 60 120
## 4 61 123
## 5 62 126
## 6 63 129
## 7 64 132
## 8 65 135
## 9 66 139
## 10 67 142
## 11 68 146
## 12 69 150
## 13 70 154
## 14 71 159
## 15 72 164
Observamos que la base de datos tiene dos variables, las cuales debemos revisar si son normales para determinar el metodo ya sea parametrico(pearson) o no parametrico (kendall/Spearman)
shapiro.test(women$height)#normal
##
## Shapiro-Wilk normality test
##
## data: women$height
## W = 0.96359, p-value = 0.7545
shapiro.test(women$weight)#normal
##
## Shapiro-Wilk normality test
##
## data: women$weight
## W = 0.96036, p-value = 0.6986
cor(women,method = "p") #coeficiente de correlacion
## height weight
## height 1.0000000 0.9954948
## weight 0.9954948 1.0000000
cor(women$height,women$weight)
## [1] 0.9954948
cor.test(women$height,women$weight)# coeficiente de determinancion
##
## Pearson's product-moment correlation
##
## data: women$height and women$weight
## t = 37.855, df = 13, p-value = 1.091e-14
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9860970 0.9985447
## sample estimates:
## cor
## 0.9954948
podemos concluir que la el peso esta 99% correlacionado con la altura y lo corroboramos con el plot de pairs que al ser una correlacion de 99 deberia ser bien recta
pairs(women$height~women$weight)
Ya que vimos el plot y corroboramos lo anterior podemos decir que es una correlacion positiva y fuerte tambien podemos hacerlo con el otro plot que se llama chart.Correlation donde tambien podemos ver que es una correlacion positiva y fuerte- fijarse en los asteriscos
library(PerformanceAnalytics)
dfw<- data.frame(women$height,women$weight)
chart.Correlation(dfw)
## Warning in plot.window(...): "method" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "method" is not a graphical parameter
## Warning in title(...): "method" is not a graphical parameter
## Warning in plot.window(...): "method" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "method" is not a graphical parameter
## Warning in title(...): "method" is not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "method" is
## not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "method" is
## not a graphical parameter
## Warning in plot.window(...): "method" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "method" is not a graphical parameter
## Warning in title(...): "method" is not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "method" is
## not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "method" is
## not a graphical parameter
## Warning in plot.xy(xy.coords(x, y), type = type, ...): "method" is not a
## graphical parameter
## Warning in plot.window(...): "method" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "method" is not a graphical parameter
## Warning in title(...): "method" is not a graphical parameter
Debo de tener dos variables, una dependiente -> y y otra independiente -> x
La relacion entre ambas variables debe ser aproximadamente lineal
los residuos deben ser normales, independiente y con varianza cte
peso <- c(61, 60, 78, 62, 66, 60, 54, 84, 68) #el peso depende de la altura
altura <- c(162, 154, 180, 158, 171, 169, 166, 176, 163) #la altura no se ve afectada por el peso
regresion<-lm(peso~altura)
regresion
##
## Call:
## lm(formula = peso ~ altura)
##
## Coefficients:
## (Intercept) altura
## -67.4679 0.8007
summary(regresion)
##
## Call:
## lm(formula = peso ~ altura)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.444 -3.447 1.347 4.164 10.549
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -67.4679 51.1995 -1.318 0.229
## altura 0.8007 0.3071 2.608 0.035 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.268 on 7 degrees of freedom
## Multiple R-squared: 0.4927, Adjusted R-squared: 0.4203
## F-statistic: 6.799 on 1 and 7 DF, p-value: 0.03504
Luego de que revise el summary debo interpretar los siguientes valores.
P-value = Significancia ->p-value: 0.03504
Intercepto -> -67.4679
pendiente=valor abajo del intercepto -> 0.8007
Grados de libertad= df -> 7
con estos valores ya puedo crear la ecuacion de la recta y(v.indep) =b(intercepto) + m(pendiente)* x(v.dependiente) en este caso
peso= -67.4679 + 0.8007 * altura
Ahora se debe proceder a revisar los supuestos
shapiro.test(resid(regresion))
##
## Shapiro-Wilk normality test
##
## data: resid(regresion)
## W = 0.97706, p-value = 0.9475
library(car)
## Warning: package 'car' was built under R version 3.4.2
ncvTest(regresion)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 0.3545291 Df = 1 p = 0.5515605
Para una r. multiple se debe de tener una base de datos que contenga mas de dos variables independientes.
library(car)
data(Prestige)
Prestige
## education income women prestige census type
## gov.administrators 13.11 12351 11.16 68.8 1113 prof
## general.managers 12.26 25879 4.02 69.1 1130 prof
## accountants 12.77 9271 15.70 63.4 1171 prof
## purchasing.officers 11.42 8865 9.11 56.8 1175 prof
## chemists 14.62 8403 11.68 73.5 2111 prof
## physicists 15.64 11030 5.13 77.6 2113 prof
## biologists 15.09 8258 25.65 72.6 2133 prof
## architects 15.44 14163 2.69 78.1 2141 prof
## civil.engineers 14.52 11377 1.03 73.1 2143 prof
## mining.engineers 14.64 11023 0.94 68.8 2153 prof
## surveyors 12.39 5902 1.91 62.0 2161 prof
## draughtsmen 12.30 7059 7.83 60.0 2163 prof
## computer.programers 13.83 8425 15.33 53.8 2183 prof
## economists 14.44 8049 57.31 62.2 2311 prof
## psychologists 14.36 7405 48.28 74.9 2315 prof
## social.workers 14.21 6336 54.77 55.1 2331 prof
## lawyers 15.77 19263 5.13 82.3 2343 prof
## librarians 14.15 6112 77.10 58.1 2351 prof
## vocational.counsellors 15.22 9593 34.89 58.3 2391 prof
## ministers 14.50 4686 4.14 72.8 2511 prof
## university.teachers 15.97 12480 19.59 84.6 2711 prof
## primary.school.teachers 13.62 5648 83.78 59.6 2731 prof
## secondary.school.teachers 15.08 8034 46.80 66.1 2733 prof
## physicians 15.96 25308 10.56 87.2 3111 prof
## veterinarians 15.94 14558 4.32 66.7 3115 prof
## osteopaths.chiropractors 14.71 17498 6.91 68.4 3117 prof
## nurses 12.46 4614 96.12 64.7 3131 prof
## nursing.aides 9.45 3485 76.14 34.9 3135 bc
## physio.therapsts 13.62 5092 82.66 72.1 3137 prof
## pharmacists 15.21 10432 24.71 69.3 3151 prof
## medical.technicians 12.79 5180 76.04 67.5 3156 wc
## commercial.artists 11.09 6197 21.03 57.2 3314 prof
## radio.tv.announcers 12.71 7562 11.15 57.6 3337 wc
## athletes 11.44 8206 8.13 54.1 3373 <NA>
## secretaries 11.59 4036 97.51 46.0 4111 wc
## typists 11.49 3148 95.97 41.9 4113 wc
## bookkeepers 11.32 4348 68.24 49.4 4131 wc
## tellers.cashiers 10.64 2448 91.76 42.3 4133 wc
## computer.operators 11.36 4330 75.92 47.7 4143 wc
## shipping.clerks 9.17 4761 11.37 30.9 4153 wc
## file.clerks 12.09 3016 83.19 32.7 4161 wc
## receptionsts 11.04 2901 92.86 38.7 4171 wc
## mail.carriers 9.22 5511 7.62 36.1 4172 wc
## postal.clerks 10.07 3739 52.27 37.2 4173 wc
## telephone.operators 10.51 3161 96.14 38.1 4175 wc
## collectors 11.20 4741 47.06 29.4 4191 wc
## claim.adjustors 11.13 5052 56.10 51.1 4192 wc
## travel.clerks 11.43 6259 39.17 35.7 4193 wc
## office.clerks 11.00 4075 63.23 35.6 4197 wc
## sales.supervisors 9.84 7482 17.04 41.5 5130 wc
## commercial.travellers 11.13 8780 3.16 40.2 5133 wc
## sales.clerks 10.05 2594 67.82 26.5 5137 wc
## newsboys 9.62 918 7.00 14.8 5143 <NA>
## service.station.attendant 9.93 2370 3.69 23.3 5145 bc
## insurance.agents 11.60 8131 13.09 47.3 5171 wc
## real.estate.salesmen 11.09 6992 24.44 47.1 5172 wc
## buyers 11.03 7956 23.88 51.1 5191 wc
## firefighters 9.47 8895 0.00 43.5 6111 bc
## policemen 10.93 8891 1.65 51.6 6112 bc
## cooks 7.74 3116 52.00 29.7 6121 bc
## bartenders 8.50 3930 15.51 20.2 6123 bc
## funeral.directors 10.57 7869 6.01 54.9 6141 bc
## babysitters 9.46 611 96.53 25.9 6147 <NA>
## launderers 7.33 3000 69.31 20.8 6162 bc
## janitors 7.11 3472 33.57 17.3 6191 bc
## elevator.operators 7.58 3582 30.08 20.1 6193 bc
## farmers 6.84 3643 3.60 44.1 7112 <NA>
## farm.workers 8.60 1656 27.75 21.5 7182 bc
## rotary.well.drillers 8.88 6860 0.00 35.3 7711 bc
## bakers 7.54 4199 33.30 38.9 8213 bc
## slaughterers.1 7.64 5134 17.26 25.2 8215 bc
## slaughterers.2 7.64 5134 17.26 34.8 8215 bc
## canners 7.42 1890 72.24 23.2 8221 bc
## textile.weavers 6.69 4443 31.36 33.3 8267 bc
## textile.labourers 6.74 3485 39.48 28.8 8278 bc
## tool.die.makers 10.09 8043 1.50 42.5 8311 bc
## machinists 8.81 6686 4.28 44.2 8313 bc
## sheet.metal.workers 8.40 6565 2.30 35.9 8333 bc
## welders 7.92 6477 5.17 41.8 8335 bc
## auto.workers 8.43 5811 13.62 35.9 8513 bc
## aircraft.workers 8.78 6573 5.78 43.7 8515 bc
## electronic.workers 8.76 3942 74.54 50.8 8534 bc
## radio.tv.repairmen 10.29 5449 2.92 37.2 8537 bc
## sewing.mach.operators 6.38 2847 90.67 28.2 8563 bc
## auto.repairmen 8.10 5795 0.81 38.1 8581 bc
## aircraft.repairmen 10.10 7716 0.78 50.3 8582 bc
## railway.sectionmen 6.67 4696 0.00 27.3 8715 bc
## electrical.linemen 9.05 8316 1.34 40.9 8731 bc
## electricians 9.93 7147 0.99 50.2 8733 bc
## construction.foremen 8.24 8880 0.65 51.1 8780 bc
## carpenters 6.92 5299 0.56 38.9 8781 bc
## masons 6.60 5959 0.52 36.2 8782 bc
## house.painters 7.81 4549 2.46 29.9 8785 bc
## plumbers 8.33 6928 0.61 42.9 8791 bc
## construction.labourers 7.52 3910 1.09 26.5 8798 bc
## pilots 12.27 14032 0.58 66.1 9111 prof
## train.engineers 8.49 8845 0.00 48.9 9131 bc
## bus.drivers 7.58 5562 9.47 35.9 9171 bc
## taxi.drivers 7.93 4224 3.59 25.1 9173 bc
## longshoremen 8.37 4753 0.00 26.1 9313 bc
## typesetters 10.00 6462 13.58 42.2 9511 bc
## bookbinders 8.55 3617 70.87 35.2 9517 bc
head(Prestige) #encabezado
## education income women prestige census type
## gov.administrators 13.11 12351 11.16 68.8 1113 prof
## general.managers 12.26 25879 4.02 69.1 1130 prof
## accountants 12.77 9271 15.70 63.4 1171 prof
## purchasing.officers 11.42 8865 9.11 56.8 1175 prof
## chemists 14.62 8403 11.68 73.5 2111 prof
## physicists 15.64 11030 5.13 77.6 2113 prof
names(Prestige) #titulos de las variables
## [1] "education" "income" "women" "prestige" "census" "type"
Observamos la base de datos y realizamos una regresion multiple con variables independientes
regmult1<-lm(prestige~education+women, data = Prestige)
regmult1
##
## Call:
## lm(formula = prestige ~ education + women, data = Prestige)
##
## Coefficients:
## (Intercept) education women
## -8.75416 5.42780 -0.09305
summary(regmult1)
##
## Call:
## lm(formula = prestige ~ education + women, data = Prestige)
##
## Residuals:
## Min 1Q Median 3Q Max
## -28.010 -4.069 1.050 5.027 18.942
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.75416 3.54213 -2.471 0.015164 *
## education 5.42780 0.31612 17.170 < 2e-16 ***
## women -0.09305 0.02719 -3.422 0.000904 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.652 on 99 degrees of freedom
## Multiple R-squared: 0.7521, Adjusted R-squared: 0.7471
## F-statistic: 150.2 on 2 and 99 DF, p-value: < 2.2e-16
shapiro.test(residuals(regmult1)) #normalidad de residuos
##
## Shapiro-Wilk normality test
##
## data: residuals(regmult1)
## W = 0.98367, p-value = 0.2419
ncvTest(regmult1) #homocedasticidad
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 1.068151 Df = 1 p = 0.3013634
ahora hacemos otro modelo de regresion, esta vez le ponemos otra variable mas
regmult2<-lm(prestige~education+women+income,data=Prestige)
regmult2
##
## Call:
## lm(formula = prestige ~ education + women + income, data = Prestige)
##
## Coefficients:
## (Intercept) education women income
## -6.794334 4.186637 -0.008905 0.001314
summary(regmult2)
##
## Call:
## lm(formula = prestige ~ education + women + income, data = Prestige)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.8246 -5.3332 -0.1364 5.1587 17.5045
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.7943342 3.2390886 -2.098 0.0385 *
## education 4.1866373 0.3887013 10.771 < 2e-16 ***
## women -0.0089052 0.0304071 -0.293 0.7702
## income 0.0013136 0.0002778 4.729 7.58e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.846 on 98 degrees of freedom
## Multiple R-squared: 0.7982, Adjusted R-squared: 0.792
## F-statistic: 129.2 on 3 and 98 DF, p-value: < 2.2e-16
shapiro.test(residuals(regmult2))
##
## Shapiro-Wilk normality test
##
## data: residuals(regmult2)
## W = 0.99363, p-value = 0.918
ncvTest(regmult2)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 0.2490143 Df = 1 p = 0.61777
para determinar cual modelo es mejor hago el test de akaike, el mejor modelo va a ser el que
AIC(regmult1,regmult2)
## df AIC
## regmult1 4 734.5998
## regmult2 5 715.6358
concluimos que la regresion 1 es un mejor modelo de regresion
frecuencia<-c(15,19)
frecuencia
## [1] 15 19
chisq.test(frecuencia) #obtengo el valor de x2,los df y el p-value
##
## Chi-squared test for given probabilities
##
## data: frecuencia
## X-squared = 0.47059, df = 1, p-value = 0.4927
qchisq(0.95,1) #me da el valor critico tabular
## [1] 3.841459
chisq.test(frecuencia)$expected # me da la frecuencia esperada
## [1] 17 17
capt<-c(12,4)
chisq.test(capt)
##
## Chi-squared test for given probabilities
##
## data: capt
## X-squared = 4, df = 1, p-value = 0.0455
qchisq(capt,1)
## Warning in qchisq(capt, 1): NaNs produced
## [1] NaN NaN
Debo de comparar el x2 con el valor tabular tambien debo de tener en cuanta el % si es al 98,95 o el que me indiquen. si el valor de pvalue
matris<-matrix(c(50,40,25,45),nrow = 2,dimnames = list(c("fem", "masc"),c("si","no")))
matris
## si no
## fem 50 25
## masc 40 45
chisq.test(matris)$expected
## si no
## fem 42.1875 32.8125
## masc 47.8125 37.1875
prop.table(matris)
## si no
## fem 0.3125 0.15625
## masc 0.2500 0.28125
qchisq(0.95,1) #el problema me indica si es al 99% 98% o 95%
## [1] 3.841459
chisq.test(matris)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: matris
## X-squared = 5.4534, df = 1, p-value = 0.01953
library(car)
data("Chile")
str(Chile)
## 'data.frame': 2700 obs. of 8 variables:
## $ region : Factor w/ 5 levels "C","M","N","S",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ population: int 175000 175000 175000 175000 175000 175000 175000 175000 175000 175000 ...
## $ sex : Factor w/ 2 levels "F","M": 2 2 1 1 1 1 2 1 1 2 ...
## $ age : int 65 29 38 49 23 28 26 24 41 41 ...
## $ education : Factor w/ 3 levels "P","PS","S": 1 2 1 1 3 1 2 3 1 1 ...
## $ income : int 35000 7500 15000 35000 35000 7500 35000 15000 15000 15000 ...
## $ statusquo : num 1.01 -1.3 1.23 -1.03 -1.1 ...
## $ vote : Factor w/ 4 levels "A","N","U","Y": 4 2 4 2 2 2 2 2 3 2 ...
alex<-table(Chile$sex,Chile$region)
chisq.test(alex)
##
## Pearson's Chi-squared test
##
## data: alex
## X-squared = 0.80328, df = 4, p-value = 0.938
qchisq(0.95,4)
## [1] 9.487729
chisq.test(alex)$expected
##
## C M N S SA
## F 306.4444 51.07407 164.4585 366.7119 490.3111
## M 293.5556 48.92593 157.5415 351.2881 469.6889
tabla<-table(Chile$sex,Chile$vote)
chisq.test(tabla)
##
## Pearson's Chi-squared test
##
## data: tabla
## X-squared = 70.612, df = 3, p-value = 3.156e-15
qchisq(0.95,3)
## [1] 7.814728
#si el valor calculado es mayor que el tabulado acepto la hip nula, en este caso la acepto xq x2=70.612 y el tabulado es 7.814728 por lo cual se determina que si hay diferencia
tabla2<-table(Chile$sex,Chile$education)
tabla2
##
## P PS S
## F 607 199 566
## M 500 263 554
chisq.test(tabla2)$expected #lo hago de primero para sabeer si debo hacer fisher en el caso que me de que hayan valores cero
##
## P PS S
## F 564.8211 235.7248 571.4541
## M 542.1789 226.2752 548.5459
chisq.test(tabla2)
##
## Pearson's Chi-squared test
##
## data: tabla2
## X-squared = 18.219, df = 2, p-value = 0.0001106
qchisq(0.95,2)
## [1] 5.991465
#luego de ver que el valor calculado es mayor que el tabulado establesco que si hay una diferencia establecida
tabla3<-table(Chile$age,Chile$education)
tabla3
##
## P PS S
## 18 11 9 70
## 19 11 13 52
## 20 19 17 42
## 21 19 22 55
## 22 23 23 46
## 23 15 17 43
## 24 20 22 39
## 25 24 18 37
## 26 15 17 31
## 27 20 18 33
## 28 18 16 34
## 29 18 11 31
## 30 20 7 46
## 31 11 10 22
## 32 14 12 31
## 33 20 11 25
## 34 17 12 22
## 35 23 14 20
## 36 25 25 45
## 37 32 17 29
## 38 25 14 26
## 39 29 9 17
## 40 29 18 21
## 41 15 8 11
## 42 21 14 26
## 43 29 5 12
## 44 27 10 13
## 45 35 8 19
## 46 14 4 14
## 47 19 3 14
## 48 28 3 12
## 49 27 4 7
## 50 24 2 13
## 51 14 3 9
## 52 19 2 7
## 53 20 8 8
## 54 24 2 8
## 55 22 3 11
## 56 16 2 6
## 57 21 3 10
## 58 27 3 12
## 59 27 3 12
## 60 36 2 12
## 61 11 1 2
## 62 18 1 8
## 63 14 1 6
## 64 20 1 5
## 65 24 5 9
## 66 20 1 3
## 67 15 3 5
## 68 17 0 6
## 69 8 1 7
## 70 36 4 16
chisq.test(tabla3)$expected
## Warning in chisq.test(tabla3): Chi-squared approximation may be incorrect
##
## P PS S
## 18 37.031250 15.468750 37.500000
## 19 31.270833 13.062500 31.666667
## 20 32.093750 13.406250 32.500000
## 21 39.500000 16.500000 40.000000
## 22 37.854167 15.812500 38.333333
## 23 30.859375 12.890625 31.250000
## 24 33.328125 13.921875 33.750000
## 25 32.505208 13.578125 32.916667
## 26 25.921875 10.828125 26.250000
## 27 29.213542 12.203125 29.583333
## 28 27.979167 11.687500 28.333333
## 29 24.687500 10.312500 25.000000
## 30 30.036458 12.546875 30.416667
## 31 17.692708 7.390625 17.916667
## 32 23.453125 9.796875 23.750000
## 33 23.041667 9.625000 23.333333
## 34 20.984375 8.765625 21.250000
## 35 23.453125 9.796875 23.750000
## 36 39.088542 16.328125 39.583333
## 37 32.093750 13.406250 32.500000
## 38 26.744792 11.171875 27.083333
## 39 22.630208 9.453125 22.916667
## 40 27.979167 11.687500 28.333333
## 41 13.989583 5.843750 14.166667
## 42 25.098958 10.484375 25.416667
## 43 18.927083 7.906250 19.166667
## 44 20.572917 8.593750 20.833333
## 45 25.510417 10.656250 25.833333
## 46 13.166667 5.500000 13.333333
## 47 14.812500 6.187500 15.000000
## 48 17.692708 7.390625 17.916667
## 49 15.635417 6.531250 15.833333
## 50 16.046875 6.703125 16.250000
## 51 10.697917 4.468750 10.833333
## 52 11.520833 4.812500 11.666667
## 53 14.812500 6.187500 15.000000
## 54 13.989583 5.843750 14.166667
## 55 14.812500 6.187500 15.000000
## 56 9.875000 4.125000 10.000000
## 57 13.989583 5.843750 14.166667
## 58 17.281250 7.218750 17.500000
## 59 17.281250 7.218750 17.500000
## 60 20.572917 8.593750 20.833333
## 61 5.760417 2.406250 5.833333
## 62 11.109375 4.640625 11.250000
## 63 8.640625 3.609375 8.750000
## 64 10.697917 4.468750 10.833333
## 65 15.635417 6.531250 15.833333
## 66 9.875000 4.125000 10.000000
## 67 9.463542 3.953125 9.583333
## 68 9.463542 3.953125 9.583333
## 69 6.583333 2.750000 6.666667
## 70 23.041667 9.625000 23.333333
library(DAAG)
## Warning: package 'DAAG' was built under R version 3.4.4
## Loading required package: lattice
##
## Attaching package: 'DAAG'
## The following object is masked from 'package:car':
##
## vif
data(dengue)
shapiro.test(dengue$humid)
##
## Shapiro-Wilk normality test
##
## data: dengue$humid
## W = 0.93811, p-value < 2.2e-16
shapiro.test(dengue$temp)
##
## Shapiro-Wilk normality test
##
## data: dengue$temp
## W = 0.91009, p-value < 2.2e-16
cor.test(dengue$humid,dengue$temp, method = "s")
## Warning in cor.test.default(dengue$humid, dengue$temp, method = "s"):
## Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: dengue$humid and dengue$temp
## S = 177920000, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.8661581
cor(dengue$humid,dengue$temp,method = "s")
## [1] NA
0.8661581*0.8661581 #con el coef de determinacion determino el % de explicacion de las variables que estan siendo relacionadass, en este caso el 75%
## [1] 0.7502299
shapiro.test(dengue$trees)
##
## Shapiro-Wilk normality test
##
## data: dengue$trees
## W = 0.85452, p-value < 2.2e-16
shapiro.test(dengue$temp)
##
## Shapiro-Wilk normality test
##
## data: dengue$temp
## W = 0.91009, p-value < 2.2e-16
cor.test(dengue$trees,dengue$temp, method = "s") #pvalue mayor a 0.05, no es significativo
## Warning in cor.test.default(dengue$trees, dengue$temp, method = "s"):
## Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: dengue$trees and dengue$temp
## S = 1252200000, p-value = 0.06868
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.04086041
0.04086041*0.04086041 #es una correlacion positiva pero muy debil
## [1] 0.001669573
data("codling")
str(codling)
## 'data.frame': 99 obs. of 10 variables:
## $ dose : num 5 8 12 16 20 24 5 8 12 16 ...
## $ tot : num 866 911 906 712 582 ...
## $ dead : num 246 220 360 271 414 742 154 168 180 240 ...
## $ pobs : num 0.284 0.241 0.397 0.381 0.711 ...
## $ cm : num 0.218 0.218 0.218 0.218 0.218 ...
## $ ct : num 15.6 20.3 28.6 32.7 45.4 ...
## $ Cultivar: chr "ROYAL" "ROYAL" "ROYAL" "ROYAL" ...
## $ gp : num 1 1 1 1 1 1 2 2 2 2 ...
## $ year : Factor w/ 2 levels "1988","1989": 1 1 1 1 1 1 1 1 1 1 ...
## $ numcm : num 1676 1676 1676 1676 1676 ...
shapiro.test(codling$dead)
##
## Shapiro-Wilk normality test
##
## data: codling$dead
## W = 0.76395, p-value = 2.584e-11
shapiro.test(codling$ct)
##
## Shapiro-Wilk normality test
##
## data: codling$ct
## W = 0.96658, p-value = 0.01281
cor.test(codling$dead,codling$ct,method = "s") #correlacion positiva pero no tan fuerte
## Warning in cor.test.default(codling$dead, codling$ct, method = "s"): Cannot
## compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: codling$dead and codling$ct
## S = 48861, p-value = 1.003e-15
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.6978299
0.6978299*0.6978299 #porcentaje explicdo es de 48% , si me preg % lo pongo en base a 100 si solo me piden el coef de det pongo el rho
## [1] 0.4869666
Regresion
data("fossum")
fossum
## case site Pop sex age hdlngth skullw totlngth taill footlgth
## C5 2 1 Vic f 6 92.5 57.6 91.5 36.5 72.5
## C10 3 1 Vic f 6 94.0 60.0 95.5 39.0 75.4
## C15 4 1 Vic f 6 93.2 57.1 92.0 38.0 76.1
## C23 5 1 Vic f 2 91.5 56.3 85.5 36.0 71.0
## C24 6 1 Vic f 1 93.1 54.8 90.5 35.5 73.2
## C27 8 1 Vic f 6 94.8 57.6 91.0 37.0 72.7
## C28 9 1 Vic f 9 93.4 56.3 91.5 37.0 72.4
## C31 10 1 Vic f 6 91.8 58.0 89.5 37.5 70.9
## C32 11 1 Vic f 9 93.3 57.2 89.5 39.0 77.2
## C34 12 1 Vic f 5 94.9 55.6 92.0 35.5 71.7
## C45 17 1 Vic f 1 94.7 67.7 89.5 36.5 73.2
## C48 19 1 Vic f 5 94.4 55.4 90.5 35.0 73.4
## C50 20 1 Vic f 4 94.8 56.3 89.0 38.0 73.8
## C54 21 1 Vic f 3 95.9 58.1 96.5 39.5 77.9
## C58 23 1 Vic f 4 92.5 56.1 89.0 36.0 72.8
## C63 27 1 Vic f 2 90.5 54.5 85.0 35.0 70.3
## A1 29 1 Vic f 3 92.8 56.0 88.0 35.0 74.9
## A2 30 1 Vic f 2 92.1 54.4 84.0 33.5 70.6
## A4 32 1 Vic f 4 94.3 56.7 94.0 39.0 74.8
## BB17 37 2 Vic f 2 89.3 54.8 82.5 35.0 71.2
## BB31 39 2 Vic f 1 84.7 51.5 75.0 34.0 68.7
## BB33 40 2 Vic f 3 91.0 55.0 84.5 36.0 72.8
## BB36 41 2 Vic f 5 88.4 57.0 83.0 36.5 NA
## BB40 43 2 Vic f 2 90.0 55.5 81.0 32.0 72.0
## WW4 50 3 other f 5 91.6 56.4 88.0 38.0 65.0
## WW5 51 3 other f 5 95.6 59.6 85.0 36.0 64.0
## WW7 53 3 other f 3 93.1 58.1 91.0 38.0 67.4
## BR4 57 4 other f 4 95.1 59.4 93.0 41.0 67.2
## BR7 60 4 other f 2 91.3 57.7 88.0 39.0 63.1
## CD2 62 5 other f 3 91.3 58.0 90.5 39.0 65.5
## CD3 63 5 other f 6 92.0 56.4 88.5 38.0 64.1
## CD4 64 5 other f 3 96.9 56.5 89.5 38.5 63.0
## CD5 65 5 other f 5 93.5 57.4 88.5 38.0 68.2
## CD6 66 5 other f 3 90.4 55.8 86.0 36.5 63.2
## CD10 70 5 other f 7 91.9 56.4 87.0 38.0 65.4
## BSF1 74 6 other f 4 88.7 52.0 83.0 38.0 61.5
## BSF9 82 6 other f 4 86.0 54.0 82.0 36.5 60.7
## BSF10 83 6 other f 3 90.0 53.8 81.5 36.0 62.0
## BSF13 86 6 other f 3 88.2 53.2 86.5 38.5 60.3
## BTP3 88 7 other f 2 89.6 58.0 87.5 38.0 66.7
## BTP15 99 7 other f 3 93.3 56.2 86.5 38.5 64.8
## BTP19 102 7 other f 6 92.4 55.0 89.0 38.0 63.5
## BTP21 104 7 other f 3 93.6 59.9 89.0 40.0 67.6
## earconch eye chest belly
## C5 51.2 16.0 28.5 33.0
## C10 51.9 15.5 30.0 34.0
## C15 52.2 15.2 28.0 34.0
## C23 53.2 15.1 28.5 33.0
## C24 53.6 14.2 30.0 32.0
## C27 53.9 14.5 29.0 34.0
## C28 52.9 15.5 28.0 33.0
## C31 53.4 14.4 27.5 32.0
## C32 51.3 14.9 31.0 34.0
## C34 51.0 15.3 28.0 33.0
## C45 53.2 14.7 29.0 31.0
## C48 53.9 15.2 28.0 32.0
## C50 52.4 15.5 27.0 36.0
## C54 52.9 14.2 30.0 40.0
## C58 53.3 15.4 28.0 35.0
## C63 50.8 14.2 23.0 28.0
## A1 51.8 14.0 24.0 32.0
## A2 50.8 14.5 24.5 33.0
## A4 52.0 14.9 28.0 34.0
## BB17 52.0 13.6 28.0 31.5
## BB31 53.4 13.0 25.0 25.0
## BB33 51.4 13.6 27.0 30.0
## BB36 40.3 15.9 27.0 30.5
## BB40 49.4 13.4 29.0 31.0
## WW4 47.2 14.9 28.0 36.0
## WW5 43.9 17.4 28.0 38.5
## WW7 46.0 16.5 26.0 33.5
## BR4 45.3 14.5 31.0 39.0
## BR7 47.0 14.4 26.0 30.0
## CD2 41.3 16.0 27.0 32.0
## CD3 46.3 15.2 25.5 28.5
## CD4 45.1 17.1 25.5 33.0
## CD5 41.7 14.0 29.0 38.5
## CD6 44.2 15.7 26.5 34.0
## CD10 44.1 13.0 27.0 34.0
## BSF1 45.9 14.7 26.0 34.0
## BSF9 42.9 15.4 26.0 32.0
## BSF10 43.3 14.0 25.0 29.0
## BSF13 43.7 13.6 26.0 31.0
## BTP3 43.5 16.0 25.5 31.5
## BTP15 43.8 14.0 28.0 35.0
## BTP19 45.4 13.0 25.0 30.0
## BTP21 46.0 14.8 28.5 33.5
str(fossum)
## 'data.frame': 43 obs. of 14 variables:
## $ case : num 2 3 4 5 6 8 9 10 11 12 ...
## $ site : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Pop : Factor w/ 2 levels "Vic","other": 1 1 1 1 1 1 1 1 1 1 ...
## $ sex : Factor w/ 2 levels "f","m": 1 1 1 1 1 1 1 1 1 1 ...
## $ age : num 6 6 6 2 1 6 9 6 9 5 ...
## $ hdlngth : num 92.5 94 93.2 91.5 93.1 94.8 93.4 91.8 93.3 94.9 ...
## $ skullw : num 57.6 60 57.1 56.3 54.8 57.6 56.3 58 57.2 55.6 ...
## $ totlngth: num 91.5 95.5 92 85.5 90.5 91 91.5 89.5 89.5 92 ...
## $ taill : num 36.5 39 38 36 35.5 37 37 37.5 39 35.5 ...
## $ footlgth: num 72.5 75.4 76.1 71 73.2 72.7 72.4 70.9 77.2 71.7 ...
## $ earconch: num 51.2 51.9 52.2 53.2 53.6 53.9 52.9 53.4 51.3 51 ...
## $ eye : num 16 15.5 15.2 15.1 14.2 14.5 15.5 14.4 14.9 15.3 ...
## $ chest : num 28.5 30 28 28.5 30 29 28 27.5 31 28 ...
## $ belly : num 33 34 34 33 32 34 33 32 34 33 ...
reg<-lm(fossum$skullw~fossum$age)
reg
##
## Call:
## lm(formula = fossum$skullw ~ fossum$age)
##
## Coefficients:
## (Intercept) fossum$age
## 56.26097 0.08233
#revisar supuestos
shapiro.test(residuals(reg)) #es normal si es mayor a 0,05
##
## Shapiro-Wilk normality test
##
## data: residuals(reg)
## W = 0.85065, p-value = 5.422e-05
ncv.test(reg) #es hoodedastico es es mayor a 0,05
## Warning: 'ncv.test' is deprecated.
## Use 'ncvTest' instead.
## See help("Deprecated") and help("car-deprecated").
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 17.59019 Df = 1 p = 2.739975e-05
reg2<-lm(fossum$age~fossum$taill)
reg2
##
## Call:
## lm(formula = fossum$age ~ fossum$taill)
##
## Coefficients:
## (Intercept) fossum$taill
## -8.5702 0.3382
shapiro.test(residuals(reg2)) #es normal si es mayor a 0,05
##
## Shapiro-Wilk normality test
##
## data: residuals(reg2)
## W = 0.94219, p-value = 0.03111
ncv.test(reg2)
## Warning: 'ncv.test' is deprecated.
## Use 'ncvTest' instead.
## See help("Deprecated") and help("car-deprecated").
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 1.055169 Df = 1 p = 0.3043196
reg3<-lm(fossum$age~fossum$footlgth)
reg3
##
## Call:
## lm(formula = fossum$age ~ fossum$footlgth)
##
## Coefficients:
## (Intercept) fossum$footlgth
## -0.2842 0.0613
shapiro.test(residuals(reg3))
##
## Shapiro-Wilk normality test
##
## data: residuals(reg3)
## W = 0.96664, p-value = 0.2536
ncv.test(reg3)
## Warning: 'ncv.test' is deprecated.
## Use 'ncvTest' instead.
## See help("Deprecated") and help("car-deprecated").
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 3.69248 Df = 1 p = 0.0546583
AIC(reg,reg2,reg3) #el mejor modelo es el #2
## Warning in AIC.default(reg, reg2, reg3): models are not all fitted to the
## same number of observations
## df AIC
## reg 3 207.9847
## reg2 3 179.6631
## reg3 3 179.8121
reg4<-lm(fossum$age~fossum$skullw+fossum$taill)
reg4
##
## Call:
## lm(formula = fossum$age ~ fossum$skullw + fossum$taill)
##
## Coefficients:
## (Intercept) fossum$skullw fossum$taill
## -6.81773 -0.04653 0.36188
reg5<-lm(fossum$age~fossum$skullw+fossum$footlgth)
reg5
##
## Call:
## lm(formula = fossum$age ~ fossum$skullw + fossum$footlgth)
##
## Coefficients:
## (Intercept) fossum$skullw fossum$footlgth
## -1.02793 0.01589 0.05905
#ecuacion
#70= 1.0 + 0.04*x #es una ecuacion poco creible xq los datos son muy ambiguos
#y es a x
min(fossum$age)
## [1] 1
#multicolinearidad #si el valor es mayor a 4 no sirve, las variables que uso en este modelos on redundantes
#vif depende de la libreria car y se hace para modelos con mas de 1 variable, o aditivos los valores menores a 4 si son buenos para utilizarlos en una reg multiple.
vif(reg5)
## fossum$skullw fossum$footlgth
## 1.0768 1.0768
vif(reg4)
## fossum$skullw fossum$taill
## 1.1523 1.1523
data("appletaste")
str(appletaste)
## 'data.frame': 60 obs. of 3 variables:
## $ aftertaste: num 89 98 108 13 55 104 40 122 148 6 ...
## $ panelist : Factor w/ 20 levels "a","b","c","d",..: 1 1 1 2 2 2 3 3 3 4 ...
## $ product : Factor w/ 4 levels "298","493","649",..: 4 1 2 4 1 2 4 1 2 4 ...
anova<-aov(appletaste$aftertaste~appletaste$product)
anova
## Call:
## aov(formula = appletaste$aftertaste ~ appletaste$product)
##
## Terms:
## appletaste$product Residuals
## Sum of Squares 32892.18 58474.67
## Deg. of Freedom 3 56
##
## Residual standard error: 32.31394
## Estimated effects may be unbalanced
shapiro.test(residuals(anova))# como los datos son normales reviso homocedasticidad con barlett
##
## Shapiro-Wilk normality test
##
## data: residuals(anova)
## W = 0.97595, p-value = 0.2818
bartlett.test(appletaste$aftertaste~appletaste$product) # si son homocedasticos
##
## Bartlett test of homogeneity of variances
##
## data: appletaste$aftertaste by appletaste$product
## Bartlett's K-squared = 3.838, df = 3, p-value = 0.2795
tapply(appletaste$aftertaste,appletaste$product,length)
## 298 493 649 937
## 15 15 15 15
summary(anova)
## Df Sum Sq Mean Sq F value Pr(>F)
## appletaste$product 3 32892 10964 10.5 1.4e-05 ***
## Residuals 56 58475 1044
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(anova)# xq losdatos son balanceados uso tukey
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = appletaste$aftertaste ~ appletaste$product)
##
## $`appletaste$product`
## diff lwr upr p adj
## 493-298 25.73333 -5.510099 56.976765 0.1411423
## 649-298 -16.53333 -47.776765 14.710099 0.5039271
## 937-298 -38.33333 -69.576765 -7.089901 0.0102884
## 649-493 -42.26667 -73.510099 -11.023235 0.0038763
## 937-493 -64.06667 -95.310099 -32.823235 0.0000074
## 937-649 -21.80000 -53.043432 9.443432 0.2624349
#si los datos fueran desbalanceado uso pairwise t test
#ejemplo
pairwise.t.test(appletaste$aftertaste,appletaste$product,p.adjust.method = "none")
##
## Pairwise comparisons using t tests with pooled SD
##
## data: appletaste$aftertaste and appletaste$product
##
## 298 493 649
## 493 0.03341 - -
## 649 0.16667 0.00071 -
## 937 0.00196 1.3e-06 0.06995
##
## P value adjustment method: none
# metodo lsd