Practica Correlacion y regresion

Variable ordinal- Categorias

Variable intervalos- Numerica

Coeficiente de Correlacion de Pearsson

Ambas variables deben ser de una escala numerica y tener una distribucion normal.

Para visualizar un plot de correlacion tengo el comando pairs(x,y ). Para analizar este plot se debe observar que entre mas alto sea el valor de r mas va a tener una tendencia a verse eliptico, y entre mas se acerque la correlacion a 1 o -1 va a ser mas lineal.

peso <- c(51, 59, 49, 54, 50, 55, 48, 53, 52, 57)
long <- c(33.5, 38, 32, 37.5, 31.5, 33, 31, 36.5, 34, 35)
pairs(long ~ peso)

Otra forma de analizar la correlacion es realizar un otro plot que se llama chart.Correlation. En este plot puedo visualizar el nivel de significancia de la correlacion con los asteriscos, entre mas salen mas significativa es.

** Pasos para realizar este plot**

1- Activar el paquete de datos Performance Analytics

2- Hacer un data frame de las variables y darle un nombre

3-aplicar el comando chart.Correlation y visualizar el plot

## Warning: package 'PerformanceAnalytics' was built under R version 3.4.2
## Loading required package: xts
## Warning: package 'xts' was built under R version 3.4.2
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.4.2
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
## 
##     legend
## Warning in plot.window(...): "method" is not a graphical parameter
## Warning in plot.window(...): "exact" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "method" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "exact" is not a graphical parameter
## Warning in title(...): "method" is not a graphical parameter
## Warning in title(...): "exact" is not a graphical parameter
## Warning in plot.window(...): "method" is not a graphical parameter
## Warning in plot.window(...): "exact" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "method" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "exact" is not a graphical parameter
## Warning in title(...): "method" is not a graphical parameter
## Warning in title(...): "exact" is not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "method" is
## not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "exact" is not
## a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "method" is
## not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "exact" is not
## a graphical parameter
## Warning in plot.window(...): "method" is not a graphical parameter
## Warning in plot.window(...): "exact" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "method" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "exact" is not a graphical parameter
## Warning in title(...): "method" is not a graphical parameter
## Warning in title(...): "exact" is not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "method" is
## not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "exact" is not
## a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "method" is
## not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "exact" is not
## a graphical parameter
## Warning in plot.xy(xy.coords(x, y), type = type, ...): "method" is not a
## graphical parameter
## Warning in plot.xy(xy.coords(x, y), type = type, ...): "exact" is not a
## graphical parameter
## Warning in plot.window(...): "method" is not a graphical parameter
## Warning in plot.window(...): "exact" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "method" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "exact" is not a graphical parameter
## Warning in title(...): "method" is not a graphical parameter
## Warning in title(...): "exact" is not a graphical parameter

luego de esto se debe revisar la correlacion y la significancia de esto son los siguientes comandos

cor(peso, long)  #me da 
## [1] 0.7794691
cor.test(peso, long)#Significancia y debe de llevar las dos variables a las que queremos analizar 
## 
##  Pearson's product-moment correlation
## 
## data:  peso and long
## t = 3.5194, df = 8, p-value = 0.007853
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2942560 0.9452104
## sample estimates:
##       cor 
## 0.7794691

En este caso las dos variables son normales por lo tnto se utiliza el metodo de Pearson, si las variables fueran no normales se deberia de usar el metodo no parametrico, que puede ser el de spearman o kendall esto lo debemos de indicar

cor(x, y,method = c(“pearson”, “kendall”, “spearman”))

r2 es el coeficiente de determinacion que se hace elevando al cuadrado r (el % de correlacion) es en base a 10,entonces esto nos da como resultado el % que si esta relacionado y lo otro que hace falta para llegar a 10 es el % que no tiene relacion
0.7794691^2 #r2 determina el & de aceptacion, es en base a 10 por lo que le resta es el % que no aceptacion
## [1] 0.6075721

Practica de la guia

data(women)
women
##    height weight
## 1      58    115
## 2      59    117
## 3      60    120
## 4      61    123
## 5      62    126
## 6      63    129
## 7      64    132
## 8      65    135
## 9      66    139
## 10     67    142
## 11     68    146
## 12     69    150
## 13     70    154
## 14     71    159
## 15     72    164

Observamos que la base de datos tiene dos variables, las cuales debemos revisar si son normales para determinar el metodo ya sea parametrico(pearson) o no parametrico (kendall/Spearman)

shapiro.test(women$height)#normal
## 
##  Shapiro-Wilk normality test
## 
## data:  women$height
## W = 0.96359, p-value = 0.7545
shapiro.test(women$weight)#normal
## 
##  Shapiro-Wilk normality test
## 
## data:  women$weight
## W = 0.96036, p-value = 0.6986
cor(women,method = "p") #coeficiente de correlacion
##           height    weight
## height 1.0000000 0.9954948
## weight 0.9954948 1.0000000
  cor(women$height,women$weight)
## [1] 0.9954948
cor.test(women$height,women$weight)# coeficiente de determinancion
## 
##  Pearson's product-moment correlation
## 
## data:  women$height and women$weight
## t = 37.855, df = 13, p-value = 1.091e-14
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9860970 0.9985447
## sample estimates:
##       cor 
## 0.9954948

podemos concluir que la el peso esta 99% correlacionado con la altura y lo corroboramos con el plot de pairs que al ser una correlacion de 99 deberia ser bien recta

pairs(women$height~women$weight)

Ya que vimos el plot y corroboramos lo anterior podemos decir que es una correlacion positiva y fuerte tambien podemos hacerlo con el otro plot que se llama chart.Correlation donde tambien podemos ver que es una correlacion positiva y fuerte- fijarse en los asteriscos

library(PerformanceAnalytics)

 dfw<- data.frame(women$height,women$weight)


 chart.Correlation(dfw)
## Warning in plot.window(...): "method" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "method" is not a graphical parameter
## Warning in title(...): "method" is not a graphical parameter
## Warning in plot.window(...): "method" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "method" is not a graphical parameter
## Warning in title(...): "method" is not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "method" is
## not a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "method" is
## not a graphical parameter
## Warning in plot.window(...): "method" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "method" is not a graphical parameter
## Warning in title(...): "method" is not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "method" is
## not a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "method" is
## not a graphical parameter
## Warning in plot.xy(xy.coords(x, y), type = type, ...): "method" is not a
## graphical parameter
## Warning in plot.window(...): "method" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "method" is not a graphical parameter
## Warning in title(...): "method" is not a graphical parameter

Regresion

Debo de tener dos variables, una dependiente -> y y otra independiente -> x

La relacion entre ambas variables debe ser aproximadamente lineal

los residuos deben ser normales, independiente y con varianza cte

peso <- c(61, 60, 78, 62, 66, 60, 54, 84, 68) #el peso depende de la altura
altura <- c(162, 154, 180, 158, 171, 169, 166, 176, 163) #la altura no se ve afectada por el peso 
regresion<-lm(peso~altura)
regresion
## 
## Call:
## lm(formula = peso ~ altura)
## 
## Coefficients:
## (Intercept)       altura  
##    -67.4679       0.8007
summary(regresion)
## 
## Call:
## lm(formula = peso ~ altura)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -11.444  -3.447   1.347   4.164  10.549 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -67.4679    51.1995  -1.318    0.229  
## altura        0.8007     0.3071   2.608    0.035 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.268 on 7 degrees of freedom
## Multiple R-squared:  0.4927, Adjusted R-squared:  0.4203 
## F-statistic: 6.799 on 1 and 7 DF,  p-value: 0.03504

Luego de que revise el summary debo interpretar los siguientes valores.

P-value = Significancia ->p-value: 0.03504

Intercepto -> -67.4679

pendiente=valor abajo del intercepto -> 0.8007

Grados de libertad= df -> 7

con estos valores ya puedo crear la ecuacion de la recta y(v.indep) =b(intercepto) + m(pendiente)* x(v.dependiente) en este caso

peso= -67.4679 + 0.8007 * altura

Ahora se debe proceder a revisar los supuestos

shapiro.test(resid(regresion))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(regresion)
## W = 0.97706, p-value = 0.9475
library(car)
## Warning: package 'car' was built under R version 3.4.2
ncvTest(regresion) 
## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 0.3545291    Df = 1     p = 0.5515605

*Regresiones Multilpes

Para una r. multiple se debe de tener una base de datos que contenga mas de dos variables independientes.

library(car)
data(Prestige)
Prestige
##                           education income women prestige census type
## gov.administrators            13.11  12351 11.16     68.8   1113 prof
## general.managers              12.26  25879  4.02     69.1   1130 prof
## accountants                   12.77   9271 15.70     63.4   1171 prof
## purchasing.officers           11.42   8865  9.11     56.8   1175 prof
## chemists                      14.62   8403 11.68     73.5   2111 prof
## physicists                    15.64  11030  5.13     77.6   2113 prof
## biologists                    15.09   8258 25.65     72.6   2133 prof
## architects                    15.44  14163  2.69     78.1   2141 prof
## civil.engineers               14.52  11377  1.03     73.1   2143 prof
## mining.engineers              14.64  11023  0.94     68.8   2153 prof
## surveyors                     12.39   5902  1.91     62.0   2161 prof
## draughtsmen                   12.30   7059  7.83     60.0   2163 prof
## computer.programers           13.83   8425 15.33     53.8   2183 prof
## economists                    14.44   8049 57.31     62.2   2311 prof
## psychologists                 14.36   7405 48.28     74.9   2315 prof
## social.workers                14.21   6336 54.77     55.1   2331 prof
## lawyers                       15.77  19263  5.13     82.3   2343 prof
## librarians                    14.15   6112 77.10     58.1   2351 prof
## vocational.counsellors        15.22   9593 34.89     58.3   2391 prof
## ministers                     14.50   4686  4.14     72.8   2511 prof
## university.teachers           15.97  12480 19.59     84.6   2711 prof
## primary.school.teachers       13.62   5648 83.78     59.6   2731 prof
## secondary.school.teachers     15.08   8034 46.80     66.1   2733 prof
## physicians                    15.96  25308 10.56     87.2   3111 prof
## veterinarians                 15.94  14558  4.32     66.7   3115 prof
## osteopaths.chiropractors      14.71  17498  6.91     68.4   3117 prof
## nurses                        12.46   4614 96.12     64.7   3131 prof
## nursing.aides                  9.45   3485 76.14     34.9   3135   bc
## physio.therapsts              13.62   5092 82.66     72.1   3137 prof
## pharmacists                   15.21  10432 24.71     69.3   3151 prof
## medical.technicians           12.79   5180 76.04     67.5   3156   wc
## commercial.artists            11.09   6197 21.03     57.2   3314 prof
## radio.tv.announcers           12.71   7562 11.15     57.6   3337   wc
## athletes                      11.44   8206  8.13     54.1   3373 <NA>
## secretaries                   11.59   4036 97.51     46.0   4111   wc
## typists                       11.49   3148 95.97     41.9   4113   wc
## bookkeepers                   11.32   4348 68.24     49.4   4131   wc
## tellers.cashiers              10.64   2448 91.76     42.3   4133   wc
## computer.operators            11.36   4330 75.92     47.7   4143   wc
## shipping.clerks                9.17   4761 11.37     30.9   4153   wc
## file.clerks                   12.09   3016 83.19     32.7   4161   wc
## receptionsts                  11.04   2901 92.86     38.7   4171   wc
## mail.carriers                  9.22   5511  7.62     36.1   4172   wc
## postal.clerks                 10.07   3739 52.27     37.2   4173   wc
## telephone.operators           10.51   3161 96.14     38.1   4175   wc
## collectors                    11.20   4741 47.06     29.4   4191   wc
## claim.adjustors               11.13   5052 56.10     51.1   4192   wc
## travel.clerks                 11.43   6259 39.17     35.7   4193   wc
## office.clerks                 11.00   4075 63.23     35.6   4197   wc
## sales.supervisors              9.84   7482 17.04     41.5   5130   wc
## commercial.travellers         11.13   8780  3.16     40.2   5133   wc
## sales.clerks                  10.05   2594 67.82     26.5   5137   wc
## newsboys                       9.62    918  7.00     14.8   5143 <NA>
## service.station.attendant      9.93   2370  3.69     23.3   5145   bc
## insurance.agents              11.60   8131 13.09     47.3   5171   wc
## real.estate.salesmen          11.09   6992 24.44     47.1   5172   wc
## buyers                        11.03   7956 23.88     51.1   5191   wc
## firefighters                   9.47   8895  0.00     43.5   6111   bc
## policemen                     10.93   8891  1.65     51.6   6112   bc
## cooks                          7.74   3116 52.00     29.7   6121   bc
## bartenders                     8.50   3930 15.51     20.2   6123   bc
## funeral.directors             10.57   7869  6.01     54.9   6141   bc
## babysitters                    9.46    611 96.53     25.9   6147 <NA>
## launderers                     7.33   3000 69.31     20.8   6162   bc
## janitors                       7.11   3472 33.57     17.3   6191   bc
## elevator.operators             7.58   3582 30.08     20.1   6193   bc
## farmers                        6.84   3643  3.60     44.1   7112 <NA>
## farm.workers                   8.60   1656 27.75     21.5   7182   bc
## rotary.well.drillers           8.88   6860  0.00     35.3   7711   bc
## bakers                         7.54   4199 33.30     38.9   8213   bc
## slaughterers.1                 7.64   5134 17.26     25.2   8215   bc
## slaughterers.2                 7.64   5134 17.26     34.8   8215   bc
## canners                        7.42   1890 72.24     23.2   8221   bc
## textile.weavers                6.69   4443 31.36     33.3   8267   bc
## textile.labourers              6.74   3485 39.48     28.8   8278   bc
## tool.die.makers               10.09   8043  1.50     42.5   8311   bc
## machinists                     8.81   6686  4.28     44.2   8313   bc
## sheet.metal.workers            8.40   6565  2.30     35.9   8333   bc
## welders                        7.92   6477  5.17     41.8   8335   bc
## auto.workers                   8.43   5811 13.62     35.9   8513   bc
## aircraft.workers               8.78   6573  5.78     43.7   8515   bc
## electronic.workers             8.76   3942 74.54     50.8   8534   bc
## radio.tv.repairmen            10.29   5449  2.92     37.2   8537   bc
## sewing.mach.operators          6.38   2847 90.67     28.2   8563   bc
## auto.repairmen                 8.10   5795  0.81     38.1   8581   bc
## aircraft.repairmen            10.10   7716  0.78     50.3   8582   bc
## railway.sectionmen             6.67   4696  0.00     27.3   8715   bc
## electrical.linemen             9.05   8316  1.34     40.9   8731   bc
## electricians                   9.93   7147  0.99     50.2   8733   bc
## construction.foremen           8.24   8880  0.65     51.1   8780   bc
## carpenters                     6.92   5299  0.56     38.9   8781   bc
## masons                         6.60   5959  0.52     36.2   8782   bc
## house.painters                 7.81   4549  2.46     29.9   8785   bc
## plumbers                       8.33   6928  0.61     42.9   8791   bc
## construction.labourers         7.52   3910  1.09     26.5   8798   bc
## pilots                        12.27  14032  0.58     66.1   9111 prof
## train.engineers                8.49   8845  0.00     48.9   9131   bc
## bus.drivers                    7.58   5562  9.47     35.9   9171   bc
## taxi.drivers                   7.93   4224  3.59     25.1   9173   bc
## longshoremen                   8.37   4753  0.00     26.1   9313   bc
## typesetters                   10.00   6462 13.58     42.2   9511   bc
## bookbinders                    8.55   3617 70.87     35.2   9517   bc
head(Prestige) #encabezado
##                     education income women prestige census type
## gov.administrators      13.11  12351 11.16     68.8   1113 prof
## general.managers        12.26  25879  4.02     69.1   1130 prof
## accountants             12.77   9271 15.70     63.4   1171 prof
## purchasing.officers     11.42   8865  9.11     56.8   1175 prof
## chemists                14.62   8403 11.68     73.5   2111 prof
## physicists              15.64  11030  5.13     77.6   2113 prof
names(Prestige) #titulos de las variables
## [1] "education" "income"    "women"     "prestige"  "census"    "type"

Observamos la base de datos y realizamos una regresion multiple con variables independientes

regmult1<-lm(prestige~education+women, data = Prestige)
regmult1
## 
## Call:
## lm(formula = prestige ~ education + women, data = Prestige)
## 
## Coefficients:
## (Intercept)    education        women  
##    -8.75416      5.42780     -0.09305
summary(regmult1)
## 
## Call:
## lm(formula = prestige ~ education + women, data = Prestige)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -28.010  -4.069   1.050   5.027  18.942 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -8.75416    3.54213  -2.471 0.015164 *  
## education    5.42780    0.31612  17.170  < 2e-16 ***
## women       -0.09305    0.02719  -3.422 0.000904 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.652 on 99 degrees of freedom
## Multiple R-squared:  0.7521, Adjusted R-squared:  0.7471 
## F-statistic: 150.2 on 2 and 99 DF,  p-value: < 2.2e-16
shapiro.test(residuals(regmult1)) #normalidad de residuos
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(regmult1)
## W = 0.98367, p-value = 0.2419
ncvTest(regmult1) #homocedasticidad
## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 1.068151    Df = 1     p = 0.3013634

ahora hacemos otro modelo de regresion, esta vez le ponemos otra variable mas

regmult2<-lm(prestige~education+women+income,data=Prestige)
regmult2
## 
## Call:
## lm(formula = prestige ~ education + women + income, data = Prestige)
## 
## Coefficients:
## (Intercept)    education        women       income  
##   -6.794334     4.186637    -0.008905     0.001314
summary(regmult2)
## 
## Call:
## lm(formula = prestige ~ education + women + income, data = Prestige)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -19.8246  -5.3332  -0.1364   5.1587  17.5045 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -6.7943342  3.2390886  -2.098   0.0385 *  
## education    4.1866373  0.3887013  10.771  < 2e-16 ***
## women       -0.0089052  0.0304071  -0.293   0.7702    
## income       0.0013136  0.0002778   4.729 7.58e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.846 on 98 degrees of freedom
## Multiple R-squared:  0.7982, Adjusted R-squared:  0.792 
## F-statistic: 129.2 on 3 and 98 DF,  p-value: < 2.2e-16
shapiro.test(residuals(regmult2))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(regmult2)
## W = 0.99363, p-value = 0.918
ncvTest(regmult2)
## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 0.2490143    Df = 1     p = 0.61777

para determinar cual modelo es mejor hago el test de akaike, el mejor modelo va a ser el que

AIC(regmult1,regmult2)
##          df      AIC
## regmult1  4 734.5998
## regmult2  5 715.6358

concluimos que la regresion 1 es un mejor modelo de regresion

*Chi cuadrado (practica teoria)

frecuencia<-c(15,19)



frecuencia
## [1] 15 19
chisq.test(frecuencia) #obtengo el valor de x2,los df y el p-value
## 
##  Chi-squared test for given probabilities
## 
## data:  frecuencia
## X-squared = 0.47059, df = 1, p-value = 0.4927
qchisq(0.95,1) #me da el valor critico tabular
## [1] 3.841459
chisq.test(frecuencia)$expected # me da la frecuencia esperada
## [1] 17 17
capt<-c(12,4)
chisq.test(capt)
## 
##  Chi-squared test for given probabilities
## 
## data:  capt
## X-squared = 4, df = 1, p-value = 0.0455
qchisq(capt,1)
## Warning in qchisq(capt, 1): NaNs produced
## [1] NaN NaN

Debo de comparar el x2 con el valor tabular tambien debo de tener en cuanta el % si es al 98,95 o el que me indiquen. si el valor de pvalue

matris<-matrix(c(50,40,25,45),nrow = 2,dimnames = list(c("fem", "masc"),c("si","no")))
matris
##      si no
## fem  50 25
## masc 40 45
chisq.test(matris)$expected
##           si      no
## fem  42.1875 32.8125
## masc 47.8125 37.1875
prop.table(matris)
##          si      no
## fem  0.3125 0.15625
## masc 0.2500 0.28125
qchisq(0.95,1) #el problema me indica si es al 99% 98% o 95%
## [1] 3.841459
chisq.test(matris)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  matris
## X-squared = 5.4534, df = 1, p-value = 0.01953
library(car)
data("Chile")
str(Chile)
## 'data.frame':    2700 obs. of  8 variables:
##  $ region    : Factor w/ 5 levels "C","M","N","S",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ population: int  175000 175000 175000 175000 175000 175000 175000 175000 175000 175000 ...
##  $ sex       : Factor w/ 2 levels "F","M": 2 2 1 1 1 1 2 1 1 2 ...
##  $ age       : int  65 29 38 49 23 28 26 24 41 41 ...
##  $ education : Factor w/ 3 levels "P","PS","S": 1 2 1 1 3 1 2 3 1 1 ...
##  $ income    : int  35000 7500 15000 35000 35000 7500 35000 15000 15000 15000 ...
##  $ statusquo : num  1.01 -1.3 1.23 -1.03 -1.1 ...
##  $ vote      : Factor w/ 4 levels "A","N","U","Y": 4 2 4 2 2 2 2 2 3 2 ...
alex<-table(Chile$sex,Chile$region) 
chisq.test(alex)
## 
##  Pearson's Chi-squared test
## 
## data:  alex
## X-squared = 0.80328, df = 4, p-value = 0.938
qchisq(0.95,4)
## [1] 9.487729
chisq.test(alex)$expected
##    
##            C        M        N        S       SA
##   F 306.4444 51.07407 164.4585 366.7119 490.3111
##   M 293.5556 48.92593 157.5415 351.2881 469.6889
tabla<-table(Chile$sex,Chile$vote)
chisq.test(tabla)
## 
##  Pearson's Chi-squared test
## 
## data:  tabla
## X-squared = 70.612, df = 3, p-value = 3.156e-15
qchisq(0.95,3)
## [1] 7.814728
#si el valor calculado es mayor que el tabulado acepto la hip nula, en este caso la acepto xq x2=70.612 y el tabulado es 7.814728 por lo cual se determina que si hay diferencia 

tabla2<-table(Chile$sex,Chile$education)
tabla2
##    
##       P  PS   S
##   F 607 199 566
##   M 500 263 554
chisq.test(tabla2)$expected #lo hago de primero para sabeer si debo hacer fisher en el caso que me de que hayan valores cero
##    
##            P       PS        S
##   F 564.8211 235.7248 571.4541
##   M 542.1789 226.2752 548.5459
chisq.test(tabla2)
## 
##  Pearson's Chi-squared test
## 
## data:  tabla2
## X-squared = 18.219, df = 2, p-value = 0.0001106
qchisq(0.95,2)
## [1] 5.991465
#luego de ver que el valor calculado es mayor que el tabulado establesco que si hay una diferencia establecida


tabla3<-table(Chile$age,Chile$education)
tabla3
##     
##       P PS  S
##   18 11  9 70
##   19 11 13 52
##   20 19 17 42
##   21 19 22 55
##   22 23 23 46
##   23 15 17 43
##   24 20 22 39
##   25 24 18 37
##   26 15 17 31
##   27 20 18 33
##   28 18 16 34
##   29 18 11 31
##   30 20  7 46
##   31 11 10 22
##   32 14 12 31
##   33 20 11 25
##   34 17 12 22
##   35 23 14 20
##   36 25 25 45
##   37 32 17 29
##   38 25 14 26
##   39 29  9 17
##   40 29 18 21
##   41 15  8 11
##   42 21 14 26
##   43 29  5 12
##   44 27 10 13
##   45 35  8 19
##   46 14  4 14
##   47 19  3 14
##   48 28  3 12
##   49 27  4  7
##   50 24  2 13
##   51 14  3  9
##   52 19  2  7
##   53 20  8  8
##   54 24  2  8
##   55 22  3 11
##   56 16  2  6
##   57 21  3 10
##   58 27  3 12
##   59 27  3 12
##   60 36  2 12
##   61 11  1  2
##   62 18  1  8
##   63 14  1  6
##   64 20  1  5
##   65 24  5  9
##   66 20  1  3
##   67 15  3  5
##   68 17  0  6
##   69  8  1  7
##   70 36  4 16
chisq.test(tabla3)$expected
## Warning in chisq.test(tabla3): Chi-squared approximation may be incorrect
##     
##              P        PS         S
##   18 37.031250 15.468750 37.500000
##   19 31.270833 13.062500 31.666667
##   20 32.093750 13.406250 32.500000
##   21 39.500000 16.500000 40.000000
##   22 37.854167 15.812500 38.333333
##   23 30.859375 12.890625 31.250000
##   24 33.328125 13.921875 33.750000
##   25 32.505208 13.578125 32.916667
##   26 25.921875 10.828125 26.250000
##   27 29.213542 12.203125 29.583333
##   28 27.979167 11.687500 28.333333
##   29 24.687500 10.312500 25.000000
##   30 30.036458 12.546875 30.416667
##   31 17.692708  7.390625 17.916667
##   32 23.453125  9.796875 23.750000
##   33 23.041667  9.625000 23.333333
##   34 20.984375  8.765625 21.250000
##   35 23.453125  9.796875 23.750000
##   36 39.088542 16.328125 39.583333
##   37 32.093750 13.406250 32.500000
##   38 26.744792 11.171875 27.083333
##   39 22.630208  9.453125 22.916667
##   40 27.979167 11.687500 28.333333
##   41 13.989583  5.843750 14.166667
##   42 25.098958 10.484375 25.416667
##   43 18.927083  7.906250 19.166667
##   44 20.572917  8.593750 20.833333
##   45 25.510417 10.656250 25.833333
##   46 13.166667  5.500000 13.333333
##   47 14.812500  6.187500 15.000000
##   48 17.692708  7.390625 17.916667
##   49 15.635417  6.531250 15.833333
##   50 16.046875  6.703125 16.250000
##   51 10.697917  4.468750 10.833333
##   52 11.520833  4.812500 11.666667
##   53 14.812500  6.187500 15.000000
##   54 13.989583  5.843750 14.166667
##   55 14.812500  6.187500 15.000000
##   56  9.875000  4.125000 10.000000
##   57 13.989583  5.843750 14.166667
##   58 17.281250  7.218750 17.500000
##   59 17.281250  7.218750 17.500000
##   60 20.572917  8.593750 20.833333
##   61  5.760417  2.406250  5.833333
##   62 11.109375  4.640625 11.250000
##   63  8.640625  3.609375  8.750000
##   64 10.697917  4.468750 10.833333
##   65 15.635417  6.531250 15.833333
##   66  9.875000  4.125000 10.000000
##   67  9.463542  3.953125  9.583333
##   68  9.463542  3.953125  9.583333
##   69  6.583333  2.750000  6.666667
##   70 23.041667  9.625000 23.333333
library(DAAG)
## Warning: package 'DAAG' was built under R version 3.4.4
## Loading required package: lattice
## 
## Attaching package: 'DAAG'
## The following object is masked from 'package:car':
## 
##     vif
data(dengue)

shapiro.test(dengue$humid)
## 
##  Shapiro-Wilk normality test
## 
## data:  dengue$humid
## W = 0.93811, p-value < 2.2e-16
shapiro.test(dengue$temp)
## 
##  Shapiro-Wilk normality test
## 
## data:  dengue$temp
## W = 0.91009, p-value < 2.2e-16
cor.test(dengue$humid,dengue$temp, method = "s")
## Warning in cor.test.default(dengue$humid, dengue$temp, method = "s"):
## Cannot compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  dengue$humid and dengue$temp
## S = 177920000, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.8661581
cor(dengue$humid,dengue$temp,method = "s")
## [1] NA
0.8661581*0.8661581 #con el coef de  determinacion determino el % de explicacion de las variables que estan siendo relacionadass, en este caso el 75%
## [1] 0.7502299
shapiro.test(dengue$trees)
## 
##  Shapiro-Wilk normality test
## 
## data:  dengue$trees
## W = 0.85452, p-value < 2.2e-16
shapiro.test(dengue$temp)
## 
##  Shapiro-Wilk normality test
## 
## data:  dengue$temp
## W = 0.91009, p-value < 2.2e-16
cor.test(dengue$trees,dengue$temp, method = "s") #pvalue mayor a 0.05, no es significativo
## Warning in cor.test.default(dengue$trees, dengue$temp, method = "s"):
## Cannot compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  dengue$trees and dengue$temp
## S = 1252200000, p-value = 0.06868
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## 0.04086041
0.04086041*0.04086041 #es una correlacion positiva pero muy debil
## [1] 0.001669573
data("codling")
str(codling)
## 'data.frame':    99 obs. of  10 variables:
##  $ dose    : num  5 8 12 16 20 24 5 8 12 16 ...
##  $ tot     : num  866 911 906 712 582 ...
##  $ dead    : num  246 220 360 271 414 742 154 168 180 240 ...
##  $ pobs    : num  0.284 0.241 0.397 0.381 0.711 ...
##  $ cm      : num  0.218 0.218 0.218 0.218 0.218 ...
##  $ ct      : num  15.6 20.3 28.6 32.7 45.4 ...
##  $ Cultivar: chr  "ROYAL" "ROYAL" "ROYAL" "ROYAL" ...
##  $ gp      : num  1 1 1 1 1 1 2 2 2 2 ...
##  $ year    : Factor w/ 2 levels "1988","1989": 1 1 1 1 1 1 1 1 1 1 ...
##  $ numcm   : num  1676 1676 1676 1676 1676 ...
shapiro.test(codling$dead)
## 
##  Shapiro-Wilk normality test
## 
## data:  codling$dead
## W = 0.76395, p-value = 2.584e-11
 shapiro.test(codling$ct)
## 
##  Shapiro-Wilk normality test
## 
## data:  codling$ct
## W = 0.96658, p-value = 0.01281
 cor.test(codling$dead,codling$ct,method = "s") #correlacion positiva pero no tan fuerte
## Warning in cor.test.default(codling$dead, codling$ct, method = "s"): Cannot
## compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  codling$dead and codling$ct
## S = 48861, p-value = 1.003e-15
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.6978299
0.6978299*0.6978299 #porcentaje explicdo es de 48% , si me preg % lo pongo en base a 100 si solo me piden el coef de det pongo el rho
## [1] 0.4869666

Regresion

data("fossum")
fossum
##       case site   Pop sex age hdlngth skullw totlngth taill footlgth
## C5       2    1   Vic   f   6    92.5   57.6     91.5  36.5     72.5
## C10      3    1   Vic   f   6    94.0   60.0     95.5  39.0     75.4
## C15      4    1   Vic   f   6    93.2   57.1     92.0  38.0     76.1
## C23      5    1   Vic   f   2    91.5   56.3     85.5  36.0     71.0
## C24      6    1   Vic   f   1    93.1   54.8     90.5  35.5     73.2
## C27      8    1   Vic   f   6    94.8   57.6     91.0  37.0     72.7
## C28      9    1   Vic   f   9    93.4   56.3     91.5  37.0     72.4
## C31     10    1   Vic   f   6    91.8   58.0     89.5  37.5     70.9
## C32     11    1   Vic   f   9    93.3   57.2     89.5  39.0     77.2
## C34     12    1   Vic   f   5    94.9   55.6     92.0  35.5     71.7
## C45     17    1   Vic   f   1    94.7   67.7     89.5  36.5     73.2
## C48     19    1   Vic   f   5    94.4   55.4     90.5  35.0     73.4
## C50     20    1   Vic   f   4    94.8   56.3     89.0  38.0     73.8
## C54     21    1   Vic   f   3    95.9   58.1     96.5  39.5     77.9
## C58     23    1   Vic   f   4    92.5   56.1     89.0  36.0     72.8
## C63     27    1   Vic   f   2    90.5   54.5     85.0  35.0     70.3
## A1      29    1   Vic   f   3    92.8   56.0     88.0  35.0     74.9
## A2      30    1   Vic   f   2    92.1   54.4     84.0  33.5     70.6
## A4      32    1   Vic   f   4    94.3   56.7     94.0  39.0     74.8
## BB17    37    2   Vic   f   2    89.3   54.8     82.5  35.0     71.2
## BB31    39    2   Vic   f   1    84.7   51.5     75.0  34.0     68.7
## BB33    40    2   Vic   f   3    91.0   55.0     84.5  36.0     72.8
## BB36    41    2   Vic   f   5    88.4   57.0     83.0  36.5       NA
## BB40    43    2   Vic   f   2    90.0   55.5     81.0  32.0     72.0
## WW4     50    3 other   f   5    91.6   56.4     88.0  38.0     65.0
## WW5     51    3 other   f   5    95.6   59.6     85.0  36.0     64.0
## WW7     53    3 other   f   3    93.1   58.1     91.0  38.0     67.4
## BR4     57    4 other   f   4    95.1   59.4     93.0  41.0     67.2
## BR7     60    4 other   f   2    91.3   57.7     88.0  39.0     63.1
## CD2     62    5 other   f   3    91.3   58.0     90.5  39.0     65.5
## CD3     63    5 other   f   6    92.0   56.4     88.5  38.0     64.1
## CD4     64    5 other   f   3    96.9   56.5     89.5  38.5     63.0
## CD5     65    5 other   f   5    93.5   57.4     88.5  38.0     68.2
## CD6     66    5 other   f   3    90.4   55.8     86.0  36.5     63.2
## CD10    70    5 other   f   7    91.9   56.4     87.0  38.0     65.4
## BSF1    74    6 other   f   4    88.7   52.0     83.0  38.0     61.5
## BSF9    82    6 other   f   4    86.0   54.0     82.0  36.5     60.7
## BSF10   83    6 other   f   3    90.0   53.8     81.5  36.0     62.0
## BSF13   86    6 other   f   3    88.2   53.2     86.5  38.5     60.3
## BTP3    88    7 other   f   2    89.6   58.0     87.5  38.0     66.7
## BTP15   99    7 other   f   3    93.3   56.2     86.5  38.5     64.8
## BTP19  102    7 other   f   6    92.4   55.0     89.0  38.0     63.5
## BTP21  104    7 other   f   3    93.6   59.9     89.0  40.0     67.6
##       earconch  eye chest belly
## C5        51.2 16.0  28.5  33.0
## C10       51.9 15.5  30.0  34.0
## C15       52.2 15.2  28.0  34.0
## C23       53.2 15.1  28.5  33.0
## C24       53.6 14.2  30.0  32.0
## C27       53.9 14.5  29.0  34.0
## C28       52.9 15.5  28.0  33.0
## C31       53.4 14.4  27.5  32.0
## C32       51.3 14.9  31.0  34.0
## C34       51.0 15.3  28.0  33.0
## C45       53.2 14.7  29.0  31.0
## C48       53.9 15.2  28.0  32.0
## C50       52.4 15.5  27.0  36.0
## C54       52.9 14.2  30.0  40.0
## C58       53.3 15.4  28.0  35.0
## C63       50.8 14.2  23.0  28.0
## A1        51.8 14.0  24.0  32.0
## A2        50.8 14.5  24.5  33.0
## A4        52.0 14.9  28.0  34.0
## BB17      52.0 13.6  28.0  31.5
## BB31      53.4 13.0  25.0  25.0
## BB33      51.4 13.6  27.0  30.0
## BB36      40.3 15.9  27.0  30.5
## BB40      49.4 13.4  29.0  31.0
## WW4       47.2 14.9  28.0  36.0
## WW5       43.9 17.4  28.0  38.5
## WW7       46.0 16.5  26.0  33.5
## BR4       45.3 14.5  31.0  39.0
## BR7       47.0 14.4  26.0  30.0
## CD2       41.3 16.0  27.0  32.0
## CD3       46.3 15.2  25.5  28.5
## CD4       45.1 17.1  25.5  33.0
## CD5       41.7 14.0  29.0  38.5
## CD6       44.2 15.7  26.5  34.0
## CD10      44.1 13.0  27.0  34.0
## BSF1      45.9 14.7  26.0  34.0
## BSF9      42.9 15.4  26.0  32.0
## BSF10     43.3 14.0  25.0  29.0
## BSF13     43.7 13.6  26.0  31.0
## BTP3      43.5 16.0  25.5  31.5
## BTP15     43.8 14.0  28.0  35.0
## BTP19     45.4 13.0  25.0  30.0
## BTP21     46.0 14.8  28.5  33.5
str(fossum)
## 'data.frame':    43 obs. of  14 variables:
##  $ case    : num  2 3 4 5 6 8 9 10 11 12 ...
##  $ site    : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Pop     : Factor w/ 2 levels "Vic","other": 1 1 1 1 1 1 1 1 1 1 ...
##  $ sex     : Factor w/ 2 levels "f","m": 1 1 1 1 1 1 1 1 1 1 ...
##  $ age     : num  6 6 6 2 1 6 9 6 9 5 ...
##  $ hdlngth : num  92.5 94 93.2 91.5 93.1 94.8 93.4 91.8 93.3 94.9 ...
##  $ skullw  : num  57.6 60 57.1 56.3 54.8 57.6 56.3 58 57.2 55.6 ...
##  $ totlngth: num  91.5 95.5 92 85.5 90.5 91 91.5 89.5 89.5 92 ...
##  $ taill   : num  36.5 39 38 36 35.5 37 37 37.5 39 35.5 ...
##  $ footlgth: num  72.5 75.4 76.1 71 73.2 72.7 72.4 70.9 77.2 71.7 ...
##  $ earconch: num  51.2 51.9 52.2 53.2 53.6 53.9 52.9 53.4 51.3 51 ...
##  $ eye     : num  16 15.5 15.2 15.1 14.2 14.5 15.5 14.4 14.9 15.3 ...
##  $ chest   : num  28.5 30 28 28.5 30 29 28 27.5 31 28 ...
##  $ belly   : num  33 34 34 33 32 34 33 32 34 33 ...
reg<-lm(fossum$skullw~fossum$age)
reg
## 
## Call:
## lm(formula = fossum$skullw ~ fossum$age)
## 
## Coefficients:
## (Intercept)   fossum$age  
##    56.26097      0.08233
#revisar supuestos
shapiro.test(residuals(reg)) #es normal si es mayor a 0,05
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(reg)
## W = 0.85065, p-value = 5.422e-05
ncv.test(reg) #es hoodedastico es es mayor a 0,05
## Warning: 'ncv.test' is deprecated.
## Use 'ncvTest' instead.
## See help("Deprecated") and help("car-deprecated").
## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 17.59019    Df = 1     p = 2.739975e-05
reg2<-lm(fossum$age~fossum$taill)
reg2
## 
## Call:
## lm(formula = fossum$age ~ fossum$taill)
## 
## Coefficients:
##  (Intercept)  fossum$taill  
##      -8.5702        0.3382
shapiro.test(residuals(reg2)) #es normal si es mayor a 0,05
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(reg2)
## W = 0.94219, p-value = 0.03111
ncv.test(reg2)
## Warning: 'ncv.test' is deprecated.
## Use 'ncvTest' instead.
## See help("Deprecated") and help("car-deprecated").
## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 1.055169    Df = 1     p = 0.3043196
reg3<-lm(fossum$age~fossum$footlgth)
reg3
## 
## Call:
## lm(formula = fossum$age ~ fossum$footlgth)
## 
## Coefficients:
##     (Intercept)  fossum$footlgth  
##         -0.2842           0.0613
shapiro.test(residuals(reg3))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(reg3)
## W = 0.96664, p-value = 0.2536
ncv.test(reg3)
## Warning: 'ncv.test' is deprecated.
## Use 'ncvTest' instead.
## See help("Deprecated") and help("car-deprecated").
## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 3.69248    Df = 1     p = 0.0546583
AIC(reg,reg2,reg3) #el mejor modelo es el #2
## Warning in AIC.default(reg, reg2, reg3): models are not all fitted to the
## same number of observations
##      df      AIC
## reg   3 207.9847
## reg2  3 179.6631
## reg3  3 179.8121
reg4<-lm(fossum$age~fossum$skullw+fossum$taill)
reg4
## 
## Call:
## lm(formula = fossum$age ~ fossum$skullw + fossum$taill)
## 
## Coefficients:
##   (Intercept)  fossum$skullw   fossum$taill  
##      -6.81773       -0.04653        0.36188
reg5<-lm(fossum$age~fossum$skullw+fossum$footlgth)
reg5
## 
## Call:
## lm(formula = fossum$age ~ fossum$skullw + fossum$footlgth)
## 
## Coefficients:
##     (Intercept)    fossum$skullw  fossum$footlgth  
##        -1.02793          0.01589          0.05905
#ecuacion 

#70=  1.0 +   0.04*x #es una ecuacion poco creible xq los datos son muy ambiguos




#y es a x 
min(fossum$age)
## [1] 1
#multicolinearidad #si el valor es mayor a 4 no sirve, las variables que uso en este modelos on redundantes
#vif depende de la libreria car y se hace para modelos con mas de 1 variable, o aditivos los valores menores a 4 si son buenos para utilizarlos en una reg multiple.


vif(reg5)
##   fossum$skullw fossum$footlgth 
##          1.0768          1.0768
vif(reg4)
## fossum$skullw  fossum$taill 
##        1.1523        1.1523

anovas

data("appletaste")
str(appletaste)
## 'data.frame':    60 obs. of  3 variables:
##  $ aftertaste: num  89 98 108 13 55 104 40 122 148 6 ...
##  $ panelist  : Factor w/ 20 levels "a","b","c","d",..: 1 1 1 2 2 2 3 3 3 4 ...
##  $ product   : Factor w/ 4 levels "298","493","649",..: 4 1 2 4 1 2 4 1 2 4 ...
anova<-aov(appletaste$aftertaste~appletaste$product)
anova
## Call:
##    aov(formula = appletaste$aftertaste ~ appletaste$product)
## 
## Terms:
##                 appletaste$product Residuals
## Sum of Squares            32892.18  58474.67
## Deg. of Freedom                  3        56
## 
## Residual standard error: 32.31394
## Estimated effects may be unbalanced
shapiro.test(residuals(anova))# como los datos son normales reviso homocedasticidad con barlett
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(anova)
## W = 0.97595, p-value = 0.2818
bartlett.test(appletaste$aftertaste~appletaste$product) # si son homocedasticos
## 
##  Bartlett test of homogeneity of variances
## 
## data:  appletaste$aftertaste by appletaste$product
## Bartlett's K-squared = 3.838, df = 3, p-value = 0.2795
tapply(appletaste$aftertaste,appletaste$product,length)
## 298 493 649 937 
##  15  15  15  15
summary(anova)
##                    Df Sum Sq Mean Sq F value  Pr(>F)    
## appletaste$product  3  32892   10964    10.5 1.4e-05 ***
## Residuals          56  58475    1044                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(anova)# xq losdatos son balanceados uso tukey
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = appletaste$aftertaste ~ appletaste$product)
## 
## $`appletaste$product`
##              diff        lwr        upr     p adj
## 493-298  25.73333  -5.510099  56.976765 0.1411423
## 649-298 -16.53333 -47.776765  14.710099 0.5039271
## 937-298 -38.33333 -69.576765  -7.089901 0.0102884
## 649-493 -42.26667 -73.510099 -11.023235 0.0038763
## 937-493 -64.06667 -95.310099 -32.823235 0.0000074
## 937-649 -21.80000 -53.043432   9.443432 0.2624349
#si los datos fueran desbalanceado uso pairwise t test 
#ejemplo 
pairwise.t.test(appletaste$aftertaste,appletaste$product,p.adjust.method = "none")
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  appletaste$aftertaste and appletaste$product 
## 
##     298     493     649    
## 493 0.03341 -       -      
## 649 0.16667 0.00071 -      
## 937 0.00196 1.3e-06 0.06995
## 
## P value adjustment method: none
             #  metodo lsd