Data Analysis with R
Team members
Lady Tatiana Parra Vargas 63151164, Karol Ferreira Quevedo 63112019, Katherin Arenas Figueroa 63141105
DATA FRAME
library(knitr)
## Warning: package 'knitr' was built under R version 3.5.2
kable(df1<-data.frame(miedad=c(23,22,18,20,25),peso=c(45,56,47,67,80), estatura=c(150,161,155,172,182)))
| miedad | peso | estatura |
|---:|---:|---:|
| 23 | 45 | 150 |
| 22 | 56 | 161 |
| 18 | 47 | 155 |
| 20 | 67 | 172 |
| 25 | 80 | 182 |
df1
is.list(df1)
## [1] TRUE
- Create the following data.frame df1:
a. Compare df1$miedad, df1[,1], and df1["miedad"].
kable(df1$miedad)
kable(df1[,1])
kable(df1["miedad"])
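The three expressions in item a do not return the same kind of object. The sketch below rebuilds df1 so that it runs on its own. `$` and matrix-style `[, 1]` both return a plain numeric vector. Single-bracket indexing by name, `["miedad"]`, treats the data frame as a list and returns a one-column data frame.

```r
# Rebuild df1 as in the exercise so the block is self-contained
df1 <- data.frame(miedad = c(23, 22, 18, 20, 25),
                  peso = c(45, 56, 47, 67, 80),
                  estatura = c(150, 161, 155, 172, 182))

a <- df1$miedad      # numeric vector
b <- df1[, 1]        # numeric vector: a single selected column is dropped to a vector
d <- df1["miedad"]   # data frame with one column (list-style indexing)

class(a)             # "numeric"
class(b)             # "numeric"
class(d)             # "data.frame"
identical(a, b)      # TRUE: same values and same type
```

To get a one-column data frame from matrix-style indexing, use `df1[, 1, drop = FALSE]`.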
- Inspect the structure of df1 and compute the mean height:
str(df1)
## 'data.frame': 5 obs. of 3 variables:
## $ miedad : num 23 22 18 20 25
## $ peso : num 45 56 47 67 80
## $ estatura: num 150 161 155 172 182
mean(df1$estatura)
## [1] 164
- Extract the third weight (row 3, column 2).
df1[3, 2]
## [1] 47
- Extract the last row.
kable(df1[nrow(df1), ])
- Extract the first two columns.
kable(df1[,c(1,2)])
| miedad | peso |
|---:|---:|
| 23 | 45 |
| 22 | 56 |
| 18 | 47 |
| 20 | 67 |
| 25 | 80 |
- Create a data frame df2 with the first two columns of df1.
kable(df2<-df1[,c(1,2)])
| miedad | peso |
|---:|---:|
| 23 | 45 |
| 22 | 56 |
| 18 | 47 |
| 20 | 67 |
| 25 | 80 |
df2
- Create a data frame df3 with the last three rows of df1.
kable(df3<-df1[3:5,])
| | miedad | peso | estatura |
|:---|---:|---:|---:|
| 3 | 18 | 47 | 155 |
| 4 | 20 | 67 | 172 |
| 5 | 25 | 80 | 182 |
df3
- The function apply(x, MARGIN, FUN) applies a function to the rows or the columns of a data frame or matrix: with MARGIN = 1 it is applied to the rows, and with MARGIN = 2 to the columns. Compute the mean and the standard deviation of the columns of df1 from the previous item.
kable(df1 <- data.frame(miedad=c(23,22,18,20,25), peso=c(45,56,47,67,80), estatura=c(150,161,155,172,182)))
| miedad | peso | estatura |
|---:|---:|---:|
| 23 | 45 | 150 |
| 22 | 56 | 161 |
| 18 | 47 | 155 |
| 20 | 67 | 172 |
| 25 | 80 | 182 |
df1
kable(apply(df1,2,mean))
| | x |
|:---|---:|
| miedad | 21.6 |
| peso | 59.0 |
| estatura | 164.0 |
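The exercise also asks for the standard deviation of each column, and the code above does not compute it. The sketch below reuses the same `apply()` pattern with `MARGIN = 2` and `sd()`, which returns the sample standard deviation (denominator n - 1). It rebuilds df1 so that it is self-contained.

```r
df1 <- data.frame(miedad = c(23, 22, 18, 20, 25),
                  peso = c(45, 56, 47, 67, 80),
                  estatura = c(150, 161, 155, 172, 182))

# MARGIN = 2: apply sd() to each column
desv <- apply(df1, 2, sd)
round(desv, 4)
# miedad 2.7019, peso 14.6116, estatura 12.9808

# Means and standard deviations side by side, one row per statistic
rbind(mean = apply(df1, 2, mean), sd = desv)
```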
The minimum and maximum of the variables:
kable(apply(df1,2,min))
| | x |
|:---|---:|
| miedad | 18 |
| peso | 45 |
| estatura | 150 |
kable(apply(df1,2,max))
| | x |
|:---|---:|
| miedad | 25 |
| peso | 80 |
| estatura | 182 |
3. The primes dataset from the UsingR package contains the primes from 1 to 2003.
- How many primes does primes contain?
library(UsingR)
## Warning: package 'UsingR' was built under R version 3.5.3
## Loading required package: MASS
## Warning: package 'MASS' was built under R version 3.5.2
## Loading required package: HistData
## Warning: package 'HistData' was built under R version 3.5.3
## Loading required package: Hmisc
## Warning: package 'Hmisc' was built under R version 3.5.3
## Loading required package: lattice
## Warning: package 'lattice' was built under R version 3.5.2
## Loading required package: survival
## Warning: package 'survival' was built under R version 3.5.2
## Loading required package: Formula
## Warning: package 'Formula' was built under R version 3.5.2
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.5.2
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
##
## format.pval, units
##
## Attaching package: 'UsingR'
## The following object is masked from 'package:survival':
##
## cancer
length(primes)
## [1] 304
- How many are less than 100?
subset(primes,primes<100)
## [1] 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83
## [24] 89 97
length(subset(primes, primes < 100))
## [1] 25
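- Twin primes are pairs p, p + 2 in which both numbers are prime. How many twin-prime pairs are there in primes? diff(primes) gives the gaps between consecutive primes, so counting the gaps equal to 2 counts the pairs.
sum(diff(primes) == 2)
## [1] 61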
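`primes` comes from UsingR. The sketch below does not need that package. It rebuilds the same list with a Sieve of Eratosthenes and answers the three questions. The helper `sieve()` is hypothetical and written only for this illustration. It assumes, as stated above, that the list covers the primes from 1 to 2003.

```r
# Hypothetical helper: Sieve of Eratosthenes returning all primes <= n
sieve <- function(n) {
  is_prime <- rep(TRUE, n)
  is_prime[1] <- FALSE                            # 1 is not prime
  for (i in 2:floor(sqrt(n))) {
    if (is_prime[i]) {
      is_prime[seq(i * i, n, by = i)] <- FALSE    # cross out the multiples of i
    }
  }
  which(is_prime)
}

p <- sieve(2003)
length(p)        # number of primes up to 2003
sum(p < 100)     # how many are below 100

# Twin primes: consecutive primes whose difference is exactly 2
twin <- diff(p) == 2
sum(twin)
head(cbind(p = p[-length(p)][twin], p_plus_2 = p[-1][twin]))
```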
- Consider the airquality dataset.
kable(head(airquality))
| Ozone | Solar.R | Wind | Temp | Month | Day |
|---:|---:|---:|---:|---:|---:|
| 41 | 190 | 7.4 | 67 | 5 | 1 |
| 36 | 118 | 8.0 | 72 | 5 | 2 |
| 12 | 149 | 12.6 | 74 | 5 | 3 |
| 18 | 313 | 11.5 | 62 | 5 | 4 |
| NA | NA | 14.3 | 56 | 5 | 5 |
| 28 | NA | 14.9 | 66 | 5 | 6 |
str(airquality)
## 'data.frame': 153 obs. of 6 variables:
## $ Ozone : int 41 36 12 18 NA 28 23 19 8 NA ...
## $ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
## $ Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
## $ Temp : int 67 72 74 62 56 66 65 59 61 69 ...
## $ Month : int 5 5 5 5 5 5 5 5 5 5 ...
## $ Day : int 1 2 3 4 5 6 7 8 9 10 ...
dim(airquality)
## [1] 153 6
Study the result of the following code and explain the use of the subset function:
kable(subset(airquality, Temp > 80, select = c(Ozone, Temp)))
| | Ozone | Temp |
|:---|---:|---:|
| 29 | 45 | 81 |
| 35 | NA | 84 |
| 36 | NA | 85 |
| 38 | 29 | 82 |
| 39 | NA | 87 |
| 40 | 71 | 90 |
| 41 | 39 | 87 |
| 42 | NA | 93 |
| 43 | NA | 92 |
| 44 | 23 | 82 |
| 61 | NA | 83 |
| 62 | 135 | 84 |
| 63 | 49 | 85 |
| 64 | 32 | 81 |
| 65 | NA | 84 |
| 66 | 64 | 83 |
| 67 | 40 | 83 |
| 68 | 77 | 88 |
| 69 | 97 | 92 |
| 70 | 97 | 92 |
| 71 | 85 | 89 |
| 72 | NA | 82 |
| 74 | 27 | 81 |
| 75 | NA | 91 |
| 77 | 48 | 81 |
| 78 | 35 | 82 |
| 79 | 61 | 84 |
| 80 | 79 | 87 |
| 81 | 63 | 85 |
| 83 | NA | 81 |
| 84 | NA | 82 |
| 85 | 80 | 86 |
| 86 | 108 | 85 |
| 87 | 20 | 82 |
| 88 | 52 | 86 |
| 89 | 82 | 88 |
| 90 | 50 | 86 |
| 91 | 64 | 83 |
| 92 | 59 | 81 |
| 93 | 39 | 81 |
| 94 | 9 | 81 |
| 95 | 16 | 82 |
| 96 | 78 | 86 |
| 97 | 35 | 85 |
| 98 | 66 | 87 |
| 99 | 122 | 89 |
| 100 | 89 | 90 |
| 101 | 110 | 90 |
| 102 | NA | 92 |
| 103 | NA | 86 |
| 104 | 44 | 86 |
| 105 | 28 | 82 |
| 117 | 168 | 81 |
| 118 | 73 | 86 |
| 119 | NA | 88 |
| 120 | 76 | 97 |
| 121 | 118 | 94 |
| 122 | 84 | 96 |
| 123 | 85 | 94 |
| 124 | 96 | 91 |
| 125 | 78 | 92 |
| 126 | 73 | 93 |
| 127 | 91 | 93 |
| 128 | 47 | 87 |
| 129 | 32 | 84 |
| 134 | 44 | 81 |
| 143 | 16 | 82 |
| 146 | 36 | 81 |
The subset function returns the rows of a data frame that satisfy a logical condition and, through the select argument, only the requested columns. Here it keeps the rows of airquality whose temperature (Temp) is above 80 and returns only the Ozone and Temp columns. The row names show which of the original 153 observations were kept.
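For comparison, the sketch below writes the same selection with bracket indexing. `subset()` keeps only the rows where the condition is `TRUE` and quietly drops rows where it evaluates to `NA`. Plain brackets do not do that, which matters for a column with missing values such as Ozone.

```r
# subset() with a condition and a column selection
a <- subset(airquality, Temp > 80, select = c(Ozone, Temp))

# Equivalent bracket indexing: rows where the condition is TRUE and not NA
cond <- airquality$Temp > 80
b <- airquality[cond & !is.na(cond), c("Ozone", "Temp")]
identical(a, b)   # TRUE

# Why the NA handling matters: Ozone has missing values
nrow(subset(airquality, Ozone > 100))        # only rows with Ozone > 100
nrow(airquality[airquality$Ozone > 100, ])   # also returns all-NA rows
```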
kable(subset(airquality, Day == 1, select = -Temp))
| | Ozone | Solar.R | Wind | Month | Day |
|:---|---:|---:|---:|---:|---:|
| 1 | 41 | 190 | 7.4 | 5 | 1 |
| 32 | NA | 286 | 8.6 | 6 | 1 |
| 62 | 135 | 269 | 4.1 | 7 | 1 |
| 93 | 39 | 83 | 6.9 | 8 | 1 |
| 124 | 96 | 167 | 6.9 | 9 | 1 |
In this case the function extracts the rows for the first day of each month (Day == 1, one row per month). It keeps every column except Temp, which select = -Temp removes.
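A negative name inside `select` drops that column. The sketch below checks that this call matches plain bracket indexing with a logical column mask. It also confirms that exactly one row per month (May to September) is returned.

```r
# select = -Temp drops one column; Day == 1 keeps the first day of each month
a <- subset(airquality, Day == 1, select = -Temp)

# Same result with brackets: keep every column whose name is not "Temp"
b <- airquality[airquality$Day == 1, names(airquality) != "Temp"]
identical(a, b)   # TRUE

a$Month           # 5 6 7 8 9: one row per month
```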
kable(subset(airquality, select = Ozone:Wind))
| Ozone | Solar.R | Wind |
|---:|---:|---:|
| 41 | 190 | 7.4 |
| 36 | 118 | 8.0 |
| 12 | 149 | 12.6 |
| 18 | 313 | 11.5 |
| NA | NA | 14.3 |
| 28 | NA | 14.9 |
| 23 | 299 | 8.6 |
| 19 | 99 | 13.8 |
| 8 | 19 | 20.1 |
| NA | 194 | 8.6 |
| 7 | NA | 6.9 |
| 16 | 256 | 9.7 |
| 11 | 290 | 9.2 |
| 14 | 274 | 10.9 |
| 18 | 65 | 13.2 |
| 14 | 334 | 11.5 |
| 34 | 307 | 12.0 |
| 6 | 78 | 18.4 |
| 30 | 322 | 11.5 |
| 11 | 44 | 9.7 |
| 1 | 8 | 9.7 |
| 11 | 320 | 16.6 |
| 4 | 25 | 9.7 |
| 32 | 92 | 12.0 |
| NA | 66 | 16.6 |
| NA | 266 | 14.9 |
| NA | NA | 8.0 |
| 23 | 13 | 12.0 |
| 45 | 252 | 14.9 |
| 115 | 223 | 5.7 |
| 37 | 279 | 7.4 |
| NA | 286 | 8.6 |
| NA | 287 | 9.7 |
| NA | 242 | 16.1 |
| NA | 186 | 9.2 |
| NA | 220 | 8.6 |
| NA | 264 | 14.3 |
| 29 | 127 | 9.7 |
| NA | 273 | 6.9 |
| 71 | 291 | 13.8 |
| 39 | 323 | 11.5 |
| NA | 259 | 10.9 |
| NA | 250 | 9.2 |
| 23 | 148 | 8.0 |
| NA | 332 | 13.8 |
| NA | 322 | 11.5 |
| 21 | 191 | 14.9 |
| 37 | 284 | 20.7 |
| 20 | 37 | 9.2 |
| 12 | 120 | 11.5 |
| 13 | 137 | 10.3 |
| NA | 150 | 6.3 |
| NA | 59 | 1.7 |
| NA | 91 | 4.6 |
| NA | 250 | 6.3 |
| NA | 135 | 8.0 |
| NA | 127 | 8.0 |
| NA | 47 | 10.3 |
| NA | 98 | 11.5 |
| NA | 31 | 14.9 |
| NA | 138 | 8.0 |
| 135 | 269 | 4.1 |
| 49 | 248 | 9.2 |
| 32 | 236 | 9.2 |
| NA | 101 | 10.9 |
| 64 | 175 | 4.6 |
| 40 | 314 | 10.9 |
| 77 | 276 | 5.1 |
| 97 | 267 | 6.3 |
| 97 | 272 | 5.7 |
| 85 | 175 | 7.4 |
| NA | 139 | 8.6 |
| 10 | 264 | 14.3 |
| 27 | 175 | 14.9 |
| NA | 291 | 14.9 |
| 7 | 48 | 14.3 |
| 48 | 260 | 6.9 |
| 35 | 274 | 10.3 |
| 61 | 285 | 6.3 |
| 79 | 187 | 5.1 |
| 63 | 220 | 11.5 |
| 16 | 7 | 6.9 |
| NA | 258 | 9.7 |
| NA | 295 | 11.5 |
| 80 | 294 | 8.6 |
| 108 | 223 | 8.0 |
| 20 | 81 | 8.6 |
| 52 | 82 | 12.0 |
| 82 | 213 | 7.4 |
| 50 | 275 | 7.4 |
| 64 | 253 | 7.4 |
| 59 | 254 | 9.2 |
| 39 | 83 | 6.9 |
| 9 | 24 | 13.8 |
| 16 | 77 | 7.4 |
| 78 | NA | 6.9 |
| 35 | NA | 7.4 |
| 66 | NA | 4.6 |
| 122 | 255 | 4.0 |
| 89 | 229 | 10.3 |
| 110 | 207 | 8.0 |
| NA | 222 | 8.6 |
| NA | 137 | 11.5 |
| 44 | 192 | 11.5 |
| 28 | 273 | 11.5 |
| 65 | 157 | 9.7 |
| NA | 64 | 11.5 |
| 22 | 71 | 10.3 |
| 59 | 51 | 6.3 |
| 23 | 115 | 7.4 |
| 31 | 244 | 10.9 |
| 44 | 190 | 10.3 |
| 21 | 259 | 15.5 |
| 9 | 36 | 14.3 |
| NA | 255 | 12.6 |
| 45 | 212 | 9.7 |
| 168 | 238 | 3.4 |
| 73 | 215 | 8.0 |
| NA | 153 | 5.7 |
| 76 | 203 | 9.7 |
| 118 | 225 | 2.3 |
| 84 | 237 | 6.3 |
| 85 | 188 | 6.3 |
| 96 | 167 | 6.9 |
| 78 | 197 | 5.1 |
| 73 | 183 | 2.8 |
| 91 | 189 | 4.6 |
| 47 | 95 | 7.4 |
| 32 | 92 | 15.5 |
| 20 | 252 | 10.9 |
| 23 | 220 | 10.3 |
| 21 | 230 | 10.9 |
| 24 | 259 | 9.7 |
| 44 | 236 | 14.9 |
| 21 | 259 | 15.5 |
| 28 | 238 | 6.3 |
| 9 | 24 | 10.9 |
| 13 | 112 | 11.5 |
| 46 | 237 | 6.9 |
| 18 | 224 | 13.8 |
| 13 | 27 | 10.3 |
| 24 | 238 | 10.3 |
| 16 | 201 | 8.0 |
| 13 | 238 | 12.6 |
| 23 | 14 | 9.2 |
| 36 | 139 | 10.3 |
| 7 | 49 | 10.3 |
| 14 | 20 | 16.6 |
| 30 | 193 | 6.9 |
| NA | 145 | 13.2 |
| 14 | 191 | 14.3 |
| 18 | 131 | 8.0 |
| 20 | 223 | 11.5 |
In this case, subset(airquality, select = Ozone:Wind) extracts all the variables from Ozone through Wind (Ozone, Solar.R, and Wind). Because no condition is given, it keeps every row.
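Why does `Ozone:Wind` work as a column range? Inside `select`, `subset()` evaluates the expression in a list that maps each column name to its position. The sketch below roughly mirrors that step. The list `positions` is an illustrative name, not part of the original code.

```r
# Each column name stands for its position:
# Ozone = 1, Solar.R = 2, Wind = 3, Temp = 4, Month = 5, Day = 6
positions <- setNames(as.list(seq_along(airquality)), names(airquality))
eval(quote(Ozone:Wind), positions)   # 1 2 3

s <- subset(airquality, select = Ozone:Wind)
names(s)   # "Ozone" "Solar.R" "Wind"
nrow(s)    # 153: no condition was given, so every row is kept
```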
- With the mtcars data frame:
kable(names(mtcars))
| x |
|:---|
| mpg |
| cyl |
| disp |
| hp |
| drat |
| wt |
| qsec |
| vs |
| am |
| gear |
| carb |
kable(head(mtcars,3))
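| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |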
Extract a data frame that meets the following conditions:
- Select the variables cyl and hp for the cars with more than 20 mpg.
kable(subset(mtcars, mpg>20,select =c(cyl,hp)))
| | cyl | hp |
|:---|---:|---:|
| Mazda RX4 | 6 | 110 |
| Mazda RX4 Wag | 6 | 110 |
| Datsun 710 | 4 | 93 |
| Hornet 4 Drive | 6 | 110 |
| Merc 240D | 4 | 62 |
| Merc 230 | 4 | 95 |
| Fiat 128 | 4 | 66 |
| Honda Civic | 4 | 52 |
| Toyota Corolla | 4 | 65 |
| Toyota Corona | 4 | 97 |
| Fiat X1-9 | 4 | 66 |
| Porsche 914-2 | 4 | 91 |
| Lotus Europa | 4 | 113 |
| Volvo 142E | 4 | 109 |
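- For the cars with a V-shaped engine, select all variables except disp. Note that in mtcars the variable vs is coded 0 = V-shaped and 1 = straight. The condition vs == 1 used below therefore returns the straight-engine cars, and V-shaped engines require vs == 0.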
kable(subset(mtcars,vs==1,select = -disp))
| | mpg | cyl | hp | drat | wt | qsec | vs | am | gear | carb |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Datsun 710 | 22.8 | 4 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Valiant | 18.1 | 6 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |
| Merc 240D | 24.4 | 4 | 62 | 3.69 | 3.190 | 20.00 | 1 | 0 | 4 | 2 |
| Merc 230 | 22.8 | 4 | 95 | 3.92 | 3.150 | 22.90 | 1 | 0 | 4 | 2 |
| Merc 280 | 19.2 | 6 | 123 | 3.92 | 3.440 | 18.30 | 1 | 0 | 4 | 4 |
| Merc 280C | 17.8 | 6 | 123 | 3.92 | 3.440 | 18.90 | 1 | 0 | 4 | 4 |
| Fiat 128 | 32.4 | 4 | 66 | 4.08 | 2.200 | 19.47 | 1 | 1 | 4 | 1 |
| Honda Civic | 30.4 | 4 | 52 | 4.93 | 1.615 | 18.52 | 1 | 1 | 4 | 2 |
| Toyota Corolla | 33.9 | 4 | 65 | 4.22 | 1.835 | 19.90 | 1 | 1 | 4 | 1 |
| Toyota Corona | 21.5 | 4 | 97 | 3.70 | 2.465 | 20.01 | 1 | 0 | 3 | 1 |
| Fiat X1-9 | 27.3 | 4 | 66 | 4.08 | 1.935 | 18.90 | 1 | 1 | 4 | 1 |
| Lotus Europa | 30.4 | 4 | 113 | 3.77 | 1.513 | 16.90 | 1 | 1 | 5 | 2 |
| Volvo 142E | 21.4 | 4 | 109 | 4.11 | 2.780 | 18.60 | 1 | 1 | 4 | 2 |
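The sketch below selects the V-shaped engines, using the coding documented in R's help page for mtcars (`vs`: 0 = V-shaped, 1 = straight). The name `v_engine` is only illustrative.

```r
# In mtcars, vs codes the engine: 0 = V-shaped, 1 = straight
v_engine <- subset(mtcars, vs == 0, select = -disp)
nrow(v_engine)   # 18 cars with a V-shaped engine (the table above has the other 14)
head(v_engine)
```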
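- Select the last 4 variables.
kable(mtcars[, (ncol(mtcars) - 3):ncol(mtcars)])  # vs, am, gear, carb; tail(mtcars, 4) would return the last 4 rows instead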
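- With the mtcars dataset, obtain:
- the frequency table of the variable cyl.
table(mtcars$cyl)
##
##  4  6  8
## 11  7 14
The cyl column itself, one value per car, can be listed with:
kable(subset(mtcars, select = c(cyl)))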
| | cyl |
|:---|---:|
| Mazda RX4 | 6 |
| Mazda RX4 Wag | 6 |
| Datsun 710 | 4 |
| Hornet 4 Drive | 6 |
| Hornet Sportabout | 8 |
| Valiant | 6 |
| Duster 360 | 8 |
| Merc 240D | 4 |
| Merc 230 | 4 |
| Merc 280 | 6 |
| Merc 280C | 6 |
| Merc 450SE | 8 |
| Merc 450SL | 8 |
| Merc 450SLC | 8 |
| Cadillac Fleetwood | 8 |
| Lincoln Continental | 8 |
| Chrysler Imperial | 8 |
| Fiat 128 | 4 |
| Honda Civic | 4 |
| Toyota Corolla | 4 |
| Toyota Corona | 4 |
| Dodge Challenger | 8 |
| AMC Javelin | 8 |
| Camaro Z28 | 8 |
| Pontiac Firebird | 8 |
| Fiat X1-9 | 4 |
| Porsche 914-2 | 4 |
| Lotus Europa | 4 |
| Ford Pantera L | 8 |
| Ferrari Dino | 6 |
| Maserati Bora | 8 |
| Volvo 142E | 4 |
- What is the mode of the variable cyl?
ta1<- table(mtcars$cyl)
names(ta1)[which(ta1==max(ta1))]
## [1] "8"
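The two lines above can be wrapped into a reusable helper. `moda()` is a hypothetical name. It avoids `mode()` because base R already uses that name for an object's storage mode. In this sketch, every value tied for the highest frequency is returned.

```r
# Hypothetical helper: statistical mode(s) of a vector, returned as character
moda <- function(x) {
  freq <- table(x)                     # frequency of each distinct value
  names(freq)[freq == max(freq)]       # all values with the maximum count
}

moda(mtcars$cyl)          # "8": 14 cars have 8 cylinders
moda(mtcars$gear)         # "3"
moda(c(1, 1, 2, 2, 3))    # "1" "2": ties are all returned
```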
- The mean of the variable hp.
mean(mtcars$hp)
## [1] 146.6875
- The standard deviation of miles per gallon (mpg).
sd(mtcars$mpg)
## [1] 6.026948
- The minimum of all the variables.
kable(apply(mtcars, 2, min))
| | x |
|:---|---:|
| mpg | 10.400 |
| cyl | 4.000 |
| disp | 71.100 |
| hp | 52.000 |
| drat | 2.760 |
| wt | 1.513 |
| qsec | 14.500 |
| vs | 0.000 |
| am | 0.000 |
| gear | 3.000 |
| carb | 1.000 |
- What does the expression rownames(mtcars) do?
rownames(mtcars)
## [1] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"
## [4] "Hornet 4 Drive" "Hornet Sportabout" "Valiant"
## [7] "Duster 360" "Merc 240D" "Merc 230"
## [10] "Merc 280" "Merc 280C" "Merc 450SE"
## [13] "Merc 450SL" "Merc 450SLC" "Cadillac Fleetwood"
## [16] "Lincoln Continental" "Chrysler Imperial" "Fiat 128"
## [19] "Honda Civic" "Toyota Corolla" "Toyota Corona"
## [22] "Dodge Challenger" "AMC Javelin" "Camaro Z28"
## [25] "Pontiac Firebird" "Fiat X1-9" "Porsche 914-2"
## [28] "Lotus Europa" "Ford Pantera L" "Ferrari Dino"
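## [31] "Maserati Bora" "Volvo 142E"
rownames(mtcars) returns the row names of the data frame as a character vector. In mtcars these are the names of the 32 car models.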
- Extract the data for the Lotus Europa.
kable(mtcars["Lotus Europa", ])
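| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Lotus Europa | 30.4 | 4 | 95.1 | 113 | 3.77 | 1.513 | 16.9 | 1 | 1 | 5 | 2 |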
- With the babynames data frame from the babynames package, answer:
- What is the dimension of the data frame?
library(babynames)
## Warning: package 'babynames' was built under R version 3.5.3
dim(babynames)
## [1] 1924665 5
- How many variables does it have?
length(babynames)
## [1] 5
- What is the dimension of a data frame that contains only the females?
dim(subset(babynames, sex == "F"))
- How many boys were registered in the year 2000?
df1 <- subset(babynames, sex == "M")
df2 <- subset(df1, year == 2000)
sum(df2$n)
## [1] 1962969
- How many records have the name Valkyrie?
nrow(subset(babynames, name == "Valkyrie"))
- Obtain a random sample of 25 records.
babynames[sample(nrow(babynames),25),]
- Study the following code:
with(mtcars, mpg[cyl == 8 & disp > 350])
## [1] 18.7 14.3 10.4 10.4 14.7 19.2 15.8
mtcars$mpg[mtcars$cyl == 8 & mtcars$disp > 350]
## [1] 18.7 14.3 10.4 10.4 14.7 19.2 15.8
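What does the with function do?
with evaluates an R expression in an environment built from the data. The columns of mtcars can therefore be used by name (mpg, cyl, disp) without the mtcars$ prefix. It behaves like a temporary attach that applies only to that one expression and leaves the search path and the original data unchanged, so both lines above return the same values. Its companion within evaluates the expression in the same way but returns a modified copy of the data.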
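The sketch below checks that `with()` matches the explicit indexing above. It then uses `within()` to show that a new column ends up in a copy and not in mtcars itself. The column name `hp_per_wt` is only illustrative.

```r
# with(): evaluate an expression using the columns of mtcars as variables
a <- with(mtcars, mpg[cyl == 8 & disp > 350])
b <- mtcars$mpg[mtcars$cyl == 8 & mtcars$disp > 350]
identical(a, b)                   # TRUE

# within(): returns a modified copy; mtcars itself is not changed
m2 <- within(mtcars, hp_per_wt <- hp / wt)
"hp_per_wt" %in% names(m2)        # TRUE
"hp_per_wt" %in% names(mtcars)    # FALSE
```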
Aggregates.
With the aggregate function, the grouping variables passed to the by argument must be supplied as a list. When more than one variable is given, one group is formed for each observed combination of their values.
attach(mtcars)
## The following object is masked from package:ggplot2:
##
## mpg
ag <- aggregate(mtcars, by=list(cyl), FUN=mean, na.rm=TRUE)
print(ag)
## Group.1 mpg cyl disp hp drat wt qsec
## 1 4 26.66364 4 105.1364 82.63636 4.070909 2.285727 19.13727
## 2 6 19.74286 6 183.3143 122.28571 3.585714 3.117143 17.97714
## 3 8 15.10000 8 353.1000 209.21429 3.229286 3.999214 16.77214
## vs am gear carb
## 1 0.9090909 0.7272727 4.090909 1.545455
## 2 0.5714286 0.4285714 3.857143 3.428571
## 3 0.0000000 0.1428571 3.285714 3.500000
ag1 <- aggregate(mtcars, by=list(cyl,vs), FUN=mean, na.rm=TRUE)
print(ag1)
## Group.1 Group.2 mpg cyl disp hp drat wt qsec
## 1 4 0 26.00000 4 120.30 91.0000 4.430000 2.140000 16.70000
## 2 6 0 20.56667 6 155.00 131.6667 3.806667 2.755000 16.32667
## 3 8 0 15.10000 8 353.10 209.2143 3.229286 3.999214 16.77214
## 4 4 1 26.73000 4 103.62 81.8000 4.035000 2.300300 19.38100
## 5 6 1 19.12500 6 204.55 115.2500 3.420000 3.388750 19.21500
## vs am gear carb
## 1 0 1.0000000 5.000000 2.000000
## 2 0 1.0000000 4.333333 4.666667
## 3 0 0.1428571 3.285714 3.500000
## 4 1 0.7000000 4.000000 1.500000
## 5 1 0.0000000 3.500000 2.500000
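`aggregate()` also has a formula interface. It avoids `attach()` and keeps the grouping columns named. In the sketch below, `.` stands for every column not used for grouping. The number of rows matches the two outputs above.

```r
# Formula interface: "." means all remaining columns, grouped by cyl
ag <- aggregate(. ~ cyl, data = mtcars, FUN = mean)
ag[, c("cyl", "mpg", "hp")]

# Two grouping variables: one row per observed combination of cyl and vs
ag1 <- aggregate(. ~ cyl + vs, data = mtcars, FUN = mean)
nrow(ag1)   # 5: no car in mtcars has 8 cylinders and vs = 1
```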
- The datasets state.region and state.x77 contain:
- the region to which each US state belongs,
- information for each state. Combine the two datasets into a data frame and compute the mean of every variable aggregated by region.
cbind(state.region,state.x77)
## state.region Population Income Illiteracy Life Exp Murder
## Alabama 2 3615 3624 2.1 69.05 15.1
## Alaska 4 365 6315 1.5 69.31 11.3
## Arizona 4 2212 4530 1.8 70.55 7.8
## Arkansas 2 2110 3378 1.9 70.66 10.1
## California 4 21198 5114 1.1 71.71 10.3
## Colorado 4 2541 4884 0.7 72.06 6.8
## Connecticut 1 3100 5348 1.1 72.48 3.1
## Delaware 2 579 4809 0.9 70.06 6.2
## Florida 2 8277 4815 1.3 70.66 10.7
## Georgia 2 4931 4091 2.0 68.54 13.9
## Hawaii 4 868 4963 1.9 73.60 6.2
## Idaho 4 813 4119 0.6 71.87 5.3
## Illinois 3 11197 5107 0.9 70.14 10.3
## Indiana 3 5313 4458 0.7 70.88 7.1
## Iowa 3 2861 4628 0.5 72.56 2.3
## Kansas 3 2280 4669 0.6 72.58 4.5
## Kentucky 2 3387 3712 1.6 70.10 10.6
## Louisiana 2 3806 3545 2.8 68.76 13.2
## Maine 1 1058 3694 0.7 70.39 2.7
## Maryland 2 4122 5299 0.9 70.22 8.5
## Massachusetts 1 5814 4755 1.1 71.83 3.3
## Michigan 3 9111 4751 0.9 70.63 11.1
## Minnesota 3 3921 4675 0.6 72.96 2.3
## Mississippi 2 2341 3098 2.4 68.09 12.5
## Missouri 3 4767 4254 0.8 70.69 9.3
## Montana 4 746 4347 0.6 70.56 5.0
## Nebraska 3 1544 4508 0.6 72.60 2.9
## Nevada 4 590 5149 0.5 69.03 11.5
## New Hampshire 1 812 4281 0.7 71.23 3.3
## New Jersey 1 7333 5237 1.1 70.93 5.2
## New Mexico 4 1144 3601 2.2 70.32 9.7
## New York 1 18076 4903 1.4 70.55 10.9
## North Carolina 2 5441 3875 1.8 69.21 11.1
## North Dakota 3 637 5087 0.8 72.78 1.4
## Ohio 3 10735 4561 0.8 70.82 7.4
## Oklahoma 2 2715 3983 1.1 71.42 6.4
## Oregon 4 2284 4660 0.6 72.13 4.2
## Pennsylvania 1 11860 4449 1.0 70.43 6.1
## Rhode Island 1 931 4558 1.3 71.90 2.4
## South Carolina 2 2816 3635 2.3 67.96 11.6
## South Dakota 3 681 4167 0.5 72.08 1.7
## Tennessee 2 4173 3821 1.7 70.11 11.0
## Texas 2 12237 4188 2.2 70.90 12.2
## Utah 4 1203 4022 0.6 72.90 4.5
## Vermont 1 472 3907 0.6 71.64 5.5
## Virginia 2 4981 4701 1.4 70.08 9.5
## Washington 4 3559 4864 0.6 71.72 4.3
## West Virginia 2 1799 3617 1.4 69.48 6.7
## Wisconsin 3 4589 4468 0.7 72.48 3.0
## Wyoming 4 376 4566 0.6 70.29 6.9
## HS Grad Frost Area
## Alabama 41.3 20 50708
## Alaska 66.7 152 566432
## Arizona 58.1 15 113417
## Arkansas 39.9 65 51945
## California 62.6 20 156361
## Colorado 63.9 166 103766
## Connecticut 56.0 139 4862
## Delaware 54.6 103 1982
## Florida 52.6 11 54090
## Georgia 40.6 60 58073
## Hawaii 61.9 0 6425
## Idaho 59.5 126 82677
## Illinois 52.6 127 55748
## Indiana 52.9 122 36097
## Iowa 59.0 140 55941
## Kansas 59.9 114 81787
## Kentucky 38.5 95 39650
## Louisiana 42.2 12 44930
## Maine 54.7 161 30920
## Maryland 52.3 101 9891
## Massachusetts 58.5 103 7826
## Michigan 52.8 125 56817
## Minnesota 57.6 160 79289
## Mississippi 41.0 50 47296
## Missouri 48.8 108 68995
## Montana 59.2 155 145587
## Nebraska 59.3 139 76483
## Nevada 65.2 188 109889
## New Hampshire 57.6 174 9027
## New Jersey 52.5 115 7521
## New Mexico 55.2 120 121412
## New York 52.7 82 47831
## North Carolina 38.5 80 48798
## North Dakota 50.3 186 69273
## Ohio 53.2 124 40975
## Oklahoma 51.6 82 68782
## Oregon 60.0 44 96184
## Pennsylvania 50.2 126 44966
## Rhode Island 46.4 127 1049
## South Carolina 37.8 65 30225
## South Dakota 53.3 172 75955
## Tennessee 41.8 70 41328
## Texas 47.4 35 262134
## Utah 67.3 137 82096
## Vermont 57.1 168 9267
## Virginia 47.8 85 39780
## Washington 63.5 32 66570
## West Virginia 41.6 100 24070
## Wisconsin 54.5 149 54464
## Wyoming 62.9 173 97203
aggregate(state.x77, list(region = state.region), mean)
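Because `cbind()` above combines the factor with a numeric matrix, `state.region` is turned into its integer codes (1 = Northeast, 2 = South, 3 = North Central, 4 = West), and the result is a matrix, not a data frame. The sketch below builds an actual data frame instead and then aggregates it. The names `states` and `by_region` are illustrative. Note that `data.frame()` renames "Life Exp" and "HS Grad" to `Life.Exp` and `HS.Grad`.

```r
# data.frame() keeps state.region as a factor instead of integer codes
states <- data.frame(region = state.region, state.x77)
str(states$region)

# Mean of every variable by region ("." = all remaining columns)
by_region <- aggregate(. ~ region, data = states, FUN = mean)
by_region[, c("region", "Population", "Income", "Murder")]
```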
What does the expand.grid function do in the following code?
expand.grid(letters[1:2], 1:3, c("+", "-"))
expand.grid creates a data frame from all combinations of the supplied vectors or factors, with one row for each combination. The first factors vary fastest. The columns are labeled by the factors if these are supplied as named arguments or as named components of a list; otherwise they are named Var1, Var2, and so on. The row names are "automatic". Character vectors are converted to factors whose levels follow the order in which the values appear in the vector, not alphabetical order as is more common when converting to factors. In the code above, 2 letters × 3 numbers × 2 signs give 12 rows.
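The sketch below checks the properties just described on the same call: the row count, the default column names, which column varies fastest, and the level order. The second call, with named arguments and `stringsAsFactors = FALSE`, is an added illustration.

```r
g <- expand.grid(letters[1:2], 1:3, c("+", "-"))
nrow(g)          # 12 = 2 * 3 * 2 combinations
names(g)         # "Var1" "Var2" "Var3": default names for unnamed arguments
head(g, 4)       # Var1 varies fastest: a, b, a, b while Var2 is 1, 1, 2, 2
levels(g$Var3)   # "+" "-": order of appearance

# Named arguments become column names; stringsAsFactors = FALSE keeps characters
g2 <- expand.grid(letter = letters[1:2], n = 1:3, sign = c("+", "-"),
                  stringsAsFactors = FALSE)
names(g2)        # "letter" "n" "sign"
```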