Names:
df1 <- data.frame(miedad=c(23,22,18,20,25),
peso=c(45,56,47,67,80),
estatura=c(150,161,155,172,182))
df1
## miedad peso estatura
## 1 23 45 150
## 2 22 56 161
## 3 18 47 155
## 4 20 67 172
## 5 25 80 182
A data frame is a table-like structure: each column holds the values of one variable, and each row holds one observation, with one value per column.
df1$miedad
## [1] 23 22 18 20 25
df1[,1]
## [1] 23 22 18 20 25
df1["miedad"]
## miedad
## 1 23
## 2 22
## 3 18
## 4 20
## 5 25
str(df1)
## 'data.frame': 5 obs. of 3 variables:
## $ miedad : num 23 22 18 20 25
## $ peso : num 45 56 47 67 80
## $ estatura: num 150 161 155 172 182
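The first three commands extract the data we need from the data frame, in this case the age. df1$miedad and df1[,1] return a numeric vector, while df1["miedad"] returns a one-column data frame. str summarizes the structure of the whole data frame.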
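The difference between these accessors is easy to miss, so here is a minimal sketch that rebuilds `df1` and checks what each one returns. The `[[` form does not appear in the original and is included only for comparison.

```r
# Rebuild the example data frame used above
df1 <- data.frame(miedad   = c(23, 22, 18, 20, 25),
                  peso     = c(45, 56, 47, 67, 80),
                  estatura = c(150, 161, 155, 172, 182))

v1 <- df1$miedad        # numeric vector
v2 <- df1[, 1]          # numeric vector (a single column is dropped to a vector)
v3 <- df1[["miedad"]]   # numeric vector, selected by name
d1 <- df1["miedad"]     # one-column data frame

is.numeric(v1)          # TRUE
is.data.frame(d1)       # TRUE
identical(v1, v2)       # TRUE
```

Use the vector forms when the values feed a function such as `mean`, and the data frame form when the result must keep its column name and tabular shape.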
df1[3,2]
## [1] 47
df1[nrow(df1),]
## miedad peso estatura
## 5 25 80 182
df1[,c(1,2)]
## miedad peso
## 1 23 45
## 2 22 56
## 3 18 47
## 4 20 67
## 5 25 80
df2 <- df1[,c(1,2)]
df2
## miedad peso
## 1 23 45
## 2 22 56
## 3 18 47
## 4 20 67
## 5 25 80
df3 <- df1[(nrow(df1)-2):nrow(df1),]
df3
## miedad peso estatura
## 3 18 47 155
## 4 20 67 172
## 5 25 80 182
apply(df1,2,mean)
## miedad peso estatura
## 21.6 59.0 164.0
apply(df1,2,sd)
## miedad peso estatura
## 2.701851 14.611639 12.980755
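`apply` converts the data frame to a matrix before working on it, which is fine here because every column is numeric. As a hedged alternative that is not part of the original exercise, `colMeans` and `sapply` work on the columns directly:

```r
# Rebuild the example data frame used above
df1 <- data.frame(miedad   = c(23, 22, 18, 20, 25),
                  peso     = c(45, 56, 47, 67, 80),
                  estatura = c(150, 161, 155, 172, 182))

col_means <- colMeans(df1)     # per-column means as a named vector
col_sds   <- sapply(df1, sd)   # per-column standard deviations

col_means
col_sds
```

Both calls return named numeric vectors with the same values as the `apply` calls above.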
library(UsingR)
## Loading required package: MASS
## Loading required package: HistData
## Loading required package: Hmisc
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
##
## format.pval, units
##
## Attaching package: 'UsingR'
## The following object is masked from 'package:survival':
##
## cancer
length(primes)
## [1] 304
length(primes[primes<100])
## [1] 25
sum(diff(primes)==2) # number of consecutive primes that differ by 2
## [1] 61
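The last count is the number of pairs of consecutive primes that differ by 2. A small sketch of the same idea follows. It uses a hand-typed vector of the primes below 30, an assumption made so the example runs without the UsingR package:

```r
# Primes below 30, typed by hand (stand-in for UsingR's primes vector)
small_primes <- c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)

gaps <- diff(small_primes)     # differences between consecutive primes
n_twin_pairs <- sum(gaps == 2) # count the gaps equal to 2
n_twin_pairs                   # 4: (3,5), (5,7), (11,13), (17,19)

# First member of each twin pair
small_primes[which(gaps == 2)] # 3 5 11 17
```

Note that `diff` returns one element fewer than its input, so counting with `sum` on the gaps avoids indexing the longer vector with a shorter logical one.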
head(airquality)
## Ozone Solar.R Wind Temp Month Day
## 1 41 190 7.4 67 5 1
## 2 36 118 8.0 72 5 2
## 3 12 149 12.6 74 5 3
## 4 18 313 11.5 62 5 4
## 5 NA NA 14.3 56 5 5
## 6 28 NA 14.9 66 5 6
str(airquality)
## 'data.frame': 153 obs. of 6 variables:
## $ Ozone : int 41 36 12 18 NA 28 23 19 8 NA ...
## $ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
## $ Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
## $ Temp : int 67 72 74 62 56 66 65 59 61 69 ...
## $ Month : int 5 5 5 5 5 5 5 5 5 5 ...
## $ Day : int 1 2 3 4 5 6 7 8 9 10 ...
Study the output of the following code and explain the use of the subset function:
subset(airquality, Temp > 80, select = c(Ozone, Temp))
## Ozone Temp
## 29 45 81
## 35 NA 84
## 36 NA 85
## 38 29 82
## 39 NA 87
## 40 71 90
## 41 39 87
## 42 NA 93
## 43 NA 92
## 44 23 82
## 61 NA 83
## 62 135 84
## 63 49 85
## 64 32 81
## 65 NA 84
## 66 64 83
## 67 40 83
## 68 77 88
## 69 97 92
## 70 97 92
## 71 85 89
## 72 NA 82
## 74 27 81
## 75 NA 91
## 77 48 81
## 78 35 82
## 79 61 84
## 80 79 87
## 81 63 85
## 83 NA 81
## 84 NA 82
## 85 80 86
## 86 108 85
## 87 20 82
## 88 52 86
## 89 82 88
## 90 50 86
## 91 64 83
## 92 59 81
## 93 39 81
## 94 9 81
## 95 16 82
## 96 78 86
## 97 35 85
## 98 66 87
## 99 122 89
## 100 89 90
## 101 110 90
## 102 NA 92
## 103 NA 86
## 104 44 86
## 105 28 82
## 117 168 81
## 118 73 86
## 119 NA 88
## 120 76 97
## 121 118 94
## 122 84 96
## 123 85 94
## 124 96 91
## 125 78 92
## 126 73 93
## 127 91 93
## 128 47 87
## 129 32 84
## 134 44 81
## 143 16 82
## 146 36 81
subset(airquality, Day == 1, select = -Temp)
## Ozone Solar.R Wind Month Day
## 1 41 190 7.4 5 1
## 32 NA 286 8.6 6 1
## 62 135 269 4.1 7 1
## 93 39 83 6.9 8 1
## 124 96 167 6.9 9 1
z <- subset(airquality, select = Ozone:Wind)
The subset function extracts part of a data frame: its second argument is a logical condition that selects rows, and select chooses the columns to keep (or, with a minus sign, to drop).
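To make the behavior of `subset` concrete, the sketch below compares it with plain bracket indexing on `airquality`. The comparison with brackets and the `Ozone > 100` condition are illustrative additions, not part of the original exercise.

```r
# airquality ships with base R (package datasets)

# Same rows and columns, written two ways
hot_subset  <- subset(airquality, Temp > 80, select = c(Ozone, Temp))
hot_bracket <- airquality[airquality$Temp > 80, c("Ozone", "Temp")]
isTRUE(all.equal(hot_subset, hot_bracket))

# The two differ when the condition evaluates to NA:
# subset() drops those rows, bracket indexing returns a row of NAs for each
oz_subset  <- subset(airquality, Ozone > 100)
oz_bracket <- airquality[airquality$Ozone > 100, ]
nrow(oz_bracket) - nrow(oz_subset)   # equals the number of missing Ozone values
```

This NA handling is one practical reason to prefer `subset` for interactive filtering of columns that contain missing values.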
names(mtcars)
## [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
## [11] "carb"
head(mtcars, 3)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
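u <- subset(mtcars, mpg > 20, select = c(cyl, hp)) # cars with mpg above 20; keep cyl and hp
uv <- subset(mtcars, vs == 0, select = -disp) # cars with vs equal to 0; drop disp
ux <- mtcars[, (ncol(mtcars) - 3):ncol(mtcars)] # last four columns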
table(mtcars$cyl)
##
## 4 6 8
## 11 7 14
What is the mode of the variable cyl? According to the table above, the mode is 8, which occurs 14 times.
The mean of the variable hp:
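Base R has no function that returns the statistical mode directly, so it was read off the table. A minimal sketch of computing it programmatically, with variable names chosen here for illustration:

```r
# mtcars ships with base R (package datasets)
cyl_counts <- table(mtcars$cyl)

# which.max() gives the position of the largest count; the table names hold the values
mode_cyl <- names(cyl_counts)[which.max(cyl_counts)]
mode_cyl               # "8" (a character string)
as.numeric(mode_cyl)   # 8
```

If two values were tied for the highest count, `which.max` would return only the first of them.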
mean(mtcars$hp)
## [1] 146.6875
sd(mtcars$mpg)
## [1] 6.026948
min(mtcars)
## [1] 0
rownames(mtcars)
## [1] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"
## [4] "Hornet 4 Drive" "Hornet Sportabout" "Valiant"
## [7] "Duster 360" "Merc 240D" "Merc 230"
## [10] "Merc 280" "Merc 280C" "Merc 450SE"
## [13] "Merc 450SL" "Merc 450SLC" "Cadillac Fleetwood"
## [16] "Lincoln Continental" "Chrysler Imperial" "Fiat 128"
## [19] "Honda Civic" "Toyota Corolla" "Toyota Corona"
## [22] "Dodge Challenger" "AMC Javelin" "Camaro Z28"
## [25] "Pontiac Firebird" "Fiat X1-9" "Porsche 914-2"
## [28] "Lotus Europa" "Ford Pantera L" "Ferrari Dino"
## [31] "Maserati Bora" "Volvo 142E"
Returns the row names of the data frame.
mtcars["Lotus Europa",]
## mpg cyl disp hp drat wt qsec vs am gear carb
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.9 1 1 5 2
library(babynames)
dim(babynames)
## [1] 1924665 5
The data frame has 1,924,665 rows and 5 columns.
str(babynames)
## Classes 'tbl_df', 'tbl' and 'data.frame': 1924665 obs. of 5 variables:
## $ year: num 1880 1880 1880 1880 1880 1880 1880 1880 1880 1880 ...
## $ sex : chr "F" "F" "F" "F" ...
## $ name: chr "Mary" "Anna" "Emma" "Elizabeth" ...
## $ n : int 7065 2604 2003 1939 1746 1578 1472 1414 1320 1288 ...
## $ prop: num 0.0724 0.0267 0.0205 0.0199 0.0179 ...
It has 5 variables.
Women <- subset(babynames, sex == "F", select = year:prop)
str(Women)
## Classes 'tbl_df', 'tbl' and 'data.frame': 1138293 obs. of 5 variables:
## $ year: num 1880 1880 1880 1880 1880 1880 1880 1880 1880 1880 ...
## $ sex : chr "F" "F" "F" "F" ...
## $ name: chr "Mary" "Anna" "Emma" "Elizabeth" ...
## $ n : int 7065 2604 2003 1939 1746 1578 1472 1414 1320 1288 ...
## $ prop: num 0.0724 0.0267 0.0205 0.0199 0.0179 ...
The subset has 1,138,293 rows and 5 columns.
d) How many children were registered in 2000?
n2000 <- subset(babynames, year == 2000, select = n)
sum(n2000)
## [1] 3778079
3,778,079 births were registered in 2000.
nameV <- subset(babynames, name == "Valkyrie", select = n)
sum(nameV)
## [1] 382
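The name Valkyrie accounts for 382 registrations in total, summed over all years.
# Note: sample() on a data frame draws columns, not rows, hence the 25 repeated columns below
sample(babynames,25,replace=TRUE,prob = NULL)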
## # A tibble: 1,924,665 x 25
## name year sex sex prop year n n name prop sex
## <chr> <dbl> <chr> <chr> <dbl> <dbl> <int> <int> <chr> <dbl> <chr>
## 1 Mary 1880 F F 0.0724 1880 7065 7065 Mary 0.0724 F
## 2 Anna 1880 F F 0.0267 1880 2604 2604 Anna 0.0267 F
## 3 Emma 1880 F F 0.0205 1880 2003 2003 Emma 0.0205 F
## 4 Eliz~ 1880 F F 0.0199 1880 1939 1939 Eliz~ 0.0199 F
## 5 Minn~ 1880 F F 0.0179 1880 1746 1746 Minn~ 0.0179 F
## 6 Marg~ 1880 F F 0.0162 1880 1578 1578 Marg~ 0.0162 F
## 7 Ida 1880 F F 0.0151 1880 1472 1472 Ida 0.0151 F
## 8 Alice 1880 F F 0.0145 1880 1414 1414 Alice 0.0145 F
## 9 Bert~ 1880 F F 0.0135 1880 1320 1320 Bert~ 0.0135 F
## 10 Sarah 1880 F F 0.0132 1880 1288 1288 Sarah 0.0132 F
## # ... with 1,924,655 more rows, and 14 more variables: n <int>,
## # name <chr>, prop <dbl>, prop <dbl>, sex <chr>, sex <chr>, name <chr>,
## # n <int>, year <dbl>, year <dbl>, year <dbl>, name <chr>, sex <chr>,
## # year <dbl>
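If the goal was to draw random rows rather than columns, one possible approach is to sample row indices and use them to index the data frame. In the sketch below, `mtcars` stands in for `babynames` so that it runs with base R only; the sample size of 5 and the seed are arbitrary choices.

```r
# mtcars ships with base R (package datasets)
set.seed(1)                            # arbitrary seed, only for reproducibility

row_ids <- sample(nrow(mtcars), 5)     # 5 distinct row numbers, without replacement
mtcars_sample <- mtcars[row_ids, ]     # keep those rows and all columns

dim(mtcars_sample)                     # 5 rows, 11 columns
```

The same pattern applied to the original data would be `babynames[sample(nrow(babynames), 25), ]`.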
with(mtcars, mpg[cyl==8 & disp> 350])
## [1] 18.7 14.3 10.4 10.4 14.7 19.2 15.8
mtcars$mpg[mtcars$cyl==8 & mtcars$disp>350]
## [1] 18.7 14.3 10.4 10.4 14.7 19.2 15.8
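Both expressions extract the values of the specified variable (mpg) for the rows that meet the stated conditions (cyl equal to 8 and disp greater than 350). with evaluates the expression inside the data frame, so the column names can be written without the mtcars$ prefix.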
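As a quick check that these forms are interchangeable, the sketch below compares them and adds a third variant based on `subset`, which is an addition for comparison and not part of the original exercise.

```r
# mtcars ships with base R (package datasets)
mpg_with    <- with(mtcars, mpg[cyl == 8 & disp > 350])
mpg_dollar  <- mtcars$mpg[mtcars$cyl == 8 & mtcars$disp > 350]
mpg_subset  <- subset(mtcars, cyl == 8 & disp > 350)$mpg

identical(mpg_with, mpg_dollar)   # TRUE
identical(mpg_with, mpg_subset)   # TRUE
```

`with` tends to be the most readable choice when the condition mentions several columns of the same data frame.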
State <- data.frame(state.region, state.x77)
# state.region is a factor, so mean() returns NA for that column and warns once per region
aggregate(State, by=list(state.region), FUN=mean)
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Group.1 state.region Population Income Illiteracy Life.Exp
## 1 Northeast NA 5495.111 4570.222 1.000000 71.26444
## 2 South NA 4208.125 4011.938 1.737500 69.70625
## 3 North Central NA 4803.000 4611.083 0.700000 71.76667
## 4 West NA 2915.308 4702.615 1.023077 71.23462
## Murder HS.Grad Frost Area
## 1 4.722222 53.96667 132.7778 18141.00
## 2 10.581250 44.34375 64.6250 54605.12
## 3 5.275000 54.51667 138.8333 62652.00
## 4 7.215385 62.00000 102.1538 134463.00
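The warnings and the `NA` column appear because the grouping factor `state.region` is itself passed to `mean`. A hedged sketch of one way to avoid this uses the formula interface of `aggregate`, which keeps the grouping variable out of the averaged columns; the name `region_means` is chosen here for illustration.

```r
# state.region and state.x77 ship with base R (package datasets)
State <- data.frame(state.region, state.x77)

# Average every other column within each region
region_means <- aggregate(. ~ state.region, data = State, FUN = mean)

# Show a few of the resulting columns
region_means[, c("state.region", "Population", "Income")]
```

The result has one row per region and no warnings, with the same group means as in the output above.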
expand.grid(letters[1:2], 1:3, c("+", "-"))
## Var1 Var2 Var3
## 1 a 1 +
## 2 b 1 +
## 3 a 2 +
## 4 b 2 +
## 5 a 3 +
## 6 b 3 +
## 7 a 1 -
## 8 b 1 -
## 9 a 2 -
## 10 b 2 -
## 11 a 3 -
## 12 b 3 -
Creates a data frame with every combination of the supplied vectors or factors, one combination per row.
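The number of rows is the product of the lengths of the inputs, here 2 × 3 × 2 = 12. A short sketch that checks this follows; the column names `letter`, `number`, and `sign` are added for readability and are not in the original call.

```r
grid <- expand.grid(letter = letters[1:2],
                    number = 1:3,
                    sign   = c("+", "-"))

nrow(grid) == 2 * 3 * 2   # TRUE: one row per combination
head(grid, 3)             # the first argument varies fastest
```

A grid like this is a common way to enumerate the settings of an experiment or a parameter search.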