Software for Data Analysis: LAB1

Exercises and case studies with R

Exercise 1:

Using the functions covered in this lab, check which packages are installed in your version of RStudio, install the MASS and survival packages, and inspect the information they contain.

Look up information about the Rcmdr (R Commander) package from the console.

Answer:

To check which packages are installed:

library()
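library() shows its listing in a separate viewer. As a complementary sketch, installed.packages() returns the same information as a matrix that can be queried from code. The two package names checked here are only illustrative.

```r
# installed.packages() returns a character matrix with one row per installed package
pkgs <- installed.packages()

# Names of the first few installed packages
head(rownames(pkgs))

# Check whether particular packages are installed (one TRUE/FALSE per name)
c("MASS", "survival") %in% rownames(pkgs)
```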

To install the MASS and survival packages:

install.packages(c("MASS", "survival"), repos = "https://cloud.r-project.org/")

## package 'MASS' successfully unpacked and MD5 sums checked

## package 'survival' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\marma\AppData\Local\Temp\RtmpO2GfCr\downloaded_packages
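Reinstalling a package on every run is usually unnecessary. The following is a minimal sketch of an "install only if missing" check; missing_pkgs() is a hypothetical helper written for this example, not a function from any package, and the install call is left commented out so that the snippet has no side effects.

```r
# Hypothetical helper: returns the packages from `pkgs` that are not installed
missing_pkgs <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!is_installed])
}

to_install <- missing_pkgs(c("MASS", "survival"))

# Report what is missing; uncomment the install line to actually install it
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install, repos = "https://cloud.r-project.org/")
}
```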

This is the information contained in the MASS package:

packageDescription("MASS")

## Package: MASS
## Priority: recommended
## Version: 7.3-65
## Date: 2025-02-19
## Revision: $Rev: 3681 $
## Depends: R (>= 4.4.0), grDevices, graphics, stats, utils
## Imports: methods
## Suggests: lattice, nlme, nnet, survival
## Authors@R: c(person("Brian", "Ripley", role = c("aut", "cre", "cph"),
##         email = "Brian.Ripley@R-project.org"), person("Bill",
##         "Venables", role = c("aut", "cph")), person(c("Douglas", "M."),
##         "Bates", role = "ctb"), person("Kurt", "Hornik", role = "trl",
##         comment = "partial port ca 1998"), person("Albrecht",
##         "Gebhardt", role = "trl", comment = "partial port ca 1998"),
##         person("David", "Firth", role = "ctb", comment = "support
##         functions for polr"))
## Description: Functions and datasets to support Venables and Ripley,
##         "Modern Applied Statistics with S" (4th edition, 2002).
## Title: Support Functions and Datasets for Venables and Ripley's MASS
## LazyData: yes
## ByteCompile: yes
## License: GPL-2 | GPL-3
## URL: http://www.stats.ox.ac.uk/pub/MASS4/
## Contact: <MASS@stats.ox.ac.uk>
## NeedsCompilation: yes
## Packaged: 2025-02-19 08:49:43 UTC; ripley
## Author: Brian Ripley [aut, cre, cph], Bill Venables [aut, cph], Douglas
##         M. Bates [ctb], Kurt Hornik [trl] (partial port ca 1998),
##         Albrecht Gebhardt [trl] (partial port ca 1998), David Firth
##         [ctb] (support functions for polr)
## Maintainer: Brian Ripley <Brian.Ripley@R-project.org>
## Repository: CRAN
## Date/Publication: 2025-02-28 17:44:52 UTC
## Built: R 4.5.1; x86_64-w64-mingw32; 2025-10-06 00:50:00 UTC; windows
## Archs: x64
## RemoteType: standard
## RemotePkgRef: MASS
## RemoteRef: MASS
## RemoteRepos: https://cran.rstudio.com
## RemoteSha: 7.3-65
## 
## -- File: D:/R-4.5.1/library/MASS/Meta/package.rds

And this is the information for the survival package:

packageDescription("survival")

## Title: Survival Analysis
## Priority: recommended
## Package: survival
## Version: 3.8-3
## Date: 2024-12-17
## Depends: R (>= 3.5.0)
## Imports: graphics, Matrix, methods, splines, stats, utils
## LazyData: Yes
## LazyDataCompression: xz
## ByteCompile: Yes
## Authors@R: c(person(c("Terry", "M"), "Therneau",
##         email="therneau.terry@mayo.edu", role=c("aut", "cre")),
##         person("Thomas", "Lumley", role=c("ctb", "trl"),
##         comment="original S->R port and R maintainer until 2009"),
##         person("Atkinson", "Elizabeth", role="ctb"), person("Crowson",
##         "Cynthia", role="ctb"))
## Description: Contains the core survival analysis routines, including
##         definition of Surv objects, Kaplan-Meier and Aalen-Johansen
##         (multi-state) curves, Cox models, and parametric accelerated
##         failure time models.
## License: LGPL (>= 2)
## URL: https://github.com/therneau/survival
## NeedsCompilation: yes
## Packaged: 2024-12-17 16:37:18 UTC; therneau
## Author: Terry M Therneau [aut, cre], Thomas Lumley [ctb, trl] (original
##         S->R port and R maintainer until 2009), Atkinson Elizabeth
##         [ctb], Crowson Cynthia [ctb]
## Maintainer: Terry M Therneau <therneau.terry@mayo.edu>
## Repository: CRAN
## Date/Publication: 2024-12-17 20:20:02 UTC
## Built: R 4.5.1; x86_64-w64-mingw32; 2025-10-06 01:53:08 UTC; windows
## Archs: x64
## 
## -- File: D:/R-4.5.1/library/survival/Meta/package.rds
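packageDescription() returns a list-like object, so individual fields can be extracted instead of printing the whole description. The sketch below uses the stats package as a stand-in, on the assumption that it is present in every R installation; the same calls should work for MASS or survival once they are installed.

```r
# "stats" stands in for MASS or survival in this example
desc <- packageDescription("stats")

# Individual fields are accessed like list elements
desc$Package
desc$Version

# packageVersion() returns a version object that can be compared directly
packageVersion("stats") >= "3.5.0"
```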

To look up information about the Rcmdr package from the console:

??Rcmdr

## starting httpd help server ... done

Exercise 2:

Import a text file and run summary() on three variables of your choice.

I import the file "survey.txt", which uses the tab character as its separator, and set stringsAsFactors = FALSE so that text columns are kept as character strings rather than converted to factors. Numeric columns are detected automatically.

library(here) # provides here() for building project-relative paths
survey <- read.csv(here("datasets", "survey.txt"), sep = "\t", stringsAsFactors = FALSE)
head(survey)

##      Sex Wr.Hnd NW.Hnd W.Hnd    Fold Pulse    Clap Exer Smoke Height      M.I
## 1 Female   18.5   18.0 Right  R on L    92    Left Some Never 173.00   Metric
## 2   Male   19.5   20.5  Left  R on L   104    Left None Regul 177.80 Imperial
## 3   Male   18.0   13.3 Right  L on R    87 Neither None Occas     NA     <NA>
## 4   Male   18.8   18.9 Right  R on L    NA Neither None Never 160.00   Metric
## 5   Male   20.0   20.0 Right Neither    35   Right Some Never 165.00   Metric
## 6 Female   18.0   17.7 Right  L on R    64   Right Some Never 172.72 Imperial
##      Age
## 1 18.250
## 2 17.583
## 3 16.917
## 4 20.333
## 5 23.667
## 6 21.000

Of all the variables, I choose "Sex", "Pulse" and "Exer" and summarise them. For character columns such as Sex and Exer, summary() reports only the length, class and mode:

summary(survey$Sex)

##    Length     Class      Mode 
##       237 character character

summary(survey$Pulse)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   35.00   66.00   72.50   74.15   80.00  104.00      45

summary(survey$Exer)

##    Length     Class      Mode 
##       237 character character
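A character column becomes more informative once it is treated as categorical. The sketch below uses a small made-up data frame (toy is hypothetical, not the survey file) to show two common options: converting to a factor before calling summary(), or tabulating with table().

```r
# Made-up data that mimics two text columns of the survey file
toy <- data.frame(
  Sex  = c("Female", "Male", "Male", NA),
  Exer = c("Some", "None", "Some", "Freq"),
  stringsAsFactors = FALSE
)

# summary() of a factor counts each level and reports missing values
summary(factor(toy$Sex))

# table() gives the same kind of count; useNA = "ifany" keeps missing values visible
table(toy$Exer, useNA = "ifany")
```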

Import a ".csv" file and run fivenum() on two variables that seem relevant to the study.

leuk <- read.csv(here("datasets", "leuk.csv"))
head(leuk)

##     wbc      ag time
## 1  2300 present   65
## 2   750 present  156
## 3  4300 present  100
## 4  2600 present  134
## 5  6000 present   16
## 6 10500 present  108

I choose the variables wbc and time and compute their five-number summaries:

fivenum(leuk$wbc)

## [1]    750   5300  10500  32000 100000

fivenum(leuk$time)

## [1]   1   4  22  65 156
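fivenum() returns the minimum, lower hinge, median, upper hinge and maximum. The hinges are not always equal to the quartiles reported by quantile() or summary(), which interpolate differently by default. A small illustration on the integers 1 to 8:

```r
x <- 1:8

# Tukey's five numbers: min, lower hinge, median, upper hinge, max
fivenum(x)   # 1.0 2.5 4.5 6.5 8.0

# Default quantiles (type 7) interpolate differently, so the quartiles differ
quantile(x)  # 1.00 2.75 4.50 6.25 8.00
```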

Exercise 3:

Using the anorexia dataset from the MASS package, which contains weight change data for young anorexia patients, show the data types it contains and check whether it has any NA or NULL values. For the variable Treat, recode the values "CBT", "Cont" and "FT" as "Cogn Beh Tr", "Contr" and "Fam Tr", respectively.

Answer:

library("MASS")

## 
## Adjuntando el paquete: 'MASS'

## The following objects are masked _by_ '.GlobalEnv':
## 
##     leuk, survey

data("anorexia") # load the dataset
str(anorexia) # data types

## 'data.frame':    72 obs. of  3 variables:
##  $ Treat : Factor w/ 3 levels "CBT","Cont","FT": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Prewt : num  80.7 89.4 91.8 74 78.1 88.3 87.3 75.1 80.6 78.4 ...
##  $ Postwt: num  80.2 80.1 86.4 86.3 76.1 78.1 75.1 86.7 73.5 84.6 ...

is.na(anorexia) # are there any NA values?

##    Treat Prewt Postwt
## 1  FALSE FALSE  FALSE
## 2  FALSE FALSE  FALSE
## 3  FALSE FALSE  FALSE
## 4  FALSE FALSE  FALSE
## 5  FALSE FALSE  FALSE
## 6  FALSE FALSE  FALSE
## 7  FALSE FALSE  FALSE
## 8  FALSE FALSE  FALSE
## 9  FALSE FALSE  FALSE
## 10 FALSE FALSE  FALSE
## 11 FALSE FALSE  FALSE
## 12 FALSE FALSE  FALSE
## 13 FALSE FALSE  FALSE
## 14 FALSE FALSE  FALSE
## 15 FALSE FALSE  FALSE
## 16 FALSE FALSE  FALSE
## 17 FALSE FALSE  FALSE
## 18 FALSE FALSE  FALSE
## 19 FALSE FALSE  FALSE
## 20 FALSE FALSE  FALSE
## 21 FALSE FALSE  FALSE
## 22 FALSE FALSE  FALSE
## 23 FALSE FALSE  FALSE
## 24 FALSE FALSE  FALSE
## 25 FALSE FALSE  FALSE
## 26 FALSE FALSE  FALSE
## 27 FALSE FALSE  FALSE
## 28 FALSE FALSE  FALSE
## 29 FALSE FALSE  FALSE
## 30 FALSE FALSE  FALSE
## 31 FALSE FALSE  FALSE
## 32 FALSE FALSE  FALSE
## 33 FALSE FALSE  FALSE
## 34 FALSE FALSE  FALSE
## 35 FALSE FALSE  FALSE
## 36 FALSE FALSE  FALSE
## 37 FALSE FALSE  FALSE
## 38 FALSE FALSE  FALSE
## 39 FALSE FALSE  FALSE
## 40 FALSE FALSE  FALSE
## 41 FALSE FALSE  FALSE
## 42 FALSE FALSE  FALSE
## 43 FALSE FALSE  FALSE
## 44 FALSE FALSE  FALSE
## 45 FALSE FALSE  FALSE
## 46 FALSE FALSE  FALSE
## 47 FALSE FALSE  FALSE
## 48 FALSE FALSE  FALSE
## 49 FALSE FALSE  FALSE
## 50 FALSE FALSE  FALSE
## 51 FALSE FALSE  FALSE
## 52 FALSE FALSE  FALSE
## 53 FALSE FALSE  FALSE
## 54 FALSE FALSE  FALSE
## 55 FALSE FALSE  FALSE
## 56 FALSE FALSE  FALSE
## 57 FALSE FALSE  FALSE
## 58 FALSE FALSE  FALSE
## 59 FALSE FALSE  FALSE
## 60 FALSE FALSE  FALSE
## 61 FALSE FALSE  FALSE
## 62 FALSE FALSE  FALSE
## 63 FALSE FALSE  FALSE
## 64 FALSE FALSE  FALSE
## 65 FALSE FALSE  FALSE
## 66 FALSE FALSE  FALSE
## 67 FALSE FALSE  FALSE
## 68 FALSE FALSE  FALSE
## 69 FALSE FALSE  FALSE
## 70 FALSE FALSE  FALSE
## 71 FALSE FALSE  FALSE
## 72 FALSE FALSE  FALSE

There are no NA values in the table: every cell is FALSE.
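Reading a full logical matrix by eye does not scale well. A more compact check, sketched here on a made-up data frame (toy is hypothetical), is anyNA() for a single yes/no answer and colSums(is.na()) for a count per column.

```r
# Made-up data with one missing value in each column
toy <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))

# TRUE if at least one value is missing anywhere in the data frame
anyNA(toy)

# Number of missing values per column
colSums(is.na(toy))
```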

is.null(anorexia)

## [1] FALSE

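is.null() returns FALSE because the object "anorexia" is not NULL. Its columns are atomic vectors (a factor and two numeric vectors), which cannot contain NULL elements, so checking the object as a whole is sufficient.

Since Treat is a factor, as str() showed, its levels can be renamed as follows: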

levels(anorexia$Treat) <- c("Cogn Beh Tr", "Contr", "Fam Tr")
str(anorexia$Treat)

##  Factor w/ 3 levels "Cogn Beh Tr",..: 2 2 2 2 2 2 2 2 2 2 ...

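The new labels are assigned by position, so they must follow the order of the existing levels ("CBT", "Cont", "FT"), which R stores alphabetically by default.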
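Positional assignment silently mislabels the data if the level order is not what was expected. One way to guard against this, sketched on a small stand-alone factor rather than the real dataset, is to map old levels to new labels through a named lookup vector.

```r
# Small stand-alone factor with the same three levels as anorexia$Treat
treat <- factor(c("Cont", "CBT", "FT", "Cont"))

# Old level -> new label; the order of this vector does not matter
lookup <- c(FT = "Fam Tr", CBT = "Cogn Beh Tr", Cont = "Contr")

# Index the lookup by the current levels so each label lands on the right level
levels(treat) <- unname(lookup[levels(treat)])

as.character(treat)
```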

Exercise 4:

Export the biopsy data from the MASS package to a ".csv" file.

data("biopsy")
write.csv(biopsy, here("datasets", "biopsy.csv"))

Export the Melanoma data from the MASS package to files in three different formats and check that the files have been created in the specified formats and paths. You may include a screenshot of their location in the folder.

# Export the dataset
library(writexl) # write_xlsx()
library(readxl)  # read_xlsx()
data("Melanoma")
write.csv(Melanoma, here("datasets", "melanoma.csv"), row.names = FALSE)
# Tab-delimited text, so that the .txt file is a genuinely different format
write.table(Melanoma, here("datasets", "melanoma.txt"), sep = "\t", row.names = FALSE)
write_xlsx(Melanoma, here("datasets", "melanoma.xlsx"))

# Check that the files were created correctly: each one should load
melanoma_csv <- read.csv(here("datasets", "melanoma.csv"))
melanoma_txt <- read.delim(here("datasets", "melanoma.txt"))
melanoma_xlsx <- read_xlsx(here("datasets", "melanoma.xlsx"))
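Loading a file without an error shows that it exists, but not that its contents survived the export. A minimal round-trip check might look like the sketch below, which uses a made-up data frame and a temporary file instead of the project's datasets folder.

```r
# Made-up data and a temporary path, so nothing is written to the project
toy  <- data.frame(id = 1:3, grp = c("a", "b", "c"))
path <- tempfile(fileext = ".csv")

write.csv(toy, path, row.names = FALSE)

# The file should exist and read back with the same shape and values
file.exists(path)
back <- read.csv(path)
all.equal(toy, back)
```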

Generate a summary of the variable age from Melanoma and save the resulting output to a .doc document.

summ_age <- capture.output(summary(Melanoma$age))
writeLines(summ_age, here("datasets", "summary_age_melanoma.doc"))
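capture.output() returns the printed output as a character vector with one element per line, and writeLines() writes those lines as plain text whatever the file extension is. The sketch below reproduces the idea with a temporary .txt file and a made-up numeric vector.

```r
# Capture the printed summary as text: one element per printed line
out <- capture.output(summary(c(1, 2, 3, 4)))
length(out)  # header line plus values line

# Write the lines to a temporary file and read them back unchanged
path <- tempfile(fileext = ".txt")
writeLines(out, path)
readLines(path)
```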

Find a data frame in a biomedical data repository, download a dataset in ".csv" format, and import the file into an R Markdown document using code or the RStudio import menu.

diabetes <- read.csv(here("datasets", "diabetes.csv"))
head(diabetes)

##     id chol stab.glu hdl ratio glyhb   location age gender height weight  frame
## 1 1000  203       82  56   3.6  4.31 Buckingham  46 female     62    121 medium
## 2 1001  165       97  24   6.9  4.44 Buckingham  29 female     64    218  large
## 3 1002  228       92  37   6.2  4.64 Buckingham  58 female     61    256  large
## 4 1003   78       93  12   6.5  4.63 Buckingham  67   male     67    119  large
## 5 1005  249       90  28   8.9  7.72 Buckingham  64   male     68    183 medium
## 6 1008  248       94  69   3.6  4.81 Buckingham  34   male     71    190  large
##   bp.1s bp.1d bp.2s bp.2d waist hip time.ppn
## 1   118    59    NA    NA    29  38      720
## 2   112    68    NA    NA    46  48      360
## 3   190    92   185    92    49  57      180
## 4   110    50    NA    NA    33  38      480
## 5   138    80    NA    NA    44  41      300
## 6   132    86    NA    NA    36  42      195

Exercise 5:

The following example shows how to use different operators on the birthwt dataset, along with some functions that provide more information about its variables:

What is the maximum age of the mothers in the dataset?

data("birthwt")
max(birthwt$age)

## [1] 45

What is the minimum age of the mothers in the dataset?

min(birthwt$age)

## [1] 14

What is the age range of the mothers?

range(birthwt$age)

## [1] 14 45

Did the mother of the newborn with the lowest birth weight smoke?

birthwt$smoke[birthwt$bwt == min(birthwt$bwt)]

## [1] 1

Yes, she smoked (smoke = 1).

How much did the newborn whose mother had the maximum age weigh?

birthwt$bwt[birthwt$age == max(birthwt$age)]

## [1] 4990
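The comparisons above return every row that ties for the extreme value. When a single row is enough, which.min() and which.max() give the position of the first minimum or maximum directly. A sketch on a made-up data frame (the values in toy are illustrative, not taken from birthwt as a whole):

```r
# Made-up rows with the same column names as birthwt
toy <- data.frame(
  age   = c(19, 33, 45, 14),
  bwt   = c(2523, 709, 4990, 3100),
  smoke = c(0, 1, 0, 1)
)

# Smoking status of the mother of the lightest newborn
toy$smoke[which.min(toy$bwt)]  # 1

# Birth weight for the oldest mother
toy$bwt[which.max(toy$age)]    # 4990
```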

List the weights of the newborns whose mothers visited the physician fewer than two times during the first trimester.

birthwt[birthwt$ftv < 2, c("bwt", "ftv")]

##      bwt ftv
## 85  2523   0
## 87  2557   1
## 89  2600   0
## 91  2622   0
## 92  2637   1
## 93  2637   1
## 94  2663   1
## 95  2665   0
## 96  2722   0
## 97  2733   1
## 98  2751   0
## 100 2769   0
## 101 2769   0
## 102 2778   0
## 104 2807   0
## 105 2821   1
## 108 2836   1
## 109 2863   0
## 112 2877   0
## 113 2906   0
## 115 2920   0
## 116 2920   1
## 117 2920   1
## 118 2948   1
## 119 2948   1
## 120 2977   1
## 121 2977   0
## 125 2922   0
## 127 3033   1
## 130 3062   1
## 131 3062   0
## 132 3062   0
## 133 3062   0
## 135 3090   0
## 137 3090   0
## 138 3100   1
## 139 3104   0
## 140 3132   0
## 142 3175   0
## 143 3175   0
## 144 3203   0
## 145 3203   0
## 146 3203   0
## 147 3225   0
## 148 3225   0
## 150 3232   0
## 151 3234   0
## 154 3260   0
## 155 3274   1
## 160 3317   1
## 162 3317   0
## 164 3331   1
## 166 3374   0
## 167 3374   0
## 168 3402   0
## 169 3416   1
## 172 3444   0
## 173 3459   0
## 174 3460   1
## 175 3473   0
## 176 3544   0
## 177 3487   0
## 179 3544   0
## 180 3572   0
## 181 3572   0
## 182 3586   0
## 183 3600   0
## 184 3614   1
## 185 3614   0
## 187 3629   0
## 188 3637   0
## 189 3643   0
## 190 3651   1
## 191 3651   1
## 192 3651   0
## 193 3651   0
## 195 3699   1
## 196 3728   1
## 197 3756   0
## 199 3770   0
## 200 3770   1
## 201 3770   0
## 202 3790   0
## 203 3799   1
## 204 3827   0
## 208 3884   1
## 210 3912   1
## 211 3940   0
## 212 3941   1
## 213 3941   0
## 214 3969   0
## 216 3997   1
## 217 3997   1
## 218 4054   0
## 219 4054   1
## 220 4111   0
## 223 4174   1
## 224 4238   0
## 225 4593   1
## 226 4990   1
## 4    709   0
## 11  1135   0
## 13  1330   0
## 15  1474   0
## 16  1588   0
## 17  1588   1
## 18  1701   1
## 19  1729   0
## 20  1790   1
## 22  1818   0
## 23  1885   0
## 24  1893   0
## 25  1899   1
## 26  1928   0
## 29  1936   0
## 30  1970   0
## 31  2055   0
## 32  2055   1
## 34  2084   0
## 35  2084   0
## 36  2100   0
## 37  2125   0
## 42  2187   1
## 43  2187   0
## 44  2211   0
## 45  2225   0
## 46  2240   1
## 47  2240   0
## 49  2282   0
## 50  2296   0
## 51  2296   0
## 54  2325   0
## 56  2353   1
## 57  2353   0
## 59  2367   1
## 60  2381   0
## 61  2381   0
## 62  2381   0
## 63  2410   0
## 65  2410   0
## 67  2410   1
## 69  2424   0
## 75  2442   1
## 77  2466   0
## 78  2466   0
## 82  2495   0
## 83  2495   0

Exercise 6:

Using the anorexia dataset from the previous sections, create a matrix whose columns hold the values of Prewt and Postwt, with each row holding the corresponding pair of values for one patient.

# Stack both vectors and fill the matrix column by column (72 rows, 2 columns)
data1 <- c(anorexia$Prewt, anorexia$Postwt)
matrix(data1, nrow = 72, ncol = 2)

##       [,1]  [,2]
##  [1,] 80.7  80.2
##  [2,] 89.4  80.1
##  [3,] 91.8  86.4
##  [4,] 74.0  86.3
##  [5,] 78.1  76.1
##  [6,] 88.3  78.1
##  [7,] 87.3  75.1
##  [8,] 75.1  86.7
##  [9,] 80.6  73.5
## [10,] 78.4  84.6
## [11,] 77.6  77.4
## [12,] 88.7  79.5
## [13,] 81.3  89.6
## [14,] 78.1  81.4
## [15,] 70.5  81.8
## [16,] 77.3  77.3
## [17,] 85.2  84.2
## [18,] 86.0  75.4
## [19,] 84.1  79.5
## [20,] 79.7  73.0
## [21,] 85.5  88.3
## [22,] 84.4  84.7
## [23,] 79.6  81.4
## [24,] 77.5  81.2
## [25,] 72.3  88.2
## [26,] 89.0  78.8
## [27,] 80.5  82.2
## [28,] 84.9  85.6
## [29,] 81.5  81.4
## [30,] 82.6  81.9
## [31,] 79.9  76.4
## [32,] 88.7 103.6
## [33,] 94.9  98.4
## [34,] 76.3  93.4
## [35,] 81.0  73.4
## [36,] 80.5  82.1
## [37,] 85.0  96.7
## [38,] 89.2  95.3
## [39,] 81.3  82.4
## [40,] 76.5  72.5
## [41,] 70.0  90.9
## [42,] 80.4  71.3
## [43,] 83.3  85.4
## [44,] 83.0  81.6
## [45,] 87.7  89.1
## [46,] 84.2  83.9
## [47,] 86.4  82.7
## [48,] 76.5  75.7
## [49,] 80.2  82.6
## [50,] 87.8 100.4
## [51,] 83.3  85.2
## [52,] 79.7  83.6
## [53,] 84.5  84.6
## [54,] 80.8  96.2
## [55,] 87.4  86.7
## [56,] 83.8  95.2
## [57,] 83.3  94.3
## [58,] 86.0  91.5
## [59,] 82.5  91.9
## [60,] 86.7 100.3
## [61,] 79.6  76.7
## [62,] 76.9  76.8
## [63,] 94.2 101.6
## [64,] 73.4  94.9
## [65,] 80.5  75.2
## [66,] 81.6  77.8
## [67,] 82.1  95.5
## [68,] 77.6  90.7
## [69,] 83.5  92.5
## [70,] 89.9  93.8
## [71,] 86.0  91.7
## [72,] 87.3  98.0

Exercise 7:

Identificador <- c("I1","I2","I3","I4","I5","I6","I7","I8","I9","I10","I11","I12","I13","I14", "I15","I16","I17","I18","I19","I20","I21","I22","I23","I24","I25")   

Edad <- c(23,24,21,22,23,25,26,24,21,22,23,25,26,24,22,21,25,26,24,21,25,27,26,22,29)   

Sexo <- c(1,2,1,1,1,2,2,2,1,2,1,2,2,2,1,1,1,2,2,2,1,2,1,1,2) # 1 for women and 2 for men
Peso <- c(76.5,81.2,79.3,59.5,67.3,78.6,67.9,100.2,97.8,56.4,65.4,67.5,87.4,99.7,87.6
 ,93.4,65.4,73.7,85.1,61.2,54.8,103.4,65.8,71.7,85.0)   
Alt <- c(165,154,178,165,164,175,182,165,178,165,158,183,184,164,189,167,182,179,165
 ,158,183,184,189,166,175) # height in cm
Fuma <- c("SÍ","NO","SÍ","SÍ","NO","NO","NO","SÍ","SÍ","SÍ","NO","NO","SÍ","SÍ","SÍ",
 "SÍ","NO","NO","SÍ","SÍ","SÍ","NO","SÍ","NO","SÍ") 

Trat_Pulmon <- data.frame(Identificador,Edad,Sexo,Peso,Alt,Fuma)    

Trat_Pulmon

##    Identificador Edad Sexo  Peso Alt Fuma
## 1             I1   23    1  76.5 165   SÍ
## 2             I2   24    2  81.2 154   NO
## 3             I3   21    1  79.3 178   SÍ
## 4             I4   22    1  59.5 165   SÍ
## 5             I5   23    1  67.3 164   NO
## 6             I6   25    2  78.6 175   NO
## 7             I7   26    2  67.9 182   NO
## 8             I8   24    2 100.2 165   SÍ
## 9             I9   21    1  97.8 178   SÍ
## 10           I10   22    2  56.4 165   SÍ
## 11           I11   23    1  65.4 158   NO
## 12           I12   25    2  67.5 183   NO
## 13           I13   26    2  87.4 184   SÍ
## 14           I14   24    2  99.7 164   SÍ
## 15           I15   22    1  87.6 189   SÍ
## 16           I16   21    1  93.4 167   SÍ
## 17           I17   25    1  65.4 182   NO
## 18           I18   26    2  73.7 179   NO
## 19           I19   24    2  85.1 165   SÍ
## 20           I20   21    2  61.2 158   SÍ
## 21           I21   25    1  54.8 183   SÍ
## 22           I22   27    2 103.4 184   NO
## 23           I23   26    1  65.8 189   SÍ
## 24           I24   22    1  71.7 166   NO
## 25           I25   29    2  85.0 175   SÍ
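Numeric codes such as 1 and 2 are easier to read once they are stored as a labelled factor. The following sketch recodes a short stand-alone vector; the English labels are an assumption made for this example and do not appear in the original data.

```r
# First six Sexo codes from the data above: 1 = women, 2 = men
sexo <- c(1, 2, 1, 1, 1, 2)

# Attach a readable label to each code
sexo_f <- factor(sexo, levels = c(1, 2), labels = c("female", "male"))

table(sexo_f)  # female: 4, male: 2
```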

Select the records with age > 22.

Trat_Pulmon_22 <- Trat_Pulmon[Trat_Pulmon$Edad > 22, ]
Trat_Pulmon_22

##    Identificador Edad Sexo  Peso Alt Fuma
## 1             I1   23    1  76.5 165   SÍ
## 2             I2   24    2  81.2 154   NO
## 5             I5   23    1  67.3 164   NO
## 6             I6   25    2  78.6 175   NO
## 7             I7   26    2  67.9 182   NO
## 8             I8   24    2 100.2 165   SÍ
## 11           I11   23    1  65.4 158   NO
## 12           I12   25    2  67.5 183   NO
## 13           I13   26    2  87.4 184   SÍ
## 14           I14   24    2  99.7 164   SÍ
## 17           I17   25    1  65.4 182   NO
## 18           I18   26    2  73.7 179   NO
## 19           I19   24    2  85.1 165   SÍ
## 21           I21   25    1  54.8 183   SÍ
## 22           I22   27    2 103.4 184   NO
## 23           I23   26    1  65.8 189   SÍ
## 25           I25   29    2  85.0 175   SÍ

Select element 3 of column 4 of the dataset (counting the identifier as a column).

elemento <- Trat_Pulmon[3,4]
elemento

## [1] 79.3

Use the subset() command to select all rows with an age below 27, excluding the Alt column.

Trat_Pulmon_27 <- subset(Trat_Pulmon, Edad < 27, select = c(-Alt))
Trat_Pulmon_27

##    Identificador Edad Sexo  Peso Fuma
## 1             I1   23    1  76.5   SÍ
## 2             I2   24    2  81.2   NO
## 3             I3   21    1  79.3   SÍ
## 4             I4   22    1  59.5   SÍ
## 5             I5   23    1  67.3   NO
## 6             I6   25    2  78.6   NO
## 7             I7   26    2  67.9   NO
## 8             I8   24    2 100.2   SÍ
## 9             I9   21    1  97.8   SÍ
## 10           I10   22    2  56.4   SÍ
## 11           I11   23    1  65.4   NO
## 12           I12   25    2  67.5   NO
## 13           I13   26    2  87.4   SÍ
## 14           I14   24    2  99.7   SÍ
## 15           I15   22    1  87.6   SÍ
## 16           I16   21    1  93.4   SÍ
## 17           I17   25    1  65.4   NO
## 18           I18   26    2  73.7   NO
## 19           I19   24    2  85.1   SÍ
## 20           I20   21    2  61.2   SÍ
## 21           I21   25    1  54.8   SÍ
## 23           I23   26    1  65.8   SÍ
## 24           I24   22    1  71.7   NO

Exercise 8:

The ChickWeight dataset contains 578 observations of the weight in grams of 50 chicks (weight), the number of days since birth at which each measurement was taken (Time), an identifier for each chick whose levels are ordered by weight (Chick), and a factor giving the experimental diet each chick received (Diet).

Load the ChickWeight dataset from the datasets package into your working environment.

library(datasets)
data("ChickWeight")

Generate a scatter plot of the variable weight.

plot(ChickWeight$weight, main = "Scatter plot of weight")

Create a box plot of the variable Time.

boxplot(ChickWeight$Time, main = "Box plot of days until measurement", ylab = "Days")
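Plotting a single variable against its row index shows little structure. As a possible extension, the formula interface relates two variables in one call; the sketch below plots weight against Time and compares weight across the four diets. The titles and axis labels are illustrative choices.

```r
# ChickWeight ships with the datasets package, which R attaches by default
nrow(ChickWeight)             # number of observations
nlevels(ChickWeight$Chick)    # number of distinct chicks

# Weight against days since birth
plot(weight ~ Time, data = ChickWeight,
     main = "Weight by day", xlab = "Days", ylab = "Weight (g)")

# One box per diet; boxplot() invisibly returns the statistics it drew
b <- boxplot(weight ~ Diet, data = ChickWeight,
             main = "Weight by diet", xlab = "Diet", ylab = "Weight (g)")
```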

Exercise 9:

Using the anorexia dataset from the MASS package, create another data frame called anorexia_treat_df made up of Treat and a new vector computed as the difference between Prewt and Postwt. The result is a data frame containing the treatment type and the weight gained or lost after the treatment.
Select the individuals who gained weight after the treatment and create a new dataset called anorexia_treat_C_df containing only the data for those who followed the "Cont" treatment and gained weight after it.

# Create the data frame; the change is Postwt - Prewt, so positive values mean weight gained
anorexia_treat_df <- data.frame(anorexia$Treat, anorexia$Postwt - anorexia$Prewt)
colnames(anorexia_treat_df) <- c("Treat", "Postwt-Prewt")
head(anorexia_treat_df)

##   Treat Postwt-Prewt
## 1 Contr         -0.5
## 2 Contr         -9.3
## 3 Contr         -5.4
## 4 Contr         12.3
## 5 Contr         -2.0
## 6 Contr        -10.2

# Patients who gained weight under the control treatment
# ("Cont" was relabelled "Contr" in Exercise 3):
anorexia_treat_C_df <- subset(anorexia_treat_df, `Postwt-Prewt` > 0 & Treat == "Contr")
head(anorexia_treat_C_df)

##    Treat Postwt-Prewt
## 4  Contr         12.3
## 8  Contr         11.6
## 10 Contr          6.2
## 13 Contr          8.3
## 14 Contr          3.3
## 15 Contr         11.3
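Once the change in weight is available per patient, a natural follow-up is to summarise it by treatment. The sketch below uses tapply() on a made-up data frame: the two "Contr" values come from the output above, while the "Fam Tr" values and the column name change are invented for illustration.

```r
toy <- data.frame(
  Treat  = c("Contr", "Contr", "Fam Tr", "Fam Tr"),
  change = c(-0.5, 12.3, 11.4, 11.0)  # the "Fam Tr" values are made up
)

# Mean change in weight per treatment
means <- tapply(toy$change, toy$Treat, mean)
means

# Number of patients who gained weight per treatment
gained <- tapply(toy$change > 0, toy$Treat, sum)
gained
```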

Case study

Complete the following tasks:

Create a made-up dataset with R. It must contain thirty observations (fifteen for men and fifteen for women) of six variables.

set.seed(27)

Id <- as.character(trunc(runif(n = 30, min = 90000, max = 99999)))

Edad <- trunc(runif(n = 30, min = 18, max = 35))

genero <- c(rep("mujer", 15), rep("hombre", 15))
Gene <- factor(genero,
               levels = c("mujer", "hombre"),
               labels = c("mujer", "hombre"))

tratamiento <- trunc(runif(n = 30, min = 1, max = 4))
Trat <- factor(tratamiento, 
               levels = c("1", "2", "3"),
               labels = c("A", "B", "C"))

Peso <- round(runif(n = 30, min = 60, max = 120), digits = 2)

Alt <- round(runif(n = 30, min = 150, max = 190), digits = 2)

caso_df <- data.frame(Id, Edad, Gene, Trat, Peso, Alt)

head(caso_df)

##      Id Edad  Gene Trat   Peso    Alt
## 1 99716   20 mujer    B  68.40 155.18
## 2 90837   26 mujer    B 108.89 179.54
## 3 98737   18 mujer    A  62.34 176.98
## 4 93291   22 mujer    B 117.58 174.31
## 5 92222   20 mujer    C 100.15 165.90
## 6 94016   26 mujer    C  80.13 182.79
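Truncating runif() works, but sample() expresses integer and categorical draws more directly and, when drawing without replacement, rules out duplicate identifiers. This is an alternative sketch, not the method used above, so it will not reproduce the same values even with the same seed; the lowercase variable names are hypothetical.

```r
set.seed(27)

# 30 distinct five-digit identifiers (sampling without replacement)
id <- as.character(sample(90000:99999, size = 30))

# Integer ages between 18 and 34, the same range trunc(runif(30, 18, 35)) can produce
edad <- sample(18:34, size = 30, replace = TRUE)

# Treatment drawn directly from its labels
trat <- factor(sample(c("A", "B", "C"), size = 30, replace = TRUE),
               levels = c("A", "B", "C"))

anyDuplicated(id)  # 0 means every identifier is unique
```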

Explore your dataset and its variables.

str(caso_df)

## 'data.frame':    30 obs. of  6 variables:
##  $ Id  : chr  "99716" "90837" "98737" "93291" ...
##  $ Edad: num  20 26 18 22 20 26 26 28 28 24 ...
##  $ Gene: Factor w/ 2 levels "mujer","hombre": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Trat: Factor w/ 3 levels "A","B","C": 2 2 1 2 3 3 1 2 2 2 ...
##  $ Peso: num  68.4 108.9 62.3 117.6 100.2 ...
##  $ Alt : num  155 180 177 174 166 ...

summary(caso_df$Edad)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   18.00   20.00   26.00   24.77   28.00   33.00

summary(caso_df$Peso)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   60.72   80.25   98.91   95.66  111.63  118.82

summary(caso_df$Alt)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   150.7   159.7   175.5   171.5   181.2   189.8

summary(caso_df$Gene)

##  mujer hombre 
##     15     15

summary(caso_df$Trat)

##  A  B  C 
##  6 14 10

Create a new variable from one of the existing ones. For example, you can compute the BMI (IMC in the code), where BMI = weight (kg) / [height (m)]^2, and add the new variable to the dataset.

caso_df$IMC <- round((caso_df$Peso / (caso_df$Alt/100)^2), digits = 2)

head(caso_df)

##      Id Edad  Gene Trat   Peso    Alt   IMC
## 1 99716   20 mujer    B  68.40 155.18 28.40
## 2 90837   26 mujer    B 108.89 179.54 33.78
## 3 98737   18 mujer    A  62.34 176.98 19.90
## 4 93291   22 mujer    B 117.58 174.31 38.70
## 5 92222   20 mujer    C 100.15 165.90 36.39
## 6 94016   26 mujer    C  80.13 182.79 23.98
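Wrapping the formula in a small function makes the unit conversion explicit and easy to reuse. bmi() below is a hypothetical helper written for this example; it takes the weight in kilograms and the height in centimetres, as in the dataset, and is checked against the first row of the output above.

```r
# Hypothetical helper: BMI = weight (kg) / [height (m)]^2, with height given in cm
bmi <- function(weight_kg, height_cm) {
  height_m <- height_cm / 100
  round(weight_kg / height_m^2, digits = 2)
}

# First row of the data frame: 68.40 kg and 155.18 cm
bmi(68.40, 155.18)  # 28.4
```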

Create two separate data frames for men and women with two different names: Df_Hombres and Df_Mujeres.

Df_Hombres <- subset(caso_df, Gene == "hombre")

head(Df_Hombres)

##       Id Edad   Gene Trat   Peso    Alt   IMC
## 16 97862   33 hombre    B  69.56 179.87 21.50
## 17 99835   26 hombre    B  66.24 160.07 25.85
## 18 90906   20 hombre    A  92.22 152.63 39.59
## 19 95939   18 hombre    C 117.29 178.50 36.81
## 20 90460   28 hombre    C 117.84 182.18 35.51
## 21 90993   19 hombre    A 108.33 164.34 40.11

Df_Mujeres <- subset(caso_df, Gene == "mujer")

head(Df_Mujeres)

##      Id Edad  Gene Trat   Peso    Alt   IMC
## 1 99716   20 mujer    B  68.40 155.18 28.40
## 2 90837   26 mujer    B 108.89 179.54 33.78
## 3 98737   18 mujer    A  62.34 176.98 19.90
## 4 93291   22 mujer    B 117.58 174.31 38.70
## 5 92222   20 mujer    C 100.15 165.90 36.39
## 6 94016   26 mujer    C  80.13 182.79 23.98

Combine the two previous data frames again with the rbind() command to recreate the original one.

nuevo_caso_df <- rbind(Df_Mujeres, Df_Hombres)

head(nuevo_caso_df)

##      Id Edad  Gene Trat   Peso    Alt   IMC
## 1 99716   20 mujer    B  68.40 155.18 28.40
## 2 90837   26 mujer    B 108.89 179.54 33.78
## 3 98737   18 mujer    A  62.34 176.98 19.90
## 4 93291   22 mujer    B 117.58 174.31 38.70
## 5 92222   20 mujer    C 100.15 165.90 36.39
## 6 94016   26 mujer    C  80.13 182.79 23.98
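head() only shows the first rows, so it cannot confirm that the recombined data frame matches the original. A sketch of a fuller check on a made-up data frame (toy_df and its columns are hypothetical): because the women occupy the first rows and the men the last ones, binding them in that order should restore the original row order.

```r
toy_df <- data.frame(
  id = 1:4,
  g  = factor(c("mujer", "mujer", "hombre", "hombre"), levels = c("mujer", "hombre"))
)

women <- subset(toy_df, g == "mujer")
men   <- subset(toy_df, g == "hombre")

# Bind in the original order and compare column by column
both <- rbind(women, men)
identical(both$id, toy_df$id)
identical(as.character(both$g), as.character(toy_df$g))
```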

María del Mar Mascarell López

2025-09-29
