R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

Including Plots

You can also embed plots, for example:

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.

datos <- read.csv("C:/Users/Sotto Family.DESKTOP-P9Q8G8G/Desktop/UOC 2021/CIENCIA DE DATOS  Y SALUS/PRACTICA/sarcopenia_data.txt", na.strings="", stringsAsFactors=TRUE)

descargo paquetes

library (arules)
## Loading required package: Matrix
## 
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
## 
##     abbreviate, write
library (arulesViz)

library (tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.1.1     v dplyr   1.0.6
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x tidyr::expand() masks Matrix::expand()
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## x tidyr::pack()   masks Matrix::pack()
## x dplyr::recode() masks arules::recode()
## x tidyr::unpack() masks Matrix::unpack()

##Estudiamos la base de datos##

str(datos)
## 'data.frame':    20796 obs. of  2 variables:
##  $ PatientID: int  1 2 2 3 3 3 4 5 5 5 ...
##  $ Symptom  : Factor w/ 36 levels "Abdominal_Obesity",..: 21 25 25 1 24 4 7 32 12 21 ...

tenemos 20.796 datos de 2 variables , Symptom y PatientID ## eliminamos los datos duplicados y eliminamos los datos NA ##

datosSD <- datos[!duplicated (datos), ]
view(datosSD)

19.146 datos tras eliminar duplicados

datosSDN <- na.omit(datosSD)
view (datosSDN)

18.960 datos finales sin duplicados y sin espacios vacios.

Analizamos la base ya limpia sobre la que vamos a trabajar

levels(datosSDN$Symptom)
##  [1] "Abdominal_Obesity"                    
##  [2] "Arteriosclerosis"                     
##  [3] "Atrophic_Muscular_Disorders"          
##  [4] "Chronic_Kidney_Failure"               
##  [5] "Chronic_Obstructive_Pulmonary_Disease"
##  [6] "Cognition_Disorders"                  
##  [7] "Ecoli_Infections"                     
##  [8] "Gastrointestinal_Neoplasms"           
##  [9] "Hip_Fractures"                        
## [10] "Hypokinesia"                          
## [11] "Hypothyroidism"                       
## [12] "Inflammation"                         
## [13] "Kidney_Neoplasms"                     
## [14] "Liver_Diseases"                       
## [15] "Malnutrition"                         
## [16] "Metabolic_Syndrome_X"                 
## [17] "Muscle_Weakness"                      
## [18] "Muscular_Atrophy"                     
## [19] "Neurologic_Gait_Disorders"            
## [20] "Non-Small-Cell_Lung_Carcinoma"        
## [21] "Obesity"                              
## [22] "Osteoporosis"                         
## [23] "Overnutrition"                        
## [24] "Pain"                                 
## [25] "Pancreatic_Neoplasms"                 
## [26] "Prostatic_Neoplasms"                  
## [27] "Protein-Energy_Malnutrition"          
## [28] "Pseudomonas_Infections"               
## [29] "Renal_Cell_Carcinoma"                 
## [30] "Reperfusion_Injury"                   
## [31] "Rheumatoid_Arthritis"                 
## [32] "Sarcopenia"                           
## [33] "Sensation_Disorders"                  
## [34] "Starvation"                           
## [35] "Systemic_Lupus_Erythematosus"         
## [36] "Type2_Diabetes_Mellitus"

##síntomas mas frecuentes##

summary(datosSDN)
##    PatientID                Symptom    
##  Min.   :   1   Sarcopenia      :4528  
##  1st Qu.:2523   Obesity         :3097  
##  Median :5035   Muscle_Weakness :1350  
##  Mean   :4944   Muscular_Atrophy: 983  
##  3rd Qu.:7358   Inflammation    : 815  
##  Max.   :9684   Malnutrition    : 753  
##                 (Other)         :7434

El síntoma más frecuente es sarcopenia, seguido de obesidad, debilidad muscular, atrofia muscular, inflamación y malnutrición

library(dplyr)
sunflowerplot(datosSDN$Symptom, datosSDN$PatientID, main= "sunflowerplot")

##trabajamos ahora con arules, primer paso es la importacion de datos a un objeto tipo transaction ##

datos_split <- split(x = datosSDN$Symptom, f = datosSDN$PatientID)
transacciones <- as(datos_split, Class = "transactions")
transacciones
## transactions in sparse format with
##  9409 transactions (rows) and
##  36 items (columns)

procedemos a buscar en este formato los síntomas mas frecuentes

summary(transacciones)
## transactions as itemMatrix in sparse format with
##  9409 rows (elements/itemsets/transactions) and
##  36 columns (items) and a density of 0.05597478 
## 
## most frequent items:
##       Sarcopenia          Obesity  Muscle_Weakness Muscular_Atrophy 
##             4528             3097             1350              983 
##     Inflammation          (Other) 
##              815             8187 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9 
## 3814 3092 1511  676  209   82   16    5    4 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.000   2.000   2.015   3.000   9.000 
## 
## includes extended item information - examples:
##                        labels
## 1           Abdominal_Obesity
## 2            Arteriosclerosis
## 3 Atrophic_Muscular_Disorders
## 
## includes extended transaction information - examples:
##   transactionID
## 1             1
## 2             2
## 3             3

tenemos 9.409 transacciones, y 36 columnas que sonlos síntomas., solo 4 transacciones tienen 4 elementos, la mayoría solo tiene 1(3.841), y la mediana está en 2 elementos. ## Empezamos a explorar el df de transacciones para buscar las asociaciones mas frecuentes##

inspect (transacciones[1:6])
##     items                                   transactionID
## [1] {Obesity}                                           1
## [2] {Pancreatic_Neoplasms}                              2
## [3] {Abdominal_Obesity,                                  
##      Chronic_Kidney_Failure,                             
##      Pain}                                              3
## [4] {Ecoli_Infections}                                  4
## [5] {Inflammation,                                       
##      Obesity,                                            
##      Sarcopenia}                                        5
## [6] {Chronic_Obstructive_Pulmonary_Disease,              
##      Ecoli_Infections,                                   
##      Inflammation}                                      6
df_transacciones <- as(transacciones, Class = "data.frame")
# Para que el tamaño de la tabla se ajuste mejor, usamos tibble#
as_tibble(df_transacciones) %>% head()

A continuacion, el siguiente análisis básico consiste en identificar cuáles son los items más frecuentes. Con la función itemFrequency() se puede extraer esta información de un objeto tipo transactions. Por “frecuencia” se hace referencia al soporte de cada síntoma, que es la fracción de pacientes que contienen dicho síntoma respecto al total de todos los pacientes.

frecuencia_items <- itemFrequency(x = transacciones, type = "relative")
frecuencia_items %>%  sort(decreasing = TRUE) %>% head (10)
##                            Sarcopenia                               Obesity 
##                            0.48124136                            0.32915294 
##                       Muscle_Weakness                      Muscular_Atrophy 
##                            0.14347965                            0.10447444 
##                          Inflammation                          Malnutrition 
##                            0.08661919                            0.08002976 
##                  Metabolic_Syndrome_X Chronic_Obstructive_Pulmonary_Disease 
##                            0.07227123                            0.06217451 
##                     Abdominal_Obesity                Chronic_Kidney_Failure 
##                            0.05866723                            0.05473483

##podemos tambien ver los sintomas menos frecuentes## frecuencia_items <- itemFrequency(x = transacciones, type = “relative”) frecuencia_items %>% sort(decreasing = TRUE) %>% tail (10) ##

frecuencia_items <- itemFrequency(x = transacciones, type = "relative")
frecuencia_items %>%  sort(decreasing = TRUE) %>% tail (10)
##               Hypokinesia          Arteriosclerosis Neurologic_Gait_Disorders 
##               0.014241683               0.013285153               0.013072590 
##   Type2_Diabetes_Mellitus        Reperfusion_Injury             Overnutrition 
##               0.010628122               0.010521841               0.008608779 
##                Starvation             Hip_Fractures            Hypothyroidism 
##               0.008183654               0.006695717               0.006164311 
##       Cognition_Disorders 
##               0.004570092

vemos la distribución de las frecuencias

summary(frecuencia_items)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00457 0.01400 0.03151 0.05597 0.05572 0.48124

La frecuencia máxima de un item es 48% y la que menos 0.4% ##Es muy importante estudiar cómo se distribuye el soporte de los síntomas individuales en un conjunto de pacientes para identificar itemsets frecuentes o crear reglas de asociación, ya que, dependiendo del caso, tendrá sentido un límite de soporte u otro. Por lo general, cuando el número de posibles items es muy grande (varios miles) prácticamente todos los artículos son raros, por lo que los soportes son muy bajos## La soporte máxima es es 48.5%, el soporte elegido debe ser menor de 0.5 y mayor o igual de 0.0045 que es el mínimo

itemFrequencyPlot(transacciones, support = 0.005, cex.names = 0.8)

options(repr.plot.width = 3, repr.plot.height = 2)
itemFrequencyPlot(transacciones , topN = 36)

En esta gráfica se puede ver ordenado por frecuencias relativas para cada item. ##buscamos itemsets frecuentes## para ello aplicamos el algoritmo apriori que en realidad nos permite reducir el número de operaciones.

soporte <- 10/dim(transacciones)[1]
itemsets <- apriori (data = transacciones, parameter = list(support = soporte, minlen = 1, maxlen = 20, target = "frequent itemset"))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime     support minlen
##          NA    0.1    1 none FALSE            TRUE       5 0.001062812      1
##  maxlen            target  ext
##      20 frequent itemsets TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 10 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[36 item(s), 9409 transaction(s)] done [0.00s].
## sorting and recoding items ... [36 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## sorting transactions ... done [0.00s].
## writing ... [493 set(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
summary (itemsets)
## set of 493 itemsets
## 
## most frequent items:
##       Sarcopenia          Obesity  Muscle_Weakness Muscular_Atrophy 
##              176              107               83               73 
##     Malnutrition          (Other) 
##               61              661 
## 
## element (itemset/transaction) length distribution:sizes
##   1   2   3   4 
##  36 253 197   7 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   2.000   2.355   3.000   4.000 
## 
## summary of quality measures:
##     support         transIdenticalToItemsets     count        
##  Min.   :0.001063   Min.   :0.0000000        Min.   :  10.00  
##  1st Qu.:0.001488   1st Qu.:0.0001063        1st Qu.:  14.00  
##  Median :0.002232   Median :0.0003188        Median :  21.00  
##  Mean   :0.008125   Mean   :0.0016830        Mean   :  76.45  
##  3rd Qu.:0.004995   3rd Qu.:0.0008502        3rd Qu.:  47.00  
##  Max.   :0.481241   Max.   :0.1259433        Max.   :4528.00  
## 
## includes transaction ID lists: FALSE 
## 
## mining info:
##           data ntransactions     support confidence
##  transacciones          9409 0.001062812          1
top_20_itemsets <- sort(itemsets, by = "support", decreasing = FALSE)[1:20]
inspect(top_20_itemsets)
##      items                               support transIdenticalToItemsets count
## [1]  {Reperfusion_Injury,                                                      
##       Sensation_Disorders}           0.001062812             0.0000000000    10
## [2]  {Kidney_Neoplasms,                                                        
##       Neurologic_Gait_Disorders}     0.001062812             0.0002125624    10
## [3]  {Neurologic_Gait_Disorders,                                               
##       Non-Small-Cell_Lung_Carcinoma} 0.001062812             0.0002125624    10
## [4]  {Arteriosclerosis,                                                        
##       Chronic_Kidney_Failure}        0.001062812             0.0000000000    10
## [5]  {Prostatic_Neoplasms,                                                     
##       Pseudomonas_Infections}        0.001062812             0.0000000000    10
## [6]  {Prostatic_Neoplasms,                                                     
##       Renal_Cell_Carcinoma}          0.001062812             0.0000000000    10
## [7]  {Prostatic_Neoplasms,                                                     
##       Rheumatoid_Arthritis}          0.001062812             0.0000000000    10
## [8]  {Muscular_Atrophy,                                                        
##       Sarcopenia,                                                              
##       Starvation}                    0.001062812             0.0001062812    10
## [9]  {Muscular_Atrophy,                                                        
##       Reperfusion_Injury,                                                      
##       Sarcopenia}                    0.001062812             0.0001062812    10
## [10] {Obesity,                                                                 
##       Reperfusion_Injury,                                                      
##       Sarcopenia}                    0.001062812             0.0001062812    10
## [11] {Metabolic_Syndrome_X,                                                    
##       Neurologic_Gait_Disorders,                                               
##       Sarcopenia}                    0.001062812             0.0001062812    10
## [12] {Arteriosclerosis,                                                        
##       Muscular_Atrophy,                                                        
##       Sarcopenia}                    0.001062812             0.0001062812    10
## [13] {Arteriosclerosis,                                                        
##       Muscle_Weakness,                                                         
##       Obesity}                       0.001062812             0.0000000000    10
## [14] {Hypokinesia,                                                             
##       Malnutrition,                                                            
##       Metabolic_Syndrome_X}          0.001062812             0.0000000000    10
## [15] {Hypokinesia,                                                             
##       Malnutrition,                                                            
##       Sarcopenia}                    0.001062812             0.0003188437    10
## [16] {Kidney_Neoplasms,                                                        
##       Metabolic_Syndrome_X,                                                    
##       Sensation_Disorders}           0.001062812             0.0001062812    10
## [17] {Malnutrition,                                                            
##       Non-Small-Cell_Lung_Carcinoma,                                           
##       Sarcopenia}                    0.001062812             0.0003188437    10
## [18] {Muscular_Atrophy,                                                        
##       Non-Small-Cell_Lung_Carcinoma,                                           
##       Sarcopenia}                    0.001062812             0.0001062812    10
## [19] {Inflammation,                                                            
##       Protein-Energy_Malnutrition,                                             
##       Sarcopenia}                    0.001062812             0.0002125624    10
## [20] {Muscle_Weakness,                                                         
##       Obesity,                                                                 
##       Pseudomonas_Infections}        0.001062812             0.0001062812    10

####identificados los itemsets más frecuentes, podemos filtrarlos por síntomas presentes, buscamos aquellos donde aparece sarcopenia##

itemsets_filtrado <-arules::subset(itemsets, subset = items %in% "Sarcopenia")
itemsets_filtrado
## set of 176 itemsets
inspect (itemsets_filtrado [1:10])
##      items                                     support    
## [1]  {Sarcopenia}                              0.481241365
## [2]  {Hip_Fractures,Sarcopenia}                0.005420342
## [3]  {Hypothyroidism,Sarcopenia}               0.003294718
## [4]  {Overnutrition,Sarcopenia}                0.004570092
## [5]  {Sarcopenia,Starvation}                   0.004038686
## [6]  {Atrophic_Muscular_Disorders,Sarcopenia}  0.002975874
## [7]  {Sarcopenia,Type2_Diabetes_Mellitus}      0.005739186
## [8]  {Sarcopenia,Systemic_Lupus_Erythematosus} 0.003294718
## [9]  {Reperfusion_Injury,Sarcopenia}           0.006589436
## [10] {Pain,Sarcopenia}                         0.004144968
##      transIdenticalToItemsets count
## [1]  0.1212668721             4528 
## [2]  0.0021256244               51 
## [3]  0.0008502498               31 
## [4]  0.0010628122               43 
## [5]  0.0000000000               38 
## [6]  0.0011690934               28 
## [7]  0.0009565310               54 
## [8]  0.0005314061               31 
## [9]  0.0015942183               62 
## [10] 0.0009565310               39

##Empezamos a buscar asociaciones, aplicando Reglas de asociación con el paquete arules.#

soporte <- 10/ dim(transacciones)[1]
reglas <- apriori(data = transacciones,
 parameter = list(support = soporte,
 confidence = 0,
 # Se especifica que se creen reglas
target = "rules"))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime     support minlen
##           0    0.1    1 none FALSE            TRUE       5 0.001062812      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 10 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[36 item(s), 9409 transaction(s)] done [0.00s].
## sorting and recoding items ... [36 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [1161 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
summary(reglas)
## set of 1161 rules
## 
## rule length distribution (lhs + rhs):sizes
##   1   2   3   4 
##  36 506 591  28 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   2.526   3.000   4.000 
## 
## summary of quality measures:
##     support           confidence         coverage             lift        
##  Min.   :0.001063   Min.   :0.00457   Min.   :0.001169   Min.   : 0.1868  
##  1st Qu.:0.001382   1st Qu.:0.04658   1st Qu.:0.009672   1st Qu.: 0.8381  
##  Median :0.002019   Median :0.09239   Median :0.028377   Median : 1.1604  
##  Mean   :0.005565   Mean   :0.16799   Mean   :0.085510   Mean   : 1.4709  
##  3rd Qu.:0.003826   3rd Qu.:0.22222   3rd Qu.:0.058667   3rd Qu.: 1.6862  
##  Max.   :0.481241   Max.   :0.90909   Max.   :1.000000   Max.   :11.2079  
##      count        
##  Min.   :  10.00  
##  1st Qu.:  13.00  
##  Median :  19.00  
##  Mean   :  52.36  
##  3rd Qu.:  36.00  
##  Max.   :4528.00  
## 
## mining info:
##           data ntransactions     support confidence
##  transacciones          9409 0.001062812          0

Probamos con la mínima confianza, buscando la máxima para luego aplicarla, vemos que es 0.8, trabajamos con esa confianza mínima

soporte <- 10/ dim(transacciones)[1]
reglas <- apriori(data = transacciones,
 parameter = list(support = soporte,
 confidence = 0.8,
 # Se especifica que se creen reglas
target = "rules"))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime     support minlen
##         0.8    0.1    1 none FALSE            TRUE       5 0.001062812      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 10 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[36 item(s), 9409 transaction(s)] done [0.00s].
## sorting and recoding items ... [36 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [6 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
summary(reglas)
## set of 6 rules
## 
## rule length distribution (lhs + rhs):sizes
## 2 3 4 
## 1 4 1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       2       3       3       3       3       4 
## 
## summary of quality measures:
##     support           confidence        coverage             lift      
##  Min.   :0.001063   Min.   :0.8095   Min.   :0.001169   Min.   :1.682  
##  1st Qu.:0.001302   1st Qu.:0.8333   1st Qu.:0.001515   1st Qu.:1.732  
##  Median :0.001488   Median :0.8452   Median :0.001754   Median :1.756  
##  Mean   :0.002055   Mean   :0.8515   Mean   :0.002462   Mean   :1.769  
##  3rd Qu.:0.001594   3rd Qu.:0.8643   3rd Qu.:0.001913   3rd Qu.:1.796  
##  Max.   :0.005420   Max.   :0.9091   Max.   :0.006696   Max.   :1.889  
##      count      
##  Min.   :10.00  
##  1st Qu.:12.25  
##  Median :14.00  
##  Mean   :19.33  
##  3rd Qu.:15.00  
##  Max.   :51.00  
## 
## mining info:
##           data ntransactions     support confidence
##  transacciones          9409 0.001062812        0.8

Obtenemos 6 reglas, la mayoría formadas por dos items en el antecedente. ##inspeccionamos las reglas obtenidas#

inspect(sort(x = reglas, decreasing = TRUE, by = "confidence"))
##     lhs                                        rhs              support confidence    coverage     lift count
## [1] {Abdominal_Obesity,                                                                                      
##      Chronic_Obstructive_Pulmonary_Disease,                                                                  
##      Malnutrition}                          => {Sarcopenia} 0.001062812  0.9090909 0.001169093 1.889054    10
## [2] {Inflammation,                                                                                           
##      Pseudomonas_Infections}                => {Sarcopenia} 0.001381656  0.8666667 0.001594218 1.800898    13
## [3] {Metabolic_Syndrome_X,                                                                                   
##      Type2_Diabetes_Mellitus}               => {Sarcopenia} 0.001275375  0.8571429 0.001487937 1.781108    12
## [4] {Metabolic_Syndrome_X,                                                                                   
##      Reperfusion_Injury}                    => {Sarcopenia} 0.001594218  0.8333333 0.001913062 1.731633    15
## [5] {Gastrointestinal_Neoplasms,                                                                             
##      Malnutrition}                          => {Sarcopenia} 0.001594218  0.8333333 0.001913062 1.731633    15
## [6] {Hip_Fractures}                         => {Sarcopenia} 0.005420342  0.8095238 0.006695717 1.682158    51

##exploramos las reglas maximales ##

reglas_maximales <- reglas [is.maximal(reglas)]
reglas_maximales
## set of 6 rules
inspect(reglas_maximales)
##     lhs                                        rhs              support confidence    coverage     lift count
## [1] {Hip_Fractures}                         => {Sarcopenia} 0.005420342  0.8095238 0.006695717 1.682158    51
## [2] {Metabolic_Syndrome_X,                                                                                   
##      Type2_Diabetes_Mellitus}               => {Sarcopenia} 0.001275375  0.8571429 0.001487937 1.781108    12
## [3] {Metabolic_Syndrome_X,                                                                                   
##      Reperfusion_Injury}                    => {Sarcopenia} 0.001594218  0.8333333 0.001913062 1.731633    15
## [4] {Inflammation,                                                                                           
##      Pseudomonas_Infections}                => {Sarcopenia} 0.001381656  0.8666667 0.001594218 1.800898    13
## [5] {Gastrointestinal_Neoplasms,                                                                             
##      Malnutrition}                          => {Sarcopenia} 0.001594218  0.8333333 0.001913062 1.731633    15
## [6] {Abdominal_Obesity,                                                                                      
##      Chronic_Obstructive_Pulmonary_Disease,                                                                  
##      Malnutrition}                          => {Sarcopenia} 0.001062812  0.9090909 0.001169093 1.889054    10

#todas las reglas son maximales# ##buscamos reglas redundantes##

reglas_redundantes <- reglas[is.redundant(x = reglas, measure = "confidence")]
reglas_redundantes
## set of 0 rules

No hay reglas redundantes ## grafica con las reglas ##

plot(reglas)
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

inspeccionamos las reglas

plot (reglas,method="graph", "interactive "=TRUE, shading=NA)
## Warning: Unknown control parameters: interactive
## Available control parameters (with default values):
## layout    =  list(fun = function (graph, dim = 2, ...)  {     if ("layout" %in% graph_attr_names(graph)) {         lay <- graph_attr(graph, "layout")         if (is.function(lay)) {             lay(graph, ...)         }         else {             lay         }     }     else if (all(c("x", "y") %in% vertex_attr_names(graph))) {         if ("z" %in% vertex_attr_names(graph)) {             cbind(V(graph)$x, V(graph)$y, V(graph)$z)         }         else {             cbind(V(graph)$x, V(graph)$y)         }     }     else if (vcount(graph) < 1000) {         layout_with_fr(graph, dim = dim, ...)     }     else {         layout_with_drl(graph, dim = dim, ...)     } }, call_str = c("layout_nicely(<graph>, input = \"C:/Users/Sotto Family.DESKTOP-P9Q8G8G/Desktop/definitivo 12.6/junio.Rmd\", ", "    encoding = \"UTF-8\")"), args = list())
## edges     =  <environment>
## nodes     =  <environment>
## nodetext  =  <environment>
## colors    =  c("#EE0000FF", "#EEEEEEFF")
## engine    =  ggplot2
## max   =  100
## verbose   =  FALSE

Grafica representativa dinámica de las reglas by confidence

subrules <- head(sort(reglas, by="confidence"), 20)
plot(subrules, method="graph", engine = "htmlwidget")

##obtenemos un plot dinámico en el que podemos estudiar cada regla de forma independiente. de las  6 reglas, parece que todas ellas tienen como  lhf sarcopenia, las reglas con color rojo mas oscuro, son las de mayor lift, (1),El tamaño de los circulos se corresponde con el suppportlos mas pequeños tienen el menor support y la confidence máxima es 0.9 para la regla 1.

```r
reglas_seleccionadas2 <- subset(reglas, subset = confidence > 0.9)
 reglas_seleccionadas2
## set of 1 rules
inspect (reglas_seleccionadas2)
##     lhs                                        rhs              support confidence    coverage     lift count
## [1] {Abdominal_Obesity,                                                                                      
##      Chronic_Obstructive_Pulmonary_Disease,                                                                  
##      Malnutrition}                          => {Sarcopenia} 0.001062812  0.9090909 0.001169093 1.889054    10

Esta es la regla con una confidence mayor ##inspectionamos por el lift ##

reglas_seleccionadas3 <- subset(reglas, subset = lift >1.8)
 reglas_seleccionadas3
## set of 2 rules
inspect (reglas_seleccionadas3)
##     lhs                                        rhs              support confidence    coverage     lift count
## [1] {Inflammation,                                                                                           
##      Pseudomonas_Infections}                => {Sarcopenia} 0.001381656  0.8666667 0.001594218 1.800898    13
## [2] {Abdominal_Obesity,                                                                                      
##      Chronic_Obstructive_Pulmonary_Disease,                                                                  
##      Malnutrition}                          => {Sarcopenia} 0.001062812  0.9090909 0.001169093 1.889054    10

Tenemos dos regals con un lift mayor de 1.8, que es el valor mas alto del lift, ##analizamos las reglas por otros criterios de calidad ##

 mInteres <- interestMeasure(reglas,
                            measure = c("gini", "chiSquared"),
     
   
                  transactions=transacciones)
quality(reglas) <- cbind(quality(reglas), mInteres)

Todos los valores de gini son bajos, cerca de 0 luego hay mucha igualdad.

ahora otro modelo de plot pero por el lift alto

```