2023-06-07

Fitting a nonlinear regression

## Nonlinear regression

Topics: parameters to be estimated; the “easynls” and “easyreg” packages; initial values; self-starting (SS) functions.

Load libraries

## Warning: package 'easyreg' was built under R version 4.2.3
## Warning: package 'htmlTable' was built under R version 4.2.3
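
The chunk that loads the packages is not shown; a minimal sketch of the library calls the code below relies on (package names inferred from the functions used in this document) would be:

library(openxlsx)   # read.xlsx()
library(easynls)    # nlsplot()
library(easyreg)    # regplot()
library(flextable)  # flextable()
library(htmlTable)  # loaded per the warning above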

Exercise 1

The data correspond to the cumulative growth (height) of a plant species from day 1 to day 15. The objective is to model the evolution of growth through the cumulative height as a function of time. Height is expressed in millimetres.

df<-as.data.frame(read.xlsx("D:/ACTIVIDAD DE TITULACION I (2023)/REG/regnolineal.xlsx",
  sheet=1))

Structure of the data set

str(df)
  'data.frame': 15 obs. of  2 variables:
   $ x: num  1 2 3 4 5 6 7 8 9 10 ...
   $ y: num  16.1 33.8 65.8 97.2 191.6 ...

Data table

flextable(df) 

   x       y
   1   16.08
   2   33.83
   3   65.80
   4   97.20
   5  191.55
   6  326.20
   7  386.87
   8  520.53
   9  590.03
  10  651.92
  11  724.93
  12  699.56
  13  689.96
  14  637.56
  15  717.41

Scatter plot

plot(df)

## Models for sigmoid curves

Try fitting the data to the following models, available in the R packages easynls and easyreg: Gompertz, logistic, ruminal degradation, lactation curve, and the Michaelis-Menten model.

Fitting the Gompertz model

Searching for initial values

mod_Gomp<-nls(y~SSgompertz(x,a,b,c), data = df)
summary(mod_Gomp)
  
  Formula: y ~ SSgompertz(x, a, b, c)
  
  Parameters:
     Estimate Std. Error t value Pr(>|t|)    
  a 723.10867   22.06048  32.778 4.12e-13 ***
  b  12.18470    3.46893   3.513  0.00428 ** 
  c   0.63756    0.03301  19.314 2.10e-10 ***
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  
  Residual standard error: 33.67 on 12 degrees of freedom
  
  Number of iterations to convergence: 0 
  Achieved convergence tolerance: 2.989e-06
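
For reference, SSgompertz(x, a, b, c) is the self-starting Gompertz parameterization

\[y = a\,e^{-b\,c^{x}},\]

so a is the asymptotic height (about 723 mm here). easyreg's model 10 appears to report the equivalent classical form \(y = a\,e^{-b\,e^{-cx}}\), with \(c_{easyreg} = -\ln(c) \approx -\ln(0.638) = 0.45\); this is why the regplot output below shows c = 0.45 while a and b match the nls estimates.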

Fit with easyreg

regplot(df, model=10, start=c(723, 12, 0.6), 
        digits=2, position=8, xlab="x", ylab="y")

  $y
  $y$`Gompertz model`
                               estimates
  coefficient a                   723.11
  coefficient b                    12.18
  coefficient c                     0.45
  p-value t.test for a              0.00
  p-value t.test for b              0.00
  p-value t.test for c              0.00
  residual mean square           1133.85
  r-squared                         0.99
  adjusted r-squared                0.99
  AIC                             152.72
  BIC                             155.55
  p.value Shapiro-Wilk test         0.26
  coefficient of variation (%)      8.04
  first value most discrepant      14.00
  second value most discrepant     11.00
  third value most discrepant       7.00
  
  $y$`Parameters, standard error and confidence intervals`
    parameter estimate standard error IC 2.5% IC 97.5%
  1         a   723.11          22.06  675.04   771.17
  2         b    12.18           3.47    4.63    19.74
  3         c     0.45           0.05    0.34     0.56
  
  $y$`Whole model test`
             df sum of squares mean squares     Fcal p-value
  regression  2     1062855.45   531427.727 468.6951  <0.001
  residuals  12       13606.14     1133.845        -       -
  total      14     1076461.60    76890.114        -       -
  
  $y$`Residuals of Gompertz model`
   [1]  15.7742405  28.7228357  35.0538104   0.6299603  -8.7788983   7.2020088
   [7] -42.2748099   2.0499218   5.1133987  20.2679356  61.5501259  15.1246837
  [13]  -8.2470200 -69.5716489   4.5288825
  
  $y$`Residuals standardized of Gompertz model`
   [1]  0.366474782  0.786494655  0.991855558 -0.124767594 -0.429967313
   [6]  0.088413137 -1.516490473 -0.078707612  0.020663875  0.512238952
  [11]  1.851329419  0.345404791 -0.412714517 -2.401931301  0.001703639

Fit with easynls

nlsplot(df, model=10, start=c(723, 12, 0.6))

Fitting the logistic model

Searching for initial values

mod_Log<-nls(y~SSlogis(x,a,b,c), data=df)
summary(mod_Log)
  
  Formula: y ~ SSlogis(x, a, b, c)
  
  Parameters:
    Estimate Std. Error t value Pr(>|t|)    
  a 702.8716    13.9397   50.42 2.43e-15 ***
  b   6.4519     0.1432   45.05 9.34e-15 ***
  c   1.4523     0.1210   12.00 4.83e-08 ***
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  
  Residual standard error: 27.28 on 12 degrees of freedom
  
  Number of iterations to convergence: 0 
  Achieved convergence tolerance: 5.403e-06
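
SSlogis(x, a, b, c) is the self-starting logistic

\[y = \frac{a}{1 + e^{(b - x)/c}},\]

with asymptote a, inflection point b and scale c. easyreg's model 7 appears to use the equivalent form \(y = a/(1 + b'\,e^{-c'x})\) with \(b' = e^{b/c}\) and \(c' = 1/c\); with the estimates above, \(e^{6.45/1.45} \approx 85\) and \(1/1.45 \approx 0.69\), which is what the regplot output below reports.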

Fit with easyreg

regplot(df, model=7, start=c(703, 6.5, 1.5), digits=2,
        position=8, xlab="x", ylab="y")

  $y
  $y$`Logistic model`
                               estimates
  coefficient a                   702.87
  coefficient b                    84.99
  coefficient c                     0.69
  p-value t.test for a              0.00
  p-value t.test for b              0.01
  p-value t.test for c              0.00
  residual mean square            744.16
  r-squared                         0.99
  adjusted r-squared                0.99
  AIC                             146.40
  BIC                             149.24
  p.value Shapiro-Wilk test         0.28
  coefficient of variation (%)      6.45
  first value most discrepant      14.00
  second value most discrepant     11.00
  third value most discrepant       7.00
  
  $y$`Parameters, standard error and confidence intervals`
    parameter estimate standard error IC 2.5% IC 97.5%
  1         a   702.87          13.94  672.50   733.24
  2         b    84.99          29.82   20.03   149.96
  3         c     0.69           0.06    0.56     0.81
  
  $y$`Whole model test`
             df sum of squares mean squares     Fcal p-value
  regression  2    1067531.713  533765.8567 717.2759  <0.001
  residuals  12       8929.883     744.1569        -       -
  total      14    1076461.596   76890.1140        -       -
  
  $y$`Residuals of logistic model`
   [1]  -0.007233735   2.512849877   6.088229917 -12.447737628   2.481935723
   [6]  29.005170461 -30.105035245  -2.285649474  -9.184047308   5.239661794
  [11]  51.452656098  11.766896517  -5.255960711 -61.445214326  16.485829999
  
  $y$`Residuals standardized of logistic model`
   [1] -0.01164402  0.08814577  0.22972298 -0.50426070  0.08692163  1.13718359
   [7] -1.20345098 -0.10186429 -0.37502572  0.19612154  2.02605481  0.45458573
  [13] -0.21948210 -2.44445337  0.64144515

Fit with easynls

nlsplot(df, model=7, start=c(703, 6.5, 1.5))

Fitting the ruminal degradation model

Initial values

mod_Deg <-nls(y~SSasymp(x,a,b,c), data=df)
summary(mod_Deg)
  
  Formula: y ~ SSasymp(x, a, b, c)
  
  Parameters:
     Estimate Std. Error t value Pr(>|t|)    
  a 1122.4153   319.5884   3.512 0.004286 ** 
  b -186.4505    81.0809  -2.300 0.040229 *  
  c   -2.4405     0.4631  -5.270 0.000198 ***
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  
  Residual standard error: 74.7 on 12 degrees of freedom
  
  Number of iterations to convergence: 0 
  Achieved convergence tolerance: 3.6e-06
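
SSasymp(x, a, b, c) fits the asymptotic regression

\[y = a + (b - a)\,e^{-e^{c}x},\]

with asymptote a, intercept b (the value at x = 0) and log rate constant c. easyreg's model 12 appears to use the Orskov-McDonald form \(y = a + b\,(1 - e^{-cx})\); the two are equivalent, with \(a_{orskov} = b \approx -186\), \(b_{orskov} = a - b \approx 1309\) and \(c_{orskov} = e^{c} = e^{-2.44} \approx 0.09\), which matches the regplot output below.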

Fit with easyreg

regplot(df, model=12, start=c(1122,-186,-2), digits=2,
        position=8, xlab="x", ylab="y")

  $y
  $y$`Ruminal degradation model`
                               estimates
  coefficient a                  -186.45
  coefficient b                  1308.86
  coefficient c                     0.09
  p-value t.test for a              0.04
  p-value t.test for b              0.00
  p-value t.test for c              0.05
  residual mean square           5580.54
  r-squared                         0.94
  adjusted r-squared                0.93
  AIC                             176.63
  BIC                             179.46
  p.value Shapiro-Wilk test         0.46
  coefficient of variation (%)     17.65
  first value most discrepant      11.00
  second value most discrepant      4.00
  third value most discrepant      14.00
  
  $y$`Parameters, standard error and confidence intervals`
    parameter estimate standard error IC 2.5% IC 97.5%
  1         a  -186.45          81.08 -363.11    -9.79
  2         b  1308.86         268.59  723.65  1894.07
  3         c     0.09           0.04    0.00     0.18
  
  $y$`Whole model test`
             df sum of squares mean squares    Fcal p-value
  regression  2      1009495.1   504747.546 90.4478  <0.001
  residuals  12        66966.5     5580.542       -       -
  total      14      1076461.6    76890.114       -       -
  
  $y$`Residuals of ruminal degradation model`
   [1]   93.33134   10.99150  -48.77765 -101.46283  -84.18263  -20.17236
   [7]  -24.24848   50.06729   65.17426   77.20935  104.52392   37.27094
  [13]  -10.71766  -98.30342  -50.70357
  
  $y$`Residuals standardized of ruminal degradation model`
   [1]  1.3494683  0.1589250 -0.7052711 -1.4670408 -1.2171882 -0.2916701
   [7] -0.3506063  0.7239179  0.9423481  1.1163621  1.5113007  0.5388968
  [13] -0.1549655 -1.4213592 -0.7331178

Fit with easynls

nlsplot(df, model=12, start=c(1122,-186,-2))

Fitting the lactation model

Initial values

mod_Lac <-nls(y~SSlogis(x,a,b,c), data=df)
summary(mod_Lac)
  
  Formula: y ~ SSlogis(x, a, b, c)
  
  Parameters:
    Estimate Std. Error t value Pr(>|t|)    
  a 702.8716    13.9397   50.42 2.43e-15 ***
  b   6.4519     0.1432   45.05 9.34e-15 ***
  c   1.4523     0.1210   12.00 4.83e-08 ***
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  
  Residual standard error: 27.28 on 12 degrees of freedom
  
  Number of iterations to convergence: 0 
  Achieved convergence tolerance: 5.403e-06
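
There is no self-starting function for the lactation curve in base R, so the logistic fit above is only a rough source of starting values. Assuming easyreg's model 11 is Wood's curve \(y = a\,x^{b}e^{-cx}\), a log-linearization sketch gives more natural starting values:

# Sketch: linearize Wood's curve, log(y) = log(a) + b*log(x) - c*x,
# and fit it by ordinary least squares to obtain starting values
lin <- lm(log(y) ~ log(x) + x, data = df)
a0 <- unname(exp(coef(lin)[1]))  # a
b0 <- unname(coef(lin)[2])       # b
c0 <- unname(-coef(lin)[3])      # c (note the sign change)
c(a = a0, b = b0, c = c0)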

Fit with easyreg

regplot(df, model=11, start=c(702,6.4,1.5), digits=2,
        position=8, xlab="x", ylab="y")

  $y
  $y$`Lactation model`
                                 estimates
  coefficient a                       1.74
  coefficient b                       3.94
  coefficient c                       0.32
  p-value t.test for a                0.05
  p-value t.test for b                0.00
  p-value t.test for c                0.00
  residual mean square              847.05
  r-squared                           0.99
  adjusted r-squared                  0.99
  AIC                               148.35
  BIC                               151.18
  maximum or minimum value for y    712.11
  critical point in x                12.49
  p.value Shapiro-Wilk test           0.92
  coefficient of variation (%)        6.91
  first value most discrepant        14.00
  second value most discrepant       15.00
  third value most discrepant        11.00
  
  $y$`Parameters, standard error and confidence intervals`
    parameter estimate standard error IC 2.5% IC 97.5%
  1         a     1.74           0.81   -0.01     3.50
  2         b     3.94           0.36    3.17     4.72
  3         c     0.32           0.04    0.24     0.39
  
  $y$`Whole model test`
             df sum of squares mean squares    Fcal p-value
  regression  2     1066296.95   533148.474 629.415  <0.001
  residuals  12       10164.65      847.054       -       -
  total      14     1076461.60    76890.114       -       -
  
  $y$`Residuals of lactation model`
   [1]  14.808987  19.578025  14.390238 -19.361225 -13.359965  19.524737
   [7] -23.864979  13.374251   1.515059   1.633955  34.302751 -10.302449
  [13] -19.900973 -55.883767  53.527246
  
  $y$`Residuals standardized of lactation model`
   [1]  0.47682342  0.65433745  0.46123667 -0.79506654 -0.57168656  0.65235397
   [7] -0.96270611  0.42341941 -0.01800562 -0.01358007  1.20242394 -0.45787910
  [13] -0.81515715 -2.15451542  1.91800169

Fit with easynls

nlsplot(df, model=11, start=c(702, 6.4, 1.5))

Fitting the Michaelis-Menten model

Searching for initial values

mod_micmen <-nls(y~SSmicmen(x,a,b), data=df)
summary(mod_micmen)
  
  Formula: y ~ SSmicmen(x, a, b)
  
  Parameters:
    Estimate Std. Error t value Pr(>|t|)
  a   7382.3    15359.5   0.481    0.639
  b    124.4      282.6   0.440    0.667
  
  Residual standard error: 91 on 13 degrees of freedom
  
  Number of iterations to convergence: 0 
  Achieved convergence tolerance: 2.991e-06
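
SSmicmen(x, a, b) is exactly the Michaelis-Menten form \(y = a\,x/(b + x)\) (Vm and K), so the manual nls fit below reproduces essentially the same estimates. The very large standard errors already suggest that a hyperbolic curve is a poor description of these sigmoidal data.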

Manual fit with nls

MM<-nls(y~(a*x)/(b+x),start=list(a=7382, b=124), data=df)
summary(MM)
  
  Formula: y ~ (a * x)/(b + x)
  
  Parameters:
    Estimate Std. Error t value Pr(>|t|)
  a   7382.6    15360.7   0.481    0.639
  b    124.4      282.6   0.440    0.667
  
  Residual standard error: 91 on 13 degrees of freedom
  
  Number of iterations to convergence: 5 
  Achieved convergence tolerance: 3.855e-06

The exercise asks for the AIC value

AIC(MM)
  [1] 181.7487

Summary of model fits

Modelos <-c("Gompertz", "Logístico", "Degradación ruminal",
            "Curva de lactación", "Michaelis-Menten")
AIC<-c("152.72", "146.40", "176.63", "148.35", "181.75")
tabla<-data.frame(Modelos, AIC)
flextable(tabla)

  Modelos               AIC
  Gompertz              152.72
  Logístico             146.40
  Degradación ruminal   176.63
  Curva de lactación    148.35
  Michaelis-Menten      181.75
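
The same summary could also be assembled programmatically from the fitted objects. A sketch, assuming the fits above are still in the workspace and taking easyreg's model 11 to be Wood's curve \(y = a\,x^{b}e^{-cx}\) (the lactation curve is refit here with nls because mod_Lac above is actually a logistic fit):

mod_Wood <- nls(y ~ a * x^b * exp(-c * x), data = df,
                start = list(a = 1.74, b = 3.94, c = 0.32))
tabla2 <- data.frame(Modelos,
                     AIC = round(c(AIC(mod_Gomp), AIC(mod_Log), AIC(mod_Deg),
                                   AIC(mod_Wood), AIC(MM)), 2))
flextable(tabla2)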

Comments

Of the models evaluated, the lowest AIC (Akaike information criterion) was obtained by the logistic model. The data are therefore fitted with that model: \[y = 702.87\,(1 + 84.99\,e^{-0.69x})^{-1}\]
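
As a quick visual check, the chosen curve can be drawn over the data (a sketch using the estimates above):

plot(df, xlab = "x", ylab = "y")
curve(702.87 / (1 + 84.99 * exp(-0.69 * x)), add = TRUE)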

Clear the workspace

rm(list = ls())

Exercise 2

Data are presented on the relationship between antibiotic dose and the percentage of damage caused by a bacterial disease.

df<-as.data.frame(read.xlsx("D:/ACTIVIDAD DE TITULACION I (2023)/REG/regnolineal.xlsx",
  sheet=2))

Data table

flextable(df)

  Dosis   Daño
      1   98.2
      2   91.7
      5   81.3
     10   64.0
     20   36.4
     30   32.6
     40   17.1
     50   11.3

Visualize the trend in the data

plot(df)

An exponential-type curve will be fitted.

Linearization method

Linearizing the exponential model

\[y = a\,e^{bx}\]

Taking logarithms:

\[\log(y) = \log(a) + bx\]

Linearization to estimate the initial parameters

Using a linear model

model <- lm(log(Daño) ~ Dosis, data=df)  
alpha <- exp(coef(model))[1]
beta <- coef(model)[2]

Initial parameters

start <- list(alpha = alpha, beta = beta)
start
  $alpha
  (Intercept) 
     99.94084 
  
  $beta
        Dosis 
  -0.04323612

a = 99.94 and b = -0.04
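
As a cross-check, the exponential model can also be fitted directly with nls from these starting values (a sketch; regplot below performs the equivalent fit through easyreg):

mod_exp <- nls(Daño ~ a * exp(b * Dosis), data = df,
               start = list(a = 99.94, b = -0.04))
summary(mod_exp)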

Fitting the exponential model

regplot(df, model=6, start=c(99.94, -0.04), digits=2, 
        position=8, xlab="Dosis", ylab="Daño")

  $Daño
  $Daño$`Exponential model`
                               estimates
  coefficient a                   100.98
  coefficient b                    -0.04
  p-value t.test for a              0.00
  p-value t.test for b              0.00
  residual mean square             10.97
  r-squared                         0.99
  adjusted r-squared                0.99
  AIC                              45.56
  BIC                              45.80
  p.value Shapiro-Wilk test         0.32
  coefficient of variation (%)      6.14
  first value most discrepant       6.00
  second value most discrepant      5.00
  third value most discrepant       1.00
  
  $Daño$`Parameters, standard error and confidence intervals`
    parameter estimate standard error IC 2.5% IC 97.5%
  1         a   100.98           2.35   95.23   106.72
  2         b    -0.04           0.00   -0.05    -0.04
  
  $Daño$`Whole model test`
             df sum of squares mean squares     Fcal p-value
  regression  1      8106.9905    8106.9905 739.1889  <0.001
  residuals   6        65.8045      10.9674        -       -
  total       7      8172.7950    1167.5421        -       -
  
  $Daño$`Residuals of exponential model`
  [1]  1.59466150 -0.72222175  0.37136086 -0.86025119 -5.26116437  5.84011880
  [7] -0.08845963  0.25947474
  
  $Daño$`Residuals standardized of exponential model`
  [1]  0.47447033 -0.28211179  0.07499945 -0.32718554 -1.76431113  1.86083154
  [7] -0.07515571  0.03846285
nlsplot(df, model=6, start=c(99.94, -0.04))

Fitted model:

\[y = 100.9778\,e^{-0.0443x}\]
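
As a quick check of the fitted equation, the predicted damage at an intermediate dose, say a dose of 25, is

\[y(25) = 100.9778\,e^{-0.0443 \times 25} \approx 33.4.\]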

Clear the workspace

rm(list = ls())

Exercise 3

Try fitting the data in the table to:
1. A linear function \((y = a + bx)\)
2. A parabolic function \((y = a + bx + cx^2)\)
3. An exponential function \((y = ab^x)\)

Exercise data

df<-as.data.frame(read.xlsx("D:/ACTIVIDAD DE TITULACION I (2023)/REG/regnolineal.xlsx",
  sheet=3))

Visualize the trend of the points

plot(df)

Linear fit

regplot(df, model=1, digits=2, position=8, 
        xlab="X", ylab="Y")

  $y
  $y$`Linear model`
                               estimates
  coefficient a                    -8.45
  coefficient b                     7.35
  p-value t.test for a              0.05
  p-value t.test for b              0.00
  residual mean square              6.12
  r-squared                         0.97
  adjusted r-squared                0.96
  AIC                              26.69
  BIC                              25.52
  p.value Shapiro-Wilk test         0.20
  coefficient of variation (%)     18.19
  first value most discrepant       3.00
  second value most discrepant      1.00
  third value most discrepant       5.00
  
  $y$`Parameters, standard error and confidence intervals`
    parameter estimate standard error IC 2.5% IC 97.5%
  1         a    -8.45           2.59  -16.70    -0.20
  2         b     7.35           0.78    4.86     9.84
  
  $y$`Whole model test`
             df sum of squares mean squares    Fcal p-value
  regression  1        540.225     540.2250 88.3202  0.0026
  residuals   3         18.350       6.1167       -       -
  total       4        558.575     139.6437       -       -
  
  $y$`Residuals of linear model`
  [1]  2.35 -1.25 -2.35 -0.95  2.20
  
  $y$`Residuals standardized of linear model`
           1          2          3          4          5 
   1.0971849 -0.5836090 -1.0971849 -0.4435428  1.0271518
nlsplot(df, model=1)

Parabolic (quadratic) fit

regplot(df, model=2, digits=2, position=8, 
        xlab="X", ylab="Y")

  $y
  $y$`Quadratic model`
                                 estimates
  coefficient a                      -0.45
  coefficient b                       0.49
  coefficient c                       1.14
  p-value t.test for a                0.36
  p-value t.test for b                0.23
  p-value t.test for c                0.00
  residual mean square                0.03
  r-squared                           1.00
  adjusted r-squared                  1.00
  AIC                                 0.42
  BIC                                -1.14
  maximum or minimum value for y     -0.50
  critical point in x                -0.22
  p.value Shapiro-Wilk test           0.24
  coefficient of variation (%)        1.32
  first value most discrepant         4.00
  second value most discrepant        2.00
  third value most discrepant         5.00
  
  $y$`Parameters, standard error and confidence intervals`
    parameter estimate standard error IC 2.5% IC 97.5%
  1         a    -0.45           0.38   -2.10     1.20
  2         b     0.49           0.29   -0.77     1.75
  3         c     1.14           0.05    0.94     1.35
  
  $y$`Whole model test`
             df sum of squares mean squares      Fcal p-value
  regression  2       558.5107     279.2554 8687.9444  <0.001
  residuals   2         0.0643       0.0321         -       -
  total       4       558.5750     139.6437         -       -
  
  $y$`Residuals of quadratic model`
  [1]  0.06428571 -0.10714286 -0.06428571  0.19285714 -0.08571429
  
  $y$`Residuals standardized of quadratic model`
           1          2          3          4          5 
   0.5070926 -0.8451543 -0.5070926  1.5212777 -0.6761234
nlsplot(df, model=2)

Fitting the exponential model

Linearization to estimate the initial parameters

model <- lm(log(y) ~ x, data=df)  
alpha <- exp(coef(model))[1]
beta <- coef(model)[2]

Initial parameters

start <- list(alpha = alpha, beta = beta)
start
  $alpha
  (Intercept) 
    0.8192578 
  
  $beta
          x 
  0.7775461

a = 0.82 and b = 0.78. Note that model 6 fits \(y = a\,e^{bx}\); in the \(y = ab^x\) form stated in the exercise this corresponds to \(b = e^{0.78} \approx 2.18\).

Exponential fit

Using the initial parameters obtained

regplot(df, model=6, start=c(0.82, 0.78), digits=2, 
        position=8, xlab="X", ylab="Y")

  $y
  $y$`Exponential model`
                               estimates
  coefficient a                     2.02
  coefficient b                     0.55
  p-value t.test for a              0.04
  p-value t.test for b              0.00
  residual mean square              3.68
  r-squared                         0.98
  adjusted r-squared                0.97
  AIC                              24.14
  BIC                              22.97
  p.value Shapiro-Wilk test         0.77
  coefficient of variation (%)     13.79
  first value most discrepant       1.00
  second value most discrepant      4.00
  third value most discrepant       2.00
  
  $y$`Parameters, standard error and confidence intervals`
    parameter estimate standard error IC 2.5% IC 97.5%
  1         a     2.02           0.58    0.19     3.85
  2         b     0.55           0.06    0.35     0.75
  
  $y$`Whole model test`
             df sum of squares mean squares     Fcal p-value
  regression  1       547.5498     547.5498 148.9902  0.0012
  residuals   3        11.0252       3.6751        -       -
  total       4       558.5750     139.6437        -       -
  
  $y$`Residuals of exponential model`
  [1] -2.2416175 -1.0460222  0.7808165  1.8717500 -0.8905520
  
  $y$`Residuals standardized of exponential model`
  [1] -1.1918451 -0.4559969  0.6683600  1.3397923 -0.3603103
nlsplot(df, model=6, start=c(0.82, 0.78))

Summary of CME (residual mean square) and AIC

resumen = data.frame(
"MODELO" = c("Función lineal", "Función parabólica", "Función exponencial"),
"CME" = c("5.76", "0.06", "3.38"), 
"AIC" = c("26.39", "3.30", "24.34"))
flextable(resumen)

  MODELO                 CME    AIC
  Función lineal        5.76  26.39
  Función parabólica    0.06   3.30
  Función exponencial   3.38  24.34
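
For comparison, the same kind of criterion can be computed directly in base R (a sketch; the values may differ slightly from the hand-entered table above because they come from lm/nls rather than from the package output):

fit_lin  <- lm(y ~ x, data = df)
fit_quad <- lm(y ~ x + I(x^2), data = df)
fit_exp  <- nls(y ~ a * exp(b * x), data = df, start = list(a = 0.82, b = 0.78))
sapply(list(linear = fit_lin, quadratic = fit_quad, exponential = fit_exp), AIC)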

Which of the three models shows the best fit?