load("Dataset_AI.RData")
ls()
## [1] "Dataset_AI"

1. Dataset description and preprocessing:

In this section we describe how the data were preprocessed and select the variables used in the analysis. We introduce our dataset and justify the choice of main and complementary variables in light of our research question.

The file containing our dataset was loaded as follows:

library(tidyverse)  # provides read_csv(), glimpse(), ggplot() and %>%
dades <- read_csv("Dataset_AI.csv")
## Rows: 5000 Columns: 26
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (5): agent_id, agent_type, model_architecture, deployment_environment,...
## dbl  (17): task_complexity, autonomy_level, success_rate, accuracy_score, ef...
## lgl   (3): human_intervention_required, multimodal_capability, edge_compatib...
## dttm  (1): timestamp
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Dataset dimensions:

glimpse(dades)
## Rows: 5,000
## Columns: 26
## $ agent_id                    <chr> "AG_01012", "AG_00758", "AG_00966", "AG_00…
## $ agent_type                  <chr> "Project Manager", "Marketing Assistant", …
## $ model_architecture          <chr> "PaLM-2", "Mixtral-8x7B", "Mixtral-8x7B", …
## $ deployment_environment      <chr> "Server", "Hybrid", "Server", "Hybrid", "E…
## $ task_category               <chr> "Text Processing", "Decision Making", "Com…
## $ task_complexity             <dbl> 5, 6, 2, 8, 3, 5, 4, 8, 8, 7, 5, 3, 4, 5, …
## $ autonomy_level              <dbl> 3, 5, 4, 6, 4, 7, 2, 8, 8, 6, 7, 5, 5, 6, …
## $ success_rate                <dbl> 0.4788, 0.4833, 0.8116, 0.3574, 0.5706, 0.…
## $ accuracy_score              <dbl> 0.6455, 0.5660, 0.8395, 0.4888, 0.7137, 0.…
## $ efficiency_score            <dbl> 0.6573, 0.5844, 0.7650, 0.4742, 0.7209, 0.…
## $ execution_time_seconds      <dbl> 22.42, 9.30, 10.37, 43.85, 23.02, 11.04, 1…
## $ response_latency_ms         <dbl> 383.35, 127.38, 2185.27, 1847.43, 254.30, …
## $ memory_usage_mb             <dbl> 308.9, 372.4, 183.3, 488.2, 278.4, 346.3, …
## $ cpu_usage_percent           <dbl> 53.1, 84.9, 45.9, 75.3, 15.1, 66.5, 62.6, …
## $ cost_per_task_cents         <dbl> 0.0106, 0.0068, 0.0053, 0.0195, 0.0105, 0.…
## $ human_intervention_required <lgl> TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE,…
## $ error_recovery_rate         <dbl> 0.4999, 0.5580, 0.9196, 0.3809, 0.6717, 0.…
## $ multimodal_capability       <lgl> FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, F…
## $ edge_compatibility          <lgl> FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FA…
## $ privacy_compliance_score    <dbl> 0.9390, 0.8281, 0.7450, 0.9653, 0.9042, 0.…
## $ bias_detection_score        <dbl> 0.8061, 0.7816, 0.8214, 0.8684, 0.8417, 0.…
## $ timestamp                   <dttm> 2024-12-24 04:16:15, 2024-12-24 04:16:15,…
## $ data_quality_score          <dbl> 0.9510, 0.7822, 0.7621, 0.8117, 0.7762, 0.…
## $ performance_index           <dbl> 0.58236, 0.53844, 0.80599, 0.43186, 0.6586…
## $ cost_efficiency_ratio       <dbl> 50.203448, 69.030769, 127.934921, 21.06634…
## $ autonomous_capability_score <dbl> 64.993, 89.060, 124.372, 86.663, 87.019, 1…
dim(dades)
## [1] 5000   26
head(dades)
## # A tibble: 6 × 26
##   agent_id agent_type    model_architecture deployment_environment task_category
##   <chr>    <chr>         <chr>              <chr>                  <chr>        
## 1 AG_01012 Project Mana… PaLM-2             Server                 Text Process…
## 2 AG_00758 Marketing As… Mixtral-8x7B       Hybrid                 Decision Mak…
## 3 AG_00966 QA Tester     Mixtral-8x7B       Server                 Communication
## 4 AG_00480 Code Assista… CodeT5+            Hybrid                 Creative Wri…
## 5 AG_01050 QA Tester     Falcon-180B        Edge                   Planning & S…
## 6 AG_00037 Email Manager Transformer-XL     Edge                   Communication
## # ℹ 21 more variables: task_complexity <dbl>, autonomy_level <dbl>,
## #   success_rate <dbl>, accuracy_score <dbl>, efficiency_score <dbl>,
## #   execution_time_seconds <dbl>, response_latency_ms <dbl>,
## #   memory_usage_mb <dbl>, cpu_usage_percent <dbl>, cost_per_task_cents <dbl>,
## #   human_intervention_required <lgl>, error_recovery_rate <dbl>,
## #   multimodal_capability <lgl>, edge_compatibility <lgl>,
## #   privacy_compliance_score <dbl>, bias_detection_score <dbl>, …

Variable dictionary:

| Variable | Type | Description | Possible values / range |
|---|---|---|---|
| agent_id | categorical | Agent identifier | String code |
| agent_type | categorical | Role or type of the agent | Text |
| model_architecture | categorical | Architecture/model used by the agent | Text |
| deployment_environment | categorical | Deployment environment | Text (e.g. Server, Hybrid, Edge) |
| task_category | categorical | Category of the task performed | Text |
| task_complexity | numeric | Complexity level of the task | Int 1-10 |
| autonomy_level | numeric | Autonomy level granted to the agent | Int 1-10 |
| success_rate | numeric | Task success rate (as a proportion) | Float 0.0-1.0 |
| accuracy_score | numeric | Accuracy measure for the task | Float 0.0-1.0 |
| efficiency_score | numeric | Efficiency measure | Float 0.0-1.0 |
| execution_time_seconds | numeric | Execution time in seconds | Float > 0.0 |
| response_latency_ms | numeric | Response latency in milliseconds | Float > 0.0 |
| memory_usage_mb | numeric | Memory consumed during the task, in MB | Float > 0.0 |
| cpu_usage_percent | numeric | CPU usage during the task, in percent | Float 0.0-100.0 |
| cost_per_task_cents | numeric | Cost of the task in cents | Float >= 0.0 |
| human_intervention_required | categorical | Whether human intervention was needed during the task | Bool TRUE / FALSE |
| error_recovery_rate | numeric | Error recovery success rate (as a proportion) | Float 0.0-1.0 |
| multimodal_capability | categorical | Whether the agent is multimodal | Bool TRUE / FALSE |
| edge_compatibility | categorical | Whether the agent can run on an edge device | Bool TRUE / FALSE |
| privacy_compliance_score | numeric | Privacy compliance score | Float 0.0-1.0 |
| bias_detection_score | numeric | Bias detection score | Float 0.0-1.0 |
| timestamp | date-time | Timestamp of the execution | Date and time |
| data_quality_score | numeric | Quality score of the input data | Float 0.0-1.0 |
| performance_index | numeric | Composite performance index | Float 0.0-1.0 |
| cost_efficiency_ratio | numeric | Cost / efficiency ratio | Float |
| autonomous_capability_score | numeric | Composite score of autonomous capabilities | Float, approximately 0.0-200.0 |

The main variables we will use are the following:

| Variable | Type | Description | Possible values / range |
|---|---|---|---|
| success_rate | numeric | Task success rate (as a proportion) | Float 0.0-1.0 |
| model_architecture | categorical | Architecture/model used by the agent | Text |
| task_category | categorical | Category of the task performed | Text |
| task_complexity | numeric | Complexity level of the task | Int 1-10 |
| data_quality_score | numeric | Quality score of the input data | Float 0.0-1.0 |
| autonomy_level | numeric | Autonomy level granted to the agent | Int 1-10 |
| agent_type | categorical | Role or type of the agent | Text |

  ggplot(dades, aes(x = success_rate)) +
    geom_histogram(bins = 30) +
    labs(title = "Distribution of the success rate")

The complementary variables are:

| Variable | Type | Description | Possible values / range |
|---|---|---|---|
| human_intervention_required | categorical | Whether human intervention was needed during the task | Bool TRUE / FALSE |
| performance_index | numeric | Composite performance index | Float 0.0-1.0 |
| accuracy_score | numeric | Accuracy measure for the task | Float 0.0-1.0 |
| efficiency_score | numeric | Efficiency measure | Float 0.0-1.0 |
  dades_log <- dades %>%
    select(
      human_intervention_required,
      task_complexity,
      data_quality_score,
      autonomy_level,
      task_category
    ) %>%
    drop_na()
  
  dades_log$human_intervention_required <- 
    factor(dades_log$human_intervention_required, levels = c(FALSE, TRUE))
  tibble(
    variable = names(dades),
    # class() can return several values (e.g. POSIXct/POSIXt for timestamp),
    # so keep only the first one to obtain a plain character column
    tipus = vapply(dades, function(x) class(x)[1], character(1), USE.NAMES = FALSE)
  )
## # A tibble: 26 × 2
##    variable               tipus    
##    <chr>                  <chr>    
##  1 agent_id               character
##  2 agent_type             character
##  3 model_architecture     character
##  4 deployment_environment character
##  5 task_category          character
##  6 task_complexity        numeric  
##  7 autonomy_level         numeric  
##  8 success_rate           numeric  
##  9 accuracy_score         numeric  
## 10 efficiency_score       numeric  
## # ℹ 16 more rows
  dades <- dades %>%
  mutate(
  model_architecture = as.factor(model_architecture),
  task_category = as.factor(task_category),
  agent_type = as.factor(agent_type),
  human_intervention_required = as.factor(human_intervention_required)
  )
  summary(dades$model_architecture)
##     Claude-3.5        CodeT5+    Falcon-180B     Gemini-Pro         GPT-4o 
##            512            467            511            481            494 
##    InstructGPT        LLaMA-3   Mixtral-8x7B         PaLM-2 Transformer-XL 
##            540            479            502            484            530
  summary(dades$task_category)
##          Code Generation            Communication         Creative Writing 
##                      476                      563                      475 
##            Data Analysis          Decision Making    Learning & Adaptation 
##                      512                      471                      492 
##    Planning & Scheduling          Problem Solving Research & Summarization 
##                      489                      523                      471 
##          Text Processing 
##                      528
  summary(dades %>%
  select(success_rate, task_complexity, data_quality_score, autonomy_level))
##   success_rate    task_complexity  data_quality_score autonomy_level  
##  Min.   :0.3000   Min.   : 2.000   Min.   :0.7500     Min.   : 1.000  
##  1st Qu.:0.3390   1st Qu.: 4.000   1st Qu.:0.8064     1st Qu.: 4.000  
##  Median :0.4701   Median : 6.000   Median :0.8625     Median : 6.000  
##  Mean   :0.4907   Mean   : 6.083   Mean   :0.8637     Mean   : 6.031  
##  3rd Qu.:0.6133   3rd Qu.: 8.000   3rd Qu.:0.9222     3rd Qu.: 8.000  
##  Max.   :0.9765   Max.   :10.000   Max.   :0.9799     Max.   :10.000
  colSums(is.na(dades %>%
  select(success_rate, task_complexity, data_quality_score,
  autonomy_level, model_architecture, task_category)))
##       success_rate    task_complexity data_quality_score     autonomy_level 
##                  0                  0                  0                  0 
## model_architecture      task_category 
##                  0                  0

2. Statistical methodology used:

In this block we formulate the hypotheses associated with the research question and describe the methods used to analyse the relationship between the performance of the tasks carried out by the agents and several explanatory variables.

Hypotheses: the main objective is to determine whether there is a statistically significant relationship between the success rate of a task and the selected explanatory variables. Formally, the null hypothesis is that all slope coefficients of the model below are zero, and the alternative is that at least one of them is nonzero.

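$$
\text{success\_rate}_i = \beta_0 + \beta_1 \cdot \text{task\_complexity}_i + \beta_2 \cdot \text{data\_quality\_score}_i + \beta_3 \cdot \text{autonomy\_level}_i + \sum_k \gamma_k \cdot \text{model\_architecture}_{ik} + \sum_j \delta_j \cdot \text{task\_category}_{ij} + \varepsilon_i
$$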

where $\varepsilon_i$ is the random error term, and $\beta$, $\gamma$ and $\delta$ are the model parameters to be estimated.
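The two sums in the model run over indicator (dummy) variables, one per factor level except the reference level. In the fitted model this is why Claude-3.5 and Code Generation, the first levels alphabetically, have no coefficient of their own. A minimal sketch of this expansion, using a toy data frame whose values and level names ("A", "B", "C") are hypothetical and not taken from Dataset_AI.csv:

```r
# Toy data (hypothetical values, only to illustrate the dummy coding)
toy <- data.frame(
  success_rate       = c(0.81, 0.47, 0.36, 0.57, 0.62, 0.44),
  task_complexity    = c(2, 5, 8, 3, 4, 6),
  model_architecture = factor(c("A", "B", "C", "A", "B", "C"))
)

# model.matrix() builds the design matrix that lm() would use:
# an intercept column, the numeric predictor, and K - 1 indicator columns
X <- model.matrix(success_rate ~ task_complexity + model_architecture, data = toy)
print(colnames(X))

# The first factor level is the reference category and gets no column
reference <- levels(toy$model_architecture)[1]
print(reference)
```

Each estimated gamma or delta is therefore a difference in expected success rate with respect to the reference level, holding the other predictors fixed.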

  model_null <- lm(success_rate ~ 1, data = dades)
  model_lm <- lm(
  success_rate ~ task_complexity + data_quality_score + autonomy_level +
  model_architecture + task_category,
  data = dades
  )
  summary(model_lm)
## 
## Call:
## lm(formula = success_rate ~ task_complexity + data_quality_score + 
##     autonomy_level + model_architecture + task_category, data = dades)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.166004 -0.038288 -0.000085  0.037144  0.187791 
## 
## Coefficients:
##                                         Estimate Std. Error t value Pr(>|t|)
## (Intercept)                            0.9208472  0.0108319  85.013  < 2e-16
## task_complexity                       -0.0691868  0.0007167 -96.533  < 2e-16
## data_quality_score                     0.0162957  0.0113475   1.436 0.151047
## autonomy_level                        -0.0013906  0.0005613  -2.477 0.013269
## model_architectureCodeT5+             -0.0123031  0.0034320  -3.585 0.000341
## model_architectureFalcon-180B         -0.0359368  0.0033512 -10.723  < 2e-16
## model_architectureGemini-Pro          -0.0033894  0.0034050  -0.995 0.319574
## model_architectureGPT-4o               0.0084928  0.0033799   2.513 0.012011
## model_architectureInstructGPT         -0.0049410  0.0033075  -1.494 0.135277
## model_architectureLLaMA-3             -0.0167129  0.0034089  -4.903 9.75e-07
## model_architectureMixtral-8x7B        -0.0352220  0.0033671 -10.461  < 2e-16
## model_architecturePaLM-2              -0.0199491  0.0033975  -5.872 4.59e-09
## model_architectureTransformer-XL      -0.0434424  0.0033203 -13.084  < 2e-16
## task_categoryCommunication             0.0083433  0.0034520   2.417 0.015689
## task_categoryCreative Writing         -0.0060577  0.0034802  -1.741 0.081806
## task_categoryData Analysis            -0.0157560  0.0034194  -4.608 4.17e-06
## task_categoryDecision Making           0.0020276  0.0035656   0.569 0.569613
## task_categoryLearning & Adaptation     0.0171235  0.0036854   4.646 3.47e-06
## task_categoryPlanning & Scheduling    -0.0023741  0.0034758  -0.683 0.494628
## task_categoryProblem Solving           0.0133700  0.0035589   3.757 0.000174
## task_categoryResearch & Summarization -0.0049663  0.0034934  -1.422 0.155196
## task_categoryText Processing           0.0019155  0.0034533   0.555 0.579139
##                                          
## (Intercept)                           ***
## task_complexity                       ***
## data_quality_score                       
## autonomy_level                        *  
## model_architectureCodeT5+             ***
## model_architectureFalcon-180B         ***
## model_architectureGemini-Pro             
## model_architectureGPT-4o              *  
## model_architectureInstructGPT            
## model_architectureLLaMA-3             ***
## model_architectureMixtral-8x7B        ***
## model_architecturePaLM-2              ***
## model_architectureTransformer-XL      ***
## task_categoryCommunication            *  
## task_categoryCreative Writing         .  
## task_categoryData Analysis            ***
## task_categoryDecision Making             
## task_categoryLearning & Adaptation    ***
## task_categoryPlanning & Scheduling       
## task_categoryProblem Solving          ***
## task_categoryResearch & Summarization    
## task_categoryText Processing             
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05357 on 4978 degrees of freedom
## Multiple R-squared:  0.8867, Adjusted R-squared:  0.8863 
## F-statistic:  1856 on 21 and 4978 DF,  p-value: < 2.2e-16
  anova(model_null, model_lm)
## Analysis of Variance Table
## 
## Model 1: success_rate ~ 1
## Model 2: success_rate ~ task_complexity + data_quality_score + autonomy_level + 
##     model_architecture + task_category
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
## 1   4999 126.105                                  
## 2   4978  14.284 21    111.82 1855.7 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  SQT <- sum(residuals(model_null)^2)
  SQE <- sum(residuals(model_lm)^2)
  
  n <- nrow(dades)
  p <- length(coef(model_lm)) - 1
  MSE <- SQE / (n - p - 1)
  
  R2_manual <- 1 - SQE / SQT
  SQT; SQE; MSE; R2_manual
## [1] 126.1052
## [1] 14.28389
## [1] 0.002869403
## [1] 0.8867304
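As a consistency check, the F statistic and the adjusted R-squared reported by `summary(model_lm)` and `anova()` can be recovered from the sums of squares alone. The sketch below hard-codes the values printed above instead of reading them from the fitted objects, so it only illustrates the arithmetic; `F_manual` and `R2_adj` are names introduced here.

```r
# Values copied from the printed output above
SQT <- 126.1052   # residual sum of squares of the null model
SQE <- 14.28389   # residual sum of squares of the full model
n   <- 5000       # number of observations
p   <- 21         # number of slope coefficients

# F statistic: explained mean square over residual mean square
F_manual <- ((SQT - SQE) / p) / (SQE / (n - p - 1))

# Adjusted R-squared: penalises R-squared by the degrees of freedom used
R2_adj <- 1 - (SQE / (n - p - 1)) / (SQT / (n - 1))

print(round(F_manual, 1))
print(round(R2_adj, 4))
```

Both values should agree, up to rounding, with the F statistic (1855.7) and the adjusted R-squared (0.8863) in the tables above.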

Logistic regression (GLM):

  taula <- table(dades$model_architecture,
                 dades$human_intervention_required)
  chisq.test(taula)
## 
##  Pearson's Chi-squared test
## 
## data:  taula
## X-squared = 56.309, df = 9, p-value = 6.851e-09
  model_logistic <- glm(
    human_intervention_required ~ 
      task_complexity + 
      data_quality_score + 
      autonomy_level + 
      task_category,
    data = dades_log,
    family = binomial(link = "logit")
  )
  
  summary(model_logistic)
## 
## Call:
## glm(formula = human_intervention_required ~ task_complexity + 
##     data_quality_score + autonomy_level + task_category, family = binomial(link = "logit"), 
##     data = dades_log)
## 
## Coefficients:
##                                         Estimate Std. Error z value Pr(>|z|)
## (Intercept)                            -6.975513   1.037510  -6.723 1.78e-11
## task_complexity                         2.543080   0.131155  19.390  < 2e-16
## data_quality_score                     -1.823304   1.085913  -1.679   0.0931
## autonomy_level                         -0.008058   0.051528  -0.156   0.8757
## task_categoryCommunication             -0.088826   0.243698  -0.364   0.7155
## task_categoryCreative Writing           0.416556   0.272019   1.531   0.1257
## task_categoryData Analysis              0.594078   0.375564   1.582   0.1137
## task_categoryDecision Making            0.504060   1.038786   0.485   0.6275
## task_categoryLearning & Adaptation      9.630398 674.886946   0.014   0.9886
## task_categoryPlanning & Scheduling      0.386765   0.256253   1.509   0.1312
## task_categoryProblem Solving           12.033919 620.259191   0.019   0.9845
## task_categoryResearch & Summarization  -0.276712   0.316457  -0.874   0.3819
## task_categoryText Processing            0.464451   0.261025   1.779   0.0752
##                                          
## (Intercept)                           ***
## task_complexity                       ***
## data_quality_score                    .  
## autonomy_level                           
## task_categoryCommunication               
## task_categoryCreative Writing            
## task_categoryData Analysis               
## task_categoryDecision Making             
## task_categoryLearning & Adaptation       
## task_categoryPlanning & Scheduling       
## task_categoryProblem Solving             
## task_categoryResearch & Summarization    
## task_categoryText Processing          .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3697.1  on 4999  degrees of freedom
## Residual deviance: 1264.5  on 4987  degrees of freedom
## AIC: 1290.5
## 
## Number of Fisher Scoring iterations: 19
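The very large standard errors for Learning & Adaptation and Problem Solving (about 675 and 620), together with the 19 Fisher scoring iterations, are a typical symptom of (quasi-)complete separation: within a category, nearly all tasks share the same outcome. We have not verified this on the data. A minimal sketch of how it could be checked, on a toy data frame with hypothetical categories and outcomes:

```r
# Toy data (hypothetical): in category "C" every task required intervention
toy <- data.frame(
  category = rep(c("A", "B", "C"), each = 4),
  outcome  = c(TRUE, FALSE, TRUE, FALSE,
               TRUE, FALSE, FALSE, TRUE,
               TRUE, TRUE, TRUE, TRUE)
)

# Cross-tabulate category against outcome
tab <- table(toy$category, toy$outcome)
print(tab)

# Categories with an empty cell cannot have a finite maximum likelihood estimate
separated <- rownames(tab)[apply(tab == 0, 1, any)]
print(separated)
```

On the real data the analogous table would be `table(dades_log$task_category, dades_log$human_intervention_required)`. If empty cells appear, the coefficients of those categories should not be interpreted at face value.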

3. Checking and discussion of the assumptions:

We now check whether the model satisfies the assumptions we have made, so that the inference is valid. The analysis relies on several plots and their interpretation.

  plot(model_lm$fitted.values, resid(model_lm),
       xlab = "Fitted values",
       ylab = "Residuals",
       main = "Residuals vs fitted values")
  abline(h = 0, col = "red")

  plot(model_lm, which = 3)

  library(lmtest)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
  bptest(model_lm)
## 
##  studentized Breusch-Pagan test
## 
## data:  model_lm
## BP = 68.965, df = 21, p-value = 5.144e-07

Histogram of the residuals:

  hist(resid(model_lm),
       breaks = 30,
       main = "Histogram of the residuals",
       xlab = "Residuals")

Q-Q plot:

  qqnorm(resid(model_lm))
  qqline(resid(model_lm), col = "red")

Shapiro-Wilk:

  shapiro.test(resid(model_lm))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(model_lm)
## W = 0.99808, p-value = 7.688e-06
  dades_log <- dades_log %>%
    mutate(prob_intervencio = predict(model_logistic, type = "response"))
  ggplot(dades_log, aes(task_complexity, prob_intervencio)) +
    geom_point(alpha = 0.3) +
    geom_smooth(method = "loess") +
    labs(
      title = "Probability of human intervention by task complexity",
      x = "Complexity",
      y = "Probability of intervention"
    )
## `geom_smooth()` using formula = 'y ~ x'

4. Results (tables, figures, statistics):

We now analyse the results obtained from the multiple linear regression model and from the logistic model. We want to evaluate the effect of the explanatory variables on the success rate of the tasks carried out by the agents.

  summary(model_lm)
## 
## Call:
## lm(formula = success_rate ~ task_complexity + data_quality_score + 
##     autonomy_level + model_architecture + task_category, data = dades)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.166004 -0.038288 -0.000085  0.037144  0.187791 
## 
## Coefficients:
##                                         Estimate Std. Error t value Pr(>|t|)
## (Intercept)                            0.9208472  0.0108319  85.013  < 2e-16
## task_complexity                       -0.0691868  0.0007167 -96.533  < 2e-16
## data_quality_score                     0.0162957  0.0113475   1.436 0.151047
## autonomy_level                        -0.0013906  0.0005613  -2.477 0.013269
## model_architectureCodeT5+             -0.0123031  0.0034320  -3.585 0.000341
## model_architectureFalcon-180B         -0.0359368  0.0033512 -10.723  < 2e-16
## model_architectureGemini-Pro          -0.0033894  0.0034050  -0.995 0.319574
## model_architectureGPT-4o               0.0084928  0.0033799   2.513 0.012011
## model_architectureInstructGPT         -0.0049410  0.0033075  -1.494 0.135277
## model_architectureLLaMA-3             -0.0167129  0.0034089  -4.903 9.75e-07
## model_architectureMixtral-8x7B        -0.0352220  0.0033671 -10.461  < 2e-16
## model_architecturePaLM-2              -0.0199491  0.0033975  -5.872 4.59e-09
## model_architectureTransformer-XL      -0.0434424  0.0033203 -13.084  < 2e-16
## task_categoryCommunication             0.0083433  0.0034520   2.417 0.015689
## task_categoryCreative Writing         -0.0060577  0.0034802  -1.741 0.081806
## task_categoryData Analysis            -0.0157560  0.0034194  -4.608 4.17e-06
## task_categoryDecision Making           0.0020276  0.0035656   0.569 0.569613
## task_categoryLearning & Adaptation     0.0171235  0.0036854   4.646 3.47e-06
## task_categoryPlanning & Scheduling    -0.0023741  0.0034758  -0.683 0.494628
## task_categoryProblem Solving           0.0133700  0.0035589   3.757 0.000174
## task_categoryResearch & Summarization -0.0049663  0.0034934  -1.422 0.155196
## task_categoryText Processing           0.0019155  0.0034533   0.555 0.579139
##                                          
## (Intercept)                           ***
## task_complexity                       ***
## data_quality_score                       
## autonomy_level                        *  
## model_architectureCodeT5+             ***
## model_architectureFalcon-180B         ***
## model_architectureGemini-Pro             
## model_architectureGPT-4o              *  
## model_architectureInstructGPT            
## model_architectureLLaMA-3             ***
## model_architectureMixtral-8x7B        ***
## model_architecturePaLM-2              ***
## model_architectureTransformer-XL      ***
## task_categoryCommunication            *  
## task_categoryCreative Writing         .  
## task_categoryData Analysis            ***
## task_categoryDecision Making             
## task_categoryLearning & Adaptation    ***
## task_categoryPlanning & Scheduling       
## task_categoryProblem Solving          ***
## task_categoryResearch & Summarization    
## task_categoryText Processing             
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05357 on 4978 degrees of freedom
## Multiple R-squared:  0.8867, Adjusted R-squared:  0.8863 
## F-statistic:  1856 on 21 and 4978 DF,  p-value: < 2.2e-16
  ggplot(dades, aes(model_architecture, success_rate)) +
    geom_boxplot() +
    labs(
      title = "Success rate by model architecture",
      x = "Architecture",
      y = "Success rate"
    ) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))

  ggplot(dades, aes(task_complexity, success_rate)) +
    geom_point(alpha = 0.3) +
    geom_smooth(method = "lm") +
    labs(
      title = "Relationship between task complexity and success rate",
      x = "Complexity",
      y = "Success rate"
    )
## `geom_smooth()` using formula = 'y ~ x'

  ggplot(dades, aes(data_quality_score, success_rate)) +
    geom_point(alpha = 0.3) +
    geom_smooth(method = "lm") +
    labs(
      title = "Impact of data quality on success",
      x = "Data quality",
      y = "Success rate"
    )
## `geom_smooth()` using formula = 'y ~ x'

ggplot(dades, aes(task_category, success_rate)) +
  geom_boxplot() +
  labs(
    title = "Success rate by task category",
    x = "Task category",
    y = "Success rate"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

  library(boot)
  
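  # Statistic for boot(): refit the model on the resampled rows and return its coefficients.
  # Note: this is a reduced model (numeric predictors only, without model_architecture and
  # task_category), so its intervals are not directly comparable with confint(model_lm).
  boot_fn <- function(data, indices) {
    d <- data[indices, ]
    coef(lm(success_rate ~ task_complexity + data_quality_score +
              autonomy_level, data = d))
  }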
  
  set.seed(123)
  boot_res <- boot(dades, boot_fn, R = 1000)
  
  boot.ci(boot_res, type = "perc", index = 2)
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = boot_res, type = "perc", index = 2)
## 
## Intervals : 
## Level     Percentile     
## 95%   (-0.0691, -0.0665 )  
## Calculations and Intervals on Original Scale
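To make explicit what the percentile interval above represents, here is a minimal base-R sketch of the same idea without the boot package. The data are simulated (the true slope of -0.07, the sample size and the seed are all assumptions chosen for illustration), and `quantile()` may interpolate slightly differently from `boot.ci()`.

```r
set.seed(1)  # arbitrary seed so that this sketch is reproducible

# Simulated data (hypothetical): linear relationship with slope -0.07
n <- 200
x <- sample(1:10, n, replace = TRUE)
y <- 0.9 - 0.07 * x + rnorm(n, sd = 0.05)
sim <- data.frame(x = x, y = y)

# Resample rows with replacement, refit the model, keep the slope each time
R <- 500
slopes <- replicate(R, {
  idx <- sample(nrow(sim), replace = TRUE)
  coef(lm(y ~ x, data = sim[idx, ]))[["x"]]
})

# Percentile interval: empirical 2.5% and 97.5% quantiles of the resampled slopes
ci <- quantile(slopes, c(0.025, 0.975))
print(ci)
```

The interval is read directly from the distribution of the resampled slopes, so it does not rely on the normality of the residuals.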
  confint(model_lm)
##                                              2.5 %        97.5 %
## (Intercept)                            0.899611904  0.9420824354
## task_complexity                       -0.070591886 -0.0677817397
## data_quality_score                    -0.005950364  0.0385418332
## autonomy_level                        -0.002491085 -0.0002901559
## model_architectureCodeT5+             -0.019031458 -0.0055748072
## model_architectureFalcon-180B         -0.042506759 -0.0293669322
## model_architectureGemini-Pro          -0.010064668  0.0032858274
## model_architectureGPT-4o               0.001866744  0.0151188028
## model_architectureInstructGPT         -0.011425153  0.0015432500
## model_architectureLLaMA-3             -0.023395889 -0.0100299951
## model_architectureMixtral-8x7B        -0.041822968 -0.0286209643
## model_architecturePaLM-2              -0.026609741 -0.0132885204
## model_architectureTransformer-XL      -0.049951641 -0.0369331682
## task_categoryCommunication             0.001575725  0.0151107929
## task_categoryCreative Writing         -0.012880395  0.0007648994
## task_categoryData Analysis            -0.022459453 -0.0090525155
## task_categoryDecision Making          -0.004962544  0.0090177681
## task_categoryLearning & Adaptation     0.009898510  0.0243485327
## task_categoryPlanning & Scheduling    -0.009188243  0.0044401250
## task_categoryProblem Solving           0.006392915  0.0203470728
## task_categoryResearch & Summarization -0.011814964  0.0018822762
## task_categoryText Processing          -0.004854462  0.0086853631
  library(broom)
  
  resultats_log <- tidy(model_logistic)
  resultats_log
## # A tibble: 13 × 5
##    term                                  estimate std.error statistic  p.value
##    <chr>                                    <dbl>     <dbl>     <dbl>    <dbl>
##  1 (Intercept)                           -6.98       1.04     -6.72   1.78e-11
##  2 task_complexity                        2.54       0.131    19.4    9.39e-84
##  3 data_quality_score                    -1.82       1.09     -1.68   9.31e- 2
##  4 autonomy_level                        -0.00806    0.0515   -0.156  8.76e- 1
##  5 task_categoryCommunication            -0.0888     0.244    -0.364  7.15e- 1
##  6 task_categoryCreative Writing          0.417      0.272     1.53   1.26e- 1
##  7 task_categoryData Analysis             0.594      0.376     1.58   1.14e- 1
##  8 task_categoryDecision Making           0.504      1.04      0.485  6.28e- 1
##  9 task_categoryLearning & Adaptation     9.63     675.        0.0143 9.89e- 1
## 10 task_categoryPlanning & Scheduling     0.387      0.256     1.51   1.31e- 1
## 11 task_categoryProblem Solving          12.0      620.        0.0194 9.85e- 1
## 12 task_categoryResearch & Summarization -0.277      0.316    -0.874  3.82e- 1
## 13 task_categoryText Processing           0.464      0.261     1.78   7.52e- 2
  ggplot(dades_log, aes(prob_intervencio)) +
    geom_histogram(bins = 30) +
    labs(
      title = "Distribution of predicted probabilities",
      x = "Probability of human intervention"
    )

#OR <- exp(coef(model_logistic))
#IC <- exp(confint(model_logistic))
#OR
#IC
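The commented-out lines above would report odds ratios with `confint()` intervals. As a rough alternative sketch, the odds ratio for task_complexity and a Wald-type 95% interval can be computed from the estimate and standard error printed in the logistic summary. The normal approximation is an assumption here, and the result will generally differ from what `confint()` would return.

```r
# Estimate and standard error of task_complexity, copied from summary(model_logistic)
beta <- 2.543080
se   <- 0.131155

# Odds ratio: multiplicative change in the odds of human intervention
# for a one-unit increase in task complexity
or <- exp(beta)

# Wald-type 95% interval on the log-odds scale, then exponentiated
z  <- qnorm(0.975)
ci <- exp(beta + c(-1, 1) * z * se)

print(round(or, 2))
print(round(ci, 2))
```

Under this approximation the odds ratio is about 12.7: each additional complexity point multiplies the odds of human intervention by roughly that factor, holding the other predictors fixed.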

5. Interpretation, conclusions, limitations and possible future lines of work:

Finally, in this last section we draw brief conclusions from the results of the models in the study, explain some of the limitations we encountered and mention some ideas for possible future extensions of this work.