class: center, middle, inverse, title-slide

.title[
# AI-Empowered Data Analysis: From Code to System Thinking
]
.author[
### Shanlong Ding
]
.institute[
### World Health Organization, Regional Office for Western Pacific
]
.date[
### 21 Nov 2025
]

---

# What we’ll do in the next 2.5 hours

- Why AI for data analysis now
- Three levels of prompting: light / middle / strong
- Some examples
- Hands-on + discussion

---

# How are you using AI today?

- “Never / rarely”
- “Mainly for writing (email, abstracts)”
- “Sometimes for code / R / Python”
- “Often for analysis and modeling”

---

.pull-left[
### Q1. Which prompt produces more reliable results?

A. “Analyze this dataset for me.”

B. “Here are the variables and my goals. Please follow these steps…”

C. “Tell me a story based on the numbers.”
]

--

.pull-right[
### Answer & explanation

Correct answer: B

- (A) is too vague – AI doesn’t know your goal or constraints.
- (B) provides structure → more reliable output.
- (C) encourages storytelling, not analysis.

Key idea: Clear structure → better AI performance.
]

---

.pull-left[
### Q2. Which task is AI most likely to get wrong without human supervision?

A. Creating a scatter plot

B. Fitting a linear model

C. Applying the correct WHO methodology for SDG 3.8.1
]

--

.pull-right[
### Answer & explanation

Correct answer: C

AI excels at generic tasks (A, B). It often misapplies institution-specific rules (C).

Key idea: AI needs explicit governance rules to be correct.
]

---

.pull-left[
### Q3. Why do we need “strong-structure prompting”?

A. Because AI lacks creativity

B. Because policy tasks need reproducibility

C. Because it makes plots prettier
]

--

.pull-right[
### Answer & explanation

Correct answer: B

Strong structure ensures transparency, reproducibility, traceability.
]

---
class: center, middle, inverse

# What AI can already do extremely well

---

# AI is powerful

- Coding
- Visualization
- Drafting emails and documents

???
- Using AI: writing code, visualization, writing emails
- At first I handed everything to the AI in one go; sometimes the result was good, sometimes bad

---

<br>
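---

.pull-left[
``` r
library(GGally)
ggpairs(iris[1:4], title = "Scatterplot Matrix of Iris Dataset")
```

<!-- -->
]

.pull-right[
``` r
library(corrplot)
corrplot(cor(iris[1:4]), addCoef.col = "white")
```

<!-- -->
]

???
- Everyone should be familiar with iris; here we take the first four columns for a correlation analysis
- To draw the plots in R, we need to load packages, select columns, compute correlations, and then call corrplot
- The plots look good, but the steps are relatively tedious
- Repeating similar analyses takes a good deal of time

---

.pull-left[
# Visualization with AI

`Summarize iris dataset and draw a corrplot and a scatterplot matrix`
]

.pull-right[
**One sentence → reproducible code → clear visualization**
]

???
- This is the one-sentence prompt I gave the AI
- The AI generated the R code automatically and drew plots similar to the ones we just saw
- Almost no manual steps were needed
- This is the biggest attraction of AI in data analysis: it is fast and convenient
- You simply ask, which is even easier

---

<br>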
---

# Imputing missing data

.pull-left[
Dataset: airquality

- Variables:
  - Ozone: Ozone concentration in the lower atmosphere (parts per billion)
  - Solar.R: Intensity of solar radiation reaching the ground (Langleys)
  - Wind: Average wind speed (miles per hour)
  - Temp: Daily maximum temperature (°F)
  - Month
  - Day
- Missing data in **Ozone** and **Solar.R**
]

.pull-right[
<!-- -->
]

???
- The airquality dataset
- Ozone is the ozone concentration; Solar.R is the daily solar radiation
- Solar.R has few missing values and is fairly stable, so median imputation is suitable
- Ozone has more missing values and depends on other variables, so kNN imputation is suitable

---

# Impute with AI

`Please tell me the best imputing plan for the dataset airquality, and implement that`

A reasonable and defensible imputation strategy is:

- Focus on the variables with missing data: Ozone and Solar.R.
- Solar.R: use median imputation, because it is a continuous variable with a relatively stable distribution; the median is robust to outliers and simple to explain.
- Ozone: use a k-nearest neighbours (kNN) imputation, using Solar.R, Wind, Temp, Month and Day as predictors. This allows the imputed Ozone values to reflect patterns in the other variables and to preserve the variability better than a simple mean or median.
- Leave Wind, Temp, Month and Day unchanged, since they have no missing values in this example.

---

# Imputing with code

``` r
library(dplyr)   # %>% and mutate()
library(naniar)  # impute_median()
library(VIM)     # kNN()

df.airquality.1 <- airquality %>%
  mutate(Solar.R = impute_median(Solar.R)) %>%   # median for Solar.R
  kNN(variable = "Ozone", imp_var = FALSE)       # kNN for Ozone
```

``` r
library(mice)
md.pattern(df.airquality.1, plot = FALSE)
```

```
##  /\     /\
## {  `---'  }
## {  O   O  }
## ==>  V <==  No need for mice. This data set is completely observed.
##  \  \|/  /
##   `-----'
```

```
##     Ozone Solar.R Wind Temp Month Day
## 153     1       1    1    1     1   1 0
##         0       0    0    0     0   0 0
```

---

# Comparison between two methods

.pull-left[
**R coding**
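
| Ozone| Solar.R| Wind| Temp| Month| Day|
|-----:|-------:|----:|----:|-----:|---:|
|    41|     190|  7.4|   67|     5|   1|
|    36|     118|  8.0|   72|     5|   2|
|    12|     149| 12.6|   74|     5|   3|
|    18|     313| 11.5|   62|     5|   4|
|    19|     205| 14.3|   56|     5|   5|
|    28|     205| 14.9|   66|     5|   6|
|    23|     299|  8.6|   65|     5|   7|
|    19|      99| 13.8|   59|     5|   8|
|     8|      19| 20.1|   61|     5|   9|
|    16|     194|  8.6|   69|     5|  10|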
]

.pull-right[
**AI**

| Ozone| Solar.R| Wind| Temp| Month| Day|
|-----:|-------:|----:|----:|-----:|---:|
|    41|     190|  7.4|   67|     5|   1|
|    36|     118|  8.0|   72|     5|   2|
|    12|     149| 12.6|   74|     5|   3|
|    18|     313| 11.5|   62|     5|   4|
|    19|     205| 14.3|   56|     5|   5|
|    28|     205| 14.9|   66|     5|   6|
|    23|     299|  8.6|   65|     5|   7|
|    19|      99| 13.8|   59|     5|   8|
|     8|      19| 20.1|   61|     5|   9|
|    16|     194|  8.6|   69|     5|  10|
]

---

.pull-left[
### Q4. Is this dataset good for further analysis?

|gender | weight_kg| length_cm|
|:------|---------:|---------:|
|Male   |       3.1|      42.1|
|female |       4.2|       5.0|
|F      |      56.0|      48.7|
|m      |       4.9|      50.3|
]

--

.pull-right[
### Answer & explanation

Correct answer: No

- Inconsistent gender encoding
- Impossible values (56 kg, 5 cm)

Key idea: Clean ≠ loadable; clean = semantically correct.
]

---

.pull-left[
### Q5. Which imputation plan is better for the dataset `airquality`?

A. Mean for both

B. Solar.R → median; Ozone → kNN

C. Solar.R → mean; Ozone → random
]

--

.pull-right[
### Answer & explanation

Correct answer: B

- Solar.R is stable → median works well
- Ozone benefits from multivariate kNN
]

---

.pull-left[
### Q6. What is the most important step before imputation?

A. Try random imputation

B. Inspect missingness patterns

C. Run a complex imputation package
]

--

.pull-right[
### Answer & explanation

Correct answer: B

Understanding the missingness mechanism (MCAR, MAR, MNAR, or other missingness patterns) → essential for choosing a correct method.
]

---
class: center, middle, inverse

# How to ask AI better questions

---

# Structured prompting

- We are not “instructing” the AI; we are clarifying our own thinking
- There is no single universal template for prompting
- The structure should be task-specific:
  - **Inspiration** → free, open-ended prompt
  - **Model building or diagnostics** → more explicit and rigorous structure
  - **Brainstorming** → use role prompting

**Structured prompting is not a fixed format, but a strategy for shaping the direction of the analysis.**

`$$\text{The subtlety of application lies in one's own mind}$$`

???
There is no single method that works in all situations. True mastery lies in adapting to the context.

---

# Light structure

**For exploration and inspiration**

- Allow AI to think and suggest interesting patterns
- Minimal limits, maximal flexibility
- Best for:
  - Early-stage EDA
  - Identifying patterns
  - Generating ideas to investigate

---

<br>
---

Here we use the dataset **cats**:

.pull-left[
- Body weight and heart weight of cats in both sexes:
  - Bwt: body weight
  - Hwt: heart weight

|Sex | Bwt|  Hwt|
|:---|---:|----:|
|F   | 2.0|  7.0|
|F   | 2.0|  7.4|
|F   | 2.0|  9.5|
|M   | 2.0|  6.5|
|M   | 2.0|  6.5|
|M   | 2.1| 10.1|
]

.pull-right[
<!-- --><!-- -->
]

---

> Query: Here is the dataset cats. Please give me five interesting observations and generate two quick visualizations. Thank you.

.pull-left[
**Answers**:

- Strong positive relationship between body and heart weight
- Male cats are generally larger than female cats
- Sex differences remain even after accounting for body weight
- Possible outliers among very heavy heart weights
]

.pull-right[
<br>
]

---

# Let's try!

.pull-left[
Here is a dataset:

- `patient_id`: Patient IDs
- `age`: Ages
- `systolic_bp`: Systolic Blood Pressures
- `cholesterol`: Total cholesterol
- `bmi`: Body Mass Index
]

.pull-right[
Please:

> Here is a dataset with patient_id, age, systolic blood pressure, cholesterol and BMI. Please give me 5 interesting observations and propose 2 quick visualizations.
]

---

<br>
---

# Results

- **Two very clear subgroups** appear in the data - a younger, healthier group and an older, higher-risk group.
- **Age strongly aligns** with systolic blood pressure, cholesterol, and BMI - all three increase with age.
- **Each subgroup is internally consistent**, with relatively small variation within the young and the older groups.
- **Health indicators are highly correlated** - patients with high blood pressure also tend to have high cholesterol and higher BMI.
- **Age alone can almost separate the dataset** - a simple plot of age vs. any health indicator reveals two distinct clusters.

---

.pull-left[
<!-- -->
<!-- -->
]

.pull-right[
<img src="data:image/png;base64,#picture/4-cluster.png" width="80%" height="60%" />
<img src="data:image/png;base64,#picture/5-corrplot.png" width="60%" height="60%" />
]

---

# Middle structure

**For analysis with clear goals**

- More guidance, but still flexible
- You already have something in mind, but remain open to AI's suggestions
- Best for:
  - Visualization
  - Missing data handling
  - Fitting standard models

---

.pull-left[
Here we use a simple dataset `women`:

- `weight`: Women's weights in pounds
- `height`: Women's heights in inches

<!-- -->

It seems that height and weight follow a roughly linear trend.
]

.pull-right[

| height| weight|
|------:|------:|
|     58|    115|
|     59|    117|
|     60|    120|
|     61|    123|
|     62|    126|
|     63|    129|
|     64|    132|
|     65|    135|
|     66|    139|
|     67|    142|
|     68|    146|
|     69|    150|
|     70|    154|
|     71|    159|
|     72|    164|
]

---

# Middle structure: a focused question

.pull-left[
**Prompt to AI**:

> Here is the built-in R dataset `women`,
> containing heights (inch) and weights (pound) of 15 women.
> Please:
> 1. Fit a simple linear regression of weight on height
> 2. Report the regression equation
> 3. Plot the data with the fitted line
> 4. Comment briefly on whether a linear model seems appropriate
]

--

.pull-right[
<!-- -->
]

---

.pull-left[
# Results

**A. Fitted linear regression**

We fit a simple linear model:

`$$weight \approx -87.52 + 3.45 \times height \text{ }(R^2 \approx 0.991)$$`

Interpretation:

- For every **1 inch increase in height**, weight increases by approximately **3.45 pounds**
- The line fits the data very closely

**B. Plot with fitted line**

The plot shows the `women` dataset with the regression line superimposed.
]

.pull-right[
<img src="data:image/png;base64,#picture/6-lm.png" width="80%" height="60%" />

**C. Whether a linear model is appropriate**

The relationship between **height** and **weight** in the dataset is **strongly linear**:

- The points lie very close to a straight upward trend
- There is no visible curvature
- No major deviations appear in the residuals
]

Therefore, a simple linear regression is appropriate for this dataset.
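---

The fit reported on the previous slide can be reproduced locally in a few lines of base R. This is a minimal sketch rather than the code the AI actually produced; the object name `fit` is an assumption introduced for illustration.

```r
# Fit weight ~ height on the built-in `women` dataset (base R only)
fit <- lm(weight ~ height, data = women)

# Regression equation: intercept and slope
round(coef(fit), 2)

# Share of variance explained by the linear model
round(summary(fit)$r.squared, 3)

# Data with the fitted line superimposed
plot(weight ~ height, data = women,
     xlab = "Height (in)", ylab = "Weight (lb)")
abline(fit, col = "red")
```

Running the check yourself is a cheap way to confirm that the equation the AI reports matches what R computes.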
---

# Strong structure

**For reproducible and policy-relevant analysis**

- Highly explicit instructions - almost everything is defined
- Used in professional work where transparency matters
- Best for:
  - Output format (script, markdown, reproducible workflow)
  - Sensitivity analysis
  - Task definition

---

# Why we need strong structure

.pull-left[
**When analysis will be used for decisions, policy, or publication:**

- Every step must be explicit
- Methods must be reproducible
- Diagnostics must be documented
- Assumptions must be examined
- Output must be stable and transparent
]

.pull-right[
**Example:**

We continue with the `women` dataset and perform a **full diagnostic** of the linear model:

`$$weight = \beta_0 + \beta_1 \times height$$`

<!-- -->
]

---

# Explicit instructions to AI

**The strong-structure prompt:**

> Please perform a full diagnostic of the linear regression model **weight ~ height** using the `women` dataset. Follow these exact steps and report each section clearly.
> - Fit the linear model
> - Report coefficients, `\(R^2\)`, residual SD
> - Plot fitted line
> - Plot residuals vs fitted
> - Plot Q-Q plot of residuals
> - Test nonlinearity using a quadratic term
> - Compare both models
> - Give a final recommendation on model suitability

This is a **fully defined workflow** - AI has no room to guess or improvise.

---

.pull-left[
# Results

**A. Fit model & report regression**

`$$weight = -87.52 + 3.45 \times height$$`

- `\(R^2\)` = 0.991
- Residual SD = 1.52
- The fitted line captures the main trend very well.

**B. Diagnostic plots**

Residuals vs fitted:

- Slight curvature
- Middle range slightly overestimated
- Ends slightly underestimated
- Indicates mild nonlinearity
]

.pull-right[
<!-- -->
<img src="data:image/png;base64,#picture/7-resid.png" width="80%" height="50%" />
]

---

.pull-left[
**B. Diagnostic plots**

Q-Q plot:

- Slight deviation at the high end
- Mostly normal
- Small sample increases uncertainty

<img src="data:image/png;base64,#picture/8-qq.png" width="80%" height="40%" />
]

.pull-right[
<!-- -->
<!-- -->
]

---

.pull-left[
**C. Compare linear vs quadratic**

Quadratic model:

`$$weight = \beta_0 + \beta_1h + \beta_2h^2$$`

- Quadratic term is statistically significant
- `\(R^2\)` increases from 0.991 to 0.999
- Residual curvature disappears
- But sample size is only 15 → risk of overfitting

**Interpretation:** The quadratic model fits better, but the linear model remains acceptable as a simpler summary.
]

.pull-right[
<!-- -->
<!-- -->
]

---

# Final decision

- Either model works - the linear model is easier to explain
- Acknowledge the slight curvature; the quadratic model removes the pattern
- For real-world prediction, the quadratic model is better if sample size can be expanded

**Strong structure ensures transparency and trust.**

---
class: center, middle, inverse

# Summary: three levels of prompting

---

# Light structure - explore

**Purpose**: early-stage EDA, idea generation, pattern discovery

**Philosophy**: let AI think with you

**When to use**:

- You don't know the structure yet
- You want inspiration or patterns
- You want alternative views on the data

**Prompt template**:

> Here is a dataset with variables.
> Please explore it freely and tell me 5 interesting observations.
> Suggest two quick plots that help describe the dataset.

**Outcome**: broad insight, hypotheses, directions

---

# Middle structure - direct

**Purpose**: you know the task, but still want flexibility

**Philosophy**: guide AI without restricting it too much

**When to use**:

- Basic statistical tasks
- Visualization
- Simple models
- Missing data handling

**Prompt template**:

> Please analyze the relationship between x and y.
> - Fit a simple model
> - Provide the model equation
> - Draw the fitted line
> - Comment on whether the model seems appropriate

**Outcome**: guided analysis + interpretation

---

# Strong structure - control

.pull-left[
**Purpose**: reproducible, policy-relevant, publication-ready work

**Philosophy**: eliminate ambiguity; every step is defined

**When to use**:

- Reporting to supervisors
- Policy analysis
- Research outputs
- Regulatory or high-stakes work
]

.pull-right[
**Prompt template**:

> Perform a full diagnostic of the model [formula] using [dataset].
> Follow these exact steps and report each section clearly:
> - Fit the model
> - Report coefficients, R², residual SD
> - Plot fitted line
> - Residuals vs fitted
> - Q–Q plot
> - Add nonlinearity (quadratic)
> - Compare models
> - Final recommendation
]

**Outcome**: structured, transparent, fully reproducible analysis

---

.pull-left[
### Q7. Which test should be used for a 5×6 table with many small cells?

- Chi-square
- Fisher’s exact test

<br>

```
        Calm Playful Curious Shy Aggressive Independent
Breeder    5       1       0   2          1           0
Rescue     1       7       1   0          0           0
Shelter    3       2       1   3          1           1
Stray      1       1       0   1          3           0
Friend     0       2       1   0          1           0
```
]

--

.pull-right[
### Answer & explanation

Correct answer: Fisher’s exact test

Chi-square is unreliable when expected counts are < 5.
]

---

.pull-left[
### Q8. Which test for weight vs personality?

Small groups, outliers, skewness.

- ANOVA
- Kruskal–Wallis
]

--

.pull-right[
### Answer & explanation

Correct answer: Kruskal–Wallis

Non-parametric, robust to outliers & unbalanced groups.
]

---

.pull-left[
### Q9. How to interpret p = 0.20?

A. Strong association

B. Weak but significant association

C. No statistically significant evidence
]

--

.pull-right[
### Answer & explanation

Correct answer: C

We cannot reject independence; results are inconclusive.
]

---
class: center, middle, inverse

# Human + AI: Division of responsibilities

---

# A collaboration, not a replacement

.pull-left[
AI is strong at

- Speed
- Repetition
- Pattern detection
- Coding
- Visualization
- Diagnostics
]

.pull-right[
Humans are strong at

- Judgment
- Context
- Interpretation
- Causality
- Ethics
- Communication
]

<br>

.center[**Together → Better analysis, faster, and safer**]

---

# AI can code very fast

**...but sometimes has *no idea* what the problem actually is**

Many people think:

> If I give AI the data, it will analyze it correctly.

But in real work - especially in public health - **AI often produces results that *look correct* but are actually wrong.**

---

# A simple example - background

.pull-left[
**What is SDG 3.8.1?**

A summary score used by WHO to measure how well a country is providing **essential health services** to its population

**What goes into it?**

4 sub-indices and 14 indicators:

- Reproductive, Maternal, Newborn, and Child Health (RMNCH)
- Infectious diseases
- Noncommunicable diseases (NCD)
- Service access and capacity

**Why combine them?**

No single number can describe a whole system. SDG 3.8.1 gives a standardized 1-100 score.
]

.pull-right[
**How to aggregate the index?**

- Each indicator is converted to a 1-100 scale
- Indicators within each domain are combined into four sub-indices:
  - Old method: all indicators were treated equally and combined using a simple geometric mean
  - New method: indicators are combined using a weighted geometric mean, where each indicator’s weight reflects the size of the population it represents
- The four sub-indices are merged into a single national service coverage index (SCI) using a geometric mean
]

---

# SDG 3.8.1 - aggregation

|domain |tracer                           | value|   weight|
|:------|:--------------------------------|-----:|--------:|
|RMNCH  |Family planning (modern methods) |  76.7| 34992635|
|RMNCH  |ANC 4+ visits                    |  85.6|  4478737|
|RMNCH  |DTP3 immunization coverage       |  83.0|  4430208|
|RMNCH  |Pneumococcal vaccination         |  75.0| 22348675|

Old: `\((76.7\times85.6\times83.0\times75.0)^{\frac{1}{4}} = 80.0\)`

New: `\((76.7^{34992635}\times85.6^{4478737}\times83.0^{4430208}\times75.0^{22348675})^{\frac{1}{34992635+4478737+4430208+22348675}}=77.1\)`

---
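The two aggregation rules for the RMNCH tracers can be checked with a short base R sketch. The helper names `geo_mean` and `weighted_geo_mean` are assumptions introduced here for illustration, not functions from any WHO tool; both work on the log scale, because raising 76.7 to the power 34992635 directly would overflow.

```r
# Tracer values and population weights for the RMNCH domain (from the table)
value  <- c(76.7, 85.6, 83.0, 75.0)
weight <- c(34992635, 4478737, 4430208, 22348675)

# Old method: simple geometric mean, computed via logs
geo_mean <- function(x) exp(mean(log(x)))

# New method: weighted geometric mean, i.e. exp of the weighted mean of logs
weighted_geo_mean <- function(x, w) exp(sum(w * log(x)) / sum(w))

old_rmnch <- geo_mean(value)
new_rmnch <- weighted_geo_mean(value, weight)

round(c(old = old_rmnch, new = new_rmnch), 2)
```

The two results should agree, up to rounding, with the 80.0 and 77.1 shown in the formulas. The weighted version is pulled towards the tracers with the largest populations (family planning and pneumococcal vaccination).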
---

# However...

Imagine we give AI the 14 indicators and ask:

> Please calculate the SDG 3.8.1 score for this country, and tell me the process.

AI will do this instantly, but...

<img src="data:image/png;base64,#picture/9-uhc.png" width="80%" height="50%" />

---

# These are the results of the old method

.pull-left[

|domain              |   old|   new|
|:-------------------|-----:|-----:|
|RMNCH               | 79.96| 77.10|
|Infectious diseases | 58.42| 86.67|
|NCD                 | 28.64| 41.85|
|Capacity            | 70.97| 70.97|
|Overall SCI         | 55.51| 66.74|
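
As a quick plausibility check, the overall SCI row can be recomputed from the four sub-indices with a plain geometric mean. This is a sketch using the rounded values from the table; the names `sub_index`, `geo_mean` and `overall` are illustrative assumptions.

```r
# Sub-index values copied from the table (already rounded to two decimals)
sub_index <- data.frame(
  domain = c("RMNCH", "Infectious diseases", "NCD", "Capacity"),
  old    = c(79.96, 58.42, 28.64, 70.97),
  new    = c(77.10, 86.67, 41.85, 70.97)
)

# Unweighted geometric mean, computed on the log scale
geo_mean <- function(x) exp(mean(log(x)))

# Overall SCI under each method
overall <- sapply(sub_index[c("old", "new")], geo_mean)
round(overall, 2)
```

Because the inputs are rounded, the results should land within rounding error of the 55.51 and 66.74 in the table rather than match them to the last digit.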
]

.pull-right[
**AI does not know the WHO rules**

It does not know that:

- Some indicators use population-based weights
- Some were revised this year

So AI produces a perfectly coded answer that is methodologically wrong

Unless a human checks it, the mistake stays hidden
]

---

**Let's try again!**
---

> Please calculate UHC sub-indices and overall index using the 2025 revised method - calculating the weighted geometric mean for each tracer to get the sub-index, and calculating the geometric mean for each sub-index to get the overall index.

<img src="data:image/png;base64,#picture/10-uhc.png" width="1061" />

---

# Summary

- AI can calculate very fast: it writes correct code and reproduces the old method perfectly
- But AI does not know the latest rules until its knowledge is updated
- It cannot tell whether it is using the old method
- The results look professional, but can be wrong
- Humans are needed to guide AI:
  - To define which method to use
  - To provide metadata, weights, definitions, etc.
  - To check whether the outputs are fit for purpose

---
class: center, middle, inverse

# Data analysis workflow

---

# Traditional data analysis workflow

.pull-left[
]

.pull-right[
- Import: Bring data into R, Python, SPSS, SAS, etc., from files, databases, or APIs
- Tidy: Structure the data into consistent, tidy formats
- Transform: Create new variables, filter observations, fix types, handle missingness, address outliers
- Visualize/Model: Explore patterns through plots, and fit statistical models to understand relationships
- Communicate: Share findings in reports, slides, dashboards, or manuscripts
]

This is a **human-dominated workflow**, where each step relies heavily on manual coding, trial-and-error, and iteration.

---
class: center, middle, inverse

# AI does **not** replace these steps

## It acts as a *high-speed assistant* that accelerates each part of the process

---

# An example

.pull-left[
]

.pull-right[
- I am a cat person.
- I want to open my own cat café.
- I want to know the relationship between cats' personalities and other factors.
]

---

# Import data

**Traditional coding:**

``` r
library(tidyverse)
library(readxl)

df.cat.1 <- read_excel("data/cat-dirty-data.xlsx")
df.cat.2 <- read_excel("data/cat-dirty-data2.xlsx")
```

**AI:**

---
---
---

``` r
partition(skim(df.cat.1))
```

**Variable type: character**

|skim_variable | n_missing| complete_rate| min| max| empty| n_unique| whitespace|
|:-------------|---------:|-------------:|---:|---:|-----:|--------:|----------:|
|id            |         0|             1|   3|   3|     0|       31|          0|
|name          |         0|             1|   4|   9|     0|       31|          0|
|gender        |         0|             1|   1|   6|     0|        4|          0|

**Variable type: logical**

|skim_variable | n_missing| complete_rate| mean|count            |
|:-------------|---------:|-------------:|----:|:----------------|
|neutered      |         0|             1| 0.55|TRU: 17, FAL: 14 |

**Variable type: numeric**

|skim_variable | n_missing| complete_rate|  mean|    sd|   p0|  p25|   p50|   p75| p100|hist  |
|:-------------|---------:|-------------:|-----:|-----:|----:|----:|-----:|-----:|----:|:-----|
|length_cm     |         0|           1.0| 48.33| 10.43|  5.0| 46.4| 47.90| 52.50| 72.3|▁▁▂▇▁ |
|height_cm     |         0|           1.0| 30.55|  2.92| 25.3| 28.6| 30.60| 32.55| 38.5|▃▆▇▂▁ |
|weight_kg     |         3|           0.9|  6.55|  9.79|  1.8|  4.1|  4.45|  5.28| 56.0|▇▁▁▁▁ |
|age_years     |         0|           1.0|  7.06|  4.20|  1.0|  3.0|  7.00| 10.50| 14.0|▇▂▅▆▅ |

---

``` r
partition(skim(df.cat.2))
```

**Variable type: character**

|skim_variable | n_missing| complete_rate| min| max| empty| n_unique| whitespace|
|:-------------|---------:|-------------:|---:|---:|-----:|--------:|----------:|
|id            |         0|             1|   3|   3|     0|       31|          0|
|color         |         0|             1|   4|   6|     0|        7|          0|
|breed         |         0|             1|   6|  18|     0|        6|          0|
|personality   |         0|             1|   3|  10|     0|        5|          0|
|adopted_from  |         0|             1|   5|   7|     0|        4|          0|

---

.pull-left[
# What issues can we find?

- **Inconsistent encoding of sex**
  - `gender` column has 4 different encodings: Male, Female, M, F
  - We need to standardize them into two genders: Male and Female

``` r
unique(df.cat.1$gender)
```

```
[1] "Male"   "F"      "Female" "M"
```

- **Some names are not capitalized**
  - We should standardize them

``` r
str_subset(df.cat.1$name, "^[^A-Z]")
```

```
[1] "jessica" "grant"   "lisa"    "david"
```
]

.pull-right[
- **Missing values**
  - We need to impute the missing values

``` r
miss_var_summary(df.cat.1) %>%
  filter(n_miss > 0) %>%
  gt()
```

|variable  | n_miss| pct_miss|
|:---------|------:|--------:|
|weight_kg |      3|     9.68|

- **Outliers**
  - We need to remove or winsorize the outliers

``` r
range(df.cat.1$length_cm, na.rm = TRUE)
```

```
[1]  5.0 72.3
```

``` r
range(df.cat.1$weight_kg, na.rm = TRUE)
```

```
[1]  1.8 56.0
```
]

---

# Diagnose by AI

> Please tell me the datasets are cleaned or not, and tell me why

---

# Clean the data

``` r
df.cat.1a <- df.cat.1 %>%
  mutate(
    # "F" and "Female" contain an f/F; "M" and "Male" do not
    gender = ifelse(str_detect(gender, "[fF]"), "Female", "Male"),
    # Capitalize names such as "jessica"
    name = str_to_title(name),
    # Treat 5 cm and 56 kg as decimal-point typos
    length_cm = ifelse(length_cm == 5, 50, length_cm),
    weight_kg = ifelse(weight_kg == 56, 5.6, weight_kg),
    gender = factor(gender)
  )

# kNN() comes from the VIM package
df.cat.1b <- kNN(df.cat.1a, variable = "weight_kg", imp_var = FALSE)
```

---

# Clean data with AI

> Please clean the dataset: length_cm from 5 to 50, weight_kg from 56 to 5.6, impute weight_kg with kNN method, and transfer the inconsistent encodings of gender and name. Show me the cleaned data.

---

# Merge data

The two datasets should be merged together to generate a full dataset. The unique key is `id`.

.pull-left[
]

.pull-right[
> Please merge the two datasets, and the unique key is `id`.
]

---
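The merge on `id` can be sketched in base R. The real Excel sheets are not reproduced here, so the two small data frames below are hypothetical stand-ins, and the names `cat_body`, `cat_profile` and `cat_full` are assumptions made for illustration.

```r
# Toy stand-ins for the two cleaned sheets (hypothetical values)
cat_body <- data.frame(
  id        = c("a01", "a02", "a03"),
  weight_kg = c(9.2, 3.1, 5.5)
)
cat_profile <- data.frame(
  id          = c("a01", "a02", "a03"),
  personality = c("Calm", "Playful", "Shy")
)

# `id` must be unique in both tables, otherwise the join duplicates rows
stopifnot(!anyDuplicated(cat_body$id), !anyDuplicated(cat_profile$id))

# Keep every row of cat_body; ids missing from cat_profile would get NA
cat_full <- merge(cat_body, cat_profile, by = "id", all.x = TRUE)

# The merged table should still have one row per cat
nrow(cat_full) == nrow(cat_body)
```

Checking key uniqueness and the row count before and after the join is the kind of verification worth asking the AI to report as well.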
---

# Comparison between two methods

.pull-left[
**R coding**

|id  |name      |gender | weight_kg|
|:---|:---------|:------|---------:|
|a01 |Dazhuang  |Male   |       9.2|
|a02 |Maomao    |Male   |       3.1|
|a03 |Mimi      |Female |       5.5|
|a04 |Shane     |Male   |       7.9|
|a05 |Joshua    |Female |       4.2|
|a06 |Mark      |Male   |       4.1|
|a07 |Stephanie |Male   |       4.4|
|a08 |Stanley   |Male   |       5.6|
|a09 |Jeffery   |Male   |       4.5|
|a10 |Jacob     |Male   |       4.0|
]

.pull-right[
**AI**

|id  |name      |gender | weight_kg|
|:---|:---------|:------|---------:|
|a01 |Dazhuang  |Male   |       9.2|
|a02 |Maomao    |Male   |       3.1|
|a03 |Mimi      |Female |       5.5|
|a04 |Shane     |Male   |       7.9|
|a05 |Joshua    |Female |       4.2|
|a06 |Mark      |Male   |       4.1|
|a07 |Stephanie |Male   |       4.4|
|a08 |Stanley   |Male   |       5.6|
|a09 |Jeffery   |Male   |       4.5|
|a10 |Jacob     |Male   |       4.0|
]

---

# Light structure

> Please explore it freely and tell me some interesting observations. Suggest two quick plots that help describe the dataset.

- **Body weight varies widely, with males generally heavier than females.** Weights range from about 1.8 kg to 9.2 kg, with most cats between 4–5 kg. Male cats are on average around 0.8 kg heavier than female cats, and the heaviest individuals are all male.
- **Non-neutered cats tend to be heavier than neutered cats.** When comparing by neuter status, non-neutered cats have a higher mean weight and more of them appear in the upper end of the weight range, while neutered cats cluster more in the mid-range. (This is a small sample, so the pattern is illustrative rather than definitive.)
- **Adoption source is related to body weight.** Cats adopted from breeders and as strays tend to be heavier on average, whereas rescue cats are noticeably lighter. Shelter cats sit somewhere in the middle. This suggests that environment and history may influence current body condition.

---

.pull-left[
# AI provides the code
]

.pull-right[
<!-- -->
<!-- -->
]

---

.pull-left[
<!-- -->
<!-- -->
]

.pull-right[
- All personality groups have weights mostly in the 4-6 kg range
- Playful cats look slightly heavier and more variable
- Curious cats are light and homogeneous
- Shy and aggressive cats don't look very different from calm cats

<br>

- Breeders: mainly calm
- Rescue: almost all playful
- Shelter: mixed profile
- Stray: more diverse and more aggressive

<br>

Adoption source appears to be related to the personality mix, although the sample is small.
]

---

# Middle structure

.pull-left[
> How to further analyze the relationship between personality and adoption source? `\(\chi^2\)` test?

A chi-square test of independence is the most appropriate first step.

- Both variables (personality and adoption source) are categorical.
- The chi-square test evaluates whether the distribution of personality types differs across adoption sources.
- If some cells have **very small** counts (which is likely in our small toy dataset), a **Fisher’s exact test** can be used as a more robust alternative.
]

.pull-right[
> How to analyze the relationship between weight and personality? One-way ANOVA?

A one-way ANOVA is appropriate if you want to compare mean weight across different personality groups.

- `weight_kg` is continuous, and `personality` is categorical
- ANOVA tests whether the average weight differs among personality types.

However, if the assumptions of ANOVA (normality, homogeneity of variance) are questionable, which is common with small samples, you may use the **Kruskal–Wallis test**.
]

---

# Strong structure

> Please check the dataset: should I use the chi-square test or Fisher's exact test? If some cells are very small, please conduct the Fisher's exact test, and tell me:
> - The p-value
> - What it means
> - What's the relationship between cats' personalities and adoption sources

---

**Step 1 – Check whether to use chi-square or Fisher’s exact test**

I first created a contingency table of adoption source × personality.

Because the table is 5 × 6 and many cells have very small counts (including several 0s and 1s), the usual chi-square test would not be reliable: the expected frequencies in a number of cells are well below 5.

👉 Therefore, a Fisher’s exact test for an r × c table is more appropriate than a standard chi-square test.
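
The check in Step 1 can be sketched in base R. The counts below are copied from the 5 × 6 table on the Q7 quiz slide and serve only as an illustration; the object names (`tab`, `expected`, `ft`) are assumptions, and the p-value this prints belongs to that illustrative table, so it need not equal the value reported on the slides.

```r
# Contingency table copied from the Q7 quiz slide (adoption source x personality)
tab <- matrix(
  c(5, 1, 0, 2, 1, 0,
    1, 7, 1, 0, 0, 0,
    3, 2, 1, 3, 1, 1,
    1, 1, 0, 1, 3, 0,
    0, 2, 1, 0, 1, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    source = c("Breeder", "Rescue", "Shelter", "Stray", "Friend"),
    personality = c("Calm", "Playful", "Curious", "Shy",
                    "Aggressive", "Independent")
  )
)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Share of cells with an expected count below 5 (the usual rule of thumb)
mean(expected < 5)

# Fisher's exact test for an r x c table; the larger workspace is a precaution
ft <- fisher.test(tab, workspace = 2e7)
ft$p.value
```

If the share of small expected counts is high, the chi-square approximation is doubtful and the exact test is the safer choice.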

**Step 2 – Fisher’s exact test**

Using Fisher’s exact test on the adoption-source × personality table:

p-value ≈ 0.20

**Step 3 – Interpretation**

A p-value around 0.20 is not statistically significant at conventional levels (e.g. 0.05).

This means we do not have strong evidence to reject the null hypothesis that personality and adoption source are independent in this dataset.

In other words, given the small sample size and sparse table, we cannot confidently conclude that the distribution of personalities differs across adoption sources.

---

**Step 4 – What does this mean in practice?**

Even though the stacked bar chart suggests some **interesting patterns** (e.g. breeder cats appearing more often calm, rescue cats more playful, strays more diverse and sometimes aggressive), Fisher’s exact test tells us that, with only about 30 cats:

> These patterns could easily arise by chance, and we should treat them as exploratory rather than as strong evidence of a real association.

**So:**

> There is no statistically significant association between cats’ personalities and adoption sources in this small toy dataset, although descriptive plots hint at patterns that would be worth re-examining in a larger sample.

---

# Strong structure

> Please check the dataset: should I use the one-way ANOVA, or Kruskal-Wallis, if I want to check the relationship between weight and personality? Please choose the better choice, and tell me:
> - The p-value
> - What it means
> - What's the relationship between cats' personalities and weights

--

**Step 1 – Which test should I use?**

- The outcome weight_kg is continuous.
- The predictor personality is categorical with several groups, some of them quite small (one group has only 1 cat).
- The boxplot shows clear outliers (very light and very heavy cats) and some skewed distributions, so the normality and equal-variance assumptions of one-way ANOVA are doubtful.

Because of the **small and unbalanced group sizes** and the presence of **outliers**, the **Kruskal–Wallis test** (a non-parametric alternative to ANOVA) is the safer choice.

---

**Step 2 – Kruskal–Wallis test for weight by personality**

Using the Kruskal–Wallis test to compare weights across personality groups gives:

- p-value ≈ 0.28

**Step 3 – What does this mean?**

A p-value around 0.28 is **not statistically significant** at conventional levels (e.g. 0.05).

- We **do not have enough evidence** to reject the null hypothesis that the distribution of weights is the same across all personality types.
- In other words, with this small sample we cannot conclude that cats’ personalities are associated with systematically different body weights.

**Step 4 – Relationship between cats’ personalities and weights**

Descriptive plots suggest small tendencies:

- Playful and calm cats include some of the heavier individuals.
- Curious and shy cats tend to cluster more in the middle–lower weight range.

However, the Kruskal–Wallis test shows that, given the sample size and variability:

> These differences are not statistically significant, so we should treat them as exploratory patterns rather than strong evidence of a real effect.

---

# Prompt for communication

> You have already helped me completing the data cleaning and statistical analysis for a small dataset of 31 cats from a fictional cat café. The analyses I have already performed are:
> - Relationship between personality and adoption source
>   - Fisher’s exact test instead of chi-square was used
>   - p-value ≈ 0.20
> - Relationship between weight and personality
>   - Kruskal–Wallis instead of ANOVA was used
>   - p-value ≈ 0.28
>
> Please write:
> - A short Results paragraph as if for an academic paper (3–4 sentences), accurately describing these statistical findings.
> - A set of 3 bullet points for practical implications (for a non-technical audience), using cautious and non-causal language.
---

**Results**

In this exploratory dataset of 31 cats, we examined whether personality type varied by adoption source and whether body weight differed across personality groups. A Fisher’s exact test was used for the 5×6 adoption source × personality table due to small cell counts; the p-value (~0.20) indicated no statistically significant association. A Kruskal–Wallis test comparing weight across personality types produced a similarly non-significant p-value (~0.28), suggesting no clear differences in weight distribution. Descriptive plots showed minor patterns, but these trends are likely due to small sample size and should be interpreted cautiously.

**Implications for practice**

- Apparent differences in personality or weight across adoption sources are not statistically reliable in this small sample.
- Descriptive patterns (e.g., playful cats appearing slightly heavier) may be useful for casual observation but should not inform real decisions.
- Any meaningful conclusions would require much larger datasets, consistent measurement, and repeated observations; AI can assist with reporting, but human judgment remains essential.

---

# Summary

- Visuals suggested some patterns
- Formal tests (Fisher/Kruskal-Wallis) found no significant associations
- In small toy data, AI + stats are for exploration & practice, not for strong claims
- In real public health work, we need larger data + domain knowledge

---

# Do / Don’t prompt

.pull-left[
**DO:**

- Tell AI your level (e.g. “assume I know basic regression”)
- State the purpose (exploratory vs policy vs publication)
- Ask for assumptions and limitations
]

.pull-right[
**DON’T:**

- Just say “analyze this dataset and give me insights”
- Ask for “the best model” with no context
- Copy-paste outputs into policy slides without checking
]

---

# Take-home messages

- AI: fast, helpful, reproducible, scalable
- Prompting: research thinking
  - Light: explore
  - Middle: guide
  - Strong: control
- AI needs humans to check rules, metadata, definitions

**Human + AI!**

---

.pull-left[
### Q10. Which SCI aggregation is correct?

A. Simple geometric mean of all tracers

B. Weighted geomean within domains → geomean of domain scores
]

--

.pull-right[
### Answer & explanation

Correct answer: B

Matches updated WHO (2025) SCI methodology.
]

---

.pull-left[
### Q11. What happens if AI uses the old SCI method?

A. Nothing changes

B. Systematic bias; AI won't warn you
]

--

.pull-right[
### Answer & explanation

Correct answer: B

Wrong method → wrong rankings → wrong policy.
]

---

.pull-left[
### Q12. Which step is AI least able to automate?

A. Cleaning

B. Running models

C. Choosing conceptually appropriate methods
]

--

.pull-right[
### Answer & explanation

Correct answer: C

Method choice is conceptual, contextual, and requires human judgment.
]

---
class: center, middle, inverse

# Let's try together!

---

## About the dataset

- **Sheets**:
  - `note`: tracer–domain mapping
  - `value`: UHC SCI tracer values (by country × year)
  - `weight`: weights for each tracer (by country × year)
- **Task**: Compare **old method vs new method** for SCI calculation

---

#### note
---

#### value

---

#### weight
---

## Old vs New method

- **Old method**
  - For each domain: geometric mean of all tracer values
  - Overall SCI: geometric mean of 4 domains
- **New method**
  - For each domain: **weighted geometric mean** of tracers (weights provided)
  - Overall SCI: geometric mean of 4 domains (no weights needed here)

---

## Your tasks

.pull-left[
1. Import and reshape the data
   - Convert wide tracer values → long format
   - Join with weights and tracer–domain mapping
2. For each **domain × country × year**:
   - Calculate **old domain score** (geomean)
   - Calculate **new domain score** (weighted geomean)
3. For each **country × year**:
   - Calculate **old SCI** (geomean of domain_old)
   - Calculate **new SCI** (geomean of domain_new)
4. Focus on **2021**:
   - Compare **SCI_old vs SCI_new**
   - Compare **domain_old vs domain_new**
]

.pull-right[
- Perform a **paired t-test**:
  - old vs new SCI (2021)
  - old vs new domains (2021)
- Make box plots:
  - Old vs new SCI values
  - Old vs new domain values
- Answer:
  - Is the **new SCI significantly different** from the old one?
  - Which domains show the largest differences?
  - How might these differences affect interpretation for policy?
]

---
class: center, middle, inverse

# 🐱 Your turn!
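---

## One possible sketch in R

The old and new domain scores differ only in whether tracers are weighted inside the geometric mean. The base-R sketch below illustrates the mechanics under stated assumptions; it is not the official WHO implementation. The data frame `toy`, its column names, and the helpers `geomean()` and `weighted_geomean()` are all hypothetical, the numbers are made up, and only one year with two domains is shown to keep it short.

```r
# Hypothetical helpers (not from any package): plain and weighted geometric means.
geomean <- function(x) exp(mean(log(x)))
weighted_geomean <- function(x, w) exp(sum(w * log(x)) / sum(w))

# Made-up long-format data: one row per country x domain x tracer.
toy <- data.frame(
  country = rep(c("A", "B", "C"), each = 4),
  domain  = rep(c("D1", "D1", "D2", "D2"), times = 3),
  value   = c(80, 60, 50, 90, 70, 40, 65, 85, 55, 75, 45, 95),
  weight  = c(1, 3, 2, 2, 1, 3, 2, 2, 1, 3, 2, 2)
)

# Domain scores per country x domain: old = unweighted, new = weighted.
groups <- split(toy, list(toy$country, toy$domain), drop = TRUE)
domain_scores <- do.call(rbind, lapply(groups, function(d) {
  data.frame(
    country    = d$country[1],
    domain     = d$domain[1],
    domain_old = geomean(d$value),
    domain_new = weighted_geomean(d$value, d$weight)
  )
}))

# Overall SCI per country: unweighted geometric mean of the domain scores.
# tapply() returns groups in sorted order, matching sort(unique(...)).
sci <- data.frame(
  country = sort(unique(domain_scores$country)),
  SCI_old = as.numeric(tapply(domain_scores$domain_old, domain_scores$country, geomean)),
  SCI_new = as.numeric(tapply(domain_scores$domain_new, domain_scores$country, geomean))
)

# Paired t-test: each country contributes one old and one new SCI value.
tt <- t.test(sci$SCI_old, sci$SCI_new, paired = TRUE)
tt$p.value   # p-value for the mean old-minus-new difference
tt$conf.int  # confidence interval for that mean difference
```

The test is paired because the same country is scored twice, once per method, so the comparison is on within-country differences. In this toy data the `D2` weights are equal, so `domain_old` and `domain_new` coincide there; only domains with unequal weights can shift the score.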
---

.pull-left[
## Results 2021

|domain              | year| p.value| conf.low| conf.high|
|:-------------------|----:|-------:|--------:|---------:|
|SCI                 | 2021|   0.000|   -6.044|    -3.479|
|RMNCH               | 2021|   0.000|    2.263|     5.126|
|Infectious Diseases | 2021|   0.000|  -18.271|    -7.188|
|NCD                 | 2021|   0.000|   -9.836|    -7.111|
|Service Capacity    | 2021|   0.281|    0.000|     0.000|
]

.pull-right[
<!-- -->
]

---

.pull-left[
## Results 2019

|domain              | year| p.value| conf.low| conf.high|
|:-------------------|----:|-------:|--------:|---------:|
|SCI                 | 2019|    0.00|   -5.180|    -2.961|
|RMNCH               | 2019|    0.00|    2.895|     5.509|
|Infectious Diseases | 2019|    0.00|  -15.019|    -5.716|
|NCD                 | 2019|    0.00|   -9.840|    -7.071|
|Service Capacity    | 2019|    0.64|    0.000|     0.000|
]

.pull-right[
<!-- -->
]

---

.pull-left[
## Results 2017

|domain              | year| p.value| conf.low| conf.high|
|:-------------------|----:|-------:|--------:|---------:|
|SCI                 | 2017|   0.000|   -5.809|    -3.432|
|RMNCH               | 2017|   0.000|    2.743|     5.396|
|Infectious Diseases | 2017|   0.000|  -17.145|    -7.557|
|NCD                 | 2017|   0.000|   -9.893|    -7.039|
|Service Capacity    | 2017|   0.208|    0.000|     0.000|
]

.pull-right[
<!-- -->
]

---

.pull-left[
## Results 2015

|domain              | year| p.value| conf.low| conf.high|
|:-------------------|----:|-------:|--------:|---------:|
|SCI                 | 2015|   0.000|   -6.650|    -4.109|
|RMNCH               | 2015|   0.000|    2.665|     5.201|
|Infectious Diseases | 2015|   0.000|  -20.157|   -10.131|
|NCD                 | 2015|   0.000|   -9.928|    -6.968|
|Service Capacity    | 2015|   0.773|    0.000|     0.000|
]

.pull-right[
<!-- -->
]

---
class: center, middle, inverse

# Thank you!

dings@who.int