AI in Data Science and Statistics

Chuka University
CDAM
Data Science
Machine Learning
AI Tools
Statistics
Author

D K. Muriithi | CDAM-Chuka University

Published

May 10, 2026

Artificial Intelligence

Artificial Intelligence (AI) is a branch of computer science focused on creating machines or software that can perform tasks that normally require human intelligence.

In simple terms, AI lets computers learn from data and make decisions in ways that resemble human behavior.

1. AI in Everyday Life

You are probably already using AI every day, often without noticing.

Smartphones

  • Face unlock uses AI to recognize your face.

  • Predictive text suggests the next word when typing.

  • Camera enhancements automatically improve photos.

Social Media

Platforms such as Facebook, Instagram, and TikTok use AI to:

  • Recommend content you may like

  • Detect spam or harmful content

  • Suggest friends or accounts to follow

Streaming Services

Netflix and Spotify use AI recommendation systems to suggest movies, series, or songs based on your preferences.

Healthcare

AI helps doctors:

  • Detect diseases from medical images

  • Predict disease outbreaks

  • Support diagnosis and treatment planning

For example, AI is increasingly used in malaria prediction and healthcare analytics, which ties in closely with research on disease modeling.

Banking and Finance

AI helps:

  • Detect fraud in transactions

  • Assess credit risk

  • Power customer support chatbots

Navigation and Maps

Google Maps uses AI to:

  • Predict traffic

  • Recommend faster routes

  • Estimate travel time

2. How AI Learns: A Simple Example

Imagine you want a computer to identify whether a patient has malaria.

The dataset contains information on fever, headache, body temperature, and malaria test results.

| Fever | Headache | Temperature | Malaria Result |
|-------|----------|-------------|----------------|
| Yes   | Yes      | 39°C        | Positive       |
| No    | Yes      | 36°C        | Negative       |
| Yes   | No       | 38°C        | Positive       |

An AI model studies patterns in this data.

It learns things like:

“Patients with fever and high temperature are more likely to test positive.”

Later, when new patient data arrives, the AI predicts whether the patient may have malaria.

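Learning patterns from labeled examples and then predicting on new cases is called Machine Learning (more precisely, supervised learning).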
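The learning step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a clinical model: it fits a logistic regression by gradient descent to the three patients in the table, and the function names, the learning rate, the number of steps, and the new patient's values are all assumptions made for the example. Three rows are far too few for a real model; the point is only to show "learn from data, then predict".

```python
import math

# The three patients from the table above.
# Features: fever (1 = yes), headache (1 = yes), temperature in °C above 37.
patients = [
    (1, 1, 39 - 37),
    (0, 1, 36 - 37),
    (1, 0, 38 - 37),
]
labels = [1, 0, 1]  # 1 = positive malaria test, 0 = negative


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Estimated probability of a positive test for one patient."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train(rows, targets, learning_rate=0.1, steps=2000):
    """Fit logistic regression by plain gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(steps):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, target in zip(rows, targets):
            error = predict_proba(weights, bias, features) - target
            for j, x in enumerate(features):
                grad_w[j] += error * x / n
            grad_b += error / n
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


weights, bias = train(patients, labels)

# A new patient: fever, no headache, 39.5 °C (hypothetical values).
new_patient = (1, 0, 39.5 - 37)
probability = predict_proba(weights, bias, new_patient)
prediction = "Positive" if probability >= 0.5 else "Negative"
print(f"Estimated probability of malaria: {probability:.2f} -> {prediction}")
```

The learned weights for fever and temperature come out positive, which is the code-level version of the pattern quoted above.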

3. Main Branches of AI

3.1 Machine Learning (ML)

A subset of AI where computers learn patterns from data.

Examples:

  • Predicting malaria infection

  • Fraud detection

  • House price prediction

Common algorithms include:

  • Linear Regression

  • Logistic Regression

  • Decision Trees

  • Random Forest

  • XGBoost

  • Neural Networks

These are particularly relevant in data science and applied statistics.

3.2 Deep Learning

A subset of machine learning that uses artificial neural networks with many layers to learn features directly from raw data such as images, audio, and text.

Used for:

  • Image recognition

  • Speech recognition

  • Medical imaging

  • Autonomous vehicles

Example:
An AI model detecting whitefly infestation on cucumber plants from images.
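To make "artificial neural network" concrete, here is a minimal sketch of the forward pass of a very small network with one hidden layer. The weights are hand-picked for illustration (an assumption, not something learned from data), and the function names are hypothetical. The network computes XOR, a pattern that a single linear model cannot represent, which is why hidden layers matter.

```python
def relu(value):
    """Rectified linear unit: pass positive values, clip negatives to zero."""
    return max(0.0, value)


def tiny_network(x1, x2):
    """Forward pass of a 2-input network with one hidden layer of 2 neurons.

    The weights are hand-picked for illustration, not learned from data.
    """
    # Hidden layer: each neuron is a weighted sum followed by ReLU.
    h1 = relu(1.0 * x1 + 1.0 * x2 + 0.0)
    h2 = relu(1.0 * x1 + 1.0 * x2 - 1.0)
    # Output layer: a weighted sum of the hidden activations.
    return 1.0 * h1 - 2.0 * h2


for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", tiny_network(x1, x2))
```

In a real deep learning model the weights are found by training, and there are many more layers and neurons, but each layer still performs this same "weighted sum, then non-linearity" step.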

3.3 Natural Language Processing (NLP)

A branch of AI that helps computers understand and generate human language.

Examples:

  • Chatbots

  • Translation systems

  • Sentiment analysis

  • Text summarization

ChatGPT belongs to this branch.

4. How ChatGPT Works (Simplified)

ChatGPT is a chatbot built on a large language model (LLM).

It works in roughly four stages:

Step 1: Training on Large Text Data

The system learns patterns from massive amounts of text including:

  • Books

  • Articles

  • Websites

  • Research papers

It learns:

  • Grammar

  • Facts

  • Writing styles

  • Relationships between words

Step 2: Predicting the Next Word

When you ask:

“What is AI?”

The model predicts the most likely next words based on patterns learned during training.

Very simplified example:

Input:
“What is Artificial…”

Possible next words:

  • Intelligence ✅

  • Banana ❌

  • Football ❌

It repeats this step, predicting one token (a word or word fragment) at a time and appending it to the text before predicting the next.
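The idea of next-word prediction can be illustrated with a toy bigram model, which simply counts which word follows which. This is a heavily simplified sketch: ChatGPT uses a large neural network rather than a count table, and the small corpus and function names below are made up for the example.

```python
from collections import Counter, defaultdict

# A tiny made-up corpus; real models train on vastly more text.
corpus = (
    "artificial intelligence is a branch of computer science . "
    "artificial intelligence learns from data . "
    "machine learning is a subset of artificial intelligence . "
    "artificial neural networks power deep learning ."
)

tokens = corpus.split()

# Count how often each word follows each other word.
next_word_counts = defaultdict(Counter)
for current_word, following_word in zip(tokens, tokens[1:]):
    next_word_counts[current_word][following_word] += 1


def next_word_probabilities(word):
    """Relative frequency of each word seen right after `word`."""
    counts = next_word_counts[word]
    total = sum(counts.values())
    return {candidate: count / total for candidate, count in counts.items()}


def generate(start_word, length=5):
    """Greedy generation: always append the most frequent next word."""
    words = [start_word]
    for _ in range(length):
        counts = next_word_counts[words[-1]]
        if not counts:
            break
        words.append(counts.most_common(1)[0][0])
    return " ".join(words)


print(next_word_probabilities("artificial"))
print(generate("machine"))
```

In this corpus "intelligence" follows "artificial" three times out of four, so it gets the highest probability, while words such as "banana" never appear after it and get none.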

Step 3: Context Understanding

The system uses context from your conversation.

For example, because you work in statistics, data science, and health modeling, explanations can be tailored toward:

  • machine learning,

  • prediction models,

  • healthcare applications,

  • statistical reasoning.

Step 4: Fine-Tuning and Safety

The model is further trained to:

  • Follow instructions

  • Be helpful

  • Reduce harmful or inaccurate outputs

5. Types of AI with Examples

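| Type       | Meaning                               | Example                   |
|------------|---------------------------------------|---------------------------|
| Narrow AI  | Performs one specific task            | ChatGPT, DeepSeek, Grok…  |
| General AI | Human-level intelligence across tasks | Not yet achieved          |
| Super AI   | Beyond human intelligence             | Theoretical               |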

Today, almost all AI systems are Narrow AI.

6. AI vs Machine Learning vs Deep Learning

Think of them like nested circles:

Artificial Intelligence

Machine Learning (subset of AI)

Deep Learning (subset of ML)

Example:

  • AI = smart healthcare system

  • ML = predicts malaria risk from patient data

  • Deep Learning = detects malaria parasites from microscope images

7. Advantages of AI

✅ Faster decision-making
✅ Handles large datasets
✅ Automates repetitive tasks
✅ Finds hidden patterns in data
✅ Improves prediction accuracy

8. Limitations of AI

❌ Needs quality data
❌ Can be biased
❌ May make incorrect predictions
❌ Often lacks human judgment and context

9. A Simple Definition to Remember

Artificial Intelligence is the ability of machines or computer systems to imitate human intelligence by learning from data, recognizing patterns, solving problems, and making decisions.

For someone in statistics and data science, a useful way to think about AI is:

AI = Statistics + Data + Computing + Learning Algorithms

In simple terms: Statistics explains data, Data Science analyzes data, and AI learns from data to make predictions and support decisions.

10. AI Learning Roadmap

Phase 1: Strengthen Python + R for AI

Focus on:

In R

You already use R, so continue with:

  • tidyverse

  • caret

  • tidymodels

  • randomForest

  • xgboost

  • DALEX

  • iml

  • shiny

You already have experience with:

  • Random Forest,

  • synthetic malaria data,

  • SMOTE,

  • DALEX/SHAP.

That is already intermediate-level work.

In Python

Learn these libraries:

  • pandas

  • numpy

  • matplotlib

  • scikit-learn

  • xgboost

  • tensorflow

  • pytorch

Phase 2: Machine Learning Foundations

Master:

  1. Regression models

  2. Classification models

  3. Feature selection

  4. Hyperparameter tuning

  5. Cross-validation

  6. Ensemble learning

Especially:

  • Random Forest

  • XGBoost

  • Gradient Boosting

These are strong for healthcare datasets.
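Two items from the list above, cross-validation and hyperparameter tuning, can be sketched together in plain Python. This is an illustration under stated assumptions: the data are synthetic and made up, a simple k-nearest-neighbours classifier stands in for Random Forest or XGBoost to keep the code short, and all function names are hypothetical.

```python
import random

rng = random.Random(42)

# Synthetic, made-up data: (temperature in °C, headache severity 0-3) -> label.
# The label depends on temperature plus noise, purely for illustration.
data = []
for _ in range(120):
    temperature = rng.uniform(36.0, 40.0)
    headache = rng.uniform(0.0, 3.0)
    label = 1 if temperature + rng.gauss(0.0, 0.5) > 38.0 else 0
    data.append(((temperature, headache), label))


def knn_predict(train_rows, point, k):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    by_distance = sorted(
        train_rows,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], point)),
    )
    votes = sum(label for _, label in by_distance[:k])
    return 1 if votes * 2 > k else 0


def cross_val_accuracy(rows, k_neighbors, n_folds=5):
    """Mean accuracy over n_folds held-out folds."""
    fold_scores = []
    for fold in range(n_folds):
        test_rows = rows[fold::n_folds]
        train_rows = [r for i, r in enumerate(rows) if i % n_folds != fold]
        correct = sum(
            knn_predict(train_rows, features, k_neighbors) == label
            for features, label in test_rows
        )
        fold_scores.append(correct / len(test_rows))
    return sum(fold_scores) / n_folds


# Hyperparameter tuning: try several values of k and keep the best one.
candidates = [1, 3, 5, 7, 9]
scores = {k: cross_val_accuracy(data, k) for k in candidates}
best_k = max(scores, key=scores.get)

for k, score in scores.items():
    print(f"k={k}: mean CV accuracy = {score:.3f}")
print("Best k:", best_k)
```

Each candidate value is scored only on data the model did not see during fitting, which is what makes the comparison fair. In practice, libraries such as scikit-learn in Python or caret and tidymodels in R provide these steps ready-made.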

Phase 3: Explainable AI (XAI)

Explainability is very important for healthcare research, where clinicians need to understand why a model made a particular prediction.

Learn:

  • SHAP values

  • LIME

  • Partial Dependence Plots

  • Feature Importance

You are already moving in this direction with DALEX.
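One of the simpler feature-importance techniques, permutation importance, can be sketched without any libraries: shuffle one feature column and measure how much the model's accuracy drops. The sketch below rests on assumptions made for illustration: the data are synthetic, the "model" is a fixed threshold rule standing in for a fitted model, and the function names are hypothetical. SHAP and LIME need dedicated packages and are not shown here.

```python
import random

rng = random.Random(0)

# Synthetic, made-up data: the label depends only on temperature.
rows = []
for _ in range(200):
    temperature = rng.uniform(36.0, 40.0)
    shoe_size = rng.uniform(36.0, 46.0)  # deliberately irrelevant feature
    rows.append({"temperature": temperature, "shoe_size": shoe_size})
labels = [1 if row["temperature"] > 38.0 else 0 for row in rows]


def model(row):
    """Stand-in for a fitted model; here a fixed temperature threshold."""
    return 1 if row["temperature"] > 38.0 else 0


def accuracy(data, targets):
    return sum(model(r) == y for r, y in zip(data, targets)) / len(targets)


def permutation_importance(data, targets, feature, n_repeats=10):
    """Average drop in accuracy after shuffling one feature column."""
    baseline = accuracy(data, targets)
    drops = []
    for _ in range(n_repeats):
        shuffled_values = [r[feature] for r in data]
        rng.shuffle(shuffled_values)
        permuted = [dict(r, **{feature: v}) for r, v in zip(data, shuffled_values)]
        drops.append(baseline - accuracy(permuted, targets))
    return sum(drops) / n_repeats


importance = {
    name: permutation_importance(rows, labels, name)
    for name in ("temperature", "shoe_size")
}
for name, value in importance.items():
    print(f"{name}: accuracy drop = {value:.3f}")
```

Shuffling temperature breaks the link between the feature and the label, so accuracy falls; shuffling the irrelevant feature changes nothing. The method treats the model as a black box, so the same idea applies to a Random Forest or XGBoost model.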

Phase 4: Deep Learning

For:

  • image-based diagnosis,

  • whitefly detection,

  • medical imaging.

Learn:

  • Neural Networks

  • CNNs (images)

  • RNN/LSTM (time series)

Frameworks:

  • TensorFlow

  • Keras

  • PyTorch

Phase 5: Research & Deployment

Learn:

  • APIs

  • Shiny apps

  • Dashboards

  • Model deployment

  • MLOps basics

For example:
Deploy a malaria prediction model as a web app.

11. Suggested Learning Path (Practical)

Because of your background, I would go:

Statistics → ML in R → Python → XAI → Deep Learning → Deployment

Practical projects:

  1. Malaria prediction model

  2. Diabetes classification

  3. Mental health risk prediction

  4. Whitefly image detection on cucumber plants

  5. Disease hotspot prediction in Kenya

These fit directly with your research interests.

12. Summary

Machine Learning teaches computers to learn from data, while AI is the broader field of making systems behave intelligently. Your statistics background already gives you much of the mathematical foundation needed for AI.

13. Learning Schedule

  • AI tools such as ChatGPT, DeepSeek, Qwen, Manus, Claude, and Google tools

  • Python + AI for Data Science

  • R + AI for Data Science (Groq, Gemini, ChatGPT)

  • Art of AI Prompting (ChatGPT, DeepSeek, Qwen, Manus, Claude, Google tools)

  • AI Dashboard (Bricks, Manus, Claude, etc.)

  • Google Colab + Gemini

  • Data Science model deployment (Manus, Mocha, Lovable, etc.)

  • GitHub Copilot