Intro to AI ML!

class: title-slide

.row[
.col-7[
.title[
# Intro to AI ML
]
.subtitle[
## Intro to AI ML
]
.author[
### Laxmikant Soni [blog](https://laxmikants.github.io) [](https://github.com/laxmiaknts) [](https://twitter.com/laxmikantsoni09)
]

.affiliation[
]

]

.col-5[

.logo[

<img src="figures/rmarkdown.png" width="480" />
]

]
]

---

# Artificial Intelligence and Machine Learning (AI/ML)

.pull-top[

## Past, Present, and Future of AI/ML

]

.pull-top[

### **Past**  
- **Definition**: Artificial Intelligence (AI) refers to the simulation of human intelligence processes by machines, while Machine Learning (ML) is a subset of AI that enables machines to learn from data and improve their performance over time without explicit programming.
- **Applications**:
  - **Early AI Systems**: Initially, AI was focused on rule-based systems and expert systems, where predefined rules were hard-coded into computers. Early ML systems mainly relied on simple methods such as linear regression and decision trees.
  - **Limitations**: These systems struggled with scalability and often lacked generalization across tasks. Data availability and computational power were also limitations.

]

---

# Artificial Intelligence and Machine Learning (AI/ML)

.pull-top[

### **Present**  
- **Definition**: Today, AI/ML encompasses a wide range of algorithms and methods, including deep learning, reinforcement learning, and unsupervised learning, often powered by massive datasets and advanced computing resources.
- **Applications**:
  - **Natural Language Processing (NLP)**: Used in virtual assistants, chatbots, and translation services.
  - **Computer Vision**: Employed in facial recognition, autonomous vehicles, and medical imaging.
  - **Healthcare**: AI/ML models are used for predictive analytics, diagnosing diseases, drug discovery, and personalized treatments.
  - **Finance**: AI/ML algorithms are applied in algorithmic trading, fraud detection, and credit scoring.
  - **Automation**: Robotics and self-learning machines in manufacturing and logistics.

]
---

# Artificial Intelligence and Machine Learning (AI/ML)

.pull-top[

### **Future**  
- **Definition**: The future of AI/ML involves the development of more advanced, interpretable, and ethical AI systems that can exhibit more human-like reasoning and creativity while operating in a transparent and explainable manner.
- **Applications**:
  - **General AI**: Advancements in creating systems capable of performing any intellectual task a human can do, also known as Artificial General Intelligence (AGI).
  - **Explainable AI (XAI)**: Efforts to make AI decisions more interpretable to ensure fairness, trustworthiness, and transparency in critical applications like healthcare and law enforcement.
  - **Edge AI**: Deploying AI models on edge devices for real-time decision-making, which reduces latency and mitigates privacy concerns because data can stay on the device.
  - **AI in Creativity**: AI generating music, art, and literature, augmenting human creativity and pushing the boundaries of what’s possible in creative industries.
  - **Ethical and Responsible AI**: Ensuring AI systems are developed and deployed ethically, addressing issues of bias, accountability, and fairness.

]

---

# Types of Learning Systems in AI/ML

.pull-top[

## Supervised Learning

### **Definition**  
- Supervised learning is a type of machine learning where the model is trained on labeled data. The model learns to map input data to known output labels by minimizing the error between predicted and actual labels.

### **Applications**  
- **Classification**: Identifying categories, such as email spam detection or medical diagnosis (e.g., detecting whether a tumor is malignant or benign).
- **Regression**: Predicting continuous values, such as predicting house prices or stock market trends based on historical data.

]

---

# Types of Learning Systems in AI/ML

.pull-top[

## Unsupervised Learning

### **Definition**  
- Unsupervised learning involves training a model on unlabeled data. The system tries to identify patterns and structures from the data without explicit supervision.

### **Applications**  
- **Clustering**: Grouping similar data points together, such as customer segmentation for marketing or grouping products based on buying patterns.
- **Dimensionality Reduction**: Reducing the number of features in the data while preserving important relationships, such as Principal Component Analysis (PCA) for data compression.

]

---

# Types of Learning Systems in AI/ML

.pull-top[

## Semi-Supervised Learning

### **Definition**

- Semi-supervised learning is a hybrid approach where the model is trained on a small amount of labeled data and a large amount of unlabeled data. This approach is useful when labeling data is expensive or time-consuming.

### **Applications**

- **Medical Imaging**: Using a small number of labeled images and a large number of unlabeled images to train a model for detecting diseases in medical scans.
- **Web Content Classification**: Classifying web pages with only a few labeled examples but abundant unlabeled content.

]

---

# Types of Learning Systems in AI/ML

.pull-top[

## Reinforcement Learning

### **Definition**

- Reinforcement learning is a type of learning where an agent learns to make decisions by interacting with an environment. It receives rewards or penalties for its actions and adjusts its strategy to maximize the cumulative reward over time.

### **Applications**

- **Game Playing**: Algorithms like AlphaGo or chess-playing AI use reinforcement learning to learn optimal strategies.

- **Robotics**: Robots learn to perform tasks such as walking, picking objects, or driving autonomously through trial and error.

- **Recommendation Systems**: Personalizing content based on user interactions to maximize engagement.

]

---

# Types of Learning Systems in AI/ML

.pull-top[

## Self-supervised learning

### **Definition**

- Self-supervised learning is a type of unsupervised learning where the model generates its own labels from the data. The system typically predicts part of the data from other parts, learning useful representations without explicit human-provided labels.

### **Applications**

- **Natural Language Processing**: Language models are pretrained with self-supervised objectives: GPT-3 predicts the next word in a sequence, while BERT fills in masked words. Both objectives help improve the understanding and generation of human language.

- **Computer Vision**: Learning useful visual representations by predicting parts of an image, such as filling in missing parts or predicting the transformation applied to an image.

]

---

# Concept Learning as Search Through a Hypothesis Space

.pull-top[

## Definition: Concept Learning

### **Concept Learning**

- Concept learning is the process by which a machine or algorithm learns to classify examples based on certain attributes or characteristics. The goal is to discover a general rule or concept that can correctly classify new, unseen instances of data.

- It is often described as searching through a hypothesis space, where the hypothesis represents a possible solution or classification rule.

]

---

# Concept Learning as Search Through a Hypothesis Space

.pull-top[

## Hypothesis Space

### **Definition of Hypothesis**

- A **hypothesis** is a specific assumption, rule, or model used to make predictions or decisions about data. In the context of machine learning, a hypothesis is an approximation of the target concept that the algorithm is trying to learn. It represents a possible explanation or classification rule based on the given features.
  
### **Definition of Hypothesis Space**

- A hypothesis space is the set of all possible hypotheses (or models) that could explain the data. In concept learning, a hypothesis represents a classification rule or decision boundary that differentiates between different categories or concepts.

- The hypothesis space is typically constrained by the features and types of models being used (e.g., decision trees, linear classifiers).

]

---

# Concept Learning as Search Through a Hypothesis Space

.pull-top[

## Search Through Hypothesis Space

- The search involves exploring different hypotheses within this space to find the one that best fits the training data. The goal is to find a hypothesis that generalizes well to unseen examples, not just fits the training set perfectly.

]

---

# Concept Learning as Search Through a Hypothesis Space

.pull-top[

## Search Through Hypothesis Space

- The search process can be guided by various strategies, such as:

  - **Incremental Search**: Gradually refining hypotheses by modifying the current hypothesis.

  - **Gradient-based Search**: Using optimization techniques to adjust the hypothesis parameters based on the error between predicted and actual labels.

]
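
A minimal sketch of gradient-based search, assuming a one-parameter hypothesis family `y_hat = w * x` and mean squared error as the measure of fit. The toy data, the starting value, and the learning rate are illustrative assumptions, not part of the original material.

```python
# Each value of w is one hypothesis in the family y_hat = w * x.
# Toy data generated by y = 2x (assumed for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

def mse_gradient(w):
    """Derivative of the mean squared error with respect to w."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

w = 0.0                # initial hypothesis
learning_rate = 0.05   # illustrative step size
for _ in range(200):
    # Move w a small step against the gradient of the error.
    w -= learning_rate * mse_gradient(w)

print(round(w, 4))
```

Each step moves the current hypothesis toward lower error, so the search settles near `w = 2`, the value that generated the toy data.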

---

# Concept Learning as Search Through a Hypothesis Space

.pull-top[

## Example: Concept Learning with a Decision Tree

- **Attributes**: A dataset of animals with attributes like size, color, and habitat.

- **Goal**: Learn a concept that classifies animals into categories such as "Mammal" or "Reptile" based on the given attributes.

]

---

# Concept Learning as Search Through a Hypothesis Space

.pull-top[

## **Hypothesis Space Exploration**

- The hypothesis space consists of all possible decision trees that could separate the animals into these categories. The learning algorithm explores this space, starting with simple hypotheses (e.g., if the animal is small, it might be a mammal) and refining them based on feedback from the training data.

- **Search Process**: The algorithm starts by hypothesizing simple rules (e.g., "if size = small, then mammal") and iteratively adjusts these rules by exploring more complex hypotheses to improve classification accuracy.

]

---

# Concept Learning as Search Through a Hypothesis Space

.pull-top[

## Applications in AI/ML

- **Supervised Learning**: Concept learning as hypothesis search is fundamental in many supervised learning tasks, especially in algorithms like decision trees and k-nearest neighbors (k-NN), where the goal is to learn a hypothesis that can generalize well from labeled examples.

- **Inductive Learning**: In inductive learning, concept learning serves as a foundation for algorithms that generalize from specific examples to broader concepts. Algorithms like ID3 (Iterative Dichotomiser 3) or C4.5 build decision trees through the exploration of hypothesis spaces.

- **Rule-Based Learning**: In rule-based learning systems, hypothesis spaces consist of different potential rules that classify examples based on feature values. These rules are refined through feedback from training data.

]

---

# General-to-Specific Ordering of Hypotheses

.pull-top[

## General-to-Specific Ordering of Hypotheses

### **Definition**

- The **general-to-specific ordering** arranges hypotheses from the most general to the most specific. Hypothesis A is more general than hypothesis B if every instance that B classifies as positive is also classified as positive by A. General hypotheses cover a broader range of instances, while specific hypotheses are more constrained and cover fewer instances.

### **Purpose in Concept Learning**

- In concept learning, the goal is often to find the most specific hypothesis that fits the data without overfitting. This means starting with a general hypothesis and gradually refining it to make it more specific by incorporating more constraints from the training examples.

- A general hypothesis covers a wide range of possibilities, while a specific hypothesis closely fits the training examples, making it a candidate for a model that performs well on unseen data.

]

---

# General-to-Specific Ordering of Hypotheses

.pull-top[

### **Example**

- For instance, when learning a concept to classify animals, the hypothesis "animal is a mammal" is more general than "animal is a mammal with fur and four legs". The latter is more specific and fits a smaller set of animals.

]
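
The ordering in this example can be checked mechanically. The sketch below assumes hypotheses are conjunctions of attribute constraints written as tuples, with `"?"` meaning "any value"; the attribute layout `(class, covering, legs)` is a hypothetical encoding chosen for illustration.

```python
ANY = "?"  # wildcard: the attribute may take any value

def covers(hypothesis, instance):
    """True if the instance satisfies every constraint of the hypothesis."""
    return all(h == ANY or h == v for h, v in zip(hypothesis, instance))

def more_general_or_equal(h1, h2):
    """True if every instance covered by h2 is also covered by h1."""
    return all(a == ANY or a == b for a, b in zip(h1, h2))

# Hypothetical attribute order: (class, covering, legs)
mammal = ("mammal", ANY, ANY)
furry_quadruped_mammal = ("mammal", "fur", 4)

print(more_general_or_equal(mammal, furry_quadruped_mammal))
print(more_general_or_equal(furry_quadruped_mammal, mammal))
```

The first check succeeds and the second fails, matching the claim that "animal is a mammal" is the more general of the two hypotheses.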

---

# Finding Maximally Specific Hypotheses

.pull-top[

### **Definition**

- A **maximally specific hypothesis** is the hypothesis that is as specific as possible while still accurately classifying all the training examples. It contains no unnecessary generalizations or exceptions and is as narrowly defined as the data allows.

### **Finding the Maximally Specific Hypothesis**

- To find a maximally specific hypothesis, an algorithm such as Find-S begins with the most specific hypothesis and generalizes it only as far as needed to cover each positive training example.

- The resulting hypothesis covers every positive example seen so far while making no generalization beyond what those examples require. Because it is fitted so tightly, it can be sensitive to noise or peculiarities in the training data.

]
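
One well-known way to arrive at such a hypothesis is the Find-S procedure, which works from the specific side and generalizes only when a positive example forces it to. The sketch below assumes conjunctive hypotheses over attribute tuples with `"?"` as a wildcard; the attribute layout `(covering, legs, habitat)` and the training rows are made up for illustration.

```python
ANY = "?"  # wildcard: the attribute may take any value

def find_s(examples):
    """examples: list of (attribute_tuple, is_positive) pairs."""
    hypothesis = None  # most specific hypothesis: covers nothing yet
    for attributes, is_positive in examples:
        if not is_positive:
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            # The first positive example becomes the hypothesis as-is.
            hypothesis = list(attributes)
        else:
            # Keep constraints that still match; relax the rest to ANY.
            hypothesis = [h if h == a else ANY
                          for h, a in zip(hypothesis, attributes)]
    return tuple(hypothesis) if hypothesis is not None else None

# Hypothetical attribute order: (covering, legs, habitat)
training = [
    (("fur", 4, "land"), True),
    (("fur", 4, "water"), True),
    (("scales", 4, "land"), False),
]
h = find_s(training)
print(h)  # ('fur', 4, '?')
```

Only the habitat constraint is relaxed, because it is the only attribute on which the two positive examples disagree.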

---

# Finding Maximally Specific Hypotheses

.pull-top[

### **Example**

- If the hypothesis is "animal is a mammal," and the training data indicates that the animal is a mammal with four legs and fur, the maximally specific hypothesis would be "animal is a mammal with four legs and fur."

]

---

# Version Spaces

.pull-top[

### **Definition**

- A **version space** is the set of all hypotheses that are consistent with the given training examples. It includes both the most general and the most specific hypotheses that correctly classify the examples.

### **Purpose in Concept Learning**  
- The version space is important in concept learning because it represents the set of hypotheses that could potentially explain the training data. The goal is to narrow down the version space by eliminating hypotheses that are inconsistent with the data.
- Over time, the version space becomes smaller as the learning algorithm refines its hypothesis.

### **Example**  
- If the task is to classify animals as mammals, and the training examples include animals with fur and four legs, the version space might include all hypotheses that classify mammals with these features. As more data is provided, the version space is refined to narrow down the possible hypotheses.

]
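
For a very small hypothesis space, the version space can be found by brute-force enumeration. The sketch below assumes conjunctive hypotheses over two hypothetical attributes `(covering, legs)` with `"?"` as a wildcard; practical algorithms such as candidate elimination avoid enumerating every hypothesis.

```python
from itertools import product

ANY = "?"  # wildcard: the attribute may take any value
# Hypothetical attribute domains: covering, number of legs
DOMAINS = [("fur", "scales"), (2, 4)]

def covers(hypothesis, instance):
    """True if the instance satisfies every constraint of the hypothesis."""
    return all(c == ANY or c == v for c, v in zip(hypothesis, instance))

def version_space(examples):
    """Return every hypothesis that labels all examples correctly."""
    candidates = product(*[(ANY,) + domain for domain in DOMAINS])
    return [h for h in candidates
            if all(covers(h, x) == label for x, label in examples)]

examples = [(("fur", 4), True), (("scales", 4), False)]
vs = version_space(examples)
print(vs)  # [('fur', '?'), ('fur', 4)]

# One more positive example removes the more specific hypothesis.
vs_smaller = version_space(examples + [(("fur", 2), True)])
print(vs_smaller)  # [('fur', '?')]
```

The second call illustrates how additional training data shrinks the version space.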

---

# Inductive Bias

.pull-top[

### **Definition**

- **Inductive bias** refers to the set of assumptions or preferences made by the learning algorithm that guide the search for a hypothesis. It dictates how the algorithm generalizes from the training data to unseen examples.

- Inductive bias is necessary because there is often an infinite number of hypotheses that could explain the data, so the algorithm needs some guidance to make reasonable assumptions about which hypotheses are likely to be correct.

]

---

# Inductive Bias

.pull-top[

### **Examples of Inductive Bias**

- **Preference for Simpler Hypotheses**: A learning algorithm may prefer simpler hypotheses (Occam’s Razor), assuming that simpler models are more likely to generalize well to new data.

- **Bias Toward Generalization**: Many algorithms assume that a hypothesis that fits the training data well is more likely to generalize to unseen examples.

- **Data Type Bias**: In decision tree learning, an inductive bias might prefer hypotheses based on certain attributes that can split data more effectively, such as using categorical attributes over continuous ones.

]

---

# Inductive Bias

.pull-top[

### **Importance of Inductive Bias**

- Without inductive bias, a learning algorithm would be unable to make meaningful generalizations from the data, leading to overfitting or underfitting.

- Inductive bias is a key factor in how well the algorithm can learn from data, generalize, and make accurate predictions on new examples.

]

---

# Evaluation of Learning Algorithms

.pull-top[

## Definition

### **Evaluation of Learning Algorithms**  
- Evaluating a learning algorithm involves assessing its performance on a given task. The goal is to determine how well the model can generalize to new, unseen data and make accurate predictions.
- Evaluation metrics depend on the type of problem being solved (e.g., classification, regression) and the specific characteristics of the data.

]

---

# Evaluation of Learning Algorithms

.pull-top[

## Common Evaluation Metrics

### **Accuracy**

- **Definition**: Accuracy is the percentage of correct predictions made by the model.
- **Use**: Suitable for balanced datasets, where the classes are roughly equally distributed.
- **Formula**:  
  $$
  \text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Predictions}} \times 100
  $$
- **Limitations**: Accuracy may be misleading for imbalanced datasets, where one class is significantly more frequent than the other.

]

---

# Evaluation of Learning Algorithms

.pull-top[

## Common Evaluation Metrics

### **Precision**

- **Definition**: Precision measures the proportion of positive predictions that are actually correct.
- **Use**: Important in cases where false positives are costly, such as in fraud detection or medical diagnoses.
- **Formula**:  
  $$
  \text{Precision} = \frac{\text{True Positives}}{\text{True Positives + False Positives}}
  $$

]

---

# Evaluation of Learning Algorithms

.pull-top[

## Common Evaluation Metrics

### **Recall (Sensitivity)**  
- **Definition**: Recall measures the proportion of actual positives that are correctly identified by the model.
- **Use**: Critical when false negatives are costly, such as in disease screening, where a missed case can delay treatment.
- **Formula**:  
  $$
  \text{Recall} = \frac{\text{True Positives}}{\text{True Positives + False Negatives}}
  $$
  
]

---

# Evaluation of Learning Algorithms

.pull-top[

### **F1-Score**

- **Definition**: The F1-Score is the harmonic mean of precision and recall, offering a balance between the two metrics.
- **Use**: Useful when both false positives and false negatives are important and the dataset is imbalanced.
- **Formula**:  
  $$
  \text{F1-Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision + Recall}}
  $$
]
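
The classification metrics can be computed directly from confusion-matrix counts. This is a minimal sketch for binary labels (1 = positive, 0 = negative); the label lists are made-up illustrative data.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

# Illustrative labels (assumed): 3 actual positives, 5 actual negatives
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = 100 * (tp + tn) / len(y_true)             # as a percentage
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, round(recall, 3), round(f1, 3))
# 62.5 0.5 0.667 0.571
```

Precision and recall differ here because the model makes two false-positive errors but only one false-negative error. The sketch assumes at least one predicted and one actual positive; otherwise the divisions are undefined.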

---

# Evaluation of Learning Algorithms

.pull-top[

### **Mean Squared Error (MSE)**

- **Definition**: MSE is the average of the squared differences between the predicted values and the actual values. It is used for regression problems.
- **Use**: Measures the quality of predictions in regression tasks by penalizing larger errors more severely.
- **Formula**:  
  $$
  \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
  $$
  where `$y_i$` is the actual value, `$\hat{y}_i$` is the predicted value, and `$n$` is the number of examples.

]
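
A short sketch of the MSE formula in code. The actual and predicted values are made-up numbers for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between actual and predicted values."""
    n = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n

actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

mse = mean_squared_error(actual, predicted)
print(mse)  # 1.3125
```

The single error of size 2 contributes 4 of the 5.25 total squared error, which shows how squaring penalizes larger errors more heavily.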

---

# Evaluation of Learning Algorithms

.pull-top[

## Cross-Validation

### **Definition**

- **Cross-validation** is a technique used to evaluate the performance of a model by partitioning the data into multiple subsets or folds. The model is trained on some folds and tested on the remaining fold(s), with this process repeated for all subsets.

### **Purpose**  
- Cross-validation helps to ensure that the model's performance is not overly dependent on any particular subset of the data, thus providing a better estimate of its generalization ability.
- **K-Fold Cross-Validation**: The dataset is split into k equal parts (folds). Each fold is used once as the test set while the model is trained on the other k-1 folds.
  
### **Example**  
- If using 5-fold cross-validation, the model will be trained 5 times, each time using 4/5 of the data for training and 1/5 for testing. The average performance across all 5 iterations is then reported.

]
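
The fold bookkeeping behind k-fold cross-validation can be sketched as follows. The helper `k_fold_indices` is a hypothetical name, and it splits the data in order without shuffling, which is a simplification; in practice the data is usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(10, 5))
for train, test in folds:
    print(len(train), "train /", len(test), "test:", test)
```

With 10 samples and 5 folds, each iteration trains on 8 samples (4/5 of the data) and tests on 2 (1/5), and every sample appears in exactly one test fold.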

---

# Evaluation of Learning Algorithms

.pull-top[

## Overfitting and Underfitting

### **Overfitting**

- **Definition**: Overfitting occurs when a model learns the noise or random fluctuations in the training data instead of the underlying patterns. As a result, the model performs well on the training set but poorly on new, unseen data.
- **Signs**: High accuracy on the training set but low accuracy on the validation or test set.

### **Underfitting**  
- **Definition**: Underfitting occurs when a model is too simplistic to capture the underlying patterns in the data. This leads to poor performance on both the training and test sets.
- **Signs**: Low accuracy on both the training and test sets.

### **Balance**  
- The key to good model evaluation is to find a balance between overfitting and underfitting, ensuring that the model generalizes well to new data while still capturing the important patterns in the training data.

]

---

# Model Selection and Comparison

.pull-top[

### **Definition**

- **Model selection** refers to the process of choosing the best model for a given task based on its performance metrics. This involves comparing multiple models, considering factors like accuracy, precision, recall, and computational efficiency.

### **Approaches**  
- **Grid Search**: A method of systematically searching through a hyperparameter space to find the optimal combination for a model.
- **Random Search**: An approach where hyperparameters are selected randomly within a specified range.
- **Ensemble Methods**: Combining predictions from multiple models (e.g., Random Forests, XGBoost) to improve performance.

### **Purpose**  
- Evaluating and selecting the right model is crucial for achieving optimal performance and ensuring that the model generalizes well to unseen data.

]
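
Grid search amounts to trying every combination of candidate hyperparameter values and keeping the best-scoring one. In the sketch below, `validation_score` is a hypothetical stand-in for "train a model with these settings and measure it on validation data", and the hyperparameter names and values are assumptions for illustration.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid; return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def validation_score(max_depth, min_samples):
    # Hypothetical score surface that peaks at max_depth=4, min_samples=5.
    return -((max_depth - 4) ** 2) - 0.1 * (min_samples - 5) ** 2

grid = {"max_depth": [2, 4, 8], "min_samples": [1, 5, 10]}
best_params, best_score = grid_search(grid, validation_score)
print(best_params)  # {'max_depth': 4, 'min_samples': 5}
```

Random search would replace the exhaustive `product` loop with a fixed number of randomly sampled combinations.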

---

# Bias-Variance Trade-off

.pull-top[

### **Definition**

- The **Bias-Variance Trade-off** refers to the balance between two sources of errors in machine learning models:
  - **Bias**: Error introduced by the model's assumptions. High bias indicates that the model is too simplistic, and it fails to capture the underlying patterns in the data, leading to underfitting.
  - **Variance**: Error introduced by the model's sensitivity to small fluctuations or noise in the training data. High variance indicates that the model is too complex, overfitting the training data and failing to generalize to new data.
  
]

---

# Bias-Variance Trade-off

.pull-top[

### **Trade-off**  
- The goal is to find a model that minimizes both bias and variance. A model with high bias and low variance will underfit the data, while a model with low bias and high variance will overfit the data.
- As the complexity of the model increases (e.g., more features, deeper neural networks), variance tends to increase, and bias tends to decrease. Conversely, simpler models have higher bias and lower variance.

]

---

# Bias-Variance Trade-off

.pull-top[

### **Example**  
- **High Bias**: A linear model attempting to classify data that follows a non-linear pattern might have high bias, resulting in poor performance.
- **High Variance**: A complex decision tree that perfectly fits the training data, but performs poorly on new data, might have high variance.

### **Managing the Trade-off**

- **Cross-validation**: Helps evaluate the model’s performance on unseen data and assess if overfitting or underfitting is occurring.

- **Regularization**: Techniques like L1 (Lasso) or L2 (Ridge) regularization can help reduce variance by penalizing the complexity of the model.

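- **Ensemble Methods**: Combining multiple models can improve the trade-off. Bagging (e.g., Random Forests) mainly reduces variance by averaging many models, while boosting (e.g., Gradient Boosting) mainly reduces bias by fitting models sequentially to the remaining errors.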

]

---

# Data Preprocessing

.pull-top[

### **Definition**  
- **Data preprocessing** involves transforming raw data into a clean and usable format before applying machine learning algorithms. It addresses issues such as missing values, noisy data, and irrelevant features that can affect the model's performance.

### **Steps in Data Preprocessing**  
1. **Data Cleaning**: Handling missing values, detecting and removing outliers, and correcting inconsistencies in the data.
2. **Normalization/Standardization**: Scaling features to a similar range so that they are treated equally by the model (e.g., Min-Max scaling, Z-score standardization).
3. **Encoding Categorical Variables**: Converting categorical features (e.g., "Yes"/"No" or "Male"/"Female") into numerical values (e.g., using one-hot encoding).
4. **Handling Imbalanced Data**: Addressing class imbalance through techniques such as resampling (oversampling/undersampling), synthetic data generation (SMOTE), or using class weights.

]
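
As an illustration of the encoding step, the sketch below one-hot encodes a single categorical column. The helper name `one_hot` and the alphabetical ordering of categories are assumptions made for this example.

```python
def one_hot(values):
    """Return (categories, rows), with one 0/1 column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

categories, encoded = one_hot(["Yes", "No", "Yes"])
print(categories)  # ['No', 'Yes']
print(encoded)     # [[0, 1], [1, 0], [0, 1]]
```

Each row contains exactly one 1, in the column of its category.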

---

# Data Preprocessing

.pull-top[

### **Importance of Preprocessing**  
- Preprocessing ensures that the data is in the right format for the learning algorithm, improves the quality of the model, and can lead to better generalization. For example, a model that uses raw text data would benefit from preprocessing steps like tokenization and stopword removal.

### **Example**  
- In a dataset with numerical features, if one feature ranges from 0 to 1 and another ranges from 1,000 to 10,000, standardization ensures that both features are on the same scale, preventing the model from giving undue importance to the higher-range feature.

]
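
The scaling example can be sketched with the two methods named earlier, Min-Max scaling and Z-score standardization. The feature values are made up, and both helpers assume the feature is not constant (otherwise they would divide by zero).

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Shift to mean 0 and scale to (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

small_range = [0.1, 0.4, 0.9]              # feature in roughly 0..1
large_range = [1000.0, 4000.0, 10000.0]    # feature in 1,000..10,000

print(min_max_scale(large_range))
print(z_score(small_range))
print(z_score(large_range))
```

After either transformation the two features occupy comparable numeric ranges, so neither dominates purely because of its units.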

---

# Feature Selection

.pull-top[

### **Definition**  
- **Feature selection** is the process of identifying and selecting the most relevant features (variables) for use in model construction. It aims to improve model accuracy, reduce overfitting, and reduce computational cost by removing irrelevant or redundant features.

]

---

# Feature Selection

.pull-top[

### **Techniques for Feature Selection**

1. **Filter Methods**: Evaluate each feature independently using statistical tests (e.g., Chi-square test, correlation coefficient) to determine its relevance to the target variable.
   - **Example**: A low correlation between a feature and the target variable suggests that the feature carries little linear signal, making it a candidate for removal.
2. **Wrapper Methods**: Search for feature subsets that improve model performance by evaluating different feature combinations through cross-validation.
   - **Example**: Recursive Feature Elimination (RFE) recursively removes the least important features to find the optimal subset of features.
3. **Embedded Methods**: Perform feature selection during the model training process (e.g., Lasso regularization automatically selects important features).
   - **Example**: In linear regression, Lasso adds an L1 penalty on the size of the coefficients, which shrinks the coefficients of uninformative features to exactly zero and keeps only the most important ones.

]
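
A filter method can be sketched as scoring each feature against the target and keeping those above a cutoff. The example below uses the Pearson correlation coefficient; the feature names, data values, and the 0.5 threshold are illustrative assumptions.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical dataset: two candidate features and a numeric target.
features = {
    "age": [30, 40, 50, 60, 70],
    "shoe_size": [42, 38, 44, 39, 41],
}
target = [1.0, 2.1, 2.9, 4.2, 5.0]

threshold = 0.5  # illustrative cutoff on |correlation|
selected = [name for name, column in features.items()
            if abs(pearson(column, target)) >= threshold]
print(selected)  # ['age']
```

Because each feature is scored independently, a filter like this is cheap to run but cannot detect features that are only useful in combination, which is what wrapper and embedded methods address.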

---

# Feature Selection

.pull-top[

### **Importance of Feature Selection**  
- Reducing the number of features can lead to simpler models that are easier to interpret, faster to train, and less prone to overfitting. It also helps to reduce the computational cost of training the model.
- Feature selection is particularly useful when dealing with high-dimensional datasets where the number of features exceeds the number of observations (e.g., genomics, text data).

### **Example**  
- In a healthcare dataset, features like "age," "blood pressure," and "cholesterol level" may be highly predictive of a patient's risk of heart disease, while "eye color" and "favorite food" would not contribute meaningfully to the prediction and could be discarded.

]

---

class: inverse, center, middle

# Thanks