A search problem consists of:
- Initial state: the starting configuration
- Actions: the set of actions available from each state
- Transition model: the state that results from applying an action to a state
- Goal states: the states we want to reach
- Action cost function: the cost of each action
A heuristic function h(n) estimates the cost of the cheapest path from node n to a goal state.
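A minimal sketch of these pieces in Python, using a hypothetical grid route-finding problem (the class name, method names, and the Manhattan-distance heuristic are illustrative choices, not part of the formal definition):

```python
from dataclasses import dataclass

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

@dataclass(frozen=True)
class GridRouteProblem:
    """Hypothetical route-finding problem on an n x n grid; every step costs 1."""
    start: tuple      # initial state (row, col)
    goal: tuple       # goal state
    n: int = 5        # grid size

    def actions(self, state):
        r, c = state
        return [a for a, (dr, dc) in MOVES.items()
                if 0 <= r + dr < self.n and 0 <= c + dc < self.n]

    def result(self, state, action):          # transition model
        dr, dc = MOVES[action]
        return (state[0] + dr, state[1] + dc)

    def is_goal(self, state):                 # goal test
        return state == self.goal

    def action_cost(self, state, action, next_state):
        return 1                              # uniform step cost

def h(state, goal):
    """Manhattan-distance heuristic: an admissible estimate of remaining path cost."""
    return abs(state[0] - goal[0]) + abs(state[1] - goal[1])
```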
Principle: MAX maximizes utility, MIN minimizes utility
Minimax value:
MINIMAX(s) =
- UTILITY(s), if TERMINAL(s)
- max_a MINIMAX(RESULT(s,a)), if TO-MOVE(s) = MAX
- min_a MINIMAX(RESULT(s,a)), if TO-MOVE(s) = MIN
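The recursion above transcribes almost directly into Python; the game interface used here (is_terminal, utility, to_move, actions, result) is an assumed convention rather than a specific library:

```python
def minimax_value(game, state):
    """Return the minimax value of `state` in a two-player zero-sum `game`."""
    if game.is_terminal(state):
        return game.utility(state)
    values = [minimax_value(game, game.result(state, a)) for a in game.actions(state)]
    # MAX takes the largest child value, MIN the smallest.
    return max(values) if game.to_move(state) == "MAX" else min(values)
```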
Expectimax value (for games with chance nodes):
EXPECTIMAX(s) =
- UTILITY(s), if TERMINAL(s)
- max_a EXPECTIMAX(RESULT(s,a)), if TO-MOVE(s) = MAX
- min_a EXPECTIMAX(RESULT(s,a)), if TO-MOVE(s) = MIN
- Σ_r P(r) × EXPECTIMAX(RESULT(s,r)), if TO-MOVE(s) = CHANCE
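A sketch extending the same recursion to chance nodes; the chance_outcomes method, returning (outcome, probability) pairs, is an assumed part of the game interface:

```python
def expectimax_value(game, state):
    """Expectimax value for a game with MAX, MIN, and CHANCE nodes."""
    if game.is_terminal(state):
        return game.utility(state)
    player = game.to_move(state)
    if player == "CHANCE":
        # Weight each random outcome r by its probability P(r) (assumed interface).
        return sum(p * expectimax_value(game, game.result(state, r))
                   for r, p in game.chance_outcomes(state))
    values = [expectimax_value(game, game.result(state, a)) for a in game.actions(state)]
    return max(values) if player == "MAX" else min(values)
```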
An MDP is defined by:
- States: S (set of all possible states)
- Actions: A (set of all possible actions)
- Transition model: P(s'|s,a) = probability of reaching s' from s via action a
- Reward function: R(s,a,s') = immediate reward for the transition
- Discount factor: γ ∈ [0,1] for discounting future rewards
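One possible encoding of such an MDP in Python, using plain dictionaries; the two-state example and the (next_state, probability, reward) triple format are illustrative assumptions used in the sketches below:

```python
# Illustrative two-state MDP encoded as P[s][a] = list of
# (next_state, probability, reward) triples.
gamma = 0.9                       # discount factor
P = {
    "A": {"stay": [("A", 1.0, 0.0)],
          "go":   [("B", 0.8, 1.0), ("A", 0.2, 0.0)]},   # transitions are stochastic
    "B": {"stay": [("B", 1.0, 2.0)],
          "go":   [("A", 1.0, 0.0)]},
}
```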
V^π(s) = E[Σ_{t=0}^∞ γ^t R(s_t, a_t, s_{t+1}) | s_0 = s, π]
Expected total discounted reward starting from state s and following policy π
Q^π(s,a) = E[Σ_{t=0}^∞ γ^t R(s_t, a_t, s_{t+1}) | s_0 = s, a_0 = a, π]
Expected total discounted reward starting from state s, taking action a, and then following π
V^π(s) = Σ_a π(a|s) Σ_{s'} P(s'|s,a)[R(s,a,s') + γV^π(s')]
V*(s) = max_a Σ_{s'} P(s'|s,a)[R(s,a,s') + γV*(s')]
Q*(s,a) = Σ_{s'} P(s'|s,a)[R(s,a,s') + γ max_{a'} Q*(s',a')]
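Using the dictionary encoding sketched above, the one-step Bellman backups can be written as follows (the helper names are illustrative):

```python
def q_value(P, gamma, V, s, a):
    """One-step lookahead: Q(s,a) = Σ_{s'} P(s'|s,a) [R(s,a,s') + γ V(s')]."""
    return sum(p * (r + gamma * V[s2]) for s2, p, r in P[s][a])

def bellman_optimality_backup(P, gamma, V, s):
    """Bellman optimality update: V(s) <- max_a Q(s,a)."""
    return max(q_value(P, gamma, V, s, a) for a in P[s])
```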
Initialize V₀(s) arbitrarily for all s
For k = 0, 1, 2, … until convergence:
V_{k+1}(s) = max_a Σ_{s'} P(s'|s,a)[R(s,a,s') + γV_k(s')]
Extract optimal policy: π*(s) = argmax_a Σ_{s'} P(s'|s,a)[R(s,a,s') + γV*(s')]
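A compact value-iteration sketch over the same dictionary encoding; the convergence threshold theta is an illustrative choice:

```python
def value_iteration(P, gamma, theta=1e-6):
    """Run Bellman optimality backups until the largest change is below theta."""
    V = {s: 0.0 for s in P}                        # V_0 initialized to zero
    while True:
        delta = 0.0
        for s in P:
            new_v = max(sum(p * (r + gamma * V[s2]) for s2, p, r in P[s][a])
                        for a in P[s])
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < theta:
            break
    # Extract a greedy policy from the converged value function.
    policy = {s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2])
                                             for s2, p, r in P[s][a]))
              for s in P}
    return V, policy
```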
Components: Initial state, actions, transition model, goal test, path cost
Examples:
- Route finding: States = locations, actions = moves, goal = destination
- 8-puzzle: States = board configurations, actions = tile moves, goal = solved state (see the sketch below)
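As an illustration of the 8-puzzle formulation, a sketch of the state encoding and successor function (the tuple representation and action names are assumptions):

```python
# 8-puzzle state as a tuple of 9 entries read row by row; 0 marks the blank.
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)

def successors(state):
    """Yield (action, next_state) pairs obtained by sliding a tile into the blank."""
    i = state.index(0)
    r, c = divmod(i, 3)
    moves = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    for action, (dr, dc) in moves.items():
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            j = nr * 3 + nc
            board = list(state)
            board[i], board[j] = board[j], board[i]   # swap blank with neighboring tile
            yield action, tuple(board)
```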
Components: Players, initial state, actions, transition model, terminal test, utility
Examples:
- Chess: States = board positions, actions = legal moves, utility = win/loss/draw
- Poker: Hidden information, chance elements, betting actions
Components: States, actions, transition probabilities, rewards, discount factor
Examples:
- Grid world: States = positions, actions = movements, rewards = goal/penalty
- Robot navigation: Uncertain movement, sensor noise, multiple objectives
Study Guide prepared for DSCI 6612 - Introduction to Artificial Intelligence, University of New Haven, Fall 2025