Artificial Intelligence Study Guide

DSCI 6612 - Introduction to Artificial Intelligence

University of New Haven


Table of Contents

  1. Search Algorithms (Informed and Uninformed)
  2. Heuristics (Consistency and Admissibility)
  3. Adversarial Search in Zero-Sum Games (Minimax, Alpha-Beta)
  4. Adversarial Search in Non-Deterministic Games (Expectimax)
  5. Multi-Agent Games
  6. Markov Decision Processes (MDPs)
  7. Utilities and Value Functions
  8. Problem Formulations

1. Search Algorithms

Problem Formulation

A search problem consists of:

  • Initial state: Starting configuration
  • Actions: Set of available actions from each state
  • Transition model: Result of applying an action to a state
  • Goal states: States we want to reach
  • Action cost function: Cost of each action

Performance Measures

  • Completeness: Does the algorithm find a solution if one exists?
  • Cost optimality: Does it find the lowest-cost solution?
  • Time complexity: How long does it take?
  • Space complexity: How much memory is needed?

Uninformed Search Algorithms

Breadth-First Search (BFS)

  • Strategy: Expand shallowest nodes first
  • Implementation: FIFO queue
  • Complete: Yes (if branching factor is finite)
  • Optimal: Yes (for uniform action costs)
  • Time complexity: O(b^d)
  • Space complexity: O(b^d)
  • Key feature: Finds optimal solution for unit costs
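
A minimal Python sketch of breadth-first graph search (the toy graph, start state, and goal test are illustrative assumptions, not from the course materials):

    from collections import deque

    def bfs(start, is_goal, successors):
        """Breadth-first graph search: expand shallowest nodes first (FIFO queue)."""
        frontier = deque([(start, [start])])   # queue of (state, path)
        reached = {start}                      # states already added to the frontier
        while frontier:
            state, path = frontier.popleft()   # FIFO: shallowest node comes out first
            if is_goal(state):
                return path
            for s in successors(state):
                if s not in reached:
                    reached.add(s)
                    frontier.append((s, path + [s]))
        return None                            # no solution exists

    # Toy example: find a path from 'A' to 'G'
    graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['G'], 'D': [], 'G': []}
    print(bfs('A', lambda s: s == 'G', lambda s: graph[s]))  # ['A', 'C', 'G']

Replacing popleft() with pop() turns the FIFO queue into a LIFO stack and yields depth-first search.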

Uniform-Cost Search (UCS)

  • Strategy: Expand node with lowest path cost g(n)
  • Implementation: Priority queue ordered by path cost
  • Complete: Yes (if action costs ≥ ε > 0)
  • Optimal: Yes
  • Time/Space complexity: O(b^(1+⌊C*/ε⌋)), where C* is the cost of the optimal solution
  • Key feature: Equivalent to Dijkstra’s algorithm on graphs
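
A sketch of uniform-cost search using Python's heapq as the priority queue ordered by path cost g(n); the weighted toy graph is an illustrative assumption:

    import heapq

    def uniform_cost_search(start, is_goal, successors):
        """Expand the frontier node with the lowest path cost g(n)."""
        frontier = [(0, start, [start])]         # min-heap of (g, state, path)
        best_g = {start: 0}                      # cheapest known cost to each state
        while frontier:
            g, state, path = heapq.heappop(frontier)
            if is_goal(state):                   # goal test on expansion => optimal
                return g, path
            if g > best_g.get(state, float('inf')):
                continue                         # stale entry; a cheaper path was found later
            for s, cost in successors(state):
                new_g = g + cost
                if new_g < best_g.get(s, float('inf')):
                    best_g[s] = new_g
                    heapq.heappush(frontier, (new_g, s, path + [s]))
        return None

    # Toy weighted graph: successors(s) yields (neighbor, step cost) pairs
    graph = {'A': [('B', 1), ('C', 5)], 'B': [('C', 1)], 'C': []}
    print(uniform_cost_search('A', lambda s: s == 'C', lambda s: graph[s]))  # (2, ['A', 'B', 'C'])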

Depth-First Search (DFS)

  • Strategy: Expand deepest node first
  • Implementation: LIFO stack
  • Complete: No (can get stuck in infinite paths)
  • Optimal: No
  • Time complexity: O(b^m) where m is maximum depth
  • Space complexity: O(bm)
  • Key advantage: Low memory usage

Iterative Deepening Search (IDS)

  • Strategy: Repeatedly apply depth-limited search with increasing limits
  • Complete: Yes
  • Optimal: Yes (for uniform costs)
  • Time complexity: O(b^d)
  • Space complexity: O(bd)
  • Key advantage: Combines benefits of BFS and DFS
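
A sketch of iterative deepening: a recursive depth-limited DFS wrapped in a loop over increasing depth limits (the toy graph is an illustrative assumption):

    def depth_limited_search(state, is_goal, successors, limit, path=None):
        """Depth-first search that stops descending once the depth limit is reached."""
        path = path or [state]
        if is_goal(state):
            return path
        if limit == 0:
            return None                          # cutoff: do not expand further
        for s in successors(state):
            result = depth_limited_search(s, is_goal, successors, limit - 1, path + [s])
            if result is not None:
                return result
        return None

    def iterative_deepening_search(start, is_goal, successors, max_depth=50):
        """Run depth-limited search with limits 0, 1, 2, ... until a solution appears."""
        for limit in range(max_depth + 1):
            result = depth_limited_search(start, is_goal, successors, limit)
            if result is not None:
                return result
        return None

    graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['G'], 'D': [], 'G': []}
    print(iterative_deepening_search('A', lambda s: s == 'G', lambda s: graph[s]))  # ['A', 'C', 'G']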

Informed Search Algorithms

A* Search

  • Strategy: Expand node with lowest f(n) = g(n) + h(n), where g(n) is the path cost so far and h(n) is a heuristic estimate of the remaining cost (see Section 2)
  • Implementation: Priority queue ordered by f(n)
  • Complete: Yes (under the same conditions as UCS)
  • Optimal: Yes, with an admissible heuristic (consistent, for graph search)
  • Time/Space complexity: Exponential in the worst case, but optimally efficient for a given consistent heuristic
  • Key feature: Combines the path cost of UCS with heuristic guidance toward the goal
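
A sketch of A* graph search, reusing the heapq priority-queue pattern from UCS; the toy graph and heuristic table are illustrative assumptions:

    import heapq

    def astar(start, is_goal, successors, h):
        """A* graph search: expand the node with the lowest f(n) = g(n) + h(n)."""
        frontier = [(h(start), 0, start, [start])]   # min-heap of (f, g, state, path)
        best_g = {start: 0}
        while frontier:
            f, g, state, path = heapq.heappop(frontier)
            if is_goal(state):
                return g, path
            for s, cost in successors(state):
                new_g = g + cost
                if new_g < best_g.get(s, float('inf')):
                    best_g[s] = new_g
                    heapq.heappush(frontier, (new_g + h(s), new_g, s, path + [s]))
        return None

    # Toy weighted graph and an admissible, consistent heuristic
    graph = {'A': [('B', 1), ('C', 2)], 'B': [('G', 5)], 'C': [('G', 1)], 'G': []}
    h = {'A': 3, 'B': 4, 'C': 1, 'G': 0}.get
    print(astar('A', lambda s: s == 'G', lambda s: graph[s], h))  # (3, ['A', 'C', 'G']); B is never expanded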

2. Heuristics

Definition

A heuristic function h(n) estimates the cost of the cheapest path from node n to a goal state.

Admissibility

  • Definition: A heuristic h(n) is admissible if it never overestimates the true cost
  • Mathematical condition: h(n) ≤ h*(n) for all n, where h*(n) is the true cost to the nearest goal
  • Consequence: A* with admissible heuristic is optimal
  • Example: Straight-line distance in route-finding problems

Consistency (Monotonicity)

  • Definition: h(n) ≤ c(n,a,n’) + h(n’) for every node n and successor n’
  • Geometric interpretation: Triangle inequality
  • Relationship: Every consistent heuristic is admissible (but not vice versa)
  • Consequence: With consistent heuristic, A* never reopens nodes
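
Both conditions can be checked numerically when the state space is small enough to enumerate. A sketch, assuming a hand-built example graph with known true costs h*(n):

    def is_admissible(h, h_star):
        """Admissible: h never overestimates, i.e. h(n) <= h*(n) for every state n."""
        return all(h[n] <= h_star[n] for n in h_star)

    def is_consistent(h, edges):
        """Consistent: h(n) <= c(n,a,n') + h(n') for every edge (n, n', cost)."""
        return all(h[n] <= cost + h[n2] for n, n2, cost in edges)

    # Illustrative example: states A, B, G (goal) with step costs on the edges
    edges = [('A', 'B', 1), ('B', 'G', 2), ('A', 'G', 4)]
    h_star = {'A': 3, 'B': 2, 'G': 0}   # true cheapest cost to the goal
    h = {'A': 2, 'B': 2, 'G': 0}        # candidate heuristic

    print(is_admissible(h, h_star))     # True: h stays at or below h*
    print(is_consistent(h, edges))      # True: triangle inequality holds on every edge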

Properties and Relationships

  • Dominance: If h₂(n) ≥ h₁(n) for all n, then h₂ dominates h₁
  • Better heuristics: Lead to fewer node expansions
  • Relaxed problems: Often source of admissible heuristics

Creating Heuristics

  1. Relaxed problems: Remove constraints from original problem
  2. Pattern databases: Precompute costs for subproblems
  3. Landmarks: Use distances to intermediate goals
  4. Learning: Train from experience with problem instances
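
For example, relaxing the 8-puzzle so a tile may slide onto any adjacent square (occupied or not) yields the Manhattan-distance heuristic, which is therefore admissible. A sketch, assuming boards are encoded as tuples of 9 entries with 0 for the blank:

    def manhattan_distance(state, goal=(1, 2, 3, 4, 5, 6, 7, 8, 0)):
        """Sum over tiles of |row difference| + |column difference| to the goal position."""
        total = 0
        for idx, tile in enumerate(state):
            if tile == 0:
                continue                    # the blank tile does not count
            goal_idx = goal.index(tile)
            total += abs(idx // 3 - goal_idx // 3) + abs(idx % 3 - goal_idx % 3)
        return total

    # One move away from the goal (tile 8 and the blank swapped): h = 1
    print(manhattan_distance((1, 2, 3, 4, 5, 6, 7, 0, 8)))  # 1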

3. Adversarial Search in Zero-Sum Games

Game Formulation

  • Players: MAX and MIN (two-player games)
  • Initial state: Starting configuration
  • Actions: Legal moves available to each player
  • Transition model: Result of making a move
  • Terminal test: When the game ends
  • Utility function: Payoff for each terminal state

Zero-Sum Games

  • Definition: One player’s gain equals the other’s loss
  • Property: What’s good for MAX is bad for MIN
  • Examples: Chess, checkers, tic-tac-toe

Minimax Algorithm

  • Principle: MAX maximizes utility, MIN minimizes utility

  • Minimax value:

    MINIMAX(s) = UTILITY(s)                    if TERMINAL(s)
                max_{a} MINIMAX(RESULT(s,a))   if TO-MOVE(s) = MAX
                min_{a} MINIMAX(RESULT(s,a))   if TO-MOVE(s) = MIN
  • Properties:

    • Complete: Yes
    • Optimal: Yes (against optimal opponent)
    • Time complexity: O(b^m)
    • Space complexity: O(bm)
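
A minimal recursive sketch of minimax over an explicit game tree; the list-of-lists tree encoding and the leaf utilities are illustrative assumptions:

    def minimax(node, maximizing):
        """Leaves are numbers (utilities); internal nodes are lists of child subtrees."""
        if isinstance(node, (int, float)):          # TERMINAL: return UTILITY(s)
            return node
        values = [minimax(child, not maximizing) for child in node]
        return max(values) if maximizing else min(values)

    # MAX to move at the root, choosing between two MIN nodes
    tree = [[3, 12, 8], [2, 4, 6]]
    print(minimax(tree, maximizing=True))  # 3: MIN yields 3 on the left, 2 on the right; MAX picks 3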

Alpha-Beta Pruning

  • Purpose: Eliminate branches that cannot affect final decision
  • Alpha (α): Best (highest) value MAX can guarantee so far on the path to the root
  • Beta (β): Best (lowest) value MIN can guarantee so far on the path to the root
  • Pruning condition: If α ≥ β, prune the remaining branches at this node
  • Performance:
    • Best case: O(b^(m/2)) - can search twice as deep
    • Depends heavily on move ordering
    • Average case: O(b^(3m/4))
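
The same tree evaluated with alpha-beta pruning; a sketch that assumes the list-of-lists tree encoding used in the minimax example above:

    def alphabeta(node, maximizing, alpha=float('-inf'), beta=float('inf')):
        """Minimax with alpha-beta pruning: stop exploring a node once alpha >= beta."""
        if isinstance(node, (int, float)):
            return node
        if maximizing:
            value = float('-inf')
            for child in node:
                value = max(value, alphabeta(child, False, alpha, beta))
                alpha = max(alpha, value)
                if alpha >= beta:
                    break                    # MIN would never let play reach this branch
            return value
        value = float('inf')
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:
                break                        # MAX already has a better option elsewhere
        return value

    tree = [[3, 12, 8], [2, 4, 6]]
    print(alphabeta(tree, maximizing=True))  # 3; the leaves 4 and 6 are pruned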

Move Ordering

  • Killer moves: Moves that caused cutoffs in similar positions
  • Iterative deepening: Use results from previous depths
  • Transposition tables: Cache position evaluations
  • Good ordering heuristics: Captures first, then threats, then forward moves

Evaluation Functions

  • Purpose: Estimate utility of non-terminal positions
  • Requirements:
    • Fast to compute
    • Strongly correlated with winning chances
  • Linear form: EVAL(s) = w₁f₁(s) + w₂f₂(s) + … + wₙfₙ(s)
  • Features: Material, position, mobility, king safety, etc.
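
The linear form is just a dot product of feature values and weights. A tiny sketch (the feature names and weights are illustrative, not a real chess evaluator):

    def linear_eval(features, weights):
        """EVAL(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)."""
        return sum(weights[name] * value for name, value in features.items())

    weights  = {'material': 1.0, 'mobility': 0.25, 'king_safety': 0.5}
    features = {'material': 3.0, 'mobility': 8.0, 'king_safety': -1.0}
    print(linear_eval(features, weights))  # 3.0 + 2.0 - 0.5 = 4.5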

4. Adversarial Search in Non-Deterministic Games

Expectimax Algorithm

  • Purpose: Handle games with chance elements (dice, card draws)
  • Node types:
    • MAX nodes: Choose action to maximize expected utility
    • MIN nodes: Choose action to minimize expected utility
    • CHANCE nodes: Outcome determined by probability distribution

Expectimax Formula

EXPECTIMAX(s) = UTILITY(s)                            if TERMINAL(s)
                max_{a} EXPECTIMAX(RESULT(s,a))       if TO-MOVE(s) = MAX
                min_{a} EXPECTIMAX(RESULT(s,a))       if TO-MOVE(s) = MIN
                Σ_{r} P(r) × EXPECTIMAX(RESULT(s,r))  if TO-MOVE(s) = CHANCE
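
A sketch of the MAX and CHANCE cases (a MIN case would follow the same pattern as in minimax); the tuple-based tree encoding and the numbers are illustrative assumptions:

    def expectimax(node):
        """Leaves are utilities; ('max', children) picks the best child;
        ('chance', [(p, child), ...]) returns the probability-weighted average."""
        if isinstance(node, (int, float)):
            return node                                  # TERMINAL: utility
        kind, children = node
        if kind == 'max':
            return max(expectimax(child) for child in children)
        if kind == 'chance':
            return sum(p * expectimax(child) for p, child in children)
        raise ValueError(f"unknown node type: {kind}")

    # MAX chooses between a sure 4 and a fair coin flip over 0 and 10
    tree = ('max', [4, ('chance', [(0.5, 0), (0.5, 10)])])
    print(expectimax(tree))  # 5.0: the gamble's expected utility beats the sure 4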

Key Differences from Minimax

  • Pruning: More difficult due to averaging
  • Evaluation: Must return values proportional to winning probability
  • Complexity: O(b^m × n^m) where n is number of chance outcomes

Applications

  • Backgammon: Dice rolls determine available moves
  • Poker: Unknown cards create uncertainty
  • General principle: Expected utility maximization under uncertainty

5. Multi-Agent Games

Multiplayer Games

  • Utility vectors: Each player has separate utility function
  • Example: Three-player game with utilities (v_A, v_B, v_C)
  • Strategy: Each player maximizes their own utility

Alliances and Cooperation

  • Emergence: Can arise from purely selfish behavior
  • Example: Weak players team up against strong player
  • Dynamics: Alliances form and break as game progresses
  • Non-zero-sum: Cooperation possible when players can both benefit

Mechanism Design

  • Goal: Design rules to achieve desired outcomes
  • Auction theory: Bidding mechanisms
  • Voting systems: Preference aggregation
  • Incentive compatibility: Truth-telling is optimal strategy

6. Markov Decision Processes (MDPs)

MDP Formulation

An MDP is defined by:

  • States: S (set of all possible states)
  • Actions: A (set of all possible actions)
  • Transition model: P(s'|s,a) = probability of reaching s' from s via action a
  • Reward function: R(s,a,s') = immediate reward for the transition
  • Discount factor: γ ∈ [0,1] for discounting future rewards

Key Properties

  • Markov property: Future depends only on current state, not history
  • Stationary: Transition probabilities don’t change over time
  • Observable: Agent knows current state

Value Functions

State Value Function

V^π(s) = E[Σ_{t=0}^∞ γ^t R(s_t, a_t, s_{t+1}) | s_0 = s, π]

Expected total discounted reward starting from state s following policy π

Action Value Function (Q-function)

Q^π(s,a) = E[Σ_{t=0}^∞ γ^t R(s_t, a_t, s_{t+1}) | s_0 = s, a_0 = a, π]

Expected total discounted reward starting from state s, taking action a, then following π

Bellman Equations

Bellman Equation for V^π

V^π(s) = Σ_a π(a|s) Σ_{s'} P(s'|s,a)[R(s,a,s') + γV^π(s')]

Bellman Optimality Equation for V*

V*(s) = max_a Σ_{s'} P(s'|s,a)[R(s,a,s') + γV*(s')]

Bellman Optimality Equation for Q*

Q*(s,a) = Σ_{s'} P(s'|s,a)[R(s,a,s') + γ max_{a'} Q*(s',a')]

Value Iteration

  1. Initialize V₀(s) arbitrarily for all s

  2. For k = 1, 2, … until convergence:

    V_{k+1}(s) = max_a Σ_{s'} P(s'|s,a)[R(s,a,s') + γV_k(s')]
  3. Extract optimal policy: π*(s) = argmax_a Σ_{s’} P(s’|s,a)[R(s,a,s’) + γV*(s’)]
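
A sketch of value iteration on a tiny two-state MDP given as an explicit transition table; the MDP, γ, and tolerance below are illustrative assumptions:

    # P[s][a] = list of (probability, next_state, reward) triples
    P = {
        's0': {'stay': [(1.0, 's0', 0.0)], 'go': [(0.8, 's1', 1.0), (0.2, 's0', 0.0)]},
        's1': {'stay': [(1.0, 's1', 2.0)], 'go': [(1.0, 's0', 0.0)]},
    }
    gamma = 0.9

    def value_iteration(P, gamma, tol=1e-6):
        """Repeat the Bellman optimality backup until the values stop changing."""
        V = {s: 0.0 for s in P}
        while True:
            delta = 0.0
            for s in P:
                q = {a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]) for a in P[s]}
                new_v = max(q.values())
                delta = max(delta, abs(new_v - V[s]))
                V[s] = new_v
            if delta < tol:
                break
        # Extract the greedy policy from the converged values
        policy = {s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
                  for s in P}
        return V, policy

    V, pi = value_iteration(P, gamma)
    print(V, pi)   # optimal policy: 'go' in s0 to reach s1, then 'stay' to collect its reward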

Policy Iteration

  1. Initialize policy π₀ arbitrarily
  2. Repeat until convergence:
    • Policy evaluation: Solve V^π_i(s) = Σ_{s’} P(s’|s,π_i(s))[R(s,π_i(s),s’) + γV^π_i(s’)]
    • Policy improvement: π_{i+1}(s) = argmax_a Σ_{s’} P(s’|s,a)[R(s,a,s’) + γV^π_i(s’)]
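
A sketch of policy iteration on the same illustrative two-state MDP, using iterative policy evaluation rather than an exact linear solve:

    # Same tiny illustrative MDP as in the value-iteration sketch
    P = {
        's0': {'stay': [(1.0, 's0', 0.0)], 'go': [(0.8, 's1', 1.0), (0.2, 's0', 0.0)]},
        's1': {'stay': [(1.0, 's1', 2.0)], 'go': [(1.0, 's0', 0.0)]},
    }
    gamma = 0.9

    def policy_iteration(P, gamma, tol=1e-6):
        """Alternate policy evaluation and greedy improvement until the policy is stable."""
        policy = {s: next(iter(P[s])) for s in P}    # arbitrary initial policy
        V = {s: 0.0 for s in P}
        while True:
            # Policy evaluation: repeated backups under the fixed policy
            while True:
                delta = 0.0
                for s in P:
                    v = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][policy[s]])
                    delta = max(delta, abs(v - V[s]))
                    V[s] = v
                if delta < tol:
                    break
            # Policy improvement: act greedily with respect to V^pi
            stable = True
            for s in P:
                best = max(P[s], key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
                if best != policy[s]:
                    policy[s], stable = best, False
            if stable:
                return V, policy

    V, pi = policy_iteration(P, gamma)
    print(pi)   # same optimal policy as value iteration: {'s0': 'go', 's1': 'stay'}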

Discounting

  • Purpose: Model preference for immediate rewards
  • γ = 0: Only immediate rewards matter
  • γ = 1: All rewards weighted equally (can cause infinite values)
  • Typical values: γ ∈ [0.9, 0.99]
  • Effect: Higher γ means more long-term planning

7. Utilities and Value Functions

Utility Theory

  • Utility: Numerical representation of preferences
  • Axioms: Completeness, transitivity, continuity, monotonicity
  • Expected utility: Decision making under uncertainty

Risk Attitudes

  • Risk-neutral: U(expected value) = expected U(value)
  • Risk-averse: U(expected value) > expected U(value)
  • Risk-seeking: U(expected value) < expected U(value)
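
A short numeric check of these definitions, using the concave utility U(x) = √x (an illustrative assumption) on a fair 50/50 lottery over $0 and $100:

    import math

    def expected(lottery, f=lambda x: x):
        """Expectation of f(X) for a discrete lottery given as (value, probability) pairs."""
        return sum(p * f(x) for x, p in lottery)

    lottery = [(0, 0.5), (100, 0.5)]
    u = math.sqrt                       # concave utility

    print(u(expected(lottery)))         # U(expected value) = sqrt(50) ≈ 7.07
    print(expected(lottery, u))         # expected U(value) = 0.5*0 + 0.5*10 = 5.0
    # U(expected value) > expected U(value), so this utility function is risk-averse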

Utility in Games

  • Zero-sum: Utilities sum to constant (typically zero)
  • General-sum: Players can have aligned interests
  • Von Neumann-Morgenstern: Utility theory for strategic interactions

Value Function Properties

  • Monotonicity: Better states have higher values
  • Additivity: Value decomposes across time/features
  • Convergence: Iterative methods converge to optimal values

8. Problem Formulations

Search Problems

Components: Initial state, actions, transition model, goal test, path cost

Examples:

  • Route finding: States = locations, actions = moves, goal = destination
  • 8-puzzle: States = board configurations, actions = tile moves, goal = solved state
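
One way to make such a formulation concrete is a small problem class. A sketch of a route-finding formulation (the class layout and the toy road map are illustrative assumptions, not a required interface):

    class RouteFindingProblem:
        """Search problem: states are locations, actions are moves along roads."""
        def __init__(self, roads, initial, goal):
            self.roads, self.initial, self.goal = roads, initial, goal
        def actions(self, state):               # legal moves from a location
            return list(self.roads.get(state, {}))
        def result(self, state, action):        # transition model: move to that neighbor
            return action
        def action_cost(self, state, action):   # road length
            return self.roads[state][action]
        def is_goal(self, state):               # goal test
            return state == self.goal

    roads = {'Home': {'Park': 2, 'Mall': 5}, 'Park': {'Mall': 1}, 'Mall': {}}
    problem = RouteFindingProblem(roads, initial='Home', goal='Mall')
    print(problem.actions('Home'), problem.action_cost('Home', 'Park'))  # ['Park', 'Mall'] 2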

Game Problems

Components: Players, initial state, actions, transition model, terminal test, utility

Examples:

  • Chess: States = board positions, actions = legal moves, utility = win/loss/draw
  • Poker: Hidden information, chance elements, betting actions

MDP Problems

Components: States, actions, transition probabilities, rewards, discount factor

Examples:

  • Grid world: States = positions, actions = movements, rewards = goal/penalty
  • Robot navigation: Uncertain movement, sensor noise, multiple objectives

Key Formulation Decisions

  1. State representation: What information to include/exclude
  2. Action space: Discrete vs continuous, atomic vs composite
  3. Objective function: Single vs multiple objectives, short-term vs long-term
  4. Uncertainty modeling: Deterministic vs stochastic, full vs partial observability

Quick Reference Formulas

Search Complexity

  • BFS: Time O(b^d), Space O(b^d); UCS: Time/Space O(b^(1+⌊C*/ε⌋))
  • DFS: Time O(b^m), Space O(bm)
  • A*: Time/Space exponential, but optimally efficient
  • IDA*: Time O(b^d), Space O(bd)

Heuristic Properties

  • Admissible: h(n) ≤ h*(n)
  • Consistent: h(n) ≤ c(n,a,n’) + h(n’)
  • Dominance: h₂(n) ≥ h₁(n) ⇒ h₂ dominates h₁

Game Theory

  • Minimax value: Optimal play assuming rational opponent
  • Alpha-beta: Prune when α ≥ β
  • Expectimax: Expected utility under chance

MDPs

  • Bellman equation: V*(s) = max_a Σ_{s’} P(s’|s,a)[R(s,a,s’) + γV*(s’)]
  • Q-value: Q*(s,a) = Σ_{s’} P(s’|s,a)[R(s,a,s’) + γV*(s’)]
  • Policy: π(s) = argmax_a Q(s,a)

Exam Tips

  1. Understand problem formulations - Be able to define states, actions, goals for different problem types
  2. Know algorithm properties - Completeness, optimality, complexity for each algorithm
  3. Practice heuristic evaluation - Determine if heuristics are admissible/consistent
  4. Work through minimax examples - Include alpha-beta pruning calculations
  5. Solve simple MDPs - Value iteration, policy extraction, Bellman equations
  6. Understand utility theory - Risk preferences, expected utility calculations

Common Mistakes to Avoid

  • Confusing admissible vs consistent heuristics
  • Incorrect alpha-beta pruning (wrong bounds)
  • Forgetting discount factor in MDP calculations
  • Mixing up MAX and MIN nodes in game trees
  • Incorrect complexity analysis

Study Guide prepared for DSCI 6612 - Introduction to Artificial Intelligence University of New Haven - Fall 2025