A search problem consists of:
- Initial state: the starting configuration
- Actions: the set of actions available from each state
- Transition model: the state that results from applying an action to a state
- Goal states: the states we want to reach
- Action cost function: the cost of each action
A heuristic function h(n) estimates the cost of the cheapest path from node n to a goal state.
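A minimal sketch of these pieces in Python, using a hypothetical grid route-finding problem (the class name, method names, and the Manhattan-distance heuristic are illustrative choices, not part of the formal definition):

```python
from dataclasses import dataclass

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

@dataclass(frozen=True)
class GridRouteProblem:
    """Hypothetical route-finding problem on an n x n grid; every step costs 1."""
    start: tuple      # initial state (row, col)
    goal: tuple       # goal state
    n: int = 5        # grid size

    def actions(self, state):
        r, c = state
        return [a for a, (dr, dc) in MOVES.items()
                if 0 <= r + dr < self.n and 0 <= c + dc < self.n]

    def result(self, state, action):          # transition model
        dr, dc = MOVES[action]
        return (state[0] + dr, state[1] + dc)

    def is_goal(self, state):                 # goal test
        return state == self.goal

    def action_cost(self, state, action, next_state):
        return 1                              # uniform step cost

def h(state, goal):
    """Manhattan-distance heuristic: an admissible estimate of remaining path cost."""
    return abs(state[0] - goal[0]) + abs(state[1] - goal[1])
```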
Principle: MAX maximizes utility, MIN minimizes utility
Minimax value:
MINIMAX(s) =
- UTILITY(s), if TERMINAL(s)
- max_a MINIMAX(RESULT(s,a)), if TO-MOVE(s) = MAX
- min_a MINIMAX(RESULT(s,a)), if TO-MOVE(s) = MIN
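The recursion above transcribes almost directly into Python; the game interface used here (is_terminal, utility, to_move, actions, result) is an assumed convention rather than a specific library:

```python
def minimax_value(game, state):
    """Return the minimax value of `state` in a two-player zero-sum `game`."""
    if game.is_terminal(state):
        return game.utility(state)
    values = [minimax_value(game, game.result(state, a)) for a in game.actions(state)]
    # MAX takes the largest child value, MIN the smallest.
    return max(values) if game.to_move(state) == "MAX" else min(values)
```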
Expectimax value (for games with chance nodes):
EXPECTIMAX(s) =
- UTILITY(s), if TERMINAL(s)
- max_a EXPECTIMAX(RESULT(s,a)), if TO-MOVE(s) = MAX
- min_a EXPECTIMAX(RESULT(s,a)), if TO-MOVE(s) = MIN
- Σ_r P(r) × EXPECTIMAX(RESULT(s,r)), if TO-MOVE(s) = CHANCE
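A sketch extending the same recursion to chance nodes; the chance_outcomes method, returning (outcome, probability) pairs, is an assumed part of the game interface:

```python
def expectimax_value(game, state):
    """Expectimax value for a game with MAX, MIN, and CHANCE nodes."""
    if game.is_terminal(state):
        return game.utility(state)
    player = game.to_move(state)
    if player == "CHANCE":
        # Weight each random outcome r by its probability P(r) (assumed interface).
        return sum(p * expectimax_value(game, game.result(state, r))
                   for r, p in game.chance_outcomes(state))
    values = [expectimax_value(game, game.result(state, a)) for a in game.actions(state)]
    return max(values) if player == "MAX" else min(values)
```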
An MDP is defined by:
- States: S (set of all possible states)
- Actions: A (set of all possible actions)
- Transition model: P(s'|s,a) = probability of reaching s' from s via action a
- Reward function: R(s,a,s') = immediate reward for the transition
- Discount factor: γ ∈ [0,1] for discounting future rewards
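One possible encoding of such an MDP in Python, using plain dictionaries; the two-state example and the (next_state, probability, reward) triple format are illustrative assumptions used in the sketches below:

```python
# Illustrative two-state MDP encoded as P[s][a] = list of
# (next_state, probability, reward) triples.
gamma = 0.9                       # discount factor
P = {
    "A": {"stay": [("A", 1.0, 0.0)],
          "go":   [("B", 0.8, 1.0), ("A", 0.2, 0.0)]},   # transitions are stochastic
    "B": {"stay": [("B", 1.0, 2.0)],
          "go":   [("A", 1.0, 0.0)]},
}
```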
V^π(s) = E[Σ_{t=0}^∞ γ^t R(s_t, a_t, s_{t+1}) | s_0 = s, π]
Expected total discounted reward starting from state s and following policy π
Q^π(s,a) = E[Σ_{t=0}^∞ γ^t R(s_t, a_t, s_{t+1}) | s_0 = s, a_0 = a, π]
Expected total discounted reward starting from state s, taking action a, and then following π
V^π(s) = Σ_a π(a|s) Σ_{s'} P(s'|s,a)[R(s,a,s') + γV^π(s')]
V*(s) = max_a Σ_{s'} P(s'|s,a)[R(s,a,s') + γV*(s')]
Q*(s,a) = Σ_{s'} P(s'|s,a)[R(s,a,s') + γ max_{a'} Q*(s',a')]
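Using the dictionary encoding sketched above, the one-step Bellman backups can be written as follows (the helper names are illustrative):

```python
def q_value(P, gamma, V, s, a):
    """One-step lookahead: Q(s,a) = Σ_{s'} P(s'|s,a) [R(s,a,s') + γ V(s')]."""
    return sum(p * (r + gamma * V[s2]) for s2, p, r in P[s][a])

def bellman_optimality_backup(P, gamma, V, s):
    """Bellman optimality update: V(s) <- max_a Q(s,a)."""
    return max(q_value(P, gamma, V, s, a) for a in P[s])
```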
Initialize V₀(s) arbitrarily for all s
For k = 0, 1, 2, … until convergence:
V_{k+1}(s) = max_a Σ_{s'} P(s'|s,a)[R(s,a,s') + γV_k(s')]
Extract optimal policy: π*(s) = argmax_a Σ_{s'} P(s'|s,a)[R(s,a,s') + γV*(s')]
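A compact value-iteration sketch over the same dictionary encoding; the convergence threshold theta is an illustrative choice:

```python
def value_iteration(P, gamma, theta=1e-6):
    """Run Bellman optimality backups until the largest change is below theta."""
    V = {s: 0.0 for s in P}                        # V_0 initialized to zero
    while True:
        delta = 0.0
        for s in P:
            new_v = max(sum(p * (r + gamma * V[s2]) for s2, p, r in P[s][a])
                        for a in P[s])
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < theta:
            break
    # Extract a greedy policy from the converged value function.
    policy = {s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2])
                                             for s2, p, r in P[s][a]))
              for s in P}
    return V, policy
```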
Components: Initial state, actions, transition model, goal test, path cost
Examples:
- Route finding: States = locations, actions = moves, goal = destination
- 8-puzzle: States = board configurations, actions = tile moves, goal = solved state (see the sketch below)
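As an illustration of the 8-puzzle formulation, a sketch of the state encoding and successor function (the tuple representation and action names are assumptions):

```python
# 8-puzzle state as a tuple of 9 entries read row by row; 0 marks the blank.
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)

def successors(state):
    """Yield (action, next_state) pairs obtained by sliding a tile into the blank."""
    i = state.index(0)
    r, c = divmod(i, 3)
    moves = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    for action, (dr, dc) in moves.items():
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            j = nr * 3 + nc
            board = list(state)
            board[i], board[j] = board[j], board[i]   # swap blank with neighboring tile
            yield action, tuple(board)
```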
Components: Players, initial state, actions, transition model, terminal test, utility
Examples:
- Chess: States = board positions, actions = legal moves, utility = win/loss/draw
- Poker: Hidden information, chance elements, betting actions
Components: States, actions, transition probabilities, rewards, discount factor
Examples:
- Grid world: States = positions, actions = movements, rewards = goal/penalty
- Robot navigation: Uncertain movement, sensor noise, multiple objectives
Study Guide prepared for DSCI 6612 - Introduction to Artificial Intelligence, University of New Haven, Fall 2025