Nov 21 2024

(Stanford)

Search 2 - A* | Stanford CS221: Artificial Intelligence (Autumn 2019)

  • Here is a great video explaining A* Search
  • Can I use algorithm X? What's the cost structure? Is the state graph cyclic? (Could it be broken into backward/forward passes, i.e. kept linear by adding a 'direction' component to the state?)
  • Where do you get the heuristic? A good first step is to relax the problem by removing constraints (knock down walls, allow tiles to overlap, etc.); see the A* sketch after this list.
  • The heuristic needs to be tested for consistency: h(s) ≤ Cost(s, a) + h(Succ(s, a)). This holds because h(s) = FutureCost_rel(s) and future costs are minimum-cost path costs, so taking a particular action a from state s can't beat taking the best action from s (all inside the relaxed search problem P_rel); and relaxing only removes constraints, so Cost_rel(s, a) ≤ Cost(s, a).
  • Tradeoffs:
    • Efficiency → h(s) = FutureCost_rel(s) must be easy to compute [closed form, easier search, independent subproblems]
    • Tightness → heuristic h(s) should be close to FutureCost(s) (don’t remove too many constraints)
    • So the art of designing heuristics is to balance informativeness with computational efficiency.
    • You can use more than one heuristic and just take their max; the max of consistent heuristics is still consistent (you don't have to choose just one)
  • Now turn to “learning” when you have a search problem
    • Going back to the transportation example (walk: s → s+1, tram: s → 2s)
    • But if we don't know the costs, and instead know the optimal path, how can we learn the costs?
    • Learning is the inverse of Search
      • Forward problem (search): Cost(s, a) → (a1, a2, …, ak)
      • Inverse problem (learning): (a1, a2, …, ak) → Cost(s, a)
      • Structured Perceptron (simplified): see slide 108
      • Structured Perceptron (updated by Collins, 2002): see slide 112. A sketch of the simplified version follows this list.
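
To make the relaxation idea concrete, here is a minimal A* sketch on a toy grid (my own example, not from the slides). The heuristic is FutureCost_rel(s) of the relaxed problem with the walls knocked down, which on a 4-connected grid is just Manhattan distance; it is consistent, since each unit-cost move changes the distance by at most 1. Combining several heuristics, per the bullet above, would just mean replacing manhattan with something like max(h1(s), h2(s)).

```python
import heapq

def manhattan(s, goal):
    # FutureCost_rel(s): exact cost-to-go once the walls are removed.
    return abs(s[0] - goal[0]) + abs(s[1] - goal[1])

def astar(start, goal, walls, width, height):
    # Frontier entries are (f, g, state) with f = g + h(state).
    frontier = [(manhattan(start, goal), 0, start)]
    best_g = {start: 0}
    while frontier:
        _, g, s = heapq.heappop(frontier)
        if s == goal:
            return g
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            t = (s[0] + dx, s[1] + dy)
            if (0 <= t[0] < width and 0 <= t[1] < height
                    and t not in walls and g + 1 < best_g.get(t, float("inf"))):
                best_g[t] = g + 1
                heapq.heappush(frontier, (g + 1 + manhattan(t, goal), g + 1, t))
    return None  # goal unreachable

# 4x4 grid with two walls; optimal cost from corner to corner is 6.
print(astar((0, 0), (3, 3), walls={(1, 1), (1, 2)}, width=4, height=4))
```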
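
And a hedged sketch of the simplified structured perceptron on the transportation example. The update rule (lower the costs of actions on the observed optimal path, raise them on the predicted path) follows the slides; the helper code, the variable names, and the clamp that keeps costs nonnegative for the search are my own assumptions.

```python
import heapq

def min_cost_path(start, goal, costs):
    # Forward problem (search): min-cost action sequence from start to goal,
    # via uniform-cost search; walk: s -> s+1, tram: s -> 2s.
    frontier = [(0.0, start, ())]
    seen = set()
    while frontier:
        c, s, path = heapq.heappop(frontier)
        if s == goal:
            return list(path)
        if s in seen or s > goal:
            continue
        seen.add(s)
        heapq.heappush(frontier, (c + costs["walk"], s + 1, path + ("walk",)))
        heapq.heappush(frontier, (c + costs["tram"], 2 * s, path + ("tram",)))
    return []

def learn_costs(examples, epochs=20, eta=1.0):
    # Inverse problem (learning): observed optimal paths -> per-action costs.
    costs = {"walk": 0.0, "tram": 0.0}
    for _ in range(epochs):
        for start, goal, true_path in examples:
            pred = min_cost_path(start, goal, costs)
            if pred != true_path:
                for a in true_path:
                    costs[a] -= eta      # make the observed path cheaper
                for a in pred:
                    costs[a] += eta      # make the wrong prediction dearer
                for a in costs:          # clamp: keep costs nonnegative so
                    costs[a] = max(costs[a], 0.0)  # the search stays valid
    return costs

# One observed optimal route from state 1 to state 8: walk, then tram twice.
print(learn_costs([(1, 8, ["walk", "tram", "tram"])]))
```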

(Fast.ai)

Lesson 7

  • Road to the top (part 3)
  • Gradient accumulation: you don't need a bigger GPU (more memory); accumulate gradients over several small batches before each optimizer step (used here for the ConvNeXt models); see the sketch after this list
  • Batch size? Pick the largest that fits on your GPU and make it a multiple of 8
    • Rule of thumb: if you halve the batch size, halve the learning rate (worked below)
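
A hedged, PyTorch-style sketch of what gradient accumulation does (Lesson 7 uses fastai's GradientAccumulation callback; this explicit loop and its argument names are my own illustration):

```python
def train_epoch(model, loader, opt, loss_fn, accum=4):
    # Effective batch size = accum * the loader's per-batch size.
    model.train()
    opt.zero_grad()
    for i, (xb, yb) in enumerate(loader):
        loss = loss_fn(model(xb), yb) / accum  # scale so the summed gradients
        loss.backward()                        # match one large-batch step
        if (i + 1) % accum == 0:
            opt.step()
            opt.zero_grad()
```

Because the effective batch size is unchanged, the learning rate chosen for the large-batch run can stay as-is.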
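
And the batch-size rule of thumb as arithmetic; base_lr and base_bs here are hypothetical reference values:

```python
base_lr, base_bs = 1e-3, 64   # hypothetical reference run
bs = base_bs // 2             # halve the batch size...
lr = base_lr * bs / base_bs   # ...and the rule halves the lr: 5e-4
print(lr)
```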