Day 14 - Backpropagation (computing gradients), Differentiable Programming, BERT, RNN vs Transformers
Nov 15 2024
(Stanford)
- AI/ML Learning - Backpropagation - computing gradients automatically
- Broke down complex equations into building blocks. Covered forward and backward propagation, showing how backprop computes the gradient of the loss with respect to every weight, so the network can learn the weight for each feature of (almost) any function. (Toy sketch below.)
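- To make the chain-rule idea concrete, here is a toy sketch I put together (not lecture code, all sizes are my own): derive each gradient block by block through a one-hidden-layer network, then sanity-check one of them numerically.

```python
# Toy backprop sketch: hand-apply the chain rule through a tiny network,
# then check one gradient with a finite difference.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)               # input features
y = 1.0                              # target
W1 = rng.normal(size=(4, 3))         # hidden-layer weights
w2 = rng.normal(size=4)              # output weights

def forward(W1, w2):
    h = np.maximum(0.0, W1 @ x)      # ReLU hidden layer
    pred = w2 @ h                    # scalar prediction
    loss = 0.5 * (pred - y) ** 2     # squared-error loss
    return h, pred, loss

h, pred, loss = forward(W1, w2)

# Backward pass: each building block contributes one chain-rule step.
dpred = pred - y                     # dLoss/dpred
dw2 = dpred * h                      # dLoss/dw2
dh = dpred * w2                      # dLoss/dh
dpre = dh * (h > 0)                  # back through the ReLU
dW1 = np.outer(dpre, x)              # dLoss/dW1

# Numerical check on one entry of W1: the two printed values should be close.
eps = 1e-6
W1_bumped = W1.copy()
W1_bumped[0, 0] += eps
numeric = (forward(W1_bumped, w2)[2] - loss) / eps
print(dW1[0, 0], numeric)
```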
- AI/ML Learning - Differentiable programming
- Feedforward NN, Convolutional NN, MaxPool, SequenceRNN, SimpleRNN, LSTM, Attention, AddNorm, TransformerBlock, BERT, Collapsing, GenerateToken, LanguageModel, SequenceToSequence
- Once we have the EmbedTokens, we have a choice to feed them either into RNNs (SequenceRNN, SimpleRNN, LSTM) or into Transformers (which are based on attention - AddNorm, TransformerBlock, BERT)
- We can Collapse the resulting vectors into one vector to do categorization
- We can generate new sequences (GenerateToken, LanguageModel, Seq2Seq) - the two paths are sketched below
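- Rough sketch of how these pieces compose (module names follow the notes; the shapes and the trivial stub implementations are my own guesses):

```python
# Two paths after embedding: Collapse for categorization, GenerateToken for generation.
import numpy as np

rng = np.random.default_rng(1)
vocab_size, d = 100, 8
E = rng.normal(size=(vocab_size, d))          # one embedding vector per token
W_cat = rng.normal(size=(5, d))               # toy classification head

def embed_tokens(token_ids):                  # tokens -> sequence of vectors
    return E[token_ids]                       # shape (seq_len, d)

def sequence_model(X):
    # Stand-in for an RNN/LSTM or a stack of TransformerBlocks (sketched further below).
    return X

def collapse(X):                              # sequence of vectors -> one vector
    return X.mean(axis=0)

def generate_token(x):                        # one vector -> next token id
    return int(np.argmax(E @ x))              # score each candidate y by x . E[y]

ids = np.array([3, 14, 15])
vec = collapse(sequence_model(embed_tokens(ids)))
category = int(np.argmax(W_cat @ vec))        # classification path
next_id = generate_token(vec)                 # generation path
print(category, next_id)
```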
- Sequential RNNs vs Transformers, which use the attention mechanism – attention processes a query y by comparing it to each x~i~, and self-attention processes each x~i~ by comparing it to every other x~j~. Layer normalization and residual connections => AddNorm(f) – applies f to x safely, roughly LayerNorm(x + f(x)), so if f(x) is junk, at least I still have x. Basically, normalizing the feature vectors. (Toy code after the TransformerBlock list below.)
- TransformerBlock
- Attention(x) – allows the vectors to talk to each other
- AddNorm - to normalize all these results
- Feedforward - applied to each individual vector
- And AddNorm that result as well (toy sketch after this list)
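- A compact toy version of these pieces (my simplification: single-head self-attention, no learned projections, no masking), just to make the Attention → AddNorm → Feedforward → AddNorm structure concrete:

```python
# Toy TransformerBlock built from self-attention, AddNorm, and a feedforward layer.
import numpy as np

rng = np.random.default_rng(2)
d = 8

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X):
    # Each vector x_i "talks to" every x_j: weights come from dot-product similarity.
    scores = X @ X.T / np.sqrt(d)            # (seq_len, seq_len)
    return softmax(scores) @ X               # weighted average of the x_j's

def layer_norm(X, eps=1e-5):
    mu = X.mean(axis=-1, keepdims=True)
    sigma = X.std(axis=-1, keepdims=True)
    return (X - mu) / (sigma + eps)

def add_norm(f, X):
    # AddNorm(f): apply f "safely", keeping the residual x even if f(x) is junk.
    return layer_norm(X + f(X))

W1 = rng.normal(size=(d, 4 * d))
W2 = rng.normal(size=(4 * d, d))

def feedforward(X):
    # Applied to each position's vector independently.
    return np.maximum(0.0, X @ W1) @ W2

def transformer_block(X):
    X = add_norm(self_attention, X)          # let the vectors talk to each other
    X = add_norm(feedforward, X)             # then transform each one separately
    return X

X = rng.normal(size=(5, d))                  # a sequence of 5 token vectors
print(transformer_block(X).shape)            # (5, 8)
```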
- BERT - Embeds the tokens (words or word pieces), then applies a TransformerBlock 24 times! The result is a series of vectors that are highly contextualized and nuanced, and contain a lot of rich information about the sentence
- From here you can use Collapse to reduce the vectors into one vector → category
- Or use the vector at a given position to select an answer token out of the candidates (e.g., predicting a masked or next token)
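- Toy sketch of those two downstream uses (the placeholder encoder and all sizes here are my own assumptions, not BERT's actual code):

```python
# Encode with a stack of (placeholder) TransformerBlocks, then either
# Collapse -> category, or score tokens at one position.
import numpy as np

rng = np.random.default_rng(3)
vocab_size, d, num_classes, num_layers = 100, 8, 3, 24
E = rng.normal(size=(vocab_size, d))          # EmbedToken table
W_cls = rng.normal(size=(num_classes, d))     # classification head

def transformer_block(X):
    return X                                  # placeholder; see the real sketch above

def encode(token_ids):
    X = E[token_ids]                          # EmbedToken
    for _ in range(num_layers):               # TransformerBlock x 24
        X = transformer_block(X)
    return X                                  # contextualized vectors

X = encode(np.array([5, 17, 42, 8]))

# 1) Collapse -> one vector -> category.
category = int(np.argmax(W_cls @ X.mean(axis=0)))

# 2) Take the vector at one position and score every token against it,
#    e.g., to fill in a masked/blank position.
position = 2
predicted_token = int(np.argmax(E @ X[position]))
print(category, predicted_token)
```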
- Tokens
- GenerateToken: x[] -> y (e.g., ‘abc’) – generates token y based on the score [ x · EmbedToken(y) ] for each candidate y
- EmbedToken - Looks up and returns the vector for a token: ‘abc’ -> x[]
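- Minimal sketch of these two modules (the tiny vocabulary and the softmax/greedy choice are my assumptions): EmbedToken is a table lookup, and GenerateToken scores every candidate y against x via x · EmbedToken(y):

```python
# EmbedToken: token -> vector; GenerateToken: vector -> token.
import numpy as np

rng = np.random.default_rng(4)
vocab = ["the", "quick", "brown", "fox", "jumps"]
E = rng.normal(size=(len(vocab), 8))          # one row per token

def embed_token(token):                        # token -> vector
    return E[vocab.index(token)]

def generate_token(x):                         # vector -> token
    scores = E @ x                             # score[y] = x . EmbedToken(y)
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                       # softmax over the vocabulary
    return vocab[int(np.argmax(probs))]        # greedy pick (could also sample)

print(generate_token(embed_token("quick")))
```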
- Language Model
- If x is (‘the quick brown’) → what’s the next predicted word? i.e., generate the token at the next position in the sequence…
- LanguageModel(x=’the quick brown’) = GenerateToken(Collapse(SequenceModel(EmbedToken(x))))
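- The same composition written out end to end; the tiny vocabulary, the identity SequenceModel, and the mean-pooling Collapse are placeholders I made up so the expression actually runs:

```python
# LanguageModel(x) = GenerateToken(Collapse(SequenceModel(EmbedToken(x))))
import numpy as np

rng = np.random.default_rng(5)
vocab = ["the", "quick", "brown", "fox", "jumps", "over"]
E = rng.normal(size=(len(vocab), 8))

def embed_token(tokens):                       # EmbedToken over a whole sequence
    return np.stack([E[vocab.index(t)] for t in tokens])

def sequence_model(X):
    # Stand-in for an LSTM or a stack of TransformerBlocks (untrained identity here).
    return X

def collapse(X):
    return X.mean(axis=0)                      # sequence of vectors -> one vector

def generate_token(x):
    return vocab[int(np.argmax(E @ x))]        # pick the highest-scoring next token

def language_model(x):
    return generate_token(collapse(sequence_model(embed_token(x))))

print(language_model(["the", "quick", "brown"]))   # untrained, so the word is arbitrary
```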
- Sequence-to-Sequence models [ la maison bleue ] => [ the blue house ]
- Generate a sequence from another sequence (sentence to translation, document summary, semantic parsing → e.g., sentence to code)
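- A toy encoder-decoder loop in the same spirit (the stub encoder/decoder and greedy decoding are my own simplifications, not a real translation model): encode the source sequence, then generate target tokens one at a time until an end-of-sequence token appears.

```python
# Toy sequence-to-sequence: encode the source, then decode token by token.
import numpy as np

rng = np.random.default_rng(6)
src_vocab = ["la", "maison", "bleue"]
tgt_vocab = ["<eos>", "the", "blue", "house"]
E_src = rng.normal(size=(len(src_vocab), 8))
E_tgt = rng.normal(size=(len(tgt_vocab), 8))

def encode(tokens):                            # source sentence -> one context vector
    return np.stack([E_src[src_vocab.index(t)] for t in tokens]).mean(axis=0)

def decode_step(context, prev_vec):            # untrained stand-in for the decoder
    return context + prev_vec

def translate(src_tokens, max_len=10):
    context = encode(src_tokens)
    prev = np.zeros(8)                         # start-of-sequence state
    out = []
    for _ in range(max_len):
        state = decode_step(context, prev)
        token_id = int(np.argmax(E_tgt @ state))   # GenerateToken-style scoring
        if tgt_vocab[token_id] == "<eos>":
            break
        out.append(tgt_vocab[token_id])
        prev = E_tgt[token_id]                 # feed the chosen token back in
    return out

print(translate(["la", "maison", "bleue"]))    # untrained, so the output is arbitrary
```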