Day 14 - Backpropagation (computing gradients), Differentiable Programming, BERT, RNN vs Transformers
Nov 15 2024
(Stanford)
- AI/ML Learning - Backpropagation - computing gradients automatically
- Broke down complex equations into building blocks. Covered forward and backward propagation, showing how backprop computes the gradient of the loss with respect to every weight, so the network can learn the weight for each feature of (almost) any function. (Toy sketch below.)
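- To make the chain-rule idea concrete, here is a toy sketch I put together (not lecture code, all sizes are my own): derive each gradient block by block through a one-hidden-layer network, then sanity-check one of them numerically.

```python
# Toy backprop sketch: hand-apply the chain rule through a tiny network,
# then check one gradient with a finite difference.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)               # input features
y = 1.0                              # target
W1 = rng.normal(size=(4, 3))         # hidden-layer weights
w2 = rng.normal(size=4)              # output weights

def forward(W1, w2):
    h = np.maximum(0.0, W1 @ x)      # ReLU hidden layer
    pred = w2 @ h                    # scalar prediction
    loss = 0.5 * (pred - y) ** 2     # squared-error loss
    return h, pred, loss

h, pred, loss = forward(W1, w2)

# Backward pass: each building block contributes one chain-rule step.
dpred = pred - y                     # dLoss/dpred
dw2 = dpred * h                      # dLoss/dw2
dh = dpred * w2                      # dLoss/dh
dpre = dh * (h > 0)                  # back through the ReLU
dW1 = np.outer(dpre, x)              # dLoss/dW1

# Numerical check on one entry of W1: the two printed values should be close.
eps = 1e-6
W1_bumped = W1.copy()
W1_bumped[0, 0] += eps
numeric = (forward(W1_bumped, w2)[2] - loss) / eps
print(dW1[0, 0], numeric)
```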
- AI/ML Learning - Differentiable programming
- Feedforward NN, Convolutional NN, MaxPool, SequenceRNN, SimpleRNN, LSTM, Attention, AddNorm, TransformerBlock, BERT, Collapsing, GenerateToken, LanguageModel, SequenceToSequence
- Once we have the EmbedTokens, we have a choice to feed them either into RNNs (SequenceRNN, SimpleRNN, LSTM) or into Transformers (which are based on attention - AddNorm, TransformerBlock, BERT)
- We can Collapse the resulting vectors into one vector to do categorization
- We can generate new sequences (GenerateToken, LanguageModel, Seq2Seq) - the two paths are sketched below
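- Rough sketch of how these pieces compose (module names follow the notes; the shapes and the trivial stub implementations are my own guesses):

```python
# Two paths after embedding: Collapse for categorization, GenerateToken for generation.
import numpy as np

rng = np.random.default_rng(1)
vocab_size, d = 100, 8
E = rng.normal(size=(vocab_size, d))          # one embedding vector per token
W_cat = rng.normal(size=(5, d))               # toy classification head

def embed_tokens(token_ids):                  # tokens -> sequence of vectors
    return E[token_ids]                       # shape (seq_len, d)

def sequence_model(X):
    # Stand-in for an RNN/LSTM or a stack of TransformerBlocks (sketched further below).
    return X

def collapse(X):                              # sequence of vectors -> one vector
    return X.mean(axis=0)

def generate_token(x):                        # one vector -> next token id
    return int(np.argmax(E @ x))              # score each candidate y by x . E[y]

ids = np.array([3, 14, 15])
vec = collapse(sequence_model(embed_tokens(ids)))
category = int(np.argmax(W_cat @ vec))        # classification path
next_id = generate_token(vec)                 # generation path
print(category, next_id)
```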
- Sequential RNNs vs Transformers, which use the attention mechanism – attention processes a query y by comparing it to each x~i~, and self-attention processes each x~i~ by comparing it to every other x~j~. Layer normalization and residual connections => AddNorm(f) – applies f to x safely, roughly LayerNorm(x + f(x)), so if f(x) is junk, at least I still have x. Basically, normalizing the feature vectors. (Toy code after the TransformerBlock list below.)
- TransformerBlock
- Attention(x) – allows the vectors to talk to each other
- AddNorm - to normalize all these results
- Feedforward - applied to each individual vector
- And AddNorm that result as well (toy sketch after this list)
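- A compact toy version of these pieces (my simplification: single-head self-attention, no learned projections, no masking), just to make the Attention → AddNorm → Feedforward → AddNorm structure concrete:

```python
# Toy TransformerBlock built from self-attention, AddNorm, and a feedforward layer.
import numpy as np

rng = np.random.default_rng(2)
d = 8

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X):
    # Each vector x_i "talks to" every x_j: weights come from dot-product similarity.
    scores = X @ X.T / np.sqrt(d)            # (seq_len, seq_len)
    return softmax(scores) @ X               # weighted average of the x_j's

def layer_norm(X, eps=1e-5):
    mu = X.mean(axis=-1, keepdims=True)
    sigma = X.std(axis=-1, keepdims=True)
    return (X - mu) / (sigma + eps)

def add_norm(f, X):
    # AddNorm(f): apply f "safely", keeping the residual x even if f(x) is junk.
    return layer_norm(X + f(X))

W1 = rng.normal(size=(d, 4 * d))
W2 = rng.normal(size=(4 * d, d))

def feedforward(X):
    # Applied to each position's vector independently.
    return np.maximum(0.0, X @ W1) @ W2

def transformer_block(X):
    X = add_norm(self_attention, X)          # let the vectors talk to each other
    X = add_norm(feedforward, X)             # then transform each one separately
    return X

X = rng.normal(size=(5, d))                  # a sequence of 5 token vectors
print(transformer_block(X).shape)            # (5, 8)
```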
- BERT - Embeds the tokens (words or word pieces), then applies a TransformerBlock 24 times! The result is a series of vectors that are highly contextualized and nuanced, and contain a lot of rich information about the sentence
- From here you can use Collapse to reduce the vectors into one vector → category
- Or use the vector at a given position to select an answer token out of the candidates (e.g., predicting a masked or next token)
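- Toy sketch of those two downstream uses (the placeholder encoder and all sizes here are my own assumptions, not BERT's actual code):

```python
# Encode with a stack of (placeholder) TransformerBlocks, then either
# Collapse -> category, or score tokens at one position.
import numpy as np

rng = np.random.default_rng(3)
vocab_size, d, num_classes, num_layers = 100, 8, 3, 24
E = rng.normal(size=(vocab_size, d))          # EmbedToken table
W_cls = rng.normal(size=(num_classes, d))     # classification head

def transformer_block(X):
    return X                                  # placeholder; see the real sketch above

def encode(token_ids):
    X = E[token_ids]                          # EmbedToken
    for _ in range(num_layers):               # TransformerBlock x 24
        X = transformer_block(X)
    return X                                  # contextualized vectors

X = encode(np.array([5, 17, 42, 8]))

# 1) Collapse -> one vector -> category.
category = int(np.argmax(W_cls @ X.mean(axis=0)))

# 2) Take the vector at one position and score every token against it,
#    e.g., to fill in a masked/blank position.
position = 2
predicted_token = int(np.argmax(E @ X[position]))
print(category, predicted_token)
```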
- Tokens
- GenerateToken: x[] -> y (e.g., ‘abc’) – generates token y based on the score [ x · EmbedToken(y) ] for each candidate y
- EmbedToken - Looks up and returns the vector for a token: ‘abc’ -> x[]
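- Minimal sketch of these two modules (the tiny vocabulary and the softmax/greedy choice are my assumptions): EmbedToken is a table lookup, and GenerateToken scores every candidate y against x via x · EmbedToken(y):

```python
# EmbedToken: token -> vector; GenerateToken: vector -> token.
import numpy as np

rng = np.random.default_rng(4)
vocab = ["the", "quick", "brown", "fox", "jumps"]
E = rng.normal(size=(len(vocab), 8))          # one row per token

def embed_token(token):                        # token -> vector
    return E[vocab.index(token)]

def generate_token(x):                         # vector -> token
    scores = E @ x                             # score[y] = x . EmbedToken(y)
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                       # softmax over the vocabulary
    return vocab[int(np.argmax(probs))]        # greedy pick (could also sample)

print(generate_token(embed_token("quick")))
```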
- Language Model
- If x is (‘the quick brown’) → what’s the next predicted word? i.e., generate the token at the next position in the sequence…
- LanguageModel(x=’the quick brown’) = GenerateToken(Collapse(SequenceModel(EmbedToken(x))))
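- The same composition written out end to end; the tiny vocabulary, the identity SequenceModel, and the mean-pooling Collapse are placeholders I made up so the expression actually runs:

```python
# LanguageModel(x) = GenerateToken(Collapse(SequenceModel(EmbedToken(x))))
import numpy as np

rng = np.random.default_rng(5)
vocab = ["the", "quick", "brown", "fox", "jumps", "over"]
E = rng.normal(size=(len(vocab), 8))

def embed_token(tokens):                       # EmbedToken over a whole sequence
    return np.stack([E[vocab.index(t)] for t in tokens])

def sequence_model(X):
    # Stand-in for an LSTM or a stack of TransformerBlocks (untrained identity here).
    return X

def collapse(X):
    return X.mean(axis=0)                      # sequence of vectors -> one vector

def generate_token(x):
    return vocab[int(np.argmax(E @ x))]        # pick the highest-scoring next token

def language_model(x):
    return generate_token(collapse(sequence_model(embed_token(x))))

print(language_model(["the", "quick", "brown"]))   # untrained, so the word is arbitrary
```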
- Sequence-to-Sequence models [ la maison bleue ] => [ the blue house ]
- Generate a sequence from another sequence (sentence to translation, document summary, semantic parsing → e.g., sentence to code)
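- A toy encoder-decoder loop in the same spirit (the stub encoder/decoder and greedy decoding are my own simplifications, not a real translation model): encode the source sequence, then generate target tokens one at a time until an end-of-sequence token appears.

```python
# Toy sequence-to-sequence: encode the source, then decode token by token.
import numpy as np

rng = np.random.default_rng(6)
src_vocab = ["la", "maison", "bleue"]
tgt_vocab = ["<eos>", "the", "blue", "house"]
E_src = rng.normal(size=(len(src_vocab), 8))
E_tgt = rng.normal(size=(len(tgt_vocab), 8))

def encode(tokens):                            # source sentence -> one context vector
    return np.stack([E_src[src_vocab.index(t)] for t in tokens]).mean(axis=0)

def decode_step(context, prev_vec):            # untrained stand-in for the decoder
    return context + prev_vec

def translate(src_tokens, max_len=10):
    context = encode(src_tokens)
    prev = np.zeros(8)                         # start-of-sequence state
    out = []
    for _ in range(max_len):
        state = decode_step(context, prev)
        token_id = int(np.argmax(E_tgt @ state))   # GenerateToken-style scoring
        if tgt_vocab[token_id] == "<eos>":
            break
        out.append(tgt_vocab[token_id])
        prev = E_tgt[token_id]                 # feed the chosen token back in
    return out

print(translate(["la", "maison", "bleue"]))    # untrained, so the output is arbitrary
```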