Day 16 - Insights on Random Forests, Generalization, and K-Means
Nov 19 2024
(Fast.ai)
- Finished Questions/solutions review for Chapter 9 (From Lesson 5)
- Lesson 6 - How Random Forests Really Work (video)
- More trees are better, but with diminishing returns…Jeremy doesn’t go past 100 as a rule of thumb (see the sketch below)
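To see the diminishing returns concretely, here’s a quick sketch of my own using scikit-learn on synthetic data (not Jeremy’s code; the dataset and sizes are made up):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic classification data, just to watch accuracy vs. number of trees
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

for n in [1, 5, 10, 25, 50, 100, 200]:
    rf = RandomForestClassifier(n_estimators=n, random_state=0).fit(X_train, y_train)
    print(f"{n:>3} trees -> val accuracy: {rf.score(X_val, y_val):.3f}")

# Accuracy typically climbs fast and then flattens well before 200 trees,
# which matches the "no need to go past ~100" rule of thumb.
```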
- Road to the Top (part 1), how Jeremy got to the top of the leaderboard for a computer vision competition
**Important NOTE:**
- Imaging world → columns by rows (width × height)
- Tensor/array world → rows by columns (height × width)
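A tiny sketch of my own (assumes Pillow and NumPy) showing the two conventions side by side:

```python
from PIL import Image
import numpy as np

img = Image.new("RGB", (640, 480))  # PIL takes (width, height): columns by rows
print(img.size)                     # (640, 480) -> columns by rows

arr = np.array(img)
print(arr.shape)                    # (480, 640, 3) -> rows by columns (by channels)
```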
**Important NOTE:**
- We do image augmentation, randomly perturbing each image every time it’s seen during training, to keep the model from memorizing the images (i.e., to help it generalize). learn.tta() re-uses those same augmentations at inference time, averaging predictions over several augmented copies - test-time augmentation is not the same thing as augmenting during training.
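A minimal fastai sketch of the difference (the dataset path and training schedule are placeholders, not from the lesson):

```python
from fastai.vision.all import *

# Training-time augmentation: each batch gets random flips/rotations/zooms, so
# the model never sees exactly the same image twice.
dls = ImageDataLoaders.from_folder(
    Path("data/images"),            # hypothetical image folder
    valid_pct=0.2,
    item_tfms=Resize(224),
    batch_tfms=aug_transforms(),
)
learn = vision_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(3)

# Test-time augmentation: learn.tta() applies those same augmentations at
# inference and averages the predictions over the augmented copies.
preds, targs = learn.tta()
```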
(Stanford)
- AI/ML Lesson 11 - Generalization
- Overfitting vs Generalization
- Test sets vs Real-world – how good is our predictor?
- Controlling the norm - reduce the number of features or keep the weights small (regularization); also early stopping (fewer epochs) - see the sketch after these notes
- The real objective is not minimizing the training loss; it’s accuracy at inference time (minimizing the loss on unseen/future examples). We can’t really measure that, so we settle for minimizing the test loss.
- Balancing act of minimizing the training error while not letting your hypothesis class grow too big - KISS
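A small sketch of my own for the “controlling the norm” idea (scikit-learn’s Ridge on synthetic data): a bigger L2 penalty shrinks the weight norm, i.e., a smaller effective hypothesis class.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=100)

for alpha in [0.01, 1.0, 10.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    print(f"alpha={alpha:>6}: ||w|| = {np.linalg.norm(model.coef_):.3f}")

# The norm of the weights drops as alpha grows, trading training fit for generalization.
```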
- AI/ML Lesson 12 - Best practices
- Design decisions → hyperparameters: hypothesis class (algo), training objective (loss/score), optimization algo (minimize loss), etc.
- Validation Set / Test Sets
- Model Development Strategy
- 1. Split data into train, val, and test sets (lock the test set away)
- 2. Look at the data to get intuition about the features and the problem space
- 3. (loop)
- 3a. Implement model architecture or feature extractors, adjust hyper params
- 3b. Run learning algo
- 3c. Sanity-check train and validation error rates - make sure the errors are going down (if they start going back up, are you overfitting?)
- 3d. Look at weights (if interpretable) and prediction errors (how’s the model doing? If not well, then how is it screwing up? If well, then does it make intuitive sense?)
- When ready, you unlock that test set and get final error rates.
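One way to implement step 1 (a sketch with scikit-learn; the 60/20/20 fractions are my choice, not from the lecture):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# 60% train, 20% validation, 20% test -- the test set stays locked away until the end
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```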
- Tips / Best Practices
- Fast iterations - start with a simple model and small subsets of data, and see if you can overfit (drive training error to zero) with just 5 examples – because if you can’t fit anything on five examples, then your model is wrong (data too noisy, model not expressive enough, learning algo isn’t working, etc.); see the sketch after this list
- Data hygiene, then separate into training, validation, and test (lock away)
- Keep track - track training loss and validation loss over time: are they going down? Record which hyperparameters you used, whether you changed stats on the data (how many features and examples), your model (how many weights, norm of the weights), or predictions (which ones are right and wrong, and why)
- Organize experiments by folder, for example
- Report results - run experiments multiple times with different random seeds to make sure results are stable and reliable, and report the mean and std dev over those seeds (sketch below)
- Compute multiple metrics (avoid distilling everything down to one number, e.g., test error); in practice we should care about several: error rates on subpopulations, any biases we’re cognizant of, etc.
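Here’s the overfit-five-examples sanity check from the first tip, sketched in PyTorch with a toy model of my own:

```python
import torch
import torch.nn as nn

X = torch.randn(5, 10)                    # just five random "examples"
y = torch.randint(0, 2, (5,))

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for step in range(500):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

# If this doesn't approach zero, something in the pipeline is broken.
print("final training loss:", loss.item())
```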
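And a sketch of the reporting tip: run the same experiment under several random seeds and report the mean and standard deviation (scikit-learn, synthetic data; the model choice is arbitrary):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

scores = []
for seed in range(5):
    # Re-split and re-train with a different seed each time
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
    model = RandomForestClassifier(random_state=seed).fit(X_tr, y_tr)
    scores.append(model.score(X_te, y_te))

print(f"accuracy: {np.mean(scores):.3f} +/- {np.std(scores):.3f} over {len(scores)} seeds")
```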
- AI/ML Lesson 12 - K-means
- Clustering / Unsupervised Learning - discover structure in unlabeled data
- K-means objective
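Writing the objective out (from memory, in the course’s notation: assignments z_i, centroids mu_k, features phi(x_i)):

```latex
\text{Loss}_{\text{kmeans}}(z, \mu) = \sum_{i=1}^{n} \left\lVert \phi(x_i) - \mu_{z_i} \right\rVert^2
```

Both the assignments z and the centroids mu are chosen to minimize this.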
- K-means algo - alternate between setting the assignments given the centroids, and setting the centroids given the assignments (see the NumPy sketch below)
- K-Means suffers from local minima (we want the global minimum)
- Solutions: run it multiple times from different random initializations and take the best result
- Use a smarter initialization heuristic - K-means++, which picks each new starting centroid far from the centroids already chosen, so the initial points are spread out
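A from-scratch sketch of the two alternating steps (NumPy, my own toy version; plain random init rather than K-means++):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(n_iters):
        # Step 1: set assignments given centroids (each point -> closest centroid)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        z = dists.argmin(axis=1)
        # Step 2: set centroids given assignments (mean of each cluster;
        # keep the old centroid if a cluster comes up empty)
        centroids = np.array([X[z == j].mean(axis=0) if np.any(z == j) else centroids[j]
                              for j in range(k)])
    return z, centroids

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, size=(50, 2)) for c in (0, 5, 10)])
z, centroids = kmeans(X, k=3)
print(centroids)  # should land near (0,0), (5,5), (10,10)
```

Because the init is random, different seeds can land in different local minima - which is exactly what the restarts and K-means++ tricks above address.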
- Good for…
- Data exploration and discovery
- Providing representations to downstream supervised learning - gives useful features to provide to supervised learning problems.
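For completeness, the library version (scikit-learn) bundles both fixes - k-means++ init plus multiple restarts - and .transform() gives centroid-distance features you could feed to a downstream supervised model:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, size=(50, 2)) for c in (0, 5, 10)])

# k-means++ init plus 10 random restarts, keeping the best run
km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)

# Distances to each centroid -- usable as features for supervised learning
features = km.transform(X)
```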