Nov 19 2024

(Fast.ai)

(Stanford)

  • AI/ML Lesson 11 - Generalization
  • Overfitting vs Generalization
    • Test sets vs Real-world – how good is our predictor?
  • Controlling the norm - reduce the number of features or shrink the weights (regularization); early stopping (fewer training epochs/iterations) - see the sketch below
    • The real objective is not minimizing the training loss, it’s accuracy on unseen/future examples (generalization)…but we can’t measure that directly, so we settle for minimizing the loss on a held-out test set as a proxy.
    • Balancing act of minimizing the training error while not letting your hypothesis class grow too big - KISS
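A minimal sketch of both knobs, using a toy linear-regression setup of my own (the data, `lam`, learning rate, and patience threshold are illustrative choices, not from the lecture): an L2 penalty keeps the weight norm small, and training stops early once validation loss stops improving.

```python
import numpy as np

# Rough sketch (my own toy example, not the lecture's code): gradient descent on
# linear regression with the two knobs from the notes above - an L2 penalty
# ("controlling the norm") and early stopping on validation loss.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_w = np.zeros(10)
true_w[:3] = [2.0, -1.0, 0.5]                  # only 3 of 10 features actually matter
y = X @ true_w + 0.3 * rng.normal(size=200)

X_tr, y_tr, X_va, y_va = X[:150], y[:150], X[150:], y[150:]

lam, lr = 0.1, 0.01                            # illustrative regularization strength / step size
w = np.zeros(10)
best_va, best_w, patience = np.inf, w.copy(), 0

for epoch in range(500):
    grad = 2 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr) + 2 * lam * w   # MSE gradient + L2 penalty
    w -= lr * grad
    va_loss = np.mean((X_va @ w - y_va) ** 2)
    if va_loss < best_va:                      # remember the best weights seen so far
        best_va, best_w, patience = va_loss, w.copy(), 0
    else:
        patience += 1
        if patience >= 20:                     # early stopping: validation loss stopped improving
            break

print(f"stopped at epoch {epoch}, val MSE {best_va:.3f}, ||w|| {np.linalg.norm(best_w):.3f}")
```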
  • AI/ML Lesson 12 - Best practices
    • Design decisions → hyperparameters: hypothesis class (which model/features), training objective (loss/score function), optimization algorithm (how the loss is minimized), etc.
    • Validation Set / Test Sets
    • Model Development Strategy
      1. Split data into train, val, test (lock the test set away) - see the split sketch after these steps
      2. Look at the data to get intuition over the features and the problem space
      3. (loop)
        • 3a. Implement model architecture or feature extractors, adjust hyper params
        • 3b. Run learning algo
        • 3c. Sanity check train and validation error rates - make sure both are going down (if validation error turns back up, you may be overfitting)
        • 3d. Look at weights (if interpretable) and prediction errors (how’s the model doing? If not well, then how is it screwing up? If well, then does it make intuitive sense?)
      4. When ready, you unlock that test set and get final error rates.
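A small sketch of step 1, with made-up data and fractions (the function name `split_data` and the 70/15/15 split are my own choices, not the course's): shuffle once, carve off the test split, and don't touch it until step 4.

```python
import numpy as np

# Sketch: shuffle once, split into train/val/test, and keep the test split
# "locked away" until the final evaluation.
def split_data(X, y, val_frac=0.15, test_frac=0.15, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_frac)
    n_val = int(len(X) * val_frac)
    test_idx = idx[:n_test]                  # only touched at step 4
    val_idx = idx[n_test:n_test + n_val]     # used inside the development loop (step 3)
    train_idx = idx[n_test + n_val:]
    return (X[train_idx], y[train_idx]), (X[val_idx], y[val_idx]), (X[test_idx], y[test_idx])

X = np.random.normal(size=(1000, 20))
y = (X[:, 0] > 0).astype(int)
train, val, test = split_data(X, y)
print([len(s[0]) for s in (train, val, test)])   # e.g. [700, 150, 150]
```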
    • Tips/ Best Practices
      • Fast iterations - start with a simple model and a small subset of the data; see if you can overfit (drive training error to zero) with just 5 examples, because if you can’t fit anything on five examples, then something is wrong (data too noisy, model not expressive enough, learning algo isn’t working, etc.) - see the sketch after these tips
        • Data hygiene, then separate into training, validation, and test (lock away)
      • Keep track - track training loss and validation loss over time (is it going down?), which hyperparams you used, whether you changed stats on the data (how many features and examples), your model (how many weights, norm of weights), or the predictions (which ones are right and wrong, and why)
        • Organize experiments by folder, for example
      • Report results - run experiments multiple times with different random seeds to make sure results are stable and reliable; report mean and std dev over these seeds (also in the sketch below)
        • Compute multiple metrics (avoid distilling everything down to one number – e.g., test error), in practice we should be interested in multiple metrics. Error rates on subpopulations, any biases we are cognizant of?
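A quick sketch of two of these tips (illustrative only; the synthetic data and the sklearn model are my own choices, not the course's code): first the 5-example overfit sanity check, then re-running the same experiment over several seeds and reporting mean and std dev rather than a single number.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # toy binary labels

# Tip 1: can a simple model drive training error to ~zero on just 5 examples?
# If not, suspect the features, the model, or the learning code.
tiny_idx = np.concatenate([np.where(y == 0)[0][:3], np.where(y == 1)[0][:2]])
tiny_X, tiny_y = X[tiny_idx], y[tiny_idx]
tiny_acc = LogisticRegression().fit(tiny_X, tiny_y).score(tiny_X, tiny_y)
print("train accuracy on 5 examples:", tiny_acc)

# Tip 2: rerun with different random seeds and report mean / std dev, not one number.
scores = []
for seed in range(5):
    X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=seed)
    scores.append(LogisticRegression().fit(X_tr, y_tr).score(X_va, y_va))
print(f"val accuracy: {np.mean(scores):.3f} +/- {np.std(scores):.3f} over {len(scores)} seeds")
```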
  • AI/ML Lesson 12 - K-means
    • Clustering / Unsupervised Learning - discover structure in unlabeled data
      • K-means objective
      • K-means algo - alternate between setting the assignments given the centroids and setting the centroids given the assignments
    • K-Means - suffers from local minima (we want global)
      • Solutions: run the algorithm multiple times from different random initializations and keep the run with the lowest objective (see the K-means sketch at the end of this section)
      • Use a smarter initialization heuristic - K-means++, which picks initial centroids that are spread out: each new centroid is chosen with probability proportional to its squared distance from the nearest already-chosen centroid
    • Good for…
      • Data exploration and discovery
      • Providing representations for downstream supervised learning - cluster assignments or distances to the centroids can serve as useful features for a supervised learning problem
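A rough sketch of the standard K-means (Lloyd's) algorithm with random restarts (my own toy implementation, not the course's code; the 3-blob data and `restarts=10` are arbitrary): each run alternates the two steps above, and the run with the lowest objective wins.

```python
import numpy as np

def kmeans_once(X, k, rng, iters=100):
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # random initialization
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)                          # step 1: assignments given centroids
        new_centroids = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
            for j in range(k)
        ])                                                     # step 2: centroids given assignments
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    loss = np.sum((X - centroids[assign]) ** 2)                # K-means objective
    return centroids, assign, loss

def kmeans(X, k, restarts=10, seed=0):
    # Mitigate bad local minima: restart from several random initializations
    # and keep the run with the lowest objective.
    rng = np.random.default_rng(seed)
    return min((kmeans_once(X, k, rng) for _ in range(restarts)), key=lambda r: r[2])

X = np.vstack([np.random.normal(loc=c, size=(50, 2)) for c in (-3, 0, 3)])
centroids, assign, loss = kmeans(X, k=3)
print("best objective over restarts:", round(loss, 2))
```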