Dec 2 2024

Over the Thanksgiving break, I tried out DeepLearning.AI, which was recommended to me by Sang V. early in my journey. Feeling like I've now established a strong enough baseline with Fast.ai and Stanford, I tried out a few of the videos. I really liked their format…and it was important for me to get a set of foundational knowledge from Fast.ai and Stanford before really being able to understand these courses. These courses feel like they 'filled in the gaps' left by the foundational courses, both of which were made before things like ChatGPT took off.

(DeepLearning.AI)

  • Recap of the last three videos I watched over the Thanksgiving break
  • Finetuning Large Language Models
    • Great analogy for prompt engineering vs. finetuning: a primary care practitioner (generalist) vs. a cardiologist (specialist)
Category | Prompting | Finetuning
Providing data | (PRO) Not much data needed to get started | (CON) Needs a lot more data (at least ~100 examples recommended), and it should be high-quality and ideally labeled
Fitting data | (CON) Much less data fits in context, and the model can forget it | (PRO) Nearly unlimited data fits
Results | (CON) Hallucinates; hard to teach it new data | (PRO) Learns new information, is more accurate, hallucinates less, and can correct incorrect information
Injecting data | (PRO) Can connect data through retrieval (RAG); (CON) RAG can miss or fetch incorrect data | (PRO) Can still use RAG after finetuning, but finetuning on your data tends to give better results
Costs | (PRO) Smaller up-front costs; (CON) higher token costs later if you hit the model often | (CON) Higher up-front training costs; (PRO) lower costs later when hitting the model at high frequency (fewer tokens needed)
Knowledge | (PRO) Quick to get started right away | (CON) Needs more knowledge and experience to train/finetune an existing LLM
Time | (PRO) Can get something up in days | (CON) Will likely take a few weeks or more
Generally | Good for side projects and fast prototyping, but somewhat generic results | Domain-specific, enterprise, production usage (speed/performance/low latency), more consistent responses, keeps data private
  • Finetuning, Data preparation, Training Process, Evaluation and Iteration

  • Practical Approach to finetuning
    1. Figure out the overall objective/task for your model…start with reduced task complexity
    2. Collect data related to the task's inputs and outputs
      • Generate data if you don’t have enough
    3. Finetune on a small model to start so that you can iterate faster (and cheaper) (see the sketch after this list)
    4. Vary the amount of data you give the model
    5. Evaluate your LLM to see what’s going well and what needs improvement
    6. Show it off to stakeholders, and get buy-in to collect and feed it more data for better results
    7. Add task complexity after getting good results
    8. Try better-performing models (typically larger) to see what improvements you get
      • Bigger models are better at harder tasks like 'writing' (chatting, drafting emails, writing code), or when the agent needs to do several things at once or in one step
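To make steps 2, 3, and 5 concrete, here is a minimal finetuning sketch of my own (not from the course). It assumes the Hugging Face transformers and datasets libraries; the model name and the toy Q/A pairs are illustrative placeholders for a real small model and real task data.

```python
# Minimal causal-LM finetuning sketch (assumes transformers + datasets installed).
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "EleutherAI/pythia-70m"  # placeholder: a deliberately small model for fast, cheap iteration
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Step 2: toy input/output pairs standing in for your collected task data.
pairs = [
    {"q": "What is finetuning?", "a": "Adapting a pretrained model to your task."},
    {"q": "Why start small?", "a": "Small models let you iterate faster and cheaper."},
]
texts = [f"Q: {p['q']}\nA: {p['a']}" for p in pairs]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128), batched=True
)

# Step 3: a short training run; vary the data amount (step 4) and re-evaluate (step 5).
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # labels = shifted input_ids
)
trainer.train()
```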
  • GPU memory needs when training (scales with # of tokens) vs. inference (scales with # of params)
    • PEFT: Parameter-Efficient Finetuning
    • Lamini seems to prefer LoRA (low-rank adaptation): it freezes the pre-trained model and trains only small low-rank matrices, whose update is applied on top of the pre-trained weights during inference (sketched below)
    • I would not have been able to understand the above discussion without my previous learnings from Stanford and Fast.ai
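Here is my own minimal sketch of the LoRA idea (not Lamini's implementation): freeze the pretrained weight matrix W and learn a small low-rank update BA beside it, so the layer effectively computes with W + (alpha/r)·BA. The dimensions and hyperparameters below are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear and adds a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # freeze the pretrained weights
            p.requires_grad = False
        # Only these low-rank factors are trained: rank * (in + out) parameters.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen path plus the learned low-rank correction, applied at inference too.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 12,288 trainable vs. 590,592 frozen: the PEFT win
```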
  • Building Systems with the ChatGPT API - example of an end-to-end customer service assistant:
    • System, user, and assistant roles and messages in the prompt
    • Classify the inquiry into a primary category (the type of query) and a secondary category, which determines the instructions to use
    • Chain-of-Thought reasoning - asking the model to reason about its answer before coming to a conclusion, and specifying the format to use when answering
    • Chaining prompts - categorize the question, then product selection, product lookup, and the response to the user (see the sketch after this list)
    • Context Limitations, Max Tokens, and Reducing Costs (fewer tokens)
    • Moderation - screen the input to make sure it is not malicious
    • Evaluation / prompt tuning - evaluate the responses: tune the prompt on a handful of examples, add some tricky ones, develop metrics to measure performance (BLEU, rubrics, and other LLM calls that reason about the answer), and collect data to tune prompts (not really train) and validate, plus a test set
    • Note: this course uses prompt engineering (not finetuning) - adding system messages to provide instructions for better performance

    • How to tune on just a few prompts, using rubrics to grade the responses
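To tie a few of those pieces together, here is a hedged sketch of the moderation-then-classification steps (my own reconstruction, not the course notebook). It assumes the openai>=1.0 Python SDK with OPENAI_API_KEY set; the model name, categories, and example query are placeholders.

```python
from openai import OpenAI

client = OpenAI()
user_input = "My SmartX Pro screen won't turn on. Can I get a refund?"

# Moderation: screen the input before spending tokens on it.
if client.moderations.create(input=user_input).results[0].flagged:
    raise ValueError("Input rejected by moderation")

# Classification: a system message carries the instructions; JSON output
# makes the result easy to route to the next prompt in the chain.
system_message = (
    "Classify the customer query. Respond as JSON with keys 'primary' "
    "(Billing, Technical Support, Account, or General) and 'secondary' "
    "(a short free-text subcategory)."
)
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; the course used a GPT-3.5 model
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_input},
    ],
)
print(response.choices[0].message.content)
# The 'primary' category would then select the next prompt in the chain:
# product selection, product lookup, and finally the response to the user.
```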
  • Getting Started with Mistral
    • Mistral has a number of models (open source vs. commercial, in varying sizes) tuned for business use cases and finetuning. Mistral also provides a framework, and this course showed how to use it for activities such as calling user-defined functions (calling out to your database or APIs); it also covered embeddings and RAG (a nice intro to this topic). A minimal sketch follows.
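A minimal RAG sketch in the spirit of that course, assuming the mistralai v1 Python SDK; the client calls, model names, and toy documents below are my assumptions rather than the course's exact code.

```python
import numpy as np
from mistralai import Mistral

client = Mistral(api_key="YOUR_KEY_HERE")
docs = [
    "Our return window is 30 days from delivery.",
    "The SmartX Pro ships with a 2-year warranty.",
]

def embed(texts):
    # mistral-embed turns each text into a vector (assumed v1 SDK call).
    resp = client.embeddings.create(model="mistral-embed", inputs=texts)
    return np.array([d.embedding for d in resp.data])

doc_vecs = embed(docs)
query = "How long do I have to return an item?"
q_vec = embed([query])[0]

# Retrieve the closest chunk by cosine similarity, then ground the answer on it.
sims = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
context = docs[int(np.argmax(sims))]

chat = client.chat.complete(
    model="mistral-small-latest",  # placeholder model name
    messages=[{"role": "user",
               "content": f"Context: {context}\n\nAnswer using only the context: {query}"}],
)
print(chat.choices[0].message.content)
```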
  • Speaking of user-defined functions, here's another video that talks about how you can use these functions to (sketch after this list):
    • More reliably connect GPT’s capabilities with external tools and APIs
    • Improve reliability and reduce hallucination
    • Structure outputs (ensure responses are in JSON)
      • And convert unstructured data into structured data
    • Convert Natural Language into API calls or database queries
    • Build complex conversational agents
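Here is a hedged function-calling sketch using the openai>=1.0 SDK's tools parameter; the get_order_status schema is a hypothetical example I made up, but it shows the pattern the video describes: the model emits structured JSON arguments that you validate and route to your own database or API.

```python
import json
from openai import OpenAI

client = OpenAI()

# A hypothetical user-defined function, described to the model as a JSON schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the status of a customer order",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Where is order 12345?"}],
    tools=tools,
)

# Instead of free text, the model returns the function name plus JSON arguments:
# natural language converted into an API call.
call = response.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)
print(call.function.name, args)  # e.g. get_order_status {'order_id': '12345'}
```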