Dec 9 2024

  • Continuing Kaggle/Google day 1 paper - Foundational LLM and Text Generation
  • Prompt Engineering
    • Try using verbs that describe action in your prompts: Act, Analyze, Categorize, Classify, Compare, Contrast, Create, Define, Describe, Evaluate, Extract, Find, Generate, Identify, List, Measure, Organize, Parse, Pick, Predict, Provide, Rank, Recommend, Return, Retrieve, Rewrite, Select, Show, Sort, Summarize, Translate, Write.
    • Types of Prompts
      • System / Role / Contextual prompting - tell the model who it is, what its goal is, and what style to answer in
      • Step-back prompting - first have the model consider the broader question or relevant background, then answer the specific question using that as context
      • Chain of thought - ask the model to think through each step rather than jumping straight to the answer
      • https://github.com/GoogleCloudPlatform/generative-ai/blob/main/language/prompts/examples/chain_of_thought_react.ipynb
      • Self-consistency - run the same prompt several times (e.g., 3x) and go with the most frequent answer; see the sketch after this list
      • Tree of thoughts - like self-consistency, but branches the reasoning itself, so multiple paths can be explored and pruned rather than just voting on final answers
      • ReAct - interleaves reasoning with actions, letting the model call external tools and react to their results
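
      • A minimal sketch of self-consistency in Python (ask_model is a hypothetical stand-in for a real LLM API call sampled at temperature > 0, so repeated samples can disagree):

```python
import random
from collections import Counter

def ask_model(prompt: str) -> str:
    # Toy stand-in for an LLM call; in practice this hits your model's API
    # with temperature > 0 so repeated samples can differ.
    return random.choice(["42", "42", "41"])

def self_consistency(prompt: str, samples: int = 3) -> str:
    """Sample the same prompt several times and majority-vote the answers."""
    answers = [ask_model(prompt) for _ in range(samples)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("Think step by step: what is 6 * 7?"))
```
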
    • Also try to be clear with your instructions by providing both a “DO” and a “DO NOT” instruction set.
    • Keep track of your prompts and test them each time, recording:
      • Name/version
      • Goal
      • Model
      • Temperature/Top-K, Top-P
      • Token Limit
      • Prompt
      • Output
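
    • One way to capture those fields is a small dataclass; a sketch with hypothetical values (the model name and numbers are just examples, not recommendations):

```python
from dataclasses import dataclass

@dataclass
class PromptRecord:
    """One row in a prompt log, mirroring the fields above."""
    name: str          # name/version, e.g. "faq-answerer-v1"
    goal: str
    model: str
    temperature: float
    top_k: int
    top_p: float
    token_limit: int
    prompt: str
    output: str = ""   # filled in after the run

record = PromptRecord(
    name="faq-answerer-v1",
    goal="Answer product FAQs in two sentences",
    model="gemini-pro",
    temperature=0.2,
    top_k=40,
    top_p=0.95,
    token_limit=256,
    prompt="Summarize our return policy.",
)
print(record)
```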

Ways to add new data to your LLM

  • Discussion with ChatGPT on different options for incorporating your own knowledge into an LLM, summarized in this table:

    | Method | Best For | Pros | Cons |
    | --- | --- | --- | --- |
    | Soft Prompting | Small FAQ sets, low-cost setups | No training required; flexible and easy to update | Limited by token size; FAQs must be in every prompt |
    | RAG (Retrieval-Augmented Generation) | Large or dynamic FAQ datasets | Scalable; FAQs dynamically retrieved; no fine-tuning required | Complex setup; dependent on retrieval quality |
    | Fine-Tuning | Highly repetitive or static FAQs | Efficient inference; consistent answers; tailored behavior | High initial cost; requires re-training for updates |
    | Middleware System | Handling simple FAQ queries | Fast responses; hybrid with GPT-4 | Dual-system complexity; threshold tuning |
    | Tool-Augmented Models | Structured FAQ storage | Accurate; real-time updates | Development overhead |
    | Knowledge Distillation | Reducing API costs | Efficient for repetitive FAQs; cost-effective inference | Training and maintenance overhead |
    | Student/Teacher | Replicating GPT-4 quality on FAQs at scale | High-quality outputs; cost-effective once trained; reduces dependence on GPT-4 | Training requires GPT-4 API calls; retraining for FAQ updates |
    | Adaptive Prompt Chaining | Ambiguous or multi-part queries | Improves query understanding; interactive experience | Slower interactions; may require multiple exchanges |
    | Knowledge Graph | Complex, interrelated FAQs | Rich context; supports reasoning | Complex to build and maintain |
    | Memory-Based Agents | Session-specific FAQ refinement | Personalized experience; avoids repetition | Requires session infrastructure |
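
  • A minimal sketch of the RAG row above in pure Python, assuming word-overlap scoring as a stand-in for real embedding search (production systems use a vector store and embedding model):

```python
def score(query: str, doc: str) -> int:
    """Toy relevance score: count shared words. Real RAG systems use
    embedding similarity against a vector store instead."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, faqs: list[str], k: int = 1) -> list[str]:
    """Return the k FAQ entries most relevant to the query."""
    return sorted(faqs, key=lambda d: score(query, d), reverse=True)[:k]

faqs = [
    "Returns: items can be returned within 30 days with a receipt.",
    "Shipping: standard shipping takes 5-7 business days.",
    "Warranty: all products carry a one-year limited warranty.",
]

query = "How long do I have to return an item?"
context = "\n".join(retrieve(query, faqs))

# Only the retrieved context is injected into the prompt, so the FAQ set
# can grow without bloating every call (unlike soft prompting above).
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```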

Post-Hoc Grounding

  • Post-hoc grounding refers to the process of aligning model outputs with external, factual, or contextually relevant knowledge after the initial response is generated. This ensures the response is consistent, accurate, or relevant to specific requirements.

    • For example:
      1. A model generates a generic response (e.g., “The capital of France is Paris”).
      2. Post-hoc grounding checks this response against a knowledge base or external source for accuracy.
      3. The output is corrected or refined if necessary before delivery to the user.
    • This technique is common in tasks requiring high factual accuracy, like customer support or scientific reasoning.
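    • A minimal sketch of that loop, assuming a hypothetical generate() stub and a dict standing in for the knowledge base; a real system would verify with a retriever or fact-checking model rather than exact-match lookup:

```python
# Toy knowledge base; a real system would query a database or retriever.
FACTS = {"capital of France": "Paris", "capital of Japan": "Tokyo"}

def generate(question: str) -> str:
    """Hypothetical stand-in for the model's initial, ungrounded answer."""
    return "Lyon"  # deliberately wrong, to exercise the correction path

def ground(question: str, answer: str) -> str:
    """Post-hoc grounding: after generation, check the answer against
    the knowledge base and correct it before delivery to the user."""
    for topic, fact in FACTS.items():
        if topic.lower() in question.lower() and fact.lower() not in answer.lower():
            return fact  # replace the ungrounded answer with the known fact
    return answer

question = "What is the capital of France?"
print(ground(question, generate(question)))  # -> Paris
```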