Dec 9 2024

  • Continuing Kaggle/Google day 1 paper - Foundational LLM and Text Generation
  • Prompt Engineering
    • Try using verbs that describe action in your prompts: Act, Analyze, Categorize, Classify, Compare, Contrast, Create, Define, Describe, Evaluate, Extract, Find, Generate, Identify, List, Measure, Organize, Parse, Pick, Predict, Provide, Rank, Recommend, Return, Retrieve, Rewrite, Select, Show, Sort, Summarize, Translate, Write.
    • Types of Prompts
      • System / Role / Contextual prompting - tell the model who it is, what its goal is, and what style to answer in
      • Step-back prompting - first have the model consider the broader question or relevant background, then answer the specific question using that as context
      • Chain of thought - ask the model to think through each step rather than jumping straight to the answer
      • https://github.com/GoogleCloudPlatform/generative-ai/blob/main/language/prompts/examples/chain_of_thought_react.ipynb
      • Self-consistency - run the same prompt several times (e.g., 3x) and go with the most frequent answer; see the sketch after this list
      • Tree of thoughts - like self-consistency, but branches the reasoning itself, so multiple paths can be explored and pruned rather than just voting on final answers
      • ReAct - interleaves reasoning with actions, letting the model call external tools and react to their results
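
      • A minimal sketch of self-consistency in Python (ask_model is a hypothetical stand-in for a real LLM API call sampled at temperature > 0, so repeated samples can disagree):

```python
import random
from collections import Counter

def ask_model(prompt: str) -> str:
    # Toy stand-in for an LLM call; in practice this hits your model's API
    # with temperature > 0 so repeated samples can differ.
    return random.choice(["42", "42", "41"])

def self_consistency(prompt: str, samples: int = 3) -> str:
    """Sample the same prompt several times and majority-vote the answers."""
    answers = [ask_model(prompt) for _ in range(samples)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("Think step by step: what is 6 * 7?"))
```
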
    • Also try to be clear with your instructions by providing both a “DO” and a “DO NOT” instruction set.
    • Keep track of your prompts and test them each time, recording:
      • Name/version
      • Goal
      • Model
      • Temperature/Top-K, Top-P
      • Token Limit
      • Prompt
      • Output
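
    • One way to capture those fields is a small dataclass; a sketch with hypothetical values (the model name and numbers are just examples, not recommendations):

```python
from dataclasses import dataclass

@dataclass
class PromptRecord:
    """One row in a prompt log, mirroring the fields above."""
    name: str          # name/version, e.g. "faq-answerer-v1"
    goal: str
    model: str
    temperature: float
    top_k: int
    top_p: float
    token_limit: int
    prompt: str
    output: str = ""   # filled in after the run

record = PromptRecord(
    name="faq-answerer-v1",
    goal="Answer product FAQs in two sentences",
    model="gemini-pro",
    temperature=0.2,
    top_k=40,
    top_p=0.95,
    token_limit=256,
    prompt="Summarize our return policy.",
)
print(record)
```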

Ways to add new data to your LLM

  • Discussion with ChatGPT on different options for incorporating your own knowledge into an LLM, summarized in this table:

    | Method | Best For | Pros | Cons |
    | --- | --- | --- | --- |
    | Soft Prompting | Small FAQ sets, low-cost setups | No training required; flexible and easy to update | Limited by token size; FAQs must be in every prompt |
    | RAG (Retrieval-Augmented Generation) | Large or dynamic FAQ datasets | Scalable; FAQs dynamically retrieved; no fine-tuning required | Complex setup; dependent on retrieval quality |
    | Fine-Tuning | Highly repetitive or static FAQs | Efficient inference; consistent answers; tailored behavior | High initial cost; requires re-training for updates |
    | Middleware System | Handling simple FAQ queries | Fast responses; hybrid with GPT-4 | Dual-system complexity; threshold tuning |
    | Tool-Augmented Models | Structured FAQ storage | Accurate; real-time updates | Development overhead |
    | Knowledge Distillation | Reducing API costs | Efficient for repetitive FAQs; cost-effective inference | Training and maintenance overhead |
    | Student/Teacher | Replicating GPT-4 quality on FAQs at scale | High-quality outputs; cost-effective once trained; reduces dependence on GPT-4 | Training requires GPT-4 API calls; retraining for FAQ updates |
    | Adaptive Prompt Chaining | Ambiguous or multi-part queries | Improves query understanding; interactive experience | Slower interactions; may require multiple exchanges |
    | Knowledge Graph | Complex, interrelated FAQs | Rich context; supports reasoning | Complex to build and maintain |
    | Memory-Based Agents | Session-specific FAQ refinement | Personalized experience; avoids repetition | Requires session infrastructure |
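
  • A minimal sketch of the RAG row above in pure Python, assuming word-overlap scoring as a stand-in for real embedding search (production systems use a vector store and embedding model):

```python
def score(query: str, doc: str) -> int:
    """Toy relevance score: count shared words. Real RAG systems use
    embedding similarity against a vector store instead."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, faqs: list[str], k: int = 1) -> list[str]:
    """Return the k FAQ entries most relevant to the query."""
    return sorted(faqs, key=lambda d: score(query, d), reverse=True)[:k]

faqs = [
    "Returns: items can be returned within 30 days with a receipt.",
    "Shipping: standard shipping takes 5-7 business days.",
    "Warranty: all products carry a one-year limited warranty.",
]

query = "How long do I have to return an item?"
context = "\n".join(retrieve(query, faqs))

# Only the retrieved context is injected into the prompt, so the FAQ set
# can grow without bloating every call (unlike soft prompting above).
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```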

Post-Hoc Grounding

  • Post-hoc grounding refers to the process of aligning model outputs with external, factual, or contextually relevant knowledge after the initial response is generated. This ensures the response is consistent, accurate, or relevant to specific requirements.

    • For example:
      1. A model generates a generic response (e.g., “The capital of France is Paris”).
      2. Post-hoc grounding checks this response against a knowledge base or external source for accuracy.
      3. The output is corrected or refined if necessary before delivery to the user.
    • This technique is common in tasks requiring high factual accuracy, like customer support or scientific reasoning.
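    • A minimal sketch of that loop, assuming a hypothetical generate() stub and a dict standing in for the knowledge base; a real system would verify with a retriever or fact-checking model rather than exact-match lookup:

```python
# Toy knowledge base; a real system would query a database or retriever.
FACTS = {"capital of France": "Paris", "capital of Japan": "Tokyo"}

def generate(question: str) -> str:
    """Hypothetical stand-in for the model's initial, ungrounded answer."""
    return "Lyon"  # deliberately wrong, to exercise the correction path

def ground(question: str, answer: str) -> str:
    """Post-hoc grounding: after generation, check the answer against
    the knowledge base and correct it before delivery to the user."""
    for topic, fact in FACTS.items():
        if topic.lower() in question.lower() and fact.lower() not in answer.lower():
            return fact  # replace the ungrounded answer with the known fact
    return answer

question = "What is the capital of France?"
print(ground(question, generate(question)))  # -> Paris
```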