- Patterns for Building LLM-based Systems & Products
- Evals: To measure performance
- Why evals?
- More about evals
- How to apply evals?
- Retrieval-Augmented Generation: To add knowledge
- Why RAG?
- More about RAG
- How to apply RAG
- Fine-tuning: To get better at specific tasks
- Why fine-tuning?
- More about fine-tuning
- How to apply fine-tuning?
- Caching: To reduce latency and cost
- Why caching?
- More about caching
- How to apply caching?
- Guardrails: To ensure output quality
- Why guardrails?
- More about guardrails
- How to apply guardrails?
- Defensive UX: To anticipate & handle errors gracefully
- Why defensive UX?
- More about defensive UX
- How to apply defensive UX?
- Collect user feedback: To build our data flywheel
- Why collect user feedback
- How to collect user feedback
- Other patterns common in machine learning
- Conclusion
- References
Patterns for Building LLM-based Systems & Products
“There is a large class of problems that are easy to imagine and build demos for, but extremely hard to make products out of. For example, self-driving: It’s easy to demo a car self-driving around a block, but making it into a product takes a decade.” - Karpathy
This write-up is about practical patterns for integrating large language models (LLMs) into systems & products. We’ll build on academic research, industry resources, and practitioner know-how, and distill them into key ideas and practices.
There are seven key patterns, organized along two axes: improving performance vs. reducing cost/risk, and closer to the data vs. closer to the user.
- Evals: To measure performance
- RAG: To add recent, external knowledge
- Fine-tuning: To get better at specific tasks
- Caching: To reduce latency & cost
- Guardrails: To ensure output quality
- Defensive UX: To anticipate & manage errors gracefully
- Collect user feedback: To build our data flywheel