Lessons learned from LLM-based Chatbot study and assessment…#4
Continuing the series of lessons learned from the chatbot study, let’s look now at the art of prompt engineering.
We started this series by establishing the single foundational rule of working with LLM-based systems: always control for variance in the output. When employing prompt engineering techniques, try to ground them in that understanding. Fundamentally, when you generate text as output you are doing one or a combination of three things: converting/translating between languages (including coding languages), extrapolating up to create more text than your total input, or extrapolating down to create less text than your total input.
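One rough way to make this three-way framing concrete is to compare output size to total input size. The sketch below is a toy heuristic, using whitespace-split words as a stand-in for real tokenization, and the `tolerance` threshold is an arbitrary assumption:

```python
def direction(prompt: str, output: str, tolerance: float = 0.2) -> str:
    """Toy heuristic: classify a generation as 'across', 'up', or 'down'
    by comparing output length to total input length."""
    n_in, n_out = len(prompt.split()), len(output.split())
    if n_out > n_in * (1 + tolerance):
        return "up"      # more text than the total input: highest variance risk
    if n_out < n_in * (1 - tolerance):
        return "down"    # less text: bounded by length, logic, prompt constraints
    return "across"      # roughly 1:1: translation/conversion

print(direction("Summarize this long passage about variance in LLM outputs",
                "Variance summary"))  # → down
```

In practice you would count model tokens rather than words, but the directional question is the same: did this step expand, compress, or convert?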
I like to think of these options directionally: across, up, and down. Each has its own types of potential errors that need to be controlled for. Translational errors can generally be caught by simple human or model-based review steps after the initial output, given sufficient recursive review passes. Extrapolating up is the real danger in terms of variance: the model has no way to truly explain how or why it arrived at its output, and anything can be extrapolated outward in complexity to infinity. This is seen most prolifically in the model’s temperature setting affecting its output length (the application of which is a whole other topic in and of itself). Lastly, extrapolating down always has some definable lower limit in any number of contexts: length, logic, designated prompt constraints, and so on.
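The recursive review idea for translational output can be sketched as a simple loop. This is a minimal sketch, not a definitive implementation; `call_model(prompt) -> str` is a hypothetical helper wrapping whatever LLM API you use, and the "OK" convention is an assumed protocol:

```python
def recursive_review(draft: str, call_model, max_passes: int = 3) -> str:
    """Minimal sketch of recursive model-based review for translational
    output. `call_model(prompt) -> str` is a hypothetical LLM wrapper."""
    for _ in range(max_passes):
        verdict = call_model(
            "Review this translation for errors. Reply OK if correct, "
            "otherwise reply with a corrected version:\n\n" + draft
        )
        if verdict.strip() == "OK":
            break           # reviewer found no remaining errors
        draft = verdict     # adopt the correction and review again
    return draft
```

Capping the number of passes matters: each review call is itself a generation with its own variance, so an unbounded loop can chase its own tail.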
Using the basic structure I described in one of my earliest AI posts about the “one prompt to create them all” (https://lnkd.in/g5Ps2n4f), namely “example > instruction > review”, is still a perfect way to start. Build on that with a chain-of-thought process that uses multiple successive prompts to reach your end-state output. Within that chain, any time you extrapolate up, try to extrapolate down at least once in some revision or editorial step, so the model puts some constraints back on the large degree of upward variation you induced. Variation shows up primarily through reproducibility, so always run your prompt sequence at least twice for any important use case to get a feel for your output variation, then adjust your strategy until you find the balance you need.
Ultimately none of this is perfect; these are all individual techniques that you can apply to any situation. Experience is the best indicator of success in prompt engineering, and all the books, guides, and videos are a distant second to actual, repeated experience on a variety of ever more complex use cases.