Lessons learned from an LLM-based chatbot study and assessment… #1
It is time to conclude the AI study focusing on LLM-based chatbots that I have been leading, as my current MPA orders with AETC conclude. Since I have no expectation of approval for a full public release of the study results in either PowerPoint or research-paper format, I wanted to make a series of posts covering the major thematic lessons learned instead. All of these are completely unclassified and won’t include any sensitive use-case data or direct descriptions. Ultimately, though, the real value is in the knowledge gained, not the specific assessments conducted.
So with the necessary preamble out of the way, let’s dive into the first and most important foundational reality of utilizing any LLM-based system.
It is all about the VARIANCE.
The stochastic parrot is alive and well, so I shall spare you the overplayed metaphors beyond that one. The key point is that variance in output is a statistical fact inherent to the fundamental design and operation of these models. There are many ways to control for it, all with varying degrees of complexity and success, but it is impossible to eliminate entirely.
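One of the most common knobs for controlling (though never eliminating) that variance is the sampling temperature. A minimal sketch in plain Python, assuming nothing beyond the standard softmax definition: lowering the temperature sharpens the token distribution, making the most likely output dominate, while raising it flattens the distribution and increases variance.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative logits for three candidate tokens (hypothetical values).
logits = [2.0, 1.0, 0.5]

sharp = softmax(logits, temperature=0.1)  # near-deterministic: top token dominates
flat = softmax(logits, temperature=2.0)   # high variance: probability spread out
```

At temperature 0.1 the top token carries almost all of the probability mass, so repeated generations will almost always agree; at temperature 2.0 the mass is spread across alternatives, so repeated generations will vary. This is why "turn the temperature down" is the first variance-control lever most practitioners reach for, even though sampling remains stochastic unless the decoder is fully greedy.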
The resulting strategy should then be to first understand the absolute limits of result-variation tolerance for your desired or required use-case outputs. This might be anything from as close to zero as possible, in the case of generating test questions, to almost infinite variation with only basic grammatical and logical requirements, in the case of generating simple lists. The point is to know what you need, understand what you can tolerate in production error, and apply the tools and prompt-engineering techniques that keep outputs within your required variance thresholds.
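Measuring whether a use case actually stays within its variance threshold can be as simple as running the same prompt repeatedly and checking how often the output deviates from the most common result. A minimal sketch, where `variance_check` and `stub_generate` are hypothetical names and the stub stands in for a real LLM call:

```python
import random
from collections import Counter


def variance_check(generate, n_runs=20, tolerance=0.2):
    """Run a generator repeatedly and report the most common output,
    the fraction of runs that deviated from it, and whether that
    deviation rate is within the stated tolerance."""
    outputs = [generate() for _ in range(n_runs)]
    most_common, count = Counter(outputs).most_common(1)[0]
    deviation_rate = 1 - count / n_runs
    return most_common, deviation_rate, deviation_rate <= tolerance


# Stand-in for a real LLM call; seeded so the illustration is repeatable.
def stub_generate(rng=random.Random(42)):
    # Roughly 90% of the time return the canonical answer, otherwise a variant.
    return "answer-A" if rng.random() < 0.9 else "answer-B"


best, rate, within_tolerance = variance_check(stub_generate, n_runs=50, tolerance=0.2)
```

A test-question generator might demand a tolerance near zero (and exact-match comparison), while a brainstorming use case might only check outputs against grammatical or logical rules rather than against each other. The point of the sketch is the production mindset: define the threshold first, then measure against it.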
It is a tool for the production of text, which is why it is called “generative,” so treat its utilization like you would any other production system, and focus on tolerance levels to maximize output quality.