Some random notes that didn’t really fit neatly into the study findings…
Unlike my four previous posts on our AI study of LLM technology, this one is a collage of items that didn’t fit well enough to include in the written presentation. They might come up in a verbal presentation, but probably only if a question drove interest in a deep dive. Still, I like having some sort of written record beyond my endless pages of random notes, so I thought this might make an interesting post to share.
So random LLM thoughts…
1. There’s an interesting question around where the cost/benefit lines fall between using a model with a very large context window like Claude 2, making secondary vector database calls, building custom keyword search plugins, or custom-training whole models. Obviously it depends on the use case(s) and the associated value and funding, but the line isn’t clear-cut, and which approach to use for what is a nuanced answer.
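One way to see the shape of that trade-off is a toy sketch of "stuff everything into a long context" versus "retrieve only the most relevant chunks first." Everything here is a hypothetical stand-in: the bag-of-words `embed` function substitutes for a real embedding model or vector database, and the point is only the structural difference between the two prompt-building strategies, not a real implementation.

```python
# Toy comparison: long-context prompting vs. a retrieval-style step.
# embed() is a crude bag-of-words stand-in for a real embedding model;
# a production system would call an embedding API and a vector database.
import math
from collections import Counter

def embed(text):
    """Hypothetical 'embedding': word counts from the text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_prompt(question, chunks, top_k=None):
    """top_k=None: send every chunk (long-context style).
    top_k=N: send only the N most similar chunks (retrieval style)."""
    if top_k is not None:
        q = embed(question)
        chunks = sorted(chunks, key=lambda c: cosine(q, embed(c)),
                        reverse=True)[:top_k]
    context = "\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"

chunks = [
    "The audit backlog grew 12% last quarter.",
    "Office supplies were reordered on Tuesday.",
    "Refund anomalies cluster in three regional offices.",
]
long_prompt = build_prompt("Where do refund anomalies cluster?", chunks)
rag_prompt = build_prompt("Where do refund anomalies cluster?", chunks, top_k=1)
```

The retrieval-style prompt is shorter, so each call costs fewer tokens, but you now own the retrieval layer and its failure modes; the long-context version is simpler but pays for every token on every call. Custom training shifts that cost up front instead, which is why the line between these options depends so heavily on volume and funding.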
2. Which modality, or combination thereof, might lead to the next major performance breakthrough in LLMs? There’s clearly some sort of S-curve in performance evolving with the technology: no major capability jumps at the upper end among language-only models, but huge strides at the bottom end, both catching up in performance and slimming down in overall size (efficiency). All the news, official or leaked, indicates multimodal will be the next big leap, but it will be interesting to see whether text and vision are enough, whether sound or something else is necessary for major breakthroughs in overall performance, or whether additional modalities simply plateau at performance similar to what language shows today.
3. The secret to selecting high-value use cases to build custom tools around is identifying the intersection of three things: large, unrefined or disorganized textual data sets; hard-to-find trends, patterns, and insights; and high workloads. A simple example would be something like IRS tax audit cuing, where you have limited personnel, very large textual data sets, and high value on potential trend/pattern/insight cuing. Being right enough (leaning into the stochastic nature of the models) on highly recurring, valuable tasks to drive specific follow-on actions (to validate and process findings) is a great application for this technology.
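That "right enough" cueing pattern can be sketched in a few lines. This is purely illustrative: `score_case` is a hypothetical stand-in for a model call (here just keyword matching), and the thresholds and signal phrases are invented. The structural point is that a flag only triggers a human validation step, so a stochastic, imperfect scorer is still valuable on a high-volume workload.

```python
# Sketch of the cueing pattern: a stochastic scorer flags cases, and only
# flagged cases are queued for a human analyst to validate. score_case is a
# hypothetical stand-in for an LLM call; the signals and threshold are made up.
def score_case(text):
    """Hypothetical anomaly score in [0, 1] based on toy signal phrases."""
    signals = ("unreported", "mismatch", "shell company")
    hits = sum(s in text.lower() for s in signals)
    return min(1.0, hits / len(signals) + 0.1)

def cue_for_review(cases, threshold=0.4):
    """Return cases worth a human's time. False positives are tolerable
    because a flag drives a validation step, not a final decision."""
    return [c for c in cases if score_case(c) >= threshold]

cases = [
    "Income mismatch between W-2 and filed return.",
    "Standard deduction, no flags.",
    "Payments routed through a shell company.",
]
flagged = cue_for_review(cases)
```

Because the model only prioritizes the queue, the cost of an occasional wrong flag is one wasted review, while the benefit is that limited personnel spend their time on the cases most likely to matter.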