With the release yesterday of the new Llama 3 LLMs from Meta, we are right on track with my predictions from last year’s GenAI study for the DAF…


As of today, we have open-source LLMs that are roughly a tenth the size of, yet approximately as capable as, the original release of GPT-4 last year. So, in roughly a year, development in GenAI has progressed to the point where the baseline standard for open-source LLMs (Meta’s Llama models) has caught up to the previous year’s groundbreaking foundational model standard. Now the interesting thing is that this achievement isn’t actually new. We saw the exact same pattern last year with the release of Llama 2, which was roughly equivalent to GPT-3.5 but smaller, and likewise released about a year later. So this is just more validation of the existing developmental trendline within the field. But the interesting question isn’t where we are today on that trendline, it’s where we are all going very soon.

The fascinating thing is that LLM performance levels aren’t headed to infinite heights, but rather are climbing a mostly fixed scale of human performance equivalence. Think of a fixed 100-point scale, where zero is every possible answer wrong, and 100 is every possible question answered right relative to the highest expert levels of human performance. What we have is a band of LLM performance, with the largest and most capable foundational models at the top, and the smaller open-source models holding up the bottom of the band. As of last year, only the very top of the LLM performance band (GPT-4) reached over the 80% threshold, and even into the low 90s for certain sub-domains. Now the meat of the band sits in the 80% range, with the largest foundational models all deep into the 90s. What this foretells is that next year we will have both models small enough to fit on your personal phone that are also capable enough to answer most questions you would ever ask (think the full Siri dream realized), and the largest models achieving what will seem like, or actually be, baseline AGI levels of performance for the first time in history.

The fascinating sidenote to all this is that the usage cost of the best models has never drastically increased over time. The price for the best models remains stable, while their performance continually climbs. It’s one of the biggest productivity gifts available to every business, but not all have taken advantage of it… yet.
