Claude 3.5 Sonnet beats Claude 3 Opus in some interesting ways...
With the release today of Anthropic's new top-performing LLM, Claude 3.5 Sonnet, I naturally had to run it through some of my recent prompt engineering tests to see how it performed. Since this model is, by many popular benchmarks, supposed to be the best-performing model in the world right now, I was curious how that would show up in practice.
I ran three different tests on the model. I've covered all of them in previous articles, which you can find in my website archive if you're curious. The first was the complex test of drawing a turtle in HTML. Previously, Claude 3 Opus was the only model that could produce anything even remotely turtle-like. The new result, shown in the image below, not only looks like a turtle but also has buttons you can select to make it swim, dive, and surface. This model's coding abilities are supposed to be vastly improved, and that certainly seems evident here. The second test was the classic count-the-letters-in-a-word task, and the model failed with the same wrong answer that most other models still produce. The third was a simple word count test on a sample paragraph, which it passed. The really interesting thing about the third test is how it passed. On its own, the model worked around its system limitations using the same approach I previously reported as a prompting fix for this issue with older models.
All in all, I'm super impressed that this new model both meets and surpasses Claude 3 Opus, the much larger previous record holder. A much smaller model that delivers better quality and faster output while also being CHEAPER to operate is frankly damn impressive, and it continues the overall trend line of LLM development. One other thing I noticed is that the maximum output length per response seems to be shorter for Claude 3.5 Sonnet than for Claude 3 Opus. I'm not sure whether that's just a visual impression or an actual technical limit, but it's something I'm going to experiment with more to find out. The future development curve is very exciting, with new and better models from Anthropic and OpenAI due out later this year.
I can't even tell you how excited I am to see what GPT-5 and Claude 3.5 Opus can do!