Brian Morrison 6/21/24

AGI by October 2025, that’s the trendline…

Of all the popular benchmarks used today to test models, I personally think the GPQA Diamond set gives us the most insightful look into predicting the achievement of AGI. The test set is described by its creators in their published research paper as:

Brian Morrison 6/20/24

Claude 3.5 Sonnet beats Claude 3 Opus in some interesting ways...

With the release today of Anthropic's new top performing LLM, Claude 3.5 Sonnet, I naturally had to run it through some of my recent prompt engineering tests to see how it performed. Since many popular benchmarks rank this model as the best performing in the world right now, I was curious how that would show up in practice.

Brian Morrison 6/19/24

Who has the training…?

All the big money is in programs, and programs are run on requirements, so per the DoD’s own mandates who has the required training to execute the requirements process? While many people might point to the program offices themselves, the primary source of requirements management is at the headquarters level in the applicable “5” division. The “5” is the requirements management office for the entire organization that the headquarters is commanding. You need to start the conversation at the “5” division when trying to determine “why” a particular current capability exists, or “how” to create or modify a capability for that organization.

Brian Morrison 6/19/24

There are two ways to find the highest value integration points for AI in your business…

Not all use cases are high value, so utilizing the right tool for a job is just as much about finding the biggest impact as it is picking the tool for the job. Sometimes using an inverted screwdriver as a hammer to whack a nail into a surface is the right decision. The screwdriver might be the only tool you have on hand with enough mass & volume to attempt the task. Other times you might have plenty of sledgehammers, but it’s just a series of tiny nails going into delicate material that needs the lightest touch possible. The point is that determining the “right” tool for the job requires analyzing both the full scope of the task and your own situation, so you can weigh cost, risk, and value and choose the best solution available.

Brian Morrison 6/17/24

Most people are lucky to know one, but precious few understand all three…

The first section of my book is called “what to know” for a reason. Lots of folks will talk about the Federal Acquisition Regulation (FAR), and some may even be knowledgeable on it, but very few have actually been trained on all three primary facets of the overall DoD acquisition process. I cover all three in my book to give companies a foundational understanding of each and how to utilize them as part of their business strategy.

Brian Morrison 6/12/24

The key is backward planning your modular approach to DoD GenAI capability development…

Remember the tech constantly improves regardless of your investment, so start your use case testing on current leading edge closed source models, then acquire and build a modular system that will utilize a future open source model. Use the consistent time lag between the release of leading foundation models and their equivalent open source competitors to your advantage. With a roughly 6-12 month gap between the best LLM of today and the release of an equivalently performing open source model of tomorrow, you can capitalize on both to optimize your DoD project plan.
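
The 6-12 month lag can be turned into a rough planning window with simple date arithmetic. The Python sketch below is only an illustration: the function names, the example release date, and the choice to round to the first day of the month are all assumptions, not part of any official planning method.

```python
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the first day of the month that is `months` after `start`."""
    index = start.year * 12 + (start.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def open_source_window(closed_release: date, min_lag: int = 6, max_lag: int = 12):
    """Estimate when an open source equivalent might arrive.

    Assumption: the lag is the rough 6-12 months described above.
    """
    return add_months(closed_release, min_lag), add_months(closed_release, max_lag)


# Hypothetical closed source release date, used only for illustration.
earliest, latest = open_source_window(date(2024, 1, 15))
print(earliest, latest)
```

Planning backward from that window, use case testing on the closed source model would need to wrap up before the earliest date so the modular system is ready to swap in the open source model when it lands.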

Brian Morrison 6/11/24

A fun experiment with GenAI and Mobile devices for the DoD…

Can you use the vision capabilities of multimodal LLMs to detect uniform wear errors? Anyone who’s been through some form of professional military education knows the “joys” of uniform inspection. The only thing worse than having an issue found, is having one found after you, your friend, and someone else already checked your uniform. As there’s a written standard for all uniform wear, the uniforms themselves are all standardized in production, and only personal mobile devices would be required, this all provides an opportunity to test GPT4o and other vision capable LLMs with an interesting challenge.

Brian Morrison 6/10/24

The big impact of NIPRGPT on the DoD is in risk tolerance not capability…

The NIPRGPT launch provides the first widespread CUI approved GenAI capability for the average military member, but the big landmark is the risk tolerance the service has accepted. I’ll be the first to say it’ll be a challenge to get any significant military use case accomplished by a small-parameter, low-performance LLM (which is what powers NIPRGPT). As much as this might seem negative, I would argue it is the biggest positive for the technology overall for the DoD. Not all GenAI systems or applications are high threat, and as such a graduated scale should be applied to their evaluation and approval. Starting small and proving that fact is VERY important, because establishing a graduated risk scale relative to model/system capability ultimately raises the floor of mass capability deployment.

Brian Morrison 6/5/24

This is the type of exciting breakthrough that can move a lot of needles…

The photomolecular effect can unlock major gains in the accuracy and efficiency of weather prediction, water purification, and even home appliances. It’s been decades since we’ve had a physics breakthrough that can translate quickly and directly into new consumer products and capabilities. Everything from more energy efficient mechanical designs for existing products to improved performance of existing systems is on the table. The best part is that the diversity of applications creates opportunity for small and large businesses just as much as for consumers.

Brian Morrison 6/4/24

About damn time…great way for the government to start boiling the frog of nuclear regulation change by starting with a military customer instead of a civilian one.

DIU and U.S. Army To Prototype Advanced Nuclear Power for Military Installations

Brian Morrison 6/3/24

Solving the word count problem with LLMs, a continuation of the sub-token solution…

Unlike the previous solution in my other post about “solving for R”, this solution seems to have a minimum performance threshold for small models. It was tested with one-shot initial success on all of the following models: Mistral Large, Llama3-8b-8192, GPT3.5, GPT4o, and Claude 3 Sonnet. Interestingly, Claude 3 Haiku and Opus were the only models that did not separate “small…and” into two words, leaving their counts off by one. Also of note, if you run the sample paragraph through online word counters, they too will count only 54 words instead of 55.
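
The off-by-one behavior can be reproduced without a model. The Python sketch below assumes the difference comes from two words joined by an ellipsis with no spaces around it. The sample sentence and both function names are made up for illustration and are not the paragraph used in the test.

```python
import re


def count_whitespace_words(text: str) -> int:
    """Count words by splitting on whitespace, as a simple word counter might."""
    return len(text.split())


def count_separated_words(text: str) -> int:
    """Count runs of letters, digits, and apostrophes, so an ellipsis splits words."""
    return len(re.findall(r"[A-Za-z0-9']+", text))


# Hypothetical sentence with two words joined by an ellipsis.
sample = "The model was small…and fast."
print(count_whitespace_words(sample))  # 5, "small…and" counts as one word
print(count_separated_words(sample))   # 6, "small" and "and" count separately
```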

Brian Morrison 5/31/24

Solving for “R” with lower quality models, the challenge and the solution…

A commonly observed issue with LLMs is their perceived inability to solve simple counting problems, but there are ways to solve these issues if you understand the inherent system functions that are causing the errors. The current problem/test is counting the number of times a letter occurs within a given word. The core issue is that in the vectorization of the tokens, letters that are the same and next to each other are getting lost in translation of the mathematical conversion. The solution is to make the individual target variables (letters of the target word) long enough to take up full token lengths, so their proper representation is not lost in vectorization and computation by the model.
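
One way to picture this approach is a prompt builder that gives every letter its own labeled line. This Python sketch is an assumption about what "long enough" could look like in practice. The function names, the prompt wording, and the example word are hypothetical and are not the prompt from the post.

```python
def build_letter_prompt(word: str, target: str) -> str:
    """Build a prompt that lists each letter of `word` on its own labeled line.

    Assumption: a labeled line per letter keeps each letter from being merged
    with its neighbors during tokenization. The wording is illustrative only.
    """
    lines = [f"Letter {i}: {ch}" for i, ch in enumerate(word, start=1)]
    listing = "\n".join(lines)
    return (
        "The word is spelled out below, one letter per line.\n"
        f"{listing}\n"
        f"Count how many lines contain the letter '{target}'."
    )


def expected_count(word: str, target: str) -> int:
    """Ground-truth count, used to check the model's answer."""
    return word.lower().count(target.lower())


# Example word chosen for illustration only.
prompt = build_letter_prompt("strawberry", "r")
print(prompt)
print(expected_count("strawberry", "r"))  # 3
```

Keeping the ground-truth check next to the prompt builder makes it easy to score different models on the same word.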

Brian Morrison 5/31/24

In DefTech your best bet for hiring top talent comes from enabling remote work, so learn to manage it effectively…

In today’s business environment, talent is rarer and more expensive than ever, and the biggest ask of employees is the biggest struggle for many companies. In DefTech, salaries are more often than not less competitive for the top talent in many high tech fields. There are many drivers of this, but it’s been a long running constant for the industry. In the post-COVID world, the biggest new non-financial demand of labor has been remote work. Therefore the ability to support remote work and utilize it effectively has become a critical requirement for many companies in DefTech. Unfortunately, many managers, both commercial and government, have failed to evolve their own skills to match the reality of this new paradigm.

Brian Morrison 5/30/24

This kind of ignorance just annoys the crap out of me. Here’s the simple prompt engineering fix to these types of problems with GPT4o. It’s literally one extra prompt. 🙄

Prompt Solution for Model counting issues (solutions for small and large models)

Brian Morrison 5/29/24

If I was Google and I wanted to make a big dent in the defense space with a ground breaking research project, this is what I would do...

The core thesis would be MuZero applied to "Command: Modern Air/Naval Operations" to develop unique winning game strategies/tactics, with the winning gameplay videos being analyzed by Gemini 1.5 Pro to explain in detail the applicable strategies/tactics for reporting. Google DeepMind has already shown that its MuZero model is capable of mastering basic video games, so scaling that up to one of the most complex military battle simulators would be an interesting challenge. The value is that MuZero and all previous Alpha models have mastered their target games, and in doing so developed new and valuable strategies and tactics to accomplish those feats. Where humans were previously needed to analyze the gameplay results and interpret the critical winning moves, automating this complex task with Google's new multimodal model would be another equally interesting challenge.

Brian Morrison 5/20/24

GPT4o does a hell of a job with audio narration, and really DESTROYS the whole business of ElevenLabs and other similar companies...

I had a feeling after watching the launch demos of GPT4o that you could use this new model for audio narration purposes, and forgo entirely separate paid audio narration services like ElevenLabs. I have been very impressed with what ElevenLabs has provided up to now, with the money you pay being well worth their output, but why keep paying for a separate service when you have GPT4o? The video below is just the first test of my audio narration experiment using the new ChatGPT iOS app with voice features. It's a 5 minute narration of the intro and part of the first chapter story from my upcoming new book. This fictional story plays out in sections at the end of each chapter of the book as a fun hypothetical employment of the chapter content.

Brian Morrison 5/20/24

Has anyone thought to try using GPT4o to observe, direct, or hypothesize on lab experiments?…

With the new multimodal and rapid interactions possible via a smartphone, it raises the question of what you could use it for beyond simple conversations. Without jumping straight to building Skynet, it would be an interesting experiment to let GPT4o interact with real world lab experiments via observation and communication through a smartphone. Set up a tripod, live stream the video of your lab table experiment, interact with the model live during the experiment via chat, and just see how it goes.

Brian Morrison 5/17/24

I really look forward to doing more cool experiments with GPT4o…

There’s not a ton of additional overall quality you get from GPT4o vs GPT4, but you do get a lot more possible use cases and more speed. GPT4o is very obviously tuned for brevity to better enable conversational interactions, but the multimodality does seem to provide improved results on some older, more problematic use cases. Image generation of characters across multiple poses while maintaining the same character look/style has improved. This would be really interesting to play with more for all kinds of story illustration purposes. I really want to revisit my original handwriting analysis use cases from last year with GPT4V, as there were some really impressive demos from OpenAI of handwriting creation via image generation with GPT4o.

Brian Morrison 5/16/24

When you get a new toy, why not go all out and test the limits, and that's exactly what I did with GPT4o...

I still need to finish the longer educational presentation of how my complex GenAI workflow functions, along with the dissection of its various components, but I have released the latest and most complex version for problem solving. What is now available at my website linked below is an attempt at optimizing generalized workflow concepts with advanced prompt engineering techniques and structures utilizing task customizable analytical framework implementations. That's just a lot of word salad to say this is a customized and complex approach to problem solving with GenAI. You can throw just about any kind of problem you want through the workflow, but for the purposes of testing I find various news articles describing complex real world issues a fun use case.

Brian Morrison 5/15/24

I have spent way too many hours messing with getting this working, and now it's time for sleep...

Getting LLMs to draw complex objects, in this case a detailed turtle, using nothing but HTML code without image files is REALLY hard. This is about as good as I could get tonight, and as simple and silly as that looks, it's miles ahead of the zero-shot results of any model without using the workflow concepts I have been running on about recently.