GPT-4o vs. GPT-4: How chatbot users benefit from LLM competition
OpenAI's latest GPT release is now several weeks old. We wanted to test it intensively ourselves before assessing how our customers and their AI chatbots can benefit from it. In this blog post you will learn:
- How GPT-4o differs from its predecessor GPT-4
- How we deal with the rapid technological progress of large language models
- What the multimodality of large language models means for the future of chatbots
Price & speed: Why GPT-4o is a milestone
If you haven't watched the GPT-4o release video yet, it is well worth a look. It shows impressively how the large language model not only reacts to natural language input in text and audio, but also processes visual input and draws conclusions across formats. OpenAI is not only a technological leader, it also runs its marketing at Champions League level: the release demos paint pictures of a future that does not feel far removed from what is already practically possible. The major achievements of the new release, however, are quite pragmatic in nature, yet highly important for the business case of LLM-based solutions: GPT-4o is twice as fast and half as expensive! Increased efficiency in the architecture means that significantly fewer tokens are required, which allows OpenAI to pursue a new pricing and product policy.
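To illustrate what halved prices can mean in practice, here is a small back-of-the-envelope calculation in Python. The per-token prices are those published around the GPT-4o release (GPT-4 Turbo at roughly $10/$30 and GPT-4o at $5/$15 per million input/output tokens) and may have changed since; the request volume is a made-up example.

```python
# Rough cost comparison for a chatbot workload, assuming the per-million-token
# prices published around the GPT-4o release (check current pricing before relying on this).
PRICES = {
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},  # USD per 1M tokens
    "gpt-4o":      {"input":  5.00, "output": 15.00},  # USD per 1M tokens
}

def monthly_cost(model: str, requests: int, in_tokens: int, out_tokens: int) -> float:
    """Estimate the monthly API cost for a given number of requests."""
    p = PRICES[model]
    return requests * (in_tokens * p["input"] + out_tokens * p["output"]) / 1_000_000

# Hypothetical chatbot: 50,000 requests/month, ~1,500 input and ~300 output tokens each.
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 50_000, 1_500, 300):,.2f} per month")
```

With these assumed figures, the GPT-4o bill comes out at half the GPT-4 Turbo bill for the identical workload, which is exactly the lever that matters for scaling chatbot projects.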
Content comparison of GPT-4o and GPT-4: What stands out
A structured comparison of GPT-4o's output with that of its predecessor, however, reveals that GPT-4 still performs better in some cases, depending on the application. We ran the same data sources and queries through both LLMs in over a hundred examples and found the following: GPT-4o hallucinates significantly more and regularly refuses to answer at all where GPT-4 can clearly provide correct answers. Where both models answer correctly, GPT-4o makes more concise statements, while GPT-4 tends to be more verbose. The reasoning skills of the new model appear to have declined in favor of price and speed. This is by no means to say that the new model is “inferior” in terms of content; rather, we believe that different prompting strategies may be needed to achieve accurate results with GPT-4o. When working with multimedia content, GPT-4o certainly lives up to its release promise: images are identified reliably. We were able to upload our company presentation with its diagrams and ask for a presentation script that matched the structure and content well, even if, as usual, it remained rather general. In combination with methods such as RAG, the growing multimodality of large language models opens up further exciting possibilities for effective information search.
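For readers who want to try a similar side-by-side comparison themselves, the following sketch shows one way to send the same query to both models via the OpenAI Python SDK and collect the answers for manual review. The model names, system message, and placeholder prompt are illustrative; judging hallucinations and refusals remains a manual step.

```python
# Minimal sketch of a side-by-side model comparison on identical input,
# using the OpenAI Python SDK (pip install openai). Prompts are placeholders.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

MODELS = ["gpt-4o", "gpt-4-turbo"]
SYSTEM = "Answer strictly based on the provided context. Say 'unknown' if unsure."
CONTEXT = "..."   # the same source document excerpt for both models
QUESTION = "..."  # the same user question for both models

for model in MODELS:
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # near-deterministic output makes answers easier to compare
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": f"Context:\n{CONTEXT}\n\nQuestion: {QUESTION}"},
        ],
    )
    print(f"--- {model} ---")
    print(response.choices[0].message.content)
```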
aiStudio and the possibilities of ChatGPT
What do this and upcoming LLM releases mean for our customers? In the aiStudio you can choose between different models and test them with your own content outside of live operation. GPT-4o, for example, is not yet fully available in Europe, but we expect to use a mixture of both models for our customers: where image material comes into play, GPT-4o is clearly the right choice, and where content becomes more complex, GPT-4 will continue to play an important role for quality reasons. As a platform provider and chatbot consultancy, we see the growing variety of large language models as a very effective toolkit for designing the chatbot that best fits your use case.
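As a rough illustration of what such a model mix can look like in code, here is a hedged sketch that routes a request to GPT-4o when an image is attached and to a text-only GPT-4 model otherwise. This is not how the aiStudio is implemented; the function name and model choices are purely illustrative.

```python
# Illustrative model routing: use GPT-4o for requests that include an image,
# otherwise fall back to a text-only GPT-4 model. Not the aiStudio implementation.
from openai import OpenAI

client = OpenAI()

def answer(question: str, image_url: str | None = None) -> str:
    if image_url:
        # GPT-4o accepts mixed text/image content in a single user message.
        model = "gpt-4o"
        content = [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ]
    else:
        model = "gpt-4-turbo"  # example choice for complex, text-heavy queries
        content = question

    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": content}],
    )
    return response.choices[0].message.content

# Example: a text-only question vs. a question about a diagram (URL is a placeholder).
print(answer("Summarise our pricing model in two sentences."))
print(answer("What does this chart show?", image_url="https://example.com/chart.png"))
```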
Find out more in our AI guide
If you want to learn more about the technical basics and project phases of a chatbot implementation, our free AI guide is a good place to start.