OpenAI has just released its latest AI model, GPT-4, which exhibits human-level performance on various professional and academic benchmarks.
GPT-4 is a large multimodal model that can accept image and text inputs and generate text outputs.
In this article, we’ll look at GPT-4’s capabilities, limitations, and the risks involved in using it.
By the end, you’ll better understand the potential impact of GPT-4 and what it is and isn’t capable of.
GPT-4’s capabilities are an improvement over the previous model, GPT-3.5, in terms of reliability, creativity, and handling of nuanced instructions.
OpenAI tested the model on various benchmarks, including simulated exams designed for humans, and found that GPT-4 outperformed existing large language models.
It also performs well in languages other than English, including low-resource languages such as Latvian, Welsh, and Swahili.
GPT-4 can accept both text and images as input, making it capable of generating text outputs based on inputs consisting of both text and images.
While the model’s visual input capability is still in the research preview stage, it has shown similar capabilities to text-only inputs.
OpenAI has been working on each aspect of the plan outlined in its post about defining the behavior of AIs, including steerability.
Developers can now prescribe their AI’s style and task by describing the directions in the “system” message.
API users can customize their users’ experience within bounds, allowing for significant personalization.
GPT-4 is not perfect and has similar limitations as earlier GPT models.
It can still “hallucinate” facts and make reasoning errors, so caution should be taken when using language model outputs, particularly in high-stakes contexts.
GPT-4 doesn’t know about events after September 2021, which can cause it to make simple reasoning errors and accept false statements as true.
It may also fail at challenging problems like humans, such as introducing security issues in its code.
GPT-4 can make confident but incorrect predictions and doesn’t always check its work carefully.
Interestingly, the base model is good at predicting the accuracy of its answers, but this ability is reduced after post-training.
Risks & Mitigations
While GPT-4’s capabilities are significant, it poses new risks, such as generating harmful advice, buggy code, or inaccurate information.
OpenAI has been working to mitigate these risks, engaging with over 50 experts to adversarially test the model and collecting additional data to improve GPT-4’s ability to refuse dangerous requests.
As a result, OpenAI has made many improvements to GPT-4 to make it safer than GPT-3.5.
GPT-4 is 82% less likely to give inappropriate content than the previous version, and it follows policies better regarding sensitive topics like medical advice and self-harm.
While OpenAI made the model more resistant to bad behavior, generating content that goes against usage rules is still possible.
GPT-4 can be helpful or harmful to society, OpenAI says, so it’s working with other researchers to understand the potential impacts.
Like previous GPT models, the GPT-4 base model was trained to predict the next word in a document using publicly available data and data licensed by OpenAI.
Fine-tuning the model’s behavior using reinforcement learning with human feedback (RLHF) aligns it with the user’s intent within guardrails.
A significant focus of the GPT-4 project has been building a deep learning stack that scales predictably.
OpenAI has developed infrastructure and optimization with predictable behavior across multiple scales and can accurately predict GPT-4’s final loss during training.
Microsoft confirms the new Bing search experience now runs on GPT-4.
Though it has a usage limit, you can also access GPT-4 with a ChatGPT Plus subscription.
OpenAI may adjust the usage cap based on demand and system performance. The company is considering adding another subscription tier to allow for more GPT-4 usage.
To access the GPT-4 API, you must sign up for the waitlist.
The creation of GPT-4 marks a significant milestone in OpenAI’s efforts to scale up deep learning.
While imperfect, it has exhibited human-level performance on various academic and professional benchmarks, making it a powerful tool.
However, caution should be taken when using language model outputs in high-stakes contexts.
OpenAI has been working to mitigate risks and build a deep learning stack that scales predictably, which will be critical for future AI systems.
Featured Image: Muhammad S0hail/Shutterstock