GPT-4 Explained: What We Know So Far
During a recent online meetup, Sam Altman, CEO of OpenAI, confirmed that the rumors of an upcoming GPT-4 model were true. In this article, we will discuss the information revealed by Altman and combine it with current trends to predict the model's size, optimal parameterization and compute, multimodality, sparsity, and performance.
Model Size: According to Altman, GPT-4 won't be significantly larger than GPT-3, so we can expect it to have roughly 175B-280B parameters, comparable to DeepMind's language model Gopher. Size alone is not a reliable indicator of performance: larger models demand massive computing resources, complex engineering, and enormous datasets. OpenAI is instead focusing on making smaller models perform better.
Optimal Parameterization: Tuning large models is expensive, so companies have to trade accuracy against cost. For example, GPT-3 was trained only once because of the cost, and researchers could not perform proper hyperparameter optimization. Microsoft and OpenAI have since shown that GPT-3 could be improved if trained with optimal hyperparameters. They discovered a new parameterization, μP (maximal update parametrization), under which the best hyperparameters for a small model remain the best for a larger model of the same architecture, allowing researchers to tune large models at a fraction of the cost.
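To make hyperparameter transfer concrete, here is a minimal sketch of μP-style learning-rate scaling in PyTorch. The 1/width scaling of hidden-layer learning rates is the core idea of the published μP recipe (Microsoft also ships a `mup` package implementing the full rules), but the toy model, widths, and learning rate below are illustrative assumptions, not OpenAI's actual setup.

```python
import torch
import torch.nn as nn

def make_mlp(width: int) -> nn.Sequential:
    # Toy MLP whose hidden width we scale up after tuning at a small width.
    return nn.Sequential(
        nn.Linear(128, width), nn.ReLU(),
        nn.Linear(width, width), nn.ReLU(),
        nn.Linear(width, 10),
    )

def mup_param_groups(model: nn.Module, base_width: int, width: int, base_lr: float):
    # Simplified μP rule for Adam: layers whose input dimension grows with
    # width get their learning rate shrunk by base_width / width, so the
    # optimum found on the narrow proxy model keeps working at larger widths.
    groups = []
    for module in model.modules():
        if isinstance(module, nn.Linear):
            scale = base_width / width if module.in_features == width else 1.0
            groups.append({"params": module.parameters(), "lr": base_lr * scale})
    return groups

# Tune the learning rate cheaply at width 256, then reuse it unchanged at
# width 4096 via the scaled parameter groups.
base_width, big_width, tuned_lr = 256, 4096, 3e-3
big_model = make_mlp(big_width)
optimizer = torch.optim.Adam(mup_param_groups(big_model, base_width, big_width, tuned_lr))
```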
Optimal Compute: The number of training tokens influences performance as much as model size. Following DeepMind's compute-optimal scaling results, OpenAI will likely increase the training tokens to around 5 trillion, which would take roughly 10-20X the FLOPs used for GPT-3 to train the model to minimal loss.
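That 10-20X figure is easy to sanity-check with the standard back-of-the-envelope estimate that training a dense transformer costs roughly 6 FLOPs per parameter per token. The parameter and token counts below are this article's estimates, not confirmed numbers.

```python
def train_flops(params: float, tokens: float) -> float:
    # Common dense-transformer estimate: ~6 FLOPs per parameter per token.
    return 6 * params * tokens

gpt3 = train_flops(175e9, 300e9)   # GPT-3: 175B parameters, ~300B tokens
gpt4 = train_flops(175e9, 5e12)    # assumed: similar size, ~5T tokens

print(f"GPT-3 estimate: {gpt3:.2e} FLOPs")   # ~3.15e+23
print(f"GPT-4 estimate: {gpt4:.2e} FLOPs")   # ~5.25e+24
print(f"ratio: {gpt4 / gpt3:.1f}x")          # ~16.7x, inside the 10-20X range
```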
GPT-4 Will Be a Text-Only Model: During the Q&A, Altman confirmed that GPT-4 won't be multimodal like DALL-E; it will be a text-only model. Multimodal models that combine textual and visual information are much harder to build, and such a model would need to outperform both GPT-3 and DALL-E 2 to be worth releasing.
Sparsity: Sparse models reduce computing costs through conditional computation, activating only a fraction of their parameters for any given input (as sketched below), which lets them scale beyond 1 trillion parameters. Even so, OpenAI won't be using a sparse model for GPT-4.
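To show what conditional computation looks like in practice, here is a minimal sketch of a mixture-of-experts layer with top-1 routing, the mechanism behind trillion-parameter sparse language models such as Google's Switch Transformer. It is a generic illustration, not a description of any OpenAI model.

```python
import torch
import torch.nn as nn

class Top1MoE(nn.Module):
    """Toy mixture-of-experts layer: each token is routed to a single expert,
    so only 1/num_experts of the FFN parameters run per token."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        logits = self.router(x)                            # (tokens, num_experts)
        weights, choice = logits.softmax(-1).max(-1)       # gate value, expert id
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = choice == i                             # tokens routed to expert i
            if mask.any():
                out[mask] = weights[mask].unsqueeze(1) * expert(x[mask])
        return out

# Total parameters grow with num_experts, but per-token compute stays constant.
layer = Top1MoE(d_model=64, d_ff=256, num_experts=8)
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```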
AI Alignment: OpenAI is striving to align AI more closely with human values and intentions. It has trained InstructGPT, a GPT-3 variant fine-tuned with human feedback to follow instructions, and human judges rated its outputs as better than GPT-3's.
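For context, the human-feedback step in InstructGPT trains a reward model on pairwise comparisons between model outputs, then fine-tunes the language model against that reward with reinforcement learning. Below is a minimal sketch of the pairwise preference loss; the tiny linear scorer stands in for the real reward model, which is itself a language model.

```python
import torch
import torch.nn.functional as F

# Stand-in reward model: maps a response embedding to a scalar score.
reward_model = torch.nn.Linear(64, 1)

def preference_loss(chosen_emb: torch.Tensor, rejected_emb: torch.Tensor) -> torch.Tensor:
    # Pairwise ranking objective from the InstructGPT paper:
    # maximize log sigmoid(r(chosen) - r(rejected)).
    r_chosen = reward_model(chosen_emb).squeeze(-1)
    r_rejected = reward_model(rejected_emb).squeeze(-1)
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Fake batch of labeler comparisons: 8 (preferred, rejected) response pairs.
chosen, rejected = torch.randn(8, 64), torch.randn(8, 64)
loss = preference_loss(chosen, rejected)
loss.backward()  # gradients push preferred scores above rejected ones
```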
GPT-4 Release Date: Microsoft has confirmed that GPT-4 will arrive in the week of March 13, and the company plans to focus on other technologies, such as text-to-image generation and speech recognition, after the release.
In conclusion, GPT-4 will be a text-only large language model with better performance, more closely aligned with human instructions and values. It will serve a wide range of language applications and is expected to be safer, less biased, more accurate, and more cost-efficient than its predecessor. Rumors surrounding the launch conflict, and OpenAI has not revealed anything concrete about the model's architecture, size, or dataset, but we can reasonably expect GPT-4 to deliver better results and address the shortcomings of the previous version.