Large language models are part of a broader class of models called foundation models. The term “foundation models” was coined by a team at Stanford who observed that the field of AI was converging on a new paradigm. Previously, AI applications were built by training a library of separate AI models, each trained on task-specific data to perform one specific task.
They predicted that we would move to a new paradigm built around a foundational capability, a foundation model, that could drive all of those same use cases and applications. The same model could power the applications we previously envisioned with conventional AI, along with any number of additional ones, because it can be transferred to many different tasks. What gives this model the ability to transfer across tasks and perform many different functions is that it has been trained, in an unsupervised manner, on a huge amount of unstructured data.
In the language domain, this means feeding the model enormous numbers of sentences, on the order of terabytes of data. For example, the start of a sentence might be “no use crying over spilled,” and the end might be “milk.” The model is trained to predict the last word of the sentence from the words that came before it.
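The key point of this training objective is that the “label” comes from the text itself: every word serves as the prediction target for the words before it, so no human annotation is needed. The sketch below shows that idea by turning one sentence into (context, next-word) training pairs. It is a simplified illustration, assuming whitespace tokenization; real models split text into subword tokens and process vast corpora in batches. The function name `next_word_pairs` is hypothetical.

```python
def next_word_pairs(sentence: str) -> list[tuple[list[str], str]]:
    """Turn one sentence into (context, next-word) training examples.

    Assumption: words are split on whitespace (real models use subword tokens).
    The target for each example is simply the following word in the text,
    so no human labeling is required.
    """
    words = sentence.split()
    return [(words[:i], words[i]) for i in range(1, len(words))]


if __name__ == "__main__":
    for context, target in next_word_pairs("no use crying over spilled milk"):
        print(" ".join(context), "->", target)
```

The last pair produced is the example from the text: the context “no use crying over spilled” with the target “milk.”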
Generative AI models
This generative capability, predicting and generating the next word based on the words that came before it, is why foundation models belong to the field of AI called generative AI: they generate something new, in this case the next word in a sentence.
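To make “generating the next word” concrete, the toy sketch below learns which word tends to follow which (a bigram model) and then generates text autoregressively: it predicts one word, appends it, and repeats. This is a deliberately tiny stand-in, assuming a two-sentence corpus and a model that looks only at the previous word; a real foundation model conditions on the whole preceding context with a large neural network. The names `train_bigram` and `generate` are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: list[str]) -> dict[str, Counter]:
    """Count, for every word, which words follow it in the training text."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts: dict[str, Counter], prompt: str, max_new_words: int = 5) -> str:
    """Greedy autoregressive generation: append the most frequent next word, repeat."""
    words = prompt.split()
    for _ in range(max_new_words):
        if not words:
            break
        followers = counts.get(words[-1])
        if not followers:  # nothing ever followed this word in training: stop
            break
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)


if __name__ == "__main__":
    corpus = [
        "no use crying over spilled milk",
        "the cat drank the spilled milk",
    ]
    model = train_bigram(corpus)
    print(generate(model, "no use crying over"))
```

Given the prompt “no use crying over,” the sketch appends “spilled” and then “milk,” and stops because no word ever followed “milk” in its training text.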
Even though these models are trained, at their core, to perform a generation task (predicting the next word in a sentence), they can be adapted. By introducing a small amount of labeled data, you can tune them to perform traditional NLP tasks such as classification or named-entity recognition, tasks you would not normally associate with a generative model.
This process is called tuning. You tune a foundation model by introducing a small amount of labeled data and updating the model’s parameters so that it performs a specific natural language task. If you have no labeled data, or only a few data points, you can still apply foundation models effectively in low-labeled-data domains through a process called prompting, or prompt engineering.
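The core mechanic of tuning, updating parameters with a handful of labeled examples, can be sketched with a lightweight variant: keep the “pretrained” representation frozen and train only a small classification head on top of it. This is a simplified illustration, not how any particular product tunes its models; full fine-tuning would update all of the model’s weights. The big assumption here is that `features` is a toy, hand-written stand-in for the learned representations a real foundation model would supply. `VOCAB`, `LABELED`, `tune`, and `predict` are all hypothetical names.

```python
import math

# ASSUMPTION: a toy stand-in for a pretrained model's representation of text.
VOCAB = ["great", "love", "good", "bad", "awful", "hate"]

# A deliberately tiny labeled dataset (1 = positive, 0 = negative).
LABELED = [
    ("I love this", 1),
    ("great movie", 1),
    ("awful plot", 0),
    ("I hate it", 0),
]


def features(text: str) -> list[float]:
    """Frozen feature extractor: counts of each vocabulary word (never updated)."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def score(text: str, w: list[float], b: float) -> float:
    """Probability that `text` is positive under the current head parameters."""
    z = sum(wi * xi for wi, xi in zip(w, features(text))) + b
    return 1.0 / (1.0 + math.exp(-z))


def tune(examples, lr: float = 0.5, epochs: int = 200):
    """Fit a logistic-regression head on a few labeled examples with SGD."""
    w = [0.0] * len(VOCAB)
    b = 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = features(text)
            err = score(text, w, b) - label  # gradient of log-loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(text: str, w: list[float], b: float) -> str:
    return "positive" if score(text, w, b) >= 0.5 else "negative"


if __name__ == "__main__":
    w, b = tune(LABELED)
    print(predict("good but I love it", w, b))
    print(predict("what an awful day", w, b))
```

Only the head’s weights change during tuning. Note that “good” never appears in the labeled set, so its weight stays at zero; with a real pretrained representation, words with similar meanings have similar representations, which is what lets a few labeled examples generalize.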
For example, prompting a model to perform a classification task might involve giving it a sentence followed by a question: “Does this sentence have a positive or negative sentiment?” The model then continues generating text, and the next natural word it produces is the answer to the classification problem: “positive” or “negative,” depending on how it judges the sentiment of the sentence. These models work surprisingly well when applied to new settings and domains in this way.
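The prompting recipe above can be sketched as two steps: wrap the input in a question so that the natural continuation is the label, then read the model’s first generated word. The sketch assumes a hypothetical `generate` callable that takes a prompt string and returns the model’s continuation; it stands in for whatever model API you actually use. The prompt template and the function names are illustrative assumptions, not a prescribed format.

```python
from typing import Callable

LABELS = {"positive", "negative"}


def build_sentiment_prompt(sentence: str) -> str:
    """Frame the task so that the natural next word is the class label."""
    return (
        f"Sentence: {sentence}\n"
        "Question: Does this sentence have a positive or negative sentiment?\n"
        "Answer:"
    )


def classify_by_prompting(sentence: str, generate: Callable[[str], str]) -> str:
    """Let the model continue the prompt, then read its first word as the label.

    `generate` is a hypothetical callable wrapping a real model: it takes a
    prompt string and returns the text the model generates after it.
    """
    completion = generate(build_sentiment_prompt(sentence))
    words = completion.strip().lower().split()
    first = words[0].strip(".,!?\"'") if words else ""
    return first if first in LABELS else "unknown"


if __name__ == "__main__":
    def fake_model(prompt: str) -> str:
        # Stand-in for a real model call, used only to show the control flow.
        return " Positive. The speaker sounds happy."

    print(classify_by_prompting("What a wonderful day!", fake_model))
```

Returning “unknown” when the first word is not a valid label is one simple design choice; generative models can drift off-script, so the output should be checked rather than trusted blindly.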
This is where the advantages of foundation models come into play.
Advantages of Generative AI models
The chief advantage is performance. These models have seen so much data (terabytes of it) that, when applied to small tasks, they can drastically outperform a model trained on only a few data points.
Productivity gains
Another advantage is productivity. Through prompting or tuning, you need far less labeled data to reach a task-specific model than if you had to start from scratch, because the model takes advantage of all the unlabeled data it saw during pre-training on the next-word generation task.
Disadvantages
However, these advantages come with some disadvantages. The first is compute cost. The price of having these models see so much data is that they are very expensive to train, which makes it difficult for smaller enterprises to train a foundation model on their own.
Cost
They are also expensive to run at inference, especially when they reach a large size with billions of parameters.
Multiple GPUs may be required just to host these models and run inference, making them more costly than traditional approaches. Another disadvantage is trustworthiness. Because these models are trained on vast amounts of unstructured data, often scraped from the Internet, there is a risk that they have absorbed biased or toxic content. In addition, we often don’t know exactly which datasets a model was trained on, which further undermines trust.
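The claim that multiple GPUs may be needed just to host a model can be checked with back-of-the-envelope arithmetic: parameter count times bytes per parameter gives the memory for the weights alone. The sketch below assumes 16-bit weights (2 bytes per parameter), GPUs with 80 GB of memory, and a hypothetical 175-billion-parameter model; all three numbers are illustrative assumptions. The estimate is a lower bound because it ignores activations, caches, and framework overhead.

```python
import math


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory needed just to hold the model weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_gpus(num_params: float, bytes_per_param: int, gpu_memory_gb: float) -> int:
    """Lower bound on GPUs needed to fit the weights alone.

    Ignores activations, caches, and framework overhead, so real deployments need more.
    """
    return math.ceil(weight_memory_gb(num_params, bytes_per_param) / gpu_memory_gb)


if __name__ == "__main__":
    # Assumptions: 16-bit weights (2 bytes) and 80 GB per GPU.
    for params in (7e9, 175e9):
        print(f"{params / 1e9:.0f}B params: "
              f"{weight_memory_gb(params, 2):.0f} GB of weights, "
              f"at least {min_gpus(params, 2, 80)} GPU(s)")
```

Under these assumptions, a 7-billion-parameter model needs about 14 GB for its weights and fits on one GPU, while a 175-billion-parameter model needs about 350 GB and at least five GPUs before any inference work begins.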
Organizations recognize the huge potential of these technologies and are working on innovations to improve the efficiency, trustworthiness, and reliability of these models, making them more relevant in a business setting.
Other Domains
While the examples so far have focused on language, foundation models can be applied to other domains as well. For instance, vision models such as DALL-E 2 generate custom images from text descriptions, and code models such as GitHub Copilot assist with code completion.
Companies are innovating across these domains, including language models in products such as Watson Assistant and Watson Discovery, vision models in products such as Maximo Visual Inspection, and code models with Red Hat’s Project Wisdom. Applications are also being explored in chemistry, such as using models like Molformer for molecule discovery, and in climate research through Earth Science Foundation models.
I hope you found this overview informative and helpful.