Perform zero-shot classification with a foundation model
Author: Vanna Winland
In this tutorial, we will use an IBM Granite(TM) model to perform zero-shot classification.
What is zero-shot classification?
Zero-shot classification uses zero-shot prompting, a prompt engineering technique that allows a model to perform a task without task-specific training or examples. It is an application of zero-shot learning (ZSL), a machine learning method that relies on a pretrained model's ability to recognize and categorize objects or concepts it was never explicitly trained on. ZSL is related to few-shot learning (FSL), in which a model makes accurate predictions after training on only a small number of labeled examples. Both techniques enable models to perform tasks they haven't been explicitly trained to do.
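To make the distinction concrete, here is a minimal, hypothetical sketch of the two prompting styles (the example sentences and labels are illustrative only, not from this tutorial):

# Zero-shot: the task is described, but no labeled examples are given.
zero_shot_prompt = (
    "Classify the sentiment of this sentence as positive or negative.\n"
    "Sentence: The checkout process was painless."
)

# Few-shot: a handful of labeled examples precede the new input.
few_shot_prompt = (
    "Classify the sentiment of each sentence as positive or negative.\n"
    "Sentence: The app crashes constantly. -> negative\n"
    "Sentence: Support resolved my ticket in minutes. -> positive\n"
    "Sentence: The checkout process was painless. ->"
)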
Researchers have been experimenting with machine learning models for classification tasks since the 1950s. The Perceptron, an early classification model, uses a decision boundary to classify data into different groups. Most ML/DL methods rely on supervised learning and therefore need large amounts of task-specific labeled training data. This presents a challenge because the large, annotated datasets required to train these models simply do not exist for every domain. Motivated by these constraints, some researchers see LLMs as a way around these data limitations.
LLMs are designed to perform natural language processing (NLP) and natural language inference (NLI) tasks, which gives them a natural aptitude for zero-shot text classification. Because an LLM is trained on a large corpus of data, it can reason over semantic descriptions of categories it has never seen labeled examples of. Like LLMs, foundation models use a transformer architecture that enables them to assign labels without any task-specific training data; their capacity for self-supervised learning and transfer learning lets them classify data into unseen classes. For data science workflows, this approach eliminates the need for large datasets with human-annotated labels because it automates the preprocessing portion of the classification pipeline.
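To illustrate the idea, here is a small, hypothetical sketch of how a classification task can be reframed as an NLI entailment problem (the labels and text are made up for illustration):

# Hypothetical sketch: zero-shot classification reframed as entailment.
# Each candidate label is turned into a hypothesis about the input text;
# an NLI model would score how strongly the premise entails each hypothesis.
premise = "Users are reporting that they are unable to upload files."
candidate_labels = ["software bug", "billing question", "feature request"]

for label in candidate_labels:
    hypothesis = f"This text is about {label}."
    print(f"premise: {premise!r} -> hypothesis: {hypothesis!r}")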
How foundation models perform zero-shot classification
Foundation models are built on the transformer architecture, which can ingest raw text at scale through its attention mechanism and learn how words relate to each other to form a statistical representation of language. The transformer is a type of neural network architecture designed to derive meaningful representations of sequences or collections of data points. This capability is the reason these models perform so well on NLP tasks.
The transformer architecture pairs an encoder-decoder structure with a self-attention mechanism, allowing the model to draw connections between input and output using auto-regressive prediction. The encoder processes the tokenized input into embeddings, a numerical format the model can read. The decoder interprets those embeddings to generate an output. The self-attention mechanism computes a weight for each word, or token, in a sentence based on its relationship to every other word in the sentence, letting the model capture the semantic and syntactic relationships between words. Self-attention is integral to entailment, an NLI task that depends on understanding context within text, as shown in the sketch below.
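For intuition, here is a minimal, illustrative NumPy sketch of single-head scaled dot-product self-attention (toy dimensions and random weights; real transformers add multiple heads, masking, and learned parameters):

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Minimal single-head scaled dot-product self-attention."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # project tokens to queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # how much each token attends to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax -> attention weights
    return weights @ V                          # each token becomes a weighted mix of values

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                     # 4 tokens with 8-dimensional embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)      # (4, 8)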
What models are best for zero-shot classification?
Choosing the right model for zero-shot classification depends on your classification task. There is an abundance of models to choose from, so let's consider three types:
A zero-shot classification model can classify data into categories without task-specific labeled examples at prediction time. These models rely on their training data, normally a large-scale, general dataset, to classify new, unseen classes. One of the most popular models for zero-shot text classification is Hugging Face's facebook/bart-large-mnli, based on the BART-large transformer architecture. Zero-shot classification models perform well on generalized tasks, but because there is no fine-tuning on specific tasks or datasets, accuracy may be limited, so the model requires well-formulated prompts. A minimal usage sketch follows below.
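As a point of comparison with the watsonx.ai workflow later in this tutorial, here is a minimal sketch using the Hugging Face transformers library (this assumes transformers and a PyTorch backend are installed; it is not part of the main tutorial flow):

from transformers import pipeline

# Downloads facebook/bart-large-mnli on first use.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "Users are reporting that they are unable to upload files.",
    candidate_labels=["high priority", "low priority"],
)
print(result["labels"][0], result["scores"][0])  # top label and its confidence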
Large language models (LLMs) like GPT and BERT are designed to perform a variety of natural language processing tasks. Because they are designed to handle text data, often using deep learning architectures like transformers, their usefulness is limited for multimodal datasets such as images and audio. LLMs are trained on large corpora of text, giving them extensive knowledge of language, syntax, semantics, and some domain-specific knowledge. They generally perform well with little to no task-specific fine-tuning, making them suitable for zero-shot and few-shot classification scenarios, though their generalized training can limit accuracy on specialized tasks, especially ones that require domain-specific data. These models are the best fit when the dataset is text-based.
Foundation models are often multimodal, trained on a variety of modalities including text, images, and speech. They are generally versatile and, after pretraining, can be optimized for many different tasks. IBM's Granite(TM) models classify data using a large language model (LLM) trained on a curated dataset of business-relevant information, including legal, financial, and technical domains. This allows them to analyze text, identify patterns, and categorize data into specific classes based on the context and semantic meaning within the text. Because of their multimodal capacity, these models can handle both image and text classification. They are ideal when you need a broad range of capabilities or want to handle multiple types of data, for instance image or audio classification.
Use cases
Image classification – Computer vision is a type of artificial intelligence (AI) that uses machine learning models to analyze images and videos. A crucial task within computer vision is image classification, which involves labeling and categorizing groups of pixels within an image. Image classification is used in many domains, such as photo tagging on social media, self-driving cars, and even healthcare.
Text classification – NLP relies on text classification to help models understand human language. Text classification powers many NLP tasks such as sentiment analysis, similarity scoring, key phrase detection, and much more. A popular use case, and the one we'll explore in this tutorial, is customer service analysis.
Audio classification – The goal of audio classification is for a model to recognize and distinguish between audio recordings so it can categorize sounds. This form of classification is used in smart home and security systems and in technologies like speech-to-text applications.
Prerequisites
To follow this tutorial, you need an IBM Cloud account to create a watsonx.ai project.
Steps
Step 1. Set up your environment
While you can choose from several tools, this tutorial walks you through how to set up an IBM account to use a Jupyter Notebook. Jupyter Notebooks are widely used in data science to combine code, text, images, and data visualizations into a well-rounded analysis.
Log in to watsonx.ai Runtime using your IBM Cloud account.
Create a watsonx.ai project.
Take note of the project ID in project > Manage > General > Project ID. You’ll need this ID for this tutorial.
Create a Jupyter Notebook.
This step will open a Notebook environment where you can copy the code from this tutorial to perform zero-shot classification on your own. Alternatively, you can download this notebook to your local system and upload it to your watsonx.ai project as an asset. This Jupyter Notebook is available on GitHub.
Step 2. Set up watsonx.ai Runtime instance and API key
In this step, you associate your project with the watsonx.ai service.
Create a watsonx.ai Runtime instance (choose the Lite plan, which is a free instance).
Generate an API Key in watsonx.ai.
Associate the watsonx.ai Runtime to the project you created in watsonx.ai.
Step 3. Install and import relevant libraries and set up your credentials
We'll need some libraries and modules for this tutorial. Make sure to import the ones below, and if they're not installed, you can resolve this with a quick pip install.
%pip install -U langchain_ibm
%pip install ibm_watsonx_ai
import getpass
from langchain_ibm import WatsonxLLM
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
Step 4. Set up your watsonx.ai credentials
Run the following to input and save your watsonx.ai Runtime API key and project ID:
credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": getpass.getpass("Please enter your watsonx.ai Runtime API key (hit enter): "),
    "project_id": getpass.getpass("Please enter your project ID (hit enter): "),
}
Step 5. Set up the model for zero-shot classification
Next, we'll set up IBM's Granite-3.0-8B-Instruct model to perform zero-shot classification.
model = WatsonxLLM(
    model_id="ibm/granite-3-8b-instruct",
    url=credentials.get("url"),
    apikey=credentials.get("apikey"),
    project_id=credentials.get("project_id"),
    params={
        GenParams.MAX_NEW_TOKENS: 500,      # upper bound on response length
        GenParams.MIN_NEW_TOKENS: 1,        # always generate at least one token
        GenParams.REPETITION_PENALTY: 1.1,  # discourage repeated phrases
        GenParams.TEMPERATURE: 0.7,         # adjust for more or less variable responses
        GenParams.TOP_K: 100,               # sample from the 100 most likely tokens
        GenParams.TOP_P: 0,                 # nucleus threshold; 0 keeps decoding nearly greedy
    },
)
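Optionally, you can sanity-check the credentials and model ID before moving on. WatsonxLLM is a LangChain LLM, so invoke() sends a prompt and returns the completion as a string (the test prompt here is our own):

# Optional sanity check: confirm the model is reachable and responding.
print(model.invoke("Reply with the single word: ready"))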
Step 6. Define the prompt
Now that the model is prepared to perform zero-shot classification, let's define a prompt. Imagine a scenario where it's imperative to triage incoming data, perhaps an IT department's flooded inbox full of user-described technical issues. In this example, the model is asked to classify an IT issue as belonging to either the class "High" or "Low," indicating the priority of the issue. The prompt showcases the model's out-of-the-box ability to classify the priority of IT issues.
The code block below sets up and defines the prompt that the model will respond to. The prompt can be any input, but let's try out the example first. Run the code block to define your user prompt along with some example input text.
def generate_text(prompt):
    """Send a single prompt to the model and return the generated text."""
    try:
        response = model.generate([prompt])     # returns a LangChain LLMResult
        return response.generations[0][0].text  # text of the first (and only) generation
    except Exception as e:
        print(f"Error: {e}")
        return None

# Define the prompt here
defined_prompt = "Set the class name for the issue described to either: high or low. Issue: Users are reporting that they are unable to upload files."
Step 7. Perform zero-shot classification
Once the prompt is defined, we can run the next block to allow the model to predict and print its output.
# Generate and print the text based on the defined prompt
generated_text = generate_text(defined_prompt)
print("Generated text:", generated_text)
In this example, the model correctly infers the classification label "high" because it understands the critical impact of users being unable to upload files.
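Because the raw output may include extra text around the class name, a small post-processing step can pull out a clean label. Here is one illustrative approach (the helper name and label list are our own, not part of the original tutorial):

# Illustrative helper: extract a clean class label from the raw model output.
def extract_label(text, labels=("high", "low")):
    text = (text or "").lower()
    for label in labels:
        if label in text:
            return label
    return None  # no known label found

print(extract_label(generated_text))  # e.g. "high"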
Classify service reviews based on sentiment
Let's apply zero-shot classification to a different aspect of a department's everyday workflow. The same IT department from the example above has a backlog of customer support reviews that need to be organized and analyzed. The organization decides the best way to accomplish this is to classify them by sentiment: "Positive," "Negative," or "Neutral."
Run the code block below with the defined prompt and customer review to classify the sentiment of the text.
# Define the prompt here
defined_prompt = "Classify the following customer review as 'Postive', 'Negative', 'Neutral': Customer review: 'My IT issue was not resolved.'"
# Generate and print the text based on the defined prompt
generated_text = generate_text(defined_prompt)
print("Generated text:", generated_text)
The model performs sentiment analysis and correctly classifies the review as "Negative". This capability can be useful for a variety of domains, not just IT. Try out your own prompts to explore how you could use zero-shot classification to automate time-consuming tasks, for instance the batch sketch below.
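Here is an illustrative sketch that reuses generate_text to classify a small batch of reviews in a loop (the review texts are hypothetical):

# Hypothetical batch example: classify several reviews in a loop.
reviews = [
    "The technician was friendly and fixed my laptop quickly.",
    "Still waiting on a response after three days.",
    "The password reset worked, nothing else to report.",
]

for review in reviews:
    prompt = (
        "Classify the following customer review as 'Positive', 'Negative', "
        f"or 'Neutral': Customer review: '{review}'"
    )
    print(review, "->", generate_text(prompt))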
Summary
In this tutorial, we set up IBM's Granite-3.0-8B-Instruct model to perform zero-shot classification. Then we defined a user prompt and scenario to perform zero-shot classification. We tested two examples: one priority classification and one sentiment analysis.