Build an AI stylist with IBM Granite using watsonx.ai¶
Authors: Anna Gutowska, Ash Minhas
This tutorial guides you through building a generative AI-powered personal stylist. It uses the IBM Granite™ Vision 3.2 large language model (LLM) to process image input and Granite 3.2, with its enhanced reasoning capabilities, to formulate customizable outfit ideas.
Introduction¶
How often do you find yourself thinking, “What should I wear today? I don’t even know where to start with picking items from my closet!” This dilemma is one that many of us share. With cutting-edge artificial intelligence (AI) models, it no longer needs to be a daunting task.
AI styling: How it works¶
Our AI-driven solution is composed of the following stages:
The user uploads images of their current wardrobe or even items in their wishlist, one item at a time.
The user selects the following criteria:
- Occasion: casual or formal.
- Time of day: morning, afternoon or evening.
- Season of the year: winter, spring, summer or fall.
- Location (for example, a coffee shop).
Upon submission of the input, the multimodal Granite Vision 3.2 model iterates over the list of images and returns the following output:
- Description of the item.
- Category: shirt, pants or shoes.
- Occasion: casual or formal.
The Granite 3.2 model with enhanced reasoning then serves as a fashion stylist. The LLM uses the Vision model’s output to provide an outfit recommendation that is suitable for the user’s event.
The outfit suggestion, a data frame of items that the user uploaded and the images in the described personalized recommendation are all returned to the user.
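The selection criteria listed above can be validated before any model is called. The following is a minimal sketch under stated assumptions: the `StyleRequest` class and its field names are hypothetical and do not come from the tutorial's repository; only the allowed values are taken from the list above.

```python
from dataclasses import dataclass

# Allowed values taken from the criteria listed in this section.
OCCASIONS = {"casual", "formal"}
TIMES_OF_DAY = {"morning", "afternoon", "evening"}
SEASONS = {"winter", "spring", "summer", "fall"}


@dataclass(frozen=True)
class StyleRequest:
    """Hypothetical container for the user's selection criteria."""
    occasion: str
    time_of_day: str
    season: str
    location: str

    def __post_init__(self):
        # Reject values outside the allowed sets as early as possible.
        if self.occasion not in OCCASIONS:
            raise ValueError(f"occasion must be one of {sorted(OCCASIONS)}")
        if self.time_of_day not in TIMES_OF_DAY:
            raise ValueError(f"time_of_day must be one of {sorted(TIMES_OF_DAY)}")
        if self.season not in SEASONS:
            raise ValueError(f"season must be one of {sorted(SEASONS)}")
        if not self.location.strip():
            raise ValueError("location must not be empty")


request = StyleRequest("casual", "morning", "fall", "coffee shop")
print(request)
```

Validating up front keeps unexpected values out of the prompt that is later sent to the reasoning model.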
Prerequisites¶
You need an IBM Cloud® account to create a watsonx.ai™ project.
Steps¶
In order to use the watsonx application programming interface (API), you will need to complete the following steps.
Step 1. Set up your environment¶
Log in to watsonx.ai by using your IBM Cloud account.
Create a watsonx.ai project.
You can get your project ID from within your project. Click the Manage tab. Then, copy the project ID from the Details section of the General page. You need this ID for this tutorial.
Step 2. Set up watsonx.ai Runtime service and API key¶
Create a watsonx.ai Runtime service instance (choose the Lite plan, which is a free instance).
Generate an API key.
Associate the watsonx.ai Runtime service to the project that you created in watsonx.ai.
Step 3. Clone the repository (optional)¶
For a more interactive experience when using this AI tool, clone the GitHub repository and follow the setup instructions in the README.md file within the AI stylist project to launch the Streamlit application on your local machine. Otherwise, if you prefer to follow along step-by-step, create a Jupyter Notebook and continue with this tutorial.
Step 4. Install and import relevant libraries and set up your credentials¶
We need a few libraries and modules for this tutorial. Make sure to import the following ones; if they're not installed, you can resolve this issue with a quick pip installation.
# Install required packages
!pip install -q pillow ibm-watsonx-ai
# Required imports
import getpass, os, base64, json
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference
from PIL import Image
To set our credentials, we need the WATSONX_PROJECT_ID you copied in step 1 and the WATSONX_APIKEY you generated in step 2. We will also set the URL serving as the API endpoint.
WATSONX_APIKEY = getpass.getpass("Please enter your watsonx.ai Runtime API key (hit enter): ")
WATSONX_PROJECT_ID = getpass.getpass("Please enter your project ID (hit enter): ")
URL = "https://us-south.ml.cloud.ibm.com"
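If you rerun the notebook often, typing the credentials each time can become tedious. One possible alternative, sketched below, is to read a value from an environment variable and fall back to a prompt only when it is missing. The `get_secret` helper and the `DEMO_WATSONX_PROJECT_ID` variable name are assumptions made for illustration; they are not part of the tutorial or of the watsonx.ai SDK.

```python
import getpass
import os


def get_secret(env_name, prompt):
    """Return the value of env_name if it is set, otherwise prompt for it."""
    value = os.environ.get(env_name)
    if value:
        return value
    # Fall back to an interactive prompt that hides the typed input.
    return getpass.getpass(prompt)


# Placeholder value so the example runs without prompting.
# Never hard-code a real key or project ID like this.
os.environ["DEMO_WATSONX_PROJECT_ID"] = "demo-project-id"
project_id = get_secret("DEMO_WATSONX_PROJECT_ID", "Please enter your project ID: ")
print(project_id)
```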
We can use the Credentials class to encapsulate our passed credentials.
credentials = Credentials(
    url=URL,
    api_key=WATSONX_APIKEY
)
Step 5. Set up the API request for the Granite Vision model¶
The augment_api_request_body function takes the user query and a Base64-encoded image as parameters and builds the messages that form the body of the API request. We call this function once per image when querying the Vision model.
def augment_api_request_body(user_query, image):
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": user_query
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{image}"
                    }
                }
            ]
        }
    ]
    return messages
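The request body above labels every image as image/jpeg, while the loading code in step 6 also accepts .png files. If you want the data URL to reflect the actual file type, one option is to derive the MIME type from the filename with Python's standard mimetypes module. This is a minimal sketch; the `build_data_url` helper is hypothetical, and the fallback to image/jpeg is an assumption that mirrors the tutorial's hard-coded value.

```python
import mimetypes


def build_data_url(filename, encoded_image):
    """Build a data URL whose MIME type matches the file extension."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        # Assumed fallback that mirrors the tutorial's hard-coded type.
        mime_type = "image/jpeg"
    return f"data:{mime_type};base64,{encoded_image}"


# "QUJD" is the Base64 encoding of the placeholder bytes b"ABC".
print(build_data_url("image2.png", "QUJD"))
print(build_data_url("image1.jpeg", "QUJD"))
```

The returned string could then replace the f-string assigned to the "url" key in the request body.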
We can also instantiate the model interface using the ModelInference class.
model = ModelInference(
    model_id="ibm/granite-vision-3-2-2b",
    credentials=credentials,
    project_id=WATSONX_PROJECT_ID,
    params={
        "max_tokens": 400,
        "temperature": 0
    }
)
Step 6. Encode images¶
To pass our images to the LLM, we Base64-encode the raw bytes of each file and decode the result to a UTF-8 string. In this case, our images are located in the local images directory. You can find sample images in the AI stylist directory in our GitHub repository.
directory = "images"  # directory name
images = []
filenames = []
for filename in os.listdir(directory):
    if filename.endswith(".jpeg") or filename.endswith(".png"):
        filepath = directory + '/' + filename
        with open(filepath, "rb") as f:
            images.append(base64.b64encode(f.read()).decode('utf-8'))
        filenames.append(filename)
        print(filename)
image1.jpeg image12.jpeg image6.jpeg image7.jpeg image13.jpeg image10.jpeg image8.jpeg image4.jpeg image5.jpeg image9.jpeg image11.jpeg image2.jpeg image3.jpeg
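As the output shows, os.listdir returns files in no particular order. If you prefer a reproducible order, a variation like the one below sorts the paths before encoding. This is a sketch, not the tutorial's implementation: the `encode_images` function is hypothetical, accepting the .jpg extension is an added assumption, and the demonstration uses placeholder bytes in a temporary directory instead of real photos.

```python
import base64
import tempfile
from pathlib import Path


def encode_images(directory, extensions=(".jpeg", ".jpg", ".png")):
    """Return (filenames, base64_strings) for the images in directory, sorted by name."""
    filenames, encoded = [], []
    for path in sorted(Path(directory).iterdir()):
        if path.suffix.lower() in extensions:
            # Base64-encode the raw bytes, then decode to a UTF-8 string.
            encoded.append(base64.b64encode(path.read_bytes()).decode("utf-8"))
            filenames.append(path.name)
    return filenames, encoded


# Demonstration with placeholder bytes in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "b.png").write_bytes(b"fake-png")
    Path(tmp, "a.jpeg").write_bytes(b"fake-jpeg")
    Path(tmp, "notes.txt").write_text("ignored")
    names, data = encode_images(tmp)

print(names)
```

Decoding an entry with base64.b64decode returns the original bytes, which is a quick way to confirm that the encoding step lost nothing.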
Step 7. Categorize input with the Vision model¶
Now that we have loaded and encoded our images, we can query the Vision model. Our prompt spells out the desired output format to limit the model's creativity, as we need valid JSON output. We will store the model's response for each image, containing the description, category and occasion, in a list called image_descriptions.
user_query = """Provide a description, category, and occasion for the clothing item or shoes in this image.
Classify the category as shirt, pants, or shoes.
Classify the occasion as casual or formal.
Ensure the output is valid JSON. Do not create new categories or occasions. Only use the allowed classifications.
Your response should be in this schema:
{
"description": "<description>",
"category": "<category>",
"occasion": "<occasion>"
}
"""
image_descriptions = []
for image in images:
    message = augment_api_request_body(user_query, image)
    response = model.chat(messages=message)
    result = response['choices'][0]['message']['content']
    print(result)
    image_descriptions.append(result)
{ "description": "A pair of polished brown leather dress shoes with a brogue detailing on the toe box and a classic oxford design.", "category": "shoes", "occasion": "formal" }
{ "description": "A pair of checkered trousers with a houndstooth pattern, featuring a zippered pocket and a button closure at the waist.", "category": "pants", "occasion": "casual" }
{ "description": "A light blue, button-up shirt with a smooth texture and a classic collar, suitable for casual to semi-formal occasions.", "category": "shirt", "occasion": "casual" }
{ "description": "A pair of khaki pants with a buttoned waistband and a button closure at the front.", "category": "pants", "occasion": "casual" }
{ "description": "A blue plaid shirt with a collar and long sleeves, featuring chest pockets and a button-up front.", "category": "shirt", "occasion": "casual" }
{ "description": "A pair of bright orange, short-sleeved t-shirts with a crew neck and a simple design.", "category": "shirt", "occasion": "casual" }
{ "description": "A pair of blue suede sneakers with white laces and perforations, suitable for casual wear.", "category": "shoes", "occasion": "casual" }
{ "description": "A pair of red canvas sneakers with white laces, isolated on a white background.", "category": "shoes", "occasion": "casual" }
{ "description": "A pair of grey dress pants with a smooth texture and a classic design, suitable for formal occasions.", "category": "pants", "occasion": "formal" }
{ "description": "A plain white T-shirt with short sleeves and a crew neck, displayed from the front and back.", "category": "shirt", "occasion": "casual" }
{ "description": "A black short-sleeved t-shirt with a crew neck and a simple design.", "category": "shirt", "occasion": "casual" }
{ "description": "Black pants with a zippered pocket and a buttoned fly, showing the waistband and pocket details.", "category": "pants", "occasion": "casual" }
{ "description": "A pair of tan leather boots with a chunky sole and a high-top design, suitable for casual wear.", "category": "shoes", "occasion": "casual" }
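The responses above happen to be clean JSON, but a model may occasionally wrap the object in extra text or use a label outside the allowed classifications. A defensive parsing step could look like the following sketch. The `parse_item` helper is hypothetical; the allowed sets are taken from the prompt in this step, and returning None for invalid responses is an illustrative design choice.

```python
import json
import re

# Allowed labels, taken from the prompt in this step.
ALLOWED_CATEGORIES = {"shirt", "pants", "shoes"}
ALLOWED_OCCASIONS = {"casual", "formal"}


def parse_item(raw_response):
    """Extract and validate the JSON object in a model response.

    Returns the parsed dict, or None if no valid object is found.
    """
    # Take everything from the first "{" to the last "}".
    match = re.search(r"\{.*\}", raw_response, flags=re.DOTALL)
    if match is None:
        return None
    try:
        item = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if item.get("category") not in ALLOWED_CATEGORIES:
        return None
    if item.get("occasion") not in ALLOWED_OCCASIONS:
        return None
    return item


sample = 'Sure! {"description": "A plain white T-shirt.", "category": "shirt", "occasion": "casual"} Hope this helps.'
print(parse_item(sample))
```

Items that come back as None could be retried or skipped before the closet is assembled.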
Step 8. Generate outfits with the reasoning model¶
Now that we have each clothing and shoe item categorized, it will be much easier for the reasoning model to generate an outfit for the selected occasion. Let's instantiate and query the reasoning model.
reasoning_model = ModelInference(
    model_id="ibm/granite-3-2-8b-instruct",
    credentials=credentials,
    project_id=WATSONX_PROJECT_ID
)
To align the filenames with the image descriptions, we can enumerate the list of image descriptions and create a list of dictionaries in which we store the description, category, occasion and filename of each item in the respective fields.
# Add filenames to the image descriptions
closet = []
for i, desc in enumerate(image_descriptions):
    desc_dict = json.loads(desc)
    desc_dict['filename'] = filenames[i]
    image_descriptions[i] = json.dumps(desc_dict)
closet = [json.loads(js) for js in image_descriptions]
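The same pairing of descriptions and filenames can also be done in a single pass with zip, without serializing each item back to JSON. This is an equivalent sketch rather than the tutorial's code; the `build_closet` function name and the sample data are made up for illustration.

```python
import json


def build_closet(image_descriptions, filenames):
    """Parse each JSON description and attach the matching filename."""
    closet = []
    for raw, filename in zip(image_descriptions, filenames):
        item = json.loads(raw)
        item["filename"] = filename
        closet.append(item)
    return closet


# Sample data for illustration only.
descriptions = [
    '{"description": "Khaki pants.", "category": "pants", "occasion": "casual"}',
    '{"description": "A blue plaid shirt.", "category": "shirt", "occasion": "casual"}',
]
closet = build_closet(descriptions, ["image7.jpeg", "image13.jpeg"])
print(closet)
```

Unlike the loop above, this variant leaves the original list of descriptions unchanged.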
Now, let's query the Granite 3.2 model with reasoning to produce an outfit for our specified criteria using the closet list.
occasion = input("Enter the occasion") #casual or formal (e.g. "casual")
time_of_day = input("Enter the time of day") #morning, afternoon or evening (e.g. "morning")
location = input("Enter the location") #any location (e.g. "park")
season = input("Enter the season") #spring, summer, fall or winter (e.g. "fall")
prompt = f"""Use the description, category, and occasion of the clothes in my closet to put together an outfit for a {occasion} {time_of_day} at the {location}.
The event takes place in the {season} season. Make sure to return only one shirt, bottoms, and shoes.
Use the description, category, and occasion provided. Do not classify the items yourself.
Include the file name of each image in your output along with the file extension. Here are the items in my closet: {closet}"""
messages = [
    {
        "role": "control",
        "content": "thinking"
    },
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": f"{prompt}"
            }
        ]
    }
]
outfit = reasoning_model.chat(messages=messages)['choices'][0]['message']['content']
print(outfit)
Here is my thought process:
- The outfit needs to be suitable for a casual morning at the park during fall.
- I will select one shirt, one pair of pants, and one pair of shoes that fit the 'casual' occasion category.
- I will avoid formal or overly dressy items and choose items that are comfortable for park activities.
Here is my response:
For a casual morning at the park in fall, I suggest the following outfit:
1. **Shirt**: A blue plaid shirt with a collar and long sleeves (file: 'image13.jpeg') - The plaid pattern is classic for fall and goes well with casual park settings. The long sleeves offer some protection against cooler morning temperatures.
2. **Pants**: Khaki pants with a buttoned waistband and a button closure at the front (file: 'image7.jpeg') - Khaki is a versatile choice that can match the casual vibe and also provide a nice balance with the plaid shirt. It's practical and comfortable for walking around.
3. **Shoes**: A pair of tan leather boots with a chunky sole and high-top design (file: 'image3.jpeg') - Tan leather boots offer a stylish yet comfortable option. The chunky sole provides good grip and support, ideal for navigating park trails or uneven ground.
This combination provides a relaxed, put-together look suitable for a casual morning outing, while also considering comfort and practicality.
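In the output above, the model's reasoning and its final answer are introduced by the phrases "Here is my thought process:" and "Here is my response:". If you only want to show the recommendation to the user, the text can be split on those phrases. The sketch below assumes the phrases appear exactly as in this output, which may not hold for every response; the `split_reasoning` helper is hypothetical.

```python
# Marker phrases copied from the output shown in this step.
THOUGHT_MARKER = "Here is my thought process:"
RESPONSE_MARKER = "Here is my response:"


def split_reasoning(text):
    """Split model output into (thought_process, response)."""
    before, marker, after = text.partition(RESPONSE_MARKER)
    if not marker:
        # No response marker found: treat the whole text as the response.
        return "", text.strip()
    thoughts = before.replace(THOUGHT_MARKER, "", 1).strip()
    return thoughts, after.strip()


sample = "Here is my thought process: pick casual items. Here is my response: Wear image7.jpeg."
thoughts, response = split_reasoning(sample)
print(response)
```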
With this generated outfit description, we can also display the clothing items that the model recommends! To do so, we look for each filename in the model's response. In case the model mentions the same filename twice, we check whether the image has already been selected as we iterate over the closet. We do so by storing the selected images in the selected_items list. Finally, we display the selected items.
selected_items = []
# extract the images of clothing that the model recommends
for item, uploaded_file in zip(closet, images):
    if item['filename'].lower() in outfit.lower() and not any(key['filename'] == item['filename'] for key in selected_items):
        selected_items.append({
            'image': uploaded_file,
            'category': item['category'],
            'filename': item['filename']
        })
# display the selected clothing items
if len(selected_items) > 0:
    for item in selected_items:
        display(Image.open(directory + '/' + item['filename']))
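The substring check above works for the sample filenames, but it could produce a false match if one filename is contained in another (for example, a file named 3.jpeg and another named 13.jpeg). A stricter alternative is to extract whole filename tokens from the response and compare them exactly. This is a sketch under assumptions: the `extract_filenames` helper is hypothetical, and the pattern only covers names made of letters, digits, underscores and hyphens with a .jpeg, .jpg or .png extension.

```python
import re

# Matches tokens such as image13.jpeg; names with spaces or dots are not covered.
FILENAME_PATTERN = re.compile(r"[\w-]+\.(?:jpeg|jpg|png)", flags=re.IGNORECASE)


def extract_filenames(outfit_text, known_filenames):
    """Return known filenames mentioned in outfit_text, in order, without duplicates."""
    known = {name.lower(): name for name in known_filenames}
    selected = []
    for token in FILENAME_PATTERN.findall(outfit_text):
        name = known.get(token.lower())
        if name is not None and name not in selected:
            selected.append(name)
    return selected


text = "Shirt (file: 'image13.jpeg'), pants (file: 'image7.jpeg'), and again image13.jpeg."
print(extract_filenames(text, ["3.jpeg", "image3.jpeg", "image7.jpeg", "image13.jpeg"]))
```

The returned list preserves the order in which the model mentions each item, which can be convenient when displaying the outfit.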
Conclusion¶
In this tutorial, you built a system that uses AI to provide style advice tailored to a user's specific event. Using photos or screenshots of the user's clothing, the system customizes outfits to meet the specified criteria. The granite-vision-3-2-2b model was critical for labeling and categorizing each item. Additionally, the granite-3-2-8b-instruct model leveraged its reasoning capabilities to generate personalized outfit ideas. Some next steps for building on this application include:
- Customizing outfits to a user's personal style, body type, preferred color palette and more.
- Broadening the criteria to include jackets and accessories.
  - For example, the system might propose a blazer for a user attending a formal conference in addition to the selected shirt, pants and shoes.
- Serving as a personal shopper by providing e-commerce product recommendations and pricing that align with the user's unique style and budget.
- Adding chatbot functionality to ask the LLM questions about each outfit.
- Providing a virtual try-on experience that uses a user selfie to simulate the final look.