Graphics and Multimodal LLM
3.1 Continue using the running Open-WebUI in your browser at:
3.2 Pick the vision model to use, for example:
granite3.2-vision:2b
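If granite3.2-vision:2b does not appear in the model picker, it may not be installed yet. The sketch below is one way to check for it and pull it, assuming Open-WebUI is backed by a local Ollama server at its default address (http://localhost:11434); adjust the address if your setup differs.

```python
# Check whether the vision model is available to the local Ollama server that
# Open-WebUI is assumed to be using, and pull it if it is missing.
import requests

OLLAMA_URL = "http://localhost:11434"   # assumed default Ollama address
MODEL = "granite3.2-vision:2b"

# List the models Ollama currently has installed.
tags = requests.get(f"{OLLAMA_URL}/api/tags", timeout=10).json()
installed = [m["name"] for m in tags.get("models", [])]

if MODEL in installed:
    print(f"{MODEL} is installed and selectable in Open-WebUI.")
else:
    # Pull the model; Ollama streams download progress as JSON lines.
    with requests.post(f"{OLLAMA_URL}/api/pull",
                       json={"model": MODEL}, stream=True, timeout=None) as resp:
        for line in resp.iter_lines():
            if line:
                print(line.decode())
```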
3.3 This small vision model is multi-modal.
Multi-modal means the model has been trained on multiple modalities of data, such as text, images, audio, and video. While the primary use of the vision model is to understand visual content, it has also been trained on language datasets, so you can ask it questions and give it commands and it will still respond like a small language model.
For example, you can ask a few simple questions:
Why would hydrogen gas not be preferred in balloons?
Create an easy apple pie recipe for the upcoming holiday.
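Open-WebUI handles this for you, but if you'd like to see the text-only behaviour outside the browser, here is a minimal sketch that sends one of the questions above straight to the model, again assuming the default local Ollama endpoint:

```python
# Ask the vision model a text-only question through Ollama's chat API.
import requests

OLLAMA_URL = "http://localhost:11434"   # assumed default Ollama address
MODEL = "granite3.2-vision:2b"

resp = requests.post(
    f"{OLLAMA_URL}/api/chat",
    json={
        "model": MODEL,
        "messages": [
            {"role": "user",
             "content": "Why would hydrogen gas not be preferred in balloons?"}
        ],
        "stream": False,   # return one complete JSON response
    },
    timeout=300,
)
print(resp.json()["message"]["content"])
```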
3.4 Clear the context by clicking on "New Chat"
3.5 Using another browser tab, search for an image. For example:
Kangaroo
3.6 In the search results, click the "Images" tab, right-click a photo you like, and select "Copy Image"
3.7 Back in the Open-WebUI browser tab, press CTRL-V to paste the copied image into the chat and press Enter. The Granite vision model will summarize what it sees in the image.
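Pasting into Open-WebUI is the easiest path, but the same step can be reproduced against the assumed Ollama endpoint directly. The sketch below sends a local copy of the image (kangaroo.jpg is a placeholder file name) to the vision model and prints its description:

```python
# Send an image to the vision model; Ollama expects images as base64 strings.
import base64
import requests

OLLAMA_URL = "http://localhost:11434"   # assumed default Ollama address
MODEL = "granite3.2-vision:2b"

# Read the saved image and encode it as base64 for the "images" field.
with open("kangaroo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    f"{OLLAMA_URL}/api/chat",
    json={
        "model": MODEL,
        "messages": [
            {"role": "user",
             "content": "Describe what you see in this image.",
             "images": [image_b64]}
        ],
        "stream": False,
    },
    timeout=300,
)
print(resp.json()["message"]["content"])
```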
3.8 Go ahead and ask the AI more questions about the animal in the image. For example:
What is the lifespan of the animal in the photo?
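In Open-WebUI the chat history is kept for you, so follow-up questions about the pasted image just work. If you are calling the model directly, you carry that context yourself by resending the earlier messages, as in this sketch (same assumed endpoint and placeholder file name):

```python
# Multi-turn follow-up: resend the running message history so the model can
# answer a new question about the image it was shown earlier.
import base64
import requests

OLLAMA_URL = "http://localhost:11434"   # assumed default Ollama address
MODEL = "granite3.2-vision:2b"

def chat(messages):
    """Send the message history to the model and return its reply message."""
    resp = requests.post(
        f"{OLLAMA_URL}/api/chat",
        json={"model": MODEL, "messages": messages, "stream": False},
        timeout=300,
    )
    return resp.json()["message"]

with open("kangaroo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

history = [{"role": "user",
            "content": "Describe what you see in this image.",
            "images": [image_b64]}]
history.append(chat(history))          # model's description of the image

history.append({"role": "user",
                "content": "What is the lifespan of the animal in the photo?"})
print(chat(history)["content"])        # answer draws on the earlier image context
```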
3.9 Keep learning!
If you have additional time and interest, you can continue to try out what you've learned about using Open Source AI tools to run your own local AI system.
Have fun!