Auto assist questionnaire
Auto-fill Questionnaire using Chain of Thought or Few-Shot Examples¶
This notebook showcases the application of few-shot examples in autofilling questionnaires. It utilizes a json file (cot_examples.json
) to
provide the LLM with example responses for some use-cases.
By leveraging these few-shot examples, we can enable seamless completion of lengthy questionnaires, minimizing manual effort and improving overall efficiency.
from risk_atlas_nexus.blocks.inference import (
RITSInferenceEngine,
WMLInferenceEngine,
OllamaInferenceEngine,
VLLMInferenceEngine,
)
from risk_atlas_nexus.blocks.inference.params import (
InferenceEngineCredentials,
RITSInferenceEngineParams,
WMLInferenceEngineParams,
OllamaInferenceEngineParams,
VLLMInferenceEngineParams,
)
from copy import deepcopy
from risk_atlas_nexus.library import RiskAtlasNexus
/Users/dhaval/Projects/Usage-Governance/risk-atlas-nexus/src/risk_atlas_nexus/toolkit/job_utils.py:2: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html from tqdm.autonotebook import tqdm
Risk Atlas Nexus uses Large Language Models (LLMs) to infer risks dimensions. Therefore requires access to LLMs to inference or call the model.¶
Available Inference Engines: WML, Ollama, vLLM, RITS. Please follow the Inference APIs guide before going ahead.
Note: RITS is intended solely for internal IBM use and requires TUNNELALL VPN for access.
inference_engine = WMLInferenceEngine(
model_name_or_path="ibm/granite-20b-code-instruct",
credentials={
"api_key": "WML_API_KEY",
"api_url": "WML_API_URL",
"project_id": "WML_PROJECT_ID",
},
parameters=WMLInferenceEngineParams(
max_new_tokens=1000, decoding_method="greedy", repetition_penalty=1
),
)
# inference_engine = OllamaInferenceEngine(
# model_name_or_path="hf.co/ibm-granite/granite-20b-code-instruct-8k-GGUF",
# credentials=InferenceEngineCredentials(api_url="OLLAMA_API_URL"),
# parameters=OllamaInferenceEngineParams(
# num_predict=1000, num_ctx=8192, temperature=0.7, repeat_penalty=1
# ),
# )
# inference_engine = VLLMInferenceEngine(
# model_name_or_path="ibm-granite/granite-3.1-8b-instruct",
# credentials=InferenceEngineCredentials(
# api_url="VLLM_API_URL", api_key="VLLM_API_KEY"
# ),
# parameters=VLLMInferenceEngineParams(max_tokens=1000, temperature=0.7),
# )
# inference_engine = RITSInferenceEngine(
# model_name_or_path="ibm-granite/granite-3.1-8b-instruct",
# credentials={
# "api_key": "RITS_API_KEY",
# "api_url": "RITS_API_URL",
# },
# parameters=RITSInferenceEngineParams(max_tokens=1000, temperature=0.7),
# )
[2025-03-20 10:47:11:888] - INFO - RiskAtlasNexus - Created WML inference engine.
Create an instance of RiskAtlasNexus¶
Note: (Optional) You can specify your own directory in RiskAtlasNexus(base_dir=<PATH>)
to utilize custom AI ontologies. If left blank, the system will use the provided AI ontologies.
risk_atlas_nexus = RiskAtlasNexus()
[2025-03-20 10:47:11:960] - INFO - RiskAtlasNexus - Created RiskAtlasNexus instance. Base_dir: None
Defining Few-Shot Examples for Auto-Assist Functionality¶
This cell showcases the template used in risk_atlas_nexus/data/templates/cot_examples.json
to provide few-shot
examples for auto-assist functionality.
Template Structure:
- Each question is associated with a list of example intents and corresponding answers.
- The format is:
Question
- intents:
(list of example intents)- answers:
(list of corresponding answers for the respective intents above)
In this notebook, we're using a simplified template to cover 4 questions from the Airo questionnaire:
- System environment
- Intended domain
- Utilized techniques
- Targeted user group
Customization:
To adapt this auto-assist functionality to custom questionnaires, users
need to provide their own set of questions, example intents, and
corresponding answers in a json file (e.g., cot_examples.json
). This will enable
the LLM to learn from these few-shot examples and generate responses for
unseen queries.
import json
import os
from risk_atlas_nexus.data import get_templates_path
with open(os.path.join(get_templates_path(), "cot_examples.json")) as f:
cot_data = json.load(f)
print(cot_data)
[{'question': 'In which environment is the system used?', 'examples': {'answers': ["Insurance Claims Processing or Risk Management or Data Analytics \nExplanation: 1. Insurance Claims Processing: The system might be used by an insurance company's claims processing department to analyze and identify patterns in healthcare insurance claims. 2. Risk Management: The system could be applied in risk management teams to detect potential risks and opportunities for cost savings. 3. Data Analytics: The system might be used by a data analytics team within the healthcare insurance company to identify patterns in claims data, helping to inform business decisions.", 'Municipal Waste Management or Private Waste Management departments \nExplanation: Waste management companies need to efficiently collect and process waste while minimizing costs and environmental impact. By using generative AI to analyze historical data on waste generation, recycling rates, and resource utilization, they can optimize collection routes, reduce fuel consumption, and lower labor costs.', 'Distribution Centers or Retail Chains\nExplanation: 1. Distribution Centers: Companies that manage large-scale distribution centers could utilize this system to optimize inventory levels and improve supply chain efficiency. 2. Retail Chains: Large retail chains with complex supply chains might benefit from using this application to summarize and analyze data, identify trends, and make informed decisions.', "Treasury Departments or Asset Management Divisions or Private Banking Units \nExplanation: 1. Treasury Departments: The system might be used by investment banks' treasury departments to manage cash flows, optimize inventory levels, and streamline procurement processes. 2. Asset Management Divisions: Investment banks with asset management divisions could utilize this application to optimize supply chain management for their fund operations. 3. Private Banking Units: Private banking units within investment banks might benefit from using this system to manage wealth management services and optimize supply chain operations.", 'Pharmacologists or Toxicologists or Clinical Trial Managers \nExplanation: Experts that specialize in developing new treatments for various diseases might utilize this application to accelerate their research pipeline or to streamline the drug development process.'], 'intents': ['Find patterns in healthcare insurance claims', 'Generative AI can optimize waste management processes. By analyzing historical data on waste generation, recycling rates, and resource utilization, generative models can generate optimized waste collection routes, recommend recycling strategies, and predict waste generation patterns. This helps in reducing waste generation, optimizing resource allocation, and promoting circular economy practices.', 'Summarize and analyze historic data, industry patterns, create multi-layered models that identify changes as occur, increase supply chain efficiency and optimize inventory levels', 'optimize supply chain management in Investment banks', "In the context of drug repurposing, generative AI can be employed to analyze vast databases of existing drugs and their clinical trials data. By identifying patterns and similarities, the AI can suggest potential new therapeutic indications for existing drugs, based on the chemical structure and pharmacological properties of the APIs. This process can help streamline the drug development pipeline, as it would reduce the need for time-consuming and expensive clinical trials for new indications. For instance, a drug like Atorvastatin, which is currently used to lower cholesterol, could be repurposed for the treatment of diabetic nephropathy, a kidney disease, based on the AI's analysis of similar drugs and their clinical data. This would not only save resources but also provide new treatment options for patients suffering from this debilitating condition. "]}}, {'question': 'What domain is the system intended to be used in? Code/software engineering, Communications, Customer service/support, IT/business automation, Writing assistant, Financial, Technical, Product, Information retrieval, Marketing, Security, User Research, Strategy, Sales, Risk and Compliance, Design, Cybersecurity, Customer service/support & Healthcare, Talent and Organization including HR, Other', 'examples': {'answers': ['Other \nExplanation: Since finding patterns from documents does not fall under any of the categories mentioned ', 'Strategy \nExplanation: Since this task is to improve waste management processes using historical data', 'Strategy \nExplanation: Since this task is to improve supply chain efficiency using trends and patterns', 'Customer service/support \nExplanation: Since the task relates to human conversations or generating human converstations or support. Writing assitant \nExplanation: Since this helps in improving the quality of text. It is not customer service since this on on the quality of text rather than helping in human conversations.', 'Strategy \nExplanation: Since the task is involved in improving the processes to ensure better performance. It is not finance since the task is on supply chain optimization and not on financial aspects even though the application domain is banks.', 'Healthcare and strategy \nExplanation: Since the task is related to healthcare and drug repurposing, which involves analyzing data related to drugs and their clinical trials, this falls under the healthcare domain. It also involves Strategy it talks about using patterns to create new treatment options.'], 'intents': ['Find patterns in healthcare insurance claims', 'Generative AI can optimize waste management processes. By analyzing historical data on waste generation, recycling rates, and resource utilization, generative models can generate optimized waste collection routes, recommend recycling strategies, and predict waste generation patterns. This helps in reducing waste generation, optimizing resource allocation, and promoting circular economy practices.', 'Summarize and analyze historic data, industry patterns, create multi-layered models that identify changes as occur, increase supply chain efficiency and optimize inventory levels', 'Ability to create dialog flows and integrations from natural language instructionscheck if a document has grammatical mistakes', 'optimize supply chain management in Investment banks', "In the context of drug repurposing, generative AI can be employed to analyze vast databases of existing drugs and their clinical trials data. By identifying patterns and similarities, the AI can suggest potential new therapeutic indications for existing drugs, based on the chemical structure and pharmacological properties of the APIs. This process can help streamline the drug development pipeline, as it would reduce the need for time-consuming and expensive clinical trials for new indications. For instance, a drug like Atorvastatin, which is currently used to lower cholesterol, could be repurposed for the treatment of diabetic nephropathy, a kidney disease, based on the AI's analysis of similar drugs and their clinical data. This would not only save resources but also provide new treatment options for patients suffering from this debilitating condition. "]}}, {'question': 'What techniques are utilised in the system? Multi-modal: {Document Question/Answering, Image and text to text, Video and text to text, visual question answering}, Natural language processing: {feature extraction, fill mask, question answering, sentence similarity, summarization, table question answering, text classification, text generation, token classification, translation, zero shot classification}, computer vision: {image classification, image segmentation, text to image, object detection}, audio:{audio classification, audio to audio, text to speech}, tabular: {tabular classification, tabular regression}, reinforcement learning', 'examples': {'answers': ['Natural language processing: text classification \nExplanation: Health insurance claims can be find patterns or to classify claims into categories (e.g., diagnosis, procedure, treatment).', "Natural language processing: text generation and reinforcement learning \nExplanation: Text generation techniques are used to generate reports, recommendations, and insights from the analysis of historical data. and Reinforcement learning algorithms can be employed to optimize the generative model's performance in real-time, based on feedback from the environment (e.g., changes in waste generation rates).", "Natural Language Processing: Text Classification and Token Classification \nExplanation: Specifically, Text Classification is used to classify the input text into categories such as 'grammatically correct' or 'contains grammatical errors' Techniques like token classification (also known as part-of-speech tagging) are employed to analyze the document's structure and identify potential grammatical mistakes"], 'intents': ['Find patterns in healthcare insurance claims', 'Generative AI can optimize waste management processes. By analyzing historical data on waste generation, recycling rates, and resource utilization, generative models can generate optimized waste collection routes, recommend recycling strategies, and predict waste generation patterns. This helps in reducing waste generation, optimizing resource allocation, and promoting circular economy practices.', 'Ability to create dialog flows and integrations from natural language instructionscheck if a document has grammatical mistakes']}}, {'question': 'Who is the intended user of the system?', 'examples': {'answers': ['Insurance companies, government agencies, or other organizations responsible for reimbursing healthcare providers \nExplanation: Healthcare payers need to efficiently process and reimburse claims while minimizing errors and disputes. By identifying patterns in claims data, they can automate routine tasks, detect potential errors or anomalies, and improve overall payment accuracy', 'Waste Management Companies \nExplanation: Waste management companies need to efficiently collect and process waste while minimizing costs and environmental impact. By using generative AI to analyze historical data on waste generation, recycling rates, and resource utilization, they can optimize collection routes, reduce fuel consumption, and lower labor costs.', 'Supply Chain companies\nExplanation: Supply chain managers need to efficiently manage inventory levels, shipping routes, and logistics operations while minimizing costs and maximizing customer satisfaction. By using the system to summarize and analyze historic data, industry patterns, and create multi-layered models that identify changes as they occur, supply chain managers can gain insights into optimal inventory levels, efficient shipping routes, and streamlined logistics processes', 'Investment banks or asset management banks \nExplanation: have complex supply chains that involve the movement of goods, services, and financial instruments. By using the system to optimize supply chain management, financial institutions can minimize costs, maximize efficiency, and reduce risks associated with inventory levels, shipping routes, and logistics operations', 'Pharmaceutical Companies or Biotechnology Companies \nExplanation: companies have a vested interest in optimizing their product portfolios and reducing development costs. By using the system to analyze vast databases of existing drugs and clinical trials data, pharmaceutical companies can quickly identify potential new indications for existing medications, thereby streamlining the drug development pipeline'], 'intents': ['Find patterns in healthcare insurance claims', 'Generative AI can optimize waste management processes. By analyzing historical data on waste generation, recycling rates, and resource utilization, generative models can generate optimized waste collection routes, recommend recycling strategies, and predict waste generation patterns. This helps in reducing waste generation, optimizing resource allocation, and promoting circular economy practices.', 'Summarize and analyze historic data, industry patterns, create multi-layered models that identify changes as occur, increase supply chain efficiency and optimize inventory levels', 'optimize supply chain management in Investment banks', "In the context of drug repurposing, generative AI can be employed to analyze vast databases of existing drugs and their clinical trials data. By identifying patterns and similarities, the AI can suggest potential new therapeutic indications for existing drugs, based on the chemical structure and pharmacological properties of the APIs. This process can help streamline the drug development pipeline, as it would reduce the need for time-consuming and expensive clinical trials for new indications. For instance, a drug like Atorvastatin, which is currently used to lower cholesterol, could be repurposed for the treatment of diabetic nephropathy, a kidney disease, based on the AI's analysis of similar drugs and their clinical data. This would not only save resources but also provide new treatment options for patients suffering from this debilitating condition. "]}}, {'question': 'What is the intended purpose of the system?', 'examples': {'answers': ['To create a centralized monitoring system that allows executives to visualize key sales metrics, identify trends, and make data-driven decisions about resource allocation and business strategy.', 'To design an automated HR workflow that streamlines the employee onboarding experience, reducing administrative burden and improving the overall efficiency of the organization.', 'To develop a predictive analytics system that identifies high-value potential customers and provides actionable insights for sales teams, enabling them to prioritize leads and optimize marketing efforts.', 'To create a virtual replica of the physical manufacturing environment, allowing operations teams to analyze and optimize production workflows, reduce waste, and improve overall efficiency and quality'], 'intents': ['Provide a dashboard to track sales performance by region and product line', 'Automate the process of onboarding new employees, including generating necessary documents and assigning benefits', 'Create a recommendation engine to suggest potential clients based on our existing customer base and sales history', 'Develop a digital twin of our manufacturing process to simulate production schedules, equipment usage, and material requirements']}}, {'question': 'What is the application of the system?', 'examples': {'answers': ['Natural Language Processing (NLP): Analyze sales data to identify trends, patterns, and correlations between regions, products, and sales metrics.\n Machine Learning (ML): Develop predictive models to forecast future sales performance based on historical data and seasonal fluctuations.\n Computer Vision: Integrate with CRM systems to visualize sales pipelines, track customer interactions, and monitor pipeline health.', 'Robotic Process Automation (RPA): Automate manual tasks such as generating new hire paperwork, updating HR systems, and scheduling interviews. \n Document Generation: Use AI-powered document generation tools to create customized new hire packets, including contracts, benefits information, and employee handbooks. Predictive Analytics: Analyze new hire data to identify potential risks or challenges and provide personalized recommendations for onboarding.', 'Collaborative Filtering: Analyze customer behavior and preferences to identify patterns and correlations that can inform product recommendations.\n Knowledge Graph: Build a graph-based knowledge model of customers, products, and services to enable more accurate matchings.\n Text Analysis: Use NLP to analyze customer feedback and reviews to identify sentiment trends and improve product development.', 'Simulation Software: Develop AI-powered simulation models that can predict production outcomes under various scenarios, including equipment failures or material shortages.\n Predictive Maintenance: Analyze sensor data from equipment to predict maintenance needs and optimize scheduling. \n Optimization Engine: Use machine learning algorithms to optimize production processes and minimize waste.'], 'intents': ['Provide a dashboard to track sales performance by region and product line', 'Automate the process of onboarding new employees, including generating necessary documents and assigning benefits', 'Create a recommendation engine to suggest potential clients based on our existing customer base and sales history', 'Develop a digital twin of our manufacturing process to simulate production schedules, equipment usage, and material requirements']}}, {'question': 'What is the subject of the system?', 'examples': {'answers': ['Employees.The system would need to track and analyze data on employee productivity, engagement, and performance across various locations.', 'Business Customers. The system would need to manage data on business customer interactions, including: Customer contact information and preferences, Purchase history and payment records, Invoicing and payment tracking', 'Corporate Travelers. The system would need to collect and analyze data on the preferences and behavior of high-value corporate travelers, including: Customer flight information and itineraries, Hotel reservations and preferences, meal choices and dining experiences', 'Equipment Owners. The system would need to manage data on the condition and usage of critical equipment, including: Equipment serial numbers and models, Maintenance history and records, Sensor data and sensor readings'], 'intents': ['Provide real-time visibility into employee productivity and engagement across multiple locations', 'Automate invoicing and payment processing for large B2B customers', 'Create a personalized travel experience for high-value corporate clients', 'Develop a predictive maintenance model for critical equipment across multiple industries']}}]
There are two ways to use the inference engine to get the LLM outputs. generate_zero_shot_output
which gives the zero-shot output for the question and generate_few_shot_output
which gives the output using few-shot examples defined above.
questions = []
for data in deepcopy(cot_data):
questions.append(data["question"])
usecase = "Generate personalized, relevant responses, recommendations, and summaries of claims for customers to support agents to enhance their interactions with customers."
results = risk_atlas_nexus.generate_zero_shot_output(
inference_engine, usecase, questions
)
results
['Environment: The system is used in a call center environment to support customer service agents in their interactions with customers.', 'The system is intended to be used in the domain of customer service/support.', 'Multi-modal: {Document Question/Answering, Image and text to text, Video and text to text, visual question answering}, Natural language processing: {feature extraction, fill mask, question answering, sentence similarity, summarization, table question answering, text classification, text generation, token classification, translation, zero shot classification}, computer vision: {image classification, image segmentation, text to image, object detection}, audio:{audio classification, audio to audio, text to speech}, tabular: {tabular classification, tabular regression}, reinforcement learning', 'The intended user of the system is the customer support agent who needs to generate personalized, relevant responses, recommendations, and summaries of claims for customers to enhance their interactions with customers.', 'The intended purpose of the system is to provide personalized, relevant responses, recommendations, and summaries of claims for customers to support agents to enhance their interactions with customers.', 'The application of the system is to generate personalized, relevant responses, recommendations, and summaries of claims for customers to support agents to enhance their interactions with customers.', 'The subject of the system is customer support.']
cot_data_with_examples = deepcopy(cot_data)
usecase = "Generate personalized, relevant responses, recommendations, and summaries of claims for customers to support agents to enhance their interactions with customers."
results = risk_atlas_nexus.generate_few_shot_output(
inference_engine, usecase, cot_data_with_examples
)
results
Inferring with OLLAMA: 100%|██████████| 7/7 [00:36<00:00, 5.27s/it]
['[Given: Generate personalized, relevant responses, recommendations, and summaries of claims for customers to support agents to enhance their interactions with customers.] [Question: In which environment is the system used?] Output: [Insurance Claims Processing or Customer Service or Call Center Operations] Explanation: 1. Insurance Claims Processing: The system might be used by insurance companies to analyze and process claims, providing personalized responses and recommendations to customers. 2. Customer Service: The system could be applied in customer service teams to', 'Based on the examples provided, I will attempt to categorize the given task into the corresponding domain. Example 1: [Given: Generate personalized, relevant responses, recommendations, and summaries of claims for customers to support agents to enhance their interactions with customers.] [Question: What domain is the system intended to be used in? ...] Output: [Customer service/support] Explanation: Since the task involves generating responses and summaries to support agents in their interactions with customers, it falls under the domain of Customer', 'Here are the answers to the question "What techniques are utilised in the system?" for the given examples: 1. [Given: Find patterns in healthcare insurance claims] Output: [Natural language processing: text classification Explanation: Health insurance claims can be classified into categories (e.g., diagnosis, procedure, treatment).] 2. [Given: Generative AI can optimize waste management processes. By analyzing historical data on waste generation, recycling rates, and resource utilization, generative models can generate optimized', '[Given: Generate personalized, relevant responses, recommendations, and summaries of claims for customers to support agents to enhance their interactions with customers.] [Question: Who is the intended user of the system?] Output: [Customer Service Representatives or Insurance Agents Explanation: Customer service representatives and insurance agents need to provide accurate and relevant information to customers to resolve claims and provide excellent customer service. By using the system to generate personalized responses, recommendations, and summaries of claims, customer service representatives can enhance their interactions with', '[Given: Generate personalized, relevant responses, recommendations, and summaries of claims for customers to support agents to enhance their interactions with customers.] [Question: What is the intended purpose of the system?] Output: [To create an intelligent customer interaction platform that empowers agents to deliver exceptional customer experiences by providing them with actionable insights, personalized content, and automated support, ultimately driving increased customer satisfaction, loyalty, and revenue growth.]', '[Given: Generate personalized, relevant responses, recommendations, and summaries of claims for customers to support agents to enhance their interactions with customers.] [Question: What is the application of the system?] Output: 1. Natural Language Processing (NLP): Analyze customer data and sentiment to generate personalized responses and recommendations for agents. 2. Machine Learning (ML): Develop predictive models to forecast customer needs and preferences, enabling agents to provide more accurate and relevant support. 3. Text Analysis: Use NLP', '[Given: Generate personalized, relevant responses, recommendations, and summaries of claims for customers to support agents to enhance their interactions with customers.] [Question: What is the subject of the system?] Output: [Customers. The system would need to collect and analyze data on customer claims, including: Claim details and history, Customer preferences and communication records, Agent interactions and response data.] Alternatively, a more specific output could be: [Customers. The system would need to collect and analyze data on individual customer']