IBM Research

View the Project on GitHub IBM/customized-voice-text-bot-for-whatsapp-telegram


Learn more about

Deploying to IBM Cloud and others
  Telegram
  WhatsApp
Running locally
  Telegram
  WhatsApp
Setup Watson Services
Setup Data Storage
Customizing Text-to-Speech
Customizing Speech-to-Text


Motivation

Small business owners (SBOs) face several challenges when applying for micro-credit loans from financial institutions. Common difficulties include low credit scores, unbanked status, outstanding debts, informal employment, inability to demonstrate repayment capacity, and lack of a financial guarantor. Moreover, SBOs often find it hard to apply for micro-credit loans because of bureaucracy, documentation requirements, and lack of information on how to proceed. That is why banks and non-profit organizations employ credit agents and advisors to guide them. Assessing creditworthiness and acting on it is particularly challenging for credit advisors in the Global South. Credit agents are usually the SBOs' first point of contact. They screen credit applications before submitting them to the decision-makers, the bank analysts.

Both credit agents and SBOs could be helped by an assistant deployed through a messaging application. WhatsApp, with more than 2 billion users, is one of the most popular messaging applications both in Brazil and around the globe. It allows users to interact through text and voice.

This system offers a text and voice conversational user interface for WhatsApp built on IBM Watson Speech-to-Text, IBM Watson Assistant, and IBM Watson Text-to-Speech, and uses the third-party service Twilio to connect these services to WhatsApp. Data can be stored in IBM Cloud Object Storage and IBM Cloudant.

Introduction

On this website, you will find information on these topics:

Here are some explanations of terms that you will see on this website. We aimed for consistent word choice, but a few terms may have been used interchangeably. If anything is confusing, feel free to open an issue, leave a comment, or send us a message.

App Architecture

See ./src/images/architecture_homepage.png

Description

As shown in the diagram, a user interacts with the chatbot by sending a WhatsApp message to a specific Twilio number. Twilio forwards the message to the Flask application through a public endpoint exposed by the containerized application running on Code Engine. The Flask application determines whether the incoming message is audio or text, processes it accordingly, and returns Watson Assistant's response in the form that matches the user's input (text or audio).
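The audio-versus-text check described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `classify_message` is hypothetical, and the sketch assumes the form fields Twilio posts to a messaging webhook (`NumMedia`, `MediaContentType0`, `Body`), passed in as a plain mapping such as Flask's `request.form`.

```python
from typing import Mapping


def classify_message(form: Mapping[str, str]) -> str:
    """Return "audio" if the webhook payload has an audio attachment, else "text".

    `form` is assumed to hold the fields Twilio posts to the webhook,
    for example the contents of Flask's `request.form`.
    """
    # NumMedia is the number of attachments; treat a missing or empty value as 0.
    num_media = int(form.get("NumMedia", "0") or "0")
    for index in range(num_media):
        content_type = form.get(f"MediaContentType{index}", "")
        if content_type.startswith("audio/"):
            return "audio"
    return "text"


if __name__ == "__main__":
    voice_note = {"NumMedia": "1", "MediaContentType0": "audio/ogg"}
    plain_text = {"NumMedia": "0", "Body": "Hello"}
    print(classify_message(voice_note))
    print(classify_message(plain_text))
```

In a Flask view, the result of this check would decide which of the two flows described in the following sections handles the message.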

If the received message is a voice recording

If it is an audio message, the app sends the audio file to Speech-to-Text and receives the transcription as text. It then sends the transcription to Watson Assistant. When Watson Assistant's text response returns, the app sends it to Text-to-Speech and downloads the synthesized audio. Finally, it sends this audio back to the WhatsApp user via the Twilio API, along with any other media responses.
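The order of calls in the voice flow can be sketched as below. This is an illustration under stated assumptions: `handle_voice_message` and its three callable parameters are hypothetical stand-ins for the Speech-to-Text, Watson Assistant, and Text-to-Speech clients, so no real service API is shown here.

```python
from typing import Callable, Tuple


def handle_voice_message(
    audio: bytes,
    transcribe: Callable[[bytes], str],     # stand-in for Speech-to-Text
    ask_assistant: Callable[[str], str],    # stand-in for Watson Assistant
    synthesize: Callable[[str], bytes],     # stand-in for Text-to-Speech
) -> Tuple[str, bytes]:
    """Run the voice pipeline and return (reply text, reply audio)."""
    transcript = transcribe(audio)          # audio -> text
    reply_text = ask_assistant(transcript)  # text -> assistant reply
    reply_audio = synthesize(reply_text)    # reply -> audio
    return reply_text, reply_audio


if __name__ == "__main__":
    # Stub services so the sketch runs without any credentials.
    reply_text, reply_audio = handle_voice_message(
        b"fake-audio-bytes",
        transcribe=lambda audio: "what is my loan status",
        ask_assistant=lambda text: f"You asked: {text}",
        synthesize=lambda text: text.encode("utf-8"),
    )
    print(reply_text)
    print(len(reply_audio))
```

Passing the services in as callables keeps the orchestration testable with stubs; the resulting audio would then be sent back to the user through Twilio.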

If the received message is a text

If it is a text message, the app sends the text directly to Watson Assistant. Once the app receives the response, it sends it to Twilio, which forwards it to the user.
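The text flow can be sketched in the same style. The assumption here is that the reply is handed to Twilio as a TwiML document in the webhook response; the project may instead call the Twilio REST API. `handle_text_message` and `ask_assistant` are hypothetical names, with `ask_assistant` standing in for the Watson Assistant client.

```python
import xml.etree.ElementTree as ET
from typing import Callable


def handle_text_message(text: str, ask_assistant: Callable[[str], str]) -> str:
    """Ask the assistant and wrap its reply in a TwiML <Response><Message>."""
    reply = ask_assistant(text)
    response = ET.Element("Response")
    # ElementTree escapes special characters such as & and < in the reply text.
    ET.SubElement(response, "Message").text = reply
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    # Stub assistant so the sketch runs without any credentials.
    twiml = handle_text_message("hello", lambda text: f"echo: {text}")
    print(twiml)  # <Response><Message>echo: hello</Message></Response>
```

A Flask view would return this string with an XML content type so that Twilio can forward the message to the user.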