# Create a custom IBM Watson™ Speech to Text language and acoustic model
---
## Pre-requisites:
- [Install Watson Libraries](#Install-Watson-libraries)
- [Import Dependencies](#Import-Dependencies)
- [Add Speech to Text credentials](#Add-Speech-to-Text-credentials)
- [Setup and configure speech credentials](#Setup-and-configure-speech-credentials)

## Test the Lite models:
- [List all the available speech models](#(Optional)-List-all-the-available-speech-models)
- [Print out the details of the Narrowband model](#(Optional)-Print-out-the-details-of-the-Narrowband-model)
- [Transcribe using the base speech (Narrowband US-English) model](#(Optional)-Transcribe-using-the-base-speech-(Narrowband-US-English)-model)
- [Print the transcript](#(Optional)-Print-the-transcription)

## Follow these steps to create, add contents to, and train a custom language model for the IBM Watson™ Speech to Text service:
1. [Create a custom language model](#1.-Create-a-custom-language-model)
1. [Add text corpus file for language training](#2.-Add-text-corpus-file-for-language-training)
1. [Print the details of the corpora](#3.-Print-the-details-of-the-corpora)
1. [Train the custom language model](#4.-Train-the-custom-language-model)
1. [Check status of the custom language model](#5.-Check-status-of-the-custom-language-model)

## Follow these steps to create a custom acoustic model for the IBM Watson™ Speech to Text service:
1. [Create a custom acoustic model](#1.-Create-a-custom-acoustic-model)
1. [Add audio file to the acoustic model](#2.-Add-audio-file-to-the-acoustic-model)
1. [List all the audio files used for acoustic modeling](#3.-List-all-the-audio-files-used-for-acoustic-modeling)
1. [List the details of the acoustic model](#4.-List-the-details-of-the-acoustic-model)
1. [Train the acoustic model](#5.-Train-the-acoustic-model)
1. [Check status of the custom acoustic model](#6.-Check-status-of-the-custom-acoustic-model)

## Test the trained models:
1. [Run transcription using the custom acoustic & language models](#1.-Run-transcription-using-the-custom-acoustic-&-language-models)
1. [Print the transcript from the custom model](#2.-Print-the-transcript-from-the-custom-model)

---

## Pre-requisites:
---

### Install Watson libraries

In [None]:
!pip install --upgrade "ibm-watson>=5.3.0"

### Import Dependencies

In [1]:
import json
import time

### Add Speech to Text credentials

> Add the credentials created in step 2.1, [Create Watson Speech to Text service on IBM Cloud](https://github.com/IBM/video-summarizer-using-watson/blob/main/README.md#21-create-watson-speech-to-text-service-on-ibm-cloud)

In [2]:
credentials = {
    # Paste your service credentials here; later cells read the 'apikey' and 'url' keys.
}

### Setup and configure speech credentials

In [3]:
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator(credentials['apikey'])
speech_to_text = SpeechToTextV1(
    authenticator=authenticator
)

speech_to_text.set_service_url(credentials['url'])

---

## Test the Lite models:

#### These are the pre-trained models provided by IBM Watson™ Speech to Text
---

### (Optional) List all the available speech models

In [None]:
speech_models = speech_to_text.list_models().get_result()
print(json.dumps(speech_models, indent=2))

### (Optional) Print out the details of the Narrowband model
>Note: `Narrowband` models are intended for audio sampled at 8 kHz, typically telephone audio such as human-to-human calls, while `Broadband` models are intended for audio sampled at 16 kHz or higher.

In [None]:
speech_model = speech_to_text.get_model('en-US_NarrowbandModel').get_result()
print(json.dumps(speech_model, indent=2))
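If you are unsure which base model fits a recording, the sampling rate in the WAV header is a reasonable first check. The sketch below is only an illustration: the helper names `sample_rate_of` and `suggest_base_model` are hypothetical, and the 16 kHz cut-off is an assumption based on the usual narrowband/broadband split.

```python
import wave


def sample_rate_of(path):
    """Return the sampling rate (in Hz) stored in a WAV file header."""
    with wave.open(path, 'rb') as wav:
        return wav.getframerate()


def suggest_base_model(sample_rate_hz, language='en-US'):
    """Suggest a base model name from the sampling rate.

    Assumption: audio sampled at 16 kHz or more suits a Broadband model,
    anything lower suits a Narrowband model.
    """
    band = 'Broadband' if sample_rate_hz >= 16000 else 'Narrowband'
    return f'{language}_{band}Model'


# Usage in this notebook (the path is the one used in the next section):
# suggest_base_model(sample_rate_of('datasets/ste-training-data/audios/chunk0.wav'))
```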

### (Optional) Transcribe using the base speech (Narrowband US English) model

In [None]:
filename = 'datasets/ste-training-data/audios/chunk0.wav'

with open(filename, 'rb') as audio_file:
    speech_recognition_results = speech_to_text.recognize(
        audio=audio_file,
        content_type='audio/wav',
        model='en-US_NarrowbandModel',
        timestamps=True,
        speaker_labels=True,
        word_alternatives_threshold=0.9,
        smart_formatting=True
    ).get_result()

In [None]:
#print(json.dumps(speech_recognition_results, indent=2))

### (Optional) Print the transcription

In [None]:
transcript = ''
for chunks in speech_recognition_results['results']:
    if 'alternatives' in chunks.keys():
        alternatives = chunks['alternatives'][0]
        if 'transcript' in alternatives:
            transcript = transcript + alternatives['transcript']
print(transcript)
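The same loop is needed again for the custom models, so it can help to wrap it in a small function. This is a minimal sketch; `extract_transcript` is a hypothetical helper name and the `sample` dictionary is made-up data shaped like the `results` list used above.

```python
def extract_transcript(recognition_results):
    """Join the top-ranked transcript of every result chunk into one string."""
    pieces = []
    for chunk in recognition_results.get('results', []):
        alternatives = chunk.get('alternatives') or []
        if alternatives and 'transcript' in alternatives[0]:
            pieces.append(alternatives[0]['transcript'].strip())
    return ' '.join(pieces)


# Made-up response fragment for illustration only
sample = {
    'results': [
        {'alternatives': [{'transcript': 'thank you for joining '}]},
        {'alternatives': [{'transcript': 'the earnings call '}]},
        {'final': True},  # a chunk without alternatives is skipped
    ]
}
print(extract_transcript(sample))  # thank you for joining the earnings call
```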

---

## Follow these steps to create, add contents to, and train a custom language model for the IBM Watson™ Speech to Text service:
---

### 1. Create a custom language model

In [4]:
model_name = 'STE language model'
model_base = 'en-US_NarrowbandModel'
model_description = 'Custom Lang Model for ST Engineering'

In [5]:
language_model = speech_to_text.create_language_model(
    model_name,
    model_base,
    description=model_description).get_result()

### (Optional) Print the customization ID

In [None]:
print(json.dumps(language_model, indent=2))

### (Optional) Print all the language models

In [None]:
language_models = speech_to_text.list_language_models().get_result()
print(json.dumps(language_models, indent=2))

In [8]:
custom_lang_narrowband_model_id = language_model.get('customization_id')

### 2. Add text corpus file for language training

We have used the following datasets:
- `data/earnings-call-corpus-file.txt`

In [9]:
with open('data/earnings-call-corpus-file.txt','rb') as corpus_file:
    speech_to_text.add_corpus(
        custom_lang_narrowband_model_id,
        'corpus-file.txt',
        corpus_file
    )
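The service analyzes a corpus asynchronously, so it is worth waiting for the analysis to finish before training. The sketch below polls `get_corpus` until the status is no longer `being_processed`. The helper name `wait_for_corpus` and the timing defaults are assumptions for illustration, not service recommendations.

```python
import time


def wait_for_corpus(client, customization_id, corpus_name,
                    poll_seconds=10, max_polls=60):
    """Poll get_corpus() until the corpus leaves the 'being_processed' state.

    `client` is expected to behave like the SpeechToTextV1 object created
    earlier in this notebook.
    """
    for _ in range(max_polls):
        corpus = client.get_corpus(customization_id, corpus_name).get_result()
        if corpus['status'] != 'being_processed':
            return corpus  # status is now 'analyzed' or 'undetermined'
        time.sleep(poll_seconds)
    raise TimeoutError(f'Corpus {corpus_name!r} is still being processed')


# Usage in this notebook:
# wait_for_corpus(speech_to_text, custom_lang_narrowband_model_id, 'corpus-file.txt')
```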

### 3. Print the details of the corpora

In [None]:
corpora = speech_to_text.list_corpora(custom_lang_narrowband_model_id).get_result()
print(json.dumps(corpora, indent=2))

### (Optional) Print all the custom words

In [None]:
words = speech_to_text.list_words(custom_lang_narrowband_model_id).get_result()
print(json.dumps(words, indent=2))

### (Optional) Add grammar words

In [None]:
# with open('{location-path}/productionWords.abnf', 'rb') as grammar_file:
#     speech_to_text.add_grammar(
#         custom_lang_narrowband_model_id,
#         '{Grammar-name}',
#         grammar_file,
#         'application/srgs'
#     )
# Poll for grammar status.

### (Optional) Upgrade the language model

In [None]:
speech_to_text.upgrade_language_model(custom_lang_narrowband_model_id)

### 4. Train the custom language model

In [None]:
speech_to_text.train_language_model(custom_lang_narrowband_model_id)

### 5. Check status of the custom language model
> Training is complete once the status becomes `available`

In [None]:
# Get status of the language model - wait until it is 'available'
language_models = speech_to_text.list_language_models().get_result()
models = language_models["customizations"]

for model in models:
    if model['customization_id'] == custom_lang_narrowband_model_id: 
        print(model['status'])
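Rather than re-running the cell by hand, the status check can be put in a polling loop. The following is a sketch under assumptions: `wait_until_available` is a hypothetical helper, the timing defaults are arbitrary, and it treats a `failed` status as an error.

```python
import time


def wait_until_available(get_status, poll_seconds=30, max_polls=120):
    """Call get_status() repeatedly until it returns 'available'.

    Raises RuntimeError if the status is 'failed' and TimeoutError if
    'available' is not reached within max_polls attempts.
    """
    for _ in range(max_polls):
        status = get_status()
        print(status)
        if status == 'available':
            return status
        if status == 'failed':
            raise RuntimeError('Training failed')
        time.sleep(poll_seconds)
    raise TimeoutError('Model did not become available in time')


# Usage in this notebook (needs the configured `speech_to_text` client):
# wait_until_available(
#     lambda: speech_to_text.get_language_model(
#         custom_lang_narrowband_model_id).get_result()['status'])
```

Passing a callable keeps the loop independent of the model type, so the same helper could be pointed at any status lookup.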

> Learn more about creating an IBM Watson™ Speech to Text language model here: <https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-languageCreate>

---

## Follow these steps to create a custom acoustic model for the IBM Watson™ Speech to Text service:
---

### 1. Create a custom acoustic model

In [19]:
model_name = 'STE acoustic model'
model_base = 'en-US_NarrowbandModel'
model_description = 'Custom Acoustic Model for ST Engineering'

In [20]:
acoustic_model = speech_to_text.create_acoustic_model(
    model_name,
    model_base,
    description=model_description).get_result()

### (Optional) Print the customization ID

In [None]:
print(json.dumps(acoustic_model, indent=2))

### (Optional) Print all the acoustic models

In [None]:
acoustic_models = speech_to_text.list_acoustic_models().get_result()
print(json.dumps(acoustic_models, indent=2))

In [23]:
custom_acoustic_narrowband_model_id = acoustic_model["customization_id"]

### 2. Add audio file to the acoustic model

We have used the following datasets:
- `data/earnings-call-2019-train1.wav`
- `datasets/earnings-call-2019-train2.wav`

In [29]:
audioFilePath = 'data/earnings-call-2019-train1.wav'
audioFileName = audioFilePath.split('/')[-1]  # file name without the directory part

with open(audioFilePath, 'rb') as audio_file:
    speech_to_text.add_audio(
        custom_acoustic_narrowband_model_id,
        audioFileName,
        audio_file,
        content_type='audio/wav'
    )
# Poll for audio status.

### 3. List all the audio files used for acoustic modeling

In [None]:
audio_resources = speech_to_text.list_audio(custom_acoustic_narrowband_model_id).get_result()
print(json.dumps(audio_resources, indent=2))
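Audio resources are also processed asynchronously, and it is safer to train only after every file has been accepted. The sketch below summarizes the `list_audio` output by status. The helper names are hypothetical and the `sample` dictionary is made-up data that assumes each entry in the `audio` list carries `name` and `status` fields.

```python
def summarize_audio(audio_resources):
    """Group audio resource names by their processing status."""
    by_status = {}
    for resource in audio_resources.get('audio', []):
        by_status.setdefault(resource['status'], []).append(resource['name'])
    return by_status


def all_audio_ok(audio_resources):
    """Return True when at least one resource exists and all have status 'ok'."""
    resources = audio_resources.get('audio', [])
    return bool(resources) and all(r['status'] == 'ok' for r in resources)


# Made-up listing for illustration only
sample = {
    'audio': [
        {'name': 'earnings-call-2019-train1.wav', 'status': 'ok'},
        {'name': 'earnings-call-2019-train2.wav', 'status': 'being_processed'},
    ]
}
print(summarize_audio(sample))
print(all_audio_ok(sample))  # False
```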

### 4. List the details of the acoustic model

In [None]:
acoustic_model = speech_to_text.get_acoustic_model(custom_acoustic_narrowband_model_id).get_result()
print(json.dumps(acoustic_model, indent=2))

### (Optional) Upgrade the acoustic model

In [None]:
speech_to_text.upgrade_acoustic_model(custom_acoustic_narrowband_model_id)

### 5. Train the acoustic model

In [None]:
speech_to_text.train_acoustic_model(custom_acoustic_narrowband_model_id) 

### 6. Check status of the custom acoustic model
> Training is complete once the status becomes `available`

In [None]:
# Get status of the acoustic model - wait until it is 'available'
acoustic_models = speech_to_text.list_acoustic_models().get_result()
models = acoustic_models["customizations"]

for model in models:
    if model['customization_id'] == custom_acoustic_narrowband_model_id:
        print(model['status'])

> Learn more about creating an IBM Watson™ Speech to Text acoustic model here: <https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-acoustic>

---

## Test the trained models:
---

### 1. Run transcription using the custom acoustic & language models

In [None]:
filename = 'datasets/ste-training-data/audios/chunk0.wav'

with open(filename, 'rb') as audio_file:
    speech_recognition_results = speech_to_text.recognize(
        audio=audio_file,
        content_type='audio/wav',
        model='en-US_NarrowbandModel',
        customization_id=custom_lang_narrowband_model_id,
        acoustic_customization_id=custom_acoustic_narrowband_model_id,
        timestamps=True,
        word_alternatives_threshold=0.9,
        keywords=['derailment case', 'start length', 'drive side quarter', 'driver impression', 'severity'],
        keywords_threshold=0.5).get_result()

In [None]:
#print(json.dumps(speech_recognition_results, indent=2))
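Because the request passes `keywords` and `keywords_threshold`, each result chunk may include a `keywords_result` entry listing where the keywords were spotted. The sketch below flattens those entries into a list sorted by start time. `keyword_hits` is a hypothetical helper and the `sample` dictionary is made-up data that assumes each match carries `start_time`, `end_time`, and `confidence`.

```python
def keyword_hits(recognition_results, min_confidence=0.0):
    """Collect spotted keywords as (keyword, start, end, confidence) tuples."""
    hits = []
    for chunk in recognition_results.get('results', []):
        for keyword, matches in chunk.get('keywords_result', {}).items():
            for match in matches:
                if match['confidence'] >= min_confidence:
                    hits.append((keyword, match['start_time'],
                                 match['end_time'], match['confidence']))
    # Sort by start time so the hits read in playback order
    return sorted(hits, key=lambda hit: hit[1])


# Made-up response fragment for illustration only
sample = {
    'results': [
        {'keywords_result': {
            'severity': [{'start_time': 4.2, 'end_time': 4.9, 'confidence': 0.81}],
            'derailment case': [{'start_time': 1.0, 'end_time': 1.8, 'confidence': 0.93}],
        }},
        {'alternatives': [{'transcript': 'no keywords here '}]},
    ]
}
for hit in keyword_hits(sample):
    print(hit)
```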

### 2. Print the transcript from the custom model

In [None]:
transcript2 = ''
for chunks in speech_recognition_results['results']:
    if 'alternatives' in chunks.keys():
        alternatives = chunks['alternatives'][0]
        if 'transcript' in alternatives:
            transcript2 = transcript2 + alternatives['transcript']
print(transcript2)
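To see what the customization changed, the base-model transcript can be compared with the custom-model transcript word by word. This is a rough sketch using the standard library's `difflib`. `changed_words` is a hypothetical helper and the example strings are made up.

```python
import difflib


def changed_words(base_transcript, custom_transcript):
    """Return (base_text, custom_text) pairs for every span that differs."""
    base_words = base_transcript.split()
    custom_words = custom_transcript.split()
    matcher = difflib.SequenceMatcher(a=base_words, b=custom_words,
                                      autojunk=False)
    changes = []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        if tag != 'equal':
            changes.append((' '.join(base_words[a0:a1]),
                            ' '.join(custom_words[b0:b1])))
    return changes


# Made-up transcripts for illustration only
print(changed_words('the d railment case was severe',
                    'the derailment case was severe'))
# [('d railment', 'derailment')]

# Usage in this notebook, once both transcripts exist:
# changed_words(transcript, transcript2)
```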

---

## (Optional) Delete the custom models:
---

### Delete the custom language model

In [None]:
speech_to_text.delete_language_model(custom_lang_narrowband_model_id)

### Delete the custom acoustic model

In [None]:
speech_to_text.delete_acoustic_model(custom_acoustic_narrowband_model_id)

---