Recognising location with Twilio, Python and Speech Recognition

This guide assumes you know a little bit of Python and Twilio already. Using Python with the Flask framework, I’ll show you how I handle speech recognition specifically for postcodes in Twilio. The goal of this code is to allow a caller to say their postcode and direct the caller based on what they said.

Code Explained

Imports

These are all the modules the code needs. textblob provides the Naive Bayes classifier we train to decide whether a short reply is positive ("yes") or negative ("no"). text2digits does what it says and converts number words into digits.

import re
from flask import Flask, request, session
from twilio.twiml.voice_response import VoiceResponse, Gather
from twilio.rest import Client
import os
from textblob.classifiers import NaiveBayesClassifier
from datetime import datetime
from text2digits import text2digits

Setting up

Next we need to set up a few things before we can proceed. Flask uses the secret key to sign the session cookie, so it has to be set before you can use session. Then we add some postcode hints and phonetic hints, which Twilio will use when doing speech recognition later.

dateTimeObj = datetime.now()
now = dateTimeObj.strftime("%d/%m/%Y, %H:%M:%S")

app = Flask(__name__)
app.secret_key = "xxxxxxxxxxxxxxxxxx"

postcode_hints = "insert postcode hints here like RG12 7HT"

phonetic_hints = "insert phonetic hints here like Romeo Golf one two seven Hotel Tango"

# TWILIO ACCOUNT DETAILS

# Set them as environment variables when you start
account_sid = os.environ['TWILIO_ACCOUNT_SID']
auth_token = os.environ['TWILIO_AUTH_TOKEN']
client = Client(account_sid, auth_token)

# Training data for yes/no sentiment analysis
train = [
    ("Yes", "pos"),
    ("Yeah", "pos"),
    ("Yep", "pos"),
    ("yes", "pos"),
    ("yes that's correct", "pos"),
    ("that's right", "pos"),
    ("no that is incorrect", "neg"),
    ("no that's incorrect", "neg"),
    ("no that's not right", "neg"),
    ("that's not correct", "neg"),
    ("no that's wrong", "neg"),
    ("nope", "neg"),
    ("No.", "neg"),
    ("no", "neg"),
    ("nah", "neg")
]

# Classifier takes the training data for use later
cl = NaiveBayesClassifier(train)
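
Typing phonetic hints by hand gets tedious if you cover a lot of postcodes. Here is a minimal sketch of how they could be generated instead. The function names phonetic_hint and build_hints are my own for illustration, and I'm assuming the hints are passed to Twilio as one comma-separated string of phrases.

```python
# Phonetic alphabet words (common English spellings) and digit words.
NATO = {
    "A": "Alpha", "B": "Bravo", "C": "Charlie", "D": "Delta", "E": "Echo",
    "F": "Foxtrot", "G": "Golf", "H": "Hotel", "I": "India", "J": "Juliet",
    "K": "Kilo", "L": "Lima", "M": "Mike", "N": "November", "O": "Oscar",
    "P": "Papa", "Q": "Quebec", "R": "Romeo", "S": "Sierra", "T": "Tango",
    "U": "Uniform", "V": "Victor", "W": "Whiskey", "X": "X-ray",
    "Y": "Yankee", "Z": "Zulu",
}
DIGITS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def phonetic_hint(postcode):
    """Spell a postcode out as phonetic-alphabet words and digit words."""
    words = []
    for ch in postcode.upper():
        if ch in NATO:
            words.append(NATO[ch])
        elif ch in DIGITS:
            words.append(DIGITS[ch])
        # Spaces and punctuation are skipped.
    return " ".join(words)


def build_hints(postcodes):
    """Join one phonetic phrase per postcode into a single hints string."""
    return ", ".join(phonetic_hint(p) for p in postcodes)


if __name__ == "__main__":
    print(build_hints(["RG12 7HT"]))
```

You could then assign the result to phonetic_hints instead of writing the string out yourself.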

Get Postcode

The first real part of the call is to collect the caller’s postcode. We use Twilio’s Gather verb to do this. We set the language to en-GB and pass our postcode_hints variable to help Twilio transcribe what the caller is likely to say. Once Twilio has finished gathering the speech, it sends the result to the /gather_get_postcode_b route. That route checks whether the caller actually said anything and branches accordingly. If they did, we strip any stray symbols from the text and ask the caller whether what we recognised is correct.

# Get Postcode
@app.route("/get_postcode_b", methods=["GET", "POST"])
def get_postcode_b():
    resp = VoiceResponse()

    # The Python helper library documents this option in snake_case (speech_timeout)
    gather = Gather(input="speech",
                    speech_timeout="auto",
                    language="en-GB",
                    hints=postcode_hints,
                    action="/gather_get_postcode_b")

    # Ask for postcode
    gather.say("Please say your full postcode now.")
    resp.append(gather)

    # Didn't hear anything so go and check count and go to no_1 or no_fallback
    resp.redirect("/no_b")

    return str(resp)
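
The clean-up step described above, removing strange symbols before reading the postcode back, might look something like this. It is only a sketch: clean_postcode and spell_out are hypothetical helper names, the digit-word mapping is a simplified stand-in for what text2digits does, and the regular expression is a simplified postcode shape rather than the full official rules.

```python
import re

# Spoken digit words mapped to digits (simplified stand-in for text2digits).
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

# Simplified UK postcode shape (assumption: not the complete official format).
POSTCODE_RE = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]?[0-9][A-Z]{2}$")


def clean_postcode(speech):
    """Return a normalised postcode like 'RG12 7HT', or None if it doesn't fit."""
    # Keep only runs of letters and digits, which drops full stops, commas, etc.
    tokens = re.findall(r"[a-z0-9]+", speech.lower())
    # Swap digit words for digits and glue everything together.
    joined = "".join(DIGIT_WORDS.get(t, t) for t in tokens).upper()
    if not POSTCODE_RE.match(joined):
        return None
    # The inward code is always the last three characters.
    return joined[:-3] + " " + joined[-3:]


def spell_out(postcode):
    """Space the characters out so text-to-speech reads them one at a time."""
    return " ".join(postcode.replace(" ", ""))


if __name__ == "__main__":
    print(clean_postcode("r g one two seven h t."))
    print(spell_out("RG12 7HT"))
```

In the route itself you would pass request.values["SpeechResult"] to clean_postcode, send the caller to /no_b if it returns None, and otherwise read spell_out(...) back inside another Gather that asks whether it is correct.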

Gather Is It Correct

Whenever we use a Gather verb in Twilio, we check whether the caller said anything by looking for SpeechResult in the request. We’re hoping they said yes, so we use our sentiment analyser to see whether the speech result is “pos” or “neg”. If the sentiment is positive, happy days! We recognised their postcode and we can now transfer them to a local agent.

If the caller answered anything other than yes, we’re going to go to our no section which would act as fallback logic. You can place anything you like in the fallback.

# Gather Is It Correct
@app.route("/gather_is_it_correct_b", methods=["GET", "POST"])
def gather_is_it_correct_b():
    resp = VoiceResponse()

    # If the caller said something
    if "SpeechResult" in request.values:
        # Set what the caller said
        answer = request.values["SpeechResult"]
        # Do sentiment analysis on the text to return pos or neg
        sentiment = cl.classify(answer)

        # If they said something like "yes"
        if sentiment == "pos":
            resp.redirect("/local_area_transfer_b")
        # If they said something like "no"
        else:
            # Caller did not confirm, so go and check count and go to no or no_fallback
            resp.redirect("/no_b")

    else:
        # Didn't hear anything so go and check count and go to no or no_fallback
        resp.redirect("/no_b")

    return str(resp)
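
The /no_b route isn’t shown here, but the comments mention checking a count before falling back. A rough sketch of that idea follows. The MAX_ATTEMPTS limit, the "attempts" key and the /no_fallback_b route name are all assumptions for illustration, and a plain dict stands in for Flask’s session, which behaves like a dict.

```python
MAX_ATTEMPTS = 2  # hypothetical limit on how many times we ask for the postcode


def next_route(session):
    """Count this failed attempt and pick where to send the caller next."""
    attempts = session.get("attempts", 0) + 1
    session["attempts"] = attempts
    if attempts < MAX_ATTEMPTS:
        # Still under the limit, so ask for the postcode again.
        return "/get_postcode_b"
    # Limit reached, so hand over to the fallback (hypothetical route name).
    return "/no_fallback_b"


if __name__ == "__main__":
    fake_session = {}
    print(next_route(fake_session))
    print(next_route(fake_session))
```

Inside the real route you would call resp.redirect(next_route(session)) and return str(resp) as in the other handlers.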

And that’s a simple way of gathering a postcode with Twilio. We asked the caller for their postcode, transcribed it into text with the help of hints, then used our sentiment analysis to check whether the caller confirmed the transcription.


Head over to the contact page and get in touch if you have any comments or follow me on social media and say hi.