This guide assumes you know a little bit of Python and Twilio already. Using Python with the Flask framework, I’ll show you how I handle speech recognition in Twilio. I’ll also give a brief overview of how I make my own visual reports for clients.
tl;dr: Download my example code here: https://drive.google.com/file/d/1PYRT_MD9e8NTNv3RZhJKuPq-9P17BOeO/view?usp=sharing
The telephone service I’ve built is for reporting issues such as potholes, blocked drains and broken traffic lights but it can be used for any scenario you wish. It uses speech recognition to navigate through the service. Twilio transcribes the speech then sends the transcription to Dialogflow which returns an intent. Twilio can then make decisions based on this intent.
Services You’ll Need
Twilio is the glue that holds everything together. The first thing you’ll need to do is purchase a number. They only cost $1 a month. When a call comes in to that number, Twilio points it to the Google App Engine URL and entry point where the Python code is hosted. More on that later.
Dialogflow is used to get an intent from what the user says on the phone.
User speaks → Twilio transcribes → Transcribed text sent to Dialogflow → Dialogflow returns an intent name to Twilio.
The database is used to log stats and to check opening hours. I’ve listed the fields I use for the stats logging table.
- sid: Automatically created by Twilio
- section: Manually created in the code
- caller_id: The caller’s CLI as recognised by Twilio
- service_number: The number of the service the caller rang as recognised by Twilio
- asr: Split into 2 parts; the transcribed text and the intent returned by Dialogflow
- is_correct: The service asks the question “is this correct” when checking if it understood the intent correctly. It logs either yes or no in here. Everything else is set as “na” for not available
- call_duration: When Twilio ends a call, it produces a call duration value. This is sent to Integromat automatically via a webhook which then adds it to the database
- timestamp: Logging the time each section was triggered
- date: Logging the day the service was used
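To make the fields above more concrete, here’s a minimal sketch of the log table. It uses Python’s built-in sqlite3 so it runs anywhere, whereas the real service uses MySQL. The column types and the helper names make_demo_db and log_section are my assumptions for illustration, not the exact schema.

```python
import sqlite3

# Assumed column types; the exact schema is not shown in this guide.
CREATE_LOG_TABLE = """
CREATE TABLE log (
    sid            TEXT,     -- Twilio call SID
    section        TEXT,     -- section name set in the code, e.g. "0.Welcome"
    caller_id      TEXT,     -- caller's number as seen by Twilio
    service_number TEXT,     -- the Twilio number that was dialled
    asr            TEXT,     -- transcribed text and Dialogflow intent, or "na"
    is_correct     TEXT,     -- "yes", "no" or "na"
    call_duration  INTEGER,  -- filled in later, once the call has ended
    timestamp      TEXT,
    date           TEXT
)
"""


def make_demo_db():
    """Create an in-memory database holding an empty log table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(CREATE_LOG_TABLE)
    return conn


def log_section(conn, row):
    """Insert one stats row; call_duration stays NULL until the call ends."""
    conn.execute(
        "INSERT INTO log (sid, section, caller_id, service_number, asr, "
        "is_correct, call_duration, timestamp, date) "
        "VALUES (?, ?, ?, ?, ?, ?, NULL, ?, ?)",
        row,
    )
    conn.commit()
```

Each section of the call adds one row, so a single call ends up with several rows sharing the same sid.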
Integromat is only used for one aspect of the service and that’s to log the call duration to the database. When editing a number on the Twilio dashboard, there is a “call status changes” box to add a link to. That’s where the webhook URL goes, so that whenever a call ends, the webhook gets triggered and the call duration is added to the database.
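I use Integromat for this, but the same step could be handled in your own code. The sketch below is not what my service does: it’s a hypothetical helper, duration_update, that assumes Twilio’s status callback includes the CallSid, CallStatus and CallDuration parameters and turns them into a parameterised UPDATE for the log table.

```python
def duration_update(params):
    """Build a parameterised UPDATE from a Twilio status callback payload.

    Returns None when the call has not completed or no duration was sent.
    """
    if params.get("CallStatus") != "completed":
        return None
    duration = params.get("CallDuration")
    sid = params.get("CallSid")
    if not duration or not sid:
        return None
    # %s placeholders match the style used by mysql.connector
    sql = "UPDATE log SET call_duration = %s WHERE sid = %s"
    return sql, (int(duration), sid)
```

The returned pair could be passed straight to cursor.execute(sql, values). Note that this would update every row logged for that call, since each section logs its own row with the same sid.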
Google Data Studio
Data Studio visualises the stats that have been collected in the log table. The first page shows a heat map of the visited modules, average call duration, total calls, total transfers, transfers by section and the percentage of callers who said the ASR result was correct.
Page 2 shows the total call duration per day as a nice line graph.
Page 3 shows the total calls per day as a bar chart.
Google App Engine
Google App Engine is where the live Python code is hosted and where the Twilio phone number points to.
When you log in to App Engine, click the button in the top right to activate Cloud Shell. Then click the button to open it in a new window and finally click the pencil button to open the editor window. This is where you can see the code itself.
There are 3 main files needed to make the app work:
app.yaml: This tells App Engine that we’re using Python and lets you list any environment variables you want to use in your code but don’t want the public to see. In our code we list the Twilio account details and the database connection details.
    runtime: python
    env: flex
    entrypoint: gunicorn -b :$PORT main:app

    env_variables:
      TWILIO_ACCOUNT_SID: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
      TWILIO_AUTH_TOKEN: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
      DB_HOST: "database"
      DB_USER: "user"
      DB_PASSWD: "password"
      DB_DATABASE: "table"

    runtime_config:
      python_version: 3
requirements.txt: This lists all the required Python packages the service needs to work.
    cachetools==4.1.0
    certifi==2020.4.5.1
    chardet==3.0.4
    click==7.1.1
    dialogflow==0.8.0
    Flask==1.1.2
    google-api-core==1.17.0
    google-auth==1.14.0
    google-cloud==0.34.0
    google-cloud-language==1.3.0
    google-cloud-speech==1.3.2
    googleapis-common-protos==1.51.0
    grpcio==1.27.2
    gunicorn==20.0.4
    idna==2.9
    itsdangerous==1.1.0
    Jinja2==2.11.1
    joblib==0.15.1
    MarkupSafe==1.1.1
    mysql==0.0.2
    mysql-connector==2.2.9
    mysqlclient==2.0.1
    protobuf==3.11.3
    pyasn1==0.4.8
    pyasn1-modules==0.2.8
    PyJWT==1.7.1
    pytz==2019.3
    regex==2020.6.8
    requests==2.23.0
    rsa==4.0
    six==1.14.0
    tqdm==4.46.1
    twilio==6.38.0
    urllib3==1.25.8
    Werkzeug==1.0.1
main.py: This is our actual Python code for the service.
There is one more file in there which is a json file that’s referenced in the code. It’s the Dialogflow credentials we need to do the speech recognition.
To deploy your code, you first need to cd into the right directory. Then activate the correct Google App Engine project with gcloud config set project project-id. Finally, deploy the app using gcloud app deploy, which will deploy the main.py file.
The code is written in Python using the Flask framework. Download my example code here: https://drive.google.com/file/d/1PYRT_MD9e8NTNv3RZhJKuPq-9P17BOeO/view?usp=sharing
The general setup is that each section of code is a separate function with a route into it like /intro.
The code below shows the intro.
VoiceResponse() builds the TwiML response that tells Twilio what to do on the call.
play() can play an audio file.
redirect() is where that code block will go to once it’s finished processing.
return str(resp) is needed at the end of every block because it sends the finished TwiML back to Twilio. The session variables are what we’ll be logging to the database as the stats.
    # Intro
    @app.route("/intro", methods=["GET", "POST"])
    def intro():
        resp = VoiceResponse()

        # Stats
        session["call_sid"] = request.values.get("CallSid")
        session["section"] = "0.Welcome"
        session["caller"] = request.values.get("From")
        session["asr"] = "na"
        session["correct"] = "na"
        stats_section()

        # Setting the "choice" variable so no key error
        session["choice"] = "test"

        resp.play("https://audiofile1")
        resp.redirect("/welcome")
        return str(resp)
Import the required Python modules and packages.
from flask import Flask, request, session: Flask is a Python framework for developing web apps.
from twilio.twiml.voice_response import VoiceResponse, Gather, Dial and from twilio.rest import Client: Both of these make Twilio work.
import dialogflow_v2 as dialogflow: Dialogflow turns the transcribed text into an intent.
from datetime import datetime, date: Useful for manipulating dates and times in Python.
from pytz import timezone: Lets us easily set the timezone we’re working in.
import os: Using this means we can read and set environment variables.
import mysql.connector as mysql: Used to connect our Python code to MySQL databases.
    from flask import Flask, request, session
    from twilio.twiml.voice_response import VoiceResponse, Gather, Dial
    from twilio.rest import Client
    import dialogflow_v2 as dialogflow
    from datetime import datetime, date
    from pytz import timezone
    import os
    import mysql.connector as mysql
The Twilio account details and database connection details are set as environment variables which are listed in the app.yaml file. When testing locally on Windows, you can use something like set DB_HOST=database.net (on macOS or Linux, use export instead of set). If the code can’t connect to the database then it prints an error and carries on. We don’t want the entire service breaking just because we can’t connect to a database.
    # TWILIO ACCOUNT DETAILS
    # Set them as environment variables when you start
    account_sid = os.environ['TWILIO_ACCOUNT_SID']
    auth_token = os.environ['TWILIO_AUTH_TOKEN']
    client = Client(account_sid, auth_token)

    try:
        # Connect to the database
        db_host = os.environ['DB_HOST']
        db_user = os.environ['DB_USER']
        db_passwd = os.environ['DB_PASSWD']
        db_database = os.environ['DB_DATABASE']
        db = mysql.connect(
            host=db_host,
            user=db_user,
            passwd=db_passwd,
            database=db_database)
    except mysql.Error as err:
        print("Database connection error")
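One thing to watch: os.environ['DB_HOST'] raises a KeyError if the variable is missing, and that isn’t caught by except mysql.Error. Here’s a small sketch of one way to check the settings first. The helper name load_db_settings is hypothetical and isn’t part of my example code.

```python
import os

DB_KEYS = ("DB_HOST", "DB_USER", "DB_PASSWD", "DB_DATABASE")


def load_db_settings(env=os.environ):
    """Return the database settings as a dict, or None if any are missing."""
    missing = [key for key in DB_KEYS if not env.get(key)]
    if missing:
        # Report the problem but let the service carry on without stats
        print("Missing database settings:", ", ".join(missing))
        return None
    return {key: env[key] for key in DB_KEYS}
```

If it returns None, the service can skip the database connection and carry on, which matches the “print an error and keep going” approach above.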
The stats are all bundled up into a single function that can get called in any other function. The use of session variables means that a variable can be used in any function. So session["section"] is created in each section to save the name of that module and then passed back to the stats function to be put in the database.
    # Defining a stats function
    def stats_section():
        try:
            cursor = db.cursor()
            london_time = datetime.now(timezone('Europe/London'))
            timestamp = london_time.strftime('%Y-%m-%dT%H:%M:%S')
            # Work out the date on every call, not once when the app starts
            log_date = london_time.strftime('%Y-%m-%d')
            # Placeholders let the connector escape the values safely
            insert = (
                "INSERT INTO `log`(sid,section,caller_id,"
                "service_number,asr,is_correct,call_duration,timestamp,date) "
                "VALUES (%s,%s,%s,%s,%s,%s,NULL,%s,%s)"
            )
            values = (
                session["call_sid"],
                session["section"],
                session["caller"],
                request.values.get("To"),
                session["asr"],
                session["correct"],
                timestamp,
                log_date,
            )
            cursor.execute(insert, values)
            db.commit()
            cursor.close()
        except Exception:
            print("Error logging stats")
This is the code needed to do ASR. It uses Dialogflow to return an intent from a piece of text. It’s a contained function again and uses session variables so can be called from any other part of the script. The GOOGLE_APPLICATION_CREDENTIALS are stored in a json file.
texts = session["choice"] is the text that Twilio has transcribed and sends over to Dialogflow to process.
session["entered_text"] gets returned by Dialogflow as the text that was entered.
session["intent"] is the intent that Dialogflow chooses based on the text it received.
session["confidence"] is the confidence score Dialogflow returns. It’s not used in the service but useful to know in case we need to do something with it later.
    # Defining the ASR section using Dialogflow
    def speech():
        os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = (
            'creds.json')
        session_client = dialogflow.SessionsClient()
        project_id = "project_id"
        session_id = "123456123456123456123456123456123456"
        texts = session["choice"]
        language_code = "en-GB"
        df_session = session_client.session_path(project_id, session_id)
        text_input = dialogflow.types.TextInput(
            text=texts, language_code=language_code)
        query_input = dialogflow.types.QueryInput(text=text_input)
        response = session_client.detect_intent(
            session=df_session, query_input=query_input)
        session["entered_text"] = response.query_result.query_text
        session["intent"] = response.query_result.intent.display_name
        session["confidence"] = response.query_result.intent_detection_confidence
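The confidence score isn’t used in my service, but if you wanted to do something with it later, one option is to ignore low-confidence intents. This is only a sketch: the pick_intent helper, the 0.6 threshold and the "unknown" fallback name are all assumptions, not values from my code.

```python
def pick_intent(intent, confidence, threshold=0.6):
    """Return the intent name, or "unknown" when Dialogflow is not confident."""
    if not intent or confidence < threshold:
        return "unknown"
    return intent
```

You could then branch on pick_intent(session["intent"], session["confidence"]) instead of session["intent"] and send "unknown" back round to ask the question again.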
Play and Say
play() lets us play a public audio file.
say() is the alternative and does text-to-speech using Twilio’s own TTS library.
    resp.play("https://audiofile1")
    resp.say("Thank you for calling")
The Gather verb can be used to gather speech or DTMF. You can choose to gather one or both by defining the input. The speech model can be added if you’re expecting short commands or numbers to be spoken. You can define the number of digits you expect to gather using num_digits. Speech timeout being set to auto means Twilio will stop listening for speech as soon as it hears a break. The language is set to English GB. Hints help Twilio understand what the caller is likely to say. And finally, action is where the service should jump to once it’s finished gathering the speech or key presses.
gather.play() is the audio file you’d like to play that will prompt the caller to press keys or speak.
    gather = Gather(input="speech dtmf",
                    speech_model="numbers_and_commands",
                    num_digits=1,
                    speech_timeout="auto",
                    language="en-GB",
                    hints="yes, no, yeah, Yes, nope, nah, No",
                    action="/gather_welcome")
    gather.play("https://audiofile2")
    # Add the gather to the response so Twilio actually runs it
    resp.append(gather)
Once we’ve gathered our data, we end up in the /gather_welcome route. “SpeechResult” is returned by Twilio if the user said something in the gather section. We save the transcribed text as session["choice"] and pass that variable over to the speech function we defined earlier. Dialogflow will do its magic and return an intent as session["intent"].
    if "SpeechResult" in request.values:
        session["choice"] = request.values["SpeechResult"]
        speech()
        if session["intent"] == "yes":
            # Do something
            pass
        elif session["intent"] == "no":
            # Do something else
            pass
        else:
            resp.redirect("/end_section")
If Twilio recognised that DTMF was used instead of speech, “Digits” is returned in request.values.
    elif "Digits" in request.values:
        choice = request.values["Digits"]
        if choice == "1":
            # Do something
            pass
        elif choice == "2":
            # Do something else
            pass
        else:
            resp.redirect("/end_section")
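If the if/elif chains get long, the same decision could be written as a lookup. Here’s a rough sketch of that idea. The next_route helper and the /confirmed and /retry routes are made up for illustration; only /end_section comes from my code.

```python
def next_route(values, intent=None):
    """Pick the next route from Twilio's request values.

    `values` is any mapping like Flask's request.values; `intent` is the
    Dialogflow intent for a spoken answer.
    """
    # Hypothetical routes, used here only to show the lookup
    speech_routes = {"yes": "/confirmed", "no": "/retry"}
    digit_routes = {"1": "/confirmed", "2": "/retry"}
    if "SpeechResult" in values:
        return speech_routes.get(intent, "/end_section")
    if "Digits" in values:
        return digit_routes.get(values["Digits"], "/end_section")
    # Neither speech nor key presses were gathered
    return "/end_section"
```

In a route you’d call something like resp.redirect(next_route(request.values, session.get("intent"))) after running speech() for spoken answers.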
Twilio sends the caller’s number in the From parameter, which we read with request.values.get("From"). We can easily check if they’re calling on a mobile and haven’t withheld their number by seeing if their number starts with +447.
    caller = request.values.get("From")
    # Check caller is set first so a missing number doesn't raise an error
    if caller and caller.startswith("+447"):
        # Do something because we know it's a mobile
        pass
    else:
        # Handle landline or unknown numbers
        pass
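Since this check gets used in a few places, it can be handy to wrap it in a tiny helper. The name is_uk_mobile is my own for this sketch. It also copes with request.values.get("From") returning None when the parameter is missing.

```python
def is_uk_mobile(number):
    """True when a number is present and starts with +447."""
    return bool(number) and number.startswith("+447")
```

For example, is_uk_mobile(request.values.get("From")) can be used directly in an if statement.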
SMS messages are fairly easy to send with Twilio. If we recognise they’re on a mobile, we can just use request.values.get("From") to get their number and automatically send them an SMS. Otherwise we need to use a gather verb to get them to enter their mobile number and store that in a variable. The body is what we want our SMS to say. The from number is our own Twilio number which is stored under request.values.get("To").
    twilio_number = request.values.get("To")
    client.messages.create(
        body="This is the body of the SMS.",
        from_=twilio_number,
        to=session["caller"]
    )
We use a single transfer function that all parts of our service can reference. The only thing we need to do before entering the transfer function is set the number we’ll be dialling by using session["transfer_number"] = "xxxxxxxxxxx". The day of the week and the current time are set in the transfer function and are used to see if we’re going to be dialling in hours or out of hours.
    # Transfer
    @app.route("/transfer", methods=["GET", "POST"])
    def transfer():
        resp = VoiceResponse()
        caller = request.values.get("From")
        dial = Dial(caller_id=caller)

        # Look up the local time
        london_time = datetime.now(timezone('Europe/London'))
        now = london_time.strftime("%H:%M:%S")

        # Set the day of the week from the same London time.
        # Have to + 1 because Monday is 0 in Python
        day_of_week = str(london_time.weekday() + 1)

        transfer_number = session["transfer_number"]
Now we’ve set the transfer number and got the current date, time and day of the week, we need to look up the opening hours in the database for that specific transfer number. If we get a result back from the database query, we know we’re in hours and so play a message to the caller and transfer. Otherwise we know we’re out of hours. The except part comes into play when trying to connect to the database doesn’t work. In that instance, we just try and transfer anyway.
        # Look up opening hours
        try:
            cursor = db.cursor()
            # Placeholders let the connector escape the values safely
            opening_query = (
                "SELECT * FROM `transfer_times` WHERE `number` = %s "
                "AND `day_of_week` LIKE %s "
                "AND %s >= `open` AND %s < `close` LIMIT 1;"
            )
            cursor.execute(
                opening_query,
                (transfer_number, "%" + day_of_week + "%", now, now))
            opening_query_result = cursor.fetchall()
            cursor.close()

            if opening_query_result:
                # In Hours
                # Replace zero with +44
                dial_number = "+44" + transfer_number[1:]
                resp.play("https://audiofile4")
                dial.number(dial_number)
                resp.append(dial)
            else:
                # OOH
                # Stats
                session["call_sid"] = request.values.get("CallSid")
                session["section"] = "OOH"
                session["caller"] = request.values.get("From")
                session["asr"] = "na"
                session["correct"] = "na"
                stats_section()
                resp.play("https://audiofile5")
        except Exception:
            print("Opening hours lookup error")
            # Transfer anyway if the database is down
            # Replace zero with +44
            dial_number = "+44" + transfer_number[1:]
            resp.play("https://audiofile4")
            dial.number(dial_number)
            resp.append(dial)

        resp.redirect("/end_section")
        return str(resp)
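To show what the opening hours check and the “replace zero with +44” step are doing, here’s the same logic as two small Python functions. This is a sketch only: the names to_e164_uk and is_open are mine, and I’m assuming day_of_week is stored as a string of day digits such as "12345", which is what the LIKE match in the query suggests.

```python
from datetime import time


def to_e164_uk(number):
    """Swap a leading 0 for +44; leave any other number unchanged."""
    if number.startswith("0"):
        return "+44" + number[1:]
    return number


def is_open(days, opens, closes, day_of_week, now):
    """Mirror the SQL check: the day is listed and open <= now < close."""
    return str(day_of_week) in days and opens <= now < closes
```

For example, is_open("12345", time(9, 0), time(17, 0), 3, time(12, 30)) is in hours on a Wednesday, while the same call with day 6 (Saturday) is out of hours.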
This shows the flow of a generic call and all the services working together to give us a whole service.
And that’s everything we need for a successful Twilio telephone service built with Python. For a full example of the code, download from here: https://drive.google.com/file/d/1PYRT_MD9e8NTNv3RZhJKuPq-9P17BOeO/view?usp=sharing