Designing for Voice User Interfaces

Since the inception of computing, various kinds of interfaces have emerged. The most important of them has been the Graphical User Interface (GUI), and one of the most influential early implementations of the GUI came out of Xerox PARC. Since then, computing has evolved a lot: we can now train systems to predict the outcomes of events (also known as Machine Learning).

The next interface for human-computer interaction is going to be voice. Why? Because techniques such as natural language processing and speech recognition have become so capable that anyone can have a personal assistant that answers their queries almost as naturally as a human assistant would.

From standalone home devices like the Amazon Echo (powered by Alexa) to Microsoft Cortana on Windows and, of course, the popular Apple Siri on macOS and iOS, voice assistants are now mainstream among consumers.

A voice interface uses natural language processing to understand users as they speak. Once a command is accurately recognized, the system either performs an action (like making a call or sending an email) or answers a question.
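
To make the two outcomes concrete, here is a minimal Python sketch of the routing step. It assumes speech recognition and natural language processing have already turned the utterance into an intent name plus slot values (roughly what platforms like Alexa hand to a skill); the intent names, handlers, and store-hours answer are hypothetical.

```python
from typing import Callable, Dict

Slots = Dict[str, str]
Handler = Callable[[Slots], str]


# Hypothetical action handlers: a real assistant would call telephony or email services here.
def make_call(slots: Slots) -> str:
    return f"Calling {slots['contact']}."


def send_email(slots: Slots) -> str:
    return f"Sending your email to {slots['contact']}."


# Hypothetical question handler backed by a fixed answer.
def answer_store_hours(slots: Slots) -> str:
    return "The store is open from 9 am to 6 pm."


ACTIONS: Dict[str, Handler] = {
    "MakeCallIntent": make_call,
    "SendEmailIntent": send_email,
}
QUESTIONS: Dict[str, Handler] = {
    "StoreHoursIntent": answer_store_hours,
}


def handle_command(intent: str, slots: Slots) -> str:
    """Route a recognized intent either to an action or to an answer."""
    if intent in ACTIONS:
        return ACTIONS[intent](slots)
    if intent in QUESTIONS:
        return QUESTIONS[intent](slots)
    # The command was not recognized: ask the user to rephrase rather than guess.
    return "Sorry, I didn't get that. Could you rephrase?"


print(handle_command("MakeCallIntent", {"contact": "Alice"}))  # Calling Alice.
print(handle_command("StoreHoursIntent", {}))
```

Keeping actions and questions in separate tables makes it easy to see, per intent, whether the assistant is expected to do something or to say something.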

A single voice command is often not sufficient; getting the desired information can take a series of voice commands. Voice UIs are also making their way into security peripherals, smart cars, and even home appliances.

Unlike command line or graphical user interfaces, voice does not require users to physically interact with a device to perform functions, which makes it one of the most convenient types of UI available.

Types of UI

A user interface is built to ease human-computer interaction. It consists of the hardware and software components responsible for handling interactions between a user and the system. The various types of user interfaces are classified by how users interact with the system.

Some of the most widely used interfaces are described below.

Command Line Interface

A command line interface (CLI) is the oldest type of UI. Users type in commands that the computer responds to. CLIs require very little computing power, so they are well suited to server-side systems that need to keep enough RAM and CPU available to serve network traffic. However, the biggest drawback of a CLI is that users need to learn a number of commands to be able to communicate with the computer.

Graphical User Interface

The Graphical User Interface, or GUI, is the most commonly used type of UI. It relies on pointing devices or touchscreens to create a channel of communication between a user and the computer. Although GUIs have been wildly successful, they still face accessibility issues. Most importantly, they require physical interaction with the system, which may not be possible for some users.

Voice User Interface

A natural language interface, or voice UI, uses the human voice to process commands. It is a spoken interface capable of understanding natural language, and in many cases the computer responds by voice as well. The technology is commonly seen in smart assistants like Alexa, Siri, Cortana, and Google Assistant. These assistants still have limitations, such as restricted language support and the need for always-on connectivity, but these are actively being addressed.

How to Design a Voice UI

Voice UIs gained popularity with the introduction of Amazon Alexa and Google Assistant. Alexa skills have seen tremendous development, ranging from home automation to shopping. Google Assistant is now available on millions of Android devices, and people use it to make calls, send text messages, play music, and much more.

Both platforms offer developers and businesses a great opportunity to engage their audiences in unique and creative ways.

Because voice is a new paradigm of interaction, one must be careful when building voice-based UIs, as they interact with the end user in a non-traditional way.

Below, we discuss the steps you should take to design your next Voice User Interface.

Defining a VUI’s Capabilities

When it comes to voice technology, Amazon Alexa is one of the industry leaders and one of the best examples of what a good voice UI is capable of. Translation, retrieving news, finding search engine results, changing the music, or even ordering pizza: there is a lot you can do with a voice UI.

However, the capabilities you need depend entirely on the purpose of the VUI. If you are building a banking app, it is important to support voice commands for performing transactions (if the bank allows it), checking account balances, contacting customer support, and more. Similarly, VUIs for restaurants and online food delivery apps should be capable of placing orders, registering complaints, and tracking orders.

Ultimately, you are looking to convert an existing graphical user interface into a voice-based one, which means you need to account for its limitations too: some interactions might not be possible with the user’s voice alone.
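
One lightweight way to start is to write the capability list down as data and flag the steps that voice alone cannot complete. The Python sketch below does this for the banking example; the capability names and the `voice_only` flag are hypothetical, and sign-in is marked as needing manual input for the reason given in the next paragraph.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Capability:
    name: str          # hypothetical capability/intent name
    description: str
    voice_only: bool   # False if the step also needs a screen or manual input


BANKING_VUI: List[Capability] = [
    Capability("CheckBalance", "Check account balance", True),
    # Only include this if the bank allows transactions by voice.
    Capability("TransferFunds", "Perform a transaction", True),
    Capability("ContactSupport", "Get in touch with customer support", True),
    Capability("SignIn", "Link the user's bank account", False),
]


def needs_fallback(capabilities: List[Capability]) -> List[str]:
    """Names of capabilities that cannot be completed by voice alone."""
    return [c.name for c in capabilities if not c.voice_only]


print(needs_fallback(BANKING_VUI))  # ['SignIn']
```

Each name returned by `needs_fallback` marks a place where the voice flow has to hand the user off to a screen, such as the companion mobile app.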

A good example is authentication when integrating external services. Both Alexa and Google Assistant support account linking through the OAuth 2.0 protocol, so the user ultimately needs to sign in to the external application manually before using it by voice.
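
For illustration, the sketch below builds the kind of authorization request that starts the OAuth 2.0 authorization code flow (RFC 6749, Section 4.1.1), which is what the user reaches when they sign in manually. On Alexa and Google Assistant the platform generates this request from the account-linking settings you configure, so you would not normally write it yourself; the endpoint, client ID, redirect URI, and scope here are hypothetical.

```python
import secrets
from typing import Tuple
from urllib.parse import urlencode

# Hypothetical authorization endpoint of the external (e.g. accounting) service.
AUTHORIZE_ENDPOINT = "https://accounting.example.com/oauth/authorize"


def build_authorization_url(client_id: str, redirect_uri: str, scope: str) -> Tuple[str, str]:
    """Return the sign-in URL and the random `state` value to verify on the redirect."""
    state = secrets.token_urlsafe(16)  # protects against cross-site request forgery
    params = {
        "response_type": "code",  # authorization code grant
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    }
    return f"{AUTHORIZE_ENDPOINT}?{urlencode(params)}", state


url, state = build_authorization_url("my-skill", "https://example.com/callback", "invoices.read")
print(url)
```

After the user signs in, the service redirects to `redirect_uri` with a `code` and the same `state` value; the code is then exchanged for an access token, which the assistant platform passes along with the user's subsequent requests to the skill.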

Customer Journey Mapping

Once you identify the necessary functions of your voice UI, you need to understand your customer’s journey, i.e., how they will navigate your VUI using their voice.

Customers normally engage with a voice assistant to express their requirements, and a good voice assistant should be capable of guiding a customer through the process. If customers pose certain queries very often, developers can implement ready-made answers for those queries.

For example, some business users might want to check their outstanding invoices using an Alexa skill. To do so, the skill needs access to their ERP/accounting system. As explained earlier, the OAuth protocol must be followed to integrate external systems through their REST APIs, so the user logs in to their system manually using the Alexa mobile app. Once access is granted, the Alexa skill can fetch data.

Here’s what the journey might look like:

  1. The user logs in to the accounting system through the Alexa app to obtain an OAuth token.
  2. User: “How many invoices are outstanding?”
  3. Alexa: “For which company?”
  4. User: “Northwind Corporation”
  5. Alexa: “There are 10 outstanding invoices” (after calling the accounting system’s APIs)
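
Steps 2 to 5 of this journey can be sketched as a small multi-turn handler in Python: if the user leaves out the company, the skill remembers the pending question in session state and asks for it. This is a simplified stand-in for the platforms' own dialog management (Alexa, for instance, offers a `Dialog.ElicitSlot` directive); the intent names, the session dictionary, and the injected `fetch` function that would call the accounting system's REST API with the OAuth token are all hypothetical.

```python
from typing import Callable, Dict, Optional

# Hypothetical accounting API call: (access_token, company) -> number of outstanding invoices.
# In a real skill this would be an HTTPS request with an "Authorization: Bearer <token>" header.
FetchOutstanding = Callable[[str, str], int]


def _report(token: str, company: str, fetch: FetchOutstanding) -> str:
    count = fetch(token, company)
    verb, noun = ("is", "invoice") if count == 1 else ("are", "invoices")
    return f"There {verb} {count} outstanding {noun} for {company}."


def handle_turn(session: Dict[str, str], intent: str, slots: Dict[str, str],
                token: Optional[str], fetch: FetchOutstanding) -> str:
    # Step 1 of the journey: without an OAuth token the skill cannot reach the API.
    if not token:
        return "Please link your accounting account in the Alexa app first."
    if intent == "OutstandingInvoicesIntent":
        company = slots.get("company")
        if not company:
            session["pending"] = intent  # remember which question we still owe an answer to
            return "For which company?"
        return _report(token, company, fetch)
    if intent == "ProvideCompanyIntent" and session.pop("pending", None) == "OutstandingInvoicesIntent":
        return _report(token, slots["company"], fetch)
    return "Sorry, I can only help with outstanding invoices right now."


def fake_fetch(token: str, company: str) -> int:
    return 10  # stands in for the accounting system's API


session: Dict[str, str] = {}
print(handle_turn(session, "OutstandingInvoicesIntent", {}, "token-123", fake_fetch))
# For which company?
print(handle_turn(session, "ProvideCompanyIntent", {"company": "Northwind Corporation"}, "token-123", fake_fetch))
# There are 10 outstanding invoices for Northwind Corporation.
```

Injecting `fetch` keeps the dialog logic testable without a live accounting system.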

This is just an example. There are multiple ways to create a customer’s journey.

Competition Analysis

If your competitors already use voice UIs to help their customers, you should study how their voice interactions are implemented. Identifying each app’s use case, the voice commands it supports, and the overall customer feedback can help you build your own voice UI.

Building on our previous example, you may look at what kinds of questions your competitors’ VUIs can answer. Can they answer analytical questions (“Who is my most loyal customer?”)? Can they ask the user follow-up questions? Do they perform better (shorter response times)?

Competition analysis also helps you understand which features are trending and which ones can bring you more users.

Language Nuances and Priority

Users often pose multiple requests at the same time, so it is important to build prioritization into a voice UI. The voice UI must also identify which of those requests need to be addressed.
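
As a minimal sketch, suppose the natural language layer has already split an utterance such as “track my order and cancel the other one” into separate intents. The Python below sets aside requests the VUI cannot handle (so it can say so) and orders the rest by a priority table; the intent names and priorities are hypothetical, using the food delivery example.

```python
from typing import List, Tuple

# Hypothetical priorities for a food delivery VUI: lower numbers are handled first.
PRIORITY = {
    "CancelOrderIntent": 0,  # time-sensitive, so it goes first
    "RegisterComplaintIntent": 1,
    "TrackOrderIntent": 2,
    "PlaceOrderIntent": 3,
}


def plan_requests(intents: List[str]) -> Tuple[List[str], List[str]]:
    """Split intents into supported ones (ordered by priority) and unsupported ones."""
    supported = [i for i in intents if i in PRIORITY]
    unsupported = [i for i in intents if i not in PRIORITY]
    # sorted() is stable, so requests with equal priority keep the order they were spoken in.
    return sorted(supported, key=PRIORITY.__getitem__), unsupported


print(plan_requests(["TrackOrderIntent", "WeatherIntent", "CancelOrderIntent"]))
# (['CancelOrderIntent', 'TrackOrderIntent'], ['WeatherIntent'])
```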

The same command can also be spoken in various ways, and factors like accents need to be accounted for as well. A good voice UI should understand what a customer is trying to express with minimal effort on the customer’s part.

Once again building on our previous example, there are multiple ways to ask for outstanding invoices, and all of them should be incorporated when building a voice-based UI.
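
Voice platforms typically handle this by building a language model from the sample utterances you provide for each intent. As a rough stand-in, the Python sketch below normalizes the user's words and fuzzy-matches them against a hypothetical list of sample utterances using the standard library's `difflib`; the utterances, intent names, and 0.75 cutoff are assumptions for illustration only.

```python
import difflib
import re
from typing import Optional

# Hypothetical sample utterances, several per intent.
SAMPLE_UTTERANCES = {
    "how many invoices are outstanding": "OutstandingInvoicesIntent",
    "how many unpaid invoices do i have": "OutstandingInvoicesIntent",
    "what invoices are still open": "OutstandingInvoicesIntent",
    "list my overdue invoices": "OutstandingInvoicesIntent",
    "who is my most loyal customer": "TopCustomerIntent",
}


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return " ".join(text.split())


def match_intent(utterance: str, cutoff: float = 0.75) -> Optional[str]:
    """Return the intent of the closest sample utterance, or None if nothing is close enough."""
    matches = difflib.get_close_matches(
        normalize(utterance), list(SAMPLE_UTTERANCES), n=1, cutoff=cutoff
    )
    return SAMPLE_UTTERANCES[matches[0]] if matches else None


print(match_intent("How many unpaid invoices do I have?"))  # OutstandingInvoicesIntent
print(match_intent("How many invoices are outstanding, please?"))  # OutstandingInvoicesIntent
print(match_intent("What's the weather like?"))  # None
```

Returning `None` rather than the nearest guess lets the VUI reprompt the user instead of answering the wrong question.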

Prototyping

VUIs need to be prototyped with flowcharts to identify the various interactions. This helps developers add or remove specific commands from the pool of possible interactions.

Dialog flows can graphically represent the conversations between a user and a voice assistant. Once a prototype VUI is ready, it needs to go through rigorous user testing to identify flaws and address them.
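
A dialog flowchart can also be kept as data, which makes it easy to sanity-check before user testing. The Python sketch below stores a hypothetical flow for the invoice example as a mapping from each state to the states it can lead to, then uses a breadth-first search to report states the conversation can never reach and transitions that point to undefined states. The state names are assumptions.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical dialog flow: each state lists the states it can move to next.
DIALOG_FLOW: Dict[str, List[str]] = {
    "Start": ["LinkAccount", "AskCompany"],
    "LinkAccount": ["Start"],
    "AskCompany": ["ReportInvoices", "AskCompany"],  # re-ask if the company is not understood
    "ReportInvoices": ["End"],
    "ReportTopCustomer": ["End"],  # designed, but nothing leads here yet
    "End": [],
}


def unreachable_states(flow: Dict[str, List[str]], start: str = "Start") -> Set[str]:
    """States that no path from `start` reaches (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in flow.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return set(flow) - seen


def undefined_targets(flow: Dict[str, List[str]]) -> Set[str]:
    """Transition targets that have no definition in the flow."""
    return {nxt for targets in flow.values() for nxt in targets if nxt not in flow}


print(unreachable_states(DIALOG_FLOW))  # {'ReportTopCustomer'}
print(undefined_targets(DIALOG_FLOW))   # set()
```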

Prototyping can easily be done in the respective developer consoles that Amazon and Google provide for Alexa skills and Google Assistant apps.

In our example, you may need to build a backend server to test the API integration, along with an OAuth gateway to test the authorization flow. Always cover both positive and negative test cases when building prototypes.
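
As a sketch of what positive and negative cases might look like, the Python below tests a hypothetical response function for the invoice question against stubbed versions of the accounting API instead of a real backend. The function name, the `AuthError` exception, and the messages are assumptions; the stubs play the role of the test backend and OAuth gateway.

```python
from typing import Callable, Optional


class AuthError(Exception):
    """Raised by the (hypothetical) accounting API when it rejects a token."""


def invoice_answer(token: Optional[str], company: str,
                   fetch: Callable[[str, str], int]) -> str:
    if not token:
        return "Please link your account in the Alexa app."
    try:
        count = fetch(token, company)
    except AuthError:
        return "Your sign-in has expired. Please link your account again in the Alexa app."
    noun = "invoice" if count == 1 else "invoices"
    return f"{company} has {count} outstanding {noun}."


def test_valid_token() -> None:  # positive case
    answer = invoice_answer("valid", "Northwind Corporation", lambda t, c: 10)
    assert answer == "Northwind Corporation has 10 outstanding invoices."


def test_missing_token() -> None:  # negative case: account never linked
    assert invoice_answer(None, "Northwind Corporation", lambda t, c: 10).startswith("Please link")


def test_rejected_token() -> None:  # negative case: the OAuth token was revoked or expired
    def reject(token: str, company: str) -> int:
        raise AuthError()

    assert "expired" in invoice_answer("stale", "Northwind Corporation", reject)


if __name__ == "__main__":
    test_valid_token()
    test_missing_token()
    test_rejected_token()
    print("all tests passed")
```

The same test functions can also be collected by a test runner such as pytest, since they follow its `test_` naming convention.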

Conclusion

With the market for voice-enabled apps and devices on the rise, it is important for businesses to implement voice UIs that offer value to customers. VUIs require a lot more research and programming than traditional GUIs, but they are much more convenient for the modern-day audience.

With the right analysis, journey mapping, and prototyping, however, building voice-based interfaces becomes far more manageable. Both Google and Amazon provide extensive resources for developers to build voice-based apps on their respective ecosystems.

We at CitrusLeaf are also involved in building voice interfaces for business apps. We can integrate your existing ERP systems with personal voice assistants like Alexa or Google Assistant.

Looking to level up your business with voice-based apps? Look no further. We’re here to help you. Reach out to us at