Branding Bots Part 1: Voice & Tone for Conversational Apps

Lauren Golembiewski

This post is part one of my series on branding bots. Throughout the series, I explore the most important conversational branding elements and how conversation designers and teams can apply those elements to their voice and chat apps.

Conversational Branding Basics

Conversational branding consists of all the elements that define how a brand should be expressed in voice and chat apps. Conversational branding builds from and interconnects with established, traditional brand elements. 

Great brands connect with us emotionally and elicit feelings. They can make us laugh, think, take action, and so on. Brands are the set of qualities, attributes, values, benefits, clientele, and personality we associate with companies (and now individuals with the rise of the “personal brand”). Almost everyone can think of a brand they love or hate and intrinsically understand the emotional connection based on the traits they associate with it. 

With the massive adoption of conversational technology by companies and consumers alike, branding bots must be a priority for teams building voice and chat experiences. Like any other product, a bot's branding should be clear, consistent, and aligned with any established brand guidelines. Creating a strong conversational brand is even more important if the bot is the company’s primary interface.

Branding happens whether it’s strategically planned or not; branding is reinforced every time and in every way a company communicates with its audience. A careful branding process backed by solid research and thoughtfully crafted brand elements produces a return for companies that’s far greater than the sum of its parts. Designing a bot is a great opportunity for you to establish, evaluate, or extend a brand system.

What are Voice & Tone?

Whether working with a well-established brand or building a brand from the ground up, you have a unique set of brand elements to consider when branding voice and chat apps.

Voice and tone are the cornerstones of a conversational brand. Since bots interact with users through voice and chat, you must carefully consider the language the bot uses, how the bot’s voice evokes the defined brand qualities, and how the voice modulates its tone to adapt to the situation.

Voice

Voice is the quality of the words the bot uses and how those words reflect the brand. Clearly defined company mission, vision, and values provide a strong foundation for the voice.

Perform conversational branding exercises to explore the brand’s voice, then document descriptions and examples that define it. It’s important for companies that have multiple bots to consider how voice will be applied in each bot to ensure consistency, appropriateness, and alignment with the brand. If the brand has an established voice, extend and apply it to the bot.

Tone

Tone is the way the voice changes in certain situations. The bot’s tone is an extension of its voice and shifts depending on the state of the user, conversation, and system. Tone helps conversations between bots and users seem more natural. 

Several factors affect when and how the tone of voice should modulate during the bot’s conversational interactions: the user’s emotional state, the conversation state, the system state, and the delivery method (the conversational channel through which the interaction is delivered).

Clearleft’s language refresh is a great example of (re)building voice and tone. Mailchimp also has a fantastic content style guide that emphasizes voice and tone. Both brands provide clear explanations and solid examples to document their voice strategy. 

Voice & Tone Considerations

User’s Emotional State

The user’s emotional state is an important cue to determine the appropriate tone for the bot at any given time in the conversation. If the user is confused or upset, the bot should speak differently than if the user is happy or content. Understanding the user’s emotional state requires research and data analysis. Performing user research will help you understand how users’ emotional states shift as they interact with the bot and perform tasks. Once key emotional states are identified, document how the bot’s tone changes to address them.

Advancements in natural language understanding (NLU) allow the sentiment of users’ language to be programmatically analyzed. Sentiment analysis returns a simple positivity score that provides some insight into the user’s emotional state during a conversational interaction. Improvements to this technology provide more nuanced insight into the emotional quality of users’ language. Tools like IBM Watson’s Tone Analyzer and platforms like Affectiva and Behavioral Signals can be used to understand the tone and language style of both the bot and its users.
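
As a rough illustration of how a positivity score might drive tone, here is a minimal Python sketch. The score range of -1.0 to 1.0, the thresholds, the tone names, and the sample replies are all assumptions made up for this example. They don't come from any particular sentiment service, so adapt them to whatever your NLU tool actually returns.

```python
def choose_tone(sentiment_score: float) -> str:
    """Pick a tone label from a sentiment score.

    Assumes the score runs from -1.0 (very negative) to 1.0 (very positive).
    The thresholds and tone names are illustrative only.
    """
    if not -1.0 <= sentiment_score <= 1.0:
        raise ValueError("sentiment_score must be between -1.0 and 1.0")
    if sentiment_score < -0.3:
        return "empathetic"  # user seems upset or confused: reassure them
    if sentiment_score > 0.3:
        return "playful"     # user seems happy: room for more personality
    return "neutral"         # unclear signal: stay plain and helpful


# Hypothetical phrasings of the same reply, one per tone.
REPLIES = {
    "empathetic": "Sorry about that. Let's try finding your tickets another way.",
    "playful": "Huzzah! Your tickets are right here.",
    "neutral": "Here are your tickets.",
}


def reply_for(sentiment_score: float) -> str:
    """Return the reply variant that matches the detected tone."""
    return REPLIES[choose_tone(sentiment_score)]
```

Keeping the tone decision separate from the reply copy means the documented tone variations can be revised without touching the logic that selects them.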

The Sound on Sound Music Festival bot Voxable designed employs the playful renaissance voice of the brand as it provides options to help users get information. 

Conversation State 

For a conversational interaction to seem natural, the bot and the user must maintain a shared understanding throughout the conversation, and that understanding should shape the language the bot uses. Confirm a shared understanding between the bot and the user by having the bot: 

  • Reiterate the user’s request in its reply.
  • Signal an understanding of what the user said.
  • Request more information to complete a task.

A user response like “Home” could mean several things in a voice interaction. For instance, the user might want to visit a home menu or take some action related to their home. Because the bot is aware of the conversational context in this example, it knows “Home” refers to a list of locations managed by the user.
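
To make the “Home” example a little more concrete, here is a small Python sketch of context-aware interpretation. The `ConversationContext` structure, the `awaiting` field, and the reply wording are hypothetical and exist only for illustration. A real bot would rely on whatever context mechanism its platform provides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ConversationContext:
    """What the bot is currently waiting on (hypothetical structure)."""
    awaiting: Optional[str] = None
    locations: List[str] = field(default_factory=list)


def interpret(utterance: str, context: ConversationContext) -> str:
    """Resolve an ambiguous utterance using the conversation state."""
    text = utterance.strip().lower()
    if context.awaiting == "location":
        for name in context.locations:
            if name.lower() == text:
                # Reiterate the request so the user knows what was understood.
                return f"Okay, using your {name} location."
        # Request more information to complete the task.
        options = ", ".join(context.locations)
        return f"I didn't find that location. You can choose from: {options}."
    if text == "home":
        # Without location context, "home" is treated as the main menu.
        return "Taking you back to the main menu."
    return "Sorry, I didn't catch that. What would you like to do?"
```

With `ConversationContext(awaiting="location", locations=["Home", "Office"])`, the reply to “Home” confirms the saved location. With an empty context, the same word sends the user to the main menu.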

System State

As the user interacts with the bot, the state of the system (the underlying data or the API) changes. Communicating system state changes to the user is important. Modulate the bot's tone to elicit a positive emotional reaction from the user.

  • Error - Be clear about who is responsible for an error and how the user or the bot can resolve it. Make sure the tone is appropriate, especially if it’s the system causing the failure.
  • Success - Note a successful end state to inform the user they’ve reached the end of a task or accomplished a goal. 
  • Processing - Indicate when a process is being carried out by the bot and give the user an estimate for how long it should take. 

Flash messages and typing indicators can signal when the bot has understood and is processing the user’s request in a chat app.
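
One way to keep these three states consistent is to route every status message through a single function. The Python sketch below is only an illustration under stated assumptions: the `SystemState` enum, the `status_message` helper, its parameters, and the message wording are all invented for this example.

```python
from enum import Enum
from typing import Optional


class SystemState(Enum):
    ERROR = "error"
    SUCCESS = "success"
    PROCESSING = "processing"


def status_message(
    state: SystemState,
    task: str,
    system_at_fault: bool = False,
    eta_seconds: Optional[int] = None,
) -> str:
    """Build a status message whose tone fits the system state."""
    if state is SystemState.PROCESSING:
        if eta_seconds is None:
            return f"Working on {task} now."
        # Give the user an estimate when one is available.
        return f"Working on {task} now. This should take about {eta_seconds} seconds."
    if state is SystemState.SUCCESS:
        # Mark the end of the task clearly.
        return f"All done. I finished {task}."
    if system_at_fault:
        # Own the failure when the system caused it.
        return f"Sorry, something went wrong on our end with {task}. Please try again in a moment."
    # Otherwise tell the user how they can resolve it.
    return f"I couldn't finish {task} with the details provided. Could you check them and try again?"
```

For example, `status_message(SystemState.PROCESSING, "your ticket search", eta_seconds=10)` produces a message that names the task and sets an expectation for how long it will take.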

Delivery Method

Delivery method should also affect the language the bot uses. Consider the conversational channel and whether the message is being delivered to the user through a chat platform like Facebook Messenger or a voice-based device like Amazon Alexa or Google Assistant. 

  • Text-based conversations - Chat platforms provide visual feedback, asynchronous communication, and a record of previous interactions. Bots can deliver UI affordances that increase understanding and usability.
  • Voice-based conversations - Traditional voice platforms provide little to no visual elements to increase user understanding, so optimizing language for voice is essential. 
  • Multimodal conversations - Platforms that allow multiple modes of interaction, including text and voice. Smart displays and smartphones are examples of multimodal platforms that support voice interaction along with a visual display.
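
The channel differences above can be sketched in Python as one message rendered three ways. The channel names and the payload keys (`text`, `speech`, `quick_replies`) are assumptions for illustration. They don't match the message format of Facebook Messenger, Alexa, Google Assistant, or any other platform.

```python
from typing import Dict, List


def spoken_list(options: List[str]) -> str:
    """Join options the way a person would say them aloud."""
    if len(options) <= 1:
        return "".join(options)
    if len(options) == 2:
        return f"{options[0]} or {options[1]}"
    return ", ".join(options[:-1]) + f", or {options[-1]}"


def render(message: str, options: List[str], channel: str) -> Dict[str, object]:
    """Adapt one message to a delivery method (hypothetical payload shape)."""
    if channel == "chat":
        # Chat can show options as tappable UI affordances.
        return {"text": message, "quick_replies": list(options)}
    if channel == "voice":
        if not options:
            return {"speech": message}
        # Voice has no screen, so the options must be spoken.
        return {"speech": f"{message} You can say {spoken_list(options)}."}
    if channel == "multimodal":
        # Speak the prompt briefly and let the display carry the options.
        return {"speech": message, "text": message, "quick_replies": list(options)}
    raise ValueError(f"Unknown channel: {channel}")
```

Calling `render("What would you like to know?", ["lineup", "tickets", "map"], "voice")` folds the options into the spoken sentence, while the `"chat"` version leaves them as separate quick replies.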

Voice & Tone Elements

There are other elements beyond language and word choice that affect the bot’s voice and tone such as emojis, GIFs, and sound effects. Consider the conversational channel (e.g. Facebook Messenger, Alexa, Google Assistant) where the bot will interact with users and which channel elements are appropriate to express the conversational brand. 

Chat Apps

Chat apps have affordances in addition to plain text to help convey voice and tone throughout a conversational interaction. Each conversational channel supports specific elements, but these are some of the most popular:

  • Emoji - A visual representation of an emotion, object, or symbol built into almost every text interface that can be used to replace or emphasize key terms and concepts
  • Image - A graphic displaying visual information (e.g. png or jpg)
  • Video and animation - Short moving visual images (e.g. video clips and gif)
  • Audio clip - Sound content (e.g. wav and mp3)

Google Assistant provides several affordances for designers crafting bot messages.

Voice-First Apps

Because voice is the primary mode of interaction in voice-first apps, they support many elements to convey voice and tone such as:

  • Voice selection - The type of voice the bot uses (i.e. synthesized or recorded by a voice actor)
  • Speech Synthesis Markup Language (SSML) - A standard way to control how a synthesized voice speaks by controlling aspects like speed, pitch, and pauses
  • Sonic branding - Audio assets like sonic logos, sound effects, and music that help ground users in the brand expression
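
To show what SSML looks like in practice, here is a minimal Python sketch that wraps a line of text in standard `speak`, `prosody`, and `break` elements. The `to_ssml` helper and its parameters are hypothetical. Voice platforms differ in which SSML elements and attribute values they support, so check the platform's documentation before relying on any of them.

```python
from xml.sax.saxutils import escape, quoteattr


def to_ssml(text: str, rate: str = "medium", pitch: str = "medium", pause_ms: int = 0) -> str:
    """Wrap text in SSML that sets speaking rate and pitch.

    rate and pitch take SSML prosody labels such as "slow" or "low".
    pause_ms adds a pause before the text when greater than zero.
    """
    # Escape characters like & and < so the markup stays well formed.
    body = f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>{escape(text)}</prosody>"
    if pause_ms > 0:
        body = f'<break time="{int(pause_ms)}ms"/>' + body
    return f"<speak>{body}</speak>"


# A calmer delivery for an apology: slower, lower, with a short pause first.
apology = to_ssml("Sorry about that. Let's try again.", rate="slow", pitch="low", pause_ms=300)
```

The same sentence can then be delivered with different tones by changing only the rate, pitch, and pause values, which keeps the wording consistent with the brand voice.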

Multimodal Apps

Multimodal apps employ visual, aural, and gestural elements to convey voice and tone. Examples of multimodal apps include: 

  • Companion app - A mobile app that many voice platforms offer to support the voice apps built on their platform (e.g. Amazon Alexa’s mobile app enables the delivery of images and text to users via cards when they interact with an Alexa skill) 
  • Smart display - An extension of a smart speaker that provides a visual interface for user interaction (e.g. Amazon Echo Show, Google Home Hub)
  • Native smartphone app - A mobile app that can be completely customized or integrated with smartphone assistants like Siri or Google Assistant
  • Website app - A multimodal interface on a website made possible by the rise of voice on the web 
  • Wearables and IoT devices - Personal devices like watches that contain a multitude of different ways to gather and relay information back to users

Crafting a conversational experience that effectively uses voice and tone elements takes experimentation and iteration. It’s easier to create a framework for how the bot will employ these conversational branding elements once you have a feeling for how they all work together and affect the overall user experience.


Continue on to Branding Bots Part 2: Persona for Conversational Apps to explore another key element of crafting a conversational brand—the brand persona.

Check out the entire Branding Bots series: