Taking care of work-related and personal tasks leaves users little time to stay informed. Audiosume reads out the latest news headlines, so users stay up to date on recent events while they focus on work or personal responsibilities. It provides a satisfying feeling of being productive while catching up on the news.

Besides covering the complete design process, I was responsible for setting up a design and development process that matched the requirements of a fast-paced environment.

Research

We started by listing the questions that needed answers. Among others, we focused on:

1) How, when and where do people consume the news?
2) What demographic and behavioral traits influence the probability of someone listening to the news?
3) Why do people consume news?
4) What are the primary devices for news consumption?
5) Why do users listen to podcasts and radio?
6) Why do people use voice assistants?
7) What kind of content do people consume on smart speakers?
8) How can we establish an emotional connection between the product and the user?
9) What type of personality should the product have and how should the personality traits be communicated?
10) Who are the competitors?

We conducted primary and secondary research to answer these questions.

When we talked with users about why they use podcast apps, productivity came up frequently. Users often mentioned being able to work on something else while listening. At that point, we started to consider productivity as a selling point for Audiosume.

Besides services that compete directly with Audiosume, we also looked at products used in the same situations as Audiosume, such as commuting and working in the office.

audiosume - research - app store comments@2x.png

App Store reviews of podcast and news apps helped us further assess why people use particular apps. The reviews revealed the environments users are in when using specific apps, the features they find useful, and what they dislike.

audiosume - research - secondary research.png

Because of its importance to the advertising industry, media consumption is a popular research topic. A large number of research papers gave us insight into how people consume news and why.

Based on conversations with users, competitor analysis, and published research, we used the Kano model to visualize how important each planned feature is from the users' point of view.
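The case study doesn't say how features were sorted into Kano categories. A common approach is the Kano questionnaire: each respondent says how they would feel if a feature were present (functional question) and absent (dysfunctional question), and the answer pair is looked up in the standard Kano evaluation table. The Swift sketch below encodes that table and takes a simple majority vote per feature. It illustrates the standard method under that assumption; the type and function names and the sample answers are hypothetical, not the team's actual procedure or data.

```swift
// Possible answers to both Kano questions:
// functional ("How would you feel if the feature were present?") and
// dysfunctional ("How would you feel if it were absent?").
enum Answer: Int {
    case like = 0, mustBe, neutral, liveWith, dislike
}

enum KanoCategory: String {
    case attractive = "A", oneDimensional = "O", mustBe = "M"
    case indifferent = "I", reverse = "R", questionable = "Q"
}

// Standard Kano evaluation table.
// Rows: functional answer; columns: dysfunctional answer (both in Answer order).
let evaluationTable: [[KanoCategory]] = [
    [.questionable, .attractive,  .attractive,  .attractive,  .oneDimensional], // like
    [.reverse,      .indifferent, .indifferent, .indifferent, .mustBe],         // must-be
    [.reverse,      .indifferent, .indifferent, .indifferent, .mustBe],         // neutral
    [.reverse,      .indifferent, .indifferent, .indifferent, .mustBe],         // live with
    [.reverse,      .reverse,     .reverse,     .reverse,     .questionable],   // dislike
]

func classify(functional: Answer, dysfunctional: Answer) -> KanoCategory {
    evaluationTable[functional.rawValue][dysfunctional.rawValue]
}

// Most frequent category across all respondents for one feature.
// Ties are broken arbitrarily; a full analysis would report the whole distribution.
func dominantCategory(of responses: [(functional: Answer, dysfunctional: Answer)]) -> KanoCategory? {
    var counts: [KanoCategory: Int] = [:]
    for response in responses {
        counts[classify(functional: response.functional, dysfunctional: response.dysfunctional), default: 0] += 1
    }
    return counts.max { $0.value < $1.value }?.key
}

// Hypothetical answers from three respondents about one feature.
let responses: [(functional: Answer, dysfunctional: Answer)] = [
    (.like, .neutral), (.like, .liveWith), (.neutral, .dislike),
]
let category = dominantCategory(of: responses)  // attractive (2 of 3 votes)
```

Plotting the dominant category of each planned feature is one way to produce the kind of overview the Kano model is used for.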

Project Development

Because the project was funded by Google, we needed to hit milestones along the way. That required us to define the areas where we wouldn't compromise and the areas where we'd go with the minimum viable option.

One example of choosing the minimum viable option is the voice that reads out the news. We use text-to-speech (TTS): the voice quality is decent, and the implementation is fast. A more complex conversational UI, where users could give primary commands by voice, was discarded early on. Implementing it would require more time and resources; detecting when the user has finished talking (endpoint detection) is one example of that complexity.

Product Character

Every product has a character: the way people perceive it. We needed to define Audiosume's character and decide how its traits would be communicated to the user.

People perceive a product that conveys information through voice differently from software that doesn't, which makes the question of product character even more relevant.

audiosume - product character - as an assistant.png

Early discussions about the product revealed that it feels like having an assistant: someone who curates the sources and informs you about events quickly and to the point. It's subtle, there when the user needs it and gone when the job is done. People should feel that the assistant is working for them.

audiosume - product character - not to formal.png

The interaction shouldn't feel too formal or distant; the assistant must connect with the user on a personal level. One way Audiosume does that is by addressing users by their first name. Hearing your name spoken aloud is satisfying.

audiosume - product character - circumstances.png

Another way to establish a connection between the product and the user is to acknowledge the user's circumstances. Audiosume greets the user according to the time of day or the day of the week.
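A greeting that reacts to the time of day and the day of the week, and uses the first name, can be expressed as a small pure function. The Swift sketch below is hypothetical: the hour thresholds, the extra Monday line, and all names are illustrative assumptions rather than Audiosume's actual copy. The weekday numbering mirrors Foundation's Gregorian calendar convention (1 = Sunday), so a value from `Calendar.current.component(.weekday, from: date)` could be mapped onto it.

```swift
// Weekday numbering follows Foundation's Gregorian calendar convention (1 = Sunday).
enum Weekday: Int {
    case sunday = 1, monday, tuesday, wednesday, thursday, friday, saturday
}

// Hypothetical greeting builder; thresholds and phrasing are assumptions.
func greeting(firstName: String, hour: Int, weekday: Weekday) -> String {
    let salutation: String
    switch hour {
    case 5..<12:  salutation = "Good morning"
    case 12..<18: salutation = "Good afternoon"
    default:      salutation = "Good evening"  // late evening and night
    }
    var text = "\(salutation), \(firstName)."
    // Acknowledge the day of the week where it changes the context.
    if weekday == .monday {
        text += " Here's what happened over the weekend."
    }
    return text
}

print(greeting(firstName: "Anna", hour: 8, weekday: .monday))
// Good morning, Anna. Here's what happened over the weekend.
```

Keeping the function free of clock and calendar calls makes each combination of hour and weekday easy to preview and test.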

audiosume - product character - visual design.png

The color scheme and typography also express the character of a subtle assistant that does the work and gets out of the way. Instead of being colorful, the app uses a limited palette, and the font styles lean toward thin and subtle rather than bold and loud.

Text to Speech (TTS)

From early on, we had to address the limitations of TTS. Pronunciation issues and the tone of the voice made us reconsider how to use voice to define the product character. Because the voice is monotone, even subtle jokes are out of the question. That is a real limitation, since making people laugh is one way to establish an emotional connection.

In some cases, we had to make grammatical adjustments. Because some commas produced unnaturally long pauses in the readout, we omitted them in certain places.
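One simple way to apply such adjustments is to keep a hand-curated list of phrases whose following comma causes the long pause and to strip those commas from the string handed to the speech engine. The Swift sketch below assumes that approach; the function name and the phrase list are hypothetical, and the case study doesn't say how the adjustments were actually made.

```swift
import Foundation

/// Returns a copy of `text` prepared for the speech engine.
/// Commas directly after any of the given phrases are dropped so the TTS voice
/// doesn't insert an unnaturally long pause there. The phrase list is hypothetical
/// and would be curated by hand after listening to the readout.
func prepareForSpeech(_ text: String, removingCommasAfter phrases: [String]) -> String {
    var spoken = text
    for phrase in phrases {
        spoken = spoken.replacingOccurrences(of: phrase + ",", with: phrase)
    }
    return spoken
}

let onScreen = "Good morning, Anna. Here are today's headlines."
let spoken = prepareForSpeech(onScreen, removingCommasAfter: ["Good morning", "Good evening"])
// spoken == "Good morning Anna. Here are today's headlines."
```

Because the function returns a new string, the text shown on screen can keep its original punctuation.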

We tested the TTS with users right at the start of the project, and it proved good enough to continue. One reason is the short session time: getting through the latest headlines takes no more than a couple of minutes.

Information Architecture

The app layout lets the user reach any content segment by swiping in the appropriate direction. The screen transitions emulate a spatial layout, which makes it easy to picture where a particular piece of content sits relative to the current position in the app.

Layout Considerations

At first, we wanted to build a layout we internally called Nodes UI, in which each node represents one news headline.

audiosume - layout considerations - nodes UI flexibility.png

Nodes offer flexible interaction. You could share a headline by swiping its node up, or tap and hold it to bookmark the article and read it in full in the browser later.

audiosume - layout considerations - nodes visualizing news.png

A node could also visualize how much new content is available and whether there is breaking news. Unfortunately, we didn't go with the Nodes UI because of its complex implementation, even though people gave positive feedback when playing with the wireframe prototype.

audiosume - layout considerations - single control.png

Another alternative had a single control on the screen that could be used to pause, resume, share, and visit the news source online. Sound waves emanate from beneath the control.

Showing the wireframes to users gave us an early insight that was confirmed again later: even though the product relies on voice, users wanted to see the headline and the source during the readout. For that reason, the single-control UI wasn't an appropriate option.

audiosume - layout considerations - headline and source.png

The current version of Audiosume includes the headline and the source.

audiosume - layout considerations - ipad.png

The iPad version follows the same layout considerations as the iPhone version. The difference is the size of the headline text box, which spans only two-thirds of the screen width. Together with the vertical placement of the icons, this emphasizes the vertical flow of the iPad screen and of the device itself.

Voice Visualization

audiosume - speech visualization - early prototype.png

Early prototypes included a default voice visualization. We dropped it for three reasons. First, it takes up valuable space and clutters the UI. Second, this kind of visualization is common in conversational UIs and might mislead users into thinking that interacting with the product is similar to using Siri. Third, it disrupts the vertical flow of the layout and of the vertical smartphone screen.

audiosume - speech visualization - speech bubbles.png

The current voice visualization consists of blurred bubbles that appear and expand in the background. The color of the bubbles matches the current weather at the user's location. That way, the product communicates that it understands the user's context.
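The case study doesn't document the palette or where the weather data comes from. The Swift sketch below only illustrates the idea of mapping a weather condition to a bubble color; the condition cases, the RGB values, and the function name are all hypothetical. On iOS, the resulting components could be passed to `UIColor(red:green:blue:alpha:)`.

```swift
// Hypothetical weather conditions; how the app obtains them (for example, from a
// weather service queried with the device location) is not described in the case study.
enum WeatherCondition { case clear, cloudy, rain, snow, thunderstorm }

// Color components in the 0...1 range.
struct RGB: Equatable { let red: Double, green: Double, blue: Double }

// Illustrative palette only: the actual Audiosume bubble colors are not documented.
func bubbleColor(for condition: WeatherCondition) -> RGB {
    switch condition {
    case .clear:        return RGB(red: 1.00, green: 0.80, blue: 0.40) // warm yellow
    case .cloudy:       return RGB(red: 0.70, green: 0.75, blue: 0.80) // soft grey
    case .rain:         return RGB(red: 0.35, green: 0.55, blue: 0.85) // cool blue
    case .snow:         return RGB(red: 0.90, green: 0.95, blue: 1.00) // icy white
    case .thunderstorm: return RGB(red: 0.45, green: 0.35, blue: 0.65) // muted violet
    }
}
```

An exhaustive `switch` means adding a new weather condition forces a deliberate color choice instead of silently falling back to a default.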

audiosume - voice visualization - across devices.png

The speech visualization works well across a variety of devices.

Interaction Design

A common issue in UI and visual design is that a website or app uses multiple affordances for interactive elements: various colors, label font styles, button sizes, and icons all signal that an element is interactive.

Audiosume draws a clear distinction between interactive and non-interactive UI elements. Interactive labels use a heavier font style as a visual affordance, while non-interactive text that accompanies the voice readout uses a contrasting light typeface.

audiosume - interaction design - button states.png

Interactive elements have three states: default, selected, and tap. The tap state incorporates a depth effect: tapping a button (for example, when choosing a source) elevates its background together with the label. The animation is slightly delayed so that the user can follow the transition.

The selected state contrasts with the default state.

Learn UI

audiosume - visual design - color scheme – 2@2x.png

Using icons on large phones requires both hands, which doesn't work well when you're on the move. For that reason, we added gestures. Pausing the readout, editing the sources, and opening the full story in the browser can all be triggered with gestures: a tap, a swipe down, or a tap and hold. We then had to figure out how to introduce the gestures in a way that is relevant and learnable.

We discarded the obvious options, such as introducing the gestures during onboarding. Instead, we decided to introduce each gesture after the user has performed the corresponding action with the icon. When the user pauses and then resumes the readout, the app uses text and voice to explain that the same can be done with a single tap anywhere on the screen. If the user uses the gesture the next time, the icon disappears from the menu. If the user keeps using the icon, it remains, and the voice doesn't point out the gesture every time. The UI adapts to the user's behavior.
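The adaptive icon behavior amounts to a small state machine. In the hypothetical Swift sketch below, each completed action reports whether it was triggered by the icon or by the gesture: the first icon use returns `true` to signal that the app should now explain the gesture, and any gesture use hides the icon. The case study only says the hint isn't repeated every time, so showing it exactly once is an assumption, as are all type and method names.

```swift
// How the user triggered an action such as pause/resume.
enum InputMethod { case icon, gesture }

// Hypothetical per-action state; one instance per icon/gesture pair
// (e.g. pause, edit sources, open full story).
struct AdaptiveControl {
    private(set) var hintGiven = false   // has the gesture been explained yet?
    private(set) var iconVisible = true  // is the icon still shown in the menu?

    /// Records a completed action. Returns `true` when the app should now
    /// explain the gesture with text and voice.
    mutating func record(_ method: InputMethod) -> Bool {
        switch method {
        case .gesture:
            // The user has adopted the gesture, so the icon is no longer needed.
            iconVisible = false
            return false
        case .icon:
            // Explain the gesture once, after the action was completed with the icon;
            // later icon uses keep the icon and stay silent.
            guard !hintGiven else { return false }
            hintGiven = true
            return true
        }
    }
}

var pause = AdaptiveControl()
let explain = pause.record(.icon)  // true: explain "tap anywhere to pause"
let again = pause.record(.icon)    // false: icon stays, no repeated hint
_ = pause.record(.gesture)         // icon disappears
```

Calling `record` only after the action has completed, rather than before it, matches the testing finding that explaining the gesture before pausing felt slow and unexpected.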

While testing the feature with users, we also prototyped a slightly different solution: when the user tapped the pause icon, the app would first explain the gesture and only then stop. It didn't perform well. First, it seemed to take a long time before the voice stopped; second, it was unexpected. You expect the app to stop immediately, not to start explaining an alternative gesture.

The icons are displayed only on the main screen, not on the source list or settings screens. On the first visit to those screens, visual and audio prompts explain how to get back to the main screen. That way, users are already practicing some of the available gestures.

When the user starts using gestures instead of icons, the icons disappear from the UI. What remains is a pure interface with no clutter.

Some of the Findings From Testing

audiosume - usability testing - keyboard.png

At first, Audiosume read aloud the text on the first onboarding screen, which asks for the user's name. After hearing the question, some test participants answered out loud, only to find out they had to type their name with the keyboard. This was frustrating: users wondered why they couldn't just say their names, and they perceived the keyboard as an annoying fallback. That's why, on the first app launch, the app asks for the user's name with text only and no voice.

audiosume - usability testing - categories.png

In the current version of Audiosume, the source selection step includes categories at the top. Without them, users were overwhelmed by a selection of sources they often weren't familiar with.

audiosume - usability testing - icons disappearance.png

The Learn UI approach was successful: users learned to use the gestures instead of the icons without any significant issues. One area we need to improve is visualizing how the icons disappear once the gestures are used. The current iteration confuses users, although we were aware of the issue even before testing. An update is planned.

Visual Design

Only two colors are used throughout the product, so it's neither colorful nor loud; the color scheme reinforces the product's subtle nature. Dark blue symbolizes knowledge.

audiosume - visual design - font styles@2x.png

The app uses two contrasting font styles. The large, light style is not interactive and is used only for text that accompanies the voice readout.

SF Pro Medium is used for interactive elements. The medium weight is just heavy enough to stand apart from the light style, yet not so heavy or loud that it clashes with the product's subtle character.

Users can switch to a dark UI. In a dark environment, it blends the app content and the device into what seems like a single entity.

audiosume - visual design - button highlights.png

In the dark UI, interactive elements use a light overlay to indicate touch areas more clearly. This is especially important on the main screen, where speech bubbles visualize the readout.