Work and personal tasks leave users little time to stay informed. By reading out the latest news headlines, Audiosume keeps users up to date on recent events while they focus on their work or personal responsibilities, providing a satisfying feeling of being productive while catching up on the news.
How it all started
Before becoming a full-time UX designer, I was a part-time freelance writer. That gave me insight into user behavior on news media websites and led to the realization that people don't read and stay informed as much as they should.
Together with an iOS app developer, I started talking about a potential solution, and a product built around voice seemed viable.
At the same time, there was an opportunity to receive funding from the Google News Initiative. We applied and got the funds needed to start working on the product.
We started with the questions that needed answers. Among others, we focused on:
1) How, when and where do people consume the news?
2) What demographic traits influence this behavior?
3) Why do people consume news?
4) What are the primary devices for news consumption?
5) Why do users listen to podcasts and radio?
6) Why do people use voice assistants?
7) What kind of content do people consume on smart speakers?
8) How can we establish an emotional connection between the product and the user?
9) What type of personality should the product have, and how should the personality traits be communicated?
10) Who are the competitors?
We conducted primary and secondary research to answer these questions.
While talking with users about why they consume content from podcast apps, productivity was mentioned frequently. A phrase that came up often was the ability to work on something else while listening. At that point, we started to think about using productivity as a selling point for Audiosume.
Besides services that compete directly with Audiosume, we also focused on products used in the same situations Audiosume is used in, such as commuting and working in the office.
App Store comments on podcast and news media apps helped us further assess why users choose particular apps. The comments revealed the environments users are in when using specific apps, the features they find useful, and what they dislike.
Thanks to the advertising industry, media consumption is a popular focal point for research. A large number of research papers provided us with insights into how users consume the news and the reasons behind that behavior.
Based on the talks with users, the competitive analysis, and the published research, we used the Kano model to visualize the importance of the planned features from the users' point of view.
The funding from Google required us to hit milestones along the way. That meant defining the areas where we wouldn't compromise and the areas where we'd go with the minimum viable option.
One example of selecting the minimum viable option is the voice that reads out the news. We're using text-to-speech (TTS). The voice quality is decent, and at the same time, the implementation is fast. A more complex conversational UI, where users could issue primary commands by voice, was discarded early on. The implementation complexity, such as detecting when the user has finished talking (endpoint detection), requires more time and resources.
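Part of why TTS is the fast option is that the readout reduces to assembling a plain-text script and handing it to the platform's speech synthesizer. The sketch below is purely illustrative (the function name, phrasing, and sources are our assumptions, not Audiosume's production code) and shows only the script-assembly step:

```python
def build_readout_script(headlines):
    """Assemble headlines into a plain-text script for a TTS engine.

    `headlines` is a list of (source, title) pairs; the phrasing is an
    illustrative assumption, not Audiosume's actual wording.
    """
    lines = []
    for source, title in headlines:
        lines.append(f"From {source}: {title}.")
    return " ".join(lines)

# The resulting string is what a TTS API would be asked to speak.
script = build_readout_script([
    ("BBC News", "Markets rally after rate decision"),
    ("Reuters", "New climate accord signed"),
])
```

A conversational UI, by contrast, would need speech recognition, intent parsing, and endpoint detection on top of this, which is where the extra time and resources go.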
Each product has a certain character, a certain way the user perceives it. We needed to define that character and decide how its traits would be communicated to the user.
A product that uses voice to convey information is perceived differently from software that doesn't. That makes the question of product character even more relevant.
Discussions about the product at the very beginning revealed that using it feels like having an assistant: someone who curates the sources and informs the user about events. It does so quickly and to the point, since the user doesn't have much time but wants to be informed. It's subtle: there when the user needs it, gone when the job is done. Users should feel that the assistant does the work for them.
The whole interaction shouldn't be too formal and distant. It's crucial that the assistant connects with the user on a personal level. One way Audiosume does that is by using the user's first name. Hearing your name spoken aloud is satisfying.
Another way to establish a connection between the product and the user is to acknowledge the user's circumstances. Audiosume greets the user according to the time of day or the day of the week.
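The time-aware greeting boils down to a small lookup on the current hour and weekday. A minimal sketch, with the caveat that the hour boundaries and the exact wording are assumptions rather than the shipped copy:

```python
from datetime import datetime

def greeting(now: datetime, first_name: str) -> str:
    """Pick a greeting based on the time of day and day of the week.

    Hour boundaries and phrases are illustrative assumptions.
    """
    hour = now.hour
    if hour < 12:
        part = "Good morning"
    elif hour < 18:
        part = "Good afternoon"
    else:
        part = "Good evening"
    # weekday() == 4 is Friday; a small nod to the user's week.
    suffix = " Happy Friday!" if now.weekday() == 4 else ""
    return f"{part}, {first_name}.{suffix}"

print(greeting(datetime(2018, 7, 6, 9, 30), "Ana"))  # a Friday morning
```

Combined with the first name from onboarding, this is enough to make the readout open on a personal note.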
The color scheme and typography also define the character of a subtle assistant that does the work and gets out of the way when done. The app isn't colorful; it uses a limited set of colors, and the font styles tend toward thin and subtle rather than bold and loud.
TEXT TO SPEECH (TTS)
From early on, we needed to address the limitations of TTS. The monotone voice and pronunciation issues made us evaluate how to use voice to define the product character. Because the voice is monotone, subtle jokes are out of the question. That is a real limitation, since making people laugh is one of the ways to establish an emotional connection.
In some cases, we needed to make grammatical adjustments. Because a comma produces an unnaturally long pause in the readout, certain sentences required us to omit commas.
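The comma adjustment amounts to a small text-preprocessing pass before the string reaches the TTS engine. A hedged sketch of one possible heuristic (not Audiosume's actual rule set; a real implementation would need more care around commas that carry meaning):

```python
import re

def soften_pauses(text: str) -> str:
    """Drop commas that cause unnaturally long TTS pauses.

    Only commas followed by whitespace are removed, so commas inside
    numbers like "1,000" are kept. An illustrative heuristic.
    """
    return re.sub(r",(\s)", r"\1", text)

print(soften_pauses("Later today, after the summit, leaders will meet"))
# -> "Later today after the summit leaders will meet"
```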
We tested the TTS right at the start of the project to get feedback from users. It proved to be good enough for us to continue. One of the reasons for that is also the short session time. It doesn't take more than a couple of minutes to get through the latest headlines.
The app layout lets the user reach any content segment by swiping in the appropriate direction. The screen transitions emulate a spatial layout, making it easy to visualize where particular content sits relative to the current position in the app.
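The spatial navigation can be thought of as a simple map from swipe direction to a neighboring screen. A hypothetical sketch of that dispatch (the screen names are assumptions based on the sections described in this case study, not the app's real navigation graph):

```python
# Hypothetical screen graph; Audiosume's actual sections may differ.
NEIGHBORS = {
    "main": {"left": "sources", "right": "settings"},
    "sources": {"right": "main"},
    "settings": {"left": "main"},
}

def navigate(current: str, swipe: str) -> str:
    """Return the screen a swipe leads to, or stay put if there's no neighbor."""
    return NEIGHBORS.get(current, {}).get(swipe, current)

print(navigate("main", "left"))   # -> "sources"
print(navigate("sources", "up"))  # no neighbor in that direction: stay put
```

Because each screen only ever moves to a spatial neighbor, the user can build a mental map of where everything lives.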
At first, we wanted to create a layout internally called Nodes UI. A node represents one news headline.
Interaction with nodes offers flexibility. You could share a news headline by swiping it up on the screen, or tap and hold to bookmark it and read the full article in the browser later.
A node could also visualize the amount of new content available and signal breaking news. In the end, we didn't go with the Nodes UI due to the implementation complexity.
Another alternative included a single control on the screen. It could be used to pause, resume, share, and visit the news source online, with sound waves emanating beneath the control.
Showing the wireframes to users gave us an early insight that was confirmed again later on: even though the product relies on voice, users still wanted to be able to check the headline and the source during the readout. For that reason, the single-control UI wasn't an appropriate option.
The current version of Audiosume includes the headline and the source.
The iPad version follows the layout considerations of the iPhone. The difference is the dimensions of the headline text box, whose width is only two-thirds of the screen width. Together with the vertical placement of the icons, this underlines the vertical flow of the iPad screen, and of the device itself.
Early prototypes included a default voice visualization. We ditched the default visualization for three reasons. First, it takes up valuable space and clutters the UI. Second, the visualization is often used for conversational UI design and might mislead users into thinking that the interaction with the product is similar to using Siri. Third, it disturbs the vertical flow of the layout and the vertical smartphone screen.
The current voice visualization consists of blurred bubbles that appear and expand in the background. The color of the bubbles matches the current weather around the user. That way, the product communicates that it understands the user's context.
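Mapping the current weather to a bubble color is essentially a small lookup. The sketch below is illustrative only: the condition names and hex values are our assumptions, not Audiosume's actual palette.

```python
# Illustrative weather-to-color mapping; not Audiosume's real palette.
WEATHER_COLORS = {
    "clear": "#F5C542",   # warm yellow for sunshine
    "rain": "#4A6FA5",    # muted blue
    "snow": "#DDE6EE",    # pale grey-blue
    "clouds": "#9AA5B1",  # soft grey
}

def bubble_color(condition: str) -> str:
    """Pick a background-bubble color for the current weather condition."""
    return WEATHER_COLORS.get(condition.lower(), "#9AA5B1")  # default to grey

print(bubble_color("Rain"))  # -> "#4A6FA5"
```

A sensible default keeps the visualization working even when the weather condition is unknown or the lookup fails.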
The speech visualization works well across a variety of devices.
A common issue in UI and visual design is that a website or app uses multiple affordances for interactive elements: various colors, font styles for labels, button sizes, and icons all indicate interactivity.
In Audiosume, there's a clear distinction between interactive and other UI elements. Interactive labels use a heavier font style as a visual affordance. Non-interactive text, which accompanies the voice readout, uses a contrasting light typeface.
Interactive elements have three states: default, selected, and tap. The tap state incorporates a 3D effect; tapping a button (for example, choosing a source) elevates the background with the label. The animation is slightly delayed so the user can follow the transition.
The selected state contrasts with the default state.
Using icons on large phones requires both hands, which doesn't work well on the move. For that reason, we integrated appropriate gestures. Pausing the news readout, editing the sources, and opening the full news story in the browser can all be initiated with a tap, a swipe down, or a tap-and-hold gesture. We then started to think about how to introduce the gestures to the user in a way that is relevant and learnable.
Obvious approaches, e.g., introducing the gestures during onboarding, were discarded. We decided to introduce a gesture after the user has performed the corresponding action with the icon. When the user pauses and then resumes the voice readout, the app uses text and voice to explain how the same action can be done with a single tap anywhere on the screen. If the user uses the gesture the next time, the icon disappears from the menu. If the user continues to use the icon, it remains, and the voice won't point to the gesture every time. The UI adapts to the user's behavior.
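The adaptive behavior reduces to a bit of per-action state: play the gesture hint after the first icon use, and hide the icon once the gesture is adopted. A minimal, hypothetical model of that logic (the real app's state handling isn't public):

```python
class AdaptiveControl:
    """Track icon vs. gesture use for one action (e.g. pause/resume).

    Illustrative model of the adaptive UI described in this case study.
    """
    def __init__(self):
        self.icon_used = False
        self.icon_visible = True
        self.hint_spoken = False

    def use_icon(self) -> bool:
        """User tapped the icon. Returns True if the gesture hint should play."""
        first_time = not self.icon_used
        self.icon_used = True
        # Explain the gesture only once, after the first icon use.
        if first_time and not self.hint_spoken:
            self.hint_spoken = True
            return True
        return False

    def use_gesture(self):
        """User adopted the gesture: remove the icon from the menu."""
        self.icon_visible = False

ctl = AdaptiveControl()
print(ctl.use_icon())    # True: first use, play the hint
print(ctl.use_icon())    # False: the hint isn't repeated every time
ctl.use_gesture()
print(ctl.icon_visible)  # False: icon disappears once the gesture is used
```

One instance per action (pause, edit sources, open article) is enough to let each control adapt independently.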
While testing the feature with users, we prototyped a slightly different solution: when the user selected the pause icon, the app would first explain the gesture and then stop the readout. It didn't perform well. First, it seemed to take a long time before the voice stopped; second, it was unexpected. You expect the app to stop immediately, not to start explaining an alternative gesture.
The icons are only displayed on the main screen; the source list and settings screens don't show them. On the first visit, we included visual and audio prompts on how to get back to the main screen, so users already practice some of the available gestures.
When the user starts using gestures instead of icons, the icons disappear from the UI. What remains is a pure interface with no clutter.
Some of the findings from testing
At first, Audiosume read out the initial text on the first onboarding screen, which asks for the user's name. After the question was asked, some test participants answered out loud, only to find out they had to enter their name with the keyboard. It was a point of frustration: users wondered why they couldn't just say their name, and the keyboard was perceived as an annoying fallback. That's why, on the first app launch, only text and no voice is used to ask for the user's name.
Users were overwhelmed by the selection of sources, many of which were unfamiliar. That's why the source-selection step in the current version of Audiosume includes categories at the top.
The learnable UI approach has been successful: users learned to use the gestures instead of the icons without any significant issues. One area we need to improve is visualizing the disappearance of the icons once the gestures are adopted. The current iteration confuses users, but we were aware of the issue even before testing. An update is planned.
Some users who had the iOS 12 beta installed discovered issues with the text-to-speech feature: instead of the usual iOS 11 voice, a generic computer voice was used.
Two colors are used throughout the product. It's not colorful or loud. The color scheme reinforces the subtle nature of the product. Dark blue is used to symbolize knowledge.
There's a contrast between the two font styles used in the app. The large, light typeface is not interactive and relates only to the voice readout.
SF Pro Medium is used for interactive elements. The medium weight is just enough to separate it from the other font style, while not being too heavy or loud for the subtle character of the product.
Users have the option to use a dark UI alternative. It consists of pure black and white colors. Black has been selected to take advantage of the new generation OLED screens on iPhones. Using the app in a dark environment blends the app content and the device into a seemingly singular entity.
In the dark UI, the interactive elements use a light overlay to better indicate the touch areas. That is also important on the main screen, where the bubbles visualize the readout.