🔈Live from the Internet, Here Comes Audio Social Media

Social media has just signed up for audio chat and we'd love for you to walk them in.

“We're in a situation where as a society we're failing because we don't have the right level of social contact with each other. How do we create a good public square?” -Philip Rosedale, founder of Second Life. 

All the world is trying to join Clubhouse, the casual audio chat app that is still in beta and has been downloaded nearly 4 million times in the last month. The app’s allure is the ability to listen in on, and take part in, conversations among influential people in real time, with the full range of emotion, intonation, and delivery. As a cultural phenomenon, Clubhouse runs on social hierarchies of influence and access: it’s an exclusive, members-only platform that users have to be invited to join. It operates as a broadcasting tool that showcases curated topics and speakers to as many as 5,000 participants in a room. It is a premise that Twitter and Facebook are also exploring.

At New_ Public, we look at the potential of new tech products to foster human connection and create greater opportunities for humanization on social media, especially in light of fake news, the emergence of Parler, and the Great Deplatforming. Recently, Clubhouse has come under scrutiny for spreading misinformation, for unmoderated harassment, and for underdeveloped privacy protections.

Image description: This illustration of six forms of vocal amplification and telephony includes a megaphone, a smartphone, a rotary phone, a cellular phone, and a wall cabinet phone against a blue background.
There is a full spectrum of audio apps, from one-on-one conversations, group chats, and conference apps (Clubhouse) to more experiential opportunities to bump into strangers and strike up conversations.

But voice-powered social media feels good, especially when we are hungry for human interaction, in the midst of an ongoing pandemic and as 70% of the US is covered in snow. Opening up a new sensory experience, or simply speaking with new people who don’t live in our house, feels like a natural evolution of our digital social lives. Hearing strangers’ voices in conversation, filling up the home office, reminds us of the publicness of our pre-COVID lives in bars, cafes, and parks. The human voice is a powerful tool for communication. In our Signals research on promoting thoughtful conversation, we reviewed the work of researchers who found that platforms that fostered higher-quality dialogue favored voice-based software over text-based software to translate the benefits of face-to-face communication to an online space.

It is important to understand the cognitive dynamics of reading typed text versus listening to a human voice as they relate to creating more humanizing experiences in social media, with less hatred and more acceptance of differences. In 2017, Juliana Schroeder, a professor at the Haas School of Business at the University of California, Berkeley, co-authored research titled “The Humanizing Voice: Speech Reveals, and Text Conceals, a More Thoughtful Mind in the Midst of Disagreement.”

In the paper, Schroeder and her colleagues explore how hearing a person’s words, compared to reading them, changes the way we interpret those words. Schroeder says, “You recognize that there's this feeling, thinking person behind the words, because you hear those words imbued with their thoughts and feelings in those moments. It happens at the implicit, visceral level and that allows you to humanize the communicator in a way that just reading those words does not.”

Schroeder is also the co-director of the Psychology of Technology Institute at UC Berkeley, where she often asks other psychologists and technologists, “What is technology doing to our daily conversations? And what can we be doing to use technology to actually improve our conversations?”

We know from her research that the voice buffers against potential dehumanization. Schroeder says there is a lot that technologists could be doing to build algorithms that give feedback on the emotionality of language. There could be a “nudge” functionality in text-based platforms that asks “Are you sure you want to have that conversation?” The algorithm would recognize when it might be time to switch over to a more synchronous medium, such as voice. She says, “We think it is possible to leverage different tools to affect the way people make judgments and act on these platforms.”

Today, from a user perspective, there is a full spectrum of audio apps to choose from depending on your social needs or personal inclinations, from one-on-one conversations to group chats and conference apps (Clubhouse), to more experiential opportunities to bump into strangers and strike up conversations. Clubhouse allows only a limited number of experts to talk to each other, à la The New Yorker Festival forum. Many other apps aim to make human connection more intimate, more fluid, and perhaps more awkward or humanly flawed. Dial Up, a voice-chat app, connects you to strangers for bursts of peer-to-peer conversations. High Fidelity uses the ‘cocktail party’ effect to separate people’s voices in a space so that it sounds like you’re in a real room, where chit-chat can take place in corners alongside larger group talk.

In High Fidelity, the engineers have considered the comprehension of multiple speakers in the same space through spatial audio. The software’s API allows for others to translate that technology onto other platforms and experiences. Philip Rosedale, the co-founder of High Fidelity and the founder of Second Life, explains that as they continue to develop and evolve spatial audio experiences, High Fidelity is invested in exploring how to create more digital public spaces. He says, “To create a real public space, if there's going to be more than two people in communion in that space, you have to do this spatialization because otherwise, the users retreat to politeness or aggression.”

In other words, to create dynamic digital meeting spaces that can hold real tension and dynamic differences, more attention has to be paid to latency in the technology, or how long a voice takes to travel from a speaker’s mouth to the listeners’ ears. Rosedale reminisces about phone conversations before cell phones. Because the line delay was 16 milliseconds, it was one of the most intimate forms of communication. With the advent of mobile phones, the delay we now experience is about five to seven times longer, or roughly 80 to 112 milliseconds.

We are only now building on research suggesting that listeners glean emotions better from voice-only communication. We are only now yearning for more connectivity in platforms because our physical lives have been moved into digital realms out of necessity. What we know now is that the interest in, and market for, real human voices, in all of their complexity, is shaping the potential of our personal technology. We hope that moderation, accessibility, and inclusivity get built into those frameworks. Twitter’s Audio Space, for example, will include conversation transcripts that are a nod to accessibility and ethical credibility at once. And we hope that Android and iPhone versions are developed equally so that more people have access to such rich conversations.
-Marina Garcia-Vasquez

Have one on us!

It’s been a week of news, from Australia vs Facebook to the power outages in Texas. During a regular work week, we might have walked to a local bar after work to trade heated opinions on the future of journalism or on the possible effects of climate change around the world. But here we are at home, clicking all the buttons on IMissMyBar.com, a digital audio artifact that pays homage to the atmosphere of the bar Maverick in Monterrey, Mexico, designed by Lagon and Tandom. Hear the rattle of shakers and the clink of ice as bartenders prep rounds of cocktails in a crowded bar. We promise it will unleash an immediate wave of nostalgia.

More on the exploration of audio:

Contemplating the virtues of a podium vs a water cooler,

The New_ Public team

Illustrations by Josh Kramer.

Civic Signals is a partnership between the Center for Media Engagement at the University of Texas, Austin, and the National Conference on Citizenship, and was incubated by New America.