Audio-as-Data Tools: Replicating Computational Data Processing

Josephine Lukito; Jason Greenfield; Yunkang Yang; Ross Dahlke; Megan A. Brown; Rebecca Lewis; Bin Chen

doi:10.17645/mac.7851

Article | Open Access

Josephine Lukito, School of Journalism and Media, University of Texas at Austin, USA
Jason Greenfield, Center for Social Media and Politics, New York University, USA
Yunkang Yang, Department of Communication & Journalism, Texas A&M University, USA
Ross Dahlke, Department of Communication, Stanford University, USA
Megan A. Brown, School of Information, University of Michigan, USA
Rebecca Lewis, Department of Communication, Stanford University, USA
Bin Chen, School of Journalism and Media, University of Texas at Austin, USA / Journalism and Media Studies Centre, University of Hong Kong

Abstract: The rise of audio-as-data in social science research accentuates a fundamental challenge: establishing reproducible and reliable methodologies to guide this emerging area of study. In this study, we focus on the reproducibility of audio-as-data preparation methods in computational communication research and evaluate the accuracy of popular audio-as-data tools. We analyze automated transcription and computational phonology tools applied to 200 episodes of conservative talk shows hosted by Rush Limbaugh and Alex Jones. Our findings reveal that the tools we tested are highly accurate. However, although different transcription and audio signal processing tools yield similar results, subtle yet significant variations could affect the reproducibility of the findings. Specifically, we find that discrepancies in automated transcriptions and in auditory features such as pitch and intensity underscore the need for meticulous reproduction of data preparation procedures. These insights into the variability introduced by different tools stress the importance of detailed methodological reporting and consistent processing techniques to ensure the replicability of research outcomes. Our study contributes to the broader discourse on replicability and reproducibility by highlighting the nuances of audio data preparation and advocating for more transparent and standardized practices in this area.

Keywords: audio-as-data; computational methods; conservative talk shows; data processing; reproduction; talk radio

Published: 6 May 2024

DOI: https://doi.org/10.17645/mac.7851

Issue: Vol 12 (2024): Reproducibility and Replicability in Communication Research

© Josephine Lukito, Jason Greenfield, Yunkang Yang, Ross Dahlke, Megan A. Brown, Rebecca Lewis, Bin Chen. This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 license (http://creativecommons.org/licenses/by/4.0), which permits any use, distribution, and reproduction of the work without further permission provided the original author(s) and source are credited.

Media and Communication

Open Access Journal

ISSN: 2183-2439
