Automatic Speech Recognition, Word Error Rate, and Candidate Characteristics

Faculty Sponsors: Erika Franklin Fowler and Markus Neumann

Samuel Feuer

Sam is a senior at Wesleyan studying Mathematics and Computer Science. His academic interests include network science, text analysis, and applications of computational techniques to social science, and he has done research as part of the Wesleyan Media Project’s Delta Lab for over three years. He is also an avid singer, pianist, and music director. After graduating in December, he hopes to attend graduate school for computational social science, applied mathematics, or data science.

Abstract: Automatic speech recognition (ASR) models are becoming more popular among political scientists because they allow researchers to examine large quantities of audio data by converting recordings into text. Models have gained accuracy as methods have matured, but they are still hindered by certain audio features, including background music, poor audio quality, uncommon words, and certain accents. Previous work has validated these models for use in political science contexts but did not consider potential correlation between transcription error and candidate- or ad-level data such as gender, party, or ad spend. We find that ad spend, whether a non-candidate person speaks, and party all correlate with transcription error, but these errors do not seem to have noticeable effects on structural topic modeling, a popular downstream text application.
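
For readers unfamiliar with the metric in the title, word error rate (WER) is conventionally the word-level edit distance between a reference transcript and the ASR output, divided by the number of words in the reference. The sketch below is illustrative only: the function name, the lowercasing and whitespace tokenization, and the example sentences are assumptions and do not reflect the preprocessing used for the poster.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word out of four reference words.
print(word_error_rate("i approve this message", "i approved this message"))
```

Because the denominator is the reference length, WER can exceed 1 when the ASR output contains many inserted words.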

Poster: summer23_poster_sfeuer