Open Access Journal

ISSN: 2183-2439

Article | Open Access

Ideology and Policy Preferences in Synthetic Data: The Potential of LLMs for Public Opinion Analysis



Abstract:  This study investigates whether large language models (LLMs) can meaningfully extend or generate synthetic public opinion survey data on labor policy issues in South Korea. Unlike prior work on general sociocultural values or specific political topics such as voting intentions, our research examines policy preferences on tangible social and economic topics, offering deeper insights for news media and data analysts. We consider two key applications. First, we explore whether LLMs can use existing survey data to predict public sentiment on emerging or rapidly evolving issues. Second, we assess how closely LLM-generated synthetic datasets resemble real-world survey distributions. Our findings reveal that while LLMs capture demographic and ideological traits with reasonable accuracy, they tend to overemphasize ideological orientation for politically charged topics. This bias is more pronounced in fully synthetic data, raising concerns about perpetuating societal stereotypes. Despite these challenges, LLMs hold promise for enhancing data-driven journalism and policy research, particularly in polarized societies. We call for further study into how LLM-based predictions align with human responses in diverse sociopolitical settings, alongside improved tools and guidelines to mitigate embedded biases.

Keywords:  AI-generated text; ChatGPT; large language models; news media; policy preferences; public opinions



DOI: https://doi.org/10.17645/mac.9677


© Keyeun Lee, Jaehyuk Park, Suh-hee Choi, Changkeun Lee. This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 license (http://creativecommons.org/licenses/by/4.0), which permits any use, distribution, and reproduction of the work without further permission provided the original author(s) and source are credited.