Loud and Negative: Exploring Negativity in Voter Thoughts About Women and Men Politicians

Negative information about political candidates is readily available in contemporary political communication. Moreover, negativity is tightly connected to gendered expectations about what constitutes appropriate behavior for politicians. Yet, existing theoretical models of negativity and candidate evaluation typically do not address the role of gender and the available empirical evidence remains inconclusive regarding the electoral consequences of the interaction of negativity and gender. This article tackles these gaps in two studies to investigate how negativity manifests in voters’ thoughts about women and men politicians in response to negative media cues and how these thoughts affect vote preference. Study 1 uses a mixed methods think-aloud approach to trace the first impression formation and subsequent decision-making process (N = 78). Study 2 replicates the design as an online thought listing survey experiment (N = 142). A similar quantitative pattern emerges across both studies: (a) Negative cues elicit similar amounts of negativity in voters’ thoughts for women and men politicians, (b) these negative thoughts strongly lower candidates’ electoral chances, (c) but less so for women candidates. The qualitative analysis suggests that negative cues heuristically affect earlier stages of impression formation while voters are likely to rely on gender cues when they rationalize their vote decision.


Introduction
More women are entering politics than ever before (Hughes & Paxton, 2019) and there are indeed signs that the political tide is turning in women's favor. Recent research suggests that voters display little bias against women candidates at the ballot box (e.g., Bridgewater & Nagel, 2020; Dolan, 2014) or in experimental settings (Schwarz & Coppock, 2022). At the same time, negative political campaigns are commonplace (Nai, 2020) and may be becoming more frequent (Geer, 2012; for a meta-analysis see Lau et al., 2007). An interplay of structural, contextual, and personal factors is driving political candidates to incorporate elements of negativity in their campaign strategies (e.g., Valli & Nai, 2020), which drive voters' attention directly (Fridkin & Kenney, 2012) or indirectly by generating more (negative) media coverage (Maier & Nai, 2020; Meffert et al., 2006; Soroka et al., 2019).
These larger phenomena in political communication do not happen in isolation but affect the ways that voters evaluate political candidates in concert. Although existing models of negativity and candidate evaluation do not address the role of candidate gender (Klein & Ahluwalia, 2005; Lodge & Taber, 2013), negativity is tightly connected to gendered expectations about what constitutes appropriate behavior for politicians (Krupnikov & Bauer, 2014). The literature on gender stereotyping and candidate evaluation offers three possible theoretical explanations for the interaction of negativity and candidate gender on voter evaluations.
First and most conventionally, a reinforcing effect of campaign negativity would be predicted by the research on role congruity (Eagly & Karau, 2002) and stereotype activation (e.g., Bauer, 2015). Many forms of negative politics, such as candidate attacks, mudslinging, or scandalization (Craig & Rippere, 2016; Fridkin & Kenney, 2012), run counter to stereotypical expectations that prescribe women (but not men) to be warm, communal, and nurturing while proscribing any forms of aggressiveness, immorality, or stubbornness (Prentice & Carranza, 2002). Though voters may not directly punish women candidates on the basis of gender in neutral conditions (see Schwarz & Coppock, 2022), "campaign communication activates stereotypes when they otherwise might not be activated, thereby diminishing support for female candidates" (Bauer, 2015, p. 691). By reinforcing the perceived disconnect between leadership and gender stereotypes (Schneider & Bos, 2014), negativity in candidate messages, either in their own communication (see, e.g., Valli & Nai, 2020) or in media coverage (Van Der Pas & Aaldering, 2020), can thus indirectly affect voter evaluations.
Second, however, an equalizing effect of campaign negativity may arise from voters' dislike of negative campaigning irrespective of the gender of the involved candidate (Fridkin & Kenney, 2012). In this logic, the attentional pull of negative cues in a candidate's message outweighs gender cues (Meffert et al., 2006; Soroka et al., 2019) and "neutralize[s] the disadvantages caused by gender stereotypes" (Gordon et al., 2003, p. 35). Similarly, research has shown that women who focus on masculine traits and issues in their communication can counteract gender stereotypes (Bauer, 2017). The underlying idea is that voters' decision to categorize a female candidate as either a political leader or a woman is not clear-cut but malleable by strategic communication. Messages containing stereotypically masculine forms of negativity (e.g., attacks, corruption, scandals, etc.) may thus shift the most salient category during the evaluation from "woman" to "leader" (Bauer, 2017) and provide voters with a way to ignore or reconcile incongruent role expectations by creating a new subtype (e.g., "female leader"; Schneider & Bos, 2014).
Third, the notion of benevolent sexism (Glick & Fiske, 1996) could also suggest a protective effect of campaign negativity for women. Negative campaign elements, especially those framing or portraying women as targets of attacks or scandals, violate the "norm of civility towards women" (Cassese & Holman, 2019, p. 57). In turn, this impression of exposed vulnerability for women candidates can compel voters to protect women by not only excusing or overcompensating for any potential transgression but also by punishing their (male) opponents (Barnes et al., 2020).
The mixed empirical evidence on the interaction of negative candidate messages and candidate gender does not clearly favor one theoretical argument over the others. In line with the reinforcement perspective, some studies indeed find that going negative on the campaign trail entails stronger backlash for women candidates than for men (Cassese & Holman, 2018; King & McConnell, 2003). Triangulating three different studies, Nai et al. (2021) have recently shown that voters consistently punish women, but not men, for using negative campaigning elements. In contrast, other research shows that negativity may act as an equalizing force resulting in few and inconsistent gender differences in candidate evaluations (Craig & Rippere, 2016; Krupnikov & Bauer, 2014). Finally, a few studies indicate, in line with the protective perspective, that the "presence of gender stereotypes appears to soften the blow of negative attacks" (Fridkin et al., 2009, p. 70; Gordon et al., 2003).
One explanation for these inconclusive findings might reside in the fact that reinforcing, equalizing, and protective effects are confounded by varying voter perceptions of negativity and that different forms (and definitions) of negativity may have different gendered consequences within and between voters (Sigelman & Kugler, 2003). Yet in-depth knowledge about the role of gender in voters' appraisal, processing, and application of negative information is still missing. I, therefore, propose to take a step back and approach the intersection of gender and negativity in an exploratory fashion. In two studies, I trace voters' thinking in response to negative and neutral candidate cues to assess differences in voters' thoughts about women and men candidates involved in negativity (RQ1) and to understand how voters' negative thoughts affect their vote decision (RQ2).

Study 1: Think Aloud Exploration
The think aloud (TA) paradigm conceptualizes the thinking process as a sequence of information chunks that enter participants' working memory for processing and verbalization while they perform a given task (Ericsson & Simon, 1993; Van Someren et al., 1994). While common in psychology and educational science (for a systematic review see Fox et al., 2011), the only use of concurrent verbalization techniques in the context of candidate evaluation, though not about gender or negativity, is a study by Lusk and Judd (1988), which traces voters' thoughts in response to candidate vignettes. The authors conclude that the strength of the TA method is to derive bottom-up perceptions of the investigated phenomenon, which addresses some of the definitional concerns about negativity (Sigelman & Kugler, 2003).

Participants
Seventy-two participants (51% women, M age = 36.9, SD age = 14.4) were recruited via snowball sampling by the author and nine research students of a master's seminar at a Swiss university. Participants had to be at least 18 years old and be fluent in German. The language criterion served to ensure effortless verbalization as all materials were in German. In addition, special attention was paid to include participants with heterogeneous sociodemographic and professional backgrounds. The study protocol is part of a larger pre-registered project (available at https://osf.io/wgn9r) approved by the university's institutional review board.

Procedure and Materials
The TA paradigm follows a 2 (candidate gender: woman vs. man) × 3 (cue type: neutral vs. negative vs. unrelated) within-subjects quasi-experimental design consisting of a warm-up, two TA candidate evaluation tasks, and a brief post-test survey. First, participants were familiarized with the TA procedure in three rounds of warm-up tasks adapted from the TA literature (Ericsson & Simon, 1993; Van Someren et al., 1994).
The first evaluation task (T1) was designed to trace participants' initial responses to candidate cues. Each participant serially viewed mock newspaper title pages of fictional candidates, each of which was displayed for one minute. Participants were instructed to spontaneously respond to the image and infer the candidates' political profile: "Please look at the image and try to guess what this person is like as a politician in real life." For this task, a total of 14 candidate stimuli were grouped into seven different sets. Each set manipulated candidate gender (woman vs. man) and a specific framing by varying the title pages' candidate image and headline. A first set contained a neutral (pre-tested) portrait photo of a man or woman candidate with a headline simply identifying them as candidates for an election (see Panels A and B of Figure 1). Rather than focusing on one specific form of negativity, I manipulated three sets to explore different forms of negativity: (a) an image of a negative campaign ad denouncing a candidate as corrupt, (b) an image of a candidate displaying strong anger at a local debate, and (c) a paparazzi shot of a candidate at a strip club with a moralizing headline (see the Supplementary Materials). The remaining sets contained other framings and were used as filler materials for this study. Every participant viewed a total of six stimuli from three randomized sets (one neutral, one negative, and one filler). The design and content of all stimuli were adapted from real examples of media coverage and pilot tested.
The second task (T2) sought to capture participants' decision-making process. Participants were shown the same sets of title pages again but this time portraying the woman and man candidate simultaneously next to each other in the final stretch of a hypothetical race (see Panel C of Figure 1). Participants were instructed to make a choice: "Please look at the two candidates and think aloud about whom you would rather recommend to a friend." To mask the gender-specific intention of the study, participants viewed two neutral sets presenting same-gender races in addition to the sets from T1.
In the final part of the study, participants completed a short survey containing political, sociodemographic, and attitudinal measures.

Coding
The raw transcripts of the verbal report were first coded by means of qualitative content analysis. The annotated dataset was then used to extract measures for quantitative analysis (see Section 2.3.2). The individual candidate description (from T1) represented the unit of analysis. As each participant saw four candidate images (i.e., the neutral and one of three negative sets), this resulted in a total of 288 candidate descriptions. The verbal report for each candidate description was segmented into single thoughts as the unit of coding. Following recommended practice (Ericsson & Simon, 1993, pp. 172, 205-207, 266-270), a single thought was defined as a full sentence, which represents the linguistic (and verbalizable) equivalent of a semantically closed unit of meaning.
Figure 1. Neutral set of stimuli used in the first (Panels A and B) and second TA candidate evaluation task (Panel C). Note: Translated from German.
In line with Lodge and Taber's (2013) dual process model of political evaluation, two dimensions of thinking were coded. First, thought content reflects the semantic core of the activated concept (i.e., what the thought is about). For this, a category grid was derived from the literature on candidate evaluation and inductively completed. The final category grid distinguished between six different thought contents (see Table 1 and the Supplementary Materials for the full category grid along with coding examples). Second, thought affect (i.e., the general valence tendency accompanying the thought content) was categorically coded either as negative (−1), neutral or ambiguous (0), or positive (1).
All coding was conducted by the author and a student assistant after extensive training. In case of repeated inductively observed (sub-)categories or disagreements for coded thought contents, harmonizing decision rules were established and the material was revisited. Intercoder reliability for the more standardized thought affect was satisfactory (Krippendorff's α = 0.87).

Measures
The independent variables are candidate gender (0 = man, 1 = woman) and cue type (0 = neutral, 1 = negative) which are derived directly from the stimulus material.
As dependent variables, I measure negativity in voter responses as the sum of thoughts with negative affect per candidate description (M = 2.63, SD = 2.73) during T1. I capture participants' vote choice as a dummy variable to reflect whether they recommended the candidate (1) or not (0) during T2. Both measures are derived from the verbal report coding.
Finally, I include several individual characteristics as control variables. To account for differences in participants' verbalization speed, I measure their total thoughts as the sum of all verbalized thoughts per candidate coding in T1 (M = 7.13, SD = 2.65). From the short survey, I derive participants' own gender (0 = man, 1 = woman) and their ideology (two items on a scale from 1 = left/liberal to 10 = right/conservative; M = 3.06, SD = 1.14). Additionally, I assess gender essentialism, as gender essentialist beliefs can moderate the impact of gender stereotypes in candidate evaluations (Swigger & Meyer, 2019). Adapting their measure, I calculate an index of participants' average agreement to eight items (e.g., "Gender is a natural category") on a seven-point scale (M = 3.81, SD = 1.08).
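As a rough illustration of how the two count measures described above (negativity and total thoughts) could be derived from the coded transcripts, the following Python sketch tallies the affect codes of a single candidate description. The function names and the example codes are hypothetical and not part of the study materials; only the affect coding scheme (−1 = negative, 0 = neutral or ambiguous, 1 = positive) is taken from the coding procedure.

```python
from typing import Sequence

NEGATIVE = -1  # affect code for a negative thought (coding scheme described in the article)


def negativity(affect_codes: Sequence[int]) -> int:
    """Count the thoughts coded as negative within one candidate description."""
    return sum(1 for code in affect_codes if code == NEGATIVE)


def total_thoughts(affect_codes: Sequence[int]) -> int:
    """Count all verbalized thoughts, used as a control for verbalization speed."""
    return len(affect_codes)


# Hypothetical affect codes for one candidate description (one entry per thought).
example_description = [-1, 0, -1, 1, -1, 0, 0]

print(negativity(example_description))      # 3
print(total_thoughts(example_description))  # 7
```

Keeping both counts per description makes it possible to enter total thoughts as a covariate, so that talkative participants do not mechanically appear more negative.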

Data Analysis
Data analysis simultaneously integrates quantitative and qualitative approaches where statistical analysis is used for identifying relationships and regularities and the qualitative in-depth analysis serves to explore and contrast underlying explanations (see Fearon & Laitin, 2013).
After a descriptive summary, I run Bayesian multilevel regression models to predict first the extent of negativity in participants' thoughts and then vote choice. I cluster the models around the stimulus set and the individual participant to accommodate the nested structure of the data. I rely on a Bayesian framework for its ability to draw conclusions based on probabilistic inferences about the presence (or absence) of an effect given the observed data (Gelman et al., 2013). Please refer to the Supplementary Materials for a technical discussion of model specification and evaluation.
I will report results as estimated posterior means along with 95% credible intervals (CrI). As a test of the evidence for or against the presence of an effect, I will calculate Bayes factors (BF). BF describe two models' predictive performance in relation to each other: BF10 is calculated as the ratio of the likelihood of H1 (evidence in favor of the presence of an effect) over the likelihood of H0 (evidence in favor of the absence of an effect) given the data (Keysers et al., 2020; Wagenmakers et al., 2018). I follow the conventional classification for interpreting BF10, where a BF10 between 1 and 3 indicates anecdotal evidence, between 3 and 10 moderate evidence, between 10 and 30 strong evidence, between 30 and 100 very strong evidence, and a BF10 greater than 100 extreme evidence in favor of the alternative hypothesis (see Hoijtink et al., 2019).
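The classification above can be written down as a small lookup, which may help when reading the Bayes factors reported in the results. The Python sketch below is only an illustration: the function name is hypothetical, the treatment of values exactly on a category bound is an assumption (the classification does not specify it), and values below 1 are read through the inverse (BF01 = 1 / BF10) as evidence for the absence of an effect.

```python
def classify_bf10(bf10: float) -> tuple[str, str]:
    """Return (favored hypothesis, evidence label) for a Bayes factor BF10."""
    if bf10 <= 0:
        raise ValueError("A Bayes factor must be positive.")
    if bf10 == 1:
        return ("neither", "no evidence")

    favored = "H1" if bf10 > 1 else "H0"
    # For BF10 < 1, read the strength of evidence from BF01 = 1 / BF10.
    strength = bf10 if bf10 > 1 else 1 / bf10

    # Assumption: the bounds 3, 10, and 30 fall into the higher category;
    # 100 stays "very strong" because only values above 100 count as extreme.
    if strength < 3:
        label = "anecdotal"
    elif strength < 10:
        label = "moderate"
    elif strength < 30:
        label = "strong"
    elif strength <= 100:
        label = "very strong"
    else:
        label = "extreme"
    return (favored, label)


for value in (7.22, 2.51, 275.5, 0.02):
    print(value, classify_bf10(value))
```

For instance, a BF10 of 0.02 corresponds to a BF01 of 50 and is therefore read as very strong evidence for the absence of an effect.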

Results
Regarding the first research question, the pairwise comparisons show few systematic gender differences in negative thoughts about politicians (see Table 1). Across all thought content categories, participants have more negative thoughts about men (M = 3.30, SD = 3.17) than women candidates (M = 2.49, SD = 2.77). The BF10 for this comparison indicates that the presence of this small difference (d = 0.27) is 7.22 times more likely than its absence. Note that this gender difference shrinks but persists in the multivariate analysis including control variables (BF10 = 3.96, see Model 1 in Table 2). No striking gendered patterns arise for single thought contents, except for candidates' personality, which is more frequently the object of negative thoughts for men than women politicians (BF10 = 5.83, d = 0.30).
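The reported effect size can be approximately reproduced from the summary statistics alone. The Python sketch below assumes that d is Cohen's d with a pooled standard deviation and equal group sizes; the article does not state which formula was used, so this is a plausibility check rather than the author's computation, and the function name is hypothetical.

```python
import math


def cohens_d_equal_n(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Cohen's d with a pooled SD, assuming both groups have the same size."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Negative thoughts about men vs. women candidates (means and SDs from the text).
d = cohens_d_equal_n(3.30, 3.17, 2.49, 2.77)
print(round(d, 2))  # 0.27
```

Under these assumptions the result matches the reported d = 0.27.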
Model 1 in Table 2 shows that negative candidate cues entail on average 1.31 (CrI = −1.32 to 3.85) more negative thoughts compared to neutral cues (BF10 = 6.69). The lack of evidence for an interaction effect indicates that negative cues increase negative thinking irrespective of candidate gender (BF10 = 2.51). Indeed, the qualitative analysis suggests that negative cues, regardless of the type of negativity, trigger negative thoughts across different thought content categories with no direct relation to the negative cue itself. For example, candidates involved in a scandal are not only criticized in terms of their integrity but also regarding their appearance, political experience, and competence. The following thought passage of a male participant shows how an initial negative thought about the emotional display of the "angry candidate" can cascade into a stream of negativity of seemingly unrelated aspects:
"Oh wow, this guy looks pissed off, as if he wanted to bite off my head. He looks like the type of person who always shouts and never listens to distract from his incompetence. With this posture, he looks like a mulish bull. He has way too much gel in his hair and the way he holds up his chin makes me think of Mussolini. Very unlikeable. I now see that the image has almost no saturation, which makes it unpleasant to look at."
The second research question relates negativity in participant thoughts to their vote choice (see Model 2 in Table 2). The clearest single result is a negative effect of the number of negative thoughts on vote choice: With every additional negative thought, the odds of getting the participant's vote recommendation decrease by 37% on average (OR = 0.63, CrI = 0.49-0.77). The BF10 greater than 999 indicates extreme evidence. Crucially, the effect of negativity in voter thoughts varies across candidate gender, with women being less strongly affected by participants' negative thoughts than men candidates (BF10 = 275.5). Panel A in Figure 2 illustrates this interaction and shows that men's chances of getting the vote drop dramatically when participants have only a few negative thoughts while the preference for women candidates diminishes much more gradually. The qualitative data point to a combination of equalizing and protective effects. For one, participants often struggle to form a decision after negative appraisals, calling their decision a "toss of a coin" (male, 31 years) or a "50-50 decision" (female, 54 years). In these cases, negativity appears to deflect attention from gender-related aspects and to pre-empt the potential of backlash against women candidates. Moreover, almost half the participants referred to the social context of structural bias against women when thinking about their vote: "If I'm going to have to vote for somebody incompetent, might as well be a woman given there are too few of them."
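To make the odds ratio more tangible, the following Python sketch converts OR = 0.63 into a percentage change in odds and shows how a recommendation probability would shrink over successive negative thoughts. The baseline probability of 0.5 and both function names are purely hypothetical; the sketch also ignores the gender interaction, the control variables, and the random effects of the reported model, so it illustrates the arithmetic rather than the model's predictions.

```python
def percent_change_in_odds(odds_ratio: float) -> float:
    """Percentage change in the odds implied by an odds ratio."""
    return (odds_ratio - 1.0) * 100.0


def probability_after(baseline_prob: float, odds_ratio: float, n_thoughts: int) -> float:
    """Probability after applying the odds ratio once per negative thought."""
    if not 0.0 < baseline_prob < 1.0:
        raise ValueError("baseline_prob must lie strictly between 0 and 1.")
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio ** n_thoughts
    return new_odds / (1.0 + new_odds)


print(round(percent_change_in_odds(0.63)))  # -37

# Hypothetical baseline: a 50% chance of being recommended with zero negative thoughts.
for k in range(4):
    print(k, round(probability_after(0.5, 0.63, k), 3))
```

Because the odds ratio acts multiplicatively on the odds, the drop in probability per additional negative thought is not constant but depends on the starting point.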

Study 2: Thought-Listing Replication
A frequent criticism of the TA paradigm is that verbalization affects the thinking process, leading to a distorted assessment of cognitive processes (for a review, see Fox et al., 2011). Because Study 1 involved the presence of an experimenter, another concern is that social desirability might drive which thoughts are verbalized. To address these issues, I replicate the design of the first study as a "silent" thought-listing (TL) survey experiment (Erisen et al., 2014; Lodge & Taber, 2013).

Participants
A total of 142 participants (43% women, M age = 30.7, SD age = 9.6) were recruited via Amazon Mechanical Turk. Participants had to be at least 18 years old and fluent in German. Participation lasted on average 13.3 minutes (SD = 7.2) and was rewarded with 1 USD.

Design and Stimuli
The design and materials were identical to Study 1, except for the following changes. T1 instructed participants to perform the TL task in two steps for each image. First, they viewed images and listed their thoughts as spontaneously as possible in empty text boxes (with a forced list of five thoughts). Second, they then saw their own listed thoughts and classified each thought as either positive, neutral, or negative. For T2, participants moved a slider to either the left or the right to indicate their vote preference for the candidate on the corresponding side (see Panel C in Figure 1).

Measures
I measure negativity as the sum of thoughts that participants classified as negative per candidate image (ranging from 0 to 5, M = 1.8, SD = 1.6). Participants' vote preference is captured on a scale from −50 (preference against candidate) to 50 (preference for candidate), where the scale midpoint of 0 indicates a neutral undecided preference (M = 0.7, SD = 26.7). The same independent and control variables were used as in Study 1.

Data Analysis
The unit of analysis was the individual candidate image (n = 456). I repeat the same Bayesian multilevel regression models from Study 1, again clustered around the stimulus set and the individual participant.

Results
The results from Study 2 largely mirror those of Study 1. Model 1 in Table 3 suggests very strong evidence for the absence of an effect of candidate gender on the number of listed negative thoughts (BF10 = 0.02). Participants list on average 1.75 more negative thoughts in response to candidate images with negative cues compared to those with neutral cues (CrI = 0.12 to 2.79, BF10 = 12.5) irrespective of candidate gender. Again, negativity in voter thoughts has a strong negative effect on vote preference, diminishing the preference by 2.90 points per listed negative thought (CrI = −4.56 to −1.23, BF10 > 999). Moreover, the interaction effect points to a protective effect where negative thoughts are less detrimental for women than for men candidates. Figure 2 shows that women candidates retain a slightly positive vote preference despite the presence of negative thoughts while only little negativity causes a marked drop in preference for men candidates.

Overall Discussion and Conclusion
The goal of this study was to investigate the relationship between candidate gender and negativity in voters' evaluation process. I examined voter thoughts in response to neutral and negative candidate cues by means of a mixed methods approach, combining a quantitative and qualitative TA (Study 1) and TL (Study 2) design. Across both studies, a similar pattern emerges: (a) Negative cues elicit similar amounts of negativity in voters' thoughts for women and men politicians, (b) these negative thoughts strongly lower candidates' electoral chances, (c) but less so for women candidates. First, voters' tendency to think negatively of candidates irrespective of gender can be interpreted as an equalizing effect of negativity. One interpretation is that negative cues have primacy over gender cues in the initial, mostly implicit stages of the candidate evaluation process (see, e.g., Lodge & Taber, 2013). This view is supported by psychophysiological studies, which have consistently linked negative cues, but not gender cues, to implicit affective responses to political cues (Bakker et al., 2021; Soroka et al., 2019). Moreover, research on affect contagion has shown that initial affective responses spread and favor the activation of similarly charged mental concepts, which are retrieved from memory and made available for further (explicit) processing (Erisen et al., 2014; Lodge & Taber, 2013), including verbalization. I find evidence for this cascading effect of negative cues on further processing both in the quantitative (effect of negative cues on number of negative thoughts) and the qualitative analysis (see block quote in Section 2.4). As negativity selectively reinforces negative thoughts, the activation of gender-related aspects becomes less likely, thus reducing their availability as heuristics.
However, even if negativity affects the evaluation of women and men candidates similarly, the fact that content analyses have shown more negative media coverage for women politicians (see Van Der Pas & Aaldering, 2020) remains problematic, as this provides more opportunity for negative affect priming (Meffert et al., 2006).
Second and in line with meta-analytic findings (Lau et al., 2007), I find very strong evidence that inducing negativity in voters' thoughts does not win votes. This has implications for candidates' campaign strategies. Though negativity is a losing game for all candidates in this study, the context of actual campaign negativity may modulate how voters think about specific forms of negativity (for a review see Nai, 2020). For example, studies show that voters are less likely to electorally punish candidates who respond to negativity rather than instigating it (Craig & Rippere, 2016; Krupnikov & Bauer, 2014).
Third, the finding of a protective (or less detrimental) effect of negative thoughts for women candidates shifts the mixed empirical evidence on the relationship of negativity and gender ever so slightly towards a more optimistic narrative for women: While detrimental in absolute terms, women suffer less from negative thoughts relative to men (Fridkin et al., 2009; Gordon et al., 2003). The qualitative insights illustrate that voters frequently invoke women's descriptive underrepresentation when faced with a choice between two candidates and that this perceived power imbalance can tip the scale in women's favor. These explicit gender references at the later stage of the evaluation process (T2) contrast with the earlier impression formation stage (T1) where mentions of gender are scarce. This could mean that negative and gender cues enter the evaluation process at different stages and in different ways. Whereas negativity drives and affectively anchors the initial (implicit) processing of a candidate image, gender marks the context for the (explicit) rationalization of the vote decision. This finding underlines the important role of public perceptions of women in politics for opinion formation (Stauffer, 2021) and adds to recent research suggesting that actively reminding voters of existing biases can be a viable strategy for women candidates (Brooks & Hayes, 2019).
This study comes with several limitations. I focus on explicit dimensions of voter thinking and thus of the candidate evaluation process. This choice implies that any assumptions regarding implicit aspects of candidate evaluation, though established in the literature, remain untested. A promising approach for future studies could lie in the combination of the TA paradigm with implicit approaches, namely psychophysiological measures or implicit association tasks.
Moreover, the focus on explicit thinking also raises the issue of social desirability, which could encourage participants to exaggerate their gender perceptions despite methodological efforts to mask the gender-specific goal of the study (through gender-neutral cover stories, filler tasks, and same-gender stimulus sets) or to enhance the anonymity of thoughts (Study 2). However, rationalizations cannot (and should not) be isolated from their social context as they are precisely indicative of how voters reconcile social expectations, such as gender norms, with their own prior attitudes and beliefs (Lodge & Taber, 2013; Yong et al., 2021). Finally, although the design of this study cannot establish (or reject) any underlying mechanism, the protective effect implies that voters are somehow motivated to rationalize away part of the negativity for women but not men candidates. Whether they do so out of benevolent sexism (Barnes et al., 2020; Cassese & Holman, 2019), because they found ways to resolve perceived role incongruence (Bauer, 2017), or following a genuine desire to undo structural inequality remains an open question.