What Remains in Mind? Effectiveness and Efficiency of Explainers at Conveying Information

Whether and to what extent mass media contribute to the acquisition of knowledge depends fundamentally on the senses addressed by a particular medium. However, there is a lack of current research investigating the effectiveness and efficiency of (new) media, like scrollytelling and explainer videos, at conveying information, compared to established formats like text and audio. To fill this research gap, I conducted an experimental online survey (N = 381) with medium as the independent variable (explainer text vs. audio vs. video vs. scrollytelling) and the recall of information as the dependent variable. The subjects were presented with a popular scientific presentation on the environmental consequences of meat consumption in order to examine a socially relevant, controversial topic and to explore the possible consequences of dissonance on recalling information. As the present study demonstrates, the traditionally lower reputation of moving images in regard to the effectiveness of information transfer is not always justified. Rather, the results show that scrollytelling and video lead to a significantly more extensive recall than audio and in part text media. However, when considering exposure time, text turns out to be the most efficient medium. The dissonance perceived by the participants did not have any significant influence on their recall of information.

With the emergence of digital platforms, audio and audio-visual content is gaining popularity in the dissemination and acquisition of knowledge (Schneider, Weinmann, Roth, Knop, & Vorderer, 2016) at the expense of text-based formats. Therefore, in the tradition of media theorist Walter Ong (1991), there already is talk of a "return to orality" (Kaeser, 2016) or a "post-text future" (Manjoo, 2018) with YouTube as digital lecture hall. Accordingly, out of the 86% of 12 to 19-year-olds in Germany who use YouTube, a quarter expect to expand their knowledge. Almost half of the pupils describe YouTube as important or even very important for school matters (Jebe, Konietzko, Lichtschlag, & Liebau, 2019). 13% use so-called explainer videos about school topics at least several times a week (Feierabend, Plankenhorn, & Rathgeb, 2017). Altogether, explainer videos have already been watched by about 70% of the population in Germany, making their use much more widespread than in the United States (Krämer & Böhrs, 2017). Such explainers can be defined as "movies from self-production…which explain how to do something or how something works or in which abstract concepts and contexts are explained" (Wolf, 2015, p. 1). In this context, news explainers are used, for instance, to counteract disinformation (Graves & Cherubini, 2016) or to provide background information about occasionally controversial issues (Spilioti, 2018) in increasingly complex, popularized high-choice news environments (Umbricht & Esser, 2016).
The unmistakable trend towards video is also driven by the economic interests of information intermediaries, as videos are easier to monetize than other formats (Kalogeropoulos, Cherubini, & Newman, 2016). So, publishers are increasingly complementing their text offerings with online videos (Bock, 2016). Newman concludes that the "video-enabled internet is changing the formats and style of digital content, providing competition for, but not replacing text" (2017, p. 20). Altogether, a proliferation of digital knowledge transfer formats can be observed, which is reflected in the heterogeneity of usage patterns (e.g., Costera Meijer & Groot Kormelink, 2015).
The question thus arises as to what consequences these developments will have for informing people. There is no doubt about a positive relation between media use and the acquisition of socially relevant knowledge (Delli Carpini & Keeter, 1996; Eveland & Schmitt, 2015). Yet, clearly, there are "some variations across media channels and types of political learning" (Dimitrova, Shehata, Strömbäck, & Nord, 2014, p. 98). Lang traces these variations primarily back to differences among perceptual channels, temporal constraints, learned signals, and the orientation-eliciting structural features of the various media, which perform an "extremely important role in the automatic allocation of resources" (2006, p. S63), like attention. In this respect, it is necessary to distinguish between effectiveness and efficiency of information transfer. Effectiveness is understood as learning output, and efficiency as the ratio between input (time spent to consume a specific content) and output (information recall) (Krämer & Böhrs, 2017).
Which medium is the most effective and which the most efficient at conveying (political or scientific) information is of essential importance not only in the pedagogical context, but also for deliberative discourse and decision-making in democracies that depend on well-founded judgements. This is especially true in times of an erosion of a shared knowledge base and the questioning of epistemic authorities (Neuberger et al., 2019). Consequently, Holbert emphasizes that "perhaps the central question for the discipline concerns how media aid citizens in becoming informed voters" (2005, p. 511), or, as Baron puts it, "we should be figuring out the right curricular balance of video, audio, and textual materials" (2017, p. 19).
The developments are also relevant because videos are claimed to possess a higher suggestive power than other formats. In turn, the affective reaction to their content, specifically the induced amount of dissonance, may be an important factor when investigating the recall of information provided by explainers. After all, emotions and cognitions interact closely, and emotions help learners to prioritize information as they process it (Brosch, Scherer, Grandjean, & Sander, 2013; Forgas, 1995; Tyng, Amin, Saad, & Malik, 2017). Therefore, this study aims to investigate the effects of the medium itself and of emotions on recalling information provided by explainers. To pursue this goal, I first explain why medium and emotions matter for learning processes and then present the results of my experimental survey.

The Power of the Medium
Early research in communication science in this context predominantly concerned the memory of news. It mainly showed that individuals remember stimulus material received through print media more extensively than identical material received through broadcasting media (Facorro & DeFleur, 1993; Wilson, 1974). According to Stauffer, Frost, and Rybolt (1981), memory of news is worst for audio formats. In line with that, Daniel and Woody (2010) examined the retention of a 22-minute podcast in comparison to the corresponding text. Like Green (1981), they found that listeners performed more poorly than readers did in completing a quiz about the article.
The "primacy of print" (Furnham & Gunter, 1989, p. 309), or, in the words of Jacoby, Hoyer, and Zimmer, the "superiority of the medium" (1983, p. 212) has been repeatedly stated in experiments, particularly those of the research group around Furnham and Gunter (Furnham & Gunter, 1985, 1987; Gunter & Furnham, 1986). Besides their research on news, they also examined popular scientific contributions, coming to similar conclusions: Print leads to the best recall scores, followed by audio-visual, with audio-only being last (Furnham, Gunter, & Green, 1990). They and other researchers attribute this phenomenon mainly to the fact that text offers its readers greater cognitive control, since processing speed can be freely determined. Videos and audios, on the other hand, are played at a predetermined reception time, which may overload or under-engage recipients (Eveland, Seo, & Marton, 2002; Green, 1981; Lang, 2006). While audios and videos organize "in time" (Noelle-Neumann, 1977, p. 92), text offers orientation in space. Nevertheless, this text feature can stand in the way of an integrated knowledge acquisition process, as recipients can skim or skip passages (Dalrymple & Scheufele, 2007).
Walma van der Molen and van der Voort (2000) heralded something of an epistemological turn in the intermedia study of information recall. In their experiment, they found that both adult and child viewers of children's TV news stories recalled more information than readers of the corresponding print news. When, on the other hand, adults received news made for adults (rather than prepared for children), they remembered the content conveyed in the print article better. The researchers argued that the latter TV news stories showed a low degree of redundancy between the image and audio track, i.e., a small amount of semantic overlap between the verbal and visual content. Instead, "standard news pictures" and "talking head-only items" dominated; these have only a limited supplementary information value because they convey "little meaning and are often at best only partially related to the spoken commentary" (p. 134). The authors expected that the images distract the recipients from the spoken text as the carrier of the main information (see also Sundar, 2000). However, as previous research on so-called cue summation demonstrated, image and audio should not be completely redundant either. Otherwise, the recipients are not offered any additional learning cues that facilitate information retrieval (Severin, 1967). If at least 40 to 50% of verbal information has a semantic reference to its visualization, TV can exercise a recall advantage over print, according to Walma van der Molen and Klijn (2004). In this respect, the recipients' limited capacity for information processing should be taken into account. If the information density of a contribution is too high, verbal and visual information might compete for the recipient's limited attention (Lang, 2006).

Studying Medium Effects in the Context of Explainers
Research on the effectiveness and efficiency of different media at conveying information has scarcely begun to be transferred to the digital age so far (Powell, Boomgaarden, De Swert, & de Vreese, 2018). Not only do digital videos and audios allow for more cognitive control today than they did during the period of the studies described above, new hybrid formats such as scrollytelling have entered the market. Scrollytelling refers to digital storytelling formats that unfold as you scroll. Thereby multimedia elements like photos, videos, audio, graphics or animations complement text elements (Godulla & Wolf, 2018). Furthermore, by traditionally focusing on news, communication science neglected the emergence as well as the effects of popular (science) formats like the explainer videos mentioned above. Explainer videos are characterized by an informal style of presentation as well as a higher degree of narration and didactics than documentary films. Not least, they feature simple language, as well as a complementarity of spoken word and image (Krämer & Böhrs, 2017). For example, the audio track may be illustrated by the visualization of numbers and quantities, and by graphics and theme pictures (Lauter, 2018). It can thus be assumed that explainer videos usually contain a greater amount of semantically related (i.e. redundant) audio-visual information than news stories. This is why I hypothesized that: H1a: Subjects exposed to an explainer video will recall significantly more facts than those exposed to the corresponding text.
There is strong research evidence that audio leads to the poorest recall performance. In contrast to video, audio lacks additional retrieval cues; in contrast to text, the rate at which audio information is presented is not determined by the recipient (Daniel & Woody, 2010; Furnham et al., 1990). Therefore, I proposed that: H1b: Subjects exposed to an audio contribution will recall significantly fewer facts than those exposed to the corresponding video, text and scrollytelling.
As a hybrid medium, scrollytelling contains textual and audio-visual passages, therefore sharing some of the advantages and disadvantages of both text and video. Consequently, I supposed that: H1c: Subjects exposed to scrollytelling will recall significantly fewer facts than those exposed to the corresponding video, but more facts than those exposed to text and audio.
Many intermedia studies dealing with information acquisition have equated exposure time (e.g., Eveland et al., 2002; Furnham et al., 1990; Furnham, Proctor, & Gunter, 1988). However, it can be presumed that the uptake of information occurs at different rates (Furnham et al., 1990). Allowing subjects to self-regulate their exposure time makes it possible to distinguish between effectiveness and efficiency of different media in transferring information. Efficiency is thus derived by weighting the recalled information (= effectiveness) with the respective exposure time (Krämer & Böhrs, 2017).
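The weighting of recall by exposure time can be illustrated with a minimal sketch. It assumes that a medium's exposure time index is the lowest mean exposure time divided by that medium's own mean exposure time, and that efficiency is recall multiplied by this index and rescaled so that the most efficient medium scores 100. The function name and all numbers are hypothetical illustrations, not the study's data or its exact computation.

```python
def efficiency_index(mean_recall, mean_exposure_seconds):
    """Return {medium: efficiency}, scaled so the best medium scores 100.

    Assumption (not taken from the study): the exposure time index is the
    lowest mean exposure time divided by the medium's own mean exposure time.
    """
    fastest = min(mean_exposure_seconds.values())
    raw = {
        medium: mean_recall[medium] * (fastest / mean_exposure_seconds[medium])
        for medium in mean_recall
    }
    best = max(raw.values())
    return {medium: 100 * value / best for medium, value in raw.items()}


# Hypothetical illustration values, not results from the study.
recall = {"text": 0.60, "audio": 0.55, "video": 0.66}
seconds = {"text": 200.0, "audio": 300.0, "video": 330.0}
scores = efficiency_index(recall, seconds)
```

Under these assumptions, a medium with slightly higher recall can still be less efficient if it takes considerably longer to consume.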
In digital environments, reading is characterized by a quick, selective scanning of content (Ackerman & Goldsmith, 2011; Baron, 2017; Mangen, Walgermo, & Brønnick, 2013). Audio-visual formats support this reception mode only to a limited extent. This is why I expected that: H2: Text is a more efficient format than audio, video and scrollytelling for recalling information.
Furthermore, their larger proportion of redundant audiovisual information makes explainer videos particularly suitable for investigating whether the alleged advantage of video over other media really is due to its semantic overlap, which would be a direct effect of the medium on recalling information. There is considerable evidence to propose an alternative explanation: Due to their vividness, pictorial formats capture recipients' attention comparatively longer than non-pictorial formats (Eveland et al., 2002; Goldberg et al., 2019), which in turn leads to greater recall (Chun & Turk-Browne, 2007). This would be an indirect effect of the medium on information recall. Effects of the different formats would be mediated by exposure time (see also Singer Trakhman, Alexander, & Berkowitz, 2019). Congruently, then, I hypothesized that: H3: The assumed higher recall values among the recipients of video are an indirect effect due to higher exposure time.
On the content level, explainers typically tackle socially relevant, complex and at times controversial issues like migration, embryonic stem cell research and global warming. Thus, it is likely that some of their expressed statements will cause cognitive dissonance among some recipients (Hart & Nisbet, 2012). Dissonant evidence is "information that challenges one's ideological worldview or set of cultural values" (Nisbet, Cooper, & Garrett, 2015, p. 37) and that may even lead to questioning one's identity (Kahan, 2013). However, according to the theory of cognitive dissonance (TCD), individuals seek consistency among their cognitions, meaning, among other things, that their attitudes, values and intentions should not contradict each other (Festinger, 2001). Dissonance is perceived as an aversive, unpleasant motivational state and as a result, exposure to dissonant messages may lead to negative metacognitive affective experiences (Harmon-Jones, 2000; Nisbet et al., 2015). Consequently, individuals partly try from the outset to avoid dissonant content, and if this is not possible or expedient, they try to reduce cognitive dissonance, for example by altering one of the inconsistent elements, like their attitude or behavior (Festinger, 2001; Jang, 2014).
The present study focuses on recipients' concrete emotional reactions and their consequences for information recall once they are exposed to potentially dissonant material. Once more because of their vividness, visual stimuli have been hypothesized to be psychologically more activating than pure text, and seem to be processed more emotionally than non-visual stimuli (Geise & Baden, 2015; Powell et al., 2015, 2018, 2019). I therefore concluded that: H4: Video and scrollytelling induce stronger feelings of dissonance than text and audio. Strong emotions at the time of perception are said to promote encoding and recall of semantic information. For example, Doerksen and Shimamura (2001) found evidence that the use of emotional words leads to an increased allocation of attention (see also Kensinger & Corkin, 2003; Lang, 2017). However, it is quite ambiguous to what extent the (positive or negative) valence of emotions promotes or inhibits learning (Heidig, Müller, & Reichelt, 2015; Lang, Sanders-Jackson, Wang, & Rubenking, 2013; Tyng et al., 2017). According to Forgas, negative affect "recruits more careful and substantive processing styles" (1995, p. 50) because it has an alert function. Pekrun, Goetz, Titz, and Perry (2002) differentiated further between negative activating emotions (like anger, anxiety, and shame) and deactivating emotions (such as hopelessness), depending on whether they increase or decrease motivation to process information. Weeks (2015) observed that, rather than a general negative affect, it is anger that facilitates reasoning in the direction of one's own attitudes or beliefs (known as motivated reasoning). Anxious individuals, on the contrary, process the content to which they are exposed more elaborately, as anxiety unfolds in reaction to a threatening external stimulus.
Relatively little research has examined the concrete, typically short-lived emotional reactions accompanying thought generation during the reception of dissonant material. So far, psychological studies have found feelings of discomfort and stress (van Veen, Krug, Schooler, & Carter, 2009), tension, anger and irritation (Zuwerink & Devine, 1996) as well as anxiety, hostility and depression (Russell & Jones, 1980) to be associated with dissonance. Taddicken and Wolff (2020, in this thematic issue) showed that individuals exhibit an alarmed state grounded in feelings of insecurity and helplessness when confronted with opinion-challenging disinformation about climate change. Moreover, individuals expressed a state of activation, indicating they were attentive and curious. The dominant emotion, however, was anger. In general, the different emotions evoked by dissonant messages might affect processing of information partly in opposite directions. As anger seems to be a determining element of dissonance, it is conceivable that individuals confronted with dissonant messages may turn away from content during reception, or selectively recall or forget information in order to resolve the uncomfortable state as an expression of motivated reasoning (Lind, Visentini, Mäntylä, & Del Missier, 2017; Russell & Jones, 1980; see also Taber & Lodge, 2006). Therefore, I hypothesized: H5: Feelings of dissonance negatively moderate the relation between the medium and exposure time as well as between the medium and the recall of information.

Participants
The data for this study were collected in an experimental online survey of internet users in Germany from June 20, 2019, to July 3, 2019. Participants were mainly recruited via social network sites, including Facebook groups that deal with the topic of the stimulus material. Randomly assigned to the four experimental groups, 436 participants completed the questionnaire. Of these, 55 cases were excluded because participants completed the questionnaire too quickly or spent a disproportionately long or short amount of time on the website with the respective stimulus. I cleaned the dataset of cases that violated Leiner's (2019) quality parameter 'relative speed index' (≤ 2.0), indicating that the participants did not take the survey seriously. Furthermore, I excluded extreme outliers that differed by more than three standard deviations from the respective mean exposure time. As a result, the sample for data analyses consisted of 381 participants (75% female), whose age ranged from 16 to 82 years (M = 34; SD = 15). Of the participants, 75% had achieved a high school diploma.
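The three-standard-deviation rule for exposure time can be sketched as follows. This is a minimal illustration, assuming the rule is applied separately to the exposure times of each experimental condition and that the sample standard deviation is used; the function name and the example values are hypothetical and are not the study's data.

```python
from statistics import mean, stdev


def drop_extreme_outliers(times, max_sd=3.0):
    """Keep exposure times within max_sd sample standard deviations of the mean."""
    centre = mean(times)
    spread = stdev(times)
    return [t for t in times if abs(t - centre) <= max_sd * spread]


# Hypothetical exposure times in seconds for one condition.
times = [290, 300, 310] * 6 + [305, 3000]
kept = drop_extreme_outliers(times)
```

Note that in very small groups a single value cannot lie more than three sample standard deviations from the mean, so the rule only becomes effective with sufficiently large groups.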

Procedure
First, the participants' topic-specific prior attitude and prior knowledge were measured. Next, participants were randomly assigned to one of four medium conditions (text, audio, video, scrollytelling). An analysis of variance (ANOVA) confirmed that randomization to the experimental conditions was successful. No significant differences were observed between the four experimental groups in terms of age, gender, formal education level and media use (all p > 0.05; see Table 1). Afterward, the participants' feelings of dissonance and factual knowledge were surveyed. In order to capture as natural a usage behavior as possible, the knowledge test was not announced in advance, and no learning instructions were given. Respondents were then asked to provide information on their media use and socio-demographics. The questionnaire concluded with a debriefing, which informed the respondents about the nature and purpose of the experiment.
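A randomization check of this kind rests on the one-way ANOVA F statistic, which compares the variance between group means with the variance within groups. The following sketch computes F and its degrees of freedom by hand; the function name and the example values are hypothetical, and the p-value (which requires the F distribution) is deliberately left out.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical values for three groups, not the study's data.
f_value, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

An F value close to 1 indicates that the groups differ about as much as chance alone would suggest, which is what a successful randomization is expected to produce.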

Materials
Because I decided, out of practical considerations, to use a pre-existing video, I formulated several requirements that the video had to meet in order to be considered suitable:
1. It had to contain neither a brand logo nor familiar testimonials, so that (at least the obvious) effects of brand familiarity could be excluded.
2. It had to be long enough to convey a sufficient number of facts, but not so long as to cause fatigue.
3. It had to be scalable to text, audio and scrollytelling without sacrificing authenticity, and its audio track had to be comprehensible without the accompanying pictures.
4. It had to convey facts that were unfamiliar to the participants, so that their recall of information could be traced back to the stimulus.
5. Its content had to be topical and enduring.
6. Its content had to be controversial in order to generate sufficient variance in feelings of dissonance.
7. Its audio and image track had to exhibit a sufficient degree of semantic overlap.
The starting point for the four experimental stimuli was hence a popular scientific video from 2014 (https://edeos.org/projekte/fleisch-und-nachhaltigkeit). It is entertainingly packaged, animated, enriched with graphics and deals with the global ecological impact and sustainability aspects of industrial meat production and consumption. The issue seems appropriate to provoke feelings of dissonance. Non-vegetarians may perceive the message as a potential threat to their lifestyle, which may cause anger (Piazza et al., 2015). In order not to fatigue the participants, I reduced the original length of the video from 7:38 minutes to 5:24 minutes. The equally shortened transcript of the video with a length of 827 words (including instructions) served as the plain text condition. I modified some of the wording that would have seemed atypical for a text contribution. The audio clip consisted of the audio track of the shortened explainer video. The scrollytelling contribution was created using the digital storytelling tool Pageflow. It consisted of a 529-word text interrupted by an information graphic and three video clips of 11, 36 and 19 seconds. Its first page provided instructions on how to navigate the scrollytelling. The amount of semantic information contained in the four forms of media was nearly identical; the passages with slightly different formulations were not covered in the questionnaire (see Supplementary File for a list of the stimuli).
(Taddicken, Reif, & Hoppe, 2018). The questions did not address information that was conveyed solely verbally in the original video. Correct answers were rated with one point, partly correct answers with half a point (only for cued recall questions), and wrong answers with zero points (for a similar approach see Früh, 1980). An index was then formed from the arithmetic mean of the evaluated responses, ranging from zero to one (M = .63; SD = .21). The factual questions and the operationalization of the following variables are presented in Table S1 in the Supplementary File.
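The scoring rule for the recall index can be written down as a short sketch. It assumes that each participant's answers have already been rated as 1 (correct), 0.5 (partly correct) or 0 (wrong); the function name and the example ratings are hypothetical.

```python
def recall_index(ratings):
    """Arithmetic mean of item ratings: 1 = correct, 0.5 = partly correct, 0 = wrong."""
    allowed = {0.0, 0.5, 1.0}
    if not ratings or any(r not in allowed for r in ratings):
        raise ValueError("ratings must be a non-empty list of 0, 0.5 or 1")
    return sum(ratings) / len(ratings)


# Hypothetical participant: three correct, one partly correct, one wrong answer.
example = recall_index([1, 0.5, 0, 1, 1])
```

Because every rating lies between zero and one, the resulting index is bounded by the same range regardless of the number of questions.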

Mediator and Moderator
The  Table 2). The first factor includes negative feelings of guilt, fear, insecurity and shame; the second comprises motivation and confirmation. As the factor 'anger' is composed of only one item, it cannot be regarded as an independent factor here. The rotation sums of squared loadings rather indicate a two-factor solution. Moreover, the item 'offended' cross-loads with .34 on factor 1 and with .18 on factor 3. The operationalization of prior attitude (M = 5.69 on a 7-point item scale; SD = 1.19, Cronbach's α = .82) encompasses the dimensions of problem awareness and behavioral intention. As a conative component, the latter comprises the willingness to assume responsibility (Taddicken, 2013). Additionally, I controlled for the participants' demographics (gender, age, and formal education level) as well as their media use (consisting of television, radio, newspaper and internet use) (e.g., Greussing & Boomgaarden, 2019).

Zero-order correlations between all variables of interest are presented in Table S2 in the Supplementary File. To address H1a-H1c, I conducted an ANOVA. It showed that the medium exerted a significant influence on the recall of information (F(3, 358) = 2.87, p < .05, partial η² = .02, n = 362). Contrast analyses (see Table 3) demonstrated that video did not lead to significantly higher recall levels than text (p = .13), rejecting H1a. Subjects exposed to the audio contribution recalled significantly fewer facts than those exposed to the corresponding video (p = .007) and scrollytelling (p = .03). Contrary to expectation, there was no significant difference between the effectiveness of audio and text in terms of successfully transferring information (p = .27), thus H1b may only be partly accepted. The reception of scrollytelling resulted in recall levels similar to those of subjects who watched the corresponding video (see Table 3; p = .65). Accordingly, recipients of the scrollytelling were able to recall significantly more information than those of the audio contribution (p = .03). Recipients of scrollytelling did not recall significantly more facts than the recipients of text (p = .31). H1c is therefore rejected. As presumed in H2, the effectiveness of information transfer should be distinguished from efficiency. As Welch's ANOVA confirms, exposure time differs significantly across the different media forms: Welch's F(3, 195.56) = 5.78, p < .01. Subjects were exposed for a significantly longer time to video and scrollytelling than to text. Not surprisingly, depending on the medium, significant differences can be observed regarding the product of information recall and the indexed exposure time (see Table 4), with Welch's F(3, 206.24) = 4.07, p < .01. Bonferroni post hoc tests reveal that text conveys significantly more information than audio, video and scrollytelling in the same amount of time, confirming H2.
H3 posited that the assumed higher recall values among the recipients of video result from an indirect effect due to higher exposure time. Mediation analysis enables potential indirect effects through exposure time to be separated from the direct effects of the inherent capacities of the respective media (Singer Trakhman et al., 2019), and therefore allows these two rival explanations to be distinguished. I conducted mediation analysis using model 4 of the SPSS PROCESS (Hayes, 2012) macro version 3.3. Because the predictor (i.e., the medium) is multi-categorical, I coded dummy variables with video as the reference category. Again, socio-demographics, media use, prior attitude and prior knowledge were included as covariates. Confidence intervals that do not include zero indicate significance for statistical inference of mediated effects. Except for audio, neither a total nor a direct effect of the medium on information recall could be observed in relation to video (all p > .05). Yet, according to Hayes (2018), a total effect is not a prerequisite for indirect effects. Besides, less power is required to detect indirect effects than comparably sized total effects (Kenny & Judd, 2014). Mediation analysis confirms H3 in the sense that a negative indirect effect of different media via exposure time was observed when comparing video and text (ab_text = −.034, 95% CI = [−.069, −.011]). No indirect effects exist when comparing video and audio (ab_audio = −.012, 95% CI = [−.029, .001]) or video and scrollytelling (ab_scrollytelling = .01, 95% CI = [−.01, .032]). Thus, the relatively higher recall values associated with video compared to text may be explained by longer exposure time. However, this indirect effect seems to be cancelled out by another, unknown variable (MacKinnon, Krull, & Lockwood, 2000), which is already indicated by the absence of a total effect of text on recall in comparison to video.
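The logic of an indirect effect ab and its bootstrap confidence interval can be illustrated with a strongly simplified sketch. It is not the PROCESS macro: it assumes a single dummy predictor x (1 = text, 0 = video as reference), no covariates, ordinary least squares, and a percentile bootstrap interval. All function names and data are hypothetical.

```python
import random


def _centered_cross(u, v):
    """Sum of cross products of u and v around their means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def indirect_effect(x, m, y):
    """Return a*b: a is the slope of m on x, b the slope of m in y ~ x + m."""
    sxx, smm = _centered_cross(x, x), _centered_cross(m, m)
    sxm, sxy, smy = _centered_cross(x, m), _centered_cross(x, y), _centered_cross(m, y)
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b


def bootstrap_ci(x, m, y, draws=2000, seed=1):
    """Percentile 95% bootstrap interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < draws:
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            estimates.append(indirect_effect([x[i] for i in idx],
                                             [m[i] for i in idx],
                                             [y[i] for i in idx]))
        except ZeroDivisionError:
            continue  # degenerate resample, e.g. only one group was drawn
    estimates.sort()
    return estimates[int(0.025 * draws)], estimates[int(0.975 * draws) - 1]


# Hypothetical data: text readers (x = 1) spend 100 seconds less on the stimulus,
# and each second of exposure adds 0.001 to the recall index.
x = [0, 0, 0, 0, 1, 1, 1, 1]
m = [300, 320, 340, 360, 200, 220, 240, 260]
y = [0.5 + 0.05 * xi + 0.001 * mi for xi, mi in zip(x, m)]
ab = indirect_effect(x, m, y)
low, high = bootstrap_ci(x, m, y, draws=200)
```

In this constructed example the indirect effect is negative because text shortens exposure time, while the direct effect of text is positive, which mirrors how opposing paths can leave the total effect close to zero.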
Notes: Cell a-b entries are means with standard deviations in brackets. The exposure time index is based on the lowest mean exposure time. Information transfer efficiency is the indexed product of effectiveness and exposure time index, with value 100 for most efficient medium.
Moving to the hypothesized interaction between medium and emotions, as can be seen in Table 5, the levels of feelings of dissonance did not differ significantly between the four media, which is why H4 is rejected. The distribution of the two items 'offended' and 'upset' is skewed to the right, i.e., the majority of respondents felt neither offended nor upset by the reception of the contribution. With regard to H5, which proposed that recalling information is affected by the feeling of dissonance, either by reducing exposure or by selective recall of information, I executed a moderated mediation analysis using model 8 with video as the reference category and the (manually standardized) feeling of dissonance as the moderator (see Figure 1). As already noted for Table 5, the means of the feeling of dissonance and its previously identified factors hardly differ between the treatment groups. Therefore, not surprisingly, no significant interaction effects of the different media and the feeling of dissonance on exposure time or information recall could be observed. This means that paths a and c do not significantly differ along the different levels of the moderator (Hayes, 2015). Interaction effects were also examined separately for each of the three identified factors of dissonance perception and were not confirmed. H5 consequently is rejected. Participants neither turned away from the medium when it was experienced as dissonant (as their exposure time was not shorter compared to those who did not express feelings of dissonance), nor did they seem to selectively remember or forget dissonant information.

Discussion
Audio-visual formats have not enjoyed a good reputation in the past, when it comes to recalling information. Bock (2016) argues that behind this criticism lies a historical cultural evaluation of word over picture, both on the part of the audience and the producers. The devaluation of moving images is not always justified, however, as the present article demonstrates. The media formats do not differ substantially with regard to the recipients' level of information recall. Multimodal media like video and scrollytelling, at least when characterized by a certain degree of semantic overlap among the audio and visual tracks, seem to be as effective as text in promoting the transfer at least of certain information. In contrast, audio, as a single-channel medium, leads to the lowest levels of information recall. Despite similar exposure time, audio and video lead to different recall values; this result indicates a direct effect of medium on the recall of information. This is in line with previous research and the theoretical framework of cue summation, arguing for the learning benefits of an increasing number of retrieval cues.
However, in comparison to text, explainer videos and scrollytelling do not lead to equal information recall per se, but rather seem to convey information also through their ability to bind attention for longer durations. So one central question in science communication is how long individuals can be motivated for reception. Apparently, feelings of dissonance do not play a central role here: Neither they nor their factors significantly influenced exposure time. Perhaps individuals perceived a lack of action implication (Harmon-Jones, Harmon-Jones, & Levy, 2015) or incentive to learn (Pekrun et al., 2002). Alternatively, as indicated by the low means of the 'anger'-reflecting item (see Table 5), they did not feel ("sufficiently") threatened or offended by the explainer to follow motivated reasoning. In any case, the absence of (short-term) effects of dissonance on recall is good news for explainers aiming to rationalize the deliberative discourse. The often invoked suggestive power of moving images should thereby not be overemphasized. Videos did not trigger stronger dissonant feelings than the other formats examined. Similarly, Powell et al. showed that "vivid news videos did not evoke a strong emotional response" (2018, p. 591; see also MacKay & Ahmetzanov, 2005).
Video nevertheless may be the more effective medium, while text is the more efficient, which may be traced back to the fact that text still allows individuals the most differentiated information selection. This is reflected in user preferences for online news: About two-thirds of the adult online users surveyed in the 2019 Reuters Institute Digital News Report prefer news in text form to video form. This affinity for text is justified by the ease and rapidity of reading (Kalogeropoulos, 2019). The relatively low information transfer efficiency of scrollytelling may partly be explained by the rather unconventional click-through process.
From a methodological point of view, this study highlights the importance of considering exposure time as a factor of attention and recalling information. A free allocation of exposure time corresponds to natural usage behavior (Ackerman & Goldsmith, 2011). On the other hand, to move on to the limitations of the study, unlimited exposure time may simultaneously confound the results. As a solution, Jacoby et al. (1983) proposed dividing participants into one group without and one group with an exposure time limit. Accordingly, Ackerman and Goldsmith (2011) conducted two experiments regarding text learning from printed hardcopy versus from computer screen, one with fixed and the other with self-regulated study time. They demonstrated that no differences in test performance occurred under the fixed study time condition. Under the self-paced study condition, worse performance was observed on screen than on paper. Because it was impossible to manipulate the mediator in this study, I can make only limited assumptions about the causal chain of the indirect effect.
In contrast to many previous studies (for an overview see Brosius, 1995, pp. 36-37), the majority of the subjects answered the quiz largely correctly (see Table S3 in the Supplementary File), a fact that may be traced back not only to guessing, but also to the difficulty level of the questions. The quiz might not have been sufficiently exhaustive, which could be investigated in future studies applying item response theory (IRT) models. Moreover, exposure to the stimulus was forced in this study. Under natural circumstances, it would be feasible that recipients whose attitudes are opposed to the issue of the contribution would not even pay attention to it (e.g., Dylko et al., 2017). However, contact with dissonant information may happen incidentally due to social interaction, or intentionally to sharpen one's own argument (Festinger, 2001; J. K. Lee & E. Kim, 2017) and therefore is quite likely, especially in today's media environment (Taddicken & Wolff, 2020). Actually, the challenging nature of dissonant information may make it all the more conspicuous and thought-provoking to recipients, so that they remember it just as much as or better than consonant information (Wicks, 1995).
Consequently, future research should consider the quality of exposure. As with all online experiments, I had no control over how intensively the participants received the respective contribution. With the formula "television is easy, print is tough", Salomon (1984) proposed that the processing of audio-visual stimuli is automated and therefore unconscious and non-reflective (and more emotional). Text is said to require and to foster higher processing energy than (audio-)visual materials (Eveland et al., 2002; Geise & Baden, 2015; Lang, 2006; Powell et al., 2015, 2018, 2019). However, to quote Grabe, Lang, and Zhao, "Television viewing, although it 'feels' simple, is in fact a complex and difficult cognitive task" (2003, p. 390).
The research evidence regarding scrollytelling is even more ambiguous: On the one hand, scrollytelling's multimedia elements may cause sensory overload or cue distraction, which can hinder information processing (Sundar, 2000). On the other hand, interactivity such as scrolling through the story may enhance elaboration and learning (Xu & Sundar, 2016). Further, the narrative flow, that is, the consecutive presentation of text passages and video sequences in scrollytelling, may prevent distractions, interferences, or cognitive overload (Pincus, Wojcieszak, & Boomgarden, 2017; see also Lang, 2006). Future research should therefore take into account the degree of elaboration in the reception of (popular) scientific content. It seems plausible that elaboration cancels out the indirect effect of exposure time observed in the mediation analysis to a certain degree.
A further limitation that is typical in the context of online experiments is, besides a lack of sample representativeness and the capacity to address only short-term information recall rather than knowledge, the single-stimulus procedure. Because effects are topic-related, they are difficult to generalize (Reeves, Yeykelis, & Cummings, 2016). Therefore, similar research dealing with other, potentially dissonance-provoking stimuli is necessary, especially since the levels of dissonant feelings were quite moderate and there may therefore have been a lack of variance in the moderation model. It cannot be ruled out that long-term effects of dissonance on knowledge may occur. Finally, yet importantly, increased attention in the experimental context can be assumed. Under natural conditions, by contrast, individuals are likely to multitask, especially when consuming audiovisual content (Eveland et al., 2002). In digital contexts, this even applies to reading (Baron, 2017).
Despite these constraints, in relation to audio, new formats like explainer videos and scrollytelling are promising media for imparting information. In terms of accommodating people with less developed reading literacy and information processing skills (Grabe, Kamhawi, & Yegiyan, 2009; Kleinnijenhuis, 1991), audio-visual formats can serve as a "knowledge leveler" (Neuman, 1976, p. 122; see also Hollander, 2014). Future experimental research should therefore further address the characteristics of audio-visual and hybrid formats that facilitate recalling information, such as subtitling (especially in the context of videos embedded in social media), and the optimal ratio of video and text passages.