Modality-Specific Effects of Perceptual Load in Multimedia Processing

Digital media are sensory-rich, multimodal, and often highly interactive. An extensive collection of theories and models within the field of media psychology assume the multimodal nature of media stimuli, yet there is current ambiguity as to the independent contributions of visual and auditory content to message complexity and to resource availability in the human processing system. In this article, we argue that explicating the concepts of perceptual and cognitive load can create progress toward a deeper understanding of modality-specific effects in media processing. In addition, we report findings from an experiment showing that perceptual load leads to modality-specific reductions in resource availability, whereas cognitive load leads to a modality-general reduction in resource availability. We conclude with a brief discussion regarding the critical importance of separating modality-specific forms of load in an increasingly multisensory media environment.


Introduction
Using media is often a rich, multisensory experience. Video games, movies, and other digital environments contain numerous streams of audiovisual information that must be processed quickly and simultaneously (Lang, 2000). As such, interacting with these media often requires that an individual engage both visual and auditory processing systems at the same time. Each of these systems contains spatial, temporal, and physical constraints that circumscribe the quantity and quality of information that an individual can effectively process (Buschman, Siegel, Roy, & Miller, 2011; Franconeri, Alvarez, & Cavanagh, 2013; Kahneman, 1973; Marois & Ivanoff, 2005).
Within the media psychology literature, these limitations are often referred to using the language of resources (Lang, 2000, 2006). Humans dynamically allocate limited processing resources to encoding, storing, and retrieving information in an environment (Lang, Bradley, Cuthbert, & Simons, 1997; Lang, Sanders-Jackson, Wang, & Rubenking, 2013), and the resources that are allocated are required (consumed) at a rate commensurate with the complexity of the information. This resource allocation process has been shown to predict message processing outcomes like memory, enjoyment, and learning (for a recent review and meta-analysis, see Huskey, Wilcox, Clayton, & Keene, 2019). Because of this, understanding resource allocation is critical for understanding how individuals process multimedia messages as well as how alterations in resource allocation processes influence outcomes of interest.
An open question in media psychology research concerns the extent to which resource allocation depends on the modality in which information is presented. Theories and models in the literature frequently describe media as multimodal (recruiting both the visual and auditory processing systems, see e.g., Basil, 1994a; Geiger & Newhagen, 1993; Lang, 2000), but there is a current lack of clarity as to the extent to which information presented in different modalities loads the processing system in different ways. Some models assume that processing resources exist in one central modality-independent pool (e.g., Lang, 2000, 2009). In this case, information presented within one modality should affect resource availability as measured in other modalities. Others, though, propose that at least some forms of media processing may draw from modality-specific resource pools (Basil, 1994a, 1994b), meaning that information presented in one modality may not necessarily reduce resource availability as measured in another. Empirical investigations have long reported modality-specific effects associated with structural (e.g., brightness, contrast, information density, etc.) and content features (e.g., violence, morality, etc.), but a sizeable subset of these findings are conflicting or ambiguous between studies. This ambiguity limits our understanding regarding the specific effects of modality in multimedia processing.
In this article, we argue that a clearer explication of the concepts of perceptual and cognitive load can engender substantial progress toward understanding and predicting modality-specific message processes and effects, contributing to further precision in our understanding of the demanding nature of multimedia processing (Bowman, Banks, & Wasserman, 2018). Drawing from the cognitive neuroscience of attention, we provide a discussion of the nature of perceptual load and its relevance to media scholarship. We present a model based on processing hierarchies in the human brain predicting that the effects of perceptual load should be largely modality-dependent whereas the effects of cognitive load should be largely modality-independent. In addition, we report findings from an experiment providing strong support for this hypothesis. In this experiment, we manipulated cognitive load and (visual) perceptual load within a video game and measured how long it took participants to respond to a secondary task that was presented in either the visual or auditory modality. As predicted, perceptual load only influenced reaction times within the modality in which it was introduced whereas cognitive load influenced reaction times across both modalities. We conclude with a brief discussion regarding the critical importance of understanding modality-specific forms of load in an increasingly multisensory media environment.
A wealth of studies report differences between visual processing and auditory processing in a multimedia context (Basil, 1994a; Lang, 1995; Lang, Potter, & Bolls, 1999). In one study, Lang and colleagues (1999) show that message pacing (the number of camera changes within a given time window) and arousal differentially influence cued recall for visual and auditory content. As message pacing increases, memory for visual content (measured using a cued recall task) stays constant or increases for both arousing and calm messages. In contrast, memory for auditory content decreases as pacing increases in arousing messages. In other work, valence is shown to differentially influence visual and auditory processing (Lang, Newhagen, & Reeves, 1996; Newhagen & Reeves, 1992). This work shows that negative visual content tends to be better remembered than positive (Lang et al., 1996; Newhagen & Reeves, 1992). A recent study replicates this finding, but finds that the opposite pattern is true for auditory content: positive auditory content tends to be better remembered than negative (Keene & Lang, 2016). Taken together, these findings suggest that there may be important differences between visual and auditory resource allocation processes that are contingent on the nature of the content that is presented, but it remains unclear exactly what types of content may elicit differential resource availability across modalities and what types of content may influence both modalities in a similar way.
Another area of research concerns the effects of redundancy between auditory and visual channels in messages. Messages that are redundant across modalities (e.g., subtitles that match the words being spoken) seem to be better remembered than those that are non-redundant or conflicting across modalities (Drew & Grimes, 1987; Grimes, 1990; Lang, 1995; Wember, 1983). This effect is especially strong whenever visuals are highly attention-grabbing or emotional (Brosius, 1993). It has been suggested that non-redundancy between modalities may lead to cognitive overload, resulting in reduced memory for message content (Grimes, 1991). Whenever information is non-redundant or conflicting across modalities, auditory memory consistently suffers the most (Brosius, 1989; Drew & Grimes, 1987; Grimes, 1990, 1991; Lang et al., 1999). This effect has been highlighted as especially problematic in news and educational settings, as the presentation of complex or otherwise attention-grabbing visuals may interfere with encoding of the auditory track, typically the location of the bulk of important content in both news and educational messages (Brosius, 1989, 1993; Grabe et al., 2003; Thorson & Lang, 1992; Wember, 1983).
One conclusion that has been drawn from this work is that visual processing is more automatic (e.g., less resource-intensive) than auditory processing, making it more robust in the face of increasing message complexity and leaving more resources available in the visual channel than the auditory channel as complexity increases (Lang et al., 1999). Ultimately though, the interpretability of these data regarding resource availability at encoding is limited. This is primarily because each of these studies tested memory using cued recall tasks (e.g., multiple-choice questions or fact recognition). These measures are more accurately connected to information storage and retrieval (not encoding; Lang, 2009). Another important factor may be the generally semantic nature of auditory content in messages, requiring individuals to engage in much more processing for the same memory outcome. As such, it is clear that in order to understand the modality-specific effects of message content on resource availability at encoding, researchers must use more direct measures of resource availability at encoding than cued-recall measures. These include secondary task reaction times (STRTs; Lang & Basil, 1998; Lang, Bradley, Park, Shin, & Chung, 2006), or encoding measures such as forced-choice audiovisual recognition tasks (see e.g., Keene & Lang, 2016; Lang et al., 2015; Yegiyan, 2015).
Only a small number of studies in the communication literature to date meet these criteria. In a pair of early studies, Thorson et al. (1985, 1987) manipulated visual and auditory complexity while participants viewed (and/or listened to the audio tracks from) television messages and responded to a visual or auditory STRT probe. These studies reported that visual and auditory STRTs depend on both: a) the modality in which message content is introduced, and b) the modality in which STRTs are measured. If it were the case that visual and auditory resources draw from the same pool, visual or auditory message complexity would be expected to influence STRTs irrespective of the modality in which information is presented or the modality in which the STRT is measured. In finding modality-specific effects, these studies cast doubt on the idea that capacity limitations in a central resource pool solely determine message processing performance across modalities, and provide initial support for the idea that visual and auditory resources may be at least partially separable.
In another study, Basil (1994b) manipulated the modality in which information was presented and the modality in which the STRT probe occurred. This study found that visual probes were responded to more quickly overall than auditory probes and that STRTs were faster in both modalities whenever the bulk of message information was contained in the auditory modality. This study provides unclear support for either a modality-specific or a modality-general resource pool. A final study, although not directly measuring resource availability, investigated visual task performance in the presence of high- and low-imagery audio tracks, finding that listening to an audio track only interferes with visual task performance whenever the audio track is high in imagery (presumably loading visual processing resources; Bolls, 2002). This provides preliminary support for the idea that processing resources may be separable by modality, at least to the extent that processes within one modality do not require processing in the other modality.
Since this spate of early studies, no further evidence within communication research has demonstrated the modality-specific effects of message complexity on resource availability. As such, the picture is still quite unclear as to whether different forms of visual and auditory complexity in media content require modality-specific or modality-independent processing resources. Early theorizing regarding the modality-specific effects of message complexity hints at the idea that perceptual processes may be differentiable from "meaning-level" (cognitive) processes (see e.g., Thorson et al., 1987) and that each may affect processing in different ways. In later work, though, this idea was largely abandoned, possibly due to the aforementioned ambiguity in findings regarding the independent influence of these processes. Although more than 30 years have passed since this initial work, recent developments in communication and cognate fields both allow for and necessitate the re-opening of this question. Most pressingly, mounting evidence from neuroscience research suggests that perceptual load has largely modality-dependent effects whereas the effects of cognitive load are largely modality-general (Hasson, Chen, & Honey, 2015; Murray et al., 2014; Regev et al., 2018; Wahn & König, 2017). Thus, it is likely that currently ambiguous findings regarding modality differences, redundancy, and modality-specific resource availability can be resolved when considering the relative contributions of perceptual and cognitive load to the overall complexity of a message.
In order to understand and make predictions regarding the unique role of perceptual operations in multimedia processing, it is necessary to briefly review the anatomy and physiology of the sensory processing pathways in the brain and body and to discuss the neural mechanisms of perception (space does not permit an in-depth discussion of these pathways and processes; interested readers are encouraged to consult Bear, Connors, & Paradiso, 2015, or Woolsey, Hanaway, & Mokhtar, 2017, for detailed treatments of these important topics). Audition and vision involve converting variations in air pressure (sound waves) and light (electromagnetic waves) into neural signals. For audition, this conversion process takes place in a collection of specialized organs in the inner ear that encode the intensity and frequency of sound stimuli into temporal and spatial patterns of neural firing. Different receptors fire in response to different frequencies and intensities. Firing patterns are transmitted along a series of pathways from the inner ear to the auditory cortex within the temporal lobes of the human brain. The auditory cortex, like all cortical regions, is arranged in layers from the interior of the brain to the exterior. A neural signal arrives from the inner ear at the deepest layer, and it is further processed in each successive layer as it moves to the outer layer of the cortex (Nelken & Bar-Yosef, 2008).
For vision, this conversion process takes place in the retina. Specialized receptor cells in the retina detect light, dark, color, and other visual features, such as borders between light and dark areas of the visual field. This information is transmitted through the optic nerves and along a series of pathways to the visual cortex within the occipital lobes of the brain. Different cells in the visual cortex are specialized for detecting different visual features (e.g., orientation, motion, color, shape). The visual cortex, like the auditory cortex, is arranged in hierarchical layers. These layers allow incoming information to be integrated across short timescales (tens to hundreds of milliseconds) in order to form identifiable objects, entities, and events that can be used to guide behavior (Hasson et al., 2015). Within a mediated environment, these perceptual operations could be as varied as identifying a new item that has appeared within the environment, differentiating a target item from a sea of similar stimuli in a visual or auditory stream, or processing the contents of a new scene introduced by a structural feature. The amount of processing resources that these basic operations require can be thought of as perceptual load.
Perceptual load increases whenever more items need to be identified in the sensory field, or when the number (or difficulty) of perceptual operations required to identify target items increases. These perceptual operations are myriad, but the most commonly investigated are filtering, de-distortion, mental rotation, perspective changing, individuation (recognizing unique features of a target object), and integration of disparate features into a coherent whole (Elliott & Giesbrecht, 2010; Fitousi & Wenger, 2011; Murphy et al., 2016). Some of these operations are only meaningful for visual processing (such as rotation and perspective changing), whereas others can occur in either the visual or auditory processing streams (e.g., individuation, integration). Although there is still active debate, a formidable body of evidence suggests that the brain perceptually processes all items in the sensory field provided there are enough resources to do so (Lavie, 1995; Lavie et al., 2004). As resources begin to be exhausted (e.g., perceptual load increases), fewer and fewer items in the sensory field are able to be processed at any given time, reducing the influence of peripheral stimuli (Murphy et al., 2016). As perceptual load increases in a given channel, orienting responses attenuate to additional stimuli in that channel (Cosman & Vecera, 2009, 2010a, 2010b; Santangelo & Spence, 2008) and interference effects of irrelevant stimuli are reduced (Forster & Lavie, 2008; Fu et al., 2009).
If it is the case that "resources" as traditionally discussed are related to temporal, spatial, and physical constraints within brain regions and their connecting pathways (Franconeri et al., 2013; Marois & Ivanoff, 2005), the fact that visual and auditory processing take place in largely separate regions and along parallel pathways strongly suggests that each may have its own "pool" of processing resources that is at least in some ways unaffected by activity in the other modality. In this framework, it is assumed that a process shares resources to the extent that it recruits brain regions and pathways that overlap with those recruited by another process (Franconeri et al., 2013). In view of these findings, we predict that an induction of perceptual load in a given modality will reduce resource availability only within the modality in which it is introduced, slowing responses to a secondary task in the same modality, but not responses in another modality.

Perceptual Load versus Cognitive Load
To review, perceptual processing involves the filtering, detection, and integration of object features from the sensory environment. These processes take place over short timescales (i.e., within milliseconds), are predominantly stimulus-driven, and are largely outside of conscious awareness and control. In contrast, cognitive processing involves operations such as goal-directed control of attention, sense-making/learning, and maintenance of relevant items in working memory (Lavie, 2010). Cognitive processes integrate and maintain information over much longer timescales than perceptual processes (Murray et al., 2014). These processes are interrupted whenever meaning-level information is nonsensical or scrambled, such as when scenes in a movie are presented in an incorrect sequence (Aly, Chen, Turk-Browne, & Hasson, 2018; Baldassano et al., 2017). The amount of processing resources that cognitive operations require can be thought of as cognitive load. Cognitive load increases in relation to two primary factors: 1) an increase in the amount of information that must be held in working memory, and 2) an increase in the unfamiliarity, ambiguity, uncertainty, or error-proneness of this information. Cognitive load has primarily been manipulated using simple working memory tasks, such as requiring a participant to hold a string of numbers or letters in working memory or remember items previously seen in a task, but it can also be manipulated by increasing the conceptual complexity of a message or a task (e.g., introducing unfamiliar concepts or more ambiguous rules; Lavie, 2010).
Cognitive load is perceived as intrinsically effortful (Westbrook & Braver, 2015) and individuals (usually) seek to minimize it (Inzlicht, Shenhav, & Olivola, 2018; Kool, McGuire, Rosen, & Botvinick, 2010), although increased motivation seems to lead to increased willingness to expend cognitive effort (Botvinick & Braver, 2015; Huskey, Craighead, Miller, & Weber, 2018; Locke & Braver, 2008). Recall that higher perceptual load tends to lead to reduced processing of task-irrelevant stimuli. In contrast, higher cognitive load is often associated with increases in behavioral and neural indicators of task-irrelevant stimulus processing (Fitousi & Wenger, 2011; Kelley & Lavie, 2011; Lavie, 2005). As cognitive load increases, it becomes more likely that a task-irrelevant stimulus will interfere with performance on a primary task and that it will be encoded into memory (Lavie, 2005). This effect is especially pronounced in individuals with cognitive processing difficulties such as ADHD (Forster & Lavie, 2008; Forster, Robertson, Jennings, Asherson, & Lavie, 2014). Within a media task, cognitive load could correspond to things like learning the rules of a complex game (Bowman et al., 2018), reconciling conflicting information in a narrative (Yarkoni, Speer, & Zacks, 2008; Zacks & Magliano, 2011), or learning new items that must be remembered (Mayer, 2014; Moreno & Mayer, 1999).
Perceptual processing-related activity within sensory regions and pathways is highly correlated during stimulus processing both within and between subjects, but this activity is largely uncorrelated with activity in other brain regions in the same subjects (such as those used to process sensory information from other modalities; Godwin, Barry, & Marois, 2015). As cognitive processing increases, these modality-specific networks become integrated with one another and with other large-scale neuronal networks in a "global workspace" network distributed across the whole brain (Hearne, Cocchi, Zalesky, & Mattingley, 2017; Kitzbichler, Henson, Smith, Nathan, & Bullmore, 2011; Shine & Poldrack, 2018). The extent to which these networks become integrated during cognitive processing is a predictor of performance (Finc et al., 2017). Thus, increases in cognitive processing requirements (cognitive load) should lead to modality-general effects on resource availability and indicators of processing performance (such as memory and learning). Behavioral and neuroscientific findings provide support for this idea, reporting that (provided cognitive load is kept constant) the effects of perceptual load are largely modality-specific whereas effects of cognitive load seem to not depend on the modality in which complexity is introduced or performance is measured (Duncan et al., 1997; Keitel, Maess, Schröger, & Müller, 2013; Sandhu & Dyson, 2016; Wahn & König, 2017).
To date, these findings are largely constrained to non-naturalistic working memory tasks and highly controlled stimuli, but emerging evidence suggests that they may be generalizable to a multimedia context (Wang & Duff, 2016). A recent study using inter-subject correlations of brain imaging data reported that perceptual processing of auditory and visual narratives recruited modality-specific processing networks, but that cognitive processing (conscious attending and sense-making) was associated with activation patterns that spread across modalities and into higher-order processing networks (Regev et al., 2018). Thus, it could be expected that the extent to which complexity in one modality interferes with resource availability in the other is contingent upon the extent to which the complexity is cognitive (as opposed to merely perceptual) in nature. With these things in mind, we predict that perceptual load should reduce resource availability in a modality-specific fashion whereas cognitive load should reduce resource availability in both modalities. These hypotheses, along with an initial experimental design and analysis plan, are preregistered; the preregistration, as well as all code and data, can be accessed at https://osf.io/as2u5.

General Overview
An experiment was conducted in which participants played 30 minutes of a specially designed experimental video game stimulus (see below). Participants played the game under conditions of cognitive and (visual) perceptual load, and resource availability was measured in both the visual and the auditory modality. All frequentist data analysis was conducted using linear mixed-effects models in R (R Core Team, 2013), and all non-frequentist, Bayes factor analysis was conducted using the BayesFactor package in R (https://CRAN.R-project.org/package=BayesFactor).

Subjects
A total of 101 participants were recruited from the undergraduate research pool at a large western university (N male = 44, N female = 57, M age = 20.06). Before data collection, a power analysis was conducted using the simr package in R (Green & MacLeod, 2016) in order to determine sufficient sample size. This analysis revealed that 60 subjects were sufficient for 80% power given the size of previous effects using a similar manipulation. Thirteen participants were excluded due to equipment failure or non-compliance with experimental protocol, leaving a final N of 88 for the analyses reported herein.

Stimulus
The stimulus for this experiment was Asteroid Impact (https://github.com/medianeuroscience/asteroid_impact), an open-source video game developed in Python. Asteroid Impact allows for fine-grained experimental control over gameplay variables as well as high-resolution data logging. The object of the game is to pilot a spaceship around the screen to collect valuable crystals while avoiding asteroids. Asteroid Impact adapts to the skill level of the subject, gradually increasing in difficulty as the subject successfully collects crystals and decreasing in difficulty as the subject fails to avoid asteroids. The base size, frequency, and speed of all in-game sprites were held constant across all conditions.

Procedure
Participants were invited one at a time into a computer lab containing ten cubicles, each with one Dell computer with a 1600 × 900 monitor (60 Hz refresh rate). A researcher guided the participants to a computer and gave them a consent form containing relevant information regarding the study design. After signing the consent form, participants viewed the instruction screen for the experiment (see Figure 1a, 1b). Roughly half of the participants were assigned to the visual STRT condition and half of the participants were assigned to the auditory STRT condition, but all participants underwent the same cognitive and perceptual load conditions (for a visual depiction of the experimental design see Figure 2). Before beginning gameplay, participants listened to a brief script read by the researcher reminding them that their primary task would be to collect as many crystals as they could while avoiding asteroids and that their secondary task would be to press the space bar when they either saw the star or heard the tone. Participants were then instructed to put on their headphones and begin the experiment. Audio was presented through Bose QuietComfort 15 headphones with computer volume set to 15/50.
Participants played a one-minute practice round of Asteroid Impact followed by six five-minute rounds of gameplay. Two rounds contained no load manipulation, two rounds contained the perceptual load manipulation, and two rounds contained the cognitive load manipulation. Following the practice round, each subsequent round was presented in random order. Instruction screens before each round alerted participants to the different gameplay conditions without revealing core hypotheses of the study (see Figure 1d, 1e). Upon completing gameplay, participants filled out a brief survey to assess individual cognitive differences and media use habits. These data were not analyzed in this study. As a part of this survey, participants were asked to rate their own video game skill on a scale from 1 to 7; in this sample, mean video game skill was 3.87 (SD = 1.76). This and other participant-level data are available in the OSF repository for this project.
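The round structure described above can be illustrated with a minimal sketch. This is not the randomization code used by Asteroid Impact; the function name `build_round_order`, the condition labels, and the tuple layout are hypothetical and chosen only for illustration. Only the counts and durations (one 60-second practice round, then two five-minute rounds per condition in random order) come from the procedure.

```python
import random

def build_round_order(seed=None):
    """Return a practice round followed by six shuffled five-minute rounds.

    Hypothetical helper: condition labels and the (condition, seconds)
    layout are illustrative assumptions, not the game's own format.
    """
    rng = random.Random(seed)
    # Two rounds per load condition, as described in the procedure.
    rounds = ["no_load", "perceptual_load", "cognitive_load"] * 2
    rng.shuffle(rounds)
    # The practice round always comes first and lasts one minute.
    return [("practice", 60)] + [(condition, 300) for condition in rounds]

order = build_round_order(seed=1)
total_gameplay_seconds = sum(seconds for _, seconds in order[1:])
```

Under these assumptions, `total_gameplay_seconds` is 1800, matching the 30 minutes of gameplay reported in the general overview.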

Figure 1. Depiction of the main screens in the game environment: a) instruction screen for the visual modality condition; b) instruction screen for the auditory modality condition; c) depiction of the cognitive load manipulation; d) depiction of the instruction screen preceding the cognitive load manipulation; e) depiction of the instruction screen preceding the perceptual load manipulation; f) depiction of the perceptual load manipulation. Under perceptual load, randomly generated Mondrian-type squares were overlaid onto the screen at 80% opacity, rendering game elements much more difficult to see. In the cognitive load condition, collection of two of the same-colored crystal in a row would cause a loss of 1000 points (equivalent to ten crystals).

Manipulating Perceptual Load
Although measures have been proposed for overall message complexity (such as ii; Lang et al., 2006), a specific measure of perceptual load in a multimedia environment does not currently exist. As such, recent work in the field employs direct manipulations of perceptual load. This has been done in several ways: by introducing a sensory-rich stimulus as opposed to a relatively sparse one (Stróżak & Francuz, 2017), by increasing the number of items in the visual field (Wang & Duff, 2016), and by reducing contrast between foreground and background items (Fisher, Hopp, & Weber, 2018). In this experiment, we manipulated perceptual load in the visual channel using a well-validated manipulation from visual perception research involving randomly regenerating shapes and colors (Bahrami, Carmel, Walsh, Rees, & Lavie, 2008; Hesselmann, Hebart, & Malach, 2011; Lavie, Lin, Zokaei, & Thoma, 2009). To induce perceptual load, we added a semi-transparent visual overlay consisting of Mondrian-like rectangles of varying colors and sizes that changed locations and colors at a random time point within each ten-second period of gameplay (see Figure 1). This manipulation was chosen because it was: a) as tightly controlled as possible, not introducing potential confounds with cognitive load, and b) easily integrated into the narrative of the game. An instruction screen presented before perceptual load levels alerted participants that their "spaceship display is damaged" and that the following level may be more difficult to see.
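The timing and layout logic of such an overlay can be sketched as follows. This is an illustrative approximation rather than the Asteroid Impact implementation: the function names, the number of rectangles, and the rectangle size range are assumptions. The ten-second window, the 80% opacity, and the 1600 × 900 display come from the text and the Figure 1 caption.

```python
import random

def regeneration_times(round_seconds=300, window_seconds=10, rng=None):
    """Pick one random regeneration time inside each consecutive window."""
    if rng is None:
        rng = random.Random()
    return [start + rng.uniform(0, window_seconds)
            for start in range(0, round_seconds, window_seconds)]

def random_overlay(n_rects, screen_w=1600, screen_h=900, rng=None):
    """Generate Mondrian-like rectangles with random size, place, and color.

    n_rects and the 50-400 pixel size range are illustrative assumptions.
    """
    if rng is None:
        rng = random.Random()
    rects = []
    for _ in range(n_rects):
        w = rng.randint(50, 400)
        h = rng.randint(50, 400)
        rects.append({
            "x": rng.randint(0, screen_w - w),  # keep rectangle on screen
            "y": rng.randint(0, screen_h - h),
            "w": w,
            "h": h,
            "color": tuple(rng.randint(0, 255) for _ in range(3)),
            "alpha": 0.8,  # 80% opacity, per the Figure 1 caption
        })
    return rects

rng = random.Random(0)
times = regeneration_times(rng=rng)    # 30 time points for a 300 s round
overlay = random_overlay(12, rng=rng)  # a fresh overlay for one time point
```

A game loop would draw the current overlay every frame and call `random_overlay` again whenever the clock passes the next entry in `times`.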

Manipulating Cognitive Load
As with perceptual load, a direct measure of cognitive load does not currently exist for selecting multimedia stimuli, so it must be directly introduced through "modding" the video game or message stimulus (Elson & Quandt, 2016). Previous work has manipulated cognitive load in video games by increasing the number of items in a matching task (Wang & Duff, 2016) and by introducing a 1-back memory maintenance component into the game (Fisher, Hopp, & Weber, 2018; Fisher et al., 2019). In this experiment, we manipulated cognitive load using the 1-back maintenance task outlined in Fisher, Hopp, and Weber (2018). This manipulation has been shown to be perceived as cognitively difficult, and to elicit activation in working memory-related brain regions (Eriksson, Vogel, Lansner, Bergström, & Nyberg, 2015; Veltman, Rombouts, & Dolan, 2003). This manipulation is similar in many ways to the "n-back" (Owen, McMillan, Laird, & Bullmore, 2005), a very widely used working memory manipulation in cognitive neuroscience research. An instruction screen presented before the cognitive load levels alerted participants that "in this level, some of the crystals are dangerous" and that they are no longer allowed to collect two consecutive crystals of the same color. If two crystals of the same color were collected in sequence, a short, negative "buzzer" sound played, and the participant's in-game score dropped by 1000 points. This task required participants to maintain the identity of the most recently collected crystal in their working memory, and to continually update this information as new crystals were collected.
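The 1-back rule can be expressed compactly. The sketch below is an illustration, not the game's scoring code: the function name is hypothetical, the 100-point crystal value is inferred from the Figure 1 caption (1000 points is described as equivalent to ten crystals), and the choice to award no points for a penalized crystal is an assumption.

```python
def score_collections(colors, crystal_value=100, penalty=1000):
    """Score a sequence of collected crystal colors under the 1-back rule.

    Assumptions: crystal_value of 100 is inferred from the figure caption,
    and a repeated-color crystal incurs only the penalty (no points added).
    """
    score = 0
    previous = None  # identity of the most recently collected crystal
    for color in colors:
        if color == previous:
            score -= penalty        # same color twice in a row: buzzer
        else:
            score += crystal_value  # safe collection
        previous = color            # update the 1-back item every time
    return score

example = score_collections(["red", "blue", "blue", "green"])
```

Here `example` is -700: three safe collections (+300) and one repeated color (-1000). The single `previous` variable mirrors what the participant must hold and update in working memory.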

Measuring Resource Availability
The primary dependent variable of interest in this experiment is resource availability. Previous work has demonstrated that the STRT is a reliable indicator of resources available at encoding provided that the participant does not enter cognitive overload (see e.g., Lang, 2006; Lang & Basil, 1998). In a typical STRT paradigm, participants are told that they will be responding to a secondary task while completing the primary task (in this case, video game play). Most commonly, participants are asked to press a button upon seeing a flash or hearing a tone.
Participants are instructed to concentrate on the primary task, but to respond to the secondary task probe as quickly as they can. Previous work demonstrates that reward can modulate STRTs (Fisher et al., 2019). As such, responding to the secondary task was worth the same number of points across all conditions in the game. The modality of the STRT probe was manipulated between participants. In the visual STRT condition, the secondary task prompt was a white star that appeared in a random location on the screen. In the auditory STRT condition, the secondary task prompt was a 400 Hz tone. Following the good-practice recommendations in Whelan (2008), we conducted three preprocessing steps on the reaction time data. First, any reaction times less than 100 msec were discarded, along with reaction prompts that were missed entirely. Second, reaction times were filtered to remove any values that were more than three standard deviations away from the mean within participants and conditions. These filtering steps removed an average of 13.1 reaction times per participant (out of a total of 180). Finally, the remaining reaction times were log transformed.
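
The three preprocessing steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than the analysis script used in the study: the function name `preprocess_rts`, the `(participant, condition, rt_ms)` tuple layout, the use of `None` for missed probes, the sample (n − 1) standard deviation, and the natural logarithm are all choices made here, since the text specifies only the thresholds and the order of the steps.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def preprocess_rts(trials, floor_ms=100.0, sd_cutoff=3.0):
    """Clean secondary task reaction times.

    trials: iterable of (participant, condition, rt_ms), where rt_ms is None
    for a missed probe. Returns {(participant, condition): [log RTs]}.
    """
    cells = defaultdict(list)
    for participant, condition, rt_ms in trials:
        # Step 1: drop missed probes and anticipatory responses (< 100 msec).
        if rt_ms is None or rt_ms < floor_ms:
            continue
        cells[(participant, condition)].append(rt_ms)

    cleaned = {}
    for cell, rts in cells.items():
        # Step 2: drop values more than 3 SD from the mean of their own
        # participant-by-condition cell (needs at least two observations).
        if len(rts) > 1:
            center, spread = mean(rts), stdev(rts)
            rts = [rt for rt in rts if abs(rt - center) <= sd_cutoff * spread]
        # Step 3: log transform what remains (natural log is an assumption).
        cleaned[cell] = [math.log(rt) for rt in rts]
    return cleaned
```

Because the cutoff is computed within each participant-by-condition cell, a response that is slow for one participant is not judged against faster participants or easier conditions.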

Results
Based on our theoretical model regarding the modality-specific effects of perceptual load, we expected that the visual perceptual load induction would influence STRTs in the visual modality but not in the auditory modality. Under high perceptual load, visual STRTs should be slower but auditory STRTs should be similar under both high and low load. In contrast, we expected that cognitive load would lead to slower STRTs regardless of the modality in which the STRT was measured.

Main Effects of Load and Modality
Previous work has shown that cognitive load robustly influences resource availability such that resource availability drops (STRTs lengthen) as cognitive load increases (provided that participants remain focused on the primary task, see e.g., Fisher et al., 2019; Fox, Park, & Lang, 2007). All predictions were tested using linear mixed-effects models fit using the lme4 package in R (Bates, Mächler, Bolker, & Walker, 2015). Cognitive load, perceptual load, and modality were treated as fixed effects, and were coded using effects coding. The dependent variable (log STRTs) was z-transformed before data analysis. Random intercepts and slopes were included for participants and for cognitive load and perceptual load nested within modality condition. All reported betas are standardized. For a visual depiction of the contrasts employed in this experiment please see

Interactions Between Modality and Load
Based on our model, it was expected that perceptual load and modality would interact such that under high (visual) perceptual load, STRTs in response to the visual probe would become slower whereas STRTs in response to the auditory probe would remain about the same. As cognitive load is predicted to influence resource availability in a modality-independent fashion, no interaction effect was predicted for cognitive load and modality. Cognitive load, perceptual load, modality, and the interaction between cognitive/perceptual load and modality were treated as fixed effects and were coded using effects coding. Random intercepts and slopes were included for participants and for cognitive/perceptual load nested within modality condition. This analysis revealed an interaction between perceptual load and modality, β(87) = −.08, 95% CI [−.099, −.056], 2 = 0.028, p < .001, such that the difference in STRTs between high and low perceptual load was greater in the visual STRT condition (ΔM = 281.77) than in the auditory STRT condition (ΔM = 9.26, see Figure 4). There was a significant difference in STRTs between high and low perceptual load in the visual condition, t(51) = 9.95, p < .001, but not in the auditory condition (p = .401). As predicted, there was no interaction effect between cognitive load and modality (p = .316).

Figure 4. Interaction effects between perceptual load and modality (top) and between cognitive load and modality (bottom). There was an interaction between perceptual load and modality. The difference in STRTs between high and low perceptual load was 281.77 msec in the visual STRT condition and was 9.26 msec in the auditory STRT condition. There was not a significant interaction between cognitive load and modality. Error bars represent 95% confidence intervals.
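
To make the model specification more concrete, the following Python sketch shows one way the effects-coded predictors and the z-transformed outcome could be constructed. It illustrates the coding scheme only and does not fit the mixed-effects model, which the text reports was done with lme4 in R. The names `EFFECT_CODES`, `design_row`, and `z_transform`, the assignment of +1 to the "high" and "visual" levels, and the use of the sample standard deviation are assumptions.

```python
from statistics import mean, stdev

# Assumed coding direction: which level receives +1 is arbitrary and is not
# stated in the text.
EFFECT_CODES = {
    "cognitive": {"low": -1.0, "high": 1.0},
    "perceptual": {"low": -1.0, "high": 1.0},
    "modality": {"auditory": -1.0, "visual": 1.0},
}


def design_row(cognitive, perceptual, modality):
    """Return effects-coded predictors plus load-by-modality product terms."""
    c = EFFECT_CODES["cognitive"][cognitive]
    p = EFFECT_CODES["perceptual"][perceptual]
    m = EFFECT_CODES["modality"][modality]
    return {
        "cognitive": c,
        "perceptual": p,
        "modality": m,
        "perceptual_x_modality": p * m,
        "cognitive_x_modality": c * m,
    }


def z_transform(values):
    """Center on the mean and scale by the sample standard deviation."""
    center, spread = mean(values), stdev(values)
    return [(value - center) / spread for value in values]


print(design_row("low", "high", "visual"))
print(z_transform([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
```

With −1/+1 codes in a balanced design, each main effect is estimated as an average over the levels of the other factors, so the main effects stay interpretable when the product terms are added.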

Non-Frequentist Analysis
A follow-up Bayes factor analysis was conducted in order to further ascertain the evidence in favor of the interaction we observed between perceptual load and modality in this experiment (Rouder, Morey, Verhagen, Swagman, & Wagenmakers, 2017) and against the presence of an interaction between cognitive load and modality. Using the model described above, we obtain a BF₁₀ of 2.06 × 10⁴ (± 5.30%) compared to a model containing the main effects and random effects only. Regarding an interaction between cognitive load and modality, an observation of no significant difference between experimental conditions is not necessarily evidence that there is no difference between the two conditions (Weber & Popova, 2012). As such, we subjected the model containing an interaction between cognitive load and modality to a Bayesian analysis as well. When comparing the model containing the cognitive load × modality interaction to a model containing only main effects and random effects, we obtain a BF₁₀ of .12 to 1 (± 5.44%), indicating that the interaction model is about eight times less likely given the data than is the main effects model alone.
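
The verbal interpretation of the two Bayes factors follows from simple arithmetic, sketched below in Python. The helper name `describe_bayes_factor` and its return format are hypothetical. The only inputs are the two BF₁₀ values reported above, and the sketch assumes that BF₁₀ places the interaction model in the numerator and the main effects model in the denominator.

```python
def describe_bayes_factor(bf10):
    """Express BF10 as odds in favor of whichever model the data support.

    Assumes BF10 = evidence for the interaction model divided by evidence
    for the main-effects-only model.
    """
    if bf10 <= 0:
        raise ValueError("A Bayes factor must be positive.")
    if bf10 >= 1:
        return "interaction model", bf10
    # Values below 1 favor the denominator model; invert to get its odds.
    return "main effects model", 1.0 / bf10


favored, odds = describe_bayes_factor(2.06e4)
print(favored, round(odds))       # interaction model 20600

favored, odds = describe_bayes_factor(0.12)
print(favored, round(odds, 2))    # main effects model 8.33
```

Inverting a Bayes factor below 1 is what turns .12 to 1 into roughly eight to one in favor of the main effects model.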

Discussion
In this experiment, we manipulated visual perceptual load while measuring STRTs (an indicator of resource availability) in both the visual and auditory channels. Our data suggest that perceptual load influences resource availability only within the modality in which it was introduced whereas cognitive load influences resource availability irrespective of modality. Framing these findings in the language of Holbert and Park (2019), perceptual load's influence on resource availability is contingent upon modality whereas cognitive load's influence is not. When resource availability was measured in the visual channel, increased visual perceptual load lengthened reaction times, but when resource availability was measured in the auditory channel, increased perceptual load did not lead to a significant increase in reaction times. Bayesian analyses revealed that the relative likelihood of the perceptual load × modality model compared to a main effects model is about 20,000 to one (given the data observed herein). This provides strong evidence for the modality-specificity of perceptual resources during multimedia processing. Our data also reveal substantial main effects of both cognitive load and modality on resource availability, as well as a smaller main effect of perceptual load. These results replicate a spate of findings in cognitive and media psychology showing that cognitive load robustly influences resource availability during media processing (Fisher et al., 2019; Huskey et al., 2018; Lang et al., 2006). It is worth noting that the main effect of modality observed in this study is in contrast with the findings of Basil (1994a), who showed that reaction times to auditory probes were slower overall than reactions to visual probes (although see Thorson et al., 1985).
We hesitate to interpret this finding as supporting the idea that auditory responses are faster in general than are visual responses, as it is possible that the main effect of modality simply reflects the baseline visual perceptual load in the task or the active nature of our task (playing a video game versus passively viewing a message). Future work should investigate how the main effect of modality varies as a function of baseline perceptual load or the interactivity of the media stimulus.
These findings support the predictions of the revised version of the LC4MP put forth by  and extend beyond the predictions of the original LC4MP. In the original LC4MP (Lang, 2000, 2006), resource capacity limitations were treated as purely conceptual, having no particular basis in the structure or function of the brain. In this model, it was simply proposed that increased complexity within a media environment should be associated with reduced resource availability (provided that resource allocation is held constant). This conceptualization, although allowing for remarkable advancements in our understanding of message processing, did not provide a framework for more specific predictions regarding how resource availability may vary based on modality or process type. In the updated model , capacity limitations on information processing are given a biological basis. It is proposed that these capacity limitations are related to spatial, chemical, or temporal constraints on neural firing present within large-scale semi-specialized brain networks. As such, the architecture of these networks places bounds on the sorts of processes that can happen concurrently. This updated assumption leads to the prediction that increased complexity within a message will influence resource availability in different ways depending on whether the complexity is perceptual or cognitive, and upon the modality in which the information is introduced. Increased perceptual load in a media environment should reduce resource availability primarily within the modality in which the load was introduced (visual or auditory), having minimal influence on resource availability in another modality. In contrast, cognitive load should influence resource availability in visual and auditory channels in a roughly equivalent fashion.
In addition, these results suggest a general note of caution for researchers who use STRTs as an indicator of resource availability during media processing, especially when a message is high in perceptual load. If perceptual load is high in a given modality, but STRTs are measured in another modality, it is likely that the effects of load on resource availability will not be captured by the STRT task. Likewise, these data suggest that slow STRTs observed in one modality alone are not necessarily an indicator that cognitive load in the media stimulus is high, given that perceptual load can reduce resource availability within one modality even in the absence of increased cognitive load. With this in mind, future research using rich audiovisual stimuli should ensure that the STRT probe is either: a) within the primary modality in which load is introduced; b) alternatingly presented in each modality and treated as a combined index across modalities; or c) if modality-specific questions are not of interest, presented in both modalities (e.g., a flash accompanied by a tone).
In directly manipulating perceptual and cognitive load in line with validated procedures developed in neuroscience research, this work pushes beyond what has previously been tested within this context, allowing for more precise investigation of the roles of modality, cognitive load, and perceptual load in media processing. This does not mean that this approach is without limitations that may circumscribe the generalizability of the observed findings. One primary limitation of the approach outlined herein is that Asteroid Impact is predominantly visual in nature (i.e., auditory cues are mostly irrelevant for successful gameplay). As such, effective manipulation of auditory perceptual load in the game environment is not currently feasible. It is possible that an induction of auditory perceptual load may have different modality-specific effects than did the visual perceptual load manipulation employed here. Future work should investigate the modality-specific effects of load for tasks that are primarily auditory, such as listening to narratives or radio shows, to ascertain how an induction of auditory perceptual load influences resource availability and processing performance within and between modalities.
Second, this experiment only contained two levels of cognitive and perceptual load (low versus high), limiting the utility of this work for understanding the parametric relationship between cognitive/perceptual load and resource availability. Extant research suggests that the parametric relationship between cognitive load and attentional resource allocation is curvilinear rather than linear (Weber, Alicea, Huskey, & Mathiak, 2018). Future work should manipulate cognitive/perceptual load along a continuum to paint a fuller picture of how load influences processing in naturalistic tasks.
Finally, although a "modding" approach (Elson & Quandt, 2016) allows researchers to circumvent many of the shortcomings inherent in single message designs, it is still the case that Asteroid Impact is only one video game, and that it is rather rudimentary compared to "state of the art" video games widely available today. As such, Asteroid Impact, just like any other media stimulus, contains myriad uncontrollable idiosyncrasies that may limit the generalizability of the effects observed here. Future work should seek to manipulate cognitive and perceptual load in novel ways and in novel contexts (such as a different genre of game or a non-interactive form of media) in order to ascertain the robustness of the model proposed herein.

Modality and Interactive Media
The role of modality in multimedia processing and other naturalistic tasks, although a long-neglected question, is perhaps of greater importance now than it has ever been. Three recent developments in the multimedia landscape have highlighted the importance of understanding when and why information loads modality-independent cognitive resources and when it loads modality-specific ones.
First, video games and other interactive media are becoming increasingly multisensory, employing rich cross- and inter-modal stimuli in order to build more engaging and immersive worlds. This is especially true in emerging virtual reality and augmented reality systems. We know that enjoyment and performance in video games are contingent on the demands that the game places on the human processing system (Sherry, 2004). These demands can be cognitive, emotional, physical, or social (Bowman et al., 2018). Despite the clear diversity in the sorts of demands that video games and other interactive media can place on individuals, the bulk of research in this domain has considered these demands in a non-specific sense, using terms such as "cognitive load" as a catch-all for the myriad ways in which these media may generate demand (Bowman et al., 2018). The data presented in this article suggest that perceptual demands may have their own role to play in influencing experiences with interactive media. Indeed, perceptual persuasiveness (the extent to which the sensory experience of a video game is rich and immersive) has been shown to be a robust predictor of game enjoyment (Weber, Behr, & DeMartino, 2014). Video games that more effectively manage modality-specific and modality-general resources are likely to be more immersive, and therefore enjoyable. Further research in this area is critical for the development of more effective games aimed at cognitive rehabilitation and optimization, as it has been shown that perceptually immersive and enjoyable games more effectively elicit neuroplastic changes in key neural substrates (Bavelier, Levi, Li, Dan, & Hensch, 2010; Kamke et al., 2012) and are more likely to lead to treatment compliance (Kofler et al., 2018).
Second, driven by technological advancements and changes in media use habits, individuals increasingly multitask within and between perceptual modalities using digital technology (Rideout, Foehr, & Roberts, 2010). Individuals multitask upwards of 92% of the time when using certain forms of media (Deloitte, 2015), and switch between streams up to 2.5 times per minute (Brasel & Gips, 2017). If cognitive load and perceptual load differentially influence the salience of stimulus cues that are external to the task at hand, then individuals' within- and between-device multitasking behaviors would likely be predicted by the relative cognitive and perceptual load within each concurrently attended information stream. In fact, a large body of literature from the cognitive neuroscience of perception shows that the salience of peripheral sensory cues is contingent upon the perceptual load present in the primary task (Lavie, 2010; Lavie et al., 2004). With this in mind, an understanding of the modality-dependence of perceptual load may lead to increased predictive accuracy regarding media multitasking behavior, and a better understanding of when and why individuals choose to switch within and between mediated tasks.
Finally, multisensory digital interfaces are increasingly being incorporated into complex activities like driving a car (Spence & Ho, 2008), controlling robots (Martinez-Hernandez, Boorman, & Prescott, 2017), conducting surgery (Chen et al., 2015), and many other domains. These tasks are often time-critical, requiring quick and accurate reactions to multiple stimuli in quick succession. Overload or inappropriate attentional patterns in one or multiple modalities during these tasks is likely to increase risk of injury or fatality. As such, optimal presentation of multisensory cues in these tasks in view of modality-specific resource limitations is necessary for ensuring safety and efficiency of these interfaces.

Closing Remarks
In this article, we have provided a re-introduction and refinement of the concept of perceptual load into message processes and effects research. Drawing on extant work in the neuroscience of sensation and perception, we further explicated and showed support for the clarifications and extensions to the model outlined in . Results from the experiment reported herein provide clear support for these predictions and suggest that progress toward resolving current ambiguities and inconsistencies regarding message complexity and modality can be found when considering the independent contributions of perceptual and cognitive load.
Results from this experiment also suggest that recently-developed frameworks conceptualizing the demand landscape within video games (see e.g., Bowman et al., 2018) would perhaps benefit from a specific consideration of perceptual load as separable from cognitive load, at least whenever the modality in which information is provided is an area of interest. It is likely that a more granular consideration of which components of a game are merely perceptual (orienting, filtering, dedistortion, etc.) and which components involve higher-order cognitive processes like working memory and cognitive control will increase the utility of these frameworks for understanding the various demands that video game play places on the human processing system.
In summary, this study highlights the importance of perceptual load for understanding how information presented in the visual or auditory modality may lead to modality-specific or modality-general effects on resource availability. In a rapidly changing multimedia environment, characterized by increasingly multimodal stimuli, it has become even more critical that media psychology researchers develop an understanding of how these processes work in order to contribute to the design of digital messages and tools that are immersive and engaging, but also that reduce unnecessary load on the cognitive and perceptual systems, facilitating safer and more effective media and media use behaviors.
Frederic René Hopp is an MA/PhD student in the Department of Communication at UCSB. Broadly speaking, his research explores media processes and effects from cognitive and neuroscientific perspectives. Specifically, his current research focuses on neural responses to morally-laden media content to predict real-world outcomes, such as media preferences or political judgment and decision making. Before attending UCSB, Frederic held several research assistant positions at the University of Mannheim, where he investigated the effects of cyberostracism in social media environments and the role of entertainment experiences for the processing of political talk shows. Frederic holds a BA in Media and Communication Studies with a minor in Political Science from the University of Mannheim.
René Weber received his PhD (Dr.rer.nat.) in Psychology from the University of Technology in Berlin, Germany, and his M.D. (Dr.rer.medic.) in Psychiatry and Cognitive Neuroscience from the RWTH University in Aachen, Germany. He is a Professor in the Department of Communication at the University of California in Santa Barbara and director of UCSB's Media Neuroscience Lab (https://medianeuroscience.org). He was the first media psychology scholar to regularly use neuroimaging technology to investigate various media effects, from the impact of violence in video games to flow experiences, attention disorders, and the effectiveness of anti-drug PSAs. He has published four books and more than 120 journal articles and book chapters. His research has been supported by grants from national scientific foundations in the United States and Germany, as well as through private philanthropies and industry contracts. He is a Fellow of the International Communication Association.