What Is Important When We Evaluate Movies? Insights from Computational Analysis of Online Reviews

The question of what is important when we evaluate movies is crucial for understanding how lay audiences experience and evaluate entertainment products such as films. In line with this, subjective movie evaluation criteria (SMEC) have been conceptualized as mental representations of important attitudes toward specific film features. Based on exploratory and confirmatory factor analyses of self-report data from online surveys, previous research has found and validated eight dimensions. Given the large-scale evaluative information that is available in online users’ comments in movie databases, it seems likely that what online users write about movies may enrich our knowledge about SMEC. As a first fully exploratory attempt, drawing on an open-source dataset including movie reviews from IMDb, we estimated a correlated topic model to explore the underlying topics of those reviews. In 35,136 online movie reviews, the most prevalent topics tapped into three major categories—Hedonism, Actors’ Performance, and Narrative—and indicated what reviewers mostly wrote about. Although a qualitative analysis of the reviews revealed that users mention certain SMEC, results of the topic model covered only two SMEC: Story Innovation and Light-heartedness. Implications for SMEC and entertainment research are discussed.


Introduction
When Louis Leon Thurstone (1930) developed A Scale for Measuring Attitude Toward the Movies in the context of the Payne Fund Studies (1929–1932), it was one of the first attempts to measure interindividually different attitudes in movie effects research. Back then, social scientists saw movies as a social problem, in particular for child and youth development (cf. Wartella & Reeves, 1985). Nowadays, the entertainment and film industries are booming (Hennig-Thurau & Houston, 2019). More than ever before, communication scholars dedicate themselves to learning about how watching movies influences entertainment experiences, what (positive) consequences follow, or how predispositions towards entertainment media shape movie selection and use (Raney & Bryant, 2020). Here, subjective movie evaluation criteria (SMEC) play a crucial role in evaluating movies before, during, and after exposure (Schneider, Welzenbach-Vogel, Gleich, & Bartsch, in press) and can help to predict specific individual movie evaluations (Schneider, 2012a). SMEC are conceptualized as "mental representations of important attitudes towards specific film features" (Schneider, 2017, p. 71). To measure SMEC and address the question of what is important when viewers evaluate a movie, the SMEC scales have been developed and validated (Schneider, 2012a, 2017). This, however, has been largely based on factor-analytical examination of self-report data. As subjective criteria may best predict subjective choices, processing, and effects, such a methodological approach makes good sense. Nevertheless, support for the construct validity of SMEC could be strengthened if distinct methodological approaches arrive at similar conclusions from different angles. Moreover, it might be interesting to learn from viewers' written evaluative responses to movies in a more natural environment, outside a scientific setting (e.g., an online survey).
Such an approach may be more unobtrusive and less prone to issues with mental accessibility or social desirability. The tremendous opportunities that movie users have to express themselves online coupled with today's computing power provide opportunities for the computational analyses of online users' movie reviews which could help further examine the construct validity of SMEC and explore the movie write-ups of lay audiences. Thus, in the present article, we follow an exploratory approach as we are interested in what online movie reviewers write about, and if such online reviews provide insight into underlying SMEC, which they might have applied to evaluate a movie.

A Brief Overview of Subjective Movie Evaluation Criteria
Whereas Thurstone (1930, see beginning of Section 1) was interested in attitudes towards movies in general, SMEC aim at examining the standards lay audiences use to assess movie features. Although several criteria had been suggested, most of them were either not validated or applied only to TV shows or specific target groups (Schneider, 2017). To address these shortcomings, previous research developed and validated scales for measuring and examining the structure of SMEC (for details see Schneider, 2012a, 2017). The procedure comprised open-ended data from an online survey, studies including a modified structure formation technique, focus groups, and quantitative content analysis of criteria categorization, pretesting and revising of the item pool, exploratory and confirmatory factor analyses, and latent state-trait analyses. Results provided evidence for the content, structural, and substantive validity as well as for the reliability of the SMEC scales. Moreover, the nomological network of SMEC was explored (i.e., external validity was assessed by examining correlations with related constructs such as film genre preferences and personality traits).
The eight dimensions that emerged during this process and have been validated are as follows: Story Verisimilitude (SV), which reflects correspondence to (contemporary) reality (e.g., Gunter, 1997; Valkenburg & Janssen, 1999); Story Innovation (SI), which reflects the originality of the story (e.g., Greenberg & Busselle, 1996); Cinematography (CI), which reflects cinematic techniques (e.g., Gunter, 1997); Special Effects (FX), which also reflects cinematic techniques but focuses more on the technical aspects (e.g., Neelamegham & Jain, 1999; Rössler, 1997); Recommendation (RE), which reflects external resources for film evaluation (e.g., Neelamegham & Jain, 1999); Innocuousness (IN), which reflects a lack of potentially unpleasant characteristics (e.g., Nikken & van der Voort, 1997; Valkenburg & Janssen, 1999); Lightheartedness (LH), which reflects amusement and escapism (e.g., Greenberg & Busselle, 1996; Valkenburg & Janssen, 1999); and Cognitive Stimulation (CS), which reflects the viewer's cognitive processes such as cogitation or learning (e.g., Himmelweit, Swift, & Jaeger, 1980; Nikken & van der Voort, 1997). Whereas the first four dimensions (SV, SI, CI, FX) summarize film-inherent elements, RE refers to film-external features, and the final three dimensions concern anticipated effects of use (IN, LH, CS).
In addition, some studies investigated the predictive power of LH for the evaluation of a comedy show (My Name is Earl; Burtzlaff, Schneider, & Bacherle, in press, Study 1), as well as the predictive power of IN, LH, and CS for the appreciation and enjoyment of movies in general as well as for specific genres (Schneider, 2012b). This makes the notion of SMEC particularly interesting for broader entertainment research. For instance, traditionally, film-specific evaluations have been mainly examined in light of entertainment experiences like enjoyment and pleasure (e.g., Vorderer, Klimmt, & Ritterfeld, 2004). They correspond to an evaluation criterion such as LH: When movie viewers enjoy a movie, their overall judgment of it can be more positive if they rate LH as highly important. However, recent advances go beyond hedonic consumption and advocate a more nuanced view of audience responses, reflecting a sense of meaning and growth, self-transcendence, or aesthetic and artistic quality (e.g., Oliver & Bartsch, 2010; Oliver & Raney, 2011; Oliver et al., 2018; Vorderer & Reinecke, 2015; Wirth, Hofer, & Schramm, 2012). Similarly, this gain in complexity is also reflected in various evaluation criteria that supplement criteria referring to effects of use (e.g., CS) with criteria that focus more on the features of the movie (e.g., CI). SMEC might be better and more fine-grained predictors of film-specific evaluations than genre preferences and also emphasize content-related aspects (compared, e.g., to a user-centred approach; e.g., Swanson, 1987; Wolling, 2009). Thus, in sum, preliminary findings underscore the usefulness of SMEC for current entertainment research on movies and may help to understand the role of stable criteria in explaining audience responses before, during, and after movie exposure.

Previous Research on Online Movie Reviews
Although online movie reviews have been extensively researched in the last decades, this has been done almost exclusively in the domain of marketing studies (e.g., when investigating effects of online word of mouth on box-office success; Chintagunta, Gopinath, & Venkataraman, 2010; Eliashberg, Jonker, Sawhney, & Wierenga, 2000). More recently, online movie databases such as the Internet Movie Database (IMDb), Yahoo! Movies, or Douban have been among the platforms subjected to computational analyses (e.g., Bader, Mokryn, & Lanir, 2017; Simmons, Mukhopadhyay, Conlon, & Yang, 2011; Yang, Yecies, & Zhong, 2020; Zhuang, Jing, & Zhu, 2006). In the following paragraphs, we give a brief overview of large-scale studies dealing with online movie reviews. For a more comprehensive summary, see Table S1 in the Supplementary Material. Please note that Table S1 and some parts of the remainder of this article include technical language of computational science. We refer the interested reader to comprehensible and introductory texts for communication scientists such as Günther and Quandt (2016) or Maier et al. (2018).

Predicting Box-Office Success
A wide range of management/economics studies have attempted to predict a movie's box-office success through statistical models, often including samples of online movie reviews (e.g., Hu, Shiau, Shih, & Chen, 2018; Hur, Kang, & Cho, 2016; Lee, Jung, & Park, 2017). Most of these models incorporated the online movie review's sentiment as well as other factors, some of which are also specific to reviews, such as the writing style, use of certain words (bag-of-words approach), or the length of the review (Yu, Liu, Huang, & An, 2012). Other characteristics such as a review's rated helpfulness (Lee & Choeh, 2018), the movie's numeric 'star' rating (Hur et al., 2016), or the movie's genre (Lee & Choeh, 2018) were also often used but did not refer directly to the content of the online movie review.

Predicting Sentiments
The second line of research, focusing on methodological aspects of computational methods, attempted to establish, complement, or modify algorithms for the mining of online movie reviews, especially for sentiment analysis (e.g., Liu, Yu, An, & Huang, 2013; Parkhe & Biswas, 2016; Yang et al., 2020). As online movie reviews are relatively easy to scrape (e.g., from IMDb) and by nature mostly positive or negative (and rarely neutral), they provide good examples to develop and test classification algorithms. Most of these studies are situated within the fields of computational linguistics and computer science.

Other Computational Approaches
A few studies do not fit either of the previous categories (e.g., Bader et al., 2017; Otterbacher, 2013; Simmons et al., 2011; Yang et al., 2020). See Table S1 in the Supplementary Material for more details.
Three studies are particularly interesting concerning the aim of the present article. Drawing on emotion theory, Bader et al. (2017) created emotional signatures of movies and their genres based on emotions toward or elicited by a film that were extracted from its online reviews on IMDb. Their results imply that emotional evaluations also manifest themselves in online reviews and can help to cluster entertainment-related concepts such as movie genres. Moreover, as emotional and affective states are also related to SMEC (e.g., LH and IN), the appearance of words representing emotions in online movie reviews may help detect those criteria. In other words, it seems likely that SMEC that rely on affective evaluations are reflected in movie reviews. An 'emotional' approach to movies is also common in entertainment research (e.g., Bartsch, 2012; Soto-Sanfiel & Vorderer, 2011). Whereas these findings concern criteria as anticipated effects of viewing, two other studies focused more on film-inherent features as criteria. For instance, in an often-cited article, Zhuang et al. (2006) mined feature-opinion pairs within online movie reviews, based on a movie feature-opinion list. The feature part of this list contained names of movie-related individuals such as directors or leading actors and feature words of six movie elements, which were not derived from theory but were somewhat related to evaluation criteria, for instance, visual effects (partially refers to CI), FX, and screenplay (refers to SI). Thus, these findings support the idea that it is not only users' personal experiences (e.g., emotions) which play a role in movie reviews, but also movie-related features that are deemed necessary to achieve artistic and aesthetic quality. Lastly, using computer-aided content analysis, Simmons et al. (2011) found that storyline, among four other movie elements, was strongly related to the overall film grade.
However, a deeper analysis revealed that what they called 'storyline' included statements about CI, action, humour, and entertainment, and thus represented a rather fuzzy concept. Disentangling what lies behind the storyline may hint at effects of use and film-inherent features as they are reflected by SMEC.
In sum, regarding entertainment theory and SMEC, all previous perspectives on online movie reviews show several weaknesses: First, although some studies refer to features or criteria, those criteria are not based on theoretical assumptions. Moreover, research has so far heavily relied on lexical databases or dictionaries, which implies that criteria are directly observable within the online reviews. Given the theoretical assumption that SMEC are latent constructs (Schneider, 2012a, 2017), methodological approaches that take this assumption into account could be more appropriate (e.g., Amplayo & Song, 2017, combined a multi-level sentiment classification with bi-term topic modeling).
To the best of our knowledge, on the one hand, no studies that computationally analyzed online movie reviews have yet done so against the background of concepts related to entertainment theory. On the other hand, particularly within the field of entertainment theory, communication researchers have rarely used online review platforms to address research problems, even though the opportunities to assess digital traces of audience reactions seem easily available on a large scale and allow conclusions to be drawn about personal characteristics from online behaviour such as liking (Kosinski, Stillwell, & Graepel, 2013). For instance, if entertainment experiences are conceptualized as media effects (for recent overviews, see Raney & Bryant, 2020; Raney, Oliver, & Bartsch, 2020), responses to movie exposure and evaluative judgments of movies (both of which can be expressed as written online reviews) may indicate underlying evaluative factors (Schneider et al., in press).

The Present Research
Unlike previous studies that focused on predicting box-office success or sentiments from online movie reviews, we are interested in what online movie reviewers write about and if those online reviews provide insight into underlying SMEC, which the writers might have applied to evaluate a movie. In doing so, we try to figure out whether text mining of online movie reviews' content can support findings from self-report data and how analyses from such methodologically different approaches can contribute to construct validity. To date, we are not aware of any similar attempt.
As the SMEC development has been a data-driven, inductive process, we decided against a confirmatory approach in favour of an exploratory, inductive, and unsupervised approach (i.e., topic modeling).

Method
Our sample is based on an open-source dataset including movie reviews from IMDb and their positive or negative sentiment classification (Maas et al., 2011). The dataset consists of 25,000 positive and 25,000 negative movie reviews. Additionally, 50,000 unlabeled reviews are provided. Only up to 30 reviews per movie are included in order to avoid a high number of correlated reviews. After downloading the data, we decided to focus on the positive and negative reviews only because these sentiments might be a sign that users expressed their SMEC. All data management, cleaning, and analysis was performed using R 3.5.3 (R Core Team, 2020) and RStudio 1.2.5033 (RStudio Team, 2019).
Before analysis, we opted for extensive data preprocessing as recommended in the literature (Maier et al., 2018; Manning, Raghavan, & Schütze, 2008). First, we excluded all duplicate reviews from the dataset. Afterwards, we implemented common data preprocessing steps to delete text that provided no relevant information for automatic text analysis, such as cleaning of HTML tags and links and deleting numbers and whitespace via the textclean package in R (Rinker, 2018). To improve the quality of our dataset and to reduce the number of possible features, we deleted common stopwords via a combination of different stopword lists and implemented lemmatization (Manning et al., 2008) via the spacyr package (an R wrapper for the spaCy library; Benoit & Matsuo, 2020). Online movie reviews are of varying quality as users employ, for instance, internet slang as opposed to formal writing. To enhance the quality of the data, we automatically removed internet slang via the textclean package in R (Rinker, 2018) and, based on part-of-speech tagging via spacyr, we selected only verbs, nouns, adjectives, and adverbs for further analysis. We deleted the most common words, 'movie' and 'film', because they are very general in our context but occurred more than 60,000 times in the corpus and thus three times more often than any other word. We implemented term frequency-inverse document frequency (tf-idf) weighting in order to determine how relevant a word in a given document is, that is, how often a word occurs in a document in relation to how often the word occurs in other documents of the corpus (Manning et al., 2008). Subsequently, we removed words that had a low tf-idf score (tf-idf < 0.050) and, thus, were not important for our analysis.
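To make the tf-idf pruning step more concrete, the following minimal Python sketch scores each term and drops low-scoring ones. It is an illustration under stated assumptions, not the R pipeline used in this study: the exact tf-idf variant is not specified above, so the sketch assumes one common term-level variant (mean within-document term frequency multiplied by the base-2 logarithm of the inverse document frequency). The function names term_tfidf and prune_vocabulary and the toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def term_tfidf(docs):
    """Return one score per term (assumed variant, see lead-in):
    mean term frequency over the documents that contain the term,
    multiplied by log2(number of documents / document frequency)."""
    n_docs = len(docs)
    tf_sums = defaultdict(float)  # sum of within-document relative frequencies
    doc_freq = Counter()          # number of documents containing the term
    for tokens in docs:
        if not tokens:
            continue
        for term, count in Counter(tokens).items():
            tf_sums[term] += count / len(tokens)
            doc_freq[term] += 1
    return {
        term: (tf_sums[term] / doc_freq[term]) * math.log2(n_docs / doc_freq[term])
        for term in doc_freq
    }


def prune_vocabulary(docs, threshold=0.05):
    """Drop every term whose score falls below the threshold."""
    scores = term_tfidf(docs)
    keep = {term for term, score in scores.items() if score >= threshold}
    return [[t for t in tokens if t in keep] for tokens in docs]


# Toy corpus of already lemmatized, stopword-free reviews (hypothetical data).
docs = [
    ["plot", "twist", "great", "actor"],
    ["great", "comedy", "laugh", "great"],
    ["great", "plot", "boring", "actor"],
]
scores = term_tfidf(docs)
pruned = prune_vocabulary(docs, threshold=0.05)
print(pruned)
```

In this toy corpus, 'great' occurs in every document, so its inverse document frequency is zero and it is removed, which mirrors the rationale for discarding words that do not discriminate between reviews.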
For data analysis, we employed topic modeling, an unsupervised machine learning approach to infer latent topics from a large corpus of text (Maier et al., 2018). Given the characteristics of our sample and theoretical assumptions (i.e., topics are likely to be correlated), we estimated a Correlated Topic Model (CTM) based on the movie reviews (Blei & Lafferty, 2009) using the topicmodels package in R (Grün & Hornik, 2011). To select the number of topics, we estimated 21 topic models from k = 10 to k = 70 via the ldatuning 1.0.0 package (Murzintcev & Chaney, 2020) and selected the 38-topic model as the best-fitting model for our data. Then, we estimated a set of ten separate 38-topic CTMs with different initial parameters and selected from this set the best model regarding log-likelihood (Grün & Hornik, 2011) as our final model. For each online movie review, we selected the topic with the highest probability, provided that this probability (gamma) was at least 0.02. The best-fitting CTM included 38 topics for 41,434 online movie reviews. All scripts for data cleaning and analysis can be accessed via OSF (https://osf.io/pqnk6).
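The assignment of one topic per review can be sketched as follows. This is a minimal Python illustration, not the R code used in this study: it assumes that the fitted model yields, for each review, a vector of topic probabilities (gamma) that sums to one, and the function name dominant_topics, the review identifiers, and the toy probabilities are hypothetical.

```python
def dominant_topics(gamma, min_gamma=0.02):
    """Map each review to its most probable topic.

    gamma: dict of review id -> list of topic probabilities.
    Returns a dict of review id -> (topic index starting at 1, probability);
    reviews whose highest probability is below min_gamma are left out.
    """
    assignments = {}
    for review_id, probs in gamma.items():
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= min_gamma:
            assignments[review_id] = (best + 1, probs[best])
    return assignments


# Toy posterior for three reviews and three topics (hypothetical values).
gamma = {
    "r1": [0.70, 0.20, 0.10],
    "r2": [0.40, 0.35, 0.25],
    "r3": [0.10, 0.30, 0.60],
}
default_assignment = dominant_topics(gamma)               # threshold 0.02
strict_assignment = dominant_topics(gamma, min_gamma=0.5)  # stricter toy threshold
print(strict_assignment)
```

The stricter threshold of 0.5 is used only so that the filter has a visible effect in a three-topic toy example; it is not a value taken from the analysis.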
To allow for succinct presentation whilst ensuring coverage of the most important topics in the dataset, we focus on the most frequent topics in our sample with at least 600 reviews per topic. For all topics discovered in the dataset, please see the topic distribution and the top words for all topics in OSF. Based on a qualitative assessment of the top words of each topic, we organized the remaining 14 topics (N = 35,136) into three broad categories (see Table 1). Furthermore, drawing on the material (i.e., evaluation terms and criteria) used during the development of the SMEC scales (Schneider, 2012a, see Appendices A and B; Schneider, 2017, see item wording), we closely inspected randomly selected reviews with the highest gammas (minimum γ = 0.02) and marked to which SMEC they referred (see Table 2).
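The restriction to the most frequent topics amounts to counting reviews per assigned topic and keeping topics that reach a minimum count. The following Python sketch illustrates this under the assumption that each review has already been assigned exactly one topic; the function name prevalent_topics and the toy assignment are hypothetical, and the sketch is not the analysis script itself.

```python
from collections import Counter


def prevalent_topics(topic_by_review, min_reviews=600):
    """Count reviews per topic and keep topics with at least min_reviews.

    Returns the retained topics with their counts and the total number
    of reviews that fall into retained topics.
    """
    counts = Counter(topic_by_review.values())
    kept = {topic: n for topic, n in counts.items() if n >= min_reviews}
    return kept, sum(kept.values())


# Toy assignment of six reviews to three topics (hypothetical values);
# a minimum of 2 reviews stands in for the cut-off of 600 used in the text.
topic_by_review = {"r1": 1, "r2": 1, "r3": 2, "r4": 3, "r5": 3, "r6": 3}
kept, n_reviews = prevalent_topics(topic_by_review, min_reviews=2)
print(kept, n_reviews)
```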

Correlated Topic Model
To answer the question of what online movie reviewers write about, we grouped the 14 topics into three categories for better interpretation (see Table 1). First, it is striking that most of the discovered topics concern funniness and comedy (labeled as 'Hedonism' [HE] category). Although the topics in this category have nuanced meanings, on a general level, all of them relate to the presence or absence of hedonic and pleasurable kinds of media consumption. This fits into traditional lines of research that assumed enjoyment to be at the heart of entertainment (for a recent overview, see Raney & Bryant, 2020). Moreover, the HE category also reflects audience reactions. Broadly speaking, this fits the subjective movie evaluation criterion LH well. A second set of topics is broadly related to the acting of the cast and summarized in the category 'Actors' Performance' (AP). Although aspects of how well actors play their characters are not included in the final version of the SMEC scales, items that tapped into this category were part of the construction process (see Table B1, Items 47-51, in Schneider, 2012a; e.g., Item 47 reflects the general performance of actors). The third category, 'Narrative' (NA), comprises topics concerning story and plot. It relates to the subjective movie evaluation criterion SI. Both AP and NA refer to what has often been argued to be the most important elements for movie choice or evaluation (e.g., Linton & Petrovich, 1988; Neelamegham & Jain, 1999). Taken together, online movie reviewers mostly write about whether or not they enjoyed a movie, about the actors' performance, and about the quality of the movie's narrative.

Additional Qualitative Exploratory Results
Our initial focus lay on the topic model. While interpreting, labeling, and summarizing the topics, it became clear that some SMEC may not have emerged as topics because they were not prevalent. Nonetheless, descriptions related to these SMEC were not totally absent from the data. Based on material from previous research (e.g., criteria that participants named in open-question tasks, content of items in the initial item pool and in the final SMEC scales, and content of cards during modified structure formation technique; Schneider, 2012a, 2017), meaningful words and phrases were qualitatively checked, interpreted, and marked using superscripts. To illustrate this, we describe two examples in Table 2. They provide deeper insight into how SMEC are applied when writing online movie reviews. For instance, the second example refers to SMEC such as SV, RE, or CI as well. These examples are particularly interesting with regard to SMEC because many of the criteria that have been previously described by Schneider (2012a, 2017) can be discovered in these reviews.
Table 2. Two examples of randomly selected reviews with γ ≥ 0.02 for each topic (k).

Review 1
Yesterday I finally satisfied my curiosity and saw this movie. My knowledge of the plot was limited to about 60 seconds of the trailer, but I had heard some good critics[5] which caused my expectations to increase. As I saw the movie, those untied pieces had been combined in a story that was becoming quite intriguing, with some apparently inexplicable details[2]. But in the end, everything is disclosed as a simple succession of events of bad luck, "sorte nula" in Portuguese. Above everything, I felt that the story made sense, and everything fits in its place, properties of a good script[2]. I must also mention the soundtrack, which helps the creation of an amazing environment[9]. And if you think of the resources Fernando Fragata used to make this film, I believe it will make many Hollywood producers envious…[10]
Movie title: Sorte Nula (2004)
File path: aclImdb/test/pos/11479_8.txt
Topic k = 1; γ = 0.028

Review 2
On October of 1945, the American German descendant Leopold Kessler (Jean-Marc Barr) arrives in a post-war Frankfurt and his bitter Uncle Kessler (Ernst-Hugo Järegård) gets a job for him in the Zentropa train line as a sleeping car conductor. While travelling in the train learning his profession, he sees the destructed occupied Germany and meets Katharina Hartmann (Barbara Sukowa), the daughter of the former powerful entrepreneur of transport business and owner of Zentropa, Max Hartmann (Jørgen Reenberg). Leopold stays neutral between the allied forces and the Germans and becomes aware that there is a terrorist group called "Werewolves" killing the sympathizers of the allied and conducting subversive actions against the allied forces. He falls in love for Katharina, and sooner she discloses that she was a "Werewolf." When Max commits suicide, Leopold is also pressed by the "Werewolves" and need to take a position and a decision.
"Europa" is an impressive and anguishing Kafkanian story[2] of the great Danish director Lars von Trier. Using an expressionist style that recalls Fritz Lang and alternating a magnificent black & white cinematography with some coloured details[3], this movie discloses a difficult period of Germany and some of the problems this great nation had to face after being defeated in the war. Very impressive the action of the occupation forces destroying resources that could permit a faster reconstruction of a destroyed country[1], and the corruption with the Jew that should identify Max. Jean-Marc Barr has a stunning performance[11] in the role of a man that wants to stay neutral but is manipulated everywhere by everybody. The hypnotic narration of Max Von Sydow is another touch of class[11] in this awarded film[5]. My vote is nine.
Movie title: Europa (1991)
File path: aclImdb/train/pos/130_9.txt
Topic k = 1; γ = 0.028

Notes: k = index of topic; γ = the probability of a given review to be associated with the topic k (please note that we report here only the topic with the highest probability for the respective review); file path = path to the respective file in the IMDb dataset (Maas et al., 2011); bracketed superscript numbers indicate relation to SMEC, see interpretation below; the plot summary that opens the second review only summarizes content.
Interpretation of superscripts (Schneider, 2017, unless [...]):
[...]
9 refers to film-inherent features: 'soundtrack' was mentioned as a criterion during the SMEC development and part of the initial item pool (see Schneider, 2012a, Appendices A and B)
10 refers to film-peripheral features: 'production' was mentioned as a criterion during the SMEC development (see Schneider, 2012a, Appendix A)
11 refers to film-inherent features: 'performance of actor' was mentioned as a criterion during the SMEC development and part of the initial item pool (see Schneider, 2012a, Appendices A and B)

Discussion and Conclusion
We started this exploratory journey by asking what online movie reviewers write about and whether those online reviews provide insights into underlying SMEC. To address these questions, we applied correlated topic modeling to a large IMDb dataset. We found the 14 most prevalent topics in 35,136 online movie reviews, which tapped into three major categories (HE, AP, and NA) and indicated what reviewers mostly wrote about. A more detailed qualitative analysis of the reviews revealed that users do indeed mention certain SMEC, for example, SV, SI, CI, or RE. However, the focus of the online movie reviews as revealed by the topic model remains on the three overarching topic categories that only cover two SMEC: SI and LH.
Another finding is that top words in almost every topic represent affective reactions. This comes as no surprise because affective responses often represent the valence of a judgment and play an important role in movie evaluation (Schneider et al., in press). However, affective words in a written online movie review reflect not only evaluative judgments but also motivations of the writers. For instance, writing online reviews also fulfills an approval utility for the reviewers, enabling them to enhance themselves by signaling "a kind of connoisseurship or a level of social status that can become important to one's self-concept" (Hennig-Thurau, Gwinner, Walsh, & Gremler, 2004, p. 43). IMDb quantifies this approval, for example, through ranking reviews by their rated helpfulness or the prolificacy of the reviewer. In general, if reviews contained positive emotional content, readers considered them as more helpful (Ullah, Zeb, & Kim, 2015). Further motivations that can lead to affective elements in reviews are concern for other consumers (e.g., intending to warn them) or the venting of negative feelings (Hennig-Thurau et al., 2004).
Besides these contributions of the present research, there are some limitations. Most of them concern the IMDb reviews and the specific dataset we used (Maas et al., 2011). First, and perhaps most problematic for automatic text mining, online movie reviews on IMDb vary in many aspects that may have introduced noise to our approach. Most crucial is the fact that critiques of a movie and summaries of its content are inextricably interwoven (for a review that contains a large part of content summary, see e.g., the second movie review in Table 2). Second, the IMDb dataset that we used comprises movies with a wide range of quality. Whereas most participants in the SMEC studies had specific and typical movies in mind when answering the items, the database we drew on also largely included mediocre and rare exemplars. Reviewers may have applied different criteria to qualitatively diverse movies. Some preliminary evidence supports this possibility. For instance, individuals named different criteria depending on whether they had to think about good, bad, or typical exemplars of a dramatic movie (Vogel & Gleich, 2012, Study 2). Third, some of the reviews dealt with TV shows or documentaries (e.g., The 74th Annual Academy Awards or wrestling matches). These media types are not covered by SMEC. As this information was not available in the original dataset, it was not possible to exclude non-movie media types. To deepen our knowledge about this issue and get more details, we gathered meta-data of the respective items via the OMDb API (this newly created dataset may also be helpful for future research and is available via OSF: https://doi.org/10.17605/OSF.IO/KA5D8). We found that 92% of the reviews in Maas et al.'s (2011) dataset actually referred to movies, rendering this limitation marginal. Fourth, the dataset included up to 30 reviews per movie. Thus, some plots and their descriptions could be overrepresented in the sample. However, given this very large dataset including 50,000 reviews and over 13,000 movies, this should not lead to an imbalance.
Movie evaluation criteria frequently appeared in online movie reviews. The number of criteria mentioned easily exceeded the eight SMEC dimensions, as can be seen in the two examples in Table 2. Nevertheless, these mentions provide some support for content validity. Thus, another way to start developing items to measure SMEC could have been based on online movie reviews. The latent semantic variables, or topics, comprehensively summarized the content of the reviews and, using three broad categories, can be described as HE, AP, and NA. These categories resemble some of the SMEC (i.e., SI and LH), which partially supports the construct validity of these criteria but not of others (e.g., SV or CS).
Based on the conceptual framework of SMEC, we were interested in what users write about in online movie reviews and whether this could provide some insights into movie evaluation criteria from a different perspective than traditional self-report. However, after inspecting and interpreting the results of the topic models, we found that some criteria were more prevalent than others. This is perhaps also due to some slightly different goals of the research projects: Whereas the construction of the SMEC scales aimed to identify interindividual differences in what criteria viewers use when they evaluate movies, the present article examined what users write about in online movie reviews and what the most important topics are. Thus, reporting SMEC and applying them while writing about movies have a great deal of common ground but can, nevertheless, also lead to deviations. In short, we did not start with the idea that an unsupervised machine learning approach to movie reviews would result in exactly the same eight criteria that had previously been found in SMEC research based on self-reports. Nevertheless, we were hoping for some unsupportive or supportive insights into movie evaluation criteria.
Although it is hardly possible to explicitly state a priori hypotheses or expectations and test those against the results of a topic model, we think that our findings may spark interest in further assessing the usefulness of computational approaches to additionally explore previous research findings from a different angle or, if possible, to incorporate such procedures during scale development.
Future research could test several alternative computational methods to shed light on the specific SMEC that we could not find on the level of topics and broader categories and to further explore online movie reviews from different angles (for a concise overview, see Günther & Quandt, 2016). For instance, rule-based text extraction can help to refine an initial dataset by eliminating non-evaluative parts such as content summaries (e.g., Simmons et al., 2011). Building and validating a reliable movie criteria dictionary or using supervised machine learning to classify movie criteria based on manually labeled text could be another tool for computational SMEC research. The results of our study might be useful for planning such future analyses. However, this requires considerable effort and is probably not yet advisable because the SMEC construct itself is, as outlined in the introduction of this article, in need of further validation beyond the field of self-reports. To resolve this dilemma, future research endeavors that are more deductive or supervised may draw on specific wordings of the SMEC scale items or on the preliminary coding scheme that was developed during the qualitative phases of the SMEC construction (Schneider, 2012a). This information may then help to provide a gold standard for coders.
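To illustrate what a dictionary-based approach of the kind mentioned above might look like in its simplest form, the following Python sketch counts how often words from a small keyword list occur in a review. It is only a minimal sketch: the category labels, the keywords, and the function name `count_criteria` are hypothetical placeholders chosen for illustration and are not validated SMEC items or part of our analysis.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary: labels and keywords are illustrative
# placeholders, NOT validated SMEC items or scale wordings.
CRITERIA_DICTIONARY = {
    "story": {"plot", "story", "twist", "ending"},
    "acting": {"acting", "actor", "actress", "performance", "cast"},
    "humor": {"funny", "hilarious", "laugh", "comedy"},
}


def count_criteria(review: str) -> Counter:
    """Count how many tokens of a review match each dictionary category."""
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", review.lower())
    counts = Counter()
    for token in tokens:
        for criterion, keywords in CRITERIA_DICTIONARY.items():
            if token in keywords:
                counts[criterion] += 1
    return counts


# Made-up example sentence, not taken from the dataset.
example = ("The plot had a clever twist, but the acting was flat "
           "and nothing made me laugh.")
print(count_criteria(example))
```

A dictionary used in actual research would need stemming or lemmatization, handling of negation, and validation against manually coded reviews, which is where the gold standard for coders discussed above would come in.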
Beyond the choice between unsupervised and supervised approaches, the predictive value of the applied models could gain more attention in the future. Although frequently examined outcome variables such as box-office success tend to be the focus of media economists rather than of communication process or effects research, the question of how well detected topics can predict the evaluation of a movie on quantitative measures (e.g., star rating), follow-up communication (e.g., sharing or recommending a movie), or consumer choice (e.g., selecting the next movie) should matter to entertainment scholars. Moreover, predictive validity can be used to compare different models and approaches and to improve them (e.g., Amplayo & Song, 2017). Our newly created dataset provides the opportunity to engage in some of these analyses (e.g., using topics to predict box-office success, different types of ratings, or genre classification) that were beyond the scope of this article.
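One simple way to think about such a predictive check is to regress a rating on the proportion of a single topic in each review and to evaluate the fit on held-out reviews. The Python sketch below illustrates this idea under strong simplifying assumptions: it uses one predictor, ordinary least squares, and a handful of made-up values. The function names `fit_simple_ols` and `held_out_r2` are hypothetical, and none of the numbers come from our data.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def held_out_r2(x_train, y_train, x_test, y_test):
    """Fit on the training reviews and return R^2 on the held-out reviews."""
    intercept, slope = fit_simple_ols(x_train, y_train)
    predictions = [intercept + slope * xi for xi in x_test]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y_test, predictions))
    ss_tot = sum((yi - mean(y_test)) ** 2 for yi in y_test)
    return 1 - ss_res / ss_tot


# Made-up illustration values (NOT results from the study):
# x = proportion of one topic per review, y = star rating of that review.
topic_train = [0.05, 0.10, 0.20, 0.35, 0.50]
stars_train = [3, 4, 5, 7, 8]
topic_test = [0.15, 0.40, 0.60]
stars_test = [4, 7, 9]

print(held_out_r2(topic_train, stars_train, topic_test, stars_test))
```

Comparing this held-out fit across competing topic models, or across unsupervised and supervised feature sets, would be one way to use predictive validity as a criterion for model comparison. A realistic analysis would include all topic proportions as predictors and use cross-validation.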
And what about entertainment research in general? Movies as entertainment fare have a long research tradition (e.g., Günther & Domahidi, 2017). Nowadays, it seems that economists, film studios, and online streaming providers have, behind closed doors, done much more applied work about movies than entertainment scholars have. This also becomes obvious when we take a look at the relevant marketing literature. For instance, Hennig-Thurau and Houston (2019) recently published an approximately 900-page book called Entertainment Science, in which they summarize the field from an economist's perspective while only marginally touching on recent advances in entertainment theory made by communication scholars and media psychologists (as summarized, e.g., in Vorderer & Klimmt, in press). On a macro level, a data-scientific and computational approach may bring these different disciplines closer together and help them recognize each other's achievements more thoroughly. It may not only be scholarly work (e.g., Taneja, 2016) that benefits but also entertainment industries that could learn from media and communication studies. If they interface with each other better, analyzing Big Data against a social-scientific background may help to improve recommender systems and user experiences within online review platforms, video streaming portals, or mixed-media channels. Although there are some notable but rare exceptions (e.g., Oliver, Ash, Woolley, Shade, & Kim, 2014), most entertainment researchers have not taken full advantage of the digital traces or responses that are publicly available online. Utilizing these data and applying computational methods to address open questions or supplement previous research could be a crucial factor for advancing both movie evaluation research and entertainment theory.