London2012: Towards Citizen-Contributed Urban Planning Through Sentiment Analysis of Twitter Data

The dynamic nature of cities, understood as complex systems with a variety of concurrent factors, poses significant challenges to urban analysis in support of planning processes. This particularly applies to large urban events, because their characteristics often contradict daily planning routines. Owing to the availability of large amounts of data, social media offer the possibility of fine-scale spatial and temporal analysis in this context, especially regarding public emotions related to various topics. Thus, this article proposes a combined approach for analyzing large sports events that considers event days vs comparison days (before or after the event) and different user groups (residents vs visitors), and that integrates sentiment analysis and topic extraction. Our analyses of tweets demonstrate that distinct spatial and temporal patterns can be identified, clearly distinguishing residents from visitors as well as positive from negative sentiment. Furthermore, we could assign tweets to specific urban events and extract topics related to the transportation infrastructure. Although the results can potentially support planning processes for large events, the approach still shows some limitations, including well-known biases in social media and shortcomings in identifying the user groups and in the topic modeling approach.


Introduction
Cities are complex systems (Castells, 1996; Hall, 1966; Theodore, 2006) consisting of two main elements: people, as residents or visitors, and the infrastructure that fulfills their needs, ranging from housing to recreation or even self-realization (Costanza et al., 2007; Maslow, 1943). Some of the infrastructure and related networks are static and mostly physical, such as buildings or the road and electricity networks, whereas others are more dynamic, like social, transportation, or financial networks. From an urban analysis viewpoint, the dynamic nature of these systems is challenging, especially in large cities with millions of people who are constantly on the move and have different needs and preferences. These challenges result not only from the sheer number of people, but also from the intense spatiotemporal variability that stems from urban dynamism and from the constantly changing subjective needs of each person. Therefore, effective planning practice requires analysis at fine spatial and temporal resolutions to understand this dynamism of urban life and processes.
Traditional methods, such as questionnaires or counting, either cannot handle such fine temporal and spatial scales at all or are highly resource-consuming and, therefore, slow, costly and not up to date. This is where the advantages of the data-driven era become relevant, most concretely the real-time availability of social media data. These data provide new contextual insights into spatiotemporal phenomena at a finer scale in cities through users' digital traces on online platforms such as Twitter, Foursquare/Swarm and Flickr (Abbasi, Rashidi, Maghrebi, & Waller, 2015; Aubrecht, Ungar, & Freire, 2011; Crooks et al., 2015; Girardin, Vaccari, Gerber, Biderman, & Ratti, 2009). This is of central importance to urban planning, which deals with optimizing the above-mentioned networks.
Planners are responsible for land use strategies, the design of public places and transportation planning, which constitute essential factors of urban life (McGill, 2017). An aspect of particular importance for urban planning is investigating the effects of a planned large event on residents and visitors. Such events have a special role in planning because they are usually temporary and require circumstances and conditions that differ completely from the daily routines of urban life. In contrast with unplanned events such as emergencies (e.g., natural, industrial and man-made disasters), planned events allow direct preparations, not just precautionary measures (Getz & Page, 2016). As a consequence, they are preceded by extensive planning and preparation efforts; nevertheless, such events frequently still face severe inconveniences or even disruptions, most strikingly in the transportation of people, presumably in different ways for residents and visitors. Therefore, distinguishing between these two groups is crucial in most analyses, given their different needs, behavioral patterns and exposure to the effects of a large planned event.
Analyses at fine temporal scales are therefore indispensable for examining citizens' mobility; they can help detect patterns and anomalies and thereby reveal underlying problems or phenomena in urban transportation. For instance, citizens' trajectories and the number of people moving through the city vary over the course of the day, but also between days, depending on the weather, the weekday, planned and sudden events, traffic density and many other factors (Sagl, Resch, Hawelka, & Beinat, 2012). Traditional commuting statistics are therefore not informative at such fine spatial and temporal scales, because they are mostly produced only once or twice per year and aggregated to spatial planning units. First, this aggregation yields commuting data that do not reflect actual travel directions, and second, the details of everyday individual trajectories are lost.
Social media, in contrast, provide digital spatiotemporal traces of individuals and thus valuable mobility information, particularly because they are a large, continuous data source with fine spatial and temporal resolution. Another advantage of these sources is the potential to extract direct feedback about topics, places or phenomena related to city life, including subjective aspects such as public mood or emotions (Frank, Mitchell, Dodds, & Danforth, 2013; Quercia, Ellis, Capra, & Crowcroft, 2012; Resch, Summa, Zeile, & Strube, 2016). This provides an opportunity to investigate what people actually think about parts of the city and about the direct or indirect effects of a large event. Accordingly, several methods and analyses have been developed in the areas of opinion mining (e.g., Pak & Paroubek, 2010) and semantic topic extraction (e.g., Steiger, Westerholt, Resch, & Zipf, 2015) for use in urban planning.
However, to the best of our knowledge, little research has analyzed social media data on planned large events while considering comparison days (before or after the event), different user groups (residents vs visitors) and the linkage between sentiment analysis and topic extraction in one study. In our work, we integrate all of these aspects to provide valuable knowledge about urban events, with a case study of the 2012 Olympic Games in London. By exploring emotions and events in a city through social media analysis, we aim at a better understanding of citizens' behaviors and needs in cities. Thereby, we can provide a basis to help planners identify more specific urban planning issues for further in-depth analysis. In line with these goals, we address the following research questions in this article:
RQ1 → How can we identify distinctive characteristics of tweeting behavior in terms of spatiotemporal patterns and sentiments between "residents" and "visitors"?
RQ2 → Are there detectable changes in the spatial and temporal patterns and in the sentiment of the tweets during the London Olympic Games compared to the days before and after them?
RQ3 → Which topics related to urban planning in the context of a large sports event can be identified through semantic analysis of social media posts?

Citizen-Contributed Geographic Information to Describe Urban (and Spatial) Practices
Among the practical applications of geographic data extracted from social media, we can distinguish two main categories: quantitative and qualitative approaches.
Quantitative approaches describe spatiotemporal phenomena by leveraging the advantage of fine spatial and temporal scales of the data, such as crowdsourcing urban form and function (Crooks et al., 2015), or characterizing and classifying urban areas and location types (Noulas, Scellato, Mascolo, & Pontil, 2011).
There are also a few applications in which researchers assess urban life from a qualitative point of view using different social media sources. Girardin et al. (2009) evaluated urban attractiveness by analyzing images from Flickr and mobile phone usage data, while Sun, Fan, Bakillah and Zipf (2015) used geo-tagged images for road-based travel recommendations. In another line of work, several researchers improved and refined methodologies for extracting emotions (Resch et al., 2016), transit rider satisfaction (Collins, Hasan, & Ukkusuri, 2013) and community happiness (Quercia et al., 2012) from Twitter data, also in combination with demographics and other objective characteristics of a place such as education or obesity (Mitchell, Frank, Harris, Dodds, & Danforth, 2013), or even defined sentiment as a function of movement (Frank et al., 2013). The advantages of using additional available datasets, such as demographics, mobile phone data or mobility trajectories, are twofold: they can help interpret the primary results extracted from social media, and they are also suitable for validation purposes.

Urban Planning, Social Media and Planned Large Events
Previous studies have shown that social media usage is generally more intensive during large events, and that a concentration of posts around the venue and an impact on transportation can also be identified (Gupta & Kumaraguru, 2012; Zhang, Ni, He, & Gao, 2016). The Olympic Games are considered one of the world's largest events, involving many organizational tasks from social, technical, environmental, economic, demographic and transportation-related perspectives (Chen, 2012; Cook & Ward, 2011; Malfas, Houlihan, & Theodoraki, 2004).
Large sports events like the FIFA World Cup can also be identified from the content of tweets, hashtags and the distribution of retweets. Kim et al. (2015) applied topic modeling before and during the event, while Corney, Martin and Göker (2014) identified phrases (word n-grams) whose frequency suddenly increased in the dataset and then selected co-occurring n-grams to identify topics. Using sentiment analysis, researchers identified relationships between the public mood and large socioeconomic events in the media (Bollen, Mao, & Pepe, 2011), as well as trends and possible predictions based on disposition theory (Yu & Wang, 2015), such as fanship in sports. Overall, these analyses identified changes in activity patterns (e.g., supporters induce a general increase in the number of tweets), in topic diversity, and in the spatial distribution of event-related topics.

Data
The study area for the present analysis is Greater London, which covers an area of 3,458 km². The Twitter data were obtained using the Twitter Streaming Application Programming Interface (Twitter INC, 2017) for the year 2012 and consist of the tweet content and attributes such as user name, user location and message time. We harvested only geolocated tweets, as our study requires geospatial and temporal analysis. To the best of our knowledge, this database does not contain retweets. It should be noted that, due to user practice and Twitter's policy, tweets containing coordinates generally represent only a small subset of all tweets posted in a given period, about 1-10% according to previous studies (Morstatter, Pfeffer, Liu, & Carley, 2013; Zhang et al., 2016). Moreover, they are not evenly spread in space or among user groups, as younger people tend to use social media more actively (Li, Goodchild, & Xu, 2013; Resch et al., 2017). These issues have been thoroughly discussed in the existing literature (Steiger et al., 2015; Sui & Goodchild, 2011) and are further detailed in the Discussion section. In addition, distinguishing personal from non-personal Twitter accounts was beyond the scope of this study. Although we are aware of the possible bias originating from this, we considered its effect on the final results marginal due to the large amount of data.
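The restriction to geolocated tweets can be illustrated with a minimal Python sketch. It assumes tweets arrive as dictionaries shaped like Twitter API v1.1 JSON (a GeoJSON `coordinates` Point in longitude/latitude order, and a `retweeted_status` key on retweets); the bounding box values and the function name `geolocated_in_bbox` are hypothetical and only approximate Greater London, since the study does not report how tweets were spatially bounded.

```python
from typing import Iterable, Iterator, Tuple

# Approximate Greater London bounding box (min_lon, min_lat, max_lon, max_lat).
# Assumption for illustration only; not taken from the study.
LONDON_BBOX: Tuple[float, float, float, float] = (-0.5103, 51.2868, 0.3340, 51.6919)


def geolocated_in_bbox(tweets: Iterable[dict], bbox=LONDON_BBOX) -> Iterator[dict]:
    """Yield non-retweet tweets whose exact point coordinates fall inside bbox.

    Assumes dicts shaped like Twitter API v1.1 tweet JSON: "coordinates" is a
    GeoJSON Point with [longitude, latitude] order (or None), and retweets
    carry a "retweeted_status" key.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    for tweet in tweets:
        if "retweeted_status" in tweet:
            continue  # defensive: drop retweets
        point = tweet.get("coordinates")
        if not point or point.get("type") != "Point":
            continue  # drop tweets without exact coordinates
        lon, lat = point["coordinates"]
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
            yield tweet
```

The retweet check is defensive, since the database is believed not to contain retweets; a tweet with coordinates near the Olympic Park would be kept, while one without coordinates or located in Paris would be dropped.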

Methodology
As shown in Figure 1, our methodology comprises a sequence of steps: pre-processing (defining temporal bins for before/during/after the Olympics and identifying residents vs visitors), textual analysis (sentiment analysis and automated semantic topic modeling), spatial hot spot detection, and finally evaluation and validation of our results through a point pattern test. The individual steps are described in the following subsections.

Pre-Processing
We developed a two-step filtering procedure to prepare the raw data for the subsequent analysis.
Temporal binning: First, we created temporal bins from the raw data representing the time periods before, during and after the Olympic Games (OG). This allows us to distinguish between "event days" and "comparison days". We followed this approach because, as described in previous literature, large-scale events such as the OG change the dynamics of a city for the duration of the event. The temporal bins, each spanning 17 days, were defined as follows: before: June 27-July 13, 2012; during: July 27-August 12, 2012; after: August 27-September 12, 2012.
Spatiotemporal subsetting (hypothesizing residents and visitors): The self-reported geolocation data of the tweets and the frequency of each user's presence in the temporal subsets were used to identify presumable "residents" and "visitors" in London. Our approach is based on the work of Abbasi et al. (2015), who identified these user types in Sydney for city trip analysis. The rationale for identifying the two groups was the following: a person who tweeted at least once in each of the three temporal subsets was considered a "resident", whereas a person who tweeted in just one of them was considered a "visitor" (non-resident). The remaining users, who tweeted in exactly two subsets, were not considered in the present study, because without further extensive analysis we could not distinguish less actively tweeting residents from visitors who stayed longer than a month. Although it would be possible to identify them based on the content of their tweets, that is a complex methodology in its own right and was therefore beyond the scope of this study. We are aware of the limitations of our method and discuss them, along with its advantages, in the Discussion section. Nevertheless, our results indicate that the method is effective when no additional data are available for classifying user types, and that it sufficiently reflects the differences between the two groups required for our purposes.
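The binning and the resident/visitor rule can be summarized in a short Python sketch. The bin dates are those given above; the function names and the assumption that each tweet has been reduced to a `(user_id, date)` pair are ours, not the authors' implementation.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Optional, Tuple

# Temporal bins used in the study (inclusive date ranges, 17 days each).
BINS: Dict[str, Tuple[date, date]] = {
    "before": (date(2012, 6, 27), date(2012, 7, 13)),
    "during": (date(2012, 7, 27), date(2012, 8, 12)),
    "after": (date(2012, 8, 27), date(2012, 9, 12)),
}


def temporal_bin(day: date) -> Optional[str]:
    """Return the name of the bin containing `day`, or None if it lies outside all bins."""
    for name, (start, end) in BINS.items():
        if start <= day <= end:
            return name
    return None


def classify_users(tweets: Iterable[Tuple[str, date]]) -> Dict[str, str]:
    """Label users from (user_id, tweet_date) pairs (hypothetical input format).

    "resident": tweeted in all three bins; "visitor": tweeted in exactly one bin.
    Users active in exactly two bins get no label and are excluded, as in the study.
    """
    bins_per_user = defaultdict(set)
    for user_id, day in tweets:
        name = temporal_bin(day)
        if name is not None:  # tweets between the bins are ignored
            bins_per_user[user_id].add(name)
    labels = {}
    for user_id, bins in bins_per_user.items():
        if len(bins) == len(BINS):
            labels[user_id] = "resident"
        elif len(bins) == 1:
            labels[user_id] = "visitor"
    return labels
```

Leaving two-bin users unlabeled, rather than forcing them into either group, mirrors the decision described above to exclude users whose status cannot be decided from activity alone.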

Semantic Analysis
The semantic text analysis was performed in two consecutive steps: sentiment analysis, followed by automated topic extraction using the unsupervised machine-learning method Latent Dirichlet Allocation (LDA).

Sentiment Analysis
The sentiment analysis algorithm used in our approach is based on the work of Breen (2012). A sentiment score was calculated for each tweet to automatically determine to what degree it contains positive or negative sentiment, as the difference between the number of positive words and the number of negative words.
This approach requires a dictionary of positive and negative words, for which we selected the Hu Liu lexicon (Hu & Liu, 2004), one of the most widely acknowledged dictionaries in recent literature. Generally, if the score is higher than zero, the text is assumed to express an overall "positive sentiment", whereas it is considered to express a "negative sentiment" if the score is below zero. If the score equals zero, the text is considered "neutral".
The main disadvantage of this approach is that it cannot reliably assign a positive or negative polarity to scores close to zero: such tweets are either genuinely neutral or misclassified with a comparatively high probability. Thus, we categorize tweets with a score equal to or higher than 2 as positive and tweets with a score equal to or lower than −2 as negative. This does not mean that all tweets with a sentiment score of 1 or −1 are neutral; rather, they are less likely to be correctly identified as positive or negative and, therefore, we do not consider them in our analysis.

The terms "positive" and "negative" will be used throughout the article as defined above.
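The scoring and thresholding steps can be sketched in a few lines of Python. The word sets below are a tiny hypothetical stand-in for the Hu & Liu (2004) lexicon, and the simple regular-expression tokenizer is our assumption; only the score definition (positive minus negative count) and the ±2 thresholds come from the text.

```python
import re
from typing import Optional

# Tiny illustrative word lists; the study used the full Hu & Liu (2004) lexicon.
POSITIVE = {"great", "win", "amazing", "love", "proud"}
NEGATIVE = {"delay", "crowded", "hate", "late", "closed"}


def sentiment_score(text: str) -> int:
    """Breen-style score: number of positive words minus number of negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def polarity(score: int, threshold: int = 2) -> Optional[str]:
    """Keep only clear-cut tweets: >= +2 positive, <= -2 negative, otherwise None."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return None  # scores of -1, 0 and +1 are excluded from the analysis
```

For example, "Amazing win, so proud!" scores 3 and is kept as positive, whereas "Great but crowded" scores 0 and is excluded.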

Machine-Learning Topic Modeling
As keyword-based approaches have limitations for social media data (Eisenstein, 2013), we used a machine-learning algorithm that extracts the latent structure of a dataset. This topic modeling approach clusters the data stream and filters the relevant tweets for the subsequent spatial analysis. Concretely, we used LDA, a probabilistic topic modeling algorithm that clusters semantic topics in a dataset. LDA is an unsupervised generative model that produces a document-topic distribution and a topic-word distribution. More information about the model and the hyperparameters of LDA can be found in Blei, Ng, and Jordan (2003).
Before the actual topic modeling procedure, the social media posts need to be pre-processed, which significantly improves the performance of LDA. We followed the steps defined by Resch et al. (2017), where each pre-processing step is explained in more detail. In the first step, every tweet is split at blank spaces so that every single character or sequence of characters can be treated individually (tokenization). Then all words are converted to lowercase to remove differences in capitalization. In our experiment, URLs, special characters (e.g., ":" or ")"), short words (fewer than three characters), stop words (identified using a manual list and the English stop-word list of the Natural Language Toolkit; Manning et al., 2014), words that appear only once in the corpus, and numbers are considered noise and are deleted. The remaining words are then reduced to their stem using the Porter Stemmer (Porter, 1980).
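The pre-processing chain can be sketched as follows. This is a minimal illustration, not the authors' code: the stop-word set is a small hypothetical sample, and the stemmer is passed in as a function (identity by default) so the sketch runs without extra packages; NLTK's `PorterStemmer().stem` could be supplied instead.

```python
import re
from collections import Counter
from typing import Callable, List

# Hypothetical stop-word sample; the study combined a manual list with NLTK's English list.
STOP_WORDS = {"the", "and", "for", "with", "this", "that", "are", "was"}


def clean_tokens(tweet: str, stop_words=STOP_WORDS) -> List[str]:
    """Tokenize at blanks, lowercase, and drop URLs, special characters, numbers,
    short words (< 3 characters) and stop words."""
    tokens = []
    for token in tweet.split():  # tokenization at blank spaces
        token = token.lower()
        if token.startswith(("http", "www.")):
            continue  # URL
        token = re.sub(r"[^a-z0-9]", "", token)  # strip special characters
        if token.isdigit() or len(token) < 3 or token in stop_words:
            continue  # number, short word or stop word
        tokens.append(token)
    return tokens


def preprocess(corpus: List[str], stem: Callable[[str], str] = lambda w: w,
               stop_words=STOP_WORDS) -> List[List[str]]:
    """Clean every tweet, drop words occurring only once in the corpus, then stem."""
    docs = [clean_tokens(t, stop_words) for t in corpus]
    counts = Counter(w for doc in docs for w in doc)
    return [[stem(w) for w in doc if counts[w] > 1] for doc in docs]
```

Removing corpus-wide hapaxes before stemming follows the order described above; note that stripping special characters also turns a hashtag such as "#london" into "london".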
In the next step, we applied LDA to the preprocessed data. We used the implementation of the Gensim library (Gensim, 2017) in Python and ran all experiments with the following parameter values, which were derived empirically because no generally proven formal a-priori parameter estimation method exists so far: α = 0.0001, β = 1/number_of_topics and number_of_topics = 30. We set α close to zero because short documents such as tweets usually contain only a single topic (Zhao et al., 2011). The other two parameters were chosen based on experimental evidence. In the final step, we assigned each tweet to the topic with the highest probability. The extracted topics were then manually interpreted, focusing on Olympics-related and transportation-related topics. From our perspective, a topic is related to transportation when words like London, station, railway, underground, etc. have a high probability in the topic, whereas a topic is considered Olympics-related if the stem "olymp" has the highest probability and other words like stadium, ticket, wembley, athlete, etc. also have a high probability.
Examples of Olympics- and transportation-related topics can be found in the Results section.
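The LDA configuration and the topic assignment can be sketched in Python. The priors follow the values stated above (Gensim names the topic-word prior β `eta`); the fixed random seed, the marker set and the simplified labeling rule in `label_topic` are our assumptions, since the authors interpreted topics manually. Gensim is imported lazily so that the helper functions run without it installed.

```python
from typing import List, Optional, Sequence, Tuple

# Priors reported in the study.
NUM_TOPICS = 30
ALPHA = 0.0001
ETA = 1.0 / NUM_TOPICS  # Gensim's name for the topic-word prior (beta)

# Hypothetical marker words for a simplified labeling rule (not the authors' exact rule).
TRANSPORT_MARKERS = {"station", "railway", "underground"}


def train_lda(texts: List[List[str]]):
    """Fit an LDA model with Gensim on preprocessed token lists."""
    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    dictionary = Dictionary(texts)
    corpus = [dictionary.doc2bow(tokens) for tokens in texts]
    model = LdaModel(corpus=corpus, id2word=dictionary, num_topics=NUM_TOPICS,
                     alpha=ALPHA, eta=ETA, random_state=0)  # seed is our choice
    return model, dictionary, corpus


def dominant_topic(doc_topics: Sequence[Tuple[int, float]]) -> int:
    """Return the topic id with the highest probability from [(topic_id, prob), ...]."""
    return max(doc_topics, key=lambda pair: pair[1])[0]


def label_topic(top_words: Sequence[Tuple[str, float]],
                markers=TRANSPORT_MARKERS) -> Optional[str]:
    """Label a topic from its [(word, prob), ...] list, sorted by descending probability."""
    if top_words and top_words[0][0].startswith("olymp"):
        return "olympics"
    if any(word in markers for word, _ in top_words):
        return "transportation"
    return None
```

With a trained model, `model.get_document_topics(bow, minimum_probability=0.0)` yields the per-tweet distribution passed to `dominant_topic`, and `model.show_topic(topic_id, topn=10)` yields the ten most probable words passed to `label_topic`.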

Spatiotemporal Data Processing
In order to study the spatiotemporal behavior of residents and visitors in the three temporal bins (before, during and after the OG), we analyzed daily and hourly tweet intensities for the subsets of positive and negative tweets and the main semantic topics (LDA output), as well as the similarity patterns for spatial point distribution (Figure 2).In the last step, we investigated spatial hot spots using Kernel Density Estimation (KDE).The maps illustrating the results of the KDE can be found in the supplementary file.
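The daily and hourly intensities, as well as the share of positive or negative tweets per user group, amount to simple counting. A minimal sketch, assuming each tweet of one temporal bin has already been reduced to a hypothetical `(group, timestamp, polarity)` record, with polarity taken from the ±2 thresholds (None for excluded tweets):

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

Record = Tuple[str, datetime, Optional[str]]  # (group, timestamp, polarity)


def intensity(records: Iterable[Record], resolution: str = "day") -> Counter:
    """Count tweets per (group, time key, polarity).

    "day" keys by calendar date; "hour" keys by hour of day (0-23),
    pooling all days of the bin.
    """
    counts: Counter = Counter()
    for group, ts, label in records:
        key = ts.date() if resolution == "day" else ts.hour
        counts[(group, key, label)] += 1
    return counts


def share(records: List[Record], group: str, label: Optional[str]) -> float:
    """Percentage of a group's tweets that carry the given polarity."""
    labels = [lab for g, _, lab in records if g == group]
    return 100.0 * labels.count(label) / len(labels) if labels else 0.0
```

Computed per temporal bin, `share` corresponds to the kind of percentage reported in the Results (e.g., the proportion of residents' tweets during the OG that are positive).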
To quantify the spatial similarity between tweets posted before, during and after the OG, we used a nonparametric, area-based spatial point pattern test (Andresen, 2009; Andresen & Malleson, 2013). The test requires three datasets: base points and test points, whose spatial patterns are compared, and base polygons representing the areal units. As areal units, we used 5,888 polygons from the administrative dataset of the Greater London Lower Super Output Areas (LSOA) from 2011 (Greater London Authority's DataStore, 2017). In our study, the LSOAs are used only for the similarity test, to characterize the general pattern of tweeting behavior. The base points are the tweets from the "during OG" bin, for residents and for visitors in two consecutive analyses, whereas the test points are the tweets posted before and after the OG, first for residents and then for visitors. The entire analysis was performed on hourly subsets. For the base points, the next step is to assign each point to an areal unit (LSOA) and to calculate the percentage of points within each LSOA. The test points are likewise assigned to the LSOA polygons; then 85% of them are randomly sampled and the percentages are recalculated, with this sampling repeated 200 times in a Monte Carlo simulation. From these simulated percentages, a confidence interval is created for each areal unit. Finally, the base percentage of each unit is compared with the test confidence interval, which yields a local index of similarity (for each areal unit) and a global one (for all the data). For this case study, the outcome of the test is the global index of similarity, whose values range from 0 (no similarity) to 1 (identical). If the index is higher than 0.80, the two datasets are considered highly similar (Andresen, 2016). The index expresses the level of similarity across the LSOAs between the two analysis periods (Equation 1):

$$S = \frac{\sum_{i=1}^{n} s_i}{n} \qquad (1)$$
where s_i equals 1 if the two tweet datasets (in our case, the three temporal subsets compared pairwise) are similar in spatial unit i, and 0 if they are not similar. Further, n is the total number of spatial units (the LSOA polygons).
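The test and Equation 1 can be illustrated with a compact Python sketch. Points are represented only by the areal-unit id (e.g., LSOA code) into which they fall. The 85% sample and 200 repetitions come from the text; the 95% confidence level (`alpha = 0.05`), the simple rank-based percentiles and the function names are our assumptions.

```python
import random
from collections import Counter
from typing import Dict, Iterable, Optional, Sequence, Tuple


def percentages(unit_ids: Iterable[str], units: Sequence[str]) -> Dict[str, float]:
    """Percentage of points that fall in each areal unit."""
    counts = Counter(unit_ids)
    total = sum(counts.values())
    return {u: 100.0 * counts[u] / total for u in units}


def similarity_index(base_units: Sequence[str], test_units: Sequence[str],
                     units: Optional[Sequence[str]] = None, sample_frac: float = 0.85,
                     n_sim: int = 200, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, Dict[str, int]]:
    """Andresen-style area-based point pattern test.

    Returns (S, s): s[u] = 1 if the base percentage of unit u lies inside the
    Monte Carlo confidence interval of the test percentages, else 0, and
    S = sum(s) / n as in Equation 1.
    """
    rng = random.Random(seed)
    units = sorted(set(base_units) | set(test_units)) if units is None else list(units)
    base_pct = percentages(base_units, units)
    k = int(round(sample_frac * len(test_units)))
    sims: Dict[str, list] = {u: [] for u in units}
    for _ in range(n_sim):
        sample = rng.sample(list(test_units), k)  # sampling without replacement
        pct = percentages(sample, units)
        for u in units:
            sims[u].append(pct[u])
    lo_i = int(alpha / 2 * n_sim)            # e.g. 5th of 200 sorted values
    hi_i = int((1 - alpha / 2) * n_sim) - 1  # e.g. 195th of 200 sorted values
    s = {}
    for u in units:
        vals = sorted(sims[u])
        s[u] = 1 if vals[lo_i] <= base_pct[u] <= vals[hi_i] else 0
    return sum(s.values()) / len(units), s
```

In practice, `units` would be the full list of LSOA codes, so that units without any tweets also enter n; by default the sketch only uses units that contain at least one point.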
To visually analyze the spatiotemporal characteristics of our findings, we generated hourly and daily density maps for positive and negative tweets in the three temporal bins, separately for each user group (residents vs visitors). Many spatial tools exist for understanding changes in geographical patterns (Chainey & Ratcliffe, 2005). For this case study, we chose the KDE method, which places a kernel over each observation (tweet) and sums these kernels to estimate the density of the observations' distribution (Fotheringham, Brunsdon, & Charlton, 2000). We chose KDE because it belongs to the non-parametric class of density estimators, which have no fixed structure and depend on the point data to define an estimate; in practice, the form of the density is determined solely from the data, without any model. Parametric methods, such as Maximum Likelihood Estimation or Bayesian Estimation, instead assume that the shape of the distribution is known. In addition, KDE is widely used for frequency distributions because it allows a quick exploration of how a dataset is distributed. In this study, the bandwidth was selected automatically by the software used, ArcGIS 10.4, whose kernel density tool is based on the quadratic kernel function. One of the main advantages of KDE in this case study is that it captures the spread of positive and negative sentiment, namely the area around a cluster where a positive or negative polarity is likely to be present, based on spatial dependency. First, we split the data into hourly and daily segments and then ran the nonparametric KDE tool for each layer and temporal bin, which helped to illustrate spatial changes in residents' and visitors' tweeting behavior.
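The quadratic-kernel density can be sketched in plain Python. The kernel form, Density(x) = 1/r² · Σ 3/π · (1 − (dᵢ/r)²)² for dᵢ < r, is our reading of the formula documented for the ArcGIS Kernel Density tool, with every tweet given weight 1. Unlike the study, which relied on ArcGIS's automatic bandwidth selection, the sketch takes the search radius as an explicit, hypothetical parameter and assumes projected coordinates (e.g., metres).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def quadratic_kde(points: Sequence[Point], grid: Sequence[Point], radius: float) -> List[float]:
    """Quadratic (biweight) kernel density evaluated at each grid location.

    Each point contributes 3/pi * (1 - (d/radius)^2)^2 if its distance d to the
    grid location is below radius; the sum is divided by radius^2 so that each
    kernel integrates to 1 over the plane.
    """
    densities = []
    for gx, gy in grid:
        total = 0.0
        for px, py in points:
            d = math.hypot(gx - px, gy - py)
            if d < radius:
                total += 3.0 / math.pi * (1.0 - (d / radius) ** 2) ** 2
        densities.append(total / radius ** 2)
    return densities
```

Evaluating this over a regular grid, once per hourly or daily subset and polarity, would produce the kind of density surfaces from which hot spots are read off; a larger radius smooths the surface, a smaller one emphasizes local clusters.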

Results
Table 1 shows the summary of our main results that were generated through the methodology described above to provide an overview of the content and structure in this section.

RQ1 & RQ2: Geolocated Tweet Density and Sentiment Intensity for Temporal Subsets
We defined three temporal subsets for our analysis: the time period of the OG and the same number of comparison days before and after the Olympics to test the effect of the OG on spatiotemporal tweeting behavior and on the tweets' semantic content relating to RQ1 and RQ2.
Table 1. Results summary.
Positive
• August 4: high positive peak for residents (gold medals for Great Britain), hot spots in the city center and at the Olympic Park; however, no increase in residents' raw tweet intensity.
• Well-defined spatial hot spot at the Olympic Park.
• August 4: positive sentiment peak in the daily temporal frame and slight increase in raw tweet intensity.
• Opening Ceremony and Closing Ceremony clearly stand out in the number of positive tweets.
Negative
• Mostly flat distribution in daily temporal patterns.
• More negative hot spots outside the city center during and after the OG.
• Low oscillations for all tweets and higher oscillations for the topics, e.g., the transportation topic.
• Before the OG, negative sentiment exceeds positive sentiment for a few days.
All tweets
• Residents and visitors show different temporal and spatial patterns.
• Higher number of unique visitors tweeting during the OG.
• The hourly spatial distribution of tweets shows the highest similarity during the night (low number of tweets) and low similarity during the morning and evening (high number of tweets).

One essential step for this study was to identify presumable residents and visitors. By applying the criteria mentioned before (see Methodology section), we had to remove approximately 25% of the before-OG tweets, 29% of the OG tweets and 24% of the after-OG tweets. Practically speaking, we removed those users who tweeted in exactly two temporal bins, because it would have been difficult to distinguish whether they are less actively tweeting residents or visitors who stayed longer than one month. The 11,571 London residents show their highest tweeting intensity during the OG, whereas the visitors tweet more in the after-OG period (Table 2). Regarding RQ1, we were able not only to distinguish residents and visitors in the dataset based on their temporal profile, but also to identify clear and fundamental differences in the two groups' spatiotemporal behavior. Given that planned events affect residents and visitors differently, this finding plays a key role in various planning-related social media analyses.
As for RQ2, large events tend to increase participation in social media (Wang, Can, Kazemzadeh, Bar, & Narayanan, 2012), which was also reflected in this case by the highest tweet volume occurring during the OG (594,891 tweets). Another peak in tweeting intensity (545,693 tweets) was identified after the OG period (especially among visitors), which might be explained by the Paralympic Games and the London 2012 Festival, an accompanying event of the OG intended to organize the "most culturally engaging" OG in history (Brown, 2012).
Further, one of our hypotheses was that positive sentiment would occur more often in tweets during a large event than on ordinary days at the same locations. This assumption is supported by the sentiment scores obtained for the six datasets used in this study: 7.65% of the residents' tweets and 6.02% of the visitors' tweets during the OG are positive, while only 3.04% and 2.24%, respectively, are negative (Figure 3). Relative to the days before and after the OG, negativity decreased noticeably, while positivity increased.

RQ3: Semantic Topic Extraction
In every sub-dataset (divided spatiotemporally, see Methodology section), we can identify one or more topics related to our target topics "Olympics" and "transportation". Table 3 shows the ten words with the highest probability in each topic. Due to limited space, we show only some of the topics. Some words appear truncated because of the stemming step in pre-processing, which reduces each word to its stem.
Table 3 shows that we can clearly identify topics related to "Olympic" and "transportation", distinguishing the periods before, during and after the OG as well as residents and visitors. In all "Olympic"-related topics, the stem "olymp" has a high probability, and during the OG its probability is markedly higher than that of the other words in the topic. In the "transportation"-related topics, multiple words show high probabilities, such as "station", "railway" or "underground". Notably, the same words in the "transportation"-related topic show similar probabilities in the residents' and visitors' datasets across the different time periods. In the dataset after the OG, when the Paralympics took place, "paralymp" is also the most probable word in the "Olympic"-related topic.

RQ2: Similarity Index
Figure 4 shows the similarity values in hourly bins as defined above (see Methodology section). The highest similarity values occur during the night, when tweets are posted from the same LSOAs but, according to the hourly intensity results, at low density.
Starting at 5:00 a.m., the similarity curve decreases until around 9:00 a.m. This shows that during the OG the spatiotemporal behavior of the users differs from that before and after the OG. The most noticeable differences appear at the end of the day, after 6:00 p.m., between residents and visitors after the OG (ranging from 0.5814 to 0.5019) and between the two visitor datasets (ranging from 0.5635 to 0.5019).

RQ1 & RQ2: Temporal Analysis
After extracting the topics and defining the sentiments of each tweet, we analyzed the temporal distribution of the negative and positive (see Methodology) tweets of residents and visitors, both on hourly and daily levels.

Daily Patterns
The daily tweet intensity in the raw Twitter data for residents showed two peaks during the OG, at the Opening Ceremony and at the Closing Ceremony. Unlike the residents' time series, the visitors' time series showed a peak in the OG period around August 4, when Great Britain won three gold medals in athletics. Surprisingly, the visitors posted a higher volume of tweets after the OG than in the other time bins, including a peak during the Paralympics Closing Ceremony (Figure 5). Next, we compared the daily sentiment patterns in the data subsets. Figures 6-8 illustrate daily intensities of the sentiment distribution for residents and visitors during the three temporal frames. It should be noted that on a few days the tweet volume for the specific topics is low, especially for the "transportation" topic.

Positive vs Negative Tweet Trends
When analyzing the tweets of residents and visitors, the daily distribution of negative tweets was fairly even and smooth for "all tweets" (all six data frames, all topics). The residents' negative tweets in the "olympic" topic have an almost flat trajectory, similar to those for "all tweets" (Figure 7 vs Figure 6), except on July 11, when they increased and the hot spot map showed higher intensity in central London, Lewisham and Morden, close to Wimbledon. Positive tweets outnumbered negative ones at all times (Figure 6 and Figure 8), with higher values during the OG and a spatial concentration around the Olympic Park (Figure 9). For the "olympic" topic, the positivity curve for residents reaches its maximum on August 4 (from 0.15% negative tweets to 0.86% positive tweets), while for "all tweets" the peak is higher for visitors (Figure 6). In the newspapers, this day is referred to as "Saturday night fever", when Jessica Ennis, Greg Rutherford and Mo Farah all won gold medals for the host nation. This suggests higher public engagement when compatriots win medals.
Regarding the "after OG" period, on September 2, when the USA team won the first medal in the trunk and arms mixed event at the Paralympics, negativity decreased for residents, while on the same day it increased for visitors, together with an increase in positive tweets (Figure 7). A common positive peak can be observed for "all tweets" and "olympic" on September 9 (Figure 6 and Figure 7), during the Paralympics Closing Ceremony. In comparison, the sentiment distribution of the "transportation" topic follows a smoothed zig-zag line, with an increase followed by a decrease in positive tweets after the OG. Interestingly, on August 4 there was no peak for either residents or visitors, unlike in the other subsets. After the OG, the "transportation" tweets show a larger difference between positive and negative tweets for residents, mostly from September 1 to September 9 (Figure 8). However, the maximum tweet volume is only 30 tweets per day.

Daily Trends of Residents vs Visitors
Before the OG, the "all tweets" series of residents and visitors showed only slight changes in the trend line (Figure 6), while for the "olympic" topic the visitors' tweets formed a zig-zag-like time series (Figure 7), similar to the "transportation" topic (Figure 8). No daily spatial hot spots were found in the Queen Elizabeth Olympic Park for this period.
During the OG, residents and visitors showed a higher volume of positive tweets for "all tweets" on August 1 and August 4 (Figure 6). On August 1, the spatial hot spots are distributed between central London and the Olympic Park area, and on August 4 a high density is located specifically around the Olympic venues: the Olympic Park zone, the River zone (including Greenwich Park), and the Central zone (including Hyde Park and Regent's Park). For residents, we also notice smaller hot spots in many parts of the city, which suggests increased tweeting activity on a special occasion (Figure 9). August 4 is also a common peak for "olympic" tweets, mostly for residents, with hot spots around the Olympic Park, in the city center, and between the two, forming an almost continuous band. At the same time, visitors show a hot spot only around the Olympic Park, with much lower intensity in the city center (Figures 7 and 9). Interestingly, the positive peak of August 4 is not accompanied by an increase in the raw tweet intensity for residents. Another dissimilarity arose for the "transportation" topic, whose sentiment distribution showed a different pattern (Figure 8). For example, the highest positive peaks in the OG period occur on August 1 for visitors and on August 2 for residents. The visitors' tweeting hot spots are in the city center and at the Olympic Park, while the residents' tweets are clustered in an elongated hot spot with median values around the Olympic Park.
For the after OG period, September 2 was a peak of positive emotions for residents and visitors in the "olympic" topic, with an intense hot spot for visitors at the Olympic Park, while the residents' hot spot included the park but its center was shifted towards the western part of the park. On September 9, residents in the "olympic" topic (Figure 7) and visitors in this topic and in "all tweets" (Figure 6) showed increased positive sentiment, possibly caused by the Paralympics Closing Ceremony. The spatial density for this day also shows hot spots in the approximate city center and in the Olympic Park zone. A different temporal pattern occurred for the visitors' "transportation" tweets (Figure 8): two days showed predominant positive peaks, September 1, when the majority of tweets came from the Olympic Park, and September 8, when all the active tweeting happened in central London.

Hourly Patterns
After identifying peaks and patterns at the daily level, we also analyzed the hourly spatial and temporal distribution of the tweets. Figure 10 gives a general overview of the hourly distribution of the raw number of tweets per user group in the two-week temporal frames (six weeks in total). Before the OG, residents showed a cyclical daily pattern, with low intensity overnight, a rapid increase in the morning, and a steep decrease a few hours after midnight. Interestingly, we can identify peaks for residents and visitors during the OG at the Opening Ceremony and at the Closing Ceremony, as well as the "Saturday night fever" on August 4 for residents mentioned in the daily patterns. Another peak in tweeting intensity occurs on September 9 for residents, possibly due to the Paralympics Closing Ceremony.
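The caption of Figure 12 distinguishes two hourly normalizations: an "absolute" one (each hour compared to all tweets of the group during the OG) and a "relative" one (each hour compared to all tweets of the group in that hour). A minimal sketch of both, assuming a hypothetical input of (group, hour of day, sentiment) tuples already restricted to the OG bin; it illustrates the definitions only and is not the authors' implementation.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (group, hour_of_day, sentiment), e.g. ("V", 20, "positive"),
# already filtered to the OG temporal bin.
Record = Tuple[str, int, str]
Shares = Dict[Tuple[int, str], float]


def hourly_shares(records: Iterable[Record], group: str) -> Tuple[Shares, Shares]:
    """Return (absolute, relative) percentages per (hour, sentiment) for one group.

    absolute: polar tweets in that hour / all tweets of the group in the bin
    relative: polar tweets in that hour / all tweets of the group in that hour
    """
    rows = [(hour, sent) for g, hour, sent in records if g == group]
    per_hour = Counter(hour for hour, _ in rows)  # all sentiments, per hour
    polar = Counter(r for r in rows if r[1] in ("positive", "negative"))
    absolute = {k: 100.0 * n / len(rows) for k, n in polar.items()}
    relative = {k: 100.0 * n / per_hour[k[0]] for k, n in polar.items()}
    return absolute, relative
```

The relative variant removes the daily activity cycle described above, so a quiet hour with few but mostly negative tweets can still show a high value; this is one reason low-volume morning hours need careful interpretation.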

Paralympics
The accompanying videos¹ show the spatiotemporal pattern of positive and negative tweets (aggregated to 24 hours as in Figure 12, in 10-minute time frames) for both user groups in all three analysis periods. (A static version showing four different hours of the day can be found in the supplementary file.) Blue points represent negative tweets, and red points positive ones. Each semi-transparent point representing a tweet remains visible for two hours to illustrate the density of tweets. Each video shows the three temporal bins one after another (3 × 24 hours) for our two user groups (residents and visitors), and a clock in the lower right corner shows the current time.
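The frame logic described for the videos (tweets folded onto a single 24-hour day, 10-minute frames, each point persisting for two hours) can be sketched as follows. The function names are hypothetical, and the sketch assumes that points do not wrap around midnight, which the text does not specify.

```python
from datetime import datetime
from typing import Dict, List, Sequence

FRAME_MINUTES = 10     # length of one animation frame
PERSIST_MINUTES = 120  # each tweet point stays on screen for two hours
DAY_MINUTES = 24 * 60


def minute_of_day(ts: datetime) -> int:
    """Fold a timestamp onto a single 24-hour day (minutes after midnight)."""
    return ts.hour * 60 + ts.minute


def visible_per_frame(minutes: Sequence[int]) -> Dict[int, List[int]]:
    """Map each frame start (minutes after midnight) to the indices of tweets
    whose two-hour display window [m, m + 120) overlaps the frame.
    Assumption: no wrap-around, so late-evening points vanish at midnight."""
    frames: Dict[int, List[int]] = {}
    for start in range(0, DAY_MINUTES, FRAME_MINUTES):
        end = start + FRAME_MINUTES  # frame covers [start, end)
        frames[start] = [
            i for i, m in enumerate(minutes)
            if m < end and m + PERSIST_MINUTES > start
        ]
    return frames
```

With 10-minute frames this yields 144 frames per day, i.e., 432 frames for the three temporal bins shown in sequence in each video.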

Changes in the Pattern Comparing the OG Period to Before and After
This section addresses RQ2: we compared the patterns before and after the OG with those during the OG. For residents, the core of the main hot spot remains constant for every hour throughout all analysis periods and is located mainly in the city center. For negative tweets, the before and after periods are quite similar during the day, but for positive tweets there is still a hot spot around the Olympic Park after the OG, because our after-OG analysis period also includes the days of the Paralympic Games. For visitors, in contrast, the pattern during the OG does not differ much from the other two periods, except in the morning hours; this can be explained by the lower number of tweets, since in this case even two or three points can produce relatively strong hot spots. For residents during the OG, a few extra hot spots reflect the OG venues. In the periods before and after the OG, the smaller hot spots in the outer parts of the city occur mostly in the morning and evening, probably due to commuting, but only for negative tweets. Interestingly, the morning hot spot of positive visitor tweets is more concentrated (visitors do not commute), but only before and during the Olympic Games. During the rest of the day, the patterns do not change significantly in any of the analysis periods, for either positive or negative tweets.

Positive vs Negative Tweet Trends
In general, the positive tweets of residents tend to be concentrated in one main hot spot, except in the morning of the after-OG period, when the negative tweets are concentrated instead. During the OG in the evening, however, the positive hot spot is much larger in extent, probably because most residents were tweeting about the Olympics at that time. For visitors, this concentration into one large hot spot of positive tweets is not apparent; there are smaller hot spots around the city for both sentiments, and the hot spot of negative tweets is generally larger in extent.

Residents vs Visitors
The most significant difference between the patterns of residents and visitors is the distribution of negative tweets in the morning, although this could again be a result of the low number of tweets. Also, the smaller hot spots of visitors' tweets tend to lie on the eastern side of the study area, especially in the period after the OG. In general, it may be worthwhile for planners to further investigate the trends and possible causes of negative tweets, as these might be partially connected to planning-related issues such as low satisfaction with infrastructure or poor quality of services.

Integration of the Results into Planning Processes
The main objective of our case study was to illustrate the general potential of Twitter data analysis for urban planning in the case of large planned events. We therefore addressed identified research gaps, such as distinguishing between residents and visitors in the context of the Olympics and comparing event days with non-event days, combining spatiotemporal and content analysis in one study. Consequently, our results serve as a basis for further, more in-depth analyses.
In general, both previous research and the work presented in this article show that results from social media analysis can be used directly in urban planning processes, including the general ability to detect sentiments associated with places (Resch et al., 2016). In this regard, social media provide people (residents and visitors) with a simple and powerful instrument to share their opinions and subjective impressions. This is particularly relevant when connecting social media posts to specific urban events, such as the Olympic Games or other large sports events, to gain insight into how the urban population perceives these events. In fact, social media are a valuable, open source of information for urban planning.
This openness is particularly important because urban planning processes are often still characterized by closed communication between local and official actors, lacking open discussion and transparent procedures (Resch et al., 2016). Moreover, openness and transparency are increasingly key factors for successful urban planning, allowing for an efficient weighing process that considers the opinions and sentiments of different stakeholders. Current planning processes, however, are mostly shaped by deductive processes, which are typically introduced and controlled by urban governments, often neglecting or insufficiently integrating the needs of citizens. In this context, social media play a key role because they provide an instrument for organizing public participation activities and citizen initiatives. On the positive side, integrating public discussions on social media and other digital platforms also increases the validity and acceptance of governmental decision-making, because traditional planning methods are complemented by new "human sensor" data that reflect citizens' wishes and needs (Zeile, Resch, Exner, & Sagl, 2015). This is in clear contrast to top-down approaches that follow different decision-making principles. Integrating social media into urban planning may provide previously unavailable insights into citizens' thoughts, perceptions and expectations concerning urban events in an inductive, bottom-up approach. In this sense, urban planning-related discussions are, to some degree, self-organizing, giving citizens the chance to discuss planning issues in a peer-to-peer process rather than a government-driven one. However, social media analysis should address the digital divide, i.e., the fact that mostly younger, better educated, and more technologically savvy people participate in social media networks (Czepkiewicz, Jankowski, & Młodkowski, 2017). Because of this digital divide, social media platforms currently do not represent the entire society or population appropriately (Diaz, Gamon, Hofman, Kiciman, & Rothschild, 2016; Mellon & Prosser, 2017); therefore, conclusions drawn from the analysis should be handled with caution, depending on the phenomenon studied. The extremes are especially underrepresented, for instance in terms of age (very young and older generations) and economic situation (those who cannot afford internet access or devices). Consequently, we are aware that the social media-based approach has a number of limitations; still, it may complement current urban planning procedures through an improved understanding of the city as a living organism and by proactively engaging citizens in urban planning (Resch, 2013).
Based on the methods we used and their outcomes, we can identify two main types of further planning-related investigations (the list of examples is not exhaustive):
a) Macro-scale:
• On a city level, it is possible to point out differences in the general mobility patterns compared to non-event days (also at different times of the day) and use them for further transportation modeling. The different needs of residents and visitors should be considered.
• Planners can further investigate the hot spots of negative tweets in both groups (residents and visitors); as they show different trends, there might be different reasons behind them. These hot spots can be compared with the topics extracted in these areas, to see whether they relate to transportation or other planning-related topics, to the event itself, or to something else entirely.
• Regarding the extracted topics, it is possible to search for more specific terms if planners provide expert knowledge. Furthermore, other terms connected to the planning-related topics that urban planners have not yet considered can be identified. The sentiment and spatial distribution of these topics across the city can then be explored for both residents and visitors, as these might also differ.
b) Micro-scale:
• We could clearly identify activity patterns related to individual venues of the Olympic Games. An interesting example would be to focus on one venue and explore the behavioral patterns of residents and visitors and the effect of a given event at that venue (specifically, right before, during, and right after the event). Do residents tend to be more negative? Or perhaps less active during that time?
• Additional datasets are clearly advantageous for micro-scale analysis, allowing planners to explore deeper connections between the event and other urban processes. For example, the effect of the event on the local economy can be analyzed using bank card usage data (Habidatum, 2017).
• Extracting information at the user level is also an option. For visitors who tweet regularly during the day, planners can trace intra-urban mobility patterns (e.g., between their accommodation and the venues). Such an analysis would be even more accurate if combined with mobile phone data.

Psychological Biases in Human Language and Social Media and Their Relevance for Urban Planning
Most generally, there is a universal positive bias in human language (Dodds et al., 2015). The findings of the present study are consistent with this bias: a higher percentage of positive than negative tweets was identified. Moreover, residents show a clear peak of positive sentiment during the OG, even though several recent examples show local people opposing the organization of the Olympics in their cities (e.g., Kaufmann, 2015; Moore, 2015; Sims, 2017). This might suggest that, once underway, world-class events boost the self-respect and pride of city residents, whose perceptions of the benefits are typically optimistic (Whitson & Macintosh, 1993), which might not be the case in the planning phase (e.g., Dempsey & Zimbalist, 2017). However, it is important to consider that positive cognitive bias and homeostatic happiness maintain life satisfaction, and that self-beliefs can act as reality buffers (Cummins & Nistico, 2002). This raises the question of whether positive cognitive bias can act as a buffer that masks some of the inconveniences that large events such as the OG may cause in the city. For example, Ritchie, Shipway and Cleeve (2009) found that urban residents generally supported major events in their area but were concerned about issues such as traffic congestion and rising costs of services. Additionally, the benefits of large events in a city can differ between social groups.
For example, younger residents, residents with a higher socioeconomic status, and residents living farther away from the event's location are more likely to perceive additional benefits from the event (Ritchie et al., 2009; Whitson & Macintosh, 1993). The results of our study suggest positivity related to the OG, but beyond this social media positivity, there are several considerations, explained above, that urban governments and urban planners need to study. In other words, the obtained results are good indicators of the importance of large sports events for residents' life satisfaction, but in urban planning these results cannot be isolated from the rest of the city's dynamics.
These issues become critical considering that urban governments are usually open to investing in consumer- and entertainment-oriented developments, such as sports events (Harvey, 1987). However, citizens are more than consumers. Additionally, a large sports event causes changes in different dimensions of a city, such as image, knowledge, and emotions, whose long-term effects are complex to understand (Preuss, 2007). We believe that long-term social media analysis can be considered a necessary instrument for monitoring these effects in a city and for offering more pluralistic information to urban planners, who can use this information to evaluate different qualitative and quantitative costs and benefits of a large sports event. Further research needs to develop new approaches to studying the legacies of large urban events using social media. At the same time, these approaches need to be grounded in robust epistemologies to understand complex and dynamic human behavior in the virtual world (social media) without disconnecting it from human behavior in the real world.

The Effect of the Paralympic Games on the Selection of the Temporal Bins
We were aware that the third temporal bin included days related to the Paralympics and that, as a result, our comparison might show less significant differences. However, we also tested a fourth temporal bin (September 27-October 13, 2012), and the patterns in the original temporal bin (August 27-September 12, 2012) were not biased, except for the day of the closing event. Therefore, we decided to keep the original after OG temporal bin, because selecting days much later could also affect the final results and make them more difficult to interpret, for example due to seasonal effects or other extraordinary events.

Identifying Residents vs Visitors
The process of categorizing Twitter users into "residents" and "visitors" is challenging, and, to the best of our knowledge, there is no "ground truth" methodology in the related literature that provides unquestionable results. Abbasi et al. (2015) also stated that dividing social media users into residents and visitors is not an easy task; they categorized these types of users in Sydney for city trips to support urban planning. Our study adapts their method, in which residents are defined as users tweeting at least ten times in at least n−1 phases of the temporal data analysis. One reason for this adaptation is that our datasets had specific time frames related to the OG event, and we hypothesized that so-called active residents would need to tweet in all n temporal bins, while visitors had to tweet in only one of the temporal bins.
The spatiotemporal patterns identified and described in this study provide partial verification of this approach. For example, August 4 was an important day for Great Britain, and this was clearly reflected in the spatial and temporal patterns of both presumed residents and visitors: as a monocentric, well-defined daily hot spot around the Olympic Park for visitors, and as polycentric hot spots for residents in many parts of the city, with particularly high density around the city center and the park. The daily temporal graph of positive tweets supports the hot spot map for residents by showing a larger increase in positive sentiment compared to visitors. This may be because visitors are generally excited and tweet positively about all OG results, while residents are more interested in Great Britain's performance. August 4 is important in the analysis because it highlights how the positive event of winning three gold medals changes people's behavior and the spatial distribution of tweets. Also, the hourly hot spot detection shows more intense hot spots around the Olympic Park for visitors than for residents; e.g., at 8:00 a.m. or 12:00 noon there are only mild or no hot spots for residents at the park.
However, this approach has limitations: we did not consider the declared language of users (e.g., non-English speakers might be more likely to be visitors tweeting before and during the OG); the tweets' "user location" field was not used because of the subjective, potentially biased information entered by users; and data availability was limited, as we only had access to 2012 London data, whereas worldwide data would have helped in determining users' activity status. An interesting future approach may be to combine all location-related features into an index for distinguishing residents from visitors.
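The adapted rule (residents are active in all n temporal bins, visitors in only one) can be sketched as follows. This is a minimal illustration, not the authors' code: the input format, the per-bin threshold `min_tweets`, and the label "unclassified" for users active in some but not all bins are assumptions, since the text does not state whether the ten-tweet threshold of the original method was retained or how such users were handled.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def classify_users(
    tweets: Iterable[Tuple[str, str]],
    bins: Sequence[str] = ("B", "O", "A"),
    min_tweets: int = 1,
) -> Dict[str, str]:
    """Label each user as 'resident', 'visitor' or 'unclassified'.

    tweets: (user_id, temporal_bin) pairs, one per tweet.
    min_tweets: hypothetical activity threshold per bin (not given in the text).
    """
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for user, tbin in tweets:
        counts[user][tbin] += 1

    labels: Dict[str, str] = {}
    for user, per_bin in counts.items():
        active = [b for b in bins if per_bin[b] >= min_tweets]
        if len(active) == len(bins):    # active in all n bins
            labels[user] = "resident"
        elif len(active) == 1:          # active in exactly one bin
            labels[user] = "visitor"
        else:                           # e.g., active in 2 of 3 bins
            labels[user] = "unclassified"
    return labels
```

Raising `min_tweets` makes the resident class stricter but also turns occasional tweeters into "unclassified" users, which illustrates why the choice of thresholds affects the group sizes reported in Table 2.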

Topic Modeling
In our analysis we used the basic LDA model for topic modeling, which follows a "bag of words" approach: it uses only the frequency of terms in a document and does not take grammar or word order into account. Names consisting of two words, such as "Greater London", pose a significant problem, as each word is treated independently. Since "Greater" and "London" are commonly used together, analyzing biterms may increase the quality of our results. There are also names like "Olympic Games", where "Games" on its own is a common word in English conversation. For our particular case, relevant biterms include "Greater London", "Olympic Games" and "Victoria Station", which were included as single words in Table 3.
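Because LDA only sees term frequencies, one simple way to treat such names as single terms is to merge them into one token before building the document-term counts. The sketch below assumes a fixed, hand-picked biterm list (the three names mentioned above) and omits the stemming visible in Table 3 (e.g., "olymp"); in a stemmed pipeline the merge would need to happen before stemming. It does not reproduce the authors' actual preprocessing.

```python
import re
from collections import Counter
from typing import List, Sequence

# Biterms named in the text; merged into single tokens before counting.
BITERMS = ("greater london", "olympic games", "victoria station")


def tokenize(text: str, biterms: Sequence[str] = BITERMS) -> List[str]:
    """Lowercase, join known biterms with '_' and split into word tokens."""
    text = text.lower()
    for phrase in biterms:
        # Allow any whitespace between the two words of the phrase.
        pattern = r"\b" + r"\s+".join(map(re.escape, phrase.split())) + r"\b"
        text = re.sub(pattern, phrase.replace(" ", "_"), text)
    return re.findall(r"[a-z0-9_]+", text)


tokens = tokenize("Crowds at Victoria Station for the Olympic Games")
bag = Counter(tokens)  # the term frequencies a bag-of-words model such as LDA sees
```

Here "olympic_games" becomes a single vocabulary entry, so the generic word "games" in unrelated tweets no longer shares counts with the event name.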

Influence of Topic Modeling on the Sentiment Score
The distribution of positive tweets during the OG for the "olympic" topic differs between residents and visitors. While investigating possible reasons, we noticed differing word probabilities resulting from the LDA algorithm, such as for the word "olymp", with a probability of 0.2676 during the OG period for residents and 0.0426 after the OG for visitors. The inclusion of words not relevant to the topic (which was named subjectively after checking the words with the highest probabilities), such as "iphon" in the Olympic topic, might lead to an unexpected temporal distribution. Also, the sentiment scoring function labels only a limited share of tweets as positive or negative, classifying the majority as neutral, which may increase the fuzziness in interpreting the results.

Conclusion
In conclusion, our findings answer our research questions: through spatiotemporal and sentiment analysis of the tweets, we could identify significant patterns for our two defined user groups as well as for the days before, during and after the event. Additionally, the uncertainty in assigning users to each group, which stems from the lack of additional details, could be reduced by integrating further datasets (e.g., cab rides, bicycle network and public transport usage, mobile phone data).
Regarding use for planning purposes, we can state that, despite the limitations described above, applying our workflow to the sample dataset provides valuable information about the spatiotemporal behavior and sentiment of residents or visitors concerning large planned events. By comparing our results to important dates of the event (e.g., the Closing Ceremony, "Saturday night fever") or locations of the venues, we could validate our results both content-wise and for spatiotemporal patterns, even at finer spatial and temporal scales. Last but not least, topics directly related to planning and transportation could be extracted and can be further analyzed for specific urban planning purposes in the future.
Overall, the case study also illustrated the potential of using social media data for sentiment analysis and topic modeling to provide general feedback on large planned events. Nevertheless, there are possible improvements beyond the scope of the current study that could also help overcome some of the limitations mentioned above. One option is to design a geovisual analytics tool for interpreting the large amounts of data (e.g., maps, graphs, tweets, time periods), which would also support users who are less familiar with GIS concepts and methods. Furthermore, as an outlook towards participatory planning, the acquired knowledge could be presented in a Volunteered Geographic Information platform that is directly connected to the event and where people can provide feedback with location data.

Figure 4. Similarity Index distribution for OG tweets as base points (range 0 to 1 where 1 means identical pattern for both analysis periods and 0 represents no similarity at all).

Figure 5. Daily tweet density for residents and visitors for the three temporal bins (R = residents, V = visitors; B = before OG, O = during OG, A = after OG).

Figure 8. Daily sentiment analysis distribution for residents and visitors for the "transportation" topic (percentage of the total number of tweets in the given temporal bin for the respective categories).

Figure 10 .Figure 11 .
Figure 10.Residents and visitors tweeting behavior per hour for the three temporal bins.

Figure 12. Positive and negative tweets per hour (%): a) as absolute values, i.e., every hour compared to the number of all tweets during the OG for residents and for visitors; b) as relative values, i.e., every hour compared to the number of all tweets in that hour during the OG for residents and for visitors.

Figure 13. Positive and negative tweets per hour (%) for residents and visitors for the "olympic" topic.

Table 2. Residents and visitors for the three temporal subsets.

Table 3. Examples of words and their probabilities for the identified topics "Olympic" and "transportation".
Figure 7. Daily sentiment analysis distribution for residents and visitors for the "olympic" topic (percentage of the total number of tweets in the given temporal bin for the respective categories).
Figure 6. Daily sentiment analysis distribution for residents and visitors for "all tweets" (percentage of the total number of tweets in the given temporal bin for the respective categories).