The Changing Face of Accountability in Humanitarianism: Using Artificial Intelligence for Anticipatory Action

Over the past two decades, humanitarian conduct has been drifting away from the classical paradigm. This drift is caused by the blurring of boundaries between development aid and humanitarianism and the increasing reliance on digital technologies and data. New humanitarianism, especially in the form of disaster risk reduction, involved government authorities in plans to strengthen their capacity to deal with disasters. Digital humanitarianism now enrolls remote data analytics: GIS capacity, local data and information management experts, and digital volunteers. It harnesses the power of artificial intelligence to strengthen humanitarian agencies' and governments' capacity to anticipate and cope better with crises. In this article, we first trace how the meaning of accountability changed from classical to new and finally to digital humanitarianism. We then describe a recent empirical case of anticipatory humanitarian action in the Philippines. The Red Cross Red Crescent movement designed an artificial intelligence algorithm to trigger the release of funds typically used for humanitarian response in advance of an impending typhoon, to start up early actions to mitigate its potential impact. We highlight emerging actors and fora in the accountability relationship of anticipatory humanitarian action as well as the consequences arising from actors' (mis)conduct. Finally, we reflect on the implications of this new form of algorithmic accountability for classical humanitarianism.


Introduction
Humanitarian funding requirements have tripled since 2008 (ALNAP, 2018) due to the increasing occurrence of disasters caused by natural hazards and conflict. More than 50% of the people affected by disasters live in fragile and conflict-affected states, as Kellett and Sparks (2012) showed for 2005–2009. While humanitarian expenditure increased from US $2.1 billion in 1990, at the end of the Cold War, to US $30 billion in 2017 (Donini, 2017), significant gaps remain between resources and needs. Climate change threatens to push an additional 100 million people into extreme poverty by 2030 (Hallegatte et al., 2015), increasing the need for funding climate change adaptation and disaster risk reduction (DRR).
Significantly, the later humanitarianisms are the consequence of a digital turn, partly in response to the resources-needs gap. The digital turn stresses the importance of connectivity, the potential of big data, and innovative financing to improve the speed, quality, and cost-effectiveness of humanitarian response and to help anticipate and respond to crises (World Economic Forum, 2017). Data is now becoming the new currency for humanitarian response, leading to "new ways of strengthening communities and giving them back the power to help themselves" (World Economic Forum, 2017, p. 14). Cyber-humanitarianism or humanitarianism 2.0 are broad terms used to describe the increasing reliance of humanitarian action on these new digital technologies and data sources (Duffield, 2013). Digital humanitarianism is "the enacting of social and institutional networks, technologies, and practices that enable large, unrestricted numbers of remote and on-the-ground individuals to collaborate on humanitarian management through digital technologies" (Burns, 2014, p. 52). Whereas digital humanitarianism usually refers specifically to the involvement of remote and digital volunteers, we will henceforth refer to network, cyber, digital, and 2.0 humanitarianisms collectively as 'digital humanitarianism.'

New and digital humanitarianisms emerged in parallel to debates around the changing meaning of accountability, especially after the setting up of the Humanitarian Accountability Project in 2001. With new humanitarianism ending the distinction between development aid and humanitarian action in the early 2000s, humanitarians imported the concept of accountability from development aid and raised it to a "tenet of humanitarian action" (Klein-Kelly, 2018, p. 292). This is especially true in DRR, which is "weaving together humanitarian aid and development like never before" (Hilhorst, 2015, p. 105).
At that time the development sector was in thrall to the famous 'accountability triangle' (World Bank, 2004), which linked citizens to policymakers and service providers via the indirect 'long-route' of accountability, and citizens/consumers directly to service providers via the direct 'short-route' of accountability. At the same time, information technologists were arguing that 'accountability technologies' such as ICT platforms based on mobile phones could strengthen the 'short-route' of accountability and enhance citizen/consumer power. However, such digital 'accountability technologies' fail when they reduce citizens to mere humanitarian aid consumers, and flourish only when they also construct the citizen as a citoyen, a human agent engaging in judgment about public issues in relation to and with others, and as a member of a political, tribal or religious community (e.g., Katomero & Georgiadou, 2018). Similarly, UNHCR's techno-bureaucratic 'accountability technologies' for refugee protection give rise to accountability gaps instead of enhancing accountability (Jacobsen & Sandvik, 2018).
Digital humanitarianism engenders accountability challenges, in particular when using artificial intelligence. Artificial intelligence, whether in the form of expert systems replicating human decision rules or in the form of machine learning generating predictive models with probabilistic reasoning, constitutes a new form of humanitarian experimentation (Duffield, 2019). The difference from previous experimentations, such as when vaccines are deployed "in foreign territories and on foreign bodies to test new technologies and to make them safe for use by more valued citizens often located in metropolitan states" (Jacobsen, 2010, p. 89), is that artificial intelligence, especially in its machine learning form, is already widely used and contested in non-emergency contexts in metropolitan states, e.g., to predict the likelihood that welfare recipients will commit fraud and that former prisoners will recidivate, and to drive the allocation of public housing and food stamps. Only a bare minimum of relevant accountability standards are currently in place (e.g., FAT/ML, 2018; Korff, Wagner, Powles, Avila, & Buermeyer, 2017). Clearly, when accountable artificial intelligence is lacking even in non-emergency contexts in the global North, the likelihood of artificial intelligence in emergency contexts in the global South harming vulnerable populations is dramatically increased (Sandvik, Jacobsen, & McDonald, 2017).
It is against this backdrop that this article traces how accountability changes its meaning as the scope of humanitarian conduct and the type of involved actors shift from classical, to new, and to digital humanitarianism. We focus on forecast-based financing, a nascent form of anticipatory humanitarian action (Pichon, 2019), and explore an empirical case in the Philippines where artificial intelligence is used to create triggers for early action before a typhoon makes landfall. Though the Philippines is becoming more developed, it is extremely prone to natural hazards and regularly experiences humanitarian disasters, necessitating a permanent presence of the United Nations Office for the Coordination of Humanitarian Affairs since 2007. The novelty of the case allows a first reflection on which form of accountability artificial intelligence requires in anticipatory humanitarianism.

Classical Humanitarianism and Thick Accountability
The ethical ground of classical (or Dunantist) humanitarianism is a profound feeling of compassion and responsibility to those suffering in extremis. The principles of humanity and impartiality are the universal goals of humanitarian ethics, while neutrality and independence are instrumental measures to achieve these goals in the actual political conditions of armed conflict and disaster (Slim, 2015). Humanity ("address human suffering everywhere, especially for the most vulnerable, with regard to human dignity") demands that humanitarian action takes account of the human person, "all of her or him" (Slim, 2015, p. 49). Impartiality ("provide aid based solely on need, without any discrimination") applies rational objectivity on compassion. Independence ("ensure autonomy of humanitarians from political, corporate and other interests") and neutrality ("avoid taking sides in hostilities or engaging at any time in controversies of a political, racial, religious or ideological nature") secure access in highly politicized environments (Gordon & Donini, 2015). In sum, classical humanitarianism treats the symptoms, not causes of suffering, and stands clear of politics (Barnett, 2013).
The meaning of accountability in classical humanitarianism can be best elucidated by referring to the International Committee of the Red Cross's Accountability to Affected People Framework: Proximity is essential to understanding the situation and assessing people's material and protection needs based on their specific vulnerabilities (age, gender, disability, etc.). Staff members' physical presence enables them to develop a dialogue with communities, listen carefully to people's fears and aspirations, give them a voice and establish the human relationships necessary to "ensure respect for the human being," which is a crucial aspect of the Fundamental Principle of humanity….In this sense, proximity is a driver of accountability and a prerequisite of effectiveness and relevance. (International Committee of the Red Cross, 2019) Although the International Committee of the Red Cross takes responsibility for transparent accounting to communities and donors, the accountability of its staff members seems to rely as much, if not more, on internalized humanitarian principles and moral commitments, following a deontological, obligation-bound ethos of alleviating suffering. This approach echoes 'thick accountability,' a concept defined by political scientist Mel Dubnick (2003) as "a substantive set of expectations reflecting one's standing within [the] moral community" (p. 6) of fellow humanitarians. It is a justificatory account to oneself (Pfeffer & Georgiadou, 2019) that goes beyond simple answerability to donors and program participants in the form of, for example, reporting on outputs of a project. Thick accountability is also reflected in the moral obligation of the international community vis-à-vis sovereign states that fail, deliberately or because of a lack of means, to protect their population.
During the 2005 World Summit, the international community accepted a 'responsibility to protect' and declared their preparedness to take timely and decisive action, when national authorities manifestly fail to protect their populations from genocide, war crimes, ethnic cleansing and crimes against humanity (United Nations, 2020). Similarly, the Inter-Agency Standing Committee can decide to initiate a humanitarian system-wide response (Inter-Agency Standing Committee, 2020) in case a disaster caused by a natural hazard surpasses the capacity of a state to respond. In this case, the sovereign state has to ask for and agree to this international support.

New Humanitarianism and Public Accountability
The Agenda for Humanity defines 'working differently' as a core responsibility to end need. This requires the reinforcement of local systems, the anticipation of crises rather than waiting for them to happen (hereafter, anticipatory action), and the transcendence of the humanitarian-development divide (Agenda for Humanity, 2020). Also, the Sendai Framework for DRR (United Nations Office for Disaster Risk Reduction, 2020): Transcends traditional dichotomies between development and humanitarian relief or developed and developing countries or conflict/fragile and peace situations. Indeed, every single investment and measure, whether for development or relief, can reduce disaster risk or increase it depending on whether it is risk-informed. (pp. 6-7) New humanitarianism rejects the principle of neutrality and includes more politicized activities beyond relief assistance, such as improving the welfare of vulnerable populations and strengthening state institutions, integrating human rights and peacebuilding into the humanitarian orbit (Fox, 2001).
Thus, new humanitarianism "changes the focus on the humanitarian act-characterized as the charitable impulses of the giver or their compliance with humanitarian principles-to the rights of an empowered beneficiary seeking to realize rights to which s/he was entitled" (Gordon & Donini, 2015, p. 87). DRR is new humanitarianism at its most politically expressive. It requires proactively 'inducing political will' with unprecedented levels of 'public accountability' (Olson, Sarmiento, & Hoberman, 2011). This paradigm forces "DRR onto political and policy agendas at all relevant levels and across all relevant sectors and provides a combination of spotlight and microscope on development/redevelopment proposals or actions that have hazard-and therefore risk-implications" (p. 60). Olson et al. (2011, pp. 60-61), drawing from Ackerman's (2005) and Bovens' (2007) accountability theory, define public accountability in the context of disaster risk management and a (politicized) new humanitarianism as: A relationship between an actor and a forum, in which (a) the actor has an obligation to explain and justify his or her plans of action and/or conduct, (b) the forum may pose questions, require more information, solicit other views, and pass judgement, and (c) the actor may see positive or negative formal and/or informal consequences as a result.
The key concepts-actor, forum and consequences-in the accountability relationship are imbued with new meanings in digital humanitarianism.

Digital Humanitarianism and Algorithmic Accountability
Digital humanitarianism goes beyond the evolutionary use of ICT for new humanitarianism in a number of ways. First, individuals contribute remotely to humanitarian workers in the field via the OpenStreetMap ecosystem to support vulnerable people and their livelihoods, while global experts leverage satellite remote sensing, Unmanned Aerial Vehicles and geo-intelligence algorithms to identify complex geospatial patterns on the ground. Second, digital humanitarianism evolved into humanitarian activism in 2014 with the Missing Maps project, which mobilizes both remote digital and local volunteers to trace satellite images of disaster-prone areas (Givoni, 2016) during and between disasters and put vulnerable communities on the map. Third, humanitarian organizations and governments are now building digital capacity to deal with satellite and drone imagery, mobile services, social media, and online communities and social networks (van den Homberg & Neef, 2015). For example, 510, an initiative of the Netherlands Red Cross, has been supporting the creation of local data capacity and provision of remote data services to over 30 Red Cross National Societies in the global South since 2016. Similarly, the United Nations Office for the Coordination of Humanitarian Affairs' Centre for Humanitarian Data assists humanitarian partners and the Office's staff in the field. Fourth, the digital turn signaled the dynamic entry of private entrepreneurs and corporate philanthropists in the humanitarian space, an excellent branding and public relations opportunity with further potential benefits, such as increased visibility, access to new markets, access to data, and opportunities to pilot new technologies (Madianou, 2019).
While digital humanitarian actors often present their initiatives as 'neutral,' as a means to an end that will make humanitarian aid faster and more cost-effective, digital humanitarianism has constitutive effects and an agentic capacity to change the social order (Jacobsen & Fast, 2019). It may marginalize the contextual expertise of national and local staff (because they lack the capacity to datafy their expertise) and privilege the technical expertise of outsiders (Jacobsen & Fast, 2019). Mulder, Ferguson, Groenewegen, Boersma, and Wolbers (2016) showed that during the Nepal earthquake, the crowdsourced crisis data replicated existing inequalities (e.g., due to lack of digital literacy and access), creating maps that reflect the density of people able to participate online, rather than the severity of needs. Digital humanitarianism might also blur care and control. Think of cash transfers, resulting in faster, more secure, and more dignified aid (care) but also giving access to vast amounts of data to actors with non-humanitarian intentions (control; Jacobsen & Fast, 2019). The entry of new digital actors, and of fora to hold them accountable for the consequences of deploying algorithmic socio-technical systems, reframes accountability as 'algorithmic,' a relationship where: Multiple actors (e.g., decision makers, developers, users) have the obligation to explain and justify their use, design, and/or decisions of/concerning the system and the subsequent effects of that conduct. As different kinds of actors are in play during the life of the system, they may be held to account by various types of fora (e.g., internal/external to the organization, formal/informal), either for particular aspects of the system (i.e., a modular account) or for the entirety of the system (i.e., an integral account). (Wieringa, 2020, p. 10)

While Wieringa firmly embeds 'algorithmic accountability' within accountability theory (Bovens, 2007), she draws from non-emergency contexts in the global North to ground it empirically. An example is the Dutch risk profiling system (Systeem Risico Indicatie, or SyRI) used by Dutch municipalities to assess which welfare beneficiaries are more likely to commit fraud in social security and income-dependent schemes. In 2019, a coalition of civil society organizations, including the Dutch Platform for the Protection of Civil Rights, the Netherlands Committee of Jurists for Human Rights, and Privacy First, united under the name Suspect by Default and sued the Dutch government for violating the human rights and data protection of the vulnerable people SyRI mostly targeted. According to the coalition: The application of SyRI constitutes a dragnet, untargeted approach in which personal data are collected for investigation purposes….SyRI is a digital tracking system with which citizens are categorized in risk profiles and in the context of which the State uses 'deep learning' and data mining. (Dutch Trade Federation v. The State of The Netherlands, 2020) The Court banned SyRI in February 2020 for breaching the European Convention on Human Rights. The Court drew attention to the actual risk of discrimination and stigmatization resulting from the socioeconomic status and possibly migration background of citizens in disadvantaged urban areas where SyRI was deployed. The SyRI case illustrates the workings of legal accountability, the most unambiguous type of public accountability: a legal forum, the Hague District Court, scrutinizes the conduct of the accountable actor, the Dutch government, i.e., the compliance of the SyRI legislation with Article 8, paragraph 2, of the European Convention on Human Rights (Council of Europe, 2020).
Emergency contexts complicate algorithmic accountability, especially when human rights or data protection legislation is absent or weakly enforced. As Sandvik et al. (2017) argue, largely untested and non-consented humanitarian interventions are deployed "because something has to be done" (p. 328), lesser standards are employed in analyzing the need and evaluating the effectiveness of an intervention, while the power asymmetry between humanitarian actors and subjects is radically increased. With humanitarian organizations now experimenting with novel artificial geo-intelligence (machine learning algorithms automatically creating maps of, e.g., buildings and their construction materials, or identifying intricate patterns across physical, environmental, and socioeconomic geospatial data), the speed and scalability, but also the complexity and abstraction, with which a community and its territory are scrutinized can increase dramatically. In humanitarian contexts in the global South, the accountable 'actor' is more complicated than in the Dutch example; in addition to the humanitarian organization, the 'actor' comprises commercial geospatial and mobile phone companies, self-organizing voluntary networks of digital humanitarians, universities, and international space agencies, while the 'forum' may lack the muscle of a coalition of civil society organizations to hold the 'actor' to account. The case in the next section illuminates the new dynamic of artificial geo-intelligence in humanitarian action in the Philippines.

Forecast-Based Financing and Trigger Development
Traditionally, disaster governance has focused on emergency response, reconstruction, and rehabilitation for large-scale disaster events (Kellett & Caravani, 2013). However, studies have shown that it is more cost-effective to invest in early or anticipatory action (Pichon, 2019) to reduce disaster risk (Mechler, 2005; Rai, van den Homberg, Ghimire, & McQuistan, 2020).
In 2008, the Red Cross Red Crescent movement introduced Forecast-based Financing (FbF) for early action and preparedness for response. FbF enables access to the so-called Disaster Response Emergency Fund, a funding source habitually only available for humanitarian response, via an Early Action Protocol (EAP). The EAP is triggered (Red Cross, 2018) when an impact-based forecast, i.e., the expected (humanitarian) impact as a result of the expected weather, reaches a predefined danger level. An EAP outlines the potential high-risk areas where the FbF mechanism could be activated, the prioritized risks to be tackled by early actions, the number of households to be reached against an expected activation budget, the forecast sources of information, the expected lead time for activation, and the agencies responsible for implementation and coordination. The first FbF pilots were deployed in 2013 in Togo, using a self-learning algorithm for flood forecasting, and in Uganda (Coughlan de Perez et al., 2015), where text mining of online newspapers provided the impact data required for calibrating triggers. Since the first EAP in 2018, eight EAPs for sudden-onset disasters have been established and approved.
FbF is an instructive case for exploring the relation between digital humanitarianism and accountability, since big data and artificial intelligence are instrumental for trigger development. The first step of trigger development (Red Cross, 2018) is the creation of a risk and impact database with a high spatial and temporal resolution. This is done using techniques such as the acquisition of remotely volunteered geographic information for vulnerability data, object detection on remote sensing imagery for exposure data, automated damage assessments, and text mining on newspapers for impact data. The second step is a weather forecast skill analysis for different hazard forecasting models followed by the actual impact-based modeling. This can be as simple as overlaying the best weather forecast with the risk data. In its most advanced form, statistical modeling (with machine learning) is applied to historical hazard events and their impacts. The triggers based on an artificial intelligence algorithm must, however, not only allow for the timely and well-targeted implementation of actions but also guarantee accountability. We examine this tradeoff for FbF in the Philippines, where the EAP for typhoons was approved in November 2019 and triggered during typhoon Kammuri in December 2019 (Red Cross, 2019). In the following sections, we use the accountability concepts of actor, forum, and consequences to explore the machine learning trigger of FbF in the Philippines.
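In its simplest form, the impact-based modeling described above overlays the best weather forecast with the risk data and compares the result against a predefined danger level. The sketch below illustrates that logic only; the function, the 250 km/h normalization benchmark, and the multiplicative overlay are illustrative assumptions, not the actual FbF trigger implementation.

```python
# Illustrative sketch of a simple impact-based trigger, NOT the actual FbF
# implementation: predicted impact per municipality is the product of forecast
# hazard intensity and a pre-computed risk score, compared to a danger level.

def impact_based_trigger(forecast_wind_kmh, risk_index, danger_level=0.5):
    """Return, per municipality, whether the overlay of the weather forecast
    and the risk data reaches the predefined danger level.

    forecast_wind_kmh : dict mapping municipality -> forecast wind speed (km/h)
    risk_index        : dict mapping municipality -> risk score in [0, 1],
                        combining exposure and vulnerability data
    danger_level      : predefined threshold that would activate the EAP
    """
    triggered = {}
    for muni, wind in forecast_wind_kmh.items():
        # Normalize wind speed against a severe-typhoon benchmark (assumption).
        hazard = min(wind / 250.0, 1.0)
        predicted_impact = hazard * risk_index.get(muni, 0.0)
        triggered[muni] = predicted_impact >= danger_level
    return triggered

forecast = {"Tacloban": 220, "Baguio": 90}
risk = {"Tacloban": 0.8, "Baguio": 0.6}
# Tacloban: (220/250) * 0.8 = 0.704 >= 0.5 -> True; Baguio: 0.36 * 0.6 = 0.216 -> False
activation = impact_based_trigger(forecast, risk)
```

In a real deployment the hazard term would come from a forecast model and the risk score from the risk and impact database described above; the point here is only the overlay-and-threshold structure.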

Identifying the Actors
Machine learning algorithms are not solely technical objects but part of socio-technical systems and must be scrutinized from legal, technological, cultural, political, and social perspectives. It is precisely this "rich set of algorithmic 'multiples' that can enhance accountability rather than limit it" (Wieringa, 2020, p. 2). The machine learning algorithm is part of a more extensive socio-technical system, typical of DRR, and requires multiple stakeholders to realize substantive achievements (Olson et al., 2011). FbF traverses different phases comparable to the software development cycle of planning, analysis, design, implementation, testing/integration, and maintenance (Wieringa, 2020). In the Philippines, FbF is in the implementation phase; it is neither fully integrated yet into the Philippine Red Cross Operations Center nor adopted by the government. The constellation of actors will, however, evolve as FbF moves into the testing/integration and maintenance phases. FbF in the Philippines started with an extensive stakeholder mapping exercise and the establishment of three working groups: trigger, early actions, and financing. The trigger or Technical Working Group brings together members of national government agencies responsible for hazard forecasting, emergency preparedness, and response, as well as the United Nations and INGOs: the Office of Civil Defense, the Department of Interior and Local Government, the Philippine Atmospheric, Geophysical and Astronomical Services Administration, the Department of Social Welfare and Development, the Department of Agriculture, the Commission on Audit, the Food and Agriculture Organization, Care International, Oxfam, WFP, the START Network, and the Philippine and German Red Cross. Some of these organizations are also working on anticipatory action, for example the Food and Agriculture Organization for droughts and Oxfam for typhoons.
510 was not part of the Technical Working Group but contributed via the German Red Cross, their contractor, to the development of the algorithm.
The algorithm classifies municipalities into two groups: those with more than or less than 10% of houses completely destroyed (Wagenaar et al., 2020). The algorithm is trained on 27 historical typhoons in the Philippines. For each typhoon, the predictand is the number of completely damaged houses. The approximately 40 predictors include hazard (typhoon wind speed, track, and rainfall), exposure (population density, number of households), topography and geomorphology (slope, ruggedness, elevation), and vulnerability features (roof material, wall material, percentage of population below 5 and above 60 years old, poverty index). The vulnerability and exposure features are considered to be the same for all typhoons, while the hazard features are specific to each event. Data sources are mostly national organizations such as the Philippines National Census, the National DRR Management Council, and the Nationwide Operational Assessment of Hazards. For a few features, data from international sources, such as NASA or the Japan Meteorological Agency, are used. It is essential to have data on the predictors and the predictand with national spatial coverage and at the same administrative levels. The municipality level is selected as the smallest geographic level because all data is available at this resolution. The subsequent selection of program participants within a municipality is done by local stakeholders (a barangay validation committee), both prior to and within the lead time. This means that FbF in the Philippines is partly a human-out-of-the-loop (selection of municipalities) and partly a human-in-the-loop process (selection of program participants).
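As a toy illustration of such a binary damage classifier, the sketch below scores a municipality with a hand-set logistic model over three of the predictor types named above (one hazard feature and two vulnerability features). The feature set, weights, bias, and threshold are invented for illustration and bear no relation to the model actually trained on the 27 historical typhoons.

```python
import math

# Illustrative binary classifier in the spirit of the trigger model described
# above: predicts whether a municipality will have >= 10% of its houses
# completely destroyed. All weights are invented assumptions, not learned.
WEIGHTS = {
    "wind_speed_kmh": 0.03,   # hazard feature (event-specific)
    "poverty_index": 1.5,     # vulnerability feature (static)
    "light_roof_share": 2.0,  # vulnerability feature (static)
}
BIAS = -8.0

def prob_severe_damage(features):
    """Logistic model: probability that >= 10% of houses are destroyed."""
    z = BIAS + sum(WEIGHTS[k] * features[k] for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def classify(features, threshold=0.5):
    """Binary damage class for one municipality."""
    return prob_severe_damage(features) >= threshold

# Hypothetical feature vectors for two municipalities during one typhoon.
strong = {"wind_speed_kmh": 250, "poverty_index": 0.4, "light_roof_share": 0.6}
weak = {"wind_speed_kmh": 80, "poverty_index": 0.2, "light_roof_share": 0.3}
```

The actual model uses statistical learning over the full set of roughly 40 predictors; this sketch only shows how static vulnerability features and event-specific hazard features combine into a per-municipality damage class.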
The actors primarily accountable for the design decisions embedded in the algorithm are the developers of 510. However, critical decisions were taken together with the German and Philippine Red Cross and, to a lesser extent, the Technical Working Group. Such design decisions may affect the outcome of the FbF mechanism. In terms of predictors, specific vulnerability indicators could not be included due to a lack of data at the municipality level. In some cases, proxies were included, e.g., data on households occupying a rent-free plot as a proxy for informal settlements. In other instances, choices were data-driven, for example, by analyzing which weather forecasting models have the best forecast skill. Several performance metrics were used to select the best machine learning model, and choices were made between them. For instance, a model that predicts damage when there is none (false positives) was preferred over a model that misses cases with damage (false negatives). The German and Philippine Red Cross practitioners also did a reality check on the predictions of the machine learning models based on their field experience and historical knowledge, which in many cases led to further refinements of the machine learning model. Bierens, Boersma, and van den Homberg (2020) elaborate on how legitimacy, accountability, and ownership influenced the implementation of the model, drawing on focus group discussions in the Philippines. Although 510 organized missions to assess the requirements for the machine learning model and held co-design sessions, the Philippine Red Cross has not yet fully adopted the machine learning model because of limited digital data and capacity within their organization and the sporadic involvement of local actors in model development (Bierens et al., 2020).
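The metric preference described above, tolerating false positives (acting in vain) in order to avoid false negatives (missing real damage), amounts to selecting models by recall rather than precision. The confusion counts below are hypothetical, not results from the actual Philippine model:

```python
def recall(tp, fn):
    """Share of actual damage cases the model catches; false negatives hurt."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def precision(tp, fp):
    """Share of damage predictions that were correct; false positives hurt."""
    return tp / (tp + fp) if (tp + fp) else 0.0

# Hypothetical confusion counts for two candidate models on held-out typhoons:
# tp = true positives, fp = false positives, fn = false negatives.
model_a = {"tp": 18, "fp": 9, "fn": 2}   # over-predicts damage (acts in vain)
model_b = {"tp": 14, "fp": 2, "fn": 6}   # misses more real damage cases

# Under the preference described above, select by recall rather than precision:
# model_a wins on recall (0.9 vs. 0.7) even though model_b wins on precision.
best = max([model_a, model_b], key=lambda m: recall(m["tp"], m["fn"]))
```

Selecting by precision would have reversed the choice, which is exactly the design decision the practitioners had to make explicit.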

The Forum and Accountability Consequences
The forum, or rather the multiple fora, refers to the audiences to which the actors are accountable, either upward, horizontally, or downward, while accountability can also be ex ante, in media res, or ex post the disaster event (Wieringa, 2020).
The algorithm developers of 510 are horizontally accountable. The 510 team extensively and iteratively reviewed the machine learning model regarding technical soundness and responsible data usage (510, 2020) and disclosed it openly on GitHub. 510 voluntarily submitted the model for peer review to the United Nations Office for the Coordination of Humanitarian Affairs' Centre for Humanitarian Data (United Nations Office for the Coordination of Humanitarian Affairs, 2020) and to academic peer reviewers through journal submissions. More importantly, the algorithm was submitted as part of the EAP to the Validation Committee, with members of the International Federation of Red Cross and Red Crescent Societies, the Climate Centre, and National Societies active in FbF. This committee has authoritative power, as it can approve or reject the EAP. Only if an EAP is approved can the trigger model be used to get access to the Disaster Response Emergency Fund. The committee members are well aware of the context in which the machine learning model is applied, and they always critically assess whether less complex models, for example expert-based rules, could be used instead. The Philippines EAP (Philippine Red Cross, 2019) received a few minor change requests before final approval, which is valid for two years, after which the EAP has to be updated and resubmitted for approval. The government of the Philippines is not using the algorithm and, in that sense, currently has no authoritative power. The Philippine Red Cross, as legally stipulated in a Republic Act (Official Gazette, 2009), is an auxiliary to the Philippine government in the humanitarian domain. It can disseminate information to communities that will be affected and support them in taking early actions to protect themselves.
The users of the algorithm are horizontally, upward, and downward accountable for their 'conduct' in media res and ex post. The German and Philippine Red Cross are horizontally accountable to the Validation Committee, which requests the submission of a revised EAP that integrates all the lessons learned throughout the activation. This revision includes an evaluation of how well the trigger functioned. In terms of the early actions, if an EAP is activated and the disaster event does not materialize, the National Society will not have to return the funds to the International Federation of Red Cross and Red Crescent Societies. Within the FbF system, it is recognized that there may be times when the trigger is reached and early actions implemented, but the disaster does not occur; FbF acts under a 'no regret' principle. Moreover, EAPs with more than three days lead time should include a stop mechanism to avoid taking additional actions if the forecast changes and no further actions are required. Downward accountability towards affected populations is notoriously difficult for anticipatory systems (Sufri, Dwirahmadi, Phung, & Rutherford, 2020). During the EAP creation (ex ante), there was no explicit downward accountability but rather human-centered design. The identification and prioritization of the early actions are done via an intensive process of leveling workshops, focus group discussions, key informant interviews, and simulations. An EAP contains an analysis of the consequences for the affected population of acting in vain, whereby early actions that are still beneficial for the population in case of a false alarm are prioritized. In addition to the co-creation of the early actions, 510 organized human-centered design sessions with the potential algorithm users.
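The stop mechanism mentioned above, required for EAPs with more than three days lead time, can be thought of as a simple check run on each forecast update; the function name, arguments, and threshold logic below are assumptions for illustration, not the documented FbF procedure.

```python
def should_stop(updated_impact_forecast, danger_level, lead_time_days):
    """Stop-mechanism sketch: for EAPs with more than three days lead time,
    halt additional early actions when the updated impact-based forecast
    falls back below the predefined danger level. Illustrative assumption,
    not the documented FbF procedure."""
    if lead_time_days <= 3:
        # Short lead time: the EAP is not required to include a stop mechanism.
        return False
    return updated_impact_forecast < danger_level

# Forecast weakened after activation: halt additional early actions.
stop_now = should_stop(0.3, danger_level=0.5, lead_time_days=5)
# Forecast still at or above the danger level: continue the early actions.
keep_going = not should_stop(0.7, danger_level=0.5, lead_time_days=5)
```

Under the 'no regret' principle, stopping only prevents additional actions; funds already spent on actions taken before the forecast changed are not reclaimed.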
The donors of FbF, such as the German Federal Foreign Office in the Philippines, request monitoring and evaluation (Gros et al., 2019) of FbF pilots and EAP activations. This is usually done by monitoring and evaluation officers of the implementing organization as well as by external consultants for the final evaluation. Monitoring and evaluation consists of participatory methods to obtain feedback from communities and local organizations on the project. Monitoring and evaluation therefore represents not only horizontal (within the organization, by the monitoring and evaluation officer) and upward (towards the donor) but also downward accountability. Overall, existing evidence indicates that the effects of anticipatory action at the household level are mainly positive. Prospective affected people, for instance, experience less psychosocial stress when the hazard hits and less loss of their means of livelihood. However, a recent WFP study on the evidence base of anticipatory action (Weingärtner, Pforr, & Wilkinson, 2020) points out that not all expected benefits are observed in all cases, and findings should be considered in relation to context and the kind of action that was taken. Given that anticipatory action is still mainly in its piloting phase and not yet scaled up, the range of counterfactuals and direct feedback from affected populations is limited. Although acting early can be better than doing nothing, it is less clear whether it is also better than doing other things at different points in time.
In some cases, the affected population raises its voice. The only concrete example known to the authors is the post-typhoon Haiyan evaluations, which found that the Philippine Atmospheric, Geophysical and Astronomical Services Administration and the National DRR Management Council did not explain clearly enough what the impact of the storm surge would mean for the people in Tacloban (WMO, 2014). In addition to ensuring that affected populations understand the warnings, it is crucial to assess how triggers are understood and acted upon by decision-makers. In the Philippines, impact-based forecast maps sent 72 hours before the typhoon made landfall were interpreted as exact forecasts, even though an accompanying text explained the uncertainty of the typhoon forecast data fed into the machine learning model and the performance metrics of the artificial intelligence model.

Discussion
The face of accountability has changed in humanitarianism. Classical humanitarianism relies largely on humanitarians' obligation-bound ethos, with little account-giving to a forum beyond the suffering human person, "all of her or him" (Slim, 2015, p. 49). New humanitarianism privileges both upward and downward accountability, coupled with a demand for more power symmetry between affected and responding communities. Digital humanitarianism, a phenomenon driven by technological solutionism (the belief that digital technologies can solve societal problems), is fraught with risks (Morozov, 2013). For example, Madianou, Ong, Longboan, and Cornelio (2016) showed that digitized feedback mechanisms sustained humanitarianism's power asymmetries rather than improving accountability to affected people.
Our case illustrates that artificial intelligence for anticipatory action is part of a wider socio-technical system with multiple actors, fora, and consequences.
In addition to traditional actors, highly specialized global data experts are moving into the humanitarian space. Because our case concerns an artificial intelligence innovation that is still scaling up from testing to full adoption, first within the Red Cross and possibly later within the government, its accountability mechanisms still need further development. Our first exploration suggests a 'many hands' problem (Thompson, 1980), necessitating more precise distinctions between forum and actor. Algorithm developers may be individually accountable if they are not shielded from an audit by their organizations, though developer team leaders are hierarchically accountable within their organization (Bovens, 2007). Organizations involved in the machine learning model development may be corporately accountable due to their influence on the design specifications. Kemper and Kolkman (2019) argue that it is imperative that the various fora critically understand the subject matter to effectively demand account from the actors. The field of explainable artificial intelligence attempts to develop transparent algorithms that shed light on the inner workings of algorithmic models and/or explain model outcomes (Adadi & Berrada, 2018). Unfortunately, there is a mismatch between the methods developers choose to explain algorithmic outputs and research from the social sciences on how humans generally offer and understand explanations (Miller, 2019). This emphasizes that, in the case of artificial intelligence and anticipatory humanitarianism, individuals and Technical Working Groups involved in the development of these systems must take a proactive role in discussing design decisions and results with users.
Accountability consequences directly relate to what can go wrong if a machine learning algorithm is used. For example, if the machine learning algorithm is biased, the early actions implemented based on the trigger will not reach the right program participants (the risk of what can go wrong), and the forum (the donor, the program participants) might decide to withdraw financial support and trust, respectively, from the actor (the consequence). False triggers could significantly reduce the trust of communities in the Red Cross and generate reluctance to act upon an early warning. The Red Cross Red Crescent Movement is building an overview of what can go wrong in the fictitious setting of Madeupsville, as a starting point for discussions, while avoiding finger-pointing (IFRC, 2020). We note that the early actions are tested in real-life simulation exercises, and these exercises do not rely on the use of any kind of modeling or artificial intelligence. For example, in the Philippines case, shelter strengthening, cash for work (for early harvesting of abaca trees), and livestock evacuation were all tested before activation. Government agencies are reluctant to move towards FbF, as the risks of what can go wrong will trigger public accountability. In the case of the Philippines, local government units can use their Quick Response Funds for disaster response only once a disaster has already happened, not based on a forecast. However, the policy document Memorandum 60: Revised Guidelines for the Declaration of a State of Calamity (NDRRMC, 2019) states that local government units can use their Quick Response Funds in response to a forecast if they can predict that at least 15% of their population will be affected (Bierens et al., 2020). This policy is not yet operationalized, but once its implementing rules are clarified, the Quick Response Funds can be used for forecast-based responses.
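To make the Memorandum 60 criterion concrete, the rule can be read as a simple forecast-based trigger: funds may be released when the predicted share of affected population reaches 15%. The sketch below is purely illustrative; the function name and example figures are our own, and only the 15% threshold comes from the policy document.

```python
# Hypothetical sketch of the Memorandum 60 trigger rule: a local government
# unit may use its Quick Response Funds in response to a forecast if at least
# 15% of its population is predicted to be affected. Names and example
# numbers are illustrative, not part of the policy.

QRF_THRESHOLD = 0.15  # predicted affected share required by Memorandum 60

def quick_response_funds_released(predicted_affected: int, population: int) -> bool:
    """Return True if the forecast predicts >= 15% of the population affected."""
    if population <= 0:
        raise ValueError("population must be positive")
    return predicted_affected / population >= QRF_THRESHOLD

# A municipality of 80,000 with 14,000 predicted affected (17.5%) would qualify;
# one with 5,000 predicted affected (6.25%) would not.
print(quick_response_funds_released(14_000, 80_000))  # True
print(quick_response_funds_released(5_000, 80_000))   # False
```

In practice, of course, the open questions noted above (who produces the forecast, and how its uncertainty is handled) matter far more than the arithmetic of the threshold itself.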
How, and by whom, the forecast is to be produced has not yet been specified, but an ad hoc governmental committee has been formed to develop guidelines. Government agencies such as the National Meteorological and Hydrological Services face significant barriers before they can transition from weather forecasting to impact-based forecasting, as this requires an extended mandate with corresponding funding, considerable organizational transformation to enable collaboration with other governmental agencies, and expertise beyond atmospheric sciences (WMO, 2015).
The socio-technical system evolves over time as the anticipatory approach of FbF is scaled up. Outside actors might initially catalyze the use of anticipatory action before national actors start to adopt the approach. Accountability mechanisms must evolve accordingly. Apart from scaling in terms of actors, algorithms will also become increasingly granular once more detailed data becomes available. Currently, the machine learning algorithm for the Philippines works only at the municipality level, but it may work at the barangay level in the near future and eventually even at the household level. Early actions in the form of cash transfers via mobile phones already require privacy-sensitive data. Scholars (Taylor, Floridi, & van der Sloot, 2017) focusing on violations of individual and group privacy have already signaled how challenging it can be to uphold the humanitarian principles when human and artificial geo-intelligence is used at this granular level for humanitarian action. Digital humanitarianism runs the risk of excluding vulnerable groups from algorithms as they do not have a digital footprint, and hence no data on them is available. These digitally illiterate groups will not be aware of being excluded, and are, therefore, unable to act as a forum holding artificial intelligence developers to account.

Future Research and Recommendations
Our article attempts to ground the concept of accountability in humanitarianism within accountability theory, first developed by political scientists and later refined for a community of computer scientists in non-emergency contexts in the global North.
As algorithmic accountability is still largely uncharted territory in emergency contexts, several challenging tasks for future research remain. A plethora of global guidelines are emerging regarding fair, accountable, and transparent artificial intelligence (Fjeld, Achten, Hilligoss, Nagy, & Srikumar, 2020), but ensuring the principles of humanity, impartiality, and independence remains elusive. The remoteness of digital humanitarians strips them of a contextual, empathetic understanding of affected individuals and groups and may violate the principle of humanity. Amalgamating disparate data sets into new data products may be weaponized to target religious, ethnic, or mobile groups and endanger impartiality, while the lack of a free press, data protection legislation, vibrant civil society organizations, and enforceable human rights charters weakens the local capacity to audit global humanitarians' geospatial data, tools, and artificial intelligence algorithms.
Contextualizing algorithms is essential. First, an expert-based approach might be a better fit for a data-poor context than an artificial intelligence approach, and these two approaches should always be benchmarked against one another. Second, continuously retraining the artificial intelligence model with emerging impact and vulnerability data better reflects the dynamics of this risk dimension, but requires new data governance approaches to ensure data sharing is facilitated between actors with different mandates and incentives (van den Homberg & Susha, 2018).
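The benchmarking we recommend can be sketched very simply: score an expert-based trigger rule and a machine learning style score on the same historical events and compare standard forecast-verification metrics such as hit rate and false-alarm ratio. All data, thresholds, and function names below are invented for illustration; they are not the actual 510 model or its verification procedure.

```python
# Minimal benchmarking sketch (all data and thresholds invented): compare an
# expert-based trigger rule with a machine-learning-style score on the same
# historical events, using hit rate and false-alarm ratio.

def trigger_metrics(predicted, observed):
    """predicted/observed: per-event booleans (trigger fired / severe impact)."""
    hits = sum(p and o for p, o in zip(predicted, observed))
    false_alarms = sum(p and not o for p, o in zip(predicted, observed))
    misses = sum((not p) and o for p, o in zip(predicted, observed))
    hit_rate = hits / (hits + misses) if hits + misses else float("nan")
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else float("nan")
    return hit_rate, far

# Invented historical records: (wind_speed_kph, model_score, severe_impact_observed)
events = [(220, 0.91, True), (150, 0.35, False), (185, 0.72, True),
          (160, 0.55, False), (240, 0.97, True), (170, 0.40, True)]

expert_rule = [wind >= 180 for wind, _, _ in events]  # expert threshold on wind speed
ml_rule = [score >= 0.5 for _, score, _ in events]    # threshold on model score
observed = [impact for _, _, impact in events]

print("expert:", trigger_metrics(expert_rule, observed))  # (0.75, 0.0)
print("ml:    ", trigger_metrics(ml_rule, observed))      # (0.75, 0.25)
```

On this toy data the expert rule matches the model's hit rate with fewer false alarms, illustrating why a simpler rule can win in a data-poor context and why such comparisons should precede adopting a more complex model.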
Although well-intentioned, digital humanitarianism may exacerbate North-South power relations and exclude vulnerable populations lacking a digital footprint from artificial intelligence analyses in the South. Symmetric North-South collaborations, local ownership, and effective communication of algorithm uncertainty to designers and users of trigger mechanisms need to be developed. Last but not least, problematizing and possibly expanding Wieringa's (2020) framing of 'algorithmic accountability' for emergency contexts in the global South will require systematic, empirically and theoretically grounded research, especially in anticipatory humanitarian action.