4S 2019 – A Sociotechnical Ethics Analysis of Overwatch’s Endorsement and LFG Systems
Below I’ve copied my presentation notes and embedded images of my slides for a presentation delivered at 4S earlier this evening (Sept. 5, 2019). These thoughts represent the better part of an argument I am developing for a dissertation chapter. They are very rough and I have cited and evidenced VERY, VERY LITTLE. That is because this is a conference presentation, where I don’t feel as obligated to rehearse an exhaustive literature review. That said, I have co-published a more comprehensive review of existing scholarship on digital labor with Ilana Gershon; it is in a forthcoming volume titled Digital Anthropology: Second Edition. That essay definitely informs my thinking here.
For starters, the title of my presentation is embarrassingly long: “The Value of Recognizing Certain Gameplay Behaviors: Efforts to Shape Sociotechnical Ethics with Systematized Qualitative Evaluation Metrics.” Sometimes you draft a title that you hope will capture the thesis of your argument before you know exactly what you want to argue. I think that was the case here. C’est la vie.
Game developers have a problem. That problem is commonly framed as “toxicity,” an ill-defined concept that often operates as a synonym for cyber-bullying or incivility. From the top, toxicity is perceived as a problem because it is often used as justification for low retention and adoption rates with players, especially white women and people of color who publicly talk about their experiences of multiplayer, competitive video gaming environments. ‘Toxicity’ generally affects everyone, but it is often discussed in relation to high-profile or viral accounts of gender-based or race-based harassment. Among the people I play with, the everyday identification of toxicity suggests that the concept is widely appropriated to discuss a wider variety of behaviors and social circumstances that on their own require a particular context to become ‘toxic.’ The identification of toxicity, as with all things, is obviously a reflexive and discursive social practice that happens as part of particular boundary making practices between individuals or within a community. But for that reason, designing with toxicity in mind is a minefield because inclusivity is often defined by maximizing the number of people who are part of your consumer base, regardless of the boundaries individuals or groups might draw between themselves.
Game development companies are not the only ones dealing with a ‘toxicity’ problem, but game development companies are uniquely constrained in the systematic implementation of detoxification or content moderation strategies. Today I want to talk about how one game development company, Blizzard Entertainment, is working to ‘solve’ this problem through system design and why I think their commitment to maintaining a specific kind of consumer relationship with players distorts their perception of the problem space they are working in. I also want to try to apply a sociotechnical ethics framework to think practically about how developers could address some of the more complex, underlying reasons why players act out or act in a ‘toxic’ manner towards others. This is because I think of toxicity as a structural issue for all digital platforms that operate on heteromated labor (Ekbia and Nardi, 2014), as it seems to appear regardless of the goal of human-computer interaction. My goal here is to identify similarities between people who play competitive games online and people who participate as workers in the gig economy, and to develop a grammar for talking about gamer labor that more readily qualifies the value of each person’s participation in a game, regardless of the competitive stakes that might seem to dictate that value. Making these connections is important, I reason, because all of the platforms we could be talking about when we discuss heteromated labor are really toxic. I know you know what I mean by this.
I believe that they are toxic because the normal operation of such platforms relies on end users to perform uncompensated emotional labor for other end users, and that toxicity is often a symptom of platforms failing to (1) mediate a feeling of equitable exchange between end users, and (2) facilitate conflict resolution between end users when they disagree about (a) how a goal or task should be completed, and/or (b) who is best suited to complete a specific task. Users I’ve talked to in my field research generally do not think of these platforms as toxic when these meta-structural goals are met by the platform, although they may still have problems with how a platform manages their time and activity. So I want to be clear that ‘managing’ or addressing toxicity is not the same thing as responding to someone’s general discomfort or criticism of platform design. There’s usually a little something extra in that experience; my hunch is that someone’s toxicity (when someone is toxic) usually has to do with someone else’s fundamentally held values and beliefs about what normative behavior should be through a specific channel of communication or platform for social activity.
Having said that, I realize that there are important differences between gamers and gig economy workers, and that there are meaningful limitations to my study of toxicity by focusing only on platform users because it means I’m not developing a grammar that necessarily anticipates the harm to local communities that materially support platform capitalism.
But this is a conference presentation! So I’m going to try and focus on articulating my argument, and then maybe you can help me explore those limitations. I’m presenting work that reflects over 2 years of ethnographic fieldwork in the Overwatch competitive community. That has mostly involved playing the game on multiple accounts and at different tiered levels of competitive play, talking with people involved in competitive play, participating in organized communities outside of the platform itself on Discord, and trying to keep a pulse on dominant discursive practices on various Overwatch-related subreddits and the official game forum managed by Blizzard employees. I’ve also tried reaching out to Tier 1 esports teams playing in the international competitive league, and establishing a rapport with some of the employees at Blizzard; I have been mostly unsuccessful in this regard, for reasons we can get into later. I am still in the process of putting together a dissertation proposal, so really I’m hoping that my presentation here inspires someone to reach out with practical advice, which is why you’ll find my Twitter handle on every slide here.
I’m going to assume that most of you don’t follow games or play games, especially not competitive multiplayer games. So if I touch on anything that doesn’t quite make sense, please let me know. But also, I hope that my slides will illustrate how systems present themselves and that my analysis will explain how they operate in general.
In the interests of time, let’s jump right in to thinking about the work gamers do for companies like Blizzard. Then we’ll talk about how Blizzard is trying to improve working conditions for players. In my conclusion, I’ll offer an analysis of where I think technologists should focus their efforts in this ongoing, iterative development process.
I have actually been thinking about gamers as workers for a long time, but I have never found existing discussion of ‘playbour’ sufficiently illuminating for my fieldwork. That concept more often relies on the production of something akin to intellectual property to distinguish between hobbyists and consumers. In that intellectual tradition, there’s really no challenge or pushback against the assumption that players are fundamentally not workers, or against the idea that the categories or identities of ‘gamer’ and ‘player’ are legitimately distinct in the right context; you just need to find some way of justifying someone’s activity or participation as leisurely, voluntary, or possibly selfish. I was very inspired when I picked up Hamid Ekbia and Bonnie Nardi’s volume on the concept of heteromation published in 2014.
They coined this term to describe videogames, social media, and crowdsource applications that paradigmatically rely on end users to complete critical tasks for a technological platform to work and serve its intended purpose (2014). It’s a very useful concept for discussing gamers who play competitive multiplayer games because these platforms are designed and regulated around the boundless availability of people who want to play with others. Unlike a first-person shooter like Quake, both Blizzard and their end users rely on a large population of people playing or participating on the platform concurrently, 24 hours a day, 7 days a week. Without this population, the game effectively death spirals and people will abandon it for alternatives that promise a very similar experience. Fortnite and Apex Legends are two such competing alternatives, with players (often assumed to be on the younger side by other people in the community) who often publicly wonder if Overwatch is ‘dead’ or ‘dying’ as a popular game.
Players perform technical and emotional labor for their fellow players. There’s what game scholars like to qualify as ‘mechanical skill,’ which has to do with tasks that involve keystrokes and mouse-clicks. Mechanical skills always leave a digital trace. These traces are often the basis for statistical analysis and quantitative comparisons between players in terms of performance. To oversimplify this a bit, we can think of these tasks as: how fast you can click heads, how much damage you do, or how much healing you distribute across your team. There’s more to say here about the problems with the statistical analysis that gets done and the types of behavior game developers do and do not track. That’s for another day; I can’t focus on it too much now.
Instead, let’s focus on what’s not immediately obvious to players, and sometimes to the developers: emotional labor. For me this concept borrows from Arlie Hochschild’s work on flight attendants in The Managed Heart (originally published 1983) and the expectations women must adhere to in order to do ‘good work’ or receive acknowledgement for work well done. In this book she talks a lot about ‘feeling management’ to describe the form emotional labor takes in this particular service industry, with something like a well-timed smile serving as a practical example of making sure someone else feels welcome, heard, appreciated, or something similar.
In a competitive gaming environment, players are sometimes under intense pressure to manage other players’ emotions, to meet other people’s expectations for strategic action, and to conform to hegemonic norms that change with the creation and completion of every simulated game. Overwatch is a first-person shooter, but unlike many other such titles this game requires teams of 6 to collaborate in winning each match. Winning typically involves executing complex, open-ended tasks. There are different heroes with different abilities that functionally operate as tools in your kit for executing strategic maneuvers and collaborating on eliminating opponents or breaking up enemy formations. No one person can win the game for a team; the power to win matches is distributed across multiple players, and the degree to which it is distributed is mostly irrelevant. What is relevant is that this is the sort of game that typically requires everyone to conform to some sort of normative behavior. That occurs through explicit deliberation among players on a team, or it happens by happenstance; there are some games where no one really communicates at all, and players are still able to collaborate because everyone happens to share expectations for how to strategically accomplish standardized tasks: to create space, to isolate individuals from the enemy group, to focus a high-priority target, or to distribute team resources efficiently to maximize the value of particular hero abilities. There may be some hybrid interaction where players only organize strategy at the beginning of a match, and in other games players will constantly organize and plan throughout each team fight. In those games, players may shout an enemy character’s name over and over and over again to motivate and direct players on the team to a specific task.
Game developers at Blizzard recognize the inherent complexity and difficulty of this logistical problem: how do players efficiently and consistently communicate gameplay strategy, and how do players know whose opinions or beliefs about gameplay strategy should be prioritized over others? Addressing this question is paramount. When communication breaks down, the consequences for many are often more fraught than having lost a match. The psychological trauma of how the game was lost can trigger unpredictable reactions from people who normally perceive themselves as easy-going, chill people. I can speak to this from personal experience, and I have had many conversations with players about this phenomenon. In popular discussions of it, players refer to this psychological state as ‘tilt’. ‘Tilting’ is often discussed as a downward spiral of emotions and the feeling of losing control or having no control in a game. This feeling may prompt players to grief, troll, or otherwise ‘misbehave’ in a social environment, but it is distinct from other experiences because people tend to think it can be controlled and managed. While a toxic player is someone who cannot be controlled, a tilted player is someone who can be rehabilitated. Within the discursive competitive community outside the game, players frequently discuss effective strategies for managing tilt and communicating with particular stereotypes. These discussions reflect a common belief that behavioral psychology and cognitive behavioral therapy techniques (among others) can be effectively applied to manipulate others into cooperating, focusing, communicating, and bonding with other players in the group. Failing to manage tilted players or appropriately respond to their needs is often perceived as the reason a team loses the game, and it can itself be seen as a ‘toxic,’ inappropriate, and reportable act by other teammates.
This inter-player dynamic has particular consequences that I don’t have a lot of time to talk about for white women and people of color specifically, at least on servers or in competitive communities in the United States. Delving specifically into these consequences is what I hope my larger dissertation project can meaningfully explain and comment on. But today, I want to talk a little bit about one of the ways Blizzard is trying to address this dynamic between players. I think the implementation of qualitative evaluation metrics at least superficially recognizes the demand and need for this emotional labor on the part of our corporate overlords.
Blizzard implemented the Endorsement System in the summer of 2018, about 2 years after the game formally launched on Windows and console platforms. They launched this feature concurrently with a formal, in-game Looking-for-Group (LFG) system, which was intended to put the Endorsement System to practical use. In summary, the Endorsement System was designed to gamify and metrify classes of behavior that developers and players have a hard time accounting for. Whereas existing systems are well tuned to identify, monitor, and compare ‘mechanical skills,’ ‘soft skills’ like communication, leadership, emotional support, logistical planning, and care work do not leave obvious digital traces that algorithms can automagically track and surveil. The Endorsement System was designed to ‘empower’ users to track that sort of information about players on their team. That data about a player is anonymized, aggregated, simplified, and filtered to produce a single number: your Endorsement Level.
Your endorsement level doesn’t directly affect your experience in the game, necessarily. Participating in the system is compulsory, in that everyone starts out at level 1 by default, but it is not designed to be directly punitive if people choose not to endorse you. Instead, the system rewards direct participation, high-level achievement, and high-level maintenance. In combination with the LFG system, players can sort themselves into selective groups on the basis of endorsement level. In theory, this affords LFG participants the ability to filter out all sorts of deviant people: people who don’t earn endorsements because either they don’t have adequate soft skills or they don’t communicate, or new accounts often operated by people who aren’t experienced in meta-level analysis and strategy in the game.
This system was widely heralded as a success in the months that immediately followed its implementation. At a game developer conference held the following March in 2019 (approximately 9 months later), a researcher at Blizzard attributed a “40 percent decrease in matches that resulted in disruptive behavior” (which several outlets reported as a “40 percent decrease in toxicity”) to the implementation of the Endorsement and LFG systems. The argument here is that these systems contribute to the development, identification, and maintenance of consistent social norms around how to play the game. There are a few things we could say here about what this number represents or means, but let’s take it at face value that the only forms of toxicity Blizzard needs to worry about are the ones people think to report. I don’t think that Blizzard is wrong here; but, in practice, the system does not always work as intended, and it has not necessarily evolved to keep pace with changing norms and expectations held by more senior players.
The game is constantly changing as new maps, heroes, and modes are routinely introduced. The development team tweaks hero abilities about once a month. This creates instability and plurality in what people perceive as appropriate meta strategy in the game (which heroes are viable, how to play around obstacles on a map, when to use abilities, etc). The Endorsement System obscures the routine, necessary discursive practices that flex a person’s communication skills and analytical thinking. I’m not just saying that the number you’re given is inaccurate, I’m saying that the number is not adequately representative of a person’s in-game education or soft-skill ability.
That said, the number is not ‘accurate’ in an empirical sense, either. It fluctuates over time according to a black-boxed algorithmic process that calculates the value according to some equation that accounts for the number of games you’ve played, the time you’ve played, the types of endorsements you’ve received, and the number of endorsements you’ve received. Eventually, if you play long enough your number will go up by virtue of playing a lot. It’s a statistical inevitability. This is in part because the number is cumulative and I think you’re only punished when you leave matches early (so folklore about the system goes), and in part because other players are differently motivated to participate in the system. Lower-level accounts are implicitly motivated to pass out endorsements carelessly or without real consideration for the cheap experience points. I have several stories in my back pocket that I don’t have time to share that entail receiving completely unjustified endorsements from other players. And many more of people using the endorsement system to throw shade or low-key troll opponents.
And in practice, experiences of toxicity are often completely outside the scope of what the Endorsement and LFG systems can reasonably mediate. This is in part because toxicity is not fundamentally a problem of ‘bad apples’ or even intentional maliciousness. Toxicity is not a consequence of inadequate rewards. Although the perceived success of the Endorsement/LFG systems suggests that people’s behaviors do change when incentivized in particular ways, I argue that the apparent failures of this system also suggest that there are better ways to conceptualize and theorize what’s going on when people tilt.
My proposal here is two-fold:
First, I think it is imperative to think about toxicity as epiphenomenal of information systems. As part of a collaboration with my colleagues at Indiana University — Tristan Gohring, Javon Goard, and Lucas Kempe-Cook — we argue that toxic behavior is a consequence of boundary-making practices more generally—those practices that co-constitute who wins and who loses, who deserves this and who deserves that. Toxicity is not something that can be ‘fought,’ as it is an inevitable consequence of diverse people whose purpose in coming together is to resolve conflict of some kind, and obviously in the case of platform workers, resolving some conflict through and with a specific information communication technology.
By couching toxicity in an understanding of sociotechnical ethics, we argue that we can understand a person’s perception of toxicity or another toxic person in terms of their experience and familiarity with context-specific norms and ethics related to using or interacting in a complex system. The range of behaviors we might refer to as ‘toxic’ often other and dehumanize those that deviate from some perceived norm, and that othering process is part of establishing and negotiating normative behavior. It is an essential function in making communities that matter. Players are not ‘toxic’ before they start to communicate and express their beliefs and expectations about proper conduct; rather, players become toxic when people do not share an ethical disposition about what should happen, when some action should occur, or how someone should participate.
My second contribution is to think specifically in terms of toxicity as a consequence of players facing an ethical dilemma and completely disagreeing about (a) how a goal or task should be completed, and/or (b) who is best suited to complete a specific task. This model bears fruit even when thinking about forms of toxicity that are simply instances of identity-based harassment. In the case of gender-based harassment, it is sometimes the case that information about a player’s perceived gender enters into the discursive calculus of how a team can win the game. The sexist ideology or prejudice that women, specifically, are inexperienced or incapable of efficiently playing a specific role reflects a sexist’s assumption about who is best suited to complete damage-related tasks in the execution of an assault-based strategy.
If I’m right, developers at Blizzard need to think more critically about the infrastructure that platforms the resolution of routine ethical dilemmas in a match. I’m personally excited by their recent implementation of a Role Queue system, which I believe works towards this goal explicitly. With this system, it is no longer a debate about who should play what role because the game has taken away the ability for players to make a choice about which hero to play after they’ve learned something about the people on their team. Players don’t waste time on debating the merits of who plays what role in the 30 seconds they have to prepare for their upcoming match. But with that said, I think there’s room for improvement in structuring how teams plan and communicate about strategy and task completion.