Saturday, March 09, 2024

Broadening the notion of affordances

In the design of physical objects and user interfaces, an object’s “affordances” are the ways its appearance suggests to the user how it should be interacted with.

Different door handles have different affordances.


In this post, I want to broaden the meaning of “affordances”.

Multiple tools may all be used for the same kind of task. E.g. for recording textual information, there’s pen and paper, a word processor, a whiteboard, and voice notes.

While the standard notion of affordances in design is “how the object suggests it may be used, before it is used”, the notion we’re describing here concerns how the object, while it is being used, shapes how the user does the task.

A whiteboard suits getting info down fairly quickly, in bullet points, and for drawing arrows between items to show their relationships.

A word processor encourages writing in full sentences. Because we can easily see what we’ve already noted down, it also encourages us to write in sequences of sentences and paragraphs.

Pen and paper seems to sit part way between the free-form nature of the whiteboard and the more regimented form of the word processor.

Voice notes are more focused on the present moment. You can’t see what you’ve already said, and it’s more effort to go back to hear the early part of the note. They’re good for brainstorming.

The traditional notion of affordances covers how the design of an object affects a user’s expectations about how to use that object, before they actually use it. We’re expanding this notion to also include how an object’s design shapes the way it is used by a user.

Friday, January 26, 2024

Interactive storytelling: Fictional realism

This post is about the notion of "fictional realism", which I am using to mean a fictional account that is nonetheless meant to 1) accurately portray the time and place where it is set, and, optionally, to 2) focus more on this portrayal, than on presenting a story to the reader/viewer.

Examples of fictional realism include the TV shows "The Wire" and "The Sopranos", the movie "Casino", the novel "One Day in the Life of Ivan Denisovich", and the game "Attentat 1942" (Steam page). Most of these examples have a fairly strong storytelling focus, so don't fit the optional second criterion.

A work of fictional realism may be intended to convey the same kinds of details that a non-fiction work may convey, but to do so using fictional characters (or fictionalised versions of real people), in fictional situations (or fictionalised versions of real situations).

 

"Interactive storytelling" can be used for fictional realism. One of my interests is in using interactive storytelling to explore a strong form of fictional realism, one that meets the second criterion described above (focusing more on the portrayal of what life was like than on presenting a story to the reader/viewer) and that presents what it'd be like to be a specific character in a specific situation. The aim is for the player to learn about the player character: what they do, how they do it, and how they react to situations. Take a worker on a sailing ship in the 17th century spice trade. What was their work like? How did they perceive their job (exciting? a journey of exploration?). What were their relationships with the various other sorts of people on the ship?

Interaction could help place the player in the character's shoes, to help immerse them in the character's world. I'd like to use interaction to let the player experience what it's like to be that character. There is the dictum 'show, don't tell'. I want 'experience, don't show or tell'.

I have some ideas about how the interactive storytelling could work so as to achieve this, though I won't get into such details in this post.

Fictional realism can be used in an educational context. Or be an enriching kind of entertainment. By giving the player interactivity, and letting them experience what it's like to be that character, we hope we can make a compelling way to experience fictional realism.

 

I think that behavior-psychology congruence is a core requirement for fictional realism, and I'll explore this in a future post. In brief, I want the player to control the character such that the character acts in a realistic way.

 

I've also written about the notion of 'strong storytelling', which we can think of as effective or good storytelling. Strong storytelling has a strong focus on plot, and on moving the plot forwards. Thus it will tend to cut out details that aren't relevant to the plot, including the sorts of details I'm interested in, in fictional realism: the 'day in the life' details.

Compared to strong storytelling, fictional realism is more like real-life. Real-life tends not to be like a story. In stories, all the details are there to serve the overall goals of the story, like its climax, conclusion, and themes. In real life, things happen, but it's just one thing after the other, and they aren't there to result in some climax and conclusion.  

There is, however, no reason why a work of fictional realism couldn't have a plot. It could. It's just that the fictional realism details will dilute the story details, thus making it a weaker form of storytelling.

Strong storytelling and fictional realism are just different forms, each with their own pros and cons.

Interactive storytelling: Behavior-psychology congruence

We wish to introduce the notion of a fictional-character's behavior being congruent, or not, with their psychology. This will help us to, in subsequent posts, look at how, in interactive storytelling, the player having control affects the storytelling.

We use 'psychology' to mean two things: the character's makeup and circumstances.

A character's makeup is their nature, their personality, their character, and how they think about situations. Such details are a result of both nature and nurture: how their character is shaped by their life experiences. It includes how their personality might be changed by brain damage or a brain tumor, or how medications they are taking affect their personality.

By a character's circumstances, we mean what has been going on in their life. Perhaps they have had a stressful few weeks at work. Maybe a loved one died a few months ago, and they are going through grief. Or maybe they started a new relationship and they are happy as a result.

I don't think there's a hard-and-fast distinction between a character's makeup and their circumstances. These are just rough categories.   

In real life, a person's behavior is always congruent with their psychology (their makeup and circumstances). In fictional works, we almost always strive to make a character's behavior congruent with their makeup and circumstances, though we may fail to achieve this. So there can be a lack of congruence.

Behavior-psychology congruence doesn't just apply to "realistic" characters. It applies to all characters, even wacky and "out there" cartoon characters. Wile E. Coyote from the Warner Brothers cartoons wants to catch the Roadrunner, and sets up traps for this purpose. Despite many failures, he's never one to give up trying. The Roadrunner, in turn, likes running fast along roads, and seems to take joy in making Wile E.'s traps backfire on him. These characters are not realistic; they're not at all like real coyotes and roadrunners. But Wile E.'s psychology is to want to capture the Roadrunner, and to set up traps to do so, and so Wile E.'s behavior is congruent with that.

If a character has a cartoony makeup then their behavior should be cartoony as well, and good writers will make sure their behavior is congruent with their makeup and circumstances.

If there were scenes where Wile E. was sincerely explaining to other characters that he has been vegan since he became an adult, because he believes no animals should be harmed, then this would not be congruent with his established psychology.

 

In fiction, a character's behavior may be incongruent with their psychology because of poor writing, poor acting, or poor directing. We can imagine a very inexperienced writer setting out to write a novel. Earlier in their draft they gave the main character a gentle personality, whereas later on in the draft they gave them an aggressive personality, without realising this change had happened. This leads to inconsistencies in how that character is portrayed, with no explanation in the novel of why the character is different.

We may have a philosophical objection to this talk of incongruence between a character's psychology and their behavior. If all we as viewers or readers see is the character's behavior, and we infer their personality from that, then it would seem to be impossible for there to be such incongruence. Any apparent incongruence would just seem to be incongruence because we didn't yet know enough about the character's psychology. The philosophical objection is that we can only know behavior, so behavior is what defines our picture of the character's psychology, thus /by definition/ there can never be incongruence between them. Any /apparent/ incongruence is simply because we have formed an incorrect picture of their psychology, by jumping to incorrect conclusions about it based on the prior behavior of theirs that we've observed.

If a real person's behavior seems incongruous with their psychology, then it is our understanding of their psychology that is wrong (or incomplete). But here we are not talking about real people, but characters in fiction, fiction that may be written by a beginner, or an untalented, author.

In fiction (novels especially), the author may explicitly describe aspects of a character's personality. This way, that character's behavior can be incongruous with the stated aspects of the character's personality, if the writer is inexperienced or otherwise not very good.

But even when the character's personality is only inferrable from behavior, it is still possible for the two to be incongruous. It can be possible to infer psychological traits from behavior, and so we may have two sets of behavior B1 and B2, which reflect psychological traits P1 and P2 -- and P1 and P2 may conflict. A character may be terrified of speaking in front of their class at one point, and yet later inexplicably be supremely confident speaking in front of a large group. We're not saying it's impossible for there to be such a transformation; we're talking about the case of a story that has not included any details explaining such a transformation. At least one of those two behaviors is therefore incongruous with part of the person's psychology.

 

We usually expect that if the character seems to be acting incongruently with their established psychology, the story will provide an explanation of why. But it may not, and the incongruence may be a result of poor writing (or acting or directing).

 

In subsequent posts, we'll look at how interactivity, in interactive storytelling, can lead to incongruence between a character's behavior and their psychology. The basic idea is that if the player can control a character's actions, then those actions will tend to reflect the player's psychology, not the character's. Using the terminology of those future posts, we'll explain why behavior-psychology congruence is necessary for strong storytelling and fictional realism.

Thursday, January 25, 2024

Interactive storytelling: Environment and storytelling

In a video game, the environment includes the locations the player sees as they traverse the game world. The environment includes the objects they can interact with, such as objects they can pick up and look at, doors they can open, lights they can switch on, etc.

We'll look at the ways a game's environment can contribute to the storytelling in interactive storytelling like video games. Then we'll look at how effectively each of these ways can contribute to storytelling.

 

The setting

The game environment provides the setting for the story. A story about an ad executive living in New York City obviously is set in New York City. A story may lean heavily into its setting, and concern the nature of that setting -- like one that concerns the culture in New York City. Or the story's setting can be more of just a backdrop, where the same story could potentially be set in a number of different places, without changing much about it.

 

The stage necessary for story events

The game's environment may contain details necessary for enabling certain story events. A dense town or city, with suitably small gaps between the rooftops, enables a story event in which the protagonist is chased by several bad guys over rooftops.

 

Atmosphere and world building

An environment of moss and plant-covered ruins could contribute to a post-apocalyptic atmosphere. Posters plastered around a city, instructing the populace on how they should behave, could contribute to the world-building and atmosphere of a story set in a fascist country. Dim lighting, with a yellowish hue, along with strange sounds, could give an alleyway an eerie atmosphere.


Characterisation

The environment can contribute to characterisation. If a character's house is neat and tidy (or very dirty and messy), that will convey something about their personality. So could the paintings we find in their house, or the entries in the diary hidden under their pillow.


The main plot

The environment can contribute to the main plot. The space under a bed may contain a piece of evidence that conclusively shows who's guilty of a murder. A character finding it will be a major plot point.

 

Backstory

The environment can contribute to backstory. Backstory concerns past events, prior to the events the player is currently experiencing in the main story-thread. Those events could have occurred before the start of the main story thread. They could have happened a long time ago. Such distant backstory includes 'lore', that concerns historical details of the setting. Lore may be found in tomes that the player finds, or some runes written on some ruins.

We'll also take 'backstory' to include details that may have only recently happened. For example, halfway through the story the player might receive details about something that had happened an hour prior -- that is, an event that occurred well after the point in time where the story began. We'll call this backstory, too.

Environmental details that contribute to backstory include artefacts like diary entries, letters, memos, and books.

Audio-logs are a means, used in video games, of filling the player in on backstory. Audio-logs are sound recordings that the player can listen to. They may be physical objects, like a tape recorder, that the player can find and play. Such a recording might be a diary entry or voice note. Or an audio-log recording might automatically start playing when it is 'triggered' by the player's actions -- like if the player enters a particular room, or perhaps opens a particular person's locker.
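To make the 'triggered' case a little more concrete, here is a minimal sketch in Python of how a game might decide which audio-logs to start when the player enters a room. It is purely illustrative: the names AudioLog, trigger_room, and on_player_entered, and the example clip names, are my own assumptions and aren't taken from any particular game or engine.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AudioLog:
    """A hypothetical audio-log: a clip name plus the room that triggers it."""
    clip: str
    trigger_room: str
    played: bool = False


def on_player_entered(room: str, logs: list[AudioLog]) -> list[str]:
    """Return the clips that should start playing when the player enters `room`.

    Each log fires at most once, so re-entering a room does not replay it.
    """
    started = []
    for log in logs:
        if log.trigger_room == room and not log.played:
            log.played = True
            started.append(log.clip)
    return started


# Illustrative data only: two logs tied to two made-up rooms.
logs = [
    AudioLog(clip="diary_entry_03", trigger_room="locker_room"),
    AudioLog(clip="voice_note_11", trigger_room="lab"),
]
first_visit = on_player_entered("locker_room", logs)
second_visit = on_player_entered("locker_room", logs)
```

In this sketch the log fires only on the first visit. A real game might instead let the player replay found logs from a menu, which is a design choice this sketch leaves out.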

Audio-logs were popularised in the first-person shooter BioShock (2007), which used them as the main means to tell its story. Dear Esther (2012) and Gone Home (2013) are games entirely focused on exploring an environment and finding audio-logs, in order to piece together their stories. Those latter two games are key examples of the genre that came to be known as "walking simulators".

There are also what I've termed audiovisual-logs, used in games like The Vanishing of Ethan Carter (2014), where they only play a small role, Everybody's Gone to the Rapture (2015), and Tacoma (2017). Audiovisual-logs allow the player to see an audiovisual recreation of some past event, in the location where that event occurred. The player can walk around and through this recreation while it is playing.

 

(an audiovisual-log from Everybody's Gone to the Rapture)

(an audiovisual log from Tacoma)


The following links are to the original videos those gifs were created from. Everybody's Gone to the Rapture, and Tacoma.

Audiovisual-logs are the core means of storytelling in Everybody's Gone to the Rapture and Tacoma. In The Vanishing of Ethan Carter, they only play a relatively small role[1].

[1] In that game, there are certain puzzles, each of which concerns finding out how a particular character died. At the completion of each one of these puzzles, the player is shown a cutscene showing the full details of how the character died, and after that the player can follow a floating light. When they get to the light's resting place, they can see a short audiovisual-log of what came next, after the events shown in the cutscene.

For more details on audiovisual-logs, see the audio-logs link, above.

There's what I've termed "frozen-moment-logs". The only game I know of that uses these is The Vanishing of Ethan Carter (2014). The player will come across places where they can unlock a frozen-moment-log, consisting of a static 3D image of some past occurrence (backstory) there, that the player can walk around and view from different angles. In that game, these logs appear as part of larger puzzles.
 

(a frozen-moment-log from The Vanishing of Ethan Carter)

 

Another technique, where the environment contributes to backstory, is what I'm calling "Inferred Backstory". This is where environmental details enable the player to infer some past events.

In a post-apocalyptic game, the player may come across a dilapidated house, and in one of its bedrooms find two mummified corpses in the bed, frozen in an embrace. On the bedside table may sit a framed photo of a happy couple, along with an empty bottle of sleeping-pills.

From these details, the player may infer that the couple was once happy, and loved each other, but ended up finding their circumstances untenable. They may imagine the couple coming to this realisation, taking the sleeping pills, and tearfully embracing each other in the bed as they awaited their fate.

Inferred backstory is often like a little puzzle, where there are clues and the player infers the backstory from them. Usually it's a very simple puzzle.

'Inferred backstory', as we are using the term, covers any case where the player infers prior details from present-moment details. That can include when a player immediately and effortlessly infers some prior details, including details that recently occurred. E.g. they're in a forest and come across some fresh large-animal droppings. They'll immediately infer there was recently some kind of large animal in this place. Some other examples where the player will make an immediate and effortless inference: if the player came across the charred remains of a fire, or doors that have been broken open.

And of course, by inferring details that have recently happened (e.g. a large animal being here) we may also infer present-moment details -- e.g. that the large animal may be nearby us right now. 

Carson[2] calls these cases of inferred backstory "cause and effect" vignettes, and the above description of them is based on his article.

[2] "Environmental Storytelling: Creating Immersive 3D Worlds Using Lessons Learned from the Theme Park Industry", by Don Carson, March 2000


To effectively communicate story-relevant details

This section draws a lot from Carson.
 
The environment should be designed to effectively communicate important details to the player. Details like where the player is, what sort of place it is, and where they should go next. Usually, we want the player to be able to immediately determine such details.

These kinds of concerns don't, of course, only apply to the environments in games. They also apply in movies, TV shows, theatre, and theme park rides. Think set design.

Here are some details that we want the environment to communicate to the player:

  • Which details in a location are the most important story-relevant ones.
  • The atmosphere of a location, and the kind of place it is.
  • What objects and features are important for the player to be aware of.
  • Where the player should go next.

The following can aid in that communication. We can draw attention to important details by how we arrange the objects and features in that location, and by how those objects and features are lit. We can also avoid including too much detail in the location, especially detail that's of little relevance to the story. We don't want to confuse or overwhelm the player.

Contrast can be used to heighten qualities. For example, if we make the player crawl through a narrow passageway to reach a cave chamber, that can heighten the feeling of how large that chamber is. Or, if we want the player to feel that the temple in the forest is a pristine place, we can make them experience a disordered space (like thick jungle) before they find it.

The environment can be designed to guide players as to where they need to go next. For example, a dark area might have a large, well-lit object in one corner that the player will want to investigate. Reaching it brings them to the exit from this location, which leads to the next place they need to go.

Many of the ways that the environment can contribute to storytelling are showing rather than telling. Rather than explicitly describing the atmosphere and world building of an environment (perhaps by a character commenting on them), environmental details can show them. With inferred backstory, the player draws their own conclusion about what happened.

And, as Carson points out, the player being able to discover details like artefacts (letters, memos, audio-logs), and inferred backstory, themselves, can be a more enjoyable experience for them than simply being told those details. (That discovery is part of the gameplay, so the enjoyment comes from the gameplay. It doesn't come from the storytelling.)


Summary of how the environment may contribute to storytelling

To summarise, the environment may contribute to the following elements of a story:

  • The setting (e.g. NYC)
  • The stage enabling certain events (e.g. rooftops for a chase sequence)
  • Atmosphere and world building (e.g. moss-covered ruins, strange sounds)
  • Characterisation (e.g. a character's messy room)
  • The main plot (e.g. evidence of who the murderer is).
  • Backstory (e.g. diary entries, audio-logs, audiovisual-logs, frozen-moment-logs, and inferred backstory like the corpses in the bed).

And there are techniques that can be used to effectively communicate such elements of the story. For example, to highlight the size of a large cave chamber, make the player crawl through a small space to get to it.

 

Environmental storytelling

The reader may have noticed that we started this post with the heading "Environment and Storytelling" not "Environmental Storytelling". For these are two different things. "Environment and storytelling" refers to all of the ways that the environment contributes to storytelling. Whereas "Environmental storytelling" refers only to a subset of them.

"Environmental storytelling" is a commonly used term to describe storytelling in games. It is used to refer only to cases where the environment contributes to backstory.

For some people, "environmental storytelling" refers to all of the ways that environmental details can contribute to backstory: diary entries, letters, memos, audio-logs, audiovisual-logs, frozen-moment-logs, and inferred backstory (like the corpses in the bed).

For other people, "environmental storytelling" has an even narrower meaning. For them, it only refers to cases of inferred backstory, and does not refer to cases like diary entries, letters, memos, audio-logs, audiovisual-logs, and frozen-moment-logs.

Both definitions of "environmental storytelling" involve environmental details that are leftover from, or reflections of, the past (in the broad sense of "prior to the current moment", not just things in the more distant past). Obviously that's the case for Inferred Backstory like the corpses in the bed. It's also the case for things like diary entries, letters, memos, audio-logs, audiovisual-logs, and frozen-moment-logs.

In this post, we'll use "environmental storytelling" in its broader sense, which includes all the ways that environmental details can contribute to backstory.


Environment and storytelling in movies and TV

Movies and TV use the story's environment (sets, and on-location shots) for storytelling purposes, and of course have done so since well before video games came onto the scene. Movies/TV and games mostly use their environments in similar ways for storytelling, except for some of the ways backstory is used.

(Theatre, theme parks, and theme park rides also use the environment to contribute to storytelling. But this post will focus only on video games, movies, and TV.)

Movies/TV can use environmental details to contribute to:

  • The setting (e.g. NYC)
  • The stage enabling certain events (e.g. rooftops for a chase sequence)
  • Atmosphere and world building (e.g. moss-covered ruins, strange sounds)
  • Characterisation (e.g. a character's messy room)
  • The main plot (e.g. evidence of who the murderer is).
  • Backstory ('environmental storytelling', e.g. diary entries, audio-logs, audiovisual-logs, frozen-moment-logs, and inferred backstory like the corpses in the bed).

One difference, regarding the techniques that can be used to effectively communicate such elements of the story, is the following. In games, the environment can be designed to guide players as to where they need to go next, which obviously doesn't apply to movies/TV/cutscenes. Whereas in movies/TV/cutscenes, the cinematography, lighting and set-design can guide where the viewer looks during scenes.

In a moment we'll get into some differences in how backstory (environmental storytelling) is used in movies/TV compared to games.


Why is environmental storytelling more common in games?

The environmental storytelling techniques (contributing to backstory) are either less common or not found at all in movies and TV. This includes things like diary entries, audio-logs, audiovisual-logs, frozen-moment-logs, and inferred backstory like the corpses in the bed.

There's a clear difference between games and movies/TV in this respect, as games often heavily rely on such environmental storytelling. In games, it's often the main form of storytelling that's used. In movies and TV, it tends to be a supplemental form of storytelling.

What is the reason for this difference?

 

Cost and ease

Movies and TV are focused on "cinematics" -- visual portrayals of actors and environments. Whereas game development companies have a primary focus on gameplay. In most cases, it requires additional resources for the game developer to be able to include "cinematics" in their games. And smaller game developers may not have the skills and/or budget for this.

Compared to cinematics, it's cheaper and quicker to add artefacts like diary entries and letters to a game. Audio-logs require hiring voice actor(s), but a game might have a total of less than an hour of audio-log audio, which can be recorded quickly and thus doesn't require the expense of hiring a voice actor for a long period of time.

On the other hand, high-quality animated cutscenes take more time to develop, and require animator(s) and voice actor(s). FMV (full motion video) or motion-capture for cutscenes requires (real or virtual) sets, and hiring actors. Motion capture requires specialised equipment (either purchased or hired) and the skills to turn its output into animation. I imagine that the time and money required for cutscenes is similar to the time and money required for audiovisual-logs.

 

Visual mediums excel at visual-action storytelling

Here's a reason that environmental storytelling is used less in movies/TV. Movies and TV are primarily visual, and excel at visual-action storytelling. This is the visual depiction of action (what I referred to as 'cinematics', above). By 'action', I don't mean just things like fights and shootouts, as you'd find in action movies. I mean 'action' in a general sense, of the visual details that can be captured by a video camera. This could include characters simply talking to each other, or a tense scene where two characters are sitting in the same room, each silently trying to ignore the other.

In constructing visual action, all the tools of acting (performance), cinematography, editing, and so on, are brought to bear. Visual-action storytelling is a strong form of storytelling.

If the movie or TV show contains backstory, it's usually presented through visual action -- that is, through a flashback. Environmental storytelling also conveys backstory, but it mostly does /not/ do so through visual action. In a moment we'll examine this, and see why environmental storytelling tends to thus be a weaker form of storytelling[3].

[3] Before leaving this topic, we can note that one kind of use of inferred backstory in movies/TV is where the 'clues' are amongst the background details of scene(s), that might only briefly be in shot. Most people watching the movie/show wouldn't notice them, and they're there as interesting details or "easter eggs" for repeat or careful viewers. And/or as details designed for other viewers to subconsciously take in.

 

Games excel at visual-action gameplay, but not visual-action storytelling

Like movies and TV, graphical games are also a visual medium -- with the addition of gameplay. They excel at incorporating gameplay into visual-action. Consider first-person shooters, platform games, racing games, etc. However, games do not excel at incorporating gameplay into visual-action storytelling.

That's why, if a game is to include visual-action storytelling, that's done in a cutscene (effectively a little movie) that is separate to the main gameplay. Cutscenes are usually non-interactive, though they can include simple forms of interactivity like Quick-time Events (QTEs) and "Choose Your Own Adventure"-style choices. We don't have a way of integrating player control over a character with movie-like visual action.

In visual-action storytelling, all the tools of acting (performance), cinematography, editing, and so on, are brought to bear. But, during gameplay, when the player has moment-to-moment control over a character (like the character's movements), it's not possible to strongly exploit those tools of visual-action storytelling.

Here we'll turn our attention to cutscenes. These aren't a form of environmental storytelling, but looking at them will help convey the point that games are poor at integrating interaction/gameplay with strong visual-action storytelling.

In this earlier post, I looked at the types of cutscenes in games, and the ways interactivity and cutscenes can be mixed together.

The standard non-interactive cutscenes are strong visual-action storytelling, but they involve no interaction at all.

In QTE-and-Choice cutscenes, simple player inputs are incorporated into the strong visual-action storytelling. These are interactions like QTEs (Quick-time Events), and choices where the player can choose from a small menu of options, such as dialogue choices or choices about which course of action to take (save Billy or save Jenny, from the oncoming horde of zombies).

QTE-and-Choice cutscenes are strong forms of visual-action storytelling. However, as far as interactivity goes, they contain only weak forms of interaction/gameplay. The player has very limited control over a character.

During-Gameplay cutscenes are how most of the cutscenes are handled in games like Half-Life 2 and the Dishonored games. The player still has some degree of control over their character, while some scripted events occur around them. The player may be able to freely turn their head to look around, or that plus the freedom to move around (within the constraints of their environment, like brick walls etc.).

As that earlier post argued, During-Gameplay cutscenes have stronger forms of gameplay but weaker forms of storytelling than non-interactive cutscenes.

So none of the kinds of cutscenes involve strong visual-action storytelling along with strong interaction/gameplay.

Returning to environmental storytelling, audiovisual-logs, like in Everybody's Gone to the Rapture, present visual action, but since it is visual action that the player can walk around and through, the tools of cinematography and editing can't be brought to bear on it. Like with during-gameplay cutscenes, the player is reduced to a spectator of the events.

In summary, even though visual-action storytelling is the strongest form of storytelling in a visual medium, games are not very suited to it. To use visual-action storytelling, games have to either include non-interactive or Q&C cutscenes, which clash somewhat with the interactive nature of games, or D-G cutscenes which have more interaction but weaker storytelling.

 

Environmental storytelling is more compatible with gameplay

That it's difficult to insert gameplay into visual-action storytelling is a reason why environmental storytelling is so often used in games.

Environmental storytelling can form part of the gameplay.

In it, the player explores the environment, and finds artefacts (like diary entries, letters, memos, audio-logs, and audiovisual-logs). A player who is not looking carefully might miss some of the artefacts. That exploration and finding is part of the gameplay.

Audio-logs can be listened to while the player is still engaging in gameplay, where they're moving around and continuing to explore.

When an audiovisual-log is playing, the player can move around, and within, the recreated visuals. The player may move around to find a good view of all the action, to follow a particular character around, or to be closer to one particular conversation (if there are multiple happening at the same time). The audiovisual-logs in Tacoma allow the player to scrub back and forth in the log, to find pertinent details.

Environmental storytelling (e.g. diary entries, audio-logs, audiovisual-logs, frozen-moment-logs, and inferred backstory like the corpses in the bed) can present a kind of puzzle to the player. Something unknown, in the past, has happened, and environmental storytelling provides some clues to its nature. It's like each bit of environmental storytelling is a puzzle piece and the player needs to figure out how they fit together, to see the overall picture of what happened. This is a very common pattern in games that heavily use environmental storytelling, for example BioShock, Gone Home, Everybody's Gone to the Rapture, and Tacoma. (Note that this can be done in other mediums, like movies/TV and novels, but games make use of it more often.)

So all these forms of environmental storytelling are more compatible with gameplay. Environmental storytelling brings storytelling into the main gameplay parts of the game, rather than having it be fairly distinct from gameplay, as with cutscenes.

 

Environmental storytelling is generally a weaker form of storytelling

While environmental storytelling allows storytelling to be better integrated with the gameplay, it unfortunately involves a weaker form of storytelling.

We've stated that visual-action storytelling is the strongest form of storytelling in a visual medium. This, and some subsequent sections, look at why environmental storytelling is weaker than visual-action storytelling.

To clarify, I'm saying they're generally weaker forms of storytelling, not that they're bad. They can be quite effective. However, when they are heavily relied upon, like they often are in games, that will tend to weaken the overall storytelling.

Environmental storytelling conveys backstory. I suggest that storytelling that focuses on the main story thread is, generally speaking, stronger storytelling than that focusing on backstory. This is debatable, but I think it's the reason that backstory tends to be used sparingly in most movies and TV shows.

And, if backstory is used, it's more strongly presented with flashbacks, which present the details through visual action. Environmental storytelling mostly does not convey details through visual-action. So these forms of environmental storytelling are generally weaker forms of storytelling.

Audiovisual-logs, such as those used in Everybody's Gone to the Rapture and Tacoma, are the only form of environmental storytelling that's via visual action (during-gameplay cutscenes also use visual action, but they are, like normal cutscenes, not a form of environmental storytelling). Though, as mentioned before, audiovisual-logs can't make use of cinematography or editing, as the player is still in control of a character. The camera needs to be suited to moving a character around. Editing involves cutting out parts of the action, showing only the details before and after it, and that doesn't mesh well with gameplay. Still, audiovisual-logs have the potential to be fairly strong forms of storytelling -- similar to flashbacks.

All the other forms of environmental storytelling are not presented through visual action. They may be textual artefacts like diary entries and letters. And auditory ones like audio-logs.

What about inferred backstory, like the corpses in the bed? Here the backstory is told using visual details (from which the player infers past events). But those are present-moment visual details; the backstory being conveyed is not shown visually. It doesn't, for example, show the visual action of the couple taking the sleeping pills and getting into the bed.

The player coming to their own realisation of what happened, through the environmental storytelling, is something that a number of players enjoy. This is part of the appeal of environmental storytelling. However, I don't consider this to add a lot to the strength of the storytelling.

 

We can draw a distinction between artefacts that provide narrative details, and those that don't, but which convey narrative-relevant information. We can call these narrative and non-narrative artefacts.

Artefacts like diary entries and letters can be either narrative or non-narrative. They can convey narrative backstory, like a diary entry recounting an event that occurred. A non-narrative example is a diary entry that said "Bought new clothes at shops" and then went on to list the items of clothing. This is just some information, though from it the player may infer some narrative-relevant details (that the person who wrote it cared a lot about their neighbour, who they bought a number of items for).

(To turn to textual mediums for a moment, epistolary novels tell a story through letters sent between characters. "Epistolary novels" is also often used in a broader sense, covering stories told through any kinds of artefacts, such as diary entries, newspaper clippings, or other kinds of documents. Some well known examples are "Carrie", by Stephen King (1974), "Possession", by A. S. Byatt (1990), and the Adrian Mole series, by Sue Townsend (1982-2009). Such novels show how something akin to environmental storytelling can be used quite successfully for storytelling. I believe that using artefacts in this way is much more suited to textual mediums, and much less so to visual mediums like video games. This is because they are textual artefacts, which means they fit in with the textual nature of novels. Whereas, showing textual artefacts on screen for the viewers to read, or having them read out by a character, is not as suited to the visual-action character of visual mediums.)

I contend that such non-narrative information is, generally speaking, a weaker way of conveying details that are part of the narrative.

Audio-logs may simply be recorded versions of artefacts like diary entries, which may or may not be focused on narrative details. Narrative audio-logs might, for example, contain a recording of when some bad guys stormed a character's office, and took them hostage. That is, an audio-log may contain a recording of an event that happened. Audiovisual-logs will usually convey narrative details.

Inferred backstory (like the corpses in the bed) also conveys narrative details.

Here's a summary of the ways the environment may directly contribute to the narrative and those that do so indirectly.

These environmental storytelling techniques directly contribute to the narrative, namely those that contribute to:

  • The main plot (e.g. evidence of who the murderer is).
  • Backstory (with narrative artefacts)
    (e.g. certain of: diary entries, audio-logs, audiovisual-logs, frozen-moment-logs, and inferred backstory like the corpses in the bed).

Environmental storytelling techniques that only indirectly contribute to the narrative/plot:

  • The setting (e.g. NYC)
  • The stage enabling certain events (e.g. rooftops for a chase sequence)
  • Atmosphere and world building (e.g. moss-covered ruins, strange sounds)
  • Characterisation (e.g. a character's messy room)
  • Non-narrative artefacts conveying backstory (e.g. diary entries, audio-logs).

And where environmental details are used to effectively communicate story-relevant details (e.g. to highlight the size of a large cave chamber, make the player crawl through a small space to get to it).

Audio-logs and inferred backstory that convey narrative details, don't do so through visual action, so they are generally weaker forms of (back)storytelling than flashbacks, which do.

To help reinforce these points, we can note that movies and TV could employ an equivalent of audio-logs. There could be scenes where a character is listening to a sound recording. Or where the audience hears a character reading out a diary entry or letter. The visuals might show the character driving in their car as they listen to the audio recording, or walking around their house while reading the diary entry or letter.

Because the focus of such scenes would be on the audio, the visuals would essentially serve as a background to the audio. The visuals couldn't convey any substantial narrative details, as that would distract the viewer from the audio. So it'd be weaker storytelling, because it's not focused on visual action.

That such scenes are, as far as I'm aware, rare in movies and TV is, I suggest, because they're a weaker form of storytelling.

 

Pacing

Another reason flashbacks are a stronger means of conveying backstory is that their visual-storytelling benefits pacing. Pacing is an important part of storytelling. A story may concern some events that take place over a week, but they -- as represented in a movie or TV show -- may do so through only a couple of hours of visual action. They condense the details. They filter out the narratively-irrelevant details, to leave a more narratively-concentrated end-product. If the narratively-interesting details are padded out with a lot of irrelevant details, it will slow down the pacing, and dilute the narrative.

A heavy focus on environmental storytelling, as is often found in games, negatively affects the pacing. It slows the pacing.

Imagine if, during a movie or TV show, there were several occasions where the main character read a full-page diary entry, letter, or memo. Where, each time, the shot of them reading it lasted long enough for the character to read the full text. (Their reading of it might be conveyed with a voice-over representing the character's inner voice, as they read the page). That would, I think, make the pacing feel quite strange. It'd be jarring, to go from the normal speed of the pacing in a movie or TV show, to these really slowed-down segments.

(And this is on top of the fact that gameplay itself also slows the pacing of the storytelling, because there'll be long segments of gameplay-focused action in between each narrative-focused segment, and the gameplay is usually not conveying much vis-à-vis the game's narrative.)

 

Limitations to narrative complexity of audio-logs and inferred backstory

I mentioned earlier that audio-logs and inferred backstory can convey narrative details. They are, however, quite limited in their ability to do so.

An individual audio-log or inferred backstory instance can only present fairly short and simple narrative details. And there are fairly strong limits on how substantial/complex the overall narrative details from the totality of the audio-logs/inferred-backstory can be.

With inferred backstory (like the corpses in the bed) there are visual details from which the player can infer past events. But it's difficult to convey a lot of detail in this fashion. The player has to infer -- figure out -- the events from the clues. It would be too complex for the player to infer more than a simple set of details. For one thing, it would be very challenging to indicate the sequence of the events.

Audio-logs that convey narrative events (like a recording of a kidnapping) need to be relatively brief. Audio-logs are designed so as not to get in the way of gameplay while the player is listening to them. While listening, the player can still continue to explore, and possibly even take on some enemies. While listening, they'll be watching visuals (of their environment) that are likely pretty unrelated to the content of the audio-log. Which means they face distractions while listening to the audio-log. They won't usually be paying 100% attention to it. So the player would have trouble properly taking in longer audio-logs. Also, if audio-logs were lengthy, then while the player is still listening to one, they might get into a situation (like a major fight with multiple enemies) where 1) they can't focus at all on the audio-log and 2) the audio-log might distract them from the gameplay. (Though this latter point might be addressable by a means to pause audio-logs).

All of the narrative audio/audiovisual -logs, and all of the inferred backstory, in a game, could together contribute to the overall narrative. However, there might be an average of, say, 10-30 minutes of playtime between each audio-log or inferred backstory that the player comes across. During which time the player is undertaking gameplay. This places fairly heavy demands on the player's memory, and the game designer can't expect that the player will be able to recall a lot of the specifics presented in earlier audio-logs and inferred backstory instances. Therefore, individual audio/audiovisual -logs and inferred backstory have to be designed to be somewhat stand-alone. They can't be narratively connected together in intricate ways.

So there are limits on how substantial/complex the overall narrative, conveyed through multiple audio/audiovisual -logs and/or inferred backstory instances, can be. One other reason for this, that applies to a number of games, is that different players may come across the audio/audiovisual -logs and/or inferred-backstories in somewhat different orders.

In contrast, visual action can convey a lengthy sequence of events (like, a whole movie's worth).

 

Contrived nature of audio-logs

Audio-logs are also somewhat contrived. Why were these recordings made in the first place? It might make sense if they're like diary entries or voice notes that a character made. But it doesn't if they're a recording of a narratively-significant event that happened. Who thinks to switch on a recorder just before a significant event? And why are the audio-logs found in various different places in the environment? Often it doesn't make sense. So this contrivance is another reason why you wouldn't have characters in movies/TV finding such audio-logs.

 

Audio/audiovisual -logs are only compatible with certain story settings

And both audio-logs and audiovisual-logs are also only compatible with certain kinds of stories. Neither of them could appear in a realistic story set in the 1700s. Audiovisual-logs, further, need to be in a story that contains elements that are magical, supernatural, alien, or high-tech.

 

In summary, environmental storytelling consists of generally-weaker forms of storytelling, and the reason they're often used in games is that they fit better with gameplay than do other means of storytelling.

 

Where next?

Given the storytelling limitations of the environmental storytelling we've looked at, are there other ways environmental storytelling could be used, that might be better for storytelling?


Further exploring the use of audiovisual-logs

Audiovisual-logs have potential because they can present narrative details through visual-action. The player sees the characters and events being portrayed. Yet, I'm only aware of three games that use them. One (The Vanishing of Ethan Carter -- see below) barely uses them, and the other two (Everybody's Gone to the Rapture, and Tacoma) don't lean much into the visual action.

I expect further exploration of the use of audiovisual-logs in future works. Especially since it should become cheaper and easier for game developers to create them. Motion capture, and animating the captured data, are only going to get cheaper and easier over time. AI will likely play a role in that. AI will likely make it quicker and easier to record and process the motion capture data, and to generate animated models from it.

 

Audiovisual-logs with full character-detail

In The Vanishing of Ethan Carter (2014), the audiovisual-logs have a quite minor role. Each one is quite short -- probably 5-10 seconds long -- and there are only a few of them in the game (probably 5-10). In these you see the representations of the characters like you would in a cutscene, except you are there in the scene and can move around and look around while you're watching it play out.

In Everybody's Gone to the Rapture (2015) and Tacoma (2017), the audiovisual-logs are the main source of the storytelling, but at the same time, those games don't lean much into the visual action in the audiovisual-logs, because they show highly abstracted representations of the characters.

In Everybody's Gone to the Rapture (2015), the characters are represented by glowing, dancing points of light. It makes it difficult to even get a clear view of the characters and their movements.

(A still from an audiovisual-log in Everybody's Gone to the Rapture. From this one-minute video. It's from early in the game.)

In Tacoma (2017), you can see the shapes of each of the characters, but those shapes are just filled in with a single colour, where each character has their own colour. These character representations are more 'readable' than the ones in Everybody's Gone to the Rapture. However, in both of these games, you can't see any details of the characters' faces. Without facial details, the audiovisual-logs are missing an important part of visual action.

(A still from an audiovisual-log in Tacoma. From this trailer for the game)


Those two games may have used highly abstracted representations of the characters for technical reasons (e.g. for performance). But whatever the reasons were, it seems clear to me that audiovisual-logs showing full details of the characters are far superior. The characters might be shown as semi-transparent, to indicate that you're seeing the details of a past event. Current gaming hardware should be able to handle showing such details, without any problem.

We can call these 'audiovisual-logs with full character-detail'. It's something I expect to see explored more in the future.

 

Audiovisual-logs about ongoing or future situations

Existing audiovisual-logs convey backstory. In The Vanishing of Ethan Carter, Everybody's Gone to the Rapture, and Tacoma, the player enters a situation where a series of events has taken place in the past, before they arrived[4]. The player's goal is to try to understand what happened. 

[4] or not quite so, for one of these games. But to explain would be a spoiler.

Rather than conveying details that happened a while prior to the current moment, audiovisual-logs could be used to convey events that happened only a short while ago, events that are happening now but in a different location, or even future events that haven't happened yet.

So despite what I've said elsewhere in this post, environmental storytelling is not inherently restricted to conveying backstory. What it can't convey are the generally stronger, from a storytelling perspective, details that are happening here and now where the player is.

 

NPC-perspective audiovisual-logs

Audiovisual-logs present some past situation involving some NPCs. Instead of presenting them from the player's point of view (POV), as is normally done, they could be presented through the POV of one or more of those NPCs.

The player could freely switch between the POVs of the different NPCs in the situation. Or perhaps there could be a puzzle element to it, where the player has to do something to unlock each of the different NPCs' POVs.

 

Combining inferred backstory with audiovisual-logs

The Vanishing of Ethan Carter contains puzzles where the player has to discover clues about some past events, and then put those clues in the correct temporal order. This is the player inferring some backstory from the clues. At the end of this process the player is shown a brief audiovisual-log.

There are other possible ways of combining inferred backstory and audiovisual-logs, that are yet to be explored. For example, the following.

The player comes across some clues to some inferred backstory, and once they've seen them all, the game could play an audiovisual-log of the backstory details.

For this to work, the game has to know that the player has noticed those clues. The game could have a 'look' verb, that the player could use on objects. And if the player has 'looked' at each of the clues, the game could play the audiovisual-log.

Or, to make it more challenging for the player, them noticing the clues might require them to apply a separate 'is clue' verb to each of the items they think are clues. That way, a player who just looks at every available object (as players tend to do in games) would not thereby automatically find all the clues.
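To make the idea a little more concrete, here is a rough sketch of how a game might track whether the player has noticed all the clues. It is only an illustration: the `ClueTracker` class, its method names, and the object ids are all hypothetical, and a real game would hook this into its own interaction system.

```python
class ClueTracker:
    """Tracks which objects the player has examined (hypothetical sketch)."""

    def __init__(self, clue_ids, require_marking=False):
        self.clue_ids = frozenset(clue_ids)     # the objects that really are clues
        self.require_marking = require_marking  # True = player must use the 'is clue' verb
        self.looked_at = set()
        self.marked_as_clue = set()

    def look(self, object_id):
        """The player used the 'look' verb on an object."""
        self.looked_at.add(object_id)

    def mark_as_clue(self, object_id):
        """The player used the 'is clue' verb on an object."""
        self.marked_as_clue.add(object_id)

    def log_unlocked(self):
        """True once every real clue has been noticed in the required way."""
        noticed = self.marked_as_clue if self.require_marking else self.looked_at
        return self.clue_ids <= noticed


# Easier variant: looking at every clue is enough.
easy = ClueTracker({"knife", "torn_note"})
easy.look("knife")
easy.look("lamp")       # not a clue, harmless
easy.look("torn_note")

# Harder variant: looking is not enough, the clues must be marked.
hard = ClueTracker({"knife", "torn_note"}, require_marking=True)
hard.look("knife")
hard.look("torn_note")
unlocked_by_looking = hard.log_unlocked()
hard.mark_as_clue("knife")
hard.mark_as_clue("torn_note")
```

In this sketch, wrongly marking a non-clue costs the player nothing. Whether wrong guesses should carry a penalty is a design question the sketch leaves open.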

In The Vanishing of Ethan Carter, the player is required to put the clues in the correct temporal order. But there could be alternatives to this. Like correctly linking each clue to the person that left it behind.


Interaction within visual action, to enhance the storytelling

My primary interest is in storytelling, and how gameplay, or interaction, could be used to enhance the storytelling.

In a visual medium, visual action is the strongest way to convey storytelling details, so I am interested in how interaction can be used within visual-action storytelling, to enhance that storytelling.

I mentioned earlier that it is difficult to include gameplay within visual-action storytelling. With most of the means of environmental storytelling, the gameplay sits outside of full visual-action storytelling. Currently the only options for interaction in visual-action storytelling are QTEs, choosing from a small menu of actions (e.g. which of the two people do I try to save?), and during-gameplay cutscenes (which are not a form of environmental storytelling, just like cutscenes are not a form of environmental storytelling).

During-gameplay cutscenes (see post about the different kinds of cutscenes) are one way of combining gameplay and cutscenes. So far they have been used in relatively-few games, and their use could be explored further.

I think there are a lot of unexplored options for using interaction within visual-action storytelling, and it is these that I am primarily interested in exploring. But that is a topic for another post.

Wednesday, January 24, 2024

Interactive storytelling: Types of cutscenes in games

Video game cutscenes are like little movies, played between gameplay segments. They, like movies and TV show episodes, are a sequence of one or more scenes, where each scene is a sequence of one or more shots.

There are 1st-person shots and scenes, which show the action from the perspective of a particular character. And there are 3rd-person shots and scenes, which don't show the action from the perspective of a particular character, but rather the view from where the camera is located and facing[1]. 

[1] there are also scenes rendered in Virtual Reality, in VR games and movies. These add an extra dimension of immersion.

In movies and TV shows most shots are 3rd-person shots, with the occasional 1st-person shot thrown in. In games, 1st-person (cut)scenes are more common. For example, in the Metro games, the Dishonored games, Halo 3: ODST, Halo: Reach and Cyberpunk 2077. The reasons for this needn't concern us in this post.

Video game cutscenes can include interaction. This post looks at such cutscenes, and how the interactivity in them affects the strength of the storytelling in them.

 

Visual-action storytelling

Here we introduce the notion of "visual-action storytelling". It will help us discuss the effect of interactivity on cutscenes in games.

The primary form of storytelling found in movies and TV is "visual-action storytelling", in which we see the story events occur. By "action" I don't mean in the sense of an "action movie", with weapons, fights, and chase-scenes. I don't mean something that has to be highly dynamic. The "action" is just what the viewer sees unfold over time, and that includes very still scenes where very little is happening.

Visual action is made up of components like the actors' performances, the sets, the cinematography, and the editing.

If there was a movie that was just 1.5 hours of a character recounting a story to some others, where all the footage was just of the storyteller and their audience, this would be a very weak form of visual action. It would contain visual action of the storyteller and their audience, but no visual action of the story being told. In that situation, the "real" story details are being told, not shown.

Whereas if instead of focusing totally on the storyteller and their audience, there were also visual-action scenes showing the events of the story that's being recounted, that would be a stronger form of visual action. In that version, the movie's viewers would be shown the visual action of the story details.

 

Interactivity in visual-action storytelling

Normal cutscenes are non-interactive. In video games, some degree of interactivity can be introduced into the visual-action storytelling of cutscenes. Though, as we'll see, it's only limited forms of interactivity, and their addition can lessen the strength of the visual-action storytelling.

The following looks at the different ways interactivity can play a role in the visual-action of cutscenes.
 

QTE-and-choice cutscenes

Two forms of interactions that may occur within cutscenes are QTEs (Quick-time Events) and making choices.

Imagine a story-focused game, where, after much journeying the player makes it to the castle, and gains an audience with the king at the king's court. There's a cutscene of the player character entering the court and talking to the king. As the cutscene continues, the situation goes south, and a fight breaks out between the player's character and the guards.

One way that fight could be implemented would be with normal gameplay, where the player moves their character about, attacks with their weapons, and blocks with their shield.

Alternatively, the fight could be a continuation of the cutscene. That way, the fight could be made to look very cinematic. It could contain dedicated character animation and performances, be 'shot' in a cinematic way, use various camera angles and movements, and be edited to look spectacular. However, the player would lose direct control over their character.

Imagine a moment during the cutscene where an enemy swings their sword at the player, and the player dodges to the left, just in time to avoid the blade. A QTE (Quick-time Event) could be used for making the character dodge the blade, to add some interaction into the cutscene.

Here's how the QTE would work. As the enemy goes to swing their sword, time will slow down a bit, and the screen will show a prompt, telling the player to press left on their joystick. The player would (usually) have a small window of time to enter that input. If they enter the correct input within the time limit, they succeed at that QTE -- and successfully dodge the sword -- otherwise they fail at it. The penalties for failure depend on the game and the particular situation in it. The penalties could be minor, all the way up to the player's character dying.
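The pass/fail rule just described can be written down in a few lines. This is a minimal sketch rather than how any particular game implements QTEs; the `QuickTimeEvent` type, the `resolve_qte` function, and the 1.5 second window are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QuickTimeEvent:
    prompt: str            # the input the player is told to make, e.g. "left"
    window_seconds: float  # how long the player has to respond (assumed value)


def resolve_qte(qte: QuickTimeEvent,
                player_input: Optional[str],
                elapsed_seconds: float) -> bool:
    """Return True if the player succeeded at the QTE.

    Success needs the correct input, entered within the time window.
    No input at all (None) counts as a failure.
    """
    if player_input is None:
        return False
    in_time = elapsed_seconds <= qte.window_seconds
    return in_time and player_input == qte.prompt


# The sword-dodge example: the prompt says "press left".
dodge = QuickTimeEvent(prompt="left", window_seconds=1.5)
dodged = resolve_qte(dodge, "left", elapsed_seconds=0.8)    # correct and in time
too_slow = resolve_qte(dodge, "left", elapsed_seconds=2.0)  # correct but too late
wrong_way = resolve_qte(dodge, "right", elapsed_seconds=0.8)
```

What happens after a failure (a minor penalty, or the character dying) is up to the game, so it is deliberately left out of the sketch.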

As another example, during the fight the player could be grabbed from behind by some of the guards, and there could be a QTE prompt telling the player to quickly tap (and keep tapping) one of the controller buttons. If the player taps the button fast enough, they'll escape the grips of the guards.

Remember that in both of these examples, the action would be shown in a cinematic fashion, just like in a movie.

Other types of inputs used in QTEs include moving a joystick in a particular path (e.g. in a full circle, or to the right and then anticlockwise round to up). And on touch-screen devices, the player may need to tap on hotspots on the screen, or slide their finger along a path (e.g. a circular path). The QTEs can be action-mirroring inputs.

QTEs are divisive. Many players do not like them. They see QTEs as a fairly pointless attempt at including a bit of gameplay here and there in cutscenes. They may not find QTEs enjoyable.

One thing that I think all could agree on is that QTEs are a fairly simple form of input. The game tells the player what to do, and when to do it, and the player just needs to follow the instructions properly. Performing the input(s) for a QTE is a fairly rote task. The player doesn't have much agency when it comes to QTEs. (We'll talk about choice below, and choices may be implemented as part of the QTEs, so in this sense they can provide some agency).

Choices provide another means for there to be player-interactions within cutscenes. At points in cutscenes the player can be presented with a choice from a small menu of options, and -- like with QTEs -- there will usually be a time limit for making the choice. The choice might be between a small number (2 to 4) of dialog options or action options.

As an example of choices between action options, there might be an oncoming zombie horde, where the player has to quickly choose between saving one of their companions, Joe or Jackie. Especially in timed action choices, we may consider these kinds of choices to be a kind of QTE.

We can call the kinds of cutscenes just described "QTE-and-Choice cutscenes" (or "Q&C cutscenes" for short).

The addition of QTEs and choices doesn't weaken the strength of the storytelling in these cutscenes. They're just like normal cutscenes, except for the introduction of some basic forms of interaction (QTEs and choices).

 

During-Gameplay Cutscenes

There are kinds of cutscenes that can play during the gameplay scenes in games. With these, the player still has some control over their character, while, at the same time, cutscene-like details play out around them. We'll call them "during-gameplay cutscenes" (or "D-G cutscenes" for short).

Half-Life 2 is the classic example of a game that includes during-gameplay cutscenes. Almost all of its cutscenes are of this sort. The Dishonored games are another example of games with during-gameplay cutscenes. The Metro games also contain a fair number of D-G cutscenes.

Half-Life 2 is a first-person shooter in which some hostile aliens have invaded Earth. The game starts with you disembarking a train, and soon after you see an alien guard, holding a large baton, shove a person they're overseeing. As you walk past some of the humans there, they say things to you. Soon, you reach a checkpoint, where you're taken away to an interrogation room, where the helmeted alien who led you there takes their helmet off, to reveal themselves to be a human -- a person that your character knows. The entire time all this unfolds, the player has control over their character's movement (though of course where they can move is constrained by the walls etc in their environment), and can control where their character is looking.

Such during-gameplay cutscenes consist of scripted dialog and events (animated occurrences), that are triggered by the player's actions. For example, an NPC saying something to the player might be triggered by the player walking close-enough to the NPC. Or the player performing actions, like picking up a gun on the ground, might trigger some during-gameplay cutscene.
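To make the triggering idea concrete, here is a minimal sketch of a proximity trigger. It isn't taken from any particular game or engine; the class name, the event name and the coordinates are all made up for illustration, and a real game would hook this into its own update loop.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class ProximityTrigger:
    """Starts a scripted event once, when the player comes within `radius`."""
    name: str       # hypothetical event name
    x: float
    y: float
    radius: float
    fired: bool = False

    def check(self, player_x: float, player_y: float) -> bool:
        """Return True only on the first update in which the player is in range."""
        if self.fired:
            return False
        if hypot(player_x - self.x, player_y - self.y) <= self.radius:
            self.fired = True
            return True
        return False


def update_triggers(triggers, player_x, player_y):
    """Return the names of the scripted events that start on this update."""
    return [t.name for t in triggers if t.check(player_x, player_y)]


# The NPC stands at (10, 0) and speaks when the player is within 2 units.
triggers = [ProximityTrigger("npc_greets_player", x=10.0, y=0.0, radius=2.0)]
far = update_triggers(triggers, 0.0, 0.0)     # player is too far away
near = update_triggers(triggers, 9.0, 0.0)    # player walks close enough
again = update_triggers(triggers, 9.5, 0.0)   # the event has already played
```

The player keeps control the whole time. All the game does is watch where they are, and start the scripted detail when the condition is met.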

The cutscene elements of D-G cutscenes can consist of any scripted details. They could be as simple as some characters talking. D-G cutscenes can also be of any length. Some may be quite brief, lasting only a couple of seconds.

One form of D-G cutscene is where the player's character and NPC(s) are having a conversation as they walk along. These are epitomised by Naughty Dog's Uncharted games and The Last of Us games. Yahtzee Croshaw calls these "Walk and talk sequences".

We wouldn't normally think of these Walk-and-talk scenes as kinds of cutscenes, but they match the criteria laid out above, of being scripted details that play out during gameplay. In this case, there are the scripted movements of your NPC companions, and the scripted dialog said by your character and the NPCs.

These conversation-focused D-G cutscenes don't have to take place while the characters are walking. They can occur during any kind of activity the characters are undertaking, like climbing a structure, or fighting against enemies.

Games like Rockstar's Grand Theft Auto (GTA) games have a huge amount of D-G cutscene detail going on all around the character during gameplay. There are vehicles on the roads, people walking about, and so on. We wouldn't normally think of these as kinds of cutscenes, but they meet our criteria. In this case, the scripted details are more "background details", that are in part there to give the cities a lived-in feel. They're always rolling, and aren't there to play a specific role in the plot.

Other well-known games with D-G cutscenes include the recent God of War games and, as already mentioned, the Metro games.

In a D-G cutscene, the player's control may be restricted in certain ways. They may be able to look around, but not move around, such as in the opening sequence of Skyrim, where the player wakes up as a prisoner, in a horse-drawn cart, or in Metro 2033, where the player is sitting at a table, with other characters, in a bar.

Arguably, the greatest implementation of during-gameplay cutscenes thus far is the gang's camp in Red Dead Redemption 2. The camp seems to have a life of its own, outside of the player's interaction with it. Characters go about chores, have conversations amongst themselves, sit around the campfire at night drinking, chatting, and playing musical instruments, before retiring to their tents once it gets late enough for them. And the player can choose whether or not to take part in the goings on, such as the conversations.

 

Mixing during-gameplay cutscenes and other forms of cutscenes

During-gameplay cutscenes and other kinds of cutscenes can be seamlessly mixed together. For example, take the aforementioned scene in Metro 2033, where you're sitting at a table with a few others, having drinks and conversation. Most of the time it's a D-G cutscene, where characters are talking and the player can freely look around. But at times, NPCs will propose a toast, at which point it will show a short non-interactive 1st-person cutscene, where you grab your drink, take a swig of it, and put it back down.

It's possible to mix all three types of cutscene -- non-interactive, QTE-and-choice, and during-gameplay -- within the one overall cutscene.
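One way to picture this mixing is as a sequence of segments, each tagged with its cutscene type and with the inputs the player is allowed during it. The sketch below is only an illustration under that assumption; the segment labels and input names are invented, and it doesn't describe how Metro 2033 (or any other game) is actually built.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Mode(Enum):
    NON_INTERACTIVE = "non-interactive"
    QTE_AND_CHOICE = "QTE-and-choice"
    DURING_GAMEPLAY = "during-gameplay"


@dataclass(frozen=True)
class Segment:
    label: str                 # hypothetical description of the segment
    mode: Mode
    allowed: FrozenSet[str]    # inputs the player may use in this segment


def input_accepted(segment: Segment, action: str) -> bool:
    """Return True if the player's input has any effect during this segment."""
    return action in segment.allowed


# A rough model of the bar scene: look-only D-G segments around a short
# non-interactive toast.
bar_scene = [
    Segment("conversation at the table", Mode.DURING_GAMEPLAY, frozenset({"look"})),
    Segment("toast", Mode.NON_INTERACTIVE, frozenset()),
    Segment("conversation resumes", Mode.DURING_GAMEPLAY, frozenset({"look"})),
]

modes_used = {segment.mode for segment in bar_scene}
```

A QTE-and-choice segment would fit the same shape, with its allowed inputs being whatever the on-screen prompt asks for.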

 

Downsides and limitations of during-gameplay cutscenes

During-gameplay cutscenes sound good on paper. They enable gameplay/control and visual-action storytelling to be integrated. They appear to be the best of both worlds. We may wonder why they aren't used more in games. One reason is that they have a number of limitations and downsides. We'll look at these now.

 

During-Gameplay Cutscenes Weaken and Constrain the Visual-Action Storytelling

Movie/TV scenes and cutscenes use cinematic techniques, through the use of camera angles, camera movements, and editing (cuts). In D-G cutscenes, because the player has control, the same camera that's used in gameplay -- whether first- or third-person -- is still used in the D-G cutscene. That lessens the strength of the storytelling, to some degree.

Movie/TV scenes and cutscenes involve real/virtual actors' performances. These performances include the

  • blocking (including the marks where they stand and move at each moment of each shot)
  • physical interactions between characters
  • body language, including posture, how they move, facial expressions, gestures, etc.
  • dialogue

And their timing is crucial to their performances.

And of course the performances involve interactions between characters. Eye-contact between characters. Where a person is standing and facing in relation to others, and how people walk around, amongst others. When two people meet for the first time. When a person approaches another to ask them for assistance. Or when there is flirting between people. When there's a group of people, and someone is speaking, there's how others are listening, and their reactions. Vigorous agreement, vs indifference. And so on.

All those body language and performance details have meaning: they communicate. A lot, for example, is communicated by the nature of the eye-contact between people. Consider the differences between avoiding eye contact, confident eye contact, aggressive staring, and flirty eye contact.
 
The usual control schemes in games are too simple, to be able to control such details. The player's character might be able to move in different directions, look in different directions, jump, and crouch. Those options pale in comparison to the sort of control an actor has in their performance. So the player's character can only have a 'passive' role in D-G cutscenes, that does not include those specific performance details.

As a different way of seeing this point, consider that in D-G cutscenes, the player could only communicate to other characters via moving forwards, backwards, left and right, turning to the side, and, by what direction they're looking in. The reader can imagine if they, themselves, were in a situation with other people and these were the only ways they could communicate to others. Or imagine if they were controlling a robot with such a control scheme, that was amongst those people, and the robot had zero ability to express facial expressions or body language, just those kinds of movement.

There's a fundamental difficulty here, in that the game can't know the player's intentions. The player may want to be friendly, or dismissive, towards another character. But the game can't infer those intentions from the player's control over their character. There's a paucity of information about the player's intentions, and only so much that can be inferred. This couldn't be overcome by using AI to make NPCs react more realistically.

To accommodate these control limitations, the player might be seeing some event taking place on the other side of a chain link fence. Or seeing large alien machines moving around in the distance. In such cases the player is distanced from the action, and is more like a spectator.

There could be D-G cutscenes where the player's character can choose where to stand amongst a number of people, as a conversation plays out. In such situations, it doesn't matter where the player character is standing in the scene, because all that matters is that they hear the dialog. In such cases the player is more of a passive spectator of what's going on.

That the player tends to be relegated to a passive spectator is why D-G cutscenes are so often focused on auditory details, like conversations between characters, or messages over loudspeakers. The player can be a passive participant in these. Because audio is (somewhat) omni-directional, it doesn't matter exactly where the player character is facing and positioned at any time during the D-G cutscene -- they'll still be able to hear the details.

Imagine a D-G cutscene where the player was an active participant in it. E.g. if the player is up close to another character, in defiance of them, gets shoved back, and then darts back towards them, looks quickly to the sides at the other people there, and then turns to one of them, puts a hand on their shoulder, and whispers something to them. That would have to be done with a normal cutscene (which could possibly include QTEs).

Or imagine game developers trying to make a D-G cutscene where the player is showing another around their study, pointing out different objects to them. Where the player picks up a rare bottle of whiskey and shows that to the other character, handing the bottle to them. This couldn't be done either, if the player character is being controlled in the usual fashion.

One possible way of making the player character's performance include what was required for such scenes would be forcing them to do exactly what is required (and thus give up freedom of control).

A control scheme that would give the player control over all those nuances would have to be very complex. Perhaps it could be done with a mouse and keyboard, using all the keys on the keyboard, and use of those keys in combination with modifier keys. But then it'd be too complex for any normal person to use. VR with facial and body tracking could be a more realistic option.

Or something like QTEs could be used, though this wouldn't really give the player control over the character, it would be more just the player entering in the correct inputs to satisfy each of the QTEs.

A potential option would be to give the player the freedom, and have the other characters respond to the player's actions in a realistic way. The problem with this is that it'd be too complex to script in the different branching possibilities, and we don't have the means to simulate the responses of the other characters. And even if we could simulate them, we'd also have to simulate the plot implications of some of the possibilities. And even if we could do that, many of those other possibilities would result in weaker storytelling (because strong stories don't lie near to each other in story space).

The player's character having to play a fairly passive role in D-G cutscenes means that such cutscenes could not be used for the storytelling of most scenes from movies/TV -- scenes that don't usually contain a passive role. Given that, it's fair to say that D-G cutscenes generally contain weaker storytelling than can be present in non-interactive cutscenes.

To me, the way the D-G cutscene details are always happening at a distance from the player character makes them feel a bit like a theme park attraction. As if the player is moving along through a set path, and there's scenery/sets and animatronics on either side of the path. The start of Half-Life 2, and the metro stations where the people live, in the Metro games, have this feel to me. I feel there's an uncanny valley feeling to the interactions with NPCs in during-gameplay cutscenes.

 

In conclusion, non-interactive cutscenes enable strong visual-action storytelling, but are at odds with gameplay. Interaction allows QTE & Choice cutscenes, and During-Gameplay cutscenes. QTE & Choice cutscenes involve weak gameplay as part of strong visual-action storytelling. During-Gameplay cutscenes enable stronger gameplay, at the expense of weaker and more-constrained visual-action storytelling.

Sunday, January 21, 2024

Interactive storytelling: Activity-and-Choice Story-Games

In this post I'll define a genre of video games that I'll call 'Activity-and-Choice Story-Games'. These are exemplified by games like Detroit: Become Human, Life is Strange: True Colors, The Wolf Among Us, and Florence.

These games can be seen as an evolution[1] of Point-and-Click Adventure Games, in which the puzzles of those games are replaced by 'activities' the player undertakes and choices they have to make. We'll get to what activities and choices are in a moment. These changes have been made to make this genre more story-focused.

[1] that B evolved from A doesn't automatically make B superior to A. Bs could be simpler than As, not necessarily more complex. Evolution does not have a direction. So we're not saying this genre is superior to Point-and-Click Adventure Games.

Here are some other prominent examples of 'Activity and Choice Story-Games' (A&C Story-Games): Life is Strange 2, Telltale's The Walking Dead[2], and Tales from the Borderlands

[2] The first season of Telltale's The Walking Dead is a transitional form, that still involved some puzzles. The subsequent seasons of it became more pure A&C Story-Games.

Traditional Point-and-Click (P&C) Adventure Games are puzzle-focused. The player solves puzzles to progress through the game and its story. The player can pick up objects, look at objects, use objects (e.g. turn on a light switch), or use one object on another (e.g. putting a key in a lock, to open the lock). These actions are primarily means for the player to solve the puzzles.

This affects the kinds of stories that can be told in P&C Adventure Games. One, what the player is primarily doing is solving puzzles, and thus that is the main thing the player's character is doing in the game's story -- going about, solving puzzles. And two, the story can only[3] progress after a puzzle is solved. This second effect slows down the pacing of the story. It might take the player quite some time to solve a puzzle. All of these things restrict the kinds of stories that can be told in P&C Adventure Games.

[3] This isn't strictly true, some story events can happen while the player is going about trying to solve a puzzle. However, that can't be a story event that would change the game world in any way that would prevent the player from solving the current puzzle. Such story events are uncommon.

The story has to be one in which the main character is going around solving puzzles, where it still has to make at least some sense if the story is put on hold for potentially a fair while, till the player has solved the puzzle. The focus on puzzles thus detracts from the ability of these games to tell a story.

Activities

In Activity-and-Choice Story-Games, one of the elements that replaces puzzles is 'Activities'. An 'activity' is some simple task that the player needs to undertake, where there is minimal challenge for completing the task. An activity is there to provide the player with ways to interact with the game, and activities tend to be story-focused.

Activities are usually straightforward tasks, and present either no or minimal challenge. In many of the games, the player will be told what the current activity is (perhaps in the form of the activity's goal or as an item in a to-do list).

Here are some examples of activities, from the early parts of Life is Strange 2, in which you play as a teenager, Sean:

  • After coming back home from school, and chatting with a friend on a deck outside the house, you go inside. Your dad and little brother are there, you have a conversation with them, and your dad says that you have to decide whether your little brother or him gets the last chocolate bar. You get to be the judge making the decision of who gets to have it (you can also choose to have it yourself).
  • That night you're going to go to a party with some classmates, and before heading out, you need to grab some stuff to take over: drinks, snacks, a blanket, and some money for supplies. You have to walk around the house, and get the items. And you have options for what to choose as the drinks to bring, and what to choose as the snack. You have the option to take some of your dad's beer, and some money from your dad's wallet, or not.
  • Later you and your brother are on the run, and need to find some place to stay overnight. You come across a national park area, and the activity is going down the paths from the entrance to where the campsite areas are. During this there are conversations and some things you can look at. And after you've found some shelter, there's the activity of collecting some firewood for a fire. That's just a matter of exploring around and finding pieces of suitable wood.
  • Then there's lighting the fire, then sitting there engaging in conversation with each other.
  • Later on in the game, you stop at a roadside gas station and store, and get some supplies for the trip there. There's various options for what to buy, and you've got limited cash, so you have to think about what are the priorities.
  • Sometimes an activity can be as simple as walking from location A to location B. Along the way there may be things the player can look at, and people they can talk to.


Activities are usually something relevant to the story, and they're more like scenes in a movie than any interactive parts of P&C Adventure Games are. Often they're 'slice of life' kinds of details.

Another example of an activity, from the start of Life is Strange: True Colors, is unpacking your bag when you arrive from out-of-town to the place you're staying at. You unpack the bag item by item, and for some items she might reflect on the role it had in her life, or some other items are textual (like some letters), which also provide a bit of a window into her life. Thus it provides some backstory. At the end of the process she goes to put her bag under her bed, and in doing so finds the guitar that her brother had gotten her as a present, which leads to a cutscene of her taking it out, commenting on it, and playing a bit of it.

While undertaking an activity the player may be able to look at things in their surrounds (including other characters), and the player character may comment on them in a way that reveals their thoughts on what is happening at the current point in the story. The player may also be able to talk to characters in their surrounds, about things relevant to what's currently happening in the story.

Activities may be very basic, easy-to-solve puzzles. That is, puzzles that can be quickly solved, and which present very little challenge to the player. Because A&C Story-Games de-emphasise puzzles, if they do contain some puzzles, they will (in addition to those puzzles being easy) contain far fewer of them than a P&C adventure game does.

Activities may also involve Quick-time Events (QTEs), to add small moments of interactivity and/or challenge in cutscenes. Successfully completing a QTE is usually easier than solving a puzzle (and of course it is a quite different kind of difficulty), and QTEs are compatible with a story-focus, because they can be embedded in cutscenes.


Choice

The second component of A&C Story-Games is choice.

We've already mentioned how choice can play a role in Activities. For example, in Life is Strange 2, when Sean and his brother are on the run, they come to a gas station and store, and with their limited money they have to choose which supplies to buy (if I recall correctly, there is also an option to steal some items). The player may also have choices in what to say to other characters, when they're talking to them.

These games may also have more explicit choices, where the game presents the player with 2 to 4 options for what to do, in the vein of the "Choose Your Own Adventure" books. For example, there may be an argument going on between person A and person B. Should your character

  • Take person A's side
  • Take person B's side
  • Stay out of it


The first season of Telltale's The Walking Dead (2012) innovated on this formula in a way that proved to be very successful, and which also helped to popularise the use of choice in games. The main part of this innovation was to make the choices timed.

They applied this formula to both "Choose Your Own Adventure"-style choices and dialogue choices. The player would only have a very limited amount of time to make a decision. If they failed to make a choice in this time, it'd either result in inaction from their character, or them choosing a default action.

Second, they made the choices very difficult -- where there were pros and cons to all the options.

Having to make a quick decision about a difficult choice turned out to be quite immersive. It forces the player to put themselves in their character's shoes.
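The timed-choice mechanic described above is simple enough to sketch in a few lines. This is just an illustration of the rule "pick in time, or fall back to inaction or a default", not how Telltale implemented it; the class name, option labels and time limits are all invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TimedChoice:
    """A menu of options that must be answered within a time limit."""
    options: List[str]
    time_limit: float               # seconds the player has to decide
    default: Optional[str] = None   # None stands for inaction

    def resolve(self, picked: Optional[str], elapsed: float) -> Optional[str]:
        """Return the player's pick if it is valid and on time, else the default."""
        if picked in self.options and elapsed <= self.time_limit:
            return picked
        return self.default


# An action choice with no default: running out of time means inaction.
horde = TimedChoice(["Save Joe", "Save Jackie"], time_limit=5.0)
in_time = horde.resolve("Save Joe", elapsed=3.2)
too_late = horde.resolve("Save Jackie", elapsed=6.1)

# A dialogue choice with a default: not answering means staying quiet.
reply = TimedChoice(["Agree", "Object"], time_limit=4.0, default="Say nothing")
no_answer = reply.resolve(None, elapsed=4.0)
```

Notice that not choosing is itself an outcome here, which is part of what makes the timer put pressure on the player.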


Since that first season of Telltale's The Walking Dead, the use of choice in story-focused games has become quite popular. Interestingly, though, the use of timed choices hasn't been widely taken up. For example, the Life is Strange games have choices, but they're not timed.

Sometimes the choices are about personal expression rather than being difficult choices. For example, early in Life is Strange: True Colors, your character can choose which of two potential bad-guy characters will be the villain for the LARP another character is setting up.


A&C Story-Games replaced puzzles with activities and choice to make the games more story-focused. Let's look at the consequences of this for the storytelling in these games.

Because the focus in P&C is on puzzles, the interactable objects in the environment, the descriptions, and character conversations, are all primarily oriented towards solving puzzles. That is, to give the player clues for solving puzzles, and the means to solve puzzles. That is not to say that there's no story focus regarding these things. Only that such has to be secondary.

If puzzle-solving isn't put first, then the details presented to the player could hinder their ability to solve the puzzles. E.g. providing details that don't at all help with the puzzles may mislead the player about the solutions to the puzzles.

Whereas in A&C Story-Games, the objects in the environment, looking at them, and talking to characters, can all have a story-focus.

When the player looks at an object, their response may reflect their state of mind at this point in the story. It may also involve their thoughts about a character who has some connection to that object (maybe it's something they own). When the player talks to other characters, it is more about story-relevant details, than to get information for solving puzzles.

We could, if we wish, consider the acts of talking to other characters, and looking at objects, in A&C Story-Games, to be kinds of activity.


What we're calling "Activity-and-Choice Story-Games" would most often be called "Narrative Adventure Games". However, that latter term is often used to refer to a much broader set of games than A&C Story-Games. Some of the kinds of games that "Narrative Adventure Games" gets used to describe include: P&C Adventure Games that have more of a focus on story than the typical P&C Adventure Game, like Kathy Rain; Walking Simulators, like Gone Home and Everybody's Gone to the Rapture; Visual Novels, like Doki Doki Literature Club; and film-like games that involve only choices, like Last Stop.

(Why do we not consider Walking Simulators like Gone Home to be A&C Story-Games? That game doesn't have any choices, though there's no reason why a Walking Simulator couldn't have choices. It's that pure Walking Simulators, like those games, don't have anything like the Activities we have been talking about. In them, you walk around, perhaps look at objects, open drawers and doors, and perhaps find audio-logs or audio-visual-logs. In literal terms these are 'activities', but they are not activities where the player is given a particular goal, and where the activity(s) that satisfy that goal have a particular role within the game's story.)

Note that there are also "Activity Story-Games", which only include activities, and don't involve any choices. These are much less common. The only example that comes to mind is Rainswept.


Having easy/fewer (or no) puzzles means that the narrative in A&C Story-Games is less constrained than it is in P&C Adventure Games.

  • 'Solving puzzles' will play little or no part in what the player's character is doing during the story.
  • There will be less "dead time" while the player is trying to figure out how to solve puzzles, so the story's pacing won't be slowed down as much. Which means the story isn't constrained to be one where it "makes sense" for there to potentially be large amounts of time between each story beat.


The shift from puzzles, to activities and choices, means that each scene in the game can be a specific narrative situation. This is actually quite different to P&C Adventure Games. In them, there tend not to be specific scenes during gameplay segments, but rather a set of locations accessible to the player, for the current set of puzzles they can work on.

Here are some examples of situations the player might find themselves in, in a P&C adventure game. They've been locked inside an apartment, and have to figure out how to get out. Or they need to find a way to distract a guard, so they can go in and talk to the person in the office. In story-terms, these are fairly low-level kinds of details. They're not particularly meaningful.

As we've mentioned before, with a puzzle-focus, the story has to be put on hold until the player has completed puzzles. And because the player usually has multiple puzzles open to them at any point in time, the story can't advance much until the current set of puzzles have all been solved. So there can't be much in the way of story-relevant happenings while the player is working on puzzles. There can't even be much that's strongly associated with the story, that happens within, or upon the completion of, most of the individual puzzles -- only on the completion of the set of puzzles after which the story can move forwards. The ability to work on multiple puzzles at once is good for gameplay, just not so great for story.
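The gating described here can be sketched very simply: the story only moves on once every puzzle in the current set is solved, no matter what order the player tackles them in. This is an illustrative sketch only; the class and the puzzle names (borrowed from the examples above) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PuzzleChapter:
    """A set of puzzles that are all open to the player at the same time."""
    solved: Dict[str, bool] = field(default_factory=dict)

    def solve(self, puzzle: str) -> None:
        """Mark one of the chapter's puzzles as solved."""
        if puzzle not in self.solved:
            raise KeyError(f"unknown puzzle: {puzzle}")
        self.solved[puzzle] = True

    def story_can_advance(self) -> bool:
        """The story is on hold until every puzzle in the set is solved."""
        return all(self.solved.values())


chapter = PuzzleChapter({"escape_apartment": False, "distract_guard": False})
at_start = chapter.story_can_advance()

chapter.solve("distract_guard")
after_one = chapter.story_can_advance()   # solving one puzzle changes nothing

chapter.solve("escape_apartment")
after_all = chapter.story_can_advance()
```

Solving a single puzzle doesn't change the story state at all, which is why little that's strongly tied to the story can hang off the completion of an individual puzzle.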

And this gives a "genericness" to lots of the details in the game. The object descriptions and conversation choices need to be generic. The player's predicament, for each puzzle open to them at that time, has to be "weak", in that it will also let them work on other puzzles at the same time. If the player has multiple puzzles open to them at once, then when they're walking around in and between locations they're in the same generic holding pattern, until all of those puzzles required for advancing the story have been solved.

With an activity-focus, each "scene" in the game can be about a certain story event/situation, in a specific moment in time within the story. Within that scene the player is given one (or more, though it's usually one) activity to undertake (and potentially at some points will have to make some choices). That activity, and the player's interactions, have a stronger story-focus in A&C Story-Games. So story occurrences can be happening within the scene.

 

Because of how activities and choices allow the games to be much more focused on story, A&C Story-Games are probably what most deserve to be called interactive movies. There are full-motion video (FMV) games that are often considered interactive movies, but these tend to focus only on choices, without any of the activities. As a result, they tend to be shallower experiences.