Wednesday, January 29, 2025

Wolf Tivy and Christopher Sommer on what it takes to do novel work

Two pieces that resonate with my approach to life and to what I'm working on.

The first is Quit Your Job, by Wolf Tivy (Jan 2022). I came across it via this tweet from Kevin Kelly.

In society, there are always new problems to address and new things to create to make things better. You can live your life within other people's existing visions of problems to address and things to create. You can chase money and use that money to consume. But there's also the important task of creating a vision -- defining what the new problems to address and things to create are. And then going after that vision.

This essay is about what it takes to create and execute on such a vision, and how that really can't be done within the confines of a job. You need extended periods of freedom to explore. Thus, "Quit Your Job".

(Doing so is a good way for a person to use wealth they've obtained. It may be difficult for people with a family, unless they're rich. But living cheaply for an extended period, while working things out, is easier than most people think. And it's definitely possible if you're willing to be poor for an extended period, as I have been.)

Here's a sample of it:

The key implication is that while you have not yet found the unique opportunity that will be the engine and purpose of your empire, you have to adjust your sense of value. Value is very legible within a clear plan to reach a clear objective. But you cannot pursue interesting novelty—things that no one else is doing or which you have never seen before, or the little threads of nagging curiosity or doubt—by chasing along known direct value gradients. But that’s where the treasure is. That’s how you will find the place where you need to build. To get the biggest and most interesting payoffs, you have to start by chasing merely interesting novelty in an open-ended way.

Working even a good job cramps your sense of possibility, imposes narrow objectives, and eats away at the little things that could grow into big things if they weren’t so oppressed by the rigors of existing structure. I’ve seen this with my friends, in how they are full of ideas and adventurous spirit a few months after I convince them to quit their jobs. The world is full of ideas and opportunities to explore, but it takes time outside of structure to even adjust your eyes to the landscape of possibility. You are cramped by your job, unable to make the class of investments that is necessary for a life beyond the existing tracks.

If your role in the universe is structured work within order found and built by someone else, those off-road investments are pointless. This conventional work is usually more immediately valuable than anything you could do on your own and it does not require much open-ended exploratory leisure. This efficient pursuit of predictable value is the quiet dignity of the mass of working people. But if we are to solve the bigger structural, spiritual, and intellectual problems which aren’t addressed by existing institutions, someone needs to be exploring off of the established road, where there is a high probability of failing to accomplish anything at all, and a significant probability of discovering and exploiting the next big breakthroughs.

[...]

If you have the resources to spend some time exploring, if you are on to interesting threads of novelty that few other people have, and if you have the spirit to tighten your belt, throw out your map, and explore off-road, then your real job is to do so. It is a grave sin to neglect that kind of cosmic duty. But many more people have the means and privilege to quit their tracked careers than ever realize it and act on it. You need far less than you think to live in monk mode and pursue this kind of exploration. What this means in practice is that at some point far before you are or feel ready, you need to quit your job.

[...]

True ascent beyond the kept life comes only from taking bold, determinate leaps of faith on real constructive projects. [...]

To make such bets you must be indifferent at some level to whether you end up a king or a monk, or even dead. The indeterminate hedge-trader with his logarithmic utility function assigns infinite negative utility to ruin. The man of action serenely regards ruin as the most likely possible outcome, mitigates it where he can, and leaps anyway. He rejects the comfortable half-existence of drifting with the indeterminate human tide and manifests his bold vision into the world. Ruin is largely an illusion in the modern world anyway. If you lose everything you own, you generally still have your network and skills. Even a nominally risk-loving financial utility function is overly conservative in practice because it’s hard to lose these intangible assets.

Life necessarily involves these fatal leaps of faith—bets which you have no certain way of knowing will work out but which define your whole existence and require your intense effort. [...] The highest returns of life and glory come from taking hard bets on your best visions of the future and being able to make them work through dedicated struggle.

[...]

No one can or should be the lone overman who defines all value for himself. We need to cooperate with and defer to each other to make society possible. But even if we individually can only bite off a small piece of the overall purpose structure of our society to manage ourselves, we need to actually do that far more than we do now. For any given question of ends, someone, somewhere, must be taking responsibility. Someone must make that leap of faith to define ends for the rest of us to work towards. No one else is going to do it. Why not you?

[...]

to actually accomplish at your full potential, you have to start doubling down on particular bets long before you know that you can follow through. You won’t see the whole path when you begin. You will have no way of knowing whether it exists, or whether what you are pursuing is even possible. If you have more certainty than that, you aren’t aiming high enough. You have to bet your life on faith that the universe will provide if your vision is good enough.

[...]

Perhaps this is why our society has been so stagnant and uncreative in some ways for the past 50 years. We chose the path of comfort, certainty, measurable progress, and indeterminate hedging of bets. In our cowardice, we turned away from the uncertain leaps of faith of collective struggle after fatal ends that would have demanded us to truly live.

Read the full essay here.

 

The second piece is by former US National Team gymnastics coach, Christopher Sommer. It's from an email he sent to Tim Ferriss that was included in Tim Ferriss's 5-Bullet Friday email of 28 Sep 2024.

We live in a world where people try to manage everything, to make everything productive and efficient. There are deadlines for most things. Things should be done within certain time frames. Not working to deadlines is seen as lazy, unfocused, or sloppy.

Yet this is not true for everything. Developing skills and undertaking creative tasks don't work this way. Trying to fit them into time frames will tend to lead to failure. They take how long they take. This especially applies to the tasks that the first piece, by Wolf Tivy, was about: creating a vision (defining new problems to address and things to create) and chasing after that vision.

The approach Sommer describes really resonates with my own approach to research.

Here's the full text of it:

In fact, this impatience in dealing with frustration is the primary reason that most people fail to achieve their goals. Unreasonable expectations timewise, resulting in unnecessary frustration, due to a perceived feeling of failure. Achieving the extraordinary is not a linear process.

The secret is to show up, do the work, and go home.

A blue collar work ethic married to indomitable will. It is literally that simple. Nothing interferes. Nothing can sway you from your purpose. Once the decision is made, simply refuse to budge. Refuse to compromise.

And accept that quality long-term results require quality long-term focus. No emotion. No drama. No beating yourself up over small bumps in the road. Learn to enjoy and appreciate the process. This is especially important because you are going to spend far more time on the actual journey than with those all too brief moments of triumph at the end.

Certainly celebrate the moments of triumph when they occur. More importantly, learn from defeats when they happen. In fact, if you are not encountering defeat on a fairly regular basis, you are not trying hard enough. And absolutely refuse to accept less than your best.

Throw out a timeline. It will take what it takes.

If the commitment is to a long-term goal and not to a series of smaller intermediate goals, then only one decision needs to be made and adhered to. Clear, simple, straightforward. Much easier to maintain than having to make small decision after small decision to stay the course when dealing with each step along the way. This provides far too many opportunities to inadvertently drift from your chosen goal. The single decision is one of the most powerful tools in the toolbox.

Saturday, March 09, 2024

Broadening the notion of affordances

In the design of physical objects and user interfaces, an object’s “affordances” are the ways its appearance suggests to the user how it should be interacted with.

Different door handles have different affordances.


In this post, I want to broaden the meaning of “affordances”.

Multiple tools may all be used for the same kind of task. E.g. for recording textual information, there are pen and paper, a word processor, a whiteboard, and voice notes.

While the standard notion of affordances in design is “how the object suggests it may be used, before it is used”, the notion we’re describing here concerns how the object, while it is being used, shapes how the user does the task.

A whiteboard suits getting info down fairly quickly, in bullet points, and for drawing arrows between items to show their relationships.

A word processor encourages writing in full sentences. Because we can easily see what we’ve already noted down, it also encourages us to write in sequences of sentences and paragraphs.

Pencil and paper seems to be partway between the free-form nature of the whiteboard and the more regimented form of the word processor.

Voice notes are more focused on the present moment. You can’t see what you’ve already said, and it’s more effort to go back to hear the early part of the note. They’re good for brainstorming.

The traditional notion of affordances covers how the design of an object affects a user’s expectations about how to use that object, before they actually use it. We’re expanding this notion to also include how an object’s design shapes the way it is used by a user.

Friday, January 26, 2024

Interactive storytelling: Fictional realism

This post is about the notion of "fictional realism", which I am using to mean a fictional account that is nonetheless meant to 1) accurately portray the time and place where it is set, and, optionally, to 2) focus more on this portrayal, than on presenting a story to the reader/viewer.

Examples of fictional realism include the TV shows "The Wire" and "The Sopranos", the movie "Casino", the novel "One Day in the Life of Ivan Denisovich", and the game "Attentat 1942" (Steam page). Most of these examples have a fairly strong storytelling focus, so don't fit the optional second criterion.

A work of fictional realism may be intended to convey the same kinds of details that a non-fiction work may convey, but to do so using fictional characters (or fictionalised versions of real people), in fictional situations (or fictionalised versions of real situations).

 

"Interactive storytelling" can be used for fictional realism. One of my interests is in using interactive storytelling for exploring a strong form of fictional realism, meeting the second criterion described above (focus more on the portrayal of what life was like, than on presenting a story to the reader/viewer), that presents what it'd be like to be a specific character in a specific situation, so that the player can learn about the player character: what they do, how they do it, and how they react to situations. Take a worker on a sailing ship in the 17th century spice trade. What was their work like? How did they perceive their job: as exciting, as a journey of exploration? What were their relationships with the various other sorts of people on the ship?

Interaction could help place the player in the character's shoes, to help immerse them in the character's world. I'd like to use interaction to let the player experience what it's like to be that character. There is the dictum 'show, don't tell'. I want 'experience, don't show or tell'.

I have some ideas about how the interactive storytelling could work so as to achieve this, though I won't get into such details in this post.

Fictional realism can be used in an educational context. Or be an enriching kind of entertainment. By giving the player interactivity, and letting them experience what it's like to be that character, we hope we can make a compelling way to experience fictional realism.

 

I think that behavior-psychology congruence is a core requirement for fictional realism, and I'll explore this in a future post. In brief, I want the player to control the character such that the character acts in a realistic way.

 

I've also written about the notion of 'strong storytelling', which we can think of as effective or good storytelling. Strong storytelling has a strong focus on plot, and on moving the plot forwards. Thus it will tend to cut out details that aren't relevant to the plot, including the sort of 'day in the life' details I'm interested in, in fictional realism.

Compared to strong storytelling, fictional realism is more like real-life. Real-life tends not to be like a story. In stories, all the details are there to serve the overall goals of the story, like its climax, conclusion, and themes. In real life, things happen, but it's just one thing after the other, and they aren't there to result in some climax and conclusion.  

There is, however, no reason why a work of fictional realism couldn't have a plot. It could. It's just that the fictional realism details will dilute the story details, thus making it a weaker form of storytelling.

Strong storytelling and fictional realism are just different forms, each with their own pros and cons.

Interactive storytelling: Behavior-psychology congruence

We wish to introduce the notion of a fictional character's behavior being congruent, or not, with their psychology. This will help us, in subsequent posts, to look at how the player having control affects the storytelling in interactive storytelling.

We use 'psychology' to mean two things: the character's makeup and circumstances.

A character's makeup is their nature, their personality, their character, and how they think about situations. Such details are a result of both nature and nurture, that is, of how their character is shaped by their life experiences. It includes how their personality might be changed by brain damage or a brain tumor, or how medications they are taking affect their personality.

By a character's circumstances, we mean what has been going on in their life. Perhaps they have had a stressful few weeks at work. Maybe a loved one died a few months ago, and they are going through grief. Or maybe they started a new relationship and they are happy as a result.

I don't think there's a hard-and-fast distinction between a character's makeup and their circumstances. These are just rough categories.   

In real life, a person's behavior is always congruent with their psychology (their makeup and circumstances). In fictional works, we almost always strive to make a character's behavior congruent with their makeup and circumstances, though we may fail to achieve this. So there can be a lack of congruence.

Behavior-psychology congruence doesn't just apply to "realistic" characters. It applies to all characters, even wacky and "out there" cartoon characters. Wile E. Coyote from the Warner Brothers cartoons wants to catch the Roadrunner, and sets up traps for this purpose. Despite many failures, he's never one to give up trying. The Roadrunner, in turn, likes running fast along roads, and seems to take joy in making Wile E.'s traps backfire on him. These characters are not realistic; they're not at all like real coyotes and roadrunners. But Wile E.'s psychology is to want to capture the Roadrunner, and to set up traps to do so, and so Wile E.'s behavior is congruent with that.

If a character has a cartoony makeup then their behavior should be cartoony as well, and good writers will make sure their behavior is congruent with their makeup and circumstances.

If there were scenes where Wile E. was sincerely explaining to other characters that he has been vegan since he became an adult, because he believes no animals should be harmed, then this would not be congruent with his established psychology.

 

In fiction, a character's behavior may be incongruent with their psychology because of poor writing, poor acting, or poor directing. We can imagine a very inexperienced writer setting out to write a novel. Earlier in their draft they gave the main character a gentle personality, whereas later on in the draft they gave them an aggressive personality, without realising this change had happened. This leads to inconsistencies in how that character is portrayed, with no explanation in the novel of why the character is different.

We may have a philosophical objection to this talk of incongruence between a character's psychology and their behavior. If all we as viewers or readers see is the character's behavior, and we infer their personality from that, then it would seem to be impossible for there to be such incongruence. Any apparent incongruence would just seem to be incongruence because we didn't yet know enough about the character's psychology. The philosophical objection is that we can only know behavior, so behavior is what defines our picture of the character's psychology, thus /by definition/ there can never be incongruence between them. Any /apparent/ incongruence is simply because we have formed an incorrect picture of their psychology, by jumping to incorrect conclusions about it based on the prior behavior of theirs that we've observed.

If a real person's behavior seems incongruous with their psychology, then it is our understanding of their psychology that is wrong (or incomplete). But here we are not talking about real people, but about characters in fiction, and that fiction may be written by a beginner or an untalented author.

In fiction (novels especially), the author may explicitly describe aspects of a character's personality. This way, that character's behavior can be incongruous with the stated aspects of the character's personality, if the writer is inexperienced or otherwise not very good.

But even when the character's personality is only inferrable from behavior, it is still possible for the two to be incongruous. It can be possible to infer psychological traits from behavior, and so we may have two sets of behavior B1 and B2, which reflect psychological traits P1 and P2 -- and P1 and P2 may conflict. A character may be terrified of speaking in front of their class at one point, and yet later inexplicably be supremely confident speaking in front of a large group. We're not saying it's impossible for there to be such a transformation; we're talking about the case of a story that has not included any details explaining such a transformation. At least one of those two behaviors is therefore incongruous with part of the character's psychology.

 

We usually expect that if the character seems to be acting incongruently with their established psychology, the story will provide an explanation of why. But it may not, and the incongruence may be a result of poor writing (or acting or directing).

 

In subsequent posts, we'll look at how interactivity, in interactive storytelling, can lead to incongruence between a character's behavior and their psychology. The basic idea is that if the player can control a character's actions, then those actions will tend to reflect the player's psychology, not the character's. Using the terminology of those future posts, we'll explain why behavior-psychology congruence is necessary for strong storytelling and fictional realism.

Thursday, January 25, 2024

Interactive storytelling: Environment and storytelling

In a video game, the environment includes the locations the player sees as they traverse the game world. The environment includes the objects they can interact with, such as objects they can pick up and look at, doors they can open, lights they can switch on, etc.

We'll look at the ways a game's environment can contribute to the storytelling in interactive storytelling like video games. Then we'll look at how effectively each of these ways can contribute to storytelling.

 

The setting

The game environment provides the setting for the story. A story about an ad executive living in New York City obviously is set in New York City. A story may lean heavily into its setting, and concern the nature of that setting -- like one that concerns the culture in New York City. Or the story's setting can be more of just a backdrop, where the same story could potentially be set in a number of different places, without changing much about it.

 

The stage necessary for story events

The game's environment may contain details necessary for enabling certain story events. A dense town or city, with suitably-small gaps between the rooftops enables a story event in which the protagonist is chased by several bad guys over rooftops.

 

Atmosphere and world building

An environment of moss and plant-covered ruins could contribute to a post-apocalyptic atmosphere. Posters plastered around a city, instructing the populace on how they should behave, could contribute to the world-building and atmosphere of a story set in a fascist country. Dim lighting, with a yellowish hue, along with strange sounds, could give an alleyway an eerie atmosphere.


Characterisation

The environment can contribute to characterisation. If a character's house is neat and tidy (or very dirty, and messy) that will convey something about their personality. As could the paintings we find in their house, or the entries in the diary hidden under their pillow.


The main plot

The environment can contribute to the main plot. The space under a bed may contain a piece of evidence that conclusively shows who's guilty of a murder. A character finding it will be a major plot point.

 

Backstory

The environment can contribute to backstory. Backstory concerns past events, prior to the events the player is currently experiencing in the main story-thread. Those events could have occurred before the start of the main story thread. They could have happened a long time ago. Such distant backstory includes 'lore', that concerns historical details of the setting. Lore may be found in tomes that the player finds, or some runes written on some ruins.

We'll also take 'backstory' to include details that may have only recently happened. For example, halfway through the story the player might receive details about something that had happened an hour prior -- that is, an event that occurred well after the point in time where the story began. We'll call this backstory, too.

Environmental details that contribute to backstory include artefacts like diary entries, letters, memos, and books.

Audio-logs are a means, used in video games, of filling the player in on backstory. Audio-logs are sound recordings that the player can listen to. They may be physical objects, like a tape recorder, that the player can find and play. Such an object might hold a sound recording of a diary entry or voice note. Or an audio-log recording might automatically start playing when it is 'triggered' by the player's actions -- like if the player enters a particular room, or perhaps opens a particular person's locker.
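To make the 'triggered' case concrete, here is a minimal sketch of how entering a particular room might start an audio-log. It is only an illustration under my own assumptions, not how any of the games mentioned in this post actually implement it: the names AudioLog, TriggerZone and update_triggers, the clip filename, and the choice of a rectangular zone that fires only once are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AudioLog:
    title: str
    clip: str  # hypothetical asset name, e.g. a sound file


@dataclass
class TriggerZone:
    """A rectangular region of the game world that starts a log when entered."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    log: AudioLog
    fired: bool = False  # assumption: each log plays at most once

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def update_triggers(zones, player_x, player_y):
    """Return the logs that should start playing for the player's current position."""
    started = []
    for zone in zones:
        if not zone.fired and zone.contains(player_x, player_y):
            zone.fired = True  # remember that this log has already been triggered
            started.append(zone.log)
    return started


# Usage: one zone covering a (hypothetical) locker room.
memo = AudioLog(title="Locker room memo", clip="memo_01.ogg")
zones = [TriggerZone(0.0, 0.0, 5.0, 5.0, memo)]

outside = update_triggers(zones, 10.0, 10.0)  # player elsewhere: nothing starts
entering = update_triggers(zones, 2.0, 3.0)   # player enters: the memo starts
lingering = update_triggers(zones, 2.5, 3.0)  # still inside: it does not restart
```

The fire-once flag is a design choice: without it, the log would restart on every update while the player stands in the room. A game that wants a log to be replayable would more likely treat it as a physical object the player can pick up and play.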

Audio-logs were popularised in the first-person shooter BioShock (2007), which used them as the main means to tell its story. Dear Esther (2012) and Gone Home (2013) are games entirely focused on exploring an environment and finding audio-logs, in order to piece together their stories. Those latter two games are key examples of the genre that came to be known as "walking simulators".

There are also what I've termed audiovisual-logs, used in games like The Vanishing of Ethan Carter (2014), where they only play a small role, Everybody's Gone to the Rapture (2015), and Tacoma (2017). Audiovisual-logs allow the player to see an audiovisual recreation of some past event, in the location where that event occurred. The player can walk around and through this recreation while it is playing.

 

(an audiovisual-log from Everybody's Gone to the Rapture)

(an audiovisual-log from Tacoma)


The following links are to the original videos those gifs were created from. Everybody's Gone to the Rapture, and Tacoma.

Audiovisual-logs are the core means of storytelling in Everybody's Gone to the Rapture and Tacoma. In The Vanishing of Ethan Carter, they only play a relatively small role[1].

[1] In that game, there are certain puzzles, each of which concerns finding out how a particular character died. At the completion of each one of these puzzles, the player is shown a cut-scene showing the full details of how the character died, and then after that the player can follow a floating light. When they get to the light's resting place, they can see a short audiovisual-log of what came next, after the overall happening they've just seen the cutscene of.

For more details on audiovisual-logs, see the audio-logs link, above.

There are also what I've termed "frozen-moment-logs". The only game I know of that uses these is The Vanishing of Ethan Carter (2014). The player will come across places where they can unlock a frozen-moment-log, consisting of a static 3D image of some past occurrence (backstory) there, that the player can walk around and view from different angles. In that game, these logs appear as part of larger puzzles.
 

(a frozen-moment-log from The Vanishing of Ethan Carter)

 

Another technique, where the environment contributes to backstory, is what I'm calling "Inferred Backstory". This is where environmental details enable the player to infer some past events.

In a post-apocalyptic game, the player may come across a dilapidated house, and in one of its bedrooms find two mummified corpses in the bed, frozen in an embrace. On the bedside table may sit a framed photo of a happy couple, along with an empty bottle of sleeping-pills.

From these details, the player may infer that the couple was once happy, and loved each other, but ended up finding their circumstances untenable. They may imagine the couple coming to this realisation, taking the sleeping pills, and tearfully embracing each other in the bed as they awaited their fate.

Inferred backstory is often like a little puzzle, where there are clues and the player infers the backstory from them. Usually it's a very simple puzzle.

'Inferred backstory', as we are using the term, covers any case where the player infers prior details from present-moment details. That can include when a player immediately and effortlessly infers some prior details, including details that recently occurred. E.g. they're in a forest and come across some fresh large-animal droppings. They'll immediately infer there was recently some kind of large animal in this place. Some other examples where the player will make an immediate and effortless inference: if the player comes across the charred remains of a fire, or doors that have been broken open.

And of course, by inferring details that have recently happened (e.g. a large animal being here) we may also infer present-moment details -- e.g. that the large animal may be nearby us right now. 

Carson[2] calls these cases of inferred backstory "cause and effect" vignettes, and the above description of them is based on his article.

[2] "Environmental Storytelling: Creating Immersive 3D Worlds Using Lessons Learned from the Theme Park Industry", by Don Carson, March 2000


To effectively communicate story-relevant details

This section draws a lot from Carson.
 
The environment should be designed to effectively communicate important details to the player. Details like where the player is, what sort of place it is, and where they should go next. Usually, we want the player to be able to immediately determine such details.

These kinds of concerns don't, of course, only apply to the environments in games. They also apply in movies, TV shows, theatre, and theme park rides. Think set design.

Here are some details that we want the environment to communicate to the player:

  • Which details in a location are the most important story-relevant ones
  • The atmosphere of a location, and the kind of place it is
  • What objects and features are important for the player to be aware of
  • Where the player should go next

The following can aid in that communication. We can draw attention to important details by how we arrange the objects and features in a location, and by how those objects and features are lit. We can also avoid including too much detail in the location, especially detail that's of little relevance to the story, so that we don't confuse or overwhelm the player.

Contrast can be used to heighten qualities. For example, if we make the player crawl through a narrow passageway to reach a cave chamber, that can heighten the feeling of how large that chamber is. Or, if we want the player to feel that the temple in the forest is a pristine place, we can make them experience a disordered space (like thick jungle) before they find it.

The environment can be designed to guide players as to where they need to go next. For example, in a dark area, we can place a large, well-lit object in one corner that the player will want to investigate, where investigating it brings them to the exit from this location and on to the next place they need to go.

Many of the ways that the environment can contribute to storytelling are showing rather than telling. Rather than explicitly describing the atmosphere and world building of an environment (perhaps by a character commenting on them), environmental details can show them. With inferred backstory, the player draws their own conclusion about what happened.

And, as Carson points out, the player being able to discover details like artefacts (letters, memos, audio-logs), and inferred backstory, themselves, can be a more enjoyable experience for them than them simply being told those details. (That discovery is part of the gameplay, so the enjoyment comes from the gameplay. It doesn't come from the storytelling).


Summary of how the environment may contribute to storytelling

To summarise, the environment may contribute to the following elements of a story:

  • The setting (e.g. NYC)
  • The stage enabling certain events (e.g. rooftops for a chase sequence)
  • Atmosphere and world building (e.g. moss-covered ruins, strange sounds)
  • Characterisation (e.g. a character's messy room)
  • The main plot (e.g. evidence of who the murderer is)
  • Backstory (e.g. diary entries, audio-logs, audiovisual-logs, frozen-moment-logs, and inferred backstory like the corpses in the bed)

And there are techniques that can be used to effectively communicate such elements of the story. For example, to highlight the size of a large cave chamber, make the player crawl through a small space to get to it.

 

Environmental storytelling

The reader may have noticed that we started this post with the heading "Environment and Storytelling" not "Environmental Storytelling". For these are two different things. "Environment and storytelling" refers to all of the ways that the environment contributes to storytelling. Whereas "Environmental storytelling" refers only to a subset of them.

"Environmental storytelling" is a commonly used term to describe storytelling in games. It refers only to cases where the environment contributes to backstory.

For some people, "environmental storytelling" refers to all of the ways that environmental details can contribute to backstory: diary entries, letters, memos, audio-logs, audiovisual-logs, frozen-moment-logs, and inferred backstory (like the corpses in the bed).

For other people, "environmental storytelling" has an even narrower meaning. For them, it only refers to cases of inferred backstory, and does not refer to cases like diary entries, letters, memos, audio-logs, audiovisual-logs, and frozen-moment-logs.

Both definitions of "environmental storytelling" involve environmental details that are leftover from, or reflections of, the past (in the broad sense of "prior to the current moment", not just things in the more distant past). Obviously that's the case for Inferred Backstory like the corpses in the bed. It's also the case for things like diary entries, letters, memos, audio-logs, audiovisual-logs, and frozen-moment-logs.

In this post, we'll use "environmental storytelling" in its broader sense, which includes all the ways that environmental details can contribute to backstory.


Environment and storytelling in movies and TV

Movies and TV use the story's environment (sets, and on-location shots) for storytelling purposes, and of course have done so since well before video games came onto the scene. Movies/TV and games mostly use their environments in similar ways for storytelling, except for some of the ways backstory is used.

(As well as movies/TV, theatre, theme parks, and theme park rides, all use the environment to contribute to storytelling. But this post will focus only on video games, movies, and TV).

Movies/TV can use environmental details to contribute to:

  • The setting (e.g. NYC)
  • The stage enabling certain events (e.g. rooftops for a chase sequence)
  • Atmosphere and world building (e.g. moss-covered ruins, strange sounds)
  • Characterisation (e.g. a character's messy room)
  • The main plot (e.g. evidence of who the murderer is).
  • Backstory ('environmental storytelling', e.g. diary entries, audio-logs, audiovisual-logs, frozen-moment-logs, and inferred backstory like the corpses in the bed).

One difference, regarding the techniques that can be used to effectively communicate such elements of the story, is the following. In games, the environment can be designed to guide players as to where they need to go next, which obviously doesn't apply to movies/TV/cutscenes. Whereas in movies/TV/cutscenes, the cinematography, lighting and set-design can guide where the viewer looks during scenes.

In a moment we'll get into some differences in how backstory (environmental storytelling) is used in movies/TV compared to games.


Why is environmental storytelling more common in games?

The environmental storytelling techniques (contributing to backstory) are either less common or not found at all in movies and TV. This includes things like diary entries, audio-logs, audiovisual-logs, frozen-moment-logs, and inferred backstory like the corpses in the bed.

There's a clear difference between games and movies/TV in this respect, as games often heavily rely on such environmental storytelling. In games, it's often the main form of storytelling that's used. In movies and TV, it tends to be a supplemental form of storytelling.

What is the reason for this difference?

 

Cost and ease

Movies and TV are focused on "cinematics" -- visual portrayals of actors and environments. Whereas game development companies have a primary focus on gameplay. In most cases, it requires additional resources for the game developer to be able to include "cinematics" in their games. And smaller game developers may not have the skills and/or budget for this.

Compared to cinematics, it's cheaper and quicker to add artefacts like diary entries and letters to a game. Audio-logs require hiring voice actor(s), but a game might have a total of less than 1 hour of audio-log audio, which can be recorded quickly and thus doesn't require the expense of hiring a voice-actor for a long period of time.

On the other hand, high-quality animated cutscenes take more time to develop, and require animator(s) and voice actor(s). FMV (full motion video) or motion-capture for cutscenes requires (real or virtual) sets, and hiring actors. Motion capture requires specialised equipment (either purchased or hired) and the skills to turn its output into animation. I imagine that the time and money required for cutscenes is similar to the time and money required for audiovisual-logs.

 

Visual mediums excel at visual-action storytelling

Here's a reason that environmental storytelling is used less in movies/TV. Movies and TV are primarily visual, and excel at visual-action storytelling. This is the visual depiction of action (what I referred to as 'cinematics', above). By 'action', I don't mean just things like fights and shootouts, as you'd find in action movies. I mean 'action' in a general sense, of the visual details that can be captured by a video camera. This could include characters simply talking to each other, or a tense scene where two characters are sitting in the same room, each silently trying to ignore the other.

In constructing visual action, all the tools of acting (performance), cinematography, editing, and so on, are brought to bear. Visual-action storytelling is a strong form of storytelling.

If the movie or TV show contains backstory, it's usually presented through visual action -- that is, through a flashback. Environmental storytelling also conveys backstory, but it mostly does /not/ do so through visual action. In a moment we'll examine this, and see why environmental storytelling tends to thus be a weaker form of storytelling[3].

[3] Before leaving this topic, we can note that one kind of use of inferred backstory in movies/TV is where the 'clues' are amongst the background details of scene(s), that might only briefly be in shot. Most people watching the movie/show wouldn't notice them, and they're there as interesting details or "easter eggs" for repeat or careful viewers, and/or as details for other viewers to take in subconsciously.

 

Games excel at visual-action gameplay, but not visual-action storytelling

Like movies and TV, graphical games are also a visual medium -- with the addition of gameplay. They excel at incorporating gameplay into visual-action. Consider first-person shooters, platform games, racing games, etc. However, games do not excel at incorporating gameplay into visual-action storytelling.

That's why, if a game is to include visual-action storytelling, that's done in a cutscene (effectively a little movie) that is separate to the main gameplay. Cutscenes are usually non-interactive, though they can include simple forms of interactivity like Quick-time Events (QTEs) and "Choose Your Own Adventure"-style choices. We don't have a way of integrating the player having control over a character and movie-like visual action.

In visual-action storytelling, all the tools of acting (performance), cinematography, editing, and so on, are brought to bear. But, during gameplay, when the player has moment-to-moment control over a character (like the character's movements), it's not possible to strongly exploit those tools of visual-action storytelling.

Here we'll turn our attention to cutscenes. These aren't a form of environmental storytelling, but looking at them will help convey the point that games are poor at integrating interaction/gameplay with strong visual-action storytelling.

In this earlier post, I looked at the types of cutscenes in games, and the ways interactivity and cutscenes can be mixed together.

The standard non-interactive cutscenes are strong visual-action storytelling, but they involve no interaction at all.

In QTE-and-Choice cutscenes, simple player inputs are incorporated into the strong visual-action storytelling. These are interactions like QTEs (Quick-time Events), and choices where the player can choose from a small menu of options, such as dialogue choices or choices about which course of action to take (save Billy or save Jenny, from the oncoming horde of zombies).

QTE-and-Choice cutscenes are strong forms of visual-action storytelling, however as far as interactivity goes, they contain weak forms of interactions/gameplay. The player has very limited control over a character.

During-gameplay cutscenes are how most of the cutscenes are handled in games like Half-Life 2 and the Dishonored games. The player still has some degree of control over their character, while some scripted events occur around them. The player may be able to freely turn their head to look around, or that plus the freedom to move around (within the constraints of their environment, like brick walls etc).

As that earlier post argued, During-Gameplay cutscenes have stronger forms of gameplay but weaker forms of storytelling than non-interactive cutscenes.

So none of the kinds of cutscenes involve strong visual-action storytelling along with strong interaction/gameplay.

Returning to environmental storytelling, audiovisual-logs, like in Everybody's Gone to the Rapture, present visual action, but since it is visual action that the player can walk around and through, the tools of cinematography and editing can't be brought to bear on it. Like with during-gameplay cutscenes, the player is reduced to a spectator of the events.

In summary, even though visual-action storytelling is the strongest form of storytelling in a visual medium, games are not very suited to it. To use visual-action storytelling, games have to include either non-interactive or QTE-and-Choice cutscenes, which clash somewhat with the interactive nature of games, or during-gameplay cutscenes, which have more interaction but weaker storytelling.

 

Environmental storytelling is more compatible with gameplay

That it's difficult to insert gameplay into visual-action storytelling is a reason why environmental storytelling is so often used in games.

Environmental storytelling can form part of the gameplay.

In it, the player explores the environment, and finds artefacts (like diary entries, letters, memos, audio-logs, and audiovisual-logs). A player who is not looking carefully might miss some of the artefacts. That exploration and finding is part of the gameplay.

Audio-logs can be listened to while the player is still engaging in gameplay, where they're moving around and continuing to explore.

When an audiovisual-log is playing, the player can move around, and within, the recreated visuals. The player may move around to find a good view of all the action, to follow a particular character around, or to be closer to one particular conversation (if there are multiple happening at the same time). The audiovisual-logs in Tacoma allow the player to scrub back and forth in the log, to find pertinent details.

Environmental storytelling (e.g. diary entries, audio-logs, audiovisual-logs, frozen-moment-logs, and inferred backstory like the corpses in the bed) can present a kind of puzzle to the player. Something unknown, in the past, has happened, and environmental storytelling provides some clues to its nature. It's like each bit of environmental storytelling is a puzzle piece and the player needs to figure out how they fit together, to see the overall picture of what happened. This is a very common pattern in games that heavily use environmental storytelling. For example, in BioShock, Gone Home, Everybody's Gone to the Rapture, and Tacoma. (Note that this can be done in other mediums, like movies/TV, and novels. But games more often make use of such).

So all these forms of environmental storytelling are more compatible with gameplay. Environmental storytelling brings storytelling into the main gameplay parts of the game, rather than having it be fairly distinct from gameplay, as with cutscenes.

 

Environmental storytelling is generally a weaker form of storytelling

While environmental storytelling allows storytelling to be better integrated with the gameplay, it unfortunately involves a weaker form of storytelling.

We've stated that visual-action storytelling is the strongest form of storytelling in a visual medium. This, and some subsequent sections, look at why environmental storytelling is weaker than visual-action storytelling.

To clarify, I'm saying they're generally weaker forms of storytelling, not that they're bad. They can be quite effective. However, when they are heavily relied upon, like they often are in games, that will tend to weaken the overall storytelling.

Environmental storytelling conveys backstory. I suggest that storytelling that focuses on the main story thread is, generally speaking, stronger storytelling than that focusing on backstory. This is debatable, but I think it's the reason that backstory tends to be used sparingly in most movies and TV shows.

And, if backstory is used, it's more strongly presented with flashbacks, which present the details through visual action. Environmental storytelling mostly does not convey details through visual-action. So these forms of environmental storytelling are generally weaker forms of storytelling.

Audiovisual-logs, such as used in Everybody's Gone to the Rapture, and Tacoma, are the only form of environmental storytelling that's via visual action (during-gameplay cutscenes also use visual action, but they are, like normal cutscenes, not a form of environmental storytelling). Though, as mentioned before, audiovisual-logs can't make use of cinematography or editing, as the player is still in control of a character. The camera needs to be suited to moving a character around. Editing involves cutting out parts of the action, showing only the details before and after it, and that doesn't mesh well with gameplay. Still, audiovisual-logs have the potential to be fairly strong forms of storytelling -- similar to flashbacks.

All the other forms of environmental storytelling are not presented through visual action. They may be textual artefacts like diary entries and letters. And auditory ones like audio-logs.

What about inferred backstory, like the corpses in the bed? Here the backstory is told using visual details (from which the player infers past events). But those are present-moment visual details; the backstory being conveyed is not shown visually. It doesn't, for example, show the visual action of the couple taking the sleeping pills and getting into the bed.

The player coming to their own realisation of what happened, through the environmental storytelling, is something that a number of players enjoy. This is part of the appeal of environmental storytelling. However, I don't consider this to add a lot to the strength of the storytelling.

 

We can draw a distinction between artefacts that provide narrative details, and those that don't, but which convey narrative-relevant information. We can call these narrative and non-narrative artefacts.

Artefacts like diary entries and letters can be either narrative or non-narrative. They can convey narrative backstory, like a diary entry recounting an event that occurred. A non-narrative example is a diary entry that said "Bought new clothes at shops" and then went on to list the items of clothing. This is just some information, though from it the player may infer some narrative-relevant details (that the person who wrote it cared a lot about their neighbour, who they bought a number of items for).

(To turn to textual mediums for a moment, epistolary novels tell a story through letters sent between characters. "Epistolary novels" is also often used in a broader sense, covering stories told through any kind of artefact, such as diary entries, newspaper clippings, or other kinds of documents. Some well-known examples are "Carrie", by Stephen King (1974), "Possession", by A. S. Byatt (1990), and the Adrian Mole series, by Sue Townsend (1982-2009). Such novels show how something akin to environmental storytelling can be used quite successfully for storytelling. I believe that using artefacts in this way is much more suited to textual mediums, and much less suited to visual mediums like graphical video games. This is because they are textual artefacts, which means they fit in with the textual nature of novels. Whereas showing textual artefacts on screen for the viewers to read, or having them read out by a character, is not as suited to the visual-action character of visual mediums.)

I contend that such non-narrative information is, generally speaking, a weaker way of conveying details that are part of the narrative.

Audio-logs may simply be recorded versions of artefacts like diary entries, which may or may not be focused on narrative details. Narrative audio-logs might, for example, contain a recording of when some bad guys stormed a character's office, and took them hostage. That is, an audio-log may contain a recording of an event that happened. Audiovisual-logs will usually convey narrative details.

Inferred backstory (like the corpses in the bed) also conveys narrative details.

Here's a summary of the ways the environment may directly contribute to the narrative and those that do so indirectly.

The following uses of the environment directly contribute to the narrative. They are those that contribute to:

  • The main plot (e.g. evidence of who the murderer is).
  • backstory (with narrative artefacts)
    (e.g. certain of: diary entries, audio-logs, audiovisual-logs, frozen-moment-logs, and inferred backstory like the corpses in the bed).

Uses of the environment that only indirectly contribute to the narrative/plot:

  • The setting (e.g. NYC)
  • The stage enabling certain events (e.g. rooftops for a chase sequence)
  • Atmosphere and world building (e.g. moss-covered ruins, strange sounds)
  • Characterisation (e.g. a character's messy room)
  • non-narrative artefacts conveying backstory (e.g. diary entries, audio-logs).

And where environmental details are used to effectively communicate story-relevant details (e.g. to highlight the size of a large cave chamber, make the player crawl through a small space to get to it).

Audio-logs and inferred backstory that convey narrative details, don't do so through visual action, so they are generally weaker forms of (back)storytelling than flashbacks, which do.

To help reinforce these points, we can note that movies and TV could employ an equivalent of audio-logs. There could be scenes where a character is listening to a sound recording. Or where the audience hears a character reading out a diary entry or letter. The visuals might show the character driving in their car as they listen to the audio recording, or walking around their house while reading the diary entry or letter.

Because the focus of such scenes would be on the audio, the visuals would essentially serve as a background to the audio. The visuals couldn't convey any substantial narrative details, as that would distract the viewer from the audio. So it'd be weaker storytelling, because it's not focused on visual action.

That such scenes are, as far as I'm aware, rare in movies and TV is, I suggest, because they're a weaker form of storytelling.

 

Pacing

Another reason flashbacks are a stronger means of conveying backstory is that their visual-storytelling benefits pacing. Pacing is an important part of storytelling. A story may concern some events that take place over a week, but they -- as represented in a movie or TV show -- may do so through only a couple of hours of visual action. They condense the details. They filter out the narratively-irrelevant details, to leave a more narratively-concentrated end-product. If the narratively-interesting details are padded out with a lot of irrelevant details, it will slow down the pacing, and dilute the narrative.

A heavy focus on environmental storytelling, as is often found in games, negatively affects the pacing. It slows the pacing.

Imagine if, during a movie or TV show, there were several occasions where the main character read a full-page diary entry, letter, or memo. Where, each time, the shot of them reading it lasted long enough for the character to read the full text. (Their reading of it might be conveyed with a voice-over representing the character's inner voice, as they read the page). That would, I think, make the pacing feel quite strange. It'd be jarring, to go from the normal speed of the pacing in a movie or TV show, to these really slowed-down segments.

(And this is on top of the fact that gameplay itself also slows the pacing of the storytelling, because there'll be long segments of gameplay-focused action in between each narrative-focused segment, and the gameplay is usually not conveying much of the game's narrative).

 

Limitations to narrative complexity of audio-logs and inferred backstory

I mentioned earlier that audio-logs and inferred backstory can convey narrative details. They are, however, quite limited in their ability to do so.

An individual audio-log or inferred backstory instance can only present fairly short and simple narrative details. And there are fairly strong limits on how substantial/complex the overall narrative details, from the totality of the audio-logs/inferred-backstory, can be.

With inferred backstory (like the corpses in the bed) there's visual details from which the player can infer past events. But it's difficult to convey a lot of detail in this fashion. The player has to infer -- figure out -- the events from the clues. It would be too complex for the player to infer more than a simple set of details. For one thing, it would be very challenging to indicate the sequence of the events.

Audio-logs that convey narrative events (like a recording of a kidnapping) need to be relatively brief. Audio-logs are designed such as to not get in the way of gameplay, while the player is listening to them. While listening, the player can still continue to explore around, and possibly even take on some enemies. While listening, they'll be watching visuals (of their environment) that are likely pretty unrelated to the content of the audio-log. Which means they face distractions while listening to the audio-log. They won't usually be paying 100% attention to it. So the player would have trouble being able to properly take in longer audio-logs. Also, if audio-logs were lengthy, then while the player is still listening to one, they might get into a situation (like a major fight with multiple enemies) where 1) they can't focus at all on the audio-log and 2) the audio-log might distract them from the gameplay. (Though this latter point might be addressable by a means to pause audio-logs).

All of the narrative audio/audiovisual -logs, and all of the inferred backstory, in a game, could together contribute to the overall narrative. However, there might be an average of, say, 10-30 minutes of playtime between each audio-log or inferred backstory that the player comes across. During which time the player is undertaking gameplay. This places fairly heavy demands on the player's memory, and the game designer can't expect that the player will be able to recall a lot of the specifics presented in earlier audio-logs and inferred backstory instances. Therefore, individual audio/audiovisual -logs and inferred backstory have to be designed to be somewhat stand-alone. They can't be narratively connected together in intricate ways.

So there are limits on how substantial/complex the overall narrative, conveyed through multiple audio/audiovisual -logs and/or inferred backstory instances, can be. One other reason for this, that applies to a number of games, is that different players may come across the audio/audiovisual -logs and/or inferred-backstories in somewhat different orders.

In contrast, visual action can convey a lengthy sequence of events (like, a whole movie's worth).

 

Contrived nature of audio-logs

Audio-logs are also somewhat contrived. Why were these recordings made in the first place? It might make sense if they're like diary entries or voice notes that a character made. But it doesn't if they're a recording of a narratively-significant event that happened. Who thinks to switch on a recorder just before a significant event? And why are the audio-logs found in various different places in the environment? Often it doesn't make sense. So this contrivance is another reason why you wouldn't have characters in movies/TV finding such audio-logs.

 

Audio/audiovisual -logs are only compatible with certain story settings

And both audio-logs and audiovisual-logs are also only compatible with certain kinds of stories. Neither of them could appear in a realistic story set in the 1700s. Audiovisual-logs, further, need to be in a story that contains elements that are magical, supernatural, alien, or high-tech.

 

In summary, environmental storytelling consists of generally-weaker forms of storytelling, and the reason they're often used in games is that they fit better with gameplay than do other means of storytelling.

 

Where next?

Given the storytelling limitations of the environmental storytelling we've looked at, are there other ways environmental storytelling could be used, that might be better for storytelling?


Further exploring the use of audiovisual-logs

Audiovisual-logs have potential because they can present narrative details through visual-action. The player sees the characters and events being portrayed. Yet, I'm only aware of three games that use them. One (The Vanishing of Ethan Carter -- see below) barely uses them, and the other two (Everybody's Gone to the Rapture, and Tacoma) don't lean much into the visual action.

I expect further exploration of the use of audiovisual-logs in future works. Especially since it should become cheaper and easier for game developers to create them. Motion capture and animating the captured data, is only going to get cheaper and easier over time. AI will likely play a role in that. AI will likely make it quicker and easier to record and process the motion capture data, and to generate animated models from it.

 

Audiovisual-logs with full character-detail

In The Vanishing of Ethan Carter (2014), the audiovisual-logs have a quite minor role. Each one is quite short -- probably 5-10 seconds long -- and there's only a few of them in the game (probably 5-10). In these you see the representations of the characters like you would in a cutscene, except you are there in the scene and can move around and look around while you're watching it play out.

In Everybody's Gone to the Rapture (2015) and Tacoma (2017), the audiovisual-logs are the main source of the storytelling, but at the same time, those games don't lean much into the visual action in the audiovisual-logs, because they show highly abstracted representations of the characters.

In Everybody's Gone to the Rapture (2015), the characters are represented by glowing, dancing points of light. It makes it difficult to even get a clear view of the characters and their movements.

(A still from an audiovisual-log in Everybody's Gone to the Rapture. From this one-minute video. It's from early in the game.)

In Tacoma (2017), you can see the shapes of each of the characters, but those shapes are just filled in with a single colour, where each character has their own colour. These character representations are more 'readable' than the ones in Everybody's Gone to the Rapture. However, in both of these games, you can't see any details of the characters' faces. Without facial details, the audiovisual-logs are missing an important part of visual action.

(A still from an audiovisual-log in Tacoma. From this trailer for the game)


Those two games may have used highly abstracted representations of the characters for technical reasons (e.g. for performance). But whatever the reasons were, it seems clear to me that audiovisual-logs showing full details of the characters are far superior. The characters might be shown as semi-transparent, to indicate that you're seeing the details of a past event. Current gaming hardware should be able to handle showing such details, without any problem.

We can call these 'audiovisual-logs with full character-detail'. It's something I expect to see explored more in the future.

 

Audiovisual-logs about ongoing or future situations

Existing audiovisual-logs convey backstory. In The Vanishing of Ethan Carter, Everybody's Gone to the Rapture, and Tacoma, the player enters a situation where a series of events has taken place in the past, before they arrived[4]. The player's goal is to try to understand what happened. 

[4] or not quite so, for one of these games. But to explain would be a spoiler.

Rather than conveying details that happened a while prior to the current moment, audiovisual-logs could be used to convey events that happened only a short while ago, events that are happening now but in a different location, or even future events that haven't happened yet.

So despite what I've said elsewhere in this post, environmental storytelling is not inherently restricted to conveying backstory. What it can't convey are the generally stronger, from a storytelling perspective, details that are happening here and now where the player is.

 

NPC-perspective audiovisual-logs

Audiovisual-logs present some past situation involving some NPCs. Instead of presenting them from the player's point of view (POV), as is normally done, they could be presented through the POV of one or more of those NPCs.

The player could freely switch between the POVs of the different NPCs in the situation. Or perhaps there could be a puzzle element to it, where the player has to do something to unlock each of the different NPCs' POVs.

 

Combining inferred backstory with audiovisual-logs

The Vanishing of Ethan Carter contains puzzles where the player has to discover clues about some past events, and then put those clues in the correct temporal order. This is the player inferring some backstory from the clues. At the end of this process the player is shown a brief audiovisual-log.

There are other possible ways of combining inferred backstory and audiovisual-logs, that are yet to be explored. For example, the following.

The player comes across some clues to some inferred backstory, and once they've seen them all, the game could play an audiovisual-log of the backstory details.

For this to work, the game has to know that the player has noticed those clues. The game could have a 'look' verb, that the player could use on objects. And if the player has 'looked' at each of the clues, the game could play the audiovisual-log.

Or, to make it more challenging for the player, them noticing the clues might require them to apply a separate 'is clue' verb to each of the items they think are clues. That way, a player who just looks at every available object (as players tend to do in games) would not thereby automatically find all the clues.
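To make the 'look' and 'is clue' ideas above a little more concrete, here is a minimal sketch in Python of how a game might track them. Everything in it is hypothetical and for illustration only: the Scene class, its field names, and the example objects are my own assumptions, not taken from any existing game or engine. It also assumes one particular rule for the harder variant: the log unlocks only when the marked items are exactly the clues, so marking everything doesn't work either.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    """Hypothetical model of one location with inspectable objects."""
    objects: set[str]                 # everything the player can interact with
    clues: set[str]                   # the subset of objects that are clues
    require_marking: bool = False     # True: the player must use the 'is clue' verb
    looked_at: set[str] = field(default_factory=set)
    marked: set[str] = field(default_factory=set)

    def look(self, obj: str) -> None:
        """The 'look' verb: record that the player examined an object."""
        if obj in self.objects:
            self.looked_at.add(obj)

    def mark_as_clue(self, obj: str) -> None:
        """The 'is clue' verb: the player claims this object is a clue."""
        if obj in self.objects:
            self.marked.add(obj)

    def unmark(self, obj: str) -> None:
        """Let the player take back a wrong guess."""
        self.marked.discard(obj)

    def log_unlocked(self) -> bool:
        """Should the audiovisual-log play yet?"""
        if self.require_marking:
            # Harder variant: the marked items must be exactly the clues.
            return self.marked == self.clues
        # Easier variant: having looked at every clue is enough.
        return self.clues <= self.looked_at


# Example: a player who looks at everything, in the harder variant.
scene = Scene(
    objects={"bed", "pill bottle", "lamp", "note"},
    clues={"pill bottle", "note"},
    require_marking=True,
)
for obj in sorted(scene.objects):
    scene.look(obj)
unlocked_by_looking = scene.log_unlocked()   # False: looking alone isn't enough

scene.mark_as_clue("pill bottle")
scene.mark_as_clue("note")
unlocked_by_marking = scene.log_unlocked()   # True: exactly the clues are marked
```

With require_marking left as False, the same scene would unlock as soon as the player had looked at the pill bottle and the note, which is the simpler of the two designs described above.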

In The Vanishing of Ethan Carter, the player is required to put the clues in the correct temporal order. But there could be alternatives to this. Like correctly linking each clue to the person that left it behind.


Interaction within visual action, to enhance the storytelling

My primary interest is in storytelling, and how gameplay, or interaction, could be used to enhance the storytelling.

In a visual medium, visual action is the strongest way to convey storytelling details, so I am interested in how interaction can be used within visual-action storytelling, to enhance that storytelling.

I mentioned earlier that it is difficult to include gameplay within visual-action storytelling. With most of the means of environmental storytelling, the gameplay sits outside of full visual-action storytelling. Currently the only options for interaction in visual-action storytelling are QTEs, choosing from a small menu of actions (e.g. which of the two people do I try to save?), and during-gameplay cutscenes (which are not a form of environmental storytelling, just like cutscenes are not a form of environmental storytelling).

During-gameplay cutscenes (see post about the different kinds of cutscenes) are one way of combining gameplay and cutscenes. So far they have been used in relatively few games, and their use could be explored further.

I think there are a lot of unexplored options for using interaction within visual-action storytelling, and it is these that I am primarily interested in exploring. But that is a topic for another post.

Wednesday, January 24, 2024

Interactive storytelling: Types of cutscenes in games

Video game cutscenes are like little movies, played between gameplay segments. They, like movies and TV show episodes, are a sequence of one or more scenes, where each scene is a sequence of one or more shots.

There are 1st-person shots and scenes, which show the action from the perspective of a particular character. And there are 3rd-person shots and scenes, which don't show the action from the perspective of a particular character, but rather the view from where the camera is located and facing[1]. 

[1] there are also scenes rendered in Virtual Reality, in VR games and movies. These add an extra dimension of immersion.

In movies and TV shows most shots are 3rd-person shots, with the occasional 1st-person shot thrown in. In games, 1st-person (cut)scenes are more common. For example, in the Metro games, the Dishonored games, Halo 3: ODST, Halo: Reach and Cyberpunk 2077. The reasons for this needn't concern us in this post.

Video game cutscenes can include interaction. This post looks at such cutscenes, and how the interactivity in them affects the strength of the storytelling in them.

 

Visual-action storytelling

Here we introduce the notion of "visual-action storytelling". It will help us discuss the effect of interactivity on cutscenes in games.

The primary form of storytelling found in movies and TV is "visual-action storytelling", in which we see the story events occur. By "action" I don't mean in the sense of an "action movie", with weapons, fights, and chase-scenes. I don't mean something that has to be highly dynamic. The "action" is just what the viewer sees unfold over time, and that includes very still scenes where very little is happening.

Visual action is made up of components like the actors' performances, the sets, the cinematography, and the editing.

If there was a movie that was just 1.5 hours of a character recounting a story to some others, where all the footage was just of the storyteller and their audience, this would be a very weak form of visual action. It would contain visual action of the storyteller and their audience, but no visual action of the story being told. In that situation, the "real" story details are being told, not shown.

Whereas if instead of focusing totally on the storyteller and their audience, there were also visual-action scenes showing the events of the story that's being recounted, that would be a stronger form of visual action. In that version, the movie's viewers would be shown the visual action of the story details.

 

Interactivity in visual-action storytelling

Normal cutscenes are non-interactive. In video games, some degree of interactivity can be introduced into the visual-action storytelling of cutscenes. Though, as we'll see, it's only limited forms of interactivity, and their addition can lessen the strength of the visual-action storytelling.

The following looks at the different ways interactivity can play a role in the visual-action of cutscenes.
 

QTE-and-choice cutscenes

Two forms of interactions that may occur within cutscenes are QTEs (Quick-time Events) and making choices.

Imagine a story-focused game, where, after much journeying the player makes it to the castle, and gains an audience with the king at the king's court. There's a cutscene of the player character entering the court and talking to the king. As the cutscene continues, the situation goes south, and a fight breaks out between the player's character and the guards.

One way that fight could be implemented would be with normal gameplay, where the player moves their character about, attacks with their weapons, and blocks with their shield.

Alternatively, the fight could be a continuation of the cutscene. That way, the fight could be made to look very cinematic. It could contain dedicated character animation and performances, be 'shot' in a cinematic way, use various camera angles and movements, and be edited to look spectacular. However, the player would lose direct control over their character.

Imagine a moment during the cutscene where an enemy swings their sword at the player, and the player dodges to the left, just in time to avoid the blade. A QTE (Quick-time Event) could be used for making the character dodge the blade, to add some interaction into the cutscene.

Here's how the QTE would work. As the enemy goes to swing their sword, time will slow down a bit, and the screen will show a prompt, telling the player to press left on their joystick. The player would (usually) have a small window of time to enter that input. If they enter the correct input within the time limit, they succeed at that QTE -- and successfully dodge the sword -- otherwise they fail at it. The penalties for failure depend on the game and the particular situation in it. The penalties could range from minor all the way up to the player's character dying.
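
The success check for such a QTE can be sketched in a few lines of Python. This is a minimal illustration only: the names QuickTimeEvent and resolve_qte, and the 1.5 second window, are hypothetical, and a real game would read inputs from its engine's input system rather than receive them as arguments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QuickTimeEvent:
    prompt: str            # the input shown on screen, e.g. "left"
    window_seconds: float  # how long the player has to respond


def resolve_qte(qte: QuickTimeEvent,
                pressed: Optional[str],
                elapsed_seconds: float) -> bool:
    """Return True if the prompted input was entered within the window."""
    if pressed is None:
        return False  # the player did not respond at all
    in_time = 0.0 <= elapsed_seconds <= qte.window_seconds
    return pressed == qte.prompt and in_time


# Hypothetical dodge prompt with a 1.5 second window.
dodge = QuickTimeEvent(prompt="left", window_seconds=1.5)
print(resolve_qte(dodge, "left", 0.8))   # correct input, in time
print(resolve_qte(dodge, "left", 2.0))   # correct input, too late
```

What the game does with a False result (a minor penalty, or the character dying) is a separate design decision, which is why the function only reports success or failure.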

As another example, during the fight the player could be grabbed from behind by some of the guards, and there could be a QTE prompt telling the player to quickly tap (and keep tapping) one of the controller buttons. If the player taps the button fast enough, they'll escape the grips of the guards.
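
A button-tapping QTE like this could be judged by the tap rate over a fixed window. The sketch below assumes a hypothetical threshold of 4 taps per second over 3 seconds; both numbers, and the function name, are made up for illustration.

```python
from typing import Sequence


def escaped_grip(tap_times: Sequence[float],
                 window_seconds: float = 3.0,
                 required_taps_per_second: float = 4.0) -> bool:
    """Return True if the player tapped fast enough to break free.

    tap_times holds the time of each tap, in seconds since the prompt
    first appeared.
    """
    # Only taps made while the prompt is active count.
    taps_in_window = [t for t in tap_times if 0.0 <= t <= window_seconds]
    return len(taps_in_window) / window_seconds >= required_taps_per_second


fast_taps = [i * 0.2 for i in range(15)]  # 15 taps in under 3 seconds
slow_taps = [i * 0.5 for i in range(6)]   # 6 taps in under 3 seconds
print(escaped_grip(fast_taps))
print(escaped_grip(slow_taps))
```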

Remember that in both of these examples, the action would be shown in a cinematic fashion, just like in a movie.

Other types of inputs used in QTEs include moving a joystick along a particular path (e.g. in a full circle, or to the right and then anticlockwise to up). And on touch-screen devices, the player may need to tap on hotspots on the screen, or slide their finger along a path (e.g. a circular path). The QTEs can be action-mirroring inputs.

QTEs are divisive. Many players do not like them. They see QTEs as a fairly pointless attempt at including a bit of gameplay here and there in cutscenes. They may not find QTEs enjoyable.

One thing that I think all could agree on is that QTEs are a fairly simple form of input. The game tells the player what to do, and when to do it, and the player just needs to follow the instructions properly. Performing the input(s) for a QTE is a fairly rote task. The player doesn't have much agency when it comes to QTEs. (We'll talk about choice below, and choices may be implemented as part of the QTEs, so in this sense they can provide some agency).

Choices provide another means for there to be player-interactions within cutscenes. At points in cutscenes the player can be presented with a choice from a small menu of options, and -- like with QTEs -- there will usually be a time limit for making the choice. The choice might be between a small number (2 to 4) of dialog options or action options.

As an example of choices between action options, there might be an oncoming zombie horde, where the player has to quickly choose between saving one of their companions, Joe or Jackie. Especially in timed action choices, we may consider these kinds of choices to be a kind of QTE.
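
A timed choice can be sketched in much the same way as a QTE. In this hypothetical Python example, the function name and the fallback to a default option when time runs out are my own assumptions; games differ in what they do when the player doesn't choose in time.

```python
from typing import Optional, Sequence


def resolve_choice(options: Sequence[str],
                   selected_index: Optional[int],
                   elapsed_seconds: float,
                   time_limit_seconds: float,
                   default_index: int = 0) -> str:
    """Return the option the story should follow.

    Assumption: if the player does not choose in time, the game falls
    back to a default option.
    """
    timed_out = selected_index is None or elapsed_seconds > time_limit_seconds
    if timed_out:
        return options[default_index]
    if not 0 <= selected_index < len(options):
        raise ValueError("selected_index is not a valid option")
    return options[selected_index]


options = ["Save Joe", "Save Jackie"]
print(resolve_choice(options, 1, elapsed_seconds=2.0, time_limit_seconds=5.0))
print(resolve_choice(options, None, elapsed_seconds=5.0, time_limit_seconds=5.0))
```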

We can call the kinds of cutscenes just described "QTE-and-Choice cutscenes" (or "Q&C cutscenes" for short).

The addition of QTEs and choices doesn't weaken the strength of the storytelling in these cutscenes. They're just like normal cutscenes, except for the introduction of some basic forms of interaction (QTEs and choices).

 

During-Gameplay Cutscenes

There are kinds of cutscenes that can play during the gameplay scenes in games. With these, the player still has some control over their character, while, at the same time, cutscene-like details play out around them. We'll call them "during-gameplay cutscenes" (or "D-G cutscenes" for short).

Half-Life 2 is the classic example of a game that includes during-gameplay cutscenes. Almost all of its cutscenes are of this sort. The Dishonored games are another example of games with during-gameplay cutscenes. The Metro games also contain a fair number of D-G cutscenes.

Half-Life 2 is a first-person shooter in which some hostile aliens have invaded Earth. The game starts with you disembarking a train, and soon after you see an alien guard, holding a large baton, shove a person they're overseeing. As you walk past some of the humans there, they say things to you. Soon, you reach a checkpoint, where you're taken away to an interrogation room, where the helmeted alien who led you there takes their helmet off, revealing themselves to be a human -- a person that your character knows. The entire time all this unfolds, the player has control over their character's movement (though of course where they can move is constrained by the walls etc in their environment), and can control where their character is looking.

Such during-gameplay cutscenes consist of scripted dialog and events (animated occurrences), that are triggered by the player's actions. For example, an NPC saying something to the player might be triggered by the player walking close-enough to the NPC. Or the player performing actions, like picking up a gun on the ground, might trigger some during-gameplay cutscene.
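
The proximity case can be sketched as a simple trigger that fires once when the player gets close enough. This is a minimal Python illustration with hypothetical names (ProximityTrigger, update) and made-up positions and dialog. Real engines typically provide their own trigger volumes for this.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ProximityTrigger:
    """Plays an NPC's scripted line once the player walks close enough."""
    position: Tuple[float, float]  # where the NPC stands
    radius: float                  # how close the player must get
    line: str                      # the scripted dialog to play
    fired: bool = False

    def update(self, player_position: Tuple[float, float]) -> Optional[str]:
        # Called every frame with the player's current position.
        if self.fired:
            return None  # scripted lines only play once
        if math.dist(self.position, player_position) <= self.radius:
            self.fired = True
            return self.line
        return None


npc = ProximityTrigger(position=(0.0, 0.0), radius=2.0, line="Hello there.")
print(npc.update((5.0, 0.0)))  # too far away, nothing plays
print(npc.update((1.0, 1.0)))  # within range, the line plays
print(npc.update((0.5, 0.0)))  # already played, nothing plays
```

Other triggers, like picking up an item, would work the same way, with a different condition in place of the distance check.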

The cutscene elements of D-G cutscenes can consist of any scripted details. They could be as simple as some characters talking. D-G cutscenes can also be of any length. Some may be quite brief, lasting only a couple of seconds.

One form of D-G cutscene is where the player's character and NPC(s) are having a conversation as they walk along. These are epitomised by Naughty Dog's Uncharted games and The Last of Us games. Yahtzee Croshaw calls these "Walk and talk sequences".

We wouldn't normally think of these Walk-and-talk scenes as kinds of cutscenes, but they match the criteria laid out above, of being scripted details that play out during gameplay. In this case, there are the scripted movements of your NPC companions, and the scripted dialog said by your character and the NPCs.

These conversation-focused D-G cutscenes don't have to take place while the characters are walking. They can occur during any kind of activity the characters are undertaking, like climbing a structure, or fighting against enemies.

Games like Rockstar's Grand Theft Auto (GTA) games have a huge amount of D-G cutscene detail going on all around the character during gameplay. There are vehicles on the roads, people walking about, and so on. We wouldn't normally think of these as kinds of cutscenes, but they meet our criteria. In this case, the scripted details are more "background details", that are in part there to give the cities a lived-in feel. They're always rolling, and aren't there to play a specific role in the plot.

Other well-known games with D-G cutscenes are the Metro games, and the recent God of War games.

In a D-G cutscene, the player's control may be restricted in certain ways. They may be able to look around, but not move around, such as in the opening sequence of Skyrim, where the player wakes up as a prisoner, in a horse-drawn cart. Or in Metro 2033 where the player is sitting at a table, with other characters, in a bar.

Arguably, the greatest implementation of during-gameplay cutscenes thus far is the gang's camp in Red Dead Redemption 2. The camp seems to have a life of its own, outside of the player's interaction with it. Characters go about chores, have conversations amongst themselves, sit around the campfire at night drinking, chatting, and playing musical instruments, before retiring to their tents once it gets late enough for them. And the player can choose whether or not to take part in the goings on, such as the conversations.

 

Mixing during-gameplay cutscenes and other forms of cutscenes

During-gameplay cutscenes and other kinds of cutscenes can be seamlessly mixed together. For example, consider the aforementioned scene in Metro 2033, where you're sitting at a table with a few others, having drinks and conversation. Most of the time it's a D-G cutscene, where characters are talking and the player can freely look around. But at times, NPCs will propose a toast, at which point it will show a short non-interactive 1st-person cutscene, where you grab your drink, take a swig of it, and put it back down.
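
One way to picture this kind of mixing is as a sequence of segments, each of which grants the player a different set of controls. The Python sketch below is purely illustrative: the Control flags, the segment list, and filter_input are hypothetical names, and this is not a description of how Metro 2033 is actually implemented.

```python
from enum import Flag


class Control(Flag):
    NONE = 0
    LOOK = 1
    MOVE = 2


# Hypothetical breakdown of a seated bar scene: the player can look
# around during the conversation, but has no control during the toast.
SEGMENTS = [
    ("conversation", Control.LOOK),
    ("toast", Control.NONE),
    ("conversation resumes", Control.LOOK),
]


def filter_input(requested: Control, allowed: Control) -> Control:
    """Keep only the requested controls that the current segment allows."""
    return requested & allowed


for name, allowed in SEGMENTS:
    # The player tries to both look and move in every segment.
    granted = filter_input(Control.LOOK | Control.MOVE, allowed)
    print(name, granted)
```

Normal gameplay would be a segment that allows both looking and moving, and a fully non-interactive cutscene is one that allows nothing.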

It's possible to mix all three types of cutscene -- non-interactive, QTE-and-choice, and during-gameplay -- within the one overall cutscene.

 

Downsides and limitations of during-gameplay cutscenes

During-gameplay cutscenes sound good on paper. They enable gameplay/control and visual-action storytelling to be integrated. They appear to be the best of both worlds. We may wonder why they aren't used more in games. One reason is that they have a number of limitations and downsides. We'll look at these now.

 

During-Gameplay Cutscenes Weaken and Constrain the Visual-Action Storytelling

Movie/TV scenes and cutscenes use cinematic techniques, through the use of camera angles, camera movements, and editing (cuts). In D-G cutscenes, because the player has control, the same camera that's used in gameplay -- whether first- or third-person -- is still used in the D-G cutscene. That lessens the strength of the storytelling, to some degree.

Movie/TV scenes and cutscenes involve real/virtual actors' performances. These performances include the

  • blocking (including the marks where they stand and move at each moment of each shot)
  • physical interactions between characters
  • body language, including posture, how they move, facial expressions, gestures, etc.
  • dialogue

And their timing is crucial to their performances.

And of course the performances involve interactions between characters. Eye-contact between characters. Where a person is standing and facing in relation to others, and how people walk around, amongst others. When two people meet for the first time. When a person approaches another to ask them for assistance. Or when there is flirting between people. When there's a group of people, and someone is speaking, there's how others are listening, and their reactions. Vigorous agreement, vs indifference. And so on.

All those body language and performance details have meaning, meaning that they communicate. A lot, for example, is communicated by the nature of the eye-contact between people. For example, the differences between avoiding eye contact, confident eye contact, aggressive staring, and flirty eye contact.
 
The usual control schemes in games are too simple, to be able to control such details. The player's character might be able to move in different directions, look in different directions, jump, and crouch. Those options pale in comparison to the sort of control an actor has in their performance. So the player's character can only have a 'passive' role in D-G cutscenes, that does not include those specific performance details.

As a different way of seeing this point, consider that in D-G cutscenes, the player could only communicate to other characters via moving forwards, backwards, left and right, turning to the side, and, by what direction they're looking in. The reader can imagine if they, themselves, were in a situation with other people and these were the only ways they could communicate to others. Or imagine if they were controlling a robot with such a control scheme, that was amongst those people, and the robot had zero ability to express facial expressions or body language, just those kinds of movement.

There's a fundamental difficulty here, in that the game can't know the player's intentions. The player may want to be friendly, or dismissive, towards another character. But the game can't infer those intentions from the player's control over their character. There's a paucity of information about the player's intentions, and only so much that can be inferred. This couldn't be overcome by using AI to make NPCs react more realistically.

To accommodate these control limitations, the player might be seeing some event taking place on the other side of a chain link fence. Or seeing large alien machines moving around in the distance. In such cases the player is distanced from the action, and is more like a spectator.

There could be D-G cutscenes where the player's character can choose where to stand amongst a number of people, as a conversation plays out. In such situations, it doesn't matter where the player character is standing in the scene, because all that matters is that they hear the dialog. In such cases the player is more of a passive spectator of what's going on.

That the player tends to be relegated to a passive spectator is why D-G cutscenes are so often focused on auditory details, like conversations between characters, or messages over loudspeakers. The player can be a passive participant in these, because audio is (somewhat) omni-directional. It doesn't matter exactly where the player character is facing and positioned at any time during the D-G cutscene -- they'll still be able to hear the details.

Imagine a D-G cutscene where the player was an active participant in it. E.g. if the player is up close to another character, in defiance of them, gets shoved back, and then darts back towards them, looks quickly to the sides at the other people there, and then turns to one of them, puts a hand on their shoulder, and whispers something to them. That would have to be done with a normal cutscene (which could possibly include QTEs).

Or imagine game developers trying to make a D-G cutscene where the player is showing another around their study, pointing out different objects to them. Where the player picks up a rare bottle of whiskey and shows that to the other character, handing the bottle to them. This couldn't be done either, if the player character is being controlled in the usual fashion.

One possible way of making the player character's performance include what was required for such scenes would be forcing them to do exactly what is required (and thus give up freedom of control).

A control scheme that would give the player control over all those nuances would have to be very complex. Perhaps it could be done with a mouse and keyboard, using all the keys on the keyboard, and use of those keys in combination with modifier keys. But then it'd be too complex for any normal person to use. VR with facial and body tracking could be a more realistic option.

Or something like QTEs could be used, though this wouldn't really give the player control over the character, it would be more just the player entering in the correct inputs to satisfy each of the QTEs.

A potential option would be to give the player the freedom, and have the other characters respond to the player's actions in a realistic way. The problem with this is that it'd be too complex to script in the different branching possibilities, and we don't have the means to simulate the responses of the other characters. And even if we could simulate them, we'd also have to simulate the plot implications of some of the possibilities. And even if we could do that, many of those other possibilities would result in weaker storytelling (because strong stories don't lie near to each other in story space).

The player's character having to play a fairly passive role in D-G cutscenes means that such cutscenes could not be used for the storytelling of most scenes from movies/TV -- scenes that don't usually contain a passive role. Given that, it's fair to say that D-G cutscenes generally contain weaker storytelling than can be present in non-interactive cutscenes.

To me, the way the D-G cutscene details are always happening at a distance from the player character makes them feel a bit like a theme park attraction. As if the player is moving along through a set path, and there's scenery/sets and animatronics on either side of the path. The start of Half-Life 2, and the metro stations where the people live, in the Metro games, have this feel to me. I feel there's an uncanny valley feeling to the interactions with NPCs in during-gameplay cutscenes.

 

In conclusion, non-interactive cutscenes enable strong visual-action storytelling, but are at odds with gameplay. Interaction allows QTE & Choice cutscenes, and During-Gameplay cutscenes. QTE & Choice cutscenes involve weak gameplay as part of strong visual-action storytelling. During-Gameplay cutscenes enable stronger gameplay, at the expense of weaker and more-constrained visual-action storytelling.