Wednesday, January 24, 2024

Interactive storytelling: Types of cutscenes in games

Video game cutscenes are like little movies, played between gameplay segments. They, like movies and TV show episodes, are a sequence of one or more scenes, where each scene is a sequence of one or more shots.

There are 1st-person shots and scenes, which show the action from the perspective of a particular character. And there are 3rd-person shots and scenes, which don't show the action from the perspective of a particular character, but rather the view from where the camera is located and facing[1]. 

[1] There are also scenes rendered in virtual reality, in VR games and movies. These add an extra dimension of immersion.

In movies and TV shows, most shots are 3rd-person shots, with the occasional 1st-person shot thrown in. In games, 1st-person (cut)scenes are more common. Examples include the Metro games, the Dishonored games, Halo 3: ODST, Halo: Reach and Cyberpunk 2077. The reasons for this needn't concern us in this post.

Video game cutscenes can include interaction. This post looks at such cutscenes, and how the interactivity in them affects the strength of the storytelling in them.

 

Visual-action storytelling

Here we introduce the notion of "visual-action storytelling". It will help us discuss the effect of interactivity on cutscenes in games.

The primary form of storytelling found in movies and TV is "visual-action storytelling", in which we see the story events occur. By "action" I don't mean in the sense of an "action movie", with weapons, fights, and chase-scenes. I don't mean something that has to be highly dynamic. The "action" is just what the viewer sees unfold over time, and that includes very still scenes where very little is happening.

Visual action is made up of components like the actors' performances, the sets, the cinematography, and the editing.

If there was a movie that was just 1.5 hours of a character recounting a story to some others, where all the footage was just of the storyteller and their audience, this would be a very weak form of visual action. It would contain visual action of the storyteller and their audience, but no visual action of the story being told. In that situation, the "real" story details are being told, not shown.

Whereas if instead of focusing totally on the storyteller and their audience, there were also visual-action scenes showing the events of the story that's being recounted, that would be a stronger form of visual action. In that version, the movie's viewers would be shown the visual action of the story details.

 

Interactivity in visual-action storytelling

Normal cutscenes are non-interactive. In video games, some degree of interactivity can be introduced into the visual-action storytelling of cutscenes. Though, as we'll see, it's only limited forms of interactivity, and their addition can lessen the strength of the visual-action storytelling.

The following looks at the different ways interactivity can play a role in the visual-action of cutscenes.
 

QTE-and-choice cutscenes

Two forms of interactions that may occur within cutscenes are QTEs (Quick-time Events) and making choices.

Imagine a story-focused game where, after much journeying, the player makes it to the castle and gains an audience with the king at the king's court. There's a cutscene of the player character entering the court and talking to the king. As the cutscene continues, the situation goes south, and a fight breaks out between the player's character and the guards.

One way that fight could be implemented would be with normal gameplay, where the player moves their character about, attacks with their weapons, and blocks with their shield.

Alternatively, the fight could be a continuation of the cutscene. That way, the fight could be made to look very cinematic. It could contain dedicated character animation and performances, be 'shot' in a cinematic way, use various camera angles and movements, and be edited to look spectacular. However, the player would lose direct control over their character.

Imagine a moment during the cutscene where an enemy swings their sword at the player's character, who dodges to the left, just in time to avoid the blade. A QTE could be used for making the character dodge the blade, to add some interaction to the cutscene.

Here's how the QTE would work. As the enemy goes to swing their sword, time will slow down a bit, and the screen will show a prompt, telling the player to press left on their joystick. The player would (usually) have a small window of time to enter that input. If they enter the correct input within the time limit, they succeed at that QTE -- and successfully dodge the sword -- otherwise they fail at it. The penalties for failure depend on the game and the particular situation in it. They could range from minor, all the way up to the player's character dying.
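The success check described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular game implements QTEs. The names `QtePrompt` and `qte_succeeded`, and the idea of measuring the response time in seconds from when the prompt appears, are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QtePrompt:
    """A hypothetical QTE prompt: which input is wanted, and for how long."""
    expected_input: str      # e.g. "left" for the dodge described above
    window_seconds: float    # how long the player has to respond


def qte_succeeded(prompt: QtePrompt,
                  player_input: Optional[str],
                  response_seconds: float) -> bool:
    """Return True only if the correct input arrived inside the time window."""
    if player_input is None:
        # The player pressed nothing before the prompt expired.
        return False
    on_time = response_seconds <= prompt.window_seconds
    return on_time and player_input == prompt.expected_input


dodge = QtePrompt(expected_input="left", window_seconds=1.0)
dodged = qte_succeeded(dodge, "left", 0.4)   # correct input, in time
```

What happens on failure is left to the caller, since the penalty depends on the game and the situation.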

As another example, during the fight the player could be grabbed from behind by some of the guards, and there could be a QTE prompt telling the player to quickly tap (and keep tapping) one of the controller buttons. If the player taps the button fast enough, they'll escape the grips of the guards.
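A button-tapping QTE like this one could be judged by counting taps inside the prompt's window. The sketch below assumes, hypothetically, that the game records a timestamp for each tap (in seconds since the prompt appeared) and that a fixed number of taps is required. The function name and parameters are made up for illustration, and real games may use different rules.

```python
from typing import Sequence


def escaped_grip(tap_times: Sequence[float],
                 window_seconds: float,
                 required_taps: int) -> bool:
    """Return True if enough taps landed while the prompt was on screen."""
    # Only taps between the prompt appearing (t = 0) and expiring count.
    taps_in_window = sum(1 for t in tap_times if 0.0 <= t <= window_seconds)
    return taps_in_window >= required_taps


# Ten taps in about two seconds against a requirement of eight taps in three.
fast_taps = [0.2 * i for i in range(1, 11)]
slow_taps = [0.0, 1.0, 2.0, 3.0]
```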

Remember that in both of these examples, the action would be shown in a cinematic fashion, just like in a movie.

Other types of inputs used in QTEs include moving a joystick along a particular path (e.g. in a full circle, or to the right and then anticlockwise around to up). And on touch-screen devices, the player may need to tap on hotspots on the screen, or slide their finger along a path (e.g. a circular path). QTE inputs can also mirror the on-screen action, as with pressing left to dodge to the left.

QTEs are divisive. Many players do not like them. They see QTEs as a fairly pointless attempt at including a bit of gameplay here and there in cutscenes. They may not find QTEs enjoyable.

One thing that I think all could agree on is that QTEs are a fairly simple form of input. The game tells the player what to do, and when to do it, and the player just needs to follow the instructions properly. Performing the input(s) for a QTE is a fairly rote task. The player doesn't have much agency when it comes to QTEs. (We'll talk about choices below. Choices may be implemented as part of QTEs, and in that sense QTEs can provide some agency.)

Choices provide another means for there to be player-interactions within cutscenes. At points in cutscenes the player can be presented with a choice from a small menu of options, and -- like with QTEs -- there will usually be a time limit for making the choice. The choice might be between a small number (2 to 4) of dialog options or action options.

As an example of a choice between action options, there might be an oncoming zombie horde, where the player has to quickly choose which of their companions to save, Joe or Jackie. Timed action choices like this one can be considered a kind of QTE.
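A timed choice can be sketched much like a QTE. The text above doesn't say what happens when the timer runs out, so the fallback to a default option here is purely an assumption for illustration, as are the function name and its parameters.

```python
from typing import Optional, Sequence


def resolve_timed_choice(options: Sequence[str],
                         selected_index: Optional[int],
                         elapsed_seconds: float,
                         time_limit_seconds: float,
                         default_index: int = 0) -> str:
    """Return the chosen option, or the default if the player ran out of time.

    Falling back to a default is an assumption made for this sketch; a game
    could equally treat a timeout as a failure or as a choice of its own.
    """
    timed_out = selected_index is None or elapsed_seconds > time_limit_seconds
    if timed_out:
        return options[default_index]
    return options[selected_index]


options = ["Save Joe", "Save Jackie"]
outcome = resolve_timed_choice(options, 1, elapsed_seconds=2.0,
                               time_limit_seconds=5.0)
```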

We can call the kinds of cutscenes just described "QTE-and-Choice cutscenes" (or "Q&C cutscenes" for short).

The addition of QTEs and choices doesn't weaken the storytelling in these cutscenes. They're just like normal cutscenes, except for the introduction of some basic forms of interaction (QTEs and choices).

 

During-Gameplay Cutscenes

There are kinds of cutscenes that can play during the gameplay scenes in games. With these, the player still has some control over their character, while, at the same time, cutscene-like details play out around them. We'll call them "during-gameplay cutscenes" (or "D-G cutscenes" for short).

Half-Life 2 is the classic example of a game that includes during-gameplay cutscenes. Almost all of its cutscenes are of this sort. The Dishonored games are another example of games with during-gameplay cutscenes. The Metro games also contain a fair number of D-G cutscenes.

Half-Life 2 is a first-person shooter in which some hostile aliens have invaded Earth. The game starts with you disembarking a train, and soon after you see an alien guard, holding a large baton, shove a person they're overseeing. As you walk past some of the humans there, they say things to you. Soon, you reach a checkpoint, where you're taken away to an interrogation room. There, the helmeted alien who led you in takes their helmet off, revealing themselves to be a human -- a person that your character knows. The entire time all this unfolds, the player has control over their character's movement (though of course where they can move is constrained by the walls etc. in their environment), and can control where their character is looking.

Such during-gameplay cutscenes consist of scripted dialog and events (animated occurrences) that are triggered by the player's actions. For example, an NPC saying something to the player might be triggered by the player walking close enough to the NPC. Or the player performing an action, like picking up a gun from the ground, might trigger some during-gameplay cutscene.
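The proximity example can be sketched as a simple trigger that fires once when the player comes within some radius of the NPC. This is only an illustration under assumed details: a flat 2D position, a fixed radius, and a one-shot trigger. The class name `ProximityTrigger` and its fields are hypothetical, not taken from any game engine.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProximityTrigger:
    """Hypothetical one-shot trigger for a scripted NPC line."""
    x: float
    y: float
    radius: float
    line: str
    fired: bool = False

    def update(self, player_x: float, player_y: float) -> Optional[str]:
        """Return the NPC's line the first time the player is close enough."""
        if self.fired:
            return None  # the scripted line only plays once
        distance = math.hypot(player_x - self.x, player_y - self.y)
        if distance <= self.radius:
            self.fired = True
            return self.line
        return None


npc = ProximityTrigger(x=0.0, y=0.0, radius=2.0, line="Hey, over here.")
far = npc.update(5.0, 0.0)     # too far away: nothing plays
near = npc.update(1.0, 1.0)    # within the radius: the line plays
again = npc.update(0.0, 0.0)   # already fired: nothing plays
```

An action-based trigger, such as picking up the gun, would follow the same pattern with a different condition in place of the distance check.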

The cutscene elements of D-G cutscenes can consist of any scripted details. They could be as simple as some characters talking. D-G cutscenes can also be of any length. Some may be quite brief, lasting only a couple of seconds.

One form of D-G cutscene is where the player's character and NPC(s) are having a conversation as they walk along. These are epitomised by Naughty Dog's Uncharted games and The Last of Us games. Yahtzee Croshaw calls these "Walk and talk sequences".

We wouldn't normally think of these walk-and-talk scenes as kinds of cutscenes, but they match the criteria laid out above, of being scripted details that play out during gameplay. In this case, there are the scripted movements of your NPC companions, and the scripted dialog said by your character and the NPCs.

These conversation-focused D-G cutscenes don't have to take place while the characters are walking. They can occur during any kind of activity the characters are undertaking, like climbing a structure, or fighting against enemies.

Games like Rockstar's Grand Theft Auto (GTA) games have a huge amount of D-G cutscene detail going on all around the character during gameplay. There are vehicles on the roads, people walking about, and so on. We wouldn't normally think of these as kinds of cutscenes, but they meet our criteria. In this case, the scripted details are more "background details", which are in part there to give the cities a lived-in feel. They're always rolling, and aren't there to play a specific role in the plot.

Other well-known games with D-G cutscenes include the recent God of War games.

In a D-G cutscene, the player's control may be restricted in certain ways. They may be able to look around, but not move around, such as in the opening sequence of Skyrim, where the player wakes up as a prisoner in a horse-drawn cart, or in Metro 2033, where the player is sitting at a table, with other characters, in a bar.

Arguably, the greatest implementation of during-gameplay cutscenes thus far is the gang's camp in Red Dead Redemption 2. The camp seems to have a life of its own, independent of the player's interaction with it. Characters go about chores, have conversations amongst themselves, and sit around the campfire at night drinking, chatting, and playing musical instruments, before retiring to their tents once it gets late enough. And the player can choose whether or not to take part in the goings-on, such as the conversations.

 

Mixing during-gameplay cutscenes and other forms of cutscenes

During-gameplay cutscenes and other kinds of cutscenes can be seamlessly mixed together. Take the aforementioned scene in Metro 2033, where you're sitting at a table with a few others, having drinks and conversation. Most of the time it's a D-G cutscene, where characters are talking and the player can freely look around. But at times, NPCs will propose a toast, at which point the game shows a short non-interactive 1st-person cutscene, where you grab your drink, take a swig of it, and put it back down.

It's possible to mix all three types of cutscene -- non-interactive, QTE-and-choice, and during-gameplay -- within the one overall cutscene.
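One way to picture such a mix is as a sequence of segments, each tagged with its kind and with the controls the player keeps during it. The sketch below models the bar scene this way. The segment layout, the control names ("look", "move") and all the identifiers are assumptions made for illustration, not a description of how Metro 2033 is actually built.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List


class Kind(Enum):
    NON_INTERACTIVE = "non-interactive"
    QTE_AND_CHOICE = "QTE-and-choice"
    DURING_GAMEPLAY = "during-gameplay"


@dataclass(frozen=True)
class Segment:
    """One stretch of an overall cutscene (hypothetical structure)."""
    kind: Kind
    controls: FrozenSet[str]  # controls the player keeps in this segment


# The bar scene: seated conversation, a scripted toast, then back to talking.
BAR_SCENE: List[Segment] = [
    Segment(Kind.DURING_GAMEPLAY, frozenset({"look"})),  # can look, not move
    Segment(Kind.NON_INTERACTIVE, frozenset()),          # the toast plays out
    Segment(Kind.DURING_GAMEPLAY, frozenset({"look"})),
]


def is_allowed(segment: Segment, control: str) -> bool:
    """Return True if the player may use this control during the segment."""
    return control in segment.controls
```

A QTE-and-choice segment would fit the same structure, with its prompted input listed as the only available control.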

 

Downsides and limitations of during-gameplay cutscenes

During-gameplay cutscenes sound good on paper. They enable gameplay/control and visual-action storytelling to be integrated. They appear to be the best of both worlds. We may wonder why they aren't used more in games. One reason is that they have a number of limitations and downsides. We'll look at these now.

 

During-Gameplay Cutscenes Weaken and Constrain the Visual-Action Storytelling

Movie/TV scenes and cutscenes use cinematic techniques, such as camera angles, camera movements, and editing (cuts). In D-G cutscenes, because the player has control, the same camera that's used in gameplay -- whether first- or third-person -- is still used in the D-G cutscene. That lessens the strength of the storytelling, to some degree.

Movie/TV scenes and cutscenes involve real or virtual actors' performances. These performances include the:

  • blocking (including the marks where they stand and move at each moment of each shot)
  • physical interactions between characters
  • body language, including posture, how they move, facial expressions, gestures, etc.
  • dialogue

And their timing is crucial to their performances.

And of course the performances involve interactions between characters. Eye-contact between characters. Where a person is standing and facing in relation to others, and how people walk around, amongst others. When two people meet for the first time. When a person approaches another to ask them for assistance. Or when there is flirting between people. When there's a group of people, and someone is speaking, there's how others are listening, and their reactions. Vigorous agreement, vs indifference. And so on.

All those body language and performance details carry meaning: they communicate. A lot, for example, is communicated by the nature of the eye contact between people. Consider the differences between avoiding eye contact, confident eye contact, aggressive staring, and flirty eye contact.
 
The usual control schemes in games are too simple to control such details. The player's character might be able to move in different directions, look in different directions, jump, and crouch. Those options pale in comparison to the sort of control an actor has over their performance. So the player's character can only have a 'passive' role in D-G cutscenes, one that does not include those specific performance details.

As a different way of seeing this point, consider that in D-G cutscenes, the player could only communicate to other characters by moving forwards, backwards, left and right, by turning to the side, and by the direction they're looking in. The reader can imagine being in a situation with other people where these were the only ways they could communicate. Or imagine controlling a robot amongst those people with such a control scheme, where the robot had zero ability to show facial expressions or body language, just those kinds of movement.

There's a fundamental difficulty here, in that the game can't know the player's intentions. The player may want to be friendly, or dismissive, towards another character. But the game can't infer those intentions from the player's control over their character. There's a paucity of information about the player's intentions, and only so much that can be inferred. Using AI to make NPCs react more realistically wouldn't overcome this, because the missing information is about what the player intends.

To accommodate these control limitations, the player might be seeing some event taking place on the other side of a chain-link fence. Or seeing large alien machines moving around in the distance. In such cases the player is distanced from the action, and is more like a spectator.

There could be D-G cutscenes where the player's character can choose where to stand amongst a number of people, as a conversation plays out. In such situations, it doesn't matter where the player character is standing in the scene, because all that matters is that they hear the dialog. In such cases the player is more of a passive spectator of what's going on.

That the player tends to be relegated to a passive spectator is why D-G cutscenes are so often focused on auditory details, like conversations between characters, or messages over loudspeakers. The player can be a passive participant in these because audio is (somewhat) omni-directional. It doesn't matter exactly where the player character is facing and positioned at any time during the D-G cutscene -- they'll still be able to hear the details.

Imagine a D-G cutscene where the player was an active participant in it. For example, the player is up close to another character, in defiance of them, gets shoved back, and then darts back towards them, looks quickly to the sides at the other people there, and then turns to one of them, puts a hand on their shoulder, and whispers something to them. That would have to be done with a normal cutscene (which could possibly include QTEs).

Or imagine game developers trying to make a D-G cutscene where the player is showing another around their study, pointing out different objects to them. Where the player picks up a rare bottle of whiskey and shows that to the other character, handing the bottle to them. This couldn't be done either, if the player character is being controlled in the usual fashion.

One possible way of making the player character's performance include what was required for such scenes would be forcing them to do exactly what is required (and thus give up freedom of control).

A control scheme that would give the player control over all those nuances would have to be very complex. Perhaps it could be done with a mouse and keyboard, using all the keys on the keyboard, and use of those keys in combination with modifier keys. But then it'd be too complex for any normal person to use. VR with facial and body tracking could be a more realistic option.

Or something like QTEs could be used, though this wouldn't really give the player control over the character, it would be more just the player entering in the correct inputs to satisfy each of the QTEs.

A potential option would be to give the player the freedom, and have the other characters respond to the player's actions in a realistic way. The problem with this is that it'd be too complex to script the different branching possibilities, and we don't have the means to simulate the responses of the other characters. And even if we could simulate them, we'd also have to simulate the plot implications of some of the possibilities. And even if we could do that, many of those other possibilities would result in weaker storytelling (because strong stories don't lie near to each other in story space).

The player's character having to play a fairly passive role in D-G cutscenes means that such cutscenes could not be used for the storytelling of most scenes from movies/TV -- scenes that don't usually contain a passive role. Given that, it's fair to say that D-G cutscenes generally contain weaker storytelling than can be present in non-interactive cutscenes.

To me, the way the D-G cutscene details are always happening at a distance from the player character makes them feel a bit like a theme park attraction. As if the player is moving along through a set path, and there's scenery/sets and animatronics on either side of the path. The start of Half-Life 2, and the metro stations where the people live, in the Metro games, have this feel to me. I feel there's an uncanny valley feeling to the interactions with NPCs in during-gameplay cutscenes.

 

In conclusion, non-interactive cutscenes enable strong visual-action storytelling, but are at odds with gameplay. Interaction allows QTE-and-choice cutscenes and during-gameplay cutscenes. QTE-and-choice cutscenes involve weak gameplay as part of strong visual-action storytelling. During-gameplay cutscenes enable stronger gameplay, at the expense of weaker and more constrained visual-action storytelling.
