Baldur’s Gate 3 has a lot of words—and even ‘a lot of words’ is a major understatement. In a Steam post before the game’s release, it was revealed the game’s total script is about 2 million words long. For context, all five books in the current Game of Thrones series add up to about 1.7 million words. Big. It’s a big game.
Which is why I was pretty damn impressed to find this tool casually popping up on the game’s subreddit, able to show which character had the most changes to their dialogue since launch. It’s Wyll, which is interesting, but not a huge surprise, seeing as his story stands to gain the most from some added nattering (we still like him, though). Still, I wanted to know how the heck something like this was built, so I reached out to the tool’s creator.
They go by the name of Invuska on Reddit, GitHub, the Larian Forums and Discord, and they credit the BG3 Patch Dialogue Difference Tool’s existence to a shared effort by other modders in the community. “The extractor (by Norbyte), multi-tool (ShinyHobo), dialog parser (roksik-dnd & anonymous collaborator), and the dialog difference tool (me): all of the prior work is what made development of this tool (and many others) manageable.”
While Invuska mentions that without the collaborative effort this thing could’ve been easily “twice the amount of work”, they’ve also got some compliments for Larian Studios itself. “Each line contained ‘character codes’ for which line was associated with which character and was structured in a way that I could fairly easily pick it apart … a data scientist loves nothing more than already very well structured and clean data to work with.”
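To make that concrete, here’s a minimal Python sketch of how per-character change counts could be tallied from data shaped like that. It is not Invuska’s code: the `DialogLine` record, the line IDs, the character names used as codes and the sample lines are all made up for illustration, and the real parser output will likely look different.

```python
from collections import Counter
from typing import NamedTuple


class DialogLine(NamedTuple):
    """Hypothetical record: one line of dialogue tagged with a character code."""
    character: str
    text: str


def count_changes(old: dict[str, DialogLine], new: dict[str, DialogLine]) -> Counter:
    """Count added, removed and edited lines per character between two patches."""
    changes = Counter()
    for line_id in old.keys() | new.keys():
        before, after = old.get(line_id), new.get(line_id)
        if before is None:
            changes[after.character] += 1   # line only exists in the newer patch
        elif after is None:
            changes[before.character] += 1  # line was removed in the newer patch
        elif before.text != after.text:
            changes[after.character] += 1   # same line ID, different wording
    return changes


# Invented sample data standing in for two versions of the script.
launch = {
    "line-001": DialogLine("WYLL", "Well met."),
    "line-002": DialogLine("KARLACH", "Lets go!"),
    "line-003": DialogLine("WYLL", "Stay sharp."),
}
patch = {
    "line-001": DialogLine("WYLL", "Well met, friend."),
    "line-002": DialogLine("KARLACH", "Let's go!"),
    "line-004": DialogLine("WYLL", "One more thing."),
}

changes = count_changes(launch, patch)
for character, total in changes.most_common():
    print(character, total)
# prints:
# WYLL 3
# KARLACH 1
```

Keying every line by a stable ID is an assumption here, but it is the sort of thing that clean, well-structured data makes easy: once each line carries its own character code, a ranking like “who changed most since launch” is little more than a counter.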
As for their own personal observations, Invuska’s only just finished their first playthrough, which means they haven’t been diving too deep into the script beyond a broad, numbers-based overview. Instead, they’ve been staggered, once again, by just how mammoth a game Baldur’s Gate 3 is.
“There are approximately [over] 1,888 characters with dialog in the game, even more considering some dialog may be misattributed and that this count doesn’t include generic dialog (e.g. generic group of goblins). I definitely have not talked to 1,888 characters.”
They also have a pretty good idea of how many lines the game has, bearing in mind that a single line can run to several sentences. “From what the internal code of the tool gathers there are 114,921 lines [in Patch 5],” compared to “110,869 on launch day.” While the tool does highlight a ton of typo fixes, as Invuska mentions: “It’s easy to think from the difference tool that there are a lot of typos in the script, but notice how in-game you don’t even see them! That just goes to show how massive this game is.”
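For a sense of how a single typo fix might be surfaced, here’s a rough word-level diff using `difflib` from Python’s standard library. This is only a sketch under assumptions: the sample sentence is invented, the `[-old-]` and `{+new+}` markers are an arbitrary choice, and nothing here is claimed to match how the actual tool compares or displays lines.

```python
import difflib


def highlight_changes(old: str, new: str) -> str:
    """Return a word-level diff, marking removals as [-x-] and additions as {+x+}."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pieces.extend(old_words[i1:i2])  # unchanged words pass straight through
            continue
        if tag in ("replace", "delete"):
            pieces.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if tag in ("replace", "insert"):
            pieces.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(pieces)


# Invented example of a one-word fix between two patches.
result = highlight_changes("We should of left sooner.", "We should have left sooner.")
print(result)
# prints: We should [-of-] {+have+} left sooner.
```

Run something like that across more than 110,000 lines and even a tiny error rate produces a long list of fixes, which fits Invuska’s point about scale.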
As for why Invuska would put this all together, that’s down to one simple reason: justice for our big lady. “Justice for Karlach was actually the main reason why the tool was created, with more primitive code being created sometime in September after Patch 2 … many of us on Reddit, Discord, and in the Larian Forums thread for Karlach were/are quite hungry for an Infernal Engine fix of some sort that didn’t necessitate her becoming a mindflayer or her having to go back to the Hells.”
This means the tool started out targeting one specific character, then expanded to the whole cast: “I started working on simpler versions of the tool to satiate some of my curiosity/anticipation. A few others seemed to share the same curiosity and were interested in its development. Seeing how this tool may be useful for characters outside of just Karlach, I fleshed out my small collection of scripts for a more ‘everyone-ready’ version that you see today.”
I live for this stuff. While some might take a dim view of data mining, it’s clear that this kind of stats wizardry has a lot in common with speedrunning. Neither is trying to ‘break’ a game; instead, both are about finding all the secrets hidden between the lines of code.
It’s an expression of love, kinda like how you might wear out your favourite bit of hardware. As for the tool itself, Invuska’s happy to share. “[I’m] planning to create more mods and tools in the future, so stay tuned. Also, the tool is open source on an MIT licence for anyone who is interested in forking/extending/etc. Go wild.”