Splinter Cell (2002) was one of the first games I had on the original Xbox and still remains one of my favorite games of all time. The game was developed by Ubisoft using Unreal Engine 2, licensed from a small indie dev called Epic Games, which continues to use and license its game engine technology for contemporary small-budget indie games such as Fortnite and Halo: Campaign Evolved.
I got into programming/hacking through video games and I still enjoy data mining/exploring cut content from the few games I play nowadays. I recently decided, on a whim, to look online for cut content from Splinter Cell and I was kind of surprised by the lack of datamined info. There isn't really much information on the topic aside from an OG Xbox review copy of the game which contained two levels cut from the retail Xbox version and some other minor differences.
Naturally, I decided to legally back up my personal disc copy of the game and got to digging into the files.
My initial core objective was to examine the format of the game data and sniff out whether there are any indicators of cut content such as textures, models, interesting strings, whatever. Some nice finds would be debug menus, voice lines, weapon concepts, or levels that are unreachable through normal game progression.
The game's (truncated) file tree looks like this:
.
├── contentimage.xbx
├── dashupdate.xbe
├── default.xbe
├── downloader.xbe
├── dynamicxbox.umd
├── LMaps
│   ├── 000_menu
│   │   ├── common.lin
│   │   └── menu.lin
│   ├── 001_Training
│   │   ├── 0_0_2_Training.bik
│   │   ├── 0_0_2_Training.lin
│   │   ├── 0_0_2_Training_progress.tga
│   │   ├── 0_0_2_Training_start.tga
│   │   ├── 0_0_3_Training.lin
│   │   ├── 0_0_3_Training_complete.tga
│   │   ├── 0_0_3_Training_progress.tga
│   │   ├── common.lin
│   │   └── French
│   │       ├── 0_0_2_Training_progress.tga
│   │       ├── 0_0_2_Training_start.tga
│   │       ├── 0_0_3_Training_complete.tga
│   │       └── 0_0_3_Training_progress.tga
.xbe files are Xbox Executables, .bik are Bink Video files, and .tga are images... but .lin is new to me.
In Splinter Cell the maps are divided into separate parts. So in the training mission 001_Training, you likely have 0_0_2_Training.lin for the first part and 0_0_3_Training.lin for the second which gets loaded via an in-game loading sequence after advancing to some zone in the map.
I instantly thought that common.lin might contain data common to both of these parts as a way to reduce file size. The Halo games for instance have a shared.map containing assets which are shared across most maps, and load data at a fixed address so that the file can be trivially transmuted from a binary blob to its in-memory data structures.
Examining the common.lin file in a hex editor, a few things become immediately apparent:
┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐
│00000000│ 04 00 00 00 0c 00 00 00 │ 78 9c 7b d7 97 c2 00 00 │........│x.{.....│
│00000010│ 06 2e 01 e1 04 00 00 00 │ 0c 00 00 00 78 9c 63 60 │........│....x.c`│
│00000020│ 90 66 00 00 00 3a 00 1c │ 04 00 00 00 0c 00 00 00 │.f...:..│........│
│00000030│ 78 9c 73 48 67 60 00 00 │ 02 39 00 a8 04 00 00 00 │x.sHg`..│.9......│
│00000040│ 0c 00 00 00 78 9c b3 e0 │ 65 60 00 00 01 0b 00 46 │....x...│e`.....F│
└────────┴─────────────────────────┴─────────────────────────┴────────┴────────┘
- Data between 0x0..0x4 and 0x4..0x8 are little-endian 32-bit integers: 0x00000004 and 0x0000000C.
- At offset 0x8 is what appears to be a zlib-compressed chunk of data, noted by the distinctive "x" in the ASCII view and 0x78 0x9c.
- There's another sequence of this at offset 0x14, which happens to be 0xC bytes past the offset of the zlib data (0x8), and another at 0x28.
Presumably the format here is {decompressed_data_len, compressed_data_len, zlib_block[compressed_data_len]} repeated.
So far so good.
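As a minimal sketch of that guess (not the tool used for this post), the repeated {decompressed_data_len, compressed_data_len, zlib_block} layout can be walked with Python's standard `zlib` and `struct` modules. The function name `decompress_lin` is made up for illustration, and the sample bytes are the first 0x50 bytes of common.lin copied from the hex dump above.

```python
import struct
import zlib


def decompress_lin(data: bytes) -> list[bytes]:
    """Walk {decompressed_len, compressed_len, zlib_block} records in order."""
    chunks = []
    pos = 0
    while pos + 8 <= len(data):
        # Two little-endian u32s precede every zlib block.
        decompressed_len, compressed_len = struct.unpack_from("<II", data, pos)
        pos += 8
        block = data[pos:pos + compressed_len]
        pos += compressed_len
        out = zlib.decompress(block)
        if len(out) != decompressed_len:
            raise ValueError("decompressed size does not match the record header")
        chunks.append(out)
    return chunks


# First 0x50 bytes of common.lin, taken from the hex dump above.
SAMPLE = bytes.fromhex(
    "04 00 00 00 0c 00 00 00 78 9c 7b d7 97 c2 00 00"
    "06 2e 01 e1 04 00 00 00 0c 00 00 00 78 9c 63 60"
    "90 66 00 00 00 3a 00 1c 04 00 00 00 0c 00 00 00"
    "78 9c 73 48 67 60 00 00 02 39 00 a8 04 00 00 00"
    "0c 00 00 00 78 9c b3 e0 65 60 00 00 01 0b 00 46"
)

# Each of the four leading chunks holds a single little-endian u32.
values = [struct.unpack("<I", chunk)[0] for chunk in decompress_lin(SAMPLE)]
print([hex(v) for v in values])
```

Joining the remaining chunks in order would give the main data section; nothing in the record header is needed beyond the two lengths.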
I wrote a quick tool to decompress the archive and, without a hitch, ended up with a 0x648EEE-byte (roughly 6.3 MiB) file with 4 u32s prefixing it. Since these 4 are in their own dedicated zlib-compressed chunks, I consider them to be separate from the main data. I later reverse engineered and identified how they are used:
uncompressed_data_size: 0x648EEE
texture_cache_size? - later used when calling D3DDevice_CreateTexture2: 0x1B0000
vertex_buffer_size? - ditto, D3DDevice_CreateVertexBuffer2: 0x6740
index_buffer_size? - ditto, XGSetIndexBufferHeader: 0xD38
And this is what the main data section's first 0x100 bytes look like:
┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐
│00000000│ 5c 58 9e 13 00 a3 c5 e3 │ 9f b4 92 9b 13 5c 58 9e │\X......│.....\X.│
│00000010│ 13 01 00 00 00 04 2a d6 │ fe 7e 37 13 4d 61 70 73 │......*.│.~7.Maps│
│00000020│ 5c 6d 65 6e 75 5c 6d 65 │ 6e 75 2e 75 6e 72 00 00 │\menu\me│nu.unr..│
│00000030│ 00 00 00 ee de 00 00 00 │ 00 00 00 16 4d 61 70 73 │........│....Maps│
│00000040│ 5c 31 5f 31 5f 30 54 62 │ 69 6c 69 73 69 2e 75 6e │\1_1_0Tb│ilisi.un│
│00000050│ 72 00 f0 de 00 00 6d c9 │ 17 00 00 00 00 00 16 4d │r.....m.│.......M│
│00000060│ 61 70 73 5c 31 5f 31 5f │ 31 54 62 69 6c 69 73 69 │aps\1_1_│1Tbilisi│
│00000070│ 2e 75 6e 72 00 60 a8 18 │ 00 98 34 21 00 00 00 00 │.unr.`..│..4!....│
│00000080│ 00 16 4d 61 70 73 5c 31 │ 5f 31 5f 32 54 62 69 6c │..Maps\1│_1_2Tbil│
│00000090│ 69 73 69 2e 75 6e 72 00 │ 00 dd 39 00 89 63 19 00 │isi.unr.│..9..c..│
│000000a0│ 00 00 00 00 18 4d 61 70 │ 73 5c 30 5f 30 5f 32 5f │.....Map│s\0_0_2_│
│000000b0│ 54 72 61 69 6e 69 6e 67 │ 2e 75 6e 72 00 90 40 53 │Training│.unr..@S│
│000000c0│ 00 0f 9f 0c 00 00 00 00 │ 00 18 4d 61 70 73 5c 30 │........│..Maps\0│
│000000d0│ 5f 30 5f 33 5f 54 72 61 │ 69 6e 69 6e 67 2e 75 6e │_0_3_Tra│ining.un│
│000000e0│ 72 00 a0 df 5f 00 48 86 │ 11 00 00 00 00 00 1e 4d │r..._.H.│.......M│
│000000f0│ 61 70 73 5c 31 5f 32 5f │ 31 44 65 66 65 6e 73 65 │aps\1_2_│1Defense│
└────────┴─────────────────────────┴─────────────────────────┴────────┴────────┘
And at what appears to be the end of the file table:
┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐
│0002c580│ 79 6e 63 68 5c 69 6e 74 │ 5c 55 73 61 53 6f 6c 64 │ynch\int│\UsaSold│
│0002c590│ 69 65 72 5c 55 53 4f 55 │ 4e 43 5f 33 2e 62 69 6e │ier\USOU│NC_3.bin│
│0002c5a0│ 00 40 8d 9b 13 74 05 00 │ 00 00 00 00 00 c1 83 2a │.@...t..│.......*│
│0002c5b0│ 9e 64 00 11 00 01 00 00 │ 00 10 0e 00 00 88 00 00 │.d......│........│
│0002c5c0│ 00 fa 0f 00 00 f3 7a 11 │ 00 4e 00 00 00 3e 78 11 │......z.│.N...>x.│
│0002c5d0│ 00 de ad f0 0f 42 01 9c │ 90 92 8f 96 93 9e 8b 96 │.....B..│........│
│0002c5e0│ 90 91 9a 9c 97 9a 93 90 │ 91 df af bc ba bc b7 ba │........│........│
│0002c5f0│ b3 b0 b1 df a6 c5 a3 ba │ bc b7 ba b3 b0 b1 a3 ac │........│........│
│0002c600│ a6 ac ab ba b2 a3 df ce │ cf d0 cd c9 d0 cf cd df │........│........│
│0002c610│ cd ce c5 cf cd c5 ce cb │ ff 00 00 00 00 00 00 00 │........│........│
│0002c620│ 00 00 00 00 00 00 00 00 │ 00 01 00 00 00 fa 0f 00 │........│........│
│0002c630│ 00 10 0e 00 00 05 4e 6f │ 6e 65 00 10 04 07 04 06 │......No│ne......│
│0002c640│ 43 6f 6c 6f 72 00 10 04 │ 07 04 0d 49 6e 74 65 72 │Color...│...Inter│
│0002c650│ 6e 61 6c 54 69 6d 65 00 │ 10 00 07 00 07 45 6e 67 │nalTime.│.....Eng│
│0002c660│ 69 6e 65 00 10 00 07 04 │ 05 43 6f 72 65 00 10 00 │ine.....│.Core...│
│0002c670│ 07 04 07 53 79 73 74 65 │ 6d 00 10 00 07 04 06 55 │...Syste│m......U│
└────────┴─────────────────────────┴─────────────────────────┴────────┴────────┘
Just to save some blog space on my trial and error process here, I'm going to drop some of the resources I found which discuss this format:
- https://oldunreal.com/phpBB3/viewtopic.php?t=4885
- https://zenhax.com/viewtopic.php@t=1049.html
- https://reshax.com/topic/1421-ubisoft-unreal-engine-2-open-season-2006-video-game-umd-also-lin-xbox-xbox-360-pc-and-liv-latter-being-exclusive-to-xbox-360/
- https://www.unrealarchive.org/wikis/unreal-wiki/Legacy:UMOD/File_Format.html
The last two posts in particular had structure info that was helpful in figuring out the packed int format (think UTF-8 and its variable-length encoding) and a couple unknown vars.
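For readers who have not met the packed int before, here is a rough sketch of a decoder. It assumes the encoding is the classic Unreal "compact index" (sign bit and continuation bit in the first byte, 7 payload bits in later bytes); that assumption is at least consistent with this file, since the bytes `7e 37` just before the first file name in the dump above decode to 3,582. The function name is my own.

```python
def read_compact_index(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one packed int at data[pos]; returns (value, next_position)."""
    first = data[pos]
    pos += 1
    negative = bool(first & 0x80)   # bit 7 of the first byte: sign
    value = first & 0x3F            # low 6 bits: payload
    if first & 0x40:                # bit 6 of the first byte: more bytes follow
        shift = 6
        for i in range(4):
            byte = data[pos]
            pos += 1
            if i == 3:
                # Assumed: the fifth byte has no continuation flag.
                value |= byte << shift
                break
            value |= (byte & 0x7F) << shift
            shift += 7
            if not byte & 0x80:     # bit 7 of later bytes: more bytes follow
                break
    return (-value if negative else value), pos


# 0x7e 0x37 sits right before the first file name in common.lin.
print(read_compact_index(bytes([0x7E, 0x37])))
```

Small values fit in a single byte, which is why the name lengths in the dump (0x13, 0x16, 0x18) read like plain u8s.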
What I gathered from all of these posts was that over time, nobody's really been able to figure out this format's quirks sufficiently to unpack the data. Everyone seems to think that some kind of VFS is created and the data gets mapped at a specific address and then read. Which may be true for some titles or consoles, but is not for this one.
My objective has now changed: I now want to reverse engineer this file format and be able to dump individual files from this filesystem. Then I can achieve my core goal of looking for cut content. Then I can maybe play the game.
# tl;dr of the general .lin structure
common.lin has a different layout from the other .lin files that looks roughly like:
/* ==== Standard Data ==== */
// These three, from research + reverse engineering, should not be considered
// as part of the "whole" file
u32 maybe_load_address; // 5C 58 9E 13 (0x139e585c) in common.lin
compressed_int name_length; // 0 in common.lin
char name[name_length];
/* ==== common.lin-specific file header ==== */
u32 magic; // 0x9fe3c5a3 in little endian, i.e. A3 C5 E3 9F
u32 unk_address; // B4 92 9B 13, (0x139b92b4) suspiciously similar to maybe_load_address.
// unk_address - load_address gives you the start of the file
// table, relative to the magic?
u32 load_address2; // 5C 58 9E 13 same as maybe_load_address
u8 unknown[8]; // 01 00 00 00 04 2A D6 FE
compressed_int file_entry_count;
FileEntry file_entries[file_entry_count];
struct FileEntry {
    compressed_int name_len;
    char name[name_len];
    u32 offset;
    u32 len;
    u32 unk;
};
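To make the layout concrete, here is a sketch that parses the header and the first two file entries from the bytes in the earlier hex dump. It is not the tool from this post: the names (`parse_common_header`, `read_compact`, `max_entries`) are made up, and the packed int decoding assumes the standard Unreal compact index.

```python
import io
import struct
from dataclasses import dataclass

COMMON_MAGIC = 0x9FE3C5A3


@dataclass
class FileEntry:
    name: str
    offset: int
    length: int
    unk: int


def read_u32(stream) -> int:
    return struct.unpack("<I", stream.read(4))[0]


def read_compact(stream) -> int:
    # Assumed Unreal compact index: sign 0x80, continue 0x40, then 7-bit groups.
    first = stream.read(1)[0]
    value, shift = first & 0x3F, 6
    more = bool(first & 0x40)
    while more:
        byte = stream.read(1)[0]
        value |= (byte & 0x7F) << shift
        shift += 7
        more = bool(byte & 0x80)
    return -value if first & 0x80 else value


def read_name(stream) -> str:
    raw = stream.read(read_compact(stream))
    return raw.rstrip(b"\x00").decode("ascii")


def parse_common_header(stream, max_entries=None):
    read_u32(stream)            # maybe_load_address
    read_name(stream)           # empty name in common.lin
    magic = read_u32(stream)
    if magic != COMMON_MAGIC:
        raise ValueError(f"unexpected magic {magic:#x}")
    read_u32(stream)            # unk_address
    read_u32(stream)            # load_address2
    stream.read(8)              # unknown[8]
    count = read_compact(stream)
    wanted = count if max_entries is None else min(count, max_entries)
    entries = []
    for _ in range(wanted):
        name = read_name(stream)
        offset, length, unk = struct.unpack("<III", stream.read(12))
        entries.append(FileEntry(name, offset, length, unk))
    return count, entries


# First 0x5E bytes of the decompressed main data section (from the hex dump).
SAMPLE = bytes.fromhex(
    "5c 58 9e 13 00 a3 c5 e3 9f b4 92 9b 13 5c 58 9e"
    "13 01 00 00 00 04 2a d6 fe 7e 37 13 4d 61 70 73"
    "5c 6d 65 6e 75 5c 6d 65 6e 75 2e 75 6e 72 00 00"
    "00 00 00 ee de 00 00 00 00 00 00 16 4d 61 70 73"
    "5c 31 5f 31 5f 30 54 62 69 6c 69 73 69 2e 75 6e"
    "72 00 f0 de 00 00 6d c9 17 00 00 00 00 00"
)

count, entries = parse_common_header(io.BytesIO(SAMPLE), max_entries=2)
print(count, entries)
```

Reading from a stream rather than indexing a buffer is deliberate here: every field is consumed strictly in order, with no seeking.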
Then immediately following the FileEntry table are 54 Unreal Engine Package files in sequence (identified via their 0x9E2A83C1 magic; these are also referred to as Linker files) that presumably map to the files in the file table.
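Counting those magics is a simple byte search. The sketch below is an assumption about how one might do it rather than the exact method used here, and a raw search like this can produce false positives if the four bytes happen to occur inside other data. The sample is the row of the hex dump where the first package begins.

```python
import struct

# 0x9E2A83C1 stored little-endian is the byte sequence c1 83 2a 9e.
PACKAGE_MAGIC = struct.pack("<I", 0x9E2A83C1)


def find_package_offsets(data: bytes) -> list[int]:
    """Return every offset at which the Unreal package magic appears."""
    offsets = []
    pos = data.find(PACKAGE_MAGIC)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(PACKAGE_MAGIC, pos + 1)
    return offsets


# Bytes from 0x2c5a0 onward in the dump: the tail of the last FileEntry,
# followed by the first package header.
SAMPLE = bytes.fromhex(
    "00 40 8d 9b 13 74 05 00 00 00 00 00 00 c1 83 2a"
    "9e 64 00 11 00 01 00 00"
)

print(find_package_offsets(SAMPLE))
```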
The map-specific files like menu.lin and 0_0_2_Training.lin do not have the file table, but they do have the first 3 fields (and a non-null string like "menu\x0" for the name field) then a sequence of Linker files.
But the difficulty with parsing this data starts with the file table.
# Problems
# File Table
The file table is a very simple format that I'm able to parse with my program:
FileEntry {
name: Maps\\menu\\menu.unr,
offset: 0x0,
len: 0xDEEE,
unk: 0x0,
},
FileEntry {
name: Maps\\1_1_0Tbilisi.unr,
offset: 0xDEF0,
len: 0x17C96D,
unk: 0x0,
},
FileEntry {
name: Maps\\1_1_1Tbilisi.unr,
offset: 0x18A860,
len: 0x213498,
unk: 0x0,
},
FileEntry {
name: Maps\\1_1_2Tbilisi.unr,
offset: 0x39DD00,
len: 0x196389,
unk: 0x0,
},
FileEntry {
name: Maps\\0_0_2_Training.unr,
offset: 0x534090,
len: 0xC9F0F,
unk: 0x0,
},
FileEntry {
name: Maps\\0_0_3_Training.unr,
offset: 0x5FDFA0,
len: 0x118648,
unk: 0x0,
},
FileEntry {
name: Maps\\1_2_1DefenseMinistry.unr,
offset: 0x7165F0,
len: 0x249AF6,
unk: 0x0,
},
FileEntry {
name: Maps\\1_2_2DefenseMinistry.unr,
offset: 0x9600F0,
len: 0x20F662,
unk: 0x0,
},
<snip>
At first glance the files seem to be laid out sequentially, with each offset rounded up to the next 16-byte boundary. Except, notice that last file's offset... 0x9600F0. This is way outside of the range of my 0x648EEE-length file, and this file list contains 3,582 files! Not 54 as expected from the count of Unreal Package magics!
The mismatched file count could be explained by not every file in this container being an Unreal Package, but the offsets so far are extremely wrong.
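A quick arithmetic check on the eight entries listed above: each offset equals the previous offset plus length, rounded up to a multiple of 16 (a 4-byte alignment does not fit, because 0x18A860 + 0x213498 = 0x39DCF8 is already 4-byte aligned, yet the next offset is 0x39DD00). The helper names in this sketch are my own.

```python
# (name, offset, len) for the eight FileEntry records shown above.
ENTRIES = [
    ("Maps\\menu\\menu.unr", 0x0, 0xDEEE),
    ("Maps\\1_1_0Tbilisi.unr", 0xDEF0, 0x17C96D),
    ("Maps\\1_1_1Tbilisi.unr", 0x18A860, 0x213498),
    ("Maps\\1_1_2Tbilisi.unr", 0x39DD00, 0x196389),
    ("Maps\\0_0_2_Training.unr", 0x534090, 0xC9F0F),
    ("Maps\\0_0_3_Training.unr", 0x5FDFA0, 0x118648),
    ("Maps\\1_2_1DefenseMinistry.unr", 0x7165F0, 0x249AF6),
    ("Maps\\1_2_2DefenseMinistry.unr", 0x9600F0, 0x20F662),
]

DECOMPRESSED_SIZE = 0x648EEE


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def is_sequential(entries, alignment: int) -> bool:
    """True if every entry starts at the aligned end of the previous one."""
    return all(
        align_up(offset + length, alignment) == next_offset
        for (_, offset, length), (_, next_offset, _) in zip(entries, entries[1:])
    )


print(is_sequential(ENTRIES, 16), is_sequential(ENTRIES, 4))
# Entries whose declared offset already lies past the end of the data:
print([name for name, offset, _ in ENTRIES if offset > DECOMPRESSED_SIZE])
```

So the table is internally consistent; it just describes a layout much larger than the data that is actually in common.lin.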
# File Reading
After debugging the game in the Original Xbox emulator xemu, I was able to find the routine which opens the file, as well as the function which reads and decompresses data.
Function Identification Methodology
If anyone's curious about the methodology: I identified NtCreateFile, set a breakpoint, recorded the HANDLE returned for the file path I cared about, then set a breakpoint at NtReadFile and broke when the input HANDLE matched the expected value. The call stack/stepping from here helped identify interesting callers. Alternatively, the string "unknown compression method" is useful in finding the decompression routine inflateInit2.
This is not super relevant to the blog post which is why it's in this little collapse section. I hate reading posts like this that skip over a detail I'm interested in like it's just common knowledge how something is done, so I'm trying to avoid doing that :)
This function basically checks the requested read size against how much data it has precached in its decompressed data buffer. It will then copy as much data as it can from its precached buffer to the output buffer, then read the next block of compressed zlib data into its precache buffer if the previous one was exhausted. Repeat this process until the request is satisfied.
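In rough Python terms, the behavior described above might look like the sketch below. This is my own reconstruction of the logic, not a transcription of the game's code: the class name, method names, and the `pack_chunks` demo helper are all invented, and it assumes the {decompressed_len, compressed_len, zlib_block} record layout from earlier.

```python
import io
import struct
import zlib


class LinReader:
    """Serve reads from a decompressed buffer, refilling one block at a time."""

    def __init__(self, stream):
        self._stream = stream
        self._buffer = b""   # current decompressed block
        self._cursor = 0     # how much of the block has been handed out

    def _fill(self) -> bool:
        header = self._stream.read(8)
        if len(header) < 8:
            return False     # no more blocks
        _decompressed_len, compressed_len = struct.unpack("<II", header)
        self._buffer = zlib.decompress(self._stream.read(compressed_len))
        self._cursor = 0
        return True

    def read(self, size: int) -> bytes:
        out = bytearray()
        while len(out) < size:
            # Refill only once the previous block is exhausted.
            if self._cursor == len(self._buffer) and not self._fill():
                break
            take = min(size - len(out), len(self._buffer) - self._cursor)
            out += self._buffer[self._cursor:self._cursor + take]
            self._cursor += take
        return bytes(out)


def pack_chunks(chunks) -> bytes:
    """Demo helper: build a tiny archive in the same record layout."""
    out = bytearray()
    for chunk in chunks:
        block = zlib.compress(chunk)
        out += struct.pack("<II", len(chunk), len(block)) + block
    return bytes(out)


reader = LinReader(io.BytesIO(pack_chunks([b"hello ", b"world"])))
print(reader.read(8), reader.read(8))
```

Note that a single `read` can straddle two compressed blocks, which is exactly the case the copy-then-refill loop exists to handle.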
Identifying this function was pretty important for my reverse engineering process. I could now set breakpoints on the code which copies data to the output buffer and see who's calling this function when data is read from offsets I care about.
I stepped through this code, set Memory Read breakpoints on data I didn't yet understand, and noted something interesting early on!
Those "addresses" from the header (0x139e585c)? Those are actually passed to what I can only guess is a Seek routine which updates the position property of the file reader, which then makes an indirect call to another function that literally does nothing.
The entire content of the function is:
retn 4
That's it.
Then the reads just... continue from their last position? Since the function is an indirect call, I can only assume that I was looking at some composed C++ object where the outer class object updates its own position in Seek() and then calls its underlying file reader's Seek()... which is a no-op?
After setting Memory Read breakpoints on the object's position field, I noticed it's only ever used in their file reader equivalent of FTell(). It doesn't affect where data is actually being read from at all.
The reason for the Seek() being a no-op is likely that the underlying file reader is reading directly from the compressed buffer, which reads in 0x4000-byte chunks. Since you cannot reasonably map an uncompressed data offset to a compressed offset, the format must be designed to ignore seeks and just read data linearly.
...the .lin extension makes a lot more sense.
💡 In order to read these files, you have to assume that you cannot seek forward/backward. Easy enough.
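A toy model of that composed reader, under my own assumptions about its shape: `seek` only changes what `tell` reports, while `read` keeps consuming the underlying stream linearly. Whether the game's position field also advances on reads is a guess on my part, and the class name is invented.

```python
import io


class ForwardOnlyReader:
    """Outer reader that tracks a position but never repositions its source."""

    def __init__(self, inner):
        self._inner = inner
        self._position = 0

    def seek(self, position: int) -> None:
        # Bookkeeping only; the inner reader's seek is effectively a no-op.
        self._position = position

    def tell(self) -> int:
        return self._position

    def read(self, size: int) -> bytes:
        data = self._inner.read(size)   # always continues where it left off
        self._position += len(data)     # assumption: reads advance the position
        return data


reader = ForwardOnlyReader(io.BytesIO(b"abcdef"))
first = reader.read(2)
reader.seek(0x139E585C)                 # the "address" from the header
second = reader.read(2)
print(first, second, hex(reader.tell()))
```

A static parser can therefore treat every seek as a label on the stream rather than an instruction to move.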
# Load Order Matters
We still have a problem that has not been addressed: why does the file table have a large count of files with bad offsets?
I continued to use breakpoints inside of the file read function to trace where interesting bits of data were read and forced a break when the data immediately following the file table was read. Eventually I traced the file read operation back far enough to find this function, StaticLoadObject:
This function calls ResolveName which I was able to log the arguments to via a debugger breakpoint script, which told me the InName was ini:Engine.Engine.GameEngine:
This ini:Engine.Engine.GameEngine name gets parsed as:
- ini: <- resolve the name from the game's INI files
- Engine.Engine <- the INI table to read from
- GameEngine <- the key from the table to read
If I look in UW.ini included with the game, this table is defined as:
[Engine.Engine]
RenderDevice=D3DDrv.D3DRenderDevice
GameRenderDevice=D3DDrv.D3DRenderDevice
AudioDevice=XboxAudio.XboxAudioSubsystem
Console=Engine.Console
DefaultPlayerMenu=UPreview.UPreviewRootWindow
Language=int
GameEngine=Engine.GameEngine
EditorEngine=Editor.EditorEngine
WindowedRenderDevice=D3DDrv.D3DRenderDevice
DefaultGame=Echelon.EchelonGameInfo
DefaultServerGame=WarfareGame.WarfareTeamGame
ViewportManager=XboxDrv.XboxClient
Render=Render.Render
Input=Engine.Input
Canvas=Echelon.ECanvas
Editor3DRenderDevice=D3DDrv.D3DRenderDevice
So the resulting value returned from this function is Engine.GameEngine, which matches what this function resolves.
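The lookup itself is easy to mimic. The sketch below uses Python's `configparser` on a two-line excerpt of the table above; `resolve_name` is a hypothetical stand-in for illustration, not the engine's actual ResolveName logic.

```python
import configparser

# Excerpt of the [Engine.Engine] table from UW.ini shown above.
UW_INI = """
[Engine.Engine]
GameEngine=Engine.GameEngine
DefaultGame=Echelon.EchelonGameInfo
"""


def resolve_name(name: str, ini: configparser.ConfigParser) -> str:
    """Resolve 'ini:<Section>.<Key>' through the INI; pass other names through."""
    prefix = "ini:"
    if not name.startswith(prefix):
        return name
    # The key is the last dotted component; the section is everything before it.
    section, _, key = name[len(prefix):].rpartition(".")
    return ini[section][key]


# strict=False tolerates repeated keys, which Unreal INI files often contain.
ini = configparser.ConfigParser(interpolation=None, strict=False)
ini.optionxform = str   # keep key case instead of lowercasing
ini.read_string(UW_INI)

print(resolve_name("ini:Engine.Engine.GameEngine", ini))
```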
This is then used to resolve the package Engine and its exported object GameEngine. The game binary looks for the file Engine in its available sources (partial matching strategy), which includes searching against the LIN file table, and then resolves that name as System\Engine.u. My tool that reads the file table confirms that this is declared in the LIN file:
FileEntry {
name: System\\Engine.u,
offset: 0x13482120,
len: 0x127DA1,
unk: 0x0,
},
Except the file start offset + len don't make sense. If I assume the Engine.u file is the first file immediately following the file table, advancing forward by this length appears to land right in the middle of some string?
┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐
│00154330│ 09 45 4d 65 73 68 53 46 │ 58 00 10 00 07 00 1b 43 │.EMeshSF│X......C│
│00154340│ 68 61 6e 64 65 72 6c 65 │ 72 43 72 79 73 74 61 6c │handerle│rCrystal│
│00154350│ 50 61 72 74 69 63 75 6c │ 65 00 10 00 07 00 12 46 │Particul│e......F│
└────────┴─────────────────────────┴─────────────────────────┴────────┴────────┘
I'll save some time and just say that I did not identify the wrong file. The lengths just don't matter, and for all intents and purposes are wrong. The reader in the game engine must just read the data in-order using its self-description in its own header?
The Unreal Engine Package/Linker file format has been well documented and does include some sizes in its header. The packages contain about what you'd expect of some object-oriented programming (OOP) script/data format.
It has exported objects which are named instances of some OOP type and has properties and data. Or the object can be a class/struct definition. The exports may rely on types exported from other packages which are declared as imports. Both of these have names or string data associated with them which are defined in the name table.
I mapped the existing documentation to the following Rust struct:
#[derive(Debug)]
pub struct GenerationInfo {
    pub export_count: u32,
    pub name_count: u32,
}

#[derive(Debug)]
pub struct PackageHeader<'i> {
    pub version: u32,
    pub flags: u32,
    pub name_count: u32,
    pub name_offset: u32,
    pub export_count: u32,
    pub export_offset: u32,
    pub import_count: u32,
    pub import_offset: u32,
    // Note: this is not in the above documented description
    pub unk: u32,
    // Ditto.
    // Not shown: compressed int for length of this data at this position
    pub unknown_data: &'i [u8],
    pub guid_a: u32,
    pub guid_b: u32,
    pub guid_c: u32,
    pub guid_d: u32,
    // Not shown: compressed int for length of this data at this position.
    pub generations: Vec<GenerationInfo>,
}
And of course, the offsets in this format are also unusable (e.g. the name_offset lands you after the start of the name table). But the counts look good:
PackageHeader {
version: 0x110064,
flags: 0x1,
name_count: 0xE10,
name_offset: 0x88,
export_count: 0xFFA,
export_offset: 0x117AF3,
import_count: 0x4E,
import_offset: 0x11783E,
unk: 0xFF0ADDE,
unknown_data: [
...
]
guid_a: 0x0,
guid_b: 0x0,
guid_c: 0x0,
guid_d: 0x0,
generations: [
GenerationInfo {
export_count: 0xFFA,
name_count: 0xE10,
},
],
}
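The fixed-size front of that header can be cross-checked against the hex dump from earlier, where the first package starts at 0x2c5ad. The sketch below covers only the magic plus the nine u32 fields; the variable-length `unknown_data`, GUID, and generations are left out, and `FixedHeader` / `parse_fixed_header` are names I made up.

```python
import struct
from dataclasses import dataclass

PACKAGE_MAGIC = 0x9E2A83C1


@dataclass
class FixedHeader:
    version: int
    flags: int
    name_count: int
    name_offset: int
    export_count: int
    export_offset: int
    import_count: int
    import_offset: int
    unk: int


def parse_fixed_header(data: bytes, pos: int = 0) -> FixedHeader:
    """Parse the magic and the nine little-endian u32s that follow it."""
    fields = struct.unpack_from("<10I", data, pos)
    if fields[0] != PACKAGE_MAGIC:
        raise ValueError(f"unexpected magic {fields[0]:#x}")
    return FixedHeader(*fields[1:])


# 40 bytes starting at 0x2c5ad in the decompressed data (from the hex dump).
SAMPLE = bytes.fromhex(
    "c1 83 2a 9e 64 00 11 00 01 00 00 00 10 0e 00 00"
    "88 00 00 00 fa 0f 00 00 f3 7a 11 00 4e 00 00 00"
    "3e 78 11 00 de ad f0 0f"
)

header = parse_fixed_header(SAMPLE)
print(header)
```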
Now with my tool updated to read these tables (parsing by assuming that they immediately follow this header and each other) I have imports that look like:
Package Core.Core
Import { class_package: 4, class_name: B64, package_index: 0, object_name: 4, object: None }
Class Core.Object
Import { class_package: 4, class_name: B62, package_index: FFFFFFFF, object_name: 13, object: None }
Class Core.Function
Import { class_package: 4, class_name: B62, package_index: FFFFFFFF, object_name: BBD, object: None }
And exports:
Class Actor
(0x0) ObjectExport {
class_index: 0x0,
super_index: 0xFFFFFFFE,
package_index: 0x0,
object_name: 0x206,
object_flags: 0x40F0004,
serial_size: 0x3A8,
serial_offset: 0xF719,
}
Class Pawn
(0x1) ObjectExport {
class_index: 0x0,
super_index: 0x1,
package_index: 0x0,
object_name: 0x1A,
object_flags: 0x40F0004,
serial_size: 0x281,
serial_offset: 0xFAC1,
}
...
Class GameEngine
(0xEFB) ObjectExport {
class_index: 0x0,
super_index: 0x1C8,
package_index: 0x0,
object_name: 0x1D8,
object_flags: 0x40F0004,
serial_size: 0x5B,
serial_offset: 0xC50DB,
}
So the GameEngine object has export index 0xEFB and its data is supposedly located at offset 0xC50DB relative to the package start. You guessed it though, its offset is wrong!
# Export Data
Up to this point we know:
- You cannot seek in the file reader.
- The offsets do not map cleanly to the on-disk representation and aren't really used other than for position tracking.
- The sizes (at least in the file table, and I soon realized in the export data) are incorrect.
- We know GameEngine is the first object requested by the C++ side of the game and is export index 0xEFB in the Engine package. It may not be the first object actually parsed, but it's the first object requested.
Now, to achieve my goal of dumping these files I attempted to simply sum the size of these exports... but trying a combination of that calculated size + any of the {end_of_export_table, start_of_file} offsets landed me in weird places with other Linker files in-between.
By referencing Unreal-Library to help fill in some of the blanks, I observed the following high-level parsing logic in the game engine:
- An exported object is requested by the game. If it isn't loaded already, the export is lazy loaded.
- Lazy loading requires resolving the super type's object. For some things this is the Class or Struct base types, for other things this is a different parent class which will eventually have Class as its parent type.
- Exports have properties which can be of varying size. As you read an export, you deserialize its data as described by its serial_size and serial_offset fields, and however the types exported from the C++ side define the deserialization routine.
Which visually results in something like the following flow when resolving imports/exports:
To give a concrete example, imagine that GameEngine has the following class hierarchy:
GameEngine -> Engine -> Subsystem -> Class
Also imagine that GameEngine is the very first object ever parsed: nothing else has been loaded yet. Requesting to load GameEngine from the Engine.u package will trigger the following sequence of events:
1. Engine.u header read/parse (since no package has been created yet)
2. Lookup Engine's GameEngine export. It's not yet parsed, so we need to construct this object by constructing/deserializing it.
3. GameEngine's parent class is Engine.Engine. It has not yet been parsed, so we need to deserialize it before GameEngine.
4. Core.Subsystem is Engine.Engine's parent class. Same thing.
5. Core.u header read/parse (since Core hasn't been loaded yet)
6. Core.Class is Core.Subsystem's parent class (and the base class). Construct this object.
7. Core.Class property deserialization. We can now continue with Core.Subsystem creation.
8. Core.Subsystem property deserialization...
9. Engine.Engine property deserialization...
10. Engine.GameEngine property deserialization...
11. We can now return the fully constructed Engine.GameEngine.
I believe this can result in export data that is interleaved, unfortunately. For the above scenario the data may be on disk like the following diagram. Note: for space/simplicity I've omitted Core.Class, as well as the potential for the properties themselves to trigger deserializing of other exports.
┌──────────────────────────────────────────────────────────────┐
│                          File Table                          │
├──────────────────────────────┬───────────────────────────────┤
│ Core.u Header                │ Engine.u Header               │
├────────────┬─────────────────┴─┬──────────────┬──────────────┤
│ GameEngine │ Engine            │ Core.        │ GameEngine   │
│ Object     │ (Super Class)     │ Subsystem    │ Object       │
│ Start      │ Export Data       │ Export Data  │ Properties   │
└────────────┴───────────────────┴──────────────┴──────────────┘
And now if you imagine that there's a second object which also extends from Engine loaded after GameEngine, then their common super class Engine has already been parsed and its information is already in-memory. i.e. if you serialize two objects of the same exact type, the first object might have all the data for its parent classes interleaved with its own export data and the second object only contains its own property data.
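The walkthrough above can be simulated in a few lines. This is a toy model only: it uses the imagined GameEngine -> Engine -> Subsystem -> Class hierarchy from this post, a made-up second class `Engine.OtherEngine`, and an invented `Loader` class that just records the order of events rather than reading any real data.

```python
# Imagined parent chain from the example above, plus one hypothetical sibling.
PARENTS = {
    "Engine.GameEngine": "Engine.Engine",
    "Engine.OtherEngine": "Engine.Engine",   # hypothetical second subclass
    "Engine.Engine": "Core.Subsystem",
    "Core.Subsystem": "Core.Class",
    "Core.Class": None,
}


class Loader:
    def __init__(self, parents):
        self.parents = parents
        self.packages = set()   # packages whose header has been read
        self.loaded = set()     # fully deserialized objects
        self.events = []        # order in which the stream would be consumed

    def load(self, name: str) -> None:
        if name in self.loaded:
            return              # already in memory: nothing is read again
        package = name.split(".")[0]
        if package not in self.packages:
            self.packages.add(package)
            self.events.append(f"read {package}.u header")
        self.events.append(f"begin {name}")
        parent = self.parents[name]
        if parent is not None:
            self.load(parent)   # the super class must exist first
        self.events.append(f"deserialize {name} properties")
        self.loaded.add(name)


loader = Loader(PARENTS)
loader.load("Engine.GameEngine")
first_load = list(loader.events)
loader.events.clear()
loader.load("Engine.OtherEngine")
second_load = list(loader.events)

print("\n".join(first_load))
print("---")
print("\n".join(second_load))
```

The first request pulls in both package headers and every ancestor, while the second request only produces the new object's own data, which is why the bytes belonging to a given export depend on what was loaded before it.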
Unfortunately, this means that to read these files statically (even for just static recompilation) you need to have full knowledge of how each C++-implemented type is parsed in order to parse all exports and their properties. Additionally, reading one export may trigger resolving of imports in your own Linker object, which in turn trigger deserialization of exports in another Linker object.
Note: I'm not 100% confident in the data being interleaved vs just sequential. Through observing seek/read operations for various exports, I do see seeks going to a wildly different offset in the middle of deserializing an export, then another export deserializing, then seeking back to the original export and continuing to deserialize it again. This is a PITA to debug though.
# Why??????
I imagine there's a very good reason for packaging data this way. It's best to consider the constraints of the time:
- The game is being shipped on a physical disc.
- The Xbox has 64MB of RAM shared between the CPU and GPU, with some portion of that being dedicated to the OS.
- The CPU wasn't terribly slow for the time, but wasting cycles would have been noticed.
The .lin format mitigates these issues with:
- Compressing data means you save space on the disc... If you conveniently ignore the fact that common.lin is duplicated in each map's directory and is the same for every map I tested, which kinda negates part of this.
- Streaming data in from the file instead of decompressing the whole thing at once saves on overall memory pressure during the data loading phase.
- Laying out the file in a byte-for-byte exact read order increases I/O speeds by not having to seek around the physical media, and ensures that you don't need magic to translate an uncompressed offset to a compressed one in a performant manner.
# Logging Load Order for Static Recompilation
I really, really wanted to avoid doing any runtime dumping that requires playing the game in an emulator or physical console. It doesn't scale well to other games and is generally less flexible. But runtime observations are extremely useful in making sense of the format, so I went ahead and added some logging to get an idea of the file read order from the compressed archive when booting the game:
..\System\Engine.u
..\System\Core.u
..\System\Echelon.u
..\Textures\HUD.utx
..\Sounds\FisherFoley.uax
..\Sounds\CommonMusic.uax
..\System\EchelonEffect.u
..\Textures\ETexSFX.utx
..\Textures\2-1_CIA_tex.utx
..\Textures\generic_shaders.utx
..\Textures\LightGenTex.utx
..\Textures\5_1_PresidentialPalace_tex.utx
..\Textures\1_2_Def_Ministry_tex.utx
..\Textures\EGO_Tex.utx
..\Textures\ETexIngredient.utx
..\Textures\1-1_TBilisi_tex.utx
..\Textures\1_3_CaspianOilRefinery_TEX.utx
..\StaticMeshes\EMeshSFX.usx
..\StaticMeshes\EGO_OBJ.usx
..\Textures\ETexCharacter.utx
..\Textures\4_3_Chinese_Embassy_tex.utx
..\Textures\4_3_0_Chinese_Embassy_tex.utx
..\Textures\4_3_2_Chinese_Embassy_tex.utx
..\Sounds\water.uax
..\Sounds\DestroyableObjet.uax
..\Sounds\FisherVoice.uax
..\Sounds\FisherEquipement.uax
..\Sounds\GunCommon.uax
..\Sounds\Interface.uax
..\Sounds\Electronic.uax
..\Sounds\Dog.uax
..\Sounds\Lambert.uax
..\StaticMeshes\EMeshIngredient.usx
..\StaticMeshes\EMeshCharacter.usx
..\Textures\2_2_1_Kalinatek_tex.utx
..\StaticMeshes\LightGenOBJ.usx
..\Textures\ETexRenderer.utx
..\Sounds\Door.uax
..\Sounds\GenericLife.uax
..\Sounds\Special.uax
..\Sounds\ThrowObject.uax
..\StaticMeshes\Generic_Mesh.usx
..\StaticMeshes\prog\generic_obj.usx
..\Textures\0_0_Training_tex.utx
..\Textures\3_4_Severo_tex.utx
..\System\EchelonIngredient.u
..\Sounds\Gun.uax
..\System\EchelonGameObject.u
..\Animations\ESkelIngredients.ukx
..\Sounds\Metal.uax
..\Animations\ETrk.ukx
..\StaticMeshes\2-1_cia_obj.usx
..\System\EchelonHUD.u
..\Animations\ESam.ukx
..\Maps\menu\menu.unr // <--- # 55
..\Textures\2_2_Kalinatek_tex.utx
..\StaticMeshes\2_2_Kalinatek_OBJ.usx
..\System\EchelonPattern.u
..\Sounds\S3_4_2Voice.uax
..\Sounds\S3_4_3Voice.uax
..\Sounds\S2_2_2Voice.uax
..\Sounds\S2_1_2Voice.uax
..\Sounds\S5_1_2Voice.uax
..\Sounds\S3_2_2Voice.uax
..\Sounds\S4_2_2Voice.uax
..\Sounds\S4_1_1Voice.uax
..\Sounds\S1_2_1Voice.uax
..\Sounds\S1_1_2Voice.uax
..\Sounds\S0_0_3Voice.uax
..\Sounds\S3_2_1Voice.uax
..\Sounds\S4_2_1Voice.uax
..\Sounds\S1_3_3Voice.uax
..\Sounds\S0_0_2Voice.uax
..\Sounds\S4_3_2Voice.uax
..\Sounds\S1_1_1Voice.uax
..\Sounds\S2_2_1Voice.uax
..\Sounds\S4_3_1Voice.uax
..\Sounds\S5_1_1Voice.uax
..\Sounds\S4_1_2Voice.uax
..\Sounds\S2_1_1Voice.uax
..\Sounds\S1_1_0Voice.uax
..\Sounds\S2_2_3Voice.uax
..\Sounds\S2_1_0Voice.uax
..\Sounds\S1_2_2Voice.uax
..\Sounds\Vehicules.uax
..\Sounds\S1_1_Voice.uax
..\Sounds\S2_1_Voice.uax
..\Sounds\S4_3_0Voice.uax
..\Sounds\S1_3_2Voice.uax
..\Sounds\Machine.uax
..\Sounds\FireSound.uax
..\Sounds\SoundEvent.uax
..\Sounds\S0_0_Voice.uax
..\Sounds\S4_3_Voice.uax
..\Sounds\S4_2_Voice.uax
..\Sounds\S5_1_Voice.uax
..\Sounds\XboxLive.uax
..\System\EchelonCharacter.u
..\Sounds\GearCommon.uax
..\Animations\ENPC.ukx
..\Sounds\Exspetsnaz.uax
..\Sounds\GeorgianSoldier.uax
..\Sounds\RussianMafioso.uax
..\Sounds\GeorgianCop.uax
..\Sounds\EliteForce.uax
..\Sounds\CiaSecurity.uax
..\Sounds\CiaAgentMale.uax
..\Sounds\ChineseSoldier.uax
..\Animations\EFemale.ukx
..\Animations\EDog.ukx
..\Sounds\GeorgianPalaceGuard.uax
File Dumping Script
I set a breakpoint in the prologue of a function with the string "LinkerExists" that I later determined to be the constructor for an object called ULinkerLoad. One of the arguments is the file name for this object.
When triggered, the breakpoint executes the following IDA Python script which reads the filename pointer, then the filename, outputs it to the IDA console, and continues execution:
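import ida_dbg, ida_idd, ida_kernwin

# EBX holds the pointer to the UTF-16 file name when the breakpoint hits.
p = ida_dbg.get_reg_val("ebx")
s = b""
while True:
    # Read one UTF-16 code unit at a time until the null terminator.
    c = ida_idd.dbg_read_memory(p, 2)
    if not c or c == b"\x00\x00":
        break
    s += c
    p += 2
ida_kernwin.msg("ULinkerLoad: " + s.decode("utf-16-le") + "\n")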
In the above file load order I annotated file #55 which is ..\Maps\menu\menu.unr. The common.lin file has 54 Linker files and #55 in the above listing happens to be the map which is loading and has its own dedicated .lin file. This is a strong indicator that the common.lin archive genuinely contains only 54 files and anything else is read from level-specific archives.
I also set a breakpoint in the function which deserializes exports (called Preload) and did some logging of which export is read and when a stream seek occurred:
ULinkerLoad: ..\System\Engine.u
ULinkerLoad: ..\System\Core.u
Export offset: 0x0,0x0,0x0,0x97,0x40f0004,0x4d,0x1b05
Seeking to/from: 0x1b05,0x10883
Export offset: 0xfffffffe,0x0,0x3,0x13d,0x70004,0x1c,0x6531
Seeking to/from: 0x6531,0x1b18
Read complete: 0xfffffffe,0x0,0x3,0x13d,0x70004,0x1c,0x6531
Seeking to/from: 0x1b18,0x654d
Export offset: 0xfffffffe,0x0,0x3,0x13c,0x70004,0x1c,0x6515
Seeking to/from: 0x6515,0x1b18
Read complete: 0xfffffffe,0x0,0x3,0x13c,0x70004,0x1c,0x6515
Seeking to/from: 0x1b18,0x6531
Export offset: 0xfffffffe,0x0,0x3,0x119d,0x70004,0x2c,0x6432
Seeking to/from: 0x6432,0x1b18
Seeking to/from: 0x6451,0x6452
Seeking to/from: 0x6453,0x6454
Seeking to/from: 0x6454,0x6455
Seeking to/from: 0x6455,0x6456
Export offset: 0xfffffffd,0x0,0x2d7,0x477,0x70004,0xb,0x1c35
Seeking to/from: 0x1c35,0x6457
Read complete: 0xfffffffd,0x0,0x2d7,0x477,0x70004,0xb,0x1c35
Seeking to/from: 0x6457,0x1c40
Export offset: 0xfffffffd,0x0,0x2d7,0x46d,0x70004,0xb,0x2736
Export Preload Script
IDA Python breakpoint script, set at the Preload entry (identifiable by the string "SerialSize") and again after the deserialization routine is called:
import ida_dbg, ida_idd, ida_kernwin
# ebp holds the address of the export entry being preloaded
export_addr = ida_dbg.get_reg_val("ebp")
# Each field is a 32-bit little-endian value at a fixed offset from export_addr
class_index = int.from_bytes(ida_idd.dbg_read_memory(export_addr, 4), "little")
super_index = int.from_bytes(ida_idd.dbg_read_memory(export_addr + 4, 4), "little")
package_index = int.from_bytes(ida_idd.dbg_read_memory(export_addr + 8, 4), "little")
object_name = int.from_bytes(ida_idd.dbg_read_memory(export_addr + 12, 4), "little")
object_flags = int.from_bytes(ida_idd.dbg_read_memory(export_addr + 16, 4), "little")
serial_size = int.from_bytes(ida_idd.dbg_read_memory(export_addr + 20, 4), "little")
serial_offset = int.from_bytes(ida_idd.dbg_read_memory(export_addr + 24, 4), "little")
properties = [class_index, super_index, package_index, object_name, object_flags, serial_size, serial_offset]
ida_kernwin.msg("Export data: " + ",".join(hex(n) for n in properties) + "\n")
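The seven fields can also be decoded from a single 28-byte read. The sketch below does this with `struct`, using the field order from the script above. It treats the three index fields as signed and applies the usual Unreal package-index convention (negative means an import, positive means an export, zero means none). I am assuming that convention holds for this build. `ExportEntry`, `parse_export`, and `describe_index` are hypothetical names.

```python
import struct
from collections import namedtuple

ExportEntry = namedtuple(
    "ExportEntry",
    "class_index super_index package_index object_name "
    "object_flags serial_size serial_offset",
)


def parse_export(raw):
    """Decode seven little-endian 32-bit fields: three signed, four unsigned."""
    return ExportEntry(*struct.unpack("<3i4I", raw[:28]))


def describe_index(index):
    """Assumed Unreal convention: <0 is an import, >0 is an export, 0 is none."""
    if index < 0:
        return f"import #{-index - 1}"
    if index > 0:
        return f"export #{index - 1}"
    return "none"


# Values taken from one of the logged lines above.
sample = struct.pack("<7I", 0xFFFFFFFE, 0x0, 0x3, 0x13D, 0x70004, 0x1C, 0x6531)
entry = parse_export(sample)
print(entry)
print("class:", describe_index(entry.class_index))
```

Read this way, the `0xfffffffe` and `0xfffffffd` class indexes in the log are simply -2 and -3.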
There is really no discernable pattern to the loads at all. The file/export load order seems to be just satisfying the dependency graph (exports required for parents/properties of yet-to-be-parsed types) for requested objects from the C++ side of the house.
I think an acceptable compromise to doing this statically would be requiring dumping the file/export load order from the game... but more work is needed to prove the viability of this approach.
I adjusted my program to read my logged lines into a queue of exports to be parsed, using the completed reads (lines starting with Read complete rather than Export offset). It then attempted to find the matching export in the export table across any package, and read its size. Repeat until the next Linker object is encountered, parse that, add it to the list, and repeat.
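For illustration, here is a minimal sketch of turning the logged lines into such a queue. It is not the actual program. `build_queue` and `FIELDS` are hypothetical names, and the sample log is a short excerpt of the output shown earlier.

```python
from collections import deque

FIELDS = ("class_index", "super_index", "package_index", "object_name",
          "object_flags", "serial_size", "serial_offset")


def build_queue(lines):
    """Keep linker loads and completed export reads, in logged order."""
    queue = deque()
    for line in lines:
        line = line.strip()
        if line.startswith("ULinkerLoad: "):
            queue.append(("linker", line[len("ULinkerLoad: "):]))
        elif line.startswith("Read complete: "):
            values = [int(v, 16) for v in line[len("Read complete: "):].split(",")]
            queue.append(("export", dict(zip(FIELDS, values))))
        # "Export offset" and "Seeking to/from" lines are ignored here
    return queue


SAMPLE_LOG = r"""ULinkerLoad: ..\System\Engine.u
ULinkerLoad: ..\System\Core.u
Export offset: 0x0,0x0,0x0,0x97,0x40f0004,0x4d,0x1b05
Seeking to/from: 0x1b05,0x10883
Export offset: 0xfffffffe,0x0,0x3,0x13d,0x70004,0x1c,0x6531
Seeking to/from: 0x6531,0x1b18
Read complete: 0xfffffffe,0x0,0x3,0x13d,0x70004,0x1c,0x6531
"""

queue = build_queue(SAMPLE_LOG.splitlines())
for kind, payload in queue:
    print(kind, payload)
```

Each `export` item then has to be matched against the export tables of the packages parsed so far.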
This quickly proved to be non-viable with my very barebones program. I would hit a point where I failed to find a matching export for the line logged, presumably because I was not reading the correct amount of data required to reach the next Unreal Package where that export was declared.
This was either a bug, or maybe some of the types attempt to seek+read without triggering a Preload(). At any rate, I had now invested a week or longer on the static approach with no data successfully dumped yet.
# Dumping at Runtime
At some point during the above research, I discovered the EnhancedSC project, a community patch for Splinter Cell 1 on PC which fixes bugs, adds gameplay improvements, and has folks who certainly know the game engine better than me. I joined their Discord and asked if anyone knew about this format and they said that it's been a dead end for anyone who's bothered.
They were quite interested, though, in any progress achieved, as they want to port some content from the Xbox versions of the games to PC. Through this community I got some great help with various theories and ideas, and was introduced to tooling like UE-Explorer.
After spending about a week on the static approach I didn't want to invest even more time in getting things dumped only to hit a hard wall: for example, discovering that the files were wildly different than expected, wouldn't work on PC, or wouldn't work with UE Explorer. I needed to dump something.
The game can obviously read the data fine. The thought came to me that perhaps I could just dump the data into some crappy format after it's read that makes piecing it back together easy.
While doing static analysis I came across a function that was very peculiar to me. I identified the ULinkerLoad function mentioned earlier by searching for the Unreal Package file magic (highlighted below), and found the following function:
As expected, the file magic is checked against what's read from disk. But there's another result for the magic in a different function that is setting some structure's field to the magic:
And what is the purpose of this code? As it turns out, user game saves are just Unreal Objects serialized in the same format, minus the compression and other oddities that go along with it!
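As a rough illustration of the check on the loading side, the sketch below tests whether a buffer starts with the package signature. It assumes this build uses the signature of stock Unreal Engine packages, `0x9E2A83C1`, stored little-endian. `PACKAGE_MAGIC` and `has_package_magic` are hypothetical names.

```python
import struct

# Signature of stock Unreal Engine packages; assumed to apply to this build.
PACKAGE_MAGIC = 0x9E2A83C1


def has_package_magic(data):
    """Return True if the first four bytes are the little-endian package magic."""
    if len(data) < 4:
        return False
    (magic,) = struct.unpack_from("<I", data, 0)
    return magic == PACKAGE_MAGIC


header = bytes.fromhex("c1832a9e") + b"\x00" * 12
print(has_package_magic(header))
print(has_package_magic(b"not a package"))
```

A check like this is a quick way to tell whether a dumped file or a save game begins with an uncompressed package header.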
# Patching OG Xbox Binaries
In order to do interesting things, we need to run our own code alongside the game. Debugger scripts are simply too slow and unreliable, so we need something running in the emulator or on a physical device. It'd also be cool if I could write a QEMU plugin for the emulator... but that's another rabbit hole.
Injecting code into a game on Windows or Unix is easy. You can CreateRemoteThread() or DLL hijack on Windows, and on Unix use LD_PRELOAD. On Xbox 360 you can "inject" persistent DLLs. On original Xbox, you have one process with (as far as I know) no DLLs.
This could probably be a blog post on its own since modern information is pretty scarce (RIP XboxHacker.org), but there are at least two tools I know of that can be used to manipulate original Xbox executables.
- The Python library pyxbe
- The CLI tool XboxImageExploder
Both of these tools allow you to add a new section to an executable and basically create a code cave that you can use for placing additional code or data. When the system loads the image, it maps that newly added section with the appropriate permissions. You then need to patch some place in the original executable so that your code runs.
Using XboxImageExploder and XePatcher I was able to write a patch which calls the serialization routine on an object after it gets loaded into memory.
tl;dr of the patch:
- Define a hook point at the end of the LoadMap() function. This definition will cause XePatcher to write these instructions that jump execution to Hack_LoadMap at the declared file offset.
- Hack_LoadMap calls Hack_DumpAllLinkers and does the standard epilogue cleanup for LoadMap() which won't be executed since we hijacked execution.
- Hack_DumpAllLinkers iterates a global list of Linker objects and calls Hack_DumpFile with that linker as an argument.
- Hack_DumpFile ensures that the output directory for the given Linker file is created, then calls the game-provided function which serializes the Linker to that path. For example, the ..\System\Engine.u linker file from the common.lin file will be written to z:\System\Engine.u.
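The path handling in the last step can be modelled in a few lines of Python. This is only a sketch of the intended mapping, not a translation of the assembly. It assumes every linker filename starts with `..\`, and the names `output_path` and `directories_to_create` are hypothetical.

```python
def output_path(linker_filename, drive="z:"):
    """Map '..\\System\\Engine.u' to 'z:\\System\\Engine.u'."""
    if not linker_filename:
        return None  # empty name: not a file from the packed .lin
    # Drop the leading '..' (and any empty or '.' segments)
    parts = [p for p in linker_filename.split("\\") if p not in ("", ".", "..")]
    return drive + "\\" + "\\".join(parts)


def directories_to_create(path):
    """List every parent directory of the file, outermost first."""
    pieces = path.split("\\")
    return ["\\".join(pieces[:i]) for i in range(2, len(pieces))]


for name in ("..\\System\\Engine.u", "..\\Maps\\menu\\menu.unr"):
    target = output_path(name)
    print(target, directories_to_create(target))
```

Creating the directories outermost first matters because each directory creation call only makes one level at a time.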
;---------------------------------------------------------
; At the very end of the LoadMap() routine
;---------------------------------------------------------
; file offset, not a VA
dd 73698h
dd (_load_map_return_end - _load_map_return_start)
_load_map_return_start:
; Jump to our detour function
push esi
mov eax, Hack_LoadMap
jmp eax
_load_map_return_end:
_Hack_LoadMapCalled:
dd 0
_Hack_LoadMap:
mov eax, Hack_DumpAllLinkers
call eax
mov eax, Hack_LoadMapCalled
mov dword [eax], 1
_load_map_restore_registers:
; return value that we clobbered in the
; hook
pop eax
; Since we patched over the epilogue, we will just
; do the register restore ourselves
pop edi
pop esi
pop ebx
mov esp, ebp
pop ebp
retn 8
_Hack_DumpAllLinkers:
push ebx
push esi
%define g_ObjectLinkers 0033c42ch
; Load the linker count
mov ebx, [g_ObjectLinkers + 4]
test ebx, ebx
jz _dump_all_linkers_restore_registers
; esi will be our index
mov esi, 0
_dump_all_linkers_linker_loop_start:
cmp esi, ebx
jz _dump_all_linkers_linker_loop_finish
; Iterate the linkers
mov eax, [g_ObjectLinkers]
mov ecx, esi
imul ecx, 4
add eax, ecx
mov eax, [eax]
push eax
mov ecx, Hack_DumpFile
call ecx
add esp, (4 * 1)
_dump_all_linkers_linker_loop_end:
inc esi
jmp _dump_all_linkers_linker_loop_start
_dump_all_linkers_linker_loop_finish:
_dump_all_linkers_restore_registers:
pop esi
pop ebx
ret
_Hack_DumpFile:
; Load the argument representing the
; object that's being saved
mov eax, [esp + 4]
; Save registers
push edi
push esi
push ebx
mov edi, eax
_dump_file_do_dump:
; Iterate the object's exports and save their flags
; ==== NOT USED
; Grab the export data pointer
;mov ecx, [edi + 0x88]
; Grab the number of exports
;mov ebx, [edi + 0x8C]
; ==== NOT USED
; Allocate space for the file path
sub esp, 0x200
; Grab the linker's filename
mov eax, [edi + 0x98]
; Put the input filename in esi
mov esi, eax
; If the input filename is empty, jump to the cleanup routine
; since this is not a file that's in the packed .lin
cmp word [eax], 0
jz _Hack_DumpFile_Done
;===== DIRECTORY CREATION
; The file path is located at the beginning of the stack
mov ebx, esp
; Set the filename on the stack to `z:`
; This has to be a char*, not a wchar_t*
mov byte [esp], 'z'
mov byte [esp + 1], ':'
; This will hold our position in the path we're building
mov ebx, 0
_Hack_DumpFile_File_Directory:
; We are looking for a backslash
; this is wchar_t `\`
push 0x005c
; Grab the position of the next backslash in the
; input file path
push esi
mov eax, appStrchr
call eax
add esp, (4 * 2)
; Not found
test eax, eax
jz _Hack_DumpFile_Directory_Finish
; We found a slash -- check if we've discarded the first
; bit of data before the slash (it's expected to start
; with "..\" )
test ebx, ebx
jnz _Hack_DumpFile_File_Directory_Create_Directory
; Update ebx to point to the first slash so we can use it
; for later copying.
mov ebx, eax
jmp _hack_dumpfile_directory_end
_Hack_DumpFile_File_Directory_Create_Directory:
; Skip the Z: part for the dest file path
lea ecx, [esp + 2]
push edx
push esi
; Start of the linker's file path
mov esi, ebx
; Copy from ebx to eax
_hack_dump_file_copy_directory_loop:
cmp esi, eax
je _hack_dump_file_copy_directory_loop_finish
mov dl, [esi]
mov [ecx], dl
inc ecx
; we're doing some janky wchar_t to char
; conversion tricks
add esi, 2
jmp _hack_dump_file_copy_directory_loop
_hack_dump_file_copy_directory_loop_finish:
; Add null terminator
mov byte [ecx], 0
pop esi
pop edx
mov ecx, esp
; Make sure we don't clobber eax
push eax
; Attributes
push 0x0
; Create this directory
push ecx
mov ecx, CreateDirectory
call ecx
; stdcall function, so the callee cleans up its arguments
pop eax
_hack_dumpfile_directory_end:
; Save the position
lea esi, [eax + 2]
jmp _Hack_DumpFile_File_Directory
_Hack_DumpFile_Directory_Finish:
; Set the file path we want to copy
mov esi, ebx
;===== FILE CREATION
; The file path is located at the beginning of the stack
mov ebx, esp
; Set the start of VeryLongString to `Z:`
push ZDrive
push ebx
mov eax, wstrcpy
call eax
add esp, (4 * 2)
; Set the copy target to the bytes immediately
; following `z:`, so the result should be
; `z:\filename`
lea eax, [ebx + 4]
; Copy the filename to the path buffer
push esi
; Set ESI to the full file path for later use