Reverse Engineering the Sierra Adventure Game Interpreter - Part 2

Continuing from my last post where I wrote a parser for the vocabulary portion of the Sierra AGI engine, I wrote a parser for the OBJECT file, which contains all the inventory items for the game.

The community-driven documentation gives this description of the format of this file:

The object file stores two bits of information about the inventory items used in an AGI game. The starting room location and the name of the inventory item. It also has a byte that determines the maximum number of animated objects.

The file encryption
The first obstacle to overcome is the fact that most object files are encrypted. I say most because some of the earlier AGI games were not, in which case you can skip to the next section. Those that are encrypted are done so with the string Avis Durgan (or, in case of AGDS games, Alex Simkin). The process of unencrypting the file is to simply take every eleven bytes from the file and XOR each element of those eleven bytes with the corresponding element in the string Avis Durgan. This sort of encryption is very easy to crack if you know what you are doing and is simply meant to act as a shield so as not to encourage cheating. In some games, however, the object names are clearly visible in the saved game files even when the object file is encrypted, so it’s not a very effective shield.

File format

Byte  Meaning
----- -----------------------------------------------------------
 0-1  Offset of the start of inventory item names
  2   Maximum number of animated objects
----- -----------------------------------------------------------

Following the first three bytes is a section containing a three byte entry for each inventory item, all of which conform to the following format:

Byte  Meaning
----- -----------------------------------------------------------
 0-1  Offset of inventory item name i
  2   Starting room number for inventory item i or 255 carried
----- -----------------------------------------------------------

where i is the entry number starting at 0. All offsets are taken from the start of entry for inventory item 0 (not the start of the file). Then comes the textual names themselves. This is simply a list of NULL terminated strings. The offsets mentioned in the above section point to the first character in the string and the last character is the one before the 0x00.
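To make the layout described above concrete, here is a minimal sketch that builds a tiny, unencrypted OBJECT buffer in memory and reads it back. The helper names (build_object_data, parse_object_data) and the two sample items are made up for illustration. Only the byte layout is taken from the description above, and the sketch assumes the names follow the entry table directly.

```ruby
# Hypothetical helpers illustrating the documented OBJECT layout.
# Nothing here is taken from the original interpreter.

OBJECT_HEADER_SIZE = 3 # 2-byte names offset + 1-byte max animated objects
OBJECT_ENTRY_SIZE  = 3 # 2-byte name offset + 1-byte starting room

# Builds an unencrypted OBJECT buffer from [name, room] pairs.
def build_object_data(items, max_animated_objects)
  names_offset = items.length * OBJECT_ENTRY_SIZE # relative to entry 0
  entries = String.new(encoding: Encoding::BINARY)
  names = String.new(encoding: Encoding::BINARY)

  items.each do |name, room|
    entries << [names_offset + names.bytesize, room].pack('vC')
    names << name.b << "\0".b
  end

  [names_offset, max_animated_objects].pack('vC') + entries + names
end

# Parses such a buffer back into a hash.
def parse_object_data(data)
  names_offset, max_animated_objects = data.unpack('vC')
  item_count = names_offset / OBJECT_ENTRY_SIZE

  items = Array.new(item_count) do |index|
    entry_start = OBJECT_HEADER_SIZE + index * OBJECT_ENTRY_SIZE
    name_offset, room = data.byteslice(entry_start, OBJECT_ENTRY_SIZE).unpack('vC')
    # Offsets are relative to entry 0, which sits right after the header.
    name_start = OBJECT_HEADER_SIZE + name_offset
    name = data.byteslice(name_start, data.bytesize - name_start).unpack1('Z*')
    { name: name, room: room }
  end

  { max_animated_objects: max_animated_objects, items: items }
end

data = build_object_data([['?', 0], ['dagger', 255]], 17)
parse_object_data(data)[:items].each do |item|
  puts "name: #{item[:name]}, room: #{item[:room]}"
end
```

The part to notice is the `OBJECT_HEADER_SIZE +` added to every name offset: that is the "start of entry for inventory item 0" rule from the description.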

The first obstacle to overcome is the encryption, or at least establishing whether or not encryption was used. A quick sanity check was to load the first 16 bytes and look at the value of the offset they provide. (Note that the OBJECT file uses little-endian byte order, as opposed to the big-endian order used in the WORDS.TOK file.) Since the offset value was much higher than the offset values in WORDS.TOK, I concluded that the file itself was ‘encrypted’.
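As a rough illustration of that sanity check, the sketch below unpacks the same two bytes in both byte orders and applies a plausibility test. The bytes 81 and 0 are the decrypted KQ1 header values quoted later in this post. The "encrypted" pair is derived from them by XOR with the first two key characters. The plausible_names_offset? helper, its multiple-of-three rule, and the file size of 512 are assumptions for the sake of the example, not something the documentation states.

```ruby
# Decrypted header bytes quoted later in the post, and the same two bytes
# as they would appear on disk after XOR with 'A' and 'v' from the key.
decrypted = [81, 0].pack('C*')
encrypted = [81 ^ 'A'.ord, 0 ^ 'v'.ord].pack('C*')

# Hypothetical helper: with 3-byte entries followed directly by the names,
# the names offset should be a positive multiple of 3 inside the file.
def plausible_names_offset?(offset, file_size)
  offset.positive? && (offset % 3).zero? && offset + 3 < file_size
end

file_size = 512 # hypothetical; use File.size(path) for a real file

puts decrypted.unpack1('v') # little-endian: 81
puts decrypted.unpack1('n') # big-endian: 20736
puts encrypted.unpack1('v') # 30224

puts plausible_names_offset?(decrypted.unpack1('v'), file_size) # true
puts plausible_names_offset?(encrypted.unpack1('v'), file_size) # false
```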

I realized that I was starting to duplicate a bunch of logic between the two parsers so I wrote a File object which could provide convenience methods for reading and unpacking bytes, as well as provide decryption.

require 'stringio'

module ExtractAgi
  class File
    class << self
      def open(file_path, symmetric_encryption_key: nil)
        if symmetric_encryption_key.nil?
          yield new(StringIO.new(::File.binread(file_path)))
        else
          cleartext = String.new
          ::File.open(file_path, 'rb') do |file|
            # Position within the key; wraps around so the key repeats.
            encryption_key_index = 0

            until file.eof?
              byte = file.read(1).unpack1(UNSIGNED_EIGHT_BIT)
              # XOR each file byte with the corresponding key character.
              mutated = byte ^ symmetric_encryption_key[encryption_key_index].ord
              encryption_key_index = (encryption_key_index + 1) % symmetric_encryption_key.length
              cleartext << [mutated].pack(UNSIGNED_EIGHT_BIT)
            end
          end

          yield new(StringIO.new(cleartext))
        end
      end
    end

    UNSIGNED_EIGHT_BIT = 'C'
    BIG_ENDIAN_UNSIGNED_SIXTEEN_BIT = 'n'
    LITTLE_ENDIAN_UNSIGNED_SIXTEEN_BIT = 'v'

    private_constant :UNSIGNED_EIGHT_BIT, :BIG_ENDIAN_UNSIGNED_SIXTEEN_BIT, :LITTLE_ENDIAN_UNSIGNED_SIXTEEN_BIT

    def initialize(io)
      @io = io
    end

    def seek(...)
      @io.seek(...)
    end

    def read_u8
      @io.read(1)&.unpack1(UNSIGNED_EIGHT_BIT)
    end

    def read_u16be
      @io.read(2)&.unpack1(BIG_ENDIAN_UNSIGNED_SIXTEEN_BIT)
    end

    def read_u16le
      @io.read(2)&.unpack1(LITTLE_ENDIAN_UNSIGNED_SIXTEEN_BIT)
    end
  end
end
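The decryption branch above is a repeating-key XOR, which is its own inverse: running the same routine over the output gives back the input. A minimal standalone sketch of that property follows. The xor_with_key helper and the sample bytes are illustrative only and are not part of the ExtractAgi code.

```ruby
# Hypothetical helper: XOR every byte of data with a repeating key.
def xor_with_key(data, key)
  key_bytes = key.bytes
  data.bytes.each_with_index.map { |byte, i| byte ^ key_bytes[i % key_bytes.length] }.pack('C*')
end

plain     = "dagger\0chest\0".b
scrambled = xor_with_key(plain, 'Avis Durgan')
restored  = xor_with_key(scrambled, 'Avis Durgan')

puts scrambled.getbyte(0) # 'd' (100) XOR 'A' (65) = 37
puts restored == plain    # true
```

This is why a single routine can serve for both directions, and why the same key string appears in every tool that reads these files.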

Now I could easily decrypt the file, seek to the provided offset, and print out all the object names.

module ObjectsParser
  SYMMETRIC_ENCRYPTION_KEY = 'Avis Durgan'

  class << self
    def parse_objects(file_path)
      ExtractAgi::File.open(file_path, symmetric_encryption_key: SYMMETRIC_ENCRYPTION_KEY) do |file|
        file.seek(file.read_u16le, IO::SEEK_SET)

        word = String.new
        loop do
          break unless (byte = file.read_u8)

          if byte.zero?
            puts word
            word = String.new
          else
            word << byte.chr
          end
        end
      end
    end
  end
end

I ran the script against the OBJECT file from KQ1, and this is the output it gave me:

B
?
dagger
chest
carrot
gold walnut
key
note
magic ring
fourleaf clover
ceramic bowl
full bowl
water bucket
full bucket
pebbles
leather slingshot
pouch of diamonds
pouch
sceptre
cheese
magic mirror
gold egg
shield
fiddle
walnut
mushroom
beans
water

So I’m seeing some expected text, but also some garbage at the start (with some extra invisible bytes in there). I suspected that the offset provided at the start of the file needed to be combined with another offset, as the documentation mentions.

The file appears to be structured as follows:

  • The first 2 bytes contain the offset (starting point) of all the names of the inventory objects
  • The next byte contains the maximum number of animated objects (I have no idea what that means)
  • The next section, from byte 4 through the starting point of the inventory object names, contains one 3-byte entry per inventory object: 2 bytes for the offset to the object’s name, followed by 1 byte for its starting room number
  • The last section is all the inventory object names, separated by a byte with the value 0

def parse_objects(file_path)
  ExtractAgi::File.open(file_path, symmetric_encryption_key: SYMMETRIC_ENCRYPTION_KEY) do |file|
    offset_to_names = file.read_u16le
    max_animated_objects = file.read_u8

    (3..offset_to_names).step(3).each do |offset|
      file.seek(offset, IO::SEEK_SET)

      name_offset = file.read_u16le
      starting_room = file.read_u8

      file.seek(name_offset + 3, IO::SEEK_SET)
      word = String.new
      loop do
        byte = file.read_u8

        # Stop at the NUL terminator, or at end of file if it is missing.
        break if byte.nil? || byte.zero?

        word << byte.chr
      end
      puts "object: #{offset / 3 - 1}, name: #{word}, starting_room: #{starting_room}"
    end
  end
end

This gave me the following output for KQ1:

object: 0, name: ?, starting_room: 0
object: 1, name: dagger, starting_room: 0
object: 2, name: chest, starting_room: 0
object: 3, name: carrot, starting_room: 0
object: 4, name: gold walnut, starting_room: 0
object: 5, name: key, starting_room: 0
object: 6, name: note, starting_room: 0
object: 7, name: magic ring, starting_room: 0
object: 8, name: fourleaf clover, starting_room: 0
object: 9, name: ceramic bowl, starting_room: 0
object: 10, name: full bowl, starting_room: 0
object: 11, name: water bucket, starting_room: 0
object: 12, name: full bucket, starting_room: 0
object: 13, name: pebbles, starting_room: 0
object: 14, name: leather slingshot, starting_room: 0
object: 15, name: pouch of diamonds, starting_room: 0
object: 16, name: pouch, starting_room: 0
object: 17, name: sceptre, starting_room: 0
object: 18, name: cheese, starting_room: 0
object: 19, name: magic mirror, starting_room: 0
object: 20, name: gold egg, starting_room: 0
object: 21, name: shield, starting_room: 0
object: 22, name: fiddle, starting_room: 0
object: 23, name: walnut, starting_room: 0
object: 24, name: mushroom, starting_room: 0
object: 25, name: beans, starting_room: 0
object: 26, name: water, starting_room: 0

This looks pretty close, but it still feels like the implementation is incomplete. I’m pretty sure there is no item named ? in the game - I suspect the bytes 4 through 6 are actually reserved or used for something else - or it could just be a bug in the actual data file for the game that has gone unnoticed or ignored all these years. If I just print out the value of the first 6 bytes this is what I get:

81
0
17
81
0
0

So given that we’re using little-endian byte order, to me this says

  1. The offset to the inventory object names is 81
  2. The maximum number of animated objects is 17
  3. The name for the first object is located at 81, and the starting room for that object is 0 (or unused in this specific game)
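The same reading can be reproduced with a single unpack call. The sketch below decodes the six bytes printed above. The last two lines apply the documentation's rule that offsets are counted from the start of entry 0 (byte 3 of the file), and assume the names follow the entry table directly. The variable names are illustrative.

```ruby
# The first six decrypted bytes of the KQ1 OBJECT file, as printed above.
header = [81, 0, 17, 81, 0, 0].pack('C*')

# v = 16-bit little-endian, C = unsigned 8-bit.
names_offset, max_animated, first_name_offset, first_room = header.unpack('vCvC')

puts names_offset      # 81
puts max_animated      # 17
puts first_name_offset # 81
puts first_room        # 0

# If offsets are relative to entry 0 at byte 3 of the file:
puts 3 + first_name_offset # absolute position of the first name: 84
puts names_offset / 3      # number of 3-byte entries: 27
```

The count of 27 matches the 27 objects (0 through 26) listed in the output earlier.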

If I print out all the bytes starting at 81 until I find a zero byte, I get:

66
1
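One possible explanation worth checking, offered as a guess rather than a conclusion: if offsets really are relative to byte 3, then absolute byte 81 is still inside the entry table, and 66, 1 would be a little-endian name offset rather than text. The sketch below tests that idea against the names printed earlier. It assumes the names are stored back to back, in table order, each followed by a single zero byte.

```ruby
# Object names in the order printed earlier in this post.
names = ['?', 'dagger', 'chest', 'carrot', 'gold walnut', 'key', 'note',
         'magic ring', 'fourleaf clover', 'ceramic bowl', 'full bowl',
         'water bucket', 'full bucket', 'pebbles', 'leather slingshot',
         'pouch of diamonds', 'pouch', 'sceptre', 'cheese', 'magic mirror',
         'gold egg', 'shield', 'fiddle', 'walnut', 'mushroom', 'beans', 'water']

# Expected relative offset of each name, starting at the names offset (81).
cursor = 81
expected_offsets = names.map do |name|
  offset = cursor
  cursor += name.bytesize + 1 # +1 for the zero terminator
  offset
end

entry_index = (81 - 3) / 3                       # table entry at absolute byte 81
as_offset   = [66, 1].pack('C*').unpack1('v')    # 66 + 1 * 256

puts entry_index                   # 26
puts names[entry_index]            # water
puts as_offset                     # 322
puts expected_offsets[entry_index] # 322
```

Under those assumptions the two bytes line up with the table entry for the last object, which would place the first name at absolute byte 84 rather than 81.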

So something is definitely off. I don’t seem to be the only one with this problem, since the other solutions I have found also have this issue. I’m going to leave it for now and move on to some of the other files, but hopefully I will get clarity at some point. As before, the code is available on GitHub.