Loads
IFF (Interchange File Format) was already an archaic file format by the time The Sims was released; EA designed it back in 1985.
Maxis used it for The Sims to store objects and related data. Even at that point, their use of it was sketchy at best. The Sims was originally developed on Macs, so the final version of the format (if there ever was a final one...) had a multitude of cases where it couldn't decide whether fields were supposed to be big endian or little endian.
Now, by the time that The Sims Online came along, the format had developed into a monstrosity of epic proportions, as is evident from the loading code. After... five or six expansion packs, where objects clearly needed to be developed in a hurry, their already sketchy use of the format... well, let's take a look, shall we?
- Most IFFs should contain an RSMP chunk (ReSourceMaP) indicating where the other chunks are located in the file. But not all IFFs do. I don't know why the hell they figured that scanning all headers to find the chunk you're looking for would be faster... go figure!
- Looking up a chunk doesn't get much faster when you take into account that all chunks are uniquely identified by a name string (!), and that a file may contain multiple chunks of the same type. These name strings were apparently a source of endless debate in developer meetings, so eventually it was decided to just assume that all name strings were 64 characters long, and to zero-terminate them if they happened to be shorter.
Interestingly, this invention was added in a version after the initial one, which just referenced chunks by their ChunkID - which would have been fine.
The Sims Online obviously uses both versions...
- In theory, data inside chunks should be little endian, and the container itself should be big endian. But there are at least a couple of cases where a chunk's data can be either, so the byte order must be determined before reading.
- Sometimes, an RSMP chunk can be empty (ooops!), meaning that if you write your reader to read the RSMP - hoping it will be faster - you'll have to back up to where you originally started and fall back to scanning all headers.
- Sometimes, a chunk's typecode can be empty, and... well, I'm not sure my code handles this case yet, because I'm not sure there's any logical way to determine the type of the chunk, except for guesstimating by trying to read all chunks and seeing if the data makes sense.
- Several of the chunks have at least two versions of their own, where fields may (or may not - remember, inconsistency!) exist depending on the version. My favorite (though atypical) example of this is the STR# chunk, listed below. Just the fact that they decided they needed more than one way to represent an object's description string is mind-boggling, let alone coming up with five of them!
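To make the RSMP and header-scanning complaints concrete, here is a minimal Python sketch of the fallback linear scan you end up doing when the RSMP is missing or empty. The header layout is an assumption not spelled out above: a 4-byte type code, a 4-byte big-endian size that includes the header itself, a 2-byte big-endian chunk ID, 2 bytes of flags and the 64-byte zero-terminated name string, with the first chunk starting after a 64-byte file header. `scan_chunks` and `find_chunk` are hypothetical names.

```python
import struct
from typing import Iterator, NamedTuple, Optional

# Assumed chunk header layout (big endian, like the container): type, size, id, flags, name.
HEADER_FMT = ">4sIHH64s"
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 76 bytes
FILE_HEADER_SIZE = 64  # assumption: file signature plus RSMP offset precede the first chunk

class ChunkHeader(NamedTuple):
    type_code: bytes
    chunk_id: int
    flags: int
    label: str
    data_offset: int
    data_size: int

def scan_chunks(buf: bytes, start: int = FILE_HEADER_SIZE) -> Iterator[ChunkHeader]:
    """Walk every chunk header in file order (the no-RSMP fallback)."""
    pos = start
    while pos + HEADER_SIZE <= len(buf):
        type_code, size, chunk_id, flags, raw_label = struct.unpack_from(HEADER_FMT, buf, pos)
        if size < HEADER_SIZE or pos + size > len(buf):
            break  # corrupt or truncated chunk: stop rather than loop forever
        # The 64-byte name field is zero-terminated when the name is shorter.
        label = raw_label.split(b"\0", 1)[0].decode("latin-1")
        yield ChunkHeader(type_code, chunk_id, flags, label,
                          pos + HEADER_SIZE, size - HEADER_SIZE)
        pos += size  # size includes the header, so this lands on the next chunk

def find_chunk(buf: bytes, type_code: bytes, chunk_id: int) -> Optional[ChunkHeader]:
    # Linear search: every lookup touches every header before the match.
    for header in scan_chunks(buf):
        if header.type_code == type_code and header.chunk_id == chunk_id:
            return header
    return None
```

Because the scan only trusts each header's own size field, it works the same whether or not an RSMP chunk is present, which is exactly why you need it as a fallback.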
Code:
STR#
This chunk type holds text strings.
The first two bytes correspond to the format code, of which there are five types. Some chunks in the game do not specify any data after the format code, so be sure to implement bounds checking.
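A minimal sketch of reading the format code with the bounds checking recommended above. It assumes the code and the 2-byte count are little endian (so FE FF reads as the signed value −2), and it treats a missing count as zero; `read_str_header` is a hypothetical name.

```python
import struct
from typing import Tuple

def read_str_header(data: bytes) -> Tuple[int, int]:
    """Return (format code, entry count) for a STR# chunk body."""
    if len(data) < 2:
        raise ValueError("STR# chunk too short to hold a format code")
    (fmt,) = struct.unpack_from("<h", data, 0)  # little-endian signed 16-bit
    if fmt == -4:
        # FC FF uses a 1-byte language-set count instead of a 2-byte string count.
        count = data[2] if len(data) >= 3 else 0
    else:
        # Some chunks stop right after the format code: treat that as zero entries.
        count = struct.unpack_from("<H", data, 2)[0] if len(data) >= 4 else 0
    return fmt, count
```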
Format: 00 00 (0)
Number of strings - A 2-byte unsigned integer specifying the number of strings contained in this chunk
Strings - As many Pascal strings as defined by the previous field
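A sketch of parsing format 00 00, assuming a little-endian count and decoding each string as Latin-1 (the text encoding is not stated, so that part is an assumption). `parse_str_format0` is a hypothetical name.

```python
import struct
from typing import List

def parse_str_format0(data: bytes) -> List[str]:
    """Parse a format 00 00 STR# body: a 2-byte count, then Pascal strings."""
    if len(data) < 4:
        return []  # nothing after the format code
    (count,) = struct.unpack_from("<H", data, 2)
    pos, strings = 4, []
    for _ in range(count):
        if pos >= len(data):
            raise ValueError("STR# chunk ends before all strings were read")
        length = data[pos]  # single length byte, hence the 255-character ceiling
        pos += 1
        strings.append(data[pos:pos + length].decode("latin-1"))
        pos += length
    return strings
```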
Format: FF FF (−1)
This format changed 00 00 to use C strings rather than Pascal strings.
Number of strings - A 2-byte unsigned integer specifying the number of strings contained in this chunk
Strings - As many null-terminated strings as defined by the previous field
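A sketch of format FF FF under the same assumptions (little-endian count, Latin-1 text). Note that finding each terminator means scanning the string byte by byte before it can be copied, which is the slowness the FC FF format later tries to avoid. Both function names are hypothetical.

```python
import struct
from typing import List, Tuple

def read_cstring(data: bytes, pos: int) -> Tuple[str, int]:
    """Return the null-terminated string at pos and the offset just past its terminator."""
    end = data.index(b"\0", pos)  # raises ValueError if the terminator is missing
    return data[pos:end].decode("latin-1"), end + 1

def parse_str_format_ff(data: bytes) -> List[str]:
    """Parse a format FF FF STR# body: a 2-byte count, then null-terminated strings."""
    if len(data) < 4:
        return []
    (count,) = struct.unpack_from("<H", data, 2)
    pos, strings = 4, []
    for _ in range(count):
        text, pos = read_cstring(data, pos)
        strings.append(text)
    return strings
```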
Format: FE FF (−2)
This format changed FF FF to use string pairs rather than single strings.
Number of string pairs - A 2-byte unsigned integer specifying the number of string pairs contained in this chunk
String pairs - As many pairs of two null-terminated strings as defined by the previous field. The second string directly follows the first. Usually, the first string of the pair is the data and the second string is a comment. In most cases, the comment string is empty, so it looks like the main string is terminated with two null characters.
Format: FD FF (−3)
This format changed FE FF to add a language code.
Number of string pairs - A 2-byte unsigned integer specifying the number of string pairs contained in this chunk
String pairs - As many pairs of two null-terminated strings as defined by the previous field, each pair preceded by a 1-byte language code. The second string directly follows the first. Usually, the first string of the pair is the data and the second string is a comment. In most cases, the comment string is empty, so it looks like the main string is terminated with two null characters.
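Since FE FF and FD FF differ only in the 1-byte language code before each pair, one parser can handle both. This is a sketch assuming little-endian header fields and Latin-1 text; `parse_str_pairs` and `read_cstring` are hypothetical names. An empty comment shows up as a second null right after the value, which is the "double null" look described above.

```python
import struct
from typing import List, Optional, Tuple

def read_cstring(data: bytes, pos: int) -> Tuple[str, int]:
    """Return the null-terminated string at pos and the offset just past its terminator."""
    end = data.index(b"\0", pos)
    return data[pos:end].decode("latin-1"), end + 1

def parse_str_pairs(data: bytes) -> List[Tuple[Optional[int], str, str]]:
    """Parse FE FF or FD FF bodies into (language code or None, value, comment)."""
    if len(data) < 4:
        return []
    fmt, count = struct.unpack_from("<hH", data, 0)
    if fmt not in (-2, -3):
        raise ValueError(f"not a string-pair format: {fmt}")
    pos, pairs = 4, []
    for _ in range(count):
        lang = None
        if fmt == -3:  # FD FF: a 1-byte language code precedes each pair
            lang = data[pos]
            pos += 1
        value, pos = read_cstring(data, pos)
        comment, pos = read_cstring(data, pos)  # usually empty
        pairs.append((lang, value, comment))
    return pairs
```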
Format: FC FF (−4)
This format is only used in The Sims Online. It is essentially a performance improvement: it lifts the 255-character limit of the short strings in 00 00, it avoids the inherent slowness of the null-terminated strings in the other formats (which require two passes over each string), and it provides a string pair count for each language set, which eliminates the need for two passes over each language set.
Number of language sets - A 1-byte unsigned integer specifying the number of sets contained in this chunk, one per supported language, in the order of the Language codes table; this number is invariably twenty. Even chunks whose strings aren't translated specify 20 language sets; the untranslated sets are simply empty.
Language sets - As many language sets as defined by the previous field. For each language set:
Number of string pairs - A 2-byte unsigned integer specifying the number of string pairs contained in this language set
String pairs - As many pairs of two Pascal-like strings as defined by the previous field, each pair preceded by a 1-byte language set index (equivalent to the language code minus 1). The second string directly follows the first. Usually, the first string of the pair is the data and the second string is a comment. Each string is prefixed with either 1 or 2 length bytes: if the first byte's most significant bit (msb) is not set, its value is the length and there is no second byte; if it is set, a second byte follows, and the length is formed from the first byte's lower 7 bits and the second byte in little-endian order, i.e. (first & 0x7F) | (second << 7). Thus, a string with 1 length byte may be 0 to 127 characters long, while a string with 2 length bytes may be 0 to 32,767 characters long.
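A sketch of format FC FF, including the 1-or-2-byte length prefix. The bit arithmetic is inferred from the 0 to 32,767 range given above (7 bits from the first byte, 8 from the second); the little-endian pair count and Latin-1 decoding are assumptions, and the function names are hypothetical. Truncated data raises `IndexError` or `struct.error` rather than being silently accepted.

```python
import struct
from typing import List, Tuple

def read_varlen_pascal(data: bytes, pos: int) -> Tuple[str, int]:
    """Read a string with a 1- or 2-byte length prefix; return it and the next offset."""
    first = data[pos]
    pos += 1
    length = first & 0x7F
    if first & 0x80:
        # msb set: a second byte follows and supplies the high bits (little-endian order)
        length |= data[pos] << 7
        pos += 1
    return data[pos:pos + length].decode("latin-1"), pos + length

def parse_str_format_fc(data: bytes) -> List[List[Tuple[int, str, str]]]:
    """Return one list of (language set index, value, comment) per language set."""
    if len(data) < 3:
        return []
    set_count = data[2]  # 1-byte count, described as always 20
    pos, sets = 3, []
    for _ in range(set_count):
        (pair_count,) = struct.unpack_from("<H", data, pos)
        pos += 2
        pairs = []
        for _ in range(pair_count):
            lang_index = data[pos]
            pos += 1
            value, pos = read_varlen_pascal(data, pos)
            comment, pos = read_varlen_pascal(data, pos)
            pairs.append((lang_index, value, comment))
        sets.append(pairs)
    return sets
```

Because every length and count is known up front, a reader can skip whole language sets or strings without scanning for terminators, which is the whole point of this format.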