SimAntics Decompiling

Discussion in 'Programming' started by Cytlan, Mar 8, 2016.

  1. Cytlan

    Cytlan New Member

    So I've been playing around with reverse engineering various formats used in The Sims, specifically the Arry formats found in the Housexx.iff files, in the hopes of being able to create a house renderer in JavaScript.
    I got a little sidetracked and ended up working on a decompiler for SimAntics.
    While it's far from finished (it's barely even usable, and the decompiled output cannot even be recompiled yet), I thought I'd share a little bit of what I've been working on so far and maybe get some feedback.

    As we all know, SimAntics is a visual scripting language based on a tree structure. This doesn't easily translate into text-based code without some serious decompiler-fu. I am however able to generate the following output for 4100.BHAV from the base game (which I believe is found in all Userxxxxx.iff files) with my decompiler in its current state:
    #    Binary                                Path:#/Branch  Code
    00 - 00 02 01 FD 14 00 05 00 00 05 12 07 | 0:0          : my.person.20 = 5
    01 - 00 02 02 FD 00 00 03 00 00 05 19 07 | 0:1          : local.0 = 3
    02 - 00 02 03 FD 01 00 03 00 00 05 19 07 | 0:2          : local.1 = 3
    03 - 00 02 04 FD 02 00 03 00 00 05 19 07 | 0:3          : local.2 = 3
    04 - 00 02 05 FD 03 00 01 00 00 05 19 07 | 0:4          : local.3 = 1
    05 - 00 02 06 FD 00 00 2E 00 00 05 08 07 | 0:5          : temp.0 = 46
    06 - 00 08 07 FD 00 00 09 00 0B 00 07 00 | 0:6          : param.0 = rand(11)
    07 - 00 02 08 09 00 00 00 00 00 02 09 07 | 0:7 / 1:0    : if(param.0 != 0) goto 1:0
    08 - 00 02 0B 06 03 00 01 00 00 02 19 07 | 0:8 / 0:6    : if(local.3 != 1) goto 0:6
    0B - 00 02 12 FD 03 00 00 00 00 05 19 07 | 0:9          : local.3 = 0
    12 - 00 02 13 FD 00 00 00 00 00 05 1E 09 | 0:10         : my.person[temp.0] = param.0
    13 - 00 02 14 FD 00 00 01 00 00 03 08 07 | 0:11         : temp.0 += 1
    14 - 00 02 FE 06 00 00 38 00 00 02 08 07 | 0:12 / true  : if(temp.0 == 56) goto true
                                                            : goto 0:6
    #    Binary                                Path:#/Branch  Code
    09 - 00 02 0C 0A 00 00 04 00 00 01 09 07 | 1:0 / 2:0    : if(param.0 < 4) goto 2:0
    0A - 00 02 0D 0E 00 00 07 00 00 01 09 07 | 1:1 / 3:0    : if(param.0 > 7) goto 3:0
    0D - 00 02 10 06 01 00 00 00 00 00 19 07 | 1:2 / 0:6    : if(local.1 < 0) goto 0:6
    10 - 00 02 12 FD 01 00 01 00 00 04 19 07 | 1:3          : local.1 -= 1
                                                            : goto 0:10
    #    Binary                                Path:#/Branch  Code
    0C - 00 02 0F 06 02 00 00 00 00 00 19 07 | 2:0 / 0:6    : if(local.2 < 0) goto 0:6
    0F - 00 02 12 FD 02 00 01 00 00 04 19 07 | 2:1          : local.2 -= 1
                                                            : goto 0:10
    #    Binary                                Path:#/Branch  Code
    0E - 00 02 11 06 00 00 00 00 00 00 19 07 | 3:0 / 0:6    : if(local.0 < 0) goto 0:6
    11 - 00 02 12 FD 00 00 01 00 00 04 19 07 | 3:1          : local.0 -= 1
                                                            : goto 0:10
    The code above might not be the easiest to follow, but I'm hoping that I'll be able to make my decompiler produce something like this:
    my.person.20 = 5
    local.0 = 3
    local.1 = 3
    local.2 = 3
    local.3 = 1
    temp.0 = 46
    while(temp.0 != 56)
        param.0 = rand(11)
        if(param.0 == 0)
            if(local.3 != 1) continue
            local.3 = 0
        else if(param.0 < 4)
            if(local.2 < 0) continue
            local.2 -= 1
        else if(param.0 > 7)
            if(local.0 < 0) continue
            local.0 -= 1
        else
            if(local.1 < 0) continue
            local.1 -= 1
        my.person[temp.0] = param.0
        temp.0 += 1
    return true
    It should be fairly easy to understand what this BHAV does from the code above:
    my.person[temp.0] writes to person data fields 46 through 55 (the loop exits once temp.0 reaches 56), which we can see from Behavior.iff are the interest fields.
    It randomizes the interests for a sim, with 3 interests being more than 7, 3 interests being between 7 and 4 (inclusive), 3 interests being lower than 4, and 1 interest being 0.
    We can also see that it writes to the two person data fields labeled "Unused & Do NOT Use" after the interest fields (which suggests that these two fields used to be additional interests that were removed from the base game).
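
    To make the Binary column of the raw listing easier to read, here is a minimal Python sketch that splits one 12-byte row into fields. The layout is inferred only from the rows shown in this post (two opcode bytes as printed, a true target, a false target, then eight operand bytes), and the owner and operator tables are assumptions that cover just the values appearing there. `decode_row`, `format_operand`, `describe_expression`, `OWNERS` and `OPERATORS` are hypothetical names, not part of the decompiler.

```python
# Sketch only: the field layout and both tables are inferred from the listing, not from a spec.

# Operand "owner" codes seen in the listing (assumed meanings).
OWNERS = {
    0x07: "literal",
    0x08: "temp",
    0x09: "param",
    0x12: "my.person",
    0x19: "local",
    0x1E: "my.person[temp]",  # person data indexed by a temp
}

# Expression operators seen in the listing (the comparison meanings are an assumption).
OPERATORS = {0: ">", 1: "<", 2: "==", 3: "+=", 4: "-=", 5: "="}


def decode_row(hex_row):
    """Split one 12-byte row from the Binary column into its fields."""
    raw = bytes.fromhex(hex_row)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    return {
        "opcode": int.from_bytes(raw[0:2], "big"),  # byte order as printed in the listing
        "true_target": raw[2],
        "false_target": raw[3],
        "operand": raw[4:12],
    }


def format_operand(owner, value):
    """Render an owner/value pair the way the decompiled column prints it."""
    kind = OWNERS[owner]
    if kind == "literal":
        return str(value)
    if kind == "my.person[temp]":
        return "my.person[temp.%d]" % value
    return "%s.%d" % (kind, value)


def describe_expression(hex_row):
    """Render an expression row (opcode 2 in the listing) as text."""
    row = decode_row(hex_row)
    if row["opcode"] != 2:
        raise ValueError("not an expression row")
    operand = row["operand"]
    lhs_value = int.from_bytes(operand[0:2], "little")
    rhs_value = int.from_bytes(operand[2:4], "little")
    operator = OPERATORS[operand[5]]
    lhs = format_operand(operand[6], lhs_value)
    rhs = format_operand(operand[7], rhs_value)
    return "%s %s %s" % (lhs, operator, rhs)


print(describe_expression("00 02 01 FD 14 00 05 00 00 05 12 07"))  # my.person.20 = 5
print(describe_expression("00 02 13 FD 00 00 00 00 00 05 1E 09"))  # my.person[temp.0] = param.0
```

    For branching rows this prints the test as stored in the operand; which target each outcome jumps to comes from the true and false target bytes.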

    I don't know of any other projects like this, so I thought it'd be of interest (pun intended).
    What do you guys think? Is this a worthwhile cause, anything you'd like to see, or should I just make an Edith clone in JS instead?
    Personally I prefer text-based scripting, instead of the visual scripting done with Edith, which is what encouraged me to carry on with this project.
  2. RHY3756547

    RHY3756547 FreeSO Developer Staff Member Moderator

    Really nice work! I've been thinking of writing a kind of sequence block analyser like this as the starting point for a SimAntics->C# JIT, though I haven't had the time to get around to it yet. I've also been thinking about the idea of a text based language based around SimAntics too, and this seems like a great starting point.

    I think the main benefit for a text based representation like this would not be decompiling existing code (some things are really branch heavy - wouldn't look ok unless you automatically regenerated a "switch" statement for them) but writing/compiling it. Right now you get a lot of annoying expression chains like this (gameball.iff::Compute Random Direction /w Body):


    That could be represented as:

    if ((Rand(100) + (PersonData.BodySkill * BODY_ACCURACY_FACTOR)/1000) < Param.PercentChanceE1) {
        if (Rand(2) == 0) Local.DirectionMult := 1
        else Local.DirectionMult := -1
    } else {
        Temp[0] = Local.BaseDirection
        return true;
    }
    Which allows much more freedom for straight computation, something that you notice Maxis objects avoid as much as possible.
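
    Going in the compile direction, a nested arithmetic expression has to be broken back down into a chain of one-operation steps. The following is a minimal Python sketch of that flattening, assuming the source expression uses Python-like syntax and that each step has the form `target op= operand`. The `temp.N` naming, the `flatten` function and the `Roll` variable (standing in for an already computed `Rand(100)`) are all hypothetical, and temps are never reused.

```python
import ast

# Compound-assignment spelling for each arithmetic operator handled here.
OPS = {ast.Add: "+=", ast.Sub: "-=", ast.Mult: "*=", ast.Div: "/="}


def flatten(expression):
    """Turn a nested arithmetic expression into single-operation steps.

    Returns (steps, result) where `result` names the operand holding the value.
    """
    steps = []
    temps_used = 0

    def visit(node):
        """Return (operand_text, is_temp) for the value of `node`."""
        nonlocal temps_used
        if isinstance(node, ast.Constant):
            return str(node.value), False
        if isinstance(node, ast.Name):
            return node.id, False
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            left, left_is_temp = visit(node.left)
            if not left_is_temp:
                # Copy the left operand into a fresh temp so it can be modified in place.
                target = "temp.%d" % temps_used
                temps_used += 1
                steps.append("%s = %s" % (target, left))
                left = target
            right, _ = visit(node.right)
            steps.append("%s %s %s" % (left, OPS[type(node.op)], right))
            return left, True
        raise ValueError("unsupported syntax: " + ast.dump(node))

    result, _ = visit(ast.parse(expression, mode="eval").body)
    return steps, result


steps, result = flatten("Roll + BodySkill * BODY_ACCURACY_FACTOR / 1000")
for step in steps:
    print(step)
print("result in", result)
```

    The final comparison against the chance parameter would still be a separate branching step; this sketch only covers the arithmetic part.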

    However this would be a bitch to decompile to. One idea would be to let people write in this text structure, then let it be saved out to the IFF alongside the compiled BHAVs as like, TBHV or something for future editing.

    One issue with a language like this is parameter names. You notice that Maxis objects really like using spaces in all of their variable names, so you would either have to surround them [Like This] or automatically change to a target format Like_This or LIKE_THIS for constants. You could always drop them entirely, but that makes everyone's lives more difficult.
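
    The automatic renaming could be as simple as the following Python sketch. The function name `to_identifier` and the example labels are hypothetical, and it makes no attempt to resolve two labels that collapse to the same identifier.

```python
import re


def to_identifier(label, constant=False):
    """Turn a label with spaces into Like_This, or LIKE_THIS for constants."""
    words = re.findall(r"[0-9A-Za-z]+", label)
    if not words:
        raise ValueError("label has no usable characters: %r" % label)
    name = "_".join(words)
    if name[0].isdigit():
        name = "_" + name  # most languages do not allow a leading digit
    return name.upper() if constant else name


print(to_identifier("Base Direction"))                       # Base_Direction
print(to_identifier("body accuracy factor", constant=True))  # BODY_ACCURACY_FACTOR
```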

    I'd be interested to see more on this. How far did you get with the Arry chunk format? I was thinking of reading them when we backport to TS1 to at least keep the structure of the starting houses, if not the state.
  3. Cytlan

    Cytlan New Member

    My main idea for decompiling isn't editing the Maxis objects, but rather for analysis, dynamic recompiling and repackaging. I have previously worked on a 6502 to JS recompiler, so that's what I had in mind when starting out. A compiler should be able to compile into both native code and SimAntics, so starting with the decompiler was just my way of getting familiar with the engine and the requirements of a text language.
    Those chains of expressions are exactly why I'm not a huge fan of the visual approach, and are probably one of the main reasons why Maxis objects aren't very complicated in terms of calculation.

    The labels... Yeah, we'd have to agree on new names for quite a lot of them. "IsAllowedSocialAndPuppeteering" isn't exactly an acceptable name, and I have no good alternatives to it (which is why I simply referenced it by ID in my output). I don't think keeping the Maxis labels as-is is useful.

    I haven't gotten terribly far on the Arry formats. If you think the SPR RLE-ish format is a pain, wait until you see some of the Arry's!
    I've reversed the Arry format used by the pools and floors quite a bit, but I'm still trying to work out how the more complicated pools and floor patterns are encoded. The format works kind of like a brush, with each entry in the Arry specifying a location or an offset, and is terminated by XOR'ing the last location where a pool/tile was put. I've also gotten rather far on the objects Arry, the OBJM chunk and the objt chunks, which are a really complicated mess of cross-referencing.
    I'll try to summarize my notes and see if I cannot post some useful information regarding those chunks soon.

    I don't know if you'd be able to port TSO lots to TS1, as TS1 lots are restricted to 63*63 tiles in the Arry format, but the other way around should be fine.

    I don't know a whole lot about TSO in general. I never played it, so my attention has been mainly focused on TS1.
  4. RHY3756547

    RHY3756547 FreeSO Developer Staff Member Moderator

    If you can work out a way to decompile expressions into the more complicated format, that would be mind-blowing. I can agree that most of the names are terrible, though for the ones with spaces in them I was mainly talking about Locals, Parameters (specified by TPRP) and BHAV names.

    Backport to TS1 meaning, fork the project into a "FreeS1" kind of deal. Our lots are also 63x63, but we count the 0th row and column, so it's really 64x64.
    I can imagine. Our custom house-save-state format used for network synchronisation is set up so that all objects have to be created before any of their Threads and Object ID cross references can be resolved. I'm not really interested in the object thread format, but the creators of the experimental "TSO-SE", @Fatbag and @Blayer98 might be interested.
  5. francot514

    francot514 Well-Known Member

    Good work on this. Are you going in depth into the Arry format? I was also interested in knowing if there is a way to convert TS1 houses to blueprint XML data. Do you think it's possible?
    Also, have you worked with other specific chunks like NGBH and NBRS (the neighborhood Sims data and statistics)?
  6. Cytlan

    Cytlan New Member

    Thanks! Yes, the plan is to be able to decode and encode all Arry formats, write up some documentation on it, and have encoding/decoding modules implemented in JS. For the floor and pools, I'll just decode it into a 63*63 2d array. Whether I'll be able to get that far before I lose interest is another story altogether, but that's the plan at least.

    I say 63*63, because if memory serves me right, either the 0*x and x*0 coordinates, or the 63*x and x*63 coordinates had a special meaning to them in the Arry_3 chunk. I'm quite forgetful, and I don't have my notes here right now.

    You'll have to excuse my ignorance though, but what blueprint XML format are you referring to? My guess is that it should be possible without too much difficulty to convert it to any kind of format you want once all the chunks have properly been decoded. If you could point me to a spec or reference implementation, I'll take it into consideration for the tool I'm writing.

    I've only focused on lot-specific chunks so far, so no neighbourhood or Sim chunks have been worked on yet.

    I think I probably should mention that the pipe-dream is to have a full reimplementation of TS1 in JavaScript in the browser. That will of course bring a lot of challenges with regards to file size, which is why it's necessary for me to create tools to automatically repackage all the assets into a more lightweight and compressed format (gzip everything, discard unnecessary data such as medium and far zoom sprites, re-encode images into PNGs or, ugh, JPGs, etc. etc.)
    Don't take this as any kind of announcement, though (Let's be real; I'll probably never get that far before the disinterest kicks in!), but it should give you a general idea of what I have been and will be working on for the next few months.
    Last edited: Mar 8, 2016
  7. LetsRaceBwoi

    LetsRaceBwoi Well-Known Member

    Rhys did something rather similar a few years ago, although it was only CAS, SAS and (I think?) city view from TSO.

    If I could find a link, I'd post it :p
  8. Cytlan

    Cytlan New Member

    I saw that. Pretty cool! But it did require a 1.5GB download. I'd like to rather stream only the data needed, as well as have the server only transfer repackaged data to the client. Of course, this setup would require all servers to be kept private, else it'd be blatant piracy.
    But that's just what I want to do, not what I'm going to do, so take it with a barrel of salt ;)
    What I'm going to do is decode the remaining TS1 chunks, as well as continue working on decompiling and recompiling SimAntics, and only the latter is somewhat useful to FreeSO.
  9. francot514

    francot514 Well-Known Member

    No problem, the blueprint xml is the format used for offline houses in TSO, you can check the document reference about it here:
  10. RHY3756547

    RHY3756547 FreeSO Developer Staff Member Moderator

    I'm not sure converting to blueprint would be too productive for the FSO engine - it's likely possible to just read them in directly using a custom lot activator similar to the blueprint one. So long as the format doesn't depend on a ton of things we're not emulating correctly, you might be able to read straight threads too.
  11. Cytlan

    Cytlan New Member

    Just thought I'd share a status update:
    Been focusing on Arry(3) since my last post, and I've made some progress. I'm still struggling to calculate some Y offsets properly, however.
    Each entry contains info on where to put the next object (Yes, next. Not the current object, but the next one in the list. Why? I have no idea.), and there's some logic deciding when to increment the Y position that I still haven't completely nailed yet.
    The way I'm approaching this problem is through 100% black-box reverse engineering, so it can be difficult to figure out why stuff happens sometimes.
    Hopefully I'll figure it out soon!

    I think it would be neat for FreeSO if people could import their old TS1 houses or something.
  12. RHY3756547

    RHY3756547 FreeSO Developer Staff Member Moderator

    That's what I'm planning support for, especially for making an open source TS1 in future. Wouldn't be much fun to have to recreate the entire TS1 neighbourhood in our engine.

    TSO itself probably won't have any house import, as you have to earn the money to build the house in the first place. Not sure if you know this, but it's helpful to remember that TS1/TSO positions objects in 16th tiles, which allows subtile movement for avatars and things like roaches.
  13. Fatbag

    Fatbag Member

    Hey Cytlan, I share your opinion that I'm not that comfortable or efficient using Edith's box-and-arrow layout, and I'd prefer using a text disassembler like we get with Edith's "File -> Export all behaviors" functionality.

    However, the main problem with Edith's text disassembly is that the nodes in each subroutine are sorted by their node ID, which is essentially random. So the result is that the code jumps all over the place. What would be nice is if we could rearrange all of the instructions to minimize jumping around.

    For example, the React function in behaviorsflamingo.txt is as follows:
    (React).000 Play Sound Event (flamingo_consider_vox) true:1
    (React).001 Animate Sim (id 2 from object, name: a2o-consider) true:7 false:1
    (React).002 Play Sound Event (flamingo_approve_vox) true:3
    (React).003 Animate Sim (id 3 from object, name: a2o-approve) true:true false:3
    (React).004 Play Sound Event (flamingo_shrug_vox) true:5
    (React).005 Animate Sim (id 5 from object, name: a2o-shrug) true:true false:5
    (React).006 Animate Sim (id 4 from object, name: a2o-disapprove) true:true false:6
    (React).007 my person data Playful Personality > 600 true:2 false:8
    (React).008 my person data Playful Personality > 300 true:4 false:9
    (React).009 Play Sound Event (flamingo_disapprove_vox) true:6
    This can be better ordered like such:
    (React).000 Play Sound Event (flamingo_consider_vox) true:1
    (React).001 Animate Sim (id 2 from object, name: a2o-consider) true:7 false:1
    (React).007 my person data Playful Personality > 600 true:2 false:8
    (React).002 Play Sound Event (flamingo_approve_vox) true:3
    (React).003 Animate Sim (id 3 from object, name: a2o-approve) true:true false:3
    (React).008 my person data Playful Personality > 300 true:4 false:9
    (React).004 Play Sound Event (flamingo_shrug_vox) true:5
    (React).005 Animate Sim (id 5 from object, name: a2o-shrug) true:true false:5
    (React).009 Play Sound Event (flamingo_disapprove_vox) true:6
    (React).006 Animate Sim (id 4 from object, name: a2o-disapprove) true:true false:6
    Here, we assume that the true branch is always more likely to be taken than the false branch, and so we place the true branch first and the false branch second.
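
    One simple way to produce that ordering is a depth-first walk from node 0 that always follows the true branch before the false branch. The sketch below is in Python; the `REACT` table is transcribed from the export above, while the `reorder` function and the use of `None` for an absent branch are illustrative choices. Nodes that cannot be reached from the entry are left out of the result.

```python
# React tree from the export above: node -> (true target, false target).
# None means the branch is absent; "true" is the return-true exit.
REACT = {
    0: (1, None),
    1: (7, 1),
    2: (3, None),
    3: ("true", 3),
    4: (5, None),
    5: ("true", 5),
    6: ("true", 6),
    7: (2, 8),
    8: (4, 9),
    9: (6, None),
}


def reorder(graph, entry=0):
    """List nodes in depth-first preorder, following true branches first."""
    order = []
    seen = set()
    stack = [entry]
    while stack:
        node = stack.pop()
        if node in seen or node not in graph:
            continue  # already listed, an exit, or an absent branch
        seen.add(node)
        order.append(node)
        true_target, false_target = graph[node]
        # Push false first so the true branch is popped (and listed) first.
        stack.append(false_target)
        stack.append(true_target)
    return order


print(reorder(REACT))  # [0, 1, 7, 2, 3, 8, 4, 5, 9, 6]
```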

    We want to rearrange all of the instructions so as to minimize the "jumping around". In formal terms, the set of instructions forms a directed cyclic graph, and we want to determine the minimum feedback arc set over that graph. (Well, jumping down is a bad thing too, actually...)
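
    To compare two orderings, it helps to count the jumps directly. The Python sketch below counts backward jumps (the arcs a feedback arc set would have to cover for that ordering) and jumps that do not simply fall through to the next line. It is only a scoring function, not a solver; finding the true minimum is NP-hard in general, although BHAVs are small. `jump_stats` is a hypothetical name, and self-loops such as "false:1" on node 1 are ignored because no ordering can remove them.

```python
# React tree from the export above: node -> (true target, false target).
# None means the branch is absent; "true" is the return-true exit.
REACT = {
    0: (1, None),
    1: (7, 1),
    2: (3, None),
    3: ("true", 3),
    4: (5, None),
    5: ("true", 5),
    6: ("true", 6),
    7: (2, 8),
    8: (4, 9),
    9: (6, None),
}


def jump_stats(order, graph):
    """Return (backward_jumps, non_adjacent_jumps) for a given line order."""
    position = {node: index for index, node in enumerate(order)}
    backward = 0
    non_adjacent = 0
    for node in order:
        for target in graph[node]:
            if target not in position or target == node:
                continue  # exits, absent branches and self-loops
            if position[target] < position[node]:
                backward += 1
            if position[target] != position[node] + 1:
                non_adjacent += 1
    return backward, non_adjacent


by_id = list(range(10))
rearranged = [0, 1, 7, 2, 3, 8, 4, 5, 9, 6]
print(jump_stats(by_id, REACT))       # (3, 4)
print(jump_stats(rearranged, REACT))  # (0, 2)
```

    On this tree the ID order has 3 backward jumps and 4 non-adjacent jumps in total, while the rearranged order has no backward jumps and 2 non-adjacent ones.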

    You might be interested in adding this as a preprocessing step (or postprocessing) for your decompiler.
    Last edited: Mar 14, 2016
  14. francot514

    francot514 Well-Known Member

    Great progress, keep up the good work. I'd also be interested in helping you test some stuff, if you need some help with that.
  15. RHY3756547

    RHY3756547 FreeSO Developer Staff Member Moderator

    I'm not entirely convinced simply re-ordering the primitives will do any good - there are simply too many branches in complex SimAntics trees to reasonably follow the numbers. Maybe splitting them into "blocks" like he's already doing would be a good representation, with obviously the full decompilation being the most effective representation for improving understanding (imo, just ahead of visual style - it is very easy when everything is named, laid out and explained correctly)

    Here's an extreme example of a tree that abuses jumps entirely, and may be hard to follow without the visual display or a full disassembly:

    Because you need to return to Animate Sim primitives after handling events, the tree makes a lot of cyclic and arbitrary jumps. You'll notice that it jumps back to a switch which checks the object type again - eventually getting to the right primitive to continue the animation that was in progress. The problem is that this code has had 3 separate handlers shoehorned into it which have to be checked depending on the object type, and that lots of primitives want to go to the same standard exit procedure on the right there. This has been manually rearranged by me too - the default "tree" style arrangement did not capture the execution well at all:

    Nothing I love more than a bunch of overlapping arrows. I really need to make arrows force draw under primitives (and partially visible through them) so we don't get ugly overlapping like this..

    To clean this up a bit, I was thinking it might be useful to generate "labels", like the original Edith supported, for areas which have many incoming pointers, and simply "goto" them in a disconnected fashion to avoid cluttering the node graph with overlapping arrows. Specifically areas like that first "Test Object Type" with 7 incoming arrows, and the standard exit which is called from all over the place. This tree in particular could definitely be reduced to a while loop with a ton of break;s for a text based representation... I'd like to see the results of that, as with the correct handling trees like this could translate very well to a normal control flow language.

    This wasn't even really cherry picked - this was a function I just opened today when I was testing out the Play Sound Event operand editor. There are definitely trees much more complicated.
  16. Cytlan

    Cytlan New Member

    This is exactly right. In my code, I completely disregard the order of the instructions in the binary when considering the program flow (and as a result, IDs are only used as pointers), because it doesn't represent the program flow in any meaningful way.

    There's no guarantee the true branch is any more likely to be followed than the false branch, but yes, the first thing the decompiler does is consider all branches and select the path with the most instructions as the main execution path. Granted, this can get very slow very fast with larger snippets, but given that one BHAV cannot be larger than 253 instructions, I'm not particularly worried. Especially since with this approach we can get a fairly good idea of where the code starts and where it loops.
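
    As an illustration of why this gets slow, here is a brute-force Python sketch of "pick the path with the most instructions": it tries every path from the entry that never revisits a node and keeps the longest. This is a guess at the general idea, not the decompiler's actual code; the `longest_path` name and the small `EXAMPLE` tree are made up. The number of paths can grow exponentially with the number of branches.

```python
# Hypothetical tree: node -> (true target, false target).
# None means the branch is absent; "true" is the return-true exit.
EXAMPLE = {
    0: (1, None),
    1: (2, 5),
    2: (3, None),
    3: (4, None),
    4: ("true", 1),  # false branch loops back to node 1
    5: ("true", None),
}


def longest_path(graph, entry=0):
    """Exhaustively search for the longest path from `entry` that visits no node twice."""
    best = []

    def walk(node, path, on_path):
        nonlocal best
        path.append(node)
        on_path.add(node)
        if len(path) > len(best):
            best = list(path)
        for target in graph[node]:
            # Skip exits, absent branches, and nodes already on the current path.
            if target in graph and target not in on_path:
                walk(target, path, on_path)
        path.pop()
        on_path.remove(node)

    walk(entry, [], set())
    return best


print(longest_path(EXAMPLE))  # [0, 1, 2, 3, 4]
```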

    Thanks! There's no need for any testing yet, though. What I need is help cracking the X/Y data of each Arry entry. The LSB of the data field denotes whether the position is an absolute or relative position, and there's something funky going on when the relative Y position is 0.

    Mind throwing a few complicated BHAVs my way? It would be useful to test my assumptions against something that's likely to break them.
    Last edited: Mar 15, 2016
  17. Cytlan

    Cytlan New Member

    Not quite. I'll make a new topic about Arry(3) decoding tomorrow, as we're getting too much off topic from SimAntics decompiling.

    On topic, I've rewritten the decompiler to use a graph approach instead, and it can reliably detect loops now. I'm surprised it isn't as slow as I imagined it to be.
    I'll go into details once I get some cool output.
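
    For readers wondering what graph-based loop detection can look like, one standard technique is a depth-first search that records "back edges", meaning jumps to a node that is still on the current search path. The Python sketch below applies it to the true/false targets of 4100.BHAV as printed in the first post. It is not the decompiler's actual code, and `find_back_edges` is a hypothetical name. Targets that are not nodes themselves (0xFD and 0xFE in the listing) are treated as exits.

```python
# 4100.BHAV from the first post: node -> (true target, false target).
BHAV_4100 = {
    0x00: (0x01, 0xFD), 0x01: (0x02, 0xFD), 0x02: (0x03, 0xFD),
    0x03: (0x04, 0xFD), 0x04: (0x05, 0xFD), 0x05: (0x06, 0xFD),
    0x06: (0x07, 0xFD), 0x07: (0x08, 0x09), 0x08: (0x0B, 0x06),
    0x0B: (0x12, 0xFD), 0x12: (0x13, 0xFD), 0x13: (0x14, 0xFD),
    0x14: (0xFE, 0x06), 0x09: (0x0C, 0x0A), 0x0A: (0x0D, 0x0E),
    0x0D: (0x10, 0x06), 0x10: (0x12, 0xFD), 0x0C: (0x0F, 0x06),
    0x0F: (0x12, 0xFD), 0x0E: (0x11, 0x06), 0x11: (0x12, 0xFD),
}

UNVISITED, IN_PROGRESS, DONE = 0, 1, 2


def find_back_edges(graph, entry=0):
    """Return (source, target) pairs where target is still on the DFS path."""
    state = {node: UNVISITED for node in graph}
    back_edges = []

    def visit(node):
        state[node] = IN_PROGRESS
        for target in graph[node]:
            if target not in graph:
                continue  # exit target, not a node
            if state[target] == IN_PROGRESS:
                back_edges.append((node, target))
            elif state[target] == UNVISITED:
                visit(target)
        state[node] = DONE

    visit(entry)
    return back_edges


edges = find_back_edges(BHAV_4100)
print(sorted({"%02X" % target for _, target in edges}))  # ['06']
print(sorted("%02X" % source for source, _ in edges))    # ['08', '0C', '0D', '0E', '14']
```

    Every back edge lands on node 06 (`param.0 = rand(11)`), which is the single loop header that the hand-written `while` version in the first post starts its body with.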

    One thing I've noticed: there seem to be some nodes in some BHAVs that have no other nodes connecting to them. Quite interesting.
  18. francot514

    francot514 Well-Known Member

    OK, sorry. About SimAntics decompiling: have you worked with the primitives "Find Best Action" and "Goto sub found action"? I'm interested in knowing how those work.
  19. RHY3756547

    RHY3756547 FreeSO Developer Staff Member Moderator

    You tend to find those when code has been changed mid-development, it's quite common in TSO as a few things were changed to play a bit better online. Their method of stripping BHAVs only strips the position data and comments (POSI), so disconnected BHAVs remain present.
  20. Cytlan

    Cytlan New Member

    I see. I only noticed when I started looking for nodes that weren't exercised when iterating the graph.
    As someone who's very interested in the development of things, I find this fascinating.
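
    A check like that can be sketched in a few lines of Python: walk the graph from the entry node and report whatever was never reached. The `unreachable_nodes` name and the tiny `EXAMPLE` tree are made up for illustration, and entry node 0 is an assumption.

```python
from collections import deque

# Hypothetical tree: node -> (true target, false target).
# None means the branch is absent; "true" is the return-true exit.
EXAMPLE = {
    0: (1, None),
    1: ("true", 2),
    2: ("true", None),
    3: (2, None),  # nothing points at node 3
}


def unreachable_nodes(graph, entry=0):
    """Return the nodes that a breadth-first walk from `entry` never visits."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for target in graph[node]:
            if target in graph and target not in seen:
                seen.add(target)
                queue.append(target)
    return sorted(set(graph) - seen)


print(unreachable_nodes(EXAMPLE))  # [3]
```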
