Currently the graphics rendering in many places is way more intensive than it needs to be. Here are some changes that I'm planning:
General (replacement static rendering)
A few things in the game currently implement "static" rendering, but right now that is confined to what is currently in screen space! This means that when the camera moves, all the static buffers have to be redrawn anyway, making everything hilariously slow. To solve this for the lot and city, we should use a scrollable buffer larger than the screen that we only regenerate when it has been exhausted (the view scrolls past its edge) or when a change occurs. This will create a small stutter (maybe 2 frames on older hardware, depending on lot/city complexity) on static buffer regeneration, which isn't great, but otherwise you'd be paying that same cost on every frame!
Here's an illustration showing how a static buffer like this works:
You can see that in the red (screen) area, it looks like the crosshatch is smoothly scrolling at 60fps - but this is actually a LIE!! The contents of the scrolling buffer are only re-rendered when the edge is hit.
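The bookkeeping for a buffer like this is pretty small. Here's a minimal Python sketch of it, not engine code: `ScrollBuffer`, `margin` and the `redraw` callback are made-up names, and it assumes a simple 2D camera whose position is the top-left corner of the screen in pixels.

```python
class ScrollBuffer:
    """Tracks an oversized static buffer and decides when it must be regenerated."""

    def __init__(self, screen_w, screen_h, margin, redraw):
        self.screen_w = screen_w
        self.screen_h = screen_h
        self.margin = margin    # extra pixels kept on every side of the screen
        self.redraw = redraw    # hypothetical callback: redraw(origin_x, origin_y)
        self.origin = None      # world position of the buffer's top-left corner
        self.dirty = True       # set when something static changes
        self.regen_count = 0

    def invalidate(self):
        """Call when a static object changes, forcing a redraw on the next frame."""
        self.dirty = True

    def _exhausted(self, cam_x, cam_y):
        # The buffer is (screen + 2 * margin) wide, so the screen still fits
        # while its offset inside the buffer stays within [0, 2 * margin].
        off_x = cam_x - self.origin[0]
        off_y = cam_y - self.origin[1]
        limit = 2 * self.margin
        return not (0 <= off_x <= limit and 0 <= off_y <= limit)

    def frame(self, cam_x, cam_y):
        """Returns the offset inside the buffer that the screen is copied from."""
        if self.dirty or self.origin is None or self._exhausted(cam_x, cam_y):
            # Re-centre the buffer on the camera and pay the redraw cost once.
            self.origin = (cam_x - self.margin, cam_y - self.margin)
            self.redraw(*self.origin)
            self.dirty = False
            self.regen_count += 1
        return (cam_x - self.origin[0], cam_y - self.origin[1])
```

On most frames `frame()` only returns a new offset, so scrolling costs one big blit; the expensive `redraw` only runs when the edge is hit or `invalidate()` was called.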
An alternative would be to render in many "tile" increments and only rerender tiles that have changed. That is a lot less intensive on the pixel shader, but it requires a lot more draw calls to render many tiles at once, plus calculations to determine which sprites are in which tile. We'll probably just go for an enlarged single buffer so that the game isn't CPU bound.
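For comparison, the "which sprites are in which tile" step for the tiled approach could look roughly like this. It's only a sketch: `tiles_for_rect` and `dirty_tiles` are hypothetical helpers, and it assumes integer pixel bounding boxes with a width and height of at least 1.

```python
def tiles_for_rect(x, y, w, h, tile_size):
    """Returns the set of (tx, ty) tiles overlapped by a sprite's bounding box."""
    first_tx = x // tile_size
    first_ty = y // tile_size
    last_tx = (x + w - 1) // tile_size   # last pixel column covered by the sprite
    last_ty = (y + h - 1) // tile_size   # last pixel row covered by the sprite
    return {(tx, ty)
            for tx in range(first_tx, last_tx + 1)
            for ty in range(first_ty, last_ty + 1)}


def dirty_tiles(changed_sprites, tile_size):
    """Collects every tile touched by any changed sprite (x, y, w, h)."""
    dirty = set()
    for (x, y, w, h) in changed_sprites:
        dirty |= tiles_for_rect(x, y, w, h, tile_size)
    return dirty
```

Every changed sprite costs a little set arithmetic here, and every dirty tile becomes its own render pass, which is where the extra CPU work and draw calls come from.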
City Rendering
Right now the city is rendered every frame on all settings. We'll probably want to implement something like the above, even in shadowed mode, since the time of day (and thus the shadows) moves quite slowly (regen every 1/2 second?). Zooming into or out of the active lot should also have a cool effect, just saying.
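The "regen every 1/2 second" idea boils down to a small timer that gates the expensive redraw. A rough sketch, assuming a hypothetical `TimedRegen` helper that is fed the frame's delta time in seconds:

```python
class TimedRegen:
    """Says when a slowly changing static buffer (e.g. city shadows) is due for a redraw."""

    def __init__(self, interval):
        self.interval = interval
        self.elapsed = interval   # start "due" so the first frame regenerates

    def should_regen(self, dt):
        """Accumulates frame time and returns True once per interval."""
        self.elapsed += dt
        if self.elapsed >= self.interval:
            self.elapsed = 0.0
            return True
        return False
```

With an interval of 0.5 the shadowed city would be redrawn twice a second instead of 60 times.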
Vitaboy (Sim Rendering)
Vitaboy currently re-calculates an avatar's mesh every frame that it is animated. It will be much faster to offload this animation work to the vertex shader, passing in an array of matrices as the bone transforms to calculate the animation on the graphics card, and only pass the original mesh once. Not much will have to be changed from the current implementation, but characters will still need a custom shader to handle the bone transforms.
Revised vertex format needs:
POSITION: real vert position
TEXCOORD0: texture coordinates
TEXCOORD1: blend vert position
TEXCOORD2: {x:real vert binding, y: blend vert binding, z: blend vert factor (<=0 means no blend vert)}
prelim vertex shader:
Code:
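VitaVertexOut vsVitaboy(VitaVertexIn v){
    VitaVertexOut result; // "out" is a reserved word in HLSL, so it can't be a variable name
    // params.z <= 0 means "no blend vert", so clamp the factor to [0, 1]
    float blend = saturate(v.params.z);
    // assumes bone indices are stored as plain float values (e.g. 3.0); keep asint() only if raw int bits are packed in
    float4 realPos = mul(v.position, SkelBindings[(int)v.params.x]);
    float4 blendPos = mul(v.bvPosition, SkelBindings[(int)v.params.y]);
    float4 position = lerp(realPos, blendPos, blend);
    result.texCoord = v.texCoord;
    result.position = mul(position, ViewProjection); // project the skinned position, not the raw one
    return result;
}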
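To sanity check the blend math before it goes on the graphics card, the same calculation can be run on the CPU. This is a minimal Python sketch rather than Vitaboy code: the helper names are made up, matrices are plain nested lists, and it assumes row vectors (the same convention as `mul(vector, matrix)` in the shader).

```python
def mul_vec_mat(v, m):
    """Row vector times 4x4 matrix, matching HLSL's mul(vector, matrix)."""
    return [sum(v[k] * m[k][c] for k in range(4)) for c in range(4)]


def translation(tx, ty, tz):
    """Translation matrix for row vectors (translation lives in the last row)."""
    return [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, 1, 0],
            [tx, ty, tz, 1]]


def skin_vertex(position, bv_position, params, skel_bindings):
    """Blends the real vert and the blend vert, each transformed by its own bone."""
    real_bone, blend_bone, factor = params
    real = mul_vec_mat(position, skel_bindings[int(real_bone)])
    if factor <= 0:   # <= 0 means "no blend vert"
        return real
    blend = mul_vec_mat(bv_position, skel_bindings[int(blend_bone)])
    return [(1.0 - factor) * r + factor * b for r, b in zip(real, blend)]
```

For example, with bone 0 translating by (1, 0, 0) and bone 1 by (0, 2, 0), a vertex at the origin with a blend factor of 0.5 lands halfway between the two bone results.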
Lot Sprite Rendering (2DWorldBatch)
This is currently very inefficient. Right now sprite geometry is regenerated by the CPU for every draw, and the object ID step causes an extra set of draw calls and the aforementioned geometry generation for every object in the update loop!
What we want is for sprites to instead cache their geometry, positions and all, until it changes. Sprite lists can then be built from the prebuilt geometry, minimizing the per-frame impact on the CPU (we could also use geometry shaders to offload this work to the GPU, but it may not be faster), and the projection/view matrix will move instead of the sprites.
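The caching itself is basically a dirty flag per sprite. Here's a minimal sketch of the pattern in Python; `CachedSprite` and `build_batch` are hypothetical names, not the actual 2DWorldBatch API, and a sprite is reduced to a flat quad for brevity.

```python
class CachedSprite:
    """Keeps its quad around until the sprite actually changes."""

    def __init__(self, x, y, w, h):
        self._rect = (x, y, w, h)
        self._quad = None    # cached corner positions; None means stale
        self.rebuilds = 0    # counts how often geometry was regenerated

    def move_to(self, x, y):
        """Only invalidates the cache when the position really changes."""
        _, _, w, h = self._rect
        if (x, y) != self._rect[:2]:
            self._rect = (x, y, w, h)
            self._quad = None

    def quad(self):
        """Returns the cached corners, regenerating them only if stale."""
        if self._quad is None:
            x, y, w, h = self._rect
            self._quad = [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]
            self.rebuilds += 1
        return self._quad


def build_batch(sprites):
    """Concatenates the cached quads into one vertex list for a single draw."""
    verts = []
    for sprite in sprites:
        verts.extend(sprite.quad())
    return verts
```

Scrolling the camera never touches `move_to`, so a pan leaves every cache valid and only the view matrix changes.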
Terrain
The terrain is a tricky element, since the original game rendered the terrain on the CPU and thus did a few things that we can't easily replicate (it just drew the blades from the bottom up with plain pixel writes, without any geometry generation or transfer to the graphics card), and it actually rendered the grass Z-buffered so that grass could appear over certain elements. I have a few ideas for how this should work, and they all have different pros and cons:
Grass blades as primitives (current setup)
Right now, if you enable grass blade rendering (currently disabled for performance reasons), this is what you'll get. We basically generate a ton of lines that are the "blades", put them in a graphics buffer, send them to the graphics card and render them whenever needed. As you can probably guess, all of these lines take up a lot of memory, and they take a long time to render.
Pros:
- Physically accurate in any render state (2d, 3d, all zoom levels)
Cons:
- Very memory intensive!
- Grass takes a long time to initially generate (took about 5 seconds to generate the grass alone at the volume TSO normally used)
- Rendering so many primitives is incredibly straining on the rasterizer, so older cards may not be able to keep up
Flat grass plane with additional grass layers coming out (fur shader style)
This would render the terrain plus a stack of "overlay" layers floating above it. Each layer uses a PRNG seeded by the ground's texture coordinate and draws protruding grass at that position when the PRNG generates a float less than a specified value. That value can be altered by how "dead" the grass is, and it can change depending on which layer is active, so that blades with lower PRNG values get shorter heights. (Height would also be affected by how dead the grass is, so grass would also be shorter around dead areas!)
Pros:
- Incredibly fast, only memory usage for first grass layer.
- No pre generation
- Works in 3d!
- Can make grass sway in wind with vertex shader on layers (future)
Cons:
- Risk of severe overdraw: too many grass layers means redrawing a portion of the screen once per layer, stressing older cards' fillrates. (probably faster than primitives though)
- Illusion of a continuous blade will be shattered when zooming in 2x or more, or panning the camera to be near perpendicular to the ground (future).
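The per-layer test for the fur shader style approach can be prototyped on the CPU first. The sketch below is one possible reading of the idea, not a settled design: `hash01` is just a cheap integer hash standing in for the PRNG, `base_density` is a made-up tuning value, and the exact rule tying PRNG value to blade height is an assumption.

```python
def hash01(x, y):
    """Deterministic pseudo-random float in [0, 1) from integer texel coordinates."""
    h = (x * 374761393 + y * 668265263) & 0xFFFFFFFF
    h = ((h ^ (h >> 13)) * 1274126177) & 0xFFFFFFFF
    h ^= h >> 16
    return h / 2**32


def blade_visible(x, y, layer, num_layers, deadness, base_density=0.5):
    """True if overlay layer `layer` (0 = lowest) draws grass at texel (x, y)."""
    r = hash01(x, y)
    # Deader grass lowers the threshold, so fewer texels grow a blade at all.
    threshold = base_density * (1.0 - deadness)
    if r >= threshold:
        return False
    # Assumed rule: lower PRNG values give shorter blades (height in [0, 1)).
    height = r / base_density
    layer_height = (layer + 1) / num_layers
    return layer_height <= height
```

Because surviving blades always have a PRNG value below the lowered threshold, their height is capped at `1 - deadness`, which gives the "shorter around dead areas" effect for free. Every layer evaluates the same hash, so a blade visible on one layer is also visible on every layer beneath it.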
Flat grass plane with pixel shader "parallax occlusion map" grass
Same idea as above, but the cost moves from fillrate to the pixel shader: the ground is only rendered once.
Pros:
- No redraw, so potentially faster where fill rate is limited
Cons:
- Pixel shader becomes incredibly intensive, especially if we want to calculate the altered depth of said blades.
- At edges of the world, grass blades will suddenly cut off.
- Will not work correctly with bumpy terrain! (not relevant right now, but may be in future.)
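To give a feel for why the pixel shader gets so expensive, here is a rough 1D sketch of the per-pixel ray march this approach needs. It is a simplified CPU illustration only: `march_grass`, `height_at` and `view_dx` are made-up names, blade heights are normalized to [0, 1], and a real shader would do this in 2D against a texture.

```python
def march_grass(u, view_dx, height_at, steps=16):
    """Walks a view ray down through the grass volume above the ground plane.

    u         -- where the ray enters the top of the grass volume
    view_dx   -- how far the ray drifts sideways by the time it reaches the ground
    height_at -- function returning the blade height (0..1) at a ground position
    Returns (hit_u, hit_height) for the first blade hit, or None if the ray
    reaches the ground without touching grass.
    """
    for i in range(steps):
        t = i / steps
        ray_height = 1.0 - t             # 1 at the top of the volume, falling towards 0
        sample_u = u + view_dx * t       # sideways drift of the ray so far
        if height_at(sample_u) >= ray_height:
            return (sample_u, ray_height)
    return None
```

Every screen pixel pays for up to `steps` height lookups, and the returned hit height is what you would need to write an altered depth for the blade.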