Where to put GUI when update and render are on separate threads?

9 comments, last by SyncViews 4 years, 2 months ago

Continuing my exploration of multithreading, I can't figure out a clean place for the GUI to live. Most GUI libraries avoid threading entirely, including the two I know of that are suited to games, CEGUI and ImGui, at least with the rendering implementations they provide.

I wasn't able to find any example code or really any questions on the topic (a few about multi-threading the UI itself, not what I want), and am wondering if I missed a simple solution to this?

Should the entire GUI (along with handling mouse/keyboard/etc. input) be with the rendering? Or with the update? Or should I somehow split my UI differently, maybe it lives separately from both somehow?

GUI in the render thread

Given that the OGL/D3D implementations in the libraries I looked at don't provide a simple means to split the logic from the actual rendering, the most obvious solution seems to be the entire thing should live in the render thread (including input handling).

Actions that will affect the game world are then sent over to the update thread to enact (which seems good practice anyway rather than directly manipulating game object/entity state, and this input layer can also be used for recordings, networking, etc.).

struct UnitAttackCommand // as part of some sort of union/variant collection
{
    std::vector<EntityId> selection;
    EntityId target;
};
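
For illustration, a minimal sketch of how such commands might be handed from the GUI/render thread to the update thread. It assumes a plain mutex-guarded queue and std::variant (C++17). `UnitStopCommand`, `CommandQueue`, `push` and `drain` are hypothetical names, and `EntityId` is assumed to be a plain integer.

```cpp
#include <cstdint>
#include <mutex>
#include <utility>
#include <variant>
#include <vector>

using EntityId = std::uint32_t; // assumption: IDs are plain integers

struct UnitAttackCommand
{
    std::vector<EntityId> selection;
    EntityId target;
};

struct UnitStopCommand // hypothetical second command type
{
    std::vector<EntityId> selection;
};

using Command = std::variant<UnitAttackCommand, UnitStopCommand>;

// Hypothetical queue: the GUI/render thread pushes, the update thread drains once per tick.
class CommandQueue
{
public:
    void push(Command cmd)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(cmd));
    }

    // Swaps the pending list out under the lock, so the commands
    // can be applied to the game state without holding it.
    std::vector<Command> drain()
    {
        std::vector<Command> out;
        std::lock_guard<std::mutex> lock(mutex_);
        out.swap(pending_);
        return out;
    }

private:
    std::mutex mutex_;
    std::vector<Command> pending_;
};
```

The update thread would call `drain()` at the start of a tick and apply each command with `std::visit`. The same list could also be written to a replay file or sent over the network.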

The big catch, which I hit while implementing this in an actual project, is that various aspects of the GUI (widgets/windows themselves, HUD stuff, game world overlays or floating elements like HP bars, labels, etc.) can end up wanting to read all sorts of state.

struct ProjectileRenderState
{
    const Sprite *spr; // actual image / pre-baked atlas entry, size, etc.
    Vector2F prev_pos, next_pos; // 2 points for interpolation in each state set
    float prev_dir, next_dir;
    // Note that there is nothing here about its owner, target, AI (guided target prediction etc.). For more complex objects there is a lot of stuff referencing other entities, dynamically allocated collections, etc.
};
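
As a rough sketch of what the render thread does with the two points, assuming `Vector2F` is a plain pair of floats and that `alpha` is the fraction of the way from the previous tick to the next one (`interpolate` is a hypothetical helper name):

```cpp
struct Vector2F { float x, y; }; // assumed layout of the vector type

// Blend between two simulation ticks: alpha = 0 gives prev, alpha = 1 gives next.
inline Vector2F interpolate(const Vector2F &prev, const Vector2F &next, float alpha)
{
    return Vector2F{prev.x + (next.x - prev.x) * alpha,
                    prev.y + (next.y - prev.y) * alpha};
}
```

Directions would need the same treatment plus handling for the wrap-around at a full turn, which is left out here.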

I wonder if I am just creating this entire issue for myself by trying to “over-optimise”, where most games don't have one?

Would it be normal to clone the entire game state over to the render thread on every update? Even information that would generally never be “visible” like stuff in different map regions, AI/command data, etc. just in case the UI needs it for something?
I avoided doing this because such a copy seemed pretty complex and expensive, especially with a lot of things containing dynamically sized collections and handling any references/pointers.

GUI in the update thread

The other idea was that the GUI lives in the update thread where it can read anything in the game, and sends over its render data.

The catch this time being that most GUI libraries seem completely not designed to do this (and want to touch font, texture/image objects, etc. in many places, e.g. on creation, and then later for both layout and rendering), so I wonder if this is just a terrible idea? If this is a done thing, is there any open source example implementation of doing so?

Fonts and text are a particular problem, as are complete libraries like ImGui. Neither of the text implementations I personally have around, DirectWrite and FreeType (both supporting at least some Unicode and loading glyphs on demand), seems like it will handle the font issues easily (maybe by putting mutexes around such stuff, though I suspect I will regret that).

struct UiDrawText // and similar for images, gradients, etc. in a variant/union container
{
    Font *font;
    Vector2I pos;
    RectI clip;
    TextAlign align;
    Colour col;
    std::string text; // UTF-8, possibly need to do full layout in update thread and just pass a list of glyph positions, but thread safety on that "Font" object seems a big issue
};

While seemingly simpler and better initially, having now made it mostly work, I am wondering if this was truly the wrong direction?


I recently tried my hand at making a multi-threaded version of the app that I'm working on. I make one worker thread that does the calculations, while the main thread handles the GUI.

A talk I recommend on and off is from GDC 2015, where Naughty Dog talked about their parallel pipeline. TL;DR: they are not strictly bound to render and update threads but instead spread all of the work across all available CPU cores.

I did something similar some time ago, implementing my own fiber-based job system (which is pretty useful for multi-threaded programming), and had to solve putting my game loop onto it. I ended up with the following scenario:

  • Tasks perform updates against the ‘current’ state of the game, where current is relative to when the task started
  • A completed update task increments a state and awaits the end of the frame
  • The global game state used by the render task is preserved until a frame ends
  • Update tasks just prepare their updated data but do not push it until the render task ends the frame
  • A signal is set that updates can be pushed, and the awaiting tasks start running
  • The render task waits until the state reaches zero again (which signals that all updates are done) and starts the next frame

User input is collected in my event system and consumed by update tasks asynchronously to their update routine. Same for incoming network packets.

It required a lot of planning of dependencies and of how to synchronize which data at what stage of processing, but I feel like it was worth it. Not to mention the nightmare of debugging the whole system.

Shaarigan said:
The global game state used by the render task is preserved until a frame ends. Update tasks are just preparing their updated data but not pushing them until the render task ends the frame. A signal is set that updates can be pushed and tasks awaiting start running.

Interesting. What does the game state look like to make that work? How exactly is object state being stored?

Threading the update in general seems even harder, because update systems want to read just as much as the GUI, and can also modify all sorts of stuff, and I would like to keep them deterministic. Is the current state of all entities/"things" something you can memcpy and even merge in a simple way?

I have looked at both an OOP and an ECS approach now and both seem complicated. Even with an ECS design and a “no pointers” rule (if I ever figure out how to make that work nicely), copying essentially all the component arrays sounds expensive.

Can't you do a blend?

Intuitively, I would say display is part of render, and computing what to display is part of update. This is also how a game works, the game data is updated with a new position of an object, and the renderer is simply told “paint it there please”.

So you would have gui-data aimed at describing what to display in the gui, which is likely not very large. In the update part of the game you update the gui-data using whatever you need from the game, as well as user input. The renderer just takes that gui-data and shows it to the user.

Basically it's an input sub-system of course, the game runs its course with input from the user. That input is coming from the input sub-system, which may be a key, or a mouse-click in the gui.

Alberth said:
So you would have gui-data aimed at describing what to display in the gui, which is likely not very large. In the update part of the game you update the gui-data using whatever you need from the game, as well as user input. The renderer just takes that gui-data and shows it to the user.

Basically it's an input sub-system of course, the game runs its course with input from the user. That input is coming from the input sub-system, which may be a key, or a mouse-click in the gui.

So this is like my second “do it in the update” idea, and then sending over a “render command list” of sorts? Can you think of any example of this? Originally I thought this was a great idea, until I realised that I couldn't find any information on a GUI library actually doing this.

To handle, say, a “mouse click”, I need to know where GUI elements are to see what the click “hits”, which means I need to do the GUI layout, which means I need to deal with fonts, image sizes, etc., which generally gets into things the libraries say are not thread safe (I'd need those same fonts, sprite-sheets, etc. to actually draw the “gui-data”). Do I start duplicating fonts etc. so each thread can have its own copy for metrics?

This is for an RTS/citysim type game. There are thousands of objects, and a lot of them have a lot of state and a lot of information the GUI might present, but individually they are fairly simple to render in the game world.

“I see that various aspects of the GUI (widgets/windows themselves, HUD stuff, game world overlays or floating elements like HP bars, labels, etc.) can end up wanting to read all sorts of state.”

You have to copy over a bunch of state to render the world, too!

The player characters need to know where they are and what their animation state is, the camera needs to know where it is, and so forth.

The GUI is no different.

You could have some big structure that contains all the GUI you want to render, and update that each time through your update loop, for example. Then render that GUI state. As you've discovered, you can either send data across the thread barrier, or pitch constructed graphics objects (textures, render states, command buffers) across the thread barrier. Which one you choose depends on preference and profiling measurements for your own engine.

In general, though, state synchronization between threads is THE BIG THING in game engines these days. If you have a deformable object or terrain feature, where do you calculate the deformed vertex data? Physics writes the deformation, graphics reads it.

In general, the “robust” solution to this is to build a very light-weight task queue (like a non-blocking FIFO or something) and a light-weight dependency model that lets some task say “I need X, Y, and Z to be ready before I can run.” Then dump all your work into this dependency graph, and work on the queue with as many threads as are available on the CPU (or one plus number of cores divided by two, or something, if you're worried about hyper-threading pessimizations).
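
To make the dependency model concrete, here is a deliberately small sketch. It only shows the “I need X, Y, and Z first” bookkeeping and runs tasks serially from a ready queue. A real version would have worker threads popping from that queue and would count down dependencies atomically. `TaskGraph`, `add` and `run` are hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

class TaskGraph
{
public:
    using TaskId = std::size_t;

    // Registers a task that may only run after every task in `deps` has finished.
    // Dependencies must already have been added.
    TaskId add(std::function<void()> work, const std::vector<TaskId> &deps = {})
    {
        TaskId id = tasks_.size();
        tasks_.push_back(Task{std::move(work), deps.size(), {}});
        for (TaskId d : deps)
            tasks_[d].dependents.push_back(id);
        return id;
    }

    // Runs every task once, never before its dependencies.
    // Consumes the dependency counts, so a graph is built fresh each frame.
    void run()
    {
        std::queue<TaskId> ready;
        for (TaskId i = 0; i < tasks_.size(); ++i)
            if (tasks_[i].waiting_on == 0)
                ready.push(i);

        while (!ready.empty()) {
            TaskId id = ready.front();
            ready.pop();
            tasks_[id].work();
            // Finishing a task may unblock the tasks that were waiting on it.
            for (TaskId next : tasks_[id].dependents)
                if (--tasks_[next].waiting_on == 0)
                    ready.push(next);
        }
    }

private:
    struct Task
    {
        std::function<void()> work;
        std::size_t waiting_on;         // unfinished dependencies
        std::vector<TaskId> dependents; // tasks waiting on this one
    };
    std::vector<Task> tasks_;
};
```

For example, a “build GUI data” task could list physics and animation as dependencies and would then only be queued once both have completed.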

It's also not uncommon to have two versions of state: “read-only, previous tick state” and “write-only, current tick state.” The simulation will read from the read-only state, and write to the write-only state, but will NOT read from the write-only state. That way, multiple readers can all read the read-only state, and not collide. There's a synchronization point where all readers declare “I'm done reading,” and the read/write buffers can be swapped. GUI would register the read lock, do its thing, and then release the claim, letting simulation swap buffers and continue. Generally, this will let GUI and simulation overlap without colliding and without too much serialization. (Note that GUI should release the claim BEFORE it tries to present, so you don't end up holding it while waiting for the screen refresh.)
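
A minimal sketch of that two-buffer scheme, assuming a mutex plus condition variable for the “I'm done reading” synchronization point. `DoubleBufferedState` and its member names are hypothetical, and `State` stands for whatever snapshot type the game uses.

```cpp
#include <array>
#include <condition_variable>
#include <cstddef>
#include <mutex>

template <typename State>
class DoubleBufferedState
{
public:
    // Simulation thread only: last tick's results (read-only).
    const State &previous() const { return buffers_[read_index_]; }
    // Simulation thread only: this tick's results (write-only).
    State &current() { return buffers_[1 - read_index_]; }

    // A reader (e.g. the GUI) registers a claim on the read-only buffer.
    const State &acquire_read()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        ++readers_;
        return buffers_[read_index_];
    }

    // The reader declares "I'm done reading"; call this before presenting.
    void release_read()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (--readers_ == 0)
            no_readers_.notify_all();
    }

    // Simulation thread: blocks until every claim is released, then swaps.
    void swap_buffers()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        no_readers_.wait(lock, [this] { return readers_ == 0; });
        read_index_ = 1 - read_index_;
    }

private:
    std::array<State, 2> buffers_{};
    std::size_t read_index_ = 0;
    int readers_ = 0;
    std::mutex mutex_;
    std::condition_variable no_readers_;
};
```

Note that after a swap, `current()` still holds the data from two ticks ago, so the simulation has to write every field it cares about rather than assume it starts from the latest state.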

enum Bool { True, False, FileNotFound };

hplus0603 said:
You have to copy over a bunch of state to render the world, too!

So I guess maybe my “over-optimisation” here is to only copy the few bits of state needed for regular rendering? The unit type, position, orientation, a short list of outstanding injury/damage effects, and what its current action is (animation selection).

But over in logic land it has a whole bunch of state to do with physics, healing/repairs, what is it doing/targeting, what is its planned path, what is attacking it, escorting/formation with nearby allies, projectiles to dodge, etc. And the UI might display those in specific cases.

A lot of such UI is similar in detail I guess to what you might find in many games like Cities:Skylines, Factorio, RimWorld, any of the X-Series, etc.

For example in Cities:Skylines rendering the paths for all the vehicles through a junction segment, seems like a lot of data that the renderer just doesn't normally need. Likewise probably a bunch of state to do with distributing employees and their places of work, area desirability, etc.

hplus0603 said:
It's also not uncommon to have two versions of state: “read-only, previous tick state

I guess same reason for me deciding against that. Am I just overthinking this? Is copying all that something that can be optimised enough? Either a way to determine pointer values “in the copy”, or a way to optimise any sort of “GetUnit(EntityId)” etc. (or even something for an ECS with components stored separately) once it scales to thousands of objects (and frequently many of them not really “interesting” most of the time)?

I tried doing this before in single-threaded code, and it ate performance compared to logic that just touches the few fields it needs right now, and can do something nice and simple like “target->position()”, “target->type()->size”, etc.

I did keep “two versions” of certain things, but only what the renderer specifically needed for interpolation (e.g. positions of movable objects). Similarly, when I was testing multi-threaded pathfinding, I only maintained a copy of the things it needed, e.g. the tilemap, so it can give units their path ready for next frame.

hplus0603 said:
In general, the “robust” solution to this is to build a very light-weight task queue (like a non-blocking FIFO or something,) and a light-weight dependency model, that lets some task say “I need X, Y, and Z to be ready before I can run.”

I had thought of treating it somewhat like a web page: when the player does something the GUI needs to display but the GUI doesn't have the data, it asks the logic thread for that data specifically (as an “async task” the logic thread can run at a suitable time), and the GUI finishes building that view next frame. It's still going to be much faster than a web XHR/AJAX request, so maybe that is an idea. Hopefully not a terrible idea/architecture?
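
To make that idea concrete, a rough sketch of what I mean. Everything here is hypothetical: `World` is a stand-in for the real game state, and `QueryChannel`, `request`, `service` and `poll` are made-up names. Results are plain strings only to keep the example short.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <optional>
#include <string>
#include <utility>
#include <vector>

struct World { std::vector<int> unit_hp; }; // stand-in for the real game state

class QueryChannel
{
public:
    using Query = std::function<std::string(const World &)>;

    // GUI thread: queue a question for the logic thread, get a ticket back.
    std::size_t request(Query q)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.emplace_back(next_ticket_, std::move(q));
        return next_ticket_++;
    }

    // Logic thread: call at a safe point in the tick to answer queued questions.
    void service(const World &world)
    {
        std::vector<std::pair<std::size_t, Query>> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(pending_);
        }
        for (auto &[ticket, query] : batch) {
            std::string result = query(world); // runs without holding the lock
            std::lock_guard<std::mutex> lock(mutex_);
            results_[ticket] = std::move(result);
        }
    }

    // GUI thread: returns the answer once it is ready, otherwise nothing yet.
    std::optional<std::string> poll(std::size_t ticket)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = results_.find(ticket);
        if (it == results_.end())
            return std::nullopt;
        std::string out = std::move(it->second);
        results_.erase(it);
        return out;
    }

private:
    std::mutex mutex_;
    std::size_t next_ticket_ = 0;
    std::vector<std::pair<std::size_t, Query>> pending_;
    std::map<std::size_t, std::string> results_;
};
```

The GUI would draw a placeholder for any panel whose `poll` still returns nothing, and fill it in a frame or two later.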

EDIT: Also guess I am worrying about a lot of this without a good example to look at.

Several games I played like Cities:Skylines, From the Depths, Oxygen Not Included, etc. have pretty terrible performance once the entity count starts stacking up. Factorio and Minecraft seem pretty well optimised, but are limited by how well a single thread can do. So having an architecture that wasn't going to lead into the same corner seemed like a good idea.
A lot of games I see information on seem to have way less logic/simulation, and seem to focus more on the graphics side, optimising animations, particle effects, etc. for far fewer active entities.

Is copying all that something that can be optimised enough?

Modern CPUs are very, very, fast at copying memory that's in “flat buffers.” They are much worse at copying memory that contains pointers. Thus, modern game engines designed for performance often use an “id” instead of a “pointer.”

If you can have at most 8192 entities, you allocate an array up front that's Entity[8192]. Instead of an Entity*, you use an EntityID, which is just an integer. You may even “waste” the ID=0 value to mean “not allocated.”

When the Entity then needs a ParticleSystem or a PhysicsBody or whatever, it contains another ID that points into the appropriate array.
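
As a small sketch of that layout (the field names and the two-array `World` are made up for illustration, and IDs are assumed to fit in 16 bits):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

using EntityID = std::uint16_t;      // index into World::entities, 0 = "not allocated"
using PhysicsBodyID = std::uint16_t; // index into World::bodies, 0 = "none"

constexpr std::size_t kMaxEntities = 8192;

struct PhysicsBody { float x, y, vx, vy; };

struct Entity
{
    PhysicsBodyID body; // an ID, not a PhysicsBody*
    std::uint16_t type;
};

struct World
{
    Entity entities[kMaxEntities];
    PhysicsBody bodies[kMaxEntities];
};

// No pointers inside, so a byte-wise copy yields a fully valid snapshot.
static_assert(std::is_trivially_copyable_v<World>, "World must stay flat");

inline void snapshot(World &dst, const World &src)
{
    std::memcpy(&dst, &src, sizeof(World));
}
```

Because the IDs are just indices, they mean the same thing in the copy as in the original, with nothing to patch up afterwards.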

Now, these arrays are all nice and flat, and because you're not using pointers, you don't need to patch anything up when copying. You can saturate the memory bus and just copy those big arrays each frame. Let's say you have 64 component kinds, and you can have at most 8192 instances total of a component, and each component is on average 128 bytes in size, the amount of data to copy is 64 MB. How big is your game? This may be overkill? I get the following numbers from my machine:

#include <time.h>
#include <string.h>
#include <stdio.h>

char bufa[8192*64*128];
char bufb[8192*64*128];

int main() {
    double bestpass = 0;
    for (int i = 0; i < 100; ++i) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC_RAW, &ts);

        memcpy(bufa, bufb, sizeof(bufa));

        struct timespec te;
        clock_gettime(CLOCK_MONOTONIC_RAW, &te);

        double pass = double(te.tv_sec-ts.tv_sec) + double(te.tv_nsec-ts.tv_nsec)/1000000000;
        if (i == 0 || pass < bestpass) {
            bestpass = pass;
        }
    }
    double throughput = sizeof(bufa)/bestpass/(1024*1024*1024);
    printf("%.6f seconds per pass\n", bestpass);
    printf("%.1f GB/s\n", throughput);
    return 0;
}

0.002352 seconds per pass
26.6 GB/s

So, two milliseconds to copy all that over. Do you have fewer component kinds? Are your components smaller? Do you have fewer entities? It'll run faster. (If the buffers are small enough, they will fit in cache and throw off this measurement, but my machine doesn't have 128 MB cache :-) Presumably you can arrange for this copy time to happen while the GPU is busy actually rendering/presenting the previous frame.

Yes, this is a bit of overhead, compared to doing nothing. But if you have four or more cores to use, I think this overhead is much lower than the cost of wrapping everything in locks, or keeping everything serial on a single core.

Regarding ID management for a fixed-size array: There is a ton of little finessing you can do there. You can index into the array using &8191 so that the ID=8192 item ends up in slot 0. You can keep a “generation counter” in the upper bits of the ID, so that if you delete an entity, and then re-use that slot for a new entity, you can tell the ID of the deleted entity from the ID of the new entity. You can also know which is the “first” and “last” slot used in each entity/component array, and copy less for each one, although I prefer to just build for the worst case, and the better cases will take care of themselves.
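
A sketch of the mask and generation-counter tricks, assuming 32-bit IDs with the low 13 bits as the slot (8192 slots) and the remaining 19 bits as the generation. `EntitySlots` and the helper names are made up for the example.

```cpp
#include <cstdint>

using EntityID = std::uint32_t;

constexpr std::uint32_t kSlotBits = 13;                        // 2^13 = 8192 slots
constexpr std::uint32_t kSlotCount = 1u << kSlotBits;
constexpr std::uint32_t kSlotMask = kSlotCount - 1;            // 8191
constexpr std::uint32_t kGenMask = (1u << (32 - kSlotBits)) - 1;

constexpr std::uint32_t slot_of(EntityID id) { return id & kSlotMask; }
constexpr std::uint32_t generation_of(EntityID id) { return id >> kSlotBits; }
constexpr EntityID make_id(std::uint32_t slot, std::uint32_t generation)
{
    return (generation << kSlotBits) | (slot & kSlotMask);
}

class EntitySlots
{
public:
    // Marks a slot as used and returns an ID carrying the slot's current generation.
    EntityID create(std::uint32_t slot)
    {
        slot &= kSlotMask;
        in_use_[slot] = true;
        return make_id(slot, generation_[slot]);
    }

    // Frees the slot and bumps its generation so old IDs stop matching.
    void destroy(EntityID id)
    {
        if (!is_alive(id))
            return;
        std::uint32_t slot = slot_of(id);
        in_use_[slot] = false;
        generation_[slot] = (generation_[slot] + 1) & kGenMask;
    }

    // False for IDs of deleted entities, even if the slot has been re-used.
    bool is_alive(EntityID id) const
    {
        std::uint32_t slot = slot_of(id);
        return in_use_[slot] && generation_[slot] == generation_of(id);
    }

private:
    std::uint32_t generation_[kSlotCount] = {};
    bool in_use_[kSlotCount] = {};
};
```

With this layout a stale ID is detected with one compare, at the cost of the generation wrapping around after 2^19 re-uses of the same slot.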


hplus0603 said:
Modern CPUs are very, very, fast at copying memory that's in “flat buffers.” They are much worse at copying memory that contains pointers. Thus, modern game engines designed for performance often use an “id” instead of a “pointer.”


If you can have at most 8192 entities, you allocate an array up front that's Entity[8192]. Instead of an Entity*, you use an EntityID, which is just an integer. You may even “waste” the ID=0 value to mean “not allocated.”

So I tried going the whole ECS route and ran into issues, but then maybe that was going too far and a middle ground is better… maybe getting onto another topic slightly…

Basically the problem I had is “Entity count” grows very quickly in games where players can build stuff up (for a quick reference, I loaded up my current Factorio game which is by no means a megabase, that is 500,000 “entities”). The thing is most of these are really not interesting entities, they are basically inactive things lying around the map, which I see in a lot of RTS/City like games and foresee having in my stuff (trees, resource points, items on ground, corpses, wall segments, generic buildings, etc.). In the average frame, they do nothing at all if not on screen, unless some other entity interacts with them.

So having something like a “Position” component just became massive, with way too many binary searches for things. “ID-reuse” avoids that, but meant every array had to be this length with blank spaces for unused components, and the IDs had to be treated like pointers with regard to memory safety (a “dangling ID” would be bad, though I guess the “generation counter” you mentioned would have solved that one).

How would your `Entity` there handle such different types? Currently I did actually keep different arrays for the different concrete types in the game which I could maybe adapt, `List<Projectile>; List<Ship>;` etc. But these do have base classes for the purposes of references from other stuff, again because of the interactions, a turret might shoot at a unit, or it might shoot at a projectile. Should I just have a ProjectileId and UnitId as separate “spaces” and use both as needed?

hplus0603 said:
“flat buffers.”

This was the other problem I had: complex objects keep having “internal lists”, not just single-value references. With a regular class and a dynamic array this doesn't really matter, but a “worst case” reservation would be fairly large.

For example looking at some stuff I have currently making up a potential “unit”:

  • The list of targets for the current “command” (attack, guard, etc.).
  • The list of units currently guarding this unit (e.g. so it can distribute them into a surrounding formation)
  • The list of incoming guided projectiles aimed at the unit to dodge, counter measure, shoot down, etc.
  • The list of damage hits and recent shield hits (for visuals, merging nearby hits into one larger effect. Shield hits are time limited, damage ones stay until damage is repaired)
  • The list of subcomponents (also fairly complex themselves, with things like turrets having their own targeting and AI, and many contributing to the main unit's stats/behaviour).

