Concurrent modification of components in ECS

12 comments, last by ballzak 4 years, 5 months ago

@All8Up Fully initializing the newly created Entities concurrently, into alternative storage, rather than just enqueuing Commands to do so, could indeed perform better, since the initialization is then also parallelized and the final “merge” is basically just one memcpy per alternative storage. I guess it could get contentious around the Entity id/handle (index+generation) “allocator”; ideally it would use a wait-free algorithm, but of course the single-threaded Command queue approach wouldn't be much better.
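A minimal sketch of the wait-free “allocator” idea, with an assumed 24-bit index / 8-bit generation handle layout and invented names; recycling freed indices (where generations get bumped) is the part that tends to reintroduce contention and is left out here:

```cpp
#include <atomic>
#include <cstdint>

// Assumed handle layout: low 24 bits index, high 8 bits generation.
struct EntityId {
    uint32_t value;
    uint32_t index() const      { return value & 0x00FFFFFFu; }
    uint32_t generation() const { return value >> 24; }
};

// fetch_add never loops or blocks, so every thread obtains a fresh index in a
// bounded number of steps (wait-free). Reusing freed indices, which is when a
// generation would be bumped, typically needs a lock-free free list and is the
// part most likely to become contentious; it is not shown here.
class IndexAllocator {
public:
    EntityId allocate() {
        const uint32_t index = counter_.fetch_add(1, std::memory_order_relaxed);
        return EntityId{ index & 0x00FFFFFFu };  // new index, generation 0
    }

private:
    std::atomic<uint32_t> counter_{0};
};
```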

I presume the barrier() isn't blocking all threads, just the jobs affected by it, i.e. according to the DAG, probably what you describe as the “optimizer pass”? Theoretically, I guess some Entity jobs, unrelated to force/position, could be permitted to keep running until the end of the frame.

Using more threads than cores is supposedly pointless, but I guess it would make sense for blocking IO calls, using a separate scheduler for them.

Thanks for all the info.


@All8Up Fully initializing the newly created Entities concurrently, into alternative storage, rather than just enqueuing Commands to do so, could indeed perform better, since the initialization is then also parallelized and the final “merge” is basically just one memcpy per alternative storage. I guess it could get contentious around the Entity id/handle (index+generation) “allocator”; ideally it would use a wait-free algorithm, but of course the single-threaded Command queue approach wouldn't be much better.

Actually, handles are also fairly trivial, as they use per-thread staging just like the entities. If you create an entity, a per-thread handle is created. The ID is globally unique, so when the entity is merged so is the handle. Figuring out the ID is a bit of a pain, but otherwise this makes things almost trivial. Once merged, though, handles are completely free-threaded since they are read-only at all times during updates. Adding/removing components again falls back to per-thread staging and is part of the merge process; that's the only time the handles are modified.
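A rough illustration of that per-thread staging and merge, with made-up names rather than the actual code: each worker appends fully initialized data into its own buffer while jobs run, and the merge bulk-copies every buffer into the shared storage at a point where no jobs touch it.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

struct Position { float x, y, z; };  // example component

// One staging buffer per worker thread; no locking is needed while jobs run,
// because each thread only ever writes to its own buffer.
struct StagingBuffer {
    std::vector<Position> created;  // components of entities created this frame
};

// Shared storage, touched only during the merge (a point where no jobs run).
struct PositionStorage {
    std::vector<Position> data;
};

// Merge step: one bulk copy per staging buffer into the shared storage.
void merge(PositionStorage& storage, std::vector<StagingBuffer>& perThread) {
    for (StagingBuffer& stage : perThread) {
        if (stage.created.empty()) continue;
        const std::size_t oldSize = storage.data.size();
        storage.data.resize(oldSize + stage.created.size());
        std::memcpy(storage.data.data() + oldSize,
                    stage.created.data(),
                    stage.created.size() * sizeof(Position));
        stage.created.clear();
    }
}
```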

I presume the barrier() isn't blocking all threads, just the jobs affected by it, i.e. according to the DAG, probably what you describe as the “optimizer pass”? Theoretically, I guess some Entity jobs, unrelated to force/position, could be permitted to keep running until the end of the frame.

Barrier is indeed blocking, given how things work and the intended guarantees. No dynamic work is pushed to this particular scheduler; it is 100% pre-computed within the world update, and there is nothing it would be able to do safely anyway. This is actually a deliberate design decision: dynamic work in the main game loop conflicts with the “writing systems requires little or no threading knowledge” goal. Rather, any dynamic work is pushed to the secondary wsjs system, and the only communication between the two is either handles, which can be polled for readiness, or explicit event queues, which are processed within the world update in the same way you write a system. This adds a minor overhead to some things, but in general not expecting gameplay programmers to understand and/or worry about complex threading is a much more worthwhile benefit.
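A hedged sketch of the event-queue half of that communication, with invented names (the real queue may well be lock-free): workers on the secondary scheduler push events, and an ordinary system drains them inside the world update, so gameplay code never deals with threads directly.

```cpp
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical event produced by dynamic work on the secondary scheduler.
struct AssetLoadedEvent { int assetId; };

class EventQueue {
public:
    // Called from worker threads on the secondary job system.
    void push(const AssetLoadedEvent& e) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(e);
    }

    // Called once per frame from inside the world update, like any other system.
    std::vector<AssetLoadedEvent> drain() {
        std::lock_guard<std::mutex> lock(mutex_);
        return std::exchange(pending_, {});
    }

private:
    std::mutex mutex_;
    std::vector<AssetLoadedEvent> pending_;
};

// A "system" written with no threading knowledge: it just drains the queue.
void assetSystem(EventQueue& events) {
    for (const AssetLoadedEvent& e : events.drain()) {
        // react to the loaded asset here
        (void)e;
    }
}
```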

Using more threads than cores is supposedly pointless, but I guess it would make sense for blocking IO calls, using a separate scheduler for them.

That's the point of the balancer system. It knows the total thread count available and tries to keep things under that. So, for instance, it knows there are 64 hardware threads, so it will never oversubscribe. An even balance would be 32 reserved for wsjs and 32 for parallel. It will balance back and forth based on which side is utilizing more per-thread time and a few other heuristics, but it will always keep the total in use below the total available. Actually, I reserve 2 right off the bat so the OS has something to use, along with the various subsystems that crank up independent workers such as sound, input and rendering. Also worth noting that the sweet spot for thread utilization is generally around three quarters of the total hardware threads, because hyperthreading is not equivalent to actually having more physical cores. So, generally speaking, about 54 threads of parallel processing on the 2990WX is the best performance; beyond that the ALUs, SIMD units and memory bandwidth are completely saturated and there is just no benefit to trying to use more threads.
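A toy version of that budgeting arithmetic, assuming nothing about the real balancer's heuristics: reserve a couple of threads for the OS and the independent subsystem workers, cap the total at roughly three quarters of the hardware threads, and split the remainder between the two schedulers in proportion to measured per-thread utilization.

```cpp
#include <algorithm>
#include <thread>

struct ThreadBudget {
    unsigned worldUpdate;  // threads for the pre-computed world-update scheduler
    unsigned parallel;     // threads for the secondary (wsjs-style) scheduler
};

// Utilization inputs are fractions of per-thread time used last frame (0..1).
ThreadBudget balance(double worldUtilization, double parallelUtilization) {
    const unsigned hw       = std::max(4u, std::thread::hardware_concurrency());
    const unsigned reserved = 2;  // leave room for the OS plus sound/input/render workers
    unsigned usable = std::min(hw - reserved, (hw * 3) / 4);  // ~3/4 sweet spot with SMT
    usable = std::max(usable, 2u);

    const double total = worldUtilization + parallelUtilization;
    const double share = total > 0.0 ? worldUtilization / total : 0.5;
    const unsigned world = std::clamp(
        static_cast<unsigned>(share * usable + 0.5), 1u, usable - 1);
    return ThreadBudget{ world, usable - world };
}
```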

@All8Up Thanks for the tip about separating the Entity ID generation. I guess a few bits of the “index” part could be reserved for the thread number, with each thread then incrementing by that power-of-two stride when “allocating” new ones, leaving holes.
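A small, purely illustrative sketch of that scheme (the bit count and names are assumptions): the thread number sits in the low bits of the index and each thread advances its own counter by 2^bits, so allocations from different threads can never collide and need no synchronization.

```cpp
#include <cstdint>

// Purely illustrative: stamp the thread number into the low bits of the index
// so each thread can hand out entity indices with no synchronization at all.
class PerThreadIndexAllocator {
public:
    static constexpr uint32_t kThreadBits = 4;              // assumes at most 16 worker threads
    static constexpr uint32_t kStride     = 1u << kThreadBits;

    explicit PerThreadIndexAllocator(uint32_t threadNumber)
        : next_(threadNumber) {}                             // first index already carries the thread id

    uint32_t allocate() {
        const uint32_t index = next_;
        next_ += kStride;  // slots in between belong to other threads; unused ones are the "holes"
        return index;
    }

private:
    uint32_t next_;
};
```

Thread 0 would then hand out 0, 16, 32, … and thread 1 would hand out 1, 17, 33, …; the trade-off is the sparseness (the holes) when threads create entities at very different rates.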

Brainstorming: that idea taken to its extreme would be like a distributed database, where each thread (node) stores a subset of the total number of Components of a particular type, avoiding “merges” altogether? As said, I haven't made many Systems, nor a full game, so I don't know how common it would be for a System to query Components stored in another/multiple threads in such a design; I'd expect at least spatial queries and rendering (depth sorting) would have to be “merged”.

I expect it's more difficult to manage/reserve threads on a consumer PC with far fewer cores.

