Concurrent modification of components in ECS

12 comments, last by ballzak 4 years, 5 months ago

@All8Up Fully initializing the newly created Entities concurrently, into alternative storage, rather than just enqueuing Commands to do so, could indeed perform better, since the initialization is then also parallelized and the final “merge” is basically just one memcpy per alternative storage. I guess it could get contentious around the Entity id/handle (index+generation) “allocator”; ideally it would use a wait-free algorithm, but of course the single-threaded Command queue approach wouldn't be much better.
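A minimal sketch of the wait-free “allocator” idea, with an assumed 24-bit index / 8-bit generation handle layout and invented names; recycling freed indices (where generations get bumped) is the part that tends to reintroduce contention and is left out here:

```cpp
#include <atomic>
#include <cstdint>

// Assumed handle layout: low 24 bits index, high 8 bits generation.
struct EntityId {
    uint32_t value;
    uint32_t index() const      { return value & 0x00FFFFFFu; }
    uint32_t generation() const { return value >> 24; }
};

// fetch_add never loops or blocks, so every thread obtains a fresh index in a
// bounded number of steps (wait-free). Reusing freed indices, which is when a
// generation would be bumped, typically needs a lock-free free list and is the
// part most likely to become contentious; it is not shown here.
class IndexAllocator {
public:
    EntityId allocate() {
        const uint32_t index = counter_.fetch_add(1, std::memory_order_relaxed);
        return EntityId{ index & 0x00FFFFFFu };  // new index, generation 0
    }

private:
    std::atomic<uint32_t> counter_{0};
};
```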

I presume the barrier() isn't blocking all threads, just the jobs affected by it, i.e. according to the DAG, probably what you describe as the “optimizer pass”? Theoretically, I guess some Entity jobs, unrelated to force/position, could be permitted to keep running until the end of the frame.

Using more threads than cores is supposedly pointless, but I guess it would make sense for blocking IO calls, using a separate scheduler for them.

Thanks for all the info.


@All8Up Fully initializing the newly created Entities concurrently, into alternative storage, rather than just enqueuing Commands to do so, could indeed perform better, since the initialization is then also parallelized and the final “merge” is basically just one memcpy per alternative storage. I guess it could get contentious around the Entity id/handle (index+generation) “allocator”; ideally it would use a wait-free algorithm, but of course the single-threaded Command queue approach wouldn't be much better.

Actually, handles are also fairly trivial, as they use per-thread staging just like the entities. If you create an entity, a per-thread handle is created. The ID is globally unique, so when the entity is merged so is the handle. Figuring out the ID is a bit of a pain, but otherwise this makes things almost trivial. Once merged, though, handles are completely free-threaded since they are read-only at all times during updates. Adding/removing components again falls back to per-thread staging and is part of the merge process; that's the only time the handles are modified.
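A rough illustration of that per-thread staging and merge, with made-up names rather than the actual code: each worker appends fully initialized data into its own buffer while jobs run, and the merge bulk-copies every buffer into the shared storage at a point where no jobs touch it.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

struct Position { float x, y, z; };  // example component

// One staging buffer per worker thread; no locking is needed while jobs run,
// because each thread only ever writes to its own buffer.
struct StagingBuffer {
    std::vector<Position> created;  // components of entities created this frame
};

// Shared storage, touched only during the merge (a point where no jobs run).
struct PositionStorage {
    std::vector<Position> data;
};

// Merge step: one bulk copy per staging buffer into the shared storage.
void merge(PositionStorage& storage, std::vector<StagingBuffer>& perThread) {
    for (StagingBuffer& stage : perThread) {
        if (stage.created.empty()) continue;
        const std::size_t oldSize = storage.data.size();
        storage.data.resize(oldSize + stage.created.size());
        std::memcpy(storage.data.data() + oldSize,
                    stage.created.data(),
                    stage.created.size() * sizeof(Position));
        stage.created.clear();
    }
}
```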

I presume the barrier() isn't blocking all threads, just the jobs affected by it, i.e. according to the DAG, probably what you describe as the “optimizer pass”? Theoretically, I guess some Entity jobs, unrelated to force/position, could be permitted to keep running until the end of the frame.

Barrier is indeed blocking, given how things work and the intended guarantees. No dynamic work is pushed to this particular scheduler; it is 100% pre-computed within the world update, and there is nothing it would be able to do safely anyway. This is actually a deliberate design decision: dynamic work in the main game loop conflicts with the “writing systems requires little or no threading knowledge” goal. Rather, any dynamic work is pushed to the secondary wsjs system, and the only communication between the two is either handles, which can be polled for readiness, or explicit event queues, which are processed within the world update in the same way you write a system. This adds a minor overhead to some things, but in general not expecting gameplay programmers to understand and/or worry about complex threading is a much more worthwhile benefit.
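A hedged sketch of the event-queue half of that communication, with invented names (the real queue may well be lock-free): workers on the secondary scheduler push events, and an ordinary system drains them inside the world update, so gameplay code never deals with threads directly.

```cpp
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical event produced by dynamic work on the secondary scheduler.
struct AssetLoadedEvent { int assetId; };

class EventQueue {
public:
    // Called from worker threads on the secondary job system.
    void push(const AssetLoadedEvent& e) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(e);
    }

    // Called once per frame from inside the world update, like any other system.
    std::vector<AssetLoadedEvent> drain() {
        std::lock_guard<std::mutex> lock(mutex_);
        return std::exchange(pending_, {});
    }

private:
    std::mutex mutex_;
    std::vector<AssetLoadedEvent> pending_;
};

// A "system" written with no threading knowledge: it just drains the queue.
void assetSystem(EventQueue& events) {
    for (const AssetLoadedEvent& e : events.drain()) {
        // react to the loaded asset here
        (void)e;
    }
}
```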

Using more threads than cores is supposedly pointless, but I guess it would make sense for blocking IO calls, using a separate scheduler for them.

That's the point of the balancer system. It knows the total thread count available and tries to keep things under that. So, for instance, it knows there are 64 hardware threads, so it will never oversubscribe. An even balance would be 32 reserved for wsjs and 32 for parallel. It will balance back and forth based on which side is utilizing more per-thread time and a few other heuristics, but it will always keep the total in use below the total available. Actually, I reserve 2 right off the bat so the OS has something to use, along with the various subsystems that crank up independent workers such as sound, input and rendering. Also worth noting that the sweet spot for thread utilization is generally around three quarters of the total hardware threads, because hyperthreading is not equivalent to actually having more physical cores. So, generally speaking, about 54 threads of parallel processing on the 2990WX is the best performance; beyond that the ALUs, SIMD units and memory bandwidth are completely saturated and there is just no benefit to trying to use more threads.
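A toy version of that budgeting arithmetic, assuming nothing about the real balancer's heuristics: reserve a couple of threads for the OS and the independent subsystem workers, cap the total at roughly three quarters of the hardware threads, and split the remainder between the two schedulers in proportion to measured per-thread utilization.

```cpp
#include <algorithm>
#include <thread>

struct ThreadBudget {
    unsigned worldUpdate;  // threads for the pre-computed world-update scheduler
    unsigned parallel;     // threads for the secondary (wsjs-style) scheduler
};

// Utilization inputs are fractions of per-thread time used last frame (0..1).
ThreadBudget balance(double worldUtilization, double parallelUtilization) {
    const unsigned hw       = std::max(4u, std::thread::hardware_concurrency());
    const unsigned reserved = 2;  // leave room for the OS plus sound/input/render workers
    unsigned usable = std::min(hw - reserved, (hw * 3) / 4);  // ~3/4 sweet spot with SMT
    usable = std::max(usable, 2u);

    const double total = worldUtilization + parallelUtilization;
    const double share = total > 0.0 ? worldUtilization / total : 0.5;
    const unsigned world = std::clamp(
        static_cast<unsigned>(share * usable + 0.5), 1u, usable - 1);
    return ThreadBudget{ world, usable - world };
}
```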

@All8Up Thanks for the tip about separating the Entity ID generation. I guess a few bits of the “index” part could be reserved for the thread number, with each thread then incrementing by that power-of-two stride when “allocating” new ones, leaving holes.
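A small, purely illustrative sketch of that scheme (the bit count and names are assumptions): the thread number sits in the low bits of the index and each thread advances its own counter by 2^bits, so allocations from different threads can never collide and need no synchronization.

```cpp
#include <cstdint>

// Purely illustrative: stamp the thread number into the low bits of the index
// so each thread can hand out entity indices with no synchronization at all.
class PerThreadIndexAllocator {
public:
    static constexpr uint32_t kThreadBits = 4;              // assumes at most 16 worker threads
    static constexpr uint32_t kStride     = 1u << kThreadBits;

    explicit PerThreadIndexAllocator(uint32_t threadNumber)
        : next_(threadNumber) {}                             // first index already carries the thread id

    uint32_t allocate() {
        const uint32_t index = next_;
        next_ += kStride;  // slots in between belong to other threads; unused ones are the "holes"
        return index;
    }

private:
    uint32_t next_;
};
```

Thread 0 would then hand out 0, 16, 32, … and thread 1 would hand out 1, 17, 33, …; the trade-off is the sparseness (the holes) when threads create entities at very different rates.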

Brainstorming: that idea taken to its extreme would be like a distributed database, where each thread (node) stores a subset of the total number of Components of a particular type, avoiding “merges” altogether? As said, I haven't made many Systems, nor a full game, so I don't know how common it would be for a System to query Components stored in another/multiple threads in such a design; I'd expect at least spatial queries and rendering (depth sorting) would have to be “merged”.

I expect it's more difficult to manage/reserve threads on a consumer PC with far fewer cores.

