I have little game dev experience but am trying to create an ECS-based game engine for personal use. Components are stored in, and managed by, separate Systems, as I believe this follows the single-responsibility principle (SRP) and is the most performant, since each System can choose a memory layout optimal for its purpose, e.g. sorted, stored on the GPU, etc. The engineering issue I've been struggling with is how to create/update/delete Components in such a design, preferably in a way that parallelizes well. I've read every article/post I can find; none seems to present a full or convincing solution. Conceivable methods:
1. Direct modification, synchronous message passing
A System just calls a function to directly create/update/delete a Component in another System, or in itself. Simple, but it would make parallelization nearly pointless, since every Component data structure would require locking. There is no batching either: if Components must maintain order, or need some other post-processing, sorting after every call is sub-optimal. To avoid locking, Systems could possibly be orchestrated to do their processing in an order where they never modify each other concurrently, but as the number of Systems increases I fear it would become a maintenance nightmare, and it would also limit parallelization.
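To make #1 concrete, here is a minimal sketch, in Python only for brevity. `PositionSystem`, its methods and the `lock_acquisitions` counter are made-up names for illustration, and the sorted list stands in for whatever layout a real System would pick.

```python
import threading
from bisect import insort


class PositionSystem:
    """Hypothetical System that owns its Components, kept sorted by entity id."""

    def __init__(self):
        self._lock = threading.Lock()
        self._components = []          # sorted list of (entity_id, x, y)
        self.lock_acquisitions = 0     # only here to make the cost visible

    def create(self, entity_id, x, y):
        # Every caller, on any thread, has to take the lock.
        with self._lock:
            self.lock_acquisitions += 1
            # Ordering is restored on every single call; no batching possible.
            insort(self._components, (entity_id, x, y))

    def delete(self, entity_id):
        with self._lock:
            self.lock_acquisitions += 1
            self._components = [c for c in self._components if c[0] != entity_id]

    def snapshot(self):
        with self._lock:
            return list(self._components)


def spawn_enemies(system, ids):
    # Stands in for another System calling straight into PositionSystem.
    for i in ids:
        system.create(i, float(i), 0.0)


positions = PositionSystem()
workers = [
    threading.Thread(target=spawn_enemies, args=(positions, range(start, 100, 4)))
    for start in range(4)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

The result is correct, but 100 creates cost 100 lock acquisitions and 100 ordered insertions, which is the contention and lack of batching described above.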
2. Single “global” Event queue, asynchronous message passing
A System will enqueue Events, such as EnemySpawned or WeaponFired, on a single “global” queue. All Systems would then process the queue when they can do so without interference, possibly in parallel, to create/update/delete their Components, e.g. PositionSystem creates its PositionComponent for the Event, RenderSystem its MeshComponent, etc. I'd expect better performance since less locking is required, only on the queue itself. Batching would be possible, but every System would have to process lots of irrelevant Events. This also seems illogical to me: a RenderSystem shouldn't have to understand high-level concepts such as Enemy or Weapon, and as the number of Events increases, so does the number of “listeners” in every interested System.
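A rough sketch of #2, again in Python for brevity. The Event classes, the `process` method and the `events_seen` counter are assumptions made up for this example, and the Systems run one after another here although they only write to their own storage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnemySpawned:
    entity_id: int
    x: float
    y: float


@dataclass(frozen=True)
class WeaponFired:
    entity_id: int


class PositionSystem:
    def __init__(self):
        self.positions = {}
        self.events_seen = 0

    def process(self, events):
        for event in events:
            self.events_seen += 1          # irrelevant Events are still visited
            if isinstance(event, EnemySpawned):
                self.positions[event.entity_id] = (event.x, event.y)


class RenderSystem:
    def __init__(self):
        self.meshes = {}
        self.events_seen = 0

    def process(self, events):
        for event in events:
            self.events_seen += 1
            # The RenderSystem has to know what an "Enemy" is to pick a mesh.
            if isinstance(event, EnemySpawned):
                self.meshes[event.entity_id] = "enemy_mesh"


# Filled during the frame (this is the only place that would need a lock).
event_queue = [EnemySpawned(1, 0.0, 0.0), WeaponFired(1), EnemySpawned(2, 5.0, 0.0)]

position_system, render_system = PositionSystem(), RenderSystem()
# End of frame: the queue is no longer written to, so every System can read it
# without locking and only touches its own Components.
for system in (position_system, render_system):
    system.process(event_queue)
event_queue.clear()
```

Both Systems walk all three Events even though `WeaponFired` matters to neither, and the `isinstance` check in `RenderSystem` is exactly the high-level knowledge that feels misplaced.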
2.1. Each Event type has a “global” queue
Similar to #2 except every type of Event has its own global queue. This could improve performance, as interested Systems wouldn't have to process irrelevant Events and there would be less lock contention on each queue. Downside is that there could be an unmanageable number of queues floating around.
2.2. Each Event type has a “global” queue for every worker thread
Same as #2.1 except every worker thread has its own queue, so they don't need any locking, possibly improving performance. Downside, even more queues.
3. Single “global” Command queue, asynchronous message passing
Similar to #2, but instead of Events a System enqueues specific Commands, e.g. CreatePositionComponent handled by PositionSystem, CreateMeshComponent handled by RenderSystem, etc. Possible improvements over #2 are that RenderSystem no longer has to understand high-level concepts, and that every System probably only needs three “listeners” (create/update/delete). Possible downside is that spawning an (Enemy) entity could create a lot of Commands.
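A minimal sketch of #3 under the same caveats (Python for brevity; `Command`, `Op`, `ComponentSystem` and `spawn_enemy` are hypothetical names, and routing by a `target` string is just one way to address a System):

```python
from dataclasses import dataclass
from enum import Enum, auto


class Op(Enum):
    CREATE = auto()
    UPDATE = auto()
    DELETE = auto()


@dataclass(frozen=True)
class Command:
    target: str          # name of the System that owns this Component type
    op: Op
    entity_id: int
    data: object = None


def spawn_enemy(queue, entity_id, x, y):
    # One high-level action fans out into one Command per Component.
    queue.append(Command("position", Op.CREATE, entity_id, (x, y)))
    queue.append(Command("render", Op.CREATE, entity_id, "enemy_mesh"))


class ComponentSystem:
    """Generic System that only understands create/update/delete."""

    def __init__(self, name):
        self.name = name
        self.components = {}

    def process(self, queue):
        for cmd in queue:
            if cmd.target != self.name:
                continue             # still has to skip other Systems' Commands
            if cmd.op is Op.DELETE:
                self.components.pop(cmd.entity_id, None)
            else:                    # CREATE and UPDATE both store the new data
                self.components[cmd.entity_id] = cmd.data


command_queue = []
spawn_enemy(command_queue, 1, 0.0, 0.0)
spawn_enemy(command_queue, 2, 5.0, 0.0)
command_queue.append(Command("render", Op.DELETE, 1))

position_system = ComponentSystem("position")
render_system = ComponentSystem("render")
for system in (position_system, render_system):
    system.process(command_queue)
```

No System needs to know what an Enemy is any more, but two spawns already produce four Commands, and each System still scans Commands that are not addressed to it.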
3.1. Each Command type has a “global” queue
Similar to #3 except every type of Command has its own global queue, so there's less queue lock contention. Beyond that I don't see any improvement, since a System wouldn't be able to perform create/delete in a single batch.
3.2. Each Command type has a “global” queue for every worker thread
Same as #3.1 except every worker thread has its own queue, so no locking is needed, possibly improving performance. Downside, more queues.
4. Each System has a single Command queue
Similar to #3 except every System has its own Command queue. This could improve performance since the System wouldn't have to process any irrelevant Commands meant for other Systems. It could perform create/delete in a single batch, and it would also be possible, since the System owns, or takes ownership of, the queue, to sort the Commands in an order optimal for processing.
4.1. Each System has a Command queue for every worker thread
Same as #4 except every worker thread has its own queue, so they don't need any locking, possibly improving performance. This resembles how Vulkan is typically used (command buffers recorded from one command pool per thread), so I'd expect it's the preferable solution.
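A sketch of how #4.1 could look, with the usual caveats: Python for brevity, all names (`PositionSystem`, `enqueue_create`, `flush`, `worker_index`) are invented for the example, and the rule that a delete wins over a create of the same entity within one frame is a simplifying assumption. In CPython `list.append` would be safe even on a shared list; the point here is the ownership structure, which carries over to languages without a global interpreter lock.

```python
import threading


class PositionSystem:
    def __init__(self, worker_count):
        # One private Command list per worker; a worker only touches its own.
        self._queues = [[] for _ in range(worker_count)]
        self.components = []           # kept sorted by entity id

    def enqueue_create(self, worker_index, entity_id, x, y):
        self._queues[worker_index].append(("create", entity_id, (x, y)))  # no lock

    def enqueue_delete(self, worker_index, entity_id):
        self._queues[worker_index].append(("delete", entity_id, None))    # no lock

    def flush(self):
        """Apply all queued Commands; call at a sync point when no worker is enqueuing."""
        commands = [cmd for queue in self._queues for cmd in queue]
        for queue in self._queues:
            queue.clear()
        # Assumption for this sketch: a delete beats a create in the same frame.
        deleted = {eid for op, eid, _ in commands if op == "delete"}
        created = [
            (eid, data) for op, eid, data in commands
            if op == "create" and eid not in deleted
        ]
        survivors = [c for c in self.components if c[0] not in deleted]
        self.components = sorted(survivors + created)   # one sort per flush
        return len(commands)


def worker(system, index, ids):
    for i in ids:
        system.enqueue_create(index, i, float(i), 0.0)
    system.enqueue_delete(index, ids[0])   # each worker also deletes its first entity


positions = PositionSystem(worker_count=4)
threads = [
    threading.Thread(target=worker, args=(positions, w, list(range(w, 20, 4))))
    for w in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
applied = positions.flush()
```

Creates and deletes are applied in one batch and the storage is sorted once per flush instead of once per call, at the cost of `worker_count` queues per System.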
5. (Theoretical) task scheduler with System dependency graph
Some AAA engines seem to use such a design. I have no idea of the internals, but at a lower level I guess it is similar to #4.1, except that it enqueues a Command handler function, coroutine or fiber instead of just Command data.
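I can only guess at what such a scheduler does internally, so the following is a toy sketch of the dependency-graph idea and not how any particular engine does it. It uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) and a thread pool; the System names and the `run_frame` helper are hypothetical, and it runs Systems in whole "waves" where a real scheduler would presumably start each task as soon as its own dependencies finish.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical Systems; each one maps to the Systems it must run after.
dependencies = {
    "input": set(),
    "physics": {"input"},
    "ai": {"input"},
    "render": {"physics", "ai"},
}


def run_frame(dependencies, run_system, workers=4):
    """Run every System once, in parallel where the graph allows it."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while sorter.is_active():
            # Everything whose dependencies are done can run concurrently.
            ready = sorted(sorter.get_ready())
            list(pool.map(run_system, ready))   # blocks until the wave is done
            sorter.done(*ready)
            waves.append(ready)
    return waves


executed = []
waves = run_frame(dependencies, executed.append)
```

Here `physics` and `ai` land in the same wave and may run in parallel, while `render` waits for both. Each task could then write into per-thread Command queues as in #4.1.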
EDIT:
6. Double-buffered Component data
Each System has two (or X, for X frames) copies/sets of its Component data and swaps them after each frame: read from the current set, write to the next/future set. This is more of a simulation/logic data requirement I guess, since I don't see how it would reduce locking when concurrently creating/deleting Components in other Systems; an Event/Command buffer would still be required for that. Copying all Component data every frame could be costly.
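A minimal sketch of #6 with two buffers (Python for brevity; `DoubleBufferedPositions` and `integrate` are names made up for the example):

```python
class DoubleBufferedPositions:
    """Two copies of the same Component data; one is read, the other written."""

    def __init__(self, positions):
        self._buffers = [list(positions), list(positions)]
        self._current = 0

    @property
    def current(self):
        # Treated as read-only during the frame, so readers need no lock.
        return self._buffers[self._current]

    @property
    def next(self):
        return self._buffers[1 - self._current]

    def swap(self):
        self._current = 1 - self._current


def integrate(store, velocity, dt):
    src, dst = store.current, store.next
    for i, (x, y) in enumerate(src):
        # Every slot is rewritten each frame, changed or not.
        dst[i] = (x + velocity[0] * dt, y + velocity[1] * dt)


store = DoubleBufferedPositions([(0.0, 0.0), (10.0, 0.0)])
integrate(store, velocity=(1.0, 0.0), dt=0.5)
before_swap = list(store.current)   # readers still see the old frame
store.swap()
after_swap = list(store.current)    # the new frame becomes visible all at once
```

Both buffers have a fixed length here, which shows the limitation: updates fit nicely, but creating or deleting a Component changes the shape of both buffers and still has to go through some Event/Command buffer.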
Any other solutions?
What's your preference, why?