Among people who haven't yet developed a networked game, there's this meme that you can “just add networking.” That's like thinking you can have a baby and “just add a car seat.”
This is not true. EVERYTHING CHANGES.
Not only does your main simulation engine need to be intimately tied into the networking system, but often the rendering needs to be, too, at a minimum to be able to extrapolate or interpolate animations based on delayed remote entity state.
But, even worse! Gameplay itself needs to be designed with the limitations of your particular networking engine in mind. There is a reason most RTS games work the way they do. There's a reason most FPS games work the way they do. There's a reason most “base defender” games work the way they do. There's a reason most “farming” games work the way they do. All of this is because that's the gameplay you end up with when you choose a particular networking approach, and push the gameplay into the corners that particular technology allows it to go.
This is not optional in any way – networking determines engine, and networking plus engine determine certain gameplay rules. If you want to go the other way, and start with novel gameplay rules, then you will have to attempt building a custom networking setup and custom engine tailored to those rules. As it turns out, many such “custom” approaches have been tried over the years, and the ones that actually ship tend to fall back to one of the “known” attractor states.
In a battle between game designers and the speed of light, the speed of light wins.
It sounds like you're on a team that hasn't done this before. If that is the case, and the entire team isn't 100% committed to each and every one of them doing everything it takes to succeed at networking, then prepare for your project to fail. If you're not OK with that, then you should find a team that will listen to reason.
If you absolutely have to make some random game with some random engine “work” (and I use that word loosely), then the best you can do is to never re-simulate, but instead just receive state from other players in a constant stream of updates (probably at a rate slower than your simulation rate). Then use a simple “simulation” for the other players that just interpolates between the last two received states, and thus shows them delayed by one network tick on the screen. Accept whatever gameplay and display glitches happen because of this, because you can do nothing better at that point.
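To make that fallback concrete, here's a rough sketch of the “interpolate between the last two received states” idea. It's not tied to any engine; `Snapshot`, `RemoteEntity`, `on_receive` and `position_at` are names I made up for illustration, and I'm assuming each update carries a sender timestamp and a 2D position.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Snapshot:
    """One received state update (hypothetical layout)."""
    time: float  # sender timestamp, in seconds
    x: float
    y: float


class RemoteEntity:
    """Keeps only the last two received states and blends between them."""

    def __init__(self) -> None:
        self.prev: Optional[Snapshot] = None
        self.last: Optional[Snapshot] = None

    def on_receive(self, snap: Snapshot) -> None:
        # Drop duplicates and out-of-order packets; never re-simulate.
        if self.last is not None and snap.time <= self.last.time:
            return
        self.prev, self.last = self.last, snap

    def position_at(self, render_time: float) -> Optional[Tuple[float, float]]:
        # render_time is assumed to be "now minus one network tick",
        # so it normally falls between prev.time and last.time.
        if self.last is None:
            return None  # nothing received yet
        if self.prev is None:
            return (self.last.x, self.last.y)
        span = self.last.time - self.prev.time  # > 0 because of the check above
        t = (render_time - self.prev.time) / span
        t = max(0.0, min(1.0, t))  # clamp: no extrapolation past known states
        return (
            self.prev.x + (self.last.x - self.prev.x) * t,
            self.prev.y + (self.last.y - self.prev.y) * t,
        )


# Usage: two updates half a second apart, rendered halfway between them.
entity = RemoteEntity()
entity.on_receive(Snapshot(time=1.0, x=0.0, y=0.0))
entity.on_receive(Snapshot(time=1.5, x=10.0, y=4.0))
print(entity.position_at(1.25))  # (5.0, 2.0)
```

The clamp means a late packet makes the remote player freeze at the last known position rather than overshoot, which is one of the display glitches you sign up for with this approach.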
“How does this square with GGPO, which says that anything can be networked?” you may ask. It turns out, when that article came out, the observation was made that it only works where you have a simple simulation engine that not only supports cheap frame updates, but also supports cheap state rewinds/snapshots. For a fighting game, that's generally not so bad; for something like an RTS or Battle Royale, that would be a very poor match. Hence, the gameplay itself is predicated on simulation being cheap to update and roll back, and the engine used matches that requirement, and thus that particular networking approach works out in that case.
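To show why snapshots and re-simulation both need to be cheap, here's a toy sketch of the rollback idea. This is not GGPO's actual API; `RollbackSim`, `advance` and `correct_remote` are hypothetical names, and I'm assuming a deterministic `step(state, inputs)` function that returns a new state without mutating the old one.

```python
import copy


class RollbackSim:
    """Toy rollback loop: predict remote input, fix it up when the truth arrives."""

    def __init__(self, initial_state, step):
        self.step = step    # deterministic: step(state, inputs) -> new state
        self.frame = 0      # next frame to simulate
        self.state = initial_state
        self.inputs = {}    # frame -> {"local": ..., "remote": ...}
        # Snapshot of the state at the START of every frame.
        # This is the cost a big world (RTS, Battle Royale) can't afford.
        self.snapshots = {0: copy.deepcopy(initial_state)}

    def advance(self, local_input, predicted_remote):
        """Simulate one frame using a guess for the remote player's input."""
        self.inputs[self.frame] = {"local": local_input, "remote": predicted_remote}
        self.state = self.step(self.state, self.inputs[self.frame])
        self.frame += 1
        self.snapshots[self.frame] = copy.deepcopy(self.state)

    def correct_remote(self, frame, actual_remote):
        """Apply the real remote input for a past frame; return frames re-simulated."""
        if self.inputs[frame]["remote"] == actual_remote:
            return 0  # prediction was right, nothing to redo
        self.inputs[frame]["remote"] = actual_remote
        state = copy.deepcopy(self.snapshots[frame])  # rewind
        for f in range(frame, self.frame):            # replay up to the present
            state = self.step(state, self.inputs[f])
            self.snapshots[f + 1] = copy.deepcopy(state)
        self.state = state
        return self.frame - frame


# Usage with a trivial "game": the state is a number, inputs get added to it.
def step(state, inputs):
    return state + inputs["local"] + inputs["remote"]


sim = RollbackSim(0, step)
for _ in range(3):
    sim.advance(local_input=1, predicted_remote=0)
print(sim.state)                 # 3
print(sim.correct_remote(1, 5))  # 2 frames re-simulated
print(sim.state)                 # 8
```

Every mispredicted input costs a state restore plus one `step` call per frame since the misprediction, all inside a single render frame. With a two-player fighter's state that's small; with thousands of units or a hundred players it generally isn't.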