I highly recommend Professor Tim Roughgarden's YouTube series on PoW mining. He lays things out simply, though from a very high-level perspective:
Lecture 10.2. Maximizing Block Rewards
Lecture 10.3. 51% attack
Lecture 10.4. Selfish mining pt1
Lecture 10.5. Selfish mining pt2
but since the cheating pool is already ahead with two blocks
It's not so straightforward. The selfish mining pool operating in secret takes a big risk: it spends all the electricity but keeps the winning block hidden. The thing is, a block can be hit upon ahead of schedule, because mining is still random luck. If Monero takes 2 minutes on average to mine a block, someone can potentially mine one in 20 seconds. Imagine The A Team (the goodies) has published A1 and is now looking for A2. The B Team (the baddies) has secretly found B1, has been searching for B2, and has just found it. Again, The B Team refuses to publish B2 and instead looks for B3. Suddenly, The A Team publishes A2, and then by great fortune hits upon A3. The B Team has just lost a lot of money in this situation. Another possibility is that The A Team cannot find A3 in good time, but neither can The B Team find B3. A C Team might enter the game and find A3.
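To put a rough number on the "20 seconds" remark, here is a minimal sketch. It assumes block discovery behaves like a Poisson process, so the wait for the next block is exponentially distributed around the 2-minute target. The function name and the 120-second default are my own choices for illustration, not anything taken from Monero's code.

```python
import math

def prob_block_within(seconds, mean_block_time=120.0):
    """Chance the next block arrives within `seconds`.

    Assumption: waiting times are exponential with the given mean,
    which is the usual idealised model of PoW mining.
    """
    return 1.0 - math.exp(-seconds / mean_block_time)

p_fast = prob_block_within(20)
print(f"P(block within 20 s) = {p_fast:.3f}")
```

Under that assumption, roughly 15% of blocks would turn up within 20 seconds, so an early block is nothing unusual.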
Antpool controls about 20% of the hashrate. You might think it should win 1 block in 5. Recently, it went 20 blocks without a win. One time, however, it won 7 blocks in a row.
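Streaks like these can be sanity-checked with a little arithmetic. The sketch below assumes every block is an independent draw that the pool wins with probability equal to its hashrate share. The 20% figure is the approximate one quoted above, and the helper names are hypothetical.

```python
def prob_no_win_streak(share, blocks):
    """Chance a pool with hashrate `share` wins none of `blocks` given blocks."""
    return (1.0 - share) ** blocks

def prob_win_streak(share, blocks):
    """Chance the same pool wins all of `blocks` given blocks."""
    return share ** blocks

share = 0.20  # assumed share of total hashrate
print(f"P(0 wins in 20 blocks) = {prob_no_win_streak(share, 20):.4f}")
print(f"P(7 wins in a row)     = {prob_win_streak(share, 7):.2e}")
```

For any one particular window, that works out to about 1.2% for the 20-block drought and about 1 in 78,000 for the 7-block run. Both are rare for a given window, but with hundreds of blocks mined every day, runs like this will show up somewhere sooner or later.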
The upshot of all this, I believe, is that Monero blocks cost too little to produce, both in fees and hardware. This made it worth the risk to forgo block rewards. Satoshi's method depends on there being a real, dynamic cost to publishing blocks.