Planning algorithm that balances exploration/exploitation by sampling trajectories; used in agents.
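
The entry does not name a specific procedure, so what follows is only a minimal sketch of one possible reading: flat Monte Carlo planning, with UCB1 used to choose which root action to sample next. The 1-D walk environment, the constants `ACTIONS`, `GOAL`, and `PIT`, and the functions `step`, `sample_return`, and `plan` are all hypothetical and exist purely for illustration.

```python
import math
import random

ACTIONS = (-1, +1)   # hypothetical toy action set: step left or right
GOAL, PIT = 3, -3    # hypothetical terminal states of a 1-D walk


def step(state, action):
    """Toy deterministic transition: returns (next_state, reward, done)."""
    nxt = state + action
    if nxt == GOAL:
        return nxt, 1.0, True
    if nxt == PIT:
        return nxt, 0.0, True
    return nxt, 0.0, False


def sample_return(state, first_action, horizon, rng):
    """Sample one trajectory: take first_action, then act uniformly at random."""
    total, action = 0.0, first_action
    for _ in range(horizon):
        state, reward, done = step(state, action)
        total += reward
        if done:
            break
        action = rng.choice(ACTIONS)
    return total


def plan(state, n_samples=500, horizon=10, c=math.sqrt(2), seed=0):
    """Pick a root action using UCB1 over sampled trajectory returns."""
    rng = random.Random(seed)
    counts = {a: 0 for a in ACTIONS}
    totals = {a: 0.0 for a in ACTIONS}
    for t in range(1, n_samples + 1):
        def ucb(a):
            if counts[a] == 0:
                return math.inf  # exploration: try every action at least once
            mean = totals[a] / counts[a]  # exploitation: average sampled return
            return mean + c * math.sqrt(math.log(t) / counts[a])
        a = max(ACTIONS, key=ucb)
        totals[a] += sample_return(state, a, horizon, rng)
        counts[a] += 1
    # Recommend the action with the best average sampled return.
    return max(ACTIONS, key=lambda a: totals[a] / counts[a])


best = plan(state=2)  # from state 2, stepping right reaches GOAL immediately
```

The exploration constant `c` controls the trade-off: larger values spend more samples on actions that have been tried less often, while smaller values concentrate samples on the action with the best average return so far. Tree-based planners apply the same selection idea at every visited state rather than only at the root.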