There's a body of research I want to cover and write up on how we can build:
- TD-MPC 2-like models, where there's a world model and an explicit planning & thinking loop. [ArXiv]()
- Dreamer-like models, which learn how to reason during training, but the validator-type world model gets thrown away after training is finished. [ArXiv]()
- Continuous Thought Machines, which have a decoupled internal time dimension so they can spend arbitrarily long compute per output token. They can dynamically change their own attention maps. They're more expensive to train, but they are a close high-level representation of how the human brain works. [ArXiv]()
- Latent space planning in LLMs. [ArXiv](https://arxiv.org/pdf/2601.21598)
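
To make the "world model plus explicit planning loop" idea from the first bullet concrete, here's a minimal sketch. It is not TD-MPC 2's actual planner: I'm swapping in exhaustive search over a tiny discrete action set, and `world_model`, `reward`, `ACTIONS` and `GOAL` are hypothetical toy stand-ins for learned components. The part that carries over is the loop: imagine rollouts inside the model, score them, execute only the first action, then replan.

```python
import itertools

ACTIONS = (-0.5, 0.0, 0.5)  # hypothetical discrete action set
GOAL = 1.0                  # hypothetical target state


def world_model(state, action):
    """Toy stand-in for a learned dynamics model: the state shifts by the action."""
    return state + action


def reward(state):
    """Toy reward: negative distance to the goal."""
    return -abs(state - GOAL)


def plan(state, horizon=3):
    """Roll out every action sequence inside the model; return the best first action."""
    best_return, best_first = float("-inf"), 0.0
    for seq in itertools.product(ACTIONS, repeat=horizon):
        s, total = state, 0.0
        for a in seq:
            s = world_model(s, a)   # imagined step, no real environment involved
            total += reward(s)
        if total > best_return:
            best_return, best_first = total, seq[0]
    return best_first


def run_episode(start=0.0, steps=5):
    """Closed loop: plan, execute one action, replan from the new state."""
    state = start
    for _ in range(steps):
        state = world_model(state, plan(state))
    return state
```

The "thinking" budget here is the planning horizon and the number of candidate sequences: more of either means more compute spent per action actually taken.
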
Then there are some even more implicit/passive strategies like:
- Mixture-of-Depths-like gating to spend more compute on certain tokens than others. [ArXiv](https://arxiv.org/abs/2404.02258)
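
As a rough picture of what that gating looks like, here's a minimal NumPy sketch of top-k token routing around a single block. Everything in it is an assumption for illustration: `block` is a toy stand-in for an expensive transformer block, the weights are random, and `capacity` is the hypothetical number of tokens allowed through. Tokens that aren't selected just ride the residual path unchanged.

```python
import numpy as np


def block(x, w):
    """Toy stand-in for an expensive transformer block (hypothetical)."""
    return np.tanh(x @ w)


def mod_layer(x, router_w, block_w, capacity):
    """x has shape (seq_len, d_model); only `capacity` tokens pass through the block."""
    scores = x @ router_w                              # one scalar router score per token
    chosen = np.sort(np.argsort(scores)[-capacity:])   # top-k tokens, kept in sequence order
    out = x.copy()                                     # skipped tokens pass through unchanged
    # Scaling the block output by the router score is what would let the
    # router receive gradients in a trained model.
    out[chosen] = x[chosen] + scores[chosen, None] * block(x[chosen], block_w)
    return out, chosen


rng = np.random.default_rng(0)
seq_len, d_model, capacity = 8, 4, 2
x = rng.normal(size=(seq_len, d_model))
router_w = rng.normal(size=d_model)
block_w = rng.normal(size=(d_model, d_model))
out, chosen = mod_layer(x, router_w, block_w, capacity)
```

With `capacity=2` out of 8 tokens, the block only runs on a quarter of the sequence, which is where the compute saving comes from.
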
