Reasoning & Thinking Updates

There's recent research I want to cover and write up on how we can build:

TD-MPC2-like models, which learn a world model and run an explicit planning & thinking loop over it at decision time. [ArXiv]()
Dreamer-like models, which learn how to reason during training by rolling out imagined trajectories in a world model, but that validator-type world model is no longer used for planning once training is finished. [ArXiv]()
Continuous Thought Machines, which have a decoupled internal time dimension ("ticks") so they can spend a variable, potentially large amount of compute per output. They can dynamically change their own attention maps from tick to tick. They are more expensive to train, but they are a close, high-level representation of how the human brain works, built around neuron-level timing and synchronization. [ArXiv]()
Latent-space planning in LLMs. [ArXiv](https://arxiv.org/pdf/2601.21598)
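To make the TD-MPC2-style "planning & thinking loop" concrete, here is a minimal sketch loosely modeled on its MPPI-style planner: sample action sequences, roll them out through the world model, score them with predicted rewards plus a terminal value, refit the sampling distribution to the best ones, and execute only the first action before replanning. The world model is a hand-written 1-D toy; `dynamics`, `reward`, and `terminal_value` are hypothetical stand-ins for TD-MPC2's learned latent networks, and the real planner also seeds samples from its learned policy, which is omitted here.

```python
import math
import random
from typing import List, Tuple

# Hypothetical toy world model: a 1-D point whose goal is position 0.
# In TD-MPC2 these would be learned networks acting on a latent state.
def dynamics(state: float, action: float) -> float:
    return state + 0.1 * action

def reward(state: float, action: float) -> float:
    return -(state ** 2) - 0.01 * action ** 2

def terminal_value(state: float) -> float:
    # Stand-in for a learned value head that scores what happens beyond the horizon.
    return -10.0 * state ** 2

def plan(state: float, horizon: int = 5, samples: int = 256, elites: int = 32,
         iterations: int = 6, temperature: float = 0.5, seed: int = 0) -> float:
    """Iteratively refine a Gaussian over action sequences; return the first action."""
    rng = random.Random(seed)
    mean = [0.0] * horizon
    std = [2.0] * horizon
    for _ in range(iterations):
        scored: List[Tuple[float, List[float]]] = []
        for _ in range(samples):
            # Sample an action sequence, clipped to the valid range [-1, 1].
            actions = [max(-1.0, min(1.0, rng.gauss(mu, sigma)))
                       for mu, sigma in zip(mean, std)]
            # "Imagine" the rollout inside the world model and score it.
            s, ret = state, 0.0
            for a in actions:
                ret += reward(s, a)
                s = dynamics(s, a)
            scored.append((ret + terminal_value(s), actions))
        scored.sort(key=lambda c: c[0], reverse=True)
        elite = scored[:elites]
        best = elite[0][0]
        # Exponentially weight elites by how close their return is to the best one.
        weights = [math.exp(temperature * (r - best)) for r, _ in elite]
        total = sum(weights)
        for t in range(horizon):
            mean[t] = sum(w * acts[t] for w, (_, acts) in zip(weights, elite)) / total
            var = sum(w * (acts[t] - mean[t]) ** 2
                      for w, (_, acts) in zip(weights, elite)) / total
            std[t] = max(math.sqrt(var), 0.05)  # keep a little exploration
    return mean[0]

# Receding horizon: plan, execute only the first action, then replan from the new state.
state = 2.0
for step in range(30):
    state = dynamics(state, plan(state, seed=step))
print(f"final position: {state:.3f}")
```

The point of the design is that "thinking" happens at decision time: every step spends `iterations * samples` imagined rollouts before acting, and that budget can be raised or lowered without retraining.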
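The "decoupled internal time dimension" of Continuous Thought Machines can be illustrated with a heavily simplified sketch: a single input is processed for a variable number of internal ticks, and the loop stops once the output's certainty (1 minus normalized entropy, similar to the certainty measure the CTM paper uses) crosses a threshold. The tick update, `think`, `rate`, and `threshold` are hypothetical. A real CTM uses neuron-level models over each neuron's pre-activation history and neural synchronization as its representation, none of which is modeled here, and threshold-based stopping is just one simple way to turn ticks into adaptive compute.

```python
import math
from typing import List, Tuple

def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def certainty(probs: List[float]) -> float:
    # 1 - normalized entropy: 0 for a uniform distribution, 1 for a one-hot one.
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(len(probs))

def think(evidence: List[float], max_ticks: int = 50, threshold: float = 0.9,
          rate: float = 0.2) -> Tuple[int, int, float]:
    """Run internal ticks on one input until confident or out of budget.

    Returns (predicted class, ticks used, final certainty).
    """
    state = [0.0] * len(evidence)
    for tick in range(1, max_ticks + 1):
        # Hypothetical tick update: the internal state integrates the same input again.
        state = [s + rate * e for s, e in zip(state, evidence)]
        probs = softmax(state)
        c = certainty(probs)
        if c >= threshold:
            break
    return probs.index(max(probs)), tick, c

easy = think([3.0, 0.0, 0.0])  # one class clearly dominates
hard = think([0.5, 0.4, 0.0])  # two classes are close
print("easy:", easy)
print("hard:", hard)
```

The easy input stops after a handful of ticks while the ambiguous one uses the whole budget, which is the behavior the decoupled time dimension is meant to allow: compute per output is set by the input, not by the sequence length.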

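Mixture-of-Depths-style gating, where a learned router picks which tokens each block processes and the rest skip it through the residual connection, so more compute is spent on some tokens than others. [ArXiv](https://arxiv.org/abs/2404.02258)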
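A minimal sketch of Mixture-of-Depths-style routing for a single block: a router scores every token, only the top-k (the block's capacity) are processed, and the rest pass through unchanged via the residual path. As in the MoD paper, the block output is scaled by the router score so the router sits on the gradient path. `router_w` is a hand-set vector standing in for a learned linear projection, `toy_block` is a hypothetical stand-in for the attention + MLP computed over the routed tokens, and the 12.5% default capacity is an assumption. Note that top-k over the whole sequence is non-causal, which the paper handles separately for autoregressive sampling.

```python
from typing import Callable, List, Tuple

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def mod_block(tokens: List[Vector], router_w: Vector,
              block_fn: Callable[[List[Vector]], List[Vector]],
              capacity: float = 0.125) -> Tuple[List[Vector], List[int]]:
    """Route only the top-k tokens (by router score) through block_fn."""
    k = max(1, int(len(tokens) * capacity))
    scores = [dot(router_w, t) for t in tokens]  # one scalar router weight per token
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:k])  # keep routed tokens in sequence order
    deltas = block_fn([tokens[i] for i in chosen])
    out = [list(t) for t in tokens]  # unrouted tokens skip the block (residual only)
    for i, d in zip(chosen, deltas):
        # Scale the block output by the router score, then add the residual.
        out[i] = [x + scores[i] * dx for x, dx in zip(tokens[i], d)]
    return out, chosen

def toy_block(xs: List[Vector]) -> List[Vector]:
    # Hypothetical stand-in for attention + MLP over the routed tokens only:
    # every routed token receives the mean of the routed tokens.
    mean = [sum(col) / len(xs) for col in zip(*xs)]
    return [list(mean) for _ in xs]

tokens = [[0.1, 1.0], [2.0, 0.0], [0.5, 0.5], [3.0, 1.0],
          [0.0, 0.2], [1.0, 1.0], [0.2, 0.3], [0.4, 0.9]]
out, chosen = mod_block(tokens, router_w=[1.0, 0.0], block_fn=toy_block, capacity=0.25)
print(chosen)          # → [1, 3]
print(out[1], out[3])  # → [7.0, 1.0] [10.5, 2.5]
```

With a capacity of 0.25, only 2 of the 8 tokens pay for the block's computation; the other 6 are copied through untouched, which is where the compute savings come from.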