An introduction to modular induction and some attempts to solve it
lesswrong.com

Published on December 23, 2025 10:35 PM GMT

The current crop of AI systems appears to have world models of varying levels of detail, but we cannot easily understand these world models, because they are mostly giant floating-point arrays. If we knew how to interpret individual parts of an AI’s world model, we could specify goals directly within that world model instead of relying on finetuning and RLHF to instill objectives. Hence, I’ve been thinking about world models.

I don’t have a crisp definition of the term “world model” yet, but my working hypothesis is that it involves an efficient representation of the world state, together with the rules and laws that govern the world’s dynamics.

In a sense, we already know how to get a p…
