Why Genie 3 Matters & Intent is moving | Intent, 0031
World models might be the most important thing.
As a reminder, Intent is all about helping talent in tech become more intentional with their career by staying informed, fluent, and aware of what’s going on in and around the industry. Thanks for sticking with us!
Today’s agenda is shorter than usual:
Google’s Genie 3 and Why World Models Matter
Before we get to the main course: we’re moving email providers from Beehiiv to Substack this week. You don’t need to do anything to keep receiving Intent, but keep an eye out for a transition email tomorrow — if it lands in Promotions/Spam, move it back to your Inbox to stay in tune!
A few days ago, we mentioned that Google DeepMind had released Project Genie (fueled by Genie 3) to Google AI Ultra subscribers. Now, we want to talk a little bit more about why world models like Genie 3 are so important.
How it works: users can prompt with either text or images, generating an interactive, 3D world with basic controls to move around in the environment. You can say something like “robot chicken in a digital simulation of New York,” and you’ll have a generated world that you can navigate for 60 seconds before it expires.

How it’s made: Genie 3 is a world model. Although the full architecture hasn’t been shared, the key element is learning from frames of video paired with an action representation. In other words, when Genie trains on a chunk of video, it also infers what actions would have needed to happen in the scene to explain why things changed the way they did.
This is why Genie 3 is able to simulate worlds and give users control. It learns an almost causal relationship from its training data, which means that it can simulate physics and interactions while maintaining consistency.
If it sees many videos in which a marble starts on one side of a flat table and winds up on the other side, with a hand touching it in between, it can learn the action-driven dynamics (including what shouldn’t change about a scene) and predict them in the future.
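To make the marble idea concrete, here is a deliberately tiny sketch in Python. It is not Genie 3's architecture (which hasn't been fully shared); it only illustrates the general idea of inferring an "action" from how a scene changes between frames, then reusing those actions to roll a world forward. The one-number-per-frame "videos" and every function name here are made up for illustration.

```python
# Toy illustration only: NOT how Genie 3 is actually built.
from collections import Counter


def infer_latent_actions(frames):
    """Label each step between consecutive frames with the change that explains it."""
    return [after - before for before, after in zip(frames, frames[1:])]


def build_action_vocabulary(videos):
    """Collect the distinct changes seen across all videos into a small 'action' set."""
    counts = Counter()
    for frames in videos:
        counts.update(infer_latent_actions(frames))
    return sorted(counts)


def simulate(start, actions):
    """Roll the world forward by applying a sequence of actions to a start state."""
    frames = [start]
    for action in actions:
        frames.append(frames[-1] + action)
    return frames


# Hypothetical data: each "video" is a marble's position on a table, one number per frame.
videos = [
    [0, 0, 1, 2, 2],  # marble rests, is pushed right twice, rests
    [5, 4, 3, 3],     # marble is pushed left twice, rests
]

vocab = build_action_vocabulary(videos)
print(vocab)                        # [-1, 0, 1]
print(simulate(3, [1, 1, 0, -1]))   # [3, 4, 5, 5, 4]
```

Nobody told the code what "push left" or "push right" means. It recovered three actions just by looking at what changed between frames, which is the intuition behind learning controls from unlabeled video.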
You can imagine these simulated experiences getting better and better over time, eventually giving us something adjacent or perhaps even completely like custom video games.
But more importantly, these virtual environments can host embodied AI training in a way we simply haven’t seen before. Genie 3 gives us the Matrix, but a Matrix is useless without an occupant.
Enter SIMA 2 — Google’s embodied agent that can navigate, follow instructions, and interact in virtual worlds (video games or 3D environments, for example).

By running simulations inside these virtual worlds, SIMA can drive reinforcement learning and self-supervised training — it can be repeatedly asked to navigate a virtual factory, powered by accurate physics from Genie 3 and other world models, developing synthetic training data that generalizes across environments.
With this model-based reinforcement learning (MBRL), SIMA can run simulations in Genie 3 over and over again in a loop to learn how to operate in a factory, developing a strong ‘theory’ about how the world works before being deployed in a robot.
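For the curious, the simulation loop can be sketched in a few lines of Python. This is a toy under loud assumptions: the "world model" below is a hand-written stand-in for a learned simulator like Genie 3, the "factory" is a six-cell corridor, and the agent is a simple Q-learning table rather than anything resembling SIMA. All names are hypothetical.

```python
# Toy sketch: an agent practices entirely inside a simulated world before "deployment".
import random

GOAL = 5            # the last cell of a six-cell corridor
ACTIONS = (-1, +1)  # step left or step right


def world_model(state, action):
    """Stand-in for a learned simulator: returns (next_state, reward)."""
    nxt = min(max(state + action, 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0)


def train_in_simulation(episodes=300, max_steps=50, gamma=0.9, seed=0):
    """Run many simulated episodes and learn how valuable each action is in each cell."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.choice(ACTIONS)  # explore at random
            nxt, reward = world_model(state, action)
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            # Learning rate of 1 is fine here because this toy world is deterministic.
            q[(state, action)] = reward + gamma * best_next
            state = nxt
            if state == GOAL:
                break
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every non-goal cell."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}


q = train_in_simulation()
policy = greedy_policy(q)
print(policy)
```

Every bit of experience here comes from calling `world_model`, never from a real robot. Swap the corridor for a rich, physically consistent generated world and you have the basic shape of the pitch.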
Given that most AI robotics efforts are desperately thirsty for real-world training data that we don’t have (some companies are even paying people to wear cameras on their faces while they clean their homes or work in a warehouse), this sort of simulated training can provide a cheaper and faster alternative to the trillion-dollar bottleneck of physical data collection.
So, yes, Genie 3 is cool because it can simulate worlds and maybe even give us rich media experiences in the near future. But it’s even cooler for its potential to provide the industry with dynamic training environments that could rapidly speed up our path to generalized robotics (or even digitally simulated physics, chemistry, and biology).
As Google puts it, “Genie 3 makes it possible to explore an unlimited range of realistic environments. This is a key stepping stone on the path to AGI – enabling AI agents capable of reasoning, problem solving, and real-world actions.”
Pretty rad.
Upcoming workshops:
Think a friend could use a dose of Intent? Forward this along – inbox envy is real.
Sent with Intent,
By Free Agency
