MarioVGG is a new AI model that can generate plausible video of Super Mario Bros. from user inputs.
The model was trained on over 737,000 frames of Mario gameplay.
Despite limitations, MarioVGG shows potential for AI to replace game engines in the future.
The Future of Gaming: AI-Generated Video
Imagine playing your favourite video game without a traditional game engine. Instead, an AI model generates the gameplay based on video footage. This is the fascinating concept behind MarioVGG, a new AI model that simulates Super Mario Bros. from video data. Developed by researchers from Virtuals Protocol, MarioVGG represents a significant step towards AI-generated video games.
Training MarioVGG: A Massive Undertaking
To train MarioVGG, the researchers used a public dataset containing 280 levels of Super Mario Bros. gameplay. This dataset included over 737,000 individual frames, which were preprocessed into 35-frame chunks. The model focused on two inputs: "run right" and "run right and jump." Even with these limitations, training the model took about 48 hours on a single RTX 4090 graphics card.
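The preprocessing step above can be sketched in a few lines. This is an illustrative example of splitting a long frame sequence into the 35-frame chunks the article describes; the function name and the way partial chunks are handled are assumptions, not details from the MarioVGG codebase.

```python
def chunk_frames(frames, chunk_size=35):
    """Split a sequence of frames into non-overlapping fixed-size chunks,
    dropping any trailing partial chunk."""
    return [
        frames[i:i + chunk_size]
        for i in range(0, len(frames) - chunk_size + 1, chunk_size)
    ]

# Stand-in for ~737,000 decoded gameplay frames.
frames = list(range(737_000))
chunks = chunk_frames(frames)
print(len(chunks))  # 21,057 full 35-frame chunks
```

Even at this scale, the chunking itself is trivial; the 48 hours of training time comes from the diffusion model, not the data preparation.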
How MarioVGG Works
MarioVGG uses a standard convolution and denoising process to generate new frames of video from a static starting game image and a text input. The model can create gameplay videos of arbitrary length by using the last frame of one sequence as the first frame of the next. This results in "coherent and consistent gameplay," according to the researchers. For a deeper dive into video generation, you might explore our Beginner's Guide to Using Sora AI Video.
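The chaining idea described above can be expressed as a simple loop: generate a short clip, then seed the next clip with the clip's final frame. In this sketch, `generate_clip` is a hypothetical placeholder for the diffusion model, since the real model's interface is not described in the article.

```python
def generate_clip(start_frame, action, num_frames=7):
    """Placeholder for the video model: returns `num_frames` frames
    conditioned on a starting image and a text action."""
    return [f"{start_frame}|{action}#{i}" for i in range(num_frames)]

def generate_gameplay(start_frame, actions):
    """Chain clips into one long video by seeding each clip with the
    last frame of the previous one."""
    video = [start_frame]
    frame = start_frame
    for action in actions:
        clip = generate_clip(frame, action)
        video.extend(clip)
        frame = clip[-1]  # last frame becomes the next starting image
    return video

video = generate_gameplay("frame0", ["run right", "run right and jump"])
print(len(video))  # 1 starting frame + 2 clips of 7 frames = 15
```

The appeal of this design is that clip length no longer limits video length; its risk is that any error in a generated frame is fed back in as the next starting image and can compound over time.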
Challenges and Limitations
Despite its impressive capabilities, MarioVGG has several limitations. The model downscales the output frames to a resolution of 64×48, much lower than the NES's 256×240 resolution. It also condenses 35 frames of video into just seven generated frames, resulting in rougher-looking gameplay. Additionally, MarioVGG falls far short of real-time generation, taking six seconds to produce a six-frame video sequence. This echoes some of the challenges discussed in Running Out of Data: The Strange Problem Behind AI's Next Bottleneck.
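To put those figures in perspective, here is some back-of-the-envelope arithmetic using the numbers quoted above. These are illustrative calculations, not measurements from the paper.

```python
# Spatial gap: native NES output vs MarioVGG's downscaled frames.
nes_pixels = 256 * 240      # 61,440 pixels per frame
model_pixels = 64 * 48      # 3,072 pixels per frame
print(nes_pixels / model_pixels)  # 20.0x fewer pixels per frame

# Temporal gap: six seconds for a six-frame sequence works out to
# roughly one frame per second, versus the NES's 60 fps.
frames_generated = 6
seconds_taken = 6
print(frames_generated / seconds_taken)  # 1.0 frame per second
```

Closing a roughly 20x gap in spatial detail and a roughly 60x gap in throughput simultaneously is what makes "real-time, full-fidelity" AI gameplay such a distant target.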
Impressive Results Despite Limitations
Even with these limitations, MarioVGG can create passably believable video of Mario running and jumping. The model can infer game physics, such as Mario falling when he runs off a cliff and halting his forward motion when adjacent to an obstacle. MarioVGG can also hallucinate new obstacles for Mario, although these can't be influenced by user prompts. The ability of AI to generate creative content continues to evolve, as seen in OpenAI Adds Reusable 'Characters' and Video Stitching to Sora.
The Future of AI in Gaming
The researchers hope that MarioVGG represents a first step towards "producing and demonstrating a reliable and controllable video game generator." They even suggest that AI models like MarioVGG could one day replace game development and game engines completely. This echoes broader discussions about AI's impact on various industries, including our piece AI & Call Centres: Is The End Nigh?. The potential for AI to transform creative fields is immense, as detailed in this research paper on AI in game design.
Comment and Share:
What do you think about the future of AI in gaming? Could AI models like MarioVGG really replace traditional game engines? Share your thoughts and experiences in the comments below. Don't forget to Subscribe to our newsletter for updates on AI and AGI developments.
Latest Comments (4)
wow 737,000 frames is a lot for just two inputs, "run right" and "run right and jump." makes me think about how much data would be needed to simulate something more complex, like a k-drama scene with different character interactions and dialogue. how much bigger would the dataset need to be for a more nuanced world?
I'm thinking about the implications for game preservation, especially with so many older titles in Asia that might not have easily accessible source code, if all you need is footage.
@pierred It's interesting to see MarioVGG and the whole idea of replacing game engines with AI-generated video. From a research perspective, while the 737,000 frames is a substantial dataset for a single game, the downscaled 64x48 resolution and the 35 frames compressed to seven generated frames highlight a persistent challenge in generative models for video. This reminds me of some of the work we see coming out of institutions like EPFL, focusing on ultra-low latency inference for high-fidelity video generation. The "coherent and consistent gameplay" is a good step, but the path to truly actionable, high-resolution, real-time interactive environments via pure generation, sans engine, remains a significant hurdle. C'est la vie, these models are still in their infancy.
The mention of using 737,000 frames for training, even for a classic like Mario, highlights the significant data requirements. This is something we are actively considering in our discussions around building shared digital infrastructure within the ASEAN digital economy framework. I'm keen to see if this kind of model could be adapted for public service simulations.
Leave a Comment