NVIDIA Unveils Cosmos 3: Cutting-Edge Physical AI Foundation Model

NVIDIA has launched Cosmos 3, an open foundation model for physical AI that combines reasoning and simulation. The model uses a mixture-of-transformers architecture, and NVIDIA describes it as the first fully open omnimodel with native vision reasoning and multimodal generation. It can generate synthetic data across text, images, video, ambient sound, and physical actions.
NVIDIA’s New Initiative
NVIDIA unveiled the model at its GTC event in Taipei. Alongside it, the company introduced the Cosmos Coalition, an initiative to foster collaboration among AI labs and robotics developers. Key partners in the coalition include Agile Robots, Black Forest Labs, Generalist, LTX, Runway, and Skild AI.
Features of Cosmos 3
- World’s first open omnimodel for physical AI.
- Integration of vision reasoning, world simulation, and action generation.
- Reduction of physical AI training cycles from months to days.
According to NVIDIA CEO Jensen Huang, the breakthroughs in multimodal reasoning mark a significant advance in physical AI, enabling robots and autonomous vehicles to perceive and act effectively in real-world environments.
Architecture Innovations
Cosmos 3 targets a core challenge in physical AI: generalizing well from limited training data. Its architecture pairs reasoning and generation transformers, which lets the model interpret complex physical interactions and generate accurate motion predictions.
The model is trained on one of the largest multimodal datasets available, comprising billions of samples. Starting from this pretrained base, developers can build effective physical AI systems while reducing their own data requirements and costs.
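The pairing of reasoning and generation transformers described above can be illustrated with a toy sketch. This is not NVIDIA's implementation: the `Token` type, the modality labels, the routing table, and the scalar "tower" gains are all hypothetical, and a simple sequence average stands in for shared attention. It only shows the general idea of tokens sharing context and then passing through separate, role-specific parameters.

```python
from dataclasses import dataclass


@dataclass
class Token:
    modality: str        # hypothetical labels: "text", "video", "action"
    values: list[float]  # toy embedding


# Hypothetical routing of modalities to towers (an assumption, not from NVIDIA).
ROUTE = {"text": "reasoning", "video": "generation", "action": "generation"}
# Hypothetical per-tower parameters: a single scalar gain stands in for a full transformer.
TOWER_GAIN = {"reasoning": 2.0, "generation": -1.0}


def shared_mix(tokens: list[Token]) -> list[Token]:
    """Stand-in for shared attention: blend each token with the sequence mean."""
    dim = len(tokens[0].values)
    mean = [sum(t.values[i] for t in tokens) / len(tokens) for i in range(dim)]
    return [
        Token(t.modality, [0.5 * v + 0.5 * m for v, m in zip(t.values, mean)])
        for t in tokens
    ]


def tower_forward(token: Token) -> Token:
    """Apply the parameters of whichever tower this token's modality routes to."""
    gain = TOWER_GAIN[ROUTE[token.modality]]
    return Token(token.modality, [gain * v for v in token.values])


def forward(tokens: list[Token]) -> list[Token]:
    """Shared context first, then tower-specific processing."""
    return [tower_forward(t) for t in shared_mix(tokens)]


tokens = [Token("text", [1.0, 0.0]), Token("video", [0.0, 1.0])]
out = forward(tokens)
for t in out:
    print(t.modality, t.values)
```

In this sketch every token sees the whole sequence during the shared step, while the weights applied afterward depend on which tower the token is routed to. A real model would replace both stand-ins with learned attention and feed-forward layers.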
Performance Benchmarks
Cosmos 3 ranks at or near the top of several physical AI benchmarks:
- First place in Artificial Analysis and Physics-IQ for world generation accuracy.
- Leading scores in RoboLab and RoboArena for action policy.
- Top rankings in VANTAGE-Bench and TAR for vision understanding.
Variants of Cosmos 3
NVIDIA offers several versions of Cosmos 3 for different physical AI development needs:
- Cosmos 3 Super: Targeted at post-training for robotics and vehicles that require high precision and quality.
- Cosmos 3 Nano: Designed for rapid video and action reasoning.
- Cosmos 3 Edge: Upcoming model for real-time processing at the edge.
Community Collaboration through the Cosmos Coalition
The Cosmos Coalition invites AI developers and model builders to collaborate on advancing world models, with the goal of improving innovation and interoperability across industries. Member organizations contribute resources and expertise to help apply Cosmos 3 effectively.
Real-World Applications
Cosmos 3 targets robotics, autonomous vehicles, and vision AI applications across industries. Companies including LG Electronics and Samsung are already developing solutions on the platform.
Availability and Resources
Cosmos 3 Super and Nano are available now, and Cosmos 3 Edge is expected to follow soon. Developers can explore the models at build.nvidia.com and access open resources via Hugging Face and GitHub.
The launch is a significant step toward accelerating the development of physical AI applications across multiple sectors.




