Tencent HunyuanWorld-Voyager: Generating 3D-consistent video from a single photo
Ctrl+Alt+Future · 7 Sep 2025

Tencent has unveiled HunyuanWorld-Voyager, an AI model that turns a single image into a camera-controllable, 3D-consistent video, giving the feel of exploration without any actual 3D modeling. It's a clever solution: by generating RGB and depth data together, it keeps objects in consistent positions across viewing angles, creating the illusion of spatial consistency.


The model aims to create 3D-consistent point cloud sequences from a single image, following a user-defined camera path, for world exploration. The framework also includes a data acquisition mechanism that automates camera pose estimation and metric depth prediction for videos, which makes it possible to build large amounts of annotated training data. In Tencent's reported results, Voyager outperforms previous methods in geometric coherence and visual quality, both for scene video generation and for 3D world reconstruction.


The results aren't true 3D models, but they achieve a similar effect: the tool generates 2D video frames that stay spatially consistent, as if the camera were moving through a real 3D space. Objects remain in the same relative position as the camera moves around them, and the perspective changes as it would in a real 3D environment. Each generation yields just 49 frames, roughly two seconds of video, although Tencent says multiple clips can be strung together to create "multiple-minute" sequences. The output is video with depth maps rather than true 3D models, but this information can be converted into 3D point clouds for reconstruction purposes.


The system accepts a single input image and a user-defined camera trajectory. Users specify camera movements, such as forward, backward, left, right, or pan, via the provided interface. Voyager then combines image and depth data with a memory-efficient "world cache" to produce video sequences that follow those movements.
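
To make the depth-to-point-cloud step concrete, here is a minimal sketch of back-projecting a metric depth map into 3D points with a standard pinhole camera model. This is not Voyager's code: the function name `depth_to_points`, the intrinsics (`fx`, `fy`, `cx`, `cy`), and the tiny 2x2 depth map are illustrative assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def depth_to_points(depth: List[List[float]],
                    fx: float, fy: float,
                    cx: float, cy: float) -> List[Point]:
    """Back-project a depth map into camera-space 3D points (pinhole model).

    depth[v][u] is the metric depth of pixel (u, v); values <= 0 are
    treated as missing and skipped. All names here are hypothetical.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # no valid depth for this pixel
            x = (u - cx) * z / fx  # horizontal offset from the optical axis
            y = (v - cy) * z / fy  # vertical offset from the optical axis
            points.append((x, y, z))
    return points


# Toy 2x2 depth map: three pixels at 2 m, one pixel with no depth.
depth = [[2.0, 2.0],
         [2.0, 0.0]]
cloud = depth_to_points(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(cloud)
```

Each valid pixel becomes one point, so a full video with per-frame depth yields a sequence of point clouds, one per frame, that can be merged once the camera pose of each frame is known.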


Voyager is trained to recognize and reproduce patterns of spatial consistency, but with an added geometric feedback loop. As it generates each frame, it converts the output into 3D points, then projects those points back into 2D to serve as a reference for subsequent frames.
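
The reprojection half of that loop can be sketched as follows. This is a simplified illustration, not Tencent's implementation: it assumes a camera that only translates (no rotation) and looks along +Z, and the name `project_cache` and all numeric values are hypothetical. Cached world points are projected into the new view, and a z-buffer keeps only the nearest point per pixel.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Pixel = Tuple[int, int]


def project_cache(points: List[Point], cam_pos: Point,
                  fx: float, fy: float, cx: float, cy: float,
                  width: int, height: int) -> Dict[Pixel, float]:
    """Project cached world points into a sparse depth image for a new camera.

    Simplifying assumption: the camera is only translated to cam_pos and
    looks down +Z. Returns {(u, v): depth}, nearest point wins per pixel.
    """
    zbuf: Dict[Pixel, float] = {}
    for px, py, pz in points:
        # World -> camera coordinates (translation only).
        x, y, z = px - cam_pos[0], py - cam_pos[1], pz - cam_pos[2]
        if z <= 0:
            continue  # behind the camera
        u = round(fx * x / z + cx)
        v = round(fy * y / z + cy)
        if 0 <= u < width and 0 <= v < height:
            if (u, v) not in zbuf or z < zbuf[(u, v)]:
                zbuf[(u, v)] = z  # keep the closest surface
    return zbuf


# Hypothetical cache: two points on the optical axis, one off to the side.
cache = [(0.0, 0.0, 4.0), (0.0, 0.0, 6.0), (1.0, 0.0, 4.0)]
intrinsics = dict(fx=4.0, fy=4.0, cx=2.0, cy=2.0, width=5, height=5)

view0 = project_cache(cache, (0.0, 0.0, 0.0), **intrinsics)
view1 = project_cache(cache, (0.0, 0.0, 2.0), **intrinsics)  # camera moved forward
print(view0)
print(view1)
```

In both views the farther on-axis point is hidden behind the nearer one, and the off-axis point shifts outward as the camera moves forward. A sparse projection like this is the kind of geometric hint that can guide the next generated frame toward staying consistent with earlier ones.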


The model comes with significant licensing restrictions. Like Tencent's other Hunyuan models, the license prohibits use in the European Union, the United Kingdom, and South Korea. In addition, commercial deployments exceeding 100 million monthly active users require separate licensing from Tencent.


Links

HunyuanWorld-Voyager: https://3d-models.hunyuan.tencent.com/world/
Research paper: https://3d-models.hunyuan.tencent.com/voyager/voyager_en/assets/HYWorld_Voyager.pdf
Hugging Face: https://huggingface.co/tencent/HunyuanWorld-Voyager
GitHub: https://github.com/Tencent-Hunyuan/HunyuanWorld-Voyager
RunPod: https://runpod.io?ref=2pdhmpu1
RunPod demo: https://www.youtube.com/watch?v=WudXnf8Gogc

Episodes (15)

Qwen3-Next: Free large language model from Alibaba that could revolutionize training costs?

Qwen3-Next is a new large language model (LLM) from Alibaba that has 80 billion parameters but activates only 3 billion during inference through a hybrid attention mechanism and sparse Mixture-of-...

15 Sep 2025 · 46 min

HunyuanImage 2.1 is an open source model that can generate high resolution (2K) images

HunyuanImage 2.1 is an open source text-to-image diffusion model capable of generating ultra-high resolution (2K) images. It stands out with its dual text encoder, two-stage architecture including a r...

12 Sep 2025 · 33 min

Google Stitch: user interface (UI) design using artificial intelligence

Google Stitch is an AI-powered tool designed for app developers to generate user interfaces (UI) for mobile and web applications. It can turn ideas into UIs. By default, it uses Google DeepMind’s late...

12 Sep 2025 · 33 min

Kimi K2 0905 is the latest update to Moonshot AI's large-scale Mixture-of-Experts language model

Kimi K2 0905 is the latest update to Moonshot AI’s large-scale Mixture-of-Experts (MoE) language model, which is well-suited for complex agent-like tasks. With its advanced coding and reasoning capabi...

7 Sep 2025 · 29 min

GLM-4.5: The Next Generation of Artificial Intelligence That Thinks and Acts

Z.ai introduces its latest flagship models, the GLM-4.5 and GLM-4.5-Air, which take the capabilities of intelligent assistants to a new level. These models uniquely combine deep analytics, master-leve...

7 Sep 2025 · 35 min

Gemini 2.5 Flash Image: Advanced AI Generation and Editing

Gemini 2.5 Flash Image, also known as Nano Banana, is an advanced, multimodal image creation and editing model that can interpret both text and image commands, allowing users to create, edit, and iter...

4 Sep 2025 · 49 min

Qwen-Image image generation model: complex text display and precise image editing

Qwen-Image is a foundation model for image generation developed by Alibaba's Qwen team. It has two outstanding capabilities: complex text rendering and precise image editing. Qwen-Image can render text, even lo...

3 Sep 2025 · 39 min
