Tencent HunyuanWorld-Voyager: Generating 3D-consistent video from a single photo
Ctrl+Alt+Future · 7 September 2025

Tencent has unveiled HunyuanWorld-Voyager, an AI tool that can turn a single image into a camera-controllable, 3D-consistent video, offering the feel of exploring a scene without any actual 3D modeling. It's a clever solution: by generating RGB and depth data together, the model keeps objects in consistent positions as the viewpoint changes, creating the illusion of spatial consistency.


The model generates 3D-consistent point cloud sequences from a single image, following a user-defined camera trajectory for world exploration. The framework also includes a data engine that automatically estimates camera poses and metric depth for existing videos, making it possible to build large amounts of annotated training data without manual labeling. According to Tencent, Voyager outperforms previous methods in scene video generation and 3D world reconstruction, both in geometric coherence and in visual quality.
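To make the output of such a data engine concrete, here is a minimal sketch of what a per-frame training annotation could look like, with a camera pose, intrinsics, and a metric depth map attached to each RGB frame. The field names and layout are illustrative assumptions, not Voyager's actual data format.

from dataclasses import dataclass
import numpy as np

@dataclass
class FrameAnnotation:
    # Hypothetical per-frame record an automated pose/depth annotation engine might emit.
    rgb_path: str             # path to the RGB frame on disk
    depth: np.ndarray         # (H, W) metric depth in meters, predicted for this frame
    intrinsics: np.ndarray    # (3, 3) pinhole camera matrix K
    cam_to_world: np.ndarray  # (4, 4) estimated camera pose for this frame

# A training clip is then simply an ordered list of annotated frames.
clip: list[FrameAnnotation] = []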


The results aren't true 3D models, but they achieve a similar effect: the tool generates 2D video frames that stay spatially consistent, as if the camera were moving through a real 3D space. Each generation produces just 49 frames, roughly two seconds of video, although Tencent says multiple clips can be strung together into "multiple-minute" sequences. Objects keep their relative positions as the camera moves around them, and the perspective changes correctly, as it would in a real 3D environment. While the output is video with depth maps rather than a true 3D model, that information can be converted into 3D point clouds for reconstruction.

The system accepts a single input image and a user-defined camera trajectory. Users can specify camera movements such as forward, backward, left, right, or pan via the provided interface. Image and depth data are combined with a memory-efficient "world cache" to produce video sequences that follow the requested camera movements.
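The depth-to-geometry step mentioned above can be illustrated with a short, generic sketch: unprojecting a depth map into a world-space point cloud given camera intrinsics and a camera pose. This is a minimal example under standard pinhole-camera assumptions, not code taken from the Voyager repository.

import numpy as np

def depth_to_point_cloud(depth, K, cam_to_world):
    # Unproject an (H, W) metric depth map into world-space 3D points.
    # depth: (H, W) depth in meters; K: (3, 3) intrinsics; cam_to_world: (4, 4) pose.
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinate grid
    z = depth.reshape(-1)
    x = (u.reshape(-1) - K[0, 2]) * z / K[0, 0]      # X = (u - cx) * Z / fx
    y = (v.reshape(-1) - K[1, 2]) * z / K[1, 1]      # Y = (v - cy) * Z / fy
    pts_cam = np.stack([x, y, z, np.ones_like(z)])   # (4, H*W) homogeneous points
    pts_world = cam_to_world @ pts_cam               # transform into the world frame
    return pts_world[:3].T                           # (H*W, 3) point cloud

Accumulating the unprojected points from each generated frame is, in essence, what turns the RGB-and-depth video into a reconstructable 3D scene.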


Voyager is trained to recognize and reproduce patterns of spatial consistency, but with an added geometric feedback loop. As it creates each frame, it converts the output into 3D points, then projects those points back into 2D as a reference for subsequent frames, so new frames stay aligned with the geometry it has already generated.
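The reprojection direction works the other way around: points already in the world cache are projected into the next camera pose, giving a partial view that conditions the following frames. The sketch below illustrates that projection step under the same assumed pinhole model; it shows the idea of the feedback loop, not Voyager's actual implementation.

import numpy as np

def project_points(pts_world, K, cam_to_world, h, w):
    # Project cached world-space points into a target camera view.
    world_to_cam = np.linalg.inv(cam_to_world)
    pts = np.concatenate([pts_world, np.ones((len(pts_world), 1))], axis=1)
    pts_cam = (world_to_cam @ pts.T)[:3]             # (3, N) points in the camera frame
    z = pts_cam[2]
    in_front = z > 1e-6                              # discard points behind the camera
    uv = K @ pts_cam[:, in_front]                    # perspective projection
    uv = uv[:2] / uv[2]
    inside = (uv[0] >= 0) & (uv[0] < w) & (uv[1] >= 0) & (uv[1] < h)
    return uv[:, inside].T, z[in_front][inside]      # pixel positions and their depths

Pixels covered by reprojected points indicate where existing geometry must reappear, which is how each new frame stays consistent with what was already produced.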


The model comes with significant licensing restrictions. Like Tencent's other Hunyuan models, the license prohibits use in the European Union, the United Kingdom, and South Korea. In addition, commercial deployments exceeding 100 million monthly active users require separate licensing from Tencent.


Links

HunyuanWorld-Voyager: https://3d-models.hunyuan.tencent.com/world/
Research paper: https://3d-models.hunyuan.tencent.com/voyager/voyager_en/assets/HYWorld_Voyager.pdf
Hugging Face: https://huggingface.co/tencent/HunyuanWorld-Voyager
GitHub: https://github.com/Tencent-Hunyuan/HunyuanWorld-Voyager
RunPod: https://runpod.io?ref=2pdhmpu1
RunPod demo: https://www.youtube.com/watch?v=WudXnf8Gogc

Episodes (15)

OpenAI gpt-oss: OpenAI's latest development in open source AI models

We’d like to introduce OpenAI’s latest development in open source AI models: the gpt-oss series. These two open-weight language models, gpt-oss-120b and gpt-oss-20b, have been tested by OpenAI to deli...

3 September 2025 · 51 min

Qwen-Image-Edit: Image editing with artificial intelligence. No need for Photoshop anymore?

Today, we will look at an AI model that simplifies image editing: Qwen-Image-Edit. This model builds on the foundation of the original, high-performance Qwen-Image, and brings amazing capabilities in ...

3 September 2025 · 27 min

ByteDance Seed-OSS-36B, a large language model specifically for long context understanding and reasoning

Seed-OSS is a set of open-source large-scale language models developed by ByteDance Seed Team, designed to provide powerful capabilities in long-context understanding, reasoning, and agentic tasks. It...

3 September 2025 · 39 min

Microsoft VibeVoice is excellent for creating podcasts, even by cloning our own voice

VibeVoice is a novel framework designed to generate expressive, emotional, and lifelike long-form, multi-actor audio, such as podcasts, from text. The model aims to solve the significant challenges of...

3 September 2025 · 40 min

Deep Cogito - Cogito v2: Free model. Using a unique, iterative self-learning method (IDA)

According to developer Deep Cogito, Cogito v2 is one of the world’s most powerful open-source AI models, available in sizes ranging from 70B to 671B parameters. Thanks to its unique, iterative self-le...

3 September 2025 · 47 min

Mastering Prompt Tricks with Large Language Models

In this episode, we dive deep into the art of crafting effective prompts for large language models. Join our hosts as they explore essential techniques to optimize outputs, enhance creativity, and imp...

26 September 2024 · 10 min

AI in Enterprise

The rapid development of AI has outpaced the ability of many organisations to adapt. This discrepancy presents both challenges and opportunities. While there is growing pressure to utilize AI for its...

13 September 2024 · 4 min