HunyuanImage 2.1 is an open source model that can generate high resolution (2K) images
Ctrl+Alt+Future12 Sep 2025

HunyuanImage 2.1 is an open-source text-to-image diffusion model capable of generating ultra-high-resolution (2K) images. It stands out with its dual text encoder, a two-stage architecture that includes a refiner model, and a PromptEnhancer module for automatic prompt rewriting, all of which contribute to stronger text-image alignment and more detailed control.


What does HunyuanImage 2.1 image generation model do?

- High resolution: Generates ultra-high-resolution (2K) images with cinematic-quality composition.

- Diverse styles: Supports various aesthetics, from photorealism to anime, comics, and vinyl figures, providing outstanding visual appeal and artistic quality.

- Multilingual prompt support: Natively supports both Chinese and English prompts. The multilingual ByT5 text encoder integrated into the model improves text rendering and text-image alignment.

- Advanced semantics and granular control: It handles ultra-long, complex prompts of up to 1,000 tokens and precisely controls the generation of multiple objects with different descriptions within a single image, including scene details, character poses, and facial expressions.

- Flexible aspect ratios: It supports various aspect ratios such as 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, and 2:3.
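At a fixed pixel budget, each supported ratio maps to a concrete 2K resolution. The helper below is an illustrative sketch only, not from the official repo; the ~4.2 MP target and the snap-to-64 rounding are assumptions.

```python
import math

# Assumed "2K" pixel budget: 2048 x 2048 (~4.2 MP). Not an official figure.
TARGET_PIXELS = 2048 * 2048

def dims_for_ratio(w_ratio: int, h_ratio: int, multiple: int = 64) -> tuple[int, int]:
    """Pick a width/height near TARGET_PIXELS matching w:h, snapped to `multiple`."""
    scale = math.sqrt(TARGET_PIXELS / (w_ratio * h_ratio))
    w = round(w_ratio * scale / multiple) * multiple
    h = round(h_ratio * scale / multiple) * multiple
    return w, h

for ratio in [(1, 1), (16, 9), (9, 16), (4, 3), (3, 4), (3, 2), (2, 3)]:
    w, h = dims_for_ratio(*ratio)
    print(f"{ratio[0]}:{ratio[1]} -> {w}x{h}")
```

For example, 1:1 yields 2048x2048 and 16:9 yields 2752x1536 under these assumptions.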


HunyuanImage 2.1 stands out from other models with several technological innovations and unique features:

- Two-stage architecture:

1. Base text-to-image model: This first stage uses two text encoders: a multimodal large language model (MLLM) to improve image-text alignment, and a multilingual character-aware encoder to improve text rendering across languages. This stage is a single- and dual-stream diffusion transformer (DiT) with 17 billion parameters. It uses reinforcement learning from human feedback (RLHF) to optimize aesthetics and structural coherence.

2. Refiner Model: The second stage introduces a refiner model that further improves image quality and clarity while minimizing artifacts.

- High-compression VAE (Variational Autoencoder): The model uses a high-compression VAE with a 32x spatial compression ratio, significantly reducing computational costs. This allows it to generate 2K images with the same token length and inference time that other models require for 1K images.

- PromptEnhancer module (prompt rewriting model): This module automatically rewrites user prompts, supplementing them with detailed, descriptive information to improve descriptive accuracy and visual quality.

- Extensive training data and captioning: It uses an extensive dataset and structured captions produced with multiple expert models to significantly improve text-image alignment. It also employs an OCR agent and IP RAG to address the shortcomings of VLM captioners on dense text and world-knowledge descriptions, plus a bidirectional verification strategy to ensure caption accuracy.

- Open-source model: HunyuanImage 2.1 is open source; the inference code and pre-trained weights were released on September 8, 2025.
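The 32x-compression claim is easy to sanity-check with arithmetic. The baseline of an 8x VAE plus 2x patchification (16x effective) is an assumption about "other models", not a figure from the release notes:

```python
def latent_tokens(image_size: int, compression: int) -> int:
    """Number of spatial latent positions for a square image of the given side length."""
    side = image_size // compression
    return side * side

# HunyuanImage 2.1: 32x spatial compression on a 2048px (2K) image
hunyuan = latent_tokens(2048, 32)   # 64 * 64 = 4096 tokens
# Assumed baseline DiT: 8x VAE + 2x patchify = 16x effective, on a 1024px (1K) image
baseline = latent_tokens(1024, 16)  # 64 * 64 = 4096 tokens
assert hunyuan == baseline
print(hunyuan)  # 4096
```

Both setups produce a 64x64 latent grid, which is why the 2K model can run at roughly the same token length and inference cost as a 1K baseline.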


Links

Twitter: https://x.com/TencentHunyuan/status/1965433678261354563

Blog: https://hunyuan.tencent.com/image/en?tabIndex=0

PromptEnhancer: A Simple Approach to Enhance Text-to-Image Models via Chain-of-Thought Prompt: https://hunyuan-promptenhancer.github.io/

GitHub PromptEnhancer: https://github.com/Hunyuan-PromptEnhancer/PromptEnhancer

PromptEnhancer Paper: https://www.arxiv.org/pdf/2509.04545

Hugging Face HunyuanImage-2.1: https://huggingface.co/tencent/HunyuanImage-2.1

GitHub: https://github.com/Tencent-Hunyuan/HunyuanImage-2.1

Checkpoints: https://github.com/Tencent-Hunyuan/HunyuanImage-2.1/blob/main/ckpts/checkpoints-download.md

Hugging Face demo: https://huggingface.co/spaces/tencent/HunyuanImage-2.1

RunPod: https://runpod.io?ref=2pdhmpu1

Leaderboard-Image: https://github.com/mp3pintyo/Leaderboard-Image
