ByteDance Seed-OSS-36B, a large language model specifically for long context understanding and reasoning
Ctrl+Alt+Future, 3 Sep 2025

Seed-OSS is a family of open-source large language models developed by the ByteDance Seed Team, designed for long-context understanding, reasoning, and agentic tasks. It stands out for its flexible control of the "thinking budget", its robust performance on various benchmarks, and its research-friendly approach, making it a versatile tool for developers and researchers alike.

- Specifically designed to provide long-context understanding, reasoning, agentic, and general capabilities.

- Primarily optimized for internationalized (i18n) use cases.

- Users can flexibly adjust the length of reasoning (the "thinking budget") as needed.

- Seed-OSS is specifically optimized for reasoning tasks.

According to ByteDance, Seed-OSS delivers open-source state-of-the-art (SOTA) performance across several benchmark categories:

- Knowledge: MMLU-Pro, MMLU, and TriviaQA for the base model; MMLU-Pro and MMLU for the Instruct model.

- Mathematics: GSM8K and MATH for the base model; AIME24, AIME25, and BeyondAIME for the Instruct model.

- Coding: MBPP and HumanEval for the base model; LiveCodeBench v6 for the Instruct model.

- Instruction following: IFEval.

- Agent: TAU1-Retail, SWE-Bench, and Multi-SWE-Bench.

- Multilingualism: MMMLU.

- Long context: RULER.


Links

- Seed-OSS Open-Source Models Release: https://seed.bytedance.com/en/blog/seed-oss-open-source-models-release?view_from=blog

- Hugging Face ByteDance-Seed/Seed-OSS-36B-Instruct: https://huggingface.co/ByteDance-Seed/Seed-OSS-36B-Instruct

- GitHub: https://github.com/ByteDance-Seed/seed-oss

- LM Studio: https://lmstudio.ai/home


Episodes (15)

Qwen3-Next: Free large language model from Alibaba that could revolutionize training costs?

Qwen3-Next is a new large language model (LLM) from Alibaba that has 80 billion parameters but activates only 3 billion during inference through a hybrid attention mechanism and sparse Mixture-of-...

15 Sep 2025, 46 min

HunyuanImage 2.1 is an open source model that can generate high resolution (2K) images

HunyuanImage 2.1 is an open source text-to-image diffusion model capable of generating ultra-high resolution (2K) images. It stands out with its dual text encoder, two-stage architecture including a r...

12 Sep 2025, 33 min

Google Stitch: user interface (UI) design using artificial intelligence

Google Stitch is an AI-powered tool designed for app developers to generate user interfaces (UI) for mobile and web applications. It can turn ideas into UIs. By default, it uses Google DeepMind’s late...

12 Sep 2025, 33 min

Kimi K2 0905 is the latest update to Moonshot AI's large-scale Mixture-of-Experts language model

Kimi K2 0905 is the latest update to Moonshot AI’s large-scale Mixture-of-Experts (MoE) language model, which is well-suited for complex agent-like tasks. With its advanced coding and reasoning capabi...

7 Sep 2025, 29 min

Tencent HunyuanWorld-Voyager: Generating 3D-consistent video from a single photo

Tencent has unveiled its AI-powered tool called HunyuanWorld-Voyager, which can transform a single image into a directional, 3D-consistent video—providing the thrill of exploration without the need fo...

7 Sep 2025, 46 min

GLM-4.5: The Next Generation of Artificial Intelligence That Thinks and Acts

Z.ai introduces its latest flagship models, the GLM-4.5 and GLM-4.5-Air, which take the capabilities of intelligent assistants to a new level. These models uniquely combine deep analytics, master-leve...

7 Sep 2025, 35 min

Gemini 2.5 Flash Image: Advanced AI Generation and Editing

Gemini 2.5 Flash Image, also known as Nano Banana, is an advanced, multimodal image creation and editing model that can interpret both text and image commands, allowing users to create, edit, and iter...

4 Sep 2025, 49 min

Qwen-Image image generation model: complex text display and precise image editing

Qwen-Image is an image generation foundation model developed by Alibaba's Qwen team. It has two outstanding capabilities: complex text rendering and precise image editing. Qwen-Image can render text, even lo...

3 Sep 2025, 39 min