SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model | #ai #2025 #genai #google
AI Today, 7 February 2025

Paper: https://arxiv.org/pdf/2501.17161 This research paper compares supervised fine-tuning (SFT) and reinforcement learning (RL) as post-training methods for foundation models. Using novel and existing tasks involving arithmetic and spatial reasoning, the study finds that RL generalizes better to unseen data, whereas SFT tends to memorize the training data. Further analysis shows that RL improves visual recognition in multimodal models, while SFT helps stabilize RL training by improving output formatting. The paper also explores how additional inference-time computation affects generalization.

Episodes (30)

OpenAI's o3 and o3-mini: A New Frontier in AI | #ai #2024 #genai

Blog: https://openai.com/12-days/ OpenAI announced two new large language models, o3 and o3-mini, showcasing significantly improved performance on various benchmarks, including coding, mathematics, ...

21 December 2024, 22 min

Alignment Faking in Large Language Models | #ai #2024 #genai

Paper: https://arxiv.org/pdf/2412.14093 This research paper explores "alignment faking" in large language models (LLMs). The authors designed experiments to provoke LLMs into concealing their true pr...

21 December 2024, 14 min

Veo 2, Imagen 3, and Whisk: State-of-the-Art AI Image and Video Generation | #ai #2024 #genai

Blog: https://blog.google/technology/google... Google announced updates to its AI video and image generation models, Veo 2 and Imagen 3, boasting state-of-the-art capabilities in realism and style d...

21 December 2024, 19 min

Allegro: Open the Black Box of Commercial-Level Video Generation Model | #ai #2024 #genai

Paper: https://arxiv.org/pdf/2411.01747 This research report introduces Allegro, a novel, open-source text-to-video generation model that surpasses existing open-source and many commercial models in ...

4 December 2024, 19 min

DynaSaur: Large Language Agents Beyond Predefined Actions | #ai #2024 #genai

Paper: https://arxiv.org/pdf/2411.01747 The paper "DynaSaur: Large Language Agents Beyond Predefined Actions" introduces a novel large language model (LLM) agent framework that dynamically generates ...

4 December 2024, 19 min

STAR ATTENTION: EFFICIENT LLM INFERENCE OVER LONG SEQUENCES | #ai #2024 #genai

Paper: https://arxiv.org/pdf/2411.17116 The paper introduces Star Attention, a novel two-phase attention mechanism for efficient Large Language Model (LLM) inference on long sequences. It improves co...

4 December 2024, 16 min

FERRET-UI 2: MASTERING UNIVERSAL USER INTERFACE UNDERSTANDING ACROSS PLATFORMS | #ai #2024 #genai

Paper: https://arxiv.org/pdf/2410.18967 The paper introduces Ferret-UI 2, a multimodal large language model (MLLM) that significantly improves upon its predecessor, Ferret-UI, by enabling universal u...

27 November 2024, 14 min

Adapting While Learning: Grounding LLMs for Scientific Problems I-Tool Usage Adaptation | #ai #2024

Paper: https://arxiv.org/abs/2411.00412 This research introduces a novel two-stage training method to improve Large Language Models' (LLMs) ability to solve complex scientific problems. The method, c...

27 November 2024, 14 min