SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training | #ai #2025 #genai #google
AI Today, 7 February 2025

Paper: https://arxiv.org/pdf/2501.17161 This research paper compares supervised fine-tuning (SFT) and reinforcement learning (RL) as methods for post-training foundation models. Using novel and existing tasks involving arithmetic and spatial reasoning, the study finds that RL generalizes better to unseen data, whereas SFT tends to memorize the training data. Further analysis shows that RL improves visual recognition capabilities in multimodal models, while SFT helps stabilize RL training by improving output formatting. The paper also explores how increased inference-time computation affects generalization.

Episodes (30)

Mixtures of In-Context Learners | #ai #genai #llm #2024 #ml

Paper: https://arxiv.org/pdf/2411.02830 This research introduces Mixtures of In-Context Learners (MOICL), a novel approach to improve in-context learning (ICL) in large language models (LLMs). MOICL ...

27 November 2024, 14 min

LLM2CLIP: POWERFUL LM UNLOCKS RICHER VISUAL REPRESENTATION | #ai #genai #lvm #llm #mmm #cv #ms #2024

Paper: https://arxiv.org/pdf/2411.04997 Github: https://github.com/microsoft/LLM2CLIP The paper introduces LLM2CLIP, a method to improve the visual representation learning capabilities of CLIP by int...

27 November 2024, 14 min

OPENSCHOLAR: SYNTHESIZING SCIENTIFIC LITERATURE WITH RETRIEVAL-AUGMENTED LMS | #ai #genai #llm #2024

Paper: https://arxiv.org/pdf/2411.14199 Github: https://github.com/AkariAsai/OpenScholar The research introduces OpenScholar, a retrieval-augmented large language model (LLM) designed for synthesizin...

27 November 2024, 14 min

Bilateral Reference for High-Resolution Dichotomous Image Segmentation | #ai #genai #llm #cv #2024

Paper: https://arxiv.org/pdf/2401.03407 Github: https://github.com/ZhengPeng7/BiRefNet This research introduces BiRefNet, a novel deep learning framework for high-resolution dichotomous image segment...

27 November 2024, 14 min

LLaVA-o1: Let Vision Language Models Reason Step-by-Step | #ai #genai #lvm #llm #mmm #cv #2024

Paper: https://arxiv.org/pdf/2411.10440 Github: https://github.com/PKU-YuanGroup/LLaV... The paper introduces LLaVA-o1, a vision-language model designed for improved multi-stage reasoning. Unlike pre...

27 November 2024, 14 min

Model-Based Transfer Learning for Contextual Reinforcement Learning | #ai #mit #rl #genai #ml #2024

Paper: https://arxiv.org/pdf/2408.04498 This research introduces Model-Based Transfer Learning (MBTL), a novel framework for improving the efficiency and robustness of deep reinforcement learning (RL...

27 November 2024, 14 min

Diverse and Effective Red Teaming Auto-gen Rewards & Multi-step RL | #aisafety #openai #genai #2024

Paper: https://cdn.openai.com/papers/diverse... Blog: https://openai.com/index/advancing-re... This OpenAI research paper presents novel methods for automated red teaming of large language models (LL...

27 November 2024, 14 min

OpenAI’s Approach to External Red Teaming for AI Models and Systems | #aisafety #openai #genai #2024

Paper: https://cdn.openai.com/papers/openais... Blog: https://openai.com/index/advancing-re... This white paper details OpenAI's approach to external red teaming for AI models and systems. External r...

27 November 2024, 14 min