Code Impact

DeepSeek-R1: Reasoning via Reinforcement Learning

19 min · 26 Jan 2025

Description

This research paper introduces DeepSeek-R1, a large language model whose reasoning capabilities are improved with reinforcement learning (RL). Two versions are presented: DeepSeek-R1-Zero, trained with RL applied directly to the base model without supervised fine-tuning (SFT), and DeepSeek-R1, which adds a small amount of cold-start data and a multi-stage training pipeline to improve readability and performance. DeepSeek-R1 achieves results comparable to OpenAI's o1-1217 on a range of reasoning benchmarks. The study also distils DeepSeek-R1's reasoning capabilities into smaller, more efficient dense models, which achieve state-of-the-art results among dense models. Finally, the paper discusses unsuccessful attempts with process reward models (PRMs) and Monte Carlo Tree Search (MCTS), offering insights for future research. https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf
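As a rough illustration of the RL setup: the paper trains DeepSeek-R1-Zero with Group Relative Policy Optimization (GRPO), which samples a group of answers per prompt, scores them with rule-based accuracy and format rewards instead of a learned reward model, and normalises each reward against the group's mean and standard deviation. The Python below is a minimal sketch under assumptions, not DeepSeek's code: the `<think>`/`<answer>` regex, the 0.5 format weight, exact-string answer matching and the use of the population standard deviation are hypothetical choices, and `rule_based_reward` and `group_relative_advantages` are illustrative names. It omits the policy-gradient update, clipping and KL penalty entirely.

```python
import re
from statistics import mean, pstdev

# Assumed output template: reasoning inside <think> tags, final answer inside <answer> tags.
TEMPLATE = re.compile(r"^<think>.*?</think>\s*<answer>(.*?)</answer>\s*$", re.DOTALL)

FORMAT_WEIGHT = 0.5    # hypothetical weight for following the template
ACCURACY_WEIGHT = 1.0  # hypothetical weight for a correct final answer


def rule_based_reward(completion: str, reference: str) -> float:
    """Score one sampled completion with simple rules; no learned reward model."""
    match = TEMPLATE.match(completion.strip())
    if match is None:
        return 0.0  # template not followed: no format or accuracy credit in this sketch
    correct = match.group(1).strip() == reference.strip()
    return FORMAT_WEIGHT + (ACCURACY_WEIGHT if correct else 0.0)


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantage: each reward normalised by its group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; the choice of estimator is an assumption
    return [(r - mu) / (sigma + eps) for r in rewards]


# A group of G = 4 hypothetical completions sampled for the prompt "What is 2 + 2?"
group = [
    "<think>2 + 2 = 4</think> <answer>4</answer>",
    "<think>guessing</think> <answer>5</answer>",
    "4",
    "<think>two plus two is four</think><answer>4</answer>",
]
rewards = [rule_based_reward(c, "4") for c in group]
advantages = group_relative_advantages(rewards)
print(rewards)  # [1.5, 0.5, 0.0, 1.5]
```

Because advantages are relative to the group, correct and well-formatted answers get positive values and the rest negative ones, so no separate value network is needed to provide a baseline; a group where every answer scores the same contributes no learning signal.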


All episodes

72 episodes


Jira Cloud Performance Enhancement with Protobuf

This Atlassian blog post details the migration of Jira Cloud's Issue Service from JSON to Protocol Buffers (Protobuf) to enhance performance. The switch involved a phased approach to minimise downtime, creating new endpoints and logic to handle both formats concurrently before a complete transition. The results showcased significant improvements: 75% less Memcached CPU usage, 80% smaller data size, and a substantially faster response time. Challenges encountered included Protobuf's handling of null values and incompatibility with Spring's default error controller, which required workarounds. Ultimately, the migration yielded substantial performance gains and reduced infrastructure needs. https://www.atlassian.com/blog/atlassian-engineering/using-protobuf-to-make-jira-cloud-faster

26 Jan 2025 · 20 min
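Much of Protobuf's size advantage over JSON comes from its wire format: each field is written as a small numeric tag plus a compact value instead of a quoted field name. The Python below hand-encodes a flat, hypothetical Jira-like issue using the proto3 wire rules (base-128 varints, wire type 0 for integers, wire type 2 for length-delimited strings) and compares it with compact JSON. The field names, field numbers and the `encode_issue` helper are illustrative assumptions, not Atlassian's schema; production code would use classes generated by the `protobuf` library. It also shows one root of the null-handling challenge in the description: plain (non-`optional`) proto3 scalar fields are not written when they hold default values, and there is no null, so `None` and an empty string encode identically.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F          # low 7 bits of the remaining value
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_field(field_number: int, value) -> bytes:
    """Encode one field as tag + payload, following the proto3 wire format."""
    if isinstance(value, int):
        # Wire type 0 (varint): tag, then the varint-encoded value.
        return encode_varint((field_number << 3) | 0) + encode_varint(value)
    data = value.encode("utf-8")
    # Wire type 2 (length-delimited): tag, byte length, then the raw bytes.
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data


# Hypothetical schema: field name -> field number, as a .proto file would declare.
ISSUE_SCHEMA = {"id": 1, "key": 2, "summary": 3, "assignee": 4, "priority": 5}


def encode_issue(issue: dict) -> bytes:
    """Encode a flat dict of ints/strings the way plain proto3 scalars behave:
    None and default values (0, "") are simply not written."""
    out = bytearray()
    for name, number in sorted(ISSUE_SCHEMA.items(), key=lambda item: item[1]):
        value = issue.get(name)
        if value is None or value == 0 or value == "":
            continue
        out += encode_field(number, value)
    return bytes(out)


issue = {"id": 10042, "key": "PROJ-42", "summary": "Fix login bug",
         "assignee": None, "priority": 3}
as_json = json.dumps(issue, separators=(",", ":")).encode("utf-8")
as_proto = encode_issue(issue)
print(len(as_json), len(as_proto))  # prints: 83 29
```

Since a reader of the Protobuf bytes cannot tell "assignee is null" from "assignee is empty", a migration like the one described has to model absence explicitly, for example with `optional` fields or wrapper messages.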

Hyaline: Fast and Transparent Lock-Free Memory Reclamation

This research paper introduces Hyaline, a novel family of memory reclamation schemes for lock-free data structures in unmanaged C/C++ code. Hyaline leverages reference counting, but only during reclamation, minimising overhead during object access and balancing workload across threads. The paper details Hyaline's design, including a scalable multi-list version and robust extensions to handle stalled threads. Extensive testing across multiple architectures demonstrates Hyaline's superior performance and memory efficiency compared to existing schemes like epoch-based reclamation and hazard pointers, particularly in read-dominated and oversubscribed scenarios. The paper concludes by proving Hyaline's correctness and lock-freedom properties.

25 Jan 2025 · 32 min
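The core idea, reference counting that happens only when memory is retired and reclaimed rather than on every object access, can be modelled sequentially. The Python below is a heavily simplified, single-threaded sketch under stated assumptions: it is not lock-free, uses no atomics, and omits Hyaline's multi-list design and robustness extensions for stalled threads; the `Reclaimer` and `Batch` classes are hypothetical names. A retired batch is counted against every thread active at retirement time, each thread releases only the batches retired while it was active when it leaves, and whichever thread drops the last reference frees the batch, which loosely mirrors how Hyaline spreads reclamation work across threads.

```python
class Batch:
    """A group of retired nodes plus a count of threads that may still reference them."""

    def __init__(self, nodes, active_count):
        self.nodes = list(nodes)
        self.refs = active_count
        self.freed = False


class Reclaimer:
    """Sequential model: no per-access counters; only enter/retire/leave touch counts."""

    def __init__(self):
        self.pending = {}       # thread id -> batches retired while that thread was active
        self.freed_nodes = []

    def enter(self, tid):
        # Start of a critical section: reads inside it need no reference counting.
        self.pending[tid] = []

    def retire(self, nodes):
        # Every thread active right now might still hold pointers into these nodes.
        batch = Batch(nodes, len(self.pending))
        if batch.refs == 0:
            self._free(batch)   # no thread can see the nodes; free immediately
            return batch
        for batches in self.pending.values():
            batches.append(batch)
        return batch

    def leave(self, tid):
        # End of critical section: drop this thread's reference on each batch it saw retired.
        for batch in self.pending.pop(tid):
            batch.refs -= 1
            if batch.refs == 0:
                self._free(batch)  # the last thread out performs the reclamation

    def _free(self, batch):
        batch.freed = True
        self.freed_nodes.extend(batch.nodes)


reclaimer = Reclaimer()
reclaimer.enter("A")
reclaimer.enter("B")
batch = reclaimer.retire(["node1", "node2"])
reclaimer.leave("A")
print(batch.freed)  # False: thread B may still be reading
reclaimer.leave("B")
print(batch.freed)  # True: B held the last reference
```

A thread that enters after a batch was retired never gains a reference to it, so long-running newcomers cannot delay its reclamation in this model.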