
Adaptive Learning Rate Strategies for Training Large Language Models: Balancing Speed and Stability

EasyChair Preprint no. 12276

8 pages · Date: February 24, 2024

Abstract

The training of large language models (LLMs) demands a delicate balance between speed and stability. Conventional fixed learning rate approaches often struggle to converge efficiently. In this paper, a framework of adaptive learning rate strategies tailored specifically to LLM training is proposed. The framework addresses the challenge of dynamically adjusting learning rates throughout the training process to improve both convergence speed and stability. Leveraging insights from adaptive optimization algorithms and recent advances in large-scale language model training, a comprehensive analysis of various adaptive learning rate techniques and their implications for LLM training is presented.
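The preprint page does not include code, but as a rough illustration of the kind of strategy the abstract describes, the sketch below implements a common warmup-plus-cosine-decay learning rate schedule of the sort widely used in LLM training. All names and values (base_lr, warmup_steps, total_steps, min_lr) are illustrative assumptions, not drawn from the paper, and this is not the authors' proposed framework.

```python
import math

def adaptive_lr(step, base_lr=3e-4, warmup_steps=2000,
                total_steps=100_000, min_lr=3e-5):
    """Illustrative adaptive learning-rate schedule (not the paper's method).

    Linear warmup keeps early, noisy updates small for stability;
    cosine decay then lowers the rate toward a floor for refined convergence.
    All hyperparameter values are placeholders.
    """
    if step < warmup_steps:
        # Ramp the rate up linearly over the warmup phase.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup training completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    # Cosine decay from base_lr down to min_lr.
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Example: print the learning rate at a few points in training.
for s in (0, 1000, 2000, 50_000, 100_000):
    print(s, round(adaptive_lr(s), 6))
```

In practice a schedule like this is typically combined with an adaptive optimizer such as Adam, which further rescales each parameter's step size from running gradient statistics; the paper's analysis concerns how such mechanisms interact with speed and stability at LLM scale.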

Keyphrases: adaptive learning rate

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@Booklet{EasyChair:12276,
  author       = {Kurez Oroy and Robert Chris},
  title        = {Adaptive Learning Rate Strategies for Training Large Language Models: Balancing Speed and Stability},
  howpublished = {EasyChair Preprint no. 12276},
  year         = {EasyChair, 2024}}