
February 22, 2024

Papers

The broad theme of my current work is efficient pre-training of Large Language Models (LLMs). I believe that the pre-training of large-scale language models is poorly understood, as most research focuses on applications of these models. Moreover, only a few institutions with extremely large computing resources can pre-train LLMs of reasonably good quality. Through my research (sitting in academia with fewer resources), I want to democratize LLM pre-training by creating better models with far fewer resources. To achieve this, in [2] we show that LLMs can be trained much faster, without compromising generalization (validation loss, i.e., log perplexity), by averaging checkpoints sampled far apart throughout training (a code sketch follows the paper list below). We presented results with Pythia 1B, 2.8B, 6.9B, and 12B models, along with GPT2-large, GPT2-medium, and GPT2-small. Next, in [1], we empirically explore the question: how much performance can one achieve when pre-training an LLM of a few billion parameters with just 1 billion tokens? Here we provide insights into sample-efficient pre-training of small base language models.

  1. Pre-training Small Base LMs with Fewer Tokens
    Sunny Sanyal, Sujay Sanghavi, and Alex Dimakis.
    Under Submission
    [paper][code][blog]


  2. Early Weight Averaging Meets High Learning Rates for LLM Pre-training
    Sunny Sanyal, Atula Tejaswi, Jean Kaddour, Abhishek Kumar, and Sujay Sanghavi.
    Under Submission
    [paper][code][blog]
    Presented at NeurIPS 2023 WANT workshop (OpenReview).
    Featured in two popular newsletters: Ahead of AI and Interconnects.ai.
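
To make the checkpoint-averaging idea from [2] concrete, below is a minimal PyTorch sketch. The tiny linear model, the dummy objective, and the hyperparameters k and ckpt_interval are illustrative assumptions for this page, not the paper's exact training recipe.

    import collections
    import copy

    import torch
    import torch.nn as nn

    # Minimal sketch of averaging far-apart checkpoints during training,
    # in the spirit of [2]. Model, objective, and hyperparameters are
    # placeholders, not the paper's actual setup.

    def average_state_dicts(state_dicts):
        """Uniformly average a list of model state_dicts, key by key."""
        return {
            key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
            for key in state_dicts[0]
        }

    model = nn.Linear(16, 16)                          # stand-in for an LLM
    opt = torch.optim.SGD(model.parameters(), lr=0.5)  # deliberately high LR

    k = 5                # number of checkpoints kept in the averaging window
    ckpt_interval = 100  # steps between saved checkpoints ("far apart")
    recent = collections.deque(maxlen=k)

    for step in range(1, 1001):
        x = torch.randn(32, 16)
        loss = (model(x) - x).pow(2).mean()  # dummy objective
        opt.zero_grad()
        loss.backward()
        opt.step()

        if step % ckpt_interval == 0:
            # Snapshot weights on CPU so the window stays cheap to keep.
            recent.append({n: p.detach().cpu().clone()
                           for n, p in model.state_dict().items()})

    if len(recent) == k:
        # Load the average of the last k checkpoints into a copy for evaluation.
        avg_model = copy.deepcopy(model)
        avg_model.load_state_dict(average_state_dicts(list(recent)))

Keeping only the last k snapshots in a deque bounds memory, and the averaged weights can be evaluated on the side while raw training continues at a high learning rate, which is the pairing the paper's title refers to.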
