A Survey of Large Language Models

Abstract: Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge to develop capable AI algorithms for comprehending and mastering a language. As a major approach, language modeling has been widely studied for language understanding and generation over the past two decades, evolving from statistical language models to neural language models. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora, showing strong capabilities in solving various NLP tasks. Since researchers have found that model scaling can lead to performance improvement, they have further studied the scaling effect by increasing the model size even further. Interestingly, when the parameter scale exceeds a certain level, these enlarged language models not only achieve a significant performance improvement but also show special abilities that are not present in small-scale language models. To mark the difference in parameter scale, the research community has coined the term large language models (LLMs) for PLMs of significant size. Recently, research on LLMs has been largely advanced by both academia and industry, and a remarkable milestone is the launch of ChatGPT, which has attracted widespread attention from society. The technical evolution of LLMs has been making an important impact on the entire AI community and may revolutionize the way we develop and use AI algorithms. In this survey, we review the recent advances of LLMs by introducing the background, key findings, and mainstream techniques. In particular, we focus on four major aspects of LLMs, namely pre-training, adaptation tuning, utilization, and capacity evaluation. Besides, we also summarize the available resources for developing LLMs and discuss the remaining issues for future directions.

Synopsis

Overview

  • Keywords: Large Language Models, Pre-trained Language Models, Emergent Abilities, Scaling Laws, Natural Language Processing
  • Objective: Review recent advances in large language models (LLMs) and summarize key findings, techniques, and future directions.
  • Hypothesis: The scaling of language models leads to emergent abilities that enhance their performance on complex tasks.
  • Innovation: The paper systematically reviews LLMs, focusing on their pre-training, adaptation tuning, utilization, and capacity evaluation, while also discussing emergent abilities and scaling laws.

Background

  • Preliminary Theories:

    • Statistical Language Models (SLM): Early models based on statistical methods that predict word sequences using the Markov assumption, facing challenges like the curse of dimensionality.
    • Neural Language Models (NLM): Introduced neural networks for language modeling, allowing for distributed representations of words and improved context handling.
    • Pre-trained Language Models (PLM): Models like BERT and GPT-2 that leverage large-scale corpora for pre-training, establishing a paradigm of pre-training followed by fine-tuning.
    • Scaling Laws: Empirical relationships that describe how model performance improves with increased model size, data size, and training compute.
  • Prior Research:

    • 2018: Introduction of BERT, which set new benchmarks in NLP tasks through bidirectional context.
    • 2020: Release of GPT-3, showcasing few-shot learning capabilities and significantly larger model size (175 billion parameters).
    • 2022: Release of models such as PaLM and OPT, further pushing the boundaries of model scaling and performance.
    • Late 2022: Launch of ChatGPT, which popularized the application of LLMs in conversational AI and sparked a surge of research interest.
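The scaling laws listed above can be sketched numerically. The following uses the parametric loss form popularized by the Chinchilla work, L(N, D) = E + A/N^α + B/D^β, where N is the parameter count and D the number of training tokens; the constants below are approximate published fits used here purely for illustration, not values from this survey.

```python
# Sketch of a Chinchilla-style parametric scaling law:
#   L(N, D) = E + A / N**alpha + B / D**beta
# The default constants are approximate fitted values from the Chinchilla
# paper, used only as illustrative placeholders.

def chinchilla_loss(n_params: float, n_tokens: float,
                    E: float = 1.69, A: float = 406.4, B: float = 410.7,
                    alpha: float = 0.34, beta: float = 0.28) -> float:
    """Predicted pre-training loss for a model with n_params parameters
    trained on n_tokens tokens."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Loss falls as either axis of scale grows, approaching the floor E.
small = chinchilla_loss(1e9, 2e10)     # ~1B params, ~20B tokens
large = chinchilla_loss(7e10, 1.4e12)  # ~70B params, ~1.4T tokens
assert large < small
```

The additive form makes the trade-off explicit: past a point, adding parameters without adding data leaves the B/D^β term dominant, which is why compute-optimal training scales both together.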

Methodology

  • Key Ideas:

    • Scaling Laws: The paper discusses two primary scaling laws, the KM scaling law (Kaplan et al.) and the Chinchilla scaling law (Hoffmann et al.), which quantitatively relate model performance to model size, data size, and training compute.
    • Emergent Abilities: Defined as capabilities that appear in larger models but are absent in smaller ones, such as in-context learning and instruction following.
    • Pre-training and Fine-tuning: Emphasizes the importance of extensive pre-training on diverse datasets and the subsequent fine-tuning for specific tasks.
    • Prompting Techniques: Investigates how different prompting strategies can enhance model performance in various applications.
  • Experiments:

    • Ablation Studies: The paper reviews experiments that isolate the effects of different training techniques and model architectures on performance.
    • Benchmarks: Evaluates LLMs against established benchmarks in NLP to assess their capabilities in tasks like language generation, reasoning, and knowledge utilization.
    • Datasets: Surveys the large-scale datasets used for training and evaluation, including diverse text corpora that support comprehensive language understanding.
  • Implications: The methodology highlights the need for careful design in training and evaluation to harness the full potential of LLMs, particularly in addressing issues like bias and ethical considerations.
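The in-context learning and prompting ideas above amount to assembling an instruction, a handful of labeled demonstrations, and a new query into a single input string, with no gradient updates. A minimal sketch, in which the task, template, and demonstrations are all hypothetical:

```python
# Minimal sketch of few-shot prompt construction for in-context learning.
# The sentiment task, template, and demonstrations are illustrative only.

def build_few_shot_prompt(demos, query,
                          instruction="Classify the sentiment as positive or negative."):
    """Concatenate an instruction, labeled demonstrations, and the new query."""
    lines = [instruction]
    for text, label in demos:
        lines.append(f"Review: {text}\nSentiment: {label}")
    # The query is left unanswered; the model completes the final label.
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

demos = [("A delightful, moving film.", "positive"),
         ("Dull plot and wooden acting.", "negative")]
prompt = build_few_shot_prompt(demos, "An instant classic.")
print(prompt)
```

Demonstration choice, ordering, and template wording all measurably affect accuracy, which is why the survey treats prompting strategy as a design dimension in its own right.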

Findings

  • Outcomes:

    • Performance Improvements: LLMs demonstrate significant performance gains on complex tasks compared to smaller models, particularly in few-shot and zero-shot settings.
    • Emergent Abilities: Notable capabilities such as reasoning and instruction following emerge when models exceed certain parameter thresholds.
    • Adaptation Techniques: Instruction tuning and reinforcement learning from human feedback (RLHF) are effective strategies for aligning LLMs with user expectations and improving task performance.
    • Challenges Identified: Issues such as bias in generated content, hallucinations, and the need for robust evaluation frameworks are critical for practical applications.
  • Significance: The research underscores a paradigm shift in NLP, where LLMs not only outperform previous models but also introduce new methodologies for interaction and task execution.

  • Future Work: Suggested areas for further exploration include:

    • Developing formal theories to explain emergent abilities and scaling effects.
    • Enhancing model alignment with human values and ethical considerations.
    • Investigating cross-disciplinary approaches to understand LLM behaviors.
  • Potential Impact: Advancements in understanding and developing LLMs could lead to more effective AI systems that are capable of complex reasoning, better user interaction, and broader applicability across various domains.
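The instruction tuning mentioned under Adaptation Techniques is supervised fine-tuning on task records rendered as natural-language instruction/response pairs. A minimal sketch of rendering one record into a training string, assuming an Alpaca-style field layout (the field names and template are assumptions; real instruction datasets vary):

```python
# Sketch of formatting one instruction-tuning record as a training string.
# The (instruction, input, output) field names follow the Alpaca convention,
# which is an assumption here; other datasets use different schemas.

def format_example(example: dict) -> str:
    """Render one record; the optional 'input' field carries task context."""
    if example.get("input"):
        return (f"### Instruction:\n{example['instruction']}\n\n"
                f"### Input:\n{example['input']}\n\n"
                f"### Response:\n{example['output']}")
    return (f"### Instruction:\n{example['instruction']}\n\n"
            f"### Response:\n{example['output']}")

record = {"instruction": "Summarize the text in one sentence.",
          "input": "Large language models show emergent abilities at scale.",
          "output": "LLMs gain new capabilities as they are scaled up."}
print(format_example(record))
```

During fine-tuning the loss is typically computed only on the response span, so the model learns to follow instructions rather than to reproduce them; RLHF then further aligns the tuned model with human preferences.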

Notes

Meta

Published: 2023-03-31

Updated: 2025-08-27

URL: https://arxiv.org/abs/2303.18223v12

Authors: Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, Ji-Rong Wen

Citations: 1256

H Index: 281

Categories: cs.CL, cs.AI

Model: gpt-4o-mini