
Thus Spake Long-Context Large Language Model

Read the Chinese version

This repository provides a collection of papers and resources focused on long-context LLMs, covering architecture, infrastructure, training, and evaluation. For a clear taxonomy and more insights into the methodology, please refer to our survey: Thus Spake Long-Context Large Language Model.

In this survey, we give a global picture of the lifecycle of long-context LLMs from four perspectives: architecture, infrastructure, training, and evaluation. These cover length extrapolation, KV cache optimization, memory management, architecture innovation, training infrastructure, inference infrastructure, long-context pre-training, long-context post-training, long-context MLLMs (mainly long VideoLLMs), and long-context evaluation, showcasing the full spectrum of long-context technologies. At the end of the survey, we present 10 unanswered questions currently faced by long-context LLMs.

This survey is inspired by the symphonic poem Thus Spake Zarathustra. We draw an analogy between the journey of extending the context of LLMs and humanity's attempts to transcend its mortality. We also pair the survey with the symphonic poem in a 36-minute video with music that traces the development of long-context work. Please enjoy it on Bilibili or YouTube (coming soon).

We appreciate suggestions from peers for improving this paper list or the survey, and we commit to updating the repository regularly. If you would like your paper included, or have any other modifications to the survey or repository, please feel free to open an issue or send an email to [email protected]. We sincerely appreciate your collaboration!

We would also like to mention A Comprehensive Survey on Long Context Language Modeling (GitHub), a concurrent survey that provides a collection of papers and resources focused on long-context language modeling. They also provide a clear taxonomy and valuable insights about long-context LLMs. More references can be found at Awesome-LLM-Long-Context-Modeling.

If you find our survey useful for your research, please consider citing the following paper:

@misc{liu2025spakelongcontextlargelanguage,
      title={Thus Spake Long-Context Large Language Model}, 
      author={Xiaoran Liu and Ruixiao Li and Mianqiu Huang and Zhigeng Liu and Yuerong Song and Qipeng Guo and Siyang He and Qiqi Wang and Linlin Li and Qun Liu and Yaqian Zhou and Xuanjing Huang and Xipeng Qiu},
      year={2025},
      eprint={2502.17129},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.17129}, 
}

News

  • [2025.03.27] 🎉🔥🎉 We gave a talk on wisemodel about our long-context survey and related work at FNLP. See the slides on GitHub.
  • [2025.03.23] 🎉⚡🎉 We released the split video version (in Chinese) of our survey on RedNote.
  • [2025.03.13] 🎉🤝🎉 We had a good discussion with the authors of the concurrent survey and will promote each other's work in the future.
  • [2025.03.12] 🎉🚀🎉 We collected the papers and blogs mentioned in the survey and updated them on GitHub.
  • [2025.02.27] 🎉⚡🎉 We released the complete video version (in Chinese) of our survey on Bilibili.
  • [2025.02.26] 🎉🚀🎉 We released our slides on GitHub.
  • [2025.02.25] 🎉🔥🎉 Our paper was ranked the #1 paper of the day on HuggingFace.
  • [2025.02.24] 🎉🚀🎉 We released the first version of the paper on arXiv.
  • [2025.01.29] 🎉🎉🎉 We released the first version of the paper on GitHub.

Table of Contents


  • Paper List
      • Survey & Technical Report
      • Architecture
          • Length Extrapolation
          • KV Cache Optimization
          • Memory Management
          • Architecture Innovation
      • Infrastructure
          • Training Infrastructure
          • Inference Infrastructure
      • Training
          • Long-Context Pre-Training
          • Long-Context Post-Training
          • Long-Context MLLM
      • Evaluation
          • Long-Context Evaluation
      • Unanswered Question
