January 13, 2026, Tuesday, morning.
Another early start today: I woke up a little after 7 AM and found that DeepSeek has published a new paper proposing a technique called Engram.
DeepSeek has released an open-source Engram repository, which includes a demo and the paper PDF.
The paper is titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models."
The core idea of the paper is a new memory mechanism: during text generation, the model can look up stored memory entries on demand and use them, which strengthens its contextual understanding and generation.
The mechanism is implemented as a scalable lookup table, so the model retrieves relevant memory content when it needs it instead of relying only on what its other parameters have to compute. As I understand it, this not only improves performance but also reduces the compute spent on recalling stored knowledge, which should help large models run efficiently even when resources are limited.
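A minimal Python sketch of what such a lookup table could look like, assuming the key is a hash of the most recent token IDs (an N-gram). The class name `NgramMemory`, the hash function, and all sizes are hypothetical and chosen for illustration; this is not the code from the paper or the repository.

```python
import random


class NgramMemory:
    """Toy hashed N-gram lookup table (illustrative only)."""

    def __init__(self, num_slots: int, dim: int, n: int = 2, seed: int = 0):
        rng = random.Random(seed)
        self.num_slots = num_slots
        self.dim = dim
        self.n = n
        # One vector per slot; in a real model these would be trained parameters.
        self.table = [
            [rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_slots)
        ]

    def slot(self, ngram: tuple) -> int:
        # Deterministic polynomial hash of the token IDs, reduced to a slot index.
        h = 0
        for tok in ngram:
            h = (h * 1000003 + tok) % self.num_slots
        return h

    def lookup(self, token_ids: list) -> list:
        """Return one memory vector per position, keyed by the trailing N-gram."""
        out = []
        for i in range(len(token_ids)):
            if i + 1 < self.n:
                # Not enough left context to form an N-gram yet.
                out.append([0.0] * self.dim)
            else:
                ngram = tuple(token_ids[i + 1 - self.n : i + 1])
                out.append(self.table[self.slot(ngram)])
        return out


# Usage: the bigram (5, 9) occurs twice, so both positions hit the same slot.
memory = NgramMemory(num_slots=1024, dim=4, n=2)
vectors = memory.lookup([5, 9, 5, 9])
```

Each lookup costs one hash and one table read regardless of how many slots the table has, which is the property that makes this kind of memory cheap to scale. How the retrieved vector is merged into the model's hidden state is left out here.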
The introduction of this memory mechanism opens up new directions for the development of large language models, particularly in handling long texts and complex tasks, where it can better leverage external knowledge and contextual information.
Furthermore, the paper studies how to allocate capacity between Engram and MoE, finding that the effect of the Engram / MoE ratio on performance follows a U-shaped curve: leaning too far toward either component hurts. Balancing the proportions of the different components is therefore a crucial consideration when designing large models.
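The allocation question can be sketched as a simple sweep, assuming a fixed parameter budget that is split between experts and memory. Everything here is hypothetical: `split_budget`, `sweep`, and especially `toy_evaluate`, which is a made-up U-shaped function with an arbitrary minimum, not a result from the paper. In practice `evaluate` would train a model at each split and return its validation loss.

```python
def split_budget(total_params: int, memory_fraction: float) -> tuple:
    """Split a fixed budget into (expert_params, memory_params)."""
    memory = int(total_params * memory_fraction)
    return total_params - memory, memory


def sweep(total_params: int, fractions: list, evaluate) -> tuple:
    """Evaluate each split and return (best, all_results).

    Each result is a (memory_fraction, loss) pair; lower loss is better.
    """
    results = []
    for fraction in fractions:
        expert, memory = split_budget(total_params, fraction)
        results.append((fraction, evaluate(expert, memory)))
    best = min(results, key=lambda r: r[1])
    return best, results


def toy_evaluate(expert_params: int, memory_params: int) -> float:
    # Synthetic stand-in for a real training run: a parabola whose minimum
    # location (0.3) is arbitrary and carries no empirical meaning.
    fraction = memory_params / (expert_params + memory_params)
    return (fraction - 0.3) ** 2


best, results = sweep(1000, [0.0, 0.15, 0.3, 0.45, 0.6], toy_evaluate)
```

With a U-shaped response, the best split lies strictly between the two extremes, so the ratio has to be tuned rather than pushed all the way toward either component.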
Philosophically speaking, from "Attention Is All You Need" to Mixture of Experts, and now to Engram, the field keeps exploring how to use model parameters and compute more efficiently in order to improve expressiveness and generalization. It's akin to the progression from stem cells to differentiated cells, and then to organ systems: each step explores how a complex system can operate more efficiently. We may see more innovations like this in the future, pushing large language models toward greater intelligence and efficiency.
Overall, this paper provides new insights into memory mechanisms for large language models and is worthy of further research and exploration.
One more thing to wonder about: what surprises will the upcoming DeepSeek v4 bring?
Looking forward to it...