DeltaKV: Residual-Based KV Cache Compression
A paper overview of DeltaKV, which exploits long-range similarity among KV representations in long-context LLM inference to achieve residual-based KV-cache compression.
Research notes, paper overviews, and technical write-ups on efficient AI and unified multimodal models.
A compact list of research overviews and technical write-ups.
An introduction to Uni-X, a "separate at both ends, shared in the middle" architecture, and how it reduces modality conflict in unified multimodal understanding and generation.
An overview of LRC, which uses low-rank projection and activation clone to make small language model training far more token-efficient.
An introduction to OmniKV, which uses dynamic KV-cache selection and offloading for efficient long-context inference without permanently dropping tokens.