Find Similarity
DeltaKV studies long-range similarity in KV representations and compresses the KV cache of long-context LLMs, reducing GPU memory by storing semantic residuals relative to historical references instead of discarding tokens.
DeltaKV belongs to the line of work on efficient long-context LLM inference. It addresses the memory pressure of the KV cache by exploiting redundancy across distant context positions rather than pruning context outright.
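To make the memory pressure concrete, here is a rough back-of-the-envelope estimate of uncompressed KV-cache size for a standard attention model. The function name and the model dimensions in the example are hypothetical and are not taken from DeltaKV; the formula simply counts one key and one value vector per layer, per KV head, per token.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_value=2, batch_size=1):
    """Estimate the uncompressed KV-cache size in bytes.

    The leading factor 2 accounts for storing both a key and a value
    tensor in every layer. bytes_per_value=2 corresponds to fp16/bf16.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * bytes_per_value * batch_size)


# Hypothetical model: 32 layers, 8 KV heads, head dimension 128, fp16,
# serving a single 128,000-token context.
size = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128,
                      seq_len=128_000)
print(f"{size / 2**30:.2f} GiB")
```

Because the size grows linearly with sequence length, any scheme that stores fewer bits per cached token, rather than fewer tokens, scales its savings with the context.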
Identifies long-range similarity in KV representations across extended contexts.
Encodes semantic residuals relative to historical references to reduce memory.
Targets near-lossless long-context inference with higher throughput.
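The highlights above can be illustrated with a minimal sketch of residual encoding. It is not DeltaKV's actual method: the reference-selection rule (cosine similarity), the uniform quantizer, and the names `cosine`, `encode`, and `decode` are all assumptions chosen for illustration. The idea it shows is only that a vector close to a historical reference leaves a small residual, which can be stored with few bits.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def encode(vec, references, bits=4):
    """Store vec as (reference index, scale, integer residual codes).

    Assumption: the most similar historical reference is picked by
    cosine similarity, and the residual is uniformly quantized.
    """
    ref_idx = max(range(len(references)),
                  key=lambda i: cosine(vec, references[i]))
    residual = [v - r for v, r in zip(vec, references[ref_idx])]
    qmax = 2 ** (bits - 1) - 1
    # Fall back to scale 1.0 when the residual is exactly zero.
    scale = max(abs(x) for x in residual) / qmax or 1.0
    codes = [round(x / scale) for x in residual]
    return ref_idx, scale, codes


def decode(ref_idx, scale, codes, references):
    """Rebuild an approximation of the original vector."""
    return [r + c * scale for r, c in zip(references[ref_idx], codes)]


# Hypothetical data: two historical references and one later vector
# that lies close to the first reference.
references = [[1.0, 0.0, 2.0, -1.0], [0.0, 3.0, -2.0, 0.5]]
vec = [1.02, -0.01, 1.97, -1.03]

ref_idx, scale, codes = encode(vec, references, bits=4)
restored = decode(ref_idx, scale, codes, references)
print(ref_idx, codes, restored)
```

In this sketch the per-element reconstruction error is bounded by half the quantization step, so the closer a vector is to its reference, the smaller the step and the error at a fixed bit width.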
DeltaKV connects residual-based representation compression, KV-cache memory reduction, and sparse-first LLM serving systems such as Sparse-vLLM.
Paper, implementation, and Chinese write-up for DeltaKV.