Blog

Research notes, paper overviews, and technical write-ups on efficient AI and unified multimodal models.

Efficient AI · Paper Overview · Multimodal Models

A compact list of research overviews and technical write-ups.

Paper Overview · arXiv 2026

An overview of DeltaKV, which exploits long-range similarity among KV representations in long-context LLM inference to perform residual-based KV-cache compression.

KV cache compression · Long-context LLMs · Residual compression

Read

Paper Overview · ICLR 2026

An overview of Uni-X, a two-end-separated, middle-shared architecture that reduces modality conflict in unified multimodal understanding and generation.

Unified multimodal models · Modality conflict · GenEval

Read

Paper Overview · NeurIPS 2025 Spotlight

An overview of LRC, which uses low-rank projection and activation clone to make small language model training far more token-efficient.

Knowledge distillation · Small language models · Low-rank clone

Read

Paper Overview · ICLR 2025

An overview of OmniKV, a long-context inference method that never permanently drops tokens, relying instead on dynamic KV-cache selection and offloading.

Long-context LLMs · KV cache · Efficient inference

Read