Efficiency
Research line: the core thread is to reduce memory, tokens, and inference cost.
I work on two connected research lines: making large models more efficient, and building unified multimodal models that understand and generate across modalities.
ACL 2024: Memory-efficient fine-tuning with sparse adapters.
ICLR 2025: Dynamic context selection for long-context LLMs.
arXiv 2026: Residual-based KV cache compression.
System: Sparse-first inference framework.
NeurIPS 2025 Spotlight: Branching from MEFT, low-rank modules clone teacher knowledge, making each training token far more valuable.
Research line: the second thread is to unify understanding and generation.
ICLR 2026: Two-end-separated architecture to mitigate modality conflict.
I am currently a Ph.D. student at Harbin Institute of Technology (Shenzhen).
My research interests include Efficient AI/LLMs and unified multimodal understanding & generation models. I am currently a Ph.D. student at Harbin Institute of Technology (Shenzhen), supervised by Prof. Jun Yu, and I also work closely with Prof. Qiang Huang. Before that, I obtained my Bachelor's and Master's degrees in Computer Science from Shandong University in 2022 and 2025, respectively, under the supervision of Prof. Zhaochun Ren.
Recent papers and milestones across the two research lines.
Our new paper Uni-X on mitigating modality conflict for unified multimodal models has been accepted to ICLR 2026 (Poster).
Our paper on efficient knowledge distillation for LLMs has been accepted to NeurIPS 2025 (Spotlight). We propose Low-Rank Clone (LRC), an efficient pretraining method for SLMs.
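To give a rough feel for the low-rank clone idea, here is a minimal, hypothetical PyTorch sketch: a frozen teacher weight matrix is mapped into student dimensions through trainable low-rank projections, and the student is additionally supervised by aligning its hidden states with projected teacher activations. The class, function, and dimension names (LowRankClonedLinear, activation_alignment_loss, D_TEACHER, D_STUDENT) are illustrative assumptions, not the exact formulation or code of the LRC paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes: a wide teacher layer "cloned" into a narrow student layer.
D_TEACHER, D_STUDENT = 4096, 1024


class LowRankClonedLinear(nn.Module):
    """Student linear layer whose weight is a trainable low-rank projection
    of a frozen teacher weight (illustrative sketch, not the paper's method)."""

    def __init__(self, teacher_weight: torch.Tensor):
        super().__init__()
        self.register_buffer("teacher_weight", teacher_weight)  # frozen teacher knowledge
        # Trainable projections mapping teacher dimensions onto student dimensions.
        self.proj_out = nn.Parameter(0.02 * torch.randn(D_STUDENT, D_TEACHER))
        self.proj_in = nn.Parameter(0.02 * torch.randn(D_TEACHER, D_STUDENT))

    def student_weight(self) -> torch.Tensor:
        # (D_STUDENT, D_TEACHER) @ (D_TEACHER, D_TEACHER) @ (D_TEACHER, D_STUDENT)
        return self.proj_out @ self.teacher_weight @ self.proj_in

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.student_weight().t()


def activation_alignment_loss(student_h, teacher_h, proj_out):
    """Each training token also supervises the student through the teacher's
    hidden states, projected down into the student's width."""
    return F.mse_loss(student_h, teacher_h @ proj_out.t())


# Toy usage with random tensors standing in for real model activations.
layer = LowRankClonedLinear(torch.randn(D_TEACHER, D_TEACHER))
student_h = layer(torch.randn(2, 8, D_STUDENT))   # (batch, seq, D_STUDENT)
teacher_h = torch.randn(2, 8, D_TEACHER)          # matching teacher hidden states
loss = activation_alignment_loss(student_h, teacher_h, layer.proj_out)
```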
The list below follows the same structure as the map: efficiency first, then unified multimodal models.
Systems that turn the efficiency line into reusable infrastructure.
A unified sparse inference engine supporting physical eviction, logical masking, and hybrid compression.
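As a rough illustration of the difference between the two compression modes, here is a small, hypothetical PyTorch sketch: physical eviction actually drops entries from the KV cache so memory shrinks, while logical masking keeps the cache intact and only hides the dropped positions from attention; a hybrid scheme would combine the two. The function names and the additive-mask convention are assumptions for this sketch, not the engine's actual API.

```python
import torch

def evict_physically(keys, values, keep_idx):
    """Physically drop evicted positions: the cache tensors shrink and memory is freed."""
    return keys[:, keep_idx], values[:, keep_idx]

def mask_logically(attn_mask, drop_idx):
    """Keep the cache intact but add -inf to the attention mask at dropped positions,
    so they are ignored by attention without being deallocated (no memory saving)."""
    masked = attn_mask.clone()
    masked[:, drop_idx] = float("-inf")
    return masked

# Toy cache: 1 head, 8 cached tokens, head dimension 4.
keys = torch.randn(1, 8, 4)
values = torch.randn(1, 8, 4)

# Physical eviction: keep only sink and recent positions; the cache shrinks to 4 entries.
keys_small, values_small = evict_physically(keys, values, torch.tensor([0, 1, 6, 7]))

# Logical masking: the same middle positions are hidden, but the cache stays full-size.
mask = mask_logically(torch.zeros(1, 8), torch.tensor([2, 3, 4, 5]))
```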