Jitai Hao (郝继泰)

I work on two connected research lines: making large models more efficient, and building unified multimodal models that understand and generate across modalities.

Efficient AI / LLMs: from MEFT and OmniKV to DeltaKV and Sparse-vLLM, with a branch into LRC.
Unified Multimodal Models: Uni-X, which mitigates modality conflict in unified models.

Research Lineage

Two lines, one long-term agenda

Efficiency

The core thread: reduce memory, tokens, and inference cost.

Research line

MEFT

Memory-efficient fine-tuning with sparse adapters.

ACL 2024

OmniKV

Dynamic context selection for long-context LLMs.

ICLR 2025

DeltaKV

Residual-based KV cache compression.

arXiv 2026

Sparse-vLLM

Sparse-first inference framework.

System

LRC

Branch from MEFT: low-rank modules clone teacher knowledge, making each training token far more valuable.

NeurIPS 2025 Spotlight

Unified Multimodal Models

The second thread: unify understanding and generation.

Research line

Uni-X

Two-end-separated architecture for mitigating modality conflict.

ICLR 2026

About Me

My research interests include Efficient AI/LLMs and unified multimodal understanding & generation models. I am currently a Ph.D. student at Harbin Institute of Technology (Shenzhen), supervised by Prof. Jun Yu, and I also work closely with Prof. Qiang Huang. Before that, I obtained my Bachelor's and Master's degrees in Computer Science from Shandong University in 2022 and 2025, respectively, under the supervision of Prof. Zhaochun Ren.

Current: Ph.D. student, HIT Shenzhen
Research: Efficient LLMs and unified multimodal models

News

Recent papers and milestones across the two research lines.

Feb 2026

Our new paper Uni-X on mitigating modality conflict for unified multimodal models has been accepted to ICLR 2026 (Poster).

Sep 2025

Our paper on efficient knowledge distillation for LLMs has been accepted to NeurIPS 2025 (Spotlight). We propose Low-Rank Clone (LRC), an efficient pretraining method for SLMs.

Publications

The list below follows the same structure as the map: efficiency first, then unified multimodal models.

arXiv 2026 Efficiency

DeltaKV: Residual-Based KV Cache Compression via Long-Range Similarity

Jitai Hao, Qiang Huang, Yaowei Wang, Min Zhang, Jun Yu
TL;DR: Motivated by the long-range similarity we observe in KV representations, we propose DeltaKV, which encodes semantic representations as residuals relative to historical references, reducing KV memory to 29% without discarding tokens. It achieves near-lossless performance on SCBench and AIME with a 2x throughput gain.
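To make the residual idea concrete, here is a minimal NumPy sketch of residual-based KV compression: each cached vector is encoded as a pointer to its most similar full-precision reference plus a coarsely quantized residual. The cosine-similarity reference matching, the quantization scale, and the names `encode_residual` / `decode` are illustrative assumptions, not DeltaKV's actual implementation.

```python
import numpy as np

def encode_residual(kv, refs):
    # Match each KV vector to its most similar reference by cosine similarity.
    kv_n = kv / np.linalg.norm(kv, axis=1, keepdims=True)
    refs_n = refs / np.linalg.norm(refs, axis=1, keepdims=True)
    ref_idx = np.argmax(kv_n @ refs_n.T, axis=1)
    residuals = kv - refs[ref_idx]          # small when long-range similarity holds
    # Small residuals tolerate coarse quantization; that is where the savings come from.
    return ref_idx, np.round(residuals * 16).astype(np.int8)

def decode(ref_idx, residuals_q, refs):
    return refs[ref_idx] + residuals_q.astype(np.float32) / 16

rng = np.random.default_rng(0)
refs = rng.normal(size=(4, 64)).astype(np.float32)             # full-precision references
kv = (refs[rng.integers(0, 4, size=128)]
      + 0.05 * rng.normal(size=(128, 64))).astype(np.float32)  # cache with long-range similarity
ref_idx, residuals_q = encode_residual(kv, refs)
print(np.abs(kv - decode(ref_idx, residuals_q, refs)).max())   # small reconstruction error
```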
ICLR 2026 Multimodal

Uni-X: Mitigating Modality Conflict with a Two-End-Separated Architecture for Unified Multimodal Models

Jitai Hao*, Hao Liu*, Xinyan Xiao, Qiang Huang, Jun Yu
TL;DR: Uni-X uses an X-shaped two-end-separated architecture, with modality-specific paths at both ends and a shared middle, to mitigate gradient conflicts in unified multimodal models. A 3B Uni-X matches or surpasses 7B models, achieving 82.0 on GenEval.
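A minimal PyTorch sketch of the two-end-separated idea: modality-specific layers at the input and output ends with a shared trunk in between, so each modality's gradients touch its own ends and meet only in the middle. The layer counts, dimensions, and the `UniXBlock` name are illustrative assumptions, not the Uni-X configuration.

```python
import torch
import torch.nn as nn

class UniXBlock(nn.Module):
    def __init__(self, d=256, n_shared=4, n_end=2):
        super().__init__()
        def mk(n):
            return nn.ModuleList(
                nn.TransformerEncoderLayer(d, nhead=4, batch_first=True) for _ in range(n))
        self.text_in, self.image_in = mk(n_end), mk(n_end)    # modality-specific entry ends
        self.shared = mk(n_shared)                            # shared middle trunk
        self.text_out, self.image_out = mk(n_end), mk(n_end)  # modality-specific exit ends

    def forward(self, x, modality):
        entry = self.text_in if modality == "text" else self.image_in
        exit_ = self.text_out if modality == "text" else self.image_out
        # Each modality's gradients update its own ends; conflicting updates
        # can only meet inside the shared trunk.
        for layer in (*entry, *self.shared, *exit_):
            x = layer(x)
        return x

x = torch.randn(2, 16, 256)
print(UniXBlock()(x, "text").shape)  # torch.Size([2, 16, 256])
```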
NeurIPS 2025 Spotlight Efficiency

A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone

Jitai Hao, Qiang Huang, Hao Liu, Xinyan Xiao, Zhaochun Ren, Jun Yu
TL;DR: We propose Low-Rank Clone (LRC), a method to significantly improve model training efficiency. With only about 10B-20B tokens, LRC can match or even surpass the performance of SOTA models trained on trillions of tokens.
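A toy sketch of the cloning idea, under stated assumptions: the student's weight is built from trainable low-rank projections of the frozen teacher weight rather than learned from scratch, so each training token supervises a far smaller set of parameters. The two-sided projection below is an illustrative simplification, not the LRC recipe.

```python
import torch
import torch.nn as nn

d_teacher, d_student = 512, 128
W_t = torch.randn(d_teacher, d_teacher)  # frozen teacher weight (requires_grad=False)
# Trainable low-rank projections; only these receive gradient updates.
P_out = nn.Parameter(torch.randn(d_student, d_teacher) / d_teacher**0.5)
P_in = nn.Parameter(torch.randn(d_teacher, d_student) / d_teacher**0.5)

def student_weight():
    # The student weight is a low-rank "clone": the teacher weight projected
    # down to the smaller width on both sides.
    return P_out @ W_t @ P_in                # (d_student, d_student)

x = torch.randn(4, d_student)
y = x @ student_weight().T                   # student forward pass
print(y.shape)                               # torch.Size([4, 128])
```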
ICLR 2025 Efficiency

OmniKV: Dynamic Context Selection for Efficient Long-Context LLMs

Jitai Hao*, Yuke Zhu*, Tian Wang, Jun Yu, Xin Xin, Bo Zheng, Zhaochun Ren, Sheng Guo
TL;DR: By exploiting inter-layer attention similarity, OmniKV dynamically selects crucial context information, improving efficiency and performance on long-context tasks while reducing computational cost.
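A minimal sketch of the selection step, assuming a decoding query at a "filter" layer scores the cached tokens and the resulting top-k indices are reused by subsequent layers (which is how inter-layer attention similarity pays off). The scoring rule and `top_k` value are illustrative, not the released OmniKV code.

```python
import torch

def select_context(q, k, top_k=64):
    # Score every cached token by its attention weight under the current query,
    # averaged over heads.
    scores = (q @ k.transpose(-1, -2)).softmax(dim=-1)  # (heads, 1, seq)
    return scores.mean(dim=0).squeeze(0).topk(top_k).indices

heads, seq, d = 8, 4096, 64
q = torch.randn(heads, 1, d)   # decoding query at the filter layer
k = torch.randn(heads, seq, d) # full KV cache, kept at only a few layers
idx = select_context(q, k)
# Later layers attend only to the selected subset, shrinking their KV reads.
k_small = k[:, idx, :]
print(k_small.shape)           # torch.Size([8, 64, 64])
```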
ACL 2024 Efficiency

MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter

Jitai Hao, Weiwei Sun, Xin Xin, Qi Meng, Zhumin Chen, Pengjie Ren, Zhaochun Ren
TL;DR: MEFT is a memory-efficient fine-tuning method. It reduces memory usage during fine-tuning by introducing a sparse adapter, making it more feasible to fine-tune large models.
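A toy sketch of a sparse adapter in this spirit: a wide adapter whose hidden units are kept top-k per token, so only a small, input-dependent slice of adapter parameters receives nonzero gradient on any step. The dense top-k masking below is an illustrative stand-in for MEFT's actual sparse computation.

```python
import torch
import torch.nn as nn

class SparseAdapter(nn.Module):
    def __init__(self, d=512, hidden=4096, k=32):
        super().__init__()
        self.up = nn.Linear(d, hidden, bias=False)
        self.down = nn.Linear(hidden, d, bias=False)
        self.k = k

    def forward(self, x):
        h = torch.relu(self.up(x))
        # Keep only the k most active hidden units per token; the zeroed units
        # contribute nothing, so their weights get zero gradient this step.
        topv, topi = h.topk(self.k, dim=-1)
        h_sparse = torch.zeros_like(h).scatter(-1, topi, topv)
        return x + self.down(h_sparse)  # residual adapter output

x = torch.randn(2, 16, 512)
print(SparseAdapter()(x).shape)         # torch.Size([2, 16, 512])
```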
EMNLP 2023 Findings Reasoning

Multi-Defendant Legal Judgment Prediction via Hierarchical Reasoning

Yougang Lyu*, Jitai Hao*, Zihan Wang, Kai Zhao, Shen Gao, Pengjie Ren, Zhumin Chen, Fang Wang, Zhaochun Ren
TL;DR: This paper predicts legal judgment outcomes for cases involving multiple defendants via hierarchical reasoning, modeling the complex entity relationships and reasoning chains within a case to improve prediction accuracy.

Projects

Systems that turn the efficiency line into reusable infrastructure.

Sparse-vLLM: A Sparse-First Inference Framework for Long-Context LLMs

A unified sparse inference engine supporting physical eviction, logical masking, and hybrid compression.
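A conceptual sketch of the two sparsity modes named above, with a hypothetical interface (not Sparse-vLLM's actual API): physical eviction drops entries from the cache so memory truly shrinks, while logical masking keeps the cache whole and excludes tokens only at attention time, so the selection can change from step to step.

```python
import torch

def physical_evict(k, v, keep_idx):
    # Drop evicted tokens from the cache entirely: memory actually shrinks,
    # but the evicted tokens are unrecoverable.
    return k[:, keep_idx], v[:, keep_idx]

def logical_mask(scores, masked_idx):
    # Keep the cache whole; exclude tokens via -inf attention scores,
    # so a different subset can be selected at the next step.
    scores[..., masked_idx] = float("-inf")
    return scores

heads, seq, d = 8, 1024, 64
k, v = torch.randn(heads, seq, d), torch.randn(heads, seq, d)
k2, v2 = physical_evict(k, v, torch.arange(0, seq, 4))  # keep every 4th token
print(k2.shape)                                         # torch.Size([8, 256, 64])

scores = torch.randn(heads, 1, seq)
masked = logical_mask(scores, torch.arange(1, seq, 2))  # mask odd positions this step
print(masked.isinf().sum())                             # half the scores excluded
```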

Internship Experience

  • Baidu, Research Intern, Mar. 2025 - Oct. 2025
  • Ant Group, Research Intern, May 2024 - Oct. 2024

Awards & Honors

  • National Scholarship, First-Class Academic Scholarship, etc.
  • ACM/ICPC Asia Regional Contest, Silver Medal (x2)
  • National First Prize, University Student Software Innovation Competition (OPPO Cup)