Jitai Hao (郝继泰)

About Me

My research interests include Efficient AI/LLMs and unified multimodal understanding & generation models. I am currently a Ph.D. student at Harbin Institute of Technology (Shenzhen), supervised by Prof. Jun Yu, and I also work closely with Prof. Qiang Huang. Before that, I obtained my Bachelor's and Master's degrees in Computer Science from Shandong University in 2022 and 2025, respectively, under the supervision of Prof. Zhaochun Ren.

News

[Feb 2026] Our new paper Uni-X, on mitigating modality conflict in unified multimodal models, has been accepted to ICLR 2026 (Poster)!

[Sep 2025] Our new paper on efficient knowledge distillation for LLMs has been accepted to NeurIPS 2025 (Spotlight)! We propose Low-Rank Clone (💖LRC💖), an efficient pretraining method for SLMs. With only about 10B-20B training tokens, LRC matches or even surpasses SOTA models trained on trillions of tokens.

Publications

DeltaKV: Residual-Based KV Cache Compression via Long-Range Similarity

Jitai Hao, Qiang Huang, Yaowei Wang, Min Zhang, Jun Yu
arXiv 2026
TL;DR: Motivated by the pronounced long-range similarity we observe in KV representations, we propose DeltaKV, which encodes each semantic representation as a residual relative to a historical reference. This reduces KV memory to 29% without discarding any tokens, stays near-lossless on tasks such as SCBench and AIME, and delivers a 2× throughput gain.
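For intuition, here is a minimal sketch of the residual-encoding idea (my illustrative simplification, not the released DeltaKV implementation): a few anchor tokens are kept in full precision, every other token is matched to its most similar anchor, and only the residual is stored. The anchor selection, function names, and the uniform-quantization stand-in are all my assumptions.

```python
# Toy sketch: residual-based KV encoding relative to long-range references.
# Assumption: savings come from residuals being small, hence cheap to quantize.
import torch

def encode_residual(keys: torch.Tensor, num_anchors: int = 8):
    """keys: [T, d]. Keep evenly spaced anchors in full precision; encode
    every token as (nearest anchor id, quantized residual to that anchor)."""
    T, d = keys.shape
    anchor_ids = torch.linspace(0, T - 1, num_anchors).long()
    anchors = keys[anchor_ids]                 # [A, d], full precision
    ref = torch.cdist(keys, anchors).argmin(dim=1)  # long-range similarity match
    residuals = keys - anchors[ref]            # small if similarity holds
    # Stand-in compression: 4-bit-style uniform quantization of residuals.
    scale = residuals.abs().amax(dim=1, keepdim=True).clamp(min=1e-6) / 7
    q = (residuals / scale).round().clamp(-8, 7).to(torch.int8)
    return anchors, ref, q, scale

def decode_residual(anchors, ref, q, scale):
    return anchors[ref] + q.float() * scale

keys = torch.randn(128, 64)
rec = decode_residual(*encode_residual(keys))
print((rec - keys).abs().mean())  # small reconstruction error, no token dropped
```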

Uni-X: Mitigating Modality Conflict with a Two-End-Separated Architecture for Unified Multimodal Models

Jitai Hao*, Hao Liu*, Xinyan Xiao, Qiang Huang, Jun Yu
ICLR 2026 (Poster)
TL;DR: Uni-X adopts an X-shaped "two ends separated, middle shared" architecture: modality-specific paths at the input and output ends mitigate gradient conflicts in unified multimodal models, while the central trunk stays shared. A 3B Uni-X matches or surpasses 7B models, reaching 82.0 on GenEval.
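A schematic sketch of that layout, as I read it from the TL;DR (not the released code; layer counts and class names are illustrative): the first and last blocks are duplicated per modality, and only the middle blocks see gradients from both modalities.

```python
# Toy X-shaped "two-end-separated" model: modality-specific ends, shared trunk.
import torch
import torch.nn as nn

def block(d):  # stand-in for a transformer layer
    return nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)

class UniXSketch(nn.Module):
    def __init__(self, d=256, n_end=2, n_shared=4):
        super().__init__()
        self.txt_in = nn.Sequential(*[block(d) for _ in range(n_end)])
        self.img_in = nn.Sequential(*[block(d) for _ in range(n_end)])
        self.shared = nn.Sequential(*[block(d) for _ in range(n_shared)])
        self.txt_out = nn.Sequential(*[block(d) for _ in range(n_end)])
        self.img_out = nn.Sequential(*[block(d) for _ in range(n_end)])

    def forward(self, x, modality: str):
        h = (self.txt_in if modality == "text" else self.img_in)(x)
        h = self.shared(h)  # the only place both modalities' gradients mix
        return (self.txt_out if modality == "text" else self.img_out)(h)

model = UniXSketch()
print(model(torch.randn(1, 8, 256), "image").shape)  # torch.Size([1, 8, 256])
```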

A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone

Jitai Hao, Qiang Huang, Hao Liu, Xinyan Xiao, Zhaochun Ren, Jun Yu
NeurIPS 2025 (Spotlight)
TL;DR: We propose Low-Rank Clone (LRC), a method that significantly improves model training efficiency. With only about 10B-20B training tokens, LRC matches or even surpasses SOTA models such as Qwen3 and Llama3, which are trained on trillions of tokens.
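A toy sketch of the weight-clone idea as I understand it from the TL;DR (my reading, not the paper's training recipe): a trainable low-rank projector compresses a frozen teacher weight into a smaller student weight, and training aligns the student's activations with projected teacher activations. The single-layer setup, dimensions, and loss are illustrative only.

```python
# Toy single-layer "low-rank clone": project teacher weights into a smaller
# student and train the projector to match teacher activations.
import torch
import torch.nn as nn

d_t, d_s = 1024, 256                      # teacher / student hidden sizes
W_t = torch.randn(d_t, d_t) / d_t**0.5    # frozen teacher layer weight

P = nn.Parameter(torch.randn(d_s, d_t) / d_t**0.5)  # learned projector
opt = torch.optim.Adam([P], lr=1e-3)

for step in range(200):
    x_t = torch.randn(32, d_t)            # teacher-side inputs
    x_s = x_t @ P.T                       # project inputs into student space
    W_s = P @ W_t @ P.T                   # "clone" the teacher weight, low-rank
    y_s = x_s @ W_s.T                     # student activations
    y_t = x_t @ W_t.T                     # teacher activations
    loss = (y_s - y_t @ P.T).pow(2).mean()  # activation-alignment loss
    opt.zero_grad(); loss.backward(); opt.step()

print(loss.item())  # alignment error shrinks as the projector is learned
```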

OmniKV: Dynamic Context Selection for Efficient Long-Context LLMs

Jitai Hao*, Yuke Zhu*, Tian Wang, Jun Yu, Xin Xin, Bo Zheng, Zhaochun Ren, Sheng Guo
ICLR 2025
TL;DR: Building on our observation of inter-layer attention similarity, OmniKV dynamically selects the most important context tokens, significantly improving the efficiency and performance of LLMs on long-context tasks while reducing computational cost.
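A simplified sketch of that selection step (my paraphrase of the TL;DR, not the paper's code; the "filter layer" framing and names are my assumptions): one early layer scores all context tokens once, and deeper layers reuse that top-k subset instead of re-scoring, exploiting the similarity of attention across layers.

```python
# Toy inter-layer context selection: score once, reuse the subset downstream.
import torch

def select_context(q_filter, k_filter, k_cache_per_layer, top_k=64):
    """q_filter: [d] query at the filter layer; k_filter: [T, d] its keys;
    k_cache_per_layer: list of [T, d] key caches for deeper layers."""
    scores = k_filter @ q_filter                        # [T] one-pass importance
    idx = scores.topk(min(top_k, scores.numel())).indices
    # Deeper layers attend only to the tokens this layer selected.
    return [k[idx] for k in k_cache_per_layer], idx

T, d = 4096, 64
ks = [torch.randn(T, d) for _ in range(4)]
pruned, idx = select_context(torch.randn(d), torch.randn(T, d), ks)
print(pruned[0].shape, idx.shape)  # torch.Size([64, 64]) torch.Size([64])
```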

MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter

Jitai Hao, Weiwei Sun, Xin Xin, Qi Meng, Zhumin Chen, Pengjie Ren, Zhaochun Ren
ACL 2024
TL;DR: MEFT is a memory-efficient fine-tuning method. It introduces a sparse adapter that reduces memory usage during fine-tuning, making it more feasible and efficient to fine-tune large models.
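An illustrative sparse-adapter forward pass (my simplification of the idea, not MEFT's implementation): a wide adapter whose activations are kept top-k sparse, so only a few rows of the second matrix contribute per token; the memory-offloading machinery the method would rely on is omitted here.

```python
# Toy sparse adapter: wide bottleneck with top-k activation sparsity.
import torch
import torch.nn as nn

class SparseAdapter(nn.Module):
    def __init__(self, d=512, r=4096, top_k=32):
        super().__init__()
        self.down = nn.Linear(d, r, bias=False)   # wide, mostly inactive
        self.up = nn.Linear(r, d, bias=False)
        self.top_k = top_k

    def forward(self, x):                          # x: [B, d]
        h = torch.relu(self.down(x))               # [B, r], mostly near zero
        vals, idx = h.topk(self.top_k, dim=-1)     # keep only top-k activations
        h_sparse = torch.zeros_like(h).scatter(-1, idx, vals)
        return x + self.up(h_sparse)               # residual adapter output

x = torch.randn(2, 512)
print(SparseAdapter()(x).shape)  # torch.Size([2, 512])
```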

Multi-Defendant Legal Judgment Prediction via Hierarchical Reasoning

Yougang Lyu*, Jitai Hao*, Zihan Wang, Kai Zhao, Shen Gao, Pengjie Ren, Zhumin Chen, Fang Wang, Zhaochun Ren
EMNLP 2023 Findings
TL;DR: This paper studies legal judgment prediction for cases involving multiple defendants, using hierarchical reasoning to capture the complex entity relationships and logical chains within a case and improve prediction accuracy.

Projects

Sparse-vLLM: A Sparse-First Inference Framework for Long-Context LLMs

A unified sparse inference engine supporting physical eviction, logical masking, and hybrid compression
TL;DR: Sparse-vLLM is a sparse-first inference framework I built for long-context LLMs. Rather than layering sparse methods on top of a conventional KV cache, it redesigns the cache layout, controller flow, and kernels from the ground up, so that methods such as SnapKV, PyramidKV, OmniKV, QuEST, and DeltaKV can be integrated and compared efficiently within one engine.
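To make the "one engine, many methods" idea concrete, here is a hypothetical sketch of what a unified sparsity-policy contract could look like. The interface and class names are my invention for illustration, not Sparse-vLLM's actual API; the point is that eviction-style methods (e.g., SnapKV, PyramidKV) and masking-style methods (e.g., OmniKV, QuEST) reduce to one plan.

```python
# Hypothetical unified policy contract: every method returns what to free
# physically ("evict") and what to hide logically ("mask") under a budget.
from abc import ABC, abstractmethod
import torch

class SparsePolicy(ABC):
    @abstractmethod
    def plan(self, attn_scores: torch.Tensor, budget: int) -> dict:
        """Return {'evict': token ids to free physically,
                   'mask':  token ids to hide logically}."""

class EvictLowScore(SparsePolicy):
    """Eviction-style (SnapKV-like): permanently free low-scoring tokens."""
    def plan(self, attn_scores, budget):
        keep = attn_scores.topk(budget).indices
        all_ids = torch.arange(attn_scores.numel())
        evict = all_ids[~torch.isin(all_ids, keep)]
        return {"evict": evict, "mask": evict.new_empty(0)}

class MaskLowScore(SparsePolicy):
    """Masking-style (OmniKV-like): hide tokens this step, keep them recoverable."""
    def plan(self, attn_scores, budget):
        keep = attn_scores.topk(budget).indices
        all_ids = torch.arange(attn_scores.numel())
        mask = all_ids[~torch.isin(all_ids, keep)]
        return {"evict": mask.new_empty(0), "mask": mask}

scores = torch.rand(1024)
print(EvictLowScore().plan(scores, 256)["evict"].shape)  # torch.Size([768])
```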

Internship Experience

  • Baidu (Mar. 2025 - Oct. 2025), Research Intern
  • Ant Group (May 2024 - Oct. 2024), Research Intern

Awards & Honors

  • National Scholarship, First-Class Academic Scholarship, etc.
  • ACM/ICPC Asia Regional Contest, Silver Medal (×2)
  • National First Prize, University Student Software Innovation Competition (OPPO Cup)