Publications
Mixture of Hidden-Dimensions Transformer
Yilong Chen, Junyuan Shang, Zhenyu Zhang, Jiawei Sheng, Tingwen Liu, Shuohuan Wang, Yu Sun, Hua Wu, Haifeng Wang
ICML'25 | International Conference on Machine Learning
pdf | abstract | cite
                    
Inner Thinking Transformer: Leveraging Dynamic Depth Scaling to Foster Adaptive Internal Thinking
Yilong Chen, Junyuan Shang, Zhenyu Zhang, Yanxi Xie, Jiawei Sheng, Tingwen Liu, Shuohuan Wang, Yu Sun, Hua Wu, Haifeng Wang
ACL'25 | Annual Meeting of the Association for Computational Linguistics
pdf | abstract | cite
                    
DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion
Yilong Chen∗, Linhao Zhang∗, Junyuan Shang, Zhenyu Zhang, Tingwen Liu, Shuohuan Wang, Yu Sun
(* = Equal Contribution)
NeurIPS'24 | Annual Conference on Neural Information Processing Systems
pdf | abstract | video | cite
                     
                     
                 
LEMON: Reviving Stronger and Smaller LMs from Larger LMs with Linear Parameter Fusion
Yilong Chen, Junyuan Shang, Zhenyu Zhang, Shiyao Cui, Tingwen Liu, Shuohuan Wang, Yu Sun, Hua Wu
ACL'24 Oral | Annual Meeting of the Association for Computational Linguistics
pdf | abstract | video | cite
                 
NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time
Yilong Chen∗, Guoxia Wang∗, Junyuan Shang, Shiyao Cui, Zhenyu Zhang, Tingwen Liu, Shuohuan Wang, Yu Sun, Dianhai Yu, Hua Wu
(* = Equal Contribution)
ACL'24 | Annual Meeting of the Association for Computational Linguistics
pdf | abstract | video | cite
                    
MoR: Mixture of Ranks for Low-Rank Adaptation Tuning
Chuanyu Tang∗, Yilong Chen∗, Zhenyu Zhang, Junyuan Shang, Wenyuan Zhang, Yong Huang, Tingwen Liu
(* = Equal Contribution)
arXiv | Preprint
pdf | abstract | cite
                    
Improving Adaptive Knowledge Graph Construction via Large Language Models with Multiple Views
Yilong Chen, Shiyao Cui, Kun Huang, Shicheng Wang, Chuanyu Tang, Tingwen Liu, Binxing Fang
CCKS'23 | China Conference on Knowledge Graph and Semantic Computing
pdf | abstract | cite
                    
FFT: Towards Harmlessness Evaluation and Analysis for LLMs with Factuality, Fairness, Toxicity
Shiyao Cui, Zhenyu Zhang, Yilong Chen, Wenyuan Zhang, Tianyun Liu, Siqi Wang, Tingwen Liu
KDD'24 Workshop | ACM SIGKDD Conference on Knowledge Discovery and Data Mining
pdf | abstract | cite
                    
Talks
LEMON: Reviving Stronger and Smaller LMs from Larger LMs with Linear Parameter Fusion
As smaller language models draw increasing attention for cost-effective and flexible deployment, LEMON strengthens them by fusing parameters from larger models to create strong initialization points for training. The approach uses layer and dimension operators with controllable receptive fields to transform parameters to smaller scales; a sketch of the dimension-operator idea follows below.
CentralWorld, Bangkok, Thailand
video
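The dimension operator can be pictured as a pair of learnable projections that compress a larger weight matrix from both sides. Below is a minimal PyTorch sketch of that idea under my own naming; it illustrates linear parameter fusion in spirit and is not LEMON's actual code.

    import torch
    import torch.nn as nn

    class DimensionFusion(nn.Module):
        """Compress a (d_large x d_large) weight to (d_small x d_small)
        via learnable side projections (hypothetical illustration)."""
        def __init__(self, d_large: int, d_small: int):
            super().__init__()
            self.P_out = nn.Parameter(torch.randn(d_small, d_large) / d_large ** 0.5)
            self.P_in = nn.Parameter(torch.randn(d_large, d_small) / d_large ** 0.5)

        def forward(self, W_large: torch.Tensor) -> torch.Tensor:
            # (d_small, d_large) @ (d_large, d_large) @ (d_large, d_small)
            return self.P_out @ W_large @ self.P_in

    # Usage: turn a weight from a larger checkpoint into an initialization
    # for the corresponding layer of a smaller model.
    fusion = DimensionFusion(d_large=4096, d_small=1024)
    W_large = torch.randn(4096, 4096)   # weight taken from the larger LM
    W_small = fusion(W_large)           # strong starting point for training
    print(W_small.shape)                # torch.Size([1024, 1024])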
General Information Extraction based on Instruction Fine-tuning LLMs (in Chinese)
An introduction to my work on information extraction (IE) via instruction fine-tuning of LLMs. The resulting IE models show strong performance and generalization, providing a foundation for constructing knowledge graphs; a sketch of the data format follows below.
University of Chinese Academy of Sciences, Beijing, China
video
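As a rough picture of the training setup, here is what a single instruction-format IE sample might look like; the field names and the triple format are my own illustrative assumptions, not the talk's actual data schema.

    # Hypothetical instruction-tuning sample for relational triple extraction.
    ie_sample = {
        "instruction": (
            "Extract all (head entity, relation, tail entity) triples "
            "from the input sentence."
        ),
        "input": "Marie Curie won the Nobel Prize in Physics in 1903.",
        "output": "(Marie Curie, won, Nobel Prize in Physics)",
    }

    # Swapping only the instruction reuses the same schema for other IE
    # tasks (NER, event extraction), which is what lets one instruction-tuned
    # model generalize and serve as a basis for KG construction.
    prompt = f"{ie_sample['instruction']}\n\nInput: {ie_sample['input']}\nOutput:"
    print(prompt)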
Competitions
                        Champion in Instruction-driven Adaptive Knowledge Graph Construction (1/664)
                        CCKS'23 | Sponsored by Chinese Information Processing Society of China (CIPS)
A competition on the Tianchi platform for adaptive knowledge graph construction and inductive relation reasoning using large language models such as ChatGPT.
                           Champion in Evaluation of Large Language Models (1/481)
                           LIC'23 | Sponsored by China Computer Federation (CCF) and Chinese Information Processing Society of China (CIPS)
Proposed a harmlessness evaluation system covering factuality, fairness, and toxicity, used it to evaluate the multi-dimensional safety of LLMs, and provided improvement suggestions.
Honours and Awards
                        Graduate National Scholarship (Top 1%)
Awarded annually to 10,000 PhD students in China
                        Recognizes graduate students for outstanding academic and research achievements.
                    
                        Zhu Li Yuehua Scholarship (Top 1%)
                        Established to support and reward outstanding graduate students in China
                        Recognizes academic excellence, research potential, and contributions to society.
                    
                        Excellent Graduation Project of Beijing University of Posts and Telecommunications (Top 0.5%)
Data Mining for Controversy Detection, 2022. Advised by Dr. Kleomenis Katevas.
Combined RoBERTa with a CNN to build three classification models and proposed a semi-supervised controversy-detection application to streamline data annotation; a sketch of the model follows below.
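A minimal sketch of such a RoBERTa-plus-CNN classifier, assuming the Hugging Face transformers library and hyperparameters of my own choosing rather than the project's actual configuration:

    import torch
    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer

    class RobertaCNNClassifier(nn.Module):
        """RoBERTa encoder + 1-D CNN over token states (illustrative)."""
        def __init__(self, num_classes=2, channels=128, kernel_size=3):
            super().__init__()
            self.encoder = AutoModel.from_pretrained("roberta-base")
            hidden = self.encoder.config.hidden_size
            self.conv = nn.Conv1d(hidden, channels, kernel_size,
                                  padding=kernel_size // 2)
            self.classifier = nn.Linear(channels, num_classes)

        def forward(self, input_ids, attention_mask):
            h = self.encoder(input_ids=input_ids,
                             attention_mask=attention_mask).last_hidden_state
            h = torch.relu(self.conv(h.transpose(1, 2)))  # (B, C, T)
            h = h.max(dim=2).values                       # max-pool over tokens
            return self.classifier(h)                     # (B, num_classes)

    tokenizer = AutoTokenizer.from_pretrained("roberta-base")
    batch = tokenizer(["Is this topic controversial?"], return_tensors="pt")
    logits = RobertaCNNClassifier()(batch["input_ids"], batch["attention_mask"])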
                    
                        Beijing Outstanding Graduate (Top 5%)
Awarded in 2022 by the Beijing Education Commission
                        Ranked among the top 5% of all graduates in Beijing based on comprehensive evaluation.
                    