Yannqi's homepage

Ph.D. Student @ Institute of Automation, Chinese Academy of Sciences • yangqi2021@ia.ac.cn • Beijing, China



I am currently a Ph.D. student at the Institute of Automation, Chinese Academy of Sciences (CASIA), under the guidance of Prof. Shiming Xiang. I received my bachelor’s degree from the University of Electronic Science and Technology of China (UESTC) under the supervision of Prof. Lu Yang.

My research focuses on building intelligent multimodal systems. I am particularly interested in:

  • Multimodal Large Language Models – reasoning, auto-thinking, and retrieval-augmented generation
  • Audio-Visual Learning – segmentation, generation, and cross-modal understanding
  • Visual Semantic Segmentation – open-vocabulary, continual, and remote-sensing segmentation
  • AIGC – image, audio, and video generation

I am always open to collaboration and discussion. If you are interested in my research, please feel free to contact me via email.

News

Mar 12, 2026 Congratulations! Our paper RCTS: Re-ranking Reasoning Context with Tree Search Makes Large Vision-Language Models Stronger has been selected as an ICML 2025 Spotlight paper! 😮
Mar 12, 2026 Congratulations! Our paper R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs has been accepted by CVPR!
Aug 28, 2025 Our paper R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs is now available on arXiv! R-4B achieves state-of-the-art results across 25 benchmarks.
May 21, 2025 Our large-model technical report Hunyuan-TurboS is now available; the model ranks in the top 7 on LMSYS Chatbot Arena!
Sep 06, 2024 Our paper Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis is now available on arXiv!
Apr 06, 2024 Congratulations! Our paper Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation has been selected as a CVPR 2024 Highlight paper! 😮
Feb 29, 2024 Congratulations! Our paper Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation has been accepted to CVPR 2024! 😄

Selected Publications

  1. Continual Semantic Segmentation via Scalable Contrastive Clustering and Background Diversity
    Qi Yang, Xing Nie, Linsu Shi, and 3 more authors
    In 2023 IEEE International Conference on Data Mining (ICDM), 2023
  2. Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
    Qi Yang, Xing Nie, Tong Li, and 5 more authors
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
    CVPR 2024 Highlight
  3. Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis
    Qi Yang, Binjie Mao, Zili Wang, and 6 more authors
    2024
  4. R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning
    Qi Yang, Bolin Ni, Shiming Xiang, and 3 more authors
    2025
  5. Re-ranking Reasoning Context with Tree Search Makes Large Vision-Language Models Stronger
    Qi Yang, Chenghao Zhang, Lubin Fan, and 3 more authors
    2025