Hello, I'm Weiming Ren

I am a third-year PhD student at the Cheriton School of Computer Science, University of Waterloo, supervised by Prof. Wenhu Chen. I am also affiliated with the Vector Institute. I have previously interned at Meta Monetization GenAI, Samsung AI Center Toronto, and 01.ai.

My research focuses on developing foundation models for perceiving and generating multimodal content. In particular, I work on pre-training and post-training of multimodal large language models (MLLMs) for image and video understanding, as well as on diffusion-based generative models for image/video generation and editing.

Before my PhD, I obtained an MSc in Applied Computing from the Department of Computer Science, University of Toronto. I received my bachelor's degrees from the Beijing Institute of Technology and the Australian National University.

My Email: w2ren [at] uwaterloo [dot] ca

News

  • Dec 1, 2025: We released TUNA, a native unified multimodal model for image/video understanding and generation.
  • Sep 18, 2025: Pixel Reasoner has been accepted to NeurIPS 2025.
  • Jun 26, 2025: Vamba has been accepted to ICCV 2025.
  • Feb 26, 2025: VISTA has been accepted to CVPR 2025.
  • Jan 22, 2025: OmniEdit has been accepted to ICLR 2025.

Selected Publications

* indicates equal contribution. For a full list of publications, please visit my Google Scholar profile.

TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models

Zhiheng Liu*, Weiming Ren*, Haozhe Liu, Zijian Zhou, Shoufa Chen, Haonan Qiu, Xiaoke Huang, Zhaochong An, Fanny Yang, Aditya Patel, Viktar Atliha, Tony Ng, Xiao Han, Chuyan Zhu, Chenyang Zhang, Ding Liu, Juan-Manuel Perez-Rua, Sen He, Jürgen Schmidhuber, Wenhu Chen, Ping Luo, Wei Liu, Tao Xiang, Jonas Schult, Yuren Cong
arXiv Preprint

Unified models for image/video understanding and generation with unified visual representations.

Hallucination Score: Towards Mitigating Hallucinations in Generative Image Super-Resolution

Weiming Ren*, Raghav Goyal*, Zhiming Hu*, Tristan Aumentado-Armstrong*, Iqbal Mohomed, Alex Levinshtein
arXiv Preprint

Using MLLMs to detect hallucinations in generative image super-resolution outputs.

VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation

Wentao Ma*, Weiming Ren*, Yiming Jia, Zhuofeng Li, Ping Nie, Ge Zhang, Wenhu Chen
arXiv Preprint

A more robust and challenging long video understanding benchmark built on open-ended QA pairs.

Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers

Weiming Ren, Wentao Ma, Huan Yang, Cong Wei, Ge Zhang, Wenhu Chen
ICCV 2025

Hybrid Mamba-Transformer models for efficient hour-long video understanding.

VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation

Weiming Ren, Huan Yang, Jie Min, Cong Wei, Wenhu Chen
CVPR 2025

Generating high-quality video instruction-following data from low-quality video-caption pairs.

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Yubo Wang, Xueguang Ma, Ge Zhang, Yuansheng Ni, Abhranil Chandra, Shiguang Guo, Weiming Ren, Aaran Arulraj, Xuan He, Ziyan Jiang, Tianle Li, Max Ku, Kai Zhang, Alex Zhuang, Rongqi Fan, Xiang Yue, Wenhu Chen
NeurIPS 2024 Datasets and Benchmarks Track, Spotlight Presentation

A more robust and challenging version of the MMLU benchmark.

AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks

Max Ku*, Cong Wei*, Weiming Ren*, Huan Yang, Wenhu Chen
TMLR 2024, Reproducibility Certification

Tuning-free video editing based on image editing and image-to-video generation models.

ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation

Weiming Ren, Huan Yang, Ge Zhang, Cong Wei, Xinrun Du, Wenhao Huang, Wenhu Chen
TMLR 2024

Image-to-video generation models with enhanced visual consistency between the input image and the generated video.

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, Cong Wei, Botao Yu, Ruibin Yuan, Renliang Sun, Ming Yin, Boyuan Zheng, Zhenzhu Yang, Yibo Liu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen
CVPR 2024 Oral Presentation, Best Paper Finalist

One of the most widely used benchmarks for evaluating MLLMs.

Experiences

Meta Monetization GenAI, London
Research Scientist Intern - Jun. 2025 to Nov. 2025
Mentor: Dr. Jonas Schult
Samsung AI Center Toronto, Canada
Research Intern - Sep. 2024 to Mar. 2025
Mentors: Dr. Iqbal Mohomed, Dr. Alex Levinshtein
01.ai, Beijing
Research Intern - Aug. 2023 to Feb. 2025
Mentor: Dr. Huan Yang
Samsung AI Center Toronto, Canada
Research Intern - May 2022 to Apr. 2023
Mentor: Dr. Iqbal Mohomed

Education

University of Waterloo, Canada
Ph.D. in Computer Science - Sep. 2023 to present
University of Toronto, Canada
Master of Science in Applied Computing - Sep. 2021 to Dec. 2022
Australian National University, Australia
Bachelor of Advanced Computing (Honours) - Jul. 2019 to Jun. 2021
Beijing Institute of Technology, China
Bachelor of Science in Computer Science and Technology - Sep. 2017 to Jun. 2021