I am a Robotics MSc student at the Swiss Federal Institute of Technology Lausanne (EPFL), currently doing my master’s thesis at ETH Zurich (with co-supervision from EPFL) under Prof. Marco Hutter and Prof. Caglar Gulcehre. I am broadly interested in problems related to robotic perception, reasoning, and actuation, and I have particularly enjoyed research projects in 3D vision, robot control and learning, and multimodal learning.
I have had the pleasure of working in the labs of Prof. Huijing Zhao at Peking University on 3D perception, Prof. Min Xu at Carnegie Mellon University on Cryo-ET generative models, and Prof. Mathieu Salzmann at EPFL on neural network quantization for object detection and pose estimation.
I was a research intern at ByteDance AI Lab working on multimodal representation learning, and a student researcher at Google DeepMind working on extending Vision Language Models with an ink modality.
I enjoy board games 🎲, soccer ⚽, tennis 🎾, and music 🎶. Feel free to reach out to me if you want to join in on a hike or play some board games.
Robotics, from 2021
Swiss Federal Institute of Technology Lausanne
Robotics Summer School, July 2022
Swiss Federal Institute of Technology Zurich
Summer Session, 2018
University of California, Berkeley
B.Sc in Automation (ECE), 2017 - 2021
Beijing Institute of Technology
[Sept 2024] 🎉 I successfully defended my thesis with a grade of 6/6 at both ETH Zurich and EPFL. Many thanks to my supervisors and committee members for their invaluable feedback and support!
[April 2024] I began my master’s thesis at the Robotic Systems Lab and CLAIRE Lab.
[Feb 2024] 🌟 Completed my student researcher internship at Google Research. It was an incredible experience, filled with joy from learning and collaboration with a fantastic team.
[July 2023] 🔎 Actively seeking a PhD position starting late 2024 or early 2025, aiming to further my research in multimodal learning and machine learning.
[April 2023] 🎉 Thrilled to join Google Research as a student researcher, starting August 2023.
Our work aims to bridge the gap between images of handwriting and digital ink with a Vision Language Model (PaLI). To our knowledge, this is the first work that effectively does so with arbitrary photos with diverse visual characteristics and backgrounds. Furthermore, it generalizes beyond its training domain and can work on simple sketches. Human evaluation reveals that 87% of the samples produced by our model on the challenging HierText dataset are considered valid tracings of the input image, and 67% look like pen trajectories traced by a human.
We introduce Modular Quantization-Aware Training (MQAT), an adaptive and mixed-precision quantization-aware training strategy that exploits the modular structure of modern 6D pose estimation architectures. MQAT guides a systematic gradated modular quantization sequence and determines module-specific bit precisions, leading to quantized models that outperform those produced by state-of-the-art uniform and mixed-precision quantization techniques.
This project investigates the effects of Sharpness-Aware Minimization (SAM) and Adaptive Sharpness-Aware Minimization (ASAM) on model generalization. Our experiments demonstrate that sharpness-aware optimization techniques notably improve generalization. In particular, ASAM shows promise in improving performance on un-normalized data.
Some of my projects, enjoy!
autograd
Download full CV