I hold an MSc in Robotics from the Swiss Federal Institute of Technology Lausanne (EPFL), where I completed my thesis at ETH Zurich under Prof. Marco Hutter and Prof. Caglar Gulcehre of EPFL. My interests include robotics perception, reasoning, and actuation, with a focus on 3D vision, robot control/learning, and multimodal learning.
I had the pleasure of working in the labs of Prof. Huijing Zhao at Peking University on 3D perception, Prof. Min Xu at Carnegie Mellon University on generative models for Cryo-ET, and Prof. Mathieu Salzmann at EPFL on neural network quantization for object detection and pose estimation.
I was a research intern at ByteDance AI Lab, working on multimodal representation learning, and a student researcher at Google DeepMind, working on extending a Vision-Language Model with an ink modality.
I enjoy board games 🎲, soccer ⚽, tennis 🎾, and music 🎶. Feel free to reach out to me if you want to join in on a hike or play some board games.
Robotics, 2021 - 2024
Swiss Federal Institute of Technology Lausanne
Robotics Summer School, July 2022
Swiss Federal Institute of Technology Zurich
Summer Session, 2018
University of California, Berkeley
[Nov 2024] 🎉 Excited to share that our paper MQAT (Modular Quantization-Aware Training for 6D Object Pose Estimation) has been accepted to Transactions on Machine Learning Research [link].
[Oct 2024] 🎉 Excited to share that our project, InkSight, has been featured across multiple platforms: Google Research Blog, LinkedIn, X post, Hugging Face (AK’s post), and Hacker News.
[Sept 2024] 🎉 Successfully defended my thesis with the grade of 6.0/6.0 at both ETH Zurich and EPFL. Grateful to my supervisors and committee members for their invaluable feedback and support!
[April 2024] I began my master’s thesis at the Robotic Systems Lab and CLAIRE Lab.
[Feb 2024] 🌟 Completed my student researcher internship at Google Research. It was an incredible experience, full of learning and collaboration with a fantastic team.
Download full CV
Thesis Title: Continuous Skill Learning For ANYmal Robot
Supervisors: Chenhao Li, Nikita Rudin, Skander Moalla*, Marco Hutter, Caglar Gulcehre* (hosting supervisors from CLAIRE, EPFL)
Edge applications, such as collaborative robotics and spacecraft rendezvous, demand efficient 6D object pose estimation on resource-constrained embedded platforms. Existing 6D object pose estimation networks are often too large for such deployments, necessitating compression while maintaining reliable performance. To address this challenge, we introduce Modular Quantization-Aware Training (MQAT), an adaptive and mixed-precision quantization-aware training strategy that exploits the modular structure of modern 6D object pose estimation architectures. MQAT guides a systematic gradated modular quantization sequence and determines module-specific bit precisions, leading to quantized models that outperform those produced by state-of-the-art uniform and mixed-precision quantization techniques. Our experiments showcase the generality of MQAT across datasets, architectures, and quantization algorithms. Additionally, we observe that MQAT quantized models can achieve an accuracy boost (>7% ADI-0.1d) over the baseline full-precision network while reducing model size by a factor of 4 or more.
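To give a feel for the core operation, here is a minimal NumPy sketch of generic symmetric fake quantization with per-module bit widths. This is an illustration of mixed-precision quantization-aware training in general, not MQAT's implementation; the module names and bit assignments are purely hypothetical.

```python
import numpy as np

def fake_quantize(w, bits):
    # Symmetric uniform fake quantization: quantize then dequantize,
    # so training sees the precision loss while staying in float.
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(w))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

# Hypothetical per-module bit plan: a backbone tolerating 4 bits,
# a pose head kept at 8 bits (illustrative numbers only).
rng = np.random.default_rng(0)
weights = {"backbone": rng.standard_normal(64), "head": rng.standard_normal(16)}
bit_plan = {"backbone": 4, "head": 8}
quantized = {name: fake_quantize(w, bit_plan[name]) for name, w in weights.items()}
```

In a quantization-aware training loop, the forward pass would use `quantized` while gradients update the underlying full-precision `weights` (the straight-through estimator); module-wise bit plans like `bit_plan` are what a mixed-precision scheme searches over.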
Our work aims to bridge the gap between images of handwriting and digital ink with a Vision Language Model (PaLI). To our knowledge, this is the first work that effectively does so with arbitrary photos with diverse visual characteristics and backgrounds. Furthermore, it generalizes beyond its training domain and can work on simple sketches. Human evaluation reveals that 87% of the samples produced by our model on the challenging HierText dataset are considered valid tracings of the input image, and 67% look like pen trajectories traced by a human.
This project investigates the effects of Sharpness-Aware Minimization (SAM) and Adaptive Sharpness-Aware Minimization (ASAM) on model generalization. Our experiments demonstrate that sharpness-aware optimization techniques notably enhance generalization. In particular, ASAM shows promise in improving performance on un-normalized data.
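For context, the SAM update can be sketched in a few lines: ascend to the worst-case point within a small L2 ball around the weights, then descend using the gradient taken there. This is a minimal NumPy illustration on a toy quadratic loss, not the project's code; the learning rate and `rho` values are arbitrary.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    # 1) Ascent: perturb weights toward the sharpness maximum
    #    within an L2 ball of radius rho.
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # 2) Descent: update the original weights with the gradient
    #    evaluated at the perturbed point.
    g_sharp = grad_fn(w + eps)
    return w - lr * g_sharp

# Toy loss L(w) = 0.5 * ||w||^2, whose gradient is simply w.
grad_fn = lambda w: w
w = np.array([3.0, -4.0])
for _ in range(100):
    w = sam_step(w, grad_fn)
```

ASAM differs by scaling the perturbation radius per parameter (adaptive sharpness), which makes the ascent step invariant to weight rescaling.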
Some of my projects, enjoy!
autograd