I’m currently a first-year PhD student at UCLA, advised by Prof. Kai-Wei Chang.

My research interests lie at the intersection of Computer Vision (CV) and Natural Language Processing (NLP), with the aim of equipping computers with the ability to understand and relate data across different modalities. Specifically, I am interested in the following topics:

  • Compositionality skills for multimodal generation and reasoning: Open-World Image/Video Captioning, Language-Conditioned Image Manipulation
  • Vision and Language representation learning for Embodied AI: Vision-and-Language Navigation (VLN), Robotic Manipulation
  • NLP for robotics (digital assistant): Instruction Following, Multimodal Dialogue

I’m looking for research internship opportunities for Summer 2023.