Project Archive
Yedi Luo

Welcome to my project archive. My name is Yedi Luo, and I'm currently a PhD student in the CSP group at Imperial College London. As a researcher and innovator, I am deeply passionate about generative AI, 3D reconstruction with neural radiance fields, and immersive VR/AR simulations.
Prior to my studies at Imperial, I obtained my BS and MS in Electrical Engineering at the University of Washington, Seattle, and conducted computer vision research at the Augmented Cognition Lab at Northeastern University in Boston. Beyond academic work, I have also worked in the automotive industry and in venture capital, focusing on AI and immersive technology.
This website showcases my past projects and publications from both academic and industry settings, organized by significance and level of interest. Please feel free to explore, and don't hesitate to reach out if you have any questions. I would be delighted to hear from you.
Selected Projects
VR Remote Driving System (US/CN Patent)
Keywords: Virtual Reality Streaming, Wireless Communication (WebRTC/RTMP/RTSP), Computer Vision, Computer Graphics, Robotics
Institute: University of Washington CoMotion Center, Jiangsu Yedi Electronics





The project's objective was to combine a Vive VR headset, a 360-degree camera, a physical steering wheel, a gas pedal, and a Wi-Fi/cellular module to enable remote control of a personal vehicle from within a reconstructed virtual environment. The system was designed to maximize VR immersion, convincingly simulating the experience of driving a real car: real-time video captured by the 360-degree camera was fed directly to the user. I implemented an innovative method for broadcasting 3D virtual environments over wireless links, adopting Web Real-Time Communication (WebRTC) in place of the traditional Real-Time Messaging Protocol (RTMP) to minimize latency. I also proposed a novel robotic arm design that synchronizes the user's body movements with the system, ensuring a more integrated and natural operating experience. This invention led to patents in both the United States and China.
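To give a flavor of the streaming design, here is a minimal sketch using the Python library aiortc as a stand-in (the patented system's actual stack is not shown here): the 360-degree camera feed is published as a WebRTC video track instead of being pushed to an RTMP relay, removing the ingest and repackaging hop that dominates RTMP's glass-to-glass latency. The camera device id and the signaling exchange are assumptions.

import asyncio
import cv2
from aiortc import RTCPeerConnection, VideoStreamTrack
from av import VideoFrame

class CameraTrack(VideoStreamTrack):
    """Exposes an OpenCV capture as a WebRTC video track."""
    def __init__(self, device=0):        # hypothetical 360-camera device id
        super().__init__()
        self.cap = cv2.VideoCapture(device)

    async def recv(self):
        pts, time_base = await self.next_timestamp()
        ok, bgr = self.cap.read()
        if not ok:
            raise ConnectionError("camera read failed")
        frame = VideoFrame.from_ndarray(bgr, format="bgr24")
        frame.pts, frame.time_base = pts, time_base
        return frame

async def publish():
    pc = RTCPeerConnection()
    pc.addTrack(CameraTrack())
    await pc.setLocalDescription(await pc.createOffer())
    return pc.localDescription.sdp       # exchanged with the headset client

sdp_offer = asyncio.run(publish())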
Patent Link
Digital Mirror Device-Based Automotive AR Projection System
(Jiangsu Innovation Competition Winner)
Keywords: Augmented Reality, Automotive Equipment, Human-Computer Interaction
Institute: Jiangsu Yedi Electronics





Mixed reality and electric vehicles have become increasingly prevalent in our daily lives. As electric cars evolve, traditional vehicle lighting systems are proving inadequate for the demands of future intelligent automotive technologies. At Jiangsu Yedi Electronics, we have developed an Automotive AR Projection System designed for passenger vehicles to address this emerging market need. This project focuses on three primary objectives:
1. Develop an automotive lamp that not only provides traditional illumination but also projects AR street signs directly onto the road.
2. Implement a high-precision Adaptive Driving Beam (ADB) function that upgrades the standard 84-pixel matrix-LED light block to a digital mirror device (DMD) solution with roughly 1.3 million pixels (a simplified sketch of the masking idea follows this list).
3. Design the hardware for compatibility with existing automotive platforms. Current customers include major manufacturers such as Segway, Suzuki, SAIC Motor, and BAIC Motor.
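The benefit of the pixel count is easiest to see in the ADB masking step. Below is a deliberately simplified Python sketch; the mirror-array resolution, the detector interface, and the dimming factor are all assumptions for illustration, not the production firmware.

import numpy as np

DMD_H, DMD_W = 1024, 1280  # assumed layout of a ~1.3-megapixel mirror array

def adb_mask(vehicle_boxes, dim=0.05):
    """Per-pixel intensity mask: dim only the regions covering other vehicles.

    vehicle_boxes: list of (x0, y0, x1, y1) in DMD pixel coordinates, e.g.
    projected from a camera-based vehicle detector.
    """
    mask = np.ones((DMD_H, DMD_W), dtype=np.float32)
    for x0, y0, x1, y1 in vehicle_boxes:
        mask[y0:y1, x0:x1] = dim  # suppress glare toward the detected vehicle
    return mask

# An 84-pixel matrix LED would blank whole beam segments; at DMD resolution
# the dimmed patch can hug the vehicle's outline.
mask = adb_mask([(600, 400, 760, 520)])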
Temporal-controlled Frame Swap for Generating High-Fidelity Stereo Driving Data for Autonomy Analysis (BMVC23 Accepted)
Keywords: Computer Vision, Simulation, Stereo Visual SLAM, Autonomous Navigation, Multi-modal Data
Institute: Northeastern University



This paper presents a novel approach, TeFS (Temporal-controlled Frame Swap), to generate synthetic stereo driving data for visual simultaneous localization and mapping (vSLAM) tasks. TeFS is designed to overcome the lack of native stereo vision support in commercial driving simulators, and we demonstrate its effectiveness using Grand Theft Auto V (GTA V), a high-budget open-world video game engine. We introduce GTAV-TeFS, the first large-scale GTA V stereo-driving dataset, containing over 88,000 high-resolution stereo RGB image pairs along with temporal information, GPS coordinates, camera poses, and full-resolution dense depth maps. GTAV-TeFS offers several advantages over other synthetic stereo datasets and enables the evaluation and enhancement of state-of-the-art stereo vSLAM models in GTA V's environment. We validate the quality of the stereo data collected with TeFS through a comparative analysis against conventional dual-viewport data from an open-source simulator. We also benchmark various vSLAM models using the challenging-case comparison groups included in GTAV-TeFS, revealing the distinct advantages and limitations inherent to each model. The goal of our work is to bring more high-fidelity stereo data from commercial-grade game simulators into the research domain and push the boundary of vSLAM models.
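For intuition only, the fragment below shows one plausible reading of the frame-swap idea; the simulator handle and its methods are placeholders rather than the released tooling. Because the world clock is frozen while the camera is offset, two successive single-viewport renders behave like a synchronized stereo pair.

def capture_stereo_pair(sim, baseline_m=0.54):
    sim.pause()                         # freeze world state (temporal control)
    left = sim.render()                 # left view at the current camera pose
    sim.translate_camera(x=baseline_m)  # swap the camera by the stereo baseline
    right = sim.render()                # right view of the identical world state
    sim.translate_camera(x=-baseline_m)
    sim.resume()
    return left, right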
Paper Link
Code Link
Depth-guided temporal diffusion (DgTDM) modeling for driving scene generation in style
Keywords: Computer Vision, Generative AI, Text-to-Video, Autonomous Navigation, Multi-modal Training
Institute: Northeastern University



Diffusion models are capable of generating high-quality images that adhere to a learned distribution. In this paper, we expand this image synthesis approach to the temporal domain for driving applications, addressing the limitations in data quality, diversity, and cost found in existing simulation methods. We introduce a novel open-source method called Depth-Guided Temporal Diffusion Model (DgTDM), which generates realistic long-duration driving videos using depth-guided diffusion models. The integration of depth guidance aids in maintaining video consistency by preserving the underlying spatial structures. Furthermore, we present DriveSceneDDM, a unique and comprehensive dataset of driving videos, complete with textual scene descriptions and dense depth maps. We evaluate our method using common video quality metrics and demonstrate that it can produce long, high-resolution, and temporally consistent driving videos. Compared to other open-source video diffusion models that utilize textual inputs, DgTDM achieves approximately 50% higher scores on common video quality analysis metrics, particularly in terms of image similarity and temporal consistency.
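To make the depth guidance concrete, the fragment below shows one common conditioning pattern, channel-wise concatenation of the aligned depth map with the noisy video latent; the tensor shapes and the denoiser interface are assumptions for illustration, not the DgTDM code itself.

import torch

def denoise_step(denoiser, z_t, depth, t, text_emb):
    # z_t:   (B, T, C, H, W) noisy latents for T video frames
    # depth: (B, T, 1, H, W) dense depth maps aligned with each frame
    x = torch.cat([z_t, depth], dim=2)       # depth joins the channel axis
    return denoiser(x, t, context=text_emb)  # predict per-frame noise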
An Evaluation Platform to Scope Performance of Synthetic Environments in Autonomous Ground Vehicles Simulation
(ICASSP23 Accepted)
Keywords: Computer Vision, Simulation, Stereo Visual SLAM, Autonomous Navigation, Multi-modal Data
Institute: Northeastern University

Evaluating autonomous ground vehicles requires assessing their mobility performance. Since autonomous vehicles are envisioned to make decisions in situations and environments too diverse to assess practically with physical testing alone, their development and evaluation will necessarily rely on simulation. These simulations must represent reality well enough to reproduce the decisions the vehicles would make in the real world. In this paper, we present our Scoping Autonomous Vehicle Simulation (SAVeS) platform for benchmarking the performance of simulated environments for autonomous ground vehicle testing.
Paper Link
Code Link
Multi-user Steerable VR Cycling System with 3D Virtual Environment
Keywords: Virtual Reality, Computer Graphics, 3D Modeling/Printing, Mobile Computing, Embedded System, Multiplayer Network
Institute: University of Washington




The primary goal of this project is to develop an indoor virtual reality (VR) cycling system that allows users to explore a virtual world while physically biking. The system supports up to 20 users simultaneously, offering both local and remote multiplayer experiences. It detects and synchronizes the real bike's steering and acceleration with its virtual counterpart, using an ultrasonic sensor mounted on a stationary base to measure the bike's speed. Key innovations include a newly designed, 3D-printed front-steering acquisition device that captures the bike's turning movements, and a custom-built virtual town created in Unity3D that serves as the immersive exploration environment.
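As a rough sketch of the speed-sensing link, one plausible setup is an ultrasonic ranger watching for the periodic distance dips caused by a reflector on the spinning rear wheel, with the dip rate converted to a linear speed. The serial port, threshold, and wheel circumference below are assumptions, not the deployed firmware.

import time
import serial  # pyserial

WHEEL_CIRCUMFERENCE_M = 2.1  # assumed wheel size
DIP_THRESHOLD_CM = 10.0      # reading drops below this as the reflector passes

def wheel_speed(port, window_s=2.0):
    """Count reflector passes within a window and return speed in m/s."""
    passes, below, start = 0, False, time.time()
    while time.time() - start < window_s:
        distance_cm = float(port.readline())  # one ASCII range reading per line
        if distance_cm < DIP_THRESHOLD_CM and not below:
            passes += 1                       # rising edge: one wheel revolution
            below = True
        elif distance_cm >= DIP_THRESHOLD_CM:
            below = False
    return passes * WHEEL_CIRCUMFERENCE_M / window_s

port = serial.Serial("/dev/ttyUSB0", 115200)  # hypothetical sensor link
print(f"{wheel_speed(port):.1f} m/s")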
Paper Link
iOS Augmented Reality - AR Art Swap
Keywords: Augmented Reality, Mobile Computing, Image Swap
Institute: University of Washington


Tired of your current wall art and unsure whether a new piece would complement your decor? Our augmented reality app, Art Swap AR, lets users instantly replace any rectangular painting with custom images of their choosing, in real time. The app supports both single and multiple painting detection. Optimized for iPhone 8 and later models, Art Swap AR can detect and swap up to five different-sized paintings simultaneously, with no prior training required.
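The core swap is easy to illustrate with OpenCV in Python (a stand-in only; the shipped app is native iOS): given the four corners of a detected painting, estimate a homography and warp the replacement image into place.

import cv2
import numpy as np

def swap_painting(frame, quad, new_art):
    """Warp new_art onto the painting whose corners are quad (4x2, clockwise)."""
    h, w = new_art.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    H = cv2.getPerspectiveTransform(src, np.float32(quad))
    size = (frame.shape[1], frame.shape[0])
    warped = cv2.warpPerspective(new_art, H, size)
    mask = cv2.warpPerspective(np.full((h, w), 255, np.uint8), H, size)
    frame[mask > 0] = warped[mask > 0]  # paste only inside the painting region
    return frame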
App Store Link
Optical Music Recognition with Robotic Piano Player
(UW News Spotlight)
Keywords: Robotics, 3D Modeling/Printing, Computer Vision, Music Notes Classification, BLE Remote Control, Human Robotic Interaction
Institute: University of Washington


The "Robot Piano Player" project features a sophisticated robotic hand, meticulously 3D modeled, printed, and assembled from scratch, designed to interactively perform music. Equipped with flex sensor-integrated gloves, this robotic system allows for precise remote control, enabling users to manipulate the piano keys from a distance. Additionally, this innovative robotic hand is integrated with Optical Music Recognition (OMR) capabilities. Unlike traditional Optical Character Recognition, OMR handles the complex layout of music scores by analyzing both the vertical arrangement of pitches and the horizontal sequencing of time on a two-dimensional plane. This advanced functionality allows the robotic hand to directly play music from scanned sheet music, demonstrating a seamless blend of electrical engineering and computer vision technology.
Video Demo
Poster Link
News Link
Unity Game - DashRunner
Keywords: Game Design, Unity, WebGL, Mobile Computing

DashRunner, a fast-paced arcade-style game, was developed with the Unity game engine for both desktop and mobile platforms. Players control the main character with simple swipes on a touchscreen or the directional keys on a keyboard. The objective is to collect all the gold and exit the maze within a set time limit. Each move is irreversible and must be decided under a time constraint, placing a premium on the player's speed and judgment. Wrong turns can leave the character eliminated by deadly enemies or traps. As players progress, they encounter new pickups, such as springs and invincibility pills, that enrich the gameplay. The game features 30 uniquely designed and fully developed mazes, each presenting its own challenges.
Game Demo Link
Remote-controlled Robotic Arm
Keywords: Robotics, Embedded System, EEG Concentration, Flex Sensor, BLE Remote Control
Institute: University of Washington


This project involves the development of a robotic arm with five major functions: playing rock-paper-scissors against the user, remote control via hand gestures, training, autonomous movement, and thought-driven simulation. Users select the desired function through specific gestures made with a specially designed glove.
To enable these capabilities, the system uses an Arduino Uno and an Arduino Mega as microcontrollers. Flex sensors attached to the user's index fingers capture gesture data, and Bluetooth Low Energy (BLE) handles communication. This integration of hardware and software allows for a versatile and interactive experience with the robotic arm.
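A hypothetical sketch of the mode selection follows; the quantization thresholds and the pairing of finger gestures to the five functions are invented for illustration.

def flex_level(adc, half=400, full=800):
    """Quantize a 10-bit flex-sensor reading into straight/half-bent/fully-bent."""
    return 0 if adc < half else (1 if adc < full else 2)

MODES = {
    (0, 0): "rock_paper_scissors",
    (0, 2): "gesture_control",
    (2, 0): "training",
    (2, 2): "autonomous_movement",
    (1, 1): "thought_driven_simulation",  # EEG concentration mode
}

def select_mode(left_adc, right_adc):
    return MODES.get((flex_level(left_adc), flex_level(right_adc)), "idle")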
Project Link
Gameduino 2 Game - Galaxy Fighter
Keywords: Arduino, Gameduino, Sprite Game Design, Embedded Systems
Institute: University of Washington

Galaxy Fighter is an arcade-style flight shooting game developed on the Gameduino 2 platform. In this game, players control an airplane using finger touch on the screen. The objective is to destroy all incoming enemies at each level. As players progress through the levels, the difficulty increases with enemies adopting more unpredictable movement patterns. Players are equipped with special abilities such as bombing and laser shooting, which serve as defensive mechanisms to enhance gameplay and strategy.
Selected by Gameduino Creator
Video Demo Link