The BEHAVIOR Challenge at NeurIPS 2025 invites researchers to tackle long-horizon, everyday household tasks in realistic virtual home environments. Supported by a large dataset of 10,000 richly annotated expert trajectories (over 1,200 hours), it aims to advance robot planning and control in complex, human-centric settings.
About Me
I’m a Software Engineer at the Stanford Vision & Learning Lab, where I focus on robotics research. My work centers on developing 3D robot simulations and curating datasets to advance robot learning.
Professional Experience
- **Stanford AI Lab** · Software Developer 2 · 01/2024 - present
- **Johnson & Johnson** · Robotics & Controls · 06/2023 - 09/2023
- **Northwestern Delta Lab** · Research Assistant · 03/2021 - 06/2022
Education
- **Northwestern University** · M.S. in Robotics · 09/2022 - 12/2023
- **Northwestern University** · B.S. with honors in Computer Science, summa cum laude · 09/2019 - 06/2022
Research
MoMaGen automatically generates diverse training datasets for bimanual mobile manipulation from a minimal number of human demonstrations, solving constrained optimization problems that ensure robot reachability and camera visibility.
We introduce the BEHAVIOR Robot Suite (BRS) for household mobile manipulation, featuring a bimanual wheeled robot with a 4-DoF torso that achieves the critical capabilities of coordination, navigation, and reachability. Our framework includes a cost-effective teleoperation interface and a novel algorithm for learning whole-body visuomotor policies.
BEHAVIOR-1K is a comprehensive simulation benchmark for human-centered robotics with 1,000 real-world tasks. Powered by NVIDIA's Omniverse, it features diverse scenes, objects, and activities with realistic rendering and physics simulation. This benchmark aims to advance embodied AI and robot learning research.
Past Projects
Enhancing human–robot collaboration with Omnid Mocobots, this work leverages imitation learning methods, including ACT and Diffusion Policy, to forecast human intent during shared mobile manipulation tasks.
A voice-controlled robot cooking assistant that integrates an LLM for recipe planning, CLIP for object detection, and MediaPipe with an LSTM for hand-gesture recognition. Users interact with a robot arm through Alexa commands to collaboratively prepare meals.
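A minimal sketch of the gesture-recognition piece, assuming per-frame MediaPipe hand landmarks are stacked into a short window and classified by a small LSTM (the `GestureLSTM` class and its sizes are illustrative, not the project's actual code):

```python
import mediapipe as mp
import numpy as np
import torch
import torch.nn as nn

class GestureLSTM(nn.Module):
    """Classify a sequence of hand-landmark frames into gesture classes."""
    def __init__(self, n_landmarks=21, hidden=64, n_classes=5):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_landmarks * 3, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                 # x: (batch, time, 63)
        _, (h, _) = self.lstm(x)          # h: (1, batch, hidden)
        return self.head(h[-1])           # logits: (batch, n_classes)

def landmarks_from_frame(hands, rgb_frame):
    """Extract one hand's 21 (x, y, z) landmarks from an RGB frame, or None."""
    result = hands.process(rgb_frame)
    if not result.multi_hand_landmarks:
        return None
    lm = result.multi_hand_landmarks[0].landmark
    return np.array([[p.x, p.y, p.z] for p in lm], dtype=np.float32).flatten()

# Usage sketch: accumulate ~30 frames of landmarks, then classify the window.
hands = mp.solutions.hands.Hands(max_num_hands=1)
model = GestureLSTM()
# window = torch.tensor(np.stack(frames))[None]   # (1, 30, 63)
# gesture_id = model(window).argmax(dim=-1)
```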
Developed from the ground up in C++ and simulated using ROS 2 and RViz, this feature-based EKF-SLAM system employs custom landmark-detection algorithms, enabling simultaneous mapping of the environment and precise self-localization.
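A condensed numpy sketch of the EKF predict/correct cycle such a system runs (the original is in C++; the state layout, unicycle motion model, and range-bearing measurement model here are standard choices assumed for illustration):

```python
import numpy as np

def predict(mu, Sigma, v, w, dt, Q):
    """EKF predict: propagate the robot pose (x, y, theta) with a unicycle model."""
    th = mu[2]
    mu = mu.copy()
    mu[0] += v * np.cos(th) * dt
    mu[1] += v * np.sin(th) * dt
    mu[2] += w * dt
    G = np.eye(len(mu))                        # motion Jacobian (landmarks are static)
    G[0, 2] = -v * np.sin(th) * dt
    G[1, 2] =  v * np.cos(th) * dt
    Sigma = G @ Sigma @ G.T
    Sigma[:3, :3] += Q                         # process noise on the robot pose only
    return mu, Sigma

def update(mu, Sigma, z, landmark_idx, R):
    """EKF correct: fuse a range-bearing measurement z = (r, phi) of one landmark."""
    i = 3 + 2 * landmark_idx
    dx, dy = mu[i] - mu[0], mu[i + 1] - mu[1]
    q = dx**2 + dy**2
    z_hat = np.array([np.sqrt(q), np.arctan2(dy, dx) - mu[2]])
    H = np.zeros((2, len(mu)))                 # measurement Jacobian
    H[:, :3] = np.array([[-dx / np.sqrt(q), -dy / np.sqrt(q),  0],
                         [ dy / q,          -dx / q,          -1]])
    H[:, i:i + 2] = np.array([[ dx / np.sqrt(q), dy / np.sqrt(q)],
                              [-dy / q,          dx / q]])
    K = Sigma @ H.T @ np.linalg.inv(H @ Sigma @ H.T + R)
    innov = z - z_hat
    innov[1] = np.arctan2(np.sin(innov[1]), np.cos(innov[1]))   # wrap bearing error
    return mu + K @ innov, (np.eye(len(mu)) - K @ H) @ Sigma
```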
By combining computer vision for tower and brick recognition, a custom MoveIt API for motion planning, and a fine-tuned MobileNet for hand detection, this solution empowers a Franka Emika Panda robot arm to play Jenga with finesse.
Merging dependency parsing with pre-trained language models through edge-conditioned graph convolutional networks, this approach attains 87.36% accuracy on IMDB binary sentiment analysis with fewer parameters than traditional methods, while producing sentence representations that capture both semantics and syntax.
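A simplified PyTorch sketch of the core idea, assuming each dependency edge carries a relation label whose embedding generates the message weights, in the spirit of edge-conditioned convolution (class and tensor names are illustrative):

```python
import torch
import torch.nn as nn

class EdgeConditionedGCNLayer(nn.Module):
    """Message passing over a dependency graph where each edge's weight matrix
    is generated from an embedding of its dependency relation."""
    def __init__(self, dim, n_relations):
        super().__init__()
        self.rel_emb = nn.Embedding(n_relations, 32)
        # Small MLP maps a relation embedding to a dim x dim message matrix.
        self.edge_net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, dim * dim))
        self.self_loop = nn.Linear(dim, dim)

    def forward(self, h, edge_index, edge_rel):
        # h: (n_nodes, dim); edge_index: (2, n_edges) as (src, dst); edge_rel: (n_edges,)
        src, dst = edge_index
        W = self.edge_net(self.rel_emb(edge_rel)).view(-1, h.size(1), h.size(1))
        msgs = torch.bmm(W, h[src].unsqueeze(-1)).squeeze(-1)   # one message per edge
        out = self.self_loop(h)
        out = out.index_add(0, dst, msgs)                       # aggregate messages at destination nodes
        return torch.relu(out)

# Usage sketch: h could be contextual token embeddings from a pre-trained LM;
# edge_index and edge_rel come from a dependency parse of the sentence.
```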
Trajectory planning and control were implemented for a KUKA youBot mobile manipulator (a 5R arm on a four-wheeled mecanum base) to execute an eight-segment pick-and-place routine, using feedforward control and an odometry-based kinematics simulation for precise object transfer.
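A numpy sketch of the feedforward-plus-PI task-space control law this style of controller uses, with the matrix-log and adjoint helpers written out for illustration (the gains, helper names, and twist ordering are assumptions, not the project's exact code):

```python
import numpy as np
from scipy.linalg import logm

def se3_to_vec(mat):
    """Extract the twist (w, v) from a 4x4 se(3) matrix."""
    return np.array([mat[2, 1], mat[0, 2], mat[1, 0], mat[0, 3], mat[1, 3], mat[2, 3]])

def adjoint(T):
    """Adjoint representation of an SE(3) transform for (w, v)-ordered twists."""
    R, p = T[:3, :3], T[:3, 3]
    p_hat = np.array([[0, -p[2], p[1]], [p[2], 0, -p[0]], [-p[1], p[0], 0]])
    Ad = np.zeros((6, 6))
    Ad[:3, :3] = R
    Ad[3:, :3] = p_hat @ R
    Ad[3:, 3:] = R
    return Ad

def feedforward_pi(X, Xd, Xd_next, Kp, Ki, err_int, dt):
    """Commanded end-effector twist: feedforward term plus PI feedback on the pose error."""
    Vd = se3_to_vec(logm(np.linalg.inv(Xd) @ Xd_next).real) / dt   # reference twist over one time step
    X_err = se3_to_vec(logm(np.linalg.inv(X) @ Xd).real)           # error twist from actual to desired pose
    err_int = err_int + X_err * dt
    V = adjoint(np.linalg.inv(X) @ Xd) @ Vd + Kp @ X_err + Ki @ err_int
    return V, err_int
```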
Optimizing monocular visual odometry through a region-of-interest (ROI) strategy for feature detection, this solution significantly cuts computational demands while preserving the accuracy usually achieved by processing whole image frames.
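A small OpenCV sketch of the idea: build a mask so the detector only searches the region of interest rather than the whole frame (the ROI bounds and detector settings are placeholder values):

```python
import cv2
import numpy as np

def detect_roi_features(gray, roi, max_corners=300):
    """Detect Shi-Tomasi corners only inside a rectangular ROI.

    gray: single-channel frame; roi: (x, y, w, h) placeholder bounds.
    """
    x, y, w, h = roi
    mask = np.zeros(gray.shape, dtype=np.uint8)
    mask[y:y + h, x:x + w] = 255                      # detector ignores pixels outside the ROI
    return cv2.goodFeaturesToTrack(gray, maxCorners=max_corners,
                                   qualityLevel=0.01, minDistance=7, mask=mask)

# Usage sketch within a VO loop: detect in the ROI, then track with optical flow.
# pts = detect_roi_features(prev_gray, roi=(100, 150, 440, 180))
# next_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
```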
Two swarm control algorithms were developed: one mimics the Brazil nut effect to spatially sort robots by size, and the other replicates Reynolds' flocking behavior to coordinate movement reminiscent of bird flocks—all using distributed control techniques.
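A compact numpy sketch of the three Reynolds rules (separation, alignment, cohesion) applied per agent using only local neighbors; the gains and neighborhood radius are illustrative:

```python
import numpy as np

def flocking_step(pos, vel, radius=2.0, k_sep=1.5, k_ali=1.0, k_coh=1.0, dt=0.1):
    """One distributed flocking update: each agent only sees neighbors within `radius`."""
    new_vel = vel.copy()
    for i in range(len(pos)):
        offsets = pos - pos[i]
        dists = np.linalg.norm(offsets, axis=1)
        nbrs = (dists < radius) & (dists > 0)
        if not nbrs.any():
            continue
        separation = -(offsets[nbrs] / dists[nbrs, None] ** 2).sum(axis=0)  # push away from close neighbors
        alignment = vel[nbrs].mean(axis=0) - vel[i]                         # match neighbors' heading
        cohesion = offsets[nbrs].mean(axis=0)                               # steer toward local center of mass
        new_vel[i] += dt * (k_sep * separation + k_ali * alignment + k_coh * cohesion)
    return pos + dt * new_vel, new_vel
```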
An autonomous quadrotor drone was built using IMU integration, PID control, joystick input, and Vive lighthouse positioning, resulting in a versatile platform capable of both manual and autonomous flight.
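A minimal sketch of the kind of PID loop used for attitude control on such a drone (the gains, integral limit, and loop rate are placeholders):

```python
class PID:
    """Basic PID controller with integral clamping, run at a fixed loop rate."""
    def __init__(self, kp, ki, kd, i_limit=50.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.i_limit = i_limit
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement, dt):
        error = setpoint - measurement
        self.integral = max(-self.i_limit, min(self.i_limit, self.integral + error * dt))
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Usage sketch: one controller per axis, fed by IMU-estimated angles each loop.
# roll_pid = PID(kp=4.0, ki=0.5, kd=0.08)
# roll_cmd = roll_pid.step(setpoint_roll, imu_roll, dt=0.004)
```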
Inspired by Mario Kart, a line-following motorcycle was built using a Raspberry Pi Pico W for image processing and a PIC microcontroller for precise steering control, complete with custom-designed PCBs — earning “Best Design” at the 2023 Northwestern Tech Cup.
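A sketch of the line-detection step, assuming the line appears bright against the track in a grayscale scanline (the threshold, scanline index, and steering gain are placeholder values):

```python
import numpy as np

def line_error(row, threshold=128):
    """Return the line's horizontal offset from image center, or None if no line is seen.

    row: 1-D array of grayscale pixel intensities from one camera scanline.
    """
    line_pixels = np.where(row > threshold)[0]        # columns brighter than the line threshold
    if line_pixels.size == 0:
        return None
    centroid = line_pixels.mean()
    return centroid - (len(row) / 2.0)                # signed error for the steering controller

# Usage sketch: the error feeds a proportional steering command sent to the PIC.
# err = line_error(frame[40])                         # scanline 40 of a grayscale frame
# steering = -0.8 * err if err is not None else 0.0
```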
A collaborative project in which two teams built the IMUGripulator, a 2-DOF robotic arm that combines IMU-based joint control with EMG-based gripper control. Built on micro:bit v2 microcontrollers and programmed in C, it uses a 9-DOF IMU for tilt-based movement and capacitive touch controls for sensitivity adjustment.
A physics simulation project that models a die bouncing inside a spinning cup, implementing Lagrangian dynamics for the six-degree-of-freedom system, including collision detection between the die's corners and the cup's edges, while accounting for gravitational and external forces to maintain the cup's position.
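For reference, the equations of motion follow the Euler-Lagrange formulation with external and impact forces on the right-hand side (a sketch of the general form, not the project's exact derivation):

$$
\frac{d}{dt}\frac{\partial L}{\partial \dot{q}} - \frac{\partial L}{\partial q} = F_{\text{ext}},
\qquad L(q, \dot{q}) = T(q, \dot{q}) - V(q),
$$

where $q$ collects the six configuration variables of the cup-and-die system, $T$ and $V$ are the kinetic and gravitational potential energies, and $F_{\text{ext}}$ gathers the impact forces and the external force that maintains the cup's position.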
Orchestration Scripts is a framework designed to foster effective work practices. It detects various workplace situations and delivers tailored strategies by abstracting organizational processes, structures, venues, and tools into computational concepts.
Enhancing facial expression detection for individual users, iExpressionNet employs a two-stage approach: a CNN is first trained on the general FER-2013 dataset and then fine-tuned on personalized expression data with the convolutional layers frozen, while OpenCV provides robust face detection.
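A PyTorch-style sketch of the stage-two personalization step, assuming a generic `base_cnn` with a convolutional `features` backbone and a single linear `classifier` head (these names and the training settings are illustrative):

```python
import torch
import torch.nn as nn

def personalize(base_cnn, user_loader, n_expressions=7, epochs=5, lr=1e-3):
    """Fine-tune only the classifier head on a user's own expression samples."""
    for p in base_cnn.features.parameters():     # freeze the convolutional backbone
        p.requires_grad = False
    base_cnn.classifier = nn.Linear(base_cnn.classifier.in_features, n_expressions)

    opt = torch.optim.Adam(base_cnn.classifier.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    base_cnn.train()
    for _ in range(epochs):
        for faces, labels in user_loader:        # face crops come from an OpenCV detector
            opt.zero_grad()
            loss = loss_fn(base_cnn(faces), labels)
            loss.backward()
            opt.step()
    return base_cnn
```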