Projects

Self-supervised Object Tracking and Segmentation
2019.12-2020.03      [Paper]  [Code]

Self-supervised learning for visual object tracking has valuable advantages over supervised learning, such as removing the need for laborious human annotation and enabling online training. In this work, we exploit an end-to-end Siamese network in a cycle-consistent self-supervised framework for object tracking. Self-supervision is obtained from the cycle consistency between forward and backward tracking. To better leverage the end-to-end learning of deep networks, we integrate a Siamese region proposal and mask regression network into our tracking framework, so that a fast and more accurate tracker can be learned without per-frame annotations.
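Below is a minimal sketch of the forward-backward cycle loss, assuming a PyTorch setup; `track_step` is only a stand-in for the Siamese network, and the box format and loss choice are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def track_step(prev_frame, box, next_frame):
    # Stand-in for the Siamese region-proposal / mask network: it would
    # predict the target box in next_frame; here it returns the input box
    # unchanged so the example runs end to end.
    return box

def cycle_consistency_loss(frames, init_box):
    """Track forward through the clip, then backward to the first frame,
    and penalize the drift between the returned box and the start box."""
    box = init_box
    for t in range(1, len(frames)):            # forward tracking
        box = track_step(frames[t - 1], box, frames[t])
    for t in range(len(frames) - 1, 0, -1):    # backward tracking
        box = track_step(frames[t], box, frames[t - 1])
    return F.smooth_l1_loss(box, init_box)     # self-supervision signal

frames = torch.rand(5, 3, 255, 255)              # short unlabeled clip
init_box = torch.tensor([64., 64., 128., 128.])  # (cx, cy, w, h)
loss = cycle_consistency_loss(frames, init_box)
```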

Unsupervised Learning for Stereo Matching with Multiscopic Images
2019.03-2019.11

We propose a new self-supervised framework for stereo matching that utilizes multiple images captured at aligned camera positions. A cross photometric loss, a self-supervision loss, and a new smoothness loss are introduced to help the network learn to estimate the disparity map end-to-end without ground-truth depth. Trained only on indoor synthetic images, our network outperforms all previous unsupervised methods on the unseen outdoor KITTI dataset.
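To illustrate how such training works without ground truth, here is a minimal sketch of a single photometric term in PyTorch: the right image is warped to the left view by the predicted disparity and compared to the left image. The cross photometric and smoothness losses of the actual framework are omitted, and the function names are assumptions.

```python
import torch
import torch.nn.functional as F

def warp_with_disparity(right, disp):
    """Warp the right image to the left view using a horizontal disparity
    map. right: (B, C, H, W); disp: (B, 1, H, W) in pixels."""
    b, _, h, w = right.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    xs = xs.float().unsqueeze(0) - disp.squeeze(1)     # shift by disparity
    ys = ys.float().unsqueeze(0).expand(b, -1, -1)
    grid = torch.stack((2 * xs / (w - 1) - 1,          # normalize to [-1, 1]
                        2 * ys / (h - 1) - 1), dim=-1)
    return F.grid_sample(right, grid, align_corners=True)

def photometric_loss(left, right, disp):
    # L1 difference between the left image and the warped right image.
    return (left - warp_with_disparity(right, disp)).abs().mean()

left, right = torch.rand(1, 3, 64, 128), torch.rand(1, 3, 64, 128)
loss = photometric_loss(left, right, torch.rand(1, 1, 64, 128) * 10)
```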

Learning to Detect and Predict Contact Events on Vision-based Tactile Sensor
2018.12-2019.07      [Paper]

We propose to classify and predict tactile signals using deep learning, seeking to enhance the adaptability of a robotic grasping system to external disturbances. We develop a deep learning framework and collect tactile image sequences with a vision-based tactile sensor, FingerVision. The neural network is integrated into a FingerVision-based robotic grasping system to detect the current grasping state and predict the future one, e.g., rolling, slipping, or stable contact.
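A minimal sketch of such a sequence classifier in PyTorch; the layer sizes, the CNN+LSTM split, and the three-class output are illustrative assumptions, not the exact network from the paper.

```python
import torch
import torch.nn as nn

class ContactEventNet(nn.Module):
    """Per-frame CNN features, an LSTM over the tactile image sequence,
    and a classifier over contact states (stable / slipping / rolling)."""
    def __init__(self, num_states=3):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.lstm = nn.LSTM(32, 64, batch_first=True)
        self.head = nn.Linear(64, num_states)

    def forward(self, seq):                     # seq: (B, T, 3, H, W)
        b, t = seq.shape[:2]
        feats = self.cnn(seq.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])            # state at the last step

logits = ContactEventNet()(torch.rand(2, 8, 3, 64, 64))
```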

MFuseNet: Robust Depth Estimation with Learned Multiscopic Fusion
2018.10-2019.03      [Project]  [Paper]  [Code]

We design a multiscopic vision system that uses a monocular camera to acquire accurate depth estimates. Unlike multi-view stereo with images captured at unconstrained camera poses, the proposed system controls the motion of the camera to capture a sequence of images at horizontally or vertically aligned positions with the same parallax. We propose a new heuristic method and a robust learning-based method to fuse the multiple cost volumes between the reference image and its surrounding images. Trained on the synthetic dataset we built, our method outperforms previous stereo and multiscopic approaches.
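To make the fusion idea concrete, here is a toy NumPy sketch: an absolute-difference cost volume per neighboring view, fused by a per-pixel minimum. The minimum is only a simple stand-in heuristic; the paper's heuristic and learned fusion methods are more involved.

```python
import numpy as np

def cost_volume(ref, src, max_disp, direction):
    """Absolute-difference cost between the reference image and `src`
    shifted by each candidate disparity; `direction` (+1/-1) encodes which
    side of the reference the source camera sits on."""
    h, w = ref.shape
    vol = np.empty((max_disp, h, w))
    for d in range(max_disp):
        vol[d] = np.abs(ref - np.roll(src, direction * d, axis=1))
    return vol

def fuse_min(volumes):
    # Per-pixel minimum across volumes, so a region occluded in one
    # neighboring view can still get a low cost from another.
    return np.minimum.reduce(volumes)

ref, left, right = (np.random.rand(64, 64) for _ in range(3))
fused = fuse_min([cost_volume(ref, left, 32, +1),
                  cost_volume(ref, right, 32, -1)])
disparity = fused.argmin(axis=0)      # winner-takes-all estimate
```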

Sorting Cluttered Tabletop Using Improved Monte Carlo Tree Search
2018.06-2020.03      [Paper]

Industrial environments involve many tasks of sorting and rearranging different kinds of objects. Inspired by AlphaGo, we propose an improved Monte Carlo tree search (MCTS) for such tasks: we use the results of a classical MCTS to train a policy network and then use the network to guide the tree search.
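A minimal sketch of the AlphaGo-style selection rule this implies: the learned policy's prior biases which branch the search expands, instead of plain UCT. The names and the exploration constant are assumptions.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    visits: int = 0
    value: float = 0.0                  # sum of rollout returns
    children: dict = field(default_factory=dict)

def puct_select(node, policy_prior, c_puct=1.5):
    """Pick the child maximizing Q + U, where U scales the policy
    network's prior by how unexplored the child is (PUCT rule)."""
    total = sum(c.visits for c in node.children.values()) or 1
    def score(action, child):
        q = child.value / child.visits if child.visits else 0.0
        u = c_puct * policy_prior[action] * math.sqrt(total) / (1 + child.visits)
        return q + u
    return max(node.children.items(), key=lambda kv: score(*kv))

root = Node(children={0: Node(visits=3, value=2.0), 1: Node()})
action, child = puct_select(root, {0: 0.3, 1: 0.7})
```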

Manipulation Using Deep Reinforcement Learning in Topology Space
2018.03-2018.09      [Paper]  [Code]

We model a whole-arm manipulation task of holding an object as a deep reinforcement learning problem, so that the learned behavior can directly respond to external perturbations and target motion. To improve the performance of deep learning in robotics applications, we propose a new state representation as the network input: the topology representation. This state allows the learned policy to transfer across objects of various shapes, sizes, and poses, because they are identical in topology space. Compared to an RGB image state or pose-coordinate state, it better describes the interaction between the robot and its environment. Moreover, it has no reality gap between simulation and reality, so a policy trained in the simulator can be directly transferred to the real world.
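For illustration, assuming the topology coordinates are based on the Gauss linking integral (a standard choice for winding-style topology states; this is an assumption, not a claim about the paper's exact definition), a simplified discretization looks like this:

```python
import numpy as np

def gauss_linking_integral(curve_a, curve_b):
    """Simplified discretization of the Gauss linking integral between two
    polylines: it measures how much one curve winds around the other and
    is invariant to the objects' exact shape, size, and pose."""
    gli = 0.0
    for i in range(len(curve_a) - 1):
        for j in range(len(curve_b) - 1):
            r = curve_a[i] - curve_b[j]
            da = curve_a[i + 1] - curve_a[i]
            db = curve_b[j + 1] - curve_b[j]
            dist3 = np.linalg.norm(r) ** 3
            if dist3 > 1e-9:
                gli += np.dot(np.cross(da, db), r) / dist3
    return gli / (4 * np.pi)

arm = np.array([[0., 0., 0.], [.3, 0., .1], [.5, .2, .1]])  # arm link points
obj = np.array([[.4, -.2, 0.], [.4, .4, 0.]])               # object axis
state = gauss_linking_integral(arm, obj)  # scalar topology state for RL
```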

Rearrangement Using Deep Reinforcement Learning and Transfer Learning
2017.03-2018.05      [Paper1]  [Paper2]

Traditional methods can only handle static scenes, and explicitly modeling the physical environment is not always feasible and involves various uncertainties. We therefore learn a nonprehensile rearrangement policy with deep reinforcement learning in simulation and then transfer the knowledge to reality. We propose potential-field-based heuristic exploration and replay buffer curation to assist training, and a transfer method in Q-space to bridge the gap between simulation and reality. With our approach, the real robot performs well in dynamic scenes.
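A minimal sketch of the potential-field exploration heuristic (the gains, the 2-D setup, and the mapping from force to a push action are assumptions): instead of a uniformly random exploratory action, the agent follows an artificial force that attracts the object toward its goal and repels it from obstacles.

```python
import numpy as np

def potential_field_direction(obj_pos, goal_pos, obstacles,
                              k_att=1.0, k_rep=0.05):
    """Unit push direction from an artificial potential field: attraction
    to the goal plus inverse-cube repulsion from each obstacle."""
    force = k_att * (goal_pos - obj_pos)
    for obs in obstacles:
        diff = obj_pos - obs
        dist = np.linalg.norm(diff) + 1e-6
        force += k_rep * diff / dist ** 3
    return force / (np.linalg.norm(force) + 1e-6)

push = potential_field_direction(np.array([0.2, 0.1]),
                                 np.array([0.8, 0.5]),
                                 [np.array([0.5, 0.3])])
```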

Machine Vision Research and Software Development for an Intelligent Forging Production Line Based on Industrial Robots (Graduation Project)
2015.12-2016.05

  • Set up a dedicated hardware platform to acquire usable images.
  • Designed a vision system for high-temperature forgings that measures their size, position, and posture.
  • Developed an MFC-based interface to display the detection results and raise an alarm when the position or posture of a forging is wrong.
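A minimal OpenCV sketch of the detection step, exploiting the fact that a hot forging glows much brighter than its surroundings; the threshold value and the single-blob assumption are illustrative, and the original system ran inside an MFC application.

```python
import cv2

def detect_forging(bgr):
    """Segment the glowing forging by intensity thresholding and read its
    size, position, and orientation from the largest blob."""
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None                       # no forging in view -> alarm
    blob = max(contours, key=cv2.contourArea)
    (cx, cy), (w, h), angle = cv2.minAreaRect(blob)
    return {"position": (cx, cy), "size": (w, h), "angle": angle}
```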

On-Road Lane Detection System Based on Machine Vision
2014.05-2015.05

Detected structured and unstructured road edges based on brightness gradients and the RGB information of the road surface to assist the driving of a fuel-cell go-kart.
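A minimal sketch of the edge cue (the thresholds and the low-chroma road mask are assumptions): the brightness gradient marks edge candidates, and the RGB information gates them to gray-ish road pixels.

```python
import cv2
import numpy as np

def road_edge_mask(bgr, grad_thresh=60):
    """Edge candidates from the brightness gradient, gated by a coarse
    RGB road mask (pixels whose three channels are nearly equal)."""
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    magnitude = cv2.magnitude(gx, gy)
    b, g, r = cv2.split(bgr.astype(np.int16))
    roadish = (np.abs(b - g) < 25) & (np.abs(g - r) < 25)
    return (magnitude > grad_thresh) & roadish
```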

Ninth Robot Competition of Zhejiang Province (Team leader)
2013.09-2014.05

  • Wrote the programs for two of the four tasks in the robot soccer game: the goalie task and finding the best shooting position.
  • Participated in designing the algorithms for the path planning and positioning tasks; the team won the second prize.