Self-supervised learning for visual object tracking offers valuable advantages over supervised learning, such as eliminating the need for laborious human annotation and enabling online training. In this work, we exploit an end-to-end Siamese network in a cycle-consistent self-supervised framework for object tracking. Self-supervision is achieved by exploiting the cycle consistency between forward and backward tracking. To better leverage the end-to-end learning of deep networks, we integrate a Siamese region proposal and mask regression network into our tracking framework, so that a fast and more accurate tracker can be learned without per-frame annotations.
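The forward-backward cycle-consistency signal can be sketched as a simple loss: track a target forward through a sequence, track it back to the first frame, and penalize any deviation from the starting box. A minimal illustration (the box format and function name are our assumptions, not the framework's actual interface):

```python
import numpy as np

def cycle_consistency_loss(start_box, back_tracked_box):
    """Penalize the distance between the initial box and the box obtained
    after tracking forward through the sequence and then backward again.
    Boxes are (cx, cy, w, h) tuples; this format is hypothetical."""
    start = np.asarray(start_box, dtype=float)
    back = np.asarray(back_tracked_box, dtype=float)
    return float(np.sum((start - back) ** 2))

# A perfect forward-backward cycle returns to the starting box: zero loss.
loss = cycle_consistency_loss((50, 50, 20, 20), (50, 50, 20, 20))
```

In practice the loss is computed on differentiable network outputs so its gradient can train the tracker end-to-end.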
Self-supervised learning for depth estimation possesses several advantages over supervised learning. The benefits of requiring no ground-truth depth, supporting online fine-tuning, and generalizing better with unlimited data attract researchers to seek self-supervised solutions. In this work, we propose a new self-supervised framework for stereo matching that utilizes multiple images captured at aligned camera positions. A cross photometric loss, an uncertainty-aware mutual-supervision loss, and a new smoothness loss are introduced to optimize the network, which learns disparity maps end-to-end without ground-truth depth. To train this framework, we build a new multiscopic dataset consisting of synthetic images rendered by 3D engines and real images captured by real cameras. After being trained only on the synthetic images, our network performs well in unseen outdoor scenes. Our experiments show that our model obtains better disparity maps than previous unsupervised methods on the KITTI dataset and is comparable to supervised methods when generalized to unseen data.
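The core self-supervision signal here is photometric: warp a neighboring view toward the reference view using the predicted disparity and penalize the appearance difference. A minimal grayscale, nearest-neighbor sketch of that idea (the actual cross photometric loss operates differentiably over multiple aligned views; the function names below are ours):

```python
import numpy as np

def warp_horizontal(img, disparity):
    """Warp a (H, W) source image toward the reference view by shifting each
    pixel by its (rounded) disparity; nearest-neighbor, for illustration only."""
    H, W = img.shape
    xs = np.arange(W)[None, :] - np.round(disparity).astype(int)
    xs = np.clip(xs, 0, W - 1)
    return np.take_along_axis(img, np.broadcast_to(xs, (H, W)), axis=1)

def photometric_loss(ref, src, disparity):
    """Mean L1 photometric error between the reference and the warped source."""
    return float(np.mean(np.abs(ref - warp_horizontal(src, disparity))))
```

A correct disparity map aligns the warped source with the reference, driving the loss toward zero (up to occlusions and image borders, which the uncertainty-aware term helps handle).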
Learning to Detect and Predict Contact Events on a Vision-based Tactile Sensor
We propose to classify and predict tactile signals using deep learning, seeking to enhance the adaptability of a robotic grasping system to external disturbances. We develop a deep learning framework and collect tactile image sequences with a vision-based tactile sensor, FingerVision. The neural network is integrated into a FingerVision-based robotic grasping system to detect the current grasping state and predict the future one, e.g., rolling, slipping, or stable contact.
We design a multiscopic vision system that uses a monocular camera to acquire accurate depth estimates. Unlike multi-view stereo, which captures images at unconstrained camera poses, the proposed system controls the camera motion to capture a sequence of images at horizontally or vertically aligned positions with the same parallax. We propose a new heuristic method and a robust learning-based method to fuse the multiple cost volumes between the reference image and its surrounding images. Trained on the synthetic dataset we built, our method outperforms previous stereo and multiscopic approaches.
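The fusion step can be illustrated with a simple weighted-average heuristic over per-pair cost volumes, followed by winner-take-all disparity selection. Shapes and names are illustrative; the learning-based variant in this project predicts the fusion rather than fixing the weights:

```python
import numpy as np

def fuse_cost_volumes(volumes, weights=None):
    """Fuse per-pair cost volumes (a list of (D, H, W) arrays, one per
    reference/surrounding-image pair) by a normalized weighted average."""
    stacked = np.stack(volumes)                      # (N, D, H, W)
    if weights is None:
        weights = np.ones(len(volumes))              # uniform heuristic
    weights = np.asarray(weights, dtype=float) / np.sum(weights)
    return np.tensordot(weights, stacked, axes=1)    # (D, H, W)

def winner_take_all(fused):
    """Pick the disparity index with minimum fused cost at each pixel."""
    return np.argmin(fused, axis=0)                  # (H, W)
```

Averaging multiple cost volumes suppresses matching ambiguities that any single image pair suffers from, which is the motivation for capturing several aligned views.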
Sorting a Cluttered Tabletop Using Improved Monte Carlo Tree Search
In industrial environments, many tasks involve sorting and rearranging different kinds of objects. Inspired by AlphaGo, we propose an improved Monte Carlo tree search (MCTS) to solve this problem: we use the results of a classical MCTS to train a policy network and then use the network to guide the tree search.
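The guidance scheme is in the spirit of AlphaGo's PUCT selection rule: the policy network's prior probabilities bias which children are explored, while accumulated value estimates handle exploitation. A minimal sketch (the symbols and the constant `c_puct` follow the AlphaGo convention; this is not the project's exact code):

```python
import numpy as np

def puct_select(q, visit_counts, priors, c_puct=1.0):
    """Select a child action: exploit mean value estimates Q while the policy
    network's priors steer exploration of rarely visited actions."""
    total = np.sum(visit_counts)
    u = c_puct * priors * np.sqrt(total + 1) / (1 + visit_counts)
    return int(np.argmax(q + u))
```

At an unvisited node the priors dominate, so the network prunes the branching factor; as visits accumulate, the value estimates take over.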
We model a whole-arm manipulation (holding) task as a deep reinforcement learning problem in order to obtain a behavior that responds directly to external perturbations and target motion. To improve the performance of deep learning in robotics applications, we propose a new input state for the networks: the topology representation. This state allows the learned policy to transfer across various object shapes, sizes, and poses, because they are identical in topology space. Compared to an RGB image state or a pose-coordinate state, it better describes the interaction between the robot and the environment. Moreover, there is no reality gap between simulation and reality, so a policy trained in the simulator can be transferred directly to the real world.
In contrast to traditional methods, which can only handle static scenes because explicitly modeling the physical environment is not always feasible and involves various uncertainties, we learn a nonprehensile rearrangement policy with deep reinforcement learning in simulation and then transfer the knowledge to reality. We propose potential-field-based heuristic exploration and replay buffer curation to assist training, and we propose a transfer method in Q-space to bridge the gap between simulation and reality. With our approach, the real robot performs well in dynamic scenes.
Machine Vision Research and Software Development of Intelligent Forging Production Line Based on Industrial Robots (Graduation Project)
Set up a dedicated hardware platform to acquire usable images.
Designed a detection system that obtains the size, position, and posture of high-temperature forgings.
Developed an interactive interface based on MFC to output the detection results and raise an alarm when the position or posture of a forging is wrong.
On-Road Lane Detection System Based on Machine Vision
Detected structured and unstructured road edges based on the brightness gradient and RGB information of the road to assist the driving of a fuel-cell go-kart.
Ninth Robot Competition of Zhejiang Province (Team leader)
- Wrote the programs for two of the four tasks in the robot soccer game: the goalie task and finding the best shooting position.
- Participated in designing the algorithms for the path-planning and positioning tasks; the team won the second prize.