Touchless Screen Gesture Recognition
I developed algorithms for robust detection, key-point recognition and 3D human recognition. I am in charge of deploying the fourth screen, which is controlled by natural gestures in real time and with high accuracy. The whole system runs on a 5 TOPS edge device. Imagine the day we control the screen in the "Mission Impossible" touchless fashion (or just take McDonald's orders in the time of Covid :p).

ARCore for 3D Car rendering
On mobile platforms, I deployed AR (ARCore) for real-time 6-DoF car rendering and interaction. This builds on the computer vision system I developed for the automatic evaluation of car damage claims.

Kaggle Baidu/Peking University autonomous driving challenge
As principal researcher, I led the team in developing an algorithm that estimates the absolute pose of vehicles (6 degrees of freedom) from a single image in a real-world traffic environment.

Ranking: 2nd/864, cash prize winner.


ApolloScape ECCV 2018 3D Car Instance Understanding Challenge (Baidu)
I designed a system to detect cars and to reconstruct and estimate their 3D shape from a single image of a given video. A novel R-CNN based network is proposed to train the 6-DoF estimator in an end-to-end fashion.

Ranking: 1st, cash prize winner.


Kaggle CVPR 2018 WAD Video segmentation Challenge (Baidu)
I evaluated various Mask-RCNN based models to segment movable objects, such as cars and pedestrians, at instance level within image frames.

Participants included tech companies such as Megvii, DiDi and Nvidia Research.

Ranking: 3rd/145 (top 2%), cash prize winner.


ICVSS--International Computer Vision Summer School
I joined the 11th International Computer Vision Summer School (2017) in beautiful Sicily. The courses were delivered by world-renowned experts in the field, from both academia and industry, and covered both theoretical and practical aspects of real Computer Vision problems as well as examples of their successful commercialisation. The school aims to provide a stimulating opportunity for young researchers and Ph.D. students. Participants benefit from direct interaction and discussions with world leaders in Computer Vision. They also have the opportunity to present the results of their research and to interact with their scientific peers in a friendly and constructive environment.

I also met my favourite researcher Raquel Urtasun again!


EUMSSI Project
I joined the EUMSSI (Event Understanding through Multimodal Social Stream Interpretation) project, which belongs to the European Commission's 7th Framework Programme (FP7-ICT). I focus on the identification and characterisation of people in video streams coming from broadcast TV. Identification here means (1) detecting the speech segments and clustering them into different speakers (a task known as speaker diarization); (2) detecting faces and clustering them so that each cluster corresponds to a single identity (face diarization); (3) associating the faces with voices in order to build an audio-visual (AV) representation of all persons in the video and, as a side effect, to know when a person appears or speaks in the video document (AV fusion); and finally (4) actually naming the different person clusters.
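
To make step (3) concrete, here is a minimal Python sketch of one simple way to associate speaker clusters with face clusters: accumulate how long each pair co-occurs in time and pick the face cluster with the largest overlap. This is an illustrative heuristic only, not the method used in the EUMSSI system; the function names, the cluster labels and the (start, end) segment format are all assumptions made for this example.

```python
from collections import defaultdict


def overlap(a, b):
    """Length in seconds of the intersection of two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def associate(face_tracks, speech_turns):
    """Map each speaker cluster to the face cluster it co-occurs with most.

    Both inputs are lists of (cluster_id, (start, end)) pairs
    (a hypothetical format chosen for this sketch).
    """
    cooccurrence = defaultdict(float)
    for face_id, face_seg in face_tracks:
        for spk_id, speech_seg in speech_turns:
            cooccurrence[(spk_id, face_id)] += overlap(face_seg, speech_seg)

    best = {}  # spk_id -> (face_id, total overlap in seconds)
    for (spk_id, face_id), duration in cooccurrence.items():
        if duration > 0 and duration > best.get(spk_id, (None, 0.0))[1]:
            best[spk_id] = (face_id, duration)
    return {spk_id: face_id for spk_id, (face_id, _) in best.items()}


# Toy example with made-up labels and timings.
face_tracks = [("face_A", (0, 10)), ("face_B", (10, 20)), ("face_A", (20, 25))]
speech_turns = [("spk_1", (1, 9)), ("spk_2", (11, 19)), ("spk_1", (20, 24))]
mapping = associate(face_tracks, speech_turns)
```

In this toy example spk_1 overlaps face_A for 12 seconds and spk_2 overlaps face_B for 8 seconds, so each speaker is paired with the face on screen while they talk. A speaker who is never on screen while speaking is simply left out of the mapping.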

Try out EUMSSI Demo now!


Alibaba Tianchi, Clothes Matching Challenge on Taobao.com
Taobao is one of the most famous Chinese websites for online shopping, similar to eBay and Amazon. It facilitates C2C retail by providing a platform for small businesses and individual entrepreneurs to open online stores. On Taobao, the apparel and accessories industries account for the vast majority of the market share. Clothing matching (e.g. finding appropriate pants and shoes for a shirt) is a very important topic in shopping guidance. This technology can be extended to a wide variety of big-data marketing scenarios, such as search, recommendation and advertising. In this competition, data sets of clothing collocations from fashion experts, image data of Taobao items, and user behavior data are provided. Participants are required to train a model that provides personalized, high-quality, professional clothing collocation suggestions. Our team enemylessRain ranked 7th out of 2100 participating teams in Season 1.

MediaEval Benchmarking Initiative for Multimedia Evaluation 2015, Wurzen, Germany
The workshop brought together task participants to present and discuss their findings, and to prepare for future work. I participated in the "2015 Multimodal Person Discovery in Broadcast TV" track as team EUMSSI, and our proposed system ranked first, ahead of 9 other international research teams. This task is an extension of the (now completed) French REPERE challenge, which focused on multimodal person recognition in TV broadcast. The main objective of that challenge was to answer the two questions "Who speaks when?" and "Who appears when?" using any source of information (including pre-existing biometric models and person names extracted from text overlay and speech transcripts). In this new task, only unsupervised algorithms (i.e., algorithms not relying on pre-existing labels or biometric models) are admitted. To ensure high-quality indexes, those algorithms should also help human annotators double-check these indexes by providing evidence of the claimed identity (especially for people who are not yet famous).

MLSS Machine Learning Summer School 2014, April 25 - May 4, 2014, Reykjavik, Iceland
The Machine Learning Summer School (MLSS) is a great venue for graduate students, researchers, and professionals to learn about fundamental and advanced methods of machine learning, data analysis, and inference, from theory to practice.

Gaussian Process Summer School 2014
The Gaussian Process Summer Schools are a series of meetings targeted at facilitating the understanding of Gaussian process models both in theory and practice. A Gaussian process defines a prior over functions, which can be converted into a posterior over functions once we have seen some data. The main summer schools are held in Sheffield, UK at the Sheffield Institute for Translational Neuroscience.
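
As a rough illustration of how a Gaussian process prior turns into a posterior after seeing data, the following pure-Python sketch computes the posterior mean and variance at one test input. It assumes a zero-mean prior, a squared-exponential kernel with unit variance and scalar inputs; the function names and the toy data are made up for this example and do not come from the summer school material.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    """Posterior mean and variance at x_test under a zero-mean GP prior."""
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)]
         for i, a in enumerate(x_train)]
    k_star = [rbf(x_test, a) for a in x_train]
    alpha = solve(K, y_train)  # (K + noise * I)^{-1} y
    v = solve(K, k_star)       # (K + noise * I)^{-1} k_*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_test, x_test) - sum(k * w for k, w in zip(k_star, v))
    return mean, var


x_train, y_train = [-1.0, 0.0, 1.0], [0.5, 1.0, 0.2]
mean_near, var_near = gp_posterior(x_train, y_train, 0.0)
mean_far, var_far = gp_posterior(x_train, y_train, 10.0)
```

At an observed input the posterior mean is pulled onto the observed value and the variance collapses towards zero. Far away from the data the posterior falls back to the prior, with mean close to 0 and variance close to 1.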

Summer Internship at MSRC
In the summer of 2013, I was fortunate to have an internship at Microsoft Research Cambridge under the supervision of Ben Glocker. The project used the Kinect sensor to assess patients with movement disorders. It was a truly amazing experience!

CHALEARN Gesture Challenge
The challenge is to predict the identity of 1 to 5 gestures from a vocabulary of 8 to 15 gesture tokens recorded by a Kinect camera. I ranked 8th on the leaderboard and was invited to give an oral presentation at the CVPR 2012 workshop on Gesture Recognition.


Udacity
I have been a beneficiary of this programming-based learning programme, which aims to "bridge the gap between real-world skills, relevant education, and employment". I have taken some interesting courses:

Kaggle
From time to time, I like to participate in interesting competitions on the "make science a sport" Kaggle platform, where companies and researchers post their data and statisticians and data miners from all over the world compete to produce the best models. It's fascinating to see how machine learning techniques are applied to tackle real-world problems. My Kaggle account: stevenwudi

Github
My Github account: stevenwudi.
