Fully convolutional denoising autoencoder for 3D scene reconstruction from a single depth image


In this work, we propose a 3D scene reconstruction algorithm based on a fully convolutional 3D denoising autoencoder neural network. The network is capable of reconstructing a full scene from a single depth image by creating a 3D representation of it and automatically filling holes and inserting hidden elements. We exploit the fact that our neural network is capable of generalizing object shapes by inferring similarities in geometry. Our fully convolutional architecture frees the network from a fixed input shape, so it can successfully reconstruct scenes of arbitrary size. Our algorithm was evaluated on a real-world dataset of tabletop scenes acquired using a Kinect and processed using KinectFusion software in order to obtain ground truth for network training and evaluation. Extensive measurements show that our deep neural network architecture outperforms the previous state of the art in terms of both precision and recall for the scene reconstruction task. The network has been broadly profiled in terms of memory footprint, number of floating-point operations, inference time, and power consumption on CPU, GPU, and embedded devices. Its small memory footprint and low computation requirements enable low-power, memory-constrained, real-time, always-on embedded applications such as autonomous vehicles, warehouse robots, interactive gaming controllers, and drones.
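The key property claimed above, that a fully convolutional autoencoder is not tied to a fixed input shape, can be illustrated with a minimal sketch. The layer counts, channel widths, and kernel sizes below are illustrative assumptions, not the paper's actual architecture; the point is only that the same weights apply to voxel grids of different sizes.

```python
# Minimal sketch (assumed architecture, not the paper's): a tiny fully
# convolutional 3D denoising autoencoder built with PyTorch.
import torch
import torch.nn as nn


class DenoisingAutoencoder3D(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: strided 3D convolutions downsample the voxel grid.
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv3d(8, 16, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
        )
        # Decoder: transposed convolutions restore the input resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(16, 8, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose3d(8, 1, kernel_size=4, stride=2, padding=1),
            nn.Sigmoid(),  # voxel occupancy probabilities in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


net = DenoisingAutoencoder3D()
# No fully connected layers means no fixed input shape: the same network
# processes voxel grids of different sizes (here, multiples of the total
# downsampling factor of 4).
for size in (16, 32):
    noisy = torch.rand(1, 1, size, size, size)  # batch, channel, D, H, W
    reconstructed = net(noisy)
    assert reconstructed.shape == noisy.shape
```

Training such a network as a denoising autoencoder would pair an incomplete or noisy voxelization of a single depth image with the fused ground-truth volume as the reconstruction target.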

2017 4th International Conference on Systems and Informatics (ICSAI)