If you have any problems related to the accessibility of any content (or if you want to request that a specific publication be accessible), please contact us at firstname.lastname@example.org.
A Single-Shot Next Best View Approach Accompanied by a Dual-View Active Vision for Object Recognition Tasks
AltmetricsView Usage Statistics
Active vision is the ability of intelligent agents to dynamically gather more information about their surroundings by physical motion of the camera. Adding to the number of sources of sensory information can be efficacious in enhancing the object recognition capability of robots. In the realm of vision-based object recognition, in addition to improving the general recognition performance, observing objects of interest from different points of view can be central to handling occlusions. In this work, a robotic vision system is proposed that constantly uses a 3D camera, while actively switching to make use of a second RGB camera in cases where it is necessary. The proposed system detects objects in the view seen by the 3D camera, which is mounted on a humanoid robot’s head, and computes a confidence measure for its recognitions. In the event of low confidence regarding the correctness of the recognition, the secondary camera, which is installed on the robot’s arm, is moved toward the object to obtain another perspective of the object. With the objects detected in the scene viewed by the hand camera, they are matched to the detections of the head camera, and subsequently, their recognition decisions are fused together. One of the decision fusion methods is a novel approach based on the Dempster–Shafer evidence theory. Significant improvements in object recognition performance are observed after employing the proposed active vision system.In the case of object recognition, active vision enables improved performance by incorporating classification decisions from new viewpoints when there is some degree of uncertainty in the current recognition result. A natural question in an autonomous active vision system is, nonetheless, how to determine the new viewpoint, i.e. in what pose should the camera be moved? This is the traditional question of next best view in active perception systems. Current approaches to the next best view problem either need construction of occupancy grids or require training datasets of 3D objects or multiple captures of the same object in specified poses. Occupancy grid methods are usually dependent on multiple camera movements to perform well, which make them more useful for 3D reconstruction applications than object recognition. In this work, a next best view method for active object recognition based on object appearance and surface direction is proposed that decides on the next cameras pose without requiring any specifically structured training datasets of 3D objects. It is also designed for single-shot deductions of next viewpoint and is able to determine next best views without the need for substantial knowledge of 3D voxels in the environment around the camera. The experimental results illustrate the efficiency of the proposed method, while showing large improvements in accuracy and F1 score.