A New Way to Interact With Robots
Computer Science and Engineering
This thesis presents a new, portable robot control architecture for a multimodal user interface that supports interactive conversations with a robot. Multimodal user interfaces allow the robot to interact with more than one person within a group. Previous research produced architectures that allowed the robot to respond to different people in a conversation and to determine which person is talking. However, no existing architecture keeps track of a multi-person conversation the way a human does, by establishing and maintaining an identity for each person involved. This thesis presents an architecture that attempts to mimic the way humans interact in and view a conversation, allowing the robot to carry on different conversations with multiple people. The proposed architecture makes the following main contributions. First, it allows for online learning of people’s faces, which can later be used to track the conversations appropriately. Second, it tracks the faces that are recognized in the field of vision to determine who is in the conversation and who has left it. Third, it allows the robot to respond differently to people who are at different stages of the conversation, by keeping track of each person and where they are in the conversation. Fourth, in addition to using vision and facial recognition to track and identify users, the architecture will also incorporate sound localization to determine who has spoken and speech recognition to determine what was spoken. To achieve these goals, the architecture uses finite state machines (FSMs) to track interactions between the robot and each person. For each person who joins the conversation, an instance of the FSM is created for that person. The FSM records each person’s interactions as well as the robot’s responses to them.
FSMs also facilitate correct responses from the robot. While existing implementations have been built on specialized robots, this thesis presents a portable architecture that uses robots already on the market and readily available libraries. The advantage of such an implementation is its potential for further development into an ‘off the shelf’ implementation that can be adapted for use on different robots. While the implementation in this thesis has not been tested on different robots, it has been successfully emulated on a laptop as well as on the NAO robotic platform.
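The per-person FSM idea described above can be illustrated with a minimal sketch. This is not the implementation from the thesis: the stage names, the transition table, and the `PersonFSM` and `ConversationManager` classes are hypothetical, and the person identifiers and utterances stand in for the output of face recognition and speech recognition, which are not modeled here.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Stage(Enum):
    """Hypothetical conversation stages; the thesis does not list its states."""
    GREETING = auto()
    CHATTING = auto()
    FAREWELL = auto()


# Hypothetical transition table:
# (current stage, recognized utterance) -> (next stage, robot reply)
TRANSITIONS = {
    (Stage.GREETING, "hello"): (Stage.CHATTING, "Hello! How can I help?"),
    (Stage.CHATTING, "goodbye"): (Stage.FAREWELL, "Goodbye!"),
}

FALLBACK_REPLY = "Sorry, I did not understand."


@dataclass
class PersonFSM:
    """One FSM instance per person, recording utterances and robot replies."""
    person_id: str
    stage: Stage = Stage.GREETING
    history: list = field(default_factory=list)

    def step(self, utterance):
        # Unknown input leaves the stage unchanged and yields a fallback reply.
        next_stage, reply = TRANSITIONS.get(
            (self.stage, utterance), (self.stage, FALLBACK_REPLY)
        )
        self.history.append((utterance, reply))
        self.stage = next_stage
        return reply


class ConversationManager:
    """Keeps one FSM per person currently in the conversation."""

    def __init__(self):
        self.sessions = {}

    def face_seen(self, person_id):
        # Create an FSM the first time a person is recognized.
        if person_id not in self.sessions:
            self.sessions[person_id] = PersonFSM(person_id)
        return self.sessions[person_id]

    def face_lost(self, person_id):
        # Drop the FSM when the person leaves the field of vision.
        self.sessions.pop(person_id, None)

    def heard(self, person_id, utterance):
        # Route the utterance to the speaker's own FSM.
        return self.face_seen(person_id).step(utterance)


manager = ConversationManager()
reply_alice_1 = manager.heard("alice", "hello")
reply_bob_1 = manager.heard("bob", "goodbye")    # bob is still at GREETING
reply_alice_2 = manager.heard("alice", "goodbye")
```

Because each person has a separate FSM instance, the same utterance ("goodbye") produces different replies for two people who are at different stages of the conversation.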