Facebook AI researchers have created a pair of AI systems that can navigate the streets of New York City using only 360-degree images, natural language, and a map with local landmarks like banks and restaurants for guidance. The research task and dataset, named Talk the Walk, are being open-sourced today, and initial results from the real-world training are being published on arXiv.
The two AI systems are trained to complete two specific tasks: the tourist bot must describe its surroundings to the guide bot, which then interprets the tourist's location based on that description and a map. Agents were only given the ability to move forward, left, or right at intersections within two city blocks. Tourist agents could only describe their location to the guide, which worked from a map with no street names. The natural language used in the exercise came from transcripts of humans who completed the same task.
"What sets this apart from those other datasets is we have actual natural language annotations, so it's not some kind of artificially templated language, which other people have tried. This is the first instance where it's real language with real visual perception," Facebook AI research scientist Douwe Kiela told VentureBeat in a phone interview.
Talk the Walk places the two AI systems within a two-block radius in Hell's Kitchen, the East Village, the Financial District, and the Upper East Side in Manhattan, and in the Williamsburg neighborhood of Brooklyn. Complicating matters a bit, each of the neighborhoods follows a grid system, so the maps have no distinctive qualities. A two-block radius with 16 different street corners may seem small; the original study started out covering more ground, but the area had to be reduced because the task proved too hard for humans to complete.
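The localization step described above, in which the guide narrows down where the tourist could be from reported landmarks and moves, can be illustrated with a minimal sketch. This is not the researchers' method, which relies on learned neural models and real dialogue. It is a rule-based toy under stated assumptions: the landmark layout in `LANDMARK_MAP` is invented, the tourist is assumed to report every landmark at its corner exactly, and moves are simplified to grid offsets rather than forward, left, and right. The names `localize`, `landmarks_at`, and `in_grid` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Corner = Tuple[int, int]  # (x, y) index of a street corner
Move = Tuple[int, int]    # (dx, dy) offset between adjacent corners

GRID_SIZE = 4  # 4 x 4 = 16 street corners, as in the article

# Hypothetical landmark map (illustrative only, not taken from the dataset).
# Corners that are not listed have no landmarks.
LANDMARK_MAP: Dict[Corner, Set[str]] = {
    (0, 0): {"bank"},
    (1, 0): {"restaurant"},
    (2, 0): {"bank"},
    (3, 0): {"theater"},
    (0, 2): {"bank"},
}


def landmarks_at(landmark_map: Dict[Corner, Set[str]], corner: Corner) -> Set[str]:
    """Landmarks the map shows at a corner (empty set if none)."""
    return landmark_map.get(corner, set())


def in_grid(corner: Corner) -> bool:
    """True if the corner lies inside the GRID_SIZE x GRID_SIZE area."""
    x, y = corner
    return 0 <= x < GRID_SIZE and 0 <= y < GRID_SIZE


def localize(
    landmark_map: Dict[Corner, Set[str]],
    observations: List[Set[str]],
    moves: List[Move],
) -> Set[Corner]:
    """Return every corner where the tourist could currently be standing.

    observations[i] is the set of landmarks reported at step i, and
    moves[i] is the offset travelled between observation i and i + 1.
    """
    if len(moves) != len(observations) - 1:
        raise ValueError("expected exactly one move between consecutive observations")

    # Start with every corner that matches the first report.
    candidates = {
        (x, y)
        for x in range(GRID_SIZE)
        for y in range(GRID_SIZE)
        if landmarks_at(landmark_map, (x, y)) == observations[0]
    }

    # Shift the candidates by each move, then keep only those that
    # stay on the grid and match the next report.
    for (dx, dy), seen in zip(moves, observations[1:]):
        moved = {(x + dx, y + dy) for (x, y) in candidates}
        candidates = {
            c for c in moved
            if in_grid(c) and landmarks_at(landmark_map, c) == seen
        }
    return candidates


if __name__ == "__main__":
    # One report of a bank is ambiguous: three corners match.
    print(sorted(localize(LANDMARK_MAP, [{"bank"}], [])))
    # After moving one corner east and reporting a theater, one corner remains.
    print(localize(LANDMARK_MAP, [{"bank"}, {"theater"}], [(1, 0)]))
```

On a grid where many corners look alike, a single report rarely pins down a position, which is why the sketch keeps a set of candidates and shrinks it with each new report.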
"It's an important task because it brings together a lot of different challenges that we need to solve if we want to make progress with AI research, so things like realistic 360 visual perception, map-based navigation, visual reasoning, natural language communication by dialogue; all of these things are important to solve problems in AI. And what this work is about is trying to bring all these problems together into an overarching, all-encompassing kind of solution," Kiela said.
While 360-degree video and a map were part of the input used to train the systems, the task and benchmark dataset are primarily geared toward the advancement of conversational AI, said Kiela, whose work has centered on grounding, the practice of using multimodal methods to develop natural language understanding. For the agents to reach one another, communication has to succeed on both sides: the tourist must tell the guide where it is in natural language, and the guide must interpret the words the tourist agent generates.
"The long term vision of this kind of research is improving natural language understanding, and so that of course is interesting to humankind. Basically, if we can achieve artificial intelligence where agents actually understand natural language, then that would be kind of a pivotal moment for AI, and I think we're not even close to that yet," he said. "I really care about this long term vision, first and foremost, of how can we get to this kind of language understanding and how can we get AI that really has this kind of common sense that has been missing up until now."
An attention mechanism called Masked Attention for Spatial Convolution (MASC) was used to narrow the focus of the agents, and it produced results that at times made the agents twice as likely to complete the task. The resulting task and dataset were designed to act as a benchmark. The work is being open-sourced so that others in the AI community can advance the current state of machine understanding of human communication.
"This is a difficult challenge, and that's also one of the reasons we're open-sourcing it and inviting people to think about this kind of problem. In general we should have more hard challenges in AI research and difficult problems for the community to tackle and realize also what the limitations are of what we can currently do. And so the open-sourcing thing is important to us, and that's why we're happy to share with the scientific community," he said. "In my opinion this really is the way forward with AI. If we don't have this, then it's going to look like we're making a lot of progress, but we're not really making the kind of progress that we should be making."
The dataset can be viewed or downloaded from the code.fb.com website.
Facebook is developing a Talk the Walk AI capable of giving walking directions without knowing a user's location.
What it is: A team made up of a researcher from the University of Montreal in Canada and Facebook Artificial Intelligence Research (FAIR) scientists recently published a white paper describing a neural network capable of giving a person plain-language directions without the use of GPS or other location-tracking aids. According to the researchers: "We introduce the Talk the Walk dataset, where the aim is for two agents, a guide and a tourist, to interact with each other via natural language in order to achieve a common goal: having the tourist navigate towards the correct location. The guide has access to a map and knows the target location, but does not know where the tourist is; the tourist has a 360-degree view of the world, but knows neither the target location on the map nor the way to it. The agents need to work together through communication in order to successfully solve the task."
How it works: Hypothetically, if this neural network were fully fleshed out, it could provide end-to-end directions to a person even if location services and internet connectivity were unavailable. In that case, users would have a conversation with the AI in much the same way they would with a human. The tourist describes the landmarks they see, such as "I'm standing next to a theater," and the AI tries to determine where they are. It can ask questions in return; for example, it may ask whether the user sees a shop on the corner to help narrow down which theater they're looking at. Once it determines where the user is, it gives a plain-language response guiding them to the next waypoint.
When it's coming: Maybe never, as this isn't a new feature the company is rolling out. It's early research that appears to lay the groundwork for future development.
The Talk the Walk white paper establishes a dataset and some basic algorithms to prove that the concept works, but it's far from being ready for prime time. The major significance of this work is its focus on creating AI capable of working together with humans to achieve a goal.