As the world grows ever more interconnected and complex, making good decisions becomes increasingly difficult – and necessary. Transportation systems in megacities must be optimised, global supply chains must be efficient, and AI trading systems must adapt to the evolving behaviour of markets. While humans are very good at many aspects of decision-making (e.g. intuition, heuristics and abstraction), computers excel at logic, analysis and mathematical operations, tasks people often struggle with. Recent advances in machine learning and computing power enable us to train computers to solve complex problems that human minds alone are simply not equipped to handle. This is the essence of VUKU, our decision-making platform here at PROWLER.io. Our aim is to facilitate timely, intelligent decisions and actions in an increasingly interconnected world of complex systems. To build useful decision-making AI today, we need to work out how to solve real-world problems with currently available mathematical tools. But how?
Why not just build a model?
A “model” is a set of assumptions about the world that can be used for inference in new situations. In machine learning (ML), we train a model using a dataset representative of the problem and then use that model to understand characteristics of new, similar data. ML has become adept at representing abstract data like graphs, text and images in concise, numerical form. We can even represent very complex relationships between inputs and outputs such as non-linearities or large dimensionality changes.
Neural networks can be trained to classify cats and dogs in pictures. Regression can help predict the next day's temperature from previous days' weather data. Both are models. They allow us to make predictions in diverse domains and to employ the resulting analytics to make useful decisions.
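To make the second example concrete, here is a minimal sketch of fitting and querying such a regression model. The temperature readings are invented for illustration, not real weather data:

```python
import numpy as np

# Invented training data: each row is the previous three days' mean
# temperatures (deg C); the target is the following day's temperature.
X = np.array([[14.0, 15.2, 16.1],
              [15.2, 16.1, 15.8],
              [16.1, 15.8, 17.0],
              [15.8, 17.0, 18.2],
              [17.0, 18.2, 17.5]])
y = np.array([15.8, 17.0, 18.2, 17.5, 18.9])

# Fit a linear model by ordinary least squares (with an intercept term).
X_design = np.hstack([X, np.ones((len(X), 1))])
weights, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# Predict tomorrow's temperature from the three most recent days.
recent = np.array([18.2, 17.5, 18.9, 1.0])  # trailing 1.0 multiplies the intercept
print("Predicted next-day temperature:", recent @ weights)
```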
But deciding what to do based on a model's predictions is not always a simple problem. If we build, for example, a model of demand for taxis in a city, we can then predict when a given area is likely to have new passengers. But just dispatching taxis to the most likely locations will crowd those areas, and passengers from less popular areas will end up stranded — and unhappy.
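A toy sketch makes the point (the zone counts and demand numbers are invented): sending every free taxi to the single most likely zone serves fewer passengers than spreading taxis roughly in proportion to predicted demand.

```python
import numpy as np

# Invented predicted passenger demand for four city zones over the next hour.
predicted_demand = np.array([40, 25, 20, 15])
free_taxis = 50

# Naive rule: send every free taxi to the zone with the highest predicted demand.
greedy = np.zeros_like(predicted_demand)
greedy[np.argmax(predicted_demand)] = free_taxis

# Alternative: allocate taxis roughly in proportion to predicted demand.
proportional = np.floor(free_taxis * predicted_demand / predicted_demand.sum()).astype(int)

# Passengers served in a zone is capped by both the demand there and the taxis sent.
def served(allocation):
    return int(np.minimum(allocation, predicted_demand).sum())

print("Greedy dispatch serves:      ", served(greedy), "passengers")
print("Proportional dispatch serves:", served(proportional), "passengers")
```

The prediction is identical in both cases; only the decision rule differs, which is exactly the gap between prediction and decision-making.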
Just rewards?
Reinforcement learning (RL) aims to teach agents to make better decisions by rewarding them for their successes. By repeatedly interacting with its environment, an agent attempts to learn how to maximise its reward. RL has achieved high performance in many Atari games by using Deep Q-Networks (DQN); add Monte-Carlo Tree Search (MCTS), and RL even beats the world's best Go players. We've seen similar advances in simulated robotic control.
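To give a flavour of learning from reward, here is a minimal tabular Q-learning sketch on an invented five-state corridor task. It is a toy stand-in for the value learning behind DQN, not DQN itself (which uses a neural network in place of the table):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2            # toy corridor; actions: 0 = left, 1 = right
Q = np.ones((n_states, n_actions))    # optimistic initial values encourage exploration
alpha, gamma, epsilon = 0.1, 0.95, 0.1

def step(state, action):
    """Move left or right; reward 1 for reaching the rightmost state."""
    next_state = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward, next_state == n_states - 1

for episode in range(200):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit current estimates, occasionally explore.
        action = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[state]))
        next_state, reward, done = step(state, action)
        # Q-learning update: nudge Q towards reward plus discounted best future value.
        target = reward + gamma * (0.0 if done else np.max(Q[next_state]))
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state

print(np.argmax(Q[:-1], axis=1))  # learned greedy action per non-terminal state (1 = right)
```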
But before it can make a decision, an agent needs some way to map observations of its environment to actions; it needs a strategy or "policy" that defines how to make decisions by interacting directly with the environment. Such a policy can be trained, for instance, to perform robotic tasks by controlling a robot and getting feedback when it is either doing well or making mistakes.
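As a sketch of what a policy looks like in code, here is a toy parameterised policy – a mapping from an observation to action probabilities – trained with a simple REINFORCE-style update on an invented task (reward 1 when the action matches the sign of the observation). It is purely illustrative, not how a real robot controller would be trained:

```python
import numpy as np

rng = np.random.default_rng(1)
theta = np.zeros(2)  # parameters of a toy linear policy: logit = theta[0]*obs + theta[1]
learning_rate = 0.1

def policy(obs, theta):
    """Map an observation to a probability distribution over two actions."""
    p1 = 1.0 / (1.0 + np.exp(-(theta[0] * obs + theta[1])))
    return np.array([1.0 - p1, p1])

# Invented task: reward 1 for action 1 when obs > 0, and for action 0 otherwise.
for _ in range(2000):
    obs = rng.uniform(-1.0, 1.0)
    probs = policy(obs, theta)
    action = rng.choice(2, p=probs)
    reward = 1.0 if (action == 1) == (obs > 0) else 0.0
    # REINFORCE: increase the log-probability of actions that earned a reward.
    grad_log_prob = (action - probs[1]) * np.array([obs, 1.0])
    theta += learning_rate * reward * grad_log_prob

print(policy(0.5, theta), policy(-0.5, theta))  # action probabilities after training
```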
Unfortunately, interactions with real-world environments can be costly, especially when initial performance is poor. Solving important real-world problems can require a well-defined, acceptable performance baseline and highly sample-efficient learning. In robotics, learning may require hundreds or thousands of interactions, which can wear out or damage the robot. In finance, logistics or transportation, mistakes can be costly, disruptive and damaging, so solutions need to be robust and high-performing. Even if plenty of historical data is available, many environments (e.g. markets or city traffic) change all the time, and constant adjustment is necessary. If the task changes, a naively trained policy can fail. Multi-task and transfer learning can help generalise policies, but we still need to learn new tasks quickly, which is challenging.
So... we can learn models using data, but these might not align with our objectives or solve the real decision-making problem. On the other hand, we can learn to make decisions directly using reinforcement learning, but this often requires millions of interactions and the resulting policies may not generalise.
What if we combine the two?
One approach – called model-based reinforcement learning – has had recent success using models to learn better policies. This is an active area of focus at PROWLER.io, and our researchers have published work at recent top conferences. One well-known research example is PILCO, an algorithm that learned orders of magnitude faster than even current state-of-the-art deep reinforcement learning algorithms (in terms of the number of interactions with the environment). The main benefit of model-based RL is that once we learn a good model of the world, we can use it to improve the performance of RL policies. This is analogous to how humans decide: our previously learned notions of how the world works help us make decisions in new situations. Models can thus help bring fast, human-like learning to computers, enabling us to tackle problems that were previously intractable.
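The sketch below illustrates the model-based recipe in a heavily simplified form (it is not PILCO, which uses Gaussian process dynamics models and propagates uncertainty analytically): collect a handful of real interactions, fit a dynamics model to them, then improve the policy using imagined rollouts in the learned model instead of further real interactions. The system parameters and cost function are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# The "real" system, hidden from the agent: x' = 1.1*x + 0.5*u + noise.
def real_step(x, u):
    return 1.1 * x + 0.5 * u + 0.05 * rng.normal()

# 1) Collect a small batch of real interactions using random actions.
transitions, x = [], 1.0
for _ in range(30):
    u = rng.uniform(-1.0, 1.0)
    x_next = real_step(x, u)
    transitions.append((x, u, x_next))
    x = x_next

# 2) Fit a simple linear dynamics model x' ~ a*x + b*u from that data.
inputs = np.array([[s, u] for s, u, _ in transitions])
targets = np.array([s_next for _, _, s_next in transitions])
(a_hat, b_hat), *_ = np.linalg.lstsq(inputs, targets, rcond=None)

# 3) Improve the policy inside the learned model: evaluate feedback gains
#    u = -k*x with imagined rollouts, and keep the gain with the lowest cost.
def imagined_cost(k, horizon=20):
    x, cost = 1.0, 0.0
    for _ in range(horizon):
        u = -k * x
        x = a_hat * x + b_hat * u          # model rollout, no real interaction
        cost += x ** 2 + 0.1 * u ** 2      # invented quadratic cost
    return cost

candidate_gains = np.linspace(0.0, 4.0, 41)
best_k = min(candidate_gains, key=imagined_cost)
print(f"Learned model: a={a_hat:.2f}, b={b_hat:.2f}; best gain in the model: k={best_k:.2f}")
```

The only real interactions are the 30 used to fit the model; everything after that happens in imagination, which is where the sample efficiency comes from.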
What if we do it probabilistically?
There are essentially two types of uncertainty in any decision-making task.
The first is inherent in the system; it includes uncertainty both in the environment and within the agents themselves. In a city environment, traffic flow as a whole is inherently uncertain, as are the movements — and even the internal mechanics — of individual vehicles. Though computers are capable of being much more precise than humans, in real-world applications like autonomous vehicles and robotics, actions are always somewhat stochastic.
The second type of uncertainty stems from the fact that models can never be 100% accurate. By definition, they are only approximations of the real world based on data that is incomplete – since we can only interact with the environment a finite number of times. Here, uncertainty can be reduced with more data, but getting more data can, again, be costly.
These uncertainties are a key reason we focus on probabilistic models in our work. With probabilistic models, we can quantify – and thus use – uncertainty. This allows us to:
Plan with uncertainty in mind and adjust our decisions to ensure that, for example, delivery vehicles are dispatched in such a way that they are on time 9 times out of 10 even in uncertain traffic conditions; the sketch below makes this concrete. This is an example of accounting for the first type of uncertainty.
Explore areas that are potentially valuable, but highly uncertain. Then, if we discover that some promising part of the environment has too much uncertainty, we can deploy resources to reduce it. In a delivery routing application, for example, we could install better sensors to gather information about traffic flow on promising routes.
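As a minimal sketch of both points (with invented travel times, and a simple conjugate-Gaussian belief standing in for the richer probabilistic models we use in practice), here is how predictive uncertainty can feed directly into a dispatch decision:

```python
import numpy as np
from scipy import stats

# Invented travel-time observations (minutes) for two candidate delivery routes.
observations = {"route_A": [31.0, 29.5, 30.8, 30.2, 29.9, 30.5],
                "route_B": [27.0, 34.0]}   # promising, but barely observed

prior_mean, prior_var = 30.0, 25.0  # broad prior belief about a route's mean travel time
noise_var = 9.0                     # assumed variance of day-to-day fluctuations

for route, times in observations.items():
    times = np.array(times)
    n = len(times)
    # Conjugate Gaussian update of our belief about the route's mean travel time.
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + times.sum() / noise_var)
    # Predictive distribution for tomorrow's travel time: belief plus day-to-day noise.
    pred_mean, pred_std = post_mean, np.sqrt(post_var + noise_var)
    # Plan with uncertainty: budget the 90th percentile, so we are on time 9 times out of 10.
    budget = stats.norm.ppf(0.9, loc=pred_mean, scale=pred_std)
    print(f"{route}: expect {pred_mean:.1f} min, budget {budget:.1f} min "
          f"(uncertainty about the route's mean: +/- {np.sqrt(post_var):.1f} min)")
```

The wider remaining uncertainty about route_B is also the cue for the second point: it is the route where better sensing (or simply more data) would pay off most.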
Models also allow us to generalise and do long-term planning. Though tasks are often many and varied in the real world, models can be reused for new tasks by retraining a new policy on the model alone. For example, if the underlying dynamics of two robotics tasks are similar, then the model can be adapted for use in both tasks. By reusing models, we can design decision-making systems that learn to perform increasingly complex tasks.