What is the difference between RL and a random trial-and-error search?
Classic RL (i.e., without deep networks) processes alternatives more consistently and efficiently than a random trial-and-error search. A fundamental principle of RL is exploration of the environment followed by exploitation of the accumulated knowledge. In other words, nothing prevents us from interleaving applying the model and testing it, provided that a reasonable balance between the two is maintained.
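To make the exploration-exploitation balance concrete, here is a minimal sketch of an epsilon-greedy strategy on a toy multi-armed bandit. The arm payout probabilities, the epsilon value, and the helper names are illustrative assumptions, not something from the original text:

```python
import random

# Toy multi-armed bandit: each arm pays 1 with a fixed hidden
# probability. These probabilities are made up for illustration.
TRUE_PAYOUTS = [0.2, 0.5, 0.7]

def pull(arm):
    """Sample a reward from the chosen arm."""
    return 1.0 if random.random() < TRUE_PAYOUTS[arm] else 0.0

def epsilon_greedy(steps=10_000, epsilon=0.1):
    counts = [0] * len(TRUE_PAYOUTS)    # how often each arm was tried
    values = [0.0] * len(TRUE_PAYOUTS)  # running mean reward per arm
    for _ in range(steps):
        if random.random() < epsilon:   # explore: try a random arm
            arm = random.randrange(len(TRUE_PAYOUTS))
        else:                           # exploit: use current knowledge
            arm = max(range(len(TRUE_PAYOUTS)), key=lambda a: values[a])
        reward = pull(arm)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    return values

print(epsilon_greedy())  # estimates should approach TRUE_PAYOUTS
```

With epsilon = 0.1, the agent spends 10% of its steps testing alternatives and 90% applying what it has already learned, which is exactly the "reasonable balance" mentioned above.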
However, for some tasks it is infeasible to process all possible scenarios. In such cases, advanced RL algorithms can generalize the collected knowledge and apply it to previously unseen scenarios. Again, the same explore-and-exploit concept is at work.
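One common way to generalize beyond the scenarios actually seen is to compress experience into a parametric value function. The sketch below assumes a hand-crafted two-feature state description and a linear value model; all feature meanings and numbers are hypothetical:

```python
import numpy as np

# Suppose each state is described by two features (illustrative),
# e.g., material balance and mobility in a board game.
train_states = np.array([[1.0, 0.2],
                         [0.5, 0.9],
                         [0.1, 0.4]])
train_values = np.array([0.8, 0.7, 0.2])  # values observed during exploration

# Fit a linear value function V(s) ~= w . s by least squares.
w, *_ = np.linalg.lstsq(train_states, train_values, rcond=None)

# The fitted weights give an estimate for a state never visited
# during exploration -- the knowledge is summarized and reused.
unseen_state = np.array([0.8, 0.6])
print(unseen_state @ w)
```
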
What is the optimal algorithm of interaction with an environment?
The RL mantra is: an instant win does not always guarantee sustainable success. For instance, capturing an opponent's piece in chess might lead to greater losses later in the game.
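As a toy illustration of why the greedy move can be suboptimal, the following sketch compares the discounted return of two hypothetical reward sequences; the reward numbers and the discount factor are invented for illustration:

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards discounted per step: r0 + gamma*r1 + gamma^2*r2 + ..."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

# Capturing the piece pays off now but loses material later (illustrative).
greedy_line  = [1.0, 0.0, -3.0]  # take the piece, then lose the exchange
patient_line = [0.0, 0.5, 1.0]   # decline the capture, build the position

print(discounted_return(greedy_line))   # -1.43: instant win, worse overall
print(discounted_return(patient_line))  #  1.26: delayed but larger payoff
```
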
When we choose an action, we make assumptions about what might happen on the next move; on the following step, we can again assume what would happen next, and so on. All of this accumulated knowledge is taken into account when we choose the next action, and this is how a behavior strategy (a policy) emerges. The principle is most evident in games, where progress in teaching agents has been greatest. Below are examples and references:
Go: AlphaGo's algorithms defeated the best professional Go players.
Chatbot: a bot that negotiates with another agent to strike a deal.
To establish a terminological glossary, we will first discuss three examples that demonstrate certain conceptual features of RL: