Recommender Systems


The goal of a decision support system is to provide the human user with an optimized decision recommendation when operating under uncertainty in complex environments.


Our discussion focuses on the investment domain. The goal of investment decision-making is to select an optimal portfolio that satisfies the investor’s objective; in other words, to maximize investment returns subject to the constraints set by the investor.
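
To make "maximize returns under constraints" concrete, here is a minimal sketch of one very simple instance of the problem. Everything in it is an assumption chosen for illustration: the expected returns are made-up numbers, and the only constraints are that the portfolio is fully invested, long-only, and capped per asset. With a linear objective and these constraints, filling the highest-return assets up to the cap is optimal. Real investor constraints (risk limits, sector exposure, turnover) would need a proper optimizer.

```python
def allocate_portfolio(expected_returns, max_weight):
    """Maximize sum(w[i] * expected_returns[i]) subject to
    sum(w) == 1 and 0 <= w[i] <= max_weight (hypothetical constraints)."""
    n_assets = len(expected_returns)
    if max_weight * n_assets < 1.0:
        raise ValueError("cap too tight: the portfolio cannot be fully invested")

    weights = [0.0] * n_assets
    remaining = 1.0
    # Visit assets from highest to lowest expected return.
    ranked = sorted(range(n_assets), key=lambda i: expected_returns[i], reverse=True)
    for i in ranked:
        weights[i] = min(max_weight, remaining)
        remaining -= weights[i]
    return weights


def portfolio_return(weights, expected_returns):
    """Expected return of the portfolio under the given weights."""
    return sum(w * r for w, r in zip(weights, expected_returns))


if __name__ == "__main__":
    returns = [0.08, 0.12, 0.05]  # illustrative numbers, not real data
    weights = allocate_portfolio(returns, max_weight=0.5)
    print(weights, portfolio_return(weights, returns))
```

The cap is what keeps the answer from collapsing into "put everything into the single best asset", which is why the constraints matter as much as the objective.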


Reinforcement learning based recommender systems: A survey - [link](https://arxiv.org/abs/2101.06286) - Abstract:
  • Recommender systems (RSs) are becoming an inseparable part of our everyday lives. They help us find our favorite items to purchase, our friends on social networks, and our favorite movies to watch. Traditionally, the recommendation problem was considered as a simple classification or prediction problem; however, the sequential nature of the recommendation problem has been shown. Accordingly, it can be formulated as a Markov decision process (MDP) and reinforcement learning (RL) methods can be employed to solve it. In fact, recent advances in combining deep learning with traditional RL methods, i.e. deep reinforcement learning (DRL), has made it possible to apply RL to the recommendation problem with massive state and action spaces. In this paper, a survey on reinforcement learning based recommender systems (RLRSs) is presented. We first recognize the fact that algorithms developed for RLRSs can be generally classified into RL- and DRL-based methods. Then, we present these RL- and DRL-based methods in a classified manner based on the specific RL algorithm, e.g., Q-learning, SARSA, and REINFORCE, that is used to optimize the recommendation policy. Furthermore, some tables are presented that contain detailed information about the MDP formulation of these methods, as well as about their evaluation schemes. Finally, we discuss important aspects and challenges that can be addressed in the future.
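
The MDP framing mentioned in the abstract can be illustrated with a toy example. The sketch below is not taken from the survey: it assumes a hypothetical simulated user whose state is the last item they accepted, an action that recommends one item, and a reward of 1 when the recommendation matches the item the user wants next (`preferred_next` is invented for this purpose). Tabular Q-learning, one of the algorithms the survey names, is then used to learn a recommendation policy.

```python
import random


def train_recommender(preferred_next, steps=5000, alpha=0.1, gamma=0.9,
                      epsilon=0.2, seed=0):
    """Tabular Q-learning against a toy, fully simulated user.

    State  : index of the last item the user accepted.
    Action : index of the item to recommend next.
    Reward : 1.0 if the user accepts the recommendation, else 0.0.
    """
    rng = random.Random(seed)
    n_items = len(preferred_next)
    q = [[0.0] * n_items for _ in range(n_items)]
    state = 0
    for _ in range(steps):
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            action = rng.randrange(n_items)
        else:
            action = max(range(n_items), key=lambda a: q[state][a])

        # Simulated user: accepts only the item they want next.
        accepted = action == preferred_next[state]
        reward = 1.0 if accepted else 0.0
        next_state = action if accepted else state

        # Standard Q-learning update.
        td_target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (td_target - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """For each state, the item with the highest learned Q-value."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]


if __name__ == "__main__":
    preferred_next = [1, 2, 0]  # hypothetical user: item 0 -> 1 -> 2 -> 0
    q_table = train_recommender(preferred_next)
    print(greedy_policy(q_table))
```

A table works here only because the toy problem has three states and three actions. The massive state and action spaces the abstract refers to are the reason deep RL methods replace the table with a function approximator.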

You can find the literature we reviewed while working on this use case here.