This review presents research on applying reinforcement learning, together with new approaches, to course search by machines in
mazes with several kinds of multi-point passing. The approach is based on selective learning from multi-directive behavior patterns
by an agent using PS (Profit Sharing). The behavior is selected stochastically from four kinds of behavior using PS with a Boltzmann
distribution, and invalid rules are inhibited by a reinforcement function in the form of a geometric sequence. Moreover, a variable
temperature scheme is adopted in this distribution: identification of the environment is emphasized in the first stage of the search,
and the emphasis shifts toward convergence of learning as time passes. A SUB learning system and a multistage layer system are
proposed in this review, and their functions were examined through simulations and experiments using a mobile robot.
Keywords: Autonomous; Mobile Robot; Learning; Agent; System Simulation
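
The selection and reinforcement steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function names, the exponential temperature schedule, the start and end temperatures, and the common ratio of 0.25 are all assumptions chosen only for illustration. The sketch shows Boltzmann selection among four behaviors, a temperature that falls over time so that early search favors exploration and later search favors convergence, and a Profit Sharing update whose credit decreases as a geometric sequence when traced back from the reward.

```python
import math
import random

N_BEHAVIORS = 4  # the abstract mentions four kinds of behavior


def temperature(step, t_start=5.0, t_end=0.1, decay=0.99):
    """Assumed schedule: exponential decay from t_start toward t_end."""
    return t_end + (t_start - t_end) * decay ** step


def boltzmann_probs(weights, temp):
    """Selection probabilities proportional to exp(weight / temp)."""
    m = max(weights)  # subtract the maximum for numerical stability
    exps = [math.exp((w - m) / temp) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def select_behavior(weights, temp, rng=random):
    """Draw one behavior index stochastically from the Boltzmann distribution."""
    probs = boltzmann_probs(weights, temp)
    return rng.choices(range(len(weights)), weights=probs, k=1)[0]


def reinforce(weights, episode, reward, ratio=0.25):
    """Profit Sharing update with a geometric reinforcement function.

    `weights` maps a state to a list of rule weights, one per behavior.
    `episode` is the list of (state, behavior) pairs fired before the reward.
    The rule fired last receives `reward`; each earlier rule receives the
    previous credit multiplied by `ratio` (an assumed value).
    """
    credit = reward
    for state, behavior in reversed(episode):
        weights[state][behavior] += credit
        credit *= ratio
```

With a high temperature the four probabilities are nearly uniform, and with a low temperature the behavior with the largest weight is chosen almost always, which corresponds to the shift from environmental identification to convergence described in the abstract.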