OpenAI set up competitions between 3D robots. Each robot has a goal (push the other robot, go to other side, kick the ball). The robots learn behaviors like tackling, ducking, faking, kicking and catching, and diving for the ball. Neural network of each agent was trained with PPO (Proximal Policy Optimization).
WOW
Downvoting a post can decrease pending rewards and make it less visible. Common reasons:
Submit