Proximal Policy Optimization (PPO)

Tags: #machine learning


$$L^{CLIP}(\theta)=E_{t}[\min(r_{t}(\theta)A_{t}, \text{clip}(r_{t}(\theta), 1-\epsilon,1+\epsilon)A_{t})]$$



With supervised learning, we can easily implement the cost function, run gradient descent on it, and be very confident that we'll get excellent results with relatively little hyperparameter tuning. The route to success in reinforcement learning isn't as obvious: the algorithms have many moving parts that are hard to debug, and they require substantial tuning effort to get good results. PPO strikes a balance between ease of implementation, sample complexity, and ease of tuning. At each step it tries to compute an update that minimizes the cost function while keeping the deviation from the previous policy relatively small.

Latex Code
            L^{CLIP}(\theta)=E_{t}[\min(r_{t}(\theta)A_{t}, \text{clip}(r_{t}(\theta), 1-\epsilon,1+\epsilon)A_{t})]

  • $\theta$: the policy parameters
  • $E_{t}$: the empirical expectation over timesteps
  • $r_{t}(\theta)$: the ratio of the action probability under the new policy to that under the old policy
  • $A_{t}$: the estimated advantage at time $t$
  • $\epsilon$: a hyperparameter, usually 0.1 or 0.2
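
To make the roles of the symbols above concrete, here is a minimal pure-Python sketch of the clipped objective evaluated over a batch of timesteps. The function name `clipped_surrogate` and its arguments (`logp_new`, `logp_old`, `advantages`) are hypothetical names chosen for illustration, and the ratio is assumed to be computed from log-probabilities, which is a common convention rather than something stated here.

```python
import math

def clipped_surrogate(logp_new, logp_old, advantages, epsilon=0.2):
    """Empirical mean of min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t).

    All argument names are illustrative assumptions; each is a sequence
    with one entry per timestep.
    """
    terms = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # r_t(theta): probability ratio of the new policy to the old policy
        ratio = math.exp(lp_new - lp_old)
        # clip(r_t, 1 - epsilon, 1 + epsilon)
        clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Keep the more pessimistic of the unclipped and clipped terms
        terms.append(min(ratio * adv, clipped_ratio * adv))
    # E_t: empirical expectation over the timesteps in the batch
    return sum(terms) / len(terms)

# Ratio 1.5 with a positive advantage: the clipped term (1.2 * A_t) is used.
gain_capped = clipped_surrogate([math.log(1.5)], [0.0], [1.0])
# Ratio 1.5 with a negative advantage: the unclipped term (1.5 * A_t) is used.
penalty_kept = clipped_surrogate([math.log(1.5)], [0.0], [-1.0])
```

The two calls illustrate the asymmetry of the `min`: a favorable move away from the old policy stops earning credit once the ratio leaves the interval [1 - epsilon, 1 + epsilon], while an unfavorable one is still counted in full. In practice this quantity would typically be maximized (or its negative minimized) with a gradient-based optimizer in an autodiff framework, which this sketch does not attempt.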

