[ML.1] AML - Reinforcement Learning assignment typos

in technology •  6 years ago

I would like to start a new blog series - Machine Learning [ML], as I am slowly re-qualifying from pure science to data science.

Recently I finished Practical Reinforcement Learning, my fourth course in a row from the Advanced Machine Learning specialization on Coursera.

Here is a copy of my review of the course (rating 5/5):
"This is my fourth AML course, and for now I would say it is the best one. It connects lectures and practice in the best way. On the other hand, there are mistakes all around, as it is beta-version. In my opinion, it is not fair to put the beta-version course into paid specialization."

Yes, you read that right: it is a beta version. As a result, a few practical tasks (assignments) contained mistakes where there should not be any, namely in the parts of the code that students are not supposed to change. I will list all of them here and hopefully make somebody's life easier :)

Week 1 / Assignment 1 - "OpenAI Gym":

The following hint is wrong:
Hint: your action at each step should depend either on t or on s.
The action should depend only on t!!!
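For illustration, here is a minimal sketch of a policy that depends only on the step counter t and ignores the state s. It assumes the action encoding of gym's MountainCar-v0 (0 = push left, 1 = no push, 2 = push right), and the switch steps are made-up placeholders, not a known working solution; they would have to be tuned for the actual task.

```python
# Hypothetical time-only policy; SWITCH_STEPS are placeholders, not a known solution.
# Assumed action encoding (as in gym's MountainCar-v0): 0 = push left, 2 = push right.
LEFT, RIGHT = 0, 2
SWITCH_STEPS = (30, 80, 140)  # made-up step numbers at which the push direction flips


def policy(s, t):
    """Pick an action from the step counter t alone; the observation s is ignored."""
    # The number of switch points already passed decides the current push direction.
    phase = sum(t >= step for step in SWITCH_STEPS)
    return LEFT if phase % 2 == 0 else RIGHT
```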

Week 6 / Assignment 8 - "Bandits & exploration":

Change 1 - class BernoulliBandit:

In class BernoulliBandit, method pull(self, action), the line

if np.random.random() > self._probs[action]:

is changed to:

if np.any(np.random.random() > self._probs[action]):

i.e. the condition is wrapped in np.any().
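To see what the corrected pull does, here is a minimal, self-contained Bernoulli bandit built around that exact line. Apart from the pull condition, everything in it (the constructor and optimal_reward) is my own assumption, not the notebook's code. Wrapping the comparison in np.any() keeps the if working even when the comparison yields an array, which would otherwise raise NumPy's "truth value of an array ... is ambiguous" ValueError.

```python
import numpy as np


class BernoulliBandit:
    """Sketch of a K-armed Bernoulli bandit (not the course's exact class)."""

    def __init__(self, n_actions=5):
        # Hidden success probability of each arm (hypothetical setup).
        self._probs = np.random.random(n_actions)

    def pull(self, action):
        # Reward is 1 with probability self._probs[action], otherwise 0.
        # np.any() reduces the comparison to a single boolean, so the `if`
        # does not fail if the comparison ever produces an array.
        if np.any(np.random.random() > self._probs[action]):
            return 0.0
        return 1.0

    def optimal_reward(self):
        # Expected reward of the best arm; regret is measured against it.
        return float(np.max(self._probs))
```

Pulling one arm many times should give an average reward close to that arm's hidden probability.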

Change 2 - def plot_regret:

In def plot_regret(scores):, the line

plt.legend([agent.name for agent in scores])

is changed to

plt.legend([agent for agent in scores])

i.e. agent.name is replaced by just agent: scores is keyed by the agents' names, so iterating over it already yields the names.
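A small sketch of why the legend fix works, assuming (as the submission code below implies) that scores maps each agent's name to its cumulative-regret curve. The agent names and reward sequences are made up for illustration, and the plotting itself is left out.

```python
from collections import OrderedDict

import numpy as np


def cumulative_regret(optimal_reward, rewards):
    """Regret accumulated up to each step: running sum of (optimal - obtained)."""
    return np.cumsum(optimal_reward - np.asarray(rewards, dtype=float))


# Made-up scores dict, keyed by agent *name* strings (the names are hypothetical).
scores = OrderedDict()
scores["EpsilonGreedyAgent(epsilon=0.01)"] = cumulative_regret(0.9, [0.0, 1.0, 1.0, 0.0])
scores["UCBAgent"] = cumulative_regret(0.9, [1.0, 1.0, 0.0, 1.0])

# Iterating over a dict yields its keys, i.e. the names themselves.
# A str has no .name attribute, hence the fix in the legend line.
labels = [agent for agent in scores]
```

In plot_regret, each curve would then be drawn with plt.plot(scores[name]) before calling plt.legend(labels).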

Change 3 - submission:

Instead of submit_bandits, create a new function submit_bandits2 in submit.py:

def submit_bandits2(agents, scores, email, token):
    # scores is keyed by agent name; `grading` is the course's grader module,
    # assumed to be imported at the top of submit.py
    epsilon_greedy_agent = None
    ucb_agent = None
    thompson_sampling_agent = None
    for agent in agents:
        if "EpsilonGreedyAgent" in agent.name:
            epsilon_greedy_agent = agent.name
        if "UCBAgent" in agent.name:
            ucb_agent = agent.name
        if "ThompsonSamplingAgent" in agent.name:
            thompson_sampling_agent = agent.name
    assert epsilon_greedy_agent is not None
    assert ucb_agent is not None
    assert thompson_sampling_agent is not None
    grader = grading.Grader("VL9tBt7zEeewFg5wtLgZkA")
    # int(1e4) - 1 and int(5e3) - 1 are the indices of steps 10000 and 5000
    grader.set_answer("YQLYE", (int(scores[epsilon_greedy_agent][int(1e4) - 1]) - int(scores[epsilon_greedy_agent][int(5e3) - 1])))
    grader.set_answer("FCHOZ", (int(scores[epsilon_greedy_agent][int(1e4) - 1]) - int(scores[ucb_agent][int(1e4) - 1])))
    grader.set_answer("0JWHl", (int(scores[epsilon_greedy_agent][int(5e3) - 1]) - int(scores[ucb_agent][int(5e3) - 1])))
    grader.set_answer("4rH5M", (int(scores[epsilon_greedy_agent][int(1e4) - 1]) - int(scores[thompson_sampling_agent][int(1e4) - 1])))
    grader.set_answer("TvOqm", (int(scores[epsilon_greedy_agent][int(5e3) - 1]) - int(scores[thompson_sampling_agent][int(5e3) - 1])))
    grader.submit(email, token)
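Before submitting, the indexing can be sanity-checked locally. The sketch below computes the same five differences that submit_bandits2 sends, without the grader. The helper grader_answers and the fake score curves are hypothetical; real curves come from the notebook.

```python
import numpy as np


def grader_answers(scores, eps_name, ucb_name, ts_name):
    """Compute locally the five numbers submit_bandits2 sends to the grader."""
    last, mid = int(1e4) - 1, int(5e3) - 1  # 0-based indices of steps 10000 and 5000
    eps, ucb, ts = scores[eps_name], scores[ucb_name], scores[ts_name]
    return {
        "YQLYE": int(eps[last]) - int(eps[mid]),
        "FCHOZ": int(eps[last]) - int(ucb[last]),
        "0JWHl": int(eps[mid]) - int(ucb[mid]),
        "4rH5M": int(eps[last]) - int(ts[last]),
        "TvOqm": int(eps[mid]) - int(ts[mid]),
    }


# Made-up score curves (integer-valued so that int() is exact).
steps = np.arange(1, 10001)
fake_scores = {
    "EpsilonGreedyAgent": steps // 20,
    "UCBAgent": steps // 100,
    "ThompsonSamplingAgent": steps // 200,
}
answers = grader_answers(fake_scores, "EpsilonGreedyAgent", "UCBAgent", "ThompsonSamplingAgent")
```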

Week 6 / Assignment 9 - "MCTS":

In class WithSnapshots(Wrapper), method load_snapshot(self, snapshot), the line

self.render(close=True) # close popup windows since we can't load into them

is changed to

self.close()
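For context, MCTS needs to save an environment state and come back to it later. Below is a gym-free sketch of that snapshot idea with the corrected self.close() call in load_snapshot. CounterEnv is a hypothetical toy environment, and the snapshot here is a copy.deepcopy of the env, swapped in to keep the example self-contained; the notebook's own snapshot mechanism may differ. The original render(close=True) call presumably fails because the installed gym's render() no longer accepts a close argument.

```python
import copy


class CounterEnv:
    """Hypothetical toy environment standing in for a gym env."""

    def __init__(self):
        self.position = 0

    def step(self, action):
        # Action 1 moves right (+1, reward 1); anything else moves left (-1, reward 0).
        self.position += 1 if action == 1 else -1
        done = abs(self.position) >= 3
        return self.position, float(action == 1), done, {}


class WithSnapshots:
    """Sketch of snapshot/restore for MCTS, using deep copies of the env."""

    def __init__(self, env):
        self.env = env

    def step(self, action):
        return self.env.step(action)

    def close(self):
        # Forward to the wrapped env if it has a close() method (gym envs do).
        if hasattr(self.env, "close"):
            self.env.close()

    def get_snapshot(self):
        # Copy the whole environment so the search can return to this state later.
        return copy.deepcopy(self.env)

    def load_snapshot(self, snapshot):
        self.close()  # the corrected line: close first, no render(close=True)
        # Copy again so the same snapshot can be restored many times.
        self.env = copy.deepcopy(snapshot)

    def get_result(self, snapshot, action):
        # One expansion step: restore the state, act, and snapshot the outcome.
        self.load_snapshot(snapshot)
        obs, reward, done, _ = self.step(action)
        return self.get_snapshot(), obs, reward, done
```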


I hope somebody finds this helpful. Please let me know if you think something else should be added here.

If you are new to SteemIT and decide to join based on this article - let me know using any media you prefer.
