{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Example: Solving an OpenAI Gym environment with CGP.\n\nThis example demonstrates how to solve an OpenAI Gym environment\n(https://gym.openai.com/envs/) with Cartesian genetic programming (CGP). We\nchoose the \"MountainCarContinuous\" environment due to its continuous\nobservation and action spaces.\n\nPreparatory steps:\nInstall the OpenAI Gym package: `pip install gym`\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# The docopt str is added explicitly to ensure compatibility with\n# sphinx-gallery.\ndocopt_str = \"\"\"\n  Usage:\n    example_mountain_car.py [--max-generations=<N>] [--visualize-final-champion]\n\n  Options:\n    -h --help\n    --max-generations=<N>  Maximum number of generations [default: 500]\n    --visualize-final-champion  Create animation of final champion in the mountain car env.\n\"\"\"\n\nimport functools\nimport warnings\n\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport sympy\nfrom docopt import docopt\n\nimport cgp\n\ntry:\n    import gym\nexcept ImportError:\n    raise ImportError(\n        \"Failed to import the OpenAI Gym package. Please install it via `pip install gym`.\"\n    )\n\n\nargs = docopt(docopt_str)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "For more flexibility in the evolved expressions, we define two\nconstants that can be used in the expressions, with values 0.1 and\n10.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "class ConstantFloatZeroPointOne(cgp.ConstantFloat):\n    _def_output = \"0.1\"\n\n\nclass ConstantFloatTen(cgp.ConstantFloat):\n    _def_output = \"10.0\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Then we define the objective function for the evolution. The inner\nobjective accepts a Python callable as input. This callable\ndetermines the action taken by the agent upon receiving observations\nfrom the environment. The inner objective returns the cumulative reward\nof each completed episode; the fitness of the given callable on the task\nis then computed from these rewards.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def inner_objective(f, seed, n_runs_per_individual, n_total_steps, *, render):\n\n    env = gym.make(\"MountainCarContinuous-v0\")\n\n    env.seed(seed)\n\n    cum_reward_all_episodes = []\n    for _ in range(n_runs_per_individual):\n        observation = env.reset()\n\n        cum_reward_this_episode = 0\n        for _ in range(n_total_steps):\n\n            if render:\n                env.render()\n\n            continuous_action = f(*observation)\n            observation, reward, done, _ = env.step([continuous_action])\n            cum_reward_this_episode += reward\n\n            if done:\n                cum_reward_all_episodes.append(cum_reward_this_episode)\n                cum_reward_this_episode = 0\n                observation = env.reset()\n\n    env.close()\n\n    return cum_reward_all_episodes"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The objective then takes an individual, evaluates the inner\nobjective, and updates the fitness of the individual. The fitness is\nthe number of completed episodes per run plus the mean cumulative\nreward per episode, so that more episodes and more reward are both\nbetter. If the expression of the individual leads to a division by\nzero, this error is caught and the individual is assigned a fitness\nof -infinity.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def objective(ind, seed, n_runs_per_individual, n_total_steps):\n\n    if not ind.fitness_is_None():\n        return ind\n\n    f = ind.to_func()\n    try:\n        with warnings.catch_warnings():  # ignore warnings due to zero division\n            warnings.filterwarnings(\n                \"ignore\", message=\"divide by zero encountered in double_scalars\"\n            )\n            warnings.filterwarnings(\n                \"ignore\", message=\"invalid value encountered in double_scalars\"\n            )\n            cum_reward_all_episodes = inner_objective(\n                f, seed, n_runs_per_individual, n_total_steps, render=False\n            )\n\n        # more episodes are better, more reward is better\n        n_episodes = float(len(cum_reward_all_episodes))\n        mean_cum_reward = np.mean(cum_reward_all_episodes)\n        ind.fitness = n_episodes / n_runs_per_individual + mean_cum_reward\n\n    except ZeroDivisionError:\n        ind.fitness = -np.inf\n\n    return ind"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We then define the main loop for the evolution, which consists of:\n\n- parameters for the population, the genome of individuals, and the evolutionary algorithm.\n- creating a Population instance and instantiating the evolutionary algorithm.\n- defining a recording callback closure for bookkeeping of the progression of the evolution.\n\nFinally, we call the `evolve` method to perform the evolutionary search.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def evolve(seed):\n\n    objective_params = {\"n_runs_per_individual\": 3, \"n_total_steps\": 2000}\n\n    genome_params = {\n        \"n_inputs\": 2,\n        \"primitives\": (\n            cgp.Add,\n            cgp.Sub,\n            cgp.Mul,\n            cgp.Div,\n            cgp.ConstantFloat,\n            ConstantFloatZeroPointOne,\n            ConstantFloatTen,\n        ),\n    }\n\n    ea_params = {\"n_processes\": 4}\n\n    evolve_params = {\n        \"max_generations\": int(args[\"--max-generations\"]),\n        \"termination_fitness\": 100.0,\n    }\n\n    pop = cgp.Population(genome_params=genome_params)\n\n    ea = cgp.ea.MuPlusLambda(**ea_params)\n\n    history = {}\n    history[\"expr_champion\"] = []\n    history[\"fitness_champion\"] = []\n\n    def recording_callback(pop):\n        history[\"expr_champion\"].append(pop.champion.to_sympy())\n        history[\"fitness_champion\"].append(pop.champion.fitness)\n\n    obj = functools.partial(\n        objective,\n        seed=seed,\n        n_runs_per_individual=objective_params[\"n_runs_per_individual\"],\n        n_total_steps=objective_params[\"n_total_steps\"],\n    )\n\n    pop = cgp.evolve(\n        obj, pop, ea, **evolve_params, print_progress=True, callback=recording_callback\n    )\n\n    return history, pop.champion"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "For visualization, we define a function to plot the fitness over generations.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def plot_fitness_over_generation_index(history):\n    width = 6.0\n    fig = plt.figure(figsize=(width, width / 1.618))\n    ax = fig.add_axes([0.15, 0.15, 0.8, 0.8])\n    ax.set_xlabel(\"Generation index\")\n    ax.set_ylabel(\"Fitness champion\")\n    ax.plot(history[\"fitness_champion\"])\n    fig.savefig(\"example_mountain_car.pdf\", dpi=300)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We define a function that checks whether the best expression\nfulfills the \"solving criterion\", i.e., an average reward of at least\n90.0 over 100 consecutive trials\n(https://github.com/openai/gym/wiki/Leaderboard#mountaincarcontinuous-v0).\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def evaluate_champion(ind):\n\n    env = gym.make(\"MountainCarContinuous-v0\")\n\n    env.seed(seed)\n    observation = env.reset()\n\n    f = ind.to_func()\n\n    cum_reward_all_episodes = []\n    cum_reward_this_episode = 0\n    while len(cum_reward_all_episodes) < 100:\n\n        continuous_action = f(*observation)\n        observation, reward, done, _ = env.step([continuous_action])\n        cum_reward_this_episode += reward\n\n        if done:\n            cum_reward_all_episodes.append(cum_reward_this_episode)\n            cum_reward_this_episode = 0\n            observation = env.reset()\n\n    env.close()\n\n    cum_reward_average = np.mean(cum_reward_all_episodes)\n    print(f\"average reward over 100 consecutive trials: {cum_reward_average:.05f}\", end=\"\")\n    if cum_reward_average >= 90.0:\n        print(\" -> environment solved!\")\n    else:\n        print()\n\n    return cum_reward_all_episodes"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Furthermore, we define a function for visualizing the agent's behaviour for\neach expression that improves on the currently best-performing individual.\nBy default, only the final solution is visualized.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def visualize_behaviour_for_evolutionary_jumps(seed, history, only_final_solution=True):\n    n_runs_per_individual = 1\n    n_total_steps = 999\n\n    max_fitness = -np.inf\n    for i, fitness in enumerate(history[\"fitness_champion\"]):\n\n        if only_final_solution and i != (len(history[\"fitness_champion\"]) - 1):\n            continue\n\n        if fitness > max_fitness:\n            expr = history[\"expr_champion\"][i]\n            expr_str = str(expr).replace(\"x_0\", \"x\").replace(\"x_1\", \"dx/dt\")\n\n            print(f'visualizing behaviour for expression \"{expr_str}\" (fitness: {fitness:.05f})')\n\n            x_0, x_1 = sympy.symbols(\"x_0, x_1\")\n            f_lambdify = sympy.lambdify([x_0, x_1], expr)\n\n            def f(x, v):\n                return f_lambdify(x, v)\n\n            inner_objective(f, seed, n_runs_per_individual, n_total_steps, render=True)\n\n            max_fitness = fitness"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Finally, we execute the evolution and visualize the results.\nTo animate the behavior of the car for the found expression, pass the\n`--visualize-final-champion` option when running the example.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "if __name__ == \"__main__\":\n\n    seed = 1234\n\n    print(\"starting evolution\")\n    history, champion = evolve(seed)\n    print(\"evolution ended\")\n\n    max_fitness = history[\"fitness_champion\"][-1]\n    best_expr = history[\"expr_champion\"][-1]\n    best_expr_str = str(best_expr).replace(\"x_0\", \"x\").replace(\"x_1\", \"dx/dt\")\n    print(f'solution with highest fitness: \"{best_expr_str}\" (fitness: {max_fitness:.05f})')\n\n    plot_fitness_over_generation_index(history)\n    evaluate_champion(champion)\n    if args[\"--visualize-final-champion\"]:\n        visualize_behaviour_for_evolutionary_jumps(seed, history)"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.6"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}