Saturday, November 20, 2021

Reinforcement learning summary

 

You need two things:

A. An environment that simulates the steps, rewards, states, etc.

B. A model that is capable of learning over time


A. Environment:

Key libraries:

1. OpenAI Gym: contains many prebuilt environments

2. gym-anytrading: contains great environments for trading. https://github.com/AminHP/gym-anytrading

3. Yves Hilpisch's book has a custom environment: https://colab.research.google.com/github/yhilpisch/aiif/blob/main/code/09_reinforcement_learning_b.ipynb

4. Stefan Jansen's book: https://github.com/stefan-jansen/machine-learning-for-trading/tree/main/22_deep_reinforcement_learning


B. Model

Key libraries:

1. Stable Baselines

To create a model, you need to define the algorithm and the policy


Algorithms


This table displays the RL algorithms that are implemented in the Stable Baselines project, along with some useful characteristics: support for recurrent policies, discrete/continuous actions, and multiprocessing.

Name | Refactored [1] | Recurrent | Box | Discrete | Multi Processing
A2C | ✔️ | ✔️ | ✔️ | ✔️ | ✔️
ACER | ✔️ | ✔️ | ❌ [4] | ✔️ | ✔️
ACKTR | ✔️ | ✔️ | ✔️ | ✔️ | ✔️
DDPG | ✔️ | ❌ | ✔️ | ❌ | ✔️ [3]
DQN | ✔️ | ❌ | ❌ | ✔️ | ❌
HER | ✔️ | ❌ | ✔️ | ✔️ | ❌
GAIL [2] | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ [3]
PPO1 | ✔️ | ❌ | ✔️ | ✔️ | ✔️ [3]
PPO2 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️
SAC | ✔️ | ❌ | ✔️ | ❌ | ❌
TD3 | ✔️ | ❌ | ✔️ | ❌ | ❌
TRPO | ✔️ | ❌ | ✔️ | ✔️ | ✔️ [3]

[1] Whether or not the algorithm has been refactored to fit the BaseRLModel class.
[2] Only implemented for TRPO.
[3] Multi Processing with MPI.
[4] TODO, in project scope.


Policies


Available Policies

MlpPolicy | Policy object that implements actor critic, using an MLP (2 layers of 64)
MlpLstmPolicy | Policy object that implements actor critic, using LSTMs with an MLP feature extraction
MlpLnLstmPolicy | Policy object that implements actor critic, using a layer-normalized LSTM with an MLP feature extraction
CnnPolicy | Policy object that implements actor critic, using a CNN (the nature CNN)
CnnLstmPolicy | Policy object that implements actor critic, using LSTMs with a CNN feature extraction
CnnLnLstmPolicy | Policy object that implements actor critic, using a layer-normalized LSTM with a CNN feature extraction

Once the environment, algorithm and policy are defined, running an RL training job is straightforward:
1. Create environment:

env = gym.make('stocks-v0', df=df, frame_bound=(5,250), window_size=5)

2. Wrap the environment in a vectorized environment so that the RL algorithm can run multiple environments in parallel

env_maker = lambda: env
env = DummyVecEnv([env_maker])

3. Define the model

model = A2C('MlpLstmPolicy', env, verbose=1) 

4. Start training

model.learn(total_timesteps=1000000)
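
Putting the four steps together, here is a minimal end-to-end sketch (assuming gym_anytrading and stable_baselines are installed; df is assumed to be a pandas OHLC price DataFrame in the format gym_anytrading expects):

import gym
import gym_anytrading  # registers 'stocks-v0' on import
import pandas as pd
from stable_baselines import A2C
from stable_baselines.common.vec_env import DummyVecEnv

# df is assumed to be loaded elsewhere, e.g.:
# df = pd.read_csv('prices.csv', parse_dates=True, index_col='Date')

# 1. Create the trading environment
env = gym.make('stocks-v0', df=df, frame_bound=(5, 250), window_size=5)

# 2. Wrap it in a vectorized environment
env = DummyVecEnv([lambda: env])

# 3. Define the model (algorithm + policy)
model = A2C('MlpLstmPolicy', env, verbose=1)

# 4. Train
model.learn(total_timesteps=1000000)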

Hyperparameter tuning:
1. Use different RL algorithms (A2C, PPO, etc.)
2. Use different policies (MlpPolicy, CnnPolicy, MlpLstmPolicy, etc.)
3. Use different policy parameters

The most common policy hyperparameters to change are the activation function and the network architecture, e.g. for a feedforward policy:

import gym
import tensorflow as tf

from stable_baselines import PPO2

# Custom MLP policy of two layers of size 32 each with tanh activation function
policy_kwargs = dict(act_fun=tf.nn.tanh, net_arch=[32, 32])
# Create the agent
model = PPO2("MlpPolicy", "CartPole-v1", policy_kwargs=policy_kwargs, verbose=1)
# Retrieve the environment
env = model.get_env()
# Train the agent
model.learn(total_timesteps=100000)
# Save the agent
model.save("ppo2-cartpole")

del model
# the policy_kwargs are automatically loaded
model = PPO2.load("ppo2-cartpole")

import gym

from stable_baselines.common.policies import FeedForwardPolicy, register_policy
from stable_baselines.common.vec_env import DummyVecEnv
from stable_baselines import A2C

# Custom MLP policy of three layers of size 128 each
class CustomPolicy(FeedForwardPolicy):
    def __init__(self, *args, **kwargs):
        super(CustomPolicy, self).__init__(*args, **kwargs,
                                           net_arch=[dict(pi=[128, 128, 128],
                                                          vf=[128, 128, 128])],
                                           feature_extraction="mlp")

# Create and wrap the environment
env = gym.make('LunarLander-v2')
env = DummyVecEnv([lambda: env])

model = A2C(CustomPolicy, env, verbose=1)
# Train the agent
model.learn(total_timesteps=100000)
# Save the agent
model.save("a2c-lunar")

del model
# When loading a model with a custom policy
# you MUST pass explicitly the policy when loading the saved model
model = A2C.load("a2c-lunar", policy=CustomPolicy)

The net_arch parameter of FeedForwardPolicy allows you to specify the number and size of the hidden layers and how many of them are shared between the policy network and the value network. It is assumed to be a list with the following structure:

  1. An arbitrary number (zero allowed) of integers, each specifying the number of units in a shared layer. If the number of ints is zero, there will be no shared layers.
  2. An optional dict, to specify the non-shared layers for the value network and the policy network. It is formatted like dict(vf=[<value layer sizes>], pi=[<policy layer sizes>]). If either key (pi or vf) is missing, no non-shared layers (an empty list) are assumed for that network.

In short: [<shared layers>, dict(vf=[<non-shared value network layers>], pi=[<non-shared policy network layers>])].

Examples

Two shared layers of size 128: net_arch=[128, 128]

          obs
           |
         <128>
           |
         <128>
   /               \
action            value

Value network deeper than policy network, first layer shared: net_arch=[128, dict(vf=[256, 256])]

          obs
           |
         <128>
   /               \
action             <256>
                     |
                   <256>
                     |
                   value

Initially shared then diverging: [128, dict(vf=[256], pi=[16])]

          obs
           |
         <128>
   /               \
 <16>             <256>
   |                |
action            value
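
As a sketch, the same architectures can also be passed through policy_kwargs without subclassing a policy (the environment name here is just an illustrative choice):

from stable_baselines import A2C

# Initially shared then diverging: one shared layer of 128,
# then a 16-unit policy head and a 256-unit value head
policy_kwargs = dict(net_arch=[128, dict(vf=[256], pi=[16])])
model = A2C("MlpPolicy", "LunarLander-v2", policy_kwargs=policy_kwargs, verbose=1)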





Saturday, August 21, 2021

Deploy Python container on Kubernetes cluster on GCP

 Source

https://www.youtube.com/watch?v=GKuk-TBmNcI


  • Purpose

The goal of this blog is to show how a Docker container can be deployed on a Kubernetes cluster. The motivation for deploying containers is as follows:

Once you build a container, you may want to scale it up and down according to changing demand. To do that, Kubernetes provides a LoadBalancer that distributes HTTP requests over a set of nodes.


Kubernetes architecture

A Kubernetes cluster is formed of nodes. A node can be either a worker node or a master node.



Within each node there are multiple pods. A container is deployed in a pod, and the load balancer distributes the requests across the nodes and pods.



Build a container

  • On cloud shell create a folder and place the following files




  • That's what the Dockerfile looks like
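
The folder would typically hold the app code (e.g. app.py), requirements.txt, the Dockerfile, and the Kubernetes manifests (deployment.yaml, services.yaml). A minimal illustrative Dockerfile for a small Flask app could look like this (file names, base image, and port are assumptions, not the exact files from the video):

# Dockerfile (illustrative sketch)
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 5000
CMD ["python", "app.py"]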




  • Build the container using the gcloud command
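
For example (the project ID and image name are placeholders):

gcloud builds submit --tag gcr.io/<PROJECT_ID>/flask-app:v1 .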




  • Create a Kubernetes cluster using this command - change the number of nodes as you wish
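
For example (cluster name and zone are placeholders):

gcloud container clusters create my-cluster --num-nodes=3 --zone=us-central1-a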




  • You can change the number of replicas (pods) by modifying the deployment.yaml file as follows
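
A sketch of what such a deployment.yaml could look like (names, labels, and image are assumptions):

# deployment.yaml (illustrative sketch)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: flask-app
  template:
    metadata:
      labels:
        app: flask-app
    spec:
      containers:
      - name: flask-app
        image: gcr.io/<PROJECT_ID>/flask-app:v1
        ports:
        - containerPort: 5000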


  • Then apply this as follows
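
Assuming the manifest file name above:

kubectl apply -f deployment.yaml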


  • To modify the ports of the load balancer, edit the services.yaml file as follows
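
A sketch of a matching services.yaml (the external port 80 and target port 5000 are assumptions):

# services.yaml (illustrative sketch)
apiVersion: v1
kind: Service
metadata:
  name: flask-app-service
spec:
  type: LoadBalancer
  selector:
    app: flask-app
  ports:
  - port: 80
    targetPort: 5000

It is applied the same way: kubectl apply -f services.yaml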



  • To get the IP to call, type the following command
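
For example:

kubectl get services
# the EXTERNAL-IP column of the LoadBalancer service is the IP to call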




  • Just place this IP in the browser and the container will run













Loud fan of desktop

 Upon restart, the fan of the desktop got loud again. I cleaned the dust out of the desktop but it was still loud (lower than the first sound) ...