Thursday, May 14, 2020

Install Ubuntu and Windows alongside each other



Setup:

2 hard drives:
1. Toshiba: Windows installed, ~128 GB
2. ATA: empty, ~1 TB


Burn an Ubuntu ISO to a USB drive

During installation, you will have to define 3 partitions:

1. Root partition
Size: As much as you wish
Use as: ext4 journaling file system
Mount point: /

2. Linux swap
Size: Same as RAM
Use as: swap

3. EFI
Use as: EFI System Partition
Size: ~280 MB

I chose to install Ubuntu on the empty drive (ATA)


After clicking Next, you might get the error 'Cannot install GRUB2, this is a fatal error'

Source:  https://www.howtogeek.com/114884/how-to-repair-grub2-when-ubuntu-wont-boot/


Instructions to solve:
1. Reboot without the USB plugged in
2. Ubuntu will open; in a terminal, type:

sudo apt-add-repository ppa:yannubuntu/boot-repair

sudo apt-get update

sudo apt-get install -y boot-repair

boot-repair


Follow instructions in the source link above.


Now Ubuntu should boot through GRUB on the drive it is installed on (ATA).

To make Windows boot:
Source: https://h30434.www3.hp.com/t5/Notebook-Operating-System-and-Recovery/Changing-UEFI-mode-to-Legacy-mode/td-p/5650742

On startup:

Hit F10 to open the setup menu
Go to boot order
Disable UEFI and enable Legacy boot






Monday, May 11, 2020

Create identical conda environment on another machine

Source:
https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html



On the machine with the existing conda environment, dump an explicit spec file:


conda list --explicit > spec-file.txt


On the other machine, create an environment using the spec file:


conda create --name myenv --file spec-file.txt


To install into an existing environment (e.g. after setting up Keras):


conda install --name myenv --file spec-file.txt
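Because explicit spec files are plain text (one package URL per line), you can sanity-check that two machines really match by comparing package lines. A minimal sketch, assuming the spec format shown in the conda docs (the spec contents below are made-up examples):

```python
def spec_packages(lines):
    """Return the package lines from a `conda list --explicit` spec,
    skipping comments (#) and the @EXPLICIT marker."""
    return {ln.strip() for ln in lines
            if ln.strip() and not ln.startswith(("#", "@"))}

# Hypothetical spec contents from two machines
machine_a = ["# platform: linux-64", "@EXPLICIT",
             "https://repo.anaconda.com/pkgs/main/linux-64/numpy-1.18.1.tar.bz2"]
machine_b = ["@EXPLICIT",
             "https://repo.anaconda.com/pkgs/main/linux-64/numpy-1.18.1.tar.bz2"]

print(spec_packages(machine_a) == spec_packages(machine_b))  # → True
```

In practice you would read the two spec-file.txt files with open() and compare their package sets the same way.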



Folium for map displays

Sources:

https://www.kaggle.com/daveianhickey/how-to-folium-for-maps-heatmaps-time-data
https://nbviewer.jupyter.org/github/python-visualization/folium/blob/master/examples/Plugins.ipynb
https://alysivji.github.io/getting-started-with-folium.html



1. Create a function

import folium
from folium import plugins


def map_points(df, lat_col='latitude', lon_col='longitude', zoom_start=11,
               plot_points=False, pt_radius=15,
               draw_heatmap=False, heat_map_weights_col=None,
               heat_map_weights_normalize=True, heat_map_radius=15,
               label_col_name=None, use_existing_map=False, existing_map=None):
    """Creates a map given a dataframe of points. Can also produce a heatmap overlay

    Arg:
        df: dataframe containing points to maps
        lat_col: Column containing latitude (string)
        lon_col: Column containing longitude (string)
        zoom_start: Integer representing the initial zoom of the map
        plot_points: Add points to map (boolean)
        pt_radius: Size of each point
        draw_heatmap: Add heatmap to map (boolean)
        heat_map_weights_col: Column containing heatmap weights
        heat_map_weights_normalize: Normalize heatmap weights (boolean)
        heat_map_radius: Size of heatmap point

    Returns:
        folium map object
    """

    # center the map on the median of the points
    middle_lat = df[lat_col].median()
    middle_lon = df[lon_col].median()

    if use_existing_map:
        curr_map = existing_map
    else:
        curr_map = folium.Map(location=[middle_lat, middle_lon],
                              zoom_start=zoom_start)

    # add points to map
    if plot_points:
        for _, row in df.iterrows():
            folium.CircleMarker([row[lat_col], row[lon_col]],
                                radius=pt_radius,
                                popup=row[label_col_name]
                                #fill_color="#3db7e4", # divvy color
                               ).add_to(curr_map)

    # add heatmap
    if draw_heatmap:
        # convert to (n, 2) or (n, 3) matrix format
        if heat_map_weights_col is None:
            cols_to_pull = [lat_col, lon_col]
        else:
            # if we have to normalize
            if heat_map_weights_normalize:
                df[heat_map_weights_col] = \
                    df[heat_map_weights_col] / df[heat_map_weights_col].sum()

            cols_to_pull = [lat_col, lon_col, heat_map_weights_col]

        stations = df[cols_to_pull].to_numpy()
        curr_map.add_child(plugins.HeatMap(stations, radius=heat_map_radius))

    return curr_map



2. Create a dataframe with lat, lon, a popup label, and a heatmap weight, then call the function
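The frame passed to map_points just needs latitude, longitude, a label column, and optionally a weight column. A tiny stand-in for the disp frame used below (the values are invented):

```python
import pandas as pd

# Hypothetical stand-in for the real 'disp' dataframe; values are made up.
disp_demo = pd.DataFrame({
    'mstr_latitude':       [24.71, 21.49, 26.43],
    'mstr_longitude':      [46.68, 39.19, 50.10],
    'ticket_rate_in_peak': [0.12, 0.45, 0.30],   # popup label / heatmap weight
    'site_id':             ['S1', 'S2', 'S3'],
    'diff':                [1.0, 2.5, 0.7],      # alternative weight column
})
print(disp_demo.shape)  # → (3, 5)
```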




mp = map_points(disp[['mstr_latitude', 'mstr_longitude', 'ticket_rate_in_peak', 'site_id', 'diff']],
                lat_col='mstr_latitude', lon_col='mstr_longitude', zoom_start=6,
                plot_points=True, pt_radius=1,
                draw_heatmap=False, heat_map_weights_col='diff',
                heat_map_weights_normalize=False, heat_map_radius=15,
                label_col_name='ticket_rate_in_peak')

mp_1 = map_points(disp_heat.loc[:, ['mstr_latitude', 'mstr_longitude', 'ticket_rate_in_peak', 'site_id', 'diff']],
                  lat_col='mstr_latitude', lon_col='mstr_longitude', zoom_start=6,
                  plot_points=False, pt_radius=2,
                  draw_heatmap=True, heat_map_weights_col='ticket_rate_in_peak',
                  heat_map_weights_normalize=True, heat_map_radius=20,
                  label_col_name='ticket_rate_in_peak', use_existing_map=True, existing_map=mp)

plugins.Fullscreen(
    position='topright',
    title='Expand me',
    title_cancel='Exit me',
    force_separate_button=True
).add_to(mp_1)

mp_1.save('C:/Users/mohamed.ibrahim/Google Drive/box/TAWAL/05_data/predmain/01_code/arc/SCECO_outage_raw_may_june.html')

mp_1


Tuesday, May 5, 2020

Remote Desktop XRDP [Connect to Ubuntu from Windows]

Sources:

https://medium.com/@vivekteega/how-to-setup-an-xrdp-server-on-ubuntu-18-04-89f7e205bd4e


You need to install 2 things:
xrdp
a desktop environment

Step 1- Install xRDP

sudo apt-get update
sudo apt-get install xrdp 

Step 2- Install your preferred desktop environment

# XFCE
sudo apt-get install xfce4
#Optional stuff
sudo apt-get install xfce4-terminal
sudo apt-get install gnome-icon-theme-full tango-icon-theme
or
# MATE
sudo apt-get install mate-core mate-desktop-environment mate-notification-daemon

Step 3- Tell xRDP to use your environment

# XFCE
sudo sed -i.bak '/fi/a #xrdp multiple users configuration \n xfce-session \n' /etc/xrdp/startwm.sh
or
# MATE
sudo sed -i.bak '/fi/a #xrdp multiple users configuration \n mate-session \n' /etc/xrdp/startwm.sh

Step 4- Firewall permission

# allow just RDP through the local firewall
sudo ufw allow 3389/tcp
# restart xrdp
sudo /etc/init.d/xrdp restart



Step 5 - Add configuration

echo xfce4-session >~/.xsession





Monday, May 4, 2020

Git and DVC ultimate guide




0. Install Git and DVC
https://git-scm.com/downloads
https://dvc.org/


1. Initialize Repo
In the root of the folder enter

git init
dvc init
git commit -m "Initial commit"


2. Add remote repositories

DVC

DVC remote repository

dvc remote add name_of_repo /path/to/repo

Set this remote as the default for the project, to avoid naming it in every dvc push:

dvc remote default name_of_repo


Set the data cache to a directory on the remote storage to avoid a huge storage overhead in the local folder:

dvc cache dir /path/to/data_store


Git

Create a bare repository in the remote folder. A bare repository is a remote repository without a working tree; you do not commit to it directly. Typically that would be on GitHub/GitLab; here we are referring to the case where we want to host the remote repository ourselves on a server.

cd /path/to/repo/
git init --bare repo_origin.git

Go back to folder directory and add remote

cd /project/dir/
git remote add origin /path/to/repo/repo_origin.git
git commit -m "Configured remote"


3. Add data to version control

dvc add 02_data 03_imgs
git add .
git commit -m "Data versioning files added to git"
dvc push


4. Start working on a branch to develop a feature

git checkout -b v1


5. Commit changes

dvc add 02_data 03_imgs
git add .
git commit -m "Did some changes to code and data"

6. Push to repo

dvc push
git push origin v1:master  # Pushes the v1 branch to the remote's master


7. Merge branch to master after completing a feature
git checkout master
git merge v1



Bonus:
To clear some of the data in cache
dvc gc --workspace






Full article


When working on a production machine learning project you probably deal with a ton of data and several models. To keep track of which models were trained with which data, you should use a system to version the data, similar to versioning and tracking your code. One way to solve this problem is dvc (Data Version Control, https://dvc.org/), which approaches data versioning in a similar way to Git.
To illustrate the use of dvc in a machine learning context, we assume that our data is divided into train, test and validation folders by default, with the amount of data increasing over time, either through an active learning cycle or by manually adding new data. An example could be the following structure, where the labels are omitted for simplicity:
├── train
│    ├── image1.jpg
│    ├── image2.jpg
│    └── image3.jpg
├── val
│    └── image4.jpg
└── test
     └── image5.jpg
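The example layout above can be recreated in a scratch directory so the walkthrough can be tried without real data; a minimal sketch (the file names are just the article's placeholders):

```python
import tempfile
from pathlib import Path

# Build the article's example train/val/test layout with empty stand-in files.
base = Path(tempfile.mkdtemp())
layout = {
    "train": ["image1.jpg", "image2.jpg", "image3.jpg"],
    "val":   ["image4.jpg"],
    "test":  ["image5.jpg"],
}
for folder, names in layout.items():
    (base / folder).mkdir()
    for name in names:
        (base / folder / name).touch()

print(sorted(p.name for p in base.rglob("*.jpg")))
# → ['image1.jpg', 'image2.jpg', 'image3.jpg', 'image4.jpg', 'image5.jpg']
```

Running git init / dvc init inside such a scratch directory is a safe way to experiment with the commands below.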
Normally, a minimal versioning system should have the following two capabilities:
  • Tag a new set of data with a new version, e.g. vx.y.z
  • Return to old data versions or switch between different data versions very easily
Among other features, dvc is capable of doing these tasks. For this purpose it works closely together with Git. First you need to install dvc, which can be done using pip:
pip install dvc 
To start the versioning process you have to create a git repository in the base folder of your data and initialize dvc afterwards through
git init
dvc init
Through the init command, dvc has now created a .dvc folder containing its cache, which stores the differences between data versions, and the config file, which stores meta information.
In case you are wondering how Git fits into this concept: Git's task here is not to version the data itself but to version the dvc files, which store a version's meta information, such as the location of the files corresponding to that version and which files of your data belong to the current data version.
In order for Git to ignore the data itself, dvc also automatically writes to the .gitignore file. To commit the dvc config file and the .gitignore file we need to do an initial commit:
git commit -m "Initial commit"
Each data version is associated with its own .dvc files, which in turn are associated with one commit or one head of Git. The .dvc files define and track the data for a given version, while the .dvc files themselves are tracked by Git. For me, a good way to associate a new data version with a Git head is to make a new branch for each data version. To do this, before we define our first version we create a new branch named after the version and check it out:
git checkout -b v0.0.1
Now we can define our first version by telling dvc which data should be tracked, which in our case are the train, val and test folders. This can be done with the dvc add command:
dvc add train test val
After that we see new .dvc files for each folder, like train.dvc, inside our base folder. The folders themselves have been added to the .gitignore so that Git doesn't track the data itself, which in our case is the task of dvc. In order to track the new .dvc files with Git we follow the standard Git procedure for a commit:
git add .
git commit -m "Data versioning files added to Git"
Now we have created the first version of our data: the .dvc files store which data belongs to the version, and the .dvc files themselves are referenced by the current commit. Please note that you can also connect Git to a remote repository to save and version the .dvc files remotely. The data in this case stays in the current folder and is not stored remotely (this can also be changed using dvc push and pull).
We now have associated one state of our data with a version, but of course you don't need data versioning for one fixed data set. Therefore we now assume that two new images (image6.jpg and image7.jpg) are added to the train and test folders, so that the structure now looks like this:
├── train
│    ├── image1.jpg
│    ├── image2.jpg
│    ├── image3.jpg
│    └── image6.jpg
├── val
│    └── image4.jpg
└── test
     ├── image5.jpg
     └── image7.jpg
In order to create a new data version we repeat the previous steps. We therefore create a new branch corresponding to the new data version
git checkout -b v0.0.2
As we already know, a new data version is always associated with its own .dvc files, which store the version's meta information. In order to update the .dvc files we need to tell dvc to track the train and test folders again, as there is new data in them:
dvc add train test
The train.dvc and test.dvc files changed, and dvc now tracks which files belong to the current version. In order to track the new .dvc files inside the Git branch we have to do a commit:
git add .
git commit -m "Data versioning files added to Git"
Now comes the cool part. When checking your Git branches you see two version branches (plus master), where each corresponds to one data version:
master
v0.0.1
* v0.0.2
You are now able to go back to an older data version and have your data directory updated in place to recreate it. To get back to the previous version we need to do two things. First we need to check out the head corresponding to that data version, which in this case is the branch v0.0.1:
git checkout v0.0.1
At this head the .dvc files are different compared to v0.0.2, but our current data directory still looks the same and the data inside it still corresponds to v0.0.2. This is because dvc has not yet aligned the data directory with its .dvc files. To align your data directory with the correct data version, which again is persisted in the .dvc files, we need to run the dvc checkout command:
dvc checkout
This command restores the old data version (in this case v0.0.1) using its cache. When you now look into your data repository you see the following structure again:
├── train
│    ├── image1.jpg
│    ├── image2.jpg
│    └── image3.jpg
├── val
│    └── image4.jpg
└── test
     └── image5.jpg
The files image6.jpg and image7.jpg were removed from the data directories and stored into the cache of dvc. You can now work with the old data version just as usual with the three folders.
This procedure also works for data versions containing far more data than is currently present in the data folder, as dvc stores differences of arbitrary size between versions in its cache and can therefore recreate older or newer states of the data directories with its checkout command. The checkout is of course also possible in the other direction: you could check out the Git branch v0.0.2 and perform a dvc checkout to set the data directory to the state of version v0.0.2.
Besides the init, add and checkout commands, dvc has many more features to make the machine learning/big data workflow easier. For example, data versions can be shared between multiple machines using a remote bucket like Amazon's S3 and interacting with the bucket using dvc push and pull (for details see https://dvc.org/).
I hope this article helps you organize the data in a machine learning project and keep a better overview.
For more blog posts about Machine Learning, Data Science and Statistics checkout www.matthias-bitzer.de




Get weather data

Source:
https://www.freecodecamp.org/news/obtain-historical-weather-forecast-data-in-csv-format-using-python/


1. Sign up here
https://www.worldweatheronline.com/developer/signup.aspx


2. Get API_KEY



3. !pip install wwo_hist


4. Run for some dates and cities


from wwo_hist import retrieve_hist_data


frequency=24
start_date = '01-APR-2019'
end_date = '15-APR-2020'
api_key = '914b5f406fb34c4da0b131207200405'
#location_list = ['mecca','tabuk','al madinah','riyadh','asir','dammam']
location_list = ['al-madinah']

hist_weather_data = retrieve_hist_data(api_key,
                                location_list,
                                start_date,
                                end_date,
                                frequency,
                                location_label = False,
                                export_csv = True,
                                store_df = True)
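With export_csv=True the package writes one CSV per location. A sketch of typical post-processing, using a small made-up frame in place of the real export (the date_time and tempC column names mirror the wwo export but should be treated as assumptions):

```python
import pandas as pd

# Hypothetical stand-in for a wwo_hist export; values are invented.
weather = pd.DataFrame({
    "date_time": pd.to_datetime(["2019-04-01", "2019-04-02", "2019-05-01"]),
    "tempC": [30, 32, 38],
})

# Monthly mean temperature, resampled on month-start boundaries.
monthly_mean = weather.set_index("date_time")["tempC"].resample("MS").mean()
print(monthly_mean)
```

On the real data you would replace the stand-in frame with pd.read_csv on the exported file.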

Saturday, May 2, 2020

Sublime text with flake8 linter and black autoformatter



First install both libraries using pip from the command line (not inside Anaconda!):

pip install flake8
pip install black


Then in Sublime text, Press Ctrl + Shift + P, then open install packages

1. Install Sublack
2. Install SublimeLinter-Flake8

Sometimes flake8 warns about formatting that black itself produces; you can install this plugin to fix the conflict:

https://github.com/kaste/SublimeLinter-addon-black-for-flake

Usage of sublack

  • Run Black on the current file:
    Press Ctrl-Alt-B to format the entire file. You can also Ctrl-Shift-P (Mac: Cmd-Shift-P) and select Sublack: Format file.
  • Run Black with --diff:
    Pressing Ctrl-Alt-Shift-B will show the diff in a new tab. You can also Ctrl-Shift-P (Mac: Cmd-Shift-P) and select Sublack: Diff file.
  • Toggle Black on save for current view :
    Press Ctrl-Shift-P (Mac: Cmd-Shift-P) and select Sublack: Toggle black on save for current view.
  • Run Black Format All:
    Press Ctrl-Shift-P (Mac: Cmd-Shift-P) and select Sublack: Format All. Runs black against each root folder in the standard way (without taking sublack options and configuration into account). Same as running black . in the folder.
  • Start Blackd Server :
    Press Ctrl-Shift-P (Mac: Cmd-Shift-P) and select Sublack: Start BlackdServer.
  • Stop Blackd Server :
    Press Ctrl-Shift-P (Mac: Cmd-Shift-P) and select Sublack: Stop BlackdServer.

Blackd Mode

Sublack supports blackd. If the option black_use_blackd is set to true, Sublack will use blackd (and not black) according to the 'host' and 'port' configuration.
You can run blackd from SublimeText manually via Start Blackd Server command or automatically at SublimeText start via setting black_blackd_autostart to true.
Blackd server started via SublimeText can be stopped manually via the Stop Blackd Server command or automatically at sublime's exit.
Unlike "standalone" blackd, using sublack with blackd will continue to take care of the pyproject file.
Using standard mode or blackd mode in sublack should always give the same result... or it's a bug :-)
Blackd is faster than Black.
Diff is always run with black.

Pre-commit integration

You can choose to run Black via pre-commit by setting black_use_precommit to true. Sublack settings will be ignored.

Settings

Sublack will always look for settings in the following order:
  • First in a pyproject.toml file
  • Second in the project file: first with sublack prefix then in a subsetting (see Project settings).
  • Then in Users global settings
  • Finally in Sublack's default settings

Global settings

Preferences -> Package Settings -> sublack -> settings :
Black specifics options
  • black_line_length:
    Set a custom line length used by Black. Default = null, which keeps black's default.
  • black_fast:
    Black fast mode. default is false.
  • black_skip_string_normalization:
    Don't normalize string quotes or prefixes. Default = false.
  • black_py36 [Deprecated]:
    Force use of Python 3.6-only syntax. The default is Black's default.
  • black_target_version:
    Python versions that should be supported by Black's output. Enter it as a list, e.g. ["py37"].
Sublack specifics options
  • black_command:
    Set custom location. Default = "black".
  • black_on_save:
    Black is always run before saving the file. Default = false.
  • black_log:
    Show non error messages in console. Default = info.
  • black_default_encoding:
    Should not be changed. Only needed on some OSX platforms.
  • black_use_blackd:
    Use blackd instead of black. Default = false.
  • black_blackd_server_host:
    default = "localhost",
  • black_blackd_port:
    default = "45484"
  • black_blackd_autostart:
    Automatically run blackd in the background when Sublime starts. Default is false.
  • black_use_precommit:
    run black via pre-commit hook.
  • black_confirm_formatall:
    Popup confirmation dialog before format_all command. default = true.

Project settings

Just add sublack as prefix (recommended):
{
    "settings": {
        "sublack.black_on_save": true
    }
}
A sublack subsettings is still possible:

{
    "settings": {
        "sublack": {
            "black_on_save": true
        }
    }
}

Loud fan of desktop

Upon restart, the desktop fan got loud again. I cleaned the dust out of the desktop, but it was still loud (though quieter than before) ...