Thursday, May 14, 2020

Install Ubuntu and Windows alongside each other



Setup:

2 hard drives:
1. Toshiba: Windows installed, ~128 GB
2. ATA: empty, ~1 TB


Burn an Ubuntu ISO to a USB drive

During installation, you will have to define 3 partitions:

1. Root partition
Size: As much as you wish
Use as: ext4 journaling file system
Mount point: /

2. Linux swap
Size: Same as RAM
Use as: swap

3. EFI
Use as: EFI System Partition
Size: ~280 MB

I chose to install Ubuntu on the empty drive (ATA)


After clicking Next, you might get the error 'Cannot install GRUB2, this is a fatal error'

Source:  https://www.howtogeek.com/114884/how-to-repair-grub2-when-ubuntu-wont-boot/


Instructions to solve:
1. Reboot without the USB plugged in
2. Ubuntu will open; in a terminal, type:

sudo apt-add-repository ppa:yannubuntu/boot-repair

sudo apt-get update

sudo apt-get install -y boot-repair

boot-repair


Follow instructions in the source link above.


Now Ubuntu should boot through GRUB on the drive it is installed on (ATA).

To make Windows boot:
Source: https://h30434.www3.hp.com/t5/Notebook-Operating-System-and-Recovery/Changing-UEFI-mode-to-Legacy-mode/td-p/5650742

On startup:

Hit F10 to open the setup menu
Go to boot order
Disable UEFI and enable Legacy boot






Monday, May 11, 2020

Create identical conda environment on another machine

Source:
https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html



On the machine with the existing conda environment, dump an explicit spec file:


conda list --explicit > spec-file.txt


On the other machine, create an environment using the spec file:


conda create --name myenv --file spec-file.txt


To install into an existing environment (e.g. after setting up Keras):


conda install --name myenv --file spec-file.txt
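Because explicit spec files are plain text (one package URL per line), you can sanity-check that two machines really match by comparing package lines. A minimal sketch, assuming the spec format shown in the conda docs (the spec contents below are made-up examples):

```python
def spec_packages(lines):
    """Return the package lines from a `conda list --explicit` spec,
    skipping comments (#) and the @EXPLICIT marker."""
    return {ln.strip() for ln in lines
            if ln.strip() and not ln.startswith(("#", "@"))}

# Hypothetical spec contents from two machines
machine_a = ["# platform: linux-64", "@EXPLICIT",
             "https://repo.anaconda.com/pkgs/main/linux-64/numpy-1.18.1.tar.bz2"]
machine_b = ["@EXPLICIT",
             "https://repo.anaconda.com/pkgs/main/linux-64/numpy-1.18.1.tar.bz2"]

print(spec_packages(machine_a) == spec_packages(machine_b))  # → True
```

In practice you would read the two spec-file.txt files with open() and compare their package sets the same way.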



Folium for map displays

Sources:

https://www.kaggle.com/daveianhickey/how-to-folium-for-maps-heatmaps-time-data
https://nbviewer.jupyter.org/github/python-visualization/folium/blob/master/examples/Plugins.ipynb
https://alysivji.github.io/getting-started-with-folium.html



1. Create a function

import folium
from folium import plugins


def map_points(df, lat_col='latitude', lon_col='longitude', zoom_start=11,
               plot_points=False, pt_radius=15,
               draw_heatmap=False, heat_map_weights_col=None,
               heat_map_weights_normalize=True, heat_map_radius=15,
               label_col_name=None, use_existing_map=False, existing_map=None):
    """Creates a map given a dataframe of points. Can also produce a heatmap overlay

    Arg:
        df: dataframe containing points to maps
        lat_col: Column containing latitude (string)
        lon_col: Column containing longitude (string)
        zoom_start: Integer representing the initial zoom of the map
        plot_points: Add points to map (boolean)
        pt_radius: Size of each point
        draw_heatmap: Add heatmap to map (boolean)
        heat_map_weights_col: Column containing heatmap weights
        heat_map_weights_normalize: Normalize heatmap weights (boolean)
        heat_map_radius: Size of heatmap point

    Returns:
        folium map object
    """

    # center the map on the median of the points
    middle_lat = df[lat_col].median()
    middle_lon = df[lon_col].median()

    if use_existing_map:
        curr_map = existing_map
    else:
        curr_map = folium.Map(location=[middle_lat, middle_lon],
                              zoom_start=zoom_start)

    # add points to map
    if plot_points:
        for _, row in df.iterrows():
            folium.CircleMarker([row[lat_col], row[lon_col]],
                                radius=pt_radius,
                                popup=row[label_col_name]
                                #fill_color="#3db7e4", # divvy color
                               ).add_to(curr_map)

    # add heatmap
    if draw_heatmap:
        # convert to (n, 2) or (n, 3) matrix format
        if heat_map_weights_col is None:
            cols_to_pull = [lat_col, lon_col]
        else:
            # if we have to normalize
            if heat_map_weights_normalize:
                df[heat_map_weights_col] = \
                    df[heat_map_weights_col] / df[heat_map_weights_col].sum()

            cols_to_pull = [lat_col, lon_col, heat_map_weights_col]

        stations = df[cols_to_pull].to_numpy()
        curr_map.add_child(plugins.HeatMap(stations, radius=heat_map_radius))

    return curr_map



2. Create a dataframe with lat, lon, a popup label, and a heatmap weight, then call the function
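The frame passed to map_points just needs latitude, longitude, a label column, and optionally a weight column. A tiny stand-in for the disp frame used below (the values are invented):

```python
import pandas as pd

# Hypothetical stand-in for the real 'disp' dataframe; values are made up.
disp_demo = pd.DataFrame({
    'mstr_latitude':       [24.71, 21.49, 26.43],
    'mstr_longitude':      [46.68, 39.19, 50.10],
    'ticket_rate_in_peak': [0.12, 0.45, 0.30],   # popup label / heatmap weight
    'site_id':             ['S1', 'S2', 'S3'],
    'diff':                [1.0, 2.5, 0.7],      # alternative weight column
})
print(disp_demo.shape)  # → (3, 5)
```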




mp = map_points(disp[['mstr_latitude', 'mstr_longitude', 'ticket_rate_in_peak', 'site_id', 'diff']],
                lat_col='mstr_latitude', lon_col='mstr_longitude', zoom_start=6,
                plot_points=True, pt_radius=1,
                draw_heatmap=False, heat_map_weights_col='diff',
                heat_map_weights_normalize=False, heat_map_radius=15,
                label_col_name='ticket_rate_in_peak')

mp_1 = map_points(disp_heat.loc[:, ['mstr_latitude', 'mstr_longitude', 'ticket_rate_in_peak', 'site_id', 'diff']],
                  lat_col='mstr_latitude', lon_col='mstr_longitude', zoom_start=6,
                  plot_points=False, pt_radius=2,
                  draw_heatmap=True, heat_map_weights_col='ticket_rate_in_peak',
                  heat_map_weights_normalize=True, heat_map_radius=20,
                  label_col_name='ticket_rate_in_peak', use_existing_map=True, existing_map=mp)

plugins.Fullscreen(
    position='topright',
    title='Expand me',
    title_cancel='Exit me',
    force_separate_button=True
).add_to(mp_1)

mp_1.save('C:/Users/mohamed.ibrahim/Google Drive/box/TAWAL/05_data/predmain/01_code/arc/SCECO_outage_raw_may_june.html')

mp_1


Tuesday, May 5, 2020

Remote Desktop XRDP [Connect to Ubuntu from Windows]

Sources:

https://medium.com/@vivekteega/how-to-setup-an-xrdp-server-on-ubuntu-18-04-89f7e205bd4e


You need to install 2 things:
xrdp
a desktop environment

Step 1- Install xRDP

sudo apt-get update
sudo apt-get install xrdp 

Step 2- Install your preferred desktop environment

# XFCE
sudo apt-get install xfce4
#Optional stuff
sudo apt-get install xfce4-terminal
sudo apt-get install gnome-icon-theme-full tango-icon-theme
or
# MATE
sudo apt-get install mate-core mate-desktop-environment mate-notification-daemon

Step 3- Tell xRDP to use your environment

# XFCE
sudo sed -i.bak '/fi/a #xrdp multiple users configuration \n xfce-session \n' /etc/xrdp/startwm.sh
or
# MATE
sudo sed -i.bak '/fi/a #xrdp multiple users configuration \n mate-session \n' /etc/xrdp/startwm.sh

Step 4- Firewall permission

# allow just RDP through the local firewall
sudo ufw allow 3389/tcp
# restart xrdp
sudo /etc/init.d/xrdp restart



Step 5 - Add configuration

echo xfce4-session >~/.xsession





Monday, May 4, 2020

Git and DVC ultimate guide




0. Install Git and DVC
https://git-scm.com/downloads
https://dvc.org/


1. Initialize Repo
In the root of the folder enter

git init
dvc init
git commit -m "Initial commit"


2. Add remote repositories

DVC

DVC remote repository

dvc remote add name_of_repo /path/to/repo

Set this remote as the default for the project, to avoid naming it in every dvc push:

dvc remote default name_of_repo


Set the data cache to a directory on the remote storage to avoid a huge storage overhead in the local folder:

dvc cache dir /path/to/data_store


Git

Create a bare repository in the remote folder. A bare repository is a remote repository without a working tree; you do not commit to it directly. Typically that would be on GitHub/GitLab; here we are referring to the case where we want to host the remote repository ourselves on a server.

cd /path/to/repo/
git init --bare repo_origin.git

Go back to folder directory and add remote

cd /project/dir/
git remote add origin /path/to/repo/repo_origin.git
git commit -m "Configured remote"


3. Add data to version control

dvc add 02_data 03_imgs
git add .
git commit -m "Data versioning files added to git"
dvc push


4. Start working on a branch to develop a feature

git checkout -b v1


5. Commit changes

dvc add 02_data 03_imgs
git add .
git commit -m "Did some changes to code and data"

6. Push to repo

dvc push
git push origin v1:master  # Pushes the v1 branch to the remote's master


7. Merge branch to master after completing a feature
git checkout master
git merge v1



Bonus:
To clear some of the data in cache
dvc gc --workspace






Full article


When working on a production machine learning project you probably deal with a ton of data and several models. To keep track of which models were trained with which data, you should use a system to version the data, similar to versioning and tracking your code. One way to solve this problem is dvc (Data Version Control, https://dvc.org/), which approaches data versioning in a similar way to Git.
To illustrate the use of dvc in a machine learning context, we assume that our data is divided into train, test and validation folders by default, with the amount of data increasing over time, either through an active learning cycle or by manually adding new data. An example could be the following structure, where the labels are omitted for simplicity:
├── train
│    ├── image1.jpg
│    ├── image2.jpg
│    └── image3.jpg
├── val
│    └── image4.jpg
└── test
     └── image5.jpg
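The example layout above can be recreated in a scratch directory so the walkthrough can be tried without real data; a minimal sketch (the file names are just the article's placeholders):

```python
import tempfile
from pathlib import Path

# Build the article's example train/val/test layout with empty stand-in files.
base = Path(tempfile.mkdtemp())
layout = {
    "train": ["image1.jpg", "image2.jpg", "image3.jpg"],
    "val":   ["image4.jpg"],
    "test":  ["image5.jpg"],
}
for folder, names in layout.items():
    (base / folder).mkdir()
    for name in names:
        (base / folder / name).touch()

print(sorted(p.name for p in base.rglob("*.jpg")))
# → ['image1.jpg', 'image2.jpg', 'image3.jpg', 'image4.jpg', 'image5.jpg']
```

Running git init / dvc init inside such a scratch directory is a safe way to experiment with the commands below.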
Normally, a minimal versioning system should have the following two capabilities:
  • Tag a new set of data with a new version, e.g. vx.y.z
  • Return to old data versions or switch between different data versions very easily
Among other features, dvc is capable of doing these tasks. For this purpose it works closely together with Git. First you need to install dvc, which can be done using pip:
pip install dvc 
To start the versioning process you have to create a git repository in the base folder of your data and initialize dvc afterwards through
git init
dvc init
Through the init command, dvc has now created a .dvc folder containing its cache, which stores the differences between data versions, and the config file, which stores meta information.
In case you are wondering how Git fits into this concept: Git's task here is not to version the data itself but to version the dvc files, which store a version's meta information, such as the location of the files corresponding to that version and which files of your data belong to the current data version.
In order for Git to ignore the data itself, dvc also automatically writes to the .gitignore file. To commit the dvc config file and the .gitignore file we need to do an initial commit:
git commit -m "Initial commit"
Each data version is associated with its own .dvc files, which in turn are associated with one commit or one head of Git. The .dvc files define and track the data for a given version, while the .dvc files themselves are tracked by Git. For me, a good way to associate a new data version with a Git head is to make a new branch for each data version. To do this, before we define our first version we create a new branch named after the version and check it out:
git checkout -b v0.0.1
Now we can define our first version by telling dvc which data should be tracked, which in our case are the train, val and test folders. This can be done with the dvc add command:
dvc add train test val
After that we see new .dvc files for each folder, like train.dvc, inside our base folder. The folders themselves have been added to the .gitignore so that Git doesn't track the data itself, which in our case is the task of dvc. In order to track the new .dvc files with Git we follow the standard Git procedure for a commit:
git add .
git commit -m "Data versioning files added to Git"
Now we have created the first version of our data: the .dvc files store which data belongs to the version, and the .dvc files themselves are referenced by the current commit. Please note that you can also connect Git to a remote repository to save and version the .dvc files remotely. The data in this case stays in the current folder and is not stored remotely (this can also be changed using dvc push and pull).
We now have associated one state of our data with a version, but of course you don't need data versioning for one fixed data set. Therefore we now assume that two new images (image6.jpg and image7.jpg) are added to the train and test folders, so that the structure now looks like this:
├── train
│    ├── image1.jpg
│    ├── image2.jpg
│    ├── image3.jpg
│    └── image6.jpg
├── val
│    └── image4.jpg
└── test
     ├── image5.jpg
     └── image7.jpg
In order to create a new data version we repeat the previous steps. We therefore create a new branch corresponding to the new data version
git checkout -b v0.0.2
As we already know, a new data version is always associated with its own .dvc files, which store the version's meta information. In order to update the .dvc files we need to tell dvc to track the train and test folders again, as there is new data in them:
dvc add train test
The train.dvc and test.dvc files changed, and dvc now tracks which files belong to the current version. In order to track the new .dvc files inside the Git branch we have to do a commit:
git add .
git commit -m "Data versioning files added to Git"
Now comes the cool part. When checking your Git branches you see two version branches (plus master), where each corresponds to one data version:
master
v0.0.1
* v0.0.2
You are now able to go back to an older data version and have your data directory updated in place to recreate it. To get back to the previous version we need to do two things. First we need to check out the head corresponding to that data version, which in this case is the branch v0.0.1:
git checkout v0.0.1
At this head the .dvc files are different compared to v0.0.2, but our current data directory still looks the same and the data inside it still corresponds to v0.0.2. This is because dvc has not yet aligned the data directory with its .dvc files. To align your data directory with the correct data version, which again is persisted in the .dvc files, we need to run the dvc checkout command:
dvc checkout
This command restores the old data version (in this case v0.0.1) using its cache. When you now look into your data repository you see the following structure again:
├── train
│    ├── image1.jpg
│    ├── image2.jpg
│    └── image3.jpg
├── val
│    └── image4.jpg
└── test
     └── image5.jpg
The files image6.jpg and image7.jpg were removed from the data directories and stored into the cache of dvc. You can now work with the old data version just as usual with the three folders.
This procedure also works for data versions containing far more data than is currently present in the data folder, as dvc stores differences of arbitrary size between versions in its cache and can therefore recreate older or newer states of the data directories with its checkout command. The checkout is of course also possible in the other direction: you could check out the Git branch v0.0.2 and perform a dvc checkout to set the data directory to the state of version v0.0.2.
Besides the init, add and checkout commands, dvc has many more features to make the machine learning/big data workflow easier. For example, data versions can be shared between multiple machines using a remote bucket like Amazon's S3 and interacting with the bucket using dvc push and pull (for details see https://dvc.org/).
I hope this article helps you organize the data in a machine learning project and keep a better overview.
For more blog posts about Machine Learning, Data Science and Statistics checkout www.matthias-bitzer.de




Get weather data

Source:
https://www.freecodecamp.org/news/obtain-historical-weather-forecast-data-in-csv-format-using-python/


1. Sign up here
https://www.worldweatheronline.com/developer/signup.aspx


2. Get API_KEY



3. !pip install wwo_hist


4. Run for some dates and cities


from wwo_hist import retrieve_hist_data


frequency=24
start_date = '01-APR-2019'
end_date = '15-APR-2020'
api_key = '914b5f406fb34c4da0b131207200405'
#location_list = ['mecca','tabuk','al madinah','riyadh','asir','dammam']
location_list = ['al-madinah']

hist_weather_data = retrieve_hist_data(api_key,
                                location_list,
                                start_date,
                                end_date,
                                frequency,
                                location_label = False,
                                export_csv = True,
                                store_df = True)
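With export_csv=True the package writes one CSV per location. A sketch of typical post-processing, using a small made-up frame in place of the real export (the date_time and tempC column names mirror the wwo export but should be treated as assumptions):

```python
import pandas as pd

# Hypothetical stand-in for a wwo_hist export; values are invented.
weather = pd.DataFrame({
    "date_time": pd.to_datetime(["2019-04-01", "2019-04-02", "2019-05-01"]),
    "tempC": [30, 32, 38],
})

# Monthly mean temperature, resampled on month-start boundaries.
monthly_mean = weather.set_index("date_time")["tempC"].resample("MS").mean()
print(monthly_mean)
```

On the real data you would replace the stand-in frame with pd.read_csv on the exported file.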

Saturday, May 2, 2020

Sublime text with flake8 linter and black autoformatter



First install both libraries using pip from the command line (not inside Anaconda!):

pip install flake8
pip install black


Then in Sublime text, Press Ctrl + Shift + P, then open install packages

1. Install Sublack
2. Install SublimeLinter-Flake8

Sometimes flake8 warns about formatting that black itself produces; you can install this plugin to fix the conflict:

https://github.com/kaste/SublimeLinter-addon-black-for-flake

Usage of sublack

  • Run Black on the current file:
    Press Ctrl-Alt-B to format the entire file. You can also Ctrl-Shift-P (Mac: Cmd-Shift-P) and select Sublack: Format file.
  • Run Black with --diff:
    Pressing Ctrl-Alt-Shift-B will show the diff in a new tab. You can also Ctrl-Shift-P (Mac: Cmd-Shift-P) and select Sublack: Diff file.
  • Toggle Black on save for current view :
    Press Ctrl-Shift-P (Mac: Cmd-Shift-P) and select Sublack: Toggle black on save for current view.
  • Run Black Format All:
    Press Ctrl-Shift-P (Mac: Cmd-Shift-P) and select Sublack: Format All. Runs black against each root folder in the standard way (without taking sublack options and configuration into account). Same as running black . in the folder.
  • Start Blackd Server :
    Press Ctrl-Shift-P (Mac: Cmd-Shift-P) and select Sublack: Start BlackdServer.
  • Stop Blackd Server :
    Press Ctrl-Shift-P (Mac: Cmd-Shift-P) and select Sublack: Stop BlackdServer.

Blackd Mode

Sublack supports blackd. If the option black_use_blackd is set to true, Sublack will use blackd (and not black) according to the 'host' and 'port' configuration.
You can run blackd from SublimeText manually via Start Blackd Server command or automatically at SublimeText start via setting black_blackd_autostart to true.
Blackd server started via SublimeText can be stopped manually via the Stop Blackd Server command or automatically at sublime's exit.
Unlike "standalone" blackd, using sublack with blackd will continue to take care of the pyproject file.
Using standard mode or blackd mode in sublack should always give the same result... or it's a bug :-)
Blackd is faster than Black.
Diff is always run with black.

Pre-commit integration

You can choose to run Black via pre-commit by setting black_use_precommit to true. Sublack settings will be ignored.

Settings

Sublack will always look for settings in the following order:
  • First in a pyproject.toml file
  • Second in the project file: first with sublack prefix then in a subsetting (see Project settings).
  • Then in Users global settings
  • Finally in Sublack's default settings

Global settings

Preferences -> Package Settings -> sublack -> settings :
Black specifics options
  • black_line_length:
    Set a custom line length used by Black. Default = null, which keeps black's default.
  • black_fast:
    Black fast mode. default is false.
  • black_skip_string_normalization:
    Don't normalize string quotes or prefixes. Default = false.
  • black_py36 [Deprecated]:
    Force use of Python 3.6-only syntax. The default is Black's default.
  • black_target_version:
    Python versions that should be supported by Black's output. Enter it as a list, e.g. ["py37"].
Sublack specifics options
  • black_command:
    Set custom location. Default = "black".
  • black_on_save:
    Black is always run before saving the file. Default = false.
  • black_log:
    Show non error messages in console. Default = info.
  • black_default_encoding:
    Should not be changed. Only needed on some OSX platforms.
  • black_use_blackd:
    Use blackd instead of black. Default = false.
  • black_blackd_server_host:
    default = "localhost",
  • black_blackd_port:
    default = "45484"
  • black_blackd_autostart:
    Automatically run blackd in the background when Sublime starts. Default is false.
  • black_use_precommit:
    run black via pre-commit hook.
  • black_confirm_formatall:
    Popup confirmation dialog before format_all command. default = true.

Project settings

Just add sublack as prefix (recommended):
{
    "settings": {
        "sublack.black_on_save": true
    }
}
A sublack subsettings is still possible:

{
    "settings": {
        "sublack": {
            "black_on_save": true
        }
    }
}

Loud fan of desktop

Upon restart, the desktop fan got loud again. I cleaned the dust out of the desktop, but it was still loud (though quieter than before) ...