Sunday, April 28, 2019

Training on large data

There are 3 ways to deal with large data files:


1. AWS: Use a large machine to process data stored on S3. Only H2O models can be saved to binary files and uploaded back to the S3 bucket; a minimal sketch follows below.
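As a rough illustration of this workflow (my own sketch, not from the post): train on data read from S3, save the model as a binary file, and push it back to S3. It assumes h2o and boto3 are installed and AWS credentials are configured; the bucket, paths, and column name are placeholders.

# Minimal sketch; bucket, paths, and "target" column are hypothetical.
import h2o
import boto3
from h2o.estimators import H2OGradientBoostingEstimator

h2o.init()
train = h2o.import_file("s3://my-bucket/train.csv")  # read data directly from S3
model = H2OGradientBoostingEstimator(ntrees=50)
model.train(y="target", training_frame=train)

# h2o.save_model writes the model as a binary file and returns its path
local_path = h2o.save_model(model=model, path="/tmp/models", force=True)
boto3.client("s3").upload_file(local_path, "my-bucket", "models/gbm_model")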


2. Read chunks of data one at a time and train incrementally using the checkpoint feature of H2O. With H2O, a checkpointed GBM must add at least one extra tree every iteration, and Deep Learning must add extra epochs; see the sketch below.
Reading chunk by chunk: https://www.youtube.com/watch?v=Z5rMrI1e4kM
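A minimal sketch of that chunked workflow (file and column names are placeholders): each round reads the next chunk and grows the GBM by ten more trees, since a checkpointed model must be given a larger ntrees to have anything new to build. For Deep Learning the same pattern applies with a growing epochs value.

import h2o
from h2o.estimators import H2OGradientBoostingEstimator

h2o.init()
model_id = None
ntrees = 0
for path in ["chunk1.csv", "chunk2.csv", "chunk3.csv"]:
    chunk = h2o.import_file(path)
    ntrees += 10  # checkpointed GBMs need extra trees each round
    gbm = H2OGradientBoostingEstimator(ntrees=ntrees, checkpoint=model_id)
    gbm.train(y="target", training_frame=chunk)
    model_id = gbm.model_id  # continue from this model on the next chunk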


3. Vowpal Wabbit, which offers online learning.
Installation on an Ubuntu machine, as described on GitHub: https://github.com/JohnLangford/vowpal_wabbit/wiki/Tutorial
Commands:
git clone git://github.com/JohnLangford/vowpal_wabbit.git
OR
git clone https://github.com/JohnLangford/vowpal_wabbit.git
cd vowpal_wabbit
sudo apt-get install libboost-program-options-dev libboost-python-dev
sudo apt-get install zlib1g-dev
sudo apt-get install libboost1.48-all-dev


make vw
make library_example
make test
VW requires input files in its own special format. To convert a CSV file to that format: https://www.youtube.com/watch?v=ee6T9ytzjyU&t=1s (a small converter sketch follows below).
https://www.auduno.com/2014/08/29/some-nice-ml-libraries/
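For reference, each line in the VW format looks like "label |namespace feature:value ...". Below is a minimal converter sketch (my own, not the video's script); it assumes the first CSV column holds a numeric label and the remaining columns hold numeric features.

import csv

with open("data.csv") as src, open("data.vw", "w") as dst:
    reader = csv.reader(src)
    header = next(reader)
    for row in reader:
        label, values = row[0], row[1:]
        feats = " ".join(f"{n}:{v}" for n, v in zip(header[1:], values))
        dst.write(f"{label} |f {feats}\n")  # e.g. "1 |f height:1.5 width:2.0"

The resulting file can then be trained online with something like: vw data.vw -f model.vw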


Useful links

http://www.zinkov.com/posts/2013-08-13-vowpal-tutorial/
