There are three ways to deal with large data files:
2. Read the data one chunk at a time and train incrementally using H2O's checkpoint feature. With checkpointing, an H2O GBM gains extra trees on each pass and a deep learning model gains extra epochs.
Reading chunk by chunk: https://www.youtube.com/watch?v=Z5rMrI1e4kM
H2O checkpoint: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/checkpoint.html
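As a minimal sketch of the chunked approach (assuming pandas; the H2O calls are left as comments since they need a running H2O cluster, and the column names here are made up):

```python
import io
import pandas as pd

# A small in-memory CSV standing in for a file too large to load at once.
csv_data = io.StringIO(
    "x,y\n" + "\n".join(f"{i},{i % 2}" for i in range(10))
)

n_rows = 0
for chunk in pd.read_csv(csv_data, chunksize=4):
    # In the H2O workflow, each chunk would be converted with
    # h2o.H2OFrame(chunk) and used to continue training a model built
    # with checkpoint=<previous model id>, adding trees (GBM) or
    # epochs (deep learning) on every pass.
    n_rows += len(chunk)

print(n_rows)  # total rows seen across all chunks
```

Each iteration only holds one chunk in memory, so the file size is bounded by the chunk size rather than the full dataset.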
3. Vowpal Wabbit, which offers online learning.
Installation on an Ubuntu machine, following the GitHub wiki: https://github.com/JohnLangford/vowpal_wabbit/wiki/Tutorial
Commands:
git clone git://github.com/JohnLangford/vowpal_wabbit.git
OR
git clone https://github.com/JohnLangford/vowpal_wabbit.git
cd vowpal_wabbit
sudo apt-get install libboost-program-options-dev libboost-python-dev
sudo apt-get install zlib1g-dev
sudo apt-get install libboost1.48-all-dev
make vw
make library_example
make test
VW requires its own plain-text input format. To convert a CSV to that format: https://www.youtube.com/watch?v=ee6T9ytzjyU&t=1s
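A minimal sketch of that conversion, using only the standard library. VW lines look like `<label> | <feature>:<value> ...`; the `target`, `age`, and `income` column names below are hypothetical:

```python
import csv
import io

def csv_row_to_vw(row, label_col):
    """Convert one CSV row (a dict) to a line in VW's input format:
    <label> | <name>:<value> <name>:<value> ..."""
    label = row[label_col]
    feats = " ".join(
        f"{k}:{v}" for k, v in row.items() if k != label_col
    )
    return f"{label} | {feats}"

# Hypothetical CSV with a 'target' label column (-1/1 for VW's
# logistic/hinge losses).
csv_text = "target,age,income\n1,34,52000\n-1,27,31000\n"
reader = csv.DictReader(io.StringIO(csv_text))
vw_lines = [csv_row_to_vw(r, "target") for r in reader]
print(vw_lines[0])  # 1 | age:34 income:52000
```

Once the file is converted, training is a single command such as `vw data.vw -f model.vw`, which streams the file and learns online in one pass.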
Useful links:
https://www.auduno.com/2014/08/29/some-nice-ml-libraries/
http://www.zinkov.com/posts/2013-08-13-vowpal-tutorial/