Sunday, April 28, 2019

Spark: single machine vs cluster




Spark can run in one of two modes:

Standalone: Spark itself manages the cluster of computers, using its built-in cluster manager. This is the choice if we start from scratch.
Cluster: Spark is installed on top of an existing cluster manager (YARN or Mesos).
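
As a sketch, the mode is selected through the master URL when building a SparkSession; the host name and port below are placeholders, not real endpoints:

```python
from pyspark.sql import SparkSession

# Standalone: point at Spark's own master process
# ("master-host:7077" is a placeholder for an actual standalone master).
spark = (SparkSession.builder
         .master("spark://master-host:7077")
         .appName("standalone-example")
         .getOrCreate())

# On an existing cluster manager only the master URL changes:
#   .master("yarn")                # Hadoop YARN
#   .master("mesos://host:5050")   # Apache Mesos (host:port is a placeholder)
```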


Parallel processing is achieved by Spark running multiple processing threads.
On a single machine (local mode), each core is treated as one such thread.
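
For example, in local mode the thread count is encoded directly in the master URL (a small sketch; the application name is arbitrary):

```python
from pyspark.sql import SparkSession

# local[*] starts one worker thread per available core;
# local[4] would pin the parallelism to exactly 4 threads.
spark = (SparkSession.builder
         .master("local[*]")
         .appName("local-threads-example")
         .getOrCreate())

print(spark.sparkContext.defaultParallelism)  # typically the core count
spark.stop()
```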



Benchmarking pandas and PySpark on a single machine

https://databricks.com/blog/2018/05/03/benchmarking-apache-spark-on-a-single-node-machine.html
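
A minimal sketch of such a comparison, not the benchmark from the linked post (the data size and the group-by aggregation here are invented for illustration):

```python
import time
import numpy as np
import pandas as pd
from pyspark.sql import SparkSession

# Synthetic data: 1 million rows, same grouped aggregation in both engines.
n = 1_000_000
pdf = pd.DataFrame({"key": np.random.randint(0, 100, n),
                    "val": np.random.rand(n)})

t0 = time.time()
pdf.groupby("key")["val"].mean()
print(f"pandas:  {time.time() - t0:.2f}s")

spark = SparkSession.builder.master("local[*]").getOrCreate()
# Conversion from pandas is itself costly and is excluded from the timing.
sdf = spark.createDataFrame(pdf)

t0 = time.time()
sdf.groupBy("key").mean("val").collect()  # includes Spark planning/execution overhead
print(f"pyspark: {time.time() - t0:.2f}s")
spark.stop()
```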

