Spark can run in one of two modes:
Standalone: Spark itself manages the cluster of machines. This is the choice if we start from scratch.
Cluster: Spark runs on top of an existing cluster manager (YARN or Mesos).
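As a rough sketch of how the mode is selected in practice, the master URL given when building a session determines which cluster manager Spark talks to. The standalone host and port below are hypothetical placeholders:

```python
from pyspark.sql import SparkSession

# Standalone: connect to Spark's own master process
# (host and port are placeholders for illustration)
spark = (
    SparkSession.builder
    .master("spark://master-host:7077")
    .appName("standalone-example")
    .getOrCreate()
)

# On an existing YARN cluster, the master URL is simply "yarn":
#   SparkSession.builder.master("yarn") ...
```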
Parallel processing is achieved by Spark running multiple processing threads. On a single machine, each core is regarded as one thread.
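A minimal sketch of running Spark locally with one thread per core; "local[*]" is Spark's standard master URL for this, and "local[4]" would fix the thread count at four:

```python
from pyspark.sql import SparkSession

# "local[*]" starts one worker thread per available core
spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("local-threads")
    .getOrCreate()
)

# defaultParallelism typically equals the number of cores
print(spark.sparkContext.defaultParallelism)
spark.stop()
```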
Benchmarking pandas and PySpark on a single machine: https://databricks.com/blog/