Scalability¶
Scaling a Presto Cluster¶
The purpose of the scaling a Presto cluster is to handle an increased workload for certain scenarios. Usually you need to scale up the Presto cluster because:
- Number of users increases
 - Number of queries increase
 - The volume of data increases
 
Some situations also indicate a need of scaling:
- Single point of failure
 - Bad worker state: out-of-memory errors occurs and corresponding queries fail, thus wasting resources.
 - Bad queiries: A SQL query takes too long to execute and starves the other queries for resources.
 
Vertical Scaling¶
One of the scaling approach is to increase the resources in a single server, i.e. CPU, memory, or storage size.
| Query type | Resources to increase | 
|---|---|
| Store large amount of data in memory | Memory | 
| Complex calculation/computation | CPU | 
- Memory:
- Need to allocate more memory to a Presto node. The 
-Xmxproperty in thejvm.configconfiguration. query.max-memory-per-nodein theconfig.properties
 - Need to allocate more memory to a Presto node. The 
 - CPU:
task.concurrencyin theconfig.properties: define the number of worker thread per node.
 
Availability¶
- Use Presto Router to manage the access to multiple Presto clusters
 - Set Up multiple coordinators - Disaggregated Coordinator
 
Horizontal Scaling¶
Deploy more workers.
- For instance base deployment, create or allocate a new instance, deploy Presto, add the worker node to the Presto cluster
 - For Kubernetes base deployment, increase the replicate number of worker deployment.
 - For cloud base deployment, check the service document and scale up.
 
Presto on Spark¶
- Great for large batch ETL workload because of the paralelism.
 - A robust and scalable execution platform
 - Get benefits from both Apache Spark and Presto