Deploying Presto with Docker Compose
Prerequisites
- Git
- Docker
- Docker Compose
1. Set Up
1.1 Pull Docker images
Pull the Presto images first to allow time for downloading.
For the Presto Java coordinator and worker image:
docker pull prestodb/presto:latest
Download the right Presto C++ worker image based on your system architecture.
- For aarch64:
docker pull public.ecr.aws/oss-presto/presto-native:0.289-ubuntu-arm64
- For x64, pick the latest image tag from ECR, or use the following tag:
docker pull public.ecr.aws/oss-presto/presto-native:0.291-20250127081606-85259c3
1.2 Clone prestorials repo
Clone the prestodb/prestorials repository which contains the Docker Compose files used for deploying Presto.
git clone https://github.com/prestodb/prestorials.git
Then change directory into the cloned repository.
cd prestorials
2. Using Presto CLI
To run queries against a Presto cluster, confirm that the coordinator has started and run Presto CLI from the coordinator container:
docker exec -it coordinator sh -c "/opt/presto-cli <ARGS>"
Arguments to presto-cli can be appended in place of <ARGS> in the above command. Verify that a given catalog and
schema exist before trying to access their tables. For instance, to use the hive catalog and tpcds schema:
SHOW SCHEMAS IN hive;
USE hive.tpcds;
SHOW TABLES;
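As an example of passing arguments, a single query can be run non-interactively with the standard presto-cli options --catalog, --schema, and --execute (using the coordinator container name from above):

```shell
# Run one query and exit: list the tables in the hive.tpcds schema.
docker exec -it coordinator sh -c \
  "/opt/presto-cli --catalog hive --schema tpcds --execute 'SHOW TABLES;'"
```

Omitting --execute drops you into the interactive CLI prompt instead.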
3. Set Up TPC-H, TPC-DS Data
3.1 Generate data with Presto Java connectors
We will use Presto Java's TPC-H and TPC-DS connectors to generate TPC-H and TPC-DS tables with a scale factor of 1.
Navigate to docker-compose/local-fs. You can spin up a single-node Presto Java cluster, where the coordinator also
acts as a worker, using the Docker Compose file docker-compose-single-node.yaml. If your system architecture is x64,
change the platform to linux/amd64 in docker-compose-single-node.yaml. Certain TPC-H and TPC-DS tables contain a
large number of rows, so it is recommended to increase the container memory limit in docker-compose-single-node.yaml
from the default of 2GB to 10GB:
deploy:
  resources:
    limits:
      memory: 10G
Spin up a single-node Presto Java cluster using:
docker compose -v -f docker-compose-single-node.yaml up
Download the SQL files createHiveTpchTables.sql,
createHiveTpcdsTables.sql,
createIcebergTpchTables.sql, and
createIcebergTpcdsTables.sql, and copy them into the Presto server
container:
docker cp ./createHiveTpchTables.sql <container_id>:/.
docker cp ./createHiveTpcdsTables.sql <container_id>:/.
docker cp ./createIcebergTpchTables.sql <container_id>:/.
docker cp ./createIcebergTpcdsTables.sql <container_id>:/.
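The <container_id> placeholder can be looked up with docker ps; for the single-node cluster it is the ID of the coordinator container (the --filter and --format flags below are standard docker ps options):

```shell
# Print the ID of the running coordinator container.
docker ps --filter "name=coordinator" --format "{{.ID}}"
```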
Using the CTAS queries from the files createIcebergTpchTables.sql and
createIcebergTpcdsTables.sql, we will create the tpch and tpcds schemas
in the iceberg catalog and populate them with tables built from the data generated by the TPC-H and TPC-DS
connectors. The data will be written to /home/iceberg_data in parquet file format and iceberg table format:
docker exec -it coordinator sh -c "/opt/presto-cli -f ./createIcebergTpchTables.sql"
docker exec -it coordinator sh -c "/opt/presto-cli -f ./createIcebergTpcdsTables.sql"
Similarly, to generate TPC-H and TPC-DS data in parquet file format and hive table format, use the CTAS queries from
files createHiveTpchTables.sql and
createHiveTpcdsTables.sql. The data will be generated in /home/hive_data:
docker exec -it coordinator sh -c "/opt/presto-cli -f ./createHiveTpchTables.sql"
docker exec -it coordinator sh -c "/opt/presto-cli -f ./createHiveTpcdsTables.sql"
The TPC-H and TPC-DS tables can now be queried through either the hive or the iceberg catalog, using the tpch or tpcds schema.
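For reference, each of these .sql files consists of CTAS statements of roughly the following shape. This is a hypothetical single-table example, not taken from the files themselves, and it assumes the target schema (which the .sql files create first) already exists:

```shell
# Hypothetical CTAS mirroring what the downloaded .sql files do for every
# table: read from the built-in tpch connector at scale factor 1 and write a
# copy in hive table format.
docker exec -it coordinator sh -c \
  '/opt/presto-cli --execute "CREATE TABLE hive.tpch.nation AS SELECT * FROM tpch.sf1.nation"'
```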
3.2 Prepare data
Copy the data from the presto_server Docker container into the prestorials/ directory:
cd prestorials/
mkdir data/
docker cp <container_id>:/home/hive_data ./data/.
docker cp <container_id>:/home/iceberg_data ./data/.
Alternatively, the TPC-H and TPC-DS data in parquet file format and hive table format can be downloaded from here.
If you downloaded data.tar from this link, un-tar it and ensure the resulting data directory is placed in
prestorials/. The data.tar file can then be deleted, as it is no longer needed.
Copy the data directory from prestorials/ into docker-compose/local-fs/ and into docker-compose-native/local-fs/:
cp -r data docker-compose/local-fs/.
cp -r data docker-compose-native/local-fs/.
4. Deploying Presto
Now that setup is complete, we can run the docker compose file to deploy a Presto cluster and use Presto CLI to run queries.
4.1 Deploying Presto C++
Change into the docker-compose-native/local-fs directory in prestorials and run the docker compose
command to start the Presto C++ cluster. The Docker Compose file is specified with the -f flag; based on
whether your system architecture is aarch64 or x64, use either docker-compose-arm64.yaml
or docker-compose-amd64.yaml respectively:
- For aarch64:
docker compose -v -f docker-compose-arm64.yaml up
- For x64:
docker compose -v -f docker-compose-amd64.yaml up
You should now see the logs of the Presto coordinator and workers starting up. The cluster is ready once the Presto coordinator's discovery service acknowledges requests from both Presto C++ workers; wait for log lines like these:
coordinator | 2024-07-25T23:48:15.077Z INFO main com.facebook.presto.server.PrestoServer ======== SERVER STARTED ========
worker_2 | I0725 23:48:39.584002 8 PeriodicServiceInventoryManager.cpp:118] Announcement succeeded: HTTP 202. State: active.
worker_1 | I0725 23:48:41.484305 8 PeriodicServiceInventoryManager.cpp:118] Announcement succeeded: HTTP 202. State: active.
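Cluster state can also be checked through the coordinator's REST API. This sketch assumes the coordinator publishes HTTP on localhost port 8080, the common default in these compose files:

```shell
# /v1/info returns coordinator build and uptime information as JSON;
# /v1/info/state returns the server state, e.g. "ACTIVE".
curl -s http://localhost:8080/v1/info
curl -s http://localhost:8080/v1/info/state
```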
4.2 Deploying Presto Java
If you just finished deploying a Presto C++ cluster and it is still running, skip to the next section on running
benchmarks with pbench. Return to this step after running pbench and obtaining TPC-DS benchmark times for Presto C++.
Deploying Presto Java is very similar to deploying Presto C++; this time we use the Docker Compose file in the
docker-compose/local-fs directory of prestorials. Once again, based on whether your system architecture is aarch64
or x64, use either docker-compose-arm64.yaml or docker-compose-amd64.yaml respectively:
- For aarch64:
docker compose -v -f docker-compose-arm64.yaml up
- For x64:
docker compose -v -f docker-compose-amd64.yaml up
4.3 Iceberg schema evolution and time travel
We can query the TPC-H and TPC-DS iceberg tables using Presto's iceberg connector. First set the catalog to iceberg
and the schema to either tpcds or tpch. The iceberg connector lets you modify a table's schema in place; try it out
with these queries. The iceberg connector also supports time travel using table snapshots; try it out with these
queries.
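As a concrete sketch (the table and column names here are illustrative, not taken from the linked query files): schema evolution is an in-place ALTER TABLE, and every write to an iceberg table creates a snapshot that can be inspected through the table's $snapshots metadata table:

```shell
# Iceberg schema evolution: add a column in place; existing data files are
# untouched and the new column reads as NULL for pre-existing rows.
docker exec -it coordinator sh -c \
  '/opt/presto-cli --execute "ALTER TABLE iceberg.tpch.nation ADD COLUMN note VARCHAR"'
# List the table's snapshots; the snapshot IDs are what time travel queries use.
docker exec -it coordinator sh -c \
  '/opt/presto-cli --execute "SELECT snapshot_id, committed_at FROM iceberg.tpch.\"nation\$snapshots\""'
```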