Deploying Presto with Docker Compose¶
Prerequisites¶
- Git
- Docker
- Docker Compose
1. Set Up¶
1.1 Pull docker images¶
Pull the Presto images first to allow time for downloading.
For the Presto Java coordinator and worker image:
docker pull prestodb/presto:latest
Download the right Presto C++ worker image based on your system architecture.
- For aarch64:
docker pull public.ecr.aws/oss-presto/presto-native:0.289-ubuntu-arm64
- For x64, pick the latest image tag from ECR, or use the following tag:
docker pull public.ecr.aws/oss-presto/presto-native:0.291-20250127081606-85259c3
1.2 Clone prestorials repo¶
Clone the prestodb/prestorials repository, which contains the Docker Compose files used for deploying Presto.
git clone https://github.com/prestodb/prestorials.git
Then change directory into the cloned repository.
cd prestorials
2. Using Presto CLI¶
To run queries against a Presto cluster, confirm that the coordinator has started and run Presto CLI from the coordinator container:
docker exec -it coordinator sh -c "/opt/presto-cli <ARGS>"
Arguments to presto-cli can be appended in place of <ARGS> in the above command. Verify that a given catalog and schema exist before trying to access their tables. For example, to use the hive catalog and tpcds schema:
SHOW SCHEMAS IN hive;
USE hive.tpcds;
SHOW TABLES;
3. Set Up TPC-H, TPC-DS Data¶
3.1 Generate data with Presto Java connectors¶
We will be using Presto Java's TPC-H and TPC-DS connectors to generate TPC-H and TPC-DS tables with a scale factor of 1.
Navigate to docker-compose/local-fs. You can spin up a single-node Presto Java cluster, where the coordinator also acts as a worker, using the docker compose file docker-compose-single-node.yaml. If your system architecture is x64, modify the platform to linux/amd64 in docker-compose-single-node.yaml. Certain TPC-H and TPC-DS tables contain a large number of rows, so it is recommended to increase the container memory limit in docker-compose-single-node.yaml from the default value of 2GB to 10GB:
deploy:
resources:
limits:
memory: 10G
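For reference, on x64 systems the relevant service definition in docker-compose-single-node.yaml would be adjusted along these lines. This is a minimal sketch: the actual service name and surrounding settings in the file may differ.

```yaml
services:
  presto-coordinator:
    image: prestodb/presto:latest
    platform: linux/amd64   # changed for x64 hosts
    deploy:
      resources:
        limits:
          memory: 10G       # increased from the default 2G
```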
Spin up a single-node Presto Java cluster using:
docker compose -v -f docker-compose-single-node.yaml up
Download the SQL files createHiveTpchTables.sql, createHiveTpcdsTables.sql, createIcebergTpchTables.sql, and createIcebergTpcdsTables.sql, and copy them into the Presto server container:
docker cp ./createHiveTpchTables.sql <container_id>:/.
docker cp ./createHiveTpcdsTables.sql <container_id>:/.
docker cp ./createIcebergTpchTables.sql <container_id>:/.
docker cp ./createIcebergTpcdsTables.sql <container_id>:/.
Using the CTAS queries from the files createIcebergTpchTables.sql and createIcebergTpcdsTables.sql, we will create the tpch and tpcds schemas in the iceberg catalog and add tables in these schemas using the data generated by the TPC-H and TPC-DS connectors. The data will be generated in /home/iceberg_data in parquet file format and iceberg table format:
docker exec -it coordinator sh -c "/opt/presto-cli -f ./createIcebergTpchTables.sql"
docker exec -it coordinator sh -c "/opt/presto-cli -f ./createIcebergTpcdsTables.sql"
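The CTAS statements in these files follow the pattern below. This is an illustrative sketch, not the exact contents of the files: it assumes the tpch connector's sf1 schema as the source and the orders table as an example.

```sql
-- Create an Iceberg table in parquet format from the data generated by
-- the TPC-H connector at scale factor 1
CREATE TABLE iceberg.tpch.orders
WITH (format = 'PARQUET')
AS SELECT * FROM tpch.sf1.orders;
```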
Similarly, to generate TPC-H and TPC-DS data in parquet file format and hive table format, use the CTAS queries from the files createHiveTpchTables.sql and createHiveTpcdsTables.sql. The data will be generated in /home/hive_data:
docker exec -it coordinator sh -c "/opt/presto-cli -f ./createHiveTpchTables.sql"
docker exec -it coordinator sh -c "/opt/presto-cli -f ./createHiveTpcdsTables.sql"
The TPC-H and TPC-DS tables can now be queried with either the hive or iceberg catalog, using the tpch or tpcds schema.
3.2 Prepare data¶
Copy the data into the directory prestorials/ from the presto_server docker container:
cd prestorials/
mkdir data/
docker cp <container_id>:/home/hive_data ./data/.
docker cp <container_id>:/home/iceberg_data ./data/.
Alternatively, the TPC-H and TPC-DS data with parquet file format and hive table format can be downloaded from here. If you downloaded data.tar from this link, un-tar it and ensure the data directory is present in prestorials/. The data.tar file can then be deleted as it is no longer needed.
Copy the data directory from prestorials/ into docker-compose/local-fs/ and into docker-compose-native/local-fs/:
cp -r data docker-compose/local-fs/.
cp -r data docker-compose-native/local-fs/.
4. Deploying Presto¶
Now that setup is complete, we can run the docker compose file to deploy a Presto cluster and use Presto CLI to run queries.
4.1 Deploying Presto C++¶
Change into the docker-compose-native/local-fs directory in prestorials and run the docker compose command to start the Presto C++ cluster. Specify the Docker Compose file with the -f flag; based on whether your system architecture is aarch64 or x64, use either the docker compose file docker-compose-arm64.yaml or docker-compose-amd64.yaml respectively:
- For aarch64:
docker compose -v -f docker-compose-arm64.yaml up
- For x64:
docker compose -v -f docker-compose-amd64.yaml up
You should now see the logs of the Presto coordinator and workers starting up. The cluster is ready once the Presto coordinator's discovery service acknowledges requests from both Presto C++ workers; wait for these logs:
coordinator | 2024-07-25T23:48:15.077Z INFO main com.facebook.presto.server.PrestoServer ======== SERVER STARTED ========
worker_2 | I0725 23:48:39.584002 8 PeriodicServiceInventoryManager.cpp:118] Announcement succeeded: HTTP 202. State: active.
worker_1 | I0725 23:48:41.484305 8 PeriodicServiceInventoryManager.cpp:118] Announcement succeeded: HTTP 202. State: active.
4.2 Deploying Presto Java¶
If you just finished the previous step and have a Presto C++ cluster running, skip to the next section on running benchmarks with pbench. Return to this step after running pbench and obtaining TPC-DS benchmark times for Presto C++.
Deploying Presto Java is very similar to Presto C++; we just use the Docker Compose file in the docker-compose/local-fs directory of prestorials. Once again, based on whether your system architecture is aarch64 or x64, use either the docker compose file docker-compose-arm64.yaml or docker-compose-amd64.yaml respectively:
- For aarch64:
docker compose -v -f docker-compose-arm64.yaml up
- For x64:
docker compose -v -f docker-compose-amd64.yaml up
4.3 Iceberg schema evolution and time travel¶
We can query the TPC-H and TPC-DS iceberg tables using Presto's iceberg connector. First set the catalog to iceberg and the schema to either tpcds or tpch. The iceberg connector lets you modify the schema in place; try it out with these queries. The iceberg connector also supports time travel using table snapshots; try it out with these queries.
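For illustration, schema evolution and snapshot inspection might take forms like the following. This is a sketch assuming the orders table in the tpch schema; the actual queries referenced above may differ.

```sql
-- Schema evolution: add a column to an existing Iceberg table in place
ALTER TABLE iceberg.tpch.orders ADD COLUMN note VARCHAR;

-- Inspect table snapshots; snapshot IDs can be used for time travel
SELECT snapshot_id, committed_at
FROM iceberg.tpch."orders$snapshots";
```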