Deploying Presto with Docker Compose

Prerequisites

  • Git
  • Docker
  • Docker Compose
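A quick way to confirm the prerequisites are available (a minimal sketch; it only checks that the commands exist on your PATH, not their versions):

```shell
# Check that each prerequisite command is installed; prints a line for
# any tool that is missing from PATH.
for tool in git docker; do
  command -v "$tool" >/dev/null 2>&1 || echo "missing prerequisite: $tool" >&2
done
# The Compose v2 plugin is invoked as a docker subcommand.
docker compose version >/dev/null 2>&1 || echo "missing prerequisite: docker compose plugin" >&2
```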

1. Set Up

1.1 Pull Docker images

Pull the Presto images first to allow time for downloading.

For Presto Java coordinator and worker image:

docker pull prestodb/presto:latest

Download the right Presto C++ worker image based on your system architecture.

  1. For aarch64:

    docker pull public.ecr.aws/oss-presto/presto-native:0.289-ubuntu-arm64
    

  2. For x64, pick the latest image tag from ECR, or use the following tag:

    docker pull public.ecr.aws/oss-presto/presto-native:0.291-20250127081606-85259c3
    

1.2 Clone prestorials repo

Clone the prestodb/prestorials repository which contains the Docker Compose files used for deploying Presto.

git clone https://github.com/prestodb/prestorials.git

Then change directory into the cloned repository.

cd prestorials

2. Using Presto CLI

To run queries against a Presto cluster, confirm that the coordinator has started and run Presto CLI from the coordinator container:

docker exec -it coordinator sh -c "/opt/presto-cli <ARGS>" 

Arguments to presto-cli can be supplied in place of <ARGS> in the command above. Verify that a given catalog and schema exist before trying to access their tables. For example, to use the hive catalog and tpcds schema:

SHOW schemas in hive;
USE hive.tpcds;
SHOW tables;
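For non-interactive use, a catalog, schema, and one-off query can be passed in place of <ARGS>; for example, using the standard --catalog, --schema, and --execute Presto CLI options:

```shell
# Run a single query and exit instead of opening an interactive session.
docker exec -i coordinator sh -c \
  "/opt/presto-cli --catalog hive --schema tpcds --execute 'SHOW TABLES;'"
```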

3. Set Up TPC-H, TPC-DS Data

3.1 Generate data with Presto Java connectors

We will use Presto Java's TPC-H and TPC-DS connectors to generate TPC-H and TPC-DS tables at a scale factor of 1. Navigate to docker-compose/local-fs, where the Docker Compose file docker-compose-single-node.yaml spins up a single-node Presto Java cluster in which the coordinator also acts as a worker. If your system architecture is x64, change the platform to linux/amd64 in docker-compose-single-node.yaml.

Certain TPC-H and TPC-DS tables contain a large number of rows, so it is recommended to increase the container memory limit in docker-compose-single-node.yaml from the default of 2GB to 10GB:

    deploy:
      resources:
        limits:
          memory: 10G
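For context, the limit lives under the service's deploy key. A minimal sketch of the relevant part of docker-compose-single-node.yaml (the service name and image tag here are assumptions; check the actual file):

```yaml
# Hypothetical excerpt -- only the deploy block matters here.
services:
  presto-coordinator:
    image: prestodb/presto:latest
    deploy:
      resources:
        limits:
          memory: 10G   # raised from the default 2G
```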

Spin up a single-node Presto Java cluster using:

docker compose -v -f docker-compose-single-node.yaml up

Download the sql files createHiveTpchTables.sql, createHiveTpcdsTables.sql, createIcebergTpchTables.sql, and createIcebergTpcdsTables.sql, and copy them into the Presto server container:

docker cp ./createHiveTpchTables.sql <container_id>:/.
docker cp ./createHiveTpcdsTables.sql <container_id>:/.
docker cp ./createIcebergTpchTables.sql <container_id>:/.
docker cp ./createIcebergTpcdsTables.sql <container_id>:/.
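The <container_id> placeholder can be looked up by container name; a small sketch, assuming the single-node container is named coordinator (adjust the filter if your container name differs):

```shell
# Look up the container ID by name; prints an empty ID if no such
# container is running ("|| true" keeps the lookup non-fatal).
cid=$(docker ps --filter "name=coordinator" --format "{{.ID}}" 2>/dev/null || true)
echo "container id: $cid"
```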

Using the CTAS queries from the files createIcebergTpchTables.sql and createIcebergTpcdsTables.sql, we will create the tpch and tpcds schemas in the iceberg catalog and populate tables in those schemas with the data generated by the TPC-H and TPC-DS connectors. The data is written to /home/iceberg_data in Parquet file format and Iceberg table format:

docker exec -it coordinator sh -c "/opt/presto-cli -f ./createIcebergTpchTables.sql"
docker exec -it coordinator sh -c "/opt/presto-cli -f ./createIcebergTpcdsTables.sql" 
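The .sql files themselves are not reproduced here; as a rough illustration only, they consist of CTAS statements along these lines (the exact schema/table list and table properties in the real files may differ):

```sql
-- Hypothetical sketch of the pattern used in createIcebergTpchTables.sql:
-- read from the TPC-H connector at scale factor 1, write an Iceberg table.
CREATE SCHEMA IF NOT EXISTS iceberg.tpch;
CREATE TABLE iceberg.tpch.nation AS SELECT * FROM tpch.sf1.nation;
```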

Similarly, to generate TPC-H and TPC-DS data in parquet file format and hive table format, use the CTAS queries from files createHiveTpchTables.sql and createHiveTpcdsTables.sql. The data will be generated in /home/hive_data:

docker exec -it coordinator sh -c "/opt/presto-cli -f ./createHiveTpchTables.sql"
docker exec -it coordinator sh -c "/opt/presto-cli -f ./createHiveTpcdsTables.sql" 

The TPC-H and TPC-DS tables can now be queried through either the hive or iceberg catalog, using the tpch or tpcds schema.
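For example, with the cluster from above still running, the same TPC-H table can be counted through either catalog (the nation table is used here just for illustration):

```shell
# The same generated data, queried via the hive and iceberg catalogs.
docker exec -i coordinator sh -c \
  "/opt/presto-cli --execute 'SELECT COUNT(*) FROM hive.tpch.nation'"
docker exec -i coordinator sh -c \
  "/opt/presto-cli --execute 'SELECT COUNT(*) FROM iceberg.tpch.nation'"
```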

3.2 Prepare data

Copy the data from the Presto server container into the prestorials/ directory:

cd prestorials/
mkdir data/
docker cp <container_id>:/home/hive_data ./data/.
docker cp <container_id>:/home/iceberg_data ./data/. 

Alternatively, the TPC-H and TPC-DS data with parquet file format and hive table format can be downloaded from here. If you downloaded data.tar from this link, un-tar it and ensure the data directory is present in prestorials/. The data.tar file can then be deleted as it is no longer needed.
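Assuming data.tar was saved into prestorials/, the extraction and cleanup look like:

```shell
# Extract the archive (produces the data/ directory) and remove the
# tarball afterwards; guarded so it is a no-op if data.tar is absent.
if [ -f data.tar ]; then
  tar -xf data.tar
  rm data.tar
fi
```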

Copy the data directory from prestorials/ into docker-compose/local-fs/ and into docker-compose-native/local-fs/:

cp -r data docker-compose/local-fs/.
cp -r data docker-compose-native/local-fs/.

4. Deploying Presto

Now that setup is complete, we can run the docker compose file to deploy a Presto cluster and use Presto CLI to run queries.

4.1 Deploying Presto C++

Change into the docker-compose-native/local-fs directory in prestorials and run the docker compose command to start the Presto C++ cluster. The Docker Compose file is specified with the -f flag; based on whether your system architecture is aarch64 or x64, use the file docker-compose-arm64.yaml or docker-compose-amd64.yaml respectively:

  1. For aarch64:

    docker compose -v -f docker-compose-arm64.yaml up
    

  2. For x64:

    docker compose -v -f docker-compose-amd64.yaml up
    

You should now see the logs of the Presto coordinator and workers starting up. The cluster is ready once the Presto coordinator's discovery service acknowledges requests from both Presto C++ workers; wait for log lines like these:

coordinator  | 2024-07-25T23:48:15.077Z INFO    main    com.facebook.presto.server.PrestoServer ======== SERVER STARTED ========
worker_2     | I0725 23:48:39.584002     8 PeriodicServiceInventoryManager.cpp:118] Announcement succeeded: HTTP 202. State: active.
worker_1     | I0725 23:48:41.484305     8 PeriodicServiceInventoryManager.cpp:118] Announcement succeeded: HTTP 202. State: active.
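As an optional readiness check, the coordinator's REST endpoint can be polled. This assumes the compose file publishes the coordinator's HTTP port on localhost:8080, the Presto default:

```shell
# /v1/info returns cluster info JSON once the coordinator is up;
# the fallback message keeps the check non-fatal while it starts.
curl -sf http://localhost:8080/v1/info || echo "coordinator not reachable yet"
```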

4.2 Deploying Presto Java

If you just finished deploying the Presto C++ cluster and it is still running, skip to the next section on running benchmarks with pbench. Return to this step after running pbench and obtaining TPC-DS benchmark times for Presto C++.

Deploying Presto Java is very similar to deploying Presto C++; this time, use the Docker Compose files in the docker-compose/local-fs directory of prestorials. Once again, based on whether your system architecture is aarch64 or x64, use either docker-compose-arm64.yaml or docker-compose-amd64.yaml respectively:

  1. For aarch64:

    docker compose -v -f docker-compose-arm64.yaml up
    

  2. For x64:

    docker compose -v -f docker-compose-amd64.yaml up
    

4.3 Iceberg schema evolution and time travel

We can query the TPC-H and TPC-DS Iceberg tables using Presto's iceberg connector. First set the catalog to iceberg and the schema to either tpcds or tpch. The iceberg connector lets you modify a table's schema in place; try it out with these queries. The iceberg connector also supports time travel using table snapshots; try it out with these queries.
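As a sketch of both features (the nation table and the note column are arbitrary choices for illustration), a schema change and a look at a table's snapshot history might look like:

```shell
# Schema evolution: add a column in place.
docker exec -i coordinator sh -c \
  "/opt/presto-cli --catalog iceberg --schema tpch --execute 'ALTER TABLE nation ADD COLUMN note VARCHAR'"

# Each write creates a snapshot; the snapshots metadata table lists
# them, and snapshot IDs are what time-travel queries refer to.
docker exec -i coordinator sh -c \
  "/opt/presto-cli --catalog iceberg --schema tpch --execute 'SELECT snapshot_id, committed_at FROM \"nation\$snapshots\"'"
```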