
Glossary

Apache Superset: Apache Superset is an open-source, enterprise-ready business intelligence web application for data exploration and data visualization, able to handle data at petabyte scale. It is fast, lightweight, and intuitive, with options that make it easy for users of all skill levels to explore and visualize their data, from simple pie charts to highly detailed geospatial charts.

Application Programming Interface (API): An API is a programmatic interface for executing an application's functions in an automated or manual fashion, without using a CLI or user interface.
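To make the definition concrete, here is a minimal Python sketch of programmatic access over HTTP using only the standard library. The `/add` endpoint, the `AddHandler` and `call_add` names, and the JSON payload are hypothetical and unrelated to any watsonx.data API; a throwaway local server stands in for the application so the client has something to call.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class AddHandler(BaseHTTPRequestHandler):
    """Hypothetical application exposing one function, add(a, b), at POST /add."""

    def do_POST(self):
        if self.path != "/add":
            self.send_error(404)
            return
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"result": payload["a"] + payload["b"]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def call_add(base_url, a, b):
    """Client side: invoke the application's function with no CLI or UI involved."""
    req = urllib.request.Request(
        base_url + "/add",
        data=json.dumps({"a": a, "b": b}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["result"]


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), AddHandler)  # port 0: let the OS pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_address[1]}"
    print(call_add(url, 2, 3))  # prints 5
    server.shutdown()
    server.server_close()
```

The same `call_add` function could run from a scheduled job or a script, which is the "automated" half of the definition; a person calling it interactively is the "manual" half.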

Buckets: Buckets are the basic containers that hold your data in object storage. Every object that you store must be contained in a bucket. You can use buckets to organize your data and control access to it, but unlike directories and folders, buckets cannot be nested.
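The sketch below is a hypothetical in-memory model, not a real client library, illustrating two rules from the definition: an object can only be stored in an existing bucket, and buckets cannot be nested. The prefix/delimiter listing is modeled loosely on S3-style listing, where "folders" are only shared key prefixes in a flat namespace; the `ObjectStore` class and the bucket and key names are illustrative assumptions.

```python
class ObjectStore:
    """Hypothetical in-memory model of buckets in an object store."""

    def __init__(self):
        self._buckets = {}  # bucket name -> {object key -> bytes}

    def create_bucket(self, name):
        # Real bucket names may not contain '/', so a bucket can never sit inside another.
        if "/" in name:
            raise ValueError("bucket names cannot contain '/': buckets do not nest")
        self._buckets.setdefault(name, {})

    def put_object(self, bucket, key, data):
        if bucket not in self._buckets:
            raise KeyError(f"no such bucket: {bucket}")  # every object needs a bucket
        self._buckets[bucket][key] = data

    def list_objects(self, bucket, prefix="", delimiter="/"):
        """Return (keys, common_prefixes) directly 'under' prefix.

        Keys inside a bucket are flat strings; grouping them by the next
        delimiter is only a presentation trick that looks like folders.
        """
        keys, prefixes = [], set()
        for key in sorted(self._buckets[bucket]):
            if not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            if delimiter in rest:
                prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
            else:
                keys.append(key)
        return keys, sorted(prefixes)


store = ObjectStore()
store.create_bucket("demo-bucket")
store.put_object("demo-bucket", "sales/orders/part-0.parquet", b"...")
store.put_object("demo-bucket", "sales/orders/part-1.parquet", b"...")
store.put_object("demo-bucket", "readme.txt", b"hello")
print(store.list_objects("demo-bucket"))
# (['readme.txt'], ['sales/'])
print(store.list_objects("demo-bucket", prefix="sales/orders/"))
# (['sales/orders/part-0.parquet', 'sales/orders/part-1.parquet'], [])
```

Note that `sales/` is never created as a container; it exists only because some keys happen to start with it.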

Catalog: This term has several meanings, depending on context:

  • Service Catalog - A service catalog is a comprehensive list of the cloud computing services that an organization offers its customers. The catalog is the only part of the company's service portfolio that is published to customers, where it supports the sale or delivery of the offered services.

  • Data Catalog - A collection of business information describing the available datasets within an organization.

  • Metastore Catalog - A collection of technical and operational metadata allowing a query engine to overlay a virtual table on a collection of discrete data files.

  • Connector Catalog - The named representation of a connector within the virtual warehouse of a Presto instance.
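A minimal Python sketch of how the metastore and connector catalog meanings fit together: a metastore entry records a table's columns and the data files behind it, and a catalog name qualifies the table in a `catalog.schema.table` reference. The names `lake`, `sales`, and `orders`, the CSV file layout, and the `scan` function are hypothetical; a real engine such as Presto obtains this metadata from a metastore service through a configured connector rather than from Python dictionaries.

```python
import csv
import os
import tempfile

# Two discrete data files that together hold one logical table (hypothetical data).
data_dir = tempfile.mkdtemp()
rows_by_file = {
    "part-0.csv": [["1", "widget", "9.50"], ["2", "gadget", "4.25"]],
    "part-1.csv": [["3", "gizmo", "12.00"]],
}
for name, rows in rows_by_file.items():
    with open(os.path.join(data_dir, name), "w", newline="") as f:
        csv.writer(f).writerows(rows)

# Metastore catalog: technical metadata (columns, file locations) per schema.table.
metastore = {
    ("sales", "orders"): {
        "columns": ["order_id", "product", "amount"],
        "files": [os.path.join(data_dir, n) for n in sorted(rows_by_file)],
    }
}

# Connector catalogs: the name used in queries -> the metadata source it exposes.
catalogs = {"lake": metastore}


def scan(qualified_name):
    """Yield rows of catalog.schema.table as dicts by reading every backing file."""
    catalog, schema, table = qualified_name.split(".")
    entry = catalogs[catalog][(schema, table)]
    for path in entry["files"]:
        with open(path, newline="") as f:
            for row in csv.reader(f):
                yield dict(zip(entry["columns"], row))


for row in scan("lake.sales.orders"):
    print(row)  # three rows, read from two separate files
```

The data files carry no table name or column names; the "virtual table" exists only because the metastore entry describes how to read them together.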

Command Line Interface (CLI): A command-line interface (CLI) is a text-based user interface (UI) used to run programs, manage computer files and interact with the computer.

DBeaver: DBeaver is a SQL client application and database administration tool. For relational databases it uses the JDBC application programming interface to interact with databases through a JDBC driver. For other databases it uses proprietary database drivers.

Federation: A federated database is a system in which several databases appear to function as a single entity. Each component database in the system is completely self-sustained and functional. When an application queries the federated database, the system figures out which of its component databases contains the data being requested and passes the request to it. Federated databases can be thought of as database virtualization in much the same way that storage virtualization makes several drives appear as one.
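As a small local analogy for federation, SQLite's `ATTACH DATABASE` statement lets one connection address two independent database files as if they were one and join across them in a single query. This is an illustration only, assuming two hypothetical component databases (`customers.db` and `orders.db`) and hypothetical helper names; a federated query engine such as Presto does the equivalent across separate systems through connectors.

```python
import os
import sqlite3
import tempfile

QUERY = """
    SELECT c.name, SUM(o.total) AS spent
    FROM main.customers AS c
    JOIN ordersdb.orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
"""


def build_component_dbs(directory):
    """Create two self-contained databases; each is fully usable on its own."""
    customers_db = os.path.join(directory, "customers.db")
    orders_db = os.path.join(directory, "orders.db")

    conn = sqlite3.connect(customers_db)
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Raj")])
    conn.commit()
    conn.close()

    conn = sqlite3.connect(orders_db)
    conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(10, 1, 20.0), (11, 1, 5.5), (12, 2, 8.0)],
    )
    conn.commit()
    conn.close()
    return customers_db, orders_db


def spend_per_customer(customers_db, orders_db):
    """One connection, both databases addressable by name, one joined query."""
    fed = sqlite3.connect(customers_db)
    try:
        fed.execute("ATTACH DATABASE ? AS ordersdb", (orders_db,))
        return fed.execute(QUERY).fetchall()
    finally:
        fed.close()


if __name__ == "__main__":
    dbs = build_component_dbs(tempfile.mkdtemp())
    for name, spent in spend_per_customer(*dbs):
        print(name, spent)  # Ana 25.5, then Raj 8.0
```

The query names each table by its database (`main`, `ordersdb`), much as a Presto query names a table by its catalog.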

MinIO: MinIO is a high-performance, S3-compatible object store built for large-scale AI/ML, data lake, and database workloads. It runs on premises and on any cloud (public or private), from the data center to the edge. MinIO is software-defined and open source under the GNU AGPL v3 license.

Object Storage: Object storage is a data storage architecture for unstructured data that divides data into units called objects and stores them in a structurally flat environment. Each object includes the data, its metadata, and a unique identifier that applications can use to access and retrieve it.
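A minimal Python sketch of the object model in this definition: each stored object carries its data, a metadata dictionary, and a generated unique identifier, and the store is one flat mapping from identifier to object with no directory hierarchy. The `FlatObjectStore` and `StoredObject` names and the `size`/`sha256` metadata keys are hypothetical; real object stores define their own identifiers and metadata fields.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    object_id: str                                 # unique identifier used for retrieval
    data: bytes                                    # the unstructured payload itself
    metadata: dict = field(default_factory=dict)   # descriptive key/value pairs


class FlatObjectStore:
    """Hypothetical store: a single flat mapping, no folders or nesting."""

    def __init__(self):
        self._objects = {}  # object_id -> StoredObject

    def put(self, data, **metadata):
        # System-generated metadata sits alongside any caller-supplied metadata.
        metadata.setdefault("size", len(data))
        metadata.setdefault("sha256", hashlib.sha256(data).hexdigest())
        obj = StoredObject(str(uuid.uuid4()), data, metadata)
        self._objects[obj.object_id] = obj
        return obj.object_id

    def get(self, object_id):
        return self._objects[object_id]


store = FlatObjectStore()
oid = store.put(b"id,amount\n1,9.50\n", content_type="text/csv")
obj = store.get(oid)
print(obj.metadata["content_type"], obj.metadata["size"])  # text/csv 17
```

Retrieval needs only the identifier; there is no path to walk, which is what "structurally flat" means in practice.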

Presto: Presto is a distributed database query engine, written in Java, that uses the SQL query language. Its architecture lets users query data sources such as Hadoop, Cassandra, Kafka, Amazon S3, Alluxio, MySQL, MongoDB, and Teradata, and combine multiple data sources within a single query. Presto is community-driven open-source software released under the Apache License. Its architecture is similar to that of other database management systems that use cluster computing, sometimes called massively parallel processing (MPP).

Spark: Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance. Spark can be used with watsonx.data, but it is not included in the provided watsonx.data environment image.

TechZone (IBM Technology Zone): IBM Technology Zone is the platform on which the developer edition of watsonx.data, with its sample data sets, has been provisioned. More generally, it lets go-to-market teams and Business Partners easily build live technical "Show Me" environments, POTs, prototypes, and MVPs, which can then be customized and shared with peers and customers so they can experience IBM technology.

VNC (Virtual Network Computing): VNC is a cross-platform screen-sharing system that uses the Remote Frame Buffer (RFB) protocol. VNC was created to control another computer remotely, and you may know it best for its role in tech support services. Using VNC is optional; once the WireGuard VPN has been activated, VNC can be used to access the watsonx.data server.

WireGuard: WireGuard is a communication protocol, and free and open-source software, that implements encrypted virtual private networks. It was designed with the goals of ease of use, high performance, and a low attack surface. You will need to install the WireGuard software and download the server VPN certificate in order to access the watsonx.data server.