Serverless SQL on Lakehouse

Apache Kyuubi (Incubating) is a distributed, multi-tenant gateway that provides serverless SQL on lakehouses.

Key Features

Multi-Tenancy

Kyuubi provides end-to-end multi-tenancy for resource acquisition and data/metadata access through a unified authentication/authorization layer.

High Availability

Kyuubi supports load balancing via ZooKeeper, which provides enterprise-grade high availability as well as unlimited client concurrency.
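
As a rough illustration of ZooKeeper-based load balancing, clients typically point their JDBC URL at the ZooKeeper quorum instead of a single Kyuubi server, and the driver picks a registered server instance. The sketch below only builds such a URL. The host names are hypothetical, the helper `zookeeper_jdbc_url` is ours, and the namespace `kyuubi` is an assumption that must match your server-side HA namespace setting.

```python
def zookeeper_jdbc_url(zk_hosts, namespace="kyuubi"):
    """Build a Hive-compatible JDBC URL that discovers Kyuubi servers via ZooKeeper.

    zk_hosts:  list of "host:port" strings for the ZooKeeper quorum (hypothetical here).
    namespace: ZooKeeper namespace the servers register under (assumed "kyuubi").
    """
    if not zk_hosts:
        raise ValueError("at least one ZooKeeper host:port is required")
    quorum = ",".join(zk_hosts)
    return (
        f"jdbc:hive2://{quorum}/;"
        "serviceDiscoveryMode=zooKeeper;"
        f"zooKeeperNamespace={namespace}"
    )


# Hypothetical two-node quorum; replace with your own ZooKeeper addresses.
url = zookeeper_jdbc_url(["zk1.example.com:2181", "zk2.example.com:2181"])
print(url)
```

Because the URL names the quorum and not a server, Kyuubi instances can be added or removed without changing client configuration.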

Multiple Workloads

Kyuubi can easily support multiple disparate workloads on a single platform, with one copy of data and one SQL interface.

Ecosystem

The figure below shows our vision for the Kyuubi ecosystem. Some parts have been realized, some are in development, and others will not be possible without your help.

[Figure: the Kyuubi ecosystem. Access from anywhere; deploy at any scale; connect to any data.]

Use Cases

Interactive Analytics

Kyuubi is an advanced, enterprise-grade, rapid analytics platform for interactive visual analytics on big data, built on modern computing frameworks such as Apache Spark, Apache Flink, and Trino. Through JDBC/ODBC, users can access Kyuubi and run queries efficiently, either by writing SQL directly or through SQL generated by BI tools. Kyuubi caches background engine instances at the user level for better sharing of computing resources and quicker responses. These engines parallelize queries over large amounts of data and return the results quickly.

Batch Processing

Kyuubi provides a familiar SQL interface for batch processing, typically large Extract, Transform, Load (ETL) jobs. Both Kyuubi and its engines are storage-independent and work with numerous data sources. Kyuubi isolates background engine instances at the connection level for better computing resource isolation and stability.
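
The difference between user-level engine caching for interactive analytics and connection-level isolation for batch jobs is usually expressed through the `kyuubi.engine.share.level` setting. The sketch below shows one way a client might pass it per connection in the JDBC URL. The helper `kyuubi_jdbc_url`, the host name, and the choice to put the setting in the URL's query part are assumptions for illustration; check the URL format and the supported levels against the Kyuubi release you run.

```python
# Share levels as documented for recent Kyuubi releases (assumed here).
VALID_SHARE_LEVELS = {"CONNECTION", "USER", "GROUP", "SERVER"}


def kyuubi_jdbc_url(host, port=10009, database="default", share_level="USER"):
    """Build a JDBC URL that requests a specific engine share level.

    USER       -> one cached engine per user, suited to interactive queries.
    CONNECTION -> one engine per connection, suited to isolated batch jobs.
    """
    level = share_level.upper()
    if level not in VALID_SHARE_LEVELS:
        raise ValueError(f"unknown share level: {share_level!r}")
    # Kyuubi settings are passed after '?' in the Hive-compatible URL.
    return f"jdbc:hive2://{host}:{port}/{database}?kyuubi.engine.share.level={level}"


# Hypothetical host; 10009 is the usual default Kyuubi frontend port.
interactive_url = kyuubi_jdbc_url("kyuubi.example.com", share_level="USER")
batch_url = kyuubi_jdbc_url("kyuubi.example.com", share_level="CONNECTION")
print(interactive_url)
print(batch_url)
```

A BI tool would keep reusing the engine behind `interactive_url`, while each ETL job opened with `batch_url` gets an engine that is released when the connection closes.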

Data Lakes & Lakehouses

Kyuubi supports querying traditional data warehouses, such as Apache Hive/HDFS, together with modern lakehouses, such as Apache Iceberg, Apache Hudi, and Delta Lake. Kyuubi also provides multi-catalog metadata APIs that present a broad, centralized picture of all your data and help you innovate faster. The ability to query disparate data sources through a single entry point with ANSI-standard SQL syntax greatly simplifies data analysis, while authentication and authorization keep all data secure.
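
To make the single-entry-point idea concrete, here is a minimal sketch that assumes a Spark engine with an Iceberg catalog registered next to the built-in session catalog. The catalog name `lake`, the schemas, the tables, and both helper functions are hypothetical; the two configuration keys follow the usual Spark and Iceberg catalog convention and would normally be set on the engine, not in client code.

```python
# Engine-side settings that register an Iceberg catalog named "lake" (name is hypothetical).
engine_confs = {
    "spark.sql.catalog.lake": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.lake.type": "hive",
}


def qualify(catalog, schema, table):
    """Return a fully qualified catalog.schema.table identifier."""
    return f"{catalog}.{schema}.{table}"


def join_query(left, right, key):
    """Compose one SQL statement that joins two tables from different catalogs."""
    return (
        f"SELECT l.*, r.* FROM {left} AS l "
        f"JOIN {right} AS r ON l.{key} = r.{key}"
    )


# A Hive table in Spark's default session catalog and an Iceberg table in "lake".
hive_orders = qualify("spark_catalog", "sales", "orders")
iceberg_customers = qualify("lake", "crm", "customers")
sql = join_query(hive_orders, iceberg_customers, "customer_id")
print(sql)
```

The resulting statement can be submitted over the same JDBC connection as any other query, so the client needs no separate driver for each storage format.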