Overview - StarTree Docs

Apache Pinot is built for large datasets and high query throughput. Much of its performance and flexibility comes from its indexing capabilities, which let queries avoid reading data they do not need, even at petabyte scale. Pinot offers a broad set of index types, so you can choose the ones that best fit your data characteristics and query patterns.

Why Use Indexes?

Accelerated Query Performance: Indexes drastically enhance query speed, efficiently pinpointing relevant data segments even at massive scale.
Optimized Resource Usage: Strategic indexing reduces unnecessary data scans, effectively lowering resource consumption and operational costs.
Flexible Analytics: A variety of index types allows Pinot to accommodate diverse analytical workloads, ranging from straightforward lookups to complex analytics and sophisticated similarity searches.

Supported Index Types

Apache Pinot supports a wide range of indexes tailored to optimize various query scenarios:

Inverted Index

Maps each value directly to its rows for fast lookups.
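To make the value-to-rows mapping concrete, here is a minimal sketch in Python. It illustrates the idea only, not Pinot's implementation, which stores row IDs as compressed bitmaps rather than lists; the column data and the function name `build_inverted_index` are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(column_values):
    """Map each distinct value to the ascending list of row IDs that hold it."""
    index = defaultdict(list)
    for row_id, value in enumerate(column_values):
        index[value].append(row_id)
    return dict(index)


# Hypothetical column data; a row ID is simply the position in the list.
country = ["US", "DE", "US", "IN", "DE", "US"]
country_index = build_inverted_index(country)

# An equality filter such as country = 'US' becomes one lookup instead of a scan.
matching_rows = country_index.get("US", [])
```

With this layout, the cost of an equality filter depends on the number of matching rows rather than on the total number of rows.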

Star-tree Index

Pre-aggregates data across selected dimensions to speed up aggregation queries on large, high-cardinality datasets.

Range Index

Handles numeric range queries efficiently without requiring data sorting.

Forward Index

Stores each column's values by row (document) ID so they can be read back during query execution. Types: Dictionary-Encoded, Sorted, and Raw Value.

JSON Index

Enables fast queries on JSON-structured data.

Geospatial Index

Powers geographic queries, enabling proximity searches and spatial analytics.

Text Index (Lucene)

Provides full-text search on unstructured text fields, backed by Apache Lucene.

Text Index (Native)

Provides full-text search on unstructured text fields using Pinot's own text index implementation instead of Lucene.

Timestamp Index

Enables fast filtering on timestamp columns by indexing at a defined time granularity.
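As a rough illustration of indexing at a time granularity, the sketch below truncates an epoch-millisecond timestamp to an hour or day bucket. The function name `truncate_millis` and the granularity table are assumptions made for this example; they are not Pinot configuration values or APIs.

```python
from datetime import datetime, timezone

# Bucket widths in milliseconds (illustrative subset of granularities).
MILLIS_PER = {"HOUR": 3_600_000, "DAY": 86_400_000}


def truncate_millis(epoch_millis, granularity):
    """Round an epoch-millisecond timestamp down to the start of its bucket (UTC)."""
    step = MILLIS_PER[granularity]
    return epoch_millis - (epoch_millis % step)


# Hypothetical event time: 2024-03-15 13:47:05 UTC.
event = datetime(2024, 3, 15, 13, 47, 5, tzinfo=timezone.utc)
event_millis = int(event.timestamp() * 1000)

hour_bucket = truncate_millis(event_millis, "HOUR")
day_bucket = truncate_millis(event_millis, "DAY")
```

Filtering on the precomputed bucket value avoids truncating every row's timestamp at query time.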

Vector Index

Supports fast similarity searches on vector embeddings, ideal for Gen AI and recommendation workloads.

Sparse Index

Optimizes high-cardinality equality filters using chunked partitioning.

Bloom Filter

Prunes segments quickly for equality queries while using little memory.
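The following sketch shows how a Bloom filter can rule out segments for an equality query. It is a toy implementation under stated assumptions: the bit-array size, the hash scheme, and the names `BloomFilter`, `prune_segments`, `seg_a`, and `seg_b` are all hypothetical and do not reflect how Pinot builds its filters.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, value):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def prune_segments(segment_filters, value):
    """Keep only the segments whose filter says the value may be present."""
    return [name for name, bf in segment_filters.items() if bf.might_contain(value)]


# Hypothetical segments, each with a filter over its user_id column.
segments = {"seg_a": BloomFilter(), "seg_b": BloomFilter()}
segments["seg_a"].add("user_42")
segments["seg_b"].add("user_7")

candidates = prune_segments(segments, "user_42")
```

A segment whose filter returns false cannot contain the value, so it can be skipped. A segment whose filter returns true still has to be checked, because false positives are possible.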

FST Index

Supports regex search on dictionary-encoded text columns using a compact index.

Dictionary Index

Replaces repeated values with integer IDs for storage efficiency.
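A minimal sketch of dictionary encoding follows. Keeping the dictionary sorted is an assumption of this example, and the names `dictionary_encode` and `dictionary_decode` are hypothetical.

```python
def dictionary_encode(column_values):
    """Return (dictionary, ids): the distinct values and one integer ID per row."""
    dictionary = sorted(set(column_values))
    value_to_id = {value: i for i, value in enumerate(dictionary)}
    ids = [value_to_id[value] for value in column_values]
    return dictionary, ids


def dictionary_decode(dictionary, ids):
    """Recover the original column from the dictionary and the per-row IDs."""
    return [dictionary[i] for i in ids]


# Hypothetical column with many repeated values.
country = ["US", "DE", "US", "IN", "DE", "US"]
dictionary, ids = dictionary_encode(country)
```

Each row now stores a small integer instead of the full value, which is where the storage saving comes from when values repeat often.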

Composite JSON Index

An enhanced version of the JSON Index to reduce index size and improve performance.

When selecting the right index for your use case, consider the following:

Query Patterns: Assess the types of queries you run: point lookups, range queries, aggregations, or similarity searches.
Data Type and Cardinality: Evaluate column uniqueness, data distribution, and characteristics.
Performance vs. Storage Trade-offs: Understand that some indexes enhance performance substantially but may require additional storage.

Selecting indexes that match your data and query requirements lets Apache Pinot keep analytics fast as data volumes grow, even for complex data exploration.

Updating Indexes

Updating indexes involves the following high-level steps:

Assess the Right Indexes: Determine the appropriate indexes based on your query needs and data characteristics.
Apply Index Configurations: Configure indexes in your table configuration, referring to each index’s dedicated documentation page for specific configuration options.
Apply Changes and Reload: Trigger a table reload using the reload API. The reload runs without downtime and is transparent to active queries.
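The steps above can be sketched in Python as follows. This is an illustration under assumptions, not a reference procedure: the controller address, the table name `events`, and the column names are hypothetical, and the sketch uses the `tableIndexConfig` column lists together with the controller's table-update and segment-reload endpoints. Consult each index's documentation page for the configuration keys that apply to your version. The requests are only constructed here, not sent.

```python
import copy
import json
import urllib.request

CONTROLLER = "http://localhost:9000"  # assumption: local Pinot controller
TABLE = "events"                      # hypothetical table name


def with_indexes(table_config, inverted=(), range_=(), bloom=()):
    """Return a copy of the table config with index column lists applied."""
    updated = copy.deepcopy(table_config)
    index_config = updated.setdefault("tableIndexConfig", {})
    index_config["invertedIndexColumns"] = list(inverted)
    index_config["rangeIndexColumns"] = list(range_)
    index_config["bloomFilterColumns"] = list(bloom)
    return updated


def update_config_request(table_config):
    """Build the request that submits the updated table config to the controller."""
    return urllib.request.Request(
        f"{CONTROLLER}/tables/{TABLE}",
        data=json.dumps(table_config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def reload_request():
    """Build the request that asks the controller to reload the table's segments."""
    return urllib.request.Request(f"{CONTROLLER}/segments/{TABLE}/reload", method="POST")


# Hypothetical starting config and column choices.
current = {"tableName": TABLE, "tableIndexConfig": {}}
new_config = with_indexes(current, inverted=["country"], range_=["latency_ms"], bloom=["user_id"])

put_req = update_config_request(new_config)
post_req = reload_request()
# To apply for real: urllib.request.urlopen(put_req), then urllib.request.urlopen(post_req)
```

Submitting the config first and reloading second matters: the reload is what builds the newly configured indexes on existing segments.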
