Latency Models
Latency models learn to predict query runtime from database plans and related features. In this repository, the measured JSONL files from the benchmark workflow are the raw input; dataset scripts turn those measurements into model features and training labels.
The target value is query latency in milliseconds. The model input is not the query text alone, but the database’s view of how the query will run: operators, estimated rows or costs, structural plan features, and, for some drivers, database statistics.
Query latency is hard to predict because plans are only approximations of what will happen at runtime.
The most important uncertainty is cardinality estimation. If the database planner underestimates how many rows or documents a filter will return, a plan can look cheap even though the query scans much more data than expected. That error flows directly into the model features because estimated rows, costs, and operator choices are part of the input.
The repository has two latency-model paths.
| Path | Dataset scripts | Training scripts | Main use |
|---|---|---|---|
| Plan-structured neural models | scripts.neural.create_dataset |
scripts.neural.train |
QPP-Net-style experiments over plan trees. |
| Flat tree models | scripts.flat.<driver>.create_dataset |
scripts.flat.<driver>.train |
EXPLAIN-only prediction and MCTS integration. |
Both paths start from measured query files. They differ in how much of the plan structure they preserve.
This model is an implementation of the Plan-Structured Deep Neural Network paper - feel free to refer to it for more details.
The neural path converts each measured plan into a tree of PlanNode objects. Each node stores:
- an operator type,
- a numeric feature vector,
- optional extracted operator latency,
- child plan nodes.
The feature extractor first scans training plans to build vocabularies and normalization statistics. Then it converts each database-native plan into a driver-specific tree representation.
Create a neural train/validation split:
python -m scripts.neural.create_dataset \
postgres/edbt-demo-train \
edbt-0/measured-100-6.jsonl \
--val-dataset postgres/edbt-demo-val \
--val-ratio 0.2 \
--split-seed 42
This writes:
data/cache/postgres/edbt-demo-train/dataset.pkl
data/cache/postgres/edbt-demo-train/feature_extractor.pkl
data/cache/postgres/edbt-demo-train/operators.json
operators.json records the operator signatures seen during dataset creation. For neural models, an operator signature is the combination of operator type, number of children, and feature dimension. The model creates one neural unit for each signature found in the training set. Note that this is a latency-model term and is not the same as a schema-category
signature
, which identifies morphisms in the conceptual model.
Train a neural model:
python -m scripts.neural.train \
postgres/edbt-demo-model \
edbt-demo-train \
edbt-demo-val
Training checkpoints are saved under:
data/checkpoints/{driver}/{model-name}/
Important checkpoint names are:
best.pt
best_metrics.json
epoch/0005.pt
epoch/0005_metrics.json
Evaluate a neural checkpoint:
python -m scripts.neural.test \
postgres/edbt-demo-model/best \
edbt-demo-val
Evaluation writes:
data/plots/evaluation_results.json
data/plots/evaluation_plots.png
Validation or test items with unseen operator signatures may be pruned because the trained neural model has no unit for those operators.
Flat models convert each whole plan into one fixed-length numeric vector. They are trained with tree regressors such as random forests and XGBoost.
Create a PostgreSQL flat dataset:
python -m scripts.flat.postgres.create_dataset \
postgres/edbt-demo-flat-train \
edbt-0/measured-100-6.jsonl \
--val-dataset postgres/edbt-demo-flat-val \
--val-ratio 0.2 \
--split-seed 42
Flat artifacts are:
data/cache/{driver}/{dataset-name}/flat_dataset.pkl
data/cache/{driver}/{dataset-name}/flat_feature_extractor.pkl
Train and evaluate a PostgreSQL flat model:
python -m scripts.flat.postgres.train \
postgres/edbt-demo-flat-rf \
edbt-demo-flat-train \
edbt-demo-flat-val \
--model-type random_forest
python -m scripts.flat.postgres.test \
postgres/edbt-demo-flat-rf \
edbt-demo-flat-val
Flat model artifacts are:
data/checkpoints/{driver}/{model-name}/flat_model.pkl
data/checkpoints/{driver}/{model-name}/flat_metrics.json
The equivalent MongoDB and Neo4j commands are:
scripts.flat.mongo.create_dataset
scripts.flat.mongo.train
scripts.flat.mongo.test
scripts.flat.neo4j.create_dataset
scripts.flat.neo4j.train
scripts.flat.neo4j.test
MongoDB and Neo4j flat trainers support additional tree families such as extra_trees and decision_tree. PostgreSQL currently supports random_forest and xgboost.
Feature extractors are saved with datasets and models because they define the feature contract. A test dataset should reuse the training feature extractor when it needs the same vocabulary or feature ordering:
python -m scripts.flat.postgres.create_dataset \
postgres/edbt-demo-flat-test \
edbt-1/measured-1000-10.jsonl \
--feature-extractor-dataset edbt-demo-flat-train
At a high level, the feature families are:
| Feature family | Examples |
|---|---|
| Plan structure | node count, tree depth, leaf count, parent-child operator patterns |
| Operator mix | scan, join, sort, aggregate, update, delete, traversal operators |
| Optimizer estimates | estimated rows, estimated cost, plan width, root and aggregate statistics |
| Query shape | filters, limits, sorts, grouping, expression counts, relationship patterns |
| Database statistics | MongoDB collection and field-distribution stats; schema-specific optional identifiers |
PostgreSQL flat features use plain EXPLAIN information such as estimated costs, estimated rows, plan width, structural counts, and categorical operator counts.
MongoDB flat features use queryPlanner-safe plans plus global collection and field-distribution statistics. They also preserve original query filters on the plan so selectivity-style features can be estimated without executing the query.
Neo4j flat features use plain EXPLAIN plan fields such as estimated rows, operator structure, identifiers, and parsed Details patterns. They avoid runtime counters from PROFILE.
The training label is derived from the measured times array for each query. Dataset creation skips early timing runs and averages the remaining values.
The defaults differ by path:
| Dataset path | Default --skip-first |
|---|---|
| Generic neural dataset | 5 |
| PostgreSQL flat dataset | 5 |
| MongoDB flat dataset | 1 |
| Neo4j flat dataset | 1 |
If a measurement file has too few runs for the requested skip count, dataset creation may fail or fall back depending on the driver-specific script. For training runs, collect enough timings that the label is still based on multiple post-warmup values.
Evaluation compares predicted latency with measured latency.
Common outputs include:
- predicted latency,
- measured latency,
- absolute error,
- mean and median absolute error,
- mean relative error,
- median and mean R-value,
- fraction of predictions within R-value thresholds such as
1.5,2.0, or5.0.
The R-value is symmetric: it is the larger of predicted / measured and measured / predicted. A value of 1.0 is perfect. A value of 2.0 means the prediction is off by a factor of two in either direction.
Flat evaluation scripts write JSON results under data/plots/ by default:
data/plots/flat_evaluation_results.json
data/plots/mongo_flat_evaluation_results.json
data/plots/neo4j_flat_evaluation_results.json
Each result row includes the query id, label, predicted latency, measured latency, and absolute error. MongoDB evaluation can also print verbose per-template summaries with --verbose.
Flat models can predict a single query without executing it.
PostgreSQL:
python -m scripts.flat.postgres.predict \
postgres/edbt-demo-flat-rf \
postgres/edbt-0 \
"SELECT * FROM person LIMIT 10"
MongoDB:
python -m scripts.flat.mongo.predict \
mongo/tpch-2-flat-rf \
mongo/tpch-2 \
'{"find":"orders","filter":{"o_orderkey":1}}'
Neo4j:
python -m scripts.flat.neo4j.predict \
neo4j/tpch-2-flat-rf \
neo4j/tpch-2 \
"MATCH (n) RETURN n LIMIT 10"
PostgreSQL uses EXPLAIN without ANALYZE, MongoDB uses explain verbosity queryPlanner, and Neo4j uses EXPLAIN. MongoDB prediction also needs cached global stats with field distributions; those are normally produced while creating MongoDB flat datasets.
Use the neural model when predicting for PostgreSQL, as the shapes of its query plan are a good fit for it, unlike the shapes of MongoDB and Neo4j query plans.
Use the flat (tree-based) model when predicting any of the PostgreSQL, MongoDB, Neo4j queries.