Skip to main content

GraphQL

GraphQL (GQL) is an open-source query language for APIs. It offers a more efficient and flexible approach compared to traditional RESTful APIs.

With GraphQL, users can request specific data using a single query, reducing the need for many server round trips. This improves performance and minimizes network overhead.

GraphQL has several advantages, such as self-documenting, having a strong typing system, supporting versioning and evolution, enabling rapid development, and having a robust ecosystem. These features make GraphQL a powerful choice for APIs prioritizing flexibility, performance, and developer productivity.

dbt Semantic Layer GraphQL API

The dbt Semantic Layer GraphQL API allows you to explore and query metrics and dimensions. Due to its self-documenting nature, you can explore the calls conveniently through a schema explorer.

The schema explorer URLs vary depending on your deployment region. Use the following table to find the right link for your region:

Deployment typeSchema explorer URL
North America multi-tenanthttps://semantic-layer.cloud.getdbt.com/api/graphql
EMEA multi-tenanthttps://semantic-layer.emea.dbt.com/api/graphql
APAC multi-tenanthttps://semantic-layer.au.dbt.com/api/graphql
Single tenanthttps://semantic-layer.YOUR_ACCESS_URL/api/graphql

Replace YOUR_ACCESS_URL with your specific account prefix followed by the appropriate Access URL for your region and plan.
Multi-cellhttps://YOUR_ACCOUNT_PREFIX.semantic-layer.REGION.dbt.com/api/graphql

Replace YOUR_ACCOUNT_PREFIX with your specific account identifier and REGION with your location, which could be us1.dbt.com.

Example

  • If your Single tenant access URL is ABC123.getdbt.com, your schema explorer URL will be https://semantic-layer.ABC123.getdbt.com/api/graphql.

dbt Partners can use the Semantic Layer GraphQL API to build an integration with the dbt Semantic Layer.

Note that the dbt Semantic Layer API doesn't support ref to call dbt objects. Instead, use the complete qualified table name. If you're using dbt macros at query time to calculate your metrics, you should move those calculations into your Semantic Layer metric definitions as code.

Requirements to use the GraphQL API

  • A dbt Cloud project on dbt v1.6 or higher
  • Metrics are defined and configured
  • A dbt Cloud service token with "Semantic Layer Only” and "Metadata Only" permissions
  • Your dbt project is configured and connected to a data platform

Using the GraphQL API

If you're a dbt user or partner with access to dbt Cloud and the dbt Semantic Layer, you can setup and test this API with data from your own instance by configuring the Semantic Layer and obtaining the right GQL connection parameters described in this document.

Refer to Get started with the dbt Semantic Layer for more info.

Authentication

Authentication uses a dbt Cloud service account tokens passed through a header as follows. To explore the schema, you can enter this information in the "header" section.

{"Authorization": "Bearer <SERVICE TOKEN>"}

Each GQL request also requires a dbt Cloud environmentId. The API uses both the service token in the header and environmentId for authentication.

Metadata calls

Fetch data platform dialect

In some cases in your application, it may be useful to know the dialect or data platform that's internally used for the dbt Semantic Layer connection (such as if you are building where filters from a user interface rather than user-inputted SQL).

The GraphQL API has an easy way to fetch this with the following query:

{
environmentInfo(environmentId: BigInt!) {
dialect
}
}

Fetch available metrics

metrics(environmentId: BigInt!): [Metric!]!

Fetch available dimensions for metrics

dimensions(
environmentId: BigInt!
metrics: [MetricInput!]!
): [Dimension!]!

Fetch available granularities given metrics

Note: This call for queryableGranularities returns only queryable granularities for metric time - the primary time dimension across all metrics selected.

queryableGranularities(
environmentId: BigInt!
metrics: [MetricInput!]!
): [TimeGranularity!]!

You can also get queryable granularities for all other dimensions using the dimensions call:

{
dimensions(environmentId: BigInt!, metrics:[{name:"order_total"}]) {
name
queryableGranularities # --> ["DAY", "WEEK", "MONTH", "QUARTER", "YEAR"]
}
}

You can also optionally access it from the metrics endpoint:

{
metrics(environmentId: BigInt!) {
name
dimensions {
name
queryableGranularities
}
}
}

Fetch measures

{
measures(environmentId: BigInt!, metrics: [{name:"order_total"}]) {
name
aggTimeDimension
}
}

aggTimeDimension tells you the name of the dimension that maps to metric_time for a given measure. You can also query measures from the metrics endpoint, which allows you to see what dimensions map to metric_time for a given metric:

{
metrics(environmentId: BigInt!) {
measures {
name
aggTimeDimension
}
}
}

Fetch available metrics given a set of dimensions

metricsForDimensions(
environmentId: BigInt!
dimensions: [GroupByInput!]!
): [Metric!]!

Metric types

Metric {
name: String!
description: String
type: MetricType!
typeParams: MetricTypeParams!
filter: WhereFilter
dimensions: [Dimension!]!
queryableGranularities: [TimeGranularity!]!
}
MetricType = [SIMPLE, RATIO, CUMULATIVE, DERIVED]

Metric type parameters

MetricTypeParams {
measure: MetricInputMeasure
inputMeasures: [MetricInputMeasure!]!
numerator: MetricInput
denominator: MetricInput
expr: String
window: MetricTimeWindow
grainToDate: TimeGranularity
metrics: [MetricInput!]
}

Dimension types

Dimension {
name: String!
description: String
type: DimensionType!
typeParams: DimensionTypeParams
isPartition: Boolean!
expr: String
queryableGranularities: [TimeGranularity!]!
}
DimensionType = [CATEGORICAL, TIME]

List saved queries

{
savedQueries(environmentId: 200532) {
name
description
label
queryParams {
metrics {
name
}
groupBy {
name
grain
datePart
}
where {
whereSqlTemplate
}
}
}
}

Querying

When querying for data, either a groupBy or a metrics selection is required. The following section provides examples of how to query metrics:

Create dimension values query


mutation createDimensionValuesQuery(
environmentId: BigInt!
metrics: [MetricInput!]
groupBy: [GroupByInput!]!
): CreateDimensionValuesQueryResult!

Create metric query

createQuery(
environmentId: BigInt!
metrics: [MetricInput!]!
groupBy: [GroupByInput!] = null
limit: Int = null
where: [WhereInput!] = null
order: [OrderByInput!] = null
): CreateQueryResult
MetricInput {
name: String!
}

GroupByInput {
name: String!
grain: TimeGranularity = null
}

WhereInput {
sql: String!
}

OrderByinput { # -- pass one and only one of metric or groupBy
metric: MetricInput = null
groupBy: GroupByInput = null
descending: Boolean! = false
}

Fetch query result

query(
environmentId: BigInt!
queryId: String!
): QueryResult!

The GraphQL API uses a polling process for querying since queries can be long-running in some cases. It works by first creating a query with a mutation, `createQuery, which returns a query ID. This ID is then used to continuously check (poll) for the results and status of your query. The typical flow would look as follows:

  1. Kick off a query
mutation {
createQuery(
environmentId: 123456
metrics: [{name: "order_total"}]
groupBy: [{name: "metric_time"}]
) {
queryId # => Returns 'QueryID_12345678'
}
}
  1. Poll for results
{
query(environmentId: 123456, queryId: "QueryID_12345678") {
sql
status
error
totalPages
jsonResult
arrowResult
}
}
  1. Keep querying 2. at an appropriate interval until status is FAILED or SUCCESSFUL

Output format and pagination

Output format

By default, the output is in Arrow format. You can switch to JSON format using the following parameter. However, due to performance limitations, we recommend using the JSON parameter for testing and validation. The JSON received is a base64 encoded string. To access it, you can decode it using a base64 decoder. The JSON is created from pandas, which means you can change it back to a dataframe using pandas.read_json(json, orient="table"). Or you can work with the data directly using json["data"], and find the table schema using json["schema"]["fields"]. Alternatively, you can pass encoded:false to the jsonResult field to get a raw JSON string directly.

{
query(environmentId: BigInt!, queryId: Int!, pageNum: Int! = 1) {
sql
status
error
totalPages
arrowResult
jsonResult(orient: PandasJsonOrient! = TABLE, encoded: Boolean! = true)
}
}

The results default to the table but you can change it to any pandas supported value.

Pagination

By default, we return 1024 rows per page. If your result set exceeds this, you need to increase the page number using the pageNum option.

Run a Python query

The arrowResult in the GraphQL query response is a byte dump, which isn't visually useful. You can convert this byte data into an Arrow table using any Arrow-supported language. Refer to the following Python example explaining how to query and decode the arrow result:

import base64
import pyarrow as pa
import time

headers = {"Authorization":"Bearer <token>"}
query_result_request = """
{
query(environmentId: 70, queryId: "12345678") {
sql
status
error
arrowResult
}
}
"""

while True:
gql_response = requests.post(
"https://semantic-layer.cloud.getdbt.com/api/graphql",
json={"query": query_result_request},
headers=headers,
)
if gql_response.json()["data"]["status"] in ["FAILED", "SUCCESSFUL"]:
break
# Set an appropriate interval between polling requests
time.sleep(1)

"""
gql_response.json() =>
{
"data": {
"query": {
"sql": "SELECT\n ordered_at AS metric_time__day\n , SUM(order_total) AS order_total\nFROM semantic_layer.orders orders_src_1\nGROUP BY\n ordered_at",
"status": "SUCCESSFUL",
"error": null,
"arrowResult": "arrow-byte-data"
}
}
}
"""

def to_arrow_table(byte_string: str) -> pa.Table:
"""Get a raw base64 string and convert to an Arrow Table."""
with pa.ipc.open_stream(base64.b64decode(byte_string)) as reader:
return pa.Table.from_batches(reader, reader.schema)


arrow_table = to_arrow_table(gql_response.json()["data"]["query"]["arrowResult"])

# Perform whatever functionality is available, like convert to a pandas table.
print(arrow_table.to_pandas())
"""
order_total ordered_at
3 2023-08-07
112 2023-08-08
12 2023-08-09
5123 2023-08-10
"""

Additional create query examples

The following section provides query examples for the GraphQL API, such as how to query metrics, dimensions, where filters, and more:

Query metric alias

mutation {
createQuery(
environmentId: "..."
metrics: [{name: "metric_name", alias: "metric_alias"}]
) {
...
}
}

Query two metrics grouped by time

mutation {
createQuery(
environmentId: BigInt!
metrics: [{name: "food_order_amount"}]
groupBy: [{name: "metric_time"}, {name: "customer__customer_type"}]
) {
queryId
}
}

Query with a time grain

mutation {
createQuery(
environmentId: BigInt!
metrics: [{name: "order_total"}]
groupBy: [{name: "metric_time", grain: MONTH}]
) {
queryId
}
}

Note that when using granularity in the query, the output of a time dimension with a time grain applied to it always takes the form of a dimension name appended with a double underscore and the granularity level - {time_dimension_name}__{DAY|WEEK|MONTH|QUARTER|YEAR}. Even if no granularity is specified, it will also always have a granularity appended to it and will default to the lowest available (usually daily for most data sources). It is encouraged to specify a granularity when using time dimensions so that there won't be any unexpected results with the output data.

Query two metrics with a categorical dimension

mutation {
createQuery(
environmentId: BigInt!
metrics: [{name: "food_order_amount"}, {name: "order_gross_profit"}]
groupBy: [{name: "metric_time", grain: MONTH}, {name: "customer__customer_type"}]
) {
queryId
}
}

Query a categorical dimension on its own

mutation {
createQuery(
environmentId: 123456
groupBy: [{name: "customer__customer_type"}]
) {
queryId
}
}

Query with a where filter

The where filter takes a list argument (or a string for a single input). Depending on the object you are filtering, there are a couple of parameters:

  • Dimension() — Used for any categorical or time dimensions. For example, Dimension('metric_time').grain('week') or Dimension('customer__country').

  • Entity() — Used for entities like primary and foreign keys, such as Entity('order_id').

Note: If you prefer a where clause with a more explicit path, you can optionally use TimeDimension() to separate categorical dimensions from time ones. The TimeDimension input takes the time dimension and optionally the granularity level. TimeDimension('metric_time', 'month').

mutation {
createQuery(
environmentId: BigInt!
metrics:[{name: "order_total"}]
groupBy:[{name: "customer__customer_type"}, {name: "metric_time", grain: month}]
where:[{sql: "{{ Dimension('customer__customer_type') }} = 'new'"}, {sql:"{{ Dimension('metric_time').grain('month') }} > '2022-10-01'"}]
) {
queryId
}
}

For both TimeDimension(), the grain is only required in the WHERE filter if the aggregation time dimensions for the measures and metrics associated with the where filter have different grains.

For example, consider this Semantic model and Metric configuration, which contains two metrics that are aggregated across different time grains. This example shows a single semantic model, but the same goes for metrics across more than one semantic model.

semantic_model:
name: my_model_source

defaults:
agg_time_dimension: created_month
measures:
- name: measure_0
agg: sum
- name: measure_1
agg: sum
agg_time_dimension: order_year
dimensions:
- name: created_month
type: time
type_params:
time_granularity: month
- name: order_year
type: time
type_params:
time_granularity: year

metrics:
- name: metric_0
description: A metric with a month grain.
type: simple
type_params:
measure: measure_0
- name: metric_1
description: A metric with a year grain.
type: simple
type_params:
measure: measure_1

Assuming the user is querying metric_0 and metric_1 together, a valid filter would be:

  • "{{ TimeDimension('metric_time', 'year') }} > '2020-01-01'"

Invalid filters would be:

  • "{{ TimeDimension('metric_time') }} > '2020-01-01'" — metrics in the query are defined based on measures with different grains.

  • "{{ TimeDimension('metric_time', 'month') }} > '2020-01-01'"metric_1 is not available at a month grain.

Query with order

mutation {
createQuery(
environmentId: BigInt!
metrics: [{name: "order_total"}]
groupBy: [{name: "metric_time", grain: MONTH}]
orderBy: [{metric: {name: "order_total"}}, {groupBy: {name: "metric_time", grain: MONTH}, descending:true}]
) {
queryId
}
}

Query with limit

mutation {
createQuery(
environmentId: BigInt!
metrics: [{name:"food_order_amount"}, {name: "order_gross_profit"}]
groupBy: [{name:"metric_time", grain: MONTH}, {name: "customer__customer_type"}]
limit: 10
) {
queryId
}
}

Query with just compiling SQL

This takes the same inputs as the createQuery mutation.

mutation {
compileSql(
environmentId: BigInt!
metrics: [{name:"food_order_amount"} {name:"order_gross_profit"}]
groupBy: [{name:"metric_time", grain: MONTH}, {name:"customer__customer_type"}]
) {
sql
}
}

Querying compile SQL with saved queries

This query includes the field savedQuery and generates the SQL based on a predefined saved query,rather than dynamically building it from a list of metrics and groupings. You can use this for frequently used queries.

mutation {
compileSql(
environmentId: 200532
savedQuery: "new_customer_orders" # new field
) {
queryId
sql
}
}
A note on querying saved queries

When querying saved queries,you can use parameters such as where, limit, order, compile, and so on. However, keep in mind that you can't access metric or group_by parameters in this context. This is because they are predetermined and fixed parameters for saved queries, and you can't change them at query time. If you would like to query more metrics or dimensions, you can build the query using the standard format.

Create query with saved queries

This takes the same inputs as the createQuery mutation, but includes the field savedQuery. You can use this for frequently used queries.

mutation {
createQuery(
environmentId: 200532
savedQuery: "new_customer_orders" # new field
) {
queryId
}
}

Multi-hop joins

In cases where you need to query across multiple related tables (multi-hop joins), use the entity_path argument to specify the path between related entities. The following are examples of how you can define these joins:

  • In this example, you're querying the location_name dimension but specifying that it should be joined using the order_id field.
    {{Dimension('location__location_name', entity_path=['order_id'])}}
  • In this example, the salesforce_account_owner dimension is joined to the region field, with the path going through salesforce_account.
    {{ Dimension('salesforce_account_owner__region',['salesforce_account']) }}
0