Derive data with Data Distiller
Learn how data engineers can use Query Service to transform data and output new datasets. Run these queries on a schedule to power automated dashboards and segmentation. For more information, please visit Generate output datasets from query results.
Let's look at how data engineers can use Query Service with the Data Distiller add-on to enrich and automate datasets. Deriving data with Data Distiller allows you to create complex, programmatic rules to enrich datasets with new calculated fields and other derived outputs. These enriched datasets can then be used to enrich customer profiles, segment customers, analyze data, and execute other advanced use cases. Keep in mind that transformation requires the Data Distiller add-on, as it involves writing data back to a dataset. This is a key distinction from ad-hoc queries, which only read data.
Let's walk through how you, as a data engineer, can use Query Service to identify the top 10% best-selling products over the last 30 days. Creating a derived dataset allows you to reuse the results across tools and workflows. It can also be surfaced in external reporting dashboards such as Microsoft Power BI, enabling users to track different metrics without re-running complex queries. This is especially useful when the query is scheduled to refresh regularly, ensuring that external tools always reflect the latest insights.
First, query the purchase events demo dataset and calculate total revenue for each product over the past 30 days. Then, divide the ranked results into deciles, 10 equal groups, and filter for the top decile, or top 10% of products. When you run the query, the results populate the top 10% products table but do not appear in the results tab. You can validate the results using a simple select query in the query editor. Since deciles adjust dynamically as new data arrives, they provide adaptive insights, making them useful for tracking high-performing products, customers, or other business metrics.
Once the ad hoc query has been validated, the next step is to transform it into a scheduled query. Save your query as a template. Only saved templates can be scheduled. Name your query, for example, top 10% products. Then navigate to the templates tab, select it, and click add schedule.
In the scheduling interface, define the frequency, for example, daily or weekly. Set the start and end dates and choose the output dataset. You can either create a new dataset or append the results to an existing one using the UI. If creating a new dataset, an ad hoc schema will automatically be created with the necessary fields. This will be the target for your scheduled query results. This process turns a one-time query into a reliable, automated workflow, ensuring your dataset is continuously refreshed with the latest insights.
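To make the decile step described above concrete, here is a minimal Python sketch of the same logic: sum revenue per product over a 30-day window, rank the products, split the ranking into 10 buckets, and keep bucket 1. It is not the query shown in the video. The function names, the tuple layout of the sample events, and the inclusive 30-day cutoff are all assumptions made for illustration, and the bucketing follows the usual behavior of a SQL `NTILE(10)` window function, where earlier buckets absorb any remainder rows.

```python
from collections import defaultdict
from datetime import date, timedelta


def revenue_by_product(events, today, window_days=30):
    """Sum revenue per product for events inside the window (cutoff inclusive, an assumption)."""
    cutoff = today - timedelta(days=window_days)
    totals = defaultdict(float)
    for product_id, purchase_date, revenue in events:
        if purchase_date >= cutoff:
            totals[product_id] += revenue
    return dict(totals)


def ntile(ranked, buckets=10):
    """Assign each item of an ordered list to a bucket 1..buckets.

    Mirrors the common NTILE rule: when the rows do not divide evenly,
    the first (len % buckets) buckets each receive one extra row.
    """
    base, extra = divmod(len(ranked), buckets)
    assigned = []
    start = 0
    for bucket in range(1, buckets + 1):
        size = base + (1 if bucket <= extra else 0)
        for item in ranked[start:start + size]:
            assigned.append((item, bucket))
        start += size
    return assigned


def top_decile(events, today):
    """Return the product IDs in the top revenue decile, highest revenue first."""
    totals = revenue_by_product(events, today)
    # Rank by revenue descending; break ties by product ID for a stable order.
    ranked = sorted(totals, key=lambda p: (-totals[p], p))
    return [product for product, bucket in ntile(ranked) if bucket == 1]


# Hypothetical sample data: 20 products, each with one recent purchase.
today = date(2024, 6, 30)
events = [(f"P{i:02d}", today - timedelta(days=1), i * 10) for i in range(1, 21)]
# A large purchase 60 days ago falls outside the window and is ignored.
events.append(("P01", today - timedelta(days=60), 10_000))

print(top_decile(events, today))  # prints ['P20', 'P19']
```

Because the window is computed relative to `today`, re-running the same logic on a later date can move products in or out of the top decile, which is the reason the walkthrough pairs this query with a schedule.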
You can view the schedule from the query editor, the template row, or the scheduled queries tab. Remember, you can also use this approach to enrich profile data, segment profiles into audiences, and support other use cases that require transformed data. Thanks for watching.