Deneb & Vega - 06: Shape the Data

02 Jul 2025

Introduction

In the previous post, we looked at how to get data into Vega. Once the data is in, the next step is to shape it according to your visualization needs. That’s where data transformation comes in.

In Vega, we can perform transformations right inside the data block of each dataset definition. Vega offers a wide range of built-in transformations, but in this post, we’ll focus on the most useful and commonly used ones. If you’re curious, you can explore the full list of available transformations here.

Transformations

Transformations come in many forms. You might want to add new columns, remove existing ones, filter rows, calculate statistics, perform lookups, or aggregate data—the list goes on.

All of these actions happen inside the transform property of a dataset. This property takes an array, so you can apply multiple transformations in sequence. Each transformation builds on the result of the previous one.

Let’s walk through some of the most useful transformations with examples and explanations.

1. Calculated Columns

If you’re coming from a Power BI background, the term calculated column should feel familiar.

In Vega, we can create new columns (or overwrite existing ones) using formulas. To do this, we use the formula transform with these properties:

type: For calculated columns, this should be formula.
as: The name of the new column. If the name already exists, it will be overwritten.
expr: The formula to be applied to each row.

Here, we need to understand the concept of datum in Vega. It represents the current row of the dataset (row context, Power BI folks!). You can access a field in the row using datum.ColumnName or datum['Column Name'].

Vega supports a wide range of expressions. You can check them out here.

{
  "data": [
    {
      "name": "exampleFormula",
      "values": [
        {"a": 9, "b": 7, "c": 5},
        {"a": 19, "b": 27, "c": 35},
        {"a": 99, "b": null, "c": 59}
      ],
      "transform": [
        {
          "type": "formula",
          "as": "Simple Addition",
          "expr": "datum.a + datum.c"
        },
        {
          "type": "formula",
          "as": "Logical Calc",
          "expr": "if( isNumber( datum.b ), datum['Simple Addition'] - datum.b, NaN )"
        }
      ]
    }
  ]
}

2. Aggregating

The aggregate transform is used to group and summarize data. You can perform operations like count, sum, average, min, max, variance, distinct, and more (full list here).

Key properties:

type: Must be aggregate.
ops: An array of aggregation operations to apply.
fields: An array of columns to aggregate—must match the size of ops.
as: Names of the new aggregated fields (also matches the size of ops and fields). If omitted, Vega auto-generates names like sum_sales, avg_rating, etc.
groupby: Optional. If specified, the aggregation will be grouped by the given fields.

{
  "data": [
    {
      "name": "exampleFormula",
      "values": [
        {"category": "SSC", "a": 9, "b": 7, "c": 5},
        {"category": "SSC", "a": 19, "b": 27, "c": 35},
        {"category": "HSC", "a": 99, "b": 79, "c": 59}
      ],
      "transform": [
        {
          "type": "aggregate",
          "ops": ["average", "min", "variance", "count"],
          "fields": ["a", "b", "c", "c"],
          "groupby": ["category"]
        }
      ]
    }
  ]
}

There are a few more optional properties for advanced behavior, but we can skip those for now.

3. Filtering

Filtering lets you include or exclude rows based on a condition. In Vega, you use the filter transform, which is super straightforward:

type: filter
expr: A condition that evaluates to true or false for each row and this transform only returns the rows where the expression evaluates to true.

Once again, datum is used to refer to each row while writing the expression.

{
  "data": [
    {
      "name": "exampleFormula",
      "values": [
        {"category": "SSC", "a": 9, "b": 7, "c": 5},
        {"category": "SSC", "a": 19, "b": 27, "c": 35},
        {"category": "HSC", "a": 99, "b": 79, "c": 59}
      ],
      "transform": [
        {
          "type": "filter",
          "expr": "datum.category == 'SSC' && datum.c > 30"
        }
      ]
    }
  ]
}

4. Lookups

If you’re familiar with Excel or Power BI, you know how powerful lookups can be.

In Vega, you can enrich a dataset by pulling values from another dataset using the lookup transform. This works just like LOOKUP functions Excel’s lookup functions (like VLOOKUP, HLOOKUP, INDEX-MATCH, and XLOOKUP) are used to find specific data within tables and retrieve related information. in Excel or the RELATED In Power BI’s DAX, RELATED is a function used in calculated columns to retrieve a value from a related table, automatically following an existing one-to-many relationship. function in Power BI.

Important properties:

type: lookup
from: The name of the dataset you’re pulling data from.
fields: Columns from the current dataset to match against.
key: Column in the second dataset to match with.
values: Columns you want to bring in from the second dataset. If omitted, the full matching row is returned as an object.
as: Names of the new columns being added. Required if fields has multiple values or values is omitted.

{
  "data": [
    {
      "name": "secondary",
      "values": [
        {"fullName": "Alice", "age": 19},
        {"fullName": "Bob", "age": 23},
        {"fullName": "Evans", "age": 39}
      ]
    },
    {
      "name": "primary",
      "values": [
        {"firstName": "Alice", "lastName": "Smith"},
        {"firstName": "Bob", "lastName": "Jones"},
        {"firstName": "Carol", "lastName": "Evans"}
      ],
      "transform": [
        {
          "type": "lookup",
          "from": "secondary",
          "key": "fullName",
          "fields": ["firstName", "lastName"],
          "values": ["age"],
          "as": ["age_1", "age_2"]
        }
      ]
    }
  ]
}

This code looks up the firstName from the primary table in the fullName column of the secondary table and returns the age column as age_1. Then it looks up the lastNames from the primary table in the fullName column of the secondary table and returns the age column as age_2.

⚠️ The dataset you’re looking from must be defined before the one doing the lookup, otherwise it won’t work.

5. Sorting

Sorting a dataset is another essential transformation. In Vega, we use the collect transform for this. It supports sorting by one or more columns in ascending or descending order.

type: collect
sort: An object with two keys:
- field: An array of column names to sort by.
- order: An array specifying the sort order for each field—either ascending or descending.

{
  "data": [
    {
      "name": "exampleTable",
      "values": [
        {"fullName": "Alice", "age": 19},
        {"fullName": "Bob", "age": 23},
        {"fullName": "Evans", "age": 39}
      ],
      "transform": [
        {
          "type": "collect",
          "sort": {
            "field": ["age"],
            "order": ["descending"]
          }
        }
      ]
    }
  ]
}

These are some of the most useful transformations you’ll find yourself using regularly. Vega supports many others like flatten, window, stack, fold, and so on. You can explore the full list here. And if there’s a specific one you’d like explained with examples, just let me know—I’d be happy to help!

Conclusion

Once your data is shaped and ready, it’s time to move on to the next step—scaling. In the next post, we’ll explore how to tell Vega where your data should go and how it should be mapped visually. Stay tuned!