What Is a PBIX File?

A .pbix file is the native file format used by Microsoft Power BI Desktop. At first glance it looks like a binary blob, but it is really a package that combines a report definition, queries, semantic metadata, and imported table data into a single deliverable.

Understanding that package boundary is useful. Understanding what sits behind it is much more useful. The hard part of PBIX is not that it is zipped. The hard part is that one of the ZIP members, DataModel, contains an embedded Analysis Services Tabular database with VertiPaq storage structures underneath it.

This article is the hub for the launch batch on pbixray.com. It sets up the terminology and the map. The rest of the articles drill into the parts of the format that matter when you are building tooling rather than authoring reports.

The open-source Python library pbixray was built specifically to make PBIX parsing accessible without requiring Power BI Desktop or a live Analysis Services connection.

Where This Fits

If you want the short version, stay here. If you want the package boundary next, continue with Inside the PBIX ZIP Archive. If you care most about the storage engine, jump to The DataModel: Power BI’s Embedded Analysis Services Engine.

A PBIX File Is a Package, Not a Monolith

At the outer layer, a PBIX file behaves like a ZIP archive with a recognizable set of top-level members:

DataModel
Mashup
Report/Layout
Report/StaticResources
SecurityBindings
Connections
[Content_Types].xml
Version

Those entries already tell you a lot about the product model:

For developers, DataModel is where most of the real complexity lives.
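
To see the package boundary for yourself, the member list can be read with nothing more than Python's standard zipfile module. This is a minimal sketch, not part of pbixray: the function name list_pbix_members is a hypothetical helper introduced here for illustration.

```python
import zipfile


def list_pbix_members(pbix):
    """Return {member name: (uncompressed size, is_stored)} for a PBIX package.

    `pbix` may be a filesystem path or a binary file-like object.
    `is_stored` is True when the ZIP entry uses compression method 0.
    """
    with zipfile.ZipFile(pbix) as archive:
        return {
            info.filename: (
                info.file_size,
                info.compress_type == zipfile.ZIP_STORED,
            )
            for info in archive.infolist()
        }
```

Calling list_pbix_members("report.pbix") (an assumed path) should return one entry per member; which members appear depends on the file in front of you.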

Why DataModel Is the Center of Gravity

In the PBIX files I work with, the storage path looks like this:

  1. the outer PBIX file is a ZIP archive
  2. the DataModel member is stored as a raw ZIP entry
  3. that entry contains an XPress9-compressed Analysis Services backup stream
  4. inside the backup is a VertiPaq filesystem layout
  5. individual columns are reconstructed from metadata, dictionaries, hash indexes, and compressed segment payloads
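
Steps 1 and 2 are the only ones the standard library can do for you. The sketch below pulls the DataModel bytes out of the archive and stops there. It assumes that "raw ZIP entry" means ZIP compression method 0 (stored), and read_datamodel is a hypothetical helper name, not a pbixray function.

```python
import zipfile


def read_datamodel(pbix):
    """Return the raw bytes of the DataModel member (steps 1 and 2 only).

    Raises KeyError if the package has no DataModel member. The returned
    bytes are still XPress9-compressed; nothing here decodes them.
    """
    with zipfile.ZipFile(pbix) as archive:
        info = archive.getinfo("DataModel")  # KeyError if the member is missing
        # Assumption: "raw ZIP entry" means compression method 0 (stored),
        # so the ZIP layer adds no compression of its own.
        if info.compress_type != zipfile.ZIP_STORED:
            raise ValueError("DataModel is not a stored (uncompressed) ZIP entry")
        return archive.read(info)
```

Everything from step 3 onward needs an XPress9 decoder and knowledge of the VertiPaq layout, neither of which zipfile provides.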

That is why “just unzip the PBIX” is only the beginning of the story. It gets you to the doorstep, not into the data.

The articles in this batch build directly on the research trail I described in Lessons Learned from Unpacking VertiPaq: A Developer’s Journey, but with the emphasis shifted from the story of discovery to the structures that are now documented in code and notes.

The Seven Articles in This Launch

This batch is intentionally storage-focused:

Why Reverse Engineer PBIX at All?

There are at least four good reasons:

That last reason has been a recurring theme in my work for a while. Years ago I wrote about Power BI limits from the outside in, asking how much data could fit in Power BI Desktop. These articles approach the same world from the inside out.

From Research Notes to Working Code

The articles here are grounded in three complementary sources:

That combination matters because no single source is sufficient on its own. Specs rarely tell the whole implementation story. Reverse-engineering notes need validation. Library code needs a conceptual model to remain maintainable. When all three line up, the format becomes much easier to reason about.

