What Is a PBIX File?

PBIX (.pbix) is the native file format of Microsoft Power BI Desktop. At first glance a PBIX file looks like a binary blob, but it is really a package that combines the report definition, queries, semantic metadata, and imported table data into a single deliverable.

Understanding that package boundary is useful. Understanding what sits behind it is much more useful. The hard part of PBIX is not that it is zipped. The hard part is that one of the ZIP members, DataModel, contains an embedded Analysis Services Tabular database with VertiPaq storage structures underneath it.

This article is the hub for the launch batch on pbixray.com. It sets up the terminology and the map. The rest of the articles drill into the parts of the format that matter when you are building tooling rather than authoring reports.

The open-source Python library pbixray was built specifically to make PBIX parsing accessible without requiring Power BI Desktop or a live Analysis Services connection.

Where This Fits

If you want the short version, stay here. If you want the package boundary next, continue with Inside the PBIX ZIP Archive. If you care most about the storage engine, jump to The DataModel: Power BI’s Embedded Analysis Services Engine.

A PBIX File Is a Package, Not a Monolith

At the outer layer, a PBIX file behaves like a ZIP archive with a recognizable set of top-level members:

DataModel
Mashup
Report/Layout
Report/StaticResources
SecurityBindings
Connections
[Content_Types].xml
Version

Those entries already tell you a lot about the product model:

Report/Layout belongs to the report canvas and visuals.
Mashup belongs to Power Query.
Connections belongs to external data sources and model bindings.
DataModel belongs to the embedded Analysis Services engine.

For developers, DataModel is where most of the real complexity lives.
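
The package view can be checked with nothing more than Python's standard zipfile module. The sketch below is illustrative only: describe_members and MEMBER_ROLES are names made up for this example, not part of pbixray, and the role table simply restates the mapping listed above.

```python
import zipfile

# Roles as described in this article. Members not listed here are
# reported as "other" rather than guessed at.
MEMBER_ROLES = {
    "DataModel": "embedded Analysis Services engine",
    "Mashup": "Power Query",
    "Report/Layout": "report canvas and visuals",
    "Connections": "external data sources and model bindings",
}


def describe_members(pbix):
    """Return (name, role, compressed_size, file_size) per ZIP member.

    `pbix` may be a filesystem path or a binary file-like object.
    """
    rows = []
    with zipfile.ZipFile(pbix) as archive:
        for info in archive.infolist():
            role = MEMBER_ROLES.get(info.filename, "other")
            rows.append((info.filename, role, info.compress_size, info.file_size))
    return rows


# Example use (the path is a placeholder):
# for name, role, packed, size in describe_members("report.pbix"):
#     print(f"{name:<28} {role:<42} {packed:>12} {size:>12}")
```

On a report with imported data, comparing the sizes in that output is a quick way to see how much of the package DataModel accounts for.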

Why `DataModel` Is the Center of Gravity

In the PBIX files I work with, the storage path looks like this:

the outer PBIX file is a ZIP archive
the DataModel member is stored as a raw ZIP entry
that entry contains an XPress9-compressed Analysis Services backup stream
inside the backup is a VertiPaq filesystem layout
individual columns are reconstructed from metadata, dictionaries, hash indexes, and compressed segment payloads

That is why “just unzip the PBIX” is only the beginning of the story. It gets you to the doorstep, not into the data.
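
To see where the doorstep is, the first two steps of that path can be sketched with the standard library alone. The helper name peek_datamodel is hypothetical. It reports how the ZIP layer stores the DataModel member and returns its leading bytes, and it makes no attempt to decode the XPress9 stream.

```python
import zipfile

# Human-readable labels for the two ZIP methods most likely to appear.
COMPRESSION_NAMES = {
    zipfile.ZIP_STORED: "stored (no ZIP-level compression)",
    zipfile.ZIP_DEFLATED: "deflated",
}


def peek_datamodel(pbix, length=64):
    """Return (zip_method_label, first_bytes) for the DataModel member.

    `pbix` may be a filesystem path or a binary file-like object.
    Raises KeyError if the archive has no DataModel member.
    """
    with zipfile.ZipFile(pbix) as archive:
        info = archive.getinfo("DataModel")
        with archive.open(info) as stream:
            head = stream.read(length)
    label = COMPRESSION_NAMES.get(info.compress_type, f"method {info.compress_type}")
    return label, head


# Example use (the path is a placeholder):
# method, head = peek_datamodel("report.pbix")
# print(method)
# print(head)
```

Whatever those bytes turn out to be, they are still the compressed backup stream. Getting from there to tables and columns requires the XPress9, ABF, and VertiPaq layers covered in the later articles.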

The articles in this batch build directly on the research trail I described in Lessons Learned from Unpacking VertiPaq: A Developer’s Journey, but with the emphasis shifted from the story of discovery to the structures that are now documented in code and notes.

The Seven Articles in This Launch

This batch is intentionally storage-focused:

What Is a PBIX File? explains the package and the terminology.
Inside the PBIX ZIP Archive maps the container layer and the route to DataModel.
The DataModel: Power BI’s Embedded Analysis Services Engine follows the XPress9 and ABF layers into the VertiPaq workspace.
Inside metadata.sqlitedb: Tables, Columns, Measures & Relationships shows how the semantic model is exposed through the embedded SQLite database.
VertiPaq Dictionaries and Hash Indexes explains how encoded IDs become readable values.
Reconstructing Column Data from .idf and .idfmeta covers the compressed column segments themselves.
Parsing PBIX Files with Python (pbixray) ties the pieces together into a practical extraction workflow.

Why Reverse Engineer PBIX at All?

There are at least four good reasons:

you want metadata extraction without launching Desktop
you want table reconstruction in automated workflows
you want to inspect model size, cardinality, and storage shape directly from files
you want a precise mental model of how VertiPaq persists imported data

That last reason has been a recurring theme in my work for a while. Years ago I wrote about Power BI limits from the outside in, asking how much data could fit in Power BI Desktop. These articles approach the same world from the inside out.

From Research Notes to Working Code

The articles here are grounded in three complementary sources:

published Microsoft specifications where they exist
reverse-engineered Kaitai Struct schemas and storage notes
the implementation work in pbixray

That combination matters because no single source is sufficient on its own. Specs rarely tell the whole implementation story. Reverse-engineering notes need validation. Library code needs a conceptual model to remain maintainable. When all three line up, the format becomes much easier to reason about.

Read Inside the PBIX ZIP Archive for the package layer.
Read The DataModel: Power BI’s Embedded Analysis Services Engine for the inner storage container.
Read Parsing PBIX Files with Python (pbixray) if your goal is extraction rather than format study.