Parsing PBIX Files with Python (pbixray)

pbixray is an open-source Python library that parses PBIX files directly, without requiring Power BI Desktop or a live Analysis Services connection. You give it the path to a .pbix file, and it returns DataFrames describing the model along with the decoded table data. It also accepts .xlsx files that contain a Power Pivot model (stored internally as xl/model/item.data), because Power Pivot uses the same XPress9-compressed Analysis Services backup format under a different wrapper.

This article is the practical counterpart to the storage deep dives on this site. If you want the internals first, read What Is a PBIX File?, The DataModel article, and Inside metadata.sqlitedb. If your goal is “show me the API,” start here.

Install the Library

The library is published on PyPI:

pip install pbixray

The public surface is intentionally small.

Open a PBIX File

Create a PBIXRay object with the path to a PBIX file:

from pbixray import PBIXRay

model = PBIXRay("Sales & Returns Sample v201912.pbix")

Under the hood, that object:

The point of the library is that you do not have to think about those steps unless you want to.
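If you do want to look beneath the API, the outermost layer is easy to inspect with the standard library alone, because both .pbix and .xlsx files are ZIP archives. The sketch below only checks which archive entry holds the model. The entry name "DataModel" for .pbix files and the helper name find_model_entry are assumptions made for this illustration; xl/model/item.data is the Power Pivot location mentioned earlier. It does not decompress or parse anything, and it is not how pbixray itself is implemented.

```python
import zipfile

# Candidate entry names. "DataModel" is assumed for .pbix files;
# "xl/model/item.data" is the Power Pivot location inside .xlsx files.
MODEL_ENTRIES = ("DataModel", "xl/model/item.data")


def find_model_entry(path_or_file):
    """Return the name of the archive entry that holds the model, or None."""
    with zipfile.ZipFile(path_or_file) as archive:
        names = set(archive.namelist())
    for entry in MODEL_ENTRIES:
        if entry in names:
            return entry
    # No embedded model found, e.g. a report that only connects to a remote dataset.
    return None
```

A None result is a cheap way to skip files that have no embedded model before handing them to the library.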

Start with Metadata

The fastest way to understand a PBIX file is to inspect metadata first:

print(model.tables)
print(model.schema)
print(model.dax_measures)
print(model.relationships)

That already covers many useful automation scenarios:

If you want DMV-like detail rather than convenience views, pbixray also exposes TMSCHEMA equivalents directly from the embedded SQLite metadata:

print(model.tmschema_tables[["Name", "Description"]])
print(model.tmschema_columns[["TableName", "Name", "DataType", "IsHidden"]])
print(model.tmschema_partitions)

That mapping is described in more detail in Inside metadata.sqlitedb: Tables, Columns, Measures & Relationships.
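As one example of the kind of automation these metadata views allow, the following sketch lists the measures whose DAX expression mentions a given piece of text. It works on plain dictionaries, such as those produced by the pandas call model.dax_measures.to_dict("records"), and relies on the TableName, Name, and Expression columns used elsewhere in this article. The function name measures_referencing is hypothetical, and a substring match is only a rough heuristic, not a DAX parser.

```python
def measures_referencing(measure_records, needle):
    """Return (table, measure) pairs whose expression contains `needle`.

    `measure_records` is an iterable of dicts with the keys
    "TableName", "Name" and "Expression".
    """
    needle = needle.lower()
    hits = []
    for record in measure_records:
        expression = record.get("Expression")
        # Skip missing expressions (None or NaN coming out of pandas).
        if not isinstance(expression, str):
            continue
        # Case-insensitive substring match; a heuristic, not a parser.
        if needle in expression.lower():
            hits.append((record["TableName"], record["Name"]))
    return hits
```

With a loaded model, a call might look like measures_referencing(model.dax_measures.to_dict("records"), "Sales[Amount]"), where Sales[Amount] is a made-up column reference.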

Extract Imported Table Data

To reconstruct the contents of an imported table, call get_table:

sales = model.get_table("Sales")
print(sales.head())
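The same call can be looped over every table for a quick inventory. This is a minimal sketch that assumes model.tables can be iterated to yield table names and that get_table returns something len() can measure, which holds for a pandas DataFrame. The helper name table_row_counts is hypothetical.

```python
def table_row_counts(model):
    """Map each table name to the number of rows get_table returns for it."""
    counts = {}
    for name in model.tables:
        # Each call decodes the full table, so large models can take a while.
        counts[name] = len(model.get_table(name))
    return counts
```

Calling table_row_counts(PBIXRay("report.pbix")) would then give a dictionary of table names to row counts.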

That method is where the lower-level format work pays off. Internally, pbixray:

  1. locates the storage metadata for each column
  2. reads the companion .idfmeta file
  3. decodes the .idf payload using Hybrid RLE and bit packing rules
  4. maps internal IDs through a dictionary or value-encoding path
  5. casts the results to a sensible pandas dtype
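To make steps 3 and 4 more concrete, here is a deliberately simplified toy decoder. It is not the on-disk .idf layout and not pbixray's code: the segment tuples, the 64-bit word size, and the least-significant-bits-first packing order are all assumptions chosen for readability. It only illustrates the general idea of mixing run-length runs with bit-packed IDs and then mapping those IDs through a dictionary.

```python
def unpack_bits(words, bit_width, count):
    """Unpack `count` unsigned ints of `bit_width` bits from 64-bit words.

    Values are assumed to sit least-significant-bits first and never
    to straddle a word boundary.
    """
    per_word = 64 // bit_width
    mask = (1 << bit_width) - 1
    values = []
    for word in words:
        for slot in range(per_word):
            if len(values) == count:
                return values
            values.append((word >> (slot * bit_width)) & mask)
    return values


def expand_segments(segments):
    """Turn toy segments into a flat list of integer IDs.

    ("run", value, length)              -> `value` repeated `length` times
    ("packed", words, bit_width, count) -> bit-packed IDs
    """
    ids = []
    for segment in segments:
        if segment[0] == "run":
            _, value, length = segment
            ids.extend([value] * length)
        else:
            _, words, bit_width, count = segment
            ids.extend(unpack_bits(words, bit_width, count))
    return ids


def decode_column(segments, dictionary):
    """Map the expanded IDs through a dictionary (here a plain list)."""
    return [dictionary[i] for i in expand_segments(segments)]
```

The hybrid design pays off on real columns: long stretches of a repeated value collapse into a single run, while high-variation stretches fall back to fixed-width packing.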

Those lower layers are covered in VertiPaq Dictionaries and Hash Indexes and Reconstructing Column Data from .idf and .idfmeta.

Useful High-Level Properties

Several of the most practical entry points in the current API, including tables, statistics, dax_measures, and rls, appear in this compact example:

from pbixray import PBIXRay

model = PBIXRay("report.pbix")

print("Tables:", list(model.tables))
print(model.statistics.sort_values("DataSize", ascending=False).head(10))
print(model.dax_measures[["TableName", "Name", "Expression"]].head())
print(model.rls)

The statistics frame is especially helpful when you want to understand a model's shape quickly, because it includes cardinality and per-column file-size information derived from the storage metadata.
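Per-column sizes can also be rolled up to find the heaviest tables. The sketch below sums DataSize per table over plain dictionaries, for instance those from model.statistics.to_dict("records"). DataSize is the column used in the example above; the TableName key and the helper name size_by_table are assumptions for this illustration, so check the frame's columns before relying on them.

```python
from collections import defaultdict


def size_by_table(stat_records):
    """Sum DataSize per table and return (table, total) pairs, largest first.

    Assumes each record has "TableName" and "DataSize" keys.
    """
    totals = defaultdict(int)
    for record in stat_records:
        totals[record["TableName"]] += record["DataSize"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

The first few entries of the result are usually the best place to start when trimming a model.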

Why pbixray Is Useful

The biggest advantage is that it reads the file directly. That makes it useful in environments where Power BI Desktop is inconvenient or unavailable:

That direct-file approach is also why the format details documented across this series of articles matter so much. A library like pbixray has to answer concrete questions about the file layout itself rather than delegating them to a running engine.

Where to Go Next

If you want more implementation detail, the next stops are:

If you just want the code, head to the pbixray repository.

