Parsing PBIX Files with Python (pbixray)

pbixray is an open-source Python library that parses PBIX files directly, without requiring Power BI Desktop or a live Analysis Services connection. You give it the path to a .pbix file, and it returns pandas DataFrames of model metadata and decoded table data. It also accepts .xlsx files that contain a Power Pivot model (stored internally as xl/model/item.data), because Power Pivot uses the same XPress9-compressed Analysis Services backup format under a different wrapper.

This article is the practical counterpart to the storage deep dives on this site. If you want the internals first, read What Is a PBIX File?, The DataModel article, and Inside metadata.sqlitedb. If your goal is “show me the API,” start here.

Install the Library

The library is published on PyPI:

pip install pbixray

The public surface is intentionally small: most work goes through a single class, PBIXRay, its properties, and the get_table method.

Open a PBIX File

Create a PBIXRay object with the path to a PBIX file:

from pbixray import PBIXRay

model = PBIXRay("Sales & Returns Sample v201912.pbix")

Under the hood, that object:

slices and unpacks the embedded DataModel
reads metadata.sqlitedb
uses the metadata to decode .dictionary, .hidx, .idf, and .idfmeta files

The point of the library is that you do not have to think about those steps unless you want to.

Start with Metadata

The fastest way to understand a PBIX file is to inspect metadata first:

print(model.tables)
print(model.schema)
print(model.dax_measures)
print(model.relationships)

That already covers many useful automation scenarios:

inventorying tables and columns
reviewing DAX measures
checking relationship topology
estimating model size and storage-heavy columns (with model.statistics, covered below)
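As one example of the relationship check, the sketch below lists tables that take part in no relationship. It assumes that model.relationships has FromTableName and ToTableName columns and that you convert it to plain rows with the pandas call to_dict("records"); unrelated_tables is a hypothetical helper, and the sample data is made up so the snippet runs without a PBIX file.

```python
def unrelated_tables(table_names, relationship_rows):
    """Return tables that appear on neither side of any relationship."""
    connected = set()
    for row in relationship_rows:
        connected.add(row["FromTableName"])
        connected.add(row["ToTableName"])
    return sorted(name for name in table_names if name not in connected)


# Stand-in data shaped like the assumed pbixray output.
tables = ["Sales", "Product", "Calendar", "Notes"]
relationships = [
    {"FromTableName": "Sales", "ToTableName": "Product"},
    {"FromTableName": "Sales", "ToTableName": "Calendar"},
]

print(unrelated_tables(tables, relationships))  # ['Notes']
```

Against a real model, the call would look like unrelated_tables(model.tables, model.relationships.to_dict("records")), provided the column names match the assumption above.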

If you want DMV-like detail rather than convenience views, pbixray also exposes TMSCHEMA equivalents directly from the embedded SQLite metadata:

print(model.tmschema_tables[["Name", "Description"]])
print(model.tmschema_columns[["TableName", "Name", "DataType", "IsHidden"]])
print(model.tmschema_partitions)

That mapping is described in more detail in Inside metadata.sqlitedb: Tables, Columns, Measures & Relationships.

Extract Imported Table Data

To reconstruct the contents of an imported table, call get_table:

sales = model.get_table("Sales")
print(sales.head())

That method is where the lower-level format work pays off. Internally, pbixray:

locates the storage metadata for each column
reads the companion .idfmeta file
decodes the .idf payload using Hybrid RLE and bit packing rules
maps internal IDs through a dictionary or value-encoding path
casts the results to a sensible pandas dtype
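The decoding steps above can be sketched with a toy example. It is deliberately not the real .idf byte layout: the chunk tuples, the 64-bit word packing order, and the helper names unpack_bits and decode_column are all assumptions made for illustration. It only shows the general idea of mixing run-length runs with bit-packed IDs and then resolving those IDs through a dictionary.

```python
def unpack_bits(words, bit_width, count):
    """Extract `count` unsigned integers of `bit_width` bits from 64-bit words.

    In this toy layout, values are stored low bits first and never
    straddle a word boundary.
    """
    per_word = 64 // bit_width
    mask = (1 << bit_width) - 1
    values = []
    for word in words:
        for slot in range(per_word):
            if len(values) == count:
                return values
            values.append((word >> (slot * bit_width)) & mask)
    return values


def decode_column(chunks, dictionary):
    """Expand RLE and bit-packed chunks to IDs, then map IDs to values."""
    ids = []
    for chunk in chunks:
        if chunk[0] == "rle":
            _, value_id, repeat = chunk
            ids.extend([value_id] * repeat)
        else:  # "packed"
            _, words, bit_width, count = chunk
            ids.extend(unpack_bits(words, bit_width, count))
    return [dictionary[i] for i in ids]


dictionary = ["Bike", "Helmet", "Lock"]
chunks = [
    ("rle", 0, 3),                  # ID 0 repeated three times
    ("packed", [0b100110], 2, 3),   # IDs 2, 1, 2 at two bits each
]
decoded = decode_column(chunks, dictionary)
print(decoded)  # ['Bike', 'Bike', 'Bike', 'Lock', 'Helmet', 'Lock']
```

The run-length chunk is cheap for long repeats, while the packed chunk spends only as many bits per row as the ID range needs, which is why a format would combine the two.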

Those lower layers are covered in VertiPaq Dictionaries and Hash Indexes and Reconstructing Column Data from .idf and .idfmeta.

Useful High-Level Properties

Some of the most practical entry points in the current API are:

model.tables
model.schema
model.metadata
model.power_query
model.m_parameters
model.dax_tables
model.dax_measures
model.dax_columns
model.relationships
model.rls
model.statistics

Here is a compact example:

from pbixray import PBIXRay

model = PBIXRay("report.pbix")

print("Tables:", list(model.tables))
print(model.statistics.sort_values("DataSize", ascending=False).head(10))
print(model.dax_measures[["TableName", "Name", "Expression"]].head())
print(model.rls)

The statistics frame is especially helpful when you want to understand model shape quickly because it includes cardinality and per-column file-size information derived from the storage metadata.
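To roll those per-column figures up to table level, a small aggregation is enough. This sketch assumes the statistics frame has a TableName column next to DataSize and that you pass its rows as dictionaries, for example via the pandas call to_dict("records"); data_size_by_table is a hypothetical helper and the numbers are invented.

```python
from collections import defaultdict


def data_size_by_table(stat_rows):
    """Sum DataSize per table and return (table, total) pairs, largest first."""
    totals = defaultdict(int)
    for row in stat_rows:
        totals[row["TableName"]] += row["DataSize"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Stand-in rows shaped like the assumed statistics output.
sample_stats = [
    {"TableName": "Sales", "ColumnName": "Amount", "DataSize": 4096},
    {"TableName": "Sales", "ColumnName": "ProductID", "DataSize": 2048},
    {"TableName": "Product", "ColumnName": "Name", "DataSize": 512},
]

print(data_size_by_table(sample_stats))  # [('Sales', 6144), ('Product', 512)]
```

If you prefer to stay in pandas, a groupby on the same two columns gives the equivalent result.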

Why pbixray Is Useful

The biggest advantage is that it reads the file directly. That makes it useful in environments where Power BI Desktop is inconvenient or unavailable:

Linux or macOS analysis environments
CI pipelines
bulk audits of PBIX files
metadata cataloging jobs
reverse-engineering and validation work
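A bulk audit along those lines might look like the following sketch. The function names, the report fields, and the decision to record errors instead of stopping are illustrative choices, not pbixray features. The loader is injected so the demonstration at the bottom can run with a stand-in model and empty placeholder files; the default loader uses only the PBIXRay constructor shown earlier.

```python
import tempfile
from pathlib import Path


def load_with_pbixray(path):
    """Default loader; imported lazily so this module loads without pbixray."""
    from pbixray import PBIXRay
    return PBIXRay(str(path))


def audit_folder(folder, open_model=load_with_pbixray):
    """Collect table and measure counts for every .pbix file under `folder`."""
    report = []
    for path in sorted(Path(folder).rglob("*.pbix")):
        entry = {"file": path.name, "tables": None, "measures": None, "error": None}
        try:
            model = open_model(path)
            entry["tables"] = len(model.tables)
            entry["measures"] = len(model.dax_measures)
        except Exception as exc:  # keep auditing even if one file fails
            entry["error"] = str(exc)
        report.append(entry)
    return report


# Demonstration with a stand-in loader, so no real PBIX files are needed.
class _FakeModel:
    tables = ["Sales", "Product"]
    dax_measures = ["Total Sales"]


def _fake_loader(path):
    if path.name == "broken.pbix":
        raise ValueError("not a valid PBIX")
    return _FakeModel()


with tempfile.TemporaryDirectory() as folder:
    for name in ("good.pbix", "broken.pbix", "notes.txt"):
        (Path(folder) / name).write_bytes(b"")
    demo_report = audit_folder(folder, open_model=_fake_loader)

for row in demo_report:
    print(row)
```

In a CI job you would call audit_folder with the default loader and fail the build when any row carries an error or an unexpected count.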

That direct-file approach is also why the format details documented across this series of articles matter so much. A library like pbixray has to answer concrete questions about the file layout itself rather than delegating them to a running engine.

Where to Go Next

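If you want more implementation detail, the next stops are the storage deep dives referenced above, from What Is a PBIX File? through Reconstructing Column Data from .idf and .idfmeta.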

If you just want the code, head to the pbixray repository.