pbixray is an open-source Python library that parses PBIX files directly, without requiring Power BI Desktop or a live Analysis Services connection. You give it a path to a .pbix file, and it hands back DataFrames of metadata and decoded table data. It also accepts .xlsx files that contain a Power Pivot model (stored internally as xl/model/item.data), because a Power Pivot model uses the same XPress9-compressed Analysis Services backup format under a different wrapper.
This article is the practical counterpart to the storage deep dives on this site. If you want the internals first, read What Is a PBIX File?, The DataModel article, and Inside metadata.sqlitedb. If your goal is “show me the API,” start here.
Install the Library
The library is published on PyPI:
pip install pbixray
The public surface is intentionally small.
Open a PBIX File
Create a PBIXRay object with the path to a PBIX file:
from pbixray import PBIXRay
model = PBIXRay("Sales & Returns Sample v201912.pbix")
Under the hood, that object:
- slices and unpacks the embedded DataModel
- reads metadata.sqlitedb
- uses the metadata to decode .dictionary, .hidx, .idf, and .idfmeta files
The point of the library is that you do not have to think about those steps unless you want to.
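If you do want to look at the first of those steps yourself, the outer container is an ordinary ZIP archive, so the standard library is enough to locate the compressed model entry. The sketch below only finds the entry; it does not decompress XPress9 or parse anything. The function name find_model_entry is hypothetical and is not part of pbixray.

```python
import zipfile
from typing import BinaryIO, Optional, Union

# Entry names taken from the text above: "DataModel" inside a .pbix,
# "xl/model/item.data" inside an .xlsx that carries a Power Pivot model.
MODEL_ENTRIES = ("DataModel", "xl/model/item.data")


def find_model_entry(source: Union[str, BinaryIO]) -> Optional[zipfile.ZipInfo]:
    """Return the ZIP entry that holds the compressed model, or None.

    Hypothetical helper for illustration only; pbixray does this
    (and the decompression that follows) internally.
    """
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            if info.filename in MODEL_ENTRIES:
                return info
    return None
```

Calling find_model_entry("report.pbix") and printing the result's file_size is a quick way to confirm that a file actually carries an embedded model before handing it to PBIXRay.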
Start with Metadata
The fastest way to understand a PBIX file is to inspect metadata first:
print(model.tables)
print(model.schema)
print(model.dax_measures)
print(model.relationships)
That already covers many useful automation scenarios:
- inventorying tables and columns
- reviewing DAX measures
- checking relationship topology
- estimating model size and storage-heavy columns
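As one illustration of the relationship-topology scenario, the sketch below finds tables that take part in no relationship at all. It works on plain dictionaries so that it stays independent of pandas. The function name unrelated_tables is hypothetical, and the column names FromTableName and ToTableName are assumptions about the relationships frame; check them against the columns your version of pbixray actually returns.

```python
from typing import Iterable, List, Mapping


def unrelated_tables(
    table_names: Iterable[str],
    relationships: Iterable[Mapping[str, object]],
    from_key: str = "FromTableName",  # assumed column name
    to_key: str = "ToTableName",      # assumed column name
) -> List[str]:
    """Return the tables that appear on neither side of any relationship."""
    connected = set()
    for rel in relationships:
        connected.add(rel[from_key])
        connected.add(rel[to_key])
    return sorted(name for name in table_names if name not in connected)
```

Assuming model.relationships is a pandas DataFrame, the call would look like unrelated_tables(model.tables, model.relationships.to_dict("records")). Disconnected tables are not necessarily a problem (parameter and measure tables are often unrelated by design), but the list is a useful starting point for a review.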
If you want DMV-like detail rather than convenience views, pbixray also exposes TMSCHEMA equivalents directly from the embedded SQLite metadata:
print(model.tmschema_tables[["Name", "Description"]])
print(model.tmschema_columns[["TableName", "Name", "DataType", "IsHidden"]])
print(model.tmschema_partitions)
That mapping is described in more detail in Inside metadata.sqlitedb: Tables, Columns, Measures & Relationships.
Extract Imported Table Data
To reconstruct the contents of an imported table, call get_table:
sales = model.get_table("Sales")
print(sales.head())
That method is where the lower-level format work pays off. Internally, pbixray:
- locates the storage metadata for each column
- reads the companion .idfmeta file
- decodes the .idf payload using Hybrid RLE and bit packing rules
- maps internal IDs through a dictionary or value-encoding path
- casts the results to a sensible pandas dtype
Those lower layers are covered in VertiPaq Dictionaries and Hash Indexes and Reconstructing Column Data from .idf and .idfmeta.
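To make the decode steps above concrete, here is a deliberately simplified toy decoder. It is not pbixray's implementation and it does not follow the real .idf byte layout: the run format, the low-bits-first packing into 64-bit words, and all three function names are assumptions chosen to show the idea of mixing run-length runs with bit-packed runs and then resolving IDs through a dictionary.

```python
from typing import List, Sequence, Tuple, Union

# A run is ("rle", value, repeat) or ("packed", n). Toy format, assumed.
Run = Union[Tuple[str, int, int], Tuple[str, int]]


def unpack_bits(words: Sequence[int], bit_width: int, count: int) -> List[int]:
    """Unpack `count` fixed-width integers from 64-bit words, low bits first."""
    if not 1 <= bit_width <= 64:
        raise ValueError("bit_width must be between 1 and 64")
    per_word = 64 // bit_width
    mask = (1 << bit_width) - 1
    out: List[int] = []
    for word in words:
        for slot in range(per_word):
            if len(out) == count:
                return out
            out.append((word >> (slot * bit_width)) & mask)
    if len(out) < count:
        raise ValueError("not enough packed data")
    return out


def decode_hybrid(runs: Sequence[Run], packed_ids: Sequence[int]) -> List[int]:
    """Expand RLE runs and splice in bit-packed IDs in stream order."""
    out: List[int] = []
    cursor = 0  # position in the already-unpacked bit-packed stream
    for run in runs:
        if run[0] == "rle":
            _, value, repeat = run
            out.extend([value] * repeat)
        elif run[0] == "packed":
            n = run[1]
            out.extend(packed_ids[cursor:cursor + n])
            cursor += n
        else:
            raise ValueError(f"unknown run type: {run[0]!r}")
    return out


def apply_dictionary(ids: Sequence[int], dictionary: Sequence[str]) -> List[str]:
    """Map internal IDs to their dictionary values."""
    return [dictionary[i] for i in ids]
```

The design point the toy preserves is that long repeats cost one run entry regardless of length, while high-variety stretches fall back to fixed-width packing, and the dictionary lookup happens only after the ID stream has been rebuilt.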
Useful High-Level Properties
Some of the most practical entry points in the current API are:
- model.tables
- model.schema
- model.metadata
- model.power_query
- model.m_parameters
- model.dax_tables
- model.dax_measures
- model.dax_columns
- model.relationships
- model.rls
- model.statistics
Here is a compact example:
from pbixray import PBIXRay
model = PBIXRay("report.pbix")
print("Tables:", list(model.tables))
print(model.statistics.sort_values("DataSize", ascending=False).head(10))
print(model.dax_measures[["TableName", "Name", "Expression"]].head())
print(model.rls)
The statistics frame is especially helpful when you want to understand model shape quickly because it includes cardinality and per-column file-size information derived from the storage metadata.
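If you want the same information rolled up per table rather than per column, a small aggregation is enough. The sketch below works on plain dictionaries so it can be tried without a PBIX file. The function name data_size_by_table is hypothetical, DataSize is the column used in the example above, and TableName is an assumption about how the statistics frame labels the owning table.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def data_size_by_table(
    rows: Iterable[Mapping[str, object]],
    table_key: str = "TableName",  # assumed column name
    size_key: str = "DataSize",
) -> List[Tuple[str, int]]:
    """Sum per-column data sizes by table, largest table first."""
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        totals[str(row[table_key])] += int(row[size_key])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Assuming model.statistics is a pandas DataFrame, data_size_by_table(model.statistics.to_dict("records")) would produce the ranking.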
Why pbixray Is Useful
The biggest advantage is that it reads the file directly. That makes it useful in environments where Power BI Desktop is inconvenient or unavailable:
- Linux or macOS analysis environments
- CI pipelines
- bulk audits of PBIX files
- metadata cataloging jobs
- reverse-engineering and validation work
That direct-file approach is also why the format details documented in this article set matter so much. A library like pbixray has to answer concrete questions about the file layout rather than delegating them to a running engine.
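As an illustration of the bulk-audit and CI scenarios listed above, the following sketch walks a folder and records table names and measure counts for every .pbix file it finds. The loader is passed in as a callable, so the loop can be exercised without real files. The function name audit_folder and the shape of the report dictionaries are hypothetical; the sketch assumes only that the loaded object exposes tables and dax_measures as shown earlier in this article.

```python
from pathlib import Path
from typing import Any, Callable, Dict, List


def audit_folder(
    folder: Path, load_model: Callable[[Path], Any]
) -> List[Dict[str, Any]]:
    """Collect a small inventory for every .pbix file under `folder`."""
    report: List[Dict[str, Any]] = []
    for path in sorted(Path(folder).rglob("*.pbix")):
        entry: Dict[str, Any] = {
            "file": path.name,
            "tables": None,
            "measures": None,
            "error": None,
        }
        try:
            model = load_model(path)
            entry["tables"] = sorted(str(name) for name in model.tables)
            entry["measures"] = len(model.dax_measures)
        except Exception as exc:
            # Record the failure and keep going, so one unreadable file
            # does not abort the whole audit.
            entry["error"] = f"{type(exc).__name__}: {exc}"
        report.append(entry)
    return report
```

With pbixray installed, the call would be along the lines of audit_folder(Path("reports"), lambda p: PBIXRay(str(p))), where "reports" is a placeholder folder name. In a CI job, any entry with a non-empty error field can be used to fail the build.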
Where to Go Next
If you want more implementation detail, the next stops are:
- The DataModel: Power BI’s Embedded Analysis Services Engine
- Inside metadata.sqlitedb: Tables, Columns, Measures & Relationships
- Reconstructing Column Data from .idf and .idfmeta
If you just want the code, head to the pbixray repository.