pbixray Python Library
pbixray is an open source Python library for parsing Power BI .pbix files directly from Python, without Power BI Desktop or a live Analysis Services connection.
It is designed for developers who need to inspect semantic models, extract metadata, read Power Query logic, and work with the internals of PBIX files programmatically. The same API also supports Excel .xlsx files that contain embedded PowerPivot models.
pbixray is read-only: it returns pandas DataFrames and requires no network access and no Power BI or Excel install. File type (PBIX vs XLSX) is auto-detected from the file contents — the same API works either way.
Install
pip install pbixray
Quick Start
from pbixray import PBIXRay
model = PBIXRay("path/to/your_report.pbix")
print(model.tables)
print(model.metadata)
print(model.power_query)
print(model.dax_measures)
print(model.relationships)
Supported Inputs
- Power BI
.pbixfiles - Excel
.xlsxfiles with embedded PowerPivot models
Large Models (on-disk loading)
By default the entire decompressed data model is held in memory. For models whose uncompressed size approaches or exceeds available RAM, pass on_disk=True: the decompressed data is streamed to a temporary file and memory-mapped, so only the pages a requested table actually touches are faulted in. Use temp_dir to control where the spill file is created (it defaults to the system temp directory).
# Spill to disk + mmap instead of holding everything in RAM.
with PBIXRay("path/to/large.pbix", on_disk=True, temp_dir="/fast/scratch") as model:
df = model.get_table("Sales")
# leaving the `with` block releases the mapping and removes the temp file
PBIXRay is also a context manager. Calling model.close() (or exiting the with block) deterministically releases the memory map and the metadata connection. When on_disk=False (the default) behavior is unchanged. Metadata (DAX, tmschema_*, etc.) is loaded lazily on first access, so simply opening a file is cheap.
Core Properties
These properties expose the most useful parts of a model as Python values or pandas DataFrames.
| Property | Returns |
|---|---|
model.tables | List of table names in the model |
model.metadata | Metadata about the Power BI configuration used to create the model |
model.power_query | DataFrame of Power Query / M expressions with TableName and Expression |
model.m_parameters | DataFrame of M parameters with ParameterName, Description, Expression, and ModifiedTime |
model.size | Model size in bytes (int) |
model.dax_tables | DataFrame of calculated tables with TableName and Expression |
model.dax_measures | DataFrame of measures with TableName, Name, Expression, DisplayFolder, and Description |
model.dax_columns | DataFrame of calculated columns with TableName, ColumnName, and Expression |
model.schema | DataFrame of schema info with TableName, ColumnName, and PandasDataType |
model.relationships | DataFrame of relationships with FromTableName, FromColumnName, ToTableName, ToColumnName, IsActive, Cardinality, CrossFilteringBehavior, FromKeyCount, ToKeyCount, and RelyOnReferentialIntegrity |
model.rls | DataFrame of row-level security with TableName, RoleName, RoleDescription, FilterExpression, State, and MetadataPermission |
model.statistics | DataFrame of column statistics with TableName, ColumnName, Cardinality, Dictionary, HashIndex, and DataSize |
Common Examples
List tables
print(model.tables)
Read Power Query / M code
power_query = model.power_query
print(power_query[["TableName", "Expression"]])
Inspect measures
measures = model.dax_measures
print(measures[["TableName", "Name", "Expression"]])
Inspect calculated columns
columns = model.dax_columns
print(columns[["TableName", "ColumnName", "Expression"]])
Inspect relationships
relationships = model.relationships
print(
relationships[
[
"FromTableName",
"FromColumnName",
"ToTableName",
"ToColumnName",
"Cardinality",
"IsActive",
]
]
)
Inspect row-level security
rls = model.rls
print(rls[["RoleName", "TableName", "FilterExpression"]])
Read a table’s contents
sales = model.get_table("Sales")
print(sales.head())
To decode only a subset of columns from a wide table (decoding the others is skipped), pass columns:
sales = model.get_table("Sales", columns=["ProductKey", "Sales"])
Dictionary decode runs on a native Huffman kernel (xmhuffman) and fans out across cores automatically for large dictionaries.
Data Model Details
pbixray can be used both as a quick inspection tool and as a lower-level metadata extraction library.
Use it to:
- enumerate tables and columns in a semantic model
- inspect DAX calculated tables, measures, and calculated columns
- extract Power Query and M parameter definitions
- audit relationships, referential settings, and cross-filtering behavior
- review row-level security rules
- measure model size and storage statistics
Tabular Model Schema Endpoints
pbixray also exposes direct equivalents of all 38 Analysis Services $System.TMSCHEMA_* DMVs by reading the embedded SQLite metadata database inside the PBIX. These endpoints are PBIX-only — on XLSX files they return empty DataFrames.
| Property | DMV equivalent |
|---|---|
model.tmschema_model | TMSCHEMA_MODEL |
model.tmschema_tables | TMSCHEMA_TABLES |
model.tmschema_columns | TMSCHEMA_COLUMNS |
model.tmschema_partitions | TMSCHEMA_PARTITIONS |
model.tmschema_hierarchies | TMSCHEMA_HIERARCHIES |
model.tmschema_levels | TMSCHEMA_LEVELS |
model.tmschema_datasources | TMSCHEMA_DATASOURCES |
model.tmschema_perspectives | TMSCHEMA_PERSPECTIVES |
model.tmschema_perspective_tables | TMSCHEMA_PERSPECTIVE_TABLES |
model.tmschema_perspective_columns | TMSCHEMA_PERSPECTIVE_COLUMNS |
model.tmschema_perspective_hierarchies | TMSCHEMA_PERSPECTIVE_HIERARCHIES |
model.tmschema_perspective_measures | TMSCHEMA_PERSPECTIVE_MEASURES |
model.tmschema_kpis | TMSCHEMA_KPIS |
model.tmschema_annotations | TMSCHEMA_ANNOTATIONS |
model.tmschema_extended_properties | TMSCHEMA_EXTENDED_PROPERTIES |
model.tmschema_cultures | TMSCHEMA_CULTURES |
model.tmschema_translations | TMSCHEMA_OBJECT_TRANSLATIONS |
model.tmschema_linguistic_metadata | TMSCHEMA_LINGUISTIC_METADATA |
model.tmschema_query_groups | TMSCHEMA_QUERY_GROUPS |
model.tmschema_calculation_groups | TMSCHEMA_CALCULATION_GROUPS |
model.tmschema_calculation_items | TMSCHEMA_CALCULATION_ITEMS |
model.tmschema_calculation_expressions | TMSCHEMA_CALCULATION_EXPRESSIONS |
model.tmschema_variations | TMSCHEMA_VARIATIONS |
model.tmschema_attribute_hierarchies | TMSCHEMA_ATTRIBUTE_HIERARCHIES |
model.tmschema_sets | TMSCHEMA_SETS |
model.tmschema_refresh_policies | TMSCHEMA_REFRESH_POLICIES |
model.tmschema_detail_rows_definitions | TMSCHEMA_DETAIL_ROWS_DEFINITIONS |
model.tmschema_format_string_definitions | TMSCHEMA_FORMAT_STRING_DEFINITIONS |
model.tmschema_functions | TMSCHEMA_FUNCTIONS |
model.tmschema_calendars | TMSCHEMA_CALENDARS |
model.tmschema_calendar_column_groups | TMSCHEMA_CALENDAR_COLUMN_GROUPS |
model.tmschema_calendar_column_refs | TMSCHEMA_CALENDAR_COLUMN_REFERENCES |
model.tmschema_alternate_of | TMSCHEMA_ALTERNATE_OF |
model.tmschema_related_column_details | TMSCHEMA_RELATED_COLUMN_DETAILS |
model.tmschema_group_by_columns | TMSCHEMA_GROUP_BY_COLUMNS |
model.tmschema_binding_info | TMSCHEMA_BINDING_INFO |
model.tmschema_analytics_ai_metadata | TMSCHEMA_ANALYTICS_AI_METADATA |
model.tmschema_data_coverage_definitions | TMSCHEMA_DATA_COVERAGE_DEFINITIONS |
model.tmschema_role_memberships | TMSCHEMA_ROLE_MEMBERSHIPS |
TMSCHEMA examples
# List all columns with table names and hidden flags
print(model.tmschema_columns[["TableName", "Name", "DataType", "IsHidden"]])
# Inspect incremental refresh policies
print(model.tmschema_refresh_policies)
# Inspect security role memberships
print(model.tmschema_role_memberships)
PBIX vs XLSX Capability Matrix
Both PBIX and XLSX (PowerPivot) files use the same API, but coverage differs. “Empty” below means a zero-row DataFrame — never None and never an exception.
| Endpoint | PBIX | XLSX |
|---|---|---|
tables, schema, statistics, size | Populated | Populated |
get_table(name) | Real data | Real data (no RowNumber) |
relationships | Populated | Populated |
dax_tables | Populated | Populated (from partitions) |
dax_measures | Populated | Populated (measure groups) |
dax_columns | Populated | Empty |
power_query, m_parameters | Populated | Empty |
metadata, rls | Populated | Empty |
tmschema_* (all 38) | Populated | Empty |
Notes and Gotchas
RowNumberis dropped fromget_table()output — it is a VertiPaq internal storage position, not user data.- Row order is storage order, not insertion order. VertiPaq sorts rows by lowest-cardinality columns first for RLE compression. Two calls are stable, but order will differ from a CSV exported from Excel. For row equivalence, compare as multisets:
df.sort_values(list(df.columns)).reset_index(drop=True). - Unknown table name → empty DataFrame, not an exception. Validate against
model.tablesif you need to detect bad names. model.tablesreturns a numpy array, not a Python list. It iterates fine, butmodel.tables == [...]won’t work as a plain equality check.- XLSX calculated columns can have a display name different from the internal storage name (e.g.
Category↔CalculatedColumn1).pbixrayresolves these soschema.ColumnNameandget_table()use the display name. - No streaming.
get_table()materializes the whole table into a DataFrame, so large fact tables can be memory-heavy. Useon_disk=Trueand thecolumnsargument to keep memory in check.
Out of Scope
pbixray is a read-only data-model extractor. It does not:
- write, modify, or repack PBIX/XLSX files
- evaluate DAX — measure and calculated-column expressions are returned as source text only
- run a query engine (no DAX/MDX/M evaluation against the model)
- connect to Power BI Service, Analysis Services, datasets, gateways, or workspaces
- refresh the model
- parse the report layer (visuals, pages, bookmarks, themes) — data model only
- support
.pbit,.pbids, or.pbipformats — only.pbixand.xlsxwith an embedded PowerPivot model
Requirements
- Python 3.8+ (tested through 3.13)
- Runtime dependencies:
xpress8,xpress9,xmhuffman,kaitaistruct,numpy,pandas,apsw - No Power BI Desktop required
- Works on macOS, Linux, and Windows