pbixray Python Library

pbixray is an open source Python library for parsing Power BI .pbix files directly from Python, without Power BI Desktop or a live Analysis Services connection.

It is designed for developers who need to inspect semantic models, extract metadata, read Power Query logic, and work with the internals of PBIX files programmatically. The same API also supports Excel .xlsx files that contain embedded PowerPivot models.

pbixray is read-only: it returns pandas DataFrames and requires no network access and no Power BI or Excel install. File type (PBIX vs XLSX) is auto-detected from the file contents — the same API works either way.

Install

pip install pbixray

Quick Start

from pbixray import PBIXRay

model = PBIXRay("path/to/your_report.pbix")

print(model.tables)
print(model.metadata)
print(model.power_query)
print(model.dax_measures)
print(model.relationships)

Supported Inputs

Power BI .pbix files
Excel .xlsx files with embedded PowerPivot models

Large Models (on-disk loading)

By default the entire decompressed data model is held in memory. For models whose uncompressed size approaches or exceeds available RAM, pass on_disk=True: the decompressed data is streamed to a temporary file and memory-mapped, so only the pages a requested table actually touches are faulted in. Use temp_dir to control where the spill file is created (it defaults to the system temp directory).

# Spill to disk + mmap instead of holding everything in RAM.
with PBIXRay("path/to/large.pbix", on_disk=True, temp_dir="/fast/scratch") as model:
    df = model.get_table("Sales")
# leaving the `with` block releases the mapping and removes the temp file

PBIXRay is also a context manager. Calling model.close() (or exiting the with block) deterministically releases the memory map and the metadata connection. When on_disk=False (the default) behavior is unchanged. Metadata (DAX, tmschema_*, etc.) is loaded lazily on first access, so simply opening a file is cheap.

Core Properties

These properties expose the most useful parts of a model as Python values or pandas DataFrames.

Property	Returns
`model.tables`	List of table names in the model
`model.metadata`	Metadata about the Power BI configuration used to create the model
`model.power_query`	DataFrame of Power Query / M expressions with `TableName` and `Expression`
`model.m_parameters`	DataFrame of M parameters with `ParameterName`, `Description`, `Expression`, and `ModifiedTime`
`model.size`	Model size in bytes (`int`)
`model.dax_tables`	DataFrame of calculated tables with `TableName` and `Expression`
`model.dax_measures`	DataFrame of measures with `TableName`, `Name`, `Expression`, `DisplayFolder`, and `Description`
`model.dax_columns`	DataFrame of calculated columns with `TableName`, `ColumnName`, and `Expression`
`model.schema`	DataFrame of schema info with `TableName`, `ColumnName`, and `PandasDataType`
`model.relationships`	DataFrame of relationships with `FromTableName`, `FromColumnName`, `ToTableName`, `ToColumnName`, `IsActive`, `Cardinality`, `CrossFilteringBehavior`, `FromKeyCount`, `ToKeyCount`, and `RelyOnReferentialIntegrity`
`model.rls`	DataFrame of row-level security with `TableName`, `RoleName`, `RoleDescription`, `FilterExpression`, `State`, and `MetadataPermission`
`model.statistics`	DataFrame of column statistics with `TableName`, `ColumnName`, `Cardinality`, `Dictionary`, `HashIndex`, and `DataSize`

Common Examples

List tables

print(model.tables)

Read Power Query / M code

power_query = model.power_query
print(power_query[["TableName", "Expression"]])

Inspect measures

measures = model.dax_measures
print(measures[["TableName", "Name", "Expression"]])

Inspect calculated columns

columns = model.dax_columns
print(columns[["TableName", "ColumnName", "Expression"]])

Inspect relationships

relationships = model.relationships
print(
    relationships[
        [
            "FromTableName",
            "FromColumnName",
            "ToTableName",
            "ToColumnName",
            "Cardinality",
            "IsActive",
        ]
    ]
)

Inspect row-level security

rls = model.rls
print(rls[["RoleName", "TableName", "FilterExpression"]])

Read a table’s contents

sales = model.get_table("Sales")
print(sales.head())

To decode only a subset of columns from a wide table (decoding the others is skipped), pass columns:

sales = model.get_table("Sales", columns=["ProductKey", "Sales"])

Dictionary decode runs on a native Huffman kernel (xmhuffman) and fans out across cores automatically for large dictionaries.

Data Model Details

pbixray can be used both as a quick inspection tool and as a lower-level metadata extraction library.

Use it to:

enumerate tables and columns in a semantic model
inspect DAX calculated tables, measures, and calculated columns
extract Power Query and M parameter definitions
audit relationships, referential settings, and cross-filtering behavior
review row-level security rules
measure model size and storage statistics

Tabular Model Schema Endpoints

pbixray also exposes direct equivalents of all 38 Analysis Services $System.TMSCHEMA_* DMVs by reading the embedded SQLite metadata database inside the PBIX. These endpoints are PBIX-only — on XLSX files they return empty DataFrames.

Property	DMV equivalent
`model.tmschema_model`	`TMSCHEMA_MODEL`
`model.tmschema_tables`	`TMSCHEMA_TABLES`
`model.tmschema_columns`	`TMSCHEMA_COLUMNS`
`model.tmschema_partitions`	`TMSCHEMA_PARTITIONS`
`model.tmschema_hierarchies`	`TMSCHEMA_HIERARCHIES`
`model.tmschema_levels`	`TMSCHEMA_LEVELS`
`model.tmschema_datasources`	`TMSCHEMA_DATASOURCES`
`model.tmschema_perspectives`	`TMSCHEMA_PERSPECTIVES`
`model.tmschema_perspective_tables`	`TMSCHEMA_PERSPECTIVE_TABLES`
`model.tmschema_perspective_columns`	`TMSCHEMA_PERSPECTIVE_COLUMNS`
`model.tmschema_perspective_hierarchies`	`TMSCHEMA_PERSPECTIVE_HIERARCHIES`
`model.tmschema_perspective_measures`	`TMSCHEMA_PERSPECTIVE_MEASURES`
`model.tmschema_kpis`	`TMSCHEMA_KPIS`
`model.tmschema_annotations`	`TMSCHEMA_ANNOTATIONS`
`model.tmschema_extended_properties`	`TMSCHEMA_EXTENDED_PROPERTIES`
`model.tmschema_cultures`	`TMSCHEMA_CULTURES`
`model.tmschema_translations`	`TMSCHEMA_OBJECT_TRANSLATIONS`
`model.tmschema_linguistic_metadata`	`TMSCHEMA_LINGUISTIC_METADATA`
`model.tmschema_query_groups`	`TMSCHEMA_QUERY_GROUPS`
`model.tmschema_calculation_groups`	`TMSCHEMA_CALCULATION_GROUPS`
`model.tmschema_calculation_items`	`TMSCHEMA_CALCULATION_ITEMS`
`model.tmschema_calculation_expressions`	`TMSCHEMA_CALCULATION_EXPRESSIONS`
`model.tmschema_variations`	`TMSCHEMA_VARIATIONS`
`model.tmschema_attribute_hierarchies`	`TMSCHEMA_ATTRIBUTE_HIERARCHIES`
`model.tmschema_sets`	`TMSCHEMA_SETS`
`model.tmschema_refresh_policies`	`TMSCHEMA_REFRESH_POLICIES`
`model.tmschema_detail_rows_definitions`	`TMSCHEMA_DETAIL_ROWS_DEFINITIONS`
`model.tmschema_format_string_definitions`	`TMSCHEMA_FORMAT_STRING_DEFINITIONS`
`model.tmschema_functions`	`TMSCHEMA_FUNCTIONS`
`model.tmschema_calendars`	`TMSCHEMA_CALENDARS`
`model.tmschema_calendar_column_groups`	`TMSCHEMA_CALENDAR_COLUMN_GROUPS`
`model.tmschema_calendar_column_refs`	`TMSCHEMA_CALENDAR_COLUMN_REFERENCES`
`model.tmschema_alternate_of`	`TMSCHEMA_ALTERNATE_OF`
`model.tmschema_related_column_details`	`TMSCHEMA_RELATED_COLUMN_DETAILS`
`model.tmschema_group_by_columns`	`TMSCHEMA_GROUP_BY_COLUMNS`
`model.tmschema_binding_info`	`TMSCHEMA_BINDING_INFO`
`model.tmschema_analytics_ai_metadata`	`TMSCHEMA_ANALYTICS_AI_METADATA`
`model.tmschema_data_coverage_definitions`	`TMSCHEMA_DATA_COVERAGE_DEFINITIONS`
`model.tmschema_role_memberships`	`TMSCHEMA_ROLE_MEMBERSHIPS`

TMSCHEMA examples

# List all columns with table names and hidden flags
print(model.tmschema_columns[["TableName", "Name", "DataType", "IsHidden"]])

# Inspect incremental refresh policies
print(model.tmschema_refresh_policies)

# Inspect security role memberships
print(model.tmschema_role_memberships)

PBIX vs XLSX Capability Matrix

Both PBIX and XLSX (PowerPivot) files use the same API, but coverage differs. “Empty” below means a zero-row DataFrame — never None and never an exception.

Endpoint	PBIX	XLSX
`tables`, `schema`, `statistics`, `size`	Populated	Populated
`get_table(name)`	Real data	Real data (no `RowNumber`)
`relationships`	Populated	Populated
`dax_tables`	Populated	Populated (from partitions)
`dax_measures`	Populated	Populated (measure groups)
`dax_columns`	Populated	Empty
`power_query`, `m_parameters`	Populated	Empty
`metadata`, `rls`	Populated	Empty
`tmschema_*` (all 38)	Populated	Empty

Notes and Gotchas

RowNumber is dropped from get_table() output — it is a VertiPaq internal storage position, not user data.
Row order is storage order, not insertion order. VertiPaq sorts rows by lowest-cardinality columns first for RLE compression. Two calls are stable, but order will differ from a CSV exported from Excel. For row equivalence, compare as multisets: df.sort_values(list(df.columns)).reset_index(drop=True).
Unknown table name → empty DataFrame, not an exception. Validate against model.tables if you need to detect bad names.
model.tables returns a numpy array, not a Python list. It iterates fine, but model.tables == [...] won’t work as a plain equality check.
XLSX calculated columns can have a display name different from the internal storage name (e.g. Category ↔ CalculatedColumn1). pbixray resolves these so schema.ColumnName and get_table() use the display name.
No streaming. get_table() materializes the whole table into a DataFrame, so large fact tables can be memory-heavy. Use on_disk=True and the columns argument to keep memory in check.

Out of Scope

pbixray is a read-only data-model extractor. It does not:

write, modify, or repack PBIX/XLSX files
evaluate DAX — measure and calculated-column expressions are returned as source text only
run a query engine (no DAX/MDX/M evaluation against the model)
connect to Power BI Service, Analysis Services, datasets, gateways, or workspaces
refresh the model
parse the report layer (visuals, pages, bookmarks, themes) — data model only
support .pbit, .pbids, or .pbip formats — only .pbix and .xlsx with an embedded PowerPivot model

Requirements

Python 3.8+ (tested through 3.13)
Runtime dependencies: xpress8, xpress9, xmhuffman, kaitaistruct, numpy, pandas, apsw
No Power BI Desktop required
Works on macOS, Linux, and Windows