
pbixray Python Library

pbixray is an open source Python library for parsing Power BI .pbix files directly from Python, without Power BI Desktop or a live Analysis Services connection.

It is designed for developers who need to inspect semantic models, extract metadata, read Power Query logic, and work with the internals of PBIX files programmatically. The same API also supports Excel .xlsx files that contain embedded PowerPivot models.

pbixray is read-only: it returns pandas DataFrames and requires no network access and no Power BI or Excel install. File type (PBIX vs XLSX) is auto-detected from the file contents — the same API works either way.

Install

pip install pbixray

Quick Start

from pbixray import PBIXRay

model = PBIXRay("path/to/your_report.pbix")

print(model.tables)
print(model.metadata)
print(model.power_query)
print(model.dax_measures)
print(model.relationships)

Supported Inputs

Large Models (on-disk loading)

By default the entire decompressed data model is held in memory. For models whose uncompressed size approaches or exceeds available RAM, pass on_disk=True: the decompressed data is streamed to a temporary file and memory-mapped, so only the pages a requested table actually touches are faulted in. Use temp_dir to control where the spill file is created (it defaults to the system temp directory).

from pbixray import PBIXRay

# Spill to disk + mmap instead of holding everything in RAM.
with PBIXRay("path/to/large.pbix", on_disk=True, temp_dir="/fast/scratch") as model:
    df = model.get_table("Sales")
# Leaving the `with` block releases the mapping and removes the temp file.
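
One way to decide when to pass on_disk=True is to compare a rough estimate of the decompressed size with a memory budget. The sketch below is only an illustration: should_spill, ram_budget_bytes, and assumed_ratio are hypothetical names and values, not part of pbixray, and the real compression ratio of a given file may differ considerably.

```python
import os


def should_spill(path, ram_budget_bytes, assumed_ratio=10):
    """Return True when the estimated decompressed size exceeds the budget.

    assumed_ratio is a guess at (decompressed size / file size). It is an
    assumption of this sketch, not a figure published by pbixray.
    """
    estimated_bytes = os.path.getsize(path) * assumed_ratio
    return estimated_bytes > ram_budget_bytes
```

The result can be passed straight through, for example PBIXRay(path, on_disk=should_spill(path, 8 * 1024**3)). Tune the ratio to what you observe on your own files.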

PBIXRay is also a context manager. Calling model.close(), or exiting the with block, deterministically releases the memory map and the metadata connection. With on_disk=False (the default), the decompressed model is held entirely in memory, as described above. Metadata (DAX, tmschema_*, etc.) is loaded lazily on first access, so simply opening a file is cheap.
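
Because opening a file is cheap and the with block closes it deterministically, a folder of reports can be scanned one file at a time. The following is a minimal sketch under stated assumptions: inventory and open_model are hypothetical names, open_model stands for any callable that returns a context-manager model (for example the PBIXRay class), and the broad except clause is a placeholder because the exceptions pbixray raises for unreadable files are not described here.

```python
from pathlib import Path


def inventory(folder, open_model):
    """Map each .pbix/.xlsx file name in folder to its list of table names.

    open_model is any callable returning a context-manager model,
    for example the PBIXRay class itself.
    """
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in {".pbix", ".xlsx"}:
            continue
        try:
            # Exiting the with block closes this model before the next opens.
            with open_model(str(path)) as model:
                results[path.name] = list(model.tables)
        except Exception as exc:  # placeholder: the error type is an assumption
            results[path.name] = f"error: {exc}"
    return results
```

A call would look like inventory("reports/", PBIXRay), where the folder name is illustrative.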

Core Properties

These properties expose the most useful parts of a model as Python values or pandas DataFrames.

| Property | Returns |
| --- | --- |
| model.tables | List of table names in the model |
| model.metadata | Metadata about the Power BI configuration used to create the model |
| model.power_query | DataFrame of Power Query / M expressions with TableName and Expression |
| model.m_parameters | DataFrame of M parameters with ParameterName, Description, Expression, and ModifiedTime |
| model.size | Model size in bytes (int) |
| model.dax_tables | DataFrame of calculated tables with TableName and Expression |
| model.dax_measures | DataFrame of measures with TableName, Name, Expression, DisplayFolder, and Description |
| model.dax_columns | DataFrame of calculated columns with TableName, ColumnName, and Expression |
| model.schema | DataFrame of schema info with TableName, ColumnName, and PandasDataType |
| model.relationships | DataFrame of relationships with FromTableName, FromColumnName, ToTableName, ToColumnName, IsActive, Cardinality, CrossFilteringBehavior, FromKeyCount, ToKeyCount, and RelyOnReferentialIntegrity |
| model.rls | DataFrame of row-level security with TableName, RoleName, RoleDescription, FilterExpression, State, and MetadataPermission |
| model.statistics | DataFrame of column statistics with TableName, ColumnName, Cardinality, Dictionary, HashIndex, and DataSize |
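
As a small illustration of combining these properties, the sketch below gathers a few headline numbers into a dictionary. model_overview is a hypothetical helper, not part of pbixray. It relies only on the properties listed above and on len() returning the row count of a DataFrame.

```python
def model_overview(model):
    """Collect headline counts from an open PBIXRay model."""
    return {
        "tables": len(model.tables),
        "size_bytes": model.size,
        "measures": len(model.dax_measures),
        "calculated_columns": len(model.dax_columns),
        "relationships": len(model.relationships),
    }
```

Note that calling it touches several metadata properties, so it triggers the lazy metadata load on first use.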

Common Examples

List tables

print(model.tables)

Read Power Query / M code

power_query = model.power_query
print(power_query[["TableName", "Expression"]])
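
To keep M code under version control, each expression can be written to its own file. This is a sketch, not a pbixray feature: export_power_query, the .pq extension, and the file-name sanitising rule are all assumptions made for illustration. It uses only the TableName and Expression columns.

```python
import re
from pathlib import Path


def export_power_query(power_query, out_dir):
    """Write one .pq file per row of model.power_query; return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for table, expression in zip(
        power_query["TableName"], power_query["Expression"]
    ):
        # Replace characters that are awkward in file names.
        safe_name = re.sub(r"[^\w.-]+", "_", str(table))
        target = out / f"{safe_name}.pq"
        target.write_text(str(expression), encoding="utf-8")
        written.append(target)
    return written
```

Two table names that sanitise to the same file name would overwrite each other, so a real script may want to detect that case.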

Inspect measures

measures = model.dax_measures
print(measures[["TableName", "Name", "Expression"]])
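
A common follow-up is finding every measure that references a particular function, table, or column. The helper below is a hypothetical sketch (measures_mentioning is not a pbixray function) that does a case-insensitive substring search over the Expression column. It does not parse DAX, so it will also match text inside comments or string literals.

```python
def measures_mentioning(measures, needle):
    """Return (TableName, Name) pairs whose DAX expression contains needle."""
    needle = needle.lower()
    hits = []
    for table, name, expression in zip(
        measures["TableName"], measures["Name"], measures["Expression"]
    ):
        # Expressions may be missing; treat anything non-string as empty.
        text = expression if isinstance(expression, str) else ""
        if needle in text.lower():
            hits.append((table, name))
    return hits
```

For example, measures_mentioning(model.dax_measures, "CALCULATE") lists the measures whose expression text contains CALCULATE.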

Inspect calculated columns

columns = model.dax_columns
print(columns[["TableName", "ColumnName", "Expression"]])

Inspect relationships

relationships = model.relationships
print(
    relationships[
        [
            "FromTableName",
            "FromColumnName",
            "ToTableName",
            "ToColumnName",
            "Cardinality",
            "IsActive",
        ]
    ]
)
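
The relationships DataFrame can also be turned into a simple table graph, which helps when checking which tables are connected. The sketch below is illustrative only: related_tables is a hypothetical name, and it assumes IsActive is truthy for active relationships (a bool or 0/1).

```python
from collections import defaultdict


def related_tables(relationships, active_only=True):
    """Build an undirected mapping of table name -> set of neighbour tables."""
    graph = defaultdict(set)
    for src, dst, active in zip(
        relationships["FromTableName"],
        relationships["ToTableName"],
        relationships["IsActive"],
    ):
        if active_only and not active:
            continue
        graph[src].add(dst)
        graph[dst].add(src)
    return dict(graph)
```

Passing active_only=False includes inactive relationships as well.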

Inspect row-level security

rls = model.rls
print(rls[["RoleName", "TableName", "FilterExpression"]])

Read a table’s contents

sales = model.get_table("Sales")
print(sales.head())

To decode only a subset of columns from a wide table (decoding the others is skipped), pass columns:

sales = model.get_table("Sales", columns=["ProductKey", "Sales"])
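
How get_table reacts to an unknown column name is not described here, so it can be useful to check the request against model.schema first. The wrapper below is a hypothetical sketch (get_columns_checked is not part of pbixray) that uses only the TableName and ColumnName columns of the schema.

```python
def get_columns_checked(model, table, columns):
    """Validate column names against model.schema, then call get_table."""
    schema = model.schema
    known = {
        col
        for tbl, col in zip(schema["TableName"], schema["ColumnName"])
        if tbl == table
    }
    missing = [c for c in columns if c not in known]
    if missing:
        raise KeyError(f"{table!r} has no column(s): {missing}")
    return model.get_table(table, columns=columns)
```

A typo such as "ProductKy" then fails fast with a readable message before any decoding starts.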

Dictionary decode runs on a native Huffman kernel (xmhuffman) and fans out across cores automatically for large dictionaries.

Data Model Details

pbixray can be used both as a quick inspection tool and as a lower-level metadata extraction library.

Use it to:

Tabular Model Schema Endpoints

pbixray also exposes direct equivalents of all 38 Analysis Services $System.TMSCHEMA_* DMVs by reading the embedded SQLite metadata database inside the PBIX. These endpoints are PBIX-only — on XLSX files they return empty DataFrames.

| Property | DMV equivalent |
| --- | --- |
| model.tmschema_model | TMSCHEMA_MODEL |
| model.tmschema_tables | TMSCHEMA_TABLES |
| model.tmschema_columns | TMSCHEMA_COLUMNS |
| model.tmschema_partitions | TMSCHEMA_PARTITIONS |
| model.tmschema_hierarchies | TMSCHEMA_HIERARCHIES |
| model.tmschema_levels | TMSCHEMA_LEVELS |
| model.tmschema_datasources | TMSCHEMA_DATASOURCES |
| model.tmschema_perspectives | TMSCHEMA_PERSPECTIVES |
| model.tmschema_perspective_tables | TMSCHEMA_PERSPECTIVE_TABLES |
| model.tmschema_perspective_columns | TMSCHEMA_PERSPECTIVE_COLUMNS |
| model.tmschema_perspective_hierarchies | TMSCHEMA_PERSPECTIVE_HIERARCHIES |
| model.tmschema_perspective_measures | TMSCHEMA_PERSPECTIVE_MEASURES |
| model.tmschema_kpis | TMSCHEMA_KPIS |
| model.tmschema_annotations | TMSCHEMA_ANNOTATIONS |
| model.tmschema_extended_properties | TMSCHEMA_EXTENDED_PROPERTIES |
| model.tmschema_cultures | TMSCHEMA_CULTURES |
| model.tmschema_translations | TMSCHEMA_OBJECT_TRANSLATIONS |
| model.tmschema_linguistic_metadata | TMSCHEMA_LINGUISTIC_METADATA |
| model.tmschema_query_groups | TMSCHEMA_QUERY_GROUPS |
| model.tmschema_calculation_groups | TMSCHEMA_CALCULATION_GROUPS |
| model.tmschema_calculation_items | TMSCHEMA_CALCULATION_ITEMS |
| model.tmschema_calculation_expressions | TMSCHEMA_CALCULATION_EXPRESSIONS |
| model.tmschema_variations | TMSCHEMA_VARIATIONS |
| model.tmschema_attribute_hierarchies | TMSCHEMA_ATTRIBUTE_HIERARCHIES |
| model.tmschema_sets | TMSCHEMA_SETS |
| model.tmschema_refresh_policies | TMSCHEMA_REFRESH_POLICIES |
| model.tmschema_detail_rows_definitions | TMSCHEMA_DETAIL_ROWS_DEFINITIONS |
| model.tmschema_format_string_definitions | TMSCHEMA_FORMAT_STRING_DEFINITIONS |
| model.tmschema_functions | TMSCHEMA_FUNCTIONS |
| model.tmschema_calendars | TMSCHEMA_CALENDARS |
| model.tmschema_calendar_column_groups | TMSCHEMA_CALENDAR_COLUMN_GROUPS |
| model.tmschema_calendar_column_refs | TMSCHEMA_CALENDAR_COLUMN_REFERENCES |
| model.tmschema_alternate_of | TMSCHEMA_ALTERNATE_OF |
| model.tmschema_related_column_details | TMSCHEMA_RELATED_COLUMN_DETAILS |
| model.tmschema_group_by_columns | TMSCHEMA_GROUP_BY_COLUMNS |
| model.tmschema_binding_info | TMSCHEMA_BINDING_INFO |
| model.tmschema_analytics_ai_metadata | TMSCHEMA_ANALYTICS_AI_METADATA |
| model.tmschema_data_coverage_definitions | TMSCHEMA_DATA_COVERAGE_DEFINITIONS |
| model.tmschema_role_memberships | TMSCHEMA_ROLE_MEMBERSHIPS |

TMSCHEMA examples

# List all columns with table names and hidden flags
print(model.tmschema_columns[["TableName", "Name", "DataType", "IsHidden"]])

# Inspect incremental refresh policies
print(model.tmschema_refresh_policies)

# Inspect security role memberships
print(model.tmschema_role_memberships)
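
Building on the first example, the hidden columns can be grouped by table. This is a sketch under assumptions: hidden_columns is a hypothetical helper, it uses only the TableName, Name, and IsHidden columns shown above, and it treats IsHidden as a truthy flag (a bool or 0/1).

```python
from collections import defaultdict


def hidden_columns(tmschema_columns):
    """Return a mapping of table name -> list of hidden column names."""
    hidden = defaultdict(list)
    for table, name, is_hidden in zip(
        tmschema_columns["TableName"],
        tmschema_columns["Name"],
        tmschema_columns["IsHidden"],
    ):
        if is_hidden:
            hidden[table].append(name)
    return dict(hidden)
```

Called as hidden_columns(model.tmschema_columns), it returns an empty dictionary for XLSX files, since the tmschema_* endpoints are empty there.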

PBIX vs XLSX Capability Matrix

Both PBIX and XLSX (PowerPivot) files use the same API, but coverage differs. “Empty” below means a zero-row DataFrame — never None and never an exception.

| Endpoint | PBIX | XLSX |
| --- | --- | --- |
| tables, schema, statistics, size | Populated | Populated |
| get_table(name) | Real data | Real data (no RowNumber) |
| relationships | Populated | Populated |
| dax_tables | Populated | Populated (from partitions) |
| dax_measures | Populated | Populated (measure groups) |
| dax_columns | Populated | Empty |
| power_query, m_parameters | Populated | Empty |
| metadata, rls | Populated | Empty |
| tmschema_* (all 38) | Populated | Empty |
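
Because empty endpoints are zero-row DataFrames rather than None, code that handles both file types can branch on the row count without extra guards. The helper below is a hypothetical sketch (populated_endpoints is not part of pbixray) that reports which of the named DataFrame-valued properties contain rows for a given file.

```python
def populated_endpoints(model, names):
    """Split endpoint names into those with rows and those that are empty."""
    populated, empty = [], []
    for name in names:
        frame = getattr(model, name)
        # Empty endpoints are zero-row DataFrames, so len() is a safe check.
        (populated if len(frame) > 0 else empty).append(name)
    return populated, empty
```

For an XLSX file, populated_endpoints(model, ["relationships", "dax_columns", "rls"]) would be expected to place dax_columns and rls in the second list, in line with the matrix above.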

Notes and Gotchas

Out of Scope

pbixray is a read-only data-model extractor. It does not:

Requirements