Skip to content
PBIXray
Go back

Parsing Power Pivot Data Models from Excel XLSX Files

When I first started unpacking VertiPaq, I was mostly thinking about Power BI. The file on my desk was a .pbix, the obvious target was the DataModel stream, and the reward was being able to reconstruct imported Power BI tables without opening Power BI Desktop.

The XLSX work started with a GitHub issue opened by LucianCr on September 29, 2025. The question was whether pbixray could also read tables and Power Query steps from .xlsx files.

What I liked about the issue was the practical context. The request was not framed as a file-format curiosity. It came from someone trying to understand model data locally, after experimenting with copy-paste workflows and local AI. The workbook was already on disk, but the model inside it was still difficult to inspect programmatically.

Power Pivot has been part of the Microsoft BI story for a long time. It came out of the old Project Gemini work, appeared around the Excel 2010 / SQL Server 2008 R2 era, and brought a local analytical model into the workbook. Microsoft describes Power Pivot as an Excel add-in for building data models, relationships, and DAX calculations. More importantly for this discussion, Microsoft also describes the data as being stored in an analytical database inside the Excel workbook, with a local engine loading and querying that database.

That matches what many of us remember from the early PowerPivot days. Excel was not simply making bigger PivotTables. It was hosting something that behaved like a local Analysis Services model. Even today, if you watch Excel with Process Explorer while working with Power Pivot, the msmdsrv.exe subprocess is a useful reminder of where the machinery comes from. Power BI Desktop follows a related local-engine pattern: Analysis Services process, model metadata, VertiPaq storage, then an application shell around it.

The work I had done for PBIX files did not stop at PBIX, but it also did not transfer perfectly. The engine family is shared; the file details still matter.

The Same Engine in a Different Coat

A .pbix file and an .xlsx file look like different products. One belongs to Power BI Desktop. The other belongs to Excel. Their outer packaging also differs in the details.

But once a workbook contains a Power Pivot data model, the important inner layer becomes familiar.

In a PBIX file, pbixray looks for:

DataModel

In an Excel workbook with an embedded Power Pivot model, it looks for:

xl/model/item.data

That path is the bridge. item.data is the workbook’s embedded model payload. After that point, the parser is no longer doing ordinary Excel worksheet extraction. It is back in the world of Analysis Services backup structures and VertiPaq storage files.

But this is also where the first important difference appears. PBIX files, at least in the modern files pbixray targets, expose the semantic layer through metadata.sqlitedb. The XLSX Power Pivot samples I tested take a different route: model metadata is reconstructed from XML files such as Model...cub.xml, dimension XML, partition XML, measure-group XML, script XML, data-source-view XML, and table XML.

This is where the Microsoft Open Specifications become useful. [MS-XLDM], the Spreadsheet Data Model File Format specification, describes the binary format used to store part of a tabular data model inside a spreadsheet file: tables, data, and relationships within the containing workbook. [MS-SSAS-T], the SQL Server Analysis Services Tabular Protocol, describes the Tabular side of the Analysis Services world: how clients communicate with and operate on Tabular databases at compatibility level 1200 or higher.

Those two documents served different purposes during the reverse-engineering process. MS-XLDM helped with the spreadsheet-contained model structures. MS-SSAS-T filled in object definitions that were not fully described in MS-XLDM, which made it easier to interpret the Tabular metadata objects inside the workbook model. Some of the XLSX implementation details in pbixray were sourced from those specifications; others were matched against real workbook files and against structures I had already learned while parsing PBIX.

The outer ZIP package changed. The metadata layer changed. The data model did not become a simple spreadsheet table.

Why This Matters

Excel workbooks often sit in the awkward middle of enterprise analytics. They are easy to share, hard to govern, and full of models that gradually become important without anyone quite deciding that they are now production assets.

Power Pivot makes that more interesting. A workbook can contain:

If the only way to inspect that model is to open Excel manually, automation stops at the front door. You cannot easily inventory models, search DAX measures, review relationships, or compare workbook models in a CI-style workflow.

That is where Lucian’s issue landed for me. The question was not “can Python read worksheets?” Python can already do that very well. The useful question was whether the hidden model layer inside some workbooks could be inspected with the same direct-file approach as PBIX.

That is where pbixray becomes useful, but with a narrower promise than the PBIX path. It can treat a Power Pivot workbook as a model file, not just as a collection of visible worksheets. Today, the XLSX path is primarily a metadata reader.

Opening an XLSX Model with pbixray

The entry point is intentionally the same as the PBIX path:

from pbixray import PBIXRay

model = PBIXRay("Supplier Quality Analysis Sample-no-PV.xlsx")

print(model.tables)
print(model.schema)
print(model.relationships)
print(model.dax_measures)

There is no separate ExcelRay class. If the .xlsx contains xl/model/item.data, PBIXRay detects it, marks the model as xlsx, and routes the embedded payload through the Excel metadata path.

In addition to tables and schema shown above, the other populated XLSX-facing properties are:

print(model.dax_tables)
print(model.dax_measures)
print(model.relationships)
print(model.statistics)
print(model.size)

On the sample workbook I used while checking the implementation, those return real DataFrames for tables, schema, relationships, measures, partition/source expressions, model size, and storage statistics. The dax_tables name comes from the shared API; for XLSX files it should be read as source-expression metadata, not as a guarantee that Excel calculated tables are supported.

There are also properties that exist on the Python object but are not populated for XLSX today:

print(model.power_query)        # empty for XLSX today
print(model.m_parameters)       # empty for XLSX today
print(model.dax_columns)        # empty for XLSX today
print(model.rls)                # empty for XLSX today
print(model.tmschema_tables)    # empty stub for XLSX today
print(model.tmschema_columns)   # empty stub for XLSX today

This distinction matters. The PBIX implementation can lean on metadata.sqlitedb and expose many DMV-like TMSCHEMA_* endpoints. The XLSX implementation parses XML metadata instead, and most DMV-style properties are currently stubs.

That is usually where I start when looking at an unfamiliar workbook anyway. Tables and relationships give you the shape. Measures tell you where some of the business logic lives. Statistics tell you which columns are doing the heavy lifting in storage.

What Happens Under the Hood

The XLSX path in pbixray is deliberately small.

First, the unpacker opens the file as a ZIP archive. Then it checks for one of two supported model locations:

Once it has the model payload, the compression and backup layers are familiar:

  1. detect whether the backup stream is uncompressed, single-threaded XPress9, or multithreaded XPress9
  2. decompress the Analysis Services backup stream
  3. parse the ABF structures to recover the internal file log
  4. choose a metadata reader based on file type

For PBIX, pbixray extracts metadata.sqlitedb and uses SQLite queries to populate the rich metadata and TMSCHEMA-style frames. For XLSX, it uses XmlMetadataQuery, which reads the workbook model’s XML objects and builds a smaller set of frames: schema, DAX measures, partition/source expressions (exposed under the shared dax_tables name), relationships, and statistics.

The implementation also leans on MS-XLDM and MS-SSAS-T as complementary maps. The parser still has to reconcile the specifications with the files produced by Excel, but together they give names and shapes to structures that would otherwise be much harder to interpret.

Excel support is a second metadata route hanging off the same decompression and ABF recovery work, informed by open protocol documents that overlap with structures I had already been using for PBIX parsing.

Metadata Before Rows

With workbooks, I find metadata inspection especially valuable.

People often use Excel as a canvas for exploration: import a few tables, add relationships, create a measure, add another measure, hide a helper column, forget which query created which table, then pass the workbook to someone else. Six months later the workbook has become part of a process.

For XLSX files, I want to answer metadata questions first:

Those questions do not require opening Excel, refreshing the workbook, or materializing every imported table into memory.

from pbixray import PBIXRay

model = PBIXRay("finance-model.xlsx")

print(model.tables)
print(model.relationships[[
    "FromTableName",
    "FromColumnName",
    "ToTableName",
    "ToColumnName",
    "IsActive",
]])
print(model.dax_measures[["TableName", "Name", "Expression"]])

For model governance and documentation, that is often enough to be immediately useful.

Current get_table Limitation

The tempting next step is to do this:

customers = model.get_table("Customer")
print(customers.head())

That works for PBIX files. In the current pbixray implementation, it does not work for XLSX Power Pivot files.

The reason is not that there is no VertiPaq storage underneath. The decoder iterates the schema and, for each column, expects to find an .idfmeta companion alongside the column’s .idf segment in the ABF file log. PBIX files expose those companions in the layout the decoder was written against. The XLSX item.data file log does not surface them in the same shape, so the column reconstruction step has nothing to attach to.

I should also be honest: I have not yet worked out how the XLSX model stitches column storage to its XML metadata. The structures are clearly there — the workbook obviously stores its imported tables somewhere — but mapping them onto the existing decoder, or writing a second reconstruction path that does not assume the PBIX-style metadata, is something I am planning to look at next rather than something I have already solved. The XLSX test suite reflects that: it covers metadata only and does not exercise get_table.

So the practical boundary today is:

That is still useful, but it is not the same as saying “the whole PBIX API works for XLSX.”

XLSX table extraction would need a reconstruction path that does not assume the PBIX-style .idfmeta layout — once I have spent more time with how the workbook actually maps its XML model objects onto its stored column segments.

Plain XLSX Files Are Different

One important distinction: not every .xlsx file has a Power Pivot model.

A normal workbook with visible worksheets, formulas, and cell ranges may not contain xl/model/item.data at all. That is not a failure of the parser; it is a different kind of file. pbixray is interested in the embedded Tabular model, not general worksheet extraction.

So the rule is:

This boundary is useful because it keeps pbixray focused. It does not try to become an Excel reader. It reads the Analysis Services model hiding inside some Excel workbooks.

From One Issue to a Wider Parser

The original VertiPaq work started as a very specific itch: can I open a PBIX file, get past the nested compression, and reconstruct the data without asking Power BI Desktop for help?

The answer eventually became yes for PBIX. The XLSX issue showed that some of the same container and model layers were present, but the mapping was not one-to-one:

container file
  -> embedded model payload
  -> Analysis Services backup
  -> model metadata and storage files
  -> parser-specific metadata views

PBIX uses one container. Excel uses another. PBIX exposes modern metadata through SQLite. Excel Power Pivot workbooks can expose model metadata through XML. pbixray can inspect both, as long as the implementation respects that fork.

The older product history is why this overlap exists at all. Power Pivot brought a local, workbook-contained analytical model into Excel. Power BI later made the same family of ideas feel like a standalone reporting application. SSAS Tabular sits on the server side of the same family tree. Names changed over time: PowerPivot, Power Pivot, Excel Data Model, xVelocity, VertiPaq, Tabular. The file formats are not identical, but the lineage is very real.

That makes Power Pivot workbooks less opaque. You can list their model tables, inspect relationships, review measures, and estimate model storage from Python. The current XLSX path should not be treated as a full row-data extractor, and the PBIX DMV endpoints should not be expected to be populated for workbook models.

That is the useful part of small open source issues. A concrete question from someone’s workflow can reveal both the generality and the limits of the parser. pbixray started with one file extension, but the real subject was never just .pbix. It was the model inside.


Share this post on:

Previous Post
How VertiPaq Sorts a Table — A Visual Walkthrough
Next Post
PBIX Parsing in Your Browser: Introducing PBIX.info