The DataModel: Power BI's Embedded Analysis Services Engine

When Power BI Desktop saves a report, it serializes an entire in-memory columnar database into a single stream called DataModel. That stream is not a simplified export format. It is an Analysis Services Tabular database packaged inside the PBIX file.

Once you cross into DataModel, you are no longer dealing with report JSON or package metadata. You are dealing with the same family of storage concepts that power VertiPaq in Power BI, Power Pivot, and SSAS Tabular.

Where This Fits

This article begins where Inside the PBIX ZIP Archive ends. It follows the DataModel member through its compression layer and into the VertiPaq filesystem layout. For the semantic layer inside that structure, continue with Inside metadata.sqlitedb: Tables, Columns, Measures & Relationships.

The Layer Cake Inside `DataModel`

The easiest way to think about DataModel is as a stack of nested representations:

the PBIX file is a ZIP archive
the DataModel member is a raw ZIP entry
that entry contains an XPress9-compressed Analysis Services backup stream
the backup expands into a VertiPaq-oriented filesystem layout
tables are reconstructed from metadata, dictionaries, hash indexes, and compressed column segments
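The first two layers of that stack need nothing beyond the standard library. The sketch below is a minimal illustration, not pbixray's code: the function name `read_datamodel` is my own, and it only pulls the raw member bytes out of the archive without touching the compression layer that follows.

```python
import zipfile


def read_datamodel(pbix_source):
    """Return the raw bytes of the DataModel member of a PBIX archive.

    `pbix_source` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(pbix_source) as archive:
        if "DataModel" not in archive.namelist():
            # Hypothetical error handling: some files may carry no imported model.
            raise KeyError("archive has no DataModel member")
        return archive.read("DataModel")
```

Everything returned here is still opaque. The bytes only become meaningful after the signature check and decompression described in the next section.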

That is why generic archive tooling is only the first step. The real implementation work starts after the ZIP boundary.

The same layering was central to my earlier post Lessons Learned from Unpacking VertiPaq: A Developer’s Journey. The difference here is that the goal is not to tell the discovery story. It is to document the structures in a way that is useful for parser authors.

XPress9 and the Analysis Services Backup Stream

In practice there are three forms a DataModel stream can take, identified by the first 102 bytes:

an uncompressed ABF backup, marked by a STREAM_STORAGE_SIGNATURE_)!@#$%^&*( header — usually older files
a single-threaded XPress9 form whose signature reads "This backup was created using XPress9 compression."
a multithreaded XPress9 form whose signature is "This backup was created using multithreaded XPrs9." (note the abbreviated XPrs9)

All three land at the same destination: an Analysis Services backup that unfolds into a VertiPaq filesystem. They just take different routes to get there. pbixray’s unpacker dispatches on those signatures directly.
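A signature dispatch of this kind can be sketched in a few lines. This is an illustration under assumptions rather than pbixray's actual unpacker: the function name and the returned labels are hypothetical, and because the text encoding of the signature is not stated here, the sketch tries both UTF-16LE and a single-byte decoding of the first 102 bytes.

```python
SIGNATURE_LENGTH = 102  # bytes inspected, per the description above

# Distinctive fragments of the three signatures quoted above,
# mapped to hypothetical labels.
MARKERS = {
    "STREAM_STORAGE_SIGNATURE": "uncompressed_abf",
    "multithreaded XPrs9": "xpress9_multithreaded",
    "XPress9 compression": "xpress9_single",
}


def classify_datamodel(stream: bytes) -> str:
    """Guess which of the three DataModel forms a byte stream uses."""
    head = stream[:SIGNATURE_LENGTH]
    # Assumption: the signature is text in UTF-16LE or a single-byte
    # encoding. Both decodings are tried so the check works either way.
    candidates = (
        head.decode("utf-16-le", errors="ignore"),
        head.decode("latin-1"),
    )
    for marker, kind in MARKERS.items():
        if any(marker in text for text in candidates):
            return kind
    return "unknown"
```

Matching on a fragment instead of the full string keeps the check tolerant of a leading byte-order mark or trailing padding.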

Both XPress9 variants share the same chunk layout after their respective signatures: each chunk begins with an uncompressed_size and a compressed_size as 32-bit little-endian integers, followed by a compressed node. The multithreaded form adds one extra header up front (block counts and chunk sizes) so a decoder can hand work out to a thread pool. Each node itself starts with a 32-byte XPress9 header that includes a 0x4e86d72a magic, the original and encoded sizes, a Huffman-table flags bitfield, a session signature, a block index, and a CRC32. The actual compressed payload begins after those 32 bytes.

seq:
  - id: uncompressed
    type: u4
  - id: compressed
    type: u4
  - id: node
    type: node

That chunking step is the bridge between “ZIP member” and “recoverable backup.”
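To make the chunk framing concrete, here is a minimal Python sketch that walks the chunk sequence without decompressing anything. It assumes that compressed_size counts the whole node, including its 32-byte header, and that `offset` already points past the signature (and past the extra header in the multithreaded form). The function name is hypothetical, and actual XPress9 decoding is out of scope here.

```python
import struct

NODE_HEADER_SIZE = 32  # size of the per-node XPress9 header described above


def iter_chunks(body: bytes, offset: int = 0):
    """Yield (uncompressed_size, node_header, payload) for each chunk.

    Assumption: compressed_size covers the entire node, so the payload is
    whatever follows the first 32 bytes of the node.
    """
    while offset + 8 <= len(body):
        # Two unsigned 32-bit little-endian integers open every chunk.
        uncompressed_size, compressed_size = struct.unpack_from("<II", body, offset)
        offset += 8
        node = body[offset:offset + compressed_size]
        if len(node) != compressed_size:
            raise ValueError("truncated chunk")
        offset += compressed_size
        yield uncompressed_size, node[:NODE_HEADER_SIZE], node[NODE_HEADER_SIZE:]
```

Writing it as a generator means a caller can stream chunks into a decoder one at a time, or collect them first and distribute them across workers.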

The ABF Layer Underneath

Once XPress9 is out of the way, what’s left is an Analysis Services ABF (Analysis Backup File). Inside every ABF there are three anchored structures a parser needs:

a BackupLogHeader at offset 72, always 4 KB long, which gives you the offset and size of the virtual directory
a VirtualDirectory — an XML-ish list of file entries with paths, sizes, and offsets into the backup body
a BackupLog — an XML manifest of file groups and backup files, which is matched back against the virtual directory by StoragePath

When those three agree, you get a file log of (Path, FileName, StoragePath, Size, OffsetHeader) tuples. That file log is what turns the backup stream into the VertiPaq workspace described below: each tuple becomes one on-disk file.
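The matching step is essentially a join on StoragePath. The sketch below assumes the two XML structures have already been parsed into lists of dicts. Which field comes from which structure is my assumption, the function names are hypothetical, and OffsetHeader is treated as an absolute byte offset into the decompressed stream, which may not hold for every file.

```python
def build_file_log(virtual_directory, backup_files):
    """Join BackupLog entries to VirtualDirectory entries on StoragePath.

    Assumed shapes (illustrative only):
      virtual_directory: [{"StoragePath", "Size", "OffsetHeader"}, ...]
      backup_files:      [{"Path", "FileName", "StoragePath"}, ...]
    """
    by_storage_path = {entry["StoragePath"]: entry for entry in virtual_directory}
    file_log = []
    for backup_file in backup_files:
        entry = by_storage_path.get(backup_file["StoragePath"])
        if entry is None:
            continue  # named in the log but absent from the directory
        file_log.append((
            backup_file["Path"],
            backup_file["FileName"],
            backup_file["StoragePath"],
            entry["Size"],
            entry["OffsetHeader"],
        ))
    return file_log


def extract_file(stream: bytes, record) -> bytes:
    """Slice one file out of the backup stream using its file-log tuple."""
    _path, _file_name, _storage_path, size, offset = record
    return stream[offset:offset + size]
```

Keeping the file log as plain tuples mirrors the description above and makes it easy to load into a DataFrame later.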

What the VertiPaq Filesystem Looks Like

After decompression, the structure follows a predictable on-disk layout that Power BI Desktop also uses when it materializes models to its local workspace folder. A minimal example looks like this:

0.CryptKey.bin
metadata.sqlitedb

Fruit RLE (427).tbl
  0.Fruit RLE (427).Type (430).dictionary
  1.H$Fruit RLE (427)$Qty (431).hidx
  432.prt/
    0.Fruit RLE (427).Qty (431).0.idf
    0.Fruit RLE (427).Qty (431).0.idfmeta
    0.Fruit RLE (427).Type (430).0.idf
    0.Fruit RLE (427).Type (430).0.idfmeta

That single directory tree already reveals most of the moving parts behind imported table reconstruction:

metadata.sqlitedb holds the semantic metadata
table folders hold data-oriented structures
partition folders hold the physical segments for each column
.dictionary, .hidx, .idf, and .idfmeta files cooperate to reconstruct values

Root-Level Folder Families

The VertiPaq workspace contains more than just table folders. Four folder families show up next to each other, each with a distinct naming convention:

data table folders: {Table} ({TableID}).tbl
column hierarchy folders: H${Table} ({TableID})${Column} ({ColumnID})$(…).tbl
relationship folders: R${Table} ({TableID})${Relationship} ({RelationshipID})$(…).tbl
user-defined hierarchy folders: U${Table} ({TableID})${Hierarchy} ({HierarchyID})$(…).tbl

If your immediate goal is “read table data,” the first family matters most. The others are still important because they show that the on-disk engine is maintaining more than simple column payloads. It is also materializing structures needed for hierarchy navigation, relationship operations, and query-time behavior.
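Because the families differ by prefix, a parser can sort root-level folders with very little code. This is a rough sketch with hypothetical function and label names. It looks only at the two-character prefix and the .tbl suffix, so a data table whose own name starts with H$, R$, or U$ would be misclassified.

```python
import re

# Prefix-to-family labels; the label strings are illustrative.
PREFIX_FAMILIES = {
    "H$": "column_hierarchy",
    "R$": "relationship",
    "U$": "user_hierarchy",
}

# Matches "{Table} ({TableID}).tbl" for plain data table folders.
DATA_TABLE_RE = re.compile(r"^(?P<table>.+) \((?P<table_id>\d+)\)\.tbl$")


def classify_folder(name: str) -> str:
    """Return the folder family for a root-level workspace entry."""
    if not name.endswith(".tbl"):
        return "other"
    return PREFIX_FAMILIES.get(name[:2], "data_table")


def parse_data_table(name: str):
    """Split a data table folder name into (table name, table ID)."""
    match = DATA_TABLE_RE.match(name)
    if match is None:
        return None
    return match["table"], int(match["table_id"])
```

For the example tree shown earlier, `parse_data_table("Fruit RLE (427).tbl")` yields the table name `Fruit RLE` and the ID `427`.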

Why `metadata.sqlitedb` Sits at the Center

Although DataModel contains many binary structures, the metadata database is the organizing spine. It tells the parser:

which tables and columns exist
which storage files correspond to each logical column
whether a column uses a dictionary or value-encoding path
which type conversions should be applied on the way out

That is why a direct parser usually reads metadata.sqlitedb very early in the process. The rest of the binary files become much easier to interpret once they are anchored to real model objects.
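Since the metadata file is an ordinary SQLite database, Python's built-in sqlite3 module can open it once it has been extracted to disk. The sketch below makes no assumptions about the schema: it only lists the tables that exist, which is a reasonable first step before deciding what to query. The function name is hypothetical.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_metadata_tables(db_path: str) -> list[str]:
    """List the table names in an extracted metadata.sqlitedb file."""
    # Open read-only so inspection can never modify the extracted file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

From there, each table of interest can be read with an ordinary SELECT and mapped onto the storage files in the workspace.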

What `pbixray` Actually Does with `DataModel`

At a high level, the Python implementation follows a three-stage pattern:

unpack the embedded DataModel
read metadata.sqlitedb into DataFrames
use that metadata to locate and decode per-column storage files

That split is deliberate. The hardest part of the format is not one magical binary parser. It is the fact that meaning is distributed across multiple layers that only become useful when combined.

Read Inside metadata.sqlitedb: Tables, Columns, Measures & Relationships for the semantic layer.
Read VertiPaq Dictionaries and Hash Indexes for value reconstruction.
Read Reconstructing Column Data from .idf and .idfmeta for the compressed segment payloads.
