When Power BI Desktop saves a report, it serializes an entire in-memory columnar database into a single stream called DataModel. That stream is not a simplified export format. It is an Analysis Services Tabular database packaged inside the PBIX file.
Once you cross into DataModel, you are no longer dealing with report JSON or package metadata. You are dealing with the same family of storage concepts that power VertiPaq in Power BI, Power Pivot, and SSAS Tabular.
Where This Fits
This article begins where Inside the PBIX ZIP Archive ends. It follows the DataModel member through its compression layer and into the VertiPaq filesystem layout. For the semantic layer inside that structure, continue with Inside metadata.sqlitedb: Tables, Columns, Measures & Relationships.
The Layer Cake Inside DataModel
The easiest way to think about DataModel is as a stack of nested representations:
- the PBIX file is a ZIP archive
- the DataModel member is a raw ZIP entry
- that entry contains an XPress9-compressed Analysis Services backup stream
- the backup expands into a VertiPaq-oriented filesystem layout
- tables are reconstructed from metadata, dictionaries, hash indexes, and compressed column segments
That is why generic archive tooling is only the first step. The real implementation work starts after the ZIP boundary.
The same layering was central to my earlier post Lessons Learned from Unpacking VertiPaq: A Developer’s Journey. The difference here is that the goal is not to tell the discovery story. It is to document the structures in a way that is useful for parser authors.
XPress9 and the Analysis Services Backup Stream
In practice there are three forms a DataModel stream can take, identified by the first 102 bytes:
- an uncompressed ABF backup, marked by a STREAM_STORAGE_SIGNATURE_)!@#$%^&*( header — usually older files
- a single-threaded XPress9 form whose signature reads "This backup was created using XPress9 compression."
- a multithreaded XPress9 form whose signature is "This backup was created using multithreaded XPrs9." (note the abbreviated XPrs9)
All three land at the same destination — an Analysis Services backup that unfolds into a VertiPaq filesystem — they just take different routes to get there. pbixray’s unpacker dispatches on those signatures directly.
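The dispatch step can be sketched in a few lines. This is not pbixray's actual API — the constant names and the substring check are mine — but it shows how the three signatures from the list above separate the cases:

```python
# Signatures quoted from the text above; real files may pad or position
# them differently, so we scan the stream head rather than exact offsets.
UNCOMPRESSED_SIG = b"STREAM_STORAGE_SIGNATURE_)!@#$%^&*("
XPRESS9_SIG = b"This backup was created using XPress9 compression."
XPRESS9_MT_SIG = b"This backup was created using multithreaded XPrs9."

def classify_datamodel(head: bytes) -> str:
    """Classify a DataModel stream by its leading bytes (~first 102)."""
    if UNCOMPRESSED_SIG in head:
        return "abf"            # already an uncompressed backup
    if XPRESS9_SIG in head:
        return "xpress9"        # single-threaded XPress9
    if XPRESS9_MT_SIG in head:
        return "xpress9_mt"     # multithreaded XPress9
    raise ValueError("unrecognized DataModel signature")
```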
Both XPress9 variants share the same chunk layout after their respective signatures: each chunk begins with an uncompressed_size and a compressed_size as 32-bit little-endian integers, followed by a compressed node. The multithreaded form adds one extra header up front (block counts and chunk sizes) so a decoder can hand work out to a thread pool. Each node itself starts with a 32-byte XPress9 header that includes a 0x4e86d72a magic, the original and encoded sizes, a Huffman-table flags bitfield, a session signature, a block index, and a CRC32 — the actual compressed payload begins after those 32 bytes.
seq:
  - id: uncompressed
    type: u4
  - id: compressed
    type: u4
  - id: node
    type: node
That chunking step is the bridge between “ZIP member” and “recoverable backup.”
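The chunk walk itself is mechanical. A minimal sketch, assuming the layout described above (two little-endian u32 sizes followed by the node bytes) and starting after the signature; the multithreaded form would first need its extra thread-pool header skipped:

```python
import struct

def iter_xpress9_chunks(data: bytes, offset: int = 0):
    """Yield (uncompressed_size, compressed_size, node_bytes) triples
    from a chunked XPress9 stream body. Each node still carries its own
    32-byte XPress9 header before the actual compressed payload."""
    while offset + 8 <= len(data):
        uncompressed, compressed = struct.unpack_from("<II", data, offset)
        offset += 8
        node = data[offset:offset + compressed]
        offset += compressed
        yield uncompressed, compressed, node
```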
The ABF Layer Underneath
Once XPress9 is out of the way, what's left is an Analysis Services backup file (ABF). Inside every ABF there are three anchored structures a parser needs:
- a BackupLogHeader at offset 72, always 4 KB long, which gives you the offset and size of the virtual directory
- a VirtualDirectory — an XML-ish list of file entries with paths, sizes, and offsets into the backup body
- a BackupLog — an XML manifest of file groups and backup files, which is matched back against the virtual directory by StoragePath
When those three agree, you get a file log of (Path, FileName, StoragePath, Size, OffsetHeader) tuples. That file log is what turns the backup stream into the VertiPaq workspace described below — each tuple becomes one on-disk file.
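In code, each reconciled tuple is just an offset-and-length window into the backup body. A sketch (the class and field names mirror the tuple in the text, not any official schema):

```python
from dataclasses import dataclass

@dataclass
class BackupFileEntry:
    """One reconciled file-log entry from VirtualDirectory + BackupLog."""
    path: str
    file_name: str
    storage_path: str
    size: int
    offset_header: int

def extract_file(backup_body: bytes, entry: BackupFileEntry) -> bytes:
    """Slice one logical on-disk file out of the decompressed backup body."""
    start = entry.offset_header
    return backup_body[start:start + entry.size]
```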
What the VertiPaq Filesystem Looks Like
After decompression, the structure follows a predictable on-disk layout that Power BI Desktop also uses when it materializes models to its local workspace folder. A minimal example looks like this:
0.CryptKey.bin
metadata.sqlitedb
Fruit RLE (427).tbl/
  0.Fruit RLE (427).Type (430).dictionary
  1.H$Fruit RLE (427)$Qty (431).hidx
  432.prt/
    0.Fruit RLE (427).Qty (431).0.idf
    0.Fruit RLE (427).Qty (431).0.idfmeta
    0.Fruit RLE (427).Type (430).0.idf
    0.Fruit RLE (427).Type (430).0.idfmeta
That single directory tree already reveals most of the moving parts behind imported table reconstruction:
- metadata.sqlitedb holds the semantic metadata
- table folders hold data-oriented structures
- partition folders hold the physical segments for each column
- .dictionary, .hidx, .idf, and .idfmeta files cooperate to reconstruct values
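A parser's first pass over the workspace usually just buckets files by those extensions. A small sketch (the role labels are mine; only the four extensions named above are recognized):

```python
from collections import defaultdict

# Assumed mapping from file extension to storage role, per the list above.
ROLE_BY_EXTENSION = {
    "dictionary": "dictionary",
    "hidx": "hash index",
    "idf": "segment data",
    "idfmeta": "segment metadata",
}

def group_by_role(filenames):
    """Bucket VertiPaq workspace file names by their storage role."""
    groups = defaultdict(list)
    for name in filenames:
        ext = name.rsplit(".", 1)[-1]
        groups[ROLE_BY_EXTENSION.get(ext, "other")].append(name)
    return dict(groups)
```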
Root-Level Folder Families
The VertiPaq workspace contains more than just table folders. Four folder families show up next to each other, each with a distinct naming convention:
- data table folders: {Table} ({TableID}).tbl
- column hierarchy folders: H${Table} ({TableID})${Column} ({ColumnID})$(…).tbl
- relationship folders: R${Table} ({TableID})${Relationship} ({RelationshipID})$(…).tbl
- user-defined hierarchy folders: U${Table} ({TableID})${Hierarchy} ({HierarchyID})$(…).tbl
If your immediate goal is “read table data,” the first family matters most. The others are still important because they show that the on-disk engine is maintaining more than simple column payloads. It is also materializing structures needed for hierarchy navigation, relationship operations, and query-time behavior.
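Telling the four families apart reduces to a prefix check on the folder name. A sketch under the naming conventions listed above (the returned labels are mine):

```python
def classify_folder(name: str) -> str:
    """Classify a root-level VertiPaq folder by its naming family.
    Assumes the H$/R$/U$ prefix conventions described in the text."""
    if not name.endswith(".tbl"):
        return "other"
    if name.startswith("H$"):
        return "column hierarchy"
    if name.startswith("R$"):
        return "relationship"
    if name.startswith("U$"):
        return "user-defined hierarchy"
    return "data table"
```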
Why metadata.sqlitedb Sits at the Center
Although DataModel contains many binary structures, the metadata database is the organizing spine. It tells the parser:
- which tables and columns exist
- which storage files correspond to each logical column
- whether a column uses a dictionary or value-encoding path
- which type conversions should be applied on the way out
That is why a direct parser usually reads metadata.sqlitedb very early in the process. The rest of the binary files become much easier to interpret once they are anchored to real model objects.
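Because the metadata database is a plain SQLite file, the very first exploratory step needs nothing format-specific. This sketch just enumerates its tables via sqlite_master; interpreting those tables is the subject of the follow-up article:

```python
import sqlite3

def list_metadata_tables(db_path: str) -> list:
    """Return the table names inside metadata.sqlitedb, sorted.
    Works on any SQLite file; nothing here is VertiPaq-specific."""
    with sqlite3.connect(db_path) as con:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        return [name for (name,) in rows]
```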
What pbixray Actually Does with DataModel
At a high level, the Python implementation follows a three-stage pattern:
- unpack the embedded DataModel
- read metadata.sqlitedb into DataFrames
- use that metadata to locate and decode per-column storage files
That split is deliberate. The hardest part of the format is not one magical binary parser. It is the fact that meaning is distributed across multiple layers that only become useful when combined.
Related Articles
- Read Inside metadata.sqlitedb: Tables, Columns, Measures & Relationships for the semantic layer.
- Read VertiPaq Dictionaries and Hash Indexes for value reconstruction.
- Read Reconstructing Column Data from .idf and .idfmeta for the compressed segment payloads.