Inside the PBIX ZIP Archive

Rename any .pbix file to .zip and you can open it with an archive tool immediately. That outer container is not the most exotic part of the format, but it is still important because it defines the first stable boundary a parser can rely on.

At this layer, PBIX is a package format. The interesting work starts when one of those package members, DataModel, turns out to be a compressed Analysis Services backup that unfolds into a VertiPaq workspace.

Where This Fits

This article is the container-level companion to What Is a PBIX File?. It covers the outer package and the route to DataModel. For the next step down, continue with The DataModel: Power BI’s Embedded Analysis Services Engine.

Typical Top-Level Entries

In practice, a PBIX file normally contains a recognizable set of members:

DataModel
Mashup
Report/Layout
Report/StaticResources
SecurityBindings
Connections
[Content_Types].xml
Version
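
As a quick sanity check, the list above can be compared against a real file. The following is a minimal sketch using Python's standard zipfile module; the function name check_typical_entries is my own, and treating Report/StaticResources as a possible folder prefix is an assumption rather than something the format guarantees.

```python
import zipfile

# The member names listed above; real files may omit some or add others.
TYPICAL_ENTRIES = (
    "DataModel",
    "Mashup",
    "Report/Layout",
    "Report/StaticResources",
    "SecurityBindings",
    "Connections",
    "[Content_Types].xml",
    "Version",
)


def check_typical_entries(source):
    """Map each typical entry name to True/False depending on presence.

    `source` is a path or a binary file-like object holding a PBIX file.
    """
    with zipfile.ZipFile(source) as archive:
        names = archive.namelist()

    report = {}
    for expected in TYPICAL_ENTRIES:
        # Assumption: an entry may be a folder, so a name that starts
        # with "<expected>/" also counts as present.
        report[expected] = any(
            name == expected or name.startswith(expected + "/")
            for name in names
        )
    return report
```

Calling check_typical_entries("report.pbix") (a hypothetical path) returns a dictionary in the same order as the list, which makes missing members easy to spot.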

Not all of those matter equally for this series. The report-side files are useful if your goal is layout extraction. The storage-focused articles on this site treat DataModel as the main event because it contains the semantic model and the imported column data.

The ZIP Layer Is Still Worth Modeling Properly

A purpose-built PBIX parser only needs the parts of ZIP that help it reach DataModel:

When the archive has no trailing comment, the end-of-central-directory record occupies the last 22 bytes of the file; with a comment, a parser has to scan backward for the record’s signature instead. The record points to the central directory, and walking the central directory gives you every entry’s name, size, and local-header offset. A minimal Kaitai-style description is enough:

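instances:
  end_of_central_dir:
    pos: _root._io.size - 22   # valid only when the archive comment is empty
    type: end_of_central_dir   # assumed to be defined under types:
  central_dir:
    pos: end_of_central_dir.ofs_central_dir
    type: central_dir_entry
    repeat: expr
    repeat-expr: end_of_central_dir.num_central_dir_entries_total  # assumed field name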

That is enough to locate DataModel precisely without unpacking the entire file to disk first.
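
The same walk can be sketched in plain Python with the struct module. This is an illustration under stated assumptions, not the parser used on this site: the function name read_central_directory and the dictionary keys are my own, the whole file is assumed to fit in memory, and ZIP64 and multi-disk archives are ignored. Unlike the fixed "size - 22" position, it scans backward for the signature so an archive comment does not break it.

```python
import struct

# Fixed-size parts of the two records, little-endian, per the ZIP format.
EOCD = struct.Struct("<4sHHHHIIH")                 # 22 bytes
CDIR_ENTRY = struct.Struct("<4sHHHHHHIIIHHHHHII")  # 46 bytes


def read_central_directory(data: bytes):
    """Return one dict per central directory entry found in `data`."""
    # The archive comment can be up to 65535 bytes long, so the record
    # must start somewhere within the last 22 + 65535 bytes.
    start = max(0, len(data) - EOCD.size - 0xFFFF)
    pos = data.rfind(b"PK\x05\x06", start)
    if pos < 0:
        raise ValueError("end-of-central-directory record not found")
    eocd = EOCD.unpack_from(data, pos)
    entry_count, cdir_offset = eocd[4], eocd[6]

    entries = []
    cursor = cdir_offset
    for _ in range(entry_count):
        fields = CDIR_ENTRY.unpack_from(data, cursor)
        if fields[0] != b"PK\x01\x02":
            raise ValueError("bad central directory entry signature")
        flags = fields[3]
        name_len, extra_len, comment_len = fields[10], fields[11], fields[12]
        name_start = cursor + CDIR_ENTRY.size
        raw_name = data[name_start:name_start + name_len]
        # General-purpose flag bit 11 marks UTF-8 names; otherwise CP437.
        encoding = "utf-8" if flags & 0x800 else "cp437"
        entries.append({
            "file_name": raw_name.decode(encoding),
            "compression_method": fields[4],
            "crc32": fields[7],
            "len_body_compressed": fields[8],
            "len_body_uncompressed": fields[9],
            "ofs_local_header": fields[16],
        })
        # Variable-length name, extra field, and comment follow the
        # fixed part; skip all three to reach the next entry.
        cursor = name_start + name_len + extra_len + comment_len
    return entries
```

Filtering the result for file_name == "DataModel" gives the offset and sizes needed to seek to that member.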

Why DataModel Is Usually Stored Raw

One detail worth checking in a real PBIX file is the ZIP compression_method of the DataModel entry. In the files I work with, it is typically stored with ZIP compression_method = 0, meaning the outer ZIP layer is not compressing that member.

That matches the reverse-engineering path from my earlier VertiPaq write-up: the meaningful compression lives inside DataModel, not at the outer package layer.
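
Checking this on your own files takes only a few lines. The sketch below uses Python's standard zipfile module; the helper name member_compression_method is hypothetical, and the result will depend on how a given file was written.

```python
import zipfile


def member_compression_method(source, member="DataModel"):
    """Return the ZIP compression method recorded for `member`.

    0 (zipfile.ZIP_STORED) means the outer ZIP layer stored the bytes
    as-is; 8 (zipfile.ZIP_DEFLATED) means they were deflated.
    Raises KeyError if the member is not in the archive.
    """
    with zipfile.ZipFile(source) as archive:
        return archive.getinfo(member).compress_type
```

A result equal to zipfile.ZIP_STORED confirms that the outer layer left the member uncompressed.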

What the ZIP Archive Tells You

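The archive layer can answer several practical questions quickly: which members are present, where each one starts, how large it is, and whether the outer ZIP layer compressed it.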

Those facts are enough to make a parser efficient. You can seek directly to the entry you care about instead of treating the file as a monolithic blob.

The fields that matter most during this pass are usually:

compression_method
crc32
len_body_compressed
len_body_uncompressed
ofs_local_header
file_name
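
If you would rather not parse the records yourself, Python's zipfile module exposes the same information on its ZipInfo objects. The mapping below is a sketch: the keys follow the field names above, and the function name entry_fields is my own.

```python
import zipfile


def entry_fields(source):
    """Return the fields listed above for every member of the archive.

    `source` is a path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        return [
            {
                "compression_method": info.compress_type,
                "crc32": info.CRC,
                "len_body_compressed": info.compress_size,
                "len_body_uncompressed": info.file_size,
                "ofs_local_header": info.header_offset,
                "file_name": info.filename,
            }
            for info in archive.infolist()
        ]
```

For a stored member, len_body_compressed and len_body_uncompressed are equal, which is a cheap cross-check on the compression_method value.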

What the ZIP Archive Cannot Tell You

The outer package is only the envelope. It cannot tell you:

Those questions are answered deeper in the stack:

A Practical Traversal Strategy

A straightforward PBIX reader usually follows this pattern:

  1. read the end-of-central-directory record
  2. walk the central directory until DataModel is found
  3. resolve the corresponding local header
  4. slice the raw bytes of the member
  5. hand the bytes to a DataModel decoder

That separation of concerns turns out to be very helpful. The ZIP parser only needs to be good at locating and slicing archive members. The storage parser can then focus on Analysis Services and VertiPaq details.
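
Steps 1 through 4 can be sketched as follows. To keep the example short, I let Python's zipfile module read the end-of-central-directory record and the central directory, then resolve the local header and slice the member by hand. The function name extract_member_bytes is hypothetical, and step 5 (the DataModel decoder) is deliberately left out.

```python
import struct
import zipfile
import zlib

# Fixed 30-byte part of a local file header, little-endian.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")


def extract_member_bytes(fileobj, name="DataModel"):
    """Return the bytes of `name` from a seekable binary file object."""
    # Steps 1-2: zipfile reads the end-of-central-directory record and
    # the central directory; getinfo raises KeyError if `name` is absent.
    with zipfile.ZipFile(fileobj) as archive:
        info = archive.getinfo(name)

    # Step 3: resolve the local header that the directory entry points to.
    fileobj.seek(info.header_offset)
    header = LOCAL_HEADER.unpack(fileobj.read(LOCAL_HEADER.size))
    if header[0] != b"PK\x03\x04":
        raise ValueError("bad local file header signature")
    name_len, extra_len = header[9], header[10]
    fileobj.seek(name_len + extra_len, 1)  # skip name and extra field

    # Step 4: slice the body. The size comes from the central directory
    # because the local header may hold zeros when a data descriptor
    # follows the body.
    raw = fileobj.read(info.compress_size)

    if info.compress_type == zipfile.ZIP_STORED:
        return raw
    if info.compress_type == zipfile.ZIP_DEFLATED:
        return zlib.decompress(raw, -15)  # raw deflate stream, no zlib header
    raise NotImplementedError(f"compression method {info.compress_type}")
```

The returned bytes are what a DataModel decoder would receive in step 5. When the member is stored, no decompression happens at this layer, so the slice is a plain read at a known offset.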

