Rename any .pbix file to .zip and you can open it with an archive tool immediately. That outer container is not the most exotic part of the format, but it is still important because it defines the first stable boundary a parser can rely on.
At this layer, PBIX is a package format. The interesting work starts when one of those package members, DataModel, turns out to be a compressed Analysis Services backup that unfolds into a VertiPaq workspace.
Where This Fits
This article is the container-level companion to What Is a PBIX File?. It covers the outer package and the route to DataModel. For the next step down, continue with The DataModel: Power BI’s Embedded Analysis Services Engine.
Typical Top-Level Entries
In practice, a PBIX file normally contains a recognizable set of members:
- DataModel
- Mashup
- Report/Layout
- Report/StaticResources
- SecurityBindings
- Connections
- [Content_Types].xml
- Version
Not all of those matter equally for this launch batch of articles. The report-side files are useful if your goal is layout extraction. The storage-focused articles on this site treat DataModel as the main event because it contains the semantic model and the imported column data.
The ZIP Layer Is Still Worth Modeling Properly
A purpose-built PBIX parser only needs the parts of ZIP that help it reach DataModel:
- the end-of-central-directory record
- the central directory entries
- the local header for the DataModel entry
The end-of-central-directory record sits at the very end of the file and points to the central directory. When there is no archive comment, it starts exactly 22 bytes before the end; when a comment is present, a reader has to scan backward for the record's signature instead. Walking the central directory gives you every entry’s name, size, and local-header offset. A minimal Kaitai-style description is enough:
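instances:
  end_of_central_dir:
    # 22 bytes from the end; valid only when there is no archive comment
    pos: _root._io.size - 22
    # type name assumed here; its definition is not shown in this excerpt
    type: end_of_central_dir
  central_dir:
    pos: end_of_central_dir.ofs_central_dir
    type: central_dir_entry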
That is enough to locate DataModel precisely without unpacking the entire file to disk first.
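For readers who prefer to see the same walk outside Kaitai, here is a minimal Python sketch using only `struct`. It is an illustration under stated assumptions, not the parser used by any particular tool: the function names `find_eocd` and `walk_central_dir` are hypothetical, ZIP64 archives are not handled, and the dictionary keys simply reuse the field names from this article.

```python
import struct

EOCD_SIG = b"PK\x05\x06"   # end-of-central-directory signature
CDIR_SIG = b"PK\x01\x02"   # central directory entry signature
EOCD_MIN = 22              # EOCD size when the archive comment is empty
CDIR_FIXED = 46            # fixed part of a central directory entry


def find_eocd(data: bytes) -> int:
    """Return the EOCD offset, scanning backward so that archives with a
    trailing comment (at most 65535 bytes) are still handled."""
    start = max(0, len(data) - EOCD_MIN - 0xFFFF)
    pos = data.rfind(EOCD_SIG, start)
    if pos < 0:
        raise ValueError("no end-of-central-directory record found")
    return pos


def walk_central_dir(data: bytes):
    """Yield one dict per central directory entry (no ZIP64 support)."""
    eocd = find_eocd(data)
    # After the signature: two disk numbers, two entry counts,
    # directory size, directory offset, comment length.
    _, _, _, total, _, ofs, _ = struct.unpack_from("<HHHHIIH", data, eocd + 4)
    pos = ofs
    for _ in range(total):
        (sig, _made, _need, _flags, method, _time, _date, crc,
         csize, usize, nlen, xlen, clen, _disk, _iattr, _eattr,
         lho) = struct.unpack_from("<4sHHHHHHIIIHHHHHII", data, pos)
        if sig != CDIR_SIG:
            raise ValueError(f"bad central directory signature at {pos}")
        raw_name = data[pos + CDIR_FIXED : pos + CDIR_FIXED + nlen]
        yield {
            # Simplification: names are decoded as UTF-8 regardless of flags.
            "file_name": raw_name.decode("utf-8", "replace"),
            "compression_method": method,
            "crc32": crc,
            "len_body_compressed": csize,
            "len_body_uncompressed": usize,
            "ofs_local_header": lho,
        }
        # Skip the variable-length name, extra field, and entry comment.
        pos += CDIR_FIXED + nlen + xlen + clen
```

Calling `next(e for e in walk_central_dir(data) if e["file_name"] == "DataModel")` on the bytes of a PBIX file would then give the one entry this article cares about. Reading the whole file into memory keeps the sketch short; a production reader would more likely seek within an open file.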
Why DataModel Is Usually Stored Raw
One detail worth checking in a real PBIX file is the ZIP compression_method of the DataModel entry. In the files I work with, it is typically stored with ZIP compression_method = 0, meaning the outer ZIP layer is not compressing that member.
That matches the reverse-engineering path from my earlier VertiPaq write-up: the meaningful compression lives inside DataModel, not at the outer package layer.
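If you want to check this on your own files, Python's standard `zipfile` module exposes the method as `ZipInfo.compress_type`. The helper below is a small sketch; the name `datamodel_is_stored` is made up for this example, and `pbix` may be a path or a seekable binary file object.

```python
import zipfile


def datamodel_is_stored(pbix, member: str = "DataModel") -> bool:
    """Return True if the member uses ZIP method 0 (stored, no compression)."""
    with zipfile.ZipFile(pbix) as zf:
        info = zf.getinfo(member)  # raises KeyError if the member is absent
        return info.compress_type == zipfile.ZIP_STORED
```

A `False` result would not mean the file is invalid, only that the outer ZIP layer compressed the member and a reader would have to inflate it before looking at the inner format.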
What the ZIP Archive Tells You
The archive layer can answer several practical questions quickly:
- which major PBIX components are present
- where each component sits in the file
- how large each component is
- whether a component is stored or compressed at the ZIP layer
Those facts are enough to make a parser efficient. You can seek directly to the entry you care about instead of treating the file as a monolithic blob.
The fields that matter most during this pass are usually:
- compression_method
- crc32
- len_body_compressed
- len_body_uncompressed
- ofs_local_header
- file_name
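The field names above follow the Kaitai-style description. If you are exploring with Python's `zipfile` instead, the same information is available on `ZipInfo` objects under different names. The sketch below shows one possible mapping; `describe_entries` is a hypothetical helper, not part of any library.

```python
import zipfile


def describe_entries(pbix):
    """Return one dict per archive member, keyed by this article's field names."""
    rows = []
    with zipfile.ZipFile(pbix) as zf:
        for info in zf.infolist():
            rows.append({
                "file_name": info.filename,
                "compression_method": info.compress_type,  # 0 = stored, 8 = deflate
                "crc32": info.CRC,
                "len_body_compressed": info.compress_size,
                "len_body_uncompressed": info.file_size,
                "ofs_local_header": info.header_offset,
            })
    return rows
```

Printing these rows for a real file is a quick way to see which components are present, how large they are, and where each one starts.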
What the ZIP Archive Cannot Tell You
The outer package is only the envelope. It cannot tell you:
- which tables exist in the semantic model
- how a given column is encoded
- which files belong to which logical columns
- how the imported table data is compressed
Those questions are answered deeper in the stack:
- The DataModel article covers the XPress9 and ABF layers.
- Inside metadata.sqlitedb: Tables, Columns, Measures & Relationships covers the metadata database.
- Reconstructing Column Data from .idf and .idfmeta covers the actual segment decoding.
A Practical Traversal Strategy
A straightforward PBIX reader usually follows this pattern:
- read the end-of-central-directory record
- walk the central directory until DataModel is found
- resolve the corresponding local header
- slice the raw bytes of the member
- hand the bytes to a DataModel decoder
That separation of concerns turns out to be very helpful. The ZIP parser only needs to be good at locating and slicing archive members. The storage parser can then focus on Analysis Services and VertiPaq details.
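As a rough illustration of that split, the sketch below covers only the ZIP side. It assumes the member is stored (method 0), lets the standard `zipfile` module read the central directory, and then resolves the local header by hand to find where the member's bytes begin. The name `slice_member` is hypothetical, and the decoder that would receive the bytes is deliberately left out.

```python
import struct
import zipfile
from typing import BinaryIO

LOCAL_SIG = b"PK\x03\x04"  # local file header signature
LOCAL_FIXED = 30           # fixed part of a local file header


def slice_member(fp: BinaryIO, member: str = "DataModel") -> bytes:
    """Return the raw bytes of a stored member from a seekable binary file."""
    # Steps 1-2: zipfile reads the EOCD record and the central directory.
    with zipfile.ZipFile(fp) as zf:
        info = zf.getinfo(member)
    if info.compress_type != zipfile.ZIP_STORED:
        raise ValueError(f"{member} is not stored; inflate it instead")

    # Step 3: resolve the local header. Its name and extra-field lengths can
    # differ from the central directory copy, so read them from the header.
    fp.seek(info.header_offset)
    sig, name_len, extra_len = struct.unpack("<4s22xHH", fp.read(LOCAL_FIXED))
    if sig != LOCAL_SIG:
        raise ValueError("local file header signature mismatch")

    # Step 4: slice the member body, sized from the central directory entry.
    fp.seek(info.header_offset + LOCAL_FIXED + name_len + extra_len)
    return fp.read(info.compress_size)
```

With a real file this would be used as `with open(path, "rb") as fp: raw = slice_member(fp)`, and `raw` is what a DataModel decoder would take as input. Taking the size from the central directory rather than the local header avoids trouble with archives that defer sizes to a trailing data descriptor.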
Related Articles
- Read What Is a PBIX File? for the high-level overview.
- Continue with The DataModel: Power BI’s Embedded Analysis Services Engine for the inner container.
- Jump to Parsing PBIX Files with Python (pbixray) if you want the implementation surface rather than the package details.