If you only inspect one file inside DataModel, make it metadata.sqlitedb. This embedded SQLite database is the semantic spine of a PBIX file: it describes the model, the tables, the columns, the measures, the relationships, the partitions, and a large tail of other Tabular objects.
For a parser, this file is the difference between raw storage fragments and a coherent model. Without it, .idf and .dictionary files are just blobs with IDs in their filenames. With it, those blobs become typed columns in named tables.
Where This Fits
This article covers the metadata layer inside DataModel. It sits between The DataModel: Power BI’s Embedded Analysis Services Engine and the lower-level storage articles on dictionaries and hash indexes and column data reconstruction.
metadata.sqlitedb Mirrors the TMSCHEMA World
Power BI and Analysis Services expose semantic metadata through $System.TMSCHEMA_* DMVs. One of the nicest features of PBIX is that the same information is already embedded in the file as SQLite tables, so the core TMSCHEMA rowsets have a direct SQL equivalent you can run against metadata.sqlitedb without a live Analysis Services connection.
For example, TMSCHEMA_TABLES maps directly to the [Table] table:
SELECT
    t.ID,
    t.Name,
    t.Description,
    t.DataCategory,
    t.IsHidden
FROM [Table] t
WHERE t.SystemFlags = 0;
Likewise, TMSCHEMA_COLUMNS, TMSCHEMA_MEASURES, TMSCHEMA_RELATIONSHIPS, and TMSCHEMA_PARTITIONS can all be expressed as ordinary SQL over the embedded database.
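As one illustration of that claim, here is a minimal sketch of a TMSCHEMA_MEASURES-style query run through Python's built-in sqlite3 module. The column names follow the TMSCHEMA_MEASURES rowset and are assumptions about the embedded schema. The helper names (list_measures, demo_measures_db) are hypothetical, and the in-memory database is a stand-in so the example runs without a real PBIX file.

```python
import sqlite3

# Assumed column names, modelled on the TMSCHEMA_MEASURES rowset.
MEASURES_SQL = """
SELECT
    t.Name AS TableName,
    m.Name,
    m.Expression,
    m.FormatString
FROM [Measure] m
JOIN [Table] t ON m.TableID = t.ID
ORDER BY 1, 2;
"""


def list_measures(conn: sqlite3.Connection) -> list:
    """Return (TableName, Name, Expression, FormatString) tuples."""
    return conn.execute(MEASURES_SQL).fetchall()


def demo_measures_db() -> sqlite3.Connection:
    """Build a tiny stand-in database (hypothetical, heavily simplified)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE [Table] (ID INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE [Measure] (
            ID INTEGER PRIMARY KEY,
            TableID INTEGER,
            Name TEXT,
            Expression TEXT,
            FormatString TEXT
        );
        INSERT INTO [Table] VALUES (1, 'Sales');
        INSERT INTO [Measure]
            VALUES (10, 1, 'Total Sales', 'SUM(Sales[Amount])', '#,0.00');
    """)
    return conn


if __name__ == "__main__":
    for row in list_measures(demo_measures_db()):
        print(row)
```

Against a real file, the same function would take a connection to the extracted metadata.sqlitedb instead of the stand-in.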
The Core Tables Worth Knowing First
The embedded schema is much larger than most people expect — several dozen tables covering every Tabular object type. The ones worth knowing first are the obvious logical tables:
- Model
- Table
- Column
- Measure
- Partition
- Relationship
- Hierarchy
- Level
- Role
- TablePermission
It also includes the bridge tables that connect semantic objects to physical storage:
- StorageFolder
- StorageFile
- TableStorage
- ColumnStorage
That second group is especially important when you move from metadata browsing to table reconstruction.
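A quick way to see how large the embedded schema is, and whether the bridge tables are present, is to ask SQLite for its own table list. The sketch below uses only the standard sqlite_master catalog. The helper names are hypothetical, and the stand-in database contains just a handful of empty tables for demonstration.

```python
import sqlite3

# The bridge tables named above.
BRIDGE_TABLES = {"StorageFolder", "StorageFile", "TableStorage", "ColumnStorage"}


def list_user_tables(conn: sqlite3.Connection) -> list:
    """Return the names of all non-internal tables, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    # SQLite reserves the "sqlite_" prefix for its internal tables.
    return [name for (name,) in rows if not name.startswith("sqlite_")]


def missing_bridge_tables(conn: sqlite3.Connection) -> set:
    """Return the bridge tables that are absent from the database."""
    return BRIDGE_TABLES - set(list_user_tables(conn))


def demo_catalog_db() -> sqlite3.Connection:
    """Stand-in database with empty tables (illustration only)."""
    conn = sqlite3.connect(":memory:")
    for name in ["Table", "StorageFolder", "StorageFile", "TableStorage"]:
        conn.execute(f"CREATE TABLE [{name}] (ID INTEGER PRIMARY KEY)")
    return conn


if __name__ == "__main__":
    conn = demo_catalog_db()
    print(list_user_tables(conn))
    print(missing_bridge_tables(conn))
```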
Tables, Columns, and Measures
At the user-facing level, this is the part of the model people recognize immediately.
[Table] gives you logical tables. [Column] gives you visible and hidden columns, inferred and explicit names, data types, display folders, sort metadata, and expression information. [Measure] stores DAX expressions, format strings, folders, and descriptions.
The corresponding mapping query for columns looks like this:
SELECT
    c.ID,
    t.Name AS TableName,
    COALESCE(c.ExplicitName, c.InferredName) AS Name,
    COALESCE(c.ExplicitDataType, c.InferredDataType) AS DataType,
    c.Expression,
    c.FormatString
FROM [Column] c
JOIN [Table] t ON c.TableID = t.ID
WHERE c.Type IN (1, 2);
That is already enough to drive useful tooling such as schema inventories, DAX audits, and model documentation generators.
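A schema inventory, for instance, can be sketched in a few lines on top of the column query. The function name schema_inventory is hypothetical, and the stand-in database is simplified. Its numeric DataType and Type values are placeholders chosen only to exercise the COALESCE fallback and the Type filter, not documented codes.

```python
import sqlite3
from collections import defaultdict

# Trimmed version of the column mapping query above.
COLUMNS_SQL = """
SELECT
    t.Name AS TableName,
    COALESCE(c.ExplicitName, c.InferredName) AS Name,
    COALESCE(c.ExplicitDataType, c.InferredDataType) AS DataType
FROM [Column] c
JOIN [Table] t ON c.TableID = t.ID
WHERE c.Type IN (1, 2)
ORDER BY 1, 2;
"""


def schema_inventory(conn: sqlite3.Connection) -> dict:
    """Map each table name to a list of (column name, data type) pairs."""
    inventory = defaultdict(list)
    for table, name, data_type in conn.execute(COLUMNS_SQL):
        inventory[table].append((name, data_type))
    return dict(inventory)


def demo_columns_db() -> sqlite3.Connection:
    """Stand-in database; all values are illustrative placeholders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE [Table] (ID INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE [Column] (
            ID INTEGER PRIMARY KEY,
            TableID INTEGER,
            ExplicitName TEXT,
            InferredName TEXT,
            ExplicitDataType INTEGER,
            InferredDataType INTEGER,
            Type INTEGER
        );
        INSERT INTO [Table] VALUES (1, 'Sales');
        -- Explicit name and type are set.
        INSERT INTO [Column] VALUES (11, 1, 'Amount', NULL, 8, NULL, 1);
        -- Only inferred values exist, so COALESCE falls back to them.
        INSERT INTO [Column] VALUES (12, 1, NULL, 'Region', NULL, 2, 2);
        -- Type outside (1, 2): excluded by the WHERE clause.
        INSERT INTO [Column] VALUES (13, 1, 'RowNumber', NULL, 6, NULL, 3);
    """)
    return conn


if __name__ == "__main__":
    print(schema_inventory(demo_columns_db()))
```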
Relationships and Partitions
Relationships are stored as plain metadata too. The important fields are all there:
- from-table and to-table IDs
- from-column and to-column IDs
- cardinality
- active and inactive flags
- cross-filtering behavior
- referential-integrity hints
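The fields listed above can be pulled together with a query that resolves the table and column IDs into names. This is a sketch under assumptions: the column names (FromTableID, FromCardinality, IsActive, CrossFilteringBehavior, RelyOnReferentialIntegrity, and so on) follow the TMSCHEMA_RELATIONSHIPS rowset, the helper names are hypothetical, and the numeric codes in the stand-in data are returned as-is rather than interpreted.

```python
import sqlite3

# Assumed column names, modelled on the TMSCHEMA_RELATIONSHIPS rowset.
RELATIONSHIPS_SQL = """
SELECT
    ft.Name AS FromTable,
    COALESCE(fc.ExplicitName, fc.InferredName) AS FromColumn,
    tt.Name AS ToTable,
    COALESCE(tc.ExplicitName, tc.InferredName) AS ToColumn,
    r.FromCardinality,
    r.ToCardinality,
    r.IsActive,
    r.CrossFilteringBehavior,
    r.RelyOnReferentialIntegrity
FROM [Relationship] r
JOIN [Table]  ft ON r.FromTableID  = ft.ID
JOIN [Column] fc ON r.FromColumnID = fc.ID
JOIN [Table]  tt ON r.ToTableID    = tt.ID
JOIN [Column] tc ON r.ToColumnID   = tc.ID;
"""


def list_relationships(conn: sqlite3.Connection) -> list:
    """Return one dict per relationship, keyed by result column name."""
    cur = conn.execute(RELATIONSHIPS_SQL)
    names = [col[0] for col in cur.description]
    return [dict(zip(names, row)) for row in cur]


def demo_relationships_db() -> sqlite3.Connection:
    """Stand-in database; the numeric codes are illustrative placeholders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE [Table] (ID INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE [Column] (
            ID INTEGER PRIMARY KEY, TableID INTEGER,
            ExplicitName TEXT, InferredName TEXT
        );
        CREATE TABLE [Relationship] (
            ID INTEGER PRIMARY KEY,
            FromTableID INTEGER, FromColumnID INTEGER,
            ToTableID INTEGER, ToColumnID INTEGER,
            FromCardinality INTEGER, ToCardinality INTEGER,
            IsActive INTEGER, CrossFilteringBehavior INTEGER,
            RelyOnReferentialIntegrity INTEGER
        );
        INSERT INTO [Table] VALUES (1, 'Sales'), (2, 'Customer');
        INSERT INTO [Column] VALUES
            (11, 1, 'CustomerKey', NULL),
            (21, 2, 'CustomerKey', NULL);
        INSERT INTO [Relationship] VALUES (100, 1, 11, 2, 21, 2, 1, 1, 1, 0);
    """)
    return conn


if __name__ == "__main__":
    for rel in list_relationships(demo_relationships_db()):
        print(rel)
```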
Partitions matter for a different reason. They tell you how data is sourced and organized, and they also form part of the bridge to the physical files that store imported column data.
The key lesson is that metadata.sqlitedb does not only describe business logic. It also provides the lookup keys a storage parser needs.
Why SQLite Is Such a Good Fit
Embedding semantic metadata in SQLite makes the file compact, queryable, and easy to inspect with ordinary tools. It also makes reverse engineering much more practical because you can test assumptions quickly:
- do table IDs match the folder names in the storage layer?
- which columns have dictionaries?
- which partitions back which tables?
- which roles and permissions exist?
That queryability is a big reason pbixray can expose so much of the model as DataFrames.
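Two of the questions above (which partitions back which tables, and which roles and permissions exist) can be turned into a small set of named checks. This is a sketch, not pbixray's own code: the column names are assumptions based on the corresponding TMSCHEMA rowsets, and run_checks and the stand-in database are hypothetical.

```python
import sqlite3

# Named exploratory queries; column names are assumptions.
CHECKS = {
    "partitions_per_table": """
        SELECT t.Name AS TableName, p.Name AS PartitionName
        FROM [Partition] p
        JOIN [Table] t ON p.TableID = t.ID
        ORDER BY 1, 2;
    """,
    "role_permissions": """
        SELECT r.Name AS RoleName, t.Name AS TableName, tp.FilterExpression
        FROM [TablePermission] tp
        JOIN [Role] r ON tp.RoleID = r.ID
        JOIN [Table] t ON tp.TableID = t.ID
        ORDER BY 1, 2;
    """,
}


def run_checks(conn: sqlite3.Connection) -> dict:
    """Run every named check and return its rows keyed by check name."""
    return {name: conn.execute(sql).fetchall() for name, sql in CHECKS.items()}


def demo_checks_db() -> sqlite3.Connection:
    """Stand-in database with one partition and one role (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE [Table] (ID INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE [Partition] (
            ID INTEGER PRIMARY KEY, TableID INTEGER, Name TEXT
        );
        CREATE TABLE [Role] (ID INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE [TablePermission] (
            ID INTEGER PRIMARY KEY, RoleID INTEGER,
            TableID INTEGER, FilterExpression TEXT
        );
        INSERT INTO [Table] VALUES (1, 'Sales');
        INSERT INTO [Partition] VALUES (5, 1, 'Sales-partition');
        INSERT INTO [Role] VALUES (7, 'Positive Only');
        INSERT INTO [TablePermission] VALUES (9, 7, 1, '[Amount] > 0');
    """)
    return conn


if __name__ == "__main__":
    for name, rows in run_checks(demo_checks_db()).items():
        print(name, rows)
```

Keeping the checks in a dictionary makes it cheap to add another query each time a new assumption about the file needs testing.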
How pbixray Uses the Metadata Layer
In the current Python implementation, PBIXRay slices metadata.sqlitedb out of the DataModel stream, opens it through a SQLite handler, and materializes both convenience views and direct TMSCHEMA-like frames:
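from pbixray import PBIXRay

# Load the PBIX file; the frames below are built from metadata.sqlitedb.
model = PBIXRay("report.pbix")

# TMSCHEMA-like frames, followed by the relationships convenience view.
print(model.tmschema_tables[["Name", "Description"]])
print(model.tmschema_columns[["TableName", "Name", "DataType"]])
print(model.relationships)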
That metadata then drives the table-extraction path. The decoder does not guess which files belong to a column. It reads the mapping from the metadata layer first.
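The step of opening the extracted database through a SQLite handler can be approximated with the standard library alone. The sketch below is not pbixray's actual implementation. It assumes the metadata.sqlitedb bytes have already been sliced out of the decompressed DataModel stream, and the names open_sqlite_bytes and make_demo_image are hypothetical. It writes the bytes to a temporary file, copies them into an in-memory connection with Connection.backup, and removes the file again.

```python
import os
import sqlite3
import tempfile

SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of every SQLite 3 file


def open_sqlite_bytes(db_bytes: bytes) -> sqlite3.Connection:
    """Load a SQLite file image into an in-memory connection."""
    if not db_bytes.startswith(SQLITE_MAGIC):
        raise ValueError("not a SQLite 3 database image")
    fd, path = tempfile.mkstemp(suffix=".sqlitedb")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(db_bytes)
        disk = sqlite3.connect(path)
        try:
            mem = sqlite3.connect(":memory:")
            disk.backup(mem)  # copy the on-disk database into memory
        finally:
            disk.close()
    finally:
        os.remove(path)  # the in-memory copy no longer needs the file
    return mem


def make_demo_image() -> bytes:
    """Create a tiny SQLite file and return its raw bytes (stand-in data)."""
    fd, path = tempfile.mkstemp(suffix=".sqlitedb")
    os.close(fd)
    try:
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE [Table] (ID INTEGER PRIMARY KEY, Name TEXT)")
        conn.execute("INSERT INTO [Table] VALUES (1, 'Sales')")
        conn.commit()
        conn.close()
        with open(path, "rb") as fh:
            return fh.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    conn = open_sqlite_bytes(make_demo_image())
    print(conn.execute("SELECT ID, Name FROM [Table]").fetchall())
```

Working on an in-memory copy means no temporary files are left behind and the original bytes are never modified by exploratory queries.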
Related Articles
- Read The DataModel: Power BI’s Embedded Analysis Services Engine for the filesystem context around this database.
- Read VertiPaq Dictionaries and Hash Indexes for the value-mapping layer that sits below this metadata.
- Read Parsing PBIX Files with Python (pbixray) for the practical extraction workflow built on top of it.