The concept of data products, or ‘data as a product’, has gained significant traction over the last few years. Data products are central to data marketplaces, and have more recently been recognized as a core part of data mesh. But a broadly accepted definition of a data product remains elusive.
There have been many attempts to define it. Some definitions are helpful, while others are narrow or self-serving, tied to a particular technology or methodology. Most fall on a spectrum between two points. At one end there’s the ‘product mindset’ approach, which considers data products in terms of their purpose, optimizing value, and how they’re managed to serve that purpose. At the other end there are technical specifications for what constitutes a data product and how it’s managed through its lifecycle. So, what is a data product? Something data-related that adds value for a target audience, or a defined set of technical components?
For data product managers, it’s almost always the former. This is a topic that’s discussed regularly on the Data Product Mindset Podcast. Not only is there a strong consensus on optimizing for business value, but I’ve found that experienced data product managers will often refuse to be drawn on defining what a data product is. From their perspective, it’s an academic question that’s of little practical use when trying to deliver value. In this case, data products tend to be defined as:
One or more data-related assets that deliver a value proposition to a defined target audience, that are packaged and permissioned as a product to deliver an ongoing service.
The benefit of this type of definition is that it’s highly flexible with regard to types of data products and emphasizes elements of data product management like ‘value proposition’ and ‘target audience’, suggesting purpose and planning. However, it can also be a bit nebulous, particularly for the uninitiated or those who have to implement and manage a data product in practice.
For architects and data engineers, the preferred definition tends to be more technical. Their roles require a blueprint for what needs to be built and managed, as they will typically be held responsible if what they produce fails to meet the specified requirements. In this case, data products may be defined more in relation to architecture and desired characteristics such as observability, quality, portability, auditability and security. These tend to be aligned to a common ‘standard’ that represents a ‘data product’. In this more technical realm, data products tend to be defined as:
A self-describing object containing one or more data assets, subject to full-lifecycle management — create, read, update, and delete (CRUD) — and with SLAs relating to content, structure, quality, update frequency, and levels of support.
This type of definition is helpful when aiming to build trust and acceptance in data products. It also provides a clear and practical measure of what a data product is, and is not. However, it’s also rigid and can fail to acknowledge that data products can be of poor quality, unreliable, and a messy combination of formats, yet still deliver tangible benefit.
This divergence in definitions can create a healthy friction that balances the needs of different roles. It does raise the question of whether a single definition is needed at all, and whether that definition has to be the same in every organization.
One approach is for different roles and departments to have the definition they need to do their jobs. For data product managers, that’s about data product management. For data engineers, that’s about a technical specification. Assuming the two definitions don’t contradict each other, this approach can work, although it’s crucial to acknowledge there’s more than one definition and to ensure there isn’t a sudden proliferation of definitions that creates inconsistency.
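For readers who think in code, the technical definition can be pictured as a small data structure with a strict pass or fail check. This is only a rough sketch, assuming Python; the names `ServiceLevels`, `DataProductDescriptor`, `missing_slas` and `meets_standard` are hypothetical and not taken from any platform or standard.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ServiceLevels:
    """Hypothetical SLA fields, mirroring the areas named in the definition."""
    content: Optional[str] = None
    structure: Optional[str] = None
    quality: Optional[str] = None
    update_frequency: Optional[str] = None
    support: Optional[str] = None


@dataclass
class DataProductDescriptor:
    """Hypothetical self-describing wrapper around one or more data assets."""
    name: str
    description: str
    assets: list[str] = field(default_factory=list)
    slas: ServiceLevels = field(default_factory=ServiceLevels)

    def missing_slas(self) -> list[str]:
        """Return the SLA areas that have not been specified."""
        return [area for area, value in vars(self.slas).items() if value is None]

    def meets_standard(self) -> bool:
        """Strict reading: a description, at least one asset, and every SLA defined."""
        return bool(self.description) and bool(self.assets) and not self.missing_slas()
```

Under this strict reading, a useful but loosely managed pair of spreadsheet exports with no SLAs would fail `meets_standard()`, which is exactly the rigidity described above.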
Another approach is to create a compound definition that balances the needs of the two groups. In that case, data products can be defined as:
One or more data-related assets that deliver a value proposition to a defined target audience, that are packaged and permissioned as a product to deliver an ongoing service. A data product should ideally be self-describing and subject to full-lifecycle management (CRUD). It may also require SLAs relating to content, structure, quality, update frequency, and levels of support.
Making the technical specification clear but optional provides flexibility while also setting a standard for what good looks like.
Regardless of the approach taken, it’s important to quickly move beyond the debate about what a data product is and let people get on with what’s really important: building great data products, delivering them to users, and realizing the pent-up value in your data assets.
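One way to picture the ‘clear, but optional’ idea is a record in which the value-related fields are mandatory and the technical ones are not. The sketch below assumes Python, and every name in it (`DataProduct`, `TechnicalSpec`, `gaps`) is hypothetical. It leaves out packaging and permissioning for brevity.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TechnicalSpec:
    """Optional technical elements of the compound definition (hypothetical names)."""
    self_describing: bool = False
    lifecycle_managed: bool = False  # full-lifecycle management (CRUD)
    slas: dict[str, str] = field(default_factory=dict)


@dataclass
class DataProduct:
    """Required: assets, value proposition, target audience. Optional: technical spec."""
    assets: list[str]
    value_proposition: str
    target_audience: str
    technical_spec: Optional[TechnicalSpec] = None

    def __post_init__(self) -> None:
        # The product-mindset elements are treated as mandatory in this sketch.
        if not self.assets or not self.value_proposition or not self.target_audience:
            raise ValueError("assets, value proposition and target audience are required")

    def gaps(self) -> list[str]:
        """List the optional 'what good looks like' elements not yet in place."""
        spec = self.technical_spec or TechnicalSpec()
        missing = []
        if not spec.self_describing:
            missing.append("self-describing")
        if not spec.lifecycle_managed:
            missing.append("full-lifecycle management")
        if not spec.slas:
            missing.append("SLAs")
        return missing
```

A product with no technical specification is still a valid `DataProduct` here; `gaps()` simply reports how far it is from the standard, rather than rejecting it.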