Data catalog vs. data marketplace: What’s the difference?
What’s the difference between a data catalog and a data marketplace?
Repeat after me: A data catalog is not a data marketplace. A data marketplace is not a data catalog.
The distinction may be obvious when comparing a data catalog and a public data marketplace (a place that enables buying and selling of data). But the difference is less clear when comparing a data catalog and an enterprise data marketplace.
Before I explain how their features and the users they serve differ, let’s first look at how they differ in purpose.
Purpose of a data catalog vs. data marketplace
The purpose of a data catalog is to understand and organize all the data within an organization. A catalog is primarily used by data governance teams to establish an inventory and provide governance over the data estate. This is particularly important for complying with standards, regulations, and laws. It also helps manage the risks associated with sensitive or valuable data elements.
The purpose of a data marketplace is to facilitate the access, use, and distribution of ‘ready-to-use’ data (including data products) within and between organizations. A data marketplace is primarily used to reduce time to access and time to value for a wide range of data users. This is particularly important when building applications, delivering insights, and creating data-driven business processes.
Features of a data catalog
The typical features found in a data catalog include:
- Metadata management: This should include a ‘data asset inventory’ listing all data assets and a ‘metadata repository’ that stores information about each data asset.
- Data discovery: The ability for users to search and browse data assets using keywords, filters, tags, data type, source, and owner.
- Data governance, which can include:
  - Data lineage tools to represent data flows from source to target and how data is transformed
  - Data owner/stewardship information with the responsibilities of individuals in relation to specific data assets
  - Tools to ensure data usage is managed in line with standards, regulations, and laws
- Data quality: The automatic generation of detailed metadata to provide insight and assurance on data quality, including the accuracy, completeness, consistency, and timeliness of data.
- Collaboration and documentation: Users can add comments, notes, and documentation to data assets in order to foster collaboration and knowledge sharing. Social features and user-generated content (UGC) — like ratings and reviews — capture user feedback and can improve the overall experience.
- Usage analytics: Usage metrics provide insights into how often data assets are accessed, by whom, and for what purpose. Data access controls allow organizations to manage who can view or edit data assets.
- AI: Some catalogs have developed more advanced features using AI and machine learning to improve data asset recommendations, metadata generation and analysis, and the populating of business glossaries.
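As a rough illustration of the metadata management and data discovery features listed above, here is a minimal Python sketch. It is not how any particular catalog product works: the `DataAsset` and `Catalog` names, their fields, and the sample assets are all hypothetical, chosen only to show an inventory that can be searched by keyword and filtered by attributes such as source and owner.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """One entry in a hypothetical metadata repository."""
    name: str
    source: str
    owner: str
    data_type: str
    tags: set[str] = field(default_factory=set)
    description: str = ""


class Catalog:
    """A toy data asset inventory with keyword and filter search."""

    def __init__(self) -> None:
        self._assets: dict[str, DataAsset] = {}

    def register(self, asset: DataAsset) -> None:
        self._assets[asset.name] = asset

    def search(self, keyword: str = "", **filters: str) -> list[str]:
        """Return sorted names of assets matching the keyword and all filters.

        The keyword is matched case-insensitively against name, description,
        and tags; filters compare exact attribute values (e.g. owner, source).
        """
        keyword = keyword.lower()
        hits = []
        for asset in self._assets.values():
            text = " ".join(
                [asset.name, asset.description, *sorted(asset.tags)]
            ).lower()
            if keyword and keyword not in text:
                continue
            if any(getattr(asset, key) != value for key, value in filters.items()):
                continue
            hits.append(asset.name)
        return sorted(hits)


# Hypothetical sample inventory.
catalog = Catalog()
catalog.register(DataAsset("orders", "warehouse", "sales-team", "table",
                           {"pii", "finance"}, "Customer orders"))
catalog.register(DataAsset("web_clicks", "event-stream", "marketing", "stream",
                           {"behavioural"}, "Clickstream events"))
catalog.register(DataAsset("invoices", "warehouse", "finance-team", "table",
                           {"finance"}, "Issued invoices"))

print(catalog.search("finance"))                                # ['invoices', 'orders']
print(catalog.search(source="warehouse", owner="sales-team"))   # ['orders']
```

A real catalog would add lineage, quality metrics, and access controls on top of this; the sketch only covers the inventory-plus-search core.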
Features of a data marketplace
The typical features found in a data marketplace include:
- Data product management: Users should be able to create and manage data as a product, by connecting to data sources, defining assets, and combining any number and type of asset into a product by adding packaging and permissioning.
- Data product catalog/storefront: Similar capabilities to a data catalog, but for data products rather than just data assets. Data products will have dedicated summaries with rich packaging explaining and demonstrating use cases to enable assessment and comparison.
- Permissions: Users can gain subscription-based access to data products through pre-provisioned, self-service, or access request workflows. Subscription plans are based on policies that control data access including duration, permitted use, data contracts, pricing, single/multi-user, etc.
- Transformation and customization: Ability to transform and customize data with both technical and low/no-code options, such as executing code/notebooks or filtering rows and columns. Access to and use of customized data products are subject to permission lineage, so the data owner maintains visibility and control.
- Analytics: Ability to analyze data with technical and low/no-code options to support a wide range of users and use cases. This may include sandboxes, clean rooms, data science workbenches, query engines, natural language queries, data apps, and interactive visualizations. Tooling, infrastructure, and data are orchestrated by the marketplace for a self-service experience.
- Distribution: Ability to distribute data via a range of mechanisms including APIs, cloud pipelines, SFTP, and desktop download. Distribution can be one-off or automated on an ongoing basis, either scheduled or event-driven (for example, when the data changes).
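To make the permissions feature above more concrete, here is a minimal Python sketch of a policy-based subscription check. It is an assumption-laden illustration, not a description of any real marketplace: the `SubscriptionPlan` and `Subscription` classes, the policy fields (duration, permitted use, maximum users), and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class SubscriptionPlan:
    """Hypothetical access policy attached to a data product."""
    product: str
    duration_days: int
    permitted_uses: frozenset[str]
    max_users: int


@dataclass
class Subscription:
    """A subscriber's grant under a given plan."""
    plan: SubscriptionPlan
    start: date
    users: set[str]

    def can_access(self, user: str, use: str, on: date) -> bool:
        """Check the user, purpose, and date against the plan's policy."""
        expires = self.start + timedelta(days=self.plan.duration_days)
        return (
            user in self.users
            and len(self.users) <= self.plan.max_users
            and use in self.plan.permitted_uses
            and self.start <= on < expires
        )


# Hypothetical 30-day, two-user plan that permits analytics use only.
plan = SubscriptionPlan("customer-360", 30, frozenset({"analytics"}), max_users=2)
sub = Subscription(plan, start=date(2024, 1, 1), users={"ana", "raj"})

print(sub.can_access("ana", "analytics", date(2024, 1, 15)))  # True
print(sub.can_access("ana", "resale", date(2024, 1, 15)))     # False: use not permitted
print(sub.can_access("raj", "analytics", date(2024, 2, 15)))  # False: plan expired
```

Keeping the policy in the plan rather than in each subscription means the data owner can reason about access rules per product, which is in the spirit of the visibility and control described above.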
Why do I need a data marketplace when I already have a data catalog?
If your organization has already invested in an enterprise data catalog, isn’t that supposed to support one-stop shopping for data?
According to industry expert Wayne Eckerson, the answer is “yes and no”. As he explains in this detailed guide, “a data catalog gets users halfway to data, while a data marketplace closes the loop.”