Internal data marketplaces: Everything you need to know

Author:

Anthony Cosgrove

Co-Founder

Date published:

9.18.2024


Practical guide to launching an internal data marketplace

Data marketplaces are all about solving the ‘last mile’ problem for data. This means closing the gap between data producers and consumers.

This is true whether you’re trying to access data directly in a source system, through an interface, or in a target location. This guide sets out practical considerations for launching your own internal data marketplace, sometimes referred to as an enterprise data marketplace — that is, a marketplace that empowers data access across your entire organization.

What are data marketplaces for?

Data marketplaces solve data access at scale. Solving the data access problem is critical for Chief Data Officers and their data teams to:

Accelerate data-driven outcomes
Provide architectural flexibility and avoid vendor lock-in
Foster an improved data culture
Deliver and demonstrate return on investment (ROI)
Improve the experience for business stakeholders

How do you know you need an internal data marketplace?

Ideally, you’ve already identified the need for an internal data marketplace, but if not, there are certain factors that indicate a pressing need.

Business users frustrated with time-to-access

The unfortunate reality is that business users share a seemingly universal complaint: it simply takes too long to access the data they need. Even when data requests are highly specific, it’s not uncommon for them to take weeks or even months to fulfill. Friction and delays lead to frustration and, in turn, a sense of futility. An internal data marketplace can short-circuit this issue by providing rapid access to data and tooling.

Low trust between data users and owners

It’s all too common for data users to mistrust or have low confidence in the data. Similarly, data owners don’t trust users with the data, and to mitigate risk, they put barriers in place. A data marketplace with proper governance capabilities enables data owners to grant access to users while tightly managing risk. Data users also gain clarity about the data, what it can be used for, and the source. This gives them greater confidence that they’re using the right data for their needs.

Existence — and ubiquity — of data silos

Data silos are a ubiquitous phenomenon in the modern enterprise. While there are practical reasons for them to exist, they can result in inhibited value creation, duplication of effort, wasted resources, and frustration. The primary reason for a data marketplace is to enable data access, which means overcoming data silos.

Multitude and diversity of data consumers

Users will have different technical skills, use cases, and preferred tools. Additionally, a significant amount of collaboration will be required between users to deliver successful outcomes from data. Internal data marketplaces are purpose-built to serve the full range of data consumers and the best ones will include collaborative and asynchronous workflows.

Marketplace or catalog?

A data catalog is a technology that creates an inventory of (typically) internal data. It's a way to start understanding what data is available, but catalogs tend not to provide a view of the permitted use cases of the data, multiple ways of accessing it, or the ability to manage data access across technical and organizational boundaries. Most firms already have at least one data catalog in place, which is a helpful foundation for a data marketplace.

Learn more in our explainer comparing data catalogs and marketplaces.


Consequences of not having an internal data marketplace

Now that we can recognize some of the indicators of needing an enterprise data marketplace, let’s briefly look at what happens when you don’t have one.

Technology, data, and AI projects take months or even years to deliver.
Data initiatives show relatively poor ROI — if they are even measured.
Inability to scale data access, whether through self-service or automation.
Low levels of data reusability; ideally, you want to be able to build something once and use it often.

The reality is that until you enable flexible, governed data access, you will struggle to deliver value at any scale.

What should data marketplaces enable?

When done well, internal or enterprise data marketplaces are transformative for organizations. But what’s required for success? Your data marketplace should enable:

Flexible data access

At source — Data owners may want their data to be accessed directly at source. The data may be too large to move, they might not want copies to be made, they may need to minimize latency, or they need to optimize query performance.
Via an interface — Data owners may want an interface to control the amount of data that’s accessible, provide specific functionality to user personas, or create a streamlined experience. For maximum flexibility, the marketplace should enable interfaces to query data wherever it’s stored.
In a target location — Data owners may be comfortable with copies of their data being made. Users may also need to use the data in a range of systems including on-premises and proprietary tooling.
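The three access routes above can be modeled as an explicit choice that the data owner makes for each product. The following is a minimal sketch, not a reference implementation: the names `AccessMode`, `DataProduct`, and `can_access` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class AccessMode(Enum):
    """The three access routes described above (labels are illustrative)."""
    AT_SOURCE = "at_source"              # query the data where it already lives
    VIA_INTERFACE = "via_interface"      # go through a controlled interface
    TARGET_LOCATION = "target_location"  # deliver a copy to the consumer's system


@dataclass(frozen=True)
class DataProduct:
    name: str
    allowed_modes: frozenset  # the AccessMode values the data owner permits


def can_access(product: DataProduct, requested: AccessMode) -> bool:
    """Return True only if the owner has permitted the requested route."""
    return requested in product.allowed_modes


# Hypothetical product: the owner allows in-place queries and an interface,
# but does not allow copies to be made.
claims = DataProduct(
    name="claims_history",
    allowed_modes=frozenset({AccessMode.AT_SOURCE, AccessMode.VIA_INTERFACE}),
)
print(can_access(claims, AccessMode.AT_SOURCE))
print(can_access(claims, AccessMode.TARGET_LOCATION))
```

Keeping the permitted routes on the product itself, rather than on each user, reflects the idea that the data owner decides how their data may be reached.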

Different user personas

To be successful as a single interface for data access, your data marketplace needs to work for a wide range of user personas — from data engineers and data scientists through to data analysts, business users, and executives. Serving these different users will have implications for everything from the look and feel of each user journey to the level of collaboration and asynchronous working that is required.

Any type of digital asset

Data sometimes means rows and columns, but often it means files, images, reports, and visualizations. To be successful, a data marketplace will need to support every digital asset that data owners provide, as well as every format.

A range of operating models

Your data marketplace will need to adapt to your business environment, with scalable controls around how data, organizations, and users are managed. It must enable one or more operating models, which may include:

Internal sharing and distribution: Distribute your data products to internal business users at scale. This model can support a data mesh architecture.
Internal collaboration: Share and collaborate on data across teams, divisions, and business units.
Data acquisition/integrating external data: Evaluate and adapt data from suppliers and redistribute internally.

If sharing data outside of your organization, additional operating models include:

Data commerce: Monetize data products, models, and data-related services.
External data sharing: Securely share data and models with external parties and service providers.
Partner collaboration: Collaborate with partners without losing custody of data or models.

What capabilities does your internal data marketplace need?

To effectively serve your business and the range of users and use cases, your internal data marketplace will need certain features and capabilities.

Connectors

Manage the creation and maintenance of connectors to read, write, and copy data to and from a range of sources such as cloud databases, data lakes/warehouses, on-premises, and desktops.

Data assets

Full lifecycle management of data assets. This includes ownership, entitlements, and lineage. Data assets should include tables, notebooks, images, visualizations, text, etc.

Data products

Full lifecycle management of data products, including asynchronous processes to create, publish, manage, and delete.

Subscriptions

Record and enforce the entitlements set out in the data product subscriptions covering duration of access, type of access, number of users, etc.
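As a rough illustration of what recording and enforcing these entitlements might involve, the sketch below checks duration of access, type of access, and number of users. The `Subscription` class, its fields, and the access-type labels are assumptions made for this example and do not describe any particular product.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Subscription:
    product: str
    access_type: str  # hypothetical labels, e.g. "query" or "export"
    start: date       # first day access is valid
    end: date         # last day access is valid
    max_users: int    # seat limit agreed in the subscription
    users: set = field(default_factory=set)

    def add_user(self, user: str) -> bool:
        """Give the user a seat if one is free; return whether they hold one."""
        if user in self.users:
            return True
        if len(self.users) >= self.max_users:
            return False
        self.users.add(user)
        return True

    def permits(self, user: str, access_type: str, on: date) -> bool:
        """Check the user, the type of access, and the date against the terms."""
        return (
            user in self.users
            and access_type == self.access_type
            and self.start <= on <= self.end
        )


sub = Subscription("sales_orders", "query", date(2024, 1, 1), date(2024, 12, 31), max_users=2)
sub.add_user("ana")
print(sub.permits("ana", "query", date(2024, 6, 1)))   # within the terms
print(sub.permits("ana", "export", date(2024, 6, 1)))  # wrong access type
```

In practice the same checks would likely sit in front of every interface and export path, so that the subscription is the single record of what was agreed.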

Storefront

A capable and intuitive storefront for data products allowing a range of personas to easily discover and understand what’s available to them.

Interfaces

Host a range of interfaces to balance user experience and risk. This may include cleanrooms, sandboxes, workbenches, query engines, BI tools, etc.

Data transformation

Manage the creation and maintenance of code used to create custom data assets and the ability to automate code execution on a scheduled and event-driven basis.

Export

Manage the creation and maintenance of data pipelines, file transfers, and downloads on a one-off, scheduled, or event-driven basis.

Identity and access management (IAM)

Control all user and system access to the platform, services, and tools.

Miscellaneous

A range of services covering entity reference, deletion, monitoring, lineage, authentication, security, etc.

User personas for data marketplaces

Understanding and serving users is key to a successful internal data marketplace. There are three broad user personas: platform operators, data producers, and data consumers. These are often groups of people, not single individuals, working together to achieve specific goals. Individual users may take on multiple personas at different times. For example, it’s very common for the platform operator to also act as a data producer and a data consumer. Meeting user needs and building momentum is critical at all stages, especially at the start.

Platform operators

Platform operators are responsible for running the data marketplace. They will set the standards in several key areas:

Operating model: Is your marketplace strictly for internal users? Which teams will have access to the marketplace? How are those teams defined? Will data be centralized or left at source?
Governance: How do you plan to set and enforce rules around data governance, movement, access, and usage?
Access and control: Who will have access to the marketplace? How do you plan on delineating user roles and entitlements? Does the platform operator or the data producer dictate how data is accessed?
Reporting: How will you monitor the performance and usage of the platform? What metrics will you collect? Who will you report these metrics to? How will your reporting drive your business objectives?
Cloud provider: What cloud provider(s) will you use? Where will you deploy your platform? How will you manage and apportion costs?
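On the reporting questions above, one metric a platform operator might choose to collect is time-to-access: how long users wait between requesting data and being granted it. The sketch below assumes each request is logged as a pair of timestamps; the function name and the log shape are hypothetical.

```python
from datetime import datetime
from statistics import median


def median_time_to_access_days(requests):
    """Median days between request and grant, ignoring requests still pending.

    `requests` is a list of (requested_at, granted_at) pairs, where
    granted_at is None if access has not been granted yet.
    """
    durations = [
        (granted - requested).total_seconds() / 86400  # seconds per day
        for requested, granted in requests
        if granted is not None
    ]
    return median(durations) if durations else None


# Illustrative log entries, not real data.
log = [
    (datetime(2024, 1, 1), datetime(2024, 1, 3)),
    (datetime(2024, 1, 1), datetime(2024, 1, 11)),
    (datetime(2024, 1, 2), datetime(2024, 1, 6)),
    (datetime(2024, 1, 5), None),  # still waiting
]
print(median_time_to_access_days(log))
```

The median is used here rather than the mean so that a handful of very slow requests does not dominate the figure; tracking pending requests separately would be a sensible complement.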

Data producers

Data producers are those who configure and manage data assets and products. When establishing your internal data marketplace, data producers should think about:

Data assets: What type of data assets are available? Which data assets need to be prioritized in the early stages of your data marketplace? Will they be left at source or copied to the platform? Who will they be shared with?
Data product management: How will the lifecycle of data products be managed? Is there a coherent data product management philosophy or culture in your organization? Are you comfortable with the balance between risk and reward?
Technology: Are source systems the optimal storage layer for the target use cases? What tools do the various users need to work with data? What data pipelines will need to be created and maintained?
Experience: What user experiences do you want to enable? Who are those experiences for?

Data consumers

Data consumers are the end users of your data assets and products. Consider how the following will affect and empower your organization’s data consumers:

Roles: Who are the data consumers? What roles need to be created? Who will be invited to use the data marketplace? How will you delete and remove users? How will you support users?
User experience: Based on your user roles, how will this affect what users can do? What limits will you put on user behavior? What are the rules around data access, usage, and distribution? Which tools will be available to users in the cloud workspaces? What usage patterns do you want to enable?
Collaboration: Are users able and encouraged to collaborate within the marketplace? Which tools will you need to provide to enable effective collaboration?

Integrating a data marketplace with your tech stack

An internal data marketplace will, by its nature, need to interact with many parts of your existing technology stack, which may be quite diverse.

Data manufacturing

Data manufacturing tools, such as lakes and warehouses, focus on centralizing, cleansing, standardizing, and maintaining data. The ultimate goal is to reduce or remove complexity and minimize the cost of maintaining usable data. The data manufacturing process will likely generate a variety of data assets, including tables, notebooks, text files, images, and visualizations.

Data catalogs

Data catalogs provide an inventory of the data in your organization. You should be able to import the metadata from your catalog into your data marketplace. Additionally, you should be able to reference any assets connected to, or created within, your data marketplace from the catalog(s).
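A two-way link between catalog and marketplace could look roughly like the sketch below, which turns catalog entries into marketplace assets while keeping a reference back to the catalog. The entry fields (`id`, `name`, `owner`) and the `MarketplaceAsset` class are assumptions for illustration; real catalogs expose their own schemas and APIs.

```python
from dataclasses import dataclass


@dataclass
class MarketplaceAsset:
    asset_id: str
    title: str
    owner: str
    catalog_ref: str  # identifier of the originating catalog entry


def import_catalog_entries(entries):
    """Map catalog entries (dicts) to marketplace assets.

    Entries missing an id or a name are returned separately so they can be
    reviewed rather than silently dropped.
    """
    assets, skipped = [], []
    for entry in entries:
        if not entry.get("id") or not entry.get("name"):
            skipped.append(entry)
            continue
        assets.append(
            MarketplaceAsset(
                asset_id=f"asset-{entry['id']}",
                title=entry["name"],
                owner=entry.get("owner", "unassigned"),
                catalog_ref=entry["id"],
            )
        )
    return assets, skipped


# Illustrative catalog export, not real data.
catalog = [
    {"id": "c1", "name": "Orders", "owner": "sales"},
    {"id": "c2", "name": "Returns"},
    {"name": "Entry without an id"},
]
assets, skipped = import_catalog_entries(catalog)
print([a.title for a in assets], len(skipped))
```

Storing `catalog_ref` on each asset is what allows the catalog to point at the marketplace asset and the marketplace to point back at the catalog entry.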

Identity and access management (IAM)

A data marketplace should integrate with your identity and access management systems. In the event that these systems are not appropriate for managing access to data, the data marketplace should be able to manage this independently. If your marketplace is also used for external parties, it’s important that it can integrate with multiple identity and access management systems.

Tools

Your data marketplace will need to cater to a range of users, who will expect it to work well with their preferred tools. For example, in analytics use cases, this may include Excel, Tableau, Zeppelin, and many others.


Getting your data marketplace

To acquire the set of capabilities you need (i.e. connectors, data assets, subscriptions, etc.), it’s important to understand the spectrum of options available. Factors such as cost (both to build and maintain), time to deploy, customizability, security, and architecture must be considered.

Cost to build: Software projects are notoriously difficult to estimate accurately in terms of time and money. The cost gap between options can run to millions of dollars.
Cost to maintain: What costs will be associated with supporting, maintaining, and upgrading your data marketplace over time?
Time to deploy: To prove value, will you go down the proof-of-concept (POC) or proof-of-value (POV) route? You’ll need patience from leadership and investors to get to this point.
Customizability: To what extent is customizability important? This is particularly relevant if you’re considering buying or outsourcing the build. But it also applies to those building their own platforms, as design and engineering talent will play a major part. What can your custom solution do that other options can’t provide?
User adoption risk: What can you do to encourage and incentivize platform adoption? Think about push and pull factors, as well as how the quality of the user experience will affect usage and adoption.
Methodology and know-how: Do you (or your hired help) have experience building and running data marketplaces? What experiences can you/they draw upon to improve the chances of a successful initiative?

Any decision here will involve opportunity cost. Given that you’ll be using time and resources for this, what other work will you de-prioritize in order to launch a data marketplace? To what extent do you want your people building software vs. creating data products?

First, let’s look at the relative appeal of the three broad options: buy, build, and outsource. In this table, green represents the most appealing option and red the least. Where you see more than one color in a box, this means that it very much depends on the specific capabilities of your team and/or partner.

Purpose-built, market-proven solution

The most straightforward option is to buy a purpose-built data marketplace platform and deploy it in your own environment. Look for a vendor that has the core capabilities listed earlier and a proven track record of deploying within large organizations. Customizability and methodology are where vendors will really differentiate themselves.

Self-build

Within this track, there is a spectrum from assembling components (with custom code to connect them) to a fully self-built platform. You’ll need to answer a key question: What can your custom solution do that other options can’t provide?

A self-built solution will be unique and the design and implementation will be under your full control. To do this, you’ll rely on internal expertise, ideally from people who have built similar systems before. Beware that this route can cost millions of dollars and take a long time to deploy.

Outsourced build

The final option is to outsource the build to another firm, typically an IT consulting firm. In order to maintain their margins, consultancies are best served by adapting frameworks and solutions developed for other clients to your project. Whether this solution will be fit for your purposes depends on whether they’ve already developed a similarly specified data marketplace for someone else. This method can turn out to be the most expensive, slowest, and least likely to meet your needs. In addition, managing a third party and keeping everything in alignment can be challenging and frustrating. Clear communication and stakeholder management are absolutely essential.


Build a strong foundation

To maximize your chances of success, it’s crucial to have a strong foundation. You should be able to answer all the following questions before implementing a data marketplace.

Do you know who your users are and do you understand their use cases? To start out, you’ll need a small number of target personas and use cases that you are highly confident will adopt your data marketplace and get value from it.

Do you know what data is available and where the demand is? This is crucial to understand the full scope of what your marketplace will need to support and enable. It will also help you to focus and prioritize in the early stages.

Do you have a legal framework in place that clarifies what can be shared, with whom, and for what purposes? This is essential for adding precision to what data, users, and use cases are within scope — and the level of risk management that will be required.

Do you have a plan to encourage and incentivize user adoption? This is key to helping you overcome the ‘cold start problem’.

Any kind of marketplace is subject to the cold start problem, a term popularized by Andrew Chen, a General Partner at Andreessen Horowitz. The cold start problem refers to the challenges of fostering network effects when you have a two-sided marketplace. With a data marketplace, you need to balance the needs and numbers of data producers on one side and data consumers on the other, even when they’re in the same organization. Depending on your operating model, you will need to apply the right strategy to solve this.


If you’re ready to realize your ambitions for an internal data marketplace, chat with Harbr today.
