Data marketplaces are all about solving the ‘last mile’ problem for data. This means closing the gap between data producers and consumers.
This is true whether you’re trying to access data directly in a source system, through an interface, or in a target location. This guide sets out practical considerations for launching your own internal data marketplace — that is, a marketplace that empowers data access across your entire organization.
Data marketplaces solve data access at scale. Solving the data access problem is critical for Chief Data Officers and their data teams to:
Ideally, you’ve already identified the need for an internal data marketplace, but if not, there are certain factors that indicate a pressing need.
The unfortunate reality is that business users share a seemingly universal complaint: it simply takes too long to access the data they need. Even when data requests are highly specific, it’s not uncommon for them to take weeks or even months to fulfill. Friction and delays lead to frustration, and in turn, a sense of futility. An internal data marketplace can short-circuit this issue by providing rapid access to data and tooling.
It’s all too common for data users to mistrust or have low confidence in the data. Similarly, data owners don’t trust users with the data, and to mitigate risk, they put barriers in place. A data marketplace with proper governance capabilities enables data owners to grant access to users while tightly managing risk. Data users also gain clarity about the data, what it can be used for, and the source. This gives them greater confidence that they’re using the right data for their needs.
Data silos are a ubiquitous phenomenon in the modern enterprise. While there are practical reasons for them to exist, they can result in inhibited value creation, duplication of effort, wasted resources, and frustration. The primary reason for a data marketplace is to enable data access, which means overcoming data silos.
Users will have different technical skills, use cases, and preferred tools. Additionally, a significant amount of collaboration will be required between users to deliver successful outcomes from data. Internal data marketplaces are purpose-built to serve the full range of data consumers and the best ones will include collaborative and asynchronous workflows.
A data catalog is a technology that creates an inventory of (typically) internal data. It's a way to start understanding what data is available, but catalogs tend not to provide a view of the permitted use cases of the data, multiple ways of accessing it, or the ability to manage data access across technical and organizational boundaries. Most firms already have at least one data catalog in place, which is a helpful foundation for a data marketplace.
Learn more in our explainer comparing data catalogs and marketplaces.
Now that we can recognize some of the indicators of needing a data marketplace, let’s briefly look at what happens when you don’t have one.
The reality is that until you enable flexible, governed data access, you will struggle to deliver value at any scale.
When done well, private data marketplaces are transformative for organizations. But what’s required for success? Your data marketplace should enable:
To be successful as a single interface for data access, your data marketplace needs to work for a wide range of user personas — from data engineers and data scientists through to data analysts, business users, and executives. Serving these different users will have implications for everything from the look and feel of each user journey to the level of collaboration and asynchronous working that is required.
Data sometimes means rows and columns, but often it means files, images, reports, and visualizations. To be successful, a data marketplace will need to support every digital asset that data owners provide, as well as every format.
Your data marketplace will need to adapt to your business environment, with scalable controls around how data, organizations, and users are managed. It must enable one or more operating models, which may include:
If sharing data outside of your organization, additional operating models include:
To effectively serve your business and the range of users and use cases, your internal data marketplace will need certain features and capabilities.
Connectors
Manage the creation and maintenance of connectors to read, write, and copy data to and from a range of sources such as cloud databases, data lakes/warehouses, on-premises, and desktops.
Data assets
Full lifecycle management of data assets. This includes ownership, entitlements, and lineage. Data assets should include tables, notebooks, images, visualizations, text, etc.
Data products
Full lifecycle management of data products, including asynchronous processes to create, publish, manage, and delete.
Subscriptions
Record and enforce the entitlements set out in the data product subscriptions covering duration of access, type of access, number of users, etc.
Storefront
A capable and intuitive storefront for data products allowing a range of personas to easily discover and understand what’s available to them.
Interfaces
Host a range of interfaces to balance user experience and risk. This may include cleanrooms, sandboxes, workbenches, query engines, BI tools, etc.
Data transformation
Manage the creation and maintenance of code used to create custom data assets and the ability to automate code execution on a scheduled and event-driven basis.
Export
Manage the creation and maintenance of data pipelines, file transfers, and downloads on a one-off, scheduled, or event-driven basis.
Identity and access management (IAM)
Control all user and system access to the platform, services, and tools.
Miscellaneous
A range of services covering entity reference, deletion, monitoring, lineage, authentication, security, etc.
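To make the subscriptions capability above more concrete, here is a minimal sketch of how entitlements such as duration of access, type of access, and number of users might be recorded and enforced. The `Subscription` class, the `request_access` function, and all field names are hypothetical and used for illustration only; they are not taken from any particular data marketplace product.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Subscription:
    """Hypothetical record of the entitlements on one data product subscription."""
    product_id: str
    access_types: frozenset[str]   # e.g. {"query", "export"}
    starts_on: date
    ends_on: date                  # last day of access, inclusive
    max_users: int
    users: set[str] = field(default_factory=set)


def request_access(sub: Subscription, user: str, access_type: str, on: date) -> tuple[bool, str]:
    """Return (allowed, reason) for a single access request."""
    if not (sub.starts_on <= on <= sub.ends_on):
        return False, "outside subscription period"
    if access_type not in sub.access_types:
        return False, "access type not covered"
    if user not in sub.users:
        if len(sub.users) >= sub.max_users:
            return False, "user limit reached"
        sub.users.add(user)  # a user's first successful request takes a seat
    return True, "ok"


# Illustrative values only.
sub = Subscription(
    product_id="sales-daily",
    access_types=frozenset({"query"}),
    starts_on=date(2024, 1, 1),
    ends_on=date(2024, 12, 31),
    max_users=2,
)
print(request_access(sub, "ana", "query", date(2024, 3, 1)))
print(request_access(sub, "ana", "export", date(2024, 3, 1)))
```

Returning a reason alongside the decision is a design choice in this sketch: it gives data consumers clarity about why a request was refused, and gives data owners a record they can audit.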
Understanding and serving users is key to a successful internal data marketplace. There are three broad user personas: platform operators, data producers, and data consumers. These are often groups of people, not single individuals, working together to achieve specific goals. Individual users may take on multiple personas at different times. For example, it’s very common for the platform operator to also act as a data producer and a data consumer. Meeting user needs and building momentum is critical at all stages, especially at the start.
Platform operators are responsible for running the data marketplace. They will set the standards in several key areas:
Data producers are those who configure and manage data assets and products. When establishing your internal data marketplace, data producers should think about:
Data consumers are the end users of your data assets and products. Consider how the following will affect and empower your organization’s data consumers:
An internal data marketplace will, by its nature, need to interact with many parts of your existing technology stack, which may be quite diverse.
Data manufacturing tools, such as lakes and warehouses, focus on centralizing, cleansing, standardizing, and maintaining data. The ultimate goal is to reduce or remove complexity and minimize the cost of maintaining usable data. The data manufacturing process will likely generate a variety of data assets, including tables, notebooks, text files, images, and visualizations.
Data catalogs provide an inventory of the data in your organization. Your data marketplace should be able to import the metadata from your catalog. Additionally, your catalog(s) should be able to reference any assets connected to, or created within, your data marketplace.
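As a rough sketch of what a catalog metadata import could look like, the example below maps catalog entries to marketplace asset records and keeps a reference back to the catalog so the two stay linked. The entry fields (`id`, `name`, `owner`, `description`), the `MarketplaceAsset` class, and the `catalog://` reference format are all assumptions made for illustration; a real catalog will expose its own schema and API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarketplaceAsset:
    """Hypothetical marketplace-side record created from a catalog entry."""
    asset_id: str
    name: str
    owner: str
    description: str
    catalog_ref: str  # pointer back to the originating catalog entry


def import_catalog_entries(entries: list[dict]) -> tuple[list[MarketplaceAsset], list[dict]]:
    """Map catalog entries to assets and return (imported, skipped)."""
    imported: list[MarketplaceAsset] = []
    skipped: list[dict] = []
    for entry in entries:
        # Entries without an id, name, or owner are set aside for review.
        if not (entry.get("id") and entry.get("name") and entry.get("owner")):
            skipped.append(entry)
            continue
        imported.append(
            MarketplaceAsset(
                asset_id=f"asset-{entry['id']}",
                name=entry["name"],
                owner=entry["owner"],
                description=entry.get("description", ""),
                catalog_ref=f"catalog://{entry['id']}",  # invented reference format
            )
        )
    return imported, skipped


# Illustrative entries only.
assets, needs_review = import_catalog_entries([
    {"id": "42", "name": "Customer orders", "owner": "sales-data"},
    {"id": "43", "name": "Untitled table"},  # no owner, so it is skipped
])
print(len(assets), len(needs_review))
```

Skipping entries that have no owner, rather than importing them anyway, reflects the governance point made earlier: an asset is hard to manage in a marketplace if nobody is accountable for it.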
A data marketplace should integrate with your identity and access management systems. In the event that these systems are not appropriate for managing access to data, the data marketplace should be able to manage this independently. If your marketplace is also used for external parties, it’s important that it can integrate with multiple identity and access management systems.
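One way to picture this is an access check that first consults group memberships from one or more identity systems, then falls back to grants managed inside the marketplace itself. The sketch below is a simplified illustration under that assumption: the `AccessResolver` class and its fields are hypothetical, and plain dictionaries stand in for calls to real identity and access management systems.

```python
from dataclasses import dataclass, field


@dataclass
class AccessResolver:
    """Hypothetical resolver combining external IAM groups with marketplace-managed grants."""
    # provider name -> {user -> groups}; stands in for lookups against real IAM systems
    providers: dict[str, dict[str, set[str]]]
    # asset -> groups allowed to use it
    asset_groups: dict[str, set[str]]
    # asset -> users granted access directly inside the marketplace
    local_grants: dict[str, set[str]] = field(default_factory=dict)

    def can_access(self, provider: str, user: str, asset: str) -> bool:
        groups = self.providers.get(provider, {}).get(user, set())
        if groups & self.asset_groups.get(asset, set()):
            return True  # allowed through a group from the identity system
        # Otherwise fall back to access managed independently by the marketplace.
        return user in self.local_grants.get(asset, set())


# Illustrative values only: one internal and one external identity system.
resolver = AccessResolver(
    providers={
        "corporate": {"ana": {"analysts"}},
        "partner": {"raj": {"partner-readers"}},
    },
    asset_groups={"sales-daily": {"analysts"}},
    local_grants={"sales-daily": {"raj"}},
)
print(resolver.can_access("corporate", "ana", "sales-daily"))
print(resolver.can_access("partner", "raj", "sales-daily"))
```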
Because your data marketplace will need to cater to a range of users, it should work well with the tools those users already prefer. For example, in analytics use cases, this may include Excel, Tableau, Zeppelin, and many others.
To acquire the set of capabilities you need (i.e. connectors, data assets, subscriptions, etc.), it’s important to understand the spectrum of options available. Factors such as cost (both to build and maintain), time to deploy, customizability, security, and architecture must be considered.
Any decision here will involve opportunity cost. Given that you’ll be using time and resources for this, what other work will you de-prioritize in order to launch a data marketplace? To what extent do you want your people building software vs. creating data products?
First, let’s look at the relative appeal of the three broad options: buy, build, and outsource. In this table, green represents the most appealing option and red the least. Where you see more than one color in a box, this means that it very much depends on the specific capabilities of your team and/or partner.
The most straightforward option is to buy a purpose-built data marketplace platform and deploy it in your own environment. Look for a vendor that has the core capabilities listed earlier and a proven track record of deploying within large organizations. Customizability and methodology are where vendors will really differentiate themselves.
Within this track, there is a spectrum from assembling components (with custom code to connect them) to a fully self-built solution. You’ll need to answer a key question: What can your custom solution do that other options can’t provide?
A self-built solution will be unique and the design and implementation will be under your full control. To do this, you’ll rely on internal expertise, ideally from people who have built similar systems before. Beware that this route can cost millions of dollars and take a long time to deploy.
The final option is to outsource the build to another firm, typically an IT consulting firm. In order to maintain their margins, consultancies are best served by adapting frameworks and solutions developed for other clients to your project. Whether this solution will be fit for your purposes depends on whether they’ve developed a similarly-specified data marketplace for someone else already. This method can turn out to be the most expensive, slowest, and least likely to meet needs. In addition, managing a third party and keeping everything in alignment can be challenging and frustrating. Clear communication and stakeholder management are absolutely essential.
To maximize your chances of success, it’s crucial to have a strong foundation. You should be able to answer all the following questions before implementing a data marketplace.
Do you know who your users are and do you understand their use cases? To start out, you’ll need a small number of target personas and use cases, chosen with high conviction that those users will adopt your data marketplace and get value from it.
Do you know what data is available and where the demand is? This is crucial to understand the full scope of what your marketplace will need to support and enable. It will also help you to focus and prioritize in the early stages.
Do you have a legal framework in place that clarifies what can be shared, with whom, and for what purposes? This is essential for adding precision to what data, users, and use cases are within scope — and the level of risk management that will be required.
Do you have a plan to encourage and incentivize user adoption? This is key to helping you overcome the ‘cold start problem’.
Any kind of marketplace is subject to the cold start problem, a term popularized by Andrew Chen, a General Partner at Andreessen Horowitz. The cold start problem refers to the challenges of fostering network effects when you have a two-sided marketplace. With a data marketplace, you need to balance the needs and numbers of data producers on one side and data consumers on the other, even when they’re in the same organization. Depending on your operating model, you will need to apply the right strategy to solve this.
If you’re ready to realize your ambitions for an internal data marketplace, chat with Harbr today.