Date published: 18.12.2024

To derive value from data within your organization, you need to solve for data consumption — how data is accessed, used, and understood. The ultimate aim is to get to a state where individuals across the entire organization are able to work effectively, autonomously, and compliantly with data, regardless of their role and skill.

Databricks and Harbr are powerful data platforms that, when used together, enable organizations to realize the full potential of data. While both are data platforms, they serve different and complementary purposes. Using them together means you can go beyond what either platform will enable you to achieve on its own.

Databricks and Harbr: Better together

Databricks is a Unified Analytics Platform designed to process and analyze massive amounts of data in a scalable cloud environment. It’s primarily used by technical teams who need an end-to-end data solution to process, store, analyze, and model data.

Harbr is a data marketplace platform that enables the governed access and use of data, models, and insights. It’s used by technical teams to configure and manage data products, and business teams to access, adapt, and use those data products. When used together, Databricks and Harbr address two critical challenges for businesses:

  1. The ability to process and analyze large volumes of data in a robust, scalable platform to create models and insights (Databricks)
  2. The ability to streamline and govern the access and usage of data, models, and insights across the organization (Harbr)

This combination allows businesses to solve the problem of data consumption at scale, enabling both technical and non-technical users to derive value from data without compromising on governance or efficiency.

The 3 core benefits of using Databricks and Harbr

1. Zero-copy data sharing

A significant challenge in enabling data access and usage is the movement and replication of data across systems, which can lead to inefficiencies, higher costs, and compliance risks. Databricks’ Delta Sharing feature allows businesses to share data securely without creating multiple copies. Harbr integrates with Delta Sharing and removes the technical complexity, giving business users a highly intuitive experience. Data can therefore be shared without being moved, which reduces costs, increases speed, and enables governed data access.
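
As a rough illustration of what reading shared data in place can look like on the consumer side, the sketch below uses the open-source `delta-sharing` Python connector. It is a minimal sketch under stated assumptions, not a description of how Harbr calls Delta Sharing internally: the profile file path and the share, schema, and table names are hypothetical placeholders, and the helper function names are made up for this example.

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile-file>#<share>.<schema>.<table>' address
    that the delta-sharing connector uses to identify a shared table."""
    for part in (profile_path, share, schema, table):
        if not part:
            raise ValueError("profile path, share, schema and table are all required")
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path: str, share: str, schema: str, table: str, limit=None):
    """Read a shared table into a pandas DataFrame.

    The data is fetched from the provider's storage on demand, so no
    replicated copy has to be maintained in another platform.
    """
    # Third-party dependency (pip install delta-sharing), imported lazily
    # so that table_url() can be used without it.
    import delta_sharing

    return delta_sharing.load_as_pandas(
        table_url(profile_path, share, schema, table), limit=limit
    )


# Hypothetical names, for illustration only.
url = table_url("config.share", "sales_share", "retail", "orders")
```

Calling `load_shared_table("config.share", "sales_share", "retail", "orders", limit=100)` would then return the first rows of the shared table, provided the profile file holds valid credentials from the data provider.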

2. Federated data architecture

Data is often spread across different systems, databases, and environments. Databricks allows for a federated approach to data, meaning it can pull data from multiple sources without needing to centralize it. Harbr’s architecture complements this by curating and organizing data assets from those sources directly, or via Databricks, making it easier for users to find and access what they need without technical support or complex approval processes.

3. Curated data access

While Databricks focuses on analytics and processing, Harbr focuses on data curation and ease of use. Harbr has a highly flexible “data products” and “data assets” model. Any digital object, whether it's a table, an API, a notebook, a visualization, or even a PDF, can be registered as a data asset. Any combination of data assets can be turned into, and managed as, a data product. This enables highly specific data curation at scale, enhancing the discoverability and usability of data and making it more accessible to non-technical users.
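
The relationship between data assets and data products can be pictured with a small data-structure sketch. The class and field names below (`DataAsset`, `DataProduct`, `kind`, `location`) are hypothetical and chosen only for illustration; they are not Harbr's actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataAsset:
    """Any registered digital object (hypothetical model)."""
    name: str
    kind: str       # e.g. "table", "api", "notebook", "visualization", "pdf"
    location: str   # where the object lives; it is referenced, not copied


@dataclass
class DataProduct:
    """A curated bundle made from any combination of data assets."""
    name: str
    assets: list[DataAsset] = field(default_factory=list)

    def add(self, asset: DataAsset) -> None:
        # Avoid registering the same asset twice in one product.
        if asset not in self.assets:
            self.assets.append(asset)

    def kinds(self) -> set[str]:
        # The distinct asset types bundled in this product.
        return {asset.kind for asset in self.assets}


# Illustrative placeholders only.
orders = DataAsset("orders", "table", "catalog.retail.orders")
guide = DataAsset("orders_guide", "pdf", "docs/orders_guide.pdf")

product = DataProduct("Retail orders starter pack")
product.add(orders)
product.add(guide)
product.add(orders)  # ignored: already part of the product
```

Keeping the asset immutable and the product a simple list mirrors the idea in the text: assets are registered once, and products are just managed combinations of them.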

Different benefits for different users

The integration of Databricks and Harbr offers distinct advantages for various users within an organization:

1. Non-technical users

For non-technical users, Harbr has an AI-powered text-to-SQL query capability. This means any user can work autonomously with tabular data assets, without requiring deep technical expertise. Using this capability, they can perform key actions like evaluating, adapting, and analyzing data without intervention from IT or data engineering teams. The feature is governed by the platform owner, and data owners can control which LLMs are used and can even prevent specific data assets from being used with it.
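
The governance rule described here, where owners choose which LLMs are permitted and can exclude individual assets, can be sketched as a simple policy check. This is an illustrative model only: `TextToSqlPolicy`, its fields, and the model and asset names are assumptions made for the example, not Harbr's configuration format.

```python
from dataclasses import dataclass, field


@dataclass
class TextToSqlPolicy:
    """Hypothetical governance settings for a text-to-SQL feature."""
    allowed_llms: set[str] = field(default_factory=set)
    blocked_assets: set[str] = field(default_factory=set)

    def permits(self, asset_name: str, llm: str) -> bool:
        # A request is allowed only if the model is approved
        # and the asset has not been excluded by its owner.
        return llm in self.allowed_llms and asset_name not in self.blocked_assets


# Illustrative placeholders only.
policy = TextToSqlPolicy(
    allowed_llms={"approved-model"},
    blocked_assets={"payroll"},
)

ok = policy.permits("sales_orders", "approved-model")     # approved model, open asset
blocked = policy.permits("payroll", "approved-model")     # asset excluded by its owner
unapproved = policy.permits("sales_orders", "other-model")  # model not on the allow list
```

Defaulting both sets to empty means nothing is permitted until an owner explicitly approves a model, which is the more conservative choice for a governed feature.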

2. Technical users

Technical users, such as data engineers and data scientists, benefit from the integration because Harbr enhances the user experience at the database level. Users can take advantage of Databricks’ advanced compute capabilities through Harbr’s front-end, which lets them manage their own infrastructure and allocate compute resources in a cost-effective and scalable way. Technical users can then focus on data science and model development rather than on setting up the underlying infrastructure.

3. Platform operators

For platform operators who manage and orchestrate data platforms, Harbr provides visibility and control over how data is shared and accessed within and between organizations and users. This includes managing the isolation of different Databricks accounts and ensuring compliance with internal governance policies. Harbr provides full transparency into how data is used and offers flexibility in managing complex data environments, allowing operators to easily track and manage data access across multiple users and teams.

What the process looks like with Databricks and Harbr in place

When both Databricks and Harbr are deployed together, the workflow becomes significantly more streamlined:

  • Data ingestion and processing: Technical teams can use tools within Databricks to ingest and process raw data, and transform it into a reliable, ‘ready-to-use’ state. Users can store this data in Databricks' Delta Lake, ensuring efficient storage and high performance for downstream usage.
  • Curation and sharing: These same teams then use Harbr to easily curate the ‘ready-to-use’ data and make it available to the right users. Whether through the creation of products that combine various data assets or by managing access control to specific assets, Harbr ensures that data is easily discoverable and accessible without technical barriers.
  • Seamless access: With Harbr, any user, technical or non-technical, can securely access the ‘ready-to-use’ data without moving it or creating additional copies, thanks to Databricks Delta Sharing. Users can work with data across multiple environments in a federated model, accessing it quickly and at a low cost.
  • Scaling and self-service: Databricks provides the compute power to run complex analytics and AI models at scale. Meanwhile, Harbr allows users to self-serve data access and usage, selecting the data and resources they need without requiring technical assistance, improving operational efficiency and time to value.

Harbr + Databricks: A comprehensive solution to the data consumption problem

Using Databricks and Harbr together provides a comprehensive solution to the data consumption problem. By combining Databricks’ powerful data processing with Harbr’s user-friendly data curation and governance, data access and usage become seamless for technical and non-technical users. This ensures data can be efficiently shared, accessed, and used at scale, both within and between organizations, by removing the challenges around compliance and operational complexity.
