When we discuss data, we often approach it from a bottom-up technology perspective, focusing on data plumbing such as centralized pipelines and data lakes. A more productive approach is to ask why companies invest in this technology in the first place: to democratize data and make it available to as many people as possible within the company so they can make better, data-driven decisions.

An enterprise data strategy should enable users to understand and visualize data in the context of specific use cases. Beyond that, data should be represented in context regardless of where it lives, decoupled from the underlying data infrastructure, and it must adhere to open standards. For example, analysts on the factory floor typically do not have data plumbing skills, but they need to use relevant data to quickly make decisions about the resilience of their supply chain or the impact of a particular product recall.

Self-service is essential to enable citizen data users to explore, visualize, and find information without relying on scarce, highly skilled, and time-constrained data engineers and data scientists. That said, self-service for citizen data users must be balanced against IT’s need to maintain control over enterprise data platforms to ensure strong security, good data governance, and operational reliability.

A data strategy is more important than a cloud strategy

Most companies will end up with a hybrid cloud strategy that deploys workloads where it makes the most sense: on-premises, multiple clouds, or edge environments. The challenge of data management is to store, process, protect and manage all the data distributed in these environments.

A modern data architecture must provide a layer of data services that enables the movement of data, metadata, and workloads across the hybrid cloud with full access control, data lineage, and audit logging.

Overall, your data strategy must be aligned with your cloud strategy so that the two work together and reinforce each other.

A modern data architecture that makes the most of your hybrid cloud

Data originates in the cloud, on-premises, or at the edge. The goal is to retrieve, store, and process this data in its original form without losing contextual information about the data and its authoritative sources.

A data management architecture must provide a unified view of all data assets, with consistent security and governance regardless of location. Here are the most important features of a modern data architecture for hybrid environments:

The data fabric is an integral part

A data fabric manages the entire lifecycle of storing, processing, protecting, and analyzing data, regardless of where it resides. A data fabric connects disparate data repositories and provides consistency in security, governance, and data management capabilities across on-premises infrastructure and multiple clouds.

For example, businesses can leverage the data fabric to collect customer data from various touchpoints such as CRM systems, social media, and websites to create a 360-degree customer profile. Marketing can then use customer sentiment analysis to segment customers or launch targeted campaigns based on consumer preferences.

Another example of a data fabric architecture in a multi-cloud environment might include AWS for customer data, Microsoft Azure for advertising data, and Cloudera providing analytics services on the Cloudera Data Platform. A data fabric architecture ties these environments together to create a unified view of the data.

A data fabric uses data services and APIs to provide a holistic view and collect data from legacy systems, data lakes, data warehouses, and various enterprise applications. The problem of data gravity (data becomes harder to move as its size grows) is mitigated by a data fabric that abstracts the technical complexities associated with moving, transforming, and integrating data.

A data fabric has six basic components common to most vendors.

  1. Data management: provides data governance and security.
  2. Data ingestion: combines cloud data and establishes connections between structured and unstructured data.
  3. Information processing: filters data so that only relevant data is used for data extraction.
  4. Data orchestration: transforms, integrates, and cleanses data to make it fully usable.
  5. Data discovery: surfaces new opportunities to integrate disparate data sources.
  6. Data access: ensures proper permissions and lets users view relevant data through dashboards and other data visualization tools.
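To make the six components concrete, here is a minimal, in-memory sketch of how they might fit together to build the 360-degree customer profile described earlier. Everything in it (the `DataFabric` class, its method names, the `"crm"`/`"social"` sources, and the `"marketing"` role) is hypothetical and illustrative; it is not any vendor's API, and a real data fabric would delegate each step to dedicated services.

```python
from dataclasses import dataclass, field
from typing import Callable

Record = dict[str, object]  # one row from any source system (hypothetical shape)


@dataclass
class DataFabric:
    """Toy model of the six components; every name here is illustrative."""

    # Data management: role -> fields that role may see (governance/security).
    policies: dict[str, set[str]] = field(default_factory=dict)
    # Catalog used for discovery: source name -> fields seen in that source.
    catalog: dict[str, set[str]] = field(default_factory=dict)
    records: list[Record] = field(default_factory=list)

    def ingest(self, source: str, rows: list[Record]) -> None:
        # Data ingestion: tag rows with their origin and record their field names.
        for row in rows:
            self.records.append({**row, "_source": source})
            self.catalog.setdefault(source, set()).update(row)

    def process(self, is_relevant: Callable[[Record], bool]) -> list[Record]:
        # Information processing: keep only the rows that matter for this use case.
        return [r for r in self.records if is_relevant(r)]

    def orchestrate(self, rows: list[Record], key: str) -> list[Record]:
        # Data orchestration: merge rows sharing a key and trim stray whitespace.
        merged: dict[object, Record] = {}
        for row in rows:
            target = merged.setdefault(row[key], {})
            for name, value in row.items():
                if name != "_source":
                    target[name] = value.strip() if isinstance(value, str) else value
        return list(merged.values())

    def discover(self, field_name: str) -> list[str]:
        # Data discovery: which sources share a field and could be joined on it.
        return sorted(s for s, fields in self.catalog.items() if field_name in fields)

    def access(self, role: str, rows: list[Record]) -> list[Record]:
        # Data access: show each role only the fields its policy permits.
        allowed = self.policies.get(role, set())
        return [{k: v for k, v in row.items() if k in allowed} for row in rows]


fabric = DataFabric(policies={"marketing": {"customer_id", "email", "sentiment"}})
fabric.ingest("crm", [{"customer_id": 1, "email": " a@example.com "},
                      {"customer_id": 2, "email": "b@example.com"}])
fabric.ingest("social", [{"customer_id": 1, "sentiment": "positive", "handle": "@a"}])
profile = fabric.orchestrate(fabric.process(lambda r: r["customer_id"] == 1), key="customer_id")
print(fabric.discover("customer_id"))      # ['crm', 'social']
print(fabric.access("marketing", profile))  # [{'customer_id': 1, 'email': 'a@example.com', 'sentiment': 'positive'}]
```

Note that the governance rule (which fields marketing may see) is applied at the access step, after the profile has been merged, so the social media handle never reaches the marketing view even though it was ingested.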

These elements enable portability of applications, data, and metadata across on-premises and cloud boundaries while tracking data movement and, most importantly, without requiring changes to application code.

Hybrid cloud makes managing metadata more complex

A hybrid cloud contains data and associated metadata distributed across different clouds. Metadata management involves collecting and storing metadata associated with various asset types, and making it available to downstream applications as needed.

An effective metadata management system enables flexible movement of data and associated metadata across the hybrid cloud without losing the context of data assets. A consistent data context also simplifies data distribution and analysis, because a multi-tenant data access model can be defined once and applied seamlessly everywhere.

Metadata should include database schemas, security policies, audit logs, and data lineage and provenance, and the management system must allow all of this metadata to be viewed and managed centrally. In practice, this means that users who explore data on demand through various analytics engines need a layer of shared services that provides all the metadata, context, and state information required for a consistent view of their data assets.
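The following is a minimal sketch of such a shared metadata layer, assuming a single in-process catalog that every analytics engine consults. The `MetadataCatalog` and `AssetMetadata` names, the location labels, and the rule that a derived asset inherits only the roles allowed on all of its parents are illustrative assumptions, not features of any specific product. It shows schemas, access policies, lineage, and one audit log living in one place, and metadata staying intact when the data moves.

```python
from dataclasses import dataclass, field


@dataclass
class AssetMetadata:
    schema: dict[str, str]                 # column name -> type
    allowed_roles: set[str]                # security policy: roles that may read
    location: str                          # where the data currently lives
    parents: list[str] = field(default_factory=list)  # lineage / provenance


class MetadataCatalog:
    """Hypothetical shared catalog consulted by every analytics engine."""

    def __init__(self) -> None:
        self.assets: dict[str, AssetMetadata] = {}
        self.audit_log: list[tuple[str, str, str]] = []  # (action, actor, asset)

    def register(self, name: str, meta: AssetMetadata) -> None:
        self.assets[name] = meta
        self.audit_log.append(("register", "system", name))

    def derive(self, name: str, parents: list[str], schema: dict[str, str], location: str) -> None:
        # Assumed rule: a derived asset is readable only by roles that may read every parent.
        roles = set.intersection(*(self.assets[p].allowed_roles for p in parents))
        self.register(name, AssetMetadata(schema, roles, location, list(parents)))

    def move(self, name: str, new_location: str) -> None:
        # Only the location changes; schema, policy, and lineage travel with the data.
        self.assets[name].location = new_location
        self.audit_log.append(("move", "system", name))

    def read(self, name: str, role: str, engine: str) -> AssetMetadata:
        # Every engine goes through the same policy check and the same audit log.
        allowed = role in self.assets[name].allowed_roles
        self.audit_log.append(("read" if allowed else "denied", f"{role}@{engine}", name))
        if not allowed:
            raise PermissionError(f"{role} may not read {name}")
        return self.assets[name]

    def lineage(self, name: str) -> list[str]:
        # Walk provenance back to the original sources, ancestors first.
        result: list[str] = []
        for parent in self.assets[name].parents:
            result += self.lineage(parent) + [parent]
        return result


catalog = MetadataCatalog()
catalog.register("crm", AssetMetadata({"customer_id": "int"}, {"analyst", "marketing"}, "aws"))
catalog.register("web", AssetMetadata({"customer_id": "int"}, {"marketing"}, "azure"))
catalog.derive("profile", ["crm", "web"], {"customer_id": "int"}, "on-prem")
catalog.move("profile", "gcp")
print(catalog.read("profile", "marketing", "spark").location)  # gcp
print(catalog.lineage("profile"))                               # ['crm', 'web']
```

Because the policy and lineage are attached to the asset rather than to a particular cloud, moving `profile` from on-premises to another cloud does not require redefining who may read it.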

Separate storage and compute for efficiency

Decoupling storage from compute lets businesses save money by shutting down compute clusters when they are not needed, while the data remains available in storage. It also lets them scale storage and compute resources independently to meet business needs.
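A small worked example can show where the savings come from. This sketch assumes a simplified model in which, in a coupled design, the data lives on the cluster's own disks, so the cluster must run all month; in a decoupled design the data sits in object storage and compute runs only during busy hours. All rates and sizes below are placeholder numbers chosen for easy arithmetic, not real cloud prices.

```python
HOURS_PER_MONTH = 730  # about 8,760 hours per year / 12


def monthly_cost(storage_tb: float, rate_per_tb: float, nodes: int,
                 rate_per_node_hour: float, busy_hours: float, decoupled: bool) -> float:
    """Illustrative cost model; all rates are placeholders, not real cloud prices."""
    storage = storage_tb * rate_per_tb
    # Coupled (assumed): the data lives on the cluster's disks, so the cluster runs all month.
    # Decoupled: data sits in object storage, so compute runs only while it is needed.
    hours = busy_hours if decoupled else HOURS_PER_MONTH
    return storage + nodes * rate_per_node_hour * hours


coupled = monthly_cost(100, 20.0, 10, 1.0, busy_hours=200, decoupled=False)
decoupled = monthly_cost(100, 20.0, 10, 1.0, busy_hours=200, decoupled=True)
print(coupled, decoupled)  # 9300.0 4000.0
```

The storage line item is identical in both cases; the difference comes entirely from compute hours, which is why the benefit grows as workloads become more bursty.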

Data access layers, application APIs, and metadata repositories should provide the abstractions needed to separate application code from the underlying infrastructure. Data can then move freely across cloud boundaries by leveraging object stores such as Apache Ozone (O3) on-premises and Amazon S3, Azure Data Lake Storage (ADLS), and Google Cloud Storage (GCS) in the cloud.
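One common way to build such a data access layer is to route each request by its URI scheme, so application code only ever sees a path. The sketch below assumes that pattern. The schemes (`ofs://` for Ozone, `s3://`, `abfss://` for ADLS, `gs://`) are the familiar ones, but the backends here are in-memory stand-ins rather than real SDK clients, and the host, volume, and bucket names are hypothetical.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class ObjectStore(ABC):
    """Minimal interface each storage backend must implement."""

    @abstractmethod
    def get(self, bucket: str, key: str) -> bytes: ...

    @abstractmethod
    def put(self, bucket: str, key: str, data: bytes) -> None: ...


class InMemoryStore(ObjectStore):
    """Stand-in for a real client; a production layer would wrap each vendor's SDK."""

    def __init__(self) -> None:
        self.objects: dict[tuple[str, str], bytes] = {}

    def get(self, bucket: str, key: str) -> bytes:
        return self.objects[(bucket, key)]

    def put(self, bucket: str, key: str, data: bytes) -> None:
        self.objects[(bucket, key)] = data


class DataAccessLayer:
    """Routes a URI to the right backend by its scheme (ofs://, s3://, abfss://, gs://)."""

    def __init__(self, stores: dict[str, ObjectStore]) -> None:
        self.stores = stores

    def _resolve(self, uri: str) -> tuple[ObjectStore, str, str]:
        parsed = urlparse(uri)
        return self.stores[parsed.scheme], parsed.netloc, parsed.path.lstrip("/")

    def read(self, uri: str) -> bytes:
        store, bucket, key = self._resolve(uri)
        return store.get(bucket, key)

    def write(self, uri: str, data: bytes) -> None:
        store, bucket, key = self._resolve(uri)
        store.put(bucket, key, data)

    def copy(self, src: str, dst: str) -> None:
        # Moving data across a cloud boundary is a read from one URI and a write to another.
        self.write(dst, self.read(src))


def count_rows(dal: DataAccessLayer, uri: str) -> int:
    # "Application code": it knows a URI, not which cloud or product holds the data.
    return len(dal.read(uri).splitlines())


dal = DataAccessLayer({scheme: InMemoryStore() for scheme in ("ofs", "s3", "abfss", "gs")})
on_prem = "ofs://ozone1/vol1/sales/orders.csv"  # hypothetical Ozone path
dal.write(on_prem, b"id,total\n1,10\n2,25\n")
dal.copy(on_prem, "s3://sales-bucket/orders.csv")
print(count_rows(dal, on_prem), count_rows(dal, "s3://sales-bucket/orders.csv"))  # 3 3
```

`count_rows` runs unchanged against either copy; only the URI differs, which is the property the paragraph above calls for.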

Ensure data is always protected, trusted and compliant

In a hybrid cloud architecture, data access policies and lineage should be consistent between private and public clouds. Otherwise, you’ll have gaps in your audit logs, leading to a compliance nightmare. The challenge is that each cloud in use (a US public cloud, an EU public cloud, or a private cloud) may have different governance rules for access and control.

A hybrid data platform should also provide cross-platform security and governance. Consistent metadata-driven security and governance across all clouds is essential to the success of hybrid clouds and a requirement for the continuous movement of data and services.
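The sketch below illustrates what "consistent, metadata-driven" can mean in practice, under stated assumptions: dataset tags and tag-to-role rules are defined once in shared metadata, per-cloud rules are layered on top, and every decision in every cloud goes to a single audit log. The dataset names, roles, tags, and the "no PII in the US public cloud" rule are hypothetical examples, not real regulatory requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    user: str
    role: str
    dataset: str
    cloud: str  # e.g. "us-public", "eu-public", "private"


# Shared metadata, defined once: which tags each dataset carries ...
DATASET_TAGS = {"customers": {"pii", "sales"}, "orders": {"sales"}}
# ... and which roles may read data carrying each tag.
TAG_POLICY = {"pii": {"privacy-officer"}, "sales": {"analyst", "privacy-officer"}}
# Per-cloud governance rules layered on top (hypothetical: no PII in the US public cloud).
CLOUD_DENY = {"us-public": {"pii"}}

# One audit log for every cloud, so there are no gaps when data or users move.
AUDIT_LOG: list[tuple[str, str, str, bool]] = []


def authorize(req: Request) -> bool:
    tags = DATASET_TAGS[req.dataset]
    # The role must be permitted for every tag on the dataset.
    role_ok = all(req.role in TAG_POLICY[tag] for tag in tags)
    # The cloud where the request runs must not forbid any of those tags.
    cloud_ok = not (tags & CLOUD_DENY.get(req.cloud, set()))
    allowed = role_ok and cloud_ok
    AUDIT_LOG.append((req.user, req.dataset, req.cloud, allowed))
    return allowed


print(authorize(Request("ana", "analyst", "orders", "eu-public")))             # True
print(authorize(Request("ana", "analyst", "customers", "private")))            # False
print(authorize(Request("pat", "privacy-officer", "customers", "eu-public")))  # True
print(authorize(Request("pat", "privacy-officer", "customers", "us-public")))  # False
```

The same `authorize` function and the same tags apply in every cloud; only the small per-cloud rule table differs, which keeps regional differences explicit instead of scattering separate policy copies across platforms.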


Choosing the right vendor in the crowded cloud database management system (CDMS) market can be difficult. To simplify the process, we present three solution attributes required for successful vendor partnerships.

First, all the tools your team uses to access data must be supported. To avoid significant retraining effort, vendors must provide self-service analytics across a variety of tools such as Tableau, Power BI, and Jupyter Notebooks, whether on-premises or in Google Cloud, AWS, or Azure.

Second, anything end-user related in terms of APIs, file formats, or engines should run on community-supported open-source software. Consider an open-source data platform with cloud-agnostic capabilities and easy migration of data assets, metadata, and workloads.

A third attribute is interoperability. Regardless of where the data resides, you need to be able to securely move data, applications, and users between on-premises infrastructure and multiple clouds in both directions without changing a single line of code. Relatedly, the platform should support deploying containerized workloads with technologies such as Docker and Kubernetes.

Data democratization is now possible in a hybrid world. The power of data, previously in the hands of a few data scientists, is now available to non-experts via a hybrid data platform. Data democratization makes data accessible to all employees who need it and, when done right, can drive your company’s performance to new heights.
