
Having trouble with your big data project? You’re not alone. Low success rates for big data projects have been a consistent theme over the past decade, and similar types of struggles have emerged for AI projects. A 100% success rate is not a feasible goal, but there are some adjustments you can make to get more out of your data investment.
As the world generates more data, its reliance on that data grows, and businesses that do not adopt data-driven decision-making risk falling further behind. Fortunately, the sophistication of data collection, storage, management, and analysis has improved significantly over the past decade, and studies show that companies with cutting-edge data capabilities earn higher revenues than their peers.
At the same time, certain patterns of data failure repeat over and over again. Here are five common pitfalls that affect big data projects, along with some potential solutions to keep yours on track.
Put it all in your data lake
More than two-thirds of businesses are not getting “lasting value” from their data investments, according to a study cited by Gerrit Kazmaier, vice president and general manager for databases, data analytics, and Looker at Google Cloud, at the recent launch of BigLake.
“It’s very interesting,” Kazmaier said at a press conference last month. “We all recognize that we are competing with data, but we also recognize that very few companies are actually succeeding with data. What is going on there?”
One big reason is a lack of data centralization, which prevents companies from extracting value from their data. Companies of all sizes have data spread across numerous silos, including databases, file systems, applications, and other locations. Enterprises have responded to this dilemma by putting as much data as possible into data lakes, first on Hadoop and (more recently) on object storage systems running in the cloud. A data lake not only provides a central place where data lives, it also reduces the cost of storing petabytes of it.

Check before diving into a data lake (Rawpixel.com/Shutterstock)
But while data lakes have addressed one problem, Kazmaier said, they have created a whole new set of problems, especially when it comes to ensuring data consistency, quality, and control. “All of these organizations that tried to innovate on top of data lakes ended up finding that they were just data swamps,” he said.
Google Cloud’s latest answer to this dilemma is the lakehouse architecture behind the recently announced BigLake, which blends the manageability, governance, and quality of a data warehouse with the openness of a data lake.
Enterprises can hold data in Google Cloud Storage, an S3-compatible object store, in open formats such as Parquet and Iceberg, and query it with engines such as Presto, Trino, and BigQuery, without sacrificing data warehouse-style governance.
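To make the pattern concrete, here is a minimal Python sketch of querying open-format lake data through a warehouse engine, using the google-cloud-bigquery client. The project, dataset, table, and bucket names are hypothetical, and a full BigLake table would additionally specify a connection resource for fine-grained access control; this is an illustration of the idea rather than Google’s reference setup.

```python
from google.cloud import bigquery

# Hypothetical project; the "analytics" dataset is assumed to exist already.
client = bigquery.Client(project="my-project")

# Define an external table over Parquet files sitting in a Cloud Storage bucket.
ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS analytics.events_raw
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://my-lake-bucket/events/*.parquet']
)
"""
client.query(ddl).result()  # run the DDL and wait for it to finish

# Query the lake data with the warehouse engine, as if it were a native table.
rows = client.query("SELECT COUNT(*) AS n FROM analytics.events_raw").result()
for row in rows:
    print(row.n)
```

The point of the sketch is that the data stays in open files in the lake, while access, schema, and querying are handled through the warehouse layer.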
Lakehouse architecture is one way companies are trying to overcome the natural divisions that arise between heterogeneous data sets. But the data landscape is diverse, and centralizing storage is not the only challenge.
No centralized view into data
After years of struggling to centralize data in their data lakes, many companies have resigned themselves to the fact that data silos will exist for the foreseeable future. The goal, therefore, is to remove as many barriers as possible between users and the data they need.
At Capital One, the goal of big data was to democratize user access as part of modernizing the entire data ecosystem. “In practice, it is important to make data available to all users, including analysts, engineers, and machine learning data scientists, to unlock the potential of what can be done with data,” said Biba Helou, SVP of enterprise data platforms and risk management technology at the credit card company.
A key element of Capital One’s data democratization efforts is a centralized data catalog that provides a view into various data assets while tracking access rights and governance.

What’s really in a data silo? (jijomathaidesigners/Shutterstock)
“Obviously we’re making sure we’re doing it in a well-managed way, but we want people to be able to see what’s out there, innovate, and have access to what they need to create great customer-facing products,” Helou said in a recent interview with Datanami.
The company decided to build its own data catalog, in part because the catalog also allows users to create data pipelines. “So it’s a catalog, plus. It’s very interconnected with all the other systems,” she said. “We found it much easier to build an integrated solution ourselves than to get a bunch of third-party products and piece them together ourselves.”
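Capital One has not published the internals of its catalog, but as a purely hypothetical sketch of what such a catalog tracks, an entry that records ownership, access rights, and the pipelines fed by a dataset might look something like this in Python; every name below is made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CatalogEntry:
    """One dataset registered in a (hypothetical) central data catalog."""
    name: str                       # logical dataset name, e.g. "card_transactions"
    owner: str                      # team accountable for the data
    location: str                   # physical location, e.g. an object-store URI
    allowed_roles: List[str] = field(default_factory=list)        # who may read it
    downstream_pipelines: List[str] = field(default_factory=list)  # pipelines it feeds

    def can_read(self, role: str) -> bool:
        """Governance check: is this role allowed to query the dataset?"""
        return role in self.allowed_roles

# Register a dataset and answer the two questions a catalog exists for:
# "what data is out there?" and "am I allowed to touch it?"
entry = CatalogEntry(
    name="card_transactions",
    owner="payments-data-team",
    location="s3://example-bucket/card_transactions/",
    allowed_roles=["analyst", "ml_scientist"],
    downstream_pipelines=["fraud_feature_build"],
)
print(entry.can_read("analyst"))   # True
print(entry.can_read("intern"))    # False
```

The “catalog, plus” idea in the quote corresponds to the downstream_pipelines field: discovery, governance, and pipeline wiring live in one interconnected place.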
Too big and too fast
In the heyday of the Hadoop era, many companies spent a lot of money building large clusters to power their data lakes. Many of these on-premises systems used commodity X86 processors and hard disks, making them more cost-effective than the data warehouses they replaced, at least on a per-terabyte basis. But the clusters themselves were large, and the upfront investment they required pushed up costs.
Now that we’re firmly in the cloud era, we can look back at these investments and see where they fell short. The availability of cloud-based data warehouse and data lake services allows customers to start with a small investment and step up from there, said former Forrester analyst Jennifer Belissent, who joined Snowflake last year as principal data strategist.
“I think that’s one of the challenges we faced: people approached it as something that requires a lot of investment up front, and you get disillusioned,” Belissent said. “You don’t have to, especially if you have cloud infrastructure. You can incrementally add use cases, add data, and add more results.”

Have someone else build your cluster (GreenBelka/Shutterstock)
Belissent says it’s better for customers to start with small projects that have a higher chance of success and build on them over time, rather than betting on a risky big-bang project that may quickly fall apart.
“Historically, when we talk about big data and expect people to embrace it, that has, by definition, meant massive infrastructure, and that’s what held people back,” she said. “On the other hand, if you start small, build incrementally, and leverage cloud infrastructure, it’s easier to use and you don’t need to make the upfront investment to deploy it. You probably eliminate some of the disillusionment we saw in previous generations.”
Belissent pointed out that Gartner has recently started emphasizing the benefits of “smaller, broader data,” a point Andrew Ng has also made on the lecture circuit when it comes to AI projects.
“It’s not just about big data, it’s about right-sizing the data,” Belissent told Datanami in an interview last week. “It doesn’t have to be huge. You can start small and scale up, or diversify your data sources and grow broader, so we have a better sense of what customers want and can put how we serve them in context.”
But just because big data projects don’t have to start big doesn’t mean you shouldn’t think about future growth.
Not planning for big growth in advance
One of the recurring themes of big data is the unpredictability of how users will take to new solutions. How many times have you read about a big data project that was expected to succeed in a big way, only to fail? At the same time, side projects given little chance of success can go on to become big winners.
In general, it’s wise to start small and build on your successes. However, when choosing a big data architecture, be careful not to box yourself in with technologies that will get in the way down the line.
Some customers anticipate heavy demands from day one, said Lenley Hensarling, chief strategy officer at NoSQL database company Aerospike: “It’s going to be big. We’re going to have big data sets. We’re going to have very high throughput in terms of the number of operations in progress.”

New databases may need to scale horizontally to support the business (ZinetroN/Shutterstock)
The folks at Aerospike call this “ambitious scale,” a phenomenon that is generally more common among internet companies. With no hardware investment required, the cloud lets businesses scale up computing power to the nth degree.
However, unless your database or file system can also scale to handle that throughput, you cannot take full advantage of the public cloud’s performance. A modern NoSQL database can adapt more easily to a changing business, but there are limits to what it can offer, and database migrations are never easy.
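The mechanics behind that kind of horizontal scaling vary by product, and Aerospike’s own implementation is not shown here; as a generic, toy illustration of why a horizontally partitioned store can grow with throughput, here is a consistent-hash ring in Python that spreads keys across nodes so capacity increases by simply adding nodes (all node and key names are invented for the example).

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Toy consistent-hash ring: a common way distributed databases spread
    keys across nodes so capacity can grow by adding nodes."""

    def __init__(self, nodes, vnodes=64):
        self._ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add_node(node, vnodes)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_node(self, node: str, vnodes: int = 64) -> None:
        # Each node owns many small arcs of the ring, so a scale-out event
        # only relocates roughly 1/N of the keys.
        for i in range(vnodes):
            self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()

    def node_for(self, key: str) -> str:
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h,)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:12345"))   # which node serves this record today
ring.add_node("node-d")              # scaling out: only ~1/4 of keys move
print(ring.node_for("user:12345"))
```

The design point the sketch makes is the one Hensarling alludes to: if the data layer partitions cleanly, adding nodes raises throughput without a wholesale migration; if it does not, the cloud’s elastic compute alone will not save you.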
Big data has many known failure modes, and no doubt some unknown ones as well. It is important to familiarize yourself with the common ones. But perhaps most importantly, it’s good to know that failure should not only be expected, but welcomed as part of the process.
Can’t stand failure
When using big data insights to change business strategy, there are unknown factors that can appear out of nowhere, leading to failed experiments or even unexpected successes. Keeping your cool in this difficult process is the key difference between long-term success and short-term big data failure.
According to Satyen Sangani, CEO and co-founder of data cataloging firm Alation, science is inherently speculative, and we have to embrace it. “We make hypotheses, and sometimes the hypotheses are right, sometimes they are wrong,” he said. “Sometimes we experiment, sometimes we can predict, sometimes we can’t.”

If A doesn’t work, hopefully B does (Imagentle/Shutterstock)
Sangani encourages companies to have an “exploratory mindset” and think like a venture capitalist. You can get a low but reliable return by making conservative investments, for example, hiring new sales representatives or expanding your headquarters. Or you can take a more speculative approach.
“That kind of exploratory thinking is difficult for people to wrap their heads around,” Sangani said. “When you invest in a portfolio of data assets and AI investments, you don’t get a return on every individual investment, but one of those investments may be a 10x investment.”
Ultimately, companies are betting that at least one of their data investments will deliver that 10x return. Of course, there are many small things that need to be done right to strike data gold, and many things that can go wrong, but trial and error helps you learn what works and what doesn’t. And hopefully, when you hit that 10x return, you can share those learnings with the rest of us.
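As a back-of-the-envelope illustration of the portfolio logic Sangani describes (the figures below are made up for the example, not from any cited study), a handful of mostly modest or failed bets can still pay off overall if one of them returns 10x:

```python
# Hypothetical portfolio of data/AI project bets: (cost, return) pairs.
# Most projects return little or nothing; one is a 10x winner.
projects = [
    (100, 0),      # failed experiment
    (100, 50),     # partial success
    (100, 120),    # modest win
    (100, 0),      # failed experiment
    (100, 1000),   # the 10x outcome
]

total_cost = sum(cost for cost, _ in projects)
total_return = sum(ret for _, ret in projects)

print(f"Spent {total_cost}, returned {total_return}, "
      f"portfolio multiple = {total_return / total_cost:.2f}x")
# Even with two outright failures, this portfolio as a whole returns ~2.3x.
```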
Related items:
Why Few People Master the Data Economy
Data Engineering Modernization at Capital One
Google Cloud opens the door to lakehouses with BigLake