Data modeling is the process of representing information system objects or entities and the connections between them. Such entities could be people, products, or anything related to your business. Regardless of the entity type, modeling them correctly sets up a powerful database for fast information retrieval, efficient storage, and more.
Reference: Job Description: Big Data Modeler (TechRepublic Premium)
Given the benefits that data modeling can bring to database insights, it’s important to learn how to effectively model data in your organization. This guide points out some important mistakes to avoid when modeling your data.
Jump to:
- Not viewing a quality data model as an asset
- Not considering the use of data by applications
- Schemaless does not mean data modelless
- Failing to manage semi-structured data
- Not planning to evolve your data model
- Tightly mapping UI to data fields and values
- Wrong or different levels of granularity
- Inconsistent or nonexistent naming patterns
- Not separating the concept of keys from indexes
- Data modeling started too late
Not viewing a quality data model as an asset
Melissa Coates, a Microsoft Power BI consultant, pointed out that we might optimize a data model for a specific use case, such as analyzing sales data, but such a model quickly becomes limiting when analysts need to analyze multiple things.
For example, if a model is optimized for sales data only, it may be difficult for analysts to jump right into analyzing the intersection of sales and support calls. That is not to mention the additional time, resources, and possible costs spent creating additional models when a single, broader model would have sufficed.
To prevent this kind of model inefficiency, take the time upfront to ensure that your data model offers broader applicability, and treat that upfront work as an investment with long-term financial returns.
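To make this concrete, here is a minimal sketch (in Python, using the standard-library sqlite3 module) of a conformed customer dimension shared by two fact tables. The table and column names are hypothetical; the point is that one well-designed model answers both the sales question and the sales-plus-support question without a second model.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales   (customer_id INTEGER, amount REAL);
    CREATE TABLE fact_support (customer_id INTEGER, call_minutes INTEGER);
    INSERT INTO dim_customer VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO fact_sales   VALUES (1, 500.0), (1, 250.0), (2, 75.0);
    INSERT INTO fact_support VALUES (1, 12), (2, 95);
""")

-- = Pre-aggregating each fact before joining avoids double counting
# (fan-out) when a customer has several rows in one fact table.
query = """
    SELECT c.name, s.revenue, p.minutes
    FROM dim_customer c
    JOIN (SELECT customer_id, SUM(amount) AS revenue
          FROM fact_sales GROUP BY customer_id) s USING (customer_id)
    JOIN (SELECT customer_id, SUM(call_minutes) AS minutes
          FROM fact_support GROUP BY customer_id) p USING (customer_id)
"""
for row in con.execute(query):
    print(row)  # ('Acme', 750.0, 12) then ('Globex', 75.0, 95)
```

Because both fact tables share customer_id, analyzing their intersection is one query against the existing model, not a new modeling project.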
Not considering the use of data by applications
One of the most difficult things in data modeling is striking the right balance between competing interests:
- Application data needs
- Performance goals
- Data acquisition methods
It's easy to get so carried away structuring your data that you don't spend enough time analyzing how your application actually uses it, and striking the right balance between querying, updating, and processing that data.
See: Recruitment Kit: Data Scientist (TechRepublic Premium)
Another way to describe this mistake is lack of empathy for others using the data model. A good data model considers all the users and use cases of your application and builds accordingly.
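As a hedged illustration of what "considering the use of data by applications" can mean in a document database, here are two valid shapes for the same blog data. All field names here are assumptions for illustration; the right choice depends on whether the application is read-heavy or write-heavy.

```python
# Read-optimized: comments embedded in the post. One read serves the
# whole page, but every new comment rewrites a growing document.
post_embedded = {
    "_id": "post-1",
    "title": "Hello",
    "comments": [
        {"author": "ann", "text": "Nice!"},
        {"author": "bob", "text": "+1"},
    ],
}

# Write-optimized: comments referenced by post_id. Appends are cheap
# and unbounded, but rendering the page needs a second query or lookup.
post_referenced = {"_id": "post-1", "title": "Hello"}
comments = [
    {"post_id": "post-1", "author": "ann", "text": "Nice!"},
    {"post_id": "post-1", "author": "bob", "text": "+1"},
]
```

Neither shape is wrong in the abstract; a model built with empathy for its users picks the one that matches how the application reads and writes.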
Schemaless does not mean data modelless
NoSQL databases (document, key-value, wide-column, etc.) have become a key component of enterprise data architectures as they provide flexibility for unstructured data. Although sometimes misunderstood as “schemaless” databases, it’s more accurate to think of NoSQL databases as allowing for flexible schemas. Some people also confuse the data schema with the data model, but the two serve different functions.
A data schema tells the database engine how to organize data in a database, while a data model is more conceptual, describing the data and the relationships within it. Despite the confusion about how flexible schemas affect data modeling, developers still need to model data in a NoSQL database, just as they do in a relational one. Depending on the type of NoSQL database, that data model can be simple (key-value) or more sophisticated (document).
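As one possible illustration, MongoDB (see the disclosure at the end of this article) lets you attach a $jsonSchema validator to a collection via the pymongo driver, which is one way of writing the data model down even though the engine itself is "schemaless." This sketch assumes a MongoDB instance running locally; the orders collection and its fields are hypothetical.

```python
from pymongo import MongoClient

# Assumes a local MongoDB instance; adjust the URI for your environment.
client = MongoClient("mongodb://localhost:27017")
db = client["shop"]

# The validator encodes the model: which fields exist, their types, and
# which are required -- even though the engine allows flexible schemas.
db.create_collection(
    "orders",
    validator={
        "$jsonSchema": {
            "bsonType": "object",
            "required": ["customer_id", "total"],
            "properties": {
                "customer_id": {"bsonType": "string"},
                "total": {"bsonType": "double", "minimum": 0},
            },
        }
    },
)
```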
Failing to manage semi-structured data
Most data today is unstructured or semi-structured, but as with mistake #3, this doesn't mean your data model should follow those same formats. It may seem convenient to put off thinking about how to structure your data at ingestion time, but doing so almost inevitably hurts you later. Apply rigor to the data model rather than taking a hands-off approach during data acquisition.
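A minimal sketch of what that rigor might look like at ingestion time, using only the Python standard library: semi-structured records from different sources are coerced into one explicit, modeled shape. The field names, aliases, and unit conventions are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Event:
    user_id: str
    action: str
    duration_ms: int  # always stored in one unit

def ingest(raw: dict) -> Event:
    """Coerce a loosely structured payload into the modeled shape."""
    # Sources disagree on field names; resolve the aliases here, once.
    user = raw.get("user_id") or raw.get("uid")
    if user is None:
        raise ValueError("record has no user identifier")
    # Some sources report seconds; normalize everything to milliseconds.
    if "duration_s" in raw:
        duration = int(float(raw["duration_s"]) * 1000)
    else:
        duration = int(raw.get("duration_ms", 0))
    return Event(
        user_id=str(user),
        action=str(raw.get("action", "unknown")),
        duration_ms=duration,
    )

print(ingest({"uid": 42, "action": "login", "duration_s": "1.5"}))
```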
Not planning to evolve your data model
Given how much work is required to map out a data model, it's easy to think that once you've built it, you're done. But as Prefect's Anna Geller said: "Building data assets is an ongoing process."
One way to facilitate data model evolution is to "split and separate data transformations," she continued, which in the long run "makes the whole process easier to build, debug and maintain."
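Here is a hedged sketch of that idea in Python: the pipeline is a list of small, named transformation steps, each of which can be built, tested, and replaced independently. The step names and sample data are illustrative.

```python
def drop_nulls(rows):
    return [r for r in rows if r.get("amount") is not None]

def to_cents(rows):
    return [{**r, "amount_cents": round(r["amount"] * 100)} for r in rows]

def tag_region(rows):
    return [{**r, "region": r.get("region", "unknown")} for r in rows]

def run_pipeline(rows, steps):
    for step in steps:  # each step is debuggable and swappable in isolation
        rows = step(rows)
    return rows

raw = [{"amount": 1.25}, {"amount": None}, {"amount": 3.0, "region": "EU"}]
print(run_pipeline(raw, [drop_nulls, to_cents, tag_region]))
```

When the model evolves, a new step is added or an old one is rewritten without touching the rest of the pipeline.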
Tightly mapping UI to data fields and values
Tailwind Labs partner Steve Schoger advises, "Don't be afraid to 'think outside the database.'" He goes on to explain that the UI doesn't necessarily have to map directly to each data field and value. This mistake tends to come from sticking to the data model rather than the underlying information architecture. Fixing it means presenting the data in a way that is intuitive to the application's intended audience, rather than as a one-to-one mapping of the underlying data model.
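A minimal sketch of that decoupling: the stored record keeps compact, model-friendly values, and a small presentation layer maps them to what the audience actually reads. The status codes and field names here are hypothetical.

```python
STATUS_LABELS = {0: "Awaiting payment", 1: "Shipped", 2: "Delivered"}

def present(order: dict) -> dict:
    """Translate the stored shape into the UI shape -- deliberately not 1:1."""
    return {
        "order": f"#{order['order_id']}",
        "status": STATUS_LABELS.get(order["status_code"], "Unknown"),
        # Two fields in the model become one line in the UI.
        "customer": f"{order['first_name']} {order['last_name']}",
    }

stored = {"order_id": 1042, "status_code": 1,
          "first_name": "Ada", "last_name": "Lovelace"}
print(present(stored))
```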
Wrong or different levels of granularity
In analytics, granularity refers to the level of detail we can see. For example, a SaaS business might want to see the consumption level of a service per day, hour, or minute. Getting the right granularity in your data model is important, because if the granularity is too fine, you'll end up with all sorts of unnecessary data, which complicates deciphering and sorting everything.
However, if the granularity is too coarse, you may not capture enough detail to pull out important specifics and trends. Now add the possibility that the model is focused on daily numbers, but your business needs to distinguish peak from off-peak consumption: at that point you're dealing with mixed granularity, which will confuse your users. Determining the exact data use cases for internal and external users is an important first step in determining the required level of detail for your model.
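The asymmetry is easy to demonstrate: fine-grained data can always be rolled up to a coarser grain, but daily totals can never recover peak versus off-peak detail. A small Python sketch, with illustrative timestamps and an assumed "usage" metric:

```python
from collections import defaultdict
from datetime import datetime

events = [
    ("2024-05-01 09:15", 3), ("2024-05-01 09:40", 5),
    ("2024-05-01 14:05", 2), ("2024-05-02 09:30", 7),
]

def roll_up(events, fmt):
    """Aggregate minute-grain events up to the grain encoded by fmt."""
    totals = defaultdict(int)
    for ts, usage in events:
        bucket = datetime.strptime(ts, "%Y-%m-%d %H:%M").strftime(fmt)
        totals[bucket] += usage
    return dict(totals)

print(roll_up(events, "%Y-%m-%d %H:00"))  # hourly: peak hours are visible
print(roll_up(events, "%Y-%m-%d"))        # daily: the peaks are gone
```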
Inconsistent or nonexistent naming patterns
Rather than inventing your own naming convention, it's better to follow a standard approach within a data model. A lack of consistent logic in how tables are named, for example, makes the data model very difficult to follow. While it may seem clever to come up with obscure naming conventions that relatively few people will immediately understand, this inevitably leads to confusion later on, especially when new people are hired to work with these models.
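One way to keep a convention from eroding is to check it mechanically rather than by memory. A minimal sketch, assuming a snake_case rule for table names (both the rule and the names are illustrative):

```python
import re

# The convention is written down once, as a pattern, not folklore.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")

def check_names(tables):
    """Return the table names that violate the naming convention."""
    return [t for t in tables if not SNAKE_CASE.match(t)]

tables = ["customer_orders", "CustomerAddress", "tbl_X1", "support_calls"]
print(check_names(tables))  # ['CustomerAddress', 'tbl_X1'] fail the rule
```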
Not separating the concept of keys from indexes
Keys and indexes serve different functions in a database. As Bert Scalzo explains, “Keys enforce business rules. It’s a logical concept. Indexes speed up database access. It’s a purely physical concept.”
Many people confuse the two and end up not implementing candidate keys, which reduces the available indexes and, in the process, degrades performance. Scalzo went on to advise choosing indexes such that "all keys can be effectively supported."
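Scalzo's distinction is easy to see in code. In this sketch (Python with the standard-library sqlite3 module, and a hypothetical schema), the UNIQUE constraint is a key enforcing a business rule, while the separate index exists purely to speed up a common lookup.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,   -- key: logical identity
        email   TEXT NOT NULL UNIQUE,  -- key: business rule, one account per email
        country TEXT
    );
    -- Index: purely physical; no rule, just faster scans by country.
    CREATE INDEX idx_users_country ON users (country);
""")

con.execute("INSERT INTO users (email, country) VALUES ('a@x.io', 'DE')")
try:
    con.execute("INSERT INTO users (email, country) VALUES ('a@x.io', 'FR')")
except sqlite3.IntegrityError as e:
    print("key enforced the business rule:", e)
```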
Data modeling started too late
If the data model is the blueprint describing an application's data and how that data interacts, it makes little sense to start building the application before the data model is fully scoped out. Yet this is exactly what many developers do.
Understanding the shape and structure of your data is essential to application performance and, ultimately, the user experience. It is the first thing to consider, which brings us back to mistake #1: not viewing a quality data model as an asset. Failing to plan your data model is essentially planning to fail (and planning to do a lot of refactoring later to fix your mistakes).
Disclosure: I work with MongoDB, but the views expressed here are mine.
Reference: Top Data Modeling Tools (TechRepublic)