Data modeling is the process of representing information system objects or entities and the connections between them. Such entities could be people, products, or anything related to your business. Regardless of the entity type, modeling them correctly sets up a powerful database for fast information retrieval, efficient storage, and more.
Reference: Job Description: Big Data Modeler (TechRepublic Premium)
Given the benefits that data modeling can bring to database insights, it’s important to learn how to effectively model data in your organization. This guide points out some important mistakes to avoid when modeling your data.
Jump to:
- Not viewing a quality data model as an asset
- Not considering the use of data by applications
- Schemaless does not mean data modelless
- Failing to manage semi-structured data
- Not planning to evolve your data model
- Tightly mapping UI to data fields and values
- Wrong or different levels of granularity
- Inconsistent or nonexistent naming patterns
- Not separating the concept of keys from indexes
- Data modeling started too late
Not viewing a quality data model as an asset
Melissa Coates, a Microsoft Power BI consultant, pointed out that we might optimize a data model for a specific use case, such as analyzing sales data, but such a model quickly becomes limiting when analysts need to analyze multiple things.
For example, if a model is optimized for sales data only, it may be difficult for analysts to jump right into analyzing the intersection of sales and support calls. That is not to mention the additional time, resources, and possible costs spent creating additional models when a single, broader model would have sufficed.
To prevent this kind of model inefficiency, take the time upfront to ensure that your data model offers broader applicability, and treat that upfront work as an investment with long-term financial returns.
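To make this concrete, here is a minimal sketch (in Python, using the standard-library sqlite3 module) of a conformed customer dimension shared by two fact tables. The table and column names are hypothetical; the point is that one well-designed model answers both the sales question and the sales-plus-support question without a second model.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales   (customer_id INTEGER, amount REAL);
    CREATE TABLE fact_support (customer_id INTEGER, call_minutes INTEGER);
    INSERT INTO dim_customer VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO fact_sales   VALUES (1, 500.0), (1, 250.0), (2, 75.0);
    INSERT INTO fact_support VALUES (1, 12), (2, 95);
""")

-- = Pre-aggregating each fact before joining avoids double counting
# (fan-out) when a customer has several rows in one fact table.
query = """
    SELECT c.name, s.revenue, p.minutes
    FROM dim_customer c
    JOIN (SELECT customer_id, SUM(amount) AS revenue
          FROM fact_sales GROUP BY customer_id) s USING (customer_id)
    JOIN (SELECT customer_id, SUM(call_minutes) AS minutes
          FROM fact_support GROUP BY customer_id) p USING (customer_id)
"""
for row in con.execute(query):
    print(row)  # ('Acme', 750.0, 12) then ('Globex', 75.0, 95)
```

Because both fact tables share customer_id, analyzing their intersection is one query against the existing model, not a new modeling project.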
Not considering the use of data by applications
One of the most difficult things in data modeling is striking the right balance between competing interests:
- Application data needs
- Performance goals
- Data acquisition methods
It's easy to get so carried away structuring your data that you don't spend enough time analyzing how your application actually uses it, and striking the right balance between querying, updating, and processing that data.
See: Recruitment Kit: Data Scientist (TechRepublic Premium)
Another way to describe this mistake is lack of empathy for others using the data model. A good data model considers all the users and use cases of your application and builds accordingly.
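As a hedged illustration of what "considering the use of data by applications" can mean in a document database, here are two valid shapes for the same blog data. All field names here are assumptions for illustration; the right choice depends on whether the application is read-heavy or write-heavy.

```python
# Read-optimized: comments embedded in the post. One read serves the
# whole page, but every new comment rewrites a growing document.
post_embedded = {
    "_id": "post-1",
    "title": "Hello",
    "comments": [
        {"author": "ann", "text": "Nice!"},
        {"author": "bob", "text": "+1"},
    ],
}

# Write-optimized: comments referenced by post_id. Appends are cheap
# and unbounded, but rendering the page needs a second query or lookup.
post_referenced = {"_id": "post-1", "title": "Hello"}
comments = [
    {"post_id": "post-1", "author": "ann", "text": "Nice!"},
    {"post_id": "post-1", "author": "bob", "text": "+1"},
]
```

Neither shape is wrong in the abstract; a model built with empathy for its users picks the one that matches how the application reads and writes.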
Schemaless does not mean data modelless
NoSQL databases (document, key-value, wide-column, etc.) have become a key component of enterprise data architectures as they provide flexibility for unstructured data. Although sometimes misunderstood as “schemaless” databases, it’s more accurate to think of NoSQL databases as allowing for flexible schemas. Some people also confuse the data schema with the data model, but the two serve different functions.
A data schema tells the database engine how to organize data in a database, while a data model is more conceptual, describing the data and the relationships within it. Despite the confusion about how flexible schemas affect data modeling, developers still need to model data in a NoSQL database, just as they do in a relational one. Depending on the type of NoSQL database, that data model can be simple (key-value) or more sophisticated (document).
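As one possible illustration, MongoDB (see the disclosure at the end of this article) lets you attach a $jsonSchema validator to a collection via the pymongo driver, which is one way of writing the data model down even though the engine itself is "schemaless." This sketch assumes a MongoDB instance running locally; the orders collection and its fields are hypothetical.

```python
from pymongo import MongoClient

# Assumes a local MongoDB instance; adjust the URI for your environment.
client = MongoClient("mongodb://localhost:27017")
db = client["shop"]

# The validator encodes the model: which fields exist, their types, and
# which are required -- even though the engine allows flexible schemas.
db.create_collection(
    "orders",
    validator={
        "$jsonSchema": {
            "bsonType": "object",
            "required": ["customer_id", "total"],
            "properties": {
                "customer_id": {"bsonType": "string"},
                "total": {"bsonType": "double", "minimum": 0},
            },
        }
    },
)
```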
Failing to manage semi-structured data
Most data today is unstructured or semi-structured, but as with mistake #3, this doesn't mean your data model should follow those same formats. It may seem convenient to put off thinking about how to structure your data at ingestion time, but doing so almost inevitably hurts you later. Apply rigor to the data model rather than taking a hands-off approach during data acquisition.
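A minimal sketch of what that rigor might look like at ingestion time, using only the Python standard library: semi-structured records from different sources are coerced into one explicit, modeled shape. The field names, aliases, and unit conventions are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Event:
    user_id: str
    action: str
    duration_ms: int  # always stored in one unit

def ingest(raw: dict) -> Event:
    """Coerce a loosely structured payload into the modeled shape."""
    # Sources disagree on field names; resolve the aliases here, once.
    user = raw.get("user_id") or raw.get("uid")
    if user is None:
        raise ValueError("record has no user identifier")
    # Some sources report seconds; normalize everything to milliseconds.
    if "duration_s" in raw:
        duration = int(float(raw["duration_s"]) * 1000)
    else:
        duration = int(raw.get("duration_ms", 0))
    return Event(
        user_id=str(user),
        action=str(raw.get("action", "unknown")),
        duration_ms=duration,
    )

print(ingest({"uid": 42, "action": "login", "duration_s": "1.5"}))
```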
Not planning to evolve your data model
Given how much work is required to map out a data model, it's easy to think that once you've built it, you're done. But as Prefect's Anna Geller said: "Building data assets is an ongoing process."
One way to facilitate data model evolution is to "split and separate data transformations," she continued, which in the long run "makes the whole process easier to build, debug and maintain."
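Here is a hedged sketch of that idea in Python: the pipeline is a list of small, named transformation steps, each of which can be built, tested, and replaced independently. The step names and sample data are illustrative.

```python
def drop_nulls(rows):
    return [r for r in rows if r.get("amount") is not None]

def to_cents(rows):
    return [{**r, "amount_cents": round(r["amount"] * 100)} for r in rows]

def tag_region(rows):
    return [{**r, "region": r.get("region", "unknown")} for r in rows]

def run_pipeline(rows, steps):
    for step in steps:  # each step is debuggable and swappable in isolation
        rows = step(rows)
    return rows

raw = [{"amount": 1.25}, {"amount": None}, {"amount": 3.0, "region": "EU"}]
print(run_pipeline(raw, [drop_nulls, to_cents, tag_region]))
```

When the model evolves, a new step is added or an old one is rewritten without touching the rest of the pipeline.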
Tightly mapping UI to data fields and values
Tailwind Labs partner Steve Schoger advises, "Don't be afraid to 'think outside the database.'" He goes on to explain that the UI doesn't necessarily have to map directly to each data field and value. This mistake tends to come from sticking to the data model rather than the underlying information architecture. Fixing it means presenting the data in a way that is intuitive to the application's intended audience, rather than as a one-to-one mapping of the underlying data model.
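A minimal sketch of that decoupling: the stored record keeps compact, model-friendly values, and a small presentation layer maps them to what the audience actually reads. The status codes and field names here are hypothetical.

```python
STATUS_LABELS = {0: "Awaiting payment", 1: "Shipped", 2: "Delivered"}

def present(order: dict) -> dict:
    """Translate the stored shape into the UI shape -- deliberately not 1:1."""
    return {
        "order": f"#{order['order_id']}",
        "status": STATUS_LABELS.get(order["status_code"], "Unknown"),
        # Two fields in the model become one line in the UI.
        "customer": f"{order['first_name']} {order['last_name']}",
    }

stored = {"order_id": 1042, "status_code": 1,
          "first_name": "Ada", "last_name": "Lovelace"}
print(present(stored))
```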
Wrong or different levels of granularity
In analytics, granularity refers to the level of detail we can see. For example, a SaaS business might want to see the consumption level of a service per day, hour, or minute. Getting the right granularity in your data model is important, because if the granularity is too fine, you'll end up with all sorts of unnecessary data, which complicates deciphering and sorting everything.
However, if the granularity is too coarse, you may not capture enough detail to pull out important specifics and trends. Now add the possibility that the model is focused on daily numbers, but your business needs to distinguish peak from off-peak consumption: at that point you're dealing with mixed granularity, which will confuse your users. Determining the exact data use cases for internal and external users is an important first step in determining the required level of detail for your model.
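The asymmetry is easy to demonstrate: fine-grained data can always be rolled up to a coarser grain, but daily totals can never recover peak versus off-peak detail. A small Python sketch, with illustrative timestamps and an assumed "usage" metric:

```python
from collections import defaultdict
from datetime import datetime

events = [
    ("2024-05-01 09:15", 3), ("2024-05-01 09:40", 5),
    ("2024-05-01 14:05", 2), ("2024-05-02 09:30", 7),
]

def roll_up(events, fmt):
    """Aggregate minute-grain events up to the grain encoded by fmt."""
    totals = defaultdict(int)
    for ts, usage in events:
        bucket = datetime.strptime(ts, "%Y-%m-%d %H:%M").strftime(fmt)
        totals[bucket] += usage
    return dict(totals)

print(roll_up(events, "%Y-%m-%d %H:00"))  # hourly: peak hours are visible
print(roll_up(events, "%Y-%m-%d"))        # daily: the peaks are gone
```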
Inconsistent or nonexistent naming patterns
Rather than inventing your own naming convention, it's better to follow a standard approach within a data model. A lack of consistent logic in how tables are named, for example, makes the data model very difficult to follow. While it may seem clever to come up with obscure naming conventions that relatively few people will immediately understand, this inevitably leads to confusion later on, especially when new people are hired to work with these models.
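One way to keep a convention from eroding is to check it mechanically rather than by memory. A minimal sketch, assuming a snake_case rule for table names (both the rule and the names are illustrative):

```python
import re

# The convention is written down once, as a pattern, not folklore.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")

def check_names(tables):
    """Return the table names that violate the naming convention."""
    return [t for t in tables if not SNAKE_CASE.match(t)]

tables = ["customer_orders", "CustomerAddress", "tbl_X1", "support_calls"]
print(check_names(tables))  # ['CustomerAddress', 'tbl_X1'] fail the rule
```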
Not separating the concept of keys from indexes
Keys and indexes serve different functions in a database. As Bert Scalzo explains, “Keys enforce business rules. It’s a logical concept. Indexes speed up database access. It’s a purely physical concept.”
Many people confuse the two and end up not implementing candidate keys, which reduces the available indexes and, in the process, degrades performance. Scalzo went on to advise choosing indexes such that "all keys can be effectively supported."
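Scalzo's distinction is easy to see in code. In this sketch (Python with the standard-library sqlite3 module, and a hypothetical schema), the UNIQUE constraint is a key enforcing a business rule, while the separate index exists purely to speed up a common lookup.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,   -- key: logical identity
        email   TEXT NOT NULL UNIQUE,  -- key: business rule, one account per email
        country TEXT
    );
    -- Index: purely physical; no rule, just faster scans by country.
    CREATE INDEX idx_users_country ON users (country);
""")

con.execute("INSERT INTO users (email, country) VALUES ('a@x.io', 'DE')")
try:
    con.execute("INSERT INTO users (email, country) VALUES ('a@x.io', 'FR')")
except sqlite3.IntegrityError as e:
    print("key enforced the business rule:", e)
```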
Data modeling started too late
If the data model is the blueprint describing an application's data and how that data interacts, it makes little sense to start building the application before the data model is fully scoped out. Yet this is exactly what many developers do.
Understanding the shape and structure of your data is essential to application performance and, ultimately, the user experience. It is the first thing to consider, which brings us back to mistake #1: not viewing a quality data model as an asset. Failing to plan your data model is essentially planning to fail (and planning to do a lot of refactoring later to fix your mistakes).
Disclosure: I work with MongoDB, but the views expressed here are mine.
Reference: Top Data Modeling Tools (TechRepublic)