Entity types in the database. Database fundamentals. ER-model (entity-relationship). Defining Entity Types
Creating a database begins with design.
Database design stages:
· Study of the subject area;
· Data analysis (entities and their attributes);
· Definition of relations between entities and definition of primary and secondary (foreign) keys.
During the design process, the structure of the relational database is determined (composition of tables, their structure and logical connections). The structure of a table is determined by the composition of the columns, the data type and sizes of the columns, and the keys of the table.
Back to basic concepts database models "entity-relationship" include: entities, relationships between them and their attributes (properties).
Essence - any concrete or abstract object in the subject area under consideration. Entities are the basic types of information that are stored in the database (in a relational database, each entity is assigned a table). Entities can include: students, clients, departments, etc. Entity instance and entity type are different concepts. The concept of an entity type refers to a set of homogeneous persons, objects or events acting as a whole (for example, a student, a client, etc.). An entity instance refers, for example, to a particular person in a set. An entity type can be a student, and an instance can be Petrov, Sidorov, etc.
Attribute is a property of an entity in the subject area. Its name must be unique for a particular entity type. For example, for the entity student, the following attributes can be used: last name, first name, patronymic, date and place of birth, passport data, etc. In a relational database, attributes are stored in table fields.
Connection – the relationship between entities in the subject area. Relationships are connections between parts of the database (in a relational database, this is a connection between table records).
Entities are data that are classified by type, and relationships show how these data types relate to each other. If we describe a certain subject area in terms of an entity - a connection, then we get model entity - relationship for this database.
Consider the subject area: Dean's Office (Student Achievement)
The database "Dean's office" should store data on students, groups of students, student grades in various disciplines, teachers, scholarships, etc. We confine ourselves to data on students, groups of students, and student grades in various disciplines. Let's define entities, entity attributes and basic requirements for database functions with limited data.
The main subject-significant entities of the database "Dean's Office" are: Students, Groups of students, Disciplines, Progress.
The main subject-relevant attributes of entities:
-students - last name, first name, patronymic, gender, date and place of birth, a group of students;
-groups of students - name, course, semester;
-disciplines - name, number of hours
- progress - assessment, type of control.
Basic requirements for database functions:
- select the student's progress in disciplines indicating the total number of hours and type of control;
-select students' performance by groups and disciplines;
- choose the disciplines studied by a group of students in a particular course or
certain semester.
From the analysis of the subject area data, it follows that each entity must be assigned the simplest two-dimensional table (relationships). Next, you need to establish logical relationships between tables. It is necessary to establish such a relationship between the Students and Progress tables so that each record from the Students table corresponds to several records in the Progress table, i.e. one-to-many, as each student can have multiple grades.
The logical relationship between Group entities - Students is defined as one-to-many based on the fact that there are many students in the group, and each student is part of one group. The logical relationship between the entities Disciplines - Progress is defined as one-to-many, because for each discipline, several grades can be given to different students.
Based on the above, we compose an entity-relationship model for the Dean's office database
Arrow is symbol relationships: one-to-many.
To create a database, you need to use one of the well-known DBMS, for example, the Access DBMS.
The concept of the ER model. The concept of an entity. attributes. Attribute types
1. What problems can a developer have when designing a database?
When designing a database and developing a software product, the most important problem is the problem of interaction between the developer and the customer. The task of the developer is to most accurately recreate the wishes of the customer when developing a database management software product. The main problem that the developer needs to solve is the correct construction of the database, or rather the schema (structure) of the database.
In addition, the developer additionally encounters other difficulties, which include:
- search for efficient algorithms;
- selection of appropriate data structures;
- debugging and testing complex code;
- design and usability of the application interface.
On development stage software manager of the database, the developer must learn in detail the requirements of the customer. The database should be designed in such a way that it is understandable, most accurately reflects the problem being solved, and does not contain redundancy in the data.
To facilitate the process of developing (designing) a database, the so-called semantic models data. For different types The most famous database is the ER data model (Entity-Relationship model).
2. What is an ER-model (Entity-relationship model)? Why do you need to develop an ER model?
ER-model (Entity-relationship model or Entity-relationship diagram) is a semantic data model that is designed to simplify the database design process. All types of databases can be generated from the ER model: relational, hierarchical, network, object. The ER model is based on the concepts of "entity", "relationship" and "attribute".
For large databases, building an ER model avoids design errors that are extremely difficult to fix, especially if the database is already in operation or at the testing stage. Mistakes in the design of the database structure can lead to alteration of the code of the software that manages this database. As a result, time, money and human resources will be used inefficiently.
An ER model is a representation of a database in the form of visual graphic diagrams. An ER model visualizes a process that defines a certain subject area. An entity-relationship diagram is a diagram that graphically represents entities, attributes, and relationships.
The ER model is only a conceptual level of modeling. The ER model does not contain implementation details. For the same ER model, the details of its implementation may differ.
3. What is an entity in a database? Examples
An entity in a database is any object in a database that can be distinguished based on the essence of the subject area for which this database is being developed. The database designer must be able to properly define entities.
Example 1 The following entities can be distinguished in the bookstore database:
- book;
- provider;
- store placement.
Example 2 In the database of accounting for the educational process of some educational institution, the following entities can be distinguished:
- students (pupils);
- teachers;
- groups;
- disciplines being studied.
4. What are the varieties of entity types? Designation of entity types in the ER model
In the "entity" - "relationship" model, there are two types of entity types:
- weak type. This entity type is dependent on a strong entity;
- strong type. This is an independent type of entity that does not depend on anyone.
Figure 1 shows the weak and strong entity type designations in the ER model.
Rice. 1. Designation of strong and weak entity types
5. What are attributes for? Attribute types. Designation of attributes on the ER model
Each entity type has a specific set of attributes. Attributes are intended to describe a particular entity.
There are the following types of attributes:
- simple attributes. These are attributes that can be part of compound attributes. These attributes consist of one component. For example, simple attributes include: the code of a book in a library or a student's course in an educational institution;
- composite attributes. These are attributes that consist of several simple attributes. For example, the address of residence may contain the name of the country, locality, streets, house numbers;
- unambiguous attributes. These are attributes that contain only one single value for some entity. For example, the attribute "Grade book number" for the "Student" entity type is unambiguous, since a student can have only one grade book number (one value);
- polysemantic attributes. These are attributes that can contain multiple values. For example, the multi-valued attribute "Phone number" for the entity "Student", since a student can have several phone numbers (home, mobile, etc.);
- arbitrary attributes. These are attributes whose value is formed based on the values of other attributes. For example, a student's current year of study can be calculated based on the difference between the current year of study and the student's year of entry into educational institution(if the student had no problems with his studies and studied well the discipline "Organization of databases and knowledge").
On the ER diagram, attributes are designated as shown in Figure 2. As can be seen from the figure, any attribute is designated as an ellipse with a name inside the ellipse. If the attribute is the primary key, then its name is underlined.
Figure 2. Representation of attributes on ER model diagrams
6. How are entity types and attributes of the ER model implemented in real databases and the programs they manage?
When developing database management programs, entity types and their attributes can be represented in different ways, while adhering to several approaches:
- choose a well-known technology as a data source (for example, Microsoft SQL Server, Oracle Database, Microsoft Access, Microsoft ODBC Data Source, etc.), which has already been researched, tested, standardized and has a huge set of database management tools;
- develop own format database and implement methods for its processing, and implement interaction with known data sources in the form special teams like Import/Export. In this case, you will have to program all the routine work to maintain and ensure the reliable operation of the database with your own hands;
- implement a combination of the above two approaches. Modern means software developers have a powerful set of libraries for processing complex sets and visualizing data in them (collections, arrays, visualization components, etc.).
If the database is implemented in well-known relational DBMS (for example, Microsoft Access, Microsoft SQL Server, etc.), then the entity types are represented by tables. Attributes from the ER model correspond to the fields of the table. One entry in a database table represents one entity instance.
Each kind of attribute is implemented as follows:
- simple attribute or an unambiguous attribute can be represented by an accessible set of basic types that are found in any programming language. For example, integer attributes are represented by type int , integer , uint , and so on; attributes containing a fractional part can be represented by type float , double ; string attributes of type string, etc.;
- composite attribute is an object that includes several nested simple attributes. For example, in the Microsoft Access DBMS, a composite attribute of a certain table can be formed based on a set of simple types (fields). In programming languages, the union of fields is implemented by structures or classes;
- multivalued attribute can be implemented as an array or collection of simple or compound attributes;
- arbitrary attribute implemented by an additional field that is calculated when accessing the table. Such a field is called a calculated field and is formed on the basis of other fields in the table;
- attribute that is the primary key can be integer, string, or any other ordinal type. In this case, the value of each table cell that corresponds to the primary key is unique. Most often, the primary key is an integer type (int , integer ).
If the database is implemented in a unique format, then it is most convenient to represent entity types as classes or structures. Entity attributes are implemented as fields (internal data) of the class. Class methods implement the necessary processing of class fields (attributes). Interaction (communication) between classes is implemented using specially designed interfaces using well-known design patterns.
7. An example of a fragment of the ER model for the entity type "Student"
The above example demonstrates a fragment of the ER model for the Student entity type.
Figure 3. Fragment of the ER model for the entity type "Student"
The above figure declares the following attributes, which in the DBMS (program) can have the following types:
- attribute Primary key - is a unique integer value that is generated automatically. In the DBMS, this is a counter field;
- entry year attribute is a simple attribute that can be implemented as an integer value (int , integer );
- attribute Phone number is a multi-valued attribute that can be implemented as an array or collection, etc.;
- attribute Record book number- a simple attribute that can be implemented as a character string, since the gradebook number can contain letters in addition to numbers;
- the attribute Country , City , Street , House number are the attributes that form the composite attribute Address . All these attributes can be of string (text) type (string , Text );
- the attribute Last Name , First Name , Patronymic are simple attributes that are part of the compound attribute Student Name . All these attributes can be of string (text) type (string , Text );
- the Birthday attribute is a simple attribute of the Date type (DateTime );
- attribute Student age is a calculated field that is defined as the difference between the current (system) date and the value of the Birthday attribute.
The term "relational" means "relationship-based". A relational database consists of entities (tables) that have some relationship with each other. The name came from English word relation.
Database design consists of two main phases: logical and physical modeling.
During logical modeling, you collect requirements and develop a database model that is independent of a particular DBMS (relational database management system). It's like creating blueprints for your house. You could think over and draw everything: where the kitchen, bedrooms, living room will be. But this is all on paper and in layouts.
During physical modeling, you create a model that is optimized for a specific application and DBMS. It is this model that is implemented in practice. If we return to the house from the previous paragraph, at this stage you will have to build a house somewhere - carry logs, bricks ...
The database design process consists of the following steps:
- collection of information;
- definition of entities;
- defining attributes for each entity;
- defining relationships between entities;
- normalization;
- transformation to a physical model;
- database creation.
The first 5 stages form the logical design phase and the remaining two form the physical modeling phase.
Logic phase
The logical phase consists of several stages. They are all discussed below.
Gathering Requirements
At this stage, you need to determine exactly how the database will be used and what information will be stored in it. Gather as much information as possible about what the system should and shouldn't do.
Entity definition
At this stage, you need to define the entities that the database will consist of.
An entity is an object in a database that stores data. An entity can be something real (a house, a person, an object, a place) or an abstract thing (a banking transaction, a department of a company, a bus route). In the physical model, an entity is called a table.
Entities are made up of attributes (columns in a table) and records (rows in a table).
Typically, databases are made up of several primary entities associated with a large number of subordinate entities. Core entities are called independent: they do not depend on any other entity. Subordinate entities are called dependent: in order for one of them to exist, the main table associated with it must exist.
In diagrams, entities are usually represented as rectangles. The name of the entity is indicated inside the rectangle:
Any table has the following characteristics:
- there are no identical lines in it;
- all columns (attributes) in the table must have different names;
- elements within the same column have the same type (string, number, date);
- the order of the rows in the table can be arbitrary.
At this stage, you need to identify all categories of information (entities) that will be stored in the database.
Attribute Definition
An attribute represents a property that describes an entity. Attributes are often a number, date, or text. All data stored in an attribute must be of the same type and have the same properties.
In the physical model, attributes are called columns.
After defining the entities, it is necessary to define all the attributes of these entities.
In diagrams, attributes are usually listed within the entity rectangle. In the figure you will find an example of the "Houses" database, only now some attributes are defined for the entities from this database.
Each attribute defines the data type, size, allowed values, and any other rules. These include mandatory, mutable, and uniqueness rules.
The mandatory rule determines whether an attribute is a required part of an entity. If the attribute is an optional part of the entity, then it can be NULL, otherwise not.
You must also determine if the attribute is mutable. Some attribute values cannot change after the entry is created.
And finally, you need to determine if the attribute is unique. If so, then the attribute values cannot be repeated.
Keys
A key is a set of attributes that uniquely identifies an entry. Keys are divided into two classes: simple and compound.
A simple key consists of only one attribute. For example, in the "Passports of the country's citizens" database, the passport number will be a simple key: after all, there are no two passports with the same number.
A composite key consists of several attributes. In the same database "Passports of citizens of the country" there can be a composite key with the following attributes:
surname, name, patronymic, date of birth. This is just an example, since this composite key, in theory, does not provide guaranteed uniqueness of the record.
There are also several types of keys, which are described below.
Possible key
A candidate key is any set of attributes that uniquely identifies an entry in a table. The candidate key can be simple or compound.
Each entity must have at least one possible key, although there may be more than one possible key. None of the primary key attributes can have a NULL value.
A candidate key is also called a surrogate key.
Primary Keys
A primary key is a set of attributes that uniquely identify a record in a table (entity). One of the possible keys becomes the primary key. In diagrams, primary keys are often shown above the main list of attributes or are highlighted with special characters. The entity in the figure has both key and regular attributes.
Alternative Keys
Any possible key that is not the primary key is called an alternate key. An entity can have multiple alternate keys.
Foreign keys
A foreign key is a collection of attributes that refer to the primary or alternate key of another entity. If the foreign key is not associated with the primary entity, then it can only contain null values. If the key is also composite, then all attributes of the foreign key must be undefined.
In diagrams, attributes that are combined into foreign keys are denoted by special symbols. The figure shows two related entities (Houses and their Owners) and the foreign keys formed by them (after all, one person can own more than one house).
Keys are logical constructs, not physical objects. Relational databases have mechanisms to store keys.
Defining Relationships Between Entities
Relational databases allow you to combine information belonging to different entities.
A relationship is a situation in which one entity refers to the primary key of a second entity. Like, for example, the entities House and Master in the previous figure.
Relationships are defined during the base design process. To do this, you should analyze the entities and identify the logical relationships that exist between them.
The relationship type determines the number of entity records associated with another entity record. Relationships are divided into three main types, which are described below.
One to one
Each entry of the first entity corresponds to only one entry from the second entity. And each record of the second entity corresponds to only one record from the first entity. For example, there are two entities: People and Birth Certificates. And one person can only have one birth certificate.
One-to-many
Each record of the first entity can correspond to several records from the second entity. However, each entry of the second entity corresponds to only one entry from the first entity. For example, there are two entities: Order and Order Item. And there can be many items in one order.
many-to-many
Each record of the first entity can correspond to several records from the second entity. However, each record of the second entity can correspond to several records from the first entity. For example, there are two entities: Author and Book. One author can write many books. But a book can have multiple authors.
According to the criterion of mandatory relations are divided into mandatory and optional.
- A mandatory relationship means that for each entry from the first entity, there must be related entries in the second entity.
- An optional relationship means that a record from the first entity may not have a record in the second entity.
Normalization
Normalization is the process of removing redundant data from a database. Each data element must be stored in the database in one and only one instance. There are five common forms of normalization. As a rule, the database is reduced to the third normal form.
During the normalization process, certain actions are performed to remove redundant data. Normalization improves performance, speeds up sorting and index building, reduces the number of indexes per entity, and speeds up insert and update operations.
A normalized database is usually more flexible. When modifying queries or persisted data, a normalized database typically requires fewer changes, and changes have fewer consequences.
First normal form
To convert an entity to first normal form, you must eliminate duplicate groups of values and ensure that each attribute contains only one value, lists of values are not allowed.
In other words, each attribute in an entity should only be stored in one instance.
For example, in the figure, the House entity is not normalized. It contains several attributes for storing data about the owners of the house (the House entity does not correspond to the first normal form).
To bring the House entity to the first normal form, it is necessary to remove the repeated groups of values, that is, remove the Owner 1-3 attributes, placing them in a separate entity. Result (Entity House reduced to first normal form):
Second normal form
A table in second normal form contains only the data that applies to it. Values of non-key entity attributes depend on the primary key. More precisely, attributes depend on the primary key, on the entire primary key, and only on the primary key.
Entities must be in first normal form to conform to second normal form.
For example, the entity House in the figure has an attribute Price per liter of gasoline, which has nothing to do with houses. This attribute is removed (or you can move it to another entity). And also we move the Mayor attribute to a separate entity - this attribute depends on the city where the house is located, and not on the house.
The figure shows the essence House in the second normal form (the Essence House reduced to the second normal form).
third normal form
Third normal form excludes attributes that do not depend on the entire key. Any entity that is in third normal form is also in second normal form. This is the most common form of a database.
In third normal form, every attribute depends on the key, on the whole key, and on nothing but the key.
For example, the House Owner entity in the figure has a Zodiac sign attribute that depends on the date of birth of the owner of the house, and not on his name (which is the key).
To cast the entity Owner of the house, you need to create the entity Signs of the Zodiac and transfer the attribute Sign of the Zodiac there (Entity Owner of the house, reduced to the third normal form):
Restrictions
Constraints are the rules enforced by the database management system. Constraints define the set of values that can be entered in a column or columns.
For example, you do not want the order amount in your very cool store to be less than 500 rubles. You simply set a limit on the Order Amount column.
Stored procedures
Stored procedures are precompiled procedures stored in a database. Stored procedures can be used to define business rules and can be used to perform more complex calculations than constraints alone.
Stored procedures can contain program flow logic as well as database queries. They can take parameters and return results as tables or single values.
Stored procedures are just like regular procedures or functions in any program.
NOTE
Stored procedures reside in the database and run on the database server. They are generally faster than SQL statements because they are stored in compiled form.
Data integrity
By organizing the data into tables and defining the relationships between them, we can assume that a model has been created that correctly reflects the business environment. Now we need to ensure that the data entered into the database gives a correct idea of the state of the matter. In other words, you need to enforce business rules and maintain the integrity of the database.
For example, your company is engaged in the delivery of books. You are unlikely to accept an order from an unknown client, because then you will not even be able to deliver the order. Hence the business rule: orders are accepted only from customers whose information is in the database.
The correctness of data in relational databases is ensured by a set of rules. Data integrity rules fall into four categories.
- Entity Integrity- each entity record must have a unique identifier and contain data. After all, you need to somehow distinguish between all these records in the database.
- Attribute Integrity- each attribute accepts only valid values. For example, the purchase amount can definitely not be less than zero.
- Referential Integrity- a set of rules that ensure the logical consistency of primary and foreign keys when inserting, updating and deleting records. Referential integrity ensures that for every foreign key there is a corresponding primary key. Let's take the previous example with the entities Home Owner and Home. Let's say you are Vasya Ivanov and own a house. You changed your last name to Sidorov and made the appropriate changes to the House owner entity. Definitely you would like your house to continue to be yours under your new name, and not belong to a certain Vasya Ivanov, who no longer exists.
- Custom Integrity Rules- any integrity rules that do not belong to any of the listed categories.
triggers
Trigger is an analogue of a stored procedure, which is called automatically when the data in the table changes.
Triggers are a powerful mechanism for maintaining database integrity. Triggers are called before or after data changes in the table.
With the help of triggers, you can not only undo these changes, but also change the data in any other table.
For example, you are creating an Internet forum and you want to make sure that the forum list shows the latest forum post. Of course, you can take a message from the Forum Posts entity, but this will increase the complexity of your request and its execution time. It's easier to add a trigger to the Forum Posts entity that records the last post added to the Forums entity, in the Last Post attribute. This will greatly simplify the query.
Business rules
Business rules define the restrictions placed on the data according to the requirements of the business (those for whom you are creating the base). Business rules may consist of a set of steps required to complete a particular task, or they may simply be checks that verify that the data entered is correct. Business rules may include data integrity rules. Unlike other rules, they the main objective- ensure the correct conduct of business operations.
For example, in the Very Tough Guys company, it may be customary that only white, blue, and black cars are purchased for official use.
The business rule for the Vehicle Color attribute of the Company Vehicles entity would then be that the vehicle can only be white, blue, or black.
Most DBMSs provide the means to:
- to specify default values;
- to check the data before entering it into the database;
- to maintain relationships between tables;
- to ensure the uniqueness of values;
- for storing stored procedures directly in the database.
All of these features can be used to implement business rules in a database.
Physical model
The next step, after creating the logical model, is to build the physical model. The physical model is the practical implementation of the database. The physical model defines all the objects that you have to implement.
When moving from a logical model to a physical entity, they are converted to tables, and attributes to columns.
Relationships between entities can be converted to tables or left as foreign keys.
Primary keys are converted to primary key constraints. Possible keys are in uniqueness constraints.
Denormalization
Denormalization- this is a deliberate change in the structure of the base that violates the rules of normal forms. This is usually done to improve database performance.
Theoretically, one should always strive for a fully normalized base, but in practice, a fully normalized base almost always means a performance drop. Over-normalizing a database can result in multiple tables being accessed each time data is retrieved. Typically, four tables or fewer must participate in a query.
Standard denormalization techniques are: combining several tables into one, storing the same attributes in several tables, and storing summary or calculated data in a table.
The term "relational" means "relationship-based". A relational database consists of entities (tables) that have some relationship with each other. The name comes from the English word relation.
Database design consists of two main phases: logical and physical modeling.
During logical modeling, you collect requirements and develop a database model that is independent of a particular DBMS (relational database management system). It's like creating blueprints for your house. You could think over and draw everything: where the kitchen, bedrooms, living room will be. But this is all on paper and in layouts.
During physical modeling, you create a model that is optimized for a specific application and DBMS. It is this model that is implemented in practice. If we return to the house from the previous paragraph, at this stage you will have to build a house somewhere - carry logs, bricks ...
The database design process consists of the following steps:
- collection of information;
- definition of entities;
- defining attributes for each entity;
- defining relationships between entities;
- normalization;
- transformation to a physical model;
- database creation.
The first 5 stages form the logical design phase and the remaining two form the physical modeling phase.
Logic phase
The logical phase consists of several stages. They are all discussed below.
Gathering Requirements
At this stage, you need to determine exactly how the database will be used and what information will be stored in it. Gather as much information as possible about what the system should and shouldn't do.
Entity definition
At this stage, you need to define the entities that the database will consist of.
An entity is an object in a database that stores data. An entity can be something real (a house, a person, an object, a place) or an abstract thing (a banking transaction, a department of a company, a bus route). In the physical model, an entity is called a table.
Entities are made up of attributes (columns in a table) and records (rows in a table).
Typically, databases are made up of several primary entities associated with a large number of subordinate entities. Core entities are called independent: they do not depend on any other entity. Subordinate entities are called dependent: in order for one of them to exist, the main table associated with it must exist.
In diagrams, entities are usually represented as rectangles. The name of the entity is indicated inside the rectangle:
Any table has the following characteristics:
- there are no identical lines in it;
- all columns (attributes) in the table must have different names;
- elements within the same column have the same type (string, number, date);
- the order of the rows in the table can be arbitrary.
At this stage, you need to identify all categories of information (entities) that will be stored in the database.
Attribute Definition
An attribute represents a property that describes an entity. Attributes are often a number, date, or text. All data stored in an attribute must be of the same type and have the same properties.
In the physical model, attributes are called columns.
After defining the entities, it is necessary to define all the attributes of these entities.
In diagrams, attributes are usually listed within the entity rectangle. In the figure you will find an example of the "Houses" database, only now some attributes are defined for the entities from this database.
Each attribute defines the data type, size, allowed values, and any other rules. These include mandatory, mutable, and uniqueness rules.
The mandatory rule determines whether an attribute is a required part of an entity. If the attribute is an optional part of the entity, then it can be NULL, otherwise not.
You must also determine if the attribute is mutable. Some attribute values cannot change after the entry is created.
And finally, you need to determine if the attribute is unique. If so, then the attribute values cannot be repeated.
Keys
A key is a set of attributes that uniquely identifies an entry. Keys are divided into two classes: simple and compound.
A simple key consists of only one attribute. For example, in the "Passports of the country's citizens" database, the passport number will be a simple key: after all, there are no two passports with the same number.
A composite key consists of several attributes. In the same database "Passports of citizens of the country" there can be a composite key with the following attributes:
surname, name, patronymic, date of birth. This is just an example, since this composite key, in theory, does not provide guaranteed uniqueness of the record.
There are also several types of keys, which are described below.
Possible key
A candidate key is any set of attributes that uniquely identifies an entry in a table. The candidate key can be simple or compound.
Each entity must have at least one possible key, although there may be more than one possible key. None of the primary key attributes can have a NULL value.
A candidate key is also called a surrogate key.
Primary Keys
A primary key is a set of attributes that uniquely identify a record in a table (entity). One of the possible keys becomes the primary key. In diagrams, primary keys are often shown above the main list of attributes or are highlighted with special characters. The entity in the figure has both key and regular attributes.
Alternative Keys
Any possible key that is not the primary key is called an alternate key. An entity can have multiple alternate keys.
Foreign keys
A foreign key is a collection of attributes that refer to the primary or alternate key of another entity. If the foreign key is not associated with the primary entity, then it can only contain null values. If the key is also composite, then all attributes of the foreign key must be undefined.
In diagrams, attributes that are combined into foreign keys are denoted by special symbols. The figure shows two related entities (Houses and their Owners) and the foreign keys formed by them (after all, one person can own more than one house).
Keys are logical constructs, not physical objects. Relational databases have mechanisms to store keys.
Defining Relationships Between Entities
Relational databases allow you to combine information belonging to different entities.
A relationship is a situation in which one entity refers to the primary key of a second entity. Like, for example, the entities House and Master in the previous figure.
Relationships are defined during the base design process. To do this, you should analyze the entities and identify the logical relationships that exist between them.
The relationship type determines the number of entity records associated with another entity record. Relationships are divided into three main types, which are described below.
One to one
Each entry of the first entity corresponds to only one entry from the second entity. And each record of the second entity corresponds to only one record from the first entity. For example, there are two entities: People and Birth Certificates. And one person can only have one birth certificate.
One-to-many
Each record of the first entity can correspond to several records from the second entity. However, each entry of the second entity corresponds to only one entry from the first entity. For example, there are two entities: Order and Order Item. And there can be many items in one order.
many-to-many
Each record of the first entity can correspond to several records from the second entity. However, each record of the second entity can correspond to several records from the first entity. For example, there are two entities: Author and Book. One author can write many books. But a book can have multiple authors.
According to the criterion of mandatory relations are divided into mandatory and optional.
- A mandatory relationship means that for each entry from the first entity, there must be related entries in the second entity.
- An optional relationship means that a record from the first entity may not have a record in the second entity.
Normalization
Normalization is the process of removing redundant data from a database. Each data element must be stored in the database in one and only one instance. There are five common forms of normalization. As a rule, the database is reduced to the third normal form.
During the normalization process, certain actions are performed to remove redundant data. Normalization improves performance, speeds up sorting and index building, reduces the number of indexes per entity, and speeds up insert and update operations.
A normalized database is usually more flexible. When modifying queries or persisted data, a normalized database typically requires fewer changes, and changes have fewer consequences.
First normal form
To convert an entity to first normal form, you must eliminate duplicate groups of values and ensure that each attribute contains only one value, lists of values are not allowed.
In other words, each attribute in an entity should only be stored in one instance.
For example, in the figure, the House entity is not normalized. It contains several attributes for storing data about the owners of the house (the House entity does not correspond to the first normal form).
To bring the House entity to the first normal form, it is necessary to remove the repeated groups of values, that is, remove the Owner 1-3 attributes, placing them in a separate entity. Result (Entity House reduced to first normal form):
Second normal form
A table in second normal form contains only the data that applies to it. Values of non-key entity attributes depend on the primary key. More precisely, attributes depend on the primary key, on the entire primary key, and only on the primary key.
Entities must be in first normal form to conform to second normal form.
For example, the entity House in the figure has an attribute Price per liter of gasoline, which has nothing to do with houses. This attribute is removed (or you can move it to another entity). And also we move the Mayor attribute to a separate entity - this attribute depends on the city where the house is located, and not on the house.
The figure shows the essence House in the second normal form (the Essence House reduced to the second normal form).
third normal form
Third normal form excludes attributes that do not depend on the entire key. Any entity that is in third normal form is also in second normal form. This is the most common form of a database.
In third normal form, every attribute depends on the key, on the whole key, and on nothing but the key.
For example, the House Owner entity in the figure has a Zodiac sign attribute that depends on the date of birth of the owner of the house, and not on his name (which is the key).
To cast the entity Owner of the house, you need to create the entity Signs of the Zodiac and transfer the attribute Sign of the Zodiac there (Entity Owner of the house, reduced to the third normal form):
Restrictions
Constraints are the rules enforced by the database management system. Constraints define the set of values that can be entered in a column or columns.
For example, you do not want the order amount in your very cool store to be less than 500 rubles. You simply set a limit on the Order Amount column.
Stored procedures
Stored procedures are precompiled procedures stored in a database. Stored procedures can be used to define business rules and can be used to perform more complex calculations than constraints alone.
Stored procedures can contain program flow logic as well as database queries. They can take parameters and return results as tables or single values.
Stored procedures are just like regular procedures or functions in any program.
NOTE
Stored procedures reside in the database and run on the database server. They are generally faster than SQL statements because they are stored in compiled form.
Data integrity
By organizing the data into tables and defining the relationships between them, we can assume that a model has been created that correctly reflects the business environment. Now we need to ensure that the data entered into the database gives a correct idea of the state of the matter. In other words, you need to enforce business rules and maintain the integrity of the database.
For example, your company is engaged in the delivery of books. You are unlikely to accept an order from an unknown client, because then you will not even be able to deliver the order. Hence the business rule: orders are accepted only from customers whose information is in the database.
The correctness of data in relational databases is ensured by a set of rules. Data integrity rules fall into four categories.
- Entity Integrity- each entity record must have a unique identifier and contain data. After all, you need to somehow distinguish between all these records in the database.
- Attribute Integrity- each attribute accepts only valid values. For example, the purchase amount can definitely not be less than zero.
- Referential Integrity- a set of rules that ensure the logical consistency of primary and foreign keys when inserting, updating and deleting records. Referential integrity ensures that for every foreign key there is a corresponding primary key. Let's take the previous example with the entities Home Owner and Home. Let's say you are Vasya Ivanov and own a house. You changed your last name to Sidorov and made the appropriate changes to the House owner entity. Definitely you would like your house to continue to be yours under your new name, and not belong to a certain Vasya Ivanov, who no longer exists.
- Custom Integrity Rules- any integrity rules that do not belong to any of the listed categories.
triggers
Trigger is an analogue of a stored procedure, which is called automatically when the data in the table changes.
Triggers are a powerful mechanism for maintaining database integrity. Triggers are called before or after data changes in the table.
With the help of triggers, you can not only undo these changes, but also change the data in any other table.
For example, you are creating an Internet forum and you want to make sure that the forum list shows the latest forum post. Of course, you can take a message from the Forum Posts entity, but this will increase the complexity of your request and its execution time. It's easier to add a trigger to the Forum Posts entity that records the last post added to the Forums entity, in the Last Post attribute. This will greatly simplify the query.
Business rules
Business rules define the restrictions placed on the data according to the requirements of the business (those for whom you are creating the base). Business rules may consist of a set of steps required to complete a particular task, or they may simply be checks that verify that the data entered is correct. Business rules may include data integrity rules. Unlike other rules, their main purpose is to ensure that business transactions are conducted correctly.
For example, in the Very Tough Guys company, it may be customary that only white, blue, and black cars are purchased for official use.
The business rule for the Vehicle Color attribute of the Company Vehicles entity would then be that the vehicle can only be white, blue, or black.
Most DBMSs provide the means to:
- to specify default values;
- to check the data before entering it into the database;
- to maintain relationships between tables;
- to ensure the uniqueness of values;
- for storing stored procedures directly in the database.
All of these features can be used to implement business rules in a database.
Physical model
The next step, after creating the logical model, is to build the physical model. The physical model is the practical implementation of the database. The physical model defines all the objects that you have to implement.
When moving from a logical model to a physical entity, they are converted to tables, and attributes to columns.
Relationships between entities can be converted to tables or left as foreign keys.
Primary keys are converted to primary key constraints. Possible keys are in uniqueness constraints.
Denormalization
Denormalization- this is a deliberate change in the structure of the base that violates the rules of normal forms. This is usually done to improve database performance.
Theoretically, one should always strive for a fully normalized base, but in practice, a fully normalized base almost always means a performance drop. Over-normalizing a database can result in multiple tables being accessed each time data is retrieved. Typically, four tables or fewer must participate in a query.
Standard denormalization techniques are: combining several tables into one, storing the same attributes in several tables, and storing summary or calculated data in a table.
Zaitsev S.L., Ph.D.
Part 1. The concept of essence
In this article, we will describe entities and entity keys in detail. As you know, entities are concepts about which information should be stored for further processing. AT ERwin entities are a graphical representation of a logical grouping of data. Entities can be tangible, real objects, or intangible conceptual abstractions. Entities are not meant to represent a single object. Rather, they represent classes that include attributes that contain information about multiple instances.
The following questions regarding entities will be discussed below:
- Entity Relational Diagrams
- Entity highlighting
- Defining Entity Types
- Naming and describing entities
- Common mistakes when working with entities
Since ERwin uses the data modeling methodology ER(Entity Relational), let's start with a brief introduction to ER concepts. To begin with, let's start studying entities - "containers" for storing information of a logical model.
Introduction to Entity Relational Diagram
In this and other publications on this topic, an ERD diagram (Entity Relational Diagram) is used to visually represent entities and relationships between them, based on the notation used by ERwin. Although there are other data modeling methodologies such as Extended Relational Analysis (ERA), Object Oriented (OO) and Object Role Modeling (ORM), the fundamental concepts of ER methodology are present and in them.
The ER modeling methodology was developed by P. Chen in the late 1970s. Rectangles are used to represent entities in the ER methodology. In Chen's original ER notation, relationships contain attributes. The equal possibility of using attributes in entities and relationships makes the distinction between entities and relationships quite difficult.
The ER approach has evolved and expanded over time, but the basic concepts continue to provide a solid foundation for good data modeling.
The following is a detailed description of the entity and provides an introduction to the keys, with particular emphasis on finding the entity's primary keys. A description of entity types is also provided, and recommendations are given for naming and describing entities. The last section is devoted to the analysis common mistakes associated with entities and keys.
What is an entity?
Essence is the physical representation of a logical grouping of data. Entities can be tangible, real objects like PERSON or ICE CREAM, or intangible conceptual abstractions like COST CENTER or MARKET. Entities are not meant to represent a single entity, they represent a collection of instances containing information of interest in terms of their uniqueness. For example, the entity PERSON is an instance of objects of type Person. Ivan Petrov, Maria Rusanova and Savely Bogdanov - concrete examples instances of the PERSON entity. A particular entity instance is represented by a table row and is identified by a primary key.
An entity has the following attributes:
- It has a name and a description.
- It represents a class, not a single instance of an abstraction.
- Its specific representatives (instances) can be uniquely identified.
- It contains a logical grouping of attributes that represent information of interest to a corporation.
Formal definitions of an entity
The following is a list of entity definitions of recognized data modeling authorities. Pay attention to their similarities:
- Chen (1976): "A thing that can be uniquely identified."
- Date (1986): "Any distinguishable object that will be represented in a database."
- Finklestein (1989): "An information entity represents a 'thing' that is stored for future reference. The term entity refers to a logical representation of data."
Entity highlighting
How to start the process of extracting entities? Most entities are revealed during work sessions and interviews. Analysis of the requirements for information received from subject matter experts and end users is the best source of information.
Another good source is the corporate model.
Pay attention to nouns and object names - it is quite possible that they will become logical entities. Avoid representing single instances as entities, as is often the case when entities are modeled in terms of a role. Modeling entities in terms of a role is a fairly common mistake. Entities also appear in the normalization process. Bringing the logical model to the third normal form will most likely lead to the appearance of several additional entities.
There are two main groups of entities: dependent and independent. An independent entity does not need information from another entity to identify a unique instance. She appears in ERwin in the form of a rectangle. The primary key of an independent entity does not include the primary keys of other entities.
A dependent entity must draw on information from another entity to identify a unique instance. It is represented on the ER diagram as a rectangle with rounded corners. The primary key of a dependent entity includes the primary keys of one or more parent entities.
Rice. 2.1. Examples of core entities for an ice cream corporation.
Pay attention to fig. 2.1., where the right angles of the independent entities SHOP and ICE CREAM and the rounded corners of the dependent entity ICE CREAM SHOP are shown.
Defining Entity Types
Both dependent and independent entities can be divided into several types:
- Core entities - These are sometimes referred to as core or primary entities. They represent important objects about which information should be stored.
- Codes/References/Classifiers - These entities contain strings that define a value set or scope for an attribute.
- Associative entities - these entities are used to resolve relationships many-to-many.
- Characteristic entities - these entities are of two types: exclusive and inclusive.
Core entities
Core entities represent the most important corporate information objects. They are sometimes referred to as primary, main, or main entities. Since these entities are extremely important, they are likely to be used in many departments of the corporation. Take the time to look for similar entities, as core entities are more likely to be reusable. Within a corporation, core entities should be modeled uniformly. Good modellers find this approach extremely useful.
Core entities can be both independent and dependent. Figure 2.1 shows examples of core entities for an ice cream corporation. The ICE CREAM entity represents the base product of a corporation. The SHOP entity is an example of a distribution channel or an intermediary in the sale of a product.
Assume that the corporation is doing well and a decision is made to open an additional SHOP. There is no need to change the model to add new instances of the SHOP entity. The same applies to the essence of ICE CREAM.
Pay attention to the core entities ICE CREAM and SHOP. Although the example may seem somewhat straightforward, it illustrates the power of the concept behind the core entity modeling.
Understanding the need to model core entities as scalable and extensible containers requires the modeler to look at entities as abstract concepts and model information regardless of its current use. In this example, the ICE CREAM entity model is completely out of the context of the SHOP entity, and vice versa. So if a corporation decides to sell ICE CREAM through a new distribution channel, such as Internet or home delivery, the new distribution channel can be added without changing the other entities.
Code entities
Code entities are always independent. They are often referred to as references, classifiers, or entity types, depending on the methodology used. The unique instances represented by code entities define the scope for attribute values that belong to other entities. Relations between code entities and other entities will be discussed in one of the following publications on this topic. You may be tempted to use a single attribute in a codesheet. It is much better to include at least three attributes in a code entity: identifier, name (sometimes called short name) and definition.
Figure 2.2 TOP- independent entity (note the right angles). The TOP is also a code entity or classifier. Instances (strings) of the TOP entity define the list of available tops.
Code entities usually contain a limited number of attributes. There are implementations where these entities had only one attribute. It is preferable to model code entities using an artificial identifier. The artificial identifier, along with the name and definition, allows new kinds of TOPs to be added as instances (strings) to an entity. Notice the three attributes of the TOP entity.
Experts often refer to code entities as corporate business objects. The term enterprise business object indicates that the entity is defined and shared at the corporate level, and not at the level of a single application, system, or organizational unit. These entities are often shared among many databases to provide a holistic approach to summary reporting and trending.
Rice. 2.2. Code entities allow a corporation to define a set of values to be used centrally within a corporation. Code entity instances define the scope of values for use in other parts of the model.
Associative Entities
Associative are entities that contain the primary keys of two or more other entities. Associative entities are always dependent. They are used to resolve many-to-many relationships of other entities. Many-to-many relationships occur when many instances of one entity are related to many instances of another. Associative entities allow us to model the intersection of two entity instances, ensuring that each instance of an association is unique.
In Figure 2.1, an associative entity is used to resolve a relationship many-to-many between the entities SHOP and ICE CREAM. The introduction of an associative entity makes it possible to use the same ICE CREAM for sale in several copies of the SHOP, without the need to sell the same varieties of ICE CREAM in each of the SHOPs. The ICE CREAM SHOP associative entity takes into account the fact that the SHOP instance sells many ICE CREAM instances, and the ICE CREAM instance can be sold by many SHOP instances.
Characteristic entities
Characteristic entities are always dependent. You should use characteristic entities where it makes sense for entity instances to store different sets of attributes. Finklestein calls characteristic entities secondary entities. Characteristic entities always have one or more "peer" entities. Peer characteristic entities are associated with the parent entity by a special type of relationship, which can be exclusive or inclusive.
Figure 2.3 shows the entity CONTAINER and the characteristic entities HORN and GLASS. The ice cream shop, apparently, sells not by weight, but by individual portions. Note that the instance of the CONTAINER must be a CONE or a CUP. A CONTAINER cannot be both a HORN and a GLASS at the same time. These are exclusive characteristic entities.
The PERSON entity in Figure 2.3 has two characteristic entities EMPLOYEE and CLIENT. Note that exclusive characteristic entities will not allow a single instance of PERSON to contain facts that are common to EMPLOYEE and CUSTOMER. Naturally, this is contrary to actual practice. An EMPLOYEE can definitely be a CUSTOMER. The SUPPLIER can also act as a CUSTOMER. This is an example of inclusive characteristic entities.
Rice. 2.3. Two examples of characteristic entities PERSON and CONTAINER. Both examples use ERwin IE notation to represent exclusive and inclusive characteristic entities. The absence of (X) in the characteristic entity symbol indicates an inclusive relationship.
Structural entity
Sometimes instances of the same entity are related. In his 1992 book "Strategic Systems Development" K. Finklestein suggested using structural entities to represent relationships between instances of the same entity. Relationships between instances of the same entity are called recursive relationships. Recursive relations will be discussed in the article "The concept of relations". Recursive relationships are a logical concept and the concepts are not easily understood by users.
Figure 2.4 shows an additional structural entity that describes the relationship between instances of the EMPLOYEE entity. The diagram shows that the characteristic entity EMPLOYEE of the entity PERSON has two characteristic entities EXECUTOR and MANAGER. The EMPLOYEE STRUCTURE entity represents a relationship between instances of the EMPLOYEE entity.
Rice. 2.4. Structural entity - an illustration of K. Finklestein's approach to the resolution of recursive relations.
Primary key definition
To identify a particular entity instance, you need to define a primary key. primary key is an attribute or set of attributes that uniquely identifies a single instance of an entity. In other words, the primary key can be either one attribute or consist of several. A primary key that consists of more than one attribute is called a composite or component key. In what follows, we will use the term composite key.
The primary key must be static and non-volatile. Static and non-destructible means that the primary key should not be changed. Changes to the primary key are difficult to maintain, often resulting in very costly rework, so the best option is when the primary key is completely independent of entity instances.
Finding the primary key requires parsing the data that defines the entity. Typically, primary keys for core entities are determined during work sessions and discussions. Domain experts and users are good sources of information for selecting potential primary keys. Sample data also provides valuable input when choosing a primary key.
Begin the process of identifying primary keys by identifying all potentially key attributes, called key candidates. A key candidate can be a single attribute or a combination of several attributes. If no key candidates exist, or if the candidate key is a composite key that is too large and unwieldy, consider using an artificial unique identifier. Keys borrowed from a parent entity are called foreign keys. Foreign keys will be discussed in one of the subsequent publications on this topic. Below is the description various types keys:
- Key Candidate. A key candidate is an attribute or set of attributes that identifies a single instance of an entity. Sometimes a single instance of an entity is identified by several attributes or a combination of them.
- Composite key. A key that consists of more than one attribute is called a composite, complex, or component key. For composite keys, each component of the key must have a value for each instance. No part of the key must be NULL. All parts of the key are required and cannot be omitted.
- And an artificial primary key. Sometimes neither a single attribute nor a combination of attributes defines an instance. In these cases, you are using an artificial unique identifier. Artificial primary keys often simply number each instance or code.
- foreign keys. When the primary key of one entity migrates to another table, it is called a foreign key. Foreign keys "link" entities by representing relationships between them. Foreign keys will be discussed in more detail in future articles on this topic.
Bringing the model to the third normal form includes checking for the absence of functional dependencies and identifying primary or composite keys. The functional dependencies discussed in the article play an important role in identifying primary keys and key candidates.
Entity naming
The name assigned to an entity must characterize instances of the entity. The name should be clear and generally accepted. Use a corporate point of view when choosing a name, and try to use names that reflect how data is used within a corporation rather than in a separate business unit. Use names that make sense to the user community and domain experts.
You probably have a set of naming conventions in your corporation that you use during development or when building an enterprise data model that guides you. The use of conventions ensures that names are constructed uniformly within a corporation, regardless of who constructs the name. The following sections provide an initial set of naming conventions and give examples of good and bad naming conventions.
Entity naming conventions
Naming conventions may seem irrelevant if you work in a small organization with few users. However, in a large organization with multiple development teams and a large number of users, naming conventions greatly help with communication and data sharing. Ideally, you should centrally develop and maintain naming conventions, and then document them by publishing them to the entire corporation.
Here are some guidelines for building an initial set of naming conventions, in case your organization doesn't have one yet:
- The name of the entity must be descriptive enough. Use single-word names only when they are the name of widely used concepts. Consider using noun phrases.
- The entity name must be a noun or a phrase based on a singular noun. Use PERSON instead of PERSON or PEOPLE and CONTAINER instead of CONTAINERS.
- The entity name must be unique. Using the same name for entities that contain different data, or different names for entities that contain the same data, will unnecessarily confuse developers and end users.
- The entity name must indicate the data that will be stored in each of the instances.
- The entity name must not contain special characters (such as !, @, #, $, %, &, *, etc.) or indicate ownership (PERSONAL ICE CREAM).
- The entity name must not contain acronyms or abbreviations unless they are part of accepted naming conventions.
Examples of Good Entity Names
It is always better to use consistent names within a corporation. Table 2.1 gives examples of good and bad names for entities.
TABLE 2.1 Example entity names with explanations
Good name |
Bad name |
Explanation |
MATHEMATICAL FORMULA | FORMULA | FORMULA - too vague, adding the adjective MATHEMATICAL makes the meaning much clearer. |
BOOK | BOOKS | BOOK is a singular noun. |
SOFA | SOFA COUCH |
SOFA and COUCH have the same meaning. Pick one. |
ICE CREAM | SOME ICE CREAM | The pronoun SOMETHING does not add any additional meaning or meaning to the term. Avoid unnecessary additions. |
PHOTOS | IMAGE | PHOTOS - quite definitely. IMAGE - somewhat blurry. |
EXPECTED ARRIVAL TIME | ORP | The abbreviation OVP may be confusing to users. |
COMPANY | COMPANY XYZ | XYZ is a specific company instance and must be a string in the COMPANY entity. |
Description of entities
Even good names that tell the user what information to expect from an entity are usually not enough. Every entity needs a clear, precise, and complete description or definition in order to be unambiguously interpreted within a corporation. The description of the entity should explain the meaning of the entity and its meaning for the corporation.
Although description, definition, and purpose are often used interchangeably, the term description is preferred because it encourages us to describe entities in terms that the user can understand.
Rules for Forming Good Descriptions
The description of an entity should explain its meaning, not how the entity's information will be used. You must collect entity descriptions during entity identification. Be careful when including usage information: such information should only be used as an example or for clarification. The way information is used changes more frequently than the information itself, so usage information is not constant.
The description of the entity must be clear, precise, complete and consistent. It should be formulated without the use of technical terms, understandable to anyone who is at least slightly familiar with the concept being described. Make sure the description is in business terms and includes an explanation of the value of the entity.
Examples of good descriptions
Table 2.2 does not claim to be complete, but serves to show good descriptions and the reasons why bad descriptions do not meet the main points.
TABLE 2.2. Entity descriptions with explanations
good description |
Bad description |
Explanation |
PERSON contains information about individuals who enter into interaction with a corporation. Information o PERSON assists the corporation in planning, product development and promotional activities. |
client or employee. | A good description includes the definition of the entity and its meaning to the corporation. |
Includes name, date of birth, etc. for a person. | Simply enumerating the attributes of an entity does not additional information about what an entity is and why it is important to the corporation. | |
Client information and employees. |
Client and employee are examples of roles that a PERSON can play. Using examples alone does not explain what an entity is or why it is important to a corporation. | |
Entity contains characters and numeric data extracted from POS (Point Of Sale - trading terminal), stored using standard compression and packed decimal numbers. |
This artificial example is intended to illustrate that technical descriptions and abbreviations are difficult for business users to understand. |
Common Mistakes in Entity Modeling and Key Selection
This section on common modeling errors does not claim to be complete. Its purpose is to point out the most common mistakes that modelers make.
Role Modeling
What is meant by role modeling? During work sessions, users may tell you that they need to store information about employees. There is a temptation to create the entity EMPLOYEE. Closer examination of information of interest to a corporation, such as name, address, and social security number, shows that these values are independent of the EMPLOYEE entity. For a particular EMPLOYEE, the value of the NAME attribute is independent of the EMPLOYEE entity. This is easy to understand when you consider that your name is still your name whether you are an EMPLOYEE or not.
Entity overloading
Entities are overloaded if they contain information about more than one conceptual entity. If some attributes of an entity describe the same concept, those entities should be checked. Overloaded entities do not have values for every attribute.
Sometimes experts from different subject areas in a corporation use an entity name that sounds and is spelled the same but has different meanings for different experts. The only way to make sure that the same names describe the same objects is to check the descriptions. Make sure the entity contains data that describes a single concept.
For example, the EQUIPMENT entity can have completely different meanings for departments information technologies and for the media and communications department.
Redundant entities
Redundant entities are entities that have different names but contain information about similar concepts. The English language includes many words to represent the same things. One way to discover such entities is to look for entities that contain similar attributes. Compare the descriptions of each of these entities to determine if they represent similar concepts. Redundant entities often appear as a result of the trend to model roles as entities.
For example, the entities MANAGER and EMPLOYEE may contain similar information because both are roles that an instance of the PERSON entity can play.
Selecting the wrong primary key
Choosing the wrong primary key means that you have chosen a primary key that does not stand up to testing. Common mistakes related to primary key are:
- Non-unique: The primary key is not unique for each of the instances. For example, the modeler might consider that the social security number is unique for each PERSON. However, a Social Security number can be reused if the original owner has passed away.
- Required value/uncertainty: The primary key has no value for some of the instances. For example, not every instance of a PERSON entity will have a social security number. Foreigners and small children are the two categories of people who will be missing it.
Using Bad Entity Names
Obscure, ambiguous, or imprecise names make it difficult for new users and development teams to reuse or extend an existing model.
Do not use abbreviations or acronyms as part of the name. Abbreviations and acronyms are open to misinterpretation and may even have different meanings in different subject areas.
Don't include the location as part of the name. As a rule, you will inevitably need another location. A name with a location indicates that you are modeling a specific instance instead of an entity class.
Using Bad Entity Descriptions
Do not use descriptions borrowed only from a dictionary. Dictionary descriptions will not include business-relevant information.
Don't try to paraphrase the entity name. Do not use the name of an entity in its description.
Vague, vague, or worse, incomplete descriptions make it difficult to reuse and extend an existing model. The user will not be able to check whether the entity contains all the necessary information.
This significantly increases the risk of overloaded entities and their use to store information about different objects.
Concepts that seem obvious to all participants in work sessions may not be so obvious over time when a new development team is tasked with extending an existing model.
Conclusion
Entities are objects about which information should be accumulated and maintained. They are "containers" for organizing and grouping business facts. The most important entities are usually identified and documented during work sessions or interviews, and as a result of the normalization process.
Entities are divided into two main groups: dependent and independent. Dependent entities require information from other entities to uniquely identify an instance, independent entities do not. Within the two main groups of entities, more specialized types are distinguished, with features to support specific types of relationships between master and subordinate entities.
Each entity must include one or more sets of attributes that are candidate keys. Key candidates uniquely identify specific entity instances. Key candidates may consist of a single attribute or a group of attributes. If key candidates do not exist or are difficult to maintain, you may need to create an artificial primary key. Analysis and research play an important role in determining primary keys that will remain unique and reliable over time.
Entities need good names and descriptions. Naming standards and conventions provide a holistic approach to developing names and descriptions. The characteristics of an entity are determined by the attributes it contains. Entity attributes represent facts about the entity that the corporation is interested in accumulating and maintaining.
The next article in this series will describe the process of identifying attributes and their characteristics, defining key and non-key attributes, scopes and optional data, and formulating conventions for forming good attribute names and descriptions.
Part 2. The concept of an attribute
The article "Basic Data Modeling Concepts" introduced the basic concepts related to data modeling. In the article The main components of the ERwin diagram - entities, attributes, relationships. Part 1. The concept of an entity was given initial information about entities and entity keys. This article discusses attributes and describes normalization and keys in more detail.
In this article, you will learn how:
- Reveal Attributes
- Perform normalization during attribute analysis
- Naming and describing attributes, and learn about conventions about naming attributes
- Define attribute types and characteristics, such as scope and boolean data types, and validate keys in terms of attributes
- Avoid common mistakes when working with attributes
In ER diagrams, entities and relationships serve to group and combine attributes. It is the attributes that make up the essence of the model. So let's start studying the attributes - the facts that make up the information of the logical model.
What is an attribute?
Attribute is a logical representation of the facts about which the corporation is interested in keeping. Recall that in ERwin, entities serve to visually represent a logical grouping of attributes. On the other hand, attributes represent facts accumulated about entities in a logical model. Attributes are facts that serve to identify, categorize, numeric, or otherwise describe the state of an entity instance.
The attribute must represent a single concept. Attributes form logical groups that describe each instance of an entity. A particular attribute instance is meaning. For example, an attribute named Name defines the scope for facts about an entity named PERSON. Gabriel, RJ, Will, and Vanessa are examples of specific Name meanings for specific PERSON instances. The specific values for each of the entity's attributes represent a single instance.
A valid attribute model has the following characteristics:
- The value of the attribute is of interest to the corporation.
- There is only one attribute instance in the logical model.
- An attribute has a boolean data type and scope.
- The attribute value is defined as required or optional.
- An attribute has a name and a description.
- Only one value can be used per entity instance.
Ice Cream Trade Corporation Betty Wilson wants to order more of the most popular flavors and less of the least popular flavors. Betty's corporation is doing ice cream specials, and is interested in knowing what flavors of ice cream customers choose for banana dessert and fudge during the specials. To meet business requirements, you need to collect banana dessert and fudge ice cream flavor data and date.
In Figure 3.1, there are two entities BANANA DESSERT and CREAM FENDER. Each entity contains attributes representing the components of each of the dishes. Please note that for the BANANA DESSERT essence, you can choose three flavors, three tops: banana, whipped cream and cherries. For an instance of CREAM FENDER, you can choose two flavors and banana, whipped cream and cherries.
Rice. 3.1. Entities and attributes that represent (not very well) two main concepts: CREAM FENDER and BANANA DESSERT
Attribute detection
How to start the process of identifying attributes? Most attributes are identified during work sessions and interviews during entity definition. Analysis of information requirements from domain experts and end users is the best source of information for identifying attributes.
The corporate model is also an excellent basis for highlighting attributes. Compare the entities and attributes of the enterprise model with the entities and attributes of the new logical model. In the enterprise model, there are attributes that were previously defined for each of the entities, in particular for core entities. If the attribute is not present in the enterprise model, additional analysis will determine if it needs to be added or if it belongs to another entity.
Ordering attributes according to information requirements
The attributes of the logical model must strictly comply with the information requirements. Each of the attributes present in the model must serve to satisfy one or more information requirements. The model should contain only those attributes that are necessary to represent the facts of interest to the corporation within the subject area under consideration.
Every fact of interest from a corporation's point of view must be accurately and completely represented in the logical model. Information requirements serve as a measure of whether an attribute should be highlighted. It is helpful to document the relationships between attributes and information requirements.
Attribute Analysis
You should analyze each attribute to determine its relationship to all other attributes in the model. Correctly performed analysis ensures that each of the attributes is present in the model in a single instance and is placed in the entity in accordance with the third normal form.
It is particularly important to parse each primary key and each part of a composite primary key to verify that their values exist for each entity instance. You must also ensure that the primary key identifies one and only one entity instance.
The analysis can also determine whether the corporation is interested in accumulating and maintaining any information about the attribute itself. If an attribute is so important that additional attributes are required to store data about it, then you should think about the possibility of creating a new entity.
You must analyze each of the attributes of the logical model to make sure that each attribute is present in the model in a single instance and only one attribute value exists for each instance of the entity. You must place the attribute on the appropriate entity using normalization rules and define its characteristics.
Only one must remain
The attribute must be present in the logical model in a single instance. "One fact in one place" (Date, 1986). To ensure that each fact is represented by a single attribute, check for attributes with similar names or descriptions. In addition, you must determine whether the attributes are real instances or specific values that are erroneously represented in the model by different attributes.
Attributes with similar names and descriptions may actually represent the same concept and must be represented by the same attribute. In natural language, the same word can represent several concepts. But what's worse, in English language There can be several different words to represent the same concept.
Attributes that have "indicator" or "flag" as part of their name most likely represent a specific value from the scope of the attribute. The concrete value is an attribute instance. Using attribute instances in a model is a common mistake. For example, "Black Hair Indicator" has a value of "yes" if black hair is present and a value of "no" if black hair is not present. It would be preferable to use the "Hair color" attribute in the model, which can have the specific value "Black".
An attribute must represent only one business concept. It must not have multiple values for the same entity instance. Figure 3.1 shows two entities, BANANA DESSERT and CREAM FENDER. Both entities contain a multivalued attribute named "Start or End Date special offer". The attribute's name indicates that its value can represent the start date of the special offer or the end date of the special offer, and we have no way of distinguishing between them! This attribute must be divided into two, each of which will represent a single fact.
If we allow an attribute to have multiple values, this can lead to closely related "hidden" attributes. The previous example is fairly obvious. Not all multivalued attributes can be converted so easily. It may come as a surprise to you that in an attribute that contains a piece of text, such as a comment or note, there are many important values attribute.
Normalization: Putting an Attribute into the Corresponding Entity
Attributes determine the number of entities that will be present in the logical model, reduced to the third normal form. The normalization process consists in analyzing the dependence of attributes on each other and the dependence of attributes on the primary key.
Correctly performed normalization ensures that the model will be scalable and extensible by placing attributes on the appropriate entities.
Bringing the logical model to the third normal form often leads to the emergence of new entities.
Other benefits of normalization are:
- Eliminate or minimize redundancy. Redundant data may be present in attributes representing the same concept but differently named, or in repeating groups. Presenting each fact once in one place minimizes redundancy.
- Eliminate or minimize anomalies when inserting, deleting, or updating. Denormalized data structures allow the same fact to be present in multiple places with incomplete or partial dependence on the primary key. Insertion, deletion, or update anomalies are data inconsistencies that, under these conditions, can lead to surprises or inaccuracies in data access.
- Eliminate or minimize the use of null values for attributes. Duplicate attribute groups often contain null values for many instances because they represent the fact that some entities can have multiple values and others can't. The presence of entities containing instances with null values results in sparsely populated (sparse) data structures.
When the model is cast to third normal form, each attribute belongs to the corresponding entity. Bringing a model to third normal form often reveals new attributes and entities.
Functional dependency
A functional dependency is used to describe relationships between attributes in a model. Each entity attribute must be functionally dependent on the primary key of the entity (and not functionally dependent on any other model attribute). If it is not, the attribute must be moved to a new entity where this provision will be respected.
To determine the functional relationship between attributes, first group them into sets that share a common theme. Carefully analyze the topics in terms of their similarities. Check the attributes in the topics to determine if there is a functional dependency of the attributes within the topic. If an attribute, or group of attributes, does not depend on the entity's primary key, it must be moved to another entity.
Attributes belonging to the same topic may be redundant. Redundant attributes may be grouped into a single entity, or may use a higher level generic abstraction as characteristic entities of the parent entity. There are at least two common themes in Figure 3.1: and top. These attributes are good candidates for transferring to other entities. Let's consider them in the aspect of functional dependence. Attribute value flavor additive to ice cream does not depend on the value of the primary key - banana dessert ingredients. The same goes for the key. Creamy fudge.
Figure 3.2 illustrates a solution where Flavoring additive for ice cream and top highlighted in entities where their values depend on the primary key. This solution fixes some obvious redundancy issues.
First normal form
Casting to first normal form means moving all duplicate attributes to another entity. Duplicate attributes are fairly easy to spot, as they are often simply numbered like Top 1 and Top 2 or Taste 1 and Taste 2.
Create a dependent entity that will contain a set of attributes to represent duplicate attributes. The primary key of the dependent entity will be a composite primary key that will include the primary key of the parent entity and at least one additional attribute to ensure uniqueness.
In figure 3.2, repeated groups have been moved Flavoring additive for ice cream and top into dependent entities. Pay attention to the creation of the entity TASTE.
Rice. 3.2. Eliminate redundant attributes
Second normal form
Converting to second normal form means removing redundant attributes. Redundant attributes can be:
- Different attributes representing the same concept
- Attributes of different entities related to the same topic
- Attributes that do not have values for each of the entity instances
Attributes that represent the same concept must be converted to a single attribute. Redundant attributes may not have values for each of the entity instances, and thus their existence will not depend on the value of the primary key. Move these attributes to the entity, where they will have values for each of the instances.
Create an entity with attributes to represent redundant attributes. The new entity has a primary key that identifies a single instance. This primary key will become the foreign key in the original entity. Foreign keys will be discussed later.
Figure 3.2 shows the solution for some of the redundant attributes of the BANANA DESSERT and CREAM FANDY entities. Consider redundancy from the point of view of two core entities. Both entities have common themes: ice cream flavor and top. This is a sign that the core entities can be combined into more high level abstraction.
Figure 3.3 shows the creation of a supertype named MIX, of which BANANA DESSERT and CREAM FENDER are its implementations. I've added a "Blend Type" classifier attribute to the MIX parent entity to identify if MIX is an instance of the BANANA DESSERT or CREAM FANDY entity. An instance of the MIX entity can be an instance of the BANANA DESSERT or CREAM FANDY entity, but not both.
Rice. 3.3. The redundancy of core entities has been eliminated by moving the common attributes to the more general MIX entity. Note that the primary key "Mix ID" is placed in both the BANANA DESSERT and CREAM FENDER entities.
third normal form
Casting to third normal form means eliminating any attributes that depend on the values of attributes other than the primary key. This is sometimes called a transitive dependency.
Create a new entity and move attributes into it that are independent of the primary key in the original entity. Define a primary key for the new entity so that it guarantees uniqueness.
In Figure 3.3, the attributes Whipped cream and Cherry do not depend on the primary keys of the BANANA DESSERT and CREAM FANDY entities. In fact, you must decide if the attributes are not Whipped cream and Cherry instances of the TOP entity.
In Figure 3.4, notice the additional Blend Date attribute, which provides information about when the MIX entity was instantiated. I removed the Start Date and End Date attributes from the BANANA DESSERT and CREAM FANDY entities. The new SPECIAL OFFER entity now contains these two dates and an Ice Cream Flavor attribute to indicate which type of ice cream is eligible for the offer.
Rice. 3.4. Each attribute depends on the primary key, the full primary key, and nothing but the key .
Defining Attribute Characteristics
Attributes are divided into two groups. An attribute is either a key or it is not. Figure 3.5 shows the key attributes for the logical model of the MIX entity. Notice that, in essence, the primary key attributes are above the line within the entity, and the rest of the attributes are below the line.
Rice. 3.5. All attributes that are not part of the primary key are located in essence below the delimiter. These can be key candidates, foreign and alternate keys, and simple attributes.
Key Attributes
Key attributes are those whose values determine the values of other attributes. The values of key attributes do not depend on the values of any other attributes. A key may consist of a single attribute or be composed of multiple attributes. These attributes can be primary keys, composite primary keys, key candidates, foreign keys, or alternate keys.
Primary key attributes
Whether the primary key is a single attribute or a group, its values determine the values of all other attributes.
A good primary key will have the following features:
- The value guarantees uniqueness for each of the instances
- Meaning has no hidden meaning
- The range of values will remain constant over time
- Values exist for each of the entity instances
Artificial primary keys
Artificial primary keys are attributes created for the sole purpose of identifying specific instances of an entity. In some cases, there is no attribute or group that uniquely identifies an entity instance. In other cases, a composite primary key is unwieldy and difficult to maintain. An artificial primary key is sometimes called a pseudo-key or a system-generated key. It is also known as an artificial unique identifier, which indicates its purpose.
An artificial primary key is often formed by a simple sequential numbering of each of the entity instances. An additional benefit of such artificial keys is that you don't have to care about the meaning of the entity instances associated with them, other than guaranteeing uniqueness. In fact, artificial primary keys created in this way are guaranteed to have the characteristics of good primary keys.
Note that most of the primary keys in Figure 3.5 are artificial. For the most part, the primary key is just a unique number for each of the instances.
Key Candidates
A key candidate is an attribute or group of attributes that identifies a particular instance of an entity. A key candidate provides a mechanism for determining potential primary keys to identify specific instances of an entity.
A key candidate that is not selected as the primary key is also called an alternate key. An alternate key is an attribute or group of attributes that can be used in indexing.
Foreign keys
A foreign key is an attribute or group of attributes that make up the primary key of another entity. A foreign key may or may not be a key attribute in a related entity. Note the term related entity. Foreign keys represent relationships between entities, which will be discussed in more detail in the next article.
Migrating Primary Key Attributes
Foreign key attributes are attributes of another entity's primary key that have been migrated to that entity through a relationship. Foreign keys can be either identifying or non-identifying. Identifying foreign keys become part of the primary key in the entity they are migrated to. Non-identifying foreign keys become non-key attributes.
Non-Key Attributes
Non-key attributes are attributes whose values depend on the values of the primary key or composite primary key. These non-key attributes must depend on the value of the key, the complete key, and nothing but the key.
In his book Strategic Systems Development K. Finklestein defines several types of non-key attributes:
- Selective attributes are attributes used to identify a single instance of an entity when the key is not unique. Also called secondary keys.
- Group attribute - an attribute that combines a group of more detailed attributes.
- Repeating group attributes are attributes that represent multiple occurrences of the same attribute within an entity.
- Derived attributes are attributes whose values are derived from the values of other attributes.
- Master data attributes are attributes that are not selective, group, repeat group, or derived attributes.
Selective, group, and repeating group attributes must not be present in a third normal form logical model. Selective attributes must become part of the primary key if they are needed to identify a single instance of an entity. Group attributes are multivalued. In my opinion, group attributes are best represented in the model as code or classification entities. As noted above, repeating groups should be rendered into subordinate entities.
In third normal form, non-key attributes must be simple (basic) or derived attributes.
Simple Attributes
Simple attributes are attributes that, as a result of decomposition, have been brought to the highest level of detail and, thus, their values are completely dependent on the primary key and are defined for each of the entity instances. They are not a selection criterion and cannot be used to group entities. They represent the simple atomic facts that the corporation is interested in.
Derived attributes
Derived attributes are attributes whose values are derived or calculated from the values of one or more other attributes. The question of the admissibility of the presence of derived attributes in a logical model is being actively discussed. Some experts believe that because derived attribute values depend on source attribute values, derived attributes should not be represented in the logical model.
The logical model is intended to represent the requirements for information in a complete and accurate manner. You may decide to derive attributes at the physical model level according to your usage requirements.
While the arguments for excluding derived attributes are understandable, the logical model is nonetheless the best place to enter names and descriptions for all attributes. Therefore, it is preferable to include derived attributes in the logical model. However, modelers should refrain from using derived attributes as primary keys or as part of composite primary keys. Also don't forget to include the inference rule in the derived attribute description.
Finding the scope of an attribute
The scope of an attribute defines a list of allowed values that an attribute can take on a particular entity instance. The scope includes at least the scope of the generic data type, and may include a scope specified by the user. In the completed logical model, you must find the scope for each of the attributes.
We can say that the scope of an attribute must contain at least two values. An attribute for which only one value is always allowed is probably not displayed correctly in the model. In Figure 3.5, there are two such attributes, Banana and Fudge.
The BANANA DESSERT entity has the Banana attribute. The business rule states that each instance of the BANANA DESSERT entity contains a banana. Therefore, the Banana attribute can only have one value and is probably not necessary. Instead, the description of the BANANA DESSERT entity should state that a banana is included in every instance of it. The same goes for the Fudge attribute in the CREAMY FENDER entity.
Table 3.1 shows the domains of the logical data types of the SPECIAL OFFER entity of the MIX logical model.
TABLE 3.1. Examples of boolean data types
Scope of simple and extended data types
The data type scope determines how the attribute values are represented. In a complete logical model, a data type scope is required for each attribute. The following list provides some examples of ERwin boolean data types:
- Datetime - date/time
- Number - number
- String - string
Many of the newer database platforms support more advanced data types. However, it is important to remember that these complex data types are, with a few exceptions, platform dependent. In any case, if the user needs an attribute, it must be included in the model, regardless of its data type. Some widely used extended data types are given below:
- image - image
- Sound - sound
- Video - video
User Defined Scopes
User-supplied scopes are specialized scopes that refine the set of values allowed for an attribute. These scopes are often organization-specific and should be defined and used consistently within a corporation. For example, an attribute with a scoped data type of Number can also have a user-supplied scope that limits the possible values between 1 and 100. The integrity principle gives a corporation the ability to make changes in one entity to expand the scope for each of the attributes that it uses.
Defining an optional attribute
The attribute value may or may not be required. If a value is required, or required, the value must be present at the time the instance is created. If the value is optional, you can create instances without it.
In the book Information Engineering: Strategic Systems Development K. Finklestein defines the mandatory property of an attribute through a series of "editing rules":
- It is added at once, it is impossible to change later.
- Added immediately, modified later.
- Added later, changed later.
- Added later, can not be changed later.
Watch carefully for optional attributes. If an attribute or set of attributes only has a value for specific instances of an entity, consider moving it to an entity where the value will exist for each of the instances.
Table 3.2 lists the mandatory property for attributes of the SPECIAL OFFER entity. Note that when instantiating the SPECIAL OFFER entity, values are required for all attributes except the Special Offer End Date attribute.
Table 3.2. Attribute Mandatory Examples
Table 3.2 shows a business rule that says that an instance of the SPECIAL OFFER entity requires the following information:
- Entity ID (Special Offer ID)
- Special Offer Flavor ID (Ice Cream ID and Flavor ID)
- Special offer start date (Special offer start)
An end date for each instance of the SPECIAL OFFER entity is optional. The business rule states that the SPECIAL OFFER must have a beginning, but does not have to have an end.
Attributes whose values are required cannot have empty values. Some experts believe that a value should be required for every instance of an entity. Naturally, assuming that the value of each attribute of an entity instance is found or known before the instance is created.
Attributes whose values are optional can have empty values. Some experts believe that an attribute should not be present in an entity if its value is not available to each of its instances. One reason is the difficulty of interpreting empty values. Does an empty value mean that the value is unknown to the instance, or has it simply not been received?
|
Attribute naming
Each attribute must have a clear, precise, and consistent name. The attribute name must not conflict with its description. The attribute name must refer to the values collected for attribute instances. The attribute name must be understandable and generally accepted by the corporation.
It is likely that you have a set of attribute naming conventions in your corporation that have been developed in your corporation or in the formation of an enterprise data model that you are guided by. The use of attribute naming conventions ensures that names are constructed uniformly within a corporation, regardless of who constructs the name.
Attribute naming conventions are important, whether you work in a small or a large organization. However, in a large organization with multiple development teams and a large number of users, naming conventions greatly help with the interaction and understanding of elementary data. Ideally, you should develop and maintain attribute naming conventions centrally and then document and publish them to the entire corporation.
Here are some guidelines for building an initial set of attribute naming conventions, just in case your organization doesn't have one yet:
- The attribute name should be descriptive enough. Consider using noun phrases in the form object/modifier/class.
- Whenever possible, the attribute name should include the name of the entity. Use "Name for person" instead of just "Name".
- The attribute name must refer to the values of specific instances of the attribute. Using the same name for attributes that contain different data, or different names for attributes that contain the same data, will unnecessarily confuse developers and end users.
- The attribute name must use the business language instead of the datasheet language.
- The attribute name must not contain special characters (such as !, @, #, $, %, l, &, *, etc.) or indicate ownership (Name belonging to a person).
- The attribute name must not contain acronyms or abbreviations unless they are part of an accepted naming convention.
It is preferable for modellers to use good naming conventions if they exist, or develop them if there are none.
Attribute names in the form Object/Modifier/Class
The object/modifier/class form is an industry-wide convention for naming attributes. This convention encourages the use of three-part attribute names. The part of the object is sometimes called the subject or the main word. The name of the entity is usually used as the object.
A modifier can be a single term or a group of terms. Although there is no list of standard modifiers, it is desirable for modellers to create short, meaningful modifiers. Using modifiers allows you to create descriptive, meaningful attribute names. If a name becomes unacceptably heavyweight for users or widely used, as required by a corporation, you can compromise by dropping trisyllabic attribute names.
The base part of an attribute name is a class that defines the type of information that the attribute represents. Some commonly used classes:
- Identifier
- Number
- Value
- Quantity
- Frequency
Attribute name examples
Within a corporation, it is always better to use consistent attribute names. Table 3.3 gives examples of good and bad names for attributes. Note that the words in the attribute name are separated by spaces, start with capital letters, and use lowercase characters for the rest.
TABLE 3.3. Attribute names with explanations
Good name |
Bad name |
Explanation |
Person First Name (Person's name) |
Name (Name) |
Name (Name) - the name of the class and needs the designation of the Person object (person) and the First modifier (first). |
Ice Cream Sales Quantity (Ice cream sales volume) |
The Quantity of Sales (Volume of sales) |
Quantity (Quantity) - the name of the class and should be in the last place (in the English version of the attribute name). "The" and "of" don't add any extra meaning. |
Item Cost Amount (Position value value) |
Cost of Item (Position value) |
"of" doesn't add any extra meaning. The class name "Amount" tells the user what should be in the attribute. |
Product Identifier (Product ID) |
Product Identifiers (Product IDs) |
"Identifiers" is plural. The attribute name must be a singular noun. |
Point of Sale Location Code |
POS code (POS code) |
"POS" is an abbreviation. The used class name "Code" (code) needs a modifier. |
Person Birth Date (Person's date of birth) |
Birthday (Birthday) |
Birthday (Birthday) does not contain the name of the class Date (Date). The inclusion of the modifier and the object name clarifies the meaning of the attribute name. |
Description of attributes
The attribute description should be a short explanation of the meaning of the attribute, not how it is used. The description of an attribute must not contradict its name and must not be a simple repetition of the name. Use the name of the class and object in the claim to accurately describe the data. If the attribute is displayed or calculated, include inference rules or calculation formulas. The following rules apply to attribute descriptions:
- The attribute description must be clear, complete, and unambiguous.
- The attribute description must match its name.
- The description of an attribute must not rely on the description of another attribute.
- The attribute description must be in the language of business, not in the language of technical descriptions.
- The name of an attribute should reflect its meaning, not how it is used.
- In the description of the attribute, all abbreviations and acronyms used in its name must be deciphered.
Modelers are encouraged to provide good descriptions for each of the attributes. Good attribute descriptions make it easy for everyone to use the model. Those who use a model created by a good developer experience the pleasure of well-stated information requirements in the model. Compare the examples from Table 3.4.
Table 3.4. Attribute names and descriptions with explanations
Attribute name |
good description |
Bad description |
Explanation |
Person First Name (Person's name) |
The name of the person, which allows the corporation to communicate with the person in friendly terms. |
A field with a length of 40 characters. |
Business language is not used. Technical terms applied. |
Ice Cream Sales Quantity (Ice cream sales volume) |
The quantity of ice cream of a particular variety sold in a particular sales event. |
Volume of sales. |
Doesn't add any new meaning, just rephrases the attribute name in vague terms. |
Item Cost Amount (Position value value) |
The value of a particular position in a particular period of time. Represents the total cost of sales and shipping. |
A six-digit decimal number with two decimal places. |
Too much technical description. Means almost nothing to users of the item. |
Product Identifier (Product ID) |
An artificial unique numeric identifier for a particular product. |
Product identifiers. |
A simple paraphrase of the attribute name. |
Point of Sale Location Code (POS location code) |
A unique code that identifies geographical position points of sale. |
The acronym used may not be clear to users. Also, an important modifier is omitted from the description. |
|
Person Birth Date (Person's date of birth) |
Person's date of birth. |
Person's birthday. |
The description omitted the class name "date". |
Common mistakes when working with attributes
This section on common attribute modeling mistakes does not claim to be complete. Its purpose is to point out the most common mistakes that modelers make.
Sometimes, when modeling something in a certain way, the modeler makes a conscious choice, guided by quite correct principles. It is very important to understand the whole causal chain of decisions and the results to which they can lead.
Modeling in terms of values
What is meant by modeling in terms of values? During a work session, users may tell you that they need a set of attributes that indicate the age categories of a PERSON entity instance. There are at least three problems with this scenario:
- The way corporations define age categories may change over time.
- The age of a particular person definitely changes over time.
- All attributes will represent attribute values Person's age. Naturally, Person's age will change over time, so the best solution is to use a simple attribute in the model Person's date of birth.
Modeling multivalued attributes
Multivalued attributes are those that have multiple values for a concept. Check attribute descriptions that indicate multiple values for the same concept.
Sometimes experts from different subject areas in a corporation use an attribute name that is spelled and pronounced the same but has different meanings for different experts. One way to make sure that attributes with the same name describe the same objects is to check the descriptions. Make sure attribute values describe a single concept.
For example, you can create artificial codes by connecting one or more codes to link previously unrelated data. Text snippets can hide many valuable attributes and values.
Failure to resolve multivalued attributes can leave some important business rules undetected and undocumented.
Modeling redundant attributes
Redundant attributes are those with different names but containing information about similar concepts. In all languages there are many words representing the same things. One way to find redundant attributes is to look at entities with similar attributes. Compare descriptions of all attributes to see if these entities contain data about similar concepts. Redundant attributes are often the result of a tendency to model values as attributes. For example, the MANAGER and EXECUTOR entities can contain the Manager Name and Executive Name attributes. Since both MANAGER and EXECUTOR are roles that an instance of the PERSON entity can act in, you can move this attribute there and name it PERSON Name.
Using Bad Names for Attributes
Unclear, ambiguous, or imprecise attribute names make it difficult for new users and development teams to reuse or evolve an existing model.
Do not use abbreviations or acronyms as part of an attribute name. Abbreviations and acronyms are open to misinterpretation and may even have different meanings in different subject areas.
Do not use proper names that point to a value for a particular instance. An attribute name that uses a proper name is an indicator of serious modeling problems that go beyond poor naming. Do not include the location as part of the attribute name. If a value exists for one location, it definitely exists for another location. An attribute name with a location indicates that you are modeling a specific instance instead of a class.
Using Bad Descriptions for Attributes
Do not use attribute descriptions borrowed only from a dictionary. Dictionary descriptions will not include business-relevant information that makes the attribute important to a corporation. Don't use a simple paraphrase of the attribute name. Do not use the attribute name in its description.
A vague, vague description of an attribute, or worse, its absence, makes it difficult to reuse or evolve an existing model. Users will not be able to verify that the model contains all of the information requirements. It also increases the likelihood of using concrete values and multivalued attributes instead of attributes in the model.
Concepts that seem obvious to all participants in work sessions may not be so obvious over time, when a new development team is tasked with building on an existing model.
Conclusion
Entities are facts about which the corporation is interested in accumulating and maintaining information. They constitute the essence of the model and are mainly revealed during working sessions. Full and accurate reflection of the attributes in the model requires careful analysis to ensure that the attributes exactly match the information requirements. The attribute must be present in the model in a single instance and must represent a single business concept. Normalization rules must be used to place attributes on the appropriate entities.
Attributes can be key or non-key. The key can be a single attribute or a group of attributes. Primary keys are selected from key candidates that uniquely identify an entity instance. Primary key attributes migrate from the original entity to become foreign keys of secondary entities. The values of non-key attributes must be functionally dependent on the value of the primary key.
The scope defines a set of attribute values. Logical scopes can be simple data types such as numbers or strings. They can also be complex user-defined data types that are tailored to meet the specific requirements of a corporation. Newer DBMSs support extended data types such as images and sound.
Attribute values can be required or optional. If the value is required, the attribute cannot have null values. The attribute must have a name and a description. When naming attributes, it is recommended to use the naming standard in the form object/modifier/class. Each attribute must include good description, which uses business terminology to define the essence of the attribute, not how it will be used.