Entity types in the database. Database fundamentals. ER-model (entity-relationship). Defining Entity Types

04.01.2021

Creating a database begins with design.

Database design stages:

· Study of the subject area;

· Data analysis (entities and their attributes);

· Definition of the relationships between entities and of the primary and secondary (foreign) keys.

During the design process, the structure of the relational database is determined: the set of tables, their structure, and the logical connections between them. The structure of a table is determined by its columns, the data type and size of each column, and the table's keys.

The basic concepts of the entity-relationship database model are entities, the relationships between them, and their attributes (properties).

Entity: any concrete or abstract object in the subject area under consideration. Entities are the basic types of information stored in the database (in a relational database, each entity is assigned a table). Entities can include students, clients, departments, etc. An entity instance and an entity type are different concepts. An entity type refers to a set of homogeneous persons, objects, or events treated as a whole (for example, student or client). An entity instance refers to a particular member of that set. For example, the entity type can be Student, and its instances can be Petrov, Sidorov, etc.

Attribute: a property of an entity in the subject area. Its name must be unique within a particular entity type. For example, the Student entity can have the following attributes: last name, first name, patronymic, date and place of birth, passport data, etc. In a relational database, attributes are stored in table fields.

Relationship: an association between entities in the subject area. Relationships connect parts of the database (in a relational database, they connect table records).

Entities are data classified by type, and relationships show how these data types relate to each other. Describing a subject area in terms of entities and relationships yields an entity-relationship model for the database.

Consider the subject area: Dean's Office (Student Achievement)
The "Dean's Office" database should store data on students, student groups, student grades in various disciplines, teachers, scholarships, etc. We confine ourselves to data on students, student groups, and student grades in various disciplines. Let's define the entities, their attributes, and the basic requirements for the database functions for this restricted data set.

The main entities of the "Dean's Office" database are: Students, Student groups, Disciplines, Progress.

The main attributes of these entities:
- students: last name, first name, patronymic, gender, date and place of birth, student group;
- student groups: name, course, semester;
- disciplines: name, number of hours;
- progress: grade, type of control.

Basic requirements for the database functions:
- select a student's progress in disciplines, showing the total number of hours and the type of control;
- select students' performance by groups and disciplines;
- select the disciplines studied by a student group in a particular course or semester.

The analysis of the subject area data shows that each entity should be assigned the simplest kind of two-dimensional table (relation). Next, the logical relationships between the tables must be established. The relationship between the Students and Progress tables must be such that each record in the Students table corresponds to several records in the Progress table, that is, one-to-many, since each student can have multiple grades.

The logical relationship between the Groups and Students entities is one-to-many: a group contains many students, and each student belongs to exactly one group. The logical relationship between the Disciplines and Progress entities is also one-to-many, because for each discipline several grades can be given to different students.
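A minimal sketch of how these three one-to-many relationships could be declared, assuming SQLite through Python's built-in sqlite3 module rather than Access. The table and column names are hypothetical, and only a subset of the attributes listed above is included. In each relationship, the "many" side stores a foreign key that points at the "one" side.

```python
import sqlite3

# Hypothetical schema for the Dean's Office example (subset of attributes).
SCHEMA = """
CREATE TABLE student_groups (
    group_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    course   INTEGER,
    semester INTEGER
);
CREATE TABLE students (
    student_id INTEGER PRIMARY KEY,
    last_name  TEXT NOT NULL,
    first_name TEXT NOT NULL,
    -- one group, many students
    group_id   INTEGER NOT NULL REFERENCES student_groups(group_id)
);
CREATE TABLE disciplines (
    discipline_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    hours         INTEGER
);
CREATE TABLE progress (
    -- one student, many grades
    student_id    INTEGER NOT NULL REFERENCES students(student_id),
    -- one discipline, many grades
    discipline_id INTEGER NOT NULL REFERENCES disciplines(discipline_id),
    grade         INTEGER,
    control_type  TEXT
);
"""


def build_deans_office() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def student_progress(conn: sqlite3.Connection, student_id: int):
    """A student's grades with the discipline's hours and type of control."""
    return conn.execute(
        """SELECT d.name, d.hours, p.control_type, p.grade
           FROM progress AS p JOIN disciplines AS d USING (discipline_id)
           WHERE p.student_id = ?
           ORDER BY d.name""",
        (student_id,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_deans_office()
    conn.execute("INSERT INTO student_groups VALUES (1, 'IT-21', 2, 3)")
    conn.execute("INSERT INTO students VALUES (1, 'Petrov', 'Ivan', 1)")
    conn.executemany("INSERT INTO disciplines VALUES (?, ?, ?)",
                     [(1, "Databases", 72), (2, "Mathematics", 108)])
    conn.executemany("INSERT INTO progress VALUES (?, ?, ?, ?)",
                     [(1, 1, 5, "exam"), (1, 2, 4, "test")])
    print(student_progress(conn, 1))
    # [('Databases', 72, 'exam', 5), ('Mathematics', 108, 'test', 4)]
```

The student_progress query corresponds to the first requirement above; the other two would be similar joins filtered by group, course, or semester.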

Based on the above, we compose an entity-relationship model for the Dean's Office database.

In the diagram, an arrow denotes a one-to-many relationship.

To create the database itself, use one of the well-known DBMSs, for example Microsoft Access.

The concept of the ER model. The concept of an entity. Attributes. Attribute types

1. What problems can a developer have when designing a database?

When designing a database and developing a software product, the most important problem is the interaction between the developer and the customer. The developer's task is to reproduce the customer's requirements as accurately as possible in the database management software. The main problem the developer needs to solve is building the database correctly, or more precisely, its schema (structure).

The developer also encounters other difficulties, including:

search for efficient algorithms;
selection of appropriate data structures;
debugging and testing complex code;
design and usability of the application interface.

When developing the database management software, the developer must study the customer's requirements in detail. The database should be designed so that it is understandable, reflects the problem being solved as accurately as possible, and contains no data redundancy.

To facilitate the development (design) of a database, so-called semantic data models are used. The best known of them, applicable to different types of databases, is the ER data model (Entity-Relationship model).

2. What is an ER-model (Entity-relationship model)? Why do you need to develop an ER model?

ER-model (Entity-relationship model or Entity-relationship diagram) is a semantic data model designed to simplify the database design process. An ER model can be converted into a database of any type: relational, hierarchical, network, or object. The ER model is based on the concepts of "entity", "relationship", and "attribute".

For large databases, building an ER model helps avoid design errors that are extremely difficult to fix, especially once the database is in operation or at the testing stage. Mistakes in the design of the database structure can force changes to the code of the software that manages the database. As a result, time, money, and human resources are used inefficiently.

An ER model represents a database as visual graphic diagrams that describe a particular subject area. An entity-relationship diagram graphically represents entities, attributes, and relationships.

The ER model belongs only to the conceptual level of modeling and does not contain implementation details. The same ER model can be implemented in different ways.

3. What is an entity in a database? Examples

An entity in a database is any object that can be distinguished in the subject area for which the database is being developed. The database designer must be able to define entities properly.

Example 1. The following entities can be distinguished in a bookstore database:

book;
supplier;
location in the store.

Example 2. In a database that tracks the educational process of an educational institution, the following entities can be distinguished:

students (pupils);
teachers;
groups;
disciplines being studied.

4. What are the varieties of entity types? Designation of entity types in the ER model

In the entity-relationship model, there are two kinds of entity types:

weak type. A weak entity type depends on a strong entity type: its instances cannot be identified without the instance of the strong entity they belong to;
strong type. A strong entity type is independent and does not depend on any other entity type.
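In a relational schema, a weak entity type is commonly implemented by making the owner's key part of the weak entity's primary key. The sketch below assumes SQLite via Python's built-in sqlite3 module and uses a hypothetical Orders / Order items pair: an order item is identified only by its line number within an order, so it cannot exist without the order it belongs to.

```python
import sqlite3


def build_orders_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # needed for REFERENCES / CASCADE in SQLite
    conn.executescript("""
        -- Strong entity: identified by its own key.
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            customer TEXT NOT NULL
        );
        -- Weak entity: line_no alone identifies nothing, so the key
        -- includes order_id borrowed from the owning (strong) entity.
        CREATE TABLE order_items (
            order_id INTEGER NOT NULL REFERENCES orders(order_id) ON DELETE CASCADE,
            line_no  INTEGER NOT NULL,
            product  TEXT NOT NULL,
            PRIMARY KEY (order_id, line_no)
        );
    """)
    return conn


if __name__ == "__main__":
    conn = build_orders_db()
    conn.execute("INSERT INTO orders VALUES (1, 'Petrov')")
    conn.execute("INSERT INTO order_items VALUES (1, 1, 'Book')")
    conn.execute("DELETE FROM orders WHERE order_id = 1")
    # The dependent items disappear together with their owner.
    print(conn.execute("SELECT COUNT(*) FROM order_items").fetchone()[0])  # 0
```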

Figure 1 shows the weak and strong entity type designations in the ER model.

Figure 1. Designation of strong and weak entity types

5. What are attributes for? Attribute types. Designation of attributes on the ER model

Each entity type has a specific set of attributes. Attributes are intended to describe a particular entity.

There are the following types of attributes:

simple attributes. These attributes consist of a single component and can be part of composite attributes. Examples of simple attributes are the code of a book in a library or a student's course in an educational institution;
composite attributes. These are attributes that consist of several simple attributes. For example, a residential address may contain the country, locality, street, and house number;
single-valued attributes. These are attributes that contain exactly one value for a given entity. For example, the attribute "Grade book number" of the "Student" entity type is single-valued, since a student can have only one grade book number;
multi-valued attributes. These are attributes that can contain multiple values. For example, "Phone number" is a multi-valued attribute of the "Student" entity, since a student can have several phone numbers (home, mobile, etc.);
derived attributes. These are attributes whose values are computed from the values of other attributes. For example, a student's current year of study can be calculated as the difference between the current year and the year the student entered the educational institution (provided the student had no problems with his studies and did well in the discipline "Organization of databases and knowledge").

In an ER diagram, attributes are designated as shown in Figure 2: each attribute is drawn as an ellipse with its name inside. If the attribute is a primary key, its name is underlined.

Figure 2. Representation of attributes on ER model diagrams

6. How are entity types and attributes of the ER model implemented in real databases and the programs they manage?

When developing database management programs, entity types and their attributes can be represented in different ways, following one of several approaches:

choose a well-known technology as the data source (for example, Microsoft SQL Server, Oracle Database, Microsoft Access, Microsoft ODBC Data Source, etc.), which is already researched, tested, and standardized and comes with a huge set of database management tools;
develop your own database format, implement methods for processing it, and implement interaction with well-known data sources through special commands such as Import/Export. In this case, you will have to program all the routine work of maintaining the database and ensuring its reliable operation yourself;
combine the two approaches above. Modern software development tools provide a powerful set of libraries for processing complex data sets and visualizing them (collections, arrays, visualization components, etc.).

If the database is implemented in a well-known relational DBMS (for example, Microsoft Access, Microsoft SQL Server, etc.), entity types are represented by tables. Attributes from the ER model correspond to table fields. One record in a database table represents one entity instance.

Each kind of attribute is implemented as follows:

a simple or single-valued attribute can be represented by one of the basic types available in any programming language. For example, integer attributes are represented by types such as int, integer, uint; attributes with a fractional part by float or double; string attributes by string, etc.;
a composite attribute is an object that includes several nested simple attributes. For example, in Microsoft Access a composite attribute of a table can be formed from a set of fields of simple types. In programming languages, such a group of fields is implemented by structures or classes;
a multi-valued attribute can be implemented as an array or collection of simple or composite attributes;
a derived attribute is implemented by an additional field that is calculated when the table is accessed. Such a field is called a calculated field and is formed from other fields of the table;
an attribute that is a primary key can be of integer, string, or another suitable type. The primary key value is unique in every row of the table. Most often, the primary key is an integer (int, integer).

If the database is implemented in its own (custom) format, it is most convenient to represent entity types as classes or structures. Entity attributes are implemented as class fields (internal data), and class methods implement the necessary processing of these fields. Interaction between classes is implemented through specially designed interfaces, using well-known design patterns.
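A minimal sketch of this class-based approach, assuming Python dataclasses; the Student and Address classes and their field names are hypothetical. It shows one way the attribute kinds described earlier can map to class members: simple attributes as plain fields, a composite attribute as a nested class, a multi-valued attribute as a list, and a derived attribute as a method computed on demand rather than stored.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Address:
    """Composite attribute: a group of simple attributes."""
    country: str
    city: str
    street: str
    house_number: str


@dataclass
class Student:
    """Entity type as a class; each object is one entity instance."""
    student_id: int        # primary key
    last_name: str         # simple, single-valued
    first_name: str
    entry_year: int        # simple, single-valued
    birthday: date
    address: Address       # composite attribute
    phones: list[str] = field(default_factory=list)  # multi-valued attribute

    def age_on(self, today: date) -> int:
        """Derived attribute: computed from birthday, never stored."""
        had_birthday = (today.month, today.day) >= (self.birthday.month, self.birthday.day)
        return today.year - self.birthday.year - (0 if had_birthday else 1)


if __name__ == "__main__":
    s = Student(1, "Petrov", "Ivan", 2020, date(2002, 5, 10),
                Address("Country", "City", "Main St", "1"))
    s.phones.extend(["home-phone", "mobile-phone"])
    print(s.age_on(date(2021, 5, 9)))  # 18
```

default_factory gives every Student its own phone list, so adding a phone to one instance does not affect the others.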

7. An example of a fragment of the ER model for the entity type "Student"

Figure 3 shows a fragment of the ER model for the Student entity type.

Figure 3. Fragment of the ER model for the entity type "Student"

The figure declares the following attributes; in a DBMS (or program) they can have the following types:

the Primary key attribute is a unique integer value generated automatically. In a DBMS, this is a counter field;
the Entry year attribute is a simple attribute that can be implemented as an integer value (int, integer);
the Phone number attribute is a multi-valued attribute that can be implemented as an array, a collection, etc.;
the Record book number attribute is a simple attribute that can be implemented as a character string, since the record book number can contain letters as well as digits;
the Country, City, Street, and House number attributes form the composite attribute Address. All of them can be of string (text) type (string, Text);
the Last Name, First Name, and Patronymic attributes are simple attributes that form the composite attribute Student Name. All of them can be of string (text) type (string, Text);
the Birthday attribute is a simple attribute of the Date type (DateTime);
the Student age attribute is a calculated field defined as the difference between the current (system) date and the value of the Birthday attribute.
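A minimal sketch of the same fragment in a relational DBMS, assuming SQLite via Python's built-in sqlite3 module; the table and column names are hypothetical. SQLite has no dedicated date type, so the birthday is stored as ISO-8601 text. The counter field becomes an AUTOINCREMENT key, the composite attributes are flattened into columns, the multi-valued phone number gets its own table, and the age is computed in the query instead of being stored.

```python
import sqlite3


def build_student_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE students (
            student_id   INTEGER PRIMARY KEY AUTOINCREMENT,  -- counter field
            entry_year   INTEGER NOT NULL,
            record_book  TEXT NOT NULL UNIQUE,  -- text: may contain letters
            -- composite attribute Student Name, flattened into columns
            last_name    TEXT NOT NULL,
            first_name   TEXT NOT NULL,
            patronymic   TEXT,
            -- composite attribute Address, flattened into columns
            country      TEXT,
            city         TEXT,
            street       TEXT,
            house_number TEXT,
            birthday     TEXT NOT NULL          -- ISO date 'YYYY-MM-DD'
        );
        -- multi-valued attribute Phone number: one row per phone
        CREATE TABLE student_phones (
            student_id INTEGER NOT NULL REFERENCES students(student_id),
            phone      TEXT NOT NULL,
            PRIMARY KEY (student_id, phone)
        );
    """)
    return conn


def student_ages(conn: sqlite3.Connection, today: str):
    """Calculated field Student age: computed at query time, never stored.

    today is an ISO 'YYYY-MM-DD' string; pass date.today().isoformat()
    to use the system date.
    """
    return conn.execute(
        """SELECT last_name,
                  CAST(strftime('%Y', :today) AS INTEGER)
                  - CAST(strftime('%Y', birthday) AS INTEGER)
                  -- subtract 1 if this year's birthday has not happened yet
                  - (strftime('%m-%d', :today) < strftime('%m-%d', birthday)) AS age
           FROM students
           ORDER BY student_id""",
        {"today": today},
    ).fetchall()
```

Passing the reference date as a parameter keeps the result reproducible; in production the current date could be used directly.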

The term "relational" comes from the word relation, which in mathematics denotes a set of tuples; in a relational database, each relation is represented as a table. A relational database consists of such tables, which are usually related to each other.
Database design consists of two main phases: logical and physical modeling.
During logical modeling, you collect requirements and develop a database model that is independent of any particular DBMS (database management system). It's like drawing up blueprints for your house: you think through and draw everything, such as where the kitchen, bedrooms, and living room will be. But all of this exists only on paper and in layouts.
During physical modeling, you create a model optimized for a specific application and DBMS. It is this model that is implemented in practice. Returning to the house from the previous paragraph, at this stage you actually have to build it somewhere: carry logs, bricks...

The database design process consists of the following steps:

collection of information;
definition of entities;
defining attributes for each entity;
defining relationships between entities;
normalization;
transformation to a physical model;
database creation.

The first 5 stages form the logical design phase and the remaining two form the physical modeling phase.

Logical phase

The logical phase consists of several stages. They are all discussed below.

Gathering Requirements

At this stage, you need to determine exactly how the database will be used and what information will be stored in it. Gather as much information as possible about what the system should and shouldn't do.

Entity definition

At this stage, you need to define the entities that the database will consist of.

An entity is an object in a database that stores data. An entity can be something real (a house, a person, an object, a place) or an abstract thing (a banking transaction, a department of a company, a bus route). In the physical model, an entity is called a table.

Entities are made up of attributes (columns in a table) and records (rows in a table).

Typically, databases are made up of several primary entities associated with a large number of subordinate entities. Core entities are called independent: they do not depend on any other entity. Subordinate entities are called dependent: in order for one of them to exist, the main table associated with it must exist.
In diagrams, entities are usually represented as rectangles, with the name of the entity inside the rectangle.

Any table has the following characteristics:

it contains no identical rows;
all columns (attributes) in the table must have different names;
elements within the same column have the same type (string, number, date);
the order of the rows in the table can be arbitrary.

At this stage, you need to identify all categories of information (entities) that will be stored in the database.

Attribute Definition

An attribute represents a property that describes an entity. Attributes are often a number, date, or text. All data stored in an attribute must be of the same type and have the same properties.
In the physical model, attributes are called columns.
After defining the entities, it is necessary to define all the attributes of these entities.
In diagrams, attributes are usually listed inside the entity rectangle. The figure shows the "Houses" database example again, now with some attributes defined for its entities.

For each attribute, you define the data type, size, allowed values, and any other rules, including whether the attribute is mandatory, mutable, and unique.
The mandatory rule determines whether an attribute is a required part of an entity. If the attribute is optional, it can be NULL; otherwise it cannot.
You must also determine whether the attribute is mutable: some attribute values cannot change after the record is created.
Finally, you need to determine whether the attribute is unique. If it is, its values cannot repeat.
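A minimal sketch of these three rules, assuming SQLite via Python's built-in sqlite3 module and a hypothetical citizens table. NOT NULL expresses the mandatory rule and UNIQUE the uniqueness rule. SQLite has no built-in "immutable column" option, so, as one possible approach, a trigger rejects changes to a value that must not change after the record is created.

```python
import sqlite3


def build_citizens_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE citizens (
            citizen_id  INTEGER PRIMARY KEY,
            passport_no TEXT NOT NULL UNIQUE,  -- mandatory and unique
            last_name   TEXT NOT NULL,         -- mandatory, mutable
            middle_name TEXT,                  -- optional: NULL allowed
            birth_date  TEXT NOT NULL          -- mandatory, immutable (see trigger)
        );
        -- Reject any UPDATE that actually changes birth_date.
        CREATE TRIGGER citizens_birth_date_immutable
        BEFORE UPDATE OF birth_date ON citizens
        WHEN NEW.birth_date IS NOT OLD.birth_date
        BEGIN
            SELECT RAISE(ABORT, 'birth_date cannot be changed');
        END;
    """)
    return conn
```

Inserting a duplicate passport number or a NULL last name fails with sqlite3.IntegrityError, and the trigger aborts an attempt to change birth_date, while other columns can still be updated.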

Keys

A key is a set of attributes that uniquely identifies a record. Keys are divided into two classes: simple and composite.
A simple key consists of only one attribute. For example, in a "Passports of the country's citizens" database, the passport number is a simple key, since no two passports have the same number.
A composite key consists of several attributes. In the same "Passports of the country's citizens" database, there could be a composite key made of the attributes surname, name, patronymic, and date of birth. This is only an illustration: in theory, such a composite key does not guarantee that records are unique.
There are also several types of keys, which are described below.

Candidate key

A candidate key is any set of attributes that uniquely identifies a record in a table. A candidate key can be simple or composite.
Each entity must have at least one candidate key, and it may have several. None of the primary key attributes can have a NULL value.
A candidate key is also called a possible key. It should not be confused with a surrogate key, which is an artificial identifier (for example, an automatically generated number) with no meaning in the subject area.

Primary Keys

A primary key is a set of attributes that uniquely identifies a record in a table (entity). One of the candidate keys is chosen as the primary key. In diagrams, primary keys are often shown above the main list of attributes or marked with special symbols. The entity in the figure has both key and regular attributes.

Alternate Keys

Any candidate key that is not the primary key is called an alternate key. An entity can have multiple alternate keys.

Foreign keys

A foreign key is a set of attributes that refers to the primary or alternate key of another entity. If a record is not associated with any record of the referenced entity, its foreign key can only contain NULL values. If the foreign key is composite, all of its attributes must then be NULL.
In diagrams, attributes that form foreign keys are marked with special symbols. The figure shows two related entities (Houses and their Owners) and the foreign key that links them (one person can own more than one house).
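A minimal sketch of the Houses and Owners foreign key, assuming SQLite via Python's built-in sqlite3 module, which enforces foreign keys only after PRAGMA foreign_keys = ON; the table and column names are hypothetical. A house may reference an existing owner or hold NULL, but it cannot reference an owner that does not exist.

```python
import sqlite3


def build_houses_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript("""
        CREATE TABLE owners (
            owner_id  INTEGER PRIMARY KEY,
            full_name TEXT NOT NULL
        );
        CREATE TABLE houses (
            house_id INTEGER PRIMARY KEY,
            address  TEXT NOT NULL,
            -- foreign key: many houses may refer to one owner;
            -- NULL means the house is not linked to any owner
            owner_id INTEGER REFERENCES owners(owner_id)
        );
    """)
    return conn
```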

Keys are logical constructs, not physical objects. Relational databases provide mechanisms (such as key constraints) for implementing them.

Defining Relationships Between Entities

Relational databases allow you to combine information belonging to different entities.
A relationship is a situation in which one entity refers to the primary key of a second entity, as the House and Owner entities do in the previous figure.
Relationships are defined during database design. To do this, analyze the entities and identify the logical relationships that exist between them.
The relationship type determines the number of entity records associated with another entity record. Relationships are divided into three main types, which are described below.

One-to-one

Each record of the first entity corresponds to only one record of the second entity, and each record of the second entity corresponds to only one record of the first entity. For example, consider two entities, People and Birth Certificates: one person can have only one birth certificate.

One-to-many

Each record of the first entity can correspond to several records of the second entity, but each record of the second entity corresponds to only one record of the first entity. For example, consider two entities, Order and Order Item: one order can contain many items.

Many-to-many

Each record of the first entity can correspond to several records of the second entity, and each record of the second entity can also correspond to several records of the first entity. For example, consider two entities, Author and Book: one author can write many books, and a book can have multiple authors.
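Relational tables cannot store a many-to-many relationship directly; it is usually resolved with a junction table that holds pairs of foreign keys. A minimal sketch for the Author and Book example, assuming SQLite via Python's built-in sqlite3 module; the table names and the authors_of helper are hypothetical.

```python
import sqlite3


def build_library_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE authors (author_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
        CREATE TABLE books   (book_id   INTEGER PRIMARY KEY, title TEXT NOT NULL);
        -- Junction table: each row links one author to one book, so the
        -- many-to-many relationship becomes two one-to-many relationships.
        CREATE TABLE book_authors (
            author_id INTEGER NOT NULL REFERENCES authors(author_id),
            book_id   INTEGER NOT NULL REFERENCES books(book_id),
            PRIMARY KEY (author_id, book_id)
        );
    """)
    return conn


def authors_of(conn: sqlite3.Connection, book_id: int) -> list[str]:
    """Names of all authors linked to a book, alphabetically."""
    rows = conn.execute(
        """SELECT a.name FROM authors AS a
           JOIN book_authors AS ba ON ba.author_id = a.author_id
           WHERE ba.book_id = ?
           ORDER BY a.name""",
        (book_id,),
    )
    return [name for (name,) in rows]
```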
Relationships are also divided into mandatory and optional.

A mandatory relationship means that each record of the first entity must have related records in the second entity.
An optional relationship means that a record of the first entity may have no related record in the second entity.

Normalization

Normalization is the process of removing redundant data from a database: each data element should be stored in the database once and only once. There are five common normal forms; as a rule, a database is brought to third normal form.
During normalization, certain steps are performed to remove redundant data. Normalization can improve performance: it can speed up sorting and index building, reduce the number of indexes per entity, and speed up insert and update operations.
A normalized database is usually more flexible. When modifying queries or persisted data, a normalized database typically requires fewer changes, and changes have fewer consequences.

First normal form

To bring an entity to first normal form, you must eliminate repeating groups of values and ensure that each attribute contains only one value: lists of values are not allowed.
In other words, the same attribute must not be repeated within an entity.
For example, in the figure, the House entity is not normalized. It contains several attributes for storing data about the owners of the house (the House entity does not correspond to the first normal form).

To bring the House entity to first normal form, remove the repeating groups of values: take out the Owner 1-3 attributes and place them in a separate entity. The result is the House entity reduced to first normal form.
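A minimal sketch of this step, assuming SQLite via Python's built-in sqlite3 module and hypothetical table names. Instead of Owner 1, Owner 2, and Owner 3 columns in the House table, owners go into a separate table with one row per house-owner pair, so a house can have any number of owners and no column is repeated.

```python
import sqlite3


def build_houses_1nf() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        -- Before (not in 1NF): houses(house_id, address, owner1, owner2, owner3)
        -- repeats the same kind of value in three columns.
        CREATE TABLE houses (
            house_id INTEGER PRIMARY KEY,
            address  TEXT NOT NULL
        );
        -- After (1NF): one row per (house, owner) pair.
        CREATE TABLE house_owners (
            house_id   INTEGER NOT NULL REFERENCES houses(house_id),
            owner_name TEXT NOT NULL,
            PRIMARY KEY (house_id, owner_name)
        );
    """)
    return conn
```

A fourth owner is now just another row, whereas the three-column layout would have needed a schema change.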

Second normal form

A table in second normal form contains only the data that applies to it: every non-key attribute depends on the entire primary key, not on just part of it (this matters when the key is composite).
Entities must be in first normal form to conform to second normal form.
For example, the House entity in the figure has a Price per liter of gasoline attribute, which has nothing to do with houses. This attribute is removed (or moved to another entity). The Mayor attribute is also moved to a separate entity, since it depends on the city where the house is located, not on the house.
The figure shows the House entity reduced to second normal form.
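A minimal sketch of moving Mayor out of House, assuming SQLite via Python's built-in sqlite3 module; the cities table and the house_mayor helper are hypothetical. The mayor is stored once per city and each house only references its city, so changing a mayor is a single-row update instead of an update to every house in that city.

```python
import sqlite3


def build_houses_normalized() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        -- The mayor is a fact about the city, so it is stored once per city.
        CREATE TABLE cities (
            city_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL,
            mayor   TEXT
        );
        CREATE TABLE houses (
            house_id INTEGER PRIMARY KEY,
            address  TEXT NOT NULL,
            city_id  INTEGER NOT NULL REFERENCES cities(city_id)
        );
    """)
    return conn


def house_mayor(conn: sqlite3.Connection, house_id: int):
    """Look up a house's mayor through its city; None if the house is unknown."""
    row = conn.execute(
        "SELECT c.mayor FROM houses AS h JOIN cities AS c USING (city_id) "
        "WHERE h.house_id = ?",
        (house_id,),
    ).fetchone()
    return row[0] if row else None
```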

Third normal form

Third normal form additionally excludes transitive dependencies, that is, non-key attributes that depend on other non-key attributes. Any entity in third normal form is also in second normal form. This is the most common form of a database.
In third normal form, every attribute depends on the key, the whole key, and nothing but the key.
For example, the House Owner entity in the figure has a Zodiac sign attribute that depends on the owner's date of birth, not on the owner's name (which is the key).
To bring the House Owner entity to third normal form, create a Zodiac Signs entity and move the Zodiac sign attribute there. The figure shows the House Owner entity reduced to third normal form.

Constraints

Constraints are rules enforced by the database management system. They define the set of values that can be entered in a column or columns.
For example, suppose you do not want the order amount in your very cool store to be less than 500 rubles. You simply set a constraint on the Order Amount column.
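A minimal sketch of such a constraint as a CHECK clause, assuming SQLite via Python's built-in sqlite3 module; the table and column names are hypothetical.

```python
import sqlite3


def build_store_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            -- constraint: the DBMS rejects any amount below 500 (rubles)
            amount   NUMERIC NOT NULL CHECK (amount >= 500)
        )""")
    return conn
```

An INSERT or UPDATE that violates the CHECK clause fails with sqlite3.IntegrityError, so the rule holds no matter which application writes to the table.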

Stored procedures

Stored procedures are precompiled procedures stored in a database. Stored procedures can be used to define business rules and can be used to perform more complex calculations than constraints alone.
Stored procedures can contain program flow logic as well as database queries. They can take parameters and return results as tables or single values.
Stored procedures are similar to regular procedures or functions in any program.

NOTE
Stored procedures reside in the database and run on the database server. They can be faster than equivalent ad hoc SQL statements, because many DBMSs compile them and cache their execution plans.

Data integrity

Once the data is organized into tables and the relationships between them are defined, we can assume that the model correctly reflects the business environment. Now we need to ensure that the data entered into the database correctly reflects the actual state of affairs. In other words, you need to enforce business rules and maintain the integrity of the database.
For example, suppose your company delivers books. You are unlikely to accept an order from an unknown client, because then you will not even be able to deliver the order. Hence the business rule: orders are accepted only from customers whose information is in the database.
The correctness of data in relational databases is ensured by a set of rules. Data integrity rules fall into four categories.

Entity integrity: each entity record must have a unique identifier and contain data. After all, you need some way to distinguish the records in the database.
Attribute integrity: each attribute accepts only valid values. For example, a purchase amount certainly cannot be less than zero.
Referential integrity: a set of rules that keep primary and foreign keys logically consistent when records are inserted, updated, and deleted. Referential integrity ensures that every foreign key has a corresponding primary key. Take the previous example with the House Owner and House entities. Say you are Vasya Ivanov and own a house. You changed your last name to Sidorov and updated the House Owner entity accordingly. You would certainly want the house to remain yours under your new name, rather than belong to a Vasya Ivanov who no longer exists.
Custom integrity rules: any integrity rules that do not belong to the categories above.
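A minimal sketch of the Vasya Ivanov example, assuming SQLite via Python's built-in sqlite3 module and hypothetical names. The owner's name is deliberately used as the key to mirror the story: ON UPDATE CASCADE carries the renamed key into the houses table, and ON DELETE RESTRICT prevents deleting an owner who still has houses.

```python
import sqlite3


def build_ownership_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        -- Natural key on purpose, to mirror the example: owners are
        -- identified by name.
        CREATE TABLE house_owners (
            owner_name TEXT PRIMARY KEY
        );
        CREATE TABLE houses (
            house_id   INTEGER PRIMARY KEY,
            owner_name TEXT NOT NULL
                REFERENCES house_owners(owner_name)
                ON UPDATE CASCADE   -- a renamed owner keeps their houses
                ON DELETE RESTRICT  -- cannot delete an owner who still has houses
        );
    """)
    return conn
```

With a surrogate numeric key instead, the name would be an ordinary attribute and renaming would not touch the houses table at all.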

Triggers

A trigger is similar to a stored procedure, but it is called automatically when the data in a table changes.
Triggers are a powerful mechanism for maintaining database integrity. They are called before or after the data in a table changes.
Triggers can not only undo these changes but also modify data in any other table.
For example, suppose you are building an Internet forum and want the list of forums to show the latest post in each forum. You could, of course, look up the latest message in the Forum Posts entity, but this increases the complexity and execution time of your query. It's easier to add a trigger to the Forum Posts entity that writes the most recently added post into the Last Post attribute of the Forums entity. This greatly simplifies the query.
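A minimal sketch of the forum trigger, assuming SQLite via Python's built-in sqlite3 module; the table, column, and trigger names are hypothetical. The AFTER INSERT trigger copies the id of each new post into its forum's last_post_id column, so listing forums with their latest post no longer requires searching the posts table.

```python
import sqlite3


def build_forum_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE forums (
            forum_id     INTEGER PRIMARY KEY,
            title        TEXT NOT NULL,
            last_post_id INTEGER        -- maintained by the trigger below
        );
        CREATE TABLE forum_posts (
            post_id  INTEGER PRIMARY KEY,
            forum_id INTEGER NOT NULL REFERENCES forums(forum_id),
            body     TEXT NOT NULL
        );
        -- Runs automatically after every insert into forum_posts.
        CREATE TRIGGER forum_posts_set_last_post
        AFTER INSERT ON forum_posts
        BEGIN
            UPDATE forums SET last_post_id = NEW.post_id
            WHERE forum_id = NEW.forum_id;
        END;
    """)
    return conn
```

Storing last_post_id this way is a small, deliberate piece of redundancy that the trigger keeps consistent on insert.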

Business rules

Business rules define the restrictions placed on the data by the requirements of the business (the people you are creating the database for). Business rules may consist of a set of steps required to complete a particular task, or they may simply be checks that verify that the data entered is correct. Business rules may include data integrity rules. Unlike other rules, their main objective is to ensure that business operations are conducted correctly.
For example, in the Very Tough Guys company, it may be customary that only white, blue, and black cars are purchased for official use.
The business rule for the Vehicle Color attribute of the Company Vehicles entity would then be that the vehicle can only be white, blue, or black.
Most DBMSs provide the means to:

specify default values;
check data before it is entered into the database;
maintain relationships between tables;
ensure the uniqueness of values;
store procedures directly in the database.

All of these features can be used to implement business rules in a database.
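A minimal sketch of the vehicle color rule built from these DBMS features, assuming SQLite via Python's built-in sqlite3 module. The table and column names are hypothetical, and the default color 'white' is an illustrative assumption. A CHECK constraint limits the color to white, blue, or black, DEFAULT supplies a value when none is given, and UNIQUE keeps plate numbers from repeating.

```python
import sqlite3


def build_fleet_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE company_vehicles (
            vehicle_id INTEGER PRIMARY KEY,
            plate      TEXT NOT NULL UNIQUE,          -- uniqueness of values
            color      TEXT NOT NULL DEFAULT 'white'  -- default value (assumed)
                       CHECK (color IN ('white', 'blue', 'black'))  -- business rule
        )""")
    return conn
```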

Physical model

The next step, after creating the logical model, is to build the physical model. The physical model is the practical implementation of the database. The physical model defines all the objects that you have to implement.
When moving from the logical model to the physical one, entities are converted to tables and attributes to columns.
Relationships between entities can be converted to tables or kept as foreign keys.
Primary keys are converted to primary key constraints, and the remaining candidate keys to uniqueness constraints.

Denormalization

Denormalization is a deliberate change to the database structure that violates the rules of normal forms. It is usually done to improve database performance.
In theory, one should always strive for a fully normalized database, but in practice a fully normalized database often means lower performance. An over-normalized database can require accessing many tables every time data is retrieved. A common rule of thumb is that a query should involve no more than four tables.
Standard denormalization techniques are: combining several tables into one, storing the same attributes in several tables, and storing summary or calculated data in a table.
