TSO-ISPF JCL COBOL VSAM DB2 CICS Tools Articles Job Portal Forum Quiz Interview Q&A

DB2 Normalization

Normalization is a design approach that minimizes data redundancy and optimizes data structures by systematically and properly placing data elements into the appropriate groupings. Normalization is the process of efficiently organizing data in a database and is the process of decomposing large tables into smaller tables.

Goals of Normalization:

There are two goals of the normalization process:

  • Eliminating redundant data (for example, storing the same data in more than one table).

  • Ensuring data dependencies make sense (only storing related data in a table).

Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored and avoid problems with inserting, updating, or deleting data.

Normal Form:

A series of guidelines were developed by the database community for ensuring that the databases are normalized. Those guidelines are represented by different normal forms.

Types of Normal forms are as follows:

  1. First Normal Form or 1NF

  2. Second normal Form or 2NF

  3. Third Normal Form or 3NF

  4. Fourth Normal Form or 4NF

  5. Fifth Normal Form or 5NF

In practical applications, in general, you use only 1NF, 2NF, and 3NF.

First Normal Form:

  • All entities must have a unique identifier or key, that can be composed of one or more attributes.
  • Eliminate repeating groups and non-atomic data from an entity.

The term atomic derives from atom, the smallest indivisible particle that can exist on its own.

First normal form eliminates repeating groups and non-atomic data from an entity.

To normalize a data model into 1NF, eliminate repeating groups into individual entities. In other words, do not use multiple attributes in a single entity to store similar data. Consider the sample data shown in table for a STUDENT information system for a college or university.

This data contains several violations of 1NF. First, you are tracking courses that really represent a repeating group for STUDENTs. So, the course information should be moved into separate entities. Furthermore, you need to specify identifiers for both entities. The identifier is the primary key for the entity

A second violation of 1NF is the non-atomic data shown in the StudentName attribute. A student name can be broken down into pieces: first name, middle initial and last name. It is not indivisible, and therefore violates first normal form.

Unnormalized STUDENT Data

2907Smith, Jacob RMATMathematicsMAT0011
Discrete Math
Calculus I
English Classics I
4019Patterson, Jane KPHIPhilosophyPHI0010
Intro to Philosophy
Programming Languages
5145Neeld, Norris BEGLEnglish LiteratureSOC0102Ascent of Man8/1/2002
6132Morrison, Xavier QMUSMusicMUS0002
Origin of Jazz
Ascent of Man
7810Brown, Richard ECSComputer Science
8966Juarez, SamanthaEGLEnglish LiteratureEGL0010
English Classics I
Shakespeare II

STUDENT Entity in 1NF

5145NeeldNorrisBEGLEnglish Literature
7810BrownRichardECSComputer Science
8966JuarezSamanthaEGLEnglish Literature

COURSE Entity in 1NF

2907MAT0011Discrete Math8/1/2002
2907MAT0027Calculus I4/30/2002
2907EGL0010English Classics I12/30/2001
4019PHI0010Intro to Philosophy4/30/2002
4019CS00100Programming Languages4/30/2002
5145SOC0102Ascent of Man8/1/2002
6132MUS0002Origin of Jazz4/30/2002
6132SOC0102Ascent of Man8/1/2002
8966EGL0010English Classics I12/30/2001
8966EGL0101Shakespeare II8/1/2002

Second Normal Form:

  • Should be in First Normal Form
  • Every non-key attribute is fully dependent on the key

Second normal form (2NF) ensures that all the attributes of each entity are dependent on the primary key.

Notice that certain courses repeat in the COURSE entity, namely "English Classics I" and "Ascent of Man." This situation indicates a violation of 2NF. To correct the problem, we need to identify the attributes that do not depend on the entire key and remove them. The removed attributes, along with the portion of the primary key on which they depend, are placed in a new entity, ENROLLMENT. The entire primary key of the original entity remains with the original entity.

Another benefit of the normalization process is that you will frequently encounter new attributes that need to be specified for the new entities that are created. For example, perhaps the new COURSE entity causes us to remember that each course is assigned a number of credits that count toward graduation. No changes were required for the STUDENT entity:



COURSE Entity in 2NF

MAT0011Discrete Math3
MAT0027Calculus I4
EGL0010English Classics I3
PHI0010Intro to Philosophy3
CS00100Programming Languages3
SOC0102Ascent of Man3
MUS0002Origin of Jazz3

Third Normal Form:

  • Should be in Second Normal Form
  • Every non-key attribute is non-transitively dependent on the primary key that is, every attribute in the entity should depend only on the key not on any other non-key attributes.

A rule of thumb for identifying 3NF violations is to look for groups of attributes whose values can apply to more than a single entity occurrence. When you discover such attributes, move them to a separate entity.

It is time to review our STUDENT information again, this time looking for 3NF violations. Examine the STUDENT data in closely. Notice that students can have the same major and, as such, certain major information can be repeated, specifically two students in our small sample are English Literature majors. To correct the problem, we need to remove major attributes that transitively depend on the key and create a new entity for them.

STUDENT Entity in 3NF


MAJOR Entity in 3NF

EGLEnglish Literature
CSComputer Science

A Normalized Data Model:

To be complete, a diagram should be developed for the 3NF data model we just created for the STUDENT data. Figure shows such a data model. Notice that we have not filled in the optionality of the relationships. We could do this based on the sample data we used, but we really need to ask more questions before we can answer questions such as Does a every student have to have a major? The current data shows this to be the case, but in reality; you know that most freshmen, and even upperclassmen, may attend college without having a formally declared major.

Student Data Model

image missing

Further Normal Forms:

Normalization does not stop with 3NF. Additional normal forms have been identified and documented. However, normalization past 3NF does not occur often in normal practice. The following are additional normal forms. Just for your information we’ve kept this.

Boyce Codd normal form (BCNF) is a further refinement of 3NF. Indeed, in his later writings Codd refers to BCNF as 3NF. A row is in Boyce Codd normal form if and only if every determinant is a candidate key. Most entities in 3NF are already in BCNF.

Fourth normal form (4NF) states that no entity can have more than a single one-to-many relationship if the one-to-many attributes are independent of each other. An entity is in 4NF if and only if it is in 3NF and has no multiple sets of multivalued dependencies.

Fifth normal form (5NF) specifies that every join dependency for the entity must be a consequence of its candidate keys.

image missing

If you have any doubts or queries related to this chapter, get them clarified from our Mainframe experts on ibmmainframer Community!

Are you looking for Job Change? Job Portal