Introduction to databases — Tutorial 2

Normalization

Question 1

Let \(R\) be a relational table with five columns \((A, B, C, D, E)\). The following set of functional dependencies hold:

\(AB \rightarrow C\)
\(C \rightarrow A\)
\(C \rightarrow B\)
\(C \rightarrow D\)
\(D \rightarrow E\)

Exercise

Exercise 1 Find all the candidate keys of table \(R\).

Solution

A candidate key is a set of columns on which all the other columns are functionally dependent.

From the functional dependencies 2 to 5, we see that all columns are functionally dependent on C.
From the functional dependency 1, we see that C is functionally dependent on {A, B}.
By transitivity, all columns are also functionally dependent on {A, B}.

In conclusion, we have two candidate keys: {A, B} and {C}.

Exercise

Exercise 2 We assume that \(R\) is in 1NF.

Is table \(R\) in 2NF? Justify your answer.
Is table \(R\) in 3NF? Justify your answer.

Solution

Based on the answer to the first exercise, we have that:

A, B and C are prime columns.
D, E are non-prime columns.

We conclude that:

R is in 2NF. In fact, all non-prime columns depend entirely on both candidate keys.
R is not in 3NF. In fact, the functional dependency 5. is between two non-prime columns.

Exercise

Exercise 3 If the table \(R\) from the previous exercise is not in 3NF, what changes would you make to turn \(R\) into a 3NF table?

Important. For each table that you create, specify the primary key and the foreign keys.

Solution

The table is not in 3NF. The functional dependency that violates the 3NF condition is the 5.

We need to create a new table R1, where:

the primary key is the determinant in the functional dependency that violates 3NF (D).

We then move all columns that are dependent on D (only E in our case) from R to R1.

Note that D is kept in R so that we can use it to link R with R1.

In summary:

\[R = \{A, B, C, D\}\]

\[R_1 = \{D, E\}\]

The primary key of R is {C} (we can also choose A, B if we want).
The primary key of R1 is {D}.

The column D in R is a foreign key referencing the column D in R1.

Question 2

Each table has its own set of functional dependencies. For the sake of simplicity, we should always identify a minimal set of functional dependencies.

A set of functional dependencies is minimal if it complies with the following properties:

Every functional dependency in the set is in canonical form: the right side consists of only one column.
Every functional dependency in the set is left-irreducible: it is impossible to remove any of the columns on the left side without losing any information.
Every functional dependency in the set is non-redundant. We should not be able to derive one functional dependency in the set from another.

Formally, a functional dependency is redundant if we can derive it by successive applications of the Armstrong’s axioms on the other functional dependencies. Armstrong’s axioms are detailed here.

Exercise

Exercise 4 Let \(R\) be a relational table with four columns \((A, B, C, D)\). \(\{A, B\}\) is the only candidate key of \(R\).

Identify a minimal set of functional dependencies on \(R\).

Justify your answer.

Solution

From the definition of candidate key, we derive two functional dependencies:

\[AB \rightarrow C\] \[AB \rightarrow D\]

Both functional dependencies are in canonical form (the right-side only consist of one column). Trivially, both functional dependencies are left-irreducible (we cannot remove any of the columns in the determinant without losing information).

Trivially, there is no redundancy in this set.

Exercise

Exercise 5 Would the set obtained in the previous exercise still be minimal, if we added the functional dependency \(B \rightarrow D\)?

Justify your answer and, in case of a negative answer, specify the minimal set of functional dependencies.

Solution

We consider the following functional dependencies:

\[AB \rightarrow C\quad (1)\] \[AB \rightarrow D\quad (2)\] \[B \rightarrow D\quad (3)\]

Intuitively, if D depends on B (and B alone), the fact that we add A, or any other column, to the left side, does not change this fact. Therefore, the functional dependency (2) is redundant.

Let’s prove it with the Armstrong’s axioms.

By using the augmentation axiom on (3) we obtain:

\[AB \rightarrow AD\quad (4)\]

By applying the decomposition axiom on (4) we obtain:

\[AB \rightarrow A\]

\[AB \rightarrow D\]

Since we managed to derive (2) from (3), we formally proved that (2) is redundant.

In summary, a minimal set of functional dependencies include the following functional dependencies:

\[AB\rightarrow C \] \[B\rightarrow D \]

Exercise

Exercise 6 We assume that \(R\) is in 1NF.

Is table \(R\) in 2NF? Justify your answer.
Is table \(R\) in 3NF? Justify your answer.

Solution

From the statement of the exercise we know that {A, B} is the only candidate key. Therefore:

A and B are prime columns.
C and D are non-prime columns.

Given the minimal set of functional dependencies obtained from the previous exercise, we conclude that:

R is not in 2NF. In fact, the non-prime column D is functionally dependent on only one part of the candidate key (the column B).
R is not in 3NF, because it is not in 2NF.

Exercise

Exercise 7 Normalize \(R\) to the 3NF form.

Important. For each table that you create, specify the primary and foreign keys.

Justify your choices.

Solution

The functional dependency that violates the 2NF is:

\[B \rightarrow D\].

We create a new table R1, where the determinant (B) of this functional dependency is the primary key.
We move the columns that depend on B (here, only D) from table R to table R1.

In summary, we have the following tables:

\[R = (A, B, C)\]

\[R_1 = (B, D)\]

{A, B} is the primary key of R.
{B} is the primary key of R1.
Column B in R is a foreign key referencing column B in table R1.

We can see that R is in 3NF because:

The only non-prime column is now C, therefore there is no partial dependency on the candidate key and there cannot be any dependency between non-prime columns.

Similar considerations apply to R1.

Question 3

Let \(R\) be a relational table with five columns \(A\), \(B\), \(C\), \(D\), \(E\). The following set of functional dependencies is given:

\(A \rightarrow B\)
\(A \rightarrow C\)
\(B \rightarrow C\)
\(D \rightarrow E\)

Exercise

Exercise 8 Is the given set of functional dependencies minimal? If not, which functional dependencies must be removed to obtain a minimal set?

Justify your answer.

Solution

The given set is not minimal, because the functional dependency 2. can be obtained by transitivity from 1. and 3. Therefore, we need to remove 2. to obtain a minimal set of functional dependencies.

Exercise

Exercise 9 Given the minimal set of functional dependencies obtained from the previous exercise, determine the candidate keys of \(R\).

Justify your answer.

Solution

Let’s recall our minimal set of functional dependencies:

\[A \rightarrow B\quad (1)\] \[B \rightarrow C\quad (2)\] \[D \rightarrow E\quad (3)\]

There aren’t any columns that are functionally dependent on C and E. Therefore, these two columns cannot be part of any candidate key (they are non-prime columns).
E functionally depends on only D; therefore, D must be part of the candidate key.
However, D alone does not imply all the other columns. We need to add either A or B or both.
It is easy to see that A cannot be functionally dependent on B. Therefore, {B, D} will never imply A and therefore this set cannot be a candidate key.
The only options left are : {A, D} or {A, B, D}.
Let’s consider {A, D}.
By applying the Armstrong’s composition axiom on (1) and (3), we get:

\[AD \rightarrow BE\quad (4)\]

By applying the Armstrong’s decomposition axiom on (4) we obtain:

\[AD \rightarrow B\quad (5)\]

\[AD \rightarrow E\quad (6)\]

By transitivity from (5) and (2) we obtain:

\[AD \rightarrow C\quad (7)\]

By applying Armostrong’s union axiom on (4) and (7), we obtain:

\[AD \rightarrow BCE\quad (8)\]

We can conclude that {A, D} is a candidate key. {A,B,D} is also a key, but since it contains {A, D} it is not a candidate key.

Exercise

Exercise 10 Given the candidate keys that you found in the previous exercise, which columns are prime? Which ones are non-prime?

Solution

Prime columns: A, D.
Non-prime columns: B, C, E

Exercise

Exercise 11 We assume that R is in 1NF. Which normal form is R in?

Justify your answer.

Solution

R is in only in 1NF.

It cannot be in 2NF because:

The non-prime column B has a partial dependency on the candidate key (because of functional dependency 1.).
The non-prime column E has a partial dependency on the candidate key (because of functional dependency 4.).

Note that, even if we consider the minimal set of functional dependencies, C is still dependent on A (by transitivity).

Therefore, we have three functional dependencies that violate the 2NF.

Exercise

Exercise 12 How would you turn R into 3NF?

Justify your answer.

Specify primary and foreign keys of all tables that you create.

Solution

Both prime columns A and D participate in at least one functional dependency that violates the 2NF. Therefore, we need to create two more tables R1 and R2.

R1 will have A as its primary key.
R2 will have D as its primary key.
Columns B and C are moved from R to R1 (because they depend on A); columns E is moved from R to R2 (because it depends on D).

In summary, we have the following tables:

\[R(A, D)\]

\[R1(A, B, C)\]

\[R2(D, E)\]

Note that the primary key of R is {A, D}.
Column A in table R is a foreign key referencing column A in table R1.
Column D in table R is a foreign key referencing column D in table R2.

Let’s verify whether these three tables are in 3NF:

Table R is trivially in 3NF: in fact, there are no non-prime columns.
Table R1 is not 3NF: in fact, C depends on B and they are both non-prime columns.
Table R2 is trivially in 3NF. In fact, the only candidate key is composed of just one column (therefore, partial dependencies are not possible) ; also, there is only one non-prime column, so dependencies between non-prime columns are not possible.

We still have to turn R1 to 3NF. We repeat the same logic.

We create a table R3, where B is the primary key. We move C from R1 to R3.
Column B in R1 is a foreign key referencing column B in R3.