Gestion et analyse des données — Tutorial 2

Normalization theory

In this tutorial you’ll learn:

How to obtain a non-redundant set of functional dependencies.
How to determine the candidate keys of a table given its functional dependencies.
How to determine the normal form of a table.

Question 1

We consider a relational table \(R\) with four columns \(A\), \(B\), \(C\) and \(D\). Let \(\mathcal{F}\) be the following set of functional dependencies:

\(AB \rightarrow C\)
\(D \rightarrow BC\)
\(A \rightarrow B\)

Exercise

Exercise 1 Derive a minimal set \(\mathcal{G}\) of functional dependencies that is equivalent to \(\mathcal{F}\).

Solution

We first need to write the functional dependencies in canonical form (the right-side of each FD must consist of only one column).

The FD (2.) can be rewritten using the decomposition axiom.

\(AB \rightarrow C\)
\(D \rightarrow B\)
\(D \rightarrow C\)
\(A \rightarrow B\)

Next, we need to make sure that all functional dependencies are left-irreducible.

All functional dependencies except the first one is trivially left-irreducible (the determinant consists of only one column). The first has two columns in the determinant, therefore we need to check whether we can eliminate one of the two columns and still preserve an equivalent set of functional dependencies.

We apply Armstrong’s axioms to compute the closure of this set of functional dependencies.

From (1.) and (4.), we can apply the pseudotransitivity axiom and we obtain:

\[A \rightarrow B \wedge AB \rightarrow C \implies AA \rightarrow C \implies A \rightarrow C\]

Since \(A \rightarrow C\) is in the cover, the column \(B\) in FD (1.) is useless and can therefore be omitted.

The set \(\mathcal{G}\) consists of the following FDs:

\(A \rightarrow C\)
\(D \rightarrow B\)
\(D \rightarrow C\)
\(A \rightarrow B\)

None of these functional dependencies are redundant.

Question 2

Let \(R\) be a relational table with five columns \((A, B, C, D, E)\). The following set \(\mathcal{F}\) of functional dependencies hold:

\(AB \rightarrow C\)
\(C \rightarrow A\)
\(C \rightarrow B\)
\(C \rightarrow D\)
\(D \rightarrow E\)

Exercise

Exercise 2 Specify the candidate keys of the table \(R\).

Solution

A candidate key is a set of columns that imply all the other columns.

First, let’s try sets composed of only one column: \(\{A\}\), \(\{B\}\), \(\{C\}\), \(\{D\}\) and \(\{E\}\).

We have the following: (\(\{X\}^+_{\mathcal{F}}\) indicates the set of all columns implied by \(X\)).

\(\{A\}^+_{\mathcal{F}} = \{A\}\)
\(\{B\}^+_{\mathcal{F}} = \{B\}\)
\(\{C\}^+_{\mathcal{F}} = \{A, B, C, D, E\}\)
\(\{D\}^+_{\mathcal{F}} = \{D, E\}\)
\(\{E\}^+_{\mathcal{F}} = \{E\}\)

Therefore, \(\{C\}\) is a candidate key because it implies all the other columns.

From the functional dependency (1), we obtain that \(AB\) implies \(C\); therefore, by transitivity they imply all the other columns.

In conclusion, we have two candidate keys: \(\{C\}\) and \(\{A, B\}\).

Exercise

Exercise 3 We assume that \(R\) is in 1NF.

Is table \(R\) in 2NF? Justify your answer.
Is table \(R\) in 3NF? Justify your answer.
Is table \(R\) in BCNF? Justify your answer.

Solution

\(R\) is in 2NF. In fact, all non-prime columns depend entirely on both candidate keys.
\(R\) is not in 3NF. In fact, the functional dependency \(D \rightarrow E\) is between two non-prime columns.
As a result, \(R\) is not in BCNF either.

Exercise

Exercise 4 If the table \(R\) from the previous exercise is not in BCNF, how would you change the schema so that BCNF is satisfied? For each table, specify the primary key and the foreign keys linking the tables to each other.

Solution

The table is not in 3NF. The offending functional dependency is \(D \rightarrow E\).

We need to create a new table \(R_1\), where the primary key is the determinant in the offending functional dependency (\(D\)). We then move all columns that are dependent on \(D\) (only \(E\) in our case) from \(R\) to \(R_1\). Note that \(D\) is kept in \(R\) so that we can use it to link \(R\) with \(R_1\).

In summary:

\[R = \{A, B, C, D\}\]

\[R_1 = \{D, E\}\]

The primary key of \(R\) is \(\{C\}\) (or, we can also choose \(\{A, B\}\) if we want). The primary key of \(R_1\) is \(\{D\}\).

The column \(D\) in \(R\) is a foreign key referencing the column \(D\) in \(R_1\).

Question 3

Let \(R\) be a relational table with four columns \((A, B, C, D)\). \(\{A, B\}\) is the only candidate key.

Exercise

Exercise 5 Identify a minimal set of functional dependencies. Justify your answer.

Solution

A candidate key implies all the other columns of the table. Therefore we have :

\[AB \rightarrow C\] \[AB \rightarrow D\]

Both functional dependencies are in canonical form (the right side only consist of one column). Trivially, both functional dependencies are left-irreducible (we cannot remove any of the columns in the determinant without losing information). So, this is a minimal set of functional dependencies.

Exercise

Exercise 6 Add \(B \rightarrow D\) to the set of functional dependencies that you identified in the previous exercise. Modify the minimal set of functional dependencies accordingly. Justify your answer.

Solution

We consider the following functional dependencies:

\(AB \rightarrow C\)
\(AB \rightarrow D\)
\(B \rightarrow D\)

The FD (3) clearly makes FD (2) redundant. By using the augmentation axiom on (3) we obtain:

\(AB \rightarrow AD\)

By applying the decomposition axiom on (4) we obtain:

\(AB \rightarrow A\)
\(AB \rightarrow D\)

Which shows that (2) is redundant.

In summary, we have the following functional dependencies:

\[AB\rightarrow C \] \[B\rightarrow D \]

Exercise

Exercise 7 We assume that \(R\) is in 1NF.

Is table \(R\) in 2NF? Justify your answer.
Is table \(R\) in 3NF? Justify your answer.
Is table \(R\) in BCNF? Justify your answer.

Solution

\(R\) is not in 2NF. In fact, the non-prime column \(C\) is functionally dependent on only one part of the candidate key.
\(R\) is not in 3NF, let alone BCNF, because it doesn’t fulfill the first condition, that is being in 2NF.

Exercise

Exercise 8 Normalize \(R\) to the BCNF form. Justify your choices.

Solution

The functional dependency that gives a partial dependency is: \(B \rightarrow D\). We create a new table \(R_1\), where the determinant (\(B\)) of the offending functional dependency is the primary key. We move the columns that depend on \(B\) (here, only \(D\)) from table \(R\) to table \(R_1\).

In summary:

\[R = (A, B, C)\]

\[R_1 = (B, D)\]

We can see that \(R\) is in 3NF because:

The only non-prime column is \(C\), therefore there cannot be any dependency between non-prime columns.

\(R\) is in BCNF because:

The only functional dependency \(AB \rightarrow C\) has a key as its determinant.

Similarly, for \(R_1\).

Question 4

We consider the following table:

Patient (ssn, first_name, last_name, phone_number, insurance_number, 
         insurance_expiration_date)

where the following set \(\mathcal{F}\) of functional dependencies holds:

ssn -> first_name, last_name, phone_number, insurance_number, 
       insurance_expiration_date 

insurance_number -> insurance_expiration_date

Exercise

Exercise 9 Derive a minimal set \(\mathcal{G}\) of functional dependencies that is equivalent to \(\mathcal{F}\).

Solution

First, we need to rewrite the FDs in canonical form.

\(ssn \rightarrow first\_name\)
\(ssn \rightarrow last\_name\)
\(ssn \rightarrow phone\_number\)
\(ssn \rightarrow insurance\_number\)
\(ssn \rightarrow insurance\_expiration\_date\)
\(insurance\_number \rightarrow insurance\_expiration\_date\)

The determinant of each FD is composed of only one column, therefore it is already left-irreducible. It is easy to see that all FDs with ssn as a determinant must be kept (otherwise we lose some information).

By transitivity from 4. and 6. we obtain:

\[ssn \rightarrow insurance\_expiration\_date\]

Therefore, \(\mathcal{G}\) is obtained from \(\mathcal{F}\) by removing 5.

Exercise

Exercise 10 Given \(\mathcal{G}\), identify the candidate keys in the table Patient.

Solution

From the functional dependencies in \(\mathcal{F}\), it’s easy to see that that the only column that implies all the others is ssn. Therefore, {ssn} is the only candidate key in this table.

Exercise

Exercise 11 Specify the normal form of the table Patient. Justify your answer.

Solution

It is immediate to verify that the table is 1NF.
The table is 2NF because there is only one candidate key, which is composed of only one column. Therefore, there cannot be any partial dependency.
The table is not in 3NF. Indeed, there is a functional dependency between two non-prime columns:

\[insurance\_number \rightarrow insurance\_expiration\_date\]

Exercise

Exercise 12 How would you normalize table Patient to BCNF? Justify your answer.

Solution

The offending functional dependency is:

\[insurance\_number \rightarrow insurance\_expiration\_date\]

We need to create another table (let’s call it Insurance) that contains the determinant of the offending functional dependency (insurance_number) as its primary key. We need to move the colum (insurance_expiration_date) that depends on insurance_number from table Patient to table Insurance.

In summary:

Patient   (ssn, first_name, last_name, phone_number, insurance_number)
Insurance (insurance_number, insurance_expiration_date)

Note that the column insurance_number in table Patient is foreign key to the column insurance_number in table Insurance.