Introduction to Databases — Final project

Assignment


Introduction

The goal of this project is to assess your knowledge of the main notions presented in classroom.

The project must be implemented by students working in groups.

Creating the groups

Create the groups on Edunao

This project consists in designing a relational database for the given application context, importing the data of a given dataset into a PostgreSQL database and querying the data in SQL.

The deadline to submit the work is Friday November 24th, 11:59 PM.

Submit your project

The submission site is available at this link.

Please note that:

  • You must be member of a group to submit your work.
  • Only one submission per group is possible. Two members of the same group cannot submit two different versions.

You must submit three files, please follow these instructions:

  • A report in PDF containing the answers to all the questions and exercises, except the SQL queries (max 10 pages). Please name the file groupXX_report.pdf, where XX is your group number.

  • A file containing the PostgreSQL database that you generated. Please name the file groupXX_db.sql. To generate this file, please watch this video.

  • A file with all the SQL queries. Please name the file groupXX_queries.sql. Separate each query in the file by a blank line.

Example:

SELECT *
FROM Airport 

SELECT *
FROM Hotel 

Each group must submit an original work. Edunao is equipped with an anti-plagiarism tool that is able to check the originality of each submission. Submissions that are flagged as too similar by the tool are likely to be discarded and will not be evaluated.

Application context

We intend to manage the data of a travel reservation system with clients all over the world. Upon registration, customers are automatically given a numeric identifier and they are asked to indicate their first and family names, their gender, date of birth, a phone number, an email address and their country of residence.

Any customer can book a trip that includes the reservation of one or more flights and, possibly, one or more hotels.

Example

Alice wants to fly from Paris, France to New York City (NYC), USA and she intends to stay in NYC for 10 days. Her trip includes two flights: an outbound flight from Paris to NYC and an inbound flight from NYC to Paris; and an hotel in NYC.

A flight is operated by an airline company, of which the system keeps its name (e.g., British Airways), the country where the airline is incorporated and, when available, its IATA code (e.g., BA, a two-letter code identifying the airline), its ICAO code (e.g., BAW, a three-letter code identifying the airline) and alternate name or alias (e.g., British).

A flight connects two airports, each having a name (e.g., London Heathrow Airport), and, possibly, a IATA (e.g., LHR) and ICAO code (e.g., EGLL); an airport serves a specific location (e.g., London, UK) and its precise position is given by its geographic coordinates (latitude and longitude).

A flight connecting two airports at specific departure and arrival times is identified by a flight number. Two flights operated by two different airline companies cannot have the same flight number, but the same flight number can denote two flights operated by the same airline company on different days.

Example

Emirates flight EK074 leaves Paris, France at 10 AM and arrives at Dubai, UAE at 7:40 PM (regardless of the departure day).

For each flight booked by a customer, the system keeps the seat number, the travel class (e.g., economy or business), the price and the date of the flight. Usually, airlines include details on the type of aircraft they plan to use on their flight schedules; these details include the name of the aircraft (e.g., Boeing 787-8) and, when available, the IATA code (e.g., 788, a unique three-letter identifier for the aircraft) and the ICAO code (e.g., B788, a unique four-letter identifier for the aircraft).

The system maintains a list of hotels, with their names, addresses and an average review score, which is a real number denoting the average grade assigned to the hotel by its customers. Customers can write a review for an hotel; in which case the system stores the text of the review, the date and its author. When a customer books an hotel, the system keeps the price paid, the check-in and check-out dates and whether the breakfast is included.

The dataset

You can download the dataset by clicking here. The dataset consists of 7 CSV files: aircrafts.csv, airlines.csv, airports.csv, hotels.csv, customers.csv, hotel_bookings.csv, flight_bookings.csv. The separator character in each file is the tab character (‘’).

Notice

Take some time to look at the content of the files to understand the structure that your tables will have.

You can open CSV files in Excel.

Creation of a relational database

You’ll now proceed to the definition of a relational database for our travel reservation system. First, you need to define the conceptual schema and then you’ll define the tables that compose the database.

The conceptual schema

Before defining the logical schema of the database, answer the following questions:

  1. Can you use the name of the hotel as a primary key? Justify your answer.

  2. Can you use the flight number as a primary key to identify a flight? Justify your answer and, in case of a negative answer, propose a solution.

  3. Knowing that it is unlikely that two reviews have the same textual content, would you use it as a primary key? Justify your answer.

  4. Knowing that the IATA code uniquely identifies an airport, would you choose it as a primary key for the entity Airport? Justify your answer.

Exercise

Exercise 1 Propose an Entity-Relationship diagram describing the conceptual model of a relational database for the given application context.

  • Specify all the attributes for each entity and relation.

  • For each entity, underline the attributes composing the primary key.

  • For each relation, clearly indicate the minimum and maximum cardinality.

Normalization

Exercise

Exercise 2 Translate the conceptual schema into a collection of tables. For each table:

  • Indicate its name and the names of the columns (but not their types).

  • Underline the columns that are part of the primary key.

  • Indicate the entity in the ER diagram to which the table corresponds.

Which normal form are your tables in?

Exercise

Exercise 3 For each table:

  • Indicate a minimal set of functional dependencies.

  • Indicate the normal form that the table satisfies. Justify your answer.

Normalize your tables.

Exercise

Exercise 4 Normalize each table up to the 3rd normal form (3NF).

The physical schema

You can now define the physical schema of your database.

Exercise

Exercise 5 Write the SQL code to create the tables. For each table:

  • Indicate the primary key.

  • Indicate the foreign keys.

  • Indicate NOT NULL and UNIQUE constraints, if needed.

To make sure you choose the right data types for the columns, you can also check the values in the dataset.

Database creation and data import

If you followed the normalization steps correctly, it is unlikely that you can import the data directly from the given CSV files.

You need a CSV file per table to import the data into your database.

Creating the CSV files

Use Excel to rearrange the data from the input CSV files, so as to create one CSV per table.

Make sure you save the files as CSV with UTF-8 encoding.

Watch this video to learn how to create the database in PostgreSQL and import the data.

Exercise

Exercise 6 Create a PostgreSQL database with the tables that you defined in the previous exercise. Import the data into the tables.

Running queries

Write the following queries in SQL.

Exercise

Exercise 7

  1. Get the average ticket price on Air France flights.

  2. Count the number of customers by country.

  3. Select the names of the airports in Paris, France.

  4. Select the name of the cities (city and country name) with the most airports.

  5. Select the name of the airline companies that use an Airbus A300.

  6. Select the identifier, first and family names of all customers who flew to Sydney, Australia.

  7. Select the identifier, name, city and country of the busiest airport (the one with more outbound and inbound flights).

  8. Select the average price in the economy class.

  9. Select the average price in the business class.

  10. Select the name, city and country of the destination airport of french customers.

  11. Select the destination cities (specify city and country name) preferred by women.

  12. Select the destination cities (specify city and country name) preferred by men.

  13. Count the number of customers by country flying to Paris.

  14. Count the number of hotels by city.

  15. Determine the amount of money spent by Tatiana REZE in flights.