Gestion et analyse des données — Tutorial 3

SQL


In this tutorial you’ll learn:

  • The key integrity constraints in a relational database.
  • Basic SQL queries.
  • Advanced SQL queries.

Installing PostgreSQL

In this tutorial you’ll be using PostgreSQL as a relational database management system (RDBMS).

Instructions for Windows users

You can download the latest version of PostgreSQL (14.3) from this page.

The installer contains:

  • The PostgreSQL server.
  • pgAdmin, a graphical administration tool.

You’ll find detailed installation instructions on this page.

Instructions for macOS users

The best way to get PostgreSQL is to download and install Postgres.app. You’ll find more details on this page.

In addition, you need to install pgAdmin, ad administration tool that lets you interact with your database through a graphical interface.

Starting PostgreSQL and pgAdmin

Instructions for Windows users

Once the installation is completed, a PostgreSQL server should be automatically started.

You’ll open a connection to the server through the following steps:

  • Execute pgAdmin.

  • On the left-side menu, right-click on Servers and select Create server.

  • In the General tab, give the server a name of your choice.

  • In the Connection tab, write localhost as host. Don’t change the values of the other options.

What if I get the Unable to connect to server error?

You might want to have a look at the list of the services running on your computer to verify whether the PostgreSQL server is actually running.

This page explains you how to open the panel with the list of services. You should locate PostgreSQL in this panel. If the service is not running, you’ll need to manually start it.

Instructions for MacOS users

You’ll open a connection to the server through the following steps:

  • Execute pgAdmin.

  • On the left-side menu, right-click on Servers and select Create server.

  • In the General tab, give the server a name of your choice.

  • In the Connection tab, write localhost as host. Don’t change the values of the other options.

Obtain the data

We consider the database of a DVD rental store chain containing data on films, actors, customers and the transactions of the store.

This database goes by the name Sakila
and was developed by Mike Hillyer, a former member of the MySQL team.

The conceptual schema of the Sakila database

Figure 1: The conceptual schema of the Sakila database

The tables of the database are documented on this page

Sakila has been ported from MySQL to PostgreSQL under the name pagila.

In order to import the data into PostgreSQL, follow the steps below:

  • From the pgAdmin interface, create a new database (right-click on Databases, select Create database). You can call the new database pagila (or any name you want).

  • Download the dump file pagila-insert-data.sql This file contains all the SQL statements necessary to create the tables of the database and populate them with some data.

  • Right-click on the new database, select Query tool.

  • Click on the second (from the left) small icon on the top menu of the query tool to open a new file.

  • Through the interface, locate pagila-schema.sql and open it.

  • Hit on the F5 button or click on the play icon on the top menu of the query tool to execute the script.

The previous steps should have created the schema of the database
and inserted the data.

You can see the list of tables by selecting (on the left menu) Schemas, public, Tables (15 tables should appear).

Open a new query tool to do the exercises.

Integrity constraints

Exercise

Exercise 1 Execute the following statement:

insert into film_actor values(1, 25)

What is this statement supposed to do? What is the reaction of the DBMS?

Solution

This statement should insert a new row in table film_actor, where the value of actor_id is 1 and the value of film_id is 25.

The DBMS returns an error because there is already a row with these values and the two columns film_actor, film_id form the primary key.

Exercise

Exercise 2 Write the statement to delete the film with film_id=1 from the table Film.

Execute the command. What happens?

Solution

delete from film where film_id=1

The statement is rejected because there are rows in other tables that reference the row that we intend to delete. This is the effect of the foreign key constraint.

Exercise

Exercise 3 Look at the definition of the foreign key constraints in table film_actor (right-click on the table, select Constraints, foreign key).

Is the definition of the foreign key constraint to table film coherent with the behavior observed in the previous exercise?

NB In order to see the options of a foreign key, click on the edit button on the left of the constraint. Then look at the tab Action.

Solution

The foreign key linking table film_actor to table film is defined with the option RESTRICT on delete. This is coherent with the behavior that we observed in the previous exercise. Referenced rows cannot be deleted if there are still referencing rows.

Exercise

Exercise 4 Execute the following query:

SELECT f.film_id as film_id_in_table_film, 
       fa.film_id AS film_id_in_table_film_actor, 
       fa.actor_id as actor_id, f.title as title
FROM film f JOIN film_actor fa ON f.film_id = fa.film_id
WHERE  f.title='ACE GOLDFINGER'

What does the query? Note down the identifier of the film in both tables film and film_actor. (columns film_id_in_table_film and film_id_in_table_film_actor).

Solution

The query returns the list of all actors in the film titled Ace Goldfinger. We note that the identifier of the film in both tables is identical (2), as it should because the query joins the two tables on the equality of these two values.

Exercise

Exercise 5 Write and execute a statement to set the value 10000 to the identifier of the film ACE GOLDFINGER in table Film.

After the modification, execute the query of the previous exercise. What changed? Explain in details what happened.

Solution

The statement to modify the film_id of the given film is as follows:

UPDATE film SET film_id=10000 WHERE title=‘ACE GOLDFINGER’

After executing the same query as the previous exercise, we see that the identifier of the film has changed in the table film_actor too. This is expected, because the foreign key constraint between the colum film_id in table film_actor and the column film_id in table film has the option ON UPDATE CASCADE. This means that if we modify the identifier of the film in table film, the modification is propagated to all the referencing columns.

Exercise

Exercise 6 Execute the following statement:

UPDATE film_actor
SET film_id=2
WHERE film_id=10000

What does? What happens? Explain.

Solution

The statement intends to set the identifier of the film titled Ace Goldfinger (in the previous exercise we gave it the identifier 10000) back to its original value. However, we execute the statement on the table film_actor. The action is not allowed, as the identifier 2 does not correspond to any film in table film.

The foreign key enforces the referential integrity constraint. A row cannot refer to a non-existing entity in the referenced table.

Basic queries

Exercise

Exercise 7 Write the following SQL queries:

  1. Return all the information on all customers.

  2. Return the first and last name of all customers.

  3. Return the first and last name of all customers of the store with identifier 1.

Solution

  1. select * 
    from customer
    
  2. select first_name, last_name 
    from customer
    
  3. select first_name, last_name 
    from customer
    where store_id=1
    

Sorting and paginating

Exercise

Exercise 8 Write the following SQL queries:

  1. Return the last and first name of all customers. Sort by last name in ascending order.

  2. Same as in 1., but only return the first 100 customers.

  3. Return the last and first name of all customers of the store with identifier 1. Sort by last name in ascending order and by first name in descending order.

Solution

  1. select last_name, first_name 
    from customer
    order by last_name asc
    
  2. select last_name, first_name 
    from customer
    order by last_name asc
    limit 100
    
  3. select first_name, last_name 
    from customer
    where store_id=1
    

Aggregating queries

Exercise

Exercise 9 Write the following SQL queries:

  1. Count the number of films in the database (expected result: 1000).

  2. How many distinct actor last names are there?

  3. Compute the total amount of payments across all rentals (expected result: 67416.51).

  4. Compute the average, minimum and maximum duration of rental across all films (expected result: 4.9850000000000000, 3, 7).

  5. Return the number of actors for each film.

  6. Return the number of copies of each film in each store (table inventory).

  7. Same as 6., but only returns the pairs (film, store) if the number of copies is greater than or equal to 3.

Solution

  1. select count(*) 
    from film
    
  2. select  count (distinct last_name) 
    from actor
    
  3. select sum(amount) 
    from payment
    
  4. select avg(rental_duration), min(rental_duration), max(rental_duration) 
    from film
    
  5. select film_id, count(*) as nb_actors
    from film_actor
    group by film_id
    
  6. select film_id, store_id, count(*) as nb_films
    from inventory
    group by film_id, store_id
    
  7. select film_id, store_id, count(*) as nb_films
    from inventory
    group by film_id, store_id
    having count(*) >=3
    

Join queries

Exercise

Exercise 10 Write the following SQL queries:

  1. Return the first and last name of the manager of the store with identifier 1 (expected result: Mike Hillyer).

  2. Return the first and last name of the actors in the film ACE GOLDFINGER.

  3. Return first and last name of each actor and the number of films in which they played a role.

  4. Same as in 3., but order by number of films in descending order.

  5. Same as in 4., but only return actors who played a role in at least 10 films.

  6. Return the identifier, the first and family name of the customers who have rented between 10 and 20 movies in the category Family.

Solution

  1. select first_name, last_name
    from staff join store using(store_id)
    where store_id=1
    
  2. select first_name, last_name
    from film join film_actor using(film_id) join actor using(actor_id)
    where title='ACE GOLDFINGER'
    
  3. select first_name, last_name, count(*) as nb_films
    from actor join film_actor using(actor_id)
    group by actor_id
    
  4. select first_name, last_name, count(*) as nb_films
    from actor join film_actor using(actor_id)
    group by actor_id
    order by nb_films desc
    
  5. select first_name, last_name, count(*) as nb_films
    from actor join film_actor using(actor_id)
    group by actor_id
    having count(*) >= 10
    order by nb_films desc
    
  6. select cust.customer_id, first_name, last_name, count(*) as nb_films
    from customer cust join rental using(customer_id) 
      join inventory using(inventory_id)
        join film_category using(film_id)
        join category cat using(category_id)
    where cat.name='Family'
    group by customer_id
    having count(*) between 5 and 10
    

Miscellaneous queries

Exercise

Exercise 11 Write the following SQL queries:

  1. Which last names are not repeated in table actor?

  2. Is a copy of the movie ACADEMY DINOSAUR available for rent from store 1?

  3. Return the title and the release year of all films in one of the following categories: Family, Animation, Documentary.

Tip You can use the operator IN
  1. Find all customers (id, last name, first name) whose last name starts with the letter L.
    Tip You can use the operator LIKE
  2. Return the total paid by each customer. For each customer, display a single column containing first and last name and another column with the total amount paid. Order the result alphabetically by last name
Tip You can use the operator CONCAT
  1. Return the total revenue from the rentals across the stores in each country. Order by descending revenue.

  2. The first and last name of the actor that played in the most films. If two or more actors are tied, the query must return the names of all of them.

Solution

  1. select last_name
    from actor
    group by last_name
    having count(*) = 1
    
  2. select distinct i.inventory_id
    from film f join inventory i using(film_id)
    join rental r using(inventory_id)
    where f.title='ACADEMY DINOSAUR' 
        and i.store_id=1 
        and r.return_date is not null  
    
  3. select  distinct f.title, f.release_year
    from film f join film_category using(film_id)
    join category cat using(category_id)
    where cat.name in ('Family', 'Animation', 'Documentary')
    
  4. select customer_id, first_name, last_name
    from customer
    where last_name LIKE 'L%'
    
  5. select concat(first_name, ' ', last_name), sum(amount)
    from customer join payment using (customer_id)
    group by customer_id
    order by last_name asc
    
  6. select country, sum(amount) as revenue
    from payment join rental using (rental_id)
      join inventory using (inventory_id)
      join store using (store_id)
      join address using (address_id)
      join city using (city_id)
      join country using (country_id)
    group by country_id
    order by revenue desc
    
  7. select actor_id, first_name, last_name, count() from film_actor join actor using(actor_id) group by actor_id having count() = (select max(nb_films) from (select count(*) as nb_films from film_actor group by actor_id) t)