Gestion et analyse des données — Tutorial 9

Queries in Cassandra


Setting up the environment

  • Download Cassandra from this address.

  • Move the downloaded archive file to a folder of your choice (other than downloads).

  • Extract the archive. This will create a folder, containing all the files with the Cassandra server.

  • Open a terminal and go the the Cassandra folder.

  • Run the Cassandra server with the following command ./bin/cassandra -f.

  • Once the initialization is complete, open a new terminal and go again to the Cassandra folder.

  • Open the CQL shell, by typing the following command: ./bin/cqlsh.

Once you enter the CQL shell, you can start interacting with the Cassandra server to create a new database.

Import the data

  • Download the data here

  • Extract the downloaded archive. This will create a new folder containing the data. Let’s refer to this folder as DATA_FOLDER.

  • Go back to the CQL shell and type the following command source 'DATAFOLDER/sakila.cql' (replace DATAFOLDER with the full path to the data folder). This command will create the sakila keyspace (database).

  • In the CQL shell, type the command use sakila;

  • We have 11 tables in this database. You can see the definition with the command describe sakila.

  • As the last step, we need to import the data. No luck here, we need to import the data table by table with the following commands:

COPY sakila.actors_by_film FROM 'DATAFOLDER/actors_by_film.csv' WITH HEADER = TRUE ;

COPY sakila.categories_by_film FROM 'DATAFOLDER/categories_by_film.csv' WITH HEADER = TRUE ;

COPY sakila.customers FROM 'DATAFOLDER/customers.csv' WITH HEADER = TRUE ;

COPY sakila.film FROM 'DATAFOLDER/film.csv' WITH HEADER = TRUE ;

COPY sakila.inventory FROM 'DATAFOLDER/inventory.csv' WITH HEADER = TRUE ;

COPY sakila.rentals_by_customer FROM 'DATAFOLDER/rentals_by_customer.csv' WITH HEADER = TRUE ;

COPY sakila.rentals_by_film FROM 'DATAFOLDER/rentals_by_film.csv' WITH HEADER = TRUE ;

COPY sakila.rentals_by_staff FROM 'DATAFOLDER/rentals_by_staff.csv' WITH HEADER = TRUE ;

COPY sakila.rentals FROM 'DATAFOLDER/rentals.csv' WITH HEADER = TRUE ;

COPY sakila.staff FROM 'DATAFOLDER/staff.csv' WITH HEADER = TRUE ;

COPY sakila.store FROM 'DATAFOLDER/store.csv' WITH HEADER = TRUE ;

If during the import you get the error ‘too many open files’ close the shell with exit and start it again.

Queries

The CQL language used by Cassandra is highly similar to SQL, but there are some key differences. Here we analyze few of them.

Exercise

Exercise 1 Consider the table rentals_by_customer and try some queries that use (not all in the same query):

  • The WHERE condition.

  • Aggregating functions and the GROUP BY clause.

  • The SELECT JSON statement.

Feel free to refer to the CQL documentation.