Cassandra Collections

Cassandra supports three collection types (lists, sets, and maps) that let you store multiple values in a single column of a row. This is useful when a row needs a small, variable number of related values, such as a user's email addresses or a set of tags.

Syntax

In Cassandra, you can define collections for a column with the following syntax:

CREATE TABLE table_name (
    column1 type1,
    column2 type2,
    column3 collection_type<type3>,
    PRIMARY KEY (column1, column2)
) [WITH table_options];

The valid collection data types are:

  • list<datatype>
  • set<datatype>
  • map<datatype_key, datatype_value>
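The example in the next section uses only a set and a map, so here is a minimal sketch of a list column for comparison. The table class_schedule and its columns are hypothetical names chosen for illustration; they are not part of the example that follows.

```cql
-- Hypothetical table: one row per class, with an ordered list of periods.
CREATE TABLE class_schedule (
    class_name text PRIMARY KEY,
    periods list<text>
);

-- A list literal uses square brackets and may contain duplicates.
INSERT INTO class_schedule (class_name, periods)
VALUES ('classA', ['math', 'science', 'math']);

-- Append an element to the end of the list.
UPDATE class_schedule
SET periods = periods + ['history']
WHERE class_name = 'classA';
```

Set literals use curly braces with plain values, and map literals use curly braces with key: value pairs.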

Example

Suppose we want to store data for students in different classes. We can use a set to store the subject names and a map to store the student names and their corresponding grades.

CREATE TABLE class_scores (
    class_name text,
    subjects set<text>,
    student_grades map<text, int>,
    PRIMARY KEY (class_name)
);
INSERT INTO class_scores (class_name, subjects, student_grades) 
VALUES ('classA', {'math', 'science', 'history'}, 
{'Alice': 95, 'Bob': 80, 'Charlie': 85});

Output

We can retrieve the data from the class_scores table using a SELECT statement, as follows:

SELECT * FROM class_scores;

The output of the SELECT statement would be:

 class_name | subjects                       | student_grades
------------+--------------------------------+-----------------------------------------
     classA | {'history', 'math', 'science'} | {'Alice': 95, 'Bob': 80, 'Charlie': 85}

Explanation

In the example above, we create a table called class_scores with columns for the class name, subjects, and student grades. We use a set to store the subjects and a map to store the student names and their corresponding grades.

We insert a row of data using the INSERT INTO statement with the values for class name, subjects and student grades.

We then retrieve the data from the class_scores table using the SELECT * FROM class_scores statement, which returns all the columns and rows of data from the table.
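In practice you would usually restrict the query to one partition and name the columns you need rather than select everything. A minimal sketch against the class_scores table defined above:

```cql
-- Read only the subjects set for a single class.
-- class_name is the partition key, so this reads one partition.
SELECT subjects
FROM class_scores
WHERE class_name = 'classA';

-- Read both collection columns for the same class.
SELECT subjects, student_grades
FROM class_scores
WHERE class_name = 'classA';
```

Selecting a collection column returns the whole collection for that row.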

Use

Collections are a good fit when a row needs a small group of related values, such as a list of items or a map of key-value pairs. Because the values live in the same row as the rest of the data, a single query can return them together, which can reduce the number of round trips between the client and the server.

Important Points

  • Collections are meant for small amounts of data. A collection is read in its entirety when it is selected, so large collections slow down reads.
  • If the data can grow without bound, model it as a separate table with a clustering column instead of a collection.
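  • Cassandra supports updating individual elements of a non-frozen collection: you can add or remove set and list elements and set or delete map entries with UPDATE and DELETE, without reading the collection first. Only frozen collections must be overwritten as a whole.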
  • Lists keep their elements in positional order and allow duplicates. Sets hold unique values and return them in sorted order. Maps return their entries sorted by key.
  • You must define the data type for the key and value if you use a map.
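CQL accepts element-level changes to non-frozen collections, so a single element can be added or removed without rewriting the whole value. A minimal sketch, assuming the class_scores table and the 'classA' row from the example above; the new subject and student values are made up for illustration:

```cql
-- Add an element to the set (no effect if it is already present).
UPDATE class_scores
SET subjects = subjects + {'art'}
WHERE class_name = 'classA';

-- Remove an element from the set.
UPDATE class_scores
SET subjects = subjects - {'history'}
WHERE class_name = 'classA';

-- Insert or overwrite a single map entry by key.
UPDATE class_scores
SET student_grades['Dave'] = 90
WHERE class_name = 'classA';

-- Delete a single map entry by key.
DELETE student_grades['Bob']
FROM class_scores
WHERE class_name = 'classA';
```

Each statement writes only the affected element, so no prior SELECT is needed.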

Summary

In this tutorial, we learned about collections in Cassandra and how to define and use them to store multiple values in a single column. We saw an example that uses a set and a map to store data for students in different classes and retrieved that data with a SELECT statement. Keep collections small and be aware of their impact on read performance.
