In this tutorial, let’s look at the second set of interview questions and answers on Cassandra.
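A tombstone in Cassandra is a marker written in place of data that has been deleted. The tombstone is replicated like any other write, so it reaches every replica and shadows older values during reads. It is kept for the table's gc_grace_seconds (10 days by default) and can then be purged during compaction.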
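To see why a marker is needed rather than simply erasing the value, here is a minimal sketch of last-write-wins reconciliation between two replicas. It is a toy model, not Cassandra's code: the `Cell` type and the `reconcile` function are names invented for this illustration.

```python
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    value: Optional[str]     # None when the cell is a tombstone
    timestamp: int           # write time; the larger timestamp wins
    tombstone: bool = False


def reconcile(a: Optional[Cell], b: Optional[Cell]) -> Optional[Cell]:
    """Return the winning version of a cell held by two replicas."""
    if a is None:
        return b
    if b is None:
        return a
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    # Same timestamp: prefer the tombstone so the delete is not undone.
    return a if a.tombstone else b


stale_replica = Cell("alice@example.com", timestamp=100)
deleting_replica = Cell(None, timestamp=200, tombstone=True)

# With a tombstone, the newer delete shadows the older value.
winner = reconcile(stale_replica, deleting_replica)

# Without one (the other replica simply has nothing), the old value survives.
resurrected = reconcile(stale_replica, None)
```

In this sketch `winner` is the tombstone, while `resurrected` is the stale value, which is the "zombie data" problem that tombstones exist to prevent.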
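A counter in Cassandra is a special column type that supports atomic increments and decrements of a 64-bit integer value. Counters have limitations: an update is not idempotent, so retrying it after a timeout may apply it twice, and counter columns cannot be part of the primary key or share a table with non-counter columns other than the primary key columns.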
A materialized view in Cassandra is a precomputed view of a table that can be queried independently. Materialized views are updated automatically as the underlying data changes, and they can improve read performance for specific use cases.
The partition key in Cassandra is used to determine which node in the cluster is responsible for storing a particular piece of data. It is also used to group related data together in the same partition.
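The mapping from partition key to node can be sketched as a lookup on a token ring. This is a simplified illustration under stated assumptions: it hashes with MD5 for convenience (Cassandra's default partitioner uses Murmur3), gives each node a single token, and the `Ring` class and node names are hypothetical.

```python
import bisect
import hashlib
from typing import Dict


def token_for(partition_key: str) -> int:
    """Hash a partition key to a 64-bit token (MD5 is an assumption here)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    def __init__(self, node_tokens: Dict[str, int]):
        # Sort the nodes by the token each one owns.
        self._entries = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._entries]

    def owner_of_token(self, token: int) -> str:
        """First node whose token is >= the given token, wrapping around."""
        idx = bisect.bisect_left(self._tokens, token)
        return self._entries[idx % len(self._entries)][1]

    def owner(self, partition_key: str) -> str:
        return self.owner_of_token(token_for(partition_key))


ring = Ring({"node-a": 2**62, "node-b": 2**63, "node-c": 2**64 - 1})
node = ring.owner("user:42")   # the same key always maps to the same node
```

Because the owner depends only on the hash of the partition key, every row with the same partition key lands on the same node, which is what groups related data into one partition.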
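A secondary index in Cassandra is an index on a non-primary key column that allows you to query the data based on that column. Each node indexes only the data it stores, so a query that uses the index without also restricting the partition key may have to contact many nodes. Indexes on columns with very high or very low cardinality tend to perform poorly, so secondary indexes should be used carefully.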
In Cassandra, the partition key is used to determine which node in the cluster is responsible for storing a particular piece of data, and it is also used to group related data together in the same partition. The clustering key is used to determine the order of the data within a partition.
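The two roles can be illustrated with a small in-memory model: a dictionary keyed by partition key, where each partition keeps its rows sorted by clustering key. The sensor table, the `insert` helper, and the `range_query` helper are assumptions made up for this sketch.

```python
import bisect
from collections import defaultdict

# partition key -> list of (clustering key, value), kept sorted by clustering key
partitions = defaultdict(list)


def insert(sensor_id: str, reading_time: int, value: float) -> None:
    """Place a row in its partition at the position given by the clustering key."""
    bisect.insort(partitions[sensor_id], (reading_time, value))


def range_query(sensor_id: str, start: int, end: int):
    """Return rows of one partition with start <= reading_time < end."""
    rows = partitions[sensor_id]
    lo = bisect.bisect_left(rows, (start,))
    hi = bisect.bisect_left(rows, (end,))
    return rows[lo:hi]


# Rows arrive out of order but are stored in clustering order.
insert("sensor-1", 30, 21.5)
insert("sensor-1", 10, 20.0)
insert("sensor-2", 20, 18.2)
insert("sensor-1", 20, 20.7)

recent = range_query("sensor-1", 10, 30)
```

Since the rows of a partition are already sorted, a range over the clustering key is a contiguous slice rather than a scan.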
A write-ahead log (WAL) is a log of data modifications that is appended to on disk before the modifications are applied to the in-memory data structures. In Cassandra this role is played by the commit log, which is used for crash recovery and durability.
The read path in Cassandra is the process of retrieving data from the cluster in response to a read request. The coordinator node contacts as many replicas as the consistency level requires, merges their responses, and returns the most recent version of the data.
The write path in Cassandra is the process of storing data in the cluster in response to a write request. The coordinator node sends the write to the replicas for the partition and waits for as many acknowledgements as the consistency level requires. On each replica the write is appended to the commit log and applied to a memtable, which is later flushed to disk as an SSTable.
The gossip protocol in Cassandra is a peer-to-peer communication protocol used by nodes in the cluster to share information about the cluster’s topology, status, and schema. It helps ensure that all nodes have a consistent view of the cluster.
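The way views converge can be sketched as a merge in which the entry with the higher version wins. This is a deliberately simplified model: real gossip state carries more fields than a single version and status, and the `merge_views` and `gossip_round` functions are hypothetical names for this illustration.

```python
def merge_views(local, remote):
    """Keep, for every endpoint, whichever entry has the higher version."""
    merged = dict(local)
    for endpoint, (version, status) in remote.items():
        if endpoint not in merged or version > merged[endpoint][0]:
            merged[endpoint] = (version, status)
    return merged


def gossip_round(view_a, view_b):
    """Two nodes exchange state; both end up with the same merged view."""
    merged = merge_views(view_a, view_b)
    return merged, dict(merged)


# endpoint -> (version, status)
view_a = {"10.0.0.1": (5, "UP"), "10.0.0.2": (3, "UP")}
view_b = {"10.0.0.2": (4, "DOWN"), "10.0.0.3": (1, "UP")}

view_a, view_b = gossip_round(view_a, view_b)
```

After the round both nodes know about all three endpoints and agree that the newer report for 10.0.0.2 is the one to keep.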
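A hinted handoff in Cassandra is a mechanism used to ensure that writes are eventually applied to all replicas of the data, even if some replicas are temporarily unavailable. When a replica is unavailable at write time, the coordinator node stores a hint for it, and the write can still succeed if enough other replicas acknowledge it to satisfy the consistency level. The hint is replayed when the unavailable replica comes back online. Hints are kept only for a limited window (three hours by default), so a replica that is down for longer must be repaired.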
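The store-and-replay behaviour can be sketched with a toy coordinator. The `Replica` and `Coordinator` classes below are invented for this illustration and leave out hint expiry, consistency levels, and persistence of the hints themselves.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

    def write(self, key, value):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = value


class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []  # (target replica, key, value)

    def write(self, key, value):
        """Write to every replica; keep a hint for each unreachable one."""
        acks = 0
        for replica in self.replicas:
            try:
                replica.write(key, value)
                acks += 1
            except ConnectionError:
                self.hints.append((replica, key, value))
        return acks

    def replay_hints(self):
        """Deliver stored hints to replicas that are reachable again."""
        remaining = []
        for replica, key, value in self.hints:
            if replica.up:
                replica.write(key, value)
            else:
                remaining.append((replica, key, value))
        self.hints = remaining


a, b, c = Replica("a"), Replica("b"), Replica("c")
coordinator = Coordinator([a, b, c])

c.up = False
acks = coordinator.write("k", "v")   # two replicas acknowledge, one hint is stored
c.up = True
coordinator.replay_hints()           # replica c now receives the missed write
```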
A commit log in Cassandra is a sequential, append-only log of all writes received by a node. The commit log is used for durability and crash recovery. When a node restarts after a crash, the commit log is replayed to restore writes that were held in memtables but had not yet been flushed to SSTables.
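The log-then-apply pattern and the replay on restart can be sketched as follows. This is a toy model with assumptions: the `Node` class and the JSON-lines file format are invented here, and the sketch syncs the log to disk on every write, whereas Cassandra's commit log sync policy is configurable.

```python
import json
import os
import tempfile


class Node:
    def __init__(self, log_path):
        self.log_path = log_path
        self.memtable = {}

    def write(self, key, value):
        # Append to the log and force it to disk before touching memory.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.memtable[key] = value

    def recover(self):
        """Rebuild the memtable by replaying the log in order."""
        self.memtable = {}
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                entry = json.loads(line)
                self.memtable[entry["key"]] = entry["value"]


log_path = os.path.join(tempfile.mkdtemp(), "commitlog.jsonl")
node = Node(log_path)
node.write("user:1", "alice")
node.write("user:1", "alicia")
node.write("user:2", "bob")

restarted = Node(log_path)   # simulates a restart with empty memory
restarted.recover()
```

Replaying the entries in order means the later write to user:1 overrides the earlier one, so the recovered state matches the state before the crash.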
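A batch statement in Cassandra groups multiple writes (INSERT, UPDATE, DELETE) into a single request. A logged batch guarantees that either all of its writes are eventually applied or none are, but it does not provide isolation across partitions. A batch that touches a single partition can reduce network round trips, whereas a batch that spans many partitions puts extra load on the coordinator and is usually slower than issuing the writes individually.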
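A lightweight transaction in Cassandra is a conditional write, expressed with an IF clause (for example IF NOT EXISTS), that is applied only when the condition holds at the time of the write. It is implemented with the Paxos consensus protocol, which needs several extra round trips between replicas. Lightweight transactions are therefore slower than normal writes, but they provide linearizable consistency for the affected partition.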
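The compare-and-set semantics of a conditional write can be sketched on a single in-memory table. This only illustrates the observable behaviour under stated assumptions: the `Table` class is hypothetical, and a local lock stands in for the agreement that Cassandra reaches across replicas.

```python
import threading


class Table:
    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()   # stand-in for cross-replica agreement

    def insert_if_not_exists(self, key, value):
        """Insert only when the key is absent; return whether it was applied."""
        with self._lock:
            if key in self._rows:
                return False
            self._rows[key] = value
            return True

    def update_if(self, key, expected, new_value):
        """Update only when the current value equals `expected`."""
        with self._lock:
            if self._rows.get(key) != expected:
                return False
            self._rows[key] = new_value
            return True

    def get(self, key):
        return self._rows.get(key)


users = Table()
first = users.insert_if_not_exists("alice", "a@example.com")    # applied
second = users.insert_if_not_exists("alice", "x@example.com")   # rejected
swapped = users.update_if("alice", "a@example.com", "b@example.com")
```

The check and the write happen as one step, so two clients racing to claim the same key cannot both succeed.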
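Compaction in Cassandra is the process of merging multiple SSTables (sorted string tables) into a single SSTable, keeping the most recent version of each piece of data and discarding overwritten values and expired tombstones. Compaction reclaims disk space and improves read performance, because a read has fewer SSTables to consult.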
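The merge step can be sketched with dictionaries standing in for SSTables. This is a simplified model: the `compact` function and its data layout are assumptions for illustration, and it ignores the checks Cassandra makes against SSTables that are not part of the compaction before purging a tombstone.

```python
def compact(sstables, now, gc_grace_seconds):
    """Merge SSTables into one, keeping the newest version of each key.

    Each SSTable is a dict: key -> (timestamp, value).
    A value of None marks a tombstone.
    """
    merged = {}
    for table in sstables:
        for key, (timestamp, value) in table.items():
            if key not in merged or timestamp > merged[key][0]:
                merged[key] = (timestamp, value)
    # Drop tombstones whose grace period has expired.
    live = {
        key: cell
        for key, cell in merged.items()
        if cell[1] is not None or now - cell[0] < gc_grace_seconds
    }
    return dict(sorted(live.items()))   # output stays sorted by key


older = {"a": (100, "v1"), "b": (100, "v1"), "c": (100, "v1")}
newer = {"a": (200, "v2"), "b": (150, None), "c": (900, None)}

result = compact([older, newer], now=1000, gc_grace_seconds=500)
```

In this run the overwritten value of `a` is discarded, the old tombstone for `b` is purged together with the data it shadowed, and the recent tombstone for `c` is kept because its grace period has not yet passed.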
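A Bloom filter is a probabilistic data structure used to test whether an element is a member of a set. Cassandra keeps a Bloom filter for each SSTable to check whether a partition key may be present in it: a negative answer is always correct, while a positive answer can be a false positive. Bloom filters improve read performance by letting a read skip SSTables that cannot contain the requested partition.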
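A minimal Bloom filter can be sketched in a few lines. The sizes, the use of SHA-256 to derive the bit positions, and the `BloomFilter` class itself are assumptions chosen for readability, not the parameters or hash functions Cassandra uses.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)   # one byte per bit, for clarity

    def _positions(self, key):
        """Derive num_hashes bit positions from the key."""
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos] for pos in self._positions(key))


sstable_filter = BloomFilter()
for partition_key in ("user:1", "user:2", "user:3"):
    sstable_filter.add(partition_key)

must_check_sstable = sstable_filter.might_contain("user:2")
```

A read would consult the SSTable only when `might_contain` returns True, accepting that an occasional false positive costs one unnecessary lookup.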
A key cache in Cassandra is an in-memory cache that maps partition keys to the position of their data within an SSTable. A key cache hit lets a read skip the partition index lookup and seek directly to the data.
A row cache in Cassandra is an in-memory cache of rows from frequently accessed partitions. A row cache hit is served without reading any SSTable, but the cache uses far more memory than the key cache, so it is best suited to read-heavy tables whose data rarely changes.
A column family in Cassandra is a container for data that is organized as rows and columns. Each row has a unique key, and each column has a name and a value.
In Cassandra, a column family is an older term for a container for data that is organized as rows and columns. A table is the current term for the same concept.
A partition key in Cassandra is a value used to determine which node in a cluster is responsible for storing a particular piece of data. The partition key is used to distribute data across the cluster and is the first part of the table's primary key.
A clustering column in Cassandra is a column used to sort rows within a partition. Clustering columns allow you to retrieve data in a specific order and make range queries within a partition efficient, because the rows are already stored in that order.
A compound key in Cassandra is a primary key that consists of multiple columns. The first column is the partition key (or the first parenthesized group of columns, when the partition key is itself composite), and the remaining columns are clustering columns.
In Cassandra, a partition key is used to determine which node in a cluster is responsible for storing a particular piece of data. A clustering column is used to sort rows within a partition.
A secondary index in Cassandra is an index on a non-partition key column. Secondary indexes allow you to query data based on a column other than the partition key.
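A materialized view in Cassandra is a denormalized view of a base table that is created to optimize specific queries. The view stores the same data under a different primary key, so a query that would otherwise need a secondary index or a scan can be served from a single partition of the view. The cost is additional write work and storage for every view.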
A tombstone in Cassandra is a marker that indicates that a piece of data has been deleted. Tombstones are used to ensure that deleted data is eventually removed from all replicas.
The compaction strategy in Cassandra is a per-table setting that determines how SSTables are selected and merged during the compaction process. The available strategies include SizeTieredCompactionStrategy, LeveledCompactionStrategy, and TimeWindowCompactionStrategy. The choice affects read and write performance and disk space usage.
A replica in Cassandra is a copy of a piece of data stored on a node in the cluster. The replication factor sets how many replicas exist, each on a different node. Replicas are used to provide fault tolerance and ensure that data is available even if some nodes fail.
A consistency level in Cassandra is a setting that determines how many replicas must acknowledge a write or read request before it is considered successful, for example ONE, QUORUM, or ALL. Higher consistency levels give stronger consistency at the cost of higher latency and lower availability when replicas are down.
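The arithmetic behind QUORUM can be sketched directly. The function names below are made up for this illustration; the rule they encode is that a read is guaranteed to overlap the latest acknowledged write when the number of replicas written plus the number of replicas read exceeds the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that make up a majority."""
    return replication_factor // 2 + 1


def read_overlaps_write(replication_factor: int,
                        write_replicas: int,
                        read_replicas: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return write_replicas + read_replicas > replication_factor


rf = 3
q = quorum(rf)                                   # 2 of 3 replicas

quorum_both = read_overlaps_write(rf, q, q)      # QUORUM writes and QUORUM reads
one_both = read_overlaps_write(rf, 1, 1)         # ONE writes and ONE reads
```

With a replication factor of 3, QUORUM on both reads and writes tolerates one unavailable replica while still overlapping, whereas ONE on both sides is faster but may return stale data.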