In this tutorial, let’s look at the 4th set of interview questions and answers on Cassandra.
A coordinator node in Cassandra is the node that receives a client request and coordinates the execution of the request across the appropriate nodes in the cluster.
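Read repair in Cassandra is a process in which the coordinator of a read request compares the responses from the replicas it contacted and, if they disagree, sends the most recent version of the data to the replicas that are out of date.
Hinted handoff in Cassandra lets a coordinator node store a hint, a record of a write, on behalf of a replica that is down when the write arrives. The hint window (max_hint_window_in_ms in cassandra.yaml, three hours by default) is the length of time for which the coordinator keeps generating hints for that replica. If the replica comes back online within the window, the stored hints are replayed to it. If it stays down longer, no further hints are written for it, and the writes it missed must be restored by a repair. Hints are not forwarded to another replica.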
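The comparison step can be sketched in a few lines of Python. This is a toy model, not Cassandra's implementation: the node names, values, and timestamps are made up, and each replica response is reduced to a (value, write timestamp) pair.

```python
from typing import Dict, List, Tuple

# Hypothetical replica response: (value, write timestamp).
Response = Tuple[str, int]


def resolve_read(responses: Dict[str, Response]) -> Tuple[str, List[str]]:
    """Return the newest value and the names of the replicas that are stale."""
    newest_value, newest_ts = max(responses.values(), key=lambda r: r[1])
    stale = sorted(node for node, (_, ts) in responses.items() if ts < newest_ts)
    return newest_value, stale


responses = {
    "node_a": ("alice@new.example", 200),
    "node_b": ("alice@old.example", 100),  # missed the latest write
    "node_c": ("alice@new.example", 200),
}
value, stale_replicas = resolve_read(responses)
print(value, stale_replicas)
```

In this sketch the coordinator would return the newest value to the client and send it to every replica listed as stale.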
A quorum in Cassandra is a majority of the replicas of a piece of data: the replication factor divided by 2, rounded down, plus 1. With a replication factor of 3, a quorum is 2 replicas. The replication factor itself is fixed per keyspace, but whether a request waits for a quorum is chosen per request through the QUORUM consistency level.
A consistency level in Cassandra is the number of replicas that must acknowledge a read or write request before it is considered successful, for example ONE, QUORUM, or ALL. The consistency level can be configured on a per-request basis.
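As a quick worked example, the usual majority formula for a quorum, floor(RF / 2) + 1, can be written as follows. The function names are illustrative only.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while a quorum is still reachable."""
    return replication_factor - quorum(replication_factor)


for rf in (1, 3, 5, 6):
    print(rf, quorum(rf), tolerated_failures(rf))
```

Note that an even replication factor does not tolerate more failures than the odd number just below it, which is one reason odd replication factors are common.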
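A common rule of thumb when choosing consistency levels is that a read is guaranteed to overlap with the latest successful write when the number of replicas read plus the number of replicas written exceeds the replication factor. A minimal sketch of that check, with hypothetical parameter names:

```python
def reads_see_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return read_replicas + write_replicas > replication_factor


# With a replication factor of 3:
print(reads_see_latest_write(2, 2, 3))  # QUORUM reads + QUORUM writes
print(reads_see_latest_write(1, 1, 3))  # ONE reads + ONE writes
print(reads_see_latest_write(1, 3, 3))  # ONE reads + ALL writes
```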
A replication factor in Cassandra is the number of replicas that are created for each piece of data in the cluster. The replication factor can be configured on a per-keyspace basis.
A token in Cassandra is a numeric value that marks a position on the cluster’s ring. Each node is assigned one or more tokens (many, when virtual nodes are enabled), and each partition key is hashed to a token. Together these determine which nodes are responsible for which ranges of data.
A commit log in Cassandra is an append-only log of write operations that is used to recover writes that had not yet been flushed to SSTables when a node fails. Every write is appended to the commit log, and the log is synced to disk either periodically or in batches, depending on the commitlog_sync setting.
A memtable in Cassandra is an in-memory data structure that buffers writes before they are flushed to disk. When a memtable reaches its configured size limit, or when a flush is requested, its contents are written out as an SSTable.
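The lookup from key to node can be sketched as a search over a sorted list of tokens. This is a simplified model with one token per node and made-up token values. It uses MD5 as a stand-in hash, whereas Cassandra's default partitioner is based on Murmur3.

```python
import bisect
import hashlib
from typing import Dict


def token_for(key: str) -> int:
    """Map a key to a signed 64-bit token (stand-in hash, not Murmur3)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


class Ring:
    def __init__(self, node_tokens: Dict[str, int]):
        self._entries = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in self._entries]

    def owner(self, token: int) -> str:
        """First node whose token is >= the given token, wrapping around the ring."""
        index = bisect.bisect_left(self._tokens, token) % len(self._tokens)
        return self._entries[index][1]


ring = Ring({"node_a": -100, "node_b": 0, "node_c": 100})
print(ring.owner(-99))    # falls in the range owned by node_b
print(ring.owner(101))    # past the last token, wraps to node_a
print(ring.owner(token_for("user:42")))
```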
An SSTable (Sorted String Table) in Cassandra is a file that stores data on disk. SSTables are immutable and are created when a memtable is flushed to disk. SSTables are used for read operations and are merged during the compaction process to reclaim disk space and improve read performance.
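The way the commit log, memtable, and SSTables fit together on the write path can be illustrated with a toy storage engine. Everything here is a simplification for illustration: the class name and the memtable_limit parameter are hypothetical, and real SSTables are files on disk rather than tuples in memory.

```python
from typing import Dict, List, Optional, Tuple


class ToyStorageEngine:
    def __init__(self, memtable_limit: int = 2):
        self.commit_log: List[Tuple[str, str]] = []  # append-only, for crash recovery
        self.memtable: Dict[str, str] = {}           # in-memory write buffer
        self.sstables: List[Tuple[Tuple[str, str], ...]] = []  # immutable, sorted by key
        self.memtable_limit = memtable_limit

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # durability first
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        if not self.memtable:
            return
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable.clear()
        self.commit_log.clear()  # flushed writes no longer need to be replayed

    def read(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # newest SSTable first
            for k, v in sstable:
                if k == key:
                    return v
        return None


engine = ToyStorageEngine(memtable_limit=2)
engine.write("b", "1")
engine.write("a", "2")   # reaches the limit and triggers a flush
engine.write("a", "3")   # newer value, still in the memtable
print(engine.read("a"), engine.read("b"), len(engine.sstables))
```

A read may have to consult the memtable and several SSTables, which is why compaction and bloom filters matter for read performance.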
In Cassandra, a wide row (a wide partition, in CQL terms) is a partition that holds many rows or cells under a single partition key, while a narrow row holds only one or a few. Wide rows are typically used for time series data, where each new measurement is added to the same partition, while narrow rows are typical of lookup-style data such as one record per user.
In Cassandra, a partition key is used to determine which node in the cluster is responsible for storing a particular row of data. A clustering key is used to determine the order in which rows are stored within a partition.
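The division of labour between the two keys can be modelled with a dictionary of sorted lists: the partition key selects the list, and the clustering key fixes the position within it. The class and method names below are hypothetical, and the sensor data is invented for the example.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple


class ToyTable:
    def __init__(self) -> None:
        # partition key -> rows kept sorted by clustering key
        self.partitions: Dict[str, List[Tuple[int, str]]] = defaultdict(list)

    def insert(self, partition_key: str, clustering_key: int, value: str) -> None:
        rows = self.partitions[partition_key]
        keys = [ck for ck, _ in rows]
        i = bisect.bisect_left(keys, clustering_key)
        if i < len(rows) and rows[i][0] == clustering_key:
            rows[i] = (clustering_key, value)       # same primary key: overwrite
        else:
            rows.insert(i, (clustering_key, value))  # keep clustering order

    def slice(self, partition_key: str, start: int, end: int) -> List[Tuple[int, str]]:
        """Range query within one partition, returned in clustering order."""
        rows = self.partitions.get(partition_key, [])
        return [(ck, v) for ck, v in rows if start <= ck <= end]


table = ToyTable()
table.insert("sensor-1", 3, "21.7")
table.insert("sensor-1", 1, "21.5")
table.insert("sensor-1", 2, "21.6")
table.insert("sensor-2", 1, "19.0")
print(table.slice("sensor-1", 1, 2))
```

A partition such as "sensor-1" that keeps accumulating readings is an example of a wide row.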
A compaction in Cassandra is the process of merging SSTables to reclaim disk space and improve read performance. During a compaction, multiple SSTables are merged into a single SSTable, with duplicate keys being resolved based on the most recent version of the data.
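The merge rule, keeping the most recent version of each key, can be sketched as below. This is a toy model in which an SSTable is a dictionary from key to (value, write timestamp); it ignores tombstones and compaction strategies.

```python
from typing import Dict, List, Tuple

Cell = Tuple[str, int]  # (value, write timestamp)


def compact(sstables: List[Dict[str, Cell]]) -> Dict[str, Cell]:
    """Merge several SSTables into one, keeping the newest cell for each key."""
    merged: Dict[str, Cell] = {}
    for sstable in sstables:
        for key, cell in sstable.items():
            if key not in merged or cell[1] > merged[key][1]:
                merged[key] = cell
    return dict(sorted(merged.items()))  # output stays sorted by key


old = {"a": ("1", 10), "b": ("2", 10)}
new = {"a": ("3", 20)}
print(compact([old, new]))
```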
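A tombstone in Cassandra is a special marker that is written to mark a row or column as deleted. Rather than removing data immediately, a delete writes a tombstone, which is replicated like any other write so that every replica learns about the deletion. Tombstones and the data they shadow are removed during compaction once gc_grace_seconds (ten days by default) has elapsed.
A bloom filter in Cassandra is a probabilistic data structure that is used to test whether a particular partition key may be present in an SSTable. It can report a key as present when it is not (a false positive), but never reports a stored key as absent. Bloom filters are used to reduce the number of disk reads required during a read operation, because SSTables that cannot contain the key are skipped.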
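The timing rule for dropping tombstones can be illustrated as follows. This is a deliberately simplified sketch: a tombstone is modelled as a cell whose value is None, times are plain seconds, and the additional checks a real compaction performs (for example against data in other SSTables) are left out. The 864000-second grace period matches Cassandra's default gc_grace_seconds of ten days.

```python
from typing import Dict, Optional, Tuple

GC_GRACE_SECONDS = 864000  # ten days

Cell = Tuple[Optional[str], int]  # (value or None for a tombstone, write time in seconds)


def purge_tombstones(cells: Dict[str, Cell], now: int) -> Dict[str, Cell]:
    """Drop tombstones older than the grace period; keep everything else."""
    kept: Dict[str, Cell] = {}
    for key, (value, written_at) in cells.items():
        is_tombstone = value is None
        if is_tombstone and now - written_at > GC_GRACE_SECONDS:
            continue  # old enough to be removed for good
        kept[key] = (value, written_at)
    return kept


cells = {
    "live": ("hello", 0),
    "old_delete": (None, 0),
    "recent_delete": (None, 900000),
}
print(purge_tombstones(cells, now=1000000))
```

The recent tombstone is kept so that a replica that was down during the delete can still learn about it before the marker disappears.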
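A minimal bloom filter can be written in a few lines. The sketch below is not Cassandra's implementation: the sizes are arbitrary, and it derives its hash positions from SHA-256 for simplicity.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: str) -> Iterator[int]:
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key: str) -> None:
        for position in self._positions(key):
            self.bits |= 1 << position

    def might_contain(self, key: str) -> bool:
        """False means definitely absent; True means possibly present."""
        return all((self.bits >> position) & 1 for position in self._positions(key))


sstable_filter = BloomFilter()
sstable_filter.add("user:1")
sstable_filter.add("user:2")
print(sstable_filter.might_contain("user:1"))
```

A read would consult such a filter for each SSTable and open only those for which might_contain returns True.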
A token range in Cassandra is the range of token values that is assigned to a particular node in the cluster. Token ranges are used to determine which nodes are responsible for which ranges of data.
In Cassandra, a row cache is used to cache entire rows of data in memory, while a key cache is used to cache the location of frequently accessed partitions within their SSTables. A row cache can serve a read without touching disk at all, which suits read-heavy workloads with a small set of hot rows, while a key cache uses far less memory and saves an index lookup on each read it serves.
A counter column in Cassandra is a special type of column that stores a 64-bit integer value that can be incremented or decremented. Counter columns are used for storing data that requires atomic updates across multiple nodes in the cluster.
The Cassandra Query Language (CQL) is a SQL-like language that is used to interact with a Cassandra database. CQL supports the creation, modification, and querying of tables, as well as the execution of user-defined functions and aggregates.
In Cassandra, a partition is the set of rows in a table that share the same partition key. A partition is stored together on one node and copied to as many replica nodes as the replication factor requires. A token is the numeric value obtained by hashing the partition key. It fixes the position of the partition on the cluster’s ring and therefore determines which nodes store it.
A materialized view in Cassandra is a denormalized view of a table that is optimized for a specific query pattern. Materialized views are used to improve read performance and reduce the number of queries required to retrieve data.
A secondary index in Cassandra is an index that is created on a column in a table to allow for more efficient queries on that column. Secondary indexes can be used to retrieve data based on criteria other than the partition key or clustering key.
A replica in Cassandra is a copy of a partition that is stored on a different node in the cluster. Replicas are used to ensure that data is available even if one or more nodes in the cluster fail.
The gossip protocol in Cassandra is a decentralized protocol that is used to share cluster state information among nodes in the cluster. The gossip protocol is used to discover new nodes in the cluster, detect failed nodes, and propagate updates to cluster state information.
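The core idea of a gossip exchange, where two nodes reconcile their views and keep the newer information about each peer, can be sketched as below. This is a toy model with invented node names; a single version number stands in for the generation and heartbeat information that Cassandra exchanges.

```python
from typing import Dict, Tuple

# node name -> (version, status); a higher version means newer information
ClusterView = Dict[str, Tuple[int, str]]


def merge_views(local: ClusterView, remote: ClusterView) -> ClusterView:
    """Combine two views of the cluster, keeping the newest entry for each node."""
    merged = dict(local)
    for node, (version, status) in remote.items():
        if node not in merged or version > merged[node][0]:
            merged[node] = (version, status)
    return merged


view_a = {"node_a": (5, "UP"), "node_b": (3, "UP")}
view_b = {"node_b": (4, "DOWN"), "node_c": (1, "UP")}  # knows about a new node
print(merge_views(view_a, view_b))
```

After enough pairwise exchanges of this kind, every node ends up with the same view without any central coordinator.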
A snitch in Cassandra is a component that tells each node about the network topology of the cluster, that is, which data center and rack every node belongs to. The replication strategy uses this information to place replicas, and coordinators use it to route read and write requests to the closest replicas.
In Cassandra, a data center is a logical grouping of nodes that are physically located in the same geographic location or region. Data centers are used to improve fault tolerance and reduce network latency.
In Cassandra, a coordinator node is responsible for receiving client requests and coordinating read and write operations across the cluster. The coordinator node determines which nodes are responsible for storing and retrieving data and communicates with those nodes to execute the requested operation.
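A lightweight transaction in Cassandra is a conditional write, such as INSERT ... IF NOT EXISTS or UPDATE ... IF column = value, that is applied only if the condition holds at the time of the write. It provides compare-and-set semantics within a single partition, using the Paxos consensus protocol to get the replicas to agree, and is not a general multi-row transaction. Lightweight transactions are used where concurrent writers must not overwrite each other, for example when claiming a unique username, and they cost noticeably more than ordinary writes.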
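In CQL, lightweight transactions are written as conditional statements (IF NOT EXISTS, or IF followed by a condition on a column). Their compare-and-set behaviour can be illustrated with a single-process toy in which a lock stands in for the agreement that Cassandra reaches among replicas. The class and method names are hypothetical.

```python
import threading
from typing import Dict


class ToyConditionalStore:
    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}
        self._lock = threading.Lock()  # stand-in for consensus among replicas

    def insert_if_not_exists(self, key: str, value: str) -> bool:
        """Apply the write only if the key is absent; report whether it was applied."""
        with self._lock:
            if key in self._rows:
                return False
            self._rows[key] = value
            return True

    def update_if(self, key: str, expected: str, new_value: str) -> bool:
        """Apply the write only if the current value equals the expected one."""
        with self._lock:
            if key not in self._rows or self._rows[key] != expected:
                return False
            self._rows[key] = new_value
            return True


store = ToyConditionalStore()
print(store.insert_if_not_exists("username:alice", "owner-1"))
print(store.insert_if_not_exists("username:alice", "owner-2"))
print(store.update_if("username:alice", "owner-1", "owner-3"))
```

The second insert is rejected because the key already exists, which is the guarantee an unconditional write cannot give when two clients race.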
A SASI (SSTable Attached Secondary Index) index in Cassandra is an index that is attached to an SSTable and is optimized for search queries. SASI indexes can be used to search for data based on criteria other than the partition key or clustering key.
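In Cassandra, SimpleStrategy is a replication strategy that takes a single replication factor and places the replicas on consecutive nodes around the ring, without regard to data centers or racks. NetworkTopologyStrategy is a replication strategy that takes a replication factor for each data center and spreads the replicas within a data center across different racks, which improves fault tolerance and lets clients read from a nearby data center. SimpleStrategy is suitable only for single data center or test clusters.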
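The difference in replica placement can be sketched with a small ring. The model below is an approximation under stated assumptions: the ring, tokens, and data center names are invented, each node has one token, and rack awareness is left out of the network topology variant.

```python
from typing import Dict, List, Tuple

# Ring entries sorted by token: (token, node, data center). All values are made up.
RingEntry = Tuple[int, str, str]
RING: List[RingEntry] = [
    (0, "n1", "dc1"), (25, "n2", "dc2"), (50, "n3", "dc1"), (75, "n4", "dc2"),
]


def walk_from(ring: List[RingEntry], token: int) -> List[RingEntry]:
    """Ring entries in clockwise order, starting at the owner of the token."""
    start = next((i for i, (t, _, _) in enumerate(ring) if t >= token), 0)
    return ring[start:] + ring[:start]


def simple_placement(ring: List[RingEntry], token: int, rf: int) -> List[str]:
    """Owner plus the next rf - 1 nodes clockwise, ignoring topology."""
    return [node for _, node, _ in walk_from(ring, token)][:rf]


def topology_placement(ring: List[RingEntry], token: int,
                       rf_per_dc: Dict[str, int]) -> List[str]:
    """Walk clockwise, taking nodes until each data center has its quota."""
    remaining = dict(rf_per_dc)
    chosen: List[str] = []
    for _, node, dc in walk_from(ring, token):
        if remaining.get(dc, 0) > 0:
            chosen.append(node)
            remaining[dc] -= 1
    return chosen


print(simple_placement(RING, 30, 2))
print(topology_placement(RING, 30, {"dc1": 2, "dc2": 1}))
```

With the simple placement, nothing prevents all replicas from landing in one data center. The per-data-center quotas are what guarantee a copy in each location.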