Cassandra Interview Questions & Answers Part 4

In this tutorial, let’s look at the 4th set of interview questions and answers on Cassandra.

91. What is a coordinator node in Cassandra?

A coordinator node in Cassandra is the node that receives a client request and coordinates the execution of the request across the appropriate nodes in the cluster. Any node in the cluster can act as the coordinator for a given request.

92. What is a read repair in Cassandra?

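A read repair in Cassandra is a process in which the coordinator, after querying several replicas for a read, notices that they returned different versions of the data. It returns the most recent version (the one with the highest write timestamp) to the client and writes that version back to the out-of-date replicas.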
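
A minimal sketch of the repair step, assuming each replica is modeled as an in-memory dictionary and every stored version carries a write timestamp. The names Cell and read_with_repair are hypothetical. Real Cassandra first compares digests and fetches full data only on a mismatch; this sketch skips that and compares timestamps from every replica directly.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int  # write timestamp; the highest one is the newest version


def read_with_repair(replicas: Dict[str, Dict[str, Cell]], key: str) -> Tuple[Optional[str], List[str]]:
    """Return the newest value for key and the list of replicas that were repaired."""
    responses = {node: store.get(key) for node, store in replicas.items()}
    present = [cell for cell in responses.values() if cell is not None]
    if not present:
        return None, []
    newest = max(present, key=lambda cell: cell.timestamp)
    # A replica is stale if it has no copy at all or an older copy.
    stale = [node for node, cell in responses.items()
             if cell is None or cell.timestamp < newest.timestamp]
    for node in stale:
        replicas[node][key] = newest  # write the newest version back
    return newest.value, stale


replicas = {
    "node1": {"user:42": Cell("alice@new.example", 200)},
    "node2": {"user:42": Cell("alice@old.example", 100)},
    "node3": {},
}
value, repaired = read_with_repair(replicas, "user:42")
print(value, repaired)  # alice@new.example ['node2', 'node3']
```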

93. What is a hinted handoff timeout in Cassandra?

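The hinted handoff timeout in Cassandra, more precisely called the hint window (max_hint_window_in_ms in cassandra.yaml, three hours by default), is the length of time during which a coordinator keeps storing hints, that is, writes intended for a replica node that is currently down. When the replica comes back online, the stored hints are replayed to it. If the replica stays down longer than the window, the coordinator stops storing new hints for it; the hints are not forwarded to another node, and the missed writes must be restored by running a repair.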

94. What is a quorum in Cassandra?

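A quorum in Cassandra is a majority of the replicas for a piece of data, and QUORUM is the consistency level that requires that many replicas to respond to a read or write request before the request is considered successful. The quorum size is not configured directly; it is derived from the replication factor as floor(sum of replication factors / 2) + 1, so with a replication factor of 3, two replicas must respond. What can be chosen per request is the consistency level, with QUORUM being one option.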
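
The arithmetic can be sketched in Python; the function names are hypothetical. The last function checks the usual reasoning behind pairing QUORUM writes with QUORUM reads: when the number of replicas written plus the number read exceeds the replication factor, the two sets must share at least one replica.

```python
from typing import Dict


def quorum(replication_factors: Dict[str, int]) -> int:
    """QUORUM: a majority of all replicas, summed over every data center."""
    return sum(replication_factors.values()) // 2 + 1


def local_quorum(local_dc_rf: int) -> int:
    """LOCAL_QUORUM: a majority of the replicas in the coordinator's data center."""
    return local_dc_rf // 2 + 1


def overlapping(rf: int, write_replicas: int, read_replicas: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return write_replicas + read_replicas > rf


print(quorum({"dc1": 3}))              # 2
print(quorum({"dc1": 3, "dc2": 3}))    # 4
print(overlapping(3, quorum({"dc1": 3}), quorum({"dc1": 3})))  # True
print(overlapping(3, 1, 1))            # False
```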

95. What is a consistency level in Cassandra?

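A consistency level in Cassandra specifies how many replicas must acknowledge a read or write before the coordinator reports success to the client. Common levels include ONE, QUORUM, LOCAL_QUORUM, and ALL. The consistency level is set per request (or as a default in the client driver), which lets each query trade consistency against latency and availability.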

96. What is a replication factor in Cassandra?

A replication factor in Cassandra is the number of copies of each row that are stored in the cluster, each on a different node. The replication factor is set per keyspace (and per data center when using NetworkTopologyStrategy).

97. What is a token in Cassandra?

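A token in Cassandra is a numeric value that marks a position on the cluster’s token ring. Each node is assigned one or more tokens (many, when virtual nodes are used), and the partitioner hashes each partition key into a token; with the default Murmur3Partitioner, tokens range from -2^63 to 2^63-1. Comparing a partition's token with the nodes' tokens determines which nodes are responsible for which ranges of data.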
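
A simplified sketch of how a token ring maps a key to a node, assuming each node owns the range of tokens that ends at each of its own tokens. Cassandra's default partitioner uses MurmurHash3; MD5 from the standard library stands in for it here, so the token values differ from what Cassandra would compute. The names token_for and TokenRing are hypothetical.

```python
import bisect
import hashlib
from typing import Dict, List


def token_for(partition_key: str) -> int:
    """Map a key to a signed 64-bit token (MD5 is a stand-in for Murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


class TokenRing:
    def __init__(self, node_tokens: Dict[str, List[int]]):
        # A node with virtual nodes contributes several tokens to the ring.
        pairs = sorted((t, node) for node, tokens in node_tokens.items() for t in tokens)
        self._tokens = [t for t, _ in pairs]
        self._nodes = [n for _, n in pairs]

    def owner(self, token: int) -> str:
        # Walk clockwise to the first node token >= the key's token,
        # wrapping around to the start of the ring past the largest token.
        i = bisect.bisect_left(self._tokens, token)
        return self._nodes[i % len(self._nodes)]


ring = TokenRing({
    "node-a": [-6_000_000_000_000_000_000, 3_000_000_000_000_000_000],  # two vnodes
    "node-b": [-1_000_000_000_000_000_000],
    "node-c": [8_000_000_000_000_000_000],
})
print(ring.owner(0))                          # node-a
print(ring.owner(5_000_000_000_000_000_000))  # node-c
print(ring.owner(9_000_000_000_000_000_000))  # node-a (wraps around the ring)
```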

98. What is a commit log in Cassandra?

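A commit log in Cassandra is an append-only, on-disk log of write operations. Every write is appended to the commit log as well as applied to the memtable, and if a node crashes, Cassandra replays the commit log on restart to recover writes that had not yet been flushed to SSTables. How often the log is synced to disk is controlled by the commitlog_sync setting (periodic by default).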

99. What is a memtable in Cassandra?

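A memtable in Cassandra is an in-memory data structure, one per table, that holds recent writes sorted by key before they are written to disk. When a memtable grows past a configured size threshold (or when commit log space runs low, or on a manual nodetool flush), it is flushed to disk as a new SSTable.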

100. What is an SSTable in Cassandra?

An SSTable (Sorted String Table) in Cassandra is an immutable data file on disk, created when a memtable is flushed. Because SSTables are never modified in place, a read may have to merge data from the memtable and several SSTables. SSTables are merged during the compaction process to reclaim disk space and improve read performance.
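
The write path described in the last three answers can be sketched as a toy, single-table model. It is not how Cassandra actually stores data: the commit log here is a Python list rather than a durable file, the flush trigger is a hypothetical entry count instead of a memory threshold, and reads check SSTables newest first instead of merging by timestamp.

```python
from typing import Dict, List, Optional, Tuple


class Node:
    """Toy write path: commit log -> memtable -> immutable, key-sorted SSTables."""

    def __init__(self, flush_threshold: int = 3):
        self.commit_log: List[Tuple[str, str]] = []             # append-only, replayed after a crash
        self.memtable: Dict[str, str] = {}                      # in-memory and mutable
        self.sstables: List[Tuple[Tuple[str, str], ...]] = []   # sorted, never modified
        self.flush_threshold = flush_threshold                  # hypothetical size trigger

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # 1. append to the commit log
        self.memtable[key] = value            # 2. apply to the memtable
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Write the memtable out as a new immutable SSTable sorted by key.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []  # flushed writes no longer need replaying

    def read(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable first
            for k, v in table:
                if k == key:
                    return v
        return None


node = Node(flush_threshold=2)
node.write("a", "1")
node.write("b", "2")   # memtable reaches 2 entries and is flushed
node.write("a", "3")   # the newer value lives in the memtable
print(node.sstables)   # [(('a', '1'), ('b', '2'))]
print(node.read("a"), node.read("b"), node.read("z"))  # 3 2 None
```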

101. What is the difference between a wide and a narrow row in Cassandra?

In Cassandra, a wide row is a partition that contains many cells (in CQL terms, a partition holding many rows under different clustering key values), while a narrow row contains only one or a few. Wide rows are typically used for time series data or other data that keeps growing within a partition, while narrow rows are typically used for simple lookups of a small, fixed set of columns.

102. What is the difference between a partition key and a clustering key in Cassandra?

In Cassandra, the partition key is hashed to a token, which determines which nodes store a particular partition; all rows with the same partition key belong to the same partition. The clustering key determines the order in which rows are stored within that partition.
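
A small sketch of how the two keys shape storage, assuming a hypothetical table declared with PRIMARY KEY ((sensor_id), reading_time): rows are grouped by the partition key, then sorted by the clustering key within each partition. The function name layout is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical table: PRIMARY KEY ((sensor_id), reading_time)
# sensor_id is the partition key; reading_time is the clustering key.
Row = Tuple[str, int, float]  # (sensor_id, reading_time, value)


def layout(rows: List[Row]) -> Dict[str, List[Tuple[int, float]]]:
    partitions: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for sensor_id, reading_time, value in rows:
        # The partition key alone decides which partition (and nodes) a row goes to.
        partitions[sensor_id].append((reading_time, value))
    for partition in partitions.values():
        # Within a partition, rows are kept sorted by the clustering key.
        partition.sort(key=lambda row: row[0])
    return dict(partitions)


rows = [("s1", 30, 1.5), ("s2", 10, 7.0), ("s1", 10, 2.5)]
print(layout(rows))  # {'s1': [(10, 2.5), (30, 1.5)], 's2': [(10, 7.0)]}
```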

103. What is a compaction in Cassandra?

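Compaction in Cassandra is the process of merging SSTables to reclaim disk space and improve read performance. During a compaction, several SSTables are merged into new ones; when the same data appears in more than one of them, the version with the latest write timestamp wins, and tombstones older than gc_grace_seconds can be purged. Which SSTables are merged, and when, depends on the compaction strategy (for example SizeTieredCompactionStrategy, LeveledCompactionStrategy, or TimeWindowCompactionStrategy).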
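
A simplified sketch of the merge, assuming each SSTable is a dictionary from key to (write time in seconds, value), with None standing for a tombstone. Real Cassandra compares per-cell timestamps in microseconds, tracks each tombstone's own deletion time, and purges a tombstone only when no other SSTable could still hold older data for that key; those details are left out here, and the function name compact is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Each SSTable maps key -> (write_time_seconds, value); a value of None is a tombstone.
SSTable = Dict[str, Tuple[int, Optional[str]]]

GC_GRACE_SECONDS = 864_000  # Cassandra's default gc_grace_seconds (10 days)


def compact(sstables: List[SSTable], now: int, gc_grace: int = GC_GRACE_SECONDS) -> SSTable:
    merged: SSTable = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            # Last write wins: keep the version with the highest timestamp.
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    # Purge tombstones older than gc_grace; younger ones are kept so the
    # deletion can still reach replicas that missed it.
    return {key: (ts, value) for key, (ts, value) in merged.items()
            if not (value is None and now - ts > gc_grace)}


old = {"a": (100, "x"), "b": (100, "y"), "c": (100, "z")}
new = {"a": (200, "x2"), "b": (200, None)}  # "a" overwritten, "b" deleted
print(compact([old, new], now=300))
# {'a': (200, 'x2'), 'b': (200, None), 'c': (100, 'z')}
print(compact([old, new], now=200 + GC_GRACE_SECONDS + 1))
# {'a': (200, 'x2'), 'c': (100, 'z')}
```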

104. What is a tombstone in Cassandra?

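A tombstone in Cassandra is a special marker that records that a row or column was deleted (or that its TTL expired). Because SSTables are immutable, deleted data cannot be removed in place; the tombstone shadows the older values until compaction removes both, once the tombstone is older than gc_grace_seconds (10 days by default). The grace period gives repairs time to propagate the deletion, so that deleted data does not reappear from a replica that missed it.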

105. What is a bloom filter in Cassandra?

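A bloom filter in Cassandra is a probabilistic data structure, kept in memory for each SSTable, that tests whether a particular partition key might be present in that SSTable. It can return false positives but never false negatives, so a negative answer lets Cassandra skip the SSTable entirely, reducing the number of disk reads required during a read operation. The false positive rate can be tuned per table with the bloom_filter_fp_chance option.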
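
A minimal Bloom filter in Python shows why a negative answer is safe and a positive one is not. Cassandra's own implementation differs in its hashing and sizing; deriving several bit positions by salting SHA-256 is just a simple stand-in, and the class name is hypothetical.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False: the key is definitely absent, so the SSTable can be skipped.
        # True: the key may be present (a false positive is possible).
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


bf = BloomFilter(num_bits=4096, num_hashes=3)
for key in ["user:1", "user:2", "user:3"]:
    bf.add(key)
print(bf.might_contain("user:1"))    # True (no false negatives)
print(bf.might_contain("user:999"))  # False unless it happens to be a false positive
```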

106. What is a token range in Cassandra?

A token range in Cassandra is a contiguous range of token values on the ring. A node owns the range that ends at each of its tokens (so with virtual nodes it owns many small ranges), and it stores the partitions whose tokens fall within those ranges.

107. What is the difference between a row cache and a key cache in Cassandra?

In Cassandra, the key cache stores the on-disk location of frequently accessed partitions (their positions in the SSTable data files), so a hit saves a lookup in the partition index. The row cache stores entire rows in memory, so a hit avoids reading from disk at all. The key cache is enabled by default and is cheap to keep; the row cache is disabled by default and pays off mainly for read-heavy workloads with a small set of hot, rarely updated rows.

108. What is a counter column in Cassandra?

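A counter column in Cassandra is a special type of column that stores a 64-bit signed integer which can only be incremented or decremented, never set directly. Counter columns are used for tallies such as page views, where many clients update the same value concurrently and increments arriving at different nodes must be combined rather than overwrite each other. A table that contains counters may hold only counter columns apart from its primary key.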

109. What is the Cassandra Query Language (CQL)?

The Cassandra Query Language (CQL) is a SQL-like language that is used to interact with a Cassandra database. CQL supports the creation, modification, and querying of tables, as well as the execution of user-defined functions and aggregates.

110. What is the difference between a partition and a token in Cassandra?

In Cassandra, a partition is the set of rows in a table that share the same partition key; it is stored as a unit on each of its replica nodes. A token is a numeric position on the cluster’s ring: nodes are assigned tokens, and each partition key is hashed to a token, which determines which nodes are responsible for that partition.

111. What is a materialized view in Cassandra?

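A materialized view in Cassandra is a table that Cassandra maintains automatically from a base table, using a different primary key so that the same data can be queried by another column. Materialized views denormalize data for a specific query pattern, saving the application from writing to a second table itself, at the cost of extra work on every write to the base table. They are marked experimental in recent Cassandra releases.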

112. What is a secondary index in Cassandra?

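A secondary index in Cassandra is an index created on a non-primary-key column so that queries can filter on that column, that is, on criteria other than the partition key or clustering key. Each node indexes only the data it stores locally, so a query that does not also restrict the partition key may have to contact many nodes. Secondary indexes are therefore a poor fit for very high-cardinality or frequently updated columns.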

113. What is a replica in Cassandra?

A replica in Cassandra is one copy of a partition; with a replication factor of N, each partition has N replicas stored on N different nodes. All replicas are equal (there is no primary copy), and together they ensure that data remains available even if one or more nodes in the cluster fail.

114. What is the gossip protocol in Cassandra?

The gossip protocol in Cassandra is a decentralized, peer-to-peer protocol that nodes use to share cluster state information. Every second, each node exchanges state with up to three other nodes, so information spreads quickly through the cluster. Gossip is used to discover new nodes, detect failed nodes, and propagate updates to cluster state information.
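
A simplified sketch of how gossip spreads information, assuming each node tracks only a heartbeat version per known node and exchanges its view with one random peer per round. Cassandra's real exchange uses three messages (SYN, ACK, ACK2) and carries more state, such as generation numbers and node status; the function names here are hypothetical.

```python
import random
from typing import Dict

# Each node's view maps node name -> highest heartbeat version it has heard of.
View = Dict[str, int]


def merge(local: View, remote: View) -> None:
    """Adopt any newer heartbeat version found in the remote view."""
    for node, version in remote.items():
        if version > local.get(node, -1):
            local[node] = version


def gossip_round(views: Dict[str, View], fanout: int, rng: random.Random) -> None:
    for node in list(views):
        views[node][node] += 1  # each node bumps its own heartbeat
        peers = [p for p in views if p != node]
        for peer in rng.sample(peers, min(fanout, len(peers))):
            # Push-pull exchange: both sides end up with the newer versions.
            merge(views[peer], views[node])
            merge(views[node], views[peer])


names = ["n1", "n2", "n3", "n4", "n5"]
views = {name: {name: 0} for name in names}  # initially each node knows only itself
rng = random.Random(7)
rounds = 0
while any(len(view) < len(names) for view in views.values()):
    gossip_round(views, fanout=1, rng=rng)
    rounds += 1
print(f"every node learned about every other node after {rounds} rounds")
```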

115. What is a snitch in Cassandra?

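A snitch in Cassandra is a component that tells Cassandra which data center and rack each node belongs to, giving it a view of the network topology of the cluster. The replication strategy uses this information to spread replicas across racks and data centers, and Cassandra uses it to route requests efficiently, for example by sending reads to the best-performing replicas (the dynamic snitch). Which nodes own which token ranges is decided by the tokens and the partitioner, not by the snitch. GossipingPropertyFileSnitch is a common choice for production clusters.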

116. What is a data center in Cassandra?

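In Cassandra, a data center is a logical grouping of nodes, defined through the snitch configuration. It usually corresponds to a physical location or cloud region, but it can also be used to separate workloads, for example a data center dedicated to analytics. Replicating data across data centers improves fault tolerance, and serving clients from a nearby data center reduces network latency.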

117. What is the role of a coordinator node in Cassandra?

In Cassandra, a coordinator node is responsible for receiving client requests and coordinating read and write operations across the cluster. The coordinator node determines which nodes are responsible for storing and retrieving the data, forwards the request to those replicas, and returns the result to the client once enough replicas have responded to satisfy the requested consistency level.

118. What is a lightweight transaction in Cassandra?

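A lightweight transaction (LWT) in Cassandra is a conditional write, such as an INSERT with IF NOT EXISTS or an UPDATE with an IF condition on a column value, that is applied only if the condition holds. Cassandra runs the Paxos consensus protocol among the replicas to make this compare-and-set operation linearizable. Lightweight transactions are limited to a single partition and need several network round trips, so they are used sparingly, for example to enforce uniqueness, rather than as general multi-row transactions.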

119. What is a SASI index in Cassandra?

A SASI (SSTable Attached Secondary Index) index in Cassandra is a secondary index whose index data is built alongside each SSTable (and memtable) rather than stored in a separate hidden table. SASI adds support for prefix, substring (LIKE), and range searches on columns other than the partition key or clustering key. It is marked experimental in recent Cassandra releases.

120. What is the difference between a simple strategy and a network topology strategy in Cassandra?

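In Cassandra, SimpleStrategy is a replication strategy that places replicas on the next nodes clockwise around the ring, ignoring data centers and racks; it is intended for single-data-center clusters and testing. NetworkTopologyStrategy lets you set a separate replication factor for each data center and, within a data center, places replicas on different racks where possible, which improves fault tolerance and lets clients read from a nearby data center with lower network latency.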
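
A sketch of the two placement rules, assuming a small hypothetical ring of (token, node, data center) entries. The network_topology_strategy function only fills a per-data-center replica count; the real NetworkTopologyStrategy also tries to put replicas on different racks, which is omitted here to keep the example short.

```python
import bisect
from typing import Dict, Iterator, List, Tuple

# Ring entries: (token, node, data_center); values below are hypothetical.
Ring = List[Tuple[int, str, str]]


def _walk(ring: Ring, token: int) -> Iterator[Tuple[int, str, str]]:
    """Yield ring entries clockwise, starting at the first token >= token."""
    ordered = sorted(ring)
    start = bisect.bisect_left([t for t, _, _ in ordered], token)
    for i in range(len(ordered)):
        yield ordered[(start + i) % len(ordered)]


def simple_strategy(ring: Ring, token: int, rf: int) -> List[str]:
    # Take the next rf distinct nodes clockwise, whatever their data center.
    replicas: List[str] = []
    for _, node, _ in _walk(ring, token):
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == rf:
            break
    return replicas


def network_topology_strategy(ring: Ring, token: int, rf_per_dc: Dict[str, int]) -> Dict[str, List[str]]:
    # Same clockwise walk, but each data center fills its own replica count.
    placement: Dict[str, List[str]] = {dc: [] for dc in rf_per_dc}
    for _, node, dc in _walk(ring, token):
        if dc in placement and node not in placement[dc] and len(placement[dc]) < rf_per_dc[dc]:
            placement[dc].append(node)
    return placement


ring = [(0, "n1", "dc1"), (100, "n2", "dc2"), (200, "n3", "dc1"),
        (300, "n4", "dc2"), (400, "n5", "dc1")]
print(simple_strategy(ring, 150, 3))  # ['n3', 'n4', 'n5'] (data centers ignored)
print(network_topology_strategy(ring, 150, {"dc1": 2, "dc2": 1}))
# {'dc1': ['n3', 'n5'], 'dc2': ['n4']}
```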
