
Cassandra Interview Questions & Answers Part 3

In this tutorial, let’s look at the 3rd set of interview questions and answers on Cassandra.

61. What is hinted handoff in Cassandra?

Hinted handoff in Cassandra is a mechanism used to ensure that writes are not lost when a node is temporarily unavailable. When a replica is down, the coordinator node stores a hint for each write that the replica missed. Once the replica becomes available again, the coordinator replays the hints to it. Hints are kept only for a limited window, so a node that stays down longer than that must be brought back in sync with a repair.
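The store-and-replay idea can be illustrated with a small in-memory model. This is only a sketch: the Node and Coordinator classes are hypothetical stand-ins, not Cassandra APIs, and real hints are persisted on disk and expire after a configured window.

```python
from collections import defaultdict


class Node:
    """A replica that may be up or down (hypothetical model, not Cassandra code)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class Coordinator:
    """Keeps a hint for every write whose target replica is down."""

    def __init__(self):
        self.hints = defaultdict(list)  # replica name -> pending (key, value) writes

    def write(self, replicas, key, value):
        for node in replicas:
            if node.up:
                node.data[key] = value
            else:
                self.hints[node.name].append((key, value))

    def replay_hints(self, node):
        """Deliver the stored hints once the replica is reachable again."""
        for key, value in self.hints.pop(node.name, []):
            node.data[key] = value


a, b = Node("a"), Node("b")
coordinator = Coordinator()

b.up = False
coordinator.write([a, b], "user:1", "alice")  # b misses the write; a hint is kept

b.up = True
coordinator.replay_hints(b)                   # b catches up
print(b.data)                                 # {'user:1': 'alice'}
```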

62. What is a seed node in Cassandra?

A seed node in Cassandra is a node that other nodes contact first when they start, so that they can join the cluster's gossip exchange. Seed nodes help new nodes discover the topology of the cluster; apart from that, they behave like any other node.

63. What is a gossip protocol in Cassandra?

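The gossip protocol in Cassandra is a peer-to-peer protocol in which each node periodically exchanges state information, about itself and about the nodes it already knows, with a few other nodes. Gossip is used to discover new nodes, detect failed nodes, and share each node's schema version so that schema disagreements can be detected.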

64. What is the commit log in Cassandra?

The commit log in Cassandra is a log file that records all write operations to disk. The commit log is used to ensure that writes are not lost if a node crashes before the data can be written to an SSTable.
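As a rough illustration of why the log protects against crashes, the sketch below appends each write to a list that stands in for the on-disk commit log before updating an in-memory dict. The WritePath class and its method names are assumptions made for this example only.

```python
class WritePath:
    """Toy write path: append to a durable log first, then update memory."""

    def __init__(self, commit_log):
        self.commit_log = commit_log  # stands in for the on-disk commit log
        self.memtable = {}            # in-memory state, lost on a crash

    def write(self, key, value):
        self.commit_log.append((key, value))  # durable record first
        self.memtable[key] = value

    @classmethod
    def recover(cls, commit_log):
        """Rebuild the in-memory state by replaying the log in order."""
        node = cls(commit_log)
        for key, value in commit_log:
            node.memtable[key] = value
        return node


log = []
node = WritePath(log)
node.write("k1", "v1")
node.write("k1", "v2")
node.write("k2", "v3")

del node                              # simulate a crash: the memtable is gone
recovered = WritePath.recover(log)
print(recovered.memtable)             # {'k1': 'v2', 'k2': 'v3'}
```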

65. What is an SSTable in Cassandra?

An SSTable in Cassandra is a sorted string table that stores data on disk. SSTables are immutable and can only be created or deleted, not updated.

66. What is a Bloom filter in Cassandra?

A Bloom filter in Cassandra is a probabilistic data structure used to test whether an element is a member of a set. Bloom filters are used to reduce the number of disk seeks required to find data in an SSTable.
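A minimal Bloom filter can be sketched in a few lines. The bit-array size, the number of hash functions, and the use of SHA-256 are arbitrary choices for this example and do not reflect Cassandra's internal implementation.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


bf = BloomFilter()
for k in ["user:1", "user:2", "user:3"]:
    bf.add(k)

print(bf.might_contain("user:2"))  # True: added keys are never reported absent
```

A negative answer lets a read skip the SSTable entirely; a positive answer may still be a false positive, so the SSTable has to be checked.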

67. What is a write-ahead log (WAL) in Cassandra?

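In Cassandra, the write-ahead log (WAL) is the commit log. Every write is appended to the commit log before it is applied to the memtable, so the write can be replayed if a node crashes before the memtable is flushed to an SSTable. SSTables themselves are immutable and have no separate WAL.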

68. What is an anti-entropy repair in Cassandra?

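An anti-entropy repair in Cassandra is a process, started with nodetool repair, that compares the data held by the replicas of a token range and streams the differences between them. Each replica summarizes its data as a Merkle tree (a tree of hashes), so whole ranges can be compared without exchanging every row. Anti-entropy repair is used to ensure that data is consistent across all replicas, including data that is rarely read.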
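Repair compares hash trees rather than raw rows. The sketch below is a simplified illustration of that idea: rows are grouped into a fixed number of buckets (standing in for token sub-ranges), each bucket is hashed, and the hashes are combined into a Merkle root. The function names and the bucket count (which must be a power of two here) are assumptions made for this example.

```python
import hashlib


def h(data):
    return hashlib.sha256(data.encode()).hexdigest()


def leaf_hashes(rows, buckets=4):
    """Hash the rows of each bucket; a bucket stands in for a token sub-range."""
    groups = [[] for _ in range(buckets)]
    for key in sorted(rows):
        index = int(h(key), 16) % buckets  # stable bucket assignment
        groups[index].append(f"{key}={rows[key]}")
    return [h("|".join(group)) for group in groups]


def merkle_root(leaves):
    """Combine leaf hashes pairwise until a single root hash remains."""
    level = list(leaves)
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def differing_buckets(rows_a, rows_b, buckets=4):
    leaves_a = leaf_hashes(rows_a, buckets)
    leaves_b = leaf_hashes(rows_b, buckets)
    if merkle_root(leaves_a) == merkle_root(leaves_b):
        return []  # replicas agree; nothing needs to be streamed
    return [i for i in range(buckets) if leaves_a[i] != leaves_b[i]]


replica_a = {"k1": "v1", "k2": "v2", "k3": "v3"}
replica_b = {"k1": "v1", "k2": "stale", "k3": "v3"}
print(differing_buckets(replica_a, replica_b))  # only the bucket holding k2
```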

69. What is a read repair in Cassandra?

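A read repair in Cassandra is a process that fixes inconsistencies as a side effect of a read. The coordinator compares the responses from the replicas it queried and, if they differ, sends the most recent version to the replicas that returned stale data. Unlike anti-entropy repair, read repair only touches data that is actually read.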
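The comparison step can be sketched as follows, assuming each replica is modeled as a dict that maps a key to a (timestamp, value) pair. The function name read_with_repair is hypothetical and the model ignores consistency levels.

```python
def read_with_repair(replicas, key):
    """Return the newest value and push it to replicas holding older data.

    Each replica is a dict mapping key -> (timestamp, value).
    """
    responses = [(replica, replica.get(key)) for replica in replicas]
    found = [cell for _, cell in responses if cell is not None]
    if not found:
        return None
    newest = max(found, key=lambda cell: cell[0])  # last write wins
    for replica, cell in responses:
        if cell != newest:
            replica[key] = newest                  # repair the stale replica
    return newest[1]


r1 = {"k": (2, "new")}
r2 = {"k": (1, "old")}
r3 = {}

print(read_with_repair([r1, r2, r3], "k"))  # new
print(r2, r3)                               # both now hold (2, 'new')
```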

70. What is a batch statement in Cassandra?

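A batch statement in Cassandra groups multiple write operations (INSERT, UPDATE, DELETE) into a single request. A logged batch guarantees that either all of its statements are eventually applied or none are. Batches confined to a single partition can reduce round trips, but batches that span many partitions put extra load on the coordinator and usually do not improve write performance.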

71. What is Cassandra’s data partitioning scheme?

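Cassandra uses a partitioning scheme called consistent hashing. A partitioner (Murmur3Partitioner by default) hashes each row's partition key to a token, and the token determines which nodes store the row. This allows data to be distributed evenly across the cluster.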

72. What is a token in Cassandra?

A token in Cassandra is a value that represents a position on the ring. Each node is assigned one or more tokens, and for each of its tokens it is responsible for the range of values between the previous token on the ring and its own.
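The lookup from partition key to owning node can be sketched with a sorted list and a binary search. This example assumes one token per node and uses MD5 as a stand-in hash because Murmur3 is not in the Python standard library; the Ring class and the node names are hypothetical.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 64


def token_for(partition_key):
    """Map a partition key to a token (MD5 used here as a stand-in hash)."""
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big")  # 0 .. 2**64 - 1


class Ring:
    def __init__(self, node_tokens):
        # node_tokens: node name -> token that ends the range the node owns
        self.entries = sorted((token, node) for node, token in node_tokens.items())
        self.tokens = [token for token, _ in self.entries]

    def owner(self, token):
        """First node whose token is >= the given token, wrapping around."""
        index = bisect.bisect_left(self.tokens, token) % len(self.tokens)
        return self.entries[index][1]


ring = Ring({
    "node-a": RING_SIZE // 4,
    "node-b": RING_SIZE // 2,
    "node-c": 3 * RING_SIZE // 4,
})
print(ring.owner(token_for("user:42")))
```

Tokens above the highest node token wrap around to the first node, which is what makes the token space a ring.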

73. What is the replication factor in Cassandra?

The replication factor in Cassandra is the number of copies of each row that are stored in the cluster. It is set per keyspace; with NetworkTopologyStrategy it is set separately for each data center.
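One simple placement rule, similar in spirit to SimpleStrategy, is to take the node that owns the token and then continue clockwise around the ring until enough distinct nodes have been collected. The sketch below assumes a tiny ring with made-up integer tokens and ignores racks and data centers.

```python
import bisect


def replicas_for(token, ring, replication_factor):
    """Pick the owner of the token plus the next distinct nodes clockwise.

    ring is a sorted list of (token, node) pairs.
    """
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token)
    chosen = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in chosen:
            chosen.append(node)
        if len(chosen) == replication_factor:
            break
    return chosen


ring = [(100, "node-a"), (200, "node-b"), (300, "node-c"), (400, "node-d")]
print(replicas_for(250, ring, replication_factor=3))  # ['node-c', 'node-d', 'node-a']
```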

74. What is the snitch in Cassandra?

The snitch in Cassandra is a component that tells each node which data center and rack every other node belongs to. Cassandra uses this information to route requests efficiently and to place replicas on different racks and data centers.
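As an illustration of resolving an address to a location, the sketch below follows the convention used by Cassandra's RackInferringSnitch, which is understood to treat the second octet of a node's IPv4 address as the data center and the third octet as the rack. The function name infer_location is hypothetical.

```python
def infer_location(ip_address):
    """Derive (datacenter, rack) from the 2nd and 3rd octets of an IPv4 address."""
    octets = ip_address.split(".")
    if len(octets) != 4:
        raise ValueError(f"not an IPv4 address: {ip_address!r}")
    return octets[1], octets[2]


print(infer_location("10.20.30.40"))  # ('20', '30')
```

Other snitches obtain the same information from configuration files or from cloud provider metadata rather than from the address itself.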

75. What is a virtual node (vnode) in Cassandra?

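Virtual nodes (vnodes) in Cassandra are a technique used to distribute data evenly across the cluster. Instead of owning one large token range, each physical node owns many small ranges; the number per node is set with num_tokens. This spreads data more evenly and makes adding or removing nodes simpler, because the affected ranges are shared with many peers.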
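The effect of owning many small ranges can be explored with a rough simulation. Tokens are derived from SHA-256 here purely for illustration; Cassandra allocates vnode tokens differently, and the function names are hypothetical.

```python
import hashlib

RING_SIZE = 2 ** 64


def vnode_tokens(node, num_tokens):
    """Derive num_tokens pseudo-random tokens for a node (illustrative only)."""
    tokens = []
    for i in range(num_tokens):
        digest = hashlib.sha256(f"{node}#{i}".encode()).digest()
        tokens.append(int.from_bytes(digest[:8], "big"))
    return tokens


def ownership(nodes, num_tokens):
    """Fraction of the ring owned by each node."""
    ring = sorted((t, n) for n in nodes for t in vnode_tokens(n, num_tokens))
    owned = {n: 0 for n in nodes}
    for i, (token, node) in enumerate(ring):
        previous = ring[i - 1][0]  # for i == 0 this wraps to the last token
        owned[node] += (token - previous) % RING_SIZE
    return {n: size / RING_SIZE for n, size in owned.items()}


for num_tokens in (1, 64):
    print(num_tokens, ownership(["node-a", "node-b", "node-c"], num_tokens))
```

With more tokens per node, the printed ownership fractions tend to move closer to one third each.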

76. What is the purpose of the memtable in Cassandra?

The memtable in Cassandra is a memory-resident data structure used to store recently written data. When the memtable reaches a configured size threshold, it is flushed to disk as a new SSTable.
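The flush step can be sketched with a dict that is turned into a sorted, immutable tuple once it holds a few rows. The Memtable class and its row-count threshold are hypothetical; real thresholds are based on memory use.

```python
class Memtable:
    def __init__(self, flush_threshold=3):
        self.rows = {}
        self.flush_threshold = flush_threshold  # hypothetical limit, in rows

    def put(self, key, value):
        self.rows[key] = value

    def is_full(self):
        return len(self.rows) >= self.flush_threshold

    def flush(self):
        """Return an immutable, key-sorted snapshot and clear the memtable."""
        sstable = tuple(sorted(self.rows.items()))
        self.rows = {}
        return sstable


memtable, sstables = Memtable(), []
for key, value in [("c", 3), ("a", 1), ("b", 2), ("d", 4)]:
    memtable.put(key, value)
    if memtable.is_full():
        sstables.append(memtable.flush())

print(sstables)       # [(('a', 1), ('b', 2), ('c', 3))]
print(memtable.rows)  # {'d': 4}
```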

77. What is the difference between a keyspace and a table in Cassandra?

In Cassandra, a keyspace is a container for tables, similar to a database in other database management systems. A table is a collection of rows that share a common schema.

78. What is a compound primary key in Cassandra?

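A compound primary key in Cassandra is a primary key that consists of more than one column. The first column in the compound key is the partition key, which determines the nodes that store the row. The remaining columns are clustering columns, which determine the sort order of rows within a partition.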
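The two roles can be modeled with a dict of sorted lists: the partition key selects the list, and the clustering column keeps it ordered so that range scans are cheap. The table layout (sensor_id, reading_time) and the function names are made up for this sketch.

```python
import bisect
from collections import defaultdict

# Hypothetical table with PRIMARY KEY (sensor_id, reading_time):
# sensor_id is the partition key, reading_time is the clustering column.
partitions = defaultdict(list)


def insert(sensor_id, reading_time, value):
    """Keep each partition's rows sorted by the clustering column."""
    bisect.insort(partitions[sensor_id], (reading_time, value))


def range_query(sensor_id, start, end):
    """Rows of one partition with start <= reading_time < end."""
    rows = partitions[sensor_id]
    lo = bisect.bisect_left(rows, (start,))
    hi = bisect.bisect_left(rows, (end,))
    return rows[lo:hi]


insert("s1", 30, 21.5)
insert("s1", 10, 20.0)
insert("s1", 20, 20.7)
insert("s2", 15, 18.2)

print(range_query("s1", 10, 30))  # [(10, 20.0), (20, 20.7)]
```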

79. What is the role of the coordinator node in Cassandra?

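The coordinator node in Cassandra is the node that receives a client request. It forwards the read or write to the replicas that own the partition key, waits for as many responses as the consistency level requires, and returns the result to the client. Any node can act as coordinator; token-aware drivers usually send a request to a node that is itself a replica for the partition key.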

80. What is the consistency level in Cassandra?

The consistency level in Cassandra determines how many replicas must acknowledge a write or read operation before it is considered successful. Higher consistency levels ensure greater data consistency but can result in higher latency.
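The trade-off can be made concrete with a little arithmetic. QUORUM requires a strict majority of replicas, and a read is guaranteed to overlap the latest successful write when the number of replicas read plus the number written exceeds the replication factor. The helper names below are hypothetical.

```python
def quorum(replication_factor):
    """Replicas needed for QUORUM: a strict majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas, write_replicas, replication_factor):
    """True when every read set must overlap every write set."""
    return read_replicas + write_replicas > replication_factor


rf = 3
print(quorum(rf))                                          # 2
print(reads_see_latest_write(quorum(rf), quorum(rf), rf))  # True
print(reads_see_latest_write(1, 1, rf))                    # False
```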

81. What is hinted handoff in Cassandra?

Hinted handoff is a feature in Cassandra that allows the coordinator node to temporarily hold writes intended for a replica node that is currently down. When the down node comes back online, the held writes are forwarded to it.

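82. What is a tombstone in Cassandra?

A tombstone in Cassandra is a marker that is written in place of a piece of data when it is deleted. Because SSTables are immutable, the data is not removed immediately; the tombstone hides it from reads and prevents replicas that missed the delete from resurrecting it. Compaction removes tombstones once gc_grace_seconds has passed.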
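A small sketch shows why the marker matters when several versions of a cell are merged. The TOMBSTONE sentinel and the reconcile function are hypothetical names; cells are modeled as (timestamp, value) pairs.

```python
TOMBSTONE = object()  # sentinel standing in for a deletion marker


def reconcile(versions):
    """Pick the newest (timestamp, value) cell; a newer tombstone hides older data."""
    timestamp, value = max(versions, key=lambda cell: cell[0])
    return None if value is TOMBSTONE else value


# An old SSTable still holds the value; a newer one holds the delete.
old_sstable_cell = (100, "alice@example.com")
new_sstable_cell = (200, TOMBSTONE)

print(reconcile([old_sstable_cell, new_sstable_cell]))  # None: the data stays deleted
print(reconcile([old_sstable_cell]))                    # alice@example.com
```

The second call shows what would happen if the tombstone were lost too early: the old value would become visible again.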

83. What is a compaction in Cassandra?

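Compaction in Cassandra is the process of merging multiple SSTables into fewer, larger SSTables, keeping only the most recent version of each row and discarding expired tombstones. Compaction reclaims disk space and improves read performance, because fewer SSTables have to be consulted for each read.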
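The merge can be sketched as follows, assuming each SSTable is modeled as a dict that maps a key to a (timestamp, value) pair and that a value of None marks a deletion. The compact function and its parameters are hypothetical and ignore real compaction strategies.

```python
TOMBSTONE = None  # in this sketch a value of None marks a deletion


def compact(sstables, now, gc_grace_seconds):
    """Merge SSTables (dicts of key -> (timestamp, value)) into one.

    Keeps the newest cell per key and drops tombstones older than gc_grace_seconds.
    """
    merged = {}
    for sstable in sstables:
        for key, cell in sstable.items():
            if key not in merged or cell[0] > merged[key][0]:
                merged[key] = cell
    result = {}
    for key in sorted(merged):
        timestamp, value = merged[key]
        if value is TOMBSTONE and now - timestamp > gc_grace_seconds:
            continue  # tombstone has expired: purge it
        result[key] = (timestamp, value)
    return result


older = {"a": (10, "a1"), "b": (10, "b1"), "c": (10, "c1")}
newer = {"a": (20, "a2"), "b": (20, TOMBSTONE), "c": (95, TOMBSTONE)}

print(compact([older, newer], now=100, gc_grace_seconds=50))
# {'a': (20, 'a2'), 'c': (95, None)}
```

Key "b" disappears because its tombstone is older than the grace period, while the recent tombstone for "c" is kept so that the delete can still reach replicas that missed it.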

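84. What is a Bloom filter in Cassandra?

A Bloom filter in Cassandra is a probabilistic data structure used to determine whether a partition key might exist in an SSTable. It can return false positives but never false negatives, so a negative answer lets a read skip that SSTable entirely. Bloom filters therefore reduce the number of disk seeks required for a read operation.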

85. What is a token range in Cassandra?

A token range in Cassandra is a range of tokens that corresponds to a subset of data in the cluster. Each node is responsible for a specific set of token ranges.

86. What is a secondary index in Cassandra?

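A secondary index in Cassandra is an index that is created on a non-primary key column in a table. Secondary indexes allow queries that filter on non-primary key columns, but each node indexes only its own data, so such a query may have to contact many nodes. They work best when the query is also restricted to a partition key or when the indexed column has low to medium cardinality.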

87. What is a materialized view in Cassandra?

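A materialized view in Cassandra is a table that Cassandra builds and keeps up to date from a base table, using a different primary key. Materialized views let you query the same data by a different key without maintaining a second table by hand, at the cost of extra work on every write to the base table.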

88. What is a batch statement in Cassandra?

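A batch statement in Cassandra is a group of INSERT, UPDATE, or DELETE statements that are sent together as a single request. A logged batch is atomic (either all of its statements are eventually applied or none are), but it is not isolated like a transaction in a relational database. Batches reduce round trips only when all statements target the same partition; their main purpose is to keep related writes consistent.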

89. What is the purpose of the nodetool utility in Cassandra?

The nodetool utility in Cassandra is a command-line tool used to perform administrative tasks on a Cassandra cluster, such as viewing cluster status (nodetool status), running repairs, flushing memtables, and draining or decommissioning nodes.

90. What is the purpose of the CQL shell in Cassandra?

The CQL shell (cqlsh) in Cassandra is a command-line tool used to interact with a Cassandra cluster using the Cassandra Query Language (CQL). The CQL shell can be used to create keyspaces and tables, insert and query data, and inspect the schema.
