In this tutorial, let’s look at the 3rd set of interview questions and answers on Cassandra.
Hinted handoff in Cassandra is a mechanism used to ensure that writes are not lost when a replica node is temporarily unavailable. When a replica cannot be reached, the coordinator handling the write stores a hint, which is a record of the write and its intended target, for a limited time window (three hours by default). Once the node becomes available again, the hinted writes are replayed to it.
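The hinted handoff flow described above can be sketched as a small in-memory simulation. This is only an illustration, not Cassandra's implementation: the Coordinator class and its method names are hypothetical, and real hints also expire after the hint window.

```python
from collections import defaultdict


class Coordinator:
    """Toy coordinator that keeps hints for replicas that are down (hypothetical)."""

    def __init__(self, replicas):
        # replica name -> dict acting as that replica's storage
        self.storage = {name: {} for name in replicas}
        self.up = {name: True for name in replicas}
        # replica name -> list of (key, value) writes it missed
        self.hints = defaultdict(list)

    def mark_down(self, name):
        self.up[name] = False

    def write(self, key, value):
        for name in self.storage:
            if self.up[name]:
                self.storage[name][key] = value
            else:
                # Replica unreachable: remember the write as a hint.
                self.hints[name].append((key, value))

    def mark_up(self, name):
        # Replay the stored hints once the replica is reachable again.
        self.up[name] = True
        for key, value in self.hints.pop(name, []):
            self.storage[name][key] = value
```

Calling mark_down("b"), then write("k", "v"), then mark_up("b") leaves replica "b" with the same value as the replicas that stayed up.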
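A seed node in Cassandra is a node that new nodes contact when they join a cluster. Seeds act as the initial gossip contact points, so a joining node can learn the topology of the cluster from them; apart from that role, a seed behaves like any other node.
The gossip protocol in Cassandra is a peer-to-peer protocol used by nodes in a cluster to exchange information about the state of the cluster. Each node periodically exchanges state with a few other nodes. Gossip is used to discover new nodes, detect failed nodes, and share each node's schema version so that schema disagreements can be detected.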
The commit log in Cassandra is a log file that records all write operations to disk. The commit log is used to ensure that writes are not lost if a node crashes before the data can be written to an SSTable.
An SSTable in Cassandra is a sorted string table that stores data on disk. SSTables are immutable and can only be created or deleted, not updated.
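A Bloom filter in Cassandra is a probabilistic data structure used to test whether an element is a member of a set. Each SSTable has its own Bloom filter, which answers whether a partition key is definitely absent from that SSTable or possibly present. False positives are possible, but false negatives are not, so Cassandra can skip SSTables that cannot contain the requested data and reduce the number of disk seeks.
A write-ahead log (WAL) is a log that records each change durably before the change is applied to the main data structures. In Cassandra, the commit log plays the role of the WAL: every write is appended to the commit log before it is acknowledged, so that writes held only in the memtable can be replayed if a node crashes before they are flushed to an SSTable.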
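As a minimal sketch of the idea, the following Bloom filter sets a few bit positions per key and reports a key as possibly present only if all of its bits are set. The class name, the default sizes, and the use of SHA-256 are assumptions made for illustration; Cassandra's own filter uses different hashing and sizing.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Derive several bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))
```

A read path would call might_contain for each SSTable and only touch the disk for SSTables where it returns True.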
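An anti-entropy repair in Cassandra is a manually or externally scheduled process, started with nodetool repair, that compares data between replicas and repairs any inconsistencies. Replicas build Merkle trees (hash trees) over their data and exchange them, so only the ranges whose hashes differ need to be streamed. Anti-entropy repair is used to make all replicas consistent, including for data that is rarely read.
A read repair in Cassandra is a process that fixes inconsistencies as a side effect of a read. When the coordinator receives differing responses from the replicas it contacted, it returns the value with the newest timestamp and writes that value back to the replicas that were out of date. Unlike anti-entropy repair, read repair only touches data that is actually read.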
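A read repair can be sketched as "pick the newest value, then write it back to stale replicas". The function below is a hypothetical illustration in which each replica is a plain dict mapping a key to a (value, write_timestamp) pair; it ignores consistency levels and digest reads.

```python
def read_with_repair(replicas, key):
    """Return the newest value for key and bring stale replicas up to date.

    replicas: list of dicts mapping key -> (value, write_timestamp).
    """
    responses = [replica[key] for replica in replicas if key in replica]
    if not responses:
        return None
    # The cell with the highest write timestamp wins.
    newest = max(responses, key=lambda cell: cell[1])
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest  # repair the stale or missing copy
    return newest[0]
```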
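A batch statement in Cassandra is a statement that groups multiple write operations (INSERT, UPDATE, DELETE) into a single request. A logged batch guarantees that either all of its writes are eventually applied or none are. Batches are mainly a tool for atomicity rather than speed: a batch confined to one partition saves round trips, but a batch that spans many partitions puts extra load on the coordinator and is often slower than sending the writes individually.
Cassandra uses a partitioning scheme called consistent hashing. The partition key of each row is hashed to a token, and the token determines which nodes store the row. This distributes data evenly across the cluster and limits how much data has to move when nodes are added or removed.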
A token in Cassandra is a value that represents a location on the ring. Each node in the cluster is assigned a range of tokens that it is responsible for.
The replication factor in Cassandra is the number of replicas that are created for each piece of data. The replication factor determines how many copies of the data will be stored in the cluster.
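The token ring and replication factor can be sketched together: hash the key to a token, find the first node whose token is at or past it, then walk clockwise until enough replicas are collected. This is a rough illustration that resembles simple (rack-unaware) replica placement; the Ring class, the token_for helper, and the use of SHA-256 as the partitioner are assumptions.

```python
import bisect
import hashlib


def token_for(key):
    # Stand-in for the partitioner: hash the key to an integer token.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, node_tokens):
        # node_tokens: {node_name: token}; a node owns the range ending at its token.
        self.entries = sorted((tok, node) for node, tok in node_tokens.items())
        self.tokens = [tok for tok, _ in self.entries]

    def replicas(self, key, replication_factor):
        size = len(self.entries)
        # First node whose token is >= the key's token, wrapping around the ring.
        start = bisect.bisect_left(self.tokens, token_for(key)) % size
        count = min(replication_factor, size)
        return [self.entries[(start + i) % size][1] for i in range(count)]
```

With Ring({"n1": 0, "n2": 2**62, "n3": 2**63}) and a replication factor of 2, every key maps to two distinct, adjacent nodes on the ring.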
The snitch in Cassandra is a component that determines the location of nodes in the cluster. The snitch is responsible for resolving IP addresses to data center and rack information.
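Virtual nodes (vnodes) in Cassandra let each physical node own many small token ranges instead of one large contiguous range. Because each node holds multiple tokens spread around the ring, data is distributed more evenly, and when a node joins or leaves, the resulting data movement is shared among many nodes rather than just its neighbors.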
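To make the idea concrete, the sketch below gives every physical node several tokens on the ring. It is only an illustration: the function name is hypothetical, and deriving tokens from a hash of the node name is an assumption, not how Cassandra allocates tokens.

```python
import hashlib


def assign_vnode_tokens(nodes, num_tokens):
    """Give every node num_tokens pseudo-random tokens and return the sorted ring."""
    ring = []
    for node in nodes:
        for i in range(num_tokens):
            digest = hashlib.sha256(f"{node}#{i}".encode()).digest()
            ring.append((int.from_bytes(digest[:8], "big"), node))
    # Sorting by token interleaves the nodes' ranges around the ring.
    return sorted(ring)
```

Each (token, node) pair in the result marks the end of one small range owned by that node, so a node's data is scattered across many positions on the ring.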
The memtable in Cassandra is a memory-resident data structure used to store recently written data. The memtable is periodically flushed to disk to create an SSTable.
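The interaction between the commit log, the memtable, and SSTables can be sketched as a toy write path. The Node class, its method names, and the flush threshold measured in number of keys are all hypothetical; real flushes are triggered by memory and commit log size.

```python
class Node:
    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only record of writes not yet flushed
        self.memtable = {}     # in-memory, latest value per key
        self.sstables = []     # each flush adds one immutable, sorted tuple
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # durability first
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memtable out as a sorted, immutable SSTable.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []  # flushed data no longer needs the log

    def recover(self):
        # After a crash the memtable is gone; rebuild it from the commit log.
        self.memtable = {}
        for key, value in self.commit_log:
            self.memtable[key] = value
```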
In Cassandra, a keyspace is a container for tables, similar to a database in other database management systems. A table is a collection of rows that share a common schema.
A compound primary key in Cassandra is a primary key that consists of more than one column. The first column in the compound key is the partition key, and the remaining columns are clustering columns.
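The roles of the two parts of a compound primary key can be sketched with a toy table: the partition key selects a partition, and the clustering column orders the rows inside it. The Table class below is a hypothetical model with a single clustering column, not how Cassandra stores data.

```python
from collections import defaultdict


class Table:
    def __init__(self):
        # partition key -> {clustering key: row value}
        self.partitions = defaultdict(dict)

    def insert(self, partition_key, clustering_key, value):
        # Writing the same full primary key again overwrites the row (an upsert).
        self.partitions[partition_key][clustering_key] = value

    def select(self, partition_key):
        # Rows within a partition come back ordered by the clustering key.
        rows = self.partitions.get(partition_key, {})
        return [(ck, rows[ck]) for ck in sorted(rows)]
```

For example, inserting readings for a sensor with the sensor id as the partition key and a timestamp as the clustering key returns that sensor's readings in time order.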
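The coordinator node in Cassandra is the node that receives a client request and handles it on the client's behalf: it forwards the read or write to the replicas that own the data and collects their responses. Any node can act as coordinator. It is simply the node the client driver sends the request to, although token-aware drivers prefer to pick a replica for the request's partition key.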
The consistency level in Cassandra determines how many replicas must acknowledge a write or read operation before it is considered successful. Higher consistency levels ensure greater data consistency but can result in higher latency.
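The arithmetic behind consistency levels is short enough to write down. QUORUM means a majority of the replicas, and a read is guaranteed to overlap with the latest acknowledged write when the number of replicas read plus the number written exceeds the replication factor. The function names below are hypothetical.

```python
def quorum(replication_factor):
    """Number of replicas that form a majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas, write_replicas, replication_factor):
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor
```

With a replication factor of 3, quorum(3) is 2, so QUORUM reads combined with QUORUM writes satisfy 2 + 2 > 3, whereas reading and writing at ONE (1 + 1) does not.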
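Hinted handoff is a feature in Cassandra that allows the coordinator node to temporarily hold onto writes intended for a replica node that is currently down. When the down node comes back online, the held writes are forwarded to it.
A tombstone in Cassandra is a marker that is written when a piece of data is deleted. Because SSTables are immutable, the old value is not removed immediately; the tombstone shadows it. The tombstone ensures that the deleted data is not resurrected when SSTables are merged during compaction or when replicas that missed the delete are repaired. Tombstones themselves are only removed by compaction after the table's gc_grace_seconds period has passed.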
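Why a tombstone is needed can be shown with a small reconciliation sketch: a delete is just another timestamped cell, and the newest cell wins. The TOMBSTONE sentinel and function names are hypothetical, and tie-breaking on equal timestamps and gc_grace_seconds are not modelled.

```python
TOMBSTONE = object()  # sentinel standing in for a deletion marker


def reconcile(cell_a, cell_b):
    """Each cell is (value, timestamp); the cell with the newer timestamp wins."""
    return cell_a if cell_a[1] >= cell_b[1] else cell_b


def visible_value(cells):
    """Merge all versions of a cell and return what a reader would see."""
    winner = cells[0]
    for cell in cells[1:]:
        winner = reconcile(winner, cell)
    return None if winner[0] is TOMBSTONE else winner[0]
```

If the tombstone were simply dropped instead of stored, the older ("alice", 1) cell on another replica or in another SSTable would win the merge and the deleted value would reappear.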
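Compaction in Cassandra is the process of merging multiple SSTables into a new SSTable, keeping only the most recent version of each piece of data and discarding overwritten values and expired tombstones. Compaction is necessary to reclaim disk space and to improve read performance, because a read has fewer SSTables to consult.
A Bloom filter in Cassandra is a probabilistic data structure used to determine whether a given partition key may exist in an SSTable. Because a negative answer is always correct, Bloom filters let a read skip SSTables that cannot contain the data, which reduces the number of disk seeks required.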
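The merge at the heart of compaction can be sketched as follows. Each input SSTable is modelled as a dict mapping a key to a (value, timestamp) pair, which is a simplification made for illustration; the compact function is hypothetical and ignores tombstone purging and compaction strategies.

```python
def compact(sstables):
    """Merge SSTables into one sorted list, keeping the newest cell per key.

    sstables: list of dicts mapping key -> (value, write_timestamp).
    """
    merged = {}
    for table in sstables:
        for key, cell in table.items():
            # Keep the cell only if it is newer than what we have seen so far.
            if key not in merged or cell[1] > merged[key][1]:
                merged[key] = cell
    # The output is a single run sorted by key, like a new SSTable.
    return sorted(merged.items())
```

The overwritten version of a key is dropped from the output, which is where the disk space savings come from.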
A token range in Cassandra is a range of tokens that corresponds to a subset of data in the cluster. Each node is responsible for a specific set of token ranges.
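A secondary index in Cassandra is an index that is created on a non-primary key column in a table. Secondary indexes make it possible to query data by a non-primary key column, but each node indexes only its own data, so a query that does not also restrict the partition key may have to contact many nodes. They work best when combined with a partition key restriction and are a poor fit for columns with very high or very low cardinality.
A materialized view in Cassandra is a view that is precomputed and stored as a table with a different primary key from its base table. Cassandra keeps the view in sync when the base table is written, so data can be queried by a different key without maintaining a second table by hand, at the cost of extra work on every write.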
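A batch statement in Cassandra is a group of insert, update, or delete statements that are submitted together as a single request. A logged batch is atomic, meaning all of its statements eventually take effect or none do, but it is not a full transaction with isolation across partitions. Batches save round trips when all statements target the same partition; batches spanning many partitions usually cost performance rather than improve it.
The nodetool utility in Cassandra is a command-line tool used to perform administrative tasks on a Cassandra cluster, such as viewing cluster status (nodetool status), running repairs (nodetool repair), flushing memtables (nodetool flush), and draining or stopping a node (nodetool drain, nodetool stopdaemon). It cannot start a node, because it talks to an already running Cassandra process.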
The CQL shell in Cassandra is a command-line tool used to interact with a Cassandra cluster using the Cassandra Query Language (CQL). The CQL shell can be used to create keyspaces and tables, insert and query data, and perform administrative tasks.