Cassandra - Operation timed out

For example:

Query 'SELECT i,n FROM dovecot.user_mailbox_objects WHERE u = '1@2' AND g = 0xc92f64f79f0d1ed01e6d5b314f04886c AND b = 0' failed: Operation timed out - received only 0 responses.

This typically means that there are too many tombstones. Dovecot is supposed to prevent this from happening by dividing mails into buckets that have a maximum of 10k messages each. However, that doesn't always work perfectly.

There are two common possibilities for the "Operation timed out":

There are so many tombstones that Cassandra is too slow to go through them all and give a response. This triggers a timeout on the Cassandra side. You may get a warning in the Cassandra logs about it, e.g.:
```
ReadCommand.java:569 - Read 6 live rows and 50000 tombstone cells for query ...
```
There are so many tombstones that Cassandra reaches the maximum allowed number of tombstones and aborts the query. Cassandra logs an error about it, e.g.:
```
MessageDeliveryTask.java:76 - Scanned over 100001 tombstones during query ...; query aborted
```
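When scanning Cassandra logs it can help to tell the two cases apart automatically. The following is a minimal sketch, assuming the log messages look like the two examples above (other Cassandra versions may word them differently); the function name `classify_tombstone_log` and the kind labels are hypothetical.

```python
import re

# Patterns derived from the two example log lines above (an assumption:
# the exact wording can vary between Cassandra versions).
WARN_RE = re.compile(r"Read (\d+) live rows and (\d+) tombstone cells")
ABORT_RE = re.compile(r"Scanned over (\d+) tombstones during query")


def classify_tombstone_log(line):
    """Return (kind, tombstone_count) for a Cassandra log line, or None.

    kind is "aborted" when the tombstone limit was reached and the query
    was aborted, or "warning" when Cassandra only warned about the number
    of tombstones it had to read.
    """
    match = ABORT_RE.search(line)
    if match:
        return ("aborted", int(match.group(1)))
    match = WARN_RE.search(line)
    if match:
        return ("warning", int(match.group(2)))
    return None
```

An "aborted" result points to the Maximum Tombstones Reached case, while a "warning" result together with a timeout in Dovecot points to the Query Timeout case.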

If the query fails only intermittently, you can run TRACING ON; in cqlsh before running the SELECT to see what the trace reports.

Maximum Tombstones Reached

If Cassandra reaches tombstone_failure_threshold (in cassandra.yaml, default: 100000), query processing is stopped and Cassandra logs an error. Dovecot still sees this as "Operation timed out".

Potential solutions:

Compact away the tombstones
Increase tombstone_failure_threshold

Query Timeout

If Cassandra hits its read timeout while processing the query, the query fails with "Operation timed out".

Potential solutions:

Compact away the tombstones
Enable page_size in dovecot-dict-cql.conf.ext to enable Cassandra paging. This should tell Cassandra servers to send the output in smaller chunks and prevent the timeout. Of course, going through the pages can still take a long time.
Increase the Cassandra read timeout. On the Dovecot side this is controlled by the request_timeout setting in dovecot-dict-cql.conf.ext. However, it has no effect to set this higher than read_request_timeout (or read_request_timeout_in_ms) in the Cassandra server's cassandra.yaml.
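The relationship between the two timeouts can be sketched as below. This is only an illustration: the function names are hypothetical, and the millisecond values in the usage example are made-up inputs rather than recommended settings.

```python
def effective_read_timeout_ms(dovecot_request_timeout_ms, cassandra_read_timeout_ms):
    """Return the timeout that actually limits a read query.

    Whichever side gives up first ends the query, so the effective limit
    is the smaller of the two values.
    """
    return min(dovecot_request_timeout_ms, cassandra_read_timeout_ms)


def dovecot_timeout_is_capped(dovecot_request_timeout_ms, cassandra_read_timeout_ms):
    """True if raising Dovecot's request_timeout further cannot help."""
    return dovecot_request_timeout_ms >= cassandra_read_timeout_ms


# Illustrative values only: a 60 s Dovecot timeout against a 5 s server timeout.
limit = effective_read_timeout_ms(60000, 5000)
capped = dovecot_timeout_is_capped(60000, 5000)
```

When the second function returns True, the server-side timeout in cassandra.yaml has to be raised before a larger request_timeout makes any difference.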

Compact Tombstones

To remove the tombstones immediately:

Shrink gc_grace_seconds in the affected Cassandra table to a small enough value. It needs to be lower than the age of the tombstones, i.e. the time elapsed since they were created.
Run Cassandra compaction (nodetool compact).
Grow gc_grace_seconds back to the original value.
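The three steps above can be sketched as a small helper that prints the statements to run. This is a sketch under stated assumptions: the function name `compaction_plan` is hypothetical, the keyspace and table are taken from the example query at the top of this page, and the original gc_grace_seconds is assumed to be Cassandra's default of 864000 (10 days). Check the real value in your table schema before changing anything.

```python
def compaction_plan(keyspace, table, tombstone_age_seconds, original_gc_grace=864000):
    """Return the commands for the shrink / compact / restore steps.

    tombstone_age_seconds is the age of the youngest tombstone that should
    be removed; the temporary gc_grace_seconds must be lower than that.
    original_gc_grace defaults to Cassandra's default (an assumption; use
    the value from your own table schema).
    """
    if tombstone_age_seconds <= 0:
        raise ValueError("tombstone age must be positive")
    temporary_gc_grace = tombstone_age_seconds - 1
    target = f"{keyspace}.{table}"
    return [
        # Step 1 (CQL, run in cqlsh): shrink gc_grace_seconds.
        f"ALTER TABLE {target} WITH gc_grace_seconds = {temporary_gc_grace};",
        # Step 2 (shell): compact the table.
        f"nodetool compact {keyspace} {table}",
        # Step 3 (CQL, run in cqlsh): restore the original value.
        f"ALTER TABLE {target} WITH gc_grace_seconds = {original_gc_grace};",
    ]


if __name__ == "__main__":
    # Example: all tombstones to purge are at least one hour old.
    for command in compaction_plan("dovecot", "user_mailbox_objects", 3600):
        print(command)
```

Note that nodetool acts on the node it is run against, so the compaction step may need to be repeated on each node that holds the affected data.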

It may be useful to shrink gc_grace_seconds permanently. Its purpose is to prevent zombie rows from coming back to life when Cassandra nodes are out of service for a while. However, most installations already see these zombie rows anyway and have ways to handle them.
