S3 Multisite
We offer Ceph RGW Multisite support for our S3 object storage service at https://s3.uni-muenster.de. The system is deployed across our two current locations: Einsteinstraße and Schlossplatz.
Your data is asynchronously replicated between both sites to ensure geographical redundancy and disaster recovery capabilities. In case of a failure at one site, we can switch access to the other site to maintain availability.
For technical details on how Ceph RGW Multisite works, see the official documentation: Ceph RGW Multisite Guide
Object Count Limits per Bucket in Multisite Setup
Ceph RGW uses bucket index sharding to distribute object metadata across the cluster; each shard can store approximately 100,000 objects.
If the number of objects in a bucket grows beyond what its current shard count can efficiently handle, Ceph RGW can dynamically reshard the bucket index to increase the number of shards.
Unfortunately, dynamic resharding does not work for multisite setups in our current Ceph version (source).
This means the number of shards — and therefore the maximum number of objects — needs to be known beforehand.
Our current default is 8 shards per bucket, which translates to a maximum of approximately 800,000 objects per bucket.
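As a rough sketch, you can estimate the shard count a bucket would need from the expected object count, using the approximate 100,000-objects-per-shard figure above (the helper function is purely illustrative, not part of any API):

```python
import math

OBJECTS_PER_SHARD = 100_000  # approximate per-shard capacity cited above

def required_shards(expected_objects: int) -> int:
    """Estimate how many index shards a bucket needs for the
    expected number of objects (illustrative helper)."""
    return max(1, math.ceil(expected_objects / OBJECTS_PER_SHARD))

print(required_shards(800_000))    # -> 8, covered by the default
print(required_shards(2_500_000))  # -> 25, needs a manually configured bucket
```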
If you plan to store more than 800,000 objects in a single bucket, please contact us beforehand with the bucket name and the expected maximum number of objects.
We will then create the bucket and configure the appropriate shard count manually.
Please note: Once a bucket is created, the shard count cannot be changed later.
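To keep an eye on how close a bucket is to its limit, you can count its objects client-side. A minimal sketch using boto3 against our endpoint (credentials and the bucket name are placeholders you must replace):

```python
import boto3

# Placeholder credentials; replace with your own access keys.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.uni-muenster.de",
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

def count_objects(bucket: str) -> int:
    """Count the objects in a bucket by paging through ListObjectsV2."""
    total = 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        total += page.get("KeyCount", 0)
    return total

n = count_objects("my-bucket")  # "my-bucket" is a placeholder name
print(f"{n} objects (the default of 8 shards covers ~800,000)")
```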
Why not just use a high shard count by default?
Using a high number of shards increases the complexity and resource usage of the system. Each shard requires additional metadata storage and can increase the load on the cluster, potentially reducing overall performance for buckets with fewer objects.
This especially affects operations that must touch every shard, such as multisite synchronization and object listing, which become more resource-intensive and slower as the shard count grows.
Setting a very high shard count by default is therefore inefficient and can degrade performance, especially for buckets with few objects. We instead prefer to configure shard counts based on your specific storage needs.