S3 Usage Introduction

This is a brief introduction to how you can use our S3 service.

S3 Storage Provided in the University Cloud

The S3 protocol is very popular for storage in modern applications. We provide an S3 service based on our Ceph clusters, which is compatible with the Amazon S3 (Simple Storage Service) API. This allows users to store and manage large amounts of data in a scalable and fault-tolerant manner. In Ceph, the RADOS Gateway (RGW) component provides an S3-compatible interface, enabling applications to interact with Ceph using standard S3 APIs and tools. The RGW translates S3 requests into Ceph’s native protocol, allowing data to be stored and retrieved from the underlying Ceph storage cluster.

Key features of Ceph’s S3 implementation include:

  • Object storage with support for buckets, objects, and metadata
  • Compatibility with S3 APIs for creating, reading, updating, and deleting objects
  • Support for S3 access controls, such as bucket policies and ACLs
  • Integration with Ceph’s distributed storage architecture, providing scalability, high availability, and durability
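
Because the gateway speaks the standard S3 API, tools other than s3cmd work as well. As a quick illustration (a sketch; it assumes you already have credentials and uses one of the endpoints listed in the next section), the AWS CLI can talk to the service when pointed at a custom endpoint:

# the AWS CLI reads credentials from these environment variables
export AWS_ACCESS_KEY_ID=<ACCESS_KEY_GOES_HERE>
export AWS_SECRET_ACCESS_KEY=<SECRET_KEY_GOES_HERE>
# list your buckets via our gateway instead of Amazon's
aws s3 ls --endpoint-url https://radosgw.public.os.wwu.de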

Since the official s3cmd documentation is a bit spartan, and most online tutorials focus on commercial services with their own idiosyncrasies, here is a brief introduction that should get you going with our service:

S3 API endpoints

Your S3 client will need an S3 API endpoint to communicate with. Here is an overview of the ones we currently provide:

  • radosgw.public.os.wwu.de <- only ms1 (datacenter in Einsteinstraße)
    • This is our main location for storage
    • Unless you request a different location, this is where we will store your data
  • radosgw.public.os2.wwu.de <- only ms2 (datacenter at the castle)
    • This location is used mostly for backups from ms1
  • s3.uni-muenster.de <- replicated to both locations
    • If you have requested S3 storage to be replicated automatically to both locations, this is the S3 endpoint to use for accessing it
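
If you work with more than one of these locations, a convenient pattern is to keep one configuration file per endpoint (the configuration format is shown further down) and select it with s3cmd's -c option. The file names here are just examples:

# use the default configuration (~/.s3cfg), e.g. pointing at ms1
s3cmd ls
# use an alternative configuration pointing at the ms2 endpoint
s3cmd -c ~/.s3cfg-ms2 ls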

Using the S3 API

  1. Request S3 storage via our support queue at cloud@uni-muenster.de
  2. Install s3cmd
  • If you use Linux, s3cmd is probably available via your package manager (see the example commands after this list)
  • If you use macOS, there is a Homebrew formula
  • If you use Windows, there are many tools to choose from, some with a GUI. If you have no preference, we recommend s3cmd, as it is the most commonly used tool; you can either set up a local Python installation and use the release from GitHub, or install WSL and follow the Linux instructions.
  • There are, of course, other ways of installing s3cmd, such as downloading the current release directly from GitHub; pick whatever works for you
  3. (Optional but recommended) Configure s3cmd
  4. Utilise our S3 services
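
For reference, installation on common platforms typically looks like this (a sketch; package names can vary between distributions):

# Debian/Ubuntu
sudo apt install s3cmd
# Fedora
sudo dnf install s3cmd
# macOS with Homebrew
brew install s3cmd
# any platform with a Python installation, via PyPI
pip install s3cmd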

Using s3cmd

If you do not want to store a configuration locally, you can provide the necessary information via command line parameters like so:

# this command will create a new bucket to store objects in
s3cmd --host=radosgw.public.os.wwu.de --host-bucket=radosgw.public.os.wwu.de \
    --access_key=<ACCESS_KEY_GOES_HERE> --secret_key=<SECRET_KEY_GOES_HERE> \
    mb s3://example-bucket-name/
# this command will list the buckets available in your account
s3cmd --host=radosgw.public.os.wwu.de --host-bucket=radosgw.public.os.wwu.de \
    --access_key=<ACCESS_KEY_GOES_HERE> --secret_key=<SECRET_KEY_GOES_HERE> \
    ls

This can get tedious rather quickly. To avoid having to enter your credentials each time, you can store them in a configuration file at ~/.s3cfg. The following configuration works with our default location (ms1); adjust the host names if you are accessing one of the other endpoints:

[default]
access_key = <ACCESS_KEY_GOES_HERE>
host_base = radosgw.public.os.wwu.de
host_bucket = radosgw.public.os.wwu.de
secret_key = <SECRET_KEY_GOES_HERE>
check_ssl_certificate = True
check_ssl_hostname = True
delete_after = False
delete_after_fetch = False
delete_removed = False
use_https = True

This is just a subset of the available configuration options; refer to the official s3cmd documentation for more information.
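
If you prefer not to write this file by hand, s3cmd can also generate it interactively; enter the host names from above when it asks for the endpoint:

# interactively create ~/.s3cfg, prompting for keys and endpoint
s3cmd --configure

Either way, since the file contains your secret key in plain text, it is advisable to make it readable only by your own user:

# restrict the configuration file to your own user
chmod 600 ~/.s3cfg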

Now that we have configured our s3cmd client, we can shorten the commands drastically:

# create a bucket
s3cmd mb s3://example-bucket-name
# store a file
s3cmd put backup.tar.gz s3://example-bucket-name
# list all buckets
s3cmd ls
# list the files in our bucket
s3cmd ls s3://example-bucket-name

# list all possible s3cmd commands and parameters
s3cmd --help

And that is pretty much it. Try creating and deleting buckets, then storing, downloading, and deleting files to get familiar with the tool. If you want to configure ACLs or set expiration policies so that files are deleted automatically, you will find advanced tutorials online; all of this can be done with s3cmd.
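
As a starting point, the corresponding commands could look like this (a sketch; the bucket and file names are just examples):

# download a file from the bucket
s3cmd get s3://example-bucket-name/backup.tar.gz
# delete the file from the bucket
s3cmd del s3://example-bucket-name/backup.tar.gz
# make an object publicly readable
s3cmd setacl s3://example-bucket-name/backup.tar.gz --acl-public
# automatically delete objects in the bucket after 30 days
s3cmd expire s3://example-bucket-name --expiry-days=30
# remove the now empty bucket
s3cmd rb s3://example-bucket-name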