Host your own Mastodon Instance with Docker (Part 3)

My Mastodon instance has now been running without any issues for around two months. Still, there are some things I noticed during that time that I want to address. My biggest concern is that the machine Mastodon is running on will run out of disk space. That’s why in this post I’m going to describe how I moved the static file storage of my Mastodon instance to an S3-compatible object storage provider. If you’re interested in the whole process of setting up your own Mastodon instance, make sure to check out Part 1 and Part 2 of the setup too.

Overview

The issues I want to address in this blog post are the following:

  • Push notifications are currently not working on my instance.
  • Whenever I restart the services, my home timeline is empty afterwards.
  • As already mentioned, Mastodon is using more and more disk space every day.

Enable Push Notifications

The first thing I’m going to do is enable push notifications. I wouldn’t actually have missed that feature, because I disable push notifications for most apps anyway. But I noticed an error about it in the browser’s console that bothered me. The error reads “The VAPID public key is not set. You will not be able to receive Web Push Notifications.”. After some research I found that this VAPID public key is part of the Web Push protocol and that I have to configure both the public and the private key with the environment variables VAPID_PUBLIC_KEY and VAPID_PRIVATE_KEY. The Mastodon documentation also mentions how to generate them: rake mastodon:webpush:generate_vapid_key. This command is supposed to be executed in the Mastodon folder, so I can easily run it inside a temporary Docker container with the following command:

docker run --rm tootsuite/mastodon /bin/sh -c 'bundle exec rake mastodon:webpush:generate_vapid_key'

This prints out the keys that I need to configure:

VAPID_PRIVATE_KEY=IpxsNxc1ey4XinremJQksmBhhkRwsyUfYkbArnqWpUM=
VAPID_PUBLIC_KEY=BLY5D6FzCC2hTliuyx3FOO9WxEvIAk5W9i2nKLKWSvqSuRBRRiL74jBMhDv78XJysbG1QFFoPVbwDSMBus5yDKE=

So I add those keys to my .env file and also extend my docker-compose.base.yml file with the new variables:

version: '3.8'

services:
  mastodon-base:
    image: 'tootsuite/mastodon'
    environment:
      # ... existing configuration omitted ... #
      VAPID_PRIVATE_KEY: '${MASTODON_VAPID_PRIVATE_KEY}'
      VAPID_PUBLIC_KEY: '${MASTODON_VAPID_PUBLIC_KEY}'
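
For reference, the corresponding entries in my .env file simply hold the generated values:

MASTODON_VAPID_PRIVATE_KEY='IpxsNxc1ey4XinremJQksmBhhkRwsyUfYkbArnqWpUM='
MASTODON_VAPID_PUBLIC_KEY='BLY5D6FzCC2hTliuyx3FOO9WxEvIAk5W9i2nKLKWSvqSuRBRRiL74jBMhDv78XJysbG1QFFoPVbwDSMBus5yDKE='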

Afterwards I restart my services; the error disappears, and instead the browser now asks me whether I want to receive push notifications from Mastodon.

Persist the Redis Cache

The next thing to tackle is the empty home feed after restarts. I found an interesting article about the architecture of Mastodon. It turns out that the feeds are actually loaded from the Redis cache and not from the Postgres database. The background jobs processed by Sidekiq are also stored in Redis. I could have lived with an empty feed after restarts, but losing all the queued background jobs is probably not the best thing. So it makes sense to address that issue.

To persist the cache, I add an additional volume to the docker-compose.yml and mount it into the Redis container. I also adjust the command so that Redis writes its state to disk every 60 seconds if at least one key has changed:

version: '3.8'

services:
  # ... existing services configuration omitted ... #
  mastodon-redis:
    # ... existing configuration omitted ... #
    command: 'redis-server --save 60 1 --loglevel warning'
    volumes:
      - 'mastodon-redis-volume:/data'

volumes:
  # ... existing volumes configuration omitted ... #
  mastodon-redis-volume:
    external: true
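
Since the volume is declared as external, it has to exist before the services start. If it doesn’t yet, it can be created with a single command:

docker volume create mastodon-redis-volume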

That should be enough to make sure the cache is not completely lost on restart, at least for my small single-user instance.
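
To verify that the snapshotting actually works, you can ask Redis for the timestamp of its last successful save and check that the dump file shows up in the volume (a quick sketch, run from the directory containing the compose files):

# returns the Unix timestamp of the last successful save
docker compose exec mastodon-redis redis-cli LASTSAVE

# the dump file written by the save policy should appear here
docker compose exec mastodon-redis ls -lh /data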

Add an Object Storage Provider

The last task on my todo list is the ever-growing disk space that Mastodon is consuming. I already configured Mastodon to delete content after a few days to keep the used disk space small. But that doesn’t affect the cached avatar and header images of all the users the instance knows of (a few weeks ago a PR was merged that should clean up data of inactive users, I’m looking forward to seeing whether that will have some impact). This data currently makes up most of the used space. I didn’t check the exact numbers, but the avatars alone use over 2GB of the total 7GB of data the instance currently stores. The number might sound small, but as I run the instance on public cloud infrastructure, I like to keep the size of the used block storage small to reduce cost. The prices of my cloud provider Infomaniak are much lower than those of the big players like Azure or AWS, but block storage still costs much more than object storage, so I want to move that data to object storage since Mastodon supports that. With object storage I also no longer need to worry about running out of disk space, as it can basically grow infinitely (which I hope it does not 🙃).
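
To get a rough idea of where the space goes, the media volume can be inspected with a throwaway container (a sketch, assuming the media lives in the volume mastodon-volume that is also used for the migration later):

# print the size of each top-level folder in the media volume
docker run --rm -v mastodon-volume:/data alpine du -sh /data/*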

My cloud provider’s infrastructure is based on OpenStack, so its object storage uses Swift and not S3. Mastodon supports Swift too, but I found it easier to configure S3 storage. Luckily, Swift is compatible with S3, so I can integrate the object storage via S3. I’m not going to show how I configured the S3 bucket, as this probably depends on the provider that is used. If your provider supports S3 natively, it will probably work out of the box. For my provider’s Swift bucket, the key was to create EC2 credentials that can be used to access the bucket via the S3 API:

# install the client tools
sudo apt install python3-openstackclient
sudo apt install python3-swiftclient

# load credentials
source openstack.sh

# create a bucket and enable public read access
openstack container create mastodon
swift post -r '.r:*' mastodon

# create credentials for the S3 API
openstack ec2 credentials create
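
To make sure the generated credentials actually work against the S3 API, a quick check with the AWS CLI is possible (a sketch using the official Docker image, the same approach I use later for the migration):

# store the access key ID and secret access key in ~/.aws
docker run --rm -it -v ~/.aws:/root/.aws amazon/aws-cli configure

# list the bucket contents via the S3 API (still empty at this point)
docker run --rm -it -v ~/.aws:/root/.aws amazon/aws-cli s3 ls s3://mastodon --endpoint-url=https://s3.pub1.infomaniak.cloud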

With that done I have an S3 bucket and the credentials to access it (which should consist of an access key ID and a secret access key). I also had to figure out the URL of the S3 API, which is different from the one for the Swift API. In my case it is https://s3.pub1.infomaniak.cloud. With all that information I can configure my Mastodon instance by extending the service definition in docker-compose.base.yml with some environment variables:

version: '3.8'

services:
  mastodon-base:
    image: 'tootsuite/mastodon'
    environment:
      # ... existing configuration omitted ... #
      S3_ENABLED: 'true'
      S3_PROTOCOL: 'https'
      S3_ENDPOINT: '${MASTODON_S3_ENDPOINT}'
      S3_HOSTNAME: '${MASTODON_S3_HOSTNAME}'
      S3_REGION: '${MASTODON_S3_REGION}'
      S3_BUCKET: '${MASTODON_S3_BUCKET}'
      AWS_ACCESS_KEY_ID: '${MASTODON_S3_ACCESS_KEY_ID}'
      AWS_SECRET_ACCESS_KEY: '${MASTODON_S3_SECRET_ACCESS_KEY}'
      S3_ALIAS_HOST: '${MASTODON_S3_DOMAIN}'

As you can see, most of the values just reference variables in my .env file so I can easily configure the actual values. I also prefixed them with MASTODON_ so I know which service they belong to. My configuration looks like this:

MASTODON_S3_ENDPOINT='https://s3.pub1.infomaniak.cloud'
MASTODON_S3_HOSTNAME='s3.pub1.infomaniak.cloud'
MASTODON_S3_REGION='us-east-1'
MASTODON_S3_BUCKET='mastodon'
MASTODON_S3_ACCESS_KEY_ID='MyAccessKeyId'
MASTODON_S3_SECRET_ACCESS_KEY='MySecretAccessKey'
MASTODON_S3_DOMAIN='s3.pub1.infomaniak.cloud'

To check whether the configuration works, I start my instance and change my profile picture: the upload works, and I can see that the picture is added to the object storage. But the picture is not displayed correctly on Mastodon. The problem is that Mastodon just replaces the default domain social.raeffs.dev with the configured S3 domain s3.pub1.infomaniak.cloud and appends the same path, but the object storage needs more information to resolve the correct bucket, like the bucket name.

After digging through the documentation of my storage provider I found that I have a few different options to access the stored data:

https://s3.pub1.infomaniak.cloud/object/v1/AUTH_<project-id>/<bucket>/<path>
https://<project-id>.s3.pub1.infomaniak.cloud/<bucket>/<path>
https://<bucket>.<project-id>.s3.pub1.infomaniak.cloud/<path>

So changing MASTODON_S3_DOMAIN to mastodon.my-project-id.s3.pub1.infomaniak.cloud should fix the problem. And indeed, after that change the picture is displayed correctly.
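
The updated entry in my .env file then looks like this (with my-project-id as a placeholder for the actual project identifier):

MASTODON_S3_DOMAIN='mastodon.my-project-id.s3.pub1.infomaniak.cloud'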

Put the Object Storage behind a Proxy

I could have stopped at this point. But I wasn’t happy with that ugly URL for the object storage, and I was also not sure whether it is a good idea to expose the project identifier of the object storage to the public. At first I wanted to set up a DNS CNAME record that points to the object storage, but I had some problems with that. It worked when I pointed it to s3.pub1.infomaniak.cloud, but not when I used one of the alternative URLs. I assume the problem was that the hostname didn’t match anymore and thus the object storage didn’t know which bucket the request was pointing to. But pointing to s3.pub1.infomaniak.cloud directly would mean that I need to modify the path somehow and add the additional information. Since my instance runs behind Cloudflare, I could have set up some rules that take care of that, as done by @blasteh@m.blasteh.uk. But I wanted to stay independent of Cloudflare, and since I already use a reverse proxy in front of Mastodon, I decided to add the necessary configuration to serve the object storage data via that proxy too.

To simplify the configuration, I’m using a separate subdomain for the object storage data. Otherwise I would have to figure out which requests need to be served directly and which should be forwarded to the object storage. But with a separate domain, I can add the following configuration to my Caddyfile:

# ... existing configuration omitted ... #

{$MASTODON_S3_DOMAIN} {
  handle {
    rewrite * {$MASTODON_S3_PATH_PREFIX}{uri}
    reverse_proxy {$MASTODON_S3_ENDPOINT} {
      header_up -Host
    }
    header ?Cache-Control "public, max-age=315576000, immutable"
  }
}

This configuration rewrites the requested URL to include the path prefix that my storage provider requires to correctly serve the files. It also removes the Host header if it is set and then forwards the request to the object storage. Finally, it adds the Cache-Control header if it is not already set, to make sure the data is cached by clients, as the data Mastodon stores in object storage is immutable.
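
To illustrate what the rewrite does, using the same placeholders as the Caddyfile and <path> standing in for the requested file:

# incoming request
https://{$MASTODON_S3_DOMAIN}/<path>

# forwarded upstream as
{$MASTODON_S3_ENDPOINT}{$MASTODON_S3_PATH_PREFIX}/<path>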

Before I can start the instance I need to make sure that the environment variables are set correctly:

MASTODON_S3_DOMAIN='social-cdn.raeffs.dev'
MASTODON_S3_PATH_PREFIX='/object/v1/AUTH_my-project-id/mastodon'

I also add the variables to the service definition in my docker-compose.yml file:

version: '3.8'

services:
  # ... existing services configuration omitted ... #
  mastodon-proxy:
    # ... existing configuration omitted ... #
    environment:
      MASTODON_DOMAIN: '${MASTODON_DOMAIN}'
      MASTODON_S3_DOMAIN: '${MASTODON_S3_DOMAIN}'
      MASTODON_S3_ENDPOINT: '${MASTODON_S3_ENDPOINT}'
      MASTODON_S3_PATH_PREFIX: '${MASTODON_S3_PATH_PREFIX}'

volumes:
  # ... existing volumes configuration omitted ... #
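
After restarting the proxy, a quick check with curl should show that an object is served via the new domain and carries the cache header (the path being a placeholder for an object that already exists in the bucket):

curl -I 'https://social-cdn.raeffs.dev/<path-to-some-object>'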

Migrate Existing Data

Now that my Mastodon instance is running backed by object storage, I have to move all the existing data to the new storage. Luckily, it turns out that this is an easy task with the AWS CLI. And even better, the CLI is available as an official Docker image too, so I don’t even need to install anything to move my data. I can simply mount the Docker volume with the stored data into a container running the CLI and execute the following commands:

docker run --rm -it -v ~/.aws:/root/.aws amazon/aws-cli configure
docker run --rm -it -v ~/.aws:/root/.aws -v mastodon-volume:/data amazon/aws-cli s3 sync /data s3://mastodon/ --endpoint-url=https://s3.pub1.infomaniak.cloud
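
If you want to see what would be copied before actually transferring anything, the AWS CLI supports a dry run; just append --dryrun to the sync command:

docker run --rm -it -v ~/.aws:/root/.aws -v mastodon-volume:/data amazon/aws-cli s3 sync /data s3://mastodon/ --endpoint-url=https://s3.pub1.infomaniak.cloud --dryrun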

After waiting for almost an hour, the command completed successfully and I could verify that all the links on my Mastodon instance were working again. Once I was sure everything worked, I removed the volume mount from the Mastodon service definition and deleted the Docker volume, as I don’t need it anymore. But I didn’t think about the provisioning script I made, which writes an empty file to that volume to make sure the provisioning is done only once. So I had to fix that too.

I was wondering whether there is a command to check if an account already exists. There isn’t, but it turns out that the command to approve an account works just as well: it fails if the account doesn’t exist but succeeds if the account exists, even if it is already approved.

With that knowledge I can adjust my provisioning script to use the approve command instead of an empty file:

#!/bin/bash

echo "Migrating database..."
bundle exec rake db:migrate

echo "Checking if provisioning is required..."
# approving the admin account succeeds if it exists, fails otherwise
if bin/tootctl accounts approve "$MASTODON_ADMIN_USERNAME"; then
    echo "Provisioning not required"
else
    echo "Provisioning mastodon..."

    bin/tootctl accounts create "$MASTODON_ADMIN_USERNAME" --email "$MASTODON_ADMIN_EMAIL" --confirmed --role Owner

    echo "Provisioning done"
fi

With that done I can start up my instance again.

Conclusion

By adding object storage to my Mastodon instance, I’m pretty confident that it will keep running smoothly in the future. I’m also planning to add some monitoring to keep everything under control. But at least for the moment I don’t need to worry anymore about running out of disk space.

If you want to host your own Mastodon instance too, you can find all the relevant files in my GitHub repository.