cloud-image-uploader SOP
Upload Cloud images to public clouds after they are built in Koji.
Source code: https://pagure.io/cloud-image-uploader
Contact Information
- Owner: Cloud SIG, Jeremy Cline (jcline)
- Contact: #cloud:fedoraproject.org (Matrix)
- Servers:
- Purpose: Upload Cloud images to public clouds.
Description
cloud-image-uploader is an AMQP message consumer (run via fedora-messaging consume) that processes Pungi compose messages published on the org.fedoraproject.*.pungi.compose.status.change AMQP topic. When a compose enters the FINISHED or FINISHED_INCOMPLETE state, the service downloads any images in the compose and uploads them to the relevant cloud provider.
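The filtering step described above can be sketched as follows. This is only an illustration, not the consumer's actual code: the function name should_process is hypothetical, and the assumption that the compose state is carried in a "status" key of the message body is not confirmed by this document.

```python
# Compose states that trigger an upload, as described in this SOP.
HANDLED_STATUSES = {"FINISHED", "FINISHED_INCOMPLETE"}


def should_process(topic: str, body: dict) -> bool:
    """Return True when a message looks like a finished Pungi compose.

    Hypothetical helper: assumes the state is in body["status"].
    """
    parts = topic.split(".")
    # org.fedoraproject.<env>.pungi.compose.status.change has seven words;
    # the third word is the wildcard (for example "prod" or "stg").
    topic_ok = (
        len(parts) == 7
        and parts[:2] == ["org", "fedoraproject"]
        and parts[3:] == ["pungi", "compose", "status", "change"]
    )
    return topic_ok and body.get("status") in HANDLED_STATUSES
```

With this sketch, a message on org.fedoraproject.prod.pungi.compose.status.change whose status is FINISHED is accepted, while one whose status is STARTED is ignored.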
The service does not accept any incoming connections and only depends on the RabbitMQ message broker and the relevant cloud provider’s APIs.
It requires a few gigabytes of temporary space to download the images before uploading them to the cloud provider. It is heavily I/O bound and the most computationally expensive thing it does is decompress the images.
General Configuration
The Fedora Ansible repository contains the OpenShift application definition. The playbook to create the OpenShift application is located at playbooks/openshift-apps/cloud-image-uploader.yml.
The Ansible playbook creates multiple fedora-messaging configuration files from the config.toml template. All application configuration is either in the fedora-messaging configuration file or in environment variables. The environment variables are used for secrets and vary based on which service the container handles.
The fedora-messaging configuration file in use by a container is defined in the FEDORA_MESSAGING_CONF environment variable.
Deploying
The OpenShift deployment consists of a single image and multiple containers using that image, one container for each content type (containers, azure, aws, and gcp). The only variations between the containers are the secrets volumes mounted, the secrets injected via environment variables, and the FEDORA_MESSAGING_CONF environment variable, which points to one of the fedora-messaging configurations in /etc/fedora-messaging/.
Staging
The staging BuildConfig builds a container from the main branch. You need to trigger a build manually, either from the web UI or the CLI.
Although composes are not done in staging, it’s still possible to test in staging manually. First, start a debug terminal to enter a running container. Next, find an AMQP message for a production compose in the FINISHED or FINISHED_INCOMPLETE state. You can trigger the fedora-messaging consumer to process the message by running:
$ FEDORA_MESSAGING_CONF=/etc/fedora-messaging/service-config.toml fedora-messaging reconsume <message-id>
Production
The production BuildConfig builds a container from the prod branch. Just like staging, you need to trigger a build manually. After deploying to staging, the main branch can be merged into the production branch to "promote" it:
$ git checkout prod && git merge --ff-only main
Azure
Images are uploaded whenever a compose contains vhd-compressed images.
Images are first uploaded to a container in the storage account and then imported into an Image Gallery.
Credentials for Azure are provided using environment variables and are discovered by the Azure Python SDK automatically.
Image Cleanup
Image clean-up is automated.
The storage account is configured to delete any blob in the container older than 1 week and should require no manual attention. Nothing in the container is required after the VHD is imported to the Image Gallery.
Images in the Gallery are cleaned up by the image uploader after a new image has been uploaded. For complete details on the image cleanup policy refer to the consumer code, but at the time of this writing the policy is as follows:
- Any image that has an end-of-life field that is in the past is removed.
- Only the latest 7 images that are marked as "excluded from latest = True" within an image definition are retained. When an image is marked as "excluded from latest = False", new virtual machines that don’t reference an explicit image version will boot using the newest image (following semver). All images are uploaded with "excluded from latest = True" and are only marked as "excluded from latest = False" after testing.
- Only the latest 7 images in the Rawhide image definitions are retained, regardless of whether they are marked "excluded from latest = False".
At the moment, testing and promotion to "excluded from latest = False" is a manual process, but in the future will be automated to happen regularly (weekly, perhaps).
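As a rough illustration of the retention policy listed above, the following sketch selects which image versions in a single image definition would be removed. It is not the consumer's implementation. The ImageVersion type, the versions_to_delete function, and the use of a version tuple to decide which images are "latest" are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple

KEEP = 7  # number of images retained, per the policy above


@dataclass(frozen=True)
class ImageVersion:
    """Hypothetical stand-in for one image version in an image definition."""

    version: Tuple[int, int, int]
    excluded_from_latest: bool
    end_of_life: Optional[datetime] = None


def versions_to_delete(
    versions: List[ImageVersion], now: datetime, rawhide: bool = False
) -> List[ImageVersion]:
    """Return the versions the policy would remove from one definition."""
    # Rule 1: anything past its end-of-life date is removed.
    expired = [
        v for v in versions if v.end_of_life is not None and v.end_of_life < now
    ]
    live = [v for v in versions if v not in expired]
    if rawhide:
        # Rule 3: Rawhide keeps the newest KEEP images regardless of the flag.
        candidates = live
    else:
        # Rule 2: only images still excluded from latest are trimmed.
        candidates = [v for v in live if v.excluded_from_latest]
    newest_first = sorted(candidates, key=lambda v: v.version, reverse=True)
    return expired + newest_first[KEEP:]
```

Note that in this sketch a promoted image ("excluded from latest = False") outside Rawhide is only ever removed once its end-of-life date has passed.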
Authentication
The following environment variables are used:
- AZURE_SUBSCRIPTION_ID - Identifies the subscription within an Azure tenant (our tenant only has 1).
- AZURE_CLIENT_ID - The application ID used during authentication.
- AZURE_SECRET - The application secret used during authentication.
- AZURE_TENANT - Identifies the Azure tenant.
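When debugging a container that fails to authenticate, it can help to confirm that all four variables are set. The following is a minimal sketch of such a check; the missing_azure_settings helper is hypothetical and is not part of the uploader.

```python
import os

# Variable names as listed in this SOP.
REQUIRED = (
    "AZURE_SUBSCRIPTION_ID",
    "AZURE_CLIENT_ID",
    "AZURE_SECRET",
    "AZURE_TENANT",
)


def missing_azure_settings(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED if not environ.get(name)]
```

Calling missing_azure_settings() in a debug terminal returns an empty list when every variable is present.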
If you have access to the Fedora Project tenant, these values are available in the web portal under the Microsoft Entra ID service in the "App registrations" tab. To manage things via the CLI, install it with dnf install azure-cli. All commands below assume you’ve logged in with az login.
There are two app registrations, fedora-cloud-image-uploader and fedora-cloud-image-uploader-staging. These were created by running:
$ az ad app create --display-name fedora-cloud-image-uploader
Authorization
Images are placed in two resource groups (containers for arbitrary resources). fedora-cloud-staging is used for the staging deployment, and fedora-cloud is used for the production deployment.
The app registrations are granted access to their respective resource group by assigning them a role on the resource group. The role definition can be seen with:
$ az role definition list --name "Image Uploader"
This role is then assigned to the app registration with:
$ az role assignment create --assignee "fedora-cloud-image-uploader" \
    --role "Image Uploader" \
    --scope "/subscriptions/{subscription_id}/resourceGroups/fedora-cloud"
In the event that additional permissions are required, the role can be updated to include them.
Credential rotation
At the moment, credentials are set to expire and will need to be periodically rotated. To do so via the CLI:
$ az ad app list -o table
# Find the application to issue new secrets for and set CLIENT_ID to its "Id" field
$ touch azure_secret
$ chmod 600 azure_secret
$ SECRET_NAME="Some useful name for the secret"
$ az ad app credential reset --id "$CLIENT_ID" --append --display-name "$SECRET_NAME" --years 1 --query password --output tsv > azure_secret
AWS
AWS images are uploaded by this service to the Fedora AWS account. Cleanup is handled by the general Fedora AWS resource cleaner, which uses the tags applied to a resource to determine when to remove it.
Images are first uploaded to the fedora-s3-bucket-fedimg S3 bucket, and then imported as EC2 snapshots to the region configured in the base_region setting of the consumer_config.aws section. The snapshot is then replicated to all the regions listed in the ami_regions setting.
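To make the relationship between the two settings concrete, here is a small sketch that computes which regions would receive a replicated copy. The replication_targets helper and the region names in the usage note are illustrative only, and skipping the base region (because the snapshot already exists there) is an assumption rather than documented behavior.

```python
def replication_targets(config: dict) -> list:
    """Return the regions a snapshot would be copied to.

    Hypothetical helper: expects the parsed fedora-messaging configuration
    with a consumer_config.aws section containing base_region and ami_regions.
    """
    aws = config["consumer_config"]["aws"]
    base = aws["base_region"]
    targets = []
    for region in aws["ami_regions"]:
        # Assumption: the base region already holds the imported snapshot,
        # so it is skipped; duplicates in the list are ignored as well.
        if region != base and region not in targets:
            targets.append(region)
    return targets
```

For example, with base_region set to "us-east-1" and ami_regions set to ["us-east-1", "eu-west-1"], the sketch returns only "eu-west-1".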
Containers
Containers are pushed to the registry.fedoraproject.org and quay.io/fedora/ registries. These include the Fedora Toolbox, Fedora and Fedora Minimal, ELN, and Atomic Desktop images.
Google Compute Engine
Google Compute Engine images are published under the fedora-cloud project in Google Cloud Platform. The flow is similar to other clouds: the tarball is uploaded to the fedora-cloud-image-upload bucket and then imported as a machine image. The bucket has a lifecycle configuration that deletes an object 3 days after it has been created, so old tarballs are cleaned up automatically after being imported.
Credentials
The service uses the fedora-image-uploader@fedora-cloud.iam.gserviceaccount.com service account. New credentials can be issued for that account under the IAM & Admin panel, although the current credentials do not expire.
Permissions
The service account is assigned the Fedora Image Uploader role which should grant it the minimal permissions required to manage images. The current permission list is as follows:
- compute.globalOperations.get
- compute.images.create
- compute.images.createTagBinding
- compute.images.delete
- compute.images.deleteTagBinding
- compute.images.deprecate
- compute.images.get
- compute.images.getFromFamily
- compute.images.list
- compute.images.listEffectiveTags
- compute.images.listTagBindings
- compute.images.setLabels
- compute.images.update
- compute.images.useReadOnly
- resourcemanager.projects.get
In the event that the application requires new permissions, edit the Fedora Image Uploader role to include the new permissions.
Cleanup
Machine images are labeled to include their end-of-life date. After this date is reached, the image is removed. Images are uploaded as "deprecated" by default. Every two weeks an image in an Image Family is promoted and marked as not deprecated. Deprecated images are removed after two weeks.
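The removal rules above can be sketched as a single predicate. This is a simplified illustration, not the uploader's code: should_remove is a hypothetical name, and measuring the two-week window for deprecated images from the image's creation date is an assumption, since this document does not say when the window starts.

```python
from datetime import date, timedelta
from typing import Optional

# Grace period for deprecated images, per the policy above.
DEPRECATED_GRACE = timedelta(weeks=2)


def should_remove(
    today: date,
    created: date,
    deprecated: bool,
    end_of_life: Optional[date] = None,
) -> bool:
    """Return True when an image would be removed under the stated rules."""
    # An image past its labeled end-of-life date is always removed.
    if end_of_life is not None and today > end_of_life:
        return True
    # Assumption: the two weeks are counted from the creation date.
    return deprecated and (today - created) > DEPRECATED_GRACE
```

Under these assumptions, a deprecated image created 19 days ago is removed, while a promoted image is kept until its end-of-life date passes.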