SOP: Installation/Configuration of OCP4 on Fedora Infra
Install
To install OCP4 on Fedora Infra, one must be a member of the following groups:
- sysadmin-openshift
- sysadmin-noc
Prerequisites
Visit the OpenShift Console and download the following OpenShift tools:
- A Red Hat Access account is required
- OC client tools Here
- OC installation tool Here
- Ensure the downloaded tools are available on the PATH
- A valid OCP4 subscription is required to complete the installation configuration; by default you have a 60 day trial.
- Take a copy of your pull secret file; you will need to put this in the install-config.yaml file in the next step.
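For example, on a Linux machine the downloaded archives can be unpacked somewhere on the PATH. This is only a sketch; the exact archive names vary by release and platform:
# Sketch only: archive names differ per release/platform.
tar -xzf openshift-client-linux.tar.gz -C /usr/local/bin oc kubectl
tar -xzf openshift-install-linux.tar.gz -C /usr/local/bin openshift-install
# Confirm both tools are found on the PATH:
oc version --client
openshift-install version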
Generate install-config.yaml file
We must create an install-config.yaml file. Use the following example for inspiration, or refer to the documentation[1] for more detailed information/explanations.
apiVersion: v1
baseDomain: stg.fedoraproject.org
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: 'ocp'
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
fips: false
pullSecret: 'PUT PULL SECRET HERE'
sshKey: 'PUT SSH PUBLIC KEY HERE kubeadmin@core'
- Log in to the os-control01 corresponding to the environment
- Make a directory to hold the installation files: mkdir ocp4-<ENV>
- Enter this newly created directory: cd ocp4-<ENV>
- Generate a fresh SSH keypair: ssh-keygen -f ./ocp4-<ENV>-ssh
- Create an ssh directory and place this keypair into it.
- Put the contents of the public key in the sshKey value in the install-config.yaml file
- Put the contents of your Pull Secret in the pullSecret value in the install-config.yaml
- Take a backup of the install-config.yaml to install-config.yaml.bak; running the next steps consumes this file, and having a backup allows you to recover from mistakes quickly. These steps are recapped in the sketch below.
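A minimal recap of the steps above as a shell session. Paths, key names and the empty passphrase are illustrative assumptions, not fixed values from this SOP:
# Run on the os-control01 for the target environment.
mkdir ocp4-<ENV> && cd ocp4-<ENV>
ssh-keygen -f ./ocp4-<ENV>-ssh -N ''   # -N '' assumes an empty passphrase
mkdir ssh && mv ocp4-<ENV>-ssh ocp4-<ENV>-ssh.pub ssh/
# Paste ssh/ocp4-<ENV>-ssh.pub into sshKey and your pull secret into pullSecret
# in install-config.yaml, then keep a backup copy:
cp install-config.yaml install-config.yaml.bak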
Create the Installation Files
Using the openshift-install tool we can generate the installation files. Make sure that the install-config.yaml file is in the /path/to/ocp4-<ENV> location before attempting the next steps.
Create the Manifest Files
The manifest files are human readable; at this stage you can add any customisations required before the installation begins.
- Create the manifests: openshift-install create manifests --dir=/path/to/ocp4-<ENV>
- All configuration for RHCOS must be done via MachineConfigs. If there is known configuration which must be performed, such as NTP, copy the MachineConfigs into the /path/to/ocp4-<ENV>/openshift directory now (see the sketch after this list).
- At this point, edit /path/to/ocp4-<ENV>/manifests/cluster-scheduler-02-config.yml and change the mastersSchedulable value to false.
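As a sketch, an NTP customisation can be dropped in as a MachineConfig before the ignition files are created, and the scheduler manifest can be edited in place. The file name, ignition version and base64 payload below are placeholders rather than the exact values used in Fedora Infra; check the OpenShift documentation for your release:
# Placeholder example of a chrony/NTP MachineConfig for worker nodes.
cat > /path/to/ocp4-<ENV>/openshift/99-worker-chrony.yaml <<'EOF'
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-chrony
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - path: /etc/chrony.conf
          mode: 420
          overwrite: true
          contents:
            source: data:text/plain;charset=utf-8;base64,<BASE64_ENCODED_CHRONY_CONF>
EOF

# Set mastersSchedulable to false (verify the manifest contents before editing in place):
sed -i 's/mastersSchedulable: true/mastersSchedulable: false/' \
  /path/to/ocp4-<ENV>/manifests/cluster-scheduler-02-config.yml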
Create the Ignition Files
The ignition files are generated from the manifests and MachineConfig files, and are the final installation files for the three roles: bootstrap, master and worker. In Fedora we prefer not to use the term master, so we have renamed this role to controlplane.
- Create the ignition files: openshift-install create ignition-configs --dir=/path/to/ocp4-<ENV>
- At this point you should have the following three files: bootstrap.ign, master.ign and worker.ign.
- Rename master.ign to controlplane.ign (see the sketch after this list).
- A directory, auth, has also been created. It contains two files: kubeadmin-password and kubeconfig. These allow cluster-admin access to the cluster.
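The commands above, end to end, as a sketch (run from the directory holding install-config.yaml; the listing comment shows the expected result, not literal output):
openshift-install create ignition-configs --dir=/path/to/ocp4-<ENV>
mv /path/to/ocp4-<ENV>/master.ign /path/to/ocp4-<ENV>/controlplane.ign
ls /path/to/ocp4-<ENV>
# bootstrap.ign  controlplane.ign  worker.ign  auth/  ...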
Copy the Ignition files to the batcave01 server
On the batcave01, at the following location: /srv/web/infra/bigfiles/openshiftboot/:
- Create a directory to match the environment: mkdir /srv/web/infra/bigfiles/openshiftboot/ocp4-<ENV>
- Copy the ignition files, the ssh files and the auth files generated in previous steps to this newly created directory. Users with sysadmin-openshift should have the necessary permissions to write to this location.
- When this is complete it should look like the following:
├── <ENV>
│   ├── auth
│   │   ├── kubeadmin-password
│   │   └── kubeconfig
│   ├── bootstrap.ign
│   ├── controlplane.ign
│   ├── ssh
│   │   ├── id_rsa
│   │   └── id_rsa.pub
│   └── worker.ign
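One way to copy the files across is rsync over ssh from the machine where they were generated. The host alias and destination directory name here are illustrative; match them to the directory created above:
# Illustrative only: copy the generated files into the bigfiles location on batcave01.
rsync -av /path/to/ocp4-<ENV>/{bootstrap.ign,controlplane.ign,worker.ign,ssh,auth} \
  batcave01:/srv/web/infra/bigfiles/openshiftboot/<ENV>/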
Update the ansible inventory
The ansible inventory, host vars and group vars should be updated with the new hosts' information.
For inspiration see the following PR where we added the ocp4 production changes.
Update the DNS/DHCP configuration
The DNS and DHCP configuration must also be updated. This PR contains the necessary DHCP changes for prod, and these can be done in ansible.
However, the DNS changes may only be performed by sysadmin-main. For this reason, any DNS changes must go via a patch snippet which is emailed to the infrastructure@lists.fedoraproject.org mailing list for review and approval. This process may take several days.
Generate the TLS Certs for the new environment
This is beyond the scope of this SOP; the best option is to create a ticket for Fedora Infra to request that these certs are created and made available for use. The following certs should be available:
- *.apps.<ENV>.fedoraproject.org
- api.<ENV>.fedoraproject.org
- api-int.<ENV>.fedoraproject.org
Run the Playbooks
There are a number of playbooks that need to be run. Once all the previous steps have been completed, we can run these playbooks from the batcave01 instance.
- sudo rbac-playbook groups/noc.yml -t 'tftp_server,dhcp_server'
- sudo rbac-playbook groups/proxies.yml -t 'haproxy,httpd,iptables'
Baremetal / VMs
Depending on whether some of the nodes are VMs or baremetal, different tags should be supplied to the following playbook. If the entire cluster is baremetal you can skip the kvm_deploy tag entirely (see the example after this step). If there are VMs used for some of the roles, make sure to leave it in.
- sudo rbac-playbook manual/ocp4-place-ignitionfiles.yml -t "ignition,repo,kvm_deploy"
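For example, for an all-baremetal cluster the same playbook would be run without the kvm_deploy tag:
sudo rbac-playbook manual/ocp4-place-ignitionfiles.yml -t "ignition,repo"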
Baremetal
At this point we can switch on the baremetal nodes and begin the PXE/UEFI boot process. The baremetal nodes should, via DHCP/DNS, have the configuration necessary to reach out to the noc01.iad2.fedoraproject.org server and retrieve the UEFI boot configuration via PXE.
Once booted up, you should visit the management console for this node, and manually choose the UEFI configuration appropriate for its role.
The node will begin booting, and during the boot process it will reach out to the os-control01 instance specific to the <ENV> to retrieve the ignition file appropriate to its role.
The system will then become autonomous; it will install and potentially reboot multiple times as updates are retrieved and applied.
Eventually you will be presented with an SSH login prompt, which should show the correct hostname (e.g. ocp01) matching what is in the DNS configuration.
Bootstrapping completed
When the control plane is up, we should see all controlplane instances available in the appropriate haproxy dashboard (e.g. haproxy).
At this time we should take the bootstrap instance out of the haproxy load balancer.
- Make the necessary changes to ansible at: ansible/roles/haproxy/templates/haproxy.cfg
- Once merged, run the following playbook once more: sudo rbac-playbook groups/proxies.yml -t 'haproxy'
Begin installation of the worker nodes
Follow the same processes listed in the Baremetal section above to switch on the worker nodes and begin installation.
Configure the os-control01 to authenticate with the new OCP4 cluster
Copy the kubeconfig to ~root/.kube/config on the os-control01 instance.
This will allow the root user to automatically be authenticated to the new OCP4 cluster with cluster-admin privileges.
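A minimal sketch of that step, run as root on the os-control01 instance; the source path assumes the directory layout used earlier in this SOP:
mkdir -p /root/.kube
cp /path/to/ocp4-<ENV>/auth/kubeconfig /root/.kube/config
chmod 600 /root/.kube/config
# Verify cluster-admin access:
oc whoami
oc get clusterversion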
Accept Node CSR Certs
To accept the worker/compute nodes into the cluster we need to accept their CSR certs.
List the CSR certs. The ones we’re interested in will show as pending:
oc get csr
To accept all the OCP4 node CSRs in a one liner do the following:
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve
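Note that worker nodes typically request a second (kubelet serving) certificate shortly after the first approval, so re-check and, if needed, re-run the approval one-liner until nothing remains Pending:
# Re-check until no CSRs show as Pending, re-running the approval command as needed.
oc get csr | grep Pending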
Once completed, oc get nodes should look something like this:
[root@os-control01 ocp4][STG]= oc get nodes
NAME                                      STATUS   ROLES    AGE   VERSION
ocp01.ocp.stg.iad2.fedoraproject.org      Ready    master   34d   v1.21.1+9807387
ocp02.ocp.stg.iad2.fedoraproject.org      Ready    master   34d   v1.21.1+9807387
ocp03.ocp.stg.iad2.fedoraproject.org      Ready    master   34d   v1.21.1+9807387
worker01.ocp.stg.iad2.fedoraproject.org   Ready    worker   21d   v1.21.1+9807387
worker02.ocp.stg.iad2.fedoraproject.org   Ready    worker   20d   v1.21.1+9807387
worker03.ocp.stg.iad2.fedoraproject.org   Ready    worker   20d   v1.21.1+9807387
worker04.ocp.stg.iad2.fedoraproject.org   Ready    worker   34d   v1.21.1+9807387
worker05.ocp.stg.iad2.fedoraproject.org   Ready    worker   34d   v1.21.1+9807387
At this point the cluster is basically up and running.
Follow on SOPs
Several other SOPs should be followed to perform the post installation configuration on the cluster.