SOP Installation/Configuration of OCP4 on Fedora Infra
Install
To install OCP4 on Fedora Infra, one must be a member of the following groups:
- sysadmin-openshift
- sysadmin-noc
Prerequisites
Visit the OpenShift Console and download the following OpenShift tools:
- A Red Hat Access account is required.
- OC client tools: Here
- OC installation tool: Here
- Ensure the downloaded tools are available on the PATH (see the verification sketch after this list).
- A valid OCP4 subscription is required to complete the installation configuration; by default you have a 60 day trial.
- Take a copy of your pull secret file; you will need to put this in the install-config.yaml file in the next step.
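To confirm the tools are actually on the PATH, a minimal check such as the following should print version information (exact output will vary by release):
# Verify the client and installer are found and report their versions
oc version --client
openshift-install version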
Generate install-config.yaml file
We must create an install-config.yaml file. Use the following example for inspiration, or refer to the documentation[1] for more detailed information and explanations.
apiVersion: v1
baseDomain: stg.fedoraproject.org
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: 'ocp'
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
fips: false
pullSecret: 'PUT PULL SECRET HERE'
sshKey: 'PUT SSH PUBLIC KEY HERE kubeadmin@core'
- Log in to the os-control01 instance corresponding with the environment.
- Make a directory to hold the installation files: mkdir ocp4-<ENV>
- Enter this newly created directory: cd ocp4-<ENV>
- Generate a fresh SSH keypair: ssh-keygen -f ./ocp4-<ENV>-ssh
- Create a ssh directory and place this keypair into it.
- Put the contents of the public key in the sshKey value in the install-config.yaml file.
- Put the contents of your Pull Secret in the pullSecret value in the install-config.yaml file.
- Take a backup of the install-config.yaml to install-config.yaml.bak, as running the next steps consumes this file; having a backup allows you to recover from mistakes quickly. A consolidated sketch of these steps follows this list.
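The steps above, consolidated into a single shell sketch (this assumes an unattended keygen with no passphrase via -N ''; adjust to local policy):
# Working directory for the installation files
mkdir ocp4-<ENV> && cd ocp4-<ENV>

# Generate the keypair, then keep it under a dedicated ssh directory
ssh-keygen -f ./ocp4-<ENV>-ssh -N ''
mkdir ssh && mv ./ocp4-<ENV>-ssh ./ocp4-<ENV>-ssh.pub ssh/

# Keep a backup, as the next steps consume install-config.yaml
cp install-config.yaml install-config.yaml.bak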
Create the Installation Files
Using the openshift-install tool we can generate the installation files. Make sure that the install-config.yaml file is in the /path/to/ocp4-<ENV> location before attempting the next steps.
Create the Manifest Files
The manifest files are human readable; at this stage you can make any customisations required before the installation begins.
- Create the manifests: openshift-install create manifests --dir=/path/to/ocp4-<ENV>
- All configuration for RHCOS must be done via MachineConfigs. If there is known configuration which must be performed, such as NTP, you can copy the MachineConfigs into the /path/to/ocp4-<ENV>/openshift directory now.
- The following step should be performed at this point: edit /path/to/ocp4-<ENV>/manifests/cluster-scheduler-02-config.yml and change the mastersSchedulable value to false (see the sketch after this list).
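A minimal sketch of that edit, assuming the file carries the usual mastersSchedulable: true default; verify the result with grep rather than trusting the substitution blindly:
# Flip mastersSchedulable from true to false in place, then review the change
sed -i 's/mastersSchedulable: true/mastersSchedulable: false/' \
  /path/to/ocp4-<ENV>/manifests/cluster-scheduler-02-config.yml
grep mastersSchedulable /path/to/ocp4-<ENV>/manifests/cluster-scheduler-02-config.yml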
Create the Ignition Files
The ignition files are generated from the manifests and MachineConfig files and are the final installation files for the three roles: bootstrap, master and worker. In Fedora we prefer not to use the term master here; we have renamed this role to controlplane.
- Create the ignition files: openshift-install create ignition-configs --dir=/path/to/ocp4-<ENV>
- At this point you should have the following three files: bootstrap.ign, master.ign and worker.ign.
- Rename the master.ign to controlplane.ign (a short sketch follows this list).
- A directory has been created, auth. This contains two files: kubeadmin-password and kubeconfig. These allow cluster-admin access to the cluster.
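A short sketch of the rename and a quick check of the expected layout (assuming you are inside /path/to/ocp4-<ENV>):
# Fedora prefers the controlplane naming for this role
mv master.ign controlplane.ign

# Expect bootstrap.ign, controlplane.ign, worker.ign and the auth/ directory
ls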
Copy the Ignition files to the batcave01 server
On batcave01, at the following location: /srv/web/infra/bigfiles/openshiftboot/:
- Create a directory to match the environment: mkdir /srv/web/infra/bigfiles/openshiftboot/ocp4-<ENV>
- Copy the ignition files, the ssh files and the auth files generated in previous steps to this newly created directory. Users with sysadmin-openshift should have the necessary permissions to write to this location (a copy sketch follows the directory tree below).
- When this is complete it should look like the following:
├── <ENV>
│ ├── auth
│ │ ├── kubeadmin-password
│ │ └── kubeconfig
│ ├── bootstrap.ign
│ ├── controlplane.ign
│ ├── ssh
│ │ ├── id_rsa
│ │ └── id_rsa.pub
│ └── worker.ign
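One way to push the files across, as a hedged sketch: it assumes direct SSH access from the working host to batcave01 and that rsync is installed on both ends; adjust the hostname and destination directory to match the layout shown above.
# From the installation working directory, copy the artifacts to batcave01
rsync -av auth ssh bootstrap.ign controlplane.ign worker.ign \
  batcave01:/srv/web/infra/bigfiles/openshiftboot/<ENV>/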
Update the ansible inventory
The ansible inventory, hostvars and group vars should be updated with the new hosts' information.
For inspiration see the following PR where we added the ocp4 production changes.
Update the DNS/DHCP configuration
The DNS and DHCP configuration must also be updated. This PR contains the necessary DHCP changes for prod, which can be done in ansible.
However, the DNS changes may only be performed by sysadmin-main. For this reason any DNS changes must go via a patch snippet which is emailed to the infrastructure@lists.fedoraproject.org mailing list for review and approval. This process may take several days.
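A hedged example of producing such a patch snippet, assuming the DNS configuration lives in a local git checkout:
# After editing the DNS zone files locally, capture the change as a patch
git add -A
git diff --cached > dns-ocp4-<ENV>.patch
# Attach dns-ocp4-<ENV>.patch to a mail to infrastructure@lists.fedoraproject.org for review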
Generate the TLS Certs for the new environment
This is beyond the scope of this SOP; the best option is to create a ticket for Fedora Infra to request that these certs are created and made available for use. The following certs should be available (a verification sketch follows the list):
- *.apps.<ENV>.fedoraproject.org
- api.<ENV>.fedoraproject.org
- api-int.<ENV>.fedoraproject.org
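Once the certs are in place, a quick hedged check such as the following can confirm the subject and expiry dates (it assumes the API endpoint listens on the standard OpenShift port 6443):
# Inspect the certificate presented on the API endpoint
openssl s_client -connect api.<ENV>.fedoraproject.org:6443 \
  -servername api.<ENV>.fedoraproject.org </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -dates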
Run the Playbooks
There are a number of playbooks that need to be run. Once all the previous steps have been completed, we can run these playbooks from the batcave01 instance.
- sudo rbac-playbook groups/noc.yml -t 'tftp_server,dhcp_server'
- sudo rbac-playbook groups/proxies.yml -t 'haproxy,httpd,iptables'
Baremetal / VMs
Depending on whether some of the nodes are VMs or baremetal, different tags should be supplied to the following playbook. If the entire cluster is baremetal you can skip the kvm_deploy tag entirely (see the example after the command below).
If there are VMs used for some of the roles, make sure to leave it in.
- sudo rbac-playbook manual/ocp4-place-ignitionfiles.yml -t "ignition,repo,kvm_deploy"
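For example, on an all-baremetal cluster the same playbook would be run without the kvm_deploy tag:
sudo rbac-playbook manual/ocp4-place-ignitionfiles.yml -t "ignition,repo"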
Baremetal
At this point we can switch on the baremetal nodes and begin the PXE/UEFI boot process. The baremetal nodes should, via DHCP/DNS, have the configuration necessary to reach the noc01.rdu3.fedoraproject.org server and retrieve the UEFI boot configuration via PXE.
Once booted up, you should visit the management console for this node, and manually choose the UEFI configuration appropriate for its role.
The node will begin booting, and during the boot process it will reach out to the os-control01 instance specific to the <ENV> to retrieve the ignition file appropriate to its role.
The system will then become autonomous; it will install and potentially reboot multiple times as updates are retrieved and applied.
Eventually you will be presented with an SSH login prompt showing the correct hostname, e.g. ocp01, to match what is in the DNS configuration.
Bootstrapping completed
When the control plane is up, we should see all controlplane instances available in the appropriate haproxy dashboard, e.g. haproxy.
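Progress can also be followed with the installer itself; a minimal sketch, assuming the installation directory created earlier:
# Blocks until the bootstrap phase finishes and reports when the bootstrap node can be removed
openshift-install wait-for bootstrap-complete --dir=/path/to/ocp4-<ENV> --log-level=info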
At this time we should take the bootstrap instance out of the haproxy load balancer.
- Make the necessary changes to ansible at: ansible/roles/haproxy/templates/haproxy.cfg
- Once merged, run the following playbook once more: sudo rbac-playbook groups/proxies.yml -t 'haproxy'
Begin installation of the worker nodes
Follow the same processes listed in the Baremetal section above to switch on the worker nodes and begin installation.
Configure the os-control01 to authenticate with the new OCP4 cluster
Copy the kubeconfig to ~root/.kube/config on the os-control01 instance.
This will allow the root user to automatically be authenticated to the new OCP4 cluster with cluster-admin privileges.
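A minimal sketch of that copy, assuming the installation files from the earlier steps are still present:
mkdir -p /root/.kube
cp /path/to/ocp4-<ENV>/auth/kubeconfig /root/.kube/config

# Confirm cluster-admin access works
oc whoami
oc get nodes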
Accept Node CSR Certs
To accept the worker/compute nodes into the cluster we need to accept their CSR certs.
List the CSR certs. The ones we’re interested in will show as pending:
oc get csr
To accept all the OCP4 node CSRs in a one-liner, do the following:
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve
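Note that worker nodes typically raise a second round of CSRs (for their serving certificates) shortly after the first round is approved, so the one-liner may need to be run again; a quick re-check:
# Any remaining CSRs awaiting approval will still show a Pending condition
oc get csr | grep -i pending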
This should look something like this once completed:
[root@os-control01 ocp4][STG]= oc get nodes
NAME                                      STATUS   ROLES    AGE   VERSION
ocp01.ocp.stg.rdu3.fedoraproject.org      Ready    master   34d   v1.21.1+9807387
ocp02.ocp.stg.rdu3.fedoraproject.org      Ready    master   34d   v1.21.1+9807387
ocp03.ocp.stg.rdu3.fedoraproject.org      Ready    master   34d   v1.21.1+9807387
worker01.ocp.stg.rdu3.fedoraproject.org   Ready    worker   21d   v1.21.1+9807387
worker02.ocp.stg.rdu3.fedoraproject.org   Ready    worker   20d   v1.21.1+9807387
worker03.ocp.stg.rdu3.fedoraproject.org   Ready    worker   20d   v1.21.1+9807387
worker04.ocp.stg.rdu3.fedoraproject.org   Ready    worker   34d   v1.21.1+9807387
worker05.ocp.stg.rdu3.fedoraproject.org   Ready    worker   34d   v1.21.1+9807387
At this point the cluster is basically up and running.
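To verify, a final hedged check of overall cluster health can be run from os-control01:
# All cluster operators should eventually report Available=True and Degraded=False
oc get clusteroperators

# Optionally, let the installer confirm completion and print the console URL
openshift-install wait-for install-complete --dir=/path/to/ocp4-<ENV>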
Follow on SOPs
Several other SOPs should be followed to perform the post-installation configuration on the cluster.