Metal-Operator Architectural Description 
The metal-operator is a Kubernetes operator designed to manage bare metal servers within a Kubernetes environment. It automates the provisioning, configuration, and lifecycle management of physical servers by integrating them into Kubernetes using Custom Resource Definitions (CRDs) and controllers. The architecture promotes modularity, scalability, and flexibility, enabling seamless integration with various boot mechanisms and provisioning tools.
Architectural Diagram 
Key Components 
1. Custom Resource Definitions (CRDs) 
- Endpoint: Represents devices on the out-of-band management network, identified by MAC and IP addresses.
- BMC: Models Baseboard Management Controllers (BMCs), allowing interaction with server hardware.
- BMCSecret: Securely stores credentials required to access BMCs.
- Server: Represents physical servers, managing their state, power, and configurations.
- ServerClaim: Allows users to reserve servers by specifying desired configurations and boot images.
- ServerBootConfiguration: Signals the need to prepare the boot environment for a server.
- ServerMaintenance: Represents maintenance tasks for servers, such as BIOS updates or hardware repairs.
- BIOSSettings: Handles updating the BIOS settings of a physical server.
- BIOSSettingsSet: Handles creation of multiple BIOSSettings resources by selecting physical servers through labels.
- BIOSVersion: Handles upgrading the BIOS version of a physical server.
- BIOSVersionSet: Handles creation of multiple BIOSVersion resources by selecting physical servers through labels.
- BMCSettings: Handles updating the BMC settings on a physical server's Manager.
- BMCVersion: Handles upgrading the BMC version on a physical server's Manager.
- BMCVersionSet: Handles creation of multiple BMCVersion resources by selecting BMCs through labels.
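The label-based selection used by the Set resources can be illustrated with a minimal Go sketch. It is not the operator's implementation: the Server struct, the label keys, and both function names are hypothetical, and only equality-based matching is modeled (Kubernetes label selectors also support set-based expressions).

```go
package main

import (
	"fmt"
	"sort"
)

// Server is a hypothetical, simplified stand-in for the Server resource:
// only a name and its labels are modeled here.
type Server struct {
	Name   string
	Labels map[string]string
}

// matchesSelector reports whether labels contains every key/value pair in
// selector. An empty selector matches every server in this sketch.
func matchesSelector(selector, labels map[string]string) bool {
	for key, want := range selector {
		if got, ok := labels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

// selectServers returns the names of all servers matching selector,
// sorted so the result is deterministic.
func selectServers(servers []Server, selector map[string]string) []string {
	var names []string
	for _, s := range servers {
		if matchesSelector(selector, s.Labels) {
			names = append(names, s.Name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	// Example inventory; names and labels are made up for illustration.
	servers := []Server{
		{Name: "server-a", Labels: map[string]string{"vendor": "acme", "rack": "r1"}},
		{Name: "server-b", Labels: map[string]string{"vendor": "acme", "rack": "r2"}},
		{Name: "server-c", Labels: map[string]string{"vendor": "other", "rack": "r1"}},
	}
	// A Set resource would create one child resource per selected server.
	for _, name := range selectServers(servers, map[string]string{"vendor": "acme"}) {
		fmt.Println("would create BIOSSettings for", name)
	}
}
```

In this sketch a BIOSSettingsSet with the selector vendor=acme would fan out to server-a and server-b and skip server-c.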
2. Controllers 
- EndpointReconciler: Discovers devices on the out-of-band network by processing Endpoint resources. It uses a MAC Prefix Database to identify device types, vendors, protocols, and default credentials. When a BMC is detected, it creates corresponding BMC and BMCSecret resources.
- BMCReconciler: Manages BMC resources by connecting to BMC devices using credentials from BMCSecret. It retrieves hardware information, updates the BMC status, and detects managed servers, creating Server resources for them.
- ServerReconciler: Manages Server resources and their lifecycle states. During the Discovery phase, it interacts with BMCs and uses the metalprobe agent to collect in-band hardware information, updating the server's status. It handles power management, BIOS configurations, and transitions servers through various states (e.g., Initial, Discovery, Available, Reserved).
- ServerClaimReconciler: Handles ServerClaim resources, allowing users to reserve servers. Upon creation of a ServerClaim, it allocates an available server, transitions it to the Reserved state, and creates a ServerBootConfiguration. When the claim is deleted, it releases the server, transitioning it to the Cleanup state for sanitization.
- BIOSSettingsReconciler: Handles BIOSSettings resources. Updates the BIOS settings of a physical server.
- BiosSettingsSetReconciler: Handles BIOSSettingsSet resources. Updates the BIOS settings of several physical servers at once by selecting servers through labels.
- BiosVersionReconciler: Handles BIOSVersion resources. Upgrades the BIOS version of a physical server.
- BiosVersionSetReconciler: Handles BIOSVersionSet resources. Upgrades the BIOS version of several physical servers at once by selecting servers through labels.
- BMCSettingsReconciler: Handles BMCSettings resources. Updates the BMC settings on a physical server's Manager.
- BMCVersionReconciler: Handles BMCVersion resources. Upgrades the BMC version on a physical server's Manager.
- BMCVersionSetReconciler: Handles BMCVersionSet resources. Upgrades the BMC version of several BMCs at once by selecting BMCs through labels.
- Boot Operator (External Component): Monitors ServerBootConfiguration resources to prepare the boot environment (e.g., configuring DHCP, PXE servers). Once the boot environment is ready, it updates the ServerBootConfiguration status to Ready.
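The MAC Prefix Database lookup performed during endpoint discovery might look roughly like the following Go sketch. Everything in it is an assumption made for illustration: the DeviceInfo fields, the database entries, the longest-prefix-wins rule, and the function names are hypothetical, and default credentials are left out entirely.

```go
package main

import (
	"fmt"
	"strings"
)

// DeviceInfo is a hypothetical record holding part of what the MAC Prefix
// Database is described as providing (credentials are omitted here).
type DeviceInfo struct {
	Type     string
	Vendor   string
	Protocol string
}

// macPrefixDB maps a normalized MAC prefix (lowercase hex, no separators)
// to device information. All entries are made up for illustration.
var macPrefixDB = map[string]DeviceInfo{
	"aabbcc":   {Type: "bmc", Vendor: "ExampleVendor", Protocol: "Redfish"},
	"aabbccdd": {Type: "switch", Vendor: "ExampleVendor", Protocol: "SSH"},
}

// normalizeMAC lowercases a MAC address and strips common separators.
func normalizeMAC(mac string) string {
	replacer := strings.NewReplacer(":", "", "-", "", ".", "")
	return strings.ToLower(replacer.Replace(mac))
}

// lookupMACPrefix returns the entry whose prefix is the longest match for
// mac. Longest-prefix matching is an assumption of this sketch.
func lookupMACPrefix(mac string) (DeviceInfo, bool) {
	normalized := normalizeMAC(mac)
	for n := len(normalized); n > 0; n-- {
		if info, ok := macPrefixDB[normalized[:n]]; ok {
			return info, true
		}
	}
	return DeviceInfo{}, false
}

func main() {
	for _, mac := range []string{"AA:BB:CC:00:11:22", "aa-bb-cc-dd-00-01", "00:11:22:33:44:55"} {
		info, ok := lookupMACPrefix(mac)
		if !ok {
			fmt.Println(mac, "is not in the prefix database")
			continue
		}
		// In the workflow described above, a "bmc" match is what would
		// lead to BMC and BMCSecret resources being created.
		fmt.Printf("%s: type=%s vendor=%s protocol=%s\n", mac, info.Type, info.Vendor, info.Protocol)
	}
}
```

Normalizing the address before the lookup keeps the database independent of how a MAC address happens to be written on the Endpoint.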
Workflow Summary 
Discovery and Initialization:
- The EndpointReconciler discovers devices on the out-of-band network through Endpoint resources.
- BMCs are identified using the MAC Prefix Database, leading to the creation of BMC and BMCSecret resources.
- The BMCReconciler connects to BMCs, gathers hardware details, and creates Server resources for each managed server.
Server Discovery Phase:
- The ServerReconciler moves each server into the Discovery phase, interacting with its BMC and booting the server using a predefined ignition.
- The metalprobe agent runs on the servers, collecting detailed hardware information (e.g., network interfaces, storage devices) and reporting back to update the Server status.
Server Availability:
- Once discovery is complete, servers transition to the Available state, ready to be claimed.
 
Server Reservation and Boot Configuration:
- Users create ServerClaim resources to reserve servers, specifying desired OS images and ignition configurations.
- The ServerClaimReconciler allocates servers, transitions them to the Reserved state, and creates ServerBootConfiguration resources.
Boot Environment Preparation:
- External components (e.g., boot-operator) watch for ServerBootConfiguration resources and prepare the boot environment accordingly.
- Once the environment is ready, they update the ServerBootConfiguration status to Ready.
Server Power-On and Usage:
- The ServerReconciler detects the Ready status of the ServerBootConfiguration and powers on the server.
- The server boots using the specified image and ignition configuration.
 
Cleanup and Maintenance:
- When a ServerClaim is deleted, the server transitions to the Cleanup state.
- The ServerReconciler performs sanitization tasks (e.g., wiping disks, resetting configurations) before returning the server to the Available state.
- Servers can enter the Maintenance state for updates or repairs.
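The server lifecycle described in this workflow can be read as a small state machine. The Go sketch below encodes the transitions the text names; it is not the operator's code. The type and function names are hypothetical, and the edges into and out of Maintenance are an assumption, since the text only says that servers can enter that state.

```go
package main

import "fmt"

// ServerState is a hypothetical type naming the lifecycle states
// mentioned in the workflow.
type ServerState string

const (
	StateInitial     ServerState = "Initial"
	StateDiscovery   ServerState = "Discovery"
	StateAvailable   ServerState = "Available"
	StateReserved    ServerState = "Reserved"
	StateCleanup     ServerState = "Cleanup"
	StateMaintenance ServerState = "Maintenance"
)

// allowedTransitions lists, per state, the states a server may move to.
// The Maintenance edges (Available -> Maintenance -> Available) are an
// assumption of this sketch and are not stated in the text.
var allowedTransitions = map[ServerState][]ServerState{
	StateInitial:     {StateDiscovery},
	StateDiscovery:   {StateAvailable},
	StateAvailable:   {StateReserved, StateMaintenance},
	StateReserved:    {StateCleanup},
	StateCleanup:     {StateAvailable},
	StateMaintenance: {StateAvailable},
}

// transition returns nil if moving from one state to another is allowed
// by the table above, and a descriptive error otherwise.
func transition(from, to ServerState) error {
	for _, next := range allowedTransitions[from] {
		if next == to {
			return nil
		}
	}
	return fmt.Errorf("transition %s -> %s is not allowed", from, to)
}

func main() {
	// Walk one server through discovery, a claim, and its release.
	path := []ServerState{
		StateInitial, StateDiscovery, StateAvailable,
		StateReserved, StateCleanup, StateAvailable,
	}
	for i := 0; i+1 < len(path); i++ {
		if err := transition(path[i], path[i+1]); err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("%s -> %s\n", path[i], path[i+1])
	}
}
```

Keeping the transitions in a single table makes it easy to see that a reserved server cannot return to Available without passing through Cleanup.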
 
Architectural Benefits 
- Modularity: Separation of concerns allows for flexible integration with various boot mechanisms and provisioning tools (e.g., OpenStack Ironic, custom solutions).
 - Scalability: Automates the management of large numbers of servers through Kubernetes CRDs and controllers.
 - Extensibility: Supports customization through additional CRDs and operators, enabling adaptation to specific infrastructure needs.
 - Security: Manages sensitive information like BMC credentials using Kubernetes Secrets and enforces access control via RBAC policies.
 - Automation: Streamlines hardware provisioning, configuration, and lifecycle management, reducing manual intervention and potential errors.