Servers
The Server Custom Resource Definition (CRD) represents a bare metal server. It manages the state and lifecycle of physical servers, enabling automated hardware management tasks such as power control, BIOS configuration, and firmware updates. Interaction with a Server resource is facilitated through its associated Baseboard Management Controller (BMC), either by referencing a BMC resource or by providing direct BMC configuration.
Example Server Resource
apiVersion: metal.ironcore.dev/v1alpha1
kind: Server
metadata:
  name: my-server
spec:
  systemUUID: "123e4567-e89b-12d3-a456-426614174000"
  power: "Off"
  reclaimPolicy: Recycle
  bmcRef:
    name: my-bmc
  bootOrder:
    - name: PXE
      priority: 1
      device: Network
  BIOS:
    - version: "1.0.3"
      settings:
        BootMode: UEFI
        Virtualization: Enabled
Usage
The Server CRD is central to managing bare metal servers. It allows for:
- Power Management: Powering servers on and off.
- BIOS Configuration: Changing BIOS settings and performing BIOS updates.
- Lifecycle Management: Handling the server's lifecycle through various states.
- Hardware Discovery: Gathering hardware information via BMC and in-band agents.
Lifecycle and States
A server undergoes the following phases:
Initial: The server object is created; hardware details are not yet known.
Discovery:
- The ServerReconciler interacts with the BMC to retrieve hardware details.
- An initial boot is performed using a predefined ignition configuration.
- An agent called metalprobe runs on the server to collect additional data (e.g., network interfaces, disks).
- The collected data is reported back to the metal-operator and added to the ServerStatus.
Available: The server has completed discovery and is ready for use.
Reserved:
- A ServerClaim resource is created to claim the server.
- The server transitions to the Reserved state.
- The server is allocated for a specific use or user.
Released:
- Only entered when spec.reclaimPolicy is Retain and the ServerClaim has been deleted.
- The server is powered off and its BootConfigurationRef is cleared, but spec.serverClaimRef is kept.
- The server stays in Released until an operator manually clears spec.serverClaimRef, at which point it transitions back to Available.
- See Reclaim Policy below.
Maintenance:
- Servers in the Available state can transition to Maintenance.
- Maintenance tasks such as BIOS updates or hardware repairs are performed.
Error:
- The server has encountered an error.
- Requires intervention to resolve issues before it can return to Available.
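The phases listed here can be read as a small state machine. The following is a minimal Python sketch of that reading, not the operator's actual implementation: the enum, the TRANSITIONS table, and can_transition are hypothetical names introduced for illustration. It encodes only the transitions this document states and leaves out anything the text does not spell out.

```python
from enum import Enum


class ServerState(Enum):
    INITIAL = "Initial"
    DISCOVERY = "Discovery"
    AVAILABLE = "Available"
    RESERVED = "Reserved"
    RELEASED = "Released"
    MAINTENANCE = "Maintenance"
    ERROR = "Error"


# Hypothetical transition table. Only edges described in this document are
# listed. How a server enters Error or leaves Maintenance is not spelled out
# in the text, so those edges are deliberately omitted rather than guessed.
TRANSITIONS = {
    ServerState.INITIAL: {ServerState.DISCOVERY},
    ServerState.DISCOVERY: {ServerState.AVAILABLE},
    ServerState.AVAILABLE: {ServerState.RESERVED, ServerState.MAINTENANCE},
    # Reserved -> Available under Recycle, Reserved -> Released under Retain.
    ServerState.RESERVED: {ServerState.AVAILABLE, ServerState.RELEASED},
    ServerState.RELEASED: {ServerState.AVAILABLE},
    ServerState.MAINTENANCE: set(),
    ServerState.ERROR: {ServerState.AVAILABLE},
}


def can_transition(src: ServerState, dst: ServerState) -> bool:
    """Return True if the documented lifecycle allows moving from src to dst."""
    return dst in TRANSITIONS[src]
```

For example, can_transition(ServerState.RELEASED, ServerState.RESERVED) is False in this model, because a Released server has to return to Available before it can be claimed again.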
The state diagram below represents the various server states and their transitions:
Reclaim Policy
The spec.reclaimPolicy field controls what happens to a Server when its bound ServerClaim is deleted. Two values are supported, with Recycle as the default:
| Value | Behavior |
|---|---|
| Recycle | When the claim is gone, the server is powered off, its BootConfigurationRef is cleared, spec.serverClaimRef is removed, and the server transitions directly back to Available so that it can be claimed again. |
| Retain | When the claim is gone, the server is powered off and its BootConfigurationRef is cleared, but spec.serverClaimRef is not removed. The server transitions to the Released state and remains there until an operator manually clears spec.serverClaimRef. Once cleared, the server transitions back to Available. |
Retain is useful when human inspection is required between uses: for example, to forensically investigate a workload, audit disks, or run an out-of-band sanitization step before the server re-enters the pool. Recycle is the right choice for general-purpose pools where servers should be returned to Available automatically.
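To make the difference between the two policies concrete, here is a minimal Python sketch that models a Server as a plain dictionary. It is an illustration under stated assumptions, not the controller's code: the functions on_claim_deleted and clear_claim_ref are hypothetical, and so are the exact key names bootConfigurationRef (assumed to live under spec) and status.state.

```python
import copy


def on_claim_deleted(server: dict) -> dict:
    """Return a copy of the server as it would look after its ServerClaim is deleted."""
    result = copy.deepcopy(server)
    spec, status = result["spec"], result["status"]

    # Both policies power the server off and clear the boot configuration.
    spec["power"] = "Off"
    spec.pop("bootConfigurationRef", None)  # assumed key name and location

    policy = spec.get("reclaimPolicy", "Recycle")  # Recycle is the default
    if policy == "Retain":
        # The claim reference is kept; the server waits for manual cleanup.
        status["state"] = "Released"
    elif policy == "Recycle":
        # The claim reference is dropped and the server is claimable again.
        spec.pop("serverClaimRef", None)
        status["state"] = "Available"
    else:
        raise ValueError(f"unsupported reclaimPolicy: {policy!r}")
    return result


def clear_claim_ref(server: dict) -> dict:
    """Model the manual step that returns a Released server to the pool."""
    result = copy.deepcopy(server)
    result["spec"].pop("serverClaimRef", None)
    if result["status"]["state"] == "Released":
        result["status"]["state"] = "Available"
    return result
```

In this model a Retain server keeps spec.serverClaimRef after on_claim_deleted and only reaches Available once clear_claim_ref has run, which mirrors the manual step the table describes.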
Example using Retain:
apiVersion: metal.ironcore.dev/v1alpha1
kind: Server
metadata:
  name: my-server
spec:
  systemUUID: "123e4567-e89b-12d3-a456-426614174000"
  reclaimPolicy: Retain
  bmcRef:
    name: my-bmc
To return a Released server to the pool, remove the stale claim reference:
kubectl patch server my-server --type=merge -p '{"spec":{"serverClaimRef":null}}'
Interaction with BMC
Interaction with a server is done through its BMC:
Via Reference: Reference a BMC resource using bmcRef.
apiVersion: metal.ironcore.dev/v1alpha1
kind: Server
metadata:
  name: server-with-bmc-ref
spec:
  systemUUID: "123e4567-e89b-12d3-a456-426614174000"
  power: "On"
  bmcRef:
    name: my-bmc
  bootOrder:
    - name: PXE
      priority: 1
      device: Network
  BIOS:
    - version: "1.0.3"
      settings:
        BootMode: UEFI
        HyperThreading: Enabled
Inline Configuration: Use the bmc field to provide direct BMC access details.
apiVersion: metal.ironcore.dev/v1alpha1
kind: BMC
metadata:
  name: my-bmc
spec:
  endpointRef:
    name: my-bmc-endpoint
  bmcSecretRef:
    name: my-bmc-secret
  protocol:
    name: Redfish
    port: 8000
  consoleProtocol:
    name: SSH
    port: 22