Infrastructure Firmware Upgrade: Beyond CLI Commands and Change Windows
When a piece of infrastructure needs a firmware upgrade, most engineers and consultants typically think of two things:
- the set of CLI commands to enter on each individual device,
- the date of the maintenance window.
A proper firmware upgrade is much more than that. Here is my framework for infrastructure firmware upgrades, which remains relevant whether we are dealing with CLI-managed devices or software-defined GUI-based controllers. I break it down into the following phases.
Alignment phase#
- Define scope of work.
- Explain the intended preparatory work and align it with customer expectations.
- Define the responsibilities.
- Define the nature of the interaction with the customer: preferred and reasonable type of communication (calls, workshops), remote vs onsite, frequency.
Deliverables and Output#
- Scope of Work
- RACI Matrix
Discovery phase#
In this phase, the following data must be gathered:
- Vendor and model of the underlying hardware: chassis, supervisor modules, I/O modules, etc. A simple listing of the hardware is enough at this stage; the Inspection phase will reveal whether additional hardware data must be collected, for example when the vendor documents caveats for specific components.
- Software type and version: not only the main software that runs the system, but also the software versions of its sub-components. A good example is the EPLD components on Cisco Nexus switches, which might need to be upgraded separately.
The Discovery phase might generate insights that change the course of the firmware upgrade project, for example when the target version deviates from the vendor-recommended version or has not been extensively tested by the vendor.
Required Input#
- vendor-specific CLI commands or GUI reports (e.g. ‘show tech-support’ on Cisco NX-OS and IOS-XE platforms).
Deliverables and Output#
- an inventory of the hardware components and their software versions.
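The inventory can be kept in any format the customer prefers. As a minimal sketch, assuming a simple in-memory structure, it could look like the following. The class names, field names and sample values (`Component`, `Device`, `core-sw-01`, `ExampleVendor`, the version strings) are hypothetical placeholders, not output from a real device.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One hardware component and the software that runs on it."""
    role: str    # e.g. "chassis", "supervisor", "io-module"
    vendor: str
    model: str
    # Maps a software type (main system image, EPLD, ...) to its version.
    software: dict[str, str] = field(default_factory=dict)


@dataclass
class Device:
    hostname: str
    components: list[Component] = field(default_factory=list)

    def versions_of(self, software_type: str) -> set[str]:
        """Collect every distinct version of one software type on this device."""
        return {
            c.software[software_type]
            for c in self.components
            if software_type in c.software
        }


# Placeholder values for illustration only, not real inventory data.
device = Device(
    hostname="core-sw-01",
    components=[
        Component("supervisor", "ExampleVendor", "SUP-X", {"system": "1.0", "epld": "0.5"}),
        Component("io-module", "ExampleVendor", "IO-Y", {"epld": "0.4"}),
    ],
)

print(sorted(device.versions_of("epld")))  # ['0.4', '0.5']
```

Recording sub-component versions next to the main software version makes it visible early when components of the same device run different versions.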
Inspection Phase#
This phase focuses on reading the vendor documentation relevant to the project and looking for possible caveats: hardware incompatibilities, software incompatibilities, unsupported features and software behavior changes. Any serious caveat must be discussed with the customer so that an informed decision can be made.
Required Input#
- official vendor documents: Release Notes, Upgrade Guides, etc.
Deliverables and Output#
- Risk Assessment: based on the deliverable from the Discovery phase, write down the caveats for each hardware component of the infrastructure to be upgraded.
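One possible way to structure the Risk Assessment is to attach each documented caveat to the inventoried components it applies to. The sketch below assumes caveats are matched by hardware model and rated on a hypothetical low/medium/high scale; the names `Caveat`, `build_risk_assessment` and `needs_customer_decision` and all sample values are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caveat:
    model: str     # hardware model the caveat applies to
    severity: str  # "low", "medium" or "high" (hypothetical scale)
    summary: str   # wording taken from the vendor document


def build_risk_assessment(inventory, caveats):
    """Group documented caveats by inventoried component.

    inventory: list of (hostname, model) tuples from the Discovery phase.
    caveats:   list of Caveat entries collected from the vendor documents.
    Components without caveats are kept with an empty list, so the
    assessment always covers the full inventory.
    """
    assessment = {}
    for hostname, model in inventory:
        assessment[(hostname, model)] = [c for c in caveats if c.model == model]
    return assessment


def needs_customer_decision(assessment, threshold="high"):
    """Return the components that carry at least one caveat of the given severity."""
    return [
        key for key, found in assessment.items()
        if any(c.severity == threshold for c in found)
    ]


# Placeholder data for illustration only.
inventory = [("core-sw-01", "SUP-X"), ("core-sw-01", "IO-Y")]
caveats = [Caveat("IO-Y", "high", "placeholder: module not supported by the target version")]

assessment = build_risk_assessment(inventory, caveats)
print(needs_customer_decision(assessment))  # [('core-sw-01', 'IO-Y')]
```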
Planning Phase#
In this phase, the Method of Procedure is built. Ideally, the Method of Procedure is presented to the vendor with a request for comments. The availability of a Support Engineer must be discussed and documented with the vendor in advance. This becomes important when the underlying hardware is old. The process of reserving an engineer might vary from vendor to vendor. Rollback scenarios and feasibility must be discussed with the customer.
Deliverables and Output#
- Method of Procedure (MoP): an ordered list of both technical and operational steps necessary for a successful execution, verification (using pre- and post-checks) and troubleshooting of the upgrade job. It relies heavily on the vendor technical procedures while taking into consideration the customer infrastructure constraints.
- Pre-checks List
- Post-checks List
- Rollback procedure
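Pre- and post-checks are most useful when the same items are captured before and after the upgrade and then compared. The following is a minimal sketch under that assumption; the function name `compare_checks` and the check names are hypothetical, and how the values are collected (CLI output, GUI report) is out of scope here.

```python
def compare_checks(pre, post, expected_changes=()):
    """Compare pre- and post-check snapshots.

    pre, post: dicts mapping a check name to its observed value.
    expected_changes: check names that are supposed to differ after the
    upgrade (e.g. the software version).
    Returns a dict of findings, check name -> (pre value, post value),
    covering unexpected changes and expected changes that did not happen.
    """
    findings = {}
    for name in sorted(set(pre) | set(post)):
        before = pre.get(name)
        after = post.get(name)
        changed = before != after
        expected = name in expected_changes
        if changed != expected:
            findings[name] = (before, after)
    return findings


# Placeholder values for illustration only.
pre = {"software_version": "1.0", "routing_neighbors_up": 4, "interfaces_up": 48}
post = {"software_version": "2.0", "routing_neighbors_up": 3, "interfaces_up": 48}

print(compare_checks(pre, post, expected_changes={"software_version"}))
# {'routing_neighbors_up': (4, 3)}
```

An empty result means the post-checks match the pre-checks apart from the changes the upgrade was meant to produce.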
The Execution Phase#
In this phase, the upgrade itself is carried out. Coordination with the customer is mandatory: the date and time of the action, and any expected paperwork (e.g. registration at the reception of the customer facilities).
Rollback is initiated when one or more of the following triggers is activated:
- an operational incident prevents further execution of the upgrade process, e.g. an unexpected power outage, or an uncoordinated event that requires the infrastructure to be available (such as the Storage team starting a replication without informing the customer).
- a hardware or software error occurs during the execution of the MoP steps.
The customer must be informed about such situations; the rollback procedure is then initiated.
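The two rollback triggers can be written down as an explicit decision rule, which helps keep the decision unambiguous during the change window. This is only a sketch; the names `Trigger` and `rollback_decision` and the message wording are assumptions made for illustration.

```python
from enum import Enum


class Trigger(Enum):
    OPERATIONAL_INCIDENT = "operational incident prevents further execution"
    EXECUTION_ERROR = "hardware or software error during a MoP step"


def rollback_decision(observed_triggers):
    """Return (rollback_required, message).

    Rollback is required as soon as at least one trigger is active.
    """
    active = sorted(set(observed_triggers), key=lambda t: t.name)
    if not active:
        return False, "No rollback trigger active; continue with the MoP."
    reasons = "; ".join(t.value for t in active)
    return True, (
        f"Rollback required: {reasons}. "
        "Inform the customer, then start the rollback procedure."
    )


required, message = rollback_decision([Trigger.EXECUTION_ERROR])
print(required)  # True
print(message)
```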
Required Input#
- the Method of Procedure document.
- the Pre-checks List
- the Post-checks List
- the Rollback procedure.
Deliverables and Output#
- Upgrade execution log, with timestamps of the important states of the system (e.g. pre-checks execution time, reboot time, stabilization time, post-checks execution time).
- Incident log, including symptoms and attempted troubleshooting measures.
- Communication log, which captures important exchanges with third parties, the customer, etc., and especially any rollback decision.
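As an illustration of the upgrade execution log, the sketch below records the important states with timestamps and derives durations from them. The class name `ExecutionLog`, the state labels and the fixed timestamps are placeholders; in a real change window the timestamps would default to the current time.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class ExecutionLog:
    entries: list = field(default_factory=list)

    def record(self, state, note="", when=None):
        """Append a timestamped entry; 'when' defaults to the current UTC time."""
        when = when or datetime.now(timezone.utc)
        self.entries.append((when, state, note))

    def duration(self, start_state, end_state):
        """Elapsed time between the first occurrences of two recorded states."""
        times = {}
        for when, state, _ in self.entries:
            times.setdefault(state, when)
        return times[end_state] - times[start_state]

    def render(self):
        """Format the log as one line per entry."""
        return "\n".join(
            f"{when.isoformat()}  {state:<22} {note}".rstrip()
            for when, state, note in self.entries
        )


# Fixed placeholder timestamps so that the example is reproducible.
start = datetime(2024, 1, 1, 21, 30, tzinfo=timezone.utc)
log = ExecutionLog()
log.record("pre-checks", "all checks passed", when=start)
log.record("reboot", when=start + timedelta(minutes=30))
log.record("stabilization", when=start + timedelta(minutes=55))
log.record("post-checks", "all checks passed", when=start + timedelta(minutes=70))

print(log.render())
print(log.duration("reboot", "stabilization"))  # 0:25:00
```

The same structure could be reused for the incident and communication logs by adding fields for symptoms, measures taken or the parties involved.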
Post-Execution Phase#
In this phase, the following points are discussed:
- Encountered issues.
- Future maintenance windows for re-initiating the upgrade process.
- Deviations from the MoP.
Deliverables and Output#
- Root Cause Analysis.
- Lessons Learned for the upcoming upgrade cycle.
- Points of Improvement (PoI) for the current framework and/or any of its related documents (MoP, Pre-checks List, Post-checks List, etc.).
Closing Thought#
The purpose of a firmware upgrade is not to deploy the latest recommended software version within the shortest possible change window; it is to take the infrastructure software from its current state to a desired state while minimizing risks, confining service disruption to the planned change window and providing a path to reverse the change in case of major setbacks. The usefulness of this framework is measured not only by the clarity it brings to the upgrade action, but also by its applicability in future migration cycles.