In January 2023, I published an article and shared Python code that allows monitoring of your systems to verify your vNIC backing devices are configured for the best possible redundancy and warn you of any situations that should be resolved.
I just published a new script that extends that idea to
allow you to proactively change backing devices so you can free a VIOS for
maintenance (upgrades, etc.). While you
can certainly use the HMC web-based GUI to view and change individual vNIC
backing devices, it can be a time-consuming process if you have a lot of devices
to change. Of course, failover is
automatic if properly configured, so you could just shut down the VIOS and let
the failover handle the switching, but many people prefer a more planned and
controlled approach.
This script has two primary functions:
- Change all vNIC devices for a specified Power server so any active backing devices associated with a specified VIOS are changed to the highest priority (lowest numbered) operational alternative backing device that is NOT served by the specified VIOS. In other words, move all vNICs off a specified VIOS so that VIOS can be maintained.
- Change all vNIC devices for a specified Power server to set the auto priority failover flag to either 1 or 0. This is intended to make it easy to undo the previous usage. When you force a specific backing device, auto priority failover is automatically set to 0 to prevent the system from switching right back to the original backing device. Setting it back to 1 (on) after the maintenance is complete will put all the backing devices back to the preferred interfaces based on priority. I usually recommend setting auto priority failover to 0 (off) during normal operations to prevent flapping between interfaces in the case of intermittent failure, and this script can be used to do that as well. If you choose to do that, I strongly recommend regularly monitoring for non-operational interfaces using my previously published monitoring script https://blog.vios4i.com/2023/01/monitoring-vnic-on-power.html or another monitoring tool or process.
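The selection rule in the first function can be sketched in a few lines of Python. This is not the code from vnic-move.py. The BackingDevice record and the pick_alternative function are hypothetical names made up for illustration, and the real HMC data carries more fields than shown here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BackingDevice:
    # Hypothetical, simplified record; the real HMC output has more fields.
    vios: str          # name of the VIOS serving this backing device
    priority: int      # failover priority; lower number = higher priority
    operational: bool  # True if the device is currently usable
    active: bool       # True if this device is carrying traffic now


def pick_alternative(devices: List[BackingDevice], vios: str) -> Optional[BackingDevice]:
    """Return the device to switch to, or None if no change is needed.

    Raises ValueError when the active device is served by `vios` but no
    operational alternative exists on another VIOS.
    """
    active_on_vios = any(d.active and d.vios == vios for d in devices)
    if not active_on_vios:
        return None  # this vNIC is not using the VIOS we want to free
    candidates = [d for d in devices if d.vios != vios and d.operational]
    if not candidates:
        raise ValueError(f"no operational alternate backing device off {vios}")
    # Lowest priority number wins.
    return min(candidates, key=lambda d: d.priority)
```

Raising an error when no alternative exists, rather than picking a non-operational device, mirrors the cautious behavior described for the script later in this article.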
If you need more background on vNIC, please see my previous
article: Introduction to SR-IOV and vNIC for IBM i.
Getting the vnic-move.py script
You can find the open source script vnic-move.py in the
public repository at: https://github.com/IBM/blog-vios4i
Download it directly with: https://github.com/IBM/blog-vios4i/raw/main/src/vnic-move.py
This is a free open-source script released under Eclipse
Public License v2.0. Bug fixes and
improvements will be checked into the public Git repository when tested. If you want to monitor for changes, I suggest
creating a GitHub account and watching the project as it is unlikely I’m going
to write an article about each change.
Setting up the vnic-move.py script
Unlike the vnic-check.py monitoring script, this script is
intended to be used interactively by a system administrator only when preparing
to perform maintenance on a VIOS. That
means, while it is possible to run this on an IBM i using the PASE environment,
it is much more likely that this will be run from an administrator’s
workstation. Given the current sad state
of the world where Windows is the most widely used desktop operating system, that
poor System Administrator will probably be forced to use Windows rather than
something better (*cough* Linux *cough*).
If you are going to run this on AIX or Linux, install
Python3, then create keys and run the script as shown below.
If you are running Windows, you have a few options to run
Python3 including (easiest to hardest) as a native Windows executable, in a
container using a container manager like Docker, with the Windows Subsystem for
Linux (WSL), or in a Linux virtual machine.
This script runs commands on the HMC using the ssh command,
so you will also need to verify you have that command in the environment where
you will run it (but don’t despair if you don’t). The good news is that even Windows 10/11
generally has the ssh command. To find
out if this is true in your case, just open a command line and run “ssh”. If you get a usage message, you’ve got it. If it’s not there, Windows
Settings->Apps->Optional Features will usually let you install “OpenSSH
Client” unless your organization has other ideas.
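If you would rather check from Python, the standard library can tell you whether an ssh client is on your PATH. This is a small sketch, and find_ssh is a name made up for illustration.

```python
import shutil


def find_ssh(command: str = "ssh"):
    """Return the full path of the ssh client, or None if it is not on PATH."""
    return shutil.which(command)


if __name__ == "__main__":
    path = find_ssh()
    if path:
        print(f"ssh client found at {path}")
    else:
        print("ssh client not found; install an OpenSSH client first")
```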
Setting up the ssh keys and agent
In general, the process you will need to use is:
- Create an account on the HMC that you will use for this script. You can skip this step if you already have separate accounts for each system administrator, or if you are okay with running the command with the default hscroot account. Please note that the HMC account will need permissions to run the lshwres command to retrieve the vNIC information, and to run the chhwres command if you want to actually switch the backing devices.
- Generate a public key/private key pair on your workstation (or where you want to run the script). Usually, this is done with the ssh-keygen command, and is just a matter of running the command and responding to the prompts, mostly with the defaults. If you leave the passphrase blank (not recommended), you will not need to do any of the ssh-agent stuff below.
- If you selected defaults, the ssh-keygen command will have created an id_rsa.pub file, and it will have shown you where it created it. You will need to add this public key to the authorized keys of the HMC account that you want it to use. The correct way to do that on the HMC is from the command line with mkauthkeys. The format is: mkauthkeys -a “[contents of public key]”. The easiest way to do this is probably to open the public key file with notepad and copy/paste it to the command. If you do the copy/paste thing, please note that the public key is one long string with no embedded line breaks, so pay attention to wrapping in your text editor. If you see “>” continuation lines when running the command on the HMC, you probably included a line break that shouldn’t be there.
- Test the public key access from the workstation with the command: “ssh [hmcaddress]”. The first time you run this, it will ask whether you want to trust that host. You will need to respond yes so that the host key is added to your known_hosts file. It should prompt for your passphrase, and when that is provided, it will give you access to the HMC command line. The exit command will end the ssh session. Repeat a second time to verify that it skips the host verification prompt.
- Set up your ssh-agent
- If running Unix (Container, WSL, or VM), you’ll just run: “ssh-agent [shell]” where shell is usually bash. This will give you a shell that is a child of the ssh-agent, so you can proceed with the add keys option below.
- If running native Windows, there are a few more steps. First you’ll need to go into your services app -- “services.msc” will get you there from the search line. Find the service named “OpenSSH Authentication Agent” and make sure it is not Disabled. “Automatic (Delayed Start)” is a good choice as it will only open when needed. You only need to enable the service once. After that, just run “ssh-agent” from the command line to start it each time you need to use it.
- Authenticate your keys to the agent. No matter what method you use, this is done with the command “ssh-add [path to id_rsa file]”. Windows users will probably need to give the whole path to the key file; Unix users can usually use ~/.ssh/id_rsa. When it finds the file, it will prompt for the passphrase you set. When you enter the passphrase, it will store the unlocked key in the agent service memory. Be aware that when using an agent, realistically any process on the computer where it is running can access the unlocked keys. Save your risky web activities (you know what I mean) for times that you have not unlocked the keys.
- While the agent is running and the key is unlocked, the ssh command in the script will access the unlocked key from the agent and skip the prompt for a passphrase. This allows the script to run all the commands in batch mode as it needs to.
- When you are done with the unlocked keys, you can start over by running “ssh-add -D” to remove all unlocked keys from the agent. This is especially important on a shared workstation, or one that you never log off.
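To show why the agent matters, here is a minimal sketch of how a Python program can run an HMC command over ssh without stopping for prompts. This is not the code from vnic-move.py. The function names build_ssh_command and run_on_hmc are hypothetical. BatchMode is a standard OpenSSH client option that makes ssh fail rather than prompt.

```python
import subprocess
from typing import List


def build_ssh_command(hmc: str, remote_command: str) -> List[str]:
    """Build an ssh argument list that never stops to prompt.

    BatchMode=yes makes ssh fail instead of asking for a passphrase or
    password, so a missing or locked key is reported as an error.
    """
    return ["ssh", "-o", "BatchMode=yes", hmc, remote_command]


def run_on_hmc(hmc: str, remote_command: str) -> str:
    """Run one command on the HMC and return its standard output.

    Raises subprocess.CalledProcessError if ssh or the remote command fails.
    """
    result = subprocess.run(
        build_ssh_command(hmc, remote_command),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

With the key unlocked in the agent, a call such as run_on_hmc("myhmcuser@myhmc", "lshwres ...") would return the command output as a string. If the key is not unlocked, the call fails immediately instead of hanging on a passphrase prompt.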
If you have any problems getting the ssh keys and
authentication set up, Google is a great resource. Secure shell has been around a long time, so
many people have tackled the process of setting up keys and using an agent, and
some have written good tutorials on how to do it.
Using the vnic-move.py script
Now that the ssh and authentication setup part is out of the way, here are some examples of how to use the script:
python3 vnic-move.py --hmc myhmcuser@myhmc --vios=vios1 --system myp10system --verify
We would use this if we are planning on taking vios1 down for maintenance. This one will check all the vNICs for system myp10system managed by myhmc and generate the commands to change the backing devices for any vNIC that currently has an active backing device served by vios1. The --verify option makes it check the configuration and print the commands without executing them. You can then manually run the commands, or use the next example to run them all automatically.
python3 vnic-move.py --hmc myhmcuser@myhmc --vios=vios1 --system myp10system
This one will do what the previous one did, plus it will run the commands to make the vnic device changes on the HMC.
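The difference between a verify run and a real run can be sketched as a simple dry-run pattern. This is not the code from vnic-move.py. The apply_commands function and its runner parameter are hypothetical names used only to illustrate the idea.

```python
from typing import Callable, Iterable, List


def apply_commands(commands: Iterable[str], verify: bool,
                   runner: Callable[[str], None]) -> List[str]:
    """Print every command; execute each one only when verify is False.

    Returns the list of commands that were actually executed.
    """
    executed = []
    for command in commands:
        print(command)  # always show what would be (or is being) run
        if not verify:
            runner(command)
            executed.append(command)
    return executed
```

Printing the commands in both modes means the output of a verify run is exactly what a real run would execute, which makes it easy to review before committing.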
python3 vnic-move.py --hmc myhmcuser@myhmc --vios=vios2 --system myp10system
Suppose we have run the previous command and shut down vios1 for maintenance. The command above does the same for vios2. But what if vios1 is not finished starting up when we tell it to do vios2? If a vNIC has valid backing devices on three or more VIOS, it will happily switch to another alternate VIOS and continue. If not, it is going to print an error telling you there is no operational alternate backing device and stop without running any commands. This error could also display if you happen to have one test system that only has one backing device. If you have reviewed the error messages and know that you don’t care if they lose connectivity (perhaps you know that vNIC is not critical, or it is powered off), you can skip all errors with the --force option:
python3 vnic-move.py --hmc myhmcuser@myhmc --vios=vios2 --system myp10system --force
If you force changes and it breaks something, that’s on you. To be clear, you really should look at all the commands generated and make sure you are comfortable with running them in any case, because any code can have errors, and there are no warranties or support contracts for this free open-source script.
python3 vnic-move.py --hmc myhmcuser@myhmc --system myp10system --autofailover=1
Suppose now you’ve finished all of your maintenance and all VIOS are back online, so you want to turn auto priority failover back on so everything is running where it should. The above command will do that.
All of those are great if you are not working in a highly restrictive security environment, but maybe your employer won’t allow you to install Python on the workstations with ssh access to the HMC, or they have a blanket policy against using ssh keys (I’m not going to judge). There is still an option to use this script to make your life easier. Starting with the first example, pending maintenance on vios1:
python3 vnic-move.py --offline --vios=vios1
That will print the command you need to run on hmc myhmc:
Collect data from HMC with the following command and store in a file:
lshwres -m myp10system -r virtualio --rsubtype vnic --header -F lpar_name%lpar_id%slot_num%auto_priority_failover%backing_devices%backing_device_states
You can copy/paste or otherwise transfer the command to the
system that can run ssh and then copy/paste the output of the command to a file
on the local workstation where you are running the script.
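If you want to sanity-check the saved file before processing it, a short Python sketch like the following can help. It assumes the output is a header line followed by one line per vNIC, with fields separated by “%” as requested with -F. The parse_vnic_output function and the sample values are made up for illustration. It does not try to interpret the contents of the backing_devices fields.

```python
import csv
import io
from typing import Dict, List

# The field list from the -F option of the lshwres command.
FIELDS = [
    "lpar_name", "lpar_id", "slot_num", "auto_priority_failover",
    "backing_devices", "backing_device_states",
]


def parse_vnic_output(text: str) -> List[Dict[str, str]]:
    """Split '%'-delimited output with a header line into one dict per vNIC."""
    reader = csv.DictReader(io.StringIO(text), delimiter="%",
                            quoting=csv.QUOTE_NONE)
    rows = list(reader)
    missing = set(FIELDS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing expected fields: {sorted(missing)}")
    return rows


# Hypothetical sample; real backing device values are longer strings.
SAMPLE = (
    "lpar_name%lpar_id%slot_num%auto_priority_failover"
    "%backing_devices%backing_device_states\n"
    "lpar1%3%5%1%placeholder-devices%placeholder-states\n"
)

if __name__ == "__main__":
    for row in parse_vnic_output(SAMPLE):
        print(row["lpar_name"], row["slot_num"], row["auto_priority_failover"])
```

A missing header line is the most likely copy/paste mistake, so the sketch fails loudly in that case instead of silently treating the first vNIC as the header.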
Now you can process that file with the following:
python3 vnic-move.py --file=/path/to/file --vios=vios1 --system myp10system
That will check the input and print the commands needed to
change the vNIC backing devices.
Copy/paste or otherwise transfer and run those commands and you will be done
with that step.
If you use offline mode like this, make sure you don’t allow
too much time between collecting the command output and processing it or you
might generate commands that are no longer correct for the current state of the
vNIC devices.
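One simple guard against stale data is to check the age of the saved file before using it. The sketch below is only an illustration. The is_stale function and the 15-minute threshold are assumptions of mine, not features of vnic-move.py, so pick a limit that fits your own change process.

```python
import os
import time
from typing import Optional

MAX_AGE_SECONDS = 15 * 60  # hypothetical threshold; adjust to taste


def is_stale(path: str, max_age: float = MAX_AGE_SECONDS,
             now: Optional[float] = None) -> bool:
    """Return True if the file was last modified more than max_age seconds ago."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path) > max_age
```

If is_stale("/path/to/file") returns True, collect the lshwres output again before generating commands.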
Need help?
If you need help implementing best practices for your vNICs,
the IBM i Technology Expert Labs team (formerly known as Lab Services) is
available to help with implementation planning, execution, and knowledge
transfer. See https://www.ibm.com/services/infrastructure
for contact information or speak to your IBM Sales Representative or Business
Partner. If you are planning a new
hardware purchase, you can include implementation services by the Technology Expert
Labs team in your purchase.
Disclaimer
I am an employee of IBM on the IBM i Technology Expert Labs team
(formerly known as Lab Services). The opinions
in this post are my own and don't necessarily represent IBM's positions,
strategies, or opinions.
References
Previous Blog post on vNIC and SR-IOV
https://blog.vios4i.com/2022/11/sriov-and-vnic.html
Previous Blog post with a vNIC monitoring script
https://blog.vios4i.com/2023/01/monitoring-vnic-on-power.html
GitHub public repository for this Blog
https://github.com/IBM/blog-vios4i
Microsoft Article on using Public keys with Windows
https://learn.microsoft.com/en-us/windows-server/administration/openssh/openssh_keymanagement
IBM Support page on setting up SSH keys for the HMC