Proactive vNIC Changes for VIOS Maintenance

In January 2023, I published an article and shared Python code that allows monitoring of your systems to verify your vNIC backing devices are configured for the best possible redundancy and warn you of any situations that should be resolved.

I just published a new script that extends that idea to let you proactively change backing devices so you can free a VIOS for maintenance (upgrades, etc.).  While you can certainly use the HMC web-based GUI to view and change individual vNIC backing devices, it can be a time-consuming process if you have a lot of devices to change.  Of course, failover is automatic if properly configured, so you could just shut down the VIOS and let the failover handle the switching, but many people prefer a more planned and controlled approach.

This script has two primary functions:

  • Change all vNIC devices for a specified Power server so any active backing devices associated with a specified VIOS are changed to the highest priority (lowest numbered) operational alternative backing device that is NOT served by the specified VIOS.  In other words, move all vNICs off a specified VIOS so that VIOS can be maintained.
  • Change all vNIC devices for a specified Power server to set the auto priority failover flag to either 1 or 0.  This is intended to make it easy to undo the previous usage.  When you force a specific backing device, auto priority failover is automatically set to 0 to prevent the system from switching right back to the original backing device.  Setting it back to 1 (on) after the maintenance is complete will put all the backing devices back to the preferred interfaces based on priority.   I usually recommend setting auto priority failover to 0 (off) during normal operations to prevent flapping between interfaces in the case of intermittent failure, and this script can be used to do that as well.  If you choose to do that, I strongly recommend regularly monitoring for non-operational interfaces using my previously published monitoring script https://blog.vios4i.com/2023/01/monitoring-vnic-on-power.html or another monitoring tool or process.
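The selection rule in the first bullet can be sketched in a few lines of Python.  This is only an illustration of the rule as described above, not the code from vnic-move.py: the BackingDevice class, its fields, and the pick_alternative function are hypothetical names invented for the example, and real data would come from the HMC.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BackingDevice:
    # Hypothetical, simplified view of one vNIC backing device.
    vios: str          # name of the VIOS serving this backing device
    priority: int      # failover priority; lower number = higher priority
    operational: bool  # True if the device is currently usable


def pick_alternative(devices: List[BackingDevice], vios_to_free: str) -> Optional[BackingDevice]:
    """Return the highest-priority operational device not served by vios_to_free."""
    candidates = [d for d in devices if d.operational and d.vios != vios_to_free]
    if not candidates:
        return None  # no safe alternative; the caller decides what to do
    return min(candidates, key=lambda d: d.priority)


devices = [
    BackingDevice("vios1", 10, True),
    BackingDevice("vios2", 20, True),
    BackingDevice("vios3", 30, False),
]
choice = pick_alternative(devices, "vios1")
print(choice.vios if choice else "no alternative")
```

Returning None rather than raising keeps the "no alternative" decision with the caller, which matters later when deciding whether to stop or to force the change.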

If you need more background on vNIC, please see my previous article: Introduction to SR-IOV and vNIC for IBM i.

 

Getting the vnic-move.py script

You can find the open source script vnic-move.py in the public repository at: https://github.com/IBM/blog-vios4i

Download it directly with: https://github.com/IBM/blog-vios4i/raw/main/src/vnic-move.py

This is a free open-source script released under Eclipse Public License v2.0.  Bug fixes and improvements will be checked into the public Git repository when tested.  If you want to monitor for changes, I suggest creating a GitHub account and watching the project, as it is unlikely I’m going to write an article about each change.

 

Setting up the vnic-move.py script

Unlike the vnic-check.py monitoring script, this script is intended to be used interactively by a system administrator only when preparing to perform maintenance on a VIOS.  That means, while it is possible to run this on an IBM i using the PASE environment, it is much more likely that this will be run from an administrator’s workstation.  Given the current sad state of the world where Windows is the most widely used desktop operating system, that poor System Administrator will probably be forced to use Windows rather than something better (*cough* Linux *cough*).

If you are going to run this on AIX or Linux, install Python3, then create keys and run the script as shown below.

If you are running Windows, you have a few options to run Python3 including (easiest to hardest) as a native Windows Executable, in a container using a container manager like Docker, with the Windows Subsystem for Linux (WSL), or as a Linux Virtual machine. 

This script runs commands on the HMC using the ssh command, so you will also need to verify you have that command in the environment where you will run it (but don’t despair if you don’t).  The good news is that even Windows 10/11 generally has the ssh command.  To find out if this is true in your case, just open a command line and run “ssh”.  If you get a usage message, you’ve got it.  If it’s not there, Windows Settings->Apps->Optional Features will usually let you install “OpenSSH Client” unless your organization has other ideas.
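If Python is already installed, the same check can be made from Python.  This is a small sketch using only the standard library; the have_command helper is a name made up for this example and is not part of vnic-move.py.

```python
import shutil


def have_command(name: str) -> bool:
    """Return True if an executable called name is found on the PATH."""
    return shutil.which(name) is not None


if have_command("ssh"):
    print("ssh client found")
else:
    print("ssh client not found; install an OpenSSH client first")
```

This only confirms that an executable named ssh is on the PATH of the environment running Python, so run it in the same place (native Windows, WSL, container, or VM) where you plan to run the script.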

Setting up the ssh keys and agent

 This script runs remote HMC commands via a batch mode SSH command, so you will need to configure an SSH key to avoid a password prompt.  This key can either have an empty passcode or a secure passcode using an ssh-agent.  To be clear, I would never recommend using an empty passcode ssh key for a user account that can make changes to your environment, so the choice I recommend is using an ssh-agent to manage access with a passcode.

In general, the process you will need to use is:

  • Create an account on the HMC that you will use for this script.  You can skip this step if you already have separate accounts for each system administrator, or if you are okay with running the command with the default hscroot account.  Please note that the HMC account will need permissions to run the lshwres command to retrieve the vNIC information, and to run the chhwres command if you want to actually switch the backing devices.
  • Generate a public key/private key pair on your workstation (or wherever you want to run the script).  Usually, this is done with the ssh-keygen command, and it is mostly a matter of running the command and accepting the defaults at the prompts.  If you leave the passcode blank (not recommended), you will not need to do any of the ssh-agent stuff below.
  • If you selected defaults, the ssh-keygen command will have created an id_rsa.pub file, and it will have shown you where it created it.  You will need to add this public key to the authorized keys of the HMC account that you want it to use.  The correct way to do that on the HMC is from the command line with mkauthkeys.  The format is: mkauthkeys -a “[contents of public key]”.  The easiest way to do this is probably to open the public key file with notepad and copy/paste it to the command.  If you do the copy/paste thing, please note that the public key is one long string with no embedded line breaks, so pay attention to wrapping in your text editor.  If you see “>” continuation lines when running the command on the HMC, you probably included a line break that shouldn’t be there.
  • Test the public key access from the workstation with the command: “ssh [hmcaddress]”  The first time you run this it will prompt you if you want to trust that host.  You will need to respond yes so that the host key is added to your known_hosts file.  It should prompt for your passcode, and when that is provided, it will give you access to HMC command line.  The exit command will end the ssh session.  Repeat a second time to verify that it skips the host verification prompt.
  • Setup your ssh-agent
    • If running Unix (Container, WSL, or VM), you’ll just run: “ssh-agent [shell]” where shell is usually bash.  This will give you a shell that is a child of the ssh-agent, so you can proceed with the add keys option below.
    • If running native Windows, there are a few more steps.  First you’ll need to go into your services app -- “services.msc” will get you there from the search line.  Find the service named “OpenSSH Authentication Agent” and make sure it is not Disabled.   “Automatic (Delayed Start)” is a good choice as it will only open when needed.  You only need to enable the service once.  After that, just run “ssh-agent” from the command line to start it each time you need to use it.
  • Authenticate your keys to the agent.  No matter what method you use, this is done with the command “ssh-add [path to id_rsa file]”  For Windows users, you’ll probably need to give the whole path to the key file, for Unix users, you can usually use: ~/.ssh/id_rsa.  When it finds the file, it will prompt for the passcode you set.  When you enter the passcode, it will store the unlocked key in the agent service memory.  Be aware that when using an agent, realistically any process on the computer where it is running can access the unlocked keys.  Save your risky web activities (you know what I mean) for times that you have not unlocked the keys.
  • While the agent is running and the key is unlocked, the ssh command in the script will access the unlocked key from the agent and skip the prompt for a passphrase.  This allows the script to run all the commands in batch mode as it needs to.
  • When you are done with the unlocked keys, you can start over by running “ssh-add -D” to remove all unlocked keys from the agent.  This is especially important on a shared workstation, or one that you never log off.
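To make the batch mode and agent steps above concrete, here is a hedged Python sketch.  It is not taken from vnic-move.py: the function names are invented for the example, and the only things it relies on are the OpenSSH option BatchMode=yes (never prompt for a password or passphrase) and the documented exit status of “ssh-add -l” (0 when keys are loaded, 1 when the agent holds no keys, 2 when no agent can be contacted).

```python
import subprocess
from typing import List


def build_ssh_command(target: str, remote_command: str) -> List[str]:
    """Build an ssh argument list that fails instead of prompting."""
    # BatchMode=yes tells OpenSSH never to ask for a password or passphrase.
    return ["ssh", "-o", "BatchMode=yes", target, remote_command]


def describe_agent_status(returncode: int) -> str:
    """Translate the exit status of 'ssh-add -l' into a readable message."""
    if returncode == 0:
        return "agent is running and holds at least one key"
    if returncode == 1:
        return "agent is running but holds no keys; run ssh-add"
    return "could not contact an agent; start ssh-agent first"


def agent_status() -> str:
    """Run 'ssh-add -l' and describe the result (requires OpenSSH)."""
    try:
        result = subprocess.run(["ssh-add", "-l"], capture_output=True, text=True)
    except FileNotFoundError:
        return "ssh-add not found on PATH"
    return describe_agent_status(result.returncode)


# 'lshmc -V' (show the HMC version) is used here only as a harmless example.
print(build_ssh_command("myhmcuser@myhmc", "lshmc -V"))
```

Calling agent_status() before a maintenance window is a quick way to find out that the key is not unlocked, rather than discovering it when the first batch mode command fails.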

If you have any problems getting the ssh keys and authentication setup, google is a great resource.  Secure shell has been around a long time, so many people have tackled the process of setting up keys and using an agent, and some have written good tutorials on how to do it.

 

Using the vnic-move.py script

Now that the ssh and authentication setup part is out of the way, here are some examples of how to use the script:

python3 vnic-move.py --hmc myhmcuser@myhmc --vios=vios1 --system myp10system --verify

We would use this if we are planning on taking vios1 down for maintenance.  This one will check all the vNICs for system myp10system managed by myhmc and generate the commands to change the backing devices for any vNIC that currently has a backing device served by vios1.  The --verify option makes it check the configuration and print the commands without executing them.  You can then manually run the commands, or run the next one to run them all automatically.

python3 vnic-move.py --hmc myhmcuser@myhmc --vios=vios1 --system myp10system

This one will do what the previous one did, plus it will run the commands to make the vnic device changes on the HMC.
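The difference between the two invocations is a classic dry-run pattern.  The sketch below shows the idea in isolation; it is not the script's own code, the apply_commands function and its runner argument are hypothetical, and the command strings are placeholders rather than real HMC commands.

```python
from typing import Callable, List


def apply_commands(commands: List[str], verify: bool, runner: Callable[[str], None]) -> List[str]:
    """Print every command; call runner on each only when verify is False."""
    executed = []
    for command in commands:
        print(command)
        if not verify:
            runner(command)
            executed.append(command)
    return executed


# Placeholder strings stand in for the generated HMC commands.
planned = ["<generated command 1>", "<generated command 2>"]
ran = apply_commands(planned, verify=True, runner=lambda c: None)
print(f"{len(ran)} of {len(planned)} commands executed")
```

Printing the commands in both modes means the verify run and the real run produce the same listing, so what you reviewed is what gets executed.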

python3 vnic-move.py --hmc myhmcuser@myhmc --vios=vios2 --system myp10system

Suppose we have run the previous command, shut down vios1, and finished its maintenance; the command above does the same thing for vios2.  But what if vios1 has not finished starting up when we tell it to do vios2?  If there are three valid backing devices served by three or more VIOS, it will happily switch to the alternate VIOS and continue.  If not, it will print an error telling you there is no operational alternate backing device and stop without running any commands.  This error could also display if you happen to have a test system that only has one backing device.  If you have reviewed the error messages and know that you don’t care if those vNICs lose connectivity (perhaps you know that the vNIC is not critical, or it is powered off), you can skip all errors with the --force option:

python3 vnic-move.py --hmc myhmcuser@myhmc --vios=vios2 --system myp10system --force

If you force changes and it breaks something, that’s on you.  To be clear, you really should look at all the commands generated and make sure you are comfortable with running them in any case, because any code can have errors, and there are no warranties or support contracts for this free open-source script.
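The stop-on-error behavior and the --force override can be pictured as an all-or-nothing planning step.  The following is a sketch under stated assumptions, not the script's implementation: plan_moves, NoAlternativeError, and the labels in the sample data are invented, and the sketch assumes that a forced run simply skips vNICs that have no alternative, which may differ from what vnic-move.py actually does.

```python
from typing import Dict, List, Optional


class NoAlternativeError(Exception):
    """Raised when at least one vNIC has nowhere safe to move."""


def plan_moves(choices: Dict[str, Optional[str]], force: bool = False) -> List[str]:
    """Return the vNICs that can be moved, or raise if any cannot and force is off.

    choices maps a vNIC label to the chosen alternative VIOS, or None when
    no operational alternative exists.
    """
    stuck = [vnic for vnic, target in choices.items() if target is None]
    if stuck and not force:
        # Refuse to produce a partial plan: nothing runs until a human decides.
        raise NoAlternativeError("no operational alternative for: " + ", ".join(stuck))
    # Assumption: with force, vNICs without an alternative are left alone.
    return [vnic for vnic, target in choices.items() if target is not None]


choices = {"lpar1 slot 5": "vios1", "lpar2 slot 7": None}
try:
    plan_moves(choices)
except NoAlternativeError as err:
    print("stopped:", err)
print(plan_moves(choices, force=True))
```

Checking every vNIC before running anything is what makes the default mode safe: one problem vNIC blocks the whole batch instead of leaving the system half moved.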

python3 vnic-move.py --hmc myhmcuser@myhmc --system myp10system --autofailover=1

Suppose now you’ve finished all of your maintenance and all VIOS are back online, so now you want to reset auto-priority-failover back on so everything is running where it should.  The above command will do that.


All of those are great if you are not working in a highly restrictive security environment, but maybe your employer won’t allow you to install Python on the workstations with ssh access to the HMC, or they have a blanket policy against using ssh keys (I’m not going to judge).  There is still an option to use this script to make your life easier.  Starting with the first example, pending maintenance on vios1:

python3 vnic-move.py --offline --vios=vios1

That will print the command you need to run on hmc myhmc:

Collect data from HMC with the following command and store in a file:

lshwres -m myp10system -r virtualio --rsubtype vnic --header -F lpar_name%lpar_id%slot_num%auto_priority_failover%backing_devices%backing_device_states

 

You can copy/paste or otherwise transfer the command to the system that can run ssh and then copy/paste the output of the command to a file on the local workstation where you are running the script.
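If you want to inspect the saved output yourself before handing it to the script, a header-driven parser is enough.  This sketch assumes the header line produced by --header uses the same “%” separator as the -F list above and that values are not quoted; the parse_percent_delimited name and the sample row are made up, and the backing device fields are deliberately left as opaque placeholder strings rather than guessed at.

```python
from typing import Dict, List


def parse_percent_delimited(text: str) -> List[Dict[str, str]]:
    """Turn a header line plus data lines separated by '%' into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split("%")
    records = []
    for line in lines[1:]:
        values = line.split("%")
        if len(values) != len(header):
            raise ValueError(f"expected {len(header)} fields, got {len(values)}: {line!r}")
        records.append(dict(zip(header, values)))
    return records


# Placeholder sample: the header matches the -F list, but the data row is
# invented and the backing device fields are not real HMC output.
sample = (
    "lpar_name%lpar_id%slot_num%auto_priority_failover%backing_devices%backing_device_states\n"
    "mylpar%5%7%1%<backing devices>%<backing device states>\n"
)
for record in parse_percent_delimited(sample):
    print(record["lpar_name"], record["auto_priority_failover"])
```

Rejecting rows with the wrong number of fields catches the most likely copy/paste problem, a long line that was wrapped or truncated on its way into the file.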

Now you can process that file with the following:

python3 vnic-move.py --file=/path/to/file --vios=vios1 --system myp10system

That will check the input and print the commands needed to change the vNIC backing devices.  Copy/paste or otherwise transfer and run those commands and you will be done with that step.

If you use offline mode like this, make sure you don’t allow too much time between collecting the command output and processing it, or you might generate commands that are no longer correct for the current state of the vNIC devices.
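One simple guard is to check how old the saved file is before processing it.  The sketch below is an illustration only: is_stale and the ten-minute default are arbitrary choices for the example, and the file's modification time reflects when it was saved on the workstation, not when the command ran on the HMC.

```python
import os
import time


def file_age_seconds(path: str) -> float:
    """Return how many seconds ago the file was last modified."""
    return time.time() - os.path.getmtime(path)


def is_stale(path: str, max_age_seconds: float = 600.0) -> bool:
    """Return True if the file is older than max_age_seconds (an arbitrary limit)."""
    return file_age_seconds(path) > max_age_seconds
```

A call such as is_stale("/path/to/file") before running the --file step gives a cheap reminder to collect fresh output if the file has been sitting around.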

 

Need help?

If you need help implementing best practices for your vNICs, the IBM i Technology Expert Labs team (formerly known as Lab Services) is available to help with implementation planning, execution, and knowledge transfer.  See https://www.ibm.com/services/infrastructure for contact information or speak to your IBM Sales Representative or Business Partner.  If you are planning a new hardware purchase, you can include implementation services by the Technology Expert Labs team in your purchase.

Disclaimer

I am an employee of IBM on the IBM i Technology Expert Labs team (formerly known as Lab Services).  The opinions in this post are my own and don't necessarily represent IBM's positions, strategies, or opinions.

 

References

 

Previous Blog post on vNIC and SR-IOV

https://blog.vios4i.com/2022/11/sriov-and-vnic.html


Previous Blog post with a vNIC monitoring script

https://blog.vios4i.com/2023/01/monitoring-vnic-on-power.html


GitHub public repository for this Blog

https://github.com/IBM/blog-vios4i

 

Microsoft Article on using Public keys with Windows

https://learn.microsoft.com/en-us/windows-server/administration/openssh/openssh_keymanagement

 

IBM Support page on setting up SSH keys for the HMC

https://www.ibm.com/support/pages/setting-ssh-run-commands-hardware-management-console-without-being-prompted-password

 
