Proactive vNIC Changes for VIOS Maintenance

In January 2023, I published an article and shared Python code that monitors your systems to verify your vNIC backing devices are configured for the best possible redundancy and warns you of any situations that should be resolved.

I just published a new script that extends that idea to let you proactively change backing devices so you can free a VIOS for maintenance (upgrades, etc.).  While you can certainly use the HMC web-based GUI to view and change individual vNIC backing devices, that can be a time-consuming process if you have a lot of devices to change.  Of course, failover is automatic if properly configured, so you could just shut down the VIOS and let failover handle the switching, but many people prefer a more planned and controlled approach.

This script has two primary functions:

  • Change all vNIC devices for a specified Power server so any active backing devices associated with a specified VIOS are changed to the highest priority (lowest numbered) operational alternative backing device that is NOT served by the specified VIOS.  In other words, move all vNICs off a specified VIOS so that VIOS can be maintained.
  • Change all vNIC devices for a specified Power server to set the auto priority failover flag to either 1 or 0.  This is intended to make it easy to undo the previous usage.  When you force a specific backing device, auto priority failover is automatically set to 0 to prevent the system from switching right back to the original backing device.  Setting it back to 1 (on) after the maintenance is complete will put all the backing devices back to the preferred interfaces based on priority.   I usually recommend setting auto priority failover to 0 (off) during normal operations to prevent flapping between interfaces in the case of intermittent failure, and this script can be used to do that as well.  If you choose to do that, I strongly recommend regularly monitoring for non-operational interfaces using my previously published monitoring script https://blog.vios4i.com/2023/01/monitoring-vnic-on-power.html or another monitoring tool or process.

If you need more background on vNIC, please see my previous article: Introduction to SR-IOV and vNIC for IBM i.

 

Getting the vnic-move.py script

You can find the open source script vnic-move.py in the public repository at: https://github.com/IBM/blog-vios4i

Download it directly with: https://github.com/IBM/blog-vios4i/raw/main/src/vnic-move.py

This is a free open-source script released under the Eclipse Public License v2.0.  Bug fixes and improvements will be checked into the public Git repository when tested.  If you want to monitor for changes, I suggest creating a GitHub account and watching the project, as it is unlikely I'm going to write an article about each change.

 

Setting up the vnic-move.py script

Unlike the vnic-check.py monitoring script, this script is intended to be used interactively by a system administrator only when preparing to perform maintenance on a VIOS.  That means, while it is possible to run this on an IBM i using the PASE environment, it is much more likely that this will be run from an administrator's workstation.  Given the current sad state of the world, where Windows is the most widely used desktop operating system, that poor system administrator will probably be forced to use Windows rather than something better (*cough* Linux *cough*).

If you are going to run this on AIX or Linux, install Python3, then create keys and run the script as shown below.

If you are running Windows, you have a few options to run Python3, including (easiest to hardest): as a native Windows executable, in a container using a container manager like Docker, with the Windows Subsystem for Linux (WSL), or in a Linux virtual machine.

This script runs commands on the HMC using the ssh command, so you will also need to verify you have that command in the environment where you will run it (but don’t despair if you don’t).  The good news is that even Windows 10/11 generally has the ssh command.  To find out if this is true in your case, just open a command line and run “ssh”.  If you get a usage message, you’ve got it.  If it’s not there, Windows Settings->Apps->Optional Features will usually let you install “OpenSSH Client” unless your organization has other ideas.

Setting up the ssh keys and agent

This script runs remote HMC commands via a batch mode SSH command, so you will need to configure an SSH key to avoid a password prompt.  This key can either have an empty passphrase or a secure passphrase managed by an ssh-agent.  To be clear, I would never recommend an ssh key with an empty passphrase for a user account that can make changes to your environment, so the approach I recommend is using an ssh-agent to manage access with a passphrase.

In general, the process you will need to use is:

  • Create an account on the HMC that you will use for this script.  You can skip this step if you already have separate accounts for each system administrator, or if you are okay with running the command with the default hscroot account.  Please note that the HMC account will need permissions to run the lshwres command to retrieve the vNIC information, and to run the chhwres command if you want to actually switch the backing devices.
  • Generate a public/private key pair on your workstation (or wherever you want to run the script).  Usually this is done with the ssh-keygen command, and it is just a matter of running the command and responding to the prompts, mostly with the defaults.  If you leave the passphrase blank (not recommended), you will not need to do any of the ssh-agent steps below.
  • If you selected the defaults, the ssh-keygen command will have created an id_rsa.pub file, and it will have shown you where it created it.  You will need to add this public key to the authorized keys of the HMC account that you want the script to use.  The correct way to do that on the HMC is from the command line with mkauthkeys.  The format is: mkauthkeys -a "[contents of public key]".  The easiest way to do this is probably to open the public key file with Notepad and copy/paste it into the command.  If you do the copy/paste thing, please note that the public key is one long string with no embedded line breaks, so pay attention to wrapping in your text editor.  If you see ">" continuation lines when running the command on the HMC, you probably included a line break that shouldn't be there.
  • Test the public key access from the workstation with the command: ssh [hmcaddress].  The first time you run this, it will ask whether you want to trust that host.  You will need to respond yes so that the host key is added to your known_hosts file.  It should then prompt for your passphrase, and when that is provided, it will give you access to the HMC command line.  The exit command will end the ssh session.  Repeat a second time to verify that it skips the host verification prompt.
  • Set up your ssh-agent
    • If running Unix (container, WSL, or VM), you'll just run "ssh-agent [shell]", where shell is usually bash.  This gives you a shell that is a child of the ssh-agent, so you can proceed with the add-keys step below.
    • If running native Windows, there are a few more steps.  First you'll need to go into your Services app -- "services.msc" will get you there from the search line.  Find the service named "OpenSSH Authentication Agent" and make sure it is not Disabled.  "Automatic (Delayed Start)" is a good choice, as it will only start when needed.  You only need to enable the service once.  After that, just run "ssh-agent" from the command line to start it each time you need to use it.
  • Authenticate your keys to the agent.  No matter what method you use, this is done with the command "ssh-add [path to id_rsa file]".  For Windows users, you'll probably need to give the whole path to the key file; for Unix users, you can usually use ~/.ssh/id_rsa.  When it finds the file, it will prompt for the passphrase you set.  When you enter the passphrase, it stores the unlocked key in the agent service's memory.  Be aware that when using an agent, realistically any process on the computer where it is running can access the unlocked keys.  Save your risky web activities (you know what I mean) for times when you have not unlocked the keys.
  • While the agent is running and the key is unlocked, the ssh command in the script will access the unlocked key from the agent and skip the prompt for a passphrase.  This allows the script to run all the commands in batch mode as it needs to.
  • When you are done with the unlocked keys, you can start over by running "ssh-add -D" to remove all keys from the agent.  This is especially important on a shared workstation, or one that you never log off of.

If you have any problems getting the ssh keys and authentication set up, Google is a great resource.  Secure Shell has been around a long time, so many people have tackled the process of setting up keys and using an agent, and some have written good tutorials on how to do it.
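For reference, here is a minimal sketch of how a script can run an HMC command over SSH in batch mode once a key is loaded in the agent.  The host and command are placeholders, and this is not the actual vnic-move.py code, just an illustration of the technique:

import subprocess

def run_hmc_command(hmc, command):
    # BatchMode=yes makes ssh fail immediately instead of prompting for a password,
    # which is what you want when running non-interactively.
    return subprocess.check_output(
        ["ssh", "-o", "BatchMode=yes", hmc, command],
        universal_newlines=True)

# Example (placeholder address):
# print(run_hmc_command("myhmcuser@myhmc", "lssyscfg -r sys -F name"))

If the key is not loaded in the agent, the call simply fails instead of hanging on a prompt, which is exactly the behavior you want from an unattended script.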

 

Using the vnic-move.py script

Now that the ssh and authentication setup is out of the way, here are some examples of how to use the script:

python3 vnic-move.py --hmc myhmcuser@myhmc --vios=vios1 --system myp10system --verify

We would use this if we are planning on taking vios1 down for maintenance.  This one will check all the vNICs for system myp10system managed by myhmc and generate the commands to change the backing devices for any vNIC that currently has a backing device served by vios1.  The --verify option makes it check the configuration and print the commands without executing them.  You can then manually run the commands, or run the next example to run them all automatically.

python3 vnic-move.py --hmc myhmcuser@myhmc --vios=vios1 --system myp10system

This one will do what the previous one did, plus it will run the commands to make the vnic device changes on the HMC.

python3 vnic-move.py --hmc myhmcuser@myhmc --vios=vios2 --system myp10system

Suppose we have run the previous command and shut down vios1; here is the command to do the same for vios2.  But what if vios1 is not finished starting back up when we tell the script to do vios2?  If there are valid backing devices served by three or more VIOS, it will happily switch to the alternate VIOS and continue.  If not, it is going to print an error telling you there is no operational alternate backing device and stop without running any commands.  This error could also display if you happen to have a test system that only has one backing device.  If you have reviewed the error messages and know that you don't care if those vNICs lose connectivity (perhaps the vNIC is not critical, or the partition is powered off), you can skip all errors with the --force option:

python3 vnic-move.py --hmc myhmcuser@myhmc --vios=vios2 --system myp10system --force

If you force changes and it breaks something, that’s on you.  To be clear, you really should look at all the commands generated and make sure you are comfortable with running them in any case, because any code can have errors, and there are no warranties or support contracts for this free open-source script.

python3 vnic-move.py --hmc myhmcuser@myhmc --system myp10system --autofailover=1

Suppose now you’ve finished all of your maintenance and all VIOS are back online, so now you want to reset auto-priority-failover back on so everything is running where it should.  The above command will do that.


All of those are great if you are not working in a highly restrictive security environment, but maybe your employer won’t allow you to install Python on the workstations with ssh access to the HMC, or they have a blanket policy against using ssh keys (I’m not going to judge).  There is still an option to use this script to make your life easier.  Starting with the first example, pending maintenance on vios1:

python3 vnic-move.py --offline --vios=vios1 --system myp10system

That will print the command you need to run on the HMC (myhmc in our example):

Collect data from HMC with the following command and store in a file:

lshwres -m myp10system -r virtualio --rsubtype vnic --header -F lpar_name%lpar_id%slot_num%auto_priority_failover%backing_devices%backing_device_states

 

You can copy/paste or otherwise transfer the command to the system that can run ssh and then copy/paste the output of the command to a file on the local workstation where you are running the script.

Now you can process that file with the following:

python3 vnic-move.py --file=/path/to/file --vios=vios1 --system myp10system

That will check the input and print the commands needed to change the vNIC backing devices.  Copy/paste or otherwise transfer and run those commands and you will be done with that step.

If you use offline mode like this, make sure you don't allow too much time between collecting the command output and processing it, or you might generate commands that are no longer correct for the current state of the vNIC devices.
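If you are curious how that output gets processed, here is a minimal sketch of parsing the %-delimited fields produced by the lshwres command above.  The field names come from the -F list in that command; the actual parsing logic inside vnic-move.py may well differ:

FIELDS = ["lpar_name", "lpar_id", "slot_num", "auto_priority_failover",
          "backing_devices", "backing_device_states"]

def parse_vnic_file(path):
    vnics = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and the header line produced by --header
            if not line or line.startswith("lpar_name"):
                continue
            values = line.split("%", len(FIELDS) - 1)
            vnics.append(dict(zip(FIELDS, values)))
    return vnics

Each entry ends up as a dictionary keyed by the field names, and the backing_devices value is itself a list that needs further parsing before any backing device change commands can be generated.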

 

Need help?

If you need help implementing best practices for your vNICs, the IBM i Technology Expert Labs team (formerly known as Lab Services) is available to help with implementation planning, execution, and knowledge transfer.  See https://www.ibm.com/services/infrastructure for contact information or speak to your IBM Sales Representative or Business Partner.  If you are planning a new hardware purchase, you can include implementation services by the Technology Expert Labs team in your purchase.

Disclaimer

I am an employee of IBM on the IBM i Technology Expert Labs team (formerly known as Lab Services).  The opinions in this post are my own and don't necessarily represent IBM's positions, strategies, or opinions.

 

References

 

Previous Blog post on vNIC and SR-IOV

https://blog.vios4i.com/2022/11/sriov-and-vnic.html


Previous Blog post with a vNIC monitoring script

https://blog.vios4i.com/2023/01/monitoring-vnic-on-power.html


Github public repository for this Blog

https://github.com/IBM/blog-vios4i

 

Microsoft Article on using Public keys with Windows

https://learn.microsoft.com/en-us/windows-server/administration/openssh/openssh_keymanagement

 

IBM Support page on setting up SSH keys for the HMC

https://www.ibm.com/support/pages/setting-ssh-run-commands-hardware-management-console-without-being-prompted-password

 

New tool to work with WWNN/WWPN

Really quick post here. 

To be honest, I'm doing this as much for my own use and to make life a bit easier for my customers as we work through setups. 

For a while now, I've been using a little local HTML/Javascript tool I made to add or remove colons and switch upper/lower case on WWPNs. Anyone that has done much work with multiple platforms where some require lowercase with colons and others want uppercase without colons knows the pain. 

There are plenty of these already available on the web, but in my quick search, I didn't find one to do exactly what I wanted, so I wrote this one to use offline. For example, I frequently copy a storage adapter WWPN from IBM i DSPHDWRSC or STRSST to Brocade switches or DS8000/FlashSystems storage platforms to create zoning or host groups. The WWPN you get from IBM i is all uppercase with no colons, while the Brocade switches want to get lowercase with colons. 

This tool will let you easily convert a WWN or list of WWNs to/from any combination of upper/lowercase or colons/no-colons.  Bookmark the URL https://blog.vios4i.com/p/wwn-tool.html and use it whenever you need, or just come to the https://blog.vios4i.com home page and use the permanent link on the right side of the screen, then stick around and read and comment on my latest blog post.
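If you just want the conversion logic without a browser, the same transformation takes only a few lines of Python.  This is a rough sketch of the idea behind the tool, not the actual code in wwpncvt.html (which is JavaScript):

def convert_wwn(wwn, upper=False, colons=False):
    # Strip any existing colons and whitespace, then re-case the hex digits.
    hex_digits = wwn.replace(":", "").strip()
    hex_digits = hex_digits.upper() if upper else hex_digits.lower()
    if colons:
        # Re-insert a colon between every pair of hex digits.
        hex_digits = ":".join(hex_digits[i:i+2] for i in range(0, len(hex_digits), 2))
    return hex_digits

# Example: IBM i style in, Brocade style out
# convert_wwn("10000090FA8B1234", colons=True) returns "10:00:00:90:fa:8b:12:34"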

If you'd also like to have a tool to use offline, this one does all of its work with HTML and embedded JavaScript.  I've made it available on my GitHub repository.  You can find the open source wwpncvt.html in the public repository at: https://github.com/IBM/blog-vios4i

Download directly with: https://github.com/IBM/blog-vios4i/raw/main/src/wwpncvt.html  Use Save As to get a local copy and run that without any internet connection.

Monitoring vNIC on Power

 

As I discussed in my previous post Introduction to SR-IOV and vNIC for IBM i, vNIC is a powerful feature built on top of SR-IOV and VIOS that virtualizes Ethernet connections to your IBM i, AIX, or Linux partitions with high performance and automatic failover to alternative ports (backing devices).

One problem with creating a vNIC with multiple backing devices in a failover configuration is you might not notice when it automatically switches to a backup device because a higher priority device fails.  If you don’t notice, you could end up running in a reduced redundancy situation because the problem that caused the switch never gets resolved. 

The solution for that problem is monitoring.  To that end, I’m publishing a Python script I wrote to monitor your vNIC configuration for several conditions that could indicate a problem.

Best practices

When creating a vNIC configuration, there are a few practices that will help ensure you get the best possible redundancy to protect yourself from failures in various parts of your infrastructure.

  • Use multiple backing devices.  Each vNIC should have at least one additional backing device to fail over to in case the primary device encounters a problem.  It is common to see configurations with 3 or 4 total backing devices.  As of this writing, up to 16 are supported.
  • Spread backing devices across points of failure.  Separate them onto different VIOS, different adapters, and different network switches.
  • Spread active devices across all VIOS and physical adapters to statically load balance the work.  Don't put all your active devices on one VIOS and all your backup devices on the other, or you'll be looking at a big processor/memory spike when you take the primary VIOS down.  Likewise, it makes little sense to squeeze all your traffic through some of your adapters/ports and leave others idle.  Using all of your ports also makes it possible to detect port failures related to switching and routing that would otherwise go undetected until those ports are your only option after the primary ports fail.
  • Assign each backing device a unique failover priority so the backup sequence is deterministic.  For vNIC, the lowest priority number is the highest priority, and it defaults to 50.  Typically, I would assign 50 to the desired active device, 60 to the first backup, 70 to the second backup, etc.  You can use any numbers you wish, but keep them unique for a given vNIC, and leave some space in the numbering to change it around if you need to.
  • Use the HMC GUI to activate backup ports if you need to move traffic proactively.  Select a specific LPAR from the HMC screen, then select Virtual NICs from the left-hand menu to display/edit the vNICs for a partition.  To switch to a different backing device, select the desired device and select Action->Make the backing device active.  You will notice that when you do this, the "Auto Priority Failover" setting will change to "Disabled".  That will prevent the vNIC from switching based on priority unless the active port fails.

Installing the Monitor code

The vnic-check.py script is written in Python.  It will run on any platform that has Python 3.6 or above, including IBM i.  As this is an IBM i centric blog, I include the instructions to install, configure, and run it on IBM i using the PASE environment, but it will run pretty much anywhere.

Prerequisites

You must have the PASE environment (5770-SS1 Option 33) and OpenSSH (5733-SC1 Option 1) installed.

 

Installing Open Source packages and python3

See https://www.ibm.com/support/pages/getting-started-open-source-package-management-ibm-i-acs for details of this process.

Start the SSH server on the IBM i partition with the STRTCPSVR *SSHD command if it is not already started.

Using IBM Access Client Solutions (ACS), select a system for the LPAR where you wish to install the monitoring tool.  From the menu, select Tools->Open Source Package Management.

On the “Connect to SSH” window that is displayed, enter a user ID and password with sufficient authority to install the open source packages (see link above for details)

If your IBM i partition does not have the ability to connect to the Internet, use the radio button under "Proxy Mode" to select "SSH Tunneling".  This will allow the packages to be downloaded via your workstation.

If you get a message box that the Open Source Environment is not installed, click “Yes” to install it.

When the open-source environment install is complete, it will display a window with the list of installed packages.  If python3 is in that list, you are done.  If not, switch to the “available packages” tab, click “python3” and click Install.  If no available packages display, you may need to close the open source management window and reopen it.

Verify python3 is installed by opening a QShell session (QSH command) and running "/QOpensys/pkgs/bin/python3 -V".  It should show a Python version number of 3.6 or higher.  Press F3 to exit back to your command line.

 

Create an HMC Monitoring account

This script runs query commands on the HMC using a private key with no passphrase.  Since it is a very bad security idea to have that kind of access to your HMC hscroot account, you’ll want to create an account that can only run monitoring commands.  Seriously, DO NOT create private keys as described here using hscroot or any other HMC account that can make changes.  If you’re not going to use a restricted account, don’t use this script.

Connect to the HMC using an SSH client like PuTTY as user hscroot (or another user with the authority to create user accounts).

Run the command:

mkhmcusr -i "name=monitor,taskrole=hmcviewer,description=For restricted monitoring scripts,pwage=99999,resourcerole=ALL:,authentication_type=local,remote_webui_access=0,remote_ssh_access=1,min_pwage=0,session_timeout=0,verify_timeout=15,idle_timeout=120,inactivity_expiration=0"

It will prompt for a password.  Assign it something secure.  This will create an account named “monitor” that can only be used by the SSH interface.  It will not be able to use the Web GUI, and it will be restricted in the commands it can run.

Repeat this account creation on each HMC that will be monitored.

 

Create a key to access the monitor account

You will be running the monitoring script from one of your IBM i partitions, with a specific user ID that will have an SSH key to access the HMC using the monitor account you just created.

Pick the user ID that will be running the command.  I'm not going to go into detail on creating this account since, if you're an IBM i administrator, you already know how to create accounts and create job schedule entries that use a specific account.  Of course, you can use an existing account for this as well.

The account you choose will need to have a home directory where you can create an ssh private key that you will authorize to connect to the HMC monitor account.

Start QShell (QSH) from the account you will use and run the following:

# on all of the following commands, replace 1.2.3.4 with the IP address of the HMC you want to monitor.  Repeat for each HMC if you are monitoring more than one.

mkdir -p $HOME # make sure there is a home directory

cd $HOME # change to the home directory

ssh-keygen

# press enter three times to accept the default file /home/MONITOR/.ssh/id_rsa and use an empty passphrase

ssh monitor@1.2.3.4 mkauthkeys -a \"`cat ~/.ssh/id_rsa.pub`\"

# answer ‘yes’ to the authenticity prompt

# Enter the HMC Monitor account password when prompted for Password:

# finally test the SSH key access with:

ssh monitor@1.2.3.4 lssyscfg -r sys -F name

# you should get a list of the system names managed by that HMC without any password prompting

Use F3 to leave the QShell prompt.

 

Now you’ll need to download and edit the top of the vnic-check.py file to set your parameters. 

You can find the open source script vnic-check.py in the public repository at: https://github.com/IBM/blog-vios4i

Download directly with: https://github.com/IBM/blog-vios4i/raw/main/src/vnic-check.py

 

smtphost: Set to the name (or address) of an SMTP relay in your organization where you can send mail.  On IBM i, if the current partition is running a mail server locally, you can use 127.0.0.1 here.  Set this to None if you just want to get a printed result to the screen (or a spooled file in batch).  Using None is useful to ensure the command is working properly before setting up email.

sender:  If using email, this needs to be a valid email address that can send mail in your organization.

toaddrs:  This is a list of email addresses that should get messages when the check finds any conditions that need fixing.  You can use a comma separated list of addresses between the brackets where each address is enclosed in quotes.

hmcs: This should be a list of the SSH addresses of the monitor account on your HMCs, in the format monitor@ipaddress.  You can also use a DNS name instead of the IP address if DNS is properly configured for your PASE environment (verify by using the host command in a PASE shell).  The entire list should be surrounded by [] characters, and each HMC address should be surrounded by single quote characters and separated by commas.  It is okay to have only one HMC in the list.  You will need to do the same key setup described above on each HMC if you use more than one.

minopercount: This should be the lowest number of operational backing devices that is acceptable in your environment.  Any vNIC with fewer than this number of operational devices will be reported as a problem.
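As an illustration, a filled-in configuration block at the top of the script might look something like the following.  The addresses are placeholders, and the comments in the script itself are authoritative for the exact format:

smtphost = "smtp.example.com"      # or None to print results, or "127.0.0.1" for a local mail server
sender = "vnic-monitor@example.com"
toaddrs = ["ibmi-admin@example.com", "network-team@example.com"]
hmcs = ["monitor@10.1.1.10", "monitor@10.1.1.11"]   # one entry per HMC, using the monitor account created above
minopercount = 2                   # report any vNIC with fewer than 2 operational backing devices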

When you have set your parameters, transfer the script to the home directory of the user that will be running the command.

Finally, make sure it works by opening QShell (QSH command) and running the script:

/QOpensys/pkgs/bin/python3  vnic-check.py

If all goes well, you'll get no email or output (indicating all of the vNICs found are without problems), or a list of the problems found.  If you get no output and want to make sure it is finding your vNICs, change the minopercount variable to a high number (999) and rerun; it will then report that all of your vNICs have fewer operational devices than the desired count.

When you have verified all is well, reset the variables as needed and schedule a job to run:

QSH CMD('/QOpensys/pkgs/bin/python3  vnic-check.py')

as the selected user on the desired schedule.
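If you want a concrete starting point, a hypothetical job schedule entry might look like this (the job name, user, and schedule are placeholders; adjust to your own standards):

ADDJOBSCDE JOB(VNICCHECK) CMD(QSH CMD('/QOpensys/pkgs/bin/python3 vnic-check.py')) FRQ(*WEEKLY) SCDDAY(*ALL) SCDTIME(060000) USER(MONITOR)

FRQ(*WEEKLY) with SCDDAY(*ALL) effectively runs the check every day at the scheduled time; pick whatever frequency makes sense for your environment.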

  

Need help?

If you need help implementing best practices for your vNICs, the IBM i Technology Services team (formerly known as Lab Services) is available to help with implementation planning, execution, and knowledge transfer.  See https://www.ibm.com/it-infrastructure/services/lab-services for contact information or speak to your IBM Sales Representative or Business Partner.  If you are planning a new hardware purchase, you can include implementation services by the Technology Services team in your purchase.

Disclaimer

I am an employee of IBM on the IBM i Technology Services team (formerly known as Lab Services).  The opinions in this post are my own and don't necessarily represent IBM's positions, strategies, or opinions.

 

References

 

Getting started with Open Source Package Management in IBM i ACS

https://www.ibm.com/support/pages/getting-started-open-source-package-management-ibm-i-acs

 

IBM i ACS Open Source Package Management Auth Fail Error

https://www.ibm.com/support/pages/node/1167988

 

Please Stop Changing Partition Profiles

 


The Enhanced HMC interface is here to stay.  If you are still changing partition profiles on your Power HMC, you really need to start using the new functionality instead, or you risk getting out of sync and losing changes.  It is painful to create a bunch of new virtual fiber channel adapters, and then have them magically disappear with your next reboot.  It’s even worse when you reboot a VIOS and choose an out of date partition profile and suddenly some of your client disks go away.  Ask me how I know.

I normally try to write articles focused on IBM i, but in this case, there really isn’t any difference between IBM i, AIX, and Linux.  All partitions (especially VIOS) should follow the same rules.

 

First a bit of history

IBM made the Enhanced HMC interface available as an option with version 8.1.0.1.  If you were an administrator like me, you just looked at it once or twice, figured it didn’t make any sense compared to what you were used to, and just selected “Classic” from the menu when you logged in. 

Version V8R8.7.0 officially eliminated the Classic interface, but some enterprising users found and published a backdoor approach to access the classic interface even at that level (see Bart's Blog - Enable classic HMC GUI on release V9R1, referenced below).  That unofficial approach was then shut down for good in May of 2020.

Why?  Because IBM is focusing development on a single easy to use interface that leverages DLPAR operations for all the new features like vNIC (see my previous blog post if that’s new to you).

 

Making the Move

First and foremost, make sure that your partition profiles are in sync with the running profile.

There is an excellent blog post in the IBM Community that explains this in much more detail.

If you are using VIOS, DON’T FORGET THE VIOS!  There is far more risk of lost configuration on VIOS than any other partition, because when you are using the Enhanced GUI, you are often making dynamic changes to VIOS you may not even be aware of.

The gist of it is that you should be running with the “Save configuration changes to profile” setting at “Enabled”.  If it is not currently set to enabled, you need to get it set that way.

If the setting is currently "disabled", start by saving your current configuration to the default partition profile.  Select the partition view for the desired partition from the GUI, select Partition Actions->Profiles->Save Current Configuration, and select the default profile name.  Most users only have one profile per partition.  If you are one of the few that has more than one, pick a name for the profile that you will use from now on.  The default name used for newly created partitions is "default_profile", so that is a pretty good choice for a name.  Save the configuration with the desired name.  If you created a new name, go into "Manage Profiles" one last time and make your newly saved profile the default.  Now is also a good time to delete all those profiles you will not be using any more.

Now you can change the “Save configuration changes to profile” setting to “Enabled”.

 

Doing it the Enhanced way

Once you have this setting enabled, just stay away from “Manage Profiles” and make all of your changes using the Enhanced GUI dynamic menu operations available from the left-hand menu of the partition view. 

When you need to activate a partition that you previously shutdown, make sure you use the “Current Configuration” option rather than picking a partition profile.

The biggest difference between changing partition profiles and restarting with a different profile is that in the Enhanced GUI, it will make the changes dynamically on a running partition.  It will also make the corresponding changes on the VIOS, if necessary.  The days of keeping track of virtual port numbers can be gone, if you let them.

You’ll find that when you Google the procedure to do anything on the HMC, you will often find articles and screen shots that point you to modify the profile.  If at any point, one of these articles suggests using the Manage Profiles option or tells you to select a specific profile when activating a partition, keep looking for a new procedure.  You can often get good basic information from these articles, but the specific procedures are likely to get you into trouble.

Enhanced HMC changes are typically dynamic on a running partition.  This requires communication between the HMC and the running partition, which you will typically see referred to as an RMC connection.  One difference for the IBM i world is that IBM i uses a LIC connection rather than the RMC connections that are used by AIX and Linux.  This all means that you won't see an RMC active flag on an IBM i partition.  I mention this for two reasons.  First, much of the documentation you will run into will mention the need for an active RMC connection for various procedures.  That is not true for IBM i.  Second, the O/S on an IBM i does need to be operating to make some dynamic changes.  The error message you'll get while attempting to make some changes on an activated IBM i partition will refer to RMC, but it really means it's not booted to a DLPAR-capable state.

You may notice that there are things you cannot change using the Enhanced interface while the partition is active.  Some examples are max processor, max memory, max virtual adapters, and processor compatibility mode.  All these options require a shutdown and restart.  You will be permitted to make the changes while the partition is shutdown.

Why is it so slow? (Spoiler - it's not)

You might not believe me here, but it isn’t slow.  It just feels that way because it is doing everything dynamically right now when you are used to delaying all that processing to partition activation.

Making changes to profiles is blazing fast because they are not actually changing any real resources, but you will pay the price during activation of that profile.  In contrast, when you make a change to a running partition with a dynamic HMC change, all the processing that happens in the hypervisor and O/S to add that resource happens immediately -- while you wait.  That's right, while you wait means, well... you will be waiting.

I've actually done some benchmarks on new system setups to compare dynamic operations with HMC commands (chhwres, equivalent to the Enhanced HMC GUI) to HMC profile change commands (chsyscfg commands) that get applied via the "chsyscfg -o apply" command.  The chhwres commands, on either a running or inactive partition, tend to be slow to operate, while the equivalent profile changes are very fast until they are either applied via the apply command or profile activation.  In the end, it comes down to when you are going to wait.  You can wait now, or you can wait later, but you are always going to wait for the actual resource creation in the hypervisor.

To be completely honest, I'm a command line guy.  Sure, I'll use the HMC GUI to create small test partitions and add a few virtual network or virtual fiber channel connections when I must.  I'm much more likely to create a command script to do it all for anything more than a couple of resources.  I don't have the patience to create hundreds of virtual fiber channel connections on a giant Power E1080 one by one in a GUI.  That said, most IBM i admins don't create a lot of resources except during hardware refreshes and migrations, so using the GUI is the right way to learn -- it's also safer.

I’ll post some more details of the command line way of creating and configuring partitions and partition resources in the future for those that are interested in that approach.

Need Help?

If you need help fixing a profile problem, or with a hardware refresh or migration and don’t want to go it alone, the IBM i Technology Services team (formerly known as Lab Services) is available to help with implementation planning, execution, and knowledge transfer.  See https://www.ibm.com/it-infrastructure/services/lab-services for contact information or speak to your IBM Sales Representative or Business Partner.  If you are planning a new hardware purchase, you can include implementation services by the Technology Services team in your purchase.

Disclaimer

I am an employee of IBM on the IBM i Technology Services team (formerly known as Lab Services).  The opinions in this post are my own and don't necessarily represent IBM's positions, strategies, or opinions.

 

 

 

References

 

Synchronize Current Configuration and configuration change on inactive partition in HMC Enhanced UI

https://community.ibm.com/community/user/power/blogs/hariganesh-muralidharan1/2020/06/08/sync-curr-config-and-inactive-lpar-config-change

 

Bart’s Blog - Enable classic HMC GUI on release V9R1

https://theibmi.org/2019/09/11/enable-classic-hmc-gui-on-release-v9r1/

 

IBM Support - Saving Configuration Changes To Profile

https://www.ibm.com/support/pages/saving-configuration-changes-profile

 

How to create a Shared Ethernet Adapter without touching VIOS

https://theibmi.org/2016/03/26/how-to-create-sea-with-no-touch-vio/

 

HMC – Enhanced+ interface tricks

https://theibmi.org/2020/11/15/hmc-enhanced-interface-tricks/

 

PowerVM VIOS for IBM i

 

In this post, I will discuss the pros and cons of creating a completely virtualized IBM i environment with redundant VIOS (Virtual I/O Server).  You can just look at the name of this website to understand where I stand on that issue.  Many IBM i administrators try to avoid VIOS for several reasons.  To be completely honest, a LONG time ago I was even one of them.  That was a mistake.  I want to make the case for why you should consider VIOS for I/O virtualization on your next system.

In the present day, there are many options for virtualizing the workload on an IBM Power server.  The options range from absolutely no virtualization (a non-partitioned system), to all Input/Output and processor completely virtualized and mobile.  According to the 2022 Fortra (formerly HelpSystems) survey, 22% of you have a single partition, and 25% have two partitions.  If that’s you, you probably don’t need VIOS... yet. 

It is also common to find particularly critical partitions with dedicated processors and dedicated I/O resources on the same Power servers as fully virtualized partitions that are sharing resources. 

I'm a big fan of virtualizing everything, but I understand that is not always optimal.  Fortunately, PowerVM has the flexibility to provide the right choice for you on a partition-by-partition basis.

Why should you virtualize I/O? 

Ask yourself a question:  If you have more than one partition, why don’t you buy a separate Power system for each partition? 

Your business probably requires multiple partitions for a reason: workload splitting, different applications, development/testing environments, etc.  You also have good reasons to consolidate your separate workloads onto a smaller number of more powerful systems.  Usually, those reasons relate to things like cost, allowance for growth, limited floor space, power, or cooling requirements.

The same reasons apply to why you should virtualize your I/O resources.  Ethernet infrastructure (especially 10G) is a limited resource.  Switches, cabling and SFPs all add to expenses and complexity.

Sharing fiber channel ports for storage also reduces the number of ports needed on SAN switches, as well as reducing cable needs.  This saves money and time.

If you use external (SAN) storage, you can even use Live Partition Mobility (LPM) to move running partitions between physical servers.  This is a very common practice in the AIX world, but fairly rare for IBM i.  More to come on that.

External Storage also allows you to leverage technologies such as FlashCopy to create backups with almost zero downtime or create test or reporting copies practically instantly.  It will also greatly simplify server migrations and enable storage-based replication for High Availability and Disaster Recovery solutions.  I’ll write a future article that delves deeper into the benefits of external storage, as it is a technology that deserves a deep dive.

When you have a fully virtualized PowerVM infrastructure in place, creating a new partition becomes a very simple thing.  There is no longer any need to assign any physical resources.  Just create new virtual resources with the HMC GUI and your partition (IBM i, AIX, or Linux) is ready to go.  Okay, you might need to do some zoning and maybe assign some storage before you can use it, but the partition will be ready to go.

Redundancy is critical

Proper virtualization leverages redundancy to improve reliability.  Ideally, all your virtualized resources should have a backup.

Virtual Ethernet connections should be based on vNIC with multiple backing adapters for automatic failover, or Shared Ethernet Adapters backed by multiple physical adapters in multiple VIOS.  Each adapter should connect to the network via separate network switches.  Eliminate all single points of failure and you will eliminate many potential problems before they happen.

Storage should have multiple paths via multiple fiber channel cards owned by multiple VIOS partitions connected through multiple SAN switches (fabrics) to multiple storage ports.  Again, eliminate those single points of failure.

A properly implemented virtual infrastructure is more reliable than individual physical adapters directly mapped to partitions.

Don’t fear the VIOS

If I had any musical talent, I’d make a version of the classic “Don’t Fear the Reaper” song as “Don’t Fear the VIOS”.  I don’t, so I’ll just stick with text.  Trust me.  It’s better this way.

Many IBM i administrators want to avoid VIOS because it is based on AIX, which is an unfamiliar technology.  As I mentioned before, I was one of those until I spent a few years at a company which used VIOS extensively.

Let me be very clear about this.  AIX guys are NOT smarter than IBM i guys.  They just understand a different command syntax.  They might be smarter than Windows guys, but who isn’t, right?

AIX users should NOT be the only ones that benefit from VIOS in their environments.  VIOS is intended to be implemented as an appliance, similar to the HMC, but exclusively in software.  There is a connection to the HMC that is the primary means of configuration.  There is also a command line environment that is a subset of simplified AIX commands plus some commands that are specific to VIOS.  It is well documented with both online help and manuals, but you will rarely need to use it.

The fact is, once you have done the basic install of VIOS, all your ongoing monitoring and configuration can be completed from the modern Enhanced HMC GUI interface.  If you want to add a partition, map a new fiber channel port, configure a new vNIC, etc., you do it all with clicks on a web interface.  The only time you MUST use the command line on the VIOS is for a few commands during an install, and to install software updates.  Software updates are usually a painless process that involves an install to an alternate boot disk and a simple reboot to activate.  The alternate disk install also means the upgrades are completely reversible in case of problems.  Remember that you want to have redundant connections to multiple VIOS, so that reboot will not be disruptive to your environment.

I should mention that just because you usually don’t have to use the command line interface doesn’t mean you won’t want to use the command line interface.  There is a massive amount of information to be had from those simple commands.  Watch for a future post where I publish and explain some of my favorite information gathering VIOS commands.

The benefits of VIOS outweigh the costs, especially if you are using external storage.

Licensing topics

Fun fact: you are probably already licensed for VIOS.  PowerVM is required for partitioning, and all editions include VIOS.  If you have PowerVM licenses for your server, you are already entitled to install VIOS.  You can get it from IBM Entitled System Support by going to "My Entitled Software", then "By Product", and selecting 5765-VE3.

Another important consideration for those of you with extra processors not licensed for IBM i: VIOS is not IBM i, so you do not need those licenses for the processors running VIOS.  That means the processor overhead related to handling the I/O virtualization does not carry a premium beyond the cost to activate the processor.  You can make sure you are in compliance by using HMC processor pools to limit the IBM i partitions to the number of licensed processors, and putting your VIOS (and Linux) in an uncapped pool.

Another virtualization topic specific to IBM i is the way the O/S and most applications are licensed.  I mentioned earlier that Live Partition Mobility, moving a running partition to a different server, is a common practice for AIX shops.  It is pretty rare for IBM i.  I think one of the key reasons that has been true historically is that the AIX O/S and applications are not generally tied to a particular serial number, while the IBM i O/S and applications are pretty much always licensed to a processor serial number.  That means moving an active IBM i partition to another Power server can result in license problems.  Fortunately, IBM recently announced Virtual Serial Numbers that can be attached to a partition and migrate with it.  If Live Partition Mobility appeals to you, look into getting a Virtual Serial Number.

I should mention that since LPM moves memory over a network to the other server, LPM on IBM i may require a much more robust network environment than the equivalent AIX resources.  IBM i uses single level storage, so it uses large amounts of very active memory.  There are certainly memory size and activity limits that could preclude the use of LPM for very large environments.  As always, your environment matters, and your results may vary.

 

iVirtualization (AKA i hosting i)

There is another option for virtualizing I/O and disk resources for a client partition by using the iVirtualization functionality built into IBM i since V6.1.  This functionality allows you to virtualize ethernet adapters owned by the parent partition and to create virtual disk objects that are shared to another client partition as virtual SCSI disks.

The *NWS* commands to support this are all native IBM i commands that will look familiar to IBM i administrators.  Don't kid yourself, though.  They are no less complex than the corresponding VIOS commands to someone that has never used them.

In some limited situations, iVirtualization might be a viable option.  For example, on a small system with internal NVMe on a single backplane such that it is not possible to split between multiple VIOS for redundancy. 

Another case where iVirtualization might be preferred is for a small linux test partition hosted from an existing IBM i partition with internal disk and no VIOS infrastructure.

I would not use it with external storage in any case as it would lose all of the benefit of multipathing.

Now here are the primary reasons I would recommend VIOS over iVirtualization:

-          License costs.  Hosting on IBM i means paying for an IBM i license for work that could be free.

-          Performance.  The numbers I have seen have consistently shown the client partitions do not perform as well as an equivalent VIOS configuration.  This is especially problematic with an IBM i client as performance is related to number of disks, which results in more objects and more overhead.

-          Completely manual configuration.  The HMC GUI configuration that is available with VIOS does not work with iVirtualization, so it needs to be configured completely with commands.

-          No Redundancy.  When the host is down, the clients are down. To be fair, you could use multiple host partitions and mirror disks in the client, but you can do that with VIOS also.

-          No LPM.  Live Partition Mobility is not supported for clients of iVirtualization.

-          No development.  If you look at the table of changes in the IBM i Virtualization Summary referenced below, you will see that there has been only one change to iVirtualization since 2015, compared to constant development and improvements for VIOS.

 

What if you need help implementing VIOS with IBM i?

Whether you have a large environment or small, implementing new technologies can be challenging.  If you need help beyond the available documentation, the IBM i Technology Services team (formerly known as Lab Services) is available to help with implementation planning, execution, and knowledge transfer.  See https://www.ibm.com/it-infrastructure/services/lab-services for contact information or speak to your IBM Sales Representative or Business Partner.  If you are planning a new hardware purchase, you can include implementation services by the Technology Services team in your purchase.

Disclaimer

I am an employee of IBM on the IBM i Technology Services team (formerly known as Lab Services).  The opinions in this post are mine and don't necessarily represent IBM's positions, strategies, or opinions.

 

References:

2022 IBM i Marketplace Survey Results - Fortra

https://www.fortra.com/resources/guides/ibm-i-marketplace-survey-results

 

IBM i Virtualization Summary

https://www.ibm.com/support/pages/node/1135420

 

Introduction to SR-IOV and vNIC for IBM i

 

This is the first in a series of articles on frequently overlooked Power systems features that highlight the value for IBM i customers, starting with sharing Ethernet adapters with SR-IOV, and the added benefits that can be achieved with vNIC technology on top of SR-IOV. 

Whether you have an existing system that is already capable of these features, or you are considering migrating to new hardware, you can only benefit from knowing what your options are.

What Is SR-IOV?

SR-IOV (Single Root Input/Output Virtualization) is a hardware specification that allows multiple operating systems to simultaneously use a single I/O adapter in a virtualized environment.  It is not unique to the Power Hypervisor (PHYP).  You can find SR-IOV being used heavily in x86 based virtualization, such as VMWare or Hyper-V – a fact that just serves to complicate searches for information related to IBM i.

More to the point for the IBM i administrator, it allows a single SR-IOV capable adapter to be shared by multiple LPARs.  You can split a single adapter with two ports, dedicating each port to a separate LPAR, or you can go more granular and share different percentages of the bandwidth of a single physical port between multiple partitions.  When sharing a single physical port, you specify the minimum percentage of outgoing bandwidth each partition gets, allowing each partition to use available bandwidth to burst higher when necessary.  It is also possible to limit the maximum outgoing bandwidth a given partition will use, although this is only possible via the HMC CLI, not the HMC GUI.
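For example, three partitions sharing a single 10Gb port with minimum capacities of 20%, 20%, and 10% are guaranteed roughly 2Gb, 2Gb, and 1Gb of outgoing bandwidth respectively, while any of them can burst well beyond its minimum whenever the port has spare capacity.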

What is vNIC?

vNIC is a Power virtualization technology built into PowerVM that combines VIOS (Virtual I/O Server) virtualization with SR-IOV adapters to get the performance and flexibility of SR-IOV plus the additional flexibility and redundancy of a fully virtualized solution.  I expect to expand on VIOS in much more detail in a future article.  For now, I'll just say that vNIC provides an automated active/passive failover ability and supports the use of Live Partition Mobility.  If you already use VIOS, you should strongly consider SR-IOV adapters with vNIC rather than Shared Ethernet Adapters (SEA) unless you need the active/active load sharing configuration that is only available with SEA.  If you don't use VIOS, watch for a future article on why you should.

Why SR-IOV?

-          Better use of limited resources.  10G ethernet adapters have become common in enterprise configurations.  Most of these adapters have multiple ports.  Without SR-IOV, each adapter is usually dedicated to a single partition, often leaving the extra ports unused while additional adapters are dedicated to other partitions, leaving even more ports unused.  How many of these ports are utilized to their full capacity?  Not as many as you might think (seriously, collect some performance stats and see for yourself).  More adapters used at a fraction of their capacity means more cabling and more network switch ports, all used at a fraction of their capacity.  That gets costly, for both the server and network budgets, especially when working with 10G ports.

-          More flexibility.  Once you have connected ports to network switches, you can add partitions that use those ports without any additional cabling or network configuration.  This is especially true if you configure those ports as trunks and use VLAN tagging at the IBM i TCP/IP configuration to access different networks and IP address ranges.

-          Better Performance than other shared configurations.  Compared to traditional server-based networking configurations (VIOS Shared Ethernet Adapters or IBM i NWS Virtual Ethernet), SR-IOV connections perform much better.  Virtual ethernet connections have processor overhead, and many tuning parameters that limit performance.  SR-IOV establishes a hypervisor managed path to the hardware that is second only to a dedicated adapter.  In the real world, SR-IOV will perform effectively the same as a dedicated adapter, and better than any server virtualized adapter.

Who should use SR-IOV?

-          Large enterprises should consider SR-IOV and vNIC technology to achieve high bandwidth connectivity to enterprise-scale 10G (and up) infrastructure.  Automatic failover (vNIC) to redundant connections ensures connectivity that leverages the highly redundant network infrastructures that exist in high-end enterprises.

-          Small businesses should consider SR-IOV and vNIC technology to get the maximum capacity out of the investment in network connectivity.  Fewer adapters, less cabling and a smaller number of network ports is easier on the budget, while still providing the ability to adapt to changing business needs.  SR-IOV adapters provide the ability to share adapters between partitions without any server based virtualization, resulting in a simple to maintain shared configuration when other virtualization functions are not required.

What else do I need to know?

-          For all of the following, see the SR-IOV FAQ for details.  It can be found at:  https://community.ibm.com/community/user/power/viewdocument/sr-iov-vnic-and-hnv-information

  • You must have an SR-IOV supported adapter, so make sure your IBM Sales Representative or Business Partner knows you want SR-IOV when ordering a new system.
  • SR-IOV adapters must be placed in specific slots.  On Power 9 and Power 10 hardware, this includes most of the slots in the system.
  • There are limits on the number of SR-IOV enabled adapters per system.  As of November 2022, the maximum number of SR-IOV shared adapters is the lower of 32 or the number of SR-IOV capable slots in the system.  This is not really limiting for most customers.
  • There are limits on how many shared (logical) ports can be assigned to a physical port, depending on the specific adapter (Ranging from 4 to 60)
  • There are limits on how many shared (logical) ports can be assigned per adapter (ranging from 48 to 120)
  • SR-IOV adapters in shared mode require Hypervisor memory (see FAQ)
  • Pay particular attention to limitations for 1G ports on supported adapters, especially 1G SFP+ in 10G+ adapters as these may not be supported for SR-IOV.
  • HMC is required for SR-IOV support.
  • VIOS is required for vNIC.  VIOS is NOT required for SR-IOV.
  • Sharing a Link Aggregation (e.g. LACP) of multiple ports is not allowed.  This is not as bad as it sounds as Link aggregation is effectively used as a redundancy measure in a VIOS SEA configuration rather than as a performance measure.  SEA simply does not have the capacity to use more than a single link’s bandwidth.  In practically all cases where Link Aggregation is used with VIOS, vNIC with failover is a better solution.  In the rare case that it is necessary, Link Aggregation can be managed at the IBM i O/S level with the CRTLINETH RSRCNAME(*AGG) command if the SR-IOV physical ports are 100% dedicated to a single partition.  See https://www.ibm.com/support/pages/configuring-ethernet-link-aggregation
  • Changing the minimum capacity of an SR-IOV logical port is disruptive, so plan accordingly.  Remember that the value is a minimum, and all logical ports can burst higher.  This means that barring any specific continuous outgoing bandwidth requirements, you are better off estimating low.
  • Bandwidth splitting on SR-IOV adapters is based on outgoing bandwidth only.  There is no way to split incoming bandwidth, so consideration should be given to anticipated incoming bandwidth when deciding on how many partitions can share a port.
  • SR-IOV cards are not owned by any partition, so typically adapter firmware updates are included in System firmware updates.  If necessary, there is a separate procedure to install adapter firmware updates separately that you may need to use. 

How to configure an SR-IOV port on IBM i

Rather than including a bunch of HMC screenshots that duplicate existing resources, I’ll direct you to the excellent reference material in the “Selected References” below, especially the Redpaper.  These references will show you how to put an SR-IOV adapter in shared or hypervisor mode and how to configure a logical port for a partition.  There is no difference between doing this for AIX and IBM i.  The specific web interface may change a bit with each HMC release, but the concepts remain the same.

Once the resource is created, the easiest way to determine the resource name is to select the partition from the HMC and get the CMNxx resource name from the “Hardware Virtualized I/O” page for SR-IOV, or the “vNIC” page for a vNIC.  It will also show up along with all of the other resources in WRKHDWRSC *CMN, or STRSST.  Once the resource name is located, configure it exactly as you would any other Ethernet resource by creating a Line description, IP address, etc.
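As a hypothetical example (the line description name, resource name, and addresses are placeholders, and your naming standards will differ), the IBM i side of a new SR-IOV or vNIC connection might look like this once the CMNxx resource shows up:

CRTLINETH LIND(ETHSRIOV) RSRCNAME(CMN05)
VRYCFG CFGOBJ(ETHSRIOV) CFGTYPE(*LIN) STATUS(*ON)
ADDTCPIFC INTNETADR('192.168.10.21') LIND(ETHSRIOV) SUBNETMASK('255.255.255.0')
STRTCPIFC INTNETADR('192.168.10.21')

In other words, it is exactly the same line description and TCP/IP interface configuration you would use for a dedicated adapter.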

You can dynamically add and remove SR-IOV and vNIC resources to/from a running partition.  Make sure that if you remove one, there are not any configurations using that resource.

What if you need help implementing SR-IOV or vNIC on an IBM i?

Whether you have a large environment or small, implementing new technologies can be challenging.  If you need help beyond the available documentation, the IBM i Technology Services team (formerly known as Lab Services) is available to help with implementation planning, execution, and knowledge transfer.  See https://www.ibm.com/it-infrastructure/services/lab-services for contact information or speak to your IBM Sales Representative or Business Partner.  If you are planning a new hardware purchase, you can include implementation services by the Technology Services team in your purchase.

Disclaimer

I am an employee of IBM on the IBM i Technology Services team (formerly known as Lab Services).  The opinions in this post are my own and don't necessarily represent IBM's positions, strategies, or opinions.

Selected References

I often find that researching topics related to Power Systems provides a wealth of information relating to AIX and VIOS, and substantially less that relates directly to IBM i.  Having spent a few years administering AIX systems, I am familiar with the many excellent AIX blogs that are available.  Many of these references are very AIX focused, but don’t let that dissuade you from reading them -- they are also excellent resources for IBM i administrators.

 

IBM Power Systems SR-IOV Technical Overview and Introduction Redpaper https://www.redbooks.ibm.com/abstracts/redp5065.html

 

IBM Support: Configuring Ethernet Link Aggregation

https://www.ibm.com/support/pages/configuring-ethernet-link-aggregation

 

IBM Community: SR-IOV FAQ

https://community.ibm.com/community/user/power/viewdocument/sr-iov-vnic-and-hnv-information

 

AIX for System Administrators – SR-IOV & vNIC summary pages

http://aix4admins.blogspot.com/2016/01/sr-iov-vnic.html

http://aix4admins.blogspot.com/2017/03/vnic_20.html

 

YouTube – Replay of the May 28th Power Systems Virtual User Group webinar covering Single Root I/O Virtualization (SR-IOV), presented by expert Chuck Graham

https://youtu.be/1ANyxQaSXOI


TL;DR 

SR-IOV lets you share ethernet adapter cards across multiple IBM i partitions without using VIOS.  vNIC adds the ability to include automatic active/passive failover if you also use VIOS.

 
