Monitoring vNIC on Power

 

As I discussed in my previous post Introduction to SR-IOV and vNIC for IBM i, vNIC is a powerful feature built on top of SR-IOV and VIOS that virtualizes Ethernet connections to your IBM i, AIX, or Linux partitions with high performance and automatic failover to alternative ports (backing devices).

One problem with creating a vNIC with multiple backing devices in a failover configuration is that you might not notice when it automatically switches to a backup device because a higher-priority device failed.  If you don’t notice, you could end up running with reduced redundancy because the problem that caused the switch never gets resolved.

The solution for that problem is monitoring.  To that end, I’m publishing a Python script I wrote to monitor your vNIC configuration for several conditions that could indicate a problem.

Best practices

When creating a vNIC configuration, there are a few practices that will help ensure you get the best possible redundancy to protect yourself from failures in various parts of your infrastructure.

  • Use multiple backing devices.  Each vNIC should have at least one backup backing device to fail over to in case the primary device encounters a problem.  It is common to see configurations with 3 or 4 total backing devices.  As of this writing, up to 16 are supported.
  • Spread backing devices across points of failure.  Separate them to different VIOS, different adapters, and different network switches.
  • Spread active devices across all VIOS and physical adapters to statically load balance the work.  Don’t put all your active devices on one VIOS and all your backup devices on the other or you’ll be looking at a big processor/memory spike when you take the primary VIOS down.  Likewise, it makes little sense to squeeze all your traffic through some of your adapters/ports and leave others idle.  Using all of your ports also makes it possible to detect port failures related to switching and routing that would otherwise go undetected until those ports are your only option when primary ports fail.
  • Assign each backing device a unique failover priority so the backup sequence is deterministic.  For vNIC, the lowest priority number is the highest priority, and it defaults to 50.  Typically, I would assign 50 to the desired active device, 60 to the first backup, 70 to the second backup, etc.  You can use any numbers you wish, but keep them unique for a given vNIC, and leave some space in the numbering to change it around if you need to.
  • Use the HMC GUI to activate backup ports if you need to move traffic proactively.  Select a specific LPAR from the HMC screen, then select Virtual NICs from the left-hand menu to display/edit the vNICs for a partition.  To switch to a different backing device, select the desired device and select Action-Make the backing device active.  You will notice that when you do this, the “Auto Priority Failover” setting will change to “Disabled”. That will prevent the vNIC from switching based on priority unless the active port fails.
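
The practices above lend themselves to a simple automated sanity check.  The following is a minimal sketch, not part of vnic-check.py: the BackingDevice record and the check_vnic function are hypothetical names invented for illustration, and it assumes you have already gathered each backing device's VIOS, adapter, and failover priority by some other means.

```python
from collections import Counter
from typing import List, NamedTuple


class BackingDevice(NamedTuple):
    """Hypothetical record describing one vNIC backing device."""
    vios: str       # name of the VIOS hosting this backing device
    adapter: str    # identifier of the physical SR-IOV adapter
    priority: int   # failover priority; lower number = higher priority


def check_vnic(name: str, devices: List[BackingDevice]) -> List[str]:
    """Return a list of human-readable warnings for one vNIC."""
    warnings = []
    if len(devices) < 2:
        warnings.append(f"{name}: only {len(devices)} backing device(s), nothing to fail over to")
    # Priorities must be unique so the failover order is deterministic.
    counts = Counter(d.priority for d in devices)
    dupes = sorted(p for p, n in counts.items() if n > 1)
    if dupes:
        warnings.append(f"{name}: duplicate failover priorities {dupes}")
    # All devices behind one VIOS or one adapter is a single point of failure.
    if len(devices) > 1 and len({d.vios for d in devices}) == 1:
        warnings.append(f"{name}: all backing devices are on VIOS {devices[0].vios}")
    if len(devices) > 1 and len({d.adapter for d in devices}) == 1:
        warnings.append(f"{name}: all backing devices use adapter {devices[0].adapter}")
    return warnings


# Example data (made up): one well-spread vNIC and one poorly configured vNIC.
good = [BackingDevice("vios1", "adapter1", 50), BackingDevice("vios2", "adapter2", 60)]
bad = [BackingDevice("vios1", "adapter1", 50), BackingDevice("vios1", "adapter1", 50)]

print(check_vnic("vnic-good", good))  # []
for warning in check_vnic("vnic-bad", bad):
    print(warning)
```

A check like this only covers the static layout.  It says nothing about which device is currently active, which is what the monitoring script described below is for.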

Installing the Monitor code

The vnic-check.py script is written in Python.  It will run on any platform that has Python 3.6 or above, including IBM i.  As this is an IBM i centric blog, I include the instructions to install, configure, and run it on IBM i using the PASE environment, but it will run pretty much anywhere.

Prerequisites

You must have the PASE environment (5770-SS1 Option 33) and OpenSSH (5733-SC1 Base and Option 1) installed.

 

Installing Open Source packages and python3

See https://www.ibm.com/support/pages/getting-started-open-source-package-management-ibm-i-acs for details of this process.

Start the SSH server on the IBM i partition with the STRTCPSVR *SSHD command if it is not already started.

Using IBM Access Client Solutions (ACS), select a system for the LPAR where you wish to install the monitoring tool.  From the menu, select Tools->Open Source Package Management.

On the “Connect to SSH” window that is displayed, enter a user ID and password with sufficient authority to install the open source packages (see link above for details).

If your IBM i partition does not have the ability to connect to the Internet, use the radio button under “Proxy Mode” to select “SSH Tunneling”.  This will allow the packages to be downloaded via your workstation.

If you get a message box that the Open Source Environment is not installed, click “Yes” to install it.

When the open-source environment install is complete, it will display a window with the list of installed packages.  If python3 is in that list, you are done.  If not, switch to the “available packages” tab, click “python3” and click Install.  If no available packages display, you may need to close the open source management window and reopen it.

Verify python3 is installed by opening a QShell session (QSH command) and running “/QOpensys/pkgs/bin/python3 -V”.  It should show a Python version number of 3.6 or higher.  Press F3 to exit back to your command line.
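
If you prefer to have a script enforce the version requirement rather than checking by eye, a few lines of Python will do it.  This is a minimal sketch using only the standard library; the version_ok helper is a hypothetical name for illustration and is not part of vnic-check.py.

```python
import sys

MIN_VERSION = (3, 6)  # minimum Python version stated for the script


def version_ok(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True when the given version is at least the minimum major.minor."""
    return tuple(version_info[:2]) >= minimum


if not version_ok():
    print("Python %d.%d or higher is required" % MIN_VERSION)
else:
    print("Python version is new enough:", sys.version.split()[0])
```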

 

Create an HMC Monitoring account

This script runs query commands on the HMC using a private key with no passphrase.  Since it is a very bad security idea to have that kind of access to your HMC hscroot account, you’ll want to create an account that can only run monitoring commands.  Seriously, DO NOT create private keys as described here using hscroot or any other HMC account that can make changes.  If you’re not going to use a restricted account, don’t use this script.

Connect to the HMC using an SSH client like PuTTY as user hscroot (or another user with the authority to create user accounts).

Run the command:

mkhmcusr -i "name=monitor,taskrole=hmcviewer,description=For restricted monitoring scripts,pwage=99999,resourcerole=ALL:,authentication_type=local,remote_webui_access=0,remote_ssh_access=1,min_pwage=0,session_timeout=0,verify_timeout=15,idle_timeout=120,inactivity_expiration=0"

It will prompt for a password.  Assign it something secure.  This will create an account named “monitor” that can only be used by the SSH interface.  It will not be able to use the Web GUI, and it will be restricted in the commands it can run.

Repeat this account creation on each HMC that will be monitored.

 

Create a key to access the monitor account

You will be running the monitoring script from one of your IBM i partitions, with a specific user ID that will have an SSH key to access the HMC using the monitor account you just created.

Pick the user ID that will be running the command.  I’m not going to go into detail on creating this account since if you’re an IBM i administrator, you already know how to create accounts and create job schedule entries that use a specific account.  Of course, you can use an existing account for this as well.

The account you choose will need to have a home directory where you can create an ssh private key that you will authorize to connect to the HMC monitor account.

Start QShell (QSH) from the account you will use and run the following:

# on all of the following commands, replace 1.2.3.4 with the IP address of the HMC you want to monitor.  Repeat for each HMC if you are monitoring more than one.

mkdir -p $HOME # make sure there is a home directory

cd $HOME # change to the home directory

ssh-keygen

# press enter three times to accept the default file /home/MONITOR/.ssh/id_rsa and use an empty passphrase

ssh monitor@1.2.3.4 mkauthkeys -a \"`cat ~/.ssh/id_rsa.pub`\"

# answer ‘yes’ to the authenticity prompt

# Enter the HMC Monitor account password when prompted for Password:

# finally test the SSH key access with:

ssh monitor@1.2.3.4 lssyscfg -r sys -F name

# you should get a list of the system names managed by that HMC without any password prompting

Use F3 to leave the QShell prompt.
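
The same key test can be run from Python, which is handy if you want to confirm access to several HMCs at once.  The sketch below is an illustration under stated assumptions, not code taken from vnic-check.py: the function names are hypothetical, and it relies on the standard OpenSSH BatchMode option so that a missing or rejected key fails immediately instead of prompting for a password.

```python
import subprocess
from typing import List


def build_ssh_command(target: str, remote_command: str) -> List[str]:
    """Build an ssh command line that never prompts for a password."""
    # BatchMode=yes makes ssh exit with an error if key authentication fails.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=15",
            target, remote_command]


def parse_system_names(output: str) -> List[str]:
    """Split command output into one managed system name per non-blank line."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def list_managed_systems(target: str) -> List[str]:
    """Run the lssyscfg test command shown above and return the system names."""
    result = subprocess.run(
        build_ssh_command(target, "lssyscfg -r sys -F name"),
        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        universal_newlines=True, timeout=60,
    )
    if result.returncode != 0:
        raise RuntimeError(f"ssh to {target} failed: {result.stderr.strip()}")
    return parse_system_names(result.stdout)


# Example (replace 1.2.3.4 with your HMC address before uncommenting):
# print(list_managed_systems("monitor@1.2.3.4"))
```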

 

Now you’ll need to download and edit the top of the vnic-check.py file to set your parameters. 

You can find the open source script vnic-check.py in the public repository at: https://github.com/IBM/blog-vios4i

Download directly with: https://github.com/IBM/blog-vios4i/raw/main/src/vnic-check.py

 

smtphost: Set to the name (or address) of an SMTP relay in your organization where you can send mail.  On IBM i, if the current partition is running a mail server locally, you can use 127.0.0.1 here.  Set this to None if you just want to get a printed result to the screen (or a spooled file in batch).  Using None is useful to ensure the command is working properly before setting up email.

sender:  If using email, this needs to be a valid email address that can send mail in your organization.

toaddrs:  This is a list of email addresses that should get messages when the check finds any conditions that need fixing.  You can use a comma separated list of addresses between the brackets where each address is enclosed in quotes.

hmcs: This should be a list of the SSH addresses of the monitor account on each HMC, in the format monitor@ipaddress.  You can also use a DNS name instead of the IP address if DNS is properly configured for your PASE environment (verify by using the host command in a PASE shell).  The entire list should be surrounded by [] characters, and each HMC address should be surrounded by single quote characters and separated by commas.  It is okay to have only one HMC in the list.  You will need to do the same key setup described above on each HMC if you use more than one.

minopercount: This should be the lowest number of backing devices that is acceptable in your environment.  Any vNIC with fewer than this number of operational devices will be reported as a problem.
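
To make the parameter descriptions concrete, here is a sketch of what the settings might look like once filled in.  The variable names are the ones described above; every value is a made-up placeholder, and the validate_settings helper is a hypothetical addition for illustration that does not exist in the published script.

```python
# Placeholder values only; substitute your own before running anything.
smtphost = None                                  # or "127.0.0.1" to relay mail locally
sender = "vnic-monitor@example.com"              # must be allowed to send mail
toaddrs = ["admin1@example.com", "admin2@example.com"]
hmcs = ['monitor@1.2.3.4']                       # one entry per HMC
minopercount = 2                                 # fewest acceptable operational devices


def validate_settings(smtphost, sender, toaddrs, hmcs, minopercount):
    """Return a list of configuration problems (empty when the settings look sane)."""
    problems = []
    if not hmcs:
        problems.append("hmcs must list at least one HMC")
    for hmc in hmcs:
        if "@" not in hmc:
            problems.append(f"hmc entry {hmc!r} is not in user@host form")
    if not isinstance(minopercount, int) or minopercount < 1:
        problems.append("minopercount must be a positive integer")
    # The mail settings only matter when an SMTP relay is configured.
    if smtphost is not None:
        if "@" not in sender:
            problems.append("sender must be an email address when smtphost is set")
        if not toaddrs:
            problems.append("toaddrs must not be empty when smtphost is set")
    return problems


print(validate_settings(smtphost, sender, toaddrs, hmcs, minopercount))  # []
```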

When you have set your parameters, transfer the script to the home directory of the user that will be running the command.

Finally, make sure it works by opening QShell (QSH command) and running the script:

/QOpensys/pkgs/bin/python3  vnic-check.py

If all goes well, you’ll get either no email or output (indicating that none of the vNICs found have problems) or a list of the problems found.  If you get no output and want to make sure it is finding your vNICs, change the minopercount variable to a high number (999) and rerun.  Every vNIC should then be reported as having fewer operational devices than the desired count.
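
The "silent when healthy, print or email when not" behavior can be sketched with the standard library alone.  This is my illustration of that pattern and not the code from vnic-check.py: the report and build_message functions are hypothetical names, and the parameters simply mirror the smtphost, sender, and toaddrs settings described earlier.

```python
import smtplib
from email.message import EmailMessage


def build_message(problems, sender, toaddrs):
    """Build a plain-text email listing one problem per line."""
    msg = EmailMessage()
    msg["Subject"] = f"vNIC check: {len(problems)} problem(s) found"
    msg["From"] = sender
    msg["To"] = ", ".join(toaddrs)
    msg.set_content("\n".join(problems))
    return msg


def report(problems, smtphost, sender, toaddrs):
    """Stay silent when there are no problems; otherwise print or send mail."""
    if not problems:
        return
    if smtphost is None:
        # No relay configured: print to the screen (or spooled file in batch).
        for problem in problems:
            print(problem)
        return
    with smtplib.SMTP(smtphost) as smtp:
        smtp.send_message(build_message(problems, sender, toaddrs))


# With smtphost set to None, the example problem below is simply printed.
report(["vnic1 on lpar1: only 1 operational backing device"], None,
       "vnic-monitor@example.com", ["admin1@example.com"])
```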

When you have verified all is well, reset the variables as needed and schedule a job to run:

QSH CMD('/QOpensys/pkgs/bin/python3  vnic-check.py')

as the selected user on the desired schedule.

  

Need help?

If you need help implementing best practices for your vNICs, the IBM i Technology Services team (formerly known as Lab Services) is available to help with implementation planning, execution, and knowledge transfer.  See https://www.ibm.com/it-infrastructure/services/lab-services for contact information or speak to your IBM Sales Representative or Business Partner.  If you are planning a new hardware purchase, you can include implementation services by the Technology Services team in your purchase.

Disclaimer

I am an employee of IBM on the IBM i Technology Services team (formerly known as Lab Services).  The opinions in this post are my own and don't necessarily represent IBM's positions, strategies, or opinions.

 

References

 

Getting started with Open Source Package Management in IBM i ACS

https://www.ibm.com/support/pages/getting-started-open-source-package-management-ibm-i-acs

 

IBM i ACS Open Source Package Management Auth Fail Error

https://www.ibm.com/support/pages/node/1167988

 

Please Stop Changing Partition Profiles

 


The Enhanced HMC interface is here to stay.  If you are still changing partition profiles on your Power HMC, you really need to start using the new functionality instead, or you risk getting out of sync and losing changes.  It is painful to create a bunch of new virtual Fibre Channel adapters, and then have them magically disappear with your next reboot.  It’s even worse when you reboot a VIOS and choose an out-of-date partition profile and suddenly some of your client disks go away.  Ask me how I know.

I normally try to write articles focused on IBM i, but in this case, there really isn’t any difference between IBM i, AIX, and Linux.  All partitions (especially VIOS) should follow the same rules.

 

First a bit of history

IBM made the Enhanced HMC interface available as an option with version 8.1.0.1.  If you were an administrator like me, you just looked at it once or twice, figured it didn’t make any sense compared to what you were used to, and just selected “Classic” from the menu when you logged in. 

Version V8R8.7.0 officially eliminated the Classic interface, but some enterprising users found and published a backdoor approach to access the Classic interface even at that level (see Bart’s Blog - Enable classic HMC GUI on release V9R1, referenced below).  That unofficial approach was then shut down for good in May of 2020.

Why?  Because IBM is focusing development on a single easy to use interface that leverages DLPAR operations for all the new features like vNIC (see my previous blog post if that’s new to you).

 

Making the Move

First and foremost, make sure that your partition profiles are in sync with the running profile.

There is an excellent blog post in the IBM Community that explains this in much more detail.

If you are using VIOS, DON’T FORGET THE VIOS!  There is far more risk of lost configuration on VIOS than any other partition, because when you are using the Enhanced GUI, you are often making dynamic changes to VIOS you may not even be aware of.

The gist of it is that you should be running with the “Save configuration changes to profile” setting at “Enabled”.  If it is not currently set to enabled, you need to get it set that way.

If the setting is currently “Disabled”, start by saving your current configuration to the default partition profile.  Select the partition view for the desired partition from the GUI, select Partition Actions->Profiles->Save Current Configuration and select the default profile name.  Most users only have one profile per partition.  If you are one of the few that has more than one, pick a name for the profile that you will use from now on.  The default name used for newly created partitions is “default_profile”, so that is a pretty good choice for a name.  Save the configuration with the desired name.  If you created a new name, go into “Manage Profiles” one last time and set your newly saved profile as the default.  Now is also a good time to delete all those profiles you will not be using any more.

Now you can change the “Save configuration changes to profile” setting to “Enabled”.
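
If you manage many partitions, it can help to audit this setting from the HMC command line instead of clicking through each partition.  The sketch below is an assumption-laden illustration: I am assuming the setting is exposed as the sync_curr_profile attribute in the output of "lssyscfg -r lpar -m <managed system> -F name,sync_curr_profile" and that a value of 1 means enabled, so verify both against your HMC level before relying on it.  The function name is hypothetical.

```python
def partitions_not_syncing(lssyscfg_output):
    """Return names of partitions whose sync_curr_profile value is not '1'.

    Expects one "name,value" pair per line, as assumed to be produced by
    lssyscfg -r lpar -m <managed system> -F name,sync_curr_profile
    """
    flagged = []
    for line in lssyscfg_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last comma so the value is always the final field.
        name, _, value = line.rpartition(",")
        if value != "1":
            flagged.append(name)
    return flagged


# Made-up sample output for three partitions.
sample = "vios1,1\nvios2,0\nibmi_prod,1\n"
print(partitions_not_syncing(sample))  # ['vios2']
```

Any partition this flags, especially a VIOS, is a candidate for the save-and-enable procedure described above.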

 

Doing it the Enhanced way

Once you have this setting enabled, just stay away from “Manage Profiles” and make all of your changes using the Enhanced GUI dynamic menu operations available from the left-hand menu of the partition view. 

When you need to activate a partition that you previously shut down, make sure you use the “Current Configuration” option rather than picking a partition profile.

The biggest difference from the old approach of changing partition profiles and restarting with a different profile is that the Enhanced GUI makes the changes dynamically on a running partition.  It will also make the corresponding changes on the VIOS, if necessary.  The days of keeping track of virtual port numbers can be gone, if you let them.

When you Google the procedure to do anything on the HMC, you will often find articles and screenshots that point you to modify the profile.  If at any point one of these articles suggests using the Manage Profiles option or tells you to select a specific profile when activating a partition, keep looking for a new procedure.  You can often get good basic information from these articles, but the specific procedures are likely to get you into trouble.

Enhanced HMC changes are typically dynamic on a running partition.  This requires communication between the HMC and the running partition, which you will typically see referred to as an RMC connection.  One difference for the IBM i world is that IBM i uses a LIC connection rather than the RMC connections that are used by AIX and Linux.  This all means that you won’t see an RMC active flag on an IBM i partition.  I mention this for two reasons.  First, much of the documentation you will run into will mention the need for an active RMC connection for various procedures.  That is not true for IBM i.  Second, the O/S on an IBM i does need to be operating to make some dynamic changes.  The error message you’ll get while attempting to make some changes on an activated IBM i partition will refer to RMC, but it really means it’s not booted to a DLPAR-capable state.

You may notice that there are things you cannot change using the Enhanced interface while the partition is active.  Some examples are max processor, max memory, max virtual adapters, and processor compatibility mode.  All these options require a shutdown and restart.  You will be permitted to make the changes while the partition is shut down.

Why is it so slow? (Spoiler - it's not)

You might not believe me here, but it isn’t slow.  It just feels that way because it is doing everything dynamically right now when you are used to delaying all that processing to partition activation.

Making changes to profiles is blazing fast because they are not actually changing any real resources, but you will pay the price during activation of that profile.  In contrast, when you make a change to a running partition with a dynamic HMC change, all that processing that happens in the hypervisor and O/S to add that resource will happen immediately -- while you wait.  That’s right, while you wait means, well... you will be waiting.

I’ve actually done some benchmarks on new system setups to compare dynamic operations with HMC commands (chhwres, equivalent to the Enhanced HMC GUI) to HMC profile change commands (chsyscfg commands) that get applied via the “chsyscfg -o apply” command.  The chhwres commands, on either a running or inactive partition, tend to be slow to operate, while the equivalent profile changes are very fast until they are applied, either via the apply command or by profile activation.  In the end, it comes down to when you are going to wait.  You can wait now, or you can wait later, but you are always going to wait for the actual resource creation in the hypervisor.
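
If you want to run a comparison like this yourself, a small timing wrapper is all you need.  The sketch below is not the benchmark I ran; it is a generic helper with a hypothetical name that times any command you hand it, so you would supply your own full chhwres or chsyscfg command lines (for example, wrapped in an ssh call to the HMC).

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run a command and return (elapsed_seconds, exit_code)."""
    start = time.perf_counter()
    completed = subprocess.run(argv, stdout=subprocess.DEVNULL,
                               stderr=subprocess.DEVNULL)
    return time.perf_counter() - start, completed.returncode


# Harmless demonstration: time a Python interpreter that does nothing.
elapsed, exit_code = time_command([sys.executable, "-c", "pass"])
print(f"exit code {exit_code}, {elapsed:.3f} seconds")
```

To make the comparison fair, remember to add the time of the later apply or activation step to the profile-change timing, since that is where the deferred work happens.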

To be completely honest, I’m a command line guy.  Sure, I’ll use the HMC GUI to create small test partitions and add a few virtual network or virtual Fibre Channel connections when I must.  I’m much more likely to create a command script to do it all for anything more than a couple resources.  I don’t have the patience to create hundreds of virtual Fibre Channel connections on a giant Power 1080 one by one in a GUI.  That said, most IBM i admins don’t create a lot of resources except during hardware refreshes and migrations, so using the GUI is the right way to learn.  It’s also safer.

I’ll post some more details of the command line way of creating and configuring partitions and partition resources in the future for those that are interested in that approach.

Need Help?

If you need help fixing a profile problem, or with a hardware refresh or migration and don’t want to go it alone, the IBM i Technology Services team (formerly known as Lab Services) is available to help with implementation planning, execution, and knowledge transfer.  See https://www.ibm.com/it-infrastructure/services/lab-services for contact information or speak to your IBM Sales Representative or Business Partner.  If you are planning a new hardware purchase, you can include implementation services by the Technology Services team in your purchase.

Disclaimer

I am an employee of IBM on the IBM i Technology Services team (formerly known as Lab Services).  The opinions in this post are my own and don't necessarily represent IBM's positions, strategies, or opinions.

 

 

 

References

 

Synchronize Current Configuration and configuration change on inactive partition in HMC Enhanced UI

https://community.ibm.com/community/user/power/blogs/hariganesh-muralidharan1/2020/06/08/sync-curr-config-and-inactive-lpar-config-change

 

Bart’s Blog - Enable classic HMC GUI on release V9R1

https://theibmi.org/2019/09/11/enable-classic-hmc-gui-on-release-v9r1/

 

IBM Support - Saving Configuration Changes To Profile

https://www.ibm.com/support/pages/saving-configuration-changes-profile

 

How to create Shared Ethernet Adapater without touching VIOS

https://theibmi.org/2016/03/26/how-to-create-sea-with-no-touch-vio/

 

HMC – Enhanced+ interface tricks

https://theibmi.org/2020/11/15/hmc-enhanced-interface-tricks/

 

Proactive vNIC Changes for VIOS Maintenance

In January 2023, I published an article and shared Python code that allows monitoring of your systems to verify your vNIC backing devices ar...