Monitoring vNIC on Power

 

As I discussed in my previous post, “Introduction to SR-IOV and vNIC for IBM i”, vNIC is a powerful feature built on top of SR-IOV and VIOS that virtualizes Ethernet connections to your IBM i, AIX, or Linux partitions with high performance and automatic failover to alternative ports (backing devices).

One problem with creating a vNIC with multiple backing devices in a failover configuration is that you might not notice when it automatically switches to a backup device because a higher-priority device failed.  If you don’t notice, you could end up running with reduced redundancy because the problem that caused the switch never gets resolved.

The solution for that problem is monitoring.  To that end, I’m publishing a Python script I wrote to monitor your vNIC configuration for several conditions that could indicate a problem.

Best practices

When creating a vNIC configuration, there are a few practices that will help ensure you get the best possible redundancy to protect yourself from failures in various parts of your infrastructure.

  • Use multiple backing devices.  Each vNIC should have at least one backup backing device to fail over to in case the primary device encounters a problem.  It is common to see configurations with 3 or 4 total backing devices.  As of this writing, up to 16 are supported.
  • Spread backing devices across points of failure.  Separate them to different VIOS, different adapters, and different network switches.
  • Spread active devices across all VIOS and physical adapters to statically load balance the work.  Don’t put all your active devices on one VIOS and all your backup devices on the other, or you’ll be looking at a big processor/memory spike when you take the primary VIOS down.  Likewise, it makes little sense to squeeze all your traffic through some of your adapters/ports and leave others idle.  Using all of your ports also makes it possible to detect port failures related to switching and routing that would otherwise go undetected until those ports are your only option when the primary ports fail.
  • Assign each backing device a unique failover priority so the backup sequence is deterministic.  For vNIC, the lowest priority number is the highest priority, and it defaults to 50.  Typically, I would assign 50 to the desired active device, 60 to the first backup, 70 to the second backup, etc.  You can use any numbers you wish, but keep them unique for a given vNIC, and leave some space in the numbering to change it around if you need to.
  • Use the HMC GUI to activate backup ports if you need to move traffic proactively.  Select a specific LPAR from the HMC screen, then select Virtual NICs from the left-hand menu to display/edit the vNICs for a partition.  To switch to a different backing device, select the desired device and select Action-Make the backing device active.  You will notice that when you do this, the “Auto Priority Failover” setting will change to “Disabled”. That will prevent the vNIC from switching based on priority unless the active port fails.
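
The checklist above can be turned into a few mechanical checks.  Here is a minimal sketch of that idea; it is not the logic of vnic-check.py.  The BackingDevice record and its field names (vios, adapter, priority) are hypothetical and stand in for whatever inventory of backing devices you keep.

```python
from collections import Counter, namedtuple

# Hypothetical record for one backing device; the field names are
# illustrative and are not taken from HMC output or from vnic-check.py.
BackingDevice = namedtuple("BackingDevice", "vios adapter priority")


def review_vnic(name, devices):
    """Return a list of best-practice warnings for a single vNIC."""
    warnings = []
    # There should be at least one backup in addition to the active device.
    if len(devices) < 2:
        warnings.append(f"{name}: only {len(devices)} backing device(s)")
    # Backing devices should not all sit behind the same VIOS or adapter.
    if len(devices) > 1 and len({d.vios for d in devices}) < 2:
        warnings.append(f"{name}: all backing devices use the same VIOS")
    if len(devices) > 1 and len({d.adapter for d in devices}) < 2:
        warnings.append(f"{name}: all backing devices use the same adapter")
    # Failover priorities must be unique so the backup order is deterministic.
    counts = Counter(d.priority for d in devices)
    for priority in sorted(p for p, n in counts.items() if n > 1):
        warnings.append(f"{name}: priority {priority} is used more than once")
    return warnings


if __name__ == "__main__":
    good = [BackingDevice("vios1", "C1", 50), BackingDevice("vios2", "C2", 60)]
    bad = [BackingDevice("vios1", "C1", 50), BackingDevice("vios1", "C1", 50)]
    print(review_vnic("prod-eth0", good))
    for line in review_vnic("test-eth0", bad):
        print(line)
```

A check like this only looks at how the vNIC is defined.  Whether the backing devices are actually operational is a separate question, and that is what the monitoring script is for.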

Installing the Monitor code

The vnic-check.py script is written in Python.  It will run on any platform that has Python 3.6 or above, including IBM i.  As this is an IBM i centric blog, I include the instructions to install, configure, and run it on IBM i using the PASE environment, but it will run pretty much anywhere.

Prerequisites

You must have the PASE environment (5770-SS1 Option 33) and OpenSSH (5733-SC1 Base and Option 1) installed.

 

Installing Open Source packages and python3

See https://www.ibm.com/support/pages/getting-started-open-source-package-management-ibm-i-acs for details of this process.

Start the SSH server on the IBM i partition with the STRTCPSVR *SSHD command if it is not already started.

Using IBM Access Client Solutions (ACS), select a system for the LPAR where you wish to install the monitoring tool.  From the menu, select Tools->Open Source Package Management.

On the “Connect to SSH” window that is displayed, enter a user ID and password with sufficient authority to install the open source packages (see link above for details).

If your IBM i partition does not have the ability to connect to the Internet, use the radio button under “Proxy Mode” to select “SSH Tunneling”.  This will allow the packages to be downloaded via your workstation.

If you get a message box that the Open Source Environment is not installed, click “Yes” to install it.

When the open-source environment install is complete, it will display a window with the list of installed packages.  If python3 is in that list, you are done.  If not, switch to the “available packages” tab, click “python3” and click Install.  If no available packages display, you may need to close the open source management window and reopen it.

Verify python3 is installed by opening a QShell session (QSH command) and running “/QOpensys/pkgs/bin/python3 -V”.  It should show a Python version number of 3.6 or higher.  Press F3 to exit back to your command line.
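
If you would rather check the version from inside Python, for example at the top of your own wrapper script, something like the following sketch would do it.  The meets_minimum helper is a name made up for this example; it is not part of vnic-check.py.

```python
import sys

MINIMUM = (3, 6)  # lowest Python version the script is stated to support


def meets_minimum(version=None, minimum=MINIMUM):
    """Return True when the given (major, minor, ...) version is new enough."""
    if version is None:
        version = sys.version_info
    # Compare only major and minor so (3, 6, 0) and (3, 6) behave the same.
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    print("Python", ".".join(str(part) for part in sys.version_info[:3]))
    print("OK" if meets_minimum() else "Too old for vnic-check.py")
```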

 

Create an HMC Monitoring account

This script runs query commands on the HMC using a private key with no passphrase.  Since it is a very bad security idea to have that kind of access to your HMC hscroot account, you’ll want to create an account that can only run monitoring commands.  Seriously, DO NOT create private keys as described here using hscroot or any other HMC account that can make changes.  If you’re not going to use a restricted account, don’t use this script.

Connect to the HMC using an SSH client like PuTTY as user hscroot (or another user with the authority to create user accounts).

Run the command:

mkhmcusr -i "name=monitor,taskrole=hmcviewer,description=For restricted monitoring scripts,pwage=99999,resourcerole=ALL:,authentication_type=local,remote_webui_access=0,remote_ssh_access=1,min_pwage=0,session_timeout=0,verify_timeout=15,idle_timeout=120,inactivity_expiration=0"

It will prompt for a password.  Assign it something secure.  This will create an account named “monitor” that can only be used by the SSH interface.  It will not be able to use the Web GUI, and it will be restricted in the commands it can run.

Repeat this account creation on each HMC that will be monitored.

 

Create a key to access the monitor account

You will be running the monitoring script from one of your IBM i partitions, with a specific user ID that will have an SSH key to access the HMC using the monitor account you just created.

Pick the user ID that will be running the command.  I’m not going to go into detail on creating this account since if you’re an IBM i administrator, you already know how to create accounts and create job schedule entries that use a specific account.  Of course, you can use an existing account for this as well.

The account you choose will need to have a home directory where you can create an ssh private key that you will authorize to connect to the HMC monitor account.

Start QShell (QSH) from the account you will use and run the following:

# on all of the following commands, replace 1.2.3.4 with the IP address of the HMC you want to monitor.  Repeat for each HMC if you are monitoring more than one.

mkdir -p $HOME # make sure there is a home directory

cd $HOME # change to the home directory

ssh-keygen

# press enter three times to accept the default file /home/MONITOR/.ssh/id_rsa and use an empty passphrase

ssh monitor@1.2.3.4 mkauthkeys -a \"`cat ~/.ssh/id_rsa.pub`\"

# answer ‘yes’ to the authenticity prompt

# Enter the HMC Monitor account password when prompted for Password:

# finally test the SSH key access with:

ssh monitor@1.2.3.4 lssyscfg -r sys -F name

# you should get a list of the system names managed by that HMC without any password prompting

Use F3 to leave the QShell prompt.
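
The same key test can be run from Python, which is handy if you want to check several HMCs in one go.  This is a sketch under a few assumptions: the helper names are made up for this example, and it relies on the standard OpenSSH BatchMode option so that ssh fails instead of prompting when the key is not accepted.

```python
import subprocess


def build_ssh_command(target, remote_command):
    """Build an ssh command line that fails instead of prompting for a password."""
    return ["ssh", "-o", "BatchMode=yes", target] + remote_command.split()


def key_login_works(target, timeout=30):
    """Return True if 'lssyscfg -r sys -F name' runs over key-based login."""
    command = build_ssh_command(target, "lssyscfg -r sys -F name")
    try:
        result = subprocess.run(command, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # ssh is missing from the PATH, or the HMC did not answer in time.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # 1.2.3.4 is the placeholder HMC address used in the steps above.
    # This only prints the command; call key_login_works() to run it.
    print(" ".join(build_ssh_command("monitor@1.2.3.4",
                                     "lssyscfg -r sys -F name")))
```

Calling key_login_works('monitor@1.2.3.4') with your real HMC address should return True once the key has been authorized.  A return value of False means the login would have needed a password, ssh could not be started, or the HMC could not be reached.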

 

Now you’ll need to download and edit the top of the vnic-check.py file to set your parameters. 

You can find the open source script vnic-check.py in the public repository at: https://github.com/IBM/blog-vios4i

Download directly with: https://github.com/IBM/blog-vios4i/raw/main/src/vnic-check.py

 

smtphost: Set to the name (or address) of an SMTP relay in your organization where you can send mail.  On IBM i, if the current partition is running a mail server locally, you can use 127.0.0.1 here.  Set this to None if you just want to get a printed result to the screen (or a spooled file in batch).  Using None is useful to ensure the command is working properly before setting up email.

sender:  If using email, this needs to be a valid email address that can send mail in your organization.

toaddrs:  This is a list of email addresses that should get messages when the check finds any conditions that need fixing.  You can use a comma separated list of addresses between the brackets where each address is enclosed in quotes.

hmcs: this should be a list of the SSH addresses of the monitor account on your HMCs in the format monitor@ipaddress.  You can also use a DNS name instead of the IP address if DNS is properly configured for your PASE environment (verify by using the host command in a PASE shell).  The entire list should be surrounded by [] characters, and each HMC address should be surrounded by single quote characters and separated by commas.  It is okay to only have one HMC in the list.  You will need to do the same key setup described above on each HMC if you use more than one.

minopercount: this should be the lowest number of operational backing devices that is acceptable in your environment.  Any vNIC with fewer than this number of operational devices will be reported as a problem.
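
To make the formats concrete, here is a sketch of how the five settings might look once filled in.  The values are placeholders, and the exact layout at the top of vnic-check.py may differ.  The check_settings helper is hypothetical and not part of the script; it only illustrates the rules described above.

```python
# Placeholder values for illustration only; substitute your own.
smtphost = None                 # or, for example, "127.0.0.1" for a local mail server
sender = "vnic-monitor@example.com"
toaddrs = ["admin1@example.com", "admin2@example.com"]
hmcs = ['monitor@1.2.3.4']
minopercount = 2


def check_settings(smtphost, sender, toaddrs, hmcs, minopercount):
    """Return a list of problems found in the settings (hypothetical helper)."""
    problems = []
    if not isinstance(hmcs, list) or not hmcs:
        problems.append("hmcs must be a non-empty list")
    else:
        for hmc in hmcs:
            user, sep, host = str(hmc).partition("@")
            if not (user and sep and host):
                problems.append(f"hmcs entry {hmc!r} is not in user@host form")
    if not isinstance(minopercount, int) or minopercount < 1:
        problems.append("minopercount must be a positive integer")
    # The mail settings only matter when an SMTP host is configured.
    if smtphost is not None:
        if "@" not in str(sender):
            problems.append("sender must be an email address")
        if not isinstance(toaddrs, list) or not toaddrs:
            problems.append("toaddrs must be a non-empty list")
    return problems


if __name__ == "__main__":
    print(check_settings(smtphost, sender, toaddrs, hmcs, minopercount))
```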

When you have set your parameters, transfer the script to the home directory of the user that will be running the command.

Finally, make sure it works by opening QShell (QSH command) and running the script:

/QOpensys/pkgs/bin/python3  vnic-check.py

If all goes well, you’ll get either no email or output (indicating that none of the vNICs found have problems) or a list of the problems found.  If you get no output and want to make sure it is finding your vNICs, change the minopercount variable to a high number (999) and rerun.  Every vNIC should then be reported as having fewer operational devices than the desired count.
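
To see why a high minopercount flushes out every vNIC, and why a healthy system produces no output at all, consider this simplified sketch.  It is an illustration only and not the code in vnic-check.py.  The vnics dictionary is a hypothetical stand-in for the data the script collects from the HMC, and the function names and email subject line are made up for this example.

```python
import smtplib
from email.message import EmailMessage


def below_minimum(vnics, minopercount):
    """Return one message per vNIC with too few operational backing devices.

    vnics maps a vNIC label to a list of booleans, one per backing device,
    True meaning operational (a hypothetical, simplified representation).
    """
    problems = []
    for label, states in sorted(vnics.items()):
        operational = sum(1 for state in states if state)
        if operational < minopercount:
            problems.append(
                f"{label}: {operational} operational backing device(s), "
                f"minimum is {minopercount}")
    return problems


def deliver(problems, smtphost, sender, toaddrs):
    """Print the problems, or email them when an SMTP host is configured."""
    if not problems:
        return  # nothing to report, so stay silent
    body = "\n".join(problems)
    if smtphost is None:
        print(body)
        return
    message = EmailMessage()
    message["Subject"] = "vNIC check found problems"
    message["From"] = sender
    message["To"] = ", ".join(toaddrs)
    message.set_content(body)
    with smtplib.SMTP(smtphost) as server:
        server.send_message(message)


if __name__ == "__main__":
    sample = {"lpar1/eth0": [True, True], "lpar2/eth0": [True, False]}
    deliver(below_minimum(sample, 2), None, "", [])    # reports lpar2 only
    deliver(below_minimum(sample, 999), None, "", [])  # reports both vNICs
```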

When you have verified all is well, reset the variables as needed and schedule a job to run:

QSH CMD('/QOpensys/pkgs/bin/python3  vnic-check.py')

as the selected user on the desired schedule.

  

Need help?

If you need help implementing best practices for your vNICs, the IBM i Technology Services team (formerly known as Lab Services) is available to help with implementation planning, execution, and knowledge transfer.  See https://www.ibm.com/it-infrastructure/services/lab-services for contact information or speak to your IBM Sales Representative or Business Partner.  If you are planning a new hardware purchase, you can include implementation services by the Technology Services team in your purchase.

Disclaimer

I am an employee of IBM on the IBM i Technology Services team (formerly known as Lab Services).  The opinions in this post are my own and don't necessarily represent IBM's positions, strategies, or opinions.

 

References

 

Getting started with Open Source Package Management in IBM i ACS

https://www.ibm.com/support/pages/getting-started-open-source-package-management-ibm-i-acs

 

IBM i ACS Open Source Package Management Auth Fail Error

https://www.ibm.com/support/pages/node/1167988

 
