As I discussed in my previous post, “Introduction to SR-IOV and vNIC for IBM i,” vNIC is a powerful feature built on top of SR-IOV and VIOS that virtualizes Ethernet connections to your IBM i, AIX, or Linux partitions with high performance and automatic failover to alternative ports (backing devices).
One problem with creating a vNIC with multiple backing devices in a failover configuration is that you might not notice when it automatically switches to a backup device because a higher-priority device failed. If you don’t notice, you could end up running with reduced redundancy because the problem that caused the switch never gets resolved.
The solution for that problem is monitoring. To that end, I’m publishing a Python script I
wrote to monitor your vNIC configuration for several conditions that could
indicate a problem.
Best practices
When creating a vNIC configuration, there are a few practices that will help ensure you get the best possible redundancy to protect yourself from failures in various parts of your infrastructure.
- Use multiple backing devices. Each vNIC should have at least one additional backing device to fail over to in case the primary device encounters a problem. It is common to see configurations with 3 or 4 total backing devices. As of this writing, up to 16 are supported.
- Spread backing devices across points of failure. Separate them to different VIOS, different adapters, and different network switches.
- Spread active devices across all VIOS and physical adapters to statically load balance the work. Don’t put all your active devices on one VIOS and all your backup devices on the other, or you’ll be looking at a big processor/memory spike on the remaining VIOS when you take the primary VIOS down. Likewise, it makes little sense to squeeze all your traffic through some of your adapters/ports and leave others idle. Using all of your ports also makes it possible to detect port failures related to switching and routing that would otherwise go undetected until those ports are your only option when the primary ports fail.
- Assign each backing device a unique failover priority so the backup sequence is deterministic. For vNIC, the lowest priority number is the highest priority, and it defaults to 50. Typically, I would assign 50 to the desired active device, 60 to the first backup, 70 to the second backup, etc. You can use any numbers you wish, but keep them unique for a given vNIC, and leave some space in the numbering to change it around if you need to.
- Use the HMC GUI to activate backup ports if you need to move traffic proactively. Select a specific LPAR from the HMC screen, then select Virtual NICs from the left-hand menu to display/edit the vNICs for a partition. To switch to a different backing device, select the desired device and select Action-Make the backing device active. You will notice that when you do this, the “Auto Priority Failover” setting will change to “Disabled”. That will prevent the vNIC from switching based on priority unless the active port fails.
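The practices above can be expressed as a quick sanity check on a planned configuration. The following is a minimal sketch, not part of vnic-check.py: the function name check_vnic_plan and the dictionary keys 'vios', 'adapter', and 'priority' are assumptions made up for this illustration, and the device names are placeholders.

```python
from collections import Counter


def check_vnic_plan(backing_devices):
    """Return a list of warnings for one vNIC's planned backing devices.

    Each device is a dict with hypothetical keys 'vios', 'adapter', and
    'priority' (the lowest priority number is the highest priority).
    """
    warnings = []
    if len(backing_devices) < 2:
        warnings.append("only one backing device; nothing to fail over to")

    # Priorities must be unique so the failover sequence is deterministic.
    counts = Counter(d["priority"] for d in backing_devices)
    dupes = sorted(p for p, n in counts.items() if n > 1)
    if dupes:
        warnings.append("duplicate failover priorities: {}".format(dupes))

    # Backing devices should not share a single point of failure.
    if len(backing_devices) > 1:
        if len({d["vios"] for d in backing_devices}) < 2:
            warnings.append("all backing devices are on one VIOS")
        if len({d["adapter"] for d in backing_devices}) < 2:
            warnings.append("all backing devices share one adapter")
    return warnings


# Placeholder plan: three devices spread over two VIOS and three adapters.
plan = [
    {"vios": "vios1", "adapter": "adapter1", "priority": 50},
    {"vios": "vios2", "adapter": "adapter2", "priority": 60},
    {"vios": "vios1", "adapter": "adapter3", "priority": 70},
]
print(check_vnic_plan(plan))  # prints []
```

A plan that reuses priority 50 on two devices, or keeps every device on the same VIOS, would come back with one warning per violated practice.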
Installing the Monitor code
The vnic-check.py script is written in Python. It will run on any platform that has Python 3.6 or above, including IBM i.
As this is an IBM i centric blog, I include the instructions to install, configure, and run it on IBM i using the PASE environment, but it will run pretty much anywhere.
Prerequisites
You must have the PASE environment (5770-SS1 Option 33) and OpenSSH (5733-SC1 Base and Option 1) installed.
Installing Open Source packages and python3
See https://www.ibm.com/support/pages/getting-started-open-source-package-management-ibm-i-acs
for details of this process.
Start the SSH server on the IBM i partition with the STRTCPSVR *SSHD command if it is not already started.
Using IBM Access Client Solutions (ACS), select the system for the LPAR where you wish to install the monitoring tool. From the menu, select Tools->Open Source Package Management.
On the “Connect to SSH” window that is displayed, enter a user ID and password with sufficient authority to install the open source packages (see the link above for details).
If your IBM i partition does not have the ability to connect to the Internet, use the radio button under “Proxy Mode” to select “SSH Tunneling”. This will allow the packages to be downloaded via your workstation.
If you get a message box that the Open Source Environment is
not installed, click “Yes” to install it.
When the open-source environment install is complete, it
will display a window with the list of installed packages. If python3 is in that list, you are
done. If not, switch to the “available
packages” tab, click “python3” and click Install. If no available packages display, you may
need to close the open source management window and reopen it.
Verify python3 is installed by opening a QShell session (QSH command) and running “/QOpenSys/pkgs/bin/python3 -V”. It should show a Python version number of 3.6 or higher. Press F3 to exit back to your command line.
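If you would rather have Python confirm the version itself, a small check like the one below can be run with the same interpreter. This is only a sketch; the function name python_is_new_enough is made up for this example, and the 3.6 minimum comes from the requirement stated above.

```python
import sys

MIN_VERSION = (3, 6)  # minimum version stated for vnic-check.py


def python_is_new_enough(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if python_is_new_enough():
    print("Python version is sufficient")
else:
    print("Python 3.6 or higher is required")
```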
Create an HMC Monitoring account
This script runs query commands on the HMC using a private
key with no passphrase. Since it is a
very bad security idea to have that kind of access to your HMC hscroot account,
you’ll want to create an account that can only run monitoring commands. Seriously, DO NOT create private keys as
described here using hscroot or any other HMC account that can make
changes. If you’re not going to use a
restricted account, don’t use this script.
Connect to the HMC using an SSH client such as PuTTY as user hscroot (or another user with the authority to create user accounts).
Run the command:
mkhmcusr -i "name=monitor,taskrole=hmcviewer,description=For restricted monitoring scripts,pwage=99999,resourcerole=ALL:,authentication_type=local,remote_webui_access=0,remote_ssh_access=1,min_pwage=0,session_timeout=0,verify_timeout=15,idle_timeout=120,inactivity_expiration=0"
It will prompt for a password. Assign it something secure. This will create an account named “monitor”
that can only be used by the SSH interface.
It will not be able to use the Web GUI, and it will be restricted in the
commands it can run.
Repeat this account creation on each HMC that will be monitored.
Create a key to access the monitor account
You will be running the monitoring script from one of your IBM i partitions, with a specific user ID that will have an SSH key to access the HMC using the monitor account you just created.
Pick the user ID that will be running the command. I’m not going to go into detail on creating this account since if you’re an IBM i administrator, you already know how to create accounts and create job schedule entries that use a specific account. Of course, you can use an existing account for this as well.
The account you choose will need to have a home directory
where you can create an ssh private key that you will authorize to connect to
the HMC monitor account.
Start QShell (QSH) from the account you will use and run the
following:
# In all of the following commands, replace 1.2.3.4 with the IP address of the HMC you want to monitor. Repeat for each HMC if you are monitoring more than one.
mkdir -p $HOME # make sure there is a home directory
cd $HOME # change to the home directory
ssh-keygen
# press Enter three times to accept the default file /home/MONITOR/.ssh/id_rsa and use an empty passphrase
ssh monitor@1.2.3.4 mkauthkeys -a \"`cat ~/.ssh/id_rsa.pub`\"
# answer ‘yes’ to the authenticity prompt
# enter the HMC monitor account password when prompted for Password:
# finally, test the SSH key access with:
ssh monitor@1.2.3.4 lssyscfg -r sys -F name
# you should get a list of the system names managed by that HMC without any password prompt
Use F3 to leave the QShell prompt.
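Because a scheduled job cannot answer a password prompt, it is worth confirming that key access works non-interactively. Here is a minimal sketch of that test from Python. The helper names build_ssh_command and key_access_works are invented for this example and are not taken from vnic-check.py. It assumes the ssh client is on the PATH of the user running it. BatchMode=yes is a standard OpenSSH option that makes ssh fail rather than prompt for a password.

```python
import subprocess


def build_ssh_command(hmc, remote_command):
    """Build an ssh argument list that fails instead of prompting."""
    return [
        "ssh",
        "-o", "BatchMode=yes",       # never prompt for a password
        "-o", "ConnectTimeout=10",   # give up quickly on an unreachable HMC
        hmc,
        remote_command,
    ]


def key_access_works(hmc):
    """Return (ok, output) for a harmless query run on the HMC."""
    cmd = build_ssh_command(hmc, "lssyscfg -r sys -F name")
    try:
        # stdout/stderr pipes and universal_newlines keep this usable on Python 3.6.
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False, ""
    return result.returncode == 0, result.stdout
```

Calling key_access_works("monitor@1.2.3.4") with your own HMC address should return True and the managed system names if the key was authorized correctly.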
Now you’ll need to download and edit the top of the
vnic-check.py file to set your parameters.
You can find the open source script vnic-check.py in the
public repository at: https://github.com/IBM/blog-vios4i
Download directly with: https://github.com/IBM/blog-vios4i/raw/main/src/vnic-check.py
- smtphost: Set to the name (or address) of an SMTP relay in your organization where you can send mail. On IBM i, if the current partition is running a mail server locally, you can use 127.0.0.1 here. Set this to None if you just want to get a printed result to the screen (or a spooled file in batch). Using None is useful to ensure the command is working properly before setting up email.
- sender: If using email, this needs to be a valid email address that can send mail in your organization.
- toaddrs: This is a list of email addresses that should get messages when the check finds any conditions that need fixing. You can use a comma-separated list of addresses between the brackets, where each address is enclosed in quotes.
- hmcs: This should be a list of the SSH addresses of the monitor account on your HMCs in the format monitor@ipaddress. You can also use a DNS name instead of the IP address if DNS is properly configured for your PASE environment (verify by using the host command in a PASE shell). The entire list should be surrounded by [] characters, and each HMC address should be surrounded by single quote characters and separated by commas. It is okay to have only one HMC in the list. You will need to do the same key setup described above on each HMC if you use more than one.
- minopercount: This should be the lowest number of backing devices that is acceptable in your environment. Any vNIC with fewer than this number of operational devices will be reported as a problem.
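To make the parameter descriptions concrete, here is a sketch of what the settings might look like and how smtphost could decide between printing and emailing. The variable names come from the descriptions above. The values are placeholders, and the functions build_message and report, along with the subject line, are assumptions for illustration only. The actual layout and logic in vnic-check.py may differ.

```python
import smtplib
from email.message import EmailMessage

# Placeholder values; replace with your own.
smtphost = None                      # or "127.0.0.1", or the name of an SMTP relay
sender = "vnic-monitor@example.com"
toaddrs = ["admin1@example.com", "admin2@example.com"]
hmcs = ['monitor@1.2.3.4']           # one entry per HMC
minopercount = 2                     # fewest acceptable operational backing devices


def build_message(problems, sender, toaddrs):
    """Build one email listing every problem found (hypothetical format)."""
    msg = EmailMessage()
    msg["Subject"] = "vNIC check found {} problem(s)".format(len(problems))
    msg["From"] = sender
    msg["To"] = ", ".join(toaddrs)
    msg.set_content("\n".join(problems))
    return msg


def report(problems, smtphost, sender, toaddrs):
    """Stay silent when healthy; print when smtphost is None; else send mail."""
    if not problems:
        return "nothing"
    if smtphost is None:
        print("\n".join(problems))
        return "printed"
    with smtplib.SMTP(smtphost) as smtp:
        smtp.send_message(build_message(problems, sender, toaddrs))
    return "emailed"
```

Leaving smtphost as None while testing keeps everything on the screen, which matches the suggestion above to confirm the check works before setting up email.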
When you have set your parameters, transfer the script to
the home directory of the user that will be running the command.
Finally, make sure it works by opening QShell (QSH command) and running the script:
/QOpenSys/pkgs/bin/python3 vnic-check.py
If all goes well, you’ll get either no email or output (indicating that none of the vNICs found have problems) or a list of the problems found. If you get no output and want to make sure it is finding your vNICs, change the minopercount variable to a high number (999) and rerun. Every vNIC should then be reported as having fewer operational devices than the desired count.
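To see why raising minopercount to 999 forces every vNIC into the report, consider this sketch of the kind of check involved. It is not the code from vnic-check.py. The function find_problems and the dictionary keys 'lpar', 'slot', 'devices', 'priority', 'active', and 'operational' are hypothetical, and the sample data is invented rather than parsed from real HMC output.

```python
def find_problems(vnics, minopercount):
    """Return one message per condition that needs attention.

    vnics is a list of dicts with hypothetical keys 'lpar', 'slot', and
    'devices'; each device has 'priority' (int), 'active' (bool), and
    'operational' (bool).
    """
    problems = []
    for vnic in vnics:
        label = "{} slot {}".format(vnic["lpar"], vnic["slot"])
        devices = vnic["devices"]

        # Condition 1: too few operational backing devices.
        oper = [d for d in devices if d["operational"]]
        if len(oper) < minopercount:
            problems.append(
                "{}: {} operational backing device(s), want at least {}".format(
                    label, len(oper), minopercount
                )
            )

        # Condition 2: the active device is not the highest-priority one
        # (the lowest priority number), which suggests a failover happened.
        active = [d for d in devices if d["active"]]
        if active:
            best = min(d["priority"] for d in devices)
            if active[0]["priority"] != best:
                problems.append(
                    "{}: active device has priority {}, expected {}".format(
                        label, active[0]["priority"], best
                    )
                )
    return problems


# Invented sample: one healthy vNIC with two operational backing devices.
sample = [
    {
        "lpar": "PROD",
        "slot": 5,
        "devices": [
            {"priority": 50, "active": True, "operational": True},
            {"priority": 60, "active": False, "operational": True},
        ],
    }
]
print(find_problems(sample, 2))    # healthy: empty list
print(find_problems(sample, 999))  # threshold too high: the vNIC is reported
```

With a realistic threshold the healthy vNIC produces nothing, while an impossible threshold reports it, which confirms that the vNIC was actually found.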
When you have verified all is well, reset the variables as
needed and schedule a job to run:
QSH CMD('/QOpenSys/pkgs/bin/python3 vnic-check.py')
as the selected user on the desired schedule.
Need help?
If you need help implementing best practices for your vNICs,
the IBM i Technology Services team (formerly known as Lab Services) is
available to help with implementation planning, execution, and knowledge
transfer. See https://www.ibm.com/it-infrastructure/services/lab-services
for contact information or speak to your IBM Sales Representative or Business
Partner. If you are planning a new
hardware purchase, you can include implementation services by the Technology
Services team in your purchase.
Disclaimer
I am an employee of IBM on the IBM i Technology Services
team (formerly known as Lab Services). The
opinions in this post are my own and don't necessarily represent IBM's
positions, strategies, or opinions.
References
Getting started with Open Source Package Management in IBM i ACS
https://www.ibm.com/support/pages/getting-started-open-source-package-management-ibm-i-acs
IBM i ACS Open Source Package Management Auth Fail Error
https://www.ibm.com/support/pages/node/1167988