I modified my nmon2influxdb tool to import HMC Performance and Capacity Monitoring (PCM) data into InfluxDB.
HMC PCM data
Honestly, I had known for a long time that it was available, but I didn’t see a use case at first. After talking with nmon2influxdb users, I saw that most of them were not using the tool to analyze nmon files but to get a consolidated view of all their servers. Loading data from the HMC itself is easy to set up, and PCM data is pretty interesting. You also have far fewer measurements than with nmon, which makes it easier to centralize performance metrics from hundreds of partitions and servers.
Here is a gist showing an example entry for a partition.
I chose to use the Processed Metrics with the default sample interval of 30 seconds, fetching the last two hours of statistics.
So the import should be run every 120 minutes, or a little less to be safe.
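The arithmetic behind that schedule can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the 110-minute interval is an assumed value, not a recommendation from the tool.

```python
# Numbers taken from the post: 30-second samples, two-hour fetch window.
SAMPLE_SECONDS = 30
WINDOW_MINUTES = 120


def samples_per_window(window_minutes=WINDOW_MINUTES, sample_seconds=SAMPLE_SECONDS):
    """Number of samples per metric returned by one import."""
    return window_minutes * 60 // sample_seconds


def overlap_minutes(import_every_minutes, window_minutes=WINDOW_MINUTES):
    """Positive: consecutive imports overlap. Negative: data is lost in the gap."""
    return window_minutes - import_every_minutes


print(samples_per_window())   # samples per metric in one two-hour window
print(overlap_minutes(110))   # hypothetical schedule: import every 110 minutes
print(overlap_minutes(130))   # importing too rarely leaves a gap
```

A small overlap should be harmless: InfluxDB treats a point with the same measurement, tag set and timestamp as the same point, so re-imported samples overwrite the earlier copy instead of being duplicated.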
I updated the nmon2influxdb.org site with the latest information. This post is mostly to show use cases.
Have a look at the HMC partition and HMC system dashboards on demo.nmon2influxdb.org to see what the results can look like. The user and password are both demo.
After downloading the latest binary from GitHub, update the configuration file ~/.nmon2influxdb.cfg with the information needed to connect to your HMC:
hmc_server = "myhmc"
hmc_user = "hscroot"
hmc_password = "abc123"
After that, you just need to run the command:
nmon2influxdb hmc import
Getting list of managed systems
MANAGED SYSTEM: p750A
partition powerVC: 2940 points
MANAGED SYSTEM: p720-NIM_RETIRED
Error getting PCM data
MANAGED SYSTEM: POWER8-S824A
partition WM-SLES1: 17885 points
partition LV-PCM-Manager: 8330 points
partition PowerVC-LE: 7105 points
partition LVL-cluster2: 7134 points
partition lvl-cluster1: 7134 points
partition WM-SLES2: 17958 points
MANAGED SYSTEM: p750B
partition adxlpar2: 2952 points
partition adxlpar1: 4182 points
You can also specify options manually:
nmon2influxdb hmc import --hmc myhmc --hmcuser hscroot --hmcpass abc123
Note: I don’t like having clear-text passwords in my configuration file. I plan to fix this in issue #29, but I would like to have a better configuration management module. I know what I want, I just need time to code it. :)
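Until then, one possible workaround is a small wrapper that takes the credentials from environment variables and passes them through the command-line options shown above. This is a sketch only, not a feature of nmon2influxdb: the variable names HMC_SERVER, HMC_USER and HMC_PASSWORD and the function name are assumptions made for this example.

```python
import os

# Hypothetical environment variable names, chosen for this example only.
REQUIRED = ("HMC_SERVER", "HMC_USER", "HMC_PASSWORD")


def build_import_command(env=None):
    """Build the 'hmc import' command line from environment variables."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return [
        "nmon2influxdb", "hmc", "import",
        "--hmc", env["HMC_SERVER"],
        "--hmcuser", env["HMC_USER"],
        "--hmcpass", env["HMC_PASSWORD"],
    ]


# Demo with an explicit dictionary instead of the real environment.
demo_env = {"HMC_SERVER": "myhmc", "HMC_USER": "hscroot", "HMC_PASSWORD": "abc123"}
print(build_import_command(demo_env))
```

The resulting list can be handed to subprocess.run. Keep in mind that command-line arguments are usually visible to other users in the process list, so this only moves the password out of the file.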
You can use the pre-built dashboards available with the new release. I am still experimenting with the display of HMC metrics and didn’t want to hardcode the dashboards for now. To load them in your Grafana instance:
nmon2influxdb dashboard hmc_partition.json
nmon2influxdb dashboard hmc_server.json
What is really interesting here are the processing unit measurements: you can see capped, uncapped or donated processing units. Obviously, you can also see the maximum and entitled processing units.
Here is an example displaying the capped and uncapped processing units used by a partition:
Memory is less interesting. You can see the amount of physical and logical memory allocated, which will change if you are using Active Memory Sharing or Active Memory Expansion, but you will have no statistics about how this memory is used. You need operating system statistics for that.
If you look quickly, you will not see the difference from an nmon report. But here the data doesn’t show the vscsi device on the partition side but the vhost device on the vio servers.
Note: it’s only displaying the vio server by ID and not the vio’s partition name. I am thinking about fetching it from the system but it’s low priority for now. Let me know if you are interested. :)
You can also see the difference between virtual traffic (between partitions in the same system) and physical traffic (what is sent outside the system through the vios).
I am just showing here some of the data aggregation capabilities. You can also filter an entire system by vswitch id, for example.
But you can also choose to display the vio server id and the physical wwpn for multiple partitions on the same chart:
It’s almost the same as at partition level, but for the whole system:
It’s also possible to see cpu usage at system level by shared processor pools.
A memory allocation view at system level.
HMC doesn’t provide metrics for physical FC adapters at client partition level, but it does for the vios.
It’s the same for generic physical adapters.
This measurement will give all vhost statistics.
Here again, the name is not obvious, but it’s where you will find all Ethernet adapters, physical and virtual.
This view can be used to see the Shared Ethernet Adapters statistics.
Last but not least :) HMC gives great metrics to see the Shared Storage Pool usage.
An interesting benefit of using InfluxDB is that we can display metrics from different systems belonging to the same Shared Storage Pool. So we can see in one chart the I/O activity of all vios belonging to the same Shared Storage Pool.
Tagging is living :)
Without tagging the measurements, charts would not be so flexible. Tagging is what gives the powerful data analysis capabilities. With nmon files, tagging was pretty limited, but PCM data comes with a lot of information, allowing a lot more tagging and making data analysis a lot stronger.
One of InfluxDB’s great advantages is its SQL-like query language. It lets you group measurements by tags and apply filters.
It’s easier to show the available tags on a measurement:
SELECT * FROM PartitionVirtualEthernetAdapters LIMIT 1
name: PartitionVirtualEthernetAdapters
--------------------------------------
time                 SEA   ViosID  VlanID  VswitchID  name       partition  system  value
1479669301000000000  ent4  1       1130    0          sentBytes  adxlpar2   p750B   39
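So here we see that we have these tags: SEA, ViosID, VlanID, VswitchID, name, partition and system. The value column is not a tag but the field holding the measured number, and time is the timestamp of the point.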
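To make the tag/field split concrete, here is a small sketch that renders the row above in InfluxDB line protocol (measurement, then tags, then fields, then timestamp). The helper names are made up for this example, and storing value as a float is an assumption; this is not the code nmon2influxdb uses to write points.

```python
def escape_tag(text):
    """Backslash-escape the characters that are special in tag keys and values."""
    for ch in (",", "=", " "):
        text = text.replace(ch, "\\" + ch)
    return text


def to_line_protocol(measurement, tags, value, timestamp_ns):
    """Render one point; assumes the measurement name needs no escaping."""
    # Tags are sorted by key, which the InfluxDB documentation recommends.
    tag_part = ",".join(
        f"{escape_tag(key)}={escape_tag(val)}" for key, val in sorted(tags.items())
    )
    # Assumption for this sketch: the single field "value" is stored as a float.
    return f"{measurement},{tag_part} value={float(value)} {timestamp_ns}"


tags = {
    "SEA": "ent4",
    "ViosID": "1",
    "VlanID": "1130",
    "VswitchID": "0",
    "name": "sentBytes",
    "partition": "adxlpar2",
    "system": "p750B",
}
line = to_line_protocol("PartitionVirtualEthernetAdapters", tags, 39, 1479669301000000000)
print(line)
```

Everything before the first space is the measurement and its tags, which InfluxDB indexes; value is the only field and is what functions such as MEAN operate on.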
It’s possible to see all values for a specific tag:
SHOW TAG VALUES FROM "PartitionVirtualEthernetAdapters" WITH KEY = "partition"
name: PartitionVirtualEthernetAdapters
--------------------------------------
key        value
partition  test_n1
partition  test_n2
partition  test_n3
And filter based on this value with a WHERE clause:
SELECT * FROM PartitionVirtualEthernetAdapters WHERE "partition" = 'test_n1' LIMIT 1
name: PartitionVirtualEthernetAdapters
--------------------------------------
time                 SEA   ViosID  VlanID  VswitchID  name           partition  system    value
1479669301000000000  ent4  1       1130    0          ReceivedBytes  test_n1    testsys1  1502
GROUP BY is really powerful. It’s also possible to perform calculations on these metrics. Here I use the mean function:
SELECT MEAN(value) FROM PartitionVirtualEthernetAdapters GROUP BY "VlanID" LIMIT 1
name: PartitionVirtualEthernetAdapters
tags: VlanID=1
time  mean
----  ----
0     26.14571920001631
name: PartitionVirtualEthernetAdapters
tags: VlanID=10
time  mean
----  ----
0     0
name: PartitionVirtualEthernetAdapters
tags: VlanID=1130
time  mean
----  ----
0     754.0389195656023
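For readers new to InfluxDB, the query above does roughly what this Python sketch does in memory: bucket the points by the value of one tag, then average the field in each bucket. The sample points are invented for the illustration and are not real PCM data.

```python
from collections import defaultdict
from statistics import mean


def mean_by_tag(points, tag):
    """Average the 'value' field of the points, grouped by one tag."""
    groups = defaultdict(list)
    for point in points:
        groups[point["tags"][tag]].append(point["value"])
    return {tag_value: mean(values) for tag_value, values in groups.items()}


# Invented sample points, shaped like the rows returned by SELECT *.
points = [
    {"tags": {"VlanID": "1", "partition": "test_n1"}, "value": 10.0},
    {"tags": {"VlanID": "1", "partition": "test_n2"}, "value": 30.0},
    {"tags": {"VlanID": "10", "partition": "test_n1"}, "value": 0.0},
    {"tags": {"VlanID": "1130", "partition": "test_n3"}, "value": 700.0},
    {"tags": {"VlanID": "1130", "partition": "test_n3"}, "value": 800.0},
]

result = mean_by_tag(points, "VlanID")
print(result)
```

Switching the grouping to another tag, for example "partition", only changes the second argument. That is the flexibility tagging buys.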
It’s pretty nice to be able to query performance metrics like that, but it becomes really great when you combine it with the query editor provided by Grafana.
It makes complex queries fun. It displays all the available tags for you, so you can easily build your chart without knowing the InfluxDB SQL-like syntax.
Grafana adds another great feature: templating.
It will create a variable with values generated from an InfluxDB query:
SHOW TAG VALUES FROM "SystemProcessor" WITH KEY = "system"
name: SystemProcessor
---------------------
key     value
system  POWER8-S824A
system  Server-9117-MMC-SN105C627
system  p750-SSIS
system  p750A
system  p750B
system  p755-HPC
It’s also possible to have nested templating with a query like this:
SHOW TAG VALUES FROM "PartitionProcessor" WITH KEY = "partition" where system =~ /$ManagedSystem/
It’s really useful for HMC data. It allows displaying only the partitions belonging to a managed system:
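The effect of the nested query can be mimicked in a few lines of Python: once $ManagedSystem has been replaced by the selected value, the regular expression keeps only the partitions whose system tag matches. This is a simplified sketch. The function name is made up, the system and partition names are taken from the import output earlier in the post, and Grafana's real variable interpolation (multi-value selections, escaping) is more involved.

```python
import re

# (system, partition) pairs taken from the import output shown earlier.
series = [
    {"system": "p750A", "partition": "powerVC"},
    {"system": "POWER8-S824A", "partition": "WM-SLES1"},
    {"system": "POWER8-S824A", "partition": "WM-SLES2"},
    {"system": "p750B", "partition": "adxlpar1"},
    {"system": "p750B", "partition": "adxlpar2"},
]


def partitions_for(series, system_pattern):
    """Partitions whose system tag matches the pattern, like system =~ /pattern/."""
    regex = re.compile(system_pattern)
    # search() is unanchored, like the =~ operator in the query.
    return sorted({s["partition"] for s in series if regex.search(s["system"])})


print(partitions_for(series, "p750B"))
print(partitions_for(series, "p750"))
```

Because the match is unanchored, a value such as p750 selects both p750A and p750B. Anchoring the pattern with ^ and $ restricts it to an exact system name.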
HMC developers gave us a great way to measure system performance.
Their API is maybe a little bit complex ;) but it’s very powerful.
I had a lot of fun developing this feature and I hope you will find it useful. Feedback is welcome. :)