HPE Performance Cluster Manager 1.11.0 Release Notes
==============================================================================

Copyright (c) 2018-2024 Hewlett Packard Enterprise Development LP.
All rights reserved.

Notices
------------------------------------------------------------------------------
The information contained herein is subject to change without notice. The only
warranties for Hewlett Packard Enterprise products and services are set forth
in the express warranty statements accompanying such products and services.
Nothing herein should be construed as constituting an additional warranty.
Hewlett Packard Enterprise shall not be liable for technical or editorial
errors or omissions contained herein.

Confidential computer software. Valid license from Hewlett Packard Enterprise
required for possession, use, or copying. Consistent with FAR 12.211 and
12.212, Commercial Computer Software, Computer Software Documentation, and
Technical Data for Commercial Items are licensed to the U.S. Government under
vendor's standard commercial license.

Links to third-party websites take you outside the Hewlett Packard Enterprise
website. Hewlett Packard Enterprise has no control over and is not responsible
for information outside the Hewlett Packard Enterprise website.

Acknowledgments
------------------------------------------------------------------------------
Microsoft(R) and Windows(R) are either registered trademarks or trademarks of
Microsoft Corporation in the United States and/or other countries.

Java(R) and Oracle(R) are registered trademarks of Oracle and/or its
affiliates.

Linux(R) is the registered trademark of Linus Torvalds in the U.S. and other
countries.

Red Hat(R) and RPM(R) are trademarks of Red Hat, Inc. in the United States and
other countries.

ARM(R) is a registered trademark of ARM Limited in the United States and other
countries.

SUSE(R) is a registered trademark of SUSE LLC in the United States and other
countries.

Ubuntu(R) is a registered trademark of Canonical Ltd.

Altair(R) and Altair PBS Professional(R) are registered trademarks of Altair
Engineering, Inc.
****************************************************************************** Contents ****************************************************************************** 1.0 Overview 2.0 Getting Started 2.1 Distribution Media and Software Documentation 2.2 Operating System Support 2.3 Hardware Requirements 2.4 Software Licensing Information 2.5 Electronic Software Delivery 2.6 Warranty 2.7 HPE Software Support 3.0 New Features and Improvements 3.1 New Operating System Support 3.2 Cluster Health Check Additions and Enhancements 3.3 Cluster Discovery, Networking and Configuration Improvements 3.4 General Additions and Enhancements 3.5 Monitoring Improvements 3.6 Scalable Unit (SU) Leader Nodes Improvements 3.7 Quorum High Availability Setup Improvements 3.8 Power Consumption Management Improvements 3.9 Documentation Updates 3.10 Deprecated Features 3.11 Future Deprecations 4.0 Known Issues and Workarounds 4.1 Upgrade 4.2 Installation 4.3 Networking 4.4 High Availability 4.5 SU-Leader 4.6 System Config and Discovery 4.7 Monitoring 4.8 Command Line Interface (CLI) 4.9 Graphical User Interface (GUI) 4.10 Diags and Firmware 4.11 Miscellaneous 4.12 Ubuntu 5.0 Feedback 6.0 Appendix 6.1 Notes on Using Unsupported or Unmanaged Network Switches with HPCM 6.2 Supported Power Distribution Units (PDUs) 6.3 HPE Power and Cooling Infrastructure Monitor (PCIM) Supported Devices 6.4 HPCM Update Repository Guide 6.5 List of CASTs Addressed in HPCM 1.11.0 6.6 List of Issues Addressed in HPCM 1.11.0 ****************************************************************************** 1.0 Overview ****************************************************************************** HPE Performance Cluster Manager delivers an integrated system management solution for Linux(R)-based high performance computing (HPC) clusters. HPE Performance Cluster Manager provides complete provisioning, management, and monitoring for clusters scaling to 100,000 nodes. The software enables fast system setup from bare-metal, comprehensive hardware monitoring and management, image management, software updates and power management. HPE Performance Cluster Manager reduces the time and resources spent administering HPC systems - lowering total cost of ownership, increasing productivity and providing a better return on hardware investments. Initial system setup involves installation of software including the Linux operating system on the administrative node, discovery of hardware components for the cluster nodes and operating system provisioning for all the compute and service nodes in the cluster. HPE Performance Cluster Manager can quickly provision a cluster with thousands of nodes from bare metal - typically within an hour. In addition, new cluster nodes being added to the existing cluster are automatically discovered and configured without requiring system shutdown. Hardware management is comprehensive and secure. HPE Performance Cluster Manager collects telemetry from the cluster nodes and stores them in a secure repository. System administrator tasks on the administrative nodes are kept secure from end-user access. When issues are detected, alerts are sent to the attention of the system administrator via the console (GUI, CLI) and by email. The system administrator can setup automatic reactions to specific alerts such as power capping when a specific temperature is reached in the data center. Additional analyses of the hardware metrics can be done by visualizing the metrics at a specific point in time or over a historical period. 

The installed software, including the BIOS on the cluster nodes, can be
compared and flagged for any inconsistencies with versions or missing items.
Integrated firmware flashing supports flashing of BIOS, BMC/iLO, CMC, network
adapters and switches.

The HPE Performance Cluster Manager image management system supports a secure
software image repository that stores software in multiple formats including
RPM, ISO, remote repository and gold image. Software stored in the image
repository can be multiple versions of the Linux operating system or other
software such as middleware and other applications. Each software image has
version control accountability built-in to track changes such as the software
image version, who made the change and the date of the last change. Any
software image in the repository can be installed on-demand on a cluster node
or set of cluster nodes and restored to the original software environment as
required.

For power management, HPE Performance Cluster Manager offers tools for
accurate measurement and prediction of power usage for better capacity
planning. The step-by-step, topology- and protocol-aware Power On/Off feature
enables controlled start and shutdown of the clustered system. For example,
the power-on order is rack, chassis, cluster node, and the power-off order is
cluster node, chassis, rack. Power telemetry is collected in watts for rack
AC, bulk DC, cluster nodes and liquid cooling infrastructure. The metrics can
be saved for analysis and historical reference. In addition, for the HPE SGI
8600 system, HPE Performance Cluster Manager supports advanced power
management features for power capping and power resource management for jobs
with the Altair PBS Professional Power Awareness feature. HPE Apollo systems
require Apollo Platform Manager (purchased separately) for power capping and
rack management.

HPE Performance Cluster Manager provides a comprehensive cluster management
environment providing resiliency, security, operational efficiency and scale
for HPE Apollo, HPE Cray, HPE Cray EX, HPE Cray XD, SGI and HPE ProLiant high
performance computing clusters.

******************************************************************************
2.0 Getting Started
******************************************************************************

2.1 Distribution Media and Software Documentation
------------------------------------------------------------------------------
The HPE Performance Cluster Manager software and documentation are available
as an electronic software download and on physical media. Order the HPE
Performance Cluster Manager Media SKU (Q9V62A) to obtain the physical DVD
media.

To download the software, visit the "My HPE Software Center" site at:

  http://www.hpe.com/downloads/software

If you have not associated your Support Agreement ID (SAID) with your HPE
Account, you may need to enter your SAID in order to search for the HPE
Performance Cluster Manager software. Customers may download the software and
corresponding documentation from the specified URL provided at time of
delivery.

Individual product media files or ISO files are described in the "ISO File
Descriptions" section on the "Release Notes" tab on the HPCM 1.11 product
release page on the HPE Support Center.
Additional documentation can be downloaded from www.hpe.com/software/hpcm Patches for the HPE Performance Cluster Manager product are published to HPE's Software Delivery Repository at the following location: https://update1.linux.hpe.com/repo/hpcm/ To subscribe to patches, follow the instructions on the project page: https://downloads.linux.hpe.com/SDR/project/hpcm/ In order to see updates on the Software Delivery Repository, you will need to have an HPE Account linked with any applicable HPE service agreements. See the section "6.4 HPCM Update Repository Guide" in the Appendix for more details. 2.2 Operating System Support ------------------------------------------------------------------------------ HPE Performance Cluster Manager supports the SUSE Linux Enterprise Server (SLES), Red Hat Enterprise Linux (RHEL), HPE Cray Operating System (COS), Rocky Linux, and Tri-Lab Operating System Stack (TOSS), and CentOS Linux releases noted below. HPCM can manage clusters in which all nodes run the same operating system release, a multi-distro cluster in which compute nodes run a different operating system release than the system management nodes (i.e., admin and leader), or a multi-distro cluster in which compute nodes run a variety of different operating system releases. Review the following details to see which specific operating system releases are tested and supported on the various node types and architectures: - x86_64 o admin and leader: RHEL/Rocky8.9, SLES15SP5 o compute/service : RHEL/Rocky8.8, RHEL/Rocky8.9, RHEL/Rocky9.2, RHEL/Rocky9.3 SLES15SP4, SLES15SP5, COS 2.4, COS 2.5, COS 23.11 TOSS 4.6, TOSS 4.7 Ubuntu 22.04.3 - aarch64 o compute/service : RHEL8.9, RHEL9.3 SLES15SP5, COS 23.11 TOSS 4.7 Additional Notes: [1] Aarch64 support was re-introduced in the HPCM 1.10 release. Refer to system documentation for supported operating systems. [2] HPE performed the majority of testing and validation with Mellanox OFED versions 23.10-1.1.9.0 (on latest chips) and 4.9-4.0.8.0 (on legacy chips), OPA 10.11.0.1.2 on supported distros. [3] HPE tested and validated HPCM 1.11 with HPE Cray Programming Environment 24.03 and Slingshot 2.2. [4] CentOS 8.x is no longer supported as the community has moved to the CentOS stream model. HPE suggests using Rocky Linux 8.x as an alternative to the previous CentOS 8.x releases. [5] HPCM 1.8 was the last release to support SLES12 and RHEL/CentOS7 releases on compute nodes. 2.3 Hardware Requirements ------------------------------------------------------------------------------ HPE Performance Cluster Manager software is supported on the following Gen9, Gen10, Gen10+ and Gen11 platforms: - SGI 8600 - HPE Apollo 2000, 4000, 6000, 6500 and 9000 systems - HPE Apollo 20 (including CLX-AP) and 40 systems - HPE ProLiant DL 325 / 345 / 360 / 380 / 385 / 580 servers - HPE Apollo 70 system - HPE Apollo 80 system - HPE Apollo 35 server - HPE Cray XD2000, XD6500 systems o XD220v, XD224, XD225v, XD295v o XD665, XD670 - HPE Cray EX Supercomputers o HPE Cray EX235a, EX235n, EX255a, EX420, EX425, EX4252 o HPE Cray EX2500 (chassis, compute blades, switch chassis, and CDU) o HPE Cray EX3000 (chassis, compute blades, switch chassis, and CDU) o HPE Cray EX4000 (chassis, compute blades, switch chassis, and CDU) - Superdome Flex Family 2.4 Software Licensing Information ------------------------------------------------------------------------------ For the Software to be valid on an HPE cluster, each server in the HPE cluster must have a valid HPE Performance Cluster Manager license. 
Subject to the terms and conditions of this Agreement and the payment of any applicable license fee, HPE grants a non-exclusive, non-transferable license to use (as defined below), in object code form, one copy of the Software on one device (server or node) at a time for internal business purposes, unless otherwise indicated above or in applicable Transaction Document(s). "Use" means to install, store, load, execute and display the Software in accordance with the Specifications. Use of the Software is subject to these license terms and to the other restrictions specified by Hewlett Packard Enterprise in any other tangible or electronic documentation delivered or otherwise made available with or at the time of purchase of the Software, including license terms, warranty statements, Specifications, and "readme" or other informational files included in the Software itself. Such restrictions are hereby incorporated in this Agreement by reference. Some Software may require license keys or contain other technical protection measures. HPE reserves the right to monitor compliance with Use restrictions remotely or otherwise. Hewlett Packard Enterprise may make a license management program available which records and reports license usage information, If so supplied, customer agrees to install and run such license management program beginning no later than one hundred and eighty (180) days from the date it is made available and continuing for the period that the Software is Used. Other terms of the HPE Software License are provided on the license agreement that is delivered with the HPE Performance Cluster Manager software. 2.5 Electronic Software Delivery ------------------------------------------------------------------------------ Electronic software is available. Hewlett Packard Enterprise recommends purchasing electronic products over physical products when available for faster delivery and the convenience of not having to manage confidential paper licenses. 2.6 Warranty ------------------------------------------------------------------------------ Hewlett Packard Enterprise will replace defective delivery media for a period of 90 days from the date of purchase. This warranty applies to all HPCM products found on the delivery media. 2.7 HPE Software Support ------------------------------------------------------------------------------ HPE Services leverages our breadth and depth of technical expertise and innovation to help accelerate digital transformation with Advisory, Professional, and Operational Services. There is a full range of services to complement HPE Performance Cluster Manager software from advisory and design, benchmarking and tuning services, factory pre-installation, configuration, and acceptance as well as training and operational services. Advisory Services includes design, strategy, road map, and other services to help enable the digital transformation journey, tuned to IT and business needs. Advisory Services helps customers on their journey to Hybrid IT, Big Data, and the Intelligent Edge. Professional Services helps integrate the new solution with project management, installation and startup, relocation services, and more. In addition, Factory Express installs the software in the factory when building the system. HPE Education Services helps train staff using and managing the software and other technology. We help mitigate risk to the business, so there is no interruption when new technology is being integrated into the existing IT environment. 
Operational Services: - HPE Flexible Capacity is a new consumption model to manage on-demand capacity, combining the agility and economics of public cloud with the security and performance of on-premises IT. - HPE Datacenter Care offers a tailored operational support solution built on core deliverables. It includes hardware and software support, a team of experts to help personalize deliverables and share best practices, as well as optional building blocks to address specific IT and business needs. HPE Datacenter Care for Hyperscale gives customers access to the Hyperscale Center of Excellence with technical experts who understand how to manage IT at scale including the software. - HPE Proactive Care is an integrated set of hardware and software support including an enhanced call experience with start to finish case management helping resolve incidents quickly and keeping IT reliable and stable. - HPE Foundation Care helps when there is a hardware or software problem offering several response levels dependent on IT and business requirements. HPE Software Support offers a number of additional software support services, many of which are provided to our customers at no additional charge. HPE Performance Cluster Manager Software Technical Support and Update Service ----------------------------------------------------------------------------- Software products include three years of 24 x 7 HPE Software Technical Support and Update Service. This service provides access to Hewlett Packard Enterprise technical resources for assistance in resolving software implementation or operations problems. The service also provides access to software updates and reference manuals in electronic form. - To download product update releases: My HPE Software Center: www.hpe.com/downloads/software - To learn more about accessing support materials: HPE Support Center: www.hpe.com/support/AccessToSupportMaterials - To subscribe to eNewsletters and alerts: HPE Email Preference Center: www.hpe.com/support/e-updates IMPORTANT: Access to some online resources requires product entitlement. You must have an HPE Account setup with relevant support agreement IDs and product entitlements. Your HPE Account is the new identity and access management infrastructure service for HPE's customers and partners; it replaces the HPE Passport. Registration for Software and Technical Support and Update Services ------------------------------------------------------------------------------ If you received a license entitlement certificate, registration for this service will take place following online redemption of the license certificate/key. How to Use Your Software Technical Support and Update Service ------------------------------------------------------------------------------ Once registered, you will receive a service contract in the mail containing the Customer Service phone number and your Service Agreement Identifier (SAID). You will need your SAID when calling for technical support. Using your SAID, you can also go to the HPE Support Center web page to view your contract online. 

Sign Up for Product Alerts
------------------------------------------------------------------------------
To set up product alerts, follow the steps outlined below:

1) Log in with your HPE Account to the HPE Support Center (support.hpe.com)
2) Hover the mouse over the menu icon (3 horizontal lines) next to
   Support Center
3) Hover the mouse over Products and select "Sign up for Product Alerts"
4) On the page titled "Get connected with updates from HPE", enter the
   required information and search for "HPE Performance Cluster Manager" in
   Step 1 of the "Products" section
5) Select both "HPE Performance Cluster Manager" and "HPE Performance Cluster
   Manager Licenses" in Step 3 of the "Products" section, and click the
   "Add selected products" button
6) Click on the large "Subscribe" button

Contacting HPE regarding HPE Performance Cluster Manager
------------------------------------------------------------------------------
Hewlett Packard Enterprise addresses cluster manager questions at the
asset-solution level or at the serial-number level, as follows:

- Hewlett Packard Enterprise provides solution-level services to HPE products
  that are designated with a Base System Code. Examples include the following:

  o HPE Cray EX systems
  o HPE Cray XD 6500 systems
  o Other configurations that include HPE Slingshot networking

  Use the asset solution serial number to open cluster manager service
  requests. This is the single serial number typically used to open technical
  service cases for any software, hardware, interconnect (networking), or
  cooling question.

- Serial-number services originate at the individual server serial number
  level. Typically, this is the serial number of the cluster admin node.

Join the Conversation
------------------------------------------------------------------------------
The HPE Community forum is a community-based, user-supported tool for Hewlett
Packard Enterprise customers to participate in discussions with the customer
community about Hewlett Packard Enterprise products (community.hpe.com).
Websites ------------------------------------------------------------------------------ +------------------------------------------------------------------------+ | Website | Link | |---------------------------+--------------------------------------------| | HPE Performance Cluster | www.hpe.com/software/hpcm | | Manager | | |---------------------------+--------------------------------------------| | My HPE Software Center | www.hpe.com/downloads/software | |---------------------------+--------------------------------------------| | Hewlett Packard | www.hpe.com/support/hpesc | | Enterprise Support Center | | |---------------------------+--------------------------------------------| | Contact Hewlett Packard | www.hpe.com/assistance | | Enterprise Worldwide | | |---------------------------+--------------------------------------------| | HPE Services | www.hpe.com/services | |---------------------------+--------------------------------------------| | Subscription | www.hpe.com/support/e-updates | | Service/Support Alerts | | |---------------------------+--------------------------------------------| | HPE Performance Cluster | downloads.linux.hpe.com/SDR/project/hpcm/ | | Manager SDR Information | | +------------------------------------------------------------------------+ ****************************************************************************** 3.0 Features and Improvements ****************************************************************************** The following sections highlight some of the features of the HPE Performance Cluster Manager product. Due to differences in platform hardware and firmware, some HPCM features are not available on every supported platform. Please note the exceptions noted in the following table: +-------------------------------------------------------------------------------------+ | Platform | Image Mgmt & | Monitoring | Mgmt Network | BIOS | BMC/iLO | | | Provisioning | | Switch Mgmt | Flashing | Flashing | |--------------------+--------------+------------+--------------+----------+----------| | SGI 8600 | Yes | Yes | Yes | Yes [1] | Yes | |--------------------+--------------+------------+--------------+----------+----------| | Proliant DL325/345 | Yes | Yes | Yes | Yes | Yes [2] | | 360/380/385/580 | | | | | | |--------------------+--------------+------------+--------------+----------+----------| | Apollo 2000 Nodes | Yes | Yes | Yes | Yes | Yes [2] | |--------------------+--------------+------------+--------------+----------+----------| | Apollo 4000 Nodes | Yes | Yes | Yes | Yes | Yes [2] | |--------------------+--------------+------------+--------------+----------+----------| | Apollo 6000 Nodes | Yes | Yes | Yes | Yes | Yes [2] | |--------------------+--------------+------------+--------------+----------+----------| | Apollo 6500 Nodes | Yes | Yes | Yes | Yes | Yes [2] | |--------------------+--------------+------------+--------------+----------+----------| | Apollo 9000 Nodes | Yes | Yes | Yes | Yes | Yes | |--------------------+--------------+------------+--------------+----------+----------| | Apollo 20 (kl20) | Yes | Yes | Yes | Yes | Yes | |--------------------+--------------+------------+--------------+----------+----------| | Apollo 20 (CLX-AP) | Yes | Yes | Yes | Yes | Yes | |--------------------+--------------+------------+--------------+----------+----------| | Apollo 20 (kl20) | Yes | Yes | Yes | Yes | Yes | |--------------------+--------------+------------+--------------+----------+----------| | Apollo 40 (sx40) | Yes | Yes | Yes | Yes | 
Yes      |
|--------------------+--------------+------------+--------------+----------+----------|
| Apollo 40 (pc40)   | Yes          | Yes        | Yes          | Yes      | Yes      |
|--------------------+--------------+------------+--------------+----------+----------|
| Apollo 35          | Yes          | Yes        | Yes          | No       | Yes      |
|--------------------+--------------+------------+--------------+----------+----------|
| Cray EX2500        | N/A          | Yes        | Yes          | Yes (cC) | Yes (nC) |
|--------------------+--------------+------------+--------------+----------+----------|
| Cray EX3000        | N/A          | Yes        | Yes          | Yes (cC) | Yes (nC) |
|--------------------+--------------+------------+--------------+----------+----------|
| Cray EX4000        | N/A          | Yes        | Yes          | Yes (cC) | Yes (nC) |
|--------------------+--------------+------------+--------------+----------+----------|
| Cray EX235a        | Yes          | Yes        | Yes          | Yes      | Yes (nC) |
|--------------------+--------------+------------+--------------+----------+----------|
| Cray EX235n        | Yes          | Yes        | Yes          | Yes      | Yes (nC) |
|--------------------+--------------+------------+--------------+----------+----------|
| Cray EX255a        | Yes          | Yes        | Yes          | Yes      | Yes (nC) |
|--------------------+--------------+------------+--------------+----------+----------|
| Cray EX420         | Yes          | Yes        | Yes          | Yes      | Yes (nC) |
|--------------------+--------------+------------+--------------+----------+----------|
| Cray EX425         | Yes          | Yes        | Yes          | Yes      | Yes (nC) |
|--------------------+--------------+------------+--------------+----------+----------|
| Cray EX4252        | Yes          | Yes        | Yes          | Yes      | Yes (nC) |
|--------------------+--------------+------------+--------------+----------+----------|
| Superdome Flex 280 | Yes          | Yes        | Yes          | No       | No       |
|--------------------+--------------+------------+--------------+----------+----------|
| Cray XD224         | Yes          | No [4]     | Yes          | No [4]   | No [4]   |
|--------------------+--------------+------------+--------------+----------+----------|
| Cray XD6500/X665   | Yes          | Yes        | Yes          | Yes      | Yes      |
|--------------------+--------------+------------+--------------+----------+----------|
| Cray XD2000/X225v  | Yes          | Yes        | Yes          | Yes      | Yes      |
|--------------------+--------------+------------+--------------+----------+----------|
| Cray XD2000/X295v  | Yes          | Yes        | Yes          | Yes      | Yes      |
|--------------------+--------------+------------+--------------+----------+----------|
| Cray XD2000/X220v  | Yes          | Yes        | Yes          | Yes      | Yes      |
+-------------------------------------------------------------------------------------+

[1] On the SGI 8600, HCAs can also be flashed.
[2] The Service Pack for ProLiant (SPP) may be used to flash the firmware on
    the HPE ProLiant DL and Apollo systems.
[3] Power related information has been moved from this table into the HPE
    Performance Cluster Manager Power Consumption Guide.
[4] Monitoring and firmware flashing of the XD224 will be addressed in a patch
    to HPCM 1.11 when the final platform firmware is available.

The HPE Performance Cluster Manager 1.11 release includes several improvements
in the following areas:

- Improved support for the HPE Cray XD6500 XD670 platforms (HPCM-2957)
  o Added power and BIOS setting support (HPCM-5555)
  o Added firmware flashing support (HPCM-5442)
  o Added monitoring support (HPCM-5556)

- Partial support* for HPE Cray XD224 platform (HPCM-5151)

  * Able to discover, PXE boot, provision and do basic power management.

3.1 New Operating System Support
------------------------------------------------------------------------------
Operating system support has been updated to include the following:

  - Red Hat Enterprise Linux 8.9 (admin/leader/compute/service nodes)
  - Rocky Linux 8.9 (admin/leader/compute/service nodes)
  - Red Hat Enterprise Linux 9.3 (compute/service nodes)
  - Rocky Linux 9.3 (compute/service nodes)
  - HPE Cray Operating System 23.11 (compute/service nodes)
  - Tri-Lab Operating System Stack (TOSS) 4.7
  - Ubuntu 22.04.3 (compute/service nodes)

For the complete list of supported operating systems, see the notes in section
"2.2 Operating System Support" above.

HPCM support for Ubuntu on any given compute platform is contingent on the
platform itself being supported on Ubuntu (see the HPE Servers Support & OS
Certification Matrix for Ubuntu for details):

  https://techlibrary.hpe.com/us/en/enterprise/servers/supportmatrix/ubuntu.aspx

3.2 Cluster Health Check and HW Triage Toolkit Additions and Enhancements
------------------------------------------------------------------------------
This release includes several changes that improve platform support in both
the diagnostics and the hardware triage toolkit (HTT), as well as fixes to
cluster health check, including the following:

- Adds support for the EX235n, EX254n, EX255a, EX425, and EX4252 platforms to
  HTT (HPCM-5987)
- Includes a statically compiled gpu_sizzle in the diags (HPCM-5495)
- AMD GPU diagnostics are now based on ROCm 6.0 (HPCM-6189)
- Adds several new and/or updated GPU-based diagnostics (HPCM-2427, HPCM-2428,
  HPCM-5112, HPCM-5141)

3.3 Cluster Discovery, Networking and Configuration Improvements
------------------------------------------------------------------------------
This release includes several changes related to discovery and system
configuration designed to aid in system setup, including the following:

- New Hardware Support (HPCM-2957, HPCM-5151)

  This release improves support for the HPE Cray XD6500 XD670 platform and
  introduces basic support for the XD224 platform.

- Adds support for LUKS2 security on diskful nodes (HPCM-5672)

  HPCM 1.11 introduces support for LUKS2 security on any diskful x86_64 nodes
  equipped with a Trusted Platform Module 2.0 (TPM2) device. For more
  information, see the section "Enabling and managing security on a disk
  enabled with Linux unified key setup 2 (LUKS2)" in the HPCM Administration
  Guide, the section "Cluster definition file example - Entries for service
  nodes that enable Linux Unified Key Setup 2 (LUKS2) security" in the
  Installation Guide, and section "4.4.5 Enabling LUKS2 Security on Q-HA
  Physical Nodes" of these release notes.

  You do not need to enable LUKS2 security on the admin node in order to
  enable LUKS2 security on other nodes. (A brief sketch for confirming that
  target nodes expose a TPM2 device appears at the end of this section.)

- Validates qualified updated firmware revisions for network switches:
  o Aruba switches (HPCM-5564)
  o HPE FlexFabric/FlexNetwork switches (HPCM-5565)

  See section 6.1 in the Appendix for details.

- Improves network setup in configure-cluster (HPCM-4638)

  The configure-cluster tool now prompts users during the initial interface
  setup menu to set the management IPs on the head and head-bmc networks, and
  on the admin node's interfaces.
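
Before enabling LUKS2 security on a group of diskful nodes, it can be useful
to confirm that each target node actually exposes a TPM2 device to the
operating system. The following is a minimal sketch using standard Linux
interfaces rather than an HPCM-specific procedure; it assumes the nodes are
already booted and reachable, and the pdsh group name "compute" is only an
example:

  # pdsh -g compute 'ls -l /dev/tpm0 /dev/tpmrm0 2>/dev/null'

The /dev/tpmrm0 device is created by the kernel's TPM resource manager, which
is only provided for TPM 2.0 hardware, so nodes that list /dev/tpmrm0 should
be suitable candidates for LUKS2 with TPM2 sealing. Nodes that report neither
device most likely have the TPM disabled in the BIOS/firmware settings.
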
3.4 General Additions and Enhancements
------------------------------------------------------------------------------
This release introduces several new enhancements to improve performance and
ease of use, including:

- Instructions in Upgrade Guide to Prevent EPEL Conflicts (HPCM-5089)

  EPEL is an optional repository that provides several packages that conflict
  with or provide newer versions of packages in HPCM 1.10. The HPCM Upgrade
  Guide now includes instructions on how to version lock those packages to
  prevent conflicts.

- Adds support for iSCSI provisioning (HPCM-5608, HPCM-5803)

  iSCSI has been added as a rootfs option in addition to disk, tmpfs, and NFS.
  Typically, iSCSI provisioning methods are higher-performing than NFS
  provisioning methods because iSCSI provisioning methods connect to the
  rootfs file system at the block level, while NFS adds another layer to the
  file system.

- Cluster manager command line interface improvements, including:
  o sudo user name captured in cm.log (HPCM-1680)
  o 'cm image show' now includes image size information (HPCM-4379)
  o adds chassis type to 'cm controller' command (HPCM-4759)

- Adds support for setting console rw/ro permissions (HPCM-3590, HPCM-5428)

  This change allows the HPCM admin to set, unset, and show which users are
  allowed either read-write or read-only access, using built-in Conserver
  capabilities. Permissions may be set globally, so that all consoles have the
  same permission lists, or each node can be set individually. If a node is
  unset, it goes back to the global setting (if set) or the initial defaults
  (if not set).

- Makes critical services more available during clone-slot (HPCM-5599)

  Critical services (e.g., database, power, config management, etc.) are now
  left running rather than turned off during the clone-slot operation,
  ensuring that those services remain available. At the end of the clone-slot
  operation, services are turned off for a very short period of time while a
  secondary in-place sync is performed to quickly update the destination slot
  with any missing data.

3.5 Monitoring Improvements
------------------------------------------------------------------------------
This release provides various improvements to the monitoring infrastructure
and tooling, including:

- Unified Alerting Platform Part 2 (HPCM-5221)

  HPCM 1.11 further improves the new alerting infrastructure introduced in
  HPCM 1.10. It provides alerting for cooling distribution unit (CDU)
  telemetry, CDU and cabinet leak events, AMD GPUs, Slingshot switch status
  heartbeat, email notifications for alerts, and more. Refer to the section
  "Monitoring alerts with unified alerting" in the HPCM System Monitoring
  Guide for more details.

- Provides advanced option to destroy/rebuild all topics in kafka (HPCM-425)

  See 'cm monitoring advanced kafka wipe -h' for more details about use of
  this new option.

- Adds a new datadir health check for kafka (HPCM-4602)

  This release includes a new check for kafka that checks for mismatched topic
  ids (meaning the folders are from different cluster instances), missing
  replica folders, or extra folders.

- Records HPCM-specific logs into opensearch (HPCM-4430)

  This release records logs from /opt/clmgr/log and /var/log/log.ctdb from the
  admin, and /var/log/glusterfs from su-leaders into opensearch.

- Adds collection of PCIM 'metric_cooldev_pdu' to pdu-collect (HPCM-3213)

- Enhanced slingshot health reporting, including, but not limited to:
  o report ports with BER and tx/rx pause (HPCM-2500)
  o report ports with UCW and llr_replay errors (HPCM-4095)
  o enhanced error handling in slingshot health reporting (HPCM-4080)
  o report MultiBit Errors (MBE) (HPCM-5418)
  o report which ports are unconfigured (HPCM-4096)

- Adds automation of cluster view configuration in dashboards (HPCM-4268)

  A new procedure detects the hardware type of the system and then automates
  the configuration in the cluster view panel, as well as attempting to
  automate the Grafana dashboards.

- Adds new 'cm monitoring {slurm,pbs}' command (HPCM-5546)

  See the following sections in the HPCM System Monitoring Guide for more
  information:
  o Monitoring Altair PBS Professional operations
  o Monitoring SLURM operations

- Adds new rackmap utility (HPCM-4291)

  The rackmap utility provides users with the ability to display power status
  information, temperature readings, HPE Slingshot information, and other
  cluster node telemetry data in a two-dimensional rack map display directly
  from the cluster manager command line interface. Refer to the section
  "Visualizing telemetry and status information with the rackmap tool" in the
  HPCM System Monitoring Guide for more details.

- Adds a new /opt/clmgr/tools/monitoring.sh script to collect data pertinent
  to analyzing monitoring issues (HPCM-5663)

As always, refer to the HPE Performance Cluster Manager System Monitoring
Guide for information on configuring monitoring tools and services in HPCM.

3.6 Scalable Unit Leader Nodes Improvements
------------------------------------------------------------------------------
This release contains several related changes that improve the performance of
Scalable Unit leader nodes, including:

- Insecure NFS Disabled by Default (HPCM-5932)

  HPCM 1.11 now blocks non-root users from accessing the gluster NFS server
  present on systems using SU leaders. To effect this change on a system which
  has already been deployed, run the following commands from one of the SU
  leaders and then reboot the leader:

  # gluster volume set cm_shared nfs.ports-insecure off
  # gluster volume set cm_logs nfs.ports-insecure off
  # gluster volume set ctdb nfs.ports-insecure off
  # gluster volume set cm_obj_sharded nfs.ports-insecure off

  The next time the leader reboots, the gluster NFS server will not accept
  non-privileged ports. HPE notes that although the gluster CLI will state
  that nfs.ports-insecure is off by default, HPE has found that it must be set
  to off explicitly for gluster NFS to have the correct behavior. The behavior
  change takes effect the next time gluster NFS is restarted.

3.7 Quorum High Availability Setup Improvements
------------------------------------------------------------------------------
This release contains changes that improve the robustness of Quorum-HA setups,
including:

- Improved ability to handle heavy connections without fencing (HPCM-5632)

  Under certain conditions when the admin virtual machine is under heavy
  connection loads, the connections managed by the firewall could get
  exhausted, which would cause operations on the physical node to fault, in
  turn causing the admin virtual machine to be fenced. This change allows HPCM
  to better handle this situation in both Quorum-HA and SAC-HA configurations.

3.8 Power Consumption Management Improvements
------------------------------------------------------------------------------
HPCM 1.11 introduces the following technical previews:

- System Power Capping (HPCM-5802)

  HPCM 1.11 introduces system-level power capping as a technical preview for
  the HPE Cray EX systems only. For more information on installing and running
  system power capping, see the HPCM Power Consumption Management Guide.

- Added cpwrcli and mpwrcli interfaces (HPCM-5424, HPCM-5590)

  The cpwrcli command and the corresponding cm power REST API enable users to
  power on and power off nodes, chassis, and other components. Likewise, the
  mpwrcli command allows you to set power limits for certain controllers and
  nodes. Both commands are introduced as technical previews in HPCM 1.11. For
  more information, see the CM Power Service REST API Documentation on the
  cluster manager home page and the HPCM Power Consumption Management Guide.

3.9 Documentation Updates
------------------------------------------------------------------------------
The following documentation was updated for the HPCM 1.11 release:

- HPE Performance Cluster Manager 1.11 Release Notes
- HPE Performance Cluster Manager Getting Started Guide (007-6500-016)
- HPE Performance Cluster Manager Installation Quick Start (P35632-009)
- HPE Performance Cluster Manager Installation Guide for Clusters with
  Scalable Unit (SU) Leaders (P36611-008)
- HPE Performance Cluster Manager Installation Guide for Clusters without
  Leader Nodes (P36610-008)
- HPE Performance Cluster Manager Installation Guide for Clusters with ICE
  Leader Nodes (P36609-008)
- HPE Performance Cluster Manager Command Reference (P36705-008)
- HPE Performance Cluster Manager Administration Guide (007-6499-016)
- HPE Performance Cluster Manager System Monitoring Guide (S-0120-005)
- HPE Performance Cluster Manager Power Consumption Management Guide
  (007-6498-016)
- HPE Performance Cluster Manager Upgrade Guide (S-9926-004)

HPE provides direct links to specific versions of the HPCM manuals:

- Getting Started Guide:
  https://www.hpe.com/support/hpcm-gsg-016
- Installation Quick Start:
  https://www.hpe.com/support/hpcm-inst-qs-009
- Install With SU Leader Nodes:
  https://www.hpe.com/support/hpcm-inst-su-leaders-008
- Install Without Leader Nodes:
  https://www.hpe.com/support/hpcm-inst-no-leaders-008
- Install With ICE Leader Nodes:
  https://www.hpe.com/support/hpcm-inst-ice-leaders-008
- Upgrade Guide:
  https://www.hpe.com/support/hpcm-upgrade-004
- Command Reference:
  https://www.hpe.com/support/hpcm-cr-008
- Administration Guide:
  https://www.hpe.com/support/hpcm-admin-016
- System Monitoring Guide:
  https://www.hpe.com/support/hpcm-monitor-005
- Power Consumption Management Guide:
  https://www.hpe.com/support/hpcm-power-016

The latest versions of documentation are always available on the HPE Support
Center. The Document List for HPE Performance Cluster Manager can be found
online:

  https://support.hpe.com/hpesc/public/docDisplay?docId=a00050433en_us

If a new revision of a manual is not ready at release, a placeholder document
with a link to the online version of the manual will be provided in the
clmgr-docs package included on the product ISO.

3.9.1 Manual Changes of Interest
------------------------------------------------------------------------------
The following information has been moved out of the HPCM PDF manuals:

- Cluster Manager Ports

  The section on Cluster Manager Ports has been removed from the HPCM
  Administration Guide and moved into a separate document which is available
  in the /docs directory of the product ISO and which gets installed to the
  following location:

  /opt/clmgr/doc/HPCM_Port_Info.pdf

- Singularity Examples

  The examples covering installation of Singularity containers have been
  removed from the HPCM Administration Guide and moved into a separate
  document which gets installed to the following location:

  /opt/clmgr/doc/HPCM_Singularity_Examples.pdf

3.9.2 Release Note Update 01 Changes
------------------------------------------------------------------------------
Update 01 of the HPCM 1.11 release notes updated the following sections:

- 2.2 Operating System Support

  Note [3] mistakenly referenced CPE 23.03; the correct version is 24.03.

- Known Problems and Workarounds section updates:
  o 4.7.8 GPU native monitoring failing on EX254n platform
  o 4.7.9 Native monitoring GPU_AMD_temp display is 0 on EX255a platform
  o 4.11.9 Unable to create RHEL9.x images in Q-HA admin virtual machine

3.10 Deprecated Features
------------------------------------------------------------------------------
The following features have been deprecated in the HPCM 1.11 release:

- su-leader-setup --add-leaders option

  HPE is deprecating the --add-leaders option used to add new groups of SU
  leaders to an existing system. See the section "4.5.2 Growing SU-Leaders No
  Longer Recommended" below for more information.

3.11 Future Deprecations
------------------------------------------------------------------------------
The following section describes features that should be avoided when possible
because HPE plans to deprecate them in the future. HPE announces deprecations
in advance so that users have time to plan in ways that minimize the impact of
specific changes.

- Writeable NFS Options

  HPE plans to deprecate the writable NFS options: xfs file per node and
  directory tree per node. These options were originally designed for use on
  the SGI ICE and HPE SGI 8600 platforms.

******************************************************************************
4.0 Known Issues and Workarounds
******************************************************************************
NOTE: Failure to reboot the admin node after upgrading or installing HPCM
patches may result in a non-functioning cluster. HPE recommends that users
reboot the admin node after upgrading or updating HPCM software on the admin
node to ensure that all relevant services are restarted.

4.1 Upgrade
------------------------------------------------------------------------------

4.1.1 Preparing a System for Upgrade
------------------------------------------------------------------------------
There are certain steps you can take before upgrading from an earlier version
of HPCM 1.x that will provide for a smoother upgrade experience. Many of these
steps are already outlined in the HPCM Installation Guide in the section
entitled "Upgrading from an HPE Performance Cluster Manager 1.x release". The
following are additional steps that may not be noted in the guide yet.

NOTE: HPE tested upgrade scenarios from HPCM 1.10 to HPCM 1.11 only. HPE only
tests upgrades from the most recent N-1 release to the latest release N.
To upgrade from N-2 (e.g., HPCM 1.9) to HPCM 1.11, follow the upgrade guide
for the HPCM 1.10 release, and then follow the upgrade guide for HPCM 1.11.

4.1.2 Problems Creating Images after Operating System Upgrades
------------------------------------------------------------------------------
When creating images with a new HPCM version based on a newly supported
operating system, HPE recommends that you create initial node images without
any operating system updates. If an operating system updates repo is already
selected, unselect it and proceed with initial image creation. Once you have
confirmed that your image has been created, you can re-select the operating
system updates repo and apply updates to the image.

An operating system updates repo may sometimes contain updates that have not
been tested by HPE. By using the original operating system release without
updates, image creation will be closer to what was validated at the time of
the release, and it provides more visibility into which operating system
packages are being updated.

4.1.3 Remove Mellanox OFED before upgrading running su-leaders
------------------------------------------------------------------------------
IM#1001790278

Attempting to upgrade an su-leader node which has the Mellanox OFED bits
installed will lead to errors due to conflicts between operating system OFED
packages and those provided by Mellanox. As such, the Mellanox OFED packages
must be removed before the upgrade or refresh, and then re-installed after the
upgrade is complete.

To remove the Mellanox OFED packages, use the following command:

  leader1:~ # /usr/sbin/ofed_uninstall.sh --force

When the upgrade of the su-leader is complete, reinstall the Mellanox OFED
packages.

4.1.4 ldmsd@.service Error Messages during Upgrade
------------------------------------------------------------------------------
HPCM-2718

When upgrading to a new HPCM version on the admin node, during the
installation of the cray-ldms package, the following failure may be reported:

  admin: Failed to try-restart ldmsd@.service: Unit name ldmsd@.service is missing the instance name.
  admin: See system logs and 'systemctl status ldmsd@.service' for details.

This message is harmless and can be ignored.

4.1.5 samba-ad-dc-libs deprecated in sles15sp5, but not obsoleted by SUSE
------------------------------------------------------------------------------
HPCM-5085

SUSE removed the samba-ad-dc-libs package, but did not set up any rules to
obsolete the package. This package may be installed on some HPCM systems. The
HPCM Upgrade Guide contains specific instructions on removing the package from
images and running systems as part of the HPCM 1.9 to HPCM 1.10 upgrade
process.

4.1.6 Package Upgrade Failures
------------------------------------------------------------------------------
HPCM-6115

During testing of upgrades from HPCM 1.10 + RHEL 8.8 to HPCM 1.11 + RHEL 8.9,
HPE noticed scriptlet failures reported by packages provided by the operating
system vendor, such as the following:

  admin: Running scriptlet: systemd-239-78.el8.x86_64    1419/1419
  admin: Couldn't write '10000' to 'kernel/perf_event_max_sample_rate': Invalid argument
  admin: warning: %transfiletriggerin(systemd-239-78.el8.x86_64) scriptlet failed

HPE did not see any failures that impacted functional operation of the system.
Concerned users may review any scriptlet failures reported during installation
or upgrade.
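
One way to review such scriptlet messages after the fact is to search the
package manager logs on the admin node. This is only a generic sketch and not
an HPCM-specific procedure; log file locations vary by distribution, and the
paths below are the usual defaults for dnf on RHEL-based systems and zypper on
SLES-based systems:

  # grep -iE "scriptlet|non-zero exit" /var/log/dnf.rpm.log /var/log/dnf.log
  # grep -iE "scriptlet|non-zero exit" /var/log/zypper.log

Any failures found this way can then be matched against the packages reported
during the upgrade to confirm that they came from vendor packages rather than
from HPCM components.
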
4.1.7 Version Locks When Upgrading Admin/Leader Nodes and Images
------------------------------------------------------------------------------
HPCM-6054

The HPCM Upgrade Guide contains specific information on version locking of
packages that may exist in more than one software repository. HPE recommends
following the instructions in that guide to prevent overwriting required
packages with others from other software repositories that might be configured
on the system, such as EPEL or SLE Updates for COS customers.

4.1.8 SLES15SP5 QU1 c-c fails on brltty and libbrlapi versions too old
------------------------------------------------------------------------------
HPCM-5885

When upgrading the physical admin nodes in a quorum HA configuration, the
virtual admin node, or a physical admin node in a non-HA configuration with
the SLES15SP5 QU1 iso, there may be package dependency errors during the
upgrade causing the upgrade to error out before completing. These errors may
look like:

  Resolving package dependencies...
  2 Problems:
  Problem: nothing provides 'group(brlapi)' needed by the to be installed libbrlapi0_8-6.4-150400.4.3.3.x86_64
  Problem: nothing provides 'system-user-brltty = 6.4-150400.4.3.3' needed by the to be installed brltty-6.4-150400.4.3.3

To resolve this error, remove the conflicting packages using the zypper
command on all of the physical admin nodes and virtual admin nodes as well.
These packages are not used by the Cluster Manager and are safe to remove. To
remove the packages, run:

  # zypper remove brltty libbrlapi0_8

4.2 Installation
------------------------------------------------------------------------------

4.2.1 All-at-once Kickstart Alternate Install Method Broken
------------------------------------------------------------------------------
IM#1001767617, 1001810593

The alternate installation method using all-at-once kickstart is currently not
working as expected. Customers are advised not to use this method until HPE
provides a fix for the issue.

4.2.2 Caution about creating bootable USB drives
------------------------------------------------------------------------------
There are instructions in the HPE Performance Cluster Manager Installation
Guide that describe how to create a bootable USB drive to install the HPCM
product on an admin node. HPE recommends using a 32GB (or larger) USB drive;
smaller USB drives will run out of space, and the process to create a bootable
USB drive may turn the media into a read-only device.

4.2.3 cray-rasdaemon package not installed by default
------------------------------------------------------------------------------
IM#1001779638

The cray-rasdaemon package is not installed by default by HPCM, but the
package is available within the product for use on any Cray EX system.
cray-rasdaemon is a RAS (Reliability, Availability and Serviceability) logging
tool. It currently records memory errors, using the EDAC tracing events. EDAC
is a set of Linux kernel drivers that handle detection of ECC errors from
memory controllers for most chipsets on i386 and x86_64 architectures; EDAC
drivers for other architectures, such as arm, also exist. This userspace
component consists of an init script which makes sure EDAC drivers and DIMM
labels are loaded at system startup, as well as a utility for reporting
current error counts from the EDAC sysfs files.
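
If you choose to install cray-rasdaemon on Cray EX nodes, the typical workflow
follows the upstream rasdaemon tooling. The commands below are a minimal
sketch based on that upstream tooling and assume cray-rasdaemon ships the
standard rasdaemon service and ras-mc-ctl utility; the exact service and
command names provided by the package may differ:

  # systemctl enable --now rasdaemon
  # ras-mc-ctl --status
  # ras-mc-ctl --summary
  # ras-mc-ctl --errors

The --summary and --errors options report the memory error counts that
rasdaemon has recorded from the EDAC tracing events described above.
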
4.2.4 Image created from sles15sp5 + updates - boot from disk only fails on
      some platforms; falls to dracut
------------------------------------------------------------------------------
HPCM-5162

An issue has been observed on nodes that are set to boot from disk only or to
use the disk bootloader: if the image assigned to the node was created using
the SLES15 SP5 distro together with the SLES15 SP5 distro updates, the node
will fail to boot and drop to a dracut shell prompt because it is unable to
find any disk devices. This issue has not been observed, however, when the
image was originally created without the SLES15 SP5 distro updates and then
updated using the 'cm image zypper' or 'cm image update' commands with the
distro updates repository selected or included in the repo group.

To avoid this issue, HPE recommends creating the node image without the SLES15
SP5 distro updates repository selected or included in the repo group. The
image can then be updated with SLES15 SP5 updates using the 'cm image zypper'
or 'cm image update' commands. The node assigned to the image can also be
updated using the 'cm node zypper' or 'cm node update' commands with the
distro updates repository selected or included in a repo group.

4.2.5 Gluster Package Conflicts on Layered HPCM Installation
------------------------------------------------------------------------------
HPCM-390

When performing a layered installation (i.e., Linux distro installed first,
followed by the HPCM install later), ensure that there are no gluster related
packages installed BEFORE attempting the HPCM installation to avoid package
conflicts. The HPCM product provides its own tested gluster packages.

4.2.6 configure-cluster admin failure due to system-user-qemu user conflict
------------------------------------------------------------------------------
HPCM-5358

When performing a layered installation (i.e., Linux distro installed first,
followed by the HPCM install later) using the standalone-install.sh script,
there is a chance that the configure-cluster step will fail due to a UID
conflict. Both SLES15 SP5 (system-user-qemu) and RHEL8.8 (qemu-kvm-common)
will attempt to create a qemu user with the UID 107. In some cases, other
packages which create users may have already used the UID 107, which causes a
failure like the following:

  /usr/sbin/groupadd -r -g 107 qemu
  /usr/sbin/useradd -r -c qemu user -d / -g qemu -u 107 qemu -s /usr/sbin/nologin
  useradd: UID 107 is not unique
  error: %prein(system-user-qemu-20170617-150400.22.33.noarch) scriptlet failed, exit status 1
  error: system-user-qemu-20170617-150400.22.33.noarch: install failed

To work around the failure, install the appropriate qemu packages on the admin
node BEFORE attempting to run the standalone-install.sh script.

4.2.7 Q-HA SLES15SP5 QU1 qemu-generated adminvm.xml files
------------------------------------------------------------------------------
HPCM-6157

When installing or upgrading physical admin nodes in a quorum HA configuration
using the SLES15SP5 QU1 install iso, there may be a qemu-generated adminvm.xml
file created on one or more of the physical admin nodes that was not generated
by the quorum HA setup tooling. This generated xml file should not be used for
managing the functional admin VM and should be removed.
To remove the file, run the following command: # pdsh -g gluster rm -f /etc/libvirt/qemu/adminvm.xml 4.2.8 DL365 Gen11 Servers need additional kernel parameter with RHEL8.9 ------------------------------------------------------------------------------ HPCM-5913 HPE internal testing has revealed an issue that causes DL365 Gen11 Servers to panic during boot when running RHEL 8.9. The problem does not impact older or newer versions of RHEL8. To workaround the issue, HPE recommends adding an additional kernel parameter. The procedure to set a kernel parameter is different for nodes that are provisioned versus physical admin nodes. - DL365 as Leader, Service or Compute Node 1) Set the kernel flag with the 'cm node set' command. For example: # cm node set -n {NODE/S} --kernel-extra-params nox2apic 2) Follow the instructions to provision the node like normal. - DL365 as Admin Node (Quorum-HA or SAC-HA Physical Nodes) 1) Boot the admin node installer from boot media like normal 2) When presented with a menu system for installation, follow on-screen instructions until asked to provide the kernel list. 3) When prompted for "Additional parameters (like console=, etc)," enter the following additional kernel parameters: --- nox2apic * It's important to use three (3) dashes to make the kernel parameter persistent across reboots. 4) Follow on-screen instructions to complete the installation. 4.2.9 Switch Default ICE diags in rpmlists to xe diags for admin and default ------------------------------------------------------------------------------ HPCM-2319 HPCM provides field and performance diags packages for SGI 8600 (ice) systems and for other systems (xe). Starting with HPCM 1.8, the ice diags are no longer the default diags packages selected for installation on the admin, service and non-ice compute nodes. As such, for SGI 8600 customers only, when upgrading to HPCM 1.8, HPCM will report file conflicts between the field_diags_licensed_xe and field_diags_licensed_ice packages, as well as conflicts between the perf_diags_licensed_xe and perf_diags_licensed_ice packages while running the refresh-node or refresh-image commands with the default rpmlist. To work around this issue, replace the field_diags_licensed_xe and perf_diags_licensed_xe packages with the field_diags_licensed_ice and perf_diags_licensed_ice packages in the generated rpmlists before attempting to run the refresh-node or refresh-image commands during the HPCM upgrade. 4.3 Networking ------------------------------------------------------------------------------ 4.3.1 ibN ifcfg files not automatically generated on the admin node ------------------------------------------------------------------------------ HPCM-844 The admin node is not automatically assigned IP addresses for the InfiniBand interfaces. These following instructions are the commands needed to assign addresses and create the ifcfg-ibX files if needed. Adding IP address for ib0 interface: # cm node nic add -n admin -N ib0 -w ib0 --compute-next-ip --interface-name admin-ib0 if ib1 is required (optional): # cm node nic add -n admin -N ib1 -w ib1 --compute-next-ip --interface-name admin-ib1 Write the config to the database: # cm node update config --sync -n admin Print the admin ibX IP address values to use in building the ifcfg-ib[0,1] interface files. 
# cadmin --show-ips --node admin
IP Address Information for node: admin
  ifname       ip            Network
  admin        172.23.0.1    head
  admin-bmc    172.24.0.1    head-bmc
  admin-ib0    10.148.0.1    ib0
  admin-ib1    10.149.0.1    ib1

Use the admin-ib0 IP address from the above output for the IPADDR field in
the ifcfg-ib0 file. Repeat if necessary with the admin-ib1 IP address for the
ifcfg-ib1 file.

SLES15 ifcfg file location:
  /etc/sysconfig/network/ifcfg-ib0
  /etc/sysconfig/network/ifcfg-ib1

# SLES15 ib0
STARTMODE='onboot'
BOOTPROTO='static'
IPADDR='10.148.0.1'
NETMASK='255.255.0.0'
WIRELESS='no'
LINK_REQUIRED='no'

# SLES15 ib1
STARTMODE='onboot'
BOOTPROTO='static'
IPADDR='10.149.0.1'
NETMASK='255.255.0.0'
WIRELESS='no'
LINK_REQUIRED='no'

RHEL8x ifcfg file location:
  /etc/sysconfig/network-scripts/ifcfg-ib0
  /etc/sysconfig/network-scripts/ifcfg-ib1

# RHEL8x ib0
DEVICE=ib0
TYPE=InfiniBand
BOOTPROTO=static
PREFIX=16
IPADDR=10.148.0.1
ONBOOT=yes

# RHEL8x ib1
DEVICE=ib1
TYPE=InfiniBand
BOOTPROTO=static
PREFIX=16
IPADDR=10.149.0.1
ONBOOT=yes

4.3.2 configure-cluster: "Unable to start master OpenSM on host.." Errors
------------------------------------------------------------------------------
IM#1001810442
When attempting to administer the InfiniBand fabric in configure-cluster ->
Configure Infiniband Fabric, failures may be reported due to the missing
opensm package. This happens because opensm is no longer a dependency of the
opensource-opensm-multifabric package. To work around the issue, HPE
recommends that system administrators install MLNX OFED on the nodes or image
in order to administer the InfiniBand fabric. Alternatively, administrators
may install the opensm package provided in the operating system repository.

4.3.3 Enabling predictable net names on nodes with disk-bootloader enabled
------------------------------------------------------------------------------
IM#1001811844
The net.ifnames parameter determines whether predictable net names are
enabled, and it is set by a conf.d script. The parameter is normally supplied
to nodes through tftpboot configuration files, but for nodes that boot
directly to disk or have the disk bootloader enabled, that conf.d script
needs to be run before rebooting.

To determine which nodes have disk-bootloader enabled, run the following
command:

  # cm node show -n "*" --disk-bootloader

To determine which nodes boot off disk directly: for nodes booted with EFI,
run 'efibootmgr' and observe the "BootCurrent" device; for nodes booted with
legacy BIOS, observe the boot order from the BMC or BIOS menu.

After upgrading to HPCM 1.8 (or later) on a compute or leader node that boots
to disk or has the disk bootloader enabled, run the following command, where
NODE is the name of the node:

  # ssh NODE /etc/opt/sgi/conf.d/80-ondisk-kernel-parameters

Or, if all nodes in a node group boot to disk (e.g. su-leaders), use pdsh
instead:

  # pdsh -g su-leader /etc/opt/sgi/conf.d/80-ondisk-kernel-parameters

4.3.4 Setting backup-dns-server fails to start named service
------------------------------------------------------------------------------
IM#1001787548
When configuring a compute/service/login node as a backup DNS server, the new
DNS server will appear in /etc/resolv.conf for all other service nodes, but
the operation does not currently start the named service. The workaround is
to manually start and enable the named service on the backup-dns server after
setup is complete.
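For example, a minimal sketch of this workaround, assuming a systemd-based
node and that the backup DNS server is a hypothetical node named 'service1':

  # ssh service1 systemctl enable --now named
  # ssh service1 systemctl status named

The 'enable --now' form both enables the named service at boot and starts it
immediately.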
4.3.5 Mellanox OFED module compatibility issues
------------------------------------------------------------------------------
HPCM-3909
HPE has identified several possible issues with MLNX OFED modules not loading
with either the GA kernel or specific kernel updates. For instance, with MLNX
OFED 23.07-x, the modules do not load with the RHEL 8.x base kernels but do
load with kernel updates. For SLES, the situation is reversed. HPE recommends
reading through the MLNX OFED release notes for specific details regarding
MLNX OFED module and operating system kernel compatibility. In some cases,
the 'mlnxofedinstall' command may help:

  # ./mlnxofedinstall --add-kernel-support --kmp

4.4 High Availability
------------------------------------------------------------------------------

4.4.1 SAC-HA Requires both High Availability and ResilientStorage for Updates
------------------------------------------------------------------------------
IM#1001757666
The current SAC-HA solution for HPCM customers requires the High Availability
add-on as well as the Resilient Storage add-on product. The SAC-HA solution
requires the dlm package, which only ships as part of the Resilient Storage
add-on. Since the dlm package ships on the RHEL 8.x ISO image, the dependency
can be satisfied at initial install. However, it is possible to encounter a
case where updates to the High Availability packages require an update to the
dlm package, which is only available through the Resilient Storage channel
and requires a valid subscription to access. The SAC-HA solution for HPCM
customers has been tested with the packages provided on the RHEL 8.x media as
of the HPCM 1.6 release.

4.4.2 HA-RLC: installation on leaders requires ha_net_ip assignment
------------------------------------------------------------------------------
HPCM-1617
HA-RLC installation on leaders will fail on SLES15 SPx unless the ha_net_ip
variable is assigned the correct value. To work around the issue, define
ha_net_ip=192.168.161.1 for r1lead1 and ha_net_ip=192.168.161.2 for r1lead2
in the cluster configuration file, and then rediscover the leaders.

4.4.3 quorum-ha physical nodes not updating slot chooser grub after clone-slot
------------------------------------------------------------------------------
HPCM-1705
HPCM supports cloning slots on Quorum-HA physical nodes. However, this only
lets system administrators work with the partitions related to the operating
system and HPCM. To allot the maximum amount of space to the virtual machine,
the disk or LUN used for the admin node virtual machine image should not be
split into pieces.

It is important to note that the gluster volume metadata is on the root
filesystem, and the gluster disks only have one partition with a large
filesystem by default. Gluster only supports rolling its version forward, not
rolling it back. Therefore, if the slot is cloned and the newly cloned slot
is updated with new packages that also include a gluster version change, the
older/original slot may be incompatible with the gluster bricks mounted at
/data on the physical nodes. If maintenance work may include a gluster
version change/update, HPE recommends backing up the admin virtual image in
some other way to support rollback.

In addition, HPCM 1.10 does not automatically update the label information
reported by the 'cadmin --show-root-labels' command. This means that after
cloning, the above command may report 'slot 2: (no install found)'. This
issue is purely cosmetic.
HPE recommends reaching out to the support organization for a procedure to
update the label reported by the slot chooser.

4.4.4 Q-HA SLES15 SP5: unlabeled data drive tends to remain /dev/sda
------------------------------------------------------------------------------
HPCM-4948
When installing Quorum-HA on SLES15 SP5, HPE recommends clearing the data
drive with the wipefs tool to help ensure that the disk containing the OS is
recognized as /dev/sda and the data disk as /dev/sdb.

4.4.5 Enabling LUKS2 Security on Q-HA Physical Nodes
------------------------------------------------------------------------------
HPCM-5801
Starting with HPCM 1.11, customers may use luks2 encryption on the physical
nodes making up the quorum-ha solution. Note that the gluster area is not
encrypted at this time. luks2 encryption may also be used in the admin
virtual machine, although the SWTPM (software TPM) data is stored in a
non-encrypted shared gluster filesystem (the same one that houses the virtual
machine admin node itself) on the physical nodes. HPE may investigate TPM
encryption in a future release.

HPE strongly recommends saving any luks2 password for the physical nodes and
the virtual machine in case the TPM is not able to unseal. Testing has shown
that the SWTPM may not save the enrollment data until the virtual machine is
shut down. If following standard procedures, the virtual machine will reboot
when the installation of the admin node is complete.

NOTE: Changing non-encrypted root filesystems into encrypted root filesystems
is not supported.

The following steps outline how to enable luks2 encryption on the physical
admin nodes in the Quorum HA solution so that any physical node can start the
admin VM and the VM can unlock the root volume using the SWTPM.

1) Add a UUID to adminvm.xml so that it does not regenerate each time:

     <name>adminvm</name>
     <uuid>361cedac-6aca-405b-bfe8-382cb46b39c9</uuid>
     <memory>158059488</memory>
     ..

2) Add a TPM section with persistence

3) Run the following commands to create the swtpm directory in shared storage
   and link it to /var/lib/libvirt/swtpm:

     # mkdir /adminvm/swtpm
     # chmod 711 /adminvm/swtpm
     # pdsh -g gluster rm -rf /var/lib/libvirt/swtpm/
     # pdsh -g gluster ln -s /adminvm/swtpm /var/lib/libvirt/swtpm

4.5 SU-Leader
------------------------------------------------------------------------------

4.5.1 conserver reports errors on su-leaders if no nodes are assigned
------------------------------------------------------------------------------
IM#1001790841
The conserver service reports the following error if no consoles are found,
which is the case when there are no nodes assigned to the su-leader:

  Node leader1 reported error: Job for conserver.service failed because the
  control process exited with error code.

This error message will not appear once nodes are assigned to the leader,
which is the normal case.

4.5.2 Growing SU-Leaders No Longer Recommended
------------------------------------------------------------------------------
IM#1001811296, HPCM-1673, HPCM-1600, HPCM-6128
The original design of the SU Leader system includes support to add
additional sets of 3 leaders at a later time. However, growing the gluster
volumes, while technically supported by gluster, has proven difficult to do
correctly and repeatably in automation without human intervention. As such,
HPE no longer recommends attempting to grow SU-Leaders.
When there is a need to increase the number of SU leaders in use on site, HPE
recommends the following approach:

1) Back up system logs and console logs (optional)
2) De-couple the admin node from the su leaders (disable-su-leader)
3) Discover additional leaders
4) Run the su-leader-setup command to configure the SU leaders (see the
   procedure in the HPCM Installation Guide for details)

Contact HPE support for more information and help with this procedure. The
section "Adding scalable unit (SU) leader nodes" will be revised in the HPCM
Installation Guide in the future.

4.5.3 su-leader-collection package fails to install
------------------------------------------------------------------------------
IM#1001726164, IM#1001743636
The su-leader-collection package has a dependency on the ctdb package. The
ctdb package, in turn, expects a specific version of the samba packages. If
these two packages get out of sync, for instance by installing samba updates
without the corresponding ctdb updates, the su-leader-collection package will
fail to install.

Should the system end up in this state, there are two possible workarounds.
Option 1 is to make the high availability repository available so that an
updated ctdb package can be pulled in to match the samba updates already
installed on the system or in an image. Option 2 is to downgrade the samba
packages to match the version designed to work with the ctdb package
available in current software repositories.

This problem may also present itself when installing the su-leader-collection
package on a system where a RHEL operating system updates repository is
available. If the high availability repository is not also available, the
samba-client-libs and samba-common packages can get out of sync and the
installation of the su-leader-collection package will report dependency
errors. The workarounds are the same as those listed above.

One way to prevent a system from encountering these issues is to lock
specific packages. For instance, by locking the samba base packages, it will
be much more difficult for the samba and ctdb packages to get out of sync.
When both the samba and the corresponding ctdb packages are available,
package locks may be removed in order to complete the update. Refer to the
Linux operating system documentation for more details about how to
lock/unlock packages.

4.5.4 NVME gluster disks must use /dev/disk/by-path in su-leader-nodes.lst
------------------------------------------------------------------------------
HPCM-1440
The su-leader-setup documentation recommends using /dev/disk/by-path devices
in the list of gluster disk devices, although /dev/sdX names have worked.
However, with NVME drives, using /dev/nvmeX names in su-leader-nodes.lst will
produce a series of errors. The tool will inform the user that the device is
not using /dev/disk/by-path device names and will produce a series of ugly
"basename" bash errors. These errors are easily avoided by using the
documented /dev/disk/by-path device names that HPE recommends.
Incorrect device names:

  leader1,172.24.255.1,172.23.255.1,/dev/nvme0n1
  leader2,172.24.255.2,172.23.255.2,/dev/nvme0n1
  leader3,172.24.255.3,172.23.255.3,/dev/nvme0n1

Correct device names:

  leader1,172.24.255.1,172.23.255.1,/dev/disk/by-path/pci-0000:44:00.0-nvme-1
  leader2,172.24.255.2,172.23.255.2,/dev/disk/by-path/pci-0000:44:00.0-nvme-1
  leader3,172.24.255.3,172.23.255.3,/dev/disk/by-path/pci-0000:44:00.0-nvme-1

4.5.5 Warning messages on su-leader reboot
------------------------------------------------------------------------------
HPCM-3259
After rebooting an su-leader, the following warning message may be displayed:

  Warning Mount /opt/clmgr/shared_storage, on leaderX, has EXTRA
  fuse/glusterfs process, count: 2

A small number of duplicate mounts are normal and not a cause for concern. In
a future release, HPE is investigating reduced use of bind mounts, which will
make it easier to properly manage mounts in the HA monitoring scripts.

4.5.6 Rebooting more than 3 leaders causes glusterd and ctdb issues
------------------------------------------------------------------------------
HPCM-2900
HPE has observed that when rebooting more than three (3) leaders at a time,
even when following quorum rules, glusterd can get stuck, causing a failure
in all brick processes, so mounts get stuck and ctdb status will remain
DISCONNECTED. HPE strongly recommends rebooting no more than three (3)
leaders at a time, while observing quorum rules. HPE is investigating the
issue.

4.5.7 command failed gluster-and-ctdb-health-check --quiet during upgrade
------------------------------------------------------------------------------
HPCM-5097
HPCM 1.10 introduced a script to check the health of gluster and ctdb. HPE
recommends running this script and fixing any problems BEFORE upgrading the
SU leader nodes. The script is /opt/clmgr/bin/gluster-and-ctdb-health-check.
Since this script was introduced in HPCM 1.10, it will not exist on SU leader
nodes that have not yet been upgraded. A step has been added to the upgrade
guide to manually copy the script to the SU leaders before upgrading them, as
follows:

  # pdcp -g leader /opt/clmgr/bin/gluster-and-ctdb-health-check /opt/clmgr/bin/gluster-and-ctdb-health-check

4.6 System Config and Discovery
------------------------------------------------------------------------------

4.6.1 Card type, bmc username, password and baud rate required in config files
------------------------------------------------------------------------------
IM#1001727110,1001767084,1001765634
Starting in HPCM 1.4, you must provide additional information in any config
files you plan to use for discovery. In the past, if the BMC user name,
password, or baud rate was not specified in the config file, the BMC would be
probed for the values. This probing took time and did not scale without
limit. The requirement now is that you always specify all four of the
following values:

  - card_type
  - bmc_username
  - bmc_password
  - baud_rate

The card_type values are currently case sensitive, so for iLO systems, use
'card_type=iLO', and for systems with non-iLO BMCs, use 'card_type=IPMI'.
Failure to use the correct case will result in broken consoles. HPCM talks to
iLOs and non-iLO BMCs using different APIs, so failure to provide a card_type
value, or providing an incorrect card_type value, typically manifests as a
failure of conserver to work or as node discovery taking an abnormal amount
of time due to failed ping attempts.

If you have already discovered the nodes, you can use the following commands
to set the values.
Replace the username and password with the correct values:

  # cm node set --bmc-username USER --bmc-password PASSWORD -n NODES
  # cm node set --baudrate RATE -n NODES

The 'cm node set' command does not yet support setting the card type, but
that may be done with the legacy cmu_mod_node command as follows:

  # cmu_mod_node --mgt-card CARDTYPE --hostname NODES

4.6.2 Adding ICE Compute Nodes to Discover Config File Fails
------------------------------------------------------------------------------
IM#1001809637
Attempting to discover ICE compute nodes via the discover config file will
cause the nodes to appear in the HPCM database, but the hostnames will fail
to be recognized. To work around the issue, ICE compute nodes should not be
listed in the discover config file.

4.6.3 Netboot Files Not Cleaned Up on Node Deletion from Database
------------------------------------------------------------------------------
IM#1001787083
When nodes are added to the database with the 'cm node add' or 'discover'
commands, pxe config files are created for the newly added nodes in
/opt/clmgr/tftpboot/grub2/cm. These files are also created when invoking
'cadmin --set-dhcp-bootfile' (e.g., switching between grub2 and ipxe-direct).
When a node is removed, the netboot file for the deleted node is not removed.
If nodes are added and the netboot environment is not refreshed, this can
cause issues. The current workaround is to manually remove netboot files for
deleted nodes (see the example after 4.6.6 below).

4.6.4 Fastdiscover fails to add Cray EX blades to DB
------------------------------------------------------------------------------
IM#1001810367
On Cray EX systems, cmcinventory first adds the controllers to the
fastdiscover.conf file and then goes through the entire process of
discovering them. Once they are in the database and reachable, HPCM then
scans for node MAC addresses within the controllers and adds them to the
fastdiscover.conf file. If a system administrator attempts to use a
fastdiscover.conf file which is already populated with NodeControllers and
Nodes to add back nodes, the command will display the following error:

  (): Attempt to add controllers failed: Controller name 'x9000c1r7b0'
  already used at /opt/sgi/lib/NewNodes.pm line 614.
  If failure was due to previously existing nodes consider option
  --skip-existing-nodes

To work around the issue, use the '--skip-existing-nodes' option as
instructed in the error message.

4.6.5 Blade fails to image with https, never moves to next interface
------------------------------------------------------------------------------
HPCM-1450, HPCM-3884
In some cases, when attempting to image blades with https, the blades will
hang on the https interface and fail to fall back to the next available
interface. To work around this problem, either change the boot order so that
http(s) is not listed first, or console into the node and select the pxe
IPv4 option.

4.6.6 Unknown kernel command line parameters in dmesg log
------------------------------------------------------------------------------
HPCM-3781
The following message appears in the dmesg log:

  [ 0.000000] Unknown kernel command line parameters
  "BOOT_IMAGE=/vmlinuz-5.14.21-150400.24.41-default boot=LABEL=sgiboot
  biosdevname=0", will be passed to user space.

The reason for the message is that the Linux kernel is now a little more
explicit about which command line parameters it did not process. It is known
that the BOOT_IMAGE parameter will be called out, but it can be safely
ignored. It is an informational message.
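The following is a minimal sketch of the manual netboot file cleanup
described in 4.6.3. It assumes the deleted node was named node003 (a
hypothetical name) and that its leftover files under
/opt/clmgr/tftpboot/grub2/cm can be matched by node name; on some systems the
files are keyed by MAC address instead, in which case substitute the node's
MAC in the format used by the file names. Review the matches before removing
anything:

  # cd /opt/clmgr/tftpboot/grub2/cm
  # ls | grep -i node003
  # ls | grep -i node003 | xargs rm -f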
4.6.7 EX420 consoles hang at serial drv
------------------------------------------------------------------------------
HPCM-5148
When using COS or RHEL, if node console output stops with the following
error:

  "Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled"

use the following command to set extra kernel parameters:

  # cm node set --kernel-extra-params PARAMS -n NODENAME

where PARAMS for COS and RHEL are as follows:

  COS : 8250.nr_uarts=1 8250.use_polling=1 8250.skip_non_ioport=1
  RHEL: 8250.nr_uarts=5

4.7 Monitoring
------------------------------------------------------------------------------

4.7.1 het_trap_processor ERROR mqtt client connect failure
------------------------------------------------------------------------------
IM#1001740008
On a newly installed admin node, het_trap_processor reports a connection
failure to mqtt. This error can occur whenever kafka and related monitoring
services are not yet running. To avoid the error, use the cm monitoring
commands to turn on the kafka/elk/alerta-based monitoring services.

  ==> het_trap_processor.log <==
  2020-02-18T19:34:00.643Z root ERROR mqtt client connect failure
  127.0.0.1:1883 error(111, Connection refused)
  2020-02-18T19:34:00.643Z root INFO Starting HET Processor

4.7.2 Grafana hpeclusterview_panel plugin is unsigned
------------------------------------------------------------------------------
HPCM-1215
The root_url for Grafana on HPCM systems is https://localhost:3000/grafana.
The Grafana community, which owns the plugin signing process, requires that
plugins be signed with root_url=https://localhost:3000. The
hpeclusterview_panel plugin can be signed with
root_url=https://localhost:3000/grafana, but when loaded, the panel will show
that the plugin has an invalid signature. HPE has opened a case with Grafana
to allow more flexibility in the root_url value used by private plugins and
will continue to monitor. Until then, HPE recommends that customers either
(1) set allow_loading_unsigned_plugins to true in the Grafana configuration
or (2) do not load the hpeclusterview_panel plugin.

4.7.3 'cm monitoring ldms' Commands Only Work on Admin Node
------------------------------------------------------------------------------
HPCM-2491
Some 'cm monitoring' commands (e.g., 'cm monitoring kafka status') will print
an error when run on the leader nodes, but 'cm monitoring ldms' currently
does not. The 'cm monitoring ldms' commands should only be run on the admin
nodes. HPE may provide a more appropriate error message in a future release.

4.7.4 Distributed Data Not Available on Grafana Dashboards
------------------------------------------------------------------------------
HPCM-2854
Anytime a change is made that expands the number of instances of a service
that will be monitored, such as when distributing kafka or ELK (e.g.,
kafka-dist-setup, elk-dist-setup), Service Infrastructure Monitoring (SIM)
must be disabled, re-enabled, and then restarted. This is also the case for
adding a new leader, adding a new switch, and so on. To re-enable SIM after
distributing kafka or ELK, run the following commands:

  # cm sim disable
  # cm sim enable
  # cm sim start

4.7.5 cm-postgresql-14 Service Fails to Start
------------------------------------------------------------------------------
HPCM-4012
Opensearch has a memory limit of 30G. On systems with less memory (roughly
60G or less in total), the cm-postgresql-14 service may fail to start with an
error message similar to the following:

  529 -- Unit cm-postgresql-14.service has begun starting up.
  530 Mar 07 09:15:45 system postmaster[522170]: 2023-03-07 09:15:45.893 CST
      [522170] FATAL: could not map anonymous shared memory: Cannot allocate
      memory
  531 Mar 07 09:15:45 system postmaster[522170]: 2023-03-07 09:15:45.893 CST
      [522170] HINT: This error usually means that PostgreSQL's request for a
      shared memory segment exceeded available memory, swap space, or huge
      pages. To reduce the request size (currently 17436033024 bytes), reduce
      PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or
      max_connections.

To work around the issue, reduce the default memory requirement of opensearch
by changing the "-Xms30g" and "-Xmx30g" values in /etc/opensearch/jvm.options
(see the example following 4.7.9 below).

4.7.6 Default directory formatting for log rotation changed
------------------------------------------------------------------------------
HPCM-4880
The default log rotation directory formatting for /var/log/HOSTS and
/var/log/consoles changed in HPCM 1.10. Previously, when a file was rotated,
logrotate would put the file under /var/log/HOSTS/old/YYYY-MM or
/var/log/consoles/old/YYYY-MM, where "YYYY-MM" signified the year and month
when the file was rotated. Starting in HPCM 1.10, the format is now per day,
YYYY-MM-DD. So any day a log is rotated, it will be put under its own dated
directory.

The configuration for /var/log/HOSTS is in the file
/etc/sysconfig/cm-logrotate-parallel-hosts on the admin node, and the
configuration for /var/log/consoles is in
/etc/sysconfig/cm-logrotate-parallel-consoles on the admin node. To modify
the format of the old directory where rotated logs are moved, modify the
directory-specific configuration file, change the "rotatedirname" variable
under the preremove section to the desired format, and comment out the
previous configuration of the "rotatedirname" variable. These configuration
files contain additional examples of different formats to use. For example,
to switch to a format containing both the hostname and the date, set
rotatedirname in the configuration file to:

  rotatedirname="$(date +%Y-%m 2>/dev/null)/$(basename ${1%%-$(date +%Y%m%d 2>/dev/null)*})"

Any adjustments to these configuration files will persist across upgrades.
For more information, see the logrotate man page.

4.7.7 confluent-schema-registry.service shows failed state
------------------------------------------------------------------------------
HPCM-5965
If the SIM dashboard reports that the confluent-schema-registry.service is in
a failed state on an SU leader, run the following command on that su-leader
to mask the confluent-schema-registry.service:

  # systemctl mask confluent-schema-registry.service

4.7.8 GPU native monitoring failing on EX254n platform
------------------------------------------------------------------------------
HPCM-6283
GPU native monitoring is currently failing on the EX254n platform. HPE is
investigating the problem and working on plans to address the issue.

4.7.9 Native monitoring GPU_AMD_temp display is 0 on EX255a platform
------------------------------------------------------------------------------
HPCM-6291
The native monitoring system only displays zero values for "GPU_AMD_temp" on
the EX255a platform. HPE is investigating the problem and working on plans to
address the issue.
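The following is a minimal sketch of the jvm.options change described in
4.7.5. The 16g value is only an example; size the heap to fit the memory
actually available on the system. The 'opensearch' service name is assumed to
be the standard unit name:

  # grep -E '^-Xm[sx]' /etc/opensearch/jvm.options
  -Xms30g
  -Xmx30g

Edit those two lines, for example to:

  -Xms16g
  -Xmx16g

Then restart the opensearch service (for example, 'systemctl restart
opensearch') so the new heap size takes effect.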
4.8 Command Line Interface (CLI)
------------------------------------------------------------------------------

4.8.1 Node aliases used only by the cluster manager CLI
------------------------------------------------------------------------------
HPCM-449
HPCM provides the capability to set aliases for nodes. These node aliases can
be used by the cluster manager CLI. Setting an alias does not add aliases for
network names. Future releases may set up /etc/hosts and/or DNS based on node
aliases on a specific network for that node. Site administrators are free to
add specific entries to /etc/hosts and distribute that file across the
cluster.

4.8.2 Console failure due to delayed credential updates
------------------------------------------------------------------------------
HPCM-4017
When adding nodes to the system, it is best to add the nodes with the
controller credentials (BMC, iLO, etc.) at the time the nodes are added. This
ensures the database instantly has the credentials necessary for the power
and console services. For example, if you are using a cluster definition file
to add a set of nodes, you should include bmc_username and bmc_password if
you have them.

When you do not have the credentials, a service tries to guess common
controller usernames and passwords until a match is found. This is done for
any node that lacks credentials when it is added. This process takes time, so
if you add nodes and do not specify the controller credentials, certain
services may not start with the right information. This can be observed in
the console service: you may find that the console service starts with
missing credentials, which leads to error messages. If you hit that
situation, you can update the configuration later, after the correct
credentials have been guessed (if possible), like this:

  # cm node update config --sync conserver -n admin

If your system has SU leaders, also run:

  # cm node update config -t role su-leader --sync conserver

4.9 Graphical User Interface (GUI)
------------------------------------------------------------------------------

4.9.1 Image Management in the GUI
------------------------------------------------------------------------------
The Image Management section of the GUI has been deprecated. HPE plans to
make updates in a future version. For now, HPE recommends that users create
images using the CLI.

4.9.2 History View of node information is not displayed between time ranges
------------------------------------------------------------------------------
HPCM-4747
Note that when using the History View in the GUI, the time entered in the
dialog box is evaluated with the timezone of the server, not with the time of
the laptop/desktop running the GUI.
4.10 Diags and Firmware ------------------------------------------------------------------------------ 4.10.1 Omni-Path firmware flash operation fails ------------------------------------------------------------------------------ IM#1001790304 When attempting to flash firmware on the Omni-Path cards, you may see a failure like the following: service1:~ # hfi1_eprom -d all -u /usr/share/opa/bios_images/HfiPcieGen3_1.9.2.0.0.efi Updating driver file with "/usr/share/opa/bios_images/HfiPcieGen3_1.9.2.0.0.efi" Using device: /sys/bus/pci/devices/0000:12:00.0/resource0 Unable to mmap /sys/bus/pci/devices/0000:12:00.0/resource0, Invalid argument This is a known issue and the workaround is documented online: https://support.hpe.com/hpesc/public/docDisplay?docId=emr_na-a00060145en_us 4.10.2 HPE Apollo 9000 Tray Power-Down and Failure to Power-Up ------------------------------------------------------------------------------ HPCM-2766 HPE has observed that occasionally one or more trays will power down all the nodes in the tray and then will not allow the nodes to be powered back up. This issue is fixed with an update to the HPE Apollo Chassis Management Controller firmware (CMC_20221121s.bin). 4.11 Miscellaneous ------------------------------------------------------------------------------ 4.11.1 AIOps version information ------------------------------------------------------------------------------ IM#1001787597 The AIOps feature does not currently include a release file in /etc for simple identification. When providing feedback on the AIOps feature, please provide both the HPCM version (e.g., HPCM 1.11) and the version number of the cm-aiops RPM (e.g., 1.11-729). 4.11.2 PBSPro: nodes are in offline state after OS provisioning ------------------------------------------------------------------------------ IM#1001790902 While testing the HPCM PBSPro connector, HPE noticed that nodes would always show 'offline' after PBS OS provisioning, even though PBS services are running on the compute nodes. The problem is that the pcm_provision alarm default is set too low (30). To workaround the problem, set the hook pcm_provision alarm to 1200. HPE is working with Altair to add this to the PBS connector documentation. 4.11.3 SSH Key Content in Dump Does Not Match Expected ------------------------------------------------------------------------------ HPCM-665 SSH key content in /opt/sgi/secrets/root-ssh/sgi_kdump/ may be different from the content saved to /var/crash/sgi_kdump/.ssh/. The difference can cause confusion when debugging, but it does not cause a kdump failure. HPE continues to investigate the issue. 4.11.4 PCIM Documentation ------------------------------------------------------------------------------ HPCM-1438 The documentation provided in the PCIM package at /opt/cmu/pcim/docs/Apollo_System_Manager.pdf calls out the Apollo System Management tool. Updates to the documentation are planned for a future release. 4.11.5 kdump fails/panics on init ------------------------------------------------------------------------------ HPCM-3746 Driver issues can sometimes cause kdumps to fail. After initiating a dump with 'echo c > /proc/sysrq-trigger', all nodes may eventually panic/stop during init. The error message most often displayed will be similar to the following: BUG: unable to handle page fault for address: ffffffffffffffc8 Additional drivers may be needed in order for kdump to work on specific hardware configurations. 
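The following is a minimal sketch of adding an extra driver to the kdump
initramfs on a RHEL-style system, using the 'dracut_args' directive in
/etc/kdump.conf; the megaraid_sas driver name is only an illustration and
must be replaced with the driver actually required by the hardware:

  # echo 'dracut_args --add-drivers "megaraid_sas"' >> /etc/kdump.conf
  # systemctl restart kdump

Restarting the kdump service rebuilds the kdump initramfs so that it includes
the additional driver. On SLES-based nodes, the equivalent settings live in
/etc/sysconfig/kdump; consult the distribution's kdump documentation for the
exact variables.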
4.11.6 cpasswd command reports python deprecated warning
------------------------------------------------------------------------------
HPCM-6177
On some systems, the cpasswd command may display the following warning:

  # cpasswd -N node002
  /opt/sgi/sbin/cpasswd:21: DeprecationWarning: 'crypt' is deprecated and
  slated for removal in Python 3.13
    import crypt, getopt, getpass, logging, os, random, re, string, sys, traceback
  Enter new password:
  Enter new password (again):

The warning does not impact functionality and may be ignored. HPE will look
into addressing the warning in a future release.

4.11.7 iscsi errors when an image has not been activated
------------------------------------------------------------------------------
HPCM-6116
When using the iscsi diskless function for compute nodes, messages like the
following may appear in /var/log/messages on an admin or leader node:

  2024-02-20T09:35:44.390138-06:00 leader3 kernel: [1202340.564016][T463167]
  Unable to locate Target IQN:
  iqn.1995-03.com.hpe.cm:windom-sles15sp5-ssdkms.squashfs in Storage Node
  2024-02-20T09:35:44.390150-06:00 leader3 kernel: [1202340.575256][T463167]
  iSCSI Login negotiation failed.

These messages are generated when a node is attempting to boot an image that
has not been activated. To track down the node that may be causing this,
check for nodes assigned to the image. For example, the following command:

  # cm node show --image | grep windom-sles15sp5-ssdkms

will list the nodes with the windom-sles15sp5-ssdkms image. Find one that is
attempting to boot and check the console log for messages such as:

  ro-root-tmpfs-overlay: ISCSI - LOGIN: iscsiadm --mode node --portal
  172.23.255.241 --targetname
  iqn.1995-03.com.hpe.cm:windom-sles15sp5-ssdkms.squashfs --login
  Logging in to [iface: default, target:
  iqn.1995-03.com.hpe.cm:windom-sles15sp5-ssdkms.squashfs, portal:
  172.23.255.241,3260]
  iscsiadm: Could not login to [iface: default, target:
  iqn.1995-03.com.hpe.cm:windom-sles15sp5-ssdkms.squashfs, portal:
  172.23.255.241,3260].
  iscsiadm: initiator reported error (8 - connection timed out)
  iscsiadm: Could not log into all portals
  ro-root-tmpfs-overlay: iscsiadm login error.

In the example below, the first 'ls' command confirms that the image exists
on the admin node, while the failure of the second 'ls' command shows that no
squashfs exists for it.

  # ls -ld /opt/clmgr/image/images/windom-sles15sp5-ssdkms ; ls /opt/clmgr/image/image_objects/windom-sles15sp5-ssdkms.squashfs
  drwxr-xr-x 22 root root 276 Feb 20 09:51 /opt/clmgr/image/images/windom-sles15sp5-ssdkms
  ls: cannot access '/opt/clmgr/image/image_objects/windom-sles15sp5-ssdkms.squashfs': No such file or directory

To fix the issue, use the 'cm image activate' command to activate the image.

4.11.8 systemd-sysv-generator messages generated in console
------------------------------------------------------------------------------
HPCM-6253
In some distributions, users may see messages similar to the following on the
console and in console logs:

  SysV service 'X' lacks a native systemd unit file. Automatically generating
  a unit file for compatibility. Please update package to include a native
  systemd unit file, in order to make it more safe and robust.

These messages are safe to ignore. HPE will investigate these messages for
possible resolution in a future release.
4.11.9 Unable to create RHEL9.x images in Q-HA admin virtual machine
------------------------------------------------------------------------------
HPCM-6314
HPE has confirmed a customer report regarding an inability to create RHEL 9.x
images on the Quorum-HA admin virtual machine. This failure is caused by
missing CPU features in the /adminvm/adminvm.xml file on the physical admin
nodes. To work around the problem, additional features must be added to the
/adminvm/adminvm.xml configuration file on the Q-HA physical admin nodes. The
cpu mode section needs to be updated to include the last four (4) feature
lines (denoted with "+" in the example below):

     <model>qemu64</model>
  +  <feature policy='require' name='...'/>
  +  <feature policy='require' name='...'/>
  +  <feature policy='require' name='...'/>
  +  <feature policy='require' name='...'/>

The virt HA resource will need to be stopped and started again on the Q-HA
node, using either pcs or crm, for the change to take effect. Doing that will
reboot/restart the admin node.

4.12 Ubuntu
------------------------------------------------------------------------------
HPCM 1.11 provides improved support for Ubuntu Server on the x86_64
architecture only. Please note the following limitations:

- Upgrading HPCM 1.9 Ubuntu images is not supported
  Because Ubuntu was supported on the HPCM 1.9 release as a technical
  preview, upgrading existing Ubuntu HPCM 1.9 images is not officially
  supported. HPE recommends creating new HPCM 1.11 Ubuntu images for
  service/compute nodes. See the "Upgrading Ubuntu compute nodes" section in
  the HPCM 1.11 Upgrade Guide for more information on upgrading HPCM 1.10
  Ubuntu compute nodes.

- Supported on service/compute nodes only
  Ubuntu is only supported on service/compute nodes, not on management nodes
  such as admin nodes and su leaders. HPCM support is limited to long-term
  support (LTS) releases.

- Requires Ubuntu support for the platform
  HPCM support for Ubuntu on any given compute platform is contingent on the
  platform itself being supported on Ubuntu (see the HPE Servers Support & OS
  Certification Matrix for Ubuntu for details):
  https://techlibrary.hpe.com/us/en/enterprise/servers/supportmatrix/ubuntu.aspx

- Other HPE software products may not support Ubuntu
  HPE software products such as HPE Slingshot, HPE Cray Programming
  Environment and HPE Message Passing Interface (MPI) may not provide support
  on Ubuntu releases.

- Some other vendors may also support Ubuntu
  Other vendors of note that provide Ubuntu support include NVIDIA (drivers,
  HPC SDK, CUDA), Mellanox Infiniband, Intel and AMD.

- Ubuntu is an internet-based distribution
  While Red Hat Enterprise Linux and SUSE Linux Enterprise Server provide
  installation ISOs containing many packages, the Ubuntu Server ISO image has
  very few packages, most of which are pre-installed in a squashfs image.
  Ubuntu expects packages to be retrievable, either from the internet or via
  local mirrors. HPCM has attempted to make a basic compute node installable
  without internet access by including the required Ubuntu packages on the
  HPCM repository setup ISO (e.g., cm-1.10.0*.iso) itself. Customers wishing
  to further customize Ubuntu compute images will need to either mirror
  Ubuntu repositories on the admin node or provide internet access to remote
  repositories.

4.12.1 No Support for AIOps
------------------------------------------------------------------------------
The AIOps features are not supported with Ubuntu.

4.12.2 named package not installed by default
------------------------------------------------------------------------------
HPCM-3020, HPCM-3182
Ubuntu does not install named by default, so it is not included in the
default HPCM Ubuntu compute node images.
On compute nodes, named support is only used when secondary/backup DNS
support is needed and administrators do not wish to use leader nodes for this
task.

4.12.3 nscd package not installed by default
------------------------------------------------------------------------------
HPCM-3018
The nscd package is provided in the Universe (community-supported)
repository, so it is not included in HPCM Ubuntu images by default. If
desired, it can be added to images.

4.12.4 FIPS support for Compute Nodes running Ubuntu
------------------------------------------------------------------------------
HPCM-2398
To enable FIPS on Ubuntu, refer to the available Ubuntu documentation:

  ubuntu.com/security/certifications/docs/fips-enablement

Please note that switching the system to the FIPS-certified packages cannot
be easily undone. HPE recommends experimenting on a test system before trying
this on production systems.

4.12.5 HPCM packages for Ubuntu are not signed
------------------------------------------------------------------------------
HPCM-3312
Unlike the HPCM packages built for RPM-based distributions, which are all
signed with HPE digital keys, the HPCM packages built for use on Ubuntu are
not signed. As with the Ubuntu release itself, the repository release file is
signed. The HPE digital keys are installed by the hpe-build-key package.

4.12.6 cm image 'rpmlist' option not supported
------------------------------------------------------------------------------
HPCM-2085, HPCM-3706, HPCM-3179, HPCM-3945
The 'rpmlist' option to the 'cm image' command is not supported with Ubuntu
images at this time. Attempts to use the command with Ubuntu images will fail
as follows:

  system:~ # cm image rpmlist -i ubuntu2204-x86_64 -W /root/ubuntu_rpmlist_version.txt --rpm-version
  error: ubuntu2204-x86_64 is an Ubuntu target which is not rpm based.

HPE may address this in a future release.

4.12.7 NFS image provisioning with overlay on compute node fails
------------------------------------------------------------------------------
HPCM-5039
Any nodes assigned an Ubuntu-based OS image and an expanded overlay writable
type NFS rootfs (either tmpfs-overlay or nfs-overlay) will fail to boot and
fall to a miniroot shell. HPE recommends using an 'overmount' NFS writable
type, or using an image object by 'activating' the image.

The nodes can be set to use nfs overmount with the following command, where
NODES is the node expression matching the nodes to set to the new writable
type:

  # cm node set --rootfs nfs --writable nfs-overmount -n NODES

Or, for a tmpfs r/w area with the overmount type:

  # cm node set --rootfs nfs --writable tmpfs-overmount -n NODES

Or activate the image using the following command, where IMAGE is the name of
the image to activate:

  # cm image activate -i IMAGE

******************************************************************************
5.0 Feedback
******************************************************************************

Hewlett Packard Enterprise is committed to providing documentation that meets
your needs. To help us improve the documentation, send any errors,
suggestions, or comments to Documentation Feedback (docsfeedback@hpe.com).
When submitting your feedback, include the document title, part number (if
applicable), edition, and publication date located on the front cover of the
document. For online help content, include the product name, product version,
help edition, and publication date located on the legal notices page.
******************************************************************************
6.0 Appendix
******************************************************************************

6.1 Notes on Using Unsupported or Unmanaged Network Switches with HPCM
------------------------------------------------------------------------------
HPE Performance Cluster Manager 1.11 supports the following switches:

- LG/Edgecore ECS4610-26T/ECS4610-50T: 1.4.2.25 (Final)
- Extreme X440/X460/X670: 16.2.5.4
- Extreme X440-G2/X460-G2/X670-G2: 22.7.2.4
- HPE FlexNetwork 5510: 7.1.070 Release 3507P18-US/3507P09
- HPE FlexFabric 5710: 7.1.070 Release 6710P03/6710P03-US
- HPE FlexFabric 5900/5920: 7.1.045 Release 2432P61-US/2432P61
- HPE FlexFabric 5940 48SFP+/6QSFP+ or 48XGT/6QSFP+: 7.1.070 Release 2612P10-US/2612P10
- All other FlexFabric 5940 Models: 7.1.070 Release 6710P01-US/6710P03
- HPE FlexFabric 5945: 7.1.070 Release 6710P03/6710P03-US
- HPE FlexFabric 5950: 7.1.070 Release 6301/6301-US
- Arista DCS-7010T-48: 4.21.6F
- Aruba 6300M, 8320, 8325, 8360: FL/GL/LL/CL 10.12.1021

Some advantages to using supported Ethernet switches are:

- You can use cluster manager tools, such as switchconfig, to manage the
  switch.

- The cluster manager configures the supported switches with settings that
  segregate cluster management traffic from application data traffic and
  settings that support efficient transfer of operating system images.

- With a supported switch, a command exists that allows users to configure
  various settings automatically for either management switches, compute or
  leader nodes:

    # switchconfig_configure_node --node NODENAME [--dry-run]

  Using "--dry-run" allows you to see what commands would be run before
  actually running them, which is the safer option when running this command.

If you use unsupported switches, you need to use the switches' own commands
to complete some configuration steps manually. Unsupported switches are
included in the cluster as unmanaged switches. For these switches, the
cluster manager does not attempt to automatically configure any switch
settings. The following procedure explains how to configure an unsupported
switch into a cluster.

Configuring a cluster that uses an unmanaged switch:

1. Complete the installation instructions according to the HPE Performance
   Cluster Manager Installation Guide, but stop before you run the discover
   command.

2. Enter the following command to preserve the settings on the unsupported
   management switches:

     # cadmin --enable-discover-skip-switchconfig

   This command prevents the cluster manager from logging into management
   switches at a global level, which allows you to configure the unsupported
   switches later in the installation.

3. Configure the switches for multicast or configure the cluster manager to
   use unicast. This step ensures that the leader and compute nodes receive
   their images from the admin node in an efficient manner. Do one of the
   following:

   - Verify whether the unsupported switch is configured for "IGMP" and
     "IGMP Snooping", and configure those two settings if they are not in
     effect at this time. The cluster manager uses a multicast protocol
     called udpcast to image leader and compute nodes during the boot
     process. For multicast to be successful, the management switches must
     support IGMP and IGMP Snooping. For information, see the switch
     configuration documentation.

   Or

   - Configure the cluster manager to use Rsync or BitTorrent when it images
     the compute nodes. Rsync and BitTorrent are not multicast methods and
     instead use unicast.
For information about how to change the method by which the leader and compute nodes receive images, see the HPE Performance Cluster Manager Installation Guide. 4. Complete the rest of the installation procedure, beginning with running the discover command, to configure the rest of the components into the cluster manager. The discover command configures supported switches and all other components to be under cluster manager control. Because you ran the "cadmin enable-discover-skip-switchconfig" command before you ran discover, the discover command allows DHCP to assign supported switches an IP address so that you can SSH or Telnet to the supported switches if necessary. 5. (Optional) Enable DHCP on the switch. See the documentation for the unsupported switch for information. DHCP enables the cluster manager to assign an IP address to the switch. You might need to enable either Telnet or SSH, and then create a remote username and strong password in order to manage these switches remotely. 6. (Optional) Enable independent management of the unsupported switch by completing one or both of the following tasks: - Enable either Telnet or SSH and then create a remote username and strong password for the switch. These credentials enable you to manage the switch remotely. - Enable DHCP on the switch. DHCP enables the cluster manager to assign an IP address to the switch. For more information, see the switch documentation. 6.2 Supported Power Distribution Units (PDUs) ------------------------------------------------------------------------------ A power distribution unit (PDU) reads AC power and energy measurements on cluster rack-level power domains. For the AC power measurement feature to function, the cluster must have one or more of the following PDUs: - Server Technology Sentry3 - Server Technology Sentry4 - 880459-B21 (Raritan) HPE Mtrd 3P 39.9kVA/60A 48A/277V FIO PDU - PX-5946V-F5V2 (Raritan) HPE Mtrd 3P 17.3kVA/48A 9brkr PDU - P9R82A HPE G2 Metered 3Ph 17.3kVA/60309 4-wire 48A/208V - P9R84A HPE G2 Metered 3Ph 22kVA/60309 5-wire 32A/230V For more details on power management, see the HPE Performance Cluster Manager Power Consumption Management Guide. 6.3 HPE Power and Cooling Infrastructure Monitor (PCIM) Supported Devices ------------------------------------------------------------------------------ The HPE Power and Cooling Infrastructure Monitor provides insight into the state of the hardware related to the power and water-cooling components of an HPE water-cooled solution. Supported devices include the following: - HPE Apollo 9000 CDU (Cooling Distribution Unit) - HPE Apollo 9000 Chassis (Power Supplies and Switches) - HPE Cray EX CDU (1.2 MW and 1.6 MW) - Apollo DLC Passive CDU (for A2k and A6500 clusters) - HPE SGI 8600 CDU - ARCS (Adaptive Rack Cooling System) - SGI 8600 CRC (Cooling Rack Controller) - Motivair RDHX (Rear Door Heat Exchanger) - Raritan PDUs (Power Distribution Unit) - HPE Cray EX VCDU (Virtual Cooling Distribution Unit) - HPE PDUs - ServerTech Cray ClusterStor Switch 63A 400V PDU (R4M34A) - ServerTech Cray ClusterStor Switch 60A 415V PDU (R4M35A) 6.4 HPCM Update Repository Guide ------------------------------------------------------------------------------ HPCM update repositories are hosted on the HPE Software Delivery Repository (SDR). Patches for HPCM releases (available for both x86_64 and aarch64 architectures) are available on the SDR. 
6.4.1 Accessing the HPCM Update Repository the First Time
------------------------------------------------------------------------------
To access the HPCM updates on the SDR, you must have the following items:

- HPE Account
- HPE Support Center User Token
- HPCM Service Agreement ID (SAID)
- Support Account Reference (SAR) entitlements

The SAR entitlements must be linked to both an HPE Account and to an HPE
Support Center User Token to gain access. In some cases, multiple HPE
Accounts can be linked to the same SAR entitlements to allow multiple users
access.

6.4.1.1 Creating an HPE Account
------------------------------------------------------------------------------
(1) Go to "Create a new account" and enter all required information.

(2) Select the "Provide additional business contact information" option at
    the bottom of the page, and enter your business contact information.

    ** You must enter a value for the "Company name." Not completing this
    ** field will prevent you from logging in to the HPCM repository on the
    ** SDR.

(3) Select "Create account."

6.4.1.2 Linking entitlements to an HPE Account
------------------------------------------------------------------------------
(1) Go to the HPE Support Center at https://support.hpe.com/

(2) Select the "Preferences" icon, and then select "Sign in."

(3) Enter the user ID and password for your HPE Account, and select
    "Sign In."

(4) In the "Toolkit Library", select "My Contracts."

(5) Select "Add a Support Agreement," and enter your Service Agreement ID
    (SAID) and your Support Account Reference (SAR) in the required fields.

(6) Choose "My Group (Private)" as the "Group."

(7) Select "Next" and verify your "Contract ID/SN" information.

6.4.1.3 Logging in to the HPCM Repository on the SDR
------------------------------------------------------------------------------
* The examples below use the HPCM 1.9 release. Adjust version numbers
  accordingly for your HPCM release.

(1) Create an HPE Support Center User Token at
    https://hpsc-pro-site1-hpp.austin.hpe.com/hpsc/swd/entitlement-token-service/generate

    A token will automatically be created after you log in with your HPE
    Account. However, you must wait an hour for the token to be activated
    before using it.

    You can save this token to use for all future SDR logins, or you can
    create a new token for each login. However, you must wait an hour for
    each new token to be activated. Creating multiple tokens does not nullify
    any of the previously created tokens. All tokens are active and valid for
    authentication.

(2) Log in to the HPCM Repository on the SDR at
    https://update1.linux.hpe.com/repo/hpcm/

    Enter your HPE Account username (email address) in "Username," and enter
    your HPE Support Center User Token in "Password."

    Contact a support representative if you cannot log in to the HPCM
    repository after waiting an hour for a new HPE Support Center User Token
    to be activated.
Upon a successful login, you should see a directory listing similar to the following: Index of /repo/hpcm * Parent Directory * centos/ * rhel/ * rocky/ * sles/ HPCM updates for releases supported on the SLES15 SP4 operating system on the x86_64 architecture, for example, are available at the following location: https://update1.linux.hpe.com/repo/hpcm/sles/15sp4/x86_64/ Index of /repo/hpcm/sles/15sp4/x86_64 * Parent Directory * 1.9.0/ * 1.10.0/ Further selecting the HPCM 1.9.0 release directory will display the updates applicable to the HPCM 1.9 release on SLES15 SP4: https://update1.linux.hpe.com/repo/hpcm/sles/15sp4/x86_64/1.9.0/ Index of /repo/hpcm/sles/15sp4/x86_64/1.9.0 * Parent Directory * 11778/ * 11779/ * repodata/ 6.4.2 Mirroring an HPCM Repository on the SDR ------------------------------------------------------------------------------ It is also possible to mirror an HPCM update repository to a local system with a simple shell script. For example, the following is a shell script to mirror the HPCM 1.9.0 update repository on a local system: #!/bin/sh umask 022 USERNAME="" PASSWORD="" BASEURL="update1.linux.hpe.com/repo/hpcm/" cd / wget --no-parent -nH -r -c --cut-dirs=1 --auth-no-challenge \ --user=${USERNAME} --password=${PASSWORD} \ https://${BASEURL}/sles/15sp4/x86_64/1.9.0/ Tailor the above script example to meet any site specific requirements. 6.5 List of CASTs Addressed in HPCM 1.11.0 ------------------------------------------------------------------------------ The following CASTs were closed out with the HPCM 1.11 release: CAST-32447 HPCM SIM shows 2 (of 9) switches down, start as up on reboot but show down over time CAST-32556 Problems when using hpemon and trying to get the Sec running on the leaders due to site restrictions on ssh between cNs CAST-32621 [RFE] Add the DNS search path to cminfo via a new cm-configuration script CAST-32639 HPCM 1.6 - cm health alert configuration assistance needed CAST-33302 1.8/1.9: /etc/prometheus/snmp.yml is missing auth: community: default-community for the flexnetwork_mib CAST-33681 [RFE] HPCM provided gpu_sizzle should be statically compiled if possible (highly preferred) CAST-33772 Package conflict during admin node upgrade (HPCM 1.8 / rhel 8.6) CAST-33795 Kafka Connect: ERROR: index row size 3176 exceeds btree version 4 maximum 2704 for index "label_pkey" CAST-33987 If admin kernel version = image kernel version, cannot cm dnf remove the kernel from the image CAST-34052 cm command line should not allow underscore in network names - breaks named CAST-34626 RHEL: mariadb error when cloning slot on admin CAST-34978 Gluster NFS allows unprivileged mounts 6.6 List of Issues Addressed in HPCM 1.11.0 ------------------------------------------------------------------------------ Incident numbers from HPE's tracking system are provided for reference: HPCM-219 HPCM: cm image yum/dnf/zypper node/image: group/pattern install needs examples for installing with group names containing spaces HPCM-289 checkDbReady doesn't work, only returns true, should have a proper, global implementation to check if the database is operational HPCM-425 script to rebuild kafka cluster HPCM-485 conserver needs to support sending a break signal on Cray hardware HPCM-538 Enable JMX metrics to be gathered from cmdb (Java Grizzly webserver) HPCM-547 Slingshot Telemetry- No way to activate the Inactive configurations: HPCM-824 AMD GPU monitoring needs support for alerts HPCM-1054 Add ability to set Native monitoring SEC priority via the cm-cli HPCM-1114 Timescale 
monitoringdb Postgres Users and Schema HPCM-1232 Take all logs from /opt/clmgr/log to Elastic HPCM-1320 Q-HA: setup fails to ask for independent BMC network when using interactive mode HPCM-1680 add SUDO_USER to cm.log HPCM-1765 Add Flashing support for AMI MegaRAC based Gen 11 aka Cray XD computes HPCM-1907 Q-HA, RHEL 8.6, ifconfig/ipaddr deprecated or removed from distro HPCM-2012 gluster-exporter causes "gluster volume status" to continuously say that locking failed HPCM-2319 Switch default ICE diags in rpmlists to xe diags for admin/default HPCM-2427 HPCG for AMD Gpus HPCM-2428 HPCG for nvidia gpus HPCM-2500 Slingshot Reporting - Report ports with ber and tx/rx pause HPCM-2589 Can't upgrade iLO firmware via cfirmware HPCM-2650 Grafana dashboard - Top level view of link and switch state HPCM-2749 /opt/clmgr/lib/cluster-configuration command prints one error message HPCM-2900 HPCM1.8: SLES15SP4: glusterd seems to get stuck and not launch all brick processes so mounts get stuck after su-leader reboot. ctdb status remains DISCONNECTED for some leaders HPCM-2957 XD6500/XD670 B1 SPR Support on HPCM: Compute/Service HPCM-3213 pdu-collect: Add PCIM metrics HPCM-3573 Put rhel8 monitoring rpms back to Recommends HPCM-3590 conserver is wide open access-wise including from compute HPCM-3780 SLES15 SP4: Booting q-ha cloned slot results in unknown network interface, PNN disabled HPCM-3886 Rework log writing to Kafka HPCM-3916 Native monitoring fails on computes with a dedicated custom user HPCM-3935 Upgrade sdu components HPCM-3999 cm-power-service: new node route. (like controller/chassis) HPCM-4053 Improve NativeProcessExecutor with Java 9+ ProcessHandle API HPCM-4073 Online diags EX255a support HPCM-4075 CHC Add AMD MI300 tools HPCM-4080 Enhance error handling in SS health reporting HPCM-4095 Report ports with UCW and llr_replay errors HPCM-4096 Report ports which are not configured HPCM-4124 HPCM CrayEX Hardware Dashboard Incorrect HPCM-4138 Native monitoring can't start due to various file permission errors HPCM-4139 If MONITORING_SECMD_PRIORITY is set, it is not honoured HPCM-4209 Enhance CrayEx HW alerts (CEC/BMC) as per customer requirements HPCM-4223 Add etcd to HPCM needed for system power capping support HPCM-4241 System Power Capping support in HPCM HPCM-4243 Provide CM Node Power cap for System Power Capping HPCM-4244 Provide an inventory interface for System Power Capping HPCM-4268 Integrate HPE Cluster View Dashboard Automation (mPhasis) w/ HPCM HPCM-4291 Productize and generalize power map and integrate it into the cli HPCM-4300 [RFE] Add the DNS search path to cminfo via a new cm-configuration script HPCM-4303 Add support for EX235n to the Hardware Triage Tool HPCM-4307 Add support for EX425 to the Hardware Triage Tool HPCM-4308 Add support for Parry Peak to the Hardware Triage Tool HPCM-4337 Performance tools - Shibuya Stream testing all xgmi links between sockets HPCM-4353 asyncssh: ssh_cmd replacement for tnet_ssh HPCM-4362 Improve cm monitoring ss config command for SS 2.2 HPCM-4377 Opensearch Grafana Dashboards giving Unexpected Error HPCM-4379 cm image show -d should show complete image size HPCM-4419 Build Diagnostics using EX254n blade in snowdon HPCM-4427 Collect and Test PDU Metrics HPCM-4430 persist logs of /opt/clmgr/log from admin/leaders HPCM-4431 HPCM 1.10: cm monitoring timescaledb show requires a default behavior when no option is specified HPCM-4434 HPCM 1.10: cm monitoring timescaledb retention does not update the retention period HPCM-4444 hpe-python: Add 
aiofiles HPCM-4447 Run all arm supported diagnostics on EX254n node HPCM-4449 EX255a: Upgrade AGT binary HPCM-4452 EX255a: Check rectifier status and telemetry for issues, Make sure they are running and balanced HPCM-4458 Check APUs are meeting minimum performance  HPCM-4459 Check APUs are meeting minimum HBM Performance HPCM-4460 Check NICs are meeting minimum performance and bandwidth HPCM-4462 Upgrade Telegraf to Latest Version in HPCM HPCM-4472 EX255a- Add wrapper for mpiBench,presta and sqmr to Online diags HPCM-4526 jobmonitor.conf: Add rest_server_Ip option. HPCM-4554 Configure etcd for system power capping support HPCM-4599 Add cli tool to interface with clmgr-power REST API HPCM-4601 chassis_routes: Added Perif ops HPCM-4602 healthcheck and fix for bad kafka topics HPCM-4606 SIM dashboard are not enabled after upgrade HPCM-4624 Triage tool kit should accept log folder as input and analyze the hardware failure HPCM-4633 Alerting rule reference file needs more comments and schema file needs doc string HPCM-4638 Make updating head/head-bmc networks easier in configure-cluster HPCM-4650 Change names for input.yml and input_on.yml for better understanding HPCM-4656 Change the on and off flow as part of revised workflow HPCM-4657 Collect logs and serial information of all types of node whether supported or not HPCM-4675 optimize alerting spec file HPCM-4741 Make changes to the WLM dashboards and Logstash files to accommodate remlog-collect to Telegraf transition for wlm (slurm/pbs) monitoring HPCM-4759 Support chassis types in cm controller, chassis types have no mechanism to update credentials from the CLI HPCM-4799 Upgrade procedure for quorum ha physical nodes HPCM-4820 Need cfirmware solution for XD6500/XD665 M4 Genoa HPCM-4851 cm node template show '--bmc-info' flag should be '--credentials' HPCM-4866 Segregate wlm monitoring from cluster-health HPCM-4882 Increase task.shutdown.graceful.timeout.ms HPCM-4894 clmgr-power REST API: add BMC type HPCM-4904 Slingshot Health Reporting and Alerts - Phase 3 HPCM-4917 HPCM1.10: Grafana alert dashboard to include a link to the Alertmanager page HPCM-4967 RHEL/ROCKY 9.3 Support (Compute/Service nodes only) HPCM-4968 RHEL/ROCKY 8.9 Support (includes TOSS 4.7) HPCM-4969 hpe python include dbus_next library HPCM-4975 HPCM fabmgmt: minor dialog menu improvement HPCM-4990 nfs-ganesha has bad systemd unit file fro nfs-ganesha-lock.service HPCM-5019 HPCM1.11/aarch64/apollo70/apollo80: Does not boot with experimental- grub-cm-arm64.efi HPCM-5022 HPCM1.10: cm image set command displayed unwanted output of associated repo group HPCM-5033 WLM Telemetry Fails to Write to Timescale HPCM-5044 support 'slingshot' network types for hsn dns entries HPCM-5061 HPCM1.10:Cluster health-Verify AMD GPU dgemm and stream test failed. 
HPCM-5077 SAC-HA: SLES15 SP5: 30-virt-setup deprecated commands HPCM-5092 Minor edit for cluster-configfile manual HPCM-5093 Add option to assign controllers to computes in cm_create_fake_ configfile HPCM-5114 Add nvidia HPL for EX254n HPCM-5122 Hardware triage tool: validate 'On' flow HPCM-5124 EX254n: validate 'On' flow HPCM-5125 EX4252: validate 'On' flow HPCM-5126 EX255a: check node health HPCM-5128 EX4252: check node health HPCM-5141 Add EX254n diagnostics HPCM-5151 ROADMAP: XD224 Support HPCM-5152 HPCM Slingshot hardware dashboards: Correct few issues HPCM-5176 EX255a cbios support HPCM-5181 Remove --noplugins flag from being added to cinstallman rpmmgr image yume commands HPCM-5186 New async_apis rpm HPCM-5190 HPCM1.10: Netchk reports inaccurate errors in the log files of EX254n/EX255a nodes. HPCM-5191 HPCM1.10:ARM64: (memchk)Memory size and DIMM speed are not reported in EX254n nodes. HPCM-5197 HPCM doc work for COS refactoring to CNE/COS-base HPCM-5202 Permissions on files under HOSTs can vary HPCM-5210 HPCM1.10: Obtaining the CPU of a NUMA Ppin results in an error output for CN's. HPCM-5212 Identify different BERT and MCE Errors and Repair action HPCM-5213 cmutils (twisted/async) removed errand run_cmu_expand HPCM-5221 HPCM Unified Alerting - Phase 2 HPCM-5225 Need the Python Library requests-toolbelt 1.0.0 added HPCM-5226 GUI is throwing and freezing when a metric max value is set to 0 HPCM-5232 Support NVME disks in diskchk, diskperf and fsperf HPCM-5233 Add 'loop' and 'fabric' parameters to cpuperf, cwcpuperf and fabricperf HPCM-5235 Add OpenBMC default creds to power service credential detect HPCM-5237 Chassis System Group regression from HPCM-5224 HPCM-5242 HPCM1.10: SIM: logstash-exporter messages continue flooding in /var/log/messages after adding monitoring-services group in SIM HPCM-5246 conserver reload issue on large cluster HPCM-5247 system monitoring (cn): timescaledb not showing data on Grafana Dashboard HPCM-5249 Set up DNS server correctly for cray-sdu-rda container for HPCM HPCM-5251 cm-network-show man page has incorrect flags for controllers and configfile HPCM-5254 HPCM1.10-slurm/jobmonitor/grafana - dashboards have incorrect or missing partition information HPCM-5260 change Stuck_in_bios_boot logic for EX4252 HPCM-5261 update Check_PCIe_Missing for EX4252 HPCM-5264 power fault issue: empty PWR_STS_CAP HPCM-5265 blade latch test HPCM-5266 pmbus decoder: Repair actions should be called for bit 6 HPCM-5268 change log path to /var/log HPCM-5269 Identify DIMM failures HPCM-5270 SIVOC failed to power up the 48V HPCM-5277 Routing and unrouting alerts to kafka and opensearch HPCM-5282 Add repair action for RAS poison error HPCM-5294 confluence page to map OS versions to OFED, CUDA and ROCM versions and Download links HPCM-5297 asyncio_cmdb add to_thread for io blocking functions HPCM-5300 Identify os kernel panic HPCM-5301 change Unexpected_Booted pattern from 'warm reset' to reset HPCM-5302 EX4252: Check_PCIe_Missing HPCM-5303 EX4252: Unable_to_apply_bios (validation) HPCM-5304 EX4252: Stuck in UEFI shell HPCM-5306 Use async-apis/asyncio_cmdb instead of tlib/asyncio_cmdb_utils HPCM-5309 Remove IB mention in cluster-configfile man page HPCM-5312 IPID decoder for bardpeak HPCM-5314 Identify Missing NMC HPCM-5320 cm_util clustershell group processing not working correctly HPCM-5325 Routing and unrouting alerts to Slack HPCM-5326 enabling all rules should be re-worked when there is a validation failure in the middle. 
HPCM-5327 Alertmanager email routing: Separate notification for warning alerts to diff recipients HPCM-5328 DOC: Support the corelated alert rule configuration for opensearch alerting. HPCM-5329 cm health alertman: csv / json/ text dump of alerts HPCM-5330 Convert the SS Cassini error from elastalert to new Unified Alerting Infra HPCM-5331 Provide a framework for timescale alerting HPCM-5335 ClusterShell.CmUtil user and ssh options fields not working HPCM-5337 aiclientsession: Add more kwargs filters HPCM-5345 Remove agt & AMDXIO from stout728 HPCM-5346 HPCM 1.10: Replace underscore with dash in hwtriage options to conform to other GNU CLI options. HPCM-5349 Serial Numbers information file is getting printed while using log_path HPCM-5351 Increase Prometheus scrape interval HPCM-5352 /etc/prometheus/snmp.yml is missing auth: community: default- community for the flexnetwork_mib HPCM-5354 DOC: HPCM 1.10: Provide a method to disable the opensearch retention policy HPCM-5356 HPCM1.10: When heartbeat elk indices are generated, the node down/up Alert Rules Status is not updated HPCM-5357 HPCM 1.10: cm monitoring kafka status reports that aarch64 is not supported even though kafka topics report entries from ARM nodes. HPCM-5361 Block zookeeper startup when slots don't match HPCM-5362 Fix cluster.id reset for confluent-kafka service HPCM-5364 Remove push_key.py from clmgr-power HPCM-5371 Upgrading systemimager-server on su-leaders can hang or fail making upgrade difficult HPCM-5374 Handle WNC for supported cardname HPCM-5375 Copy /tmp/miniroot-mgmt-network-device to /opt/clmgr/etc to handle upgrades HPCM-5378 HPCG-local: If job fails on one node due to UME slurm kills off on all other node HPCM-5379 Procedure to upgrade an ubuntu compute image and node HPCM-5380 HPCM 1.10: cm aiops enable should remove dependency on alerta HPCM-5382 cfirmware: ModuleNotFoundError - 'requests_toolbelt' HPCM-5383 Add library dependency for cfirmware to the sgi-talib.spec.in HPCM-5387 If admin kernel version = image kernel version, cannot cm dnf remove the kernel from the image HPCM-5388 Handle timeout Error HPCM-5389 Show MCE errors to console HPCM-5390 run on branch after off branch HPCM-5391 Support adding list of bios versions in hardware.yml file HPCM-5392 triage_output.json contains extra colon HPCM-5393 Ubuntu image update from HPCM 1.9 -> HPCM 1.10 skips updating sgi-service-node and sgi-csn HPCM-5394 Ubuntu upgrade: sgi-csn displays error, postinst script needs to handle abort case HPCM-5395 cinstallman --update-image with apt does not check rpmmgrImage return code properly. HPCM-5397 Ubuntu upgrade: refresh-image fails on installing cmdb-rest-lib, conflicts with cm-rest-lib HPCM-5399 Generate epd file for EX255a HPCM-5403 ERROR entry messages found in *.s files HPCM-5404 Unit off should be identified only for SIVOC, 48V ECB, 48v-12V HPCM-5405 EX255a: Add repair actions HPCM-5408 Alerting enable validation should continue when there is a failure instead exiting HPCM-5409 Modify cm monitoring alerting status command output to include routing status HPCM-5416 asyncio_cmdb: Add map_keys to CmutilAsyncIO functions. 
HPCM-5418 Add support for reporting MBE in HPCM Slingshot reporting HPCM-5421 Update to 1.10 patch 11793 HPCM-5422 EX255a: Add repair actions for few registers HPCM-5424 Add new cm-power-services source code HPCM-5427 clientsession_kw: regression with duplicate timeout args HPCM-5428 Add CONSERVER_RW and CONSERVER_RO to configure-cluster, and cluster configuration file HPCM-5435 Validate EX425 On flow HPCM-5442 Add XD670 support to cfirmware HPCM-5445 HPCM changes to use SS 2.2 heartbeat feature HPCM-5449 Add wrapper to run rochpcg HPCM-5451 Add latest HTT to HPCM - patch HPCM-5452 EX254n: Add repair actions HPCM-5456 su-leader /etc/hosts should have other leader nodes listed HPCM-5457 update all scripts with short args as i/p HPCM-5458 check_accessibility.py: Specifications of name of Management switches HPCM-5459 Fabric inventory: permission denied for both right and wrong ip HPCM-5460 controllers such and the nC will have firmware in tar.gz format. Need to support ingesting this file type HPCM-5463 'DAC stall' Error HPCM-5466 Add babelstream binaries and script for EX255a HPCM-5467 Add transferbench binary for EX255a HPCM-5469 Remove rochpl & rochpcg build part and mhist the binary HPCM-5471 Catch invalid mac-addresses for active_gateway in switchconfig HPCM-5480 Remove ESM check for EX425 HPCM-5481 check_node IFS issue HPCM-5482 hardware-triage-tool: make the logpaths RFC 3339/ISO_8601 compliant HPCM-5485 timescale sink writing empty labels HPCM-5486 HPCM Slingshot Dashboards need update to improve performance HPCM-5488 SS 2.2: Alerting support for heartbeat: switch status HPCM-5489 discover_skip_switchconfig has a comma in configure-cluster preventing it from being set HPCM-5494 Add an option for just collecting serial numbers HPCM-5495 RFE: HPCM provided gpu_sizzle should be statically compiled if possible (highly preferred) HPCM-5496 DOCS: Monitoring Guide Updates Needed for Patch 11796 HPCM-5497 HPCM 1.11 Customer Reported RFEs HPCM-5498 HPCM 1.11 Non-roadmap Features and Improvements HPCM-5499 HPCM 1.11 Code Clean Up and Internal Facing Improvements HPCM-5500 HPCM 1.11 Upgrade external/mhisted components HPCM-5504 Add nvidia HPCG for EX254n HPCM-5505 cfirmware sc check|update|type not working on new slingshot blade switches HPCM-5508 pdu-collect: logging broken HPCM-5509 Fix Postgres pg_hba file to allow ipv6 connections HPCM-5510 IPv6_rpfilter=yes in firewalld.conf blocks IPv6 traffic on QHA VM. 
HPCM-5511 uboot not rebooting after update HPCM-5512 Add node-power cap service to build HPCM-5513 Update Oblex diagnostics for EX255a HPCM-5514 cfirmware cannot update Cassini with HPCM 1.10 HPCM-5515 node power cap: Fix patch return components list HPCM-5516 cfirmware nc checkall gives random results HPCM-5519 Add dgemm for A1 APUS for EX255a HPCM-5521 Change Hardware recipe names to external names HPCM-5522 HTT: Investigate how to address the confidentiality concerns HPCM-5525 ADMIN: Gluster vlumes are mounted multiple times over head and head-bmc HPCM-5528 Add miniHPL with crayMPI for EX255a HPCM-5529 Add 4 point compliance matrix text EX255a HPCM-5530 CheckNodeHealth failure throwing incomplete file path HPCM-5531 Alerting: cm health alertman shows incorrect error message when API status is down HPCM-5532 Alerting: cm monitoring alerting cmd should handle api errors gracefully (due to proxy issues) HPCM-5533 Provide (hpcm_pcs.go) for System Power Capping HPCM-5534 cfirmware HPCM 1.10 fails to update cc --recovery_image HPCM-5536 cm Command to show connector configs and set properties HPCM-5539 update shibhuya stream to run cpu-hbm HPCM-5540 move diags across perf, field and noship for patch HPCM-5541 Remove hpl EX254n with openmpi from diags HPCM-5543 stout728 sles11sp4 x86_64 Nov21-23 rpm-phase build failed in diags:cluster HPCM-5546 cm cli for wlm (SLURM/PBS) monitoring - enable/disable/status HPCM-5547 nvidia-gpu-xhpl PERCENT 5 throwing Out Of Memory error HPCM-5548 HPCM 1.10: PDU Monitoring grafana dashboard fails to load any data HPCM-5553 RHEL: mariadb error when cloning slot on admin HPCM-5554 run_4pt_screen.sh always fails on 2nd run HPCM-5555 Add power and bios support for XD6500/XD670 B1 SPR HPCM-5556 add monitoring support for XD6500/XD670 B1 SPR HPCM-5559 sgi-fabmgmt can conflict with opensource filelock python module HPCM-5560 Add power support for XD224 HPCM-5561 su-leader-setup --destroy fails with 1.10 patches installed HPCM-5564 Aruba Switch Firmware Refresh + Qualification (HPCM 1.11) HPCM-5565 HPE Switch Firmware Refresh (HPCM 1.11) HPCM-5573 Timescale helpers allow null or empty filters HPCM-5574 Add check_timeline to Patroni config HPCM-5582 EX254n diags failure due to recipe change HPCM-5583 Compile EX255a binaries with rocm6.0 HPCM-5584 PIP warnings during library installation HPCM-5585 HPCM 1.10: su-leaders missing dependency on iptables for ip failover event from ctdb HPCM-5586 sensormon: Add AMI support to redfish polling HPCM-5588 psycopg v3 python bindings HPCM-5589 Remove overlap between perf and noship diags HPCM-5590 Add new cm-power-services rpm HPCM-5592 q-ha physical admin node upgrade: sgi-admin-node %post scriptlet fails HPCM-5593 distro-rpm-lists should call crepo --recreate-rpmlists on upgrade HPCM-5594 30-set-dns returns error on q-ha HPCM-5596 Rename distro-rpm-lists to distro-pkg-lists HPCM-5597 Q-HA: Upgrade sles15sp4 HPCM 1.9 -> sles15sp5 HPCM 1.10 AdminVM fails to configure - unsupported configuration: chardev 'spicevmc' HPCM-5598 Remove no-shippable diags from perf aarch64 rpm HPCM-5599 Don't turn off services in clone-slot HPCM-5600 cm-cli localhost node option doesn't work when cmdb isn't running HPCM-5602 HPCM1.10 admin DNS_DOMAIN set to cluster instead of house HPCM-5603 Fix linpack and nvidia-gpu-xhpl on EX254n HPCM-5604 HTT fails with UnicodeDecodeError while checking MCE Errors HPCM-5606 shibuya stream random BW results APU to APU HPCM-5607 cm-power-service: Add sysd,config, install HPCM-5608 implment iscsi diskless root support 
HPCM-5609 SS dashboards enablement through cm monitoring slingshot CLI HPCM-5614 Fix Aruba switch hardware limit 15 vMACs / switch pair on Cray EX HPCM-5615 Add switchconfig find for OIDs for Aruba switches HPCM-5618 Improve the kafka notification policy in Grafana alerting HPCM-5620 Upgrade CPE on black to 23.12 HPCM-5621 Node-PowerCap: Bug with empty requests HPCM-5622 Q-HA upgrade from HPCM 1.9 -> 1.10 SLES15 SP5: HA - VM fails to be accessible outside of physical host HPCM-5623 cm node slot copy allow 'localhost' option to be used for cloning slots on admin node for q-ha HPCM-5624 iscsi provision image names not compatible with IQN convention HPCM-5626 cmcinventory - consider enabling iscsi by default for root HPCM-5627 iscsi provision - route configure-iscsi log through systemd for time stamps HPCM-5632 quorum-ha virtual admin with heavy connection count causes physical host to be fenced HPCM-5634 Alerting: CDU telemetry metrics HPCM-5636 Alerting: Leak Events - CDU/Cabinet/ HPCM-5638 rochpcg looks for intelmpi HPCM-5639 Remote support: Add memory event support for XD6500 HPCM-5640 Remote support: Add support for CPU events HPCM-5642 pdu-collect: Fix community string HPCM-5643 HPCM 1.10: cm health check fabricperf must support ib2 and above HPCM-5644 Additional fru inventory values for Intel (and fixes) HPCM-5645 missing file for kafka setup HPCM-5647 HPCM1.10 + patch11795 Babelstream diag -device option not working. HPCM-5650 system-power-capping hpcm inventory includes too many nodes HPCM-5655 run_dgemm_EX255a runs on each GPU serial needs to be parallel HPCM-5656 confluent-kafka needs to be masked when setup as proxied on admin HPCM-5662 pdu-collect: kafka_push is broken HPCM-5663 Create a monitoring support collection tool HPCM-5666 HPCM1.10: Elevate the quality of error handling in Slingshot health reporting. 
HPCM-5667 monitor scripts didn't account for leaders with no squashfs existing, check and egg issue for startup HPCM-5669 80-enable-sysrq is broken HPCM-5671 kdump crashkernel cmdline param not defined on virt or phys admins HPCM-5672 luks2 root disk encryption using TPM 2 admins, leaders, compute - gluster spaces not encrypted HPCM-5800 cm cli with luks2 root disk encryption using TPM 2 HPCM-5801 q-ha TPM state when VM migrates physicals for luks2 root disk encryption using TPM 2 HPCM-5802 System Power Capping enablement and usage documentation in HPCM 1.11 HPCM-5803 be sure iscsi is enabled by default server side (was only enabled for new installs before) HPCM-5804 DOC: Update Cluster Manager Ports Information in Admin Guide HPCM-5806 Update TS query on GPU dashboards HPCM-5807 Upgrade AMDXIO to resolve AMDXIO seg fault issue HPCM-5808 Update Opensearch port from hardcoded values HPCM-5809 rhel89: admin fails to load on disk HPCM-5811 Update TS queries on CDU dashboards HPCM-5812 Don't require nscd and instead make it recommended package HPCM-5817 Remote Support: Drive Collection in Failed Drive Events HPCM-5820 Allow a name to be specified when creating a repo HPCM-5822 HTT: Update README file HPCM-5827 HTT: Invalid MCE bank and ipid pattern for EX255a HPCM-5828 HPCM 1.10: Logstash grafana dashboard does not have the external links to SIM, Monitoring services, AIOPs services and the SU_leader services HPCM-5831 Placeholder Task: Removal of deprecated features/options/CLI/files HPCM-5834 HPCM should not configure admin bonding when virtualized HPCM-5835 HPCM1.11/rhel93: miniroot creation fails HPCM-5836 Fix the build failure issue on black for noship diags rpm buils after deleting the lines corresponding to the common files of perf diags and noship diags in noship spec file to remove overlap. HPCM-5837 cm command line shouldn't allow underscore in network names - breaks named HPCM-5840 admin node luks2 password prompt should not echo password HPCM-5843 Add Admin Node VM awareness to switchconfig sanity_check HPCM-5850 q-ha upgrade: systemimager should not re-enable services on physical admin nodes HPCM-5851 sles15sp5: pulling image from node fails HPCM-5854 Diags failure on x86 sles15sp5 cluster because of craype lib mismatch HPCM-5856 async_apis: Fix Error with missing creds HPCM-5857 /opt/clmgr/tools/cm_check_ips always returns status 1 when checking IP addr HPCM-5858 cpwrcli: Fix perif-power-on|off option HPCM-5859 cm monitoring alerting status command to show the alert rule group HPCM-5866 Update power docs: HPCM-5870 Add nfs-utils for rhel and nfs-client for sles as recommended packages HPCM-5872 direct attach nVME may result in unable to boot on admin node HPCM-5875 jobmonitor: missing restserver_host option in config HPCM-5876 Change new cm-power-services port: HPCM-5877 stout7: iscsi target does not get setup on leaders HPCM-5880 Add kdump as a recommended package to cm-recommends HPCM-5881 HPCM 1.11: "cm monitoring slurm status" generates an error. 
HPCM-5882 HPCM 1.11: Alerting Grafana Framework issue HPCM-5885 SLES15 SP5 QU c-c fails on brltty and libbrllap versions too old HPCM-5886 QHA, linpack, and Slurm issue HPCM-5888 Add deprecation notice to alerta related CLI (all) HPCM-5890 Change Error Pattern for MCA bank error HPCM-5891 sensors_node pipeline doesn't exist HPCM-5892 HPCM 1.11: Alerting enable command not gracefully manage API errors (due to proxy issues) HPCM-5896 HPCM 1.11: The absence of slingshot-heartbeat.service is attributed to the absence of slingshot-monitoring in the cm package. HPCM-5897 Listener is not set with SS 2.2 telemetry config HPCM-5899 SAC-HA: RHEL 8.9 /images umount hung during FIPS upgrade procedure. HPCM-5902 Add screen to cm-managed-recommends HPCM-5903 HPCM1.10: Slingshot dashboards cannot be enabled using the command "cm monitoring slingshot enable". HPCM-5904 HPCM1.10: unable to establish the configuration of Slingshot with active FMN. HPCM-5906 cm-power-srv. REST API fix needed for multi-node ctrl HPCM-5911 cm-slingshot-udev doesn't sort devices correctly for EX254n nodes HPCM-5912 mofed bits don't make it to image when script is run HPCM-5913 HPCM1.11/rhel89: DL365gen11 panics booting rhel89 (rhel88, rhel810 are fine) [Make it possible for extra kernel params admin node post-boot] HPCM-5918 cm chassis cmc firmware show command fails HPCM-5919 asyncio_cmdb: Fix/update get_node_fields/get_compute_node_fields HPCM-5921 HPCM 1.11: Missing argument in sprintf at chc_wrapper.pm file HPCM-5922 HPCM 1.11: "cm health report slingshot link mbe" reports an 'linkmbe_parser' is not defined. HPCM-5926 AMDXIO core dumps HPCM-5928 CSM: Hardware Triage Toolkit unexpectedly failing when incorrect hardware config file is provided HPCM-5929 HPCM 1.11: run_babelstream not handling proper exit for empty arraysize HPCM-5931 HPCM 1.11 check_node.sh leaves executable shell script on the target node HPCM-5932 gluster leaders - disable insecure NFS by default HPCM-5933 HPCM 1.11: cxi_nic_failure.sh leaves executable bash script in the root dir of target nodes HPCM-5939 hpcg failing for EX254n HPCM-5940 HPCM-5521 is not merged into HPCM 1.11 HPCM-5942 HTT fails with TypeError on EX255aNC HPCM-5945 pulling image from running ubuntu node fails HPCM-5948 switchconfig reports head nodes bonding as active-backup, yet is set to 802.3ad HPCM-5949 fabricperf reporting lower bandwidth on NDR fabric HPCM-5950 HPCM 1.11 hwtriage leaves nfpga_print_regs script on the node controller and does not clean-up HPCM-5951 HPCM 1.11: During the configuration setup for Slingshot, the periodicity values fail to update in FMN (as observed through "fmn-show-telemetry-config"). HPCM-5953 HPCM 1.11: Inactive configurations of Slingshot are attempting to be activated, resulting in a mix-up of configurations. HPCM-5955 LDMS dashboard is not working and reporting this error "Failed to upgrade legacy queries Datasource" HPCM-5958 Update or remove .diagnose_node script from HTT HPCM-5959 cadmin throws traceback when checking root labels HPCM-5960 Recent change broke checkall function for cfirmware HPCM-5963 ss cassini alerts CXI_EVENT_LINK_DOWN not being resolved when CXI_EVENT_LINK_UP event received HPCM-5964 Fix: Perf and noship Diags overlap fixes HPCM-5965 HPCM 1.11: confluent-schema-registry.service remains in failed state on su-leader. Grafana Dashbord keep notifying the same. 
HPCM-5967 HPCM 1.11: Grafana dashboard not working due to timescaledb user issue HPCM-5968 cm image create missing option to create bt tarball HPCM-5969 Show if bt tarball was created in cm image show HPCM-5970 HPCM 1.11: Cluster health AMD command not working in Bardpeak node HPCM-5972 cm-power-services: rpm post, link update not working HPCM-5973 cm-power-service: controller endpoint don't include cec HPCM-5974 cm-power-services: add @secure wrapper to POST/PATCH routes HPCM-5977 sles15sp5 install fails, grub has no initrd, related to change from mkinitrd to dracut HPCM-5982 HPCM 1.11: Rectifier Check dashboard throws error about incorrect data source HPCM-5983 stout7 discover iscsi bug - roofs checking nfs in iscsi section addServiceNodeToCluster HPCM-5986 Fix diags build related to COOP-1296 HPCM-5987 Add latest HTT to HPCM 1.11 HPCM-5991 cm node slot copy and cm-node-slot-copy.8 should specify that leaders and non-diskful nodes aren't supported HPCM-5992 amdgpu-xhpl not working on bardpeak HPCM-5993 EX235n nvidia-gpu-xhpl breaking HPCM-5997 "Failed to upgrade legacy queries Datasource" and fixing Grafana UID HPCM-5998 cm node add / discover fails; can't find admin in head mgmt net HPCM-6000 export_fabric_template fails in SlingShot 2.2 , so cm health report does not work HPCM-6001 Update TS queries on PDU Dashboards HPCM-6003 HPCM 1.11: All functional health checks are showing failures on Ubuntu nodes. HPCM-6004 HPCM 1.11: The "cm health check cpuchk" command is indicating failures HPCM-6007 HPCM 1.10: su-leader-setup --help should run despite configuration issues. HPCM-6008 cm node slot copy should show progress of sync HPCM-6009 Q-HA: SLES15 SP5: Physical hosts are installed such that 2,049 nfsd processes are running HPCM-6011 HPCM 1.11: cm health alertman switch command not working HPCM-6012 HPCM 1.11: Slingshot Heartbeat not working properly in alerting. HPCM-6013 provide wiki instructions mfg - stage and re-use images and repos HPCM-6014 system-power-capping get health fails HPCM-6015 Q-HA: Rocky: Recovery from link down may not bring gluster online HPCM-6021 nvdidiagpu-xhpl failing on EX235n HPCM-6022 HPCM1.11 PBS alert not working HPCM-6023 Add copytruncate to various services logrotate settings HPCM-6029 fix typo in echo statement miniroot-functions HPCM-6031 HPCM1.11 sles15sp6 image fails - no mkinitrd HPCM-6033 HPCM 1.11: Unable to retrieve the report of link MBE events within a specific time or timeframe. HPCM-6034 HPCM 1.11: Unable to display the report with additional fields such as CableId in the MBE report. HPCM-6035 HPCM 1.11: Unable to fetch the Columbia switch/port details for inclusion in Slingshot health reporting. HPCM-6036 HPCM 1.11: Unable to execute any functionalities related to "cm health report slingshot port rxpause/txpause". HPCM-6043 mpower --gpu --get-power: Not Displaying values HPCM-6046 nvdidiagpu-xhpl failing on EX235n - prob size not calculated for input percent HPCM-6047 HPCM 1.11: Executing "cm monitoring rackmap map temperature/power -d" results in NullValueNotAllowed errors. HPCM-6049 HPCM 1.11: Slingshot switch group status not working properly in grafana HPCM-6050 Rebuild xkdiags and rank for SLES CPE 23.12 (x86) HPCM-6053 False Ping Failures on large non-su-leader systems HPCM-6054 Update docs to properly upgrade and versionlock conflicting HPCM packages if COS or EPEL is used HPCM-6055 85-nid-hostname does not work with non-padded hostnames HPCM-6057 cmcinventory: arch template info not working. 
HPCM-6058 XD224: Hello_world diag execution hangs while executing. HPCM-6059 XD224: test4 diag execution failing with slurm error. HPCM-6061 XD224: hpcg diag execution failing with slurm error. HPCM-6063 HPCM 1.11: Please handle Error messages for "cm monitoring pbs status/enable" command HPCM-6064 HPCM 1.11: AIOPS service fails after admin upgrade from HPCM1.10 to HPCM1.11 HPCM-6067 Upgrade SDU to 2.3.2 HPCM-6070 HPCM 1.11: Upgrade: "iSCSI Login negotiation failed. rx_data returned 0, expecting 48." error messages keep flooding on su-leader's console as well as in /var/log/messages/ after rupgrade HPCM-6071 HPCM 1.11: UPGRADE: Upgrade documentation should include ISCSI related steps after upgrade from hpcm1.10 to hpcm1.11 HPCM-6072 HPCM 1.11: Upgrade: Some slingshot connecters do not enable after upgrade from HPCM1.10 to HPCM1.11 HPCM-6073 Add/Fix cbios & cpower support for XD224 HPCM-6074 add-ipv6-bond0.py should use full path to call 'cm' command HPCM-6077 On systems with SU-leaders, opentracker-ipv4 and cm-aria2c start before gluster is mounted HPCM-6078 Add latest HTT to HPCM 1.11 Feb 15 HPCM-6079 linpack,hpcc & stream not getting installed as part of perf_diag HPCM-6080 HPCM 1.11: AIOPs grafana dashboards have to use mon_reader user instead of postgres user HPCM-6081 mpower: XD224 fix get-limit (NVIDIA) HPCM-6084 disable-su-leader should talk about iscsi in addition to nfs HPCM-6088 HPCM 1.11: After leaders reboot elk services not running: opensearch HPCM-6089 DOC: HPCM 1.11: Upgrade: Old grafana dashbords needs to be handle/deleted after upgrade. HPCM-6090 HPCM 1.11: After upgrade alerting got disabled so not getting alerts HPCM-6097 HPCM 1.11 + SS 2.2: In "cm monitoring slingshot config" collector is set as FMN, But in "/telemetry/configurations/hpcm_config/" collector is coming as Listener. HPCM-6100 Recompile ARM HPCG with new CPE HPCM-6101 80-tftp-setup: grep: /usr/lib/systemd/system/tftp.socket.d/tftp- override.conf: No such file or directory HPCM-6102 RHEL8: Rebooting into cloned-slot fails when fips enabled HPCM-6103 HPCM 1.11 + SS 2.2: In "cm monitoring slingshot config" periodicity is set as 10, But in "/telemetry/configurations/hpcm_config/" periodicity is coming as 60. HPCM-6104 80-logstash-configure scp'ing to localhost, which may not be permitted to ssh HPCM-6111 cm health report slingshot refresh not dumping neighbour ports for local and global ports HPCM-6112 HPCM1.10:ROCKY88: Patch: No way to disable slingshot_congestion from alerting because of it we are getting failure messages in /var/log/messages on non slingshot system HPCM-6116 release note certain iscsi errors that happen when a node is assigned to an image that isn't activated HPCM-6117 Q-HA: gluster | dshbak -c after stopping libvirtd does not match upgrade example HPCM-6118 cm-logrotate-parallel needs to avoid any log files that were already compressed HPCM-6119 Stopping cmdb causes several services to also stop HPCM-6120 Final HPCM 1.11 Aruba Firmware Recommendation HPCM-6123 HPCM 1.11: UPgrade: Patroni service not started after admin upgrade from hpcm1.10 to hpcm1.10. All monitoring were enabled and runing before upgrade. 
HPCM-6124 PIP warnings during library installation execution failed in RHEL89 and ROCKY89 distros HPCM-6129 HPCM 1.11: UPGRADE: Running scriptlet: clusterhealth seems not successefull during upgrade HPCM-6131 HPCM1.11/cfirmware fails to update slingshot switch HPCM-6132 Rackmap throws exception when map doesn't exist HPCM-6133 pdu-collect: Readme not rendering from landing page HPCM-6134 Fix output of timescale show schema upgrades HPCM-6135 showrev does not output CM Release and CM Build when -j selected HPCM-6136 Q-HA: internal-set-root-label is not working on Rocky HPCM-6137 stout7: rocky89/rhel89: su-leader install failed due to 80-csn-distro-services failure HPCM-6138 remlog-collect: Regression with session_key removal HPCM-6144 HPCM 1.11: UPGRADE: On ICE admin sles15sp5 upgrade from hpcm1.10 to hpcm1.11 fails to upgrade kernel.  HPCM-6146 Add file containing port information in the /docs directory of the ISO and on the system HPCM-6147 XD224 Nodes Need Sensormon support HPCM-6148 HPCM 1.11: cm support moncollect syntax does not reflect that -w and -n dependent on each other and are not mutually exclusive HPCM-6149 HPCM 1.11: ssh fails on name resolution while running cm support moncollect HPCM-6150 HPCM 1.11: UPGRADE: Image details like(imageObject, imageObjectSize, imageObjectCreationTime) shows Undefined after image upgrade. "cm image show -v" command takes more time than expected.  HPCM-6152 HPCM 1.11: ALERTING: cm health alertman sim -s throws Exception: 'datasource' ERROR: Failed to connect to alertmanager. HPCM-6154 xkdiags failing for x86 and arm HPCM-6156 HPCM 1.11: AIOPS: Trainer bugs HPCM-6157 Q-ha SLES15sp5 QU1 qemu-generated adminvm.xml files generated HPCM-6163 HPCM 1.11: UPGRADE: monitoringdb Version and USER do not upgrade. Database connection fails with error "psql: FATAL: role "mon_reader" does not exist" HPCM-6164 fabricperf not giving expected performance HPCM-6168 cpwr REST API: content flag not being passed along HPCM-6169 HPCM 1.11: UPGRADE: Ubuntu image fails to upgrade. 
HPCM-6174 HPCM 1.11: Regression: python urllib3 error with hwtriage CLI HPCM-6175 mpwrcli: node --set-limit Regression (uri_key no longer 'Chassis' HPCM-6176 Add timeout setting to pdf-settings.ini HPCM-6180 Add latest HTT to HPCM 1.11 HPCM-6182 cm controller show produces exception when controller with missing nic exists in db HPCM-6184 HPCM 1.11: Regression: cm monitoring pbs status throws error if telegraf rpm is not installed HPCM-6187 Parser.pm includes admin when a node hostname that matches the admin hostname is specified HPCM-6188 cinstallman should use /opt/clmgr/bin/pdsh instead of the default pdsh HPCM-6189 build online diags with rocm 6.0.0 (black) HPCM-6190 sles15sp5: logrotate service is failing, some configs call /etc/init.d/syslog which no longer exists HPCM-6197 HPCM 1.11: cm monitoring timescaledb show --patroni-state throws an error HPCM-6198 HPCM 1.11: Regression: timescaledb show command when using --patroni-state option fails HPCM-6200 Change HPCM to 1.11 in the message so CrayOPC can filter events correctly HPCM-6202 Setting up timescale fails when using --no-schema-upgrade HPCM-6208 HPCM 1.11: Upgrade:AARCH64:Rocky88-Rocky89: Running scriptlet: field_diags_licensed_aarch64 script fails with error during upgrade HPCM-6209 DOC: Detailed performance dashboard uses DESC when it should be ASC in query HPCM-6211 Slinshot metric names exceed Postgres table name limit HPCM-6213 Fabric summary dashboard use incorrect datasource HPCM-6214 Move slingularity examples out of manuals and into separate doc HPCM-6215 cm support collect does have a separator between repo group outputs HPCM-6222 Unittests Only: asynctest is dead. Fix async unittest HPCM-6230 AIC: NHC checks not running on compute nodes HPCM-6234 Remove all the clusterstor lustre HPCM-6242 HPCM 1.11: AMD cm health check commands failed HPCM-6243 Diags build failing on black which is building with rocm HPCM-6245 HPCM 1.11: Keep the Timescale Grafana alerting CDU rules disable by default HPCM-6247 HPCM 1.11: Regression: All cm commands genereate SyntaxError: unmatched ')' in cm.log HPCM-6248 Regression:Data from AMD GPUs is not being generated in native monitoring. HPCM-6250 current # cm wlm install setup for slurm has a bug in it HPCM-6254 HPCM1.11 Reduced the slingshot switch alert query to 1 min ****************************************************************************** Product-ID: HPE Performance Cluster Manager 1.11.0 - Update 01 Last edit: Wed Mar 27 14:13:44 CDT 2024