HP XC System Software
Release Notes
Version 3.2
HP Part Number: A-XCRN3-2G
Published: March 2008
Table of Contents
About This Document
1 New and Changed Features
2 Important Release Information
3 Hardware Preparation
4 Software Installation On The Head Node
5 System Discovery, Configuration, and Imaging
6 Software Upgrades
9 Programming and User Environment
10 Cluster Platform 3000
11 Cluster Platform 4000
12 Cluster Platform 6000
14 Interconnects
14.3 QsNetII Interconnect
15 Documentation
About This Document
This document contains release notes for HP XC System Software Version 3.2. This document
contains important information about firmware, software, or hardware that might affect the
system.
An HP XC system is integrated with several open source software components. Some open source
software components are used for underlying technology, and their deployment is
transparent. Some open source software components require user-level documentation specific
to HP XC systems, and that kind of information is included in this document when required.
HP relies on the documentation provided by the open source developers to supply the information
you need to use their products. For links to open source software documentation, see "Related
Information" in this document.
Documentation for third-party hardware and software components that are supported on the
HP XC system is supplied by the third-party vendor. However, information about the operation
of third-party software is included in this document if the functionality of the third-party
component differs from standard behavior when used in the XC environment. In this case, HP
XC documentation supersedes information supplied by the third-party vendor. For links to
third-party documentation, see "Related Information" in this document.
Standard Linux® administrative tasks or the functions provided by standard Linux tools and
commands are documented in commercially available Linux reference manuals and on various
Web sites. For more information about obtaining documentation for standard Linux administrative
tasks and associated topics, see the list of Web sites and additional publications provided in
"Related Software Products and Additional Publications" in this document.
Intended Audience
The release notes are intended for anyone who installs and configures an HP XC system, for
system administrators who maintain the system, for programmers who write applications to run
on the system, and for general users who log in to the system to run jobs.
The information in this document assumes that you have knowledge of the Linux operating
system.
Typographic Conventions
This document uses the following typographical conventions:
%, $, or #
    A percent sign represents the C shell system prompt. A dollar sign represents the
    system prompt for the Korn, POSIX, and Bourne shells. A number sign represents
    the superuser prompt.
audit(5)
    A manpage. The manpage name is audit, and it is located in Section 5.
Command
    A command name or qualified command phrase.
Computer output
    Text displayed by the computer.
Ctrl+x
    A key sequence. A sequence such as Ctrl+x indicates that you must hold down the
    key labeled Ctrl while you press another key or mouse button.
ENVIRONMENT VARIABLE
    The name of an environment variable, for example, PATH.
[ERROR NAME]
    The name of an error, usually returned in the errno variable.
Key
    The name of a keyboard key. Return and Enter both refer to the same key.
Term
    The defined use of an important word or phrase.
User input
    Commands and other text that you type.
Variable
    The name of a placeholder in a command, function, or other syntax display that
    you replace with an actual value.
[ ]
    The contents are optional in syntax. If the contents are a list separated by |,
    you can choose one of the items.
{ }
    The contents are required in syntax. If the contents are a list separated by |,
    you must choose one of the items.
. . .
    The preceding element can be repeated an arbitrary number of times.
|
    Separates items in a list of choices.
WARNING
    A warning calls attention to important information that if not understood or
    followed will result in personal injury or nonrecoverable system problems.
CAUTION
    A caution calls attention to important information that if not understood or
    followed will result in data loss, data corruption, or damage to hardware or
    software.
IMPORTANT
    This alert provides essential information to explain a concept or to complete
    a task.
NOTE
    A note contains additional information to emphasize or supplement important
    points of the main text.
HP XC and Related HP Products Information
The HP XC System Software Documentation Set, the Master Firmware List, and HP XC HowTo
documents are available at this HP Technical Documentation Web site:
The HP XC System Software Documentation Set includes the following core documents:
HP XC System Software Release Notes
    Describes important, last-minute information about firmware, software, or
    hardware that might affect the system. This document is not shipped on the
    HP XC documentation CD. It is available only online.
HP XC Hardware Preparation Guide
    Describes hardware preparation tasks specific to HP XC that are required to
    prepare each supported hardware model for installation and configuration,
    including required node and switch connections.
HP XC System Software Installation Guide
    Provides step-by-step instructions for installing the HP XC System Software
    on the head node and configuring the system.
HP XC System Software Administration Guide
    Provides an overview of the HP XC system administrative environment, cluster
    administration tasks, node maintenance tasks, LSF® administration tasks, and
    troubleshooting procedures.
HP XC System Software User's Guide
Provides an overview of managing the HP XC user environment
with modules, managing jobs with LSF, and describes how to
build, run, debug, and troubleshoot serial and parallel
applications on an HP XC system.
QuickSpecs for HP XC System Software
Provides a product overview, hardware requirements, software
requirements, software licensing information, ordering
information, and information about commercially available
software that has been qualified to interoperate with the HP XC
System Software. The QuickSpecs are located online:
See the following sources for information about related HP products.
HP XC Program Development Environment
The Program Development Environment home page provides pointers to tools that have been
tested in the HP XC program development environment (for example, TotalView® and other
debuggers, compilers, and so on).
HP Message Passing Interface
HP Message Passing Interface (HP-MPI) is an implementation of the MPI standard that has been
integrated in HP XC systems. The home page and documentation are located at the following Web
site:
HP Serviceguard
HP Serviceguard is a service availability tool supported on an HP XC system. HP Serviceguard
enables some system services to continue if a hardware or software failure occurs. The HP
Serviceguard documentation is available at the following Web site:
HP Scalable Visualization Array
The HP Scalable Visualization Array (SVA) is a scalable visualization solution that is integrated
with the HP XC System Software. The SVA documentation is available at the following Web site:
HP Cluster Platform
The cluster platform documentation describes site requirements, shows you how to set up the
servers and additional devices, and provides procedures to operate and manage the hardware.
These documents are available at the following Web site:
HP Integrity and HP ProLiant Servers
Documentation for HP Integrity and HP ProLiant servers is available at the following Web site:
Related Information
This section provides useful links to third-party, open source, and other related software products.
Supplementary Software Products
This section provides links to third-party and open source software products that are
integrated into the HP XC System Software core technology. In the HP XC documentation, except
where necessary, references to third-party and open source software components are generic,
and the HP XC adjective is not added to any reference to a third-party or open source command
or product name. For example, the SLURM srun command is simply referred to as the srun
command.
The location of each Web site or link to a particular topic listed in this section is subject to change
without notice by the site provider.
• Home page for Platform Computing Corporation, the developer of the Load Sharing Facility
(LSF). LSF-HPC with SLURM, the batch system resource manager used on an HP XC system,
is tightly integrated with the HP XC and SLURM software. Documentation specific to
LSF-HPC with SLURM is provided in the HP XC documentation set.
Standard LSF is also available as an alternative resource management system (instead of
LSF-HPC with SLURM) for HP XC. This is the version of LSF that is widely discussed on
the Platform Web site.
For your convenience, the following Platform Computing Corporation LSF documents are
shipped on the HP XC documentation CD in PDF format:
— Administering Platform LSF
— Administration Primer
— Platform LSF Reference
— Quick Reference Card
— Running Jobs with Platform LSF
LSF procedures and information supplied in the HP XC documentation, particularly the
documentation relating to the LSF-HPC integration with SLURM, supersede the information
supplied in the LSF manuals from Platform Computing Corporation.
The Platform Computing Corporation LSF manpages are installed by default. The lsf_diff(7)
manpage supplied by HP describes LSF command differences when using LSF-HPC with SLURM on
an HP XC system.
The following documents in the HP XC System Software Documentation Set provide
information about administering and using LSF on an HP XC system:
— HP XC System Software Administration Guide
— HP XC System Software User's Guide
• Documentation for the Simple Linux Utility for Resource Management (SLURM), which is
  integrated with LSF to manage job and compute resources on an HP XC system.
• Home page for Nagios®, a system and network monitoring application that is integrated
  into an HP XC system to provide monitoring capabilities. Nagios watches specified hosts
  and services and issues alerts when problems occur and when problems are resolved.
• Home page of RRDtool, a round-robin database tool and graphing system. In the HP XC
  system, RRDtool is used with Nagios to provide a graphical view of system status.
• Home page for Supermon, a high-speed cluster monitoring system that emphasizes low
  perturbation, high sampling rates, and an extensible data protocol and programming
  interface. Supermon works in conjunction with Nagios to provide HP XC system monitoring.
• Home page for the parallel distributed shell (pdsh), which executes commands across HP
  XC client nodes in parallel.
• Home page for syslog-ng, a logging tool that replaces the traditional syslog
  functionality. The syslog-ng tool is a flexible and scalable audit trail processing tool.
  It provides a centralized, securely stored log of all devices on the network.
• Home page for SystemImager®, which is the underlying technology that distributes the
  golden image to all nodes and distributes configuration changes throughout the system.
• Home page for the Linux Virtual Server (LVS), the load balancer running on the Linux
  operating system that distributes login requests on the HP XC system.
• Home page for Macrovision®, developer of the FLEXlm™ license management utility, which
  is used for HP XC license management.
• Web site for Modules, which provide for easy dynamic modification of a user's environment
  through modulefiles, which typically instruct the module command to alter or set shell
  environment variables.
• Home page for MySQL AB, developer of the MySQL database. This Web site contains a link
  to the MySQL documentation, particularly the MySQL Reference Manual.
Related Software Products and Additional Publications
This section provides pointers to Web sites for related software products and provides
references to useful third-party publications.
The location of each Web site or link to a particular topic is subject to change without notice by
the site provider.
Linux Web Sites
• Home page for Red Hat®, distributors of Red Hat Enterprise Linux Advanced Server, a
  Linux distribution with which the HP XC operating environment is compatible.
• This Web site for the Linux Documentation Project (LDP) contains guides that describe
  aspects of working with Linux, from creating your own Linux system from scratch to bash
  script writing. This site also includes links to Linux HowTo documents, frequently asked
  questions (FAQs), and manpages.
• Web site providing documents and tutorials for the Linux user. Documents contain
  instructions for installing and using applications for Linux, configuring hardware, and a
  variety of other topics.
• Home page for the GNU Project. This site provides online software and information for
  many programs and utilities that are commonly used on GNU/Linux systems. Online
  information includes guides for using the bash shell, emacs, make, cc, gdb, and more.
MPI Web Sites
• Contains the official MPI standards documents, errata, and archives of the MPI Forum. The
  MPI Forum is an open group with representatives from many organizations that define and
  maintain the MPI standard.
• A comprehensive site containing general information, such as the specification and FAQs,
  and pointers to other resources, including tutorials, implementations, and other
  MPI-related sites.
Compiler Web Sites
• Web site for Intel® compilers.
• Web site for general Intel software development information.
• Home page for The Portland Group™, supplier of the PGI® compiler.
Debugger Web Site
Home page for Etnus, Inc., maker of the TotalView® parallel debugger.
Software RAID Web Sites
• A document (in two formats: HTML and PDF) that describes how to use software RAID
  under a Linux operating system.
• Provides information about how to use the mdadm RAID management utility.
Additional Publications
For more information about standard Linux system administration or other related software
topics, consider using one of the following publications, which must be purchased separately:
• Linux Administration Unleashed, by Thomas Schenk, et al.
• Linux Administration Handbook, by Evi Nemeth, Garth Snyder, Trent R. Hein, et al.
• Managing NFS and NIS, by Hal Stern, Mike Eisler, and Ricardo Labiaga (O'Reilly)
• MySQL, by Paul DuBois
• MySQL Cookbook, by Paul DuBois
• High Performance MySQL, by Jeremy Zawodny and Derek J. Balling (O'Reilly)
• Perl Cookbook, Second Edition, by Tom Christiansen and Nathan Torkington
• Perl in a Nutshell: A Desktop Quick Reference, by Ellen Siever, et al.
Manpages
Manpages provide online reference and command information from the command line. Manpages
are supplied with the HP XC system for standard HP XC components, Linux user commands,
LSF commands, and other software components that are distributed with the HP XC system.
Manpages for third-party software components might be provided as a part of the deliverables
for that component.
Using discover(8) as an example, you can use either one of the following commands to display a
manpage:
$ man discover
$ man 8 discover
If you are not sure about a command you need to use, enter the man command with the -k option
to obtain a list of commands that are related to a keyword. For example:
$ man -k keyword
HP Encourages Your Comments
HP encourages comments concerning this document. We are committed to providing
documentation that meets your needs. Send any errors found, suggestions for improvement, or
compliments to:
Include the document title, manufacturing part number, and any comment, error found, or
suggestion for improvement you have concerning this document.
1 New and Changed Features
This chapter describes the new and changed features delivered in HP XC System Software Version
3.2.
1.1 Base Distribution and Kernel
The following table lists information about the base distribution and kernel for this release as
compared to the last HP XC release.
HP XC Version 3.2:
• Enterprise Linux 4 Update 4
• HP XC kernel version 2.6.9-42.9hp.XC, based on Red Hat kernel version 2.6.9-42.0.8.EL
HP XC Version 3.1:
• Enterprise Linux 4 Update 3
• HP XC kernel version 2.6.9-34.7hp.XC, based on Red Hat kernel version 2.6.9-34.0.2.EL
1.2 Support for Additional Hardware Models
In this release, the following additional hardware models and hardware components are supported
in an HP XC hardware configuration.
• HP ProLiant servers:
  — HP ProLiant DL360 G5
  — HP ProLiant DL380 G5
  — HP ProLiant DL580 G4
  — HP ProLiant DL145 G3
  — HP ProLiant DL385 G2
  — HP ProLiant DL585 G2
• HP Integrity servers and workstations:
  — HP Integrity rx2660
  — HP Integrity rx4640
  — HP xw9400 workstation
1.3 OpenFabrics Enterprise Distribution for InfiniBand
Starting with this release, the HP XC System Software uses the OpenFabrics Enterprise Distribution
(OFED) InfiniBand software stack.
OFED is an open software stack supported by the major InfiniBand vendors as the future of
InfiniBand support. OFED offers improved support of multiple HCAs per node. The OFED stack
has a different structure and different commands from the InfiniBand stack that was used in
previous HP XC releases.
See the following Web page for more information about OFED:
The HP XC System Software Administration Guide provides OFED troubleshooting information.
1.4 HP Scalable Visualization Array
HP Scalable Visualization Array (SVA) software is now included on the HP XC System Software
DVD distribution media. SVA provides a comprehensive set of services for deployment of
visualization applications, allowing them to be conveniently run in a Linux clustering
environment.
The following are the key features of SVA:
• Capturing and managing visualization-specific cluster information
• Managing visualization resources and providing facilities for requesting and allocating
  resources for a job in a multi-user, multi-session environment
• Providing display surface configuration tools to allow easy configuration of multi-panel
  displays
• Providing launch tools, both generic and tailored to a specific application, that launch
  applications with appropriate environments and display surface configurations
• Providing tools that extend serial applications to run in a clustered, multi-display
  environment
See the HP XC QuickSpecs and the SVA documentation set for more information about SVA
features. The SVA documentation set is included on the HP XC Documentation CD.
Because the SVA RPMs are included on the HP XC distribution media, the SVA installation
process has been integrated with the HP XC installation process. The HP XC System Software
Installation Guide was revised where appropriate to accommodate SVA installation and
configuration procedures.
1.5 Partition Size Limits on Installation Disk
Because the installation disk size can vary, partition sizes are calculated as a percentage of total
disk size. However, using a fixed percentage of the total disk size to calculate the size of each
disk partition can result in needlessly large partition sizes when the installation disk is larger
than 36 GB. Thus, for this release, limits have been set on the default partition sizes to leave space
on the disk for other user-defined file systems and partitions.
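The sizing policy described above reduces to "a percentage of total disk size, capped at a fixed limit." The following shell sketch illustrates that calculation; the 10 percent figure and the 4096 MB cap are invented for illustration and are not the actual HP XC default partition values.

```shell
#!/bin/sh
# Illustrative sketch only: size a partition as a percentage of the total
# disk, capped at a fixed limit so that a large installation disk does not
# produce a needlessly large partition. The percentage and cap shown here
# are hypothetical values, not the HP XC defaults.
disk_mb=73000        # example: a 73 GB installation disk, in MB
pct=10               # hypothetical: give the partition 10% of the disk
cap_mb=4096          # hypothetical: never grow beyond 4096 MB

part_mb=$((disk_mb * pct / 100))
if [ "$part_mb" -gt "$cap_mb" ]; then
    part_mb=$cap_mb
fi
echo "partition size: ${part_mb} MB"
```

With a disk smaller than the crossover point (about 40 GB at these example values), the percentage governs; above it, the cap governs, leaving the remainder of the disk free for user-defined file systems.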
1.6 More Flexibility in Customizing Client Node Disk Partitions
You can configure client node disks on a per-image and per-node basis to create an optional
scratch partition to maximize file system performance. Partition sizes can be fixed or they can
be based on a percentage of total disk size. To do so, you set the appropriate variables in the
/opt/hptc/systemimager/etc/make_partitions.sh file or set the variables in
user-defined files with a .part extension.
The procedure that describes how to customize client node disk partitions is documented in the
HP XC System Software Installation Guide.
1.7 Enhancements to the discover Command
The following options were added to the discover command:
• The --nodesonly option reads in the database and discovers all nodes if the hardware
  configuration contains HP server blades and enclosures. This option is valid only when the
  --enclosurebased option is also used.
• The --nothreads option runs the node discovery process without threads if the hardware
  configuration contains HP server blades and enclosures. This option is valid only when the
  --enclosurebased option is also used.
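For illustration, because both options extend an enclosure-based discovery, a session might combine them as follows. These invocations are a sketch based on the option names above; discover(8) on your system is the authoritative reference for the exact syntax and any additional required arguments:

```
# discover --enclosurebased --nodesonly
# discover --enclosurebased --nothreads
```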
1.8 Enhancements to the cluster_config Utility
The cluster_config utility prompts you to specify whether you want to configure the Linux
virtual server (LVS) director to act as a real server, that is, a node that accepts login sessions.
If you answer yes, the LVS director is configured to act as a login session server in addition to
arbitrating and dispersing the login session connections.
If you answer no, the LVS director does not participate as a login session server; its only function
is to arbitrate and disperse login sessions to other nodes. This gives you the flexibility to place
the login role on the head node yet keep the head node load to a minimum because login
sessions are not being spawned.
This configuration choice is documented in the HP XC System Software Installation Guide.
1.9 System Management and Monitoring Enhancements
System management and monitoring utilities have been enhanced as follows:
• A new resource monitoring tool, resmon, has been added. resmon is a job-centric resource
  monitoring Web page initially inspired by the open source clumon product. resmon invokes
  useful commands to collect and present data in a scalable and intuitive fashion. The resmon
  Web pages update automatically at a preconfigured interval (120 seconds by default).
  See resmon(1) for more information.
• The HP Graph Web interface has been enhanced to include a CPU temperature graph.
  To access this new graph, select temperature from the Metrics pull-down menu at the top
  of the Web page.
1.10 Enhancements to the OVP
The operation verification program (OVP) performance health tests were updated to accept an
option to specify an LSF queue. In addition, you can run two performance health tests,
network_stress and network_bidirectional, on systems that are configured with standard
LSF or configured with LSF-HPC with SLURM.
1.11 Installing and Upgrading HP XC System Software On Red Hat
Enterprise Linux
The HP XC System Software Installation Guide contains two new chapters that describe the
following topics:
• Installing HP XC System Software Version 3.2 on Red Hat Enterprise Linux
• Upgrading HP XC System Software Version 3.1 on Red Hat Enterprise Linux to HP XC
  System Software Version 3.2 on Red Hat Enterprise Linux
1.12 Support For HP Unified Parallel C
This release provides support for the HP Unified Parallel C (UPC) application development
environment.
HP UPC is a parallel extension of the C programming language, which runs on both common
types of multiprocessor systems: those with a common global address space (such as SMP) and
those with distributed memory. UPC provides a simple shared memory model for parallel
programming, allowing data to be shared or distributed among a number of communicating
processors. Constructs are provided in the language to permit simple declaration of shared data,
distribute shared data across threads, and synchronize access to shared data across threads. This
model promises significantly easier coding of parallel applications and maximum performance
across shared memory, distributed memory, and hybrid systems.
See the following Web page for more information about HP UPC:
1.13 Documentation Changes
The following changes were made to the HP XC System Software Documentation Set:
• The following manuals have been affected by the new functionality delivered in this release
  and have been revised accordingly:
  — HP XC Hardware Preparation Guide
  — HP XC System Software Installation Guide
  — HP XC System Software Administration Guide
  — HP XC System Software User's Guide
• The information in the Configuring HP XC Systems With HP Server Blades and Enclosures -
  Edition 9 HowTo was merged into the HP XC Hardware Preparation Guide and HP XC System
  Software Installation Guide, reducing the number of documents you have to read to install
  and configure an HP XC system that contains HP server blades and enclosures.
• The HP XC System Software Release Notes are updated periodically. Therefore, HP recommends
  that you download the latest version of this document because the version you are reading
  now might have been updated since the last time you downloaded it.
HP XC HowTos On the World Wide Web
HP XC information that is published between releases is issued in HowTo documents at the
following Web site:
2 Important Release Information
This chapter contains information that is important to know for this release.
2.1 Firmware Versions
The HP XC System Software is tested against specific minimum firmware versions. Follow the
instructions in the accompanying hardware documentation to ensure that all hardware
components are installed with the latest firmware version.
The master firmware tables for this release are available at the following Web site:
The master firmware tables list the minimum firmware versions on which the Version 3.2 HP
XC System Software has been qualified. At a minimum, the HP XC system components must be
installed with these versions of the firmware.
Read the following guidelines before upgrading the firmware on any component in the hardware
configuration:
• Never downgrade to an older version of firmware unless you are specifically instructed to
  do so by the HP XC Support Team.
• The master firmware tables clearly indicate newer versions of the firmware that are known
  to be incompatible with the HP XC software. Incompatible versions are highlighted in bold
  font. Do not install these known incompatible firmware versions because unexpected
  system behavior might occur.
• There is always the possibility that a regression in functionality is introduced in a firmware
  version. It is possible that the regression could cause anomalies in HP XC operation. Report
  regressions in HP XC operation that result from firmware upgrades to the HP XC Support
  Team:
• Contact the HP XC Support Team if you are not sure what to do regarding firmware versions.
2.2 Patches
Software patches might be available for this release. Because network connectivity is not
established during a new installation until the cluster_prep utility has finished preparing
the system, you are instructed to download the patches when you reach that point in the
installation and configuration process. The HP XC System Software Installation Guide provides
more information about where to access and download software patches.
3 Hardware Preparation
Hardware preparation tasks are documented in the HP XC Hardware Preparation Guide. This
chapter contains information that was not included in that document at the time of publication.
3.1 Upgrading BMC Firmware On HP ProLiant DL140 G2 and DL145
G2 Nodes
This note applies only if the hardware configuration contains HP ProLiant DL140 G2 or DL145
G2 nodes and you are upgrading an existing HP XC system from Version 2.1 or Version 3.0 to
Version 3.2.
The HP ProLiant DL140 G2 and DL145 G2 series of hardware models must be installed with
BMC firmware version 1.25 or greater. However, the BMC version 1.25 firmware was not
supported by HP XC Version 3.0 or earlier. As a result, you must update the BMC firmware on
these nodes after you upgrade the system to HP XC Version 3.2, which is contrary to the upgrade
instructions for a typical upgrade.
Before upgrading an HP XC system to Version 3.2, contact the HP XC Support Team and request
the procedure to upgrade the BMC firmware on HP ProLiant DL140 G2 and DL145 G2 nodes:
4 Software Installation On The Head Node
This chapter contains notes that apply to the HP XC System Software Kickstart installation
session.
4.1 Manual Installation Required For NC510F Driver
The unm_nic driver is provided with the HP XC software distribution; however, it does not load
correctly.
If your system has an NC510F 10 Gigabit Ethernet card, run the following commands to load the
driver:
# depmod -a
# modprobe -v unm_nic
Then, edit the /etc/modprobe.conf file and specify unm as the driver for the eth device
assigned to the NC510F card.
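For example, if the card were assigned eth2 (a hypothetical device name; substitute whichever eth device your system assigned to the NC510F), the added /etc/modprobe.conf entry would look like the following. Note that the module loaded above is named unm_nic, so verify on your system which name the alias line expects:

```
# hypothetical example: eth2 stands in for the device assigned to the NC510F
alias eth2 unm
```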
5 System Discovery, Configuration, and Imaging
This chapter contains information about configuring the system. Notes that describe additional
configuration tasks are mandatory and have been organized chronologically. Perform these tasks
in the sequence presented in this chapter.
The HP XC system configuration procedure is documented in the HP XC System Software
Installation Guide.
The following platform-specific notes apply to the system discovery, configuration, or imaging
process.
5.1 Notes That Apply Before You Invoke The cluster_prep Utility
Read the notes in this section before you invoke the cluster_prep utility.
5.1.1 Required Task for Some NIC Adapter Models: Verify Correct NIC Device
Driver Mapping
On head nodes installed with dual-fiber NIC server adapter models NC6170 or NC7170, Ethernet
ports might be reordered between the Kickstart kernel and the subsequent HP XC kernel reboot.
Use the procedure described in this section to correct the mapping if a re-ordering has occurred.
At the time of the Kickstart installation, the fiber ports are identified as eth0 and eth1, and the
onboard ports are identified as eth2 and eth3.
The /etc/modprobe.conf file is written as follows:
alias eth0 e1000
alias eth1 e1000
alias eth2 tg3
alias eth3 tg3
You must correct this mapping if you find that upon the HP XC kernel reboot, eth0 and eth1 are
the tg3 devices, and eth2 and eth3 are the e1000 devices. To get the external network connection
working, perform this procedure from a locally connected terminal before invoking the
cluster_prep utility:
1. Unload the tg3 and e1000 drivers:
# rmmod e1000
# rmmod tg3
2. Use the text editor of your choice to edit the /etc/modprobe.conf file to correct the
mapping of drivers to devices. The section of this file should look like this when you are
finished:
alias eth0 tg3
alias eth1 tg3
alias eth2 e1000
alias eth3 e1000
3. Save your changes and exit the text editor.
4. Use the text editor of your choice to edit the
/etc/sysconfig/network-scripts/ifcfg-eth[0,1,2,3] files, and remove the
HWADDR line from each file if it is present.
5. If you made changes, save your changes and exit each file.
6. Reload the modules:
# modprobe tg3
# modprobe e1000
7. Follow the instructions in the HP XC System Software Installation Guide to complete the cluster
configuration process (beginning with the cluster_prep command).
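As a quick sanity check after editing /etc/modprobe.conf, you can compare each Ethernet alias
against the driver it should use. The following sketch is illustrative: it runs against a sample
file rather than the live /etc/modprobe.conf, and the expected mapping matches the corrected
file shown in step 2.

```shell
# Illustrative check of the corrected alias-to-driver mapping.
# A sample file stands in for the real /etc/modprobe.conf.
cat > /tmp/modprobe.conf.sample <<'EOF'
alias eth0 tg3
alias eth1 tg3
alias eth2 e1000
alias eth3 e1000
EOF

ok=1
while read -r keyword dev drv; do
  [ "$keyword" = "alias" ] || continue
  case $dev in
    eth0|eth1) want=tg3 ;;    # onboard ports after correction
    eth2|eth3) want=e1000 ;;  # fiber ports after correction
    *) continue ;;
  esac
  if [ "$drv" != "$want" ]; then
    echo "MISMATCH: $dev maps to $drv, expected $want"
    ok=0
  fi
done < /tmp/modprobe.conf.sample

[ "$ok" -eq 1 ] && echo "mapping OK"
```

On a real head node, point the loop at /etc/modprobe.conf instead of the sample file.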
5.2 Notes That Apply To The Discover Process
The notes in this section apply to the discover utility.
5.2.1 Discovery of HP ProLiant DL140 G3 and DL145 G3 Nodes Fails When
Graphics Cards Are Present
When an HP ProLiant DL140 G3 or DL145 G3 node contains a graphics card, the node often
fails to PXE boot. Even when the BIOS boot settings are configured to include a PXE boot, these
settings are often reset to the factory defaults when the BIOS restarts after saving the changes.
This action causes the discovery and imaging processes to fail.
Follow this procedure to work around the discovery failure:
1. Begin the discovery process as usual by issuing the appropriate discover command.
2. When the discovery process turns on power to the nodes of the cluster, manually turn off
the DL140 G3 and DL145 G3 servers that contain graphics cards.
3. Manually turn on power to each DL140 G3 and DL145 G3 server one at a time, and use the
cluster’s console to force each node to PXE boot. Do this by pressing the F12 key at the
appropriate time during the BIOS start up.
After you complete this task for each DL140 G3 and DL145 G3 server containing a graphics card,
the discovery process continues and completes successfully.
The work around for the imaging failure on these servers is described in “HP ProLiant DL140
G3 and DL145 G3 Node Imaging Fails When Graphics Cards Are Present” (page 27), which is
the appropriate place to perform the task.
5.3 Notes That Apply Before You Invoke The cluster_config Utility
Read the notes in this section before you invoke the cluster_config utility.
5.3.1 Adhere To Role Assignment Guidelines for Improved Availability
When you are configuring services for improved availability, you must adhere to the role
assignment guidelines in Table 1-2 in the HP XC System Software Installation Guide. Role
assignments for a traditional HP XC system without improved availability of services are slightly
different, so it is important that you follow the guidelines in Table 1-2.
5.4 Benign Message From C52xcgraph During cluster_config
You might see the following message when you run the cluster_config utility on a cluster
with an InfiniBand interconnect:
.
.
.
Executing C52xcgraph gconfigure
Found no adapter info on IR0N00
Failed to find any Infiniband ports
Executing C54httpd gconfigure
.
.
.
26
System Discovery, Configuration, and Imaging
This message is displayed because the C52xcgraph configuration script is probing the InfiniBand
switch to determine how many HCAs with an IP address are present. Because the HCAs have
not yet been assigned an IP address, C52xcgraph does not find any HCAs with an IP address
and prints the message. This message does not prevent the cluster_config utility from
completing.
To work around this issue, after the cluster is installed and configured, run
/opt/hptc/hpcgraph/sbin/hpcgraph-setup with no options.
5.5 Processing Time For cluster_config Might Take Longer On A Head
Node With Improved Availability
The cluster_config utility processing time can take approximately ten minutes longer if it
is run on a head node that is configured for improved availability with Serviceguard while the
remaining nodes of the cluster are up and running.
After the entire system has been imaged and booted, you might need to rerun the
cluster_config procedure to modify the node configuration. If the other node in the availability
set with the head node is up and running, the Serviceguard daemons attempt to establish
Serviceguard-related communication with that node when they are restarted. Because the other
node in the availability set is not actively participating in a Serviceguard cluster, it will not
respond to the Serviceguard communication.
The Serviceguard software on the head node retries this communication until the communication
times out. On a system running with the default Serviceguard availability configuration, the
timeout is approximately ten minutes.
5.6 Notes That Apply To Imaging
The notes in this section apply to propagating the golden image to all nodes, which is
accomplished when you invoke the startsys command.
5.6.1 HP ProLiant DL140 G3 and DL145 G3 Node Imaging Fails When Graphics
Cards Are Present
As described in “Discovery of HP ProLiant DL140 G3 and DL145 G3 Nodes Fails When Graphics
Cards Are Present” (page 26), the discovery and imaging processes might fail on HP ProLiant
DL140 G3 or DL145 G3 servers containing graphics cards.
The work around for the discovery failure is described in that section, and the work around
for the imaging process described in this section assumes that all nodes were discovered.
Follow this procedure to propagate the golden image to DL140 G3 and DL145 G3 servers
containing a graphics card:
1. Issue the appropriate startsys command and specify one of the DL140 G3 or DL145 G3
nodes with a graphics card in the [nodelist] option of the startsys command.
2. When power to the node is turned on, use the cluster console to connect to the node and
force it to PXE boot by pressing the F12 key at the appropriate time during the BIOS start
up.
3. When the node is successfully imaged, repeat this process for the remaining nodes containing
graphics cards.
4. When all nodes containing graphics cards are imaged, issue the startsys command without
the [nodelist] option to image all remaining nodes of the cluster in parallel.
6 Software Upgrades
This chapter contains notes about upgrading the HP XC System Software from a previous release
to this release. Notes in other chapters of this document might also apply when you upgrade
from a previous release to this release. Therefore, when performing an upgrade, make sure you
also read and follow the instructions in those chapters.
6.1 Do Not Upgrade If You Want Or Require The Voltaire InfiniBand
Software Stack
HP XC System Software Version 3.2 installs and uses the OFED InfiniBand software stack by
default. Previous HP XC releases installed the Voltaire InfiniBand software stack. If you want
to continue using the Voltaire InfiniBand software stack, do not upgrade to HP XC System
Software Version 3.2.
7 System Administration, Management, and Monitoring
This chapter contains notes about system administration, management, and monitoring.
7.1 Perform A Dry Run Before Using The si_updateclient Utility To Update
Nodes
The si_updateclient utility can leave nodes in an unbootable state in certain situations. You
can still use si_updateclient to deploy image changes to nodes. However, before you update
any nodes, HP recommends that you perform a dry run first to ensure that files in the /boot
directory are not updated. Updating files in /boot can result in nodes being unable to boot.
You can retrieve a list of the files that si_updateclient will update by specifying
--dry-run on the command line.
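For example, you can capture the dry run output and check it for /boot entries before committing
to an update. Because the remaining si_updateclient options vary by configuration, the sketch
below filters a saved sample of the output rather than invoking the utility, and the
one-file-path-per-line output format is an assumption.

```shell
# Sample dry-run output (one file path per line is assumed);
# in practice, redirect the output of si_updateclient --dry-run to this file.
cat > /tmp/dryrun.out <<'EOF'
/etc/hosts
/boot/vmlinuz-2.6.9
/usr/local/bin/tool
EOF

# Flag any file under /boot before doing the real update.
if grep '^/boot/' /tmp/dryrun.out; then
  echo "WARNING: this update would touch /boot; review before proceeding"
fi
```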
7.2 Possible Problem With ext3 File Systems On SAN Storage
Issues have been reported when an ext3 file system fills up to the point where ENOSPC is returned
to write requests for a long period of time, and the file system is subsequently unmounted. A
forced check is initiated (fsck -fy) before the next mount. It appears that the fsck checks
might corrupt the file system inode information.
This problem has been seen only on fibre channel (SAN) storage; it has not been seen with directly
attached storage or NFS storage.
For details and work arounds, consult Bugzilla number 175877 at the
following URL:
8 HP XC System Software On Red Hat Enterprise Linux
The notes in this chapter apply when the HP XC System Software is installed on Red Hat
Enterprise Linux.
8.1 Enabling 32–bit Applications To Compile and Run
To compile and run 32-bit applications on a system running HP XC System Software on Red Hat
Enterprise Linux 4 on HP Integrity platforms, use the following commands to install
glibc-2.3.4-2.25.i686.rpm from the HP XC distribution media DVD:
# mount /dev/cdrom
# cd /mnt/cdrom/LNXHPC/RPMS
# rpm -ivh glibc-2.3.4-2.25.i686.rpm
9 Programming and User Environment
This chapter contains information that applies to the programming and user environment.
9.1 MPI and OFED InfiniBand Stack Fork Restrictions
With the introduction of the OFED InfiniBand stack in this release, MPI applications cannot call
fork(), popen(), or system() between MPI_Init and MPI_Finalize. This is known to
affect some applications, such as NWChem.
9.2 InfiniBand Multiple Rail Support
HP-MPI provides multiple rail support on OpenFabric through the MPI_IB_MULTIRAIL
environment variable. This environment variable is ignored by all other interconnects. In multi-rail
mode, a rank can use up to all cards on its node, but it is limited to the number of cards on the
node to which it is connecting.
For example, if rank A has three cards, rank B has two cards, and rank C has three cards, then
connection A--B uses two cards, connection B--C uses two cards, and connection A--C uses three
cards. Long messages are striped among all the cards on that connection to improve bandwidth.
By default, multi-card message striping is off. To turn it on, specify -e MPI_IB_MULTIRAIL=N
where N is the number of cards used by a rank:
• If N <= 1, message striping is not used.
• If N is greater than the maximum number of cards M on that node, all M cards are used.
• If 1 < N <= M, message striping is used on N cards or less.
If you specify -e MPI_IB_MULTIRAIL without a value, the maximum possible number of cards is used.
On a host, all the ranks select all the cards in a series. For example, given 4 cards and 4 ranks per
host:
• rank 0 uses cards 0, 1, 2, 3
• rank 1 uses cards 1, 2, 3, 0
• rank 2 uses cards 2, 3, 0, 1
• rank 3 uses cards 3, 0, 1, 2
The order is important in SRQ mode because only the first card is used for short messages. The
selection approach allows short RDMA messages to use all the cards in a balanced way.
For HP-MPI 2.2.5.1 and older, all cards must be on the same fabric.
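The rotation above can be sketched as follows. This is an illustration of the selection order only,
not HP-MPI code: with M cards per host, rank r starts at card r mod M and wraps around.

```shell
# Illustrates the per-rank card ordering: rank r starts at card (r % M).
M=4
for rank in 0 1 2 3; do
  order=""
  i=0
  while [ "$i" -lt "$M" ]; do
    order="$order $(( (rank + i) % M ))"
    i=$((i + 1))
  done
  echo "rank $rank uses cards:$order"
done
```

Because each rank starts at a different card, short messages in SRQ mode (which use only the
first card in a rank's list) are spread across all cards.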
9.3 Benign Messages From HP-MPI Version 2.2.5.1
When running jobs with XC Version 3.2, OFED InfiniBand, and HP-MPI Version 2.2.5.1, the
following message is printed once for each rank:
libibverbs: Warning: fork()-safety requested but init failed
HP-MPI Version 2.2.5.1 has support for fork() using OFED 1.2, but only for kernels more recent
than version 2.6.12. HP XC Version 3.2 is currently based on kernel version 2.6.9. This message
is a reminder that fork() is not supported in this release.
You can suppress this message by defining the MPI_IBV_NO_FORK_SAFE environment variable,
as follows:
% /opt/hpmpi/bin/mpirun -np 4 -prot -e MPI_IBV_NO_FORK_SAFE=1 -hostlist nodea,nodeb,nodec,noded /my/dir/hello_world
12 Cluster Platform 6000
This chapter contains information that applies only to Cluster Platform 6000 systems.
12.1 Network Boot Operation and Imaging Failures on HP Integrity
rx2600 Systems
An underlying kernel issue causes MAC addresses on HP Integrity rx2600 systems to
be set to all zeros (for example, 00:00:00:00:00:00), which results in network boot and imaging failures.
To work around this issue, enter the following commands on the head node to network boot
and image an rx2600 system:
1. Prepare the node to network boot:
# setnode --resync node_name
2. Turn off power to the node:
# stopsys --hard node_name
3. Start the imaging and boot process:
# startsys --image_and_boot node_name
12.2 Notes That Apply To The Management Processor
This section describes limitations with the management processor (MP) that are expected to be
resolved when a new firmware version is available.
12.2.1 Required Task: Change MP Settings on Console Switches
Perform this task before invoking the discover command.
In order for the discovery process to work correctly using the MP in DHCP mode, you must
increase the amount of time the console switches hold MAC addresses. Increase this value from
the default of 300 seconds to 1200 seconds. Make this change only on the console switches in the
system, typically the ProCurve 26xx series.
From the ProCurve prompt, enter the configuration mode and set the mac-age-time parameter,
as follows:
# config
(config)# mac-age-time 1200
12.2.2 MP Disables DHCP Automatically
A known limitation exists with the MP firmware that causes the MP to disable DHCP
automatically.
To work around this issue, the HP XC software performs the discovery phase with DHCP enabled.
You must then perform a procedure to change the addresses on all MPs in the system to use the
address received from DHCP as a static address.
For more information on how to perform this procedure, contact the HP XC Support Team at
12.2.3 Finding the IP Address of an MP
Because the IP addresses for the MPs are being set statically for this release, if a node must be
replaced, you must set the IP address for the MP manually when the node is replaced.
To find the IP address, look up the entry for the MP in the /etc/dhcpd.conf file. The MP
naming convention for the node is cp-node_name.
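For example, you can extract the fixed address from the host entry with awk. The dhcpd.conf
fragment below is hypothetical (a node named n15, so the MP entry is cp-n15); on a real system,
run the same awk command against /etc/dhcpd.conf.

```shell
# Hypothetical dhcpd.conf fragment; the real file is /etc/dhcpd.conf.
cat > /tmp/dhcpd.conf.sample <<'EOF'
host cp-n15 {
    hardware ethernet 00:17:a4:3b:12:9e;
    fixed-address 172.21.0.15;
}
EOF

# Print the MP's IP address for node n15 (MP name convention: cp-<node_name>).
mp_ip=$(awk '/host cp-n15 /,/}/ {
  if ($1 == "fixed-address") { sub(";", "", $2); print $2 }
}' /tmp/dhcpd.conf.sample)
echo "$mp_ip"
```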
13 Integrated Lights Out Console Management Devices
This chapter contains information that applies to the integrated lights out (iLO and iLO2) console
management device.
13.1 iLO2 Devices In Server Blades Can Hang
There is a known problem with the iLO2 console management devices that causes the iLO2
devices to hang. This particular problem has very specific characteristics:
• This problem is typically seen within one or two days of the initial cluster installation.
• Most of the time, but not always, all iLO2 devices in a particular enclosure hang at the same
time.
• The problem usually affects multiple enclosures.
The work around for this problem is to completely power cycle the entire cluster (or at least all
enclosures) after the initial cluster installation is complete or if the problem is encountered. This
problem has never been reported after the power has been cycled and the cluster is in its normal
running state.
This problem is targeted for resolution in iLO2 firmware Version 1.28, which at the time of
publication had not yet been tested.
14 Interconnects
This chapter contains information that applies to the supported interconnect types:
• InfiniBand
• Myrinet
• QsNetII
14.1 InfiniBand Interconnect
The notes in this section apply to the InfiniBand interconnect.
14.1.1 enable Password Problem With Voltaire Switch Version 4.1
The instructions for configuring Voltaire InfiniBand switch controller cards require you to
change the factory default passwords for the admin and enable accounts, as follows:
Insert new (up to 8 characters) Enter password :
An issue exists where you must enter a password with exactly eight characters for the enable
account. The admin account is not affected.
If the new password does not contain exactly eight characters, the following message appears
when you try to log in with the new password:
Unauthorized mode for this user, wrong password or illegal mode in the first word.
This problem has been reported to Voltaire. As a work around, choose a password that is exactly
eight characters.
14.2 Myrinet Interconnect
The following release notes are specific to the Myrinet interconnect.
14.2.1 Myrinet Monitoring Line Card Can Become Unresponsive
A Myrinet monitoring line card can become unresponsive some period of time after it has been
set up with an IP address through DHCP. This is a problem known to Myricom. For more information,
see the following:
If the line card becomes unresponsive, re-seat it by sliding it out of its chassis slot and then
sliding it back in. You can do this while the system is up; doing so does not interfere with
Myrinet traffic.
14.2.2 The clear_counters Command Does Not Work On The 256 Port Switch
The /opt/gm/sbin/clear_counters command does not clear the counters on the Myrinet
256 port switch. The web interface to the Myrinet 256 port switch has changed from the earlier,
smaller switches.
To clear the switch counters, you must open an interactive Web connection to the switch and
clear the counters using the menu commands. The gm_prodmode_mon script, which uses the
clear_counters command, does not clear the counters periodically, as it does on the smaller
switches.
This problem will be resolved in a future software update from Myricom.
14.3 QsNetII Interconnect
The following release notes are specific to the QsNetII® interconnect.
14.3.1 Possible Conflict With Use of SIGUSR2
The Quadrics QsNetII software internally uses SIGUSR2 to manage the interconnect. This can
conflict with any user applications that use SIGUSR2, including debuggers.
To work around this conflict, set the environment variable LIBELAN4_TRAPSIG for the application
to a signal number other than the default value of 12, which corresponds to SIGUSR2. Doing
this instructs the Quadrics software to use the new signal number, and SIGUSR2 can once again
be used by the application. Signal numbers are defined in the /usr/include/asm/signal.h
file.
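For example, to move the Quadrics trap signal to SIGUSR1 (signal 10 on Linux x86 and x86_64;
verify the value in /usr/include/asm/signal.h on your system), set the variable before
launching the application. The launch command shown in the comment is a placeholder.

```shell
# Redirect the Quadrics trap signal to SIGUSR1 (10 on Linux x86/x86_64),
# leaving SIGUSR2 (12) free for the application.
export LIBELAN4_TRAPSIG=10
echo "LIBELAN4_TRAPSIG=$LIBELAN4_TRAPSIG"
# Then launch the application as usual, for example:
#   srun ./my_app
```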
14.3.2 The qsnet Database Might Contain Entries To Nonexistent Switch Modules
Depending on the system topology, the qsnet diagnostics database might contain entries for
nonexistent switches.
This issue is manifested as errors reported by the /usr/bin/qsctrl utility similar to the
following:
# qsctrl
qsctrl: failed to initialise module QR0N03: no such module (-7)
.
.
.
In the previous example, the switch_modules table in the qsnet database is populated with
QR0N03 even though the QR0N03 module is not physically present. This problem has been
reported to Quadrics, Ltd.
To work around this problem, delete the QR0N03 entry (and any other nonexistent switch entries)
from the switch_modules table, and restart the swmlogger service:
# mysql -u root -p qsnet
mysql> delete from switch_modules where name="QR0N03";
mysql> quit
# service swm restart
In addition to the previous problem, the IP address of a switch module might be incorrectly
populated in the switch_modules table, and you might see the following message:
# qsctrl
qsctrl: failed to parse module name 172.20.66.2
.
.
.
Resolve this issue by deleting the IP address from the switch_modules table and restarting
the swmlogger service:
# mysql -u root -p qsnet
mysql> delete from switch_modules where name="172.20.66.2";
mysql> quit
# service swm restart
NOTE: You must repeat the previous procedure if you invoke the cluster_config utility
again and you choose to re-create the qsnet database during the cluster_config operation.
15 Documentation
This chapter describes known issues with the HP XC documentation.
15.1 Documentation CD Search Option
If you are viewing the main page of the HP XC Documentation CD, you cannot perform a
literature search from the Search: option box at the top of the page. To work around this problem,
click the link for More options. The Advanced search options page is displayed, and you can
perform the search from the advanced page.
15.2 HP XC Manpages
The notes in this section apply to the HP XC manpages.
15.2.1 New device_config.8
A manpage is available for the device_config command. The device_config command
enables you to modify the device configuration information in the HP XC command and
management database (CMDB). Uses for this command include configuring a range of default
external network interface cards (NICs) across multiple nodes and configuring one or two
additional, external NICs on the same node.
15.2.2 Changes to ovp.8
Note the following two changes to the ovp(8) manpage:
1. Under -o options, --opts_for_test[=]options, add the following before
--user=username:
--queue LSF_queue
Specifies the LSF queue for the performance health tests.
2. Change the following portion of the -v component, --verify[=]component as follows:
OLD:
For all users:
This option takes the form --verify=perf_health/test
cpu
Tests CPU core performance using the Linpack benchmark
NEW:
For all users:
This option takes the form --verify=perf_health/test
NOTE: Except for the network_stress and network_bidirectional
tests, these tests only apply to systems that install
LSF-HPC incorporated with SLURM. The network_stress and
network_bidirectional tests also function under Standard
LSF.
cpu
Tests CPU core performance using the Linpack benchmark.
15.2.3 New preupgradesys-lxc.8
The preupgradesys-lxc(8) manpage was not included in the HP XC Version 3.2 distribution.
preupgradesys-lxc(8)
NAME
preupgradesys-lxc - Prepares a system for an XC software upgrade
SYNOPSIS
Path: /opt/hptc/lxc-upgrade/sbin/preupgradesys-lxc
DESCRIPTION
The preupgradesys-lxc command is one of several commands that are
part of the process to upgrade HP XC System Software on Red Hat Enterprise
Linux to the next release of HP XC System Software on Red Hat Enterprise Linux.
The software upgrade process is documented in the HP XC System Software
Installation Guide. This command is never run for any reason other than during a
software upgrade.
The preupgradesys-lxc command prepares your system for an XC software upgrade
by modifying release-specific files, recreating links where required,
and making backup copies of important files. It also removes specific XC
RPMs that do not upgrade properly. Running preupgradesys-lxc is a
required task before beginning a software upgrade.
The preupgradesys-lxc command does not prepare your system for upgrading Red Hat
Enterprise Linux RPMs.
OPTIONS
The preupgradesys-lxc command does not have any options.
FILES
/var/log/preupgradesys-lxc/preupgradesys-lxc.log
Contains command output and results
SEE ALSO
upgradesys-lxc(8)
HP XC System Software Installation Guide
15.2.4 New upgradesys-lxc.8
The upgradesys-lxc(8) manpage was not included in the HP XC Version 3.2 distribution.
upgradesys-lxc(8)
NAME
upgradesys-lxc - For XC software upgrades, this command upgrades and migrates
configuration data to the new release format
SYNOPSIS
Path: /opt/hptc/lxc-upgrade/sbin/upgradesys-lxc
DESCRIPTION
The upgradesys-lxc command is one of several commands that are
part of the process to upgrade HP XC System Software on Red Hat Enterprise
Linux to the next release of HP XC System Software on Red Hat Enterprise Linux.
The software upgrade process is documented in the HP XC System Software
Installation Guide. This command is never run for any reason other than
during a software upgrade.
The upgradesys-lxc utility is run immediately after the head node is
upgraded with the new XC release software and any other required
third-party software products. The upgradesys-lxc utility performs the
following tasks to upgrade your system:
o Makes a backup copy of the database from the previous release.
o Modifies attributes in the database to signify that the system has been upgraded.
o Removes RPMs from the previous release that are no longer supported in the new release.
o Executes internal migration scripts to migrate system configuration data to the new release format.
OPTIONS
The upgradesys-lxc command does not have any options.
FILES
/opt/hptc/lxc-upgrade/etc/gupdate.d
Location of migration scripts
/opt/hptc/etc/sysconfig/upgrade/upgradesys.dbbackup-date_time_stamp
Location of database backup
/var/log/upgradesys-lxc/upgradesys-lxc.log
Contains the results of the RPM upgrade process and lists
customized configuration files
SEE ALSO
preupgradesys-lxc(8)
HP XC System Software Installation Guide