Review of Moab HPC Suite

I’ve been using Moab HPC Suite for more than a year now and have finally found some time to write up a complete, in-depth review of all its features. Hopefully this is helpful for those looking at incorporating Moab into their environment.
Moab HPC Suite

Moab Adaptive HPC Suite is a complete solution for managing an HPC environment, with support for workload management, job scheduling and an adaptive OS switcher for Linux & Windows workloads all rolled into one.

Moab Workload Manager is a highly advanced scheduling and management system designed for clusters, grids, and on-demand/utility computing systems. At a high level, Moab applies site policies and extensive optimizations to orchestrate jobs, services, and other workload across the ideal combination of network, compute, and storage resources. Moab enables true adaptive computing allowing compute resources to be customized to changing needs and failed systems to be automatically fixed or replaced. Moab increases system resource availability, offers extensive cluster diagnostics, delivers powerful QoS/SLA features, and provides rich visualization of cluster performance through advanced statistics, reports, and charts.

API

Moab provides an API with various bindings that support Job Control and Job Management functions. C, Perl and Java bindings are currently provided. Moab also has an extensive CLI with XML support that acts as a complete interface into Moab.

Enterprise Interoperability

Moab sports an easy-to-use, pluggable interface that allows integration with third-party schedulers, provided the third-party scheduler supports a CLI or API for Job Management functions. Moab can schedule, monitor, and manage jobs on external schedulers as well as provide a single view to Administrators. This allows existing scheduling and resource management infrastructure to be managed through Moab.

Moab can also interface with multiple Resource Managers and schedule jobs on resources contained therein. While Moab makes the scheduling and allocation decisions, the Resource Managers provide Moab with input on current resource availability, but the Resource Manager itself is in charge of orchestrating the actual job staging and job execution. Moab, by default, supports Torque, SGE and Slurm. Other Resource Managers can be added with some integration code.

Moab also coordinates logical resources (database connections, HTTP connections, networks, license managers etc.). Moab can also pull information about a resource from multiple independent sources and aggregate them to provide a single status.

Multiple instances of Moab can be layered in a hierarchical fashion or set up as peers, with the ability to integrate the monitoring of the different peers/instances.

High availability and fail over

The Moab architecture provides complete support for high availability, with no single point of failure. A failover (backup) instance of Moab can be created that automatically takes control if the primary instance fails. All jobs continue to run under their respective Resource Manager during a failover.

Internationalization and Localization

Moab does not support any form of Internationalization or localization.

Product Requirements

Moab can run on minimal, shared hardware. The same applies to Torque and its agents. For ideal performance, however, the Moab server needs to be on a standalone node.

Job Definition

Moab supports creation of ad-hoc and predefined jobs through job templates, but it lacks some functionality for defining job templates for later reuse. Job templates support standard job parameters and resources that can be requested for a job and used in job matching. Job templates can only be created or deleted through the configuration or the CLI; this is not supported through the GUI. Moab supports the specification of various resource parameters during job submission: nodes, memory, CPU, generic resources, wall time, node features, start time, etc. Moab supports options for passing runtime parameters to jobs, but they are limited in scope when used with job templates. Job templates may have been improved in a later release; I have not had a chance to try the latest version.
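
To make the submission-time resource parameters concrete, here is a minimal sketch in Python that assembles an msub command line with a Torque-style resource list. The helper name build_msub_args and the script name job.sh are my own placeholders, and the snippet only builds the argument list rather than running anything; check the exact resource keywords against the documentation for your Moab and Resource Manager versions.

```python
def build_msub_args(script, nodes, procs_per_node, walltime):
    """Build an argument list for an msub submission (hypothetical helper).

    Assumes a Torque-style resource list: nodes=N:ppn=P,walltime=HH:MM:SS.
    """
    resource_list = f"nodes={nodes}:ppn={procs_per_node},walltime={walltime}"
    return ["msub", "-l", resource_list, script]


# Example: request 2 nodes with 4 processors each for one hour.
args = build_msub_args("job.sh", nodes=2, procs_per_node=4, walltime="01:00:00")
print(" ".join(args))  # msub -l nodes=2:ppn=4,walltime=01:00:00 job.sh
```

Returning a list instead of a single string means the result can be handed to something like subprocess.run without shell quoting problems.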

Job Management

Moab provides all the basic job management functions such as start, stop, cancel, hold, restart, suspend/resume. All of these functions are available through the GUI, CLI, and API. Moab can capture the execution status of a job and use it for conditional branching in workflow jobs. It can also provide the user with the exit status.
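
The idea of branching on a job's execution status can be illustrated with a small Python sketch. This is not how Moab implements workflows; it simply runs a stand-in "job" as a child process and picks the next step from its exit status. The function names and the step labels are hypothetical.

```python
import subprocess
import sys


def run_step(python_code):
    """Run a tiny Python snippet as a stand-in for a job; return its exit status."""
    return subprocess.run([sys.executable, "-c", python_code]).returncode


def choose_next_step(exit_status, on_success, on_failure):
    """Conditional branch: a zero exit status is treated as success."""
    return on_success if exit_status == 0 else on_failure


# A step that fails with exit status 3 sends the workflow down the failure branch.
status = run_step("import sys; sys.exit(3)")
next_step = choose_next_step(status, on_success="post_process", on_failure="cleanup")
print(status, next_step)
```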

Job Monitoring

Job status and stages can be monitored through the GUI, API and CLI. The job logs can only be viewed through the web interface or by logging into the scheduler node. The list of resources utilized by a job can be monitored through the GUI, API and CLI. Jobs can be viewed based on the following filters: user, groups, compute nodes, OS or architecture, queues and many more.

Platform and OS Support

Moab Server: Linux/Solaris on x86 and 64-bit
Resource Managers: Torque (Linux/Solaris), SGE (Linux/Solaris/Windows)
Desktop GUI: Any OS that has a JVM
Web GUI: Any modern web browser (client) and any OS that Java supports (server)

Prioritization and Queues

Moab has a full set of features for job prioritization. It supports priorities based on credentials, resources, usage, and job attributes. Priorities of jobs can be changed while the job is queued and user priorities can be provided at runtime. The effective priority of a job is the sum of the priorities of the various individual factors. Moab also automatically increases the priorities of jobs based on their queue time to avoid starvation.
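
As a rough illustration of how an additive priority with queue-time aging behaves, here is a minimal Python sketch. The factor names, the numbers, and the aging_per_minute parameter are assumptions made up for the example; they are not Moab's actual configuration keys or defaults.

```python
def effective_priority(factors, queued_minutes, aging_per_minute=1.0):
    """Sum the individual priority factors and add a queue-time component.

    `factors` maps a hypothetical factor name (credential, resource, usage,
    job attribute) to its priority contribution. The queue-time term grows
    while the job waits, which is what protects it from starvation.
    """
    return sum(factors.values()) + aging_per_minute * queued_minutes


job_factors = {"credential": 100, "resource": 20, "usage": -30}
print(effective_priority(job_factors, queued_minutes=0))
print(effective_priority(job_factors, queued_minutes=45))
```

With these made-up numbers, a job that starts behind a competitor will eventually overtake it simply by waiting, as long as the competitor was submitted later.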

Moab supports the concept of queues which act as a conduit for jobs into the system. Queues can be assigned resources and settings and provide functionality for prioritization, throttling, and effective utilization of the available resources. Authorization can be attached to the queues by providing credentials that may have access to the queues.

Reporting Capabilities

Moab has some reporting capabilities within the Java client. There are some basic charts on usage like total processors used, total wall time, average queue time of jobs, and number of jobs executed during a period in time. More advanced data can also be generated as reports that can be exported as PDF. All the data is available and can be exported into an external reporting system. The reporting capabilities with the Java Client are not very robust and lack proper visualization.

Reservation

Moab provides advanced capabilities for reserving resources for any period of time. Moab supports three types of reservations: single one-time reservations, multiple recurring reservations, and infinite reservations. Moab guarantees the availability of the reserved resources when a reservation starts. The advanced reservations enable Moab to backfill jobs and to provide deadline-based scheduling and QoS support. Recurring jobs can make use of reservations to guarantee availability of resources. Moab also provides support for kicking off jobs when a reservation starts.
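
To show why reservations make backfill possible, here is a simplified Python sketch of a backfill check. It is my own illustration, not Moab's algorithm: it assumes a single upcoming reservation, a single pool of interchangeable free nodes, and times expressed in minutes. The Reservation class and can_backfill function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    start: int  # minutes from time zero at which the reservation begins
    nodes: int  # number of nodes the reservation must have available


def can_backfill(job_nodes, job_walltime, free_nodes, now, reservation):
    """Return True if the job can start now without delaying the reservation."""
    if job_nodes > free_nodes:
        return False  # not enough idle nodes to start the job at all
    finishes_in_time = now + job_walltime <= reservation.start
    leaves_enough_nodes = free_nodes - job_nodes >= reservation.nodes
    return finishes_in_time or leaves_enough_nodes


rsv = Reservation(start=60, nodes=8)
print(can_backfill(4, 30, free_nodes=8, now=0, reservation=rsv))  # True
print(can_backfill(4, 90, free_nodes=8, now=0, reservation=rsv))  # False
```

The short job fits in the gap before the reservation starts, while the long one would still be holding nodes that the reservation needs.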

Resource Management

Moab has a pluggable architecture for interfacing with external Resource Managers. Using Job templates, certain resource requirements can be matched based on the job properties. New resources can be defined, and depending on the specific Resource Manager, various actions such as adding/removing resources can take place. Resources can be assigned to queues or grouped based on custom attributes that can be attributed to a resource. Moab supports a feature to offline a resource for maintenance. That will prevent Moab from utilizing that resource for scheduling decisions.

Scalability and Performance

Moab is designed to run thousands of jobs per hour across thousands of nodes. I could not verify this because I did not have the resources; the largest setup I tested had 64 nodes. Moab supports various configurations to serve the needs of the environment. Moab can be configured for long-running jobs or high-throughput jobs, thereby providing appropriate response times and latency based on the configuration.

Scheduling

Moab has an extensive set of scheduling algorithms. It can schedule batch jobs, parallel jobs, and service workload. Moab’s support for parallel jobs relates directly to the parallel support provided by the Resource Managers. Torque and SGE both support parallel and array jobs.

Time-based recurring jobs are supported in Moab through the use of Triggers and Reservations. Using Reservations guarantees availability of resources and job start time, which might be critical for recurring jobs. Moab also has very fine-grained calendar support, but a lot of that functionality can only be achieved through the Moab configuration settings, which might limit users' flexibility in calendar-based scheduling. File arrival jobs are not supported directly in Moab, but the functionality can be achieved using triggers.

Moab supports various kinds of event-based scheduling in the form of triggers. Some of the supported events are: threshold limits in reservations, job hold/preempt scenarios, and scheduler events. Moab can schedule workflows (which can be a group of jobs). Moab supports workflow creation through templates, the API and custom processes.

Security

Moab defers to the local system for authentication; hence LDAP or other authentication systems can be utilized. Moab provides fine-grained authorization. However, it can only be configured within Moab. The complete framework (GUI, CLI and API) utilizes this one unified authentication and authorization mechanism. It supports 5 roles which can be customized to suit your authorization needs.

The communication between the GUI and the Moab Scheduler is through SSH tunneling. The Torque agents interact over SSH and SCP, which requires keys to be set up for passwordless access.

SLA/QoS based Scheduling

SLA and QoS are built into Moab’s core scheduling. They are provided through reservations, job backfilling, and, for service workloads, by automatically increasing the number of service instances to meet a specified SLA.

User Interface

Moab provides a thick Java Client and a Web Portal. The Web Portal is intended for end users. As such, it only provides job submission, monitoring, and the ability to create reservations. The Java Client has more functionality and is geared towards operations. The supported features are: Job submission, monitoring, resource monitoring, management of reservations, credentials, triggers, setting node features, creation of partitions, and reporting.

Dynamic Service Management

Moab has a feature to manage service workloads. A web farm or application server cluster is an example of a service workload. Moab can manage the size of the service clusters based on performance parameters that can be passed back from the service. It administers this workload through dynamic jobs. Moab, based on performance information and SLA for the services, can automatically grow/shrink the service nodes. This allows Moab to meet SLA needs while maximizing the utilization of the resources.
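
The grow/shrink decision can be pictured with a short Python sketch. This is a toy rule of my own, not Moab's policy engine: it assumes the service reports a single latency figure, that the SLA is a latency ceiling, and that the cluster changes by one instance at a time. All names and thresholds are hypothetical.

```python
def target_instances(current, observed_latency_ms, sla_latency_ms,
                     min_instances=1, max_instances=16, shrink_ratio=0.5):
    """Decide how many service instances to run for the next interval.

    Grow by one when the SLA is violated, shrink by one when latency is
    comfortably below the SLA (under shrink_ratio of it), else hold steady.
    """
    if observed_latency_ms > sla_latency_ms:
        desired = current + 1
    elif observed_latency_ms < sla_latency_ms * shrink_ratio:
        desired = current - 1
    else:
        desired = current
    # Never go outside the configured bounds.
    return max(min_instances, min(max_instances, desired))


print(target_instances(4, observed_latency_ms=250, sla_latency_ms=200))  # 5
print(target_instances(4, observed_latency_ms=80, sla_latency_ms=200))   # 3
```

The gap between the grow and shrink thresholds keeps the cluster from flapping when latency hovers near the SLA, and the released nodes become available for other workload.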

Node Provisioning / Hybrid environments

Moab can interface with xCAT or VMware for node provisioning. Using this functionality, Moab can manage a hybrid workload environment consisting of Linux and Windows jobs. Moab switches between the two platforms based on the requirements of the jobs in the queue.

Job Templates

Moab’s Job Template functionality seems to be new and not well integrated with the core job submission. For instance, a job script specified in the template does not get executed when the template is invoked. The main reason is that Moab’s view of the job template is just as a template and not as a complete job definition. The templates can only be created through the Moab configuration or the CLI.

Workflow definitions

The main problem is that there are several ways to define a workflow and no single consistent method that works as expected. There’s no support for creating a workflow through the GUI, and there is also no way to pass in a workflow as XML (which could, ideally, be generated from an external system). The core concept of workflows and job dependencies seems to be well defined in the core system; however, the ability to create and execute one is missing.

Workflow submission

Submission of workflows using templates is not a good idea as each workflow template will have to be linked with an account or some other parameter. Dynamic workflows can be created using Triggers and System Jobs, but there’s no facility to create and submit such workflows.

GUI

The functionality available in the Java client is not full featured. Certain job management functions don’t work from the GUI. Only a limited subset of Reservations and Triggers functionality is supported from the GUI. Job Templates and workflows cannot be created/deleted from the GUI.
