PTP/designs/scalability

Overview

PTP is designed to interact with a variety of computer systems in order to aid the development of parallel and scientific applications. Many of these systems are extremely large, comprising many tens or hundreds of thousands of cores. In addition, the applications that run on these systems can also be very large, not just in terms of the number of lines of code, but also in their memory usage and the number of computational tasks required to deliver adequate performance.

Providing a development environment that allows developers to maximize their productivity in these environments is a challenging task. Not only do the applications need to scale, but the tools that are required to monitor, debug, and analyze these applications must also scale.

The following document outlines the design elements of PTP that are key to ensuring that it will continue to provide a useful and productive development platform, regardless of the system or application scaling requirements.

Further Reading

Key designs that are referenced in this document are as follows:

Schemas used by the framework are as follows:

  • RM Schema - Schema for the target system configuration files
  • LML Schema - Schema for the LML protocol

Resource Manager Framework

The primary mechanism for interacting with a target system is via the resource manager framework. This framework allows developers to launch and monitor applications on a variety of different computer systems, and consequently must support interaction via a range of different job schedulers and runtime systems.

Previous implementations of the RM framework have suffered from two primary scaling problems. The first of these was a monitoring system that displayed a representation of the entire computer system layout down to the node level, as well as individual tasks for each job run. While this was adequate when systems and jobs comprised only a few thousand elements, the current generation of machines has significantly exceeded this threshold. The second issue was the non-scalability of the proxy communication protocol used to obtain monitoring information from the target system. While attempts were made to add scaling features to the protocol (including a variety of compression techniques), these added significant complexity, and ultimately reduced the overall reliability of the system.

In order to overcome these issues, a new framework is required that provides features that specifically address the scalability requirements of the platform. The major architectural improvement provided by this framework is to separate the control and monitoring aspects of interaction with the target system. This separation ensures that the scalability-sensitive monitoring functions do not impact the less sensitive control functions. The key design elements of this framework are detailed below.

Control Framework

The goals of the new control framework are to eliminate the need for proxy agents to be run on the target system, simplify the process of adding support for new resource managers, and ensure that the framework is extensible enough to cater for virtually any type of system. In order to achieve these goals, we have completely redesigned and reimplemented the old framework and provided the ability to add target system support purely through the use of XML configuration files.

The RM control framework provides a set of functions that allow the user to control application launch on the target system without the need for complex proxy agents or protocols. For simple interactive runs, the control framework will be the equivalent of issuing a shell command that starts the application. For more complex batch systems, the control framework is capable of generating the appropriate batch script and submitting this to the job scheduler.

Design Description

The control framework employs a model-view-controller design pattern that utilizes the Java Architecture for XML Binding (JAXB) to enable support for a target system to be completely specified via an XML configuration file. This includes the commands that are required for job submission, definition of the scripts (if required), and a description of the user interface that will be presented to the user in order to specify job attributes and parameters.

The advantage of employing an XML-based configuration approach is simplicity and extensibility. In prior designs, it would often take 6 months or more to add support for a new target system. This is because each implementation needed to be coded from scratch, and the complexity of some resource managers (such as LoadLeveler) required considerable effort in order to be fully supported. By contrast, support for the same resource manager took 1-2 weeks to implement using the XML approach. The XML schema is also powerful enough to support virtually any type of resource manager, but can be relatively easily extended if additional functionality is required.

The following diagram shows the overall design of the control framework.

Control arch.png

The central component of the control framework is the JAXB model. This is a representation of the XML configuration for a target system using Java objects. Each XML entity type is represented by a Java class that is generated from the RM Schema document. This model is then used to drive the view creation and controller components of the framework.

The model is created when the user creates a new launch configuration and selects a target system configuration (or opens or runs an existing launch configuration). At this time, the XML configuration file is validated against the schema, then unmarshalled to generate the model. The model is always associated with a LaunchController object, since certain commands (such as the start-up-command) may need to be run prior to using the model. Since the model is a static object, the framework also provides an attribute map for storing dynamic data using the RMVariableMap class. This is a map containing attributes that are defined by the XML configuration, and that are used to store the run-time data associated with the model.
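The validate-then-load step described above can be sketched using only the JDK's javax.xml.validation API. This is an illustration under stated assumptions, not the PTP implementation: the miniature schema and the element names (resource-manager, submit-command) are hypothetical stand-ins for the real RM Schema, and a plain DOM parse stands in for JAXB unmarshalling so that the example has no external dependencies.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ConfigLoader {
    // Hypothetical miniature schema; the real RM Schema is far richer.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='resource-manager'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='submit-command' type='xs:string'/>"
      + "</xs:sequence>"
      + "<xs:attribute name='name' type='xs:string' use='required'/>"
      + "</xs:complexType></xs:element></xs:schema>";

    // Validate the configuration first, then build the in-memory model.
    static Document load(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        // With no ErrorHandler set, validate() throws on the first schema violation.
        schema.newValidator().validate(new StreamSource(new StringReader(xml)));
        // Stand-in for JAXB unmarshalling: parse the validated text into a DOM tree.
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Convenience wrapper that reports validity instead of throwing.
    static boolean isValid(String xml) {
        try {
            load(xml);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = load("<resource-manager name='demo'>"
                + "<submit-command>qsub</submit-command></resource-manager>");
        System.out.println(doc.getDocumentElement().getAttribute("name"));
    }
}
```

Validating before building the model means that a malformed configuration is rejected up front, rather than surfacing later as a partially populated model.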

The Parallel Application launch configuration Resources tab uses the model to generate the SWT widgets and layout information necessary to populate the tab with controls. The controls are also "wired up" so that events generated by one control can be used to modify the state of other controls on the tab. Widgets on the Resources tab generally store their values directly into the attribute map.

When the user clicks on the Run button, the LaunchController#submitJob() method is invoked to manage communication with the target system. The LaunchController uses the model to first generate any scripts necessary for the job submission, then these scripts can be staged on the target system if desired. Once the scripts have been generated, the controller runs the pre-launch-command if required. Next, the submission command is generated by appending each argument in turn, while expanding references to attributes from the map. The command is then sent to the target system via the Remote Services API along with any required environment variables. If the model specifies that a parser is to be used, this is configured and attached to the stdout or stderr (or both) from the command.
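The argument-by-argument command construction described above can be sketched as follows. This is a minimal illustration, not the PTP code: the ${name} placeholder syntax, the CommandBuilder class, and the attribute names used in main are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommandBuilder {
    // Hypothetical reference syntax: ${attribute_name}
    private static final Pattern REF = Pattern.compile("\\$\\{([A-Za-z0-9_.]+)\\}");

    // Replace every attribute reference with its value from the map.
    // Unknown attributes expand to the empty string.
    static String expand(String template, Map<String, String> attrs) {
        Matcher m = REF.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = attrs.getOrDefault(m.group(1), "");
            // quoteReplacement keeps '$' and '\' in values from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Build the command by appending each expanded argument in turn,
    // dropping arguments that expand to nothing.
    static List<String> build(List<String> argTemplates, Map<String, String> attrs) {
        List<String> command = new ArrayList<>();
        for (String template : argTemplates) {
            String arg = expand(template, attrs);
            if (!arg.isEmpty()) {
                command.add(arg);
            }
        }
        return command;
    }

    public static void main(String[] args) {
        List<String> templates = List.of("qsub", "-N", "${job_name}", "-l", "nodes=${nodes}", "${script}");
        Map<String, String> attrs = Map.of("job_name", "test", "nodes", "4", "script", "/tmp/run.sh");
        System.out.println(build(templates, attrs));
    }
}
```

Keeping the command as a list of arguments, rather than one concatenated string, avoids quoting problems when attribute values contain spaces.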

Job status is monitored asynchronously using the ICommandJobStatus interface. This interface is implemented by the CommandJobStatus class which will check for state changes in the running job and for the presence of any output files generated by the job. This is used by the controller to notify the launch system that the job has started running, or has completed, etc.
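The asynchronous status checking described above might look roughly like the following sketch. It is not the ICommandJobStatus or CommandJobStatus code: the JobStatusPoller class, its State enum, and the polling approach are assumptions chosen to illustrate the idea of notifying a listener only when the job state changes.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

class JobStatusPoller {
    // Hypothetical, simplified set of job states.
    enum State { SUBMITTED, RUNNING, COMPLETED }

    // Poll the probe on a background thread and notify the listener
    // each time the state changes, until the job has completed.
    static Thread watch(Supplier<State> probe, Consumer<State> listener, long intervalMillis) {
        Thread poller = new Thread(() -> {
            State last = null;
            try {
                while (last != State.COMPLETED) {
                    State now = probe.get();
                    if (now != last) {
                        listener.accept(now);
                        last = now;
                    }
                    if (last != State.COMPLETED) {
                        Thread.sleep(intervalMillis);
                    }
                }
            } catch (InterruptedException e) {
                // Interruption is treated as a request to stop watching.
                Thread.currentThread().interrupt();
            }
        }, "job-status-poller");
        poller.setDaemon(true);
        poller.start();
        return poller;
    }

    public static void main(String[] args) throws InterruptedException {
        java.util.Iterator<State> states =
                java.util.List.of(State.SUBMITTED, State.RUNNING, State.COMPLETED).iterator();
        Thread poller = watch(states::next, s -> System.out.println("job is now " + s), 10);
        poller.join();
    }
}
```

Because the caller only receives change notifications, the launch system does not need to block while the job is queued or running.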

When the LaunchController is no longer required, the framework will call the LaunchController#stop() method. This will cancel any running threads and dispose of any resources associated with the job launch.

The XML configuration file schema is described in the RM Schema document. A description of the XML configuration file format is contained in the Resource Manager XML Guide section of the PTP Developer Guide.

Plugin Layout

The control framework is defined in five plugins, with sample configuration files contained in a sixth plugin. These are:

org.eclipse.ptp.rm.jaxb.configs 
Sample configuration files
org.eclipse.ptp.rm.jaxb.control.core 
Implementation of the job submission system
org.eclipse.ptp.rm.jaxb.control.ui 
Implementation of the dynamic components of the launch configuration Resources tab
org.eclipse.ptp.rm.jaxb.core 
Implementation of the JAXB classes
org.eclipse.ptp.rm.jaxb.ui 
Implementation of the configuration import wizard, as well as UI helper classes
org.eclipse.ptp.rm.launch 
Implementation of an Eclipse launch delegate and support for dynamic tab generation

Each of these plugins will be described in more detail below.

org.eclipse.ptp.rm.jaxb.configs

This plugin contains the sample configuration files to support a range of generic system types. These include:

  • IBM LoadLeveler
  • IBM Parallel Environment
  • Grid Engine
  • MPICH2
  • Open MPI
  • PBS
  • Generic remote launch
  • SLURM
  • TORQUE

Users are presented with a choice of these target system configurations on the Resources tab of the Parallel Application launch configuration (more detail below).

org.eclipse.ptp.rm.jaxb.control.core

This plugin contains the implementation of control elements of the XML schema, which are used for resource discovery and job launch. The plugin provides the following main classes:

LaunchController 
Component of the JAXB framework responsible for handling job submission, termination, suspension and resumption. Also provides on-demand job status checking.
LaunchControllerManager 
Class for managing LaunchControllers. Used to obtain a LaunchController given the target configuration and remote connection information.
JobStatusMap 
Extends the java.lang.Thread class to provide asynchronous handling of notification of job status changes.
ManagedFilesJob 
Extends the org.eclipse.core.runtime.jobs.Job class to provide a staged file service that allows scripts (and other files) to be staged to the target machine, then automatically cleaned up when the submission is completed.
CommandJob
Extends the org.eclipse.core.runtime.jobs.Job class to provide the primary mechanism for executing external commands on the target system. Handles both batch and interactive job submission, as well as debug job submission. Also prepares the remote environment based on the attributes available to the launch, and attaches regular expression parsers to the stdout and stderr of the remote command.
CommandJobStatus 
Handles changes in the status of CommandJob execution, waiting for receipt of job IDs, and waiting for output files to be generated.
RMVariableMap 
Maintains the control "environment" containing the attributes that are specified via the XML configuration.
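As an illustration of the regular expression parsing that CommandJob attaches to a command's output, the sketch below extracts a job ID from the stdout of a submission command. The JobIdParser class is hypothetical and is not part of PTP; the pattern matches the "Submitted batch job <id>" line printed by SLURM's sbatch, and in the framework itself such patterns would come from the XML configuration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JobIdParser {
    // Pattern for the confirmation line printed by SLURM's sbatch.
    static final Pattern SUBMITTED = Pattern.compile("^Submitted batch job (\\d+)$");

    // Scan the stream line by line and return the first job ID found.
    static Optional<String> parse(Reader stdout) throws IOException {
        BufferedReader in = new BufferedReader(stdout);
        String line;
        while ((line = in.readLine()) != null) {
            Matcher m = SUBMITTED.matcher(line.trim());
            if (m.matches()) {
                return Optional.of(m.group(1));
            }
        }
        return Optional.empty();
    }

    // Convenience overload for text that has already been captured.
    static Optional<String> parse(String text) {
        try {
            return parse(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("Submitted batch job 4711\n").orElse("no job ID found"));
    }
}
```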

Job launch and control is initiated by obtaining a LaunchController object using LaunchControllerManager#getLaunchController(). Job submission is performed using the LaunchController#submitJob() method, and job control via the LaunchController#control() method. The LaunchController#start() method must be called prior to using any other methods on LaunchController. The LaunchController#stop() method should be called once the LaunchController is no longer required in order to stop any threads and release resources.
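The required call order (start, then submit and control, then stop) can be illustrated with a small stand-in class. SketchController below is hypothetical: its method signatures and job ID format are invented for the example and do not reflect the real LaunchController API. It only demonstrates the lifecycle contract stated above.

```java
import java.util.ArrayList;
import java.util.List;

class SketchController {
    private boolean started;
    private final List<String> jobs = new ArrayList<>();

    // Must be called before any other method.
    void start() {
        started = true;
    }

    // Pretend to submit a job and return a made-up job ID.
    String submitJob(String command) {
        requireStarted();
        String id = "job-" + (jobs.size() + 1);
        jobs.add(id);
        return id;
    }

    // Pretend to apply an operation (for example "terminate") to a known job.
    void control(String jobId, String operation) {
        requireStarted();
        if (!jobs.contains(jobId)) {
            throw new IllegalArgumentException("unknown job " + jobId);
        }
        // A real controller would forward the operation to the scheduler here.
    }

    // Release everything associated with this controller.
    void stop() {
        jobs.clear();
        started = false;
    }

    private void requireStarted() {
        if (!started) {
            throw new IllegalStateException("start() must be called first");
        }
    }

    public static void main(String[] args) {
        SketchController controller = new SketchController();
        controller.start();
        try {
            String id = controller.submitJob("mpirun -np 4 ./a.out");
            controller.control(id, "terminate");
        } finally {
            // Always stop the controller so that threads and resources are released.
            controller.stop();
        }
    }
}
```

Placing the stop() call in a finally block, as in main, ensures that cleanup happens even if submission fails.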

org.eclipse.ptp.rm.jaxb.control.ui

This plugin contains the implementation of UI elements of the XML schema, which are used for specifying the resources required for job launch. The plugin classes are accessed by the dynamic launch configuration tab implementations in the org.eclipse.ptp.rm.launch plugin.

The plugin provides the following main classes:

ControlStateListener 
Implements the control state rules when a button event is associated with a widget.
ValueUpdateHandler 
Class for managing widget updates. Whenever a widget commits a change, this class refreshes all widget values from the underlying attributes and signals an update to the parent tab.
AbstractUpdateModel 
Base class for controlling the data associated with a widget. Implements the IUpdatedModel interface.
LaunchTabBuilder 
Class responsible for constructing the Resources tab contents, as well as building and registering the related model objects and listeners.
UpdateModelFactory 
Utility class for creating update models for control widgets, cell editors and viewers.
LCVariableMap 
Map containing the attributes used by the widgets on the Resources tab. Provides a backing map that allows attributes to be swapped in and out when a particular tab is selected. Valid attributes from this map are copied to the RMVariableMap prior to executing the launch command.

org.eclipse.ptp.rm.jaxb.core

This plugin contains the JAXB classes corresponding to the types defined in the XML configuration schema. These classes are used to construct an internal model that represents a specific XML configuration that has been selected by the user. This model can then be examined by the control and UI components of the framework.

org.eclipse.ptp.rm.jaxb.ui

This plugin provides one UI component, the configuration import wizard. This wizard allows a user to import one of the configurations from the org.eclipse.ptp.rm.jaxb.configs plugin into their workspace so that it can be modified to suit a specific target system. A structured XML editor is included in the PTP distribution so that the user can easily modify these configuration files.

In addition, this plugin provides some helper classes that are used to aid in the construction of UI elements. The main class provided is WidgetBuilderUtils which provides methods to create all the widgets available via the XML specification.

org.eclipse.ptp.rm.launch

Implements an Eclipse launch delegate and supplies the tabs that are displayed when configuring the launch. Defines the main Resources tab, which uses the JAXB model to generate the tab contents dynamically, and stores launch configuration values into the framework attribute map.

The plugin provides the following main classes:

ParallelLaunchConfigurationDelegate 
The main launch delegate class. This class is specified in the org.eclipse.debug.core.launchDelegates extension.
AbstractParallelLaunchConfigurationDelegate 
Abstract base class for the launch delegate.
ResourcesTab 
Defines the fixed components of the Resources tab, including target system configuration and remote connection.
JAXBControllerLaunchConfigurationTab 
Main class that generates the entire dynamic area of the Resources tab. Corresponds to the launch-tab element in the XML configuration.
JAXBDynamicLaunchConfigurationTab 
Class that generates sub tabs in the Resources tab dynamic area. This corresponds to the dynamic element of the XML configuration.
JAXBImportedScriptLaunchConfigurationTab 
Class that generates a tab for importing scripts. This corresponds to the import element of the XML configuration.

Monitoring Framework

The monitoring framework is key to achieving scalability in PTP, since on very large systems, or for very large application sizes, it must be able to transfer and display large amounts of data in a way that does not overwhelm Eclipse or the user. There are many issues to overcome in order to achieve this goal, and we have borrowed a number of techniques from the LLView system, which already deals with many scaling issues, and implemented these within PTP. The following sections will present this design in more detail.

Design Description

The monitoring framework is divided into two components, a client UI, and a backend data collection engine. Communication between the client and backend is via an XML-based protocol called LML. The backend data collection engine is designed to be modular, so that support for new systems can be added easily. A good description of how to configure and extend the LML system is available here.

The overall architecture of the monitoring framework is shown in the diagram below.

Monitor arch.png

The central component of the monitoring framework is the LML model, which is a Java representation of the LML XML specification. The framework utilizes JAXB for this in much the same manner as the control framework described previously. Model content is provided by two sources: the MonitorControl class, and the persisted state.

The IMonitorControl controller interface manages all communication with the target system, and is implemented by the MonitorControl class. The interface provides three main methods: start(), stop(), and refresh(). Invoking the IMonitorControl#start() method initializes the controller, including transferring the DA Driver code, if required, and starting a thread that periodically launches the DA Driver on the target system. Any persisted state information will also be loaded at this time. Each time the DA Driver is started, an LML-formatted request is first sent to the command on stdin, then an LML reply is read from stdout. The LML reply is unmarshalled into the JAXB model at this point. The IMonitorControl#refresh() method can be called at any time, and will cause the controller thread to start the DA Driver immediately. Finally, the IMonitorControl#stop() method is used to save the current state in persistent storage, terminate the thread, and clean up any resources.
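The start/refresh/stop behaviour described above can be sketched with a single-threaded scheduled executor. PeriodicMonitor is a hypothetical class, not the MonitorControl implementation: the collection step is passed in as a Runnable, and transferring the DA Driver, exchanging LML, and persisting state are all omitted.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class PeriodicMonitor {
    private final Runnable collect;
    private final long periodSeconds;
    private ScheduledExecutorService executor;

    PeriodicMonitor(Runnable collect, long periodSeconds) {
        this.collect = collect;
        this.periodSeconds = periodSeconds;
    }

    // Begin periodic collection; the first run happens immediately.
    synchronized void start() {
        if (executor != null) {
            return; // already started
        }
        executor = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "monitor-control");
            t.setDaemon(true);
            return t;
        });
        executor.scheduleWithFixedDelay(this::runSafely, 0, periodSeconds, TimeUnit.SECONDS);
    }

    // Request an extra collection right away, outside the regular schedule.
    synchronized void refresh() {
        if (executor != null) {
            executor.execute(this::runSafely);
        }
    }

    // Stop the collection thread.
    synchronized void stop() {
        if (executor != null) {
            executor.shutdownNow();
            executor = null;
        }
    }

    // An uncaught exception would cancel the periodic task, so swallow it here.
    private void runSafely() {
        try {
            collect.run();
        } catch (RuntimeException e) {
            // A real implementation would log the failure.
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PeriodicMonitor monitor = new PeriodicMonitor(() -> System.out.println("collecting"), 60);
        monitor.start();
        monitor.refresh();
        Thread.sleep(200);
        monitor.stop();
    }
}
```

Using one worker thread for both scheduled and on-demand runs means that two collections can never overlap, which keeps the load on the target system predictable.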

The load that the backend data collection engine places on the target system must also be considered. Currently, the backend is restricted to running every 60 seconds to minimize this load. This appears to be adequate for most situations; however, a large number of users could still place a significant burden on the system, particularly if data collection consumes non-trivial resources. One option to overcome this is to centralize the data collection in a single server process, then make this data available via an HTTP service. While the architecture for this is in place, the implementation of such a service is still ongoing. Additional information on the backend data collection engine can be found in the LML DA Driver document.

The design of the monitoring user interface needs to pay particular attention to scalability issues as it must avoid excessive memory consumption and processing overhead in the JVM, as well as overloading the user with too much detailed information. These issues are dealt with by employing a hierarchical data representation that reduces the amount of information that is presented to the user at any time. Increasing levels of detail can be obtained through user interaction by "drilling into" the display, but otherwise only a moderate level of detail is presented at each level of the hierarchy. The internal model that is used to represent the monitored system is correspondingly structured, so that only the minimum amount of information is transmitted and stored at any particular time.
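The hierarchical representation can be illustrated with a small tree in which each level shows only an aggregated summary of its direct children. SystemNode and its "busy cores" metric are hypothetical and are not taken from the LML model; the sketch only shows how drilling down replaces one moderate-sized summary with another.

```java
import java.util.ArrayList;
import java.util.List;

class SystemNode {
    final String name;
    private final int busyCores; // only meaningful for leaf nodes
    private final List<SystemNode> children = new ArrayList<>();

    SystemNode(String name, int busyCores) {
        this.name = name;
        this.busyCores = busyCores;
    }

    SystemNode(String name) {
        this(name, 0);
    }

    SystemNode add(SystemNode child) {
        children.add(child);
        return this;
    }

    // Aggregate the metric over the whole subtree.
    int busy() {
        if (children.isEmpty()) {
            return busyCores;
        }
        int total = 0;
        for (SystemNode child : children) {
            total += child.busy();
        }
        return total;
    }

    // One summary line per direct child; deeper levels stay hidden.
    List<String> view() {
        List<String> lines = new ArrayList<>();
        for (SystemNode child : children) {
            lines.add(child.name + ": " + child.busy() + " busy");
        }
        return lines;
    }

    // "Drill into" a child to see the next level of detail.
    SystemNode drillInto(String childName) {
        for (SystemNode child : children) {
            if (child.name.equals(childName)) {
                return child;
            }
        }
        throw new IllegalArgumentException("no such element: " + childName);
    }

    public static void main(String[] args) {
        SystemNode system = new SystemNode("system")
                .add(new SystemNode("rack0")
                        .add(new SystemNode("node0", 4))
                        .add(new SystemNode("node1", 2)))
                .add(new SystemNode("rack1")
                        .add(new SystemNode("node2", 8)));
        System.out.println(system.view());
        System.out.println(system.drillInto("rack0").view());
    }
}
```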

The following diagram shows the layout of the monitoring framework user interface.

Monitor view.png

The monitoring framework uses the System Monitoring perspective for arranging views in the Eclipse workbench. The Monitors View is used for managing monitors, including creating, deleting, starting, stopping, and refreshing monitor instances. There is one IMonitorControl instance for each monitor in the Monitors View. The Active Jobs and Inactive Jobs Views are used to display running and suspended jobs respectively. A color is assigned to each active job and displayed in the left column of the Active Jobs View. The System Monitoring View is used to display system and node status information from the target machine. The configuration of this display is determined by the type of resource manager on the remote system. Hierarchical data is displayed as nested rectangles, and users are able to drill into the display by double-clicking on the relevant section. The Active Jobs and System Monitoring Views are linked together so that the location where each job is running on the system is visible. If the user clicks on a job in one of the views, then the job will also be highlighted in the other view. For example, if a user clicks on a job in the Active Jobs View, then the location of the job will be displayed in the System Monitoring View. The Messages View is used to display the contents of the target system /etc/motd if available. This is also used to show more detailed job information when a job is selected.

Plugin Layout

The monitoring framework is defined in five plugins. These are:

org.eclipse.ptp.rm.lml.core 
Contains the LML model classes and events and listeners for notification of model changes.
org.eclipse.ptp.rm.lml.da.server 
Implements a remote server extension that is used to launch the DA Driver. Contains the DA Driver code that is copied to the target system.
org.eclipse.ptp.rm.lml.monitor.core 
Implementation of the controller (IMonitorControl) for the monitoring framework
org.eclipse.ptp.rm.lml.monitor.ui 
Provides the Monitor View and the handlers for actions on the view.
org.eclipse.ptp.rm.lml.ui 
Provides the Jobs Views and the System Monitoring View.

Each of these plugins will be described in more detail below.

org.eclipse.ptp.rm.lml.core

This plugin contains the JAXB LML model classes and a number of classes for managing the model instance. The following main classes are provided:

LMLManager 
A singleton that supplies the entry point for controlling access to the model. The LMLManager#openLgui() method is used to initialize the model and the LMLManager#closeLgui() method to dispose of the model once it is no longer required. Since multiple models can exist simultaneously, the LMLManager#selectLgui() method is used to specify which model is currently displayed in the user interface. Finally, the LMLManager#update() method is called to update a model with new data.
JobStatusData 
Bridge class that manages data about a particular job's status. This is used to avoid any dependencies on the control framework classes.
ILguiItem 
Main interface to a particular model. Provides methods for accessing model elements, as well as registering listeners on the model, and updating the model with new data.

org.eclipse.ptp.rm.lml.da.server

This plugin is used to provide an org.eclipse.ptp.remote.core.remoteServer extension. This extension uses the Remote Tools framework to manage the lifecycle of the DA Driver server process. The DA Driver is stored inside this plugin as a tar file that is unpacked and launched when the extension is invoked. The extension point requires the LMLDAServer class to extend the AbstractRemoteServerRunner abstract base class. The extension itself provides details on the payload and command used to launch the server. See the Remote Services API for more information on this extension point.

org.eclipse.ptp.rm.lml.monitor.core

This plugin provides the main controller interface IMonitorControl along with a concrete implementation. The plugin also provides a singleton MonitorControlManager which is the main entry point for obtaining and managing IMonitorControl instances.

org.eclipse.ptp.rm.lml.monitor.ui

This plugin provides the extension for the Monitor View, the MonitorView class, a dialog that is displayed when adding a new monitor, and handlers for each of the button actions on the view.

org.eclipse.ptp.rm.lml.ui

This plugin provides extensions for the remaining views: the Active/Inactive Jobs Views, the System Monitoring View, and the Messages View. These views are implemented by the TableView, NodesView, and InfoView classes respectively. The remaining classes are primarily for implementing the view functionality.