PARCON: CLUSTER-OPTIMIZED MONITORING TOOLS | Moscow University Supercomputing Center

PARCON: CLUSTER-OPTIMIZED MONITORING TOOLS

To optimize computationally intensive applications efficiently, it is necessary to investigate their behavior on a wide range of hardware platforms and configurations. In other words, programs need to be certified for a variety of environments. For such an investigation, hardware platforms, applications and their sources alone are not enough. What is really required is a testing methodology and a set of tools to estimate the efficiency of applications and its dependence on the hardware platform. Therefore, a software package for efficiency investigation should provide a system for monitoring and collecting information about the program being tested, tools for matching the performance of the program and of the nodes with the program's arguments, and tools for subsequent visualization and analysis.

The monitoring system has a number of very specific requirements. It must collect performance information with a high degree of accuracy, yet it must not interfere with the running task itself. These requirements were fundamental when we started developing our monitoring system, named AntMon. Its modular architecture allows new functionality to be added quickly when required, while the minimization principle followed in its development keeps overhead low.

Apart from the monitoring system, tools are required for matching performance data with the start times and arguments of tasks. The Cleo batch system is sufficiently extensible, which makes it simple to collect information on both running and completed tasks.
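
The matching step can be illustrated with a minimal sketch. It assumes, purely for illustration, that the batch system exposes each task as a record with a node list, start and end times, and arguments, and that the monitoring system yields timestamped per-node samples. The `Task` and `Sample` types and the `samples_for_task` function are hypothetical and do not reflect the actual Cleo or AntMon data formats.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Task:
    # Hypothetical batch-system record; not the real Cleo format.
    task_id: int
    nodes: Tuple[str, ...]
    start: float            # Unix time when the task started
    end: Optional[float]    # None while the task is still running
    args: str


@dataclass(frozen=True)
class Sample:
    # Hypothetical monitoring record; not the real AntMon format.
    node: str
    timestamp: float
    cpu_user: float         # percent of CPU time spent in user mode


def samples_for_task(task: Task, samples: List[Sample]) -> List[Sample]:
    """Return the samples taken on the task's nodes while it was running."""
    end = float("inf") if task.end is None else task.end
    return [
        s for s in samples
        if s.node in task.nodes and task.start <= s.timestamp <= end
    ]


task = Task(1, ("node01", "node02"), start=100.0, end=200.0, args="./solver -n 64")
samples = [
    Sample("node01", 90.0, 5.0),    # taken before the task started
    Sample("node01", 150.0, 97.0),  # belongs to the task
    Sample("node02", 150.0, 95.0),  # belongs to the task
    Sample("node03", 150.0, 99.0),  # node not assigned to the task
    Sample("node02", 250.0, 3.0),   # taken after the task finished
]
matched = samples_for_task(task, samples)
```

Treating a missing end time as an open interval lets the same function serve both running and completed tasks.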

However, simply having monitoring data matched to tasks is not enough. The collected data should be analyzed, the results of different executions should be compared, and possible bottlenecks should be identified. The ParCon software package performs these tasks. It collects AntMon and Cleo data into a single database and allows both running and finished programs to be analyzed. Together, ParCon, Cleo and AntMon, along with a flexible testing methodology, make comprehensive certification of user applications feasible and effective.
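
One way to flag a possible bottleneck in the collected data is to compare the average CPU usage of the nodes assigned to a task. The following is a minimal sketch under stated assumptions: the input is a hypothetical mapping from node name to a list of CPU usage percentages, and the imbalance measure (the gap between the busiest node and the mean, relative to the busiest node) is one common choice, not necessarily the one ParCon uses.

```python
from statistics import mean
from typing import Dict, List


def node_averages(usage: Dict[str, List[float]]) -> Dict[str, float]:
    """Average CPU usage per node; nodes without samples are skipped."""
    return {node: mean(vals) for node, vals in usage.items() if vals}


def imbalance(usage: Dict[str, List[float]]) -> float:
    """(max - mean) / max over per-node averages; 0.0 means perfectly balanced."""
    avgs = list(node_averages(usage).values())
    if not avgs:
        return 0.0
    peak = max(avgs)
    if peak == 0:
        return 0.0
    return (peak - mean(avgs)) / peak


def underloaded_nodes(usage: Dict[str, List[float]], threshold: float = 0.5) -> List[str]:
    """Nodes whose average usage is below `threshold` times that of the busiest node."""
    avgs = node_averages(usage)
    if not avgs:
        return []
    peak = max(avgs.values())
    return sorted(n for n, a in avgs.items() if a < threshold * peak)


# Hypothetical data: one of four nodes is mostly idle.
usage = {
    "node01": [100.0, 100.0],
    "node02": [100.0, 100.0],
    "node03": [20.0, 20.0],
    "node04": [100.0, 100.0],
}
score = imbalance(usage)
idle = underloaded_nodes(usage)
```

A score near zero suggests evenly loaded nodes, while a larger value indicates that some nodes wait on others; the 0.5 threshold is an arbitrary illustrative default.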




Example of CPU usage imbalance within a task
