IEEE International Workshop on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications 21-23 September 2009, Rende (Cosenza), Italy

Web and Grid Services for Training in Earth Observation
Marian Neagul¹, Silviu Panica¹, Dana Petcu¹, Daniela Zaharie¹, Dorian Gorgan²

¹ West University of Timisoara, Bd. V. Parvan 4, RO-300223 Timisoara, gisheo-uvt@lists.info.uvt.ro, gisheo.info.uvt.ro
² Technical University of Cluj-Napoca, G. Baritiu 28, RO-400027 Cluj-Napoca, Dorian.Gorgan@cs.utcluj.ro

Abstract - Remote sensing instruments produce huge quantities of data about the Earth's surface every day. Nowadays, these data can be processed and stored only by using wide-area distributed systems. Managing data distribution and fast processing remains an issue, despite the increased availability of computing and storage facilities and of access to supercomputing and data centers. Grid architectures respond partially to the needs of the remote sensing community, as has been recognized in recent years. While intensive research aimed at building Grid-based platforms for remote sensing processing has been reported recently, training activities are lagging behind. In this paper we discuss the special requirements of a Grid-based platform for training and higher education in Earth observation, and the technical solutions proposed to overcome the problems of distributed data management. Moreover, a case study for training in archaeology using remote sensing data is described and discussed.

Keywords - Distributed data management, Web and Grid services, Earth observation, Image processing

I. INTRODUCTION

Research and commercial applications involving remotely sensed data currently need huge computational power and storage capacity. Grid computing platforms, which evolved rapidly in the last decade, promise to make feasible environments able to handle hundreds of distributed databases, heterogeneous computing resources, and simultaneous users for applications involving remote sensing data. Grid-based experimental platforms were already developed at the beginning of this century, with strong support from NASA and ESA. An overview of the technological challenges and user requirements in remotely sensed image processing, as well as of the solutions provided by the Grid-based platforms built in the last decade, was given in the reports of DEGREE [1]. Moreover, production platforms like G-POD [2] have proved the usefulness of the Grid concept for real applications such as flood area detection.

Unfortunately, there is a clear gap between the demand for remote sensing specialists able to master the latest technologies and what the labor market offers, mainly because training activities in the field are not keeping pace with development activities. In this context, we have recently developed a platform that provides specialized services for training and higher education in Earth observation. Specially tailored solutions were proposed for the Web and Grid services based architectural design, data management, image processing service deployment, workflow-based service composition, and user interaction. Particular attention was given to the basic image processing services, which reuse free image processing tools such as GDAL, GRASS, GIMP, OpenGIS, WMS and ESA products. Special features of the platform are the connection with the GENESI-DR catalog [3] and the reuse of middleware components developed within SEE-Grid-SCI [4] in order to implement complex research scenarios for Earth observation.

The platform design concepts of GiSHEO (On Demand Grid Services for High Education and Training in Earth Observation) were briefly presented in [5], and details about the e-learning component can be found in [7]. The user interface and workflow-related solutions were presented in [8]. In this paper we describe the solutions for data processing and storage, and a simple case study supporting teaching activities in archaeology. We also underline the platform's specificity through a brief description of the requirements coming from the e-learning component.

The paper is organized as follows. The next section presents an overview of the platform architecture. Section III discusses the solutions for data management. Section IV refers to the e-learning component of the platform. Section V presents the case study.

II. REQUIREMENTS AND ARCHITECTURE

The aim of the GiSHEO project is to set up and develop a reliable resource for knowledge dissemination, higher education and training in Earth observation. To meet the on-demand high-performance and high-throughput computing requirements, we use the latest Web and Grid technologies. Grid resources are usually employed for computation-intensive or data-intensive tasks with high requirements. Given the educational aim of the platform, we use Grid resources in near-real-time applications for short, data-intensive tasks. The data sets used by each application are rather large (at least several tens of GB), while the tasks are simple image processing operations. To obtain a response in near-real time, the processing service has to be instantiated where the data are located. In this context, Grid services are a convenient solution: we can consider that a fabric service is available on the platform server hosting the user interface, and that this service instantiates the processing service where the referenced data reside. Figure 1 presents the conceptual view of GiSHEO's platform architecture. WMS refers to the well-known Web Map Service, which ensures access to the distributed database.
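The "instantiate the processing service where the data reside" scheme can be illustrated with a minimal Python sketch. The class names `Node` and `FabricService`, the string handles and the least-loaded tie-breaking rule are hypothetical; in GiSHEO the instantiation is carried out by Grid services (GTD-WS on top of Globus Toolkit 4 and Condor), not by code like this.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A compute node of the platform and the data sets it stores locally."""
    name: str
    datasets: set[str] = field(default_factory=set)
    running: list[str] = field(default_factory=list)

    def instantiate(self, operation: str, dataset: str) -> str:
        # Placeholder for creating a processing-service instance on this node.
        handle = f"{operation}@{self.name}:{dataset}"
        self.running.append(handle)
        return handle


class FabricService:
    """Front-end service (hypothetical): instantiates the processing service
    on a node that already holds the requested data, instead of moving data."""

    def __init__(self, nodes: list[Node]):
        self.nodes = nodes

    def submit(self, operation: str, dataset: str) -> str:
        holders = [n for n in self.nodes if dataset in n.datasets]
        if not holders:
            raise LookupError(f"no node stores {dataset!r}")
        # Among the nodes holding the data, pick the least loaded one.
        target = min(holders, key=lambda n: len(n.running))
        return target.instantiate(operation, dataset)
```

For example, with `Node("a", {"img1", "img2"})` and `Node("b", {"img2"})`, two successive requests on `img2` are placed first on `a` and then on `b`, while requests on `img1` can only go to `a`. Only a small request travels to the data, which is the point of the scheme when each data set is tens of GB.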

GTD-WS (Grid Task Dispatcher Web Service) is a service providing an interface for easy task deployment on Grid resources. It is based on Globus Toolkit 4 (GRAM4-WS integration) and Condor (ClassAd descriptors and job handling). The security-related component, GiS, is based on username/password pairs and X.509 digital certificates. It supports extended certificate attributes using VOMS (Virtual Organization Membership Service), as well as proxy delegation using the MyProxy service. Moreover, it is fully integrated with the Globus Toolkit 4 GSI mechanism. EUGridPMA-signed certificates are required to access the full facilities of the platform. The Grid platform is based on four clusters that are geographically distributed among the project partners. Because security restrictions between the partner institutions are low, data distribution between the clusters is done using Apache's Hadoop Distributed File System (HDFS). Data transfer from and to external databases is done using GridFTP; this is the case, for example, of the connection with the GENESI-DR database.

Fig. 1. GiSHEO's platform architectural design.

WAS is the acronym for GiSHEO's specific Web Application Service, which is invoked by the user interface at run-time. It allows the description of workflows for user-specific data processing and storage scenarios, combining existing services or deploying new ones. The Workflow Service Composition (WSC) and the Workflow Manager (WfM) are the engines behind WAS and are connected with the task manager (GTD-WS). Each simple image processing operation is viewed as a task. Several tasks can be linked together to form a workflow, in an order decided on the client side (either the teacher or the student interface). The workflow engine is based on an Event-Condition-Action approach that offers dynamism and adaptability to changes in workflow tasks and resource states. To respond to the special requirements of the platform, a rule-based language has also been developed. More details about WAS, the workflow engine, and the rule-based language can be found in [8].

The platform has distributed data repositories. It uses PostGIS for storing raster extent information (PostGIS polygons) and, in some cases, vector data. Moreover, data search is based on PostGIS spatial operators. GDIS is a data index service built as a Web service that provides information about the available data to its clients. It intermediates access to data repositories, stores the processing results, ensures role-based access control to the data, retrieves data from various information sources, queries external data sources, and has a simple interface usable by various data consumers. More details about GDIS can be found in the next section.

III. DISTRIBUTED DATA MANAGEMENT

One of the fundamental components of the GiSHEO platform is the one responsible for storing and querying the data. Two types of data are involved: databases containing the remote sensing data, and the processing applications. GiSHEO's Data Indexing and Storage Service (GDIS) provides features for data storage, indexing data using a specialized RDBMS, finding data by various conditions, querying external services, and keeping track of temporary data generated by other components. GDIS is available to other components or external parties through a special Grid service. This service is also responsible for enforcing data access rules based on specific Grid credentials (e.g. VO attributes). The storage layer of GDIS stores the data using the available storage back-ends, such as local disk file systems (e.g. ext3), local cluster storage (e.g. GFS or GPFS), or distributed file systems (e.g. HDFS, KosmosFS, or GlusterFS). An important requirement for the storage component is a unique interface exposing the data distributed across various storage domains (local or remote). This requirement was fulfilled by implementing a front-end GridFTP service capable of interacting with the storage domains on behalf of the clients in a uniform way. The GridFTP service also enforces the data access restrictions provided by other specialized services. The GridFTP service has native access to the Hadoop Distributed File System, giving access to the data stored inside the internal HDFS file systems and providing the required access control facilities. The GridFTP service also provides special features for manipulating the data repository through basic data management methods such as upload, deletion, and retrieval.
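The requirement of a single interface over several storage domains can be illustrated with a small routing layer. The Python sketch below is not GiSHEO's GridFTP front end: the class names, the routing by URL scheme and the in-memory stand-in for a remote domain are assumptions, made only to show clients calling one set of upload/retrieve/delete operations while the data live in different back-ends. Access control is omitted.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from urllib.parse import urlparse


class StorageBackend(ABC):
    """Minimal operations every storage domain must support (hypothetical)."""

    @abstractmethod
    def upload(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def retrieve(self, path: str) -> bytes: ...

    @abstractmethod
    def delete(self, path: str) -> None: ...


class LocalDiskBackend(StorageBackend):
    """Files on a local file system (e.g. ext3), under a root directory."""

    def __init__(self, root: str):
        self.root = Path(root)

    def _full(self, path: str) -> Path:
        return self.root / path.lstrip("/")

    def upload(self, path: str, data: bytes) -> None:
        target = self._full(path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)

    def retrieve(self, path: str) -> bytes:
        return self._full(path).read_bytes()

    def delete(self, path: str) -> None:
        self._full(path).unlink()


class InMemoryBackend(StorageBackend):
    """Stand-in for a remote domain (e.g. an HDFS cluster) in this sketch."""

    def __init__(self):
        self.blobs: dict[str, bytes] = {}

    def upload(self, path: str, data: bytes) -> None:
        self.blobs[path] = data

    def retrieve(self, path: str) -> bytes:
        return self.blobs[path]

    def delete(self, path: str) -> None:
        del self.blobs[path]


class UnifiedStore:
    """Single front end: routes each URL to the backend registered for its scheme."""

    def __init__(self):
        self.backends: dict[str, StorageBackend] = {}

    def register(self, scheme: str, backend: StorageBackend) -> None:
        self.backends[scheme] = backend

    def _route(self, url: str) -> tuple[StorageBackend, str]:
        parts = urlparse(url)
        if parts.scheme not in self.backends:
            raise ValueError(f"no backend registered for {parts.scheme!r}")
        # netloc + path, e.g. "hdfs://images/a.tif" -> "images/a.tif"
        return self.backends[parts.scheme], parts.netloc + parts.path

    def upload(self, url: str, data: bytes) -> None:
        backend, path = self._route(url)
        backend.upload(path, data)

    def retrieve(self, url: str) -> bytes:
        backend, path = self._route(url)
        return backend.retrieve(path)

    def delete(self, url: str) -> None:
        backend, path = self._route(url)
        backend.delete(path)
```

A local disk domain would be registered the same way, e.g. `store.register("local", LocalDiskBackend("/srv/data"))`; the scheme names and the path are our own illustrative choices.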


Data indexing is performed by PostGIS, an extension of the PostgreSQL RDBMS engine. The PostGIS layer indexes the metadata and location of the geographical data available in the storage layer. The metadata usually consist of the extent or bounding box and the geographical projection of the data (which together give the exact geo-location). The PostGIS layer also provides advanced geographical operations (backed by a GiST index), which allow searching the data by various criteria, including interaction with raw shapes, interaction with shapes representing geo-political data (such as countries, cities, roads, etc.), or any other type of geographical data that can be represented in PostGIS. The geo-political data are typically imported from OpenStreetMap (OSM). Based on the advanced data indexing capabilities of the PostGIS layer, GiSHEO's platform provides an advanced and highly flexible interface for searching the project's repositories. The search interface is built around a custom query language, named LLQL (Lisp-Like Query Language), designed to provide fine-grained access to the data in the repository and to query external services such as TerraServer or GENESI-DR. The syntax of the query language is inspired by that of LISP and partially by LDAP filters. The language allows querying the repository for raster images, as in

    (select '(url, owner)
      (and (or (ogc:interacts (osm:country "Italy"))
               (ogc:interacts (osm:country "Austria")))
           (gdis:type "RASTER/AERIAL")))
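To make the translation of LLQL into PostGIS queries concrete, here is a minimal Python sketch that parses a query like the one above and produces a parameterized SQL statement. It is not GDIS's translator: the table and column names (`rasters`, `extent`, `type`, `osm_countries`), the mapping of `ogc:interacts` onto PostGIS `ST_Intersects`, and SRID 4326 for `ogc:bbox` are assumptions made for illustration.

```python
import re


def tokenize(text: str) -> list[str]:
    # Parentheses, double-quoted strings, and bare atoms (the quote ' included).
    return re.findall(r'\(|\)|"[^"]*"|[^\s()"]+', text)


def parse(tokens: list[str]):
    """Turn the token list into nested Python lists (consumes `tokens`)."""
    tok = tokens.pop(0)
    if tok == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return items
    if tok == "'":
        return parse(tokens)  # quoted list, e.g. '(url owner)
    if tok.startswith('"'):
        return tok[1:-1]  # string literal
    try:
        return float(tok)  # numeric literal
    except ValueError:
        return tok.rstrip(",")  # symbol; tolerates '(url, owner)


def geometry(expr, params: list) -> str:
    """SQL for the geometry argument of ogc:interacts."""
    op, *args = expr
    if op == "ogc:bbox":  # xmin ymin xmax ymax, assumed to be WGS 84
        params.extend(args)
        return "ST_MakeEnvelope(%s, %s, %s, %s, 4326)"
    if op == "osm:country":  # hypothetical table of imported OSM borders
        params.append(args[0])
        return "(SELECT c.geom FROM osm_countries c WHERE c.name = %s)"
    raise ValueError(f"unsupported geometry {op!r}")


def condition(expr, params: list) -> str:
    """SQL for a boolean LLQL filter; params are appended in placeholder order."""
    op, *args = expr
    if op in ("and", "or"):
        joined = f" {op.upper()} ".join(condition(a, params) for a in args)
        return f"({joined})"
    if op == "ogc:interacts":
        return f"ST_Intersects(r.extent, {geometry(args[0], params)})"
    if op == "gdis:type":
        params.append(args[0])
        return "r.type = %s"
    raise ValueError(f"unsupported operator {op!r}")


def translate(query: str) -> tuple[str, list]:
    """Translate (select '(columns) filter) into an SQL string and parameters."""
    head, columns, filt = parse(tokenize(query))
    if head != "select":
        raise ValueError("only 'select' is handled in this sketch")
    params: list = []
    where = condition(filt, params)
    cols = ", ".join(f"r.{c}" for c in columns)
    return f"SELECT {cols} FROM rasters r WHERE {where}", params
```

The returned string and parameter list could be passed to a DB-API driver such as psycopg2 (`cursor.execute(sql, params)`), so user-supplied names never end up inside the SQL text. The `select-place` form and the external lookups (TerraServer, GENESI-DR) are deliberately left out of this sketch.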

The language also supports querying various aggregated data or object properties, as in

    ; Find cities in Romania
    ; filter by bbox
    (select-place '(name)
      (and (ogc:interacts (ogc:bbox 16.69 43.97 24.8 48.48))
           (osm:country "Romania")
           (osm:type "city")))

PostGIS-related queries are translated directly into PostgreSQL queries, while external lookups are resolved before the queries are submitted to PostGIS. Besides the developer-oriented LLQL filters, GDIS also provides a simple, user-oriented query language usable in the public search interfaces. These simple query filters are similar to the filters used by mainstream search engines:

    vendor:NASA type:DEM place:Timisoara,Timis,Romania

As in the LLQL case, this language is translated into PostgreSQL spatial queries. Another set of tasks handled by GDIS is the interaction with external services. In this case GDIS acts as a thin middleware layer that interacts with external repositories and exposes a single, unique interface (similar to, and possibly integrated with, that of the internal repositories). One example of an external back-end supported by GDIS is the GENESI-DR catalog.

IV. EGLE - GISHEO'S ELEARNING ENVIRONMENT

A particular component of WAS is eGLE (GiSHEO eLearning Environment), which aims to give teachers the ability to easily create lessons on different topics. It is based on the gProcess platform [7], which represents the intermediate level between the eLearning Oriented Level and the Grid Infrastructure, providing a set of services and tools that support the flexible description, instantiation, scheduling and execution of workflows (see Fig. 2). eGLE provides mechanisms for knowledge presentation and assessment based on Grid processing capabilities, both for teachers and for students. The platform implements user interaction tools as well as the other components required for the development, execution and management of the teaching materials. Using these tools, the teacher is able to: search the available sources for existing learning objects and materials that could be added to the lesson; create new didactic materials by implementing and executing new algorithms using gProcess; create visual containers for displaying information and format their appearance; manage the acquired learning components and combine them using visual elements in order to create the lesson; and specify the desired interactivity level for each of the lesson components.

Through the eGLE interface and tools, the teacher actually uses the Grid capabilities in a transparent manner. When searching for existing teaching materials (already processed satellite images, complex algorithms expressed through workflows, etc.), the user is automatically connected to the available distributed databases and remote repositories (see previous section) without any explicit intervention, and the results are displayed in a unified manner (as if they were provided by a single data source). Similarly, the teacher may use Grid-based execution to process satellite images, to execute specific algorithms through workflow descriptions, or to visualize previously created lessons. The students can only execute the lessons according to the constraints established by the teacher. Depending on the specified interaction level, they may also be allowed to describe and experiment with new workflows, or to choose different input data (e.g. satellite images, discrete values) for existing ones. The eGLE-related database includes conceptual and particular workflow-based descriptions, teaching materials and lesson resources, and satellite and spatial data. A detailed description of the lesson architecture can be found in [6].

[Figure 2: the lesson "Temperature computing" ("This is the content of the lesson. It exemplifies interpolation related to temperature values and resolution.") is executed through gProcess (Editor, Scheduler, Manager, Viewer, Executor), which runs a workflow with three inputs and one output.]
Fig. 2. Grid processing based lesson execution.

V. CASE STUDY

A simple, yet innovative, example of platform usage is assisting history students in identifying archaeological sites through visual inspection and interpretation of aerial images. The aim of the lesson is either a detailed analysis of known archaeological sites in order to identify anthropological characteristics, or the identification of new archaeological sites through intensive study of a large set of images. The specific requirements relate to: image enhancement; assistance in identifying morphological elements such as circular, rectangular or linear shapes; and assistance in identifying morphological characteristics such as the size of pit houses, or their distribution and the distances between them.

The image enhancement process depends on the aim of the analysis. Consider, for example, the search for circular shapes indicating the extent of areas corresponding to human activities (house yards). In this case the image enhancement should involve gray level conversion and histogram equalization. To identify the elements of interest, further transformations should be applied, e.g. quantization and thresholding. Figure 3 presents an example of the user interface that allows the teacher or student to select an area of interest, detect the data available in that area, and request the application of the (linear) workflow consisting of the above four simple transformations to all the available data corresponding to the area of interest. In this case the resulting image is overlaid on the initial images (Figure 3.a). The second image (Figure 3.b) captures the selection of a new area of interest and the debugging window associated with the user interface, which displays, among other information, the number of requests made to the platform. If the number of images to be processed is of the order of tens, the student or the teacher cannot expect an answer within a few seconds from a single computing node (about 4 minutes for the area depicted in Figure 3.b). To obtain a fast response, the pairs are scheduled by GTD-WS on different computing nodes of the platform.

Image enhancement is also needed for identifying linear shapes corresponding to wave-like fortifications; in this case the appropriate pipeline of transformations could include gray level conversion, emboss (a convolution operation), histogram equalization and layer combination. Such transformations only enhance the image, helping the user to identify the shapes of interest by visual inspection. To provide a semi-automatic tool for identifying linear shapes, more sophisticated operations should be used. One of these is the Hough transform, which is applied to a binary image locating the edges in the original image. A typical flow of operations for identifying linear shapes in an image is: (i) gray level conversion; (ii) histogram equalization; (iii) edge detection (e.g. using the Canny filter); (iv) line identification by the Hough transform. Fig. 4 (middle) presents the result obtained by applying this flow of operations to the image in Fig. 4 (top), which shows land on which one can identify the marks of a wave-like Roman fortification.
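The first workflow of this case study (gray level conversion, histogram equalization, quantization, thresholding) is a plain sequential pipeline. A minimal NumPy sketch of it follows; the BT.601 gray weights, the four quantization levels and the threshold of 128 are illustrative assumptions, and these functions are not GiSHEO's services, which reuse existing tools such as GDAL and GRASS.

```python
import numpy as np


def to_gray(rgb: np.ndarray) -> np.ndarray:
    """Luma conversion with ITU-R BT.601 weights; returns uint8."""
    weights = np.array([0.299, 0.587, 0.114])
    return np.clip(np.rint(rgb[..., :3] @ weights), 0, 255).astype(np.uint8)


def equalize(gray: np.ndarray) -> np.ndarray:
    """Classic histogram equalization through the cumulative histogram."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = np.cumsum(hist)
    cdf_min = cdf[cdf > 0][0]
    if cdf[-1] == cdf_min:  # constant image: nothing to stretch
        return gray.copy()
    lut = np.rint((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255)
    return np.clip(lut, 0, 255).astype(np.uint8)[gray]


def quantize(gray: np.ndarray, levels: int = 4) -> np.ndarray:
    """Map the 256 gray levels onto `levels` evenly spaced values."""
    step = 256 // levels
    return ((gray // step) * step).astype(np.uint8)


def threshold(gray: np.ndarray, t: int = 128) -> np.ndarray:
    """Binary image: 255 where the pixel is >= t, 0 elsewhere."""
    return np.where(gray >= t, 255, 0).astype(np.uint8)


def run_pipeline(image: np.ndarray, steps) -> np.ndarray:
    """Apply the tasks in order, as in a linear (sequential) workflow."""
    for step in steps:
        image = step(image)
    return image
```

A call such as `run_pipeline(rgb, [to_gray, equalize, quantize, threshold])` processes one image. Because each step only needs the output of the previous one, the same list of steps can be applied independently to every image of the area of interest, which is what makes spreading the requests over several computing nodes straightforward.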
The main problem in identifying the marks of this fortification is that they are partly obscured by the marks of the current land division. Therefore, the Hough transform detects many lines. However, the lines corresponding to the current land division are almost parallel, while the fortification has a rather different direction. Thus, by eliminating all lines having sufficiently close slopes, one can identify the ancient marks. The result obtained by keeping only the lines whose slopes differ by at least 0.1 is illustrated in Fig. 4 (bottom). However, besides the target line, another one has been detected.
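The post-processing rule described above can be sketched in Python. The sketch assumes the Hough transform returns lines in the normal form (rho, theta), as common implementations do, and reads "the difference between their slopes is at least 0.1" as: a line is kept only if its slope differs by at least 0.1 from the slope of every other detected line. The function names and the treatment of vertical lines are our own choices.

```python
import math


def slope_from_hough(theta: float) -> float:
    """Slope of the line x*cos(theta) + y*sin(theta) = rho, i.e. -cot(theta).
    Vertical lines (sin(theta) == 0) get math.inf."""
    s = math.sin(theta)
    if abs(s) < 1e-9:
        return math.inf
    return -math.cos(theta) / s


def keep_isolated_directions(lines, min_diff=0.1):
    """Keep the (rho, theta) lines whose slope differs by at least `min_diff`
    from the slope of every other detected line; bundles of near-parallel
    lines (e.g. current field boundaries) are discarded."""
    slopes = [slope_from_hough(theta) for _, theta in lines]
    kept = []
    for i, line in enumerate(lines):
        # inf - inf gives nan, and nan >= min_diff is False, so two
        # vertical lines are treated as parallel and both are removed.
        if all(abs(slopes[i] - slopes[j]) >= min_diff
               for j in range(len(lines)) if j != i):
            kept.append(line)
    return kept
```

Note that a fixed threshold on slopes is not the same as a fixed threshold on angles: slopes change slowly near horizontal directions and very quickly near vertical ones, so comparing `theta` values directly would be an alternative worth considering.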

A possible way to mask the obvious shapes and enhance the others is to use the singular value decomposition (SVD) of the image and to ignore the components corresponding to the largest singular value(s), which contain the most important features of the image. If the image is written as A = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + ..., with σ_1 ≥ σ_2 ≥ ..., we replaced histogram equalization with the transformation that replaces A by the partial sum σ_2 u_2 v_2^T + ... + σ_200 u_200 v_200^T, i.e. the components corresponding to the singular values of ranks 2 to 200. The results are illustrated in Figure 5: the image obtained by retaining the specified components (top), the lines detected by the Hough transform (middle), and the line detected after post-processing (bottom), which in fact corresponds to the real mark of a Roman fortification. All the transformations presented above match a sequential pipeline template. Other, more complex transformations related to clustering (for statistics-related issues in archaeology) are not only data-intensive but also computing-intensive, and require a further split: either different tasks of the workflow run on different computational nodes (distributed tasks or distributed pipeline), or parts of the input data are processed on different computational nodes (parallel tasks). Templates other than the sequential pipeline should be used in these cases.

VI. CONCLUSIONS

The GiSHEO platform promises to deliver real-time services for satellite data processing in support of training activities in Earth observation. The technical solutions highlighted in this paper combine Grid technologies with newer solutions coming from other fields of distributed systems, e.g. data management. Intensive tests of the platform's reliability are expected to be performed in the near future.

ACKNOWLEDGMENT

This research is supported by ESA PECS Contract no. 98061 GiSHEO - On Demand Grid Services for High Education and Training in Earth Observation.

REFERENCES

[1] DEGREE Consortium, "Dissemination and Exploitation of Grids in Earth Science", http://www.eu-degree.eu
[2] L. Fusco, R. Cossu, and C. Retscher, "Open Grid Services for Envisat and Earth observation applications", in High Performance Computing in Remote Sensing, A. Plaza and C. Chang (Eds.), Chapman & Hall, Taylor & Francis Group, 2008, pp. 237-280.
[3] GENESI-DR Consortium, "Ground European Network for Earth Science Interoperations - Digital Repositories", http://genesi-dr.eu
[4] SEE-Grid-SCI Consortium, "SEE-GRID eInfrastructure for Regional eScience", http://www.see-grid-sci.eu
[5] S. Panica, M. Neagul, D. Petcu, T. Stefanut, and D. Gorgan, "Designing a Grid-based training platform for Earth observation", in Procs. SYNASC'08, IEEE Computer Press, 2009, pp. 394-397.
[6] D. Gorgan, T. Stefanut, and V. Bacu, "Grid based training environment for Earth observation", in Advances in Grid and Pervasive Computing, N. Abdennadher and D. Petcu (Eds.), Springer, LNCS 5529, 2009, pp. 98-109.
[7] M.E. Frincu, S. Panica, M. Neagul, and D. Petcu, "GiSHEO: On demand Grid service based platform for EO data processing", in Procs. HiperGrid'09, Politehnica Press, Bucharest, 2009, pp. 415-422.
[8] A. Radu, V. Bacu, and D. Gorgan, "Diagrammatic description of satellite image processing workflow", in Procs. SYNASC'08, IEEE Computer Press, 2009, pp. 341-348.

Fig. 3. Helping archaeologists to detect shapes: (a) result of applying the transformation to a particular zone; (b) moving to another region, and the requests issued by the user interface.


Fig. 4. Searching for land marks of linear fortifications: original image (top), lines detected by applying gray scale conversion, histogram equalization, Canny filter and Hough transform (middle), lines selected in the post-processing step (bottom).

Fig. 5. Using singular value decomposition to enhance the ancient marks: image transformed by retaining the components corresponding to singular values with ranks between 2 and 200 (top), lines detected by applying Canny filter and Hough transform (middle), the line obtained by post-processing (bottom).