#######################
#######################
#### IPG Pour v1.0 ####
#######################
#######################

############
# Contents #
############

    NOSA.txt            Pour license
    README              This file
    PourService.wsdd    Pour deployment descriptor
    build.properties    Ant build properties
    build.xml           Ant build file
    client              Pour client examples
    doc/grid04.pdf      Pour conference paper
    doc/api             Pour javadocs (skeleton only)
    drift               Pour spout example
    java/lib            Pour required libraries
    java/gov            Pour java source

###############
# Preparation #
###############

1. Java 1.4 or above

    Download from http://java.sun.com.  Tested with Java 1.4.2.

2. JDOM

    Download from http://www.jdom.org.  Copy "jaxen-core.jar",
    "jaxen-jdom.jar", "jdom.jar", and "saxpath.jar" to the "java/lib"
    directory.  Tested with JDOM 1.0.

3. eXist XML Database

    Although Pour works with any XML:DB compliant database, eXist has
    been found to perform significantly better for this application and
    is therefore the only database recommended for use with Pour.

    Download from http://www.exist-db.org.  Copy "exist.jar",
    "commons-pool-*.jar", "xmldb.jar", "xmlrpc-*.jar",
    "jakarta-regexp-*.jar", and "commons-jxpath-*.jar" to the "java/lib"
    directory.  Tested with eXist 1.0b2.

4. Globus WS Core

    Download from http://www.globus.org.  Set the "GLOBUS_LOCATION"
    environment variable to the root of the Globus installation.
    Tested with Globus 3.0 and Globus 3.2.

    Note that the version of xmlrpc distributed with Globus 3.0 and 3.2
    contains a bug that will break the functionality of Pour.  You
    should replace xmlrpc-*.jar in $GLOBUS_LOCATION/lib with the version
    included with eXist.

5. Globus WS Gars

    Download from http://www.globus.org.  The only gars required from
    the set are "grim.gar" and "mjs.gar".  These can either be deployed
    using "ant deploy -Dgar.name=/gar/path/..." from $GLOBUS_LOCATION or
    be unzipped and the required files copied to $GLOBUS_LOCATION.

    If the unzip method is chosen, copy "grim.jar" from "grim.gar" and
    "mjs.jar" from "mjs.gar" to $GLOBUS_LOCATION/lib.
    In addition, copy "schema/base/gram" from "mjs.gar" to
    $GLOBUS_LOCATION/schema/base.

6. Ant

    Download from http://ant.apache.org.  Add the "bin" directory of the
    Ant installation to the "PATH" environment variable.  Tested with
    Ant 1.6.2.

###############
# Compilation #
###############

1. Java source

    After the requirements have been set up as described in the
    previous section, run "ant".

##############
# Deployment #
##############

1. As a grid service

    If using the Globus standalone container, run
    "ant deploy -Dgar.name=$POUR_DIR/build/lib/ipg-pour.gar" from
    $GLOBUS_LOCATION, where $POUR_DIR is the root of the Pour
    installation.  To undeploy, run "ant undeploy -Dgar.id=ipg-pour"
    from $GLOBUS_LOCATION.

    If running Globus in Tomcat or similar, follow the instructions
    above, then run "ant deployTomcat" or "ant reDeployTomcat" as
    appropriate from $GLOBUS_LOCATION.  See the Globus documentation at
    http://www.globus.org for more details.

#################
# Configuration #
#################

1. As a grid service

    All configuration is done via the
    "$GLOBUS_LOCATION/server-config.wsdd" file.  There are three main
    parameters of interest.  For a complete list of all parameters, see
    "gov.nasa.ipg.pour.Config".

        ...

    The "pourDbUri" parameter must be set to the URI of the XML database
    instance.  The default value is an instance of eXist running on port
    8080 of localhost.  Note that the database must be installed,
    configured, and started separately as described in the documentation
    for the database you are using.

    The "pourDbDriver" parameter must be set to the implementation of
    the "org.xmldb.api.base.Database" class for the database instance
    given by "pourDbUri".  The default driver is set to the appropriate
    eXist class.

    The "pourConfigFile" parameter must be set to the Pour XML
    configuration file.  This file is described in the next section.

##########
# Spouts #
##########

This section assumes familiarity with the concepts in "doc/grid04.pdf".

A Pour configuration file defines the set of spouts:

    ...

A spout describes a type of information and the methods used to produce
it.  Each spout defines the XML namespace for the information it
supplies, which includes the XML namespace URI, the XML prefix used for
all attribute and element names, and the name of the root element for
all XML documents produced.  In addition to the XML namespace, a spout
must define how it produces its periodic, on-demand, and user-specified
information through pumps and drains that are hooked into the system.

As an example, we will create a simple grid information service called
"drift".  The configuration files for drift are located in the "drift"
directory.  To make them usable, replace every occurrence of "/some/dir"
with the location of your Pour distribution directory.

    http://ipg.nasa.gov/drift drift drift ... ...

Periodic information is produced by three types of periodic drains based
on OGSA notifications, keyed hashes, and embedded Java objects.

1. gov.nasa.ipg.pour.drain.OGSANotificationDrain

    An OGSA notification drain must specify the URI of another grid
    service to which the spout will subscribe for the subset of
    information for which it is responsible (i.e. its XML namespace).

        http://some.host:8080/ogsa/services/base/index/IndexService

2. gov.nasa.ipg.pour.drain.KeyedDrain

    A keyed drain must specify a secret key in hexadecimal (two hex
    digits per byte) and the MAC algorithm.  The algorithm must be the
    name of a MAC algorithm from the "Java Cryptography Extension (JCE)
    Reference Guide".  The default algorithm is "HmacSHA1".  In the
    example below, the key is set to the hexadecimal equivalent of
    "0123456789abcdef" and the algorithm is set to "HmacMD5".

        30313233343536373839616263646566 HmacMD5

3. gov.nasa.ipg.pour.drain.EmbeddedDrain

    An embedded drain must specify the name of the class that is to be
    loaded into the Pour service.  See "gov.nasa.ipg.drift.Drift" for
    sample code.

        gov.nasa.ipg.drift.Drift

User-specified information is produced by a user drain.

1. gov.nasa.ipg.pour.drain.UserDrain

    A user drain must specify an XML schema to which all user data must
    conform.  A description of XML schemas is beyond the scope of this
    document.  An example can be found in "drift/drift.xsd".  In
    addition, a user drain must specify the time in milliseconds for
    which the data is considered valid.  After this time expires, the
    data will be purged from the database.  In the example below, the
    time is 604800000 milliseconds (7 days).

        /some/dir/drift.xsd 604800000

On-demand information is produced by two types of pumps based on GRAM
jobs and hierarchical Pour instances.

1. gov.nasa.ipg.pour.pump.GramPump

    A GRAM pump must specify the set of XPath prefixes for which it can
    produce information.  It must also specify the set of XPath
    restrictions that must be met for the pump to activate.  When the
    pump is activated, it will copy the specified executable to the
    specified host and run it with the specified arguments.  The host
    and arguments may refer to the value of fields in the restrictions
    element.  For example, below, the host will refer to the computer
    name that is given in the user's query.  Finally, like a user drain,
    a GRAM pump must specify the time in milliseconds for which the data
    is considered valid.

        /computer/os /computer/arch /computer[@name] /some/dir/drift.sh
        restrictions[0] restrictions[0] 604800000

2. gov.nasa.ipg.pour.pump.PourPump

    A Pour pump must specify the URI of another Pour instance that will
    be contacted when the requested information is not found in the
    local instance.  Like the GRAM pump, a Pour pump must also specify a
    set of XPath prefixes for the information for which it is
    responsible.  Finally, it must specify the time in milliseconds for
    which the data is considered valid.

        /computer 604800000
        http://pour.host:8080/ogsa/services/ipg/PourService

#########
# Usage #
#########

The Pour client can be accessed directly through its Java APIs or
through an XML-based command-line interface.
The command-line interface can be easily wrapped in other languages such
as Perl and Python.

Detailed examples with comments can be found in the "client" directory.
All examples in this directory assume that the Pour grid service with
the sample drift spout is running in a container on localhost port 8080.

Shell usage can be found in "client/cli.sh".

Perl usage can be found in the "client/cli.pl" file, which contains
commented Perl code for adding, removing, and querying Pour XML
documents.

Java usage for accessing Pour as a grid service can be found in
"client/cli.java".  This file can be built by running "ant" in the
"client" directory.

An example of using a keyed drain to add information without a grid
certificate is given in "client/cron.pl".

An example query file "client/ondemand.xml" shows the structure of an
on-demand query for the drift spout.  Change "some.host" to a host
running the GRAM MasterForkManagedJobFactoryService and run a query as
described in the next section.  For hosts running this service on a port
other than the default (8080), the host can be written "host:port" in
the query.

The Perl examples require the modules "Digest::HMAC_MD5" and
"Digest::HMAC_SHA1".  These can be obtained from http://search.cpan.org.

All examples can be run by changing to the "client" directory and
running "test.sh" with arguments of "java", "perl", "sh", or "cron".

######################
# Command-Line Usage #
######################

Usage: PourClient [ add | addPeriodic | remove | query ]

must be a valid OGSI handle for a Pour instance.

must be a file containing an XML document that is valid according to the
XML Schema Definitions allowed in the system.  The file
"drift/drift.xsd" describes the documents that can be added for the
example drift spout.
The example document below (also in "client/add.xml") describes a
computer named "myhost.mydomain" with an os of "MyOS" and an
architecture of "MyArch":

    MyOS MyArch

can be any string identifying the source of the periodic information.

must be the cryptographic hash of concatenated with using the designated
MAC algorithm.

must be a file containing an XML query that is valid against query.xsd.
This file contains a set of XPaths as well as the options to be used in
the query.  Currently, the only available options are "pour:anonymous",
which disables the use of any operations requiring a grid credential,
and "pour:ondemand", which disables on-demand processing.  The example
query below (also in query.xml) requests an ELF executable named "java":

    /swim:swim/swim:file[swim:name='java'][swim:type='elf']

Perl usage can be found in cli.pl, which contains commented Perl code
for the various operations.

Java usage can be found in cli.java, which contains commented Java code
for direct API access.

1. Add an XML document

    cli.sh http://localhost:8080/ogsa/services/ipg/PourService add add.xml

2. Add an XML document using a keyed drain

    cli.sh http://localhost:8080/ogsa/services/ipg/PourService \
        addPeriodic periodic.xml aa3709f69553b32e581b98ed2adbb6e5d0531b41

3. Remove the XML documents associated with an XPath.  Note that single
   and double quotes must be escaped or they will be removed by the
   shell.

    cli.sh http://localhost:8080/ogsa/services/ipg/PourService remove \
        /drift:drift/drift:computer[@drift:name=\'myhost.mydomain\']

4. Query a set of XPaths

    cli.sh http://localhost:8080/ogsa/services/ipg/PourService query query.xml

#######################
# Contact Information #
#######################

    Paul Kolano
    kolano@nas.nasa.gov
    http://www.nas.nasa.gov/~kolano/projects/pour.html
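
#############################
# Keyed Drain Hash (Sketch) #
#############################

The keyed drain and "addPeriodic" descriptions above call for a secret
key given in hexadecimal (two hex digits per byte) and a hash computed
with a JCE MAC algorithm such as "HmacSHA1" or "HmacMD5".  The sketch
below shows one way such a value might be computed with the standard
javax.crypto API.  It is not taken from the Pour sources: the class and
method names are hypothetical, and the MAC input (the source string
followed by the document text, both as UTF-8 bytes) and the lowercase
hexadecimal output are assumptions.  Consult "client/cron.pl" for the
behavior Pour actually expects.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper for illustration; not part of the Pour distribution. */
final class KeyedDrainMac {

    /** Decodes a hexadecimal string (two hex digits per byte) into raw bytes. */
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** Encodes raw bytes as lowercase hexadecimal. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /**
     * Computes a MAC with the given JCE algorithm name (e.g. "HmacSHA1").
     * ASSUMPTION: the input is the source string followed by the document
     * text; the ordering and encoding Pour really uses may differ.
     */
    static String mac(String hexKey, String algorithm, String source, String document)
            throws GeneralSecurityException {
        Mac mac = Mac.getInstance(algorithm);
        mac.init(new SecretKeySpec(fromHex(hexKey), algorithm));
        mac.update(source.getBytes(StandardCharsets.UTF_8));
        mac.update(document.getBytes(StandardCharsets.UTF_8));
        return toHex(mac.doFinal());
    }

    public static void main(String[] args) throws Exception {
        // Key from the keyed drain example: the hex form of "0123456789abcdef".
        String key = "30313233343536373839616263646566";
        System.out.println(new String(fromHex(key), StandardCharsets.US_ASCII));
        // Placeholder source and document values, chosen only for the demo.
        System.out.println(mac(key, "HmacSHA1", "myhost.mydomain", "<doc/>"));
    }
}
```

The key is decoded from hexadecimal before use, so the MAC is keyed with
the 16 raw bytes rather than with the 32-character hex string itself.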