#######################
#######################
#### IPG Pour v1.0 ####
#######################
#######################
############
# Contents #
############
NOSA.txt            Pour license
README              This file
PourService.wsdd    Pour deployment descriptor
build.properties    Ant build properties
build.xml           Ant build file
client              Pour client examples
doc/grid04.pdf      Pour conference paper
doc/api             Pour javadocs (skeleton only)
drift               Pour spout example
java/lib            Pour required libraries
java/gov            Pour java source
###############
# Preparation #
###############
1. Java 1.4 or above
Download from http://java.sun.com. Tested with Java 1.4.2.
2. JDOM
Download from http://www.jdom.org. Copy "jaxen-core.jar",
"jaxen-jdom.jar", "jdom.jar", and "saxpath.jar" to the "java/lib"
directory. Tested with JDOM 1.0.
3. eXist XML Database
Although Pour works with any XML:DB compliant database, eXist has been
found to perform significantly better for this application and is
therefore the only one recommended for use with Pour. Download from
http://www.exist-db.org. Copy "exist.jar", "commons-pool-*.jar",
"xmldb.jar", "xmlrpc-*.jar", "jakarta-regexp-*.jar", and
"commons-jxpath-*.jar" to the "java/lib" directory. Tested with eXist
1.0b2.
4. Globus WS Core
Download from http://www.globus.org. Set the "GLOBUS_LOCATION"
environment variable to the root of the Globus installation. Tested
with Globus 3.0 and Globus 3.2.
Note that the version of xmlrpc distributed with Globus 3.0 and 3.2
contains a bug that will break the functionality of Pour. You should
replace xmlrpc-*.jar in $GLOBUS_LOCATION/lib with the version included
with eXist.
5. Globus WS Gars
Download from http://www.globus.org. The only gars required from the
set are "grim.gar" and "mjs.gar". These can either be deployed using
"ant deploy -Dgar.name=/gar/path/..." from $GLOBUS_LOCATION or be
unzipped and the required files copied to $GLOBUS_LOCATION.
If the unzip method is chosen, copy "grim.jar" from "grim.gar" and
"mjs.jar" from "mjs.gar" to $GLOBUS_LOCATION/lib. In addition, copy
"schema/base/gram" from "mjs.gar" to $GLOBUS_LOCATION/schema/base.
6. Ant
Download from http://ant.apache.org. Add the "bin" directory of the Ant
installation to the "PATH" environment variable. Tested with Ant 1.6.2.
###############
# Compilation #
###############
1. Java source
After the requirements have been set up as described in the previous
section, run "ant".
##############
# Deployment #
##############
1. As a grid service
If using the Globus standalone container, run "ant deploy
-Dgar.name=$POUR_DIR/build/lib/ipg-pour.gar" from $GLOBUS_LOCATION,
where $POUR_DIR is the root of the Pour installation.
To undeploy, run "ant undeploy -Dgar.id=ipg-pour" from $GLOBUS_LOCATION.
If running Globus in Tomcat or a similar container, follow the above
instructions, then run "ant deployTomcat" or "ant reDeployTomcat" as
appropriate from $GLOBUS_LOCATION. See the Globus documentation at
http://www.globus.org for more details.
#################
# Configuration #
#################
1. As a grid service
All configuration is done via the "$GLOBUS_LOCATION/server-config.wsdd"
file. There are three main parameters of interest. For a complete list
of all parameters, see "gov.nasa.ipg.pour.Config".
...
The "pourDbUri" parameter must be set to the URI of the XML database
instance. The default value is an instance of eXist running on port
8080 of localhost. Note that the database must be installed,
configured, and started separately as described in the relevant
documentation for the database you are using.
The "pourDbDriver" parameter must be set to the implementation of the
"org.xmldb.api.base.Database" class for the database instance given by
"pourDbUri". The default driver is set to the appropriate eXist class.
The "pourConfigFile" parameter must be set to the Pour XML configuration
file. This file is described in the next section.
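Since "pourDbDriver" must name an implementation of
"org.xmldb.api.base.Database", it can help to check a candidate class
name before editing "server-config.wsdd". The following is a minimal
sketch of such a check, not part of Pour: the class "DriverCheck" and
its method are hypothetical helpers. With no arguments it runs against
standard library classes purely as a demonstration.

```java
import java.lang.reflect.Modifier;

class DriverCheck {
    // Returns true if className can be loaded and is a concrete class
    // assignable to typeName. Classes are loaded without being initialized.
    static boolean implementsType(String className, String typeName) {
        ClassLoader loader = DriverCheck.class.getClassLoader();
        try {
            Class<?> type = Class.forName(typeName, false, loader);
            Class<?> impl = Class.forName(className, false, loader);
            return type.isAssignableFrom(impl)
                    && !impl.isInterface()
                    && !Modifier.isAbstract(impl.getModifiers());
        } catch (ClassNotFoundException | LinkageError e) {
            // Not on the classpath, or could not be linked.
            return false;
        }
    }

    public static void main(String[] args) {
        // Demonstration defaults; pass the driver class name and
        // "org.xmldb.api.base.Database" to check a real driver.
        String driver = args.length > 0 ? args[0] : "java.util.ArrayList";
        String type = args.length > 1 ? args[1] : "java.util.List";
        System.out.println(driver + " implements " + type + ": "
                + implementsType(driver, type));
    }
}
```

To check a real driver, run the class with the database jars from
"java/lib" on the classpath and pass the driver class name followed by
"org.xmldb.api.base.Database".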
##########
# Spouts #
##########
This section assumes familiarity with the concepts in "doc/grid04.pdf".
A Pour configuration file defines the set of spouts:
...
A spout describes a type of information and the methods used to produce
it. Each spout defines the XML namespace for information it supplies,
which includes the XML namespace URI, the XML prefix used for all
attribute and element names, and the name of the root element for all
XML documents produced. In addition to the XML namespace, a spout must
define how it produces its periodic, on-demand, and user-specified
information through pumps and drains that are hooked into the system.
As an example, we will create a simple grid information service called
"drift". The configuration files for drift are located in the "drift"
directory. To make them usable, replace every occurrence of "/some/dir"
with the location of your Pour distribution directory.
Namespace URI:  http://ipg.nasa.gov/drift
Prefix:         drift
Root element:   drift
...
...
Periodic information is produced by three types of periodic drains,
based on OGSA notifications, keyed hashes, and embedded Java objects.
1. gov.nasa.ipg.pour.drain.OGSANotificationDrain
An OGSA notification drain must specify the URI of another grid service.
The spout subscribes to that service for the subset of information for
which it is responsible (i.e. its XML namespace).
http://some.host:8080/ogsa/services/base/index/IndexService
2. gov.nasa.ipg.pour.drain.KeyedDrain
A keyed drain must specify a secret key in hexadecimal (two hex digits
per byte) and the MAC algorithm. The algorithm must be the name of a
MAC algorithm from the "Java Cryptography Extension (JCE) Reference
Guide". The default algorithm is "HmacSHA1". In the example below, the
key is set to the hexadecimal equivalent of "0123456789abcdef".
Key:        30313233343536373839616263646566
Algorithm:  HmacMD5
3. gov.nasa.ipg.pour.drain.EmbeddedDrain
An embedded drain must specify the name of the class that is to be
loaded into the Pour service. See "gov.nasa.ipg.drift.Drift" for
sample code.
gov.nasa.ipg.drift.Drift
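The hexadecimal key format used by the keyed drain can be checked with a
few lines of Java. The following is a minimal sketch, not part of Pour:
the class "KeyedDrainKey" and its methods are hypothetical helpers. It
encodes the ASCII string "0123456789abcdef" as two hex digits per byte
and asks the installed JCE providers whether a MAC algorithm name is
known.

```java
import java.nio.charset.StandardCharsets;
import java.security.NoSuchAlgorithmException;
import javax.crypto.Mac;

class KeyedDrainKey {
    // Encode raw key bytes as lowercase hex, two digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Decode a hex string back into key bytes.
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex key must have an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // True if some installed JCE provider supplies the named MAC algorithm.
    static boolean isKnownMac(String algorithm) {
        try {
            Mac.getInstance(algorithm);
            return true;
        } catch (NoSuchAlgorithmException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] key = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toHex(key)); // prints 30313233343536373839616263646566
        System.out.println("HmacMD5 known: " + isKnownMac("HmacMD5"));
    }
}
```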
User-specified information is produced by a user drain.
1. gov.nasa.ipg.pour.drain.UserDrain
A user drain must specify an XML schema to which all user data must
conform. A description of XML schemas is beyond the scope of this
document. An example can be found in "drift/drift.xsd". In addition,
a user drain must specify the time in milliseconds for which the data
is considered valid. After this time expires, the data will be purged
from the database.
Schema:           /some/dir/drift.xsd
Valid time (ms):  604800000
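Validity times are given in milliseconds, which are easy to misread.
The following is a minimal sketch, not part of Pour, that converts
between days and milliseconds using the standard "java.time.Duration"
class; "ValidityTime" is a hypothetical helper name. The example value
604800000 corresponds to seven days.

```java
import java.time.Duration;

class ValidityTime {
    // Convert a number of days to the millisecond value used in the configuration.
    static long daysToMillis(long days) {
        return Duration.ofDays(days).toMillis();
    }

    // Render a millisecond value as days, hours, and minutes.
    static String describe(long millis) {
        Duration d = Duration.ofMillis(millis);
        return d.toDays() + "d " + (d.toHours() % 24) + "h " + (d.toMinutes() % 60) + "m";
    }

    public static void main(String[] args) {
        System.out.println(describe(604800000L)); // prints 7d 0h 0m
        System.out.println(daysToMillis(7));      // prints 604800000
    }
}
```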
On-demand information is produced by two types of pumps based on GRAM
jobs and hierarchical Pour instances.
1. gov.nasa.ipg.pour.pump.GramPump
A GRAM pump must specify the set of XPath prefixes for which it can
produce information. It must also specify the set of XPath restrictions
that must be met for the pump to activate. When the pump is activated,
it will copy the specified executable to the specified host and run it
with the specified arguments. The host and arguments may refer to the
value of fields in the restrictions element. For example, below, the
host will refer to the computer name that is given in the user's query.
Finally, like a user drain, a GRAM pump must specify the time in
milliseconds for which the data is considered valid.
XPath prefixes:   /computer/os
                  /computer/arch
Restrictions:     /computer[@name]
Executable:       /some/dir/drift.sh
Host:             restrictions[0]
Arguments:        restrictions[0]
Valid time (ms):  604800000
2. gov.nasa.ipg.pour.pump.PourPump
A Pour pump must specify the URI of another Pour instance that will be
contacted when the requested information is not found in the local
instance. Like the GRAM pump, a Pour pump must also specify a set of
XPath prefixes for the information that it is responsible for. Finally,
it must specify the time in milliseconds for which the data is
considered valid.
XPath prefix:     /computer
Valid time (ms):  604800000
URI:              http://pour.host:8080/ogsa/services/ipg/PourService
#########
# Usage #
#########
The Pour client can be accessed directly through its Java APIs or can
alternatively be accessed using an XML-based command-line interface.
The command-line interface can be easily wrapped in other languages such
as Perl and Python.
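The same wrapping approach works from Java without going through a
shell. The following is a minimal sketch, not part of Pour: "PourCli"
is a hypothetical helper, and the relative path "./cli.sh" assumes the
program is started from the "client" directory. It builds the argument
list for "cli.sh" in the order handle, operation, operands and hands it
to "ProcessBuilder". Because no shell interprets the arguments, quotes
inside an XPath need no escaping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PourCli {
    // Build the argument vector: script, service handle, operation, operands.
    static List<String> command(String script, String handle,
                                String operation, String... operands) {
        List<String> cmd = new ArrayList<>();
        cmd.add(script);
        cmd.add(handle);
        cmd.add(operation);
        cmd.addAll(Arrays.asList(operands));
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = command(
                "./cli.sh", // assumed location of the shell client
                "http://localhost:8080/ogsa/services/ipg/PourService",
                "remove",
                "/drift:drift/drift:computer[@drift:name='myhost.mydomain']");
        ProcessBuilder pb = new ProcessBuilder(cmd).inheritIO();
        // Print rather than run; calling pb.start() would launch the client.
        System.out.println(pb.command());
    }
}
```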
Detailed examples with comments can be found in the "client" directory.
All examples in this directory assume that the Pour grid service with
sample drift spout is running in a container on localhost port 8080.
Shell usage can be found in "client/cli.sh". Perl usage can be found
in the "client/cli.pl" file, which contains commented Perl code for
adding, removing, and querying Pour XML documents. Java usage for
accessing Pour as a grid service can be found in "client/cli.java".
This file can be built by running "ant" in the "client" directory. An
example of using a keyed drain to add information without a grid
certificate is given in "client/cron.pl".
An example query file "client/ondemand.xml" shows the structure of an
on-demand query for the drift spout. Change "some.host" to a host
running the GRAM MasterForkManagedJobFactoryService and run a query as
described in the next section. For hosts running this service on a
port other than the default (8080), the host can be written "host:port"
in the query.
The Perl examples require the modules "Digest::HMAC_MD5" and
"Digest::HMAC_SHA1". These can be obtained from http://search.cpan.org.
All examples can be run by changing to the "client" directory and
running "test.sh" with arguments of "java", "perl", "sh", or "cron".
######################
# Command-Line Usage #
######################
Usage: PourClient
[ add
| addPeriodic
| remove
| query ]
The handle (the first argument in the examples below) must be a valid
OGSI handle for a Pour instance.
The file given to "add" must contain an XML document that is valid
according to the XML Schema Definitions allowed in the system. The
file "drift/drift.xsd" describes the documents that can be added for
the example drift spout. The example document in "client/add.xml"
describes a computer named "myhost.mydomain" with an os of "MyOS" and
an architecture of "MyArch".
The periodic source identifier can be any string identifying the source
of the periodic information.
The hash must be the cryptographic hash of two values, one concatenated
with the other, computed using the designated MAC algorithm. A keyed
drain example is given in "client/cron.pl".
The file given to "query" must contain an XML query that is valid
against query.xsd. This file contains a set of XPaths as well as the
options to be used in the query. Currently, the only available options
are "pour:anonymous", which disables the use of any operations
requiring a grid credential, and "pour:ondemand", which disables
on-demand processing.
The example query (also in query.xml) uses the XPath below to request
an ELF executable named "java":
/swim:swim/swim:file[swim:name='java'][swim:type='elf']
Perl usage can be found in cli.pl, which contains commented Perl code
for the various operations. Java usage can be found in cli.java, which
contains commented Java code for direct API access.
1. Add an XML document
cli.sh http://localhost:8080/ogsa/services/ipg/PourService add add.xml
2. Add an XML document using a keyed drain
cli.sh http://localhost:8080/ogsa/services/ipg/PourService \
addPeriodic periodic.xml aa3709f69553b32e581b98ed2adbb6e5d0531b41
3. Remove the XML documents associated with an XPath. Note that single
and double quotes must be escaped or they will be removed by the
shell.
cli.sh http://localhost:8080/ogsa/services/ipg/PourService remove \
/drift:drift/drift:computer[@drift:name=\'myhost.mydomain\']
4. Query a set of XPaths
cli.sh http://localhost:8080/ogsa/services/ipg/PourService query query.xml
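The hash passed to "addPeriodic" in example 2 is 40 hex digits long,
which is the output width of HmacSHA1. The following is a minimal
sketch, not part of Pour, of computing such a MAC with the standard
"javax.crypto.Mac" class. "PeriodicHash" is a hypothetical helper, and
the message bytes used in "main" are an assumption made for
illustration only; the exact input Pour expects should be taken from
"client/cron.pl".

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class PeriodicHash {
    // Compute a MAC over message with a hex-encoded key; returns lowercase hex.
    static String hmacHex(String algorithm, String keyHex, byte[] message) {
        byte[] key = new byte[keyHex.length() / 2];
        for (int i = 0; i < key.length; i++) {
            key[i] = (byte) Integer.parseInt(keyHex.substring(2 * i, 2 * i + 2), 16);
        }
        try {
            Mac mac = Mac.getInstance(algorithm);
            mac.init(new SecretKeySpec(key, algorithm));
            StringBuilder hex = new StringBuilder();
            for (byte b : mac.doFinal(message)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            // Unknown algorithm name or unusable key.
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Key from the keyed drain example: hex of "0123456789abcdef".
        String keyHex = "30313233343536373839616263646566";
        // ASSUMPTION: placeholder message; not the input format Pour defines.
        byte[] message = "my-source<doc/>".getBytes(StandardCharsets.UTF_8);
        System.out.println(hmacHex("HmacSHA1", keyHex, message));
    }
}
```

The algorithm name passed to "hmacHex" must match the one configured
for the keyed drain, since HmacMD5 and HmacSHA1 produce hashes of
different lengths (32 and 40 hex digits).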
#######################
# Contact Information #
#######################
Paul Kolano
kolano@nas.nasa.gov
http://www.nas.nasa.gov/~kolano/projects/pour.html