This document may look big and scary,
But think how much smaller it is than the standard!
Started 21st August 2001
Version 1.4
$Header: /home/mike/cvs/web/zoom/comp/api/zoom-1.4.html,v 1.33 2004/02/04 11:43:46 mike Exp $
Mike Taylor <mike@zoom.z3950.org> with contributions from:
The ZOOM initiative presents an abstract object-oriented API to a subset of the services specified by the Z39.50 standard, also known as ISO 23950 (see http://lcweb.loc.gov/z3950/agency/document.html for a free, downloadable copy of the standard.)
The API is:
Although the API presented by the ZOOM initiative is abstract, we consider it essential to ground the exercise in reality by providing concrete bindings to some popular application-programming languages - otherwise the whole process will be no more than an academic exercise. Moreover, we plan to build example implementations of the ZOOM layer for each of the bindings, and some of the implementations already exist.
The current version of the abstract API (i.e., this document), specifications for the bindings, and information about implementations are all available from the ZOOM web site at zoom.z3950.org
ZOOM can be considered as a part of the larger ZING initiative - Z39.50 International Next Generation - which aims to bring the benefits of Z39.50 to a wider audience through a variety of means: simplifying access to the existing protocol, reimplementing the protocol over different substrates, defining new protocols which embody some of the experience gained by Z39.50 workers, etc. ZOOM falls very much into the first of these categories.
There are three important things to say here.
Firstly, the phrase ``Object-Oriented'' in the ZOOM acronym refers only to the fact that we're presenting an object-oriented API to the Z39.50 services. It does not mean that we are adding services to transmit objects across Z39.50 connections, or to use Z39.50 to provide remote method invocation. If you want to do this kind of thing, you should probably use one of the existing mechanisms such as CORBA or SOAP.
Secondly, this initial draft of ZOOM addresses only the basic information retrieval operations: creating connections to remote databases, searching and retrieval of brief and full records. (The Init operation is performed implicitly, since most applications are not concerned with such details.) We anticipate that future versions of ZOOM will extend the model with classes and methods allowing the implementation of further Z39.50 services including Sort and Extended Services. Access Control and Resource Control may prove more problematic.
And finally, this is not Deep Computer Science. We know that. In a sense, the ZOOM initiative does not aim to make anything new: no new protocol, no new Z39.50 services, no new taxes. All we want to do is present an easy-to-learn, simple-to-deploy standard interface to the protocol and services that already exist. That's not a particularly sophisticated thing to do, but it is a necessary thing.
The Z39.50 services are provided as methods on classes, where the classes represent the key Z39.50 concepts:
The Connection class supports methods for instantiation and searching, together with the housekeeping and option management methods provided on all classes - all detailed below.
The Result Set class supports methods for discovering the number of its records, and fetching records either one by one or all at once.
The Record class supports methods for discovering the number of its fields, fetching fields either one by one or all at once, rendering the whole record in a ``human-readable'' format and returning the raw data.
The API described in this document is fully synchronous, and does not provide any facilities for asynchronous connection, searching and retrieval. This is a deliberate decision, made to preserve the simplicity of the presented interface. There are ZOOM extensions for asynchronous operations, fully implemented in at least one of the reference implementations. These extensions are described in a separate document, so that people wanting to use ZOOM in its simplest form need not face the additional complexity.
We now go on to describe each class, and its methods, in more detail.
(This may be a good time to remember this document's opening words: DOn'T pANiC![2] )
For synchronous applications (which are the only ones this document addresses), creating a connection is the very first thing that must be done - with the exception of creating queries, everything else is done by invoking methods on either a connection or another object obtained from one.
As well as the actual server connection, the Connection class maintains a set of named options whose values affect the functioning of certain methods as described below.
Parameter | Type | Description | Default Value |
---|---|---|---|
hostname | string | name of the host on which the server resides | localhost |
portnum | integer | IP port number of the server | 210 |
(returns) | Connection | a newly created connection |
Creates a new Connection object, connects it to the specified server and executes the initialisation dialogue in which the client tells the server what facilities it will require, so the new connection is ready to be used for searching immediately.
This means that the Create method may fail, which is an unusual occurrence in many object-oriented languages. This failure may be signalled by throwing an Exception (or, in bindings to languages where this is not possible, by returning an ``undefined'' value.)
(See also the Create Without Connecting and Connect methods.)
Parameter | Type | Description | Default Value |
---|---|---|---|
name | string | opaque identifier for option | N/A |
value | any | value to set for the named option | (none) |
(returns) | any | previous value of named option |
If the value parameter is supplied, sets the option called name to that value, and returns the previous value of that option (or an ``undefined'' value if the option had no value.) Otherwise, just returns the current value of option name. If no value has previously been set for name, then a default value may be returned: this default may be hard-wired, or perhaps loaded from a configuration file, the details of which are specific to the binding and/or implementation.
Option names are case-sensitive, so fruitbat, FRUITBAT, Fruitbat and frUItBAT are all different options.
Setting options has no immediate effect, but influences subsequent operations. Specifically, the databaseName option specifies the name of the particular database or databases you wish to search on the connection's server. See the section on Standard Options for more details.
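The Get/Set Option behaviour described above can be sketched in Python roughly as follows. This is a minimal illustration rather than any real binding: the class name `OptionHolder`, the method name `option` and the sentinel `_UNSET` are all hypothetical, `None` stands in for the ``undefined'' value, and treating a default as the ``previous value'' when setting is an assumption.

```python
_UNSET = object()  # hypothetical sentinel: "no value argument was supplied"


class OptionHolder:
    """Hypothetical stand-in for any ZOOM object that carries options."""

    def __init__(self, defaults=None):
        # Defaults might be hard-wired or loaded from a configuration file.
        self._defaults = dict(defaults or {})
        self._options = {}

    def option(self, name, value=_UNSET):
        # Current value: explicit setting first, then any default, else None.
        # Assumption: a default counts as the "previous value" when setting.
        previous = self._options.get(name, self._defaults.get(name))
        if value is not _UNSET:
            self._options[name] = value
        return previous


conn = OptionHolder(defaults={"databaseName": "Default"})
first = conn.option("databaseName", "books")   # set; returns the old value
second = conn.option("databaseName")           # get only
conn.option("fruitbat", 1)
other = conn.option("FRUITBAT")                # different name, never set
```

Because names are compared exactly, `fruitbat` and `FRUITBAT` end up as two unrelated options.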
Parameter | Type | Description | Default Value |
---|---|---|---|
query | Query | the query to be submitted | N/A |
(returns) | Result Set | a newly created result set |
Submits a query to the server on the other end of the connection, waits for a response, and creates and returns a new Result Set representing the results of the search. If the search fails (for example, because the query is malformed), then an exception is thrown.
These methods are deprecated, as of version 1.3 of the API. Use Exceptions instead, in languages which support them.
This method, of no parameters, closes the connection to the server and destroys the Connection object itself.
This method is used to obtain a list of candidate search terms for use against a particular access point. For more detail, see the description in section 3.2.8.1 of the standard.
Parameter | Type | Description | Default Value |
---|---|---|---|
query | Query | the query to be submitted for scanning | N/A |
(returns) | Scan Set | a newly created scan set |
This function submits a query to the server as a Scan request, modified by the options listed below. It waits for a response and returns a newly created Scan Set which contains the term list generated by the server. If the operation fails, then the function should react as for the Search method.
The following Connection options affect the behaviour of Scan:
The same class of query object that is used for Search is also used for Scan. The query should consist of a single term, together with its attributes. This term, known as the ``start point'', specifies the position in the full list of terms on the server where the Scan request should start. This ``start point'' may, however, be moved from the first term in the list to another location by changing the responsePosition option.
Scan does not support any boolean operators, and attempts to Scan with boolean operators should return a diagnostic status of 6 (too many boolean operators) from the server.
Parameter | Type | Description | Default Value |
---|---|---|---|
hostname | string | name of the host on which the server resides | localhost |
portnum | integer | IP port number of the server | 210 |
(returns) | Connection | a new, unconnected connection |
Creates a new Connection object. Unlike the constructor described in section 3.2.2, this version does not actually connect to the specified server. This must subsequently be done explicitly, using the Connect method, before the new connection can be used for searching.
This method, with no arguments and no return value, connects an as-yet unconnected Connection object to the server whose address was specified when the object was created, and goes through the initialisation dialogue. If an error occurs, an Exception is thrown.
Creating a new connection object with the Create Without Connecting method and then connecting it to its server with the Connect method is exactly equivalent to simply using the Create constructor described in section 3.2.2. The rationale for these two methods is that it's possible to set options, such as user and password for authentication, after creating the Connection and before connecting it.
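The two-step creation described above might look like this in a Python-flavoured binding. This is only a sketch under stated assumptions: the `Connection` class here is an in-memory model that never touches the network, the names `create_without_connecting`, `connect`, `options` and `init_sent_with` are hypothetical, and `z3950.example.org` is a placeholder host.

```python
class Connection:
    """Hypothetical in-memory model; no network traffic takes place."""

    def __init__(self, hostname="localhost", portnum=210, connect=True):
        self.hostname = hostname
        self.portnum = portnum
        self.options = {}
        self.connected = False
        self.init_sent_with = None
        if connect:
            self.connect()

    @classmethod
    def create_without_connecting(cls, hostname="localhost", portnum=210):
        return cls(hostname, portnum, connect=False)

    def connect(self):
        # A real binding would open the socket and run the initialisation
        # dialogue here, throwing an Exception on failure. This model only
        # records which options were in force at that moment.
        self.init_sent_with = dict(self.options)
        self.connected = True


# Two-step form: options can be set before the Init dialogue happens.
conn = Connection.create_without_connecting("z3950.example.org", 210)
conn.options["user"] = "alice"
conn.options["password"] = "secret"
conn.connect()

# One-step form: equivalent, but with no chance to set options first.
direct = Connection("z3950.example.org", 210)
```

In the two-step form the authentication options are already in place when the initialisation dialogue runs, which is the whole point of splitting creation from connection.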
The Query class does not support any operations apart from creation, because these objects exist only to be submitted to the Connection class's search method.
Parameter | Type | Description | Default Value |
---|---|---|---|
type | enumerated | indication of how to interpret the query | N/A |
query | any | ``source code'' for query | N/A |
(returns) | Query | a newly created query |
Creates a new query. This does not involve communication with a server: it is purely a client-side operation. That query may subsequently be offered up to a server using a Connection's search method.
Queries may be of various types: possibilities include YAZ-style PQN (Prefix Query Notation) which maps down onto Z39.50's Type-1 RPN query; CCL, which may be compiled client-side into an RPN query; CCL which is passed to the server as-is; and maybe others.
Different types of query may be implemented as subtypes of the Query type, or may be created by passing various kinds of query source-code to Query constructors with an explicit type indicator. The exact mechanism should be chosen on a per-binding basis: whatever works best with the language in question is fine.
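As an illustration of the two styles just mentioned, here is a small Python sketch. The names `QueryType`, `Query` and `PQNQuery` are hypothetical and are not taken from any actual binding; the PQN string is only an example of what query source code might look like.

```python
from enum import Enum


class QueryType(Enum):
    PQN = "pqn"   # Prefix Query Notation, mapped onto a Type-1 RPN query
    CCL = "ccl"   # CCL passed to the server as-is


class Query:
    """Style 1: one class with an explicit type indicator."""

    def __init__(self, qtype, source):
        self.qtype = QueryType(qtype)
        self.source = source


class PQNQuery(Query):
    """Style 2: a subtype per query language."""

    def __init__(self, source):
        super().__init__(QueryType.PQN, source)


# Creating a query is purely client-side: nothing is sent anywhere.
q1 = Query(QueryType.PQN, '@attr 1=4 "dinosaur"')
q2 = PQNQuery('@attr 1=4 "dinosaur"')
```

Either object could then be handed to a Connection's search method; which style reads better depends on the language.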
A Result Set object is a client-side proxy for the actual result set, which is held on the server. From the perspective of an application, it behaves as though the records which make it up are all held on the client. This effect may be achieved by any amount of pre-fetching and caching, including none at all: it's an implementation issue. Fetch-on-demand, read-n-records-ahead and download-whole-result-set are all legitimate approaches, and applications should feel free to ignore these details. Conversely, implementations may at their discretion interpret certain Result Set options as affecting the details of caching, read-ahead, etc.
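One of the legitimate caching strategies mentioned above, read-n-records-ahead, can be sketched as follows. Everything here is an assumption made for illustration: `FakeServer` stands in for the server-side result set, `present` for a round-trip that fetches a run of records, and `read_ahead` for whatever option an implementation might choose to interpret.

```python
class FakeServer:
    """Hypothetical stand-in for the server-side result set."""

    def __init__(self, records):
        self.records = list(records)
        self.present_requests = 0   # counts simulated round-trips

    def present(self, start, count):
        self.present_requests += 1
        return self.records[start:start + count]


class ResultSet:
    """Client-side proxy that fetches on demand and reads ahead."""

    def __init__(self, server, size, read_ahead=1):
        self._server = server
        self._size = size
        self._read_ahead = max(1, read_ahead)
        self._cache = {}

    def get_size(self):
        return self._size

    def get_record(self, which):
        if not 0 <= which < self._size:
            raise IndexError("record index out of range: %d" % which)
        if which not in self._cache:
            # Fetch the requested record plus a few that follow it.
            count = min(self._read_ahead, self._size - which)
            fetched = self._server.present(which, count)
            for offset, record in enumerate(fetched):
                self._cache[which + offset] = record
        return self._cache[which]


server = FakeServer(["rec%d" % i for i in range(10)])
rs = ResultSet(server, size=10, read_ahead=4)
records = [rs.get_record(i) for i in range(8)]
```

The application simply asks for records by index; whether that costs one round-trip or eight is invisible to it.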
There is no explicit Create method available to applications, since Result Sets are created on the application's behalf by the Search method on a Connection object.
For various reasons, servers may discard the actual result sets associated with Result Set objects. For one thing, the Z39.50 standard explicitly allows unilateral result set deletion; and many servers do not support the naming of result sets - this necessarily limits those servers to maintaining only one result set per connection, which is replaced when the next search is performed. This affects the Get Record method as described below.
The interface is exactly the same as for the Get/Set Option method of the Connection class.
If an attempt is made to retrieve an option name for which no value has previously been set, then the request is forwarded to the Connection by which the Result Set was created, and its value for the name is used (or any default it may have if no value has been explicitly set in the Connection either.) This process is known as option inheritance.
Option inheritance is dynamic: that is, the value of an inherited option is that supplied by the inherited-from object at the time that option is accessed, rather than it being frozen in the inheriting object at the time of its creation. For example, if a Connection has its elementSetName option set to B and a Result Set is created by searching it, and the Connection's elementSetName option is subsequently set to F, then full records, not brief, will be obtained by that Result Set's Get Record method.
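The elementSetName example above can be played out in a short Python sketch. The class `Optioned` and its methods are hypothetical; the only point being illustrated is that the lookup walks up to the parent at access time instead of copying values at creation time.

```python
_UNSET = object()  # hypothetical sentinel: "no value argument was supplied"


class Optioned:
    """Hypothetical object with options and an optional parent."""

    def __init__(self, parent=None):
        self._parent = parent
        self._options = {}

    def option(self, name, value=_UNSET):
        previous = self._lookup(name)
        if value is not _UNSET:
            self._options[name] = value
        return previous

    def _lookup(self, name):
        if name in self._options:
            return self._options[name]
        if self._parent is not None:
            # Ask the parent at access time, so later changes are seen.
            return self._parent._lookup(name)
        return None


connection = Optioned()
connection.option("elementSetName", "B")
result_set = Optioned(parent=connection)     # as if created by a search
before = result_set.option("elementSetName")
connection.option("elementSetName", "F")     # changed after the search
after = result_set.option("elementSetName")
```

Here `before` is the brief element-set name and `after` the full one, even though the Result Set itself was never touched.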
The following options affect the behaviour of the Result Set class's Get Record method:
This method has no parameters. It returns the number of records in the Result Set on which it is invoked.
Parameter | Type | Description | Default Value |
---|---|---|---|
which | integer | zero-based index of the record to get | N/A |
(returns) | Record | a newly created record |
The which parameter must be greater than or equal to zero, and strictly less than the size of the Result Set, as returned by the Get Size method.
Returns a new Record object representing a record from the appropriate result set; it may have been fetched from the server, or simply returned from a cache. Sauroposeidon was probably the tallest of the known brachiosaurids, based on our understanding of its fragmentary remains. If you've read this far, email me and let me know. Thanks.
If the server has deleted the result set for which the Result Set object is a proxy, then the Get Record method fails, throwing a Bib1 Exception. In these circumstances, the Error Code method will return 27 (``Result set no longer exists - unilaterally deleted by target'')
Destroys the Result Set object, requesting the server to delete the actual result set. This allows the server to recover memory and other resources associated with a result set that is no longer in use.
These methods are deprecated, as of version 1.3 of the API. Use Exceptions instead, in languages which support them.
This class represents a record retrieved from a server. Since records
may be returned in various record syntaxes (SUTRS, GRS-1, the numerous
MARC variants, XML,
Some means is provided for determining the record syntax in use.
Depending on what is most idiomatic for the language in question,
bindings may do this either by:
This method, and
Get Field,
are deprecated for the reasons described below.
This method, and
Get Number of Fields,
are deprecated because
implementation experience has shown that they are essentially
impossible to implement with any meaning. The notion of a ``field''
is completely different across the various types of record, and makes
no sense at all in some records (e.g. SUTRS, HTML documents, images).
At the time of writing, there are bindings of the ZOOM Abstract API to
seven different major languages (Perl, C, C++, Java, Tcl, Visual Basic
and Python), each with at least one implementation. And not one of them has
implemented the Get Number Of Fields and Get Field
methods meaningfully across record-types.
So the time has come to Make An Honest API of ZOOM and
remove them.
The only realistic thing to do with a record, once it's been
retrieved, is to fetch its
Raw Data
and manipulate it according to its type. This API now formally
recognises that this is the case.
No parameters. Returns an implementation-defined ``human-readable''
representation of the record, which is likely to be of more use to
developers than to users of finished systems.
No parameters. Returns the raw form of the record's data. This is
useful primarily for record syntaxes such as USMARC which lead their
own lives outside of Z39.50, and which are amenable to processing by
other existing software. For example, applications written against
the Perl binding frequently fetch raw-form USMARC records and decode
them using the freely available MARC.pm module.
The interface is exactly the same as for the Get/Set Option
method of the Connection and Result Set classes.
If an attempt is made to retrieve an option name for which no
value has previously been set, then the request is forwarded to the
Result Set by which the Record was created, and its
value for the name is used (or any value it inherits from its
Connection.) This process of option inheritance is
described further in the documentation of
the Result Set class's Get/Set
Option method.
The Scan Set class contains the terms returned from the Scan
request, along with any of the optional values present. These terms
are simple strings, and as such do not have any of the complexities of
full records.
This method has no parameters. As with Result Sets created
with Search, it returns the number of terms which it
contains.
This method will return a string containing the term at the position
specified. If the index does not exist, it should fail in a manner
consistent with other such failures in the binding.
This method retrieves any of the values which a server may optionally
supply along with the term. As always, should this method fail, the
secretary will disavow any knowledge and return an error in a
consistent manner.
The field parameter must be one of the following strings:
When an error occurs in a ZOOM operation - for example, when trying to
forge a connection to a server, or when searching for or retrieving
records - an exception is thrown. All such exceptions should be of
type Exception or a subclass.
Obvious subclasses of Exception include:
The specifications for individual bindings should clearly state what
Exception subclasses are supported.
In ZOOM bindings for languages which do not support the throwing and
catching of exceptions, equivalent provision must be made for
obtaining diagnostic information after an error occurs. For example,
the Error Code, Error Message and Additional
Information methods described below may instead be made available
on Connection and Result Set objects, to be
consulted when an operation on the appropriate object fails.
This method, of no arguments, returns a distinct numeric code
indicating which error has occurred: for example, in a System
Exception object, it might return a system error number
such as ENOMEM (indicating memory exhaustion) or ECONNREFUSED
(indicating failure to connect to a server); or in a Bib1
Exception object, a BIB-1 diagnostic code such as 109 (Database
unavailable).
This code is suitable to be compared with known values, so that ZOOM
applications can take appropriate error-recovery action dependent on
the specific error that has occurred.
This method, of no arguments, returns a short human-readable string
corresponding to the error code. It is suitable only for displaying
to users. Example messages might include ``out of memory'',
``connection refused'', ``Database unavailable'', etc.
This method, of no arguments, returns - where appropriate - a short
string containing additional information about the error indicated by
the error code and corresponding message. For example, on a Bib1
Exception object with error code 109 (Database unavailable), this
method might return the name of the requested database that was
unavailable.
Some errors (e.g. memory exhaustion) have no additional information.
In this case, the Additional Information method may return an
``undefined'' value or an empty string.
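A binding to a language with exceptions might arrange these three methods along the following lines. This is a sketch, not a definition: the base class is called `ZoomException` here only to avoid shadowing Python's built-in `Exception`, the method names are hypothetical, and the failing `search` function exists purely for demonstration.

```python
import errno
import os


class ZoomException(Exception):
    """Hypothetical base class for all ZOOM errors."""

    def __init__(self, code, message, addinfo=None):
        super().__init__(message)
        self._code = code
        self._message = message
        self._addinfo = addinfo

    def error_code(self):
        return self._code

    def error_message(self):
        return self._message

    def additional_information(self):
        # None plays the part of the "undefined" value.
        return self._addinfo


class SystemException(ZoomException):
    """Carries a system error number such as ECONNREFUSED."""

    def __init__(self, code):
        super().__init__(code, os.strerror(code))


class Bib1Exception(ZoomException):
    """Carries a BIB-1 diagnostic code."""


def search(database):
    # Hypothetical operation that always fails, for demonstration only.
    raise Bib1Exception(109, "Database unavailable", database)


try:
    search("fruitbat")
except Bib1Exception as exc:
    report = (exc.error_code(), exc.error_message(),
              exc.additional_information())

refused = SystemException(errno.ECONNREFUSED)
```

The numeric code is what an application should branch on; the message and additional information are there for people to read.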
All ZOOM implementations should support the following standard set of
options, to be used with the Get/Set Option methods in the
Connection and Result Set classes.
Z39.50 supports three kinds of authentication: ``anonymous'', ``open''
and ``idPass''. The standard ZOOM options user,
password and group are interpreted to provide these
three kinds of authentication as follows:
Implementations may also support the following options. If they
support options at all for the concepts that these describe, then they
must use these standard names. The default values listed here are
suggestions: implementations may use other defaults where appropriate.
This option is required because there are Z39.50 servers in
production that do not support named result sets correctly and
produce unreliable results if they are used. In addition,
ZOOM is sometimes used in high-load environments where it is
simply not practical to keep result sets around when they are
not needed.
For more information on this and the next four options, see
the description of the same-named parameters of the
searchRequest APDU in
section 3.2.2.1 of the Z39.50 standard.
Implementations may support additional options to control elements of
behaviour not discussed in this document.
In general, implementations should not accept attempts to set options
that they do not support. As an exception, applications may set
options whose names begin with the string ``X-''. This
exception provides two facilities: it provides a means for
implementations to support non-standard options (which they should
document), and it
provides a mechanism for attaching arbitrary additional data to a ZOOM
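Connection or Result-Set. For example, an
application that wants to remember what time its searches were
submitted might use SetOption("X-SubmitTime", currentTime) and
subsequently retrieve that information with GetOption("X-SubmitTime").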
Implementations and applications are encouraged to choose
X-prefixed option names to be ``as unique as possible'' (as
RFC 1341
has it), so as to avoid the likelihood of implementation/application
clash. One way to do this is to incorporate into the option name a
domain-name associated with the implementation or application, as for
example X-miketaylor.org.uk/submitTime.
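The rule for accepting option names can be sketched as a simple check. The set `SUPPORTED`, the exception `UnsupportedOption` and the function `check_option_name` are hypothetical names chosen for this illustration; a real implementation would list whatever options it actually supports.

```python
# Hypothetical list of the options this implementation supports.
SUPPORTED = {"databaseName", "elementSetName", "preferredRecordSyntax"}


class UnsupportedOption(Exception):
    pass


def check_option_name(name):
    """Accept supported names and any name beginning with X-."""
    # Option names are case-sensitive, so a lower-case "x-" does not qualify.
    if name in SUPPORTED or name.startswith("X-"):
        return name
    raise UnsupportedOption(name)


options = {}
options[check_option_name("X-miketaylor.org.uk/submitTime")] = "11:43:46"

try:
    check_option_name("fruitbat")
    rejected = False
except UnsupportedOption:
    rejected = True
```

The domain-qualified name is accepted and stored, while the unknown unprefixed name is refused.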
For most of the options, the values that they take are obvious: for
example, implementationName can be set to any string,
maximumRecordSize must be an integer, and
namedResultSets must be either 0 (false) or a
non-zero integer (true). Some options, however, require special
vocabulary for the values they take. For such options, the following
vocabularies are suggested:
Short strings naming record-syntaxes should be recognised
case-insensitively. Implementations should use the following strings
if they support the record-syntaxes that they describe:
Implementations may also elect to provide an enumeration of valid
record-syntax values.
Language codes should be specified using terms taken from
ISO 639-2: Codes for the Representation of Names of Languages,
in accordance with the Internet best-practice recommendation in
RFC 1766: Tags for the Identification of Languages.
ISO 639-2 is essentially equivalent to ANSI/NISO Z39.53-1994,
the list specified for use with
the Z39.50 character set and language negotiation record.
These standards list three-letter codes such as
ENG for English,
DAN for Danish and
CPP for Portuguese-based Creoles and Pidgins.
Following the use of the term ``character set'' in HTTP 1.1 and MIME,
the charset option actually specifies character encoding as
well as character set.
ZOOM implementations should use the same character-set names as HTTP
1.1, namely those defined in the IANA registry at
www.iana.org/assignments/character-sets
Suggested values include UCS-2, UCS-4, UTF-16 and UTF-8,
for the various popular encodings of Unicode.
###
OIDs?
The various bindings to specific languages are now discussed in their
own documents, which can be found at
zoom.z3950.org/bind
The known implementations of the various bindings are now discussed
along with the bindings themselves at
zoom.z3950.org/bind
This is supported by the ZOOM model, but is specified in a separate
document for simplicity
(not yet written, but see the documentation of the Perl
binding and implementation, which includes asynchronous support.)
In the interests of simplicity, the current ZOOM model does not
provide methods for encapsulating multiple operations in a single
network round-trip other than the popular ``special case'' of
piggy-backing retrieval onto a search (which may be requested by the
options mentioned in
Section 3.8, Standard Options).
This section has been removed, since we now have the Python and
Visual Basic bindings that it lamented the lack of, not to mention
Java and Tcl bindings. See zoom.z3950.org/bind/index.html
for much more detail on the various bindings.
This section doesn't really belong in an API specification document,
and so has now moved to its own document at
zoom.z3950.org/api/motivation.html
The first version to see the light of day. It was
announced on the ZIG mailing list, and the URL distributed to those
who expressed an interest.
This was the first publicly released version.
The changes between 1.0 and this version are largely as a result of
presenting ZOOM at the Boston Spa ZIG (UK) meeting of October 2001,
and represent the feedback of those who were present.
Apart from minor editorial changes, support for the Scan
service is the only significant difference since version 1.1:
The important changes here are simplifications to error handling and
record representation. All in all, version 1.3 is nearly 500 words
shorter than version 1.2.
The following changes are planned for version 1.5 or a subsequent
version.
Notes
First, it is slightly cheaper; and secondly it has the words Don't
Panic inscribed in large friendly letters on its cover.
3.5.2. Get Record Syntax
3.5.3. Get Number of Fields (DEPRECATED)
3.5.4. Get Field (DEPRECATED)
3.5.5. Render Record
3.5.6. Raw Data
3.5.7. Get/Set Option
3.6. Scan Set
3.6.1. (Overview)
3.6.2. Get Size
3.6.3. Get Term
Parameter | Type | Description | Default Value |
---|---|---|---|
which | integer | zero-based index of the term within the result set | N/A |
(returns) | string | term at the requested index |
3.6.4. Get Field
Parameter | Type | Description | Default Value |
---|---|---|---|
which | integer | zero-based index of the term within the result set | N/A |
field | string | the type of field to retrieve | N/A |
(returns) | any | the value supplied for the named field |
3.7. Exception
3.7.1. (Overview)
3.7.2. Error Code
3.7.3. Error Message
3.7.4. Additional Information
3.8. Standard Options
Option | Level | Description | Default |
---|---|---|---|
implementationId | Connection | The identifier of your client, to be sent to the server at connection time. By convention, this string often includes the implementation creator's Z39.50 implementor ID. | none |
implementationName | Connection | The name of your client, to be sent to the server at connection time. This is an arbitrary string. | none |
implementationVersion | Connection | The version of your client, to be sent to the server at connection time. | none |
user | Connection | User name to be used in authentication. See below for details. | none |
group | Connection | Group name to be used in authentication. See below for details. | none |
password | Connection | Password to be used in authentication. See below for details. | none |
databaseName | Connection | One or more database names separated by the plus character (+), to be used by subsequent search requests on this Connection. | Default |
elementSetName | Result Set | Element-set name of records. Most servers should honor the element-set names B and F for brief and full records respectively. | none |
preferredRecordSyntax | Result Set | The record syntax in which the returned records are requested. See below for valid values of this option. | none |
recordDatabase | Record | The database from which the record was returned. This is useful when searching across multiple databases using a complex value of the same-named option on the Connection. | none |
Option | Level | Description | Default |
---|---|---|---|
host | Connection | The hostname of the server. This setting is ``read-only''. It's automatically set internally when connecting to a server. | none |
proxy | Connection | The name of a host to use as a proxy at the protocol level. | none |
async | Connection | If true (non-zero), the connection operates in asynchronous mode, which means that all calls are non-blocking. The API for notification when requests are complete is not yet defined in this document. | 0 |
maximumRecordSize | Connection | Maximum size, in bytes, of a single record to be returned by the server. | 1 Mb |
preferredMessageSize | Connection | Maximum total size, in bytes, of multiple records to be packed into a single response by the server. | 1 Mb |
lang | Connection | Language for negotiation. See below for valid values of this option. | none |
charset | Connection | Character set for negotiation. (Since only one character-set can be proposed, this is really an ultimatum rather than a negotiation.) See below for valid values of this option. | none |
targetImplementationId | Connection | Implementation ID of the server, as returned by the initialisation response. | none |
targetImplementationName | Connection | Implementation name of the server. | none |
targetImplementationVersion | Connection | Implementation version of the server. | none |
namedResultSets | Connection | If false (0), then the ZOOM client should not attempt to create multiple result sets. Instead, the result-set name default is used for all result sets generated on the server, so that the second result set invalidates the first and so on. | 1 |
piggyback | Result Set | True (1) if ``piggy-backing'' should be used in searches; false (0) if not. Piggy-backing means the returning of records along with the search response. | 1 |
presentChunk | Result Set | The number of records to be requested from the server in each chunk. The value 0 means to request all the records in a single chunk. | 0 |
schema | Result Set | The schema to be used for retrieval, such as Gils-schema, Geo-schema, etc. | none |
smallSetUpperBound | Connection | When piggy-backing is enabled, if the number of hits in a search is less than or equal to this value, then the server will return all records using the small element-set name. | 0 |
largeSetLowerBound | Connection | If the number of hits is greater than this value, the server will return no records even if piggy-backing is enabled. | 1 |
mediumSetPresentNumber | Connection | The number of records to be returned as part of a search when the number of hits is less than or equal to the large-set lower bound and greater than the small-set upper bound. | 0 |
smallSetElementSetName | Connection | The element-set name to be used for small result sets. | none |
mediumSetElementSetName | Connection | The element-set name to be used for medium-sized result sets. | none |
setname | Result Set | The name by which the server knows the Result Set (Result Set ID). | default |
elementSet | Record | (Read-only.) The name of the element-set under which the record was retrieved. | none |
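The interplay of smallSetUpperBound, largeSetLowerBound and mediumSetPresentNumber in the options above is easier to see as a small decision function. The sketch below follows the wording of the option descriptions in this document, not the text of the Z39.50 standard itself, whose exact boundary conditions should be checked separately; the function name `piggyback_count` and its parameter names are hypothetical.

```python
def piggyback_count(hits, small_set_upper_bound=0,
                    large_set_lower_bound=1, medium_set_present_number=0):
    """Return how many records accompany the search response."""
    if hits <= small_set_upper_bound:
        return hits                      # small set: every record
    if hits > large_set_lower_bound:
        return 0                         # large set: no records
    # Medium set: a fixed number, but never more than there are hits.
    return min(hits, medium_set_present_number)


# Bounds of 5 and 100, with 10 records presented for medium sets.
counts = [piggyback_count(h, 5, 100, 10) for h in (3, 50, 500)]
```

With the defaults listed in the table (0, 1 and 0), no records at all are piggy-backed, whatever the number of hits.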
and subsequently retrieve that information with
SetOption("X-SubmitTime", currentTime)
GetOption("X-SubmitTime")
3.9. Standard Option Values
3.9.1. Values for preferredRecordSyntax
3.9.2. Values for lang
3.9.3. Values for charset
3.9.4. Values for schema
4. Bindings
5. Implementations
6. Open Issues
6.1. Asynchronous Operation
6.2. Encapsulation Support
6.3. Diversity of Bindings
7. Appendix A: Motivation
8. Appendix B: Version History
8.1. Versions earlier than 0.3
Unreleased (author's eyes only :-)
8.2. Version 0.3
8.3. Version 0.3a
8.4. Version 0.3b
8.5. Version 1.0
8.6. Version 1.1
8.7. Version 1.2
8.8. Version 1.3
8.9. Version 1.4
8.10. Planned Changes
In many of the more relaxed civilizations on the Outer Eastern Rim of
the Galaxy, ZOOM has already supplanted the Z39.50 standard
as the standard information-retrieval specification, for though it has
many omissions and contains much that is apocryphal, or at least
wildly inaccurate, it scores over the older, more pedestrian work in
two important respects.