=========
TAP query
=========

A library to query TAP servers
========================================================================

:Author: Markus Demleitner
:Email: gavo@ari.uni-heidelberg.de


.. contents:: 
  :depth: 2
  :backlinks: entry
  :class: toc


TAP is a relatively complex protocol to execute potentially long-running
(ADQL) queries on remote servers.  As a part of the GAVO VOTable
library, we provide a shallow client that speaks TAP.  It is the
intention of this document to keep the `TAP spec`_ off your reading
list.  If it doesn't do that, complain, and we'll try to fix it.


Obtaining the library
---------------------

See the `GAVO VOTable Library documentation`_.


For the impatient
-----------------

All you need to query a server is its access URL.  What we need here is
the root of the hierarchy, i.e., without any ``sync`` or ``async``.
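
If all you have is a URL that already ends in ``sync`` or ``async``,
something like the following sketch could recover the root.  The helper
name ``tapRootURL`` is made up for this illustration; it is not part of
the library, and it does not try to deal with query strings::

  def tapRootURL(url):
    """returns url without a trailing sync or async path segment.

    This is a hypothetical helper, not something gavo.votable provides.
    """
    stripped = url.rstrip("/")
    head, _, last = stripped.rpartition("/")
    if last in ("sync", "async"):
      return head
    return stripped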

Then, you can say something like::

  from gavo import votable
  accessURL = "http://dc.zah.uni-heidelberg.de/__system__/tap/run/tap"
  query = "SELECT TOP 3 * FROM TAP_SCHEMA.tables"
  job = votable.ADQLTAPJob(accessURL, query)
  job.run()
  dataIterator, metadata = votable.load(job.openResult())
  job.delete()
  print(list(dataIterator))

More on dataIterator and metadata can be found in the `GAVO VOTable
library documentation`_.


The ADQLTAPJob
--------------

The class you will usually deal with is ADQLTAPJob_, which is
constructed with the endpoint URL and the query.  Constructing a job
already accesses the server, which means it may very well raise
network-related exceptions.

You can pass a ``userParams`` dictionary to the constructor.  This is
intended for the TAP parameters (in particular, ``FORMAT``, ``MAXREC``,
``RUNID`` or service-defined ones).  You should not use ``userParams``
for ``UPLOAD`` but instead use the ``addUpload`` method described below.

You can also change the parameters later using
``setParameter(key, value)``.  This again causes a server connection
to be made, as do accesses to ADQLTAPJob's properties, viz.,

* ``executionDuration`` -- the number of seconds after which the server
  will kill your job.  Simply assign some integer to change it, though
  of course the server might not let you.
* ``destruction`` -- a ``datetime.datetime`` (in UTC) at which the job
  will be completely removed (i.e., even the results) from the remote
  server.  Again, you can assign ``datetime`` instances to try and change it.
* ``phase`` -- the current phase of the job.  This is a string
  containing magic values; the possible values are PENDING, QUEUED,
  EXECUTING, COMPLETED, ERROR, ABORTED (these are also available as
  symbols in ``tapquery``, which gives some protection against typos).  Most
  of them are pretty self-explanatory, except that PENDING means you 
  still can change the query, whereas QUEUED jobs are, well, in the 
  server's queue and cannot be changed any more.  You should not assign 
  to ``phase`` manually.
* ``quote`` -- returns an estimate of the number of seconds your job
  will execute on the remote machine.  This is, of course, a guess even
  under the most favorable circumstances.  Some servers choose to not
  even try to guess, in which case you'll get a None.  This
  cannot be assigned to.
* ``parameters`` -- a dictionary containing your parameters.  You cannot
  assign to ``parameters``, and changing things here has no effect.  Use
  ``setParameter(key, value)`` instead.
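
As a sketch of how these properties might be used, the following
function asks for a longer run time and a later destruction date and
then returns what the job reports afterwards.  The name
``extendLifetime`` and its defaults are made up for this example, and we
assume here that naive ``datetime`` instances in UTC are what the
library wants::

  import datetime

  def extendLifetime(job, runSeconds=3600, keepHours=24):
    """requests a new executionDuration and destruction time for job.

    job is anything with the assignable executionDuration and
    destruction properties described above.  The server may ignore
    either request, so the values read back are returned.
    """
    job.executionDuration = runSeconds
    # assumption: naive UTC datetimes are what the library expects
    nowUTC = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
    job.destruction = nowUTC + datetime.timedelta(hours=keepHours)
    return job.executionDuration, job.destruction

Since every property access talks to the server, a call to this function
costs four round trips; that is fine for occasional use but not
something to do in a tight loop.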


Running a job
-------------

Once you have constructed (and possibly modified) a job, call ``start()``
to tell the remote server to put it into its execution queue.
You can then poll the job's phase (now and then)::

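  import time
  from gavo import votable
  from gavo.votable import tapquery
  job = votable.ADQLTAPJob(...)
  job.executionDuration = 6000
  job.start()
  # each access to job.phase asks the server; ABORTED is terminal, too
  terminalPhases = set([tapquery.ERROR, tapquery.COMPLETED, tapquery.ABORTED])
  while job.phase not in terminalPhases:
    time.sleep(10)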


You can call ``abort()`` to kill a running job, and you can call
``delete()`` when done.  As a matter of fact, although server operators
will eventually destroy your job anyway, it's common courtesy to clean
up behind you unless you have good reason to keep the result data.  So
the recommended pattern is::

  job = votable.ADQLTAPJob(...)
  try:
    ... do your thing ...
  finally:
    job.delete()

Now, doing the polling by hand is tedious, and thus there is a method
``waitForPhases(phases, ...)`` that takes a set of phases to wait for,
and then polls at increasing intervals (that you could control through
the method's further arguments; see `API docs`_ for details).
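
To illustrate what polling at increasing intervals means, here is a
minimal sketch.  It is *not* the library's implementation; the function
``pollPhase`` and all its arguments are invented for this example, and
the actual ``waitForPhases`` signature is in the `API docs`_::

  import time

  def pollPhase(getPhase, phases, initialDelay=1, maxDelay=30,
      factor=1.5, timeout=None, sleep=time.sleep):
    """calls getPhase() until it returns a member of phases.

    After each unsuccessful poll, the delay grows by factor up to
    maxDelay.  Returns the phase reached; raises a RuntimeError once
    at least timeout seconds have been spent sleeping.
    """
    delay, waited = initialDelay, 0
    while True:
      phase = getPhase()
      if phase in phases:
        return phase
      if timeout is not None and waited >= timeout:
        raise RuntimeError("Gave up after waiting %s seconds" % waited)
      sleep(delay)
      waited += delay
      delay = min(delay * factor, maxDelay)

With a job at hand, you could call this as ``pollPhase(lambda:
job.phase, set(["COMPLETED", "ERROR", "ABORTED"]))``.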

But really, the ``run`` method is what you'd usually use; it has the
added advantage that it raises a ``RemoteError`` or a ``RemoteAbort`` exception
if something went wrong on the server side.  So, the standard pattern is::

  job = votable.ADQLTAPJob(...)
  try:
    ... add uploads, set parameters if you must ...
    ... change UWS parameters (executionDuration, destruction, etc) ...
    job.run()
    ... read results ...
  finally:
    job.delete()
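
If you write this pattern a lot, you could wrap the cleanup into a
context manager.  The following is a sketch only; ``deleting`` is a name
we made up here and not something the library provides::

  import contextlib

  @contextlib.contextmanager
  def deleting(job):
    """yields job and calls job.delete() on exit, even after exceptions."""
    try:
      yield job
    finally:
      job.delete()

  # Possible usage, with accessURL and query as in "For the impatient":
  #
  # with deleting(votable.ADQLTAPJob(accessURL, query)) as job:
  #   job.run()
  #   data, metadata = votable.load(job.openResult())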


TAP parameters
--------------

The following parameters are defined by the TAP spec:

* ``FORMAT`` -- the format you want to retrieve the data in.  This
  defaults to votable, and you should probably keep that default with
  this library (since, if you have it, you can parse VOTables, right?).
  Other possible values include csv, tsv, fits, text, or html.  Servers
  must support VOTables; the other formats are optional.
* ``MAXREC`` -- a limit as to how many rows are to be returned.  This
  basically works like the ``TOP`` phrase in ADQL and is rather
  superfluous when using ADQL.
* ``RUNID`` -- some identifier you can pass.  It could be used for
  tracking and similar.  The server should include it in
  results.  If you don't know what it's for, you probably don't need it.
* ``LANG`` -- the query language.  In this library, it defaults to
  ``ADQL``.  Let's see if the library is flexible enough to support
  other languages (which are not specified yet).
* ``REQUEST`` -- specifies the operation you want from the server.
  ``"doQuery"`` is what ``ADQLTAPJob`` fills in for you, and it's
  what you should leave it at.
* ``QUERY`` -- the query you are posing.  ``ADQLTAPJob`` specifies
  it for you, but if you really wanted to, you could override it (e.g.,
  using ``setParameter``).
* ``UPLOAD`` -- table uploads and such.  While you could manipulate
  this manually, don't.  Use the ``addUpload`` method.
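
A ``userParams`` dictionary for the constructor could be put together
along the following lines.  This is a sketch under assumptions: the
helper ``makeUserParams`` is hypothetical, and turning all values into
strings is our choice here rather than a documented requirement::

  def makeUserParams(maxrec=None, runid=None, resultFormat=None,
      **serviceParams):
    """returns a dictionary intended for use as userParams.

    Arguments left at None are omitted so the server defaults apply.
    UPLOAD is rejected because addUpload is the way to do uploads.
    """
    params = dict(serviceParams)
    if "UPLOAD" in (key.upper() for key in params):
      raise ValueError("Use addUpload rather than an UPLOAD parameter")
    for key, value in [
        ("MAXREC", maxrec), ("RUNID", runid), ("FORMAT", resultFormat)]:
      if value is not None:
        params[key] = str(value)
    return params

  # Possible usage:
  # job = votable.ADQLTAPJob(accessURL, query,
  #   userParams=makeUserParams(maxrec=100, runid="nightly-check"))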


Getting results
---------------

Right now, this library heavily leans towards ADQL.  When doing ADQL
queries, there is only one result.  You can access it using the
``openResult()`` method, which returns a file-like object (actually,
it's what ``urllib.urlopen`` returns) that you can read your results
from.  Since successful ADQL queries return single-table VOTables, you
can feed this directly to ``votable.load(f)``::

  data, metadata = votable.load(job.openResult())

More complex query patterns could yield more results; in that case,
you will have to inspect ``job.results``, which is a list of triples of
an access URL, an opaque string-typed id, and a UWS "result type" that
you probably can safely ignore.


Errors
------

All exceptions originating in the library are subclasses of
``tapquery.Error``.  The `API docs`_ list some exceptions you should
expect; also, of course, all kinds of network-related exceptions could
come out of the library.  No serious attempt is being made to catch
such exceptions and translate them.  If something is wrong network-wise,
we feel it's better to freely admit this.


Sync Querying
-------------

TAP also supports a synchronous querying mode; for simple, quick
queries, this is simpler and has less overhead.  So, if you are certain
that your query will run quickly and the result set is small, you can
use the ADQLSyncJob.  This works mostly like the ADQLTAPJob, except of
course everything that deals with remote state management basically is a
no-op.  Instead, either ``run`` or ``start`` just query the remote
server and return when the server is done.  For convenience, they return
the job itself, so that you can say things like::

  print(votable.ADQLSyncJob(
    "http://dc.zah.uni-heidelberg.de/__system__/tap/run/tap",
    "SELECT * FROM TAP_SCHEMA.tables"
    ).run().openResult().read())

All "unpredictable" exceptions on sync jobs are raised from within
``run``; these will usually be ``tapquery.WrongStatus`` or
``tapquery.NetworkError`` exceptions, for when the TAP server has
complained or something was wrong on the way between the client and the
server.  

The ``WrongStatus`` instances have a payload attribute that contains any
message body the server might have sent with the headers; frequently
this contains explanations of what may have gone wrong.  Since no HTTP
headers are available, there's no saying what format the error message
came in.  Tell us if that bugs you.

The error message associated with the exception object will usually not
be particularly useful.  Instead, obtain an error message from the server
using the ``getErrorFromServer`` method, as for ADQLTAPJobs.
TAP errors (those coming back as VOTables) are parsed out; data
that's not parseable as a TAP error message is returned as-is.  Thus, the
string returned may be long (e.g., some fancy 404 HTML page).
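
Putting this together, a sketch for running a job and obtaining a
reasonably short error message on failure might look as follows.  The
function ``runOrExplain`` is invented for this example, we assume
``getErrorFromServer`` is called without arguments, and catching a plain
``Exception`` is only done to keep the sketch free of imports; in real
code you would probably catch ``tapquery.Error``::

  def runOrExplain(job, maxLength=500):
    """runs job and returns (job, None) on success.

    On failure, returns (None, message), where message is what
    job.getErrorFromServer() returns, cut to maxLength characters
    since it may be an entire HTML page.
    """
    try:
      job.run()
    except Exception:
      message = job.getErrorFromServer()
      if len(message) > maxLength:
        message = message[:maxLength] + "..."
      return None, message
    return job, None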

------

.. _ADQLTAPJob: http://vo.ari.uni-heidelberg.de/docs/DaCHS/apidoc/gavo.votable.tapquery.ADQLTAPJob-class.html
.. _API docs: http://vo.ari.uni-heidelberg.de/docs/DaCHS/apidoc/toc-gavo.votable.tapquery-module.html
.. _GAVO VOTable Library documentation: http://vo.ari.uni-heidelberg.de/docs/DaCHS/votable.html
.. _TAP spec: http://www.ivoa.net/Documents/TAP/