================================
Development notes for GAVO DaCHS
================================

This is basically a heap of stuff I intend to amend with informal docs
on what I'm doing to the software.  While I hope that at some point
it'll grow into a useful introduction to further developing the stuff,
right now it's a random collection that contains quite a bit of
outdated information.  Caveat emptor.


Package Layout
==============

To alleviate cross-import pains, to facilitate later package splitting,
and also as a guideline of what goes where, apply the following rules:

* Each functionality block is in a subpackage, the __init__ for which
  contains the main functions, classes, etc., of the sub-package
  interface most clients will be concerned with.  Clients needing
  special tricks may still import individual modules.
* Within each subpackage, *no* module imports the sub-package, i.e.,
  a module in base never says "from gavo import base".
* A subpackage may have a module common, containing objects that
  multiple modules within that subpackage require.  common may *not*
  import any module from the subpackage, but may be imported from all
  of them.
* There is a hierarchy of subpackages, where subpackages lower in the
  hierarchy may not import anything from the higher or equal levels,
  but only from lower levels.  This hierarchy currently looks like
  this::

    imp < utils < (adql, stc) < base < rscdef < grammars < rsc
      < svcs < protocols < web < rscdef < (helpers, user)


Error handling, logging
=======================

It is the goal that all errors that can be triggered from the web or
from within resource descriptors yield sensible error messages with,
as far as possible, information on the location of the error.  Also,
major operations changing the content of the database should be
loggable with time and, probably, user information.

The core of error processing is utils.excs.
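To make the convention concrete, here is a toy sketch of what such an
exception hierarchy looks like.  The class names, attributes, and the
example usage below are made up for illustration; the real
gavo.utils.excs code differs in detail:

```python
# Hypothetical sketch of the utils.excs convention; not the actual code.
class Error(Exception):
    """Base class for "sensible" DC exceptions.

    Everything after the message is given as keyword arguments; hint
    carries advice on how to fix the problem.
    """
    def __init__(self, msg, hint=None):
        Exception.__init__(self, msg)
        self.msg, self.hint = msg, hint


class LiteralParseError(Error):
    """An example subclass (hypothetical): structured information is
    kept in separate attributes, and __str__ builds the human-readable
    message from them.
    """
    def __init__(self, msg, attName=None, literal=None, hint=None):
        Error.__init__(self, msg, hint=hint)
        self.attName, self.literal = attName, literal

    def __str__(self):
        return "'%s' is not a valid value for %s" % (
            self.literal, self.attName)


try:
    raise LiteralParseError("bad value", attName="maxrec", literal="x2",
        hint="maxrec must be an integer")
except LiteralParseError as ex:
    print(ex)         # 'x2' is not a valid value for maxrec
    print(ex.hint)    # maxrec must be an integer
```

The point of keeping attName and literal separate is that error
handlers (e.g., a web renderer) can still get at the structured
information instead of having to parse the message string.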
All "sensible" exceptions (i.e., MemoryErrors and software bugs
excepted) should be instances of gavo.excs.Error (or derived classes).
The base class takes a hint argument at construction that gives
additional information on how to fix a certain problem.  Apart from
the message (in the first argument), the exceptions must always be
constructed using keyword arguments.  When there is structured
information (e.g., line numbers, keys, and the like), always keep the
information separate and use the __str__ method of the exception to
construct something humans want to see.

All built-in exceptions should accept a hint keyword.


Testing
=======

Many of the tests will require database connectivity.  They should, in
general, not need resources apart from the availability of a test db
profile.  I have this set up on my private machine like this::

  [profiles]
  test:test

in ~/.gavorc and::

  database=gavo
  user=msdemlei

in ~/.gavo/etc -- of course, for this to work, you need a postgresql
engine running on your machine with a database gavo in which your role
is superuser (well, has sufficient rights...).

There are doctest, pyunit-based, and trial-based tests (yaddayadda,
utils/roughtest.py).


Structures
==========

Resource description within the DC works via instances of
base.Structure.  These parse themselves from XML strings, do
validation, etc.

A complete structure instance has the following callbacks:

* completeElement -- called when the element closing tag is
  encountered, used to fill in computed defaults
* validate -- called after completeElement, used to raise errors if
  some gross ("syntactic") mistakes are in the element
* onElementComplete -- called after validate, i.e., onElementComplete
  can rely on seeing a "valid" structure

In addition, structures can register onParentCompleted callbacks.
These are called after the onElementComplete of the parent element.

This processing is done automatically when parsing elements from XML.
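For illustration, here is a toy model of that completion protocol.
This is not the actual base.Structure machinery (which also handles
XML parsing, attribute definitions, and parent links); finishElement
here simply stands in for what the parser arranges automatically:

```python
# Toy model of the Structure completion protocol; names like Column
# and the "real" default are made up for this example.
class ToyStructure(object):
    def finishElement(self):
        # the fixed order the framework guarantees:
        self.completeElement()    # fill in computed defaults
        self.validate()           # raise on gross ("syntactic") mistakes
        self.onElementComplete()  # may rely on a "valid" structure

    def completeElement(self):
        pass

    def validate(self):
        pass

    def onElementComplete(self):
        pass


class Column(ToyStructure):
    def __init__(self, name=None, type=None):
        self.name, self.type = name, type

    def completeElement(self):
        if self.type is None:
            self.type = "real"    # a computed default

    def validate(self):
        if self.name is None:
            raise ValueError("Column without a name")


col = Column(name="mag")
col.finishElement()
print(col.type)   # -> real
```

Note that validate can rely on completeElement having run, so it never
has to worry about defaults that are merely not filled in yet.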
When building elements manually, you should call the finishElement
method when done to arrange for these methods being called.

If you override these methods, make sure you call the methods of the
superclass.  Since we might, at some point, want mixins to be able to
define validators etc., use super()-based superclass calling, through
_completeElementNext, _validateNext, and _onElementCompleteNext.


Metadata
========

Within the framework, there are two main sources of metadata.  For
one, the fields (via datadef.DataField) of a table or a document
record carry metadata on their types, ucds, etc.  Metadata pertaining
to entities other than fields is kept with these entities, viz.,
ResourceDescriptor, DataDescriptor, DataSet, Table, and RecordDef
instances.  All these mix in parsing.meta.MetaMixin, providing getMeta
and addMeta methods.

It is the metadata containers' responsibility to choose their parents
and children.  For this purpose, they have to call a child's
setMetaParent method when they notice a Meta-carrying child is being
added.

Our metadata implementation has to deal with

* Sequences of metadata -- there may be more than one item for a
  keyword, e.g. for "subject".
* Compound metadata -- items may consist of various sub-items (e.g.,
  coverage, creator).
* There may be sequences of compound objects.
* The metadata should be sanely serializable into at least plain text,
  html, and VOResource.
* At least XML and key/value pairs should be supported as input.
* Handover: Metadata containers are hierarchical -- a service might be
  derived from a data set, which in turn sits within a resource
  descriptor.  If the service doesn't have a piece of metadata, it has
  to hand over the question to its parent.
* To keep the complexity of the meta trees down, we want to keep
  certain common types of metadata simple; for example, we generally
  don't want the title to be metadata for a link but rather keep it in
  the link meta value itself.
This results in a rather messy implementation and an interface that's
not really optimal.

Describing Metadata
+++++++++++++++++++

Metadata is organized by mapping keys to values.  Keys are
dot-separated "atoms" (i.e., sequences of letters); most of them are
defined in RMI.  In addition, the system uses quite a number of
"internal" keys, designated by leading underscores.  They include:

* _type -- on DataSets, this becomes the type attribute of the
  VOTable.
* _query_status -- on DataSets, this can be used to communicate the
  value of an INFO element in the VOTable (see SIAP spec).  These must
  be meta.InfoItem instances.
* _legal -- human-readable unstructured information on the legal
  status of the data.
* _infolink -- a URL pointing to further unstructured human-readable
  information on the data content.

Getting Metadata
++++++++++++++++

Metadata are accessed by name (or "key", if you will).  The getMeta
method usually follows the enclosure hierarchy up, meaning that if a
meta item is not found in the current instance, it will ask its parent
for that item, and so on.  If no parent is known, the meta information
contained in the configuration will be consulted.  If all fails, a
default is returned (which is set via a keyword argument that again
defaults to None) or, if the raiseOnFail keyword argument evaluates to
true, a gavo.NoMetaKey exception is raised.

As an example for propagation, querying metadata on a Table will ask
DataSet (XXX shouldn't it ask the RecordDef?  Right now, that won't
work because Tables don't get RecordDefs but fieldDefs XXX),
DataDescriptor, RecordDef and finally config.

If you require metadata exactly for the item you are querying, call
getMeta(key, propagate=False).

For metadata that has structure, getMeta will raise a
gavo.MetaCardError when there is more than one matching meta item.
For these, you will usually use a builder, which will usually be a
subclass of meta.metaBuilder.
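The propagation logic just described can be sketched in a few lines.
This is a deliberately simplified stand-in for meta.MetaMixin (the
real implementation also handles compound values, raiseOnFail, the
configuration fallback, and builders):

```python
# Simplified illustration of getMeta propagation; not the real
# meta.MetaMixin, which is considerably richer.
class MetaMixin(object):
    def __init__(self):
        self._metaParent = None
        self._meta = {}

    def setMetaParent(self, parent):
        # called by the enclosing container when it adopts this child
        self._metaParent = parent

    def addMeta(self, key, value):
        # sequences: repeated addMeta calls accumulate items
        self._meta.setdefault(key, []).append(value)

    def getMeta(self, key, default=None, propagate=True):
        if key in self._meta:
            return self._meta[key]
        if propagate and self._metaParent is not None:
            # hand the question over to the parent
            return self._metaParent.getMeta(key, default=default)
        return default


rd, table = MetaMixin(), MetaMixin()
table.setMetaParent(rd)
rd.addMeta("creator.name", "A. Author")

print(table.getMeta("creator.name"))                  # found on the parent
print(table.getMeta("creator.name", propagate=False))  # -> None
```

The second lookup illustrates propagate=False: only the instance
itself is consulted, so the item set on the parent is not seen.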
web.common.HtmlMetaBuilder is an example of what such a thing may look
like; for simple cases you may get by using ModelBasedBuilder (see the
registry code for examples).  The builders are passed to a MetaMixin's
buildRepr(metakey, builder) method that returns whatever the builder's
getResult method returns.

Setting Metadata
++++++++++++++++

You can programmatically set metadata on any metadata container by
calling container.addMeta(key, value), where both key and value are
(unicode-compatible) strings.

You can build any hierarchy in this way, provided you stick with
typeless meta values or can do with the default types.  Those are set
by key in meta._typesForKeys.  To build sequences, call addMeta
repeatedly.  To have a sequence of containers, call addMeta with None
or an empty string as value, like this::

  m.addMeta("p.q", "x")
  m.addMeta("p.r", "y")
  m.addMeta("p", None)
  m.addMeta("p.q", "u")
  m.addMeta("p.r", "v")

More complex structures require direct construction of MetaValues.
Use the makeMetaValue factory for this.  This function takes a value
(default empty), and possibly key and/or type arguments.  All
additional arguments depend on the meta type desired.  The type
argument selects an entry in the meta._typesForKeys table that
specifies that, e.g., _related meta items always are links.  You can
also give the type directly (which overrides any specification through
a key).  This can look like this::

  m.addMeta("info", meta.makeMetaValue("content", type="info",
    infoName="someInfo", infoValue="GIVEN"))


VOTables
========

table.Table instances are primarily meant to be serialized into
VOTable tables.  Since most of the metadata of tables will be
contained in the parent DataSet's docRec, PARAM elements of tables
will be taken from there and will be located in the RESOURCE element
that contains the table(s).
There's currently no provision for having PARAMs on VOTABLE elements
(these would probably reside in the resource descriptor) or TABLE
elements (this would require a special attribute of table.Table, I
guess).  XXX TODO: there's also INFO and LINK.  Have some convention
as to what goes where.  However, Tables are meta containers and can
contain meta information.

With VOTables, correct formatting of values becomes a particular
problem.  While presentation is largely a non-issue, it is paramount
that the literals actually match what the ucds, units and types give.
Therefore, displayHints (apart from "suppress", which by default is
honored) are ignored.  Instead, votable.py defines MapperFactories.
These are just callables taking ColProperties (in a pinch, dicts
having "sufficient" keys will do too, where sufficient at least
includes ``ucd``, ``unit``, ``datatype``, ``arraysize``, and
``dbtype``, possibly more) and returning either None (meaning they
won't handle values for this column) or a callable returning a string.

These MapperFactories are organized in a Registry that can be queried
for a mapper.  If you need to do some special mapping, get a copy of
the default mapper registry by calling ``votable.getMapperRegistry``,
write mappers (a couple of the keys available are listed above, but
votable calls the mappers with properly filled out
votable.ColProperties instances, so you can, e.g., look at min, max,
and hasNulls), and register them using the ``registerFactory`` method
of the registry.  The mappers will be called in reverse order of
registration, so you can override default behaviour, and you should
register the most special mappers last.

Mapper factories may decide to alter the type they're returning (in
fact, for things like date they'll in all likelihood need to).  To do
that, change the datatype and arraysize attributes.  After the mappers
have run, nothing will look at the dbtype any more.
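To make the factory/registry protocol concrete, here is a
self-contained sketch.  The MapperRegistry class and the timestamp
factory below are stand-ins for illustration, not the actual
votable.py code (which uses ColProperties instances rather than plain
dicts):

```python
# Sketch of the mapper factory protocol: a factory inspects column
# properties and returns None (not my column) or a value->string mapper.
import datetime


class MapperRegistry(object):
    """Stand-in for the registry returned by votable.getMapperRegistry."""
    def __init__(self):
        self.factories = []

    def registerFactory(self, factory):
        self.factories.append(factory)

    def getMapper(self, colProps):
        # factories are tried in reverse order of registration, so
        # register the most special mappers last
        for factory in reversed(self.factories):
            mapper = factory(colProps)
            if mapper is not None:
                return mapper
        return str   # fallback mapper for this sketch


def datetimeMapperFactory(colProps):
    if colProps.get("dbtype") == "timestamp":
        # alter the declared type: the VOTable carries ISO strings here
        colProps["datatype"], colProps["arraysize"] = "char", "*"
        return lambda val: val.isoformat()


registry = MapperRegistry()
registry.registerFactory(datetimeMapperFactory)

col = {"ucd": "time.epoch", "unit": "", "dbtype": "timestamp",
    "datatype": None, "arraysize": None}
mapper = registry.getMapper(col)
print(mapper(datetime.datetime(2009, 3, 1, 12, 0)))  # 2009-03-01T12:00:00
print(col["datatype"], col["arraysize"])             # char *
```

Note how the factory mutates datatype and arraysize as a side effect
of claiming the column, exactly the type-altering behaviour described
above.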
To add new default mappers, add one in votable.py and call
_registerDefaultMF on it.


The New Web Structure
=====================

The new web structure is based on resource descriptors.  These define
data, adapters and services.  A service is a combination of a
renderer, a core, and optionally input and output adapters.

A core is a machinery that takes a DataSet as input (mostly, only the
docRec will be used, which typically contains the query parameters).
If at all sensible, cores should return DataSets, since that is what
most renderers expect.

The renderer provides the interface to the client.  This can be Form
for something that outputs an HTML form and returns an HTML table.
Other renderers include ones for DAL protocols (like SIAP), JPEG
images or plain text.

Adapters are basically datadef.DataTransformers, i.e., they generate
tables using grammars.  For input adapters, these grammars will
typically be context grammars, for output adapters, table grammars.

The way a query ideally is processed is roughly like this::

  query interface (http, dal, ...) <-> service
            |                            ^
            v                            |
     (input adapter) --> core --> (output adapter)

Services
++++++++

Services primarily know the fields the user is expected to provide and
that the renderer should return.  In addition, they carry most of the
metadata necessary for publication (although much of it may really be
in the resource descriptor by meta delegation).  You probably will not
subclass Service.

A service provides the following methods:

* getInputFields() -> seq of InputKeys -- what the user must/may give
  as the service input.
* getInputData(rawInput) -> DataSet -- receives the input to the
  inputAdapter's Grammar (usually a dict) and returns the input data
  set.
* run(inputData, queryMeta) -> DataSet -- returns a
  web.common.SvcResult instance containing (hopefully) everything to
  build an answer to the query.
* getCurOutputFields(queryMeta) -> seq of DataFields -- returns the
  columns of the primary table of the result.  This is given so (a)
  cores capable of returning specific fields only (e.g., DbBasedCores)
  can ask a service what data to query, and (b) the registry interface
  or renderers wanting to know the structure of the data in advance
  (SOAP!) can obtain this information.  This depends on queryMeta
  since the verbLevel can change what fields are returned.

The return values of getInputData and run may be twisted deferreds
(for getInputData, this is necessary because getting the input data
may involve database operations or web queries).

Cores
+++++

A core is something that receives a DataSet and produces one.  In
between, it may issue database queries, run a program, whatever.

Cores must have the following method:

* run(inputData, queryMeta) -> thing

The thing returned is either

* a twisted deferred firing a DataSet, or
* a pair of (headers, file-like object) if whatever is to be delivered
  is not really a table, or
* a string.

InputData is a DataSet (unfortunately, there currently is no way for a
core to describe the structure it expects in that DataSet); queryMeta
is a common.QueryMeta instance.

Cores must have avInputKeys and avOutputKeys sets.  These contain
strings naming input and output parameters the core supports.  If at
all possible, they should also provide getInputFields and
getOutputFields methods returning record.DataFieldLists.  Several
renderers may depend on these.

In the typical case of a database-based core, you'll usually want to
use the condition descriptor infrastructure (currently in
standardcores, CondDesc and friends, but that's probably going to
move).  A condition descriptor is an object that controls zero or more
data fields, can describe them, and can build SQL queries given values
(usually coming from the docRec of a service's inputTable).
CondDescs must support the following methods:

* getInputFields() -> seq of DataField instances -- the embedding core
  will return the union of all these sequences from its
  getInputFields.
* getQueryFrags(inPars, sqlPars, queryMeta) -> frag -- receives the
  input parameters and a QueryMeta instance and returns a fragment
  suitable for where clauses in SQL statements.  It will add the
  parameters within this statement to sqlPars (which is passed so the
  implementations can check for free keys).

Default Output Filter
+++++++++++++++++++++

With DbBasedCores, the typical case is that the DataSet returned gives
exactly the fields the service ordered.  In that case, when
constructing the SvcResult, the core's DataSet is taken as the
SvcResult's original.  If, however, the field keys are different, a
transformation will be set up based on the service's outputFields.  If
any outputField declared as non-optional is not found in a
coreResult's row, a ValidationError will be raised.

Validation
++++++++++

XXX Standard validation: make a values child for field, yadda

If some code implementing a service can't go on, it should raise some
subclass of gavo.ValidationError with an explanation and the
responsible field name if at all possible.  web.resourcebased.Form
catches such errors during the parsing of the input (i.e.,
interpretation of the form data) and (hopefully) during the actual run
of the service.  Any other errors right now lead to 500s.

In the later stages of processing, fields will usually be present that
are not present in the input form.  If errors corresponding to these
fields are raised, formal won't be able to display the message at the
appropriate field.  As a quick fix, Service provides a
translateFieldName method that takes a field name and tries to map it
to an originating field.  Right now, it uses a table coded in the
resource descriptor (nameMap element) for that, but that's only a
temporary hack.  We should come up with something residing in the
adapters.
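A sketch of the pattern: errors carry the name of the responsible
field so the form renderer can attach the message to the right input.
The ValidationError class and checkMaxrec function here are
hypothetical stand-ins; the exact signature of gavo.ValidationError is
assumed, not quoted:

```python
# Hedged sketch of the ValidationError convention; names are made up.
class ValidationError(Exception):
    def __init__(self, msg, colName=None):
        Exception.__init__(self, msg)
        # colName names the input field responsible for the error
        self.msg, self.colName = msg, colName


def checkMaxrec(inPars):
    """A toy piece of service code validating one input parameter."""
    try:
        maxrec = int(inPars.get("MAXREC", 100))
    except ValueError:
        raise ValidationError("MAXREC must be an integer",
            colName="MAXREC")
    if maxrec < 0:
        raise ValidationError("MAXREC must be non-negative",
            colName="MAXREC")
    return maxrec


try:
    checkMaxrec({"MAXREC": "-5"})
except ValidationError as ex:
    print(ex.colName, "--", ex.msg)  # MAXREC -- MAXREC must be non-negative
```

Because the exception names the field, a renderer catching it can
highlight the offending form input instead of returning a bare 500.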
Macros and processors can, while processing, set an attribute
errorField.  This should contain the name of the argument they are
processing.  If some error occurs, it will be caught in
RowFunction.__call__, which in turn will change the error to a
ValidationError with the appropriate source field.

Templates
+++++++++

We support providing custom XHTML templates for both the query form
and the response table.  To do that, say something like ``