No title

=================== GAVO DaCHS Tutorial =================== .. contents:: :depth: 2 :backlinks: entry :class: toc Ingesting Data ============== Starting the RD --------------- To ingest data, you will have to write a resource descriptor (RD). We recommend to keep everything handled by a specific RD together in on directory that is a direct child of your inputs directory (see installation and configuration), though you could group resources in deeper subdirectories. So, go to your inputs directory and say:: mkdir lmcextinct The directory name will (normally) appear in URLs, so it's a good idea to choose something descriptive and short. This directory is called the resource directory. We recommend to put the RD in the root of this directory. A good default name for the RD is "q.rd"; the "q" will appear in the default URLs as well and usually looks good in there:: cd lmcextinct vi q.rd (where you can substitute vi with your favourite editor, of course). Writing resource descriptors is what most of the operation of a data center is about. Let's start slowly by giving some metadata:: Extinction within the LMC 2009-06-02T08:42:00Z Extinction values in the area of the LMC... Free to use. S. Author Large Magellanic Cloud Interstellar medium, nebulae Extinction You need to adapt the encoding attribute in the prefix to match what you are actually using if you plan on using non-ASCII-characters. You may want to use utf-8 instead of the iso-8859-1 given below depending on your computer's setup. The schema attribute on resource gives the (database) schema tables for this resource will turn up in. You should, in general, use the subdirectory name. If you don't, you have to give the subdirectory name in a resdir attribute. This attribute must be the name of the resource directory relative to the inputs directory specified in the configuration. In general, you should have exactly one RD per database schema. This is not enforced, but sharing schemata between RDs will cause many undesirable behaviours. An example is permissions: When importing a table, the schema access rights are adapted. If you have one RD A defining an ADQL-queriable table in schema X and another RD B that has no ADQL-queriable table, importing A will make schema X readable to untrusted queries, whereas importing B will make it unreadable again; this would lead to query failures (which could, in this case, fixed by adding untrusted to B's readRoles manually, but you get the idea). Otherwise, there is only meta information so far. This metadata is cruicial for later registration of the service. In HTML forms, it is displayed in a sidebar. See also `More on Metadata`_. Once you are here, you should "validate" your RD. This is, in general, a good idea before doing anything with the RD, since it will allow you to more easily catch errors than the in all likelihood even more byzantine error messages that may arise when something goes wrong later. So, say:: gavo val q.rd and read the output. If you don't understand what ``gavo val`` tells you, complain to gavo@ari.uni-heidelberg.de -- the command is really intended to help you catch errors, and if it doesn't do so, it's either a bug in ``gavo val`` or the documentation, and in either case we'd like to fix it. You can also pass an RD id to ``gavo val``, and you can specify more than one RD. Defining Target Tables ---------------------- Within the DC, data is represented in database tables, while metadata is mostly kept within the resource descriptors. A major part of this metadata is the table structure. It is defined in table elements, which usually are direct children of the resoource element. A resource element may contain multiple table definitions. Such a table definition might look like this:: Extinction values within certain areas on the sky.

In a table definition, you must give id, which will double as the table name within the database. The onDisk attribute specifies that the table is to reside on the disk as opposed to in memory (in-memory tables have applications in advanced operations). The adql attribute specifies that no access restrictions are to be placed on the table; if you run an ADQL or TAP service, users can access this table. Table elements may contain metadata. You do not need to repeat metadata given for the resource, because (in most cases) the DC performs metadata inheritance. This means that if a table is asked for a piece of metadata it does not have, it forwards that request to the embedding resource. Defining Columns '''''''''''''''' The main content of table is a sequence of column elements. These contain the definition of a single table column. The name attribute is central in that it will be the column name in the database, the key for the column's value in record dictionaries that the software uses internally, and it is usually used to reference the column from the outside. Column names must be legal identifiers for both python and SQL in DaCHS. SQL quoted identifiers thus are not allowed. The type attribute defaults to real, and can otherwise take values in valid SQL datatypes. The DC software knows how to handle, in addition to real, * text -- a string. You can also use types like char(7) and the like, but since that does not help postgres (or much anything else within the DC), this is not recommended. * double precision (or double) -- a floating point number. You should use in doubles if you need to keep more than about 7 digits of mantissa. * integer (or int) -- typically a 32-bit integer * bigint -- typically a 64-bit integer * smallint -- typically a 16-bit integer * timestamp -- a combination of date and time. While postgres can process a very large range of dates, the DC stores timestamps in datetime.datetime objects, which means that for "astronomical" times (like 10000 B.C. or 10000 A.D. you may need to use custom representations. Also, the DC assumes all times to be without time zones. Further time metadata (like distinguishing TT from UT) is given through STC specifications. * date -- a date. See timestamp. * time -- a time. See timestamp * box -- a rectangle. * spoint, scircle, sbox, spoly -- objects of spherical geometry, taken from pgSphere. Ask for documentation... Some more types (like raw and file) are available to tables in service definitions, but they should, in general, not appear in database tables. Futher metadata on columns includes: * unit -- the unit the column values are in. The syntax is that defined by Vizier, but that may change pending further standardization in the VO. Unit is left out for unitless values. * tablehead -- a very short string designating the content. This string is typically used for display purposes, e.g., as table headings or labels on input fields. * description -- a longer string characterizing the content. This may be in bubble help or VOTable descriptions. Since these could by longer, you may want to put them in a child element rather than an attribute; in both cases, whitespace is normalized, so you can enter line breaks and similar for readability in the source; they will always be rendered as a single blank. For even longer, note-like material, see Notes_. An example for a long descripton:: The aperture is the full-width-half-mean of the response function of our sage 3000 hyper-detector. * ucd -- a Unified Content Descriptor as defined by IVOA. To figure out "good" UCDs, the UCD resolver at http://dc.zah.uni-heidelberg.de/ucds/ui/ui/form can help. * requried -- True if value must be set in order for the record to be valid. By default, NULL (which in python is None) is a valid value for any column. For required columns, that is no longer the case. This is particularly important in connection with foreign keys. Column elements may have a child element `values <./ref.html#element-values>`_. This lets you specify metadata like maximum or minimum, or enumerate possible values. The most common use is the definition of null literals though. This is not necessary for floats, and usually not even strings, because these have useful (and actually non-overridable) null values in the VOTable representation (where this sort of thing counts most). It is, however, highly recommended to give null literals when defining integral types (including chars) that may have null values. DaCHS will try to pick useful null values for those automatically when possible, but when streaming tables, this is impossible, and errors will be raised during VOTable rendering when NULLs are encountered in such a situation. So, just define null values whenever you define a non-required integral column, like this:: After you have imported a table, it is a good idea to run ``gavo info`` with the id of the freshly imported table, e.g.,:: gavo info myres/q#thistable This will output several properties (min, max, avg) of numeric columns that may help spot import errors; it will also say which columns contain NULLs. Use this to mark every column containing integers either ``required="True"`` (to tell other people that no NULLs are possible here) or add an explicit null literal. Everyone will be grateful. Parsing Input Data ------------------ After you have defined the table, you will want to fill it. You will usually have one or more input files with "raw" data. We recommend putting such input data files into a subdirectory of their own named "data". Let's assume we have one input file for the table above, called lmc_extinction_values.txt. Suppose it looks like this, where tabs in the input are shown as "\\t":: RA_min\\tRA_max\\tDEC_min\\tDEC_max\\tE(V-I)\\tA_V\\tA_I 78.910625\\t78.982146\\t-69.557417\\t-69.480639\\t0.04\\t0.092571\\t0.123429 78.910625\\t78.982146\\t-69.480639\\t-69.403861\\t0.05\\t0.115714\\t0.154286 78.910625\\t78.982146\\t-69.403861\\t-69.327083\\t0.05\\t0.115714\\t0.154286 The first step for ingestion is lexical analysis. In the DC software, this is performed by grammars. There are many grammars defined, e.g., for getting values from FITS files, VOTables, or using column-based formats; you can also write specialized grammars in python. All grammars read "something" and emit a mapping from names to (mostly) string values. reGrammars '''''''''' In this case the easiest grammar to use probably is the `reGrammar <./ref.html#element-regrammar>`_. The idea here is that you give two regular expressions to separate the file into records and the records into fields, and that you simply enumerate the names used in the mapping. For the file given above, the RE grammar definition could look like this:: raMin, raMax, decMin, decMax, EVI, AV, AI The names given are values of the name attribute in the table definition. If you checked the documentation on reGrammar, you will have noticed that "names" is an "atomic child" of reGrammar. Atomic children are usually written as attributes, since their values can always be represented as strings. However, if strings become larger, it's more convenient to write them in elements. The DC software allows you to do just that in general: All attributes can be written as elements with tags named like the attribute. So, :: would have worked just fine, as would:: 1 raMin, raMax, decMin, decMax, EVI, AV, AI Structured children, in contrast, cannot be written as plain strings and thus can only be written in element notation. Though grammars can be direct children of resource, they are usually written as children of data elements (see below). columnGrammars '''''''''''''' Another grammar frequently useful when reading from text tables is the `columnGrammar <./ref.html#element-columngrammar>`_. It allows a rather dircect translation of VizieR-like "byte-by-byte"-descriptions. Column grammars define ``col`` elements. Each of these has a ``key`` attribute that gives a name. This could be the ``name`` of a target column in the simplest case, or it can be an auxillary identifier that you process in a rowmaker:: 1-9 10-18 ... The first column has the index 1, and -- contrary to python slices -- the last index is included in the selection. No expansion of tabs or similar is performed. As potential column names, the keys must be valid python identifiers. Mapping data ------------ A grammar produces a sequence of mappings from names to strings, the rawdicts. The database, on the other hand, wants typed values, i.e., integers, time stamps, etc, internally represented as dictionaries mapping column names to values called rowdicts. Also, data in input tables is frequently given in inconvenient formats (e.g., sexagesimal angles), units not suited to further processing, or distributed over multiple columns (e.g., date and time of an observation when we want a single timestamp). It is the job of row makers to transform the rough data coming from a grammar to whatever the table defines. Basically a row maker consists of * `var <./ref.html#element-var>`_ s -- assignments of expression values names in the rawdict. * procedure applications (see `apply <./ref.html#element-apply>`_) -- procedural manipulations of both rawdicts and rowdicts. * maps -- rowdict definition. When building a rowdict for ingestion into the database, a rowmaker first binds var names, then applies procedures and finally performs the mappings. For simple cases, maps will suffice; you may actually even be able to do without them. Maps must specify a dest attribute giving the rowdict key that is defined. To specify the value, the can * either give a src attribute specifying a rawdict key that will then be converted to a typed value using "sane" defaults (e.g., integers will be converted by python's int constructor, where empty strings are mapped to None) * or give a python expression in the character content, the value of which is then directly used as value for dest. No implicit conversions are performed. In python expressions, you can access the data handed over by the grammar as ``vars["key"]``; equivalently, you can use the abbreviation ``@key``. This notion is supported throughout rowmakers where applicable; e.g., you can use it in late bindings of procedure applications. In the case above, you could start by saying:: to copy over the rawdict (grammar) keys that directly map to table column names. Since this is a bit unwieldy, the DC provides a shortcut:: EVI:EVI,AV:AV,AI:AI which expands to exactly what is written above. The keys in each pair do not need to be identical; the first item of each pair is the table column name, the second the rawdict key. The case where the names of rawdict and rowdict keys are identical is so common (since the RD author controls both) that there is yet another shortcut for this:: EVI,AV,AI Idmaps sets up one map element each with both dest and src set to the value for every name in the comma separated list idmaps. You can abbreviate this further to:: idmaps values can contain shell patterns. They will be matched to the column names in the target table. For every column for which there is no explicit mapping, an identity mapping (with type conversion) will be set up. This leaves the bbox, centerAlpha, and centerDelta keys to be defined. No literals for those appear in the rawdicts since they are not part of the input data. We need to compute them. To facilitate computations, we first turn the bounds to floats; this can be done using vars:: float(@raMin) float(@raMax) float(@decMin) float(@decMax) No shortcut is available here, since this is a relatively rare thing. You could use procDef/apply to save on keystrokes if you find yourself having to do such simple conversions more frequently. Note the @-notation. As mentioned above, you could equivalently have written ``vars["raMin"]``. Both spellings evaluate to the value of the given name in the rawdict coming from the grammar. The remaining computations can be performed in mappings:: (@raMin+@raMax)/2. (@decMin+@decMax)/2. coords.Box((@raMin, @decMin), (@raMax, @decMax)) ``coords.Box`` is the internal type for SQL Box values; you will not usually see those. Still, you can access basically the whole DC code in this mapping. At some point we will define an API of "safe operations" that you can use without having to fear changes in the DC code. `Some functions useful for such mappings <./ref.html#functions-available-for-row-makers>`_ are listed in the reference manual. Of course, you can have values that do not even depend on grammar output:: datetime.datetime.now() Null values are always troublesome. Within DaCHS, the null value (almost) always is python's None. There is the rowmaker function ``parseWithNull`` to help you come up with those; say, some joker used 99.99 as a null value for a magnitude, you could say:: parseWithNull(@VmagSrc, float, "99.99") If you need to scale this (or if null values are chosen that they are invalid literals to begin with), a feature that lets you null out a value when an specific type of exception is raised comes in handy. This is map's ``nulExcs`` attribute, which is just a comma separated list of exceptions that should be caught and interpreted as "this is null". If, in the example above, the source would give the magnitude in millimags to save a comma, you could use:: parseWithNull(@VmagSrc, float, "99999")/1000. If parseWithNull here returns None, a TypeError will be raised and caught, and Vmag will be None. You can turn more than one exception into None. For example example, if both magicOffset has been parsed before and could be None, while magicLit is to be parsed and has the empty string as a Null literal, you could write:: @magicOffset+float(@magicLit) If magicOffset is None, magic will be None via the TypeError, whereas empty magicLits will result in Nones via a ValueError. Data elements ------------- We now have a table definition, a grammar, and a rowmaker. For purposes of importing, these three come together in a data element. These elements define what could be seen as the equivalent of a VOTable resource together with a recipe of how to build it. For onDisk tables, a side effect of building the data is that the tables are created in the database; in that sense, data elements also define operations, a notion that will become more pronounced as we discuss incremental processing. Let us assemble the pieces we have so far:: Extinction within the LMC 2009-06-02T08:42:00Z Extinction values in the area of the LMC... Free to use. S. Author Large Magellanic Cloud Interstellar medium, nebulae Extinction Extinction values within certain areas on the sky.

raMin, raMax, decMin, decMax, ev_i, a_v, a_i float(raMin) float(raMax) float(decMin) float(decMax) (raMin+raMax)/2. (decMin+decMax)/2. coords.Box((raMin, decMin), (raMax, decMax)) There are two new elements in data. For one, there's sources. Sources specify where the data will find its input files in its pattern attribute. This contains shell patterns that are interpreted relative to the resource directory. You can give multiple patterns if necessary like this:: inp2/*.txt inp1/*.txt There also is a recurse boolean attribute you can use when your sources are distributed over subdirectories of the path part of the pattern. The second new element is ``make``. It ties together a destination table an the rowmaker using id references. You may want to define the rowmaker as a direct child of make, which saves you some referencing. Though make looks quite inoccuous here, it is the element that drives the action. You can have multiple make elements in a single data element to build multiple tables (using different row makers) from the same grammar output. Makes can also carry scripts in SQL or python. For details, see `Scripting <./ref.html#scripting>`_. As you can see, we have put the grammar and the rowmaker into a data element. They could also be direct children of resource, which might be a good idea if they are used in more than one data; you would then give the rowmaker an id (make_table, say) and say something like ```_ element. It is a child of table. In general, index specifications can be rather involved, but simple cases remain simple. If you just wanted to define an index on EVI, you could say:: ... (the columns attribute would be "A_V,EVI" if you wanted an index on both columns). However, indices are not always that simple. For example, for a spatial index on centerAlpha, centerDelta, with the q3c scheme used by the DC software you would have to write something like:: q3c_ang2ipix(centerAlpha,centerDelta) The DC software has a mechanism that helps in this case: `Mixins <./ref.html#mixins>`_. A mixin conceptually is a guarantee of certain table properties, typically of the presence of certain columns; here, it is just the presence of an index. So, all you need to do to have a spatial index on the table is::

... This is UCD magic at work -- q3cindex selects the columns with pos.eq.*;meta.main as index columns. If you are curious how it does this, check scs.rd in the system RD directory. Mixins actually do much more than just help with indexing. Their main purpose is the definition of interfaces that can be relied upon. For example, an image table must have a certain structure determined by the SIA protocol. The mixins ``//siap#pgs`` and ``//siap#bbox`` make sure that tables have this structure, and they make sure that the table containing information on all the files in the data center is updated when the table is filled. Starting the Ingestion ---------------------- At this point, you can run the ingestion:: gavo imp q By default, ``gavo imp`` creates all data defined in a resource. If this is not what you want, you can explicitely specify a data id to process:: gavo imp q content For larger data sets, it may be wise to first try a couple of rows:: gavo imp --stop-after=300 q Try ``gavo imp --help`` to see more options (most of which are probably irrelevant to you now. By the way, the ``gavo`` command has lots of subcommands. The subcommand here has the full name ``import``; you could have said ``gavo import`` or even ``gavo im``, since any unique prefix into the command list is ok. Try ``gavo --help`` to see the commands available. Note that gavo imp interprets the RD argument as a file first and then as an RD id. An RD id is the inputs-relative path of the RD with the extension stripped. Our example RD thus has the RD id lmcextinct/q, and you could have said:: gavo imp lmcextinct/q from anywhere in the file system. Images ------ Images have relatively rich metadata. Partly these are covered in a mixin called "products", but astronomical images have even more metadata, like position on the sky or bandpass. To cope with them, use the `pgsSIAP `_ (and ignore the old bbox-based SIAP mixin). To define a table that carries images, simply mix in the appropriate mixin::

(of course, you can add more columns if you need them). Filling this table requires the use of a rowfilter and two procedure applications. Let's look at a data element for this table:: PI_COI "cars.images" vars["imageTitle"] "%s, %s"%(vars["OBSERVAT"], vars["TELESCOP"]) vars["dateObs"]+vars["startTime"]+( vars["endTime"]-vars["startTime"])/2 vars["FILTER"] So, step-by-step: * When ingesting images, you will almost always read from FITS images, i.e., FITS primary headers. A ``fitsProdGrammar`` delivers the key-value-pairs from a header as a rawdict. * The ``qnd`` attribute of the grammar is recommended. It makes some (weak) assumptions to yield significant speedups with large images. * The ``fitsProdGrammar`` will map keys with hyphens to names with underscores, which is required to make them accessible in rowmakers. The ``map``` example above therefore is superfluous since it orders default behaviour. You may need other (non-automatic) name mappings, though, which would work analoguously. * The grammar further needs a rowfilter. These are procedure applications working on rawdicts. The //products#define rowfilter lets you add keys on owners and embargo in case you want password protection for images, but most importantly it defines what table the data is destined for. This is crucial information, and if you ever get it wrong, you need to manually connect to the database and issue a command like ``DELETE FROM products WHERE sourcetable=''``. So, always bind table. Make sure to include the quotes, this is supposed to be a valid python expression yielding a string. * You then need to define a rowmaker that must apply two procs. For one, you need `computePGSSIAP `_ (if you mixed in ``pgsSIAP``). No bindings are required here. * The second proc application required is `setSIAPMeta `_ . Try to give all its keys somewhat sensible values, you will make your users' lives much easier. Warning: Do *not* use idmaps="*" with SIAP, since the auto-generated mappings will clobber the work of the xSIAP procs. Debugging --------- If nothing else helps you can watch what the software actually sends to the database. To do that, set the GAVO_SQL_DEBUG environment variable to any value. This could look like this:: env GAVO_SQL_DEBUG=1 gavo imp q create The first couple of requests are for internal use (like checking that some meta tables are present). Publishing Data =============== Once a table is in the database, it needs to get out again. Within DaCHS, there are three parties involved in delivering data to the user: * The core; it actually does the computation * The renderer; it formats the result in some way requested by the user and delivers it. There are renderers for web forms, VO protocols, imges, etc. * The service; it holds together the core and the renderer, can reformat core results, controls the metadata, etc. You will usually use pre-specified renderers, so these are not defined in resource descriptors. What you have to define are cores and services. For core, you will usually use the `dbCore <./ref.html#element-dbcore>`_ in custom services, though `many other cores <./ref.html#cores-available>`_ are predefined -- e.g., to run ADQL queries, to upload files, or to do feedback queries --, and you can `define your own <./ref.html#writing-custom-cores>`_ when you need special functionality. The dbCore generates a (single-table) query from condition descriptors and returns a table that you describe through an output table. Cores are defined as direct children of the resource. For the lmcextinction table above, it could look like this:: Cores always need an id. dbCores need a queriedTable attribute, the value of which must be a table reference. This is the table the query will run against. CondDescs define input fields (for the form renderer, these are actually form items people can fill in). Most commonly, you will either define them using the ``original`` attribute or using ``buildFrom``. The first case is typically used in connection with protocols and on tables having mixins; such condDescs result in zero or more input fields, and they typically inspect the queried table. For example, the humanScs core in the example locates the "main" positions as identified by UCDs and generates queries against them using two input fields, one it tries to guess a position from, and another for the search radius. When you define your condDesc using buildFrom, the result is almost always a single input field that allows posing restrictions against the column referred to in the buildFrom attribute, which in turn usually is the name of a column in the table queried (though you could use any field using id-based referencing). The software tries to make some useful input definition from that column, which in particular means that the types are "up-valued". String columns can be queried against using Vizier-like string expressions, real and double precision columns using Vizier-like float expressions, and so on. You can suppress that behaviour using more verbose forms explained elsewhere. Renderers other than form will expose the input fields in some other way than form items. In all cases, however, the condDescs of the dbCore define what fields can be queried. The service now ties the core together with a renderer. It might look like this:: lmcext_web While a service can run without a ``shortName``, it can lead to trouble later, so you should make a habit of assigning short names. See `the data checklist <./data_checklist.html>`_ for more information on short names. A service must have an id as well, and its core attribute must contain the id of a core. With this minimal specification, the service exposes a web form-based interface. To try this, run a server:: gavo serve debug and point a browser to http://localhost:8080/lmcextinct/q/cone/form (the host part, of course, depends on your configuration. If you did not change anything there, you should find the data at the given URL). More on Tables ============== Notes ----- Frequently, you need to say more about a column than is appropriate in the few-phrase description. Historically, such situations have been handled using notes. Since notes can be reused for multiple columns, we chose to follow that precedent rather than attach longish information onto the columns themselves. The notes themselves are kept in meta elements belonging to tables. Since the notes tend to be markup-heavy, their default format is restructured text. When entering notes in RDs, there is an attribute ``tag`` on these meta items:: ... The meaning of the flag is as follows: ===== ========== value meaning ===== ========== 1 value is 2 2 value is 1 ===== ========== ...

To assoicate a column with a note, use the column's note attribute:: As tag, you may use basically any string, but it's a good idea to keep it to numbers or at least characters not requiring URL encoding. The notes will exposed in HTML table heads, table and service description, etc. If you need to link to one, there is the built-in tablenote renderer that takes the table and the note from its query path. The most convenient way to is it is through the built-in vanity name tablenote, where you would access the note above using a URL like ``http://your.server/tablenote/demoschema.demo/1``. STC --- As soon as you have coordinates, you will want to define coordinate systems on them. In the introductory example, that was not necessary because SCS mandates that the coordinates you export are in ICRS, so either your coordinates are in ICRS or you are violating the SCS protocol -- in either case, nothing to declare. In the more general case, you will want to say what is what in your tables. DaCHS uses a language called STC-S to declare systems, reference points, etc. The STC-S description [TODO: Link to IVOA note] is a bit terse, but the good news is that you will get by with a few features most of the time. STC is defined in children of table elements, with references to table columns in quoted strings:: Position ICRS "ra" "dec" Error "e_ra" "e_dec" Position FK4 J1950.0 "ra_orig" "dec_orig" You do not need to change anything in the column definitions themselves, since the machinery will resolve your column references. If you refer to non-existing columns, RD parse errors will be thrown. More on Grammars ================ Row Generators -------------- TBD Source Fields ------------- Grammars can have a sourceFields element. It contains a standard procedure definition (i.e., you could predefine those and bind parameters), but usually you will just fill in the code. This code is called once for each source processed, and receives the sourceToken as argument. It must return a dictionary, the key/value pairs of which will be added to all rows returned by the row iterator. The purpose of sourceFields is to precompute values that depend on the source ("file") and are constant for all rows within it. An example for where you need this is when you want to create backlinks to the file a piece of data came from::


        srcKey = utils.getRelativePath(sourceToken,
            base.getConfig("inputsDir"))
        return locals()

You can then retrieve the path to the source file via srcKey key in rawdicts (and then, using render functions and static renderers, turn this into links). In addition to the sourceToken, you also have access to the data that will be fed from the grammar. This can be used to, e.g., retrieve the resource directory (``data.dd.rd.resdir``) or data descriptor properties (``data.dd.getProperty("whatever")``). Sometimes you want to do database queries from within sourceFields. This is tricky when you access the table being written or otherwise being accessed. This is because sourceTokens run in the midst of a transaction updating the table. So, something like::

 
    
    base.SimpleQuerier().query(...)

will wait for the transaction to finish. But the transaction is waiting for data that will only come when the query finishes -- this is a deadlock, and gavo imp will just sit there and wait (see also `Deadlocks `_). To get around this, you need to query using the data's connection. So, instead write::


    base.SimpleQuerier(connection=data.connection).query(...)

More on Services ================ Custom Templates ---------------- Within the data center, most pages are generated from templates [XXX TODO: write something about them generically]. This is true for the pages the form renderer on services displays as well. To effect special effects, you may want to override them (though in general, it is a much better idea to work within the standard template since that will give your service all kind of automatic updates and would make, e.g., changes much easier if your institution undergoes the yearly reorganization). The default response template can be found in resources/templates/defaultsresponse.html in the installed tree. To obtain the plainest output conceivable, try:: No title

Save this to a file within the resource directory, let's say "res/plain.html". Then, say:: in your service; this should do give you a minimally decorated page. Of course, this will display a severely degraded page. To get at least the standard style sheet and the standard javascript, say:: instead of the plain head. More on Cores ============= CondDescs --------- dbCores and cores derived from them take most of their power from condition descriptors or CondDescs. These combine inputKeys, which are basically column objects with some additional presentation-related information, with code generating SQL conditions. A condDesc can contain zero or more input keys (though having zero input keys makes no sense for user-defined condDescs since they would never "fire"). Having more than one input key is useful when input quantities can only be interpreted when present as a group. An example is the standard cone search, where you need both a position and a search radius. Automatic and manual control '''''''''''''''''''''''''''' However, most condDescs correspond to one input key, and the input key is mostly derived from a table column. This is the standard idiom, :: where somecol is a column in the table queried by the core. This construct will cause the an input key to be built from somecol. While doing this, the type will be mapped automatically. The primary rules are: * Numeric types will get mapped to numeric vizier-like expressions * Datetimes will get mapped to date vizier-like expressions * text and chars will get mapped to string vizier-like expressions * enumerated values (i.e., columns with value elements giving options) will not become vizier-like expressions but input keys that yield selection widgets. To have more control (e.g., if you do not want to allow vizier-like expressions, give the input key yourself):: (which would make a column required in the table optional in the query), or:: (which creates an input key matching everything literally), or even:: -- if the input key is required, queries not giving it will be rejected. The title attribute on option gives the label of an option in the HTML input widget; if it's missing, a string representation of the value will be used. In all those cases, the SQL generated from the condDesc is a conjunction of the input key's individual SQL expressions. Those, in turn, are simply comparisons for equality for plain types and more or less arbitrary expressions for vizier expression types. Incidentally, two properties on inputKeys are defined to only show inputs for certain renderers, viz., ``onlyForRenderer`` and ``notForRenderer``. Both have single strings as values. This is intended mainly for cases like SIAP and SCS where there are "human-oriented" versions of the input fields available. The built-in SCS and SIAP conditions already to that, so you can give both scs and humanSCS conditions in a core. Here is how you would define an input key that is only used for the form renderer:: form Phrase makers ''''''''''''' For complete control over what SQL is generated, condDescs may contain code called a phrase maker. This, again, is a procedure application, quite like with rowmaker procs, except that the signature of condDesc code is different. Phrase maker code has the following names available: * inputKeys -- the list of input keys for the parent CondDesc * inPars -- a dictionary mapping inputKey names to the values provided by the user * outPars -- a dictionary that is later used as the parameter dictionary to the query. The code should amend the outPars dictionary with the keys mentioned in the the conditions. The conditions themselves are yielded. So, a very simple condDesc with generated SQL could look like this::


        outPars["xxyy"] = "x"*inPars.get("val", 20)
        yield "someColumn=%(xxyy)s"

However, using fixed names in outPars is not recommended, if only because condDescs could be used multiple times. The recommended way uses the vizierexprs.getSQLKey function. It takes a name, a value, and the outPars dictionary. It will return a key unique to the query in question and enter the value into the outPars dictionary under that key. While that sounds complicated, it is actually rather harmless, as shown in the following real-world example that lets users input date, time and an interval in split-up form (e.g., when you cannot hope anyone will try to write the equivalent vizier-like expressions)::


          baseTS = datetime.datetime.combine(inPars["date"], inPars["time"])
          dt = datetime.timedelta(minutes=inPars["within"])
          yield "date BETWEEN %%(%s)s AND %%(%s)s"%(
            vizierexprs.getSQLKey("date", baseTS-dt, outPars),
            vizierexprs.getSQLKey("date", baseTS+dt, outPars))

More on Metadata ================ In general, most metadata for services and resources rather closely follows what's defined in `Resource Metadata for the Virtual Observatory`_; see also the `Reference Manual on RMI-style metadata`_. Coverage -------- One tricky spot is coverage, i.e., the parts of the STC space covered by what's in the resource. In general, you will define coverage more or less like this:: AllSky ICRS Optical The easy part is the waveband. Values here are from a fixed set of strings, viz., Radio, Millimeter, Infrared, Optical, UV, EUV, X-ray, Gamma-ray; capitalization is important, and you may give multiple elements (the software doesn't enforce this selection, but your registry documents will become invalid if you use anything else). The coverage.profile meta item has STC-S strings as values. See the `STC-S Note`_ as well as the `STC library documentation`_ for more information on the STC-S understood by DaCHS. In principle, you can get fancy here; for example, you could write:: TimeInterval TT BARYCENTER 1999-10-01T20:30:00 1999-10-02T20:30:10 unit s Error 10 Resolution 1 2 Circle FK5 J1980.0 GEOCENTER 0.13 0.45 0.03 unit rad PixSize 0.0001 0.0001 SpectralInterval HELIOCENTER 2000 6000 unit Angstrom Error 1 RedshiftInterval TOPOCENTER VELOCITY RELATIVISTIC -10 10 unit km/s However, the registries probably evaluate not very much of this information as yet, and you most certainly should try to give positions in ICRS. Copyright --------- Within the astronomical community, licensing issues have traditionally played a minor role – if you referenced properly, using data from other people was not only ok, it was encouraged. We should keep it that way, even in the days of easy reproducability. Still, formal statements about how your data may be used may be useful. These statements are called licenses. RMI has the copyright meta for this purpose. Right now, DaCHS doesn't do much with this information; it includes it in VOResource records, and the default response template shows it below the query form. We recommend either specifying something like "The data is in the public domain" or, if you want to use something that's more in line with scientific habits, the `Creative Commons Attribution`_ ("CC-BY"). To support this, DaCHS includes a macro that can be used in meta elements that are direct children of the resource element. Use it like this:: \RSTccby{Image metadata} Usage conditions for individual images could differ. See the COPYING FITS header. The advantage of using the macro is that you get a nice image, and in the future we may expand this to a formal, machine-readable declaration. .. _Creative Commons Attribution: http://creativecommons.org/licenses/by/3.0/ .. _Reference Manual on RMI-style metadata: ./ref.html#rmi-style-metadata .. _STC library documentation: ./stc.html .. _STC-S Note: http://www.ivoa.net/Documents/Notes/STC-S/ Active Tags =========== Active "tags" delemit elements within resource descriptor XML that do not directly contribute to result tree. Their typical use is to "record" event sequences and replay them later. Much of this is used internally. However, some applications of active tags are interesting for RD writers, too. Active tags always have names in all upper-case. LOOP ---- Loop lets you create multiple elements by rules. The simplest way to use it is by giving a space-separated list of "items":: The ``events`` child of the ``LOOP`` element creates a list of events (think "begin column element", "value for name attribute", "end column element"). These events are then replayed to the parser for each item in the LOOP's ``listItems`` attribute. Each occurrence of the ``\\item`` macro is replaced with the current item. So, in the resulting RD tree, the fragment above will have the same result as:: Sometimes the list items are used in multiple places in the same document. To avoid having to maintain multiple lists, you can define macros using RD's ``macDef`` element; this could look like this:: U B V R

parseFromString(MAG_\item) .... Note that macro names must be at least two characters long. Frequently, the loop variable should not just take on a single string. For such cases you can feed in tuples. The most convenient way to do this is ``csvItems``. The content of this element is a string literal containing comma separated values *with labels*, i.e., parsable with python's csv.DictReader. In your events, you can then refer to the labeled items using macros. For example:: band,source U 10-12 V 13-16

\source TODO: EDIT actives? Publishing DAL Services ======================= DAL is VO-speak for "Data Access Layer", the standard protocols the VO uses to allow remote querying of data. To support such a protocol, you usually need to arrange things in three places: * The table queried needs a certain set of columns * The core must support certain input and output fields * The renderer must exhibit specified behaviour as regards, e.g., the formatting of error messages, and it may require protocol-specific metadata This section discusses the individual protocols in turn. SCS --- SCS, the simple cone search, is the simplest IVOA DAL protocol -- it is just HTTP with RA, DEC, and SR parameters plus a special way to encode errors (in a way somewhat different from what has been specified for later DAL protocols). Tables '''''' In principle, SCS can expose any table that has a exactly one column each with the UCDs pos.eq.ra;meta.main and pos.eq.dec;meta.main. The query is then ran against the position specified in this way. However, you almost always want to have a spatial index on these columns. To do that, use the ``//scs#q3cindex`` mixin on the tables, like this:: ... Cores ''''' The SCS core simply is a dbCore. You must include the SCS condDesc, like this:: There is an alternative condDesc more suitable for humans. They can be used in parallel. The form renderer will then use the human-oriented one, the DAL renderer the protocol one. Thus, you will ususally write:: The example also shows how to add custom query field. If you want to add a larger number of them, you would use an active tag:: Service ''''''' To expose that core through a service, just allow the scs.xml renderer on it. As the core is built, you can have a web-based form interface for free:: Nice Catalog Cone Search NC Cone 10 10 0.01 The meta information given is used when generating registration records. In particular, you should make sure that a query with the given ra, dec, and sr actually returns some data. SIAP ---- DaCHS' SIAP implemention right now assumes you are publishing FITS files with WCS headers. Other arrangements are of course possible, but you'd have to write your own computeXXX procDef. Tables '''''' SIAP-capable tables should mix in ``//siap#pgs`` (the older ``//siap#bbox`` is deprecated; you could still use it if for some reason you have no pgSphere). When building them, use the ``//siap#computePGS`` and ``//siap#setMeta`` applys. Since SIAP tables contain products, you also need the ``//products#define`` row filter in the input grammar (which, of course, needs to be a fitsProdGrammar. Cores ''''' TBD. For the SIAP cutout core, the SIAP human condDesc must have ``required`` True, since the core will retrieve the default cutout size from the field size. The SIAP protocol condDesc is required anyway. Service ''''''' TBD. SSAP ---- Tables '''''' Currently, we only support "homogeneous" data collections, i.e., tables for which every data set was generated by the same instrument, code, or similar. Those mix in ``//ssap#hcd``. This mixin has lots of parameters that define the instrument; see `the SSAP HCD mixin in the ref doc <./ref.html#the-ssap-hcd-mixin>`_. For example, you could say::

//ssap#hcd

To fill such a table, it is recommended to use the ``//products#define`` rowfilter and the ``//ssap#setMeta`` rowmaker apply. This could look like this:: "\schema.data" @FILENAME "ivo://org.gavo.dc/ccd700/q#"+@FILENAME Caution: In the ssa table, we force the spectral axis to be a wavelength in meters. You must convert all values manually if necessary. For the spectra themselves you could use different units, but in our experience that's more confusing than helpful. In contrast to images where delivering FITS is likely all you need, there's a plethora of formats spectra are delivered in. To help a bit, you should make sure one of the formats you offer are VOTables conforming to the spectral data model (see `Making SDM Tables`_). If you want to deliver the "native" format as well, you'll have to have two rows for each spectrum. The standard way to achieve that is through a rowmaker in the grammar importing the spectra, like this::


       baseAccref = os.path.splitext(row["prodtblPath"])[0]
       row["prodtblAccref"] = baseAccref+".txt"
       row["prodtblMime"] = "text/plain"
       # this is the file as delivered from upstream
       yield row
       row["prodtblAccref"] = baseAccref+".vot"
       row["prodtblPath"] = "dcc://\rdIdDotted/mksdm?"+baseAccref+".txt"
       row["prodtblMime"] = "application/x-votable+xml"
       # this is our processed SDM VOTable
       yield row

SSAP's FORMAT parameter lets clients select what they want. The way the default FORMAT argument works, only application/x-votable+xml records are considered compliant. Cores ''''' Use the ssapCore for SSAP services. You must manually feed in the condition descriptors for the SSAP parameters. For homogeneous data collections, this is:: The ``hcd_condDescs`` includes condition descriptors for all mandatory and optional parameters meaningful in the case of homogeneous data collections (i.e., excluding those that match against constant values). Some of them may not be relevant to your service because your table never has values for them. For example, theoretical spectra will typically not give information on positions. The SSAP spec says that such a service should ignore POS rather than returning the empty set. If you think you must ignore certain conditions, you can use the PRUNE active tag. This looks like this:: Do not do this just because you don't have position information -- this would mean that you would dump your complete archive for (typical) queries with a position, and that is neither required by the spec (even if you might think so at first reading) nor desirable. Here is a table of parameter names and ids; you can always check them in ``$gavo_installed/resources/inputs/__system__/ssap.rd``. ============== =========== Parameter name condDesc id -------------- ----------- POS, SIZE coneCond BAND bandCond TIME timeCond ============== =========== For APERTURE, SNR, REDSHIFT, TARGETNAME, TARGETCLASS, PUBDID, CREATORDID, and MTIME, the condDesc id simply is ``_cond``, e.g., ``APERTURE_cond``. To have custom parameters, simply add condDesc elements as usual:: For SSAP cores, ``buildFrom`` will enable "PQL"-like query syntax such that users can post arguments like ``20000/30000,35000`` to ``t_eff``. Service ''''''' To expose SSAP services, use the `ssap.xml renderer`_. The metadata keys required for registration of these are documented in the reference manual. A complete declaration of a published SSAP service would then look like this:: mydata SSAP theory archival MAXREC=1 This service will expose all standard SSAP query parameters, and additionally condDescs built from the ``t_eff`` and ``log_g`` columns in the source table (see above). Incidentally, in web versions of such services, you may want to have specview-based "quick-view" links based on the ``run`` system rd that exposes the specview template. Here's an example of an ``outputTable`` (that would reside in the service element):: Some less cody approach would be welcome, but we'd need to collect some experience what people expect there. Also note that specview is (or possibly was, when you're reading this) very picky in what it accepts as VOTables; in the example, the ``dm=sed`` parameter is used to instruct DaCHS' SDM-making machinery to come up with a table palatable by current specviews. .. _ssap.xml renderer: ./ref.html#the-ssap-xml-renderer Making SDM Tables ''''''''''''''''' Compared to images, the formats situation with spectra is a mess. Therefore, in all likelihood, you will need some sort of conversion service to VOTables compliant to the spectral data model. DaCHS has a facility built in to support you with this. First, you will have to define the "instance table", i.e., a table definition that will contain a DC-internal representation of the spectrum according to the data model. There's a mixin for that:: //ssap#sdm-instance

In addition to adding lots and lots of params, the mixin also defines two columns, ``spectral`` and ``flux``; these have units and ucds as taken from the SSA metadata. You can add additional columns (e.g., a flux error depending the the spectral coordinate) as requried. The actual spectral instances get built by sdmCores. These cores, while potentially useful with common services, are intended to be used by the product renderer for dcc product table paths. They contain a data item that must yield a primary table that is basically sdm compliant. Most of this is done by the //ssap#feedSSAToSDM apply proc, but obviously you need to yield the spectral/flux pairs (plus potentially more stuff like errors, etc, if your spectrum table has more columns. This comes from the data item's grammar, which probably must always be an embedded grammar, since its sourceToken is an SSA row in a dictionary. Here's an example::


            labels = ("spectral", "flux")
            relPath = self.sourceToken["accref"].split("?")[-1]
            with self.grammar.rd.openRes(relPath) as inF:
              for ln in inF:
                yield dict(zip(labels,ln.split()))

The sdmCores are always combined with the sdm renderer. It passes an accref into the core that gets turned into an row from queried table; this must be an "ssa" table (i.e., right now something that mixes in ``//ssap#hcd``). This row is the input to the embedded data descriptor. Hence, this has no sources element, and you must have either a custom or embedded grammar to deal with this input. The actual data have to be located in the grammar; if they are in a text file, you could have a grammar for parsing those somewhere in the RD (TODO: example), or you could have the actual spectral data in the database. Whatever – the grammar has to return spectral and flux values. Also make sure that what you are return actually has the units promised by the metadata. To set the params from the ssa row, use the ``//ssap#feedSSAToSDM`` apply procDef in a ''parmaker''; this should mostly suffice in terms of metadata definition. When you have no additional columns, the default rowmaker (with ``idmaps="*"``) will do in the ``make`` of the spectrum table. ObsTAP ------ ObsTAP is basically a single table, ivoa.ObsCore. In DaCHS, this is a view generated from input tables. To include the products within a table, you must use one of the mixins from the //obscore RD and fill out some of the mixin's parameters. There is some documentation on what to put where in the mixin documentation, but frankly, as a publisher, you should have at least passing knowledge of the obscore data model. See the corresponding IVOA document [XXX TODO: add link when there's a WD out]. In the simplest case, a SIAP table, you could get by simply adding:: mixin="//obscore#publishSIAP" to the table definition's start tag. You do not have to re-import a table to publish it to obscore after the fact – ``gavo imp -m && gavo imp //obscore create`` will include an existing table to the obscore view. Even for SIAP, you will usually want to add metadata not contained in DaCHS' SIAP meta. To do this, add a mixin element to the table definition's body:: //obscore#publishSIAP On a table import, the obscore table will automatically be recreated to include the data. If you retrofit ObsCore support to large tables, you can avoid having to re-import everything by adding the mixin clause and then updating the metadata. In that case, you must manually remake the obscore table:: gavo imp -m path/to/my/rd gavo imp //obscore create Publishing DaCHS-managed tables via TAP --------------------------------------- In the simplest form, all you need to do to publish a table through the TAP endpoint is to add an ``adql="True"`` attribute to the table definition and update the metadata (by saying ``gavo imp -m ``). You should, however, take particular care that there's a useful description of the table, usually as a direct meta on the table. Keep in mind that people will stumble across the table in some sort of registry and need to be able to figure out whether the table contains useful data by that description and the column metadata alone. The TAP endpoint only exposes rather limited metadata. At least when there is no published service on the table, you may want to just publish the data to the registry, too. This leads to a much richer set of metadata, increasing people's chances to able to locate the data. To publish a nonservice (usually a table definition, but you can register data descriptors containing multiple tables, too), use the `register Element <./ref.html#element-register>`_ . For a simple table, just wringing ```` is enough, since the set name defaults to ``ivo_managed`` and ADQL-accessible tables are automatically related to the TAP services. When ``register`` is the child of a data item, you need to manually declare that child tables are TAP-accessible, like this:: Another thing you might want to do when publishing tables to TAP is add sample queries for them. As an extension to the usual tap_schema, DaCHS has an example table giving a name, a query and a description. TAP clients may exploit these examples to help users figure out what to usefully do with more arcane tables, and of course you can explain more interesting features of your server or data here. To add an example, create a file with a name ending in ``.sample`` in ``$GAVO_INPUTS/__system/adqlsamples/``. The grammar for theses files is defined in ``//tap#import_examples``. You write the three keys, viz., name, query and description, in this sequence, each followed by a double colon and any material you want in the field. The keys must start at the beginning of a line. You must add a double period at the and of the file, and it's one file per example. Here's what this should look like:: name::katkat bibliography query:: select * from katkat.katkat where gavo_hasword('variable', source) and minEpoch<1900 description::To search for title (or other) words in katkat's source field in some sort of bibliographic query, use the gavo_hasword locally defined function. This basically works a bit like you'd expect from search engines: case-insensitive, and oblivious to any context. .. After adding an example, run ``gavo imp //tap import_examples`` to update the database table. Publishing existing tables via TAP ---------------------------------- If you already have a database table and now want to use DaCHS to publish it via TAP, just write an RD as described above, except that the data element is trivial. Here's an example of how that could look like:: My great table ... (more metadata) id of object covered here

Then, say ``gavo imp -m ``; make sure you don't forget the ``-m``, because without it, ``gavo imp`` will drop the existing if it can, i.e., if gavoadmin has write access to the schema in question, and it should have that for reasons explained in the next paragraph. This adds the metadata you've given to all kinds of administrative tables DaCHS keeps but does not touch the data. It will also try to fix the permissions of the table such that DaCHS's untrusted user can read it. To let DaCHS manage the permissions, in psql say (assuming standard profiles):: GRANT ALL PRIVILEGES ON SCHEMA TO gavoadmin WITH GRANT OPTION; GRANT SELECT ON . TO gavoadmin WITH GRANT OPTION; If you have local users accessing the table, you should declare them in either the allRoles or readRoles attributes to the table definiton. Maybe even adapting the profiles in GAVOROOT/etc to match your existing infrastructure could make sense. The Registry Interface ====================== Introduction ------------ Conceptually, the VO's Registry is a set of resource records (i.e., descriptions of services, data, or other entities) to let users locate resources relevant to them (e.g., look for a service giving surface temperatures for OB stars). Whatever as a resource record is called *VO resource* in the following to keep them apart from whatever DaCHS resource descriptors describe; DaCHS RDs may descibe zero, one, or multiple VO resources. We apologize for the confused nomenclature. Physically, there are several services that keep and update this set and let people query them (a "full registry"), e.g., the `VAO registry`_, the `ESAVO registry`_, or the Astrogrid registry. All these should harvest each other and thus have identical content (this is currently not always true). To be part of the VO, you have to register your services. DaCHS makes this fairly easy since it contains a publishing registry. This is again a service that exposes a standard interface defined by the Open Archives Initiatives. There is a renderer for the OAI harvesting protocol (`OAI-PMH`_) called ``pubreg.xml`` that goes together with ``registryCore``. The service ``//services#registry`` with this renderer has a vanity name of ``/oai.xml``, which is you data center's publishing registry "endpoint". Full registries obtain the resource records present on your data center for there. Each VO resource has a unique identifier of the form:: ivo:/// -- is defined by the DaCHS software (to be ``/ is a globally unique string. It is recommended that you use your DNS name (or some appropriate part of it), which will provide some uniqueness. The authority is declared in your gavorc (see below). Details on VO identifiers can be found in `IVOA Identifiers`_. To claim an authority, you have to define who you -- as an organization -- are. For this, DaCHS will create a resource record for your organization, too, where "your organization" for DaCHS means whatever you give as creator.name in defaultmeta (see below), which in general should be something like "My Institute Data Center" rather than "My Institute". You can register "My Institute" as well, if you want, but, the way things are written now, not as the entity running managing the authority. To make the VO aware of the existence of your data center, you will need to tell the `RofR`_ (Registry of Registries) about your data center. Before you can do this, you need to fill in quite a bit of information in gavorc. The next section explains how this is done. .. _ESAVO registry: http://esavo.esa.int/registry/index.jsp .. _VAO registry: http://nvo.stsci.edu/vor10/index.aspx .. _IVOA identifiers: http://www.ivoa.net/Documents/REC/Identifiers/Identifiers-20070302.html .. _RofR: http://rofr.ivoa.net/ .. _OAI-PMH: http://www.openarchives.org/OAI/openarchivesprotocol.html DaCHS' Registry Interface ------------------------- As explained in the introduction, you must provide enough data to allow the VO to tell who you are before you can include your data center into the VO. The first step is to define your authority (i.e., something like org.g-vo.dc) in your config (``/etc/gavo.rc``), in the ``[ivoa]authoriy`` item. Then, add metadata about yourself in ``GAVO_ROOT/etc/defaultmeta.txt``. It has a `simple format`_, basically, ``: ``. In it, you must give basic information on your authority and some fallback for services: * authority.creationDate -- A UTC datetime (with trailing Z); technically, it should be the date the resource record is created, but realistically, just use "now". Example: ``2007-12-19T12:00:00Z``. * authority.title -- A human-readable descriptor of what the authority corresponds to. Example: ``The GAVO data center`` * authority.description -- A sentence or two on what you're up to. Example: ``The GAVO data center provides VO publication services to all interested parties on behalf of the German Astrophysical Virtual Observatory.`` (use backslashes an the end of the lines to break long lines). * authority.referenceURL -- A URL at which people can learn more about your organisation. Example: ``http://www.g-vo.org``. * publisher -- A short, human-readable name for you * publisher.ivoId -- An IVOA id for yourself; set this to ivo:///org unless you know what you are doing * contact.name -- A human-readable name for some entity people should write to. This is not necessarily different from publisher, but ideally people can write "Dear " in their mails. * contact.address -- A contact address for surface mail * contact.email -- An email address. It should be spam-proof. * contact.telephone -- A telephone number people can call if things really look bad. * creator.name -- A name to use when you give no creator in your resource descriptors. Could be some error sentinel ("we foget to give credit, please complain") or just contact.name if you produce resources yourself. * creator.logo -- A URL for a logo to use when none is given in the resource metadata. Use a small PNG here. .. _simple format: ref.html#meta-stream-format Registering DaCHS-external services ----------------------------------- The registry interface of DaCHS can be used to register entities external to DaCHS; actually, you're already doing this when you're claiming an authority. To register a non-service "resource", you can fill out a resRec RD element. You could reserve an RD (say, ``GAVOROOT/inputs/ext.rd`` to collect such external registrations, or you could put them alongside internal services into their respective RDs. You will then usually just use the resRec's id attribute to determine the IVORN of resource record. It will then be ``ivo:////``. In all likelihood, however, you will want to register services. To do that, use a normal service definition with with a nullCore. You probably need to manually give an accessURL. The most common case is that of a service with a ``WebBrowser`` capability. These result from ``external`` or ``static`` renderers. Thus, the pattern here usually is:: shortName: My external service description: This service does wonderful things, even though\ it's not based on GAVO's DaCHS software. http://wherever.else/svc Of course, you will normally need to add further metadata as discussed above. Running the DC Server ===================== You will probably want to run the DC server software from a start script. The ``gavo`` interface already works a bit like a SYSV initscript -- you can say ``gavo start``, ``gavo stop``, ``gavo restart`` and ``gavo reload``; you may want to define an init script anyway (instead of just linking) in order to define metadata. TBD. Updates to metadata are not usually picked up by the running software. If you change an RD, you may have to say ``gavo imp -m `` if the column metadata or permissions changed, or ``gavo publish `` if you changed the publication status. Even then, the home page will not show new local publications since those lists are currently cached. To update them, you could run ``gavo reload``, but it's better to just reload the services data. To do this, log in as gavoadmin and go to the service overview at ``/__system__/services/overview/form`` (the standard root has that in a small [s] link at the very bottom of the page). Choose "Admin me" from the sidebar and then the reload RD button. Similarly, if you edit anything in an RD, you will have to reload it, preferably through "Admin me", before the changes will be reflected. Note that right now, if your RD is invalid, any services on it will stop working on a reload. The Vanity Map -------------- DaCHS' URL scheme leads to somewhat clunky URLs that, in particular, reflect the file system underneath. While this doesn't matter to the VO registry, it is possibly unwelcome when publishing URLs outside of the VO. To overcome it, you can define "vanity names", single path elements that are mapped to paths. These mappings are read from the file ``GAVO_ROOT/etc/vanitymap.txt``. The file contains lines of the format:: [