=========
How Do I?
=========

Recipes and tricks for solving problems using GAVO DaCHS
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

:Author: Markus Demleitner
:Email: gavo@ari.uni-heidelberg.de

.. contents::
  :depth: 2
  :backlinks: entry
  :class: toc


Ingestion
=========

...skip a row from a rowmaker?
------------------------------

Raise IgnoreThisRow in a procedure application, like this::

  if 2+colX>22:
    raise IgnoreThisRow()

However, it's probably more desirable to use the rowmakers' built-in
``ignoreOn`` feature, possibly in connection with a procedure, since it
is more declarative.

Still, the following is the recommended way to selectively ignore
broken records defined via certain identifiers::

  This proc filters out records too broken to ingest.
  for the set of ids in the toIgnorePar
  set([
    202, 405])
  if @catid in toIgnore:
    raise IgnoreThisRow("Manually ignored from RD")


...skip a single source?
------------------------

If you want to skip processing of a source, you can raise SkipThis from
an appropriate place.  Usually, this will be a sourceFields element,
like this::

  if len(sourceToken)>22:
    raise base.SkipThis("%s skipped since I didn't like the name"%
      sourceToken)


...fix duplicate values?
------------------------

There are many reasons why you could violate the uniqueness constraints
on primary keys, but let's say you just got a message saying::

  Primary key could not be added ('could not create unique index
  "data_pkey" DETAIL:  Table contains duplicated values.)'

The question at this point is: What are the duplicated values?  For a
variety of reasons, DaCHS applies constraints only after inserting all
the data, so the error will occur at the end of the input.  Not even
the ``-b1`` trick will help you here.

Instead, temporarily remove the primary key condition from the RD and
import your data.
Then, execute a query like::

  select * from (
    select <primary key columns>, count(*) as ct
    from <your table>
    group by <primary key columns>) as q
  where ct>1;


...cope with "Undefined"s in FITS headers?
------------------------------------------

Pyfits returns some null values from FITS headers as instances of
"Undefined" (note that this is unrelated to DaCHS' base.Undefined).
If you want to parse from values that *sometimes* are Undefined, use
code like::

  parseWithNull(@RA, lambda l: hmsToDeg(l, ":"),
    checker=lambda l: isinstance(l, utils.pyfits.Undefined))


...force the software to accept weird file names?
-------------------------------------------------

When importing products (using //products#define or derived procDefs),
you may see a message like::

  File path
  'rauchspectra/spec/0050000_5.00_H+He_0.000_1.meta'
  contains characters known to the GAVO staff to be hazardous in URLs.
  Please defuse the name before using it for published names.

The machinery warns you against using characters that need escaping in
URLs.  While the software itself should be fine with them (it's a bug
if it isn't), such characters make other software's lives much harder
-- the plus above, for example, may be turned into a space if used in a
URL.

Thus, we discourage the use of such names, and if at all possible, you
should try to use simpler names.  If, however, you insist on such
names, you can simply write something like::

  "\schema.newdata"
  \inputRelativePath{True}

(plus whatever else you want to define for that rowfilter, of course)
in the respective data element.


...handle formats in which the first line is metadata?
---------------------------------------------------------

Consider a format like the metadata for stacked spectra::

  C 2.3 5.9 0
  1 0 0 1
  2 1 0 1
  ...

-- here, the first line gives a center position in degrees, the
following lines offsets to that.

For this situation, there's the grammar's sourceFields element.  This
is a python code fragment returning a dictionary.
That dictionary's key-value pairs are added to each record the grammar
returns.

For this example, you could use the following grammar::

  with open(sourceToken) as f:
    _, cra, cde, _ = f.readline().split()
  return {
    "cra": float(cra),
    "cde": float(cde)}

In the rowmaker, you could then do something like this::

  @cra+float(@dra)*DEG_ARCSEC
  @cde+float(@dde)*DEG_ARCSEC


...use binaries when the gavo directory is mounted from multiple hosts?
-----------------------------------------------------------------------

If your GAVO_ROOT is accessible from more than one machine and the
machines have different architectures (e.g., i386 and amd64 or ppc,
corresponding to test machine and production server), compiled binaries
(e.g., getfits, or the preview generating code) will only work on one
of the machines.

To fix this, set platform in the [general] section of your config file.
You can then rename any platform-dependent executable
``<base>-<platform>``, and if on the respective platform, that binary
will be used.  This also works for computed resources using binaries,
and those parts of the DC software that build binaries (e.g., the
booster machinery) will automatically add the platform postfix.

If you build your own software, a make file like the following could be
helpful::

  PLATFORM=$(shell gavo config platform)
  TARGET=@@@binName@@@-$(PLATFORM)
  OBJECTS=@@@your object files@@@

  $(TARGET): buildstamp-$(PLATFORM) $(OBJECTS)
          $(CC) -o $@ $(OBJECTS)

  buildstamp-$(PLATFORM):
          make clean
          rm -f buildstamp-*
          touch buildstamp-$(PLATFORM)

You'll have to fill in the @@@.*@@@ parts and probably write rules for
building the files in $(OBJECTS), but otherwise it's a simple hack to
make sure a make on the respective machine builds a suitable binary.


...change the description, unit, whatever on a single field in a mixin?
-----------------------------------------------------------------------

Sometimes some property of a field you get via a mixin -- say, product
or ssap -- isn't quite right or could be improved; this could, for
instance, be the description.  In this case, just "overwrite" the
field; there can only be one column with a given name in each table,
and newer columns overwrite older ones.

When overwriting, you should inherit from the original.  Where that is
depends on the mixin; for SSAP, this could look like this::


...transform coordinates from one coordinate system to another?
---------------------------------------------------------------

In general, that's fairly complicated, involving proper motions, radial
velocities, and other complications.  There's an interface in the stc
module based on conformTo, but you'd not want to do that when filling
tables since it's not fast and usually not what you need either.

For simple things (i.e., change plain positions between Galactic and
ICRS or Equinox B1965 and J2010) and bulk transforms, you can use the
following pattern in a rowmaker apply (transforming from Galactic to
ICRS)::

  gal2ICRS = stc.getSimple2Converter(
    stc.parseSTCS("Position GALACTIC"),
    stc.parseSTCS("Position ICRS"))
  @raj2000, @dej2000 = gal2ICRS(@galLon, @galLat)


...access the creation date of the file from which I'm parsing?
---------------------------------------------------------------

The row iterator that feeds the current table is in rawdicts under the
key ``parser_``.  For grammars actually reading from files, the
``sourceToken`` attribute of these gives the absolute path of the input
file.  Hence, to get the file's last modification time, you'd say::

  os.path.getmtime(@parser_.sourceToken)


Services in General
===================

...set a constant input to a core?
----------------------------------

Use a service input key with a Hidden widget factory and a default::

  ...


...add computed columns to a dbCore output?
-------------------------------------------

Easy: define an output field with a select attribute, e.g.::

This will add an output field that looks to the service like it comes
from the DB proper but contains the value of the ``ev_i`` column
multiplied by 5.434.  The expression must be valid SQL.

There is a fairly common case for this that's somewhat tricky: Compute
the distance of the position of a match to the position of a cone
search (or endless variants of that).  Tasks like that are hard for
DaCHS, as the select clause (where the distance is computed) needs
information on something only available to the condition descriptors
(the input position).

There's a slightly hackish way around this.  It builds on the
predictability of the names chosen by ``base.getSQLKey`` and the way
parameters are passed through to PostgreSQL.  Hence, this is a
portability liability.  We'd like to hear about your applications for
this, as that may help us figure out a better solution.  Meanwhile, to
add a spherical distance to a cone search service over a table with
positions in raj2000, dej2000, say something like::

As you can see, this unconditionally assumes that the parameter names
for the cone condition are RA0 and DEC0; it is likely that that is
true, but this is not API-guaranteed.  You could write your condDesc
with fixed field names to be safe, but even then this builds on the
specific API of psycopg, so it's certainly not terribly portable.

Another issue is that not all queries necessarily give RA and Dec; as
pulled in by the usual ``//scs#coreDescs`` STREAM, cone conditions in
forms are not mandatory.  The above construct will yield fairly ugly
errors when they are left out.  To fix this, make humanInput mandatory,
saying something like::

instead of the coreDescs (maybe adding the built-in MAXREC, too).


...import data coming in to a service?
--------------------------------------

In a custom renderer or core, you can use code like::

  from gavo import api
  ...
  def importData(self, srcName, srcFile):
    dd = self.service.rd.getById("myDataId")
    with api.getWritableAdminConn() as conn:
      self.nAffected = api.makeData(dd, forceSource=srcFile,
        connection=conn).nAffected

You want to use a separate connection since the default connections
obtained by cores and friends are unprivileged and typically cannot
write to tables.

The nAffected should contain the total number of records imported and
could be used in a custom render function.

srcName and srcFile come from a formal File form item.  In
submitActions, you obtain them like::

  srcName, srcFile = data["inFile"]

Note that you can get really fancy and manipulate ``data`` in some way
up front.  That could look like this::

  from gavo import rsc
  ...
  data = rsc.Data.create(dd, parseOptions=api.parseValidating,
    connection=conn)
  data.addMeta("_stationId", self.stationRecord["stationId"])
  self.nAffected = api.makeData(dd, forceSource=srcFile, data=data,
    connection=conn).nAffected


...define an input field doing google-type full text searches?
--------------------------------------------------------------

Since version 8.3 (or so), postgres supports query modes inspired by
information retrieval on text columns -- basically, you enter a couple
of terms, and postgres matches all strings containing them.

Within DaCHS, this is currently only supported using custom phrase
makers.  This would look like this::

  yield ("to_tsvector('english', description)"
    " @@ plainto_tsquery('english', %%(%s)s)"%(
    base.getSQLKey("columnwords", inPars["columnwords"], outPars)))

-- here, ``description`` is the column containing the strings, and the
``'english'`` in the function arguments gives the language according to
which the strings should be interpreted.

You may want to create an index supporting this type of query on the
respective columns, too.  To do that, say::

  to_tsvector('english', bibref)


...protect my TAP service against usage?
----------------------------------------

Sometimes there is a proprietary period on some data, and you may want
to password protect your data.  DaCHS doesn't support protection of
individual tables yet (we'll accept patches, but since this kind of
thing is considered fairly antisocial by most of us, it looks bleak for
an implementation of this from our side).

You can, however, protect the entire TAP service (with HTTP basic auth,
so this isn't for the gold of Fort Knox).

First, create a group of authorized users and add a user to it::

  gavo admin adduser tapusers master_password
  gavo admin adduser smart_pi his_password
  gavo admin addtogroup smart_pi tapusers

Then, limit access to the TAP service itself.  To make it easy to redo
your changes and return to mainline development, we recommend using a
custom TAP RD.  To do that, create a ``__system__`` directory in your
inputs directory; with the default config::

  cd /var/gavo/inputs
  mkdir __system__
  cd __system__
  curl -O http://svn.ari.uni-heidelberg.de/svn/gavo/python/trunk/gavo/resources/inputs/__system__/tap.rd

(or find your local tap.rd and use that, which may be a bit safer).

In tap.rd, look for the actual service element::

In that copy, write ``limitTo="tapusers"``.  You need to restart the
server to pick up the new RD; after that, it should pick up changes by
itself.  To get back standard behaviour, just remove that copy.

That's not enough, though; you'll need to do the same with the service
in adql.rd (or remove that service altogether).

This is not sufficient for top secret things -- people can still
inspect your column metadata, descriptions and so on.  But they
shouldn't get to the data.

To make this work with the TOPCAT TAP client, you'll need to set the
star.basicauth.user and star.basicauth.password java system properties.
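
On the command line, java system properties are passed as ``-D``
options.  A minimal sketch, assuming TOPCAT is started from a jar
called ``topcat-full.jar`` in the current directory (the jar name and
its location are assumptions; the credentials are the ones created
above):

```shell
# Credentials created with "gavo admin adduser" above.
TAP_USER=smart_pi
TAP_PASSWORD=his_password
# Assumption: TOPCAT lives in a jar of this name in the current directory.
TOPCAT_JAR=topcat-full.jar

# The two system properties named above, as -D options for the JVM.
AUTH_OPTS="-Dstar.basicauth.user=$TAP_USER -Dstar.basicauth.password=$TAP_PASSWORD"

# Only start TOPCAT if both java and the jar are actually available;
# otherwise just show the command line that would be used.
if command -v java >/dev/null 2>&1 && [ -f "$TOPCAT_JAR" ]; then
    java $AUTH_OPTS -jar "$TOPCAT_JAR"
else
    echo "would run: java $AUTH_OPTS -jar $TOPCAT_JAR"
fi
```

Note that a password given on the command line is visible in the
process list to other users of the machine.
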
See http://www.star.bristol.ac.uk/~mbt/topcat/sun253/jvmProperties.html
on how to do this; by the time you read this, TOPCAT may already
provide less scary ways to enter credentials, as may other TAP clients.


...have output columns with URLs in them?
-----------------------------------------

You might be tempted to use ``outputField/formatter`` for this, and
indeed this lets you generate arbitrarily odd things, including
javascript, in whatever you make of the value coming in.

But nothing but a web browser can do anything with this kind of thing.
Therefore, DaCHS does not execute formatters for non-HTML output, which
means for something like::

  return T.a(href="http://my.data.serv.er/customservice?"+data)[
    "Download"]

VOTable and friends will only have the value of whatever obsid is in
their corresponding column, which in this case is probably *not* what
you want.

Instead, use the ``select`` attribute of ``outputField`` to generate
the URL within the database, like this::

The ``type=url`` display hint will generate links in HTML (and possibly
some other appropriate thing for URLs in particular in other output
formats).  The anchor text is the last item from the path component of
the URL, which is not always what you want.  To override it, you can
use a formatter, or, in simple cases when a constant text will do, the
anchorText property, like this::

  Retrieve Data


Form-based interfaces
=====================

...get a multi-line text input for an input key?
------------------------------------------------

Use a widgetFactory, like this::


...make an input widget to select which columns appear in the output table?
---------------------------------------------------------------------------

In general, selecting fancy output options currently requires custom
cores or custom renderers.  Ideas on how to change this are welcome.

For this specific purpose, however, you can simply define a service
key named _ADDITEM.  This would look like this::

  .... ...
Setting showItems to -1 gives you checkboxes rather than a select list,
which is mostly what you want.  Try with and without and see what you
like better.

If you do that, you *probably* do not want the standard "additional
fields" widget at the bottom of the form.  To suppress it, add a line
::

  True

to the service definition.  The "True" in there actually is a bit of a
red herring; the widget is suppressed for any value.


...add an image to query forms?
--------------------------------

There are various variations on that theme -- you could go for a custom
template if you want to get fancy, but usually putting an image into an
_intro or _bottominfo meta section should do.

In both cases, you need a place to get your image from.  While you
could put it somewhere into rootDir/web/nv_static, it's probably nicer
to have it within a resource's input directory.  So, add a static
renderer to your service, like this::

  static

This lets you put service-local static data into resdir/static/ and
access it as ``<service URL>/static/<file name>``.

Usually, your _intro or _bottominfo will be in reStructuredText.  Plain
images work in there using substitution references or simply the naked
image directive::

  The current data set comprises these fields:

  .. image:: \servicelink{cars/q/cat/static/fields-inline.png}

The servicelink macro ensures that the image is still found if the
server runs off-root.

This is the recommended way of doing things.  If, however, you insist
on fancy layouts or need complete control over the appearance of your
image (or whatever), you can use the evil "raw" meta format::

  ]]>

Make sure you enter valid HTML here; the DC software does no checks.


...put more than one widget into a line in web forms?
-----------------------------------------------------

Use input table groups with a ``style`` property of ``compact``.  In DB
cores, however, you probably do not want to give inputTables explicitly
since it's much less hassle to have them computed from the condDescs.
In this case, the widgets you want to group probably come from a single
condDesc.  To have them in a group, define a group within the condDesc
without any paramRefs (or colRefs) -- they cannot be resolved anyway.
Give the group style and label properties, and it will be added to the
input table for all fields of the condDesc::

  compact
  Example vals

If you are doing this, you probably want to use the ``cssClass``
property of input keys and the ``customCSS`` property of services.  The
latter can contain css specifications.  They are added into form pages
by the defaultresponse template (in your custom templates, you should
have ``