=========================================
GAVO DaCHS installation and configuration
=========================================

.. contents::
  :depth: 3
  :backlinks: entry
  :class: toc

These installation instructions cover the installation of the complete
data center suite.  Installing libraries or, say, the tapsh, is much less
involved.  See the respective pages at the `GAVO DC's software
distribution pages`_ for details on those.

.. _GAVO DC's software distribution pages: http://vo.ari.uni-heidelberg.de/soft


Installation
============

Debian systems
--------------

The preferred way to run DaCHS is on Debian stable or compatible systems.

However, for the complete data center suite, the apt-based approach is
not recommended yet, in particular since it is quite possible that you
will run into bugs, and it will be much easier for us if you can update
directly from our subversion repository.

Still, you should `add our apt repository`_ to your system's
sources.list.  When you have done this, you *could* do::

  sudo aptitude install gavodachs

But really, if you want to operate a server, you should, for now, work
from svn.

.. _add our apt repository: http://vo.ari.uni-heidelberg.de/soft/repo

Working with the subversion source
----------------------------------

Getting the source
''''''''''''''''''

Though we may provide releases_ now and then, you probably should just
check out whatever is in the subversion repository right now.  Say, at
some place you can write to::

  svn co http://svn.ari.uni-heidelberg.de/svn/gavo/python/trunk/ dachs

After that, the current source code is in the ``dachs`` subdirectory.

This is development code, so *please* do not hesitate to contact us if
something weird is going on with it.  We mean it; even trivial reports
help us to gauge where our software behaves contrary to expectations.
Plus, we clearly don't have oodles of users, so chances are you won't get
on our nerves.
Try gavo@ari.uni-heidelberg.de using mail, or ++49 6221 541837 on the plain
old telephone.  You can also use XMPP ("jabber"), we'll give you an id on
request.

.. _releases: http://vo.ari.uni-heidelberg.de/soft/

Installing from source
''''''''''''''''''''''

The DaCHS installer is based on setuptools; we do not use setuptools'
dependency management, though, since in practice it seems more trouble
than it's worth.

So, first install the python dependencies: twisted, nevow, pyfits, numpy,
pyparsing, PIL, soappy, the Zolera soap infrastructure, setuptools, and
psycopg2.  Furthermore, you will need a python that can build (python)
packages, and (usually) a C compiler.

On Debian Lenny systems, the following should work::

  sudo aptitude install python-dev build-essential \
    python-nevow python-psycopg2 python-pyparsing python-pyfits \
    python-numpy python-imaging python-soappy python-zsi \
    python-setuptools libxml2

(it's not important that the server-dev version matches the version
of your server at this point).

Finally, in the ``dachs`` directory you checked out above, say::

  sudo python setup.py develop

(there are various options to get the stuff installed when you prefer not
to install as root; refer to the `setuptools documentation`_ if necessary).

.. _nevow: http://divmod.org/trac/wiki/DivmodNevow
.. _setuptools documentation: http://peak.telecommunity.com/DevCenter/EasyInstall


Setup
=====

You should first create a user that the DaCHS server runs as later, and a
group for running DC-related processes in::

  sudo adduser --system gavo
  sudo addgroup --system gavo
  sudo adduser gavo gavo

(or similar, depending on your environment).  This user should not be able
to log in, but it should have a home directory.
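
If you want to double-check the account you just created, something like
the following sketch may help.  It is not part of DaCHS; the function name
``account_problems`` and the list of "no login" shells are assumptions made
for illustration, so adapt them to your system.

```python
import os
import pwd

# Shells that conventionally mean "no interactive login".  This set is an
# assumption for illustration, not something DaCHS defines.
NOLOGIN_SHELLS = {"/bin/false", "/usr/sbin/nologin", "/sbin/nologin"}


def account_problems(entry, dir_exists=os.path.isdir):
    """Return a list of complaints about a passwd entry.

    entry needs pw_name, pw_dir and pw_shell attributes, like the records
    returned by pwd.getpwnam.  dir_exists is injectable to ease testing.
    """
    problems = []
    if entry.pw_shell not in NOLOGIN_SHELLS:
        problems.append(
            "%s has a login shell: %s" % (entry.pw_name, entry.pw_shell))
    if not entry.pw_dir or not dir_exists(entry.pw_dir):
        problems.append("%s has no home directory" % entry.pw_name)
    return problems


if __name__ == "__main__":
    try:
        for problem in account_problems(pwd.getpwnam("gavo")):
            print(problem)
    except KeyError:
        print("There is no gavo user on this machine")
```

Run as a script, this prints nothing if the gavo account has a non-login
shell and an existing home directory.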

Everyone working on the data center must be in the group created (this is
because the log directory will be writable by this group); in particular,
you should add yourself::

  sudo adduser `id -nu` gavo

You may want to create another account for "maintenance", or just use your
normal account; if more than one person will feed the data center, you'll
need more elaborate schemes; do not use the gavo group as the "data
center maintenance group".

Next, you need to decide on a "root" directory for DaCHS.  Below it, there
are data descriptions, cache files, logs, etc. (these locations can be
changed later, but for a simple setup we recommend keeping everything
together).  By default, this is ``/var/gavo``.

DaCHS is configured in an INI-style configuration file in ``/etc/gavo.rc``
(overridable using the environment variable ``GAVOSETTINGS``).  In
addition, users, in particular the gavo user, can have ~/.gavorc files, the
contents of which override settings in /etc/gavo.rc.  `Configuration
Settings`_ gives a walkthrough of the most important settings; for
now, if you want to change the DaCHS root dir, use something like::

  [general]
  rootDir: /data/gavo

as /etc/gavo.rc.

You can now let DaCHS create its file system hierarchy::

  gavo init

For this to work, ``rootDir`` must exist and be writable by you, or you
must have sufficient privileges to create it.  Do *not* run ``gavo init``
as root, since the files and directories it creates will be owned by
whoever ran the program.  In the typical situation in which you may not
write to ``rootDir``'s parent, do something like::

  sudo mkdir -p /data/gavo
  sudo chown `id -nu`:staff /data/gavo

(replacing staff with some appropriate group).

``gavo init`` may spit out a warning or two on the first run.  On repeated
runs no output at all should appear.

You can later run ``gavo init`` again.  It will not clobber anything you did
in the meantime (well, if it does, it's a bug and you should fiercely
complain).
In particular, this is the most convenient way to create directories if
you changed locations in ``gavo.rc``.

Next, you need to prepare the database.

Database setup
--------------

To make things work, you will also need to set up a database; the software
requires at least two users, three if you let people do ADQL queries.
Currently, we only support postgres.  You should have one user that may
create tables and schemas, so you probably need a dedicated database.

Cluster Creation
''''''''''''''''

You first need a database to play with, preferably in a suitable cluster;
even if you want to use DaCHS to expose an existing database, we recommend
you create a playground first to get a feeling for how things should look.
You can then switch to your existing database, e.g., by having a .gavorc
setting configDir.

Database cluster generation is very system-dependent, and ideally a
database admin would assist you.  On Debian systems dedicated to GAVO
DaCHS, you can try the following:

(#) Find out the version of the server you will be running (e.g., using
    ``dpkg -l``; in Debian, more than one version may be installed in
    parallel).  It's probably a good idea to use the most recent one.  Set
    your desired version for subsequent use::

      export PGVERSION=8.3

(#) Drop the Debian default cluster (this will delete everything in there --
    for a fresh install, that doesn't matter, but don't do this if other
    people use the database).  If you don't do this, your database will
    listen on a different port, and you will have to adapt the default
    profiles::

      sudo pg_dropcluster --stop $PGVERSION main

(#) Create the new cluster used by DaCHS::

      sudo pg_createcluster -d / \
        --locale=C -e UNICODE \
        --lc-collate=C --lc-ctype=C $PGVERSION pgdata

    The locale should currently be C, because only the C locale will allow
    you to create databases with all kinds of encodings.  The database
    stores descriptions and similar entities, and you may encounter funny
    characters in there.
    It would be a shame if you couldn't store them (plus, you would get
    funny error messages for those).

(#) Start the server::

      sudo /etc/init.d/postgresql-$PGVERSION start

(#) You probably want to make yourself a superuser account so you don't
    need to become postgres all the time.  To do that, say::

      sudo -u postgres createuser -P -adsr `id -un`

On Debian, the configuration files for this cluster are at
``/etc/postgresql/$PGVERSION/pgdata/``.

Database Preparation
''''''''''''''''''''

The data center software accesses the database in various functions.  These
are mapped to profiles which correspond to access information (basically,
the DSN, user, and password).  There are three of them:

* feed -- the "admin" profile, used for feeding tables into the normal
  database, for user management, credentials checking and the like.
* trustedquery -- this profile is used for queries generated by the DC
  software (though usually on behalf of a user).  The corresponding DB role
  can access all "normal" tables; privilege management is supposed to
  happen through the web interface.
* untrustedquery -- the profile used for user-contributed SQL.  Only
  tables expressly opened up are accessible to it.

You can adapt those names as necessary in the corresponding profiles.  See
`The profiles Section`_.

The following procedure sets up users and databases as expected by the
default profiles (if you made yourself a superuser account as described
above you do not need the ``sudo -u postgres`` in these commands)::

  # create the database that'll hold your data
  sudo -u postgres createdb --encoding=UTF-8 gavo
  # create the user that feeds the db...
  sudo -u postgres createuser -P -ADsr gavoadmin
  # and a user that usually has no write privileges
  sudo -u postgres createuser -P -ADSR gavo
  # and a user for ADQL queries (i.e., untrusted queries from the net)
  sudo -u postgres createuser -P -ADSR untrusted

Enter the passwords you assign here into the ``feed``, ``trustedquery``,
and ``untrustedquery`` profiles, respectively.  These profiles are found in
``$GAVO_ROOT/etc``.

Note that running a database server always is a security liability.  You
should make sure you understand what the pg_hba.conf (in postgres'
configuration directory) says.  As a minimum, you should have a line like::

  local gavo gavo,gavoadmin,untrusted md5

in there, probably right below the line allowing the postgres user complete
access (the order of lines in pg_hba.conf is significant); it allows
password authentication for the three users above from the local machine.
If you have two machines sitting on a reasonably trusted local network, you
could say something like::

  host gavo gavo,gavoadmin,untrusted xxx.xxx.xxx.0/24 md5

(where the x must be replaced with your network number).  If you insist on
having the data between DaCHS and the postgres server go through untrusted
lines, see the postgres docs on how to set up an SSL connection.

Finally, you need to let the various roles you just created access the
database; you do this using the command line interface to postgres::

  sudo -u postgres psql gavo \
    -c "GRANT ALL ON DATABASE gavo TO gavoadmin"

For the individual tables, rights to gavo and untrusted are granted by
``gavo imp``, so you do not need to specify any rights for them.


Importing the first resources
=============================

There are some built-in tables in DaCHS, related to metadata storage,
certain protocols, and the like.  You must import them before the DC
software can be used.  This also is a nice test that at least some things
work.
So, in this sequence, run::

  gavo imp --system //dc_tables
  gavo imp --system //services
  gavo imp --system //users
  gavo imp --system //tap
  gavo imp --system //products
  gavo imp --system //obscore

Output of the type ``Columns affected: 0`` is ok for these commands.

The double slash in the identifiers above means "use system resources".
All these really refer to resource descriptors (RD) in the ``__system__``
resource directory; at this point, they are the RDs shipped with DaCHS.

If you get error messages, add a ``--hints`` after the gavo command, like
this::

  gavo --hints imp --system //dc_tables

This will (for the ``gavo`` command in general) give additional error info
where available.

You should now be able to run the examples in the tutorial.


Configuration Settings
======================

These are for fine-tuning.  You should at least briefly skim this
section.  We will first provide an informal walkthrough of the various
configuration sections and give a reference below.

Walkthrough
-----------

The general Section
'''''''''''''''''''

This mainly sets paths.  The most important is ``rootDir``, a directory
most other paths are relative to.  This is the one you'll most likely
want to change.  If you, e.g., wanted to have a private DaCHS tree, you
could put ::

  [general]
  rootDir: /home/user/gavo

into your ~/.gavorc.

The other paths in this section are relative to rootDir, but absolute
paths are legal, too.  You may want to set tempDir and cacheDir to a
directory local to your machine if rootDir is mounted via a network.  Also
note that we do no synchronization for writing to the log (and never
will -- we will provide syslog based logging if necessary), so you may want
to tweak this too to keep actions from separate users separate.

The db Section
''''''''''''''

In the db section, some global properties of the database access layer are
defined.  Currently, the most relevant one is profilePath.
This is a colon-separated list of rootDir-relative paths in which we look
for database profiles (expansion of home directories is supported).  The
first match in any of these directories wins.  This is useful when you have
a test setup and a production setup -- just say ``include dsn`` in the
common profiles (by default in configDir) and have separate dsn files in
the ~/.gavo directories of the accounts feeding the test and production
databases.

In addition, you can supply a comma separated list of database roles in
maintainers.  These are roles that get gavoadmin-like privileges on all
tables created by the system.  This is provided so people working on the
system do not need to go through the gavoadmin account to do DB
housekeeping.  Note that when you change maintainers, you *must* keep
gavoadmin in it unless you know exactly what you are doing.  The DC
software will revoke all kinds of privileges from its own feeder if
gavoadmin isn't part of maintainers, which will be a mess to fix.

queryRoles and adqlRoles correspond to the roles defined during db setup.
Again, you can supply more than one such role, but you must at least give
those defined in the trustedquery and untrustedquery profiles.

The profiles Section
''''''''''''''''''''

The profile section maps profile names to file names.  These file names
are relative to any of the directories in db.profilePath.

The files contain a specification of the access to the database in
(unfortunately) yet another, but simple, language.  Each line in such a
profile is either a comment (starting with #), an assignment (with "=") or
an instruction (consisting of a command and arguments, separated by
whitespace).

Keywords available for assignment are

* host -- the host the database resides on.  Leave empty for a Unix socket
  connection.
* port -- the port the database listens on.  Leave empty for default 5432.
* database -- the database your tables live in.
* user -- the user through which the db is accessed.
* password -- the password of user.

There's just one command available, viz.,

* include -- read assignments and instructions from the profile given in
  the argument

``gavo init`` creates four profile files, ``dsn``, ``feed``,
``trustedquery``, and ``untrustedquery``.  These are referred to in the
default profiles section, and are basically required by the python code.
See `Database setup`_ for information on how to edit them.  You will not
usually need more or different profiles or setup on top of what's discussed
there.

The web Section
'''''''''''''''

* serverURL: the scheme and host under which the service is visible (like,
  http://vo.foo.org)
* nevowRoot: a serverURL-relative URL to the place the nevow service appears
  in the server.
* errorPage: debug or something else -- whether to show a traceback on the
  net or not.
* staticURL: a serverURL-relative URL to the static data (for querulator)

Reference
---------

You can get an up-to-date version of this by running ``gavo config``.

Section [general]
'''''''''''''''''

Paths and other general settings.

* cacheDir: path relative to rootDir; defaults to 'cache' -- Path to the
  DC's persistent scratch space
* configDir: path relative to rootDir; defaults to 'etc' -- Path to the
  DC's non-ini configuration (e.g., DB profiles)
* defaultProfileName: string; defaults to 'admin' -- Default profile name
  (used to construct system entities)
* gavoGroup: string; defaults to 'gavo' -- Name of the unix group that
  administers the DC
* inputsDir: path relative to rootDir; defaults to 'inputs' -- Path to the
  DC's data holdings
* logDir: path relative to rootDir; defaults to 'logs' -- Path to the DC's
  logs (should be local)
* logLevel: value from the list info, debug, warning, error; defaults to
  'info' -- Verboseness of importer
* operator: string; defaults to '' -- Mail address of the DC's operator(s).
* platform: string; defaults to '' -- Platform string (can be empty if
  inputsDir is only accessed by identical machines)
* rootDir: string; defaults to '/var/gavo' -- Path to the root of the DC
  file (all other paths may be relative to this)
* stateDir: path relative to rootDir; defaults to 'state' -- Path to the
  DC's state information (last imported,...)
* tempDir: path relative to rootDir; defaults to 'tmp' -- Path to the DC's
  scratch space (should be local)
* webDir: path relative to rootDir; defaults to 'web' -- Path to the DC's
  web related data (docs, css, js, templates...)

Section [adql]
''''''''''''''

Settings concerning the built-in ADQL core

* webDefaultLimit: integer; defaults to '2000' -- Default match limit for
  ADQL queries via the web form
* webTimeout: integer; defaults to '15' -- Default timeout for adql queries
  via the web form

Section [db]
''''''''''''

Settings concerning database access.

* adqlRoles: set of strings; defaults to 'untrusted' -- Name(s) of DB roles
  that get access to tables opened for ADQL
* defaultLimit: integer; defaults to '100' -- Default match limit for DB
  queries
* interface: string; defaults to 'psycopg2' -- Don't change
* maintainers: set of strings; defaults to 'gavoadmin' -- Name(s) of DB
  roles that should have full access to gavoimp-created tables by default
* msgEncoding: string; defaults to 'utf-8' -- Encoding of the messages
  coming from the database
* profilePath: shell-type path; defaults to ' ~/.gavo:$configDir' -- Path
  for locating DB profiles
* queryRoles: set of strings; defaults to 'gavo' -- Name(s) of DB roles that
  should be able to read gavoimp-created tables by default

Section [ivoa]
''''''''''''''

The interface to the Greater VO.

* authority: string; defaults to 'org.gavo.dc' -- the authority id for this
  DC
* dalDefaultLimit: integer; defaults to '10000' -- Default match limit on
  DAL queries
* registryIdentifier: string; defaults to
  'ivo://org.gavo.dc/static/registryrecs/registry.rr' -- The IVOAid for this
  DC's registry

Magic Section [profiles]
''''''''''''''''''''''''

Mapping of DC profiles to profile definitions.

The items in this section are all of type profile name.  You can add keys
as required.

* admin: profile name; A name of a file in [db]profilePath
* deploydb: profile name; A name of a file in [db]profilePath
* test: profile name; A name of a file in [db]profilePath
* trustedquery: profile name; A name of a file in [db]profilePath
* untrustedquery: profile name; A name of a file in [db]profilePath

Section [ui]
''''''''''''

Settings concerning the local user interface

* outputEncoding: string; defaults to 'iso-8859-1' -- Encoding for system
  messages.  This should match what your terminal emulator is set to

Section [web]
'''''''''''''

Settings related to serving content to the web.

* adminpasswd: string; defaults to '' -- Password for online
  administration, leave empty to disable
* adsMirror: string; defaults to 'http://ads.ari.uni-heidelberg.de' -- Root
  URL of ADS mirror to be used
* bindAddress: string; defaults to '127.0.0.1' -- Interface to bind to
* enableTests: boolean; defaults to 'False' -- Enable test pages (don't if
  you don't know why)
* errorPage: string; defaults to 'debug' -- set to 'debug' for error pages
  with tracebacks, anything else for a less informative page
* favicon: path relative to webDir; defaults to 'None' -- Webdir-relative
  path to a favicon
* graphicMimes: list of strings; defaults to 'image/fits,image/jpeg' -- MIME
  types considered as graphics (for SIAP, mostly)
* maxPreviewWidth: integer; defaults to '300' -- Hard limit for the width of
  previews (necessary because previews on protected items are free)
* nevowRoot: path fragment; defaults to '/' -- Path fragment to the server's
  root for operation off the server's root
* previewCache: path relative to webDir; defaults to 'previewcache' --
  Webdir-relative directory to store cached previews in
* serverPort: integer; defaults to '8080' -- Port to bind the server to
* serverURL: string; defaults to 'http://localhost:8080' -- URL fragment
  used to qualify relative URLs where necessary
* sitename: string; defaults to 'GAVO data center' -- A short name for your
  site
* sqlTimeout: integer; defaults to '15' -- Default timeout for db queries
  via the web
* templateDir: path relative to webDir; defaults to 'templates' --
  webDir-relative location of global nevow templates
* user: string; defaults to 'None' -- Run server as this user (leave empty
  to not change user)
* vanityNames: path relative to webDir; defaults to 'vanitynames.txt' --
  Webdir-relative path to the name map for vanity names
* voplotCodeBase: URL fragment relative to the server's root; defaults to
  'static/voplot/VOPlot' -- URL of the code base for VOPlot
* voplotEnable: boolean; defaults to 'False' -- Enable the
  VOPlot output format (requires some external software)
* voplotUserman: URL fragment relative to the server's root; defaults to
  'static/voplot/docs/VOPlot_UserGuide_1_4.html' -- URL to the
  documentation of VOPlot

Binaries
--------

For extended functionality, the system uses external binaries.  If your
GAVO_ROOT is accessible from more than one machine and the machines have
different architectures, this may be a problem, because one platform may
not be able to execute another platform's binaries.

To fix this, set platform in the DEFAULTS section of your config file.  You
can then rename any platform-dependent executable base-, and if on the
respective platform, that binary will be used.  This also works for
computed resources using binaries, and those parts of the DC software that
build binaries (e.g., the booster machinery) will automatically add the
platform postfix.

If you build your own software, a make file like the following could be
helpful::

  PLATFORM=$(shell gavo config platform)
  TARGET=@@@binName@@@-$(PLATFORM)
  OBJECTS=@@@your object files@@@

  # note: in a real make file, the recipe lines must start with a tab
  $(TARGET): buildstamp-$(PLATFORM) $(OBJECTS)
      $(CC) -o $@ $(OBJECTS)

  buildstamp-$(PLATFORM):
      make clean
      rm -f buildstamp-*
      touch buildstamp-$(PLATFORM)

You'll have to fill in the @@@.*@@@ parts and probably write rules for
building the files in $(OBJECTS), but otherwise it's a simple hack to make
sure a make on the respective machine builds a suitable binary.


Specifying meta information fallbacks
=====================================

In the file ``$GAVO_ROOT/etc/defaultmeta.txt`` you should give some
information that is filled in when the resources do not give this kind of
metadata themselves.  Don't sweat it for now, but you must fix it before
you run your own registry.


Customizing the appearance on the web
=====================================

Uh.  I need to work on this.  Basically, you'd have to check out
``resources/web`` in the source distribution.
You can take these files and copy them to ``$GAVO_ROOT/web/`` to edit them
there.  The machinery should then pick them up and use them instead of what
comes with the distribution (it doesn't, yet, for all such files).

This is clearly suboptimal.  Good ideas for "shallow" configurability are
welcome.


Server startup
==============

The server startup is done using the ``gavo serve`` command that already
understands the ``start``, ``stop``, ``reload``, and ``restart`` actions
(``reload`` currently is mostly a no-op).  So, a sysv startup script
largely is a trivial wrapper.  We have the following in our
/etc/init.d/dachs::

  #!/bin/sh -e
  ### BEGIN INIT INFO
  # Provides:          DaCHS
  # Required-Start:    $local_fs $remote_fs $network
  # Required-Stop:     $local_fs $remote_fs $network
  # Default-Start:     2 3 4 5
  # Default-Stop:      0 1 6
  # Short-Description: Start/stop DaCHS Virtual Observatory server
  ### END INIT INFO
  #
  ENV="env -i LANG=C PATH=/usr/local/bin:/usr/bin:/bin"

  . /lib/lsb/init-functions

  test -f /etc/default/rcS && . /etc/default/rcS
  test -f /etc/default/apache2 && . /etc/default/apache2

  SERVER_BIN="$ENV /usr/bin/gavo --disable-spew serve"

  case $1 in
    start)
      log_daemon_msg "Starting VO server" "dachs"
      if $SERVER_BIN start; then
        log_end_msg 0
      else
        log_end_msg 1
      fi
    ;;
    stop)
      log_daemon_msg "Stopping VO server" "dachs"
      if $SERVER_BIN stop; then
        log_end_msg 0
      else
        log_end_msg 1
      fi
    ;;
    reload | force-reload)
      log_daemon_msg "Reloading VO server config" "dachs"
      if $SERVER_BIN reload $2 ; then
        log_end_msg 0
      else
        log_end_msg 1
      fi
    ;;
    restart)
      log_daemon_msg "Restarting VO server" "dachs"
      if $SERVER_BIN restart; then
        log_end_msg 0
      else
        log_end_msg 1
      fi
    ;;
    *)
      log_success_msg "Usage: /etc/init.d/dachs {start|stop|restart|reload|force-reload}"
      exit 1
    ;;
  esac


More dependencies
=================

The following software components are not really hard dependencies, but
they are in some ways used by very common functions of DaCHS, and thus you
*should* install them unless you know what you are doing.

PgSphere
--------

PgSphere is a postgres extension for spherical geometry.  It is needed
for support of the geometric types in DaCHS.  Obtain the source from
http://pgsphere.projects.postgresql.org/ and install it as described there.

Q3C
---

If at some point you have large data sets (more than 20000 rows, say) and
you want to do positional searches on them (cone search, crossmatch, SIAP
and such), you will want the Q3C library by Sergey Koposov et al,
http://www.sai.msu.su/~megera/oddmuse/index.cgi/SkyPixelization.  It's not
particularly tricky to install, but you should skip it for now.

To install it, get the source from the web site given above.  You need a
couple of dependencies (on Debian etch systems, that would at least be
libpam-dev, libreadline-dev, and postgresql-server-dev-).  Then follow the
instructions in the q3c distribution.

To automatically create the necessary indices, use the q3cpositions
interface.

VOPlot
------

The software can generate code to load VOTable into VOPlot, a Java applet
for working with VOTables.  Due to the applet security model, the applet
has to originate from the server that will deliver the data, and so you
will need to install VOPlot locally if you want to use this feature.

To do this, `download VOPlot `_ and unpack the distribution at a convenient
place.  In the resulting folder, you'll find a subdirectory
binaries/VOPlot.
Copy that and (less importantly) docs/user_guide to some location your
webserver can serve from, e.g., ::

  mkdir /var/www/htdocs/VOPlot
  cp -r binaries/VOPlot /var/www/htdocs/VOPlot/bin
  cp -r docs/user_guide/ /var/www/htdocs/VOPlot/doc

Assuming that /var/www/htdocs corresponds to the root of your webserver,
you can then tell the GAVO software where it can find VOPlot in your
/etc/gavo.rc (see the [web] section of the reference above)::

  [web]
  voplotEnable: True
  voplotCodeBase: /VOPlot/bin
  voplotUserman: /VOPlot/doc

wcstools
--------

To support cutout services, you need getfits from the wcstools package
available at http://tdc-www.harvard.edu/software/wcstools/.  After
building wcstools, the binary is in /bin/getfits.

The system expects the getfits binary in inputsDir/cutout/bin/getfits; you
can optionally add a platform suffix.
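
To illustrate what "adding a platform suffix" amounts to, here is a minimal
sketch of such a lookup.  It is not the code DaCHS uses; the function name
``resolve_binary`` is made up, the ``-platform`` naming follows the make
file in the Binaries section, and falling back to the unsuffixed name is an
assumption.

```python
import os


def resolve_binary(base_path, platform=""):
    """Return the path of the binary to run, or None if there is none.

    If platform is non-empty and base_path + "-" + platform is a file,
    that file wins; otherwise we fall back to base_path itself.
    (Hypothetical helper, not part of the DaCHS API.)
    """
    candidates = []
    if platform:
        candidates.append("%s-%s" % (base_path, platform))
    candidates.append(base_path)
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

With platform set to, say, ``x86_64``, calling ``resolve_binary`` on
inputsDir/cutout/bin/getfits would prefer a file named ``getfits-x86_64``
in that directory and only use plain ``getfits`` if the suffixed file is
missing.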