=========================================
GAVO DaCHS installation and configuration
=========================================

.. contents::
  :depth: 3
  :backlinks: entry
  :class: toc

Installation
============

Debian systems
--------------

The preferred way to run python-gavo is on Debian stable or compatible
systems.

Adding the GAVO repository
''''''''''''''''''''''''''

On such systems, add the lines ::

  deb http://vo.ari.uni-heidelberg.de/debian lenny main
  deb-src http://vo.ari.uni-heidelberg.de/debian lenny main

to your ``/etc/apt/sources.list``.

We will not clobber packages from stable *except* when we are very
confident they are backwards compatible.  Thus, while we cannot actually
promise anything, including this repository should not impact your
system's stability in any way, even though we may occasionally include an
updated package or two (typically, python-psycopg2).

You should then get our archive key to avoid nasty questions from
aptitude/apt-get.  Our archive key's id is D8C139FC, and it is on the key
servers.  Its fingerprint is::

  B199 D643 9DBB 98D7 9F4E AD09 8B6C 75C0 D8C1 39FC

The quick and dirty way to install the key on your machine is::

  wget -qO - http://vo.ari.uni-heidelberg.de/docs/archive-key.asc \
    | sudo apt-key add -

Installing GAVO DaCHS
'''''''''''''''''''''

After that, it's a matter of::

  sudo aptitude update
  sudo aptitude install gavodachs

This will pull in all the necessary dependencies.

Non-Debian Systems/ez_install
-----------------------------

If apt is not an option for you, you can install python-gavo based on
easy_install.

You first need a working python installation that can build python
extensions.  If you installed python yourself and have a C compiler, that
should already be the case; on some distributions, you need packages like
python-dev, python-devel or somesuch installed in addition to the binary
python.
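
To get a rough idea of whether your python can build extensions, you can
look for the C header ``Python.h`` and for a C compiler on your path.  The
following is a minimal sketch and not part of python-gavo; the helper name
``extension_build_report`` is made up for illustration, and the code
assumes a Python 3 interpreter (for ``shutil.which``)::

  import os
  import shutil
  import sysconfig


  def extension_build_report():
      """Return a dict saying whether Python.h and a C compiler were found.

      This is a hypothetical helper; it only checks the usual places and
      cannot prove that a build will actually succeed.
      """
      # Directory in which this interpreter expects its C headers.
      include_dir = sysconfig.get_paths()["include"]
      return {
          "python_h": os.path.isfile(os.path.join(include_dir, "Python.h")),
          # Common compiler names; your system may use a different one.
          "c_compiler": any(
              shutil.which(name) for name in ("cc", "gcc", "clang")),
      }


  if __name__ == "__main__":
      for item, found in sorted(extension_build_report().items()):
          print("%s: %s" % (item, "found" if found else "missing"))

If ``python_h`` is reported missing, installing the python-dev or
python-devel package mentioned above is the likely remedy.
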

Once you have that, get the source tar.gz, unpack it somewhere, cd into
the resulting directory and then say ::

  sudo python setup.py install

(there are various options to get the stuff installed when you're not
root; refer to the setuptools documentation if necessary).

Optional components
-------------------

Q3C
'''

If at some point you have large data sets (more than 20000 rows, say) and
you want to do positional searches on them (cone search, crossmatch, SIAP
and such), you will want the Q3C library by Sergey Koposov et al,
http://www.sai.msu.su/~megera/oddmuse/index.cgi/SkyPixelization.  It's
not particularly tricky to install, but you should skip it for now.

To install it, get the source from the web site given above.  You need a
couple of dependencies (on Debian etch systems, that would at least be
libpam-dev, libreadline-dev, and postgresql-server-dev-8.1).  Then follow
the instructions in the q3c distribution.

To automatically create the necessary indices, use the q3cpositions
interface.

psycopg2
''''''''

psycopg2 is the binding of choice for postgresql (pgsql might still work,
but what support is left is definitely scheduled for removal).  However,
the default psycopg2 distribution does not support queries with timeouts.
You must install a patched package provided with the distribution.

VOPlot
''''''

The software can generate code to load VOTable into VOPlot, a Java applet
for working with VOTables.  Due to the applet security model, the applet
has to originate from the server that will deliver the data, and so you
will need to install VOPlot locally if you want to use this feature.

To do this, download VOPlot and unpack the distribution at a convenient
place.  In the resulting folder, you'll find a subdirectory
binaries/VOPlot.
Copy that and (less importantly) docs/user_guide to some location your
webserver can serve from, e.g., ::

  mkdir /var/www/htdocs/VOPlot
  cp -r binaries/VOPlot /var/www/htdocs/VOPlot/bin
  cp -r docs/user_guide/ /var/www/htdocs/VOPlot/doc

Assuming that /var/www/htdocs corresponds to the root of your webserver,
you can then tell the GAVO software where it can find VOPlot in your
/etc/gavo.rc (see below)::

  [web]
  voplotEnable: True
  voplotCodeBase: /VOPlot/bin
  voplotUserman: /VOPlot/doc

wcstools
''''''''

To support cutout services, you need getfits from the wcstools package
available at http://tdc-www.harvard.edu/software/wcstools/.  After
building wcstools, the binary is in /bin/getfits.

The system expects the binary getfits in inputsDir/cutout/bin/getfits;
you can optionally add a platform suffix.

Setup
=====

The tools want a "root directory" for GAVO, a directory in which the
sources reside, ancillary data is kept, etc.  By default, this is
``/var/gavo``.  If you want something else, set the environment variable
``$GAVO_HOME`` or (probably better) adapt your /etc/gavo.rc.

Logging
-------

The tools will write a log (by default ``rootDir/logs/gavoops``).  It may
be a good idea to keep that log on a local file system and not share it
with other instances, not least because we do not provide for any kind of
locking on that file.

To do that, set the environment variable GAVO_LOGDIR to a directory you
want the logs in.  That directory should, of course, only be writable by
you.

If you insist, it wouldn't be hard to make the tools log to syslog.

Database Setup
--------------

To make things work, you will also need to set up a database and create
at least one profile file (see `The profiles Section`_).

Currently, we only support postgres.  You should have one user that may
create tables and schemas, so you probably need a dedicated database.

Cluster Creation
''''''''''''''''

You first need a database to play with, preferably in a suitable cluster.
This is very system-dependent, and ideally a database admin would assist
you.  You may entirely skip this section on making a custom cluster if
the default cluster works for you.

On Debian systems dedicated to GAVO DaCHS, you can try the following:

(#) Find out the version of the server you will be running (e.g., using
    ``dpkg -l``); in Debian, more than one version may be installed in
    parallel.  It's probably a good idea to use the most recent one.  Set
    your desired version for subsequent use::

      export PGVERSION=8.3

(#) Drop the Debian default cluster (this will delete everything in
    there -- for a fresh install, that doesn't matter, but don't do this
    if other people use the database).  If you don't do this, your
    database will listen on a different port, and you will have to adapt
    the default profiles::

      sudo pg_dropcluster --stop $PGVERSION main

(#) Create the new cluster used by DaCHS::

      sudo pg_createcluster -d /pgdata --locale=en_US.UTF-8 \
        --lc-collate=C --lc-ctype=C $PGVERSION pgdata

    The locale is not terribly important, but you should use one with
    utf-8 (or some other unicode) encoding.  The database stores
    descriptions and similar entities, and you may encounter funny
    characters in there.  It would be a shame if you couldn't store them
    (plus, you would get funny error messages for those).

(#) Start the server::

      sudo /etc/init.d/postgresql-$PGVERSION start

(#) You probably want to make yourself a superuser account so you don't
    need to become postgres all the time.  To do that, say::

      sudo -u postgres createuser -P -adsr

On Debian, the configuration files for this cluster are at
``/etc/postgresql/$PGVERSION/pgdata/``.

Database Preparation
''''''''''''''''''''

The data center software accesses the database in various functions.
These are mapped to profiles which correspond to access information
(basically, the DSN, user, and password).
There are three of them:

* feed -- the "admin" profile, used for feeding tables into the normal
  database, for user management, credentials checking and the like.
* trustedquery -- this profile is used for queries generated by the DC
  software (though usually on behalf of a user).  The corresponding DB
  role can access all "normal" tables; privilege management is supposed
  to happen through the web interface.
* untrustedquery -- the profile used for user-contributed SQL.  Only
  tables expressly opened up are accessible to it.

You can adapt all those names, but you would then need to change the
default configuration files.

The following procedure sets up users and databases as expected by the
default profiles::

  # create the database that'll hold your data
  sudo -u postgres createdb --encoding=UTF-8 gavo
  # create the user that feeds the db...
  sudo -u postgres createuser -P -ADSr gavoadmin
  # and a user that usually has no write privileges
  sudo -u postgres createuser -P -ADSR gavo
  # and a user for "scratch tables" writable from the world
  sudo -u postgres createuser -P -ADSR untrusted

Enter the passwords you assign here into the ``feed``, ``trustedquery``,
and ``untrustedquery`` profiles, respectively.  These profiles are found
in ``/var/gavo/etc``.

Note that running a database server is always a security liability.  You
should make sure you understand what the pg_hba.conf (in postgres'
configuration directory) says.  As a minimum, you should have a line
like::

  local gavo gavo,gavoadmin,untrusted md5

in there; it allows password authentication for the three users above
from the local machine.  If you have two machines sitting on a reasonably
trusted local network, you could say something like::

  host gavo gavo,gavoadmin,untrusted xxx.xxx.xxx.0/24 md5

(where the x's must be replaced with your network number).  If you insist
on having the data between DaCHS and the postgres server go through
untrusted lines, see the postgres docs on how to set up an SSL
connection.

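
If you are unsure what to put in place of the x's in the ``host`` line,
the network can be derived from the address of one of the machines.  The
following is a minimal Python sketch and not part of DaCHS; the function
name ``hba_host_line`` and the sample address 192.168.1.17 are made up
for illustration, and the defaults merely mirror the database and role
names used above::

  import ipaddress


  def hba_host_line(address, prefix_len=24, database="gavo",
                    roles=("gavo", "gavoadmin", "untrusted"), method="md5"):
      """Return a pg_hba.conf host line for the network containing address.

      Hypothetical helper: address is the IPv4 address of any machine on
      the trusted network, prefix_len the length of the network prefix.
      """
      # strict=False lets us pass a host address; the host bits are masked.
      network = ipaddress.ip_network(
          "%s/%d" % (address, prefix_len), strict=False)
      return "host %s %s %s %s" % (
          database, ",".join(roles), network, method)


  if __name__ == "__main__":
      print(hba_host_line("192.168.1.17"))

For the sample address, this prints
``host gavo gavo,gavoadmin,untrusted 192.168.1.0/24 md5``.
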

Finally, you need to let the various roles you just created access the
database; you do this using the command line interface to postgres::

  sudo -u postgres psql gavo
  GRANT ALL ON DATABASE gavo TO gavoadmin;

For the individual tables, rights to gavo and untrusted are granted by
gavoimp, so you do not need to specify any rights for them.

Configuration
=============

You can adapt the gavo environment through a configuration file.  It is
located by default in ``/etc/gavo.rc``.  You can change that location
using the GAVOSETTINGS environment variable (or ``defaultSettingsPath``
in ``config.py``, though that will break on updates).

You can override global settings in your private settings file,
``~/.gavorc`` (you can override that location through GAVOCUSTOM).

Gavorcs have a roughly INI-style format with sections in square brackets
and keys below them.  To override a setting, you need to have the same
key in the same section.

We will first provide an informal walkthrough of the various
configuration sections and give a reference below.

Walkthrough
-----------

The general Section
'''''''''''''''''''

This mainly sets paths.  The most important is ``rootDir``, a directory
most other paths are relative to.  This is the one you'll most likely
want to change.  If you, e.g., wanted to have a private gavo tree, you
could put ::

  [general]
  rootDir: /home/user/gavo

into your ~/.gavorc.

The other paths in this section are relative to rootDir, but absolute
paths are legal, too.  You may want to set tempDir and cacheDir to a
directory local to your machine if rootDir is mounted via a network.
Also note that we do no synchronization for writing to the log (and never
will -- we will provide syslog based logging if necessary), so you may
want to tweak this too to keep actions from separate users separate.

The db Section
''''''''''''''

In the db section, some global properties of the database access layer
are defined.

Currently, the most relevant one is profilePath.
This is a colon-separated list of rootDir-relative paths in which we look
for database profiles (expansion of home directories is supported).  The
first match in any of these directories wins.  This is useful when you
have a test setup and a production setup -- just say ``include dsn`` in
the common profiles (by default in configDir) and have separate dsn files
in the ~/.gavo directories of the accounts feeding the test and
production databases.

In addition, you can supply a comma-separated list of database roles in
maintainers.  These are roles that get gavoadmin-like privileges on all
tables created by the system.  This is provided so people working on the
system do not need to go through the gavoadmin account to do DB
housekeeping.

Note that when you change maintainers, you *must* keep gavoadmin in it
unless you know exactly what you are doing.  The DC software will revoke
all kinds of privileges from its own feeder if gavoadmin isn't part of
maintainers, which will be a mess to fix.

queryRoles and adqlRoles correspond to the roles defined during db setup.
Again, you can supply more than one such role, but you must at least give
those defined in the trustedquery and untrustedquery profiles.

The profiles Section
''''''''''''''''''''

The profile section maps profile names to file names.  These file names
are relative to any of the directories in db.profilePath.

The files contain a specification of the access to the database in
(unfortunately yet another, but simple) language.  Each line in such a
profile is either a comment (starting with #), an assignment (with "=")
or an instruction (consisting of a command and arguments, separated by
whitespace).

Keywords available for assignment are

* host -- the host the database resides on.  Leave empty for a Unix
  socket connection.
* port -- the port the database listens on.  Leave empty for default
  5432.
* database -- the database your tables live in.
* user -- the user through which the db is accessed.
* password -- the password of user.

Commands available are

* include -- read assignments and instructions from the profile given in
  the argument

A typical setup could look like this (all files located in configDir,
e.g., /var/gavo/etc):

A file dsn::

  host = computer.doma.in
  port = 5432
  database = gavo

A file feed, giving access to the admin account::

  include dsn
  user=gavoadmin
  password=mySecretPassword

A file trustedquery, giving access to one of the queryRoles::

  include dsn
  user=gavo
  password=anotherSecretPassword

A file untrustedquery, giving access to one of the adqlRoles::

  include dsn
  user=untrusted
  password=giveItAway

The Web Section
'''''''''''''''

* serverURL: the scheme and host under which the service is visible
  (like, http://vo.foo.org)
* nevowRoot: a serverURL-relative URL to the place the nevow service
  appears in the server.
* errorPage: debug or something else -- whether to show a traceback on
  the net or not.
* staticURL: a serverURL-relative URL to the static data (for querulator)

Reference
---------

You can get an up-to-date version of this by running ``gavoconfig``.

Section [general]
'''''''''''''''''

Paths and other general settings.

* cacheDir: path relative to rootDir; defaults to 'cache' -- Path to the
  DC's persistent scratch space
* configDir: path relative to rootDir; defaults to 'etc' -- Path to the
  DC's non-ini configuration (e.g., DB profiles)
* defaultProfileName: string; defaults to 'admin' -- Default profile name
  (used to construct system entities)
* gavoGroup: string; defaults to 'gavo' -- Name of the unix group that
  administers the DC
* inputsDir: path relative to rootDir; defaults to 'inputs' -- Path to
  the DC's data holdings
* logDir: path relative to rootDir; defaults to 'logs' -- Path to the
  DC's logs (should be local)
* logLevel: value from the list info, debug, warning, error; defaults to
  'info' -- Verboseness of importer
* operator: string; defaults to '' -- Mail address of the DC's
  operator(s).
* platform: string; defaults to '' -- Platform string (can be empty if
  inputsDir is only accessed by identical machines)
* rootDir: string; defaults to '/var/gavo' -- Path to the root of the DC
  file (all other paths may be relative to this)
* stateDir: path relative to rootDir; defaults to 'state' -- Path to the
  DC's state information (last imported,...)
* tempDir: path relative to rootDir; defaults to 'tmp' -- Path to the
  DC's scratch space (should be local)
* webDir: path relative to rootDir; defaults to 'web' -- Path to the DC's
  web related data (docs, css, js, templates...)

Section [adql]
''''''''''''''

Settings concerning the built-in ADQL core

* webDefaultLimit: integer; defaults to '2000' -- Default match limit for
  ADQL queries via the web form
* webTimeout: integer; defaults to '15' -- Default timeout for adql
  queries via the web form

Section [db]
''''''''''''

Settings concerning database access.

* adqlRoles: set of strings; defaults to 'untrusted' -- Name(s) of DB
  roles that get access to tables opened for ADQL
* defaultLimit: integer; defaults to '100' -- Default match limit for DB
  queries
* interface: string; defaults to 'psycopg2' -- Don't change
* maintainers: set of strings; defaults to 'gavoadmin' -- Name(s) of DB
  roles that should have full access to gavoimp-created tables by default
* msgEncoding: string; defaults to 'utf-8' -- Encoding of the messages
  coming from the database
* profilePath: shell-type path; defaults to ' ~/.gavo:$configDir' -- Path
  for locating DB profiles
* queryRoles: set of strings; defaults to 'gavo' -- Name(s) of DB roles
  that should be able to read gavoimp-created tables by default

Section [ivoa]
''''''''''''''

The interface to the Greater VO.

* authority: string; defaults to 'org.gavo.dc' -- the authority id for
  this DC
* dalDefaultLimit: integer; defaults to '10000' -- Default match limit on
  DAL queries
* registryIdentifier: string; defaults to
  'ivo://org.gavo.dc/static/registryrecs/registry.rr' -- The IVOAid for
  this DC's registry

Magic Section [profiles]
''''''''''''''''''''''''

Mapping of DC profiles to profile definitions.

The items in this section are all of type profile name.  You can add keys
as required.

* admin: profile name; A name of a file in [db]profilePath
* deploydb: profile name; A name of a file in [db]profilePath
* test: profile name; A name of a file in [db]profilePath
* trustedquery: profile name; A name of a file in [db]profilePath
* untrustedquery: profile name; A name of a file in [db]profilePath

Section [ui]
''''''''''''

Settings concerning the local user interface

* outputEncoding: string; defaults to 'iso-8859-1' -- Encoding for system
  messages.  This should match what your terminal emulator is set to

Section [web]
'''''''''''''

Settings related to serving content to the web.

* adminpasswd: string; defaults to '' -- Password for online
  administration, leave empty to disable
* adsMirror: string; defaults to 'http://ads.ari.uni-heidelberg.de' --
  Root URL of ADS mirror to be used
* bindAddress: string; defaults to '127.0.0.1' -- Interface to bind to
* enableTests: boolean; defaults to 'False' -- Enable test pages (don't
  if you don't know why)
* errorPage: string; defaults to 'debug' -- set to 'debug' for error
  pages with tracebacks, anything else for a less informative page
* favicon: path relative to webDir; defaults to 'None' -- Webdir-relative
  path to a favicon
* graphicMimes: list of strings; defaults to 'image/fits,image/jpeg' --
  MIME types considered as graphics (for SIAP, mostly)
* maxPreviewWidth: integer; defaults to '300' -- Hard limit for the width
  of previews (necessary because previews on protected items are free)
* nevowRoot: path fragment; defaults to '/' -- Path fragment to the
  server's root for operation off the server's root
* previewCache: path relative to webDir; defaults to 'previewcache' --
  Webdir-relative directory to store cached previews in
* serverPort: integer; defaults to '8080' -- Port to bind the server to
* serverURL: string; defaults to 'http://localhost:8080' -- URL fragment
  used to qualify relative URLs where necessary
* sitename: string; defaults to 'GAVO data center' -- A short name for
  your site
* sqlTimeout: integer; defaults to '15' -- Default timeout for db queries
  via the web
* templateDir: path relative to webDir; defaults to 'templates' --
  webDir-relative location of global nevow templates
* user: string; defaults to 'None' -- Run server as this user (leave
  empty to not change user)
* vanityNames: path relative to webDir; defaults to 'vanitynames.txt' --
  Webdir-relative path to the name map for vanity names
* voplotCodeBase: URL fragment relative to the server's root; defaults to
  'static/voplot/VOPlot' -- URL of the code base for VOPlot
* voplotEnable: boolean; defaults to 'False' -- Enable the VOPlot output
  format (requires some external software)
* voplotUserman: URL fragment relative to the server's root; defaults to
  'static/voplot/docs/VOPlot_UserGuide_1_4.html' -- URL to the
  documentation of VOPlot

Binaries
--------

For extended functionality, the system uses external binaries.  If your
GAVO_ROOT is accessible from more than one machine and the machines have
different architectures, this may be a problem, because one platform may
not be able to execute another platform's binaries.

To fix this, set platform in the DEFAULTS section of your config file.
You can then rename any platform-dependent executable ``base`` to
``base-<platform>``, and if on the respective platform, that binary will
be used.  This also works for computed resources using binaries, and
those parts of the DC software that build binaries (e.g., the booster
machinery) will automatically add the platform postfix.

If you build your own software, a make file like the following could be
helpful::

  # platform string as configured for the DC software
  PLATFORM=$(shell gavoconfig platform)
  TARGET=@@@binName@@@-$(PLATFORM)
  OBJECTS=@@@your object files@@@

  # link the platform-specific binary (recipe lines must start with a tab)
  $(TARGET): buildstamp-$(PLATFORM) $(OBJECTS)
          $(CC) -o $@ $(OBJECTS)

  # force a clean rebuild when the platform has changed
  buildstamp-$(PLATFORM):
          $(MAKE) clean
          rm -f buildstamp-*
          touch buildstamp-$(PLATFORM)

You'll have to fill in the @@@.*@@@ parts and probably write rules for
building the files in $(OBJECTS), but otherwise it's a simple hack to
make sure a make on the respective machine builds a suitable binary.

Specifying meta information fallbacks
=====================================

The file ``$DACHS/etc/defaultmeta.txt`` must exist.

TBD

Preparing the system tables
===========================

DaCHS uses some internal tables that need to be prepared before you can
start to ingest data.
You should now be able to run the following commands (in that order)::

  gavoimp --system __system__/dc_tables
  gavoimp --system __system__/services
  gavoimp --system __system__/users
  gavoimp --system __system__/products

The Portal
==========

The root URL of the data center should return a page that explains what
the site is all about and lists what's available.  The template for this
page is located in root.html in templateDir (which defaults to
GAVO_ROOT/web/templates and is defined in the web section of config).

A list of services in alphabetic order can be obtained using something
like ::