GUMS is a simulation of part of the GAIA result set.  A bit more info on
what's in the files is at
http://www.rssd.esa.int/wikiSI/index.php?title=GAP:GUMS10&instance=Gaia

The data set was obtained via rsync in 2012-02 directly from
gaia.esac.esa.int and was provided by William O'Mullane.

The files came in the GAIA-internal "gbin" format, which is a concatenation
of Java serialized strings containing zip archives of Java serialized
objects.  Hell knows why they did it like this.

There's python code that can read the original gbin format (see
`Parsing gbin directly`_), but that's too slow for the Milky Way data set.
After some experimentation with a C-based JSO parser I decided that writing
one powerful enough to robustly parse the input was too tedious; in
particular, there's simply no way around keeping all class information and
most objects in memory all the time.  That's what bin/makebooster.py and
everything in src are about, and none of that is used.

The data in the database is based on the data.txt.gz files present in all
data subdirectories.  These were generated locally; see
`Converting gbin files`_.

People one could ask about this mess include:

* Wil O'Mullane
* Xavi Luri (xluri@am.ub.es; he basically gave permission to publish)


Parsing gbin directly
=====================

For experiments and/or convenience, you can import directly from the gbin
files.  The corresponding data items in q.rd ("gbinimp_x") are commented out
right now.

Here's how to come up with the custom grammars (``res/*grammar.py``) for
these things:

bin/getschema.makePythonExpression (edit the main function to use it)
figures out the field sequence by inspecting what's in a given file (the
schema is different for most subdirectories of data/GUMS-10).  It then spits
out python source for mapping deserialized objects to rowdicts, which is all
that's variable in the custom grammars.
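
To illustrate the idea of generating rowdict-mapping source from an inspected
object, here is a minimal sketch.  It is not the actual getschema code: the
function names, the ``FakeSource`` stand-in class and its fields are all
hypothetical, and real deserialized gbin objects will look different::

  def infer_fields(obj):
      """Return the attribute names of obj in definition order."""
      return [name for name in vars(obj) if not name.startswith("_")]


  def make_python_expression(field_names, obj_name="obj"):
      """Return python source for a dict literal built from obj's fields."""
      pairs = ", ".join(
          "%r: %s.%s" % (name, obj_name, name) for name in field_names)
      return "{%s}" % pairs


  class FakeSource(object):
      # Hypothetical stand-in for one deserialized object.
      def __init__(self):
          self.sourceId = 42
          self.alpha = 1.5


  # Generate the mapping source once, then evaluate it per object.
  mapping_source = make_python_expression(infer_fields(FakeSource()))
  rowdict = eval(mapping_source, {"obj": FakeSource()})

The generated string is what would be pasted into a grammar; evaluating it
against an object yields one rowdict.
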
Incidentally, getschema.py has also been used to infer the table definitions
in q.rd, but there are probably better ways to do this using GAIA's Java
mess.


Converting gbin files
=====================

The code in gbindec uses gaiatools and friends to read the files.
Unfortunately, the GAIA support code is a mess that's hard to cut through.
Therefore, I've jarred together all class files in the vicinity of gaiatools
to gbindec/gaiaenv.jar (roughly 100 Megs, not in version control; let's hope
it stays where it is until all this can mercifully be forgotten).

The code actually doing the work is below gbindec/gaia.  Most interesting is
the stuff in gbindec/gaia/cu3/gbin2ascii/converter; this is the code that's
DM-specific.  It was generated using src/getschema.py, too; see the
makeAllConverters.sh shell script within that file.

To build the whole thing, say ``make`` in gbindec.

If you want to see the java classes that are serialized into the gbin
files, unjar gaiaenv.jar and check gaia/cu1/mdb/cu2/um; interesting classes
reside in dm/ (Root, PhotoRoot, UMAstroRoot) and umtypes/dm (rest).

Once the build is done, call ``gbindec/bin2ascii`` with a directory name as
its argument to dump the contents of all the gbin files within that
directory to a text file.  ``bin/convertAllToASCII.sh`` uses that to
traverse the whole data tree and create the data.txt.gz files.

The actual boosters were then built using::

  gavo mkboost -s '|' q sn > res/snbooster.c
  gavo mkboost -s '|' q quasars > res/quasarsbooster.c
  gavo mkboost -s '|' q galaxies > res/galaxiesbooster.c
  gavo mkboost -s '|' q mw > res/hostedbooster.c

plus a few manual fixes:

* variabilitytype can be NULL
* hasphotocentermotion has values true/false that need to be converted to
  1/0
* at the end, one needs a ``*strchr(curCont, ' ') = 0`` to terminate
  sourceextendedid

Since the column sequence in the RD matches the sequence within the
``data.txt.gz`` files, the boosters should otherwise work as generated.
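
The manual fixes amount to three per-row conversions.  The real fixes live in
the generated C boosters; the following is only a python sketch of the same
logic on one '|'-separated line.  The column order in ``COLUMNS`` is
hypothetical, and it assumes that a NULL variabilitytype shows up as an empty
field, which the actual dumps may not do::

  # Hypothetical column order; the real one follows the table in q.rd.
  COLUMNS = ["sourceextendedid", "variabilitytype", "hasphotocentermotion"]


  def fix_row(line, names=COLUMNS):
      """Split one '|'-separated line and apply the three manual fixes."""
      row = dict(zip(names, line.rstrip("\n").split("|")))
      # variabilitytype can be NULL (assumed here: an empty field).
      if row["variabilitytype"] == "":
          row["variabilitytype"] = None
      # hasphotocentermotion comes as true/false and is stored as 1/0.
      row["hasphotocentermotion"] = (
          1 if row["hasphotocentermotion"] == "true" else 0)
      # Cut sourceextendedid at the first blank, like the strchr fix.
      row["sourceextendedid"] = row["sourceextendedid"].split(" ", 1)[0]
      return row


  example = fix_row("abc123 trailing||true\n")

Unlike the C one-liner, the split leaves a value without any blank untouched
instead of dereferencing a NULL pointer.
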