{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook introduces a few VO techniques for use with python. You need astropy and pyvo installed to make this work. python3 is assumed. It is part of the pyvo course at http://docs.g-vo.org/pyvo, which probably will help a lot to understand what's going on here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our use case will be something like \"Find all time series of all bright AGB stars\", but the techniques introduced here have much wider applicability. Oh, and as of this writing, there are not too many time series in the VO, but we're working on this." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While there are ways to do this with pre-made clients, scripting this gives you great flexibility as well as the analysis capabilities of python. So, let's interface python with the VO. The most complete module to do that is pyvo. See https://pyvo.readthedocs.io/en/latest for more documentation. If you don't have it, try pip3 install pyvo.\n", "\n", "You also want TOPCAT. If you don't have that yet, this is probably not something you'd like to try – get some less nerdy VO exposure first." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import pyvo\n", "# the following calms down astropy's overzealous VOTable\n", "# parser\n", "import warnings\n", "warnings.filterwarnings('ignore', module=\"astropy.io.votable.*\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first step is: Find a list of bright Herbig-Haro objects. There are many ways to do that, but a good first step towards problems like this is typically to use SIMBAD. And we want powerful query modes (that perhaps we don't really need here, but they're definitely good to have), so we're looking for a TAP service." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since it's so much faster to discover Simbad's TAP service using TOPCAT's TAP window or registry interfaces like http://dc.g-vo.org/WIRR, we do that and find out that the TAP access URL is http://simbad.u-strasbg/simbad/sim-tap. Keep the table browser in TOPCAT open, as you will want to use it for query construction (not that you couldn't introspect table metadata from pyVO, but that interface is built for machines, not for humans)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First create an object representing the Simbad TAP service:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sim_tap = pyvo.dal.TAPService(\n", " \"http://simbad.u-strasbg.fr/simbad/sim-tap\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are analogous classes for other VO protocols (SIAP, SSA, SCS). They all have additional attributes allowing their manipulation and inspection. For a TAP service, your program might want to check table metadata. 
Here's an example looking for columns with magnitudes:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for table_name, table in sim_tap.tables.items():\n", "    for column in table.columns:\n", "        if column.ucd and column.ucd.startswith(\"phot.mag\"):\n", "            print(table_name, column.name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Regrettably, this isn't useful in this case; the real magnitudes in Simbad are given in the allfluxes table, and they don't have UCDs there because... well, I simply don't know. Try asking them; a contact address is given, for instance, in the Service tab in TOPCAT.\n", "\n", "Anyway, the TOPCAT table browser gets us on the right track (the allfluxes table). Also, use the Reference URL from the Service tab to investigate the object types and what to write in otype. Once you have a query (and of course it's a good idea to prototype it in TOPCAT):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "agbs = sim_tap.run_sync(\"\"\"\n", "select ra, dec, main_id\n", "from basic join allfluxes on (oidref=oid)\n", "where otype='AGB'\n", "and V<10\n", "\"\"\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What's coming back can be turned into an astropy table using the to_table() method:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "agbs.to_table()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's see if there are any time series for these out there. You could do an all-VO query using SSAP (and that's a good exercise; use servicetype=\"SSA\" in the registry query) -- SSAP is currently being used to publish time series, too. But my bets for the future are on obscore, so let's use that.\n", "\n", "Let's first develop a query on a single server. And let's use my own, http://dc.g-vo.org/tap" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What do we want to run? Well, check out the Obscore table structure, either in TOPCAT's table browser or even in the underlying standard (see http://ivoa.net/documents). You'll see that we want to constrain dataproduct_type to timeseries, and that we want to join s_ra and s_dec to the positions from Simbad, which we pass in as a table upload. Let's try things first with one service; also note how table uploads work in pyVO:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "timeseries = pyvo.dal.TAPService(\"http://dc.g-vo.org/tap\"\n", "    ).run_sync(\"\"\"\n", "    select\n", "        obs_collection, access_url, access_estsize,\n", "        t_min, t_max, em_min, em_max,\n", "        h.*\n", "    from tap_upload.agbs as h\n", "    join ivoa.obscore\n", "        on 1=contains(point('', h.ra, h.dec),\n", "            circle('', s_ra, s_dec, 1/3600.))\n", "    where dataproduct_type='timeseries'\n", "    \"\"\",\n", "    uploads={'agbs': agbs})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mainly because of generalised confusion, this query may run for some 10 seconds.\n", "\n", "In a few years, when everyone has TAP 1.1 and ADQL 2.1, you would certainly write the join condition as follows (which already works on this particular server):\n", "\n", "```\n", "ON 1./3600>DISTANCE(s_ra, s_dec, h.ra, h.dec)\n", "```\n", "\n", "But alas, as of this writing (2018), that would not yet work on many ObsTAP servers."
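, "\n", "\n", "And while we are at it: if a query threatens to run a lot longer than that, TAP also has an asynchronous mode; pyvo exposes it as the run_async method of TAPService. It could be used roughly like this (a sketch only; you do not need it for this particular query, and uploads work just as with run_sync):\n", "\n", "```\n", "# a sketch: run the query as an async (UWS) job; run_async submits the\n", "# job, waits for it to finish, and then fetches the result\n", "job_result = pyvo.dal.TAPService(\"http://dc.g-vo.org/tap\").run_async(\n", "    \"SELECT TOP 5 obs_collection FROM ivoa.obscore\")\n", "```"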
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's see what we have:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "timeseries.to_table()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can now load a time series and plot it, perhaps like this. I frankly don't know if there's a simple way to make astropy fetch a table from a remote URL, and I got tired looking for one, so I define a quick function to do that:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from astropy import table\n", "from urllib.request import urlopen\n", "from io import BytesIO\n", "def load_remote_table(url):\n", " if isinstance(url, bytes):\n", " url = url.decode(\"utf-8\")\n", " f = urlopen(url)\n", " return table.Table.read(\n", " BytesIO(f.read()))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# If the following fails for you, don't worry -- you have an outdated\n", "# pyvo, that's all. Ignore it and happily continue.\n", "ts = load_remote_table(\n", " timeseries.to_table()[0][\"access_url\"])\n", "plt.plot(ts[\"obs_time\"], ts[\"flux\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or we send the access URLs we've discovered to TOPCAT. Again, astropy's SAMP interface is quite clunky as of version 3, so let's define a couple of functions to make this more palatable (you don't need to understand everything that's happening in the next cell)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import contextlib, os, tempfile\n", "from astropy.vo.samp import SAMPIntegratedClient, SAMPProxyError\n", "\n", "\n", "def find_client(conn, samp_name):\n", " \"\"\"returns the SAMP id of the client with samp.name samp_name.\n", "\n", " This will raise a KeyError if the client is not on the hub.\n", " \"\"\"\n", " for client_id in conn.get_registered_clients():\n", " if conn.get_metadata(client_id).get(\"samp.name\")==samp_name:\n", " return client_id\n", " raise KeyError(samp_name)\n", "\n", "\n", "@contextlib.contextmanager\n", "def samp_accessible(astropy_table):\n", " \"\"\"a context manager making astropy_table available under a (file)\n", " URL for the controlled section.\n", "\n", " This is useful with uploads.\n", " \"\"\"\n", " handle, f_name = tempfile.mkstemp(suffix=\".xml\")\n", " with os.fdopen(handle, \"w\") as f:\n", " astropy_table.write(output=f,\n", " format=\"votable\")\n", " try:\n", " yield \"file://\"+f_name\n", " finally:\n", " os.unlink(f_name)\n", " \n", " \n", "def send_product_to(conn, dest_client_id, data_url, mtype, name=\"data\"):\n", " \"\"\"sends SAMP messages to load data.\n", "\n", " This is a helper for send_spectrum_to and send_image_to, which work\n", " exactly analogous to each other, except that the mtypes are different.\n", "\n", " If dest_client_id, this is a broadcast (and we don't wait for any\n", " responses). 
If dest_client_id is given, we wait for acknowledgement\n", "    by the receiver.\n", "    \"\"\"\n", "    message = {\n", "        \"samp.mtype\": mtype,\n", "        \"samp.params\": {\n", "            \"url\": data_url,\n", "            \"name\": name,\n", "        }}\n", "    if dest_client_id is None:\n", "        conn.notify_all(message)\n", "    else:\n", "        conn.call_and_wait(dest_client_id, message, \"10\")\n", "\n", "\n", "@contextlib.contextmanager\n", "def SAMP_conn(\n", "        client_name=\"pyvo client\",\n", "        description=\"A generic PyVO client\",\n", "        **kwargs):\n", "    \"\"\"a context manager to give the controlled block a SAMP connection.\n", "\n", "    The program will disconnect as the controlled block is exited.\n", "    \"\"\"\n", "    client = SAMPIntegratedClient(\n", "        name=client_name,\n", "        description=description,\n", "        **kwargs)\n", "    client.connect()\n", "    try:\n", "        yield client\n", "    finally:\n", "        client.disconnect()\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I told you the interface was clunky. But the reward is that SAMP is now quite simple:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "with SAMP_conn() as conn:\n", "    topcat_id = find_client(conn, 'topcat')\n", "    for match in timeseries:\n", "        access_url = match[\"access_url\"]\n", "        if isinstance(access_url, bytes):\n", "            # older pyvo returns bytes here, newer versions a str\n", "            access_url = access_url.decode(\"utf-8\")\n", "        send_product_to(conn,\n", "            topcat_id,\n", "            access_url,\n", "            \"table.load.votable\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You should now see the various time series popping up in TOPCAT, where you can investigate them as usual.\n", "\n", "Now it's your turn: Build a thing that does an all-VO obscore search for spectra – perhaps of these guys, or perhaps of something you are interested in.\n", "\n", "You'll need a few extra ingredients, though. First, here's how to discover the access URLs of all the TAP services out there that claim to support obscore (once you have those, you know how to query the services, right?):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for svc in pyvo.regsearch(datamodel='ObsCore'):\n", "    print(svc.access_url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When querying lots of external resources, it pays to expect failures. Let's define a function that runs TAP queries, well, resiliently:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import requests\n", "\n", "\n", "def run_sync_resilient(svc, *sync_args, **sync_kw_args):\n", "    try:\n", "        return svc.run_sync(*sync_args, **sync_kw_args)\n", "    except (\n", "            pyvo.dal.DALServiceError,\n", "            pyvo.dal.DALQueryError,\n", "            requests.ConnectionError) as ex:\n", "        print(\"{}: {}\".format(svc.baseurl, ex))\n", "        return\n", "    except KeyboardInterrupt:  # Let the user abort slow queries\n", "        return" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One more thing I should tell you to save you some poking around in documentation: How to merge the astropy tables coming back from different services. 
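Before we come to that, here is roughly what the discovery loop could look like once you put the two previous cells together -- a sketch only, with a placeholder query that you will want to replace by your own:\n\n```\n# rough skeleton (not a finished solution): run a placeholder ObsCore\n# query against every service found above and keep the non-empty results\nspectra_tables = []\nfor svc in pyvo.regsearch(datamodel='ObsCore'):\n    res = run_sync_resilient(\n        pyvo.dal.TAPService(svc.access_url),\n        \"SELECT TOP 10 obs_collection, access_url FROM ivoa.obscore\"\n        \" WHERE dataproduct_type='spectrum'\")\n    if res is not None and len(res.to_table()):\n        spectra_tables.append(res.to_table())\n```\n\nAs to the merging itself: 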
Here's a trivial example that should get you going:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "results = []\n", "for svc_url in [\n", "        \"http://vao.stsci.edu/CAOMTAP/TapService.aspx\",\n", "        \"http://dc.g-vo.org/tap\"]:\n", "    svc = pyvo.dal.TAPService(svc_url)\n", "    results.append(\n", "        svc.run_sync(\n", "            \"SELECT TOP 2 obs_collection, access_url FROM ivoa.obscore\"\n", "        ).to_table())\n", "merged = table.vstack(results)\n", "merged" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What remains to be done: Change the query above to your liking (at least add a TOP 10 or so, lest you be flooded with results when someone puts up a central archive of AGB spectra), iterate over the services, and then merge the results. To investigate them (by wavelength, time range, and so on), send the merged table to TOPCAT." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 2 }