Log data of the GAVO DC.
We log
accesses to our services. For data protection, no IPs are stored. Instead,
we store user handles, which are IPs shrouded with a random key that
changes every month. Thus, counts of distinct hosts and the like are
not possible across months.
The acc_month column in the database reflects the shrouding key used. At
the end of a month, some requests will be counted towards the
next month. Since the statistics are somewhat bogus anyway, we
will not change this behaviour.
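The shrouding described above can be sketched as follows. This is only an illustration, not the code used at the DC: the choice of HMAC-SHA256, the 16-character handle length, and the names `get_month_key` and `make_handle` are all assumptions.

```python
import hashlib
import hmac
import secrets

# Hypothetical in-memory key store: one random key per (year, month).
_monthly_keys = {}


def get_month_key(year, month):
    """Return the random key for a month, creating it on first use."""
    if (year, month) not in _monthly_keys:
        _monthly_keys[(year, month)] = secrets.token_bytes(32)
    return _monthly_keys[(year, month)]


def make_handle(ip, year, month):
    """Derive a user handle from an IP with the key of the given month.

    The same IP yields the same handle within a month, but handles from
    different months cannot be matched because the key differs.
    """
    key = get_month_key(year, month)
    digest = hmac.new(key, ip.encode("ascii"), hashlib.sha256)
    return digest.hexdigest()[:16]
```

With such a scheme, distinct hosts can be counted within one month by counting distinct handles, while the IP itself never reaches the database.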
To filter out maintenance and robot accesses as well as let users opt
out of logging, the following measures are taken:
* Certain IPs of machines used for development and maintenance are excluded.
* Accesses with certain known robot signatures are not logged (starting
Feb 2008; we did not have access to user-agent information before).
* IPs that access robots.txt are ignored after that access.
* Certain known spider subnets are excluded.
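A minimal sketch of how the measures listed above could be combined into a single check. All addresses, subnets, and signatures are placeholders, and `should_log` is a hypothetical name; the lists actually used at the DC are not given here.

```python
import ipaddress

# Placeholder values for illustration only.
MAINTENANCE_IPS = {"192.0.2.10"}
SPIDER_SUBNETS = [ipaddress.ip_network("198.51.100.0/24")]
ROBOT_SIGNATURES = ("bot", "crawler", "spider")

# IPs that have requested robots.txt so far.
robots_txt_visitors = set()


def should_log(ip, path, user_agent):
    """Return True if an access should be written to the log."""
    # An IP is ignored only after its robots.txt access, so remember
    # whether it was flagged before this request.
    already_flagged = ip in robots_txt_visitors
    if path == "/robots.txt":
        robots_txt_visitors.add(ip)
    if already_flagged or ip in MAINTENANCE_IPS:
        return False
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in SPIDER_SUBNETS):
        return False
    agent = (user_agent or "").lower()
    return not any(sig in agent for sig in ROBOT_SIGNATURES)
```

Because the check needs the plain IP, it would have to run before the IP is shrouded into a handle.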
The identification of the service used currently is somewhat crude.
If the request is for one of the renderers in {form, custom, siap.xml,
scs.xml, soap}, the service is taken to be the first element in the URI.
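The heuristic might look roughly like this. The sketch assumes that the renderer is the last path segment of the request; `guess_service` is a hypothetical name.

```python
RENDERERS = {"form", "custom", "siap.xml", "scs.xml", "soap"}


def guess_service(uri):
    """Return the first path element if the URI ends in a known renderer.

    Returns None when the request does not look like a service access.
    """
    path = uri.split("?", 1)[0]
    segments = [seg for seg in path.split("/") if seg]
    # Assumption: at least one element must precede the renderer name.
    if len(segments) >= 2 and segments[-1] in RENDERERS:
        return segments[0]
    return None
```

Under these assumptions, `guess_service("/logs/logs/form?year=2008")` yields `"logs"`, while a request for a static file yields `None`.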
Similarly, it is not trivial to tell accesses that actually retrieve
data from those that merely retrieve a form. A good measure for
data-retrieving queries is the number of accesses using the POST method.
However, within the DC, such queries may be sent using GET as well, and
only a close inspection of the query itself, together with knowledge of
the service queried, could identify them as retrieving data. This is not
done here.
]]>
2008-08-18T16:00:00
virtual-observatories
with base.getAdminConn() as conn:
    conn.execute("vacuum analyze \schema.accesses")
execDef.spawnPython("bin/addgaiareports.py")
if execDef.spawn(["gavo", "imp", "logs/logs"]) == 0:
    # if all went well, remove files just imported
    stashPath = rd.getAbsPath("delete")
    for fName in os.listdir(stashPath):
        os.unlink(os.path.join(stashPath, fName))
return 1
execDef.spawnPython("bin/pseudonymize.py")
uri text_pattern_ops
Import this data to create the accesses
table. It can also be used to ingest externally
computed log data.
Import this data to update the accesses
table from nevow logs rotated out.
GAIA access summaries
SELECT acc_year+(acc_month-1)/12. as monspec,
count(*) AS hosts,
sum(hostAccs) AS accesses
FROM (
SELECT acc_month, acc_year, handle,
count(handle) as hostAccs
FROM weblogs.accesses %s
GROUP BY acc_month, acc_year, handle) AS q
GROUP BY acc_month, acc_year
ORDER BY acc_year, acc_month
dc_stats
Access statistics for the GAVO data centre.
This service lets you query for
service usage on the GAVO data centre. See the service info on what data
is exposed here.
No field is required -- leave out year and month for all available
data, leave service blank for all recorded accesses.
Endpoint for GAIA log gatherer
logs_for_gaiastats/form
self.assertHasStrings("#Hosts", "td>3</td>")
/__system__/adql/query/form
self.assertHasStrings("permission denied for schema weblogs",
"Result link")