Geeking out.
Sep. 4th, 2014 10:38 pm
Things I want to play with: PANDAS. Time series data. RESTful interfaces. Machine learning algorithms.
To that end, I downloaded the WADL spec for the data coming from ISO-NE.com. It's complete. There's no point meddling with WADL here, since the service is read-only for yours truly. The important part of the WADL spec is the <grammars> section, which is an XML Schema.
That schema is complete, so I need something to parse it. I just tried generateDS.py, which almost does everything I need. The first failing is that while it generates Python objects from the XML data I receive, and the objects all have the right attribute names and values, it does not retain information about attribute types. I specifically need to know which attributes are unique IDs (location IDs, mostly) so that I can make them columns in the PANDAS files I'm generating. The other failing is that it can only parse XML inputs; the JSON that ISO-NE also provides, I cannot parse yet.
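To make the goal concrete, here is a minimal sketch of the shape I'm after: parse an XML payload by hand and pivot the observations so each location ID becomes a column. The element and attribute names below are invented for illustration, not the actual ISO-NE schema.

```python
# Sketch: turn per-location XML observations into a time-indexed pandas
# frame with one column per location ID. Names are hypothetical.
import xml.etree.ElementTree as ET

import pandas as pd

xml_payload = """
<Observations>
  <Observation locationId="4001" beginDate="2014-09-04T10:00:00" lmp="31.5"/>
  <Observation locationId="4002" beginDate="2014-09-04T10:00:00" lmp="29.8"/>
  <Observation locationId="4001" beginDate="2014-09-04T11:00:00" lmp="33.1"/>
  <Observation locationId="4002" beginDate="2014-09-04T11:00:00" lmp="30.4"/>
</Observations>
"""

root = ET.fromstring(xml_payload)
rows = [
    {
        "locationId": obs.get("locationId"),
        "time": pd.Timestamp(obs.get("beginDate")),
        "lmp": float(obs.get("lmp")),
    }
    for obs in root.iter("Observation")
]

# Pivot: rows are timestamps, columns are location IDs -- exactly the
# reshaping that needs the type information generateDS.py throws away,
# since you have to know which attribute is the ID to pivot on it.
frame = pd.DataFrame(rows).pivot(index="time", columns="locationId", values="lmp")
print(frame)
```

The point of the pivot is that "which attribute is the unique ID" has to come from somewhere; the schema knows, but the generated objects don't say.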
no subject
Date: 2014-09-13 08:55 pm (UTC)
Pandas rocks all socks.
Check out ddlgenerator in PyPI - it's kind of a mess but might do what you need very easily. And then once you've shoved that data into an RDBMS with ddlgenerator, you can use ipython_sql to query it out into Pandas DataFrames.
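The round trip being suggested, minus ddlgenerator itself, looks roughly like this: load the data into an RDBMS, then query it straight into a pandas DataFrame. Here the table is created by hand in an in-memory SQLite database; ddlgenerator's job would be to infer that CREATE TABLE from the JSON. Table and column names are illustrative, not from the real ISO-NE feed.

```python
# Sketch of the RDBMS-to-DataFrame workflow using stdlib sqlite3.
# The schema here is hand-written; ddlgenerator would infer it.
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lmp (location_id TEXT, ts TEXT, price REAL)")
conn.executemany(
    "INSERT INTO lmp VALUES (?, ?, ?)",
    [
        ("4001", "2014-09-04T10:00:00", 31.5),
        ("4002", "2014-09-04T10:00:00", 29.8),
    ],
)

# Query straight back out into a DataFrame, parsing the timestamp column.
df = pd.read_sql_query("SELECT * FROM lmp", conn, parse_dates=["ts"])
print(df)
```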
no subject
Date: 2014-09-13 09:15 pm (UTC)
The data in question can be represented pretty intuitively as multisheeted spreadsheets.
Each sheet belongs to a UID'd entity (e.g. a power generator or a transmission junction), each column on the sheet a numerical observation belonging to it, and each row a point in time.
If ddlgenerator figures it out from the JSON output, that would pretty much rock.
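That "multisheeted spreadsheet" shape maps naturally onto pandas as a dict keyed by entity UID, each value a time-indexed DataFrame of numerical observations. The UIDs and column names below are invented for illustration.

```python
# Sketch: one "sheet" per UID'd entity (generator, junction, ...),
# rows are points in time, columns are numerical observations.
import pandas as pd

times = pd.to_datetime(["2014-09-04T10:00:00", "2014-09-04T11:00:00"])

sheets = {
    "GEN-4001": pd.DataFrame({"output_mw": [512.0, 498.5]}, index=times),
    "JCT-7710": pd.DataFrame({"flow_mw": [120.3, 117.9]}, index=times),
}

# Stacking the sheets into one frame with a (uid, time) MultiIndex keeps
# everything queryable at once; columns an entity lacks come out as NaN.
combined = pd.concat(sheets, names=["uid", "time"])
print(combined)
```

The dict form is handy for per-entity work; the concatenated MultiIndex form is handy when a query spans entities.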