In 2002, The Heinz Center released its report on ecosystem conditions in the United States, “The State of the Nation’s Ecosystems.” The authors declared the assessment incomplete, citing the lack of data collection, reporting, and systems infrastructure needed to sufficiently assess ecosystem condition. Yet the reality is that, between the efforts of academic researchers, local and national governments, and conservation organizations, an enormous amount of information deeply relevant to conservation is being collected, much of it already in digital form. However, variation in syntax (e.g., file format) and semantics (e.g., terminology) prevents practical aggregation and analysis of the collected data.
In my last post, I described a fundamental problem in conservation information systems: data model variability. Variation in the schemas of common entities such as observations, protected areas, conservation projects, and conservation activities frustrates our ability to provide rich data entry/management applications as well as aggregation and analysis capabilities, the very capabilities that would significantly inform assessments like "The State of the Nation's Ecosystems."
The solution is to develop a system that supports rich data entry, management, and reporting yet is independent of any specific data model. The system would treat the definition of entities such as observations or protected areas as data itself. Each of these entities has a core schema: the set of attributes that makes it what it is. A species observation, for instance, consists of an observer (person), a location, a date and time, and a species identifier. This core can then be augmented with observation attributes from a library (e.g., egg count, nest height, stratum, life stage). When needed, more advanced users can build their own attributes (solarization, acidity) and submit them to the common attribute library. Finally, an entity's core schema and a selected set of extended attributes can be combined into an extended entity schema. Extended entity schemas are designed for repeated use, potentially reflecting and enforcing a standard or protocol. The library, shared across the natural resources management community, can thus contain core and extended entity schemas along with their component attributes.
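To make the idea concrete, here is a minimal sketch of what "schema as data" might look like. All names (the core schema, the attribute library, `extend_schema`) are hypothetical illustrations, not the actual NatureServe implementation:

```python
# Hypothetical sketch: entity schemas treated as data, not code.
# The core observation schema holds the attributes that make an
# observation what it is.
CORE_OBSERVATION = {
    "name": "observation",
    "attributes": [
        {"id": "observer", "type": "person", "required": True},
        {"id": "location", "type": "point", "required": True},
        {"id": "datetime", "type": "datetime", "required": True},
        {"id": "species", "type": "species_id", "required": True},
    ],
}

# Entries from a shared attribute library that can extend the core.
ATTRIBUTE_LIBRARY = {
    "egg_count": {"type": "int", "label": "Egg count"},
    "nest_height": {"type": "float", "unit": "m", "label": "Nest height"},
    "life_stage": {"type": "enum",
                   "values": ["egg", "larva", "juvenile", "adult"]},
}

def extend_schema(core, attr_ids, library):
    """Combine a core schema with library attributes into an
    extended entity schema."""
    extended = {"name": core["name"],
                "attributes": list(core["attributes"])}
    for attr_id in attr_ids:
        entry = dict(library[attr_id])
        entry["id"] = attr_id
        entry["required"] = False  # extensions are optional by default
        extended["attributes"].append(entry)
    return extended

# An extended schema for a nest survey: the observation core plus
# two library attributes.
nest_survey = extend_schema(CORE_OBSERVATION,
                            ["egg_count", "nest_height"],
                            ATTRIBUTE_LIBRARY)
```

The point of the sketch is that an extended schema is just a composition of data structures, so it can be stored, shared, and versioned in a community library like any other dataset.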
For instance, a standard field protocol for surveying invasive species can be captured in an invasive-species observation schema and applied across numerous invasive species surveys. The schema can be extended to support surveys of specific invasives. (For instance, researchers found that hind leg length in the invasive Cane Toad was correlated with the geographic front edge of the invasion; "Hind Leg Length" would then be a key attribute of a Cane Toad survey.)
The data management application parses the entity schema and its component attributes as a definition of data types and behaviors, then provides rich data entry, management, mapping, reporting, and spatial and statistical analysis on the entered data. We can afford to invest heavily in developing this system precisely because its functionality is not specific to any given conservation data model. Attribute definitions, including labels, help text, and error messages, are localizable into other languages, a feature critical for global conservation. Thus ends the tyranny of the software engineer. No longer are users beholden to software developers to build custom applications with rich functionality around their data models, data models that can evolve with the needs of conservation and basic scientific understanding.
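A sketch of how the application layer stays data-model-agnostic: a single generic validator driven entirely by a schema definition, so no entity is ever hard-coded. The schema, type names, and `validate` function here are hypothetical illustrations:

```python
# Hypothetical sketch: schema-driven validation. The application never
# hard-codes a data model; it interprets whatever schema it is given.
TYPE_CHECKS = {
    "int": lambda v: isinstance(v, int),
    "float": lambda v: isinstance(v, (int, float)),
    "text": lambda v: isinstance(v, str),
}

def validate(record, schema):
    """Return a list of error messages for `record` against `schema`."""
    errors = []
    attrs = {a["id"]: a for a in schema["attributes"]}
    # Check that every required attribute is present.
    for attr in schema["attributes"]:
        if attr.get("required") and attr["id"] not in record:
            errors.append(f"missing required attribute: {attr['id']}")
    # Check that every supplied value matches its declared type.
    for key, value in record.items():
        attr = attrs.get(key)
        if attr is None:
            errors.append(f"unknown attribute: {key}")
        elif attr["type"] in TYPE_CHECKS and not TYPE_CHECKS[attr["type"]](value):
            errors.append(f"bad type for {key}: expected {attr['type']}")
    return errors

# A toy observation schema and a record with one bad value.
schema = {
    "name": "observation",
    "attributes": [
        {"id": "species", "type": "text", "required": True},
        {"id": "egg_count", "type": "int", "required": False},
    ],
}
errors = validate({"species": "Bufo marinus", "egg_count": "three"}, schema)
```

The same interpreter pattern extends naturally to generating entry forms, labels, and help text from the attribute definitions, which is what makes localization a property of the data rather than the code.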
The other win for conservation is the ability to aggregate and analyze the resulting datasets from conservation organizations, the academic research community, and even state agencies. When users populate datasets based on shared entity schemas and extended attribute definitions, those datasets are inherently standardized and available for rich analysis. For instance, species observation data can be harvested (using secure web services) and mapped across all species surveys based on their common core (observer, observed species, date/time, location). Even this basic map would constitute a major breakthrough for conservation. Analysis of invasive species observations, based on an invasive-species observation schema, would similarly bring insight into patterns of invasion. Population declines or migrations over time associated with climate change can be mapped and analyzed using surveys where climate change, per se, was not the primary focus. And while this approach would let conservation make use of an enormous wealth of basic observation data, the same concepts apply equally well to information about lands managed for conservation (protected areas), stewardship activities, conservation projects themselves, and so on.
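The aggregation step can be sketched as projecting heterogeneous survey datasets onto their shared core. The field names, survey data, and `harvest_core` function below are hypothetical illustrations of the idea:

```python
# Hypothetical sketch: harvesting heterogeneous observation datasets
# and aggregating them on the shared core fields.
CORE_FIELDS = ("observer", "species", "datetime", "location")

def harvest_core(datasets):
    """Project every record in every dataset onto the common core,
    discarding survey-specific extended attributes."""
    merged = []
    for records in datasets:
        for record in records:
            merged.append({f: record.get(f) for f in CORE_FIELDS})
    return merged

# Two surveys with different extended attributes but the same core.
cane_toad_survey = [
    {"observer": "A. Smith", "species": "Bufo marinus",
     "datetime": "2007-03-01T09:00", "location": (-16.9, 145.7),
     "hind_leg_length_mm": 84.0},
]
nest_survey = [
    {"observer": "B. Jones", "species": "Sturnus vulgaris",
     "datetime": "2007-04-12T07:30", "location": (45.4, -75.7),
     "egg_count": 4},
]
observations = harvest_core([cane_toad_survey, nest_survey])
```

Because every record now shares the same core fields, the combined set can be mapped or analyzed as one dataset, even though the underlying surveys followed different extended schemas.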
We developed such a system with a small team at NatureServe to support observations. The data entry/management/reporting tool is rich in functionality and yet supports any data model we could throw at it. Parks Canada is the first customer and is already excited about the ability to support users conducting specialized surveys within parks as well as those carrying out high-level analysis of observation data across parks.
There is nothing specific to conservation in this technology. Indeed, I see examples of related systems existing and emerging on the web.
Freebase is the closest I've seen yet to supporting what we need, but I'm not sure Metaweb is going where we need to go.
For instance, the support for combinations of attribute definitions into entity schema, corresponding to basic entities in conservation like observations and protected areas, is critical to support user-driven standards and protocols. We must have the ability to search and browse a community repository for core entity schema and their associated attributes. This open-source style resource would allow schema authors to post their submissions for use by the conservation community, solicit feedback, post modifications, and report on usage. In this way, subject-matter experts in various areas of conservation and biodiversity can directly share their expertise with the community in the form of widely-used entity and attribute definitions.
By separating the data model from the data management application functionality, we can provide conservation practitioners on the ground with powerful and usable tools to capture and manage their information. This same approach, to the extent we succeed in building a rich, common library of data model components, will enable unprecedented aggregation and analysis of similar, though not identical, data sets. The efficacy and efficiency of conservation at the project level can thus be improved as well as our overall understanding of the status and dynamics of nature.
Labels: Conservation IT, Ecoinformatics