Thursday, November 15, 2007

Harvesting Data for Conservation
Part 1: The Problem

Biodiversity conservation, as an "industry," has not only a common interest in but a critical need for improving a) the productivity and assessment of conservation activities and b) downstream aggregation and analysis across conservation activities. This need has been formally expressed. Yet its practical realization has thus far eluded us.

As I see it, the fundamental barrier to developing rich applications for managing conservation projects, as well as aggregation and analysis tools, is this: data model variability. For instance, whereas a field observation consists of a fundamental core of information (observer, observed species, date/time and location), meaningful observations almost always describe more than just this core, because the extra detail is what serves the purpose of the observation activity. For example, if we're trying to understand reproduction rates among migrating bird species, an observation record will document not only the date/time, location, observer and bird species, but also the fact that this is a nest observation, how many total eggs are in the nest, and how many of those appear to be intact. An observation management system that only allows users to capture the core attributes would be useless for almost all specific observation activities.
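To make the core-versus-extension split concrete, here is a minimal sketch in Python. The class name, field names, the `extra` dictionary and all sample values are hypothetical, chosen only to mirror the nest example above; they are not drawn from any existing system.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, Tuple


@dataclass
class Observation:
    # Core attributes named in the text: observer, species, date/time, location.
    observer: str
    species: str
    observed_at: datetime
    location: Tuple[float, float]  # (latitude, longitude); an assumed representation
    # Activity-specific attributes that fall outside the core (hypothetical container).
    extra: Dict[str, Any] = field(default_factory=dict)


def core_only(obs: Observation) -> Dict[str, Any]:
    """Return just the core fields, i.e. what a core-only system would keep."""
    return {
        "observer": obs.observer,
        "species": obs.species,
        "observed_at": obs.observed_at,
        "location": obs.location,
    }


# A made-up nest observation for a reproduction-rate study.
nest = Observation(
    observer="observer-1",
    species="Sterna paradisaea",
    observed_at=datetime(2007, 6, 3, 9, 30),
    location=(64.1, -21.9),
    extra={"is_nest": True, "eggs_total": 3, "eggs_intact": 2},
)

# Everything the study actually cares about is lost by a core-only system.
lost = set(nest.extra) - set(core_only(nest))
print(sorted(lost))  # ['eggs_intact', 'eggs_total', 'is_nest']
```

The point of the sketch is only that the attributes a core-only system drops are exactly the ones the study was set up to collect.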

The same is true for tracking and managing information on protected areas, stewardship activities (e.g. prescribed burns, reforestation) and other datasets critical to conservation. While there is a common core of attributes to describe these entities in conservation, users must be able to extend beyond this core in practical application.

Because of this variability in the data model, users end up pursuing one of two approaches to capture and manage their conservation datasets. The first and most prevalent option is to employ technologies that are completely generic (e.g. spreadsheets or simple databases). These systems meet immediate needs fairly well. However, the resulting datasets follow no shared standard and therefore cannot readily be aggregated with similar datasets. In this case, the needs of the data producers may be met, but data consumers are frustrated (see Mismatched Incentives).
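As a rough illustration of why such datasets resist aggregation, consider two spreadsheets that record the same kind of observation under column names each team invented for itself. The tables, column names and values below are entirely made up for the example.

```python
# Two hypothetical field spreadsheets, each with its own ad hoc column names.
team_a = [
    {"Observer": "A1", "Species": "Sterna paradisaea", "Date": "2007-06-03", "Eggs": 3},
]
team_b = [
    {"recorded_by": "B7", "taxon": "Sterna paradisaea", "obs_date": "03/06/2007", "egg_count": 2},
]


def shared_columns(*tables):
    """Return the column names present in every row of every table."""
    column_sets = [set(row) for table in tables for row in table]
    return set.intersection(*column_sets) if column_sets else set()


common = shared_columns(team_a, team_b)
print(common)  # set()
```

Both tables hold the same information, yet they share no column names, so any combined analysis first needs a hand-built mapping for each new source.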

Where aggregation of large datasets and/or specialized functionality is required, users pursue the other option: procuring the development of custom systems. Custom systems are, of course, developed at considerable cost and, because they are hard-bound to a static data model, they are suitable only for a single application or, at best, a similar class of applications. How unfortunate that our investments in conservation data management systems must be repeated for each new dataset. We can ill afford to enrich the functionality (e.g. usability, mapping, reporting, feeds, import/export, wizards) or performance of any given system, because that investment benefits only the users of one specific dataset. It is as if each dissertation, because of its unique content, required the development of a new word processor.

Neither the completely generic nor the custom approach supports our need for leveraged investment in rich data entry/management applications at the conservation activity level, or in aggregation and analysis tools operating on standardized datasets.



At 2:03 PM, Blogger frank said...

Hey Kristin. I couldn't agree more. We need minimum data models for the core conservation entities, including conservation projects, protected areas, and species and ecosystem occurrences. How do we get there?