I’m working on a Harte Lab project within the BiGC project. We’re doing analyses of datasets from as broad an array of sites as we can get, and working on making the analyses reusable for people who have data of their own. Therefore I’m paying more attention to formats for exchanging ecological data.

This is an unordered and untested collection of the open ecological data formats that look promising to me. One would like to group together the ones that could be transformed into each other automatically…

The ISA (Investigation/Study/Assay) format: subject matter ranges from genomics to ecology, but it is mostly genomics.

NetCDF, of venerable age, subject matter going from atmospherics to… anything spatial? The query/analysis/display program Ferret isn’t flashy or GUI-user-friendly, but is superbly flexible and attaches a little summary of what the query was to the nice plots it produces. Also, Ferret has been around long enough to run into, fix, and document many complications that I hadn’t thought of yet.

The Knowledge Network for Biocomplexity uses the Ecological Metadata Language (EML); this format satisfies, I am told, ESA and NSF (perhaps only eco-field NSF?) grant and publication requirements. EML claims to be automatically exportable to the CSDGM format. The Harte Lab project is using EML to describe our datasets.

IEEE helped develop EuroGEOSS, which brokers several different things: mapping standards, keywords/nomenclature, people, metadata. Looks like they’re attacking the differences between countries and the differences between disciplines with the same tools. Initial concentration on forestry, drought, and biodiversity. The goal is for the brokering/translation to be good enough to re-use models across boundaries. Ambitious!

The Data Curation for Excel Project is still scoping out its work; the plan is to make it easier to manage data properly, on the assumption that the data is stored and manipulated in Excel. They mention metadata, and it’s part of a larger NSF project.

Darwin Core: a library standard (Dublin Core) extended for museum/biodiversity use. “The Darwin Core standard was originally conceived to facilitate the discovery, retrieval, and integration of information about modern biological specimens, their spatiotemporal occurrence, and their supporting evidence housed in collections (physical or digital). The Darwin Core today is broader in scope and more versatile. It is meant to provide a stable standard reference for sharing information on biological diversity. As a glossary of terms, the Darwin Core is meant to provide stable semantic definitions with the goal of being maximally reusable in a variety of contexts.”
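To make the "glossary of terms" idea concrete for myself, here is a minimal sketch of what it might look like in practice: an ordinary flat table whose column headers are Darwin Core term names. The term names (occurrenceID, basisOfRecord, scientificName, eventDate, decimalLatitude, decimalLongitude) come from the standard; the records, the identifiers, and the helper names to_dwc_csv and from_dwc_csv are invented for illustration and are not part of any Darwin Core tool.

```python
import csv
import io

# Column headers are Darwin Core term names.
# The records below are made-up illustrations, not real observations.
DWC_FIELDS = [
    "occurrenceID",
    "basisOfRecord",
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
]

records = [
    {
        "occurrenceID": "example-0001",
        "basisOfRecord": "HumanObservation",
        "scientificName": "Lupinus argenteus",
        "eventDate": "2011-07-14",
        "decimalLatitude": "38.96",
        "decimalLongitude": "-106.99",
    },
    {
        "occurrenceID": "example-0002",
        "basisOfRecord": "PreservedSpecimen",
        "scientificName": "Erigeron speciosus",
        "eventDate": "2011-07-15",
        "decimalLatitude": "38.97",
        "decimalLongitude": "-106.98",
    },
]


def to_dwc_csv(rows):
    """Write rows as CSV text with Darwin Core term names as the header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=DWC_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def from_dwc_csv(text):
    """Read CSV text back into a list of plain dicts keyed by term name."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


csv_text = to_dwc_csv(records)
round_tripped = from_dwc_csv(csv_text)
```

Because the headers carry agreed meanings, anyone else's table that uses the same terms can in principle be stacked on top of this one without a column-by-column negotiation, which is the kind of automatic transformation I'm hoping for.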

There were two tracks devoted to this at AGU Fall ’11, too.

