
Needs Analysis Workshop/Summary

Background

The PhenoScape Needs Analysis Workshop was held September 17-18, 2007 at NESCent. We invited nine scientists from the fields of morphology, development, evolution, genetics, and ichthyology to help us define which informatics tools are needed to enable synthetic research that takes full advantage of the data accumulated in each of these fields. We hoped to identify driving biological questions and use cases to guide the development of our software tools and database.

Agenda

After an introduction to the project goals, each participant gave a brief chalk talk outlining their integrative research problems. The participants then split into two break-out groups, one focused on the developmental genetics of morphology and the other on the evolution of morphology. At the end of the day the groups reconvened to discuss the research questions that arose in the break-out groups.

The second day began with three break-out groups: correlations, phylogeny, and semantics. As before, the whole group then discussed the use cases arising in the break-out groups and tried to identify priorities for the project.

Driving questions/use cases

The following is a bullet-point summary highlighting both research questions and software use cases produced by the discussion. Minutes from the relevant session are linked next to each point.

Requirements arising from the use cases

Overview

The use cases above present requirements for both our scientific framework/data model and our software tools (curation tools and web application). A good overview can also be found here.

Web application

The requirements primarily address capabilities of the web application that serves as the front end to our data collection. In many cases they do not differ significantly from capabilities we were already planning: using filters to restrict the data fed into a view, providing link-outs to other genomic or museum databases, integrating data from those databases in useful ways, letting users save and revisit favorite queries, and exporting data to their local computers.

Some of the proposed views are more challenging but doable. A phylogenetic tree view is not a new idea, but the participants wanted some advanced capabilities from it. Displaying character state changes on the tree requires both graphics code to draw them and a reconstruction of ancestral states. Because there are several ways to reconstruct ancestral states, we might implement a basic method and let users download the data for more complex analyses. All reconstruction methods require the data to be in character matrix form (more on that below). The further capabilities requested for the tree view would amount to an almost complete implementation of the MacClade/Mesquite tree-editing features within a web page (presumably via JavaScript). This would be a significant project, although probably quite useful to the community, and some beginnings exist in other projects on the web.
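As one illustration of what a "basic method" could look like, here is a minimal sketch of Fitch parsimony, a standard way to reconstruct ancestral states for a single unordered discrete character on a rooted binary tree. Choosing Fitch parsimony is an assumption for illustration, not a decision recorded at the workshop, and the tree encoding (nested tuples of taxon names) and function name `fitch` are hypothetical. The sketch counts the minimum number of state changes and assigns one most-parsimonious state to every node, which is the information a tree view would need to draw changes on branches.

```python
from typing import Dict, Optional, Set, Tuple, Union

# Hypothetical encoding of a rooted binary tree:
# a leaf is a taxon name, an internal node is a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[int, Dict[Tree, str]]:
    """Fitch parsimony for one discrete character on a rooted binary tree.

    Returns the minimum number of state changes and one most-parsimonious
    assignment of states to every node (ties broken alphabetically).
    """
    candidates: Dict[Tree, Set[str]] = {}

    def down(node: Tree) -> int:
        # Bottom-up pass: build candidate state sets and count required changes.
        if isinstance(node, str):
            candidates[node] = {states[node]}
            return 0
        left, right = node
        changes = down(left) + down(right)
        shared = candidates[left] & candidates[right]
        if shared:
            candidates[node] = shared
        else:
            # No state in common: take the union and pay for one change.
            candidates[node] = candidates[left] | candidates[right]
            changes += 1
        return changes

    assigned: Dict[Tree, str] = {}

    def up(node: Tree, parent_state: Optional[str]) -> None:
        # Top-down pass: keep the parent's state when allowed, else pick one.
        options = candidates[node]
        state = parent_state if parent_state in options else min(options)
        assigned[node] = state
        if not isinstance(node, str):
            for child in node:
                up(child, state)

    total = down(tree)
    up(tree, None)
    return total, assigned


# Toy example: four hypothetical taxa scored for one presence/absence character.
tree = ((("A", "B"), "C"), "D")
observed = {"A": "present", "B": "present", "C": "absent", "D": "absent"}
changes, reconstruction = fitch(tree, observed)
print(changes)                             # 1
print(reconstruction[(("A", "B"), "C")])   # absent
print(reconstruction[("A", "B")])          # present
```

Ties are broken alphabetically here only to keep the output deterministic; a real tree view would probably need to indicate equally parsimonious alternatives, or offer a model-based method for users who need one.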

The body plan view will require annotating a complex image with various ontology terms. The annotation could be fairly coarse, or as complex and fine-grained as we want to make it.
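One simple way to represent such annotations is to tag regions of the image with ontology term IDs and look up which terms lie under a clicked point. The sketch assumes rectangular regions and uses placeholder IDs (`EX:0001` and so on) rather than real ontology terms; the `Region` class and `terms_at` function are hypothetical. Nesting a small region inside a larger one is how the same image can support both coarse and fine-grained annotation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Region:
    """A rectangular area of a body-plan image tagged with one ontology term."""
    term_id: str                    # placeholder ID, not a real ontology term
    label: str
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels

    def contains(self, x: int, y: int) -> bool:
        x_min, y_min, x_max, y_max = self.box
        return x_min <= x <= x_max and y_min <= y <= y_max

    def area(self) -> int:
        x_min, y_min, x_max, y_max = self.box
        return (x_max - x_min) * (y_max - y_min)


def terms_at(regions: List[Region], x: int, y: int) -> List[str]:
    """Term IDs of every region under a point, most specific (smallest) first."""
    hits = [r for r in regions if r.contains(x, y)]
    return [r.term_id for r in sorted(hits, key=Region.area)]


# Coarse and fine-grained regions coexist by nesting boxes.
regions = [
    Region("EX:0001", "whole body", (0, 0, 400, 100)),
    Region("EX:0002", "head", (0, 0, 90, 100)),
    Region("EX:0003", "pectoral fin", (100, 40, 140, 80)),
]
print(terms_at(regions, 120, 60))  # ['EX:0003', 'EX:0001']
print(terms_at(regions, 50, 50))   # ['EX:0002', 'EX:0001']
```

Polygons or image masks would fit real anatomy better than rectangles; the rectangle is only the simplest shape that shows the idea.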

Data model

Matrices displaying covariation or co-occurrence of phenotypic traits with other traits (or other data) will in many cases require ancestral state reconstruction, already mentioned above as a desirable analytic view of the data. This requires a tree, a reconstruction algorithm, and data in the proper format. Although our data describe character diversity, the annotations must be processed to convert them into evolutionary character matrix form. This processing relies on distinguishing between attributes and values in the PATO ontology and following a particular algorithm. Since this process is central to many of the forms of data analysis that came up in the meeting, I think we need to look more carefully at how we want matrix generation to work and what is possible with our annotation strategy and ontologies. Some of this is touched on in EQ for character matrices, but a more complete description of the current status of matrix generation will be forthcoming.
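To make the attribute/value distinction concrete, here is a minimal sketch of one possible conversion, not the project's actual algorithm (which the text above says is still to be described). It assumes each annotation is a (taxon, entity, quality value) triple and that each PATO value can be mapped to its parent attribute; that mapping is hard-coded here as a hypothetical stand-in for querying the ontology. Each (entity, attribute) pair becomes a character, the values become its states, and unscored taxa are coded `?`. The taxon names are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical stand-in for the PATO hierarchy: each quality value maps to the
# attribute it falls under. A real implementation would query the ontology.
VALUE_TO_ATTRIBUTE = {
    "round": "shape",
    "triangular": "shape",
    "present": "presence",
    "absent": "presence",
}

Annotation = Tuple[str, str, str]  # (taxon, entity, quality value)
Character = Tuple[str, str]        # (entity, attribute)


def build_matrix(annotations: List[Annotation]) -> Tuple[List[Character], Dict[str, str]]:
    """Group EQ annotations into characters and code each taxon's states."""
    cells: Dict[Character, Dict[str, str]] = defaultdict(dict)
    for taxon, entity, value in annotations:
        character = (entity, VALUE_TO_ATTRIBUTE[value])
        previous = cells[character].get(taxon)
        if previous is not None and previous != value:
            # Simplification: polymorphic taxa are rejected rather than coded.
            raise ValueError(f"{taxon}: conflicting states {previous!r}, {value!r} for {character}")
        cells[character][taxon] = value

    characters = sorted(cells)
    # Number each character's states alphabetically so codes are reproducible.
    codes = {
        c: {state: str(i) for i, state in enumerate(sorted(set(cells[c].values())))}
        for c in characters
    }
    taxa = sorted({taxon for taxon, _, _ in annotations})
    rows = {
        taxon: "".join(codes[c][cells[c][taxon]] if taxon in cells[c] else "?" for c in characters)
        for taxon in taxa
    }
    return characters, rows


annotations = [
    ("species_A", "dorsal fin", "triangular"),
    ("species_A", "pelvic fin", "present"),
    ("species_B", "pelvic fin", "absent"),
    ("species_C", "dorsal fin", "round"),
    ("species_C", "pelvic fin", "present"),
]
characters, rows = build_matrix(annotations)
print(characters)  # [('dorsal fin', 'shape'), ('pelvic fin', 'presence')]
print(rows)        # {'species_A': '11', 'species_B': '?0', 'species_C': '01'}
```

Even this toy version surfaces the open questions mentioned above: how to treat polymorphism, whether a missing annotation means "unknown" or "inapplicable", and how to order states.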

The other topic that could affect our data model is handling genetic diversity alongside phenotypic diversity. This was driven primarily by David Stern’s interests. Currently we are cataloging phenotypic diversity and linking it to mutant phenotypes in a single species for which we have genetic data (zebrafish). In David’s research, genetic changes are known in multiple species, and a causal link can sometimes be demonstrated between specific evolutionary genetic changes and phenotypic evolution. Our system as currently proposed neither accommodates these data nor provides a way to analyze them.
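For discussion, here is a minimal sketch of how the data model might record such links, assuming a simple record-based design. The classes `Phenotype`, `GeneticChange`, and `CausalLink`, the `changes_behind` query, and all taxon and gene names are hypothetical and not part of the current system. The key difference from the zebrafish-only model is that a genetic change carries its own taxon, so changes from several species can point at comparable phenotypes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Phenotype:
    taxon: str
    entity: str    # anatomy term label
    quality: str   # PATO quality label


@dataclass(frozen=True)
class GeneticChange:
    taxon: str     # any species, not only the genetic model organism
    gene: str
    description: str


@dataclass(frozen=True)
class CausalLink:
    """Asserts that a genetic change in one taxon underlies a phenotype."""
    change: GeneticChange
    phenotype: Phenotype
    evidence: str  # hypothetical free-text evidence field


def changes_behind(links: List[CausalLink], entity: str) -> List[GeneticChange]:
    """All genetic changes, across species, linked to phenotypes of one entity."""
    return [link.change for link in links if link.phenotype.entity == entity]


links = [
    CausalLink(GeneticChange("species_A", "gene_x", "regulatory substitution"),
               Phenotype("species_A", "pelvic fin", "absent"), "hypothetical"),
    CausalLink(GeneticChange("species_B", "gene_x", "coding deletion"),
               Phenotype("species_B", "pelvic fin", "absent"), "hypothetical"),
    CausalLink(GeneticChange("species_C", "gene_y", "duplication"),
               Phenotype("species_C", "dorsal fin", "round"), "hypothetical"),
]
print([c.taxon for c in changes_behind(links, "pelvic fin")])  # ['species_A', 'species_B']
```

A query like this is where analysis could start, for example asking whether the same gene is repeatedly involved when the same structure changes in different lineages.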