Difference between revisions of "CharaParser"

From phenoscape
(Created page with "CharaParser is a natural-language processing tool which analyzes the text of character-state descriptions to produce a structured output used to generate proposals for ontolog...")
 
 
(8 intermediate revisions by 3 users not shown)
Line 1: Line 1:
CharaParser is a natural-language processing tool which analyzes the text of character-state descriptions to produce a structured output used to generate proposals for ontological phenotype annotations. We are working to enhance the performance of CharaParser and also to integrate it with Phenex, so that data curators can take advantage of natural-language processing to accelerate their workflow.
+
CharaParser is a natural-language processing tool which analyzes the text of character and character state descriptions to produce a structured output. It was initially developed in [http://sites.google.com/site/biosemanticsproject/ the "Fine-Grained Semantic Markup of Descriptive Data for Knowledge Applications in Biodiversity Domains" project]. We are adapting it to generate proposals for ontological phenotype annotations. When it is ready, we plan to integrate it with [[Phenex]], so that data curators can take advantage of natural-language processing to accelerate their workflow.
  
==Current installation process==
 
  
==Phenex integration==
+
A prototype of adapted CharaParser participated in the BioCreative 2012 Workshop with [[Phenex]]. The evaluation results showed that CharaParser was quite capable of identifying candidate entity and quality phrases (it outperformed biocurators by 20% in recall on average), but it had difficulty translating candidate phrases to ontology terms. This is the area we are currently working on.
===Preliminary integration===
+
 
===Development plan===
+
 
 +
The Phenoscape customizations of CharaParser are available through an open source (MIT) license at [https://github.com/phenoscape GitHub] and the original code is on [http://sites.google.com/site/biosemanticsproject/project-source-code Google Code].
  
 
[[Category:Software]]
 
[[Category:Software]]
 
[[Category:Curation]]
 
[[Category:Curation]]
 +
[[Category:NLP]]

Latest revision as of 20:34, 26 March 2013

CharaParser is a natural-language processing tool which analyzes the text of character and character state descriptions to produce a structured output. It was initially developed in the "Fine-Grained Semantic Markup of Descriptive Data for Knowledge Applications in Biodiversity Domains" project. We are adapting it to generate proposals for ontological phenotype annotations. When it is ready, we plan to integrate it with Phenex, so that data curators can take advantage of natural-language processing to accelerate their workflow.


A prototype of adapted CharaParser participated in the BioCreative 2012 Workshop with Phenex. The evaluation results showed that CharaParser was quite capable of identifying candidate entity and quality phrases (it outperformed biocurators by 20% in recall on average), but it had difficulty translating candidate phrases to ontology terms. This is the area we are currently working on.


The Phenoscape customizations of CharaParser are available through an open source (MIT) license at GitHub and the original code is on Google Code.