Difference between revisions of "CharaParser"

From phenoscape
 
(6 intermediate revisions by 3 users not shown)
Line 1: Line 1:
CharaParser is a natural-language processing tool which analyzes the text of character and character state descriptions to produce a structured output. It was initially developed in [https://sites.google.com/site/biosemanticsproject/ the "Fine-Grained Semantic Markup of Descriptive Data for Knowledge Applications in Biodiversity Domains" project]. We are adapting it to generate proposals for ontological phenotype annotations. When it is ready, we plan to integrate it with [[Phenex]], so that data curators can take advantage of natural-language processing to accelerate their workflow.
+
CharaParser is a natural-language processing tool which analyzes the text of character and character state descriptions to produce a structured output. It was initially developed in [http://sites.google.com/site/biosemanticsproject/ the "Fine-Grained Semantic Markup of Descriptive Data for Knowledge Applications in Biodiversity Domains" project]. We are adapting it to generate proposals for ontological phenotype annotations. When it is ready, we plan to integrate it with [[Phenex]], so that data curators can take advantage of natural-language processing to accelerate their workflow.
  
A prototype of adapted CharaParser participated in the BioCreative 2012 Workshop with [[Phenex]]. The evaluation results showed that CharaParser was quite capable of identifying candidate entity and quality phrases (it outperformed biocurators by 20% in recall on average), but it had difficulty to translate candidate phrases to ontology terms.
 
  
 +
A prototype of adapted CharaParser participated in the BioCreative 2012 Workshop with [[Phenex]]. The evaluation results showed that CharaParser was quite capable of identifying candidate entity and quality phrases (it outperformed biocurators by 20% in recall on average), but it had difficulty translating candidate phrases to ontology terms. This is the area we are currently working on.
  
<!-- ==Current installation process==
 
  
==Phenex integration==
+
The Phenoscape customizations of CharaParser are available through an open source (MIT) license at [https://github.com/phenoscape GitHub] and the original code is on [http://sites.google.com/site/biosemanticsproject/project-source-code Google Code].
===Preliminary integration===
 
===Development plan===
 
--!>
 
  
 
[[Category:Software]]
 
[[Category:Software]]
 
[[Category:Curation]]
 
[[Category:Curation]]
 +
[[Category:NLP]]

Latest revision as of 20:34, 26 March 2013

CharaParser is a natural-language processing tool which analyzes the text of character and character state descriptions to produce a structured output. It was initially developed in the "Fine-Grained Semantic Markup of Descriptive Data for Knowledge Applications in Biodiversity Domains" project. We are adapting it to generate proposals for ontological phenotype annotations. When it is ready, we plan to integrate it with Phenex, so that data curators can take advantage of natural-language processing to accelerate their workflow.


A prototype of adapted CharaParser participated in the BioCreative 2012 Workshop with Phenex. The evaluation results showed that CharaParser was quite capable of identifying candidate entity and quality phrases (it outperformed biocurators by 20% in recall on average), but it had difficulty translating candidate phrases to ontology terms. This is the area we are currently working on.


The Phenoscape customizations of CharaParser are available through an open source (MIT) license at GitHub and the original code is on Google Code.