Difference between revisions of "CharaParser"
Todd Vision
Revision as of 20:32, 26 March 2013
CharaParser is a natural-language processing tool that analyzes the text of character and character-state descriptions to produce structured output. It was initially developed in the "Fine-Grained Semantic Markup of Descriptive Data for Knowledge Applications in Biodiversity Domains" project. We are adapting it to generate proposals for ontological phenotype annotations. Once it is ready, we plan to integrate it with Phenex so that data curators can use natural-language processing to accelerate their workflow.
A prototype of the adapted CharaParser participated, together with Phenex, in the BioCreative 2012 Workshop. The evaluation showed that CharaParser was quite capable of identifying candidate entity and quality phrases (it outperformed biocurators by 20% in recall on average), but it had difficulty translating candidate phrases into ontology terms. This is the area we are currently working on.
More information on the original CharaParser can be found at http://sites.google.com/site/biosemanticsproject/project-source-code, and the newer version of the code is available on GitHub at https://github.com/phenoscape.