Difference between revisions of "Curation workflow"

From phenoscape
(Term trackers)
Line 71: Line 71:
Guide to Character Annotation: [http://phenoscape.org/wiki/Guide_to_Character_Annotation]
Guide to Character Annotation: [http://phenoscape.org/wiki/Guide_to_Character_Annotation| Guide to Character Annotation]
BioPortal (search across all ontologies) - [http://bioportal.bioontology.org/]
BioPortal (search across all ontologies) - [http://bioportal.bioontology.org/]

Revision as of 01:19, 27 March 2012

Processing new publications

1. Add reference info to the Phenoscape II Curation Log. Use the log to keep track of curation status for publications (what’s been by who and when). 2. Add citation to the Phenoscape Endnote file (Phenoscape_pubs_A_papers.enl in Dropbox) 2a. Create PSPUB identifier (ask Jim or Wasila for explanation) 2b. In Phenex file for the paper, record the PSPUB identifier in the Dataset Tab’s Publication field 3. Upload PDF of paper to the Mendeley Vertebrate Curation Group

Phenex files are now located in DropBox Make sure you are using Phenex version 1.2 or later, which has the autosave feature and will automatically save to files in DropBox - no need for SVN or ZigVersion!

Phenex and Mesquite files locations in DropBox: These files are located in the “Phenoscape-data” folder within DropBox “incomplete-files” folder contains the working Mesquite (.nex) and Phenex (.xml) files before they are uploaded to the KB Paul and Nizar’s files are in the “Sereno” folder

Work Study instructions for creating matrices using Mesquite and Phenex

1. Gather archosaur publications (with matrices) for curation Add citation to Phenoscape II Curation Log in Google Docs Add citation and abstract info to Endote file “Phenoscape_pubs_full_list.enl” located in the “Publications” folder (in the “Data” folder in DropBox). Once the paper is ready to load to KB, move this publication citation to “Phenoscape_pubs_A_list.enl”. Italicize species names in title and abstract [work study student] Create PSPUB identifier for publication. Ask Jim or Wasila how to do this. Add PDFs to Mendeley obtain PDF online or, if not available, scan a hardcopy and create OCR using Adobe Acrobat [work study student]

2. Create matrix files for all pubs [Much of this work to be done by work study students!] Ask authors for electronic versions of matrices or have work study student manually put matrix into Mesquite (intermediate steps involving copy/paste into Excel or Text Wrangler) Manual entry of matrix by student from PDF: Copy and paste matrix from PDF to Word; replace polymorphisms with “?” because Phenex doesn’t handle polymorphism yet make sure all taxa are listed to the right (?), and that character states are in a single continuous line for each taxon Find/replace each state with a state + space (e.g., find “0” and replace with “0 “ copy/paste matrix to Excel Click on “text to columns”, then click “delimited”, click on “space” delimiters, and finish! Open the space delimited file in Mesquite. Determine if matrix contains higher level taxa and expand matrix (if needed) for actual species examined (do this step in Mesquite because it can’t be done easily in Phenex) Copy and paste character and state free text, taxa from matrix [work study student] Save matrix in Mesquite in NEXUS (.nex) format and import into Phenex; save Phenex file (NEXml format, .xml) Save the file in “incomplete-files” folder (“Sereno” folder) in Dropbox

3. Import the matrix file (.nex) into Phenex Go to File>Import. Verify that characters, states, taxa, and matrix have been imported Save the file to “incomplete_files” folder (“Sereno” folder) and change file extension to “.xml” Note: ask Jim for help if the import doesn’t work. You probably need to a space between a taxon name and matrix if there isn’t one for any taxon

4. Taxon list update in Phenex see notes on wiki: http://phenoscape.org/wiki/Update_Taxon_Lists Taxon list update - choose VTO taxon; not sure about the rest of workflow; will it involve requesting taxa from PaleoDB? Could be done by work study student; They will leave Valid Taxon blank where there is no exact match to VTO term.

Proofreading [by workstudy student] - matrices [OCR mistakes, student transcription mistakes] - character/state free text for OCR mistakes

Phenex Ontology Configuration

(full details here: http://phenoscape.org/wiki/Phenex#Ontology_configuration) Phenex comes pre-configured with the ontologies used for Phenoscape, but may need to be configured to load specific anatomy ontologies (e.g., AMAO or AAO), and taxonomies. Here’s how to specify which ontologies should be made available in data entry fields in Phenex. Term sources

To edit the list of terms sources, open the Ontology Sources panel by selecting View > Config > Ontology Sources from the menu. Add a new source by pressing the '+' button. Enter an HTTP or local file URL in the URL column, and an optional label of your choice in the Label column. Press "Apply" to save your list of ontology sources. You will need to relaunch Phenex in order for it to download the given files and load all the terms into its ontology session. Creating term filters

The set of terms available in a given entry field are determined by term filters. Term filters are just saved search specifications which can be created using the Search Panel.

Create a term filter by opening the Search Panel from View > Ontology > Search Panel. For the Entity field, an example filter is shown below. Make sure to correctly specify “matches all” vs. “matches any” for the main and subqueries.

Hit the save button, and save the filter (name it “entities.xml) to the following location: user > Library > Application Support > Phenex > Filters

Note: this folder is hidden on Mac OSX. Should you need to access it using Finder, first make the folder visible by going to Finder> Go > Go to Folder, and typing in "~/Library"


Guide to Character Annotation: Guide to Character Annotation

BioPortal (search across all ontologies) - [1]

OboFoundry (download ontologies)- [2]

Obo-Edit User’s Guide - [3]

Phenoscape Knowledgebase (public) - kb.phenoscape.org

Phenoscape Knowledgebase (staging, where prelease VAO-TAO will be uploaded) kb-staging.phenoscape.org

Term trackers

VAO - http://sourceforge.net/tracker/?group_id=76834&atid=2257941

PATO - http://sourceforge.net/tracker/?group_id=76834&atid=595654

Uberon - https://github.com/cmungall/uberon/issues?sort=created&direction=desc&state=open

TTO - http://sourceforge.net/tracker/?group_id=224046&atid=2519810

AMAO (to do)

AAO (to do)