Named Entity Recognition

PyOBO provides high-level wrappers for constructing literal mapping objects, as defined by ssslm.LiteralMapping. These can be used to build generic named entity recognition (NER) and named entity normalization (NEN) tooling (e.g., using ScispaCy or Gilda as a backend).
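To make the idea of grounding against literal mappings concrete, here is a minimal, standard-library-only sketch. It is a toy illustration, not how PyOBO, ssslm, Gilda, or ScispaCy implement matching. The names ToyMapping, build_index, and toy_ground are hypothetical, and the "example:" identifiers are placeholders rather than real ontology terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyMapping:
    """Hypothetical stand-in for a literal mapping: one text label for one CURIE."""

    text: str
    curie: str


def build_index(mappings):
    """Index mappings by case-folded text so that lookup ignores case."""
    index = {}
    for mapping in mappings:
        index.setdefault(mapping.text.casefold(), []).append(mapping.curie)
    return index


def toy_ground(index, text):
    """Return the CURIEs whose label equals the query, ignoring case and outer whitespace."""
    return index.get(text.strip().casefold(), [])


# Placeholder data: these identifiers are made up for illustration.
mappings = [
    ToyMapping("species", "example:1"),
    ToyMapping("genus", "example:2"),
]
index = build_index(mappings)

print(toy_ground(index, " Species "))
print(toy_ground(index, "kingdom"))
```

A real grounder typically goes further than exact lookup, for example by handling synonyms, scoring candidates, and normalizing text more aggressively. The sketch only shows the basic shape: labels go in, and candidate identifiers come out.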

You can use pyobo.ground() as an integrated workflow:

import pyobo
import ssslm

matches: list[ssslm.Match] = pyobo.ground("taxrank", "species")

You can also get the grounder first using pyobo.get_grounder(), then call it yourself:

import pyobo
import ssslm

grounder: ssslm.Grounder = pyobo.get_grounder("taxrank")
matches: list[ssslm.Match] = grounder.get_matches("species")

You can get the ontology directly using pyobo.get_ontology(), then construct a grounder with pyobo.Obo.get_grounder():

import pyobo
import ssslm

ontology: pyobo.Obo = pyobo.get_ontology("taxrank")
grounder: ssslm.Grounder = ontology.get_grounder()
matches: list[ssslm.Match] = grounder.get_matches("species")

You can load a custom ontology from a local file with pyobo.from_obo_path(), then construct a grounder with pyobo.Obo.get_grounder():

from urllib.request import urlretrieve

import pyobo
import ssslm

# Download the OBO file to a local path
url = "http://purl.obolibrary.org/obo/taxrank.obo"
path = "taxrank.obo"
urlretrieve(url, path)

# Parse the local file, build a grounder from it, then match the text
ontology: pyobo.Obo = pyobo.from_obo_path(path, prefix="taxrank")
grounder: ssslm.Grounder = ontology.get_grounder()
matches: list[ssslm.Match] = grounder.get_matches("species")

Warning

When loading a custom ontology, the prefix must be registered in the Bioregistry, because PyOBO performs additional standardization and normalization of prefixes, CURIEs, and URIs that is not part of the OBO specification.