Derived Ontology

A challenge of naive extraction is knowing how to classify the entities and properties found in a text corpus. While RDF provides semantic structure, it does not provide types for all the different subjects, predicates, and objects that might appear in that corpus. Over the years, many projects have been dedicated to creating and maintaining knowledge schemas. Here are 3 of the most well known:

All of the above ontologies are in wide use across the internet. However, these ontologies are static and are intended for “common” uses. It’s not practical to create ontologies for every possible knowledge use case. While TrustGraph could integrate and pull schemas from these sources, it’s more efficient to derive a custom ontology for each extraction process.

For example, when performing a naive extraction, the LM identifies not only entities but also entity types, and not only the properties of those entities but also the types of those properties. Using RDF, these types are then associated with their relevant entities and properties. Associating types and classes with the extracted data preserves as much semantic meaning as possible for the RAG process.
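As a rough illustration of how derived types can be attached with RDF, the sketch below turns a hand-written stand-in for LM output into triples. It is not TrustGraph's implementation: the `extraction` dictionary layout, the `http://example.org/derived/` namespace, and the `iri` and `derive_triples` helpers are all assumptions made for this example. Only the `rdf:type`, `rdf:Property`, and `rdfs:Class` IRIs come from the RDF and RDFS vocabularies.

```python
# Standard RDF / RDFS vocabulary IRIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDF_PROPERTY = "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"

# Hypothetical namespace for terms derived during one extraction run.
EX = "http://example.org/derived/"


def iri(label: str) -> str:
    """Build an IRI in the hypothetical namespace from a free-text label."""
    return EX + label.strip().lower().replace(" ", "-")


def derive_triples(extraction: dict) -> list[tuple[str, str, str]]:
    """Turn (assumed) LM output into triples that carry derived types."""
    triples = []
    for entity in extraction["entities"]:
        # Declare the derived type as a class, then type the entity with it.
        triples.append((iri(entity["type"]), RDF_TYPE, RDFS_CLASS))
        triples.append((iri(entity["name"]), RDF_TYPE, iri(entity["type"])))
    for prop in extraction["properties"]:
        # Declare the predicate as a property, then record the relationship.
        triples.append((iri(prop["predicate"]), RDF_TYPE, RDF_PROPERTY))
        triples.append(
            (iri(prop["subject"]), iri(prop["predicate"]), iri(prop["object"]))
        )
    # Drop duplicates (e.g. a type shared by many entities), keeping order.
    return list(dict.fromkeys(triples))


# Hand-written stand-in for what an LM might return; the format is assumed.
extraction = {
    "entities": [
        {"name": "Marie Curie", "type": "Person"},
        {"name": "Polonium", "type": "Chemical Element"},
    ],
    "properties": [
        {"subject": "Marie Curie", "predicate": "discovered", "object": "Polonium"},
    ],
}

triples = derive_triples(extraction)
for subject, predicate, obj in triples:
    print(subject, predicate, obj)
```

In this sketch the ontology is simply the set of class and property declarations that accumulate as the corpus is processed, so nothing has to be defined before extraction starts.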

Note

The full naive extraction process requires more steps than simply finding entities and properties. Very small changes in LM instructions can produce large differences in LM response quality. Considerable performance can still be gained by continuing to develop the naive extraction methodology.