🗃️ From NLP to LMs
Despite the appearance that AI is "new", the core concepts that form the building blocks of Language Models (LMs) go back decades. Natural Language Processing (NLP) has roots reaching all the way back to the 1940s. While these concepts are far from new, the terminology is used inconsistently. For TrustGraph, extraction is an information discovery process that ingests raw, unstructured text and converts it to a static, structured knowledge model.
🗃️ Autonomous Knowledge Agents
Until recently, Retrieval-Augmented Generation (RAG) has largely focused on retrieving information from a known knowledge source to seed a Language Model's response. RAG has shown great potential for grounding LM responses in a given knowledge set. However, how do you manage large sets of text where the detailed knowledge contained in the text is unknown?
🗃️ Customizing Knowledge Agents
The Naive Extraction process has no prior knowledge of the text corpus it extracts from. This process works well for text documents with minimal formatting. Once documents contain complex tables, data annotations, code, or numeric data, the Naive Extraction process may struggle to extract value. The extraction process can be tailored to a unique data set and use case.
🗃️ RDF Structure
TrustGraph and RDF
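RDF represents knowledge as subject-predicate-object triples. A minimal sketch of this model using plain Python tuples (the prefixed names below are illustrative placeholders, not TrustGraph identifiers or a real ontology):

```python
# A tiny RDF-style graph: each fact is a (subject, predicate, object) triple.
# The "ex:" prefixed names are hypothetical placeholders for illustration.
triples = {
    ("ex:TrustGraph", "ex:usesDataModel", "ex:RDF"),
    ("ex:RDF", "ex:standardizedBy", "ex:W3C"),
}

def objects(subject, predicate, graph):
    """Return all objects matching a (subject, predicate, ?) pattern."""
    return {o for s, p, o in graph if s == subject and p == predicate}

# Query the graph with a partial pattern, as a triple store would.
print(objects("ex:TrustGraph", "ex:usesDataModel", triples))  # {'ex:RDF'}
```

Real RDF stores generalize this idea: any position in the pattern can be a wildcard, and IRIs rather than short strings identify resources.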
🗃️ Derived Ontology
A challenge of naive extraction is knowing how to classify the entities and properties found in a text corpus. While RDF provides semantic structure, it does not provide types for all the different subjects, predicates, and objects that might appear in a text corpus. Over the years, many projects have been dedicated to creating and maintaining knowledge schemas. Here are three of the most well known: