From NLP to LMs
Despite the appearance that AI is “new”, the core concepts that form the building blocks of Language Models (LMs) go back decades. Natural Language Processing (NLP) has roots reaching all the way back to the 1940s. While these concepts are far from new, the terminology is used inconsistently. For TrustGraph, extraction is an information discovery process that ingests raw, unstructured text and converts it into a static, structured knowledge model.
Until recently, NLP models and techniques were considered the most effective approach to information discovery. Many NLP algorithms exist, with popular ones like TF-IDF and RAKE serving as the “go-to” choices. These algorithms will, for example, extract candidate terms and then score them with different methodologies to establish importance, as in the sketch below.
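As a concrete illustration of the classic approach, here is a minimal TF-IDF scoring sketch using scikit-learn. It is not TrustGraph code; the documents and parameters are placeholders.

```python
# Minimal TF-IDF scoring sketch (illustrative, not TrustGraph code).
from sklearn.feature_extraction.text import TfidfVectorizer

documents = [
    "Knowledge graphs model entities and their relationships.",
    "Language models can extract entities from unstructured text.",
    "TF-IDF scores a term by its frequency in one document "
    "weighted against its rarity across all documents.",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(documents)  # rows: documents, columns: terms

# Rank the terms of the first document by TF-IDF weight.
terms = vectorizer.get_feature_names_out()
weights = tfidf[0].toarray().ravel()
ranked = sorted(zip(terms, weights), key=lambda pair: pair[1], reverse=True)
for term, weight in ranked[:5]:
    print(f"{term}: {weight:.3f}")
```

Note that the output is an explicit numeric score per term: this is the “objective data” discussed next.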
NLP techniques produce objective data: the scoring algorithms can be adjusted, and further calculations can be made with the scores. Yet subjective, empirical results show that LMs have far surpassed NLP techniques in information discovery efficacy. TrustGraph therefore uses LMs for the information discovery process in place of NLP models. The techniques are very similar, simply interchanging LMs for NLP models and algorithms. Of course, this approach gives up the objective scores, yet the empirical evidence strongly supports the shift to LMs for these tasks.
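For comparison, a minimal sketch of LM-driven extraction follows, assuming an OpenAI-compatible client. The prompt, model name, and helper function are illustrative assumptions, not TrustGraph’s actual extraction pipeline.

```python
# Sketch of LM-based entity extraction. The client, model name, and prompt
# are assumptions for illustration; TrustGraph's actual prompts differ.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "Extract the named entities from the text below. "
    "Respond with only a JSON array of strings.\n\nText: {text}"
)

def extract_entities(text: str) -> list[str]:
    """Let the LM do the work a scoring algorithm did in classic NLP."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": PROMPT.format(text=text)}],
    )
    # Assumes the model returns bare JSON; production code would validate.
    return json.loads(response.choices[0].message.content)

print(extract_entities(
    "TrustGraph ingests unstructured text and builds a knowledge model."
))
```

Unlike TF-IDF, nothing here yields a reproducible numeric score, which is the trade-off described above: objective data exchanged for empirically stronger extraction.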