Knowledge Graphs

Interacting with knowledge graphs can be challenging. Graph visualizations are useful for subgraphs with a small number of nodes and edges. However, the TrustGraph extraction process will likely extract roughly 5,000 graph edges per 100 pages of a text corpus. At that scale, graph visualization tools provide little value, and production-scale graphs often have billions of graph edges. The RAG process is well suited to making sense of these extremely large knowledge graphs.

During the extraction process, it may be useful to confirm that well-formed RDF triples are being produced. The tg-graph-show script displays the extracted triples as text:

tg-graph-show

The tg-graph-show script will likely output a large amount of text, since it dumps the graph store in N-Triples format. N-Triples is not particularly human-readable because it relies on URIs for subjects and predicates. Instead of viewing the raw graph edges, it may be more practical to check how many graph edges have been extracted:

tg-graph-show | wc -l
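A total count says little about what kind of edges were extracted. As a rough sketch, the same output can be grouped by predicate. This assumes tg-graph-show prints one triple per line with whitespace-separated subject, predicate, and object, which is what the N-Triples format and the line count above imply. The helper name count_predicates is hypothetical and not part of TrustGraph.

```shell
# Count triples per predicate from N-Triples text on stdin.
# Assumption: one triple per line, with the predicate as the second
# whitespace-separated field. Lines with fewer than 3 fields are skipped.
count_predicates() {
    awk 'NF >= 3 { counts[$2]++ } END { for (p in counts) print counts[p], p }' |
        sort -rn
}

# Usage (assumes tg-graph-show writes N-Triples to stdout):
#   tg-graph-show | count_predicates | head
```

The most frequent predicates appear first, which can make it easier to spot whether the extraction is producing the relationships you expect.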

Tip:

It's not possible to predict how many graph edges will be extracted from any given text chunk. However, if the number of graph edges does not increase as chunks are processed, there are two likely explanations. One, the information in the chunks is redundant and has already been captured in the knowledge graph. Two, the document parsing modules have produced empty chunks. As the extraction process nears the end of a text corpus, the number of new graph edges extracted per chunk is likely to decrease.
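One way to watch for a stalled edge count is to sample it periodically. The following is a minimal sketch, not a TrustGraph tool: watch_edges and edge_delta are hypothetical helper names, and the sketch assumes the command passed in prints one graph edge per line, as tg-graph-show is described as doing above.

```shell
# Print how many edges were added between two samples of the edge count.
edge_delta() {
    prev_count=$1
    curr_count=$2
    echo $((curr_count - prev_count))
}

# Poll a command that prints one graph edge per line and report growth.
# Arguments: interval in seconds, number of samples, then the command to run.
watch_edges() {
    interval=$1
    samples=$2
    shift 2
    # Arithmetic expansion strips any padding that wc -l adds on some systems.
    prev=$(( $("$@" | wc -l) ))
    i=1
    while [ "$i" -lt "$samples" ]; do
        sleep "$interval"
        curr=$(( $("$@" | wc -l) ))
        delta=$(edge_delta "$prev" "$curr")
        if [ "$delta" -eq 0 ]; then
            echo "no new edges (total $curr): chunks may be redundant or empty"
        else
            echo "+$delta edges (total $curr)"
        fi
        prev=$curr
        i=$((i + 1))
    done
}

# Usage (assumes tg-graph-show is on the PATH): sample every 30 seconds, 10 times.
#   watch_edges 30 10 tg-graph-show
```

A single sample with no growth is not conclusive, since a chunk can legitimately yield nothing new. Several consecutive samples with no new edges are a stronger hint that the parsing output is worth inspecting.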