Knowledge Reformatting

RDF and SPARQL are mature technologies, but extracting a subgraph for use in RAG is a very new practice. Let’s look at a simple example:

A simple statement, ideal for providing to an LM:

The sky is blue.

That statement structured as a triple:

<sky> <has_the_color> <blue>

While the two statements contain the same information, the second is much more awkward for an LM to read, and it also consumes more tokens. Finding the best method of reformatting subgraph knowledge for an LM is on the critical path for development. Below are some approaches being tested:

  • Removing all special characters from the subgraph triples to form single-line statements
  • Formatting the triples as XML
  • Formatting the triples as JSON
  • Converting the RDF subgraph to openCypher
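The first three approaches can be sketched with the Python standard library alone. This is a minimal illustration, not a reference implementation: the `TRIPLES` sample, the helper names, and the JSON/XML field names (`subject`, `predicate`, `object`) are all assumptions made for the example. Character counts are printed only as a rough proxy for size; actual token counts depend on the tokenizer of the LM in use.

```python
import json
import re
import xml.etree.ElementTree as ET

# Hypothetical sample data: one (subject, predicate, object) tuple.
TRIPLES = [("sky", "has_the_color", "blue")]


def clean(term: str) -> str:
    """Strip angle brackets and turn underscores into spaces."""
    return re.sub(r"[_\s]+", " ", term.strip("<>")).strip()


def as_statements(triples) -> str:
    """One plain single-line statement per triple."""
    return "\n".join(" ".join(clean(t) for t in triple) for triple in triples)


def as_json(triples) -> str:
    """A JSON array with one object per triple (field names are assumed)."""
    records = [{"subject": s, "predicate": p, "object": o} for s, p, o in triples]
    return json.dumps(records)


def as_xml(triples) -> str:
    """An XML document with one <triple> element per triple (tags are assumed)."""
    root = ET.Element("triples")
    for s, p, o in triples:
        node = ET.SubElement(root, "triple")
        ET.SubElement(node, "subject").text = s
        ET.SubElement(node, "predicate").text = p
        ET.SubElement(node, "object").text = o
    return ET.tostring(root, encoding="unicode")


# Print each rendering with its character length for a rough size comparison.
for name, render in [("statements", as_statements), ("json", as_json), ("xml", as_xml)]:
    text = render(TRIPLES)
    print(f"{name:>10} ({len(text)} chars): {text}")
```

For this single triple, the plain statement is the shortest rendering and the XML the longest, since the structural markup is repeated for every triple.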

While LMs natively process XML and JSON well, these approaches consume significantly more tokens. openCypher is another method of structuring and querying knowledge graphs. More information is available from the openCypher project. Below is an example of a subgraph converted to openCypher:

(CISA)-[began]->(incurring costs related to CIRCIA implementation)
(CISA)-[required to report information]->(certain cyber incidents)
(CISA)-[is seeking to discuss]->(areas where CISA and its Federal counterparts might want to, and be able to, harmonize their respective reporting)
(Cybersecurity and Infrastructure Security Agency)-[definition]->(A federal agency within the Department of Homeland Security)
(CISA)-[may choose to not post]->(comments that CISA determines are off-topic or inappropriate)
(CISA)-[proposes using]->(instead of the CIRC Model Definition’s construction ‘‘a covered information system’’)
(CISA)-[designed to achieve]->(specific purposes)
(CISA)-[will engage]->(covered entities and other stakeholders)
(CISA)-[able to]->(provide early warnings)
(CISA)-[did not limit]->(type of feedback commenters could submit)

While the “ASCII art style” of Cypher is somewhat awkward for human readers, there is some empirical evidence that this structure provides value to an LM. Many tools already exist for converting an RDF subgraph to openCypher.
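As a rough sketch of how pattern lines like those in the example could be produced from triples, the snippet below formats each (subject, predicate, object) tuple by hand. The `to_pattern` helper is hypothetical and is not taken from any existing conversion tool; the two sample triples are copied from the example subgraph.

```python
def to_pattern(subject: str, predicate: str, obj: str) -> str:
    """Render one triple in the (subject)-[predicate]->(object) style."""
    return f"({subject})-[{predicate}]->({obj})"


# Two triples taken from the example subgraph.
triples = [
    ("CISA", "will engage", "covered entities and other stakeholders"),
    ("CISA", "able to", "provide early warnings"),
]

lines = [to_pattern(*t) for t in triples]
print("\n".join(lines))
```

This mirrors the notation used in the example rather than a statement a graph database would execute. A runnable openCypher query would typically need node labels, relationship types such as `[:WILL_ENGAGE]`, and quoted property values.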