Knowledge Reformatting
Despite the maturity of RDF and SPARQL, extracting a subgraph for the purpose of RAG is very new. Let’s look at a simple example:
A simple statement, ideal for providing to an LM:
The sky is blue.
That statement structured as a triple:
<sky> <has_the_color> <blue>
While the two statements contain the same information, the second is much more awkward for an LM to read and also consumes more tokens. Finding the best method of reformatting the subgraph knowledge for an LM is on the critical path for development. Below are some approaches being tested:
- Removing all special characters from the subgraph triples to form single-line statements
- Formatting the triples as XML
- Formatting the triples as JSON
- Converting the RDF subgraph to openCypher
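The first three approaches in the list above can be sketched in a few lines of Python. This is a minimal illustration and not the project's actual implementation: the function names (`clean_term`, `to_statements`, `to_json`, `to_xml`), the XML element names, and the JSON keys are all assumptions chosen for the example, and the input is the single sky triple from earlier.

```python
import json
import re
import xml.etree.ElementTree as ET

# Hypothetical input: the example triple from the text, as (subject, predicate, object).
TRIPLES = [("<sky>", "<has_the_color>", "<blue>")]


def clean_term(term: str) -> str:
    """Drop angle brackets and turn underscores into spaces."""
    return re.sub(r"[<>]", "", term).replace("_", " ").strip()


def to_statements(triples):
    """Render each triple as one plain-text line with special characters removed."""
    return [" ".join(clean_term(part) for part in triple) for triple in triples]


def to_json(triples):
    """Render the triples as a compact JSON array (key names are an assumption)."""
    records = [
        {"subject": clean_term(s), "predicate": clean_term(p), "object": clean_term(o)}
        for s, p, o in triples
    ]
    return json.dumps(records, separators=(",", ":"))


def to_xml(triples):
    """Render the triples as a small XML document (element names are an assumption)."""
    root = ET.Element("triples")
    for s, p, o in triples:
        triple = ET.SubElement(root, "triple")
        ET.SubElement(triple, "subject").text = clean_term(s)
        ET.SubElement(triple, "predicate").text = clean_term(p)
        ET.SubElement(triple, "object").text = clean_term(o)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Character counts are only a crude stand-in for token counts.
    for name, text in [
        ("statements", "\n".join(to_statements(TRIPLES))),
        ("json", to_json(TRIPLES)),
        ("xml", to_xml(TRIPLES)),
    ]:
        print(name, len(text), text)
```

Comparing the lengths of the three renderings gives a rough feel for the markup overhead of each format; a real comparison would count tokens with the tokenizer of the target LM.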
While LMs natively process XML and JSON well, these approaches will consume significantly more tokens.
openCypher is another method of structuring and querying knowledge graphs. More information on openCypher can be found here. Below is an example of a subgraph converted to openCypher:
(CISA)-[began]->(incurring costs related to CIRCIA implementation)
(CISA)-[required to report information]->(certain cyber incidents)
(CISA)-[is seeking to discuss]->(areas where CISA and its Federal counterparts might want to, and be able to, harmonize their respective reporting)
(Cybersecurity and Infrastructure Security Agency)-[definition]->(A federal agency within the Department of Homeland Security)
(CISA)-[may choose to not post]->(comments that CISA determines are off-topic or inappropriate)
(CISA)-[proposes using]->(instead of the CIRC Model Definition’s construction ‘‘a covered information system’’)
(CISA)-[designed to achieve]->(specific purposes)
(CISA)-[will engage]->(covered entities and other stakeholders)
(CISA)-[able to]->(provide early warnings)
(CISA)-[did not limit]->(type of feedback commenters could submit)
While the “ASCII art style” of Cypher is somewhat awkward for human readability, there is some empirical evidence that this structure provides value to an LM. Many tools already exist for converting an RDF subgraph to openCypher.
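As a rough illustration of the conversion, the sketch below turns subject/predicate/object triples into lines shaped like the example above. It is not one of the existing conversion tools: `local_name` and `to_cypher_lines` are hypothetical helpers, the `http://example.org/` IRIs are made up for the demonstration, and the output mirrors the pattern layout shown in this section rather than an executable openCypher query.

```python
import re


def local_name(term: str) -> str:
    """Shorten an IRI to a readable label; leave plain text untouched.

    Assumption: anything containing '://' is an IRI whose last path or
    fragment segment is a usable label.
    """
    term = term.strip().strip("<>")
    if "://" in term:
        term = re.split(r"[#/]", term)[-1]
        term = term.replace("_", " ")
    return term


def to_cypher_lines(triples):
    """Render each (subject, predicate, object) triple as (s)-[p]->(o)."""
    return [
        f"({local_name(s)})-[{local_name(p)}]->({local_name(o)})"
        for s, p, o in triples
    ]


if __name__ == "__main__":
    # Hypothetical input mixing IRIs and plain-text terms.
    triples = [
        ("<http://example.org/sky>", "<http://example.org/has_the_color>", "<http://example.org/blue>"),
        ("CISA", "will engage", "covered entities and other stakeholders"),
    ]
    for line in to_cypher_lines(triples):
        print(line)
```

Plain-text terms pass through unchanged, so triples that were extracted as free text keep their original wording inside the parentheses and brackets.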