RDF Structure

As the internet radically changed how people exchange information in the late 1990s, it quickly became apparent that structuring information for consistent and reliable exchange was critically important. The first draft of what became the Resource Description Framework (RDF) appeared in 1997. The W3C, which still maintains RDF to this day, published RDF 1.0 in 2004. The most current RDF schema can be found here.

There are many ways to describe how to structure knowledge for a knowledge graph. Some like to use the terms node and edge. Others substitute the term arc for edge. Still others prefer to discuss triples, each made up of a subject, a predicate, and an object. Whatever the terminology, structuring information for a knowledge graph has a similar intent. Let’s look at a simple example:

The sky is blue.

The statement above is simple, but to store it in a knowledge graph we would restructure it as follows:

Subject: sky
Predicate: has the color
Object: blue
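
As a rough illustration, the same statement can be held in code as a three-field record. The sketch below uses Python; the `Triple` name and the plain-string values are assumptions made for illustration, not part of RDF itself.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: a subject, a predicate, and an object."""
    subject: str
    predicate: str
    object: str


# "The sky is blue." restructured as a single triple.
sky_is_blue = Triple(subject="sky", predicate="has the color", object="blue")

print(sky_is_blue.subject, "->", sky_is_blue.predicate, "->", sky_is_blue.object)
```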

For a simple example, this process seems like overkill. Let’s look at another example.

Fred is a cat.
Fred lives with Hope.
Fred has 4 legs.

The above set of statements requires context to maintain meaning. If any one statement is removed, the remaining two are open to many interpretations. For instance, let’s remove the statement that Fred is a cat. Now we know only that Fred lives with Hope and has 4 legs. These statements give us no information about what types of entity Fred and Hope are. Many animals have 4 legs, but so do other entities. A chair has 4 legs. The statement that “Fred lives with Hope” seems odd for a chair, but a furniture company might name a line of chairs “Fred” for marketing purposes.
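
The effect of dropping a statement can be sketched in a few lines of Python. The tuples, the `is a` predicate string, and the `types_of` helper below are hypothetical names chosen for illustration; they are plain strings rather than real RDF terms.

```python
# The three statements about Fred as (subject, predicate, object) tuples.
statements = [
    ("Fred", "is a", "cat"),
    ("Fred", "lives with", "Hope"),
    ("Fred", "has legs", 4),
]


def types_of(entity, triples):
    """Return every type explicitly stated for an entity."""
    return [o for s, p, o in triples if s == entity and p == "is a"]


# With all three statements, Fred's type is explicit.
print(types_of("Fred", statements))

# Remove "Fred is a cat" and nothing in the data says what Fred is.
without_type = [t for t in statements if t[1] != "is a"]
print(types_of("Fred", without_type))

# Hope never had a type statement in the first place.
print(types_of("Hope", statements))
```

The last two calls return an empty list: whoever consumes these statements, human or LM, has to guess what kind of entity is being described.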

While the probability that Fred is a chair seems low, that’s only because of human experience. Fred would be an unusual name for a chair, but not an impossible one. Whenever we remove a statement from its original context, we introduce semantic ambiguity. This problem is especially critical to RAG, where the entire point is to remove knowledge from its original context and feed it back into a Language Model. The RDF approach captures the classification needed to remove knowledge from its original context while maintaining its semantic meaning. In the case of Fred the cat, it’s important to capture that Fred is a cat and that cat is a type of animal. Without going into the mechanics of RDF formats, below are two examples of how the 3 statements about Fred the cat would be converted to RDF triples.

tip

In RDF, predicates must always be URIs, and subjects are almost always URIs as well (the only alternative is a blank node). However, the URL used here, http://example.org/, functions as nothing more than a namespace. There are cases where URIs do point at the source of the information, but generally they are just namespaces. Formatting knowledge as URIs can be very counterintuitive and cause confusion. Turtle declares the namespaces once as prefixes, which makes it the most human-readable of the RDF formats.

N-Triples format:

<http://example.org/animal/fred> <http://example.org/property/has-legs> "4"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://example.org/animal/fred> <http://example.org/property/lives-with> <http://example.org/animal/hope> .
<http://example.org/animal/fred> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/type/cat> .
<http://example.org/animal/fred> <http://www.w3.org/2000/01/rdf-schema#label> "Fred" .
<http://example.org/animal/hope> <http://example.org/property/has-legs> "4"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://example.org/animal/hope> <http://example.org/property/lives-with> <http://example.org/animal/fred> .
<http://example.org/animal/hope> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/type/cat> .
<http://example.org/animal/hope> <http://www.w3.org/2000/01/rdf-schema#label> "Hope" .
<http://example.org/property/has-legs> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> .
<http://example.org/property/has-legs> <http://www.w3.org/2000/01/rdf-schema#label> "has legs" .
<http://example.org/property/lives-with> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> .
<http://example.org/property/lives-with> <http://www.w3.org/2000/01/rdf-schema#label> "lives with" .
<http://example.org/type/cat> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/01/rdf-schema#Class> .
<http://example.org/type/cat> <http://www.w3.org/2000/01/rdf-schema#label> "cat" .

Turtle format:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix animal: <http://example.org/animal/> .
@prefix prop: <http://example.org/property/> .
@prefix type: <http://example.org/type/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

animal:fred
    prop:has-legs 4 ;
    prop:lives-with animal:hope ;
    a type:cat ;
    rdfs:label "Fred" .

animal:hope
    prop:has-legs 4 ;
    prop:lives-with animal:fred ;
    a type:cat ;
    rdfs:label "Hope" .

prop:has-legs
    a rdf:Property ;
    rdfs:label "has legs" .

prop:lives-with
    a rdf:Property ;
    rdfs:label "lives with" .

type:cat
    a rdfs:Class ;
    rdfs:label "cat" .
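
To make the link between the two listings concrete, the sketch below expands Turtle-style prefixed names into the full URIs that N-Triples writes out. It is a simplified illustration in Python, not a Turtle parser: the `PREFIXES` table and the `expand` and `to_ntriple` helpers are hypothetical names, and only prefixed names are handled (no literals, and Turtle's `a` shorthand is written out as `rdf:type`).

```python
# Prefix table copied from the @prefix lines of the Turtle example.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "animal": "http://example.org/animal/",
    "prop": "http://example.org/property/",
    "type": "http://example.org/type/",
}


def expand(prefixed_name):
    """Turn a name such as 'animal:fred' into a full URI."""
    prefix, local = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local


def to_ntriple(subject, predicate, obj):
    """Format three prefixed names as one N-Triples line."""
    terms = (subject, predicate, obj)
    return " ".join(f"<{expand(term)}>" for term in terms) + " ."


print(to_ntriple("animal:fred", "prop:lives-with", "animal:hope"))
print(to_ntriple("animal:fred", "rdf:type", "type:cat"))
```

Each prefix is only a string substitution, which is why the namespace URIs do not need to resolve to anything on the web.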

note

N-Triples and Turtle are two of the most common RDF formats, but most LMs also seem to natively understand RDF/XML and JSON-LD. LMs likely “understand” these formats well because they build on XML and JSON structures. However, both are eyesores when it comes to human readability. In addition, RDF/XML and JSON-LD often consume significantly more tokens, especially when compared to Turtle.

tip

In the story of Fred the cat, there is no information about what type of entity Hope is. In fact, Hope is also a cat. However, without more information, an LM might assume that Fred is a very hopeful cat, perhaps because he sees he is about to be fed his favorite food.