Ontology RAG: Schema-Driven Knowledge Extraction
Learn how Ontology RAG uses formal ontologies to extract structured, typed knowledge from unstructured text. Understand when to use schema-driven extraction vs. schema-free GraphRAG.
Ontology RAG is an advanced RAG technique that uses formal ontologies (OWL schemas) to guide the extraction of structured, typed knowledge from unstructured text. Unlike GraphRAG's schema-free approach, Ontology RAG enforces conformance to predefined types, properties, and relationships, producing highly structured, semantically rich Knowledge Graphs.
What is Ontology RAG?
Ontology RAG combines:
- Formal ontologies (OWL definitions) defining types, properties, and relationships
- Guided extraction using ontology definitions to discover entities
- Knowledge Graphs storing conformant, typed entities and relationships
- Vector search for semantic similarity entry points
- Graph traversal for relationship-aware retrieval
The Ontology RAG Pipeline
1. Document Chunking
↓
2. Ontology Loading (OWL definitions imported)
↓
3. Knowledge Extraction (guided by ontology)
↓
4. Entity Embedding (vector representations)
↓
5. Graph Storage (conformant to ontology schema)
↓
6. Semantic Retrieval (vector search finds entry points)
↓
7. Graph Traversal (discover related entities)
↓
8. LLM Generation (using structured subgraph)
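The stages above can be sketched as a chain of functions over a working state. This is purely illustrative — the stage names mirror the diagram, but none of these functions are TrustGraph APIs:

```javascript
// Illustrative pipeline sketch. Each stage takes the working state and
// returns an updated copy; the bodies are trivial placeholders.
const stages = {
  chunk: (s) => ({ ...s, chunks: [s.text] }),
  loadOntology: (s) => ({ ...s, ontology: { classes: ["Sensor"] } }),
  extract: (s) => ({
    ...s,
    // Keep only text spans that mention an ontology class.
    entities: s.chunks.flatMap((c) =>
      s.ontology.classes
        .filter((cls) => c.includes(cls))
        .map((cls) => ({ type: cls, text: c }))),
  }),
};

function runPipeline(text, order) {
  return order.reduce((state, name) => stages[name](state), { text });
}

const result = runPipeline("Sensor reading at dawn", [
  "chunk", "loadOntology", "extract",
]);
// result.entities -> [{ type: "Sensor", text: "Sensor reading at dawn" }]
```

The real pipeline embeds entities and writes to graph storage between extraction and retrieval; the reduce-over-stages shape is the point here.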
Ontology RAG vs GraphRAG vs Document RAG
| Aspect | Document RAG | GraphRAG | Ontology RAG |
|---|---|---|---|
| Retrieval Method | Vector similarity | Graph + vector | Graph + vector |
| Context Structure | Isolated chunks | Connected entities | Typed, conformant entities |
| Schema Requirements | None | None (schema-free) | OWL ontology required |
| Extraction Approach | N/A (no extraction) | Automatic discovery | Ontology-guided |
| Knowledge Validation | No validation | No validation | Schema conformance |
| Setup Complexity | Simple | Simple | Complex |
| Retrieval Precision | Low | High | Very High |
| Best For | Simple search | Complex relationships | Typed domains |
Key Difference: Schema-Free vs Schema-Driven
GraphRAG (Schema-Free):
// GraphRAG discovers entities automatically
// No predefined schema needed
await tg.startLibraryProcessing({
"flow-id": "graph-rag",
// Entities and relationships discovered automatically
// Flexible, no constraints
});
Ontology RAG (Schema-Driven):
// Ontology RAG requires predefined ontology
// Extracts only entities that match ontology types
await tg.putConfigItem({
type: "ontology",
key: "domain-ontology",
// OWL ontology defines:
// - Valid entity types (Person, Organization, Event)
// - Valid properties (hasName, locatedIn)
// - Valid relationships (worksAt, participatedIn)
ontology: owlDefinitions
});
await tg.startLibraryProcessing({
"flow-id": "onto-rag",
// Extraction conforms to ontology schema
});
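One way to picture the schema-driven difference is how the ontology constrains the extraction prompt itself. The sketch below assumes a simplified ontology shape and an invented prompt template — it is not TrustGraph's actual prompt:

```javascript
// Hypothetical sketch: turn ontology definitions into extraction
// constraints for the LLM. The prompt wording is illustrative only.
const ontology = {
  classes: [{ label: "Person" }, { label: "Organization" }, { label: "Event" }],
  properties: [{ label: "worksAt", domain: "Person", range: "Organization" }],
};

function buildExtractionPrompt(ontology, chunk) {
  const types = ontology.classes.map((c) => c.label).join(", ");
  const rels = ontology.properties
    .map((p) => `${p.label}(${p.domain} -> ${p.range})`)
    .join(", ");
  return [
    `Extract ONLY entities of these types: ${types}.`,
    `Use ONLY these relationships: ${rels}.`,
    `Ignore anything that does not fit the schema.`,
    `Text: ${chunk}`,
  ].join("\n");
}

const prompt = buildExtractionPrompt(ontology, "Alice works at Acme.");
```

A schema-free GraphRAG prompt would omit the type and relationship constraints entirely, which is exactly why its output is more flexible but less predictable.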
When to Use Ontology RAG
Use Ontology RAG When:
✅ Existing Ontologies Available
- Domain has established ontologies (FOAF, SOSA/SSN, Dublin Core)
- Industry standards exist (healthcare, cybersecurity, scientific domains)
- Taxonomies and schemas already defined
✅ Type Precision Required
- Need strict entity typing (this IS a Sensor, not maybe a Device)
- Regulatory compliance requires structured data
- Data integration across systems with different formats
✅ Complex Relationships with Types
- Relationships need typing (observedBy, measuredAt, collectedBy)
- Property constraints matter (temperature must be numeric)
- Hierarchical relationships (subClassOf, subPropertyOf)
✅ Knowledge Graph Conformance
- Integration with semantic web systems
- Need RDF/OWL compatibility
- Query using SPARQL with typed patterns
✅ Specialist Domains
- Intelligence analysis
- Cybersecurity (threat models, TTPs)
- Scientific research (sensors, observations, measurements)
- Legal documents (statutes, cases, citations)
Use GraphRAG Instead When:
❌ Schema Definition is Prohibitively Complex
- Domain is too diverse or undefined
- Creating ontology costs more than benefit
- Schema changes frequently
❌ Schema-Free Flexibility Preferred
- Exploratory data analysis
- Rapidly changing data models
- Unknown entity types
❌ Simple Use Cases
- Document RAG sufficient for keyword search
- No need for relationship understanding
Ontology Fundamentals
What is an OWL Ontology?
OWL (Web Ontology Language) defines:
- Classes - Types of things (Sensor, Observation, Location)
- Properties - Attributes and relationships (hasName, measuredAt, observedBy)
- Individuals - Specific instances (Sensor-123, Observation-456)
- Axioms - Rules and constraints (Observation must have exactly one Result)
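The four building blocks can be held in a minimal in-memory model. This is illustrative only — real OWL uses full IRIs and far richer axiom types:

```javascript
// Minimal stand-in for the four OWL building blocks listed above.
const ontology = {
  classes: ["Sensor", "Observation", "Location"],
  properties: { madeBySensor: { domain: "Observation", range: "Sensor" } },
  individuals: { "Sensor-123": "Sensor", "Observation-456": "Observation" },
  axioms: [{ kind: "exactly-one", onClass: "Observation", property: "hasResult" }],
};

// Tiny check: is an individual's asserted type a known class?
function isWellTyped(ontology, individual) {
  return ontology.classes.includes(ontology.individuals[individual]);
}
```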
Example: SOSA/SSN Ontology
SOSA (Sensor, Observation, Sample, and Actuator) is a W3C standard for sensor data:
@prefix sosa: <http://www.w3.org/ns/sosa/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
# Class definitions
sosa:Sensor a owl:Class ;
rdfs:label "Sensor" ;
rdfs:comment "Device, agent, or software that observes" .
sosa:Observation a owl:Class ;
rdfs:label "Observation" ;
rdfs:comment "Act of carrying out observation procedure" .
sosa:ObservableProperty a owl:Class ;
rdfs:label "Observable Property" ;
rdfs:comment "Quality that can be observed" .
# Property definitions
sosa:observedProperty a owl:ObjectProperty ;
rdfs:domain sosa:Observation ;
rdfs:range sosa:ObservableProperty .
sosa:madeBySensor a owl:ObjectProperty ;
rdfs:domain sosa:Observation ;
rdfs:range sosa:Sensor .
sosa:hasResult a owl:DatatypeProperty ;
rdfs:domain sosa:Observation ;
rdfs:range xsd:string .
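The domain and range declarations above are what make extraction checkable. A sketch of that check, with compact prefixed names standing in for full IRIs:

```javascript
// Domain/range schema for the three SOSA properties defined above.
const propertySchema = {
  "sosa:observedProperty": { domain: "sosa:Observation", range: "sosa:ObservableProperty" },
  "sosa:madeBySensor": { domain: "sosa:Observation", range: "sosa:Sensor" },
  "sosa:hasResult": { domain: "sosa:Observation", range: "xsd:string" },
};

// A triple is conformant only if its subject and object types match the
// property's declared domain and range.
function checkTriple(schema, subjType, predicate, objType) {
  const def = schema[predicate];
  return !!def && def.domain === subjType && def.range === objType;
}

checkTriple(propertySchema, "sosa:Observation", "sosa:madeBySensor", "sosa:Sensor");
// -> true
checkTriple(propertySchema, "sosa:Sensor", "sosa:madeBySensor", "sosa:Observation");
// -> false (domain and range reversed)
```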
TrustGraph Ontology Format
TrustGraph uses OWL ontologies converted to a proprietary JSON format:
{
"classes": [
{
"id": "http://www.w3.org/ns/sosa/Sensor",
"label": "Sensor",
"comment": "Device that observes properties"
},
{
"id": "http://www.w3.org/ns/sosa/Observation",
"label": "Observation",
"comment": "Act of carrying out observation"
}
],
"properties": [
{
"id": "http://www.w3.org/ns/sosa/madeBySensor",
"domain": "http://www.w3.org/ns/sosa/Observation",
"range": "http://www.w3.org/ns/sosa/Sensor"
}
]
}
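A converter from OWL facts to this JSON shape is mostly a grouping exercise. The sketch below assumes the triples are already parsed (a real converter would parse Turtle first) and omits `label`/`comment` for brevity:

```javascript
// Hypothetical converter sketch: pre-parsed (subject, predicate, object)
// triples in, TrustGraph-style { classes, properties } JSON out.
function upsert(list, id) {
  let entry = list.find((x) => x.id === id);
  if (!entry) { entry = { id }; list.push(entry); }
  return entry;
}

function toTrustGraphFormat(triples) {
  const classes = [], properties = [];
  for (const [s, p, o] of triples) {
    if (p === "rdf:type" && o === "owl:Class") classes.push({ id: s });
    if (p === "rdfs:domain") upsert(properties, s).domain = o;
    if (p === "rdfs:range") upsert(properties, s).range = o;
  }
  return { classes, properties };
}

const out = toTrustGraphFormat([
  ["sosa:Sensor", "rdf:type", "owl:Class"],
  ["sosa:madeBySensor", "rdfs:domain", "sosa:Observation"],
  ["sosa:madeBySensor", "rdfs:range", "sosa:Sensor"],
]);
// out.classes -> [{ id: "sosa:Sensor" }]
// out.properties -> [{ id: "sosa:madeBySensor",
//                      domain: "sosa:Observation", range: "sosa:Sensor" }]
```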
Implementing Ontology RAG
Step 1: Define or Import Ontology
Option A: Use Existing Ontology
# Download standard ontology (e.g., SOSA/SSN)
# The W3C namespace content-negotiates, so request Turtle explicitly
curl -H "Accept: text/turtle" -o sosa.ttl https://www.w3.org/ns/sosa/
# Convert to TrustGraph format (if needed)
# Or use Workbench Ontology Editor to import
Option B: Create Custom Ontology
Use Workbench Ontology Editor to create domain-specific ontology:
@prefix intel: <http://example.org/intel#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
# Intelligence domain ontology
intel:IntelligenceReport a owl:Class .
intel:Sensor a owl:Class .
intel:Location a owl:Class .
intel:Asset a owl:Class .
intel:collectedBy a owl:ObjectProperty ;
rdfs:domain intel:IntelligenceReport ;
rdfs:range intel:Sensor .
intel:locatedAt a owl:ObjectProperty ;
rdfs:domain intel:Asset ;
rdfs:range intel:Location .
Option C: Generate with AI
Use Claude or GPT-4 to generate ontologies from domain text:
# Provide domain description to LLM
echo "Generate an OWL ontology for maritime tracking including:
- Vessels (ships, cargo carriers)
- Ports and locations
- Tracking sensors (AIS, radar, satellite)
- Observations (position, speed, heading)
- Intelligence collection methods" | claude
# Review and refine generated ontology
Step 2: Install Ontology in TrustGraph
# Install ontology as configuration item
cat domain-ontology.json | tg-put-config-item \
--type ontology \
--key my-domain-ontology \
--stdin
The ontology becomes available to all Ontology RAG flows.
Step 3: Create Collection
# Create collection for documents
tg-set-collection \
-n "Domain Documents" \
-d "Documents for ontology-based extraction" \
domain-docs
Step 4: Add Documents
# Add document to library
tg-add-library-document \
--name "Intelligence Report 2024-001" \
--description "Maritime tracking intelligence" \
--tags 'intelligence,maritime,tracking' \
--id https://example.org/reports/2024-001 \
--kind text/plain \
report-2024-001.txt
Step 5: Create Ontology RAG Flow
# Create ontology RAG flow
tg-start-flow \
-n onto-rag \
-i onto-rag \
-d "Ontology RAG processing flow"
The onto-rag flow type uses installed ontologies to guide extraction.
Step 6: Process Documents
# Submit document for ontology-guided processing
tg-start-library-processing \
--flow-id onto-rag \
--document-id https://example.org/reports/2024-001 \
--collection domain-docs \
--processing-id urn:processing-001
What happens during processing:
- Document is chunked into segments
- Ontology is loaded into extraction context
- LLM extracts entities matching ontology classes:
- Only extracts Sensors, Observations, Locations (as defined in ontology)
- Ignores entities not in ontology
- Relationships are typed according to ontology properties
- Validation ensures conformance to ontology schema
- Entities are embedded and stored in vector database
- Knowledge Graph is populated with conformant triples
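The type-filtering step above is the heart of ontology-guided extraction. A sketch, with hard-coded candidates standing in for raw LLM output:

```javascript
// Sketch of the conformance filter: keep only candidates whose type is
// defined in the ontology, discard everything else.
const ontologyTypes = new Set(["Sensor", "Observation", "Location"]);

function filterConformant(candidates, types) {
  const kept = [], dropped = [];
  for (const c of candidates) (types.has(c.type) ? kept : dropped).push(c);
  return { kept, dropped };
}

const { kept, dropped } = filterConformant([
  { type: "Sensor", name: "AIS transponder" },
  { type: "Rumor", name: "dockside gossip" }, // not in ontology -> ignored
], ontologyTypes);
// kept -> the Sensor; dropped -> the Rumor
```

In practice TrustGraph also validates relationships against the ontology's property definitions, not just entity types.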
Step 7: Query with Ontology RAG
# Query using Ontology RAG
tg-invoke-graph-rag \
-f onto-rag \
-C domain-docs \
-q "What sensors were used to collect intelligence?"
Query process:
- Vector search finds semantically similar entities
- Type filtering ensures results match ontology types (only Sensor entities)
- Graph traversal follows typed relationships (collectedBy, observedBy)
- Subgraph extraction builds context from conformant entities
- LLM generation uses typed, structured subgraph
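The retrieval steps above can be sketched end to end. The data and scores are invented; a real query would hit the vector database and graph store:

```javascript
// Retrieval sketch: mock vector-search hits, a type filter, then one hop
// of typed graph traversal to collect the subgraph. Illustrative only.
const entities = [
  { uri: "ex:ais", type: "sosa:Sensor", score: 0.91 },
  { uri: "ex:note", type: "ex:Memo", score: 0.88 }, // wrong type, filtered
];
const triples = [
  ["ex:obs1", "sosa:madeBySensor", "ex:ais"],
];

function retrieve(wantedType) {
  const hits = entities
    .filter((e) => e.type === wantedType)   // type filtering
    .sort((a, b) => b.score - a.score);     // vector-score ordering
  const subgraph = triples.filter(([s, , o]) =>
    hits.some((h) => h.uri === s || h.uri === o)); // one-hop traversal
  return { hits, subgraph };
}

const { hits, subgraph } = retrieve("sosa:Sensor");
// hits -> only ex:ais; subgraph -> the madeBySensor triple
```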
Real-World Example: Intelligence Analysis
Scenario: Maritime Intelligence
Analyzing intelligence reports about vessel tracking and maritime activity.
Ontology: SOSA/SSN Extended
@prefix sosa: <http://www.w3.org/ns/sosa/> .
@prefix intel: <http://example.org/intel#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
# Base SOSA classes reused: sosa:Sensor, sosa:Observation, sosa:ObservableProperty
# Extended intelligence classes
intel:IntelligenceReport rdfs:subClassOf sosa:Observation .
intel:MaritimeSensor rdfs:subClassOf sosa:Sensor .
intel:Vessel a owl:Class .
intel:Port a owl:Class .
# Properties
intel:trackingMethod rdfs:subPropertyOf sosa:usedProcedure .
intel:targetVessel a owl:ObjectProperty .
intel:operatedFrom a owl:ObjectProperty .
intel:locatedNear a owl:ObjectProperty .
Document: Intelligence Report
PHANTOM CARGO - Intelligence Report
HUMINT sources report a suspicious cargo vessel "PHANTOM CARGO"
tracked via AIS transponders in the South China Sea. Satellite
imagery from KEYHOLE-12 confirmed vessel position at coordinates
12.5°N, 109.3°E near Port of Sihanoukville, Cambodia.
Collection methods included:
- AIS maritime tracking (ELINT)
- KH-12 satellite reconnaissance (IMINT)
- Human intelligence from port authority contacts (HUMINT)
The vessel exhibited anomalous behavior characteristic of
sanctions evasion operations.
Extraction Results
With SOSA/SSN + Intelligence ontology, TrustGraph extracts:
Entities (typed by ontology):
{
entities: [
{
type: "intel:Vessel",
uri: "http://example.org/vessels/phantom-cargo",
properties: {
name: "PHANTOM CARGO",
behavior: "anomalous"
}
},
{
type: "intel:MaritimeSensor",
uri: "http://example.org/sensors/ais-tracking",
properties: {
sensorType: "AIS Transponder",
method: "ELINT"
}
},
{
type: "intel:MaritimeSensor",
uri: "http://example.org/sensors/kh12-satellite",
properties: {
sensorType: "KEYHOLE-12 Satellite",
method: "IMINT"
}
},
{
type: "sosa:Observation",
uri: "http://example.org/observations/position-001",
properties: {
latitude: "12.5°N",
longitude: "109.3°E",
timestamp: "2024-12-20T10:00:00Z"
}
},
{
type: "intel:Port",
uri: "http://example.org/locations/sihanoukville",
properties: {
name: "Port of Sihanoukville",
country: "Cambodia"
}
}
],
relationships: [
{
subject: "http://example.org/observations/position-001",
predicate: "sosa:madeBySensor",
object: "http://example.org/sensors/ais-tracking"
},
{
subject: "http://example.org/observations/position-001",
predicate: "intel:targetVessel",
object: "http://example.org/vessels/phantom-cargo"
},
{
subject: "http://example.org/vessels/phantom-cargo",
predicate: "intel:locatedNear",
object: "http://example.org/locations/sihanoukville"
}
]
}
Notice:
- All entities conform to ontology types
- All relationships use ontology-defined properties
- Untyped information (e.g., "sanctions evasion") is not extracted unless it is defined in the ontology
Query: "What intelligence collection methods were used?"
// Ontology RAG query
const results = await tg.invokeGraphRag({
"flow-id": "onto-rag",
"collection": "intelligence-reports",
"query": "What intelligence collection methods were used?"
});
// Response leverages typed entities:
// "Three intelligence collection methods were used:
// 1. ELINT (AIS maritime tracking via transponders)
// 2. IMINT (KEYHOLE-12 satellite reconnaissance)
// 3. HUMINT (human intelligence from port authority contacts)
//
// These sensors observed the vessel PHANTOM CARGO at position
// 12.5°N, 109.3°E near Port of Sihanoukville, Cambodia."
// Response cites:
// - Sensor entities (typed as intel:MaritimeSensor)
// - Observation entity (typed as sosa:Observation)
// - Location entity (typed as intel:Port)
Advantages of Ontology RAG
1. Very Precise Retrieval
Type constraints ensure exact matches:
// Query: "Find all Sensors"
// Ontology RAG returns only entities of type sosa:Sensor
// GraphRAG might return sensors, detectors, monitors (ambiguous)
// Query: "What did Sensor-X observe?"
// Ontology RAG follows sosa:madeBySensor relationships
// GraphRAG relies on text similarity (less precise)
2. Conformant Knowledge Graphs
All entities match ontology schema:
// Ontology defines:
// - Observation must have exactly one hasResult
// - Sensor can observe multiple ObservableProperty
// - FeatureOfInterest is connected via isFeatureOfInterestOf
// Extracted graph enforces these constraints
// Invalid structures rejected during extraction
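The "exactly one hasResult" constraint above is a cardinality check. A minimal sketch of how such a validator might work:

```javascript
// Cardinality validator sketch: an observation passes only if it has
// exactly one value for the given property.
function validateExactlyOne(observation, property) {
  const values = observation[property] ?? [];
  return Array.isArray(values) && values.length === 1;
}

validateExactlyOne({ hasResult: ["23.5"] }, "hasResult");   // true
validateExactlyOne({ hasResult: [] }, "hasResult");         // false (missing)
validateExactlyOne({ hasResult: ["a", "b"] }, "hasResult"); // false (too many)
```

Real OWL cardinality axioms (owl:qualifiedCardinality and friends) are richer, but the extraction-time effect is the same: non-conformant structures are flagged or rejected.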
3. Semantic Interoperability
Use standard ontologies for integration:
// Using FOAF (Friend of a Friend) ontology
// Entities conform to standard vocab
// Compatible with other FOAF-using systems
// Using Dublin Core for documents
// Metadata fields standardized (dc:creator, dc:date)
// Queryable with SPARQL across systems
4. Validation and Quality
Schema conformance ensures quality:
// Ontology defines cardinality constraints
// Ex: Observation must have exactly 1 result
// Ex: Sensor must observe at least 1 property
// Extraction validates constraints
// Incomplete extractions flagged or rejected
5. SPARQL Querying
Typed entities enable precise SPARQL queries:
# Query using ontology types
PREFIX sosa: <http://www.w3.org/ns/sosa/>
SELECT ?sensor ?property ?result
WHERE {
?obs a sosa:Observation ;
sosa:madeBySensor ?sensor ;
sosa:observedProperty ?property ;
sosa:hasResult ?result .
FILTER(?sensor = <http://example.org/sensors/ais-tracking>)
}
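The SPARQL pattern above can be approximated over an in-memory triple list, which shows what the typed join is actually doing. The data is illustrative:

```javascript
// In-memory approximation of the SPARQL query above: find observations
// made by a given sensor, then return their hasResult values.
const triples = [
  ["ex:obs1", "rdf:type", "sosa:Observation"],
  ["ex:obs1", "sosa:madeBySensor", "ex:ais-tracking"],
  ["ex:obs1", "sosa:hasResult", "12.5N 109.3E"],
];

function resultsForSensor(triples, sensor) {
  // FILTER(?sensor = ...) + sosa:madeBySensor pattern
  const observations = triples
    .filter(([, p, o]) => p === "sosa:madeBySensor" && o === sensor)
    .map(([s]) => s);
  // sosa:hasResult pattern joined on the observation
  return triples
    .filter(([s, p]) => observations.includes(s) && p === "sosa:hasResult")
    .map(([, , o]) => o);
}

resultsForSensor(triples, "ex:ais-tracking"); // ["12.5N 109.3E"]
```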
Tradeoffs and Limitations
1. Ontology Creation Complexity
Creating comprehensive ontologies is challenging:
# Ontology design requires:
# - Domain expertise
# - Understanding OWL semantics
# - Balancing expressiveness vs. complexity
# - Iteration and refinement
# Time investment: Days to weeks
# Alternative: Use existing ontologies (FOAF, SOSA, Dublin Core)
2. Token Costs During Extraction
Ontology-guided extraction uses LLM tokens:
// For each document chunk:
// 1. Ontology sent to LLM (context tokens)
// 2. LLM extracts conformant entities (generation tokens)
// 3. Validation and storage
// Token cost scales with:
// - Ontology size (larger = more context tokens)
// - Document complexity
// - Number of entity types
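Because the ontology is resent with every chunk, context cost scales multiplicatively. A back-of-the-envelope model — the 4-characters-per-token ratio and all inputs are rough assumptions, not measurements:

```javascript
// Rough token-cost model for the scaling factors above. Assumes a
// ~4 chars/token heuristic; real tokenizers vary by model and text.
function estimateTokens({ ontologyChars, chunkChars, chunkCount }) {
  const charsPerToken = 4;
  const perChunk = (ontologyChars + chunkChars) / charsPerToken;
  return Math.round(perChunk * chunkCount); // ontology resent every chunk
}

estimateTokens({ ontologyChars: 8000, chunkChars: 2000, chunkCount: 50 });
// -> 125000 context tokens
```

The takeaway: trimming the ontology to the classes a flow actually needs reduces cost on every single chunk.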
3. Rigidity
Schema enforcement limits flexibility:
// If entity doesn't match ontology types:
// - GraphRAG extracts it anyway
// - Ontology RAG ignores it
// Example: New entity type emerges
// GraphRAG: Automatically discovers
// Ontology RAG: Requires ontology update
4. Extraction Miss Rate
May miss entities not in ontology:
// Document mentions "drone surveillance"
// Ontology only defines "Satellite" and "AIS Sensor"
// "Drone" not extracted (not in ontology)
// Solution: Iterate ontology to add new types
// Or: Use GraphRAG for discovery, Ontology RAG for precision
Best Practices
1. Start with Existing Ontologies
# Popular ontologies by domain:
# - FOAF: Social networks and people
# - Dublin Core: Document metadata
# - SOSA/SSN: Sensors and observations
# - PROV-O: Provenance and data lineage
# - FIBO: Financial industry
# - SNOMED CT: Medical terminology
# Don't reinvent the wheel
# Extend existing ontologies for your domain
2. Keep Ontologies Focused
# Bad: Overly complex ontology
intel:Entity a owl:Class .
intel:PhysicalEntity rdfs:subClassOf intel:Entity .
intel:AbstractEntity rdfs:subClassOf intel:Entity .
intel:TemporalEntity rdfs:subClassOf intel:Entity .
# ... 50+ classes with deep hierarchies
# Good: Focused ontology
intel:Sensor a owl:Class .
intel:Observation a owl:Class .
intel:Location a owl:Class .
intel:Asset a owl:Class .
# 5-10 key classes, clear purpose
3. Combine with GraphRAG
Use both approaches on the same data:
// Discovery phase: Use GraphRAG
await tg.startLibraryProcessing({
"flow-id": "graph-rag",
// Discover what entities exist
});
// Precision phase: Use Ontology RAG
await tg.startLibraryProcessing({
"flow-id": "onto-rag",
// Extract with strict typing
});
// Query both for comprehensive results
4. Monitor and Iterate
# Use Grafana dashboards to monitor:
# - Extraction success rate
# - Entity type distribution
# - Missed entity patterns
# - LLM token consumption
# Iterate ontology based on:
# - Low extraction rates (ontology too restrictive)
# - Unexpected entity types in text
# - User query patterns
5. Use AI to Generate Ontologies
# Provide domain text to Claude/GPT-4
echo "Generate OWL ontology from this domain description:
[Your domain text here]
Include classes, properties, and hierarchies.
Output as Turtle format." | claude
# Review, refine, and import
Ontology RAG vs GraphRAG: Decision Guide
| Question | Choose GraphRAG | Choose Ontology RAG |
|---|---|---|
| Do you have existing ontologies? | No | Yes ✓ |
| Is type precision critical? | No | Yes ✓ |
| Will schema change frequently? | Yes ✓ | No |
| Is setup complexity acceptable? | No | Yes ✓ |
| Need SPARQL compatibility? | No | Yes ✓ |
| Exploratory analysis? | Yes ✓ | No |
| Regulated/compliance domain? | No | Yes ✓ |
Recommendation: Start with GraphRAG for exploration, add Ontology RAG for precision where types matter.
Related Concepts
- GraphRAG - Schema-free knowledge extraction
- Semantic Structures - Ontologies, schemas, taxonomies
- Semantic Web - RDF, OWL, SPARQL standards
- Ontology - Formal semantic models
- Context Engineering - Building optimal LLM context