
Ontology RAG: Schema-Driven Knowledge Extraction

Learn how Ontology RAG uses formal ontologies to extract structured, typed knowledge from unstructured text. Understand when to use schema-driven extraction vs. schema-free GraphRAG.

13 min read
Updated 12/24/2025
TrustGraph Team
#ontology-rag #rag #ontology #knowledge-graphs #owl

Ontology RAG is an advanced RAG technique that uses formal ontologies (OWL schemas) to guide the extraction of structured, typed knowledge from unstructured text. Unlike GraphRAG's schema-free approach, Ontology RAG enforces conformance to predefined types, properties, and relationships, producing highly structured, semantically rich Knowledge Graphs.

What is Ontology RAG?

Ontology RAG combines:

  • Formal ontologies (OWL definitions) defining types, properties, and relationships
  • Guided extraction using ontology definitions to discover entities
  • Knowledge Graphs storing conformant, typed entities and relationships
  • Vector search for semantic similarity entry points
  • Graph traversal for relationship-aware retrieval

The Ontology RAG Pipeline

1. Document Chunking
   ↓
2. Ontology Loading (OWL definitions imported)
   ↓
3. Knowledge Extraction (guided by ontology)
   ↓
4. Entity Embedding (vector representations)
   ↓
5. Graph Storage (conformant to ontology schema)
   ↓
6. Semantic Retrieval (vector search finds entry points)
   ↓
7. Graph Traversal (discover related entities)
   ↓
8. LLM Generation (using structured subgraph)
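
For orientation, here is a minimal sketch of the user-facing calls that drive this pipeline, using the client methods that appear later in this article (putConfigItem, startLibraryProcessing, invokeGraphRag); the flow, key, and collection identifiers are placeholders.

// Hedged sketch: user-facing calls behind the pipeline above.
// "onto-rag", "domain-ontology", "domain-docs", and owlDefinitions are placeholders.

// Make the ontology available to guide extraction (step 2)
await tg.putConfigItem({
  type: "ontology",
  key: "domain-ontology",
  ontology: owlDefinitions
});

// Trigger chunking, guided extraction, embedding, and storage (steps 1-5)
await tg.startLibraryProcessing({
  "flow-id": "onto-rag"
});

// Run typed retrieval, traversal, and generation (steps 6-8)
const answer = await tg.invokeGraphRag({
  "flow-id": "onto-rag",
  "collection": "domain-docs",
  "query": "What sensors were used to collect intelligence?"
});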

Ontology RAG vs GraphRAG vs Document RAG

| Aspect | Document RAG | GraphRAG | Ontology RAG |
|---|---|---|---|
| Retrieval Method | Vector similarity | Graph + vector | Graph + vector |
| Context Structure | Isolated chunks | Connected entities | Typed, conformant entities |
| Schema Requirements | None | None (schema-free) | OWL ontology required |
| Extraction Approach | N/A (no extraction) | Automatic discovery | Ontology-guided |
| Knowledge Validation | No validation | No validation | Schema conformance |
| Setup Complexity | Simple | Simple | Complex |
| Retrieval Precision | Low | High | Very High |
| Best For | Simple search | Complex relationships | Typed domains |

Key Difference: Schema-Free vs Schema-Driven

GraphRAG (Schema-Free):

// GraphRAG discovers entities automatically
// No predefined schema needed
await tg.startLibraryProcessing({
  "flow-id": "graph-rag",
  // Entities and relationships discovered automatically
  // Flexible, no constraints
});

Ontology RAG (Schema-Driven):

// Ontology RAG requires predefined ontology
// Extracts only entities that match ontology types
await tg.putConfigItem({
  type: "ontology",
  key: "domain-ontology",
  // OWL ontology defines:
  // - Valid entity types (Person, Organization, Event)
  // - Valid properties (hasName, locatedIn)
  // - Valid relationships (worksAt, participatedIn)
  ontology: owlDefinitions
});

await tg.startLibraryProcessing({
  "flow-id": "onto-rag",
  // Extraction conforms to ontology schema
});

When to Use Ontology RAG

Use Ontology RAG When:

Existing Ontologies Available

  • Domain has established ontologies (FOAF, SOSA/SSN, Dublin Core)
  • Industry standards exist (healthcare, cybersecurity, scientific domains)
  • Taxonomies and schemas already defined

Type Precision Required

  • Need strict entity typing (this IS a Sensor, not maybe a Device)
  • Regulatory compliance requires structured data
  • Data integration across systems with different formats

Complex Relationships with Types

  • Relationships need typing (observedBy, measuredAt, collectedBy)
  • Property constraints matter (temperature must be numeric)
  • Hierarchical relationships (subClassOf, subPropertyOf)

Knowledge Graph Conformance

  • Integration with semantic web systems
  • Need RDF/OWL compatibility
  • Query using SPARQL with typed patterns

Specialist Domains

  • Intelligence analysis
  • Cybersecurity (threat models, TTPs)
  • Scientific research (sensors, observations, measurements)
  • Legal documents (statutes, cases, citations)

Use GraphRAG Instead When:

Schema Definition is Prohibitively Complex

  • Domain is too diverse or undefined
  • Creating ontology costs more than benefit
  • Schema changes frequently

Schema-Free Flexibility Preferred

  • Exploratory data analysis
  • Rapidly changing data models
  • Unknown entity types

Simple Use Cases

  • Document RAG is sufficient for keyword search
  • No need for relationship understanding

Ontology Fundamentals

What is an OWL Ontology?

OWL (Web Ontology Language) defines:

  1. Classes - Types of things (Sensor, Observation, Location)
  2. Properties - Attributes and relationships (hasName, measuredAt, observedBy)
  3. Individuals - Specific instances (Sensor-123, Observation-456)
  4. Axioms - Rules and constraints (Observation must have exactly one Result)

Example: SOSA/SSN Ontology

SOSA (Sensor, Observation, Sample, and Actuator) is a W3C standard for sensor data:

@prefix sosa: <http://www.w3.org/ns/sosa/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Class definitions
sosa:Sensor a owl:Class ;
    rdfs:label "Sensor" ;
    rdfs:comment "Device, agent, or software that observes" .

sosa:Observation a owl:Class ;
    rdfs:label "Observation" ;
    rdfs:comment "Act of carrying out observation procedure" .

sosa:ObservableProperty a owl:Class ;
    rdfs:label "Observable Property" ;
    rdfs:comment "Quality that can be observed" .

# Property definitions
sosa:observedProperty a owl:ObjectProperty ;
    rdfs:domain sosa:Observation ;
    rdfs:range sosa:ObservableProperty .

sosa:madeBySensor a owl:ObjectProperty ;
    rdfs:domain sosa:Observation ;
    rdfs:range sosa:Sensor .

sosa:hasResult a owl:DatatypeProperty ;
    rdfs:domain sosa:Observation ;
    rdfs:range xsd:string .

TrustGraph Ontology Format

TrustGraph uses OWL ontologies converted to a TrustGraph-specific JSON format:

{
  "classes": [
    {
      "id": "http://www.w3.org/ns/sosa/Sensor",
      "label": "Sensor",
      "comment": "Device that observes properties"
    },
    {
      "id": "http://www.w3.org/ns/sosa/Observation",
      "label": "Observation",
      "comment": "Act of carrying out observation"
    }
  ],
  "properties": [
    {
      "id": "http://www.w3.org/ns/sosa/madeBySensor",
      "domain": "http://www.w3.org/ns/sosa/Observation",
      "range": "http://www.w3.org/ns/sosa/Sensor"
    }
  ]
}
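
The conversion from OWL/Turtle to this format can be scripted. Below is a minimal sketch using the N3.js parser (npm package n3); it mirrors the fields shown above (id, label, comment, domain, range), but check the exact field names expected by your TrustGraph release before relying on it.

// Hedged sketch: convert an OWL/Turtle ontology into the JSON shape above.
// Assumes the "n3" npm package; output fields mirror this section's example.
import { Parser } from "n3";
import { readFileSync } from "fs";

const RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
const OWL = "http://www.w3.org/2002/07/owl#";
const RDFS = "http://www.w3.org/2000/01/rdf-schema#";

const quads = new Parser().parse(readFileSync("sosa.ttl", "utf8"));

// Look up a single object value for a (subject, predicate) pair
const byPred = (s, p) =>
  quads.find(q => q.subject.value === s && q.predicate.value === p)?.object.value;

const classes = quads
  .filter(q => q.predicate.value === RDF_TYPE && q.object.value === OWL + "Class")
  .map(q => ({
    id: q.subject.value,
    label: byPred(q.subject.value, RDFS + "label"),
    comment: byPred(q.subject.value, RDFS + "comment")
  }));

const properties = quads
  .filter(q => q.predicate.value === RDF_TYPE &&
               (q.object.value === OWL + "ObjectProperty" ||
                q.object.value === OWL + "DatatypeProperty"))
  .map(q => ({
    id: q.subject.value,
    domain: byPred(q.subject.value, RDFS + "domain"),
    range: byPred(q.subject.value, RDFS + "range")
  }));

console.log(JSON.stringify({ classes, properties }, null, 2));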

Implementing Ontology RAG

Step 1: Define or Import Ontology

Option A: Use Existing Ontology

# Download standard ontology (e.g., SOSA/SSN)
curl -L -H "Accept: text/turtle" -o sosa.ttl https://www.w3.org/ns/sosa/

# Convert to TrustGraph format (if needed)
# Or use Workbench Ontology Editor to import

Option B: Create Custom Ontology

Use Workbench Ontology Editor to create domain-specific ontology:

@prefix intel: <http://example.org/intel#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Intelligence domain ontology
intel:IntelligenceReport a owl:Class .
intel:Sensor a owl:Class .
intel:Location a owl:Class .
intel:Asset a owl:Class .

intel:collectedBy a owl:ObjectProperty ;
    rdfs:domain intel:IntelligenceReport ;
    rdfs:range intel:Sensor .

intel:locatedAt a owl:ObjectProperty ;
    rdfs:domain intel:Asset ;
    rdfs:range intel:Location .

Option C: Generate with AI

Use Claude or GPT-4 to generate ontologies from domain text:

# Provide domain description to LLM
echo "Generate an OWL ontology for maritime tracking including:
- Vessels (ships, cargo carriers)
- Ports and locations
- Tracking sensors (AIS, radar, satellite)
- Observations (position, speed, heading)
- Intelligence collection methods" | claude

# Review and refine generated ontology

Step 2: Install Ontology in TrustGraph

# Install ontology as configuration item
cat domain-ontology.json | tg-put-config-item \
  --type ontology \
  --key my-domain-ontology \
  --stdin

The ontology becomes available to all Ontology RAG flows.

Step 3: Create Collection

# Create collection for documents
tg-set-collection \
  -n "Domain Documents" \
  -d "Documents for ontology-based extraction" \
  domain-docs

Step 4: Add Documents

# Add document to library
tg-add-library-document \
  --name "Intelligence Report 2024-001" \
  --description "Maritime tracking intelligence" \
  --tags 'intelligence,maritime,tracking' \
  --id https://example.org/reports/2024-001 \
  --kind text/plain \
  report-2024-001.txt

Step 5: Create Ontology RAG Flow

# Create ontology RAG flow
tg-start-flow \
  -n onto-rag \
  -i onto-rag \
  -d "Ontology RAG processing flow"

The onto-rag flow type uses installed ontologies to guide extraction.

Step 6: Process Documents

# Submit document for ontology-guided processing
tg-start-library-processing \
  --flow-id onto-rag \
  --document-id https://example.org/reports/2024-001 \
  --collection domain-docs \
  --processing-id urn:processing-001

What happens during processing:

  1. Document is chunked into segments
  2. Ontology is loaded into extraction context
  3. LLM extracts entities matching ontology classes:
    • Only extracts Sensors, Observations, Locations (as defined in ontology)
    • Ignores entities not in ontology
  4. Relationships are typed according to ontology properties
  5. Validation ensures conformance to the ontology schema (see the sketch after this list)
  6. Entities are embedded and stored in vector database
  7. Knowledge Graph is populated with conformant triples
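
As a conceptual illustration of the conformance check in step 5 (not TrustGraph's internal validator), extracted entities and relationships can be filtered against the class and property identifiers of the installed ontology. The entity and relationship shapes mirror the extraction example later in this article; the sketch assumes entity types and ontology class ids use the same identifier form.

// Hedged sketch of step 5: keep only ontology-conformant entities and triples.
// Conceptual only; not TrustGraph's actual validation logic.
function conformant(entities, relationships, ontology) {
  const classIds = new Set(ontology.classes.map(c => c.id));
  const propIds = new Set(ontology.properties.map(p => p.id));

  // Entities must have a type declared in the ontology
  const validEntities = entities.filter(e => classIds.has(e.type));
  const validUris = new Set(validEntities.map(e => e.uri));

  // Relationships must use an ontology property and connect conformant entities
  const validRelationships = relationships.filter(r =>
    propIds.has(r.predicate) &&
    validUris.has(r.subject) &&
    validUris.has(r.object)
  );

  return { entities: validEntities, relationships: validRelationships };
}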

Step 7: Query with Ontology RAG

# Query using Ontology RAG
tg-invoke-graph-rag \
  -f onto-rag \
  -C domain-docs \
  -q "What sensors were used to collect intelligence?"

Query process:

  1. Vector search finds semantically similar entities
  2. Type filtering ensures results match ontology types (only Sensor entities)
  3. Graph traversal follows typed relationships (collectedBy, observedBy)
  4. Subgraph extraction builds context from conformant entities (sketched below)
  5. LLM generation uses typed, structured subgraph
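
Steps 2-4 can be pictured as a typed, one-hop expansion over the stored graph. The sketch below is illustrative only (it is not the TrustGraph store API); entity and relationship shapes mirror the extraction example in the next section.

// Hedged sketch of query steps 2-4: type filtering, typed traversal, subgraph.
// Illustrative data shapes; not the actual retrieval implementation.
function buildContextSubgraph(entryPoints, relationships, ontology, allowedTypes) {
  const propIds = new Set(ontology.properties.map(p => p.id));

  // Step 2: type filtering - keep entry points of the requested ontology types
  const seeds = entryPoints.filter(e => allowedTypes.has(e.type));
  const seedUris = new Set(seeds.map(e => e.uri));

  // Step 3: graph traversal - one hop along ontology-defined relationships
  const edges = relationships.filter(r =>
    propIds.has(r.predicate) &&
    (seedUris.has(r.subject) || seedUris.has(r.object))
  );

  // Step 4: subgraph extraction - typed entities plus relationships for the LLM
  return { entities: seeds, relationships: edges };
}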

Real-World Example: Intelligence Analysis

Scenario: Maritime Intelligence

Analyzing intelligence reports about vessel tracking and maritime activity.

Ontology: SOSA/SSN Extended

@prefix sosa: <http://www.w3.org/ns/sosa/> .
@prefix intel: <http://example.org/intel#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Base SOSA classes (reused): sosa:Sensor, sosa:Observation, sosa:ObservableProperty

# Extended intelligence classes
intel:IntelligenceReport rdfs:subClassOf sosa:Observation .
intel:MaritimeSensor rdfs:subClassOf sosa:Sensor .
intel:Vessel a owl:Class .
intel:Port a owl:Class .

# Properties
intel:trackingMethod rdfs:subPropertyOf sosa:usedProcedure .
intel:targetVessel a owl:ObjectProperty .
intel:operatedFrom a owl:ObjectProperty .
intel:locatedNear a owl:ObjectProperty .

Document: Intelligence Report

PHANTOM CARGO - Intelligence Report

HUMINT sources report a suspicious cargo vessel "PHANTOM CARGO"
tracked via AIS transponders in the South China Sea. Satellite
imagery from KEYHOLE-12 confirmed vessel position at coordinates
12.5°N, 109.3°E near Port of Sihanoukville, Cambodia.

Collection methods included:
- AIS maritime tracking (ELINT)
- KH-12 satellite reconnaissance (IMINT)
- Human intelligence from port authority contacts (HUMINT)

The vessel exhibited anomalous behavior characteristic of
sanctions evasion operations.

Extraction Results

With SOSA/SSN + Intelligence ontology, TrustGraph extracts:

Entities (typed by ontology):

{
  entities: [
    {
      type: "intel:Vessel",
      uri: "http://example.org/vessels/phantom-cargo",
      properties: {
        name: "PHANTOM CARGO",
        behavior: "anomalous"
      }
    },
    {
      type: "intel:MaritimeSensor",
      uri: "http://example.org/sensors/ais-tracking",
      properties: {
        sensorType: "AIS Transponder",
        method: "ELINT"
      }
    },
    {
      type: "intel:MaritimeSensor",
      uri: "http://example.org/sensors/kh12-satellite",
      properties: {
        sensorType: "KEYHOLE-12 Satellite",
        method: "IMINT"
      }
    },
    {
      type: "sosa:Observation",
      uri: "http://example.org/observations/position-001",
      properties: {
        latitude: "12.5°N",
        longitude: "109.3°E",
        timestamp: "2024-12-20T10:00:00Z"
      }
    },
    {
      type: "intel:Port",
      uri: "http://example.org/locations/sihanoukville",
      properties: {
        name: "Port of Sihanoukville",
        country: "Cambodia"
      }
    }
  ],

  relationships: [
    {
      subject: "http://example.org/observations/position-001",
      predicate: "sosa:madeBySensor",
      object: "http://example.org/sensors/ais-tracking"
    },
    {
      subject: "http://example.org/observations/position-001",
      predicate: "intel:targetVessel",
      object: "http://example.org/vessels/phantom-cargo"
    },
    {
      subject: "http://example.org/vessels/phantom-cargo",
      predicate: "intel:locatedNear",
      object: "http://example.org/locations/sihanoukville"
    }
  ]
}

Notice:

  • All entities conform to ontology types
  • All relationships use ontology-defined properties
  • Information outside the ontology (e.g., "sanctions evasion") is not extracted

Query: "What intelligence collection methods were used?"

// Ontology RAG query
const results = await tg.invokeGraphRag({
  "flow-id": "onto-rag",
  "collection": "intelligence-reports",
  "query": "What intelligence collection methods were used?"
});

// Response leverages typed entities:
// "Three intelligence collection methods were used:
//  1. ELINT (AIS maritime tracking via transponders)
//  2. IMINT (KEYHOLE-12 satellite reconnaissance)
//  3. HUMINT (human intelligence from port authority contacts)
//
//  These sensors observed the vessel PHANTOM CARGO at position
//  12.5°N, 109.3°E near Port of Sihanoukville, Cambodia."

// Response cites:
// - Sensor entities (typed as intel:MaritimeSensor)
// - Observation entity (typed as sosa:Observation)
// - Location entity (typed as intel:Port)

Advantages of Ontology RAG

1. Very Precise Retrieval

Type constraints ensure exact matches:

// Query: "Find all Sensors"
// Ontology RAG returns only entities of type sosa:Sensor
// GraphRAG might return sensors, detectors, monitors (ambiguous)

// Query: "What did Sensor-X observe?"
// Ontology RAG follows sosa:madeBySensor relationships
// GraphRAG relies on text similarity (less precise)

2. Conformant Knowledge Graphs

All entities match ontology schema:

// Ontology defines:
// - Observation must have exactly one hasResult
// - Sensor can observe multiple ObservableProperty
// - FeatureOfInterest is connected via isFeatureOfInterestOf

// Extracted graph enforces these constraints
// Invalid structures rejected during extraction

3. Semantic Interoperability

Use standard ontologies for integration:

// Using FOAF (Friend of a Friend) ontology
// Entities conform to standard vocab
// Compatible with other FOAF-using systems

// Using Dublin Core for documents
// Metadata fields standardized (dc:creator, dc:date)
// Queryable with SPARQL across systems

4. Validation and Quality

Schema conformance ensures quality:

// Ontology defines cardinality constraints
// Ex: Observation must have exactly 1 result
// Ex: Sensor must observe at least 1 property

// Extraction validates constraints
// Incomplete extractions flagged or rejected

5. SPARQL Querying

Typed entities enable precise SPARQL queries:

# Query using ontology types
PREFIX sosa: <http://www.w3.org/ns/sosa/>

SELECT ?sensor ?property ?result
WHERE {
  ?obs a sosa:Observation ;
       sosa:madeBySensor ?sensor ;
       sosa:observedProperty ?property ;
       sosa:hasResult ?result .

  FILTER(?sensor = <http://example.org/sensors/ais-tracking>)
}

Tradeoffs and Limitations

1. Ontology Creation Complexity

Creating comprehensive ontologies is challenging:

# Ontology design requires:
# - Domain expertise
# - Understanding OWL semantics
# - Balancing expressiveness vs. complexity
# - Iteration and refinement

# Time investment: Days to weeks
# Alternative: Use existing ontologies (FOAF, SOSA, Dublin Core)

2. Token Costs During Extraction

Ontology-guided extraction uses LLM tokens:

// For each document chunk:
// 1. Ontology sent to LLM (context tokens)
// 2. LLM extracts conformant entities (generation tokens)
// 3. Validation and storage

// Token cost scales with:
// - Ontology size (larger = more context tokens)
// - Document complexity
// - Number of entity types
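
A back-of-envelope estimate makes the scaling concrete. The ~4 characters-per-token ratio and the assumption that the full ontology accompanies every chunk are simplifications, not measured TrustGraph behavior.

// Hedged sketch: rough context-token estimate for ontology-guided extraction.
// Assumes the serialized ontology is sent with every chunk and uses the common
// ~4 characters-per-token heuristic; actual costs depend on model and batching.
function estimateContextTokens(ontologyJson, chunks) {
  const approxTokens = text => Math.ceil(text.length / 4);
  const ontologyTokens = approxTokens(JSON.stringify(ontologyJson));
  return chunks.reduce(
    (total, chunk) => total + ontologyTokens + approxTokens(chunk),
    0
  );
}

// A 2,000-token ontology sent with 500 chunks adds ~1,000,000 context tokens
// before any chunk text or generation tokens are counted.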

3. Rigidity

Schema enforcement limits flexibility:

// If entity doesn't match ontology types:
// - GraphRAG extracts it anyway
// - Ontology RAG ignores it

// Example: New entity type emerges
// GraphRAG: Automatically discovers
// Ontology RAG: Requires ontology update

4. Extraction Miss Rate

May miss entities not in ontology:

// Document mentions "drone surveillance"
// Ontology only defines "Satellite" and "AIS Sensor"
// "Drone" not extracted (not in ontology)

// Solution: Iterate ontology to add new types
// Or: Use GraphRAG for discovery, Ontology RAG for precision

Best Practices

1. Start with Existing Ontologies

# Popular ontologies by domain:
# - FOAF: Social networks and people
# - Dublin Core: Document metadata
# - SOSA/SSN: Sensors and observations
# - PROV-O: Provenance and data lineage
# - FIBO: Financial industry
# - SNOMED CT: Medical terminology

# Don't reinvent the wheel
# Extend existing ontologies for your domain

2. Keep Ontologies Focused

# Bad: Overly complex ontology
intel:Entity a owl:Class .
intel:PhysicalEntity rdfs:subClassOf intel:Entity .
intel:AbstractEntity rdfs:subClassOf intel:Entity .
intel:TemporalEntity rdfs:subClassOf intel:Entity .
# ... 50+ classes with deep hierarchies

# Good: Focused ontology
intel:Sensor a owl:Class .
intel:Observation a owl:Class .
intel:Location a owl:Class .
intel:Asset a owl:Class .
# 5-10 key classes, clear purpose

3. Combine with GraphRAG

Use both approaches on the same data:

// Discovery phase: Use GraphRAG
await tg.startLibraryProcessing({
  "flow-id": "graph-rag",
  // Discover what entities exist
});

// Precision phase: Use Ontology RAG
await tg.startLibraryProcessing({
  "flow-id": "onto-rag",
  // Extract with strict typing
});

// Query both for comprehensive results

4. Monitor and Iterate

# Use Grafana dashboards to monitor:
# - Extraction success rate
# - Entity type distribution
# - Missed entity patterns
# - LLM token consumption

# Iterate ontology based on:
# - Low extraction rates (ontology too restrictive)
# - Unexpected entity types in text
# - User query patterns

5. Use AI to Generate Ontologies

# Provide domain text to Claude/GPT-4
echo "Generate OWL ontology from this domain description:
[Your domain text here]

Include classes, properties, and hierarchies.
Output as Turtle format." | claude

# Review, refine, and import

Ontology RAG vs GraphRAG: Decision Guide

| Question | Choose GraphRAG | Choose Ontology RAG |
|---|---|---|
| Do you have existing ontologies? | No | Yes |
| Is type precision critical? | No | Yes |
| Will schema change frequently? | Yes | No |
| Is setup complexity acceptable? | No | Yes |
| Need SPARQL compatibility? | No | Yes |
| Exploratory analysis? | Yes | No |
| Regulated/compliance domain? | No | Yes |

Recommendation: Start with GraphRAG for exploration, add Ontology RAG for precision where types matter.
