Ontology RAG: Schema-Driven Knowledge Extraction
Learn how Ontology RAG uses formal ontologies to extract structured, typed knowledge from unstructured text. Understand when to use schema-driven extraction vs. schema-free GraphRAG.
Ontology RAG is an advanced RAG technique that uses formal ontologies (OWL schemas) to guide the extraction of structured, typed knowledge from unstructured text. Unlike GraphRAG's schema-free approach, Ontology RAG enforces conformance to predefined types, properties, and relationships, producing highly structured, semantically rich Knowledge Graphs.
What is Ontology RAG?
Ontology RAG combines:
- Formal ontologies (OWL definitions) defining types, properties, and relationships
- Guided extraction using ontology definitions to discover entities
- Knowledge Graphs storing conformant, typed entities and relationships
- Vector search for semantic similarity entry points
- Graph traversal for relationship-aware retrieval
The Ontology RAG Pipeline
1. Document Chunking
↓
2. Ontology Loading (OWL definitions imported)
↓
3. Knowledge Extraction (guided by ontology)
↓
4. Entity Embedding (vector representations)
↓
5. Graph Storage (conformant to ontology schema)
↓
6. Semantic Retrieval (vector search finds entry points)
↓
7. Graph Traversal (discover related entities)
↓
8. LLM Generation (using structured subgraph)
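The stages above can be sketched as a chain of functions over a working state. This is purely illustrative — the stage names mirror the diagram, but none of these functions are TrustGraph APIs:

```javascript
// Illustrative pipeline sketch. Each stage takes the working state and
// returns an updated copy; the bodies are trivial placeholders.
const stages = {
  chunk: (s) => ({ ...s, chunks: [s.text] }),
  loadOntology: (s) => ({ ...s, ontology: { classes: ["Sensor"] } }),
  extract: (s) => ({
    ...s,
    // Keep only text spans that mention an ontology class.
    entities: s.chunks.flatMap((c) =>
      s.ontology.classes
        .filter((cls) => c.includes(cls))
        .map((cls) => ({ type: cls, text: c }))),
  }),
};

function runPipeline(text, order) {
  return order.reduce((state, name) => stages[name](state), { text });
}

const result = runPipeline("Sensor reading at dawn", [
  "chunk", "loadOntology", "extract",
]);
// result.entities -> [{ type: "Sensor", text: "Sensor reading at dawn" }]
```

The real pipeline embeds entities and writes to graph storage between extraction and retrieval; the reduce-over-stages shape is the point here.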
Ontology RAG vs GraphRAG vs Document RAG
| Aspect | Document RAG | GraphRAG | Ontology RAG |
|---|---|---|---|
| Retrieval Method | Vector similarity | Graph + vector | Graph + vector |
| Context Structure | Isolated chunks | Connected entities | Typed, conformant entities |
| Schema Requirements | None | None (schema-free) | OWL ontology required |
| Extraction Approach | N/A (no extraction) | Automatic discovery | Ontology-guided |
| Knowledge Validation | No validation | No validation | Schema conformance |
| Setup Complexity | Simple | Simple | Complex |
| Retrieval Precision | Low | High | Very High |
| Best For | Simple search | Complex relationships | Typed domains |
Key Difference: Schema-Free vs Schema-Driven
GraphRAG (Schema-Free):
// GraphRAG discovers entities automatically
// No predefined schema needed
await tg.startLibraryProcessing({
"flow-id": "graph-rag",
// Entities and relationships discovered automatically
// Flexible, no constraints
});
Ontology RAG (Schema-Driven):
// Ontology RAG requires predefined ontology
// Extracts only entities that match ontology types
await tg.putConfigItem({
type: "ontology",
key: "domain-ontology",
// OWL ontology defines:
// - Valid entity types (Person, Organization, Event)
// - Valid properties (hasName, locatedIn)
// - Valid relationships (worksAt, participatedIn)
ontology: owlDefinitions
});
await tg.startLibraryProcessing({
"flow-id": "onto-rag",
// Extraction conforms to ontology schema
});
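One way to picture the schema-driven difference is how the ontology constrains the extraction prompt itself. The sketch below assumes a simplified ontology shape and an invented prompt template — it is not TrustGraph's actual prompt:

```javascript
// Hypothetical sketch: turn ontology definitions into extraction
// constraints for the LLM. The prompt wording is illustrative only.
const ontology = {
  classes: [{ label: "Person" }, { label: "Organization" }, { label: "Event" }],
  properties: [{ label: "worksAt", domain: "Person", range: "Organization" }],
};

function buildExtractionPrompt(ontology, chunk) {
  const types = ontology.classes.map((c) => c.label).join(", ");
  const rels = ontology.properties
    .map((p) => `${p.label}(${p.domain} -> ${p.range})`)
    .join(", ");
  return [
    `Extract ONLY entities of these types: ${types}.`,
    `Use ONLY these relationships: ${rels}.`,
    `Ignore anything that does not fit the schema.`,
    `Text: ${chunk}`,
  ].join("\n");
}

const prompt = buildExtractionPrompt(ontology, "Alice works at Acme.");
```

A schema-free GraphRAG prompt would omit the type and relationship constraints entirely, which is exactly why its output is more flexible but less predictable.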
When to Use Ontology RAG
Use Ontology RAG When:
✅ Existing Ontologies Available
- Domain has established ontologies (FOAF, SOSA/SSN, Dublin Core)
- Industry standards exist (healthcare, cybersecurity, scientific domains)
- Taxonomies and schemas already defined
✅ Type Precision Required
- Need strict entity typing (this IS a Sensor, not maybe a Device)
- Regulatory compliance requires structured data
- Data integration across systems with different formats
✅ Complex Relationships with Types
- Relationships need typing (observedBy, measuredAt, collectedBy)
- Property constraints matter (temperature must be numeric)
- Hierarchical relationships (subClassOf, subPropertyOf)
✅ Knowledge Graph Conformance
- Integration with semantic web systems
- Need RDF/OWL compatibility
- Query using SPARQL with typed patterns
✅ Specialist Domains
- Intelligence analysis
- Cybersecurity (threat models, TTPs)
- Scientific research (sensors, observations, measurements)
- Legal documents (statutes, cases, citations)
Use GraphRAG Instead When:
❌ Schema Definition is Prohibitively Complex
- Domain is too diverse or undefined
- Creating ontology costs more than benefit
- Schema changes frequently
❌ Schema-Free Flexibility Preferred
- Exploratory data analysis
- Rapidly changing data models
- Unknown entity types
❌ Simple Use Cases
- Document RAG sufficient for keyword search
- No need for relationship understanding
Ontology Fundamentals
What is an OWL Ontology?
OWL (Web Ontology Language) defines:
- Classes - Types of things (Sensor, Observation, Location)
- Properties - Attributes and relationships (hasName, measuredAt, observedBy)
- Individuals - Specific instances (Sensor-123, Observation-456)
- Axioms - Rules and constraints (Observation must have exactly one Result)
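The four building blocks can be held in a minimal in-memory model. This is illustrative only — real OWL uses full IRIs and far richer axiom types:

```javascript
// Minimal stand-in for the four OWL building blocks listed above.
const ontology = {
  classes: ["Sensor", "Observation", "Location"],
  properties: { madeBySensor: { domain: "Observation", range: "Sensor" } },
  individuals: { "Sensor-123": "Sensor", "Observation-456": "Observation" },
  axioms: [{ kind: "exactly-one", onClass: "Observation", property: "hasResult" }],
};

// Tiny check: is an individual's asserted type a known class?
function isWellTyped(ontology, individual) {
  return ontology.classes.includes(ontology.individuals[individual]);
}
```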
Example: SOSA/SSN Ontology
SOSA (Sensor, Observation, Sample, and Actuator) is a W3C standard for sensor data:
@prefix sosa: <http://www.w3.org/ns/sosa/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
# Class definitions
sosa:Sensor a owl:Class ;
rdfs:label "Sensor" ;
rdfs:comment "Device, agent, or software that observes" .
sosa:Observation a owl:Class ;
rdfs:label "Observation" ;
rdfs:comment "Act of carrying out observation procedure" .
sosa:ObservableProperty a owl:Class ;
rdfs:label "Observable Property" ;
rdfs:comment "Quality that can be observed" .
# Property definitions
sosa:observedProperty a owl:ObjectProperty ;
rdfs:domain sosa:Observation ;
rdfs:range sosa:ObservableProperty .
sosa:madeBySensor a owl:ObjectProperty ;
rdfs:domain sosa:Observation ;
rdfs:range sosa:Sensor .
sosa:hasResult a owl:DatatypeProperty ;
rdfs:domain sosa:Observation ;
rdfs:range xsd:string .
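The domain and range declarations above are what make extraction checkable. A sketch of that check, with compact prefixed names standing in for full IRIs:

```javascript
// Domain/range schema for the three SOSA properties defined above.
const propertySchema = {
  "sosa:observedProperty": { domain: "sosa:Observation", range: "sosa:ObservableProperty" },
  "sosa:madeBySensor": { domain: "sosa:Observation", range: "sosa:Sensor" },
  "sosa:hasResult": { domain: "sosa:Observation", range: "xsd:string" },
};

// A triple is conformant only if its subject and object types match the
// property's declared domain and range.
function checkTriple(schema, subjType, predicate, objType) {
  const def = schema[predicate];
  return !!def && def.domain === subjType && def.range === objType;
}

checkTriple(propertySchema, "sosa:Observation", "sosa:madeBySensor", "sosa:Sensor");
// -> true
checkTriple(propertySchema, "sosa:Sensor", "sosa:madeBySensor", "sosa:Observation");
// -> false (domain and range reversed)
```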
TrustGraph Ontology Format
TrustGraph uses OWL ontologies converted to a proprietary JSON format:
{
"classes": [
{
"id": "http://www.w3.org/ns/sosa/Sensor",
"label": "Sensor",
"comment": "Device that observes properties"
},
{
"id": "http://www.w3.org/ns/sosa/Observation",
"label": "Observation",
"comment": "Act of carrying out observation"
}
],
"properties": [
{
"id": "http://www.w3.org/ns/sosa/madeBySensor",
"domain": "http://www.w3.org/ns/sosa/Observation",
"range": "http://www.w3.org/ns/sosa/Sensor"
}
]
}
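A converter from OWL facts to this JSON shape is mostly a grouping exercise. The sketch below assumes the triples are already parsed (a real converter would parse Turtle first) and omits `label`/`comment` for brevity:

```javascript
// Hypothetical converter sketch: pre-parsed (subject, predicate, object)
// triples in, TrustGraph-style { classes, properties } JSON out.
function upsert(list, id) {
  let entry = list.find((x) => x.id === id);
  if (!entry) { entry = { id }; list.push(entry); }
  return entry;
}

function toTrustGraphFormat(triples) {
  const classes = [], properties = [];
  for (const [s, p, o] of triples) {
    if (p === "rdf:type" && o === "owl:Class") classes.push({ id: s });
    if (p === "rdfs:domain") upsert(properties, s).domain = o;
    if (p === "rdfs:range") upsert(properties, s).range = o;
  }
  return { classes, properties };
}

const out = toTrustGraphFormat([
  ["sosa:Sensor", "rdf:type", "owl:Class"],
  ["sosa:madeBySensor", "rdfs:domain", "sosa:Observation"],
  ["sosa:madeBySensor", "rdfs:range", "sosa:Sensor"],
]);
// out.classes -> [{ id: "sosa:Sensor" }]
// out.properties -> [{ id: "sosa:madeBySensor",
//                      domain: "sosa:Observation", range: "sosa:Sensor" }]
```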
Implementing Ontology RAG
Step 1: Define or Import Ontology
Option A: Use Existing Ontology
# Download standard ontology (e.g., SOSA/SSN)
# The W3C namespace content-negotiates, so request Turtle explicitly
curl -H "Accept: text/turtle" -o sosa.ttl https://www.w3.org/ns/sosa/
# Convert to TrustGraph format (if needed)
# Or use Workbench Ontology Editor to import
Option B: Create Custom Ontology
Use Workbench Ontology Editor to create domain-specific ontology:
@prefix intel: <http://example.org/intel#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
# Intelligence domain ontology
intel:IntelligenceReport a owl:Class .
intel:Sensor a owl:Class .
intel:Location a owl:Class .
intel:Asset a owl:Class .
intel:collectedBy a owl:ObjectProperty ;
rdfs:domain intel:IntelligenceReport ;
rdfs:range intel:Sensor .
intel:locatedAt a owl:ObjectProperty ;
rdfs:domain intel:Asset ;
rdfs:range intel:Location .
Option C: Generate with AI
Use Claude or GPT-4 to generate ontologies from domain text:
# Provide domain description to LLM
echo "Generate an OWL ontology for maritime tracking including:
- Vessels (ships, cargo carriers)
- Ports and locations
- Tracking sensors (AIS, radar, satellite)
- Observations (position, speed, heading)
- Intelligence collection methods" | claude
# Review and refine generated ontology
Step 2: Install Ontology in TrustGraph
# Install ontology as configuration item
cat domain-ontology.json | tg-put-config-item \
--type ontology \
--key my-domain-ontology \
--stdin
The ontology becomes available to all Ontology RAG flows.
Step 3: Create Collection
# Create collection for documents
tg-set-collection \
-n "Domain Documents" \
-d "Documents for ontology-based extraction" \
domain-docs
Step 4: Add Documents
# Add document to library
tg-add-library-document \
--name "Intelligence Report 2024-001" \
--description "Maritime tracking intelligence" \
--tags 'intelligence,maritime,tracking' \
--id https://example.org/reports/2024-001 \
--kind text/plain \
report-2024-001.txt
Step 5: Create Ontology RAG Flow
# Create ontology RAG flow
tg-start-flow \
-n onto-rag \
-i onto-rag \
-d "Ontology RAG processing flow"
The onto-rag flow type uses installed ontologies to guide extraction.
Step 6: Process Documents
# Submit document for ontology-guided processing
tg-start-library-processing \
--flow-id onto-rag \
--document-id https://example.org/reports/2024-001 \
--collection domain-docs \
--processing-id urn:processing-001
What happens during processing:
- Document is chunked into segments
- Ontology is loaded into extraction context
- LLM extracts entities matching ontology classes:
- Only extracts Sensors, Observations, Locations (as defined in ontology)
- Ignores entities not in ontology
- Relationships are typed according to ontology properties
- Validation ensures conformance to ontology schema
- Entities are embedded and stored in vector database
- Knowledge Graph is populated with conformant triples
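The type-filtering step above is the heart of ontology-guided extraction. A sketch, with hard-coded candidates standing in for raw LLM output:

```javascript
// Sketch of the conformance filter: keep only candidates whose type is
// defined in the ontology, discard everything else.
const ontologyTypes = new Set(["Sensor", "Observation", "Location"]);

function filterConformant(candidates, types) {
  const kept = [], dropped = [];
  for (const c of candidates) (types.has(c.type) ? kept : dropped).push(c);
  return { kept, dropped };
}

const { kept, dropped } = filterConformant([
  { type: "Sensor", name: "AIS transponder" },
  { type: "Rumor", name: "dockside gossip" }, // not in ontology -> ignored
], ontologyTypes);
// kept -> the Sensor; dropped -> the Rumor
```

In practice TrustGraph also validates relationships against the ontology's property definitions, not just entity types.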
Step 7: Query with Ontology RAG
# Query using Ontology RAG
tg-invoke-graph-rag \
-f onto-rag \
-C domain-docs \
-q "What sensors were used to collect intelligence?"
Query process:
- Vector search finds semantically similar entities
- Type filtering ensures results match ontology types (only Sensor entities)
- Graph traversal follows typed relationships (collectedBy, observedBy)
- Subgraph extraction builds context from conformant entities
- LLM generation uses typed, structured subgraph
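The retrieval steps above can be sketched end to end. The data and scores are invented; a real query would hit the vector database and graph store:

```javascript
// Retrieval sketch: mock vector-search hits, a type filter, then one hop
// of typed graph traversal to collect the subgraph. Illustrative only.
const entities = [
  { uri: "ex:ais", type: "sosa:Sensor", score: 0.91 },
  { uri: "ex:note", type: "ex:Memo", score: 0.88 }, // wrong type, filtered
];
const triples = [
  ["ex:obs1", "sosa:madeBySensor", "ex:ais"],
];

function retrieve(wantedType) {
  const hits = entities
    .filter((e) => e.type === wantedType)   // type filtering
    .sort((a, b) => b.score - a.score);     // vector-score ordering
  const subgraph = triples.filter(([s, , o]) =>
    hits.some((h) => h.uri === s || h.uri === o)); // one-hop traversal
  return { hits, subgraph };
}

const { hits, subgraph } = retrieve("sosa:Sensor");
// hits -> only ex:ais; subgraph -> the madeBySensor triple
```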
Real-World Example: Intelligence Analysis
Scenario: Maritime Intelligence
Analyzing intelligence reports about vessel tracking and maritime activity.
Ontology: SOSA/SSN Extended
@prefix sosa: <http://www.w3.org/ns/sosa/> .
@prefix intel: <http://example.org/intel#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
# Base SOSA classes reused: sosa:Sensor, sosa:Observation, sosa:ObservableProperty
# Extended intelligence classes
intel:IntelligenceReport rdfs:subClassOf sosa:Observation .
intel:MaritimeSensor rdfs:subClassOf sosa:Sensor .
intel:Vessel a owl:Class .
intel:Port a owl:Class .
# Properties
intel:trackingMethod rdfs:subPropertyOf sosa:usedProcedure .
intel:targetVessel a owl:ObjectProperty .
intel:operatedFrom a owl:ObjectProperty .
intel:locatedNear a owl:ObjectProperty .
Document: Intelligence Report
PHANTOM CARGO - Intelligence Report
HUMINT sources report a suspicious cargo vessel "PHANTOM CARGO"
tracked via AIS transponders in the South China Sea. Satellite
imagery from KEYHOLE-12 confirmed vessel position at coordinates
12.5°N, 109.3°E near Port of Sihanoukville, Cambodia.
Collection methods included:
- AIS maritime tracking (ELINT)
- KH-12 satellite reconnaissance (IMINT)
- Human intelligence from port authority contacts (HUMINT)
The vessel exhibited anomalous behavior characteristic of
sanctions evasion operations.
Extraction Results
With SOSA/SSN + Intelligence ontology, TrustGraph extracts:
Entities (typed by ontology):
{
entities: [
{
type: "intel:Vessel",
uri: "http://example.org/vessels/phantom-cargo",
properties: {
name: "PHANTOM CARGO",
behavior: "anomalous"
}
},
{
type: "intel:MaritimeSensor",
uri: "http://example.org/sensors/ais-tracking",
properties: {
sensorType: "AIS Transponder",
method: "ELINT"
}
},
{
type: "intel:MaritimeSensor",
uri: "http://example.org/sensors/kh12-satellite",
properties: {
sensorType: "KEYHOLE-12 Satellite",
method: "IMINT"
}
},
{
type: "sosa:Observation",
uri: "http://example.org/observations/position-001",
properties: {
latitude: "12.5°N",
longitude: "109.3°E",
timestamp: "2024-12-20T10:00:00Z"
}
},
{
type: "intel:Port",
uri: "http://example.org/locations/sihanoukville",
properties: {
name: "Port of Sihanoukville",
country: "Cambodia"
}
}
],
relationships: [
{
subject: "http://example.org/observations/position-001",
predicate: "sosa:madeBySensor",
object: "http://example.org/sensors/ais-tracking"
},
{
subject: "http://example.org/observations/position-001",
predicate: "intel:targetVessel",
object: "http://example.org/vessels/phantom-cargo"
},
{
subject: "http://example.org/vessels/phantom-cargo",
predicate: "intel:locatedNear",
object: "http://example.org/locations/sihanoukville"
}
]
}
Notice:
- All entities conform to ontology types
- All relationships use ontology-defined properties
- Untyped information (e.g., "sanctions evasion") is not extracted unless it is defined in the ontology
Query: "What intelligence collection methods were used?"
// Ontology RAG query
const results = await tg.invokeGraphRag({
"flow-id": "onto-rag",
"collection": "intelligence-reports",
"query": "What intelligence collection methods were used?"
});
// Response leverages typed entities:
// "Three intelligence collection methods were used:
// 1. ELINT (AIS maritime tracking via transponders)
// 2. IMINT (KEYHOLE-12 satellite reconnaissance)
// 3. HUMINT (human intelligence from port authority contacts)
//
// These sensors observed the vessel PHANTOM CARGO at position
// 12.5°N, 109.3°E near Port of Sihanoukville, Cambodia."
// Response cites:
// - Sensor entities (typed as intel:MaritimeSensor)
// - Observation entity (typed as sosa:Observation)
// - Location entity (typed as intel:Port)
Advantages of Ontology RAG
1. Very Precise Retrieval
Type constraints ensure exact matches:
// Query: "Find all Sensors"
// Ontology RAG returns only entities of type sosa:Sensor
// GraphRAG might return sensors, detectors, monitors (ambiguous)
// Query: "What did Sensor-X observe?"
// Ontology RAG follows sosa:madeBySensor relationships
// GraphRAG relies on text similarity (less precise)
2. Conformant Knowledge Graphs
All entities match ontology schema:
// Ontology defines:
// - Observation must have exactly one hasResult
// - Sensor can observe multiple ObservableProperty
// - FeatureOfInterest is connected via isFeatureOfInterestOf
// Extracted graph enforces these constraints
// Invalid structures rejected during extraction
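The "exactly one hasResult" constraint above is a cardinality check. A minimal sketch of how such a validator might work:

```javascript
// Cardinality validator sketch: an observation passes only if it has
// exactly one value for the given property.
function validateExactlyOne(observation, property) {
  const values = observation[property] ?? [];
  return Array.isArray(values) && values.length === 1;
}

validateExactlyOne({ hasResult: ["23.5"] }, "hasResult");   // true
validateExactlyOne({ hasResult: [] }, "hasResult");         // false (missing)
validateExactlyOne({ hasResult: ["a", "b"] }, "hasResult"); // false (too many)
```

Real OWL cardinality axioms (owl:qualifiedCardinality and friends) are richer, but the extraction-time effect is the same: non-conformant structures are flagged or rejected.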
3. Semantic Interoperability
Use standard ontologies for integration:
// Using FOAF (Friend of a Friend) ontology
// Entities conform to standard vocab
// Compatible with other FOAF-using systems
// Using Dublin Core for documents
// Metadata fields standardized (dc:creator, dc:date)
// Queryable with SPARQL across systems
4. Validation and Quality
Schema conformance ensures quality:
// Ontology defines cardinality constraints
// Ex: Observation must have exactly 1 result
// Ex: Sensor must observe at least 1 property
// Extraction validates constraints
// Incomplete extractions flagged or rejected
5. SPARQL Querying
Typed entities enable precise SPARQL queries:
# Query using ontology types
PREFIX sosa: <http://www.w3.org/ns/sosa/>
SELECT ?sensor ?property ?result
WHERE {
?obs a sosa:Observation ;
sosa:madeBySensor ?sensor ;
sosa:observedProperty ?property ;
sosa:hasResult ?result .
FILTER(?sensor = <http://example.org/sensors/ais-tracking>)
}
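The SPARQL pattern above can be approximated over an in-memory triple list, which shows what the typed join is actually doing. The data is illustrative:

```javascript
// In-memory approximation of the SPARQL query above: find observations
// made by a given sensor, then return their hasResult values.
const triples = [
  ["ex:obs1", "rdf:type", "sosa:Observation"],
  ["ex:obs1", "sosa:madeBySensor", "ex:ais-tracking"],
  ["ex:obs1", "sosa:hasResult", "12.5N 109.3E"],
];

function resultsForSensor(triples, sensor) {
  // FILTER(?sensor = ...) + sosa:madeBySensor pattern
  const observations = triples
    .filter(([, p, o]) => p === "sosa:madeBySensor" && o === sensor)
    .map(([s]) => s);
  // sosa:hasResult pattern joined on the observation
  return triples
    .filter(([s, p]) => observations.includes(s) && p === "sosa:hasResult")
    .map(([, , o]) => o);
}

resultsForSensor(triples, "ex:ais-tracking"); // ["12.5N 109.3E"]
```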
Tradeoffs and Limitations
1. Ontology Creation Complexity
Creating comprehensive ontologies is challenging:
# Ontology design requires:
# - Domain expertise
# - Understanding OWL semantics
# - Balancing expressiveness vs. complexity
# - Iteration and refinement
# Time investment: Days to weeks
# Alternative: Use existing ontologies (FOAF, SOSA, Dublin Core)
2. Token Costs During Extraction
Ontology-guided extraction uses LLM tokens:
// For each document chunk:
// 1. Ontology sent to LLM (context tokens)
// 2. LLM extracts conformant entities (generation tokens)
// 3. Validation and storage
// Token cost scales with:
// - Ontology size (larger = more context tokens)
// - Document complexity
// - Number of entity types
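Because the ontology is resent with every chunk, context cost scales multiplicatively. A back-of-the-envelope model — the 4-characters-per-token ratio and all inputs are rough assumptions, not measurements:

```javascript
// Rough token-cost model for the scaling factors above. Assumes a
// ~4 chars/token heuristic; real tokenizers vary by model and text.
function estimateTokens({ ontologyChars, chunkChars, chunkCount }) {
  const charsPerToken = 4;
  const perChunk = (ontologyChars + chunkChars) / charsPerToken;
  return Math.round(perChunk * chunkCount); // ontology resent every chunk
}

estimateTokens({ ontologyChars: 8000, chunkChars: 2000, chunkCount: 50 });
// -> 125000 context tokens
```

The takeaway: trimming the ontology to the classes a flow actually needs reduces cost on every single chunk.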
3. Rigidity
Schema enforcement limits flexibility:
// If entity doesn't match ontology types:
// - GraphRAG extracts it anyway
// - Ontology RAG ignores it
// Example: New entity type emerges
// GraphRAG: Automatically discovers
// Ontology RAG: Requires ontology update
4. Extraction Miss Rate
May miss entities not in ontology:
// Document mentions "drone surveillance"
// Ontology only defines "Satellite" and "AIS Sensor"
// "Drone" not extracted (not in ontology)
// Solution: Iterate ontology to add new types
// Or: Use GraphRAG for discovery, Ontology RAG for precision
Best Practices
1. Start with Existing Ontologies
# Popular ontologies by domain:
# - FOAF: Social networks and people
# - Dublin Core: Document metadata
# - SOSA/SSN: Sensors and observations
# - PROV-O: Provenance and data lineage
# - FIBO: Financial industry
# - SNOMED CT: Medical terminology
# Don't reinvent the wheel
# Extend existing ontologies for your domain
2. Keep Ontologies Focused
# Bad: Overly complex ontology
intel:Entity a owl:Class .
intel:PhysicalEntity rdfs:subClassOf intel:Entity .
intel:AbstractEntity rdfs:subClassOf intel:Entity .
intel:TemporalEntity rdfs:subClassOf intel:Entity .
# ... 50+ classes with deep hierarchies
# Good: Focused ontology
intel:Sensor a owl:Class .
intel:Observation a owl:Class .
intel:Location a owl:Class .
intel:Asset a owl:Class .
# 5-10 key classes, clear purpose
3. Combine with GraphRAG
Use both approaches on the same data:
// Discovery phase: Use GraphRAG
await tg.startLibraryProcessing({
"flow-id": "graph-rag",
// Discover what entities exist
});
// Precision phase: Use Ontology RAG
await tg.startLibraryProcessing({
"flow-id": "onto-rag",
// Extract with strict typing
});
// Query both for comprehensive results
4. Monitor and Iterate
# Use Grafana dashboards to monitor:
# - Extraction success rate
# - Entity type distribution
# - Missed entity patterns
# - LLM token consumption
# Iterate ontology based on:
# - Low extraction rates (ontology too restrictive)
# - Unexpected entity types in text
# - User query patterns
5. Use AI to Generate Ontologies
# Provide domain text to Claude/GPT-4
echo "Generate OWL ontology from this domain description:
[Your domain text here]
Include classes, properties, and hierarchies.
Output as Turtle format." | claude
# Review, refine, and import
Ontology RAG vs GraphRAG: Decision Guide
| Question | Choose GraphRAG | Choose Ontology RAG |
|---|---|---|
| Do you have existing ontologies? | No | Yes ✓ |
| Is type precision critical? | No | Yes ✓ |
| Will schema change frequently? | Yes ✓ | No |
| Is setup complexity acceptable? | No | Yes ✓ |
| Need SPARQL compatibility? | No | Yes ✓ |
| Exploratory analysis? | Yes ✓ | No |
| Regulated/compliance domain? | No | Yes ✓ |
Recommendation: Start with GraphRAG for exploration, add Ontology RAG for precision where types matter.
Related Concepts
- GraphRAG - Schema-free knowledge extraction
- Semantic Structures - Ontologies, schemas, taxonomies
- Semantic Web - RDF, OWL, SPARQL standards
- Ontology - Formal semantic models
- Context Engineering - Building optimal LLM context