
Ontologies and Knowledge Graphs

A knowledge graph stores facts as three part sentences. An ontology is the rulebook that says which nouns and which verbs are allowed, what they mean, and what a machine is permitted to conclude from them. Everything below is built around a single worked example from grade 11 and 12 science.

Subject: ex:RadioactiveDecay (an individual) → Predicate: ex:isModelledBy (the verb) → Object: ex:ExponentialODE (an individual)

That is one triple. One edge in the graph. A knowledge graph with a billion facts is just a billion of these, and nothing else.
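To make "just a billion of these" concrete, here is a minimal sketch in plain Python. It assumes nothing beyond the standard library and is not how a real triple store is built; the helper name `objects` is made up for this illustration.

```python
# A graph is a set of (subject, predicate, object) tuples and nothing else.
graph = {
    ("ex:RadioactiveDecay", "ex:isModelledBy", "ex:ExponentialODE"),
}


def objects(graph, subject, predicate):
    """Return every object reachable from subject along predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


# Adding a fact is adding a tuple. A set silently ignores exact duplicates,
# which matches RDF: stating the same triple twice changes nothing.
graph.add(("ex:RadioactiveDecay", "ex:isModelledBy", "ex:ExponentialODE"))

print(objects(graph, "ex:RadioactiveDecay", "ex:isModelledBy"))
```

Everything later on this page, from reasoning to querying, is some operation over a set shaped like this one.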

Classes are nouns, properties are verbs

Almost every formal term in this field has a plain English counterpart.

Everyday word	Formal term	What it means	From our example
Noun, a kind of thing	`owl:Class`	A set of things	`Process`, `Function`, `Quantity`
Proper noun	Individual, instance	One member of a class	`RadioactiveDecay`, `Carbon14`
Verb	Property, predicate	The named edge between two things	`isModelledBy`, `hasHalfLife`
Verb joining two things	Object property	Individual to individual	`RadioactiveDecay isModelledBy ExponentialODE`
Adjective, attribute	Datatype property	Individual to a raw value	`Carbon14 hasHalfLife 5730`
Raw value	Literal	A string, number or date with a type	`"5730"^^xsd:decimal`
One sentence	Triple, statement	Subject plus predicate plus object	one edge in the graph
A rule about sentences	Axiom	A constraint or a definition	every process sits in exactly one subject
The name of anything	IRI	A globally unique web identifier	`http://example.org/sci#Carbon14`

Class

A set, not a thing

ex:Process is not any particular process. It is the set of all of them. Membership can be listed by hand, or computed by a reasoner from a condition.

Individual

One named member

ex:RadioactiveDecay is a single thing in the world that the graph talks about. It gets an IRI so that anyone else on the web can point at the same thing.

Object property

The verb that joins

ex:isModelledBy connects a process to a mathematical object. Give it a domain and a range and the graph starts catching its own mistakes.

Datatype property

The verb that measures

ex:hasHalfLife connects a thing to a plain number. The value is a literal, so it is the end of the line. Nothing hangs off it.

Axiom

A rule, stated once

Not data. A statement about the shape of the world, such as: a process is never a mathematical object. Reasoners work by chasing the consequences of these.

IRI

A globally unique name

Two graphs built on different continents can agree they mean the same carbon 14 because they use the same identifier. This is the entire point of the web part of the semantic web.
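A short name such as ex:Carbon14 is only an abbreviation for the full IRI. The sketch below shows the expansion in plain Python, assuming the ex: namespace from the table above; `expand` is a hypothetical helper, not part of any RDF library.

```python
# Prefix table: short name -> namespace IRI.
PREFIXES = {
    "ex": "http://example.org/sci#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(short_name):
    """Turn 'prefix:local' into a full IRI using the prefix table."""
    prefix, _, local = short_name.partition(":")
    return PREFIXES[prefix] + local


print(expand("ex:Carbon14"))
```

Two graphs agree on an entity when the expanded IRIs are equal, whatever prefixes each author happened to choose.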

T-Box, the terminology

The schema. Classes, properties and axioms. Small, hand crafted, changes rarely. This is the ontology.

ex:Process is a class
ex:isModelledBy runs from a process to a mathematical object
a process belongs to at most one subject

A-Box, the assertions

The data. Individuals and the triples about them. Large, often machine generated, changes constantly. This is the bulk of the knowledge graph.

ex:RadioactiveDecay is a physical process
ex:RadioactiveDecay isModelledBy ex:ExponentialODE
ex:Carbon14 hasHalfLife 5730

So what is an ontology derived knowledge graph? It simply means the A-Box was built to conform to a T-Box that already existed, rather than a schema being reverse engineered out of messy data afterwards. The rulebook came first.

Each layer adds one new power

RDF, RDFS, OWL, SPARQL and SHACL are not competitors. They stack. Read from the bottom up and every layer answers a question the one below it could not.


RDF

The data model. Everything is a triple. There is no table, no document, no nesting. Serialised as Turtle, N-Triples, JSON-LD or RDF/XML, all the same graph underneath.

RDFS

Adds hierarchy: subClassOf, subPropertyOf, domain, range, label. Enough for a taxonomy, not enough for a theory.

OWL

Adds description logic. You can define a class by a condition instead of listing members, and a reasoner will then find facts nobody typed in.

SPARQL

The query language. If SQL matches rows against a schema, SPARQL matches subgraphs against a pattern.

SHACL

Quality control. OWL says what may be inferred, SHACL says what counts as a valid record. Different jobs, often confused.

IRIs

Names that work across organisations. Without them, merging two graphs is string matching and guesswork.

The class hierarchy of our example

Solid grey lines are rdfs:subClassOf. The dashed red line is an axiom you must write by hand: a process can never also be a mathematical object.

Why the dashed line matters. Nothing in RDF assumes two classes are separate. If you do not state disjointness, a reasoner is perfectly happy to believe that ex:BacterialGrowth and ex:ExponentialFunction might be the same thing. An ontology with no disjointness axioms can still add facts, but it can rule almost nothing out, so most modelling mistakes pass unnoticed.
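The effect of the dashed line can be sketched in a few lines of Python. This is an illustration only, assuming a single parent per class as in the example hierarchy; `ancestors` and `clash` are hypothetical helpers and a real reasoner does far more.

```python
# rdfs:subClassOf edges from the example, child -> parent.
SUBCLASS_OF = {
    "ex:Process": "ex:Entity",
    "ex:PhysicalProcess": "ex:Process",
    "ex:ChemicalProcess": "ex:Process",
    "ex:BiologicalProcess": "ex:Process",
    "ex:MathematicalObject": "ex:Entity",
    "ex:Function": "ex:MathematicalObject",
    "ex:DifferentialEquation": "ex:MathematicalObject",
}

# The hand-written axiom: these two classes share no members.
DISJOINT = {frozenset({"ex:Process", "ex:MathematicalObject"})}


def ancestors(cls):
    """Return cls plus every superclass reached by walking subClassOf upward."""
    found = {cls}
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        found.add(cls)
    return found


def clash(asserted_types, disjoint=DISJOINT):
    """Return the disjoint pair that the asserted types violate, or None."""
    inferred = set().union(*(ancestors(t) for t in asserted_types))
    for pair in disjoint:
        if pair <= inferred:
            return pair
    return None


# Something typed as both a biological process and a function.
print(clash({"ex:BiologicalProcess", "ex:Function"}))
# The same data with the axiom removed: nothing to object to.
print(clash({"ex:BiologicalProcess", "ex:Function"}, disjoint=set()))
```

With the axiom the mistake is caught. Without it the second call returns None, which is the open web's way of saying "could be".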

One equation, four subjects, one graph

The most useful cross subject fact available at grade 11 and 12 level is that a single differential equation quietly governs radioactive decay, first order chemical kinetics, capacitor discharge and bacterial population growth. In a syllabus these live in four separate books and nobody ever says they are the same thing. A knowledge graph exists precisely to make that connection explicit and queryable.


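Every outer node points at the same centre with the same verb. Only the letters change: λ, k and r are the same constant wearing three uniforms. Where the quantity shrinks rather than grows, the constant enters with a minus sign, so the hub's k is −λ for radioactive decay and −1/RC for the capacitor.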

Mathematics, the hub

\(\dfrac{dN}{dt} = kN\)

\(N(t) = N_0 e^{kt}\)

↑ isModelledBy ↑

Physics

Radioactive decay

\(N(t) = N_0 e^{-\lambda t}\)

Physics

Capacitor discharge

\(q(t) = q_0 e^{-t/RC}\)

Chemistry

First order reaction

\([A](t) = [A]_0 e^{-kt}\)

Biology

Bacterial growth

\(N(t) = N_0 e^{rt}\)

↓ relatedByFormula ↓

Shared consequence

\(t_{1/2} = \dfrac{\ln 2}{|k|}\)

↓ usedIn ↓

Spans three subjects

Radiocarbon dating

decay physics, carbon cycle, photosynthesis

λ: decay constant, physics

k: rate constant, chemistry

r: Malthusian parameter, biology

k: the same constant, in maths

Where biology leaves the pattern. Exponential growth is only true while resources are unlimited. Biology then moves to the logistic model, \(\dfrac{dN}{dt} = rN\left(1 - \dfrac{N}{K}\right)\), which the graph records as ex:BacterialGrowth ex:refinedBy ex:LogisticGrowth. Recording the limit of a model is as valuable as recording the model.
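The shared half life formula is easy to check numerically. The sketch below assumes the carbon 14 figure of 5730 used on this page is measured in years, which the bare number does not say; the function names are made up for the illustration.

```python
import math


def decay_constant(half_life):
    """Rate constant from a half life: lambda = ln 2 / t_half."""
    return math.log(2) / half_life


def remaining_fraction(rate, t):
    """Fraction N(t)/N0 left after time t for N(t) = N0 * exp(-rate * t)."""
    return math.exp(-rate * t)


# ASSUMPTION: 5730 is in years, so the constant comes out per year.
lam_c14 = decay_constant(5730.0)

print(lam_c14)
print(remaining_fraction(lam_c14, 5730.0))       # one half life
print(remaining_fraction(lam_c14, 2 * 5730.0))   # two half lives
```

Swap in a chemical rate constant or 1/RC for `rate` and the same two functions answer the chemistry and the capacitor questions, which is the whole point of the hub.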

The same graph, written in Turtle

Turtle is the readable serialisation of RDF. A semicolon means "same subject again", a comma means "same subject and predicate again", and a full stop ends the statement. That is nearly the whole syntax.

ontology.ttl · the nouns

# prefixes: short names for long IRIs
@prefix ex:   <http://example.org/sci#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# ---------- CLASSES, the nouns ----------

ex:Entity               a owl:Class .

ex:Process              a owl:Class ; rdfs:subClassOf ex:Entity .
ex:PhysicalProcess      a owl:Class ; rdfs:subClassOf ex:Process .
ex:ChemicalProcess      a owl:Class ; rdfs:subClassOf ex:Process .
ex:BiologicalProcess    a owl:Class ; rdfs:subClassOf ex:Process .

ex:MathematicalObject   a owl:Class ; rdfs:subClassOf ex:Entity .
ex:Function             a owl:Class ; rdfs:subClassOf ex:MathematicalObject .
ex:DifferentialEquation a owl:Class ; rdfs:subClassOf ex:MathematicalObject .

ex:Quantity             a owl:Class ; rdfs:subClassOf ex:Entity .
ex:Unit                 a owl:Class ; rdfs:subClassOf ex:Entity .
ex:Subject              a owl:Class ; rdfs:subClassOf ex:Entity .

# A process is never a mathematical object. State it, do not assume it.
ex:Process owl:disjointWith ex:MathematicalObject .

# The three science branches are mutually exclusive
[] a owl:AllDisjointClasses ;
   owl:members ( ex:PhysicalProcess ex:ChemicalProcess ex:BiologicalProcess ) .

# The subjects themselves are a closed list
ex:Subject owl:equivalentClass [
    a owl:Class ;
    owl:oneOf ( ex:Mathematics ex:Physics ex:Chemistry ex:Biology )
] .

ontology.ttl · the verbs

# ---------- OBJECT PROPERTIES, verbs joining two things ----------

ex:isModelledBy a owl:ObjectProperty ;
    rdfs:label    "is modelled by" ;
    rdfs:domain   ex:Process ;
    rdfs:range    ex:MathematicalObject ;
    owl:inverseOf ex:models .

ex:hasSolution a owl:ObjectProperty ;
    rdfs:domain ex:DifferentialEquation ;
    rdfs:range  ex:Function .

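# Domain is ex:Entity, not ex:Process. data.ttl also files ex:ExponentialODE
# under a subject, and a Process domain would make that equation a Process,
# contradicting the disjointness axiom above.
ex:belongsToSubject a owl:ObjectProperty , owl:FunctionalProperty ;
    rdfs:comment "functional: at most one subject per entity" ;
    rdfs:domain  ex:Entity ;
    rdfs:range   ex:Subject .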

ex:hasRateConstant a owl:ObjectProperty ;
    rdfs:domain ex:Process ;
    rdfs:range  ex:Quantity .

ex:hasPrerequisite a owl:ObjectProperty , owl:TransitiveProperty ;
    rdfs:comment "if A needs B and B needs C, then A needs C" ;
    rdfs:domain  ex:Entity ;
    rdfs:range   ex:Entity .

ex:isAnalogousTo a owl:ObjectProperty , owl:SymmetricProperty ;
    rdfs:domain ex:Process ;
    rdfs:range  ex:Process .

# ---------- DATATYPE PROPERTIES, the adjectives ----------

ex:hasHalfLife a owl:DatatypeProperty , owl:FunctionalProperty ;
    rdfs:domain ex:Process ;
    rdfs:range  xsd:decimal .

ex:hasFormula a owl:DatatypeProperty ;
    rdfs:range xsd:string .

# ---------- PROPERTY CHAIN ----------
# studies then isModelledBy, composed into one new verb

ex:needsMathTopic a owl:ObjectProperty ;
    owl:propertyChainAxiom ( ex:studies ex:isModelledBy ) .

ontology.ttl · membership computed, not listed

# This is the payoff of OWL over RDFS.
# Do not list the members. State the condition and let the
# reasoner work out for itself who qualifies.

ex:ExponentiallyGovernedProcess a owl:Class ;
    rdfs:label "exponentially governed process" ;
    owl:equivalentClass [
        a owl:Class ;
        owl:intersectionOf (
            ex:Process
            [ a owl:Restriction ;
              owl:onProperty ex:isModelledBy ;
              owl:hasValue   ex:ExponentialODE ]
        )
    ] .

# Every process must be modelled by something.
# someValuesFrom means "at least one, of this kind".

ex:Process rdfs:subClassOf [
    a owl:Restriction ;
    owl:onProperty     ex:isModelledBy ;
    owl:someValuesFrom ex:MathematicalObject
] .

# Nobody will ever type the line
#     ex:RadioactiveDecay a ex:ExponentiallyGovernedProcess .
# The reasoner derives it, along with the chemistry and the biology.
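What a reasoner does with that hasValue definition can be sketched in plain Python. This illustrates only the positive half of the inference and is not how HermiT or Pellet work internally. The dictionary is a hand-copied subset of the individuals in data.ttl, and the subClassOf step up to ex:Process is assumed to have been done already.

```python
# Individuals already known to be in ex:Process (subclass step assumed done).
PROCESSES = {
    "ex:RadioactiveDecay",
    "ex:FirstOrderReaction",
    "ex:BacterialGrowth",
    "ex:LogisticGrowth",
}

# ex:isModelledBy edges, subject -> set of objects.
IS_MODELLED_BY = {
    "ex:RadioactiveDecay": {"ex:ExponentialODE"},
    "ex:FirstOrderReaction": {"ex:ExponentialODE"},
    "ex:BacterialGrowth": {"ex:ExponentialODE"},
    "ex:LogisticGrowth": {"ex:LogisticODE"},
}


def members_of_defined_class(processes, is_modelled_by, required_value):
    """Process AND (isModelledBy hasValue required_value), positive cases only."""
    return {
        p for p in processes
        if required_value in is_modelled_by.get(p, set())
    }


governed = members_of_defined_class(PROCESSES, IS_MODELLED_BY, "ex:ExponentialODE")
print(sorted(governed))
```

Note what the sketch does not say. ex:LogisticGrowth is absent from the result, but under the open world assumption a reasoner would treat its membership as unknown, not as false.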

data.ttl · the individuals

# ---------- the maths hub ----------
ex:ExponentialODE a ex:DifferentialEquation ;
    rdfs:label         "first order linear ODE" ;
    ex:hasFormula      "dN/dt = k*N" ;
    ex:hasSolution     ex:ExponentialFunction ;
    ex:hasPrerequisite ex:NaturalLogarithm , ex:SeparationOfVariables ;
    ex:belongsToSubject ex:Mathematics .

ex:ExponentialFunction a ex:Function ;
    ex:hasFormula "N(t) = N0 * exp(k*t)" .

# ---------- physics ----------
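ex:RadioactiveDecay a ex:PhysicalProcess ;
    rdfs:label          "radioactive decay" ;
    ex:belongsToSubject ex:Physics ;
    ex:isModelledBy     ex:ExponentialODE ;
    # Without this edge there is no prerequisite chain from
    # ex:RadiocarbonDating to ex:ExponentialODE for transitivity to follow.
    ex:hasPrerequisite  ex:ExponentialODE ;
    ex:hasRateConstant  ex:DecayConstant ;
    ex:hasFormula       "N(t) = N0 * exp(-lambda*t)" .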

ex:Carbon14 a ex:Nuclide ;
    ex:undergoes   ex:RadioactiveDecay ;
    ex:hasHalfLife "5730"^^xsd:decimal .

ex:CapacitorDischarge a ex:PhysicalProcess ;
    ex:belongsToSubject ex:Physics ;
    ex:isModelledBy     ex:ExponentialODE ;
    ex:hasFormula       "q(t) = q0 * exp(-t/(R*C))" ;
    ex:isAnalogousTo    ex:RadioactiveDecay .

# ---------- chemistry ----------
ex:FirstOrderReaction a ex:ChemicalProcess ;
    rdfs:label          "first order reaction kinetics" ;
    ex:belongsToSubject ex:Chemistry ;
    ex:isModelledBy     ex:ExponentialODE ;
    ex:hasRateConstant  ex:RateConstantK ;
    ex:hasFormula       "[A](t) = [A]0 * exp(-k*t)" ;
    ex:exampleReaction  ex:DecompositionOfN2O5 .

# ---------- biology ----------
ex:BacterialGrowth a ex:BiologicalProcess ;
    rdfs:label          "exponential population growth" ;
    ex:belongsToSubject ex:Biology ;
    ex:isModelledBy     ex:ExponentialODE ;
    ex:hasRateConstant  ex:MalthusianParameter ;
    ex:hasFormula       "N(t) = N0 * exp(r*t)" ;
    ex:refinedBy        ex:LogisticGrowth .

ex:LogisticGrowth a ex:BiologicalProcess ;
    ex:belongsToSubject ex:Biology ;
    ex:isModelledBy     ex:LogisticODE ;
    ex:hasFormula       "dN/dt = r*N*(1 - N/K)" ;
    ex:hasPrerequisite  ex:ExponentialODE .

# ---------- the cross subject application ----------
ex:RadiocarbonDating a ex:PhysicalProcess ;
    ex:belongsToSubject ex:Physics ;
    ex:isModelledBy     ex:ExponentialODE ;
    ex:hasPrerequisite  ex:RadioactiveDecay , ex:CarbonCycle , ex:Photosynthesis .

Semicolon

Repeat the subject. Start a new predicate.

Comma

Repeat the subject and the predicate. Add another object.

Square brackets

A blank node. An anonymous thing that needs no name of its own, used for restrictions.
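The semicolon and comma are pure shorthand, and expanding them by hand shows how little they hide. The sketch below is plain Python, not a Turtle parser; `expand_turtle` is a hypothetical helper that takes one subject and a mapping from each predicate to its list of objects.

```python
def expand_turtle(subject, predicate_objects):
    """Expand the ';' and ',' shorthand into one full triple per object."""
    return [
        (subject, predicate, obj)
        for predicate, objs in predicate_objects.items()   # each ';' group
        for obj in objs                                     # each ',' item
    ]


# Part of the ex:ExponentialODE block from data.ttl.
triples = expand_turtle("ex:ExponentialODE", {
    "rdf:type": ["ex:DifferentialEquation"],
    "ex:hasSolution": ["ex:ExponentialFunction"],
    "ex:hasPrerequisite": ["ex:NaturalLogarithm", "ex:SeparationOfVariables"],
})

for triple in triples:
    print(triple)
```

Three predicates and one comma become four separate triples. The store keeps the four triples, not the punctuation.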

The facts nobody typed in

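This is the reason ontologies exist. Load the two Turtle files into an OWL DL reasoner such as HermiT or Pellet, and it will infer triples that appear nowhere in the source. ELK covers only the OWL EL profile, so it would ignore the inverse, symmetric and functional object property axioms used here. The two lists below show what the graph knows that its authors never wrote.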

asserted: what was typed in

ex:RadioactiveDecay a ex:PhysicalProcess

ex:RadioactiveDecay ex:isModelledBy ex:ExponentialODE

ex:FirstOrderReaction ex:isModelledBy ex:ExponentialODE

ex:BacterialGrowth ex:isModelledBy ex:ExponentialODE

ex:LogisticGrowth ex:hasPrerequisite ex:ExponentialODE

ex:ExponentialODE ex:hasPrerequisite ex:NaturalLogarithm

ex:RadiocarbonDating ex:hasPrerequisite ex:RadioactiveDecay

7 asserted
9 derived

derived: what the machine worked out

ex:RadioactiveDecay a ex:ExponentiallyGovernedProcess · satisfies the hasValue restriction on isModelledBy

ex:FirstOrderReaction a ex:ExponentiallyGovernedProcess · same defined class, different subject

ex:BacterialGrowth a ex:ExponentiallyGovernedProcess · the cross subject family is now a real class

ex:ExponentialODE ex:models ex:RadioactiveDecay · owl:inverseOf, the reverse edge is free

ex:LogisticGrowth ex:hasPrerequisite ex:NaturalLogarithm · transitivity, two hops collapsed into one

ex:RadiocarbonDating ex:hasPrerequisite ex:ExponentialODE · transitivity again, through radioactive decay

ex:RadioactiveDecay a ex:Process, ex:Entity · the subClassOf chain, walked upward

ex:RadioactiveDecay ex:isAnalogousTo ex:CapacitorDischarge · owl:SymmetricProperty, stated once in the other direction

ex:Carbon14 is not ex:Function · disjointness, the reasoner can now rule things out

Transitivity, the hardest one to see in text

Nobody stated that radiocarbon dating needs logarithms. Declaring hasPrerequisite transitive was enough.
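Transitivity is easier to see as code than as prose. The sketch below computes a transitive closure over the three hasPrerequisite edges that data.ttl asserts around ex:ExponentialODE. It is a naive fixed point loop for illustration, assuming a small edge set; production reasoners use far better algorithms.

```python
def transitive_closure(edges):
    """Keep adding (a, d) whenever (a, b) and (b, d) exist, until nothing changes."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Asserted ex:hasPrerequisite edges, copied from data.ttl.
PREREQ = {
    ("ex:LogisticGrowth", "ex:ExponentialODE"),
    ("ex:ExponentialODE", "ex:NaturalLogarithm"),
    ("ex:ExponentialODE", "ex:SeparationOfVariables"),
}

# Only the edges nobody typed in.
derived = transitive_closure(PREREQ) - PREREQ
for edge in sorted(derived):
    print(edge)
```

Every pair in `derived` is a prerequisite that exists only because the property was declared transitive. Longer chains collapse the same way, one extra pass of the loop per extra hop.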

Why a curriculum planner would pay for this. The property chain studies then isModelledBy produces needsMathTopic. A school can now ask which maths must be taught before a given chemistry chapter, without anyone ever having written a rule that mentions chemistry and maths in the same sentence.
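A property chain is relational composition, which fits in one small function. In the sketch below the ex:studies triple and the individual ex:ChemistryChapter4 are hypothetical, since the page's data file asserts no ex:studies edges; `compose` is likewise a name invented for the illustration.

```python
def compose(first, second):
    """Return (a, c) for every a -first-> b followed by b -second-> c."""
    by_subject = {}
    for b, c in second:
        by_subject.setdefault(b, set()).add(c)
    return {(a, c) for a, b in first for c in by_subject.get(b, ())}


# HYPOTHETICAL: a chapter that studies first order kinetics.
STUDIES = {("ex:ChemistryChapter4", "ex:FirstOrderReaction")}

# ex:isModelledBy edges from data.ttl.
IS_MODELLED_BY = {
    ("ex:FirstOrderReaction", "ex:ExponentialODE"),
    ("ex:LogisticGrowth", "ex:LogisticODE"),
}

# ex:needsMathTopic = ex:studies followed by ex:isModelledBy.
needs_math_topic = compose(STUDIES, IS_MODELLED_BY)
print(needs_math_topic)
```

Neither input mentions chemistry and maths in the same tuple. The output does, and that pairing is the part a curriculum planner cares about.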

OWL constructs worth knowing

Construct	What it does	Where it appears above
`rdfs:subClassOf`	Taxonomy. Every member of the child is a member of the parent	PhysicalProcess under Process
`owl:equivalentClass`	A defined class. Membership is computed, not listed	ExponentiallyGovernedProcess
`owl:disjointWith`	Membership in one forbids membership in the other	Process against MathematicalObject
`owl:inverseOf`	Generates the reverse edge automatically	isModelledBy and models
`owl:TransitiveProperty`	A to B and B to C gives A to C	hasPrerequisite
`owl:SymmetricProperty`	A to B gives B to A	isAnalogousTo
`owl:FunctionalProperty`	At most one value allowed	belongsToSubject, hasHalfLife
`owl:InverseFunctionalProperty`	Acts as a primary key	an ISBN, a student roll number
`owl:Restriction` with `someValuesFrom`	Must relate to at least one thing of that kind	every process has some model
`owl:Restriction` with `allValuesFrom`	If it relates at all, only to that kind	tighten a range locally
`owl:Restriction` with `hasValue`	Must relate to this exact individual	the ExponentiallyGovernedProcess definition
`owl:propertyChainAxiom`	Composes two verbs into a third	studies plus isModelledBy gives needsMathTopic
`owl:oneOf`	An enumerated class, a closed list	the four school subjects
`owl:sameAs`, `owl:differentFrom`	Identity management across graphs	linking your IRIs to Wikidata

Profiles, because full logic is expensive

OWL EL

Huge taxonomies

Polynomial time. This is what SNOMED CT and the biomedical ontologies run on.

OWL QL

Rewrites into SQL

Designed so queries can be answered by an existing relational database underneath.

OWL RL

Rule engine friendly

Scales to very large graphs by giving up some expressivity.

OWL DL

Decidable and expressive

The description logic known as SROIQ(D). The default choice for a hand built ontology.

OWL Full

Undecidable

No reasoner can guarantee an answer. Avoid unless you enjoy suffering.

How to choose

Start in OWL DL. Drop to EL or RL only when the reasoner gets too slow on real data.

The query that pays for itself

In plain English: find every pair of topics from different subjects that are secretly the same mathematics. Nobody wrote that pairing anywhere. It falls out of the shape of the graph.

crosscutting.rq

PREFIX ex:   <http://example.org/sci#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?modelLabel ?topicA ?subjectA ?topicB ?subjectB
WHERE {
  ?topicA ex:isModelledBy     ?model ;
          ex:belongsToSubject ?subjectA .

  ?topicB ex:isModelledBy     ?model ;
          ex:belongsToSubject ?subjectB .

  ?model rdfs:label ?modelLabel .

  FILTER ( ?subjectA != ?subjectB )
  FILTER ( STR(?topicA) < STR(?topicB) )
}
ORDER BY ?modelLabel ?subjectA

Shared model	Topic A	Subject A	Topic B	Subject B
first order linear ODE	BacterialGrowth	Biology	CapacitorDischarge	Physics
first order linear ODE	BacterialGrowth	Biology	FirstOrderReaction	Chemistry
first order linear ODE	BacterialGrowth	Biology	RadioactiveDecay	Physics
first order linear ODE	BacterialGrowth	Biology	RadiocarbonDating	Physics
first order linear ODE	FirstOrderReaction	Chemistry	RadioactiveDecay	Physics
first order linear ODE	FirstOrderReaction	Chemistry	RadiocarbonDating	Physics
first order linear ODE	CapacitorDischarge	Physics	FirstOrderReaction	Chemistry

You have just auto generated an interdisciplinary syllabus. The first FILTER drops every pair whose two topics sit in the same subject, which also stops a topic pairing with itself. The second FILTER keeps only one ordering of each pair, so nothing appears twice in mirror image.
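For readers who think in loops rather than graph patterns, the same join can be sketched in plain Python. The dictionary is a hand-copied summary of data.ttl, mapping each topic to its subject and its model; it is an assumption of this sketch that those two facts are all the query needs, and no SPARQL engine is involved.

```python
from itertools import combinations

# topic -> (subject, model), copied by hand from data.ttl.
TOPICS = {
    "ex:RadioactiveDecay": ("ex:Physics", "ex:ExponentialODE"),
    "ex:CapacitorDischarge": ("ex:Physics", "ex:ExponentialODE"),
    "ex:RadiocarbonDating": ("ex:Physics", "ex:ExponentialODE"),
    "ex:FirstOrderReaction": ("ex:Chemistry", "ex:ExponentialODE"),
    "ex:BacterialGrowth": ("ex:Biology", "ex:ExponentialODE"),
    "ex:LogisticGrowth": ("ex:Biology", "ex:LogisticODE"),
}


def cross_subject_pairs(topics):
    """Pairs of topics that share a model but sit in different subjects."""
    pairs = []
    # sorted() plus combinations() yields each pair once with a < b,
    # which plays the role of FILTER ( STR(?topicA) < STR(?topicB) ).
    for a, b in combinations(sorted(topics), 2):
        subject_a, model_a = topics[a]
        subject_b, model_b = topics[b]
        # Same ?model variable in both patterns, plus the subject FILTER.
        if model_a == model_b and subject_a != subject_b:
            pairs.append((model_a, a, subject_a, b, subject_b))
    return pairs


pairs = cross_subject_pairs(TOPICS)
for row in pairs:
    print(row)
```

The nested comparison is what the engine does for you when two triple patterns share the variable ?model.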

SELECT

Returns a table of bindings, like SQL. There is also CONSTRUCT for building a new graph, ASK for a yes or no, and DESCRIBE for everything known about a resource.

The WHERE block is a graph

It is written in Turtle with variables in place of some terms. The engine finds every subgraph that matches the shape.

Federation

A SERVICE clause can send part of one query to Wikidata and join the answer with your local data, live.

OWL adds facts, SHACL rejects records

This single distinction causes more confusion than anything else in the field. Beginners write an OWL cardinality axiom expecting an error message, and are baffled when the reasoner quietly infers something instead.

+ OWL is generative

You say a process must have at least one model. You then load a process with no model stated. OWL concludes: it must have one, we just do not know which. No error. A new anonymous fact instead.

Use OWL when you want the machine to know more.

! SHACL is restrictive

You say a process node must carry at least one isModelledBy triple. You then load a process with none. SHACL reports: violation, on this node, on this path, with your message attached.

Use SHACL when you want the pipeline to refuse bad data.

shapes.ttl

@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix ex:  <http://example.org/sci#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:ProcessShape a sh:NodeShape ;
    sh:targetClass ex:Process ;

    sh:property [
        sh:path     ex:belongsToSubject ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
        sh:message  "Every process must sit in exactly one subject."
    ] ;

    sh:property [
        sh:path     ex:isModelledBy ;
        sh:minCount 1 ;
        sh:class    ex:MathematicalObject ;
        sh:message  "Every process needs at least one mathematical model."
    ] ;

    sh:property [
        sh:path         ex:hasHalfLife ;
        sh:datatype     xsd:decimal ;
        sh:minExclusive 0 ;
        sh:message      "Half life must be a positive number."
    ] .

✓ Accepted

ex:FirstOrderReaction a ex:ChemicalProcess ;
    ex:belongsToSubject ex:Chemistry ;
    ex:isModelledBy     ex:ExponentialODE .

One subject, one model of the right class. Conforms.

✗ Rejected

ex:MysteryProcess a ex:ChemicalProcess ;
    ex:belongsToSubject ex:Chemistry , ex:Physics ;
    ex:hasHalfLife      "-40"^^xsd:decimal .

two subjects, maxCount is 1
no isModelledBy triple at all
a negative half life

The rule of thumb. If the answer to a broken record should be a helpful error message for a human, that is SHACL. If the answer should be a new triple in the graph, that is OWL. Most real systems run both, on the same data, for different reasons.
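The closed world flavour of SHACL can be imitated in a few lines. The sketch below is plain Python, not a SHACL engine. It mirrors only the count and minimum checks from shapes.ttl and leaves out the sh:class and sh:datatype tests; `validate_process` is a hypothetical helper.

```python
# The accepted and the rejected record from above, as plain triples.
DATA = [
    ("ex:FirstOrderReaction", "ex:belongsToSubject", "ex:Chemistry"),
    ("ex:FirstOrderReaction", "ex:isModelledBy", "ex:ExponentialODE"),
    ("ex:MysteryProcess", "ex:belongsToSubject", "ex:Chemistry"),
    ("ex:MysteryProcess", "ex:belongsToSubject", "ex:Physics"),
    ("ex:MysteryProcess", "ex:hasHalfLife", "-40"),
]


def validate_process(node, triples):
    """Return the messages of every constraint that node fails."""
    def values(predicate):
        return [o for s, p, o in triples if s == node and p == predicate]

    violations = []
    if len(values("ex:belongsToSubject")) != 1:           # minCount 1, maxCount 1
        violations.append("Every process must sit in exactly one subject.")
    if len(values("ex:isModelledBy")) < 1:                # minCount 1
        violations.append("Every process needs at least one mathematical model.")
    if any(float(v) <= 0 for v in values("ex:hasHalfLife")):   # minExclusive 0
        violations.append("Half life must be a positive number.")
    return violations


print(validate_process("ex:FirstOrderReaction", DATA))
print(validate_process("ex:MysteryProcess", DATA))
```

The function counts what is present and complains about what is missing. An OWL reasoner given the same missing model would instead assume one exists somewhere.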

Two rules every database mind misreads

RDF and OWL were designed for the open web, where no single system ever holds all the facts. That design choice has consequences you will meet on day one.

Open world assumption

The graph is assumed incomplete. If it does not say that logistic growth is modelled by the exponential ODE, that does not make it false. It makes it unknown. A relational database works the opposite way: not in the table means false.

To make a reasoner conclude that a list is complete, you must say so out loud with owl:oneOf. To make it conclude that two classes never overlap, you must say that too, with owl:disjointWith or owl:AllDisjointClasses. That is exactly why the axioms earlier were written by hand.

No unique name assumption

Two different IRIs may denote the same thing unless you say otherwise. ex:DecayConstant and ex:RateConstantK could turn out to be one entity. Nothing forbids it.

Control it with owl:sameAs to merge and owl:differentFrom or owl:AllDifferent to keep apart. owl:sameAs is how a local graph gets stitched to Wikidata, and it is the technical heart of linked open data.
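The missing unique name assumption has one consequence that surprises almost everyone: a functional property with two values is not an error, it is evidence that the two values are the same thing. The sketch below imitates that in plain Python under simplifying assumptions; `functional_consequences` is a hypothetical helper, and `different_pairs` stands in for owl:differentFrom statements.

```python
def functional_consequences(values_by_subject, different_pairs=frozenset()):
    """For a functional property, pair up multiple values of one subject.

    Returns (same_as, clashes): pairs inferred identical, and pairs that
    were declared different and therefore make the graph inconsistent.
    """
    same_as, clashes = set(), set()
    for values in values_by_subject.values():
        ordered = sorted(values)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                pair = frozenset({a, b})
                if pair in different_pairs:
                    clashes.add(pair)
                else:
                    same_as.add(pair)
    return same_as, clashes


subjects = {"ex:MysteryProcess": {"ex:Chemistry", "ex:Physics"}}

# Nothing says the two subjects differ: they are inferred to be one.
print(functional_consequences(subjects))

# With an explicit difference the same data becomes a contradiction.
apart = {frozenset({"ex:Chemistry", "ex:Physics"})}
print(functional_consequences(subjects, different_pairs=apart))
```

If the first result looks absurd, that is the cue to add owl:AllDifferent over the subjects, after which the second result is what a reasoner reports.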

One fact, three possible states

Is ex:LogisticGrowth ex:isModelledBy ex:ExponentialODE true?

✓ stated true

✗ stated false

? not stated, so unknown

A closed world database has only two of these. Under a closed world, the third column silently collapses into the second, and every gap in your data becomes a confident lie. Under an open world, a gap stays a gap. That is safer for integration and more annoying for validation, which is precisely why SHACL exists as a separate closed world layer on top.
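The difference between the two worlds is small enough to write down. The sketch below is plain Python with made up function names; the `negated` set stands in for explicit negative statements, which OWL 2 can express with a negative property assertion.

```python
from enum import Enum


class Truth(Enum):
    TRUE = "stated true"
    FALSE = "stated false"
    UNKNOWN = "not stated, so unknown"


def open_world(triple, asserted, negated):
    """Three states: a gap in the data stays a gap."""
    if triple in asserted:
        return Truth.TRUE
    if triple in negated:
        return Truth.FALSE
    return Truth.UNKNOWN


def closed_world(triple, asserted):
    """Two states: anything not stored is treated as false."""
    return Truth.TRUE if triple in asserted else Truth.FALSE


asserted = {("ex:LogisticGrowth", "ex:isModelledBy", "ex:LogisticODE")}
question = ("ex:LogisticGrowth", "ex:isModelledBy", "ex:ExponentialODE")

print(open_world(question, asserted, negated=set()).value)
print(closed_world(question, asserted).value)
```

Same data, same question, two different answers. Which one you want depends on whether you are integrating sources or validating a record.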

Not every knowledge graph is an RDF graph

The phrase covers two families of technology with genuinely different strengths. Knowing which one someone means saves a lot of confusion in a job interview.

	RDF and OWL	Labelled property graph, such as Neo4j
Unit of storage	a triple: subject, predicate, object	nodes and edges, each with a property map
The verb	a predicate IRI	a relationship type string
Attributes on an edge	awkward, needs reification or RDF-star	native and easy
Query language	SPARQL	Cypher, Gremlin, GQL
Global identifiers	built in, IRIs work across organisations	local identifiers only
Formal semantics and inference	strong and standardised	none by default
Best at	data integration, standards, sharing across institutions	fast traversal, analytics, recommendations, fraud rings

RDF-star, a statement about a statement

This is how RDF closed the edge attribute gap. It matters scientifically here, because the biology claim is only conditionally true.

provenance.ttl

<< ex:BacterialGrowth ex:isModelledBy ex:ExponentialODE >>
    ex:assertedBy    ex:BiologyTextbookClass12 ;
    ex:confidence    "0.98"^^xsd:decimal ;
    ex:validOnlyWhen "resources are unlimited" .

Why it matters here

Models have limits

The last line is the reason biology moves on to the logistic equation. A graph that cannot record the boundary of a model will eventually mislead someone.

The old way: reification

Four clumsy triples per statement, describing the statement as a resource. It works, and nobody enjoys it.

The other way: named graphs

Add a fourth element to every triple saying which sub graph it lives in. This is what most triple stores actually do in production.
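To see why reification earned its reputation, it helps to generate the triples. The sketch below is plain Python; `reify` is a hypothetical helper and ex:stmt1 is a made up identifier for the statement node. The four rdf: terms are the standard reification vocabulary.

```python
def reify(statement_id, triple, annotations):
    """Describe one triple as a resource, then hang annotations off it."""
    s, p, o = triple
    triples = [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", s),
        (statement_id, "rdf:predicate", p),
        (statement_id, "rdf:object", o),
    ]
    triples += [(statement_id, key, value) for key, value in annotations.items()]
    return triples


reified = reify(
    "ex:stmt1",   # HYPOTHETICAL identifier for the statement itself
    ("ex:BacterialGrowth", "ex:isModelledBy", "ex:ExponentialODE"),
    {
        "ex:assertedBy": "ex:BiologyTextbookClass12",
        "ex:validOnlyWhen": "resources are unlimited",
    },
)

for triple in reified:
    print(triple)
```

Four triples of overhead before the first useful annotation, for every annotated statement. RDF-star folds those four into the double angle brackets, and a named graph replaces them with a single fourth column.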

Questions first, classes much later

The order below is not decorative. Doing step four before step one is the single most common cause of an abandoned ontology project.

Write competency questions

Not classes. Questions. Which physics topics use maths that has not been taught yet? Which chemistry and biology chapters share an equation? If a proposed class does not help answer one of these, delete it.

Reuse before you invent

Check Linked Open Vocabularies, the OBO Foundry and schema.org. Someone has already modelled units, people, provenance and time, and modelled them better than a first attempt will.

Pick an upper ontology

BFO, DOLCE or SUMO, if the graph will ever be merged with someone else's. Our ex:Process class is a BFO occurrent, and saying so makes future alignment cheap.

Model nouns as classes, verbs as properties

Resist inventing a class called Modelling. Promote a verb to a class only when you genuinely need to hang attributes off the relationship itself, which is the n-ary relation design pattern.

Add disjointness axioms early

Under the open world assumption, an ontology without disjointness can rule almost nothing out, so contradictions in the data go unnoticed. This is the step everybody skips and everybody regrets.

Run the reasoner constantly

If it infers nonsense, or if every class collapses into every other class, your axioms are wrong. Treat an unsatisfiable class the way you treat a failing test.

Write SHACL shapes separately

Keep validation out of the ontology. The ontology describes the world. The shapes describe what your particular data pipeline will accept.

Mint stable IRIs and version them

Never reuse an IRI for a different meaning. Deprecate instead. Somebody somewhere has already linked to it.

Mistakes that show up in almost every first ontology

Mistake	What to do instead
Confusing a class with an individual	Is `Carbon14` a class of atoms, or one nuclide type? Decide deliberately. OWL punning allows both, but only on purpose
Using `subClassOf` to mean part of	Subclass means every member is also a member. A nucleus is not a subclass of an atom, it is `partOf` one
Expecting OWL to reject bad data	It will not. It infers. Put the rejection logic in SHACL
Forgetting the open world assumption	Missing data is unknown, never false. Say what is closed, explicitly
Over modelling	A hundred classes nobody queries is worse than ten that answer real questions
Bare numbers with no units	`5730` is meaningless on its own. Attach a unit IRI from QUDT or UCUM, or someone will compare years with seconds
Building the graph before writing the questions	Go back to step one. Everything else is cheaper afterwards

The wider ecosystem

Vocabularies to reuse

Do not reinvent

SKOS for loose thesauri, Dublin Core for metadata, schema.org for search engines, FOAF for people, PROV-O for provenance, QUDT and UCUM for units.

Real science ontologies

Already built

Gene Ontology, ChEBI for chemical entities, Protein Ontology, UBERON for anatomy, all curated under the OBO Foundry.

Tools

What people actually use

Protégé to edit, HermiT or ELK or RDFox to reason, Jena Fuseki or GraphDB or Stardog to store, rdflib and owlready2 in Python.

Where this meets AI

Embeddings and GraphRAG

Graph embeddings such as TransE and ComplEx predict missing links statistically. GraphRAG lets a language model traverse explicit relationships instead of guessing them.

A note on how this page was scoped. Most projects that call themselves ontologies actually need SKOS: a thesaurus with broader and narrower terms, and nothing more. Reach for OWL when you specifically want a machine to draw conclusions, as in the reasoning section above. If you never plan to run a reasoner, you are paying for logic you will not use.