In Part I, we argued that human intelligence is not merely the ability to solve problems within a fixed repertoire of concepts, but the ability to invent new concepts and expand its own representational universe. The challenge for AI is therefore not to replicate intelligent behavior, but to characterize the cognitive principles that give rise to it.

‍
Several mathematical theories already capture fragments of these principles. For example, Algorithmic Information Theory (AIT) describes abstraction as compression. Free-energy minimization describes the reduction of prediction error. Structure-Mapping Theory (SMT) describes analogical semantics, and category theory offers a language for structural correspondence. Yet these theories remain largely disconnected: each captures one facet of cognition, but none offers a unified picture.
‍

This essay proposes a way to read them as facets of a more general process. It is important to emphasize first what kind of proposal this is, because it is easy to overclaim. What follows is not a finished formal theory of intelligence, and not a set of equations that can be directly implemented. It is a schema: a formal vocabulary that specifies what kinds of objects, transformations, goals, and discrepancy measures a theory of cognition may require. The thesis is that while intelligence can be ultimately characterized as an optimization problem, it is best understood not as optimization within a fixed representational space, but as the iterative transformation and expansion of conceptual structure itself, driven by the reduction of structured discrepancy under cognitive priors.

‍
1. Recap: The Cognitive Foundations

‍
In Part I, we argued that cognitive activity can be understood as the reduction of discrepancy between a current state of understanding and a more coherent, explanatory, or goal-satisfying state. A small set of primitive operators constantly recurs across reasoning, scientific discovery, engineering, and concept invention:

Abstraction extracts invariants from experience and turns them into new concepts.

Association connects related concepts and builds relational structures.

Analogy transfers structure between domains.

Composition combines concepts into larger conceptual or functional systems.

These operators are not arbitrary. They are guided by deeper cognitive priors: intuitions about space, time, and causality, and aesthetic regularization toward symmetry, simplicity, order, and regularity. The priors determine which transformations of the conceptual state count as meaningful and plausible. Without them, the search over possible representations would be intractably large.

‍
The origins of these priors lie in millions of years of evolution. Although they may never be perfectly formalized or computable (see previous articles), their role is central in the emergence of intelligent behaviors. Any system that aspires to open-ended intelligence must do more than manipulate representations. It must also be regulated by priors that make the transformation and creation of representations efficient, coherent, and meaningful.

‍
2. A Formal Schema for the Dynamics of Cognition

‍
We represent cognition as a trajectory through an evolving space of representations.

‍
At time \(t\), the system occupies a conceptual state \(R_t\), which represents its current structure of concepts, relations, and beliefs. Crucially, \(R_t\) is not a point in a fixed space. It is an object in a category \(\mathcal{C}\) of representations whose collection of objects can itself grow. The state evolves through cognitive operators:

\[
R_{t+1} = T_t(R_t)
\]
where each \(T_t\) is a transformation acting on the current representation. In general, two types of transformations are permitted. The first kind consists of intra-space transformations. These include search, inference, belief updating, planning, and state transitions within an already available representational system. The second kind consists of generative transformations. These do not merely move within the current space. They expand or restructure the space itself by introducing new concepts, relations, dimensions, abstractions, or operators. This distinction is the whole point: most existing AI systems are confined to the first type of transformation, whereas human cognition routinely makes generative ones.

‍
We can then describe intelligence as the search for a sequence of transformations that reduces discrepancy between the conceptual state and a desired state:
\[
C = \min_{\{T_t\}} \sum_t D(R_t, G_t)
\]
where \(R_t\) is the current representation, \(G_t\) is a target, \(D\) is a discrepancy functional, \(T_t\) is a cognitive transformation, and \(C\) denotes the resulting minimal cumulative discrepancy (a scalar cost, distinct from the category \(\mathcal{C}\)). More generally, \(D\) can be understood as a map from representational states and target states to a non-negative value, such as \(D: \mathcal{C} \times \mathcal{C} \to \mathbb{R}_{\geq 0}\), although in open-ended cognition this mapping may itself evolve. This expression should not be mistaken for a directly computable global objective. In open-ended cognition, the space itself changes, the available transformations change, and the target may not be fully specified in advance. When the operators are generative, \(R_t\) and \(R_{t+1}\) may inhabit different representational spaces, so \(D\) is not always a fixed metric over a fixed space. The minimization is local, adaptive, and co-evolving with the representational structure.
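As a rough illustration of this loop, the following minimal sketch greedily reduces a discrepancy while allowing two kinds of moves: intra-space moves that adjust existing concepts, and a generative move that adds a new one. Everything concrete here is an assumption made for illustration and is not part of the schema: the state is a tuple of numeric "prototype" concepts, the target is a list of observations, `discrepancy` adds a simplicity penalty of 1.5 per concept, and the names `intra_space_moves`, `generative_moves`, and `run` are hypothetical.

```python
from typing import List, Tuple

State = Tuple[float, ...]  # R_t: a tuple of prototype "concepts" (toy assumption)


def discrepancy(state: State, observations: List[float], penalty: float = 1.5) -> float:
    """Toy D(R, G): unexplained distance plus a simplicity penalty per concept."""
    fit = sum(min(abs(x - p) for p in state) for x in observations)
    return fit + penalty * len(state)


def intra_space_moves(state: State, step: float = 1.0) -> List[State]:
    """Intra-space transformations: shift one existing concept; size is unchanged."""
    moves = []
    for i, p in enumerate(state):
        for delta in (-step, step):
            moves.append(state[:i] + (p + delta,) + state[i + 1:])
    return moves


def generative_moves(state: State, observations: List[float]) -> List[State]:
    """Generative transformation: add a concept at the worst-explained observation."""
    worst = max(observations, key=lambda x: min(abs(x - p) for p in state))
    return [state + (worst,)]


def run(state: State, observations: List[float], max_steps: int = 100) -> List[State]:
    """Local, greedy minimization: pick the best next state until nothing improves."""
    trajectory = [state]
    for _ in range(max_steps):
        candidates = intra_space_moves(state) + generative_moves(state, observations)
        best = min(candidates, key=lambda s: discrepancy(s, observations))
        if discrepancy(best, observations) >= discrepancy(state, observations):
            break  # local minimum: no available transformation reduces D
        state = best
        trajectory.append(state)
    return trajectory


observations = [1.0, 2.0, 10.0, 11.0]
trajectory = run((0.0,), observations)
```

In this toy run the state grows from one concept to two, one per cluster of observations, and then stops because a third concept would cost more in the penalty than it saves in fit. The sketch only shows the local and adaptive character of the minimization. It does not capture the harder case where the discrepancy measure itself changes along with the space.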

‍
The meaning of \(G\) is context-dependent, determined by the nature of the task. For example:

Science: \(G\) is an explanation that accounts for observations.

Engineering: \(G\) is a desired future state of the world.

Concept invention: \(G\) is a representation capable of organizing previously unorganized phenomena.

Keep in mind that \(G_t\) is rarely a fully specified target. In concept invention, the desired concept is not even known in advance, and the system simply seeks abstract representations that parsimoniously summarize raw observations. In other words, \(G_t\) can be treated as a “family of acceptable states” rather than a single destination. Despite these differences, the structure remains the same: intelligent behavior continually seeks transformations that reduce discrepancy between the current and desired mental states.

‍
3. Cognitive Operators as Transformations of Conceptual Structure

‍
If we model the conceptual state as an evolving graph, in which nodes and edges stand for concepts and relations at varying levels of abstraction, then different cognitive operators can be treated as transformations within this conceptual structure.

‍
What matters is that this structure is non-stationary. The node set evolves, the edge set evolves, the hierarchy evolves. Concept invention, under this view, is not a traversal of a fixed graph but an expansion of the graph itself. This distinction is central to open-ended intelligence. In a closed system, problem solving is primarily graph traversal: the system explores a pre-given state space using fixed representations, fixed actions, and fixed evaluators. In an open-ended system, problem solving also involves expanding or restructuring the graph.

Under this view, cognitive operators can be described as follows:

‍
Association: Establishes new connections among concepts and creates new edges \((v_i, v_j)\). This process enables “imagination” in problem solving, which higher-level reasoning may depend on.

Abstraction: Identifies common structure across concepts and creates a higher-order node:

\[
\{v_1, v_2, \ldots, v_n\} \rightarrow v_{\text{new}}
\]
For example, \(\{\text{three apples}, \text{five stones}, \text{two birds}, \ldots\} \rightarrow \text{number}\). The concept of “number” is not merely another object in the original set. It is a higher-order abstraction that reorganizes the original experiences under a new invariant. In this schema, concept invention is the outcome of abstraction when abstraction creates a representational object that did not previously exist. This is the canonical space-expansion move.

Analogy: Establishes a structure-preserving mapping between regions of the graph, \(f: G_1 \rightarrow G_2\), where \(G_1\) and \(G_2\) here denote subgraphs (the source and target domains) rather than goal states. This is the process described by Structure-Mapping Theory. Analogy does not merely match surface features. It maps relational structure. It allows knowledge acquired in one domain to be transferred to another. For example, reasoning about electrical circuits through analogies to water flow works not because electrons and water are superficially identical, but because certain relational structures can be mapped between the two domains.

Composition: Combines concepts into larger structures:

\[
(v_1, v_2, \ldots, v_n) \rightarrow V_{\text{composite}}
\]
This is the operator behind many engineered solutions, mathematical constructions, and conceptual syntheses. A smartphone, for example, is not merely a sum of screen, battery, processor, camera, and software. It is a composed functional object whose parts acquire new roles within the whole. Composition is naturally expressed as a morphism into a product, composite, or higher-order structure.
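The four operators above can be sketched on a small labeled graph. This is a minimal illustration under stated assumptions, not an implementation of Structure-Mapping Theory: the class `ConceptGraph`, its method names, the relation labels, and the water/circuit vocabulary are all hypothetical, and the abstraction step is handed the name of the new concept rather than discovering the shared invariant itself.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str, str]  # (source, relation, target)


@dataclass
class ConceptGraph:
    nodes: Set[str] = field(default_factory=set)
    edges: Set[Edge] = field(default_factory=set)

    def associate(self, a: str, b: str, relation: str = "related_to") -> None:
        """Association: add a labeled edge between two concepts."""
        self.nodes.update({a, b})
        self.edges.add((a, relation, b))

    def abstract(self, instances: Iterable[str], new_concept: str) -> None:
        """Abstraction: create a higher-order node over a set of instances."""
        self.nodes.add(new_concept)
        for v in instances:
            self.associate(v, new_concept, "instance_of")

    def compose(self, parts: Iterable[str], composite: str) -> None:
        """Composition: create a composite node whose parts point into it."""
        self.nodes.add(composite)
        for v in parts:
            self.associate(v, composite, "part_of")


def is_structure_preserving(graph: ConceptGraph, mapping: Dict[str, str]) -> bool:
    """Analogy check: each relation among mapped nodes must hold between their images."""
    for source, relation, target in graph.edges:
        if source in mapping and target in mapping:
            if (mapping[source], relation, mapping[target]) not in graph.edges:
                return False
    return True


graph = ConceptGraph()
graph.associate("pressure", "water_flow", "drives")
graph.associate("narrow_pipe", "water_flow", "impedes")
graph.associate("voltage", "current", "drives")
graph.associate("resistor", "current", "impedes")

nodes_before = len(graph.nodes)
graph.abstract(["water_flow", "current"], "flow")   # expands the node set
graph.compose(["voltage", "resistor"], "circuit")   # expands the node set

good = {"pressure": "voltage", "narrow_pipe": "resistor", "water_flow": "current"}
bad = {"pressure": "resistor", "narrow_pipe": "voltage", "water_flow": "current"}
```

Here `good` preserves the "drives" and "impedes" relations while `bad` does not, which is the sense in which analogy maps relational structure rather than surface features. Note that `abstract` and `compose` change the node set, whereas `associate` between existing nodes only changes the edge set.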
‍

4. Existing Frameworks as Local Instances

‍
Several existing theoretical frameworks already capture important aspects of these cognitive priors, but they were largely developed independently, with each emphasizing a different facet of cognition. Free-energy minimization formalizes prediction and belief updating. MDL and AIT formalize compression and abstraction. SMT formalizes analogy, and reinforcement learning formalizes action selection under goals and rewards. Yet cognition does not appear to operate as a set of disconnected modules. The purpose of the schema above is therefore to interpret these frameworks as special cases of a more general cognitive dynamics, where each corresponds to a particular choice of representation \(R\), transformation \(T\), target \(G\), and discrepancy measure \(D\):

Framework	Representation \(R\)	Transformation \(T\)	Target \(G\)	Discrepancy \(D\)
Free-energy minimization	Generative model	Belief update	Accurate prediction	Variational free energy
Bayesian inference	Belief distribution	Bayesian update	Posterior consistency	KL divergence
MDL	Representation or code	Compression	Minimal description	Description length
AIT	Program	Program search	Generating program	Kolmogorov complexity
SMT	Concept graph	Structural mapping	Domain correspondence	Structural mismatch
Reinforcement learning	Policy / world model	Action selection	Goal state	Reward gap
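To make one row concrete, here is a minimal sketch of the Bayesian inference row. It rests on one particular reading that is an assumption, not something the table fixes: the target \(G\) is taken to be the exact posterior after one observation, and \(D\) is the KL divergence from that target to the current belief. The three coin-bias hypotheses and the function names are hypothetical.

```python
import math
from typing import List

# Hypothetical setup: three hypotheses about a coin's probability of heads.
HEADS_LIKELIHOOD = [0.2, 0.5, 0.8]


def bayes_update(belief: List[float], likelihood: List[float]) -> List[float]:
    """T: multiply the belief by the likelihood and renormalize."""
    unnormalized = [b * l for b, l in zip(belief, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def kl_divergence(q: List[float], p: List[float]) -> float:
    """D: KL(q || p) for discrete distributions; terms with q_i = 0 contribute 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


prior = [1 / 3, 1 / 3, 1 / 3]                    # R_t: belief before observing
target = bayes_update(prior, HEADS_LIKELIHOOD)   # G: exact posterior after one "heads"

before = kl_divergence(target, prior)                                  # D(R_t, G)
after = kl_divergence(target, bayes_update(prior, HEADS_LIKELIHOOD))   # D(R_{t+1}, G)
```

Under this reading the Bayesian update is the transformation that drives the discrepancy to zero in a single step, and the hypothesis space itself never changes, which is what makes it an intra-space regime.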

‍

This table is not meant to claim that all these frameworks are already unified. They are not. Rather, it suggests that they may be viewed as local regimes of a broader representational dynamics. Existing frameworks are often modular and disconnected. It is difficult to imagine a genuinely intelligent system operating by explicitly switching among isolated theories: first running algorithmic information theory to generate abstractions, then switching to structure-mapping theory to search for analogies, then running several iterations of free-energy minimization to update its beliefs, and finally invoking reinforcement learning to select an action. Such a system would be theoretically fragmented. It would lack a unified account of how these processes interact continuously within a single cognitive architecture.

‍
A more satisfying framework for intelligence should dissolve these boundaries. It should not treat compression, prediction, analogy, abstraction, and action as separate modules governed by separate objectives. Instead, it should explain them as different operators within a shared representational space, guided by the common optimization principle described above. Under such a framework, intelligence would not be modeled as a sequence of disconnected algorithms, but as a continuous process of transforming and expanding conceptual structure.

‍
The ultimate goal, therefore, is not merely to place existing theories side by side under a loose analogy. The goal is to identify a deeper formalism in which these theories emerge as limiting cases, local approximations, or specialized regimes. Such a framework would allow intelligence to be modeled as a seamless process: one in which the same representational dynamics can support prediction, compression, analogy, concept invention, and goal-directed action. This, I believe, is one direction future AI paradigms may need to explore.

‍
5. Next-Token Prediction as a Special Case
‍

The standard training objective of LLMs, next-token prediction, also naturally fits into this general schema. In this case, the representational state \(R\) is the model’s internal parameterized representation of the context, and the transformation \(T\) is the forward computation and subsequent parameter update during training. The target \(G\) is the ground-truth next token, and the discrepancy measure \(D\) is typically cross-entropy loss.
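For a single position, this discrepancy can be written out directly. The sketch below computes the cross-entropy between a model's predicted distribution and the ground-truth next token. The logits and the four-token vocabulary are made-up values for illustration, and the function names are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Turn raw scores into a probability distribution (shifted for stability)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_discrepancy(logits: List[float], target_index: int) -> float:
    """D(R, G): cross-entropy against a one-hot target, i.e. -log p(target)."""
    return -math.log(softmax(logits)[target_index])


# Hypothetical logits over a 4-token vocabulary; the true next token is index 2.
uninformed = next_token_discrepancy([0.0, 0.0, 0.0, 0.0], target_index=2)
confident = next_token_discrepancy([0.0, 0.0, 5.0, 0.0], target_index=2)
```

The target here is a single token at a single position, which is what makes the discrepancy local and pointwise: nothing in the quantity refers to abstractions, analogies, or the structure of the representation that produced the logits.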

‍
Under this interpretation, next-token prediction is a highly specific instance of representational optimization. It defines a fine-grained discrepancy measure against a local, pointwise target: given a sequence of tokens, predict the next token as accurately as possible. This objective has already proven extraordinarily powerful: because massive corpora contain traces of human language, reasoning, explanation, analogy, planning, and problem solving, LLMs can acquire many cognitive-like capabilities implicitly.

‍
However, this objective also has an important limitation. Next-token prediction does not directly specify higher-level cognitive primitives, hence these capabilities are not directly incentivized and are acquired only insofar as they help next-token prediction on the training distribution. As a result, they become less reliable in low-coverage or zero-coverage regimes, precisely where genuine discovery often occurs.

‍
There is also an inefficiency problem. If cognitive primitives are learned only from surface token sequences, then the model must infer deep mechanisms from countless concrete examples. It has to observe many manifestations of abstraction before it learns something resembling abstraction, many examples of analogy before it learns analogical transfer, and many instances of scientific reasoning before it approximates scientific discovery. This is a useful but indirect route: it treats cognition as something to be reconstructed from linguistic appearance rather than something to be modeled at the structural level. There should therefore be substantial room to improve the efficiency of scaling. Larger models and larger datasets can improve performance, but they do not by themselves change the objective being optimized. The next frontier may require objectives that operate at deeper representational levels: objectives that directly reward abstraction quality, structural correspondence, causal coherence, conceptual compression, hypothesis generation, and expansion of the representational space, in line with the general optimization framework described above.

‍
6. The Ultimate Limit: Cognitive Priors

‍
So far, the schema includes four major ingredients: \((R, T, G, D)\) indicating representations, transformations, goals, and discrepancy measures. But something deeper is still missing. The operators \(T\) do not arise in a vacuum. They are enabled, constrained, and shaped by the deepest cognitive priors: tendencies toward order, simplicity, regularity, continuity, symmetry, causality, spatial coherence, and temporal structure.

These priors determine which transformations are cognitively natural, which abstractions are relevant, and which analogies are meaningful. They are what make some representations feel simpler than others, some patterns feel regular, and some conceptual moves feel more promising. However, we do not yet have a satisfactory framework for specifying these priors at the most foundational level.

This connects to the earlier discussion of missing axioms and computability. At the deepest layer of human cognition, there may be something that cannot be fully captured by an explicit program. A program is itself a product of the mind’s representational and symbolic capacities. It may therefore be impossible for a program, within the same representational closure, to completely formalize the pre-symbolic intuitions that make representation possible in the first place.

Consider a few simple examples. To invent or understand a word like “gigantic”, one needs a primitive sense of spatial magnitude and comparison. To understand “therefore”, one needs a primitive sense of implication, consequence, or causality. To define concepts such as “negativity”, one needs some prior intuition of opposition, inversion, or symmetry. These are not labels attached to arbitrary patterns. Instead, they presuppose deeper organizing intuitions about how experience should be compared, ordered, organized, and transformed.

From this perspective, the deepest cognitive priors may be definable but not fully computable. They may be characterizable as ideal constraints on cognition, but not fully implementable as a terminating algorithm. This is not necessarily a fatal admission, since uncomputable does not mean informal or meaningless. AIXI, for example, is uncomputable, but it is mathematically precise. Kolmogorov complexity is uncomputable in general, yet it gives a rigorous idealization of simplicity. In the same way, a mature theory of open-ended cognition may be a formal ideal rather than a directly executable algorithm. Practical AI systems would then approximate this ideal through computable heuristics, architectures, and training objectives.
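A familiar example of approximating an uncomputable ideal with a computable heuristic is to use an off-the-shelf compressor as a stand-in for Kolmogorov complexity. The sketch below uses Python's standard `zlib` module. The compressed length is only a crude upper-bound proxy (it depends on the compressor and ignores the size of the decompressor), and the function name and test strings are hypothetical.

```python
import random
import zlib


def description_length_upper_bound(data: bytes) -> int:
    """Computable proxy for simplicity: length in bytes after zlib compression."""
    return len(zlib.compress(data, 9))


# A highly regular string and a pseudo-random string of the same length.
regular = b"ab" * 500
rng = random.Random(0)  # fixed seed so the example is reproducible
noisy = bytes(rng.getrandbits(8) for _ in range(1000))

regular_cost = description_length_upper_bound(regular)
noisy_cost = description_length_upper_bound(noisy)
```

The regular string compresses to far fewer bytes than the pseudo-random one, matching the intuition that it is "simpler". The proxy can still miss regularities a given compressor does not model, which is the gap between the formal ideal and its practical approximations.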

‍
7. Relation to Prior Work

‍
This proposal has several important precursors. One main difference from many of them is that it treats intelligence not only as optimization within a representational space, but as the ability to expand and reorganize that space.

‍
Conceptual Spaces (Peter Gärdenfors)
In terms of viewing cognitive operators as transformations in a conceptual space, Gärdenfors’ work is perhaps the closest. In his work, concepts are modeled as regions in geometric spaces defined by quality dimensions, bridging symbolic and connectionist views. A key difference is that conceptual spaces are typically defined over a relatively fixed geometry, whereas the present framework treats the conceptual structure itself as evolving. New concepts do not merely occupy new regions. They may introduce new dimensions, new relations, and new levels of abstraction.

‍
Universal Intelligence (Shane Legg and Markus Hutter)
Universal intelligence formalizes intelligence as expected reward maximization across environments. Its strength is generality. Its limitation is that it assumes a fixed hypothesis or program space. The agent searches within a universal space, but it does not explicitly model the invention of new representations, abstractions, or operators.

‍
Free Energy Principle (Karl Friston)
The free energy principle can be seen as a powerful special case of the broader schema: cognition as discrepancy reduction between prediction and observation. But free-energy minimization mainly explains how a system updates beliefs within a generative model. It does not, by itself, specify how entirely new conceptual primitives, analogies, or representational dimensions are generated. In this framework, free-energy minimization is one operator within a larger process of conceptual transformation.

‍
Compression-Progress Theories (Jürgen Schmidhuber)
Schmidhuber’s theory of creativity treats novelty as progress in compression: discoveries are valuable when they make the world more compactly representable. This is closely aligned with the abstraction operator. The difference is that compression explains why a new concept is useful, but it does not account for how a system generates a primitive that was previously inexpressible in its representation space. Invention requires not only shorter descriptions, but a drive that creates such descriptions.

‍
On the Measure of Intelligence (François Chollet)
Chollet’s measure of intelligence is relevant because it shifts attention away from task performance alone and toward skill-acquisition efficiency, abstraction, and generalization under limited experience. The schema proposed here can be considered complementary to that abstract measure, operating at the level of mechanism. It asks what kind of representational structure and what kinds of operators must exist for abstraction-driven intelligence to be possible.

‍
In short, prior work has formalized conceptual geometry, reward maximization, discrepancy reduction, compression progress, and abstraction efficiency. This proposal attempts to connect them through a unified framework in which cognitive gaps are minimized by a sequence of operations, with expansion of the representational space itself as a critical requirement.

‍
8. Toward Open-Ended AI

‍
Recent work in AI-for-science, autonomous research agents, and AI-assisted knowledge creation aims to make AI systems more capable of generating hypotheses, designing experiments, and accelerating discovery. Much of this work follows a similar pattern: an LLM is wrapped in an agent harness, search procedure, evolutionary loop, or self-improvement cycle, with the hope that repeated iteration will eventually produce novel ideas.

‍
But the central limitation of current systems is not that they search too little. It is that they search within largely fixed representational spaces. They can explore, recombine, rank, and refine possibilities already expressible within their existing conceptual vocabulary. What they lack is a principled mechanism for changing the vocabulary itself.

‍
Number, probability, entropy, calculus, spacetime… none of these concepts was found by traversing an existing conceptual graph. Each emerged through a transformation that expanded the conceptual space. If intelligence is fundamentally the transformation and expansion of conceptual structure to reduce discrepancy under priors, then future systems will need explicit mechanisms for abstraction, analogy, concept creation, and representation space expansion that go beyond better prediction within a fixed space.

‍
The next frontier, therefore, may not lie solely in larger models or further scaling. It may require a revision of model-building methodology: objectives and architectures that directly incentivize cognitive primitives rather than relying on them to emerge indirectly from next-token prediction. We may need systems that can create new latent variables, new abstractions, new analogical mappings, and new spaces of possible actions. The goal is not simply to make models search more efficiently, but to give them operators that can create new searchable spaces.

‍

The Quest for Open-Ended Intelligence: Part II - Toward A Formal Schema for the Dynamics of Cognition

June 6, 2026

‍
1. Recap: The Cognitive Foundations

‍
2. A Formal Schema for the Dynamics of Cognition

‍
3. Cognitive Operators as Transformations of Conceptual Structure

4. Existing Frameworks as Local Instances

‍
5. Next-Token Prediction as a Special Case
‍

‍
6. The Ultimate Limit: Cognitive Priors

‍
7. Relation to Prior Work

‍
8. Toward Open-Ended AI
