Database theory has always lived with a tension. On one side stands the relational model, which gave data management a mathematical foundation: sets, relations, first-order logic, and algebraic query languages. On the other side stands the relentless pressure of practice: new kinds of data, larger scales, faster arrival rates, and looser structures that the original formalism was never designed to handle. The history of the subfield is not a story of one framework replacing another, but of a branching conversation in which each new theoretical framework extends, narrows, competes with, or coexists alongside the relational core.
The relational model, introduced by Edgar Codd in 1970, was the first framework to treat a database as a set of relations (tables) and to define query capabilities through relational algebra and relational calculus. Its distinctive commitment was to a declarative, set-oriented approach: users specify what data they want, not how to navigate to it. This was a sharp break from the earlier navigational models (hierarchical and network), which required programmers to follow explicit pointers. The relational model did not immediately replace those older frameworks; it coexisted with them for years, but it provided the theoretical anchor for nearly everything that followed in database theory.
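To make the set-oriented, declarative style concrete, here is a minimal sketch of three relational algebra operators: selection (σ), projection (π), and natural join (⋈). It assumes a relation is modeled as a Python frozenset of tuples, each tuple stored as sorted (attribute, value) pairs so that duplicates collapse as set semantics require. The helper names (`rel`, `select`, `project`, `natural_join`) and the `emp`/`dept` data are hypothetical and chosen only for illustration.

```python
from typing import Callable, FrozenSet, Tuple

Row = Tuple[Tuple[str, object], ...]  # sorted (attribute, value) pairs
Relation = FrozenSet[Row]

def rel(*rows: dict) -> Relation:
    """Build a relation (a set of tuples) from dicts."""
    return frozenset(tuple(sorted(r.items())) for r in rows)

def select(r: Relation, pred: Callable[[dict], bool]) -> Relation:
    """σ: keep only the tuples that satisfy the predicate."""
    return frozenset(t for t in r if pred(dict(t)))

def project(r: Relation, attrs: set) -> Relation:
    """π: keep only the named attributes; duplicates vanish because results are sets."""
    return frozenset(tuple((a, v) for a, v in t if a in attrs) for t in r)

def natural_join(r: Relation, s: Relation) -> Relation:
    """⋈: combine every pair of tuples that agree on all shared attributes."""
    out = set()
    for t in r:
        dt = dict(t)
        for u in s:
            du = dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(tuple(sorted({**dt, **du}.items())))
    return frozenset(out)

# Hypothetical data. The query says *what* is wanted (names of Research
# employees), not how to navigate pointers to find them.
emp = rel({"name": "Ada", "dept": 1}, {"name": "Bob", "dept": 2})
dept = rel({"dept": 1, "dname": "Research"}, {"dept": 2, "dname": "Sales"})
q = project(select(natural_join(emp, dept), lambda t: t["dname"] == "Research"), {"name"})
print(q)
```

A real system would evaluate the same expression with indexes and better join algorithms; the point here is only that the query is a composition of algebraic operators over sets.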
Dependency theory emerged directly from the relational model's formalism. It studies constraints on relations—functional dependencies, multivalued dependencies, join dependencies—that capture real-world integrity rules. The core question is: given a set of dependencies, what is the best way to decompose a relation without losing information? Dependency theory gave database designers a precise language for normalization, the process of eliminating redundancy. It remains an active area, especially in the study of data quality and schema design.
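The decomposition question can be answered mechanically. The sketch below computes the attribute closure X⁺ under a set of functional dependencies and uses it for the standard lossless-join test for a two-way split: decomposing R into R1 and R2 loses no information if the shared attributes functionally determine all of R1 or all of R2. It assumes single-letter attribute names for brevity; the function names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (lhs, rhs) meaning lhs -> rhs

def fd(lhs: str, rhs: str) -> FD:
    """Shorthand: fd("AB", "C") means A, B -> C (single-letter attributes assumed)."""
    return frozenset(lhs), frozenset(rhs)

def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """X+: every attribute functionally determined by X under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If X+ already covers the left side, it must also cover the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def lossless_binary(r1: str, r2: str, fds: List[FD]) -> bool:
    """Splitting R into R1, R2 is lossless iff (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2."""
    c = closure(set(r1) & set(r2), fds)
    return set(r1) <= c or set(r2) <= c

# R(A, B, C) with the single dependency A -> B.
fds = [fd("A", "B")]
print(lossless_binary("AB", "AC", fds))  # shared A determines AB: lossless
print(lossless_binary("AB", "BC", fds))  # shared B determines neither side: lossy
```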
Query optimization theory is the other direct theoretical extension of the relational model. Once queries are expressed declaratively, the system must choose an execution plan. The framework formalizes the space of possible plans, the cost models that compare them, and the algebraic equivalences that allow a query to be rewritten into a cheaper form. Without this theory, relational databases would be unusably slow. Query optimization theory and dependency theory are siblings: both take the relational model's mathematical structure as given and ask how to make it practical.
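A classic algebraic equivalence is selection pushdown: σ_p(R ⋈ S) = σ_p(R) ⋈ S whenever the predicate p mentions only attributes of R. The sketch below runs both plans on hypothetical data and compares them under a deliberately crude cost model (the number of tuple pairs a nested-loop join examines). Real optimizers use far richer cost models; this only shows why a rewrite can change cost without changing the answer.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Pred = Callable[[Row], bool]

def nested_loop_join(r: List[Row], s: List[Row], key: str) -> Tuple[List[Row], int]:
    """Equi-join on `key`. Also returns |r| * |s|, the number of tuple pairs a
    nested-loop join examines, used here as a toy cost model."""
    out = [{**a, **b} for a, b in product(r, s) if a[key] == b[key]]
    return out, len(r) * len(s)

def join_then_select(r: List[Row], s: List[Row], key: str, p: Pred):
    """Plan 1: σ_p(R ⋈ S)."""
    joined, cost = nested_loop_join(r, s, key)
    return [t for t in joined if p(t)], cost

def select_then_join(r: List[Row], s: List[Row], key: str, p: Pred):
    """Plan 2: σ_p(R) ⋈ S. Equivalent to Plan 1 only because p
    mentions attributes of R alone."""
    return nested_loop_join([t for t in r if p(t)], s, key)

# Hypothetical data: 100 orders over 10 customers, one quarter of them in the EU.
orders = [{"oid": i, "cust": i % 10, "region": "EU" if i % 4 == 0 else "US"}
          for i in range(100)]
customers = [{"cust": c, "cname": f"c{c}"} for c in range(10)]

def is_eu(t: Row) -> bool:
    return t["region"] == "EU"

res1, cost1 = join_then_select(orders, customers, "cust", is_eu)
res2, cost2 = select_then_join(orders, customers, "cust", is_eu)
by_oid = lambda t: t["oid"]
assert sorted(res1, key=by_oid) == sorted(res2, key=by_oid)  # same answer
print(cost1, cost2)  # 1000 250
```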
Concurrency control theory addresses a different pressure: what happens when multiple users read and write the same data at the same time? The framework defines correctness criteria (serializability, conflict serializability, view serializability) and the protocols (locking, timestamp ordering, optimistic concurrency) that guarantee them. It is not tied exclusively to the relational model—the same concepts apply to any data model—but it was developed in the context of relational systems and remains the standard theoretical tool for reasoning about concurrent access.
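Conflict serializability has a well-known test: build a precedence graph with an edge Ti → Tj whenever an operation of Ti conflicts with a later operation of Tj (same data item, different transactions, at least one write), and check that the graph has no cycle. The sketch below assumes a schedule is a list of (transaction, "R" or "W", item) triples; the representation and names are illustrative, not taken from any particular system.

```python
from typing import Dict, List, Set, Tuple

Op = Tuple[str, str, str]  # (transaction, "R" or "W", data item)

def precedence_graph(schedule: List[Op]) -> Dict[str, Set[str]]:
    """Edge Ti -> Tj when an op of Ti conflicts with a later op of Tj."""
    graph: Dict[str, Set[str]] = {t: set() for t, _, _ in schedule}
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "W" in (ai, aj):
                graph[ti].add(tj)
    return graph

def has_cycle(graph: Dict[str, Set[str]]) -> bool:
    """Depth-first search with three colors to detect a directed cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}

    def visit(v: str) -> bool:
        color[v] = GRAY
        for w in graph[v]:
            if color[w] == GRAY or (color[w] == WHITE and visit(w)):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in graph)

def conflict_serializable(schedule: List[Op]) -> bool:
    """Conflict-serializable iff the precedence graph is acyclic."""
    return not has_cycle(precedence_graph(schedule))

ok = [("T1", "R", "x"), ("T1", "W", "x"), ("T2", "R", "x"), ("T2", "W", "x")]
bad = [("T1", "R", "x"), ("T2", "W", "x"), ("T1", "W", "x")]  # T1 -> T2 -> T1
print(conflict_serializable(ok), conflict_serializable(bad))
```

Protocols such as two-phase locking are designed so that every schedule they admit passes this test, which is why the test serves as the correctness criterion rather than as a runtime mechanism.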
Deductive databases took the relational model's connection to logic in a different direction. Instead of treating a database as a set of stored facts queried by relational algebra, deductive databases treat it as a set of logical rules from which new facts can be inferred. The framework is built on Datalog, a logic-programming language that extends relational queries with recursion. Deductive databases never replaced mainstream relational systems, but they introduced a theoretical perspective—querying as logical inference—that later influenced graph query languages and knowledge-base systems.
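Recursion is what separates Datalog from plain relational algebra. The textbook example is transitive closure, written as two rules: `path(X, Y) :- edge(X, Y).` and `path(X, Z) :- path(X, Y), edge(Y, Z).` The sketch below evaluates that one program with semi-naive evaluation, a standard Datalog strategy in which each round joins only the newly derived facts with `edge`. It is hard-coded for this program and is not a general Datalog engine.

```python
from typing import Set, Tuple

Fact = Tuple[str, str]

def transitive_closure(edge: Set[Fact]) -> Set[Fact]:
    """Semi-naive evaluation of
         path(X, Y) :- edge(X, Y).
         path(X, Z) :- path(X, Y), edge(Y, Z).
    Stops when a round derives no new facts (a fixpoint)."""
    path = set(edge)   # facts from the first (non-recursive) rule
    delta = set(edge)  # facts that are new in the latest round
    while delta:
        derived = {(x, z) for (x, y) in delta for (y2, z) in edge if y == y2}
        delta = derived - path
        path |= delta
    return path

print(sorted(transitive_closure({("a", "b"), ("b", "c"), ("c", "d")})))
```

Termination is guaranteed because only finitely many facts can be built from the constants in `edge`, even when the graph has cycles.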
By the 1990s, the relational model's "flat table" assumption was under pressure from applications that needed richer structures. Three frameworks responded in different ways.
The object-oriented data model borrowed ideas from object-oriented programming: objects with identity, complex nested structures, methods, and inheritance. It aimed to eliminate the "impedance mismatch" between the relational model's tabular representation and the object graphs used in code. The framework never achieved the dominance of relational systems, but it influenced later object-relational hybrids and the persistence layers of modern programming languages.
The semistructured data model took a different path. Instead of enforcing a rigid schema, it allowed data with irregular, nested, or missing structure. The key theoretical innovation was the use of labeled directed graphs (often represented as XML or JSON) and query languages that could navigate paths through the graph without knowing the full schema in advance. Semistructured data theory provided the formal foundation for XML databases and, later, for the JSON-based document stores that became central to NoSQL systems.
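The following is a minimal sketch of schema-free path navigation over JSON-like data, assuming nested Python dicts and lists stand in for the labeled graph. A label matches a key, `*` matches any single key, lists are traversed transparently, and a missing label yields no results instead of an error. The function name and the path syntax are hypothetical; they do not follow the syntax of any particular language such as XPath or JSONPath.

```python
from typing import Any, Iterator, List

def follow(node: Any, path: List[str]) -> Iterator[Any]:
    """Yield every value reached by following `path` from `node`.
    Lists are traversed element by element (including at the end of the path),
    so irregular or repeated substructure is tolerated."""
    if isinstance(node, list):
        for item in node:
            yield from follow(item, path)
        return
    if not path:
        yield node
        return
    if isinstance(node, dict):
        head, rest = path[0], path[1:]
        for key, child in node.items():
            if head == "*" or head == key:
                yield from follow(child, rest)
    # Scalars with a non-empty remaining path match nothing.

# Hypothetical irregular data: each person has a different shape.
doc = {
    "people": [
        {"name": "Ada", "email": "ada@example.org"},
        {"name": "Bob", "phone": ["555-0100", "555-0101"]},
        {"name": "Cy", "contact": {"email": "cy@example.org"}},
    ]
}
print(list(follow(doc, ["people", "name"])))
print(list(follow(doc, ["people", "*", "email"])))
```

The second query finds an email one level deeper than the first query would look, without the query author knowing in advance which records have a `contact` object.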
The graph data model also used graphs, but with a different emphasis: it treated the connections between data items as first-class citizens. The theoretical core includes graph query languages, the classes of queries they support (reachability, shortest paths, and graph pattern matching), and the computational complexity of evaluating them. Graph databases coexisted with relational systems in specialized applications (social networks, biological networks, knowledge graphs) and later found a natural home in the NoSQL ecosystem.
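Reachability and shortest-path queries on an unweighted graph can both be answered by breadth-first search. The sketch below assumes the graph is an adjacency-list dict and returns a fewest-edge path, or `None` when the target is unreachable, so reachability is simply a check for `None`. The `follows` graph is hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional

def shortest_path(graph: Dict[str, List[str]], src: str, dst: str) -> Optional[List[str]]:
    """Breadth-first search from src; returns a fewest-edge path to dst or None."""
    parent: Dict[str, Optional[str]] = {src: None}  # also serves as the visited set
    frontier = deque([src])
    while frontier:
        node = frontier.popleft()
        if node == dst:
            # Walk parent pointers back to src, then reverse.
            path = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                frontier.append(nxt)
    return None

# Hypothetical "follows" relation in a small social network.
follows = {"ann": ["bob", "cat"], "bob": ["dan"], "cat": ["dan"], "dan": ["eve"], "eve": []}
print(shortest_path(follows, "ann", "eve"))
print(shortest_path(follows, "eve", "ann"))  # unreachable
```

In relational terms the same question needs a recursive query of unbounded depth, which is one reason graph-native query evaluation became a subject of its own.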
These three frameworks—object-oriented, semistructured, and graph—are not competitors to each other. They are three different responses to the same perceived limitation of the relational model: its insistence on flat, schema-first tables. Each preserved the relational model's commitment to declarative querying while relaxing different assumptions.
The 2000s brought a new pressure: scale. Internet applications needed to store petabytes of data across thousands of machines, and the strict ACID (atomicity, consistency, isolation, durability) guarantees of traditional relational systems became a bottleneck.
NoSQL systems directly challenged assumptions built into the relational model, concurrency control theory, and query optimization theory. Their core claim was that for many applications, consistency could be relaxed (for example, to eventual consistency, in which replicas may temporarily disagree but converge once updates stop) and query expressiveness could be sacrificed (often no joins and only limited ad hoc querying) in exchange for horizontal scalability and high availability. NoSQL is not a single framework but a family of approaches (key-value stores, document stores, wide-column stores, and graph databases), each with its own theoretical trade-offs. NoSQL did not replace relational theory; it created a pluralistic landscape in which the choice of data model depends on the application's requirements.
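The sketch below shows what relaxed consistency can look like in a toy replicated key-value store. It assumes last-writer-wins (LWW) conflict resolution, with each write tagged by a (logical timestamp, replica id) version; this is one simple policy chosen for illustration, and the `Replica` class is hypothetical. Replicas accept writes without coordinating, so they can disagree, and a later anti-entropy exchange makes them converge.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Version = Tuple[int, str]  # (logical timestamp, replica id); the id breaks ties

@dataclass
class Replica:
    """A toy key-value replica using last-writer-wins conflict resolution."""
    rid: str
    clock: int = 0
    store: Dict[str, Tuple[Version, str]] = field(default_factory=dict)

    def put(self, key: str, value: str) -> None:
        # Accept the write locally, without contacting other replicas.
        self.clock += 1
        self.store[key] = ((self.clock, self.rid), value)

    def get(self, key: str) -> Optional[str]:
        entry = self.store.get(key)
        return entry[1] if entry else None

    def merge(self, other: "Replica") -> None:
        # Anti-entropy: for every key keep the entry with the larger version.
        for key, entry in other.store.items():
            if key not in self.store or entry[0] > self.store[key][0]:
                self.store[key] = entry
        # Advance the local clock so later local writes outrank what was just seen.
        self.clock = max(self.clock, other.clock)

a, b = Replica("a"), Replica("b")
a.put("cart", "book")
b.put("cart", "lamp")                 # concurrent write on another replica
print(a.get("cart"), b.get("cart"))   # replicas disagree
a.merge(b)
b.merge(a)
print(a.get("cart"), b.get("cart"))   # replicas converge
```

LWW buys convergence by silently discarding one of the concurrent writes; other designs keep both versions for the application to reconcile or use richer merge rules, which is exactly the kind of trade-off the NoSQL family made explicit.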
Big data theory emerged alongside NoSQL but with a different focus: how to process massive datasets that do not fit in a single machine's memory. The framework includes the MapReduce programming model, its theoretical analysis (complexity, skew handling, communication cost), and the study of distributed query processing at extreme scale. Big data theory and NoSQL systems overlap in their concern with scale, but they address different levels: NoSQL focuses on storage and access patterns, while big data theory focuses on computation and algorithmic efficiency.
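The MapReduce model can be illustrated with the usual word-count example. The sketch below simulates the three phases in a single process, assuming in-memory lists: map emits (key, value) pairs per record, shuffle groups values by key, and reduce combines each group independently. In a real cluster the shuffle is the network-bound step, which is why communication cost features so prominently in the theoretical analysis. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

def map_reduce(records: Iterable, mapper: Callable, reducer: Callable) -> Dict:
    """Single-process simulation of the map, shuffle, and reduce phases."""
    # Map: each input record independently emits (key, value) pairs.
    intermediate: List[Tuple] = [pair for rec in records for pair in mapper(rec)]
    # Shuffle: group all values by key (data movement between machines in a cluster).
    groups: Dict = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce: combine each key's values; keys could be processed in parallel.
    return {key: reducer(key, values) for key, values in groups.items()}

def wc_map(line: str):
    for word in line.lower().split():
        yield word, 1

def wc_reduce(word: str, counts: List[int]) -> int:
    return sum(counts)

docs = ["the cat sat", "the dog sat", "the end"]
counts = map_reduce(docs, wc_map, wc_reduce)
print(counts)  # {'the': 3, 'cat': 1, 'sat': 2, 'dog': 1, 'end': 1}
```

Skew shows up even here: the key `the` receives three values while the others receive one or two, and at scale a single hot key can leave one reducer doing most of the work.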
Streaming data theory addresses yet another dimension of scale: data that arrives continuously and must be processed with low latency. The framework formalizes streaming models (sliding windows, punctuations, event-time processing), query languages for continuous queries, and the trade-offs between latency, throughput, and accuracy. Streaming data theory and big data theory are complementary: batch processing (big data) handles historical analysis, while streaming handles real-time reactions. Both frameworks extend the relational model's query semantics into new temporal and architectural settings.
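A sliding window bounds how much of an unbounded stream a continuous query must remember. The sketch below computes a running average over a time-based window of the last `width` time units, emitting one result per event. It assumes events arrive in event-time order and that `width` is positive; handling late or out-of-order events (the problem that punctuations and watermarks address) is deliberately left out. The function name and data are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

def sliding_avg(events: Iterable[Tuple[int, float]], width: int) -> Iterator[Tuple[int, float]]:
    """For each event (t, value), emit the average of all values whose
    timestamps fall in the window (t - width, t].
    Assumes in-order timestamps and width > 0."""
    window: Deque[Tuple[int, float]] = deque()
    total = 0.0
    for t, v in events:
        window.append((t, v))
        total += v
        # Evict events that have slid out of the window; the current event
        # always stays because t > t - width.
        while window[0][0] <= t - width:
            _, old = window.popleft()
            total -= old
        yield t, total / len(window)

out = list(sliding_avg([(1, 10.0), (2, 20.0), (3, 30.0), (6, 40.0)], width=3))
print(out)
```

Memory use is proportional to the number of events inside one window rather than to the length of the stream, which is the basic latency and space trade-off streaming models formalize.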
Data exchange and integration theory tackles a different problem: how to combine data from multiple, heterogeneous sources. The framework formalizes schema mappings, query rewriting across schemas, and the problem of data exchange (translating data from one schema to another while preserving its meaning). It draws heavily on dependency theory and query optimization theory, but it extends them into a multi-source setting where the schemas are not under a single designer's control.
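Data exchange can be made concrete with a single source-to-target dependency. The sketch below assumes a hypothetical mapping `Emp(n, d) -> EXISTS i . Works(n, i) AND Dept(i, d)`: the target schema needs a department identifier that the source does not have, so each firing invents a labeled null (an unknown but specific value). Firing the rule once per source tuple builds the canonical universal solution via a naive chase; production systems and the general theory handle target constraints and more complex mappings.

```python
from itertools import count
from typing import Dict, Set, Tuple

def chase_emp_mapping(emp: Set[Tuple[str, str]]) -> Dict[str, Set[Tuple[str, str]]]:
    """Apply the hypothetical dependency
         Emp(n, d) -> EXISTS i . Works(n, i) AND Dept(i, d)
    once per source tuple, inventing a fresh labeled null for each i."""
    fresh = count(1)
    target: Dict[str, Set[Tuple[str, str]]] = {"Works": set(), "Dept": set()}
    for name, dept in sorted(emp):  # sorted only to make null numbering deterministic
        null = f"_N{next(fresh)}"   # labeled null standing for an unknown id
        target["Works"].add((name, null))
        target["Dept"].add((null, dept))
    return target

def works_in(target: Dict[str, Set[Tuple[str, str]]], dept_name: str) -> Set[str]:
    """A target query: rejoin Works and Dept on the (possibly null) id."""
    return {n for n, i in target["Works"] for j, d in target["Dept"]
            if i == j and d == dept_name}

t = chase_emp_mapping({("Ada", "Research"), ("Bob", "Sales")})
print(t)
print(works_in(t, "Research"))
```

The target query still returns Ada even though the id values are unknown, which illustrates the sense in which the exchange "preserves meaning": answers that hold in every possible completion of the nulls survive the translation.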
Cloud-native data management is the most systems-focused framework in the timeline, but it has theoretical consequences. It assumes that infrastructure is elastic, that storage and compute are decoupled, and that failures are normal. These assumptions challenge traditional concurrency control theory (which assumes a stable set of nodes) and query optimization theory (which assumes a fixed hardware configuration). Cloud-native data management does not replace those theories; it forces them to be re-examined under new architectural assumptions.
Today, no single framework dominates. The relational model remains the theoretical gold standard for data integrity and declarative querying, and it is still the foundation of most commercial database systems. Dependency theory and query optimization theory continue to be active research areas, especially in the context of data integration and cloud-native systems. Concurrency control theory has been extended to handle geo-distributed and cloud-native settings, where the classic serializability definitions must be relaxed or rethought.
What the leading frameworks agree on is that declarative querying—specifying what rather than how—is a fundamental advance that should be preserved. They also agree that no single data model fits all applications; the era of "one size fits all" is over. What they disagree on is how much formal rigor is necessary. The relational tradition insists on strong consistency, schema-first design, and provable correctness. The NoSQL and streaming traditions argue that for many applications, weaker guarantees are acceptable and even desirable. Big data theory and cloud-native data management sit in the middle, trying to preserve as much formal structure as possible while adapting to scale and elasticity.
The result is a productive tension. Database theory today is not a settled body of knowledge but a set of frameworks in live disagreement, each with its own strengths, each borrowing from and reacting to the others. The student who understands the relational model's formal commitments, the pressures that led to the 1990s model expansions, and the scale-driven challenges of the 2000s will be equipped to navigate this landscape and to contribute to its next chapter.