Every database system begins with a choice about how to represent information. That choice—the data model—determines what questions are easy to ask, how fast answers come back, and whether the system can adapt when the world changes. For sixty years, data modeling has oscillated between two pressures: the need to faithfully capture the structure of real-world domains and the need to store and retrieve data efficiently on physical hardware. The history of data modeling frameworks is the story of how designers have navigated that tension, each framework making a different set of commitments about abstraction, flexibility, and performance.
The earliest database frameworks, the Hierarchical Model and the Network Model, both emerged in the 1960s and shared a fundamental assumption: data should be organized as records connected by explicit pointers. In the Hierarchical Model, records formed a tree structure—each child had exactly one parent, and navigating from a root to a leaf required following a predefined path. IBM's Information Management System (IMS) exemplified this approach, storing data in a rigid hierarchy that mirrored the physical layout of tapes and disks.
The Network Model, formalized by the CODASYL committee, loosened the tree constraint by allowing a record to have multiple parents, creating a graph of records linked by named sets. A programmer navigated this graph by issuing commands like FIND NEXT or FIND OWNER, moving from one record to another along pointer chains. Both models were navigational: the user had to know the physical access paths to retrieve data. The models were tightly coupled to storage structures, and changing the schema often required rewriting application code.
The central difference between the two navigational models was the shape of the connections. The Hierarchical Model enforced a strict parent-child tree, which made one-to-many relationships simple to represent but many-to-many relationships awkward (they had to be modeled with redundant records or virtual hierarchies). The Network Model, by contrast, could represent many-to-many relationships directly through its graph of sets, at the cost of greater complexity in both schema design and query logic. Despite these differences, both models were committed to pointer-based navigation, and both made the programmer responsible for traversing the data structures.
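To make the navigational style concrete, here is a minimal Python sketch, assuming an in-memory stand-in for CODASYL-style sets. The classes `Record` and `SetOccurrence` and the functions `find_next` and `find_owner` are hypothetical; they only loosely mirror the FIND NEXT and FIND OWNER commands and are not any real system's interface. Each ORDER record belongs to two sets and therefore has two owners (a customer and a product), which a strict hierarchy could not express. Note that the query function has to spell out the access path by hand.

```python
from __future__ import annotations


class Record:
    """A record plus pointers to the set occurrences it is a member of (hypothetical)."""

    def __init__(self, kind: str, **fields: object) -> None:
        self.kind = kind
        self.fields = fields
        self.owners: dict[str, SetOccurrence] = {}


class SetOccurrence:
    """One occurrence of a named set: an owner record and an ordered chain of members."""

    def __init__(self, name: str, owner: Record) -> None:
        self.name = name
        self.owner = owner
        self.members: list[Record] = []

    def connect(self, member: Record) -> None:
        self.members.append(member)
        # A record can be a member of several sets, so it can have several owners.
        member.owners[self.name] = self


def find_next(s: SetOccurrence, current: Record | None = None) -> Record | None:
    """Loose analog of FIND NEXT WITHIN set: step to the member after `current`."""
    if current is None:
        return s.members[0] if s.members else None
    i = s.members.index(current)  # Record defines no __eq__, so this matches by identity
    return s.members[i + 1] if i + 1 < len(s.members) else None


def find_owner(record: Record, set_name: str) -> Record:
    """Loose analog of FIND OWNER WITHIN set: follow the pointer to the set's owner."""
    return record.owners[set_name].owner


def products_ordered_by(customer_orders: SetOccurrence) -> list[str]:
    """The access path is written out by hand: customer -> orders -> product owners."""
    names = []
    order = find_next(customer_orders)
    while order is not None:
        names.append(find_owner(order, "PRODUCT_ORDERS").fields["name"])
        order = find_next(customer_orders, order)
    return names


alice = Record("CUSTOMER", name="Alice")
alice_orders = SetOccurrence("CUSTOMER_ORDERS", alice)
for product_name in ("Lamp", "Desk"):
    product_orders = SetOccurrence("PRODUCT_ORDERS", Record("PRODUCT", name=product_name))
    order = Record("ORDER", qty=1)
    alice_orders.connect(order)
    product_orders.connect(order)

print(products_ordered_by(alice_orders))  # ['Lamp', 'Desk']
```

If the physical chain changed (say, orders were linked to products through a different set), `products_ordered_by` would have to be rewritten, which is the coupling the text describes.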
The Relational Model, introduced by Edgar Codd in 1970, was a deliberate philosophical break from the navigational paradigm. Codd proposed that data should be represented as mathematical relations—sets of tuples—with no pointers at all. Instead of following links, the user connected data through values: a customer ID in an order relation matched the same customer ID in a customer relation. This value-based linking meant that the physical storage of data could be changed without altering queries, a property Codd called data independence.
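As a minimal sketch of value-based linking, assuming relations are modeled as Python sets of tuples with made-up sample data, the join below pairs rows only by matching `customer_id` values; there are no pointers anywhere. Two different strategies, a nested-loop join and a hash join, produce the same relation, which hints at why a relational system is free to pick its access method without changing the answer.

```python
# Relations as sets of tuples (hypothetical sample data).
customers = {(1, "Alice"), (2, "Bob")}  # (customer_id, name)
orders = {(100, 1, "Lamp"), (101, 2, "Desk"), (102, 1, "Chair")}  # (order_id, customer_id, item)


def nested_loop_join(customers, orders):
    """Compare every pair of tuples and keep those whose customer_id values match."""
    return {
        (cid, name, oid, item)
        for (cid, name) in customers
        for (oid, order_cid, item) in orders
        if cid == order_cid
    }


def hash_join(customers, orders):
    """Build a lookup keyed on customer_id, then probe it once per order."""
    names_by_id = {}
    for cid, name in customers:
        names_by_id.setdefault(cid, []).append(name)
    return {
        (order_cid, name, oid, item)
        for (oid, order_cid, item) in orders
        for name in names_by_id.get(order_cid, [])
    }


result = nested_loop_join(customers, orders)
assert result == hash_join(customers, orders)  # same relation, different access strategy
print(sorted(result))
# [(1, 'Alice', 100, 'Lamp'), (1, 'Alice', 102, 'Chair'), (2, 'Bob', 101, 'Desk')]
```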
The Relational Model's distinctive commitments were threefold. First, it adopted a set-theoretic foundation: relations were unordered sets, and operations (select, project, join) produced new relations, enabling a declarative query style. Second, it separated the logical schema from physical storage, so that users could think about data without worrying about indexes or access paths. Third, it introduced a rigorous normalization theory for reducing redundancy and the update anomalies that redundancy causes. SQL, developed in the 1970s and standardized in the 1980s, gave the model a practical language that allowed users to say what they wanted, not how to find it.
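The declarative style can be illustrated with Python's built-in `sqlite3` module. In this small sketch with hypothetical `customer` and `orders` tables, the query states the join, selection, and projection it wants, and adding an index (a purely physical change) leaves both the query text and its result unchanged. Because the customer's city is stored once, in a normalized table, changing it is a one-row update rather than an edit to every order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        item        TEXT NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Alice', 'Oslo'), (2, 'Bob', 'Lima');
    INSERT INTO orders VALUES (100, 1, 'Lamp'), (101, 2, 'Desk'), (102, 1, 'Chair');
""")

# Say what is wanted: join on matching values, select by city, project two columns.
QUERY = """
    SELECT c.name, o.item
    FROM customer AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    WHERE c.city = ?
    ORDER BY o.order_id
"""

before = conn.execute(QUERY, ("Oslo",)).fetchall()
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")  # physical change only
after = conn.execute(QUERY, ("Oslo",)).fetchall()
print(before == after, after)  # True [('Alice', 'Lamp'), ('Alice', 'Chair')]

# Normalization: the city lives in one row, so a single UPDATE suffices.
conn.execute("UPDATE customer SET city = 'Bergen' WHERE customer_id = 1")
print(conn.execute(QUERY, ("Bergen",)).fetchall())  # [('Alice', 'Lamp'), ('Alice', 'Chair')]
```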
The Relational Model did not immediately replace the navigational models; they coexisted for more than a decade. But by the 1990s, relational database systems had become dominant, largely because data independence made applications easier to build and maintain. The navigational models retreated to niche roles, chiefly long-lived mainframe workloads; IMS, for example, remained in use in banking. The Relational Model's victory was not just technical. It was a shift in how designers thought about data: from paths to sets, from navigation to declaration.
By the mid-1970s, the relational model was gaining ground, but designers faced a practical problem: how to translate a real-world domain into a normalized relational schema. The Entity-Relationship (ER) Model, introduced by Peter Chen in 1976, addressed this gap by providing a high-level, implementation-independent notation. The ER Model distinguished entities (things such as customers or products) from relationships (connections such as a customer purchasing a product or a department employing a worker), and its diagrams drew entities as rectangles and relationships as diamonds.
The ER Model was not a competitor to the Relational Model; it was a complementary design layer. A designer would first draw an ER diagram to capture the domain's structure, then map that diagram to relational tables. The mapping was systematic: entities became tables, one-to-one and one-to-many relationships became foreign keys, and many-to-many relationships became junction tables. The ER Model also introduced the concept of cardinality, the number of instances that can participate on each side of a relationship (one-to-one, one-to-many, many-to-many), which became a standard tool for reasoning about constraints.
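A minimal sketch of that mapping, again using Python's `sqlite3` with a hypothetical schema: the entities DEPARTMENT, EMPLOYEE, and PROJECT become tables, the one-to-many relationship between departments and employees becomes a foreign key on the employee side, and the many-to-many WORKS_ON relationship becomes a junction table whose primary key combines both sides. SQLite enforces foreign keys only when the `foreign_keys` pragma is enabled, so the sketch turns it on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- 1:N (a department employs many employees): foreign key on the N side
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)
    );
    CREATE TABLE project (proj_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- M:N (employees work on projects): junction table keyed by both entities
    CREATE TABLE works_on (
        emp_id  INTEGER NOT NULL REFERENCES employee(emp_id),
        proj_id INTEGER NOT NULL REFERENCES project(proj_id),
        PRIMARY KEY (emp_id, proj_id)
    );
    INSERT INTO department VALUES (1, 'Research');
    INSERT INTO employee VALUES (10, 'Ada', 1), (11, 'Grace', 1);
    INSERT INTO project VALUES (7, 'Compiler'), (8, 'Database');
    INSERT INTO works_on VALUES (10, 7), (10, 8), (11, 8);
""")

staffing = conn.execute("""
    SELECT p.title, COUNT(*) AS staff
    FROM project AS p JOIN works_on AS w ON w.proj_id = p.proj_id
    GROUP BY p.proj_id
    ORDER BY p.proj_id
""").fetchall()
print(staffing)  # [('Compiler', 1), ('Database', 2)]

try:
    conn.execute("INSERT INTO works_on VALUES (99, 7)")  # employee 99 does not exist
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True
```

The composite primary key on `works_on` also encodes a constraint from the diagram: the same employee cannot be recorded twice on the same project.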
Over time, the ER Model was absorbed into the broader relational design process. It remains widely taught and used today, not as a separate database system but as a conceptual modeling technique that precedes implementation. Its lasting contribution was to make explicit the distinction between the conceptual schema (what the data means) and the logical schema (how it is arranged in tables), a distinction that later frameworks would revisit.
In the 1980s, as object-oriented programming languages like C++ and Smalltalk gained traction, a new pressure emerged: the impedance mismatch between the relational model's tabular representation and the object-oriented paradigm's graphs of objects with identity, inheritance, and behavior. Object-Oriented Data Modeling (OODM) proposed storing objects directly, preserving their identity and relationships without mapping to tables. An object database would store a customer object with its embedded address and order history, navigable through references rather than joins.
OODM's distinctive commitments included support for complex objects (nested structures, sets, lists), object identity independent of values, inheritance hierarchies, and encapsulation of behavior alongside data. Systems like ObjectStore and GemStone emerged in the late 1980s and early 1990s, promising seamless integration with object-oriented code.
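Python objects offer a rough analogy for these commitments, not an object database. The sketch below, with hypothetical `Customer`, `PremiumCustomer`, `Order`, and `Address` classes, shows identity that is independent of attribute values, a nested address object, references navigated directly instead of joined, behavior encapsulated with the data, and an inheritance hierarchy. A real OODBMS such as ObjectStore or GemStone would additionally persist these objects and their references.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Address:  # value-like nested object: compared by its contents
    street: str
    city: str


@dataclass(eq=False)  # eq=False: equality falls back to identity, like an object ID
class Customer:
    name: str
    address: Address
    orders: list[Order] = field(default_factory=list)

    def place_order(self, item: str, price: float) -> Order:
        """Behavior lives with the data; the order keeps a direct reference back."""
        order = Order(item, price, self)
        self.orders.append(order)
        return order

    def total_spent(self) -> float:
        return sum(o.price for o in self.orders)


@dataclass(eq=False)
class PremiumCustomer(Customer):  # inheritance: a specialized kind of customer
    discount: float = 0.1

    def total_spent(self) -> float:
        return super().total_spent() * (1 - self.discount)


@dataclass(eq=False)
class Order:
    item: str
    price: float
    customer: Customer  # navigate order -> customer by reference, no join


a = Customer("Ann", Address("Main St", "Oslo"))
b = Customer("Ann", Address("Main St", "Oslo"))
print(a == b, a.address == b.address)  # False True: same values, distinct identities

order = a.place_order("Lamp", 40.0)
print(order.customer is a, a.total_spent())  # True 40.0
```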
Despite its conceptual appeal, OODM never achieved broad adoption. The reasons were multiple: relational systems had enormous installed bases and mature query optimization; SQL was a standard; and the object-oriented features could be approximated by object-relational mappings (ORMs) that translated between tables and objects in application code. By the early 2000s, OODM had narrowed to specialized domains like computer-aided design and telecommunications, where complex data structures were the norm. The framework did not disappear entirely—its ideas about object identity and nested data would resurface in later NoSQL document stores.
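The translation that ORMs automate can be sketched by hand. The `CustomerMapper` class below is hypothetical and far simpler than real ORMs: it turns rows into objects and objects back into rows, and it keeps an identity map so that, within one mapper, a primary key corresponds to exactly one in-memory object. That recovers a little of the object identity that tables on their own do not provide.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: int
    name: str


class CustomerMapper:
    """Hypothetical hand-rolled mapper between the customer table and Customer objects."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self._identity_map: dict[int, Customer] = {}  # primary key -> the one loaded object

    def get(self, customer_id: int) -> Customer | None:
        if customer_id in self._identity_map:
            return self._identity_map[customer_id]
        row = self.conn.execute(
            "SELECT customer_id, name FROM customer WHERE customer_id = ?", (customer_id,)
        ).fetchone()
        if row is None:
            return None
        obj = Customer(*row)  # row (tuple) -> object
        self._identity_map[customer_id] = obj
        return obj

    def save(self, customer: Customer) -> None:
        # object -> row; INSERT OR REPLACE is SQLite's replace-by-primary-key form
        self.conn.execute(
            "INSERT OR REPLACE INTO customer (customer_id, name) VALUES (?, ?)",
            (customer.customer_id, customer.name),
        )
        self._identity_map[customer.customer_id] = customer


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
mapper = CustomerMapper(conn)
mapper.save(Customer(1, "Alice"))

c = mapper.get(1)
c.name = "Alicia"
mapper.save(c)
print(mapper.get(1) is c, CustomerMapper(conn).get(1))  # True Customer(customer_id=1, name='Alicia')
```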
The early 2000s brought a new pressure: internet-scale applications that needed to serve millions of users with low latency, handle massive volumes of data, and scale horizontally across clusters of commodity servers. The relational model's strict schemas, join-heavy queries, and ACID transactions became bottlenecks. NoSQL systems emerged as a diverse family of frameworks that relaxed one or more of the relational model's commitments.
NoSQL is not a single model but a collection of approaches. Key-value stores (like Redis and Dynamo) look up each value by its key and do little or no interpretation of its contents, offering extreme simplicity and speed. Document stores (like MongoDB) store self-describing documents, typically JSON or a JSON-like format, that can have varying fields, enabling schema flexibility. Column-family stores (like Cassandra) group related columns into families within each row, a layout suited to wide, sparse tables and write-heavy workloads. Graph databases (like Neo4j) store nodes and edges explicitly, reviving the navigational idea of pointer-based traversal but with a declarative query language (Cypher) that hides the physical paths.
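A toy contrast between the first two families, assuming plain Python dictionaries and lists stand in for the stores: the key-value store sees only keys and opaque bytes, so the application must know the exact key and decode the value itself, while the document collection holds self-describing documents with differing fields that the store can filter on. The `find` helper is hypothetical; it only imitates the flavor of an equality filter and is not MongoDB's API.

```python
from __future__ import annotations

import json

# Key-value: the store maps keys to bytes it does not interpret.
kv_store: dict[str, bytes] = {}
kv_store["session:42"] = json.dumps({"user": "alice", "cart": [3, 7]}).encode()
session = json.loads(kv_store["session:42"])  # decoding is the application's job

# Document collection: each document describes itself, and fields may differ.
products = [
    {"_id": 1, "name": "Lamp", "price": 30, "tags": ["light", "desk"]},
    {"_id": 2, "name": "Desk", "price": 120, "dimensions": {"width_cm": 140, "depth_cm": 70}},
    {"_id": 3, "name": "Chair", "price": 45},
]


def find(collection: list[dict], **criteria: object) -> list[dict]:
    """Hypothetical equality filter on top-level fields; a missing field never matches."""
    return [
        doc for doc in collection
        if all(k in doc and doc[k] == v for k, v in criteria.items())
    ]


print(session["cart"])                                   # [3, 7]
print([d["name"] for d in find(products, price=45)])     # ['Chair']
print([d["_id"] for d in products if "dimensions" in d])  # [2]
```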
What unites most of these frameworks is a rejection of the relational requirement that data conform to a fixed schema declared in advance. NoSQL systems typically support schema-on-read (the structure is interpreted when data is read) rather than schema-on-write (the structure is enforced when data is inserted). Many of them also prioritize horizontal scalability and availability over strict consistency, adopting BASE semantics (Basically Available, Soft state, Eventual consistency) instead of ACID.
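The difference between the two disciplines can be sketched with two hypothetical toy stores: one checks every row against a declared schema before accepting it, the other accepts any document and leaves interpretation to the reader. The field names and the fallback from a legacy `mail` field to `email` are invented for illustration.

```python
from __future__ import annotations

# Hypothetical declared schema: column name -> required Python type.
SCHEMA = {"id": int, "email": str}


class SchemaOnWriteTable:
    """Rejects any row that does not match the schema, so stored rows are always well-formed."""

    def __init__(self, schema: dict) -> None:
        self.schema = schema
        self.rows: list[dict] = []

    def insert(self, row: dict) -> None:
        if set(row) != set(self.schema):
            raise ValueError(f"columns {sorted(row)} do not match {sorted(self.schema)}")
        for column, expected in self.schema.items():
            if not isinstance(row[column], expected):
                raise TypeError(f"{column!r} must be {expected.__name__}")
        self.rows.append(dict(row))


class SchemaOnReadStore:
    """Accepts any document; structure is imposed only when a reader interprets it."""

    def __init__(self) -> None:
        self.docs: list[dict] = []

    def insert(self, doc: dict) -> None:
        self.docs.append(dict(doc))

    def emails(self) -> list:
        # This reader's interpretation: prefer "email", fall back to a legacy "mail" field.
        return [d.get("email", d.get("mail")) for d in self.docs]


table = SchemaOnWriteTable(SCHEMA)
table.insert({"id": 1, "email": "a@example.com"})
try:
    table.insert({"id": "2", "mail": "b@example.com"})  # wrong column name and type
except (ValueError, TypeError) as err:
    print("rejected:", err)

store = SchemaOnReadStore()
for doc in ({"id": 1, "email": "a@example.com"}, {"id": "2", "mail": "b@example.com"}, {"id": 3}):
    store.insert(doc)
print(store.emails())  # ['a@example.com', 'b@example.com', None]
```

The trade-off shows up directly: the first store pays at write time and guarantees clean rows, while the second accepts everything and pushes the handling of missing or legacy fields into every reader.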
NoSQL did not replace the relational model; it coexists with it. Relational databases remain the default for applications where data integrity and complex queries matter (financial systems, enterprise resource planning). NoSQL systems dominate in scenarios where flexibility, scale, or real-time performance is paramount (content management, session stores, recommendation engines). The two families now operate in a pluralistic landscape, each chosen for its strengths.
By the 2010s, a new tension had become visible: NoSQL systems sacrificed transactional guarantees and query expressiveness, while relational systems struggled with horizontal scaling. NewSQL emerged as an attempt to have both: the scalability of NoSQL and the ACID transactions and SQL interface of the relational model. Systems such as Google Spanner and CockroachDB combined distributed consensus protocols (Paxos in Spanner, Raft in CockroachDB) with relational storage, achieving strong consistency across geographically distributed nodes; others, such as VoltDB, pursued scale through in-memory, partitioned execution.
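The property that lets consensus protocols such as Paxos and Raft keep replicas consistent can be checked in a few lines. This is a simplified sketch, not either protocol: it assumes a value counts as committed once a majority of replicas accept it, and it brute-forces the fact that any two majorities of the same cluster overlap, so any later majority includes at least one replica that saw the earlier decision.

```python
from itertools import combinations


def majority(n: int) -> int:
    """Smallest number of replicas that is more than half of n."""
    return n // 2 + 1


def is_committed(acks: int, n: int) -> bool:
    """Simplified commit rule: a write is decided once a majority has accepted it."""
    return acks >= majority(n)


def majorities_overlap(n: int) -> bool:
    """Check that every pair of minimal majority quorums shares a replica.

    Checking minimal quorums suffices, because any larger quorum is a superset of one.
    """
    quorums = [set(q) for q in combinations(range(n), majority(n))]
    return all(a & b for a in quorums for b in quorums)


print(majority(5), is_committed(3, 5), is_committed(2, 5))  # 3 True False
print(all(majorities_overlap(n) for n in range(1, 8)))      # True
```

With five replicas, `majority(5)` is 3, so under this rule the cluster can keep committing while up to two replicas are unavailable.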
NewSQL's distinctive commitment was to preserve the relational model's logical structure while re-engineering the physical architecture for distribution. Spanner, for example, relies on GPS receivers and atomic clocks, exposed through its TrueTime API as a bounded clock-uncertainty interval, to provide external consistency across global data centers. NewSQL did not reject the relational model; it transformed it by adding a distributed infrastructure layer. The framework remains active, particularly in financial services and global applications that cannot tolerate eventual consistency.
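Spanner's use of TrueTime can be illustrated with its commit-wait rule, as described in the published Spanner design: TrueTime reports the current time as an interval assumed to contain the true time, a transaction's commit timestamp is taken from the interval's upper bound, and the system waits until that timestamp is certainly in the past before making the commit visible. In the sketch below, `tt_now`, `tt_after`, `commit`, and the 5 ms `EPSILON` are hypothetical stand-ins; the "true" time is just a simulated monotonic clock, so only the waiting logic is meaningful.

```python
import time
from dataclasses import dataclass

EPSILON = 0.005  # hypothetical clock-uncertainty bound, in seconds


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


def tt_now() -> TTInterval:
    """Stand-in for TT.now(): an interval assumed to contain the true time."""
    t = time.monotonic()  # simulated clock; a real system would use synchronized wall time
    return TTInterval(t - EPSILON, t + EPSILON)


def tt_after(t: float) -> bool:
    """Stand-in for TT.after(t): true only once t has definitely passed."""
    return tt_now().earliest > t


def commit(apply_writes) -> float:
    s = tt_now().latest  # commit timestamp: no earlier than any possible current true time
    apply_writes(s)
    while not tt_after(s):  # commit wait, roughly 2 * EPSILON in this simulation
        time.sleep(EPSILON / 5)
    return s  # only now would the commit be acknowledged and made visible


log = []
start = time.monotonic()
ts = commit(log.append)
waited = time.monotonic() - start
print(log == [ts], waited >= 2 * EPSILON)  # True True
```

The wait grows with the uncertainty bound, which is why tight clock synchronization matters to this design: a smaller `EPSILON` means shorter commit waits.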
At the same time, Cloud-Native Databases emerged not as a new data model but as an operational paradigm that can host any model. Cloud-native systems (Amazon Aurora, Google Bigtable, Snowflake) separate compute from storage, scale elastically, and are operated as managed services. They provide the infrastructure for relational, document, key-value, and graph databases alike. Cloud-Native Databases have become a dominant deployment model, but they do not replace the frameworks above; they host them.
Today, data modeling is a pluralistic field. The Relational Model remains the most widely used framework for structured data, supported by mature tools and a vast ecosystem. The Entity-Relationship Model continues as a standard design methodology. NoSQL systems have carved out large territories in web and mobile applications. NewSQL is growing in domains that demand both consistency and scale. Cloud-Native Databases provide the operational layer for all of them.
What proponents of the leading frameworks agree on is that no single model fits every application. The choice of framework depends on the data's structure, the query patterns, the consistency requirements, and the operational environment. What they disagree on is the right trade-off between structure and flexibility. The relational camp argues that upfront schema design prevents data corruption and enables powerful query optimization. The NoSQL camp argues that schema flexibility allows faster iteration and accommodates evolving data. NewSQL attempts to mediate this disagreement by offering the relational model's structure together with the horizontal scalability that once seemed to require giving that structure up.
The central tension that has driven data modeling from the beginning—how to balance faithful representation with efficient storage and retrieval—has not been resolved. It has instead become a design space with multiple viable points, each framework representing a different answer to the same enduring question.