Corda, SQL and NoSQL

June 13, 2017

Corda, SQL and NoSQL

Corda differs from other platforms in numerous ways, but one of the most visible is our usage of relational database technology. Sometimes people query this — aren’t we swimming against the tide, which seems to be flowing away from SQL and conventional databases?

The answer is no. In this post I’d like to explain why we’re doing this.

NoSQL or NewSQL?

Before we get into the distinction between SQL, NoSQL and NewSQL databases I’d like to briefly mention my background. I’m the lead platform engineer on the Corda project and am responsible for most of the design. I used to work for Google, and my first job there was as a Google Maps & Earth “site reliability engineer”. SRE teams combine people with a variety of backgrounds — systems administrators and systems-level programmers work together on all aspects of the daily operation of a service. The job involves capacity planning, troubleshooting, design and architectural reviews, operation of the infrastructure and so on. One part of my role was to operate BigTable clusters, as the satellite imagery infrastructure relied heavily on Google’s first-gen scalable database. After moving on from this role, I also wrote quite a bit of software that used BigTable and MapReduce together. So I am quite familiar with what it’s like to work with such technologies.

After Google published papers on these two technologies, they started to become known as “NoSQL” or “big data”. By abandoning the constraints of the relational data model and its associated query language, working with massive “web scale” datasets became a lot easier. Relational database engines offer many features that are hard to scale up over lots of machines, so it made sense for firms offering free web services to seek alternatives. This trend led to many re-implementations of Google’s key ideas, like Hadoop, HBase, Cassandra and Accumulo (from the NSA!). The Cassandra documentation would look quite familiar to any Google engineer who has used BigTable, with its talk of sstables, merge compactions and so on.

The NoSQL trend also spawned a set of databases that look superficially similar to BigTable and its derivatives, but which are actually quite different. I put MongoDB and CouchDB in this category. Typical differences are that they aren’t column based but rather store arbitrary JSON trees, lack transparent automatic resharding, and don’t offer a query language at all (every query must be run as a MapReduce).

Beyond publishing occasional academic papers, Google did not offer much external opinion or advice on the topic of databases. So along the way a critical misunderstanding started to take hold in the software industry: the belief that the lack of SQL and other core relational features was the big idea and that it was thus a strength and not a weakness.

This was reflected in the “NoSQL” name that was adopted to describe this trend. But that wasn’t true. Inside Google losing SQL, joins, ad hoc querying and complex transactions was seen as a major blow. Such features were let go of only with extreme reluctance. In fact, in the most core parts of Google’s business — the parts where the advertising data was stored — they were never willing to switch to their own in-house databases because the missing features were just far too critical. The core AdWords database ran on a sharded MySQL cluster the entire time I was there — something little known outside the company until recently. For applications that couldn’t or wouldn’t adopt this architecture, massive engineering efforts were required to work around the weaknesses in the NoSQL feature set.

Thus the goal was always the production of a new relational database engine that scaled to the astonishing sizes Google needed. Vast projects were created to achieve this goal, which led to systems like Dremel, MegaStore, Spanner and eventually the crowning achievement, F1, a database engine with features competitive with MySQL. The F1 paper has some quotes that illustrate this 20 year journey:

In recent years, conventional wisdom in the engineering community has been that if you need a highly scalable, highthroughput data store, the only viable option is to use a NoSQL key/value store, and to work around the lack of ACID transactional guarantees and the lack of conveniences like secondary indexes, SQL, and so on. When we sought a replacement for Google’s MySQL data store for the AdWords product, that option was simply not feasible: the complexity of dealing with a non-ACID data store in every part of our business logic would be too great, and there was simply no way our business could function without SQL queries.

Instead of going NoSQL, we built F1, a distributed relational database system that combines high availability, the throughput and scalability of NoSQL systems, and the functionality, usability and consistency of traditional relational databases, including ACID transactions and SQL queries.

This new generation of database technology might be called NewSQL
— critical features brought into the world of big data via new engines.

Google was not the only firm that realised the features of conventional database engines were indispensable for any real business. Bloomberg has been engineering its own database engines for decades, a fact I was not aware of until recently but which did not come as a great surprise. Their ComDB2 database is now not only described in an excellent paper, but Bloomberg has even open sourced it. ComDB2 can use many machines to provide massive read throughput, and although it doesn’t scale to the same same extent as F1, it is clearly sufficient for even the most demanding use cases in the financial industry.

With that history lesson over … how is this relevant to Corda?

Corda and SQL

Corda nodes are backed by a relational database. The open source node uses the H2 embedded SQL engine, but the design allows for more or less any database that has a JDBC adapter to be used. Expect to see support for other database engines in future (helpfully, Bloomberg’s ComDB2 comes with such an adapter out of the box).

It’s worth a quick review of what other blockchain/distributed ledger platforms have chosen in this space:

  • Ethereum: each instantiated smart contract gets a key/value store. There is no query language as far as I’m aware.
  • IBM Fabric: you can choose between LevelDB (a key/value store from Google, based on parts of BigTable’s design) or CouchDB, which is a JSON document store. There is no query language, instead you have to write small MapReductions in JavaScript which then process the entire dataset each time. As both CouchDB and LevelDB are schemaless, the format of data on the ledger are unmanaged at the data layer.
  • Bitcoin: provides no general purpose database or query system, as it’s designed only to handle a single asset.
  • Quorum: based on Ethereum and provides the same query features.

As you can see, we’re the only ones who went for a relational data model. The decision was driven by the following factors:

  1. SQL is necessary. As Google put it bluntly, “features like ad-hoc query are not nice to have, they are essential for our business”. If they’re essential to Google’s business, they’re even more essential to the kinds of businesses Corda is designed for — financial firms where the ability to rapidly query, analyse and spreadsheet data is a daily need. SQL is widely known by business analysts and can answer complex business questions in seconds. “Just file a ticket for a developer to write a MapReduce” is not an acceptable substitute, and we anticipate that proofs-of-concept built on platforms that don’t support relational mapping will hit a wall once they try to reach production.
  2. SQL is the future. The software industry has an established pattern of following where a few big players go, and they are investing massively in SQL. Not only Google of course but the biggest players in finance. Our bet is already paying off thanks to Bloomberg’s decision to open source ComDB2 — I expect we will be experimenting with adding support to Corda at some point in the future.
  3. Many datasets do not require NewSQL databases. BigTable was designed in 2004 for the hardware of that era. The paper describes two tables used in Analytics at that time — a raw click table of 200 terabytes and a summary table of 20 terabytes. But even in 2005 it was possible to run 100 terabyte databases on Oracle, and since then SSDs, CPU core counts, RAM have all gone way up. Open source databases like PostgreSQL have improved dramatically and tens of terabytes on a single machine are no longer considered exotic or unusual.Consider that for about $10,000 you can buy a terabyte of RAM. If your application has a billion users, that yields around a kilobyte of RAM for every user … which yields truly awesome IO throughput. However, no bank even has a billion users to begin with! Especially in finance, datasets often have a hot “tip” where e.g. current trading is being recorded, with data quickly cooling off and being accessed only rarely. If you can use traditional relational databases by simply buying a big machine, it’s a really good idea to do so — you will save a lot of maintenance headaches that come with running distributed systems (and I should know, as I remember my lost evenings and weekends debugging large BigTable clusters very well!).In practice, for many use cases being examined for distributed ledgers, they can easily fit inside single well-specced servers.

Corda’s support for SQL goes pretty deep:

  • The node itself stores its working data in the backing database. Thus:
  • Backing up or replicating the database also backs up or replicates the node.
  • Replication of node data between datacenters is largely a matter of configuring your database engine, a task that financial institutions have huge expertise in.
  • Like in Google’s cutting edge F1 database, the Corda distributed ledger can store arbitrary data trees in states. JPA annotations are then used to enable mappings to the relational model. You aren’t restricted to the columnal approach. This is useful — in their F1 paper Google repeatedly emphasises how important and useful they found their equivalent ability (protobuf embedding).
  • Using that facility, the node automatically makes read-only relational mirrors of the parts of the global ledger you’re authorised to see. It happens automatically and app developers don’t need to take extra steps beyond annotating their schemas.
  • You can define multiple schemas for each state in a CorDapp, meaning the app can be upgraded on a different schedule to the und
    erlying schema changes. This fits well with the schema change process used by many banks.
  • You can issue SQL queries from the Corda shell.
  • When using the open source node with the embedded H2 database, you can open a web console from the DemoBench tool to interactively explore and work with the database layer.
  • By creating your own tables in the same database the node uses, you gain the ability to join private datasets like customer notes with ledger data — SQL JOINs are an exceptionally powerful tool to make keeping private data off ledger easy and straightforward. It’s a key part of our story around making sure data only goes where it needs to go.
  • The node sends change notifications and deltas to connected clients over RPC (for some types of query), enabling apps that react instantly to changes in the ledger. This is a feature often lacking in other platforms.

We believe this approach is sufficiently compelling that other ledger platforms will produce SQL adapter layers in time, and we’ll welcome that development. If multiple nodes on different ledger platforms can be made to share the same underlying database engine, then you can solve some interop issues by performing SQL JOINs between the different ledger tables.