Corda integration options in 2020
December 07, 2020
Since launching Cordaptor the Bond180 team have had a number of conversations with teams interested in exploring the potential of the technology. What became obvious to us during this time is that engineers are confused about different integration options available for CorDapps, and their relative merits, so they are having a hard time choosing the right one for their architectures.
In the early days of our Corda adoption journey, we spent days and weeks researching available options. After this period we decided to create our own bespoke integration middleware, which we subsequently built out and released as an open source project known as Cordaptor. However, as the number of options in the Corda ecosystem increases, not everyone will have time to evaluate them all. Besides, frankly speaking, when you are under enormous pressure to get the MVP out of the door ASAP, it’s tempting to simply take the first one that seems to work. As a result, suboptimal decisions are made which can hurt a lot, even in the short run, because teams are either forced to work with unfamiliar technologies or experience the drag of having too many layers in their architectures for no added benefit.
I decided to write a sort of compendium of the different options available to development teams facing the challenge of integrating Corda into their stacks and architectures. I will try to be objective and share the insights of our own journey, so hopefully, this will not be seen as an excuse to plug Cordaptor in (pardon the pun).
Before I go into the options, I want to briefly clarify what I mean by integration. Our friend Adam Dry of Ivno wrote extensively about integrating Corda into enterprise IT estates. This is an important part of the equation, especially for end-user organizations participating in consortia like Marco Polo, B3i, and Ivno of course. However, an equally important challenge is faced by teams setting out to build solutions where Corda is one part of the whole. They are looking for ways to communicate with a node in a reliable and flexible way, without incurring significant overhead or abandoning their familiar tools. In the remainder of this post, I will try to give both perspectives equal weight.
Corda RPC
The RPC is the grandfather of Corda integration options. Developed by R3 in the early days of Corda, it is still the only officially supported connectivity option, although it seems Corda 5 is likely to change that at some point in 2021.
Corda RPC is a binary protocol that relies on a custom serialization framework within Corda, which in turn is tightly intertwined with Corda’s AMQP implementation. Under the hood, it uses an AMQP broker (Apache Artemis as of Corda 4) to deliver protocol messages, but to the client application code it appears as a collection of Java objects. Corda RPC does a great job of exposing a lot of the functionality of the Corda core in a transparent way, so that it almost seems as if your code is running inside the node itself. For example, you can wait on a Future to obtain the outcome of a flow, or subscribe to an Observable representing a feed of vault updates, and the RPC will transparently deal with the fact that there is a network in between, including graceful reconnection and support for active-passive deployments. Well, almost transparently.
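To illustrate the "almost" part: because there is a network behind that Future, client code usually wants to bound how long it waits. Below is a minimal sketch using only the JDK. FlowResults and await are hypothetical names of my own, not part of the Corda API, and the sketch assumes the flow handle gives you something that implements java.util.concurrent.Future.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper, not part of the Corda API. */
final class FlowResults {
    private FlowResults() {}

    /**
     * Waits up to maxWait for the future to complete.
     * Returns Optional.empty() on timeout, so the caller can decide whether
     * to keep waiting or give up. A flow that legitimately returns null
     * would also show up as empty in this simplified version.
     */
    static <T> Optional<T> await(Future<T> future, Duration maxWait) {
        try {
            return Optional.ofNullable(future.get(maxWait.toMillis(), TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            return Optional.empty();
        } catch (InterruptedException e) {
            // Restore the interrupt flag before surfacing the problem.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the flow result", e);
        } catch (ExecutionException e) {
            // The flow itself failed; unwrap the underlying cause.
            throw new IllegalStateException("Flow failed", e.getCause());
        }
    }
}
```

An empty result here only means "not finished yet"; the flow may still complete on the node, so the caller should not assume it was cancelled.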
The key downside of Corda RPC is the fact that there is no wire protocol specification, and probably there will never be one. Consequently, it requires a Java client library, which predisposes you to use the JVM at least for the code that is in immediate contact with the Corda node. If you are running a full-stack JavaScript or .NET shop, you are out of luck. Another consideration is that RPC requires your CorDapp classes to be present on the classpath on the client side, which potentially makes CorDapp version upgrades interesting. On the architectural side, the Corda RPC client library is ‘dumb’ in the sense that all it does is relay your queries and instructions to the node. So, if you want a level of caching for objects obtained from the node, you would need to write your own solution. Finally, forget about using enterprise integration packages like the ones from Mulesoft, Microsoft, or Informatica for Corda RPC: you will always need some bespoke integration code.
The main benefit of using Corda RPC is simplicity. If Java is your stack, and you are creating a middle-tier component for other reasons anyway, connectivity comes almost for free, with no other dependencies. The RPC API is well documented, rarely throws any surprises, and is quite pleasant to work with in general. It is also worth mentioning that R3 is going to support the API, and most likely it will stay backwards compatible for the foreseeable future.
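To give a feel for what "write your own solution" means for caching, here is a minimal read-through cache with a time-to-live, using only the JDK. Everything in it is hypothetical: TtlCache is my own name, and the loader function merely stands in for whatever RPC query you would otherwise send to the node on every call.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical read-through cache placed in front of calls to the node. */
final class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long loadedAtMillis;

        Entry(V value, long loadedAtMillis) {
            this.value = value;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stands in for a query sent to the node
    private final long ttlMillis;
    private final LongSupplier clock;    // injectable so expiry can be tested

    TtlCache(Function<K, V> loader, Duration ttl, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttl.toMillis();
        this.clock = clock;
    }

    /** Returns the cached value while it is fresh, otherwise reloads it. */
    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> cached = entries.get(key);
        if (cached != null && now - cached.loadedAtMillis < ttlMillis) {
            return cached.value;
        }
        // Concurrent callers may both reload here; acceptable for a sketch.
        V fresh = loader.apply(key);
        entries.put(key, new Entry<>(fresh, now));
        return fresh;
    }
}
```

In production you would pass System::currentTimeMillis as the clock. Even a toy like this raises the real design questions (how stale is acceptable, who invalidates entries when the vault changes), which is exactly the work the RPC client library leaves to you.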
Corda node database
Surprisingly few engineers consider this to be a viable option, but it’s actually official. A CorDapp can specify and manage a database schema for contract state classes (see QueryableState), and Corda will create the necessary tables in the node database and even evolve the table structure in line with upgrades.
This is an excellent option for pulling data out of the node for your data lake or data warehouse. The open-source version of Corda supports PostgreSQL, whereas Corda Enterprise adds even more options such as Oracle Database. There are perhaps more ETL tools out there supporting these databases than there are certified Corda engineers. You can pick your poison and choose between batch-oriented or real-time streaming options. Most databases now have excellent support for primary-secondary replication, which is instrumental for high availability. Another important benefit is that R3 seems to be committed to supporting this in future versions of Corda.
The key downside is, of course, the fact that it’s mostly for read access. You cannot initiate a Corda flow via its database, let alone manipulate a state, because that would violate the integrity of the ledger. There are workarounds, of course: for example, you can create an additional writeable non-ledger table, and then read instructions from it from within a Corda flow started by a scheduler.
If all you care about is having a nice dashboard on top of your node, this route is hard to beat. For example, take Microsoft Power BI, point it at a PostgreSQL instance with appropriately configured credentials, and you are good to go. However, for most applications requiring real-time tracking of the node ledger state, or complex flow initiation, this is somewhat cumbersome to manage.
Cordite Braid
Braid is an open source project that is part of Cordite, a collection of tools for Corda network developers. Braid has been around for a number of years and has a mature codebase, which for many makes it a default choice for cross-platform integrations with Corda.
Braid is designed to be deployed inside a Corda node and uses WebSockets to offer a real-time synchronous duplex communication channel. Braid aims to be analogous to Corda RPC, but unlike RPC it comes with a wire protocol, which can be used from any tech stack, not just Java. Under the hood Braid uses Vert.x, a high-performance non-blocking embedded web server able to scale to thousands of concurrent connections. In that sense, it is also comparable with Corda RPC. CorDapps need to be enhanced to support Braid by creating facade Corda services, the methods of which are exposed as operations available over WebSockets.
Unfortunately, the underlying protocol used by Braid is JSON-RPC, which is somewhat niche, and not many tools support it out of the box, so some hand coding would be required. Further, WebSockets are supported inconsistently by web browsers and, more crucially, corporate firewalls, so relying on a browser-based application using WebSockets to connect to a Braid service running in the Corda node may be risky.
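To give an idea of the hand coding involved, here is a minimal sketch that builds a request envelope in the generic JSON-RPC 2.0 format using only the JDK. This is an assumption-laden illustration: the exact message layout Braid expects may differ, and the JsonRpc class and the issueToken method name in the usage note are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that builds generic JSON-RPC 2.0 request envelopes. */
final class JsonRpc {
    private JsonRpc() {}

    /** Builds a request with positional string parameters. */
    static String request(long id, String method, List<String> params) {
        String args = params.stream().map(JsonRpc::quote).collect(Collectors.joining(","));
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id
                + ",\"method\":" + quote(method)
                + ",\"params\":[" + args + "]}";
    }

    /** Wraps a string in double quotes, escaping characters JSON does not allow raw. */
    static String quote(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.append('"').toString();
    }
}
```

Calling JsonRpc.request(1, "issueToken", Arrays.asList("GBP", "100")) produces {"jsonrpc":"2.0","id":1,"method":"issueToken","params":["GBP","100"]}. You would then send that text over the WebSocket and match responses back to requests by id, which is more plumbing you have to write yourself.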
Architecturally, Braid is not that different from Corda RPC and shares its limitations. To start with, there is no intelligence helping with caching: all JSON-RPC queries will be relayed to the node itself. Also, since Braid runs in the same process as the Corda node, if the node is down, so is the WebSocket connection. Whilst there are some API gateways supporting WebSockets, most notably Amazon API Gateway, few middleware options can help with these limitations. This, plus my earlier point about corporate firewalls, in my view makes Braid unsuitable for exposing Corda nodes directly to a browser-based UI, which in most cases necessitates a middle-tier component, negating the benefits of using WebSockets in the first place.
To be fair, Braid does seem to offer support for a REST API and what seems to be a standalone server mode, but it is not well documented. All I could gather is that it seems to be evolving towards a general purpose communication middleware platform, parting ways with Corda along the way.
If you are not bothered by these limitations, want to use a well-known tool for connecting to Corda, and Corda RPC is not good for you because, for example, most of your team know nothing about Java, then Braid is a viable choice. Just be certain you understand the implications for your architecture.
Web3j-corda
Not as widely known as Braid, but still an interesting library from Web3 Labs. Web3j-corda generates server-side and client code in Kotlin for contract states and workflows of a CorDapp by analysing the code.
Web3j-corda (I wish they had come up with a name that’s easier to type!) supports JAX-RS annotations to expose flows and contract classes, so a REST API along with a Swagger JSON file can be generated for the CorDapp. There is a web server included in the CorDapp to listen for HTTP requests. A client application written in Java can use generated Java bindings for the CorDapp annotations. Other tech stacks can roll out their own bindings using any tool that supports Swagger specifications. In my book, an HTTP API is a big win, because many tools can work with it out of the box, and even most corporate firewalls are not going to cause any issues.
Unfortunately, the documentation for web3j-corda seems patchy, and the library does not seem to be actively developed, so its future is uncertain, especially in the face of the Corda 5 release. I could not find any mention of a means to secure the web server connection, for example. Apparently, Web3 Labs are seeing limited uptake. As evidence, there are no questions about using web3j-corda on Stack Overflow. That aside, by now you already know I am not a big fan of connecting to a Corda node directly from a browser-based UI. Every query requires the Corda node to be up to answer it. With a REST API this could be mitigated to some extent by a smart reverse proxy, but it would rely on web3j-corda setting cache control headers correctly.
I am not sure whether to recommend web3j-corda to anyone at this stage, because of the uncertainty of its future. I am happy to revisit this if someone can enlighten me.
Cordaptor
Last, but not least, meet the new entrant to this space — Cordaptor by B180.tech. Full disclosure: I am the core contributor and the maintainer of the project.
Cordaptor was born out of frustration with the inability to find a suitable integration option for the CorDapp we were developing as part of the IAN product early in 2020.
In a nutshell, Cordaptor exposes a REST API for communicating with a Corda node, initiating flows, and querying its vault. The key difference from other options is that Cordaptor generates an OpenAPI specification dynamically based on the CorDapps available on the Corda node, and there is no specific programming model to follow or API to use. In fact, it uses Corda’s own CorDapp introspection logic, so anything that works over RPC should work over the REST API. Cordaptor also generates JSON Schema specifications and JSON serialization logic for all relevant classes, which plenty of tools understand and are able to use out of the box. Cordaptor can be deployed as a CorDapp itself, in which case it will scan other CorDapps available on the node and create REST APIs for them. This is literally a zero-configuration option: all that’s needed is to drop the Cordaptor embedded bundle JAR into the cordapps directory and restart the node. I think this works very well for development.
Where Cordaptor adds significant value is its standalone deployment. The REST API is identical to that of the embedded mode, but architecturally it offers a lot more. Standalone Cordaptor is a caching load-balancing gateway for Corda, which addresses a fundamental concern in Corda-based systems, i.e. the availability of a Corda node. Whichever integration technology you use (perhaps with the sole exception of reading from the underlying database), if the node is unavailable, client queries cannot be answered. The truth is that there are plenty of reasons for the node to become unavailable from time to time: from the need to complete (drain) all flows before shutting down for an upgrade, to intermittent outages caused by excessive load. Dealing with this requires some sort of intelligent proxy in the middle tier, which is tricky to do well and expensive to evolve. Cordaptor provides rich gateway functionality out of the box with any CorDapp, and will update the API in lock-step with changes in the CorDapp.
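To make the availability argument concrete, here is a minimal sketch of one thing such an intelligent proxy can do: serve the last known answer when the node cannot be reached. To be clear, this is not Cordaptor’s implementation. StaleOnFailureGateway is a hypothetical name, and the node function merely stands in for a real query that may fail.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical gateway that falls back to the last known answer when the node is down. */
final class StaleOnFailureGateway<K, V> {
    private final Map<K, V> lastKnown = new ConcurrentHashMap<>();
    private final Function<K, V> node; // stands in for a query sent to the Corda node

    StaleOnFailureGateway(Function<K, V> node) {
        this.node = node;
    }

    /** Returns a fresh answer if the node responds, otherwise the last one seen for this key. */
    V query(K key) {
        try {
            V fresh = node.apply(key);
            if (fresh != null) {
                lastKnown.put(key, fresh); // ConcurrentHashMap does not accept null values
            }
            return fresh;
        } catch (RuntimeException nodeUnavailable) {
            V stale = lastKnown.get(key);
            if (stale == null) {
                // Nothing cached for this key, so the failure has to surface.
                throw nodeUnavailable;
            }
            return stale;
        }
    }
}
```

A real gateway would also need to tell the caller that the answer is stale and decide how old is too old, which is part of what makes this tricky to do well.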
Cordaptor has limitations of course. It does not support WebSockets yet, which makes it harder to implement logic that requires push notifications. Instead, the client application code needs to poll to get flow completion results or wait for a blocking HTTP request. It also currently, as of version 0.1, offers limited caching capabilities. These features are on the roadmap, and hopefully, we will be able to progressively implement them. However, due to its open architecture, these features could be easily added by the adopting teams for the benefit of the broader community.
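The polling approach can be sketched as follows, using only the JDK. FlowPoller is a hypothetical name, and the check supplier stands in for an HTTP call to whatever status endpoint your deployment exposes; no specific Cordaptor URL is assumed here.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical client-side helper for polling until a flow result is available. */
final class FlowPoller {
    private FlowPoller() {}

    /**
     * Calls check up to maxAttempts times, sleeping for interval between attempts.
     * Returns the first non-empty result, or Optional.empty() if none arrived
     * or the thread was interrupted.
     */
    static <T> Optional<T> pollUntilDone(Supplier<Optional<T>> check, Duration interval, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> result = check.get();
            if (result.isPresent()) {
                return result;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(interval.toMillis());
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop polling.
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }
}
```

A fixed interval keeps the sketch short; a client polling many flows would probably want backoff so that it does not add to the load on the node.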
Conclusion
This was a whistle-stop tour of a number of different integration technologies that are available to developers building on Corda. All of them have their strengths and weaknesses and need to be chosen wisely.
There must be some other technologies I wasn’t aware of, and I would appreciate suggestions in the comments. Also, I suppose, there could be things I simply got wrong about some of the options above, so please do let me know if you think that’s the case too. I’m aiming for an objective appraisal, so let’s make it such together.
Igor Lobanov is a Principal Engineer at B180.tech, a wholly-owned business of Bond180 Limited focusing on providing world-class tools, technologies, and professional services for decentralized applications, decentralized finance, and innovative applications employed by traditional financial institutions. Please reach out via [email protected] for any enquiries.