Yesterday Microsoft announced a beta of SQL Server Data Services (SSDS), a cloud database. The features of SSDS provide a possible way to solve data portability and synchronization problems in Bible software.
(If terms like “cloud database” and “data portability” put you to sleep, then you should probably stop reading now. This post is fairly technical.)
At BibleTech08, Craig Rairdin from Laridian spoke about “synchronizing user-created data between platforms, readers, and vendors” (mp3 of his talk):
In the last ten years, Bible software users have moved from being 100% desk-bound to nearly 100% mobile. Unfortunately, mobile devices are significantly more disposable than desktop systems, and users move from device to device, platform to platform, and Bible software to Bible software. Through this they long to have portability of not just their libraries but their own annotations, highlights, cross-references, bookmarks, and any other user-created data their programs allow them to create.
In other words, Craig describes the digital analog of the classic problem when you buy a new print Bible: what do you do with all the notes, highlights, and underlines in your old Bible? Similarly, when you buy new hardware or software, what do you do with all your customizations? Your notes, highlights, history, saved reports, workspace settings, etc., should ideally come with you when you upgrade your computer or switch programs.
Unfortunately, each Bible software vendor stores this user data in different formats, depending on the needs of the program, and the vendor may or may not choose to export the data in an easily consumable format. Transferring data between programs, therefore, involves a lot of work—probably not as much work as copying out all the notes from your old print Bible, though.
(Naturally, vendors make it relatively easy to transfer your data when you upgrade to a newer version of their program or when you switch computers and need to reinstall your program. They could, however, simplify the process further.)
According to the SSDS overview (pdf), SSDS is a schemaless (no fixed schema), queryable, REST-accessible data store. Basically, you send it an HTTP request, and it sends you back an XML document containing the results of your query.
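As a rough illustration of that request/response cycle, here is a minimal Python sketch. The base URL, the `q` parameter, the query text, and the XML element names are all placeholders we made up for the example. None of them comes from Microsoft’s documentation, and the sample response is hard-coded rather than fetched over the network.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Hypothetical endpoint; the real SSDS URL scheme may look quite different.
BASE_URL = "https://example-authority.example.com/v1"


def build_query_url(container, query):
    """Return a GET URL that carries the query text in a `q` parameter."""
    path = urllib.parse.quote(container)
    params = urllib.parse.urlencode({"q": query})
    return f"{BASE_URL}/{path}?{params}"


# Made-up response body standing in for whatever XML the service returns.
SAMPLE_RESPONSE = """<EntitySet>
  <Note><Id>n1</Id><Verse>John 3:16</Verse><Text>Key verse</Text></Note>
  <Note><Id>n2</Id><Verse>Gen 1:1</Verse><Text>Creation</Text></Note>
</EntitySet>"""


def parse_entities(xml_text):
    """Turn each child of the root element into a flat dict of its properties."""
    root = ET.fromstring(xml_text)
    return [{prop.tag: (prop.text or "") for prop in entity} for entity in root]


url = build_query_url("alice", 'from e in entities where e["Verse"] == "John 3:16" select e')
notes = parse_entities(SAMPLE_RESPONSE)
```

In a real client, the string in `SAMPLE_RESPONSE` would be the body of the HTTP response to a GET on `url`.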
In many ways, SSDS is part of a trend in recent large-scale database development toward denormalization, key/value lookup, flexible schemas, and standards-based protocols. Amazon’s SimpleDB and CouchDB are both examples of this trend. The need for ORM schemes, like the one used in Ruby on Rails, shows that modern programming languages and relational databases aren’t exactly using the same playbook when it comes to data modeling. And scaling a database requires a good deal of specialized knowledge.
SSDS offers a potential solution to the problem of data portability and synchronization in Bible software: instead of only storing user data in a binary format on a local computer, also store it in the cloud so other programs and the user can access it when and how they want.
Let’s look at the SSDS data model and see how it applies to a user-data scenario. An SSDS database has four levels:
- Authority: a group of containers. As a software vendor, you’d probably have one authority per program.
- Container: a group of entities. Containers come with their own security model, so each container would correspond to a single user.
- Entity: a group of properties. Each entity would be a distinct piece of data, such as a user’s notes on a particular verse. The lack of a defined schema means that an entity can consist of only those properties that are relevant to the data. A highlight, for example, might record the start and end points and a timestamp, while a note would need to record the text of the note, as well as a MIME type.
- Property: a key-value pair. The raw data. The trick for data portability is that vendors have to agree on the terminology and format for various properties.
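The four levels above can be mirrored in a few lines of Python to show how a highlight and a note would sit side by side with different properties. This is only an in-memory sketch of the hierarchy, not an SSDS client, and every class, field, and property name in it is our own invention.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One piece of user data; `properties` is a free-form set of key-value pairs."""
    entity_id: str
    kind: str
    properties: dict = field(default_factory=dict)


@dataclass
class Container:
    """All entities belonging to a single user."""
    user: str
    entities: dict = field(default_factory=dict)

    def put(self, entity):
        self.entities[entity.entity_id] = entity


@dataclass
class Authority:
    """All containers belonging to one program."""
    program: str
    containers: dict = field(default_factory=dict)

    def container_for(self, user):
        # Create the user's container on first use, then reuse it.
        return self.containers.setdefault(user, Container(user))


authority = Authority("example-bible-program")
alice = authority.container_for("alice")

# No shared schema: each entity carries only the properties it needs.
alice.put(Entity("h1", "highlight", {
    "start": "John 3:16",
    "end": "John 3:17",
    "timestamp": "2008-03-06T12:00:00Z",
}))
alice.put(Entity("n1", "note", {
    "verse": "John 3:16",
    "text": "Compare with Romans 5:8.",
    "mime_type": "text/plain",
}))
```

For portability, the property names used here (`start`, `end`, `mime_type`, and so on) are exactly the kind of thing vendors would need to agree on.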
In this scenario, when the Bible software program starts up, it queries the SSDS database and looks for new data created by the user, integrating the data into the program as necessary. Then, when users write a note in their Bible software program, the program sends a write request to the SSDS database and records the note there. If the user doesn’t have an Internet connection at the moment, the program queues the data until a connection becomes available. (The talk by Craig from Laridian goes into a bunch of different synchronization cases.)
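The queue-until-connected behavior might look something like the following Python sketch. It covers only the simplest case (one device, writes only, no conflict handling), and the `SyncClient` class and its `send` and `is_online` callables are hypothetical stand-ins for whatever upload and connectivity checks a real program would use.

```python
from collections import deque


class SyncClient:
    """Keeps new records in a local queue and uploads them in order when possible."""

    def __init__(self, send, is_online):
        self._send = send            # callable that uploads one record
        self._is_online = is_online  # callable that returns True when connected
        self._pending = deque()

    def record(self, item):
        """Queue a new record, then try to upload everything that is pending."""
        self._pending.append(item)
        return self.flush()

    def flush(self):
        """Upload pending records oldest-first; return how many were sent."""
        sent = 0
        while self._pending and self._is_online():
            try:
                self._send(self._pending[0])
            except OSError:
                # Network failure mid-upload: keep the record and retry later.
                break
            self._pending.popleft()
            sent += 1
        return sent

    @property
    def pending_count(self):
        return len(self._pending)


# Usage with fake connectivity and a list standing in for the remote store.
state = {"online": False}
uploaded = []
client = SyncClient(uploaded.append, lambda: state["online"])

client.record({"verse": "John 3:16", "text": "Written offline"})
state["online"] = True
client.record({"verse": "Gen 1:1", "text": "Written online"})
```

A record is removed from the queue only after `send` returns without error, so a dropped connection leaves it in place for the next `flush`.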
Now, in theory, different programs—or the user—could query the database directly and read or write data as necessary. For example, someone might want to create a feed of recent Bible annotations and publish it on a blog. If SSDS produces an Atom feed natively, then it gets even easier to publish the data widely (should the user choose to do so).
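If the service did not produce Atom itself, a small script could build the feed from the stored annotations. Here is a minimal sketch using Python’s standard library; the annotation fields (`id`, `verse`, `text`, `updated`) are assumed names, and the timestamps are assumed to be UTC strings in one consistent format so that they sort correctly as text.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def _child(parent, name, text=None):
    """Append an Atom-namespaced child element and return it."""
    element = ET.SubElement(parent, f"{{{ATOM_NS}}}{name}")
    element.text = text
    return element


def annotations_to_atom(title, feed_id, author, annotations):
    """Build an Atom feed string from annotation dicts, newest first."""
    ET.register_namespace("", ATOM_NS)  # emit Atom as the default namespace
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    _child(feed, "title", title)
    _child(feed, "id", feed_id)
    _child(_child(feed, "author"), "name", author)
    newest = max((a["updated"] for a in annotations), default="1970-01-01T00:00:00Z")
    _child(feed, "updated", newest)
    for a in sorted(annotations, key=lambda a: a["updated"], reverse=True):
        entry = _child(feed, "entry")
        _child(entry, "title", a["verse"])
        _child(entry, "id", a["id"])
        _child(entry, "updated", a["updated"])
        _child(entry, "content", a["text"]).set("type", "text")
    return ET.tostring(feed, encoding="unicode")


feed_xml = annotations_to_atom(
    "Recent Bible annotations",
    "urn:example:annotations:alice",
    "alice",
    [
        {"id": "urn:example:n1", "verse": "John 3:16", "text": "Key verse",
         "updated": "2008-03-05T09:00:00Z"},
        {"id": "urn:example:n2", "verse": "Gen 1:1", "text": "Creation",
         "updated": "2008-03-06T09:00:00Z"},
    ],
)
```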
In practice, for write requests, you’d probably want to have a trusted and an untrusted container, or proxy the requests through a program that validates the data. OAuth provides a standard way to give programmatic access to data.
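A validating proxy could be as simple as a function that checks each incoming write before forwarding it. The sketch below shows one way that check might look; the allowed kinds, the required property names, and the length cap are all assumptions for the example rather than any agreed-upon standard.

```python
# Hypothetical rules: which properties each kind of entity must carry.
REQUIRED_PROPERTIES = {
    "note": {"verse": str, "text": str, "mime_type": str},
    "highlight": {"start": str, "end": str, "timestamp": str},
}
MAX_VALUE_LENGTH = 10_000  # arbitrary cap on any single string value


def validate_write(kind, properties):
    """Return a list of problems; an empty list means the write is acceptable."""
    schema = REQUIRED_PROPERTIES.get(kind)
    if schema is None:
        return [f"unknown kind: {kind!r}"]
    errors = []
    for name, expected_type in schema.items():
        if name not in properties:
            errors.append(f"missing property: {name}")
        elif not isinstance(properties[name], expected_type):
            errors.append(f"wrong type for property: {name}")
    for name, value in properties.items():
        if isinstance(value, str) and len(value) > MAX_VALUE_LENGTH:
            errors.append(f"value too long: {name}")
    return errors


ok = validate_write("note", {
    "verse": "John 3:16", "text": "Key verse", "mime_type": "text/plain",
})
bad = validate_write("note", {"verse": "John 3:16", "text": 42})
```

The proxy would forward the request to the data store only when the returned list is empty, and otherwise reject it with the listed problems.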
In the end, users have more control over their data in an environment that provides them more permanence than a local store. Users who have confidence that they won’t lose their data will hopefully be more likely to commit more data to their Bible software. And the more they use their Bible software, the more loyal they become (and the more money they’ll spend on upgrades in the future).
Data portability is a harder sell to Bible software vendors—after all, you’re giving people an easy way to move to different software. But you’re also letting people, in principle, integrate one activity—studying and annotating the Bible—with other activities in their lives. One example is a Facebook application that shows what Bible passages you’re reading that day. Another example is twittering a brief response to your Bible study. (Are these applications useful? Maybe not. But you can probably come up with some that are.)
The digerati are beginning to demand data portability in online applications; at some point, they’ll start demanding it in their offline applications, too. Will most users of Bible software also demand data portability? No, probably not right away. But they may start to do so if other programs and websites they use begin offering it.
We’re not suggesting that any Bible software vendors go out and start publishing their users’ data on SSDS—we only suggest that if they want data portability, synchronization, and users’ control of their data, then SSDS is a good fit.
Undoubtedly, we haven’t thought through a lot of problems related to this approach, but we thought we’d publish this idea and let others take it further if they want.
Our websites run on LAMP rather than on Microsoft technology, and the beauty of SSDS’s REST approach is that we don’t need to care what’s running under SSDS’s hood. (CouchDB, for example, runs on Erlang, which is cool but not the most mainstream language.) We’ve been considering using (and have been testing) Amazon’s SimpleDB for some upcoming applications, but SSDS overcomes a number of present limitations in SimpleDB (notably the 1,024-character limit, the need for massive concurrency to retrieve many items in a reasonable amount of time, and the lack of a numeric data type). The main questions for us about SSDS revolve around price and what limitations SSDS has that the materials released so far don’t mention. We’ve requested access to SSDS, so hopefully we’ll learn some specifics firsthand sometime soon.
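To show what the missing numeric type means in practice: when every value is stored and compared as a string, "9" sorts after "10". A common workaround is to zero-pad numbers to a fixed width (and add an offset if negative values are possible) before storing them. The helper names, the width, and the offset below are our own choices for the sketch.

```python
def encode_number(value, width=10, offset=0):
    """Encode an integer as a fixed-width string whose text order matches numeric order."""
    shifted = value + offset
    if shifted < 0:
        raise ValueError("value + offset must not be negative")
    text = str(shifted)
    if len(text) > width:
        raise ValueError("value does not fit in the chosen width")
    return text.zfill(width)


def decode_number(text, offset=0):
    """Reverse encode_number."""
    return int(text) - offset


# Plain strings compare character by character, so "9" > "10".
naive_order = sorted(["9", "10", "100"])

# Padded strings sort the same way the numbers do.
padded_order = sorted(encode_number(n) for n in [9, 10, 100])

# An offset keeps negative numbers in order as well.
with_negatives = sorted(encode_number(n, offset=1000) for n in [-5, 3, -20])
```

Every reader and writer of the data has to use the same width and offset, which is one more thing vendors sharing a data store would need to agree on.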
If Microsoft can make good on the promise of SSDS, they’ll have made a huge contribution to the notion of cloud computing. At the very least, it will definitely be a good thing for Amazon and Microsoft to compete in this area; competition should result in better products from both of them.
Comments are temporarily open on this post; we may not publish every comment, but we’ll definitely read all of them and publish the constructive ones.
Appendix: Comparison of SSDS, SimpleDB, and CouchDB
| | SSDS | SimpleDB | CouchDB |
|---|---|---|---|
| | REST (the two examples published by Microsoft point to a true REST approach) or SOAP; XML output (though Microsoft also mentions Atom); JSON is possible in principle, though Microsoft has made no promises | REST (technically, if not in spirit) or SOAP; XML output | REST; JSON output |
| | LINQ; returns only complete entities | Proprietary Amazon language; returns item names; retrieving attributes requires one additional query per item | |
| | | Eventual consistency; new data may take several seconds to be available | Eventual consistency; views are fast |
| Smallest Updatable Unit | | | Document (group of name/value pairs) |
| | Only in private beta; no benchmarks; scant information about other possible limitations (which undoubtedly exist) | Each attribute value can only be 1,024 bytes; no sorting; no types (hacks needed to query numbers); lots of parallel requests needed to fetch items | Currently in alpha; you manage the servers it runs on, unlike SSDS and SimpleDB; you can only update complete documents |
| | Unknown; Microsoft is targeting small- and medium-sized businesses | Increases with usage and data; starts small | Free (open source) |
| Use as Primary Data Source? | Latency issues (unless application server is in the same data center) make it infeasible if you need to make lots of queries in realtime. Microsoft reportedly plans to sell self-hosted editions of SSDS. | Could work if combined with EC2 to minimize latency | Too early for production |