Scale, resilience and performance have always been the easy-to-explain but hard-to-deliver requirements for systems handling large volumes of data, and all too often they can only be achieved with expensive and unwieldy technology. However, a new type of database for handling 'Big Data' has emerged that promises practically unlimited scale, resilience and performance with little administration. The “NoSQL” databases offered through the big cloud platforms, such as Amazon SimpleDB and Google BigTable, offer a new technology for managing, storing and retrieving huge data volumes in support of your applications, potentially at a fraction of the cost of traditional options.
As the name suggests, these databases are not like traditional tools such as Oracle or SQL Server. The touted benefits of using them are the ability to scale your data storage and access requirements without worrying about clustering, licensing, or even much in the way of administration tasks. As an example, a recent application we built took on 7 million records and 500 daily users in just four months, a rate in excess of initial projections, with no database administration required whatsoever.
Despite the excited claims of 'Big Data' evangelists, actually reaping these benefits is not as simple as you might believe. Modern relational databases ship with superb support for important concerns such as data integrity and efficient storage that simply are not offered in NoSQL platforms, and common, well-understood design techniques sometimes work against you in the new world.
So what considerations apply if the benefits of this Big Data world appeal to you? We’ve drawn on our experience of building applications on these new platforms to offer a few simple guidelines for you if you’re considering taking the step into the new cloud world:
Don’t underestimate the effort of transition. NoSQL is not intrinsically difficult to understand or hard to implement, but it is different. It takes time to learn what works, what doesn’t, and which design patterns are best for your needs. This is still a relatively new field, and good design patterns cannot simply be copied from the relational data world. So you must plan for your team to spend time learning and throwing away work.
Duplication is good, not bad. The single biggest conceptual hurdle is throwing away the rulebook that says denormalised data is to be avoided. In many cases your application will perform better, be easier to maintain and be far more efficient in terms of processing if data is duplicated, because this reduces the complexity of retrieving a single master record and then computing variants from it.
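As a rough illustration of the duplication described above, the sketch below copies customer details into each order so that an order can be displayed with a single read. The two dictionaries are a hypothetical in-memory stand-in for a key-value store, and every name here (customers, orders, save_order and so on) is invented for the example rather than taken from any real platform.

```python
# Hypothetical in-memory stand-in for a key-value NoSQL store.
customers = {}  # customer_id -> customer record
orders = {}     # order_id -> order record, with customer fields copied in


def save_customer(customer_id, name, city):
    customers[customer_id] = {"name": name, "city": city}


def save_order(order_id, customer_id, total):
    # Copy the customer fields needed for display into the order itself,
    # so that reading an order never needs a second lookup.
    customer = customers[customer_id]
    orders[order_id] = {
        "customer_id": customer_id,
        "customer_name": customer["name"],
        "customer_city": customer["city"],
        "total": total,
    }


def rename_customer(customer_id, new_name):
    # The price of duplication: the application must update every copy itself.
    customers[customer_id]["name"] = new_name
    for order in orders.values():
        if order["customer_id"] == customer_id:
            order["customer_name"] = new_name


def order_summary(order_id):
    # A single read, with no join against the customer record.
    order = orders[order_id]
    return f"{order['customer_name']} ({order['customer_city']}): {order['total']:.2f}"
```

The trade-off is visible in rename_customer: reads become simple and cheap, while writes that touch duplicated fields have to find and update each copy.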
Handle your data with care. NoSQL-style databases do not offer type safety and record integrity. Your application must provide this instead. As a result, if you start changing the way your application handles data, you may find yourself running into problems that can only be spotted at run time. For this reason, the integrity and suitability of the reference data must be managed even more carefully than normal during version upgrades and throughout the testing and release process, to avoid your users finding the errors your testers didn’t. Although high resilience means the databases are unlikely to become unavailable, rolling back to previous versions of the data is likely to be something you have to handle manually.
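One way an application might take on the type and integrity checks described above is to validate each record before it is written and to tag it with a schema version so that older records can be upgraded when read. This is a minimal sketch under stated assumptions: the field names, the version numbers and the idea that version 1 stored age as a string are all hypothetical.

```python
SCHEMA_VERSION = 2

# Hypothetical field definitions: field name -> expected Python type.
REQUIRED_FIELDS = {"user_id": str, "email": str, "age": int}


def validate_record(record):
    """Return a list of problems; an empty list means the record is acceptable."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            actual = type(record[field]).__name__
            problems.append(f"{field} should be {expected_type.__name__}, got {actual}")
    return problems


def prepare_for_write(record):
    """Reject invalid records and stamp valid ones with the current schema version."""
    problems = validate_record(record)
    if problems:
        raise ValueError("; ".join(problems))
    return {**record, "schema_version": SCHEMA_VERSION}


def upgrade_on_read(stored):
    """Upgrade an old record on read. Assumes version 1 stored age as a string."""
    if stored.get("schema_version", 1) == 1:
        stored = {**stored, "age": int(stored["age"]), "schema_version": SCHEMA_VERSION}
    return stored
```

Running checks like these in automated tests against a copy of real reference data is one way to catch mismatches before release rather than at run time.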
Consider the new cost drivers. The pay-as-you-go model for processing and storage means that initial charges are likely to be far lower than setting up your own infrastructure, particularly where demand is unknown. However, estimating the processing power required and the longer-term storage costs is essential in order to model the total cost of ownership of the solution, and may require trading certain business requirements against longer-term operational costs, for example how often to allow photos to be uploaded and stored.
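The kind of estimate described above can start as a very small model. The sketch below projects monthly charges when stored data only ever grows, as with uploaded photos that are never deleted. The function name, its parameters and the prices in the usage example are illustrative assumptions, not any provider's actual tariff.

```python
def projected_costs(months, uploads_per_month, mb_per_upload,
                    price_per_gb_month, price_per_million_requests,
                    requests_per_month):
    """Return the estimated charge for each month, assuming nothing is ever deleted."""
    costs = []
    stored_gb = 0.0
    for _ in range(months):
        # Storage accumulates: each month's uploads are added to what is already held.
        stored_gb += uploads_per_month * mb_per_upload / 1024
        storage_charge = stored_gb * price_per_gb_month
        request_charge = requests_per_month / 1_000_000 * price_per_million_requests
        costs.append(storage_charge + request_charge)
    return costs


# Usage with made-up prices: compare two upload policies over three years.
generous = sum(projected_costs(36, 50_000, 4, 0.10, 1.00, 5_000_000))
restricted = sum(projected_costs(36, 10_000, 4, 0.10, 1.00, 5_000_000))
```

Even a crude model like this shows why storage tends to dominate over time: the request charge stays flat while the storage charge rises every month, which is where a business rule such as an upload limit starts to matter.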
To explore how PA can provide more insight on developing NoSQL-based cloud applications, please contact us now.