SmoothSpan Blog

For Executives, Entrepreneurs, and other Digerati who need to know about SaaS and Web 2.0.

Biggest Post Ever Redux: NoSQL as a More Flexible Solution?

Posted by Bob Warfield on July 23, 2011

Thanks to Reddit, Hacker News, and a host of other sources, my post on NoSQL being a Premature Optimization just became the biggest post ever for SmoothSpan Blog. Thanks to all for reading!

I’m actually surprised at how little argument the post has gotten.  The best comeback has been that NoSQL is not just about scaling.  You can see some of that sort of response in the comments for the original post.

The “it’s not about scaling” argument boils down to it being easier to model some kinds of problems with NoSQL than with relational because the tools and model are more flexible. To this, I can only respond, “yeah maybe, but was modeling really the hard part of what you’re doing?”

I’ve modeled a lot of things in relational. Some of them were very arbitrary and had little to do with the hardcore relational way of thinking. Come to think of it, most were pretty arbitrary. More than one commenter suggests that the existence of object relational mapping layers is a clear indication of how painful relational can be. But it sure doesn’t feel that way if you’ve done a lot of it. It seems like the usual sort of shoehorning of some arbitrary notion into a data structure that we deal with all the time in Computer Science. There are lots of good proven tools for it. I’ve built mapping layers too, and even that wasn’t all that hard. Adrian Cockcroft from Netflix left one of the very first comments and suggested it was hard to beat the productivity of Ruby on Rails with MySQL for a small team. That’s a case where the mapping layer became integral to the fabric of the framework, and one I’d love to see happen more often given how fundamental persistent storage is to a lot of problems. One could even argue that the fundamental thing that sets Ruby on Rails apart was making persistence a first-class problem they wanted to solve up front. Maybe there is another Ruby on Rails success story just waiting for a NoSQL tool to get crossed with some up-and-coming dynamic language.

Go back to my original post on NoSQL and go through some of the Netflix materials. Some of the problems they had to solve in NoSQL are modeling problems too, BTW. The difference is that the warts and edge cases for relational are pretty well understood by now. You don’t have to invent your own solutions (as the Netflix people did for things like NULL handling); there are 6 or 8 to choose from, just waiting to be Googled.
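
To make the NULL point a little more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not Netflix's code or any particular NoSQL product's API: the `members` table, the `nickname` field, and the `nickname_state` helper are all hypothetical, and plain dicts stand in for schemaless records. The relational side gets NULL semantics from the database for free; the schemaless side has to decide in application code what a missing field means.

```python
import sqlite3

# Relational side: the schema guarantees the column exists on every row,
# and NULL is the well-understood marker for "no value".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT NOT NULL, nickname TEXT)"
)
conn.executemany(
    "INSERT INTO members (id, name, nickname) VALUES (?, ?, ?)",
    [(1, "Ann", "annie"), (2, "Raj", None)],
)
rows_without_nickname = conn.execute(
    "SELECT id FROM members WHERE nickname IS NULL ORDER BY id"
).fetchall()

# Schemaless side: plain dicts stand in for stored records (an assumption
# made for illustration). A field can be explicitly empty or simply absent,
# and the application has to pick a convention for telling them apart.
records = [
    {"id": 1, "name": "Ann", "nickname": "annie"},
    {"id": 2, "name": "Raj", "nickname": None},  # written as explicitly empty
    {"id": 3, "name": "Lee"},                    # field never written at all
]


def nickname_state(record):
    """Hypothetical helper: classify the nickname field of one record."""
    if "nickname" not in record:
        return "missing"
    if record["nickname"] is None:
        return "null"
    return "present"


states = {record["id"]: nickname_state(record) for record in records}
```

The helper is trivial, but it is a convention somebody on the team has to invent, document, and apply everywhere, which is the kind of wart the relational world settled long ago.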

But this all ignores my question about whether modeling was really the hard part. I don’t think it is, though developers love to think about the up front “minimum best fit to their design vision” as the hard part. Having been through 6 startups now, I can say the hard part is all the stuff that isn’t written down. It’s the problems that pop up when things just don’t work, don’t work as expected, stop working, work too slowly, and generally just piss you off for no good reason you could have predicted in advance. They pop up in the middle of the night, late in the project, after customers get hold of the software, and when there is no turning back. They show up in spades when you hire new people and don’t have time to train them, so they just have to figure it out on their own. Such problems extend far beyond mere development of a prototype and will show up in the ops and day-to-day care and feeding of a successful system. They are problems born of a lack of maturity. They get gradually burnished away over time in mature toolsets as bugs are fixed, experience spreads, and patterns and know-how are disseminated to the community. NoSQL (let alone NewSQL) is just not old enough to be there yet relative to relational. Give it a minimum of 5 years and more likely 10. Companies like Netflix are helping make it happen as we speak. When there are 10 Netflixes that have all built big projects that are wildly different and in total involved over 1000 developers, then we’ll have a start on it.

Meanwhile, you have a startup or a project that needs doing.  If you have a cadre who have already been through NoSQL in a prior startup or project, they may have the experience and scar tissue to make an informed decision about it.  They represent a localized burnishing of the worst problems away.  If you haven’t ever done more than read articles and tinker with toy projects, why would you risk your important project playing around with this new technology?  What do you hope to gain when there are proven solutions already at hand?  Do you expect the Silver Bullet that will magically cut your development time in half?  Do you think your startup or project is so easy you have the luxury to experiment with additional risk?

Interesting.

7 Responses to “Biggest Post Ever Redux: NoSQL as a More Flexible Solution?”

  1. Ugh, more FUD.

    “If you have a cadre who have already been through NoSQL in a prior startup or project, they may have the experience and scar tissue to make an informed decision about it.”

    Why couldn’t this statement read:

    “If you have a cadre who have already been through MySQL in a prior startup or project, they may have the experience and scar tissue to make an informed decision about it.”?

    There are plenty of people out there who’ve been burned by poor MySQL schemas and designs and engineers who are guilty of “shoehorning some arbitrary notion into a data structure” when it wasn’t appropriate. That’s WHY there are tools to debug poor schemas for MySQL and Postgres – the databases allow this bad behavior in the first place instead of solving the real problems the engineers were running into (i.e. storing unstructured data, constantly changing schemas, having dynamic views, many-master replication, sharding, etc.)

    After reading both of these blog entries, I think your real argument is focused on the costs of inventing good schemas and database structures as they relate to running a company overall. I think your real conclusion should be that sometimes it’s OK to take shortcuts to get your product out – it’s risk analysis and mitigation in running a business. The point that appears to be understated is how much cost there might be if, later on, you need to move away from your initial decision. Of course it’s easy to cite Netflix as being able to move over with a nice cozy cushion of nine months and a bunch of engineers doing it piece by piece – but what about the companies who started on MySQL and were successful but were unable to move off it, and it killed them? Anyone remember Friendster and how completely unresponsive their service became? It got to the point where people stopped using it. See http://www.procata.com/blog/archives/2004/07/14/friendster-wrapup-does-mysql-scale/ if you need a reminder. Had they started on NoSQL today with the same offering and services it might have been different. (Yes, in 2004 NoSQL wasn’t a reliable option – but that’s quickly changing, hence the point of analyzing the right tool for the job.)

    Your argument has some valid points but you really need to do a better analysis of the pros/cons of these technologies before making such large, over-reaching assumptions. I really don’t care how many startups you’ve done – that probably gives you lots of insight into go-to-market strategies and how quickly they must be achieved, but these posts appear to be trying to make a technical and labor-expense case for why NoSQL is bad and it’s simply not properly founded on a full picture of all the decisions one must factor in when deciding on an appropriate database technology for his/her problem.

    • Darren, that’s an easy answer:

      Even if I don’t have a cadre familiar with MySQL, I know I can hire one easily. I know I can find a large number of supporting tools and ecosystems to help out. I know there are consultancies available. At my last startup, for example, we got access to the same setup configuration for MySQL that Facebook uses. Saved us a whole ton of time and it was cheap.

      The same is not true for the other tools. That’s the point of the maturity argument.

      Would Friendster have been different? Hard to say. Maybe they would simply have choked on other problems. It’s a straw man at best because we didn’t get to see them try the other option and we don’t know the full details of why they failed on the option they did try.

      Cheers,

      BW

  2. Bob, interesting post indeed… I can certainly see your perspective on tool maturity. We have started down the journey of NoSQL – in fact, we have started the journey with a completely new stack (Play!/Morphia/Mongo), with a migration approach from our older proprietary toolset (ASP.Net/BizBlox/MSSQL).

    This migration is a big change for us – not just NoSQL, but also true MVC as well as an aspect-oriented framework – huge.

    We used it in a trial by fire on one of our largest projects to date and, in my work with NoSQL to this point, I think it was successful. I think your point around tooling and maturity of strategies is a fair one – had we not had a veeery solid architect running our first project we would have been in a world of pain.

    That said, I think there are some major benefits to NoSQL as well as just some plain old different ways of thinking. Firstly, we stopped using queries on SQL back in 2003 when we started using an ORM (BizBlox) and it has meant that we stopped thinking in terms of the database as a real ‘layer’ and it was more integrated in the code tier (as any good ORM should do) – NoSQL takes this a step further and introduces a code-smell of ‘it just might not be there’ which is what I think you were referring to with your ‘Netflix null’ issue. It’s a whole different way of thinking.

    This means that the issues are not just around the maturity of tooling and tech – it’s about the prevalence of thought processes. Devs ‘think’ the way SQL/Relational platforms work because they were brought up with them – even your oldest vet will have cut their teeth on SQL.

    Honestly, when I was introduced to the idea of denormalization and performance-tradeoff based consistency checks to work with a document store like Mongo, I felt like all the reality I ever knew was changed and I had swallowed the red pill to see how deep the rabbit hole went…

    Bottom line – if you are doing this for ‘optimization’ or thinking about ‘migrating a MySQL’ project to a NoSQL solution you’re on crack. Your developers will flip their shit and the migration penalty will near on kill your project. You cannot take this on thinking that your data layer and code layer are not interconnected when you use NoSQL – it has a fundamentally different ‘code smell’ that often sits at odds with a SQL based approach. You need to think NoSQL as an architectural approach – which then provides you some wonderful benefits such as massive horizontal scalability, great performance, fast queries, built-in map/reduce, native JSON etc – but your code will be very different, your way of thinking about data will be very different and that means you take the journey from the beginning with your eyes open knowing that this will be the case, not just because one day you think you are going to need scale – otherwise I hope you have a bunch of very open minded software engineers.

  3. Theo said

    This post is more nuanced than the previous, but it would be much more interesting if you, instead of talking about “NoSQL” as one single idea with one single motivation (“scaling”) explained why the relational model is as superior as you seem to claim it is. Why is it a good thing to cram your object model into something that so obviously wasn’t designed for it, but premature optimization to use a tool that was?

    Assuming you are not faced with scaling problems already, I do agree that if your motivation for choosing one database solution over another is purely for its scaling properties then you are doing premature optimization. However, that this automatically means that you should use a relational database seems like a logical fallacy to me.

    Relational databases are good at tabular data, so if you have tabular data, use them. If you have a graph-like object model, use a graph database. If your objects look like objects, use a document database. If you don’t have any hierarchy or relations, use a key-value store and gain some benefits from its simplicity. Model your data according to how you will use it, not how you will store it.

    Arguing that relational databases are always the right choice, as you seem to do, seems to me equivalent to choosing how data will be stored on disk before you’ve started thinking about the problems you will have to solve to build your app. And that is, frankly, stupid.

  4. Theo said

    Btw, your argument that it’s not so hard to cram objects into relational tables once you’ve done it a few times is just luddite. I mean, isn’t it surprising how few people write applications in assembler these days?

    • What’s surprising is that I and many others would continue not to feel the pain so much from ORM. The same can’t be said for assembler. Maybe that’s telling you there isn’t as much pain there as you thought? Seriously, I don’t remember a project where we were huddled around the problem of all the pain the ORM was causing us and that includes a project where we had to build our own mapper as part of the deliverable. There were always bigger fish to fry elsewhere.

      • Theo said

        I’m sure the assembler programmers would disagree with you, just as you disagree with me. “It’s always worked for us in the past” is just not a very convincing argument.
