Bigtable-inspired open source projects take different routes to the highly scalable, highly flexible, distributed, wide column data store
Stories by Rick Grehan
Apache HBase describes itself as "the Hadoop database," which can be a bit confusing, as Hadoop is typically understood to refer to the popular MapReduce processing framework. But Hadoop is really an umbrella name for an entire ecosystem of technologies, some of which HBase uses to create a distributed, column-oriented database built on the same principles as Google's Bigtable. HBase does not use Hadoop's MapReduce capabilities directly, though HBase can integrate with Hadoop to serve as a source or destination of MapReduce jobs.
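To make "column-oriented database built on the same principles as Google's Bigtable" a little more concrete, here is a minimal in-memory sketch of the wide column data model: each row key maps to a sparse set of "family:qualifier" columns, and each column keeps timestamped versions. The class and method names (WideColumnTable, put, get, scan) are hypothetical and chosen for illustration; this is not HBase's client API, and it ignores distribution and persistence entirely.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy model of a Bigtable-style table (illustrative only, not HBase's API)."""

    def __init__(self):
        # row key -> "family:qualifier" -> list of (timestamp, value), newest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, timestamp):
        """Store a new version of a cell; older versions are kept."""
        cells = self._rows[row][column]
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def get(self, row, column):
        """Return the newest value for a cell, or None if the cell is absent."""
        cells = self._rows.get(row, {}).get(column, [])
        return cells[0][1] if cells else None

    def scan(self, start_row, stop_row):
        """Yield (row key, newest values) for keys in [start_row, stop_row), in key order."""
        for row in sorted(self._rows):
            if start_row <= row < stop_row:
                latest = {col: cells[0][1] for col, cells in self._rows[row].items()}
                yield row, latest


table = WideColumnTable()
table.put("user1", "info:name", "Ada", timestamp=1)
table.put("user1", "info:name", "Ada L.", timestamp=2)
table.put("user2", "info:email", "b@example.com", timestamp=1)
print(table.get("user1", "info:name"))
print(list(table.scan("user1", "user2")))
```

Rows need not share the same columns, which is where the "wide" and "sparse" labels come from. Keeping row keys sorted is what makes range scans cheap in a real store.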
Apache Cassandra is a free, open source NoSQL database designed to manage very large data sets (think petabytes) across large clusters of commodity servers. Among many distinguishing features, Cassandra excels at scaling writes as well as reads, and its "master-less" architecture makes creating and expanding clusters relatively straightforward. For organizations seeking a data store that can support rapid and massive growth, Cassandra should be high on the list of options to consider.
MongoDB is certainly one of the most popular open source, document-oriented NoSQL databases. Developed and maintained by 10gen, MongoDB is available in both a free version and a paid-for enterprise version, which adds features such as Kerberos security, SNMP access, and live monitoring. However, neither the free version nor the enterprise version comes with a management GUI.
MongoDB edges Couchbase Server with richer querying and indexing options, as well as superior ease of use
Visual Studio is no longer simply an IDE, no longer a place you go just to write and debug C/C++ code. It has long since become something of a development mashup. It's where you go to tackle any task in the development process, regardless of the target. It's where you head to do your LightSwitch development, your SQL Server development, your Web application development, your Windows Azure development, and your ASP.Net or Windows Forms development in C#, F#, VB.Net, and -- oh, yes -- good old Visual C++. Naturally, it's where you build applications for http://www.infoworld.com/category/tags/microsoft-windows-azure and Windows RT.
When you consider the cloud, you typically imagine a realm of deployed, production applications. Zend Developer Cloud (ZDC) adds a twist: ZDC creates a place in the cloud where PHP-based applications can be developed for the cloud. No more developing locally, then deploying into the cloud -- ZDC pushes both into the ether.
Amazon's Relational Database Service (RDS) creates a MySQL database server in the cloud.
Signifying a formless haze of computing power and storage that is somewhere "out there," computerdom's current buzzword is as difficult to get one's arms around as a real cloud. A seemingly limitless pool of processors and memory and disk space, and you just scoop out what you need. Sounds great, doesn't it?
Have you got a few hundred gigabytes of data that need processing? Perhaps a dump of radio telescope data that could use some combing through by a squad of processors running Fourier transforms? Or maybe you're convinced some statistical analysis will reveal a pattern hidden in several years of stock market information? Unfortunately, you don't happen to have a grid of distributed processors to run your application, much less the time to construct a parallel processing infrastructure.
The day of the mold-your-own OS has come, and Linux is the clay.
Linux is, among other things, a customizable operating system. Clever developers can craft a Linux whose kernel and packages are configured for a specific purpose, to serve as a sort of vertical-market operating system. The benefit to users is somewhat akin to walking into a hardware store. On the shelves are tools, each suited to a specific task. And it's particularly nice that all the tools are free.
Connecting your application to Amazon Web Services (AWS) isn't complicated, particularly if you've done Web service programming on other projects.
Amazon's Web Services (AWS) are based on a simple concept: Amazon has built a globe-spanning hardware and software infrastructure that supports the company's Internet business, so why not modularize components of that infrastructure and rent them? It is akin to a large construction company in the business of building interstate highways hiring out its equipment and expertise for jobs such as putting in a side road, paving a supermarket parking lot, repairing a culvert, or just digging a backyard swimming pool.