by Andrew Oram
American Reporter Correspondent
December 16, 2010
DEFINITIONS: CLOUDS, WEB SERVICES, AND OTHER REMOTE COMPUTING
CAMBRIDGE, Mass., Dec. 16, 2010 -- Technology commentators are a bit trapped by the term "cloud," which has been kicked and slapped around enough to become truly shapeless.
Time for confession: I stuck the term in this article's title because I thought it useful to attract readers' attention. But what else should I do? To run away from "cloud" and substitute any other term ("web services" is hardly more precise, nor is the phrase "remote computing" I use from time to time) just creates new confusions and ambiguities.
So in this section I'll offer a history of services that have led up to our cloud-obsessed era, hoping to help readers distinguish the impacts and trade-offs created by all the trends that lie in the "cloud."
The basic notion of cloud computing is simply this: one person uses a computer owned by another in some formal, contractual manner. The oldest precedent for cloud computing is timesharing, which was already popular in the 1960s. With timesharing, programmers could enter their programs on teletype machines and transmit them over modems and phone lines to central computer facilities that rented out CPU time in units of one-hundredth of a second.
Some sites also purchased storage space on racks of large magnetic tapes. The value of storing data remotely was to recover from flood, fire, or other catastrophe.
The two major, historic cloud services offered by Amazon.com - Elastic Compute Cloud (EC2) and Simple Storage Service (S3) - are the descendants of timesharing and remote backup, respectively.
EC2 provides complete computer systems to clients, who can request any number of systems and dismiss them again when they are no longer needed. Pricing is quite flexible (even including an option for an online auction) but is essentially a combination of hourly rates and data transfer charges.
A storage system, S3, lets clients reserve as much or as little space as needed. Pricing reflects the amount of data stored and the amount of data transferred in and out of Amazon's storage. EC2 and S3 complement each other well, because EC2 provides processing but no persistent storage.
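To make the two pricing models concrete, here is a minimal sketch in Python. Every rate in it is an invented placeholder, not one of Amazon's actual prices, and the names Rates, compute_bill_cents, and storage_bill_cents are made up for illustration. It only shows the shape of the arithmetic: hours plus transfer for computing, stored volume plus transfer for storage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Made-up placeholder rates in US cents; NOT Amazon's real prices."""
    instance_hour_cents: int = 10      # per system, per hour (assumed)
    transfer_gb_cents: int = 15        # per GB moved in or out (assumed)
    storage_gb_month_cents: int = 14   # per GB stored for a month (assumed)


def compute_bill_cents(systems: int, hours: int, transfer_gb: int, rates: Rates) -> int:
    """EC2-style bill: an hourly charge for each system, plus data transfer."""
    return systems * hours * rates.instance_hour_cents + transfer_gb * rates.transfer_gb_cents


def storage_bill_cents(stored_gb: int, transfer_gb: int, rates: Rates) -> int:
    """S3-style monthly bill: data stored, plus data transferred in and out."""
    return stored_gb * rates.storage_gb_month_cents + transfer_gb * rates.transfer_gb_cents


if __name__ == "__main__":
    rates = Rates()
    # Three systems for one day, moving 20 GB: 3*24*10 + 20*15 cents
    print(compute_bill_cents(3, 24, 20, rates))   # 1020
    # 50 GB stored for a month, 20 GB transferred: 50*14 + 20*15 cents
    print(storage_bill_cents(50, 20, rates))      # 1000
```

Working in whole cents keeps the arithmetic exact. A client who dismisses systems when they are no longer needed simply stops accumulating hours.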
Timesharing and EC2-style services work a bit like renting a community garden. Just as community gardens let apartment dwellers without personal back yards grow fruits and vegetables, timesharing in the 1960s brought programming within reach of people who couldn't afford a few hundred thousand dollars to buy a computer. All the services discussed in this section provide hardware to people who run their own operations, and therefore are often called Infrastructure as a Service, or IaaS.
We can also trace cloud computing back in another direction, as the commercially viable expression of grid computing, an idea developed through the first decade of the 2000s whose implementations stayed among researchers. The term "grid" evokes regional systems for delivering electricity, which hide the origin of electricity so that I don't have to strike a deal with a particular coal-burning plant, but can simply plug in my computer and type away. Similarly, grid computing combined computing power from far-flung systems to carry out large tasks such as weather modeling.
These efforts were an extension of earlier cluster technology (computers plugged into local area networks), and effectively scattered the cluster geographically. Such efforts were also inspired by the well-known SETI@home program, an early example of Internet "crowdsourcing" that millions of people have downloaded to help process signals collected from telescopes.
Another form of infrastructure became part of modern life in the 1990s when it seemed like you needed your own Web site to be anybody. Internet providers greatly expanded their services, which used to involve bare connectivity and an email account. Now they also offer individualized Web sites and related services. Today you can find a wealth of different hosting services at different costs depending on whether you want a simple Web presence, a database, a full-featured content management system, and so forth.
These hosting services keep costs low by packing multiple users onto each computer. A tiny site serving up occasional files, such as my own praxagora.com, needs nothing that approaches the power of a whole computer system. Thanks to virtual hosting, I can use a sliver of a web server that dozens of other sites share and enjoy my web site for very little cost. But praxagora.com still looks and behaves like an independent, stand-alone web server. We'll see more such legerdemain as we explore virtualization and clouds further.
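One common way to pull off this sleight of hand is name-based virtual hosting, in which a single server reads the Host header of each request to decide which site to serve. The sketch below, in Python, is a toy illustration under that assumption; it is not a description of how any particular hosting company runs its servers. The SITES table, its contents, and the function names are hypothetical.

```python
# Hypothetical sites sharing one server. The page text and the second
# host name are invented for illustration.
SITES = {
    "praxagora.com": "Welcome to praxagora.com",
    "example.org": "Welcome to example.org",
}


def parse_host(raw_request: str) -> str:
    """Return the lowercased Host header (port stripped) of a raw HTTP/1.1 request."""
    for line in raw_request.split("\r\n")[1:]:   # skip the request line
        if line == "":                           # blank line ends the headers
            break
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return value.strip().lower().split(":")[0]
    return ""


def handle(raw_request: str) -> str:
    """Serve a different site depending on which host name the browser asked for."""
    body = SITES.get(parse_host(raw_request))
    if body is None:
        return "HTTP/1.1 404 Not Found\r\n\r\n"
    return f"HTTP/1.1 200 OK\r\nContent-Length: {len(body)}\r\n\r\n{body}"


if __name__ == "__main__":
    print(handle("GET / HTTP/1.1\r\nHost: praxagora.com\r\n\r\n"))
```

From the visitor's point of view each host name answers like its own server, even though one process and one table stand behind all of them.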
The next great breakthrough in remote computing was the concept of an Application Service Provider, or ASP (not to be confused with the asp or aspx extension you see on many URLs these days, which refers to Microsoft's Active Server Pages and ASP.NET technologies). This article started with one contemporary example, Gmail. Computing services such as payroll processing had been outsourced for some time, but in the 1990s, the Web made it easy for a business to reach right into another organization's day-to-day practice, running programs on central computers and offering interfaces to clients over the Internet. People used to filling out forms and proceeding from one screen to the next on a locally installed program could do the same on a browser with barely any change in behavior.
Using an Application Service Provider is a little like buying a house in the suburbs with a yard and garden, but hiring a service to maintain them. Just as the home-owner using a service doesn't have to get his hands dirty digging holes for plants, worry about the composition of the lime, or fix a broken lawnmower, companies who contract with Application Service Providers don't have to wrestle with libraries and DLL hell, rush to upgrade software when there's a security breach, or maintain a license server. All these logistics are on the site run by the service, hidden away from the user.
Early examples of Application Service Providers for everyday personal use include blogging sites such as blogger.com and wordpress.com. These sites offer web interfaces for everything from customizing the look of your pages to putting up new content (although advanced users have access to back doors for more complex configuration).
As broadband penetrated to more and more areas, web services became a viable business model for delivering software to individual users. First of all, broadband connections are "always on," in contrast to dial-up. Second, the XMLHttpRequest extension allows browsers to fetch and update individual snippets of a web page, a practice that programmers popularized under the acronym AJAX.
Together, these innovations allow web applications to provide interfaces almost as fast and flexible as native applications running on your computer, and a new version of HTML takes the process even farther. The movement to the web is called Software as a Service or SaaS.
The pinned website feature introduced in Internet Explorer 9 encourages users to create menu items or icons representing web sites, making them as easy to launch as common applications on their computer. This feature is a sign of the shift of applications from the desktop to the Web.
Every trend has its logical conclusion, even if it's farther than people are willing to go in reality. The logical conclusion of SaaS is a tiny computer with no local storage and no software except the minimal operating system and networking software to access servers that host the software to which users have access.
Such thin clients were already prominent in the work world before Web services became popular; they connected terminals made by companies such as Wyse with local servers over cables. (Naturally, Wyse has recently latched on to the cloud hype.) The Web equivalent of thin clients is mobile devices such as iPhones with data access, or Google Chrome OS, which Google is hoping will wean people away from popular software packages in favor of Web services like Google Docs.
Google is planning to release a netbook running Chrome OS in about six months. Ray Ozzie, chief software architect of Microsoft, also speaks of an upcoming reality of continuous cloud services delivered to thin appliances. The public hasn't followed the Web services revolution this far, though; most are still lugging laptops.
Most of the world's data is now in digital form, probably in some relational database such as Oracle, IBM's DB2, or MySQL. If the storage of the data is anything more formal than a spreadsheet on some clerical worker's PC (and a shameful amount of critical data is still on those PCs), it's probably already in a kind of cloud.
Database administrators know better than to rely on a single disk to preserve those millions upon millions of bytes, because tripping over an electric cable can lead to a disk crash and critical information loss. So they not only back up their data on tape or some other medium, but duplicate it on a series of servers in a strategy called replication. They often transmit data second by second over hundreds of miles of wire so that flood or fire can't lead to permanent loss.
Replication strategies can get extremely complex (for instance, code that inserts the "current time" can insert different values as the database programs on various servers execute it), and they are supplemented by complex caching strategies. Caches are necessary because public-facing systems should have the most commonly requested data - such as current pricing information for company products - loaded right into memory. An extra round-trip over the Internet for each item of data can leave users twiddling their thumbs in annoyance. Loading or "priming" these caches can take hours, because primary memories on computers are so large.
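The "current time" problem is easy to reproduce in a few lines of Python. The sketch below is a toy model, not any real database's replication code: the Replica class, the helper functions, and the fake clocks are all hypothetical. It contrasts re-running a statement on each server, where every server asks its own clock, with evaluating the time once and shipping the resulting value, which is one common remedy.

```python
import itertools


def make_clock(start: int):
    """Fake, deterministic clock: each call returns the next tick."""
    ticks = itertools.count(start)
    return lambda: next(ticks)


class Replica:
    def __init__(self, name: str, clock):
        self.name = name
        self.clock = clock      # this server's own notion of "now"
        self.rows = []

    def apply_statement(self, item: str) -> None:
        # The server re-executes the statement, so it evaluates
        # "current time" by itself.
        self.rows.append((item, self.clock()))

    def apply_row(self, item: str, timestamp: int) -> None:
        # The server stores a value that was already computed elsewhere.
        self.rows.append((item, timestamp))


def replicate_statement(item: str, servers) -> None:
    for server in servers:
        server.apply_statement(item)


def replicate_row(item: str, primary: Replica, replicas) -> None:
    timestamp = primary.clock()          # evaluate "now" exactly once
    for server in [primary, *replicas]:
        server.apply_row(item, timestamp)


if __name__ == "__main__":
    primary = Replica("primary", make_clock(100))
    replica = Replica("replica", make_clock(103))   # runs a little later
    replicate_statement("order-1", [primary, replica])
    replicate_row("order-2", primary, [replica])
    print(primary.rows)   # [('order-1', 100), ('order-2', 101)]
    print(replica.rows)   # [('order-1', 103), ('order-2', 101)]
```

The first row ends up with two different timestamps, while the second is identical everywhere. Real systems have to make this kind of choice for every non-deterministic expression, which is part of why replication gets so complex.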
The use of backups and replication can be considered a kind of "private" cloud, and if a commercial service becomes competitive in reliability or cost, we can expect businesses to relax their grip and entrust their data to such a service.
We've seen how Amazon.com's S3 allowed people to store data on someone else's servers - with notably disastrous effects for the file-sharing site WikiLeaks, which relied on the Amazon cloud to host their files. When Amazon abruptly stopped doing so, apparently with very little warning, WikiLeaks had to scramble to find new servers to host their files amid worldwide interest in the purloined State Dept. classified diplomatic cables they were "leaking" at the time.
Amazon claimed the cyberattacks aimed at WikiLeaks slowed service to their other cloud clients, while WikiLeaks charged Amazon had caved in to official and financial pressures. Who was right has not been resolved. But as a primary storage area, S3 isn't cost-effective. It's probably most valuable when used in tandem with an IaaS service such as EC2: you feed your data from the data cloud service into the compute cloud service.
Some people also use S3, or one of many other data storage services, as a backup to their local systems. Although it may be hard to get used to trusting some commercial service over a hard drive you can grasp in your hand, the service has some advantages. Its operators are actually not as likely as you are to drop the hard drive on the floor and break it, or have it go up in smoke when a malfunctioning electrical system starts a fire.
But data in the cloud has a much more powerful potential. Instead of Software as a Service, a company can offer its data online for others to use.
Probably the first company to try this radical exposure of data was Amazon.com, which can also be credited with starting the cloud services mentioned earlier. Amazon.com released a service that let programmers retrieve data about its products, so that instead of having to visit dozens of web pages manually and view the data embedded in the text, someone could retrieve statistics within seconds.
Programmers loved this. Data is empowering, even if it's just sales from one vendor, and developers raced to use the application programming interface (API) to create all kinds of intriguing applications using data from Amazon. Effectively, they leave it up to Amazon to collect, verify, maintain, search through, and correctly serve up data on which their applications depend. Seen as an aspect of trust, web APIs are an amazing shift in the computer industry.
Amazon's API was a hack of the Web, which had been designed to exchange pages of information. Like many other Internet services, the Web's HTTP protocol offers a few basic commands: GET, PUT, POST, and DELETE. The API used the same HTTP protocol to get and put individual items of data. And because it used HTTP, it could easily be implemented in any language. Soon there were libraries of programming code in all popular languages to access services such as Amazon.com's data.
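A rough picture of how the four commands map onto individual items of data can be sketched in Python without any network at all. This is a toy in-memory model, not Amazon's actual API: the ItemStore class, the /items path, and the choice of status codes are assumptions made for illustration.

```python
class ItemStore:
    """Toy model of a web API that exposes items of data under /items/<id>."""

    def __init__(self):
        self._items = {}
        self._next_id = 1

    def handle(self, method: str, path: str, body=None):
        """Return a (status_code, payload) pair for one request."""
        parts = [p for p in path.split("/") if p]
        if not parts or parts[0] != "items":
            return 404, None
        item_id = parts[1] if len(parts) > 1 else None

        if method == "POST" and item_id is None:
            # POST to the collection: the server picks the new item's URL.
            item_id = str(self._next_id)
            self._next_id += 1
            self._items[item_id] = body
            return 201, f"/items/{item_id}"
        if item_id is None:
            return 405, None                 # other verbs need a specific item
        if method == "GET":
            if item_id in self._items:
                return 200, self._items[item_id]
            return 404, None
        if method == "PUT":
            # PUT stores the item at a URL chosen by the client.
            status = 200 if item_id in self._items else 201
            self._items[item_id] = body
            return status, body
        if method == "DELETE":
            if item_id in self._items:
                del self._items[item_id]
                return 204, None
            return 404, None
        return 405, None


if __name__ == "__main__":
    store = ItemStore()
    print(store.handle("POST", "/items", {"title": "A book"}))   # (201, '/items/1')
    print(store.handle("GET", "/items/1"))                       # (200, {'title': 'A book'})
    print(store.handle("DELETE", "/items/1"))                    # (204, None)
```

Because the whole exchange comes down to a verb, a path, and a payload, a client library in any language needs little more than the ability to send HTTP requests.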
Another early adopter of Web APIs was Google. Because its Google Maps service exposed data in a program-friendly form, programmers started to build useful services on top of it. One famous example combined Google Maps with a service that published information on properties available for rent; users could quickly pull up a map showing where to rent a room in their chosen location. Such combinations of services were called mash-ups, with interesting cultural parallels to the practices of musicians and artists in the digital age who combine other people's work from many sources to create new works.
The principles of using the Web for such programs evolved over several years in the late 1990s, but the most popular technique was codified in a 2000 PhD thesis by HTTP designer Roy Thomas Fielding, who invented the now-famous term REST (standing for Representational State Transfer) to cover the conglomeration of practices for defining URLs and exchanging messages. Different services adhere to these principles to a greater or lesser extent. But any online service that wants to garner serious and sustained use now offers an API.
For programmers, SaaS has proven popular. In 1999, a company named VA Linux created a site called SourceForge with the classic SaaS goal of centralizing the administration of computer systems and taking that burden off programmers' hands. A programmer could upload his program there and, as is typical for free software and open source, accept code contributions from anyone else who chose to download the program.
At that time, VA Linux made its money selling computers that ran the GNU/Linux operating system. It set up SourceForge as a donation to the free software community, to facilitate the creation of more free software and therefore foster greater use of Linux. Eventually the hardware business dried up, so SourceForge became the center of the company's business: corporate history anticipated cloud computing history.
SourceForge became immensely popular, quickly coming to host hundreds of thousands of projects, some quite heavily used. It has also inspired numerous other hosting sites for programmers, such as GitHub. But these sites don't completely take the administrative hassle out of being a programmer. You still need to run development software - such as a compiler and debugger - on your own computer.
Google leapt up to the next level of programmer support with Google App Engine, a kind of programmer equivalent to Gmail or Google Docs. App Engine is a cocoon within which you can plant a software larva and carry it through to maturity. As with SaaS, the programmer does the coding, compilation, and debugging all on the App Engine site.
Also like SaaS, the completed program runs on the site and offers a web interface to the public. But in terms of power and flexibility, App Engine is like IaaS because the programmer can use it to offer any desired service. This new kind of development paradigm is called Platform as a Service or PaaS.
Microsoft offers both IaaS and PaaS in its Windows Azure project.
Hopefully you now see how various types of remote computing are alike, as well as different. We'll look next at what propels their triumph in the market.
Next: Why clouds and web services will continue to take over computing