Open source search tool finds new data mining role
By John Lister
An open source tool designed to power search engines could be reborn as a commercial product for major online databases. New firm Cloudera plans to use the Hadoop system as the basis of its professional service packages.
Hadoop, named after a stuffed toy elephant belonging to the son of its creator Doug Cutting, came about as an open source alternative to MapReduce, a technology developed by Google. Both systems work by splitting the handling of large databases (such as Google’s index of the web) across hundreds or thousands of different machines.
The system means there are fewer problems when machines go down, as the remaining machines can easily pick up the load. Spreading the work this way reduces the strain on individual servers and means the database can run on cheaper machines. It also eases the pressure on engineers to fix faults quickly for fear of the entire system being affected. These changes gave Google far more scope to analyse the data it gathered from users' searches.
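For readers curious about what "splitting the handling" looks like in practice, here is a minimal sketch of the map-and-reduce idea in Python. It is only an illustration under simple assumptions: it runs on one machine, counts words in a few made-up text chunks, and the function names (map_chunk, shuffle, reduce_counts, word_count) are hypothetical and not part of Hadoop or Google's MapReduce. In a real cluster, each chunk would be handed to a different machine.

```python
from collections import defaultdict


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the emitted values by word so each word can be totalled separately."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: add up the values collected for each word."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    """Run map over every chunk, then shuffle and reduce the combined output.

    Here the chunks are processed one after another in a loop; the point of
    the pattern is that each call to map_chunk is independent of the others,
    so the calls could be spread across many machines.
    """
    pairs = []
    for chunk in chunks:
        pairs.extend(map_chunk(chunk))
    return reduce_counts(shuffle(pairs))


# Made-up sample data standing in for pieces of a much larger dataset.
chunks = ["open source search", "search engines index the web", "open web"]
counts = word_count(chunks)
print(counts)
```

Because no chunk depends on any other, a chunk whose machine fails can simply be handed to another machine and processed again, which is the property the article describes.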
After Cutting used publicly available data on MapReduce to develop Hadoop, he was hired by Yahoo to develop the system to meet its needs; while Yahoo funded his work, it allowed him to keep the product as open source. The system now powers several elements of Yahoo’s services including customising homepages to individual users and matching adverts to news articles.
Many other firms now use Hadoop, including Facebook, which uses the system to analyse its database of photographs to work out which of your online friends you have the closest relationships with.
The new firm, Cloudera, is the work of four engineers from Google, Yahoo, Facebook and Oracle. They plan on distributing Hadoop-based database systems and then making their money through support and consulting services. One of the founders, Jeff Hammerbacher, told the New York Times that the aim is to help companies manage large databases more effectively and be able to get more useful information from them.


March 18th, 2009