March 17, 2009

Open source search tool finds new data mining role

By John Lister





An open source tool designed to power search engines could be reborn as a commercial product for major online databases. New firm Cloudera plans to use the Hadoop system as the basis of its professional service packages.

Hadoop, named after a stuffed toy elephant belonging to the son of its creator Doug Cutting, came about as an open source alternative to MapReduce, a technology developed by Google. Both systems work by splitting the handling of large databases (such as Google’s index of the web) across hundreds or thousands of different machines.
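
To make the idea of splitting work concrete, here is a minimal sketch of the map-and-reduce pattern applied to counting words. It is an illustration only, not Hadoop's or Google's actual code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample data are hypothetical, and everything runs on one machine, with each text chunk standing in for the slice of data a separate machine would hold.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the emitted values by word so each word can be totalled on its own."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Add up the values collected for each word."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    # Each chunk stands in for the data held by one machine (an assumption
    # of this sketch). In a real cluster the map calls would run in parallel.
    emitted = []
    for chunk in chunks:
        emitted.extend(map_phase(chunk))
    return reduce_phase(shuffle(emitted))


# Hypothetical sample data, split into three chunks.
chunks = ["open source search", "search tool", "open data"]
counts = word_count(chunks)
```

Because each chunk is mapped independently, the map step can in principle be handed to as many machines as there are chunks, with only the grouped results brought together at the end.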

The system means there are fewer problems when machines go down, as the remaining machines can easily pick up the load. This spreads the strain across servers and means the database can run on cheaper machines, while easing the pressure on engineers to fix problems quickly for fear of the entire system being affected. These changes gave Google much more power over the kinds of information it could extract from the data it gathered from users' searches.
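
The way surviving machines pick up the load can be sketched in a few lines. This is a simplified illustration under stated assumptions, not how Hadoop itself schedules work: the helper names (`assign_tasks`, `handle_failure`), the worker names and the task labels are all hypothetical.

```python
def assign_tasks(tasks, workers):
    """Spread tasks round-robin across the available workers."""
    assignment = {worker: [] for worker in workers}
    for i, task in enumerate(tasks):
        assignment[workers[i % len(workers)]].append(task)
    return assignment


def handle_failure(assignment, failed):
    """Hand a failed worker's tasks to whichever survivors are least loaded."""
    orphaned = assignment.pop(failed)
    if not assignment:
        raise RuntimeError("no surviving workers to take over the tasks")
    for task in orphaned:
        # Pick the survivor with the fewest tasks at this moment.
        target = min(assignment, key=lambda worker: len(assignment[worker]))
        assignment[target].append(task)
    return assignment


# Hypothetical cluster: six tasks spread over three workers.
tasks = ["t0", "t1", "t2", "t3", "t4", "t5"]
assignment = assign_tasks(tasks, ["a", "b", "c"])

# Worker "b" goes down; its tasks move to "a" and "c".
assignment = handle_failure(assignment, "b")
```

No task is lost when a worker disappears, which is why a single failure need not be an emergency for the engineers looking after the system.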

After Cutting used publicly available data on MapReduce to develop Hadoop, he was hired by Yahoo to develop the system to meet its needs; while Yahoo funded his work, it allowed him to keep the product as open source. The system now powers several elements of Yahoo’s services including customising homepages to individual users and matching adverts to news articles.

Many other firms now use Hadoop, including Facebook, which uses the system to analyse its database of photographs to work out which of your online friends you have the closest relationships with.

The new firm, Cloudera, is the work of four engineers from Google, Yahoo, Facebook and Oracle. They plan on distributing Hadoop-based database systems and then making their money through support and consulting services. One of the founders, Jeff Hammerbacher, told the New York Times that the aim is to help companies manage large databases more effectively and be able to get more useful information from them.






