WebProNews

Tag: diffbot

  • Diffbot Uses Robots To Extract Data From E-commerce Sites

    Diffbot announced today that it is releasing a new API that uses robots to understand and extract data from e-commerce sites.

    The robotics company, which uses vision, machine learning and artificial intelligence to analyze and extract data from web pages, appeared at the Bing-sponsored LAUNCH event last year, where it laid out its plans to make the entire web machine-readable.

    The new API uses computer vision to turn any e-commerce site into a product database, the company says in an email.

    “Software developers can use the API to extract a variety of data from the page, including product image, SKU code, price, shipping cost, discount price, MSRP, etc.,” a spokesperson for diffbot tells WebProNews. “The API can identify and structure information regardless of a site’s design, layout, markup or language.”
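    To make the spokesperson's list of fields concrete, here is a minimal sketch of how a developer might hold that data once it has been extracted. The article does not describe the API's response format, so the JSON keys (`image`, `sku`, `price`, `shippingCost`, `discountPrice`, `msrp`) and the `ProductRecord` and `parse_product` names are assumptions made for illustration, not Diffbot's actual schema.

```python
import json
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class ProductRecord:
    """Hypothetical structured record for one product page."""
    image_url: Optional[str] = None
    sku: Optional[str] = None
    price: Optional[Decimal] = None
    shipping_cost: Optional[Decimal] = None
    discount_price: Optional[Decimal] = None
    msrp: Optional[Decimal] = None


def _money(value) -> Optional[Decimal]:
    """Convert a value such as "$1,299.00" or 1499 to Decimal; None stays None."""
    if value is None:
        return None
    return Decimal(str(value).replace("$", "").replace(",", "").strip())


def parse_product(payload: str) -> ProductRecord:
    """Build a ProductRecord from a JSON string with assumed key names."""
    data = json.loads(payload)
    return ProductRecord(
        image_url=data.get("image"),
        sku=data.get("sku"),
        price=_money(data.get("price")),
        shipping_cost=_money(data.get("shippingCost")),
        discount_price=_money(data.get("discountPrice")),
        msrp=_money(data.get("msrp")),
    )
```

    Fields missing from the payload are left as `None`, which reflects the fact that not every product page lists a shipping cost or a discount price.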

    Additionally, diffbot has developed a spider technology that can analyze an entire site, skipping non-product pages and extracting data only from the relevant page types.

    “Think about Target.com, or Wal-Mart.com, and being able to extract ALL of the product data from all of the product pages,” the spokesperson says.
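    The spider idea can be sketched in a few lines: walk every page on a site, classify each one, and keep only the product pages. The version below is an illustration under stated assumptions, not Diffbot's implementation. It crawls an in-memory link map rather than the live web, and the hypothetical `classify_by_path` rule stands in for the company's visual page classifier, which the article does not detail.

```python
from collections import deque
from typing import Callable, Dict, List


def crawl_products(
    start_url: str,
    links: Dict[str, List[str]],
    classify: Callable[[str], str],
) -> List[str]:
    """Breadth-first walk of a site; return URLs classified as product pages."""
    seen = {start_url}
    queue = deque([start_url])
    products: List[str] = []
    while queue:
        url = queue.popleft()
        # Non-product pages are still followed for links, but not collected.
        if classify(url) == "product":
            products.append(url)
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return products


def classify_by_path(url: str) -> str:
    """Placeholder rule (assumption): treat URLs containing /p/ as product pages."""
    return "product" if "/p/" in url else "other"


# Toy site map: each page lists the pages it links to.
SITE = {
    "/": ["/about", "/c/shoes"],
    "/c/shoes": ["/p/101", "/p/102", "/"],
    "/p/101": ["/p/102"],
}

product_urls = crawl_products("/", SITE, classify_by_path)
```

    In a real crawler, each collected URL would then be handed to an extraction step to pull out the product fields.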

    “E-commerce is one of the most popular activities on the web. With 28% of US internet users shopping on a daily basis, we figured we should teach our robot how to understand products,” said CEO Mike Tung. “The Product API represents our latest advances in pushing the capabilities of automated page extraction. We are one step closer to the imminent goal of making the entire web machine-readable.”

    Diffbot believes the entire web can be broken down into about twenty or so page types, such as home pages, article pages, product pages, location pages, social network pages, etc., and says it will continue to roll out APIs for new page types until it has tools to index the entire Internet. It already has APIs for home pages, article pages and image pages.

    The company is backed by EarthLink founder Sky Dayton, who sits on its board.

  • Diffbot Makes The Web Machine-Readable

    Microsoft’s Bing service has a cool promotional vehicle known as Bing Booster. It helps tech startups connect and collaborate with other startups and people in their field. One of the events Bing helps sponsor is LAUNCH, an event that gives startups a platform to launch from.

    One of the startups that Bing brought to LAUNCH this year is really interesting. The company is called diffbot, and they have a mission. They want to make the entire Web machine-readable. What does that mean? According to the official Web site, diffbot is a “visual learning robot that enables developers to easily use Web content in their apps.”

    So diffbot lets a robot read the Web regardless of layout, design or language. That’s pretty cool, but what kind of applications would it have for the Web? The diffbot team lays out its potential uses on the BingBooster Web site:

    Using Diffbot’s existing Article API (which automatically parses blog posts or news articles into machine- and app-friendly XML or JSON) to migrate users’ blogs, regardless of existing platform, to a new blog-platform provider.

    Providing Diffbot’s forthcoming Product API (which can parse product pages of any type across the web) with additional training data or edge cases from existing product-search and price-comparison applications.

    Leveraging Diffbot’s language agnosticism to help power a multi-language tablet newsreader focused on global content and stories.

    Powering a forthcoming event and activity search engine using Diffbot’s impending events page functionality.

    Using our image-identification and extraction capabilities to power a personal photo-book printing service, allowing the inclusion of photos from any gallery software or photo-hosting services.

    The LAUNCH event was a great chance for the diffbot team to meet people who were interested in using its API and who could help it find new uses for its technology.

    The BingBooster Web site says this is just the first in a series of startups that showed off their products at LAUNCH. More developer technologies will be shown in the coming days.

    Here’s a presentation of diffbot at the DEMO Enterprise Disruption 2012 from a few months ago: