Diffbot

Integrate Diffbot with your AI workspace

Diffbot provides AI-powered tools to extract and structure data from web pages, transforming unstructured web content into structured, linked data.

Explore Triggers and Actions

Combine Entity Profiles

Combine multiple entity profiles into a unified view using the Diffbot Knowledge Graph. Returns enhanced person or organization data by matching on identifying attributes like name, email, employer, or URL. Use this to enrich partial entity data, merge duplicate profiles, or verify entity identity.

ActionTry it

Create Bulk Enhance Job

Tool to submit a bulk enhance job to enrich multiple entities asynchronously. Use when you need to process many Person or Organization records in batch. The API accepts entity descriptions and returns enriched data from the Diffbot Knowledge Graph.

ActionTry it

Create Bulk Extract Job

Tool to submit a bulk extract job to process multiple URLs with Extract APIs. Use when you need to process many URLs asynchronously using any Extract API. The job will process URLs in the background and provide downloadable results.

ActionTry it

Create or Update Custom API

Tool to create or update the parameters and ruleset of a Custom API. Use this when you need to define custom extraction rules for specific websites that require tailored parsing logic beyond standard Diffbot APIs. Allows defining URL patterns, CSS selectors, extraction rules, and preprocessing filters to extract structured data from websites with unique layouts.

ActionTry it

Delete Custom API

Tool to delete custom API definitions for a given URL pattern. Removes custom extraction rules from your account. Use when you need to remove previously configured custom APIs.

ActionTry it

Delete KG Enhance Bulkjob

Tool to delete an Enhance Bulkjob. Removes the bulk job and its results from the system. Use when cleaning up completed or failed jobs.

ActionTry it

Diffbot Analyze

Automatically analyzes a web page to determine its type and extract structured data. The Analyze API intelligently classifies pages into types (article, product, discussion, image, video, organization, etc.) and extracts relevant structured data. Use this when you need to process URLs of unknown type or want automatic extraction without specifying the page type in advance.

ActionTry it

Diffbot Extract Job

Tool to extract structured job posting data from job listing pages. Returns job title, company, location, salary, requirements, skills, and other job-related information. Use when you need to parse and structure data from job postings.

ActionTry it

Diffbot Extract List

Tool to extract structured data from list-style pages like news indexes, product listings, and directory pages. Returns an array of items with their titles, links, and descriptions. Use when you need to extract multiple items from a page organized as a list or index.

ActionTry it

Diffbot Get Event

Tool to extract event details from web pages. Use when you need structured event data such as venue, date, and description.

ActionTry it

Diffbot Get Image

Tool to extract detailed information about images, including dimensions and recognition data. Use after confirming the image URL is publicly accessible.

ActionTry it

Diffbot Get Product

Tool to extract product information such as specifications, prices, availability, and reviews. Use when you need structured product data including specs, pricing, and reviews.

ActionTry it

Diffbot Knowledge Graph Search

Search the Diffbot Knowledge Graph using DQL (Diffbot Query Language). Query billions of entities including organizations, people, articles, products, and more. Use structured queries to filter by type, fields, and relationships.

ActionTry it

Download Bulk Job Results

Tool to download results of a bulk enhance job with filtering options via POST request. Use this to retrieve processed results from a completed or running bulk job. Supports multiple export formats (json, jsonl, csv, xls, xlsx) and various filtering options to customize the output. HTTP 200 indicates results are ready, HTTP 201 means the job is still executing.

ActionTry it

Enhance Entity with Knowledge Graph

Enrich a person or organization with comprehensive data from the Diffbot Knowledge Graph. Provide identifiers like name, email, employer, or URL and receive detailed entity information including employment history, education, location, skills, and more. Use when you need to gather all publicly available knowledge about a specific person or organization from billions of web pages.

ActionTry it

Get Article Data

Tool to extract information from articles, including authors, publication dates, and images. Use when you need structured metadata from a web article URL.

ActionTry it

Get Bulk Job Data

Tool to download extracted results from a completed bulk job. Use after a bulk job has finished processing to retrieve the data. Supports JSON and CSV formats.

ActionTry it

Get Bulk Job Results

Tool to download the results of a completed Enhance Bulkjob. Returns enriched records from the bulk job. Use after a bulk enhance job has completed processing.

ActionTry it

Get Bulk Job Status

Tool to poll the status of a specific Diffbot Knowledge Graph Enhance bulk job. Use when you need to check the progress, completion status, or details of a bulk enhancement job.

ActionTry it

Get Bulk Single Result

Tool to download the result of a single job within a Diffbot bulk enhance job. Returns enriched entity data for a specific input record by its index. Use after a bulk enhance job has completed to retrieve individual results without downloading the entire dataset.

ActionTry it

Get Crawl Data

Download extracted results from a completed crawl job. Returns all structured data extracted during crawl processing (articles, products, etc.). Use after a crawl job has completed to retrieve the collected data.

ActionTry it

Get Diffbot Account Details

Retrieves comprehensive Diffbot account information including subscription plan details, credit balance, usage history, and account status. Returns account holder name, email, current plan, available credits, and daily usage statistics for the past 31 days. Use this to check your account's credit balance, monitor API usage patterns, verify account status, or retrieve account metadata.

ActionTry it

Get Discussion Thread

Extract structured discussion threads from web pages including forums, comment sections, product reviews, Reddit discussions, and blog comments. Returns posts with author info, timestamps, content, and hierarchical relationships. Useful for analyzing conversations, gathering feedback, or monitoring discussions. Supported platforms: Native comment systems, Disqus, Facebook Comments, Reddit, forum software, and more. Use this when you need to: - Extract all comments/posts from a discussion thread - Analyze user feedback or reviews - Monitor forum discussions or social media threads - Gather structured conversation data with metadata

ActionTry it

Get KG Coverage Report by ID

Download Knowledge Graph coverage report by report ID. Returns detailed CSV coverage statistics showing field presence across query results. Use this after generating a coverage report from a DQL query to retrieve the statistical breakdown of field coverage.

ActionTry it

Get Video Data

Tool to extract information from videos, including titles, descriptions, and embedded HTML. Use when you need structured video metadata from any web page.

ActionTry it

List Bulk Jobs

Tool to list all Bulk jobs associated with a specific token. Use after authenticating to retrieve statuses of all jobs for the account.

ActionTry it

List Bulk Jobs Status For Token

Tool to get the status of all bulk enhance jobs for a token. Returns list of all bulk jobs associated with your API token. Use when you need to monitor or retrieve the status of multiple bulk jobs at once.

ActionTry it

List Custom APIs

Tool to retrieve all Custom APIs and their extraction rules currently defined on your Diffbot token. Use when you need to list, review, or audit custom API configurations for your account.

ActionTry it

Manage Crawl Job

Manages Diffbot crawl jobs: pause, restart, delete, or view status. Returns list of all active crawl jobs when called without parameters. Use 'name' parameter with action flags (pause=1, restart=1, delete=1) to control specific jobs.

ActionTry it

Resolve Lost ID

Tool to resolve lost IDs in the Knowledge Graph. Use when you need to map a lost identifier to its canonical counterpart for data consistency.

ActionTry it

Search Crawl Job Data

Tool to query crawl job collections using DQL (Diffbot Query Language). Use when you need to search extracted data from completed crawl or bulk jobs by collection name.

ActionTry it

Start Bulk Job

Tool to start a Bulk Extract job. Use when processing large numbers of URLs asynchronously. The Diffbot Bulk API uses GET requests with query parameters to create jobs.

ActionTry it

Start Crawl Job

Initiates a Diffbot crawl job that spiders a website starting from seed URLs and processes discovered pages with a specified Extract API. The crawler follows links within the domain, collects structured data (articles, products, etc.), and stores results for download. Use this to systematically extract data from entire websites or sections. Requires Diffbot Plus plan or higher.

ActionTry it

Stop Bulk Job

Tool to pause (stop) a running Bulk job. Pausing halts further processing of URLs while preserving existing progress. To resume, use the appropriate resume action. Specify the exact job name (case-sensitive) as provided when the job was created.

ActionTry it

Stop KG Bulk Job By ID

Tool to stop an active Knowledge Graph Enhance bulk job by its ID. Halts processing of a running KG bulk job immediately. Use when you need to stop a specific KG bulk job using its bulkjobId.

ActionTry it
Diffbot integration | Dench