Web Browsing & Scraping Tools
Web Browsing and Scraping tools enable AI Agents to gather information from various online sources, including search engines, social media platforms, websites, and multimedia content. These tools provide capabilities for both broad web searches and targeted data extraction.
Available Tools
Tool Name | Platform/Category | Description |
---|---|---|
DuckDuckGo Search | Search Engine | Basic web search using DuckDuckGo |
Tavily Search | Search Engine | Advanced semantic search with AI-optimized results |
LinkedIn Profile Scraper | Social Media | Extracts detailed information from LinkedIn profiles |
Twitter Profile Scraper | Social Media | Collects data from Twitter/X user profiles |
Twitter Search Scraper | Social Media | Gathers tweets based on search terms |
Website Scraper | Web | Extracts text content from web pages |
Video Transcript Extractor | Multimedia | Retrieves transcripts from video/audio content |
Wikipedia Search | Knowledge Base | Searches and retrieves Wikipedia article content |
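As a quick reference, the table above can be combined with the System Tool IDs listed under Tool Details into a small lookup structure. The sketch below is only an illustration: the names `TOOLS` and `tool_ids_for` are hypothetical helpers, not part of the platform, while the categories and IDs are copied from this page.

```python
# Display name -> (category, System Tool ID), copied from this page.
TOOLS = {
    "DuckDuckGo Search": ("Search Engine", "duckduckgo-search"),
    "Tavily Search": ("Search Engine", "tavily-search"),
    "LinkedIn Profile Scraper": ("Social Media", "linkedin_scrape_profiles_by_urls"),
    "Twitter Profile Scraper": ("Social Media", "twitter_scrape_by_handles"),
    "Twitter Search Scraper": ("Social Media", "twitter_scrape_by_search_terms"),
    "Website Scraper": ("Web", "scrape_web_text"),
    "Video Transcript Extractor": ("Multimedia", "video_transcript"),
    "Wikipedia Search": ("Knowledge Base", "wikipedia-query-run"),
}


def tool_ids_for(category: str) -> list[str]:
    """Return the System Tool IDs in a category, in table order."""
    return [tool_id for cat, tool_id in TOOLS.values() if cat == category]


print(tool_ids_for("Search Engine"))
```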
Tool Details
DuckDuckGo Search
Description: A basic web search tool that queries the DuckDuckGo search engine to find relevant web pages and information.
System Tool ID: duckduckgo-search
Arguments:
Name | Required | Type | Description |
---|---|---|---|
maxResults | Optional (default: 5) | number | Maximum number of search results to return |
Tavily Search
Description: An advanced search tool that uses AI to provide more relevant and contextual search results.
System Tool ID: tavily-search
Arguments:
Name | Required | Type | Description |
---|---|---|---|
maxResults | Optional (default: 5) | number | Maximum number of search results to return |
LinkedIn Profile Scraper
Description: Extracts comprehensive information from LinkedIn profiles including work experience, education, skills, and other public profile data.
System Tool ID: linkedin_scrape_profiles_by_urls
Arguments:
Name | Required | Type | Description |
---|---|---|---|
profileUrls | Required | string[] | Array of LinkedIn profile URLs to scrape |
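Because `profileUrls` is a required array, it can help to clean the list before handing it to the tool. The following is a minimal sketch under stated assumptions: the helper `build_linkedin_args` is hypothetical, the check that the host is `linkedin.com` (or a subdomain of it) is an assumption about what counts as a profile URL, and this page does not say whether the tool performs similar validation itself.

```python
from urllib.parse import urlparse


def build_linkedin_args(urls):
    """Return {"profileUrls": [...]} with whitespace trimmed and duplicates removed."""
    cleaned = []
    for raw in urls:
        url = raw.strip()
        parsed = urlparse(url)
        host = (parsed.hostname or "").lower()
        # Assumption: profile URLs live on linkedin.com or one of its subdomains.
        on_linkedin = host == "linkedin.com" or host.endswith(".linkedin.com")
        if parsed.scheme not in ("http", "https") or not on_linkedin:
            raise ValueError(f"not a LinkedIn URL: {raw!r}")
        if url not in cleaned:
            cleaned.append(url)
    if not cleaned:
        raise ValueError("profileUrls is required and must not be empty")
    return {"profileUrls": cleaned}
```

Rejecting bad input before the call avoids spending a scrape request on a URL that cannot succeed.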
Twitter Profile Scraper
Description: Collects data from Twitter/X user profiles including tweets, profile information, and public metrics.
System Tool ID: twitter_scrape_by_handles
Arguments:
Name | Required | Type | Description |
---|---|---|---|
twitterHandles | Required | string[] | Array of Twitter handles to scrape |
start | Optional (default: '') | string | Start date in YYYY-MM-DD format |
end | Optional (default: '') | string | End date in YYYY-MM-DD format |
sort | Optional (default: 'Top') | enum: ['Latest', 'Top'] | Sort order of tweets |
maxItems | Optional (default: '10') | enum: ['5', '10', '25', '50', '100'] | Maximum number of items to return |
Twitter Search Scraper
Description: Searches and extracts tweets based on specific search terms or keywords.
System Tool ID: twitter_scrape_by_search_terms
Arguments:
Name | Required | Type | Description |
---|---|---|---|
searchTerms | Required | string[] | Array of search terms |
start | Optional (default: '') | string | Start date in YYYY-MM-DD format |
end | Optional (default: '') | string | End date in YYYY-MM-DD format |
sort | Optional (default: 'Top') | enum: ['Latest', 'Top'] | Sort order of tweets |
maxItems | Optional (default: '10') | enum: ['5', '10', '25', '50', '100'] | Maximum number of items to return |
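The two Twitter tools share the `start`, `end`, `sort`, and `maxItems` arguments and differ only in their required array. Note that `maxItems` is an enum of strings, so the number `25` is not a listed value while the string `25` is. The sketch below shows one way to validate arguments against the two tables before a call. The function `build_twitter_args` and the plain-dict payload shape are assumptions for illustration; this page does not specify how arguments are actually passed to the tools.

```python
import re
from datetime import date

SORT_VALUES = ("Latest", "Top")
MAX_ITEMS_VALUES = ("5", "10", "25", "50", "100")  # strings, not numbers
# Each tool names its required string[] argument differently.
TARGET_KEY = {
    "twitter_scrape_by_handles": "twitterHandles",
    "twitter_scrape_by_search_terms": "searchTerms",
}
_DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def _check_date(name, value):
    """Accept '' (the documented default) or a real date in YYYY-MM-DD form."""
    if value == "":
        return
    if not _DATE_RE.fullmatch(value):
        raise ValueError(f"{name} must be YYYY-MM-DD, got {value!r}")
    date.fromisoformat(value)  # raises ValueError for dates like 2024-02-30


def build_twitter_args(tool_id, targets, start="", end="", sort="Top", max_items="10"):
    """Build a validated argument dict for either Twitter scraper."""
    if tool_id not in TARGET_KEY:
        raise ValueError(f"unknown tool ID: {tool_id!r}")
    if not targets:
        raise ValueError(f"{TARGET_KEY[tool_id]} is required and must not be empty")
    _check_date("start", start)
    _check_date("end", end)
    if sort not in SORT_VALUES:
        raise ValueError(f"sort must be one of {SORT_VALUES}, got {sort!r}")
    if max_items not in MAX_ITEMS_VALUES:
        raise ValueError(f"maxItems must be one of {MAX_ITEMS_VALUES}, got {max_items!r}")
    return {
        TARGET_KEY[tool_id]: list(targets),
        "start": start,
        "end": end,
        "sort": sort,
        "maxItems": max_items,
    }
```

For example, `build_twitter_args("twitter_scrape_by_search_terms", ["ai agents"], start="2024-01-01", sort="Latest")` produces a dict with `searchTerms`, the given start date, and the documented defaults for `end` and `maxItems`.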
Website Scraper
Description: Extracts visible text content from any webpage URL, making it ideal for gathering information from articles, blog posts, and other web content.
System Tool ID: scrape_web_text
Arguments:
Name | Required | Type | Description |
---|---|---|---|
url | Required | string | URL of the webpage to scrape |
Video Transcript Extractor
Description: Retrieves transcripts from online video and audio content hosted on a range of platforms.
System Tool ID: video_transcript
Arguments:
Name | Required | Type | Description |
---|---|---|---|
video_url | Required | string | URL of the video/audio content |
language | Optional | string | Language code for the transcript (e.g., 'en', 'ru') |
Wikipedia Search
Description: Searches Wikipedia articles and retrieves relevant content and information.
System Tool ID: wikipedia-query-run
Arguments:
Name | Required | Type | Description |
---|---|---|---|
topKResults | Optional (default: 3) | number | Number of top results to return |
maxDocContentLength | Optional (default: 4000) | number | Maximum content length per document |
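The two arguments together bound how much text a single query can return, which matters when the results are fed into an agent's context. The arithmetic can be sketched as below. This assumes the cap in `maxDocContentLength` applies to each returned document independently. The unit of that length is not stated on this page, and the helper `wikipedia_content_budget` is hypothetical.

```python
def wikipedia_content_budget(top_k_results=3, max_doc_content_length=4000):
    """Upper bound on returned content: number of results times the per-document cap."""
    if top_k_results < 1 or max_doc_content_length < 1:
        raise ValueError("both arguments must be positive")
    return top_k_results * max_doc_content_length


print(wikipedia_content_budget())         # documented defaults: 3 * 4000 = 12000
print(wikipedia_content_budget(5, 2000))  # 5 * 2000 = 10000
```

Raising `topKResults` while lowering `maxDocContentLength` trades depth per article for breadth across articles at a similar overall budget.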
When using search and scraping tools, be mindful of rate limits and usage quotas that may apply to specific services.
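One common way to stay within rate limits is to retry a throttled call with exponentially increasing delays. The sketch below illustrates that general technique only. `RateLimitError` is a hypothetical stand-in for whatever error a given service reports when a quota is hit, and `call_with_backoff` is not part of the platform.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error raised by a tool call when a rate limit or quota is hit."""


def call_with_backoff(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run call(); on RateLimitError wait base_delay * 2**attempt and retry.

    Re-raises the last RateLimitError once max_attempts calls have failed.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
```

The `sleep` parameter is injectable so the delay schedule can be checked in tests without actually waiting.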