smolagents

Summary This post details building a robust web data pipeline using SmolAgents. We’ll create tools to retrieve content from various web endpoints, convert it to a consistent format (Markdown), store it efficiently, and then evaluate its relevance and quality using Large Language Models (LLMs). This pipeline is crucial for building a knowledge base for LLM applications. Web Data Convertor (MarkdownConverter) We leverage the MarkdownConverter class, inspired by the one in autogen, to handle the diverse formats encountered on the web.

Summary This post kicks off a series of three where we’ll build, extend, and use the open-source DeepResearch application inspired by the Hugging Face blog post. In this first part, we’ll focus on creating an arXiv search tool that can be used with SmolAgents. DeepResearch aims to empower research by providing tools that automate and streamline the process of discovering and managing academic papers. This series will demonstrate how to build such tools, starting with a powerful arXiv search tool.

smolagents

DeepResearch Part 3: Getting the best web data for your research

DeepResearch Part 1: Building an arXiv Search Tool with SmolAgents

DeepResearch Part 1: Building an arXiv Search Tool with SmolAgents