Summary Large Language Models (LLMs) are powerful, but their size can lead to slow inference speeds and high memory consumption, hindering real-world deployment. Quantization, a technique that reduces the precision of model weights, offers a powerful solution. This post will explore how to use quantization techniques like bitsandbytes, AutoGPTQ, and AutoRound to dramatically improve LLM inference performance.
What is Quantization? Quantization reduces the computational and storage demands of a model by representing its weights with lower-precision data types.
Summary This post kicks off a series of three where we’ll build, extend, and use the open-source DeepResearch application inspired by the Hugging Face blog post. In this first part, we’ll focus on creating an arXiv search tool that can be used with SmolAgents.
DeepResearch aims to empower research by providing tools that automate and streamline the process of discovering and managing academic papers. This series will demonstrate how to build such tools, starting with a powerful arXiv search tool.
Summary This post provides a practical guide to building common neural network architectures using PyTorch. We’ll explore feedforward networks, convolutional neural networks (CNNs), recurrent neural networks (RNNs), LSTMs, transformers, autoencoders, and GANs, along with code examples and explanations.
1. Understanding PyTorch’s Neural Network Module PyTorch provides the torch.nn module to build neural networks. It provides classes for defining layers, activation functions, and loss functions, making it easy to create and manage complex network architectures in a structured way.