SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size

Abstract: Recent research on deep neural networks has focused primarily on improving accuracy. For a given accuracy level, it is typically possible to identify multiple DNN architectures that achieve that accuracy level. With equivalent accuracy, smaller DNN architectures offer at least three advantages: (1) Smaller DNNs require less communication across servers during distributed training. (2) Smaller DNNs require less bandwidth to export a new model from the cloud to an autonomous car. (3) Smaller DNNs are more feasible to deploy on FPGAs and other hardware with limited memory. To provide all of these advantages, we propose a small DNN architecture called SqueezeNet. SqueezeNet achieves AlexNet-level accuracy on ImageNet with 50x fewer parameters. Additionally, with model compression techniques we are able to compress SqueezeNet to less than 0.5MB (510x smaller than AlexNet). The SqueezeNet architecture is available for download here: https://github.com/DeepScale/SqueezeNet

Synopsis

Overview

  • Keywords: SqueezeNet, CNN architecture, model compression, deep learning, ImageNet
  • Objective: Develop a compact CNN architecture that achieves AlexNet-level accuracy with significantly fewer parameters.
  • Hypothesis: A smaller CNN architecture can maintain competitive accuracy while offering advantages in deployment and training efficiency.
  • Innovation: Introduction of the Fire module and architectural strategies that reduce model size while preserving accuracy.

Background

  • Preliminary Theories:

    • Convolutional Neural Networks (CNNs): A class of deep learning models particularly effective for image classification tasks, relying on convolutional layers to extract features.
    • Model Compression: Techniques aimed at reducing the size of neural network models while maintaining their performance, crucial for deployment in resource-constrained environments.
    • Fire Module: The building block introduced in this paper (not prior work), combining squeeze and expand layers to optimize parameter efficiency.
    • Residual Connections: Techniques used in architectures like ResNet that allow gradients to flow through the network more effectively, improving training and accuracy.
  • Prior Research:

    • AlexNet (2012): Established a benchmark for image classification accuracy on ImageNet, with 60 million parameters and a model size of 240MB.
    • VGG (2014): Introduced deeper architectures with small filters, achieving better accuracy but with increased model size.
    • GoogLeNet (2014): Proposed Inception modules that use varying filter sizes, improving efficiency and accuracy.
    • Deep Compression (2015): Combined pruning, quantization, and Huffman coding to significantly reduce model sizes while maintaining accuracy (a minimal sketch of the first two stages follows this list).
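
The three stages of Deep Compression are only named above; below is a minimal NumPy sketch of the first two (magnitude pruning and k-means weight sharing). The sparsity level, cluster count, and function names are illustrative choices, not the authors' reference implementation, and the Huffman-coding stage is omitted because it changes only the storage encoding.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights so ~`sparsity` of them become zero."""
    threshold = np.quantile(np.abs(weights), sparsity)
    pruned = weights.copy()
    pruned[np.abs(pruned) < threshold] = 0.0
    return pruned

def kmeans_quantize(weights: np.ndarray, n_clusters: int = 16, n_iters: int = 20):
    """Cluster the surviving weights into n_clusters shared values (weight sharing).
    16 clusters corresponds to 4-bit codes, mirroring the paper's quantization step."""
    nonzero = weights[weights != 0]
    # Linear initialization of centroids across the weight range.
    centroids = np.linspace(nonzero.min(), nonzero.max(), n_clusters)
    for _ in range(n_iters):
        # Assign each nonzero weight to its nearest centroid, then update centroids.
        assign = np.argmin(np.abs(nonzero[:, None] - centroids[None, :]), axis=1)
        for k in range(n_clusters):
            members = nonzero[assign == k]
            if members.size:
                centroids[k] = members.mean()
    quantized = weights.copy()
    mask = weights != 0
    idx = np.argmin(np.abs(weights[mask][:, None] - centroids[None, :]), axis=1)
    quantized[mask] = centroids[idx]
    return quantized, centroids

# Toy usage: prune 90% of a random layer, then share the survivors across 16 values.
w = np.random.randn(256, 256).astype(np.float32)
w_quantized, codebook = kmeans_quantize(magnitude_prune(w, sparsity=0.9))
```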

Methodology

  • Key Ideas:

    • Fire Module Design: Each Fire module consists of a squeeze layer (1x1 convolutions) feeding an expand layer (a mix of 1x1 and 3x3 convolutions whose outputs are concatenated along the channel dimension), reducing parameter count while preserving expressive power (see the sketch after this section).
    • Squeeze Ratio (SR): A hyperparameter setting the number of squeeze-layer filters relative to expand-layer filters; SqueezeNet uses SR = 0.125.
    • Bypass Connections: Skip connections in the spirit of ResNet, added around selected Fire modules to ease the representational bottleneck created by the squeeze layers, improving accuracy without increasing model size (included in the sketch below).
  • Experiments:

    • Evaluated SqueezeNet against AlexNet and against prior model-compression approaches applied to AlexNet, using the ImageNet dataset.
    • Conducted ablation studies on the impact of different architectural configurations, including variations in bypass connections and filter sizes.
    • Metrics included top-1 and top-5 accuracy, model size, and parameter count.
  • Implications: The design of SqueezeNet demonstrates that significant reductions in model size can be achieved without sacrificing accuracy, making it suitable for deployment in environments with limited computational resources.
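
A minimal PyTorch sketch of the Fire module and the simple-bypass variant described above. The class names and the standalone `FireWithBypass` wrapper are illustrative, not the authors' reference code; the filter counts in the usage example match the paper's fire2 module, giving SR = 16 / (64 + 64) = 0.125.

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """Fire module: a 1x1 squeeze layer feeding parallel 1x1 and 3x3 expand
    layers whose outputs are concatenated along the channel dimension."""
    def __init__(self, in_ch: int, squeeze_ch: int, expand1x1_ch: int, expand3x3_ch: int):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand1x1_ch, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand3x3_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = self.relu(self.squeeze(x))
        return torch.cat([self.relu(self.expand1x1(s)),
                          self.relu(self.expand3x3(s))], dim=1)

class FireWithBypass(nn.Module):
    """Simple bypass: add the module input to its output. Only valid when input
    and output channel counts match (Fire modules 3, 5, 7, and 9 in the paper)."""
    def __init__(self, fire: Fire):
        super().__init__()
        self.fire = fire

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fire(x) + x

# fire2-sized module: 96 input channels, 16 squeeze filters, 64 + 64 expand filters.
fire2 = Fire(in_ch=96, squeeze_ch=16, expand1x1_ch=64, expand3x3_ch=64)
out = fire2(torch.randn(1, 96, 55, 55))  # -> shape (1, 128, 55, 55)
```

Ignoring biases, this fire2 holds 96·16 + 16·64 + 16·3·3·64 = 11,776 weights, versus 96·3·3·128 = 110,592 for a plain 3x3 convolution with the same input and output channels, roughly a 9x reduction from the squeeze step alone.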

Findings

  • Outcomes:

    • SqueezeNet achieved 57.5% top-1 and 80.3% top-5 accuracy on ImageNet with a 4.8MB model, matching AlexNet with 50x fewer parameters.
    • Adding simple bypass connections raised accuracy to 60.4% top-1 and 82.5% top-5 with no increase in model size.
    • Applying Deep Compression shrank the model further to 0.47MB (510x smaller than AlexNet) while maintaining accuracy; see the arithmetic check after this section.
  • Significance: SqueezeNet surpasses previous model compression efforts, demonstrating that smaller models can be both efficient and effective, challenging the belief that larger models are necessary for high accuracy.

  • Future Work: Exploration of additional architectural innovations, further optimization of the Fire module, and application of SqueezeNet in various domains such as mobile and embedded systems.

  • Potential Impact: Advancements in compact CNN architectures could lead to broader adoption in real-time applications, such as autonomous vehicles and mobile devices, where model size and computational efficiency are critical.
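
The 50x and 510x figures quoted above follow directly from the reported model sizes (for the uncompressed models, size is proportional to parameter count, since both store 32-bit weights); a quick check:

```python
# Headline ratios from the model sizes reported above.
alexnet_mb, squeezenet_mb, compressed_mb = 240.0, 4.8, 0.47

print(alexnet_mb / squeezenet_mb)  # 50.0   -> "50x fewer parameters"
print(alexnet_mb / compressed_mb)  # ~510.6 -> "510x smaller than AlexNet"
```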

Notes

Meta

Published: 2016-02-24

Updated: 2025-08-27

URL: https://arxiv.org/abs/1602.07360v4

Authors: Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, Kurt Keutzer

Citations: 6717

H Index: 265

Categories: cs.CV, cs.AI

Model: gpt-4o-mini