How transferable are features in deep neural networks?

Abstract: Many deep neural networks trained on natural images exhibit a curious phenomenon in common: on the first layer they learn features similar to Gabor filters and color blobs. Such first-layer features appear not to be specific to a particular dataset or task, but general in that they are applicable to many datasets and tasks. Features must eventually transition from general to specific by the last layer of the network, but this transition has not been studied extensively. In this paper we experimentally quantify the generality versus specificity of neurons in each layer of a deep convolutional neural network and report a few surprising results. Transferability is negatively affected by two distinct issues: (1) the specialization of higher layer neurons to their original task at the expense of performance on the target task, which was expected, and (2) optimization difficulties related to splitting networks between co-adapted neurons, which was not expected. In an example network trained on ImageNet, we demonstrate that either of these two issues may dominate, depending on whether features are transferred from the bottom, middle, or top of the network. We also document that the transferability of features decreases as the distance between the base task and target task increases, but that transferring features even from distant tasks can be better than using random features. A final surprising result is that initializing a network with transferred features from almost any number of layers can produce a boost to generalization that lingers even after fine-tuning to the target dataset.

Synopsis

Overview

  • Keywords: Transfer Learning, Deep Neural Networks, Feature Transferability, Convolutional Networks, Generalization
  • Objective: Quantify the generality versus specificity of features learned in deep neural networks across different layers.
  • Hypothesis: The transferability of features in deep neural networks varies by layer, with lower layers being more general and higher layers more specific.

Background

  • Preliminary Theories:

    • Transfer Learning: The process of using a pre-trained model on a new task, leveraging learned features to improve performance on smaller datasets.
    • Feature Generality vs. Specificity: Early layers in neural networks often learn general features (e.g., edges, textures), while later layers learn task-specific features.
    • Co-adaptation: The phenomenon where neurons in neighboring layers become interdependent during training, so that splitting the network between them and retraining only one side makes the joint features difficult to relearn, hurting transferability.
  • Prior Research:

    • 2012: Krizhevsky et al. demonstrated the effectiveness of deep convolutional networks on the ImageNet dataset, sparking interest in feature transfer.
    • 2013: Donahue et al. and Zeiler & Fergus explored feature transfer from higher layers, suggesting these layers contain more generalizable features.
    • 2014: Sermanet et al. and others confirmed the utility of transferred features for various tasks, emphasizing the importance of understanding feature transferability.

Methodology

  • Key Ideas:

    • Layer-wise Transferability Assessment: The study involves systematically evaluating how well features from each layer of a convolutional network trained on ImageNet transfer to other tasks.
    • Network Splitting: Networks are split at various layers to analyze the performance of transferred features versus randomly initialized weights.
    • Fine-tuning vs. Freezing: Transferred layers are either fine-tuned along with the newly initialized layers or kept frozen while only the remaining layers are trained on the target task (see the code sketch after this list).
  • Experiments:

    • Random A/B Splits: The ImageNet classes are randomly divided into two disjoint halves (tasks A and B) to evaluate transferability between similar tasks.
    • Dissimilar Datasets: A man-made versus natural class split to assess how task similarity affects feature transferability.
    • Random Weights Comparison: Evaluating the performance of networks initialized with random weights against those using transferred features.
  • Implications: The design allows for a nuanced understanding of how different layers contribute to feature transferability, informing best practices in transfer learning.
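
To make the splitting and freeze/fine-tune procedure concrete, here is a minimal, illustrative PyTorch-style sketch rather than the paper's actual code: a small stand-in convolutional network is built as a stack of blocks, the first n blocks are copied from a trained base network, and the copies are either frozen (as in the paper's AnB and BnB networks) or left trainable (as in the fine-tuned AnB+ and BnB+ variants). The helper names (make_convnet, build_transfer_net) and the tiny architecture are hypothetical.

```python
# Minimal, illustrative sketch (assumed PyTorch API; not the paper's actual code).
# A small stand-in convnet is built as a stack of blocks so it can be split at any
# depth n. The first n blocks are copied from a trained base network and are either
# frozen (as in the paper's AnB / BnB networks) or left trainable (as in the
# fine-tuned AnB+ / BnB+ variants). The remaining blocks are randomly
# re-initialized and trained on the target task.
import copy
import torch
import torch.nn as nn

def make_convnet(num_classes: int) -> nn.Sequential:
    """Hypothetical stand-in architecture: a stack of blocks, splittable by index."""
    return nn.Sequential(
        nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
        nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
        nn.Sequential(nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
        nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(256, num_classes)),
    )

def build_transfer_net(base: nn.Sequential, n: int, num_classes: int,
                       fine_tune: bool) -> nn.Sequential:
    """Copy the first n blocks of a trained base network into a fresh network.

    fine_tune=False freezes the copied blocks (only the upper blocks are trained);
    fine_tune=True lets the copied blocks keep learning on the target task.
    """
    net = make_convnet(num_classes)          # upper blocks start from random init
    for i in range(n):
        net[i] = copy.deepcopy(base[i])      # transplant learned lower-layer features
        for p in net[i].parameters():
            p.requires_grad = fine_tune
    return net

# Example: base network trained on task A (e.g. one random half of the ImageNet
# classes); transfer its first 3 blocks to task B, frozen ("A3B") and fine-tuned ("A3B+").
base_A = make_convnet(num_classes=500)       # assume this has already been trained on A
A3B = build_transfer_net(base_A, n=3, num_classes=500, fine_tune=False)
A3B_plus = build_transfer_net(base_A, n=3, num_classes=500, fine_tune=True)

# Only parameters with requires_grad=True are optimized on the target task.
optimizer = torch.optim.SGD([p for p in A3B.parameters() if p.requires_grad],
                            lr=0.01, momentum=0.9)
```

In the paper, this procedure is repeated for every split depth n and for both same-task (BnB) and cross-task (AnB) copies; the resulting target-task accuracy as a function of n is what quantifies layer-wise generality versus specificity.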

Findings

  • Outcomes:

    • Lower layers transfer well, while transferred higher layers suffer performance drops: in the middle of the network the drop is driven mainly by fragile co-adaptation between the split layers, and near the top it is driven mainly by feature specificity.
    • The transferability gap increases with task dissimilarity, but even distant task features outperform random weights.
    • Initializing a network with transferred features from almost any number of layers improves generalization, and the improvement persists even after fine-tuning on the target dataset.
  • Significance: The research challenges previous assumptions about the strict generality of lower layers and specificity of higher layers, revealing a more complex interplay influenced by co-adaptation and task similarity.

  • Future Work: Investigating the mechanisms behind co-adaptation and exploring methods to mitigate its effects could enhance feature transferability.

  • Potential Impact: Improved understanding of feature transferability could lead to more effective transfer learning strategies, particularly in domains with limited data availability.
