Lightweight Pixel Difference Networks for Efficient Visual Representation Learning
Su, Zhuo; Zhang, Jiehua; Wang, Longguang; Zhang, Hua; Liu, Zhen; Pietikäinen, Matti; Liu, Li (2023-08-01)
Su, Zhuo
Zhang, Jiehua
Wang, Longguang
Zhang, Hua
Liu, Zhen
Pietikäinen, Matti
Liu, Li
IEEE
01.08.2023
Z. Su et al., "Lightweight Pixel Difference Networks for Efficient Visual Representation Learning," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 12, pp. 14956-14974, Dec. 2023, doi: 10.1109/TPAMI.2023.3300513
https://rightsstatements.org/vocab/InC/1.0/
© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
https://rightsstatements.org/vocab/InC/1.0/
© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
https://rightsstatements.org/vocab/InC/1.0/
Julkaisun pysyvä osoite on
https://urn.fi/URN:NBN:fi:oulu-202311213264
https://urn.fi/URN:NBN:fi:oulu-202311213264
Tiivistelmä
Abstract
Recently, there have been tremendous efforts in developing lightweight Deep Neural Networks (DNNs) with satisfactory accuracy, which can enable the ubiquitous deployment of DNNs in edge devices. The core challenge of developing compact and efficient DNNs lies in how to balance the competing goals of achieving high accuracy and high efficiency. In this paper we propose two novel types of convolutions, dubbed Pixel Difference Convolution (PDC) and Binary PDC (Bi-PDC) which enjoy the following benefits: capturing higher-order local differential information, computationally efficient, and able to be integrated with existing DNNs. With PDC and Bi-PDC, we further present two lightweight deep networks named Pixel Difference Networks (PiDiNet) and Binary PiDiNet (Bi-PiDiNet) respectively to learn highly efficient yet more accurate representations for visual tasks including edge detection and object recognition. Extensive experiments on popular datasets (BSDS500, ImageNet, LFW, YTF, etc.) show that PiDiNet and Bi-PiDiNet achieve the best accuracy-efficiency trade-off. For edge detection, PiDiNet is the first network that can be trained without ImageNet, and can achieve the human-level performance on BSDS500 at 100 FPS and with < 1 M parameters. For object recognition, among existing Binary DNNs, Bi-PiDiNet achieves the best accuracy and a nearly 2× reduction of computational cost on ResNet18.
Recently, there have been tremendous efforts in developing lightweight Deep Neural Networks (DNNs) with satisfactory accuracy, which can enable the ubiquitous deployment of DNNs in edge devices. The core challenge of developing compact and efficient DNNs lies in how to balance the competing goals of achieving high accuracy and high efficiency. In this paper we propose two novel types of convolutions, dubbed Pixel Difference Convolution (PDC) and Binary PDC (Bi-PDC) which enjoy the following benefits: capturing higher-order local differential information, computationally efficient, and able to be integrated with existing DNNs. With PDC and Bi-PDC, we further present two lightweight deep networks named Pixel Difference Networks (PiDiNet) and Binary PiDiNet (Bi-PiDiNet) respectively to learn highly efficient yet more accurate representations for visual tasks including edge detection and object recognition. Extensive experiments on popular datasets (BSDS500, ImageNet, LFW, YTF, etc.) show that PiDiNet and Bi-PiDiNet achieve the best accuracy-efficiency trade-off. For edge detection, PiDiNet is the first network that can be trained without ImageNet, and can achieve the human-level performance on BSDS500 at 100 FPS and with < 1 M parameters. For object recognition, among existing Binary DNNs, Bi-PiDiNet achieves the best accuracy and a nearly 2× reduction of computational cost on ResNet18.
Kokoelmat
- Avoin saatavuus [34546]