Abstract—Recent research has focused on Deep Neural Networks (DNNs) implemented directly in hardware. However, larger DNNs require significant energy and area, limiting their wide adoption. We propose a novel DNN quantization technique and a corresponding hardware solution, CompactNet, that further optimizes the use of hardware resources through dynamic allocation of memory for each parameter. Experimental results for the MNIST and CIFAR-10 datasets show that CompactNet reduces the memory requirement by over 80%, the energy requirement by 12-fold, and the area requirement by 7-fold when compared to the conventional DNN, with minimal degradation in classification accuracy. We demonstrate that CompactNet provides Pareto-optimal designs for trading off accuracy against resource requirements. CompactNet can be extended to datasets such as ImageNet and to models such as MobileNet.

Index Terms—approximate computing, deep neural networks, low power design, ASIC

I. INTRODUCTION

The tremendous increase in computational power has enabled Deep Neural Networks (DNNs) to set new standards of performance for numerous tasks, including object recognition, speech processing, and automatic image captioning [1]. New technologies incorporated in virtual assistants [2], driverless cars [3], and indoor navigation systems [4][5] also use DNNs for prediction and classification. Achieving high classification accuracy requires a large DNN, with correspondingly large energy and memory requirements [6]. Hence, most practical DNNs are trained on large server clusters or GPUs [7]. Now that machine learning algorithms are being deployed on mobile and embedded devices, frequent access to off-chip DRAM consumes an inordinate amount of energy, which is not feasible for a device designed for low power consumption and long battery life [8].

Even the fastest CPUs and GPUs are insufficient to meet the latency constraints of DNNs that must respond in real time [3][9]. However, when DNNs are implemented on application-specific integrated circuits (ASICs), they can take full advantage of the hardware customizability and the minimal need for control logic to eliminate redundancies, enabling faster run times than software. Google’s Tensor Processing Unit (TPU) proposed in [9] is an example of dedicated hardware built for DNN implementation that can replace CPUs and GPUs.

Overcoming the need for large floating-point/fixed-point Multiply-Accumulate (MAC) units, and the associated memory requirement for storing floating-point/fixed-point values, is critical for hardware solutions to become more commonplace [10]. CompactNet introduces a new quantization technique and an updated training algorithm that limit the memory requirement for each parameter (the weights and biases of the DNN). Each parameter is represented with only 4 or 8 bits, depending on its relevance to the output of the DNN. The inputs, outputs, and activations are limited to an 8-bit fixed-point format to reduce the complexity of the arithmetic units in hardware. Together, these techniques significantly reduce the hardware costs.

The rest of the paper is organized as follows. Section II briefly summarizes related prior work. In Section III, we contrast the existing standard of lightweight DNN implementations with CompactNet and explain its characteristics and training process. Section IV describes the experiments conducted and analyzes the results in detail. Section V summarizes and concludes the paper.

II. RELATED WORK

The first part of this section reviews techniques for reducing data communication and computation. The second part gives an overview of existing lightweight DNN implementations.

To address the challenges associated with custom hardware implementation of a DNN, prior research has primarily concentrated on individually reducing data communication or computational complexity. The authors of [11] use hashing to exploit the redundancies in DNNs and limit frequent access to memory. ApproxANN [12] characterizes the impact of neurons based on their contribution to the output and uses approximation in certain less-critical neurons. The work in [13] shows that reducing the word size has a negligible impact on the accuracy of the system, but reduces memory access drastically. The work in [14] optimizes the number of memory accesses by including cost penalties for the utilization of hardware resources. Approximate arithmetic circuits that reduce energy consumption for DNNs able to tolerate a certain degree of inaccuracy are proposed in [15][16]. These works mainly address memory, computation, energy, or area requirements individually, but do not address making DNNs easier to implement holistically.

CompactNet: High Accuracy Deep Neural Network Optimized for On-Chip Implementation

Abhinav Goel
Electrical and Computer Engineering
Purdue University
West Lafayette, IN, USA
goel39@purdue.edu

Zeye Liu
Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA, USA
zeyel@andrew.cmu.edu

Ronald D. Blanton
Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA, USA
rblanton@andrew.cmu.edu

© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Compared to a conventional DNN, a lightweight DNN requires less storage and simpler computation logic, and offers faster inference and lower energy consumption. One popular approach to obtaining a lightweight DNN is to quantize the floating-point parameters and activations into discrete levels. Binarized neural networks (BNNs) [17]-[19] constrain the parameters to 1 or -1, so each parameter occupies only a single bit. Similarly, XNOR-Net [20] constrains parameters, inputs, and activations to 1 or -1, meaning that only XNOR operators are required in each neuron. In Trained Ternary Quantization (TTQ) [21], the parameters are constrained to zero, a positive level, or a negative level. However, the large number of neurons and layers required by BNNs, XNOR-Net, and TTQ to achieve the desired accuracy makes them unsuitable for most practical applications. Deep Compression [22] constrains the parameters to several quantized levels with linear or density-based quantization to save storage. DoReFa-Net [23] uses fixed-point parameters, activations, and gradients with fewer than 32 bits. LightNN [24] constrains the parameters to be either a power of two or a sum of powers of two, thus reducing the storage and replacing each multiplication with either a single shift or a combination of shifts and adds. Overall, these quantized DNNs constrain the parameters from $\mathbb{R}^d$ to $\mathbb{A}^b$, where $d$ and $b$ are the numbers of parameters with $b < d$, $\mathbb{R}$ is the set of real numbers, and $\mathbb{A}$ is the constrained space ($\mathbb{A} \subset \mathbb{R}$). Methods presented in [25] provide a platform to quantize a trained DNN through a retraining algorithm, whereas CompactNet introduces a new algorithm that trains a hardware-friendly network from scratch.

The quantization method used in CompactNet furthers the gains of prior work by largely maintaining the classification accuracy through a new approach to the training process. The novel dynamic memory quantization algorithm used in CompactNet reduces off-chip memory accesses and computational complexity, and can thereby improve the latency, energy, and area requirements. This algorithm dynamically selects a custom bit-sequence for each parameter, based on its relevance to the output of the DNN.

Our contributions through CompactNet include:

1) Most importantly, dynamic quantization for each parameter depending on its relevance to the output.

2) The use of custom 4-bit and 8-bit sequences to represent the parameters. This provides an advantage in terms of circuit complexity and memory requirement. It differs from prior methods, which require a bit-sequence to be selected before the network is trained [24].

3) A DNN hardware solution that furthers the gains of prior work [13][24] by using a customized MAC unit that replaces all the multipliers with shift-and-add circuit elements. To further improve performance, an 8-bit fixed-point format is used to represent the activations. The CompactNet model would blend in seamlessly with high-speed, efficient architectures like MobileNet [26] and SqueezeNet [27].

III. NEURAL NETWORK IMPLEMENTATIONS

A. CompactNet Architecture

Each neuron of a DNN is a non-linear function of a weighted sum of inputs. Conventional DNNs predominantly work with single-precision floating-point parameters and activations for high precision and accuracy. However, floating-point arithmetic requires significant computation even for basic operations, and the storage requirement is commensurately large. By using a fixed-point representation, the storage requirement for a DNN can be significantly reduced. The self-correcting nature of DNNs offsets the approximation errors and yields classification accuracy similar to that of the floating-point representation [13]. An extension of fixed-point described in [13] is a form of dynamic fixed-point. Dynamic fixed-point uses small word sizes to represent the insignificant values in a DNN, leading to large savings in the total storage. However, the complexity of the basic operators is considerably higher because an extra decoder circuit is needed to handle operands of several different sizes.

CompactNet uses an 8-bit fixed-point format for representing the inputs and the outputs of the DNN, and the activations of every neuron. This avoids the computational complexity associated with floating-point arithmetic. At the end of every forward pass, each parameter is quantized and approximated to the nearest power of two, or to the nearest sum of powers of two, dynamically (depending on the relevance of the parameter to the output of the DNN). By doing so, it is possible to replace all the bulky multiplier units with shift-and-add circuit elements, thus improving area utilization.

The appropriate word size for each parameter is assigned depending on its relevance to the output of the DNN. This dynamic selection of 4-bit or 8-bit word sizes for the quantized parameters, coupled with a minimal 8-bit fixed-point format for the inputs, outputs, and activations, is a novel approach to reducing hardware costs. CompactNet uses a technique called k-ones approximation [24], through which each parameter is represented as

$$\text{parameter} = \text{sign}(\text{parameter}) \cdot (2^{n_1} + 2^{n_2} + \cdots + 2^{n_k})$$

Equation (1) shows how each neuron combines the output of layer $L - 1$ (the input of layer $L$) with a parameter, where $\text{value} \ll n$ denotes an $n$-bit left shift of the value. A right shift is used for negative values of $n$. Prior research shows that $k = 2$ is sufficient for high classification accuracy [24].

$$\begin{aligned}
\text{out}_L &= \text{parameter} \cdot \text{input}_L \\
&= \text{sign}(\text{parameter}) \cdot (2^{n_1} + 2^{n_2} + \cdots + 2^{n_k}) \cdot \text{input}_L \\
&= \text{sign}(\text{parameter}) \cdot \big((\text{input}_L \ll n_1) + (\text{input}_L \ll n_2) + \cdots + (\text{input}_L \ll n_k)\big) \quad (1)
\end{aligned}$$
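As an illustration of equation (1), the following minimal Python sketch replaces a multiplication with shifts and adds. The function name `shift_add_multiply` and its argument layout are assumptions made for this example, and the activation is treated as a plain integer standing in for an 8-bit fixed-point value; it is not the hardware MAC unit itself.

```python
from typing import Sequence


def shift_add_multiply(activation: int, sign: int, shifts: Sequence[int]) -> int:
    """Compute sign * sum(activation * 2**n for n in shifts) using only shifts and adds.

    `activation` is an integer standing in for a fixed-point value (assumption).
    A non-negative n is a left shift; a negative n is a right shift by |n|.
    """
    total = 0
    for n in shifts:
        total += activation << n if n >= 0 else activation >> -n
    return -total if sign < 0 else total


# A parameter of +10 = 2^3 + 2^1 applied to an activation of 5.
product = shift_add_multiply(5, +1, [3, 1])
```

Right shifts discard the low-order bits, so fractional parameters introduce a truncation error that a true multiplier would not.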

With CompactNet, we show that $k = 1$ can be used alongside $k = 2$ within the same network, together with the 8-bit fixed-point operations. This allows us to achieve a higher classification accuracy while improving the DNN’s performance on hardware. This is an improvement on prior work, where only one of $k = 1$ or $k = 2$ is used and the selection of $k$ has to be made before the network is trained. Equations (2) and (3) give the values of the parameters after $k$-ones approximation for $k = 1$ and $k = 2$, respectively.

\[
\text{parameter}_{k=1} = 2^n \quad (2)
\]

\[
\text{parameter}_{k=2} = 2^{n_1} + 2^{n_2} \quad (3)
\]

B. Proposed Quantization

All the parameters in CompactNet are quantized to one of two levels to reduce the memory requirement. The relevance of a parameter to the output is quantified by a term we call the Impact of Approximation (IoA). The IoA, described in equation (4), measures the normalized difference between the highest degree of approximation (using $k = 1$) and the actual parameter.

\[
\text{IoA} = \left| \frac{\text{parameter}_{k=1} - \text{parameter}}{\text{parameter}} \right| \quad (4)
\]

As seen in equation (5), if the IoA for a parameter lies within a predefined threshold, $\phi$, then $k = 1$ is selected as a sufficient representation of the parameter. Otherwise, $k = 2$ is selected to give a closer approximation. This sets an upper bound on the error in the approximated parameter when $k = 1$ is used. The threshold value is selected empirically, depending on the size of the DNN and the dataset used. A larger threshold value leads to a greater fraction of parameters being represented with $k = 1$, which lowers the memory requirement at the expense of classification accuracy.

\[
k = \begin{cases} 
1, & \text{IoA} \leq \phi \\
2, & \text{otherwise}
\end{cases} \quad (5)
\]
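Equations (4) and (5) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: "nearest power of two" is taken to mean nearest in linear distance among magnitudes reachable with at most three shifts, a zero parameter is assigned $k = 2$ because its IoA is undefined, and the threshold values used are hypothetical. The function names are illustrative, not taken from the CompactNet implementation.

```python
import math

MAX_SHIFT = 3  # at most three shifts in either direction


def nearest_power_of_two(w: float) -> float:
    """k = 1 approximation: signed power of two closest to w (linear distance, assumed)."""
    magnitudes = [2.0 ** n for n in range(-MAX_SHIFT, MAX_SHIFT + 1)]
    best = min(magnitudes, key=lambda m: abs(m - abs(w)))
    return math.copysign(best, w)


def impact_of_approximation(w: float) -> float:
    """Equation (4): normalized error of the k = 1 approximation."""
    if w == 0.0:
        return math.inf  # assumption: treat an undefined IoA as exceeding any threshold
    return abs((nearest_power_of_two(w) - w) / w)


def select_k(w: float, phi: float) -> int:
    """Equation (5): use k = 1 when the IoA is within the threshold phi, else k = 2."""
    return 1 if impact_of_approximation(w) <= phi else 2


# With a hypothetical threshold of 0.1, 8 is exact (k = 1) but 11 needs k = 2.
choices = [select_k(8.0, 0.1), select_k(11.0, 0.1)]
```

Raising `phi` moves more parameters to $k = 1$, which mirrors the memory-versus-accuracy trade-off described above.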

Each quantized parameter is stored using just four bits for $k = 1$ and eight bits for $k = 2$. This constrains the number of shifts ($n$ in equation (2), $n_1$ and $n_2$ in equation (3)) to a maximum of three in either direction. The reasons for using IoA are twofold:

1) Parameters with large values contribute more significantly to the output of the DNN. IoA ensures that large values can be represented with more precision, and need not be scaled down or up to the nearest power of two.

2) From prior work [22], we can infer that a significant percentage of the parameters lie within the range $[-0.01, 0.01]$. These values are significantly misrepresented when they are approximated with $k = 1$, which is undesirable.

Using only 4-bit and 8-bit representations removes the need for a complex decoder circuit, which is generally associated with architectures that have large variations in operand sizes [13]. The 4-bit encoding format consists of a sign bit, which gives the polarity of the parameter; a shift bit, which indicates an integer parameter (left shift) or a fractional parameter (right shift); and two remaining bits that give the number of shifts to apply to the activation so as to obtain a result similar to the multiplication $\text{activation}_{L-1} \times \text{parameter}_L$. The 8-bit encoding format is the arithmetic sum of two 4-bit encodings, corresponding to $k = 2$. The lower nibble of the 8-bit format encodes the difference between the true value and the approximation made in the upper nibble. This encoding optimizes memory utilization, as every parameter stored in memory is reduced from 32-bit floating point to just 4 or 8 bits.
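The encoding format can be illustrated with a short Python sketch. The bit order used here (sign bit, then shift-direction bit, then a two-bit shift count, most significant bit first) is an assumption inferred from the examples 0011 for 8 and 0101 for 0.5 in Table 1; the function names are hypothetical.

```python
def encode_nibble(negative: bool, shift: int) -> int:
    """Pack a signed power of two into 4 bits: [sign][direction][count (2 bits)].

    `shift` is in [-3, 3]; non-negative means left shift (direction bit 0),
    negative means right shift (direction bit 1). Bit order is an assumption.
    """
    if not -3 <= shift <= 3:
        raise ValueError("shift must be within three positions in either direction")
    direction = 1 if shift < 0 else 0
    return (int(negative) << 3) | (direction << 2) | abs(shift)


def decode_nibble(code: int) -> float:
    """Recover the signed power of two stored in a 4-bit code."""
    sign = -1.0 if code & 0b1000 else 1.0
    count = code & 0b0011
    exponent = -count if code & 0b0100 else count
    return sign * 2.0 ** exponent


def decode_word(word: int) -> float:
    """Decode an 8-bit k = 2 word as the sum of its upper and lower nibbles."""
    return decode_nibble(word >> 4) + decode_nibble(word & 0x0F)


# 0011 0001 stores 8 in the upper nibble and 2 in the lower nibble.
value = decode_word(0b0011_0001)
```

Because every parameter decodes to either one or two shift amounts, the same small decoder serves both word sizes.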

Table 1 contains examples of the encoding scheme using $k$-ones approximation. For illustration, each example is encoded using both $k = 1$ and $k = 2$. When $k = 1$ is used, the examples are encoded directly with the aforementioned encoding format. As the range of values for $k = 1$ is $[-8, 8]$ (three shifts in either direction), values like 11 cannot be adequately represented with $k = 1$, thus requiring $k = 2$. In the case of $k = 2$, the closest possible approximation is made in the upper nibble of the encoding format. The difference between the true value and this initial approximation is then encoded in the lower nibble. The range of values for $k = 2$ is the arithmetic sum of $[-8,8]$ for the upper nibble and $[-8,8]$ for the lower nibble.

<table>
<thead>
<tr>
<th rowspan="2">Example $E$</th>
<th colspan="2">$k = 1$</th>
<th colspan="5">$k = 2$</th>
</tr>
<tr>
<th>Encoding of $E$</th>
<th>Value stored</th>
<th>Upper nibble $U$</th>
<th>Difference $D = E - U$</th>
<th>Encoding with $D$ in lower nibble $L$</th>
<th>Lower nibble $L$</th>
<th>Value stored $U + L$</th>
</tr>
</thead>
<tbody>
<tr>
<td>11</td>
<td>0011</td>
<td>8</td>
<td>8</td>
<td>3</td>
<td>0011 0001</td>
<td>2</td>
<td>10</td>
</tr>
<tr>
<td>-7</td>
<td>1010</td>
<td>-8</td>
<td>-8</td>
<td>1</td>
<td>1010 0000</td>
<td>1</td>
<td>-7</td>
</tr>
<tr>
<td>0.74</td>
<td>0101</td>
<td>0.5</td>
<td>0.5</td>
<td>0.24</td>
<td>0101 0110</td>
<td>0.25</td>
<td>0.75</td>
</tr>
<tr>
<td>0</td>
<td>0000</td>
<td>0</td>
<td>1</td>
<td>-1</td>
<td>0000 1000</td>
<td>-1</td>
<td>0</td>
</tr>
</tbody>
</table>

Table 1: Examples of parameter encoding with both $k = 1$ and $k = 2$.
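The $k = 2$ procedure behind Table 1 (approximate in the upper nibble, then encode the residual in the lower nibble) can be sketched at the value level as follows. This is an illustrative sketch, assuming that the closest approximation means the nearest representable power of two in linear distance, with ties resolved toward the smaller magnitude; the helper names are hypothetical.

```python
import math

# Magnitudes reachable with up to three shifts in either direction, in ascending order.
MAGNITUDES = [2.0 ** n for n in range(-3, 4)]


def nearest_signed_power(value: float) -> float:
    """Closest representable signed power of two (first match wins on ties)."""
    magnitude = min(MAGNITUDES, key=lambda m: abs(m - abs(value)))
    return math.copysign(magnitude, value)


def approximate_k2(value: float):
    """Return (upper, lower, stored) for the two-nibble approximation of `value`."""
    upper = nearest_signed_power(value)          # closest single power of two
    lower = nearest_signed_power(value - upper)  # encode the remaining difference
    return upper, lower, upper + lower


# The example 11 yields an upper nibble of 8, a lower nibble of 2, and a stored value of 10.
example = approximate_k2(11.0)
```

The sketch does not special-case a parameter of exactly zero, which Table 1 represents as 1 plus -1.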

parameters are encoded into 4-bit and 8-bit sequences for \(k = 1\) and \(k = 2\), respectively. Figure 1 shows that the CompactNet training algorithm has no difficulty converging to the global minimum. Note also that the number of training epochs and the training time required by CompactNet are not considerably higher than those of the conventional DNN.

<table>
<thead>
<tr>
<th>Technique</th>
<th>Weights</th>
<th>Activation function</th>
<th>Activation output</th>
</tr>
</thead>
<tbody>
<tr>
<td>Conventional DNN</td>
<td>floating-point</td>
<td>ReLU</td>
<td>floating-point</td>
</tr>
<tr>
<td>BinaryConnect</td>
<td>±1</td>
<td>ReLU</td>
<td>floating-point</td>
</tr>
<tr>
<td>BinaryNet</td>
<td>±1</td>
<td>Sign</td>
<td>±1</td>
</tr>
<tr>
<td>LightNN\(_{k=2}\)</td>
<td>\(\pm(2^{n_1} + 2^{n_2})\)</td>
<td>ReLU</td>
<td>floating-point</td>
</tr>
<tr>
<td>LightNN\(_{k=1}\)</td>
<td>\(\pm 2^{n_1}\)</td>
<td>ReLU</td>
<td>floating-point</td>
</tr>
<tr>
<td>CompactNet</td>
<td>\(\pm(2^{n_1} + 2^{n_2})\) and \(\pm 2^{n_1}\)</td>
<td>ReLU</td>
<td>fixed-point</td>
</tr>
</tbody>
</table>

Table 2: Weight representation and computation comparisons for various approaches to DNN hardware implementation.

IV. EXPERIMENTS AND RESULTS

CompactNet is tested against prior techniques: conventional DNN, LightNN\(_{k=2}\) [24], LightNN\(_{k=1}\) [24], BinaryNet [19], and BinaryConnect [17]. Table 2 highlights the differences among the various implementation techniques. All the techniques are trained and tested on two datasets: MNIST and CIFAR-10. MNIST contains 70,000 grayscale images of handwritten digits, each of size 28×28 pixels. CIFAR-10 contains 60,000 color images belonging to ten different categories, each of size 32×32 pixels.

The training approach uses dropout layers in the DNN to avoid overfitting. The loss function is categorical cross entropy, which gives optimal results for the classification problem at hand. The techniques are prototyped on the Theano platform for Python. The ADAM learning rule is employed in all the experiments. The same number of training epochs is applied to each technique for a fair comparison of results. The results presented in this paper are obtained by averaging, over several iterations, the test error of the epoch with the lowest validation error. The validation set is used for selecting this best epoch in every iteration. Table 3 lists the various DNN configurations used for each dataset.

<table>
<thead>
<tr>
<th>Dataset</th>
<th>Config.</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>MNIST</td>
<td>1hidden</td>
<td>One hidden layer with 100 neurons.</td>
</tr>
<tr>
<td></td>
<td>2conv</td>
<td>Two convolutional layers and two fully-connected layers.</td>
</tr>
<tr>
<td></td>
<td>3hidden</td>
<td>Three hidden layers with 4096 neurons each.</td>
</tr>
<tr>
<td>CIFAR-10</td>
<td>3conv</td>
<td>Three convolutional layers and one fully-connected layer.</td>
</tr>
<tr>
<td></td>
<td>6conv</td>
<td>Six convolutional layers and three fully-connected layers.</td>
</tr>
</tbody>
</table>

Table 3: The list of DNN configurations used in the experiments conducted.
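The reporting procedure (select the epoch with the lowest validation error in each iteration, then average the corresponding test errors) can be sketched as below. The function name, the data layout, and the error values are hypothetical and serve only to illustrate the selection rule; they are not results from the experiments.

```python
from typing import List, Sequence, Tuple


def average_best_test_error(runs: Sequence[Tuple[List[float], List[float]]]) -> float:
    """Average, over runs, the test error at the epoch with the lowest validation error.

    Each run is a pair (validation_errors, test_errors) with one entry per epoch.
    """
    selected = []
    for validation_errors, test_errors in runs:
        best_epoch = min(range(len(validation_errors)), key=validation_errors.__getitem__)
        selected.append(test_errors[best_epoch])
    return sum(selected) / len(selected)


# Two hypothetical training runs of three epochs each.
runs = [
    ([0.30, 0.20, 0.25], [0.31, 0.22, 0.24]),
    ([0.50, 0.40, 0.10], [0.50, 0.45, 0.12]),
]
reported = average_best_test_error(runs)
```

Selecting the epoch on the validation set rather than the test set keeps the reported test error free of selection bias.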

Note that, to maintain a fair comparison, all the results presented are obtained after training on the same CPU until the training error begins to saturate. This accounts for any differences from the results published in the original works.

In general, the accuracy increases with the precision of the parameters; however, there are some exceptions due to the regularization effect of constraining the parameters. The activation function has a significant impact on the accuracy: ReLU performs considerably better than the Sign activation function. Overfitting is frequent in large DNNs; the use of dropout layers and regularization is key to preventing the loss function from becoming trapped in a local minimum. The classification error is reported in Table 4. The average accuracy decreases in the order: Conventional DNN, LightNN\(_{k=2}\), CompactNet, LightNN\(_{k=1}\), BinaryConnect, and BinaryNet. From this we can infer that the accuracy drop in CompactNet is comparable to, if not smaller than, that of other lightweight techniques.

The choice of activation function and loss function has no bearing on the memory requirement. The conventional DNN, based on full floating-point, requires the greatest amount of memory because it uses 32 bits for each value. BNNs use only a single bit to represent each parameter; hence they require the least memory. The memory requirement decreases in the order: Conventional DNN, LightNN\(_{k=2}\), CompactNet, LightNN\(_{k=1}\), and BNNs. This trend is depicted in Table 4. As CompactNet uses both 4-bit and 8-bit sequences, its storage requirement lies between those of LightNN\(_{k=1}\) and LightNN\(_{k=2}\), which use only 4-bit and only 8-bit sequences, respectively. All the approaches except CompactNet provide a constant improvement in the memory requirement over the conventional DNN. Due to the dynamic adaptation of the word size, the memory required by CompactNet varies with the dataset and configuration.
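The relationship between the share of 4-bit parameters and the memory reduction can be made concrete with a small calculation. The sketch below assumes that only parameter storage is counted and that the baseline is 32 bits per parameter; the function names are hypothetical. Under these assumptions, the 87.5% and 75% reductions reported for LightNN\(_{k=1}\) and LightNN\(_{k=2}\) correspond to all-4-bit and all-8-bit parameters.

```python
BASELINE_BITS = 32  # single-precision floating point


def memory_reduction(fraction_k1: float) -> float:
    """Fractional memory saved when `fraction_k1` of parameters use 4 bits and the rest 8."""
    average_bits = 4 * fraction_k1 + 8 * (1 - fraction_k1)
    return 1 - average_bits / BASELINE_BITS


def fraction_k1_for(reduction: float) -> float:
    """Invert memory_reduction: share of 4-bit parameters implied by a given reduction."""
    average_bits = (1 - reduction) * BASELINE_BITS
    return (8 - average_bits) / 4


# All parameters at k = 1 saves 87.5%; all at k = 2 saves 75%.
bounds = (memory_reduction(1.0), memory_reduction(0.0))
# An 83.6% reduction would imply roughly 69% of parameters stored with 4 bits.
implied_share = fraction_k1_for(0.836)
```

Any CompactNet reduction between 75% and 87.5% therefore maps to a specific mix of the two word sizes under these assumptions.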

The circuit area requirement for large DNNs is calculated using a pipelined approach, where a single instance of the largest neuron is repeated for each occurrence. This yields the worst-case area requirement. The area required is proportional to the number of large on-chip components. Multipliers present in the conventional DNNs are severalfold larger than the XOR gates needed in BinaryNet. Floating-point operations require much larger and more complex components than fixed-point operations, and thus occupy a larger area. In general, the area required decreases in the order: Conventional DNN, BinaryConnect, LightNN\(_{k=2}\), LightNN\(_{k=1}\), CompactNet, and BinaryNet. Figure 2 compares the area required for the various architectures. CompactNet is at least \(7 \times\) smaller than the Conventional DNN, and only \(1.2 \times\) larger than BinaryNet in terms of circuit area. While LightNN\(_{k=1}\) is \(4 \times\) smaller than the conventional DNN, LightNN\(_{k=2}\) is only \(2 \times\) smaller. BinaryConnect provides no improvement in area because it uses floating-point operations. CompactNet shows these gains because of the use of approximate shift-and-add multipliers. The 8-bit representation of the activation outputs ensures that the shift-and-add elements are more compact.

<table>
<thead>
<tr>
<th>Configuration</th>
<th>Test Error</th>
<th>Memory Reduction</th>
<th>Test Error</th>
<th>Memory Reduction</th>
<th>Test Error</th>
<th>Memory Reduction</th>
<th>Test Error</th>
<th>Memory Reduction</th>
<th>Test Error</th>
<th>Memory Reduction</th>
</tr>
</thead>
<tbody>
<tr>
<td>Conventional DNN</td>
<td>1.75%</td>
<td>0%</td>
<td>0.8%</td>
<td>0%</td>
<td>0.71%</td>
<td>0%</td>
<td>20.19%</td>
<td>0%</td>
<td>9.30%</td>
<td>0%</td>
</tr>
<tr>
<td>BinaryConnect</td>
<td>5.6%</td>
<td>95%</td>
<td>4.15%</td>
<td>95%</td>
<td>2.01%</td>
<td>95%</td>
<td>31.04%</td>
<td>95%</td>
<td>13.71%</td>
<td>95%</td>
</tr>
<tr>
<td>BinaryNet</td>
<td>7.10%</td>
<td>6.5%</td>
<td>1.9%</td>
<td>61%</td>
<td>6%</td>
<td>21.56%</td>
<td></td>
<td>37%</td>
<td>56%</td>
<td>75%</td>
</tr>
<tr>
<td>LightNN\(_{k=1}\)</td>
<td>2.00%</td>
<td>87.5%</td>
<td>1.90%</td>
<td>87.5%</td>
<td>0.90%</td>
<td>87.5%</td>
<td>25.31%</td>
<td>87.5%</td>
<td>9.11%</td>
<td>87.5%</td>
</tr>
<tr>
<td>LightNN\(_{k=2}\)</td>
<td>1.93%</td>
<td>75%</td>
<td>1.31%</td>
<td>75%</td>
<td>0.80%</td>
<td>75%</td>
<td>24.60%</td>
<td>75%</td>
<td>8.21%</td>
<td>75%</td>
</tr>
<tr>
<td>CompactNet</td>
<td>1.96%</td>
<td>83.60%</td>
<td>1.48%</td>
<td>83.80%</td>
<td>0.84%</td>
<td>84.10%</td>
<td>25.03%</td>
<td>81.60%</td>
<td>8.39%</td>
<td>84.80%</td>
</tr>
</tbody>
</table>

Table 4: Comparison of the test error, and the corresponding memory requirement of different techniques and configurations.

The energy requirement is computed as the sum of the Switch, Leakage, Memory Access, and Internal energy consumed while running the model. Our results show that the activation functions have a significant impact on the energy requirement of the architecture. The Sign activation function in BinaryNet corresponds to the most energy-efficient logic. Furthermore, using fewer bits for the parameters means less energy for each memory access. Because of the ReLU activation function and 32-bit floating point parameters, conventional DNNs require the most energy. From Figure 3 we see that the energy requirement generally decreases in the order: conventional DNN, LightNN\(_{k=2}\), LightNN\(_{k=1}\), BinaryConnect, CompactNet, and BinaryNet. CompactNet reduces the energy requirement by over 12× relative to the conventional DNN. LightNN\(_{k=1}\), LightNN\(_{k=2}\), and BinaryConnect provide 7×, 5×, and 8× improvements over the conventional DNN, respectively. In comparison, CompactNet performs significantly better because of the reduced number of memory accesses. Because CompactNet achieves better classification accuracy than BinaryNet, it is a more suitable option for most applications.

These experiments show that CompactNet significantly reduces area and memory consumption while largely maintaining accuracy. The energy consumed during inference is notably lower than that of the conventional DNN, and similar to that of existing lightweight DNN implementations.

V. SUMMARY AND CONCLUSION

No single technique surpasses all others on every factor: accuracy, storage, area, and energy. The tradeoffs must be evaluated depending on the dataset and the hardware resource constraints. The experiments illustrate the tradeoffs between accuracy and costs. CompactNet provides a greater number of Pareto-optimal designs (see Figure 4 and Figure 5) than other architectures and techniques.
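To make the notion of a Pareto-optimal design concrete, the sketch below filters out dominated designs using two criteria only: test error (lower is better) and memory reduction (higher is better). The input values are the last pair of columns of Table 4; the function name and the restriction to two criteria are assumptions for illustration, and the figures referenced above may use other cost axes.

```python
from typing import Dict, List, Tuple


def pareto_front(designs: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return designs not dominated on (test error, memory reduction).

    A design is dominated if another is no worse on both criteria
    and strictly better on at least one.
    """
    front = []
    for name, (error, reduction) in designs.items():
        dominated = any(
            e <= error and r >= reduction and (e < error or r > reduction)
            for other, (e, r) in designs.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


# (test error %, memory reduction %) taken from the last column pair of Table 4.
designs = {
    "Conventional DNN": (9.30, 0.0),
    "BinaryConnect": (13.71, 95.0),
    "LightNN k=1": (9.11, 87.5),
    "LightNN k=2": (8.21, 75.0),
    "CompactNet": (8.39, 84.8),
}
front = pareto_front(designs)
```

In this two-criterion view, the conventional DNN is dominated by LightNN with $k = 2$, while CompactNet remains on the front because no other design is both more accurate and more compact.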

With CompactNet, we help bridge the gap between conventional DNNs and BNNs. The use of fixed-point operators significantly limits the complexity of the computation logic. When combined with the quantized parameters and the shift-and-add multipliers, the resulting circuit is significantly more efficient than the conventional DNN. Dynamic encoding of the parameters also reduces off-chip DRAM accesses. CompactNet provides a suitable option for deploying DNNs on hardware by reducing the memory and area requirements while maintaining accuracy.

In future work, we will apply the quantization method and architecture described in this paper to models like MobileNet. We also aim to further improve inference speed while maintaining high accuracy on the ImageNet dataset.

REFERENCES


