June 18, 2021
HMEM 2021
Memory Mode (MM): DRAM acts as a cache (invisible to the user) for Intel Optane
DRAM Only (DO)
We have used the pre-trained models of both networks.
Our experiments only take into account inference.
[Figure: Peak Memory Use, Inference time (MM), Top 1 accuracy and Top 5 accuracy (Batch size: 2048)]
For the width multipliers 1.3 & 1.4 only the resolution 224x224 px is available
Width Multiplier | Input Resolution | Memory Usage (GB) | Time (s) Memory Mode | Time (s) DRAM Only |
---|---|---|---|---|
0.35 | 96 | 71 | 718 | 665 |
0.35 | 128 | 125 | 1,311 | 1,164 |
0.50 | 96 | 80 | 925 | 841 |
0.50 | 128 | 144 | 1,691 | 1,524 |
0.75 | 96 | 153 | 1,572 | 1,432 |
1.00 | 96 | 157 | 1,830 | 1,645 |
Only models that fit on DRAM (192 GB) are shown in this table
Memory Mode is only ~10% slower than DRAM Only
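The ~10% figure can be checked against the table above. The following is a minimal sketch that recomputes the relative slowdown of Memory Mode for each row; the variable and function names are illustrative and do not come from the original slides.

```python
# Rows copied from the table above:
# (width multiplier, input resolution, time (s) Memory Mode, time (s) DRAM Only)
rows = [
    (0.35, 96, 718, 665),
    (0.35, 128, 1311, 1164),
    (0.50, 96, 925, 841),
    (0.50, 128, 1691, 1524),
    (0.75, 96, 1572, 1432),
    (1.00, 96, 1830, 1645),
]


def slowdown_percent(mm_seconds, do_seconds):
    """Relative slowdown of Memory Mode with respect to DRAM Only, in percent."""
    return (mm_seconds / do_seconds - 1.0) * 100.0


slowdowns = [slowdown_percent(mm, do) for _, _, mm, do in rows]
mean_slowdown = sum(slowdowns) / len(slowdowns)

for (width, res, _, _), s in zip(rows, slowdowns):
    print(f"width {width:.2f}, {res}x{res} px: {s:.1f}% slower in Memory Mode")
print(f"mean slowdown: {mean_slowdown:.1f}%")
```

On these six configurations the per-row slowdown ranges from roughly 8% to 13%, with a mean a little above 10%.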
(Width multiplier: 0.75 | Resolution: 96x96 px)
Function | Time (s) | % | DRAM (K-loads) | % | Optane (K-loads) | % | Ratio |
---|---|---|---|---|---|---|---|
Add | 26,605 | 0.78% | 6,333 | 2.23% | 1,345 | 6.02% | 4.75 |
AvgPool | 1,077 | 0.03% | 80 | 0.03% | 10 | 0.05% | 7.65 |
BoundedRelu* | 125,401 | 3.70% | 88,402 | 30.90% | 17,198 | 76.92% | 5.14 |
Concat | 10,578 | 0.31% | 115 | 0.04% | 5 | 0.03% | 20.29 |
Constant | 24,898 | 0.73% | 664 | 0.23% | 32 | 0.15% | 20.46 |
Convolution | 3,113,888 | 91.76% | 187,003 | 56.37% | 3,152 | 14.10% | 59.32 |
Multiply | 9,652 | 0.28% | 1,841 | 0.64% | 489 | 2.19% | 3.76 |
Reshape | 51,041 | 1.50% | 870 | 0.30% | 103 | 0.46% | 8.39 |
Result | 40 | 0.00% | 117 | 0.04% | 0.5 | 0.00% | 207.42 |
Slice | 30,338 | 0.89% | 576 | 0.20% | 19 | 0.09% | 30.11 |
Total | 3,393,517 | | 286,006 | | 22,358 | | 12.79 |
*Not supported by encryption scheme
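The Ratio column appears to be the number of DRAM loads divided by the number of Optane loads for each function. This is an inference from the numbers, not something the slides state. A minimal sketch, using a few rows copied from the table, recomputes it; rows with very small Optane counts (for example Concat or Result) only match approximately, presumably because the displayed K-load values are rounded.

```python
# (function, DRAM K-loads, Optane K-loads, Ratio as printed in the table)
profile = [
    ("BoundedRelu", 88402, 17198, 5.14),
    ("Convolution", 187003, 3152, 59.32),
    ("Multiply", 1841, 489, 3.76),
    ("Total", 286006, 22358, 12.79),
]


def dram_to_optane_ratio(dram_kloads, optane_kloads):
    """Assumed definition of the Ratio column: DRAM loads per Optane load."""
    return dram_kloads / optane_kloads


recomputed = {
    name: dram_to_optane_ratio(dram, optane)
    for name, dram, optane, _ in profile
}

for name, _, _, printed in profile:
    print(f"{name}: recomputed {recomputed[name]:.2f}, table {printed:.2f}")
```

A higher ratio means a larger share of a function's loads was served from the DRAM cache rather than from Optane.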
(Operations that cannot be expressed in terms of addition and multiplication)
[Diagram: Data owner and Server; Decrypt, Operation, Encrypt]
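The Decrypt, Operation, Encrypt sequence above can be sketched for an operation such as BoundedRelu, which the profile marks as not supported by the encryption scheme. The sketch below is illustrative only: `ToyCipher` is a hypothetical additive mask that is neither secure nor homomorphic, and the bound of 6.0 is an assumed value, not one taken from the slides.

```python
import random


class ToyCipher:
    """Hypothetical stand-in for the encryption scheme (NOT secure, NOT real HE)."""

    def __init__(self, seed=0):
        # Secret key known only to the party that decrypts.
        self._key = random.Random(seed).randint(1, 10**6)

    def encrypt(self, value):
        return value + self._key

    def decrypt(self, ciphertext):
        return ciphertext - self._key


def bounded_relu(x, bound=6.0):
    """Clamp x to [0, bound]; the bound value is an assumption."""
    return min(max(x, 0.0), bound)


def round_trip(cipher, ciphertexts, operation):
    """Apply an unsupported operation via decrypt -> operate -> re-encrypt."""
    plain = [cipher.decrypt(c) for c in ciphertexts]   # Decrypt
    result = [operation(x) for x in plain]             # Operation
    return [cipher.encrypt(y) for y in result]         # Encrypt


cipher = ToyCipher()
activations = [-2.0, 0.5, 3.0, 9.0]
encrypted = [cipher.encrypt(x) for x in activations]
returned = round_trip(cipher, encrypted, bounded_relu)
decrypted = [cipher.decrypt(c) for c in returned]
print(decrypted)
```

The party that calls `round_trip` must hold the key, so the choice of where this step runs determines who sees the plaintext activations.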
[Diagram: Real configuration and Experiment configuration; Server: Decrypt, Operation, Encrypt]
[Figure: DRAM traffic and Optane traffic, with bandwidth (Width multiplier: 0.75 | Resolution: 96x96 px)]
(Width multiplier: 0.75 | Resolution: 96x96 px)
(Batch size 2048)
guillermo.lloret@bsc.es