June 18, 2021
Annual Fall Conference
DRAM acts as a cache (invisible to the user) for Intel Optane
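To make "DRAM acts as a cache" concrete, the toy simulation below models the DRAM as a cache in front of Optane. It assumes a direct-mapped cache with 64-byte lines, which simplifies how Memory Mode is commonly described. The class and function names are hypothetical and the sizes are tiny for illustration. A hit is served from DRAM, and a miss stands for a line fetched from Optane. The program only issues addresses, which mirrors the cache being invisible to the user.

```python
"""Toy model of DRAM caching Optane in Memory Mode (illustrative only)."""

LINE_BYTES = 64  # assumed cache-line granularity


class DirectMappedCache:
    """Direct-mapped cache: each memory block maps to exactly one slot."""

    def __init__(self, capacity_bytes: int, line_bytes: int = LINE_BYTES) -> None:
        self.line_bytes = line_bytes
        self.num_slots = capacity_bytes // line_bytes
        self.tags = [None] * self.num_slots  # which block each slot currently holds
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        block = addr // self.line_bytes
        slot = block % self.num_slots
        tag = block // self.num_slots
        if self.tags[slot] == tag:
            self.hits += 1  # served from DRAM
            return True
        self.misses += 1  # fetched from Optane, then kept in DRAM
        self.tags[slot] = tag
        return False


def hit_rate(cache_bytes: int, working_set_bytes: int, passes: int) -> float:
    """Sweep a working set sequentially `passes` times and return the hit rate."""
    cache = DirectMappedCache(cache_bytes)
    for _ in range(passes):
        for addr in range(0, working_set_bytes, cache.line_bytes):
            cache.access(addr)
    return cache.hits / (cache.hits + cache.misses)


if __name__ == "__main__":
    # Working set half the cache size: only the first pass misses.
    print(hit_rate(4096, 2048, passes=10))  # 0.9
    # Working set twice the cache size: blocks evict each other on every pass.
    print(hit_rate(4096, 8192, passes=10))  # 0.0
```

Between those two extremes the result is partial. For example, `hit_rate(4096, 6144, passes=10)` returns 0.3, because the slots that only one block maps to keep hitting while the shared slots keep conflicting.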
DRAM Only (DO): CPU 1 and CPU 2 with DRAM 1-12; DRAM bandwidth 127.8 GB/s per CPU
Memory Mode (MM): CPU 1 and CPU 2 with DRAM 1-4 and Optane 1-12; DRAM bandwidth 42.6 GB/s per CPU; Optane bandwidth per CPU: Read 43.8 GB/s, Write 14.4 GB/s
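Summing the per-CPU labels over both sockets gives the peak bandwidth each configuration can draw on. This sketch assumes that each bandwidth label in the diagrams applies to one CPU and that the peaks simply add, ignoring NUMA and cross-socket effects. The constant and function names are illustrative only.

```python
"""Aggregate peak bandwidth of the two configurations (values from the diagrams)."""

SOCKETS = 2  # CPU 1 and CPU 2

# Per-CPU peak bandwidth in GB/s, as labelled in the diagrams.
DO_DRAM = 127.8          # DRAM Only: DRAM 1-12
MM_DRAM = 42.6           # Memory Mode: DRAM 1-4 (acting as cache)
MM_OPTANE_READ = 43.8    # Memory Mode: Optane 1-12, reads
MM_OPTANE_WRITE = 14.4   # Memory Mode: Optane 1-12, writes


def system_total(per_cpu_gbs: float, sockets: int = SOCKETS) -> float:
    """Sum a per-CPU figure over all sockets (ignores cross-socket/NUMA effects)."""
    return round(per_cpu_gbs * sockets, 1)


if __name__ == "__main__":
    print(f"DO DRAM         : {system_total(DO_DRAM)} GB/s")
    print(f"MM DRAM (cache) : {system_total(MM_DRAM)} GB/s")
    print(f"MM Optane read  : {system_total(MM_OPTANE_READ)} GB/s")
    print(f"MM Optane write : {system_total(MM_OPTANE_WRITE)} GB/s")
    print(f"DO/MM DRAM ratio: {DO_DRAM / MM_DRAM:.1f}x")
```

Under these assumptions, DRAM Only offers 255.6 GB/s of DRAM bandwidth. In Memory Mode, the DRAM cache offers 85.2 GB/s, Optane reads 87.6 GB/s, and Optane writes 28.8 GB/s. DO therefore has three times the DRAM bandwidth of the MM cache.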
Figure: handling operations that cannot be expressed in terms of addition and multiplication
Real configuration (Data owner, Server): Decrypt → Operation → Encrypt
Experiment configuration (Server only): Decrypt → Operation → Encrypt
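The difference between the two configurations can be sketched with a mock ciphertext. This assumes, based on the figure, that in the real configuration the server sends the ciphertext to the data owner, who holds the key, decrypts it, applies the operation, and re-encrypts it. In the experiment configuration the server runs the same three steps itself. `MockCiphertext`, `DataOwner`, and the bound of 6.0 used for `bounded_relu` are hypothetical. The mock provides no security and is not the encryption scheme used in the experiments.

```python
"""Mock of the two ways an unsupported operation (e.g. BoundedRelu) is handled.

MockCiphertext is a hypothetical placeholder: it stores the plaintext and offers
NO security. It only mimics a scheme that supports addition and multiplication.
"""
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MockCiphertext:
    value: float

    def __add__(self, other: "MockCiphertext") -> "MockCiphertext":
        return MockCiphertext(self.value + other.value)

    def __mul__(self, other: "MockCiphertext") -> "MockCiphertext":
        return MockCiphertext(self.value * other.value)


def encrypt(x: float) -> MockCiphertext:
    return MockCiphertext(x)


def decrypt(c: MockCiphertext) -> float:
    return c.value


def bounded_relu(x: float, upper: float = 6.0) -> float:
    """Clamp to [0, upper]; needs comparisons, not just + and *."""
    return min(max(x, 0.0), upper)


Op = Callable[[float], float]


class DataOwner:
    """Holds the secret key, so only it can decrypt in the real configuration."""

    def __init__(self) -> None:
        self.round_trips = 0  # ciphertexts received from the server

    def apply(self, c: MockCiphertext, op: Op) -> MockCiphertext:
        self.round_trips += 1
        return encrypt(op(decrypt(c)))


def unsupported_op_real(c: MockCiphertext, op: Op, owner: DataOwner) -> MockCiphertext:
    # Real configuration: the server sends c to the data owner and waits for the result.
    return owner.apply(c, op)


def unsupported_op_experiment(c: MockCiphertext, op: Op) -> MockCiphertext:
    # Experiment configuration: the server runs Decrypt -> Operation -> Encrypt itself.
    return encrypt(op(decrypt(c)))


if __name__ == "__main__":
    owner = DataOwner()
    # Supported on the server: a multiply-add computed directly on ciphertexts.
    z = encrypt(2.0) * encrypt(-3.0) + encrypt(1.0)
    print(decrypt(unsupported_op_real(z, bounded_relu, owner)))  # 0.0
    print(decrypt(unsupported_op_experiment(encrypt(7.5), bounded_relu)))  # 6.0
    print(owner.round_trips)  # 1
```

Both paths produce the same values. The real configuration additionally costs one round trip to the data owner per unsupported operation.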
We used the pre-trained models of both networks.
Our experiments consider inference only.
Figures (Batch size: 2048): Peak Memory Use; Inference time (MM); Top-1 accuracy; Top-5 accuracy
For width multipliers 1.3 and 1.4, only the 224x224 px resolution is available
Width Multiplier | Input Resolution (px) | Memory Usage (GB) | Time (s), Memory Mode | Time (s), DRAM Only |
---|---|---|---|---|
0.35 | 96 | 71 | 718 | 665 |
0.35 | 128 | 125 | 1,311 | 1,164 |
0.50 | 96 | 80 | 925 | 841 |
0.50 | 128 | 144 | 1,691 | 1,524 |
0.75 | 96 | 153 | 1,572 | 1,432 |
1.00 | 96 | 157 | 1,830 | 1,645 |
Only models that fit in DRAM (192 GB) are shown in this table
Memory Mode is only ~10% slower than DRAM Only (8-13% across the models in the table)
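The ~10% figure can be checked directly against the timing table. The sketch below copies its rows and keeps only models whose memory usage fits in the 192 GB of DRAM, matching the table's note. It then computes the extra time Memory Mode needs as a percentage of the DRAM Only time. The names are illustrative.

```python
"""Memory Mode slowdown vs DRAM Only, using the values from the timing table."""

DRAM_CAPACITY_GB = 192

# (width multiplier, input resolution px, memory usage GB, time MM s, time DO s)
RESULTS = [
    (0.35, 96, 71, 718, 665),
    (0.35, 128, 125, 1311, 1164),
    (0.50, 96, 80, 925, 841),
    (0.50, 128, 144, 1691, 1524),
    (0.75, 96, 153, 1572, 1432),
    (1.00, 96, 157, 1830, 1645),
]


def slowdown_pct(time_mm: float, time_do: float) -> float:
    """Extra time Memory Mode needs, as a percentage of the DRAM Only time."""
    return 100.0 * (time_mm - time_do) / time_do


def fitting_slowdowns(rows=RESULTS) -> list[float]:
    """Slowdowns for the models that fit in DRAM (the only ones with a DO time)."""
    return [slowdown_pct(t_mm, t_do)
            for _, _, mem_gb, t_mm, t_do in rows if mem_gb <= DRAM_CAPACITY_GB]


if __name__ == "__main__":
    for (width, res, _, t_mm, t_do) in RESULTS:
        print(f"{width:.2f} @ {res}px: {slowdown_pct(t_mm, t_do):5.1f}% slower in MM")
    s = fitting_slowdowns()
    print(f"mean: {sum(s) / len(s):.1f}%")
```

The per-model slowdowns range from 8.0% to 12.6%, with a mean of about 10.4%.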
(Width multiplier: 0.75 | Resolution: 96x96 px)
Function | Time (s) | % | DRAM (K-loads) | % | Optane (K-loads) | % | Ratio (DRAM/Optane) |
---|---|---|---|---|---|---|---|
Add | 26,605 | 0.78% | 6,333 | 2.23% | 1,345 | 6.02% | 4.75 |
AvgPool | 1,077 | 0.03% | 80 | 0.03% | 10 | 0.05% | 7.65 |
BoundedRelu* | 125,401 | 3.70% | 88,402 | 30.90% | 17,198 | 76.92% | 5.14 |
Concat | 10,578 | 0.31% | 115 | 0.04% | 5 | 0.03% | 20.29 |
Constant | 24,898 | 0.73% | 664 | 0.23% | 32 | 0.15% | 20.46 |
Convolution | 3,113,888 | 91.76% | 187,003 | 65.37% | 3,152 | 14.10% | 59.32 |
Multiply | 9,652 | 0.28% | 1,841 | 0.64% | 489 | 2.19% | 3.76 |
Reshape | 51,041 | 1.50% | 870 | 0.30% | 103 | 0.46% | 8.39 |
Result | 40 | 0.00% | 117 | 0.04% | 0.5 | 0.00% | 207.42 |
Slice | 30,338 | 0.89% | 576 | 0.20% | 19 | 0.09% | 30.11 |
Total | 3,393,517 | | 286,006 | | 22,358 | | 12.79 |
*Not supported by encryption scheme
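The percentage columns and the Ratio column can be recomputed from the raw counts in the per-function table. This sketch assumes that Ratio is DRAM K-loads divided by Optane K-loads, which matches the Total row (286,006 / 22,358 ≈ 12.79) and most individual rows. Totals are recomputed from the rounded rows, so they differ slightly from the Total row. The dictionary and function names are illustrative.

```python
"""Recompute the share and Ratio columns of the per-function profile."""

# Function: (time s, DRAM K-loads, Optane K-loads), copied from the table above.
PROFILE = {
    "Add": (26_605, 6_333, 1_345),
    "AvgPool": (1_077, 80, 10),
    "BoundedRelu": (125_401, 88_402, 17_198),
    "Concat": (10_578, 115, 5),
    "Constant": (24_898, 664, 32),
    "Convolution": (3_113_888, 187_003, 3_152),
    "Multiply": (9_652, 1_841, 489),
    "Reshape": (51_041, 870, 103),
    "Result": (40, 117, 0.5),
    "Slice": (30_338, 576, 19),
}
UNSUPPORTED = {"BoundedRelu"}  # marked * in the table
TIME, DRAM, OPTANE = 0, 1, 2   # column indices into each tuple


def shares(column: int) -> dict[str, float]:
    """Percentage of the column total contributed by each function."""
    total = sum(row[column] for row in PROFILE.values())
    return {name: 100.0 * row[column] / total for name, row in PROFILE.items()}


def dram_optane_ratio(name: str) -> float:
    _, dram, optane = PROFILE[name]
    return dram / optane


def overall_ratio() -> float:
    return (sum(r[DRAM] for r in PROFILE.values())
            / sum(r[OPTANE] for r in PROFILE.values()))


if __name__ == "__main__":
    t, d, o = shares(TIME), shares(DRAM), shares(OPTANE)
    for name in PROFILE:
        label = name + ("*" if name in UNSUPPORTED else "")
        print(f"{label:<13} time {t[name]:6.2f}%  DRAM {d[name]:6.2f}%  "
              f"Optane {o[name]:6.2f}%  ratio {dram_optane_ratio(name):7.2f}")
    print(f"overall DRAM/Optane ratio: {overall_ratio():.2f}")
```

The recomputed shares make the contrast in the table easy to see. Convolution accounts for most of the time and DRAM loads. BoundedRelu, which the encryption scheme does not support, accounts for most of the Optane loads.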
Figure panels: DRAM traffic (with bandwidth); Optane traffic (with bandwidth). Width multiplier: 0.75 | Resolution: 96x96 px | Batch size: 2048
guillermo.lloret@bsc.es