FFE #1 - Microsoft Research

[Figure: capabilities and costs of FPGAs vs. ASICs. Source: Bob Brodersen, Berkeley Wireless group]
[Figure: server designs pairing a Xeon CPU and NIC with different accelerators: Search Acc. (FPGA), Search Acc. (ASIC), Search Acc. v2 (FPGA), and Math Accelerator. Annotations: "Wasted power, holds back SW" and "Wasted power, one more thing that can break".]
http://www.globalfoundationservices.com/posts/2014/january/27/microsoft-contributes-cloud-server-specification-to-open-compute-project.aspx
Data Center Server (1U, ½ width)
• Two 8-core Xeon 2.1 GHz CPUs
• 64 GB DRAM
• 4 HDDs, 2 SSDs
• No cable attachments to server
FPGA: Stratix V, 8 GB DDR3, PCIe Gen3 x8
[Figure: FPGAs grouped by the service they run: Web Search Pipeline, Math Acceleration Service, Physics Engine, Comp. Vision Service.]
[Figure: FPGA Shell and Role block diagram.
Shell: DDR3 Core 0 and DDR3 Core 1, each connected (link labeled 72) to a 4 GB DDR3-1333 ECC SO-DIMM; x8 PCIe Core (link labeled 8) and DMA Engine to the Host CPU; Inter-FPGA Router with North, South, East, and West SLIII links (each labeled 2); Config Flash (RSU) connected (link labeled 4) to a 256 Mb QSPI Config Flash; JTAG; LEDs; Temp Sensors; I2C; xcvr reconfig; SEU.
Role: Application.]
Ranking as a Service (RaaS)
Selection as a Service (SaaS)
[Figure: the Query fans out to SaaS nodes, each with IFM blocks; node numbering runs up to 48.]
Selection-as-a-Service (SaaS)
- Find all docs that contain query terms
- Filter and select candidate documents for ranking
[Figure: Selected Documents flow to RaaS nodes, each with IFM blocks (node numbering runs up to 48), which return the 10 blue links.]
Ranking-as-a-Service (RaaS)
- Compute scores for how relevant each selected document is for the search query
- Sort the scores and return the results
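The two services above can be sketched in a few lines of Python. This is only an illustration of the select-then-rank flow, not the Bing implementation: the function names, the in-memory `docs` dictionary, and the term-count relevance score are all assumptions made for the example.

```python
from typing import Dict, List


def select_documents(query_terms: List[str], docs: Dict[str, str]) -> List[str]:
    """Selection: keep the ids of documents that contain every query term."""
    selected = []
    for doc_id, text in docs.items():
        tokens = set(text.lower().split())
        if all(term in tokens for term in query_terms):
            selected.append(doc_id)
    return selected


def rank_documents(query_terms: List[str], docs: Dict[str, str],
                   candidates: List[str], top_k: int = 10) -> List[str]:
    """Ranking: score each candidate, sort by score, return the top results.

    The score here is a toy stand-in (total query-term occurrences);
    the real ranker uses thousands of features.
    """
    scored = []
    for doc_id in candidates:
        tokens = docs[doc_id].lower().split()
        score = sum(tokens.count(term) for term in query_terms)
        scored.append((score, doc_id))
    # Highest score first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


docs = {
    "a": "fpga configuration fpga",
    "b": "fpga only",
    "c": "configuration of an fpga",
}
query = ["fpga", "configuration"]
candidates = select_documents(query, docs)
results = rank_documents(query, docs, candidates)
```

The default `top_k=10` mirrors the "10 blue links" returned to the user.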
[Figure: a {Query, Document} pair yields ~4K dynamic features and ~2K synthetic features, which feed L2 to produce a score.]
Query: “FPGA Configuration”
NumberOfOccurrences_0 = 7
NumberOfOccurrences_1 = 4
NumberOfTuples_0_1 = 1
FFE #1 = (2 * NumberOfOccurrences_0 + NumberOfOccurrences_1) / (2 * NumberOfTuples_0_1)
FFE #1 = (2 * 7 + 4) / (2 * 1) = 9
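The FFE #1 example can be checked with a short Python sketch. It assumes the expression is a quotient of the two parenthesized terms, which is the reading consistent with the stated result of 9. The function name `ffe_1` and the feature dictionary are illustrative, not part of the production system.

```python
def ffe_1(features: dict) -> float:
    """Evaluate the example free-form expression from the slide.

    Assumed form: (2 * occurrences_0 + occurrences_1) / (2 * tuples_0_1).
    """
    numerator = 2 * features["NumberOfOccurrences_0"] + features["NumberOfOccurrences_1"]
    denominator = 2 * features["NumberOfTuples_0_1"]
    return numerator / denominator


# Dynamic features for the query "FPGA Configuration", taken from the slide.
features = {
    "NumberOfOccurrences_0": 7,
    "NumberOfOccurrences_1": 4,
    "NumberOfTuples_0_1": 1,
}
value = ffe_1(features)
print(value)  # 9.0
```

A synthetic feature like this is computed from dynamic features only, so it can be evaluated as soon as feature extraction finishes for the document.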
[Figure: feature extraction datapath. Components shown: PCIe, Compressed Document, Stream Preprocessing FSM, Feature Gathering Network, Free Form Expression (FFE).]
• 196 feature families
• 54 state machines
• 2.6K dynamic features extracted in less than 4 µs (~600 µs in SW)
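As a rough software analogue of what a feature state machine does, the sketch below computes the occurrence and tuple counters from the earlier example in a single pass over a document's token stream. The function name and the definition of a tuple (query term i immediately followed by query term i+1) are assumptions for illustration; the slides do not define the features precisely, and the sketch also assumes the query terms are distinct.

```python
from typing import Dict, List


def extract_features(query_terms: List[str], doc_tokens: List[str]) -> Dict[str, int]:
    """Single pass over the token stream, updating counters as tokens arrive."""
    term_index = {term: i for i, term in enumerate(query_terms)}
    features = {f"NumberOfOccurrences_{i}": 0 for i in range(len(query_terms))}
    for i in range(len(query_terms) - 1):
        features[f"NumberOfTuples_{i}_{i + 1}"] = 0

    prev = None  # index of the query term matched by the previous token, if any
    for token in doc_tokens:
        idx = term_index.get(token)
        if idx is not None:
            features[f"NumberOfOccurrences_{idx}"] += 1
            # Assumed tuple definition: term idx-1 directly followed by term idx.
            if prev is not None and idx == prev + 1:
                features[f"NumberOfTuples_{prev}_{idx}"] += 1
        prev = idx
    return features


tokens = "the fpga configuration loads an fpga image".split()
example = extract_features(["fpga", "configuration"], tokens)
```

In hardware, many such state machines consume the same stream in parallel, which is what makes the sub-4 µs extraction time possible; the Python loop only shows the per-token logic.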
[Figure: FFE processor cluster. Components shown: Control/Data Tokens, Distribution latches, Cluster 0 containing Core 0 through Core 5 and a Complex block, FST, Output.]
8-Stage Pipeline
[Figure: a Document passes through FPGA 0 to FPGA 7. Stage labels shown: FE: Feature Extraction, FFE: Free-Form Expressions, Compute Score. Other labels: Route to Head, Score.]
[Figure: RaaS Servers. A Document Scoring Request enters the pipeline and a Return Score comes back; eight servers are shown.]
[Figure: two 8-Stage Pipelines mapped onto the same eight FPGAs. One runs FPGA 0 through FPGA 7 in order; the other starts at FPGA 5 and wraps around to FPGA 4.]
1,632 Servers with FPGAs Running Bing Page Ranking Service (~30,000 lines of C++)
But when will an FPGA handle my Bing Search?
• Bing is going into production with FPGAs
Top Row: Eric Peterson, Scott Hauck, Aaron Smith, Jan Gray, Adrian M. Caulfield, Phillip Yi Xiao, Michael Haselman, Doug Burger
Bottom Row: Joo-Young Kim, Stephen Heil, Derek Chiou, Sitaram Lanka, Andrew Putnam, Eric S. Chung
Not Pictured: Kypros Constantinides, John Demme, Hadi Esmaeilzadeh, Jeremy Fowers, Gopi Prashanth Gopal, Amir Hormati, James Larus, Simon Pope, Jason Thong
Huge thanks to our partners at