Compute Acceleration

Workloads in compute acceleration today are as varied as the end applications themselves, spanning everything from financial trading and genomics to machine learning inference and training. However, these workloads share some common characteristics, including the types of arithmetic functions, number formats (integer and floating point), and aggressive performance targets. Furthermore, as processing naturally migrates closer to the edge, power, thermal behavior, and performance per watt become key metrics. It is in these areas that FPGAs in general, and the Speedster7t family in particular, excel.

The Speedster7t FPGA family is optimized for high-bandwidth workloads and eliminates the performance bottlenecks associated with traditional FPGAs. Built on TSMC's 7nm FinFET process, Speedster7t FPGAs feature a revolutionary new 2D network on chip (NoC), an array of new machine learning processors (MLPs) optimized for high-bandwidth and artificial intelligence/machine learning (AI/ML) workloads, high-bandwidth GDDR6 interfaces, and 400G Ethernet and PCI Express Gen5 ports, all interconnected to deliver ASIC-level performance while retaining the full programmability of FPGAs. Get started today with the VectorPath accelerator card featuring the Speedster7t FPGA.

Speedster7t Solution

  • Speedster7t FPGAs provide a high-performance, power-efficient computational acceleration solution for defense, financial, medical, scientific, oil and gas, and life science applications:
    • Machine learning (ML) inference and edge training
    • Financial analysis and high-frequency trading
    • Genomic analysis
    • Video and image processing
  • The inherent parallelism and flexibility of the FPGA architecture are well suited to these high-throughput applications.
  • High-speed interfaces include PCIe Gen5 connectivity and high-performance Ethernet, as well as a dedicated 2D network on chip (NoC) for high-bandwidth data movement.
  • Storage of large data sets is possible with DDR4/5 bulk storage and GDDR6 interfaces for high-bandwidth access to external memory.
  • Data processing supports a wide variety of number formats, from low bit-width integer math to high-performance floating-point operations, including native support for matrix multiplications and complex arithmetic (for example, to support beamforming applications).
  • Speedster7t FPGAs are particularly well suited to ML inference and edge analytics operations.

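Application requirements and the corresponding Speedster value:
High-bandwidth external connectivity: Multiple ports of 400G Ethernet and PCIe Gen5
Highest memory bandwidth for buffering (>1 Tbps): Up to 16 independent GDDR6 channels (16 Gbps), delivering up to 4 Tbps of aggregate bandwidth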
Wide and high-performance datapath

Data flow optimized for compute acceleration of matrix-vector math

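  • Up to 20 Tbps of NoC bandwidth for high-speed, wide data transfers
  • Optimized bus routing at byte granularity
  • Fully flexible bit-wise routing
  • Dedicated routing paths to support data reuse between the multiply-accumulators and memory
  • Cascade paths to enable systolic array implementations
  • Integrated register file to enable time multiplexing of computation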
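
As a rough illustration of why cascade paths and operand reuse matter for matrix-vector math, the following is a minimal cycle-level sketch of a generic one-dimensional systolic array. It is a textbook model written for this page, not a description of how the MLP cascade actually behaves; the function name `systolic_matvec` and the one-register-per-stage pipeline are assumptions made for illustration.

```python
def systolic_matvec(W, x):
    """Compute y = W @ x on a simulated 1D chain of multiply-accumulate stages.

    Hypothetical model: stage j keeps x[j] stationary (operand reuse) and
    passes its partial sum to stage j + 1 over a cascade path each cycle.
    """
    rows, cols = len(W), len(x)
    pipe = [0] * cols  # pipe[j]: partial sum registered at the output of stage j
    y = []
    # Row r enters stage 0 at cycle r and leaves the last stage at cycle r + cols - 1.
    for t in range(rows + cols - 1):
        new_pipe = [0] * cols
        for j in range(cols):
            r = t - j  # row being processed by stage j in this cycle
            if 0 <= r < rows:
                psum_in = pipe[j - 1] if j > 0 else 0
                new_pipe[j] = psum_in + W[r][j] * x[j]
        pipe = new_pipe
        r_out = t - (cols - 1)
        if 0 <= r_out < rows:
            y.append(pipe[cols - 1])
    return y


W = [[1, 2], [3, 4], [5, 6]]
x = [10, 1]
print(systolic_matvec(W, x))
```

In this sketch every stage is busy once the pipeline fills, so a 3x2 product completes in 4 cycles rather than the 6 that a single multiply-accumulator would need.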
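Significant compute requirements for integer arithmetic
  • MLPs deliver up to 61 TOPS for int8

  • Modified Booth's algorithm doubles the density of integer multiplies in LUTs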
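
For readers unfamiliar with the technique, the sketch below shows textbook radix-4 (modified) Booth recoding, which halves the number of partial products in a signed multiply. It is only meant to show where the density gain can come from; how the recoding maps onto Speedster7t LUTs is not described here, and the helper names `booth_radix4_digits` and `booth_multiply` are hypothetical.

```python
def booth_radix4_digits(multiplier: int, bits: int = 8) -> list[int]:
    """Recode a signed multiplier into radix-4 Booth digits in {-2, -1, 0, 1, 2}.

    The result has bits / 2 digits, least significant first, so a multiply
    needs half as many partial products as plain shift-and-add.
    """
    if bits % 2:
        bits += 1  # sign-extend to an even width
    y = multiplier & ((1 << bits) - 1)  # two's complement bit pattern
    digits = []
    prev = 0  # implicit bit to the right of the LSB
    for i in range(0, bits, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(a: int, b: int, bits: int = 8) -> int:
    """Multiply a by a signed `bits`-wide b using the recoded digits."""
    digits = booth_radix4_digits(b, bits)
    # Each digit selects 0, +/-a, or +/-2a, shifted by two bit positions per digit.
    return sum((d * a) << (2 * i) for i, d in enumerate(digits))


print(booth_radix4_digits(127))
print(booth_multiply(-37, 127))
```

An 8-bit multiplier yields four digits, so only four partial products are summed instead of eight.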

Neural network inferencing requires a large number of matrix multiplications, high-performance computation, and significant amounts of data movement

Optimized multiply-accumulate core for integer and floating-point arithmetic

  • Truly fracturable integer widths: from 4x int16 to 16x int8 to 32x int4
  • FP16, bfloat16, and custom floating-point support
  • Native support for block floating point
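
Block floating point stores one shared exponent per block of values and a small integer mantissa per value, so the inner multiply-accumulate runs on integers. The following is a minimal sketch of the general idea, assuming 8-bit signed mantissas and a power-of-two shared scale; the function names and the rounding and clamping choices are illustrative assumptions, not the MLP's actual number format.

```python
import math


def bfp_quantize(block, mantissa_bits=8):
    """Quantize floats to signed integer mantissas that share one exponent."""
    max_abs = max(abs(v) for v in block)
    if max_abs == 0.0:
        return [0] * len(block), 0
    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1.
    _, e = math.frexp(max_abs)
    shared_exp = e - (mantissa_bits - 1)
    limit = (1 << (mantissa_bits - 1)) - 1  # e.g. 127 for 8 bits
    scale = 2.0 ** shared_exp
    mantissas = [max(-limit, min(limit, round(v / scale))) for v in block]
    return mantissas, shared_exp


def bfp_dequantize(mantissas, shared_exp):
    """Reconstruct approximate float values from mantissas and the shared exponent."""
    return [m * 2.0 ** shared_exp for m in mantissas]


def bfp_dot(a, b, mantissa_bits=8):
    """Dot product using integer multiply-accumulate and one final scaling."""
    ma, ea = bfp_quantize(a, mantissa_bits)
    mb, eb = bfp_quantize(b, mantissa_bits)
    acc = sum(p * q for p, q in zip(ma, mb))  # pure integer arithmetic
    return acc * 2.0 ** (ea + eb)


print(bfp_quantize([1.0, 0.5, -0.25, 0.0]))
print(bfp_dot([1.0, 0.5], [2.0, 4.0]))
```

The exponents are added once per block pair rather than once per element, which is what lets integer hardware cover a floating-point dynamic range at low cost.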

 

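Target applications: machine learning, deep learning, high-performance computing, genomics, video and image processing

  • Highest-performance SerDes
    • 112G multi-standard SR/MR/LR PHY
  • State-of-the-art interface IP
    • PCIe Gen5
    • GDDR6: 4 Tbps of memory bandwidth
    • DDR4: up to 3200 MHz, 3DS stacked memory
    • DDR5: up to 4,400 MHz
    • Application-specific interfaces
  • Terabit-speed routing
    • NoC
    • Bus routing
    • Fully flexible bit-wise routing
  • High-throughput processing
    • Datapath encryption
    • MLP
    • Fine-grained hardware reprogrammability (examples: format conversion, activation functions, Monte Carlo analysis, PairHMM algorithm, custom codecs)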