Nvidia has unveiled the Tesla V100, its first GPU based on the new Volta architecture. Like the Pascal-based P100 before it, the V100 is designed for high-performance computing rather than consumer use, but it still provides a tantalising glimpse at what the future might hold for Nvidia’s consumer graphics cards.
The V100 packs 21.1 billion transistors into an 815mm² die built on TSMC's 12nm FFN process; by contrast, the Titan Xp sports a mere 12 billion transistors on 471mm².
Suffice it to say, V100 is a giant GPU and one of the largest silicon chips ever produced, period.
The combination of die size and process shrink has enabled Nvidia to push the number of streaming multiprocessors (SMs) to 84. Each SM features 64 CUDA cores for a total of 5,376—far more than any of its predecessors. That said, V100 isn’t a fully enabled part: only 80 of its 84 SMs are active (most likely for yield reasons), resulting in 5,120 CUDA cores.
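The core counts above follow directly from the per-SM figure, as a quick sanity check shows:

```python
# CUDA core counts quoted above: 64 cores per SM,
# 84 SMs on the full GV100 die, 80 enabled on the shipping V100.
cores_per_sm = 64
full_sms = 84
enabled_sms = 80

full_cores = cores_per_sm * full_sms      # full die
v100_cores = cores_per_sm * enabled_sms   # shipping V100 part

print(full_cores)   # 5376
print(v100_cores)   # 5120
```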
V100 also features 640 tensor cores (TCs)—eight per active SM, with 672 on the full die—a new type of core designed specifically for machine learning operations. In tasks that can take advantage of them, Nvidia claims that the new tensor cores offer a 4x performance boost versus Pascal, which in theory makes the V100 a better performer than Google’s dedicated tensor processing unit (TPU).
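Each Volta tensor core performs a mixed-precision 4×4 matrix multiply-accumulate per clock: D = A×B + C, with FP16 inputs and FP32 accumulation. A rough NumPy emulation of that single operation (for illustration only—this is not Nvidia’s actual CUDA WMMA API):

```python
import numpy as np

# Emulate the operation one Volta tensor core performs per clock:
# D = A @ B + C, where A and B are 4x4 FP16 matrices and
# accumulation happens in FP32 (mixed precision).
def tensor_core_mma(a_fp16, b_fp16, c_fp32):
    # Inputs arrive as FP16; the multiply-accumulate runs in FP32.
    return a_fp16.astype(np.float32) @ b_fp16.astype(np.float32) + c_fp32

a = np.ones((4, 4), dtype=np.float16)
b = np.ones((4, 4), dtype=np.float16)
c = np.zeros((4, 4), dtype=np.float32)

d = tensor_core_mma(a, b, c)
print(d[0, 0])  # 4.0 — each output element is a 4-term dot product
```

Accumulating in FP32 is what lets the hardware run FP16 math without the rounding error piling up across long dot products.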
High-level performance of V100 is impressive: 15 teraflops of FP32, 30 teraflops of FP16, 7.5 teraflops of FP64, and a huge 120 teraflops for dedicated tensor operations. Should Nvidia reallocate the die space reserved for FP64 and tensor cores to FP32 units in a future consumer product (Titan XV anyone?), the gaming potential would be massive. Feeding the V100 GPU is 16GB of HBM2 memory clocked at 1.75GHz on a 4096-bit bus for 900GB/sec of bandwidth.
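The quoted memory bandwidth follows from the bus width and effective data rate—a back-of-the-envelope check, treating the 1.75GHz figure as the effective per-pin rate:

```python
bus_width_bits = 4096      # HBM2 interface width
effective_rate_ghz = 1.75  # effective per-pin data rate, as quoted

bytes_per_transfer = bus_width_bits / 8                 # 512 bytes
bandwidth_gb_s = bytes_per_transfer * effective_rate_ghz
print(bandwidth_gb_s)  # 896.0 — which Nvidia rounds to ~900GB/sec
```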
Despite the large die, the V100 GPU still runs at a peak 1455MHz. TDP is rated at 300W, and like its predecessor, V100 features Nvidia’s proprietary NVLink connector that allows multiple GPUs to connect directly to each other with more bandwidth than the PCI Express 3.0 bus. The difference is that V100 features NVLink 2, which sports a higher 25GB/s of bandwidth per link in each direction, as well as six NVLink connections per GPU versus four on GP100.
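Using only the figures quoted above, the per-direction NVLink budget for a single V100 works out as follows:

```python
links = 6           # NVLink connections per V100 GPU
per_link_gb_s = 25  # per-direction bandwidth per NVLink 2 link, as quoted

total_gb_s = links * per_link_gb_s
print(total_gb_s)   # 150 — GB/s of GPU-to-GPU bandwidth in each direction
```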
Ars Technica UK