Helios

From the Calcul Québec Wiki

Description

Helios is a supercomputer built from GPU (graphics processing unit) compute nodes. It has been installed in Université Laval's computing centre since the spring of 2014. The server was purchased with funds from researchers at Université Laval and the Université de Montréal, as well as funds from Calcul Québec. When it was commissioned in the summer of 2014, Helios was the largest GPU deployment administered by Compute Canada, in terms of both the number of GPUs and computing power.

Server Composition

Helios is composed of 15 compute nodes with eight nVidia K20 GPUs each, and 6 compute nodes with eight nVidia K80 boards each. Each K80 board contains two GPUs, for a total of 216 GPUs in the cluster (15 × 8 + 6 × 8 × 2).
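The GPU total can be checked with a few lines of Python. This is only a sketch of the arithmetic; the constant and function names are chosen here for illustration and do not come from any Helios tool.

```python
# Node and GPU counts as stated in the Server Composition section.
# All names below are illustrative, not part of any Helios software.
K20_NODES = 15
GPUS_PER_K20_NODE = 8
K80_NODES = 6
BOARDS_PER_K80_NODE = 8
GPUS_PER_K80_BOARD = 2


def total_gpus() -> int:
    """Return the number of GPUs in the whole cluster."""
    k20_gpus = K20_NODES * GPUS_PER_K20_NODE
    k80_gpus = K80_NODES * BOARDS_PER_K80_NODE * GPUS_PER_K80_BOARD
    return k20_gpus + k80_gpus


print(total_gpus())  # 216
```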

K20 nodes

On K20 nodes, the eight GPUs share four PCIe buses. Each node also has 128 GB of memory, two Intel Xeon (Ivy Bridge) processors with ten 2.5 GHz cores each (twenty cores per node), two 1 TB local hard drives configured in RAID-0, and an InfiniBand card. All nodes of the cluster are connected to one another by a non-blocking QDR InfiniBand network.

Detailed topology of Helios' K20 nodes

Helios' K20 nodes have the following topology:

[name@server $] nvidia-smi topo -m
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    mlx4_0    CPU Affinity
GPU0     X      PIX     PHB     PHB     SOC     SOC     SOC     SOC     PHB         0-9
GPU1    PIX      X      PHB     PHB     SOC     SOC     SOC     SOC     PHB         0-9
GPU2    PHB     PHB      X      PIX     SOC     SOC     SOC     SOC     PHB         0-9
GPU3    PHB     PHB     PIX      X      SOC     SOC     SOC     SOC     PHB         0-9
GPU4    SOC     SOC     SOC     SOC      X      PIX     PHB     PHB     SOC         10-19
GPU5    SOC     SOC     SOC     SOC     PIX      X      PHB     PHB     SOC         10-19
GPU6    SOC     SOC     SOC     SOC     PHB     PHB      X      PIX     SOC         10-19
GPU7    SOC     SOC     SOC     SOC     PHB     PHB     PIX      X      SOC         10-19
mlx4_0  PHB     PHB     PHB     PHB     SOC     SOC     SOC     SOC      X     
 
Legend:
 
  X   = Self
  SOC = Path traverses a socket-level link (e.g. QPI)
  PHB = Path traverses a PCIe host bridge
  PXB = Path traverses multiple PCIe internal switches
  PIX = Path traverses a PCIe internal switch
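The matrix above can also be read programmatically, for example to find which GPUs sit behind the same PCIe switch as a given GPU. The following Python sketch parses a copy of the K20 matrix; the function names `parse_topo` and `peers` are hypothetical helpers written for this example, and the parser assumes the column layout shown above (device columns followed by a "CPU Affinity" column).

```python
# Copy of the K20 matrix printed by `nvidia-smi topo -m` above.
SAMPLE = """\
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    mlx4_0    CPU Affinity
GPU0     X      PIX     PHB     PHB     SOC     SOC     SOC     SOC     PHB         0-9
GPU1    PIX      X      PHB     PHB     SOC     SOC     SOC     SOC     PHB         0-9
GPU2    PHB     PHB      X      PIX     SOC     SOC     SOC     SOC     PHB         0-9
GPU3    PHB     PHB     PIX      X      SOC     SOC     SOC     SOC     PHB         0-9
GPU4    SOC     SOC     SOC     SOC      X      PIX     PHB     PHB     SOC         10-19
GPU5    SOC     SOC     SOC     SOC     PIX      X      PHB     PHB     SOC         10-19
GPU6    SOC     SOC     SOC     SOC     PHB     PHB      X      PIX     SOC         10-19
GPU7    SOC     SOC     SOC     SOC     PHB     PHB     PIX      X      SOC         10-19
mlx4_0  PHB     PHB     PHB     PHB     SOC     SOC     SOC     SOC      X
"""


def parse_topo(text):
    """Return {device: {other_device: link_type}} from a topology matrix."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    # Device names are every header token before the "CPU Affinity" label.
    devices = header[:header.index("CPU")]
    links = {}
    for line in lines[1:]:
        fields = line.split()
        if fields[0] not in devices:
            break  # reached the legend or other trailing text
        links[fields[0]] = dict(zip(devices, fields[1:1 + len(devices)]))
    return links


def peers(links, device, link_type):
    """List the GPUs reachable from `device` over the given link type."""
    return [name for name, kind in links[device].items()
            if kind == link_type and name.startswith("GPU")]


links = parse_topo(SAMPLE)
print(peers(links, "GPU0", "PIX"))  # ['GPU1']
print(peers(links, "GPU0", "PHB"))  # ['GPU2', 'GPU3']
```

Read this way, the matrix shows the GPUs paired two by two behind a PCIe switch (PIX), with two pairs per CPU socket (PHB) and the two sockets joined by a socket-level link (SOC).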


K80 nodes

On K80 nodes, the 16 GPUs share four PCIe buses. Each node also has 256 GB of memory, two Intel Xeon E5-2697 v2 (Ivy Bridge) processors with twelve 2.7 GHz cores each (24 cores per node), two 180 GB SSDs configured in RAID-0 (330 GB of usable space), and an InfiniBand card. All nodes of the cluster are connected to one another by a non-blocking QDR InfiniBand network.

Detailed topology of Helios' K80 nodes

Helios' K80 nodes have the following topology:

[name@server $] nvidia-smi topo -m
	GPU0	GPU1	GPU2	GPU3	GPU4	GPU5	GPU6	GPU7	GPU8	GPU9	GPU10	GPU11	GPU12	GPU13	GPU14	GPU15	mlx4_0	CPU Affinity
GPU0	 X 	PIX	PXB	PXB	PHB	PHB	PHB	PHB	SOC	SOC	SOC	SOC	SOC	SOC	SOC	SOC	PHB	0-11
GPU1	PIX	 X 	PXB	PXB	PHB	PHB	PHB	PHB	SOC	SOC	SOC	SOC	SOC	SOC	SOC	SOC	PHB	0-11
GPU2	PXB	PXB	 X 	PIX	PHB	PHB	PHB	PHB	SOC	SOC	SOC	SOC	SOC	SOC	SOC	SOC	PHB	0-11
GPU3	PXB	PXB	PIX	 X 	PHB	PHB	PHB	PHB	SOC	SOC	SOC	SOC	SOC	SOC	SOC	SOC	PHB	0-11
GPU4	PHB	PHB	PHB	PHB	 X 	PIX	PXB	PXB	SOC	SOC	SOC	SOC	SOC	SOC	SOC	SOC	PHB	0-11
GPU5	PHB	PHB	PHB	PHB	PIX	 X 	PXB	PXB	SOC	SOC	SOC	SOC	SOC	SOC	SOC	SOC	PHB	0-11
GPU6	PHB	PHB	PHB	PHB	PXB	PXB	 X 	PIX	SOC	SOC	SOC	SOC	SOC	SOC	SOC	SOC	PHB	0-11
GPU7	PHB	PHB	PHB	PHB	PXB	PXB	PIX	 X 	SOC	SOC	SOC	SOC	SOC	SOC	SOC	SOC	PHB	0-11
GPU8	SOC	SOC	SOC	SOC	SOC	SOC	SOC	SOC	 X 	PIX	PXB	PXB	PHB	PHB	PHB	PHB	SOC	12-23
GPU9	SOC	SOC	SOC	SOC	SOC	SOC	SOC	SOC	PIX	 X 	PXB	PXB	PHB	PHB	PHB	PHB	SOC	12-23
GPU10	SOC	SOC	SOC	SOC	SOC	SOC	SOC	SOC	PXB	PXB	 X 	PIX	PHB	PHB	PHB	PHB	SOC	12-23
GPU11	SOC	SOC	SOC	SOC	SOC	SOC	SOC	SOC	PXB	PXB	PIX	 X 	PHB	PHB	PHB	PHB	SOC	12-23
GPU12	SOC	SOC	SOC	SOC	SOC	SOC	SOC	SOC	PHB	PHB	PHB	PHB	 X 	PIX	PXB	PXB	SOC	12-23
GPU13	SOC	SOC	SOC	SOC	SOC	SOC	SOC	SOC	PHB	PHB	PHB	PHB	PIX	 X 	PXB	PXB	SOC	12-23
GPU14	SOC	SOC	SOC	SOC	SOC	SOC	SOC	SOC	PHB	PHB	PHB	PHB	PXB	PXB	 X 	PIX	SOC	12-23
GPU15	SOC	SOC	SOC	SOC	SOC	SOC	SOC	SOC	PHB	PHB	PHB	PHB	PXB	PXB	PIX	 X 	SOC	12-23
mlx4_0	PHB	PHB	PHB	PHB	PHB	PHB	PHB	PHB	SOC	SOC	SOC	SOC	SOC	SOC	SOC	SOC	 X 	
 
Legend:
 
  X   = Self
  SOC = Path traverses a socket-level link (e.g. QPI)
  PHB = Path traverses a PCIe host bridge
  PXB = Path traverses multiple PCIe internal switches
  PIX = Path traverses a PCIe internal switch
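The "CPU Affinity" column indicates which cores are attached to the same socket as each GPU: cores 0-11 for GPU0 to GPU7 and cores 12-23 for GPU8 to GPU15. A process can use this to stay on the socket closest to its GPU. The Python sketch below illustrates the idea under stated assumptions: the helper names are hypothetical, `os.sched_setaffinity` is available on Linux only, and nothing here describes how the Helios job scheduler itself assigns cores or GPUs.

```python
import os

# (GPU indices, CPU cores) pairs taken from the CPU Affinity column
# of the K80 matrix above. The table name is illustrative.
K80_AFFINITY = [
    (range(0, 8), range(0, 12)),    # GPU0-GPU7  -> cores 0-11
    (range(8, 16), range(12, 24)),  # GPU8-GPU15 -> cores 12-23
]


def cores_for_gpu(gpu):
    """Return the set of cores on the same socket as the given GPU index."""
    for gpus, cores in K80_AFFINITY:
        if gpu in gpus:
            return set(cores)
    raise ValueError(f"no such GPU on a K80 node: {gpu}")


def pin_process_to_gpu(gpu):
    """Hypothetical helper: expose one GPU and pin this process near it.

    CUDA_VISIBLE_DEVICES must be set before the CUDA runtime is initialised.
    os.sched_setaffinity is Linux-only and fails if the cores do not exist.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu)
    os.sched_setaffinity(0, cores_for_gpu(gpu))


print(sorted(cores_for_gpu(9)))  # [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
```

On a K80 node, calling `pin_process_to_gpu(9)` at the start of a program would restrict it to cores 12-23, which avoids crossing the socket-level link (SOC) when exchanging data with GPU9.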



Filesystems

Helios shares the Lustre filesystems of Colosse.

Useful Links

You will find more information about Helios in the appropriate section of the following pages:
