Ceph Updates

Ceph has been deployed to the new machines, a set of three commodity boxes – not servers. Everything is connected via InfiniBand (unfortunately IP-over-IB rather than native InfiniBand, but that’s a different topic).

It’s unclear why – the machines themselves, the relative lack of memory (8 or 16GB of RAM in each), or the InfiniBand switch (a Mellanox IB5030) – but things are not going as smoothly as anticipated. Everything generally works, but there have been two random OSD crashes and constant issues with dropped SSH connections and high ping latency (sometimes over 2 seconds). Connecting to the machines via Ethernet shows no issues, whereas I have seen drops over the InfiniBand connection. System load, wait time, and memory utilization are all minimal.
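One way to put numbers on the difference between the two paths is to capture ping output over each and summarize the round-trip times. The following is a minimal sketch in Python, assuming the per-reply "time=... ms" format printed by Linux ping. The function name summarize_ping, the 1000 ms threshold, and the sample captures (addresses and timings alike) are hypothetical and were not taken from these machines.

```python
import re
import statistics

# Matches the "time=12.3 ms" field that Linux ping prints for each reply.
RTT_PATTERN = re.compile(r"time=([\d.]+)\s*ms")


def summarize_ping(output, slow_ms=1000.0):
    """Summarize round-trip times found in captured ping output.

    slow_ms is an arbitrary, assumed threshold for a "slow" reply.
    """
    rtts = [float(value) for value in RTT_PATTERN.findall(output)]
    if not rtts:
        return {"replies": 0, "median_ms": None, "max_ms": None, "slow": 0}
    return {
        "replies": len(rtts),
        "median_ms": statistics.median(rtts),
        "max_ms": max(rtts),
        "slow": sum(1 for rtt in rtts if rtt > slow_ms),
    }


# Hypothetical captures, one per path; addresses and timings are made up.
ETHERNET_SAMPLE = """\
64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.210 ms
64 bytes from 192.0.2.11: icmp_seq=2 ttl=64 time=0.198 ms
64 bytes from 192.0.2.11: icmp_seq=3 ttl=64 time=0.230 ms
"""

IPOIB_SAMPLE = """\
64 bytes from 198.51.100.11: icmp_seq=1 ttl=64 time=0.150 ms
64 bytes from 198.51.100.11: icmp_seq=2 ttl=64 time=2104 ms
64 bytes from 198.51.100.11: icmp_seq=4 ttl=64 time=0.161 ms
"""

if __name__ == "__main__":
    for label, sample in (("ethernet", ETHERNET_SAMPLE), ("ipoib", IPOIB_SAMPLE)):
        print(label, summarize_ping(sample))
```

Running the same summary against captures taken at the same time over both interfaces would show whether the latency spikes are confined to the IP-over-IB path. It would not, on its own, separate the switch from the IP-over-IB stack.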

The question then becomes whether the cause is the switch or the Linux IP-over-IB stack. Since there is no alternative stack to try, the next project will be a replacement switch.