<div>Generally, the recommended approach is to use 4TB disks, with no more than 10-12 disks per HW RAID array.</div><div>Of course, that's not always possible, but a resync of a failed 14TB drive will take eons.</div><div><br></div><div>I'm not sure whether the Ryzens support ECC memory, but if they do, go for it.</div><div><br></div><div>In both scenarios, always align the upper layers (LVM, FS) with the stripe width and stripe size of the underlying array.</div><div><br></div><div>What kind of workload do you have?</div><div><br></div><div>Best Regards,</div><div>Strahil Nikolov <br> <br> <blockquote style="margin: 0 0 20px 0;"> <div style="font-family:Roboto, sans-serif; color:#6D00F6;"> <div>On Sat, Mar 18, 2023 at 14:36, Martin Bähr</div><div>&lt;mbaehr+gluster@realss.com&gt; wrote:</div> </div> <div style="padding: 10px 0 0 20px; margin: 10px 0 0 0; border-left: 1px solid #6D00F6;"> <div dir="ltr"><br></div><div dir="ltr">hi,<br></div><div dir="ltr"><br></div><div dir="ltr">our current servers are suffering from a weird hardware issue that<br></div><div dir="ltr">forces us to start over.<br></div><div dir="ltr"><br></div><div dir="ltr">in short we have two servers with 15 disks at 6TB each, divided into<br></div><div dir="ltr">three raid5 arrays for three bricks per server at 22TB per brick.<br></div><div dir="ltr">each brick on one server is replicated to a brick on the second server.<br></div><div dir="ltr">the hardware issue is that somewhere in the backplane random I/O errors<br></div><div dir="ltr">happen when the system is under load. these cause the raid to fail<br></div><div dir="ltr">disks, although the disks themselves are perfectly fine. 
reintegration<br></div><div dir="ltr">of the disks causes more load and is therefore difficult.<br></div><div dir="ltr"><br></div><div dir="ltr">we have been running these servers for at least four years, and the problem<br></div><div dir="ltr">only started appearing about three months ago.<br></div><div dir="ltr">our hosting provider acknowledged the issue but does not support moving<br></div><div dir="ltr">the disks to different servers. (they replaced the hardware but that<br></div><div dir="ltr">didn't help)<br></div><div dir="ltr"><br></div><div dir="ltr">so we need to start over.<br></div><div dir="ltr"><br></div><div dir="ltr">my first intuition was that we should have smaller servers with fewer<br></div><div dir="ltr">disks to avoid repeating the above scenario.<br></div><div dir="ltr"><br></div><div dir="ltr">we also previously had issues with the load created by raid resync so we<br></div><div dir="ltr">are considering skipping raid altogether and relying on gluster replication<br></div><div dir="ltr">instead. (by compensating with three replicas per brick instead of two)<br></div><div dir="ltr"><br></div><div dir="ltr">our options are:<br></div><div dir="ltr"><br></div><div dir="ltr">6 of these:<br></div><div dir="ltr">AMD Ryzen 5 Pro 3600 - 6c/12t - 3.6GHz/4.2GHz<br></div><div dir="ltr">32GB - 128GB RAM<br></div><div dir="ltr">4 or 6 × 6TB HDD SATA <br></div><div dir="ltr">6Gbit/s<br></div><div dir="ltr"><br></div><div dir="ltr">or three of these:<br></div><div dir="ltr">AMD Ryzen 7 Pro 3700 - 8c/16t - 3.6GHz/4.4GHz<br></div><div dir="ltr">32GB - 128GB RAM<br></div><div dir="ltr">6× 14TB HDD SAS<br></div><div dir="ltr">6Gbit/s<br></div><div dir="ltr"><br></div><div dir="ltr">i would configure 5 bricks on each server (leaving one disk as a hot<br></div><div dir="ltr">spare)<br></div><div dir="ltr"><br></div><div dir="ltr">the engineers prefer the second option due to the architecture and SAS<br></div><div dir="ltr">disks. 
it is also cheaper.<br></div><div dir="ltr"><br></div><div dir="ltr">i am concerned that 14TB disks will take too long to heal if one ever has<br></div><div dir="ltr">to be replaced and would favor the smaller disks.<br></div><div dir="ltr"><br></div><div dir="ltr">the other question is, is skipping raid a good idea?<br></div><div dir="ltr"><br></div><div dir="ltr">greetings, martin.<br></div> </div> </blockquote></div>
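<div><br></div><div>To put rough numbers on the resync-time and alignment points at the top of this reply, here is a minimal Python sketch. Everything in it is an assumption for illustration: the 150 MB/s sustained rebuild rate, the 256 KiB stripe unit, the 5-disk RAID5 layout, and the helper names <code>rebuild_hours</code> and <code>full_stripe_kib</code>. Real resyncs under production load are usually slower than the best case computed here.</div>

```python
# Back-of-the-envelope numbers only; every input below is an assumption.


def rebuild_hours(disk_tb: float, mb_per_s: float) -> float:
    """Best-case hours to rewrite a whole disk at a sustained sequential rate."""
    total_mb = disk_tb * 1_000_000  # decimal units: 1 TB = 10^6 MB
    return total_mb / mb_per_s / 3600


def full_stripe_kib(stripe_unit_kib: int, disks: int, parity_disks: int = 1) -> int:
    """Full stripe = per-disk stripe unit (chunk) times the number of data disks."""
    data_disks = disks - parity_disks
    return stripe_unit_kib * data_disks


if __name__ == "__main__":
    # Assumed sustained rebuild rate of 150 MB/s with no competing I/O.
    for size_tb in (4, 6, 14):
        print(f"{size_tb} TB disk: {rebuild_hours(size_tb, 150):.1f} h")

    # Assumed RAID5 over 5 disks (4 data + 1 parity) with a 256 KiB stripe unit.
    print(f"full stripe: {full_stripe_kib(256, disks=5)} KiB")
```

<div>With those assumed values, the geometry would be passed to the upper layers as a stripe unit of 256 KiB across 4 data disks, for example via the <code>su</code> and <code>sw</code> data options of <code>mkfs.xfs</code> and the <code>--dataalignment</code> option of <code>pvcreate</code>. The actual values must come from the RAID controller's configuration.</div>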