I’ve been doing upgrades and migrations for businesses for a decade. They never go as planned, and in general I advise clients that change is painful.
Unfortunately, THR has undergone major change and is experiencing the related pain.
I moved us off a relatively simple setup that was failing for two reasons: frequent power outages, and a server with a hardware issue that kept it from starting properly after the power went out. We’ve moved from that datacenter to one that seems to be much better, and we’ve moved away from the failing hardware*.
Now we’ve got better power, better bandwidth, more capacity for servers and services, and a whole lot more moving parts. The moving parts and their interactions are where the problem seems to lie.
Things aren’t as good as I hope they will be, but we’re moving in the right direction. Getting everything settled down will require vigilance and careful attention to detail. The instinctive response in these situations is to brainstorm what might be going wrong, replace all the affected moving parts, and hope that the problems resolve themselves. Unfortunately, that is generally the wrong instinct**, so we will be moving more methodically instead. Each proven problem (as in, one we can replicate) gets corrected, data gets gathered, and when the next failure happens it will be addressed in turn.
Please be patient as we get things settled down.
* Dell has diagnosed the problem with the old machine as a bad motherboard, and will be repairing it tomorrow, but I’m inclined to avoid that server until everything else is diagnosed properly.
** In diagnosing the problem with the server mentioned above, I found a way to work around the problem. Replicating the failure and working through a proper diagnosis identified a much bigger problem (a failing motherboard) that would have bitten us eventually, so getting it corrected rather than just working around the problem is the best solution.