We’d like to give you some additional information about the service disruption that occurred in the Montreal (PRD.MNTL) Region on the morning of March 24th.
The Green-Light Security and Abuse team was assisting a customer with a security-related incident, and while we were working with our upstream provider to resolve the issue, all traffic to the system was interrupted. When we engaged with the technical team about possible reasons for the traffic loss, it became evident that the problem was an administrative oversight at the upstream data centre that resulted in our system being deprovisioned.
This issue compounded with an earlier set of open technical tickets, related to the redundancy configuration of our DNS system, that we had been working to resolve since the previous week. Together, these problems took out DNS service on PRD.VAN soon after. With neither location providing DNS, all other services soon became unreachable.
Our team and our business have evolved over the last 10 years, and organising around these types of problems requires a new way of thinking. We are therefore going to outsource the management of our hosting platform to the technical team at RepairFactory.ca.
The Repair Factory is an IT company started by Green-Light in 2012. It is a separate team focused specifically on IT and, as such, is better suited from an organisational perspective to design, build and manage our next-generation cluster and our future infrastructure. This will result in better service delivery for our legacy hosting clients.
The Changes Coming
Starting in the next few weeks, Repair Factory will provision a new high-availability cluster on Green-Light’s behalf with OVH, one of the largest data centre companies in the world. This system will run in a Proxmox Virtual Environment with a clustered XFS+Ceph storage backend, which will allow us to seamlessly and automatically remove any problem system from service without affecting the availability of email, web or DNS. The Repair Factory team has been deploying and managing this style of environment for clients with great success, and we believe it will help prevent problems like this one in future.
Is Green-Light a Good Fit?
The problems we see that need solving in the world today are not the problems we saw when we started.
One concern I’ve heard loud and clear since yesterday is phone access.
Our role has grown to handle behind-the-scenes business problems like automating payroll, building online stories, integrating internal data silos with external portals, and moon shots like aquaponic food security. This increasingly makes us ill-suited to handle short-term spikes in phone demand.
I’m including this here because it’s important you know that staffing the phone is something we’re less and less able to manage. It’s not because we don’t love you; it’s that we’re a small, focused team of only 5 people, and working on an outage while staffing the phone during a spike in demand is too much for us to handle all at once.
Friday was a day where we put the phone down so we could fix the problem.
So while we’re committed to improving the infrastructure and reducing downtime, that commitment comes without the promise of free on-demand phone support. As we move forward, that will increasingly be the case.
So for those of you who require that type of access, we’re no longer a good fit for service, and it’s better to have us help you move to someone who wants to staff that call centre 24/7 than to promise that someday we might have a better call centre.
Email firstname.lastname@example.org to set up some time to review your needs and help you evaluate a switch if required, or call 8668045359 and, if required, leave a voicemail for a callback. I promise it won’t be too long, and we’ll find you a provider who’s a great fit with all the phone support you need.