25 Mar 2017

Outage Report from Friday March 24, 2017

What Happened

We’d like to give you some additional information about the service disruption that occurred in the Montreal (PRD.MNTL) Region on the morning of March 24th.

The Green-Light Security and Abuse team was assisting a customer with a security-related incident, and while we were working with our upstream provider to resolve the issue, all traffic to the system was interrupted. When we engaged with the technical team about possible reasons for the traffic loss, it became evident that the problem was an administrative oversight at the upstream data centre that resulted in our system being deprovisioned.

This was compounded by a set of technical tickets, opened the previous week and still being worked towards resolution, related to the redundancy configuration of our DNS system. Together the issues took out DNS service in PRD.VAN soon after. With neither location providing DNS, all other services were soon unreachable.

Re-Organization

Our team and our business have evolved over the last 10 years, and organising around these types of problems requires a new way of thinking. Therefore, we are going to outsource the management of our hosting platform to the technical team at RepairFactory.ca.

The Repair Factory is an IT company started by Green-Light in 2012. It’s a separate team specifically focused on IT and, as such, is better suited from an organisational perspective to design, build out and manage our next-generation cluster and our future infrastructure. This is going to result in better service delivery for our legacy hosting clients.

The Changes Coming

Starting in the next few weeks, Repair Factory will be provisioning a new high-availability cluster on Green-Light’s behalf with OVH, one of the largest data centre companies in the world. This system will operate in a Proxmox Virtual Environment with a clustered xfs+ceph storage backend that will allow us to seamlessly and automatically remove any problem system from service without affecting the availability of email, web or DNS. The Repair Factory team has been deploying and managing this style of environment for clients with great success, and we believe it will help prevent problems like this one in the future.

Is Green-Light a Good Fit?

The problems we see that need solving in the world today are not the problems we saw when we started.

One concern I’ve heard loud and clear since yesterday is phone access.

Our role has grown to handle behind-the-scenes business problems like automating payroll, building online stores, integrating internal data silos with external portals, and moon shots like aquaponic food security. That increasingly makes us ill-suited to handle short-term spikes in phone demand.

The point of including this here is that it’s important you know that staffing the phone is something we’re less able to manage. It’s not because we don’t love you; it’s that we’re a small, focused team of only 5 people, and working on an outage while staffing the phone during a spike in demand is too much for us to handle all at once.

Friday was a day where we put the phone down so we could fix the problem. 

So while we’re committed to improving the infrastructure and reducing downtime, that commitment comes without the promise of free on-demand phone support. As we move forward, that will increasingly be the case.

So for those of you who require that type of access, we’re no longer a good fit for service. It’s better to have us help you move to someone who wants to staff that call centre 24/7 than to promise that someday we might have a better call centre.

Email help@green-light.ca to set up some time to review your needs and help you evaluate a switch if required, or call 8668045359 and leave a voicemail for a callback. I promise it won’t be too long, and we’ll find you a provider who’s a great fit with all the phone support you need.

Regards
Keith Page
CEO

 

11 Apr 2013

host3.green-light.ca – Mail Delivery Problem

Some clients are getting Mail Delivery Problem notices when sending to Yahoo. The root cause of this problem has been fixed and we expect the issue to dissipate completely within 48 hours.

The issue was caused by an exposed script from an outdated version of software, which was being used to send spam to Yahoo and a few other providers. It only takes a few hours of this type of activity to cause temporary reputation problems with providers like Yahoo that can last many days. It highlighted an oversight in our monitoring of this particular system that prevented us from being notified until the reputation of host3 had already begun to sink. We’re sorry for that and have made changes to prevent it in the future.

One of the direct benefits of our CodeWatch program is that it would have prevented this issue regardless of our oversight. We’re systemizing much of the work done by traditional web developers to deliver a faster, more secure platform. With CodeWatch, your website code is reviewed on a regular basis for outdated components and updated by qualified professionals. Outdated software can be a very real security problem in the online world, used to amplify and distribute the messages of malware authors, identity thieves and their ilk to unsuspecting individuals worldwide.

It’s therefore increasingly important to keep any application, online or off, updated and secure with CodeWatch.

 

09 Sep 2012

Host5 – Outage – RESTORED

Host5 is currently offline. Its network path is congested at the data centre. We are working to identify and resolve the issue.

** Update 4:16pm **

An issue in the virtualization platform was causing near-complete packet loss. The system has been re-initialized and this has fixed the problem.

 

27 Aug 2012
19 Jun 2012
17 Apr 2012

West Coast morning outage

Update: We have had a recurrence of the same problem this afternoon. iWeb has provided the following information about the downtime.

Customers on our cloud implementation with iweb.ca experienced an interruption in service of roughly 30 minutes, from 9:11 am PST to 9:45 am PST. We are still working with our upstream provider to mitigate the problem in the future.

 

29 Feb 2012

Monitor System Upgrade Complete.

After the flurry of activity over the last 6 months, we’ve had to double the capacity of our monitoring systems. We’re making great use of Zenoss to sharpen our daily understanding of the overall health of your critical systems. We’ll be integrating more of this intelligence into our main site over 2012. It’s going to be more than cool graphs, people.

19 Nov 2011

Host3 Upgrade Scheduled

Users on host3 in Montreal will be migrated over the coming weeks to new systems. These are scheduled hardware replacements and users should expect an overall improvement in service.

Update: Dec 12, 2011

The new server configuration requires encrypted outgoing mail authentication, which is the default for most mobile devices. You may need to enable the option in your mail client after the migration. Contact our office at 18668045359 for information on how to do this.

25 Aug 2011

Restored: Email Service Host4

We have corrected an issue that caused a 3-hour email outage between 6am MST and 9am MST for users on Host4.green-light.ca. The issue relates to a new backup regime that was put in place the night before, which caused a disk critical to email operation to fill up. The problem has been corrected and no mail was lost, but some mail was delayed in its delivery.
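
For clients who run their own servers and want to guard against a disk filling up unnoticed, the kind of check involved can be sketched roughly as below. This is only an illustration and not our production monitoring: the function names, the 80% and 90% thresholds, and the path being checked are all assumptions made for the example.

```python
import shutil


def classify_usage(used_bytes, total_bytes, warn=0.80, crit=0.90):
    """Return 'ok', 'warning' or 'critical' for a used/total pair.

    The warn/crit thresholds are illustrative assumptions, not
    recommended values for any particular system.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    fraction = used_bytes / total_bytes
    if fraction >= crit:
        return "critical"
    if fraction >= warn:
        return "warning"
    return "ok"


def check_path(path, warn=0.80, crit=0.90):
    """Classify the disk that holds `path` using shutil.disk_usage."""
    usage = shutil.disk_usage(path)
    return classify_usage(usage.used, usage.total, warn, crit)


if __name__ == "__main__":
    # "." is a placeholder; on a mail server you would point this at
    # whichever mount holds the mail spool (hypothetical here).
    print(".", check_path("."))
```

Run on a schedule, a check like this can raise a warning while there is still room to react, for example before a new backup job finishes filling the disk.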

30 Jun 2011

© 2018 Green-Light. All rights reserved.