Maintenance Night Peace of Mind

If you work with the WCI Portal suite for any length of time, then I'm willing to bet that at some point you've run into the following situation: You deploy a change to your Development environment and everything looks great. The customer takes a look, is excited, and approves the change to go to production. During your next maintenance window, you deploy the shiny new change to production, test that it works and call it a night. The next morning, things don't look so rosy. The change you made is working great (that's the good news). The bad news? The other half of the website is broken!

Testing a change in your Development environment is definitely better than "winging it" by deploying to production untested. But, as we all know, keeping a Development environment in sync with the Production environment is challenging, time-consuming, and pretty much always neglected. So, how can we ensure that new changes introduced into production work as intended? And, perhaps more importantly, how can we ensure that nothing else broke as a result?

As you may already know, here at F1, we created a product called Watcher to tackle the problem of monitoring the health of your WCI services. Watcher will notify system admins by email whenever a back-end service (like Automation, Publisher, Tomcat, IIS, etc.) has crashed or recovered. See here for more info about Watcher.

After using Watcher on my project for the past year, I would find it very difficult to move to a project without it. Once you have a comprehensive rule set defined for your environment, Watcher will notify you of most of the problems in your environment before they become real problems! For example, last week, Watcher helpfully notified us that the hard disk on our file server was 90% full and that a service related to Oracle database backups was failing. Thanks to Watcher, these two issues were resolved in a day, before any end users noticed. Without Watcher, these two events could easily have gone unnoticed and caused a pretty significant portal outage and/or loss of valuable data.

Watcher is great, but since it lives on the back-end servers, it's not possible for Watcher to know what's going on in the end users' browsers. For example, Watcher has no way to detect JavaScript problems, browser version inconsistencies, problems with browser or proxy cache, etc. So, how do we test the front end? If you've not heard of it before, I'd like to suggest investigating a solution called Selenium. Selenium is an open source Java project that can record and then play back an end user's clicks and actions in a browser. An example is worth a thousand words. Let's use Selenium to record the clicks and actions needed to ensure that we can log in to a WCI portal and then log off successfully.

Click the following link to view a screencast that shows how to create a simple Selenium Test Case:

Simple Selenium Test Case - Log into WCI Portal

As demonstrated, it took less than two minutes to create a Selenium script, fix a small bug in the script, and then replay the script to ensure that portal login is functioning as expected! And that's really just scratching the surface of what Selenium is capable of! The Selenium IDE offers dozens of commands you can use to record, test, and replay actions inside a web browser session. If you require more advanced scripts, then the Selenium Test Cases can be exported out of Selenium IDE and into source code (Ruby, Java, C#, Perl, PHP, Groovy, and more). The source code can then be executed to run the test cases (simulate user actions inside a browser) outside of the Selenium IDE. Once you have a Selenium test case exported as source code, the sky's the limit on what you can test!
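To give a feel for what a test case looks like once it lives in source code, here is a minimal sketch in Python using the Selenium WebDriver bindings. It is not the script from the screencast and not the exact output of an IDE export. The portal URL, the field names (`username`, `password`), and the "Log Off" link text are placeholders I made up for illustration; substitute whatever your portal's login page actually uses.

```python
"""Sketch of a login/logout smoke test driven through Selenium WebDriver."""

# Hypothetical values: replace them with your portal's URL and the real
# field names and link text found on its login page.
PORTAL_URL = "http://portal.example.com/"
USER_FIELD = "username"
PASSWORD_FIELD = "password"
LOGOFF_LINK_TEXT = "Log Off"


def check_login_logout(driver, url, username, password):
    """Log in, click the log-off link, and report whether every step worked.

    `driver` is any object exposing the WebDriver methods used below.
    The locator strategies "name" and "link text" are the string values
    behind Selenium's By.NAME and By.LINK_TEXT constants.
    """
    try:
        driver.get(url)
        driver.find_element("name", USER_FIELD).send_keys(username)
        password_box = driver.find_element("name", PASSWORD_FIELD)
        password_box.send_keys(password)
        password_box.submit()
        # The log-off link only exists after a successful login, so
        # finding and clicking it covers both halves of the check.
        driver.find_element("link text", LOGOFF_LINK_TEXT).click()
        return True
    except Exception as exc:  # broad on purpose: this is only a sketch
        print(f"login/logout check failed: {exc}")
        return False


def run_against_firefox(username, password):
    """Run the check in a real Firefox session (needs `pip install selenium`)."""
    from selenium import webdriver  # imported lazily; third-party package

    driver = webdriver.Firefox()
    driver.implicitly_wait(10)  # give slow pages up to 10 seconds
    try:
        return check_login_logout(driver, PORTAL_URL, username, password)
    finally:
        driver.quit()
```

Calling `run_against_firefox("someuser", "somepassword")` opens a browser, walks through the steps, and returns `True` or `False`. Keeping the steps in a function that accepts any driver also makes it easy to point the same check at a different browser later.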

On my current project, all team members are encouraged to record a new Selenium Test Case whenever they make any kind of change to the system (updates to portlet source code, installing new servers, custom UI changes, etc.). We have Maintenance Windows scheduled periodically so that we can take systems offline to deploy our changes. Maintenance Windows are typically scheduled off-hours, and sometimes we find ourselves working late at night deploying new changes to production. It's really helpful to run the automated Selenium tests before and after the changes are applied. When the tests run successfully, we can be confident that the latest changes are working as expected and that nothing else broke in the process.
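The before-and-after routine is easy to automate once the tests are callable from code. The sketch below is one way to do it, not a description of our actual tooling: it assumes each test is a Python callable that returns `True` on success, and the test names in the usage note are invented for illustration.

```python
def run_suite(tests):
    """Run each test callable and record whether it passed.

    `tests` maps a test name to a zero-argument callable that returns a
    truthy value on success. A test that raises counts as a failure.
    """
    results = {}
    for name, test in tests.items():
        try:
            results[name] = bool(test())
        except Exception:
            results[name] = False
    return results


def find_regressions(before, after):
    """Return the names of tests that passed before the change but not after."""
    return sorted(
        name
        for name, passed in before.items()
        if passed and not after.get(name, False)
    )
```

In practice you would call `run_suite` once before the deployment and once after, then pass both result dictionaries to `find_regressions`. For example, with `before = {"login": True, "search": True}` and `after = {"login": True, "search": False}`, the function returns `["search"]`, which tells you exactly what to look at before going home.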

If you think that's cool, take a look at the Selenium Grid project. Selenium Grid adds the ability to run your Selenium scripts in parallel, from multiple clients, and in multiple types of browsers! When your Selenium test suite grows to dozens of tests, it can be time-consuming to run each test case back to back to back. Selenium Grid can help speed up the process by running several tests simultaneously. And for even more peace of mind, you might set up Selenium Grid to run each test in a different browser.
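To illustrate why running tests side by side saves time, here is a small sketch that uses only Python's standard library. To be clear, this is not Selenium Grid and does not use its API. It only shows the general idea of dispatching independent test cases concurrently. With a real Grid, each test callable would open its own remote browser session against the Grid hub instead of running locally.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(tests, max_workers=4):
    """Run independent test callables concurrently and collect pass/fail.

    `tests` maps a test name to a zero-argument callable returning a
    truthy value on success. A test that raises counts as a failure.
    """
    def run_one(item):
        name, test = item
        try:
            return name, bool(test())
        except Exception:
            return name, False

    # Each test runs in its own worker thread. Browser tests spend most
    # of their time waiting on the network, so threads overlap well.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, tests.items()))
```

This only works when the test cases do not depend on each other (for example, one test logging off a session that another test is still using), which is worth keeping in mind when you record them.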

I think taking the time to learn about Selenium and write test cases is worth the effort. I was definitely glad that I had at 2 a.m. during our last Maintenance Window!
