On SaaS Products

Developing SaaS? Forget Scrum, Check Out Kanban and Similar Approaches


Earlier this week I had a chance to present WebCollage’s agile development methodology at a local Agile Practitioners meeting.

At WebCollage, we release a new version of our SaaS-based solution to our customers every two weeks. We released 23 versions in 2011, and will be releasing the 6th version of 2012 over this upcoming weekend. In other words, we are firm believers in agile development and in its ability to help obtain continuous market feedback (here’s a previous post on this topic).

For various reasons, though, agile development has become somewhat synonymous with one specific approach, namely Scrum. Realizing that Scrum is widely accepted, I previously expressed an opinion that Scrum is perhaps an interesting recipe, but is far from being the best approach to SaaS agile development (and web application development in general). I have received quite a lot of feedback on that other post, some with contrarian views arguing that Scrum is perhaps a silver bullet after all.

There’s always something to be said for using the most popular approach. As an old IT saying goes, no one ever got fired for buying IBM. In this regard, there are intrinsic advantages to using Scrum, most notably the industry ecosystem: ability to easily find knowledge, share best practices, etc.

As far as the actual methodology goes, though, there are simply better alternatives for many software development scenarios. Here’s a sketch of how we at WebCollage develop software, and the advantages it has over Scrum. Our approach is an adaptation of Kanban/Lean software development.

As mentioned above, our approach is based on software Kanban, so a few words about software Kanban are in order. Oftentimes, software Kanban is marketed as an “evolutionary” approach rather than a “revolutionary” approach (a trait attributed to Scrum). A different way to view Kanban, though, is as a set of (Agile) principles and practices, stripped of some of Scrum’s “New Age” ideas.

At the end of the day, when it comes to a software development methodology, whether people have stand-up or sit-down meetings, work in the same office or in separate offices, or hang around for beer after office hours are all nice preferences, but they have nothing to do with software development in particular. Some people management practices are arguably more successful than others, especially in the 21st century (empowering team members, for example, is pretty much taken for granted). But clearly, software can be developed with multiple approaches to managing people and to facilitating communication.

Either way, below are some of the highlights of our approach.

1. We use a simple (traditional) development “pipeline”

At the end of the day, ever since the “old” waterfall days, most software development goes through a series of steps that have some planning (be it MRDs, PRDs and SRSs, or story cards), some development, some testing (or validation), and then some release process.

(One may note that this somewhat high level, not to say naïve, set of steps is as applicable to non-software projects as it is to software.)

Perhaps the biggest change introduced by agile development is the understanding that software can be developed more effectively by shortening each of the steps and executing them in parallel, at least to a certain degree.

In other words, iterative (or agile) development looks closer to the picture below (with more or less overlap between the steps based on the specific circumstances):

With the Kanban approach, each Issue (Feature, Story, …) follows the same “pipeline”, albeit potentially at a different pace. For example, the planning step for Feature A might require a short discussion, while for Feature B it may require a set of team meetings. For Feature C, the planning step might require technical research, while for Feature D the solution may be straightforward. Unlike typical Scrum-based approaches, we at WebCollage are not trying to use the same recipe for all types of Issues (e.g., there’s no fixed-duration Sprint Planning Meeting).

2. We parallelize and visualize

With the Kanban approach, multiple Issues can be, and in fact are, executed in parallel. Traditional project management techniques cannot cope with such a large number of moving parts. Instead, a Kanban-based approach uses a Kanban Board like the one below to track the phase each Issue is in. At a given point in time, there may be (and typically will be) Issues in multiple phases of the pipeline. This approach is sometimes referred to as visual control.

The diagram below shows a very partial list of Issues that were open in our system at a given point in time (for confidentiality reasons, I’ve only included relatively insignificant and technical issues):

The above Kanban Board might be seen by some as the equivalent of a sprint backlog and a task board, using the Scrum jargon.

3. We have well-defined flow rules

Kanban literature (based on Toyota’s manufacturing process) emphasizes the concept of “pull”, which is rooted in Toyota’s just-in-time approach. Back at Toyota, when a certain “piece” was needed (for example: a car wheel), it was ordered “just in time” using a paper note, called Kanban in Japanese. Some argue that the software analogy is that when (say) a developer has completed a task, they “pull” the next Issue from a bucket of Issues that are to be developed.

Personally, I don’t feel that the metaphor is critical to success. At the end of the day, one needs to define how Issues flow through the system. Whether a tester “pulls” an Issue or whether the Issue is “pushed” by a developer downstream to a tester seems to be a matter of definition, with little practical difference. The important part is to ensure that the Issues actually flow through the system and do not pile up in one phase (a concept called Limit the WIP in the Kanban jargon).
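One way to make the "do not pile up" rule concrete is a small check that counts Issues per phase and flags any phase holding more than its limit. This is a minimal sketch and not our actual tooling: the limits, the phase names and the Issue keys below are all made up for illustration.

```python
from collections import Counter

# Hypothetical per-phase limits; the numbers are illustrative only.
WIP_LIMITS = {"Development": 3, "Testing": 2}


def phases_over_limit(issues, limits=WIP_LIMITS):
    """Return {phase: count} for every phase holding more Issues than its limit.

    `issues` is a list of (issue_key, phase) pairs. Phases without a
    configured limit are never flagged.
    """
    counts = Counter(phase for _key, phase in issues)
    return {
        phase: count
        for phase, count in counts.items()
        if phase in limits and count > limits[phase]
    }


# Made-up board contents.
board = [
    ("IP-1", "Development"),
    ("IP-2", "Testing"),
    ("IP-3", "Testing"),
    ("IP-4", "Testing"),
    ("IP-5", "Release"),
]

print(phases_over_limit(board))
```

With this board, only the Testing phase is flagged, because it holds three Issues against a limit of two. Whether the response is to stop pushing work downstream or to stop pulling new work upstream is, again, a matter of definition.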

We at WebCollage manage the active Issues in a collection called In Play (often referred to as Work in Progress in the Kanban literature). We use the following transition rules to let an Issue flow through the pipeline:

Whenever a developer starts working on an Issue, they transition it to the In Development state. Once they’ve completed development (which includes unit tests) and submitted the code to the mainline, the continuous integration server builds the change and transitions the Issue to a Development Complete status.

Similarly, when a tester starts testing the Issue, they transition it to an In Testing state, and when the Issue is completely verified, it is transitioned to a Testing Complete state.

We normally do not transition Issues backward. When a defect is found, it is opened as a new sub-issue, and is managed through a short (Open → Resolve → Close) cycle. Due to the short time lag between development and the next step of verification, developers can usually fix defects extremely quickly, and we usually don’t leave known defects open for a subsequent iteration.
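The transition rules above amount to a forward-only state machine. Here is a minimal sketch of that idea, under a few assumptions: the `READY` state is a hypothetical name for an Issue that is in play but not yet started, the `Issue` class and its methods are invented for this illustration, and our real Issue management system is not shown.

```python
from enum import Enum


class State(Enum):
    READY = "Ready"  # hypothetical name: in play, work not started yet
    IN_DEVELOPMENT = "In Development"
    DEVELOPMENT_COMPLETE = "Development Complete"
    IN_TESTING = "In Testing"
    TESTING_COMPLETE = "Testing Complete"


# Forward-only flow: every state has at most one successor.
NEXT_STATE = {
    State.READY: State.IN_DEVELOPMENT,
    State.IN_DEVELOPMENT: State.DEVELOPMENT_COMPLETE,
    State.DEVELOPMENT_COMPLETE: State.IN_TESTING,
    State.IN_TESTING: State.TESTING_COMPLETE,
}


class Issue:
    def __init__(self, key):
        self.key = key
        self.state = State.READY
        self.defects = []  # sub-issues with their own Open/Resolve/Close cycle

    def advance(self):
        """Move the Issue one step forward; there is no way to move back."""
        nxt = NEXT_STATE.get(self.state)
        if nxt is None:
            raise ValueError(f"{self.key} is already {self.state.value}")
        self.state = nxt
        return nxt

    def report_defect(self, summary):
        """Record a defect as a new sub-issue; the parent Issue keeps its state."""
        defect = {"summary": summary, "status": "Open"}
        self.defects.append(defect)
        return defect


issue = Issue("IP-000")  # made-up key
issue.advance()  # developer starts work
issue.advance()  # CI server builds the submitted change
issue.advance()  # tester starts testing
issue.report_defect("Button misaligned")
print(issue.state.value, len(issue.defects))
```

The point of the sketch is the `report_defect` method: finding a defect adds a sub-issue and leaves the parent Issue where it is, rather than sending it back to development.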

4. We have multiple sources for incoming requests

At WebCollage, new Issues can come from two different sources.

First, there is the well-known Backlog. We manage two levels of backlog Issues. The first is a Wish List, which includes any idea or request that is a candidate for inclusion in the software. In many cases, this includes a request in its raw form (I need a button that does this and that), which may transform into a different approach during product design. The second is a true Backlog, which includes items that are candidates for implementation in the near future. Issues are moved from the Wish List to the Backlog, and then, when their time has come, to the In Play collection.

The second source of Issues is a Ticket system. At WebCollage, the development organization receives approximately 2 tickets a day. The Kanban approach lets us handle important tickets as they come even if they arrive during a development iteration. When an Issue is determined to be of a high priority or requires very little effort, we put it in a Fast Track, and usually address it either during the current iteration or the next one. Consequently, we can resolve an issue in an average of less than two weeks. With such a turnaround, the need for hot fixes and other exceptional handling is greatly reduced.
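The Fast Track decision described above boils down to a single rule. The sketch below is one possible way to express it, with several assumptions: the `Ticket` fields are hypothetical, the "little effort" flag stands for a reviewer's quick judgment rather than a formal estimate, and sending everything else to the Backlog is my simplification of what happens to the remaining tickets.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    key: str
    high_priority: bool  # as judged during review
    little_effort: bool  # a quick judgment call, not a formal estimate


def route(ticket):
    """Return the queue a reviewed ticket goes to.

    High priority OR very little effort is enough for the Fast Track;
    the "Backlog" fallback is an assumption made for this sketch.
    """
    if ticket.high_priority or ticket.little_effort:
        return "Fast Track"
    return "Backlog"


# Made-up tickets.
print(route(Ticket("TKT-1", high_priority=False, little_effort=True)))
print(route(Ticket("TKT-2", high_priority=False, little_effort=False)))
```

Note that the rule is an "or": an unimportant ticket that is trivial to fix still takes the Fast Track, which is what keeps the average turnaround short.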

The ticket workflow we use is the following:

5. We communicate with the larger product team regularly

One of the omissions by most agile development methodologies is the end-to-end communication of new production functionality, including documentation, release, rollout, etc.

At WebCollage, we’ve created dashboards that communicate new functionality to people outside the core development organization (e.g., product marketing, product support, technical services). We classify new functionality (Issues) into New Feature (major impact on customers), Enhancement (tactical change), Bug (resolution of a malfunction), Internal Change (invisible to customers) and Epics (the agile lingo for a mega-change). A sample dashboard is below:

Because the status of each Issue is always up to date in the system, the dashboards always display the latest information. Dashboards are available online for the upcoming version and for the last version; the information is (obviously) available historically as well. We meet weekly to review the previous release and the upcoming one.
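Conceptually, such a dashboard is just a grouping of a release's Issues by classification. The following is a rough sketch of that grouping and not our actual dashboard code: the record layout is an assumption, and the two 2012.05 keys and the 2012.06 key are borrowed from the examples later in this post (the New Feature label on IP-220 is my guess at its classification).

```python
from collections import defaultdict

# The classifications listed above, in display order.
ISSUE_TYPES = ["New Feature", "Enhancement", "Bug", "Internal Change", "Epic"]


def dashboard(issues, release):
    """Group the Issue keys of one release by classification.

    Classifications with no Issues in that release are left out.
    """
    grouped = defaultdict(list)
    for issue in issues:
        if issue["release"] == release:
            grouped[issue["type"]].append(issue["key"])
    return {t: grouped[t] for t in ISSUE_TYPES if t in grouped}


# Hypothetical record layout; keys taken from the examples in this post.
issues = [
    {"key": "IP-421", "type": "Bug", "release": "2012.05"},
    {"key": "IP-412", "type": "Enhancement", "release": "2012.05"},
    {"key": "IP-220", "type": "New Feature", "release": "2012.06"},
]

for issue_type, keys in dashboard(issues, "2012.05").items():
    print(f"{issue_type}: {', '.join(keys)}")
```

Because the grouping is computed from the live Issue records, there is nothing to keep in sync by hand, which is the property the paragraph above relies on.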

Examples: Developing using a flow-based approach

To illustrate how Issues flow through the system, here are three random Issues we addressed in the last release, called 2012.05. One is a Ticket, a malfunction reported by a customer; another is a small Enhancement; the last is a lengthy Feature (a rewrite of a GUI component). For confidentiality reasons, I’ve omitted the details of each Issue.

Example 1: A Customer-Reported Issue (IP-421)

Here is the sequence of events for one Ticket addressed during the iteration, as taken from the Issue management system.

Date Action
2012-02-09 7:29pm It is Thursday afternoon. The final release for version 2012.04 has been built and will make it to the market during the upcoming weekend. TKT-267 is opened for a malfunction of the previous version, 2012.03. The ticket is opened with priority P2, which indicates that the issue is not urgent. It may become a roadmap item, and there’s no commitment for a fix date.
2012-02-13 6:51pm It is now two business days later, Monday evening. Release 2012.04 is already public. Release 2012.05 is already two days into development. The ticket is reviewed by the support team (the delay is acceptable due to the relatively low priority of the original ticket). The Issue seems easy to fix, and seems important because it relates to a feature that was released in the previous iteration. The Ticket is moved to Fast Track. A linked Issue IP-421 (of type Bug) is (automatically) opened.
2012-02-15 5:01pm Two more days have passed. The Issue is reviewed by the product owner, and is identified to be of a higher priority than originally assessed, because it adversely affects heavy users. It is ranked higher in the system.
2012-02-15 6:20pm An hour and 20 minutes pass. The Issue is now at the top of the queue, and is picked by a developer. It is marked as In Development.
2012-02-15 9:12pm 3 hours have passed. The Issue has been resolved, and the fix is submitted to the mainline. The continuous integration server marks it as Development Complete.
2012-02-15 11:01pm An hour and 50 minutes have passed. The continuous integration server has completed building the component, successfully running all tests. It associated the Issue with the newly built binaries.
2012-02-16 11:13am It is now the following morning, Thursday. The QA lead notices the Issue in the Development Complete step, and assigns it to a specific tester.
2012-02-16 12:00pm 45 minutes have passed. The designated tester moves the Issue to In Testing. It is now Thursday afternoon.
2012-02-19 6:59pm A weekend has passed. During the weekend another ticket, TKT-278, is opened with the same symptoms. The new ticket is linked to the same Issue.
2012-02-20 12:16pm It is now Monday. The iteration will be complete by the end of this week. The tester has completed testing and moves the Issue into Testing Complete.
2012-02-23 10:05pm It is now Thursday night. The version is being released and Issue IP-421 is marked as Released. The two tickets, TKT-267 and TKT-278, are marked as Resolved and the openers are notified of this new status by e-mail. The software will be installed in the production environment shortly.

At no point during the execution of this Issue was it formally estimated (neither using Story Points nor using any other point system). No SRS document or Story Card was produced.
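Since every transition is time-stamped by the Issue management system, elapsed times can be derived after the fact instead of being estimated up front. As a small illustration, the sketch below recomputes a few intervals from the timestamps in the table above. The event labels and the helper function are my own naming, not something our system exposes.

```python
from datetime import datetime

# Timestamps copied from the IP-421 table above (24-hour clock).
timeline = {
    "Ticket opened": datetime(2012, 2, 9, 19, 29),
    "Fast Track": datetime(2012, 2, 13, 18, 51),
    "In Development": datetime(2012, 2, 15, 18, 20),
    "Development Complete": datetime(2012, 2, 15, 21, 12),
    "In Testing": datetime(2012, 2, 16, 12, 0),
    "Testing Complete": datetime(2012, 2, 20, 12, 16),
    "Released": datetime(2012, 2, 23, 22, 5),
}


def elapsed(start, end):
    """Return the time between two labeled events as a timedelta."""
    return timeline[end] - timeline[start]


for start, end in [
    ("In Development", "Development Complete"),
    ("In Testing", "Testing Complete"),
    ("Ticket opened", "Released"),
]:
    print(f"{start} -> {end}: {elapsed(start, end)}")
```

The development interval comes out at 2 hours and 52 minutes, which the table rounds to three hours. Most of the calendar time between opening the ticket and releasing the fix is waiting time (review, the weekend, the release date), not work.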

Example 2: A Minor UI Change (IP-412)

Here is the sequence of events for another Issue, a minor UI change:

Date Action
2012-02-12 4:58pm It is Sunday, a working day in our R&D facilities in Israel. A new iteration, towards release 2012.05, has just started. The Issue is moved from Backlog to In Play, and named IP-412. It is assigned to a specific developer. Its ranking is increased. This is a small visual enhancement, so the issue is marked as requiring visual design guidance (this is not a phase in the process but an indicator set up for the Issue).
2012-02-14 10:34am Two days pass. It is Tuesday, three days into the iteration. The designer uploads the revised graphics.
2012-02-15 9:07pm Another day passes. It is Wednesday night. The developer moves the Issue to In Development.
2012-02-15 9:12pm 5 minutes have passed. The developer has replaced one graphics file with another. The developer submits the Issue to the mainline. The continuous integration server marks the issue as Development Complete.
2012-02-15 11:01pm An hour and 50 minutes have passed. The continuous integration server has completed building the component, successfully running all tests. It associated the Issue with the newly built binaries.
2012-02-16 11:12am It is the following morning, Thursday. The QA lead notices the Issue in the Development Complete step and assigns it to a specific tester.
2012-02-19 9:37am A weekend has passed. It is now Sunday, the beginning of the working week in Israel. The tester is starting to test the Issue, marking it as In Testing.
2012-02-19 9:53am 16 minutes have passed. The tester has validated that the change correctly addresses the business need expressed, and transitions the Issue to Testing Complete.
2012-02-23 10:05pm It is Thursday night. Version 2012.05 is released and Issue IP-412 is marked as Released. The Issue appears in the dashboards as part of release 2012.05, as an Enhancement. The software will be installed in the production environment shortly.

At no point in the execution of this Issue was it discussed by the team (neither in a daily stand-up meeting nor in any other meeting). In fact, any type of team discussion would probably have doubled the overall time spent on this Issue. Not all people involved work in the same office; in fact, some people completed some of the work from home.

Example 3: A Major GUI Component Rewrite (IP-220)

Here is an example of a major GUI component rewrite. The component is presented on most leading retailer sites and used by many millions of shoppers monthly, so it must meet a high quality standard. And, while this GUI rewrite is itself part of a larger Epic, we could not find a way to break it down into smaller tasks, because we could not afford to release incomplete functionality to the mass market. Similarly, our analysis indicated that splitting this task among multiple developers would not be efficient.

Date Action
2012-01-11 3:54pm The preparation of this Issue is completed. The functionality design and the visual design are ready, pending further review and tuning that will occur during development.
2012-01-29 9:07 Almost three weeks have passed. The design was reviewed and discussed. The developer assigned to this Issue has just completed a previous task and is now starting to develop this Issue. The Issue is transitioned to In Development.
2012-02-15 6:24pm Almost three more weeks have passed. Release 2012.04 went out during that period. The Issue is now submitted to the mainline and marked as Development Complete. It is Wednesday, and we’re not yet sure if the Issue will make it into 2012.05, which is due at the end of the following week.
2012-02-15 8:24pm Two hours later, the Issue is built by the continuous integration server.
2012-02-16 11:11am It is the following morning. The Issue is assigned to a specific tester.
2012-02-19 9:29am A weekend has passed. It is now Sunday, and testing is starting. The Issue is transitioned to In Testing. Release 2012.05 is due in four days.
2012-02-23 12:21pm Four days have passed. The Release is due later today. There are still open Defects for this Issue. Some defects are due to a faulty third-party component, which requires further communication with the vendor. The Issue will not make it to Release 2012.05. It is moved to Release 2012.06.
2012-03-08 6:10pm The Issue is officially transitioned to Testing Complete and is ready for release as part of 2012.06. A couple of defects remain. They remain in the In Play collection and will be addressed in the subsequent iteration.

The execution of this Issue was visible on the Kanban board throughout the weeks it was active. Yet, it was not formally estimated, nor was it tracked using any Burn-Down Chart.

So, why not Scrum?

If you’re a Scrum fan, you may be thinking that this can all be done using Scrum. However, the examples above illustrate some of the benefits that a flow-based approach provides over a (more recipe-based) Scrum approach:

Presumably, there are ways to deal with the above issues within the Scrum framework. But little of Scrum remains if you strip it of the New Age ideas (which are a set of behavioral preferences) and adapt its otherwise rigid recipes to address more flexible needs.

So, the bottom line is: if you’re developing a complex customer-facing multi-disciplinary hosted software product, look beyond Scrum. Surely you can fight with your hands tied behind your back (even with your hands chopped off, you can still bite, as you can see in Monty Python’s Black Knight Fight scene). But then, why would you want to?