Regression Testing and CSS

Note: It's been a while since I originally wrote this. Nowadays I use WebdriverIO and WebdriverCSS. If you're interested in learning those tools, I've got a free short course available.

This past October I gave a presentation at the wonderful CSS Dev Conf about CSS and regression testing. Leading up to it and since, I've spent a lot of time thinking about how to solve this problem.

The difficulty with CSS testing is that there isn't any logic in CSS. It's a declarative language, basically a dictionary of specific combinations of styles. So when people try to "unit test" their CSS, they're really trying to do something impossible. There's no unit of functionality to test, just a repeat of what is already declared in the stylesheet itself.

Instead of thinking of unit testing your CSS, I think it's better to think of testing in terms of the goal. All you want to do is make sure that the small change you made to one style doesn't break another (or maybe you do want to sanity-check a huge change). The goal isn't to make sure your CSS is correct; the goal is to make sure your design hasn't broken.

That's why I think we need to focus more on design regression testing than CSS unit testing. It's much more productive to focus on the former than the latter. Besides, much more than a CSS change can break the design. Something as simple as a slightly longer line of text or a different class name on an element can drastically break a site.

So how do we do this? We've seen screenshot tools (dpxdt, wraith, etc.). They're great and are a step in the right direction, but I don't think they're going to provide us with a solution that the industry really latches on to.

Straight screenshots just aren't a solid option. There is too much variance in the life cycle of a site for screenshots to be manageable in the long term. Changing content, whether over a short span of time (say a carousel that moves while a screenshot is taken) or over a long span of time (say test content that's changed since the original capture), wreaks havoc on image-based diffs. We need tests that are more resilient to meaningless changes.

What's the solution? The problem is that there isn't an easy one. If there were, it would have been done already. I think it's going to be a combination of screenshots of specific regions of a page and a comparison of computed styles pulled, again, from specific regions of a page. Here's my list of key features my ideal tool would have:

- Very little setup needed to create a baseline
- Can be integrated into a CI/CD process
- Not a "unit" testing/TDD tool. More about catching regressions in existing code than a tool for writing a new design.
- Automatically tests for states like :hover, :focus, etc.
- Nice reporter options (can view a site that shows the results so a designer can easily sign off on it)
- Works with responsive designs (can change the screen size of each capture)
- Ability to define page actions
- Can send custom headers with each request (for login or A/B testing)
- Runs in both headless browsers (e.g. PhantomJS) and actual browsers (using Selenium)
- Runs as a server. Can run locally or on a dedicated server
- Runs in Node.js, to provide easier integration with other FE dev tools
- Can be run via command line
- Use CLI with path to server to run tests (similar to a Selenium grid)
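
To make the computed-styles half of that idea a bit more concrete, here's a minimal sketch in TypeScript. It assumes you've already captured one snapshot per region, for example by reading `window.getComputedStyle()` for each selector in a browser; that capture step isn't shown. The names `RegionSnapshot`, `StyleChange`, `unionKeys`, and `diffSnapshots` are made up for illustration and don't come from any existing tool.

```typescript
// Hypothetical shapes for illustration only.
type StyleMap = Record<string, string>;          // CSS property -> computed value
type RegionSnapshot = Record<string, StyleMap>;  // region selector -> its styles

interface StyleChange {
  selector: string;
  property: string;
  baseline: string | undefined; // undefined when the property/region is new
  current: string | undefined;  // undefined when the property/region is gone
}

// Keys that appear in either object, without duplicates.
function unionKeys(a: object, b: object): string[] {
  const keys = Object.keys(a);
  for (const key of Object.keys(b)) {
    if (keys.indexOf(key) === -1) keys.push(key);
  }
  return keys;
}

// List every property whose computed value differs between two snapshots.
function diffSnapshots(baseline: RegionSnapshot, current: RegionSnapshot): StyleChange[] {
  const changes: StyleChange[] = [];
  for (const selector of unionKeys(baseline, current)) {
    const before: StyleMap = baseline[selector] || {};
    const after: StyleMap = current[selector] || {};
    for (const property of unionKeys(before, after)) {
      if (before[property] !== after[property]) {
        changes.push({
          selector,
          property,
          baseline: before[property],
          current: after[property],
        });
      }
    }
  }
  return changes;
}

// Example: only the header's font-size changed between the two captures.
const headerBefore: RegionSnapshot = {
  ".site-header": { "font-size": "16px", color: "rgb(0, 0, 0)" },
};
const headerAfter: RegionSnapshot = {
  ".site-header": { "font-size": "18px", color: "rgb(0, 0, 0)" },
};
const headerChanges = diffSnapshots(headerBefore, headerAfter);
```

Because the comparison is keyed on selectors rather than pixels, a changed headline or a new photo inside the region wouldn't show up as a regression, which is the main appeal over straight screenshots.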

And here's the list of difficulties I see along the way:

- How to deal with changing content
- Different content between dev and staging
- Content that changes automatically on the page (timer/carousel/etc.)
- Testing designs for things like blog posts, where content will be very different, but styles need to be tested.
- Need to be able to define styles for child elements that may have different HTML layout
- CSS animations with long delays
- Content that loads on scroll
- Content that changes after action (styles hidden/shown via user action)
- Click events, hover/focus styles
- Parallax/scroll effects (e.g. https://www.google.com/nexus/5/)
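
One common way to cope with changing content in an image-based diff is to mask out the regions you know will move, like a carousel or a timer. Here's a rough sketch of that idea, assuming both screenshots have already been decoded into raw RGBA pixel data of the same size. The `Rect` and `Bitmap` types and both function names are hypothetical, not the API of dpxdt, wraith, or any other tool.

```typescript
// Hypothetical types for illustration only.
interface Rect { x: number; y: number; width: number; height: number; }

interface Bitmap {
  width: number;
  height: number;
  data: ArrayLike<number>; // RGBA values, row by row, 4 per pixel
}

// True when the pixel at (x, y) falls inside any masked rectangle.
function isMasked(x: number, y: number, masks: Rect[]): boolean {
  return masks.some(
    (m) => x >= m.x && x < m.x + m.width && y >= m.y && y < m.y + m.height
  );
}

// Count pixels that differ between two same-sized bitmaps, skipping masked areas.
function countChangedPixels(a: Bitmap, b: Bitmap, masks: Rect[] = []): number {
  if (a.width !== b.width || a.height !== b.height) {
    throw new Error("Bitmaps must have the same dimensions");
  }
  let changed = 0;
  for (let y = 0; y < a.height; y++) {
    for (let x = 0; x < a.width; x++) {
      if (isMasked(x, y, masks)) continue;
      const i = (y * a.width + x) * 4;
      const same =
        a.data[i] === b.data[i] &&
        a.data[i + 1] === b.data[i + 1] &&
        a.data[i + 2] === b.data[i + 2] &&
        a.data[i + 3] === b.data[i + 3];
      if (!same) changed++;
    }
  }
  return changed;
}
```

Masking only helps with content you can predict. It does nothing for the dev-versus-staging problem, where the whole page may hold different copy.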

Other difficulties:

- If HTML order/content changes but the design doesn't, it should be easy to test for that.
  - This would be like refactoring HTML to use HTML5 elements, instead of plain divs
- Browsers with different renderings (based on fonts)
- Small changes that can cause lots of false regressions, e.g. changing the font-size by 1px
- Images are difficult/impossible to compare using just computed styles
- Testing :before, :after
- Testing forms (visuals for required fields, testing styles applied on certain text like error messages)
- How to update tests after the fact?
- Background images that contaminate screenshots/foreground
- How to test animations
- Should be able to run animation and test that the correct animation takes place
- Canvas/SVG
- iframes
- Testing pages behind a login
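
For the small-change problem in particular, one option I can imagine is a tolerance when comparing computed values, so a 1px shift doesn't light up every test. This is only a sketch under the assumption that values arrive as computed-style strings like "16px"; `parsePx` and `valuesMatch` are hypothetical helpers, and anything that isn't a plain pixel length falls back to an exact string comparison.

```typescript
// Parse a plain pixel length like "16px" or "12.5px"; return null for anything else.
function parsePx(value: string): number | null {
  const trimmed = value.trim();
  return /^-?\d*\.?\d+px$/.test(trimmed) ? parseFloat(trimmed) : null;
}

// Treat two computed values as equal when both are pixel lengths within
// `tolerancePx` of each other; otherwise require an exact string match.
function valuesMatch(baseline: string, current: string, tolerancePx: number): boolean {
  const a = parsePx(baseline);
  const b = parsePx(current);
  if (a === null || b === null) return baseline === current;
  return Math.abs(a - b) <= tolerancePx;
}

// Example: a 1px font-size bump passes with a 1px tolerance, a 2px bump doesn't.
const smallBumpOk = valuesMatch("16px", "17px", 1);
const bigBumpOk = valuesMatch("16px", "18px", 1);
```

The trade-off is obvious: any tolerance wide enough to hide noise can also hide a real regression, so the threshold would need to be configurable per property or per region.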

I've started building a tool called Regret.css that will hopefully morph into this ideal toolset, but it's a long way from that. It's nothing more than a simple proof of concept right now, and if you're looking for something to use, dpxdt and wraith have a much richer toolset. The only nice thing (for me) about my tool is that it's under my control and is entirely node.js :-)