The Death of Web Scanners
I come here not to bury web application scanners, but to praise them.1 And then bury them a bit. Perhaps just up to the neck. On the beach. At low tide.
Setting aside web security scanning, the necessary feat of QA testing web sites has variously been difficult, non-existent, or so manually intensive that it lags the pace of development. The challenges of automation aren't specific to security, but security testing does impose some unique requirements and diverges in distinct ways from QA testing.
Reading through a complete list of automation challenges would exceed the patience of many more than the 140-character crowd. Consider these happy few:2
- Efficiently crawl sites with thousands or tens of thousands (or more!) links
- Populate form fields correctly in order to obtain new links and exercise workflows
- Identify multi-step workflows and non-idempotent requests
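The second challenge above hints at an algorithm: to obtain new links, a crawler has to guess plausible values for form fields. A minimal sketch of one such heuristic follows; the field names and default values are illustrative assumptions, not a reference to any particular scanner's implementation.

```python
# Hypothetical form-filling heuristic: choose values by field name so
# that submissions are likely to pass validation and reveal new links.
# The hint-to-value table below is an illustrative assumption.

HEURISTICS = {
    "email": "test@example.com",
    "phone": "555-0100",
    "zip": "90210",
    "name": "Alice Tester",
}
DEFAULT = "test"

def fill_form(field_names):
    """Map each form field to a value likely to pass validation."""
    values = {}
    for field in field_names:
        lowered = field.lower()
        for hint, value in HEURISTICS.items():
            if hint in lowered:
                values[field] = value
                break
        else:
            # No hint matched; fall back to a generic string.
            values[field] = DEFAULT
    return values

print(fill_form(["user_email", "zip_code", "comment"]))
# → {'user_email': 'test@example.com', 'zip_code': '90210', 'comment': 'test'}
```

Real scanners layer far more onto this (field type, server-side validation feedback, locale), but even this toy version shows why form handling is a crawl-coverage problem rather than a vuln-detection problem.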
I hesitate to arrange these in significance or difficulty, or to explain which have reasonable solutions. Instead, I want to focus on the implications that web application design and technology trends have for automation.
Web sites are progressing towards more dynamic, interactive UIs nearly indistinguishable from the neolithic, disconnected age of desktop apps and strongly divorced from the click-reload pages of the '90s. That web sites have nifty UIs isn't news if the first ones that come to mind fall into the Alexa Top 20 or so. Those are the easy examples of early pioneers of this trend. Once you look lower on the Alexa list, or drop off it entirely to consider large organizations' internal networks, you'll find sites still designed for IE6 or that have never heard of the XHR object.
Regardless of your perspective on the pace of web security's evolution, web application technologies have been changing quickly. It's unlikely that the SQL database will disappear from the traditional web site stack, but the NoSQL5 movement will require a new set of vuln tests largely unrelated to traditional SQL injection. There are no publicly known examples of "NoSQL injection" attacks, nor even clear ideas on what an attack of that kind would look like. Yet that's no reason to avoid applying security theory to the practice of testing NoSQL-backed web sites.
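Since no real-world example exists, any sketch of a "NoSQL injection" test is pure speculation. One way to theorize about it: if a site deserializes user input directly into a document-store query, a payload that is an operator object rather than a plain string changes the query's meaning. The toy "store" and matching logic below are stand-ins invented for illustration, not any real NoSQL API.

```python
# Speculative sketch only: simulates a document store whose query
# language supports operator objects. The $ne ("not equal") operator
# and the matching rules are hypothetical stand-ins for illustration.
import json

USERS = [{"user": "alice", "pw": "s3cret"}, {"user": "bob", "pw": "hunter2"}]

def matches(doc_value, criterion):
    # A toy subset of document-store matching: plain equality,
    # plus a $ne operator that inverts the comparison.
    if isinstance(criterion, dict) and "$ne" in criterion:
        return doc_value != criterion["$ne"]
    return doc_value == criterion

def login(raw_json):
    # The vulnerable pattern: attacker-supplied JSON becomes the query.
    query = json.loads(raw_json)
    return [d for d in USERS
            if matches(d["user"], query["user"]) and matches(d["pw"], query["pw"])]

# Expected use: exact credentials match one record.
print(len(login('{"user": "alice", "pw": "s3cret"}')))     # → 1
# Injected operator object bypasses the password check entirely.
print(len(login('{"user": "alice", "pw": {"$ne": ""}}')))  # → 1, no password known
```

The interesting point for scanners is that the payload isn't a quote-breaking string at all; it's a structural change to the query, which traditional SQL injection signatures would never flag.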
Single sign-on (SSO) solutions should eventually become more widely adopted. They alleviate the burden of managing and securing passwords, which is evidently difficult to do right. (Compromised credentials from database attacks number in the millions.) The distrust6 of early solutions like Microsoft Passport7 (and its MS Wallet) and the Liberty Alliance8 has been forgotten in light of Facebook Connect, Google Account, OpenID, Twitter, and Yahoo! ID. (There's possibly an SSO service for each letter of the alphabet even though they mostly use OAuth.) Privacy issues haven't been forgotten, they've just been swept aside in the face of millions of users with accounts on one or more of these sites.
By this point, you might have forgotten that we were originally discussing automated web scanning. The implication of single sign-on is that scanners must be able to support it. Once again this boils down to robust browser emulation — or a lot of customization to different sites' use of the SSO APIs.
Forward-looking site developers aren't satisfied with HTTP. Now that they've been getting a taste of HTML5's features, they're turning their sights to the deficiencies in HTTP and HTTPS. This means scanners should start thinking about things like SPDY10, designed for network performance, and HSTS11, designed for improved transport security. Few sites have adopted these, but considering those few include behemoths like Google and PayPal, expect others to follow.
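HSTS at least is easy for a scanner to start checking today: the policy arrives as a response header of semicolon-separated directives. A minimal sketch of parsing it follows; the function name and the policy choices a scanner might build on top of it are assumptions for illustration.

```python
# Minimal sketch of parsing a Strict-Transport-Security header value
# (semicolon-separated directives per the HSTS specification).
# What a scanner should *do* with weak values is left as policy.

def parse_hsts(header_value):
    """Return (max_age_seconds, include_subdomains) from an HSTS header."""
    max_age, include_sub = None, False
    for directive in header_value.split(";"):
        directive = directive.strip().lower()
        if directive.startswith("max-age="):
            # The spec permits a quoted value, so strip quotes defensively.
            max_age = int(directive.split("=", 1)[1].strip('"'))
        elif directive == "includesubdomains":
            include_sub = True
    return max_age, include_sub

age, subs = parse_hsts("max-age=31536000; includeSubDomains")
print(age, subs)  # → 31536000 True
```

A scanner could then flag responses with no HSTS header, a short max-age, or a missing includeSubDomains directive on sites that clearly intend full-domain HTTPS.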
The acronym assault hasn't yet finished. REST is the new SOAP (at least I think so, I'm not sure if SOAP ever caught on). I've noted elsewhere the security benefits of a well-defined separation between the server-side API and client-side HTML. As a reminder, a server-side API call that performs a single action (e.g. get list of foo) can be easier to examine and secure than a function that gets a list, rewrites the page's HTML, and has to update other unrelated content.
In one way, the move towards well-defined APIs makes a scanner's job easier. If it's possible to fully enumerate a site's functions and their associated parameters, then the scanner doesn't necessarily have to crawl thousands of different links trying to figure out where the important, unique points of functionality are — it can pull this information from a documented API.
Alas, a raw list of API calls emphasizes a problem scanners already have: state context. You and I can review a list of functions, then come up with a series of security tests. For example, calling events.create with an XSS payload followed by a call to events.get to see if the payload was filtered, or calling admin.banUsers from a non-admin account to see if authorization controls are correctly enforced. A dull scanner, on the other hand, might make calls in a poorly chosen order. In a somewhat contrived example, the scanner might call events.get followed by auth.expireSession (which logs out the user). This causes any subsequent API call to fail (at least it should) if the call requires an authenticated user.
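The state-aware ordering described above can be sketched in a few lines: calls known to destroy session state must be deferred until everything else has run. The API names come from the examples in the text; the idea that a scanner could learn which calls are destructive (say, from API documentation) and the ordering heuristic itself are assumptions.

```python
# Sketch of state-aware call ordering for a hypothetical API scanner:
# defer session-destroying calls so earlier tests stay authenticated.
# The set of destructive calls is assumed knowable (e.g. from API docs).

SESSION_DESTROYING = {"auth.expireSession", "auth.logout"}

def order_calls(calls):
    """Run session-safe calls first, session-destroying calls last."""
    safe = [c for c in calls if c not in SESSION_DESTROYING]
    destructive = [c for c in calls if c in SESSION_DESTROYING]
    return safe + destructive

plan = order_calls(["auth.expireSession", "events.create", "events.get"])
print(plan)  # → ['events.create', 'events.get', 'auth.expireSession']
```

This is the simplest possible model; real multi-step workflows (create, then read, then delete) need a full dependency graph rather than a two-bucket sort, which is exactly why state context remains hard for scanners.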
Before we finish, permit a brief aside to address the inevitable concern trolls. There's a Don Quixote contingent fighting straw man arguments that automation is useless, unusable, disusable, will never replace manual testing, and so on. This article doesn't aim to engage these distractions.12 I can control quote-mining no more than I can raise the sun. This paragraph serves as a warning about taking statements out of context or twisting their intent. To be clear: I think a degree of automation is important, accurate, and scalable. And possible. The goal is to accompany technology trends, not trail them.
HTML and HTTP may not have changed very much in the past decade, but the way web sites cobble them together surely has. As web apps grow into more complex UIs, scanners must more accurately emulate (or outright embed) a browser to make sure they're not missing swaths of functionality. As APIs become more common, scanners must dive into stateful modeling of a site. And as new web specifications and protocols become widely adopted, scanners must avoid the laziness of dealing solely with HTTP and HTML4.
It's twilight for the era of simple scripting and unsophisticated scanners. The coming tide of HTML5, new plugins, protocols, and complexity make for a long night. With luck, some scanners will survive until dawn.
1 An uniambic inversion of Mark Antony's speech. http://shakespeare.mit.edu/julius_caesar/julius_caesar.3.2.html
3 Regardless of its draft status, all modern browsers support at least a few features of HTML5.
7 s/Microsoft/Google/g and fast-forward a decade. Plus ça change, eh? http://www.wired.com/politics/security/news/2001/08/46095
12 Another of Shakespeare's Romans, Menenius, provides apt words, "…more of your conversation would infect my brain…" http://shakespeare.mit.edu/coriolanus/coriolanus.2.1.html