Web App Security Scanner Comparison

September 25, 2009

Consciere recently had the opportunity to run two widely recognized commercial web application security scanners against a real-world target built with ASP.NET. The target app was small, with a handful of pages protected by a sign-on form, and we performed remote, unauthenticated scanning. Our test may be unrepresentative because of the application's scale and structure and the nature of the testing. Even so, we felt the results offered some insight into the state of the art in automated dynamic web application security assessment. Some of the things we found interesting included:

Large disparity in quantity of findings. The first product found 2 Critical severity (4 instances total), 1 High (5 instances), 3 Medium (9, 9, and 1 instances respectively), 4 Low (1 instance each), 5 Informational (4, 1, 1, and 152 instances), and 3 Best Practices (2, 1, and 2 instances) vulnerabilities, for a total of 13 vulns and 195 instances. The second tool found 1 Low severity vulnerability (1 instance), a 1,200% difference in the quantity of findings (13 vulnerabilities versus 1). The first product evidently implemented a much broader set of checks in about the same time (both scans took under 5 minutes). This suggested to us that commercial web app scanning tools differ considerably in quality. The quality of the findings also mattered, however, as we discuss next…

Signal-to-noise ratio was about 30%. We spent about a day manually validating the Critical, High, and Medium findings from the first tool (6 vulns, 28 instances total). That effort is noteworthy by itself; consider what such validation would cost for a larger application, or for several of them. Based on our findings, however, the manual validation was worthwhile. Of the 2 Critical vulnerabilities, 1 was a false positive. It was diagnosed as Cross-Site Scripting (XSS) because the error page returned user input, even though the input was escaped and we were unable to execute it manually. The other also appeared to be a false positive but turned out to be a false negative of sorts: the scanner's XSS input would not execute, but after some experimentation we found a variant that did. Of the 13 vulns found by the first tool, we found only two issues that justified expedited mitigation and two that warranted low-priority remediation. We were unable to verify the single Low severity result from the second tool, so it appeared to be a false positive. Thus, across both tools, 4 of 14 reported vulnerabilities were actionable (4/14, or roughly 30%). That is not an encouraging ratio for those who simply rely verbatim on canned scanner reports.
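To illustrate how the first Critical false positive can arise, here is a minimal Python sketch. It assumes, since we do not know either scanner's actual logic, that a check treats any reflection of a probe's marker string as evidence of XSS. The error page, payload, marker, and function names are all hypothetical. Because the HTML-escaped page still contains the marker, the naive check fires. A stricter check that requires the payload's markup to come back unescaped does not.

```python
import html

# Hypothetical probe: a script payload carrying a unique marker string.
MARKER = "xss1234"
PAYLOAD = f"<script>alert('{MARKER}')</script>"


def render_error_page(user_input: str, escape: bool) -> str:
    """Hypothetical error page that echoes the user's input back."""
    echoed = html.escape(user_input) if escape else user_input
    return f"<html><body><p>Invalid value: {echoed}</p></body></html>"


def naive_reflection_check(response: str, marker: str) -> bool:
    """Assumed naive heuristic: flag XSS if the marker is reflected at all."""
    return marker in response


def unescaped_payload_check(response: str, payload: str) -> bool:
    """Stricter heuristic: flag only if the payload's markup survives intact."""
    return payload in response


for escape in (False, True):
    page = render_error_page(PAYLOAD, escape)
    print(escape,
          naive_reflection_check(page, MARKER),
          unescaped_payload_check(page, PAYLOAD))
# False True True   <- unescaped echo: both checks fire
# True True False   <- escaped echo: only the naive check fires (false positive)
```

Even the stricter check is only a heuristic. Whether reflected input executes depends on where it lands (element body, attribute, script block), which is why the working variant we found still had to be confirmed by hand.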

Severity rankings made prioritization difficult. In both of the XSS cases discussed above, the vulnerability was non-persistent (reflected) and did not appear to warrant a Critical severity ranking. The vulnerability rated “High” was “ActiveX Control Discovery,” and its description alluded to known issues with the Microsoft Active Template Library (ATL), which ships with Visual Studio and is used to create ActiveX controls. The “Medium” vulnerabilities were innocuous, relating to IP addresses in headers, failure to mark cookies “secure,” and the use of persistent cookies. We could not find descriptions of the severity ratings in the reports generated by either tool. Overall, we struggled to relate the ratings to the probability or impact of exploitation, or to the effort required to fix the identified issues, which left us unsure how to prioritize remediation. Once again, the amount of manual refinement required to produce actionable recommendations was concerning.
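One way to make prioritization less arbitrary is to re-rank validated findings by estimated likelihood and impact of exploitation, breaking ties by the effort required to fix. The Python sketch below assumes a simple 1-to-5 scoring scale and risk = likelihood × impact. The `Finding` class, the scale, and every score are hypothetical placeholders for illustration. They are not ratings produced by either tool, nor a formal methodology.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    name: str
    scanner_severity: str  # label reported by the tool, kept for reference only
    likelihood: int        # hypothetical analyst estimate, 1 (unlikely) to 5 (very likely)
    impact: int            # hypothetical analyst estimate, 1 (negligible) to 5 (severe)
    fix_effort: int        # hypothetical estimate, 1 (trivial) to 5 (major)

    @property
    def risk(self) -> int:
        # Simple likelihood x impact product; a real program might use CVSS instead.
        return self.likelihood * self.impact


def prioritize(findings):
    """Highest risk first; among equal risk, cheaper fixes first."""
    return sorted(findings, key=lambda f: (-f.risk, f.fix_effort))


# Placeholder scores only, chosen to illustrate the mechanics.
findings = [
    Finding("Reflected XSS (manually verified variant)", "Critical", 3, 3, 2),
    Finding("ActiveX Control Discovery", "High", 1, 2, 3),
    Finding("Cookie not marked secure", "Medium", 2, 2, 1),
    Finding("Persistent cookie", "Medium", 1, 1, 1),
]

for f in prioritize(findings):
    print(f.risk, f.fix_effort, f.scanner_severity, f.name)
```

The point of the sketch is that the ordering is driven by explicit, reviewable estimates rather than by the scanner's severity label, which is exactly the information we could not find in either tool's reports.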

Overall, we felt the verdict was mixed: the first tool efficiently pointed us toward numerous interesting weaknesses, but separating actionable recommendations from the noise required substantial manual validation. Other reports confirm some of our observations (see for example this comparison of IBM AppScan, HP WebInspect, and Acunetix WVS).

Unquestionably, web application scanning tools provide great value if you are accountable for the ongoing security of large web applications, or of large numbers of them. They deserve strong consideration for inclusion in the “first tier” of capabilities implemented in a web-centric SDL (alongside threat modeling/risk analysis, design assessment, source code review, and security testing). However, be prepared to invest additional effort in manually tuning and reviewing their output to get the most out of them.
