Veteran tester Andreas Marx has done another major test of antivirus engines, and it’s worth taking a look at — notwithstanding the caveat that it’s only using the on-demand capabilities of the scanner (as opposed to real-time protection, which is another bulwark in an antivirus program’s defense of a system).
It should be contrasted with much weaker testing, such as this one.
Andreas’ results haven’t received that much attention, except for a few publications: PCWelt, VirusBulletin, and Security.nl.
So with the permission of Andreas, I’m publishing a more comprehensive look at the results.
First, some (slightly edited) commentary from Andreas:
The results are in for our latest test of 29 anti-virus and anti-malware products, performed on Windows XP (English, SP2) using the on-demand scanner utility. All products were last updated on 2007-08-10 (08:00 GMT). The scan took about a week to complete on 28 identical Core 2 Duo 6600 PCs with 2 GB RAM. We only used regular products and updates (no special or beta versions) of all scanners, in their most current edition for home users or small companies (the ones which are usually labelled “2007”, as the “2008” series of products has not yet been released).
For this test, we only used current samples which were seen spreading (or which were distributed by malware authors) within the last six months. A total of 874,822 unique malware samples were used for this test, including worms, backdoors, bots (zombies) and Trojan horses.
All samples were intensively tested (to confirm that they really are malware) and replicated (to ensure that the samples really run) before being added to our collection, a process which took several weeks to complete.
Besides looking at the detection rates for this large collection, we also checked the size of the AV signature databases (DB) on disk.
Of course, one cannot easily compare the detection scores and the DB size, as some products include a large set of disinfection routines (which were not reviewed) and some vendors’ DBs are compressed while others are not. All tested products use incremental update mechanisms, so the “big” DB will only be transferred to the PC once.
Later, only the differences between this version and the newest pattern file from the AV company will be sent over the internet, usually not more than 20 to 50 KB per day, depending on the program.
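To illustrate the incremental update idea described above, here is a minimal sketch in Python. It is not any vendor's actual update format: the signature IDs, the byte patterns, and the `make_delta` / `apply_delta` helpers are all hypothetical, and a real product would also handle compression, signing, and versioning.

```python
# Hypothetical signature databases: signature ID -> detection pattern.
# All names and byte strings here are made up for illustration.
old_db = {
    "sig-001": b"\x4d\x5a\x90\x00",
    "sig-002": b"\xde\xad\xbe\xef",
    "sig-003": b"\xca\xfe\xba\xbe",
}
new_db = {
    "sig-001": b"\x4d\x5a\x90\x00",      # unchanged
    "sig-003": b"\xca\xfe\xba\xbe\x01",  # revised pattern
    "sig-004": b"\x13\x37\x13\x37",      # newly added
}


def make_delta(old: dict, new: dict) -> dict:
    """Describe only what differs between two database versions."""
    removed = sorted(sig_id for sig_id in old if sig_id not in new)
    changed = {sig_id: pat for sig_id, pat in new.items() if old.get(sig_id) != pat}
    return {"removed": removed, "changed": changed}


def apply_delta(old: dict, delta: dict) -> dict:
    """Rebuild the new database on the client from the old one plus a delta."""
    updated = {k: v for k, v in old.items() if k not in delta["removed"]}
    updated.update(delta["changed"])
    return updated


delta = make_delta(old_db, new_db)
rebuilt = apply_delta(old_db, delta)
print(delta["removed"], sorted(delta["changed"]))
print(rebuilt == new_db)
```

The unchanged entry (`sig-001`) never appears in the delta, which is why a daily update can stay small even when the full database is large.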
Products which have small DBs might have better heuristics or generic detection routines than products using large databases. A large database often (but not always) points to the extensive use of CRC checksums, which can usually detect only one malware file per signature. Heuristic and generic detection routines are often able to detect thousands of malware files with just one pattern (detection string) or algorithmic rule. Even if you can’t really compare the DB sizes of the products (it would be like comparing apples and oranges), it is interesting that some products require less than 10 MB to detect a large number of files, while other products require a lot more space on disk but still detect less malware.
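The difference between checksum signatures and generic patterns can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the "variants" below are invented byte strings rather than real malware, and real engines use far more sophisticated matching than a plain substring search.

```python
import zlib

# Hypothetical data: three variants of one made-up "family" that share a
# common byte sequence but differ in their headers and padding.
SHARED_CODE = b"\xde\xad\xbe\xef-payload-loop"
variants = [
    b"header-A" + SHARED_CODE + b"padding-1",
    b"header-B" + SHARED_CODE + b"padding-22",
    b"header-C" + SHARED_CODE + b"padding-333",
]
clean_file = b"an ordinary document with nothing suspicious in it"

# Checksum-style database: one CRC32 entry per known file.
# Only the first variant has been "seen" and added.
crc_signatures = {zlib.crc32(variants[0])}

# Pattern-style database: one entry describing the shared byte sequence.
pattern_signatures = [SHARED_CODE]


def crc_detects(data: bytes) -> bool:
    """Match only files whose whole-file checksum is in the database."""
    return zlib.crc32(data) in crc_signatures


def pattern_detects(data: bytes) -> bool:
    """Match any file that contains a known byte pattern."""
    return any(pattern in data for pattern in pattern_signatures)


crc_hits = [crc_detects(v) for v in variants]
pattern_hits = [pattern_detects(v) for v in variants]
print("CRC hits:    ", crc_hits)
print("Pattern hits:", pattern_hits)
print("Clean file flagged:", crc_detects(clean_file), pattern_detects(clean_file))
```

In this toy setup the single pattern covers every variant, while the checksum database would need one new entry per variant. The flip side, which this sketch does not show, is that a broad pattern can also match harmless files that happen to contain the same bytes.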
Of course, this is a “snapshot” test only, and as AV updates are usually released every few hours, the results might change dramatically over time. It’s important to keep in mind that AV products shouldn’t be seen as a replacement for proper patching of Windows and other software, or for “safer surfing” practices; all of the individual components are important. Good scanners might get better in the next test, stay at the current level, or get worse, which wouldn’t be a good sign. So it’s essential not only to check the results of this test, but also to monitor the results of “your” product over time, based on different tests, to see how the product develops.
The first two products in this round of testing (AVK 2007 and WebWasher) use two scan engines, which is good for detection scores but might also have some impact on scanning times and false positives. Positions 3 (BitDefender) and 4 (AntiVir) are occupied by single-engine products. The products in position 10 (Microsoft’s OneCare and Forefront Client Security) were quite a surprise to us, as their detection scores have improved significantly over time. (Compared with our last test, they are up by more than 10%.) It looks as though hiring a large number of malware researchers from other AV companies (including many people from Symantec, McAfee, Trend Micro, F-Secure and CA) has paid off for Microsoft.
When one looks at both the DB sizes and the detection rates, the products in positions 13 (Nod32) and 16 (Dr Web) appear to have the best trade-off between detection scores and signature sets: with less than 10 MB of signatures and scan engine routines, they are already able to detect more viruses than the average scanner we tested (which is at a 90% level).
Trend Micro and Symantec, on the other hand, have some of the largest DBs. This doesn’t point to “inefficient scanners”, but to the fact that their DBs are not compressed; indeed, one could easily ZIP the DBs to about half of their current size. Another factor behind the large DBs is the large number of specific disinfection routines they include, so that detected malware can be removed properly (something that was not tested here, but was in some of our past reviews).
You can also check out the “Papers” section on his website, where more details about Av-test.org’s testing procedures and some general information can be found.
You can see the whole test results here (pdf).
Another interesting test is the constantly evolving set of results over at oitc.com (graph here, tabular data here, methodology here). Now, this is a test which has to be taken with some reservations. First, it exclusively uses VirusTotal, itself an outstanding service, but one which does not reflect the performance of the real-time scanner portions of antivirus engines. Second, it primarily uses samples that come out of the CastleCops Malware Incident and Response team (the antimalware branch of PIRT), who submit samples to VirusTotal as they are found. This will likely bias the results toward certain types of threats over others. However, it’s interesting to see real-world results as they happen, and it’s a further data point to use in evaluating antivirus engines, though it should not be used as the sole evaluation of antivirus performance. A beta version of our VIPRE engine, which is going to be part of our own upcoming antivirus product, is also in the results.
Generally, I believe you’ll find that antivirus test results tend to cluster among a few top contenders, and these results are fairly consistent over time.
Alex Eckelberry