
Independent benchmark testing: Evaluating the evaluators

Independent testers are constantly examining security solutions. We have some tips on how to make use of their results.

Vyacheslav Zakorzhevsky
February 10, 2017

How can you know which enterprise solution to trust with your company’s security? You certainly can’t go by the vendors’ marketing materials, which are hardly objective. Tips and recommendations from industry peers may sound like a great resource, but in reality, they can be even less useful because every company has unique cybersecurity requirements, IT environments, and other needs. And, strictly speaking, not every advisor can realistically evaluate the state of cybersecurity in his or her own company.

That’s why it’s crucial to have access to objective information from independent industry specialists who base their evaluations on measurable parameters. By evaluating products under the same conditions, independent testers can accurately identify winners in each discipline. With their reports, a customer has the opportunity to choose products based on objective performance results.

That said, benchmark tests are not performed solely to help users make purchasing decisions. Vendors need them as well, to see whether they are on a par with the rest of the market, whether their solutions can operate effectively in the modern threat landscape, and whether they are moving in the right general direction with their products.

Gaining the top position in an independent tester’s overall rating is solid proof of a product’s effectiveness. That’s why market players willingly participate in unbiased tests of their solutions and help independent labs improve their methodologies. If a vendor aggressively markets its product’s technical excellence while declining to submit it for independent benchmark tests, think twice about whether the product is likely to be as good as advertised.

The tests

Sometimes, market players request special selective benchmark tests: to evaluate a new technology, to check product performance in specific circumstances, and so forth. In these cases, vendors pay all costs for test-related work and thus are entitled to choose participants or adapt the methodology. Sometimes, a vendor asks for a “duel” with a certain competitor.

Theoretically, the vendor that makes the request can require special conditions that give its product certain advantages. Or the testing case can be totally artificial and irrelevant to the real threat landscape. In those cases, the test results are not trustworthy. In general, look for a majority of market players to approve of a test’s methodology.

Sometimes testers themselves see a market’s interest in a specific technology and decide to perform selective benchmarks. For example, a tester might evaluate the quality of cloud protection, exploit protection, antiransomware tools, or protection from banking threats.

Regular comparative benchmarks occur on a schedule — annually, semiannually, or bimonthly, for example. The benchmark tests evaluate and rank competing products on the basis of various tasks. The top performers get an award.

Sometimes, such tests take place over the course of a whole year, and the lab may put out an intermediate report every so often. Those tests are called continuous comparative benchmarks. For such evaluations, a vendor has to provide its products for testing on an ongoing basis, without missing an update. This ensures each solution is evaluated under dynamic conditions and more thoroughly. It’s much harder to remain the leader over time than to become a one-time winner. Continuous benchmark tests provide a holistic view of the industry. They also shine a light on vendors that subject themselves to only one test per year (you cannot get an adequate picture of a product based on only one examination).
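The gap between a one-time winner and a consistent leader can be made concrete with a short sketch. The product names, the scores, and the 0 to 100 scale per round are all invented for illustration; no lab's actual report format is implied.

```python
from statistics import mean

# Hypothetical per-round protection scores (0-100) from a continuous benchmark.
# Product names and numbers are made up for illustration only.
rounds = {
    "Product A": [99.0, 91.0, 90.0, 92.0],
    "Product B": [97.0, 97.5, 98.0, 97.0],
    "Product C": [95.0, 96.0, 94.5, 96.5],
}

def round_winners(scores):
    """Return the top-scoring product for each test round."""
    n_rounds = len(next(iter(scores.values())))
    return [max(scores, key=lambda p: scores[p][i]) for i in range(n_rounds)]

def consistency_ranking(scores):
    """Rank products by average score, using the worst round as a tie-breaker."""
    return sorted(
        scores,
        key=lambda p: (mean(scores[p]), min(scores[p])),
        reverse=True,
    )

print(round_winners(rounds))
print(consistency_ranking(rounds))
```

In this made-up data, Product A wins the first round but ranks last once all four rounds are averaged, which is exactly the pattern a single test cannot reveal.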

Some researchers run certification tests, which evaluate a single product against a limited set of parameters. The point of this methodology is to determine whether the product meets certain (usually minimal) requirements. Such tests help separate real products from fakes (unfortunately, fakes are out there). Certification is useful as a baseline check of a product’s merits, but it does not provide a clear view of why a product succeeded, so you should not rely on certification alone to make buying decisions.

Methodologies

Every test laboratory uses its own methodology, which in most cases has evolved from a variety of earlier methods. Every laboratory also collects its own independent set of test cases. For this reason, you should look at the results of various tests carried out by different companies to get a comprehensive picture of the product’s effectiveness.

The first antivirus benchmark tests were based on primitive checks. A lab would collect a selection of viruses and scan it with each of the available products. This procedure was called on-demand scanning (ODS). A variation of such tests relied on on-access scanning, which analyzed files in the process of copying them. However, both threats and security solutions evolved quite fast. Although such benchmarks are still in use, they are worth little on their own.

Further development of the methodology involved more testing of behavioral analysis technologies. For this purpose, malware samples were actually executed on the test machines. This kind of testing complicated the process and increased test duration.

With malware becoming more and more sophisticated, those relatively primitive methods became increasingly irrelevant. For example, some malware functions exclusively inside a specific environment (operating system, system language, browser, installed applications, even country). Moreover, the most cunning malware samples thwart analysis by recognizing and not running in an isolated environment.

Testing methodology thus required further improvement. Enter real-world (RW) benchmark tests. The test machines and conditions closely mirror real-world specs and common user behavior. The method offers more precise results, but it’s complex, cumbersome, and expensive. That’s why only a limited number of labs run RW benchmark tests.

Sometimes testers certify based only on behavioral, or proactive, tests. During those tests, products scan threats that are guaranteed unknown to them; they must detect threats based only on behavioral analysis. The testers install the system on a disconnected PC, leave it for several months, and then feed it newly discovered samples. Sometimes, they even engineer or modify malicious code to emulate a brand-new threat. However, with cloud technologies spreading fast, that sort of approach is becoming obsolete.

Finally, a mature benchmarking methodology includes two more types of tests. Even if a solution proves effective at detecting and neutralizing malicious code, it’s totally impractical if it gobbles computing resources, and therefore, performance tests are part of the standard battery. The false positive (FP) test is even more important: A good solution should not flag a legitimate application as malicious.
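The three measurements mentioned above (protection, performance, and false positives) can be expressed as simple ratios. The sketch below is only an illustration: the `ProductResult` fields and the sample numbers are assumptions, and real labs weight and report these figures in their own ways.

```python
from dataclasses import dataclass

@dataclass
class ProductResult:
    """Hypothetical summary of one product's results in one test."""
    malicious_total: int       # malicious samples in the test set
    malicious_detected: int    # malicious samples the product blocked
    clean_total: int           # legitimate applications in the test set
    clean_flagged: int         # legitimate applications wrongly flagged
    baseline_seconds: float    # reference task time without the product
    protected_seconds: float   # same task with the product installed

def detection_rate(r: ProductResult) -> float:
    """Share of malicious samples that were detected."""
    return r.malicious_detected / r.malicious_total

def false_positive_rate(r: ProductResult) -> float:
    """Share of legitimate applications flagged as malicious."""
    return r.clean_flagged / r.clean_total

def slowdown(r: ProductResult) -> float:
    """Relative overhead: 0.10 means the task took 10% longer."""
    return (r.protected_seconds - r.baseline_seconds) / r.baseline_seconds

# Invented sample numbers, for illustration only.
result = ProductResult(1000, 990, 5000, 5, 60.0, 66.0)
print(f"detection rate:      {detection_rate(result):.1%}")
print(f"false positive rate: {false_positive_rate(result):.1%}")
print(f"slowdown:            {slowdown(result):.1%}")
```

Looking at all three numbers together matters: a product can reach a high detection rate simply by flagging aggressively, which shows up as a worse false positive rate.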

How to use benchmark tests

Any organization that tests cybersecurity products should make its methodology transparent to vendors and consumers alike. How can you trust the test results otherwise?

Here are four key reasons to be skeptical about a company’s claims of its products’ effectiveness:

The vendor uses benchmark tests that do not employ a transparent methodology;
The vendor participated in only one test, avoiding all other tests in the series;
The vendor avoids providing its product to independent testing experts;
The vendor participates only in tests with methodology built around artificial cases that don’t reflect real-world use.
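
The four warning signs above can be turned into a simple checklist when comparing vendors. This is a minimal sketch: the `VendorTestRecord` fields are hypothetical names chosen for illustration, and filling them in still requires reading the labs' reports yourself.

```python
from dataclasses import dataclass

@dataclass
class VendorTestRecord:
    """Hypothetical notes on how a vendor engages with benchmark tests."""
    methodology_published: bool        # test methodology is openly documented
    tests_entered: int                 # tests in the series the vendor entered
    tests_in_series: int               # tests the series comprised
    submits_to_independent_labs: bool  # product is given to independent testers
    uses_real_world_scenarios: bool    # test cases reflect real-world use

def red_flags(record: VendorTestRecord) -> list[str]:
    """Return the reasons for skepticism that apply to this record."""
    flags = []
    if not record.methodology_published:
        flags.append("non-transparent methodology")
    if record.tests_in_series > 1 and record.tests_entered <= 1:
        flags.append("entered only one test in the series")
    if not record.submits_to_independent_labs:
        flags.append("product withheld from independent testers")
    if not record.uses_real_world_scenarios:
        flags.append("only artificial test cases")
    return flags

# Invented example: a vendor that entered one of six tests in a series.
record = VendorTestRecord(True, 1, 6, True, False)
print(red_flags(record))
```

An empty list does not prove a product is good; it only means none of the four warning signs was found.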

Always evaluate test results over time to get a balanced view, and don’t stick to a single benchmarking methodology. A product’s benchmark tests should be handled by different labs for a comprehensive picture of its strengths and abilities.

Note the operating system under which tests were carried out, as well. For example, one solution might be more effective on Windows 10 than on earlier versions, or vice versa.

Keep an eye on how different products by the same vendor perform. If the overall picture isn’t great, then one winning product could be a fluke.

Kaspersky Lab is constantly in touch with leading independent labs, and provides a variety of products for benchmark testing. Test results are public and can be found here.
