I understand your initial interest in this article, but a disclaimer: porn is not the main topic of discussion in this post. Rather, I will focus on a position at Kaspersky Lab that is officially named, proudly and venerably, ‘a content analyst’. To put it simply, the content analyst’s task is creating content filter databases, which are, in turn, used in our products. The filters are included in the security solutions for both end users and corporate clients. The module responsible for content filtering is called Parental Control for the first group of users and Web Control for the second. By using the tools implemented in Parental Control, parents are able to restrict their children’s access to inappropriate websites. To make that possible, a security company has to know which category each site falls into and what kind of content is stored there. This is where our databases come in.
Obviously, filling in the filter databases manually is not possible. According to Google, the Internet now contains approximately 15 billion web pages, which is why the majority of the scanning is performed by robots. They analyze Internet content using our KSN cloud and automatically deliver a verdict on each web site. The verdicts made in the cloud are used in our mobile security solutions (including Safe Browser for iOS), and, of course, the rest of our products use them to power their content filtering components. The analyst’s task is to teach robots to correctly categorize the content on the Internet. A robot has to evaluate keywords and keyword combinations on a web page, housekeeping data, as well as images, and decide whether the content on the page falls within a certain category. For example, if the web site code contains combinations like ‘watch porn online’ or ‘free porn’, a large number of flesh-colored images, and the like, a robot would categorize that resource as ‘Pornography and erotic’ based on a set of criteria.
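To make the keyword part of this more concrete, here is a minimal sketch in Python. It is an illustration only, not how our robots actually work: the function name `categorize` and the `MIN_HITS` threshold are hypothetical, the phrase list contains only the two combinations quoted above, and image and housekeeping-data analysis is left out entirely.

```python
import re

# Phrases quoted in the post; a real database would hold far more criteria.
CATEGORY_PHRASES = {
    "Pornography and erotic": ["watch porn online", "free porn"],
}

# Hypothetical threshold: how many phrase hits are needed for a verdict.
MIN_HITS = 1


def categorize(page_source: str) -> list[str]:
    """Return every category whose phrases occur often enough in the page."""
    # Lowercase and collapse whitespace so line breaks do not hide a phrase.
    text = re.sub(r"\s+", " ", page_source.lower())
    verdicts = []
    for category, phrases in CATEGORY_PHRASES.items():
        hits = sum(text.count(phrase) for phrase in phrases)
        if hits >= MIN_HITS:
            verdicts.append(category)
    return verdicts
```

Calling `categorize(page_source)` on the raw code of a page returns a list of category names, which is empty when none of the phrases are found. A production system would presumably weigh many signals together rather than rely on a simple phrase count.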
As a part of their job responsibilities, web analysts at times have to scan ‘inappropriate content’. We work with many content categories, including a number of highly dangerous ones:
- Cruelty and violence;
- Obscene language.
As a rule, the above-mentioned categories are, by default, ticked as restricted by the Administrator when Parental or Web Control is enabled.
As web analysts, we deal with content that varies widely. One should be psychologically resilient, able to detach from certain emotions, and healthily cynical in order to work with the ‘Cruelty and violence’ category. These personal qualities are explicitly stated as requirements to content analyst candidates during the very early stages of a job interview.
But that’s not our entire job. It’s difficult to describe in detail the work process and all the peculiarities of searching for the wickedest content on the Internet, but, in general, a typical work day looks like this:
Morning email check
- To me personally, this process helps to get into the mood for work and evaluate the volume of tasks for today.
- Every day at 11am our team gathers in the meeting room and discusses everything done yesterday and the scope of work for today. This is an integral part of the work process, which helps to exercise discipline and ensure transparency when working on a team.
- The job is important, but who could do without a morning cup!
- On a daily basis, our tech support has to deal with user inquiries related to incorrect blocking of web sites by content filter modules. Such requests are passed to our team of content analysts, who take on the inquiry, analyze it, and provide feedback to users. We also work with issues like Anti-Banner malfunctioning in our products. Officially, each request takes up to three work days, but, as always, the sooner the better.
Work with content categories
- The major part of a content analyst’s work is creating new categories and supporting existing ones. We currently support 15 categories and 7 languages. That is why the most important criterion for an applicant to the web content analyst position is fluency in a second foreign language, besides English. We welcome candidates with degrees in philology or linguistics and, it goes without saying, with analytical thinking. In a nutshell, the candidate must be an ‘arts and math’ person, which is quite a rare case.
- When categorizing the content, the bots use databases created and supported by content analysts. Obviously, before being released, the databases go through a number of tests. The most important of them is live testing using real-life, top web sites. This test is run daily. A list of the most popular global web sites is formed every morning, and those which have not been included in our test list before are sent to our analysts to categorize. The analysts deliver their verdict on the new sites and update the test manually. The test is meant to ensure that the categories assigned to web sites by bots correspond to the analysts’ verdicts. As web site content may change, we have to constantly work with this test. As a rule, we initiate a live test three times a day: in the morning, in the afternoon, and in the evening, and take turns working with it.
- The employees at our department are well-known experts in protecting children from inappropriate content on the Internet: they write articles, post in blogs and are interviewed or invited as spokespeople at a number of conferences.
- Nevertheless, despite a number of articles, speeches, or interviews, the protection of children from explicit content is still a hot topic.
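The live test described in the list above boils down to two steps: picking out popular sites that analysts have not categorized yet, and checking that the bot’s category for each known site matches the analyst’s verdict. Below is a rough sketch in Python. The names `new_sites`, `run_live_test` and `bot_verdict` are hypothetical, and the test list is assumed, for illustration only, to be a simple mapping from site to category.

```python
from typing import Callable


def new_sites(top_sites: list[str], test_list: dict[str, str]) -> list[str]:
    """Sites from today's top list that analysts have not categorized yet."""
    return [site for site in top_sites if site not in test_list]


def run_live_test(
    test_list: dict[str, str],
    bot_verdict: Callable[[str], str],
) -> dict[str, tuple[str, str]]:
    """Return the sites where the bot disagrees with the analyst.

    Each mismatch maps a site to (analyst category, bot category).
    """
    mismatches = {}
    for site, expected in test_list.items():
        actual = bot_verdict(site)
        if actual != expected:
            mismatches[site] = (expected, actual)
    return mismatches
```

In this sketch, the analyst on duty would categorize whatever `new_sites` returns, add those entries to the test list, and then review every mismatch reported by `run_live_test`: either the database needs fixing or the site’s content has changed.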
Surely, there are a million ways to find resources with explicit content. Parental or Web Control is not foolproof. However, we do our best to protect users, as well as their family or colleagues, from inappropriate content on the Internet by providing useful tools to organize their defense.