{"id":14215,"date":"2017-03-10T09:00:25","date_gmt":"2017-03-10T14:00:25","guid":{"rendered":"https:\/\/www.kaspersky.com\/blog\/?p=14215"},"modified":"2020-02-26T11:11:16","modified_gmt":"2020-02-26T16:11:16","slug":"office-documents-metadata","status":"publish","type":"post","link":"https:\/\/www.kaspersky.com\/blog\/office-documents-metadata\/14215\/","title":{"rendered":"How ephemeral metadata may cause real problems"},"content":{"rendered":"<p>Calling Captain Obvious\u2026come in, Captain Obvious: Which IT threat brings the most danger to enterprises, SMBs, governments, and individuals?<\/p>\n<p>The answer, of course, is <em>data breaches<\/em>. Now: Which data breaches are the hardest to prevent? And the answer is, those people don\u2019t know about.<\/p>\n<p>Today we are talking about something most people don\u2019t know or think much about, <em>metadata<\/em> \u2014 information about a file rather than information shown in a file. Metadata can turn a normal digital document into compromising intel.<\/p>\n<h2>Document metadata<\/h2>\n<p>Let\u2019s start our deep dive with a bit of theory. American law defines three categories of metadata:<\/p>\n<ol>\n<li><b>App metadata<\/b> is added to the file by the application used to create the document. This type of metadata keeps edits introduced by the user, including change logs and comments.<\/li>\n<li><b>System metadata<\/b> include the name of the author, file name and size, changes, and so forth.<\/li>\n<li><b>Embedded metadata<\/b> might be formulae in Excel cells, hyperlinks, and associated files. <a href=\"https:\/\/www.kaspersky.com\/blog\/exif-privacy\/\" target=\"_blank\" rel=\"noopener noreferrer nofollow\">EXIF metadata typical of graphic files<\/a> also belongs to this category.<\/li>\n<\/ol>\n<p>Here\u2019s a classic example of the troubles compromised metadata may bring: the UK government\u2019s 2003 report on Iraq\u2019s supposed weapons of mass destruction. 
The .doc version of the report included metadata on the authors (or, more precisely, the people who made the last 10 edits). This information raised red flags about the quality, authenticity, and credibility of the report.<\/p>\n<p>According to the <a href=\"http:\/\/news.bbc.co.uk\/2\/hi\/technology\/3154479.stm\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">BBC follow-up story<\/a>, once the original file\u2019s metadata was noticed, the government chose to use the .pdf version of the report instead, because it contained less metadata.<\/p>\n<h3>A $20 million (doctored) file<\/h3>\n<p>Another curious metadata-powered eye-opener <a href=\"https:\/\/www.venable.com\/venables-20-million-plus-sanctions-trade-secrets-win-for-government-contractor-what-it-means-for-you-10-16-2015\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">involved a client of Venable, an American law firm, back in 2015<\/a>. Venable was contacted by a company whose vice president had recently resigned. Shortly after his exit, the firm lost a contract with a government organization to a competitor \u2014 a competitor working with the former VP.<\/p>\n<p>The company accused its former VP of misuse of trade secrets, saying that\u2019s how he won the government contract. In their defense, the defendant and his new firm provided as evidence a similar commercial offer prepared for a foreign government. They claimed it was created for another client <em>before<\/em> the contract pitch in the United States, and thus it did not violate the former VP\u2019s non-compete agreement with the plaintiff.<\/p>\n<p>But the defendants failed to consider that metadata in their evidence contained a time stamp abnormality. System metadata showed that the file was last saved before it was last printed, which, as an expert affirmed, could not happen. The time stamp of the last print belongs to app metadata, and it is saved in the document <em>only<\/em> when the file itself is saved. 
If a document is printed and is not saved afterwards, the new date of printing would not be saved to the metadata.<\/p>\n<p>Another sign of forgery was the document\u2019s date of creation on the corporate server. The document was created after the lawsuit was brought to court. Moreover, defendants were accused of tampering with the time stamp of the last edit in the .olm files (that extension is used for Microsoft Outlook for Mac files).<\/p>\n<p>The metadata evidence was enough for the court to rule in favor of the plaintiffs, eventually awarding them $20 million and slapping the defendants with millions more in sanctions.<\/p>\n<h3>Hidden files<\/h3>\n<p>Microsoft Office files offer a rich tool set for collecting private data. For example, footnotes to text can include additional information not intended for public use. The built-in revision tracking in Word could also be of use to a spy. If you choose the \u201cShow final\u201d option (or \u201cNo markup,\u201d or similar, depending on your version of Word), tracked changes will disappear from the screen, yet they will remain in the files, waiting for some observant reader.<\/p>\n<p>Also, there are notes to slides in PowerPoint presentations, hidden columns in Excel sheets, and more.<\/p>\n<p>Ultimately, attempts to hide data without knowing how to do it properly tend not to work. A great example here is <a href=\"http:\/\/static.cbslocal.com\/station\/wbbm\/obama.pdf\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">a court document<\/a> published on CBSLocal, referring to the case of United States vs. Rod Blagojevich, ex-governor of Illinois. This is a motion for the court to issue a trial subpoena to Barack Obama, dated 2010.<\/p>\n<p>Some parts of the text are hidden by black boxes. 
However, if you copy and paste a text block into any text editor, you can read the text in its entirety.<\/p>\n<div id=\"attachment_14218\" style=\"width: 1231px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/92\/2017\/03\/06020846\/office-docs-metadata-1.png\"><img decoding=\"async\" aria-describedby=\"caption-attachment-14218\" class=\"size-full wp-image-14218\" src=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/92\/2017\/03\/06020846\/office-docs-metadata-1.png\" alt=\"Black boxes in PDF doesn't really work\" width=\"1221\" height=\"441\"><\/a><p id=\"caption-attachment-14218\" class=\"wp-caption-text\">Black boxes in a PDF may be useful to hide information in print, but this measure can be easily bypassed in a digital format<\/p><\/div>\n<h3>Files inside of files<\/h3>\n<p>Data from external files embedded in a document is a completely different story.<\/p>\n<p>To show a real example, we searched through some documents on .gov websites, and picked the US Department of Education\u2019s tax report for the 2010 financial year to examine.<\/p>\n<p>We downloaded the file and disabled read-only protection (which did not require a password). There is a seemingly normal graph on page 41. 
We selected \u201cChange data\u201d in the graph\u2019s context menu, eventually opening an embedded Microsoft Excel source file containing all source data.<\/p>\n<div id=\"attachment_14219\" style=\"width: 1376px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/92\/2017\/03\/06020845\/office-docs-metadata-2.png\"><img decoding=\"async\" aria-describedby=\"caption-attachment-14219\" class=\"size-full wp-image-14219\" src=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/92\/2017\/03\/06020845\/office-docs-metadata-2.png\" alt=\"Embedded Excel table in Word document\" width=\"1366\" height=\"588\"><\/a><p id=\"caption-attachment-14219\" class=\"wp-caption-text\">Here is a report in a Word file, containing an embedded Excel file with an abundance of source data for this <em>and some other<\/em> graphs<\/p><\/div>\n<p>It should go without saying that such embedded files might contain anything, including loads of private information; whoever published the document must have assumed that data was inaccessible.<\/p>\n<h3>Harvesting metadata<\/h3>\n<p>The process of collecting metadata from a document belonging to an organization of interest may be automated with the help of software such as ElevenPaths\u2019 FOCA (Fingerprinting Organizations with Collected Archives).<\/p>\n<p>FOCA can find and download required document formats (for example, .docx and .pdf), analyze their metadata, and find out many things about the organization, such as the server-side software they use, usernames, and more.<\/p>\n<p>We must insert a serious warning here: Analyzing websites with such tools, even for the sake of research, might be taken very seriously by website owners or even qualify as cybercrime.<\/p>\n<h3>Documented oddities<\/h3>\n<p>Here are a couple of metadata peculiarities not all IT experts are familiar with. Take the NTFS file system used by Windows.<\/p>\n<p><strong>Fact 1<\/strong>. 
If you delete a file from a folder and immediately save a new file with the same name in the same folder, the date of creation will be the same as that of the file you deleted.<\/p>\n<p><strong>Fact 2<\/strong>. In addition to other metadata, NTFS keeps the date of the last access to the file. However, if you open the file and then check out the date stamp of last access in the file properties, the date remains the same.<\/p>\n<p>You might think those oddities are just bugs, but they are in essence documented features. In the first case, we are talking about <a href=\"https:\/\/support.microsoft.com\/en-gb\/kb\/172190\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">tunneling<\/a>, which is required to enable backward software compatibility. By default, this effect lasts for 15 seconds, during which the new file gets the creation time stamp associated with the previous file (you can change the interval in system settings or disable tunneling entirely in the registry). Actually, the default interval was sufficient for me to stumble across tunneling twice in a week just doing my job.<\/p>\n<p>The second case is also documented: Starting with Windows 7, for the sake of performance Microsoft disabled automated time-stamping for the time of last access. You can enable this feature in the <a href=\"https:\/\/technet.microsoft.com\/en-us\/library\/cc959914.aspx\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">registry<\/a>. However, once it\u2019s enabled, you cannot reverse the process to correct the problem; the file system does not keep the correct date stamps (as proven by a low-level disk editor).<\/p>\n<p>We hope computer forensics experts are aware of these peculiarities.<\/p>\n<p>By the way, file metadata can be altered using default OS \/ native apps and special software. 
That means you can\u2019t rely on metadata as evidence in a court of law unless it\u2019s corroborated by records such as mail service and server logs.<\/p>\n<h2>Metadata: Security<\/h2>\n<p>A built-in feature in Microsoft Office called Document Inspector (<em>File \u2192 Info \u2192 Inspect Document<\/em> in Word 2016) shows a user the data contained in a file. To an extent, this data can be deleted on request \u2014 although not embedded data (as in the report by the Department of Education cited above). Users should take care when inserting graphs and diagrams.<\/p>\n<p><a href=\"http:\/\/help.adobe.com\/ru_RU\/acrobat\/pro\/using\/WS4E397D8A-B438-4b93-BB5F-E3161811C9C0.w.html\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Adobe Acrobat<\/a> has a similar ability to remove metadata from files.<\/p>\n<p>In any case, security systems should manage leak prevention. For example, we have the DLP (Data Loss Prevention) module in Kaspersky Total Security for Business, Kaspersky Security for mail servers, and Kaspersky Security for collaboration platforms. 
These products can filter confidential metadata such as change logs, comments, and embedded objects.<\/p>\n<p>Of course, the ideal (read: unachievable) method to prevent leaks entirely is having responsible, aware, and well-trained staff.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The most dangerous data leaks are the ones people don\u2019t even know about.<\/p>\n","protected":false},"author":2049,"featured_media":14217,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[5,1788,9],"tags":[1864,1947,43,914,97],"class_list":{"0":"post-14215","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-news","8":"category-privacy","9":"category-tips","10":"tag-metadata","11":"tag-office-files","12":"tag-privacy","13":"tag-private-data","14":"tag-security-2"},"hreflang":[{"hreflang":"x-default","url":"https:\/\/www.kaspersky.com\/blog\/office-documents-metadata\/14215\/"},{"hreflang":"en-gb","url":"https:\/\/www.kaspersky.co.uk\/blog\/office-documents-metadata\/8494\/"},{"hreflang":"es-mx","url":"https:\/\/latam.kaspersky.com\/blog\/office-documents-metadata\/9001\/"},{"hreflang":"es","url":"https:\/\/www.kaspersky.es\/blog\/office-documents-metadata\/10198\/"},{"hreflang":"it","url":"https:\/\/www.kaspersky.it\/blog\/office-documents-metadata\/9924\/"},{"hreflang":"ru","url":"https:\/\/www.kaspersky.ru\/blog\/office-documents-metadata\/14277\/"},{"hreflang":"fr","url":"https:\/\/www.kaspersky.fr\/blog\/office-documents-metadata\/6809\/"},{"hreflang":"pt-br","url":"https:\/\/www.kaspersky.com.br\/blog\/office-documents-metadata\/7192\/"},{"hreflang":"pl","url":"https:\/\/plblog.kaspersky.com\/office-documents-metadata\/6369\/"},{"hreflang":"de","url":"https:\/\/www.kaspersky.de\/blog\/office-documents-metadata\/9915\/"},{"hreflang":"ja","url":"https:\/\/blog.kaspersky.co.jp\/office-documents-metadata\/14790\/"
},{"hreflang":"ru-kz","url":"https:\/\/blog.kaspersky.kz\/office-documents-metadata\/14277\/"},{"hreflang":"en-au","url":"https:\/\/www.kaspersky.com.au\/blog\/office-documents-metadata\/14215\/"},{"hreflang":"en-za","url":"https:\/\/www.kaspersky.co.za\/blog\/office-documents-metadata\/14215\/"}],"acf":[],"banners":"","maintag":{"url":"https:\/\/www.kaspersky.com\/blog\/tag\/metadata\/","name":"metadata"},"_links":{"self":[{"href":"https:\/\/www.kaspersky.com\/blog\/wp-json\/wp\/v2\/posts\/14215","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kaspersky.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kaspersky.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kaspersky.com\/blog\/wp-json\/wp\/v2\/users\/2049"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kaspersky.com\/blog\/wp-json\/wp\/v2\/comments?post=14215"}],"version-history":[{"count":3,"href":"https:\/\/www.kaspersky.com\/blog\/wp-json\/wp\/v2\/posts\/14215\/revisions"}],"predecessor-version":[{"id":33749,"href":"https:\/\/www.kaspersky.com\/blog\/wp-json\/wp\/v2\/posts\/14215\/revisions\/33749"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.kaspersky.com\/blog\/wp-json\/wp\/v2\/media\/14217"}],"wp:attachment":[{"href":"https:\/\/www.kaspersky.com\/blog\/wp-json\/wp\/v2\/media?parent=14215"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kaspersky.com\/blog\/wp-json\/wp\/v2\/categories?post=14215"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kaspersky.com\/blog\/wp-json\/wp\/v2\/tags?post=14215"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}