Unlike most previous Wikileaks documents, the Iraq War Logs appeared in redacted form: all proper names had been removed. At a press conference on 23 October 2010, Julian Assange gave a glimpse of how this work was carried out. In the following, I describe the process in my own words.
Rather than reviewing every single document one by one and deleting all proper names, Wikileaks opted for a more intelligent approach. All content was assumed to be out until it was whitelisted. This process is far more complex than it might appear at first sight, as a number of words can have very different meanings depending on the context.
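To make the "out until whitelisted" idea concrete, here is a minimal sketch in Python. It is my own illustration and not Wikileaks' actual code: the whitelist contents, the `redact` function and the blank marker are all assumptions.

```python
import re

# Hypothetical whitelist. A real one would hold many thousands of vetted words.
WHITELIST = {"the", "patrol", "found", "a", "near", "checkpoint"}

# A "word" here is simply a run of ASCII letters (a simplifying assumption).
WORD = re.compile(r"[A-Za-z]+")


def redact(text, whitelist=WHITELIST, blank="____"):
    """Keep whitelisted words and replace every other word with a blank."""
    def keep_or_blank(match):
        word = match.group(0)
        return word if word.lower() in whitelist else blank

    return WORD.sub(keep_or_blank, text)


print(redact("The patrol found Ahmed near the checkpoint"))
```

The point of the design is that an unknown word is removed by default, so a name the editors never thought of cannot slip through simply because nobody listed it.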
“Osama bin Laden” would, for instance, be difficult to pick up in a simple word-by-word search, as both “bin” and “laden” are also ordinary English words. Wikileaks therefore approached the matter by conducting a context search, identifying common phrases and patterns in the text and deciding whether each of these would be in or out. This was then supplemented by a word-for-word concordance search. Once the editing process was completed, a number of randomly chosen sample documents were reviewed to double-check that all names had indeed been replaced with blanks.
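The three steps described above (a context pass, a word-for-word pass, and a random spot check) could be sketched roughly as follows. Again, this is only my own illustration under stated assumptions: the phrase pattern, the whitelist and the function names are hypothetical and not taken from Wikileaks.

```python
import random
import re

BLANK = "____"

# Hypothetical phrase rule, decided as a whole: "Name bin Name" is out.
PHRASES_OUT = [re.compile(r"\b[A-Z][a-z]+ bin [A-Z][a-z]+\b")]

# Hypothetical whitelist; "bin" and "laden" are in it as ordinary English words.
WHITELIST = {"the", "was", "in", "with", "bin", "laden",
             "truck", "crates", "report", "mentions"}

WORD = re.compile(r"[A-Za-z]+")


def redact(text):
    """Blank whole phrases first, then every word that is not whitelisted."""
    # Pass 1: context search on phrases and patterns.
    for pattern in PHRASES_OUT:
        text = pattern.sub(BLANK, text)

    # Pass 2: word-for-word check against the whitelist.
    def keep_or_blank(match):
        word = match.group(0)
        return word if word.lower() in WHITELIST else BLANK

    return WORD.sub(keep_or_blank, text)


def sample_for_review(documents, k, seed=None):
    """Pick up to k documents at random for a manual double check."""
    rng = random.Random(seed)
    return rng.sample(documents, min(k, len(documents)))


docs = ["Report mentions Osama bin Laden", "The truck was laden with crates"]
redacted = [redact(d) for d in docs]
print(redacted)
print(sample_for_review(redacted, k=1, seed=0))
```

Running the phrase pass first matters: without it, the word-for-word pass would keep “bin” and “Laden” because both are whitelisted as normal vocabulary.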
Even though this approach cut the workload down considerably, the editing process was still a lot of (hard) work. Broken down, the documents would have filled approximately 200,000 pages in a standard layout.
Caveat: I have no inside knowledge of the project. The above description is solely based on publicly available information.