Liberate Crime Data: Just Say No to PDF

SpotCrime maps crime data from hundreds of police agencies, which means consuming data at scale in many different formats. PDF (Portable Document Format) is an antiquated file format still used by agencies nationwide.

PDFs came about in the early 1990s, around the introduction of the World Wide Web, and made it possible to view full documents sent via e-mail. No matter which operating system is being used, the document is digitally accessible and looks uniform to all viewers. In the '90s it was favorable technology.

However, by today's standards, public data and information locked inside a PDF is disastrous.

Go behind the scenes of any company that maintains a database and you will find that data needs to be loaded into the database in a machine readable format. Digitally accessible documents, like PDFs, are not machine readable.
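To illustrate what 'machine readable' buys you, here is a minimal sketch in Python of loading a spreadsheet-style export (CSV) straight into a database. The column names, sample rows, and the load_incidents helper are hypothetical and invented for this example; they are not SpotCrime's actual pipeline or any agency's real schema.

```python
import csv
import io
import sqlite3

# Hypothetical sample export: column names and rows are invented for illustration.
SAMPLE_CSV = """incident_id,date,type,address
1001,2024-01-05,Theft,100 Block Main St
1002,2024-01-06,Burglary,200 Block Oak Ave
"""


def load_incidents(csv_text, conn):
    """Load CSV rows into an 'incidents' table and return how many were loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS incidents "
        "(incident_id TEXT PRIMARY KEY, date TEXT, type TEXT, address TEXT)"
    )
    # DictReader uses the header row, so each field is addressed by name.
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["incident_id"], r["date"], r["type"], r["address"]) for r in reader]
    conn.executemany("INSERT INTO incidents VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
loaded = load_incidents(SAMPLE_CSV, conn)
```

Because every field already sits in a named column, no retyping or manual cleanup is needed. The same records delivered as a PDF would first have to be extracted and checked by hand or with conversion tools before a step like this could run.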

According to Data.gov, ‘a digitally accessible document may be online, making it easier for a human to access it via a computer, but unless the relevant data is available in a machine readable format, it will be much harder to use the computer to extract, transform and process that data’. In other words, digitally accessible documents make information really easy to look at, but very hard to touch digitally.

For this reason, PDFs do not fall under the definition of open data.

Data.gov continues, noting that ‘the degree to which information is machine readable, however, is critical to meeting priorities such as open government and open data, and directly influences, and in many cases limits, the uses citizens and other interested parties can make of that information’.

With PDFs, a lot of work and editing is required to make the data ready for database consumption. This editing takes time, slows down the rate of consumption, and increases the risk of computer and/or human error during conversion. Hindering access to public information strains the relationship between government and the public. It can make police agencies, or any government agency, look like they’re hiding something. The harder it is to consume data, the more likely people are to draw negative conclusions about trust and transparency.

We recently went back to agencies sending us data in PDF and asked for a spreadsheet format like Excel.

Some agencies had no problem sending an Excel file.

A couple of agencies said their software only allows them to pull data from their system as a PDF. Any records management system (RMS) or computer-aided dispatch (CAD) system is a database. It’s probably safe to assume that any vendor who offers only a PDF download option for data from a database didn’t have the end user in mind during development.

A couple of agencies did have the ability to publish in Excel, but responded to our inquiry with what we consider non-fact-based thinking, mainly because we aren't sure how they reached their conclusion. A direct quote from one police agency:

'No. It's against policy. PDFs can't be manipulated. Not that you would do that but that's why it's in policy.'


In other words, ‘PDF is policy’ because agencies ‘don’t want to allow for data manipulation’. Unfortunately, this is a widespread misunderstanding found at all levels of government. By forcing the press and the public to convert the data, police departments not only increase the chances of accidental errors (unintended manipulation), but also create a trust barrier between the public and police. Implying that data manipulation will happen implies that police agencies do not trust the recipients of the data to act in good faith.

If data manipulation is a true concern, then data should be published in a machine readable format and made available and accessible to everyone. The more people who have access to the information, the easier it is to cross reference for errors or manipulation.
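The kind of cross-referencing described above is straightforward once data is machine readable. The following is a minimal sketch in Python, assuming both the official release and a republished copy are CSV files sharing a hypothetical incident_id column; the sample data and the find_discrepancies helper are invented for illustration.

```python
import csv
import io

# Hypothetical data: an agency's official release and a third party's copy.
OFFICIAL_CSV = """incident_id,date,type
1001,2024-01-05,Theft
1002,2024-01-06,Burglary
1003,2024-01-07,Assault
"""

COPY_CSV = """incident_id,date,type
1001,2024-01-05,Vandalism
1002,2024-01-06,Burglary
1004,2024-01-08,Theft
"""


def index_by_id(csv_text, key="incident_id"):
    """Map each row's key value to the full row (as a dict of column -> value)."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}


def find_discrepancies(official_csv, copy_csv, key="incident_id"):
    """Report records missing from, added to, or altered in the copy."""
    official = index_by_id(official_csv, key)
    copy = index_by_id(copy_csv, key)
    shared = set(official) & set(copy)
    return {
        "missing": sorted(set(official) - set(copy)),
        "extra": sorted(set(copy) - set(official)),
        "changed": sorted(k for k in shared if official[k] != copy[k]),
    }


report = find_discrepancies(OFFICIAL_CSV, COPY_CSV)
```

With an openly published source file, anyone can run a check like this against any republished version, which makes both accidental errors and deliberate changes easy to spot.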

Is your police agency publishing crime data in a PDF? Ask them to be more transparent and start publishing openly - make the data available in a machine readable format for anyone to access, use, and share!
