A Call for Researchers to Embrace Robust, Open Crime Data

Below is our paper that was recently published in The American Society of Criminology's newsletter The Criminologist. The American Society of Criminology (ASC) is an international organization whose members pursue scholarly, scientific and professional knowledge concerning the measurement, etiology, consequences, prevention, control, and treatment of crime and delinquency. ASC publications consist of the following: the journals, Criminology and Criminology & Public Policy, and the newsletter, The Criminologist. Check out our article on page 7 of their most recent newsletter or read it below.

Access to accurate and timely data on crime is important for any city wanting to uphold and improve policing. Unfortunately, although such data is increasingly being collected by cities, access to it is increasingly being restricted by private companies or by cities themselves, making it difficult if not impossible for researchers and policy analysts to do their jobs. This article describes the value of crime data and explores the threat that private companies and lack of standardization now pose to open access. Researchers must advocate for free, unrestricted, and timely access to robust crime data in all cities.

The value behind RMS/CAD data

On a daily basis, police agencies pull reports, called crime blotters or calls for service logs, from their Records Management System (RMS) and/or Computer Aided Dispatch (CAD) system. These logs list the crime incidents officers respond to throughout the day, the address of each event, and the date and time it happened. Each log is a preliminary list with no victim or suspect information and does not include the full report.

Since 2014 and the rollout of the White House 21st Century Policing Initiative, hundreds of police agencies have begun embracing open crime data, publishing these logs on an hourly or daily basis directly to their own city-run websites in machine-readable format via an API or download button for anyone to access, use, and reshare without restrictions. It’s important to note that RMS/CAD data is not Federal Bureau of Investigation (FBI) Uniform Crime Report (UCR) or National Incident-Based Reporting System (NIBRS) data. RMS/CAD data is more robust, less scrubbed, and more ‘real time’ compared to the annual FBI reports.

For example, the New York Police Department (NYPD) reported information on about 75,000 FBI UCR crimes in 2015, while it receives more than 10 million calls for service a year. Additionally, calls for service/crime blotter data can be geolocated to the address level, while FBI data is limited to the city/county level.

This data has allowed many analytical programs to evolve in an effort to reduce crime.

In the 1990s the NYPD attributed a decrease in crime to its CompStat program, in which weekly RMS data is compiled into statistical charts to determine which crimes to focus on. RMS/CAD data drives hot spot policing initiatives, such as in Minneapolis, where police learned that three percent of the city's addresses accounted for 50 percent of calls for service. The University of California, Los Angeles’ PredPol, a ‘predictive policing’ program used by the Los Angeles Police Department, feeds this data to artificial intelligence to predict where crimes will occur.
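A concentration figure like the Minneapolis one can be computed directly from a calls for service log. The sketch below is only an illustration: the function name and the toy addresses are made up, and it is not the method used in the Minneapolis study.

```python
from collections import Counter

def address_share_for_call_share(call_addresses, target_share=0.5):
    """Fraction of distinct addresses needed (busiest first) to reach target_share of calls."""
    counts = Counter(call_addresses)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    running = 0
    # Walk addresses from most to least calls until the target share is covered.
    for used, (_, n) in enumerate(counts.most_common(), start=1):
        running += n
        if running / total >= target_share:
            return used / len(counts)
    return 1.0

# Hypothetical log: one row per call, keyed by address.
calls = ["100 MAIN ST"] * 6 + ["200 OAK AVE"] * 2 + ["300 ELM ST", "400 PINE ST"]
share = address_share_for_call_share(calls)
print(f"{share:.0%} of addresses account for at least half of all calls")
```

In this toy log a single address out of four covers at least half of the calls. This kind of calculation is only possible when data is released at the address level.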

This analytical problem oriented policing method has fueled the creation of successful crime prevention strategies and allowed police and their public to become more proactive rather than reactive to crime - lowering crime rates and fostering accountability.

Future programs could include using blockchain to track a crime from the moment 911 is called through the report, arrest, sentencing, prison, and release, or applying AI and machine learning to assess biases and imbalances, explain why the violent crime rate is increasing, reveal insights behind recidivism, determine if current policing methods are impactful, and make policing easier and more efficient.

Additionally, public access to crime information has a strong effect on transparency, public safety, community relationships, and police accountability. This kind of information is imperative to protecting against and preventing crime.

Proprietary data silos

Regrettably, it is getting harder to study and implement crime solutions because the availability of and access to police data are increasingly being restricted by private companies.

Police agencies rely on third party vendors to provide the RMS and CAD systems that collect and compile data. As a result, the private vendor is given preferential access to public crime information - information that taxpayers have paid for. Private vendors have a monetary incentive to monopolize and silo public crime data. Monopolization allows them to control who has access to the data while selling data to industry at a premium.

Currently, the only way to access this crime data from a private vendor, without violating terms of use or risking a lawsuit, is to reach out to the vendor directly and pay the premium, or to FOIA the data from the police agency itself. Both options are costly, especially for researchers with limited access to grant money, and they take a considerable amount of time.

This potentially lets big businesses like Walmart, Google, or Amazon access this data faster than local residents, graduate students, or civic hackers. Additionally, locking up data this way cripples transparency, stifles innovation, and erodes trust between police and the community.

Recently, one of the largest open crime datasets was turned off because of vendor infighting. Datasets for hundreds of police agencies nationwide are no longer available for the public to download, use, and share. This is not in the interest of data transparency or 21st century policing.

Lack of standards

Another problem is that there are no standards across jurisdictional lines when it comes to RMS and CAD data.

Every police agency has a different way of thinking about this kind of data, a different system to release the data, different computers, different vendors. Even the FBI discourages using their data for comparisons.

Some agencies release CAD, some release RMS, some release both. Some agencies only release FBI defined crime types, meaning that if multiple crimes occurred within the same report number, only the crime highest on the FBI crime hierarchy is released. Others break down crime types even further, such as noting whether a burglary is residential or business. Most of the time shootings are not broken out at all - they are normally folded into an aggravated assault category. With CAD data, the CAD/911 codes can vary across jurisdictions and over time.
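To make the effect of hierarchy-based reporting concrete, here is a minimal sketch of how a multi-offense report collapses to a single released crime type. The ordering follows the commonly cited UCR Part I hierarchy, but treat the list, the offense labels, and the report numbers as assumptions for illustration, not as any agency's actual export logic.

```python
# Assumed ordering, most serious first (illustrative labels only).
HIERARCHY = [
    "HOMICIDE",
    "RAPE",
    "ROBBERY",
    "AGGRAVATED ASSAULT",
    "BURGLARY",
    "LARCENY-THEFT",
    "MOTOR VEHICLE THEFT",
]
RANK = {offense: i for i, offense in enumerate(HIERARCHY)}

def apply_hierarchy(records):
    """Keep only the highest-ranked offense for each report number."""
    top = {}
    for report_number, offense in records:
        rank = RANK.get(offense, len(HIERARCHY))  # unknown offenses rank last
        if report_number not in top or rank < top[report_number][0]:
            top[report_number] = (rank, offense)
    return {number: offense for number, (_, offense) in top.items()}

# Hypothetical RMS rows: (report number, offense).
records = [
    ("2019-001", "BURGLARY"),
    ("2019-001", "AGGRAVATED ASSAULT"),
    ("2019-002", "LARCENY-THEFT"),
]
released = apply_hierarchy(records)
print(released)
```

The burglary in report 2019-001 disappears from the released data, which is why two agencies describing the same events can publish different counts.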

Some agencies update feeds hourly, most daily or weekly, and some only monthly. The formats range from machine readable to PDF to hard copies. And the method of delivery can be anything from an API or FTP to faxing or snail mailing.

For example, if a researcher wants a comprehensive RMS and/or CAD dataset in Cook County, the task is difficult. There are more than 20 police agencies - Chicago PD, Cook County Sheriff, Evanston PD, Schaumburg PD, Oak Park PD, Arlington Heights PD, and Skokie PD, to name a few. There are also cross jurisdictional police agencies like the state police, transportation authorities, and university police agencies. Each agency releases data in a different format, at different rates. Some make the data available for free, others charge.

If you are trying to look at crime on a nationwide scale, the task is even more daunting.

A standard for this basic data - like the SpotCrime Open Crime Standard (SOCS) - would help streamline and standardize this data, allowing cross jurisdictional comparisons to become more accurate.
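As a rough illustration of what a shared standard saves, the sketch below maps two made-up agency exports onto one common record with a crime type, an address, and a date and time. The field names, the input formats, and the target schema are all hypothetical; this is not the SOCS specification itself.

```python
from datetime import datetime

def from_agency_a(row):
    """Hypothetical agency A export: US-style date and time in one field."""
    return {
        "type": row["offense"].strip().title(),
        "address": row["block"].strip().upper(),
        "datetime": datetime.strptime(row["occurred"], "%m/%d/%Y %H:%M").isoformat(),
    }

def from_agency_b(row):
    """Hypothetical agency B export: ISO date and 24-hour time in separate fields."""
    stamp = datetime.strptime(row["Date"] + " " + row["Time"], "%Y-%m-%d %H%M")
    return {
        "type": row["CrimeType"].strip().title(),
        "address": row["Location"].strip().upper(),
        "datetime": stamp.isoformat(),
    }

a = from_agency_a({"offense": "THEFT", "block": "100 Block Main St", "occurred": "03/14/2019 21:05"})
b = from_agency_b({"CrimeType": "theft", "Location": "Main St & 1st Ave", "Date": "2019-03-14", "Time": "2105"})
print(a)
print(b)
```

Without a standard, every additional agency means another converter like these to write and maintain. With one, the converters disappear.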

Location, location, location

Location information associated with this data enhances the ability to identify problem areas and target scarce resources more efficiently. Location is important to police agencies - look at the importance police are placing on security video devices like Ring, crime prediction algorithms like PredPol, or gunshot detection devices like ShotSpotter.

A majority of police agencies release data to the block level address. Only a handful release latitude/longitude coordinates in addition to street level addresses.

Providing no coordinates adds yet another expense for researchers or civic hackers trying to geolocate the data - the cost of geocoding large datasets has increased tenfold in the past few years.

Even worse, a few agencies have moved from releasing data at the street address level to only providing the intersection or just street names. For example, Detroit recently moved from street number addresses to intersections. The explanation given was that reporters were re-identifying victims. However, what really happened was that Detroit PD took hours to respond to 911 calls. A reporter used the 911 calls for service log to try to connect with residents to learn about first hand experiences of 911 callers who waited more than 9 hours to get a police response. This is the level of accountability the community should have with police agencies, but now it will be much harder for the public to assess 911 response times in Detroit.

San Francisco Police have decided to publish crime locations by intersection, citing privacy concerns. With an estimated population density of 18,500 people per square mile, moving incidents to the intersection level renders the data almost useless.

Detroit PD’s and SFPD’s privacy concerns are both in good faith; however, withholding address level locations makes for a bad tradeoff.

Imagine trying to pinpoint where on a highway a string of carjackings has occurred without latitude/longitude coordinates, a block level address, or even an intersection. Any homes or businesses at intersections are going to be scrutinized as high crime businesses and residences. A corner store could potentially show a 200% increase in thefts. The farther the point moves from the actual crime location, the less useful the data becomes - especially for mapping and neighborhood alerting.
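The corner store effect is simple arithmetic. The sketch below snaps hypothetical theft locations along a single street to the nearest intersection. The 600 foot block length and the incident positions are assumptions chosen to reproduce a 200% increase, not real data.

```python
from collections import Counter

def snap_to_nearest_intersection(position_ft, block_length_ft=600):
    """Move a point on a street to the nearer end of its block (assumed block length)."""
    block_start = (position_ft // block_length_ft) * block_length_ft
    offset = position_ft - block_start
    if offset < block_length_ft / 2:
        return block_start
    return block_start + block_length_ft

# Hypothetical thefts, in feet along the street; the corner store sits at 0.
thefts_ft = [0, 120, 250, 580]

true_at_corner = sum(1 for p in thefts_ft if p == 0)
snapped = Counter(snap_to_nearest_intersection(p) for p in thefts_ft)
reported_at_corner = snapped[0]

increase_pct = (reported_at_corner - true_at_corner) / true_at_corner * 100
print(f"true: {true_at_corner}, reported: {reported_at_corner}, increase: {increase_pct:.0f}%")
```

One actual theft at the corner becomes three reported thefts once the two mid-block incidents are moved to the intersection.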

There are no documented examples of, or evidence for, RMS/CAD crime data being used for revictimization or invasion of privacy. Instead, what has been documented is that residents get upset when a crime is pinpointed to their house when it actually happened down the street. It is important to note that websites such as SpotCrime have been at this for 12 years and have found no concrete examples of re-identification with this kind of data.

What can be done

Until this type of crime data is democratized, it will continue to be costly and time consuming to compile.

Opening up RMS and CAD data to the public, press, researchers, and civic hackers alike not only promotes public safety and police transparency, but also leads to innovation and better accountability, which in turn helps solve and prevent crimes. Additionally, making this data easily accessible will encourage standardization across jurisdictional lines, making the data better and more useful for all.

We at SpotCrime implore you to ask your cities and your police agencies to embrace open data. Ask your university to help - partner with local agencies to help them embrace and open up the data. Work toward applying standards, like SOCS (or something better!), to this data.

About SpotCrime

SpotCrime is a public facing crime mapping and alert website collecting public location-based crime data and delivering alerts for free through a multitude of platforms. We are not a vendor, and instead operate as an independent news agency that deals solely with crime information. We have never received or accepted funds from any government agency. Last year alone we delivered over 300 million email alerts to the public.

In addition to mapping and alerts, SpotCrime advocates for open, equal, and fair access to crime information. We have a ‘do no harm’ approach. Whenever we obtain crime data from police agencies, we ask that they share the same file publicly to their website and with anyone who asks.

We are recognized as a GovTech100 company and an OpenData500 company, and our open crime data standard (SOCS) has been recognized by the Johns Hopkins Innovation Hub. We’ve also provided testimony on legislation related to access to public data in states such as Maryland and Kansas.

SpotCrime - The Public's Crime Map