Crime across time: date and time in crime data feeds

Date and time are important when displaying crime data publicly for an obvious reason: the public needs to know when a crime occurred to stay informed. Date and time are also essential for any type of crime analysis over time.

There are dozens of different ways to display a date and time, and every police agency handles this differently in its crime data feed. For example, some agencies put date and time in the same data field, while others break them into separate fields. Some include time, some don’t. Some write out the month, others use numbers. Some note the time zone, others use Unix time. Representing time is sometimes a minor headache, but not necessarily a difficult task.

The SpotCrime Open Crime data Standard (SOCS) addresses the date/time issue in open crime data by specifying that date and time be split into two separate data fields. The date uses the ISO 8601 YYYY-MM-DD format, and the time is given in UTC on a 24-hour clock.
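To make the split concrete, here is a minimal Python sketch of producing the two fields from a single timestamp. The field names "date" and "time", the helper name to_socs_fields, and the seconds-level precision are assumptions for illustration; they are not taken from the SOCS text.

```python
from datetime import datetime, timezone

def to_socs_fields(moment: datetime) -> dict:
    """Split a timezone-aware datetime into separate UTC date and time strings.

    The keys "date" and "time" are hypothetical field names.
    """
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    utc = moment.astimezone(timezone.utc)
    return {
        "date": utc.strftime("%Y-%m-%d"),  # ISO 8601 calendar date
        "time": utc.strftime("%H:%M:%S"),  # 24-hour clock, in UTC
    }

example = to_socs_fields(datetime(2021, 2, 8, 16, 29, 4, tzinfo=timezone.utc))
print(example)  # {'date': '2021-02-08', 'time': '16:29:04'}
```

Rejecting naive datetimes is a deliberate choice in this sketch: a timestamp with no zone attached cannot be converted to UTC without guessing.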

Coordinated Universal Time (UTC) is the primary time standard by which the world regulates clocks and time. It is a coordinated time scale maintained by the Bureau International des Poids et Mesures (BIPM).

GMT is often confused or used interchangeably with UTC. They are not the same: GMT is a time zone, and UTC is a time standard. In everyday use, however, UTC and GMT represent the same time; for example, 4:30 PM Monday GMT is 4:30 PM Monday UTC.

time zones of the world
US time zones are commonly described as offsets from Greenwich Mean Time (GMT), which is based on solar time as read from the Prime Meridian in Greenwich, London, England. GMT itself does not change; what changes with your location on Earth (your longitude) is your local offset from it. When it is
3:30pm December 31, 2020 in London (GMT), it is also
10:30am December 31, 2020 in New York City (EST, GMT-5),
7:30am December 31, 2020 in San Francisco (PST, GMT-8), and
12:30am January 1, 2021 in Tokyo (JST, GMT+9).
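The same conversion can be sketched in Python. This example assumes fixed offsets (EST, PST, JST as listed above) and ignores daylight saving time; a real feed would more likely use named IANA zones such as those in Python's zoneinfo module.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets from GMT/UTC; an assumption that ignores daylight saving time.
GMT = timezone.utc
EST = timezone(timedelta(hours=-5), "EST")
PST = timezone(timedelta(hours=-8), "PST")
JST = timezone(timedelta(hours=9), "JST")

london = datetime(2020, 12, 31, 15, 30, tzinfo=GMT)

# One moment in time, rendered in three local zones.
local_times = [london.astimezone(zone).strftime("%Y-%m-%d %H:%M %Z")
               for zone in (EST, PST, JST)]
for line in local_times:
    print(line)
# 2020-12-31 10:30 EST
# 2020-12-31 07:30 PST
# 2021-01-01 00:30 JST
```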

Unix time is a system widely used by operating systems and file formats to describe a point in time. It assigns a single number to a specific date and time by counting the seconds that have elapsed since 00:00:00 UTC on January 1, 1970 (the Unix epoch). Every second of the day maps to a different Unix number, which makes dates easier for computers to store and manipulate than conventional date formats.
Unix: 1612801744
UTC: Monday, February 8, 2021 4:29:04 PM
GMT: Monday, February 8, 2021 4:29:04 PM
EST: Monday, February 8, 2021 11:29:04 AM (GMT-5)
JST: Tuesday, February 9, 2021 1:29:04 AM (GMT+9)
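For readers who would rather not use a web converter, a few lines of Python do the same job. This is a minimal sketch using the example value above; the fixed EST offset is an assumption that ignores daylight saving time.

```python
from datetime import datetime, timedelta, timezone

stamp = 1612801744  # seconds since the Unix epoch

# Interpret the number as a moment in UTC, then view it in another zone.
utc_moment = datetime.fromtimestamp(stamp, tz=timezone.utc)
est_moment = utc_moment.astimezone(timezone(timedelta(hours=-5), "EST"))

print(utc_moment.isoformat())  # 2021-02-08T16:29:04+00:00
print(est_moment.isoformat())  # 2021-02-08T11:29:04-05:00

# Going back the other way recovers the original number.
round_trip = int(utc_moment.timestamp())
print(round_trip)  # 1612801744
```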

The Albuquerque, NM open crime data feed displays the date and time as raw Unix time, without converting it to a readable UTC value or to the Albuquerque time zone (MST). This isn’t wrong! But it is a problem when a regular human tries to read the data themselves. They would have to either convert everything by hand (via a converter like https://www.epochconverter.com/) or write code to convert the data for them.

What we’ve also seen in a few data feeds is that the date/time is given in UTC rather than the local time zone, with no documentation saying so. (We can sometimes tell because the dates appear to be in the future.)
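The "dates in the future" clue can be turned into a rough check. The sketch below is a heuristic under stated assumptions, not a description of how SpotCrime does it: the helper name looks_like_unlabeled_utc is hypothetical, the feed's timestamps are assumed to arrive without any zone attached, and the local zone is modeled as a fixed offset.

```python
from datetime import datetime, timedelta, timezone

def looks_like_unlabeled_utc(naive_stamps, local_offset, now_utc):
    """Return True if any naive timestamp, read as local time, is after now.

    A crime report cannot come from the future, so a future local time
    suggests the value was really recorded in UTC (or another zone).
    """
    local_zone = timezone(local_offset)
    return any(s.replace(tzinfo=local_zone) > now_utc for s in naive_stamps)

# Hypothetical feed fetched at 17:00 UTC (12:00 noon EST) on Feb 8, 2021.
now_utc = datetime(2021, 2, 8, 17, 0, tzinfo=timezone.utc)
feed = [datetime(2021, 2, 8, 16, 29, 4)]  # no time zone attached

suspicious = looks_like_unlabeled_utc(feed, timedelta(hours=-5), now_utc)
print(suspicious)  # True: 4:29 PM EST has not happened yet at noon EST
```

The check only catches recent records in zones behind UTC, so a False result does not prove the feed is in local time.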

Another issue we’ve come across is how the date/time fields are structured. Programming languages have built-in data structures, but these often differ from one language to another. Most languages (Python, R, JavaScript) treat time as a specific data structure as opposed to regular text.

This doesn’t make things too messy, because either form lets the end user see a date and time. The problem presents itself when the user attempts to sort the data by date and time.

If the date is treated as text, it gets sorted alphabetically or character by character instead of chronologically. Sorting month names as text in ascending order gives:
April, August, February, January, July, June, March, May
and sorting MM/DD/YY dates as text in descending order gives:
01/21/21, 01/21/20, 01/20/21, 01/20/20, etc.
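The difference is easy to reproduce. The following Python sketch sorts the same MM/DD/YY strings twice, once as plain text and once after parsing them into real dates; the sample values are illustrative only.

```python
from datetime import datetime

raw = ["01/20/21", "01/21/20", "01/21/21", "01/20/20"]

# Sorted as text: compared character by character, so years get interleaved.
as_text = sorted(raw, reverse=True)

# Sorted as dates: each string is parsed first, so the order is chronological.
as_dates = sorted(raw, key=lambda s: datetime.strptime(s, "%m/%d/%y"), reverse=True)

print(as_text)   # ['01/21/21', '01/21/20', '01/20/21', '01/20/20']
print(as_dates)  # ['01/21/21', '01/20/21', '01/21/20', '01/20/20']
```

This is also one practical argument for the YYYY-MM-DD format: with the year first, text order and chronological order agree.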

We try to notify police agencies when this happens, mainly because sorting by date can then make the data look out of date when it really isn’t!

SpotCrime determines what time format each agency delivers its data in, and we display the time in the local time zone of the area where the crime occurred. For example, if you are viewing a crime in Baltimore that is listed as occurring at 6pm, the time zone is Baltimore’s, so 6pm EST. If you are viewing a crime in San Francisco that is listed as occurring at 6pm, the time zone is San Francisco’s, so 6pm PST.

The more police agencies that publish their RMS/CAD data feeds publicly, the sooner these ‘kinks’ can be worked out. And following a data standard like SOCS can help agencies avoid these kinds of hiccups. Does your police agency publish crime data openly? Let us know!
