Provenance: Where did this data come from?
All data provided by OakCrime.org comes directly from OPD sources. From 2014, the bulk of the data was harvested daily from OPD's FTP site. In mid-2017, OakCrime.org switched to the application programming interface (API) provided by Oakland's primary public data repository. Because the OPD resources provide data for only the previous 90 days, OakCrime.org's retention of older incidents makes it the only source for historical data beyond 90 days.
Note that OPD publishes the following caution concerning addresses in these data sources:
Be advised that the exact address of each crime has been substituted with the block address to protect the privacy of the victim.
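The retention described above, where a rolling 90-day feed is folded into a longer-lived archive, might be sketched as follows. This is a minimal illustration only, not OakCrime.org's actual harvesting code: the function name, the record fields, and the sample incident numbers and descriptions are all made up for the example.

```python
from datetime import date


def merge_into_archive(archive, latest_window):
    """Fold the latest rolling window of incidents into a long-lived archive.

    Both arguments map a (hypothetical) incident number to a record dict.
    Records in the new window overwrite older copies of the same incident;
    incidents that have aged out of the window are kept untouched.
    """
    merged = dict(archive)        # copy so the caller's archive is not mutated
    merged.update(latest_window)  # newer copies win for shared incident numbers
    return merged


# Illustrative records only; these are not real OPD incidents.
archive = {
    "17-000001": {"date": date(2017, 1, 2), "desc": "EXAMPLE A"},
    "17-000002": {"date": date(2017, 1, 3), "desc": "EXAMPLE B"},
}
latest_window = {
    "17-000002": {"date": date(2017, 1, 3), "desc": "EXAMPLE B (revised)"},
    "17-040000": {"date": date(2017, 6, 1), "desc": "EXAMPLE C"},
}
archive = merge_into_archive(archive, latest_window)
```

Keying the archive on the incident number means that a record revised by OPD while still inside the 90-day window replaces its earlier copy, while anything older simply stays in place.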
In addition to this daily incident data, OPD provides daily "patrol logs" concerning major incidents (i.e., those considered to be "Part 1" crimes by the FBI; details here). These are PDF documents published via Box.com. The script that parses these PDFs to capture their data is posted as part of the OakCrime GitHub repository. Shared incident numbers (a.k.a. RD number, case number) are used to merge the additional data provided in the patrol logs into the daily incident logs.
Note that OPD publishes the following caution concerning patrol logs:
Information is still preliminary at this time the investigation continues. Until an investigator can review the incident details are subject to change. No further information is available. The Oakland Police Department is committed to transparency. However, a complete investigative process requires information be limited in order to maintain the integrity of the investigation. For this reason, only those preliminary details that do not compromise the investigation can be released at this time. Additional information will be released as soon as practical.
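The merge of patrol log data into daily incidents via shared incident numbers could look roughly like the sketch below. It is an assumption-laden illustration, not the script in the OakCrime repository: the function name and the field names (`incident_number`, `patrol_log`, `narrative`) are hypothetical.

```python
def merge_patrol_logs(incidents, patrol_entries):
    """Attach patrol-log entries to incidents that share an incident number.

    `incidents` and `patrol_entries` are lists of dicts, each carrying a
    hypothetical "incident_number" key. Returns new incident dicts with a
    "patrol_log" list (empty when no patrol log mentions the incident).
    """
    # Index patrol entries by incident number; one incident may appear
    # in more than one patrol log.
    by_number = {}
    for entry in patrol_entries:
        by_number.setdefault(entry["incident_number"], []).append(entry)

    merged = []
    for incident in incidents:
        combined = dict(incident)  # leave the input records unmodified
        combined["patrol_log"] = by_number.get(incident["incident_number"], [])
        merged.append(combined)
    return merged


# Illustrative records only.
incidents = [
    {"incident_number": "17-000010", "desc": "EXAMPLE A"},
    {"incident_number": "17-000011", "desc": "EXAMPLE B"},
]
patrol_entries = [
    {"incident_number": "17-000010", "narrative": "example narrative text"},
]
merged = merge_patrol_logs(incidents, patrol_entries)
```

Because patrol logs cover only major incidents, most daily incidents would end up with an empty `patrol_log` list under this scheme.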
Data from 2007-2014: Early collaboration with OPD and Oakland's IT Department provided a retrospective corpus of crime incident data covering 2007 to 2011. Further details regarding early stages of the OakCrime corpus construction are available here.
To augment the early primary data, several other sources have been used to add details about incidents, in particular the California Penal Code (PC) statute and Uniform Crime Reporting (UCR) code under which the incident was charged and reported. Early versions of the OakCrime data set relied on data from Urban Strategies (a non-profit that worked with OPD on a contract ending in 2011) to add some PC and UCR attributes. More recently, PC and UCR data have been found in OPD responses to public record requests (specifically, PRR#5885, PRR#6180, PRR#6933, PRR#7680, PRR#10437).
How is OakCrime different from CrimeMapping?
The CrimeMapping interface to OPD crime incident data is provided by The Omega Group (acquired by TriTech.com in February, 2016) as part of a suite of "CrimeView" analytic tools sold to "hundreds" of police departments across the country (their homepage used to list 370 agencies), including OPD. One product is designed for use by top-level police captains, a second version for operational use by crime analysts within OPD, another version for use by city council members and other trusted users, etc.
The citizen-facing CrimeMapping facility is a loss-leader product, provided at low cost to police departments that have been asked to give citizens a subset of the data used by the other Omega products. Because the Omega products have been designed for sale to many different police departments, the details of OPD's reporting are regularized to fit the demands of Omega's larger market. Their FAQ page says: "The content on CrimeMapping.com is a representation of crime and is not all-inclusive." That is, there is currently no real penalty to Omega for dropping or mis-reporting OPD's data. On two previous occasions (June, 2013, details reported here; and March, 2014, details reported here), a comparison was made between the data provided via the CrimeMapping facility and via OpenOakland's data source. Summarizing important differences:
CrimeMapping currently goes back six months; OakCrime data goes back to 2007.
As of the March, 2014 analysis, CrimeMapping is dropping 70% of the data OPD reports elsewhere!
OakCrime incidents have been classified into an OPDCrimeCat ontology that supports semantically useful hierarchical queries (see below).
In the data it does report, CrimeMapping also loses the crime-naming specifics that OPD provides and that the crime categories preserve.
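To illustrate what a hierarchical query over crime categories can mean in practice, here is a small sketch. It assumes, purely for the example, that each incident's category is stored as a tuple running from top-level class to sub-class; the category names and the `in_category` helper are hypothetical and are not taken from OPDCrimeCat itself.

```python
def in_category(incident_cat, query_cat):
    """Return True if incident_cat equals query_cat or lies beneath it.

    Categories are tuples such as ("ASSAULT", "BATTERY"); a query for
    ("ASSAULT",) matches every sub-class under that top-level class.
    """
    return incident_cat[:len(query_cat)] == query_cat


# Illustrative incidents with made-up category paths.
incidents = [
    {"id": "a", "cat": ("ASSAULT", "BATTERY")},
    {"id": "b", "cat": ("ASSAULT", "FIREARM")},
    {"id": "c", "cat": ("LARCENY", "THEFT")},
]

# Broad query: everything under the top-level class.
assault_ids = [i["id"] for i in incidents if in_category(i["cat"], ("ASSAULT",))]

# Narrow query: one specific sub-class.
battery_ids = [i["id"] for i in incidents
               if in_category(i["cat"], ("ASSAULT", "BATTERY"))]
```

The same predicate serves both levels of specificity, which is the practical benefit of keeping categories in a consistent hierarchy.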
How is the OPD Crime Classification useful?
A careful ontology of the varieties of crime generally, and those most relevant to the situation in Oakland, is an important project for future data analysis, but no such standard currently exists. Inspired by sources such as the International Classification Of Crime For Statistical Purposes (ICCS), a classification of OPD crime incidents called OPDCrimeCat has been developed.
OPD provides two text fields describing each incident: its CrimeType and its Description. The current OPD Crime Category system is an attempt to map all of the various CrimeType and Description values used over the years into a consistent hierarchy of 14 top-level crime classes and 57 more refined sub-classes that allow queries at varying levels of specificity.
There are typically about 40 different CrimeTypes and 250 Descriptions mentioned across any 90 days. In part because there is no support for data entry by officers, the input of the Description field can vary, e.g., from:
BATTERY ON PEACE OFFICER/EMERGENCY PERSONNEL/ETC, to:
BATTERY ON PEACE OFFICER/EMERGENCY PERSONAL/ETC W/O INJURY
A simple but transparent routine for classifying incidents into the OPDCrimeCat based on CrimeType and Description attributes (cf. harvestSocrata.classify()) has also been developed.
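A rule-based classifier of this general kind might be sketched as follows. This is not the code of harvestSocrata.classify(); the rule tables, the category names, and the "ASSAULT" CrimeType value are hypothetical and serve only to show how variant Description strings such as the two above can land in the same category.

```python
import re

# Hypothetical rules: a Description prefix mapped to a category path.
DESCRIPTION_PREFIX_RULES = [
    ("BATTERY ON PEACE OFFICER", ("ASSAULT", "BATTERY")),
]

# Hypothetical fallback: a CrimeType mapped to a top-level class only.
CRIMETYPE_FALLBACK = {
    "ASSAULT": ("ASSAULT",),
}


def normalize(text):
    """Upper-case the text and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text or "").strip().upper()


def classify(crime_type, description):
    """Return a category tuple, or None when no rule applies.

    Description rules are tried first because they are more specific;
    the CrimeType is used only as a coarse fallback.
    """
    desc = normalize(description)
    for prefix, category in DESCRIPTION_PREFIX_RULES:
        if desc.startswith(prefix):
            return category
    return CRIMETYPE_FALLBACK.get(normalize(crime_type))


cat_1 = classify("ASSAULT", "BATTERY ON PEACE OFFICER/EMERGENCY PERSONNEL/ETC")
cat_2 = classify("ASSAULT", "BATTERY ON PEACE OFFICER/EMERGENCY PERSONAL/ETC W/O INJURY")
```

Matching on a shared prefix rather than the full string is one simple way to tolerate spelling variation in the tail of a Description; incidents that match no rule return None so they can be reviewed rather than silently misfiled.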