CS 360 Midterm Project - Love the Data

Data and Processing

For our D3 visualizations, we use 27,768 rows, each of which denotes a call for service to the SF Fire Department during December 2016. Of the dataset's 34 columns, which include date/time, categorical, and location data, we use 4: Call Type Group, Zipcode of Incident, Received DtTm, and On Scene DtTm.

We also use the 2017 Population dataset from the San Francisco Health Improvement Partnership (SFHIP), last updated in January 2017, whose 27 rows give the population of each zip code in San Francisco. As shown on the geographic map below, we grouped the zip codes into 5 regions (Northeast, Northwest, Southeast, Southwest, and Treasure Island), which we use in the following visualizations. The population of each region, used in the stacked bar and area charts, was calculated by adding the populations of the zip codes in that region.
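The region totals described above can be sketched in a few lines of Python. This is only an illustration, not the project's own processing code: the function name, the zip-code labels, the region assignments, and the population figures are all placeholders invented for the example.

```python
from collections import defaultdict


def region_populations(zip_population, zip_to_region):
    """Sum zip-code populations into one total per region."""
    totals = defaultdict(int)
    for zip_code, population in zip_population.items():
        region = zip_to_region.get(zip_code)
        if region is None:
            continue  # skip zip codes that have no region assignment
        totals[region] += population
    return dict(totals)


# Placeholder inputs, made up for illustration (not the SFHIP figures
# and not the project's real zip-to-region assignment).
zip_population = {"zip_A": 1000, "zip_B": 2500, "zip_C": 4000}
zip_to_region = {"zip_A": "Northeast", "zip_B": "Northeast", "zip_C": "Southwest"}

region_pop = region_populations(zip_population, zip_to_region)
print(region_pop)
```

Keeping the zip-to-region assignment in a separate lookup table means the same table can be reused to tag each call record with its region.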

For the stacked bar chart, the number of calls per person by region was calculated by dividing the total number of calls of each type in each region (data from SFFD) by the population of that region (data from SFHIP). For the radar chart, the percentage of each call type for each region was calculated by dividing the number of calls of each type in the region by the total number of calls in that region. For the stacked area chart, the response duration is the difference between the time the call was received and the time the responders were on the scene. For the heat map, the hour is taken from each call's Received DtTm.
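As a rough sketch of the two ratios described above (calls per person for the stacked bar chart, and percentage share for the radar chart), the Python below works on a list of (region, call type) pairs. The function names and the sample records and populations are hypothetical and exist only to show the arithmetic.

```python
from collections import Counter


def calls_per_person(calls, region_population):
    """Calls of each type in each region divided by that region's population."""
    counts = Counter(calls)
    return {
        (region, call_type): n / region_population[region]
        for (region, call_type), n in counts.items()
    }


def call_type_share(calls):
    """Percentage of each call type among all calls in the same region."""
    counts = Counter(calls)
    region_totals = Counter(region for region, _ in calls)
    return {
        (region, call_type): 100 * n / region_totals[region]
        for (region, call_type), n in counts.items()
    }


# Made-up sample: one (region, call type) pair per call.
calls = [
    ("Northeast", "Fire"),
    ("Northeast", "Alarm"),
    ("Northeast", "Alarm"),
    ("Northeast", "Alarm"),
    ("Southwest", "Fire"),
]
population = {"Northeast": 100, "Southwest": 50}  # made-up populations

per_person = calls_per_person(calls, population)
share = call_type_share(calls)
```

The two measures use different denominators: the first divides by the region's population, the second by the region's total call count, so the shares within one region always sum to 100.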

Geographic Map of Zip Codes by Region

This geomap is created from the two datasets listed below and is implemented in D3 and TopoJSON. The reasons for choosing the TopoJSON format for the map are explained at this link: Converting Data. If you are interested in making a geomap from the command line, please follow Mike Bostock's latest tutorial at this link: Command-Line Cartography

The dataset:

SF Fire Department Calls for Service

San Francisco Zip Codes

D3 implementation reference link:

Let's Make a Map by Mike Bostock

Choropleth

Calls per Capita by Type and Region

In this stacked bar chart, each bar shows one of the four Call Type Group categories (excluding Other, which has minimal data points), the five derived regions are encoded by color within each stack, and the height of each bar represents the number of calls per person.

Looking at the bars as a whole, actual fire calls across all of San Francisco did not reach even 1 incident per 100 people in this month. However, there were almost 9 potentially life-threatening incidents per 100 people to which the Fire Department had to respond. Looking at the colored stacks, we can see that the eastern side of the city, consisting of the northeastern and southeastern quadrants along with Treasure Island, experienced more incidents per person than the western side (northwest and southwest).

Used modified code from https://bl.ocks.org/mbostock/3886208.

Percent of Each Call Type by Region

Also see Radar Charts using row values

In these small-multiple radar charts, the five vertices represent the five Call Type Groups (including Other) in the original data, the color represents the five derived regions, and the size of the area shows the percentage of each call type in a particular region. You can view the percentages directly by hovering over the vertices on the charts that use row values.

From this visualization, it is clear that even though the total number of calls differs from region to region, the distribution of call types within each region is similar: about half of the calls are potentially life-threatening, about 20 percent are alarms, about 10 percent are non-life-threatening, and only a few percent are fire-related. Not surprisingly, this is consistent with what we concluded from the stacked bar chart earlier.

Stacked Area Chart: Calls Per Hundred Population by Call Duration and Region

In this stacked area chart, the five derived regions are encoded by color in each stack. The height of each region represents the number of calls per 100 persons during December 2016, which was calculated by dividing the total number of calls in each region (data from SFFD) by the population of that region (data from SFHIP) and multiplying the value by 100. The x-axis shows the response duration, measured as the difference between the Received DtTm and On Scene DtTm timestamps. Although some records exceed 20 minutes, 99% of the calls were responded to in under 20 minutes, so we dropped the records beyond that range.
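The duration calculation and the 20-minute cutoff described above might look like the following Python sketch. Several details are assumptions made for the example rather than facts from the project: the timestamp format string, grouping durations into whole-minute bins, skipping records that have no on-scene time or a negative duration, and all sample records and populations.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp layout, e.g. "12/01/2016 12:02:30 AM"; adjust to the real data.
TIMESTAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"
MAX_MINUTES = 20  # records with longer responses are dropped


def response_minutes(received, on_scene):
    """Minutes between the call being received and responders arriving."""
    start = datetime.strptime(received, TIMESTAMP_FORMAT)
    end = datetime.strptime(on_scene, TIMESTAMP_FORMAT)
    return (end - start).total_seconds() / 60


def calls_per_100_by_minute(records, region_population):
    """records: iterable of (region, received, on_scene) tuples.

    Returns calls per 100 residents keyed by (region, whole-minute bin).
    """
    counts = Counter()
    for region, received, on_scene in records:
        if not on_scene:
            continue  # assumption: ignore calls with no on-scene time
        minutes = response_minutes(received, on_scene)
        if minutes < 0 or minutes > MAX_MINUTES:
            continue  # drop bad timestamps and responses over the cutoff
        counts[(region, int(minutes))] += 1
    return {
        (region, minute): 100 * n / region_population[region]
        for (region, minute), n in counts.items()
    }


# Made-up sample records and populations.
records = [
    ("Northeast", "12/01/2016 12:00:00 AM", "12/01/2016 12:02:30 AM"),
    ("Northeast", "12/01/2016 01:00:00 PM", "12/01/2016 01:02:10 PM"),
    ("Northeast", "12/01/2016 11:50:00 PM", "12/02/2016 12:15:00 AM"),  # 25 min
    ("Southwest", "12/01/2016 08:00:00 AM", ""),  # never on scene
]
population = {"Northeast": 200, "Southwest": 100}

area_data = calls_per_100_by_minute(records, population)
```

In this sample, the two short Northeast responses fall in the 2-minute bin, while the 25-minute response and the record without an on-scene time are excluded.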

From the visualization, we can observe that the majority of the calls were responded to within 6 minutes, with the largest number responded to in 2 minutes. Responses on Treasure Island generally took longer than in the four contiguous regions, which might be because it is located away from the mainland, so it takes the fire department longer to reach the site. Meanwhile, the NW region accounts for the fewest calls per hundred population, while the NE region accounts for the most.

Used modified code from https://bl.ocks.org/mbostock/3885211.

Number of Calls per Person by Hours

In this heat map, the regions are shown in the rows and the hours of the day are shown in the columns. The darkness of the color represents the value of the danger index, which is the scaled ratio of calls per person calculated by the formula: new value = min + (max-min)/(largest-smallest) * (original value - smallest), from https://usf-cs360-2017.github.io/homework2-ngchwanlii/. The red slider bar below shows the danger index percentage, which is categorized as Super Safe at 0%, Moderate at 20%, Considerable at 50%, High at 80%, and Extremely Dangerous at 100%. The General Information table summarizes the ratio of calls per person for each region. The table row highlighted in red shows the region with the highest danger index, while the green rows represent the regions that are safer than the others. The final row of the table shows the danger index for San Francisco as a whole.
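The scaling formula above is ordinary min-max rescaling. A minimal Python sketch follows; the function name, the 0 to 100 target range (chosen here to match the percentage scale of the slider), the handling of the case where all values are equal, and the sample ratios are assumptions made for illustration.

```python
def rescale(values, new_min=0.0, new_max=100.0):
    """Min-max rescaling:
    new = new_min + (new_max - new_min) / (largest - smallest) * (v - smallest)
    """
    smallest, largest = min(values), max(values)
    if largest == smallest:
        # Assumption: with no spread in the data, map everything to new_min.
        return [new_min for _ in values]
    factor = (new_max - new_min) / (largest - smallest)
    return [new_min + factor * (v - smallest) for v in values]


# Made-up calls-per-person ratios for three hours of one region.
ratios = [2, 4, 6]
danger_index = rescale(ratios)
print(danger_index)  # [0.0, 50.0, 100.0]
```

The smallest ratio always maps to the bottom of the range and the largest to the top, so the index expresses each cell relative to the other cells that were rescaled together.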

From this heatmap, we can see which hours have a higher ratio of calls per person (danger index) for the different categories (Regions / Neighborhoods). Switching between regions and neighborhoods shows that the higher ratios occur between 9am and 8pm, with the peak between 4pm and 6pm. It is interesting that even though Treasure Island has a low population overall, it has a high ratio of calls per person (danger index).

Interactivity:

You can click on all categories to see the pattern across all the regions.

You can also click on the drop down menu to filter the different datasets based on the derived SF Regions or Top 5 Neighborhoods by population.

You can also hover on each cell to show the danger index of that region at that hour on the danger index table that appears below.

The dataset:

SF Fire Department Calls for Service

Analysis Neighborhoods

D3 heatmap implementation reference link:

Day / Hour Heatmap
