Visualizing social distancing data in Brazilian states with better and worse education using Python

Photo by Markus Spiske from Pexels


At first, asking whether education affects social distancing may seem like nonsense. But poor knowledge of how viruses work and the spread of fake news make people care less about preventive measures, so states that offer better education may have a more aware population.

Of course, things aren’t so simple: many other factors influence how many people stay at home, and some of them are not even optional. That said, let’s go to the data.


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
from matplotlib.collections import LineCollection
data = pd.read_csv("")

After that, let’s clean the data so that it is in the ideal format for our analysis. We’ll select the columns that might be useful, set the date as the index, keep only the data for Brazilian states, clean up the state names, group everything by date and state, and compute the 7-day moving average.

data_br = data.loc[data.country_region == "Brazil", :].iloc[:, [1, 2, 3, 7, 8, 9, 10, 11, 12, 13]].copy()
data_br.columns = ["country", "state", "city", "date", "retail", "grocery", "parks", "transit", "workplaces", "residential"]
data_br.date = pd.to_datetime(data_br.date)
data_br.index = data_br.date
data_br.drop(labels = "date", axis = 1, inplace = True)
data_br.state = data_br.state.str.replace("State of ", "")
data_br.state = data_br.state.str.replace("Federal District", "Distrito Federal")
data_br_state = data_br.loc[~data_br.state.isnull() & data_br.city.isnull()].copy()  # assumed second condition: keep state-level rows (state set, no city)
data_br_state = data_br_state.loc['2020-02-24':'2020-08-16', :]
mobility = data_br_state.groupby(by=[data_br_state.index, "state"]).mean(numeric_only=True).unstack()["retail"].rolling(window=7).mean()
mobility = mobility.loc['2020-03-01':, :]

Now, we have the data from March 1st to August 16th. Initially, we selected the data from February 24th, but that was only to calculate the moving average of March 1. We’ll use the data from the retail column, which includes mobility trends for places like restaurants, cafes, shopping centers, theme parks, museums, libraries, and movie theaters.
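
To make the role of the extra week concrete, here is a minimal sketch of a 7-day rolling mean on a synthetic series. The values are made up, and the names `series`, `rolling` and `trimmed` are only for illustration. It shows that the first six days have no complete window, so March 1 is the first date with a valid average.

```python
import numpy as np
import pandas as pd

# Synthetic daily series (made-up values), starting one week before March 1.
dates = pd.date_range("2020-02-24", "2020-03-07", freq="D")
series = pd.Series(np.arange(len(dates), dtype=float), index=dates)

# 7-day moving average: a value is produced only once seven days are available.
rolling = series.rolling(window=7).mean()

# February 24 to 29 are NaN; March 1 is the first day with a complete window.
first_valid = rolling.first_valid_index()

# Dropping the warm-up week leaves a series with no missing values.
trimmed = rolling.loc["2020-03-01":]
print(first_valid.date(), len(trimmed))
```

Starting the selection on March 1 instead would have pushed the first valid average to March 7.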

For each state, we want to visualize the mobility trend over time, compare it with the other states, and show the variation between the peak of isolation and the current level. To begin, we’ll create the chart for a single state and then make the adaptations needed so that all the charts are generated together. See the result below:

Chart with data from Rio Grande do Norte

First, we need to create two auxiliary functions: get_lowest and create_segments. The former returns the position and value of the lowest point of a series (the day of greatest distancing); the latter divides the line into one-day pieces. We’ll use those pieces to build the gradient of the highlighted line, applying one color to each segment based on its y-value.

def get_lowest(column):
    return (column.idxmin(), column.min())

def create_segments(x, y):
    pairs = np.stack((x, y), axis=-1)
    return [np.array([pairs[i], pairs[i+1]]) for i in range(len(pairs) - 1)]
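
As a quick check of what these helpers return, the sketch below runs them on four made-up points. The functions are repeated so the snippet runs on its own; the values in `y` are invented for illustration.

```python
import numpy as np
import pandas as pd

def get_lowest(column):
    # Index label and value of the minimum of a pandas Series.
    return (column.idxmin(), column.min())

def create_segments(x, y):
    # Pair each x with its y, then join consecutive points into 2-point pieces.
    pairs = np.stack((x, y), axis=-1)
    return [np.array([pairs[i], pairs[i + 1]]) for i in range(len(pairs) - 1)]

x = np.arange(4)                             # day numbers 0..3
y = np.array([-10.0, -40.0, -55.0, -30.0])   # made-up mobility values

segs = create_segments(x, y)
low = get_lowest(pd.Series(y, index=x))

print(len(segs))   # one segment fewer than there are points
print(segs[0])     # the two endpoints of the first one-day piece
print(low)         # (day of the minimum, minimum value)
```

Each segment is a 2x2 array of `(x, y)` endpoints, which is the shape `LineCollection` expects for a list of line pieces.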

Now, we can create the chart for one state (Rio Grande do Norte, in the example) with the following code:

fig, ax = plt.subplots(figsize=(10,8))

mob = mobility.copy()
mob.index = np.arange(len(mobility.index))
# All states in light grey, plus a thick white line under the highlighted state
mob.plot(legend=False, color="grey", linewidth=1, alpha=0.2, ax=ax)
mob['Rio Grande do Norte'].plot(legend=False, color='white', linewidth=6, ax=ax)

x = mob.index
y = np.array(mob['Rio Grande do Norte'])
segs = create_segments(x, y)
norm = plt.Normalize(y.min(), y.max())

cdict = {
    'red': [[0.0, 0.0, 0.10196078431372549], [1.0, 0.054901960784313725, 0.0]],
    'green': [[0.0, 0.0, 0.4588235294117647], [1.0, 0.8352941176470589, 0.0]],
    'blue': [[0.0, 0.0, 0.6941176470588235], [1.0, 0.8862745098039215, 0.0]]
}
blue_fade = mcolors.LinearSegmentedColormap('Blue Fade', cdict)
lc = LineCollection(segs, cmap=blue_fade, norm=norm, zorder=30, linewidth=3)
lc.set_array(y[:-1])  # color each segment by the y-value at its start
ax.add_collection(lc)  # without this call the gradient line is never drawn

ax.set_xlim(x.min(), x.max() + 10)
ax.set_ylim(-90, 10)
ax.set_xticks([x.min(), x.max()])
ax.set_xticklabels(['Mar 1', 'Aug 16'])

# Mark the lowest point and the last point of the highlighted line
low = get_lowest(mob['Rio Grande do Norte'])
end = (int(x[-1]), float(y[-1]))
plt.scatter([low[0], end[0]], [low[1], end[1]], s=150, zorder=31, color='white')
plt.scatter(low[0], low[1], s=100, zorder=31, color=blue_fade(norm(low[1])))
plt.scatter(end[0], end[1], s=100, zorder=31, color=blue_fade(norm(end[1])))
plt.plot([low[0], end[0]], [low[1], low[1]], '--', color='grey')

# Dashed arrow from the lowest level up to the current level
ax.annotate("",
            (end[0], int(end[1]) - 1.2),
            (end[0], low[1]),
            arrowprops=dict(arrowstyle="-|>, head_width=0.5, head_length=1",
                            linestyle='--', color='crimson', linewidth=1.75))
plt.text(low[0], low[1] - 2, f"{int(low[1])} ", verticalalignment='top', horizontalalignment='center', fontsize=14, fontweight='bold', color='black', zorder=32)
plt.text(end[0], end[1] + 2, f"{int(end[1])} ", horizontalalignment='center', fontsize=14, fontweight='bold', color='black', zorder=32)
plt.text(end[0], (end[1] - low[1]) / 2 + low[1], f"{int(end[1] / low[1] * 100)}% ", horizontalalignment='right', fontsize=14, fontweight='bold', color='crimson', zorder=32)
plt.grid(color='grey', linestyle='-', linewidth=1, axis='y', alpha=0.2)
ax.tick_params(axis='y', which='both', length=0)
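
The `cdict` passed to `LinearSegmentedColormap` can be hard to read at first. Each row is `[x, y0, y1]`: at position `x`, the channel takes the value `y0` when approached from below and `y1` when approached from above. With only two rows, the ramp runs from the `y1` of the first row to the `y0` of the last one. The sketch below checks the two ends of the colormap; the range `-80` to `-20` given to `Normalize` is a made-up stand-in for `y.min()` and `y.max()`.

```python
import matplotlib.colors as mcolors

cdict = {
    'red':   [[0.0, 0.0, 0.10196078431372549], [1.0, 0.054901960784313725, 0.0]],
    'green': [[0.0, 0.0, 0.4588235294117647],  [1.0, 0.8352941176470589, 0.0]],
    'blue':  [[0.0, 0.0, 0.6941176470588235],  [1.0, 0.8862745098039215, 0.0]],
}
blue_fade = mcolors.LinearSegmentedColormap('Blue Fade', cdict)

# Made-up range standing in for the lowest and highest mobility values.
norm = mcolors.Normalize(vmin=-80, vmax=-20)

# The lowest mobility (strongest distancing) maps to the darker blue,
# the highest mobility maps to the lighter cyan.
start_rgba = blue_fade(norm(-80))
end_rgba = blue_fade(norm(-20))
print(start_rgba)
print(end_rgba)
```

This is why the highlighted line gets darker as distancing increases and lighter as mobility recovers.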

Now that we’ve seen how to generate a chart for one state, let’s make a figure with charts for several states. But which ones? According to data from INEP, the five states with the best IDEB (Basic Education Development Index) for the last year of high school are Espírito Santo, Goiás, Paraná, São Paulo and Distrito Federal. On the other hand, the five states with the worst IDEB are Mato Grosso, Bahia, Rio Grande do Norte, Amapá and Pará.
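
One possible way to arrange these ten states is a 2x5 grid of subplots, with the best-ranked states on the top row and the worst-ranked on the bottom. The sketch below is not the code behind the final chart. It uses random made-up data in place of `mobility`, and `plot_state` is a hypothetical helper that draws only the grey background lines and a plain highlighted line, leaving out the gradient and annotations.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

best = ["Espírito Santo", "Goiás", "Paraná", "São Paulo", "Distrito Federal"]
worst = ["Mato Grosso", "Bahia", "Rio Grande do Norte", "Amapá", "Pará"]

# Stand-in for the `mobility` DataFrame: made-up values, one column per state.
rng = np.random.default_rng(0)
days = np.arange(169)  # March 1 to August 16, 2020, inclusive
mob = pd.DataFrame(
    {state: -60 + rng.normal(0, 1, len(days)).cumsum() for state in best + worst},
    index=days,
)

def plot_state(ax, mob, state):
    """Hypothetical helper: every state in grey, one state highlighted."""
    mob.plot(legend=False, color="grey", linewidth=1, alpha=0.2, ax=ax)
    mob[state].plot(legend=False, color="steelblue", linewidth=3, ax=ax)
    ax.set_title(state)
    ax.set_ylim(-90, 10)

# Top row: best IDEB; bottom row: worst IDEB.
fig, axes = plt.subplots(2, 5, figsize=(25, 10), sharex=True, sharey=True)
for ax, state in zip(axes.flat, best + worst):
    plot_state(ax, mob, state)
```

Sharing both axes keeps the panels directly comparable, which matters when the goal is to compare states rather than read each chart in isolation.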

Final result

You can find the code for the final chart in the repository linked at the end of this article.






