Exploring the address data

Exploration of the address dataset from BOSA. In this notebook we present the most common street names, the longest street names and the streets with the largest number of houses.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set()

Read the CSV file into memory. This assumes the file with all Belgian addresses is saved at the path below.

belgium = pd.read_csv('../data/belgium_addresses.csv')
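The cells further down only touch four columns, so the load can be narrowed to those. A minimal sketch, assuming the column names used later in this notebook (`streetname_nl`, `streetname_fr`, `postcode`, `house_number`) are the ones present in the file. The in-memory sample is made up and only stands in for `../data/belgium_addresses.csv`; reading `house_number` as text is also an assumption, chosen so that values such as `12A` survive.

```python
import io
import pandas as pd

# Hypothetical three-row sample standing in for ../data/belgium_addresses.csv
sample_csv = io.StringIO(
    "streetname_nl,streetname_fr,postcode,house_number,extra\n"
    "Kerkstraat,,9000,1,x\n"
    "Kerkstraat,,9000,12A,y\n"
    ",Rue de l'Eglise,4000,3,z\n"
)

columns = ['streetname_nl', 'streetname_fr', 'postcode', 'house_number']
sample = pd.read_csv(
    sample_csv,
    usecols=columns,               # skip columns the notebook never touches
    dtype={'house_number': str},   # keep values such as '12A' as text
)
print(sample.shape)
```

For the real file, the same `usecols` and `dtype` arguments can be passed to the `pd.read_csv` call with the path instead of the sample buffer.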

Most common street name

We can count how often a street name occurs across municipalities. The code below uses the postcode as a stand-in for the municipality and counts each street name once per postcode. Many of the most common French and Dutch names are simply translations of each other.
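The counting step is the same for both languages, so it can be wrapped in one helper. This is a sketch on made-up rows, not the notebook's own code: `count_street_names` is a hypothetical name, and it drops missing values explicitly before counting.

```python
import pandas as pd

def count_street_names(addresses, name_column, top=10):
    """Count in how many distinct postcodes each street name occurs."""
    # One row per (name, postcode) pair, so a street with many houses counts once
    pairs = addresses[[name_column, 'postcode']].dropna().drop_duplicates()
    counts = pairs.groupby(name_column).size().rename('count')
    return counts.sort_values(ascending=False).head(top)

# Made-up rows: Kerkstraat occurs in two postcodes, Molenstraat in one
toy = pd.DataFrame({
    'streetname_nl': ['Kerkstraat', 'Kerkstraat', 'Kerkstraat', 'Molenstraat'],
    'postcode': [9000, 9000, 2000, 9000],
})
print(count_street_names(toy, 'streetname_nl'))
```

Without the `drop_duplicates` step, Kerkstraat would be counted three times here, once per address row rather than once per postcode.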

Dutch names

# Selecting only Dutch streetnames
streets = belgium[['streetname_nl', 'postcode']].drop_duplicates()
# Grouping on name and counting
result = streets.groupby('streetname_nl').count().rename(columns={'postcode': 'count'}).sort_values(by='count', ascending=False).head(10)
result

# Plot the result
plt.figure(figsize=(18, 6))
sns.barplot(x='streetname_nl', y='count', data=result.reset_index())

[Figure: bar plot of the ten most common Dutch street names; x-axis streetname_nl, y-axis count]

French names

# Selecting only French streetnames
streets = belgium[['streetname_fr', 'postcode']].drop_duplicates()
# Grouping on name and counting
result = streets.groupby('streetname_fr').count().rename(columns={'postcode': 'count'}).sort_values(by='count', ascending=False).head(10)
result

# Plot the result
plt.figure(figsize=(18, 6))
sns.barplot(x='streetname_fr', y='count', data=result.reset_index())

[Figure: bar plot of the ten most common French street names; x-axis streetname_fr, y-axis count]

Longest street name

As we lack the geodata to compute the actual length of a street, we instead compare the lengths of the street names. The French names give some odd results, where comments seem to have been appended to the street name.
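The same ranking can also be expressed with pandas string methods instead of `map` and `sort`. A minimal sketch on made-up names; the `streetname` column label is chosen here for illustration only.

```python
import pandas as pd

# Made-up names, including a missing value and a duplicate
names = pd.Series(['Kerkstraat', 'Koningin Fabiolapark', None, 'Kerkstraat', 'Bredabaan'])

unique_names = names.dropna().drop_duplicates()
longest = (
    pd.DataFrame({'streetname': unique_names, 'length': unique_names.str.len()})
    .sort_values('length', ascending=False)   # longest names first
    .head(2)
)
print(longest)
```

Note that `str.len()` counts characters, so spaces, hyphens and any appended comment text all contribute to the length.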

Dutch names

# Select the Dutch names and map to length
long = list(map(lambda street: (len(street), street), belgium['streetname_nl'].dropna().unique()))
# Sort to get the highest length
long.sort(reverse=True)
result = pd.DataFrame(long[:10], columns=['length', 'streetname_nl'])

# Plot the results
plt.figure(figsize=(16, 10))
sns.barplot(y='streetname_nl', x='length', data=result.reset_index(), orient='h')

[Figure: horizontal bar plot of the ten longest Dutch street names; x-axis length, y-axis streetname_nl]

French names

# Select the French names and map to length
long = list(map(lambda street: (len(street), street), belgium['streetname_fr'].dropna().unique()))
# Sort to get the highest length
long.sort(reverse=True)
result = pd.DataFrame(long[:10], columns=['length', 'streetname_fr'])

# Plot the results
plt.figure(figsize=(16, 10))
sns.barplot(y='streetname_fr', x='length', data=result.reset_index(), orient='h')

[Figure: horizontal bar plot of the ten longest French street names; x-axis length, y-axis streetname_fr]

Most houses on a street

For both the Dutch and the French street names we can count the number of distinct house numbers on each street, where a street is identified by its name and postcode, and see which streets have the most houses.
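Grouping on the name alone would merge unrelated streets that share a name, which is why the postcode is part of the key. The sketch below shows the count on made-up rows, using `nunique` as an alternative to dropping duplicates first and then counting; it is an illustration, not the notebook's own cell.

```python
import pandas as pd

# Made-up rows: house number 1 on Bredabaan is listed twice
toy = pd.DataFrame({
    'streetname_nl': ['Bredabaan', 'Bredabaan', 'Bredabaan', 'Kerkstraat'],
    'postcode': [2930, 2930, 2930, 9000],
    'house_number': ['1', '1', '2', '5'],
})

houses = (
    toy.groupby(['streetname_nl', 'postcode'])['house_number']
    .nunique()                     # distinct house numbers per street and postcode
    .rename('count')
    .sort_values(ascending=False)
)
print(houses)
```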

Dutch streets

# Select the housenumbers of each street
streets = belgium[['streetname_nl', 'postcode', 'house_number']].drop_duplicates()
# Group and count
result = streets.groupby(['streetname_nl', 'postcode']).count().rename(columns={'house_number': 'count'}).sort_values(by='count', ascending=False).head(10)
result

# Plot the results
plt.figure(figsize=(16, 10))
sns.barplot(y='streetname_nl', x='count', data=result.reset_index(), orient='h')

[Figure: horizontal bar plot of the ten Dutch-named streets with the most house numbers; x-axis count, y-axis streetname_nl]

French streets

# Select the housenumbers of each street
streets = belgium[['streetname_fr', 'postcode', 'house_number']].drop_duplicates()
# Group and count
result = streets.groupby(['streetname_fr', 'postcode']).count().rename(columns={'house_number': 'count'}).sort_values(by='count', ascending=False).head(10)
result

# Plot the results
plt.figure(figsize=(16, 10))
sns.barplot(y='streetname_fr', x='count', data=result.reset_index(), orient='h')

[Figure: horizontal bar plot of the ten French-named streets with the most house numbers; x-axis count, y-axis streetname_fr]

Output: ten most common Dutch street names (number of postcodes per name)

streetname_nl	count
Kerkstraat	246
Molenstraat	244
Nieuwstraat	208
Schoolstraat	192
Veldstraat	184
Stationsstraat	175
Groenstraat	161
Bosstraat	146
Kloosterstraat	143
Kasteelstraat	133

Output: ten most common French street names (number of postcodes per name)

streetname_fr	count
Rue de l'Eglise	187
Rue du Moulin	163
Rue de la Station	133
Rue des Ecoles	128
Rue de la Chapelle	115
Rue du Centre	106
Rue de la Fontaine	106
Rue de la Gare	100
Rue du Château	97
Rue Haute	96

Output: ten Dutch-named streets with the most house numbers

streetname_nl	postcode	count
Bredabaan	2930	1046
Kortrijksesteenweg	9000	1013
Koningin Fabiolapark	9100	1000
Hundelgemsesteenweg	9820	997
Kikvorsstraat	9000	956
Zwijnaardsesteenweg	9000	936
Poseidonlaan	8420	920
Bergense Steenweg	1070	897
Alsembergsesteenweg	1180	884
Antwerpsesteenweg	9040	877

Output: ten French-named streets with the most house numbers

streetname_fr	postcode	count
Chaussée de Mons	1070	897
Chaussée d'Alsemberg	1180	884
Chaussée de Wavre	1160	759
Chaussée de Waterloo	1180	673
Rue de Visé	4020	657
Avenue du Champ de Bataille	7012	625
Chaussée de Haecht	1030	553
Avenue Eugène Mascaux	6001	546
Chaussée de Bruxelles	1410	518
Rue Saint-Léonard	4000	512