Land Data Analysis, an Example

by Jessica Chan

Import Libraries

In [1]:
import pandas as pd #Dataset lib
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
%matplotlib inline
import numpy as np

List all available chart styles

In [2]:
print(plt.style.available)
['bmh', 'classic', 'dark_background', 'fast', 'fivethirtyeight', 'ggplot', 'grayscale', 'seaborn-bright', 'seaborn-colorblind', 'seaborn-dark-palette', 'seaborn-dark', 'seaborn-darkgrid', 'seaborn-deep', 'seaborn-muted', 'seaborn-notebook', 'seaborn-paper', 'seaborn-pastel', 'seaborn-poster', 'seaborn-talk', 'seaborn-ticks', 'seaborn-white', 'seaborn-whitegrid', 'seaborn', 'Solarize_Light2', 'tableau-colorblind10', '_classic_test']

Download: https://data.gov.sg/dataset/land-area-and-dwelling-units-by-town
( Pandas DataFrame from csv file)

In [3]:
df = pd.read_csv("land-area-and-dwelling-units-by-town\land-area-and-dwelling-units-by-town.csv")

Data Wrangling Starts Here

List data types

In [4]:
df.dtypes 
Out[4]:
financial_year                        int64
town                                 object
total_land_area                      object
residential_land_area                 int64
dwelling_units_under_management       int64
projected_ultimate_dwelling_units     int64
dtype: object

Convert 'total_land_area' to numeric values and marked non-numeric entries, 'NaN'

In [5]:
df['total_land_area'] = pd.to_numeric(df['total_land_area'], errors='coerce')
df.dtypes
Out[5]:
financial_year                         int64
town                                  object
total_land_area                      float64
residential_land_area                  int64
dwelling_units_under_management        int64
projected_ultimate_dwelling_units      int64
dtype: object

Create DataFrames for selected towns

In [6]:
df_Yishun = df.loc[df['town'] == 'Yishun']
df_Sembawang = df.loc[df['town'] == 'Sembawang']
df_Woodlands = df.loc[df['town'] == 'Woodlands']
df_Tampines = df.loc[df['town'] == 'Tampines']
df_JurongEast = df.loc[df['town'] == 'Jurong East']
df_Woodlands #An example of DataFrame
Out[6]:
financial_year town total_land_area residential_land_area dwelling_units_under_management projected_ultimate_dwelling_units
21 2008 Woodlands 1198.0 525 58025 88000
45 2009 Woodlands 1198.0 525 58525 88000
69 2010 Woodlands 1198.0 525 58525 88000
93 2011 Woodlands 1198.0 525 59034 88000
117 2012 Woodlands 1198.0 525 60065 88000
141 2013 Woodlands 1198.0 480 61989 98000
165 2014 Woodlands 1198.0 480 62675 98000
189 2015 Woodlands 1198.0 480 63779 98000
213 2016 Woodlands 1260.0 486 64602 102000

An Example

In [7]:
df_Woodlands['total_land_area'].sum() #Total land area from 2008 to 2018
Out[7]:
10844.0

Start Plotting: (A) Line Graph

In [8]:
labels = df['financial_year'].unique() #Year list, no repeat. x-axis labels
In [9]:
plt.plot(df_Woodlands['total_land_area'].values,  color="green", linewidth=2.5, linestyle="-", label="Woodlands Land")
plt.plot(df_Woodlands['residential_land_area'].values,  color="green", linewidth=2.5, linestyle="-.", label="Woodlands Residential")

plt.plot(df_Sembawang['total_land_area'].values,  color="blue", linewidth=2.5, linestyle="-", label="Sembawang")
plt.plot(df_Sembawang['residential_land_area'].values,  color="blue", linewidth=2.5, linestyle="-.", label="Sembawang Residential")

plt.plot(df_Yishun['total_land_area'].values,  color="red", linewidth=2.5, linestyle="-", label="Yishun")
plt.plot(df_Yishun['residential_land_area'].values,  color="red", linewidth=2.5, linestyle="-.", label="Yishun Residential")

#############################################################
plt.xticks(np.arange(0, 9, step=1), labels)
plt.xlabel('Years')
plt.ylabel('Land Areas')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.title('Land Areas from 2008 To 2016 - North')

##########################################################
# Other alternative ways to create legends 
# plt.legend( ncol=3, loc='best')
#leg = plt.legend(loc='best', ncol=1, mode="none", shadow=True, fancybox=True)
#leg.get_frame().set_alpha(0.5)
#plt.legend(bbox_to_anchor=(0., 1.02, 1., .102), loc=3,
           #ncol=1, mode="expand", borderaxespad=0.)

plt.show()

Start Plotting: (B) Pie Chart

In [10]:
sizes = [df_Sembawang['total_land_area'].sum(), df_Woodlands['total_land_area'].sum(), df_Yishun['total_land_area'].sum()]
myColors = ['gold', 'yellowgreen', 'lightcoral']
explode = (0, 0.1, 0)
myLabels = ['Sembawang','Woodlands','Yishun'] #Just 3 towns that made up the northen cluster
In [11]:
plt.pie(sizes, explode=explode, labels=myLabels, colors=myColors, autopct='%1.0f%%', shadow=True, startangle=-45)
plt.axis('equal')
plt.title('Land Areas from 2008 To 2016 - North')
plt.show()

Start Plotting: (C) Bar Charts

In [12]:
df2 = df.loc[df['town'] != 'Other Estates'] # drop rows that matched, removed as it's not a town.
num = df2['town'].nunique() #number of towns

Saving the df2 to csv file

In [13]:
df2.to_csv("land-area-and-dwelling-units-by-town\land-area-and-dwelling-units-by-town-df2.csv")
In [14]:
twn = df2.iloc[0,1] # row at 0 and column positioned at 1
twn # see what we got
Out[14]:
'Ang Mo Kio'

Create 2 lists for labels and values. Fill the lists with extractions from the while loop

In [15]:
LArea_Town = [] # Town labels
L_Town = [] # Total land area for each town for the period of 2008 to 2016
i = 0
while i<num:
    twn = df2.iloc[i,1]
    LArea_Town.append(df2[df2['town']== twn].total_land_area.mean())
    L_Town.append(twn)
    i +=1
In [16]:
width = 0.5
plt.barh(L_Town,LArea_Town, width, align='center', alpha=0.5, color ='green')
#plt.xticks(x_pos, labels)
plt.xlabel('Area (Hectares)')
plt.title('Town Land Areas From 2008 to 2016')
plt.show()

Using the Pie Chart data creates a Vertical Bar Chart.

In [17]:
sizes2 = sizes
plt.bar(myLabels, sizes2, width, align='center', alpha=0.5, color ='gold')
plt.xlabel('Towns in the North')
plt.ylabel('Area (Hectares)')
plt.title('Land Areas From 2008 to 2016 - North')
plt.show()
In [18]:
#Annex: Land Area and Dwelling Units by Town (Download: https://data.gov.sg/dataset/land-area-and-dwelling-units-by-town)
df
Out[18]:
financial_year town total_land_area residential_land_area dwelling_units_under_management projected_ultimate_dwelling_units
0 2008 Ang Mo Kio 638.0 283 48069 58000
1 2008 Bedok 937.0 408 59929 74000
2 2008 Bishan 690.0 172 19367 32000
3 2008 Bukit Batok 785.0 291 31731 47000
4 2008 Bukit Merah 858.0 312 50247 68000
5 2008 Bukit Panjang 489.0 228 29498 43000
6 2008 Choa Chu Kang 583.0 307 39173 62000
7 2008 Clementi 408.0 198 23873 35000
8 2008 Geylang 678.0 214 30417 49000
9 2008 Hougang 1276.0 354 47819 68000
10 2008 Jurong East 384.0 165 22299 29000
11 2008 Jurong West 987.0 480 68758 92000
12 2008 Kallang/ Whampoa 799.0 200 34288 54000
13 2008 Pasir Ris 601.0 318 27514 44000
14 2008 Punggol 957.0 422 17103 96000
15 2008 Queenstown 687.0 210 29077 60000
16 2008 Sembawang 708.0 317 17664 60000
17 2008 Sengkang 1055.0 397 42570 90000
18 2008 Serangoon 737.0 156 21292 29000
19 2008 Tampines 1200.0 500 62099 83000
20 2008 Toa Payoh 463.0 210 35419 48000
21 2008 Woodlands 1198.0 525 58025 88000
22 2008 Yishun 810.0 439 46613 84000
23 2008 Other Estates NaN 126 21052 25000
24 2009 Ang Mo Kio 638.0 283 48068 58000
25 2009 Bedok 937.0 408 59201 74000
26 2009 Bishan 690.0 172 19367 32000
27 2009 Bukit Batok 785.0 291 31731 47000
28 2009 Bukit Merah 858.0 312 50247 68000
29 2009 Bukit Panjang 489.0 228 29498 43000
... ... ... ... ... ... ...
186 2015 Serangoon 737.0 163 21488 30000
187 2015 Tampines 1200.0 549 67060 110000
188 2015 Toa Payoh 556.0 248 36871 61000
189 2015 Woodlands 1198.0 480 63779 98000
190 2015 Yishun 778.0 398 57538 84000
191 2015 Other Estates NaN 126 22988 25000
192 2016 Ang Mo Kio 638.0 283 50025 58000
193 2016 Bedok 937.0 418 61100 79000
194 2016 Bishan 690.0 172 20072 34000
195 2016 Bukit Batok 785.0 291 33728 53000
196 2016 Bukit Merah 858.0 312 52401 68000
197 2016 Bukit Panjang 489.0 219 35325 44000
198 2016 Choa Chu Kang 583.0 307 48900 62000
199 2016 Clementi 412.0 203 26226 39000
200 2016 Geylang 678.0 214 30704 49000
201 2016 Hougang 1309.0 367 53862 72000
202 2016 Jurong East 384.0 165 23897 30000
203 2016 Jurong West 987.0 480 73510 94000
204 2016 Kallang/ Whampoa 799.0 210 39194 57000
205 2016 Pasir Ris 601.0 318 29654 44000
206 2016 Punggol 957.0 374 43385 96000
207 2016 Queenstown 694.0 210 31504 60000
208 2016 Sembawang 708.0 331 22872 65000
209 2016 Sengkang 1055.0 397 65981 92000
210 2016 Serangoon 737.0 163 21487 30000
211 2016 Tampines 1200.0 549 67911 110000
212 2016 Toa Payoh 556.0 248 37358 61000
213 2016 Woodlands 1260.0 486 64602 102000
214 2016 Yishun 778.0 398 60904 84000
215 2016 Other Estates NaN 126 22733 25000

216 rows × 6 columns