Home Dataplay Download And Load Merge Data Map Basics Intake An... Nb 2 Html Map Correlation Netw... Timelapse Data Gifs Retrieve Acs Data Pivot Table Sync Data

Don't Look! I'm changing!

URL Copied

 About This Tutorial:
  Objectives
 Background
  Datatypes And Geo-Data
  Coordinate Reference Systems (CRS)
  Coordinate Encoding
  Raster Vs Vector Data
   Vector Data
   Raster Data
 SETUP:
  Import Modules
 Retrieve GIS Data
  Approach 1: Reading In Data Directly
  Approach 2: Converting Pandas Into Geopandas
   Approach 2: Method 1: Convert Using A Pre-Formatted 'Geometry' Column
   Approach 2: Method 2: Convert Column(S) To Coordinate
   Approach 2: Method 3: Using A Crosswalk (Need Crosswalk On Esri)
  Approach 3: Connecting To A PostGIS Database
 Basic Operations
  Inspection
  Converting CRS
  Saving
  Draw Tool
  Geometric Manipulations
 Advanced
  Create Geospatial Functions
 Example: Using The Advanced Functions
  Playing With Points: Geoloom
   Points In Polygons
   Polygons In Points
   Folium

BinderBinderBinderOpen Source Love svg3

NPM LicenseActivePython VersionsGitHub last commit

GitHub starsGitHub watchersGitHub forksGitHub followers

TweetTwitter Follow

About this Tutorial:

In this notebook, the basics of working with geographic data are introduced.

Objectives

By the end of this tutorial users should have an understanding of:

Background

An expansive discursive on programming and cartography can be found here

Datatypes and Geo-data

Geographic data must be encoded properly order to attain the full potential of the spatial nature of your geographic data.

If you have read in a dataset using pandas it's data type will be a Dataframe.

It may be converted into a Geo-Dataframe using Geopandas as demonstrated in the sections below.

You can check a variables at any time using the dtype command:

undefined

Coordinate Reference Systems (CRS)

Make sure the appropriate spatial Coordinate Reference System (CRS) is used when reading in your data!

ala wiki:

A spatial reference system (SRS) or coordinate reference system (CRS) is a coordinate-based local, regional or global system used to locate geographical entities

CRS 4326 is the CRS most people are familar with when refering to latiude and longitudes.

Baltimore's 4326 CRS should be at (39.2, -76.6)

BNIA uses CRS 2248 internally

Additional Information: https://docs.qgis.org/testing/en/docs/gentle_gis_introduction/coordinate_reference_systems.html

Ensure your geodataframes' coordinates are using the same CRS using the geopandas command:

undefined

Coordinate Encoding

When first recieving a spatial dataset, the spatial column may need to be encoded to convert its 'text' data type values into understood 'coordinate' data types before it can be understood/processed accordingly.

Namely, there are two ways to encode text into coordinates:

  • df[geom] = df[geom].apply(lambda x: loads( str(x) ))

  • df[geom] = [Point(xy) for xy in zip(df.x, df.y)]

The first approach can be used for text taking the form "Point(-76, 39)" and will encode the text too coordinates. The second approach is useful when creating a point from two columns containing lat/lng information and will create Point coordinates from the two columns.

More on this later

Raster Vs Vector Data

There exists two types of Geospatial Data, Raster and Vector. Both have different file formats.

This lab will only cover vector data.

Vector Data

Vector Data: Individual points stored as (x,y) coordinates pairs. These points can be joined to create lines or polygons.

Format of Vector data

Esri Shapefile — .shp, .dbf, .shx Description - Industry standard, most widely used. The three files listed above are needed to make a shapefile. Additional file formats may be included.

Geographic JavaScript Object Notation — .geojson, .json Description — Second most popular, Geojson is typically used in web-based mapping used by storing the coordinates as JSON.

Geography Markup Language — .gml Description — Similar to Geojson, GML has more data for the same amount of information.

Google Keyhole Markup Language — .kml, .kmz Description — XML-based and predominantly used for google earth. KMZ is a the newer, zipped version of KML.

Raster Data

Raster Data: Cell-based data where each cell represent geographic information. An Aerial photograph is one such example where each pixel has a color value

Raster Data Files: GeoTIFF — .tif, .tiff, .ovr ERDAS Imagine — .img IDRISI Raster — .rst, .rdc

Information Sourced From: https://towardsdatascience.com/getting-started-with-geospatial-works-1f7b47955438

Vector Data: Census Geographic Data:

SETUP:

Import Modules

# @title Run: Import Modules
 
 # These imports will handle everything
 import os
 import sys
 import csv
 import numpy as np
 import pandas as pd
 import pyproj
 from pyproj import Proj, transform
 # conda install -c conda-forge proj4
 from shapely.geometry import LineString
 # from shapely import wkb
 # https://pypi.org/project/geopy/
 import folium
 
 # In case file is KML, enable support
 import fiona
 fiona.drvsupport.supported_drivers['kml'] = 'rw'
 fiona.drvsupport.supported_drivers['KML'] = 'rw'
 
 import psycopg2
import matplotlib.pyplot as plt
 import IPython
 from IPython.core.display import HTML
 
 import os 
 from branca.colormap import linear
import pandas as pd
 import geopandas as gpd
 from geopandas import GeoDataFrame
 from shapely.geometry import Point
 from shapely.wkt import loads
 from geopy.geocoders import Nominatim
 from IPython.display import clear_output
 from folium import plugins
 from folium.plugins import TimeSliderChoropleth 
 from folium.plugins import MarkerCluster
from dataplay.intaker import Intake
 from dataplay.acs import retrieveAcsData
 from dataplay.merge import mergeDatasets

Retrieve GIS Data

As mentioned earlier:

When you use a pandas function to 'read-in' a dataset, the returned value is of a datatype called a 'Dataframe'.

We need a 'Geo-Dataframe', however, to effectively work with spatial data.

While Pandas does not support Geo-Dataframes; Geo-pandas does.

Geopandas has everything you love about pandas, but with added support for geo-spatial data.

Principle benefits of using Geopandas over Pandas when working with spatial data:

There are many ways to have our spatial-data be read-in using geo-pandas into a geo-dataframe.

Namely, it means reading in Geo-Spatial-data from a:

  1. (.geojson or .shp) file directly using Geo-pandas

  2. (.csv, .json) file using Pandas and convert it to Geo-Pandas

  1. Connecting to a DB

We will review each one below

Approach 1: Reading in Data Directly

If you are using Geopandas, direct imports only work with geojson and shape files.

spatial coordinate data is properly encoded with these types of files soas to make them particularly easy to use.

You can perform this using geopandas' undefined function.

# This dataset is taken from the public database provided by BNIAJFI hosted by Esri / ArcGIS
 # BNIA ArcGIS Homepage: https://data-bniajfi.opendata.arcgis.com/
 csa_gdf = Intake.getData("https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Hhchpov/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson")

As you can see, the resultant variable is of type GeoDataFrame.

type(csa_gdf)
geopandas.geodataframe.GeoDataFrame

GeoDataFrames are only possible when one of the columns are of a 'Geometry' Datatype

csa_gdf.dtypes
OBJECTID int64 ,CSA2010 object ,hhchpov14 float64 ,hhchpov15 float64 ,hhchpov16 float64 ,hhchpov17 float64 ,hhchpov18 float64 ,hhchpov19 float64 ,CSA2020 object ,hhchpov20 float64 ,hhchpov21 float64 ,Shape__Area float64 ,Shape__Length float64 ,geometry geometry ,dtype: object

Awesome. So that means, now you can plot maps all prety like:

csa_gdf.plot(column='hhchpov15')
Image Alt Text

And now lets take a peak at the raw data:

csa_gdf.head(1)
OBJECTIDCSA2010hhchpov14hhchpov15hhchpov16hhchpov17hhchpov18hhchpov19CSA2020hhchpov20hhchpov21Shape__AreaShape__Lengthgeometry
01Allendale/Irvington/S. Hilton41.5538.9334.7332.7735.2732.6Allendale/Irvington/S. Hilton21.4221.426.38e+0738770.17POLYGON ((-76.65726 39.27600, -76.65726 39.276...

I'll show you more ways to save the data later, but for our example in the next section to work, we need a csv.

We can make one by saving the geo-dataframe avove using the undefined function.

The spatial data will be stored in an encoded form that will make it easy to re-open up in the future.

csa_gdf.to_csv('example.csv')

Approach 2: Converting Pandas into Geopandas

Approach 2: Method 1: Convert using a pre-formatted 'geometry' column

This approach loads a map using a geometry column

In our previous example, we saved a geo-dataframe as a csv.

Now lets re-open it up using pandas!

# A url to a public Dataset.
 url = "example.csv"
 geom = 'geometry'
 # An example of loading in an internal BNIA file
 crs = {'init' :'epsg:2248'} 
  
 # Read in the dataframe
 csa_gdf = intaker.Intake.getData(url)

Great!

But now what?

Well, for starters, regardless of the project you are working on: It's always a good idea to inspect your data.

This is particularly important if you don't know what you're working with.

csa_gdf.head(1)
Unnamed: 0OBJECTIDCSA2010hhchpov14hhchpov15hhchpov16hhchpov17hhchpov18hhchpov19CSA2020hhchpov20hhchpov21Shape__AreaShape__Lengthgeometry
001Allendale/Irvington/S. Hilton41.5538.9334.7332.7735.2732.6Allendale/Irvington/S. Hilton21.4221.426.38e+0738770.17POLYGON ((-76.657 39.276, -76.657 39.276, -76....

Take notice of how the geometry column has a special.. foramatting.

All spatial data must take on a similar form encoding for it to be properly interpretted as a spatial data-type.

As far as I can tell, This is near-identical to the table I printed out in our last example.

BUT WAIT!

You'll notice, that if I run the plot function a pretty map will not de-facto appear

csa_gdf.plot()
Image Alt Text

Why is this? Because you're not working with a geo-dataframe but just a dataframe!

Take a look:

type(csa_gdf)
geopandas.geodataframe.GeoDataFrame

Okay... So thats not right..

What can we do about this?

Well for one, our spatial data (in the geometry-column) is not of the right data-type even though it takes on the right form.

csa_gdf.dtypes
Unnamed: 0 int64 ,OBJECTID int64 ,CSA2010 object ,hhchpov14 float64 ,hhchpov15 float64 ,hhchpov16 float64 ,hhchpov17 float64 ,hhchpov18 float64 ,hhchpov19 float64 ,CSA2020 object ,hhchpov20 float64 ,hhchpov21 float64 ,Shape__Area float64 ,Shape__Length float64 ,geometry geometry ,dtype: object

Ok. So how do we change it? Well, since it's already been properly encoded...

You can convert a columns data-type from an object (or whatver else) to a 'geometry' using the undefined function.

In the example below, we convert the datatypes for all records in the 'geometry' column

# Convert the geometry column datatype from a string of text into a coordinate datatype
 csa_gdf[geom] = csa_gdf[geom].apply(lambda x: loads( str(x) ))

Thats all! Now lets see the geometry columns data-type and the entire tables's data-type

csa_gdf.dtypes
Unnamed: 0 int64 ,OBJECTID int64 ,CSA2010 object ,hhchpov14 float64 ,hhchpov15 float64 ,hhchpov16 float64 ,hhchpov17 float64 ,hhchpov18 float64 ,hhchpov19 float64 ,CSA2020 object ,hhchpov20 float64 ,hhchpov21 float64 ,Shape__Area float64 ,Shape__Length float64 ,geometry geometry ,dtype: object
type(csa_gdf)
geopandas.geodataframe.GeoDataFrame

As you can see, we have a geometry column of the right datatype, but our table is still only just a dataframe.

But now, you are ready to convert your entire pandas dataframe into a geo-dataframe.

You can do that by running the following function:

# Process the dataframe as a geodataframe with a known CRS and geom column
 csa_gdf = GeoDataFrame(csa_gdf, crs=crs, geometry=geom)

Aaaand BOOM.

csa_gdf.plot(column='hhchpov18')
Image Alt Text

goes the dy-no-mite

type(csa_gdf)
geopandas.geodataframe.GeoDataFrame

Approach 2: Method 2: Convert Column(s) to Coordinate

This is the generic example but it will not work since no URL is given.

# More Information: https://geopandas.readthedocs.io/en/latest/gallery/create_geopandas_from_pandas.html#from-longitudes-and-latitudes
  
 # If your data has coordinates in two columns run this cell
 # It will create a geometry column from the two.
 # A public dataset is not provided for this example and will not run.
  
 # Load DF HERE. Accidently deleted the link. Need to refind. 
 # Just rely on example 2 for now. 
 """
 exe_df['x'] = pd.to_numeric(exe_df['x'], errors='coerce')
 exe_df['y'] = pd.to_numeric(exe_df['y'], errors='coerce')
 # exe_df = exe_df.replace(np.nan, 0, regex=True)
  
 # An example of loading in an internal BNIA file
 geometry=[Point(xy) for xy in zip(exe_df.x, exe_df.y)]
 exe_gdf = gpd.GeoDataFrame( exe_df.drop(['x', 'y'], axis=1), crs=crs, geometry=geometry)
 """
"\nexe_df['x'] = pd.to_numeric(exe_df['x'], errors='coerce')\nexe_df['y'] = pd.to_numeric(exe_df['y'], errors='coerce')\n# exe_df = exe_df.replace(np.nan, 0, regex=True)\n \n# An example of loading in an internal BNIA file\ngeometry=[Point(xy) for xy in zip(exe_df.x, exe_df.y)]\nexe_gdf = gpd.GeoDataFrame( exe_df.drop(['x', 'y'], axis=1), crs=crs, geometry=geometry)\n"
Approach 2: Method 2: Example: Geoloom

Since I do not readily have a dataset with lat and long's I will have to make one.

We can split the coordinates from a geodataframe like so...

# Alternate Primary Table
 # Table: Geoloom, 
 # Columns:  
 # In this example, we are going to read in a shapefile
 geoloom_gdf = gpd.read_file("https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Geoloom_Crowd/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson");
 # then create columns for its x and y coords
 geoloom_gdf['POINT_X'] = geoloom_gdf['geometry'].centroid.x
 geoloom_gdf['POINT_Y'] = geoloom_gdf['geometry'].centroid.y
 # Now lets just drop the geometry column and save it to have our example dataset. 
 geoloom_gdf = geoloom_gdf.dropna(subset=['geometry'])
 geoloom_gdf.to_csv('example.csv')

The first thing you will want to do when given a dataset with a coordinates column is ensure its datatype.

geoloom_df = pd.read_csv('example.csv')
 # We already know the x and y columns because we just saved them as such.
 geoloom_df['POINT_X'] = pd.to_numeric(geoloom_df['POINT_X'], errors='coerce')
 geoloom_df['POINT_Y'] = pd.to_numeric(geoloom_df['POINT_Y'], errors='coerce')
 # df = df.replace(np.nan, 0, regex=True)
  
 # And filter out for points only in Baltimore City. 
 geoloom_df = geoloom_df[ geoloom_df['POINT_Y'] > 39.3  ]
 geoloom_df = geoloom_df[ geoloom_df['POINT_Y'] < 39.5  ]
# An example of loading in an internal BNIA file
 crs = {'init' :'epsg:2248'} 
 geometry=[Point(xy) for xy in zip(geoloom_df['POINT_X'], geoloom_df['POINT_Y'])]
 geoloom_gdf = gpd.GeoDataFrame( geoloom_df.drop(['POINT_X', 'POINT_Y'], axis=1), crs=crs, geometry=geometry)
 # 39.2904° N, 76.6122°
geoloom_gdf.head(1)
Unnamed: 0OBJECTIDData_typeAttachProjNmDescriptLocationURLNamePhEmailCommentsGlobalIDgeometry
345Artists & ResourcesNaNOpen WorksMaker Space1400 Greenmount Ave, Baltimore, MD, 21202, USAhttp://www.openworksbmore.comAlyce Myattalycemyattconsulting@gmail.comOne of Jane Brown's projects!140e7db7-33f1-49cd-8133-b6f75dba5851POINT (-76.608 39.306)

Heres a neat trick to make it more presentable, because those points mean nothing to me.

# Create our base layer.
 ax = csa_gdf.plot(column='hhchpov18', edgecolor='black')
  
 # now plot our points over it.
 geoloom_gdf.plot(ax=ax, color='red')
  
 plt.show()
Image Alt Text

Approach 2: Method 3: Using a Crosswalk (Need Crosswalk on Esri)

When you want to merge two datasets that do not share a common column, it is often useful to create a 'crosswalk' file that 'maps' records between two datasets. We can do this to append spatial data when a direct merge is not readily evident.

Check out this next example where we pull ACS Census data and use its 'tract' column and map it to a community. We can then aggregate the points along a the communities they belong to and map it on a choropleth!

We will set up our ACS query variables right here for easy changing

# Our download function will use Baltimore City's tract, county and state as internal paramters
 # Change these values in the cell below using different geographic reference codes will change those parameters
 tract = '*'
 county = '510' # '059' # 153 '510'
 state = '24' #51
  
 # Specify the download parameters the function will receieve here
 tableId = 'B19049' # 'B19001'
 year = '17'
 saveAcs = True 

And now we will call the function with those variables and check out the result

retrieve_acs_data = retrieve_acs_data
 IPython.core.display.HTML("")
 # state, county, tract, tableId, year, saveOriginal, save 
 df = retrieve_acs_data(state, county, tract, tableId, year)
 df.head(1)
 df.to_csv('tracts_data.csv')
B19049_001E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_TotalB19049_002E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Householder_under_25_yearsB19049_003E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Householder_25_to_44_yearsB19049_004E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Householder_45_to_64_yearsB19049_005E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Householder_65_years_and_overstatecountytract
NAME
Census Tract 2710.0238358-66666666634219409723714324510271002

This contains the CSA labels we will map our tracts to. This terminal command will download it

!wget https://raw.githubusercontent.com/bniajfi/bniajfi/main/CSA-to-Tract-2010.csv

Here

# Obtained from: https://raw.githubusercontent.com/bniajfi/bniajfi/main/CSA-to-Tract-2010.csv
 crosswalk = pd.read_csv('CSA-to-Tract-2010.csv')
 crosswalk.tail(1)
TRACTCE10GEOID10CSA2010
19928050024510280500Oldtown/Middle East
mergeDatasets = merge.mergeDatasets
 
 merged_df_geom = mergeDatasets(left_ds=df, right_ds=crosswalk, crosswalk_ds=False,
                   left_col='tract', right_col='TRACTCE10',
                   crosswalk_left_col = False, crosswalk_right_col = False,
                   merge_how='outer', # left right or columnname to retrieve
                   interactive=False)
 merged_df_geom.head(1)
import geopandas as gpd
 Hhchpov = gpd.read_file("https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Hhchpov/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson")
 Hhchpov = Hhchpov[['CSA2010', 'hhchpov15',	'hhchpov16',	'hhchpov17',	'hhchpov18', 'geometry']]
 Hhchpov.to_file("Hhchpov.geojson", driver='GeoJSON')
 Hhchpov.to_csv('Hhchpov.csv')
 gpd.read_file("Hhchpov.geojson").head(1)
CSA2010hhchpov15hhchpov16hhchpov17hhchpov18geometry
0Allendale/Irvington/S. Hilton38.9334.7332.7735.27POLYGON ((-76.65726 39.27600, -76.65726 39.276...
# A simple merge
 # df.merge(crosswalk, left_on='tract', right_on='TRACTCE10')

A simple example of how this would work

# A simple merge
 merged_df = mergeDatasets(left_ds=merged_df_geom, right_ds=Hhchpov, crosswalk_ds=False,
                   left_col='CSA2010', right_col='CSA2010',
                   crosswalk_left_col = False, crosswalk_right_col = False,
                   merge_how='outer', # left right or columnname to retrieve
                   interactive=False)
# geoms.readInGeometryData(url='Hhchpov.geojson').head(0)
 # The attributes are what we will use.
 in_crs = 2248 # The CRS we recieve our data 
 out_crs = 4326 # The CRS we would like to have our data represented as
 geom = 'geometry' # The column where our spatial information lives.
 
 # To create this dataset I had to commit a full outer join. 
 # In this way geometries will be included even if there merge does not have a direct match. 
 # What this will do is that it means at least one (near) empty record for each community will exist that includes (at minimum) the geographic information and name of a Community.
 # That way if no point level information existed in the community, that during the merge the geoboundaries are still carried over.
 
 # Primary Table
 # Description: I created a public dataset from a google xlsx sheet 'Bank Addresses and Census Tract'.
 # Table: FDIC Baltimore Banks
 # Columns: Bank Name, Address(es), Census Tract
 left_ds = 'tracts_data.csv'
 left_col = 'tract'
 
 # Crosswalk Table
 # Table: Crosswalk Census Communities
 # 'TRACT2010', 'GEOID2010', 'CSA2010'
 crosswalk_ds = 'CSA-to-Tract-2010.csv'
 use_crosswalk = True
 crosswalk_left_col = 'TRACTCE10'
 crosswalk_right_col = 'CSA2010'
 
 # Secondary Table
 # Table: Baltimore Boundaries => HHCHPOV
 # 'TRACTCE10', 'GEOID10', 'CSA', 'NAME10', 'Tract', 'geometry'
 right_ds = 'Hhchpov.geojson'
 right_col ='CSA2010'
 
 interactive = True
 merge_how = 'outer'
 
 # reutrns a pandas dataframe
 mergedf = merge.mergeDatasets( left_ds=left_ds, left_col=left_col, 
               crosswalk_ds=crosswalk_ds,
               crosswalk_left_col = crosswalk_left_col, crosswalk_right_col = crosswalk_right_col,
               right_ds=right_ds, right_col=right_col, 
               merge_how=merge_how, interactive = interactive )
---Handling Left Dataset Options--- Left column: tract ---Handling Right Dataset Options--- Right column: tract ---Ensuring Compatability Between merge_how (val: 'outer') and the Right Dataset--- Column or ['inner','left','right','outer'] value: {merge_how} ---Checking Crosswalk Dataset Options--- crosswalk_left_col TRACTCE10 ---Casting Datatypes from-to: Left->Crosswalk Datasets--- Before Casting: -> Column One: tract int64 -> Column Two: TRACTCE10 int64 After Casting: -> Column One: tract int64 -> Column Two: TRACTCE10 int64 ---Casting Datatypes from-to: Right->Crosswalk Datasets--- Before Casting: -> Column One: CSA2010 object -> Column Two: CSA2010 object After Casting: -> Column One: CSA2010 object -> Column Two: CSA2010 object ---All checks complete. Status: True --- ---PERFORMING MERGE : LEFT->CROSSWALK --- Column One : tract int64 How: CSA2010 Column Two : TRACTCE10 int64 Local Column Values Not Matched [10000] 2 Crosswalk Unique Column Values [ 10100 10200 10300 10400 10500 20100 20200 20300 30100 30200 40100 40200 60100 60200 60300 60400 70100 70200 70300 70400 80101 80102 80200 80301 80302 80400 80500 80600 80700 80800 90100 90200 90300 90400 90500 90600 90700 90800 90900 100100 100200 100300 110100 110200 120100 120201 120202 120300 120400 120500 120600 120700 130100 130200 130300 130400 130600 130700 130803 130804 130805 130806 140100 140200 140300 150100 150200 150300 150400 150500 150600 150701 150702 150800 150900 151000 151100 151200 151300 160100 160200 160300 160400 160500 160600 160700 160801 160802 170100 170200 170300 180100 180200 180300 190100 190200 190300 200100 200200 200300 200400 200500 200600 200701 200702 200800 210100 210200 220100 230100 230200 230300 240100 240200 240300 240400 250101 250102 250103 250203 250204 250205 250206 250207 250301 250303 250401 250402 250500 250600 260101 260102 260201 260202 260203 260301 260302 260303 260401 260402 260403 260404 260501 260604 260605 260700 260800 260900 261000 261100 270101 270102 270200 270301 270302 270401 270402 270501 270502 270600 270701 270702 270703 270801 270802 270803 270804 270805 270901 270902 270903 271001 271002 271101 271102 271200 271300 271400 271501 271503 271600 271700 271801 271802 271900 272003 272004 272005 272006 272007 280101 280102 280200 280301 280302 280401 280402 280403 280404 280500] ---PERFORMING MERGE : LEFT->RIGHT --- Column One : CSA2010 object How: outer Column Two : CSA2010 object
mergedf.dtypes
NAME object ,B19049_001E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Total int64 ,B19049_002E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Householder_under_25_years int64 ,B19049_003E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Householder_25_to_44_years int64 ,B19049_004E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Householder_45_to_64_years int64 ,B19049_005E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Householder_65_years_and_over int64 ,state int64 ,county int64 ,tract int64 ,CSA2010 object ,hhchpov15 float64 ,hhchpov16 float64 ,hhchpov17 float64 ,hhchpov18 float64 ,geometry geometry ,dtype: object
# Convert the geometry column datatype from a string of text into a coordinate datatype
 # mergedf[geom] = mergedf[geom].apply(lambda x: loads( str(x) ) ) 
 
 # Process the dataframe as a geodataframe with a known CRS and geom column 
 mergedGdf = GeoDataFrame(mergedf, crs=in_crs, geometry=geom) 
mergedGdf.plot()
Image Alt Text
Approach 2: Method 4: Geocoding Addresses and Landmarks to Coordinates

Sometimes (usually) we just don't have the coordinates of a place, but we do know it's address or that it is an established landmark.

In such cases we attempt 'geo-coding' these points in an automated manner.

While convenient, this process is error prone, so be sure to check it's work!

For this next example to take place, we need a dataset that has a bunch of addresses.

We can use the geoloom dataset from before in this example. We'll just drop geo'spatial data.

geoloom = gpd.read_file("https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Geoloom_Crowd/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson");
 geoloom = geoloom.dropna(subset=['geometry'])
 geoloom = geoloom.drop(columns=['geometry','GlobalID', 'POINT_X',	'POINT_Y'])
 geoloom.head(1)
OBJECTIDData_typeAttachProjNmDescriptLocationURLNamePhEmailComments
01Artists & ResourcesNaNJoeTest123 Market Pl, Baltimore, MD, 21202, USA

But if for whatever reason the link is down, you can use this example dataframe mapping just some of the many malls in baltimore.

address_df = pd.DataFrame({ 
     'Location' : pd.Series([
     '100 N. Holliday St, Baltimore, MD 21202',
     '200 E Pratt St, Baltimore, MD',
     '2401 Liberty Heights Ave, Baltimore, MD',
     '201 E Pratt St, Baltimore, MD',
     '3501 Boston St, Baltimore, MD',
     '857 E Fort Ave, Baltimore, MD',
     '2413 Frederick Ave, Baltimore, MD'
   ]),
     'Address' : pd.Series([ 
     'Baltimore City Council',
     'The Gallery at Harborplace',
     'Mondawmin Mall',
     'Harborplace',
     'The Shops at Canton Crossing',
     'Southside Marketplace',
     'Westside Shopping Center'
   ])
 })
 
 address_df.head()
LocationAddress
0100 N. Holliday St, Baltimore, MD 21202Baltimore City Council
1200 E Pratt St, Baltimore, MDThe Gallery at Harborplace
22401 Liberty Heights Ave, Baltimore, MDMondawmin Mall
3201 E Pratt St, Baltimore, MDHarborplace
43501 Boston St, Baltimore, MDThe Shops at Canton Crossing

You can use either the Location or Address column to perform the geo-coding on.

address_df = geoloom.copy()
 addrCol = 'Location'

This function takes a while. The less columns/data/records the faster it executes.

# More information vist: https://geopy.readthedocs.io/en/stable/#module-geopy.geocoders
 
 # In this example we retrieve and map a dataset with no lat/lng but containing an address
 
 # In this example our data is stored in the 'STREET' attribute
 geometry = []
 geolocator = Nominatim(user_agent="my-application")
 
 for index, row in address_df.iterrows():
   # We will try and return an address for each Street Name
   try: 
       # retrieve the geocoded information of our street address
       geol = geolocator.geocode(row[addrCol], timeout=None)
 
       # create a mappable coordinate point from the response object's lat/lang values.
       pnt = Point(geol.longitude, geol.latitude)
       
       # Append this value to the list of geometries
       geometry.append(pnt)
       
   except: 
       # If no street name was found decide what to do here.
       # df.loc[index]['geom'] = Point(0,0) # Alternate method
       geometry.append(Point(0,0))
       
 # Finally, we stuff the geometry data we created back into the dataframe
 address_df['geometry'] = geometry
address_df.head(1)

Awesome! Now convert the dataframe into a geodataframe and map it!

gdf = gpd.GeoDataFrame( address_df, geometry=geometry)
 gdf = gdf[ gdf.centroid.y > 39.3  ]
 gdf = gdf[ gdf.centroid.y < 39.5  ]
# Create our base layer.
 ax = csa_gdf.plot(column='hhchpov18', edgecolor='black')
 
 # now plot our points over it.
 geoloom_gdf.plot(ax=ax, color='red')
Image Alt Text

A litte later down, we'll see how to make this even-more interactive.

Approach 3: Connecting to a PostGIS database

In the following example pulls point geodata from a Postgres database.

We will pull the postgres point data in two manners.

  • SQL query where an SQL query uses ST_Transform(the_geom,4326) to transform the_geom's CRS from a DATABASE Binary encoding into standard Lat Long's

  • Using a plan SQL query and performing the conversion using gpd.io.sql.read_postgis() to pull the data in as 2248 and convert the CRS using .to_crs(epsg=4326)

  • These examples will not work in colabs as their is no local database to connect to and has been commented out for that reason

# This Notebook can be downloaded to connect to a database
 '''
 conn = psycopg2.connect(host='', dbname='', user='', password='', port='')
 
 # DB Import Method One
 sql1 = 'SELECT the_geom, gid, geogcode, ooi, address, addrtyp, city, block, lot, desclu, existing FROM housing.mdprop_2017v2 limit 100;'
 pointData = gpd.io.sql.read_postgis(sql1, conn, geom_col='the_geom', crs=2248)
 pointData = pointData.to_crs(epsg=4326)
 
 # DB Import Method Two
 sql2 = 'SELECT ST_Transform(the_geom,4326) as the_geom, ooi, desclu, address FROM housing.mdprop_2017v2;'
 pointData = gpd.GeoDataFrame.from_postgis(sql2, conn, geom_col='the_geom', crs=4326)
 pointData.head()
 pointData.plot()
 '''

Basic Operations

Inspection

def geomSummary(gdf): return type(gdf), gdf.crs, gdf.columns;
 # for p in df['Tract'].sort_values(): print(p)
 geomSummary(csa_gdf)
(geopandas.geodataframe.GeoDataFrame, ,, Name: NAD83 / Maryland (ftUS) , Axis Info [cartesian]: , - X[east]: Easting (US survey foot) , - Y[north]: Northing (US survey foot) , Area of Use: , - name: United States (USA) - Maryland - counties of Allegany; Anne Arundel; Baltimore; Calvert; Caroline; Carroll; Cecil; Charles; Dorchester; Frederick; Garrett; Harford; Howard; Kent; Montgomery; Prince Georges; Queen Annes; Somerset; St Marys; Talbot; Washington; Wicomico; Worcester. , - bounds: (-79.49, 37.97, -74.97, 39.73) , Coordinate Operation: , - name: SPCS83 Maryland zone (US Survey feet) , - method: Lambert Conic Conformal (2SP) , Datum: North American Datum 1983 , - Ellipsoid: GRS 1980 , - Prime Meridian: Greenwich, , Index(['Unnamed: 0', 'OBJECTID', 'CSA2010', 'hhchpov14', 'hhchpov15', , 'hhchpov16', 'hhchpov17', 'hhchpov18', 'hhchpov19', 'CSA2020', , 'hhchpov20', 'hhchpov21', 'Shape__Area', 'Shape__Length', 'geometry'], , dtype='object'))

Converting CRS

# Convert the CRS of the dataset into one you desire
 # The gdf must be loaded with a known crs in order for the to_crs conversion to work
 # We use this often to converting BNIAs custom CRS to the common type 
 out_crs = 4326
 csa_gdf = csa_gdf.to_crs(epsg=out_crs)

Saving

# Here is code to comit a simple save
 filename = 'TEST_FILE_NAME'
 csa_gdf.to_file(f"{filename}.geojson", driver='GeoJSON')
# Here is code to save this new projection as a geojson file and read it back in
 csa_gdf = csa_gdf.to_crs(epsg=2248) #just making sure
 csa_gdf.to_file(filename+'.shp', driver='ESRI Shapefile')
 csa_gdf = gpd.read_file(filename+'.shp')

Draw Tool

import folium
 from folium.plugins import Draw
 # Draw tool. Create and export your own boundaries
 m = folium.Map()
 draw = Draw()
 draw.add_to(m)
 m = folium.Map(location=[39.28759453969165, -76.61278931706487], zoom_start=12)
 draw = Draw(export=True)
 draw.add_to(m)
 # m.save(os.path.join('results', 'Draw1.html'))
 m
Make this Notebook Trusted to load map: File -> Trust Notebook

Geometric Manipulations

Boundary

newcsa = csa_gdf.copy()
 newcsa['geometry'] = csa_gdf.boundary
 newcsa.plot(column='CSA2010' )
Image Alt Text

envelope

newcsa = csa_gdf.copy()
 newcsa['geometry'] = csa_gdf.envelope
 newcsa.plot(column='CSA2010' )
Image Alt Text

convex_hull

newcsa = csa_gdf.copy()
 newcsa['geometry'] = csa_gdf.convex_hull
 newcsa.plot(column='CSA2010' )
 # , cmap='OrRd', scheme='quantiles'
 # newcsa.boundary.plot(  )
Image Alt Text

simplify

newcsa = csa_gdf.copy()
 newcsa['geometry'] = csa_gdf.simplify(30)
 newcsa.plot(column='CSA2010' )
Image Alt Text

buffer

newcsa = csa_gdf.copy()
 newcsa['geometry'] = csa_gdf.buffer(0.01)
 newcsa.plot(column='CSA2010' )
Image Alt Text

rotate

newcsa = csa_gdf.copy()
 newcsa['geometry'] = csa_gdf.rotate(30)
 newcsa.plot(column='CSA2010' )
Image Alt Text

scale

newcsa = csa_gdf.copy()
 newcsa['geometry'] = csa_gdf.scale(3, 2)
 newcsa.plot(column='CSA2010' )
Image Alt Text

skew

newcsa = csa_gdf.copy()
 newcsa['geometry'] = csa_gdf.skew(1, 10)
 newcsa.plot(column='CSA2010' )
Image Alt Text

Advanced

Create Geospatial Functions

Operations:

Input(s):

Output: File

This function will handle common geo spatial exploratory methods. It covers everything discussed in the basic operations and more!

Click to toggle
#
 # Work With Geometry Data
 # Description: geomSummary, getPointsInPolygons, getPolygonOnPoints, mapPointsInPolygons, getCentroids
 #
 def workWithGeometryData(method=False, df=False, polys=False, ptsCoordCol=False, polygonsCoordCol=False, polyColorCol=False, polygonsLabel='polyOnPoint', pntsClr='red', polysClr='white', interactive=False):
   def geomSummary(df): return type(df), df.crs, df.columns;
   def getCentroid(df, col): return df[col].representative_point() # df['geometry'].centroid
 
   # To 'import' a script you wrote, map its filepath into the sys
   def getPolygonOnPoints(pts, polygons, ptsCoordCol, polygonsCoordCol, polygonsLabel, interactive):
       count = 0
       # We're going to keep a list of how many points we find.
       boundaries = []
 
       # Loop over polygons with index i.
       for i, pt in pts.iterrows():
           # print('Searching for point within Geom:', pt )
           # Only one Label is accepted.
           poly_on_this_point = []
           # Now loop over all polygons with index j.
           for j, poly in polygons.iterrows():
               if poly[polygonsCoordCol].contains(pt[ptsCoordCol]):
                   # Then it's a hit! Add it to the list
                   poly_on_this_point.append(poly[polygonsLabel])
                   count = count + 1
                   # pts = pts.drop([j])
 
           # We could do all sorts, like grab a property of the
           # points, but let's just append the number of them.
           boundaries.append(poly_on_this_point)
           clear_output(wait=True)
 
       # Add the number of points for each poly to the dataframe..
       pts = pts.assign(CSA2010 = boundaries)
       if (interactive):
         print( 'Total Points: ', (pts.size / len(pts.columns) ) )
         print( 'Total Points in Polygons: ', count )
         print( 'Prcnt Points in Polygons: ', count / (pts.size / len(pts.columns) ) )
       return pts
 
   # To 'import' a script you wrote, map its filepath into the sys
   def getPointsInPolygons(pts, polygons, ptsCoordCol, polygonsCoordCol, interactive):
     count = 0
     total = pts.size / len(pts.columns)
     # We're going to keep a list of how many points we find.
     pts_in_polys = []
 
     # Loop over polygons with index i.
     for i, poly in polygons.iterrows():
         # print('Searching for point within Geom:', poly )
         # Keep a list of points in this poly
         pts_in_this_poly = 0
 
         # Now loop over all points with index j.
         for j, pt in pts.iterrows():
             if poly[polygonsCoordCol].contains(pt[ptsCoordCol]):
                 # Then it's a hit! Add it to the list,
                 pts_in_this_poly += 1
                 # and drop it so we have less hunting. # pts = pts.drop([j])
 
         # We could do all sorts, like grab a property of the
         # points, but let's just append the number of them.
         pts_in_polys.append(pts_in_this_poly)
         if (interactive): print('Found this many points within the Geom:', pts_in_this_poly )
         count += pts_in_this_poly
         clear_output(wait=True)
 
     # Add the number of points for each poly to the dataframe.?
     polygons['pointsinpolygon'] = pts_in_polys
     if (interactive):
       print( 'Total Points: ', total )
       print( 'Total Points in Polygons: ', count )
       print( 'Prcnt Points in Polygons: ', count / total )
     return polygons
 
   def mapPointsandPolygons(pnts, polys, pntsCl, polysClr, polyColorCol):
     print('mapPointsandPolygons');
     # We restrict to South America.
     ax = 1
     if polyColorCol:
       ax = polys.plot( column=polyColorCol, legend=True)
     else:
       ax = polys.plot( color=polysClr, edgecolor='black')
 
     # We can now plot our ``GeoDataFrame``.
     pnts.plot(ax=ax, color=pntsClr)
 
     return plt.show()
 
   if method=='summary': return geomSummary(df);
   if method=='ponp': return getPolygonOnPoints(df, polys, ptsCoordCol, polygonsCoordCol, polygonsLabel, interactive);
   if method=='pinp': return getPointsInPolygons(df, polys, ptsCoordCol, polygonsCoordCol, interactive);
   if method=='pandp': return mapPointsandPolygons(df, polys, pntsClr, polysClr, polyColorCol);
   if method=='centroid': return getCentroid(df, col);
Click to toggle
 # reverseGeoCode, readFile, getGeoParams, main
 def readInGeometryData(url=False, porg=False, geom=False, lat=False, lng=False, revgeocode=False,  save=False, in_crs=4326, out_crs=False):
 
   def reverseGeoCode(df, lat ):
     # STREET	CITY	STATE ZIP NAME
     # , format_string="%s, BALTIMORE MD"
     geometry = []
     geolocator = Nominatim(user_agent="my-application")
     for index, row in df.iterrows():
       try:
           geol = geolocator.geocode(row[lat], timeout=None)
           pnt = Point(geol.longitude, geol.latitude)
           geometry.append(pnt)
       except:
           geometry.append(Point(-76, 39) )
           print(row[lat]);
     return geometry
 
   def readFile(url, geom, lat, lng, revgeocode, in_crs, out_crs):
     # print("readInGeometryData-READFILE STARTING")
     df = False
     gdf = False
     ext = isinstance(url, pd.DataFrame)
     if ext: ext='csv'
     else: ext = url[-3:]
 
     #XLS
     # b16 = pd.read_excel('Jones.BirthsbyCensus2016.XLS', sheetname='Births')
 
     # The file extension is used to determine the appropriate import method.
     if ext in ['son', 'kml', 'shp', 'pgeojson']: gdf = gpd.read_file(url)
     if ext == 'csv':
       df = url if isinstance(url, pd.DataFrame) else pd.read_csv(url)
       # Read using Geom, Lat, Lat/Lng, revGeoCode
       if revgeocode=='y': df['geometry'] = reverseGeoCode(df, lat)
       elif geom: df['geometry'] = df[geom].apply(lambda x: loads( str(x) ))
       elif lat==lng: df['geometry'] = df[lat].apply(lambda x: loads( str(x) ))
       elif lat!=lng: df['geometry'] = gpd.points_from_xy(df[lng], df[lat]);
 
       gdf = GeoDataFrame(df, crs=in_crs, geometry='geometry') #crs=2248
       if not out_crs == in_crs: gdf = gdf.to_crs(epsg=out_crs)
     return gdf
 
   def getGeoParams(url, porg, geom, lat, lng, revgeocode, save, in_crs, out_crs):
     addr=False
 
     if not url: url = input("Please enter the location of your dataset: " )
     # if url[-3:] == 'csv' :
     #   df = pd.read_csv(url,index_col=0,nrows=1)
     #   print(df.columns)
 
     # Geometries inside
     if geom and not (lat and lng): porg = 'g'
     # Point data inside
     elif not geom and lat or lng:
       porg = 'p';
       if not lat: lat = lng
       if not lng: lng = lat
 
     # If the P/G could not be infered...
     if not (porg in ['p', 'g']):
       if not revgeocode in ['y', 'n']: revgeocode = input("Do your records need reverse geocoding: (Enter: y/n') " )
       if revgeocode == 'y': porg = 'p'; lng = lat = input("Please enter the column name where the address is stored: " );
       elif revgeocode == 'n': porg = input("""Do the records in this dataset use (P)oints or (g)eometric polygons?: (Enter: 'p' or 'g') """ );
       else: return getGeoParams(url, porg, geom, lat, lng, revgeocode, save, in_crs, out_crs);
 
       if porg=='p':
           if not lat: lat = input("Please enter the column name where the latitude coordinate is stored: " );
           if not lng: lng = input("Please enter the column name where the longitude cooridnate is stored: (Could be same as the lat) " );
       elif porg=='g':
         if not geom: geom = input("Please enter column name where the geometry data is stored: (*optional, skip if unknown)" );
       else: return getGeoParams(url, porg, geom, lat, lng, revgeocode, save, in_crs, out_crs)
 
     if not out_crs: out_crs=in_crs
 
     return url, porg, geom, lat, lng, revgeocode, save, in_crs, out_crs
 
   # This function uses all the other functions
   def main(url, porg, geom, lat, lng, revgeocode, save, in_crs, out_crs):
     # print("READINGEOMETRYDATA-MAIN STARTING")
 
     # Check for missing values. retrieve them
     if (isinstance(url, pd.DataFrame)): print('Converting DF to GDF')
     elif (not (url and porg) ) or (
         not (porg == 'p' or porg == 'g') ) or (
         porg == 'g' and not geom) or (
         porg == 'p' and (not (lat and lng) ) ):
       # return readInGeometryData( *getGeoParams(url, porg, geom, lat, lng, revgeocode, save, in_crs, out_crs) );
         url, porg, geom, lat, lng, revgeocode, save, in_crs, out_crs = getGeoParams(url, porg, geom, lat, lng, revgeocode, save, in_crs, out_crs)
 
     # Quit if the Columns dont exist -> CSV Only
     # status = checkColumns(url, geom, lat, lng)
     # if status == False: print('A specified column does not exist'); return False;
 
     # Perform operation
     gdf = readFile(url, geom, lat, lng, revgeocode, in_crs, out_crs)
 
     # Tidy up
 
     # Save
     # if save: saveGeoData(gdf, url, fileName, driver='esri')
 
     return gdf
 
   return main(url, porg, geom, lat, lng, revgeocode, save, in_crs, out_crs)
Click to toggle
def maps_points(df, lat_col='POINT_Y', lon_col='POINT_X', zoom_start=11, \
                 plot_points=False, pt_radius=15, \
                 draw_heatmap=True, heat_map_weights_col=None, \
                 heat_map_weights_normalize=True, heat_map_radius=15):
     """Creates a map given a dataframe of points. Can also produce a heatmap overlay
     Arg:
         df: dataframe containing points to maps
         lat_col: Column containing latitude (string) 
     """
 
     ## center map in the middle of points center in
     middle_lat = df[lat_col].median()
     middle_lon = df[lon_col].median()
 
     curr_map = folium.Map(location=[middle_lat, middle_lon],
                           zoom_start=zoom_start)
 
     # add points to map
     if plot_points:
         for _, row in df.iterrows():
             folium.CircleMarker([row[lat_col], row[lon_col]],
                                 radius=pt_radius,
                                 popup=row['name'],
                                 fill_color="#3db7e4", # divvy color
                                ).add_to(curr_map)
 
     # add heatmap
     if draw_heatmap:
         # convert to (n, 2) or (n, 3) matrix format
         if heat_map_weights_col is None:
             stations = zip(df[lat_col], df[lon_col])
         else:
             # if we have to normalize
             if heat_map_weights_normalize:
                 df[heat_map_weights_col] = \
                     df[heat_map_weights_col] / df[heat_map_weights_col].sum()
 
             stations = zip(df[lat_col], df[lon_col], df[heat_map_weights_col])
 
         curr_map.add_child(plugins.HeatMap(stations, radius=heat_map_radius))
 
     return curr_map
Click to toggle
# draw_heatmap, cluster_points, plot_points,
 def map_points(data, lat_col='POINT_Y', lon_col='POINT_X', zoom_start=11, plot_points=True, cluster_points=False,
                pt_radius=15, draw_heatmap=False, heat_map_weights_col=None, heat_map_weights_normalize=True,
                heat_map_radius=15, popup=False):
     """Creates a map given a dataframe of points. Can also produce a heatmap overlay
 
     Arg:
         df: dataframe containing points to maps
         lat_col: Column containing latitude (string)
         lon_col: Column containing longitude (string)
         zoom_start: Integer representing the initial zoom of the map
         plot_points: Add points to map (boolean)
         pt_radius: Size of each point
         draw_heatmap: Add heatmap to map (boolean)
         heat_map_weights_col: Column containing heatmap weights
         heat_map_weights_normalize: Normalize heatmap weights (boolean)
         heat_map_radius: Size of heatmap point
 
     Returns:
         folium map object
     """
     df = data.copy()
     ## center map in the middle of points center in
     middle_lat = df[lat_col].median()
     middle_lon = df[lon_col].median()
 
     curr_map = folium.Map(location=[middle_lat, middle_lon], zoom_start=zoom_start)
 
     # add points to map
     if plot_points:
       for _, row in df.iterrows():
         if pd.isna(row[lat_col]) or pd.isna(row[lon_col]):
             continue
         if( row[lat_col] != row[lat_col] ):
            print("Invalid coordinates, skipping marker:", row)
            continue
 
         folium.CircleMarker([row[lat_col], row[lon_col]],
           radius=pt_radius,
           popup=row[popup],
           fill_color="#3db7e4", # divvy color
         ).add_to(curr_map)
     if cluster_points:
       marker_cluster = MarkerCluster().add_to(curr_map)
       for index, row in df.iterrows():
         if pd.isna(row[lat_col]) or pd.isna(row[lon_col]):
             continue
         popup_html = '
' + '
'.join([f'{col}: {row[col]}' for col in popup]) + '
' folium.Marker( location=[row[lat_col], row[lon_col]], popup=folium.Popup(popup_html, max_width=450), icon=None ).add_to(marker_cluster) # add heatmap if draw_heatmap: # convert to (n, 2) or (n, 3) matrix format if heat_map_weights_col is None: stations = zip(df[lat_col], df[lon_col]) else: # if we have to normalize if heat_map_weights_normalize: df[heat_map_weights_col] = \ df[heat_map_weights_col] / df[heat_map_weights_col].sum() stations = zip(df[lat_col], df[lon_col], df[heat_map_weights_col]) curr_map.add_child(plugins.HeatMap(stations, radius=heat_map_radius)) return curr_map
geoloom_gdf_url = "https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Geoloom_Crowd/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson"
 geoloom_gdf = readInGeometryData(url=geoloom_gdf_url, porg=False, geom='geometry', lat=False, lng=False, revgeocode=False,  save=False, in_crs=4326, out_crs=False)
 geoloom_gdf = geoloom_gdf.dropna(subset=['geometry'])
 geoloom_gdf = geoloom_gdf.drop(columns=['POINT_X','POINT_Y'])
 geoloom_gdf['POINT_X'] = geoloom_gdf['geometry'].x
 geoloom_gdf['POINT_Y'] = geoloom_gdf['geometry'].y
 geoloom_gdf.head(1)
OBJECTIDData_typeAttachProjNmDescriptLocationURLNamePhEmailCommentsGlobalIDgeometryPOINT_XPOINT_Y
01Artists & ResourcesNaNJoeTest123 Market Pl, Baltimore, MD, 21202, USAe59b4931-e0c8-4d6b-b781-1e672bf8545aPOINT (-76.60661 39.28746)-76.6139.29
map_points(geoloom_gdf, lat_col='POINT_Y', lon_col='POINT_X', zoom_start=11, plot_points=True, cluster_points=True,
                pt_radius=15, draw_heatmap=False, heat_map_weights_col=None, heat_map_weights_normalize=True,
                heat_map_radius=15, popup=['ProjNm', 'Data_type', 'Location', 'URL', 'Name', 'PhEmail', 'Comments'])
Make this Notebook Trusted to load map: File -> Trust Notebook
import pandas as pd 
 url = "C:/Users/charl/Documents/GitHub/karpatic/src/ipynb/labs/scraped_data.csv"
 readInGeometryData(url, geom='geometry')

Processing Geometry is tedius enough to merit its own handler

As you can see we have a lot of points. Lets see if there is any better way to visualize this.

Example: Using the advanced Functions

Playing with Points: Geoloom

Points In Polygons

The red dots from when we mapped the geoloom points above were a bit too noisy.

Lets create a choropleth instead!

We can do this by aggregating by CSA.

To do this, start of by finding which points are inside of which polygons!

Since the geoloom data does not have a CSA dataset, we will need merge it to one that does!

Lets use the childhood poverty link from example one and load it up because it contains the geometry data and the csa labels.

# from dataplay.intaker import Intake
 # csa_gdf = Intake.getData('https://raw.githubusercontent.com/bniajfi/bniajfi/main/CSA-to-Tract-2010.csv')
# This dataset is taken from the public database provided by BNIAJFI hosted by Esri / ArcGIS
 # BNIA ArcGIS Homepage: https://data-bniajfi.opendata.arcgis.com/
 csa_gdf_url = "https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Hhchpov/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson"
 csa_gdf = readInGeometryData(url=csa_gdf_url, porg=False, geom='geometry', lat=False, lng=False, revgeocode=False,  save=False, in_crs=2248, out_crs=False)
 csa_gdf.head(1)
OBJECTIDCSA2010hhchpov14hhchpov15hhchpov16hhchpov17hhchpov18hhchpov19CSA2020hhchpov20hhchpov21Shape__AreaShape__Lengthgeometry
01Allendale/Irvington/S. Hilton41.5538.9334.7332.7735.2732.6Allendale/Irvington/S. Hilton21.4221.426.38e+0738770.17POLYGON ((-76.65726 39.27600, -76.65726 39.276...

And now lets pull in our geoloom data. But to be sure, drop the empty geometry columns or the function directly below will now work.

geoloom_gdf_url = "https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Geoloom_Crowd/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson"
 geoloom_gdf = readInGeometryData(url=geoloom_gdf_url, porg=False, geom='geometry', lat=False, lng=False, revgeocode=False,  save=False, in_crs=4326, out_crs=False)
 geoloom_gdf = geoloom_gdf.dropna(subset=['geometry'])
 # geoloom_gdf = geoloom_gdf.drop(columns=['POINT_X','POINT_Y'])
 geoloom_gdf.head(1)
OBJECTIDData_typeAttachProjNmDescriptLocationURLNamePhEmailCommentsPOINT_XPOINT_YGlobalIDgeometry
01Artists & ResourcesNaNJoeTest123 Market Pl, Baltimore, MD, 21202, USA-8.53e+064.76e+06e59b4931-e0c8-4d6b-b781-1e672bf8545aPOINT (-76.60661 39.28746)

And now use a point in polygon method 'ponp' to get the CSA2010 column from our CSA dataset added as a column to each geoloom record.

geoloom_w_csas = workWithGeometryData(
     method='pinp',  # method=False,
     df=geoloom_gdf, # df=False, 
     polys=csa_gdf, # polys=False, 
     ptsCoordCol='geometry', # ptsCoordCol=False,
     polygonsCoordCol='geometry', # polygonsCoordCol=False,
     polyColorCol='hhchpov18', # polyColorCol=False
     polygonsLabel='CSA2010', # polygonsLabel='polyOnPoint',
     pntsClr='red', # pntsClr='red', 
     polysClr='white' # polysClr='white',
 ) # interactive=False
geoloom_w_csas.head(2)
OBJECTIDCSA2010hhchpov14hhchpov15hhchpov16hhchpov17hhchpov18hhchpov19CSA2020hhchpov20hhchpov21Shape__AreaShape__Lengthgeometrypointsinpolygon
01Allendale/Irvington/S. Hilton41.5538.9334.7332.7735.2732.60Allendale/Irvington/S. Hilton21.4221.426.38e+0738770.17POLYGON ((-76.65726 39.27600, -76.65726 39.276...0
12Beechfield/Ten Hills/West Hills22.3119.4221.2223.9221.9015.38Beechfield/Ten Hills/West Hills14.7714.774.79e+0737524.95POLYGON ((-76.69479 39.30201, -76.69465 39.301...0

You'll see you have a 'pointsinpolygons' column now.

geoloom_w_csas.plot( column='pointsinpolygon', legend=True)
Image Alt Text
geoloom_w_csas.head(1)
OBJECTIDCSA2010hhchpov14hhchpov15hhchpov16hhchpov17hhchpov18hhchpov19CSA2020hhchpov20hhchpov21Shape__AreaShape__Lengthgeometrypointsinpolygon
01Allendale/Irvington/S. Hilton41.5538.9334.7332.7735.2732.6Allendale/Irvington/S. Hilton21.4221.426.38e+0738770.17POLYGON ((-76.65726 39.27600, -76.65726 39.276...0

Polygons in Points

Alternately, you can run the ponp function and have returned the geoloom dataset

geoloom_w_csas = workWithGeometryData(method='ponp', df=geoloom_gdf, polys=csa_gdf, ptsCoordCol='geometry', polygonsCoordCol='geometry', polyColorCol='hhchpov18', polygonsLabel='CSA2010', pntsClr='red', polysClr='white')

We can count the totals per CSA using undefined

Alternately, we could map the centroid of boundaries within another boundary to find boundaries within boundaries

geoloom_w_csas['POINT_Y'] = geoloom_w_csas.centroid.y
 geoloom_w_csas['POINT_X'] = geoloom_w_csas.centroid.x
 
 # We already know the x and y columns because we just saved them as such.
 geoloom_w_csas['POINT_X'] = pd.to_numeric(geoloom_w_csas['POINT_X'], errors='coerce')
 geoloom_w_csas['POINT_Y'] = pd.to_numeric(geoloom_w_csas['POINT_Y'], errors='coerce')
 # df = df.replace(np.nan, 0, regex=True)
 
 # And filter out for points only in Baltimore City. 
 geoloom_w_csas = geoloom_w_csas[ geoloom_w_csas['POINT_Y'] > 39.3  ]
 geoloom_w_csas = geoloom_w_csas[ geoloom_w_csas['POINT_Y'] < 39.5  ]

Folium

But if that doesn't do it for you, we can also create heat maps and marker clusters

# https://github.com/python-visualization/folium/blob/master/examples/MarkerCluster.ipynb
map_points(geoloom_w_csas, lat_col='POINT_Y', lon_col='POINT_X', zoom_start=11, plot_points=True, cluster_points=False,
                pt_radius=1, draw_heatmap=True, heat_map_weights_col=None, heat_map_weights_normalize=True,
                heat_map_radius=15, popup='ProjNm')
Make this Notebook Trusted to load map: File -> Trust Notebook

And Time Sliders

Choropleth Timeslider

https://github.com/python-visualization/folium/blob/master/examples/TimeSliderChoropleth.ipynb

import geopandas as gpd
 import numpy as np
 import pandas as pd
 from branca.colormap import linear
 # conditionally loaded ->  from dataplay import geoms
 
 u = intaker.Intake
 rdf = u.getData('https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Biz1_/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson')
 # rdf.set_index('CSA2010', drop=True, inplace=True)
 rdf.drop(labels=['OBJECTID_1', 'Shape__Area', 'Shape__Length'], axis=1, inplace=True)
 
 ndf = rdf.filter(regex='biz1|CSA2010', axis=1)
 
 # Calculate number of years available
 n_periods = len(ndf.columns) - 1
 # Get starting year.
 startAt = "20"+ndf.columns[1][-2:]
 
 # Create a 'YEAR' index with the assumption that all following years exist
 datetime_index = pd.date_range(startAt, periods=n_periods, freq="Y")
 dt_index_epochs = datetime_index.astype(int) // 10 ** 9
 dt_index = dt_index_epochs.astype("U10")
rdf.head()
styledata = {}
 # For the Index of each CSA
 for idx, csa in rdf.iterrows():
     df = pd.DataFrame( { "color": csa.values[1:-1] }, index=dt_index, )
     styledata[idx] = df
 
 max_color, min_color = 0, 0
 for country, data in styledata.items():
     max_color = max(max_color, data["color"].max())
     min_color = min(max_color, data["color"].min())
 
 cmap = linear.PuRd_09.scale(min_color, max_color)
 def norm(x): return (x - x.min()) / (x.max() - x.min())
 for country, data in styledata.items():
     data["color"] = data["color"].apply(cmap)
     data["opacity"] = 1
 
 styledict = { str(country): data.to_dict(orient="index") for country, data in styledata.items() }
 
 # { CSA : { timestamp: {color: value, opacity:value } }, 
 #    CSA : { timestamp: {color: value, opacity:value } }, 
 #    ... 
 # }
styledict
import folium
 from folium.plugins import TimeSliderChoropleth
 
 m = folium.Map([39.28759453969165, -76.61278931706487], width='50%', height='50%', zoom_start=12)
 g = TimeSliderChoropleth( rdf.to_json(), styledict=styledict, ).add_to(m)
 m
m.save(outfile= "test.html")