
About this Tutorial:

What's inside?

In this notebook, the basics of data intake are introduced.

Objectives

By the end of this tutorial, users should have an understanding of:

Background

For the next example to work, we will need to import hypothetical CSV files.
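If you do not have a CSV file on hand, the following is a minimal sketch that writes a small stand-in file you can experiment with. The file name `sample_intake.csv`, its columns, and its values are hypothetical placeholders (not the tutorial's data), and only the standard library is used.

```python
import csv
import os
import tempfile

# Hypothetical placeholder rows: not real statistics from the tutorial's dataset.
rows = [
    {"CSA2010": "Area A", "value": "10.5"},
    {"CSA2010": "Area B", "value": "20.0"},
]

# Write into a fresh temporary folder so nothing in the working directory is touched.
path = os.path.join(tempfile.mkdtemp(), "sample_intake.csv")
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["CSA2010", "value"])
    writer.writeheader()
    writer.writerows(rows)

print(path)  # a path ending in .csv, which Intake.getData routes to pd.read_csv
```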

Try It! Go ahead and try running the cell below in Colab.

Advanced

The Function

import pandas as pd
import geopandas as gpd

class Intake:

  # 1. Returns a DataFrame/GeoDataFrame or False.
  #    Geometry sources ('pgeojson', 'shp', 'geojson') go through dataplay's geoms.readInGeometryData;
  #    CSV sources are read with pd.read_csv. In interactive mode, a failed read prompts for a new
  #    URL/path and calls itself again until something valid (or 'no') is given.
  @staticmethod
  def getData(url, interactive=False):
    escapeQuestionFlags = ['no', '', 'none']
    if Intake.isPandas(url): return url  # Already loaded; pass it through.
    if str(url).lower() in escapeQuestionFlags: return False
    if interactive: print('Getting Data From: ', url)
    try:
      if any(ext in url for ext in ['pgeojson', 'shp', 'geojson']):
        from dataplay import geoms
        df = geoms.readInGeometryData(url=url, porg=False, geom='geometry', lat=False, lng=False, revgeocode=False, save=False, in_crs=2248, out_crs=False)
      elif 'csv' in url:
        df = pd.read_csv(url)
      else:
        raise ValueError('Unrecognized data source: ' + str(url))
      return df
    except Exception:
      if interactive: return Intake.getData(input("Error: Try Again?  ( URL/ PATH or  'NO'/  ) "), interactive)
      return False

  # 1ai. A misnomer: also returns True for tuples. Returns Bool.
  @staticmethod
  def isPandas(df): return isinstance(df, (pd.DataFrame, gpd.GeoDataFrame, tuple))

  # a1. Used by Merge Lib. Returns (df, column), (df, False) or (False, False).
  @staticmethod
  def getAndCheck(url, col='geometry', interactive=False):
    df = Intake.getData(url, interactive)  # Returns False or df
    if not Intake.isPandas(df):
      if interactive: print('No data was retrieved.', df)
      return False, False
    if isinstance(col, list):
      # Every listed column must already exist in df.
      for colm in col:
        if not Intake.checkColumn(df, colm):
          if interactive: print('Exiting. Error on the column: ', colm)
          return df, False
      return df, col
    newcol = Intake.getAndCheckColumn(df, col, interactive)  # Returns False or col
    if not newcol:
      if interactive: print('Exiting. Error on the column: ', col)
      return df, False
    return df, newcol

  # a2. Returns Bool: True if the column exists in the dataset.
  @staticmethod
  def checkColumn(dataset, column): return {column}.issubset(dataset.columns)

  # b1. Used by Merge Lib. Returns both datasets and the coerce status (True if the key dtypes match).
  @staticmethod
  def coerce(ds1, ds2, col1, col2, interactive):
    ds1, ldt, lIsNum = Intake.getdTypeAndFillNum(ds1, col1, interactive)
    ds2, rdt, rIsNum = Intake.getdTypeAndFillNum(ds2, col2, interactive)

    # If one key is numeric and the other is object (text), convert the text key.
    ds2 = Intake.coerceDtypes(lIsNum, rdt, ds2, col2, interactive)
    ds1 = Intake.coerceDtypes(rIsNum, ldt, ds1, col1, interactive)

    return ds1, ds2, (ds1[col1].dtype == ds2[col2].dtype)

  # b2. Used by Merge Lib. Fills NaN in a numeric key with a sentinel value (-1321321321321325).
  @staticmethod
  def getdTypeAndFillNum(ds, col, interactive):
    dt = ds[col].dtype
    isNum = dt == 'float64' or dt == 'int64'
    if isNum: ds[col] = ds[col].fillna(-1321321321321325)
    return ds, dt, isNum

  # b3. Used by Merge Lib. Converts an object key to float when the other key is numeric.
  @staticmethod
  def coerceDtypes(isNum, dt, ds, col, interactive):
    if isNum and dt == 'object':
      if interactive: print('Converting Key from Object to Int')
      ds[col] = pd.to_numeric(ds[col], errors='coerce')  # Unparseable values become NaN.
      if interactive: print('Converting Key from Int to Float')
      ds[col] = ds[col].astype(float)
    return ds

  # a3. Returns False or col. In interactive mode, prompts until a valid column is entered.
  @staticmethod
  def getAndCheckColumn(df, col, interactive):
    if Intake.checkColumn(df, col): return col
    if not interactive: return False
    print('Invalid column given: ', col)
    print(df.columns)
    print('Please enter a new column from the list above.')
    col = input('Column Name: ')
    return Intake.getAndCheckColumn(df, col, interactive)
u = Intake
# `url` must already hold a CSV, GeoJSON, or shapefile path/URL; it is not defined in this cell.
rdf = Intake.getData(url)
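The `coerce` helpers exist because pandas refuses to merge an integer key column with a text (object) key column. Below is a simplified sketch of the same idea, using a hypothetical helper `align_keys` and two made-up frames (neither is part of dataplay): the text key is converted with `pd.to_numeric(errors='coerce')` and both keys are cast to float, so the dtypes match before `merge`. Unlike `Intake.coerce`, it does not fill missing numeric keys with a sentinel value.

```python
import pandas as pd

def align_keys(left, right, lcol, rcol):
    """Make a numeric key and a text key mergeable (simplified take on Intake.coerce)."""
    left, right = left.copy(), right.copy()  # leave the caller's frames untouched
    l_num = pd.api.types.is_numeric_dtype(left[lcol])
    r_num = pd.api.types.is_numeric_dtype(right[rcol])
    if l_num and not r_num:
        # Text keys that cannot be parsed become NaN instead of raising.
        right[rcol] = pd.to_numeric(right[rcol], errors="coerce").astype(float)
        left[lcol] = left[lcol].astype(float)
    elif r_num and not l_num:
        left[lcol] = pd.to_numeric(left[lcol], errors="coerce").astype(float)
        right[rcol] = right[rcol].astype(float)
    return left, right, left[lcol].dtype == right[rcol].dtype

# Hypothetical inputs: the left key is an integer, the right key was read in as text.
left = pd.DataFrame({"key": [1, 2, 3], "a": ["x", "y", "z"]})
right = pd.DataFrame({"key": ["1", "2", "not a number"], "b": [10, 20, 30]})

left2, right2, ok = align_keys(left, right, "key", "key")
merged = left2.merge(right2, on="key", how="inner")
print(ok)      # True: both keys are now float64
print(merged)  # rows for keys 1.0 and 2.0; the unparseable key matched nothing
```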

Here we can save the data so that it may be used in later tutorials.

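| | OBJECTID | CSA2010 | hhchpov14 | hhchpov15 | hhchpov16 | hhchpov17 | hhchpov18 | hhchpov19 | CSA2020 | hhchpov20 | hhchpov21 | Shape__Area | Shape__Length | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Allendale/Irving... | 41.55 | 38.93 | 34.73 | 32.77 | 35.27 | 32.60 | Allendale/Irving... | 21.42 | 21.42 | 6.38e+07 | 38770.17 | POLYGON ((-76.65... |
| 1 | 2 | Beechfield/Ten H... | 22.31 | 19.42 | 21.22 | 23.92 | 21.90 | 15.38 | Beechfield/Ten H... | 14.77 | 14.77 | 4.79e+07 | 37524.95 | POLYGON ((-76.69... |
| 2 | 3 | Belair-Edison | 36.93 | 36.88 | 36.13 | 34.56 | 39.74 | 41.04 | Belair-Edison | 31.76 | 31.76 | 4.50e+07 | 31307.31 | POLYGON ((-76.56... |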
# import csv
# rdf.to_csv(string+'.csv', encoding="utf-8", index=False, quoting=csv.QUOTE_ALL)
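The commented-out save line passes `quoting=csv.QUOTE_ALL`, which requires `import csv`. A minimal sketch of what that option does, using a hypothetical stand-in frame and a temporary file name (`saved_data.csv` is an assumption, not the tutorial's file): every field, header included, is wrapped in quotes, so names containing commas survive the round trip.

```python
import csv
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for rdf; the comma in the first name shows why quoting helps.
df = pd.DataFrame({"CSA2010": ["Area A, North", "Area B"], "value": [1.5, 2.0]})

# A temporary folder keeps the example self-contained; use your own file name in practice.
out_path = os.path.join(tempfile.mkdtemp(), "saved_data.csv")
df.to_csv(out_path, encoding="utf-8", index=False, quoting=csv.QUOTE_ALL)

with open(out_path, encoding="utf-8") as f:
    header = f.readline().strip()
print(header)  # the header fields are quoted too

# Reading the file back restores the same values.
round_trip = pd.read_csv(out_path)
print(round_trip)
```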

Download data by:

You can upload this data into the next tutorial in one of two ways.

  1. Upload the saved file to Google Drive and connect to your Drive path.

OR

  2. Download the dataset as directed above, then navigate to the next tutorial and upload the file using the 'Upload' button in the 'Files' tab of the left-hand menu. The next tutorial will teach you how to load this data so that it may be mapped.

Here are some examples:

Using Esri and the Geoms handler directly:

from dataplay import geoms

geoloom_gdf_url = "https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Geoloom_Crowd/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson"
geoloom_gdf = geoms.readInGeometryData(url=geoloom_gdf_url, porg=False, geom='geometry', lat=False, lng=False, revgeocode=False, save=False, in_crs=4326, out_crs=False)
geoloom_gdf = geoloom_gdf.dropna(subset=['geometry'])  # Drop records that have no geometry.
geoloom_gdf.head(1)
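| | OBJECTID | Data_type | Attach | ProjNm | Descript | Location | URL | Name | Ph | Email | Comments | POINT_X | POINT_Y | GlobalID | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Artists & Resources | NaN | Joe | Test | 123 Market Pl, B... | | | | | | -8.53e+06 | 4.76e+06 | e59b4931-e0c8-4d... | POINT (-76.60661... |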
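The long ArcGIS URL is a FeatureServer `query` request whose parameters are URL-encoded. A minimal sketch that rebuilds the same URL from a dict with the standard library, so the filter or output format can be changed without hand-editing the string. The parameter values are exactly those in the URL used above; the variable names are illustrative.

```python
from urllib.parse import urlencode

base = ("https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/"
        "Geoloom_Crowd/FeatureServer/0/query")
params = {
    "where": "1=1",           # no filter: return every feature
    "outFields": "*",         # all attribute fields
    "returnGeometry": "true",
    "f": "pgeojson",          # response format, as in the original URL
}
# safe="*" keeps the asterisk literal, matching the hand-written URL; '=' becomes %3D.
query_url = base + "?" + urlencode(params, safe="*")
print(query_url)
```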

Again, but with the Intake class:

Geoloom_Crowd, rcol = u.getAndCheck('https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Geoloom_Crowd/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson')
Geoloom_Crowd.head(1)

The getAndCheck function is useful for confirming that a required field (by default, 'geometry') is present in the retrieved data.
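As a non-interactive alternative to `getAndCheck`, the sketch below reports every missing required column at once instead of prompting for each one. `missing_columns` is a hypothetical helper, not part of dataplay, and the frame's values are placeholders.

```python
import pandas as pd

def missing_columns(df, required):
    """Return the required column names that df lacks, in the order given."""
    return [col for col in required if col not in df.columns]

# Hypothetical frame: column names mirror the tutorial's, values are placeholders.
df = pd.DataFrame({"CSA2010": ["Area A"], "hhpov15": [0.0]})
missing = missing_columns(df, ["CSA2010", "hhpov15", "geometry"])
if missing:
    print("Missing required columns:", missing)
```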

Hhpov = Hhpov[['CSA2010', 'hhpov15', 'hhpov16', 'hhpov17', 'hhpov18', 'hhpov19']]
# Hhpov.to_csv('Hhpov.csv')

We could also retrieve the data from a local file.

# rdf = u.getData('Hhpov.csv')
rdf.head()
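| | OBJECTID | CSA2010 | hhchpov14 | hhchpov15 | hhchpov16 | hhchpov17 | hhchpov18 | hhchpov19 | CSA2020 | hhchpov20 | hhchpov21 | Shape__Area | Shape__Length | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Allendale/Irving... | 41.55 | 38.93 | 34.73 | 32.77 | 35.27 | 32.60 | Allendale/Irving... | 21.42 | 21.42 | 6.38e+07 | 38770.17 | POLYGON ((-76.65... |
| 1 | 2 | Beechfield/Ten H... | 22.31 | 19.42 | 21.22 | 23.92 | 21.90 | 15.38 | Beechfield/Ten H... | 14.77 | 14.77 | 4.79e+07 | 37524.95 | POLYGON ((-76.69... |
| 2 | 3 | Belair-Edison | 36.93 | 36.88 | 36.13 | 34.56 | 39.74 | 41.04 | Belair-Edison | 31.76 | 31.76 | 4.50e+07 | 31307.31 | POLYGON ((-76.56... |
| 3 | 4 | Brooklyn/Curtis ... | 46.94 | 45.01 | 46.45 | 46.41 | 39.89 | 41.39 | Brooklyn/Curtis ... | 51.32 | 51.32 | 1.76e+08 | 150987.70 | MULTIPOLYGON (((... |
| 4 | 5 | Canton | 6.52 | 5.49 | 2.99 | 4.02 | 4.61 | 4.83 | Canton | 4.13 | 4.13 | 1.54e+07 | 23338.61 | POLYGON ((-76.57... |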