Home Dataplay Download And Load Merge Data Map Basics Intake An... Map Correlation Netw... Timelapse Data Gifs Retrieve Acs Data Pivot Table Sync Data Food Cook Food Fish Food Science History Dad History History Life DateIdeas Life Words

Don't Look! I'm changing!

URL Copied

 About
 Retrieve Datasets
  Get The Principal Dataset.
  Get The Secondary Dataset
 Perform Merge & Save
 Interactive
 Advanced
  Function Explanation
  Function Diagrams
  Function Examples
   Interactive Example 1. Merge Esri Data
   Example 2 ) Get CSA And Geometry With A Crosswalk Using 3 Links
   Example 3: Ran Alone

BinderBinderBinderOpen Source Love svg3

NPM LicenseActivePython VersionsGitHub last commit

GitHub starsGitHub watchersGitHub forksGitHub followers

TweetTwitter Follow

About

Retrieve Datasets

Our example will merge two simple datasets; The first dataset is from the census bureau and the later dataset has spatial data and will match to the first using each respective datatsets' tract attribute.


The First dataset will be obtained from the Census' ACS 5-year serveys.


The Second dataset is from a publicly accessible link: CSA-to-Tract-2010 crosswalk

Get the Principal dataset.

We will use the function we created in the "ACS" article to download the data.

Our download function will use Baltimore City's tract, county and state as internal paramters.

Change these values in the cell below using different geographic reference codes will change those parameters.

B19001_001E_TotalB19001_002E_Total_Less_than_$10,000B19001_003E_Total_$10,000_to_$14,999B19001_004E_Total_$15,000_to_$19,999B19001_005E_Total_$20,000_to_$24,999B19001_006E_Total_$25,000_to_$29,999B19001_007E_Total_$30,000_to_$34,999B19001_008E_Total_$35,000_to_$39,999B19001_009E_Total_$40,000_to_$44,999B19001_010E_Total_$45,000_to_$49,999B19001_011E_Total_$50,000_to_$59,999B19001_012E_Total_$60,000_to_$74,999B19001_013E_Total_$75,000_to_$99,999B19001_014E_Total_$100,000_to_$124,999B19001_015E_Total_$125,000_to_$149,999B19001_016E_Total_$150,000_to_$199,999B19001_017E_Total_$200,000_or_morestatecountytract
NAME
Census Tract 2710.0215102097394971101199765361491681066644502724510271002
Census Tract 2604.02113414629738041914975811705716263116024510260402

Get the Secondary Dataset

Spatial data can be attained by using the 2010 Census Tract Shapefile Picking Tool or search their website for Tiger/Line Shapefiles

The core TIGER/Line Files and Shapefiles do not include demographic data, but they do contain geographic entity codes (GEOIDs) that can be linked to the Census Bureau’s demographic data, available on data.census.gov.-census.gov

For this example, we will simply pull a local dataset containing columns labeling tracts within Baltimore City and their corresponding CSA (Community Statistical Area). Typically, we use this dataset internally as a "crosswalk" where-upon a succesfull merge using the tract column, will be merged with a 3rd dataset along it's CSA column. Link: CSA-to-Tract-2010 crosswalk


Our Example dataset contains Polygon Geometry information we want to merge over to our principle dataset by matching on either CSA or Tract

TRACTCE10GEOID10CSA2010
01010024510010100Canton
11020024510010200Patterson Park N...

Perform Merge & Save

The following picture does nothing important but serves as a friendly reminder of the 4 basic join types.

The final function will perform this union wherever undefined and undefined match.

B19001_001E_TotalB19001_002E_Total_Less_than_$10,000B19001_003E_Total_$10,000_to_$14,999B19001_004E_Total_$15,000_to_$19,999B19001_005E_Total_$20,000_to_$24,999B19001_006E_Total_$25,000_to_$29,999B19001_007E_Total_$30,000_to_$34,999B19001_008E_Total_$35,000_to_$39,999B19001_009E_Total_$40,000_to_$44,999B19001_010E_Total_$45,000_to_$49,999...B19001_013E_Total_$75,000_to_$99,999B19001_014E_Total_$100,000_to_$124,999B19001_015E_Total_$125,000_to_$149,999B19001_016E_Total_$150,000_to_$199,999B19001_017E_Total_$200,000_or_morestatecountytractGEOID10CSA2010
01510209739497110119976536...10666445027245102710022.45e+10Greater Govans
111341462973804191497581...162631160245102604022.45e+10Claremont/Armistead

2 rows Γ— 22 columns

As you can see, our Census data will now have a CSA appended to it.

Interactive

B19001_001E_TotalB19001_002E_Total_Less_than_$10,000B19001_003E_Total_$10,000_to_$14,999B19001_004E_Total_$15,000_to_$19,999B19001_005E_Total_$20,000_to_$24,999B19001_006E_Total_$25,000_to_$29,999B19001_007E_Total_$30,000_to_$34,999B19001_008E_Total_$35,000_to_$39,999B19001_009E_Total_$40,000_to_$44,999B19001_010E_Total_$45,000_to_$49,999...B19001_013E_Total_$75,000_to_$99,999B19001_014E_Total_$100,000_to_$124,999B19001_015E_Total_$125,000_to_$149,999B19001_016E_Total_$150,000_to_$199,999B19001_017E_Total_$200,000_or_morestatecountyTRACTCE10GEOID10CSA2010
08081951148049768126700...3113015624510190100.02.45e+10Southwest Baltimore
16985869131323926192432...55534211924510190200.02.45e+10Southwest Baltimore

2 rows Γ— 22 columns

Advanced

Intro

The following Python function is a bulked out version of the previous notes.

Description: add columns of data from a foreign dataset into a primary dataset along set parameters.

Purpose: Makes Merging datasets simple

Services

Show Code

Function Explanation

Input(s):

  • Dataset url

  • Crosswalk Url

  • Right On

  • Left On

  • How

  • New Filename

Output: File

How it works:

  • Read in datasets

  • Perform Merge

  • If the 'how' parameter is equal to ['left', 'right', 'outer', 'inner']

    • then a merge will be performed.

  • If a column name is provided in the 'how' parameter

    • then that single column will be pulled from the right dataset as a new column in the left_ds.

Function Diagrams

Diagram the mergeDatasets()

mergeDatasets Flow Chart

Gannt Chart mergeDatasets()

Sequence Diagram mergeDatasets()

Function Examples

Interactive Example 1. Merge Esri Data

left_ds left_ds
OBJECTID_xCSA2010hhchpov14hhchpov15hhchpov16hhchpov17hhchpov18hhchpov19CSA2020_xhhchpov20...hhpov15hhpov16hhpov17hhpov18hhpov19CSA2020_yhhpov21Shape__Area_yShape__Length_ygeometry_y
01Allendale/Irving...41.5538.9334.7332.7735.2732.60Allendale/Irving...21.42...24.1521.2820.7023.019.18Allendale/Irving...14.176.38e+0738770.17POLYGON ((-76.65...
12Beechfield/Ten H...22.3119.4221.2223.9221.9015.38Beechfield/Ten H...14.77...11.1711.5910.4710.98.82Beechfield/Ten H...8.464.79e+0737524.95POLYGON ((-76.69...

2 rows Γ— 25 columns

Example 2 ) Get CSA and Geometry with a Crosswalk using 3 links

B19049_001E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_TotalB19049_002E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Householder_under_25_yearsB19049_003E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Householder_25_to_44_yearsB19049_004E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Householder_45_to_64_yearsB19049_005E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Householder_65_years_and_overstatecountytract
NAME
Census Tract 2710.0238358-66666666634219409723714324510271002
crosswalk_ds household poverty
B19049_001E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_TotalB19049_002E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Householder_under_25_yearsB19049_003E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Householder_25_to_44_yearsB19049_004E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Householder_45_to_64_yearsB19049_005E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Householder_65_years_and_overstatecountytractCSA2010OBJECTIDhhpov15hhpov16hhpov17hhpov18hhpov19CSA2020hhpov21Shape__AreaShape__Lengthgeometry
038358-66666666634219409723714324510271002Greater Govans20.021.3219.2719.5317.9920.5Greater Govans17.192.27e+0722982.13POLYGON ((-76.59...
144904-6666666665132442083372692451090100Greater Govans20.021.3219.2719.5317.9920.5Greater Govans17.192.27e+0722982.13POLYGON ((-76.59...

Example 3: Ran Alone

False False
False False passed ---Handling Left Dataset Options--- No data was retrieved. False Left column: False