A free book covering Python data science with notebooks may be found here. It uses Jupyter Notebook, which Google Colab is built on.

Information for this section was pulled from a variety of resources. Click on the links to learn more!

The Colab Environment

Before we get into gritty details, take a moment to explore the Colab environment.

Setup & Configuration:

  1. Begin by visiting https://colab.research.google.com

  2. Click 'NEW PYTHON 3 NOTEBOOK'

  3. For the most part, that is all it takes!

  4. Many modules are already pre-installed on the virtual environment.

The following articles can help get you started. Excerpts have been selected and shown in block quotes.

Welcome to Colaboratory

Source

Colaboratory, or "Colab" for short, allows you to write and execute Python in your browser, with:

  • Zero configuration required

  • Free access to GPUs

  • Easy sharing

The document you are reading is not a static web page, but an interactive environment called a Colab notebook that lets you write and execute code.

To execute the code... use the keyboard shortcut "Command/Ctrl+Enter".

# The hash symbol at the front of this line means it's a comment.
# Comments show up in green and will not be interpreted upon code execution.
# In this example, we will perform a simple computation to see its output below.

1 + 1
# Notice how the output is now stored in the 'madeUpVariable' and 'evenMoreMadeUpVariable' variables and will not be shown below.
madeUpVariable = 1 + 1
evenMoreMadeUpVariable = 13.5
# Variable values persist across blocks (both above and below), so always make sure your variables use the correct values!

# Take note of the following. Output is hidden unless it is either placed on the last line or wrapped in a 'print' function, like so.
print(evenMoreMadeUpVariable)

# Though the result of this line will not show, the program will still run it.
evenMoreMadeUpVariable * evenMoreMadeUpVariable

# This will show up since it is on the last line.
madeUpVariable
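
The display rule described in the comments above can be imitated in plain Python. This is only a rough sketch, not Colab's actual implementation: the helper name run_cell is hypothetical, and it relies on the standard ast module to run every statement in a cell and hand back the value of the final line only when that line is a bare expression.

```python
import ast

def run_cell(source, namespace):
    """Run a cell's source; return the value of its last line if that line is a bare expression."""
    tree = ast.parse(source)
    last_value = None
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        # Split off the final expression so it can be evaluated on its own.
        last_expr = ast.Expression(tree.body.pop().value)
        exec(compile(tree, "<cell>", "exec"), namespace)
        last_value = eval(compile(last_expr, "<cell>", "eval"), namespace)
    else:
        # The cell ends with a statement (e.g. an assignment), so nothing is displayed.
        exec(compile(tree, "<cell>", "exec"), namespace)
    return last_value

shared = {}  # plays the role of the variables that persist across cells
first = run_cell("madeUpVariable = 1 + 1", shared)
second = run_cell("madeUpVariable * 2\nmadeUpVariable", shared)
print(first)
print(second)
```

The first cell ends in an assignment, so nothing comes back; the second cell computes a product that is discarded and then returns the value on its last line.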

Colab notebooks allow you to combine markup, executable code, and text into a single document, along with images, HTML, LaTeX and more. When you create your own Colab notebooks, they are stored in your Google Drive account. You can easily share your Colab notebooks with co-workers or friends, allowing them to comment on your notebooks or even edit them. To learn more, see Overview of Colab. To create a new Colab notebook you can use the File menu above, or use the following link: create a new Colab notebook.

Colab notebooks are Jupyter notebooks that are hosted by Colab. To learn more about the Jupyter project, see jupyter.org.

All blockquotes in the section above were pulled from the header's link.

Colab Menu Bar

Everything you need can be found in your menu bar.

Follow the brief outline below:

Colabs Menu

File (accessible on the left hand drawer)

  • Locate in drive

  • New, Open, Upload, Save, Download

  • Save to Github or Drive

Edit

  • Undo

  • Select all, Cut, Copy, Paste, Delete

  • Find, Replace

  • Show/Hide all code

  • Clear all code outputs

View

  • Table of Contents (accessible on the left hand drawer)

  • Executed Code History

  • Diff Notebooks

  • Collapse Sections

Insert

  • Code/Text Cell

  • Section Header

  • Code Snippet (accessible on the left hand drawer)

Runtime

  • Run - This action can be used to execute all cells, or at least anything before, after, or in a selected cell.

  • Interrupt Execution - Just in case the code is caught in an infinite loop or is hanging.

  • Restart (and optionally re-run all) - Installed modules are kept but must be re-imported.

  • Factory reset runtime - Must re-install all modules

Tools

  • Command Palette - Clickable menu of shortcuts

  • Settings

    • Site - Set theming

    • Editor - Set indentation, fontsize, line width

    • Misc - Enable 'Corgi' and/or 'Kitty' mode.

  • Keyboard Shortcuts

Help

Overview of Colaboratory Features

Features in the header link's article are accessible from the Menu Bar.

Colaboratory "magics" are shorthand annotations that change how a cell's text is executed.

Much more on this is covered below. For now, observe what you can do with it:

Here, we use a Python magic in the first line of this code block to have the remaining lines display as HTML.

With magics, you can execute terminal commands straight from a code block!

Preface your terminal command with a ! so the interpreter knows the text is not Python.

!ls
sample_data/
  • Warning: Use cd or %cd to change directories; !cd will not work as expected.

Each ! command runs in its own temporary subshell, which means a change directory command won't persist.

!cd sample_data
ls
sample_data/

Unless you use %cd:

%cd sample_data/
/content/sample_data
ls
anscombe.json* mnist_test.csv california_housing_test.csv mnist_train_small.csv california_housing_train.csv README.md*
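
The difference between the two commands comes down to which process changes directory. The sketch below illustrates the idea with the standard library rather than notebook syntax: a child process stands in for the ! command, and os.chdir stands in for the cd magic. The temporary directory is created only for this demonstration.

```python
import os
import subprocess
import sys
import tempfile

start_dir = os.getcwd()

with tempfile.TemporaryDirectory() as target_dir:
    # Like a "!" command: the directory change happens inside a separate child process.
    child_code = "import os, sys; os.chdir(sys.argv[1])"
    subprocess.run([sys.executable, "-c", child_code, target_dir], check=True)
    dir_after_child = os.getcwd()  # unchanged: the child's change ended with the child

    # Like the cd magic: the change is made in the current process, so it sticks.
    os.chdir(target_dir)
    persisted = os.path.realpath(os.getcwd()) == os.path.realpath(target_dir)

    os.chdir(start_dir)  # step back out before the temporary directory is removed

print(dir_after_child == start_dir)
print(persisted)
```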

Python variables can take output from a terminal command.

pythonVariable = !cat README.md
pythonVariable[0]
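
The assignment syntax above only works inside a notebook. Outside of one, a comparable result can be had with the standard subprocess module. The sketch below is an approximation under that assumption; it launches a second Python process (so it does not depend on any particular shell command being available) and splits the captured text into one entry per line.

```python
import subprocess
import sys

# Run a child process and capture what it prints instead of showing it.
completed = subprocess.run(
    [sys.executable, "-c", "print('first line'); print('second line')"],
    capture_output=True,
    text=True,
    check=True,
)

# One list entry per line of output, similar in spirit to the notebook shorthand.
output_lines = completed.stdout.splitlines()
print(output_lines[0])
```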

Terminal commands can take Python variables by wrapping them in curly braces, as in {pythonVariable}.

!echo {pythonVariable[0]}
cd ../
/content
ls
sample_data/

The output of a terminal command can even be stored in a Python variable!

cow = !ls
cow

To change directories, use %cd ./filepath/

More Tricks

Other advanced code tricks include the following:

  • Hosting notebooks online using GitHub and myBinder.

  • Notebooks can also be collaboratively edited by sharing a link on Google Drive.

  • Colabs can connect to and run on your local machine.

Markup

In computer text processing, a markup language is a system for annotating a document in a way that is syntactically distinguishable from the text, meaning when the document is processed for display, the markup language is not shown, and is only used to format the text. - Wikipedia

Markdown Guide

A) Markdown is a lightweight markup language used to turn plain text into rich text.

B) Text cells (not code-cells) in Google Colab will automatically understand Markdown and display it appropriately.

C) Within Colabs, many HTML elements can readily be rendered within Markdown cells like the enriched text in this sentence.

  • This is not a given on other Markdown viewers and can be prevented by encapsulating the HTML <u>with backticks</u>.

Badges

Badges are (typically) action-enabled icons used to draw the reader's attention. These are often displayed using HTML or Markdown.

Pick a template and create your own badge from shields.io to get started!

More on Markdown:

Flags: Magics and Comments

A. 'Flags' are a special form of shorthand annotation that change how code blocks are executed.

B. These annotations augment the interpreter's handling of a cell or line.

C. Flags are placed on the first line or on a per-line basis, depending on intent.

D. There are two types of flags: comments and magics.

  1. Magics are often identified by two %'s at the top of a cell (or a single % at the start of a line), followed by the intended magical effect.

  2. Comments use a single # and are less favored since the # symbol is already overloaded.

  • Under normal circumstances, a # will preface a numeral; what's more,

  • Markdown uses #'s to denote a header element.

Common Uses:

A) Create section titles from within a codeblock using #@title <TITLENAME>

B) Suppress cell output using %%capture.

C) Execute terminal commands in a cell by prefacing them with the ! prefix.

D) Comment-ify a line in your code using the # prefix.

E) Render the cell as %%html or %%javascript or a single line with #@markdown.

F) Create input forms by placing the comment flag #@param {type:"DATA-TYPE"} at the end of a variable declaration.
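
For comparison, output suppression (item B) can be approximated in plain Python without any magics. The sketch below is not how the capture magic is implemented; it simply uses contextlib.redirect_stdout from the standard library to collect printed text in a buffer instead of displaying it.

```python
import contextlib
import io

buffer = io.StringIO()

# Anything printed inside this block goes into the buffer, not onto the screen.
with contextlib.redirect_stdout(buffer):
    print("this line is captured instead of displayed")

captured_text = buffer.getvalue()
```

The captured text stays available in captured_text, so it can still be inspected or logged later if needed.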

The Python Environment

Now that we understand a bit more about Colab, we can address the following questions.

What is Python?

From the Docs

(emphasis my own)

Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components together. Python's simple, easy to learn syntax emphasizes readability and therefore reduces the cost of program maintenance. Python supports modules and packages, which encourages program modularity and code reuse. The Python interpreter and the extensive standard library are available in source or binary form without charge for all major platforms, and can be freely distributed.

Often, programmers fall in love with Python because of the increased productivity it provides. Since there is no compilation step, the edit-test-debug cycle is incredibly fast. Debugging Python programs is easy: a bug or bad input will never cause a segmentation fault. Instead, when the interpreter discovers an error, it raises an exception. When the program doesn't catch the exception, the interpreter prints a stack trace. A source level debugger allows inspection of local and global variables, evaluation of arbitrary expressions, setting breakpoints, stepping through the code a line at a time, and so on. The debugger is written in Python itself, testifying to Python's introspective power. On the other hand, often the quickest way to debug a program is to add a few print statements to the source: the fast edit-test-debug cycle makes this simple approach very effective.

What makes Python high-level?

Python is high-level because it hides the machine's details: you do not write assembly or a series of ones and zeroes, and memory management is automatic.

What makes Python Object-Oriented

Basically, everything in Python is an object?! We will get back to this later. But for now, here's a peek.
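
As a small illustration of that claim, the sketch below checks a handful of values, including functions and a type, against the base object class. The function name shout and the attribute added to it are made up for the example.

```python
def shout(text):
    """Return the text in upper case with an exclamation mark."""
    return text.upper() + "!"

# Numbers, strings, functions, built-ins and even types are all objects.
things = [42, "colab", 13.5, shout, len, int]
for thing in things:
    print(type(thing).__name__, isinstance(thing, object))

# Because a function is an object, it can carry attributes and be stored in a variable.
shout.description = "adds emphasis"
alias = shout

# Even a plain integer has methods of its own.
bits_needed = (7).bit_length()
```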

More information on JSON:

What makes Python interpreted?

  1. Machines run on machine code, and Python needs some way to be translated into instructions the machine can run.

  2. When you execute a line of Python code, the interpreter translates (compiles) it and runs it in real time.

  3. While every language must ultimately be translated for the machine, this translation during code execution, rather than in a separate build step beforehand, is why Python is called an interpreted as opposed to a compiled language.
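
The steps above can be observed from within Python itself. The sketch below assumes CPython, the standard interpreter, where source text is first compiled into bytecode that the interpreter then executes. The exact instructions printed by dis vary between Python versions, so none are listed here.

```python
import dis

source = "1 + 1"

# Step 1: translate the source text into a code object holding bytecode.
code_object = compile(source, "<example>", "eval")

# Step 2: the interpreter executes that bytecode right away.
result = eval(code_object)

# Optional: list the bytecode instructions the interpreter works through.
dis.dis(code_object)
print(result)
```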

'Installing python' is really just the process of installing an interpreter.

  • Colab comes with a built-in interpreter that executes each cell you run.

  • Use this guide to learn more about local installation.

Python files can be imported for use in other scripts or run directly using a Python terminal command.

  • python ./path/to/file/nameOfFile.py

What is the difference between Python 2 and Python 3?

The difference should not matter!

It used to, but Python 2 is now deprecated (it reached end of life in January 2020). Everyone should be using Python 3.

If an older computer came with Python built-in, chances are it came with Python 2. Finagling with two versions of Python can be a pain since they use different syntax.

With Colabs, this is simply not a problem because every session starts as a brand new virtualized environment.

What are modules?

A module is a Python object with arbitrarily named attributes that you can bind and reference. Simply, a module is a file consisting of Python code. A module can define functions, classes and variables. A module can also include runnable code.... You can use any Python source file as a module by executing an import statement in some other Python source file. - TutorialsPoint

'Package' is a term often used to describe a suite of modules.
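
To make the idea of a module concrete, the sketch below writes a tiny Python file into a temporary folder and then imports it. The file name greeting_tools.py and everything inside it are invented for this example; any Python source file would behave the same way.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    # Any .py file can act as a module; this one defines a variable and a function.
    Path(folder, "greeting_tools.py").write_text(
        "DEFAULT_NAME = 'Colab'\n"
        "def greet(name=DEFAULT_NAME):\n"
        "    return 'Hello, ' + name + '!'\n"
    )
    sys.path.insert(0, folder)  # let the import system search this folder
    try:
        greeting_tools = importlib.import_module("greeting_tools")
    finally:
        sys.path.remove(folder)

# The module's contents are reached through its name.
message = greeting_tools.greet()
print(message)
```

In everyday code the same import would simply be written as import greeting_tools; importlib is used here only because the file is created on the fly.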

What are PIP and PyPI?

PIP is a de facto standard package-management system used to install and manage software packages written in Python. Many packages can be found in the default source for packages and their dependencies — Python Package Index (PyPI). Most distributions of Python come with PIP preinstalled. Python 2.7.9 and later (on the Python 2 series), and Python 3.4 and later include PIP (PIP3 for Python 3) by default. - Wikipedia

If you find Python code you like on GitHub, see if it can be found on PyPI.

If so, type pip install package into the terminal to install the module.

Once installed, you can now 'import' the package in your Python code.

For more information on PIP, check out this cool guide

To import a library that is not in Colaboratory by default, you can use !pip install or !apt-get install. - Snippets: Importing Libraries
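
Before installing anything, it can be handy to check whether a module is already available. The helper below is a hypothetical convenience (the name is_installed is not part of pip or Colab) built on importlib.util.find_spec from the standard library. Note that the name used in an import statement is not always the same as the package name on PyPI.

```python
import importlib.util

def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None

print(is_installed("json"))  # part of the standard library
print(is_installed("a_module_name_that_should_not_exist"))
```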

Pandas

Colab comes with pandas pre-installed, but elsewhere it can be installed using 'pip install pandas'.

!pip install pandas

Congratulations! You've installed Pandas.

# Now that pandas has been installed on the virtual environment, import it as a module into your code's memory!
# This looks a bit redundant, but in this instance, we are assigning the pandas module to the variable 'pd'.
import pandas as pd

Pandas provides tools for data analysis. As an example, let's import some JSON data!

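# To use the pandas module, we refer to it by its namespace.
# In this example, we use the pandas 'read_json' function to prepare our JSON for data play.
# Recent pandas versions expect a file-like object rather than a raw JSON string, so the text is wrapped in StringIO.
from io import StringIO
pd.read_json(StringIO('{"row 1":{"col 1":"a","col 2":"b"},"row 2":{"col 1":"c","col 2":"d"}}'), orient='index')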
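
To see what orient='index' expects, it may help to unpack the same JSON text by hand. The sketch below uses only the standard json module and is not how pandas does it internally; it just shows that the outer keys become row labels and the inner keys become column labels.

```python
import json

raw = '{"row 1":{"col 1":"a","col 2":"b"},"row 2":{"col 1":"c","col 2":"d"}}'
records = json.loads(raw)

row_labels = list(records)              # outer keys become the row labels
column_labels = list(records["row 1"])  # inner keys become the column labels

# Rebuild the table as a list of rows, in label order.
table = [[records[row][col] for col in column_labels] for row in row_labels]
print(row_labels, column_labels, table)
```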

You can do awesome things with data when it is being interpreted as a 'dataframe'. Take a look!

from io import StringIO
newlyCreatedDataframeVariable = pd.read_json(StringIO('{"row 1":{"col 1":"a","col 2":"b"},"row 2":{"col 1":"c","col 2":"d"}}'), orient='index')
# Show the first rows (head() returns up to five by default)
newlyCreatedDataframeVariable.head()
# Make a copy of the dataset
variable2 = newlyCreatedDataframeVariable.copy()
# Show the last two rows
variable2.tail(2)
# Select a single column
variable2['col 1']
# This would save the file as a CSV onto wherever the virtual environment is mounted.
# This may be the temporary mount-point, or Google Drive / local hard drive.
# 'variable2.csv' is a placeholder file name; without a path, to_csv returns the CSV text instead of saving it.
#variable2.to_csv('variable2.csv', index=False)

Pandas works with a bunch of great utilities like Dexplot and Geopandas for enhanced visualizations.

A more thorough introduction to pandas on colabs can be found here.

Learning Objectives:

  • Gain an introduction to the DataFrame and Series data structures of the pandas library

  • Access and manipulate data within a DataFrame and Series

  • Import CSV data into a pandas DataFrame

  • Reindex a DataFrame to shuffle data

Be sure to take a look at its online library, provided to help you along the way!

External Data

The simplest way to access your data is by mounting Google Drive to your virtual environment.

# Run this.
# Click the link that shows itself.
# Give permission.
# If an authorization code is shown, copy it and paste it back here.
from google.colab import drive
drive.mount('/content/drive')
cd ./drive/'My Drive'/colabs/DATA
ls

You can store a user's input as a value, like so:

left_on = input("Left on: " )

A neat trick to get form values can be done like this:

#@title Example form fields
#@markdown Forms support many types of fields.

filename = 'concrete.csv'  #@param
displayColumn = 'Cement'  #@param {type: "string"}
multiplyer2 = 100  #@param {type: "slider", min: 100, max: 200}
multiplyer1 = 102  #@param {type: "number"}
variable5 = '2010-11-05'  #@param {type: "date"}
variable6 = "monday"  #@param ['monday', 'tuesday', 'wednesday', 'thursday']
displayColumn2 = "Strength" #@param ["Strength", "bananas", "oranges"] {allow-input: true}
#@markdown ---

Just be sure to re-run the cell block to update the variable values.
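
The reason a re-run is needed is that the form annotations are ordinary comments: the form edits the literal value in the source, and the variable only changes once the line is executed again. The sketch below demonstrates this outside Colab using the standard tokenize module on two lines borrowed from the form above.

```python
import io
import tokenize

cell_source = (
    'multiplyer2 = 100  #@param {type: "slider", min: 100, max: 200}\n'
    'multiplyer1 = 102  #@param {type: "number"}\n'
)

# To Python, each annotation is just a comment token that gets ignored.
tokens = tokenize.generate_tokens(io.StringIO(cell_source).readline)
annotations = [tok.string for tok in tokens if tok.type == tokenize.COMMENT]

# Executing the source assigns the literal values, exactly as if no form existed.
namespace = {}
exec(cell_source, namespace)
print(annotations)
print(namespace["multiplyer2"], namespace["multiplyer1"])
```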

concreteDataframe = pd.read_csv(filename)
concreteDataframe.head()
concreteDataframe['NewAttribute'] = (concreteDataframe[displayColumn].head() * multiplyer2) - (concreteDataframe[displayColumn].head() * multiplyer1)
concreteDataframe.head()

Putting it Together

dataguide is a package I am working on to help work with data. It provides tools and tutorials for data manipulation.

With this package, you can download ACS data with relative ease.

! pip install dataguide geopandas
from dataguide.acsDownload import retrieve_acs_data

# Define our download parameters.
# More information on these parameters can be found in the tutorials!
tract = '*'
county = '510'
state = '24'
tableId = 'B19001'
year = '17'
saveAcs = False

retrieve_acs_data(state, county, tract, tableId, year, saveAcs).head(2)
# Get the second dataset.
# Our example dataset contains Polygon Geometry information.
# We want to merge this over to our principal dataset. We will grab it by matching on either CSA or Tract.

# The url listed below is public.

print('Crosswalk Example: https://docs.google.com/spreadsheets/d/e/2PACX-1vREwwa_s8Ix39OYGnnS_wA8flOoEkU7reIV4o3ZhlwYhLXhpNEvnOia_uHUDBvnFptkLLHHlaQNvsQE/pub?output=csv')
print('Boundaries Example: https://docs.google.com/spreadsheets/d/e/2PACX-1vQ8xXdUaT17jkdK0MWTJpg3GOy6jMWeaXTlguXNjCSb8Vr_FanSZQRaTU-m811fQz4kyMFK5wcahMNY/pub?gid=886223646&single=true&output=csv')

inFile = input("\n Please enter the location of your file : \n" )

crosswalk = pd.read_csv( inFile )
crosswalk.head()
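# mergeDatasets is called by name below, so it is imported directly.
# This assumes mergeDatasets is defined in the dataguide.mergeData module.
from dataguide.mergeData import mergeDatasets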
# Table: FDIC Baltimore Banks
# Columns: Bank Name, Address(es), Census Tract
left_ds = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vSHFrRSHva1f82ZQ7Uxwf3A1phqljj1oa2duGlZDM1vLtrm1GI5yHmpVX2ilTfMHQ/pub?gid=601362340&single=true&output=csv'
left_col = 'Census Tract'

# Table: Crosswalk Census Communities
# 'TRACT2010', 'GEOID2010', 'CSA2010'
right_ds = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vREwwa_s8Ix39OYGnnS_wA8flOoEkU7reIV4o3ZhlwYhLXhpNEvnOia_uHUDBvnFptkLLHHlaQNvsQE/pub?output=csv'
right_col='TRACT2010'

merge_how = 'outer'
interactive = True
use_crosswalk = True

merged_df = mergeDatasets( left_ds=left_ds, left_col=left_col,
              right_ds=right_ds, right_col=right_col,
              merge_how='left', interactive =True, use_crosswalk=use_crosswalk )
dir(mergeDatasets)
mergeDatasets
dir(retrieve_acs_data)
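
The merge_how setting used above chooses between join styles such as 'left' and 'outer'. The sketch below is not the dataguide implementation (its internals are not shown here); it is a simplified pure-Python stand-in, with a hypothetical merge_rows helper and made-up rows, to show how the two styles differ when a key appears in only one table.

```python
# Made-up rows for illustration only; the tract numbers and names are not real data.
banks = [
    {"Bank Name": "Bank A", "Census Tract": 101},
    {"Bank Name": "Bank B", "Census Tract": 999},
]
crosswalk_rows = [
    {"TRACT2010": 101, "CSA2010": "Community X"},
    {"TRACT2010": 202, "CSA2010": "Community Y"},
]

def merge_rows(left, right, left_col, right_col, how="left"):
    """Join two lists of dicts on one column each; supports how='left' or 'outer'.

    Assumes every key appears at most once in the right-hand table.
    """
    right_by_key = {row[right_col]: row for row in right}
    merged, matched_keys = [], set()
    for row in left:
        match = right_by_key.get(row[left_col], {})
        if match:
            matched_keys.add(row[left_col])
        merged.append({**row, **match})  # unmatched left rows are kept as they are
    if how == "outer":
        # An outer join also keeps right-hand rows that matched nothing on the left.
        merged += [row for key, row in right_by_key.items() if key not in matched_keys]
    return merged

left_result = merge_rows(banks, crosswalk_rows, "Census Tract", "TRACT2010", how="left")
outer_result = merge_rows(banks, crosswalk_rows, "Census Tract", "TRACT2010", how="outer")
print(len(left_result), len(outer_result))
```

A left join keeps every bank and attaches community information where a tract matches, while an outer join additionally keeps crosswalk rows that no bank refers to.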