
Welcome

This notebook has functions to help handle common tasks

Functions

Common Python Data Manipulations

In pandas, isna() and isnull() are aliases and behave identically: https://datascience.stackexchange.com/questions/37878/difference-between-isna-and-isnull-in-pandas


  1. Check the data types of all columns in the data-frame
  2. Create a new data-frame excluding all columns of type 'object'
  3. Select the elements of each column whose Z score lies within 3 units of zero (that is, within 3 standard deviations of the column mean)
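The three steps above can be sketched as follows. This is a minimal example on made-up data; the column names and values are hypothetical, and keeping only rows where every numeric column passes the Z score test is one possible reading of step 3.

```python
import pandas as pd

# Hypothetical sample data: one text column and two numeric columns.
df = pd.DataFrame({
    "name": pd.Series([f"row{i}" for i in range(20)], dtype=object),
    "height": [170.0] * 19 + [400.0],            # the last value is an outlier
    "weight": [65.0 + i % 3 for i in range(20)],
})

# 1. Check the data types of all columns.
print(df.dtypes)

# 2. New data-frame without the 'object' columns.
numeric_df = df.select_dtypes(exclude=["object"])

# 3. Z score of every element, then keep rows where all columns have |z| < 3.
z_scores = (numeric_df - numeric_df.mean()) / numeric_df.std()
within_3 = (z_scores.abs() < 3).all(axis=1)
filtered = df[within_3]

print(len(df), "rows before,", len(filtered), "rows after")
```

To mask individual elements instead of dropping whole rows, numeric_df.where(z_scores.abs() < 3) leaves NaN in place of each out-of-range value.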

FILTERING

GROUPING/ Aggregating/ Manipulating

Common Python Cleaning operations:

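The biggest data cleaning task is handling missing values.

By default, Pandas recognizes empty cells and markers such as “NA”, “N/A”, “NaN” and “null” as missing values. Any other placeholder should be specified on import, for example through the na_values argument of read_csv.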
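A small sketch of declaring extra missing-value markers on import. The CSV content, the column names and the "--" and "missing" placeholders are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV with several ways of writing "no value".
csv_text = (
    "id,owner_occupied,num_bedrooms\n"
    "1,Y,3\n"
    "2,--,NA\n"
    "3,N,\n"
    "4,missing,2\n"
)

# Empty cells and "NA" are detected by default;
# "--" and "missing" have to be declared through na_values.
df = pd.read_csv(io.StringIO(csv_text), na_values=["--", "missing"])

# Count the missing values per column.
print(df.isna().sum())
```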

In the code we’re looping through each entry in the “Owner Occupied” column. To test whether an entry is a number, we try int(row). If the value can be converted to an integer (a number is not a valid entry for this column), we change the entry to a missing value using NumPy’s np.nan. If it can’t be converted to an integer, we pass and keep going. The .loc method is the preferred Pandas method for modifying entries in place. https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.DataFrame.loc.html
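The loop described above is not reproduced in these notes, so the following is a reconstruction from the description rather than the original code. The sample values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "12" is a number hiding in a Y/N column.
df = pd.DataFrame({"Owner Occupied": ["Y", "N", "12", "Y", np.nan]})

for idx, row in df["Owner Occupied"].items():
    try:
        int(row)          # succeeds only for number-like entries
    except (ValueError, TypeError):
        continue          # not a number (or already missing): keep going
    # The entry converted cleanly, so treat it as a missing value.
    df.loc[idx, "Owner Occupied"] = np.nan

print(df)
```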

Read In Data

dashboards notes reduced

The functions that transform notebooks into a library

Basic Text

TODO :

FillNA = -1, avg

FillNA THEN Coerce
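A possible shape for these two TODO items, assuming "coerce" means casting to an integer dtype. The series name and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric column with gaps.
s = pd.Series([3.0, np.nan, 5.0, np.nan], name="num_bedrooms")

# Option 1: fill with a sentinel (-1), then coerce to int.
# The cast would fail if NaN values were still present.
filled_sentinel = s.fillna(-1).astype(int)

# Option 2: fill with the column average (NaN is skipped by mean()).
filled_mean = s.fillna(s.mean())

print(filled_sentinel.tolist())
print(filled_mean.tolist())
```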

Todo:

MISC

Import

Parse The DataTypes

NOTES

Plot Histograms

Basic ops

Categorical Analysis:

Count, Unique, Top, Frequency
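These four statistics are what pandas' describe() reports for object columns. A minimal sketch on a made-up column:

```python
import pandas as pd

# Hypothetical categorical column.
df = pd.DataFrame({
    "color": pd.Series(["red", "blue", "red", "green", "red"], dtype=object),
})

# For object columns, describe() returns count, unique, top and freq.
summary = df.describe(include=["object"])
print(summary)

# Full frequency table per category.
print(df["color"].value_counts())
```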

Numeric Analysis:

Geo

Future Self Service Tool

Data analytics

  1. Self Service
  2. Recurrent Reports
  3. Embedded Analytics.

GisHandler()

Main( Check For Missing Values, Perform Operation)

readFile() - csv/postgis -df -reverseGeocode? ColumnToCords? -Geodf

Geodataframe -toCrs, - saveGeoDataFrame

MergeBounds()

FilterBounds()

FilterPoints() Bounds Points

PointsInPoly()
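The points-in-polygon helper listed above is only a name in these notes. As a starting point, here is a dependency-free sketch using the ray casting test; the function names and signatures are hypothetical, and a real GisHandler would more likely rely on a geospatial library.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def points_in_poly(points, polygon):
    """Keep only the points that fall inside the polygon."""
    return [(x, y) for x, y in points if point_in_polygon(x, y, polygon)]


# Hypothetical square boundary and candidate points.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
candidates = [(2, 2), (5, 2), (1, 3), (-1, -1)]
print(points_in_poly(candidates, square))
```

Points lying exactly on the boundary are not handled consistently by this test, so treat edge cases with care.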

Applied Spatial Statistics

Applied Spatial Statistics -> Priors/Posteriors, MCMC, Kernel methods, dynamic state space modeling, multiple linear regression, multilevel models (causal inference, meta analysis), multi agent decision making, variable transformations, eigenvalues

Spatial models: CAR, SAR, Kriging

Time Series Models: ARM, ARMA, Dynamic linear models

Exploratory spatial analysis, spatial autocorrelation, spatial regression, interpolation, grid based stats, point based stats, spatial network analysis, spatial clustering.

Big-Data, Structure(Semi/Un), Time-Stamped, Spatial, Spatio-Temporal, Ordered, Stream, Dimensionality, Primary Keys, Unique Values, Index, Spatial, Auto Increment, Default Values, Null Values

Geographic Inquiry:

PySAL, the Python Spatial Analysis Library:

http://pysal.org/notebooks/explore/esda/Spatial_Autocorrelation_for_Areal_Unit_Data.html

https://pysal.org/notebooks/intro

Shape Analysis

  1. hull: calculate the convex hull of the point pattern
  2. mbr: calculate the minimum bounding box (rectangle)

The Python file centrography.py contains several functions with which we can conduct centrography analysis.
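To make the two shape measures concrete, here is a standalone sketch of a convex hull (Andrew's monotone chain) and a bounding rectangle. These are illustrative re-implementations, not the functions from centrography.py, and the bounding rectangle is assumed to be axis-aligned.

```python
def _cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Hull vertices in counterclockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def bounding_rectangle(points):
    """Axis-aligned bounding box as (min_x, min_y, max_x, max_y)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical point pattern: four corners plus one interior point.
pattern = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
print(convex_hull(pattern))
print(bounding_rectangle(pattern))
```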

Random point patterns are the outcome of complete spatial randomness (CSR). https://en.wikipedia.org/wiki/Complete_spatial_randomness

CSR has two major characteristics:

  1. Uniform: each location has an equal probability of getting a point (where an event happens)
  2. Independent: the locations of event points are independent of one another

CSR usually serves as the null hypothesis when testing whether a point pattern is the outcome of a random process.

There are two possible objectives in a discriminant analysis:
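The two CSR characteristics can be illustrated by simulating a pattern in a rectangular window. This is a sketch with a fixed number of points (CSR conditioned on the point count); the function name and window are hypothetical.

```python
import numpy as np


def simulate_csr(n_points, bounds, seed=None):
    """Draw n_points uniformly and independently inside a rectangular window.

    bounds is (min_x, min_y, max_x, max_y).
    """
    min_x, min_y, max_x, max_y = bounds
    rng = np.random.default_rng(seed)
    # Uniform: every location in the window is equally likely.
    # Independent: each coordinate is drawn without reference to the others.
    xs = rng.uniform(min_x, max_x, size=n_points)
    ys = rng.uniform(min_y, max_y, size=n_points)
    return np.column_stack([xs, ys])


points = simulate_csr(100, (0.0, 0.0, 10.0, 5.0), seed=42)
print(points[:5])
```

An observed pattern can then be compared against many such simulations to judge whether it is more clustered or more regular than a random process would produce.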

Misc

https://www.gnu.org/philosophy/open-source-misses-the-point.html

It seems to me that the chief difference between the MIT license and GPL is that the MIT doesn't require modifications be open sourced whereas the GPL does.

You don't have to open-source your changes if you're using GPL. You could modify it and use it for your own purpose as long as you're not distributing it

BUT...

if you DO distribute it, then the entire project that uses the GPL code must also be licensed under the GPL. That means its source must be made available, and the recipient gets all the same rights as you: they can turn around and distribute it, modify it, sell it, etc.

And that would include your proprietary code, which would then no longer be proprietary: it becomes open source.

The difference with MIT is that even if you distribute proprietary code that uses MIT-licensed code, you do not have to make your code open source. You can distribute it as a closed app where the code is encrypted or is a binary.

Even the included MIT-licensed code can be encrypted, as long as it carries the MIT license notice.
 

Clear indexdb -> readFile. Insert into IndexDB V.1.0