First determine whether a parameter is best expressed with a polynomial or a linear model. Concatenating the models creates a single multivariate equation.
- Polynomial equations are a special case of multivariate equations
https://automating-gis-processes.github.io/2018/notebooks/L4/reclassify.html
https://www.analyticsvidhya.com/blog/2015/12/complete-tutorial-time-series-modeling/
Topics: stationary series, random walks, rho coefficient, Dickey-Fuller test of stationarity
A series is stationary when:
- The mean of the series is not a function of time; it is a constant.
- The variance of the series is not a function of time.
- The covariance of the i-th term and the (i + m)-th term is not a function of time.
Now we vary the value of rho to see whether we can make the series stationary. Introduced coefficient, rho:
X(t) = Rho * X(t-1) + Er(t)
Exploring the data is the most important step in time series modeling: without it, you will not know whether a series is stationary.
Auto-Regressive Time Series Model
The current GDP of a country, x(t), depends on last year's GDP, x(t-1). Hence we can write the GDP equation as:
x(t) = alpha * x(t-1) + error(t)
This equation is known as the AR(1) formulation. The numeral one (1) denotes that the next instance depends solely on the previous instance. Alpha is a coefficient we seek so as to minimize the error function.
Moving Average Time Series Model
x(t) = beta * error(t-1) + error(t)
In the MA model, a noise shock vanishes quickly with time. In the AR model, the effect of a shock lasts much longer.
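The difference between how AR(1) and MA(1) series respond to a shock can be sketched with the standard library alone. The function names below are hypothetical, and the simulation assumes Gaussian noise with unit variance, which the notes do not specify.

```python
import random

def simulate_ar1(alpha, n, seed=0):
    """Simulate x(t) = alpha * x(t-1) + error(t), starting from x(0) = 0."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(alpha * x[-1] + rng.gauss(0, 1))
    return x

def simulate_ma1(beta, n, seed=0):
    """Simulate x(t) = beta * error(t-1) + error(t)."""
    rng = random.Random(seed)
    errors = [rng.gauss(0, 1) for _ in range(n + 1)]
    return [beta * errors[t] + errors[t + 1] for t in range(n)]

def impulse_response_ar1(alpha, steps):
    """Effect of one unit shock at t=0 on the next values of an AR(1) series."""
    return [alpha ** k for k in range(steps)]

def impulse_response_ma1(beta, steps):
    """Effect of one unit shock at t=0 on the next values of an MA(1) series."""
    return ([1.0, beta] + [0.0] * max(steps - 2, 0))[:steps]

print(impulse_response_ar1(0.5, 4))  # shock decays geometrically
print(impulse_response_ma1(0.5, 4))  # shock is gone after one step
```

In the AR(1) case the shock never fully disappears when alpha is nonzero, while in the MA(1) case it affects exactly two observations.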
There are three commonly used techniques to make a time series stationary:
- Detrending: Here we simply remove the trend component from the time series. For instance, if the equation of the time series is x(t) = (mean + trend * t) + error, we remove the part in parentheses and build a model for the rest.
- Differencing: This is the most commonly used technique to remove non-stationarity. Here we model the differences between terms rather than the terms themselves. For instance, x(t) - x(t-1) = ARMA(p, q). This differencing is the Integration part of AR(I)MA. We now have three parameters: p (AR), d (I), q (MA).
- Seasonality: Seasonality can be incorporated directly in the ARIMA model.
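The first two techniques above can be sketched in plain Python. This is a minimal illustration, not a replacement for a library such as statsmodels. The helper names are hypothetical, and the detrending step assumes the trend is a straight line fitted by least squares.

```python
def difference(series, lag=1):
    """Return x(t) - x(t - lag); lag=1 is ordinary first differencing."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def detrend_linear(series):
    """Subtract the least-squares line (mean + trend * t) from the series."""
    n = len(series)
    t_mean = (n - 1) / 2
    x_mean = sum(series) / n
    slope = (
        sum((t - t_mean) * (x - x_mean) for t, x in enumerate(series))
        / sum((t - t_mean) ** 2 for t in range(n))
    )
    intercept = x_mean - slope * t_mean
    return [x - (intercept + slope * t) for t, x in enumerate(series)]

# A series that is pure trend: x(t) = 2 + 0.5 * t
series = [2.0 + 0.5 * t for t in range(6)]
print(difference(series))      # constant differences, the trend is gone
print(detrend_linear(series))  # residuals are (numerically) zero
```

Passing a lag equal to the season length, for example `difference(monthly, lag=12)`, gives seasonal differencing.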
https://towardsdatascience.com/time-based-cross-validation-d259b13d42b8
https://towardsdatascience.com/detecting-stationarity-in-time-series-data-d29e0a21e638
https://towardsdatascience.com/how-to-predict-a-time-series-part-1-6d7eb182b540
classical forecasting methodology (ARIMA, exponential smoothing state space models, moving average, etc.)
https://nbviewer.jupyter.org/github/pmaji/data-science-toolkit/blob/master/time-series/forecasting_with_prophet.ipynb
https://www.analyticsvidhya.com/blog/2018/08/auto-arima-time-series-modeling-python-r/
Time Series:
https://towardsdatascience.com/time-series-forecasting-arima-models-7f221e9eee06
https://towardsdatascience.com/time-series-forecasting-with-prophet-54f2ac5e722e
https://towardsdatascience.com/an-end-to-end-project-on-time-series-analysis-and-forecasting-with-python-4835e6bf050b
https://towardsdatascience.com/forecasting-exchange-rates-using-arima-in-python-f032f313fc56
https://towardsdatascience.com/forecasting-with-prophet-d50bbfe95f91
https://towardsdatascience.com/exploring-the-sp500-with-r-part-2-asset-analysis-657d3c1caf60
https://towardsdatascience.com/financial-analytics-exploratory-data-analysis-of-stock-data-d98cbadf98b9
https://nbviewer.jupyter.org/github/changhiskhan/talks/blob/master/pydata2012/pandas_timeseries.ipynb
https://towardsdatascience.com/time-series-machine-learning-regression-framework-9ea33929009a
https://colab.research.google.com/notebooks/widgets.ipynb#scrollTo=QKk_E6-QRVPW
Applied Spatial Statistics
- Prior and posterior distributions
- Hierarchical models
- Markov Chain Monte Carlo
- Kernel methods
- Dynamic state space modeling
- Multiple linear regressions
- Spatial models (CAR, SAR), kriging
- Time series models: AR, ARMA, dynamic linear models
- multi-level models
- causal inference
- meta-analysis
- multi agent decision making
- variable transformations
- eigenvalues
Internally, PostGIS stores geometries in a binary format, but queries return them as hex-encoded strings. There are two popular variations of well-known binary (WKB): OGC WKB (ST_AsBinary) and the PostGIS extended form, EWKB (ST_AsEWKB), which also carries the SRID.
http://andrewgaidus.com/Build_Query_Spatial_Database/
https://gis.stackexchange.com/questions/89323/postgis-parse-geometry-wkb-with-ogr
Well-known binary (WKB) functions:
- ST_AsGeoJSON(geometry) returns text (a GeoJSON geometry object)
- ST_GeomFromWKB(bytea) returns geometry
- ST_AsBinary(geometry) returns bytea (OGC WKB, no SRID)
- ST_AsEWKB(geometry) returns bytea (EWKB, includes SRID)
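To see what the hex-encoded string contains, here is a rough standard-library sketch that decodes a 2D point from hex WKB or EWKB. It is only an illustration of the byte layout; the function name is hypothetical, it handles points only, and in practice a library such as shapely would do the parsing.

```python
import struct

EWKB_SRID_FLAG = 0x20000000  # EWKB sets this bit in the type word when an SRID follows
WKB_POINT = 1

def parse_point_hex(hex_wkb):
    """Decode a 2D point from hex WKB/EWKB. Returns (srid or None, x, y)."""
    data = bytes.fromhex(hex_wkb)
    order = "<" if data[0] == 1 else ">"  # first byte: 1 = little-endian, 0 = big-endian
    (geom_type,) = struct.unpack_from(order + "I", data, 1)
    offset = 5
    srid = None
    if geom_type & EWKB_SRID_FLAG:
        (srid,) = struct.unpack_from(order + "I", data, offset)
        offset += 4
    if geom_type & ~EWKB_SRID_FLAG != WKB_POINT:
        raise ValueError("this sketch only handles 2D points")
    x, y = struct.unpack_from(order + "dd", data, offset)
    return srid, x, y

# Build sample inputs with struct so the bytes are known to be well formed.
plain = struct.pack("<BIdd", 1, WKB_POINT, -76.61, 39.29).hex()
extended = struct.pack("<BIIdd", 1, WKB_POINT | EWKB_SRID_FLAG, 4326, -76.61, 39.29).hex()

print(parse_point_hex(plain))     # no SRID in plain WKB
print(parse_point_hex(extended))  # SRID 4326 carried in the EWKB header
```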
BEST:
- ST_AsText(ST_Transform(the_geom,4326))
Pandas can iterate over a query result in pieces when you pass the chunksize keyword argument, where chunksize is the number of rows in each chunk.
import pandas as pd
chunks = []  # conn is an open database connection
for chunk in pd.read_sql('select * from mdprop_2017v2', con=conn, chunksize=5000):
    chunks.append(chunk)
df = pd.concat(chunks, ignore_index=True)  # DataFrame.append was removed in pandas 2.0
import geopandas as gpd
from shapely import wkb
geoms = df['the_geom'].apply(lambda g: wkb.loads(g, hex=True))  # assumes the_geom holds hex-encoded WKB
gdf = gpd.GeoDataFrame(df, crs='EPSG:4326', geometry=geoms)
https://github.com/geopandas/geopandas/blob/master/geopandas/io/sql.py
http://geopandas.org/reference.html
EPSG:2248 is what we use internally at BNIA: http://www.spatialreference.org/ref/epsg/2248/
gdf = gdf.to_crs(epsg=4326)
ST_AsText(ST_Transform(the_geom,4326))
ANOVA stands for Analysis of Variance. It tests whether the mean of a numeric variable differs across the groups of a categorical variable. ANOVA returns two measures:
- F-test score: the ratio of the variation between group means to the variation within groups.
- p-value: the statistical significance of the result.
A large F-test score together with a small p-value indicates a strong association between the categorical variable and the other variable.
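The F-test score is simple enough to compute by hand. The following is a minimal sketch of the one-way ANOVA F statistic; the function name and the sample groups are made up for illustration. It does not compute the p-value, which needs the F distribution (in practice `scipy.stats.f_oneway` returns both).

```python
def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Three illustrative groups with means 2, 3 and 6
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(one_way_anova_f(groups))
```

Here the between-group sum of squares is 26 on 2 degrees of freedom and the within-group sum of squares is 6 on 6 degrees of freedom, so F is 13.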
The Granger causality test is a statistical hypothesis test for determining whether one time series is useful in forecasting another, first proposed in 1969.[1] Ordinarily, regressions reflect "mere" correlations, but Clive Granger argued that causality in economics could be tested for by measuring the ability to predict the future values of a time series using prior values of another time series. Since the question of "true causality" is deeply philosophical, and because of the post hoc ergo propter hoc fallacy of assuming that one thing preceding another can be used as a proof of causation, econometricians assert that the Granger test finds only "predictive causality".[2]
A time series X is said to Granger-cause Y if it can be shown, usually through a series of t-tests and F-tests on lagged values of X (and with lagged values of Y also included), that those X values provide statistically significant information about future values of Y.
https://dusk.geo.orst.edu/gis/lec11_12.pdf
6.2 Uncertainty in the conception of geographic phenomena
Many spatial objects are not well defined or their definition is to some extent arbitrary, so that people can reasonably disagree about whether a particular object is x or not. There are at least four types of conceptual uncertainty.
Spatial uncertainty
Spatial uncertainty occurs when objects do not have a discrete, well defined extent. They may have indistinct boundaries (where exactly does a wetland end?), they may have impacts that extend beyond their boundaries (should an oil spill be defined by the dispersion of pollutants or by the area of environmental damage?), or they may simply be statistical entities. The attributes ascribed to spatial objects may also be subjective. For example, the spatial distributions of poverty and biodiversity depend on human interpretations of what these things mean.
Vagueness
Vagueness occurs when the criteria that define an object as x are not explicit or rigorous. In a land cover analysis, how many oaks (or what proportion of oaks) must be found in a tract of land to qualify it as oak woodland? What incidence of crime (or resident criminals) defines a high crime neighborhood?
Ambiguity
Ambiguity occurs when y is used as a substitute, or indicator, for x because x is not available. The link between direct indicators and the phenomena for which they substitute is straightforward and fairly unambiguous. Soil nutrient levels (y) are a direct indicator of crop yields (x). Indirect indicators tend to be more ambiguous and opaque. Wetlands (y) are an indirect indicator of animal species diversity (x). Of course, indicators are not simply direct or indirect; they occupy a continuum. The more indirect they are, the greater the ambiguity and the less certain it is that an object being approximated using y really is x.
Physical measurement error
Instruments and procedures used to make physical measurements are not perfectly accurate. For example, a survey of Mount Everest might find its height to be 8,850 meters, with an accuracy of plus or minus 5 meters.
Digitizing error
A great deal of spatial data has been digitized from paper maps. Digitizing, or the electronic tracing of paper maps, is prone to human error.
Error caused by combining data sets with different lineages
Data sets produced by different agencies or vendors may not match because different processes were used to capture or automate the data. For example, buildings in one data set may appear on the opposite side of the street in another data set.
6.4 Uncertainty in the analysis of geographic phenomena
Spatial analysis methods can create further uncertainty.
The ecological fallacy
The ecological fallacy is the mistake of assuming that an overall characteristic of a zone is also a characteristic of any location or individual within the zone.
The Modifiable Areal Unit Problem (MAUP)
The results of data analysis are influenced by the number and sizes of the zones used to organize the data. The Modifiable Areal Unit Problem has at least three aspects:
1. The number, sizes, and shapes of zones affect the results of analysis.
2. The number of ways in which fine-scale zones can be aggregated into larger units is often great.
3. There are usually no objective criteria for choosing one zoning scheme over another.
An example of the influence of the number of zones on analysis is the 1950 study by Yule and Kendall, which found that the correlation between wheat and potato yields in England changed from low to high as the data were grouped into fewer and fewer zones (starting with 48 and ending with 2).
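The zoning effect can be explored with a small experiment. The sketch below uses synthetic data (a shared gradient plus independent noise), not the Yule and Kendall yields, and merges neighboring zones by averaging. The variable names, the noise level, and the seed are all assumptions made for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def aggregate(values, zone_size):
    """Average consecutive blocks of zone_size fine-scale zones.

    Assumes len(values) is a multiple of zone_size.
    """
    return [sum(values[i:i + zone_size]) / zone_size
            for i in range(0, len(values), zone_size)]

rng = random.Random(42)
gradient = [i / 48 for i in range(48)]              # signal shared by both variables
wheat = [g + rng.gauss(0, 0.5) for g in gradient]   # synthetic, not real yields
potato = [g + rng.gauss(0, 0.5) for g in gradient]

for zone_size in (1, 2, 4, 8, 16):
    w, p = aggregate(wheat, zone_size), aggregate(potato, zone_size)
    print(len(w), "zones: r =", round(pearson(w, p), 3))
```

Running it with different seeds or zone sizes shows how much the reported correlation depends on the zoning scheme rather than on the underlying data alone.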
https://mgimond.github.io/Spatial/uncertainty-in-census-data.html
Many census datasets, such as the U.S. Census Bureau's American Community Survey (ACS) data, are based on surveys from small samples. This means that the variables provided by the Census Bureau are only estimates, with a level of uncertainty often provided as a margin of error (MoE) or a standard error (SE). Note that the Bureau's MoE encompasses a 90% confidence interval (i.e. there is a 90% chance that the MoE range covers the true value being estimated). This poses a challenge to both the visual exploration of the data and any statistical analysis of that data.
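Because the published MoE corresponds to a 90% confidence level, it can be converted to a standard error by dividing by the z-score 1.645. A small sketch follows; the function names and the sample figures are hypothetical.

```python
Z_90 = 1.645  # z-score for a 90% confidence level, as used for ACS margins of error
Z_95 = 1.96   # z-score for a 95% confidence level

def moe_to_se(moe_90):
    """Convert a 90% margin of error to a standard error."""
    return moe_90 / Z_90

def interval(estimate, moe):
    """Lower and upper bounds of the range estimate +/- MoE."""
    return estimate - moe, estimate + moe

def rescale_moe(moe_90, z=Z_95):
    """Re-express a 90% margin of error at another confidence level."""
    return moe_to_se(moe_90) * z

# Illustrative numbers only: a median income estimate of 50,000 with MoE 3,290
print(moe_to_se(3290))        # about 2000
print(interval(50000, 3290))  # (46710, 53290)
print(rescale_moe(3290))      # about 3920 at the 95% level
```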
7.2 Mapping uncertainty
One approach to mapping both estimates and SEs is to display both as side-by-side maps. While there is nothing inherently wrong in doing this, it can prove difficult to mentally process the two maps, particularly if the data consist of hundreds or thousands of small polygons. Another approach is to overlay the measure of uncertainty (SE or MoE) as a textured layer on top of the income layer. Or, one could map both the upper and lower ends of the MoE range side by side.
https://en.wikipedia.org/wiki/Time_geography
VS Considerations
Indicators
Population Change
Number of Children Attending Baltimore City Public Schools
Aggregated assessment scores for children in the neighborhood
Vacant building notices
Rehabilitation Permits
Demolition Permits
New Construction/ Certificates of Occupancy Permits
Real Estate Projects Under Development Review
Home sales prices and arm's-length transaction values (non-sales)
Owner-occupancy rates
311 calls for service rates for trash, street light outages and clogged storm drains.
911 calls for narcotics
Crime incidences (all part 1 crimes including gun related homicides)
Number of businesses and number of employees in the neighborhood
Normalized
Values vary across indicator
Disparity and Outliers exist
Favorable outcomes to Stable Neighborhoods
Negative outcomes in Stressed Neighborhoods
Hidden Variables exist impacting Neighborhoods
Identify characteristics regarding Neighborhood Trajectory
Extend existing infrastructure
Understand Dynamics
Track Progress of Indicators Over Time.
Monitor Trends
Develop Solutions
Measure Organizational/ Programmatic Outcomes
Track Quantitative measures across 7 years
Track 2 indicators at a time
Compare X Neighborhoods across Z indicators
Interactive (what part of this)
Relationships across indicators and neighborhoods
Cluster into Typology
Inform the BIW entrepreneurship strategies with respect to:
- existing trends
- opportunity
- needs
Inform the kinds of issues social entrepreneurs aim to address.
- Ground business strategies in cluster-types that grow to scale across typologies.
Quantitative Data
- Control group: subjects in a controlled experiment that do not receive the treatment
- Replication: repetition of an experiment under the same or similar conditions
- numerical description that summarizes data for an entire population
data maturity - people need the tools to explore the data before going further
GeoId = state + county + tract ( = blockgroups == tracts = blocklots = lots + blocks)
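For Census tracts, the GEOID is the concatenation of the 2-digit state FIPS code, the 3-digit county FIPS code, and the 6-digit tract code, each zero-padded. A minimal sketch, with hypothetical helper names and an illustrative tract code:

```python
def tract_geoid(state_fips, county_fips, tract_code):
    """Build an 11-character tract GEOID: state (2) + county (3) + tract (6)."""
    return f"{int(state_fips):02d}{int(county_fips):03d}{int(tract_code):06d}"

def split_tract_geoid(geoid):
    """Split an 11-character tract GEOID back into (state, county, tract)."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError("expected an 11-digit tract GEOID")
    return geoid[:2], geoid[2:5], geoid[5:]

# Maryland is state 24 and Baltimore City is county 510;
# the tract code 010100 is used here only as an example value.
geoid = tract_geoid(24, 510, 10100)
print(geoid)
print(split_tract_geoid(geoid))
```

Keeping GEOIDs as strings preserves leading zeros, which are lost if the column is read as an integer. Block group GEOIDs append one more digit (12 characters in total) and block GEOIDs append four digits to the tract GEOID (15 characters in total).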
Measures have multifaceted dimensions; each dimension can have levels of granularity that form a hierarchy.
Exploring data is less meaningful than acting on it, because P != NP: assessing all possible realities is often not possible, and we experience biases (like from mental models, anchoring, survivorship) that coalesce into unforeseeable black swan events.
Convexity of risk tolerance
classification and labeling
gestalt processes. small multiples. loops. psychophysics -> DNA Database
https://colah.github.io/posts/2015-09-Visual-Information/
Geospatials
Line or Polygon from a Collection of Point
Union, Difference, Distance
Intersects, Touches, Crosses, Within
using a function called .within() that checks if a point is within a polygon
using a function called .contains() that checks if a polygon contains a point
If objects intersect, the boundary and interior of one object need to intersect in any way with those of the other.
If an object touches another one, it is only necessary to have (at least) a single point of their boundaries in common, but their interiors should NOT intersect.
nearest_points()
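To make the idea behind a point-in-polygon check concrete, here is a plain-Python ray casting test. It is a stand-in for illustration, not how .within() or .contains() are implemented, and it does not handle points that lie exactly on the boundary or polygons with holes. The function name is hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray casting: count how many polygon edges a ray going right from the point crosses.

    polygon is a list of (x, y) vertices in order; an odd count means inside.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon((2, 2), square))  # True
print(point_in_polygon((5, 2), square))  # False
```

Asking whether the point is within the polygon and whether the polygon contains the point are the same question posed from opposite sides, which is why the two methods above mirror each other.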