Machine learning on geodata

'Sensors, sensors everywhere!' That seems to be where the future is heading, and many of those sensors attach location information to their readings. This means there is plenty of geolocation data we can use to make better decisions. But sensor data is messy and incomplete. In this masterclass, you will learn how to work with geodata and how to run machine learning algorithms on it. Geolocation data also often arrives as streams, which we cover in the streams course; we encourage you to take that course if you are interested in the topic.

Data munging techniques for geospatial data

  • Basics of geospatial objects: points, segments, polygons, and multipolygons; projections
  • Relations between geospatial objects: intersects, contains, distance
  • Setting up a PostGIS database with geospatial data
  • Installing PostgreSQL and PostGIS, enabling the PostGIS extension, and creating a spatial database
  • Loading spatial data from shapefiles
  • Inspection of geospatial data
  • Installation and basics of the IPython notebook
  • Putting geospatial data on the web
  • Using Leaflet for easy map visualization
  • Accessing PostGIS data with SQL queries from Python
  • Displaying a variety of geodata on maps, both in the IPython notebook and on custom HTML pages
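To make the object relations and projections above more concrete, here is a minimal, dependency-free Python sketch. In the course these operations would normally run inside PostGIS (for example with ST_Contains or ST_Distance); the helper names below (haversine_m, equirectangular_m, contains) are hypothetical, the Earth is assumed to be a sphere with mean radius 6,371,008.8 m, and the point-in-polygon test is a plain planar ray-casting check, not PostGIS's implementation (it does not treat boundary points specially).

```python
import math

# Assumption: spherical Earth with the mean radius in metres.
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def equirectangular_m(lon, lat, lon0, lat0):
    """Project (lon, lat) to local planar x/y metres around a reference point (lon0, lat0).
    A simple approximation that is only reasonable over small areas."""
    x = math.radians(lon - lon0) * math.cos(math.radians(lat0)) * EARTH_RADIUS_M
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y


def contains(polygon, point):
    """Planar ray-casting point-in-polygon test.
    polygon: list of (x, y) vertices (not repeated at the end); point: (x, y)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does the horizontal ray from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```

For example, haversine_m(0, 60, 1, 60) comes out at roughly half of haversine_m(0, 0, 0, 1): at 60° latitude a degree of longitude covers about half the ground distance of a degree of latitude. This is one reason raw longitude/latitude pairs should be projected before being fed to distance-based algorithms.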

Machine learning for geospatial data

  • Introduction to machine learning for geospatial data
  • Open problems in ML for geospatial data
  • Clustering of spatial data as a prime example of ML for geodata
  • Cluster analysis
  • Hierarchical clustering
  • Centroid clustering
  • Density clustering
  • Hands-on: K-means clustering
  • Implementing Lloyd's algorithm in python
  • How to determine K in K-means clustering: the gap statistic and the f(K) method
  • Improved seeding for clustering: K-means++
  • Applying K-means++, with selection of K, to the geodata from the first block and visualizing the results
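The hands-on steps above (Lloyd's algorithm plus K-means++ seeding) can be sketched in plain Python. This is a minimal illustration, not the course's reference implementation: it assumes the points are already in a planar, projected coordinate system (for example metres), because Euclidean distance on raw longitude/latitude is distorted, and the function names are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length coordinate tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_init(points, k, rng):
    """K-means++ seeding: the first centre is drawn uniformly at random; each further
    centre is drawn with probability proportional to its squared distance to the
    nearest centre chosen so far."""
    centres = [tuple(rng.choice(points))]
    while len(centres) < k:
        d2 = [min(sq_dist(p, c) for c in centres) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a centre; fall back to a uniform draw.
            centres.append(tuple(rng.choice(points)))
        else:
            centres.append(tuple(rng.choices(points, weights=d2, k=1)[0]))
    return centres


def assign(points, centres):
    """Assignment step: index of the nearest centre for every point."""
    return [min(range(len(centres)), key=lambda j: sq_dist(p, centres[j])) for p in points]


def lloyd(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm with K-means++ seeding.
    Returns (centres, labels, distortion), where distortion is the total
    within-cluster sum of squared distances."""
    rng = random.Random(seed)
    centres = kmeans_pp_init(points, k, rng)
    for _ in range(max_iter):
        labels = assign(points, centres)
        # Update step: move each centre to the mean of its members
        # (an empty cluster keeps its previous centre).
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centres.append(centres[j])
        if new_centres == centres:  # converged: centres stopped moving
            break
        centres = new_centres
    labels = assign(points, centres)
    distortion = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
    return centres, labels, distortion
```

Running lloyd on two well-separated groups of points, e.g. [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)] with k=2, should put each group in its own cluster. The distortion value it returns is what K-selection methods compare across different values of K.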

You will learn

  • How to solve the most common problems when working with geodata
  • How to set up and configure PostGIS
  • Basics of geospatial objects
  • Spatial transformations, and why geodata cannot be used 'as is'
  • Running different clustering models on geodata
  • Selecting the number of clusters in clustering methods
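As one way of selecting the number of clusters, the f(K) method (Pham, Dimov and Nguyen, 2005) compares the distortion S_K of a K-means run with the distortion S_{K-1} of the run before it, weighted by a factor alpha_K that depends on the number of dimensions N_d. The sketch below follows our reading of that method and only covers N_d > 1. It assumes you have already computed S_1, S_2, ... (for example, the total within-cluster sum of squares from one K-means run per K). The helper names f_of_k and best_k are hypothetical.

```python
def f_of_k(distortions, n_dims):
    """Return [f(1), f(2), ...] for the f(K) method.

    distortions[K-1] is S_K, the total within-cluster distortion of a K-means
    run with K clusters; n_dims is N_d, the number of data dimensions (> 1)."""
    if n_dims <= 1:
        raise ValueError("this sketch only handles N_d > 1")
    scores = []
    alpha = None
    for k, s_k in enumerate(distortions, start=1):
        if k == 1:
            scores.append(1.0)  # f(1) is defined as 1
            continue
        # Weight factor alpha_K: defined directly for K = 2, recursively for K > 2.
        alpha = 1 - 3 / (4 * n_dims) if k == 2 else alpha + (1 - alpha) / 6
        s_prev = distortions[k - 2]
        scores.append(s_k / (alpha * s_prev) if s_prev != 0 else 1.0)
    return scores


def best_k(distortions, n_dims):
    """Pick the K (1-based) with the smallest f(K)."""
    scores = f_of_k(distortions, n_dims)
    return min(range(len(scores)), key=scores.__getitem__) + 1
```

With illustrative, made-up distortions [100, 20, 18] for K = 1, 2, 3 on two-dimensional (projected) points, f(2) = 20 / (0.625 × 100) = 0.32 is the smallest value, so best_k returns 2: going from one to two clusters removes far more distortion than the weighting expects, while a third cluster barely helps.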