Philipp Schmidt is a Data Scientist and Software Engineer at Fyber, a mobile advertising technology company built by developers that reaches 500+ million monthly active users. His machine learning capabilities are both strong and broad. He is a fan of Scala and Akka and endorses the reactive and functional programming paradigms.
Philipp was awarded a 'Diplom-Ingenieur' in Computer Engineering by the Technical University of Berlin. He also participates in Kaggle competitions.
Model optimizations in ML
- Participants will be able to relate constrained optimization to the training objective of SVMs. They will also learn how to interpret linear SVMs geometrically in a classification setting (typically, though not necessarily, binary). Overall, this will help participants better understand how optimization problems appear in machine learning training objectives.
- Basics of model optimization, primal and dual formulations of the SVM, and their relation to (constrained) convex optimization. Geometric interpretations of linear models (specifically SVMs).
- For hands-on: could use the convex optimization toolbox CVXOPT in Python (https://github.com/cvxopt/cvxopt/).
Basic math, algebra.
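As a rough illustration of the training objective this session is about, the sketch below minimizes the soft-margin primal objective 0.5*||w||^2 + C * sum(max(0, 1 - y_i*(w.x_i + b))) on a toy 2-D data set and reads off the geometric margin 2/||w||. It deliberately uses plain subgradient descent in pure Python rather than CVXOPT, so it only approximates the solution a QP solver would return. The data points, the function name train_linear_svm, and the values of C, the step size, and the epoch count are all assumptions made for this example.

```python
import math

# Toy 2-D data set (made up for illustration): two linearly separable classes.
points = [(2.0, 2.0), (3.0, 3.0), (2.5, 3.5), (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
labels = [1, 1, 1, -1, -1, -1]


def train_linear_svm(points, labels, c=10.0, epochs=10000, lr=0.001):
    """Approximately minimize 0.5*||w||^2 + c * sum(hinge losses)
    with constant-step subgradient descent (hypothetical helper)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        # Gradient of the regularizer 0.5*||w||^2 is w itself.
        grad_w = [w[0], w[1]]
        grad_b = 0.0
        for (x1, x2), y in zip(points, labels):
            functional_margin = y * (w[0] * x1 + w[1] * x2 + b)
            # A point contributes to the subgradient only if it violates
            # the margin constraint y * (w.x + b) >= 1.
            if functional_margin < 1.0:
                grad_w[0] -= c * y * x1
                grad_w[1] -= c * y * x2
                grad_b -= c * y
        w[0] -= lr * grad_w[0]
        w[1] -= lr * grad_w[1]
        b -= lr * grad_b
    return w, b


w, b = train_linear_svm(points, labels)

# Geometric reading: the decision boundary is the line w.x + b = 0, and the
# distance between the two margin lines w.x + b = +1 and -1 is 2 / ||w||.
margin_width = 2.0 / math.hypot(w[0], w[1])
print("w =", w, "b =", b, "margin width =", margin_width)
```

For this data the closest points of the two classes lie on the lines x1 + x2 = 1 and x1 + x2 = 4, so the exact maximum margin is 3/sqrt(2); the subgradient iterate should land near that value but not exactly on it. The same problem can be posed as a quadratic program (primal or dual), which is where a solver such as CVXOPT would come in.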
Lean ML pipelines (lowering the entry barrier)
- Participants will learn how to use tools and machine learning for sparse data. Specifically, they will use fastparquet (a module implementing the Parquet storage format) for reading and writing data, and scikit-learn for encoding continuous or categorical covariates in a memory-efficient manner. Additionally, participants will learn how to use the encoded data to build basic machine learning pipelines with Logistic Regression and SVM.
- Sparse representations allow efficient use of the computer's main memory. Many methods, including Logistic Regression and SVM, can work directly on sparse data, which enables training and inference with a low memory footprint.
Basic linear algebra.
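A minimal sketch of the kind of pipeline this session describes, assuming scikit-learn, SciPy, and NumPy are installed. The column names (country, session_minutes, clicked) and the data are hypothetical, and the Parquet reading/writing step with fastparquet is left out so that the example only needs in-memory arrays. It one-hot encodes a categorical covariate into a SciPy sparse matrix, appends a continuous covariate, and fits Logistic Regression and a linear SVM directly on the sparse features.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import LinearSVC

# Hypothetical toy data: one categorical covariate, one continuous
# covariate, and a binary label.
country = np.array(
    [["de"], ["us"], ["de"], ["fr"], ["us"], ["fr"], ["de"], ["us"]],
    dtype=object,
)
session_minutes = np.array([[0.5], [3.0], [0.2], [2.5], [3.5], [2.0], [0.1], [4.0]])
clicked = np.array([0, 1, 0, 1, 1, 1, 0, 1])

# OneHotEncoder returns a SciPy sparse matrix by default, so only the
# non-zero entries (one per row here) are stored.
encoder = OneHotEncoder(handle_unknown="ignore")
country_onehot = encoder.fit_transform(country)

# Stack the sparse one-hot block and the continuous column side by side,
# keeping the result in CSR format.
features = sparse.hstack(
    [country_onehot, sparse.csr_matrix(session_minutes)], format="csr"
)
print("shape:", features.shape, "stored non-zeros:", features.nnz)

# Both estimators accept sparse input directly; no dense copy is needed.
log_reg = LogisticRegression().fit(features, clicked)
svm = LinearSVC().fit(features, clicked)

# Inference on a new row goes through the same encoding steps.
new_row = sparse.hstack(
    [
        encoder.transform(np.array([["fr"]], dtype=object)),
        sparse.csr_matrix(np.array([[3.0]])),
    ],
    format="csr",
)
print("logistic regression:", log_reg.predict(new_row))
print("linear SVM:", svm.predict(new_row))
```

With three categories the saving is negligible, but the same code path is what keeps memory use manageable when a categorical covariate has many thousands of levels, since each row still stores a single non-zero per encoded column group.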
Going from scikit-learn to JVM deployments
- In this course, participants will learn how to transfer Python-based machine learning pipelines to the JVM. They will learn how to combine the scikit-learn/pandas integration sklearn-pandas (https://github.com/pandas-dev/sklearn-pandas) with sklearn2pmml, enabling them to create easy-to-interpret, clear-text representations of scikit-learn models in PMML format. The next step is to enable participants to take the PMML-based models to JVM-based production environments (in this case Scala) with jpmml-evaluator.
- Going from scikit-learn-based models to production systems on the JVM. We'll use two tools (https://github.com/jpmml/sklearn2pmml and https://github.com/jpmml/jpmml-evaluator) for serializing scikit-learn models to PMML, the de facto industry standard, and for evaluating them. On the JVM, these PMML representations can then be deserialized to instantiate predictive models and serve high-load production requests.
- None really, but it helps if participants know a little scikit-learn and how to do JVM-based deployments.