Data Science Retreat brings together top data scientists and participants who want to grow exceptionally fast.
Our approach to teaching is highly opinionated, based on our extensive experience in state-of-the-art machine learning, data science, and big data engineering. We know what works for face-to-face instruction, and what doesn’t (we could write a book on the latter!). If you want to learn this material as fast as humanly possible, this is your best bet: small class sizes, expert mentors who can teach, and battle-tested material. Our classes are very much hands-on, taught by senior data scientists and senior data engineers with many years of practical experience.
Data Science Retreat is for people with previous work experience and at least some basic knowledge of machine learning. You need to have spent at least 1,000 hours programming, even if in a language that is not ‘data science friendly’. You will spend most of your time here programming, so you must be confident in your skills. You do not need a deep mathematical background, although some techniques do rely on some linear algebra and probability theory.
The emphasis of our curriculum is on finding questions you can answer with modern techniques, and on producing the best-performing model humanly possible. You will prepare communications for different audiences (we train you, but you may want to feel confident speaking in public before applying).
The most important part of your retreat experience will be your portfolio project. We’ll be around to give advice when you get stuck, but you’re going to build something amazing, on your own. Think of this as the demo that gets you a new job.
A crash course in Python, including the language and its ecosystem. We also cover the basics of working with git, writing tests, building packages, and how to create and consume APIs.
This two-day workshop teaches developers who have taken online machine learning courses how to implement models that perform well. It is not an advanced course; it is designed to dispel many common misconceptions about core machine learning models.
Once your company starts fitting models, methodology matters. It is easy to simply pile up complexity without managing it. Fortunately, we now have best practices (and libraries) that make it easy to iterate over preprocessing, model families, and parameters.
NumPy and SciPy took Python from a general-purpose programming language to a very powerful, matrix-oriented one. Pandas brought the data frame to Python (one of the core concepts in modern data analysis). Building on these data structures, scikit-learn delivered killer implementations of best-of-breed algorithms, all behind a standardized interface. Together, these packages have made Python the programming language of choice for many data scientists.
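The standardized interface mentioned above is what makes iterating over preprocessing, model families, and parameters so painless. A minimal sketch (assuming scikit-learn is installed; the dataset and model choices are illustrative, not part of the curriculum):

```python
# Every scikit-learn estimator exposes the same fit/predict interface,
# so preprocessing and models compose into a single searchable Pipeline.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scaling + classifier behave as one estimator.
pipe = Pipeline([("scale", StandardScaler()), ("clf", SVC())])

# Search over any step's parameters with the step__param naming scheme.
grid = GridSearchCV(pipe, {"clf__C": [0.1, 1, 10]}, cv=5)
grid.fit(X_train, y_train)
accuracy = grid.score(X_test, y_test)
```

Because every step follows the same convention, swapping `SVC` for any other classifier changes one line, not the whole experiment.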
Originally a research project at UC Berkeley, Spark is now a top-level Apache project and one of the fastest-growing open-source projects in history. In this master class, developers will use hands-on exercises to learn how to work with the relevant parts of the Hadoop ecosystem, and the principles of Spark programming.
In this hands-on workshop you will build a real-time data pipeline that receives data from Twitter, stores it in Kafka, processes the stream with Spark, and stores the processed stream in Elasticsearch.
Recommendations are widely used in many industries, such as e-commerce, jobs, music, and social media. This course goes beyond the basics and emphasizes solutions to problems you will face when your business deploys a recommender system.
'Sensors, sensors everywhere!' seems to be the future. Many of these sensors are linked to location information, which means there is plenty of geolocation data we can use to make better decisions. But sensor data is messy and incomplete. In this master class you will learn how to work with geodata and run machine learning algorithms on it.
Busy websites and intelligent sensor network applications require incremental updates to aggregated measurements and models as new data arrives. Traditional, brute-force techniques become intractable at web scale. Sketching techniques efficiently compute reasonable approximations, making the difference between feasibility and infeasibility for a wide range of use cases.
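To make the idea concrete, here is a minimal sketch of one such technique, the count-min sketch: it tracks item frequencies in fixed memory, trading exact counts for guaranteed-never-too-low approximations. (This toy implementation is illustrative only and not from the course material.)

```python
import hashlib

class CountMinSketch:
    """Approximate frequency counts in O(width * depth) memory,
    regardless of how many distinct items arrive."""

    def __init__(self, width=1000, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One independent hash per row, derived by salting with the row index.
        for row in range(self.depth):
            digest = hashlib.md5(f"{row}:{item}".encode()).hexdigest()
            yield row, int(digest, 16) % self.width

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions only inflate cells, so the minimum across rows
        # is the tightest upper bound — estimates never undercount.
        return min(self.table[row][col] for row, col in self._cells(item))

cms = CountMinSketch()
for word in ["spark", "kafka", "spark", "spark", "flink"]:
    cms.add(word)
spark_count = cms.estimate("spark")  # at least 3; overestimates only on collisions
```

The memory footprint is fixed at sketch-creation time, which is exactly what makes incremental aggregation feasible at web scale.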
High-quality data analysis software in R or Python that other people will use often delegates its performance-critical parts to a compiled language. The aim of this course is to give you a working introduction to best practices in C++ programming, data structures, and algorithms so that you can deliver that performance.
The aim of this two-day hands-on master class is to introduce deep learning and give insight into the hype. In the tutorial Ludwig will give an overview of existing techniques and applications, show how they differ from traditional approaches, and discuss the limitations of deep learning. As part of the tutorial we will build a deep learning system from the ground up, and train it.
What are Microservices and why should you care? Are they a buzzword, a revolution, or the new normal? This lecture will look at how Microservices came about and how they fit into the modern software industry. We'll look at the good and the bad, offering an honest view on when you should (and shouldn't) use Microservices. We'll examine the technologies and practices that underpin Microservices systems, combining real-world examples with practical experience and advice.
Jesús Martinez Blanco
While we live in the era of data, we humans are still visual animals. Building proper visualisations is your key to extracting insights from data and communicating them to decision makers. From data exploration all the way to analysis reporting, your data visualisation skills are indispensable for succeeding as a data scientist. This course focuses on the web browser as the perfect platform for both sharing your visualisations and making them interactive.
This class teaches you real-time stream processing with Apache Spark while applying machine learning on streams. To make guarantees on throughput and latency, we take a deeper technical look under the hood. We discuss typical scenarios for applying machine learning to real-time streams, such as adapting to trends with streaming linear regression or adaptive clustering of tweets as they arrive. You will also learn to deal with the typical headaches around testing and deployment.
DSR is the only program worldwide whose mentors are at the Chief Data Scientist and CTO level. They are invested in your progress, and will train you to adopt the right mindset, solve business questions with technology, and advise leadership. Some mentors teach, others only provide advice during portfolio project time.
Pere is co-founder and CTO of Datasalt. He’s a core committer in two Hadoop-based open-source projects, Splout SQL and Pangool. Splout provides a SQL view over Hadoop's Big Data with sub-second latencies and high throughput. Pangool is an improved low-level Java API for Hadoop based on the Tuple MapReduce paradigm (ICDM 2012). Pere is an early adopter of Hadoop, working on Big Data projects since 2008. He’s also the organizer of Big Data Beers Berlin.
Adam Drake is Chief Data Officer at one of the world's most successful online travel companies. He has been in technology roles for over 15 years in a variety of industries, including online marketing, financial services, healthcare, and oil and gas. His background is in Applied Mathematics, and his interests include online learning systems, high-frequency/low-latency data processing systems, recommender systems, distributed systems, and functional programming (especially in Haskell).
Mikio is a data science researcher and blogger. He previously co-founded streamdrill, a company focusing on real-time data analysis. He is part of the Berlin Big Data Competence Center, which aims to bring together machine learning and scalable technologies to create the next generation of Big Data infrastructure. He is also the author of jblas, a fast matrix library for Java used by PayPal, Breeze, and Apache Spark.
Jose Quesada is the founder and director of DSR. Jose helps others to decide better, do better, or be better through data. Like everyone else, he doesn’t know what data science really is, but suspects it has to do with predicting the future before it catches you empty-handed. He has a PhD in Machine Learning and worked at top research labs (University of Colorado Boulder, Carnegie Mellon, Max Planck Institute). Previously he was a data science consultant specializing in customer lifetime value, and the head data scientist for GetYourGuide.
Trent is co-founder and CTO of ascribe, which uses modern crypto, ML, and big data to tackle challenges in digital property ownership. His two previous startups applied ML in the enterprise semiconductor space: ADA was acquired in 2004 and Solido is going strong. He has an engineering PhD in applied ML from KU Leuven, Belgium. His interests include large-scale regression, automating creativity, anything labeled "impossible", and thousand-fold improvements. He was raised on a pig farm in Canada.
Arunkumar Srinivasan is finishing a PhD in Bioinformatics at the Max Planck Institute. He started using R in late 2011 and is co-author of the data.table R package, which offers fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group with no copies at all, list columns, and a fast file reader (fread). Arun has a passion for developing tools and algorithms that facilitate big-data analyses.
Marek is a true R hacker and enthusiast since the Paleozoic era of R_1.4.0. He is the author of a best-selling Polish book on R programming and of many R packages, including the famous stringi package. He has been a computer programmer since the age of 6 (C64 BASIC, C/C++, assembler, PHP, Java, VHDL, bash, Julia, Maxima, Lisp, Fortran, and many others). Marek has a PhD in computer science and specializes in data aggregation, fusion and mining, computational statistics, and uncertainty modeling. He is currently an assistant professor and a tutor and mentor at the Warsaw University of Technology, Poland.
Daniel is an expert software engineer, Python programmer, and machine learning specialist. When he's not developing high-performing, end-to-end pattern recognition and predictive analytics systems for his clients, Daniel's learning new tricks to train deep neural networks more efficiently. Through his company Natural Vision, he's been successfully applying deep learning to problems in bioacoustics, computer vision, and text mining.
David is the Head of Big Data Engineering at DSR. He began his career as a senior research scientist at Carnegie Mellon University, Mitsubishi Electric Research Labs, and Sun Labs. His research career focused on tangible user interfaces and real-world applications of machine learning. Since 2005, David has been leading the development of data-intensive applications for companies across Europe — most recently as CTO at RetentionGrid.
got multiple interviews out of DSR.
had to choose from multiple job offers.
got the job they wanted out of DSR.
There’s plenty of good material online to learn machine learning and data science on your own. We now live in an autodidact’s paradise. The question is, how can you get there faster than everyone else?
No matter how many MOOCs you do, there’s a barrier that very few people ever get past. Jump over it.
Products are the new CVs. What interviewers really want to see is “What have you done when nobody told you what to do?”
We accept about 10 people out of the 200 who apply for each batch. They are extremely motivated and have skillsets complementary to yours. Do you want to spend time in the same room with them?
San Francisco (Zipfian Academy)
New York City (Metis; NYC Data Science Academy)
Berlin (Data Science Retreat)
1 bedroom + utilities: $726 (658 EUR)
1 bedroom (in a shared flat): $497 (450 EUR)