Data Science at PMI – The Tools of The Trade
Data science is not a one-man show. It is a team effort that requires every team member to master the tools of the trade, which is essential for putting data science to work effectively in a global organization. In this talk we would like to share the best practices and tools for starting, developing and shipping data science products inside PMI. They are currently in use by 40+ data scientists across four different locations, where PMI's data science labs were established in 2017.
We would like to show how technologies (Kubernetes, Docker, Artifactory, Jenkins) and methods (templates in Cookiecutter, CI/CD with GitFlow) well known from software engineering help us create a data science workflow that adapts to the specific needs of each use case we deal with, provides transparency at all times, and is reproducible not only in the data science dimension but also in the data engineering and DevOps dimensions. At the same time, this workflow allows frictionless development of data products and gives us the freedom to experiment.
If you’re interested in how Python, Jupyter notebooks, Docker, AWS, the Hadoop ecosystem, Spark, Artifactory, Jenkins, the Atlassian suite, etc. are set up to support our collaborative work on building predictive models, this talk is for you.