There are two divergent viewpoints on this topic: a) No, it can’t be done - data work is too dissimilar to traditional software; b) Yes, it can be done - it’s just code after all. Perhaps not surprisingly, this is a false dichotomy. Reality is more complicated than that.
Some parts of data-related work can be managed in an agile way, and some cannot. I’ll label those components as either stochastic (uncertain, experimental, un-plannable) or deterministic (certain, plannable), with the assumption that the former don’t fit the agile methodology, while the latter do.
Several arguments have been made for why data projects differ from traditional software projects. I’ll focus on the most common one: the inherently experimental nature of data projects makes them unplannable, and hence impossible to fit into an agile methodology. This project characteristic is labeled “stochastic” in our diagram, and it is subject to Subjectivity, Complexity, and Variety (SCV) effects.
I illustrate the differences between agile data and agile software with a diagram of two architectures: one for a standard web development project and one for a data science project. For the former, let’s say an e-commerce store, the only stochastic element is the design, because it’s subjective. For a data project, almost all components are influenced by SCV and hence are stochastic. What is our hypothesis? Which data should we select to test it? The modeling and its evaluation are also unpredictable in their outcome. The only deterministic part of such a workflow is continuous integration and delivery (CI/CD) - and that’s ported from traditional software development.
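To make the comparison concrete, here is a minimal Python sketch of the two architectures as tagged component lists. It is only an illustration: the component names `frontend` and `backend` are hypothetical placeholders I added for the web project (the text names only the design as stochastic), and the "stochastic share" is a rough count, not a real measure of planning difficulty.

```python
from enum import Enum


class Kind(Enum):
    STOCHASTIC = "stochastic"        # uncertain, experimental, un-plannable
    DETERMINISTIC = "deterministic"  # certain, plannable


# Web project (e.g. an e-commerce store): only the design is stochastic.
# "frontend" and "backend" are hypothetical component names for illustration.
WEB_PROJECT = {
    "design": Kind.STOCHASTIC,
    "frontend": Kind.DETERMINISTIC,
    "backend": Kind.DETERMINISTIC,
    "ci_cd": Kind.DETERMINISTIC,
}

# Data science project: almost everything is affected by SCV.
DATA_PROJECT = {
    "hypothesis": Kind.STOCHASTIC,
    "data_selection": Kind.STOCHASTIC,
    "modeling": Kind.STOCHASTIC,
    "evaluation": Kind.STOCHASTIC,
    "ci_cd": Kind.DETERMINISTIC,
}


def stochastic_components(project):
    """Return the names of the components tagged as stochastic."""
    return [name for name, kind in project.items() if kind is Kind.STOCHASTIC]


def stochastic_share(project):
    """Return the fraction of components tagged as stochastic."""
    return len(stochastic_components(project)) / len(project)


if __name__ == "__main__":
    for label, project in [("web", WEB_PROJECT), ("data", DATA_PROJECT)]:
        print(label, stochastic_components(project), stochastic_share(project))
```

Even in this toy form, the asymmetry is visible: the web project has a single stochastic component, while in the data project CI/CD is the only deterministic one.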
With this (admittedly) simplistic example, I have shown where adopting the agile methodology in the data field runs into trouble. Knowing this gives us more clarity on possible solutions, which I’ll share in future posts.