When I assembled my first data science team, the term was barely getting printed in the Harvard Business Review. I had no clue that I was building a pioneering team in Big Data and data science. Now is a good time to reflect on this story, which started twelve years ago.
At first, I really wanted to title this article “How I built the perfect data science team (without knowing it).” However, I did not want to give the impression I did not know what I was doing (I think I did). Nevertheless, here is my story…
In 2007, I founded GreenIvory. The idea was to build a toolbox for web marketers. Whether a marketer wanted to automate content distribution, generate content, or measure brand awareness through sentiment analysis, we had a solution (and more!). A little later, the team started working on NLP (natural language processing), and in early 2011 we released our first product capable of sentiment analysis at scale. We solved many technology challenges, but let’s focus on the human and organizational aspects.
The “green team” was made up of a bunch of talented software engineers. Each engineer had their own strengths in various key elements of the system: UI, data, crawling, system, ops, and more. We had rolled out several projects and products before. It was a working model. However, we did not have the science. We needed someone who could help us infuse scientific knowledge into the engineering team. That’s when we teamed up with the University of Strasbourg and hired a data scientist (that was not his title back then).
Timeline and business value
Our main issue was the timeline. Or, more precisely, the lack of alignment between the pace of data science and the pace of engineering. At that time, we were already following agile methodologies. Like most companies in those days, we used a home-grown version, but it was team-driven and we had a great agile champion. It was working smoothly.
Each sprint delivered business value and we frequently updated our artefacts in production. The challenge was to incorporate the work of the scientist into an engineering organization.
It came down to integrating him directly into the development team. I wanted him to act as a lighthouse. It was not easy. There was a bit of a culture clash. The engineers did not get why it took so much time to get a result and why it was so rough in the making. On his side, the scientist could not understand why his experiments, although successful on his Mac, would not scale when we threw millions of sentences at his algorithms.
Finally, after numerous sessions of pair programming, discussions, and building a stronger team spirit, we were able to leverage the science in our product.
More recently, I have experienced a different organization, where data scientists were parked in a silo. The idea was to deliver the science almost as a consumer-ready product to business analysts and users.
Don’t get me wrong, they were able to deliver, but the silo remained a silo. The knowledge and intelligence built by the team was not spreading to the rest of the organization.
A side effect was that the team kept growing and eventually merged with another team… and you know what happened: they needed more pizzas. And when you need more pizzas, productivity goes down. If it’s not in the original Agile Manifesto, it must definitely be in its first amendment.
Twelve years in the making of a data science team
More recently, I attended an inspiring talk by Stacey Ronaghan at Think 2019. Ronaghan is a data scientist at IBM. She summarized her experience as a data scientist working within a team. This is when I realized that, twelve years ago, we were not that far off.
She defined the team as a key driver for AI success. The teams she worked with include various roles around the data scientist, like an executive sponsor, a database administrator (those darn data!), a business analyst, a project manager (in 2019, we call them scrum masters), a subject matter expert (SME), a solution architect, a software engineer, a designer, and a design thinking practitioner. So yes, it is a very eclectic and cross-functional team. Like a software engineering team.
The delivery is based on the value it brings to the organization. The team does not live in an isolated lab or a remote, comfy cocoon where they study for the sake of studying. They deliver. They solve problems.
And solving problems helps them bring business value. Like an agile team. Her team works in an agile way. Two-week sprints are achievable too.
As in a software product organization, her team goes through building an MVP (minimum viable product). That’s where her customers can take over.
Each stakeholder has a role. The scientists can define a vision, craft a conceptual idea, and find the right algorithms. The engineers can then “take it home” and transform the idea into production code in their toolbox or platform. Finally, the application developers can use the science, now industrialized in the platform, to build a great product. This is what I call the industrialization of data science.
After these experiences, and after comparing some of these ideas and experiences with those of others, here are my conclusions (so far):
- A data science team is not very different from a software engineering team.
- Expectations are different, as the experimental part plays a bigger role.
- Standard software methodologies (agile, SAFe…) can apply, but they are more challenging to follow on the research side.
- While TDD (test-driven development) is becoming a standard, test-driven data science is not there yet.
- There are new challenges like bias, but couldn’t that be part of the QA?
- Governance of models is also a challenge that did not exist before.
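To make the last few points a little more concrete, here is a minimal sketch of what “test-driven data science”, with a bias check treated as part of QA, could look like. Everything in it is an assumption made for illustration: the toy `predict` function stands in for a real model, and the samples, group names, and thresholds are invented. It is not how GreenIvory or any team mentioned above actually worked.

```python
# Hypothetical toy example: names, data, and thresholds are illustrative only.

def predict(text):
    """Stand-in for a sentiment model: returns 'positive' or 'negative'."""
    negative_words = {"bad", "awful", "terrible"}
    words = set(text.lower().split())
    return "negative" if words & negative_words else "positive"


# Labeled samples, each tagged with an invented group (e.g. a customer segment).
SAMPLES = [
    ("great product", "positive", "group_a"),
    ("awful support", "negative", "group_a"),
    ("terrible delivery", "negative", "group_b"),
    ("really good value", "positive", "group_b"),
]


def accuracy(samples):
    """Share of samples for which the prediction matches the label."""
    correct = sum(1 for text, label, _ in samples if predict(text) == label)
    return correct / len(samples)


def accuracy_by_group(samples):
    """Accuracy computed separately for each group tag."""
    groups = {}
    for sample in samples:
        groups.setdefault(sample[2], []).append(sample)
    return {group: accuracy(items) for group, items in groups.items()}


def test_minimum_accuracy():
    # Quality gate: the model must stay above an agreed (here arbitrary) threshold.
    assert accuracy(SAMPLES) >= 0.9


def test_bias_gap():
    # Bias as QA: accuracy must not differ too much between groups.
    scores = accuracy_by_group(SAMPLES).values()
    assert max(scores) - min(scores) <= 0.1


if __name__ == "__main__":
    test_minimum_accuracy()
    test_bias_gap()
    print("all checks passed")
```

The idea is simply that quality and fairness expectations are written down as tests before (or alongside) the model, so that they can run in the same pipeline as the rest of the engineering checks.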