Since it emerged roughly a decade ago, the role of data scientist has come a long way, from obscure tech analyst to rock star of IT. Data scientist is No. 1 on the list of “50 Best Jobs in America for 2019,” published by recruiting site Glassdoor.
Talent is “high in demand, short in supply,” says Pina Nicoli, a metro market manager with Robert Half, a technology staffing agency.
Driving this demand is the mass adoption of machine learning, artificial intelligence, and advanced data analytics in the enterprise. Data scientists are increasingly expected to have deep knowledge of business problems, the communication skills to foster cross-functional collaboration, and the strategic vision to help shape how data is used to meet long-term goals.
“For a long time, a data scientist was somebody who just had very strong technical skills and could ingest and analyze data in interesting ways,” says Joel Shapiro, a professor of data analytics at Northwestern University’s Kellogg School of Management. “However, people are increasingly recognizing that a good data scientist needs some pretty strong business acumen.”
Here are a few considerations for companies looking to develop data science into a core competency.
Core skills and roles
At the most basic level, data scientists combine analytical and computer skills and apply them to business problems. Increasingly, those skills include the ability to use advanced data-mining and machine-learning techniques to tackle complex analytical problems, and to automate data-driven business processes.
Several years ago, for instance, the data-science team at a high-end hotel chain might have analyzed check-in data to compare customer satisfaction between guests who checked in through the mobile app and those who checked in at the front desk. With machine-learning techniques, a similar team can now predict which guests are unlikely to return, or which combination of incentives and targeted offers will yield the highest conversion rates.
Today’s corporate data scientists wear many hats. Some are data-mining engineers who build, train and maintain machine-learning algorithms. Others take on the role of “data journalists,” interpreting the results of those models for non-experts across the business.
Other roles found on data-science teams include data architects or data engineers, who typically design, test, and maintain the data infrastructure used for training and applying machine-learning models. Teams also might include data-visualization engineers who plug results into business-facing applications.
Data science team-building
Companies can integrate data-science teams into the business in different ways, according to Accenture research. One approach is to run a centralized data-science operation, in which a single team applies analytics and machine learning either to strategic tasks for the entire company or to specialized projects for different business units. This approach helps ensure that the data organization is adequately funded and enables it to set the analytics priorities for the entire company.
An alternative is to embed data scientists in functional or product teams, such as marketing or manufacturing operations. This lets the data scientists develop deep domain knowledge, so they can get up to speed much faster when taking on new projects.
Chuong Do is the vice president of engineering at online education marketplace Udemy and former director of analytics at Coursera. Both companies rely heavily on machine-learning applications. Do is a proponent of the embedding model, in which data scientists report to a central data group but spend a significant amount of their time working with (and frequently sitting physically alongside) specific product or business units.
Do likes the approach for two reasons. First, central reporting ensures greater organizational efficiency, because data scientists “can be re-deployed as needed depending on business demand.” Second, embedding with a specific business unit helps data scientists gain domain expertise while building relationships with colleagues on the business side.
“Strong relationships are key to internal influence,” says Do. The domain focus also helps data scientists develop skills as strategists. “Embedding helps them develop the intuition needed to move from answering questions for stakeholders to being part of the process of framing analytical questions in the first place.”
Airbnb, the home-sharing service, uses a variant of this model. It runs a central data-science organization whose members are split into smaller units that work directly with product managers, marketers, designers and others. Its data scientists work across the company, running experiments on new service features and creating machine-learning algorithms to address business needs.
For example, to comply with local regulations in the cities where it operates, Airbnb limits the number of nights per year that hosts can offer their properties before they have to register as professional hoteliers. Enterprising hosts could get around this restriction by re-registering their properties under different names.
Airbnb data scientists created a machine-learning tool that predicts likely repeat listings and flags them for removal by company agents. The algorithm learns from experience and becomes more accurate over time. “As the tool works, you get more and more data to get a better classifier,” explained Daniel Martin, Airbnb data science manager, in a recent conference presentation.
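Airbnb's presentation, as reported here, doesn't say how the tool works internally. As a rough illustration of the feedback loop Martin describes, the sketch below assumes the tool is a binary classifier that scores pairs of listings and that agents' review decisions are folded back in as new training labels. Everything in it is hypothetical: the three pair features, the `RepeatListingClassifier` and `review_cycle` names, the 0.5 flagging threshold, and the `agent_verdict` callback standing in for human reviewers. It uses plain logistic regression written in standard-library Python, not anything Airbnb has described.

```python
import math
from typing import Callable, Sequence

# Hypothetical features for a PAIR of listings (assumptions, not Airbnb's signals):
#   x[0] = 1.0 if both listings sit at the same map coordinates, else 0.0
#   x[1] = similarity of the two descriptions, in [0, 1]
#   x[2] = fraction of photos the two listings share, in [0, 1]
Features = Sequence[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class RepeatListingClassifier:
    """Hypothetical logistic-regression model trained by batch gradient descent."""

    def __init__(self, n_features: int, lr: float = 1.0, epochs: int = 1000):
        self.n_features = n_features
        self.lr = lr
        self.epochs = epochs
        self.weights = [0.0] * n_features
        self.bias = 0.0

    def predict_proba(self, x: Features) -> float:
        """Estimated probability that the pair is one property listed twice."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return sigmoid(z)

    def fit(self, X: list, y: list) -> "RepeatListingClassifier":
        # Retrain from scratch on every label collected so far.
        self.weights = [0.0] * self.n_features
        self.bias = 0.0
        n = len(X)
        for _ in range(self.epochs):
            grad_w = [0.0] * self.n_features
            grad_b = 0.0
            for x, label in zip(X, y):
                err = self.predict_proba(x) - label  # d(log-loss)/dz
                for j, xj in enumerate(x):
                    grad_w[j] += err * xj
                grad_b += err
            # Step against the gradient of the average log-loss.
            self.weights = [w - self.lr * g / n for w, g in zip(self.weights, grad_w)]
            self.bias -= self.lr * grad_b / n
        return self


def review_cycle(
    model: RepeatListingClassifier,
    X_train: list,
    y_train: list,
    candidates: list,
    agent_verdict: Callable[[Features], int],
    threshold: float = 0.5,
) -> list:
    """One round: retrain, flag likely repeats, then record agents' decisions."""
    model.fit(X_train, y_train)
    flagged = [x for x in candidates if model.predict_proba(x) >= threshold]
    for x in flagged:
        # Each reviewed flag becomes a labeled example for the next retrain,
        # which is how the training set grows as the tool is used.
        X_train.append(x)
        y_train.append(agent_verdict(x))
    return flagged


if __name__ == "__main__":
    X = [(1.0, 0.9, 0.8), (1.0, 0.8, 0.9), (0.0, 0.1, 0.0), (0.0, 0.2, 0.1)]
    y = [1, 1, 0, 0]  # 1 = confirmed repeat listing, 0 = distinct properties
    model = RepeatListingClassifier(n_features=3)
    flagged = review_cycle(model, X, y,
                           candidates=[(1.0, 0.85, 0.85), (0.0, 0.15, 0.05)],
                           agent_verdict=lambda x: 1)  # stand-in for a human reviewer
    print(flagged, len(X))
```

One caveat of this simple design: because only flagged pairs are reviewed, the new labels skew toward likely repeats, so a real system would probably also send a sample of unflagged listings to agents.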
The availability of user-friendly, off-the-shelf tools for running machine-learning models, such as Amazon SageMaker or the open-source platform TensorFlow, makes it easier than ever for data scientists to develop and deploy AI models, freeing them to focus more on solving business problems.
“They need to work effectively with product and business teams to translate business problems into data science challenges,” says Do. While technical skills remain important, he adds, “figuring out the technology is no longer the bottleneck it used to be.”
Cindy Waxer is a business and technology writer based in Toronto.