Scarcity of Data Science Unicorns Is Stifling Business Growth

Article originally posted on

Businesses in every vertical are partnering with data science consultancies that have teams of skilled data science professionals readily available.

At a symposium on data and analytics at MIT, I heard Ron Bodkin, the technical director of applied Artificial Intelligence at Google, speak about the challenges of Machine Learning. The secret to success, Ron noted, is not only having sufficient computation resources, but also having the right talent, as ML practitioners are hard to find and expensive to hire.

Data science and analytics take a village—and not just any village, but a highly skilled data science village at that. However, with the lack of skilled workers available, many organizations will find themselves in a data science ghost town.

Ron Bodkin delivering Keynote at MIT Symposium

In order to launch a successful data science project, an organization must have sufficient staff consisting of both data scientists and data engineers. Data scientists are the rare, mythical unicorns who understand the business and can also create sophisticated analytical models that are used to build new datasets and derive new insights from data. These data Swiss Army knives have been in short supply since 2012, when the role was dubbed the sexiest job of the 21st century.

Data engineers are responsible for much of the work required to support a data science workload. They are highly sought after, scarce, and currently so essential to supporting data science that some industry experts crowned 2018 the year of the data engineer. Experts in architecting, building and maintaining data-based systems, data engineers perform much of the heavy lifting that supports an organization’s analytical and transactional operations.

“Typically, to run a successful data science project, it takes a ratio of two data engineers for every data scientist,” according to Joe Caserta, president and founder of Caserta, a strategic data and analytics consultancy. “We’re seeing a lack of engineering talent in the market. Organizations seeking to reduce time-to-value are coming to us not just for a strategic assessment and roadmap for their data science initiatives, but also to implement them with our skilled data engineers.”

Not only is it hard to find data talent, but context is also crucial when it comes to the degree of hiring difficulty. Take the statistical programming language R, for example. Although it is one of the most commonly requested Data Scientist skills, its scarcity among Finance and Risk Analytics Managers makes proficiency in R one of the highest-paying skills for this role. It’s not difficult to understand why the finance industry has already recruited skilled workers from quantitative disciplines to its ranks as Financial Quantitative Analysts.

data analytics jobs by occupation

“The average data science platform takes six to 12 months to complete, depending on the complexity and the discovery process,” noted Dovy Paustakys, a senior solutions architect consultant and developer of big data frameworks. “Those six to 12 months can help propel an organization to a value the organization could have only dreamed of. Afterwards, however, that talent may be squandered.”

Hiring people with the right skills for data science projects is no small feat, and the effort may not even be worth it. Many data science projects take between six and 12 months. It would be untenable to go through the entire hiring process, build a team of seasoned data science people, commence a project and move it into production, only to underutilize the team after completion.

In 2019, demand for data scientists and data engineers will continue to increase, and employers across the US will continue struggling to hire workers with these necessary skill sets. Stiff competition in attracting the best hires is hindering the time to value of many organizations’ big data projects. A quick search on Glassdoor reveals 84,548 open data engineer jobs, many exceeding $150k. Searching for “data scientist” returns 23,845 jobs, many also exceeding $150k. Geography is also a critical factor in talent scarcity. Companies far from the tech centers of San Francisco, the Northeast and Texas will find it more difficult to recruit the talent they need.

“Companies that are not located near tech centers are struggling to bring on the right people for their data science projects,” explains Caserta. “The benefit of working with a data and analytics consultancy, like Caserta, is that our talent is not limited by geography. Our consultants travel to our clients located all over the US and Canada.”

data science jobs by state

Organizations will need either to figure out how to attract data talent or to find other ways to fulfill the promise of big data. Technology consultancies have readily available teams of skilled data engineers, data scientists and data analysts that can come in, assess a project and then build it. Having worked on many data science projects, these teams have the knowledge and experience needed to reduce costs and time to value.

Competition for data science talent will continue to grow. By 2022 the global big data market is expected to reach a staggering $118.52 billion, growing at a CAGR of 26% from 2015 to 2022. Rapid growth in consumer data, advances in information security and improved business efficiencies are some of the factors fueling the market growth. By 2020 there are expected to be 364,000 new data science and analytics (DSA) job postings in the US.
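To see what a growth rate like that implies, compound annual growth can be worked backward from the end value. The sketch below is only arithmetic on the figures quoted above; the function names are illustrative, and treating 2015 to 2022 as seven compounding periods is an assumption.

```python
def implied_start_value(end_value: float, cagr: float, years: int) -> float:
    """Work a compound annual growth rate backward to the starting value."""
    return end_value / (1.0 + cagr) ** years


def compound_annual_growth_rate(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between a start value and an end value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted in the article: $118.52 billion by 2022, 26% CAGR from 2015.
# Assumption: 2015 to 2022 is treated as seven compounding periods.
start_2015 = implied_start_value(118.52, 0.26, 7)
print(f"Implied 2015 market size: ${start_2015:.1f} billion")
```

Under those assumptions the implied 2015 base is roughly $23.5 billion, meaning the market would grow about fivefold over the period.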

high paying data skills by occupation

The best path forward for analytics leaders looking to start a data science project or get one back on track is to partner with a data analytics consultancy. With a large pool of available talent, a consultancy will be able to complete the project on time and budget and advise on how to maintain going forward. This would translate into big-dollar savings, as it wouldn’t be necessary to carry the data science team payroll for years for skills that are no longer affordable or necessary.

“We’re helping many companies with limited resources move their data science projects into production,” notes Caserta. “The majority of firms don’t need to find the expensive scarce talent for short-term projects. Instead, they are better served finding a consultancy like Caserta that will do exactly what they need quickly and efficiently at a lower cost than a firm can do by carrying expensive talent for years.”

Machine Learning Is the Key to Cracking Marketing Attribution

Article originally posted on the Caserta Blog.

From the moment we open our eyes and check our notifications until the time we fall asleep with our devices carefully placed within arm’s reach, we are bombarded by a daily barrage of about 4,000 marketing messages. Each email, banner ad, social media post, direct mail piece, product placement and other marketing message is fighting a war for our attention, and the competition is fierce. The technology in the battle for our eyeballs is constantly advancing in a perpetual competition to push through the noise.

With all of these messages competing for attention, how would a marketer know which touch points, or which combinations of touch points, are most effective? Marketers need data frameworks that effectively gather data and deliver actionable insights to increase revenue, reduce costs and gain market share. Successful organizations are turning to outside technology consultants to transform their data intelligence strategies and foster a culture of data science.

In 2017 a staggering $206.77 billion was spent on media advertising in the United States. With such a hefty price tag on marketing messages, business users want to optimize their spend and know which touch points are most effective in the customer journey, and which ones squander precious resources. Despite massive marketing budgets, the majority of marketers still struggle to determine which touch points or combinations of touch points are most effective.

Many organizations are still using simplistic attribution models that hinder their ability to make data-driven decisions in the competition for converting customers. Organizations may lack the framework and technology needed to properly gather large quantities of data, stitch together each touch point in a customer journey, and understand and appreciate the contribution of each message. According to the 2017 State of Marketing Attribution report, many marketers may not even act on insights when they do have marketing attribution in place.
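To make the difference concrete, here is a minimal sketch of two rule-based attribution models. The article does not name specific models; last-touch is used here as one common example of a simplistic model, and linear as a simple multi-touch alternative. The journey and channel names are hypothetical.

```python
from collections import defaultdict


def last_touch(journey: list[str]) -> dict[str, float]:
    """Give all credit for a conversion to the final touch point."""
    return {journey[-1]: 1.0}


def linear(journey: list[str]) -> dict[str, float]:
    """Split credit for a conversion evenly across every touch point."""
    share = 1.0 / len(journey)
    credit: dict[str, float] = defaultdict(float)
    for channel in journey:
        credit[channel] += share
    return dict(credit)


# Hypothetical journey that ended in a conversion.
journey = ["direct_mail", "email", "social", "social"]
print(last_touch(journey))
print(linear(journey))
```

Last-touch credits only the final social post and ignores the mailing and the email entirely, which is the kind of blind spot a simplistic model creates. Neither rule learns anything from the data; both simply encode an assumption about how credit should be shared.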

Adopting a Culture of Measurement and Accuracy

Data champions who foster a culture of measurement and accuracy inside an organization are crucial to effective marketing attribution. According to the 2017 State of Marketing Attribution report, 80% of brands and 71% of agencies rate “creating a culture of measurement and accuracy” as a top-three marketing attribution issue.

However, even those organizations that are already gathering data and feeding it to analytics and business intelligence tools are still missing a powerful weapon in the marketing attribution arsenal: Machine Learning.

Use Case: Implementing Machine Learning with Spark for Marketing Attribution

An organization based in the U.S. wanted to develop an infrastructure that would enable it to gather, store, cleanse, consolidate and distribute data in order to move closer to its business goals. Its data came from offline, online and third-party sources. The organization approached Caserta, a technology consulting and implementation firm, with its data challenges and marketing goals. Full disclosure: I’m the VP of Marketing at Caserta, and I’m blown away by our bold tech solutions.

Marketers beware, tech talk ahead.

In order for the organization to perform advanced analytics and Machine Learning, Caserta built an all-purpose data lake with Spark, an analytics framework, to conduct the big data transformations. Spark can power not only the Business Intelligence queries that organizations are already accustomed to, but also Machine Learning processes. Spark comes with a machine learning toolkit, MLlib, which is used to train models for predictive analytics. However, these algorithms need vast computational power to do so. Caserta opted to use Databricks, with its managed Spark service, to manage the computational infrastructure and propel discovery on the data lake. Databricks’ technology can simply request more machines from AWS to power Spark at the scale the data requires, giving nearly unlimited processing capacity.

Now that all offline, online and third-party data sources are inside a single all-purpose data lake with a managed Spark service, the organization’s data scientists are free to analyze, transform and model data without the constraints of scaling tools. Removing this roadblock increases the room to innovate and will help marketers discover their optimum marketing mix. Organizations that promote a data culture will celebrate marketers that know how to derive insights from Machine Learning and take action.

As long as organizations continue to use simplistic attribution models, or fail to use attribution at all, they will cede success and market share to those organizations that have the tech to understand what actually works in their marketing channels. Marketers will increasingly depend on Machine Learning to provide them with deep insights into the customer journey. The data may reveal, perhaps, that three direct mailings, one email open and five social media post views are your organization’s perfect recipe for converting a first-time customer. Aren’t you curious which yet-to-be-discovered combination of marketing touches is your magic mix for customer acquisition?
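As a rough illustration of what looking for a "magic mix" means, the sketch below tabulates the conversion rate for each combination of touches in a set of journeys. This is plain counting, not Machine Learning and not the Spark implementation described above; a trained model would be needed to generalize beyond the combinations actually observed. The data and function name are hypothetical.

```python
from collections import Counter


def conversion_rate_by_mix(journeys):
    """Return {touch mix: conversion rate}.

    A mix is a sorted tuple of (channel, count) pairs, so the order in which
    the touches happened is ignored.
    """
    seen = Counter()
    converted = Counter()
    for touches, did_convert in journeys:
        mix = tuple(sorted(Counter(touches).items()))
        seen[mix] += 1
        if did_convert:
            converted[mix] += 1
    return {mix: converted[mix] / seen[mix] for mix in seen}


# Hypothetical data: (touch points in the journey, whether it converted).
journeys = [
    (["direct_mail", "email", "social"], True),
    (["social", "direct_mail", "email"], True),
    (["email", "social", "direct_mail"], False),
    (["email", "email"], False),
]
rates = conversion_rate_by_mix(journeys)
best_mix = max(rates, key=rates.get)
print(best_mix, rates[best_mix])
```

With real volumes, most combinations would appear only a handful of times, which is one reason a model that shares information across similar journeys is more useful than a raw table like this.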

For more information, check out this webinar: Using Machine Learning & Spark to Power Data-Driven Marketing.

Can you answer the 20 million euro question?

This post originally appeared on the Caserta Data Blog.

The new GDPR will change the way organizations worldwide handle data.

Be honest with yourself for just a moment. I’m going to ask you a few questions, and if you can answer them positively, then your organization is in good shape. If not, then this article is for you.

If Anna Schmidt, a customer from Berlin, contacted your organization and asked to be forgotten and to have her data expunged, could you actually do it? Would you really know where all of her data is located in order to delete it? Most likely her data is spread across multiple data sets in your organization, and you may not even be aware of all of them. How could you afford Frau Schmidt the right to be forgotten, as granted in the GDPR, if you don’t know where all her data is in the first place?
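The problem can be sketched in a few lines. The example assumes, optimistically, that every data set is already known and that every record carries the same subject identifier; in practice, building that inventory is the hard part. The data set names, field names and functions are all hypothetical.

```python
def locate_subject(data_sets, subject_id):
    """Return the names of data sets holding at least one record for the subject."""
    return [
        name
        for name, records in data_sets.items()
        if any(r.get("subject_id") == subject_id for r in records)
    ]


def erase_subject(data_sets, subject_id):
    """Remove the subject's records in place; return the count removed per data set."""
    removed = {}
    for name, records in data_sets.items():
        kept = [r for r in records if r.get("subject_id") != subject_id]
        if len(kept) != len(records):
            removed[name] = len(records) - len(kept)
            records[:] = kept
    return removed


# Hypothetical data sets; names and fields are illustrative only.
data_sets = {
    "crm": [{"subject_id": "anna-schmidt", "city": "Berlin"}],
    "email_list": [{"subject_id": "anna-schmidt"}, {"subject_id": "other"}],
    "web_logs": [{"subject_id": "other"}],
}
found = locate_subject(data_sets, "anna-schmidt")
removed = erase_subject(data_sets, "anna-schmidt")
print(found, removed)
```

Any data set missing from the inventory, or keyed by an email address or cookie instead of a shared identifier, would silently survive the erase step.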

GDPR protects people’s privacy.

The GDPR, or General Data Protection Regulation, is a new set of laws that requires organizations to protect the data and privacy of EU citizens. This affects companies in the EU as well as those worldwide that hold data on EU citizens, including employees, clients and prospects. Companies in the United States with users like Anna Schmidt in Germany would need to comply with the new regulation, and the penalties for not complying are severe.

Companies outside the EU that wish to continue to hold data on EU citizens and do business with them will need to have a representative in the EU. According to the regulation: “Non-EU businesses processing the data of EU citizens will also have to appoint a representative in the EU.”

GDPR requires that you allow people to exercise the right to be forgotten, but that’s not all. For all of your marketing campaigns and email lists, can you clearly demonstrate “proof of consent” to be marketed to, stored in a way that makes it easy to access? What if Anna Schmidt says that she never consented to give you her data in the first place? Could you prove that she did?
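One way to picture easily accessible proof of consent is a record per person and purpose that can be queried for any point in time. The sketch below is an illustration only; the fields are assumptions, not a statement of what the regulation requires, and the names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ConsentRecord:
    subject_id: str
    purpose: str          # what was consented to, e.g. "email_marketing"
    granted_at: datetime  # when consent was given
    source: str           # where it was captured, e.g. a form name
    withdrawn_at: Optional[datetime] = None


def has_consent(records, subject_id, purpose, at):
    """True if some record shows consent for this purpose that was active at `at`."""
    return any(
        r.subject_id == subject_id
        and r.purpose == purpose
        and r.granted_at <= at
        and (r.withdrawn_at is None or at < r.withdrawn_at)
        for r in records
    )


# Hypothetical record for illustration.
records = [
    ConsentRecord("anna-schmidt", "email_marketing",
                  datetime(2018, 1, 10, tzinfo=timezone.utc), "signup_form"),
]
campaign_date = datetime(2018, 5, 25, tzinfo=timezone.utc)
print(has_consent(records, "anna-schmidt", "email_marketing", campaign_date))
```

Keeping the capture source and timestamps alongside the consent itself is what turns "we think she opted in" into something that can be shown on request.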

GDPR compliance is not a trivial task.

Before Frau Schmidt wants to be forgotten by your firm, she may first want to see what information you have on her. Is your business able to package all of her data and transfer it to her on-demand? Not such a simple feat.

If you’re unsure about your answer to any of the previous three questions, you’re not ready for the GDPR, and you’re not alone. Around 66% of businesses surveyed say they aren’t sure if they can erase a person’s data by the GDPR deadline, according to a survey by Solix Technologies.

Penalties for non-compliance are steep.

The deadline for organizations to comply with the stringent new GDPR privacy laws is May 25, 2018. What would happen, however, if a company doesn’t comply with the GDPR? After all, the regulation reaches deep into a company’s data, and complying by manually updating records would take years. An automated big data approach would help to properly tag and identify data in order to be compliant.

According to the GDPR, the penalty for not complying is a fine up to €20 million or 4% of global annual turnover—whichever is higher. Many organizations aren’t doing enough to prepare for the GDPR deadline. Until the EU starts fining companies, many may not take proper action. In order to avoid harsh penalties ahead, your organization needs to have a comprehensive data compliance strategy in place that answers the demands of the GDPR.
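The "whichever is higher" rule is easy to express as a one-line calculation. The sketch below encodes only the ceiling as stated above; the function name is illustrative, and the amount of any actual fine would be decided by the regulator.

```python
def max_gdpr_fine_eur(global_annual_turnover_eur: float) -> float:
    """Upper bound on the fine as stated in the article:
    EUR 20 million or 4% of global annual turnover, whichever is higher."""
    return max(20_000_000.0, 0.04 * global_annual_turnover_eur)


# Hypothetical turnover figures for illustration.
for turnover in (100_000_000, 500_000_000, 1_000_000_000):
    print(f"turnover EUR {turnover:,}: ceiling EUR {max_gdpr_fine_eur(turnover):,.0f}")
```

Under this rule the flat €20 million figure applies up to €500 million in turnover, and the 4% term takes over above that.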