Scarcity of Data Science Unicorns Is Stifling Business Growth

Article originally posted on

Businesses from every vertical are partnering with data science consultancies that have readily available teams of skilled data scientists.

At a symposium on data and analytics at MIT, I heard Ron Bodkin, the technical director of applied Artificial Intelligence at Google, speak about the challenges of Machine Learning. The secret to success, Ron noted, is not only having sufficient computation resources, but also having the right talent, as ML practitioners are hard to find and expensive to hire.

Data science and analytics take a village—and not just any village, but a highly skilled data science village at that. However, with the lack of skilled workers available, many organizations will find themselves in a data science ghost town.

Ron Bodkin delivering Keynote at MIT Symposium

In order to launch a successful data science project, an organization must have sufficient staff consisting of both data scientists and data engineers. Data scientists are the rare mythical unicorns who understand the business while being able to create sophisticated analytical models that build new datasets and derive new insights from data. These data Swiss-army knives have been in short supply since 2012, when the role was dubbed the sexiest job of the 21st century.

Data engineers are responsible for much of the work required to support a data science workload. They are highly sought after, scarce, and currently so essential to supporting data science, that some industry experts crowned 2018 as the year of the Data Engineer. Experts in architecting, building and maintaining data-based systems, Data Engineers perform much of the heavy-lifting that supports an organization’s analytical and transactional operations.

“Typically, to run a successful data science project, it takes a ratio of two data engineers for every data scientist,” says Joe Caserta, president and founder of Caserta, a strategic data and analytics consultancy. “We’re seeing a lack of engineering talent in the market. Organizations seeking to reduce time-to-value are coming to us not just for a strategic assessment and roadmap for their data science initiatives, but also to implement them with our skilled data engineers.”

Not only is it hard to find data talent, but context is also crucial when it comes to the degree of hiring difficulty. Take the statistical programming language R, for example. Although it is one of the most commonly requested Data Scientist skills, its scarcity among Finance and Risk Analytics Managers makes proficiency in R one of the highest-paying skills for this role. It’s not difficult to understand why the finance industry has already recruited skilled workers from quantitative disciplines to its ranks as Financial Quantitative Analysts.

data analytics jobs by occupation

“The average data science platform takes six to 12 months to complete, depending on the complexity and the discovery process,” noted Dovy Paustakys, a senior solutions architect consultant and developer of big data frameworks. “Those six to 12 months can help propel an organization to a value it could have only dreamed of; afterwards, however, that talent may be squandered.”

Hiring people with the right skills for data science projects is no small feat, and the effort may not even be worth it. Many data science projects take six to 12 months. It would be untenable to go through the entire hiring process, build a capable data science team, commence a project and move it into production, only to leave that team underutilized after completion.

In 2019, demand for data scientists and data engineers will continue to increase, and employers across the US will continue struggling to hire workers with these necessary skillsets. Stiff competition in attracting the best hires is hindering the time to value of many organizations’ big data projects. A quick search on Glassdoor reveals 84,548 open data engineer jobs, many exceeding $150k. Searching for “data scientist” returns 23,845 jobs, many also exceeding $150k. Geography is also a critical factor in talent scarcity. Companies far from the tech centers of San Francisco, the Northeast and Texas will find it more difficult to recruit the talent they need.

“Companies that are not located near tech centers are struggling to bring on the right people for their data science projects,” explains Caserta. “The benefit of working with a data and analytics consultancy, like Caserta, is that our talent is not limited by geography. Our consultants travel to our clients located all over the US and Canada.”

data science jobs by state

Organizations will either need to figure out methods to attract data talent or find other methods to fulfill the promise of big data. Technology consultancies have readily available teams of skilled data engineers, data scientists and data analysts that can come in, assess a project and then build it. Having worked on many data science projects, teams have the knowledge and experience needed to reduce costs and time to value.

Competition for data science talent will continue to grow. By 2022 the global Big Data market is expected to reach a staggering $118.52 billion, growing at a CAGR of 26% from 2015 to 2022. Rapid growth in consumer data, advances in information security and improved business efficiencies are some of the factors fueling the market growth. By 2020 there are expected to be 364,000 new data science and analytics (DSA) job postings in the US.
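
The arithmetic behind those projections is easy to verify. Here is a quick sketch in Python; the 2015 base figure below is derived from the quoted numbers, not taken from the report itself:

```python
def implied_start_value(end_value, cagr, years):
    """Work backwards from an end value and a CAGR to the implied starting value."""
    return end_value / (1 + cagr) ** years

def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1

# $118.52B in 2022, growing at a 26% CAGR over the 7 years from 2015
start = implied_start_value(118.52, 0.26, 2022 - 2015)
print(f"Implied 2015 market size: ${start:.2f}B")  # roughly $23.5B
```

Working the number both directions is a useful sanity check on any market-size headline.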

high paying data skills by occupation

The best path forward for analytics leaders looking to start a data science project or get one back on track is to partner with a data analytics consultancy. With a large pool of available talent, a consultancy will be able to complete the project on time and budget and advise on how to maintain going forward. This would translate into big-dollar savings, as it wouldn’t be necessary to carry the data science team payroll for years for skills that are no longer affordable or necessary.

“We’re helping many companies with limited resources move their data science projects into production,” notes Caserta. “The majority of firms don’t need to find the expensive scarce talent for short-term projects. Instead, they are better served finding a consultancy like Caserta that will do exactly what they need quickly and efficiently at a lower cost than a firm can do by carrying expensive talent for years.”

Machine Learning Is the Key to Cracking Marketing Attribution

Article originally posted on the Caserta Blog.

From the moment we open our eyes and check our notifications until the time we fall asleep with our devices carefully placed within arm’s reach, we are bombarded by a daily barrage of about 4,000 marketing messages. Each email, banner ad, social media post, direct mail piece, product placement, and other marketing message is fighting a war for our attention—and the competition is fierce. The technology in the battle for our eyeballs is constantly advancing in a perpetual competition to push through the noise.

With all of these messages competing for attention, how would a marketer know which touch-points, or combinations of touch-points, are most effective? Marketers need data frameworks that effectively gather data and deliver actionable insights to increase revenue, reduce costs and gain market share. Successful organizations are turning to outside technology consultants to transform their data intelligence strategies and foster a culture of data science.

In 2017 a staggering $206.77 billion was spent on media advertising in the United States. With such a hefty marketing-message price tag, business users would want to optimize their spend and know which touch-points are most effective in the customer journey, and which ones squander precious resources. Despite massive marketing budgets, the majority of marketers still struggle to attribute which touch-points or combinations of touch-points are most effective.

Many organizations are still using simplistic attribution models that hinder their ability to make data-driven decisions in the competition for converting customers. Organizations may lack the framework and technology needed to properly gather large quantities of data, stitch together each touch-point in a customer journey, and understand the contribution of each message. And according to the 2017 State of Marketing Attribution report, many marketers fail to act on insights even when attribution is in place.
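
The gap between a simplistic model and a multi-touch one is easy to see in code. Here is a minimal sketch comparing last-touch and linear attribution; the customer journey is invented for illustration:

```python
from collections import defaultdict

# A hypothetical customer journey: the ordered touch-points before a conversion
journey = ["social_post", "banner_ad", "email", "direct_mail", "email"]

def last_touch(journey):
    """Simplistic model: 100% of the credit goes to the final touch-point."""
    return {journey[-1]: 1.0}

def linear(journey):
    """Multi-touch model: credit is split evenly across every touch-point."""
    credit = defaultdict(float)
    for touch in journey:
        credit[touch] += 1 / len(journey)
    return dict(credit)

print(last_touch(journey))  # email gets all the credit
print(linear(journey))      # email gets 0.4, every other touch-point 0.2
```

Last-touch erases the contribution of every earlier message; even this naive linear split already tells a richer story, and data-driven models go much further.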

Adopting a Culture of Measurement and Accuracy

Data champions who foster a culture of measurement and accuracy inside an organization are crucial to effective marketing attribution. According to the 2017 State of Marketing Attribution report, 80% of brands and 71% of agencies rate “creating a culture of measurement and accuracy” as a top-three marketing attribution issue.

However, even those organizations that are already gathering data and feeding it to analytics and business intelligence tools are still missing a powerful weapon in the marketing attribution arsenal: Machine Learning.

Use Case: Implementing Machine Learning with Spark for Marketing Attribution

An organization based in the U.S. wanted to develop an infrastructure that would enable it to gather, store, cleanse, consolidate and distribute data in order to propel it closer to its business goals. Its data comprised offline, online and third-party sources. The organization approached Caserta, a technology consulting and implementation firm, with its data challenges and marketing goals. Full disclosure: I’m the VP of Marketing at Caserta, and I’m blown away by our bold tech solutions.

Marketers beware, tech talk ahead.

In order for the organization to perform advanced analytics and Machine Learning, Caserta built an all-purpose data lake with Spark, an analytics framework, to conduct the big data transformations. Spark can power not only the Business Intelligence queries organizations are already accustomed to, but also Machine Learning processes. Spark comes with a machine learning toolkit, MLlib, which creates trained models for predictive analytics. However, these algorithms need vast computational power. Caserta opted to use Databricks to manage the computational infrastructure and propel discovery on the data lake using its managed Spark service. Databricks can simply request more machines from AWS to power Spark at the scale the data requires, giving nearly infinite processing ability.
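
MLlib’s trained models are far richer than anything a blog post can show, but the core idea, learning a weight per touch-point to predict conversion, can be sketched in plain Python. This is a toy stand-in with invented data and hand-rolled gradient descent, not Caserta’s actual pipeline; a real implementation would use Spark MLlib’s LogisticRegression over the data lake:

```python
import math

# Toy training data: per-customer touch-point counts [emails, ads, social posts]
# and whether the customer converted. Invented for illustration only.
X = [[3, 0, 1], [0, 2, 0], [4, 1, 2], [1, 0, 0], [2, 3, 1], [0, 1, 0]]
y = [1, 0, 1, 0, 1, 0]

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def train(X, y, lr=0.1, epochs=2000):
    """Fit logistic-regression weights with plain stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = pred - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

w, b = train(X, y)
# Score a new customer who saw 3 emails, 1 ad and 1 social post
score = sigmoid(sum(wj * xj for wj, xj in zip(w, [3, 1, 1])) + b)
print(f"Conversion probability: {score:.2f}")
```

The per-touch-point weights the model learns are exactly the kind of attribution signal the marketers in this story are after, just at a fraction of the scale.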

Now that all offline, online and third-party data sources are inside a single all-purpose data lake with a managed Spark service, the organization’s data scientists are free to analyze, transform and model data without the constraints of scaling tools. Removing this roadblock increases the room to innovate and will help marketers discover their optimum marketing mix. Organizations that promote a data culture will celebrate marketers that know how to derive insights from Machine Learning and take action.

As long as organizations continue to use simplistic attribution models, or fail to use attribution at all, they will cede success and market share to organizations that have the tech to understand what actually works in their marketing channels. Marketers will increasingly depend on Machine Learning to provide them with deep insights into the customer journey. The data may reveal that three direct mailings, one email open and five social media post views is your organization’s perfect recipe for converting a first-time customer. Aren’t you curious what your yet-to-be-discovered magic mix of marketing touches for customer acquisition is?

For more information, check out this webinar: Using Machine Learning & Spark to Power Data-Driven Marketing.

Can you answer the 20 million euro question?

This post originally appeared on the Caserta Data Blog.

The new GDPR regulation will change the way organizations worldwide handle data.

Be honest with yourself for just a moment. I’m going to ask you a few questions, and if you can answer positively, then your organization is in good shape. If not, then this article is for you.

Download GDPR white paper>>

If Anna Schmidt, a customer from Berlin, contacted your organization and requested to be forgotten and her data expunged, could you actually do it? Would you really know where all of her data is located in order to delete it? Most likely her data is found in multiple data sets across the organization, and you may not even be aware of all of them. How could you afford Frau Schmidt the right to be forgotten, as granted in the GDPR, if you don’t know where all her data is in the first place?

GDPR protects people’s privacy.

The GDPR, or General Data Protection Regulation is a new set of laws that require organizations to protect the data and privacy of EU citizens. This affects companies both in the EU and those worldwide with data on EU citizens including employees, clients and prospects. Companies in the United States with users, like Anna Schmidt in Germany, would need to comply with the new regulation—and the penalties for not complying are severe.

Companies outside the EU that wish to continue to hold data on EU citizens and do business with them will need a representative in the EU. According to the regulation: “Non-EU businesses processing the data of EU citizens will also have to appoint a representative in the EU.”

GDPR requires that you allow people to exercise the right to be forgotten, but that’s not all. For all of your marketing campaigns and email lists, can you clearly demonstrate “proof of consent” to be marketed to, stored in a way that makes it easy to access? What if Anna Schmidt says that she never consented to giving you her data in the first place? Could you prove it?

GDPR compliance is not a trivial task.

Before Frau Schmidt wants to be forgotten by your firm, she may first want to see what information you have on her. Is your business able to package all of her data and transfer it to her on-demand? Not such a simple feat.

If you’re unsure about any of the previous three questions, you’re not ready for the GDPR, and you’re not alone. Around 66% of businesses surveyed say they aren’t sure if they can erase a person’s data by the GDPR deadline, according to a survey by Solix Technologies.

Penalties for non-compliance are steep.

The deadline for organizations to comply with the new stringent GDPR privacy laws is May 25th, 2018. What would happen, however, if a company doesn’t comply with the GDPR? After all, the regulations reach deep into a company’s data, and complying by manually updating records would take years. An automated big data approach would help to properly tag and identify data in order to be compliant.

According to the GDPR, the penalty for not complying is a fine up to €20 million or 4% of global annual turnover—whichever is higher. Many organizations aren’t doing enough to prepare for the GDPR deadline. Until the EU starts fining companies, many may not take proper action. In order to avoid harsh penalties ahead, your organization needs to have a comprehensive data compliance strategy in place that answers the demands of the GDPR.
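
The fine formula itself, the titular 20 million euro question, is simple to express. A quick sketch, with turnover figures invented for illustration:

```python
def gdpr_max_fine(global_annual_turnover_eur):
    """Upper tier of GDPR administrative fines: EUR 20 million or 4% of
    global annual turnover, whichever is higher."""
    return max(20_000_000, 0.04 * global_annual_turnover_eur)

# A company with EUR 300M turnover: 4% is EUR 12M, so the EUR 20M floor applies
print(f"€{gdpr_max_fine(300_000_000):,.0f}")

# A company with EUR 2B turnover: 4% is EUR 80M, which exceeds the floor
print(f"€{gdpr_max_fine(2_000_000_000):,.0f}")
```

The “whichever is higher” clause is the sting: the €20 million floor means even mid-sized firms face the full figure, while for large enterprises the 4% term dominates.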


Webinars Don’t Get the Love that They Deserve

At a recent computing convention where we exhibited in San Jose, our CEO Eli struck up a conversation with one of the attendees at the show. The bold-faced company name on the attendee’s badge jumped out at Eli. This was a company that our sales team had prospected and sought to connect with a few times, to no avail. After Eli used the standard opener, asking whether the prospect had heard of our company, Jethro, the attendee answered, “Of course! I saw your webinar and I was intrigued by your tech. Tell me more.” A week later we moved forward with a POC.

Webinars can be one of your most effective B2B marketing media, regardless of whether your audience is like Jethro’s, whose main personas are all niche techies like database architects and CTOs, a B2B healthcare startup focused on conquering cancer, or a boutique social media marketing agency that wants to inform you about the Top 10 Ways to Slay on Snapchat.

More than just hard-sell lead generating opportunities, live webinars are effective as a conversational long-term approach. A successful webinar will be about an interesting topic that is relevant to your audience and delivered by an authority on the subject matter.

Like all marketing campaigns, there are good webinars and bad webinars. In this post I’ll assume that you’re going to produce a kick-ass webinar using your creative marketing super-powers. I’m going to presume that your webinar will have an engaging topic or case study performed by a well-seasoned host or hosts who are comfortable with their topic and not afraid of public speaking. You can’t pull ROI out of thin air—plan this out and it’ll come back to you exponentially.

More than 60% of marketers include webinars as part of their content marketing programs, according to the CMI. However, in a research survey by Ascend2, only 30% of marketers say that webinars and online events are effective. These two stats are far from watertight, but they point to the fact that webinars are highly underrated! Many marketers don’t understand the full picture of a webinar’s effectiveness and how it extends beyond the 30-minute live session. The topics from research, white papers and case studies make effective content for webinars, which are then repackaged as video content. You can quickly see how webinars combine different marketing media and easily produce subsequent content. Webinars kill two or three content marketing birds with one stone!

Most Difficult Types of Content to Create

Although marketers may think that a webinar is difficult to produce, the benefits of a webinar go beyond short-term customer conversion and have far reaching effects. As a content marketing tool, webinars work simultaneously on multiple parts of the funnel. The audience is exposed to your brand at multiple touch points via promotion of the webinar, the webinar itself and subsequent follow up touch points. Potential leads will happily hand over their coveted contact info in exchange for a well thought-out and organized webinar—especially about industry trends by an authority on the topic.

Brand-building is an unavoidable (and desirable) side effect of a webinar and as such should stay within your brand guidelines. As an interactive and immersive content experience, webinars empower you to build brand awareness and brand equity, and to establish trust by positioning yourself as an authority in your field. If you’re that cancer-conquering B2B healthcare startup, you’ll want to do a webinar with doctors and researchers who have the credibility necessary for your product. Make sure your webinar is structured in an enjoyable and informative way with real added value for the audience. You want viewers to finish the webinar feeling that this was a high-value way to spend their time, that they can trust your brand and that they’d like to do business with you.

The most effective webinars are interactive and conversational. Audience participation is a two-way street that not only enriches the webinar topic, but also lends insight into what potential clients are thinking. At our company, we always review the questions that participants ask and see how we can update our marketing materials to assuage their concerns. Sometimes questions indicate industry trends that are becoming more prevalent and should be addressed.

A Live Webinar is an Immersive Virtual Happening

Hosting events and meet-ups can be effective ways to meet prospects in the real world and forge new relationships with potential clients and partners. For smaller companies that can’t afford the overhead of hosting such a large event, as well as larger companies that want to be more targeted, a webinar is the perfect event alternative. A webinar is the most immersive content experience currently available. I’ve yet to experience a VR presentation, although that might not be far behind.

Much of the same marketing groundwork needs to be done to promote online and offline events. You need to brainstorm the idea behind the event or talk and make sure that it will be relevant and interesting to your client base. You’ll need to secure hosts, send out email invitations, build a landing page and set up lead-collection forms in your marketing automation software or CRM. That’s already quite a lot of work to accomplish—especially for a smaller marketing team. By this point (after a FOMO-inducing reminder email) you’re ready to start the webinar at fairly little cost. If you’re orchestrating a real-world event or meet-up, this is just the beginning.

Webinars Are Cost-Effective with High ROI

The logistics and operations surrounding a real-world event pose a barrier to pulling it off at all. At a previous company I was with, before webinars were a big thing, the CEO was dead set on doing a product roadshow. We were a small marketing department, and the resources necessary for pulling this off were draining and detrimental to our other marketing activities. The events were costly both in man-hours and in cold hard cash, and after two events we decided to call it quits. We spent half a year’s worth of our online advertising budget on two events and had zero clients to show for it.

Webinars, on the other hand, are convenient, ROI-friendly marketing endeavors because they’re inexpensive to produce and deliver high gains. According to ReadyTalk, the average cost of conducting a webinar is between $100 and $3,000, depending on promotion and technology costs. Today the tech to produce webinars is dirt cheap and high-quality. We use Fuze, which for $40/month will give you unlimited HD webinars for up to 250 people, with HD recordings. That’s quite a departure from the clunky software of just a few years ago.

Not only do you see rewards from the live webinar itself, but also from all of the marketing activities surrounding and promoting it. ROI is also higher with webinars due to their wider reach. A potential real-world attendee may want to go to your spectacular event, but his kid’s ballet recital is at the same time across town… or in another town… or time zone. Residing online, the webinar knows no boundaries, and a recorded webinar lives on, further bolstering ROI as more and more people consume your content on-demand without any time constraints.
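
The cost-per-lead arithmetic behind that wider reach is simple to sketch. The cost falls inside the ReadyTalk range quoted above, but the attendance figures are invented for illustration:

```python
def cost_per_lead(total_cost, leads):
    """Total webinar cost divided by the number of leads it produced."""
    return total_cost / leads

# Hypothetical: a $1,500 webinar draws 120 live registrants, and the recording
# picks up another 80 on-demand viewers over the following months
live_cpl = cost_per_lead(1500, 120)
total_cpl = cost_per_lead(1500, 120 + 80)
print(f"CPL counting live attendees only: ${live_cpl:.2f}")   # $12.50
print(f"CPL including on-demand viewers:  ${total_cpl:.2f}")  # $7.50
```

Because the recording keeps collecting leads at zero marginal cost, the effective cost per lead only falls over time.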

A Webinar Is the Gift That Keeps on Giving

When done right, a webinar is far more than just the half-hour that you’re on the air. Like I said before, in order to promote a webinar, you need to send out emails, create landing pages and possibly even run campaigns around it. The webinar in and of itself is a happening that creates buzz! A live event drums up excitement when the topic is relevant and the host is an authority who can speak to it. If you’re a marketing manager at a tech company, find a product manager with the cred to give the talk.

After your webinar, you have the added benefit of not only delivering the recorded webinar, but also creating more opportunities to further fuel prospects’ journey down the funnel in real time. Your sales team will now have a list of qualified leads that can be contacted directly and encouraged to convert.

In addition to the buzz generated from promoting the webinar, you also have the residual webinar aftershocks. This should be treated as a new piece of content and promoted as such. Since you have a recorded copy (you remembered to hit record, right?!) you now have content to put on your YouTube channel, push on social media, email both to the attendees and also to your entire list, and possibly include in future drip marketing campaigns. From one marketing channel, you now have six or more!

Non-gated webinars boost SEO and generate results. I see leads coming in that originated from searching specifically for webinars on a certain topic and also from searches with keywords from the webinar. When posting the webinar, it’s always best to set up a dedicated keyword page complete with descriptions about the webinar.

BI on Hadoop in 2016 – The Elephant in the Room

This article was originally posted on the Jethro blog.

Last year it seemed like every organization was talking about harnessing the power of big data to gain the crucial business insights required to make data-driven decisions. The need to blend big data sets and analyze them at the finest resolution has become standard practice. However, the pursuit of gathering and analyzing big data has ushered in a new set of challenges. Even as data sets burgeon in size and complexity, people still expect to interact with the data through their BI and visual data discovery tools (Tableau, Qlik, MicroStrategy, etc.) at the speed of thought. In our “world of now,” no one likes to try to drill down into their data only to sit through agonizing minutes waiting for queries to stream back from Hadoop to their data visualization or BI tool.

In 2016 we are going to see individuals and organizations demanding that the data discovery process accommodate more massive and more varied datasets and allow for more complex analysis—all at interactive speeds. Business users are demanding self-service tools that require no IT assistance and enable superior flexibility in their data exploration, so they can derive actionable business insights. The rigidity of yesterday’s partial extracts and predefined cubes is not going to quench the business user’s thirst for complex data discovery and analysis. IT will no longer be able to keep up with business users’ demands, leaving them to seek out a more flexible and sustainable solution.


Hadoop by design was not intended for interactive BI use, and SQL-on-Hadoop tools scramble to serve the full range of analytic and ETL use-cases. Each SQL-on-Hadoop solution has its own unique properties and best use-cases. For the unique needs of BI and data discovery, however, Jethro takes the cake. TPC-DS live benchmarks using Tableau and Qlik have shown that Jethro has the fastest query response time. Even on 2.9 billion rows of data, Jethro enables business users to interactively analyze data at the speed of thought. Jethro also gives BI users the most flexibility, as its index-based architecture indexes every single column, alleviating the need for limiting predefined extracts and cubes.
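
Why does indexing every column matter for interactive BI? The idea can be illustrated with a toy inverted index; this is a drastic simplification for illustration, not Jethro’s actual implementation:

```python
from collections import defaultdict

# A tiny invented fact table
rows = [
    {"region": "East", "product": "A", "revenue": 100},
    {"region": "West", "product": "B", "revenue": 250},
    {"region": "East", "product": "B", "revenue": 175},
]

# Build an inverted index over every column: (column, value) -> set of row ids
index = defaultdict(set)
for row_id, row in enumerate(rows):
    for column, value in row.items():
        index[(column, value)].add(row_id)

# A filter on any combination of columns intersects small index entries
# instead of scanning every row, which is what makes drill-downs fast.
matches = index[("region", "East")] & index[("product", "B")]
print([rows[i] for i in matches])
```

Because every column is indexed, a user can slice by any ad-hoc combination of filters without a predefined extract or cube, which is the flexibility the paragraph above describes.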

Certain BI tools, like MicroStrategy, can tap directly into Hadoop via native connectivity with HDFS, but big data performance issues surface while dealing with large and complex data sets. These users would benefit from an indexing and caching layer that would accelerate the queries from Hadoop.


Data will always exist in multiple places with a variety of sources—even as Hadoop becomes more widely adopted. In order to allow for boundless data discovery and the highest granularity, SQL-on-Hadoop solutions will need to ingest data from both Hadoop and non-Hadoop sources. Jethro grants BI tools access to any data required, as it ingests data from any data source including traditional structured data.

2016 will be about addressing the elephant in the room and finding ways to overcome the BI-on-Hadoop hurdles in order to empower the business user to discover, visualize, analyze and derive priceless data-driven business insights to propel business forward.