The reality of most companies using data science is akin to blind men describing an elephant. I’ve seen this both from my days in education and now working in IT marketing. That’s why we need to separate data science, a useful field, from data-ism, the cargo cult of non-scientists invoking data for ill-informed decisions.
Data science is a fine art fraught with uncertainty, biases and outright bullshit. Nate Silver seems like he’s wrong or hedging his bets more often than he’s right. Proving causation, even in ‘obvious’ cases like smoking and climate change, is difficult. Taleb and others have devoted thousands of pages to the shortcomings of predicting based on past data.
The picture that companies selling ‘data-driven solutions’ present is like the TV show CSI: entertainment based on science fiction that leaves the viewer convinced what they saw was real science. The same thing is happening with managerial types. They eat up marketing spiels about big data and uncritically take every last word as gospel truth. This is data-ism.
The assumptions of data-ism
- Collecting and storing as much data as possible is an inherently good thing. No serious discussions on the value of the data or the risks associated with it are allowed.
- All decisions need to be made based on data.
- The lower you are in the hierarchy the more you are judged by data.
- Data is whatever someone higher on the ladder says data is.
- Criticism of the system is not allowed.
Data-ism reinforces top-down management by pretending the whole charade is scientific. A loose definition of data (whatever management claims is data) and ruthlessly judging only low ranking employees using metrics is man behind the curtain. The classic example is firing productive employees for asinine reasons while upper management gets handsomely rewarded for driving a company into the ground.
Most managerial types can handle basic spreadsheet formulas and read glossy publications with graphs. That’s the extent of their training in data science. In practice, this means that ‘data-driven decisions’ come from blog posts and marketing materials.
A managerial type reads a blog post about some unrelated businesses killing it with Facebook ads. These posts are everywhere. While some can be entertaining and informative, they need to be taken with a massive grain of salt: All data is self-reported, negative data is usually ignored and there’s usually a call to action at the bottom — buy my viral marketing guru course. Nonetheless, the ill-fated ad campaign must go on! When it inevitably fails, the designer or copywriter is held accountable for not A/B testing two virtually identical versions of the campaign.
Faux scientific data is the norm on company landing pages. Let’s take a look at HubSpot’s purported ROI.
A few of the laughable points here:
- These growth numbers came during a great economy. You could be doing everything wrong and still grow.
- There’s no control group or analysis of other CRMs.
- HubSpot ain’t cheap, it’s not clear that 2.1 times more visitors per month actually pays for itself.
- All numbers are self reported with no third party verification of the data.
While a data scientist would get a chuckle out of this, actual decision makers treat these PDFs as sacred. Arguing against these ‘statistics’ becomes a futile exercise in circular logic. If you point out that you can save yourself hundreds of dollars and use a spreadsheet to do 90% of what HubSpot does, you’ll immediately be shot down since you don’t have ‘data’. If you’ve used HubSpot for over a year with no result, well obviously you just need to buy a more expensive plan!
This is data theater in the same vein as the TSA’s security theater. No amount of failed audits or comparisons with my sane countries will convince a true believer that the TSA is worthless. I see this myself and have heard it anecdotally from enough people that I don’t believe my experience is isolated.
The most common fallacies
My story with HubSpot illustrates the problems of not having a control and using self-reported data. A more full list of common errors managerial types always make:
- No control group: maybe people not using the product did better!
- Self-reported data: is somebody selectively sharing data?
- Small sample sizes: valid A/B testing and NPS require hundreds, even thousands of test subjects.
- Mistaken causality: correlation isn’t causation.
- Vanity metrics: getting likes on Facebook isn’t the same as selling something.
- Goodhart’s Law: when a metric becomes a target.
While not exactly a data fallacy, bikeshedding is usually a part of any data discussion with managerial types. A few meaningless but easy to understand metrics are picked, and everybody spends their energy focusing on these so we can say we’re data-driven.
The whole list of cognitive biases is relevant, but watch out for confirmation bias and double standards. During company strategy sessions some ideas don’t have to meet rigorous testing but others do. HubSpot’s claims are taken at face value despite having no valid data to support them; not using HubSpot is out of the question because there’s no data.
The right role of data
My contention isn’t that all data science is modern alchemy. Instead, most managers are using ‘data-driven’ as a cloak for poor decision making.
Data can help you make informed decisions, but not everything can be neatly measured. If a blog post or marketing strategy fails, that should inspire a post mortem with more questions. There are a million variables: timing, content, promotion strategy, economic situation, etc. Play around and tweak. When you hit good metrics, be skeptical of the causes and blindly replicating them.
Becoming obsessed with absolute metrics such as page views is a dangerous game. A company like facebook depends on sheer numbers, but most businesses use a different model. Let’s say you’re a niche SaaS company, becoming authoritative in your area is far more important than absolute page views.
Marketing is a long-term endeavor. Expecting instant results is another fallacy of the data-driven age. Not that long ago it was standard to accept that many business decisions would come at a short-term loss in order to build a long-term brand. Today, if you don’t hit quarterly KPIs, you’re out the door.
In short: do your keyword research, glance at traffic and keep an eye on trends without becoming a slave to metrics and short-term thinking.
Optimized to banality
All this optimization is more than just reinforcing existing hierarchies under a new name. Sure, Uber turns a 15 minute drive into an hour because drivers follow obviously bad maps. There are all sorts of stories of algorithms misbehaving and people sheepishly obeying.
My concerns are more mundane. Because of ‘data-that-can’t-be-questioned’, design and copywriting are locked into trends that are making the internet unusable.
- Needlessly long-winded writing. There’s some myth that the optimal post length for SEO is 1,500–2,000 words. If you can tell your story in 300 words, please do so without adding over a 1,000 words of filler.
- Meaningless stock photos everywhere. Graphics and photos that enhance content are useful. I don’t need to see (and wait for them to load) random unsplash pictures.
- Bootstrappy design. Very few companies have the courage to go beyond the standard bootstrap design because it’s ‘safe’.
Going to nearly any website feels like shopping at WalMart. If I go to a high-end company’s website, I don’t want popups and cruft — that’s why I’m paying a premium price. Notice how Apple’s site doesn’t have popups or a chatbot? That’s what I expect from a premium brand. The Economist looks like what I’d expect from the National Enquirer.
My take is that Apple has enough people with sense in their marketing department to ignore short-term gains — I’m sure popups and chatbots would sell a few more iPhone — for the sake of brand image down the road. Compare the Economist to the Atlantic that gives subscribers a PDF. How on earth do they survive without tracking every single click?
Optimization based on data has created a bland web where companies are afraid to use bold design, have a distinctive voice and think long-term. This is filtering down to personal blogs and social media accounts. When I hear regular people talking about blogging for SEO and their social media engagement, something’s gone wrong.
The odd thing about the data age where workers are fined for going to the bathroom (lost productivity!) and the smallest design changes are subject to A/B testing is that management is less accountable than ever.
The bankers that caused the 2008 crisis remain free. Equifax has faced no consequences for losing the personal information of nearly every US citizen. Facebook keeps on plugging away despite playing a huge rule in ongoing genocide and countless scandals.
Of course, these are political problems. More technology isn’t going to fix them. So no, we don’t need to put Facebook on the blockchain or whatever the herd is chanting on Medium.
For more on the general problem of algorithms, Weapons of Math Destruction by Cathy O’Neil is a good place to start. Her main point is that companies offering machine learning products often use faulty data science. The poor and working classes suffer from badly designed algorithms while the rich get personalized, human treatment.
Optimizing creates vicious feedback loops. Every website looking the same is one end of the spectrum, the other is political positions going further away from the center. The same phenomenon is driving producers to favor more extreme genres of pornography.
While I take issue with a lot of Taleb’s positions, antifragility and the ludic fallacy are concepts every marketer needs to be familiar with. Thinking about things negatively, that it’s impossible to measure how many sales you lose because serious customers are put off by your chatbot comes from Taleb. My vision of the managerial type is ripped from the intellectual yet idiot
Going through the list of cognitive biases is worthwhile. Thinking, Fast and Slow was a good read, but many of the specific examples are being hit by the replication crisis, so beware. The general premise still stands, though. Even data scientists see the world through a flawed human brain, a managerial type even more so. If you’re aware of the main cognitive biases, you can at least watch out for them and try to avoid using data to reinforce flawed logic.