Big Data B.S. and Picnicking in the Mindfield

(All right! Down to a mere 70 draft items once I hit publish on this one. Woo, and, I might add, Hoo…)

My company held its annual all-hands strategic planning day yesterday (two months ago – old draft), during which we had the inevitable review of current and future technology and trends.

I tend toward a grizzled veteran’s view of tech trends: if it’s obvious, proven and clearly will make or save a ton of money, it will only take 10 – 20 years to get adopted. A large part of the delay is that the providers and their inside champions need to sell the idea to mostly risk-averse management (1) – and that takes time. But the main factor, the one I’ve seen in just about every case, is that the people proposing the technology woefully underestimate how much trouble it will be for a company to implement it. I myself have committed this sin – I’ve tried to get people to use certain analytics without recognizing, at first, how difficult – well-nigh impossible – it is to get usable data upon which to do the fancy-dan analysis (2). Everybody – well, almost – thinks what I’m proposing is cool, and if they could get the data without having to design a massive IT project and get it funded, they might do it. In other words, it ain’t happenin’.

Take system integration. Now, things are mostly integrated – your phone gets emails and can access Wikipedia and perform stock trades and get you on an airplane, among hundreds of other things. Which is pretty darn cool. All it took was about 30 years of tech and infrastructure development costing billions, maybe trillions, to pull off. We old guys can remember, or were even involved in, efforts 30+ years ago to get things integrated. Everybody could see the value. For some levels of basic integration, the tech was there. The savings/revenues were obvious.

30 years later, headway has been made!

That said, we’re hitting at least the 10 year mark on Big Data. To me, a very lightweight math guy but possessed of some philosophy chops, the underlying concepts of Big Data analysis are – fraught with risk sounds a little dire, but something like that. The simple, obvious risk is that we’ll get it wrong – that by applying Big Data analysis, we will come to think we understand things that we don’t understand. Then, with that confident misunderstanding – hey, it’s backed by Big Data! – we’ll make predictions and chart courses that don’t work, that have unpleasant unintended consequences, or that lead people to take actions that harm people for no benefit. (3)

The other, larger problem is well-illustrated by those scenes from that Captain America movie, sort of psychohistory-lite, where Hydra claims, in a totally Big Data way, to have identified all the troublemakers out there, who will of course now, like so many Kulaks, be executed.

Nothing in the last century, certainly not J. Edgar Hoover’s blackmailing his way to the top nor 100 years of Chicago politics, would lead one to worry that Big Data would be misused, and the examples of the Soviet, Chinese, and Cambodian mass slaughter of unarmed civilians can’t possibly apply to this country – our socialists are like Uncle Bernie, not like Uncle Joe! Right?

So far, we see only benign things, like Amazon suggesting that, based on my other searches, I might be interested in the works of some guy named Homer. However, one thing has long seemed odd to me: Is it a coincidence that, once the Chicago Machine was able to apply its years of, um, expertise to the Federal government, Congress’s ancient jealousy of the White House infringing on its Constitutional powers seemed to fade away, after the manner of any opposition to J. Edgar? Would it be unduly mean-spirited to consider the possibility that a city with a porous government/mafia interface (4), as it were, would use the unprecedented domestic spying powers the government granted itself after 9/11 to reach an understanding with a few key congress critters?

Nah, that could *never* happen.

Effectively unlimited domestic spying + Big Data + political ambitions – any discernible moral restraint = uh oh.

  1. Management is risk averse for very good reasons, as the paragraphs that follow show.
  2. I’m working on another project at the moment that will, as a side benefit, collect exactly the data needed for the analysis – woot! Once more, dear readers, into the breach!
  3. As I understand it, and I got this from reading some Google documents years ago, the premise with Big Data is that you do correlation analysis without first having any hypotheses about causality – you don’t know or even have a theory about what the relations should be. Then, by thus naively crunching huge amounts of data, trends and correlations will be revealed. Next, if you were doing it right, you’d create hypotheses about what *causes* those relationships, assuming that there is a cause (that it is not accidental in the Greek sense), and test them with further data analysis. But this last part is likely to get skipped. Some more or less innocent things will happen immediately (they are already happening): since people under 30 who use Uber on Tuesdays tend to order pizza when out of town, we’ll sell ads to out-of-town pizza vendors and push them at the victims whenever they travel! But there are other ideas that are not nearly so innocuous.
  4. Fred Roti, a known La Cosa Nostra made man, who ran the Chicago city government for over 20 years as an alderman – the kind of alderman who always voted first, so everybody else would know how they were to vote – was only put away in the 1990’s. As Wikipedia so delicately puts it: “Roti’s legacy lives on through the many City of Chicago employees whose hiring he effected.” Ya think? Those would be the people behind our current administration.
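
For the curious, the worry in footnote 3 can be sketched in a few lines of Python. This is only a toy under stated assumptions, not anybody's actual Big Data pipeline: the function names (`pearson`, `count_spurious`), the 100 columns of 30 rows, and the seed are all made up for illustration, and the 0.361 cutoff is roughly the textbook critical value of r for 30 observations at the usual 5% level. Every column is pure random noise, so any "relationship" that turns up is accidental by construction.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious(n_columns=100, n_rows=30, threshold=0.361, seed=0):
    """Count column pairs whose |r| clears the threshold in pure noise.

    All parameters are hypothetical illustration values; threshold=0.361
    approximates the two-tailed 5% critical value of r for 30 rows.
    """
    rng = random.Random(seed)
    # Every column is independent Gaussian noise: no real relationships exist.
    columns = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_columns)]
    pairs = list(itertools.combinations(range(n_columns), 2))
    hits = sum(
        1 for i, j in pairs
        if abs(pearson(columns[i], columns[j])) > threshold
    )
    return hits, len(pairs)


if __name__ == "__main__":
    hits, total = count_spurious()
    print(f"{hits} of {total} pairs look 'significant' in random noise")
```

With 100 columns there are 4,950 pairs to check, and at a 5% cutoff you would expect a couple of hundred of them to clear the bar by chance alone. Skip the step where you form and test a causal hypothesis, and every one of those is a candidate for a confident misunderstanding.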

Author: Joseph Moore

Enough with the smarty-pants Dante quote. Just some opinionated blogger dude.

5 thoughts on “Big Data B.S. and Picnicking in the Mindfield”

  1. I worked in IT for a Large Financial Services company, now one of the Too Big to Fail (or, as I liked to say, succeed). I found that management (particularly non-technical mgmt) was generally unqualified to make decisions on systems related purchases. They would buy anything that was shiny enough, especially if a squirrel went by as they were closing the deal. Then, when it wouldn’t do what they wanted (because the system’s architecture was antithetical to that goal), rather than admit that the pooch was thoroughly and irretrievably screwed, they would double down on the decision. We had some large clients walk over that one.

    1. Oh, yeah. Then the vultures circle – consultants, experts – and feed on the twitching corpse. Problem is, competent decision making and accountability is something the people in charge aren’t going to buy. If they are planning it right, they get kicked upstairs before the stench of the tire-fire they started clings too much to them personally – and the next guy inherits the mess. Expecting solid tech decisions in such an environment is a fool’s errand – I know, I’ve done it.

      1. The problem with putting the salesmen in charge is that they believe what the brochures say. During the discovery process they ask tough questions like ‘how many colors does it come in?’. When we discovered that our typical daily volume processed in an hour (go Big Iron!) would take 25 hours on the new system (true fact) we got feedback like ‘they’re both computer systems, they can’t be that different.’ I retired last year so my interest in the problem has substantially waned.

  2. “the premise with Big Data is that you do correlation analysis without first having any hypotheses about causality – you don’t know or even have a theory about what the relations should be. Then, by thus naively crunching huge amounts of data, trends and correlations will be revealed.”

    Thus falling victim to errors of the first kind (wild goose chases) in massive frequency. A/k/a “data seining” using techniques (p-values!) devised for handling data obtained through small, carefully designed samples and applied to massive amounts of carelessly-collected, error-prone data never subjected to editing. A large number of wild goose chases is tolerable iff the value of the goose is high and the cost of the chase is low. Easier by far simply to announce that a goose has been found and forgo the chasing of it entirely.

    1. Of course. While it is somewhat comforting to imagine that almost all businesses will eventually give up activities that cost a lot and produce little, governments are much less so constrained. (Then I think of Google, led by ideologues and sitting on billions in cash, and think: uh-oh.)

      There might be a slightly more positive/less stupid activity going on. I think sometimes there are theories about causality that are not acknowledged on principle. If what you’re after is selling more soap and beer, you already have some idea of what you’re looking for: characteristics that identify potential targets for your sales efforts. So, while Big Data in theory says those characteristics might include shoe size and brand of refrigerator, you are probably going to pay more attention to stuff like ‘buys a lot of hotdogs’ or ‘spends time in the gardening section’ because you suspect those things might actually have some sort of direct or indirect causal relationship with purchases of beer and soap. But as you have pointed out on occasion in so many words, the capacity to make stupid decisions based on misunderstanding what the numbers say is effectively unlimited.
