Linear Regression Fallacies


As a data scientist, I’m continually amused by the ongoing misuse of the principle of linear regression. You see it all the time in the press. Lately, regression theory underlies much of the talk surrounding the demise of American entitlement programs like Medicare and Social Security. Forecasts abound, saying that the country can no longer afford retirement benefits and elder healthcare coverage. But you must stop to ask what these so-called forecasts are based on.

Many forecasts unwisely project healthcare costs far into the future by assuming that the trends of the past will continue unaltered. But this mentality ignores reality. To look at this another way, just because your son is 4 feet tall at age 6 doesn’t mean he’ll be 12 feet tall at age 18. And just because the average American born today will live to the age of 78 doesn’t mean that the baby born in 2032 will live to 100. Forecasting something like healthcare is just as prickly.
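The height example is nothing more than proportional scaling, and a minimal Python sketch makes the absurdity concrete. The function name `extrapolate_proportionally` is made up for illustration; the numbers are the ones from the example above.

```python
def extrapolate_proportionally(known_age, known_height_ft, target_age):
    """Scale one observation forward, assuming height stays proportional to age."""
    return known_height_ft * target_age / known_age

# 4 feet tall at age 6, projected to age 18 under the "trend continues" assumption
projected_height_ft = extrapolate_proportionally(6, 4, 18)
print(projected_height_ft)  # 12.0
```

The arithmetic is flawless. The assumption that growth never levels off is what fails.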

Making bets on the distant future is not wise because it’s unknowable. Too many exceptional events can occur to affect life along the regression line – wars, financial bubbles, financial crashes, extreme weather, political demagoguery, etc. No one – no business, no government agency – makes plans today based on a vision of the world 20 years ahead. Apple doesn’t do it. Google doesn’t do it. The Department of Defense doesn’t do it. You and I don’t do it. Not even insurance companies do it, and much of their business is tied to the future. But many in our U.S. government, prodded along by the anti-entitlement lobby, are pushing to gut these programs by projecting past experience into the future without adjusting for changes in behavior or policy.

None of this means that uncertainty in economic forecasts makes economic planning pointless. There are indeed good reasons for looking ahead, in small increments, along the regression line, just not good reason for making sweeping changes in programs as complex and far reaching as Social Security and Medicare. So let’s accept life on the regression line for what it is – a limited and isolated view of the future.
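One way to see why small increments are safer is to look at how the uncertainty of a straight-line forecast grows with distance from the observed data. The sketch below is only an illustration: the data are synthetic numbers invented for the example, the function name `prediction_std_error` is hypothetical, and it uses the textbook standard error for a new observation under simple least squares.

```python
import math

def prediction_std_error(xs, ys, x0):
    """Standard error of a simple least-squares prediction for a new observation at x0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual standard deviation, with n - 2 degrees of freedom
    residual_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(residual_ss / (n - 2))
    # The (x0 - x_bar)^2 term is what makes distant forecasts less certain
    return s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)

# Synthetic, made-up series: ten yearly observations with a roughly linear trend
years = list(range(10))
values = [1.0, 2.1, 2.9, 4.2, 5.1, 5.8, 7.2, 7.9, 9.1, 9.9]

near = prediction_std_error(years, values, 10)  # one year past the data
far = prediction_std_error(years, values, 30)   # twenty-one years past the data
print(near < far)  # True
```

Even this widening band is optimistic, because it assumes the straight line itself keeps holding. It says nothing about wars, crashes, or policy changes that bend the trend.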


Posted on December 4, 2012, in Opinion. 2 Comments.

  1. This is why time series analysis is the bigger player in dealing with these kinds of problems. Discovering a ‘signal’, or anything that translates to ‘meaningfulness’, in noisy data (e.g. stochastic processes) often involves a more robust and complicated analysis. Just as a fundamental aside, I have seen VERY few individuals, including those calling themselves data scientists, who can give an adequate definition of ‘signal’ and ‘noise’ and the differences between the two. This simple lack of understanding can have an extraordinary impact on how one interprets a data-driven world, from any perspective.

    The statistical community, which with the physics community developed time series methods, has long been aware of these problems and has been trying to combat them. The statisticians’ warnings aren’t always heard, as the business will look at a straight line as ‘good enough’. Stakeholders often scoff at more robust models, thinking that they are just ‘too complex’ to grasp. Further, many analysts, and even those calling themselves data scientists, simply don’t have the skillset to develop, deploy, and communicate the meaning of more complex time-series models. That, I believe, is why this problem persists in industry.

    Check out articles by Rob Hyndman, who I think illustrates the problem well and shows why learning and attempting to master time-series analysis is, although a difficult proposition for many, very worthwhile in the end.

  2. To take your observations one step further, imagine the uphill climb data analysts in other disciplines face when trying to find the “signal.” Take for example the scientists at SETI. Finding signals from distant extraterrestrials is a daunting task. So far, nothing. So they’re using innovative theories to improve their chances, like using holography with the ATA to exploit the direction-of-arrival information (encoded as the signal’s phase) digitized directly from the radio antenna. This way, the direction from which a signal arrives can be derived, and potential signals that are not coming from outer space (Earth-based noise, in effect) can be ruled out.

    Further, new techniques being deployed by SETI will greatly increase the number of different signal types the equipment can be sensitive to, including conventional carrier waves (e.g. AM radio) as well as various wide-bandwidth signals like those used for satellite communications on Earth. They plan to test for literally billions of signal types never probed before.

    Amazing stuff!

