Prediction Mathematics

Before I go any further into this topic, I’d ask all the other (and more-qualified-than-I) statisticians out there to hold off on quibbles about minutiae, because this is a fairly simplistic overview, not an academic treatise. For the record, however, let me remind everybody that I was involved in designing predictive algorithms in my past life as a consultant in the supermarket industry, and my specialty was assessing and assigning the different weighting factors involved in predicting incremental sales created by price promotions and other kinds of promotion. I didn’t design the algorithms (that was the job of some seriously brainy boffins from MIT, the University of Chicago and Northwestern), but I did advise them on the above, and the resulting algorithms generated forecasts which were generally between 95% and 97% accurate.

What prompted this post was this article, which I urge you to read before continuing, because otherwise what I’m going to say may not make sense.

Here’s a quick thumbnail sketch of how all this works, and I’m not going to use the supermarket business as the example, because even I fall asleep at its mind-numbing boredom. Let’s make it more current.

Say we want to establish the likelihood of someone becoming a terrorist who wants to blow a bunch of innocent people up in a suicide attack. Note the terms of the discussion carefully, because they are important.

  • “Terrorist” = somebody who wants to terrorize the population at large
  • “Innocent people” = people who are not actively inimical to the terrorist’s philosophy, group or society
  • “Suicide” = the attacker knows that he will perish in the attack.

Note that this predictive algorithm is not going to identify Timothy McVeigh, for example, because while innocent people were killed in his Oklahoma City attack, the bomb he built was specifically targeted at a federal government building (the Alfred P. Murrah Federal Building) as opposed to, say, a Pink Floyd concert. Likewise, McVeigh made careful plans to avoid being killed in the bomb blast, and his attack was probably designed to create fear among government employees. (Yes, of course he was a terrorist, just not the kind we’re trying to predict below.)

So how does one establish an algorithm to foresee (and, one hopes, guard against) a terrorist attack such as the one defined above? One looks at history (without which all predictions are called “guesswork”) and at the profiles of all the people who have perpetrated such crimes in the past, and not the distant past either, because time has a way of making predictive algorithms irrelevant as circumstances change. From that, we can deduce the following contemporary factors:

  • religious fanaticism
  • age
  • sex
  • exposure to radical philosophy
  • societal alienation
  • socio-economic status

That’s not a comprehensive list by any means, but it will give you an idea of what’s involved. What this algorithm is supposed to do is drill down through the total population of a defined universe (a country, an area, the entire world) to identify a potential terrorist as defined above. So here we go, and let’s build a set of simple parameters for our algorithm from some of the above factors, starting with the easiest one first.

  • Socio-economic status:
    We can eliminate the upper echelons of society from any inspection. Saudi or Swedish princes and billionaire oil oligarchs don’t blow themselves up in Parisian shopping malls, or at least none have so far. Almost exclusively, terrorists have come from middle-class origins and from the unemployed or low-wage segments. These are micro-weightings, i.e. applied within the criterion itself. On a 100-point scale, we can estimate them as upper-class: 0.5; middle-class: 40; low-wage: 50; unemployed: 65. (Note that they don’t have to add up to 100 collectively; we’re establishing a risk factor for each group.)
    The more interesting question is: how important is socio-economic status as a predictive factor compared to, say, religion? Probably not as much; but how much less important? This is a macro-weighting, which is applied across all the identified criteria. For the sake of argument, let’s assign the socio-economic factor a weighting of, say, 35 overall.
  • Societal alienation:
    Immigrant or native-born? Immigrants or, as we used to call them, “strangers in town” or “newcomers” may feel that they’re not part of the new society in which they find themselves, especially if that society is radically different from the one they left. Newcomers also have fewer “roots” in that society, which makes anti-social activity less problematic for their conscience. If the newcomers are also part of an ethnic group which sets itself apart from the mainstream of its adopted society (socially, philosophically, physically, or some combination of the three), this will add to their feelings of alienation. The second determinant, native-born, is probably less important, although if the native-born are members of a “set-apart” group, that micro-weighting needs to be adjusted upwards, especially if they have constant contact with newcomers. Once again, we can assign micro-weightings: 60 for newcomers and 45 for the native-born.
    For the macro-weighting, we can ask how important alienation is compared to socio-economic status. Probably a lot more, but once again, how much more? That is the weighting decision. More than socio-economic’s 35? Definitely: more like 60, almost twice the weight.
  • Age:
    Most terrorists are young: under the age of forty. While an age of, say, sixty-five is not a disqualifying criterion, it certainly warrants a far smaller weighting than an age in the twenties (twenty-somethings have supplied a far greater proportion of terrorists than sexagenarians have). We could assign weightings by specific age groups (e.g. 12-16, 17-25, 26-30 and so on), but to keep things simple, we’ll give the under-40s a single micro-weighting of 90, and the over-40s a score of 5.
    As a macro-weighting, age is one of the principal determinants of likely terrorists, and incidentally of most major criminal activity in general (check the distribution curve of ages among prison inmates and known terrorists to verify this statement). Let’s give this factor a score of 50: less than societal alienation’s 60, but not much less.
  • Religious fanaticism:
    Almost all religions engender fanaticism in one way or another, but in recent times (remember the “recent history” issue), Islam has produced by far the greatest number of fanatics, and has caused by far the greatest number of terrorist incidents, which have killed by far the greatest number of innocent people. (Note that Nazi fanatics killed far more innocent people over the past hundred-odd years, but in the past two decades have killed almost none; hence the recency determinant.) At the moment, therefore, an adherent of Islam would need to get a far greater micro-weighting than, say, a Nazi, Christian or Buddhist.
    As a macro-weighting (applied against the total population), Islam is probably the single most important determinant — and if one were to apply a weighting factor along that scale of 1-100, one could easily assign a contemporary weighting of 95 or even higher.
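
For anyone who wants to see the arithmetic, here is a minimal sketch in Python of how the two kinds of weighting above might be combined. The post does not say how micro- and macro-weightings are combined, so the macro-weighted average used here is purely an assumption, and the names MICRO, MACRO and risk_score are made up for illustration. Only the three criteria with numeric micro-weightings in the text are included, and all the numbers are the rough guesses from the discussion above, not validated figures.

```python
# Micro-weightings: the risk score of each group WITHIN a criterion.
# Numbers are the illustrative guesses from the text, not validated data.
MICRO = {
    "socio_economic": {"upper": 0.5, "middle": 40, "low_wage": 50, "unemployed": 65},
    "alienation": {"newcomer": 60, "native_born": 45},
    "age": {"under_40": 90, "over_40": 5},
}

# Macro-weightings: how much each criterion matters relative to the others.
MACRO = {"socio_economic": 35, "alienation": 60, "age": 50}


def risk_score(profile, micro=MICRO, macro=MACRO):
    """Return the macro-weighted average of a profile's micro-weightings.

    ASSUMPTION: a weighted average is only one plausible way to combine
    the two kinds of weighting; the text does not specify a rule.
    """
    total_weight = sum(macro[criterion] for criterion in profile)
    weighted_sum = sum(
        macro[criterion] * micro[criterion][group]
        for criterion, group in profile.items()
    )
    return weighted_sum / total_weight


if __name__ == "__main__":
    high = {"socio_economic": "unemployed", "alienation": "newcomer", "age": "under_40"}
    low = {"socio_economic": "upper", "alienation": "native_born", "age": "over_40"}
    print(round(risk_score(high), 1), round(risk_score(low), 1))
```

Under that assumed rule, the first profile scores roughly 71.6 and the second roughly 20.5 on the same 100-point scale. Dividing by the sum of the macro-weightings keeps the result on that scale no matter how many criteria are included.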

Of course, anyone suggesting weightings such as the above is going to be accused of “profiling” by the moral relativists, SJWs, ACLU, SPLC and suchlike Useful Idiots, but I should point out that on that basis, no courts should use the COMPAS system at all.

What should be fairly obvious to anyone is that while the overall algorithm design can be a proprietary affair, the weighting factors within the algorithms need to be subject to the closest scrutiny and debate possible. I should also point out that a lack of such analysis has enabled the scam known as global warming / -cooling / climate change to be accepted by the gullible and ignorant, but we can talk about that another time.

Suffice it to say that the more daylight involved, and most certainly the more daylight within the group building and implementing the forecast criteria (statisticians, intelligence services, law enforcement and the judicial system), the more accurate the algorithms will become. Most important, however, is the fact that the predictive algorithms will then engender a higher degree of trust in the population.
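
To see why the weightings deserve that much scrutiny, here is a small Python sketch of how much a single macro-weighting can matter. It assumes, purely for illustration, that an overall score is a macro-weighted average of micro-weightings; the function weighted_score and the two profiles are hypothetical, and the numbers are the rough guesses used earlier in this post.

```python
def weighted_score(micro_scores, macro_weights):
    """Macro-weighted average of micro-weightings (an ASSUMED combination rule)."""
    total = sum(macro_weights.values())
    return sum(macro_weights[c] * micro_scores[c] for c in macro_weights) / total


# Two hypothetical profiles, expressed directly as micro-weightings.
profile_a = {"socio_economic": 65, "alienation": 45, "age": 5}
profile_b = {"socio_economic": 0.5, "alienation": 60, "age": 90}

# The same macro-weightings twice, differing only in how much age counts.
age_heavy = {"socio_economic": 35, "alienation": 60, "age": 50}
age_light = {"socio_economic": 35, "alienation": 60, "age": 10}

for label, weights in (("age = 50", age_heavy), ("age = 10", age_light)):
    a = weighted_score(profile_a, weights)
    b = weighted_score(profile_b, weights)
    print(f"{label}: A = {a:.1f}, B = {b:.1f}")
```

With age weighted at 50, profile B outscores profile A (about 56.0 against 36.0); drop the age weighting to 10 and the order reverses (about 43.0 against 47.9). One contested number changes who gets flagged, which is exactly why the weightings should be debated in the open.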
