Calculating the strength and lift of a rule directly from the available data will often result in values that do not reflect the likely strength and lift on future data. For example, consider a database in which the records relate to customers and, among other information, each customer's age and gender are recorded. For ease of explanation we will assume that half the customers are male and half female. Suppose, however, that there is only one customer who is ninety years old and that this customer happens to be female. As all ninety-year-olds in the data are female, the rule age=90 -> female will have strength=1.0 and hence lift=2.0. While it might be the case that for our application all ninety-year-olds will be female, in the absence of further relevant information this single example does not justify that conclusion. Typically we would want many more examples before concluding that the likely strength was even close to 1.0. Note, however, that this one example does provide some evidence that more ninety-year-olds will be female than male: it rules out the possibility that no ninety-year-olds are female, but does not rule out the possibility that no ninety-year-olds are male.
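To make the arithmetic concrete, the following minimal sketch computes the raw strength and lift of age=90 -> female on a small hypothetical dataset of ten customers, five female and five male, with a single ninety-year-old. The records and the helper raw_strength_and_lift are illustrative assumptions, not part of the system described here. Support and coverage are counted as numbers of records, and lift is taken as strength divided by the frequency of the RHS.

```python
# Hypothetical toy data: (age, gender) pairs, half female and half male,
# with a single ninety-year-old who happens to be female.
customers = [
    (90, "female"),
    (34, "female"), (51, "female"), (27, "female"), (62, "female"),
    (45, "male"), (38, "male"), (70, "male"), (29, "male"), (55, "male"),
]

def raw_strength_and_lift(records, lhs, rhs):
    """Hypothetical helper: strength and lift of lhs -> rhs taken directly from the data."""
    coverage = sum(1 for r in records if lhs(r))              # records satisfying the LHS
    support = sum(1 for r in records if lhs(r) and rhs(r))    # records satisfying LHS and RHS
    prior = sum(1 for r in records if rhs(r)) / len(records)  # frequency of the RHS
    strength = support / coverage
    return strength, strength / prior

strength, lift = raw_strength_and_lift(
    customers,
    lhs=lambda r: r[0] == 90,
    rhs=lambda r: r[1] == "female",
)
print(strength, lift)  # 1.0 2.0
```

A single matching record is enough to produce the maximum possible strength, which is exactly the overconfidence the m-estimate is meant to temper.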
The m-estimate provides a form of Bayesian estimate of the likely value of strength and lift for a rule given a finite sample of cases. It takes into account the number of examples that the rule covers, the strength of the rule given those examples, and the frequency of the RHS in the data as a whole, and it adjusts the observed strength toward that overall RHS frequency. The size of the adjustment depends on the rule's coverage and on a user-specified value m: the fewer the examples the rule covers and the larger m is, the larger the adjustment. The formulae are:
Strength estimate = (support + m * prior) / (coverage + m)
Lift estimate = strength estimate / prior
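where support is the number of records that satisfy both the LHS and the RHS, coverage is the number of records that satisfy the LHS, and prior is the frequency of the RHS in the data as a whole.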
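The two formulae translate directly into a small function. This is a minimal sketch, not the system's own implementation; the name m_estimate and its argument names are hypothetical, and support and coverage are assumed to be record counts as in the worked example.

```python
def m_estimate(support, coverage, prior, m):
    """Hypothetical helper: return (strength estimate, lift estimate) for LHS -> RHS.

    support  -- number of records that satisfy both the LHS and the RHS
    coverage -- number of records that satisfy the LHS
    prior    -- frequency of the RHS in the data as a whole, 0 < prior <= 1
    m        -- user-specified weight placed on the prior, m >= 0
    """
    if not 0 < prior <= 1:
        raise ValueError("prior must lie in (0, 1]")
    if m < 0 or coverage + m == 0:
        raise ValueError("need m >= 0 and coverage + m > 0")
    # The observed strength support/coverage is pulled toward prior;
    # the prior's share of the weight is m / (coverage + m).
    strength = (support + m * prior) / (coverage + m)
    lift = strength / prior
    return strength, lift
```

With m = 0 the function returns the raw strength support / coverage and its lift unchanged; as m grows, the strength estimate approaches prior and the lift estimate approaches 1.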
For our example rule age=90 -> female, support = 1 (because there is one female ninety-year-old), coverage = 1 (because there is one ninety-year-old) and prior = 0.5 (because half the customers are female). If m = 2, strength estimate = (1 + 2 * 0.5) / (1 + 2) = 0.667 and lift estimate = 0.667 / 0.5 = 1.333. Increasing m places more weight on the prior. For example, with m = 10, strength estimate = (1 + 10 * 0.5) / (1 + 10) = 0.545 and lift estimate = 1.091.
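The following sketch reproduces these figures and shows how the adjustment varies with m and with coverage. The helper m_estimate is a hypothetical stand-in for the formulae above, and the contrasting rule covering twenty ninety-year-olds, all female, is an invented case used only for comparison.

```python
def m_estimate(support, coverage, prior, m):
    # Hypothetical helper: shrink the observed strength support/coverage toward prior.
    strength = (support + m * prior) / (coverage + m)
    return strength, strength / prior

# The rule age=90 -> female: support = 1, coverage = 1, prior = 0.5.
for m in (0, 2, 10, 100):
    s, l = m_estimate(1, 1, 0.5, m)
    print(f"m={m:>3}: strength estimate={s:.3f}, lift estimate={l:.3f}")
# m=  0: strength estimate=1.000, lift estimate=2.000
# m=  2: strength estimate=0.667, lift estimate=1.333
# m= 10: strength estimate=0.545, lift estimate=1.091
# m=100: strength estimate=0.505, lift estimate=1.010

# Hypothetical contrast: if 20 ninety-year-olds were all female, m = 2
# would move the estimate only slightly away from the observed strength of 1.0.
s, l = m_estimate(20, 20, 0.5, 2)
print(f"coverage=20, m=2: strength estimate={s:.3f}, lift estimate={l:.3f}")
# coverage=20, m=2: strength estimate=0.955, lift estimate=1.909
```

The same value of m has a much smaller effect on a rule with greater coverage, which is why a well-supported rule keeps most of its observed strength while a rule based on a single case is pulled strongly toward the prior.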
In the interactive system, the m-estimate check box controls whether an m-estimate is used. In the command-line system, the m command is used.