Association Discovery with Magnum Opus 4.3

A Tutorial Introduction

 

Table of contents

Preliminaries

A simple example

A worked example

Searching by strength

Searching by lift

Selecting RHS elements

Itemsets

Attribute-value data

A worked example of attribute-value data

Contrast discovery

Statistically sound association discovery

A worked example of holdout evaluation for rules

A worked example of holdout evaluation for itemsets

Computation time, snapshots and anytime results

Some final thoughts

 

Copyright © 2007-2009, G. I. Webb & Associates Pty Ltd.


Preliminaries

Magnum Opus detects associations within data.

The data is imported into the system from a text file.   Users typically extract data from a database into a text file for use with the system.  There is considerable flexibility in the formats that may be employed.

The user selects settings that control a search for associations in the data.  The user can choose the type of association to be found and the measure used to assess the relative value of each association.  The user also specifies the maximum number of associations to be found and any further restrictions on the associations to be considered.

Within the restrictions specified by the user, Magnum Opus finds the associations with the highest values on the specified measure.  Magnum Opus returns fewer than the specified number of associations only if the user terminates the search or if fewer associations than that number satisfy the user-specified constraints.

The associations found are recorded in an output file and may optionally be exported to a comma separated value file suitable for input into a spreadsheet for further analysis.

A simple example

We start with a simple invented example of analyzing the purchasing habits of a customer of a fictitious grocery store.  The customer has visited the store on ten occasions, each time buying a different selection of goods.  The following item-list file records the customer’s purchasing behavior.  Each line represents the items bought on a single visit.

plums, lettuce, tomatoes

celery, confectionery

apples, carrots, tomatoes, potatoes

potatoes

confectionery

carrots

apples, oranges, lettuce, tomatoes

peaches, oranges, celery, potatoes, confectionery

oranges, lettuce, carrots, tomatoes

apples, bananas, plums, carrots, tomatoes, onions

These can be processed by Magnum Opus to find rules such as the following four.

apples -> tomatoes [Coverage=0.300 (3); Support=0.300 (3); Strength=1.000; Lift=2.00; Leverage=0.1500 (1.5)]

lettuce -> tomatoes [Coverage=0.300 (3); Support=0.300 (3); Strength=1.000; Lift=2.00; Leverage=0.1500 (1.5)]

tomatoes -> apples [Coverage=0.500 (5); Support=0.300 (3); Strength=0.600; Lift=2.00; Leverage=0.1500 (1.5)]

tomatoes & oranges -> lettuce [Coverage=0.200 (2); Support=0.200 (2); Strength=1.000; Lift=3.33; Leverage=0.1400 (1.4)]

Each rule presents a list of items to the left of the arrow that are associated with the single item to the right of the arrow.  A number of relevant statistics that describe the nature of the association follow.  Thus, the first two of these rules indicate that whenever either apples or lettuce is purchased, tomatoes are also purchased.  The third rule indicates that apples are more likely to be purchased when tomatoes are purchased (0.6 of the visits that include tomatoes also include apples, compared with 0.3 of all visits).  The final rule shows that whenever both tomatoes and oranges are purchased, lettuce is also purchased.
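The statistics attached to these rules can be recomputed directly from the ten visits. The following Python sketch is an illustration only, not part of Magnum Opus; the helper names parse_item_list and rule_stats are hypothetical, and it assumes, as described above, that each line of the item-list file is one transaction whose items are separated by commas. It prints the statistics for the four rules as proportions of all visits.

```python
ITEM_LIST = """\
plums, lettuce, tomatoes
celery, confectionery
apples, carrots, tomatoes, potatoes
potatoes
confectionery
carrots
apples, oranges, lettuce, tomatoes
peaches, oranges, celery, potatoes, confectionery
oranges, lettuce, carrots, tomatoes
apples, bananas, plums, carrots, tomatoes, onions
"""


def parse_item_list(text):
    """Return one set of items per non-empty line; items are comma-separated."""
    return [
        {item.strip() for item in line.split(",") if item.strip()}
        for line in text.splitlines()
        if line.strip()
    ]


def rule_stats(transactions, lhs, rhs):
    """Statistics for the rule lhs -> rhs, as proportions of all transactions.

    Assumes the LHS occurs in at least one transaction.
    """
    n = len(transactions)
    lhs = set(lhs)
    lhs_count = sum(1 for t in transactions if lhs <= t)
    both_count = sum(1 for t in transactions if lhs <= t and rhs in t)
    rhs_rate = sum(1 for t in transactions if rhs in t) / n
    coverage = lhs_count / n          # visits containing the LHS
    support = both_count / n          # visits containing LHS and RHS
    strength = both_count / lhs_count  # support / coverage
    return {
        "coverage": coverage,
        "support": support,
        "strength": strength,
        "lift": strength / rhs_rate,                # strength relative to RHS base rate
        "leverage": support - coverage * rhs_rate,  # support minus expected support
    }


transactions = parse_item_list(ITEM_LIST)
rules = [({"apples"}, "tomatoes"), ({"lettuce"}, "tomatoes"),
         ({"tomatoes"}, "apples"), ({"tomatoes", "oranges"}, "lettuce")]
for lhs, rhs in rules:
    s = rule_stats(transactions, lhs, rhs)
    print(f"{' & '.join(sorted(lhs))} -> {rhs}: coverage={s['coverage']:.3f} "
          f"support={s['support']:.3f} strength={s['strength']:.3f} "
          f"lift={s['lift']:.2f} leverage={s['leverage']:.4f}")
```

The printed proportions should match those in the four rules above; the counts in parentheses in the rules are these proportions multiplied by the ten visits.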

This is a very simplistic example.  In practice it would be foolish to draw strong conclusions from such limited data.  Indeed, Magnum Opus includes facilities for assessing the strength of evidence in support of a rule, and these mechanisms would reject all the above rules as having insufficient support.  This example is intended only to illustrate the type of analysis that Magnum Opus performs, albeit normally on much larger volumes of more complex data.

A worked example

We now provide a fully worked example of an extended variant of the above scenario.  The data is now extended to include all customers of the store for a given period of time, resulting in a total of 1000 transactions.  The data is contained in the example file distributed with Magnum Opus called tutorial.itl. 

Note that there are two versions of Magnum Opus.  The command line version runs on Linux systems.  The interactive version runs under Windows.  In this and all subsequent examples we provide both a command line for running the example on the command line system and a step-through of the process for running it on the interactive system.  We present the output from the interactive system, which may differ in minor respects from that of the command line system.

In the first example we run Magnum Opus with its default settings, except that we limit the number of rules produced to five.

 

Command line: mocl item-list-file=tutorial.itl maximum-results=5

 

Interactive system.  First run Magnum Opus.  From the File Menu select Import Data.  The system will display a dialog for selecting a file to open.  If necessary, navigate to the Example Files folder within the folder into which you installed the software.  Select the file tutorial.itl.  The system will now display the following dialog box.

The system recognizes from the itl file extension that the file is probably an item list file.  As this is correct and we wish to use the default settings, click the Import Now button.  After importing the data the screen should appear as follows.

As we want to limit the number of rules to five, edit the Maximum no. edit box accordingly.

Now click the GO button to commence a search with the selected settings.  A dialog will be displayed that allows you to select the file into which the results will be stored.  Specify a file name and navigate to the folder in which you want it stored.  Then click on the Save button.  The system will perform the search, saving the results in the specified file and then open the file for inspection.

 

Output:

Magnum Opus - The leader in association discovery technology.

Version 4.3

Copyright (c) 1999-2009 G. I. Webb & Associates Pty Ltd.

 

Data file: Tutorial.itl

 

1000 cases / 0 holdout cases / 16 items

 

Search for rules

Search by leverage

Filter out rules that are insignificant, critical value=0.05

 

Maximum number of attributes on LHS = 4

Maximum number of rules = 5

Minimum leverage = -1.0

Minimum leverage count = -2147483647

Minimum coverage = 0.0

Minimum coverage count = 1

Minimum support = 0.0

Minimum support count = 0

Minimum lift = 0.0

Minimum strength = 0.0

 

All values allowed on LHS

 

All values allowed on RHS

 

 

Found 5 rules

 

tomatoes -> lettuce

[Coverage=0.263 (263); Support=0.111 (111); Strength=0.422; Lift=1.94; Leverage=0.0539 (53.9); p=2.35E-019]

 

lettuce -> tomatoes

[Coverage=0.217 (217); Support=0.111 (111); Strength=0.512; Lift=1.94; Leverage=0.0539 (53.9); p=2.35E-019]

 

tomatoes -> carrots

[Coverage=0.263 (263); Support=0.085 (85); Strength=0.323; Lift=1.85; Leverage=0.0390 (39.0); p=1.83E-012]

 

carrots -> tomatoes

[Coverage=0.175 (175); Support=0.085 (85); Strength=0.486; Lift=1.85; Leverage=0.0390 (39.0); p=1.83E-012]

 

onions -> potatoes

[Coverage=0.189 (189); Support=0.082 (82); Strength=0.434; Lift=1.53; Leverage=0.0285 (28.5); p=5.30E-007]

The output file begins with a record of the settings used to produce the rules.  It then states the number of rules found, followed by each of those rules.  Each rule is composed of two parts.  The left-hand side (LHS) appears before the arrow and the right-hand side (RHS) appears after the arrow.  These are followed by a number of statistics that describe the relationship between the LHS and RHS.

The first rule describes an association between tomatoes and lettuce.  The following measures are presented that describe the association.

 

Coverage

The coverage of the rule is the number of cases that contain the LHS.  In this data 263 cases contain tomatoes, which is 0.263 of the 1000 cases in the data.

Support

The support of the rule is the number of cases that contain both the LHS and the RHS.  In this data there are 111 cases that contain both tomatoes and lettuce which represents 0.111 of the total data. 

Strength

The strength is the support divided by the coverage.  This represents the proportion of the cases that contain the LHS that also contain the RHS.  It can be thought of as an estimate of the probability that the RHS will occur in a case if the LHS occurs. 

Lift

The lift is the strength divided by the strength that would be expected if there were no relationship between the LHS and the RHS, that is, divided by the proportion of all cases that contain the RHS.  A value of 1.0 suggests that there is no relationship between the two.  Higher values suggest stronger positive relationships.  Lower values suggest stronger negative relationships (the presence of the LHS reduces the likelihood of the RHS).

Leverage

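The leverage is the support minus the support that would be expected if the LHS and RHS were unrelated to one another, which is the coverage multiplied by the proportion of cases that contain the RHS.  The figure in parentheses expresses the leverage as a number of cases.  A positive value suggests a positive relationship and a negative value suggests a negative relationship.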

p

The result of a statistical evaluation of the significance of the rule.  The lower this value, the less likely it is that the rule is a spurious outcome resulting from adding an irrelevant value to the LHS.
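As a check on these definitions, the Python sketch below recomputes the statistics of the first rule, tomatoes -> lettuce, from four counts taken from the output above: 1000 cases, 263 containing tomatoes, 217 containing lettuce and 111 containing both. It is an illustration, not Magnum Opus code; measures_from_counts is a hypothetical name, and the sketch takes the strength expected without a relationship to be the proportion of cases containing the RHS. The p value is not reproduced here.

```python
def measures_from_counts(n, lhs_count, rhs_count, both_count):
    """Rule statistics from raw counts, following the definitions above."""
    coverage = lhs_count / n            # proportion of cases containing the LHS
    support = both_count / n            # proportion containing both LHS and RHS
    strength = both_count / lhs_count   # support divided by coverage
    rhs_rate = rhs_count / n            # strength expected if LHS and RHS unrelated
    lift = strength / rhs_rate
    leverage = support - coverage * rhs_rate  # support minus expected support
    # Last value: leverage expressed as a number of cases.
    return coverage, support, strength, lift, leverage, leverage * n


# tomatoes -> lettuce: 1000 cases, 263 with tomatoes, 217 with lettuce, 111 with both
cov, sup, strength, lift, lev, lev_count = measures_from_counts(1000, 263, 217, 111)
print(f"Coverage={cov:.3f}; Support={sup:.3f}; Strength={strength:.3f}; "
      f"Lift={lift:.2f}; Leverage={lev:.4f} ({lev_count:.1f})")
# prints: Coverage=0.263; Support=0.111; Strength=0.422; Lift=1.94; Leverage=0.0539 (53.9)
```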

 

Searching by strength

Magnum Opus has several valuable features not found in most association discovery systems.  One important difference is that it allows the user to specify both how many associations to find and what measure should be used to judge how interesting an association is.  Any of the measures coverage, support, strength, lift or leverage can be used for this purpose.

The first example run, above, found the five rules with the highest leverage.  High leverage rules have a strong positive association between the LHS and RHS: they maximize the number of additional cases in which the LHS and RHS occur together, over and above the number that would be expected if they were not associated with one another.

The two other measures that are most frequently used are strength and lift.  For our next example we will rerun the previous analysis using strength as the measure by which to search. 

 

Command line: mocl item-list-file=tutorial.itl \
maximum-results=5 search-mode=strength

 

Interactive system.  Continuing from the previous point, select Strength in the Search by combo box.  The screen should now appear as follows.

Now click the GO button to commence the search.  As previously, a dialog will be displayed that allows you to select the file into which the results will be stored.  Specify a file name and navigate to the folder in which you want it stored.  Then click on the Save button.  The system will perform the search, saving the results in the specified file and then open the file for inspection.

                       

Output:

Magnum Opus - The leader in association discovery technology.

Version 4.3

Copyright (c) 1999-2009 G. I. Webb & Associates Pty Ltd.

 

Data file: Tutorial.itl

 

1000 cases / 0 holdout cases / 16 items

 

Search for rules

Search by strength

Filter out rules that are insignificant, critical value=0.05

 

Maximum number of attributes on LHS = 4

Maximum number of rules = 5

Minimum leverage = -1.0

Minimum leverage count = -2147483647

Minimum coverage = 0.0

Minimum coverage count = 1

Minimum support = 0.0

Minimum support count = 0

Minimum lift = 0.0

Minimum strength = 0.0

 

All values allowed on LHS

 

All values allowed on RHS

 

 

Found 5 rules

 

bananas & lettuce & peaches -> apples

[Coverage=0.004 (4); Support=0.004 (4); Strength=1.000; Lift=4.52; Leverage=0.0031 (3.1); p=0.0260]

 

bananas & plums & lettuce -> potatoes

[Coverage=0.004 (4); Support=0.004 (4); Strength=1.000; Lift=3.53; Leverage=0.0029 (2.9); p=0.0385]

 

lettuce & confectionery & carrots & oranges -> beans

[Coverage=0.002 (2); Support=0.002 (2); Strength=1.000; Lift=14.49; Leverage=0.0019 (1.9); p=0.0476]

 

plums & onions & peas -> bananas

[Coverage=0.002 (2); Support=0.002 (2); Strength=1.000; Lift=7.87; Leverage=0.0017 (1.7); p=0.0452]

 

lettuce & oranges & onions -> potatoes

[Coverage=0.008 (8); Support=0.007 (7); Strength=0.875; Lift=3.09; Leverage=0.0047 (4.7); p=0.0369]

 

Comparing the two sets of rules, the first thing to note is that the rules in the first set all have substantially higher leverage, while those in the second set have much higher strength, as these are the measures that the respective searches seek to optimize.  It is also notable that the coverage of the rules in the second set is much lower.  When coverage is small, there is a substantial risk that values of strength and lift will be overestimated.  To guard against this, Magnum Opus supports a Bayesian smoothing mechanism called the m-estimate that adjusts values of strength and lift to reduce this risk.  For our next example we will rerun the previous analysis using this mechanism.

 

Command line: mocl item-list-file=tutorial.itl \
maximum-results=5 search-mode=strength m=2

 

Interactive system.  Continuing from the previous point, select the m-estimate check box.  The screen should now appear as follows.

Now click the GO button to commence the search.  As previously, a dialog will be displayed that allows you to select the file into which the results will be stored.  Specify a file name and navigate to the folder in which you want it stored.  Then click on the Save button.  The system will perform the search, saving the results in the specified file and then open the file for inspection.

                       

Output:

Magnum Opus - The leader in association discovery technology.

Version 4.3

Copyright (c) 1999-2009 G. I. Webb & Associates Pty Ltd.

 

Data file: Tutorial.itl

 

1000 cases / 0 holdout cases / 16 items

 

Search for rules

Search by strength

Filter out rules that are insignificant, critical value=0.05

 

Maximum number of attributes on LHS = 4

Maximum number of rules = 5

Minimum leverage = -1.0

Minimum leverage count = -2147483647

Minimum coverage = 0.0

Minimum coverage count = 1

Minimum support = 0.0

Minimum support count = 0

Minimum lift = 0.0

Minimum strength = 0.0

 

Use m-estimate, m = 2

 

All values allowed on LHS

 

All values allowed on RHS

 

 

Found 5 rules

 

lettuce & carrots -> tomatoes

[Coverage=0.045 (45); Support=0.039 (39); Strength estimate=0.841; Lift estimate=3.20; Leverage=0.0272 (27.2); p=3.16E-008]

 

bananas & plums & lettuce -> potatoes

[Coverage=0.004 (4); Support=0.004 (4); Strength estimate=0.761; Lift estimate=2.69; Leverage=0.0029 (2.9); p=0.0385]

 

lettuce & oranges & onions -> potatoes

[Coverage=0.008 (8); Support=0.007 (7); Strength estimate=0.757; Lift estimate=2.67; Leverage=0.0047 (4.7); p=0.0369]

 

bananas & lettuce & peaches -> apples

[Coverage=0.004 (4); Support=0.004 (4); Strength estimate=0.740; Lift estimate=3.35; Leverage=0.0031 (3.1); p=0.0260]

 

carrots & corn -> lettuce

[Coverage=0.006 (6); Support=0.005 (5); Strength estimate=0.679; Lift estimate=3.13; Leverage=0.0037 (3.7); p=0.00473]

 

Note first of all that the values for strength and lift are labeled Strength estimate and Lift estimate when the m-estimate is used.  Also note that while a number of the same rules are discovered as previously, the estimates of their strength and lift are substantially reduced.  Finally, note that one of the rules discovered using the m-estimate has substantially higher coverage than those previously discovered, and that the strength estimate for this rule is quite close to the observed strength (39 / 45 = 0.867).  The use of m-estimates is strongly advised when searching by strength or lift.
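The tutorial does not state the m-estimate formula at this point. A common form, assumed here, smooths the observed strength toward the overall proportion of cases containing the RHS: (support count + m × RHS proportion) / (coverage count + m), with the lift estimate being the strength estimate divided by the RHS proportion. With m = 2 and tomatoes occurring in 0.263 of cases, this Python sketch (m_estimate_strength is a hypothetical name) reproduces the figures reported above for lettuce & carrots -> tomatoes.

```python
def m_estimate_strength(support_count, coverage_count, rhs_rate, m=2.0):
    """Assumed m-estimate of strength: add m pseudo-cases in which the RHS
    occurs at its overall rate, then take the proportion containing the RHS."""
    return (support_count + m * rhs_rate) / (coverage_count + m)


# lettuce & carrots -> tomatoes: coverage 45, support 39; tomatoes in 0.263 of cases
rhs_rate = 0.263
est = m_estimate_strength(39, 45, rhs_rate, m=2)
print(f"raw strength={39 / 45:.3f} strength estimate={est:.3f} "
      f"lift estimate={est / rhs_rate:.2f}")
# prints: raw strength=0.867 strength estimate=0.841 lift estimate=3.20
```

With m = 0 the estimate reduces to the raw strength. Because the m pseudo-cases weigh more heavily when coverage is small, a rule seen in only 2 or 4 cases is pulled much further toward the RHS base rate than one seen in 45.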

Searching by lift

A search by strength with an m-estimate will tend to find strongly predictive rules.  These are rules for which the RHS is very likely whenever the LHS occurs.  However, sometimes rather than highly predictive rules, it is desirable to find rules that ‘beat the odds.’  For example, suppose there is a product that most people buy most of the time, as might be the case if customers are required to purchase bags if they wish to have their purchases packed.  Let us assume that 90% of customers buy bags.  In this case the rule

confectionery -> bags [Coverage=0.336 (336); Support=0.302 (302); Strength=0.899; Lift=1.00; Leverage=-0.0004 (-0.4)]

will enable us to predict with reasonable accuracy that the probability of a customer purchasing a bag if they purchase confectionery is 90%.  However, such a rule may not be very useful, as it does not change our default expectation of the probability the customer will purchase a bag.  Lift measures how much the rule increases the probability of the RHS relative to the default.  To illustrate this, we next perform a search by lift.  Note that we will use an m-estimate, as in the previous example.

 

Command line: mocl item-list-file=tutorial.itl \
maximum-results=5 search-mode=lift m=2

 

Interactive system.  Continuing from the previous point, select Lift in the Search by combo box.  The screen should now appear as follows.

Now click the GO button to commence the search.  As previously, a dialog will be displayed that allows you to select the file into which the results will be stored.  Specify a file name and navigate to the folder in which you want it stored.  Then click on the Save button.  The system will perform the search, saving the results in the specified file and then open the file for inspection.

 

Output:

Magnum Opus - The leader in association discovery technology.

Version 4.3

Copyright (c) 1999-2009 G. I. Webb & Associates Pty Ltd.

 

Data file: Tutorial.itl

 

1000 cases / 0 holdout cases / 16 items

 

Search for rules

Search by lift

Filter out rules that are insignificant, critical value=0.05

 

Maximum number of attributes on LHS = 4

Maximum number of rules = 5

Minimum leverage = -1.0

Minimum leverage count = -2147483647

Minimum coverage = 0.0

Minimum coverage count = 1

Minimum support = 0.0

Minimum support count = 0

Minimum lift = 0.0

Minimum strength = 0.0

 

Use m-estimate, m = 2

 

All values allowed on LHS

 

All values allowed on RHS

 

 

Found 5 rules

 

lettuce & confectionery & carrots & oranges -> beans

[Coverage=0.002 (2); Support=0.002 (2); Strength estimate=0.534; Lift estimate=7.75; Leverage=0.0019 (1.9); p=0.0476]

 

plums & potatoes & grapes -> beans

[Coverage=0.003 (3); Support=0.002 (2); Strength estimate=0.428; Lift estimate=6.20; Leverage=0.0018 (1.8); p=0.0474]

 

apples & peaches & onions -> peas

[Coverage=0.007 (7); Support=0.004 (4); Strength estimate=0.463; Lift estimate=5.45; Leverage=0.0034 (3.4); p=0.0307]

 

bananas & beans -> corn

[Coverage=0.010 (10); Support=0.003 (3); Strength estimate=0.259; Lift estimate=4.80; Leverage=0.0025 (2.5); p=0.0357]

 

plums & onions & peas -> bananas

[Coverage=0.002 (2); Support=0.002 (2); Strength estimate=0.564; Lift estimate=4.44; Leverage=0.0017 (1.7); p=0.0452]

 

Whereas the search by strength found rules with higher strength, this search finds rules with reasonable strength for items that are not frequently purchased.  For example, beans are purchased by only 6.9% of customers, but in both of the cases in which lettuce, confectionery, carrots and oranges are all purchased, beans are also purchased.  While the system discounts this evidence because it rests on so few examples, it still treats it as evidence of a large increase in the frequency with which beans are purchased by such customers.

Selecting RHS elements

Sometimes it will be desirable to find rules for predicting one particular outcome.  For example, you might only be interested in predicting the likelihood that customers will purchase beans.  The system allows you to restrict the items that are allowed to appear on either the LHS or RHS of a rule.  For the next example we will rerun the last analysis but with the RHS restricted to beans.

 

Command line: mocl item-list-file=tutorial.itl \
maximum-results=5 search-mode=lift m=2 \
rhs-available=beans

 

Interactive system.  Continuing from the previous point, select beans in the Values allowed on RHS selection box.  The screen should now appear as follows.

Now click the GO button to commence the search.  As previously, a dialog will be displayed that allows you to select the file into which the results will be stored.  Specify a file name and navigate to the folder in which you want it stored.  Then click on the Save button.  The system will perform the search, saving the results in the specified file and then open the file for inspection.

 

Output:

Magnum Opus - The leader in association discovery technology.

Version 4.3

Copyright (c) 1999-2009 G. I. Webb & Associates Pty Ltd.

 

Data file: Tutorial.itl

 

1000 cases / 0 holdout cases / 16 items

 

Search for rules

Search by lift

Filter out rules that are insignificant, critical value=0.05

 

Maximum number of attributes on LHS = 4

Maximum number of rules = 5

Minimum leverage = -1.0

Minimum leverage count = -2147483647

Minimum coverage = 0.0

Minimum coverage count = 1

Minimum support = 0.0

Minimum support count = 0

Minimum lift = 0.0

Minimum strength = 0.0

 

Use m-estimate, m = 2

 

All values allowed on LHS

 

Values allowed on RHS:

  beans

 

Only 2 rules satisfy the specified constraints.

 

lettuce & confectionery & carrots & oranges -> beans

[Coverage=0.002 (2); Support=0.002 (2); Strength estimate=0.534; Lift estimate=7.75; Leverage=0.0019 (1.9); p=0.0476]

 

plums & potatoes & grapes -> beans

[Coverage=0.003 (3); Support=0.002 (2); Strength estimate=0.428; Lift estimate=6.20; Leverage=0.0018 (1.8); p=0.0474]

 

Only rules with beans on the RHS are returned. In this case only two such rules can be found.

Sometimes some data elements represent inputs to a process and others represent its outputs.  In such circumstances it will often be useful to limit the LHS values to the inputs and the RHS values to the outputs.  The rules that are discovered will then suggest ways of manipulating the inputs in order to produce specific outcomes.

Itemsets

Rules are a useful way to describe interactions between elements of the data when the objective is to predict the probability of specific items in specific contexts.  Sometimes, however, the primary issue is simply to identify which items occur together.  In this case, presenting the interactions as rules can be distracting.  For example, a single interaction between elements can result in many rules.
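To see how rules multiply, the following Python sketch lists every rule with a single RHS item that uses all the items of one group of co-occurring items. It is a hypothetical helper for illustration, not a Magnum Opus feature. For lettuce, tomatoes and carrots it yields three rules, and further rules can be formed from pairs of these items.

```python
def rules_from_itemset(itemset):
    """List the rules (lhs, rhs) that use every item of the itemset,
    taking each item in turn as the single RHS item."""
    items = sorted(itemset)
    return [(tuple(i for i in items if i != rhs), rhs) for rhs in items]


for lhs, rhs in rules_from_itemset({"lettuce", "tomatoes", "carrots"}):
    print(" & ".join(lhs), "->", rhs)
# prints:
# lettuce & tomatoes -> carrots
# carrots & tomatoes -> lettuce
# carrots & lettuce -> tomatoes
```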

Itemsets are simply collections of items that appear together.  The system supports two measures of the importance of an itemset: coverage and leverage.  The coverage is the number of transactions or cases that contain the itemset.  The leverage is the difference between this and the largest coverage that would be expected under any division of the itemset into two subsets that are assumed to be unrelated to one another.

The next example finds itemsets for the tutorial data.

 

Command line: mocl item-list-file=tutorial.itl \
maximum-results=5 find-itemsets

 

Interactive system.  Continuing from the previous point, select itemsets in the Search for combo box.  The screen should appear as follows.

Now click the GO button to commence the search.  As previously, a dialog will be displayed that allows you to select the file into which the results will be stored.  Specify a file name and navigate to the folder in which you want it stored.  Then click on the Save button.  The system will perform the search, saving the results in the specified file and then open the file for inspection.

 

Output:

Magnum Opus - The leader in association discovery technology.

Version 4.3

Copyright (c) 1999-2009 G. I. Webb & Associates Pty Ltd.

 

Data file: Tutorial.itl

 

1000 cases / 0 holdout cases / 16 items

 

Search for itemsets

Search by leverage

Filter out itemsets that are insignificant, critical value=0.05

 

Maximum number of values in an itemset = 4

Maximum number of itemsets = 5

Minimum leverage = -1.0

Minimum leverage count = -2147483647

Minimum coverage = 0.0

Minimum coverage count = 1

 

All values allowed

 

Found 5 itemsets

 

lettuce & tomatoes

[Coverage=0.111 (111); Leverage=0.0539 (53.9); p=2.35E-019]

 

tomatoes & carrots

[Coverage=0.085 (85); Leverage=0.0390 (39.0); p=1.83E-012]

 

potatoes & onions

[Coverage=0.082 (82); Leverage=0.0285 (28.5); p=5.30E-007]

 

bananas & peaches

[Coverage=0.040 (40); Leverage=0.0235 (23.5); p=2.74E-009]

 

lettuce & tomatoes & carrots

[Coverage=0.039 (39); Leverage=0.0196 (19.6); p=1.43E-006]

 

Each itemset is presented as a list of the items in the set.  The coverage and leverage statistics that are provided were described above.  To illustrate how itemset leverage is calculated, consider the lettuce & tomatoes & carrots itemset.  There are 111 cases that contain lettuce & tomatoes and 175 that contain carrots.  Thus, if lettuce & tomatoes were not related to carrots one would expect there to be approximately 19.4 cases ([175/1000] × [111/1000] × 1000) containing all three items.  There are 85 cases that contain tomatoes & carrots and 217 that contain lettuce.  If these two groups were unrelated one would expect approximately 18.4 cases to contain all three items.  There are 45 cases that contain lettuce & carrots and 263 that contain tomatoes.  If these two groups were unrelated one would expect approximately 11.8 cases to contain all three items.  Thus, the maximum coverage that can be expected under any assumption that two subsets of these items are unrelated to each other is 19.4.  The leverage is the observed coverage (39) less this amount, that is, 19.6 cases or 0.0196 of the data.  The p value is the probability of observing at least this coverage if the two subsets that result in the highest expected coverage were actually unrelated to one another.
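The leverage calculation for an itemset can be written as a short Python sketch. The function name and the dictionary of counts are illustrative assumptions, not Magnum Opus internals; the counts are those quoted for lettuce, tomatoes and carrots. The sketch tries every division of the itemset into two non-empty parts, keeps the largest expected coverage, and subtracts it from the observed coverage.

```python
from itertools import combinations


def itemset_leverage(n, counts, itemset):
    """Return (leverage, max_expected) for an itemset, given counts for the
    itemset and all of its non-empty subsets (keyed by frozenset)."""
    items = frozenset(itemset)
    observed = counts[items]
    max_expected = 0.0
    # Try every division of the itemset into two non-empty, unrelated parts.
    for size in range(1, len(items)):
        for part in combinations(sorted(items), size):
            left = frozenset(part)
            right = items - left
            expected = counts[left] * counts[right] / n
            max_expected = max(max_expected, expected)
    return observed - max_expected, max_expected


counts = {
    frozenset({"lettuce"}): 217,
    frozenset({"tomatoes"}): 263,
    frozenset({"carrots"}): 175,
    frozenset({"lettuce", "tomatoes"}): 111,
    frozenset({"tomatoes", "carrots"}): 85,
    frozenset({"lettuce", "carrots"}): 45,
    frozenset({"lettuce", "tomatoes", "carrots"}): 39,
}
lev, expected = itemset_leverage(1000, counts, {"lettuce", "tomatoes", "carrots"})
print(f"max expected={expected:.3f} leverage={lev:.3f} ({lev / 1000:.4f})")
# prints: max expected=19.425 leverage=19.575 (0.0196)
```

Each division is visited twice (a part and its complement), which does not change the maximum.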

Attribute-value data

So far we have considered only data in the form of lists of items.  Much data is recorded in tabular format, with columns representing attributes or fields and each row representing a distinct entity.  The cells contain the values of the respective attributes or fields for the given entity.  Magnum Opus supports such data, which must be supplied in a data file in which the columns are separated by a delimiter character such as a TAB or COMMA.

It is also necessary to specify the names and types of the attributes.  This information is provided in a separate file called the names file.  Each line of a names file starts with the name of an attribute, the first line referring to the leftmost column, the second line to the second leftmost column, and so on.

For categorical attributes, the attribute name is followed by a colon (:) and then either the keyword categorical or a comma separated list of the values that are allowed for the attribute.

Example:

Department: bakery, dairy, beverages

This specifies that the attribute Department can take any one of three values: bakery, dairy or beverages.  Any case containing any other value will be discarded and an error message generated.

Example:

Department: categorical

This specifies that the attribute Department can assume any value that appears in the data file.

For compatibility with See-5, Magnum Opus also accepts the keyword discrete which is treated as equivalent to categorical.
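The categorical forms can be illustrated with a small Python sketch. It is not the Magnum Opus names-file parser: parse_categorical and accept are hypothetical helpers that handle only lines of the form Name: v1, v2, ... or Name: categorical (or discrete), and they model the rule that a case with an unlisted value is discarded.

```python
def parse_categorical(line):
    """Parse 'Name: categorical', 'Name: discrete' or 'Name: v1, v2, ...'.

    Returns (name, allowed), where allowed is None when any value is accepted.
    """
    name, _, spec = line.partition(":")
    spec = spec.strip()
    if spec in ("categorical", "discrete"):  # discrete is treated as categorical
        return name.strip(), None
    return name.strip(), {value.strip() for value in spec.split(",")}


def accept(value, allowed):
    """True if a case with this value is kept; False if it would be discarded."""
    return allowed is None or value in allowed


name, allowed = parse_categorical("Department: bakery, dairy, beverages")
print(name, sorted(allowed), accept("dairy", allowed), accept("deli", allowed))
# prints: Department ['bakery', 'beverages', 'dairy'] True False
```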

Numeric attributes must be divided into sub-ranges.  These can be specified in the names file.  Alternatively, the names file can simply identify the number of sub-ranges and Magnum Opus will select the sub-ranges for you.

For a numeric attribute with specified sub-ranges, the attribute name is followed by a list of sub-range cut points. These indicate how the numeric values for the attribute are to be subdivided into sub-ranges. Each cut point is introduced by one of the relations < or <= which is followed by the value that terminates the sub-range. If the relation is <, the sub-range includes all values less than the specified value. If the relation is <=, the sub-range includes all values less than or equal to the specified value.

Example:

Spend < 10 <= 100

This specifies that the attribute Spend has three sub-ranges, below the first cut point, between the two cut points, and above the last cut point:

Spend < 10

10 <= Spend <= 100

Spend > 100
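The following Python sketch shows how the < and <= relations determine which sub-range a value falls into. The helpers parse_cut_points and sub_range are hypothetical illustrations, not part of Magnum Opus, and assume the cut points are listed in increasing order, as in the example.

```python
import re


def parse_cut_points(line):
    """Split e.g. 'Spend < 10 <= 100' into the name and (relation, value) pairs."""
    name = line.split("<", 1)[0].strip()
    cuts = [(rel, float(val))
            for rel, val in re.findall(r"(<=|<)\s*(-?\d+(?:\.\d+)?)", line)]
    return name, cuts


def sub_range(value, name, cuts):
    """Describe the sub-range containing value, using the < / <= rules above."""
    lower = ""  # description of the current sub-range's lower bound, if any
    for rel, cut in cuts:
        inside = value < cut if rel == "<" else value <= cut
        if inside:
            return f"{lower}{name} {rel} {cut:g}"
        # The next sub-range starts where this one ends.
        lower = f"{cut:g} <= " if rel == "<" else f"{cut:g} < "
    last_rel, last_cut = cuts[-1]
    return f"{name} {'>=' if last_rel == '<' else '>'} {last_cut:g}"


name, cuts = parse_cut_points("Spend < 10 <= 100")
for v in (5, 10, 100, 250):
    print(v, "->", sub_range(v, name, cuts))
# prints:
# 5 -> Spend < 10
# 10 -> 10 <= Spend <= 100
# 100 -> 10 <= Spend <= 100
# 250 -> Spend > 100
```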

To allow Magnum Opus to select sub-ranges, use the keyword numeric, followed by the number of sub-ranges required.

Example:

Spend: numeric 5

For compatibility with See-5, Magnum Opus also accepts the keyword continuous which is treated as numeric 3.

The keyword ignore instructs Magnum Opus to discard any data for the given attribute. This is useful for handling attributes that may appear in the data but which should not be used, such as record identifiers.

A worked example of attribute-value data

We now provide a worked example using the example files distributed with Magnum Opus, tutorial.nam and tutorial.data.  Tutorial.nam contains the following:

Profitability99: numeric 3

Profitability98: numeric 3

Spend99: numeric 3

Spend98: numeric 3

NoVisits99: numeric 3

NoVisits98: numeric 3

Dairy: numeric 3

Deli: numeric 3

Bakery: numeric 3

Grocery: numeric 3

SocioEconomicGroup: categorical

Promotion1: t, f

Promotion2: t, f

Most of these attributes are numeric. These numeric attributes have been designated numeric 3, indicating that they should be divided into three sub-ranges, each of which contains approximately the same number of cases. The profitability attributes represent respectively the profit made from a customer in 1999 and 1998. The spend attributes represent the total amount spent by a customer in each year. The NoVisits attributes represent the numbers of store visits in each year. The Dairy, Deli, Bakery, and Grocery attributes record the customer's total spend in each of four significant departments. The remaining three attributes are categorical. The SocioEconomicGroup attribute records an assessment of the customer's socio-economic group. The keyword categorical tells Magnum Opus to use whatever values it finds in the corresponding column in the data file. The final two attributes record whether the customer participated in each of two store promotions. The values that are allowed are listed. This allows error checking. If any other value appears in the column for the attribute an error message will be displayed.
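The tutorial does not describe exactly how Magnum Opus chooses the cut points for numeric 3. One simple way to obtain sub-ranges that hold roughly equal numbers of cases is to cut the sorted values at their quantiles, as in this Python sketch. The helper names are hypothetical, the column of values is a synthetic stand-in, and ties in real data can make the sub-ranges less even.

```python
def equal_frequency_cuts(values, k=3):
    """Cut points that split the sorted values into k groups of roughly equal size.

    A value v belongs to sub-range i when exactly i cut points are <= v.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]


def sub_range_counts(values, cuts):
    """Count how many values fall into each of the len(cuts) + 1 sub-ranges."""
    counts = [0] * (len(cuts) + 1)
    for v in values:
        counts[sum(1 for c in cuts if c <= v)] += 1
    return counts


values = list(range(1000))  # synthetic stand-in for a numeric column of 1000 cases
cuts = equal_frequency_cuts(values, k=3)
print(cuts, sub_range_counts(values, cuts))
# prints: [333, 666] [333, 333, 334]
```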

The first line of the data file describes the first entity:

829, 709, 5250, 6560, 70, 82, 1074, 390, 878, 1995, C, f, f

This indicates that for the first entity the value of Profitability99 is 829 and so on through to the value of Promotion2 being ‘f’.

In the next example we run Magnum Opus on this names file and data file with its default settings, except that we limit the number of rules produced to five.

 

Command line: mocl names-file=tutorial.nam \
data-file=tutorial.data maximum-results=5

 

Interactive system.  First run Magnum Opus.  From the File Menu select Import Data.  The system will display a dialog for selecting a file to open.  If necessary, navigate to the Example Files folder within the folder into which you installed the software.  Select the file tutorial.nam.  The system will now display the following dialog box.

The system recognizes from the nam file extension that the file is a names file.  As this is correct, click the Next > button.  The system then displays the following dialog box for selecting the data file.

As the system has defaulted to the correct file name and we wish to use the default settings, click the Import Now button.  After importing the data the screen should appear as follows.

 

As we want to limit the number of rules to five, edit the Maximum no. edit box accordingly.

Now click the GO button to commence a search with the selected settings.  A dialog will be displayed that allows you to select the file into which the results will be stored.  Specify a file name and navigate to the folder in which you want it stored.  Then click on the Save button.  The system will perform the search, saving the results in the specified file and then open the file for inspection.

 

Output:

Magnum Opus - The leader in association discovery technology.

Version 4.3

Copyright (c) 1999-2009 G. I. Webb & Associates Pty Ltd.

 

Names file: Tutorial.nam

Data file: Tutorial.data

 

1000 cases / 0 holdout cases / 39 values

 

Search for rules

Search by leverage

Filter out rules that are insignificant, critical value=0.05

 

Maximum number of attributes on LHS = 4

Maximum number of rules = 5

Minimum leverage = -1.0

Minimum leverage count = -2147483647

Minimum coverage = 0.0

Minimum coverage count = 1

Minimum support = 0.0

Minimum support count = 0

Minimum lift = 0.0

Minimum strength = 0.0

 

All values allowed on LHS

 

All values allowed on RHS

 

 

Found 5 rules

 

Spend99<2030 -> Profitability99<419

[Coverage=0.333 (333); Support=0.302 (302); Strength=0.907; Lift=2.72; Leverage=0.1911 (191.1); p=1.66E-178]

 

Profitability99<419 -> Spend99<2030

[Coverage=0.333 (333); Support=0.302 (302); Strength=0.907; Lift=2.72; Leverage=0.1911 (191.1); p=1.66E-178]

 

Spend98<1782 -> Profitability98<327

[Coverage=0.331 (331); Support=0.295 (295); Strength=0.891; Lift=2.68; Leverage=0.1848 (184.8); p=5.12E-165]

 

Profitability98<327 -> Spend98<1782

[Coverage=0.333 (333); Support=0.295 (295); Strength=0.886; Lift=2.68; Leverage=0.1848 (184.8); p=5.12E-165]

 

NoVisits98<31 -> NoVisits99<35

[Coverage=0.325 (325); Support=0.288 (288); Strength=0.886; Lift=2.69; Leverage=0.1811 (181.1); p=1.89E-159]

As can be seen, the output is very similar to that for transaction data, except that each item consists of an attribute-value pair.
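The statistics printed for each rule can be reproduced from raw counts. The following Python sketch assumes the usual definitions of these measures: strength is the proportion of LHS cases that also satisfy the RHS, lift is strength divided by the RHS's own frequency, and leverage is support minus coverage × RHS coverage. The RHS count of 333 for Profitability99<419 is taken from the coverage of the reversed rule Profitability99<419 -> Spend99<2030. The helper rule_stats is hypothetical.

```python
def rule_stats(n, lhs_count, rhs_count, both_count):
    """Hypothetical helper: rule statistics computed from raw counts."""
    coverage = lhs_count / n            # proportion of cases matching the LHS
    rhs_coverage = rhs_count / n        # proportion of cases matching the RHS
    support = both_count / n            # proportion matching both LHS and RHS
    strength = both_count / lhs_count   # proportion of LHS cases that match the RHS
    lift = strength / rhs_coverage
    leverage = support - coverage * rhs_coverage
    return {"coverage": coverage, "support": support, "strength": strength,
            "lift": lift, "leverage": leverage, "leverage_count": leverage * n}


# Spend99<2030 -> Profitability99<419, using counts from the output above.
stats = rule_stats(n=1000, lhs_count=333, rhs_count=333, both_count=302)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Rounded to the precision Magnum Opus prints, these values agree with the first rule in the listing.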

Contrast discovery

A common analytic task seeks to identify factors that distinguish different groups. This type of analysis is called contrast discovery.   To perform contrast discovery, each example in the data must carry a label identifying the group to which it belongs.  For attribute-value data this means providing an attribute whose values indicate group membership.  For example, in the tutorial.data file, the Profitability99 attribute might be used to assign each example to one of three groups: low profit (Profitability99<419), medium profit (419<=Profitability99<=897) or high profit (Profitability99>897).  For transaction data it is necessary to add an extra item to each transaction.  It is important to give these label items names that are not used for, and cannot be mistaken for, ordinary items. For example, one might add items such as *profitable* and *unprofitable* to the transactions in the tutorial.itl data.
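Both labelling approaches can be sketched in a few lines of Python. profit_group maps a Profitability99 value onto the three groups named above (the first entity's value of 829, for instance, falls in the medium profit group). label_transactions is a hypothetical helper for transaction data: tutorial.itl contains no profitability information, so the caller is assumed to supply one label per transaction, and file reading is omitted because the item-list delimiter is not specified here.

```python
def profit_group(profitability99):
    """Assign a Profitability99 value to one of the three groups in the text."""
    if profitability99 < 419:
        return "low profit"
    if profitability99 <= 897:
        return "medium profit"
    return "high profit"


def label_transactions(transactions, labels):
    """Hypothetical helper: append one group-label item to each transaction
    (a list of item names). Raises ValueError if a label is also used as an
    ordinary item, since labels must not be mistaken for standard items."""
    if len(transactions) != len(labels):
        raise ValueError("need exactly one label per transaction")
    ordinary_items = {item for transaction in transactions for item in transaction}
    clashes = ordinary_items.intersection(labels)
    if clashes:
        raise ValueError(f"labels clash with ordinary items: {sorted(clashes)}")
    return [transaction + [label] for transaction, label in zip(transactions, labels)]
```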

Once group labels have been added to the data, simply run Magnum Opus restricting the RHS values to the group labels.  The next example illustrates this process using the data in the file tutorial.data, treating the Profitability99 attribute as the group variable.

 

Command line: mocl names-file=tutorial.nam data-file=tutorial.data \
maximum-results=5 rhs-available=Profitability99

 

Interactive system.  Continuing from the point at which the last example left off, select the three values for profitability in the Values allowed on RHS edit box by first left-clicking Profitability99<419 and then, holding down the SHIFT key, left-clicking Profitability99>897.

Now click the GO button to commence a search with the selected settings.  A dialog will be displayed that allows you to select the file into which the results will be stored.  Specify a file name and navigate to the folder in which you want it stored.  Then click on the Save button.  The system will perform the search, saving the results in the specified file and then open the file for inspection.

 

Output:

Magnum Opus - The leader in association discovery technology.

Version 4.3

Copyright (c) 1999-2009 G. I. Webb & Associates Pty Ltd.

 

Names file: Tutorial.nam

Data file: Tutorial.data

 

1000 cases / 0 holdout cases / 39 values

 

Search for rules

Search by leverage

Filter out rules that are insignificant, critical value=0.05

 

Maximum number of attributes on LHS = 4

Maximum number of rules = 5

Minimum leverage = -1.0

Minimum leverage count = -2147483647

Minimum coverage = 0.0

Minimum coverage count = 1

Minimum support = 0.0

Minimum support count = 0

Minimum lift = 0.0

Minimum strength = 0.0

 

All values allowed on LHS

 

Values allowed on RHS:

  Profitability99<419  419<=Profitability99<=897  Profitability99>897

 

Found 5 rules

 

Spend99<2030 -> Profitability99<419

[Coverage=0.333 (333); Support=0.302 (302); Strength=0.907; Lift=2.72; Leverage=0.1911 (191.1); p=1.66E-178]

 

Spend99>4278 -> Profitability99>897

[Coverage=0.333 (333); Support=0.287 (287); Strength=0.862; Lift=2.60; Leverage=0.1768 (176.8); p=8.57E-149]

 

Spend99<2030 & Grocery<873 -> Profitability99<419

[Coverage=0.278 (278); Support=0.265 (265); Strength=0.953; Lift=2.86; Leverage=0.1724 (172.4); p=2.52E-008]

 

Grocery<873 -> Profitability99<419

[Coverage=0.333 (333); Support=0.277 (277); Strength=0.832; Lift=2.50; Leverage=0.1661 (166.1); p=6.14E-129]

 

Spend99<2030 & NoVisits99<35 -> Profitability99<419

[Coverage=0.272 (272); Support=0.255 (255); Strength=0.938; Lift=2.82; Leverage=0.1644 (164.4); p=0.000257]

The LHS of each rule that is discovered indicates a set of factors that is more frequently associated with the RHS group than with the remaining groups combined.  For example, the first rule indicates that customers with Profitability99<419 are more likely to have a low value for Spend99 than are customers in the other profitability groups.
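The contrast can be made concrete with the counts for the first rule: 1000 cases, 333 in the Profitability99<419 group, 333 with Spend99<2030 and 302 with both. This minimal Python sketch (contrast is a hypothetical helper) compares how often the LHS occurs inside and outside the group. Positive leverage is equivalent to the first proportion exceeding the second.

```python
def contrast(n, group_count, lhs_count, both_count):
    """Hypothetical helper: return (P(LHS | group), P(LHS | not group))."""
    in_group = both_count / group_count
    outside_group = (lhs_count - both_count) / (n - group_count)
    return in_group, outside_group


inside, outside = contrast(n=1000, group_count=333, lhs_count=333, both_count=302)
print(f"Spend99<2030 among low-profit customers:   {inside:.3f}")
print(f"Spend99<2030 among other customers:        {outside:.3f}")
```

Roughly 91% of low-profit customers have Spend99<2030, against under 5% of the remaining customers.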

Statistically sound association discovery

Due to the large number of potential associations that are considered during association discovery, it is inevitable that some associations will be ‘discovered’ that only appear strong by chance.  Magnum Opus incorporates unique facilities for controlling the risk of finding such associations by applying statistical tests.  These tests are adjusted for the size of the search space and the number of associations found, as appropriate.  Assuming the sample data are a random sample of the broader population about which you wish to reach conclusions, these tests ensure that the risk of ‘discovering’ a spurious association is no greater than the user-specified significance level.  By default, significance levels are set to 0.05.

Magnum Opus supports two mechanisms for statistically sound association discovery.  Within-search testing adjusts the significance level applied to statistical tests used while the search is being conducted.  Use the Unsound filter to perform within-search testing.  For rule discovery, the unsound filter discards any rule whose strength is not significantly higher than the strength of each of its generalizations (rules formed by deleting elements from the LHS).  For itemset discovery, the unsound filter discards any itemset that is not significantly more frequent than would be expected if, for some partition of the itemset into two subsets, those subsets were independent of one another.
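To see what the filter compares against, it helps to enumerate the candidates. The Python sketch below lists the generalizations of a rule's LHS (every proper subset, including the empty LHS, which Magnum Opus output refers to as DEFAULT) and every partition of an itemset into two non-empty subsets. Both function names are hypothetical, and the statistical test itself is not shown.

```python
from itertools import combinations


def generalizations(lhs):
    """Hypothetical helper: every LHS formed by deleting elements from lhs,
    i.e. all proper subsets, including the empty LHS (the DEFAULT rule)."""
    items = sorted(lhs)
    return [frozenset(subset)
            for size in range(len(items))
            for subset in combinations(items, size)]


def two_way_partitions(itemset):
    """Hypothetical helper: all splits of an itemset into two non-empty,
    disjoint subsets A and B. Each unordered pair is listed once by keeping
    the first (sorted) item in A."""
    items = sorted(itemset)
    if len(items) < 2:
        return []
    first, rest = items[0], items[1:]
    partitions = []
    for size in range(len(rest)):  # size == len(rest) would leave B empty
        for extra in combinations(rest, size):
            a = frozenset((first, *extra))
            partitions.append((a, frozenset(items) - a))
    return partitions
```

An itemset of k items has 2^(k-1) - 1 such partitions, so a four-item itemset is compared against seven candidate splits.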

Note that the default filter, the Insignificant filter, also applies a statistical test, but this test is not adjusted for the size of the search space and hence is not statistically sound.  The Insignificant filter is useful for discarding rules and itemsets that are very likely to be spurious, but it is still likely to accept some spurious associations.

Command line: Add the option filter=unsound to the command line.

% mocl names-file=tutorial.nam data-file=tutorial.data \
filter=unsound

 

Interactive system.  Select UNSOUND as the value for the Filter out option.

The second mechanism is holdout evaluation.  This requires that the data are divided into an exploratory and a holdout set.  The associations are discovered from the exploratory data and tested on the holdout data.  One way to do this is to have Magnum Opus randomly divide the data into these two sets when it is imported.  You must then specify that holdout evaluation is to be performed and which statistical tests to employ.
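A random division into exploratory and holdout sets can be sketched as follows. This is an illustrative Python sketch only: the proportion parameter mirrors the role of the proportion=0.5 command-line option used later, but Magnum Opus's own sampling procedure is not described in this tutorial, and holdout_split is a hypothetical helper.

```python
import random


def holdout_split(records, proportion=0.5, seed=None):
    """Hypothetical helper: randomly divide records into (exploratory, holdout).

    proportion is the fraction placed in the exploratory set; the remaining
    records form the holdout set. A fixed seed makes the split reproducible.
    """
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * proportion)
    return shuffled[:cut], shuffled[cut:]
```

Because the two sets are disjoint, associations found in the exploratory set are tested on data that played no part in discovering them.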

The following holdout evaluation tests are supported for rules. 

| Test | Null hypothesis | Statistical technique |
|---|---|---|
| Minimum Coverage | Coverage ≤ Min Coverage | Binomial sign test |
| Minimum Support | Support ≤ Min Support | Binomial sign test |
| Minimum Strength | Strength ≤ Min Strength | Binomial sign test |
| Minimum Lift | Lift ≤ Min Lift | Binomial sign test |
| Minimum Leverage | Leverage ≤ Min Leverage | Binomial sign test |
| Positive correlation | Support ≤ Coverage × RHS_Coverage | Fisher exact test |
| Improvement over generalizations | Strength ≤ the maximum Strength of any generalization of the current rule | Fisher exact test |
| Partial with respect to specializations | There exists another rule GLHS -> RHS in the set of best rules, that has not been rejected by holdout evaluation, that is a specialization of the current rule, and such that the LHS and RHS of the current rule are conditionally independent given the negation of GLHS | Fisher exact test |

The following holdout evaluation tests are supported for itemsets.

| Test | Null hypothesis | Statistical technique |
|---|---|---|
| Minimum Coverage | Coverage ≤ Min Coverage | Binomial sign test |
| Minimum Leverage | Leverage ≤ Min Leverage | Binomial sign test |
| Improvement over generalizations | Coverage ≤ the maximum of coverage(A) × coverage(B) for any partition of the current itemset into two subsets A and B | Fisher exact test |
| Self-sufficient | Coverage ≤ the maximum of coverage(A) × coverage(B) for any partition of the current itemset into two subsets A and B within the set of cases not covered by the difference between the current itemset and any of its productive supersets | |
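The minimum-value tests in these tables are listed as binomial sign tests. One plausible reading, which is an assumption rather than documented Magnum Opus behaviour, is a one-sided binomial test. For Minimum Strength, the number of trials is the holdout coverage, the number of successes is the holdout support, and the null hypothesis is evaluated at its boundary, true strength = Min Strength. For Minimum Coverage, the trials would instead be the holdout size and the successes the holdout coverage. Both function names below are hypothetical.

```python
from math import comb


def binomial_upper_tail(successes, trials, p):
    """P(X >= successes) for X ~ Binomial(trials, p).

    Exact integer binomial coefficients are converted to float in each term,
    which is fine for holdout sets of a few hundred cases but can raise
    OverflowError for very large trial counts.
    """
    return sum(comb(trials, k) * p ** k * (1 - p) ** (trials - k)
               for k in range(successes, trials + 1))


def minimum_strength_p_value(holdout_coverage, holdout_support, min_strength):
    """Hypothetical helper: chance of at least holdout_support RHS cases among
    holdout_coverage covered cases if the true strength were min_strength."""
    return binomial_upper_tail(holdout_support, holdout_coverage, min_strength)
```

A small p-value is evidence that the true strength exceeds the specified minimum.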

 

The positive correlation test is the default test for rules.   It tests whether the leverage of the rule is greater than zero.  The improvement over generalizations test is the default test for itemsets.  The improvement over generalizations tests are equivalent to the tests applied by the unsound filter.  The Partial with respect to specializations and Self-sufficient tests check whether a specialization of a rule (a rule created by adding elements to the LHS) or a superset of an itemset can account for the apparent association.
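Several of these tests use the Fisher exact test. For the positive correlation test, a one-sided version can be sketched from four counts: the number of cases n, the cases matching the LHS, the cases matching the RHS, and the cases matching both. The Python sketch below is a textbook hypergeometric tail calculation. The function name is hypothetical, and it is not claimed to reproduce Magnum Opus's internal computation exactly.

```python
from math import comb


def fisher_positive_correlation(n, lhs_count, rhs_count, both_count):
    """Hypothetical helper: one-sided Fisher exact test p-value.

    With the LHS and RHS totals held fixed, this is the probability of seeing
    both_count or more cases that contain both, if LHS and RHS were independent.
    """
    upper = min(lhs_count, rhs_count)
    favourable = sum(comb(rhs_count, k) * comb(n - rhs_count, lhs_count - k)
                     for k in range(both_count, upper + 1))
    return favourable / comb(n, lhs_count)
```

Python's arbitrary-precision integers keep the binomial coefficients exact even for a holdout set of 500 cases, so only the final division introduces rounding.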

For more information on statistically sound association discovery see the following worked examples and the research papers:

Webb, G.I. (2007). Discovering Significant Patterns. Machine Learning 68(1). Netherlands: Springer, pages 1-33.

Webb, G.I. (2008). Layered Critical Values: A Powerful Direct-Adjustment Approach to Discovering Significant Patterns. Machine Learning 71(2-3). Netherlands: Springer, pages 307-323 [Technical Note].

 

A worked example of holdout evaluation for rules

To illustrate holdout evaluation for rules, we run Magnum Opus on the tutorial.itl data, selecting 50% of the data for the exploratory set and the remaining 50% for the holdout set, using holdout evaluation with the partialness and improvement tests, searching by support with no filtering, and allowing only tomatoes and potatoes on the LHS and only lettuce and carrots on the RHS.

Command line: mocl item-list-file=tutorial.itl proportion=0.5 \
out-of-sample-holdout-evaluation \
test-partialness=yes test-improvement=yes \
search-mode=support filter=none \
lhs-available=tomatoes,potatoes \
rhs-available=lettuce,carrots

 

Interactive system.  First run Magnum Opus.  From the File Menu select Import Data.  The system will display a dialog for selecting a file to open.  If necessary, navigate to the Example Files folder within the folder into which you installed the software.  Select the file tutorial.itl.  The system will now display the following dialog box.

The system recognizes from the itl file extension that the file is an item list file.  As this is correct, click the Next > button to go to the next screen.

This screen allows you to select the delimiter character.  As the default is correct for this file, click Next > to go to the next screen.

This screen allows you to select how much of the data should be loaded into the exploratory set.  For this example we wish to load 50%, so change the Percentage box to 50.

Now click Next >.

The next screen allows you to select whether holdout evaluation is to be performed.  If it is, you have the choice of either using the data not included in the exploratory set (the default), or of loading the data from another file.  As we wish to use the default, click Import Data.  This takes us to the main screen.

We want to select the holdout tests, so select Rule Evaluation Holdout Settings from the Preferences menu.  This leads to a dialog that allows you to select the tests and significance level to be applied during holdout evaluation.  Select Improvement over generalizations and Partial with respect to specializations.

Then click OK to return to the main screen.

On the main screen select Search by Support and Filter out None.  Then select potatoes and tomatoes for the Values allowed on the LHS and carrots and lettuce for the Values allowed on the RHS.

Now click GO to commence the search.

 

Output:

Magnum Opus - The leader in association discovery technology.

Version 4.3

Copyright (c) 1999-2009 G. I. Webb & Associates Pty Ltd.

 

Data file: Tutorial.itl [50% sample]

 

500 cases / 500 holdout cases / 16 items

 

Search for rules

Search by support

 

Maximum number of attributes on LHS = 4

Maximum number of rules = 1000

Minimum leverage = -1.0

Minimum leverage count = -2147483647

Minimum coverage = 0.0

Minimum coverage count = 1

Minimum support = 0.0

Minimum support count = 0

Minimum lift = 0.0

Minimum strength = 0.0

 

Values allowed on LHS:

  potatoes

  tomatoes

 

 

Values allowed on RHS:

  carrots

  lettuce

 

 

Only 6 rules satisfy the specified constraints.

 

The following 2 rules passed holdout evaluation

 

tomatoes -> lettuce

[Coverage=0.244 (122); Support=0.106 (53); Strength=0.434; Lift=1.96; Leverage=0.0518 (25.9)]

 

tomatoes -> carrots

[Coverage=0.244 (122); Support=0.080 (40); Strength=0.328; Lift=1.95; Leverage=0.0390 (19.5)]

 

 

The following 4 rules failed holdout evaluation, adjusted critical value = 0.012500

 

potatoes -> lettuce

[Coverage=0.272 (136); Support=0.076 (38); Strength=0.279; Lift=1.26; Leverage=0.0156 (7.8)]

Holdout coverage = 147, holdout support = 37, holdout strength = 0.252

Fails positive correlation, p = 0.101

Fails significant improvement with respect to DEFAULT, p = 0.101

Fails partial test with respect to tomatoes & potatoes, p = 0.952

 

potatoes -> carrots

[Coverage=0.272 (136); Support=0.056 (28); Strength=0.206; Lift=1.23; Leverage=0.0103 (5.2)]

Holdout coverage = 147, holdout support = 40, holdout strength = 0.272

Fails partial test with respect to tomatoes & potatoes, p = 0.120

 

tomatoes & potatoes -> lettuce

[Coverage=0.072 (36); Support=0.040 (20); Strength=0.556; Lift=2.50; Leverage=0.0240 (12.0)]

Holdout coverage = 41, holdout support = 23, holdout strength = 0.561

Fails significant improvement with respect to tomatoes, p = 0.0172

 

tomatoes & potatoes -> carrots

[Coverage=0.072 (36); Support=0.028 (14); Strength=0.389; Lift=2.31; Leverage=0.0159 (8.0)]

Holdout coverage = 41, holdout support = 19, holdout strength = 0.463

Fails significant improvement with respect to tomatoes, p = 0.0166

The rules that fail holdout evaluation are listed after those that pass.  The rule tomatoes & potatoes -> lettuce illustrates the significant improvement test.  Its generalization tomatoes -> lettuce has strength 0.434.  The longer rule has a higher strength (0.556 on the exploratory data and 0.561 on the holdout data), but there is insufficient evidence that its strength of association is truly higher than that of the shorter rule: the improvement test on the holdout data gives p = 0.0172, which exceeds the adjusted critical value of 0.0125.

The rule potatoes -> lettuce illustrates the partialness test.  The rule tomatoes & potatoes -> lettuce covers 36 of the 136 examples covered by the shorter rule.  It also covers 20 of the 38 examples that contain both potatoes and lettuce.  Once the 36 examples covered by the longer rule are removed, the remaining support is just 18 out of 100 examples.  The resulting strength (0.180) is lower than the default strength for lettuce (0.222, the frequency of lettuce in the exploratory data).  Consequently, it appears that the increased frequency of lettuce in the context of potatoes is due solely to its increased frequency when both potatoes and tomatoes are present.
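The arithmetic behind this explanation can be checked directly. The Python sketch below uses the exploratory counts from the output above. Lettuce's overall frequency is taken as 111 of 500 exploratory cases, as reported in the itemset output for the same sample, and it agrees with the strength/lift ratio of tomatoes -> lettuce (0.434 / 1.96 ≈ 0.221). The sketch only reproduces the paragraph's reasoning. The test Magnum Opus actually applies is a Fisher exact test on the holdout data.

```python
# Exploratory counts for potatoes -> lettuce and its specialization
# tomatoes & potatoes -> lettuce, taken from the output above.
potatoes = 136                     # coverage of potatoes -> lettuce
potatoes_and_lettuce = 38          # support of potatoes -> lettuce
covered_by_specialization = 36     # coverage of tomatoes & potatoes -> lettuce
supported_by_specialization = 20   # support of tomatoes & potatoes -> lettuce

# Remove the cases covered by the specialization and look at what is left.
remaining_coverage = potatoes - covered_by_specialization
remaining_support = potatoes_and_lettuce - supported_by_specialization
remaining_strength = remaining_support / remaining_coverage

# Default strength for lettuce: its frequency among all exploratory cases.
lettuce_frequency = 111 / 500

print(remaining_coverage, remaining_support, round(remaining_strength, 3))
print("below lettuce's default strength:", remaining_strength < lettuce_frequency)
```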

A worked example of holdout evaluation for itemsets

To illustrate holdout evaluation for itemsets we continue the previous example.  As before, we run Magnum Opus on the tutorial.itl data, selecting 50% of the data for the exploratory set and the remaining 50% for the holdout set, and using holdout evaluation.  This time, however, we search for itemsets using the self-sufficient and improvement tests, searching by coverage, using no filtering and allowing only tomatoes, potatoes, lettuce and carrots in the itemsets.

Command line: mocl item-list-file=tutorial.itl proportion=0.5 \
out-of-sample-holdout-evaluation \
find-itemsets test-self-sufficient=yes \
test-improvement=yes search-mode=coverage \
filter=none \
items-available=tomatoes,potatoes,lettuce,carrots

 

Interactive system.  Continuing from the previous example, select ITEMSETS in the Search for box, select COVERAGE in the Search by box, and select the items carrots, lettuce, tomatoes and potatoes for the Values allowed in itemset.

Then select the Itemset Holdout Evaluation Settings… option from the Preferences menu.  The default option, Improvement over generalizations, should already be selected.  Click Self-sufficient to select it as well.

Click OK to return to the main window.  Now press GO to commence the search.

 

Output:

Magnum Opus - The leader in association discovery technology.

Version 4.2

Copyright (c) 1999-2007 G. I. Webb & Associates Pty Ltd.

 

Data file: Tutorial.itl [50% sample]

 

500 cases / 500 holdout cases / 16 items

 

Mon Mar 15 17:59:13 2007

 

Search for itemsets

Search by leverage

 

Maximum number of values in an itemset = 4

Maximum number of itemsets = 100

Minimum leverage = -1.0

Minimum leverage count = -2147483647

Minimum coverage = 0.0

Minimum coverage count = 1

 

Values allowed:

  carrots

  lettuce

  potatoes

  tomatoes

 

Only 16 itemsets satisfy the specified constraints.

 

The following 9 itemsets passed holdout evaluation

 

lettuce & tomatoes

[Coverage=0.106 (53); Leverage=0.0518 (25.9)]

 

tomatoes & carrots

[Coverage=0.080 (40); Leverage=0.0390 (19.5)]

 

lettuce & tomatoes & carrots

[Coverage=0.036 (18); Leverage=0.0182 (9.1)]

 

carrots & potatoes

[Coverage=0.056 (28); Leverage=0.0103 (5.2)]

 

{}

[Coverage=1.000 (500); Leverage=0.0000 (0.0)]

 

potatoes

[Coverage=0.272 (136); Leverage=0.0000 (0.0)]

 

tomatoes

[Coverage=0.244 (122); Leverage=0.0000 (0.0)]

 

lettuce

[Coverage=0.222 (111); Leverage=0.0000 (0.0)]

 

carrots

[Coverage=0.168 (84); Leverage=0.0000 (0.0)]

 

 

The following 7 itemsets failed holdout evaluation, adjusted critical value = 0.00313

 

lettuce & potatoes

[Coverage=0.076 (38); Leverage=0.0156 (7.8)]

Holdout coverage = 37

Fails significant improvement with respect to lettuce and potatoes, p = 0.101

 

lettuce & tomatoes & potatoes

[Coverage=0.040 (20); Leverage=0.0112 (5.6)]

Holdout coverage = 23

Fails significant improvement with respect to lettuce & tomatoes and potatoes, p = 0.0498

 

tomatoes & carrots & potatoes

[Coverage=0.028 (14); Leverage=0.0062 (3.1)]

Holdout coverage = 19

Fails significant improvement with respect to tomatoes & carrots and potatoes, p = 0.0381

 

lettuce & tomatoes & carrots & potatoes

[Coverage=0.016 (8); Leverage=0.0062 (3.1)]

Holdout coverage = 11

Fails significant improvement with respect to lettuce & tomatoes & carrots and potatoes, p = 0.0204

 

tomatoes & potatoes

[Coverage=0.072 (36); Leverage=0.0056 (2.8)]

Holdout coverage = 41

Fails significant improvement with respect to tomatoes and potatoes, p = 0.580

 

lettuce & carrots

[Coverage=0.042 (21); Leverage=0.0047 (2.4)]

Holdout coverage = 24

Fails significant improvement with respect to lettuce and carrots, p = 0.118

Fails test for self-sufficiency, p = 0.965

 

lettuce & carrots & potatoes

[Coverage=0.016 (8); Leverage=0.0032 (1.6)]

Holdout coverage = 13

Fails significant improvement with respect to lettuce and carrots & potatoes, p = 0.0570

The itemset tomatoes & potatoes provides a good example of the improvement test.  Tomatoes occurs in 0.282 of all holdout records and potatoes in 0.294.  If these items were independent of each other, tomatoes & potatoes would be expected to occur in 0.083 of all holdout records (41.45 records).  In fact they occur in 41 holdout records, so the itemset shows no improvement over what independence would predict.
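The expected count under independence follows from the holdout frequencies. With 500 holdout records, 0.282 and 0.294 correspond to 141 and 147 records. This short Python sketch, using a hypothetical helper name, reproduces the 41.45 figure.

```python
def expected_under_independence(n, count_a, count_b):
    """Hypothetical helper: expected number of records containing both A and B
    if A and B occur independently of each other."""
    return count_a * count_b / n


holdout_n = 500
tomatoes = 141   # 0.282 of the 500 holdout records
potatoes = 147   # 0.294 of the 500 holdout records
expected = expected_under_independence(holdout_n, tomatoes, potatoes)
observed = 41    # holdout coverage of tomatoes & potatoes

print(f"expected {expected:.2f} records ({expected / holdout_n:.3f}), observed {observed}")
```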

The itemset lettuce & carrots illustrates the self-sufficiency test.  This itemset appears in 24 holdout records.  Its superset, lettuce & tomatoes & carrots, appears in 21 of these holdout records, accounting for all of the improvement in the shorter itemset.  

Computation time, snapshots and anytime results

Magnum Opus provides tremendous flexibility to the user.  Many forms of analysis can be requested, and Magnum Opus always provides exact results.  However, some analyses are intrinsically difficult, and hence require large amounts of computation to complete.  Unfortunately, it is not possible to accurately predict in advance which analyses will take extreme lengths of time to complete and which will complete quickly.

When a computation is taking a long time it is often helpful to view the best results discovered so far.  This lets you assess both whether you are performing the intended analysis and whether the results already obtained meet your analytic requirements.  A set of intermediate results created while computation is in progress is called a snapshot.  The following process is used to create a snapshot.

Command line:  While the system is running, send the SIGUSR1 signal to the process.  The exact command required may vary depending upon the precise operating system and command shell used.  The following provides an example under bash on Linux.

% mocl names-file=tutorial.nam data-file=tutorial.data > tutorial.out &

[1] 3342

% kill -SIGUSR1 3342

In this example the process has been run in the background and has been assigned the process ID 3342.

 

Interactive system.  When the system is in the process of a search the screen will appear as follows:

Simply click on the blue camera icon.  A dialog will appear that allows you to specify a file into which the snapshot will be saved.

 

In general the following actions will decrease compute time.

-   Increase the minimum leverage.

-   Increase the minimum coverage.

-   Increase the minimum support.

-   Decrease the maximum LHS length.

-   Decrease the maximum number of rules to be found.

-   Decrease the number of values allowed on the LHS and the RHS of rules.

Note that increasing the minimum lift or strength will decrease compute time only if use m-estimate is checked or the minimum coverage or support is set to a high value.  Increasing the minimum lift or strength when the minimum coverage and support are both low can substantially increase compute time.

Search by lift and search by strength are both substantially faster when the m-estimate is used.

Some final thoughts

Magnum Opus is a powerful and flexible tool.  The default settings are sufficient for many analytic tasks.  However, advanced users can use the sophisticated controls to perform a wide variety of complex analyses.  We recommend that new users begin with the default settings and explore the other controls as they become familiar with the system.