The default filter mode now varies depending on whether holdout evaluation is being performed. If holdout evaluation is performed, the filter mode defaults to insignificant, as in previous versions. If holdout evaluation is not performed, the filter mode now defaults to unsound.
In the Windows version these defaults can be changed in the Preferences: Search Defaults... dialog.
Previous versions required that values be set for at least one of minimum-support, minimum-coverage, or minimum-leverage when search-mode was strength or lift. This requirement has now been removed.
Under Windows Vista and 7, the default folder used when opening or saving a file kept reverting to the last folder opened in the previous session. This no longer happens.
Version 4.6.2 fixes an issue whereby the automated division of numeric attributes into more than three sub-ranges could occasionally fail.
Version 4.6.1 is a minor update for Windows only. Some data import error messages have been expanded to provide more information. Evaluation license verification has been changed to an online process.
The help system and "Getting Started" documentation have been integrated, creating a more comprehensive help and documentation environment. The tutorial has been refined in response to user testing.
The installation process provides better support for Windows Vista and Windows 7. In particular, it is now possible to install Magnum Opus for use by all users on these systems.
Magnum Opus is now able to check for updates each time it runs. This functionality can be controlled using the Interface Options dialog accessed through the Preferences menu.
An issue that in previous versions sometimes caused the system to crash when importing holdout data has been resolved.
The data import process has been re-engineered. This has resulted in very substantial speed-ups of transaction data import in the interactive Windows system and modest speed-ups in the command-line Linux system.
A number of improvements deliver even greater reductions in Magnum Opus' industry-best compute times.
The insignificant and unsound filters for earlier versions of the Windows interactive system checked a rule against all of its generalizations. Version 4.4 checks only against immediate generalizations and the default rule because our testing shows this rarely affects which rules are accepted and can greatly reduce compute time.
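To show how much less checking the 4.4 approach involves, the sketch below lists the LHS sets a rule is compared against, assuming a rule is a set of LHS items with a fixed RHS: an immediate generalization drops exactly one LHS item, and the default rule has an empty LHS. The function names are hypothetical, and this illustrates the counting only, not the filters' statistical tests.

```python
from itertools import combinations

def generalizations_checked(lhs):
    """LHS sets a rule LHS -> RHS is compared against under the 4.4 scheme (sketch).

    Immediate generalizations remove exactly one LHS item; the default
    rule has an empty LHS. The RHS is the same in every case.
    """
    lhs = frozenset(lhs)
    checked = {lhs - {item} for item in lhs}
    checked.add(frozenset())  # the default rule: {} -> RHS
    return checked

def all_generalizations(lhs):
    """Every proper subset of the LHS, as checked by earlier versions."""
    items = sorted(lhs)
    return {frozenset(c) for size in range(len(items)) for c in combinations(items, size)}

lhs = {"a", "b", "c", "d"}
print(len(generalizations_checked(lhs)), len(all_generalizations(lhs)))  # 5 15
```

The gap grows quickly: an LHS of k values has k + 1 checked generalizations under this scheme but 2^k - 1 in total.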
A bug in earlier versions of the Linux command-line system that occasionally produced incorrect results from the insignificant and unsound filters has been fixed.
There has been a major enhancement to the error reporting in Magnum Opus 4.4. More detail is provided in most error messages and the documentation on error messages has been substantially enhanced. In addition, a menu has been added to the Windows interactive system's error log window that allows you to save the error log to a file for ease of reference.
Magnum Opus 4.4 adds User Account Control and HTML Help features to the Windows interactive system. This improves the functionality of the help system under Windows Vista and Windows 7. It also resolves an issue in these versions of Windows whereby if an output file was saved in the Example Files folder, it could not be opened by Notepad for viewing.
A breakthrough in our proprietary search technologies means that the time taken for association discovery is decreased dramatically for some discovery tasks. Speed-ups should be especially noticeable when searching by strength or lift.
As itemsets of size 1 cannot be sound, the search space for itemsets of size 1 is no longer considered when setting the critical values for use with the unsound filter during search of itemsets.
For consistency with the interactive version, the default critical value used for significance tests has been altered from 0.01 to 0.05. This will affect the rules and itemsets that are found using both the insignificant and unsound filters. This value can be altered using the significance-level command.
Magnum Opus 4.2 incorporates a test for self-sufficient itemsets. An itemset is self-sufficient if its coverage is higher than can be accounted for by the coverage of either its subsets or its supersets. For example, the last of the following itemsets is not self-sufficient because it has exactly the coverage that would be expected given the coverage of each of the items it contains. Note that such an itemset fails the test for improvement, which is required whenever a test for self-sufficiency is performed.
a [Coverage=0.500 (500); Leverage=0.0000 (0.0); p=0.000000]
b [Coverage=0.500 (500); Leverage=0.0000 (0.0); p=0.000000]
a & b [Coverage=0.250 (250); Leverage=0.0000 (0.0); p=0.525212]
Nor is the first of the next two itemsets self-sufficient. In this case it is because the second itemset, which is a superset of the first, accounts for all of the coverage of the first.
a & b [Coverage=0.500 (500); Leverage=0.0000 (0.0); p=0.000000]
a & b & c [Coverage=0.500 (500); Leverage=0.0000 (0.0); p=0.000000]
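The arithmetic behind these two examples can be sketched as follows, assuming coverage is expressed as a proportion of cases and leverage is the observed coverage minus the coverage expected if the items occurred independently. The superset check is simplified here to the share of an itemset's coverage that the superset does not also cover. Magnum Opus applies significance tests (hence the p-values shown), so this is an illustration of the reasoning rather than its exact test, and the function names are hypothetical.

```python
def expected_coverage(*item_coverages):
    """Coverage expected if the items occurred independently of one another."""
    result = 1.0
    for cov in item_coverages:
        result *= cov
    return result

def leverage(observed, *item_coverages):
    """Observed coverage minus the coverage expected under independence."""
    return observed - expected_coverage(*item_coverages)

def coverage_not_explained_by_superset(cov_itemset, cov_superset):
    """Proportion of cases covered by the itemset but not by the superset."""
    return cov_itemset - cov_superset

# First example: a and b each cover 0.5 of the cases; a & b covers 0.25.
print(leverage(0.25, 0.5, 0.5))  # 0.0
# Second example: a & b and a & b & c both cover 0.5 of the cases.
print(coverage_not_explained_by_superset(0.5, 0.5))  # 0.0
```

In both cases nothing is left over, which is why neither a & b in the first example nor a & b in the second is self-sufficient.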
As rules or itemsets are being found, Magnum Opus 4.2 reports on the number found so far and the minimum and maximum value for the target metric of those found.
Magnum Opus 4.1 fixes a bug in versions 3.0 to 4.1 that could result in itemsets containing more than 3 items being incorrectly identified as failing a test for improvement.
Holdout evaluation for itemsets has been altered to make test-improvement=yes the default. This test is not included by default during holdout evaluation for rules.
Magnum Opus 4.1 treats any two rules X -> Y and Y -> X (that is, two rules in which the LHS and RHS are swapped) as a single pattern unless holdout tests are applied for minimum lift, strength or coverage. This usually increases the number of rules accepted by holdout evaluation while still strictly bounding the risk of accepting any spurious rule at the user-specified significance level.
Performance has been improved when using the filters None, Redundant or Unproductive for itemset discovery.
Magnum Opus 4.0 introduces a number of significant enhancements to functionality.
Most pattern discovery systems incorporate an inherent risk of discovering large numbers of spurious patterns. These are apparent patterns that appear in the available data by chance alone. Magnum Opus 4.0 provides unique facilities to control this risk. In addition to support for holdout evaluation, it provides a new filter for unsound rules. This allows the risk of any spurious discoveries to be strictly controlled without recourse to holdout evaluation.
Major enhancements to the itemset discovery engine have resulted in substantial reductions in both compute time and memory requirements.
It is now possible to define hierarchies of values for discrete valued attributes.
When the insignificant or unsound filters are employed, the p-values of the significance test are included with the other statistics for a rule or itemset.
The default output format has been changed to Concise, 2 lines per rule.
Trivial rules have been renamed redundant in keeping with general practice in the wider pattern discovery community.
A new tutorial introduces the key features of the system.
A new menu item restores all optional settings to their default values.
Magnum Opus 3.0 uses new proprietary data mining technologies to deliver substantially faster processing of most rule discovery tasks, increasing the size and complexity of the data that can be processed. Its flexibility and power are now brought to bear on itemset as well as rule discovery. The holdout evaluation capabilities have been greatly expanded. The power of the default filter has been substantially upgraded. Finally, a facility has been added for compiling a list of all attribute-values or items together with counts of the number of times each appears in the data.
New proprietary technologies deliver substantial speed-ups compared with previous performance. On some tasks compute time for version 3.0 is as much as an order of magnitude less than version 2.0, and for most tasks compute time is substantially lower. This greater efficiency makes it feasible to analyze even larger and more complex data than before.
An itemset is a set of attribute-values or items. Magnum Opus 3.0 can now find the k-optimal itemsets using either of two metrics, coverage or leverage. The system's ability to search for itemsets that maximize leverage provides a unique capability to identify itemsets that occur more frequently in the data than can be accounted for by any collection of independent sub-itemsets.
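A minimal sketch of leverage for an itemset of any size, assuming this definition: each way of splitting the itemset into two non-empty parts X and Y gives an expected coverage cov(X) * cov(Y) if the parts were independent, and leverage is the observed coverage minus the largest such expectation. The notes do not spell out the exact formula Magnum Opus uses, and the function names and data are hypothetical.

```python
from itertools import combinations

def coverage(itemset, transactions):
    """Proportion of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def itemset_leverage(itemset, transactions):
    """Coverage minus the largest coverage expected from any split into two parts.

    Assumed definition: splitting the itemset into non-empty parts X and Y
    gives the coverage cov(X) * cov(Y) expected if X and Y were independent;
    leverage is measured against the largest such expectation.
    """
    items = sorted(itemset)
    if len(items) < 2:
        raise ValueError("leverage needs an itemset with at least two items")
    first, rest = items[0], items[1:]
    best_expected = 0.0
    # Keeping `first` in X enumerates each unordered split exactly once.
    for size in range(len(rest)):
        for others in combinations(rest, size):
            x = frozenset((first, *others))
            y = frozenset(items) - x
            best_expected = max(best_expected,
                                coverage(x, transactions) * coverage(y, transactions))
    return coverage(items, transactions) - best_expected

# Six hypothetical cases.
transactions = [frozenset(t) for t in
                ({"a", "b", "c"}, {"a", "b", "c"}, {"a"}, {"b"}, {"c"}, set())]
print(itemset_leverage({"a", "b", "c"}, transactions))
```

A positive value means the itemset occurs more often than even the most favourable split into independent parts would predict.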
Magnum Opus' unique holdout evaluation process supports statistically sound exploratory data mining. Version 3.0 allows the user to select from a suite of statistical tests to be applied during holdout evaluation. The risk of any rule or itemset being accepted that does not satisfy all specified tests is no more than the selected significance level. The user can now set the desired significance level.
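Bounding the risk that *any* accepted rule or itemset is spurious requires correcting for the number of candidates tested. One standard procedure with that guarantee is Holm's step-down correction, sketched below. Whether Magnum Opus applies exactly this correction is an assumption here, and the p-values are hypothetical.

```python
def holm_accept(p_values, alpha=0.05):
    """Indices of candidates accepted by Holm's step-down procedure.

    The probability of accepting one or more candidates for which the null
    hypothesis is actually true is at most alpha.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    accepted = []
    for rank, i in enumerate(order):
        # The smallest p-value faces alpha / m, the next alpha / (m - 1), and so on.
        if p_values[i] > alpha / (m - rank):
            break  # stop at the first p-value that fails its threshold
        accepted.append(i)
    return sorted(accepted)

# Hypothetical holdout p-values for four candidate rules.
print(holm_accept([0.001, 0.04, 0.012, 0.3], alpha=0.05))  # [0, 2]
```

The second candidate (p = 0.04) would pass an uncorrected 0.05 test but is rejected here, which is the price of controlling the risk across all four candidates at once.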
The significance filter now uses the Fisher exact test in place of the less powerful binomial sign test, enabling the discovery of more subtle patterns.
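A minimal sketch of a one-sided Fisher exact test for a positive association between X and Y, computed directly from the hypergeometric distribution with all table margins fixed. The function name and arguments are hypothetical, and how Magnum Opus forms the contingency table for each filter comparison is not stated here.

```python
from math import comb

def fisher_one_sided(n, n_x, n_y, n_xy):
    """One-sided Fisher exact test p-value for a positive association of X and Y.

    n: total cases; n_x, n_y: cases containing X and Y; n_xy: cases containing both.
    Returns the probability, with all margins fixed, of n_xy or more joint cases.
    """
    upper = min(n_x, n_y)
    tail = sum(comb(n_y, k) * comb(n - n_y, n_x - k) for k in range(n_xy, upper + 1))
    return tail / comb(n, n_x)

# 1000 cases, X and Y each in 500, both together in 250: exactly what independence predicts.
print(fisher_one_sided(1000, 500, 500, 250))
```

For these counts the result is roughly 0.525, close to the p-value shown for a & b in the self-sufficiency example above.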
It is often useful to know the frequency of each individual attribute-value or item, if only for comparison with how those frequencies are affected by the other values in a rule or itemset. A new option creates a reference list of attribute-values and their frequencies.
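A short sketch of such a reference list, assuming each case is a collection of attribute-values or items; the function name and data are hypothetical.

```python
from collections import Counter

def value_frequencies(transactions):
    """Reference list of (value, number of cases containing it), most frequent first."""
    return Counter(value for case in transactions for value in set(case)).most_common()

cases = [{"milk", "bread"}, {"milk"}, {"eggs"}]
print(value_frequencies(cases)[0])  # ('milk', 2)
```

Passing a number to `most_common(n)` returns only the n most frequent values, in the same spirit as the display limit described next.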
In previous versions, the lists of values allowed on the LHS and RHS of a rule became unwieldy when the numbers of values became too large. The user can now specify a maximum number of values to be displayed. If this number is exceeded, only the most frequent attribute-values are displayed, greatly improving usability.
Magnum Opus 2.0 is built on new rule discovery technologies, delivering faster, more flexible rule discovery than ever before. The new rule discovery engine supports a number of major enhancements, including statistically sound exploratory rule discovery; improved handling of transaction data; an improved filter for insignificant rules; support for Bayesian estimates of strength and lift; absolute valued constraints; multiple rule formats; and a rule discovery snapshot facility.
Magnum Opus 2.0 introduces a unique capability to exploratory rule discovery: the ability to perform statistically sound rule discovery. It is now possible to import both exploratory data, from which Magnum Opus finds rules, and holdout data, which Magnum Opus uses to evaluate the rules to determine whether they are statistically significant. See holdout evaluation for details.
Other systems cannot provide such a facility because they do not allow the user to specify a limit on the number of rules to be found.
Previous versions were designed for efficient processing of attribute-value data. Magnum Opus 2.0 now operates in two different internal modes, providing much more efficient processing of most transaction data (identifier-item and item-list data).
Previous versions used a normal approximation of the binomial distribution to test whether a rule was insignificant during rule filtering. They also used a heuristic approach to rule filtering that meant that some insignificant rules could pass the filter because they were not compared against the generalizations upon which they did not significantly improve. Magnum Opus 2.0 uses exact binomial tests and guarantees that the Filter Insignificant filter will remove all and only insignificant rules. As a result, the rules found by version 2.0 may differ from those found by previous versions.
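A minimal sketch of an exact binomial test of this kind, assuming the question is whether a rule's observed strength is too high to be explained if its true strength were only that of a more general rule. It treats the generalization's strength as fixed, which is a simplification; the function names and the 0.05 default are illustrative assumptions, not Magnum Opus's interface.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(K >= k) for K ~ Binomial(n, p): the chance of k or more successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def improves_significantly(rule_cover, rule_hits, general_strength, alpha=0.05):
    """True if the rule's strength exceeds the generalization's by more than chance explains.

    rule_cover: cases matching the rule's LHS; rule_hits: those also matching the RHS;
    general_strength: strength of the more general rule, treated as fixed.
    """
    return binomial_upper_tail(rule_hits, rule_cover, general_strength) <= alpha

# Hypothetical: the LHS covers 20 cases; the generalization has strength 0.5.
print(improves_significantly(20, 18, 0.5))  # True
print(improves_significantly(20, 12, 0.5))  # False
```

Unlike a normal approximation, the exact tail stays accurate for rules that cover only a few cases.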
It is also now possible to specify any critical value for use with the Filter Insignificant filter.
Version 2.0 allows the user to use Bayesian estimates of strength and lift. The m-estimate provides a conservative estimate of the expected strength and lift of a rule, resulting in more realistic estimates of strength and lift on future data.
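A minimal sketch of the m-estimate, assuming the RHS coverage is used as the prior and m controls how strongly the observed strength is pulled towards it. The choice of prior and the default m = 2.0 are assumptions for illustration; the notes do not say which values Magnum Opus uses.

```python
def m_estimate_strength(lhs_count, both_count, rhs_coverage, m=2.0):
    """m-estimate of strength: both_count / lhs_count shrunk towards the RHS coverage.

    rhs_coverage acts as the prior; a larger m shrinks the estimate more.
    """
    return (both_count + m * rhs_coverage) / (lhs_count + m)

def m_estimate_lift(lhs_count, both_count, rhs_coverage, m=2.0):
    """Lift based on the m-estimate of strength rather than the raw strength."""
    return m_estimate_strength(lhs_count, both_count, rhs_coverage, m) / rhs_coverage

# Hypothetical rule: the LHS covers 5 cases, all of which satisfy the RHS,
# and the RHS holds in 20% of all cases. Raw strength is 1.0 and raw lift 5.0.
print(m_estimate_strength(5, 5, 0.2), m_estimate_lift(5, 5, 0.2))
```

The estimate (5.4 / 7, about 0.77) is lower than the raw strength of 1.0, reflecting how little evidence five cases provide.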
It is now possible to specify minimum coverage, support and leverage as numbers of cases as well as proportions of the total number of cases.
The minimum lift has now been reduced to 0.0 and the minimum leverage to -1.0. As a result, the only system-imposed constraint on the rules that Magnum Opus can discover is that coverage must be at least one case.
It is now possible to select between a range of different rule formats. A new detailed format is intended for first-time users and provides greater explanation of the various elements of a rule. Three variants of the concise format allow the LHS, RHS and statistics to be written on different lines. It is also possible to specify exactly which statistics are printed.
Magnum Opus is an anytime system. At any time during its search it is possible to terminate the search and obtain a list of the best rules discovered so far. Now a new snapshot facility allows the user to view the current best rules at any stage as the search progresses.
It is now possible to change the default values used throughout the system with the new Preferences menu and Set as default options on all Data Import Wizard screens.
The Data Import Wizard now includes an Import Now option that skips all subsequent screens using their default values.
The user interface has been substantially revised, the most important revision being the removal of the Output Log. All output is now written directly to a disk file and then displayed using an external viewer that the user can select. A new optional statistic, RHS Coverage, has been added.
Release 1.3.1 resolves an issue with license key verification when multiple users share the software on a single computer.
Two new filters have been added to Magnum Opus that discard rules formed by adding a value to the Left-Hand-Side of an existing rule when that addition does not result in a statistically significant increase in rule strength. This allows Magnum Opus to discard more spurious rules than previously possible, restricting the rules that you need to consider as far as possible to those that are likely to be of interest. See rule filtering for details.
The rule discovery algorithm has been revised resulting in substantial speed-up for some rule discovery tasks.
Magnum Opus 1.3 incorporates further improvements in the efficiency of the data import process, especially for item-list format data with large numbers of items.
Magnum Opus 1.3 improves the record in the output log of the values allowed on the Left-Hand-Side and Right-Hand-Side of a rule. The system now records either the values allowed or the values disallowed, whichever provides the more succinct summary of available values. When both lists would be excessively long, the system displays only the number of values allowed. This avoids lists of thousands of values appearing at the start of a log.
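The selection rule above can be sketched as follows. The output wording and the `max_listed` threshold of 20 are hypothetical, since the notes do not say what length counts as excessive.

```python
def summarize_allowed(allowed, all_values, max_listed=20):
    """Describe the allowed values as briefly as possible (sketch).

    Lists the allowed values or the disallowed ones, whichever is shorter;
    if even the shorter list exceeds max_listed entries, reports only a count.
    Values are assumed to be strings.
    """
    all_values = set(all_values)
    allowed = sorted(set(allowed))
    disallowed = sorted(all_values - set(allowed))
    if not disallowed:
        return "All values allowed"
    if min(len(allowed), len(disallowed)) > max_listed:
        return f"{len(allowed)} of {len(all_values)} values allowed"
    if len(allowed) <= len(disallowed):
        return "Allowed: " + ", ".join(allowed)
    return "All values allowed except: " + ", ".join(disallowed)

values = [f"v{i}" for i in range(100)]
print(summarize_allowed(values[:50], values))  # 50 of 100 values allowed
```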
Magnum Opus has been substantially upgraded to
Magnum Opus 1.2 can automatically filter out rules that are unlikely to be of interest, delivering shorter lists of more interesting rules. Three levels of rule filtering are provided: Filter-out Unproductive, Filter-out Trivial, and Filter-out None. Previous versions of Magnum Opus always filtered out trivial rules only.
Data import has been upgraded to reduce the time taken to import large data sets.
Data set size advice has been added to the system. Evaluation has shown that the time taken to find rules is very sensitive to the availability of sufficient physical memory: processing time increases dramatically when the data set size requires the use of virtual memory. Magnum Opus now monitors the data import process and estimates the amount of memory required to process the data being loaded. As soon as it is apparent that the memory requirements will exceed available physical memory, an advisory message warns that processing is likely to be significantly slowed and suggests a sample size that may provide better processing efficiency within the constraints of installed memory.
When the identifier-item data format is selected, the user is asked whether the data is ordered by identifier (all the entries for an identifier appear on consecutive lines of the data file). If the user advises that the data is ordered, the data import time is greatly reduced.
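The reason ordering helps can be sketched as follows, assuming the input has already been split into (identifier, item) pairs. With ordered data each basket is complete as soon as the identifier changes, so baskets can be emitted in a single streaming pass; with unordered data every basket must be held until the whole file has been read. This illustrates the idea only and is not Magnum Opus's importer; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import groupby

def baskets_from_ordered(pairs):
    """Build baskets from (identifier, item) pairs already ordered by identifier.

    Only one basket is held at a time. If the data is not in fact ordered,
    an identifier that reappears later yields a second, separate basket.
    """
    for ident, group in groupby(pairs, key=lambda pair: pair[0]):
        yield ident, frozenset(item for _, item in group)

def baskets_from_unordered(pairs):
    """Build baskets from (identifier, item) pairs in any order, keeping all in memory."""
    baskets = defaultdict(set)
    for ident, item in pairs:
        baskets[ident].add(item)
    return {ident: frozenset(items) for ident, items in baskets.items()}

pairs = [("t1", "bread"), ("t1", "milk"), ("t2", "milk")]
print(dict(baskets_from_ordered(pairs)) == baskets_from_unordered(pairs))  # True
```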
Search by Strength, Search by Coverage, and Search by Support are now supported, allowing search for the rules with the highest values for each of these metrics. These three new search modes augment Search by Leverage and Search by Lift as supported by previous versions.
Edit boxes that allow constraints on strength and support have been added to the Search Settings Page. These constraints add to the constraints on the number of values on the Left Hand Side, the leverage, the lift, and the coverage of a rule, as supported by previous versions. Magnum Opus will now find the specified number of rules that maximize the metric specified by the Search Mode, among those that satisfy the specified constraints on the number of values on the Left Hand Side, leverage, lift, strength, coverage and support. This allows greater control over the types of rules that are found.
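The kind of search described above can be sketched by brute force: compute each rule's metrics, keep those meeting every minimum, and return the k best by the chosen search metric. The metric definitions used are the usual ones (coverage = LHS proportion, support = LHS-and-RHS proportion, strength = support / coverage, lift = strength / RHS coverage, leverage = support - coverage x RHS coverage). Magnum Opus's own search is far more efficient than this exhaustive enumeration, and the function names are hypothetical.

```python
from itertools import combinations

def rule_metrics(lhs, rhs, transactions):
    """Coverage, support, strength, lift and leverage of LHS -> RHS as proportions.

    lhs is a frozenset of items, rhs a single item, transactions a list of frozensets.
    """
    n = len(transactions)
    lhs_cases = [t for t in transactions if lhs <= t]
    rhs_cov = sum(1 for t in transactions if rhs in t) / n
    coverage = len(lhs_cases) / n
    support = sum(1 for t in lhs_cases if rhs in t) / n
    strength = support / coverage if coverage else 0.0
    lift = strength / rhs_cov if rhs_cov else 0.0
    return {"coverage": coverage, "support": support, "strength": strength,
            "lift": lift, "leverage": support - coverage * rhs_cov}

def top_k_rules(transactions, k, search_by, max_lhs=2, minimums=None):
    """Return the k rules with the highest `search_by` value that meet every minimum.

    minimums maps a metric name to its lower bound, e.g. {"strength": 0.8}.
    """
    minimums = minimums or {}
    items = sorted(set().union(*transactions))
    found = []
    for rhs in items:
        others = [i for i in items if i != rhs]
        for size in range(1, max_lhs + 1):
            for lhs in combinations(others, size):
                m = rule_metrics(frozenset(lhs), rhs, transactions)
                # Require coverage of at least one case.
                if m["coverage"] > 0 and all(m[name] >= low for name, low in minimums.items()):
                    found.append((m[search_by], lhs, rhs, m))
    found.sort(key=lambda rule: rule[0], reverse=True)
    return found[:k]

transactions = [frozenset(t) for t in ({"a", "b"}, {"a", "b"}, {"a", "c"}, {"b"}, {"c"})]
for value, lhs, rhs, _ in top_k_rules(transactions, 2, "leverage"):
    print(lhs, "->", rhs, round(value, 3))
```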
A new keyword categorical tells Magnum Opus to accept any value for the attribute. This saves the user from having to list all of the values for the attribute. See the SocioEconomicGroup attribute in the tutorial for an example.
The layout of the Output Log has been refined. All information describing the data set has been moved to the start of the log. The number of cases is now displayed at the start of the log rather than being displayed at the end of the log and in the Search Settings Page. The number of items or attribute values is also displayed. When data is sampled during data import the sample percentage is displayed after the file name. The number of rules found is displayed above the rules.
Maintenance release 1.2.1 fixes an issue with internal calculation of minimum cover which occasionally set the minimum cover one case too high.
Maintenance release 1.2.2 fixes two issues. Filtering occasionally discarded rules when it should not. The list of values allowed on the LHS and RHS was occasionally updated incorrectly after the first search.
Magnum Opus 1.1 brings fast rule analysis to transaction data. In addition to the attribute-value data supported by Magnum Opus 1.0, data describing collections of items are now supported. Two alternative data file formats can be used to import basket data.
The identifier-item format accepts input where each line contains a basket identifier coupled with one of the items in that basket.
The item-list format accepts variable length input records, where each record consists of a complete list of items contained in a single basket.
Unlike attribute-value data, basket data does not require a names file.
To assist users in importing data, an easy-to-use data import wizard has been introduced. This wizard allows the user to select data formats, select data field delimiters, and even perform sampling during the data import process.
For many data format errors, Magnum Opus 1.1 is able to continue importing data, discarding the line on which the error occurs. All error messages are displayed in the Output Log.
The main screen now contains two pages, the Search Options and Output Log pages. The Search Options page contains the controls previously found on the main screen. The Output Log page contains a log of error messages generated by the data import wizard and the output from the most recent search for rules.
The main screen can now be resized, allowing more LHS and RHS options to be displayed on the Search Options page and allowing easy display of the Output Log page.
Numeric attributes can now be automatically partitioned into sub-ranges of values. Declaring an attribute as NUMERIC n partitions it into n sub-ranges. For example,
income: numeric 5
declares that the values of attribute income should be partitioned into five sub-ranges.
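The notes do not say how the sub-range boundaries are chosen. A minimal sketch assuming equal-frequency partitioning, where each of the n sub-ranges receives roughly the same number of cases, looks like this; the function names and income values are hypothetical.

```python
from bisect import bisect_right

def equal_frequency_cuts(values, n):
    """Cut points splitting the values into n sub-ranges with roughly equal case counts.

    Assumption: equal-frequency partitioning. Tied values can produce fewer
    than n sub-ranges, because duplicate cut points are dropped.
    """
    ordered = sorted(values)
    cuts = []
    for i in range(1, n):
        cut = ordered[i * len(ordered) // n]
        if not cuts or cut > cuts[-1]:
            cuts.append(cut)
    return cuts

def sub_range(value, cuts):
    """0-based index of the sub-range containing value; sub-range i ends just below cuts[i]."""
    return bisect_right(cuts, value)

incomes = list(range(100))  # hypothetical income values
cuts = equal_frequency_cuts(incomes, 5)
print(cuts)  # [20, 40, 60, 80]
```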
The rules generated by Magnum Opus can now be saved in a comma separated value format suitable for loading into a spreadsheet. This allows for further convenient analysis of the rules that Magnum Opus discovers.
Magnum Opus 1.0 treated bar characters (|) as comment characters, ignoring all text that followed such a character. To make it easier to import a wide variety of existing files without special modification, bar characters are no longer treated as special characters in data files. Bar characters are now treated as comment characters in names files only if they appear at the start of a line. Any line starting with a bar character is ignored. Within a line, a bar character is treated like any other character.
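The names-file rule can be sketched as a simple line filter, assuming "start of a line" means the very first character (the notes do not say whether leading spaces are allowed). The function name and sample lines are hypothetical.

```python
def names_file_lines(lines):
    """Yield the lines of a names file that are not comments.

    A line is a comment only if its first character is a bar (|);
    a bar anywhere else in a line is kept as ordinary text.
    """
    for line in lines:
        if not line.startswith("|"):
            yield line

sample = ["| income is in dollars", "income: numeric 5", "note: a|b"]
print(list(names_file_lines(sample)))  # ['income: numeric 5', 'note: a|b']
```

Data files need no such filter, since bar characters are no longer special there at all.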