SAN MATEO (07/03/2000) - Right now, you are sitting on a virtual gold mine of transactional data that could transform your company into a leaner, quicker, and more profitable organization. Extracting useful information from this data could help you adjust your marketing strategies or streamline your operations, making your company significantly more competitive.
But if you're like many IT leaders, you are torn between the need to capture useful operational data patterns and the frustration of dealing with statistical math's intricacies, which are required to yield any worthwhile results. This is especially true when analyzing huge quantities of historical data, such as sales transactions or behavior records of online customers. If this sounds familiar, I recommend PolyAnalyst 4.1, the latest data mining release from Megaputer Intelligence Inc.
Although there are many comprehensive data mining solutions on the market, such as IBM Corp. Visual Warehouse and Oracle Corp. Express, PolyAnalyst focuses more effectively on data discovery than its competition and provides efficient algorithms. This self-contained data mining product was easy to use, adaptable to multiple business contexts, and affordable, thereby earning a score of Very Good.
Squeezing gold from data
Probably one of the most impressive characteristics of PolyAnalyst is the sheer number of data mining tasks it can tackle. It provides 11 data exploration capabilities that should cover even the most demanding tasks -- from the discovery of simple relations between two fields (how do discounts affect quantities ordered?) to the identification of more complex relationships (which of my products are most often purchased together?).
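To make the "purchased together" question concrete, here is a minimal Python sketch that counts how often each pair of products appears in the same order. It is an illustration only, not PolyAnalyst's algorithm; the function name pair_counts and the sample orders are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_counts(orders):
    """Count how often each unordered pair of products shares an order."""
    counts = Counter()
    for items in orders:
        # sorted(set(...)) drops duplicates and gives every pair one canonical order
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return counts


# Hypothetical orders, invented for illustration only
orders = [
    ["printer", "paper", "toner"],
    ["printer", "toner"],
    ["paper", "stapler"],
    ["printer", "paper", "toner"],
]

# The most frequent pair comes first in the result
top_pairs = pair_counts(orders).most_common(2)
print(top_pairs)
```

A raw count like this is only a starting point; a real shopping-basket analysis would also weigh how common each product is on its own.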
PolyAnalyst works with Microsoft Windows 95/98/2000 and Windows NT but unfortunately does not offer either Unix or Linux support. You can choose between a single station and a client/server version of the product. Using PolyAnalyst is as simple as importing the data set that you want to analyze, typically a table from a relational database, and launching the data mining function you want to start with.
I installed the single-station version on a dual-processor machine with 256MB of RAM running Windows NT 4.0 without any problem and began exploring the product.
PolyAnalyst can access relational databases via ODBC, desktop database formats such as Microsoft Access and Excel spreadsheets, and text data. For IBM Visual Warehouse and Oracle Express users, the product offers direct connectivity to those OLAP (online analytical processing) data repositories, allowing convenient access to predefined cubes of data.
The product also offers a comprehensive set of data analysis engines that work from an easy-to-use point-and-click GUI. In the client/server version, the data scans are off-loaded to a server. The product creates HTML reports, extracts formulas from data patterns, and imports and joins data from multiple data structures such as spreadsheets, relational databases, and text.
Extracting meaning from your data is easiest if you use a gradual approach, such as first taking an overall picture of your data set and then drilling down to specific details.
For example, if the objective of your data analysis is to optimize your inventory turn-around time, you can begin with a general picture of how goods flow in and out of your warehouses and identify items that are more expensive, overstocked, or have a poor rotation index. Moving from that view, you can then focus on those details that have significant financial relevance and conduct specific assessments on those only.
PolyAnalyst's Summary Statistics function is very helpful for an initial understanding of your data. It groups information according to variance of values for each field and calculates statistics, such as standard deviation, frequency, and mean. Conceptually, this is similar to drafting the layout of your home before initiating a remodeling project: It's a blueprint that will keep further investigation on target.
To test this functionality, I opened an Excel spreadsheet containing a computer inventory with a row for each machine and several columns containing data such as brand, model, operating system, and database software. A wizard guides you through the import step by step and lets you select the whole data set or a random sample. To reduce clutter, you can remove unwanted columns or fields from your data.
Almost immediately I saw my computer inventory listed as a group of icons on the left pane of PolyAnalyst. The product had automatically detected the name of each column from the first row of the spreadsheet and set the correct data format. In addition, when importing spreadsheets, PolyAnalyst gave me the option to simultaneously fire up Excel to verify the file content. This can be a real time-saver when working with numerous files.
To start my overall data analysis, I chose Explore/Summary Statistics from the menu and received a report in HTML format showing a summary view of my inventory. As a bonus, the Summary Statistics report allowed me to create charts according to the value that I selected. For example, I could instantly create several pie charts grouping computers by manufacturer, operating system, or any other field.
The Summary Statistics report is very useful when analyzing a new set of data, because it allows you to quickly explore the boundaries of your information.
For example, you can summarize historic sales data by the combination of products ordered by each customer, the total amount of sales, or the dates of the purchase activities.
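For readers who want to see what a per-field summary of this kind involves, the following Python sketch computes the mean and standard deviation for a numeric field and value frequencies for a text field. It is a rough illustration under stated assumptions, not the product's implementation: the summarize function, the field names, and the inventory rows are all hypothetical.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical inventory rows, invented for illustration only
inventory = [
    {"brand": "Compaq", "os": "Windows NT", "ram_mb": 128},
    {"brand": "Dell", "os": "Windows 98", "ram_mb": 64},
    {"brand": "Compaq", "os": "Windows NT", "ram_mb": 256},
    {"brand": "IBM", "os": "Windows 2000", "ram_mb": 128},
]


def summarize(rows, field):
    """Return mean and sample standard deviation for numeric fields,
    or a frequency count of values for any other field."""
    values = [row[field] for row in rows]
    if all(isinstance(v, (int, float)) for v in values):
        # stdev needs at least two values
        return {"mean": mean(values), "stdev": stdev(values)}
    return {"frequency": Counter(values)}


for field in ("brand", "os", "ram_mb"):
    print(field, summarize(inventory, field))
```

The frequency counts are the same numbers a pie chart by manufacturer or operating system would be drawn from.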
Examining the details
Although it offers useful overviews of data, the Summary Statistics function does not provide information that you can immediately apply to business. This is where the Find Rule engine steps in.
One of the objectives of data mining is to find a mathematical expression that can accurately describe facts and help you analyze data for use in current forecasts and initiatives. This, in the data mining language, is called "finding a rule."
PolyAnalyst shines in this arena, thanks to its unique Find Rule engine, which can express data mining results as a math expression that you can use in your forecasts.
Unlike other data mining tools, PolyAnalyst does not require you to come up with possible formulas. Instead, it automatically formulates hypotheses based on data content and returns the proper equations and estimates of accuracy.
Obviously, this will save data analysts time and reduce the risk of selecting an unsatisfactory expression.
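As a simple picture of what "finding a rule" means, the Python sketch below fits a straight line to discount and quantity data using ordinary least squares and reports how well the line fits. This is a deliberately basic stand-in, not the Find Rule engine, which searches for formulas on its own; the fit_line function and the sample figures are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared), where r_squared near 1.0
    means the line explains most of the variation in ys.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical data: discount in percent and units ordered
discounts = [0, 5, 10, 15, 20]
quantities = [10, 14, 21, 24, 31]

slope, intercept, r_squared = fit_line(discounts, quantities)
print(f"quantity = {intercept:.2f} + {slope:.2f} * discount (R^2 = {r_squared:.3f})")
```

The resulting expression can be plugged into a forecast, for example to estimate the quantity ordered at a 12 percent discount, with the R^2 value serving as a rough estimate of accuracy.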
Besides its unfortunate Windows-centricity, the only other gripe I have with PolyAnalyst 4.1 is its inability to combine data from different databases into a single set of information for viewing. This can be inconvenient when analyzing several groups of information at the same time.
But overall, PolyAnalyst's flexibility, ease of use, dynamite data discovery engines, and affordable price make it an appealing solution and more than earn it a score of Very Good. If you want to uncover hidden information that is just sitting in your archives waiting to make you money, I suggest you implement PolyAnalyst posthaste.
Mario Apicella is a technology analyst for the InfoWorld Test Center. Send him e-mail at firstname.lastname@example.org.
THE BOTTOM LINE: VERY GOOD
Business Case: The capability of this affordable and general-purpose data mining product to independently discover data relations can save significant time and cost. The product addresses multiple business considerations, such as linear regression, what-ifs, and shopping-basket analysis. Companies with limited data mining requirements can tailor the product to their specific needs without paying for unwanted functionality.
Technology Case: The product comes with a friendly and multifaceted GUI and embeds OLE DB for Data Mining and COM connectivity. The developer version makes it possible to integrate the product's functionality into custom applications. Unfortunately, PolyAnalyst only supports Windows platforms and has difficulty combining information from disparate sources.
Pros:
+ Adaptable to multiple data analysis scenarios
+ Competitively priced
+ Modularity allows for selection of specific data engines
+ Dynamite Find Rule engine expresses data as a business-critical mathematical expression
Cons:
- Runs on Windows platform only
- Limited data structure discovery
Cost: $2,300 to $14,900, depending on algorithms chosen; developer kit is $16,000 plus components
Platform(s): Windows 95/98/2000, Windows NT
Megaputer Intelligence Inc., Bloomington, Indiana; (812) 330-0110; www.megaputer.com.