SAN MATEO (05/15/2000) - To paraphrase Freud, sometimes a database is just a database. But these days we expect more from our database servers than mere data storage and acquisition. The database plays a pivotal role in determining how reliable, scalable, adaptable, and interoperable our large-scale applications will be.
The modern database is not just a database: It is the hub through which all of our business data flows. Pick the wrong product -- the one that can't scale up to support your burgeoning e-commerce site or the one that can't make heads or tails of your XML data types, for example -- and watch the waters of success dry up before your very eyes.
There are many players in the database market, but for most large organizations the choice comes down to two: Oracle Corp. and SQL Server. History has provided some simple criteria that are often used to decide between the two. For big, fast, scalable databases, companies buy Oracle and run Unix. For small, easy, affordable databases, companies buy SQL Server and run Windows.
With the release of SQL Server 2000, Microsoft Corp. hopes to give Oracle some competition in the enterprise database arena. If the retail product retains the strengths of Beta 2 while polishing its rough edges, Microsoft is headed in the direction of more enterprise respectability; but even fully leveraged, this product will not stem the dominance of the Oracle-on-Unix combination.
SQL Server 2000 cuts a wide swath. Its editions cover everything from freeware, such as Microsoft Data Engine, and Pocket PCs to heavily clustered data centers. Major new features include numerous performance enhancements, robust support for XML, full-text and English queries, data mining and analysis, and extraordinarily flexible replication. Developers will appreciate new user-defined functions, SQL debugging, indexed views, scripted data transformations, SQL templates, and HTTP access.
For your IS staff there is Kerberos and SSL (Secure Sockets Layer) security, multiple service instances, installation cloning, large and clustered systems support, differential backups, online re-indexing, warm standby, and one-step database copying. You'll find the complete list of new features at www.microsoft.com/sql.
Getting under way
I tested the Enterprise Edition of SQL Server 2000, Beta 2, in a lab with three machines running Windows 2000 Advanced Server. The master server was a 1GHz Athlon with 384MB of RAM. It shared the LAN with a pair of lesser servers -- an 850MHz Athlon and a 500MHz Pentium III.
The Beta 2 installer performs three types of installations: normal, upgrade, and multi-instance. The latter allows you to run multiple SQL Server 2000 instances on one machine. I used this trick on a server that straddles my firewall. One instance served nonsensitive, read-only data to remote users via the Internet. The other instance, invisible to outsiders, managed private data.
The instances were entirely independent, but I managed them from a single console.
SQL Server 2000 will share a server with SQL Server 7.0. If you choose to upgrade from Version 7.0, the installer will update your executables, regenerate your data, and preserve existing users, system par-ameters, and other settings. If you're justifiably wary of replacing a production release with a beta, you can do a side-by-side installation -- 7.0 and 2000 on one machine. An upgrade wizard will migrate your data to the new release when you're ready.
Not an island
Smaller companies that want a fast and cost-effective way of expanding their enterprises rapidly will appreciate SQL Server's new SMP (symmetric multiprocessing) scalability support. You can increase processing power in a modular fashion by creating a cluster of computers or by adding one or more additional computers using low-cost components right off the shelf.
Each server is "independent," with separate operational rules, and can support independent processes and data. By partitioning across a group of computers in this way, SQL Server can support much larger processing loads.
For servers with multiple CPUs, SMP support will reduce execution delays for complex operations by spreading queries and database integrity checks across all the CPUs. If you choose, you can apportion one or more CPUs to a specific instance of SQL Server. For example, giving three CPUs to user queries and one CPU to low-priority maintenance tasks such as backups will vastly reduce query response time.
SQL Server's traditional specialty has been single-server databases. Because PC servers are not as expandable as the Unix counterparts, being restricted to one server forces you to live under a very low ceiling. The most important new features in SQL Server 2000 link database servers to each other, to applications, and to remote users.
If SQL Server 2000 has one killer feature it is publish/subscribe replication.
By efficiently distributing database changes among all the servers in a group, SQL Server 2000's new replication model makes a foundation for server farms of virtually infinite size. Publish/subscribe replication is a vast improvement over what could be done with SQL 7, but because SQL Server 2000 lacks support for fully distributed configurations such as those that can be done within Oracle, it is far short of perfect.
Publish/subscribe replication begins with the appointment of a distributor. The distributor is a SQL Server 2000 instance that keeps track of published data. A publisher is the primary database server for a group, holding the master copy of the objects being replicated. You may publish any quantity of data objects, from several databases down to a single table.
Subscriber servers either pull in changes on request or have changes pushed to them by the publisher as they occur. In the latter case, as applications modify data on subscriber servers, those changes are replicated to the publisher.
Changes are then relayed to other subscribers in the group. The result is a latticework of synchronized servers.
When I saw the simple interface attached to replication -- a wizard sets it up -- I expected a limited implementation. But in fact, it is quite versatile. The wizard lets you select push or pull, one-way or two-way, instant or queued replication of any combination of data objects.
Aside from its ease and versatility, what makes publish/subscribe replication so worthwhile is that it does not require OS clustering or shared hard drives.
You can use Microsoft Message Queue (MSMQ) Services if you like, but it is not required. Even non-Microsoft database servers, including Oracle, can be added as subscribers. No additional software or special hardware is required: You don't need to modify your applications.
Replication does a fabulous job of creating database server farms, but what then? Microsoft has a few strategies for distributing database workload across servers, but they lack replication's panache. Some facility should optimize queries to exploit a set of synchronized servers. Sadly, no such facility exists. The only automatic methods are IP load balancing and Windows 2000's limited clustering, and both are clueless about how to get the most from a database server farm. The only sensible option is custom code.
If you're a veteran of Oracle, Sybase Inc., or another database created in the Unix model, SQL Server 2000's administrative interface may leave you dizzy. You can still fire up a textual SQL console and manage everything from the keyboard. But for this release, Microsoft has gone nuts with the graphical tools. It takes one click to pull up a lovely graphical report, done in dynamic HTML, of the relative disk usage of each database on your server. It takes about five clicks to create a new database.
If you're willing to suffer the burden of about 10 clicks, you can build a reusable DTS (data transformation services) package that converts a table to comma-separated text and uploads it to a password-protected FTP site. (See related article, above, for more on DTS and SQL Server's new data-analysis and -transformation capabilities.)Publish/subscribe replication is a dandy feature, but it's overkill if you don't need to maintain tight synchrony among servers. A simpler method, log shipping, exports transaction logs from a master server to read-only or warm standby servers. Each server plays the log back, synchronizing the database with the master.
Large database backups create headaches for administrators. Here, too, SQL Server 2000 offers multiple solutions. For a busy database, snapshot replication creates a point-in-time image of selected data. You can then back up that snapshot from an offline server without putting a strain on the production server. When generating a snapshot, filters weed out nonvital data to reduce the size of the backup image.
Snapshots are functional and require no effort to produce (only lots of spare disk space), but they don't grab my interest as do differential backups. After you have burned a tape of your whole database, SQL Server 2000 will create backups containing only new transactions. The resulting tape data sets are comparatively tiny, and differential backups run with minimal overhead on live servers. You still need to perform periodic -- for example, weekly -- full backups, but frequent differential backups will eliminate the typical loss of interim changes.
For the programmer in you
SQL Server 2000's GUI appeal extends beyond administration. The package includes a surprising array of graphical design and development tools.
Query Analyzer, Microsoft's graphical SQL tool, has been beefed up with new features and a split-pane layout. The object browser uses a tree view to list all the databases, tables, columns, procedures, and other elements on your server. To the right of that are arrayed any number of SQL editor windows. To insert its name into your SQL statement, drag any element from the tree to the editor. Invoke the object search to locate an identifier by its attributes.
Query Analyzer templates are a clever turn. The time spent during most SQL coding sessions is split evenly between actual coding and looking in the manual. Once you get the hang of a complex SQL query, you can turn it into a template. After you load the template file, a dialog box pops up and asks you to supply the values of the variables embedded in the template. Query Analyzer inserts those values into your template to produce a perfect, customized SQL statement. Standard templates cover common tasks such as table creation.
Templates hardly set the woods ablaze, but they are handy.
Of greatest interest to developers are SQL Server 2000's user-defined functions, SQL debugger, and XML support. Simply put, user-defined functions are groups of SQL statements that you execute as you would a built-in SQL function and are far easier to invoke than procedures.
Should you get lost in your code, the new SQL debugger should help you find your way. Typical debugger functions such as breakpoints, single-stepping, watch variables, and call stacks are reportedly implemented, but I did not find the SQL debugger in Query Analyzer's menus.
Rather than shoehorn XML into SQL Server as an ill-fitting accessory, Microsoft chose to extend the Transact-SQL language with XML-specific statements. The fit is quite natural. Any SQL query can return its results set in XML for ready formatting with XSL (Extended Stylesheets Language). That same XML exports easily to a file for offline viewing and processing. There are three XML formatting options, none of which returned the data I expected. However, with experimentation I quickly adapted to the XML output and fed it cleanly into the Microsoft XML parser for analysis.
XML Views provide an XML DOM (Document Object Model) window into SQL Server data. Working from your XDR (XML-Data Reduced) schema, SQL Server 2000 interactively translates SQL tables and columns to XML elements and attributes.
XML Views reduces to one step the inefficient process of iteratively assigning SQL table data to program variables. Working with XML-mapped data makes programmatic analysis far easier. As a bonus, the XDR schema insulates your code from changes to the database layout.
To facilitate access to XML documents from SQL procedures, SQL Server 2000 implements a new OpenXML keyword. An XML document, which you can load from a file or embed in a SQL procedure, becomes a SQL rowset, which is a collection of rows and columns. You can use the XML rowset as you would a SQL table, querying, sorting, and even modifying it through SQL statements. When you're done you can dump the modified XML rowset into a SQL table or save it in XML form.
After the retail edition of SQL Server 2000 ships, Microsoft plans to offer XML bulk updates and XML "updategrams," database updates encoded as XML, as free downloadable add-ons. Updategrams allow your users to selectively insert, update, or delete table data from anywhere, even through firewalls, while XML bulk update will enable you to transmit, transform, and load data from any source into SQL Server relational tables.
Will SQL ever be equal?
By any measure, SQL Server 2000 is an ambitious venture. Its long list of new features, the bulk of which aren't even discussed in this article because of space constraints, impresses but falls shy of bringing Microsoft's database server to par with mighty Oracle. SQL Server 2000 can't win this race by out-featuring its rival: Oracle has too great a head start.
Initially, Microsoft's new database will stem the defections of Windows shops to Oracle. We won't know whether Microsoft can win back enterprise accounts lost to Oracle and Sun until daylight greets the retail release of SQL Server 2000, Visual Studio 7, big-iron PC servers, and Windows 2000 Data Center Edition. That lineup is a long way off, but as this convergence draws near, it's bound to shake things up.
Tom Yager is a senior analyst for the InfoWorld Test Center. His book, Windows 2000 Web Applications Developer's Guide, is now available from Prentice Hall.
E-mail him at firstname.lastname@example.org. Geoffrey Hollander contributed to this report.
Scaling up with SQL Server 2000
SQL Server 2000 is ready to scale to a more powerful server configuration as soon as you are.
* Maximum number of processors supported: 32* Maximum RAM utilization in the Enterprise Edition: 64GB* Maximum data storage: 2TBWise up with full-text searchWhen it comes to searching through sales, customer transactions, and other business-critical information, many organizations confront the costly inconvenience of scattered data. A great deal of operational information is not stored in databases but in unintegrated file systems that cannot be simultaneously queried with database documents. Because these file systems are usually unstructured and principally are in text forms, it is impractical to import them into a database. Thus, companies must perform time-consuming searches to uncover data.
SQL Server 2000 offers a set of powerful integrated search features that could improve this lengthy, expensive process. New extensions to the SQL language, which are the same in both SQL Server 2000 and Microsoft Indexing Service 2.0 , the software that supports external text searches, allow you to search SQL Server 2000 database and independent file systems at the same time.
This capability, which is also included in SQL Server distributed queries and Microsoft Internet Information Server 4.0, allows both Property searches, in which you can query document properties such as author, subject, and word count, and full-text searches, in which you can query specific words within files. The full-text search function creates word indexes to support linguistic and proximity searches, so you don't need to enter exact phraseology to uncloak desired hits.
Moreover, SQL Server 2000's search functions are easily accessible through custom front-end interfaces. As are the applications used for Internet searches, these features are designed to help users find queried text from within text fields, documents, or both.
Mining your data with SQL Server 2000
Every day, your company supports customer transactions that could potentially unlock the secret to your next successful marketing campaign or profit-making venture. But how do you extract operative data from the thousands of records that occupy your database and continue to multiply?
Many companies employ a combination of OLTP (online transactional processing) and OLAP (online analytical processing) technologies to dig through their databases for possible useful and strategic information. Although this process is effective, it is extremely expensive and often requires weeks of IT labor to complete preparatory data transformations.
OLTP systems are enterprise-level database solutions for storing data generated by your company's daily transactions. These systems are designed for functions such as recording orders from point-of-sale terminals or tracking employee data -- not for providing summary information for business analysis.
OLAP systems interact with OLTP data to create reports from which managers can identify consumer habits or pinpoint critical sales factors. To perform this analysis, OLAP can either search OLTP tables directly or first convert them into multilayered OLAP data that will respond more productively to searches.
The problem is that OLAP search mechanisms that query OLTP data tables must first collect and organize disparate records, a laborious task that hinders general database functionality and provides only one-dimensional results. To solve this problem, OLAP converts OLTP data into multidimensional cubes that when queried provide illuminating information without affecting overall database performance.
For example, a corporation that oper-ates multiple hardware stores can analyze its revenue by organizing a three-dimensional cube with Season, Store, and Product dimensions. This structure not only displays simple results but also can illustrate the ways in which different factors interact to determine consumer trends at certain times of the year, at particular stores, and within specific product categories. You can get an even more detailed analysis by further subdividing each dimension: For example, the Season dimension can be broken down into Year, Month, Week, and Day categories.
Unfortunately, this process is neither simple nor cheap. The OLAP technology that reorganizes OLTP information can cost as much as $75,000, not to mention the painstaking labor required to set it up. Your IT personnel will be busy for weeks just converting your OLTP relational data into the appropriately joined and summarized flat file formats with the correct application-specific data changes.
SQL Server 2000 ends this dilemma with its enhanced OLAP and data mining capabilities. By enabling you to work directly with the OLTP data already residing in your database, SQL Server spares your staff, and your budget, the burden of converting information into OLAP form.
SQL Server's OLAP features include ROLAP (relational online analytical processing), which you can use to enable dimensional data to reside in the database and to change specific dimensions without reprocessing the entire three-dimensional cube. These benefits save a good deal of time and hassle while preparing searches and organizing results.
Also new to SQL Server 2000 are decision trees and neural networks. These data mining tools will boost your sales and marketing successes and will undoubtedly excite strategic decision makers more than any other features of the product.
The decision trees identify likely respondents to a marketing or sales effort and predict consumer behavior. For example, let's say you use direct mail as an advertising medium and have retained a response history of actual and prospective customers. Using some key characteristics of these customers, such as age, gender, income, and so forth, you could set up a decision tree that would use the "good responder" profile to determine those customers or prospects who are most likely to respond to future mailings.
Moreover, you can establish a DTS (data transformation services) package that would periodically (before each mailing, for example) rerun the decision tree using your most current response data and output the best candidates for your next marketing efforts.
Meanwhile, neural-network models can improve your profits by clustering lists of purchasing information for specific profile types and then use this data to entice existing customers to keep buying new products. Amazon.com's "personal" book recommendations, based on purchases made by shoppers that fit a similar profile, are an example of neural-networks in use.
These data mining tools perform equally well with OLAP or relational input and include algorithms that function perfectly right out of the box. If you're a statistical specialist or you work with data that contains crucial anomalies, you'll appreciate the option to write your own algorithms, using the OLE DB for Data Mining specification, and run them from within SQL Server 2000's infrastructure.
Careful and thorough data analysis can make your marketing resources go further and can increase profits by heightening reactions to sales materials. SQL Server 2000's OLAP and data-mining capabilities will get you on that path and will positively impact the bottom line with relatively little pain.
THE BOTTOM LINE: BETA
Microsoft SQL Server 2000, Beta 2
Business Case: When combined with the middleware services in Windows 2000, SQL Server 2000 fares well in a feature-for-feature comparison with more costly Unix-based solutions. Whether its performance will rival them remains to be seen.
Technology Case: Features such as full-text searches, user-defined functions, data mining and analysis, SQL debugging, and integrated XML add powerful tools to developers' arsenals. Beta 2 reveals an uncharacteristic emphasis on standards and interoperability.
+ Easy to install, configure, and manage+ Strong new enterprise features+ Well-integrated XML supportCons:
- Runs only on Windows
- Performance relative to Unix-based competitors is unprovenCost: Not yet determinedPlatform(s): Windows 95/98/2000, Windows NT; Windows CE later this yearShipping: UnknownMicrosoft Corp., Redmond, Washington; (425) 882-8080; www.microsoft.com