Question: My organization has a mandate to deploy Data Leakage Prevention next quarter. How do I gauge the completeness of a solution with respect to coverage of all network traffic?
Data leakage prevention (DLP) refers to a class of detection and enforcement technologies aimed at securing internal information. That information can be anything from compliance-related data (social security and credit card data) to intellectual property (IP). Enforcement capabilities extend from detection and alerting all the way to blocking, quarantining, or encrypting the outbound network traffic. Initial technology deployments focused on e-mail (an easily proxied protocol) but have recently begun to include HTTP, FTP, and various chat or IM services as well as encrypted transports such as SSL and SSH.
Verifying content against a security policy requires significant computational resources. To minimize the processing load, most DLP vendors use hard-coded port assignments to decide which protocol to decode and analyse. For example, e-mail is assumed to exist only on port 25, and conversely, all traffic over port 25 is assumed to be e-mail. Other protocols usually configured on fixed ports include FTP (port 21), HTTP (port 80), and the IM protocols (various well-known ports). Furthermore, some vendors require the IP addresses of e-mail servers to be specified up front or use sampling to handle higher-bandwidth networks. These assumptions (processing only selected ports or IP addresses) open a number of security gaps and greatly reduce the overall effectiveness of the deployment.
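To make the weakness concrete, here is a minimal Python sketch of port-keyed classification. It is an illustration under stated assumptions, not any vendor's implementation: the table `PORT_TO_PROTOCOL`, the helpers `classify_by_port` and `looks_like_smtp`, and the sample flows are all hypothetical, and the content check is deliberately crude.

```python
# Hypothetical port-to-protocol table of the kind a port-keyed scanner relies on.
PORT_TO_PROTOCOL = {21: "ftp", 25: "smtp", 80: "http"}


def classify_by_port(dst_port):
    """Return the handler a port-keyed scanner would pick, or None if the port is not listed."""
    return PORT_TO_PROTOCOL.get(dst_port)


def looks_like_smtp(payload):
    """Crude content check (an assumption for this sketch): does the flow open like an SMTP session?"""
    opening = payload.lstrip().upper()
    return opening.startswith((b"EHLO ", b"HELO ", b"MAIL FROM:"))


# Made-up sample flows: (destination port, first bytes sent by the client).
flows = [
    (25, b"EHLO mail.example.com\r\n"),                # e-mail on the expected port
    (587, b"EHLO laptop.example.net\r\n"),             # e-mail on another port
    (80, b"NICK alice\r\nUSER alice 0 * :Alice\r\n"),  # IRC-style chat on port 80
]

for port, payload in flows:
    print(port, classify_by_port(port), looks_like_smtp(payload))
```

In this sketch the second flow is e-mail that the port table never hands to an e-mail handler, and the third is chat traffic that the port table labels as HTTP.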
Outbound e-mail that bypasses the corporate e-mail servers is the most significant security risk. This scenario exists at most large corporations where employees might access home e-mail accounts at IP addresses unknown to network administrators. In addition, ISPs typically block port 25 to keep spam from being sent through their networks; this forces home users to configure their e-mail clients to use some other port. All such e-mail is effectively invisible to any DLP system that requires the user to specify ports for the protocols to be analysed. This is analogous to a security guard watching the front door of a building but ignoring all side and rear exits; it is not an effective security approach.
Other protocols such as Internet Relay Chat (IRC) may reuse port 80 to escape firewall blocking or corporate restrictions. As port 80 is universally opened for outgoing connections to support general web browsing (HTTP), access control lists or other firewall mechanisms can do little to control the applications or protocols using this port. Assuming that all traffic over port 80 is HTTP allows any number of protocols to bypass traditional content-scanning systems. Over a few hundred customer engagements, I have been consistently surprised at the ways data leaves corporate networks, both intentionally and unintentionally. I have concluded that we cannot expect administrators to know what data will leave, how and when it will leave, and who will send it. Requiring this would be foolhardy.
The solution to this problem is to require the DLP system to use a port-agnostic method for identifying protocols. One such method is Bayesian analysis, which establishes a token set for e-mail and checks it against all network traffic. Only when the traffic is positively identified as e-mail would an e-mail protocol handler be employed to enforce policy. That decision would be based on finding a sufficient number of e-mail-related tokens (sometimes within a specified proximity or in a certain order), regardless of what port is used.
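A minimal Python sketch of this idea follows, written as a naive Bayes score over a fixed token set. Everything specific in it is an assumption made for illustration: the tokens, the likelihood values in `TOKEN_LIKELIHOODS`, the prior `PRIOR_EMAIL`, the threshold, and the function names. A real system would learn its values from labelled traffic, and this sketch ignores token proximity and ordering entirely.

```python
import math

# Hypothetical likelihoods: token -> (P(token present | e-mail), P(token present | other)).
# The numbers are invented for illustration, not measured from real traffic.
TOKEN_LIKELIHOODS = {
    b"EHLO": (0.80, 0.01),
    b"HELO": (0.30, 0.01),
    b"MAIL FROM:": (0.95, 0.01),
    b"RCPT TO:": (0.95, 0.01),
    b"DATA": (0.90, 0.10),
    b"SUBJECT:": (0.85, 0.05),
}
PRIOR_EMAIL = 0.05  # assumed fraction of flows that are e-mail


def email_probability(payload):
    """Naive Bayes posterior that a flow is e-mail, using token presence or absence only."""
    text = payload.upper()
    log_odds = math.log(PRIOR_EMAIL / (1 - PRIOR_EMAIL))
    for token, (p_email, p_other) in TOKEN_LIKELIHOODS.items():
        if token in text:
            log_odds += math.log(p_email / p_other)
        else:
            log_odds += math.log((1 - p_email) / (1 - p_other))
    return 1 / (1 + math.exp(-log_odds))


def is_email(payload, threshold=0.9):
    """Decide from content alone; the port number is never consulted."""
    return email_probability(payload) >= threshold


# Made-up payloads for demonstration.
smtp_flow = (b"EHLO laptop.example.net\r\nMAIL FROM:<a@example.net>\r\n"
             b"RCPT TO:<b@example.org>\r\nDATA\r\nSubject: hello\r\n")
http_flow = b"GET /index.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n"

print(is_email(smtp_flow), is_email(http_flow))
```

Only flows that pass `is_email` would be handed to the e-mail protocol handler for policy enforcement, whatever port they arrived on.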
Port-agnostic classification and analysis are even more important for other protocols (such as peer-to-peer traffic) that use random ports to escape detection. Newer protocols can also tunnel inside existing protocols (on presumably safe ports). My advice to those evaluating DLP solutions is to look under the hood and ensure the solution does not require you to know exactly how information could leave your network, what channels it may use, or what format it must be in. Without such an evaluation, you may find you have protected the front door while your assets are flying out the side and back doors.
Erik de la Iglesia is the cofounder and chief architect of Reconnex. Previously, he was a logic design manager at Extreme Networks and has worked in fields ranging from processor design and Internet marketing to network appliances. He holds an MSEE from Stanford and a BSEE from the University of Florida.