Friday 29 April 2011

HONEYPOTS TUTORIAL

Introduction
For every consumer and business on the Internet, viruses, worms, and crackers are but a few of the security threats. There are obvious tools that aid information security professionals against these problems, such as anti-virus software, firewalls, and intrusion detection systems, but these systems can only react to or prevent attacks; they cannot give us information about the attacker, the tools used or even the methods employed. Given all of these security questions, honeypots are a novel approach to network security and security research alike. This paper will first give an introduction to honeypots, their types and their uses. We will then look at the nuts and bolts of honeypots and how to put them together. With a more advanced idea of how honeypots work, we will then investigate the research related to honeypots and look at the possible legal ramifications for those who deploy them. Finally, we shall conclude by looking at what the future holds for honeypots and honeynets.


What is a Honeypot?
Spitzner defines a honeypot as an “information system resource whose value lies in unauthorized or illicit use of that resource.” Essentially, honeypots are resources that allow anyone or anything to access them and, more importantly, have no real production value. Because no legitimate user has a reason to connect to a honeypot, the connections made to it are most likely probes and attacks aimed at compromising the network. More often than not, a honeypot is simply an unprotected, unpatched, unused workstation on a network being closely watched by administrators.



The two main reasons why honeypots are deployed are:
1. To learn how intruders probe and attempt to gain access to your systems and gain insight into attack methodologies to better protect real production systems.
2. To gather forensic information required to aid in the apprehension or prosecution of intruders (Bandy).



Types of Honeypots
Honeypots come in two flavors: low-interaction and high-interaction. Interaction measures the amount of activity that an intruder may have with a honeypot.



Low-interaction honeypots are easy to deploy and maintain as they essentially emulate services and operating systems. They are constrained by a limited set of responses and thus are only able to trap automated scanners.
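The "limited set of responses" can be illustrated with a minimal sketch of an emulated SMTP service. This is only an illustration in Python, not how any particular honeypot product is written; the names emulate_smtp and log_interaction, the hostname mail.example.test and the reply table are all assumptions made for the example.

```python
from datetime import datetime, timezone

# Canned replies for a fake SMTP service (hypothetical table for this sketch).
# Any verb outside this table gets a generic error, which is exactly the
# limitation that lets a human attacker spot the emulation.
CANNED_REPLIES = {
    "HELO": "250 mail.example.test",
    "EHLO": "250 mail.example.test",
    "MAIL": "250 OK",
    "RCPT": "250 OK",
    "QUIT": "221 Bye",
}


def emulate_smtp(line: str) -> str:
    """Return the canned reply for one request line."""
    parts = line.strip().split(maxsplit=1)
    verb = parts[0].upper() if parts else ""
    return CANNED_REPLIES.get(verb, "500 Command not recognized")


def log_interaction(log: list, peer: str, line: str) -> str:
    """Answer a request and record who sent what, and when."""
    reply = emulate_smtp(line)
    log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "peer": peer,
        "request": line.strip(),
        "reply": reply,
    })
    return reply
```

An automated scanner that only sends HELO and MAIL would see plausible replies, while anything outside the table falls through to the same error, which is why such emulation mostly traps automated tools.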


High-interaction honeypots do not provide emulation. Instead, they employ real operating systems and services. These honeypots are more complex and allow one to obtain a vast amount of information regarding the attacker’s activity. Since they employ real OSes, they could themselves be compromised and turned into sources of further attack.



Information Capturing Mechanisms
Capturing data on a system designed for compromise must be done in a fashion that allows for significant analysis of activity, yet is unobtrusive and transparent to the individual(s) who are compromising the honeypot. Data can be captured at three distinct points, all offering their own benefits and drawbacks:



Host-based:
Data capture on the compromised host allows the greatest potential to log incoming and outgoing connections, commands entered on the host via the command line, and snapshots of running processes. Unfortunately, this method also presents the greatest risk. An intruder will often look for any logs and/or security tools, and attempt to disable them in order to conceal their presence. This being the case, data capture could be halted or modified, thus tainting the results of our experiment. Examples of tools used to log activity on a Honeypot are the operating system’s system log (typically the first target of an intruder), any intrusion detection system with packet capture ability, such as Snort, or a packet capture and analysis tool such as Ethereal, both discussed below.


Network-based:
A safer, but more complex, solution to data capture involves the Honeypot clandestinely logging activity and sending it to a remote server for further analysis. This solution allows us to archive the data collected by the Honeypot on a remote machine. We assume this server to be hardened against attack, as the intruder may notice a data stream leaving the Honeypot and attempt to disable the collection mechanism. Using tools such as Sebek, we can effectively hide a data capture service on the Honeypot and collect data on a remote server over UDP. Sebek records the activity of the intruder and covertly sends it to a gateway, a server within the network, or a server elsewhere on the internet.
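The idea of shipping captured activity to a remote collector over UDP can be sketched as follows. This is a user-space illustration only: Sebek itself works inside the kernel and uses its own packet format, which is not reproduced here. The JSON record layout and the function names are assumptions made for the example.

```python
import json
import socket
import time


def encode_record(host_id: str, pid: int, data: str) -> bytes:
    """Pack one captured event into a datagram payload (hypothetical format)."""
    record = {"host": host_id, "pid": pid, "ts": time.time(), "data": data}
    return json.dumps(record).encode("utf-8")


def decode_record(payload: bytes) -> dict:
    """Inverse of encode_record, as the remote collector would run it."""
    return json.loads(payload.decode("utf-8"))


def send_record(payload: bytes, collector: tuple) -> None:
    """Fire-and-forget delivery: UDP needs no connection setup or reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, collector)
```

UDP suits this role because the honeypot never waits for an acknowledgement, so the logging path keeps no connection state that an intruder could list. The trade-off is that lost datagrams are lost silently.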




Router/Gateway-based:
The final common method used for data collection is at the actual gateway, router or firewall level of the network. As a gateway moves all data between the hosts on a network and the internet, we have the opportunity to log all connections and data moving from the internet to our Honeypot(s). This offers a slight increase in risk over the Sebek solution described above, as a gateway is typically not hidden in a network and itself becomes a target for attack. Additionally, this is a more hardware-intensive solution, as you require a server to act in a gateway role. Many small-scale or home gateways do not offer significant logging capabilities and cannot be used in this role. Without robust data-capture techniques, the validity of information gathered from the host machines is greatly reduced. One of the main goals of defensive information warfare is to understand your opponent; the capture and analysis of this data is the method with which we begin to accomplish this.



Uses of Honeypots
Honeypots have several applications to the world of network security. They serve as network decoys to prevent attacks on an organization’s real network by appearing to be easy targets. By tracking all activity on a honeypot, viruses and worms can easily be detected.
In addition, honeypots can be used to combat spam. Spammers are constantly searching for sites with vulnerable open relays to forward spam on to other networks (Vaughan-Nichols). Honeypots can be set up as open proxies or relays to allow spammers to use their sites. This in turn allows for identification of spammers.


Recently, they have been used to learn about credit card fraud. The Honeynet Project, a honeypot research group, has been able to gain insight into such activities by observing IRC channels. As traffic for these IRC channels passes through a proxy on a honeypot, administrators and law enforcement officials are able to observe illegal traffic. (The reader is encouraged to refer to http://www.honeynet.org/papers/profiles/ccfraud.pdf for more details.)



Advantages and Disadvantages:
Honeypots have several advantages. They collect small amounts of information that have great value. This captured information provides an in-depth look at attacks that very few other technologies offer. Honeypots are designed to capture any activity and can work in encrypted or IPv6 networks. In addition, honeypots are relatively simple to create and maintain (Spitzner).

On the other hand, honeypots also have some disadvantages. Honeypots can only track activity that directly interacts with them as opposed to all traffic across the network. There is also a level of risk to consider (Spitzner), since a honeypot may be compromised and used as a platform to attack another network. However, this risk can be mitigated by controlling the level of direct interaction that attackers have with the honeypot.

Honeynets
A collection of honeypots is combined to create a single honeynet. Honeynets are a step towards aggressive security strategies where one invites the blackhat community to attack the system without their knowing that it is monitored. By creating an entire fraudulent network, the amount of information that can be gathered is multiplied greatly. Honeynets can be classified as high-interaction honeypots.

Honeypot Architecture
A typical low-interaction honeypot is also known as a GenI honeypot. This is a simple system which is very effective against automated attacks or beginner-level attacks. Honeyd is one such GenI honeypot. It emulates services and their responses for typical network functions from a single machine, while at the same time making the intruder believe that there are numerous different operating systems (Provos). It also allows the simulation of virtual network topologies using a routing mechanism that mimics various network parameters such as delay, latency and ICMP error messages. The primary architecture consists of a routing mechanism, a personality engine, a packet dispatcher and the service simulators. The most important of these is the personality engine, which gives services a different ‘avatar’ for every operating system that they emulate.
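The relationship between the packet dispatcher and the personality engine can be pictured with a small sketch in which one machine answers for several virtual IP addresses, each with its own 'avatar'. This is not Honeyd's code or configuration format. It only varies application banners and does not attempt packet-level behaviour; the class names, addresses and banner strings are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Personality:
    """How one emulated operating system presents its services."""
    os_name: str
    banners: dict = field(default_factory=dict)  # TCP port -> greeting banner


class PersonalityEngine:
    """Maps virtual IP addresses to personalities on a single machine."""

    def __init__(self):
        self._bound = {}  # virtual IP -> Personality

    def bind(self, ip: str, personality: Personality) -> None:
        self._bound[ip] = personality

    def dispatch(self, dst_ip: str, dst_port: int) -> str:
        """Pick the reply for a probe based on its destination."""
        personality = self._bound.get(dst_ip)
        if personality is None:
            # No virtual host lives here: mimic a routing-level error.
            return "ICMP host unreachable"
        # Host exists; a port with no emulated service looks closed.
        return personality.banners.get(dst_port, "TCP RST")


# Illustrative banners only, chosen for the example.
WINDOWS = Personality("Windows", {21: "220 Microsoft FTP Service"})
LINUX = Personality("Linux", {21: "220 (vsFTPd 2.0.5)", 22: "SSH-2.0-OpenSSH_4.3"})

engine = PersonalityEngine()
engine.bind("10.0.0.10", WINDOWS)
engine.bind("10.0.0.11", LINUX)
```

The same FTP port yields a different greeting depending on which virtual host was probed, which is the effect the 'avatar' description refers to.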

Drawbacks
This architecture provides a restricted framework within which emulation is carried out. Due to the limited number of services and functionality that it emulates, it is very easy to fingerprint. A flawed implementation (a behavior not shown by a real service) can also alert the attacker. It has constrained applications in research, since every service which is to be studied has to be rebuilt for the honeypot.



Structure of a High-interaction Honeypot
A typical high-interaction honeypot consists of the following elements: resource of interest, data control, data capture and external logs (Honeynet Project, “Know Your Enemy: Learning with VMware”). These are also known as GenII honeypots, and their development started in 2002. They provide better data capture and control mechanisms, which makes them more complex to deploy and maintain than low-interaction honeypots. High-interaction honeypots are very useful in their ability to identify vulnerable services and applications for a particular target operating system. Since these honeypots run full-fledged operating systems, attackers attempt various attacks, providing administrators with very detailed information on attackers and their methodologies. By studying the patterns generated by these honeypots, researchers can identify new and unknown attacks.

Drawbacks
However, GenII honeypots have their drawbacks as well. Simulating an entire network, with routers and gateways, would require an extensive computing infrastructure, since each such virtual element would have to be installed in its entirety. As a result, the number of honeypots in the network is limited. For example, an average system with 512 MB of RAM can run at most 5 Windows VMs at any given time, with 64 MB of RAM each. In addition, this setup is not comprehensive: the attacker may be able to tell that the network he is on is not the real one. Finally, the risk associated with GenII honeypots is higher because they can easily be used as launch pads for attacks.

Comparison between GenI and GenII Honeypots


Feature | Low-interaction Honeypot | High-interaction Honeypot
Number of virtual systems / services that can be deployed | Large | Small
Data control | Limited | Extensive
Level of interaction | Low | High
Ability to discover new attacks | Low | High
Risk | Low | High




Building a honeypot
To build a honeypot, a set of virtual machines (VMs) is created. They are then set up on a private network with the host OS. To facilitate data control, a stateful firewall such as IPTables can be used to log connections. This firewall would typically be configured in Layer 2 bridging mode, rendering it transparent to the attacker. The final step is data capture, for which tools such as Sebek and TermLog can be used. Once data has been captured, it can be analyzed using tools such as HoneyInspector, PrivMsg and SleuthKit.

We found this approach remarkable in its simplicity, but feel that a few significant issues need to be brought to light. The first is the choice of a private host-only network. Though this may seem counterintuitive at first, there is relatively sound reasoning for doing so. Bridging the VMs on to the physical network would seem like a better approach, because it transparently forwards packets to the VMs and eliminates an additional layer of routing. However, it requires an additional data control device to monitor the packets being sent from the VMs. Data control cannot be performed by the host OS when the VMs are in bridged mode, since all data from the VMs bypasses any firewalls or IDSs which exist at the application layer on the host, as shown in the figure below.

[Figure: Structure of a VM Based Honeypot]

Additionally, the firewall on the host should be transparent to the attacker. This requires considerable effort, since firewalls by default work at Layer 3 or above. Rendering the firewall transparent to the attacker requires recompilation of the kernel, which may not be possible on some operating systems, such as Windows.

Finally, once a honeypot is compromised, a restoration mechanism has to be implemented so that it is instantly taken off the network and all its holes carefully plugged before it is placed back on the network. This is currently a manual process and can only be partly automated.
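Data control, in the sense of watching and capping what leaves the VMs, can be sketched as a sliding-window counter of outbound connections. In a real deployment this job belongs to the firewall (for example IPTables rules); the Python class below, its name and its thresholds are assumptions made purely to show the logic.

```python
from collections import deque


class OutboundLimiter:
    """Allow at most max_connections outbound connections per time window."""

    def __init__(self, max_connections: int, window_seconds: float):
        self.max_connections = max_connections
        self.window_seconds = window_seconds
        self._events = deque()  # timestamps of connections that were allowed
        self.log = []           # (timestamp, allowed) for later analysis

    def allow(self, now: float) -> bool:
        """Decide whether an outbound connection attempted at 'now' may pass."""
        # Forget connections that have aged out of the window.
        while self._events and now - self._events[0] >= self.window_seconds:
            self._events.popleft()
        allowed = len(self._events) < self.max_connections
        if allowed:
            self._events.append(now)
        # Every attempt is logged, including the ones that are dropped.
        self.log.append((now, allowed))
        return allowed
```

A low cap keeps a compromised honeypot from being useful as a launch pad, while logging the dropped attempts preserves evidence of what the intruder tried to reach.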


Honeynets: Challenges
So far, we have looked at how honeynets are deployed by the research community and how they are used for worm detection, hacker tracking and a host of other activities. The underlying assumption about Honeynets is that the attacker is unaware of the monitoring and cannot easily fingerprint the honeypot. But one cannot assume that the blackhat community would just walk into the traps laid for them. We present a few possible challenges to that assumption as discussed by Oudot and Holz:
Fingerprinting a VMWare virtual machine: The IEEE has assigned the following MAC address prefixes (OUIs) to VMware:
00-05-69-xx-xx-xx, 00-0C-29-xx-xx-xx, 00-50-56-xx-xx-xx.
An attacker can check the MAC address of the machine and conclude that a particular system is a virtual machine. Though this doesn’t reveal that a particular system is necessarily a honeypot, a skilled or well-informed hacker might keep away from such a system (Oudot and Holz, Defeating Honeypots Part 1).
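The check an attacker might run is simple enough to sketch in a few lines. The prefixes are the three listed above; the function names are assumptions made for the example.

```python
# The three VMware prefixes listed in the text, without separators.
VMWARE_OUIS = {"000569", "000C29", "005056"}


def normalize_mac(mac: str) -> str:
    """Strip common separators and upper-case the hex digits."""
    return mac.replace("-", "").replace(":", "").replace(".", "").upper()


def is_vmware_mac(mac: str) -> bool:
    """True if the first three octets match a VMware prefix."""
    digits = normalize_mac(mac)
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return digits[:6] in VMWARE_OUIS
```

Because the test costs almost nothing, it is the kind of check that could easily be bundled into a post-compromise script.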


Fingerprinting Sebek:-
Sebek is a kernel rootkit that hijacks the system read() call and logs all data accessed via read. It uses a covert channel to send data to the logging server. Sebek is designed to hide itself from the list of kernel modules, and it bypasses the TCP/IP stack so that an attacker cannot detect it using sniffers. However, one way of fingerprinting a system running Sebek is the amount of traffic it generates. For a single-byte read(), Sebek transfers close to 100 bytes of data. A continuous stream of single-byte reads could slow the network down enough to be detected using ping requests (Oudot and Holz).
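The arithmetic behind this fingerprint is worth spelling out. The sketch below takes the figure of roughly 100 bytes on the wire per single-byte read from the text; the function names and the example read rate are assumptions.

```python
def sebek_traffic_bytes(single_byte_reads: int,
                        bytes_on_wire_per_read: int = 100) -> int:
    """Bytes Sebek would send for a given number of one-byte read() calls."""
    if single_byte_reads < 0:
        raise ValueError("number of reads cannot be negative")
    return single_byte_reads * bytes_on_wire_per_read


def sebek_bandwidth_mbit(reads_per_second: int,
                         bytes_on_wire_per_read: int = 100) -> float:
    """Sustained logging bandwidth in Mbit/s at a given read rate."""
    bytes_per_second = sebek_traffic_bytes(reads_per_second,
                                           bytes_on_wire_per_read)
    return bytes_per_second * 8 / 1_000_000
```

Under these assumptions, one million single-byte reads turn 1 MB of data into about 100 MB of logging traffic, and a hypothetical rate of 10,000 such reads per second would keep about 8 Mbit/s of the link busy, which is the kind of load that shows up as increased ping latency.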


Reverse Firewall:-
Honeynets typically have a reverse firewall, which limits the amount of outgoing traffic from the system. A clever attacker can observe the patterns in the outgoing traffic and fingerprint the system as a honeynet if a sufficiently large number of outgoing packets are dropped or modified (Oudot and Holz, Defeating Honeypots Part 2).

We believe that with the current deployment of honeypots there is no traffic to and from a honeypot except the attacker's traffic. A system of interest to the hacker might be a database server or a web server handling lots of requests. But even though the services can be provided or emulated on a honeypot, no active connections can be observed. An attacker can check for other connections, or the amount of traffic on a system that he/she has compromised, to get an idea about the system. A smart attacker might bundle all these checks into a script which is run as soon as he/she gains access to the system, and back off as soon as the script generates alerts.

Beyond keeping the attacker from fingerprinting the honeynet, there are other challenges facing the research community. Honeynets are useful in detecting worm attacks which follow the epidemic model. Researchers have proposed Warhol worms and flash worms that do not follow the epidemic model and threaten to bring down the Internet in a matter of minutes. Honeynets might provide little or no benefit in such a case.

To overcome the challenges stated above, we suggest the following practices:


• MAC addresses of VMs must be changed from their default values.

• The system should be configured to generate fake traffic to the Honeypots and modify data capture accordingly.

• Recompile Sebek to use a more efficient covert channel, by compressing data or by setting up a virtual service which pretends to transmit the data to another host.
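As an illustration of the first practice, the sketch below generates a replacement MAC address that cannot fall in the default VMware ranges. It uses a random locally administered unicast address, which is only one possible choice and an assumption of this example. A locally administered address may itself look unusual to a careful attacker, and a hypervisor may restrict which addresses it accepts, so both points should be checked for the platform in use.

```python
import secrets


def random_local_mac() -> str:
    """Return a random locally administered, unicast MAC address."""
    octets = bytearray(secrets.token_bytes(6))
    # Set bit 1 of the first octet (locally administered) and clear
    # bit 0 (unicast), so the result never starts with a vendor prefix
    # such as 00-05-69, 00-0C-29 or 00-50-56.
    octets[0] = (octets[0] | 0x02) & 0xFE
    return ":".join(f"{octet:02x}" for octet in octets)
```

The generated value would then be set in the VM's network configuration before the honeypot is placed on the network.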


Legal Issues Pertaining to Honeypots:-
In the technology world there are many legal questions concerning honeypots. First, honeypots are relatively new. Security professionals are still using them in new ways and the legal community is just starting to hear questions about them. The main issue is that there is no legislation concerning them. While there are hundreds of laws governing the United States, most US policy is made through court cases, and very few cases have concerned honeypots. That being understood, most of the research we found in the area concluded that there are three major legal areas concerning honeypots: entrapment, liability, and privacy (“Honeypots: Are they Legal?”).


Entrapment:-
Entrapment can be claimed by a defendant when, according to Webster, he “… would not have broken the law if not tricked into doing it by law enforcement officials.” In other words, entrapment is a defense against criminal prosecution. An example would be a police officer asking you if you wished to buy illegal drugs from him. Honeypots do not coerce people to use them like the police officer does with the drugs. Honeypots are much like homes; if someone wishes to break in, they have to do all the work. They have to open the door, they have to look around the house, and they have to steal the items. While honeypots do not necessarily fall into the entrapment category, they do raise many privacy concerns.




Honeytokens:-
A honeytoken is a data entity whose value lies in the inherent use of that data. Similar in concept to a honeypot, where the use of the honeypot itself is subject to scrutiny, honeytokens are entities such as false medical records, incorrect credit card numbers and invalid social security numbers. The very act of accessing these numbers, even by legitimate entities, is suspect. We believe that this concept is especially useful in detecting large classes of attacks. For instance, a database which contains credit card information could have certain 'honeytokened' credit card numbers. These records cause alerts to be fired the instant they are accessed. Since a legitimate user has no reason to access the honeytoken (there will be no legitimate need for a user to select all the records from a credit card table), the person accessing the record very likely has malicious intent. Thus, a range of attacks can be caught using honeytokens: an SQL injection attack which selects all the data from the database, a DB server hack which extracts all the records from a table, or exploitation of a vulnerability in a database server to extract tuples.
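A minimal sketch of the alerting idea follows: any result set that contains a planted card number raises an alert. Where the check actually runs (database trigger, proxy or application layer) is left open. The function names, the row layout and the placeholder number are assumptions made for the example.

```python
# Planted numbers that no legitimate workflow should ever touch.
# The value is a widely used test number, serving here as a placeholder.
HONEYTOKENS = {"4111111111111111"}


def find_honeytokens(rows: list, honeytokens: set) -> list:
    """Return the rows of a query result that contain a planted number."""
    return [row for row in rows if row["card_number"] in honeytokens]


def audit_query(user: str, rows: list, honeytokens: set, alerts: list) -> int:
    """Record an alert for every honeytoken a query returned."""
    hits = find_honeytokens(rows, honeytokens)
    for row in hits:
        alerts.append({"user": user, "record_id": row["id"]})
    return len(hits)
```

A query that selects the whole table, as an SQL injection dump typically would, inevitably returns the planted row and trips the alert, while lookups of individual real customers never do.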

Conclusion
In this project, we have looked at various aspects of Honeypots. We described the architecture of low and high interaction honeypots and their possible drawbacks. We discussed the use of Honeypots in research and surveyed the research work related to Georgia Tech Honeynet project. A lot of challenges face the Honeynet research and deployment and we have presented a few such challenges and possible solutions. We also presented a detailed view of legal challenges with respect to Honeypot deployment. We have explored systems similar to Honeypots such as Tarpits and Honeytokens. We do believe that, although Honeypots have legal issues now, they do provide beneficial information regarding the security of a network. We think it is important that new legal policies be formulated to foster and support research in this area. This will help solve the current challenges and make it possible to use Honeypots for the benefit of the broader Internet community. 
