Hybrid Intrusion Detection System using Fuzzy Logic Inference Engine for SQL Injection Attack

SQL injection attacks against web applications are increasingly prevalent. Testing a website before it is published is one preventive measure. However, this method is sometimes ineffective because of various constraints. An intrusion detection system (IDS) can help protect a website from various attacks. This study proposes an IDS that protects web applications from SQL injection attacks. The IDS uses a hybrid architecture with a signature-based detection method; the data analyzed are network packets and the error log. A fuzzy logic inference engine draws the conclusion from the analyzed data. The proposed hybrid IDS gives good results in detecting various types of SQL injection attack and significantly reduces, or even removes, false positives and false negatives.


INTRODUCTION
Today, web-based applications are developing rapidly. However, this growth has not been matched by adequate knowledge among web developers of important aspects of web security [1], which can leave web applications vulnerable to attack. Thousands of attacks are attempted on web applications around the world every day [2]. SQL injection is one of the most common types of web attack and among the most dangerous [3].
There are several ways to protect a website from attack, including checking input types, checking input encoding, and so forth [4]. However, these measures do not guarantee that the web application will be secure. Besides the fact that attack techniques keep evolving, web developers do not always know which parts of the website have weaknesses that an attacker can exploit.
An intrusion detection system (IDS) is software or hardware that observes, inspects, processes, and correlates collected information, and then performs a particular action when evidence of an attack has been obtained [5]. This research proposes a hybrid IDS with a signature-based detection method. Attacks are detected by analyzing network data packets and the web server error log. Applying a fuzzy logic inference engine to these two data sources is expected to improve detection accuracy and reduce false positives and false negatives.

INTRUSION DETECTION SYSTEM
Based on architecture, an IDS can be classified into three types: host-based, network-based, and hybrid. A host-based IDS analyzes data from monitored hosts, a network-based IDS analyzes network traffic, and a hybrid IDS combines both [6], [7].
Based on detection method, an IDS can be classified into three types: signature-based, anomaly-based, and stateful protocol analysis [8]. Signature-based detection recognizes attacks from previously known patterns; that is, it identifies an attack using knowledge that has been defined in advance. For this reason the method is also called knowledge-based detection or misuse detection. Anomaly-based detection observes the normal behavior of the system and flags an anomaly when something deviates from it, for example too many failed logins, excessive use of processor and memory resources, or mass email delivery at one time. Because it relies on normal behavior, this method is also called behavior-based detection.
Stateful protocol analysis works similarly to anomaly-based detection, i.e., it detects possible anomalies in the system. The difference is that it does not judge anomalies against internal system or network baselines but against standards set by international standards organizations, for example the Internet Engineering Task Force (IETF). The general classification of IDS is shown in Figure 1.

FUZZY LOGIC
Fuzzy logic is an approach to inference that solves problems in a way that resembles human reasoning. It uses multilevel logic rather than strictly true or false statements [9], and it aims to formalize modes of reasoning that are approximate rather than exact [10]. There are several reasons for using fuzzy logic, namely [11]:
1. The concept is easy to understand because it uses simple mathematical concepts.
2. It is flexible and tolerant of imprecise data.
3. It can model complex nonlinear functions.
4. It can work together with conventional control techniques.
5. It can build on and apply expert experience without going through a training process.
Unlike classical logic, which has only two values, fuzzy logic allows many membership values, expressed as degrees of membership and degrees of truth between 0 and 1. In fuzzy logic it is also possible for a value to be partly true and partly false at the same time [12].

SQL INJECTION
SQL injection is a web attack in which a user enters characters or keywords into the parameters of a web application with the aim of changing the meaning and logic of the intended SQL command [13]. SQL injection attacks allow the database to be accessed or manipulated by unauthorized people. SQL injection attacks can be classified into several groups, namely [14], [

WEB SERVER ERROR LOG
The error log is one of the most important log types. It is where the web server sends diagnostic information and records all errors that occur while the web application is running. The error log is the first place to look when a problem arises in the operation of the server [16]. Figure 2 shows an example of an Apache web server error log entry associated with a query error that indicates a SQL injection attack has occurred.

RELATED WORKS
Several studies have been published on network security, especially in the field of intrusion detection systems (IDS). These studies fall into two categories: the first concerns hybrid IDS, and the second concerns IDS for SQL injection attacks. In the first category, the studies generally build a hybrid IDS that combines network data packets and system activity logs, among them Shanmugam and Idris [7], Sharma and Chandel [17], and Yunmar [18]. Except for Yunmar's work, studies in the first category generally focus on analyzing network-based attacks such as DoS, U2R, and R2L, whereas few hybrid IDS studies address attacks on web applications.
In the second category, IDS for SQL injection attacks are built with signature-based methods. The data analyzed come from network data packets passing through port 80, as in Maheswari and Anita [19], Kroné and Bahtijaragic [20], and Bhat and Mumbarkar [15], while Irawan et al. [14] developed a signature-based IDS with data taken from port 3306.
The contribution of this research is to examine the impact of using a data source that previous research has rarely or never used for attack detection, namely the combination of network data packets and the web server error log. This combination is expected to reduce or remove false positives and false negatives.

HYBRID INTRUSION DETECTION
In this study, the IDS is built with a hybrid architecture that combines a network-based IDS and a host-based IDS. The network-based IDS works at the network level and is realized by a Sniffer Agent, which captures and analyzes network data packets passing through port 80. Meanwhile, the host-based IDS works at the application level.
The host-based IDS is realized by a Log Analyzer Agent, which processes data obtained from the Apache error log. Suspicious items found by the Sniffer Agent and the Log Analyzer Agent are sent to the fuzzy inference engine, which determines the final result: a SQL injection attack or normal web access. Figure 3 shows the proposed hybrid IDS design. This study uses a signature-based approach as the detection method. To recognize an attack, both the Sniffer Agent and the Log Analyzer Agent extract data using regular expressions based on patterns written in the knowledge base. In the example shown in Figure 4, the Sniffer Agent captures a suspicious data packet that indicates an attack.
The Sniffer Agent treats this packet as part of an attack attempt because the "UNION SELECT" string contained in the URI indicates one of the SQL injection techniques. The strings that the Sniffer Agent treats as an attack attempt are those associated with the SQL keywords described in Section 2.3, among them UNION SELECT, SHUTDOWN, OR 1=1, DROP TABLE, and so forth. Each attack attempt is reported to the fuzzy inference engine, which determines the final result. The flow of the attack detection process in the Sniffer Agent is shown in Figure 5.
The Log Analyzer Agent identifies attacks on the same principle as the Sniffer Agent: it extracts data from the Apache error log using regular expressions based on known patterns. As shown in Figure 6, the Log Analyzer Agent treats those log items as part of an attack attempt because the errors relate to a faulty MySQL query, such as errors involving mysql_num_rows and mysql_fetch_array. The Log Analyzer Agent reports these items to the fuzzy inference engine, which determines the final result. The flow of the attack detection process in the Log Analyzer Agent is shown in Figure 7.
Every attack attempt recognized by the Sniffer Agent and the Log Analyzer Agent is reported to the fuzzy inference engine. The data reported are the IP address of the suspected attacker and the targeted website. The intensity of these attack attempts provides the input variables of the fuzzy inference engine for reaching the final decision, i.e., whether the activity of a user at a particular IP address is classified as a SQL injection attack or not.
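The signature matching performed by the two agents can be sketched as follows. This is a minimal illustration in Python, not the authors' Perl implementation: the pattern lists are assumptions built only from the keywords named above, the function names (suspicious_uri, suspicious_log_ip, count_intensity) are hypothetical, and the log parsing assumes an Apache-style "[client <IPv4 address>]" field.

```python
import re
from collections import Counter
from urllib.parse import unquote_plus

# Hypothetical knowledge base: only the keywords named in the text.
# The authors' actual pattern list is not reproduced in the paper.
URI_SIGNATURES = [
    r"union\s+select",
    r"\bshutdown\b",
    r"\bor\s+1\s*=\s*1\b",
    r"drop\s+table",
]
LOG_SIGNATURES = [r"mysql_num_rows", r"mysql_fetch_array"]

URI_RE = re.compile("|".join(URI_SIGNATURES), re.IGNORECASE)
LOG_RE = re.compile("|".join(LOG_SIGNATURES), re.IGNORECASE)
# Assumes an IPv4 client field such as "[client 192.0.2.10]".
CLIENT_RE = re.compile(r"\[client (\d{1,3}(?:\.\d{1,3}){3})")


def suspicious_uri(uri):
    """Sniffer Agent check: does the URL-decoded URI match a signature?"""
    return URI_RE.search(unquote_plus(uri)) is not None


def suspicious_log_ip(line):
    """Log Analyzer check: return the client IP if the line matches, else None."""
    if LOG_RE.search(line) is None:
        return None
    match = CLIENT_RE.search(line)
    return match.group(1) if match else None


def count_intensity(requests, log_lines):
    """Count suspicious packets and suspicious log items per IP address.

    requests is an iterable of (ip, uri) pairs; log_lines is an iterable
    of raw error log lines.
    """
    packet_hits = Counter(ip for ip, uri in requests if suspicious_uri(uri))
    log_hits = Counter(ip for ip in map(suspicious_log_ip, log_lines) if ip)
    return packet_hits, log_hits
```

The two per-IP counters correspond to the two intensity values that are passed to the fuzzy inference engine.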

INFERENCE ENGINE DESIGN

Linguistic Variable
A linguistic variable is a numeric interval whose values carry linguistic labels, with semantics defined by their membership functions. Two variables are considered in the inference process: the error log intensity and the intensity of network data packets suspected of being attack attempts. The fuzzy logic inference engine produces an output variable that indicates the occurrence of attack activity.

First Variable
The first variable is the intensity with which error log items from a particular host or IP address appear. The membership function of this variable is trapezoidal, as shown in Figure 8.
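A trapezoidal membership function of this kind can be sketched as below. Figure 8 is not reproduced here, so the breakpoints in the example (0, 0, 2, 5 for a "Low" set) are purely illustrative assumptions, and the function name trapezoid is hypothetical.

```python
def trapezoid(x, a, b, c, d):
    """Degree of membership of x in a trapezoid with feet a, d and shoulders b, c.

    Requires a <= b <= c <= d. Setting a == b or c == d gives an
    open-shouldered set (a vertical edge on that side).
    """
    if b <= x <= c:
        return 1.0                      # flat top
    if x <= a or x >= d:
        return 0.0                      # outside the support
    if x < b:
        return (x - a) / (b - a)        # rising edge
    return (d - x) / (d - c)            # falling edge


# Illustrative "Low" error log intensity: fully low up to 2 items,
# no longer low at all from 5 items (assumed values).
low_membership = trapezoid(3.5, 0, 0, 2, 5)
```

An intensity of 3.5 lies halfway down the falling edge, so it belongs to the assumed "Low" set with degree 0.5.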

Second Variable
The second variable is the intensity of network data packets from an IP address that are suspected of being part of an attack. These values are obtained from the Sniffer Agent, which is responsible for processing incoming network data packets. The membership function of this variable is shown in Figure 9.

Third Variable
The third variable is the linguistic variable that determines the final output (the defuzzification result) of the proposed system, which indicates the occurrence of an attack. The membership function of this variable is illustrated in Figure 10.

Fuzzy Rule Base
The fuzzy rule base defined over the two linguistic input variables described above determines the value of the attack indication. The fuzzy rule base can be written as:
R1 = IF "Error Log Intensity" = "Low" AND "Intensity of Data Packet Suspected" = "Low" THEN "Attacking Indication" = "Low"
R2 = IF "Error Log Intensity" = "High" AND "Intensity of Data Packet Suspected" = "Low" THEN "Attacking Indication" = "Medium"
R3 = IF "Error Log Intensity" = "Low" AND "Intensity of Data Packet Suspected" = "High" THEN "Attacking Indication" = "Medium"
R4 = IF "Error Log Intensity" = "High" AND "Intensity of Data Packet Suspected" = "High" THEN "Attacking Indication" = "High"
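Rules R1 to R4 can be evaluated as in the sketch below. Several things here are assumptions rather than the authors' design: the ramp-shaped "Low" and "High" sets with breakpoints 2 and 8, the representative output levels 0.0, 0.5, and 1.0, and the defuzzification step, which uses a weighted average of those levels (a Sugeno-style simplification) in place of whatever method the paper applies to the output sets of Figure 10. AND is taken as the minimum, which is a common choice.

```python
def high(x, lo=2.0, hi=8.0):
    """Assumed 'High' set: 0 below lo, 1 above hi, linear in between."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def low(x, lo=2.0, hi=8.0):
    """Assumed 'Low' set, taken as the complement of 'High'."""
    return 1.0 - high(x, lo, hi)


# Hypothetical representative values for the output labels.
OUTPUT_LEVEL = {"Low": 0.0, "Medium": 0.5, "High": 1.0}


def attack_indication(error_log_intensity, packet_intensity):
    """Evaluate R1-R4 and return an attack indication in [0, 1]."""
    e_low, e_high = low(error_log_intensity), high(error_log_intensity)
    p_low, p_high = low(packet_intensity), high(packet_intensity)

    # Firing strength of each rule: AND is modeled as min.
    rules = [
        (min(e_low, p_low), "Low"),       # R1
        (min(e_high, p_low), "Medium"),   # R2
        (min(e_low, p_high), "Medium"),   # R3
        (min(e_high, p_high), "High"),    # R4
    ]

    # Weighted average of the output levels (simplified defuzzification).
    # The total is never zero because the input sets are complementary.
    total = sum(strength for strength, _ in rules)
    return sum(s * OUTPUT_LEVEL[label] for s, label in rules) / total
```

With these assumed sets, a source that is high on only one input (for example, many suspicious packets but no error log items) yields 0.5, i.e. "Medium", while a source that is high on both inputs yields 1.0.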

SQL INJECTION SCENARIO
An attack attempt begins when the attacker uses certain SQL injection techniques against the website over a computer network. This activity leaves a trace of errors in the log produced by the Apache web server. The Log Analyzer Agent recognizes this pattern and reports it to the fuzzy inference engine. The Sniffer Agent does the same when it recognizes attack patterns in data packets passing through the network. Figure 11 illustrates an attack scenario up to the point where the hybrid IDS detects the attack activity.
SQL injection attacks can be carried out by manipulating parts of the website that have direct access to queries, for example the URI or an input form. The SQL injection code can be crafted so that it becomes part of a normal URI or of normal data entered into a form. In the example shown in Figure 12, a normal URI is manipulated by adding a UNION query injection and a piggy-backed query at once. Figure 13 shows how an input form (login) can be used as the entry point for a tautology attack, one of the SQL injection techniques. This technique is carried out by entering characters that are known to be used in tautology-based attacks.
In general, existing tools such as Schemafuzz, SQLMap, and SQL Power Injector can replace the manual attacks described above. The working principle is the same, but these tools automate the process, which lets them carry out attack attempts much faster than the manual approach.
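The effect of a tautology entered through a login form can be illustrated with a small, self-contained sketch. It uses Python's built-in sqlite3 module as a stand-in for the MySQL back end described in this paper; the table, the credentials, and the login_rows function are hypothetical and exist only to show how the injected characters change the logic of the query.

```python
import sqlite3

# In-memory database with a single illustrative account.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")


def login_rows(name, password):
    """Deliberately vulnerable login check: input is concatenated into SQL."""
    query = ("SELECT name FROM users WHERE name = '" + name +
             "' AND password = '" + password + "'")
    return conn.execute(query).fetchall()


# A wrong password matches no row.
normal_attempt = login_rows("alice", "wrong")

# The injected input closes the string literal and appends a condition
# that is always true, so the WHERE clause matches every row.
tautology_attempt = login_rows("alice", "' OR '1'='1")
```

In the second call the WHERE clause becomes name = 'alice' AND password = '' OR '1'='1'. Because AND binds more tightly than OR, the always-true comparison decides the result and the password check no longer matters.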

SYSTEM AND DATA TESTING
The data used in this research come from two sources: network data packets and an error log. The network data packets come from scanning HTTP GET packets that pass through the Ethernet device on port 80, while the error log data are obtained by scanning the Apache error log. Linux distributions typically store this log at a path such as /var/log/apache/error_log. The type of Apache error log entry used in this study is the MySQL error produced while the web application is running. In this study, the Sniffer Agent is responsible for scanning network data packets, while the Log Analyzer Agent is responsible for scanning the Apache error log.
System testing was conducted using three scenarios: normal website access, normal access that includes data strings recognized as part of SQL injection code, and a genuine SQL injection attack. In each scenario, the IDS analyzes several types of data: network data packets only, the Apache error log only, and a combination of network data packets and the Apache error log. The purpose of this test is to see how well the IDS can distinguish real attacks from non-attacks, depending on the type of data analyzed. Tables 1-3 show the results of the tests conducted on the three scenarios, while Table 4 shows detailed tests of several SQL injection types against the proposed hybrid IDS, which analyzes attacks using the combination of network data packets and the Apache error log.
The hybrid IDS is written in the Perl programming language and implemented on a Linux-based operating system with the Apache web server and a MySQL database. The SQL injection attacks were tested with various penetration tools, such as Schemafuzz, SQLMap, and SQL Power Injector. In addition, testing is also done

RESULT AND DISCUSSION
An IDS that uses only one type of data to analyze an attack is prone to errors, both false positives and false negatives. Table 1 shows a false positive, where something that is not an attack is treated as one merely because its data packets contain string patterns that look like part of SQL injection code.
False negatives can occur in an IDS that relies only on analysis of the Apache error log. Table 2 shows such an error, namely a web access that should have been identified as a SQL injection attack but escaped detection.
The number of false negatives can be even greater if the attacker writes injection code that follows the SQL logic used by the programmers who developed the website, because such code raises no query error for the log to record.
The hybrid IDS proposed in this study can eliminate the errors described above. It uses two types of data to analyze the status of a web access and cross-checks any access that is flagged as an attack. For example, if the Sniffer Agent finds a suspicious string pattern in network data packets from an IP address, the hybrid IDS cross-validates it against the Apache error log data through the Log Analyzer Agent. If the Log Analyzer Agent finds MySQL errors caused by web access from that IP address, the hybrid IDS concludes that the user at that IP address is attempting a SQL injection attack.
The accuracy of the proposed hybrid IDS can be calculated by Equation (1).
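Equation (1) is not reproduced in this text. For orientation only, the conventional definition of accuracy for a detection system is given below; it is an assumption that the authors' equation takes this form. Here TP and TN denote the numbers of true positives and true negatives, and FP and FN the numbers of false positives and false negatives.

```latex
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
```

Under this definition, a system with no false positives and no false negatives has an accuracy of 1.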
In many studies, IDS developed with a signature-based method produce many false positives and false negatives. In this study, these errors are reduced or even removed entirely by combining two data sources for analysis: network data packets and the Apache error log. A performance comparison of the proposed hybrid IDS with other studies is shown in Figure 14.
Figure 14. Comparison of studies

CONCLUSION
The hybrid IDS proposed in this study gives good results in detecting various types of SQL injection attack. Both agents work from written patterns, so an attack with a new pattern that is not listed in the knowledge base is likely to escape detection. In addition, several other types of web application attack, such as Cross Site Scripting (XSS), Local File Inclusion (LFI), and Remote File Inclusion (RFI), also require IDS handling. These issues remain challenges for future research on IDS development, especially IDS for web-based applications.