DETECTING ENCRYPTED INTERACTIVE STEPPING-STONE CONNECTIONS Ting He and Lang Tong ABSTRACT Network intruders often hide their identities by sending attacks through a chain of compromised hosts that are used as “stepping stones”. The difficulty in defending against such attacks lies in detecting stepping-stone connections at the compromised hosts. In this paper, to distinguish normal from attacking connections, we consider strategies that do not depend on the content of the traffic so that they are applicable to encrypted traffic. We propose a low complexity detection algorithm that has no miss detection and an exponentially-decaying false alarm probability. A sequential strategy is then developed to reduce the required number of testing packets. Keywords: Stepping-stone detection, intrusion detection algorithms, encrypted stepping-stone attacks, interactive steppingstone attacks. 1. INTRODUCTION Stepping-stone attack is a common way for network intruders to conceal their identity. In a stepping-stone attack, the attacker compromises (multiple) hosts as relay machines, constructs a chain of connections through these hosts using remote login such as Telnet or SSH, and then sends attacking commands through this chain to the victim [1]. Because each connection is made by a separate remote login, the next host in the chain can only see the identity of its immediate upper stream neighbor, and the victim only sees the identity of the last host. Therefore, we have to trace back the chain to find the origin of an attack. Such tracing can be overwhelming because of the huge volume and highly dynamic nature of the network traffic. To address this issue, Donoho et al. propose in [2] to install stepping-stone monitors at each gateway node to detect steppingstone pairs1 by examining the incoming/outgoing traffic. In practice, the monitor has to make decisions by observing live traffic, which may not include the beginning or the end of the connection. Therefore, it is desirable that the detection strategy does not require synchronization between incoming and outgoing streams. T. He and L. Tong ({th255,lt35}@cornell.edu) are with the School of Electrical and Computer Engineering, Cornell University, Ithaca, NY 14853. This work is supported in part by TRUST (The Team for Research in Ubiquitous Secure Technology), which receives support from the National Science Foundation (NSF award number CCF-0424422) and the following organizations: Cisco, ESCHER, HP, IBM, Intel, Microsoft, ORNL, Qualcomm, Pirelli, Sun and Symantec, and the U. S. Army Research Laboratory under the Collaborative Technology Alliance Program, Cooperative Agreement DAAD19-01-2-0011. The U. S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation thereon. 1 A pair of incoming and outgoing streams is called a stepping-stone pair if it is part of a stepping-stone attack. Otherwise, the pair is normal. Besides, the connections may be encrypted (e.g., SSH sessions) so that the monitor cannot rely on the content of the traffic. Furthermore, a careful attacker may even actively modify the traffic each time it passes through a host in order to confuse the monitor. 1.1. Related Work Staniford and Heberlein [1] are the first to consider the problem of stepping-stone connection detection. The early work is based on the content of the traffic, e.g., see [1, 3]. These techniques help to recognize connections on the same intrusion path by analyzing the content of the packets. Later on timing characteristics of the traffic are used to detect encrypted stepping-stone connections, examples of which include [4–6]. The drawback of these techniques is that they are vulnerable to the active timing perturbation by the attacker. There are a few results on detecting encrypted, perturbed steppingstone connections; see [2, 7, 8]. The key assumption of these approaches is that there is a limit on the attacker’s ability to alter the traffic. Specifically, in [2] it is assumed that there is a maximum tolerable delay for attacking packets, in [7] the attacker’s timing perturbation is independent and identically distributed across packets, and in [8] there are constraints not only on the maximum delay, but also on the maximum number of packets that can be sent during the delay. From an algorithmic point of view, Blum, Song and Venkataraman [8] develop the first detection algorithms which require provable (polynomial) sample sizes to achieve certain false alarm probabilities. 1.2. Summary of Results and Organization Our work is based on the same assumptions as in [8]. In this paper, we consider detecting encrypted, interactive stepping-stone connections. By “encrypted” we mean that we cannot use the content of packets. “Interactive connections” means that the new commands to be sent depend on the feedback of previous commands. Therefore, the attacker cannot wait too long for the packets to be relayed, and cannot issue packets too fast because he needs time to process the feedback. With these constraints, we reduce the problem to testing pairs of independent point processes against relayed point processes with bounded delay and bounded peak rate. We take an algorithmic approach. Noticing that under the bounded delay and bounded peak rate assumption, the maximum variation (defined later) for stepping-stone pairs is always bounded, we develop a detection algorithm based on the maximum variation statistics. The algorithm has no miss detection and an exponentiallydecaying false alarm probability. Moreover, we explore the possibility of reducing sample size by using sequential detection. Specifically, we propose an iterative algorithm which distributes the total false alarm constraint among iterations. For given false alarm constraint, we show how to decide the detection threshold adaptively so that the constraint is satisfied. We also consider how to distribute the false alarm constraint to minimize the maximum sample size, and we show that the minimax distribution reduces the sequential algorithm to a fixed-sample-size algorithm. Our analysis focuses on the error exponent of various algorithms. We show that although our algorithm needs the same order of sample size as the algorithm proposed in [8], our algorithm has a much larger error exponent. The rest of the paper is organized as follows. Section 2 defines the problem. Section 3 presents several detection algorithms together with the performance analysis and comparison. There are also simulation results to verify our analysis. 2. THE PROBLEM STATEMENT (i) (i) (i) Let S1 , S2 (Si = (. . . , s−1 , s0 , s1 , . . .), i = 1, 2) be the incoming and outgoing streams at a particular gateway node2 , and (i) (i) (i) Ti = {. . . , s−1 , s0 , s1 , . . .} be the set of the elements in Si . Assume that if (S1 , S2 ) is a normal pair, they are independent Poisson processes. If (S1 , S2 ) is a stepping-stone pair, then there exists a bijection g : T1 → T2 such that 0 ≤ g(s) − s ≤ ∆ for any s ∈ T1 ; furthermore, |{s ∈ S1 : s ∈ [t, t + ∆]}| ≤ p∆ for any t. Here ∆ is the maximum tolerable delay, and p∆ is the largest number of packets the attacker can send within ∆3 . We want to test the following binary hypotheses: H0 : H1 : (i) (S1 , S2 ) is a normal pair, (S1 , S2 ) is a stepping-stone pair. (i) (1) (2) (i) by observing (s1 , s2 , s3 , . . .) (i = 1, 2). 3. DETECTION ALGORITHMS AND PERFORMANCE ANALYSIS (1) (s1 , (1) s2 , . . .) (2) (s1 , j=1 where I· is the indicator function. Define the cumulative difference d(w) and the maximum variation v(w) as ∆ ∆ a a b t b t (i) N2 (I) N1 (I) ∆ (ii) Fig. 1. (i): N2 (I) > N1 (I); (ii): N1 (I) > N2 (I). at most the packets transmitted in [a − ∆, b], and at least those transmitted in [a, b − ∆]; see Fig. 1. Therefore, for any interval I, −p∆ ≤ N1 (I) − N2 (I) ≤ p∆ . We can use the maximum difference |N1 (I) − N2 (I)| over all I to detect stepping-stone pairs. Noticing that max 1≤i≤j≤w |(N1 (j) − N1 (i)) − (N2 (j) − N2 (i))| = v(w), we can equivalently use the maximum variation to detect steppingstone pairs. The algorithm is shown in Table 3.1. DETECT-MAXIMUM-VARIATION(S1 , S2 , p∆ , n): dmax = dmin = 0; for w = 1 :½n d(w − 1) + 1 if sw ∈ S1 d(w) = ; d(w − 1) − 1 if sw ∈ S2 dmax = max(dmax , d(w)); dmin = min(dmin , d(w)); if dmax − dmin > p∆ return NORMAL; end return ATTACK; Table 1. DETECT-MAXIMUM-VARIATION (DMV). (2) s2 , . . .) and order the union as and Merge s1 , s2 , s3 , . . .. Let Ni (w)(i = 1, 2) be the number of packets from w ∆ P Si when the total number of packets is w, i.e., Ni (w)= Isj ∈Si , d(w)=N1 (w) − N2 (w), N2 (I) N1 (I) ∆ v(w)= max d(i) − min d(i). 1≤i≤w 1≤i≤w Algorithm DMV has time complexity O(n) and uses only constant memory (O(log p∆ ), to be precise). By Proposition 3.1, any stepping-stone pair will be detected after n packets, i.e., miss detection is totally avoided. We only need to be concerned about the false alarm probability, which is bounded as follows. Theorem 3.2 The false alarm probability of DMV is bounded by PF (DMV) ≤ 3.1. DETECT-MAXIMUM-VARIATION (DMV) Given interval I, let Ni (I) be the number of packets on Si in the interval I. We notice that the stepping-stone pairs have bounded difference in Ni (I), as stated in the following proposition: where ρ = cos π . p∆ +2 Furthermore, lim − 1 log PF (DMV) = − log ρ. n n→∞ Proposition 3.1 For stepping-stone pairs, we have |N1 (I) − N2 (I)| ≤ p∆ ∀ interval I. Proof: Let I = [a, b]. N1 (I) is the number of incoming packets in I. By the bounded delay assumption, N2 (I) can include (p∆ + 1) n ρ , 1−ρ Proof: See Appendix. 2 s(i) k is the arrival epoch of the kth packet on stream i since the monitor starts (if k ≤ 0, it is the (−k + 1)th packet before the monitor starts). 3 The notion of ∆ and p is first used in [2] and [8] respectively. Here ∆ we do not consider inserting chaff packets. See [2] for the description of such scenario. Remarks: For given false alarm constraint δ, making the upper bound in Theorem 3.2 equal to δ yields a sample size PF vs. sample size n 1 p∆ = 4, ∆ = 1 10 DA DA lower bound DMV DMV upper bound 0 10 ³ log δ(1 − ρ) − log (p∆ + 1) p∆ ´ . = O p2∆ log log ρ δ −1 10 −2 10 Blum et al. [8] proposed an algorithm called “DETECT-ATTACKS” (DA) for stepping-stone detection. Algorithm DA divides samples into groups of 2(p∆ + 1)2 packets each, and computes the cumulative difference d(w) for each group. It returns NORMAL if |d(w)| > p∆ in any of the groups. Blum et al. prove that 2(p∆ + 1)2 log 1δ packets are required to guarantee a false alarm probability no more than δ. We point out that DMV always outperforms DA4 . The reason is that since v(w) ≥ max |d(i)|, for every realization, if DMV −3 10 PF n= −4 10 −5 10 −6 10 −7 10 −8 10 has a false alarm, DA must have a false alarm too. Now we compare their false alarm probabilities. We have the following lemma: Lemma 3.3 For independent Poisson processes and large p∆ , PF (DA) ≥ where σ = cos π , 2(p∆ +1) µ K 1 2(p∆ +1)2 and K = σ ¶n sin 2(p π +1) ∆ 2(p∆ +1)(1−σ) (3) . 0 50 100 150 n 1≤i≤w Fig. 2. PF of algorithms DA and DMV. (Assume normal pairs are independent Poisson processes with the same rate.) Using v(w) as statistics, we obtain from the proof of Theorem 3.2 that, for normal pairs, ³ ´w τw cos τwπ+1 ∆ Pr{v(w) < τw } ≤ =f (τw , w), for any τw ≥ 1. 1 − cos τwπ+1 ∆ Proof: See Appendix. From Lemma 3.3 we have that for large p∆ , the error exponent 1 If τw = sup{integer k : f (k, w) ≤ qw δ}, and the decision rule is to return ATTACK if v(w) < τw , then the false alarm probability of the wth iteration is bounded by qw δ. Therefore we have the sequential algorithm in Table 3.2. of DA is at most − log (K 2(p∆ +1)2 σ). By Taylor expansion, − log ρ = 1 − log (K 2(p∆ +1)2 σ) = π2 +o 2(p∆ + 2)2 π2 4 + log π2 +o 2(p∆ + 1)2 SEQUENTIAL-DMV(S1 , S2 , p∆ , δ, q): µ 1 p2∆ ¶ , µ 1 p2∆ ¶ . Therefore, for large p∆ , the false-alarm error exponent of DMV is at least 3.38 times larger than that of DA. Fig. 2 plots the simulated false alarm probabilities of DA and DMV and their bounds5 . It confirms our claim that the false alarm probability of DMV decays much faster than that of DA. dmax = dmin = 0; for w = 1, ½2, . . . d(w − 1) + 1 if sw ∈ S1 d(w) = ; d(w − 1) − 1 if sw ∈ S2 dmax = max(dmax , d(w)); dmin = min(dmin , d(w)); if dmax − dmin > p∆ return NORMAL; τ = sup{integer k : f (k, w) ≤ qw δ}; if dmax − dmin < τ return ATTACK; end Table 2. SEQUENTIAL-DMV (SDMV). 3.2. SEQUENTIAL-DMV (SDMV) In both DA and DMV, the decision of ATTACK requires a fixed sample size. We hope to make ATTACK decisions sequentially so that we can possibly use fewer packets. To this end, we propose to use an iterative algorithm and divide the total false alarm constraint among iterations. Specifically, we split the total false alarm probability δ into q1 δ, q2 δ, q3 δ, . . ., where q = (q1 , q2 , . . .) sat∞ P qw = 1. If, in each iteration w, the false isfies qw ≥ 0 and w=1 alarm probability is bounded by qw δ, then by union bound we see that the total false alarm will be bounded by δ. 4 Note that in terms of the order of sample size with respect to p and ∆ δ, DMV and DA are comparable. 5 Note that the sample size of DA has to be a multiple of the group size 2(p∆ + 1)2 . Algorithm SDMV also uses the maximum variation as the statistic, and therefore can be thought of as a sequential version of DMV. The vector q is part of the algorithm design. Ideally, we want to choose q to minimize the average sample size. If the attacker’s strategy is unknown, then we may wish to minimize the largest sample size. Specifically, if the attacker does the best to evade detection by keeping v(w) = p∆ for all w ≥ p∆ , then the best q is ³ ´w (p∆ + 1) cos p∆π+2 qn = 1 for n = inf w : ≤δ π 1 − cos p +2 ∆ and qw = 0 for all w 6= n. That is, the minimax q reduces SDMV to the fixed-sample-size algorithm DMV. 4. CONCLUSION In this paper, we consider detecting stepping-stone connections with bounded delay and bounded peak rate. Our techniques can rule out independent connection pairs with provable confidence and hopefully leave a much smaller number of suspicious connections for further examination. Therefore, they are most useful in the scenario when the total volume of the traffic to be analyzed is large. In [9], we consider detecting stepping-stone connections with bounded delay only or bounded memory. These more general assumptions will make the detection techniques easier to apply in practice. and then applying (4) with p = q = 21 . Furthermore, by (5) it is easy to see that lim − n1 log PF ( DMV) = − log ρ. n→∞ For the proof of Lemma 3.3, note that DA has false alarm in a given group if and only if the maximum |d(i)| in that group is within p∆ . Observing that Pr{ max i∈{1,..., 2(p∆ +1)2 } |d(i)| ≤ p∆ } = Pr{N−(p∆ +1), (p∆ +1) > 2(p∆ + 1)2 }, we apply (6) with a = b = p∆ + 1 to lower bound the false alarm of DA on one group. Then taking the product over all groups gives the desired result. 5. APPENDIX 5.1. Proof of Theorem 3.2 and Lemma 3.3 6. REFERENCES Proof: The proof is based on the theory of random walk. Let {Xn }n≥0 be a simple random walk, i.e., X0 = 0, Xn = Z1 + Z2 + . . . + Zn , (n > 0) where {Zi }i=1, 2,... are i.i.d. random variables taking value in {−1, 0, 1}. Let p = Pr{Zi = 1}, q = Pr{Zi = −1}. Define the hitting time of −b or a (a, b ≥ 0) as N−b, a = inf{n ≥ 1 : Xn = −b or a}. In [10], it is proved that Pr{N−b, a where s1 = 1 = n} ≤ 2 µ ¶a/2 µ ¶b/2 1 q 1 1 p , (4) n−1 + q 2 p s1 s1n−1 1 1 π 1−p−q+2(pq) 2 cos ( a+b ) lim − n→∞ , and 1 log Pr{N−b, a > n} = log s1 . n (5) If a = b, then for large n, Pr{N−b, a = n} ≥ π 2a n−1 . 2as1 sin (6) For the proof of Theorem 3.2, note that for independent Poisson processes, it is known that d(w) is a simple random walk. Define extreme values Un = max d(i), Ln = min d(i). A i=0,..., n i=0,..., n false alarm occurs in DMV if and only if Un −Ln < p∆ +1. Note that the false alarm probability is the largest if d(w) is symmetric (i.e., p = q = 12 ). We have PF (DMV) = Pr{Un − Ln < p∆ + 1} = Pr{ p∆ +1 [ {Un < a, Ln > −(p∆ + 2 − a)}} a=1 p∆ +1 ≤ X Pr{Un < a, Ln > −(p∆ + 2 − a)}(7) a=1 ≤ where ρ = cos noticing (p∆ + 1) π . p∆ +2 ρn , 1−ρ (8) Here (7) is by union bound, and (8) is by Pr{Un < a, Ln > −(p∆ +2−a)} = Pr{N−(p∆ +2−a), a > n}, [1] S. Staniford-Chen and L. Heberlein, “Holding intruders accountable on the internet,” in Proc. the 1995 IEEE Symposium on Security and Privacy, (Oakland, CA), pp. 39–49, May 1995. [2] D. Donoho, A. Flesia, U. Shankar, V. Paxson, J. Coit, and S. Staniford, “Multiscale stepping-stone detection: Detecting pairs of jittered interactive streams by exploiting maximum tolerable delay,” in 5th International Symposium on Recent Advances in Intrusion Detection, Lecture Notes in Computer Science 2516, 2002. [3] X. Wang, D. Reeves, S. Wu, and J. Yuill, “Sleepy watermark tracing: An active network-based intrusion response framework,” in Proc. of the 16th International Information Security Conference, pp. 369–384, 2001. [4] Y. Zhang and V. Paxson, “Detecting stepping stones,” in Proc. the 9th USENIX Security Symposium, pp. 171–184, August 2000. [5] K. Yoda and H. Etoh, “Finding a connection chain for tracing intruders,” in 6th European Symposium on Research in Computer Security, Lecture Notes in Computer Science 1895, (Toulouse, France), October 2000. [6] X. Wang, D. Reeves, and S. Wu, “Inter-packet delay-based correlation for tracing encrypted connections through stepping stones,” in 7th European Symposium on Research in Computer Security, Lecture Notes in Computer Science 2502, pp. 244–263, 2002. [7] X. Wang and D. Reeves, “Robust correlation of encrypted attack traffic through stepping stones by manipulation of interpacket delays,” in Proc. of the 2003 ACM Conference on Computer and Communications Security, pp. 20–29, 2003. [8] A. Blum, D. Song, and S. Venkataraman, “Detection of Interactive Stepping Stones: Algorithms and Confidence Bounds,” in Conference of Recent Advance in Intrusion Detection (RAID), (Sophia Antipolis, French Riviera, France), September 2004. [9] T. He and L. Tong, “Detecting Encrypted Stepping-stone Connections,” Tech. Rep. ACSPTR-01-06-02, Cornell University, January 2006. http://acsp.ece.cornell.edu/pubR.html. [10] D. Cox and H. Miller, The Theory of Stochastic Processes. New York: John Wiley & Sons Inc., 1965.