Detecting Encrypted Interactive Stepping-Stone Connections

Ting He and Lang Tong
Network intruders often hide their identities by sending attacks
through a chain of compromised hosts that are used as “stepping
stones”. The difficulty in defending against such attacks lies in detecting stepping-stone connections at the compromised hosts. In
this paper, to distinguish normal from attacking connections, we
consider strategies that do not depend on the content of the traffic so that they are applicable to encrypted traffic. We propose a
low complexity detection algorithm that has no miss detection and
an exponentially-decaying false alarm probability. A sequential
strategy is then developed to reduce the required number of testing
Keywords: Stepping-stone detection, intrusion detection algorithms, encrypted stepping-stone attacks, interactive steppingstone attacks.
Stepping-stone attack is a common way for network intruders to
conceal their identity. In a stepping-stone attack, the attacker compromises (multiple) hosts as relay machines, constructs a chain of
connections through these hosts using remote login such as Telnet
or SSH, and then sends attacking commands through this chain to
the victim [1]. Because each connection is made by a separate remote login, the next host in the chain can only see the identity of
its immediate upper stream neighbor, and the victim only sees the
identity of the last host. Therefore, we have to trace back the chain
to find the origin of an attack. Such tracing can be overwhelming because of the huge volume and highly dynamic nature of the
network traffic.
To address this issue, Donoho et al. propose in [2] to install
stepping-stone monitors at each gateway node to detect steppingstone pairs1 by examining the incoming/outgoing traffic. In practice, the monitor has to make decisions by observing live traffic,
which may not include the beginning or the end of the connection. Therefore, it is desirable that the detection strategy does not
require synchronization between incoming and outgoing streams.
T. He and L. Tong ({th255,lt35} are with the School
of Electrical and Computer Engineering, Cornell University, Ithaca, NY
This work is supported in part by TRUST (The Team for Research in
Ubiquitous Secure Technology), which receives support from the National
Science Foundation (NSF award number CCF-0424422) and the following
organizations: Cisco, ESCHER, HP, IBM, Intel, Microsoft, ORNL, Qualcomm, Pirelli, Sun and Symantec, and the U. S. Army Research Laboratory under the Collaborative Technology Alliance Program, Cooperative
Agreement DAAD19-01-2-0011. The U. S. Government is authorized to
reproduce and distribute reprints for Government purposes notwithstanding any copyright notation thereon.
1 A pair of incoming and outgoing streams is called a stepping-stone
pair if it is part of a stepping-stone attack. Otherwise, the pair is normal.
Besides, the connections may be encrypted (e.g., SSH sessions) so
that the monitor cannot rely on the content of the traffic. Furthermore, a careful attacker may even actively modify the traffic each
time it passes through a host in order to confuse the monitor.
1.1. Related Work
Staniford and Heberlein [1] are the first to consider the problem
of stepping-stone connection detection. The early work is based
on the content of the traffic, e.g., see [1, 3]. These techniques help
to recognize connections on the same intrusion path by analyzing
the content of the packets. Later on timing characteristics of the
traffic are used to detect encrypted stepping-stone connections, examples of which include [4–6]. The drawback of these techniques
is that they are vulnerable to the active timing perturbation by the
There are a few results on detecting encrypted, perturbed steppingstone connections; see [2, 7, 8]. The key assumption of these approaches is that there is a limit on the attacker’s ability to alter
the traffic. Specifically, in [2] it is assumed that there is a maximum tolerable delay for attacking packets, in [7] the attacker’s timing perturbation is independent and identically distributed across
packets, and in [8] there are constraints not only on the maximum
delay, but also on the maximum number of packets that can be sent
during the delay. From an algorithmic point of view, Blum, Song
and Venkataraman [8] develop the first detection algorithms which
require provable (polynomial) sample sizes to achieve certain false
alarm probabilities.
1.2. Summary of Results and Organization
Our work is based on the same assumptions as in [8]. In this paper,
we consider detecting encrypted, interactive stepping-stone connections. By “encrypted” we mean that we cannot use the content
of packets. “Interactive connections” means that the new commands to be sent depend on the feedback of previous commands.
Therefore, the attacker cannot wait too long for the packets to be
relayed, and cannot issue packets too fast because he needs time to
process the feedback. With these constraints, we reduce the problem to testing pairs of independent point processes against relayed
point processes with bounded delay and bounded peak rate.
We take an algorithmic approach. Noticing that under the
bounded delay and bounded peak rate assumption, the maximum
variation (defined later) for stepping-stone pairs is always bounded,
we develop a detection algorithm based on the maximum variation
statistics. The algorithm has no miss detection and an exponentiallydecaying false alarm probability. Moreover, we explore the possibility of reducing sample size by using sequential detection. Specifically, we propose an iterative algorithm which distributes the total
false alarm constraint among iterations. For given false alarm constraint, we show how to decide the detection threshold adaptively
so that the constraint is satisfied. We also consider how to distribute the false alarm constraint to minimize the maximum sample size, and we show that the minimax distribution reduces the
sequential algorithm to a fixed-sample-size algorithm. Our analysis focuses on the error exponent of various algorithms. We show
that although our algorithm needs the same order of sample size
as the algorithm proposed in [8], our algorithm has a much larger
error exponent.
The rest of the paper is organized as follows. Section 2 defines the problem. Section 3 presents several detection algorithms
together with the performance analysis and comparison. There are
also simulation results to verify our analysis.
Let S1 , S2 (Si = (. . . , s−1 , s0 , s1 , . . .), i = 1, 2) be the
incoming and outgoing streams at a particular gateway node2 , and
Ti = {. . . , s−1 , s0 , s1 , . . .} be the set of the elements in Si .
Assume that if (S1 , S2 ) is a normal pair, they are independent
Poisson processes. If (S1 , S2 ) is a stepping-stone pair, then there
exists a bijection g : T1 → T2 such that 0 ≤ g(s) − s ≤ ∆ for any
s ∈ T1 ; furthermore, |{s ∈ S1 : s ∈ [t, t + ∆]}| ≤ p∆ for any
t. Here ∆ is the maximum tolerable delay, and p∆ is the largest
number of packets the attacker can send within ∆3 . We want to
test the following binary hypotheses:
H0 :
H1 :
(S1 , S2 ) is a normal pair,
(S1 , S2 ) is a stepping-stone pair.
by observing (s1 , s2 , s3 , . . .) (i = 1, 2).
(s1 ,
s2 , . . .)
(s1 ,
where I· is the indicator function. Define the cumulative difference
d(w) and the maximum variation v(w) as
N2 (I)
N1 (I)
Fig. 1. (i): N2 (I) > N1 (I); (ii): N1 (I) > N2 (I).
at most the packets transmitted in [a − ∆, b], and at least those
transmitted in [a, b − ∆]; see Fig. 1.
Therefore, for any interval I, −p∆ ≤ N1 (I) − N2 (I) ≤ p∆ .
We can use the maximum difference |N1 (I) − N2 (I)| over all
I to detect stepping-stone pairs. Noticing that
|(N1 (j) − N1 (i)) − (N2 (j) − N2 (i))| = v(w),
we can equivalently use the maximum variation to detect steppingstone pairs. The algorithm is shown in Table 3.1.
dmax = dmin = 0;
for w = 1 :½n
d(w − 1) + 1 if sw ∈ S1
d(w) =
d(w − 1) − 1 if sw ∈ S2
dmax = max(dmax , d(w));
dmin = min(dmin , d(w));
if dmax − dmin > p∆ return NORMAL;
return ATTACK;
s2 , . . .)
and order the union as
s1 , s2 , s3 , . . .. Let Ni (w)(i = 1, 2) be the number of packets from
∆ P
Si when the total number of packets is w, i.e., Ni (w)=
Isj ∈Si ,
d(w)=N1 (w) − N2 (w),
N2 (I)
N1 (I)
v(w)= max d(i) − min d(i).
Algorithm DMV has time complexity O(n) and uses only
constant memory (O(log p∆ ), to be precise). By Proposition 3.1,
any stepping-stone pair will be detected after n packets, i.e., miss
detection is totally avoided. We only need to be concerned about
the false alarm probability, which is bounded as follows.
Theorem 3.2 The false alarm probability of DMV is bounded by
PF (DMV) ≤
Given interval I, let Ni (I) be the number of packets on Si in the
interval I. We notice that the stepping-stone pairs have bounded
difference in Ni (I), as stated in the following proposition:
where ρ = cos
p∆ +2
lim −
log PF (DMV) = − log ρ.
Proposition 3.1 For stepping-stone pairs, we have
|N1 (I) − N2 (I)| ≤ p∆ ∀ interval I.
Proof: Let I = [a, b]. N1 (I) is the number of incoming
packets in I. By the bounded delay assumption, N2 (I) can include
(p∆ + 1) n
ρ ,
Proof: See Appendix.
2 s(i)
is the arrival epoch of the kth packet on stream i since the monitor
starts (if k ≤ 0, it is the (−k + 1)th packet before the monitor starts).
3 The notion of ∆ and p is first used in [2] and [8] respectively. Here
we do not consider inserting chaff packets. See [2] for the description of
such scenario.
Remarks: For given false alarm constraint δ, making the upper bound in Theorem 3.2 equal to δ yields a sample size
PF vs. sample size n
p∆ = 4, ∆ = 1
DA lower bound
DMV upper bound
log δ(1 − ρ) − log (p∆ + 1)
p∆ ´
= O p2∆ log
log ρ
Blum et al. [8] proposed an algorithm called “DETECT-ATTACKS”
(DA) for stepping-stone detection. Algorithm DA divides samples
into groups of 2(p∆ + 1)2 packets each, and computes the cumulative difference d(w) for each group. It returns NORMAL
if |d(w)| > p∆ in any of the groups. Blum et al. prove that
2(p∆ + 1)2 log 1δ packets are required to guarantee a false alarm
probability no more than δ.
We point out that DMV always outperforms DA4 . The reason
is that since v(w) ≥ max |d(i)|, for every realization, if DMV
has a false alarm, DA must have a false alarm too.
Now we compare their false alarm probabilities. We have the
following lemma:
Lemma 3.3 For independent Poisson processes and large p∆ ,
PF (DA) ≥
where σ = cos
2(p∆ +1)
2(p∆ +1)2
and K =
sin 2(p π +1)
2(p∆ +1)(1−σ)
Fig. 2. PF of algorithms DA and DMV. (Assume normal pairs are
independent Poisson processes with the same rate.)
Using v(w) as statistics, we obtain from the proof of Theorem
3.2 that, for normal pairs,
τw cos τwπ+1
Pr{v(w) < τw } ≤
=f (τw , w), for any τw ≥ 1.
1 − cos τwπ+1
Proof: See Appendix.
From Lemma 3.3 we have that for large p∆ , the error exponent
If τw = sup{integer k : f (k, w) ≤ qw δ}, and the decision rule is
to return ATTACK if v(w) < τw , then the false alarm probability
of the wth iteration is bounded by qw δ. Therefore we have the
sequential algorithm in Table 3.2.
of DA is at most − log (K 2(p∆ +1)2 σ). By Taylor expansion,
− log ρ
− log (K 2(p∆ +1)2 σ)
2(p∆ + 2)2
+ log π2
2(p∆ + 1)2
SEQUENTIAL-DMV(S1 , S2 , p∆ , δ, q):
Therefore, for large p∆ , the false-alarm error exponent of DMV is
at least 3.38 times larger than that of DA.
Fig. 2 plots the simulated false alarm probabilities of DA and
DMV and their bounds5 . It confirms our claim that the false alarm
probability of DMV decays much faster than that of DA.
dmax = dmin = 0;
for w = 1, ½2, . . .
d(w − 1) + 1 if sw ∈ S1
d(w) =
d(w − 1) − 1 if sw ∈ S2
dmax = max(dmax , d(w));
dmin = min(dmin , d(w));
if dmax − dmin > p∆ return NORMAL;
τ = sup{integer k : f (k, w) ≤ qw δ};
if dmax − dmin < τ return ATTACK;
In both DA and DMV, the decision of ATTACK requires a fixed
sample size. We hope to make ATTACK decisions sequentially so
that we can possibly use fewer packets. To this end, we propose
to use an iterative algorithm and divide the total false alarm constraint among iterations. Specifically, we split the total false alarm
probability δ into q1 δ, q2 δ, q3 δ, . . ., where q = (q1 , q2 , . . .) sat∞
qw = 1. If, in each iteration w, the false
isfies qw ≥ 0 and
alarm probability is bounded by qw δ, then by union bound we see
that the total false alarm will be bounded by δ.
4 Note that in terms of the order of sample size with respect to p and
δ, DMV and DA are comparable.
5 Note that the sample size of DA has to be a multiple of the group size
2(p∆ + 1)2 .
Algorithm SDMV also uses the maximum variation as the
statistic, and therefore can be thought of as a sequential version
of DMV. The vector q is part of the algorithm design. Ideally, we
want to choose q to minimize the average sample size. If the attacker’s strategy is unknown, then we may wish to minimize the
largest sample size. Specifically, if the attacker does the best to
evade detection by keeping v(w) = p∆ for all w ≥ p∆ , then the
best q is
(p∆ + 1) cos p∆π+2
qn = 1 for n = inf w :
1 − cos p +2
and qw = 0 for all w 6= n. That is, the minimax q reduces SDMV
to the fixed-sample-size algorithm DMV.
In this paper, we consider detecting stepping-stone connections
with bounded delay and bounded peak rate. Our techniques can
rule out independent connection pairs with provable confidence
and hopefully leave a much smaller number of suspicious connections for further examination. Therefore, they are most useful in
the scenario when the total volume of the traffic to be analyzed
is large. In [9], we consider detecting stepping-stone connections
with bounded delay only or bounded memory. These more general
assumptions will make the detection techniques easier to apply in
and then applying (4) with p = q = 21 . Furthermore, by (5) it is
easy to see that lim − n1 log PF ( DMV) = − log ρ.
For the proof of Lemma 3.3, note that DA has false alarm in
a given group if and only if the maximum |d(i)| in that group is
within p∆ . Observing that
i∈{1,..., 2(p∆ +1)2 }
|d(i)| ≤ p∆ } =
Pr{N−(p∆ +1), (p∆ +1) > 2(p∆ + 1)2 },
we apply (6) with a = b = p∆ + 1 to lower bound the false alarm
of DA on one group. Then taking the product over all groups gives
the desired result.
5.1. Proof of Theorem 3.2 and Lemma 3.3
The proof is based on the theory of random walk. Let {Xn }n≥0
be a simple random walk, i.e.,
X0 = 0, Xn = Z1 + Z2 + . . . + Zn , (n > 0)
where {Zi }i=1, 2,... are i.i.d. random variables taking value in
{−1, 0, 1}. Let p = Pr{Zi = 1}, q = Pr{Zi = −1}. Define
the hitting time of −b or a (a, b ≥ 0) as
N−b, a = inf{n ≥ 1 : Xn = −b or a}.
In [10], it is proved that
Pr{N−b, a
where s1 =
= n} ≤
µ ¶a/2
µ ¶b/2
1 q
, (4)
n−1 +
2 p
1−p−q+2(pq) 2 cos ( a+b
lim −
, and
log Pr{N−b, a > n} = log s1 .
If a = b, then for large n,
Pr{N−b, a = n} ≥
n−1 .
For the proof of Theorem 3.2, note that for independent Poisson processes, it is known that d(w) is a simple random walk. Define extreme values Un = max d(i), Ln = min d(i). A
i=0,..., n
i=0,..., n
false alarm occurs in DMV if and only if Un −Ln < p∆ +1. Note
that the false alarm probability is the largest if d(w) is symmetric
(i.e., p = q = 12 ). We have
Pr{Un − Ln < p∆ + 1}
p∆ +1
{Un < a, Ln > −(p∆ + 2 − a)}}
p∆ +1
Pr{Un < a, Ln > −(p∆ + 2 − a)}(7)
where ρ = cos
(p∆ + 1)
p∆ +2
Here (7) is by union bound, and (8) is by
Pr{Un < a, Ln > −(p∆ +2−a)} = Pr{N−(p∆ +2−a), a > n},
[1] S. Staniford-Chen and L. Heberlein, “Holding intruders accountable on the internet,” in Proc. the 1995 IEEE Symposium on Security and Privacy, (Oakland, CA), pp. 39–49,
May 1995.
[2] D. Donoho, A. Flesia, U. Shankar, V. Paxson, J. Coit, and
S. Staniford, “Multiscale stepping-stone detection: Detecting
pairs of jittered interactive streams by exploiting maximum
tolerable delay,” in 5th International Symposium on Recent
Advances in Intrusion Detection, Lecture Notes in Computer
Science 2516, 2002.
[3] X. Wang, D. Reeves, S. Wu, and J. Yuill, “Sleepy watermark
tracing: An active network-based intrusion response framework,” in Proc. of the 16th International Information Security Conference, pp. 369–384, 2001.
[4] Y. Zhang and V. Paxson, “Detecting stepping stones,” in
Proc. the 9th USENIX Security Symposium, pp. 171–184,
August 2000.
[5] K. Yoda and H. Etoh, “Finding a connection chain for tracing
intruders,” in 6th European Symposium on Research in Computer Security, Lecture Notes in Computer Science 1895,
(Toulouse, France), October 2000.
[6] X. Wang, D. Reeves, and S. Wu, “Inter-packet delay-based
correlation for tracing encrypted connections through stepping stones,” in 7th European Symposium on Research
in Computer Security, Lecture Notes in Computer Science
2502, pp. 244–263, 2002.
[7] X. Wang and D. Reeves, “Robust correlation of encrypted attack traffic through stepping stones by manipulation of interpacket delays,” in Proc. of the 2003 ACM Conference on
Computer and Communications Security, pp. 20–29, 2003.
[8] A. Blum, D. Song, and S. Venkataraman, “Detection of
Interactive Stepping Stones: Algorithms and Confidence
Bounds,” in Conference of Recent Advance in Intrusion Detection (RAID), (Sophia Antipolis, French Riviera, France),
September 2004.
[9] T. He and L. Tong,
“Detecting Encrypted
Stepping-stone Connections,”
Tech. Rep. ACSPTR-01-06-02,
Cornell University,
January 2006.
[10] D. Cox and H. Miller, The Theory of Stochastic Processes.
New York: John Wiley & Sons Inc., 1965.