presentation

Design of the iSCSI
Protocol
Kalman Meth
Julian Satran
IBM Haifa Research Lab
Overview
What is iSCSI?
Why TCP?
Alternatives to TCP
Drawbacks of TCP
Data Transfer Model
Data Placement
Recovery
What is iSCSI?
SCSI is a protocol for I/O devices such as disk,
tape, CD ROM
iSCSI = Internet SCSI = SCSI over TCP/IP
send SCSI commands over an IP network
Related SCSI transport technologies
SCSI Fibre Channel Protocol (FCP)
Serial Storage Architecture (SSA)
Serial Bus Protocol (SBP)
SCSI over Infiniband?
Classic SAN vs. iSCSI
Classic SAN
Clients
Database
HTTP
File
iSCSI
Clients
Database
File
Data IP
Network
Servers
Web
Server
File
Server
Database
Server
HTTP
Database
Server
Data and Storage
IP Network
Storage
FibreChannel
Network
File
Server
Storage
Storage
Web
Server
Layered Packet Format
Ethernet header
(14)
IP header
(20)
TCP header
(20)
iSCSI header
(48)
data ...
Why TCP?
Reliable connection protocol
Works over a variety of physical media
Implemented on a wide variety of machines
Field proven and scalable
End-to-end connection model independent of the
underlying network
SCSI over TCP
Alternatives
SCSI over ...
Ethernet
IP
UDP
SCTP
Exploit features of TCP/IP
TCP features
automatic acknowledgment
retransmission of lost and corrupted packets
guaranteed in-order delivery
congestion control
IP-family features
IPSec (security)
SLP (discovery)
DHCP (configuration)
Drawbacks of using TCP
Limited by TCP window size
cannot achieve maximum throughput on a single TCP
connection
Lost TCP packet causes delay in delivery of
subsequent packets
if lose TCP packet, don't know where to find next
iSCSI header(s)
TCP checksum not sufficient for storage data
integrity
TCP usually entails multiple copying of data
Sessions
Collection of TCP connections between an Initiator
and a Target
overcome bandwidth limitations imposed by TCP
window size
utilize multiple CPUs in an SMP
Connections of a session may traverse different
physical interconnects
aggregate bandwidth from multiple interconnects
Must now coordinate between multiple TCP
connections
Sessions (cont.)
physical interconnect
TCP Connections
physical interconnect
Initiator
TCP Connections
Target
Data Transfer Model
Asymmetric
single control channel
multiple data channels
control channel used to transfer commands, status,
task management
Symmetric
all channels identical
send data and status over same channel as
corresponding command
Data Transfer Model
(continued)
Advantages of Asymmetric model
no backlog of data to block control channel
Task Management operation can always be timely
delivered
Advantages of Symmetric model
iSCSI adapter can be self-contained
no need to transfer command between adapters
simpler software implementations
RDMA descriptors
iSCSI Task Tags can be RDMA descriptors
used together with offset and length fields
Initiator Task Tag
provided in Command PDUs
copied to Data-In PDUs
Target Task Tag
provided on R2T
copied to Data-out PDUs
SCSI Command PDU
Byte /
0
|
1
|
2
|
3
|
/
|
|
|
|
|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|
+---------------+---------------+---------------+---------------+
0|.|I| 0x01
|F|R|W|0 0|ATTR | Reserved
|
+---------------+---------------+---------------+---------------+
4|TotalAHSLength | DataSegmentLength
|
+---------------+---------------+---------------+---------------+
8| Logical Unit Number (LUN)
|
+
+
12|
|
+---------------+---------------+---------------+---------------+
16| Initiator Task Tag
|
+---------------+---------------+---------------+---------------+
20| Expected Data Transfer Length
|
+---------------+---------------+---------------+---------------+
24| CmdSN
|
+---------------+---------------+---------------+---------------+
28| ExpStatSN
|
+---------------+---------------+---------------+---------------+
32/ SCSI Command Descriptor Block (CDB)
/
+/
/
+---------------+---------------+---------------+---------------+
48| AHS (if any), Header Digest (if any)
|
+---------------+---------------+---------------+---------------+
/ (DataSegment - Command Data + Data Digest (if any))(optional) /
+/
/
+---------------+---------------+---------------+---------------+
Data-in PDU
Byte /
0
|
1
|
2
|
3
|
/
|
|
|
|
|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|
+---------------+---------------+---------------+---------------+
0|.|.| 0x25
|F|A|0 0 0|O|U|S| Reserved
|Status or Rsvd |
+---------------+---------------+---------------+---------------+
4|TotalAHSLength | DataSegmentLength
|
+---------------+---------------+---------------+---------------+
8| LUN or Reserved
|
+
+
12|
|
+---------------+---------------+---------------+---------------+
16| Initiator Task Tag
|
+---------------+---------------+---------------+---------------+
20| Target Transfer Tag or 0xffffffff
|
+---------------+---------------+---------------+---------------+
24| StatSN or Reserved
|
+---------------+---------------+---------------+---------------+
28| ExpCmdSN
|
+---------------+---------------+---------------+---------------+
32| MaxCmdSN
|
+---------------+---------------+---------------+---------------+
36| DataSN
|
+---------------+---------------+---------------+---------------+
40| Buffer Offset
|
+---------------+---------------+---------------+---------------+
44| Residual Count
|
+---------------+---------------+---------------+---------------+
48
Out of Order Data
Placement
TCP delivers data in order
If packet is dropped (at 10 Gb) or have digest error,
have big data backlog
either store data on adapter (100s of MBs)
save data in temporary host memory and copy
drop data after missing packet
Use markers (or framing) to find next iSCSI PDU
Place data (from next PDU) in memory
don't yet inform application of data arrival
preserve TCP ordering semantics
Framing
Can we tell from TCP packet where next iSCSI/ULP
(Upper Level Protocol) packet begins?
Needed when a packet is dropped or corrupted to
jump to next iSCSI/ULP packet
IETF Working Group looking into problem
No agreed upon mechanism yet
iSCSI Recovery
Main reasons for iSCSI-level recovery
TCP connections occasionally break
maintain session across new connection
Digest errors
Critical for long distance and tape operations
do not want to restart a large data transfer due to
a transient TCP problem
Levels of Recovery
Session Recovery (required)
Connection Recovery
Recovery within connection
Recovery within command
Summary
iSCSI leverages existing features of TCP and the IP
family of protocols
iSCSI was designed with features to overcome TCP
limitations
sessions with multiple connections
CRC digests
possible out of order data placement
Multiple recovery options for different environments
Design of the iSCSI
Protocol
Kalman Meth
Julian Satran
IBM Haifa Research Lab
Separate or Common
Network
Clients
Database
HTTP
File
Clients
Database
File
Data IP
Network
Servers
Web
Server
File
Server
Database
Server
HTTP
Database
Server
Data and Storage
IP Network
Storage IP
Network
File
Server
Storage
Storage
Web
Server