tcp(7P)
NAME
tcp, TCP - Internet Transmission Control Protocol
SYNOPSIS
#include <sys/socket.h>
#include <netinet/in.h>
s = socket(AF_INET, SOCK_STREAM, 0);
s = socket(AF_INET6, SOCK_STREAM, 0);
t = t_open("/dev/tcp", O_RDWR);
t = t_open("/dev/tcp6", O_RDWR);
DESCRIPTION
TCP is the virtual circuit protocol of the Internet protocol
family. It provides reliable, flow-controlled, in-order,
two-way transmission of data. It is a byte-stream protocol
layered above the Internet Protocol (IP), or the Internet
Protocol Version 6 (IPv6), the Internet protocol family's
internetwork datagram delivery protocol.
Programs can access TCP using the socket interface as a
SOCK_STREAM socket type, or using the Transport Level
Interface (TLI), where it supports the connection-oriented
service type with orderly release (T_COTS_ORD).
TCP uses IP's host-level addressing and adds its own
per-host collection of "port addresses." The endpoints of a
TCP connection are identified by the combination of an IP or
IPv6 address and a TCP port number. Although other
protocols, such as the User Datagram Protocol (UDP), may use
the same host and port address format, the port space of
these protocols is distinct. See inet(7P) and inet6(7P) for
details on the common aspects of addressing in the Internet
protocol family.
Sockets utilizing TCP are either "active" or "passive."
Active sockets initiate connections to passive sockets.
By default, TCP sockets are active. A passive socket must
have its local IP or IPv6 address and TCP port number bound
with the bind(3SOCKET) system call after the socket is
created, and is then made passive by calling the
listen(3SOCKET) system call. The backlog argument to
listen() limits the number of pending connections queued on
the passive socket. After this, connections to the passive
socket can be received with the accept(3SOCKET) system call.
Active sockets use the connect(3SOCKET) call to initiate
connections. An active socket may be bound with bind()
first; if it is not, connect() binds it implicitly to a
local address and an ephemeral port.
By using the special value INADDR_ANY with IP, or the
unspecified address (all zeroes) with IPv6, the local IP
address can be left unspecified in the bind() call by either
active or passive TCP sockets. This feature is usually used
if the local address is either unknown or irrelevant. If
left unspecified, the local IP or IPv6 address will be bound
at connection time to the address of the network interface
used to service the connection.
Once a connection has been established, data can be
exchanged using the read(2) and write(2) system calls.
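The life cycle described above can be sketched as follows, using IPv4 over the loopback interface. This is a minimal illustration, not production code: the function name tcp_lifecycle_demo is hypothetical, and error handling is reduced to returning -1. The passive socket is bound to INADDR_ANY on port 0 (letting the system choose a port), and the active socket is deliberately left unbound so that connect() binds it implicitly.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical demo of the active/passive life cycle over IPv4 loopback.
 * Returns 0 on success, -1 on failure; on success *local_out holds the
 * local address that was bound to the accepted connection.
 */
int tcp_lifecycle_demo(struct in_addr *local_out)
{
    int lfd = -1, cfd = -1, afd = -1, rc = -1;
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    char buf[4];
    size_t got = 0;

    /* Passive socket: local address unspecified (INADDR_ANY), port 0
       so that the system chooses a free port. */
    if ((lfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
        goto out;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(0);
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 5) < 0 ||                    /* 5 = backlog */
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto out;

    /* Active socket: not bound first; connect() binds it implicitly. */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if ((cfd = socket(AF_INET, SOCK_STREAM, 0)) < 0 ||
        connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;

    /* Take the completed connection off the passive socket's queue. */
    if ((afd = accept(lfd, NULL, NULL)) < 0)
        goto out;

    /* The unspecified local address is now bound to the address of the
       interface that serviced the connection (here, loopback). */
    len = sizeof(addr);
    if (getsockname(afd, (struct sockaddr *)&addr, &len) < 0)
        goto out;
    *local_out = addr.sin_addr;

    /* Exchange data; TCP is a byte stream, so read until all 4 bytes arrive. */
    if (write(cfd, "ping", 4) != 4)
        goto out;
    while (got < sizeof(buf)) {
        ssize_t n = read(afd, buf + got, sizeof(buf) - got);
        if (n <= 0)
            goto out;
        got += (size_t)n;
    }
    rc = memcmp(buf, "ping", 4) == 0 ? 0 : -1;
out:
    if (afd >= 0) close(afd);
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    return rc;
}
```

On success the accepted socket reports 127.0.0.1 as its local address, even though the listener was bound to INADDR_ANY.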
Under most circumstances, TCP sends data when it is
presented. When outstanding data has not yet been
acknowledged, TCP gathers small amounts of output to be sent
in a single packet once an acknowledgement has been received.
For a small number of clients, such as window systems that
send a stream of mouse events which receive no replies, this
packetization may cause significant delays. To circumvent
this problem, TCP provides a boolean socket option,
TCP_NODELAY, which disables this gathering. TCP_NODELAY is
defined in <netinet/tcp.h>, is set with setsockopt(3SOCKET),
and is tested with getsockopt(3SOCKET). The option level for
the setsockopt() call is the protocol number for TCP,
available from getprotobyname(3SOCKET) or as the constant
IPPROTO_TCP from <netinet/in.h>.
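A minimal sketch of setting and then testing TCP_NODELAY on an existing TCP socket. The function name set_and_test_nodelay is hypothetical. It obtains the option level from getprotobyname("tcp") as described above, and falls back to IPPROTO_TCP in case the protocols database has no "tcp" entry.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: turn on TCP_NODELAY for fd, then read it back.
 * Returns the value reported by getsockopt() (nonzero means set),
 * or -1 on failure.
 */
int set_and_test_nodelay(int fd)
{
    struct protoent *pe = getprotobyname("tcp");
    int level = pe != NULL ? pe->p_proto : IPPROTO_TCP;  /* option level */
    int on = 1, value = 0;
    socklen_t len = sizeof(value);

    if (setsockopt(fd, level, TCP_NODELAY, &on, sizeof(on)) < 0)
        return -1;
    if (getsockopt(fd, level, TCP_NODELAY, &value, &len) < 0)
        return -1;
    return value;
}
```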
Another option, SO_RCVBUF, set at the socket level
(SOL_SOCKET), can be used to control the window that TCP
advertises to the peer. IP-level options may also be used
with TCP. See ip(7P) and ip6(7P).
TCP provides an urgent data mechanism, which may be invoked
using the out-of-band provisions of send(3SOCKET). The
caller may mark one byte as "urgent" with the MSG_OOB flag
to send(3SOCKET). This sets an "urgent pointer" pointing to
this byte in the TCP stream. The receiver on the other side
of the stream is notified of the urgent data by a SIGURG
signal, provided the receiving process or process group has
been made the owner of the socket (for example, with the
F_SETOWN command of fcntl(2)). The SIOCATMARK ioctl(2)
request returns a value indicating whether the stream is at
the urgent mark. Because the system never returns data
across the urgent mark in a single read(2) call, it is
possible to advance to the urgent data in a simple loop that
reads data, testing the socket with the SIOCATMARK ioctl()
request, until it reaches the mark.
Incoming connection requests that include an IP source route
option are noted, and the reverse source route is used in
responding.
A checksum over each segment's header and data (plus a
pseudo-header containing the IP source and destination
addresses) lets TCP detect damaged data. Using a
window-based flow control mechanism that makes use of
positive acknowledgements, sequence numbers, and a
retransmission strategy, TCP can usually recover when
datagrams are damaged, delayed, duplicated, or delivered out
of order by the underlying communication medium.
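The checksum TCP uses is the 16-bit ones' complement Internet checksum (RFC 1071). The sketch below computes it over a plain byte buffer. It is an illustration only: the name inet_checksum is hypothetical, and building the pseudo-header and TCP header is left to the caller, who would place them in the buffer.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: 16-bit ones' complement Internet checksum.
 * The 32-bit accumulator cannot overflow for buffers up to about
 * 128K bytes, which covers any TCP segment.
 */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    /* Add the data as big-endian 16-bit words. */
    while (len > 1) {
        sum += (uint32_t)data[0] << 8 | data[1];
        data += 2;
        len -= 2;
    }
    /* An odd final byte is padded on the right with a zero byte. */
    if (len == 1)
        sum += (uint32_t)data[0] << 8;

    /* Fold the carries back into the low 16 bits (end-around carry). */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}
```

For the bytes 00 01 f2 03 f4 f5 f6 f7 (the example worked in RFC 1071) the function yields 0x220d; appending those two checksum bytes to the buffer and recomputing yields 0, which is how a receiver verifies a segment.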
If the local TCP receives no acknowledgements from its peer
for a period of time, as would be the case if the remote
machine crashed, the connection is closed and an error is
returned to the user. If the remote machine reboots or
otherwise loses state information about a TCP connection,
the connection is aborted and an error is returned to the
user.
SunOS supports TCP Extensions for High Performance (RFC
1323), which include the window scale and time stamp
options and Protection Against Wrap Around Sequence Numbers
(PAWS). SunOS also supports Selective Acknowledgment (SACK)
capabilities (RFC 2018) and the Explicit Congestion
Notification (ECN) mechanism (RFC 3168).
Turn on the window scale option in one of the following
ways:
o An application can use setsockopt() to set the SO_SNDBUF
or SO_RCVBUF size larger than 64K. This must be done
before the program calls listen() or connect(), because
the window scale option is negotiated when the connection
is established. Once the connection has been made, it is
too late to increase the send or receive window beyond
the 64K limit that applies without window scaling.
o For all applications, use ndd(1M) to modify the
configuration parameter tcp_wscale_always. If
tcp_wscale_always is set to 1, the window scale option
will always be set when connecting to a remote system. If
tcp_wscale_always is 0, the window scale option will be
set only if the user has requested a send or receive
window larger than 64K. The default value of
tcp_wscale_always is 0.
o Regardless of the value of tcp_wscale_always, the window
scale option will always be included in a connect
acknowledgement (the SYN|ACK segment) if the connecting
system has used the option.
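The first method above can be sketched as a listening socket whose receive buffer is enlarged before listen(). The function name listener_with_big_rcvbuf is hypothetical. It assumes, as is the case on the systems we know of, that accepted connections inherit the listening socket's buffer sizes; treat that as an assumption for your platform. The size reported back by getsockopt() may differ from the request (Linux, for example, doubles it and caps it at a system limit).

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a loopback listening socket whose receive
 * buffer is requested to be `requested` bytes BEFORE listen(), so the
 * window scale option can be negotiated on accepted connections.
 * Returns the listening descriptor (or -1) and stores the size that
 * getsockopt() reports in *granted.
 */
int listener_with_big_rcvbuf(int requested, int *granted)
{
    struct sockaddr_in a;
    socklen_t len = sizeof(*granted);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    /* Too late once the connection exists: do this before listen(). */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &requested,
                   sizeof(requested)) < 0 ||
        getsockopt(fd, SOL_SOCKET, SO_RCVBUF, granted, &len) < 0)
        goto fail;

    memset(&a, 0, sizeof(a));
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* port 0: any free port */
    if (bind(fd, (struct sockaddr *)&a, sizeof(a)) < 0 || listen(fd, 5) < 0)
        goto fail;
    return fd;
fail:
    close(fd);
    return -1;
}
```

An active socket would follow the same order: setsockopt() first, then connect().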
Turn on SACK capabilities in the following way:
o Use ndd to modify the configuration parameter
tcp_sack_permitted. If tcp_sack_permitted is set to 0,
TCP will not accept SACK or send out SACK information.
If tcp_sack_permitted is set to 1, TCP will not include
the SACK-permitted option in the SYN segment when it
initiates a connection, but will include it in the SYN|ACK
segment when an incoming connection request carries the
SACK-permitted option. In other words, with this setting
TCP uses SACK only on incoming connections whose SYN
offers it. If tcp_sack_permitted is set to 2, TCP will
both initiate and accept connections with SACK
information. The default for tcp_sack_permitted is 2
(active enabled).
Turn on the TCP ECN mechanism in the following way:
o Use ndd to modify the configuration parameter
tcp_ecn_permitted. If tcp_ecn_permitted is set to 0,
TCP will not negotiate ECN with a peer, even if the peer
supports it. If tcp_ecn_permitted is set to 1, TCP will
not tell a peer that it supports the ECN mechanism when
initiating a connection. However, when accepting a new
incoming connection request, it will tell the peer that it
supports ECN if the peer indicates ECN support in its SYN
segment. If tcp_ecn_permitted is set to 2, then in
addition to negotiating ECN with a peer when accepting
connections, TCP will indicate in the outgoing SYN segment
that it supports ECN when it makes active outgoing
connections. The default for tcp_ecn_permitted is 1.
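tcp_sack_permitted and tcp_ecn_permitted, as described above, share the same three-level scheme. The sketch below is a hypothetical model of that scheme written from the text, not the kernel's implementation; the function name feature_negotiated is illustrative only.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the tcp_sack_permitted / tcp_ecn_permitted levels:
 *   0  never use the feature
 *   1  use it only when the peer offers it in its SYN (passive open)
 *   2  also offer it in our own SYN (active open)
 * peer_supports means: for an active open, the peer answers our offer
 * in its SYN|ACK; for a passive open, the peer's SYN carries the offer.
 */
bool feature_negotiated(int permitted, bool active_open, bool peer_supports)
{
    if (permitted <= 0 || !peer_supports)
        return false;
    /* Level 1 only answers offers; level 2 also makes them. */
    return active_open ? permitted >= 2 : true;
}
```

With the defaults, SACK (level 2) is negotiated in both directions whenever the peer supports it, while ECN (level 1) is used only on connections the peer initiates with an ECN offer.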
Turn on the time stamp option in the following way:
o Use ndd to modify the configuration parameter
tcp_tstamp_always. If tcp_tstamp_always is 1, the time
stamp option will always be set when connecting to a
remote system. If tcp_tstamp_always is 0, this parameter
does not by itself cause the time stamp option to be set
when connecting to a remote system; tcp_tstamp_if_wscale,
described below, may still cause it to be set. The default
for tcp_tstamp_always is 0.
o Regardless of the value of tcp_tstamp_always, the time
stamp option will always be included in a connect
acknowledgement (and all succeeding packets) if the
connecting system has used the time stamp option.
Use the following procedure to turn on the time stamp option
only when the window scale option is in effect:
o Use ndd to modify the configuration parameter
tcp_tstamp_if_wscale. Setting tcp_tstamp_if_wscale to
1 will cause the time stamp option to be set when
connecting to a remote system if the window scale option
has been set. If tcp_tstamp_if_wscale is 0, the window
scale option does not cause the time stamp option to be
set when connecting to a remote system. The default for
tcp_tstamp_if_wscale is 1.
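The way these parameters combine when TCP builds an outgoing SYN can be summarized in a small model. This is a hypothetical sketch derived from the descriptions above, not kernel code: struct tcp_opt_params, syn_has_wscale, and syn_has_tstamp are illustrative names, and the rule for answering a peer's options in the SYN|ACK is not modeled.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the ndd(1M) parameters described above. */
struct tcp_opt_params {
    int tcp_wscale_always;     /* default 0 */
    int tcp_tstamp_always;     /* default 0 */
    int tcp_tstamp_if_wscale;  /* default 1 */
};

/* Window scale is offered in our SYN if forced by tcp_wscale_always,
   or if the application asked for a send or receive buffer over 64K. */
bool syn_has_wscale(const struct tcp_opt_params *p, int sndbuf, int rcvbuf)
{
    const int limit = 64 * 1024;
    return p->tcp_wscale_always != 0 || sndbuf > limit || rcvbuf > limit;
}

/* Time stamps are offered in our SYN if forced by tcp_tstamp_always,
   or if tcp_tstamp_if_wscale is set and window scale is being offered. */
bool syn_has_tstamp(const struct tcp_opt_params *p, int sndbuf, int rcvbuf)
{
    return p->tcp_tstamp_always != 0 ||
           (p->tcp_tstamp_if_wscale != 0 && syn_has_wscale(p, sndbuf, rcvbuf));
}
```

Under the defaults {0, 0, 1}, an application with small buffers sends neither option, while one that requests a 256K receive buffer sends both.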
Protection Against Wrap Around Sequence Numbers (PAWS) is
always used when the time stamp option is set.
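The core of PAWS is a comparison between the time stamp value carried by an incoming segment (TSval) and the most recent time stamp accepted on the connection (TS.Recent): a segment whose TSval is older is treated as an old duplicate and discarded. A minimal sketch of that comparison follows; the name paws_reject is hypothetical, and RFC 1323's special handling of connections idle for more than 24 days is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if a segment with time stamp seg_tsval should
 * be rejected because it is older than ts_recent.  Time stamps are 32-bit
 * and wrap, so "older" means the modular difference seg_tsval - ts_recent
 * falls in the upper half of the 32-bit space (i.e. is negative as a
 * signed 32-bit value).  Equal values are accepted.
 */
bool paws_reject(uint32_t seg_tsval, uint32_t ts_recent)
{
    return (uint32_t)(seg_tsval - ts_recent) >= 0x80000000u;
}
```

The modular comparison keeps working across wraparound: a TSval of 5 is considered newer than a TS.Recent of 0xFFFFFFF0.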
SunOS also supports multiple methods of generating initial
sequence numbers. One of these methods is the improved
technique suggested in RFC 1948. We strongly recommend that
you set the sequence number generation parameters as close
to boot time as possible. This prevents sequence number
problems on new connections that reuse the connection
identifier (local and remote addresses and ports) of an
earlier connection whose sequence numbers were generated by
a different method.
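RFC 1948 proposes computing the initial sequence number as ISN = M + F(localhost, localport, remotehost, remoteport), where M is a timer that ticks every 4 microseconds and F is a pseudorandom function of the connection identifier and secret data that outsiders cannot compute (the RFC suggests MD5). The sketch below illustrates that structure only and is not how SunOS implements it: struct conn_id, keyed_hash, and rfc1948_isn are hypothetical, and the stand-in hash (FNV-1a) is NOT cryptographically secure.

```c
#include <stdint.h>

/* Hypothetical IPv4 connection identifier, in host byte order. */
struct conn_id {
    uint32_t local_addr, remote_addr;
    uint16_t local_port, remote_port;
};

/* Stand-in for F: FNV-1a over the secret and the connection identifier.
   NOT cryptographically secure; RFC 1948 suggests a hash such as MD5. */
static uint32_t keyed_hash(const struct conn_id *id, uint64_t secret)
{
    uint32_t h = 2166136261u;                 /* FNV offset basis */
    uint64_t words[3] = {
        secret,
        (uint64_t)id->local_addr << 32 | id->remote_addr,
        (uint64_t)id->local_port << 16 | id->remote_port,
    };
    for (int w = 0; w < 3; w++)
        for (int b = 0; b < 8; b++) {
            h ^= (uint8_t)(words[w] >> (8 * b));
            h *= 16777619u;                   /* FNV prime */
        }
    return h;
}

/* ISN = M + F(connection-id): M advances once every 4 microseconds. */
uint32_t rfc1948_isn(uint64_t usec_since_boot, const struct conn_id *id,
                     uint64_t secret)
{
    uint32_t m = (uint32_t)(usec_since_boot / 4);
    return m + keyed_hash(id, secret);
}
```

Because F depends only on the connection identifier and the secret, ISNs for a given identifier still advance with the clock (400 microseconds apart yields a difference of exactly 100), while ISNs for different identifiers are offset unpredictably.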
The /etc/init.d/inetinit script contains commands that
configure initial sequence number generation. The script
reads the value contained in the configuration file
/etc/default/inetinit to determine which method to use.
The /etc/default/inetinit file is an unstable interface and
may change in future releases.
TCP may be configured to report some information on
connections that terminate by means of an RST packet. By
default, no logging is done. If the ndd(1M) parameter
tcp_trace is set to 1, trace data is collected for all new
connections established after that time.
The trace data consists of the TCP headers and IP source and
destination addresses of the last few packets sent in each
direction before the RST occurred. Those packets are logged
in a series of strlog(9F) calls. This trace facility has
very low overhead, and so is better suited than utilities
such as snoop(1M) to non-intrusive debugging of connections
that terminate by means of an RST.
SEE ALSO
ndd(1M), ioctl(2), read(2), write(2), accept(3SOCKET),
bind(3SOCKET), connect(3SOCKET), getprotobyname(3SOCKET),
getsockopt(3SOCKET), listen(3SOCKET), send(3SOCKET),
inet(7P), inet6(7P), ip(7P), ip6(7P)
Ramakrishnan, K., Floyd, S., and Black, D., RFC 3168, The
Addition of Explicit Congestion Notification (ECN) to IP,
September 2001.
Mathis, M. and Mahdavi, J., Pittsburgh Supercomputing
Center; Floyd, S., Lawrence Berkeley National Laboratory;
Romanow, A., Sun Microsystems, Inc., RFC 2018, TCP Selective
Acknowledgment Options, October 1996.
Bellovin, S., RFC 1948, Defending Against Sequence Number
Attacks, May 1996.
Jacobson, V., Braden, R., and Borman, D., RFC 1323, TCP
Extensions for High Performance, May 1992.
Postel, J., RFC 793, Transmission Control Protocol - DARPA
Internet Program Protocol Specification, Network Information
Center, SRI International, Menlo Park, CA, September 1981.
DIAGNOSTICS
A socket operation may fail if:
EISCONN
A connect() operation was attempted on a socket on
which a connect() operation had already been
performed.
ETIMEDOUT
A connection was dropped due to excessive
retransmissions.
ECONNRESET
The remote peer forced the connection to be closed
(usually because the remote machine has lost state
information about the connection due to a crash).
ECONNREFUSED
The remote peer actively refused connection
establishment (usually because no process is listening
to the port).
EADDRINUSE
A bind() operation was attempted on a socket with a
network address/port pair that has already been bound
to another socket.
EADDRNOTAVAIL
A bind() operation was attempted on a socket with a
network address for which no network interface exists.
EACCES
A bind() operation was attempted with a "reserved"
port number and the effective user ID of the process
was not the privileged user.
ENOBUFS
The system ran out of memory for internal data
structures.
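Two of the errors listed above are easy to provoke on the loopback interface, which can help when checking an application's error handling. The sketch below is illustrative only; loopback_addr and provoke_errors are hypothetical names. It binds a second socket to an address/port pair already in use (EADDRINUSE), then connects to that port after its listener has been closed (ECONNREFUSED). The connecting socket is first bound to a different ephemeral port while the listener still holds the original one, so the connect() cannot accidentally connect the socket to itself.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill in an IPv4 loopback address; port is in network byte order. */
static void loopback_addr(struct sockaddr_in *a, in_port_t port)
{
    memset(a, 0, sizeof(*a));
    a->sin_family = AF_INET;
    a->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    a->sin_port = port;
}

/* Store the errno of a duplicate bind() in *bind_err and of a connect()
   to a port with no listener in *connect_err.  Returns 0 if both calls
   failed as expected, -1 otherwise. */
int provoke_errors(int *bind_err, int *connect_err)
{
    struct sockaddr_in a, any;
    socklen_t len = sizeof(a);
    int s1 = socket(AF_INET, SOCK_STREAM, 0);
    int s2 = socket(AF_INET, SOCK_STREAM, 0);
    int rc = -1;

    *bind_err = *connect_err = 0;
    if (s1 < 0 || s2 < 0)
        goto out;

    /* s1 takes an ephemeral port and listens on it. */
    loopback_addr(&a, 0);
    if (bind(s1, (struct sockaddr *)&a, sizeof(a)) < 0 || listen(s1, 1) < 0 ||
        getsockname(s1, (struct sockaddr *)&a, &len) < 0)
        goto out;

    /* Binding s2 to the same address/port pair fails: EADDRINUSE. */
    if (bind(s2, (struct sockaddr *)&a, sizeof(a)) == 0)
        goto out;
    *bind_err = errno;

    /* Give s2 a different port while s1 still holds the original one. */
    loopback_addr(&any, 0);
    if (bind(s2, (struct sockaddr *)&any, sizeof(any)) < 0)
        goto out;

    /* With the listener gone, the peer refuses the connection. */
    close(s1);
    s1 = -1;
    if (connect(s2, (struct sockaddr *)&a, sizeof(a)) == 0)
        goto out;
    *connect_err = errno;
    rc = 0;
out:
    if (s1 >= 0) close(s1);
    if (s2 >= 0) close(s2);
    return rc;
}
```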