CIS 307: Unix IV: Sockets

Online references:
* Primer on Sockets by Jim Frost (Software Tool & Die)
* Introductory tutorial on IPC in 4.4BSD-Unix (by S.Sechrest UC-Berkeley) (Postscript)
* Advanced tutorial on IPC in 4.4BSD-Unix (by S.Leffler, R.Fabry, W.Joy, P.Lapsley UC-Berkeley, S.Miller, C.Torek U-Maryland) (Postscript)

[Introduction], [Client-Server Architecture], [Summary on Socket Functions], [Socket Functions], [Examples]

Introduction

We examine the functions for communication through sockets. [Though in practice you may be able to skip this software level and use middleware such as DCE RPC (Distributed Computing Environment Remote Procedure Call), an understanding of the socket API provides a grounding in some of the issues and problems of distributed computation.]

A socket is an endpoint used by a process for bidirectional communication with a socket associated with another process. Sockets, introduced in Berkeley Unix, are a basic mechanism for IPC within a computer system, or between different computer systems connected by local or wide area networks. In the following we will not be concerned with networks and data communications. We will instead take a strictly operational viewpoint: how to program with sockets to create communication channels.

The communication channel created with sockets can be like a telephone line (connection oriented), with the sockets as telephones over which a conversation can take place. Or the channel can be like the postal service (datagram oriented), with the sockets as mailboxes. Connection oriented communication is reliable, i.e. the system takes care of errors. Datagram oriented communication is unreliable, i.e. messages can be lost, or they may not be delivered in the order in which they were sent.

A socket appears to the user to be like a file descriptor on which we can read, write, and ioctl.
In the connection oriented mode, the socket behaves like a sequence of characters that we can read with as many read operations as we like. In the connectionless mode we have to get a whole message in a single read operation. If we don't, what is left over of the message is lost.

Though sockets can be used within a single computer system for interprocess communication (the Unix domain), we will only consider their use for communication across computer systems (the Internet domain).

It is possible to send messages on a socket that take precedence over other undelivered messages. These priority messages are called out-of-band messages. We will not deal with them in this note. We will also not deal with composite messages (sendmsg and recvmsg).

A problem in communication is how to identify interlocutors. In the case of phones we have telephone numbers, for mail we have addresses. For communicating between sockets we [usually, since within a single computer we could use file names] identify an interlocutor with a pair: IP address and port. [In reality there is a third component, the protocol, but that will not be relevant to us.] This pair represents the address or name of the interlocutor.

IP addresses (things like 155.247.207.190) are 32 bit unsigned integers (155, 247, 207, 190 are the bytes). An IP address consists of two parts, one identifying a network, the other identifying a computer within that network, and it can be written in a number of formats. IP addresses are more easily remembered as host names (things like snowhite.cis.temple.edu). [You can find out about IP to host conversions with the nslookup command and by looking in the /etc/hosts file.] [IP addresses, to be exact, identify the Network Interface Card between a computer and a network, and a computer might have a number of such cards connecting to a number of networks. But for brevity we will not worry about this distinction.]

A special IP address, 127.0.0.1, is used to refer to the local host: it is the loopback address, localhost.
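The relation between the dotted form of an IP address and its 32 bit integer form can be made concrete with a small sketch. The helper names pack_ip and ip_to_text are hypothetical, introduced only for this illustration; inet_ntoa, htonl and INADDR_LOOPBACK are the standard library names.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical helper: pack the four bytes a.b.c.d into a 32 bit
   integer in host byte order (a is the most significant byte). */
static uint32_t pack_ip(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}

/* Hypothetical helper: write the dotted form of a host byte order
   address into buf and return buf. */
static const char *ip_to_text(uint32_t host_order_ip, char *buf, size_t buflen)
{
    struct in_addr addr;

    addr.s_addr = htonl(host_order_ip);   /* in_addr holds network byte order */
    /* inet_ntoa returns a pointer to a static buffer, so copy it out. */
    snprintf(buf, buflen, "%s", inet_ntoa(addr));
    return buf;
}
```

For instance, ip_to_text(pack_ip(155, 247, 207, 190), buf, sizeof buf) gives back the string "155.247.207.190", and pack_ip(127, 0, 0, 1) equals the constant INADDR_LOOPBACK.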
The IP address 0.0.0.0 is called INADDR_ANY and stands for all the IP addresses of this machine (it is also used during bootstrap of the system). Another special IP, 255.255.255.255, is used for broadcast to all hosts on the local network of the current machine. And the host id consisting of all 1s is used to broadcast to all the computers of a LAN.

Ports are 16 bit unsigned integers. The first 1024 port numbers are reserved for things like http, 80. You can look in the files /etc/services and /etc/inetd.conf to see standard uses (ftp, telnet, finger, ..) of these ports. In traditional BSD systems the ports from 1024 to 5000 are reserved for allocation by the kernel on demand. Over 5000 the ports can be chosen directly by the users. For instance I use 8080 for my httpd server. The port 0 is used as a wild card, to request the kernel to find a port for us when we do not care which.

Client-Server Architecture

A standard way of using sockets and communication channels is between clients and servers. A server is a process that is able to carry out some function, called a service, like transferring files, translating host names to IP addresses, or inverting a matrix. A client is a process that requests a server to do a service (say, "translate snowhite.cis.temple.edu"). Typically the server will be at a known IP address and will respond to requests sent to a known port. In some cases that port is not universally known, so the server will advertise the port it is currently using (by printing out its value, or sending email, or having inetd, a special process, know about it, etc.). In some cases the IP address of the server is not known, and one may have a "standard" server that answers requests of the form "where can I find service Moo" with an appropriate IP address. The client requests the kernel to obtain a free port to be used for communication with the server.

The server does not have to know in advance the identity of its clients.
It is ready to accept a message from any interlocutor. When it receives a message from a client, the message itself carries the IP address and the port of the client, so that the server knows whom to answer.

An address, host+port, can be used for multiplexing more than one communication channel. So one server can communicate simultaneously with more than one client. Each communication channel on the server will have its own socket bound to the same address.

Summary On Socket Functions

The following is a summary of the basic socket functions as they are used for datagram and connection oriented service by clients and servers. In the following section we will go over these functions in greater detail.

Datagram Service

Client

    socket => (bind => [connect =>] {write => read}*) | {sendto => recvfrom}* => close | shutdown

In words: create a socket, then bind it to a local port, establish the address of the server, write and read from it, or just sendto and recvfrom it; then terminate. If the client is not interested in a response, it does not need to use bind. Connect is worth using when we send many datagrams to the same server.

Server

    socket => bind => {read | recvfrom => write | sendto}* => close | shutdown

In words: create a socket, bind it to a local port, accept and reply to messages from clients, terminate. If the server does not need to reply to the client, it can just use read instead of recvfrom.

Connection Oriented Service

Client

    socket => bind => connect => {write | sendto => read | recvfrom }* => close | shutdown

In words: create a socket, bind it to a local port, establish the address of the server, communicate with it, terminate. Bind is optional on the client: if it is omitted, the kernel chooses a local port when connect is called.
Server

    socket => bind => listen => {accept => {read | recvfrom => write | sendto}* }* => close | shutdown

In words: create a socket, bind it to a local port, set up service with an indication of the maximum number of connection requests that may wait to be accepted, accept requests from connection oriented clients, receive messages and reply to them, terminate.

Socket Functions

Creating a socket

    #include <sys/types.h>
    #include <sys/socket.h>

    int socket(int domain, int type, int protocol)

domain is either AF_UNIX, AF_INET, or AF_OSI, or .. AF_UNIX is the Unix domain; it is used for communication within a single computer system, and there the address of a socket is the name of a file. AF_INET is for communication on the internet to IP addresses. We will only use AF_INET.

type is either SOCK_STREAM (TCP, connection oriented, reliable), or SOCK_DGRAM (UDP, datagram, unreliable), or SOCK_RAW (IP level).

protocol specifies the protocol used. It is usually 0 to say we want to use the default protocol for the chosen domain and type. We always use 0.

It returns, if successful, a socket descriptor which is an int. It is -1 in case of failure.

Here is a typical call to socket:

    if ((sd = socket(AF_INET, SOCK_DGRAM, 0)) < 0) {
        perror("socket");
        exit(1);
    }

Socket Addresses

Here are the structures (OSF Unix) used to store socket addresses as used in the domain AF_INET:

    struct in_addr {
        u_long s_addr;
    };

    struct sockaddr_in {
        u_short sin_family;       /* protocol identifier; usually AF_INET */
        u_short sin_port;         /* port number. 0 means let kernel choose */
        struct in_addr sin_addr;  /* the IP address. INADDR_ANY refers to */
                                  /* the IP addresses of the current host. */
        char sin_zero[8];         /* unused, always zero */
    };

In order to use struct sockaddr_in you need to include in your program

    #include <netinet/in.h>

The following structure sockaddr is more generic than but compatible with sockaddr_in (both are 16 bytes long and start with the same field). In the Unix domain we have a different address, sockaddr_un, which is also compatible with sockaddr.
In order to use sockaddr_un you need to include in your program

    #include <sys/un.h>

The generic structure is:

    struct sockaddr {
        u_short sa_family;
        char sa_data[14];
    };

Binding to a local port

    #include <sys/types.h>
    #include <sys/socket.h>

    int bind(int sd, struct sockaddr *addr, int addrlen)

sd: File descriptor of local socket, as created by the socket function.
addr: Pointer to protocol address of this socket. The IP address usually is INADDR_ANY. The port is usually 0 to request the kernel to provide a port.
addrlen: Length in bytes of addr.

It returns an integer, the return code (0=success, -1=failure).

Bind is used to specify for a socket the protocol port number where it will wait for messages. Here is a typical call to bind:

    struct sockaddr_in name;
    .....
    bzero((char *) &name, sizeof(name));       /* zeroes out sizeof(name) characters */
    name.sin_family = AF_INET;                 /* use internet domain */
    name.sin_port = htons(0);                  /* ask kernel to provide a port */
    name.sin_addr.s_addr = htonl(INADDR_ANY);  /* use all IPs of host */
    if (bind(sd, (struct sockaddr *)&name, sizeof(name)) < 0) {
        perror("bind");
        exit(1);
    }

A call to bind is optional on the client side, required on the server side.

We need to understand the reasons for the calls to htons and htonl. Integers may be represented differently on different machines (big-endian machines and little-endian machines). So ports and addresses must be given in the one representation that all machines agree on, the network byte order (big-endian). We use functions to convert from host to network form before transmission (htons for short integers, and htonl for long integers), and from network to host form after reception (ntohs for short integers, and ntohl for long integers).

The function bzero zeroes out a buffer of specified length. It is one of a group of functions for dealing with arrays of bytes. bcopy copies a specified number of bytes from a source to a target buffer. bcmp compares a specified number of bytes of two byte buffers.

Connecting to a Server

A remote process, usually a server, is identified by an IP address and a port number.
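Such an (IP address, port) pair is stored in a struct sockaddr_in with both fields in network byte order. The following is a minimal sketch of filling one in; the helper names make_inet_address and port_byte are hypothetical, and it uses memset and inet_pton where these notes use bzero and gethostbyname, which is an assumption of convenience rather than the method of the notes.

```c
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: fill *addr with the Internet address
   (dotted_ip, port). Returns 0 on success, -1 if dotted_ip is not a
   valid dotted IPv4 address. */
static int make_inet_address(struct sockaddr_in *addr,
                             const char *dotted_ip, unsigned short port)
{
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);      /* host to network byte order */
    /* inet_pton stores the address already in network byte order. */
    if (inet_pton(AF_INET, dotted_ip, &addr->sin_addr) != 1)
        return -1;
    return 0;
}

/* Hypothetical helper: byte i (0 or 1) of the port exactly as it lies
   in memory, which is the order in which it travels on the network. */
static unsigned char port_byte(const struct sockaddr_in *addr, int i)
{
    unsigned char bytes[2];

    memcpy(bytes, &addr->sin_port, sizeof(bytes));
    return bytes[i];
}
```

With port 8001 (hexadecimal 1F41), port_byte returns 0x1F for byte 0 and 0x41 for byte 1 on every machine, whatever its native byte order, and ntohs(addr.sin_port) recovers 8001.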
The connect operation is used on the client side to identify and, possibly, start the connection to the server. It is required in the case of connection oriented communication. In the datagram case it is not required, but, if used, it gives the default name of the interlocutor so that we do not need to repeat it in each message.

    #include <sys/types.h>
    #include <sys/socket.h>

    int connect(int sd, struct sockaddr *addr, int addrlen)

sd: file descriptor of local socket
addr: pointer to protocol address of other socket
addrlen: length in bytes of address

It returns an integer (0=success, -1=failure).

Here is a typical call to connect:

    #define SERV_NAME ...   /* say, "snowhite.cis.temple.edu" */
    #define SERV_PORT ...   /* say, 8001 */

    struct sockaddr_in servaddr;
    struct hostent *hp;     /* Here we store information about host */
    int sd;                 /* File descriptor for socket */
    .......
    /* initialize servaddr */
    bzero((char *)&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(SERV_PORT);
    hp = gethostbyname(SERV_NAME);
    if (hp == 0) {
        fprintf(stderr, "failure to get address of %s\n", SERV_NAME);
        exit(1);
    }
    bcopy(hp->h_addr_list[0], (caddr_t)&servaddr.sin_addr, hp->h_length);
    if (connect(sd, (struct sockaddr *)&servaddr, sizeof(servaddr)) < 0) {
        perror("connect");
        exit(1);
    }

The function gethostbyname is described below.

Gethostbyname

The function gethostbyname, given a host name like snowhite.cis.temple.edu, returns 0 in case of failure, or a pointer to a struct hostent object which gives information about the host (names, aliases, IP addresses):

    struct hostent {
        char *h_name;        /* official name of host */
        char **h_aliases;    /* null terminated list of aliases */
        int h_addrtype;      /* host address type */
        int h_length;        /* length of address */
        char **h_addr_list;  /* null terminated list of addresses */
                             /* from name server */
    #define h_addr h_addr_list[0]  /* address, for backward compatibility */
    };

In this structure h_addr_list[0] is the first IP address associated with the host.
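As a minimal sketch of reading that first address out of a struct hostent, the function below resolves a name and returns the address in dotted form. The name first_address is hypothetical, and the sketch assumes an IPv4 (AF_INET) result.

```c
#include <string.h>
#include <netdb.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: look up host_name and store its first IPv4
   address, as dotted text, in buf. Returns 0 on success, -1 if the
   name cannot be resolved or the result is not an IPv4 address. */
static int first_address(const char *host_name, char *buf, size_t buflen)
{
    struct hostent *hp = gethostbyname(host_name);
    struct in_addr addr;

    if (hp == NULL || hp->h_addrtype != AF_INET || hp->h_addr_list[0] == NULL)
        return -1;
    if ((size_t)hp->h_length != sizeof(addr))
        return -1;
    /* The entries of h_addr_list are raw addresses in network byte order. */
    memcpy(&addr, hp->h_addr_list[0], sizeof(addr));
    if (inet_ntop(AF_INET, &addr, buf, (socklen_t)buflen) == NULL)
        return -1;
    return 0;
}
```

A call such as first_address("snowhite.cis.temple.edu", buf, sizeof buf) would fill buf with the dotted address of that host if the name still resolves. Note that hp points to storage owned by the library, so the address is copied out before any further lookup.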
In order to use this structure you must include in your program:

    #include <netdb.h>

Other functions help us find out things about hosts, services, protocols, networks: gethostbyaddr, getprotobyname, getprotobynumber, getprotoent, getservbyname, getservbyport, getservent, getnetbyname, getnetbynumber, getnetent.

Listening for a Client

The listen function is used on the server in the case of connection oriented communication to prepare a socket to accept connections from clients. It has the form:

    int listen(int fd, int qlen)

fd: file descriptor of a socket that has already been bound
qlen: specifies the maximum number of connection requests that can wait to be processed by the server while the server is busy servicing another request.

It returns an integer (0=success, -1=failure).

Here is a typical call to listen:

    if (listen(sd, 3) < 0) {
        perror("listen");
        exit(1);
    }

Accepting a connection from a Client

The accept function is used on the server in the case of connection oriented communication (after a call to listen) to accept a connection request from a client.

    #include <sys/types.h>
    #include <sys/socket.h>

    int accept(int fd, struct sockaddr *addressp, int *addrlen)

fd: is an int, the file descriptor of the socket the server was listening on
addressp: points to an address. It will be filled with the address of the calling client
addrlen: points to an integer that must hold the size of the address buffer before the call and will contain the actual length of the address of the client after it

It returns an integer representing a new socket (-1 in case of failure). It is the socket that the server will use from now on to communicate with the client that requested the connection.

Here is a typical call to accept:

    struct sockaddr_in client_addr;
    int ssd, csd, length;
    ...........
    length = sizeof(client_addr);
    if ((csd = accept(ssd, (struct sockaddr *)&client_addr, &length)) < 0) {
        perror("accept");
        exit(1);
    }
    /* here we give the new socket csd to a thread or a process that will */
    /* handle communication with this client. */

Read, Write

We can use a socket like a normal file descriptor and read from it or write to it.
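A minimal sketch of this, tying together bind, listen, connect, accept, write and read: both ends of a connection are created inside a single process on the loopback address, purely to keep the illustration short (a real client and server would be separate processes). The function name loopback_stream_echo is hypothetical, and the sketch assumes a message short enough to fit in the socket buffer.

```c
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: send msg over a TCP connection on the loopback
   interface and read it back on the accepting side. Returns the number
   of bytes received, or -1 on error. */
static int loopback_stream_echo(const char *msg, char *out, size_t outlen)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int lsd = -1, csd = -1, ssd = -1, result = -1;
    size_t want = strlen(msg), got = 0;
    ssize_t n;

    if (outlen == 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);                       /* kernel picks the port */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    /* Server side: socket => bind => listen. */
    if ((lsd = socket(AF_INET, SOCK_STREAM, 0)) < 0) goto done;
    if (bind(lsd, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto done;
    if (listen(lsd, 3) < 0) goto done;
    /* Find out which port the kernel chose. */
    if (getsockname(lsd, (struct sockaddr *)&addr, &len) < 0) goto done;

    /* Client side: socket => connect (no explicit bind). */
    if ((csd = socket(AF_INET, SOCK_STREAM, 0)) < 0) goto done;
    if (connect(csd, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto done;

    /* Server side: accept returns a new socket for this client. */
    if ((ssd = accept(lsd, NULL, NULL)) < 0) goto done;

    if (write(csd, msg, want) != (ssize_t)want) goto done;
    shutdown(csd, SHUT_WR);          /* close for writes: reader sees end of file */

    /* A stream may deliver the bytes in several pieces, so loop on read. */
    while (got < outlen - 1) {
        n = read(ssd, out + got, outlen - 1 - got);
        if (n < 0) goto done;
        if (n == 0) break;           /* end of stream */
        got += (size_t)n;
    }
    out[got] = '\0';
    result = (int)got;
done:
    if (ssd >= 0) close(ssd);
    if (csd >= 0) close(csd);
    if (lsd >= 0) close(lsd);
    return result;
}
```

The read loop reflects the stream behaviour described in the introduction: the bytes written by one write may arrive through any number of read operations.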
In order to do so the socket must be connected to an interlocutor. (Other commands we can use when a socket is connected are send and recv.)

Sendto and Recvfrom

    #include <sys/types.h>
    #include <sys/socket.h>

    int sendto(int sd, char *buff, int len, int flags, struct sockaddr *addressp, int addrlen)

sd: socket file descriptor
buff: address of buffer with the information to be sent
len: size of the message
flags: usually 0; could be used for priority messages, etc.
addressp: address of process we are sending message to
addrlen: length of the address

It returns number of characters sent. It is -1 in case of failure.

    #include <sys/types.h>
    #include <sys/socket.h>

    int recvfrom(int sd, char *buff, int len, int flags, struct sockaddr *addressp, int *addrlen)

sd: socket file descriptor
buff: address of buffer where message will be stored
len: size of buffer
flags: usually 0; used for priority messages, peeking etc.
addressp: buffer that will receive address of process that sent message
addrlen: before the call it contains the size of the addressp buffer; after the call it contains the size of the address

It returns number of characters received. It is -1 in case of failure.

Shutdown

It is like close but more flexible. It allows us to close just read operations, or write operations, or both.

    int shutdown(int sd, int action)

sd: is a socket descriptor
action: is (0 = close for reads) (1 = close for writes) (2 = close for both reads and writes)

It returns an integer (0=success, -1=failure).

Getsockname

It is used to determine the address to which a socket is bound.

    int getsockname(int sd, struct sockaddr *addrp, int *addrlen)

sd: is the socket descriptor of a bound socket.
addrp: points to a buffer. After the call it will have the address associated to the socket.
addrlen: points to the size of the buffer. After the call it gives the size of the address.

It returns an integer (0=success, -1=failure).

Getpeername

It is used to obtain the address of the remote host connected to the current socket. It is used when the socket is connected to a remote host.
    int getpeername(int sd, struct sockaddr *addrp, int *addrlen)

sd: is the socket descriptor of a connected socket.
addrp: points to a buffer. After the call it will have the address associated to the peer of the socket.
addrlen: points to the size of the buffer. After the call it gives the size of the address.

It returns an integer (0=success, -1=failure).

Here is an example of use of getpeername:

    struct sockaddr_in name;
    int namelen = sizeof(name);
    .......
    if (getpeername(sd, (struct sockaddr *)&name, &namelen) < 0) {
        perror("getpeername");
        exit(1);
    }
    printf("Connection from %s\n", inet_ntoa(name.sin_addr));

We see here a new function, inet_ntoa: it translates an internet integer address into a dot-formatted character string. It requires the include files:

    #include <netinet/in.h>
    #include <arpa/inet.h>

Examples

Example 1: Simple example where we use gethostname, gethostbyname, socket, bind, getsockname. The example does not do anything useful except show the use of these functions.

Example 2 (Datagram communication): a client and a server. In a loop, the client sends the current time to the server, waits for the reply, prints it out, and sleeps for a while. The server receives messages from clients and prints them out. It replies with its own current time. No provision is made to cope with the unreliability of the communication channel.

Example 3 (Datagram communication): a client and a server. The client is invoked with three parameters: the name of a user, of a host, and a port. It sends the user name to the server and prints out the response. The server, when it receives a user name, checks if the user is currently logged on the host. It replies with an appropriate response. No provision is made to cope with the unreliability of the communication channel.

Example 4 (Datagram communication): a client and a server. A client in a loop sends a message to the server and waits with timeout for a reply. The server receives messages and gives them to threads to respond to.

Example 5: Threaded Server from the Threads Primer book.
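As a compact illustration of the datagram pattern used in Examples 2 to 4, here is a minimal sketch. It is not one of the example programs above. Client and server sockets live in one process on the loopback address to keep it short, the names bound_udp_socket and datagram_round_trip are hypothetical, and the 2 second receive timeout set with SO_RCVTIMEO is an assumption added so that a lost datagram cannot block the caller forever.

```c
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: create a UDP socket bound to a kernel-chosen
   port on the loopback address, with a 2 second receive timeout.
   Stores the bound address in *addr. Returns the descriptor or -1. */
static int bound_udp_socket(struct sockaddr_in *addr)
{
    struct timeval tv = { 2, 0 };
    socklen_t len = sizeof(*addr);
    int sd = socket(AF_INET, SOCK_DGRAM, 0);

    if (sd < 0)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_port = htons(0);                      /* kernel picks the port */
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(sd, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
        getsockname(sd, (struct sockaddr *)addr, &len) < 0 ||
        setsockopt(sd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0) {
        close(sd);
        return -1;
    }
    return sd;
}

/* Hypothetical helper: one request/reply exchange in which the server
   echoes the request. Returns the length of the reply, -1 on error. */
static int datagram_round_trip(const char *msg, char *reply, size_t replylen)
{
    struct sockaddr_in server_addr, client_addr, from;
    socklen_t fromlen = sizeof(from);
    char request[256];
    ssize_t n = -1;
    int server = bound_udp_socket(&server_addr);
    int client = bound_udp_socket(&client_addr);

    if (server < 0 || client < 0 || replylen == 0)
        goto done;

    /* Client: send the request to the server's address. */
    if (sendto(client, msg, strlen(msg), 0,
               (struct sockaddr *)&server_addr, sizeof(server_addr)) < 0)
        goto done;

    /* Server: recvfrom also reports who sent the datagram. */
    n = recvfrom(server, request, sizeof(request), 0,
                 (struct sockaddr *)&from, &fromlen);
    if (n < 0)
        goto done;
    if (from.sin_port != client_addr.sin_port) {    /* not our client */
        n = -1;
        goto done;
    }

    /* Server: reply to the address that came with the request. */
    if (sendto(server, request, (size_t)n, 0,
               (struct sockaddr *)&from, fromlen) < 0) {
        n = -1;
        goto done;
    }

    /* Client: the whole reply must be taken in a single receive. */
    n = recvfrom(client, reply, replylen - 1, 0, NULL, NULL);
    if (n >= 0)
        reply[n] = '\0';
done:
    if (server >= 0) close(server);
    if (client >= 0) close(client);
    return (int)n;
}
```

The server never needs to know its client in advance: the address filled in by recvfrom is all it uses to send the reply, as described in the Client-Server Architecture section.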
ingargiola.cis.temple.edu