CIS 307: Unix IV: Sockets
                                       
   Online references:
     * Primer on Sockets by Jim Frost (Software Tool & Die)
     * Introductory tutorial on IPC in 4.4BSD-Unix (by S.Sechrest
       UC-Berkeley) (Postscript)
     * Advanced tutorial on IPC in 4.4BSD-Unix (by
       S.Leffler,R.Fabry,W.Joy,P.Lapsley UC-Berkeley, S.Miller,C.Torek
       U-Maryland) (Postscript)
       
   [Introduction], [Client-Server Architecture], [Summary on Socket
   Functions], [Socket Functions], [Examples]
   
Introduction

   We examine the functions for communication through sockets. [In
   practice you may be able to skip this software level and use
   middleware such as DCE RPC (Distributed Computing Environment
   Remote Procedure Call). Still, an understanding of the socket API
   provides a grounding in some of the issues and problems of distributed
   computation.] A socket is an endpoint used by a process for
   bidirectional communication with a socket associated with another
   process. Sockets, introduced in Berkeley Unix, are a basic mechanism
   for IPC on a computer system, or on different computer systems
   connected by local or wide area networks. In the following we will not
   be concerned with networks and data communications. We will take
   instead a strictly operational viewpoint: how to program with sockets
   to create communication channels. The communication channel created
   with sockets can be like a telephone line (connection oriented), with
   the sockets as telephones over which a conversation can take place. Or
   the channel can be as when we send mail (datagram oriented), with the
   sockets as mailboxes. Connection oriented communication is reliable,
   i.e. the system takes care of errors. Datagram oriented communication
   is unreliable, i.e. messages can be lost, or they may not be delivered
   in the order in which they were sent. A socket appears to the user to
   be like a file descriptor on which we can read, write, and ioctl. In
   the connection oriented mode, the file is like a sequence of
   characters that we can read with as many read operations as we like.
   In the connectionless mode we have to get a whole message in a single
   read operation. If we don't, what is left over of the message is lost.
   Though sockets can be used in a single computer system for
   interprocess communication (the Unix domain), we will only consider
   their use for communication across computer systems (the Internet
   domain). It is possible to send messages on a socket that take
   precedence over other undelivered messages. These priority messages
   are called out-of-band messages. We will not deal with them in this
   note. We will also not deal with composite messages (sendmsg and
   recvmsg).
   
   A problem in communication is how to identify interlocutors. In the
   case of phones we have telephone numbers, for mail we have addresses.
   For communicating between sockets we [usually, since within a single
   computer we could use file names] identify an interlocutor with a
   pair: IP address and port. [In reality there is a third component, the
   protocol, but that will not be relevant to us.] This represents the
   address or name of the interlocutor. IP addresses (things like
   155.247.207.190) are 32 bit unsigned integers (155, 247, 207, 190 are
   the bytes). An IP address consists of two parts, one identifying a
   network, the other identifying a computer within that network, and
   it can be written in a number of formats. IP addresses are more easily
   remembered as host names (things like snowhite.cis.temple.edu). [You
   can find out about IP to host conversions with the nslookup command and
   by looking in the /etc/hosts file.] [IP addresses, to be exact, identify
   the Network Interface Card between a computer and a network, and a
   computer might have a number of such cards connecting to a number of
   networks. But for brevity we will not worry about this distinction.] A
   special IP address, 127.0.0.1, refers to the local host (the loopback
   address, localhost). The IP address 0.0.0.0 is called INADDR_ANY and
   stands for all the IP addresses of this machine (it is also used
   during bootstrap of the system). Another special IP address,
   255.255.255.255, is used for broadcast to all hosts on the local
   network of the current machine. Similarly, a host id consisting of
   all 1s is used to broadcast to all the computers of a LAN.
   Ports are 16 bit unsigned integers. The first 1024 port numbers are
   reserved for standard services such as http (port 80). You can look in
   the files /etc/services and /etc/inetd.conf to see standard uses (ftp,
   telnet, finger, ...) of these ports. From 1024 to 5000 the ports are
   reserved for allocation by the kernel on demand (this is the
   traditional BSD range; the exact range varies between systems). Above
   5000 the ports can be chosen directly by the users. For instance I use
   8080 for my httpd server. Port 0 is used as a wild card to request the
   kernel to find a port for us when we do not care which.
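   To make the "32 bit unsigned integer" view of an IP address concrete,
   here is a minimal sketch that converts a dotted string into that
   integer and picks out its bytes. The helper names ip_to_u32 and
   ip_byte are hypothetical, chosen only for this illustration;
   inet_pton and ntohl are standard library calls.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>

/* Hypothetical helper: convert a dotted-decimal IPv4 string into a
   32-bit unsigned integer in host byte order.
   Returns 0 on success, -1 if the text is not a valid IPv4 address. */
int ip_to_u32(const char *text, uint32_t *out)
{
    struct in_addr a;

    if (inet_pton(AF_INET, text, &a) != 1)
        return -1;
    *out = ntohl(a.s_addr);   /* s_addr is stored in network byte order */
    return 0;
}

/* Hypothetical helper: extract byte i (0 = leftmost) of an address
   given in host byte order. */
unsigned ip_byte(uint32_t ip, int i)
{
    return (ip >> (8 * (3 - i))) & 0xffu;
}
```

   For 155.247.207.190 the integer is 0x9BF7CFBE, and ip_byte(ip, 0) is
   155. The special addresses mentioned above are available as the
   constants INADDR_LOOPBACK, INADDR_ANY, and INADDR_BROADCAST.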
   
Client-Server Architecture

   A standard way of using sockets and communication channels is between
   clients and servers. A server is a process that is able to carry out
   some function, called a service, like transferring files, translating
   host names to IP addresses, or inverting a matrix. A client is a
   process that requests a server to do a service (say, "translate
   snowhite.cis.temple.edu"). Typically the server will be at a known IP
   address and will respond to requests sent to a known port. In some
   cases that port is not universally known, so the server will advertize
   the port it is currently using (it may advertize the port by printing
   out its value, or sending email, or having inetd, a special process,
   know about it, etc.). In some cases the IP address of the server is
   not known and one may have a "standard" server that responds to
   requests of the form "where can I find service Moo" by responding with
   an appropriate IP address. The client requests the kernel to obtain a
   free port to be used for communication with the server. The server
   does not have to know in advance the identity of its clients. It is
   ready to accept a message from any interlocutor. When it receives a
   message from a client, the message itself contains the IP and the port
   of the client, so that the server knows whom to answer to.
   
   An address, host+port, can be used for multiplexing more than one
   communication channel. So one server can communicate simultaneously
   with more than one client. Each communication channel on the server
   will have its own socket bound to the same address.
   
Summary On Socket Functions

   The following is a summary of the basic socket functions as they are
   used for datagram and connection oriented service by clients and
   servers. In the following section we will go in greater detail over
   these functions.
   
    Datagram Service
    
          
        Client
                socket => (bind => [connect =>] {write => read}*) |
                {sendto => recvfrom}* => close | shutdown
                In words: create a socket, then bind it to a local port,
                establish the address of the server, write and read from
                it, or just sendto and recvfrom it; then terminate. If
                the client is not interested in a response, it does not
                need to use bind. Connect is worth using when we send
                many datagrams to the same server.
                
        Server
                socket => bind => {read | recvfrom => write | sendto}* =>
                close | shutdown
                In words: create a socket, bind it to a local port,
                receive and reply to messages from clients, terminate. If
                the server does not need to reply to the client, it can
                just use read instead of recvfrom.
                
    Connection Oriented service
    
          
        Client
                socket => bind => connect => {write | sendto => read |
                recvfrom }* => close | shutdown
                In words: create a socket, bind it to a local port,
                establish the address of the server, communicate with it,
                terminate. Bind is optional on the client: if it is
                omitted, the kernel chooses a local port when connect is
                called.
                
        Server
                socket => bind => listen => {accept => {read | recvfrom
                => write | sendto}* }* => close | shutdown
                In words: create a socket, bind it to a local port,
                announce that it will accept connections and indicate the
                maximum number of connection requests that may be left
                pending, accept requests from connection oriented
                clients, receive messages and reply to them, terminate.
                
Socket Functions

  Creating a socket
  
    #include <sys/types.h>
    #include <sys/socket.h>

    int socket(int domain, int type, int protocol)
      domain is either AF_UNIX, AF_INET, or AF_OSI, or ..
         AF_UNIX is the Unix domain, it is used for communication within a
            single computer system.
         AF_INET is for communication on the internet to IP addresses.
            We will only use AF_INET.
      type is either SOCK_STREAM (TCP, connection oriented, reliable),
         or SOCK_DGRAM (UDP, datagram, unreliable), or SOCK_RAW (IP level).
         The type does not depend on the domain; in the AF_UNIX domain it
         is the address of the socket that is the name of a file.
      protocol specifies the protocol used. It is usually 0
         to say we want to use the default protocol for the chosen
         domain and type. We always use 0.
      It returns, if successful, a socket descriptor which is an int.
      It is -1 in case of failure.

   Here is a typical call to socket:
    if ((sd = socket(AF_INET, SOCK_DGRAM, 0)) < 0) {
       perror("socket"); exit(1);}

  Socket Addresses
  
   Here are the structures (OSF Unix) used to store socket addresses as
   used in the domain AF_INET:

    struct in_addr {
        u_long s_addr;
      };

    struct sockaddr_in {
        u_short        sin_family; /*protocol identifier; usually AF_INET */
        u_short        sin_port;   /*port number. 0 means let kernel choose */
        struct in_addr sin_addr;   /*the IP address. INADDR_ANY refers to */
                                   /*all IP addresses of the current host.*/
        char           sin_zero[8];}; /*Unused, always zero */

    In order to use struct sockaddr_in you need to include in your program

        #include <netinet/in.h>

    The following structure sockaddr is more generic than but compatible
    with sockaddr_in (both 16 bytes starting with the same field).
    In the Unix domain we have a different address, sockaddr_un, which is
    also compatible with sockaddr. In order to use sockaddr_un you need to
    include in your program

        #include <sys/un.h>

    struct sockaddr {
        u_short  sa_family;
        char     sa_data[14];};

  Binding to a local port
  
    #include <sys/types.h>
    #include <sys/socket.h>
    int bind(int sd, struct sockaddr *addr, int addrlen)
       sd: File descriptor of local socket, as created by the socket
          function.
       addr: Pointer to protocol address of this socket.
          The IP address is usually INADDR_ANY. The port is 0 to request
          the kernel to provide a port, or a specific number when the
          clients must know it in advance.
       addrlen: Length in bytes of addr.
    It returns an integer, the return code (0=success, -1=failure)

   Bind is used to specify for a socket the protocol port number where it
   will wait for messages. Here is a typical call to bind:

    struct sockaddr_in name;
    .....
    bzero((char *) &name, sizeof(name)); /*zeroes out sizeof(name) characters*/
    name.sin_family = AF_INET;           /*use internet domain*/
    name.sin_port = htons(0);            /*ask kernel to provide a port*/
    name.sin_addr.s_addr = htonl(INADDR_ANY); /*use all IPs of host*/
    if (bind(sd, (struct sockaddr *)&name, sizeof(name)) < 0) {
       perror("bind"); exit(1);}

    A call to bind is optional on the client side, required on the server side.

   We need to understand the reasons for the calls to htons and htonl.
   Numbers on different machines may be represented differently
   (big-endian machines and little-endian machines). So we need to make
   sure that the right representation is used on each machine. We use
   functions to convert from host to network form before transmission
   (htons for short integers, and htonl for long integers), and from
   network to host form after reception (ntohs for short integers, and
   ntohl for long integers).
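   The effect of htons can be seen by looking at the bytes it produces.
   The sketch below is only an illustration; the helper names
   port_wire_bytes and port_from_wire are hypothetical. Whatever the
   byte order of the host, the bytes produced are the same.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store the two bytes of a port number in the order
   in which they travel on the network (most significant byte first). */
void port_wire_bytes(uint16_t port, unsigned char out[2])
{
    uint16_t net = htons(port);     /* host order -> network order */
    memcpy(out, &net, sizeof net);  /* look at the raw bytes in memory */
}

/* Hypothetical helper: recover the port from its two wire bytes. */
uint16_t port_from_wire(const unsigned char in[2])
{
    uint16_t net;

    memcpy(&net, in, sizeof net);
    return ntohs(net);              /* network order -> host order */
}
```

   For port 8080 (hexadecimal 1F90) the bytes are 0x1F followed by 0x90.
   On a big-endian host htons changes nothing; on a little-endian host it
   swaps the two bytes.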
   
   The function bzero zeroes out a buffer of specified length. It is one
   of a group of functions for dealing with arrays of bytes. bcopy copies
   a specified number of bytes from a source to a target buffer. bcmp
   compares a specified number of bytes of two byte buffers.
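   Current systems also provide the standard C functions memset, memcpy,
   and memcmp, which do the same jobs. The following sketch uses them,
   with the corresponding bzero, bcopy, and bcmp call shown in a comment
   next to each. Note that bcopy takes the source first, while memcpy
   takes the destination first. The helper names make_endpoint and
   same_endpoint are hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper: fill *dst with the given IP address and port
   (port given in host byte order). */
void make_endpoint(struct sockaddr_in *dst, const struct in_addr *ip,
                   unsigned short port)
{
    memset(dst, 0, sizeof *dst);             /* bzero(dst, sizeof *dst) */
    dst->sin_family = AF_INET;
    dst->sin_port = htons(port);
    memcpy(&dst->sin_addr, ip, sizeof *ip);  /* bcopy(ip, &dst->sin_addr, sizeof *ip) */
}

/* Hypothetical helper: 1 if both endpoints have the same IP and port. */
int same_endpoint(const struct sockaddr_in *x, const struct sockaddr_in *y)
{
    /* bcmp(&x->sin_addr, &y->sin_addr, sizeof x->sin_addr) == 0 */
    return memcmp(&x->sin_addr, &y->sin_addr, sizeof x->sin_addr) == 0
           && x->sin_port == y->sin_port;
}
```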
   
  Connecting to a Server
  
   A remote process, usually a server, is identified by an IP address and
   a port number. The connect operation is used on the client side to
   identify and, possibly, start the connection to the server. It is
   required in the case of connection oriented communication. In the
   datagram case it is not required, but, if used, it gives the default
   name of the interlocutor so that we do not need to repeat it in each
   message.

    #include <sys/types.h>
    #include <sys/socket.h>

    int connect(int sd, struct sockaddr *addr, int addrlen)
       sd file descriptor of local socket
       addr pointer to protocol address of other socket
       addrlen length in bytes of address
    It returns an integer (0=success, -1=failure)

   Here is a typical call to connect:
    #define SERV_NAME ...    /* say, "snowhite.cis.temple.edu" */
    #define SERV_PORT ...    /* say, 8001 */
    struct sockaddr_in servaddr;
    struct hostent *hp;      /* Here we store information about host*/
    int sd;                  /* File descriptor for socket */
    .......
    /* initialize servaddr */
    bzero((char *)&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(SERV_PORT);
    hp = gethostbyname(SERV_NAME);
    if (hp == 0) {
       fprintf(stderr, "failed to get address of %s\n", SERV_NAME); exit(1);}
    bcopy(hp->h_addr_list[0], (caddr_t)&servaddr.sin_addr, hp->h_length);
    if (connect(sd, (struct sockaddr *)&servaddr, sizeof(servaddr)) < 0) {
       perror("connect"); exit(1);}

   The function gethostbyname is described below.
   
  Gethostbyname
  
   The function gethostbyname, given a host name like
   snowhite.cis.temple.edu, returns 0 in case of failure, or a pointer to
   a struct hostent object which gives information about the host (its
   names, aliases, and IP addresses):

    struct  hostent {
        char    *h_name;        /* official name of host */
        char    **h_aliases;    /* null terminated list of aliases*/
        int     h_addrtype;     /* host address type */
        int     h_length;       /* length of address */
        char    **h_addr_list;  /* null terminated list of addresses */
                                /* from name server */
    };
    #define     h_addr  h_addr_list[0] /*address,for backward compatibility*/

   In this structure h_addr_list[0] is the first IP address associated
   with the host. In order to use this structure you must include in your
   program:

    #include <netdb.h>

   Other functions help us find out things about hosts, services,
   protocols, networks: gethostbyaddr, getprotobyname, getprotobynumber,
   getprotoent, getservbyname, getservbyport, getservent, getnetbyname,
   getnetbynumber, getnetent.
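   As an illustration of how the fields of struct hostent are used, here
   is a minimal sketch that looks up a host and returns its first IP
   address as a dotted string. The helper name first_ipv4 is
   hypothetical. The sketch assumes an AF_INET result; newer programs
   often use getaddrinfo instead of gethostbyname.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: look up host and copy its first IPv4 address, in
   dotted form, into out. Returns 0 on success, -1 on failure. */
int first_ipv4(const char *host, char *out, size_t size)
{
    struct hostent *hp = gethostbyname(host);
    struct in_addr addr;
    const char *text;

    if (hp == NULL || hp->h_addrtype != AF_INET || hp->h_addr_list[0] == NULL)
        return -1;
    if (hp->h_length != (int)sizeof addr)
        return -1;
    memcpy(&addr, hp->h_addr_list[0], sizeof addr);
    text = inet_ntoa(addr);   /* result lives in a static buffer */
    if (strlen(text) >= size)
        return -1;            /* caller's buffer is too small */
    strcpy(out, text);
    return 0;
}
```

   The host argument may be a name such as snowhite.cis.temple.edu or an
   address already in dotted form such as 127.0.0.1; in the second case
   the address is simply converted.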
   
  Listening for a Client
  
   The listen function is used on the server in the case of connection
   oriented communication to prepare a socket to accept connection
   requests from clients. It has the form:

    int listen(int fd, int qlen)
       fd file descriptor of a socket that has already been bound
       qlen specifies the maximum number of connection requests that
          can wait to be accepted by the server while the server is
          busy servicing another request.
       It returns an integer (0=success, -1=failure)

   Here is a typical call to listen:
    if (listen(sd, 3) < 0) {
       perror("listen"); exit(1);}

  Accepting a connection from a Client
  
   The accept function is used on the server in the case of connection
   oriented communication (after a call to listen) to accept a connection
   request from a client.

    #include <sys/types.h>
    #include <sys/socket.h>

    int accept(int fd, struct sockaddr *addressp, int *addrlen)
       fd is an int, the file descriptor of the socket the
          server was listening on
       addressp points to an address. It will be filled with
          address of the calling client
       addrlen points to an integer. Before the call it must hold the
          size of the buffer addressp points to; after the call it
          holds the actual length of the address of the client
    It returns an integer representing a new socket (-1 in case of failure).
    It is the socket that the server will use from now on to communicate
    with the client that requested connection.

   Here is a typical call to accept:
    struct sockaddr_in client_addr;
    int ssd, csd, length;
    ...........
    length = sizeof(client_addr);   /* size of the address buffer */
    if ((csd = accept(ssd, (struct sockaddr *)&client_addr, &length)) < 0) {
       perror("accept"); exit(1);}
    /* here we give the new socket to a thread or a process that will */
    /* handle communication with this client. */

  Read, Write
  
   We can use a socket like a normal file descriptor and read from it or
   write to it. In order to do so the socket must be connected to an
   interlocutor. (Other commands we can use when a socket is connected
   are send and recv.)
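   The difference between reading a stream and reading datagrams can be
   seen with a connected datagram socket. In the sketch below, which
   assumes a typical Unix system and uses the loopback address, two
   datagrams are written and the first is read into a buffer that is too
   small for it. The function name datagram_read_demo is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: the datagrams "ABCDEFGH" and "xyz" are written on a
   connected datagram socket. The receiver reads the first one into a
   4-byte buffer. Returns 0 on success, -1 on error. */
int datagram_read_demo(char first[5], char second[9])
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    struct timeval tv = {2, 0};   /* give up waiting after 2 seconds */
    int rsd = -1, wsd = -1, result = -1;
    ssize_t n;

    /* receiver: bound to a kernel-chosen port on the loopback address */
    if ((rsd = socket(AF_INET, SOCK_DGRAM, 0)) < 0) goto done;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(rsd, (struct sockaddr *)&addr, sizeof addr) < 0) goto done;
    if (getsockname(rsd, (struct sockaddr *)&addr, &len) < 0) goto done;
    if (setsockopt(rsd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) < 0)
        goto done;

    /* sender: connect fixes the destination, so plain write works */
    if ((wsd = socket(AF_INET, SOCK_DGRAM, 0)) < 0) goto done;
    if (connect(wsd, (struct sockaddr *)&addr, sizeof addr) < 0) goto done;
    if (write(wsd, "ABCDEFGH", 8) != 8) goto done;
    if (write(wsd, "xyz", 3) != 3) goto done;

    /* first read: only 4 of the 8 bytes fit, the rest is discarded */
    if ((n = read(rsd, first, 4)) < 0) goto done;
    first[n] = '\0';
    /* second read: gets the next datagram, not the leftover "EFGH" */
    if ((n = read(rsd, second, 8)) < 0) goto done;
    second[n] = '\0';
    result = 0;
done:
    if (wsd >= 0) close(wsd);
    if (rsd >= 0) close(rsd);
    return result;
}
```

   After the call first holds "ABCD" and second holds "xyz". With a
   stream socket the second read would instead have continued with
   "EFGH".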
   
  Sendto and Recvfrom
  
    #include <sys/types.h>
    #include <sys/socket.h>

    int sendto(int sd, char *buff, int len, int flags,
           struct sockaddr *addressp, int addrlen)
       sd, socket file descriptor
       buff, address of buffer with the information to be sent
       len, size of the message
       flags, usually 0; could be used for priority messages, etc.
       addressp, address of process we are sending message to
       addrlen, length of address
       It returns number of characters sent. It is -1 in case of failure.

    #include <sys/types.h>
    #include <sys/socket.h>

    int recvfrom (int sd, char *buff, int len, int flags,
           struct sockaddr *addressp, int *addrlen)
       sd, socket file descriptor
       buff, address of buffer where message will be stored
       len, size of buffer
       flags, usually 0; used for priority messages, peeking etc.
       addressp, buffer that will receive address of process that
            sent message
       addrlen, points to an integer that holds the size of the addressp
            buffer before the call and the size of the sender's address
            after the call
       It returns number of characters received. It is -1 in case of failure.
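   Here is a sketch of one datagram exchange using sendto and recvfrom,
   run inside a single process over the loopback address. It shows how
   the server learns the client's address from recvfrom and uses it to
   reply. The function name udp_echo_demo is hypothetical, and the
   timeout set with SO_RCVTIMEO is only a guard so that the sketch
   cannot wait forever.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: the client sends msg, the server echoes it back to
   the address reported by recvfrom. Returns the client's port as seen
   by the server (host byte order), or -1 on error. */
int udp_echo_demo(const char *msg, char *reply, size_t size)
{
    struct sockaddr_in saddr, from;
    socklen_t slen = sizeof saddr, fromlen = sizeof from;
    struct timeval tv = {2, 0};   /* give up waiting after 2 seconds */
    char buf[256];
    int ssd = -1, csd = -1, result = -1;
    ssize_t n;

    if (size == 0)
        return -1;

    /* server: socket => bind (the port is chosen by the kernel here) */
    if ((ssd = socket(AF_INET, SOCK_DGRAM, 0)) < 0) goto done;
    memset(&saddr, 0, sizeof saddr);
    saddr.sin_family = AF_INET;
    saddr.sin_port = htons(0);
    saddr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(ssd, (struct sockaddr *)&saddr, sizeof saddr) < 0) goto done;
    if (getsockname(ssd, (struct sockaddr *)&saddr, &slen) < 0) goto done;
    if (setsockopt(ssd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) < 0)
        goto done;

    /* client: no bind; the kernel assigns a port at the first sendto */
    if ((csd = socket(AF_INET, SOCK_DGRAM, 0)) < 0) goto done;
    if (setsockopt(csd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) < 0)
        goto done;
    if (sendto(csd, msg, strlen(msg), 0,
               (struct sockaddr *)&saddr, sizeof saddr) < 0) goto done;

    /* server: recvfrom fills in the sender's address; reply to it */
    n = recvfrom(ssd, buf, sizeof buf, 0, (struct sockaddr *)&from, &fromlen);
    if (n < 0) goto done;
    if (sendto(ssd, buf, (size_t)n, 0,
               (struct sockaddr *)&from, fromlen) < 0) goto done;

    /* client: read the reply; the sender's address is not needed */
    n = recvfrom(csd, reply, size - 1, 0, NULL, NULL);
    if (n < 0) goto done;
    reply[n] = '\0';
    result = ntohs(from.sin_port);
done:
    if (csd >= 0) close(csd);
    if (ssd >= 0) close(ssd);
    return result;
}
```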

  Shutdown
  
   Shutdown is like close but more flexible. It allows us to close a
   socket just for reads, just for writes, or for both.

    int shutdown(int sd, int action)
       sd is a socket descriptor
       action is (0 = close for reads) (1 = close for writes)
          (2 = close for both reads and writes)
    It returns an integer (0=success, -1=failure)
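   A minimal sketch of closing one direction only: after a shutdown for
   writes, the peer reads the data already sent and then sees end of
   file, while data can still flow the other way. To keep the sketch
   short it uses socketpair, which is not covered in this note, to
   obtain two connected stream sockets in the Unix domain. The function
   name half_close_demo is hypothetical. SHUT_WR is the symbolic name
   for action 1.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: returns what the peer's read reports after the
   writer has shut down its side (0 means end of file), or -1 on error.
   reply receives the text sent back in the other direction. */
int half_close_demo(char *reply, size_t size)
{
    int sv[2], result = -1;
    char tmp[8];
    ssize_t n;

    if (size < 4)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    if (write(sv[0], "bye", 3) != 3) goto done;
    if (shutdown(sv[0], SHUT_WR) < 0) goto done;   /* no more writes */

    /* the peer still gets the data written before the shutdown */
    if (read(sv[1], tmp, sizeof tmp) != 3) goto done;
    /* and then sees end of file: read returns 0 */
    n = read(sv[1], tmp, sizeof tmp);
    if (n < 0) goto done;

    /* the other direction is still open */
    if (write(sv[1], "ack", 3) != 3) goto done;
    if (read(sv[0], reply, size - 1) != 3) goto done;
    reply[3] = '\0';
    result = (int)n;
done:
    close(sv[0]);
    close(sv[1]);
    return result;
}
```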

  Getsockname
  
   It is used to determine the address to which a socket is bound.
    int getsockname(int sd, struct sockaddr *addrp, int *addrlen)
       sd is the socket descriptor of a bound socket.
       addrp points to a buffer. After the call it will have the
          address associated to the socket.
       addrlen gives the size of the buffer. After the call gives
          size of address.
    It returns an integer (0=success, -1=failure)
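   A common use of getsockname is to find out which port the kernel
   chose when we bound a socket with port 0. A minimal sketch follows;
   the helper name bind_any_port is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a socket of the given type (SOCK_DGRAM or
   SOCK_STREAM), bind it to INADDR_ANY with port 0, and store the port
   chosen by the kernel (host byte order) in *port.
   Returns the socket descriptor, or -1 on error. */
int bind_any_port(int type, unsigned short *port)
{
    struct sockaddr_in name;
    socklen_t namelen = sizeof name;
    int sd;

    if ((sd = socket(AF_INET, type, 0)) < 0)
        return -1;
    memset(&name, 0, sizeof name);
    name.sin_family = AF_INET;
    name.sin_port = htons(0);                  /* ask kernel for a port */
    name.sin_addr.s_addr = htonl(INADDR_ANY);  /* all IPs of this host */
    if (bind(sd, (struct sockaddr *)&name, sizeof name) < 0 ||
        getsockname(sd, (struct sockaddr *)&name, &namelen) < 0) {
        close(sd);
        return -1;
    }
    *port = ntohs(name.sin_port);              /* the port actually used */
    return sd;
}
```

   A server that uses a port chosen in this way can then advertise it,
   for example by printing it.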

  Getpeername
  
   It is used to obtain the address of the remote host connected to the
   current socket. It is used when the socket is connected to a remote
   host.

    int getpeername(int sd, struct sockaddr *addrp, int *addrlen)
       sd is the socket descriptor of a connected socket.
       addrp points to a buffer. After the call it will have the
          address associated to peer of socket.
       addrlen gives the size of the buffer. After the call gives
          size of address.
    It returns an integer (0=success, -1=failure)

   Here is an example of use of getpeername:
    struct sockaddr_in name;
    int namelen = sizeof(name);
    .......
    if (getpeername(sd, (struct sockaddr *)&name, &namelen) < 0) {
       perror("getpeername"); exit(1);}
    printf("Connection from %s\n", inet_ntoa(name.sin_addr));

   We see here a new function, inet_ntoa, which translates an Internet
   address (a struct in_addr) into a dotted-decimal character string. It
   requires the include files:

    #include <netinet/in.h>
    #include <arpa/inet.h>

Examples

   Example 1: Simple example where we use gethostname, gethostbyname,
   socket, bind, getsockname. The example does not do anything useful
   except show the use of these functions.
   
   Example 2 (Datagram communication): a client and a server. In a loop,
   the client sends the current time to the server, waits for the reply,
   prints it out, and sleeps for a while. The server receives messages
   from clients and prints them out. It replies with its own current
   time. No provision is made to cope with the unreliability of the
   communication channel.
   
   Example 3: (Datagram communication) a client and a server. The client
   is invoked with three parameters: the name of a user, of a host, and a
   port. It sends the user name to the server and prints out the
   response. The server when it receives a user name checks if the user
   is currently logged on the host. It replies with an appropriate
   response. No provision is made to cope with the unreliability of the
   communication channel.
   
   Example 4: (Datagram communication): a client and a server. A client
   in a loop sends a message to the server and waits with timeout for
   reply. The server receives messages and gives them to threads to
   respond to.
   
   Example 5: Threaded Server from the Threads Primer book.
   
   
    ingargiola.cis.temple.edu