reference:
https://www.ntu.edu.sg/home/ehchua/programming/webprogramming/HTTP_Basics.html
HTTP
HTTP (Hypertext Transfer Protocol)
- client/server model
- Pull protocol: An HTTP client sends a request message to an HTTP server, and the server in turn returns a response message. HTTP is a pull protocol: the client pulls information from the server.
- Stateless protocol: each request is handled independently; the server does not remember what was done in previous requests.
- Permits negotiation of data type and representation, so that systems can be built independently of the data being transferred.
- Application layer: HTTP does not deal with how the data is actually delivered; that is usually handled by TCP, a transport-layer protocol. The application layer sits above the transport layer (TCP).
- Client actions: every request carries an action (the request method, e.g. GET or POST) telling the server what to do.
- Status codes: HTTP defines the status codes a server returns, e.g. 404 Not Found.
- Headers: included in both requests and responses, they carry a small amount of additional information along with the message.
- e.g.: a header can state the content type of the returned data, such as XML or JSON.
URL (Uniform Resource Locator)
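To make the points above concrete, here is a minimal Python sketch of what HTTP/1.1 messages look like as text: the request line carries the method (the client's action), headers follow, and the response's status line carries the status code. The function names, host, and header values are illustrative assumptions, and a real client or server handles many more cases (chunked bodies, repeated headers, etc.).

```python
from typing import Dict, Optional, Tuple


def build_request(method: str, path: str, host: str,
                  headers: Optional[Dict[str, str]] = None, body: str = "") -> str:
    """Build the text of an HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        # The body length is announced so the server knows where the message ends.
        lines.append(f"Content-Length: {len(body.encode('utf-8'))}")
    # An empty line (CRLF CRLF) separates the headers from the body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


def parse_response_head(raw: str) -> Tuple[int, str, Dict[str, str]]:
    """Split a raw response into (status code, reason phrase, headers)."""
    head, _, _body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    parts = status_line.split(" ", 2)          # e.g. ["HTTP/1.1", "404", "Not Found"]
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()   # header names are case-insensitive
    return int(parts[1]), reason, headers


print(build_request("GET", "/docs/index.html", "www.nowhere123.com",
                    {"Accept": "application/json"}))
sample = "HTTP/1.1 404 Not Found\r\nContent-Type: text/html\r\nContent-Length: 0\r\n\r\n"
print(parse_response_head(sample))
# (404, 'Not Found', {'content-type': 'text/html', 'content-length': '0'})
```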
A URL (Uniform Resource Locator) is used to uniquely identify a resource over the web. URL has the following syntax:
protocol://hostname:port/path-and-file-name
There are 4 parts in a URL:
- Protocol: The application-level protocol used by the client and server, e.g., HTTP, FTP, and telnet.
- Hostname: The DNS domain name (e.g., www.nowhere123.com) or IP address (e.g., 192.128.1.2) of the server.
- Port: The TCP port number on which the server listens for incoming requests from clients.
- Path-and-file-name: The name and location of the requested resource, under the server document base directory.
For example, in the URL http://www.nowhere123.com/docs/index.html, the communication protocol is HTTP and the hostname is www.nowhere123.com. The port number is not specified in the URL, so it takes on the default number, which is TCP port 80 for HTTP. The path and file name of the resource to be located is "/docs/index.html".
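Python's standard library can split a URL into these parts. A minimal sketch using urllib.parse.urlsplit; the DEFAULT_PORTS table is an assumption of this example (urlsplit reports no port when the URL omits one), filling in the well-known defaults.

```python
from urllib.parse import urlsplit

# Assumed well-known default ports, used when the URL does not give one.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def url_parts(url: str) -> dict:
    """Split a URL into protocol, hostname, port, and path-and-file-name."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "hostname": parts.hostname,
        # parts.port is None when the URL has no explicit port, so fall back to the default.
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,
    }


print(url_parts("http://www.nowhere123.com/docs/index.html"))
# {'protocol': 'http', 'hostname': 'www.nowhere123.com', 'port': 80, 'path': '/docs/index.html'}
```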
HTTP over TCP/IP
HTTP is a client-server application-level protocol. It typically runs over a TCP/IP connection.
TCP/IP (Transmission Control Protocol/Internet Protocol) is a set of transport and network-layer protocols for machines to communicate with each other over the network.
IP
IP (Internet Protocol) is a network-layer protocol that deals with network addressing and routing. In an IP network, each machine is assigned a unique IP address (e.g., 165.1.2.3), and the IP software is responsible for routing a message from the source IP to the destination IP. Since memorizing numbers is difficult for most people, an English-like domain name, such as www.nowhere123.com, is used instead. DNS (Domain Name System) translates the domain name into the IP address (via distributed lookup tables). The special IP address 127.0.0.1 always refers to your own machine. Its domain name is "localhost", and it can be used for local loopback testing.
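A small sketch of the two ideas above using Python's standard library: socket.gethostbyname asks the operating system's resolver (hosts file and/or DNS, depending on configuration) for an IPv4 address, and the ipaddress module can tell whether an address is a loopback address. Strictly, the whole 127.0.0.0/8 block is reserved for loopback; 127.0.0.1 is simply the address conventionally used.

```python
import ipaddress
import socket


def resolve(hostname: str) -> str:
    """Ask the OS resolver for an IPv4 address for the given hostname."""
    return socket.gethostbyname(hostname)


def is_loopback(address: str) -> bool:
    """True for 127.0.0.0/8 (IPv4) or ::1 (IPv6), which always refer to this machine."""
    return ipaddress.ip_address(address).is_loopback


print(is_loopback("127.0.0.1"))   # True
print(is_loopback("165.1.2.3"))   # False
print(resolve("localhost"))       # typically 127.0.0.1
```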
HTTP Request Methods
https://www.restapitutorial.com/lessons/httpmethods.html
- GET: A client can use the GET request to get a web resource from the server.
- POST: Used to send ("post") data up to the web server, typically to create a new resource.
- PUT: Asks the server to store the supplied data, typically to update an existing resource.
- DELETE: Deletes an existing resource.
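As a rough illustration of how these methods map onto resource operations, here is a toy in-memory "server" in Python. ResourceStore and handle are hypothetical names (not a real framework), and the status codes returned are the ones conventionally used for each outcome.

```python
from typing import Dict, Optional, Tuple


class ResourceStore:
    """Hypothetical in-memory store mapping the four methods onto resource operations."""

    def __init__(self) -> None:
        self.items: Dict[int, str] = {}
        self.next_id = 1

    def handle(self, method: str, item_id: Optional[int] = None,
               data: str = "") -> Tuple[int, Optional[str]]:
        if method == "GET":
            if item_id in self.items:
                return 200, self.items[item_id]      # 200 OK
            return 404, None                          # 404 Not Found
        if method == "POST":
            new_id = self.next_id                     # the server chooses the new resource's id
            self.next_id += 1
            self.items[new_id] = data
            return 201, str(new_id)                   # 201 Created
        if method == "PUT":
            if item_id not in self.items:
                return 404, None                      # this sketch only updates existing items
            self.items[item_id] = data                # replace the stored data
            return 200, data
        if method == "DELETE":
            if item_id not in self.items:
                return 404, None
            del self.items[item_id]
            return 204, None                          # 204 No Content
        return 405, None                              # 405 Method Not Allowed


store = ResourceStore()
print(store.handle("POST", data="hello"))   # (201, '1')
print(store.handle("GET", 1))               # (200, 'hello')
print(store.handle("PUT", 1, "bye"))        # (200, 'bye')
print(store.handle("DELETE", 1))            # (204, None)
print(store.handle("GET", 1))               # (404, None)
```

In real HTTP, PUT can also create a resource at a URI chosen by the client; the sketch only covers updating, as in the list above.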
Submitting Data
Clients are usually presented with a form (produced using the HTML <form> tag). Once they fill in the requested data and hit the submit button, the browser packs the form data and submits it to the server using either a GET request or a POST request.
A form contains fields. The types of field include:
- Text Box: produced by <input type="text">
- Password Box: produced by <input type="password">
The browser gathers each field's name and value, URL-encodes them, packs them into "name=value" pairs, and concatenates all the fields together using "&" as the field separator. This is known as a query string, and it is sent to the server as part of the request.
name1=value1&name2=value2&name3=value3&...
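Python's urllib.parse.urlencode produces exactly this name=value&... format, URL-encoding each name and value (spaces become "+", and reserved characters such as "+" become %XX escapes); parse_qs reverses it. The field names and values below are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical form fields: name -> value, as the browser would collect them.
fields = {"user": "Peter Lee", "password": "123456", "lang": "C++"}

query = urlencode(fields)   # "name=value" pairs joined by "&", each part URL-encoded
print(query)                # user=Peter+Lee&password=123456&lang=C%2B%2B

# The server side can split the query string back into fields.
print(parse_qs(query))      # {'user': ['Peter Lee'], 'password': ['123456'], 'lang': ['C++']}
```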
The query string can be sent to the server using either the HTTP GET or POST request method, which is specified in the <form>'s "method" attribute.
<form method="get|post" action="url">
If the GET request method is used, the URL-encoded query string is appended to the request-URI after a "?" character.
Using GET request to send the query string has the following drawbacks:
- The amount of data that can be appended to the request-URI is limited. If it exceeds a server-specific threshold, the server returns the error "414 URI Too Long" (called "414 Request-URI Too Long" in the older HTTP/1.1 specification).
- The URL-encoded query string appears in the browser's address box.
If the POST request method is used, the query string is sent in the body of the request message, whose size HTTP itself does not limit (although servers usually enforce their own maximum). The request headers Content-Type and Content-Length notify the server of the type and the length of the query string. The query string does not appear in the browser's address box. The POST method is discussed further below.
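A minimal Python sketch contrasting where the query string goes in each case. The form_request helper, the action path /bin/login, the host, and the field values are hypothetical; the Content-Type shown, application/x-www-form-urlencoded, is the standard type browsers use for URL-encoded form data.

```python
from typing import Dict
from urllib.parse import urlencode


def form_request(method: str, action: str, host: str, fields: Dict[str, str]) -> str:
    """Build the raw request a browser would send for a form submission (simplified)."""
    query = urlencode(fields)
    if method.upper() == "GET":
        # GET: the query string is appended to the request-URI after "?"; there is no body.
        return f"GET {action}?{query} HTTP/1.1\r\nHost: {host}\r\n\r\n"
    # POST: the query string travels in the body, described by Content-Type and Content-Length.
    return (f"POST {action} HTTP/1.1\r\nHost: {host}\r\n"
            "Content-Type: application/x-www-form-urlencoded\r\n"
            f"Content-Length: {len(query.encode('ascii'))}\r\n\r\n" + query)


fields = {"user": "peter", "pw": "x y"}
print(form_request("get", "/bin/login", "www.nowhere123.com", fields))
print(form_request("post", "/bin/login", "www.nowhere123.com", fields))
```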
"Post" request method
The POST request method is used to "post" additional data up to the server (e.g., submitting HTML form data or uploading a file). Entering a URL in the browser's address box always triggers a GET request. To trigger a POST request, you can use an HTML form with the attribute method="post" or write your own network program. For submitting HTML form data, a POST request is the same as a GET request except that the URL-encoded query string is sent in the request body rather than appended to the request-URI.
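To see a POST round trip end to end, this Python sketch starts a throwaway local server with http.server and submits URL-encoded form data to it with http.client, i.e. a small "own network program". The EchoFormHandler class, the /submit path, and the field values are assumptions for illustration; the server only echoes back what it received.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, Tuple
from urllib.parse import parse_qs, urlencode


class EchoFormHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: decodes a URL-encoded POST body and echoes the fields back."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("ascii")   # the query string arrives in the body
        reply = f"path={self.path} fields={parse_qs(body)}".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args) -> None:   # silence per-request logging
        pass


def post_form(fields: Dict[str, str]) -> Tuple[int, str]:
    """Start a local server, POST the fields to /submit, and return (status, reply text)."""
    server = HTTPServer(("127.0.0.1", 0), EchoFormHandler)   # port 0: the OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        # http.client adds the Content-Length header automatically for a str body.
        conn.request("POST", "/submit", body=urlencode(fields),
                     headers={"Content-Type": "application/x-www-form-urlencoded"})
        response = conn.getresponse()
        result = (response.status, response.read().decode("utf-8"))
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


print(post_form({"user": "peter", "lang": "C++"}))
# (200, "path=/submit fields={'user': ['peter'], 'lang': ['C++']}")
```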
POST vs GET for Submitting Form Data
Compared with GET, POST request has the following advantages:
- The amount of data that can be posted is not limited by the protocol, as it is carried in the request body rather than in the URL (servers may still impose their own limit).
- The query string is not shown on the address box of the browser.
Note that although the password is not shown in the browser's address box, it is transmitted to the server in clear text and is subject to network sniffing. Hence, sending a password in a POST request over plain HTTP is not secure; the connection itself must be encrypted (e.g., with HTTPS).
network
URL procedure
Type the URL into the browser.
The browser checks the cache for a DNS record to find the IP address corresponding to the URL's domain name.
If the record is not in any cache, the ISP's DNS server initiates a DNS query to find the IP address of the server that hosts the domain (e.g., yelp.com).
The browser initiates a TCP connection with the server (using the TCP three-way handshake).
The browser sends an HTTP request through the TCP connection.
The server handles the request and sends back an HTTP response.
The browser receives the HTTP response and may close the TCP connection or reuse it for another request.
The browser checks whether the response is a redirect or a conditional response (3xx status codes), an authorization request (401), an error (4xx and 5xx), etc.; these are handled differently from normal responses (2xx).
If the response is cacheable, it is stored in the cache.
The browser decodes the response (e.g., if it is gzipped).
The browser displays the HTML content.
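The core of these steps (name resolution, TCP connection, request, response, status check) can be sketched with raw sockets in Python. To stay self-contained, the sketch talks to a local test server on 127.0.0.1 rather than a real website, skips caching and decoding, and uses a hypothetical status_class helper for the status-code check; a real browser does far more.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple


class HelloHandler(BaseHTTPRequestHandler):
    """Local test server: "/" returns a tiny HTML page, anything else returns 404."""

    def do_GET(self) -> None:
        if self.path == "/":
            status, body = 200, b"<h1>hello</h1>"
        else:
            status, body = 404, b"not found"
        self.send_response(status)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args) -> None:   # silence per-request logging
        pass


def status_class(code: int) -> str:
    """Hypothetical helper: classify a status code the way the browser step describes."""
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirect or conditional"
    if code == 401:
        return "authorization required"
    if 400 <= code < 600:
        return "error"
    return "other"


def fetch(host: str, port: int, path: str) -> Tuple[int, str, bytes]:
    # 1. Resolve the name to an address (here the OS resolver; no browser cache).
    family, sock_type, proto, _, address = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    # 2. Open a TCP connection; the OS performs the three-way handshake in connect().
    with socket.socket(family, sock_type, proto) as sock:
        sock.connect(address)
        # 3. Send the HTTP request over the TCP connection.
        request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        # 4. Read the response until the server closes the connection.
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    # 5. Split off the status line and classify the status code.
    head, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    code = int(head.split(b" ", 2)[1].decode("ascii"))
    return code, status_class(code), body


server = HTTPServer(("127.0.0.1", 0), HelloHandler)   # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
ok_result = fetch("127.0.0.1", port, "/")
missing_result = fetch("127.0.0.1", port, "/missing")
server.shutdown()
server.server_close()
print(ok_result)        # (200, 'success', b'<h1>hello</h1>')
print(missing_result)   # (404, 'error', b'not found')
```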
DNS (Domain Name System) is a distributed database that maps domain names (the hostname part of a URL) to the IP addresses they point to. The IP address belongs to the computer that hosts the server of the website we want to access. DNS works like a phone book: a phone book lists names with their phone numbers, and DNS lists domain names with their IP addresses. The mapping is not one-to-one, though: one domain name can have several IP addresses, and several domain names can share one address.
In order to find the DNS record, the browser checks up to four caches, in this order:
The browser cache: the browser keeps DNS records of websites you have previously visited for a limited time, so it is the first place checked.
The OS cache: if the record is not in the browser cache, the browser makes a system call to the underlying OS, which also maintains a cache of DNS records.
The router cache: if the record is not found on your computer, the browser communicates with the router, which maintains its own cache of DNS records.
The ISP cache: if all of the above fail, the browser moves on to the ISP. Your ISP maintains its own DNS server with a cache of DNS records, which the browser checks as a last resort.
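The cache lookup order above can be sketched in Python as a chain of dictionaries checked one after another, where the first hit wins. The cache contents and the cached_lookup name are hypothetical; real DNS caches also expire each record after its TTL.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical cache contents, checked in the order described above.
CACHE_LAYERS: List[Tuple[str, Dict[str, str]]] = [
    ("browser", {}),
    ("os", {"www.nowhere123.com": "165.1.2.3"}),
    ("router", {}),
    ("isp", {"docs.nowhere123.com": "165.1.2.4"}),
]


def cached_lookup(domain: str,
                  layers: List[Tuple[str, Dict[str, str]]]) -> Optional[Tuple[str, str]]:
    """Return (cache name, IP address) from the first cache that has the record, else None."""
    for name, cache in layers:
        if domain in cache:
            return name, cache[domain]   # first hit wins; later caches are not consulted
    return None                          # miss everywhere: a full DNS query is needed


print(cached_lookup("www.nowhere123.com", CACHE_LAYERS))    # ('os', '165.1.2.3')
print(cached_lookup("docs.nowhere123.com", CACHE_LAYERS))   # ('isp', '165.1.2.4')
print(cached_lookup("unknown.example", CACHE_LAYERS))       # None
```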
If the record is not cached anywhere, a DNS query searches multiple DNS servers on the internet until it finds the correct IP address for the website. This is called a recursive query: the ISP's DNS server (the resolver) takes on the whole search, going from DNS server to DNS server (a root server, then the top-level domain server such as the one for .com, then the domain's authoritative server) until it either finds the IP address or returns an error response saying it could not find it.
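A toy Python model of the search the resolver performs: each hypothetical server either answers or refers the resolver to a more specific server (root, then the .com server, then the domain's authoritative server). The server names, zone data, and resolver_lookup function are made up, and real DNS involves record types, caching, and timeouts that this sketch ignores.

```python
from typing import Dict, Tuple

# Hypothetical DNS servers: each maps a name (or zone) to an answer or a referral.
SERVERS: Dict[str, Dict[str, Tuple[str, str]]] = {
    "root": {"com.": ("referral", "com-server")},
    "com-server": {"nowhere123.com.": ("referral", "nowhere123-server")},
    "nowhere123-server": {"www.nowhere123.com.": ("answer", "165.1.2.3")},
}


def resolver_lookup(name: str, max_hops: int = 10) -> str:
    """Start at the root and follow referrals until a server answers or nobody knows the name."""
    fqdn = name.rstrip(".") + "."          # fully qualified form, e.g. "www.nowhere123.com."
    server = "root"
    for _ in range(max_hops):
        zone = SERVERS[server]
        # Entries that match the whole name or one of its parent domains.
        matches = [key for key in zone if fqdn == key or fqdn.endswith("." + key)]
        if not matches:
            raise LookupError(f"{name}: not found")        # the error response
        kind, value = zone[max(matches, key=len)]          # most specific match wins
        if kind == "answer":
            return value
        server = value                                      # referral: ask the next server
    raise LookupError(f"{name}: too many referrals")


print(resolver_lookup("www.nowhere123.com"))   # 165.1.2.3
```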
reference:
https://stackoverflow.com/questions/2092527/what-happens-when-you-type-in-a-url-in-browser
for 3-way handshake: http://www.inetdaemon.com/tutorials/internet/tcp/3-way_handshake.shtml