Inside Look: How New Web Apps Get Data from Forms

In the last installment of Gearhead, we started to answer a reader's question: "How does a Web server application get data from a form?"

We got as far as discussing what goes on in an HTTP transaction, so this week we'll start to answer the question. (This is what irritates so many noncomputer-literate people - they ask you what they think is a simple question and when, two hours later, you've explained the first minute fragment, they get cranky. Rest assured, there's no way out of this social problem, and you'll just have to get used to the grief.)

So, we talked about the browser sending the Web server a GET request. Let's look at that GET request again.

GET /stuff/mydoc.html HTTP/1.0
User-Agent: xxxxxx
Accept: image/gif, image/jpeg, */*

The actual request (GET /stuff/mydoc.html) is in the HTTP header (in fact, there's nothing but a header in a GET request) and in this case, the objective is to retrieve a Web page. But when a form is involved, GET requests are more complex because they involve a back-end application. For example:

GET /cgi-bin/myapp.pl?search=Gearhead%20columns&hits=10 HTTP/1.0
User-Agent: xxxxxx
Accept: image/gif, image/jpeg, */*

Here we're asking for the Perl application myapp.pl to be run, and we're also supplying the data to be sent to the program, that is, the string "search=Gearhead%20columns&hits=10".

Note the question mark - everything following it is called the query string, and the query string is passed to the application (which is launched by the Web server) in an environment variable called QUERY_STRING. (Environment variables are memory-based text strings that most operating systems provide for interprocess communications.)

Note that the "%20" is the way nonalphanumeric characters are encoded in URLs sent by Web browsers. The encoding is simply a percent sign followed by the character's ASCII value in hexadecimal (20 hex is a space), and the encoding is done to allow special characters to be used to separate the parts of the URL. For example, space characters define the beginning and end of the URL that is the target of the GET request. Also, the ampersand separates arguments in the URL. Each argument has the structure: name=value.

If a form generated the request, each argument's name is the name of a form field (defined by that field's NAME attribute).
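To make the encoding concrete, here is a minimal sketch in Python (not the Perl the column's example application uses) that builds the same query string with the standard library's urllib.parse module. The dictionary of field names and values is an assumption taken from the example request, and quote_via=quote is passed because urlencode() would otherwise encode a space as "+" instead of "%20".

```python
from urllib.parse import quote, urlencode

# Assumed form fields, mirroring the example request in this column.
form_fields = {"search": "Gearhead columns", "hits": 10}

# quote() replaces each unsafe character with "%" plus its two-digit hex code.
encoded_term = quote("Gearhead columns")

# urlencode() joins the name=value pairs with "&". quote_via=quote makes it
# write a space as %20 rather than the "+" it would use by default.
query_string = urlencode(form_fields, quote_via=quote)

print(encoded_term)
print(query_string)
```

Appending the resulting string to the application's path after a question mark gives the URL used in the GET request.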

Note that a form doesn't have to be the generator of the request - the same request could be made by an application that constructs the HTTP header, opens a TCP socket to a Web server's port (80 by default) and sends the request directly - but that's probably best left for another column.
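For the impatient, a rough sketch of such an application in Python might look like the following. The host name www.example.com, the User-Agent value and the function names are all made up for illustration, and the Host header is an addition that HTTP/1.0 doesn't require but many servers expect. Only the request is built here; send_request() is defined but not called, since it needs a live server.

```python
import socket


def build_get_request(path: str, host: str) -> bytes:
    """Assemble a bare HTTP/1.0 GET request: request line, headers, blank line."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",                 # assumption: not in the column's example
        "User-Agent: gearhead-sketch",   # made-up client name
        "Accept: image/gif, image/jpeg, */*",
        "",                              # the empty line that ends the header
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def send_request(host: str, request: bytes, port: int = 80) -> bytes:
    """Open a TCP socket to the server, send the request, read until it closes."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


request = build_get_request(
    "/cgi-bin/myapp.pl?search=Gearhead%20columns&hits=10", "www.example.com"
)
print(request.decode("ascii"))
```

Calling send_request("www.example.com", request) would return the server's raw response, header and all.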

Anyway, to generate the HTTP request above we could use the form:

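<FORM METHOD="GET" ACTION="/cgi-bin/myapp.pl">
Search term: <INPUT TYPE="text" NAME="search">
Number of hits: <INPUT TYPE="text" NAME="hits">
<INPUT TYPE="submit">
</FORM>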

Now, in the QUERY_STRING environment variable is the whole query string in its raw, encoded form, i.e., "search=Gearhead%20columns&hits=10". This means that the program has to pick apart the various arguments and decode the encoded characters. Luckily, the tedious code to do this is available in libraries for most of the major languages, Perl included.
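As an illustration of what such a library does, here is a minimal sketch using Python's standard urllib.parse module rather than Perl. The function name read_form_fields is hypothetical, and the environment is passed in as a plain dictionary so the example runs without a Web server; under a real server the default os.environ would be used.

```python
import os
from urllib.parse import parse_qs


def read_form_fields(environ=os.environ):
    """Split QUERY_STRING into arguments and decode the %XX sequences."""
    raw = environ.get("QUERY_STRING", "")
    # parse_qs maps each field name to a list of values, because a name
    # may legitimately appear more than once in a query string.
    return parse_qs(raw)


# Stand-in for the environment the Web server would set up.
fields = read_form_fields({"QUERY_STRING": "search=Gearhead%20columns&hits=10"})

search_term = fields["search"][0]
hits = int(fields["hits"][0])
print(search_term, hits)
```

Every value arrives as text, so a numeric field such as hits has to be converted (and, in real code, validated) by the program itself.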

So, what's the downside of the GET method? The space put aside to hold the operating system's environment variables is limited. Throw too many fields into your form or cram too much data into a field and your data gets truncated or something else goes wrong.

Is that all there is to it? Well, yes, at least as far as the GET method is concerned, but that's not all. There's also the POST method, which, needless to say, operates in quite a different manner and which we'll have the pleasure of disemboweling next week.

GET your arguments and opinions to gh@gibbs.com
