CGI Programming and You

CGI - The Common Gateway Interface

by Joshua M. Sled [jsled@xcf.berkeley.edu]



1. Introduction

1.1 What is CGI?

The Common Gateway Interface is very aptly named. It defines an Interface which allows programs to act as a Gateway between the web server and the web browser, allowing any Common program [written in any programming language] to take user input [by way of HTML forms] and output HTML which is sent back to the user's web browser. This allows for a somewhat limited form of interaction, the limits of which can be overcome with a little more work by the programmer.

CGI programs can be written in any programming or scripting language which the web server [the machine which is running the CGI programs] can run. This means that CGI programs can be written in C, C++, Java, perl, Unix shell scripts, TCL, or nearly any other programming language; all that is required is that the programming language have facilities for accessing standard input, standard output and environment variables. Perl, however, is a common choice because of its interpreted nature [specifically: there is no waiting for, or hassle associated with, compiling programs during development] and its excellent text-processing abilities.

1.2 What's it useful for?

CGI programs allow for some dynamic processing in an otherwise static medium [the world-wide web]. CGI scripts allow for form data to be processed, and for HTML which is determined by a user's input to be generated. CGI scripts also allow access to files which reside on the web server, which may be databases, text files or nearly any other file [with some restrictions; see 3.1: Server-Side File Access].

Since nearly everyone has a web browser which supports HTML forms, no special software is required to take advantage of CGI applications. While HTML forms are limited, the basic form elements [text input, list item selection, radio-button/check-box item selection, etc] do allow for a good deal of input to be captured. Therefore, CGI scripts are very useful for processing data which can be captured by these form elements, which is a good deal of the user-interaction necessary for useful data input.


2. Usage

2.1 How it works [an overview]

CGI programs are usually started in response to a user pressing the "submit" button on an HTML form. The web browser packages up the form data and sends it back to the web server, which then executes the CGI program and passes the form data along to it.

The CGI script then processes the information and sends the data which it wants to return to the user, finishing when done. The web server passes the information back to the web browser, and the user sees the information. If the response the CGI program sends is another HTML form, then the cycle can begin again. All is dependent on what the CGI program does and outputs to the user.

2.2 Invocation

A CGI program is invoked in one of two ways:

  1. It is started in response to a form being submitted, via the <FORM ACTION="CGI_program_name.cgi" ...> HTML tag.

    In this case, the METHOD="..." attribute of the FORM tag determines how the data is delivered to the CGI program. For "METHOD=post", the data is available on standard input; for "METHOD=get", the data is available in the URL and in the QUERY_STRING environment variable [see 2.5: Environment Variables].

  2. It is started in a manner similar to the server-side include mechanism of some web servers, and invoked upon loading of the web page.

In either case, the CGI program has a standard method of communicating with the web server and web browser: standard input, standard output and the environment variables. In the case of CGI programs invoked using the server-side-include mechanism, since no form was submitted, there is no form data to process and thus no form data will be given to the program. However, environment variables are still available, and are most useful to the server-side-include-invoked CGI program.

2.3 Standard Input

Standard input is used for getting the user's data when forms are submitted with the "POST" method [ <FORM METHOD=post ...> ]. The form data arrives in a specific format: it is URL-encoded [see 2.6: URL-Encoding] and laid out as a series of name/value pairs [what I've called the Form-Data Format; see 2.7: Form-Data Format]. The data is waiting on standard input for the program to read and process. This data is not EOF-terminated, so the CONTENT_LENGTH environment variable must be used to determine the end of input [see 2.5: Environment Variables].

Because URL-encoding replaces all spaces, the data waiting on standard input is a single string with no whitespace, in the format specified below [see 2.7: Form-Data Format]. This data must be read, decoded and parsed by the CGI program, but it's usually easier to use a pre-written library to do these steps than to re-write this tedious [and error-prone] processing for each CGI program. For perl, the 'cgi-lib.pl' library is very useful [see 6.3: The cgi-lib.pl Home Page].
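The reading step described in this section can be sketched in a few lines. This is a minimal illustration in Python [chosen only because CGI is language-neutral], not the cgi-lib.pl implementation; the function name read_form_data and the simulated request are assumptions made for the example.

```python
import io
import os
import sys


def read_form_data(environ=os.environ, stdin=None):
    """Return the raw, still URL-encoded form data for this request."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # POSTed data is not EOF-terminated: read exactly CONTENT_LENGTH bytes.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        stream = stdin if stdin is not None else sys.stdin.buffer
        return stream.read(length).decode("ascii", "replace")
    # For GET, the data arrives in the QUERY_STRING environment variable.
    return environ.get("QUERY_STRING", "")


if __name__ == "__main__":
    # Simulated POST request; a real web server would supply both of these.
    fake_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "9"}
    fake_stdin = io.BytesIO(b"foo=Hello[bytes past CONTENT_LENGTH]")
    print(read_form_data(fake_env, fake_stdin))  # foo=Hello
```

Passing the environment and the input stream as parameters is a design choice for the sketch: it lets the function be exercised without a web server.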

2.4 Standard Output

Standard output is where the CGI program writes the data which the web server sends back to the user's web browser. This data must follow a simple format, which comes in two parts [the headers and the content] separated by a blank line:

[optional] other HTTP headers
Content-Type: <MIME type>
<blank line>
<output content>

HTTP headers can be any standard HTTP header information. HTTP headers allow the program [and the web server] to send control and other information to the web browser which isn't displayed, but is used to provide meta-information about the content which is about to be sent. Allowing HTTP headers to be output by the CGI program allows for arbitrary extension of the web server, allowing an old server to output newer HTTP headers [which the web server doesn't know about and thus cannot output on its own] to the web browser [which may be current enough to know how to interpret them]. However, outputting any HTTP headers [other than "Content-Type"] is optional and normally is not useful.

The "Content-Type" header denotes, as a MIME type, what type of data is contained in the content. For instance, when outputting HTML from the CGI program, the program would output the line:

Content-Type: text/html

which denotes that the content below the blank line is HTML text, and should be interpreted as such. If the CGI program were outputting the binary data contained in a GIF image, the Content-Type header would be:

Content-Type: image/gif

A blank line must be output after the last HTTP header to signify that the headers have stopped and the content is about to begin.

The content is then output, and is passed as output to the web browser. If a Content-Type of text/html is used, then HTML must be output in this section. If a Content-Type of image/gif is used, then binary GIF data must be output. In general, the Content-Type must match the data being sent, or the web browser will not be able to decode it.

An example of the output of a simple CGI program generating a short HTML file is below:

----- BEGIN EXAMPLE OUTPUT -----
Content-Type: text/html

<HTML>
<HEAD>
<TITLE>Foo bar</TITLE>
</HEAD>
<BODY>
<IMG SRC="filename.gif" ALT="[A GIF image]" WIDTH=200 HEIGHT=200 ALIGN=right>
This is the output of a CGI Program.
<HR>
<BR CLEAR=all>
Isn't it pretty?
</BODY>
</HTML>
----- END EXAMPLE OUTPUT -----

Notice that the correct header [Content-Type: text/html] is the first line of the output, followed by a blank line, followed by the HTML content as described by the header. Discussion of other HTTP headers aside, this is the general form of output for nearly all CGI programs.
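A program might assemble output of this form with a small helper like the one below. This is a hedged Python sketch; the name build_response and its parameters are invented for the example and are not part of the CGI itself.

```python
import sys


def build_response(body, content_type="text/html", extra_headers=None):
    """Return CGI output: header lines, one blank line, then the content."""
    headers = ["Content-Type: " + content_type]
    for name, value in (extra_headers or {}).items():
        headers.append(name + ": " + value)
    # The blank line after the last header marks where the content begins.
    return "\n".join(headers) + "\n\n" + body


if __name__ == "__main__":
    page = "<HTML><BODY>This is the output of a CGI Program.</BODY></HTML>\n"
    sys.stdout.write(build_response(page))
```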

Useful HTTP headers which a CGI script may like to output are [but are not limited to - check the HTTP 1.1 documentation for more info]:

Expires
Tells how long the document is valid for [for caching purposes].
Last-Modified
Tells when the document was last modified [for caching purposes].
Location
For redirecting the web browser to look for the necessary data at a different URL.
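As an illustration of these three headers, the Python sketch below formats the two caching headers as HTTP dates and builds a redirect. The function names and the "valid for N seconds after modification" policy are assumptions for the example only.

```python
from email.utils import formatdate


def caching_headers(last_modified, lifetime_seconds):
    """Return Last-Modified and Expires header lines for a document.

    last_modified is a Unix timestamp; the document is treated as valid
    for lifetime_seconds after that moment [an illustrative policy].
    """
    return [
        "Last-Modified: " + formatdate(last_modified, usegmt=True),
        "Expires: " + formatdate(last_modified + lifetime_seconds, usegmt=True),
    ]


def redirect(url):
    """Return a complete CGI response which sends the browser to url."""
    return "Location: " + url + "\n\n"


print("\n".join(caching_headers(0, 3600)))
```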

2.5 Environment Variables

The environment variables contain useful information - some regarding the web browser and machine of the client accessing the CGI script - which the program can access. There are a few very important environment variables, and many that exist but are rarely used [because they are rarely useful].

Accessing environment variables is a very system- and language-specific matter, and it is the programmer's responsibility to do this correctly. However, the names and content of the environment variables are standard.

QUERY_STRING
The data following the '?' in a form submitted with the "METHOD=get" attribute.
CONTENT_LENGTH
The length of the data waiting on standard input for POSTed forms.
REQUEST_METHOD
The method by which the CGI data was submitted [ie: "GET" or "POST"]. There are other request methods, but GET and POST are the two most common; use of POST is recommended.
REMOTE_HOST
The name of the machine from which the client is accessing the form. If the client machine is using a proxy to access the web, then this will be the name of the proxy host. If there does not exist a hostname for the client's machine, this will be unset.
REMOTE_ADDR
The IP address of the client machine. This should be the IP address which the hostname given in REMOTE_HOST resolves to [ie: they should be the same machine]. If the client machine is using a proxy to access the web, this will be the IP address of the proxy machine.
REMOTE_USER
If the user had to go through authentication to get to the script, then this will be the login with which they authenticated.
HTTP_USER_AGENT
A string describing the web browser which the client is using. This string will generally follow the format "browser/version extra_info". For instance: "Mozilla/4.0 (compatible; MSIE 4.01; Windows NT)" ["Mozilla" was Netscape's original internal name; other browsers, such as MSIE in this example, also identify themselves as Mozilla-compatible].
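How these variables are read depends on the language. As one hedged illustration, a Python program could summarize the client as follows; describe_client and its placeholder strings are invented for this sketch.

```python
import os


def describe_client(environ=os.environ):
    """Summarize who is making the request, using standard CGI variables."""
    # REMOTE_HOST is unset when the client has no hostname, so fall back
    # to the IP address in REMOTE_ADDR.
    host = environ.get("REMOTE_HOST") or environ.get("REMOTE_ADDR", "unknown")
    # REMOTE_USER is only set if the user went through authentication.
    user = environ.get("REMOTE_USER", "[not authenticated]")
    agent = environ.get("HTTP_USER_AGENT", "[no user agent]")
    return user + " at " + host + " using " + agent


print(describe_client({"REMOTE_ADDR": "10.0.0.1"}))
```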

2.6 URL-Encoding

URL-encoding is a simple method of encoding which converts spaces and other characters which might otherwise confuse the program reading the form data. Since '=' and '&' are used for special purposes in the form data [see 2.7: Form-Data Format], the program would otherwise be unable to tell the user's '&' [or '='] from the special '&' [or '=']. Therefore, all '&'s, '='s, spaces and some other characters are converted by URL-encoding; the majority of characters are left alone.

Spaces are converted to "+"; most alphabetical and numerical characters are left alone. Other characters are converted to a '%' character followed by their ASCII value in hexadecimal [ie: "%nn", where 'nn' is the two-digit hexadecimal ASCII value of the character]. For instance, since <space> is converted to '+', a '+' in the user's input is converted to '%2B', since 2B is the hexadecimal equivalent of decimal 43, the ASCII value of the character '+'.

For instance, the string "2 + 2 = 4" would be URL-encoded to read:

2+%2B+2+%3D+4

since '+' is <space>, %2B is '+' and %3D is '='.

The URL decoding process is relatively straightforward, but it is usually more useful to use a pre-written library [such as 'cgi-lib.pl'] to do the conversion and decoding.
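To make the two decoding rules concrete, here is a small hand-written decoder in Python, checked against the standard library's quote_plus. The url_decode function is a sketch for illustration and assumes well-formed ASCII input; in practice a library routine is the better choice.

```python
from urllib.parse import quote_plus


def url_decode(text):
    """Decode a URL-encoded ASCII string by hand, to show the two rules."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "+":
            out.append(" ")  # '+' stands for a space
            i += 1
        elif text[i] == "%" and i + 2 < len(text):
            # "%nn": nn is the character's ASCII value in hexadecimal
            out.append(chr(int(text[i + 1:i + 3], 16)))
            i += 3
        else:
            out.append(text[i])  # everything else is left alone
            i += 1
    return "".join(out)


encoded = quote_plus("2 + 2 = 4")
print(encoded)              # 2+%2B+2+%3D+4
print(url_decode(encoded))  # 2 + 2 = 4
```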

2.7 Form-Data Format

The format of the form data is a URL-encoded string containing a representation of the data contained in the HTML form, in the following format:

field0name=field0data&field1name=field1data&...&fieldnname=fieldndata

fieldiname is the name given to the i-th field of the form.

fieldidata is the data which the user selected [or the default value for that field if the user made no selection|modification] for the i-th field of the form.

For instance, with a form like the following:


<FORM METHOD=POST ACTION="foo.cgi">
Foo: <INPUT TYPE=text NAME="foo"><BR>
Bar: <INPUT TYPE=text NAME="bar"><BR>
Select one: <SELECT NAME="age">
<OPTION selected>20
<OPTION>21
<OPTION>22
<OPTION>23
</SELECT><BR>
<INPUT TYPE=submit VALUE="Submit me!">
</FORM>

If for the text input "foo" the user entered "2 + 2 = 4", for the text input "bar" the user entered "Hello" and for the "age" selection the user left the default value of 20, then the form data would be represented as the string:

foo=2+%2B+2+%3D+4&bar=Hello&age=20

As with URL-Encoding, parsing this information is a relatively straightforward process, but it's usually more convenient to let a library do the decoding to a set of variables which the program can then easily use.
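As a sketch of letting a library do this work, Python's standard urllib.parse module can split and decode the string for the example form above in one call. The wrapper name parse_form is an assumption for the example.

```python
from urllib.parse import parse_qs


def parse_form(form_data):
    """Split form data on '&' and '=', URL-decoding names and values."""
    # Each name maps to a list, since a form may send a name more than once.
    return parse_qs(form_data, keep_blank_values=True)


fields = parse_form("foo=2+%2B+2+%3D+4&bar=Hello&age=20")
print(fields["foo"][0])  # 2 + 2 = 4
print(fields["bar"][0])  # Hello
print(fields["age"][0])  # 20
```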


3. Programming Concerns

There are many concerns to be had regarding CGI programming, including many involving security. Many of these issues, as well, are platform-specific. If in doubt, consult your own knowledge base or your local CGI expert.

3.1 Server-Side File Access

Since CGI programs are run on the web server, they have the full privileges of any program which can be run on the web server. CGI programs have access to any and all files and devices which any other program would have. This is a double-edged sword, however, because though this means that CGI programs can draw upon any resources available on the server machine [databases, log files, attached hardware, etc], they also can corrupt and/or disrupt these same resources if used incorrectly.

If running in a single-user environment [Windows95, WindowsNT, MacOS, etc], this can be especially dangerous because the single user [and any user programs] has full access to the machine, with relatively few safeguards in place.

If running, however, on a multi-user machine [Unix], there may be appropriate mechanisms in place to help limit the amount of [potential] damage which a rogue CGI script can do. It can also limit the data which the CGI would have access to, but careful planning can avoid this problem.

Allowing the CGI program to have access to files on the server has many advantages. First, it gives the CGI program the ability to refer to files with more information. For instance, a CGI program may be able to connect to a product-information database and return product information to the user, without requiring all the data to be formatted into HTML pages ahead of time. Secondly, it gives the CGI program the much more important ability to write and store information on the web server. This allows the CGI program to have persistence of data. The user may fill out a form, the content of which the CGI program saves and associates with a userid which is given to the user. The next time the user uses the CGI program, they can give their userid, and the information given beforehand can be recalled, without the user re-entering it, to whatever end is deemed necessary.

3.2 File Access and Permissions

NOTE:

The data in this section assumes that the CGI program is running in a multiuser environment with strict permission-based access privileges between users; specifically, this section assumes knowledge of Unix and Unix file permissions. This information probably will not pertain to single-user web servers such as Windows95 or WindowsNT.

The web server runs under a specific identity, which limits and controls which files the web server has access to. This is a good thing for many security reasons, but it can also be a nuisance at times. Many are accustomed to making their HTML files world-readable, which allows the web server process [which runs under a different user-id from the user who created the HTML files] to access the files and serve them up to the user accessing their page. The same must be done for CGI scripts. Since the scripts must be executed by the web server, they must analogously be world-executable. There may be other restrictions on CGI programs [for example: that they must end with ".cgi"]; see your web server's documentation for details.

Another important note regarding user identities as they relate to CGI programs is the fact that programs run as the user which started the process. So, a CGI program which is started by the web server will run with the identity of the web server, not with the identity of the user who created it. This is especially important for write-access to files. If a CGI program, written by the user "luser", relies on a file "cgidata" which luser has in her HTML-file directory, when it is run by the web server [with user-identity "www"], it will not have access to the "cgidata" file, unless luser made the file world-readable. If the CGI program is to write to the file, things get even worse. luser would not want to make the file world-writable, since any other user on the machine could write to the file, which is a Bad Thing. But since the CGI program runs as www and not as luser, that is the only way to write to the file. Since most CGI applications will want to write to files, this is a problem.

Fortunately, there exists a better solution. Unix has a special permission for programs called "setuid". This means that when the program is run, it runs with the identity of the user who owns the file, not the identity of the user executing the program. Thus, when the web server [user "www"] executes luser's program, it runs with luser's permissions. luser can then make the file user-writable, and the program would have the ability to write to it, but other users would not. Use of the setuid bit is somewhat dangerous since it allows access to otherwise private files, but with care can be used to great advantage.
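Before relying on these permissions, it can help to check what a file's mode actually grants. The Python sketch below reports the bits discussed in this section; the function names are invented for the example, and it only inspects modes without changing them.

```python
import os
import stat


def describe_mode(mode):
    """Report the permission bits discussed above for a numeric file mode."""
    return {
        "world_readable": bool(mode & stat.S_IROTH),
        "world_writable": bool(mode & stat.S_IWOTH),
        "world_executable": bool(mode & stat.S_IXOTH),
        "setuid": bool(mode & stat.S_ISUID),
    }


def describe_file(path):
    """Same report, for a file which exists on disk."""
    return describe_mode(os.stat(path).st_mode)


# A setuid, world-executable program [mode 4755]:
print(describe_mode(0o4755))
```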

3.2.1 Platform-Specific Issues

As noted, this information is Unix-specific. However, the concept applies to any multiuser operating system with user-based file permissions. In a single-user operating environment, the web server and the user who created the CGI script are equivalent, and thus the above discussion is irrelevant.

3.3 Security Concerns

There are a couple of specific security issues to be concerned with while writing CGI programs. Both or neither may apply to any given CGI program, but both must be considered when writing a CGI program.

3.3.1 Watch what is done with form data.

If form data is requested from the user, chances are that it will be processed and used by the program for internal purposes. The important thing to look for is that user-supplied data is not passed verbatim to any sort of execution call.

An execution call is something that will perform an effect on the system, such as starting a program [eg: 'exec(...)' in C/C++] or interpreting a statement in the programming language being used [eg: 'eval' in perl]. A malicious user may figure out what is going on, and a well-crafted string delivered through the HTML form may be inadvertently passed on to the execution call, which may then remove files from the system, give the malicious user access to the system, or any other number of nasty things.
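One common defence is to check form values against a strict pattern and to start programs with an argument list rather than a shell command string. The Python sketch below illustrates the idea under stated assumptions: the pattern, the function names, and the use of the Python interpreter itself as a stand-in for some external program are all hypothetical.

```python
import re
import subprocess
import sys

# Hypothetical rule: a name is 1-32 letters, digits, '_' or '-'.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]{1,32}")


def is_safe_name(value):
    """True only if the whole value matches the allowed pattern."""
    return SAFE_NAME.fullmatch(value) is not None


def run_lookup(name):
    """Pass a checked form value to an external program without a shell."""
    if not is_safe_name(name):
        raise ValueError("rejected form value")
    # An argument list is handed to the program as-is; no shell ever
    # interprets characters such as ';' or '|' in the user's data.
    stand_in = [sys.executable, "-c", "import sys; print(sys.argv[1])", name]
    result = subprocess.run(stand_in, capture_output=True, text=True, check=True)
    return result.stdout.strip()


print(is_safe_name("luser"))        # True
print(is_safe_name("x; rm -rf /"))  # False
```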

3.3.2 Don't allow access to binaries.

This is especially an issue with perl. If the perl executable is available in the [oft-used] cgi-bin/ directory, then a user can get at the executable directly. Since perl has a command-line option to execute arbitrary perl commands, this would allow a malicious user to execute any perl command, giving them a perfect opportunity to stage a [probably successful] attack on the machine in question.


4. Programming Paradigm

Using the CGI requires that a specific paradigm be used for programming. There are three basic issues which affect the design of programs which use the CGI:

  1. CGI programs are non-interactive.
  2. Because of this, CGI programs require segmented user interaction.
  3. Multiple CGI processes may be running at the same time.

4.1 Non-Interactive Processing

The CGI Program doesn't continue to run while the user is interacting with the form data. There is no way for the CGI program to respond to a user's form changes in real time. Javascript can be used to handle some of this, but Javascript is not an elegant or ideal solution.

This, combined with the limitations of HTML forms, has created a rather standardized first step in almost every CGI program: error-checking of form fields. If the data in the form fields is inconsistent with what is required or the data is not present, then the CGI program will not be able to execute properly, and the user must be made aware that the form was submitted improperly. Therefore, a good first step for any CGI program is to perform error checking on all form fields received.
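Such an error-checking step might look like the Python sketch below. It uses the field names from the example form in 2.7; the rules themselves [every field required, "age" numeric] are assumptions chosen for illustration.

```python
def validate_form(fields):
    """Return a list of error messages; an empty list means the form is usable."""
    errors = []
    # Check that every expected field is present and non-blank.
    for name in ("foo", "bar", "age"):
        if not fields.get(name, "").strip():
            errors.append("Field '" + name + "' is required.")
    # Check that the data is consistent with what the program needs.
    age = fields.get("age", "").strip()
    if age and not age.isdigit():
        errors.append("Field 'age' must be a whole number.")
    return errors


print(validate_form({"foo": "2 + 2 = 4", "bar": "", "age": "old"}))
```

If the list is non-empty, the program would typically re-display the form along with the messages instead of continuing.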

4.2 Segmented User Interaction

Because of the above fact, one of the most frustrating aspects of CGI programming is the fact that interaction with the user [through HTML forms] is segmented up into web-page-sized chunks. Since the program does not continue to run between multiple form submissions, data from previous stages of processing must somehow be saved for future pages to deal with.

Because of this, complex CGI programs often take the form of a collection of programs, each of which processes one form and has the ability to do the necessary processing. Libraries of common program behavior [accessing server-stored files, etc] are crucial to CGI programming.

This is where the topic of persistence comes into play most importantly. Say, for instance, that the CGI program determines user information and authorization on the first page; on the third page, that data is required. What happens to the data in the meantime? Since the CGI programs stop running, the data cannot be simply stored into a data structure in memory and accessed later when necessary... what to do, what to do?

There are a few solutions to this problem:

  1. Require the user to authorize on the second page. If the user's authorization isn't really required to create the second page, then this may be an option... but in general the user's authorization is required as early as possible [so the appropriate data can be shown] and near the end of the processing [to authorize changes being made to data].
  2. Store the user's authorization data on the web server. A combination of the IP address of the client, a timestamp and the identity should work, given a reasonable timeout|expiration value. But if the user goes to get a cup of coffee and comes back to finish processing the form, the timeout will have expired and the user will have to re-authorize... which isn't the best thing in the world.
  3. Client-side persistence through HIDDEN fields. The <INPUT> HTML form field has a special type of "HIDDEN", the basic meaning of which is that it will be submitted with the form, but it is not displayed by the web browser with the form. However, since it is part of the HTML of the form, it could conceivably be edited by the client, and this can introduce a security hole of a sort.
  4. Client-side persistence with cookies. This is another viable option, but one which carries caveats of its own.
  5. [from jwang@csua] Define an authorization realm in the .htaccess on the server, and use it to protect the CGI URL. The client will need to authorize when reaching the CGI URL, and authorization will be implicit through the rest of processing. This is a great solution to user authorization persistence, but any other stored, persistent data will need to be handled by one of the other methods.
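For the HIDDEN-field option, one way to narrow the security hole is to send a signature alongside each value, so that the next program in the chain can detect edits made by the client. This technique is not described in the original text; the Python sketch below is an illustration only, and the secret key, field naming and function names are all hypothetical.

```python
import hashlib
import hmac
import html

# Hypothetical server-side secret; it must never be sent to the client.
SECRET_KEY = b"change-me"


def sign(value):
    """Return a hex signature which only holders of SECRET_KEY can produce."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def hidden_fields(name, value):
    """Return HTML for a HIDDEN field plus a companion signature field."""
    safe_name = html.escape(name)
    return (
        '<INPUT TYPE=hidden NAME="' + safe_name + '" VALUE="' + html.escape(value) + '">\n'
        '<INPUT TYPE=hidden NAME="' + safe_name + '_sig" VALUE="' + sign(value) + '">'
    )


def verify(value, signature):
    """True if the submitted value still matches its submitted signature."""
    return hmac.compare_digest(sign(value).encode("ascii"), signature.encode("utf-8"))


print(hidden_fields("userid", "luser"))
```

A signature shows that a value was not altered; it does not hide the value from the user, and it does not stop an old form from being resubmitted.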

None of these solutions are really elegant. This fact is what makes CGI programming a royal pain after a while.

This segmented user interaction must be designed into the program from the beginning. One of the most important things you can do after you have the concept of what the CGI program is going to do [overall], is to decide through mock-ups of the forms how it is going to go about doing it. The data required for each stage of processing must be determined beforehand and carried throughout the different stages of processing. Design is very important in CGI programming for all but the simplest programs.

4.3 Multiple Simultaneous Invocations

Most programs are written with the assumption that only one copy will be running at any given time. Therefore, access to data files will only be made by one copy of the program. However, that is not always true of CGI programs, for which any number of copies may be running at any one time [one for each user submitting a form]. For reading of data files, where no changes are being made, this isn't a problem. But as soon as modification of the data files [through writing] is introduced, this becomes a major problem.

If one process is writing a file while the other is reading, the reading process may read only partially-written data, resulting in incomplete data. If two processes are writing at the same time, their data may be interleaved and completely unusable. This can result in corrupt and lost data.

The [somewhat] simple answer to this problem is file locking. Any process which is going to write to a data file must first obtain a lock on the file before doing its writing. All others must wait for the first process to finish before continuing. If the programming environment supports native file locking, use it. If not, a lock file can be used: writing processes create it before touching the data files and delete it when they are done. All other processes must check for the existence of this file before proceeding; if the file exists, they must periodically check back until it has been deleted, and only then continue. However, using a lock file is not an ideal solution because a race condition [where two processes run near-simultaneously and both think that they have the lock] may develop. Native file locking is the best solution, since it is usually set up in such a way as to limit race conditions.
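As an illustration of native file locking, the Python sketch below uses the Unix flock call through the standard fcntl module. The function names and the one-record-per-line file layout are assumptions for the example; note that these locks are advisory, so they only protect against other processes which also ask for the lock.

```python
import fcntl
import os
import tempfile


def append_record(path, record):
    """Append one line to a data file while holding an exclusive lock."""
    with open(path, "a") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # waits until no one else holds a lock
        try:
            handle.write(record + "\n")
            handle.flush()
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def read_records(path):
    """Read all lines while holding a shared lock, so no writer is mid-write."""
    with open(path) as handle:
        fcntl.flock(handle, fcntl.LOCK_SH)  # many readers may share this lock
        try:
            return handle.read().splitlines()
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "cgidata")
    append_record(path, "foo=Hello")
    print(read_records(path))  # ['foo=Hello']
```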


5. Conclusion

CGI programming presents various challenges for the programmer. Its segmented user interaction and limited text-input abilities present hoops which the programmer must jump through. However, because of the ubiquitous ability for people to use resources written using the CGI, its benefits often far outweigh these drawbacks.

Observe the following key points when writing CGI programs:

Keeping these points in mind will help you to write correct, error-free and useful CGI programs.


6. Resources for CGI Programmers

6.1 hoohoo.ncsa.uiuc.edu/cgi/ - The Common Gateway Interface

The people at the National Center for Supercomputing Applications [NCSA] developed HTTPd, one of the original web servers. They also developed the CGI. This site is the technical description of the CGI, including descriptions of all Environment Variables available to CGI programs.

6.2 www.cgi-resources.com/ - The CGI Resource Index

A collection of pre-written scripts and libraries, books, documentation, CGI Programmers [ie: free-lance people] and Job listings.

6.3 cgi-lib.stanford.edu/cgi-lib/ - The cgi-lib.pl Home Page

The cgi-lib.pl library in all its glory. As the page says: "The cgi-lib.pl library has become the de facto standard library for creating Common Gateway Interface (CGI) scripts in the Perl language." You should find this useful, especially if you're doing your CGI programming in perl.

6.4 www.w3c.org/ - W3C - The World Wide Web Consortium

The standardization body for the web, they define HTML, HTTP, and many other web related [and many web-unrelated] things. The definitive source for your technical WWW questions.

6.5 www.w3c.org/Markup/ - W3C's HTML Home Page

The official page for the definition of HTML. This has the technical description of HTML, and is a good reference for HTML tags, and FORM fields, if you're willing to wade through some technical fluff.

6.6 www.genome.wi.mit.edu/ftp/pub/software/WWW/cgi_docs.html - CGI.pm - a Perl5 CGI Library

[from tmonroe@csua]:
"For people who write CGI's in Perl, it would at least be fair to mention CGI.pm, a module that puts up many useful abstractions and a few other things, and lets you avoid dealing with environment variables entirely. If it is not in your local base Perl 5 distribution, it can be obtained from CPAN.

"My experiences with CGI.pm have been pretty good; one of the reasons I like it is that creating forms is very easy."


7. Examples

7.1 Simple form example.

7.2 Basic form decoding/output example.

7.3 Example of generating web content from a server-side file and an identifier embedded in a form.

7.4 CGI-based image selection; use of "Content-Type" for something other than "text/html".


Appendix 0: CGI Environment Variables

[ Taken from http://hoohoo.ncsa.uiuc.edu/cgi/env.html. ]

Not request-specific; set for all requests:

SERVER_SOFTWARE
Name and version of the web server software. Format: "name/version". Example: "NCSA/1.5.2".
SERVER_NAME
The hostname of the web server; in lieu of the hostname, an IP address will be given.
GATEWAY_INTERFACE
The revision of the CGI Interface with which the web server complies. Format: "CGI/revision". Example: "CGI/1.1".

The following are request-specific; they will only be set if they are pertinent to the request being sent.

SERVER_PROTOCOL
The name and revision of the protocol type with which the web server accepted the form data. Format: "protocol/revision". Example: "HTTP/1.0".
SERVER_PORT
The port number on which the web server accepted the request.
REQUEST_METHOD
How this form was submitted. One of "GET", "POST" or "HEAD".
PATH_INFO
Extra path information as given by the client. Decoded by server before being passed to the CGI program. Not useful.
PATH_TRANSLATED
Translated path information. Not useful.
SCRIPT_NAME
A virtual path to the CGI program being executed. The program can use this to make a self-referencing URL. Not extremely useful.
QUERY_STRING
The data after the '?' in the URL which referenced the CGI program. This is the form data from a GETed form.
REMOTE_HOST
The hostname of the client accessing the CGI program. If a hostname does not exist for the client's machine, this will be left unset.
REMOTE_ADDR
The IP address of the client accessing the CGI program.
AUTH_TYPE
The type of authentication the user had to go through to get to the CGI program. If the user did not go through authentication [as is often the case], this will be left unset.
REMOTE_USER
The identity the user authenticated with, if authentication was necessary to get to the CGI program. If the user did not authenticate, this will be left unset.
REMOTE_IDENT
"If the HTTP server supports RFC 931 identification, then this variable will be set to the remote user name retrieved from the server. Usage of this variable should be limited to logging only." [from http://hoohoo.ncsa.uiuc.edu/cgi/env.html] Not useful, usually not set.
CONTENT_TYPE
The MIME type of the data attached to the request, as with a POSTed [or PUT] form.
CONTENT_LENGTH
The length of the form data on standard input.

Misc headers. Any other headers are prefixed with "HTTP_" and have all '-' converted to '_'. Examples are HTTP_ACCEPT and the User-Agent header.

HTTP_ACCEPT
A string containing a comma-separated list of MIME types which the client will accept. Format: "type/subtype, type/subtype, ...".
HTTP_USER_AGENT
The browser which the client is using. General format: "software/version library/version". Common format: "software/version extra_info". Example: "Mozilla/4.0 (compatible; MSIE 4.01; Windows NT)".

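The naming rule for these miscellaneous headers [prefix with "HTTP_", convert '-' to '_', use upper case] is mechanical, and can be sketched in one line of Python. The function name is invented for the example.

```python
def header_to_env_name(header):
    """Map an HTTP request header name to its CGI environment variable name."""
    return "HTTP_" + header.upper().replace("-", "_")


print(header_to_env_name("User-Agent"))  # HTTP_USER_AGENT
print(header_to_env_name("Accept"))      # HTTP_ACCEPT
```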

Appendix 1: Invoking CGI Programs Without the Web Server

It is often useful to be able to start CGI programs without having to go through the trouble of using the web-based interface. For instance, perl programs which do not work correctly will simply generate an error when started through the web server, and will not generate any output. The programmer may want to run a debugger on the CGI program, which is impossible when the CGI program is run through the web server.

However, since the form output from the web server to the CGI program is in a well defined format, we can generate any given form data for which we would like to examine the effect on the program [for debugging purposes]. Alternatively, a simple sample script can be used to capture the form data, already converted into the correct format by the web server, to a file to later be piped into our program to be debugged. This script is available below, and is also available as formDataSave_cgi.txt.

#!/usr/bin/perl
use strict;
use warnings;

print "Content-Type: text/html\n\n";

# Print the HTML header
print <<EOHH;
<HTML>
<HEAD>
<TITLE>formDataSave.cgi</TITLE>
</HEAD>
<BODY BGCOLOR="#ffffff">
EOHH

# Get the request method
my $reqMeth = $ENV{"REQUEST_METHOD"} || "";

# Fetch the form data from wherever the request method puts it
my $data;
if ($reqMeth eq "GET") {
    # If the method is "GET", the data is in the QUERY_STRING
    # environment variable
    $data = defined $ENV{"QUERY_STRING"} ? $ENV{"QUERY_STRING"} : "";
} elsif ($reqMeth eq "POST") {
    # If the method is "POST", the data is on standard input; it is not
    # EOF-terminated, so read exactly CONTENT_LENGTH bytes
    $data = "";
    read(STDIN, $data, $ENV{"CONTENT_LENGTH"} || 0);
}

if (!defined $data) {
    # Otherwise, we don't know.
    print "<H1>Cannot determine form data method</H1>\n";
} elsif (open(my $fd, ">", "form.data")) {
    # Save the form data to the file, and echo it to the browser
    print $fd $data;
    close $fd;
    print "Form Data: $data";
} else {
    print "<H1>Error opening \"form.data\"</H1>\n";
}

# Print the HTML footer
print <<EOHF;
</BODY>
</HTML>
EOHF

# Done

An important point to remember is to set the CGI environment variables correctly, as would be setup by the web server. In effect, you must create a similar environment to that which the web server would create for execution of the CGI program. The important environment variables to set are the REQUEST_METHOD and the CONTENT_LENGTH variables. In addition, any other environment variables which the program uses [HTTP_USER_AGENT, REMOTE_ADDR, etc] should be set to appropriate values.

The REQUEST_METHOD variable should be set to the appropriate method by which the form data would be received by the CGI program. In addition, the form data should be placed in the appropriate place. If the form is POSTed, REQUEST_METHOD should be set to "POST", and the form data should be piped in on standard input. If the form data is GETed, REQUEST_METHOD should be set to "GET" and the form data should be placed in the QUERY_STRING environment variable.

If the data is POSTed, then the CONTENT_LENGTH environment variable should be set to the length of the file which will be piped into the program's standard input. This will allow the CGI program to read the correct amount of data.

When using a command-line environment and the "POST" method, the faux-web-server form data should be piped directly into the program. An example at the UNIX command-line would be the following [the typed-in command follows the prompt]:

user@machine [~/public_html] cat form.data | program_name.cgi
<< CGI output for that form data follows >>

When using a command-line environment and the "GET" method, the faux-web-server form data should be put in the QUERY_STRING environment variable. An example of doing this at the UNIX command-line would be the following [the typed-in command follows the prompt]:

[with sh or bash]:

user@machine [~/public_html] QUERY_STRING=`cat form.data`; export QUERY_STRING

[with csh or tcsh]:

user@machine [~/public_html] setenv QUERY_STRING `cat form.data`

After the form data is in the correct location, the program can be run and the output will come as the web server would receive it.
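The same setup can be scripted. The Python sketch below builds the environment described in this appendix and runs a program with it; run_cgi is an invented helper, and ECHO_CGI is a hypothetical stand-in for a real CGI program which simply reports what it was given.

```python
import os
import subprocess
import sys

# Stand-in for a real CGI program [hypothetical]: echoes what it was given.
ECHO_CGI = [sys.executable, "-c", (
    "import os, sys\n"
    "n = int(os.environ.get('CONTENT_LENGTH') or 0)\n"
    "print(os.environ['REQUEST_METHOD'], os.environ['QUERY_STRING'], sys.stdin.read(n))"
)]


def run_cgi(command, form_data, method="POST"):
    """Run a CGI program as a web server would; return its raw output."""
    env = dict(os.environ, REQUEST_METHOD=method)
    if method == "POST":
        # POST: data goes on standard input, its length in CONTENT_LENGTH.
        env["CONTENT_LENGTH"] = str(len(form_data))
        env["QUERY_STRING"] = ""
        stdin_data = form_data
    else:
        # GET: data goes in QUERY_STRING, and nothing on standard input.
        env.pop("CONTENT_LENGTH", None)
        env["QUERY_STRING"] = form_data.decode("ascii")
        stdin_data = b""
    result = subprocess.run(command, input=stdin_data, env=env,
                            capture_output=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(run_cgi(ECHO_CGI, b"foo=2+%2B+2+%3D+4&bar=Hello&age=20"))
```

To debug a real program, replace ECHO_CGI with the command which starts it, for instance a list containing the path to the CGI script.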


Author: Josh Sled
$Id: cgi-programming.html 1.3 Fri, 20 Mar 1998 21:52:06 -0800 jsled $