What is source code?

Defining the concept of "source code"

Introduction

The term "source code" (or "source", for short) is often used in computer science and technology, especially when speaking about free software, but also when discussing proprietary software. Unfortunately, the term is not always used with a precise meaning: it is sometimes misused, and sometimes used in a rather vague manner.

Since the concept of source code is pivotal for a number of fundamental properties of free software, a clear definition of source code is needed. Such a definition should be meaningful for computer programs as well as for all other categories of software.

Definition of source code

The most commonly used and accepted definition of source code (at least in the context of free software) is the one found in the GNU GPL:

Source code: The preferred form of a work for making modifications to it.

It should be noted that the above quoted definition of source is:

short and elegant
quite robust, as it almost always identifies a single source form for any given work
flexible enough to cover virtually all cases

Other forms of the work which are automatically produced from the source code are called object code:

Object code: Any non-source form of a work (directly or indirectly) generated from the source code by an automated process.

Any form of a work which is ready to use for the end user may be called a final form.
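
As an illustration of the distinction between source code and object code, here is a minimal Python sketch. The one-function program and the names `source_text` and `object_form` are hypothetical, chosen only for this example. The sketch assumes that Python bytecode is a fair stand-in for object code: it is generated from the source by an automated process and, although it still runs, nobody would prefer it for making modifications.

```python
import marshal

# Source form: the text a programmer would actually edit (hypothetical example).
source_text = "def area(width, height):\n    return width * height\n"

# An automated process (the built-in compiler) turns the source into a code object.
module_code = compile(source_text, "<example>", "exec")

# Serializing the code object yields bytes: a non-source form generated from the source.
object_form = marshal.dumps(module_code)

# The object form is still usable: it can be loaded back and executed.
namespace = {}
exec(marshal.loads(object_form), namespace)
result = namespace["area"](3, 4)
print(result)
```

Both forms describe the same program, but only `source_text` is a form one would choose to edit.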

FAQ

Is this really the most commonly used and accepted definition of source code? I am not convinced...

No one has yet proposed a better definition of source code. Until someone comes up with a superior definition, the one found in the GNU GPL seems to be the best definition available for the concept.

As a consequence, this is the most widely adopted definition of source code.

This definition looks too complicated! Could source code be simply defined as any form of a work that can be reasonably modified?

The proposed alternative definition is indeed simpler, but it is of little use: even a binary executable file can be reasonably modified with a hex editor. In fact, any modification can be made to a binary file with a hex editor: some kinds of modifications are just more difficult to make on binary files than on other forms.

In other words, any form of a work would fit the simpler definition, thus causing the concept of source code to become pointless.
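
The point about hex editors can be sketched in a few lines of Python. The helper `patch_bytes` and the sample bytes are hypothetical: they stand in for what a hex editor does to a real binary file, namely overwriting bytes at a chosen offset.

```python
def patch_bytes(data: bytes, offset: int, replacement: bytes) -> bytes:
    """Overwrite len(replacement) bytes of data starting at offset."""
    if offset < 0 or offset + len(replacement) > len(data):
        raise ValueError("patch does not fit inside the data")
    patched = bytearray(data)
    patched[offset:offset + len(replacement)] = replacement
    return bytes(patched)

# Arbitrary bytes standing in for the contents of a binary file.
binary_form = b"\x00\x01\x02Hello\x00\xff"

# Locate the embedded text and overwrite it in place, as one would in a hex editor.
offset = binary_form.index(b"Hello")
patched_form = patch_bytes(binary_form, offset, b"HELLO")
print(patched_form.hex())
```

The modification is possible, but it requires knowing the exact offset and keeping the length unchanged. This is the kind of extra difficulty that makes such a form modifiable without making it preferred.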

What if the author edits the work directly in the form which is used by the end user? There is no separate source form in these cases...

There are indeed several cases where the preferred form for making modifications is also ready to use for the end user. There's nothing wrong with such cases: the source form of the work is also a final form. It's true that there's no distinction between the final form and the source form of these works, but this is not a problem at all.

Examples of this situation include, for instance, many scripts written in an interpreted programming language (such as POSIX shell, AWK, Perl, Python, Ruby, and so forth), web pages manually written in HTML, and many more...

The definition of source code talks about a "preferred form": preferred by whom?

The person whose preference should be taken into account is the one who last modified the version of the work under consideration. If he/she prefers to modify the work in a given form, then that form is the source code for the work.

If someone actually modifies a work (in a non-trivial manner), he/she has shown in practice what his/her preferred form for making modifications to the work is. This is a much stronger indication of preference than simply claiming in a vacuum what the preferred form would be for modifications that have not (yet) actually been made.

Can the source for a work change form?

Yes, it can. It may happen that the source code for a modified version of a work is in a different format from the source for the original work.

For instance, consider the case where a program originally written in, say, Pascal is (manually or automatically) translated into, say, C++. After that point, the maintainer may go on modifying the program by editing the C++ code, if he/she so prefers; the source code for the modified program is then the C++ code, rather than the original Pascal code, since the C++ code is the form actually preferred for making further modifications.

What if two forms of a work are equally preferred for making modifications? Does the definition of source fail in such cases?

There is no problem at all in such situations. If two or more forms of a work are equally preferred for making modifications to the work itself, then none of them is better than the others: when this is the case, any of these forms can correctly be considered the source code of the work.

Suppose an author uses reasonably readable code (with comments, indentation, sensible naming conventions, and so forth) to modify a program, but only distributes stripped code (with comments and indentation removed, names mapped to arbitrary identifiers, and so forth) to others. Can the stripped code be considered source code?

No, it cannot. The stripped code under consideration is a clear case of deliberately obfuscated code: it's not source code, since the author prefers to use the readable code for making modifications to the program. As a consequence, the source form of the program is the readable code, not the stripped one.
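
A small Python sketch may make the difference concrete. Both functions are hypothetical and written only for this example. Python requires indentation, so the stripped version below only loses its comments and meaningful names.

```python
# Readable form: the one the (hypothetical) author actually edits.
def average_word_length(text):
    """Return the mean length of the whitespace-separated words in text."""
    words = text.split()
    if not words:
        return 0.0
    total_letters = sum(len(word) for word in words)
    return total_letters / len(words)

# Stripped form: same behavior, with comments removed and names made arbitrary.
def f(a):
    b=a.split()
    return sum(len(c) for c in b)/len(b) if b else 0.0

sample = "the preferred form of a work"
same_behavior = average_word_length(sample) == f(sample)
print(same_behavior)
```

The two functions compute the same result, yet an author who keeps the first one for his/her own edits evidently prefers it. Under the definition, only the readable form is the source.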

What if the preferred form for modifications no longer exists?

If some form of a work no longer exists, it cannot be the preferred form for making modifications to the work itself.

It is one thing when the author/maintainer uses a form of the work to make modifications (because he/she prefers that form), but does not make this form available to others. In this case, the actual source is being kept secret, and the work, if distributed in some other form, is not free software.

It is a completely different thing when nobody has some form of the work any longer. That form cannot be preferred for making modifications, since it no longer exists. In this case, the actual source is the form which is preferred for making modifications among the existing ones.

What happens when someone edits a work in an uncompressed or space-inefficient form, then generates a compressed or space-efficient form and deletes the original form because it takes up too much storage? Plenty of sane people will do this with audiovisual works, for instance.

In this case, by deleting the space-inefficient form, he/she clearly shows that he/she prefers to keep the space-efficient form for future modifications, rather than the original form. Hence, in this case, the actual source is the space-efficient form, being the preferred form for making further modifications.

Sometimes, the space-inefficient form may be significantly larger than the space-efficient one: the former may become really impractical to handle. In these cases, it may be preferable to use the space-efficient form to make further modifications, just for practical reasons: at that point, the preferred form for making further modifications is that space-efficient form, which is consequently the actual source, even if the space-inefficient form is not deleted.

Maybe the person who edited the work is not interested in making further modifications. That could be the reason why he/she deleted the space-inefficient form. Is the source lost, in this case?

The fact is, whatever that person may think, sooner or later, the need for further modifications may indeed arise. At that point, that person will answer the question "what is the preferred form of the work for making modifications?" with "I would have preferred that other form, but I discarded it, hence I prefer this one, among the ones I kept around...".

The source is therefore not lost: it is once again to be found among the existing forms of the work.

Consider the case where an author keeps a space-inefficient form of a work (intended for future modifications), but refuses to make it available to other people, due to practical issues (lack of a good enough network link, for instance). What if he/she distributes the work only in a space-efficient form?

When the original author of a work keeps the preferred form for making further modifications (that is to say, the source code), but refuses to make it available to others while distributing the work in some other form, that work is not free software.

In other words, you're not distributing free software if you keep the source undisclosed.

What about digital photographs? The source consists of the photographed things and living beings. How can the source be copied and distributed in this case?

The photographed things and living beings are not the preferred form of the picture for making modifications to it. They are instead what one would need in order to re-create the photograph from scratch.

The source code for a digital photograph is the picture itself in the form which is preferred for making modifications. Depending on which format is extracted from the digital camera and on which form one starts with when making modifications, the source may be in raw format, JPEG, or some other form...

Francesco Poli

This work is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License, version 2. It comes with absolutely no warranty. See the license text for details.