The Development of Statistical Computing at Rothamsted

An account is given of the development of statistical computing at Rothamsted. It is concerned mainly with the period from 1954 (when the first electronic computer was delivered) until 1985 (when this article was written). Initially, many specialised programs were written, but it was soon realised that, for efficiency, general‐purpose programs—each unifying many statistical techniques—were required. The development of these programs was gradual and required corresponding developments in statistical theory. Now, the bulk of statistical work, not only for Rothamsted but also for the Agricultural and Food Research Service (AFRS) as a whole, is covered by a few programs, notably Genstat, which has an international market. Further developments of these programs are required to make them more accessible to scientists who are not well versed in statistics and to take advantage of technological advances.


Introduction
Applied statistics involves extensive calculation that always seems to have been limited by the computational facilities available; theoretical advances in statistics often stem from observations made during the course of calculation. Thus, statisticians have long had a deep involvement with computing, and, nowadays, computers may be regarded as their laboratory instruments. Around the turn of the century, the Biometric school, associated with the names of Karl Pearson, Galton and Weldon, relied heavily on Brunsviga calculators, with which were computed the astonishingly extensive Biometrika Tables for Statisticians; Brunsvigas were still in use at Rothamsted well into the 1960s. A letter from Student (W. S. Gosset, whose employer, Guinness, required a pseudonym) dated September 1919, in reply to one from R. A. Fisher on his appointment to Rothamsted, advises on suitable calculating machines, mentioning the Triumphator (an improved Brunsviga), the Millionaire (an improved Tait) and the Burroughs adder, on which Student comments 'unless you've a horrid lot of tots to do I fancy that is rather expensive'. Fisher did not buy the Triumphator, but he soon bought a Millionaire for the Statistics Department at a cost said to exceed £200, a considerable investment at the time but commensurate with the importance of statistical calculation. Yates bought two more in 1939 and was still using one in 1980.
At Rothamsted, the 1920s and 1930s saw a rapid development in experimental designs and associated methods for their analysis. During this period, an increasing number of experiments was analysed (115 in 1934, rising to 437 in 1951). Also in this period, the first edition of Statistical tables for biological, agricultural and medical research (Fisher & Yates, 1938) was published, with the work shared between the Galton Laboratory and Rothamsted. Under Frank Yates, the 1930s saw the beginning of the use of sample surveys in agriculture, whose analysis by the post-war years had become a substantial departmental commitment. The results were put on to punched cards and analysed externally at a Hollerith Bureau (Hollerith being part of the British Tabulating Machine Company, a British counterpart of International Business Machines Corporation). In the 1948 Annual Report, we find the statement: "During the year the department has had the use of a sorter-counter and arrangements have been completed for the installation of a rolling total tabulator and the replacement of the sorter-counter by a sorter." This equipment was delivered in June 1949, when it was found that 'having a machine on the spot under the direct control of the research workers has resulted in a far more enterprising and flexible approach to punch-card work than was the case when all the tabulation had to be carried out at a separate bureau'. This claim was indeed borne out when, in 1950, work was reported using this equipment (by then slightly modified) for the analysis of surveys, the analysis of replicated experiments, multivariate analysis and distributional problems. The equipment had been designed for accounting work, and considerable ingenuity had clearly been exercised to cope with this range of statistical problems, many of which involved multiplication, not a built-in facility of punched-card tabulators.
As was to be expected, the availability of the equipment was a stimulus to the development of statistical methodology in previously neglected fields. The equipment was already becoming overloaded; by 1951, a reproducer-summary punch had also been acquired that allowed more flexibility by providing simple facilities analogous to the storing on file of intermediate results. Nevertheless, by 1952, the limitations were being felt: "One stumbling-block is the very heavy computing involved - punched-card machines are valuable in this field, but are by no means a satisfactory solution to the problem." It is not surprising, therefore, to find in 1953 that Healy and Rees attended a programming course for electronic machines held at the Mathematical Laboratory at Cambridge, where there happened to be a prototype computer, the Elliott-NRDC 401, which in 1954, on the advice of a Visiting Group, was moved to the Statistics Department at Rothamsted. This article is mainly concerned with subsequent developments.

The First Computer
The national picture in computing in 1954 was that important prototype computers were working in research laboratories at Cambridge University (EDSAC), the National Physical Laboratory (Pilot ACE) and at Manchester University, UK. These machines had been running for several years, and, in spite of Professor Hartree's optimistic statement that two EDSACs would satisfy all the nation's scientific computing requirements for the foreseeable future, the first generation of commercially produced machines was just coming on to the market.
The Elliott-NRDC 401 was itself the prototype for a moderately successful range of commercial machines. It was the first computer to be associated primarily with agricultural research and with statistics (formally 50% of its time was available to NRDC, the National Research Development Corporation, but little, if any, of this was taken up). Isolated statistical computer programs had been written elsewhere but usually in universities and for particular research projects. Although such uses of computers were of interest, the Statistics Department was also faced with the problem of how best to cope with a large and increasing amount of statistical computation of a more routine nature.
The limitations of the early computers are perhaps not familiar to the younger generation. All differed, but the 401 was in many ways typical. It could support only one user at a time, it had no compilers so that all programming was in machine code, it was supplied with no software, it had a tiny 'fast access' store (five words), backing store was switched by electromechanical relays, input was through a very slow and unreliable paper-tape reader, and output was onto an electric typewriter. Instructions and data held on a rotating disc could be read only when they passed the reading heads, and efficient programs therefore required 'optimal programming' to ensure that the successive instructions passed the reading heads just at the right moment; otherwise, the disc would have to waste a rotation or part of one. Because there was a single user, often the programmer himself, programs could be partially controlled from the hand-switches on the computer console. This allowed programs to be stopped and started, or, as an aid to finding faults, instructions could be obeyed step-by-step (a device now commonly available in a somewhat more sophisticated form in present-day operating systems), or convergence of an algorithm could be identified by visual inspection of oscilloscope monitors and the hand-switches used to change the course of a program or initiate output.
The machine had a 32-bit word length, five fast registers (two of which formed a double-length accumulator), seven 'immediate' access tracks, each of 128 words, and 16 further 128-word tracks, any one of which could be selected by a relay. This gave a total store of 2944 words plus five fast registers, but 128 of these words were reserved for the initial orders, which allowed programs and integers to be read and integer results to be printed, and which also contained a division subroutine. These initial orders were originally written in Cambridge, but, after arrival at Rothamsted, they were substantially improved by D. H. Rees. They certainly needed improvement, for the original integer input required numbers to be written backwards, and the division subroutine gave the wrong answers. In 1957, Rees added three further fast registers and, in 1961, eight further relay-selected tracks. For further information, see Healy (1957) and Yates & Rees (1958).
All of this was very primitive and tiny compared with the capacity and reliability of a cheap modern computer, but it was a major advance on the existing punched-card equipment because it allowed flexible programs to be written for a wide variety of statistical computations. As well as essential subroutines for basic mathematical operations (divide, square-root, log, decimal reading, floating point operations, etc.), the first year saw the beginnings of the development of a library of programs for standard analyses. Already a program for analysing randomized block experiments was in use (Healy) and work had started on programs for factorial experiments and Latin squares. The possibility of using the 401 for survey analysis was also being considered, but the lack of a card-reader was a major obstacle. Several research investigations were in progress including what must have been one of the first uses of canonical variate analysis. This multivariate technique, thoroughly understood since before the Second World War, involves inverses and eigenvectors of matrices, calculations barely possible by hand but now feasible. The problem under investigation was to try to discriminate between men and great apes by using teeth measurements, in the hope of throwing some light on the nature of australopithecine fossils. Yates sums up the first 9 months experience with the 401 and gives what turned out to be a remarkably accurate forecast of the future as follows: "Having an electronic machine on the spot has made all the difference to developing its applications to research statistical problems. In this respect our experience is exactly parallel with our experience with Hollerith equipment, where we found that it was only by having equipment on the spot, so that research workers could themselves use it, test out different methods and examine the results as they were obtained, that we could exploit its full potentialities.
The introduction of electronic methods of computation will make available for regular use statistical methods which at present are scarcely used because of the heavy numerical work involved. This in turn is likely to lead to major developments in method. It will also facilitate and speed up the routine analyses which are at present done on desk machines, but which are of a sufficiently standard type to be programmed electronically, and enable a much more thorough preliminary examination of the data to be made (to check for gross errors, inconsistencies, etc.) than is at present customary or possible." A detailed account of work done on the 401 over its 9 years of life cannot be given here. Apart from the basic support software developed and research projects that directly generated about 60 scientific papers, programs for general purpose analysis were being developed apace. These programs may be grouped into four main areas, with the development much the same in all areas. An initial series of programs for very restricted purposes was produced; these were later improved and combined, and by the early 1960s they were leading to the development of a few more general programs, each of which could cope with a range of statistical computations. In the remainder of this section a brief description is given of the developments in each of the four areas.

Analysis of experiments
A randomised block program was written in 1954 and one for the 3³ single-replicate design in 1955. By 1955 it was already clear that all programs for analysing experiments should adopt the same conventions for presenting the data and for deriving new variates needed for analysis. Thus, in 1955 a program named GIED (General Input for Experimental Designs) was written, and thereafter this was appended to all new programs for the analysis of experiments. A number of programs subsequently followed during the 1950s, as well as improvements on them. These included a Latin squares program, a randomised block program, a program for the analysis of split-plot experiments and a program for 2ⁿ designs. The methods of analysis differed in many ways from those used on hand calculators, working in terms of deviations from block means, using efficiency factors to adjust for unequal information and using pseudo-yields to check the pattern of confounding. The analysis depended critically on the careful distinction between treatment factors and local controls, such as blocks, rows and columns, thus anticipating later developments to be described below.
All of these programs permitted covariance analysis and handled missing plots using a simple but effective algorithm developed by Healy & Westmacott (1957). A further feature was that, as well as the basic tables of estimates of treatment effects with their standard deviations and associated analyses of variance, tables of residuals were also given.
For the first time, these allowed certain checks to be done, for an exceptionally large residual or a systematic pattern of residuals draws attention to what might be errors in the data or unusual field patterns, both of which require further investigation. A fuller account of this work is given by Yates et al. (1957).
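The Healy & Westmacott algorithm is simple to sketch: each missing plot is given a starting guess, the ordinary analysis is run as if the data were complete, the missing values are replaced by their fitted values, and the cycle repeats until the estimates stabilise. A minimal illustration in modern Python, assuming a simple additive blocks-plus-treatments model and hypothetical yields, might look like this:

```python
# Sketch of the Healy & Westmacott (1957) iterative missing-value
# algorithm for a randomised block layout (rows = blocks, columns =
# treatments). The yields are hypothetical; missing plots are None.

def fit_additive(table):
    """Fitted values from the additive blocks + treatments model."""
    nr, nc = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (nr * nc)
    rmeans = [sum(row) / nc for row in table]
    cmeans = [sum(table[i][j] for i in range(nr)) / nr for j in range(nc)]
    return [[rmeans[i] + cmeans[j] - grand for j in range(nc)]
            for i in range(nr)]

def healy_westmacott(table, tol=1e-10, max_iter=500):
    """Repeatedly replace each missing cell by its fitted value."""
    nr, nc = len(table), len(table[0])
    missing = [(i, j) for i in range(nr) for j in range(nc)
               if table[i][j] is None]
    observed = [v for row in table for v in row if v is not None]
    for i, j in missing:                 # starting guess: mean of observed
        table[i][j] = sum(observed) / len(observed)
    for _ in range(max_iter):
        fitted = fit_additive(table)
        shift = max(abs(fitted[i][j] - table[i][j]) for i, j in missing)
        for i, j in missing:
            table[i][j] = fitted[i][j]
        if shift < tol:
            break
    return table

data = [[12.0, 15.0, 11.0],   # 3 blocks x 3 treatments, one missing plot
        [14.0, 17.0, 13.0],
        [13.0, None, 12.0]]
result = healy_westmacott(data)
```

For a single missing plot in a randomised block design, this iteration converges to the classical missing-plot estimate, so the result can be checked against the textbook formula.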
Meanwhile, GIED had been improved to extend its facilities and to make it what would now be termed 'user-friendly', and it appeared in its Mark 3 version in February 1961. Other programs were written, notably the Multiple Orthogonal Classifications program (Gower in 1958), which covered many of the simpler factorial designs; in 1962 came the first attempt at a General Experiments Program, briefly described at the end of the next section.
The effect of these programs on routine analysis is shown in Figure 1, where the number of experiments (variates) analysed rose from 419 (834) in 1955 to 3383 (18 054) in 1964. Not only were more experiments analysed, many from the National Agricultural Advisory Service (NAAS), the forerunner of the Agricultural Development and Advisory Service, and from the National Institute of Agricultural Botany, but there was also an increase from an average of about two variates per experiment to over five variates per experiment.

Analysis of surveys
The development for surveys is similar but differs because survey designs are less standardised than those for experiments. It was not until the end of 1956 that a primitive card reader became available. By July 1957, one of our regular surveys, the Survey of Fertilizer Practice (SFP), was being analysed on the 401 (Simpson). After that, a number of programs and modifications were added.
Clearly, a lot had been learned during these developments, and a survey of the incidence of cattle disease was quickly analysed. By 1958, we have the first attempt at a more general survey program, one for Stratified Random Sampling. Further surveys continued to be analysed. In 1959, Frank Yates was revising his book, Sampling Methods for Censuses and Surveys, for its third edition, during the course of which he wrote a new chapter on the use of electronic computers for survey work, which described a general system for the specification of survey analyses. He then thought that the 401 was too limited for what was required, but this proved a somewhat gloomy view, and by 1960, the General Survey Program (GSP) was written, which in a revised form, and now termed the Rothamsted General Survey Program (RGSP), continues to provide analyses for all our surveys. This work is described by Simpson (1960, 1961). GSP (and RGSP) works in two parts: the first part reads in the sample data unit-by-unit, allowing for almost any generality of design, checks the data, stores them on file and forms basic multiway tables; the second part is essentially a table manipulation language operating on the tables produced by the first part. In late 1961, Elliott Automation gave the department an Elliott 402 (the commercial development of the 401), and this was used exclusively for the first part of GSP analyses, until March 1965, when the machine was transferred to Watford Technical College. The success of GSP suggested that a similar approach could be adopted for experiments; hence, a General Experiments Program was developed that was intended to operate on experiments in a similar way. This provided a basic programming language that was used occasionally to program some of the more unusual analyses - it did, however, require a precise knowledge of how to do the analysis.

Curve fitting, distribution fitting and assay
Curve fitting, distribution fitting and assays provide yet another example of the development from specialised to general programs. These programs were developed during the 1950s and early 1960s and included programs for fitting the negative binomial distribution, exponential regression, the logistic curve, Chebychev polynomials, probit analysis and logit analysis. Soon, it was realised that most of this work was fundamentally a matter of minimising a sum of squares or maximising a likelihood, so that general methods for function optimization, developed by numerical analysts, should cope. This approach was used in the general program for Estimation of Parameters in Maximum Likelihood Equations, a precursor of the Maximum Likelihood Program (MLP), now the standard program for non-linear modelling (Ross).
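The shift from specialised fitting programs to general optimisation can be illustrated with a sketch: an exponential curve is fitted by treating the problem purely as minimisation of a residual sum of squares with a general-purpose search routine. The data are hypothetical, and the simple golden-section search below merely stands in for the more elaborate optimisers that MLP used:

```python
import math

# Fit y = a * exp(-k * x) by minimising the residual sum of squares.
# For a fixed k the best a is a linear least-squares solution, so it is
# profiled out and only k is searched for; the data are hypothetical.

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(-0.5 * x) for x in xs]   # exact data: a = 2, k = 0.5

def rss(k):
    """Residual sum of squares, with a profiled out for the given k."""
    es = [math.exp(-k * x) for x in xs]
    a = sum(y * e for y, e in zip(ys, es)) / sum(e * e for e in es)
    return sum((y - a * e) ** 2 for y, e in zip(ys, es)), a

def golden_min(f, lo, hi, tol=1e-10):
    """Golden-section search for the minimum of f on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    while hi - lo > tol:
        if f(c)[0] < f(d)[0]:
            hi, d = d, c
            c = hi - g * (hi - lo)
        else:
            lo, c = c, d
            d = lo + g * (hi - lo)
    return (lo + hi) / 2

k_hat = golden_min(rss, 0.01, 2.0)
_, a_hat = rss(k_hat)
```

The same minimiser would serve equally for a logistic curve or any other non-linear model, which is exactly the economy the general programs exploited.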

Multivariate analysis
The story for multivariate analysis has similarities with the developments described earlier but also has some marked differences. A program for multiple regression was, of course, developed early (by 1956, if not before). Most other multivariate analyses were done using basic matrix subroutines. Thus, even in 1955, a program named Latent Roots and Vectors (Slow) had been written for use with the apes' teeth project. This was followed by a fast version, a Choleski triangulation and pivotal condensation, both basic operations for matrix inversion and hence multiple regression. Several other matrix subroutines were prepared, and at the end of 1956, the collection of some of these into a small matrix package, AUTOMAT, with a simple control language, represented an especially important development (Healy). AUTOMAT was a subroutine package that allowed many of the classical multivariate analyses to be concisely programmed.
Work on classification, in the sense of forming classes, was new and stimulated the development of several inter-connecting programs. Basically, one program evaluated a similarity matrix for up to 128 taxa according to a general coefficient of similarity (Gower, 1971) and punched it out in coded form on to paper tape. This had to be done at least twice to check for and hence eliminate punching errors. Further programs read this tape and operated on the similarity matrix to give hierarchical classifications, summaries, etc. Thus, in multivariate analysis, there were few special-purpose programs, but methods were provided for their easy construction.
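A general coefficient of similarity of the kind described by Gower (1971) averages per-character scores so that quantitative and qualitative characters contribute on a common 0 to 1 scale. A minimal sketch, with entirely hypothetical taxa and characters:

```python
# Sketch of a general similarity coefficient in the spirit of Gower
# (1971): quantitative characters score 1 - |x - y| / range, qualitative
# characters score 1 for a match and 0 otherwise, and the scores are
# averaged. The taxa and characters below are hypothetical.

taxa = {
    "A": {"length": 4.0, "colour": "red",   "spines": True},
    "B": {"length": 6.0, "colour": "red",   "spines": False},
    "C": {"length": 9.0, "colour": "green", "spines": False},
}

ranges = {"length": 9.0 - 4.0}   # range of each quantitative character

def similarity(u, v):
    """Average of the per-character similarity scores."""
    scores = []
    for ch in u:
        if ch in ranges:                       # quantitative character
            scores.append(1 - abs(u[ch] - v[ch]) / ranges[ch])
        else:                                  # qualitative character
            scores.append(1.0 if u[ch] == v[ch] else 0.0)
    return sum(scores) / len(scores)

names = sorted(taxa)
sim = {(p, q): similarity(taxa[p], taxa[q]) for p in names for q in names}
```

The resulting similarity matrix is exactly the kind of intermediate product that the 401 programs punched to paper tape for the hierarchical classification programs to read back in.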
The 401 was switched off in July 1965, and the machine was taken to the Science Museum, South Kensington.

After the 401
Much was learned from the work done on the 401. Perhaps the main lesson was 'to survive and unify'. The first steps towards unification had already been taken: GIED, the General Survey Program, the General Experiments Program, AUTOMAT, Maximum Likelihood Estimation by optimization, an integrated system of Classification Programs and an integrated system of Multivariate Programs. Towards the end of the life of the 401, serious thought was given to how best to use the next computer, a Ferranti (later International Computers and Tabulators, later International Computers Limited) Orion computer that was eventually delivered in October 1964, but did not become operational until March 1965. Because all existing programs had perforce been written in the 401 machine code, there was no question of transferring them to the new machine. Yates (1966) contains a discussion that illustrates the thinking at that time.
The possibility of unifying our programs into a few general programs was actively pursued. There were several types of unification to be considered. Firstly, there was the unification of special programs into more general programs. Secondly, our original programs had a variety of different input conventions, which only added to the initial chaos (later rectified) of the 401 having different input and output codes; clearly, data-presentation conventions needed to be standardised. Finally, the increasing importance of storing data and computed results on magnetic tape files that could be retrieved for further analyses also required standard storage formats.
Unification of methodology directly implies unified general programs; indeed, the urge to unify programs often stimulates the research needed to unify statistical methodology. Examples are the unifications needed to provide a general framework for analysis of variance algorithms (Nelder, 1965a, 1965b; Wilkinson, 1970), the concept of generalised linear models (Nelder & Wedderburn, 1972), which contain many well-known models as special cases, and the unifying concept of distance in multivariate analysis (Gower, 1966).
Linking the concepts of general programs, standard input and output conventions, standard ways of describing the structure of data and standard filing structures, points to the desirability of having a unified control language.
A number of unified programs were developed, including ones for non-linear model fitting, classification, survey work using a rewritten GSP, multivariate analysis and experiments. All of these programs incorporated a rewritten and extended version of GIED.
The Orion Computer was provided with a compiler for Extended Mercury Autocode (EMA), a language comparable with, and in some ways superior to, early versions of Fortran. Mercury Autocode was developed at Manchester University in about 1954, but EMA was not available on the Rothamsted Orion until 1966. Thus, the programs mentioned earlier were written in machine code. Because everyone was familiar with machine-code programming, this was no hardship; indeed, it was regarded as an advantage, because computers remained slow and the importance of having efficient programs for frequent general use could not be ignored. There was eventually a move towards EMA, in the interest of portability of programs, and most other programs used the language, but to no avail, as EMA eventually died out.
Two important programs written by Yates for the 401 need to be mentioned. FITCON was written in 1957 to fit constants for main effects and interactions to multiway tables. This program was generalised in 1958 to fit quantal data and called FITQUAN. Both were rewritten in EMA. FITQUAN was a forerunner of Generalized Linear Interactive Modelling (GLIM; see below) and would now be described as analysing data in tabular form, using a generalised linear model with the binomial distribution and a probit link-function.
The Algol 60 report was published when the 401 was reaching the end of its life and plans were being made for transfer to the Orion. Experience with GEP, GSP, and the matrix and multivariate programs suggested that our routine work might be effectively done by forming a library of statistical subroutines that could be linked together by an appropriate language, to be embodied in what was called The Survey and Experiments Program (SEP). Computer languages like Algol and Fortran have powerful facilities for defining general operators but are less good at defining general operands, whereas statisticians are familiar with a multitude of different structures (matrices of various forms, multi-way tables with or without margins, hierarchical structures that may or may not have different kinds of information at different hierarchic levels, etc.) and often think in terms of operating on them as entities. SEP was intended to plug this gap, but it had features of the Mythical Man Month (Brooks, 1978) and was discontinued after 2 years' work. As Yates wrote in the Annual Report for 1966, "It gives us useful experience on what is required in a general statistical language for future machines." In the period 1964-1969, some 90 programs were written for the Orion program library. Many more, of course, were written for special research projects that had only transitory value and were not worth recording. Figure 1 shows a steady rise in the numbers of experiments and variates analysed, reaching 6124 experiments and 50 373 variates in 1967.
During this period John Nelder, then Head of Statistics at National Vegetable Research Station (NVRS), Wellesbourne, was a frequent visitor and user of the Orion at Rothamsted. Because of the variety of ways that data were collected at NVRS, he developed what was termed a Three-Tier System for the Analysis of Experiments. The first tier consisted of a set of programs to read in data and convert them into a standard form. The second tier, a modification of GIED, operated on the stored data. The third tier consisted of a set of programs for analysis. Other programs in the series were concerned with the writing to, reading from and editing of, magnetic tapes connected with the long-term storage of experimental data and intermediate results. Thus, there was a strong concern with standardised conventions and filing formats.

Genstat
Yates retired in 1968, and Nelder was appointed Head of the Statistics Department. Interest in computing was growing, and Rees was appointed Head of a new Computer Department at Rothamsted. The remit of the Computer Department was to run the computer service and to deal with non-statistical computing; responsibility for statistical computing remained with the Statistics Department. Thought was already being given to the replacement of the Orion and hence what was to be done about our statistical programs, as it was clear that EMA would cease to be available.
In 1966, Nelder spent a period in Adelaide, Australia working with G. N. Wilkinson at the Waite Institute of the University of Adelaide and at the CSIRO Division of Mathematical Statistics. While there, he developed ideas for the concise specification of experimental designs in terms of their separate block and treatment structures. These ideas stemmed from two papers (Nelder, 1965a, 1965b) that unified understanding of design and had repercussions on analysis.
Wilkinson developed a very general algorithm that operated on the design specification to give an analysis of variance and which, in a recursive form, has great versatility. In practice, the algorithm was programmed in a non-recursive form, which copes only with first-order balance but nevertheless handles a wide class of commonly occurring block designs with confounding, fractional replication and error terms associated with multiple strata. To this program was added a derived variate section, equivalent to GIED, but now expressed in a Fortran-like language. This was named Genstat (General Statistical Program) and first appeared in May 1966. The introduction to the 1970 version of the user manual indicates the scope of this Genstat: "Genstat 4 is a computer program system for statistical analysis of observational (sic) data, developed initially at the Waite Institute for Agricultural Research and CSIRO Division of Mathematical Statistics. The system provides general facilities for analysis of variance, multiple regression and covariance analysis, and for generating, operating on, storing and retrieving, listing and tabulating data files." The relation of this program to the program Genstat developed at Rothamsted after Nelder's arrival has caused much misunderstanding. It was certainly one of the contributory strands, but there were others, not least the work that had been done at Rothamsted over the preceding 10 years. Probably, it is now impossible to disentangle the many threads, but the following is my understanding of the various influences on the design of Genstat.
Analysis of experiments came via Adelaide, using Wilkinson's analysis of variance algorithm, several times recoded and revised (Wilkinson was at Rothamsted from 1970 to 1975), and Nelder's structure formulae for specifying the treatment and block aspects of the experimental design.
Multiple regression and standard multivariate directives for components analysis and canonical variates analysis were in the style of MAP, and the classification directives are a direct transfer from CLASP. The standard matrix-algebra and table-handling facilities of the Genstat language, supplemented by the multivariate directives, give a powerful language for writing multivariate macros, of which some 30 are now in the Genstat macro library. This is essentially the subroutine package approach envisaged for SEP, which allows new statistical programs to be written without extending the language itself.
Later major additions to Genstat were, in 1977, Generalized Linear Models, through GLIM, a project formally of the Royal Statistical Society with which Rothamsted was much involved (see below), and, in 1979, Time Series Analysis. In 1981, the optimization sections of MLP were incorporated into Genstat.
Initial development of Genstat at Rothamsted was much hampered by problems with the computers. The programming language was to be Fortran, so the Orion Computer could not be used, and it was not until November 1970 that the new ICL 4-70 was commissioned in the Computer Department, and this gave trouble for several months in 1971. Thus, development had to be done remotely, initially through a bureau in London and then using a card-reader/line-printer link with an IBM machine in the Edinburgh Regional Computing Centre. Early versions of Genstat were available in 1971, but it was not until March 1972 that the system became generally available, after which it was quickly to become the standard statistical computing language of the Agricultural (and Food) Research Service (A(F)RS) institutes. The outstanding success of the Genstat project owes much to John Nelder's leadership and to his many contributions. The whole project was (and continues to be) overseen by a committee; Nelder was chairman until his retirement in 1984.
With the development of Genstat, most of the Statistics Department's computational needs for routine analysis were accommodated, and the Genstat language provides a powerful tool for programming research problems and for assembling a macrolibrary for statistical computations of a less routine nature. Currently, there are 43 programs in the macro library. In 1983, work began on a major revision to define Genstat 5. This was planned for release in 1986.
Genstat soon acquired users outside the AFRS and now has 388 installations in 35 countries. This success has brought associated problems. To run on a wide range of computers (some 30 models), great attention must be paid to providing portable program code. This means that programming must be in standard versions of Fortran, with any sections that might be machine-dependent being carefully flagged. Then there are the problems of actually providing the different versions from the master-code; this has been handled through liaising with special implementors, often at university sites, and by providing a special conversion program to select appropriate variants of the code. Thus, at external sites, the releases mentioned earlier may occur months, or even years, after they have occurred at Rothamsted, so that, inevitably, several different versions of Genstat currently coexist and must be maintained. A scale of charges and suitable legal contracts have had to be worked out. A back-up service has to be provided to handle queries and reports of real or imagined errors. Sometimes users request special facilities, which have to be considered in the light of their more general utility. A great deal of time has been taken up with providing good documentation both for users and for implementors, but much remains to be done (Alvey et al., 1982; Alvey et al., 1983). There is a continuing demand for courses on the use of Genstat both from within the AFRS and from external users. All this makes considerable demands on the Department, which has been unable to use the revenues from Genstat for its support and further development.
Some alleviation of this problem was made in 1979 when the Numerical Algorithms Group (NAG), a non-profit-making organisation based in Oxford, agreed to take over distribution of both Genstat and its documentation and to act as a first line of defence for queries; since 1985, NAG has also taken on the administration and coordination of most of the implementations of the different releases for different machine ranges.

Other Programs
Genstat caters for most of our needs, but the language is not particularly well suited for surveys because it has only primitive facilities for unit-by-unit operation. Thus RGSP remains our main survey program; it has been much improved and is now fully portable (see Yates, 1949; fourth edition, 1981). Although many of the facilities of the freestanding programs are now integrated with Genstat, the freestanding versions are rather more efficient and offer a few extra facilities, and so remain in use. A further program, GENKEY, whose genesis lies around 1964, was developed in a fully operational form by R. W. Payne. GENKEY calculates identification keys for groups of organisms and prints them out in a directly usable form. GENKEY is thus a specialised program, but it has many potential uses; the principal use so far has been to construct keys for 469 species of yeast and to produce, via the laser-printing service at Oxford University, UK, a finished book published by Cambridge University Press (Barnett, Payne & Yarrow, 1983). A few programs produced elsewhere are occasionally used, and some research projects are most conveniently programmed in Fortran or another language at a similar level.
The Generalized Linear Model project, which produced the first version of the program GLIM in 1973, occupied several members of the Department. It was officially a project of the Royal Statistical Society's Working Party on Statistical Computing, with John Nelder as Chairman. The program FITQUAN, mentioned earlier as being developed on the 401, is a special case of a generalised linear model (GLM). The special feature of a GLM is that the observations are assumed to belong to a statistical distribution whose expected value can be expressed as a function of a linear function of the parameters concerned. It has long been known that the maximum likelihood estimation of the parameters of such models can be computed iteratively as if the problem were one of weighted least squares but with the weights changing on each cycle of iteration. In an important paper, Nelder & Wedderburn (1972) unified the theory, especially for the exponential family of distributions that covers most of the commonly occurring cases. The Working Party set about programming this approach, with Nelder, Wedderburn and Rogers, and later Baker, Payne and White, all making major contributions. The associated theory is described by McCullagh & Nelder (1983). Since 1974, NAG has handled the marketing of GLIM. Other freestanding programs at Rothamsted, especially MLP (Maximum Likelihood Program) and GENKEY, have developed over the same period as Genstat and have generated similar administrative problems. In 1984, NAG took on the distribution and conversion of MLP. In 1985, because of lack of resources, work at Rothamsted associated with the GLIM project was reduced to a residual level. Most generalised linear model analyses are done using Genstat, but it is anticipated that GLIM will continue to be used.
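The iterative scheme just described is easy to sketch. The following Python fragment is an illustrative sketch only, not code from GLIM or Genstat; the function name and the choice of a Poisson log-linear model are assumptions for the example. It fits a GLM by repeated weighted least squares, with the weights and working response recomputed on each cycle, in the manner unified by Nelder & Wedderburn:

```python
import numpy as np

def irls_poisson(X, y, n_iter=25, tol=1e-8):
    """Fit a Poisson log-linear GLM by iteratively reweighted least squares.

    Each cycle is an ordinary weighted least-squares fit, but the weights
    and the working (adjusted) response are recomputed from the current
    fitted values, so the iteration converges to the maximum likelihood
    estimate.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                # linear predictor
        mu = np.exp(eta)              # inverse of the log link
        w = mu                        # Poisson working weights
        z = eta + (y - mu) / mu       # working response
        XtW = X.T * w                 # X' diag(w) without forming the matrix
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

For an intercept-only model this converges to the logarithm of the sample mean, which is the Poisson maximum likelihood estimate.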
From 1968 onwards, the method for collecting annual statistics on routine jobs done on the computer was changed from numbers of experiments and variates analysed to numbers of jobs successfully run and units of data submitted. The first method was based mainly on experimental analyses, while the latter includes many other types of statistical jobs of a routine nature; thus the two sets of figures are not comparable. Nevertheless, an attempt has been made to include both in Figure 1. Whether or not the peak shown for jobs run in 1970 and 1971 is real is hard to say. The new Computer Department took on some data preparation, and other A(F)RS institutes began to prepare their own data and submit their jobs directly, which could account for a significant drop in the early 1970s. The number of units analysed has fluctuated fairly wildly, but there is an overall upward trend, reaching well over two million units in the past 3 years; we have no figures for the AFRS as a whole. The number of jobs also fluctuates, but about a steady state of about 1800 jobs. Hitherto, all data have been entered on keyboard-controlled equipment: first on a variety of paper-tape machines, then on punched cards and now on floppy discs. In all these methods, careful procedures are required for checking that data have been entered correctly. Nowadays, data may arrive on floppy discs or cassettes, and there is a move (first noted in 1959) towards direct entry from data-loggers or laboratory instruments. At first sight, this would appear to remove the need for checking, but it is becoming clear that automated data too can be erroneous and that methods must be devised to control their quality. This is particularly important when such data are 'unseen by human eyes', so that even major discrepancies can pass unnoticed.
The checking of data quality is an unglamorous but important aspect of statistical computing because the most penetrating statistical analysis using the latest computing equipment is futile, or worse, if the data are of poor quality.
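As a minimal illustration of the kind of checking the preceding paragraphs describe, the Python sketch below flags values that are missing or outside a stated plausible range; the function name, the record layout and the limits are invented for the example and do not come from any Rothamsted program:

```python
def check_records(records, limits):
    """Flag records whose values are missing or fall outside plausible limits.

    records: list of dicts mapping field name -> numeric value
    limits:  dict mapping field name -> (low, high) plausible range
    Returns a list of (record index, field name, problem description).
    """
    problems = []
    for i, rec in enumerate(records):
        for field, (lo, hi) in limits.items():
            value = rec.get(field)
            if value is None:
                problems.append((i, field, "missing"))
            elif not (lo <= value <= hi):
                problems.append((i, field, f"out of range: {value}"))
    return problems
```

Checks of this simple kind catch the gross discrepancies that would otherwise pass unnoticed in data never seen by human eyes; subtler errors need statistical screening as well.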
It is not only in numerical aspects of computing that the computer has been useful to statisticians. Since 1947, the Department has been concerned with producing an annual 300-page publication of the Yields of the Field Experiments. Originally this was typed in its entirety, but we gradually mechanised the task; since 1981, the text has been edited on a word-processor (using the previous year's files), with the numerical parts inserted by transferring files produced by Genstat analyses. This has not only saved several weeks' work but also gives a more accurate and better-printed production.

Conclusion
The work begun in 1954 has progressed logically to provide a few general portable programs, the most important of which is Genstat, that handle both the research and the routine statistical calculations required by the AFRS. Because statistical principles are of general applicability, the use of these programs is not confined to agricultural research but is also of value to applied statisticians working in medicine, environmental science, the social sciences, psychology, industry, local government, etc. The guiding principle has been to unify at (i) the analysis-specification and data-description level and (ii) the statistical level. The first level of unification has implied a unified control language with good operand (data-description) definitions and methods for operating on these operands. The operations may be controlled by many parameters, some of an optional nature, that should be specified in a concise and simple manner. Unification at the statistical level has depended on statistical research within the Statistics Department into generalised linear models, multivariate data analysis, analysis and design of experiments, estimation and optimization, and the theory of diagnostic keys.
It has been amply demonstrated that the design of effective statistical computing systems requires extensive statistical knowledge and research. Certainly Genstat could never have been written without the considerable statistical experience and expertise available in the Statistics Department. One has only to examine the appalling statistical packages produced commercially for microcomputers to see that they can only have been written by non-statisticians, or perhaps by very inexperienced statisticians. Fortunately, microcomputers are increasing in capacity, and Genstat is already available on machines in the personal computer range.
In case readers get the wrong impression, I must emphasize that the staff resources available for work on statistical software are meagre. The number of posts concerned fluctuates enormously from year to year, depending on whether special effort is being put into development or whether maintenance is mainly required. At peaks, up to 12 persons have been involved, but all have other substantial commitments in statistical consulting and research; the average number of posts is nearer four and a half.
Other groups of statisticians might have done things differently and, indeed, they have. In the USA, the Biomedical Programs BMD(P) package was developed at the University of California, Los Angeles, the Statistical Programs for the Social Sciences (SPSS) at the University of Chicago and the Statistical Analysis System (SAS) at North Carolina State University in Raleigh. All of these are now profitable commercial enterprises. The system S, developed at the Bell Telephone Laboratories, New Jersey (later called AT&T Bell Laboratories), has profited from its relation to the UNIX system and is oriented more to the exploratory data analyst and research statistician than to those who do more routine analyses. I believe that Genstat compares well with the best of these, by providing better statistics and more flexibility and by using computing resources more efficiently. We look forward to the next 30 years of statistical computing at Rothamsted.

Epilogue
R. W. Payne, VSN International Ltd., 2 Amberside House, Wood Lane, Hemel Hempstead, HP2 4TP, UK

An initial release of Genstat 5 was made to AFRC sites in 1986, backed up by a series of 'conversion courses' to introduce existing users to the new syntax. The reaction was overwhelmingly positive, apart from some disquiet at the AFRC Computer Centre about the increased load on their VAX computers. However, these had been bought to support interactive computing, and upgrades soon followed to support the new interactive style of analysis that put the statisticians back into close touch with their data. The full release of Genstat 5, in 1987, included a much improved reference manual, published by Oxford University Press, thus addressing some of the concerns about documentation noted earlier.
Another important enhancement was the provision of high-resolution graphics and, on the statistical side, the inclusion of the REML algorithm for analysis of unbalanced linear mixed models, by permission of Rob Kempton of the Scottish Agricultural Statistics Service (now Biostatistics Scotland). This also marked the start of a very important collaboration with Robin Thompson, who later (in 1995) became Head of the Statistics Department at Rothamsted.
There was still the need to provide conversions to a wide range of types of computer, but the completion of a PC version for Genstat 5 Release 1.2, in 1988, gave a foretaste of a less diverse future. This was reinforced, in 1996, by the completion of the first Edition of Genstat for Windows. The original Genstat was preserved as the Genstat Server, and a new Client program was included to provide a menu interface and display the output. This was originally in plain text. However, the eighth Edition of Genstat for Windows, in 2005, provided the alternative of much more attractive RTF output, which could be cut and pasted into other systems, such as MS Word. The Client also provided a powerful and attractive spreadsheet system, to allow data to be entered, displayed and edited. Finally, there was a separate Graphics Viewer, to display the plots and allow them to be zoomed, rotated and edited.
Statistical enhancements were important too but, with the decline of the Rothamsted Statistics Department, external collaborations became more important. These included non-metric multidimensional scaling, in 1990, in collaboration with Les Underhill of the University of Cape Town, South Africa; facilities for selecting and generating experimental designs, in 1993, in collaboration with Mike Franklin of the Scottish Agricultural Statistics Service (now Biostatistics Scotland); REML facilities for modelling correlation structures, in 1997, in collaboration with Robin Thompson at Rothamsted, and Brian Cullis and Arthur Gilmour at the New South Wales Department of Primary Industries, Australia; hierarchical generalised linear models, in 2002, in collaboration with John Nelder, then at Imperial College, London, UK, and Youngjo Lee of Seoul National University, Seoul, South Korea; and QTL analysis, in 2009, in collaboration with Fred van Eeuwijk and his colleagues in the Biometrics Group at Wageningen University, Netherlands. The procedure library was another focus for collaboration and now contains over 600 procedures from nearly 100 authors spread over four continents.
The relationship with the Numerical Algorithms Group also flourished, with additional collaboration from 1996 to 1998 on a NAG-led EU project called STABLE. This aimed to put Genstat's algorithms into the visual programming environment provided by NAG's Iris Explorer system. The resulting software was never commercialised, but the experience proved very useful in the more recent development of the Breeding View system under the Generation Challenge Program. It also paved the way for the formation of VSN International (VSNi), in 2000. VSNi is a spin-off company from Rothamsted and NAG, which brought together the Genstat development group from Rothamsted and the statistical commercialisation group from NAG to provide a stronger combination of research and development with sales and marketing. Continuity was provided by the transfer of all the key staff, including Roger Payne, who, as the company's Chief Science and Technology Officer, continued his leadership of Genstat development. Rothamsted's research in applied statistics and statistical computing thus continues its influence into the 21st century.