%
% LaTeX source for Galton on Co-relation or Correlation
\documentclass{article}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{epsfig}
\usepackage{longtable}
\usepackage{times}
\newcommand{\z}{\phantom{0}}
\begin{document}
\setcounter{page}{1}

I. ``Co-relations and their Measurement, chiefly from Anthropometric Data.'' By \textsc{Francis Galton, F.R.S.}

Received December 5, 1888.

``Co-relation or correlation of structure'' is a phrase much used in biology, and not least in that branch of it which refers to heredity, and the idea is even more frequently present than the phrase; but I am not aware of any previous attempt to define it clearly, to trace its mode of action in detail, or to show how to measure its degree.

Two variable organs are said to be co-related when the variation of the one is accompanied on the average by more or less variation of the other, and in the same direction. Thus the length of the arm is said to be correlated with that of the leg, because a person with a long arm has usually a long leg, and conversely. If the correlation be close, then a person with a very long arm would usually have a very long leg; if it be moderately close, then the length of the leg would usually be only long, not very long; and if there were no correlation at all then the length of the leg would on the average be mediocre.

It is easy to see that correlation must be the consequence of the variations of the two organs being partly due to common causes. If they were wholly due to common causes, the correlation would be perfect, as is approximately the case with the symmetrically disposed parts of the body. If they were in no respect due to common causes, the co-relation would be \textit{nil}. Between these two extremes are a number of intermediate cases, and it will be shown how the closeness of correlation in any particular case admits of being expressed by a simple number.

To avoid the possibility of misconception, it is well to point out that the subject in hand has nothing whatever to do with the average proportions between the various limbs, in different races, which have been often discussed from early times up to the present day, both by artists and by anthropologists. The fact that the average ratio between the stature and the cubit is as 100 to 37, or thereabouts, does not give the slightest information about the nearness with which they vary together. It would be an altogether erroneous inference to suppose their average proportion to be maintained so that when the cubit might be expected to be one-twentieth longer than the average cubit, the stature might be expected to be one-twentieth greater than the average stature, and conversely. Such a supposition is easily shown to be contradicted both by fact and theory.

The relation between the cubit and the stature will be shown to be such that for every inch, centimetre, or other unit of absolute length that the cubit deviates from the mean length of cubits, the stature will on the average deviate from the mean length of statures to the amount of 2.5 units, and in the same direction. Conversely, for each unit of deviation of stature, the average deviation of the cubit will be 0.26 unit. These relations are not numerically reciprocal, but the exactness of the co-relation becomes established when we have transmuted the inches or other measurement of the cubit and of the stature into units dependent on their respective scales of variability. We thus cause a long cubit and an equally long stature, as compared to the general run of cubits and statures, to be designated by an identical scale-value. The particular unit that I shall employ is the value of the probable error of any single measure in its own group. In that of the cubit, the probable error is 0.56 inch = 1.42 cm.; in the stature it is 1.75 inch = 4.44 cm.
Therefore the measured lengths of the cubit in inches will be transmuted into terms of a new scale in which each unit = 0.56 inch, and the measured lengths of the stature will be transmuted into terms of another new scale in which each unit is 1.75 inch. After this has been done, we shall find the deviation of the cubit as compared to the mean of the corresponding deviations of the stature, to be as 1 to 0.8. Conversely, the deviation of the stature as compared to the mean of the corresponding deviations of the cubit will also be as 1 to 0.8. Thus the existence of the co-relation is established, and its measure is found to be 0.8.

Now as to the evidence of all this. The data were obtained at my anthropometric laboratory at South Kensington. They are of 350 males of 21 years and upwards, but as a large proportion of them were students, and barely 21 years of age, they were not wholly full-grown; but neither that fact nor the small number of observations is prejudicial to the conclusions that will be reached. They were measured in various ways, partly for the purpose of this inquiry. It will be sufficient to give some of them as examples. The exact number of 350 is not preserved throughout, as injury to some limb or other reduced the available number by 1, 2, or 3 in different cases.

After marshalling the measures of each limb in the order of their magnitudes, I noted the measures in each series that occupied the positions of the first, second and third quarterly divisions. Calling these measures in any one series Q$_1$, M and Q$_3$, I take M, which is the median or middlemost value, as that whence the deviations are to be measured, and $\frac{1}{2}\{\text{Q}_3-\text{Q}_1\}=\text{Q}$ as the probable error of any single measure in the series. This is practically the same as saying that one-half of the deviations fall within the distance of $\pm\text{Q}$ from the mean value, because the series run with fair symmetry.
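\par\medskip\noindent\textit{Illustrative sketch (editorial, in modern notation).} The transmutation described above may be written compactly as follows. The symbols $x$ and $z$ are introduced here only for illustration and do not occur in the original: $x$ is a single measure in inches, and $z$ is the same measure expressed as a deviation in units of Q.
\[
\text{Q}=\tfrac{1}{2}\left(\text{Q}_3-\text{Q}_1\right),\qquad z=\frac{x-\text{M}}{\text{Q}}.
\]
For instance, taking the probable error of the cubit as 0.56 inch, a hypothetical cubit lying 1.12 inch above M would be represented by $z=1.12/0.56=2$; and, taking that of the stature as 1.75 inch, a hypothetical stature lying 3.50 inch above M would likewise be represented by $z=3.50/1.75=2$. The two are equally exceptional when judged by their own scales of variability.\par\medskip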
In this way I obtained the following values of M and Q, in which the second decimal must be taken as only roughly approximate. The M and Q of any particular series may be identified by a suffix, thus M$_c$, Q$_c$ might stand for those of the cubit, and M$_s$, Q$_s$ for those of the stature.
\begin{center}
Table I. \\ \smallskip
\begin{tabular}{|l|rr||rr|}
\hline
 & \multicolumn{2}{c||}{M} & \multicolumn{2}{c|}{Q} \\
\hline
 & Inch. & Centim. & Inch. & Centim. \\
\hline
Head length & 7.62 & 19.35 & 0.19 & 0.48 \\
Head breadth & 6.00 & 15.24 & 0.18 & 0.46 \\
Stature & 67.20 & 170.69 & 1.75 & 4.44 \\
Left middle finger & 4.54 & 11.53 & 0.15 & 0.38 \\
Left cubit & 18.05 & 45.70 & 0.56 & 1.42 \\
Height of right knee & 20.50 & 52.00 & 0.80 & 2.03 \\
\hline
\end{tabular}
\end{center}
{\footnotesize \textsc{Note.}---The head length is its maximum length measured from the notch between and just below the eyebrows. The cubit is measured with the hand prone and without taking off the coat; it is the distance between the elbow of the bent left arm and the tip of the middle finger. The height of the knee is taken sitting when the knee is bent at right angles, less the measured thickness of the heel of the boot.}

Tables were then constructed, each referring to a different pair of the above elements, like Tables II and III, which will suffice as examples of the whole of them. It will be understood that the Q value is a universal unit applicable to the most varied measurements, such as breathing capacity, strength, memory, keenness of eyesight, and enables them to be compared together on equal terms notwithstanding their intrinsic diversity. It does not only refer to measures of length, though partly for the sake of compactness, it is only those of length that will be here given as examples. It is unnecessary to extend the limits of Table II, as it includes every line and column in my MS table that contains not less than twenty entries.
None of the entries lying within the flanking lines and columns of Table II were used.
{\footnotesize
\begin{center}
Table II. \\ \smallskip
\begin{tabular}{|l|c|c|c|c|c|c|c|c|c|}
\hline
& \multicolumn{8}{|c|}{Length of left cubit in inches, 348 adult males.} & \\
\cline{2-9}
\multicolumn{1}{|c|}{Stature in}& & 16.5 & 17.0 & 17.5 & 18.0 & 18.5 & 19.0 & & Total \\
\multicolumn{1}{|c|}{inches.} & Under & and & and & and & and & and & and & 19.5 & cases. \\
&16.5 &under&under&under& under &under&under & and & \\
& &17.0 &17.5 &18.0 &18.5 &19.0 &19.5 &above& \\
\hline
71 and above & .. & .. & .. & \z1 & \z\z3 & \z4 & 15 & 7 & \z30 \\
70 & .. & .. & .. & \z1 & \z\z5 & 13 & 11 & .. & \z30 \\
69 & .. & \z1 & \z1 & \z2 & \z25 & 15 & \z6 & .. & \z50 \\
68 & .. & \z1 & \z3 & \z7 & \z14 & \z7 & \z4 & \z2 & \z38 \\
67 & .. & \z1 & \z7 & 15 & \z28 & \z8 & \z2 & .. & \z61 \\
66 & .. & \z1 & \z7 & 18 & \z15 & \z6 & .. & .. & \z48 \\
65 & .. & \z4 & 10 & 12 & \z\z8 & \z2 & .. & .. & \z36 \\
64 & .. & \z5 & 11 & \z2 & \z\z3 & .. & .. & .. & \z21 \\
Below 64 & 9 & 12 & 10 & \z3 & \z\z1 & .. & .. & .. & \z34 \\
\hline
Totals & 9 & 25 & 49 & 61 & 102 & 55 & 38 & 9 & 348 \\
\hline
\end{tabular}
\end{center}
}
\begin{figure}
\begin{center}
\epsfig{file=galton_corr_fig.eps,width=7cm,height=7cm,clip=}
\end{center}
\end{figure}

The measures were made and recorded to the nearest tenth of an inch. The heading of 70 inches of stature includes all records between 69.5 and 70.4 inches; that of 69 includes all between 68.5 and 69.4, and so on.

The values derived from Table II, and from other similar tables, are entered in Table III, where they occupy all the columns up to the three last, the first of which is headed ``smoothed.'' These smoothed values were obtained by plotting the observed values, after transmuting them as above described into their respective Q units, upon a diagram such as is shown in the figure.
The deviations of the ``subject'' are measured parallel to the axis of $y$ in the figure, and those of the mean of the corresponding values of the ``relative'' are measured parallel to the axis of $x$. When the stature is taken as the subject, the median positions of the corresponding cubits, which are given in the successive lines of Table III, are marked with small circles. When the cubit is the subject, the mean positions of the corresponding statures are marked with crosses. The firm line in the figure is drawn to represent the general run of the small circles and crosses. It is here seen to be a straight line, and it was similarly found to be straight in every other figure drawn from the different pairs of co-related variables that I have as yet tried. But the inclination of the line to the vertical differs considerably in different cases. In the present one the inclination is such that a deviation of 1 on the part of the subject, whether it be stature or cubit, is accompanied by a mean deviation on the part of the relative, whether it be cubit or stature, of 0.8. This decimal fraction is consequently the measure of the closeness of the correlation.

We easily retransmute it into inches. If the stature be taken as the subject, then Q$_s$ is associated with Q$_c\times0.8$; that is, a deviation of 1.75 inches in the one with $0.56 \times 0.8$ of the other. This is the same as 1 inch of stature being associated with a mean length of cubit equal to 0.26 inch. Conversely, if the cubit be taken as the subject, then Q$_c$ is associated with Q$_s\times0.8$; that is, a deviation of 0.56 inch in the one with $1.75\times0.8$ of the other. This is the same as 1 inch of cubit being associated with a mean length of 2.5 inches of stature. If centimetre be read for inch the same holds true.

Six other tables are now given in a summary form, to show how well calculation on the above principle agrees with observation.
{\footnotesize \begin{center} Table IV.
\\ \smallskip \begin{longtable}{|c|c|c|c||c|c|c|c|} \hline & & \multicolumn{2}{|c||}{Mean of corresponding} & & & \multicolumn{2}{|c|}{Mean of corresponding} \\ No. & Length & \multicolumn{2}{|c||}{statures.} & No. & & \multicolumn{2}{|c|}{lengths of head.} \\ \cline{3-4} \cline{7-8} of & of & & & of & Height & & \\ cases. & head. & & & cases. & & & \\ & & Observed. & Calculated. & & & Observed. & Calculated. \\ \hline 32 & 7.90 & 68.5 & 68.1 & 26 & 70.5 & 7.72 & 7.75 \\ 41 & 7.80 & 67.2 & 67.8 & 30 & 69.5 & 7.70 & 7.72 \\ 46 & 7.70 & 67.6 & 67.5 & 50 & 68.5 & 7.65 & 7.68 \\ 52 & 7.60 & 66.7 & 67.2 & 49 & 67.5 & 7.65 & 7.64 \\ 58 & 7.50 & 66.8 & 66.8 & 56 & 66.5 & 7.57 & 7.60 \\ 34 & 7.40 & 66.0 & 66.5 & 43 & 65.5 & 7.57 & 7.69 \\ 26 & 7.30 & 66.7 & 66.2 & 31 & 64.5 & 7.54 & 7.65 \\ \hline & & \multicolumn{2}{|c||}{Mean of corresponding} & & Length & \multicolumn{2}{|c|}{Mean of corresponding} \\ No. & & \multicolumn{2}{|c||}{lengths of left} & No. & of left & \multicolumn{2}{|c|}{statures.} \\ of & Height. & \multicolumn{2}{|c||}{middle finger.} & of & middle & \multicolumn{2}{|c|}{\ } \\ \cline{3-4} \cline{7-8} cases. & & & & cases. & finger. & & \\ & & Observed. & Calculated. & & & Observed. & Calculated. \\ \hline 30 & 70.5 & 4.71 & 4.74 & 23 & 4.80 & 70.2 & 69.4 \\ 50 & 69.5 & 4.55 & 4.68 & 49 & 4.70 & 68.1 & 68.5 \\ 37 & 68.5 & 4.57 & 4.62 & 62 & 4.60 & 68.0 & 67.7 \\ 62 & 67.5 & 4.58 & 4.56 & 63 & 4.50 & 67.3 & 66.9 \\ 48 & 66.5 & 4.59 & 4.50 & 57 & 4.40 & 66.0 & 66.1 \\ 37 & 65.5 & 4.47 & 4.44 & 35 & 4.30 & 65.7 & 65.3 \\ 20 & 64.5 & 4.33 & 4.38 & & & & \\ \hline & & \multicolumn{2}{|c||}{Mean of corresponding} & & & \multicolumn{2}{|c|}{Mean of corresponding} \\ No. & Left & \multicolumn{2}{|c||}{lengths of left cubit.} & No. & Length & \multicolumn{2}{|c|}{lengths of left middle} \\ of & middle & & & of & of left & \multicolumn{2}{|c|}{finger.}\\ \cline{3-4} \cline{7-8} cases. & finger. & & & cases. & cubit. & & \\ & & Observed. & Calculated. 
& & & Observed. & Calculated. \\ \hline 23 & 4.80 & 18.97 & 18.80 & 29 & 19.00 & 4.76 & 4.75 \\ 50 & 4.70 & 18.55 & 18.49 & 32 & 18.70 & 4.64 & 4.69 \\ 62 & 4.60 & 18.24 & 18.18 & 48 & 18.40 & 4.60 & 4.62 \\ 62 & 4.50 & 18.00 & 17.87 & 70 & 18.10 & 4.56 & 4.55 \\ 57 & 4.40 & 17.72 & 17.55 & 37 & 17.80 & 4.49 & 4.48 \\ 34 & 4.30 & 17.27 & 17.24 & 31 & 17.50 & 4.40 & 4.41 \\ & & & & 28 & 17.20 & 4.37 & 4.34 \\ & & & & 24 & 16.90 & 4.32 & 4.28 \\ \hline & & \multicolumn{2}{|c||}{Mean of corresponding} & & & \multicolumn{2}{|c|}{Mean of corresponding} \\ No. & Length & \multicolumn{2}{|c||}{breadths of head.} & No. & Breadth & \multicolumn{2}{|c|}{lengths of head.} \\ \cline{3-4} \cline{7-8} of & of & & & of & of & & \\ cases. & head. & & & cases. & head. & & \\ & & Observed. & Calculated. & & & Observed. & Calculated. \\ \hline 32 & 7.90 & 6.14 & 6.12 & 27 & 6.30 & 7.72 & 7.84 \\ 41 & 7.80 & 6.05 & 6.08 & 36 & 6.20 & 7.72 & 7.75 \\ 46 & 7.70 & 6.14 & 6.04 & 53 & 6.10 & 7.65 & 7.65 \\ 52 & 7.60 & 5.98 & 6.00 & 58 & 6.00 & 7.68 & 7.60 \\ 34 & 7.40 & 5.96 & 5.91 & 37 & 5.80 & 7.55 & 7.50 \\ 26 & 7.30 & 5.85 & 5.87 & 30 & 5.70 & 7.45 & 7.46 \\ \hline & & \multicolumn{2}{|c||}{Mean of corresponding} & & & \multicolumn{2}{|c|}{Mean of corresponding} \\ No. & & \multicolumn{2}{|c||}{heights of knee.} & No. & Height & \multicolumn{2}{|c|}{statures.} \\ \cline{3-4} \cline{7-8} of & Stature. & & & of & of & & \\ cases. & & & & cases. & knee. & & \\ & & Observed. & Calculated. & & & Observed. & Calculated. \\ \hline 30 & 70.0 & 21.7 & 21.7 & 23 & 22.2 & 70.5 & 70.6 \\ 50 & 69.0 & 21.1 & 21.3 & 32 & 21.7 & 69.8 & 69.6 \\ 38 & 68.0 & 20.7 & 20.9 & 50 & 21.2 & 68.7 & 68.6 \\ 61 & 67.0 & 20.5 & 20.5 & 68 & 20.7 & 67.3 & 67.7 \\ 49 & 66.0 & 20.2 & 20.1 & 74 & 20.2 & 66.2 & 66.7 \\ 36 & 65.0 & 19.7 & 19.7 & 41 & 19.7 & 65.5 & 65.7 \\ & & & & 26 & 19.2 & 64.3 & 64.7 \\ \hline & & \multicolumn{2}{|c||}{Mean of corresponding} & & & \multicolumn{2}{|c|}{Mean of corresponding} \\ No. 
& & \multicolumn{2}{|c||}{heights of knee.} & No. & Height & \multicolumn{2}{|c|}{left cubit.} \\
\cline{3-4} \cline{7-8}
of & Left & & & of & of & & \\
cases. & cubit. & & & cases. & knee. & & \\
& & Observed. & Calculated. & & & Observed. & Calculated. \\
\hline
29 & 19.0 & 21.5 & 21.6 & 23 & 22.25 & 18.98 & 18.97 \\
32 & 18.7 & 21.4 & 21.2 & 30 & 21.75 & 18.68 & 18.70 \\
48 & 18.4 & 20.8 & 20.9 & 52 & 21.25 & 18.38 & 18.44 \\
70 & 18.1 & 20.7 & 20.6 & 69 & 20.75 & 18.15 & 18.17 \\
37 & 17.8 & 20.4 & 20.2 & 70 & 20.25 & 17.75 & 17.90 \\
31 & 17.5 & 20.0 & 19.9 & 41 & 19.75 & 17.55 & 17.63 \\
28 & 17.2 & 19.8 & 19.6 & 27 & 19.25 & 17.02 & 17.36 \\
23 & 16.9 & 19.3 & 19.2 & & & & \\
\hline
\end{longtable}
\end{center}
}
From Table IV the deductions given in Table V can be made; but they may be made directly from tables of the form of Table III, whence Table IV was itself derived.
\begin{center}
\vbox{
Table V. \\ \smallskip
\renewcommand{\arraystretch}{1.25}
\begin{tabular}{|l|l|c|c|c|c|}
\hline
& & \multicolumn{2}{|c|}{In units of Q.} & \multicolumn{2}{|c|}{In units of ordinary} \\
& & \multicolumn{2}{|c|}{\ } & \multicolumn{2}{|c|}{measure.} \\
\hline
\multicolumn{1}{|c|}{Subject.} & \multicolumn{1}{|c|}{Relative.} & $r.$ & $\sqrt{(1-r^2)}$ & As 1 & \\
& & & $=f.$ & to & $f.$ \\
\hline
Stature & Cubit & 0.8\z & 0.6\z & 0.26 & 0.45 \\
Cubit & Stature & & & 2.5\z & 1.4\z \\
& & & & & \\
Stature & Head length & 0.35 & 0.93 & 0.38 & 1.63 \\
Head length & Stature & & & 3.2\z & 0.17 \\
& & & & & \\
Stature & Middle finger & 0.7\z & 0.72 & 0.06 & 0.10 \\
Middle finger & Stature & & & 8.2\z & 1.26 \\
& & & & & \\
Middle finger & Cubit & 0.85 & 0.61 & 3.13 & 0.34 \\
Cubit & Middle finger & & & 0.21 & 0.09 \\
& & & & & \\
Head length & Head breadth & 0.45 & 0.89 & 0.43 & 0.16 \\
Head breadth & Head length & & & 0.48 & 0.17 \\
& & & & & \\
Stature & Height of knee & 0.9\z & 0.44 & 0.41 & 0.35 \\
Height of knee & Stature & & & 1.20 & 0.77 \\
& & & & & \\
Cubit & Height of 
knee & 0.8\z & 0.60 & 1.14 & 0.64 \\
Height of knee & Cubit & & & 0.56 & 0.45 \\
\hline
\end{tabular}
\renewcommand{\arraystretch}{1.00}
}
\end{center}

When the deviations of the subject and those of the mean of the relatives are severally measured in units of their own Q, there is always a regression in the value of the latter. This is precisely analogous to what was observed in kinship, as I showed in my paper read before this Society on ``Hereditary Stature'' (`Roy.\,Soc.\,Proc.,' vol. 40, 1886, p.\ 42). The statures of kinsmen are co-related variables; thus, the stature of the father is correlated to that of the adult son, and the stature of the adult son to that of the father; the stature of the uncle to that of the adult nephew, and the stature of the adult nephew to that of the uncle, and so on; but the index of correlation, which is what I there called ``regression,'' is different in the different cases.

In dealing with kinships there is usually no need to reduce the measures to units of Q, because the Q values are alike in all the kinsmen, being of the same value as that of the population at large. It however happened that the very first case that I analysed was different in this respect. It was the reciprocal relation between the statures of what I called the ``mid-parent'' and the son. The mid-parent is an ideal progenitor, whose stature is the average of that of the father on the one hand and of that of the mother on the other, after her stature had been transmuted into its male equivalent by the multiplication of the factor of 1.08. The Q of the mid-parental stature was found to be 1.2, that of the population dealt with was 1.7. Again, the mean deviation measured in inches of the statures of the sons was found to be two-thirds of the deviation of the mid-parents, while the mean deviation in inches of the mid-parent was one-third of the deviation of the sons.
Here the regression, when calculated in Q units, is in the first case from $\frac{1}{1.2}$ to $\frac{2}{3}\times\frac{1}{1.7}=1$ to 0.47, and in the second case from $\frac{1}{1.7}$ to $\frac{1}{3}\times\frac{1}{1.2}=1$ to 0.44, which is practically the same.

The \textit{rationale} of all this will be found discussed in the paper on ``Hereditary Stature,'' to which reference has already been made, and in the appendix to it by Mr.~J.~D.~Hamilton Dickson. The entries in any table, such as Table II, may be looked upon as the values of the vertical ordinates to a surface of frequency, whose mathematical properties were discussed in the above-mentioned appendix; therefore I need not repeat them here. But there is always room for legitimate doubt whether conclusions based on the strict properties of the ideal law of error would be sufficiently correct to be serviceable in actual cases of correlation between variables that conform only approximately to that law. It is therefore exceedingly desirable to put the theoretical conclusions to frequent test, as has been done with these anthropometric data. The result is that anthropologists may now have much less hesitation than before, in availing themselves of the properties of the law of frequency of error.

I have given in Table V a column headed $\sqrt{(1-r^2)}=f$. The meaning of $f$ is explained in the paper on ``Hereditary Stature.'' It is the Q value of the distribution of any system of $x$ values, as $x_1$, $x_2$, $x_3$, \&c., round the mean of all of them, which we may call X. The knowledge of $f$ enables dotted lines to be drawn, as in the figure above, parallel to the line of M values, between which one half of the $x$ observations, for each value of $y$, will be included. This value of $f$ has much anthropological interest of its own, especially in connexion with M.~Bertillon's system of anthropometric identification, to which I will not call attention now.
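\par\medskip\noindent\textit{Illustrative check (editorial, not part of the original text).} The entries for stature and cubit in Table V can be verified by direct arithmetic, assuming only the values $r=0.8$, Q$_c=0.56$ inch and Q$_s=1.75$ inch quoted earlier (the suffixes $c$ and $s$ denoting cubit and stature):
\begin{align*}
f &= \sqrt{1-r^2} = \sqrt{1-0.64} = 0.6, \\
r\,\frac{\text{Q}_c}{\text{Q}_s} &= 0.8\times\frac{0.56}{1.75} = 0.256 \approx 0.26, \\
r\,\frac{\text{Q}_s}{\text{Q}_c} &= 0.8\times\frac{1.75}{0.56} = 2.5.
\end{align*}
Thus one inch of deviation in stature corresponds on the average to about 0.26 inch in the cubit, and one inch of deviation in the cubit to about 2.5 inches in stature, as tabulated.\par\medskip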
It is not necessary to extend the list of examples to show how to measure the degree in which one variable may be correlated with the combined effect of $n$ other variables, whether these be themselves correlated or not. To do so, we begin by reducing each measure into others, each having the Q of its own system for a unit. We thus obtain a set of values that can be treated exactly in the same way as the measures of a single variable were treated in Tables II and onwards. Neither is it necessary to give examples of a method by which the degree may be measured, in which the variables in a series, each member of which is the summed effect of $n$ variables, may be modified by their partial correlation. After transmuting the separate measures as above, and then summing them, we should find the probable error of any one of them to be $\sqrt{n}$ if the variables were perfectly independent, and $n$ if they were rigidly and perfectly co-related. The observed value would be almost always somewhere intermediate between these extremes, and would give the information that is wanted.

To conclude, the prominent characteristics of any two correlated variables, so far at least as I have as yet tested them, are four in number. It is supposed that their respective measures have been first transmuted into others of which the unit is in each case equal to the probable error of a single measure in its own series. Let $y$ be the deviation of the subject, whichever of the two variables may be taken in that capacity; and let $x_1$, $x_2$, $x_3$, \&c., be the corresponding deviations of the relative, and let the mean of these be X. Then we find: (1) that $y=r\text{X}$ for all values of $y$; (2) that $r$ is the same, whichever of the two variables is taken for the subject; (3) that $r$ is always less than 1; (4) that $r$ measures the closeness of correlation.

\bigskip
\noindent [\textit{Proceedings of the Royal Society of London} \textbf{45} (1888), 135--145.]
\end{document}