Many biostatistical analyses are conducted to study the relationship between two continuous or ordinal scale variables within a group of patients.
Purposes of these analyses include:

1. determining the degree of correlation (association) between the two variables;
2. determining the degree of agreement between the two variables (for example, between two measurement methods or two raters);
3. predicting the value of one variable from the value of the other.
This lesson will focus only on correlation and agreement (items 1 and 2 listed above).
Correlation is a general method of analysis useful when studying possible association between two continuous or ordinal scale variables. Several measures of correlation exist. The appropriate type for a particular situation depends on the distribution and measurement scale of the data. Three measures of correlation are commonly applied in biostatistics and these will be discussed below.
Suppose that we have two variables of interest, denoted as X and Y, and suppose that we have a bivariate sample of size n:
\(\left(X_1 , Y_1 \right), \left(X_2 , Y_2 \right), \dots , \left(X_n , Y_n \right)\)
and we define the following statistics:

\(\bar{X}=\dfrac{1}{n}\sum_{i=1}^{n}X_i , \quad s_{X}^{2}=\dfrac{1}{n-1}\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^2\)

\(\bar{Y}=\dfrac{1}{n}\sum_{i=1}^{n}Y_i , \quad s_{Y}^{2}=\dfrac{1}{n-1}\sum_{i=1}^{n}\left(Y_i-\bar{Y}\right)^2\)

\(s_{XY}=\dfrac{1}{n-1}\sum_{i=1}^{n}\left(X_i-\bar{X}\right)\left(Y_i-\bar{Y}\right)\)
These statistics above represent the sample mean for X, the sample variance for X, the sample mean for Y, the sample variance for Y, and the sample covariance between X and Y, respectively. These should be very familiar to you.
The sample Pearson correlation coefficient (also called the sample product-moment correlation coefficient) for measuring the association between variables X and Y is given by the following formula:

\(r_p=\dfrac{s_{XY}}{s_{X}s_{Y}}=\dfrac{\sum_{i=1}^{n}\left(X_i-\bar{X}\right)\left(Y_i-\bar{Y}\right)}{\sqrt{\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^2 \sum_{i=1}^{n}\left(Y_i-\bar{Y}\right)^2}}\)
The sample Pearson correlation coefficient, \(r_p\), is the point estimate of the population Pearson correlation coefficient, \(\rho_p\).

The Pearson correlation coefficient measures the degree of linear relationship between X and Y, and \(-1 ≤ r_p ≤ +1\). \(r_p\) is a "unitless" quantity, i.e., when you construct the correlation coefficient the units of measurement cancel out. A value of +1 reflects perfect positive correlation and a value of -1 reflects perfect negative correlation.
For the Pearson correlation coefficient, we assume that both X and Y are measured on a continuous scale and that each is approximately normally distributed.
The Pearson correlation coefficient is invariant to location and scale transformations. This means that if every \(X_i\) is transformed to

\(X_i^* = aX_i + b\)

and every \(Y_i\) is transformed to

\(Y_i^* = cY_i + d\)

where \(a > 0\), \(c > 0\), and \(b\) and \(d\) are constants, then the correlation between X and Y is the same as the correlation between \(X^*\) and \(Y^*\).
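To see this invariance numerically, here is a minimal SAS sketch (a small made-up dataset and arbitrary constants a = 2, b = 10, c = 0.5, d = -3, all our own choices); PROC CORR should report the same correlation for the raw pair and the transformed pair.

data invariance;
   input x y;
   /* arbitrary positive-slope transformations of each variable */
   xstar = 2*x + 10;    /* a = 2,   b = 10 */
   ystar = 0.5*y - 3;   /* c = 0.5, d = -3 */
   cards;
1 2.1
2 3.9
3 6.2
4 7.8
5 10.1
;
run;

proc corr data=invariance pearson;
   var x y xstar ystar;
   title "Pearson correlation is unchanged by location and scale shifts";
run;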
With SAS, PROC CORR is used to calculate \(r_p\). The output from PROC CORR includes summary statistics for both variables and the computed value of \(r_p\). The output also contains a p-value corresponding to the test of:

\(H_0 : \rho_p = 0\) versus \(H_1 : \rho_p ≠ 0\)
It should be noted that this statistical test generally is not very useful, and the associated p-value, therefore, should not be emphasized. What is more important is to construct a confidence interval.
The sampling distribution for Pearson's \(r_p\) is not normal. In order to attain confidence limits for \(r_p\) based on a standard normal distribution, we transform \(r_p\) using Fisher's Z transformation to get a quantity, \(z_p\), that has an approximate normal distribution. Then we can work with this value. Here is what is involved in the transformation.
Fisher's Z transformation is defined as
\(z_p=\dfrac{1}{2}\log_e\left( \dfrac{1+r_p}{1-r_p} \right) \sim N\left( \zeta_p=\dfrac{1}{2}\log_e\left( \dfrac{1+\rho_p}{1-\rho_p} \right) , \; sd=\dfrac{1}{\sqrt{n-3}} \right)\)
We will use this to get the usual confidence interval. An approximate \(100(1 - \alpha)\%\) confidence interval for \(\zeta_p\) is given by \([z_{p,\text{low}} , z_{p,\text{up}}]\), where

\(z_{p,\text{low}}=z_p-\dfrac{z_{1-\alpha/2}}{\sqrt{n-3}} \quad \text{and} \quad z_{p,\text{up}}=z_p+\dfrac{z_{1-\alpha/2}}{\sqrt{n-3}}\)

What we really want, however, is an approximate \(100(1 - \alpha)\%\) confidence interval for \(\rho_p\). It is given by \([r_{p,\text{low}} , r_{p,\text{up}}]\), obtained by back-transforming the limits for \(\zeta_p\) to the correlation scale:

\(r_{p,\text{low}}=\dfrac{\exp\left(2z_{p,\text{low}}\right)-1}{\exp\left(2z_{p,\text{low}}\right)+1} \quad \text{and} \quad r_{p,\text{up}}=\dfrac{\exp\left(2z_{p,\text{up}}\right)-1}{\exp\left(2z_{p,\text{up}}\right)+1}\)
Again, you do not have to do this by hand. PROC CORR in SAS will do this for you but it is important to have an idea of what is going on.
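For intuition, here is a minimal SAS data-step sketch of the hand calculation. The values r = 0.7921 and n = 18 are taken from Example 19.1 below (the variable names are ours); up to rounding, it should reproduce the Pearson confidence interval reported there.

data _null_;
   r = 0.7921;                      /* sample Pearson correlation */
   n = 18;                          /* sample size */
   z = 0.5*log((1 + r)/(1 - r));    /* Fisher's Z transformation */
   se = 1/sqrt(n - 3);              /* standard deviation of z */
   zcrit = probit(0.975);           /* 97.5th percentile of N(0,1) */
   z_low = z - zcrit*se;
   z_up  = z + zcrit*se;
   /* back-transform the limits to the correlation scale */
   r_low = (exp(2*z_low) - 1)/(exp(2*z_low) + 1);
   r_up  = (exp(2*z_up)  - 1)/(exp(2*z_up)  + 1);
   put "95% CI for rho: " r_low 6.4 " to " r_up 6.4;
run;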
The Spearman rank correlation coefficient, \(r_s\), is a nonparametric measure of correlation based on data ranks. It is obtained by ranking the values of the two variables (X and Y) and calculating the Pearson \(r_p\) on the resulting ranks, not the data itself. Again, PROC CORR will do all of these actual calculations for you.
The Spearman rank correlation coefficient has properties similar to those of the Pearson correlation coefficient, although the Spearman rank correlation coefficient quantifies the degree of linear association between the ranks of X and the ranks of Y. Also, \(r_s\) does not estimate a natural population parameter (unlike Pearson's \(r_p\) which estimates \(\rho_p\) ).
An advantage of the Spearman rank correlation coefficient is that the X and Y values can be continuous or ordinal, and approximate normal distributions for X and Y are not required. Similar to the Pearson \(r_p\), Fisher's Z transformation can be applied to the Spearman \(r_s\) to get a statistic, \(z_s\), that has an asymptotic normal distribution for calculating an asymptotic confidence interval. Again, PROC CORR will do this as well.
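As a quick check of the "Pearson on ranks" description, a sketch along these lines (using the body-fat data from Example 19.1 below) ranks both variables with PROC RANK and then compares the Pearson correlation of the ranks with the Spearman correlation of the raw values; the two should agree.

proc rank data=bodyfat out=bodyfat_ranks;
   var age bodyfat_perc;
   ranks age_rank bodyfat_rank;     /* ranks of each variable (midranks for ties) */
run;

proc corr data=bodyfat_ranks pearson;
   var age_rank bodyfat_rank;       /* Pearson r computed on the ranks */
   title "Pearson correlation of the ranks";
run;

proc corr data=bodyfat spearman;
   var age bodyfat_perc;            /* Spearman r_s on the raw data should match */
   title "Spearman correlation of the raw data";
run;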
The Kendall tau-b correlation coefficient, \(\tau_b\), is a nonparametric measure of association based on the number of concordances and discordances in paired observations.
Two observations \(\left(X_i , Y_i \right)\) and \(\left(X_j , Y_j \right)\) are concordant if they are in the same order with respect to each variable. That is, if

\(X_i < X_j\) and \(Y_i < Y_j\), or \(X_i > X_j\) and \(Y_i > Y_j\)

They are discordant if they are in the reverse ordering for X and Y, i.e., the values are arranged in opposite directions. That is, if

\(X_i < X_j\) and \(Y_i > Y_j\), or \(X_i > X_j\) and \(Y_i < Y_j\)
The two observations are tied if \(X_i = X_j\) and/or \(Y_i = Y_j\) .
The total number of pairs that can be constructed for a sample size of n is

\(N=\dbinom{n}{2}=\dfrac{n(n-1)}{2}\)
N can be decomposed into these five quantities:
\(N = P + Q + X_0 + Y_0 + (XY)_0\)
where P is the number of concordant pairs, Q is the number of discordant pairs, \(X_0\) is the number of pairs tied only on the X variable, \(Y_0\) is the number of pairs tied only on the Y variable, and \(\left(XY\right)_0\) is the number of pairs tied on both X and Y.
The Kendall tau-b for measuring order association between variables X and Y is given by the following formula:

\(t_b=\dfrac{P-Q}{\sqrt{\left(P+Q+X_0\right)\left(P+Q+Y_0\right)}}\)

This statistic is scaled so that it ranges between -1 and +1. Unlike the Spearman coefficient, it estimates a population parameter, the population Kendall tau-b \(\tau_b\) (the probability of concordance minus the probability of discordance, rescaled to account for ties).

The Kendall tau-b has properties similar to the properties of the Spearman \(r_s\). Because the sample estimate, \(t_b\), does estimate a population parameter, \(\tau_b\), many statisticians prefer the Kendall tau-b to the Spearman rank correlation coefficient.
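To make the counting concrete, here is a small sketch of our own (a toy dataset in PROC IML) that tallies P, Q, and the tie counts over all pairs and evaluates the tau-b formula; PROC CORR with the KENDALL option applied to the same two columns should give the same value.

proc iml;
   /* toy data: 6 paired observations, with one tie on X and one on Y */
   x = {1, 2, 2, 3, 4, 5};
   y = {2, 1, 3, 3, 5, 6};
   n = nrow(x);
   P = 0; Q = 0; X0 = 0; Y0 = 0; XY0 = 0;
   do i = 1 to n-1;
      do j = i+1 to n;
         dx = x[j] - x[i];
         dy = y[j] - y[i];
         if dx = 0 & dy = 0 then XY0 = XY0 + 1;   /* tied on both X and Y */
         else if dx = 0 then X0 = X0 + 1;         /* tied on X only */
         else if dy = 0 then Y0 = Y0 + 1;         /* tied on Y only */
         else if dx*dy > 0 then P = P + 1;        /* concordant pair */
         else Q = Q + 1;                          /* discordant pair */
      end;
   end;
   tau_b = (P - Q)/sqrt((P + Q + X0)#(P + Q + Y0));
   print P Q X0 Y0 XY0 tau_b;
quit;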
(19.1_correlation.sas): Age and percentage body fat were measured in 18 adults. SAS PROC CORR provides estimates of the Pearson, Spearman, and Kendall correlation coefficients. It also calculates Fisher's Z transformation for the Pearson and Spearman correlation coefficients in order to get 95% confidence intervals.
*******************************************************************************
* This program indicates how to construct a bivariate scatterplot with an    *
* overlay of the least squares regression line.                              *
*                                                                             *
* This program also provides an example for calculating point and            *
* interval estimates of the Pearson, Spearman, and Kendall correlation       *
* coefficients.                                                              *
*******************************************************************************;

data bodyfat;
   input subject age bodyfat_perc;
   cards;
01 23  9.5
02 23 27.9
03 27  7.8
04 27 17.8
05 39 31.4
06 41 25.9
07 45 27.4
08 49 25.2
09 50 31.1
10 53 34.7
11 53 42.0
12 54 29.1
13 56 32.5
14 57 30.3
15 58 33.0
16 58 33.8
17 60 41.1
18 61 34.5
;
run;

proc gplot data=bodyfat;
   plot bodyfat_perc*age/vaxis=axis1 haxis=axis2 nolegend frame;
   axis1 label=(a=90 '% Body Fat') minor=none;
   axis2 label=('Age') minor=none;
   symbol1 value=star color=black interpol=r;
   title "Scatterplot";
run;

proc corr data=bodyfat Pearson Spearman Kendall Fisher(biasadj=no);
   var age;
   with bodyfat_perc;
   title "Correlation Coefficients";
run;
The resulting estimates for this example are 0.7921, 0.7539, and 0.5762, respectively for the Pearson, Spearman, and Kendall correlation coefficients. The Kendall tau-b correlation typically is smaller in magnitude than the Pearson and Spearman correlation coefficients.
The 95% confidence intervals are (0.5161, 0.9191) and (0.4429, 0.9029), respectively for the Pearson and Spearman correlation coefficients. Because the Kendall correlation typically is applied to binary or ordinal data, its 95% confidence interval can be calculated via SAS PROC FREQ (this is not shown in the SAS program above).
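One possible way to do that (a sketch of our own, not part of the posted program) is to use the MEASURES and CL options of PROC FREQ, which request Kendall's tau-b along with asymptotic confidence limits; with continuous data such as these, each distinct value simply becomes its own table category.

proc freq data=bodyfat;
   tables bodyfat_perc*age / measures cl noprint;   /* tau-b with 95% confidence limits */
   title "Kendall tau-b with confidence limits";
run;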
Correlation is a widely used analysis tool that sometimes is applied inappropriately. Some caveats regarding the use of correlation methods follow.
How well do two diagnostic measurements agree? Many times continuous units of measurement are used in the diagnostic test. We may not be interested in correlation or linear relationship between the two measures, but in a measure of agreement.
The concordance correlation coefficient, \(r_c\), for measuring agreement between continuous variables X and Y (both approximately normally distributed), is calculated as follows:

\(r_c=\dfrac{2s_{XY}}{s_{X}^{2}+s_{Y}^{2}+\left(\bar{X}-\bar{Y}\right)^2}\)

Similar to the other correlation coefficients, the concordance correlation satisfies \(-1 ≤ r_c ≤ +1\). A value of \(r_c = +1\) corresponds to perfect agreement. A value of \(r_c = -1\) corresponds to perfect negative agreement, and a value of \(r_c = 0\) corresponds to no agreement. The sample estimate, \(r_c\), is an estimate of the population concordance correlation coefficient:

\(\rho_c=\dfrac{2\sigma_{XY}}{\sigma_{X}^{2}+\sigma_{Y}^{2}+\left(\mu_X-\mu_Y\right)^2}\)
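Before the full example, here is a compact sketch of the computation itself (a toy illustration with made-up paired measurements, written in PROC IML in the same style as the longer program below):

proc iml;
   /* toy paired measurements from two methods */
   x = {4.1, 4.9, 6.0, 6.8, 7.2};
   y = {4.3, 5.1, 5.8, 7.0, 7.5};
   n = nrow(x);
   mx = x[:];  my = y[:];                       /* sample means */
   sxx = ssq(x - mx)/(n-1);                     /* sample variance of x */
   syy = ssq(y - my)/(n-1);                     /* sample variance of y */
   sxy = sum((x - mx)#(y - my))/(n-1);          /* sample covariance */
   r_p = sxy/sqrt(sxx#syy);                     /* Pearson correlation */
   r_c = 2#sxy/(sxx + syy + (mx - my)##2);      /* concordance correlation */
   print r_p r_c;
quit;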
Let's look at an example that will help to make this concept clearer.
*******************************************************************************
* This program indicates how to construct a bivariate scatterplot with an    *
* overlay of the line of identity.                                           *
*                                                                             *
* This program also provides an IML module for calculating point and         *
* interval estimates of the Pearson correlation coefficient and the          *
* concordance correlation coefficient.                                       *
*******************************************************************************;

data dice_baseline;
   input subject cort_auc1 cort_auc2 @@;   /* trailing @@ reads several observations per data line */
   cards;
61001 5.28170 5.37851  61002 5.58796 5.33628  61003 6.47607 6.59770  61005 6.36019 6.39746
61007 5.81121 5.82528  61008 6.03036 6.21147  61009 5.84549 6.10434  61014 6.80349 6.90689
61015 6.28977 6.27369  61023 5.88446 5.94352  61024 5.79701 6.04876  61026 5.01302 4.72154
61027 6.48824 6.37891  61028 5.30862 5.53405  61033 5.70905 5.77803  61034 5.98545 5.77613
61035 6.19924 6.20880  61036 6.00639 5.98313  61037 6.27793 6.44342  61038 6.57390 6.62936
61039 5.69639 5.43509  61042 5.96588 6.01282  61044 6.04803 6.11529  61046 6.39423 6.03876
62002 7.44584 7.58421  62003 5.90813 5.87230  62004 6.05483 5.94695  62005 5.65735 5.64983
62006 6.44815 6.44280  62007 6.28611 6.45374  62009 6.40863 6.14994  62012 5.62564 5.58142
62013 6.68375 6.58815  62014 5.76951 5.84802  62015 5.94383 5.95489  62016 5.66024 5.73711
62017 4.77492 4.57465  62018 5.60468 5.43495  62019 6.82819 6.86652  62020 5.18986 5.05725
62021 6.48810 6.59655  62022 6.08867 5.76965  62023 5.91400 5.89672  62024 5.58217 5.57651
62026 6.32857 6.48921  62027 7.67703 7.76541  62028 5.92411 5.66689  62029 6.15313 6.16558
62030 5.20392 5.42481  62032 6.43962 6.46171  62033 6.20661 6.28542  63001 6.04767 5.82528
63002 6.46923 6.51728  63003 5.68370 5.79701  63004 5.11719 5.47363  63005 6.10993 6.13541
63006 4.91744 5.06968  63008 5.35972 5.56605  63010 6.73016 6.76285  63011 5.93700 5.94092
63012 6.07716 5.92548  63014 6.58185 6.52781  63015 5.84317 5.85030  63016 5.98144 6.22389
63017 5.77452 5.81662  63018 5.46142 5.84898  63021 5.44920 5.43688  63022 5.96519 6.01302
63026 6.29258 6.42339  63027 6.86608 6.92806  63028 5.47875 5.72634  63030 6.16190 6.16608
63032 5.66707 5.97114  63033 5.80634 5.63640  63034 6.37256 6.24416  63035 5.65755 5.98070
64001 6.07596 6.06010  64002 6.57898 6.54552  64003 6.60733 6.89724  64004 5.69611 5.82963
64005 6.60331 6.62972  64007 5.89963 5.83195  64008 6.19731 6.07044  64009 5.88875 6.03427
64010 5.64912 5.46135  64011 6.99962 7.10324  64012 6.61282 6.68520  64015 6.60477 6.76468
64016 5.35161 5.55307  64017 6.63249 6.77868  64020 5.37717 5.19760  64023 5.58781 5.87044
64025 5.74499 5.77090  64027 5.92655 6.09265  64028 5.01060 5.17112  64030 7.12400 7.17368
64031 5.79909 5.37603  64032 5.75609 5.85174  64033 6.79275 6.78255  64035 6.02198 6.03915
64036 5.43960 5.82229  64037 5.20163 5.12928  65001 6.18838 6.66593  65002 6.13860 6.26224
65003 6.98807 6.97658  65005 5.54628 5.50547  65006 4.47249 4.59256  65007 5.04034 5.07775
65008 5.42025 5.46227  65009 5.26772 5.41463  65011 6.43019 6.38438  65013 6.56323 6.46607
65014 5.06134 4.70619  65016 6.32666 6.31564  65017 5.90235 6.05890  65018 6.05800 5.99467
65020 5.96388 6.01204  65021 5.57324 5.61324  65022 6.21017 6.15262  65025 6.01934 6.00318
65026 5.52082 5.54575  65027 5.89237 5.67469  65028 6.24592 6.37106  65031 6.31524 6.44334
65032 6.19602 6.29576  65033 6.07305 6.07966  65037 5.60960 5.38492  65038 5.39806 5.18466
65040 5.95464 6.20802  66003 5.96880 5.86128  66004 5.89707 5.69116  66005 6.27067 6.35294
66006 5.39744 5.23236  66010 6.64408 6.64990  66011 6.29430 6.32724  66012 5.22507 5.28754
66013 5.15840 5.05580  66014 5.85385 5.54974  66018 6.49611 5.87795  66019 5.43285 5.50871
66021 6.32416 6.31612  66023 5.77253 5.74469  66024 5.89920 5.95774
;
run;

proc means data=dice_baseline noprint;
   var cort_auc1 cort_auc2;
   output out=dice_baselinemin min=var1 var2;
run;

proc means data=dice_baseline noprint;
   var cort_auc1 cort_auc2;
   output out=dice_baselinemax max=var1 var2;
run;

data dice_baselineall;
   set dice_baselinemin dice_baselinemax;
   drop _type_ _freq_;
   if _N_=1 then var1=floor(min(var1,var2));
   if _N_=1 then var2=floor(min(var1,var2));
   if _N_=2 then var1=ceil(max(var1,var2));
   if _N_=2 then var2=ceil(max(var1,var2));
run;

data dice_baselineall;
   set dice_baseline dice_baselineall;
run;

proc gplot uniform data=dice_baselineall;
   plot cort_auc1*cort_auc2 var1*var2/overlay vaxis=axis1 haxis=axis2 nolegend frame;
   axis1 label=(a=90 'Cortisol Every Hour') minor=none;
   axis2 label=('Cortisol Every Two Hours') minor=none;
   symbol1 value=star color=black interpol=none;
   symbol2 value=none color=black interpol=join;
   title "Concordance Correlation Coefficient";
run;

proc iml;

*******************************************************************************
* Enter the appropriate SAS data set name in the use statement and enter the *
* appropriate variable names in the read statements.                         *
*******************************************************************************;

use dice_baseline;
read all var {cort_auc1} into var1;
read all var {cort_auc2} into var2;

*******************************************************************************
* The IML module, labeled concorr, starts next.                              *
*******************************************************************************;

start concorr;
   nonmiss=loc(var1#var2^=.);
   var1=var1[nonmiss];
   var2=var2[nonmiss];
   free nonmiss;
   n=nrow(var1);
   mu1=sum(var1)/n;
   mu1=round(mu1,0.0001);
   mu2=sum(var2)/n;
   mu2=round(mu2,0.0001);
   sigma11=ssq(var1-mu1)/(n-1);
   sigma11=round(sigma11,0.0001);
   sigma22=ssq(var2-mu2)/(n-1);
   sigma22=round(sigma22,0.0001);
   sigma12=sum((var1-mu1)#(var2-mu2))/(n-1);
   sigma12=round(sigma12,0.0001);
   lshift=(mu1-mu2)/((sigma11#sigma22)##0.25);
   rho=sigma12/sqrt(sigma11#sigma22);
   rho=round(rho,0.0001);
   z=log((1+rho)/(1-rho))/2;
   se_z=1/sqrt(n-3);
   t=tinv(0.975,n-3);
   z_low=z-(se_z#t);
   z_upp=z+(se_z#t);
   rho_low=(exp(2#z_low)-1)/(exp(2#z_low)+1);
   rho_low=round(rho_low,0.0001);
   rho_upp=(exp(2#z_upp)-1)/(exp(2#z_upp)+1);
   rho_upp=round(rho_upp,0.0001);
   crho=(2#sigma12)/((sigma11+sigma22)+((mu1-mu2)##2));
   crho=round(crho,0.0001);
   z=log((1+crho)/(1-crho))/2;
   if sigma12^=0 then do;
      t1=((1-(rho##2))#(crho##2))/((1-(crho##2))#(rho##2));
      t2=(2#(crho##3)#(1-crho)#(lshift##2))/(rho#((1-(crho##2))##2));
      t3=((crho##4)#(lshift##4))/(2#(rho##2)#((1-(crho##2))##2));
      se_z=sqrt((t1+t2-t3)/(n-2));
   end;
   else se_z=sqrt(2#sigma11#sigma22)/((sigma11+sigma22+((mu1-mu2)##2))#(n-2));
   t=tinv(0.975,n-2);
   z_low=z-(se_z#t);
   z_upp=z+(se_z#t);
   crho_low=(exp(2#z_low)-1)/(exp(2#z_low)+1);
   crho_low=round(crho_low,0.0001);
   crho_upp=(exp(2#z_upp)-1)/(exp(2#z_upp)+1);
   crho_upp=round(crho_upp,0.0001);
   Results=n//mu1//mu2//sigma11//sigma22//sigma12//rho_low//rho//rho_upp//
           crho_low//crho//crho_upp;
   r_name={"N","Mean 1","Mean 2","Variance 1","Variance 2","Covariance",
           "Corr Lower","Correlation","Corr Upper",
           "Concord Lower","Concordance Corr","Concord Upper"};
   print 'The Estimated Correlation and Concordance Correlation (and 95% Confidence Limits)';
   print Results [rowname=r_name];
finish concorr;

*******************************************************************************
* The IML module, labeled concorr, is finished.                              *
*******************************************************************************;

run concorr;
The ACRN DICE trial was discussed earlier in this course. In that trial, participants underwent hourly blood draws between 08:00 PM and 08:00 AM once a week in order to determine the cortisol area-under-the-curve (AUC). The participants hated this! They complained about the sleep disruption every hour when the nurses came by to draw blood, so the ACRN wanted to determine for future studies if the cortisol AUC calculated on measurements every two hours was in good agreement with the cortisol AUC calculated on hourly measurements. The baseline data were used to investigate how well these two measurements agreed. If there is good agreement, the protocol could be changed to take blood every two hours.
Note for this SAS program - Run the program to view the output. This is higher level SAS than you are expected to program yourself in this course, but some of you may find the programming of interest.
The SAS program yielded \(r_c = 0.95\) and a 95% confidence interval = (0.93, 0.96). The ACRN judged this to be excellent agreement, so it will use two-hourly measurements in future studies.
What about binary or ordinal data? Cohen's Kappa Statistic will handle this.
Cohen's kappa statistic, \(\kappa\) , is a measure of agreement between categorical variables X and Y. For example, kappa can be used to compare the ability of different raters to classify subjects into one of several groups. Kappa also can be used to assess the agreement between alternative methods of categorical assessment when new techniques are under study.
Kappa is calculated from the observed and expected frequencies on the diagonal of a square contingency table. Suppose that there are n subjects on whom X and Y are measured, and suppose that there are g distinct categorical outcomes for both X and Y. Let \(f_{ij}\) denote the frequency of the number of subjects with the \(i^{th}\) categorical response for variable X and the \(j^{th}\) categorical response for variable Y.
Then the frequencies can be arranged in the following g × g table:

                Y = 1     Y = 2     ...     Y = g     Row total
X = 1           f_11      f_12      ...     f_1g      f_1+
X = 2           f_21      f_22      ...     f_2g      f_2+
...             ...       ...       ...     ...       ...
X = g           f_g1      f_g2      ...     f_gg      f_g+
Column total    f_+1      f_+2      ...     f_+g      n
The observed proportional agreement between X and Y is defined as:

\(p_o=\dfrac{1}{n}\sum_{i=1}^{g}f_{ii}\)

and the expected agreement by chance is:

\(p_e=\dfrac{1}{n^2}\sum_{i=1}^{g}f_{i+}f_{+i}\)

where \(f_{i+}\) is the total for the \(i^{th}\) row and \(f_{+i}\) is the total for the \(i^{th}\) column. The kappa statistic is:

\(\hat{\kappa}=\dfrac{p_o-p_e}{1-p_e}\)

Cohen's kappa statistic is an estimate of the population coefficient:

\(\kappa=\dfrac{\pi_o-\pi_e}{1-\pi_e}\)

where \(\pi_o\) and \(\pi_e\) are the corresponding population probabilities of observed and chance agreement.
Generally, \(0 ≤ \kappa ≤ 1\), although negative values do occur on occasion. Cohen's kappa is ideally suited for nominal (non-ordinal) categories. Weighted kappa can be calculated for tables with ordinal categories.
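As a small worked illustration of these formulas (a hypothetical 2 × 2 table of our own, not the radiology data below), the following data step computes the observed agreement, the chance-expected agreement, and kappa:

data _null_;
   /* hypothetical 2 x 2 table of frequencies from two raters */
   f11 = 40; f12 = 10;
   f21 = 5;  f22 = 45;
   n   = f11 + f12 + f21 + f22;
   p_o = (f11 + f22)/n;                                        /* observed agreement */
   p_e = ((f11+f12)*(f11+f21) + (f21+f22)*(f12+f22))/(n*n);    /* chance agreement */
   kappa = (p_o - p_e)/(1 - p_e);
   put p_o= p_e= kappa=;
run;

Here the observed agreement is 0.85, the chance agreement is 0.50, and kappa is 0.70.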
(19.3_agreement_Cohen.sas) : Two radiologists rated 85 patients with respect to liver lesions. The ratings were designated on an ordinal scale as:
0 ='Normal' 1 ='Benign' 2 ='Suspected' 3 ='Cancer'
SAS PROC FREQ provides an option for constructing Cohen's kappa and weighted kappa statistics.
*******************************************************************************
* This program indicates how to calculate Cohen's kappa statistic for        *
* evaluating the level of agreement between two variables.                   *
*******************************************************************************;

proc format;
   value raterfmt 0='Normal'
                  1='Benign'
                  2='Suspected'
                  3='Cancer';
run;

data radiology;
   input rater1 rater2 count;
   format rater1 rater2 raterfmt.;
   cards;
0 0 21
0 1 12
0 2  0
0 3  0
1 0  4
1 1 17
1 2  1
1 3  0
2 0  3
2 1  9
2 2 15
2 3  2
3 0  0
3 1  0
3 2  0
3 3  1
;
run;

proc freq data=radiology;
   tables rater1*rater2/agree;
   weight count;
   test kappa;
   exact kappa;
   title "Cohen's Kappa Coefficients";
run;
The weighted kappa coefficient is 0.57 and the asymptotic 95% confidence interval is (0.44, 0.70). This indicates that the amount of agreement between the two radiologists is modest (and not as strong as the researchers had hoped it would be).
Note! Updated programs for Examples 19.2 and 19.3 are in the folder for this lesson. Take a look.

In this lesson, among other things, we learned how to:

- use the Pearson, Spearman, and Kendall tau-b coefficients to quantify the correlation between two continuous or ordinal variables;
- use Fisher's Z transformation to construct confidence intervals for correlation coefficients;
- use the concordance correlation coefficient to measure agreement between two continuous measurements;
- use Cohen's kappa (and weighted kappa) to measure agreement between two categorical ratings.