Adaptation of a neuropsychological assessment battery from in-person to telephone or videoconference administration

Contributed by Yonah Joffe, MS1, Tamara G. Fong, MD, PhD2,3, and Eva M. Schmitt, PhD3

1Department of Clinical and Health Psychology, University of Florida, Gainesville, FL, USA
2Department of Neurology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
3Aging Brain Center, Marcus Institute for Aging Research, Hebrew SeniorLife, Boston, MA, USA

Introduction

In-person neuropsychological testing to measure cognitive performance has been the standard approach in older adults with age-related cognitive disorders such as delirium and dementia. However, during the COVID-19 pandemic, protecting the safety of study participants meant suspending face-to-face visits. Neuropsychological testing delivered via telehealth or other remote methods is not widely used, because of concerns that normative data derived from standardized test procedures may not be applicable, and because some studies have demonstrated subtle differences in task performance between in-person and remote testing modes.1

To allow for ongoing follow-up of our study cohort, we adapted the existing in-person neuropsychological assessment battery to remote, contact-free telephone and videoconference neuropsychological assessments.2 We aimed to examine the performance of the remote assessments by (1) calibrating the newly adapted remote assessments with data from our existing and ongoing cohorts; and (2) conducting a concurrent cross-validation of the remote neuropsychological assessments in a prospective sub-sample.

Methods

Both telephone and videoconference neuropsychological assessment procedures were developed and implemented within an ongoing observational cohort study, the Successful Aging after Elective Surgery (SAGES) II study (N = 420).3 First, remote neuropsychological assessments were developed by careful selection and adaptation of the existing in-person neuropsychological tests used in SAGES. The goal was to make the telephone and videoconference assessments either identical or as close as possible to the SAGES in-person assessment.

For the videoconference assessments, equipment was delivered to the home of the participant, including a tablet computer mounted on a 360-degree rotating tablet stand such that the participant’s hands could be viewed by videoconference during writing tasks. For the telephone assessments, written tasks were omitted. Latent variable psychometric methods, a strategy that can overcome limitations that may result from differences in task difficulty and scoring across modes, were used to compare the measurement modes.
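As a rough illustration of how loadings can be compared across modes, the sketch below fits a one-factor model to simulated item scores from each mode using scikit-learn's FactorAnalysis and compares the resulting loadings. This is a minimal stand-in with invented data, not the study's actual code; the published analyses used formal latent variable measurement models (see Reference 2).

```python
# Minimal sketch: compare one-factor loadings across two assessment modes.
# All data below are simulated for illustration only.
import numpy as np
from sklearn.decomposition import FactorAnalysis

def standardized_loadings(scores):
    """Fit a one-factor model to z-scored data; returns one loading per test."""
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    fa = FactorAnalysis(n_components=1, random_state=0).fit(z)
    # abs() sidesteps the factor's arbitrary sign across separate fits
    return np.abs(fa.components_.ravel())

rng = np.random.default_rng(0)
ability = rng.normal(size=(200, 1))                  # latent cognitive ability
in_person = 0.8 * ability + rng.normal(size=(200, 6))  # 200 participants x 6 tests
video = 0.8 * ability + rng.normal(size=(200, 6))

# Per-test loading differences between modes (to be compared on Cohen's q scale)
print(standardized_loadings(in_person) - standardized_loadings(video))
```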

Second, a nested validation sub-study was conducted in 100 study participants who had completed at least an in-person baseline and 1-month neuropsychological follow-up and were scheduled for an established follow-up. To test for measurement equivalence, participants underwent both in-person and remote assessments (in alternating order) within a two-week period. Item response theory was used to calibrate data collected by the different assessment modes. Measurement equivalence was assessed with Bland–Altman plots and regression analysis.
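For readers unfamiliar with the approach, a Bland–Altman plot displays, for each participant, the difference between paired scores from two modes against their mean, with the average difference (bias) and 95% limits of agreement overlaid. Below is a minimal sketch assuming hypothetical paired score arrays; it is an illustration, not the study's analysis code.

```python
# Minimal Bland–Altman sketch for paired scores from two assessment modes.
import numpy as np
import matplotlib.pyplot as plt

def bland_altman(scores_a, scores_b):
    """Plot paired-score differences against means, with 95% limits of agreement."""
    mean = (scores_a + scores_b) / 2
    diff = scores_a - scores_b
    bias = diff.mean()
    loa = 1.96 * diff.std(ddof=1)  # half-width of the 95% limits of agreement

    plt.scatter(mean, diff, alpha=0.5)
    plt.axhline(bias, color="k", label=f"bias = {bias:.2f}")
    plt.axhline(bias + loa, color="k", linestyle="--")
    plt.axhline(bias - loa, color="k", linestyle="--")
    plt.xlabel("Mean of in-person and remote scores")
    plt.ylabel("Difference (in-person minus remote)")
    plt.legend()
    plt.show()

# Hypothetical paired data: remote scores track in-person scores with noise
rng = np.random.default_rng(0)
in_person = rng.normal(50, 10, 100)
remote = in_person + rng.normal(0, 3, 100)
bland_altman(in_person, remote)
```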

We examined differences in two aspects of how the test items measured underlying cognition. First, we looked at how strongly the test items correlated with underlying cognition, captured in the factor loadings (differences expressed on the Cohen's q metric). Second, we looked at differences in test difficulty (how hard it was to produce a correct answer) by mode (differences expressed on the Cohen's d metric). Factor loadings analysis and mean differences analysis give two different but complementary pieces of information about how well a test measures what it is supposed to measure.

Factor loadings analysis looks at how strongly each test is related to the overall skill or ability being measured (in this case, cognitive ability). A high factor loading means that an item is doing a good job of capturing a person’s true ability level. This helps us understand whether the test structure stays the same across different testing formats. Mean differences analysis looks at whether one testing format (like video or telephone) makes it easier or harder to get a correct answer compared to in-person testing. If a test is easier in one format, people might score higher even if their actual ability hasn’t changed.

These two analyses complement each other because factor loadings tell us whether the test is measuring the same thing across different formats, while mean differences tell us whether the difficulty level changes between formats.
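To make the two metrics concrete, here is a minimal sketch of how each could be computed: Cohen's q is the difference between two correlations (here, factor loadings) after a Fisher z-transformation, and Cohen's d is a mean difference in pooled-standard-deviation units. The numeric inputs are illustrative only, not study data.

```python
# Minimal sketch of the two effect-size metrics described above.
import numpy as np

def cohens_q(r1, r2):
    """Difference between two correlations on the Fisher z scale."""
    return np.arctanh(r1) - np.arctanh(r2)

def cohens_d(x1, x2):
    """Standardized mean difference using a pooled standard deviation."""
    n1, n2 = len(x1), len(x2)
    pooled_var = ((n1 - 1) * np.var(x1, ddof=1) +
                  (n2 - 1) * np.var(x2, ddof=1)) / (n1 + n2 - 2)
    return (np.mean(x1) - np.mean(x2)) / np.sqrt(pooled_var)

# Illustrative values only: two hypothetical loadings for the same test
print(cohens_q(0.80, 0.76))  # a small loading difference on the q scale

# Hypothetical score vectors for two modes
rng = np.random.default_rng(0)
print(cohens_d(rng.normal(0.2, 1.0, 100), rng.normal(0.0, 1.0, 100)))
```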

Results

The process of delivery (drop-off and pick-up), cleaning, and maintenance of the videoconference equipment, as well as training and guidance of study participants, required additional time and effort beyond what was required for in-person testing; however, the process was feasible and acceptable to study participants.

In the factor loadings analysis described above, only small differences were found between in-person and videoconference assessments. The largest difference among factor loadings was found with language (Boston Naming Test) and attention tasks (Visual Search and Attention Test), but these effects were small (Cohen's q = 0.06) and not statistically significant (95% CI on q: -0.06 to +0.18).

When the in-person and videoconference assessments were compared for differences in difficulty, the working memory task (Digit Span Backwards test) was less difficult by video, with a small-to-moderate effect size (Cohen's d = -0.28; 95% CI: -0.54 to -0.01). The contrast between in-person and telephone assessment was much larger, with telephone assessment being less difficult than in-person (largest shift in item difficulty for Digit Span Backwards, d = -1.12; 95% CI: -1.35 to -0.90), which is considered a large effect size.

Calibrated scores between telephone and videoconference assessments demonstrated good agreement (r = 0.72; 95% CI: 0.61 to 0.80), and the differences could be corrected with latent variable measurement models. For more details, see Reference 2 below.
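As a quick arithmetic check, the reported confidence interval is what a standard Fisher z-transformation yields for r = 0.72 in a sample of 100, the size of the validation sub-study. The sketch below is our own illustration of that computation, not code from the paper.

```python
# Reproduce a 95% CI for a correlation via the Fisher z-transformation.
import numpy as np

r, n = 0.72, 100
z = np.arctanh(r)                # Fisher z-transform of the correlation
se = 1 / np.sqrt(n - 3)          # standard error on the z scale
lo, hi = np.tanh(z - 1.96 * se), np.tanh(z + 1.96 * se)
print(f"r = {r:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")  # prints ~ (0.61, 0.80)
```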

Conclusions and Future Directions

Neuropsychological assessment by videoconference can be as precise as in-person testing. Calibration of ability estimates using latent variable measurement models can address small measurement differences and generate scores without evidence of systematic bias. However, telephone assessments measured underlying cognitive performance with less precision than either in-person or video assessments.

The adaptation of in-person neuropsychological assessments to a remote mode may hold broader applicability across many settings where in-person assessment is not possible, including contact isolation, remote or otherwise inaccessible rural settings, specialized units with restricted access, or simple feasibility constraints (e.g., patients who move out of state during their follow-up). In these conditions, videoconference assessments could be used in place of in-person assessments and could support individual-level inferences for most participants. Further details of the procedures are described here: Adaptation of In-Person Neuropsychological Assessments for Remote Administration

This procedure may be helpful to future studies in which remote adaptation is needed. If sites have the luxury of time, all protocols should be validated against in-person assessment prior to implementation. In our study, due to COVID-19 restrictions, validation was done retrospectively, but we were able to demonstrate high agreement and validity of our approaches.

References

  1. Marra DE, Hamlet KM, Bauer RM, Bowers D. Validity of teleneuropsychology for older adults in response to COVID-19: a systematic and critical review. Clin Neuropsychol. 2020;34(7-8):1411-1452. doi:10.1080/13854046.2020.1769192
  2. Joffe Y, Liu J, Arias F, et al. Adaptation, calibration, and validation of a cognitive assessment battery for telephone and video administration. J Am Geriatr Soc. 2024;1-10. doi:10.1111/jgs.19275
  3. Hshieh TT, Schmitt EM, Fong TG, et al.; SAGES II Study Team. Successful aging after elective surgery II: study design and methods. J Am Geriatr Soc. 2023;71(1):46-61. doi:10.1111/jgs.18065

Suggested Citation

Joffe, Yonah; Fong, Tamara G.; and Schmitt, Eva M. Adaptation of a neuropsychological assessment battery from in-person to telephone or videoconference administration; January 2025. Available at: https://deliriumnetwork.org/blog-adaption-of-a-neuropsychological-assessment-battery/ (accessed today’s date)
