meerva

Analysis of Data with Measurement Error Using a Validation Subsample

v0.2-2 · Oct 27, 2021 · GPL-3

Description

Data for analysis are sometimes obtained by convenient or inexpensive means, yielding "surrogate" variables, when more accurate "reference" variables, thought of as being measured without error, could be obtained with less convenience or at greater expense. Analysis of surrogate variables measured with error generally yields biased estimates when the objective is to make inference about the reference variables. It is often thought that ignoring measurement error in surrogate variables only biases effects toward the null hypothesis, but this need not be the case: measurement errors may bias parameter estimates either toward or away from the null hypothesis. If one has surrogate variable data from the full sample and reference variable data from a randomly selected subsample, then one can assess the bias that measurement error introduces in parameter estimation and use this information to derive improved estimates based upon all available data. Algebraically, these estimates can be interpreted as starting with the estimate from the reference variables in the validation subsample and "augmenting" it with additional information from the surrogate variables in the whole sample, which suggests the term "augmented" estimate.

The meerva package calculates these augmented estimates in the regression setting when there is a randomly selected subsample with both surrogate and reference variables. Measurement errors may be differential or non-differential, and may occur in any or all predictors (simultaneously) as well as in the outcome. The augmented estimates derive, in part, from the multivariate correlation between the regression model parameter estimates from the reference variables and those from the surrogate variables, both computed on the validation subsample. Because the validation subsample is chosen at random, any biases imposed by measurement error, whether non-differential or differential, are reflected in this correlation, and the correlation can be used to derive estimates for the reference variables using data from the whole sample.

The main functions in the package are meerva.fit, which calculates estimates for a dataset, and meerva.sim.block, which simulates multiple datasets as described by the user and analyzes them, storing the regression coefficient estimates for inspection. The augmented estimates, as well as how measurement error may arise in practice, are described in more detail by Kremers WK (2021) <arXiv:2106.14063>. The method extends the works of Chen Y-H, Chen H. (2000) <doi:10.1111/1467-9868.00243>, Chen Y-H. (2002) <doi:10.1111/1467-9868.00324>, Wang X, Wang Q (2015) <doi:10.1016/j.jmva.2015.05.017> and Tong J, Huang J, Chubak J, et al. (2020) <doi:10.1093/jamia/ocz180>.
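The "augmenting" idea described above can be illustrated with a small simulation. The following is only a minimal sketch under simplifying assumptions, not the meerva implementation and not its API: it is written in plain Python rather than R, uses a single predictor measured with error, an error-free outcome, and ordinary least squares. The function names (slope_and_influence, augmented_slope, simulate) and all simulation settings are hypothetical. The correction term, the covariance of the validation-sample estimates divided by the variance of the surrogate estimate, is one common way to write an augmented estimate and is assumed here for illustration.

```python
import math
import random


def slope_and_influence(x, y):
    """OLS slope of y on x (with intercept) plus per-observation influence values."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    # Approximate contribution of each observation to the slope estimate.
    infl = [(xi - xbar) * (yi - intercept - slope * xi) / sxx
            for xi, yi in zip(x, y)]
    return slope, infl


def augmented_slope(x_val, y_val, xs_val, xs_full, y_full):
    """Return (validation-only slope, augmented slope). Hypothetical helper."""
    beta_val, infl_b = slope_and_influence(x_val, y_val)     # reference, validation
    gamma_val, infl_g = slope_and_influence(xs_val, y_val)   # surrogate, validation
    gamma_full, _ = slope_and_influence(xs_full, y_full)     # surrogate, full sample
    # Estimated covariance of (beta_val, gamma_val) and variance of gamma_val.
    cov_bg = sum(b * g for b, g in zip(infl_b, infl_g))
    var_g = sum(g * g for g in infl_g)
    # Start from beta_val and adjust by how far the surrogate estimate in the
    # validation subsample sits from the surrogate estimate in the full sample.
    beta_aug = beta_val - (cov_bg / var_g) * (gamma_val - gamma_full)
    return beta_val, beta_aug


def simulate(n_full=2000, n_val=200, true_slope=2.0, seed=1):
    """Simulate one dataset (assumed settings) and return both slope estimates."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n_full)]               # reference predictor
    y = [1.0 + true_slope * xi + rng.gauss(0, 1) for xi in x]  # error-free outcome
    xs = [xi + rng.gauss(0, 0.5) for xi in x]                  # surrogate predictor
    val = rng.sample(range(n_full), n_val)                     # random validation subsample
    return augmented_slope([x[i] for i in val], [y[i] for i in val],
                           [xs[i] for i in val], xs, y)


beta_val, beta_aug = simulate()
print("validation-only slope:", round(beta_val, 3))
print("augmented slope:      ", round(beta_aug, 3))
```

When the validation subsample is the whole sample, the two surrogate estimates coincide and the augmented estimate reduces to the validation-only estimate, which is a quick sanity check on the sketch. For real analyses, including differential error, error in the outcome, and multiple predictors, the package's own meerva.fit is the appropriate tool.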

Downloads

Last 30 days: 830 (4296th)
Last 90 days: 2.5K
Last year: 9.1K
Trend: -2.6% (30d vs prior 30d)

CRAN Check Status

4 NOTE

10 OK

Flavor	Status	Time
r-devel-linux-x86_64-debian-clang	NOTE	49.4s
r-devel-linux-x86_64-debian-gcc	NOTE	35.3s
r-devel-linux-x86_64-fedora-clang	NOTE	79.1s
r-devel-linux-x86_64-fedora-gcc	NOTE	87.9s
r-devel-macos-arm64	OK	18s
r-devel-windows-x86_64	OK	73s
r-oldrel-macos-arm64	OK
r-oldrel-macos-x86_64	OK	45s
r-oldrel-windows-x86_64	OK	84s
r-patched-linux-x86_64	OK	45.6s
r-release-linux-x86_64	OK	42.7s
r-release-macos-arm64	OK
r-release-macos-x86_64	OK	56s
r-release-windows-x86_64	OK	69s

Check details (4 non-OK)

NOTE r-devel-linux-x86_64-debian-clang

CRAN incoming feasibility

Maintainer: ‘Walter K Kremers <kremers.walter@mayo.edu>’

The Description field contains
  <arXiv:2106.14063> and is an extension of the works by Chen Y-H, Chen
Please refer to arXiv e-prints via their arXiv DOI <doi:10.48550/arXiv.YYMM.NNNNN>.

NOTE r-devel-linux-x86_64-debian-gcc

CRAN incoming feasibility

Maintainer: ‘Walter K Kremers <kremers.walter@mayo.edu>’

The Description field contains
  <arXiv:2106.14063> and is an extension of the works by Chen Y-H, Chen
Please refer to arXiv e-prints via their arXiv DOI <doi:10.48550/arXiv.YYMM.NNNNN>.

NOTE r-devel-linux-x86_64-fedora-clang

dependencies in R code

Namespace in Imports field not imported from: ‘dplyr’
  All declared Imports should be used.

NOTE r-devel-linux-x86_64-fedora-gcc

dependencies in R code

Namespace in Imports field not imported from: ‘dplyr’
  All declared Imports should be used.

Check History

NOTE 10 OK · 4 NOTE · 0 WARNING · 0 ERROR · 0 FAILURE Mar 10, 2026

Version History

new 0.2-2 Mar 10, 2026

updated 0.2-2 ← 0.2-1 Oct 26, 2021

updated 0.2-1 ← 0.1-2 May 12, 2021

updated 0.1-2 ← 0.1-1 Apr 26, 2021

new 0.1-1 Apr 18, 2021

Maintainer

Walter K Kremers

Dependencies

Depends

R (>= 3.4.0)

Imports

survival, dplyr, tidyr, ggplot2, mvtnorm, matrixcalc

Suggests

R.rsp

Compilation

No compilation needed

First Published

Apr 18, 2021
