Prepdat: an R Package for Preparing Data for Experimental Analysis
Overview
prepdat is an R package that integrates raw data files collected from individual participants (usually from a psychological experiment), enabling the user to go from raw data files, in which each line corresponds to one trial conducted during the experiment, to one finalized table ready for statistical analysis, in which each line corresponds to the averaged performance of each participant according to specified dependent and independent variables. prepdat also includes several other possibilities for the aggregated values such as medians of the dependent variable and trimming procedures for reaction-times according to Van Selst & Jolicoeur (1994).
Installation
A stable release of prepdat is now available on CRAN https://cran.r-project.org/package=prepdat. To install prepdat use:
install.packages("prepdat")
To install the latest version of prepdat (i.e., the development version of next release), install devtools, and then install directly from GitHub by using:
# install devtools install.packages("devtools")
# install prepdat from GitHub devools::install_github("ayalaallon/prepdat")`
Using prepdat
The two major functions you need to know in order to use prepdat are file_merge() and prep().
file_merge()
The file_merge() function concatenates raw data files of individual participants (in which each line corresponds to a single trial in the experiment) to one raw data file that includes all participants. In order for the function to work, all raw data files you wish to merge should be put in one folder containing nothing but the raw data files. In addition, the working directory should be set to that folder. All raw data files should be in the same format (either txt or csv).
prep()
After you merged the raw data files using file_merge(), or any other function (for example using Eprime mergedat), you are ready to continue implementing prepdat by using the prep() function, which is the main function of prepdat.
prep() takes the raw data table created in file_merge() (or by other functions) and creates one finalized table ready for statistical analysis. The finalized table contains for each participant the averaged or aggregated values (e.g., medians) of several possible dependent variables (e.g., reaction-time and accuracy) according to specified independent variables, which can be any combination of within-subject (a.k.a repeated measures) and between-subject independent variables. The possibilities for dependent measures include:
- mdvc: Mean of the dependent variable.
- sdvc: Standard deviation of the dependent variable.
- meddvc: Median of the dependent variable.
- tdvc: Mean/s of the dependent variable after rejecting observations above standard deviation criterion/s you specify.
- ntr: Number of observations of the dependent variable that were rejected for each standard deviation criterion/s.
- ndvc: Number of observations of the dependent variable before rejection.
- ptr: Proportion of observations of the dependent variable that were rejected for each standard deviation criterion/s.
- rminv: Harmonic mean of the dependent variable.
- prt: Percentiles of the dependent variable according to any percentile (default is 0.05, 0.25, 0.75, 0.95).
- mdvd: Mean of a second dependent variable (e.g., accuracy).
- merr: error rate (i.e., suitable when the second dependnet variable is accuracy).
- nrmc: Mean according to non-recursive procedure with moving criterion (Van Selst & Jolicoeur, 1994).
- nnrmc: Number of observations of the dependent variable that were rejected for the non-recursive procedure.
- pnrmc: Proportion of observations of the dependent variable that were rejected for the non-recursive procedure.
- tnrmc: Total number of observations upon which the non-recursive procedure was applied.
- mrmc: Mean according to modified-recursive procedure with moving criterion (Van Selst & Jolicoeur, 1994).
- nmrmc: Number of observations of the dependent variable that were rejected for the modified-recursive procedure.
- pmrmc: Proportion of observations of the dependent variable that were rejected for the modified-recursive procedure.
- tmrmc: Total number of observations upon which the modified-recursive procedure was applied.
- hrmc: Mean according to hybrid-recursive procedure with moving criterion (Van Selst & Jolicoeur, 1994).
- nhrmc: Number of observations of the dependent variable that were rejected for the hybrid-recursive procedure.
- thrmc: Total number of observations upon which the hybrid-recursive procedure was applied.
Example
In the example below, we use prep() to go from one table containing data (after already merging the individuals raw data files) from 15 participants (5400 trials in total) to a finalized table showing all the possibilities for the dependent variable (e.g., means and medians) for each participant according to specified within-subject and between-subject independent variables, including the modified recursive procedure of Van Selst & Jolicoeur (1994).
# Load prepdat
library(prepdat)
# Load the example data that comes with prepdat
data(stroopdata)
# To get an overview of the example data ?stroopdata
# Look at the first few lines of the example data head(stroopdata) subject block age gender order font_size trial_num target_type rt ac 1 5020 1 24 2 1 12 1 1 677 1 2 5020 1 24 2 1 12 2 1 538 1 3 5020 1 24 2 1 12 3 1 507 1 4 5020 1 24 2 1 12 4 1 2818 1 5 5020 1 24 2 1 12 5 1 582 1 6 5020 1 24 2 1 12 6 1 498 1
# Perform prep finalized_data <- prep( dataset = stroopdata # Name of the merged raw data table in case you already loaded it into R. , file_name = NULL # Name of the file that contains the raw data after merging the individual # raw data files. , id = "subject" # Name of the column that contains the variable specifying the case identifier. , within_vars = c("block", "target_type") # Name of column or columns that contain independent # within-subject variables. , between_vars = c("order") # Name of column or columns that contain independent between-subject # variables. , dvc = "rt" # Name of the column that contains the continuous dependent variable (e.g., # reaction-time). , dvd = "ac" # Name of the column that contains the discrete dependent variable (e.g., 0 # and 1 for accuracy measures). , keep_trials = NULL , drop_vars = c() , keep_trials_dvc = "raw_data$rt > 100 & raw_data$rt < 3000 & raw_data$ac == 1" # Keep for # dvc only # trials that # meet these # conditions. , keep_trials_dvd = "raw_data$rt > 100 & raw_data$rt < 3000" # Keep for dvd only trials that # meet these conditions. , id_properties = c() , sd_criterion = c(1, 1.5, 2) # Criterions to reject all observations above standard deviations # specified here and then calculate means. , percentiles = c(0.05, 0.25, 0.75, 0.95) # Percentiles of dvc (any percentile is possible). , outlier_removal = 2 # Perform modified recursive procedure with moving criterion. , keep_trials_outlier = "raw_data$ac == 1" # Keep for outlier removal procedure only trials # that meet this condition. , decimal_places = 4 , notification = TRUE , dm = c() # See ?prep for more details on this argument. , save_results = TRUE # Create a txt file containing the finalized table. , results_name = "results.txt" # Name of the file that contains the finalized table. , save_summary = TRUE # Save a summary txt file with the important parameters of prep(). )
# Look at finalized_data: # The hierarchical order for within_vars was first "block" (which has two levels- "1" and "2", and then # "target_type" (which also has two levels- "1" and "2"). This means that for each of the dependent # measures we will get four columns. For example mdvc1 is the mean for "block" 1 and "target_type" 2, # mdvc2 is the mean for "block" 2 and "target_type" 1 etc. head(finalized_data) subject order mdvc1 mdvc2 mdvc3 mdvc4 sdvc1 sdvc2 sdvc3 5013 5013 2 863.1736 1038.4444 1081.0000 1103.1189 328.2833 214.1703 417.1448 5020 5020 1 706.8741 781.1429 636.8056 712.9437 410.1729 361.9275 304.8082 5021 5021 2 655.0280 742.0294 558.8611 652.5714 161.7873 170.3273 120.8668 5022 5022 1 604.4266 725.2941 580.1944 650.1250 107.9061 153.0384 127.7895 5023 5023 2 747.0979 827.4706 908.6571 962.7183 265.1188 200.0777 347.3918 5024 5024 1 615.9722 793.1714 667.2778 764.1259 124.6003 156.6617 182.2824 sdvc4 meddvc1 meddvc2 meddvc3 meddvc4 t1dvc1 t1dvc2 t1dvc3 t1dvc4 5013 321.4880 758.5 1036.5 1014.0 1037.0 776.8220 1046.7037 1033.0333 1065.1316 5020 328.2770 586.0 701.0 540.0 629.5 595.3409 699.3636 566.5000 628.3538 5021 144.2790 633.0 780.0 540.5 629.5 631.6408 760.3636 535.5172 625.0849 5022 135.0557 594.0 681.5 565.0 635.0 589.2881 691.9565 573.2903 638.7900 5023 243.0594 726.0 834.0 821.0 900.5 724.3952 824.6087 857.9655 923.2973 5024 180.0681 600.0 781.0 629.0 719.0 591.3860 775.7308 618.9677 734.5574 t1.5dvc1 t1.5dvc2 t1.5dvc3 t1.5dvc4 t2dvc1 t2dvc2 t2dvc3 t2dvc4 5013 790.0763 1012.8387 1037.0000 1053.9538 809.4818 1005.5000 1001.1176 1067.4148 5020 595.3409 699.3636 566.5000 626.4351 595.3409 699.3636 566.5000 631.6818 5021 629.5040 748.2069 558.3030 619.6953 635.9926 731.6667 564.0882 630.3759 5022 599.3893 697.6296 569.4706 626.8425 602.2774 725.2941 562.9143 637.9854 5023 718.3630 851.2143 842.6970 914.1520 709.2174 827.2188 864.4118 933.3881 5024 584.9612 755.8750 618.9677 744.6397 590.5597 755.8750 634.5882 750.9568 n1tr1 n1tr2 n1tr3 n1tr4 n1.5tr1 n1.5tr2 n1.5tr3 n1.5tr4 n2tr1 n2tr2 n2tr3 n2tr4 5013 26 9 6 29 13 5 4 13 7 2 2 8 5020 11 2 2 12 11 2 2 11 11 2 2 10 5021 40 12 7 34 18 5 3 12 8 1 2 7 5022 25 11 5 44 12 7 2 17 6 0 1 7 5023 19 11 6 31 8 6 2 17 5 2 1 8 5024 30 9 5 21 15 3 5 7 10 3 2 4 ndvc1 ndvc2 ndvc3 ndvc4 p1tr1 p1tr2 p1tr3 p1tr4 p1.5tr1 p1.5tr2 p1.5tr3 5013 144 36 36 143 0.1806 0.2500 0.1667 0.2028 0.0903 0.1389 0.1111 5020 143 35 36 142 0.0769 0.0571 0.0556 0.0845 0.0769 0.0571 0.0556 5021 143 34 36 140 0.2797 0.3529 0.1944 0.2429 0.1259 0.1471 0.0833 5022 143 34 36 144 0.1748 0.3235 0.1389 0.3056 0.0839 0.2059 0.0556 5023 143 34 35 142 0.1329 0.3235 0.1714 0.2183 0.0559 0.1765 0.0571 5024 144 35 36 143 0.2083 0.2571 0.1389 0.1469 0.1042 0.0857 0.1389 p1.5tr4 p2tr1 p2tr2 p2tr3 p2tr4 rminv1 rminv2 rminv3 rminv4 5013 0.0909 0.0486 0.0556 0.0556 0.0559 777.4543 997.0999 951.4738 1019.3421 5020 0.0775 0.0769 0.0571 0.0556 0.0704 612.0752 709.9542 575.2651 647.6535 5021 0.0857 0.0559 0.0294 0.0556 0.0500 617.4345 700.6980 501.4269 626.3859 5022 0.1181 0.0420 0.0000 0.0278 0.0486 585.7888 693.8455 559.1845 622.7780 5023 0.1197 0.0350 0.0588 0.0286 0.0563 684.5878 772.5444 822.9681 908.1756 5024 0.0490 0.0694 0.0857 0.0556 0.0280 595.9175 767.3401 629.8362 732.2745 p0.05dvc1 p0.05dvc2 p0.05dvc3 p0.05dvc4 p0.25dvc1 p0.25dvc2 p0.25dvc3 p0.25dvc4 5013 538.65 744.25 575.00 704.20 666.0 889.75 858.00 910.00 5020 474.00 532.10 453.50 506.35 515.0 639.00 508.25 575.00 5021 447.00 485.00 456.75 483.90 552.5 594.75 502.00 549.50 5022 497.50 506.55 436.75 461.45 548.5 607.75 528.00 563.75 5023 433.10 482.00 549.40 668.40 641.0 722.25 705.50 793.75 5024 484.15 594.90 495.75 585.20 536.0 703.50 556.00 658.00 p0.75dvc1 p0.75dvc2 p0.75dvc3 p0.75dvc4 p0.95dvc1 p0.95dvc2 p0.95dvc3 p0.95dvc4 5013 958.00 1150.50 1181.75 1245.00 1462.55 1439.50 1779.75 1648.90 5020 684.50 764.00 624.75 701.75 1857.10 1198.10 1035.00 1568.25 5021 735.00 866.50 606.75 699.25 958.70 990.05 743.75 941.20 5022 650.50 833.75 610.00 734.25 744.80 971.05 706.75 888.25 5023 820.00 953.00 1027.00 1095.75 1034.80 1139.70 1405.30 1439.15 5024 659.75 832.50 695.50 837.50 887.20 1120.20 1062.75 1026.80 mdvd1 mdvd2 mdvd3 mdvd4 merr1 merr2 merr3 merr4 mrmc1 mrmc2 5013 1.0000 1.0000 1 0.9931 0.0000 0.0000 0 0.0069 809.4818 1038.4444 5020 1.0000 0.9722 1 0.9861 0.0000 0.0278 0 0.0139 589.3846 699.3636 5021 1.0000 0.9444 1 0.9722 0.0000 0.0556 0 0.0278 655.0280 742.0294 5022 0.9931 0.9444 1 1.0000 0.0069 0.0556 0 0.0000 603.9929 725.2941 5023 1.0000 0.9444 1 0.9861 0.0000 0.0556 0 0.0139 709.2174 827.4706 5024 1.0000 0.9722 1 1.0000 0.0000 0.0278 0 0.0000 608.5211 777.3529 mrmc3 mrmc4 pmrmc1 pmrmc2 pmrmc3 pmrmc4 nmrmc1 nmrmc2 nmrmc3 nmrmc4 5013 1001.1176 1057.5985 4.8611 0.0000 5.5556 4.1958 7 0 2 6 5020 566.5000 626.4351 9.7222 5.7143 5.5556 7.7465 14 2 2 11 5021 571.6571 641.5036 0.0000 0.0000 2.7778 2.1429 0 0 1 3 5022 562.9143 650.1250 2.0979 0.0000 2.7778 0.0000 3 0 1 0 5023 842.6970 955.3121 4.1667 0.0000 8.3333 0.7042 6 0 3 1 5024 611.3438 751.0071 1.3889 2.8571 11.1111 2.0833 2 1 4 3 tmrmc1 tmrmc2 tmrmc3 tmrmc4 5013 144 36 36 143 5020 144 35 36 142 5021 143 34 36 140 5022 143 34 36 144 5023 144 34 36 142 5024 144 35 36 144
References
Grange, J.A. (2015). trimr: An implementation of common response time trimming methods. R Package Version 1.0.1.https://cran.r-project.org/package=trimr
Selst, M. V., & Jolicoeur, P. (1994). A solution to the effect of sample size on outlier elimination. The quarterly journal of experimental psychology, 47 (3), 631-650.