Improving Webcam Eye-Tracking: Impact of Hardware and Head Stabilization on Data Quality

Authors
Affiliations

Jason Geller

Boston College

João Veríssimo

University of Lisbon

Julia Drouin

University of North Carolina - Chapel Hill

Abstract
Webcam-based eye-tracking offers a scalable and accessible alternative to traditional lab-based systems. While recent studies demonstrate that webcam eye-tracking can replicate canonical effects across domains such as language, memory, and decision-making, questions remain about its precision and reliability. In particular, spatial accuracy, temporal resolution, and attrition rates are often poorer than those observed with research-grade systems, raising the possibility that environmental and hardware factors introduce substantial noise. The present registered report directly tests two factors that may introduce noise into webcam data: camera quality and head stabilization. In Experiment 1, we examine the effect of external webcam quality (high vs. standard) in a single-word Visual World Paradigm (VWP) task, testing whether using a better webcam can yield stronger competition effects, earlier effect onsets, and reduced attrition. In Experiment 2, we assess the impact of head stabilization (chin rest vs. no chin rest) under identical environmental conditions. Together, these studies identify the impact of hardware and movement on webcam eye-tracking data quality. Results will inform a more complete methodological understanding of webcam-based eye-tracking, clarifying whether its current limitations are intrinsic to the technology or can be mitigated through improved hardware and experimental control. This set of studies has implications for both online and in-lab utilization of webcam eye-tracking.
Keywords

webcam eye-tracking, webcameras, VWP, Lab-based experimentation, competition, spoken word recognition


Online experimentation in the behavioral sciences has advanced considerably since its introduction at the 1996 Society for Computers in Psychology (SCiP) conference in Chicago, IL (Reips, 2021). One methodological domain that has shown particular promise in moving online is eye tracking. Traditionally, eye-tracking studies required controlled laboratory settings equipped with specialized and costly hardware—a process that is both resource- and time-intensive. More recently, however, a growing body of research has shown that eye tracking can be successfully adapted to online environments (e.g., Bogdan et al., 2024; Bramlett & Wiener, 2024; James et al., 2025; Özsoy et al., 2023; Prystauka et al., 2024; Slim et al., 2024; Slim & Hartsuiker, 2023; Van der Cruyssen et al., 2023; Vos et al., 2022; Yang & Krajbich, 2021). By leveraging computers with webcameras, researchers can now record eye movements remotely, making it possible to collect data from virtually any location at any time. This shift not only enhances scalability, but also broadens access to more diverse and representative participant samples.

Webcam-based eye tracking has become an increasingly viable and accessible method for behavioral research. Implementation typically requires only a standard computing device (e.g., laptop, desktop, tablet, or smartphone) equipped with a built-in or external webcam. Data are collected through a web browser running dedicated software capable of recording and estimating gaze position in real time. This accessibility has been further enhanced by the integration of webcam-based eye tracking into several established experimental platforms, including Gorilla (Anwyl-Irvine et al., 2019), PsychoPy/PsychoJS (Peirce et al., 2019), jsPsych (Leeuw, 2014), PCIbex (Zehr & Schwarz, 2022), and Labvanced (Kaduk et al., 2023).

To reliably estimate where users are looking, webcam-based eye tracking typically relies on appearance-based methods, which infer gaze direction directly from visual features of the eye region (e.g., pupil and iris appearance) (Cheng et al., 2024; Saxena et al., 2024). Recent work has extended these methods using deep learning to learn gaze–appearance mappings directly from data (e.g., Kaduk et al., 2023; Saxena et al., 2024). This contrasts with research-grade eye trackers, which use model-based algorithms combining infrared illumination with geometric modeling of the pupil and corneal reflections (Cheng et al., 2024).

The most widely used library for webcam eye tracking is WebGazer.js (Papoutsaki et al., 2016; Patterson et al., 2025). WebGazer.js is an open-source JavaScript library that performs real-time gaze estimation using standard webcams. It is an appearance-based method that leverages computer vision techniques to detect the face and eyes, extract image features, and map these features onto known screen coordinates during a brief calibration procedure. Once trained, gaze locations on the screen are estimated via ridge regression (Papoutsaki et al., 2016).
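For intuition, the core of this mapping can be sketched in a few lines of R: ridge regression learns a set of weights from eye-region image features to known on-screen target coordinates collected during calibration, and those weights are then used to predict gaze position for new frames. The sketch below uses simulated features and hypothetical dimensions; it illustrates the general idea only and is not WebGazer.js’s actual implementation.

# Minimal sketch of the ridge-regression idea behind appearance-based gaze
# estimation (illustrative only; simulated features, hypothetical sizes)
set.seed(1)
n_cal  <- 45    # calibration frames (e.g., 9 targets x 5 frames each)
n_feat <- 120   # hypothetical number of eye-patch image features
X      <- matrix(rnorm(n_cal * n_feat), n_cal, n_feat)   # simulated eye features
w_true <- rnorm(n_feat, sd = 0.1)                        # unknown feature-to-gaze weights
y_x    <- X %*% w_true + rnorm(n_cal, sd = 0.05)         # known target x-coordinates

# ridge solution: (X'X + lambda * I)^-1 X'y
lambda <- 1
w_hat  <- solve(crossprod(X) + lambda * diag(n_feat), crossprod(X, y_x))

# predict the gaze x-coordinate for a new video frame
new_frame <- matrix(rnorm(n_feat), 1, n_feat)
pred_x    <- new_frame %*% w_hat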

Although webcam eye-tracking is still relatively new, validation efforts are steadily accumulating and the results are encouraging. Researchers have successfully applied webcam-based methods to domains such as language (e.g., Bramlett & Wiener, 2025; Geller et al., 2025; Prystauka et al., 2024), judgment and decision-making (e.g., Yang & Krajbich, 2021), memory (e.g., James et al., 2025), and public health (Chen-Sankey et al., 2023). Collectively, this work demonstrates that webcam eye-tracking can yield interpretable and meaningful results that are comparable to those obtained with traditional lab-based systems.

However, there are several limitations associated with web-based eye tracking. First, experimental effects are typically smaller than those observed in lab-based studies (Bogdan et al., 2024; Degen et al., 2021; Kandel & Snedeker, 2024; Slim et al., 2024; Slim & Hartsuiker, 2023; Van der Cruyssen et al., 2023). Second, relative to research-grade eye trackers, both spatial accuracy/precision and temporal resolution tend to be lower with webcam eye-tracking. Spatial accuracy refers to the extent to which measured gaze positions deviate from the true gaze point, whereas precision reflects the consistency of those measurements over time (Carter & Luke, 2020). In webcam-based eye tracking, spatial accuracy and precision often exceed 1° of visual angle (Semmelmann & Weigelt, 2018). Regarding temporal resolution, sampling rates are typically more variable, with most webcams rarely exceeding 30 Hz. Consequently, detectable effects tend to span a relatively broad temporal range, with some users reporting a 1000 ms onset difference (Geller et al., 2025; Semmelmann & Weigelt, 2018; Slim et al., 2024; Slim & Hartsuiker, 2023). This variability makes webcam eye tracking less suitable for studies that require fine-grained spatial or temporal fidelity—for example, paradigms involving many or small areas of interest (AOIs) (James et al., 2025) or tasks requiring millisecond-level temporal precision (Slim et al., 2024). Lastly, webcam-based studies tend to exhibit higher attrition rates. For instance, Patterson et al. (2025) reported an average attrition rate of approximately 13% across studies, with substantial variability across individual experiments (see also Geller et al., 2025; Prystauka et al., 2024). Together, increased noise and variability often necessitate larger sample sizes to achieve comparable power and to offset high attrition.

An open question is whether the limitations of web-based eye-tracking primarily stem from the WebGazer.js algorithm itself or from environmental and hardware constraints—and, crucially, whether future improvements in the setup of webcam eye-tracking can mitigate these issues. On the algorithmic side, recent work (e.g., James et al., 2025; also see Yang & Krajbich, 2021) demonstrated that modifying WebGazer.js so that the sampling rate is polled consistently and timestamps are aligned to data acquisition (rather than completion) markedly improves temporal resolution. Implementing these changes within online experiment platforms such as Gorilla and jsPsych has brought webcam-based eye-tracking closer to the timing fidelity achieved in laboratory settings. For example, using the Gorilla platform, Prystauka et al. (2024) reported a 50 ms timing difference, while Geller et al. (2025) observed a 100 ms difference between lab-based and online effects.

To our knowledge, no study has directly tested how environmental and hardware constraints impact webcam-based eye-tracking data. Slim & Hartsuiker (2023) provided some evidence suggesting that hardware quality may underlie some of these limitations, reporting a positive correlation between sampling rate and calibration accuracy. Similarly, Geller et al. (2025) found that participants who failed calibration more often reported using standard-quality built-in webcams and working in suboptimal environments (e.g., natural lighting). Together, these findings suggest that both hardware and environmental factors may contribute to the increased noise commonly observed in online eye-tracking data. In the proposed research we plan to manipulate two factors, hardware selection (webcams) and participant stability, across two experiments.

Proposed Research

To address environmental and technical sources of noise in webcam eye-tracking, we plan to bring participants into the lab to complete a Gorilla‐hosted webcam task under standardized conditions. We manipulate two key factors (between subjects) across two experiments. Experiment 1 varies external webcam quality (high‐ vs. standard‐quality external cameras). Experiment 2 varies head stabilization (i.e., chin rest vs. no chin rest). All sessions will use identical ambient lighting, fixed viewing distance, the same display/computer model, and controlled network settings. These manipulations specifically target sources of measurement noise induced by the quality of the webcam and the amount of movement by the participant.

To examine these factors, we employ a paradigm widely used in psycholinguistics—the Visual World Paradigm (VWP) (Cooper, 1974; Tanenhaus et al., 1995). The VWP has been successfully adapted for webcam-based eye tracking (Bramlett & Wiener, 2024, 2025; Geller et al., 2025; Prystauka et al., 2024). Although implementations vary across studies (see Huettig et al., 2011), the version used here investigates phonemic competition, wherein item sets are typically constructed so that the display contains a target (e.g., carrot), a cohort competitor (e.g., carriage), a rhyme competitor (e.g., parrot), and an unrelated distractor (e.g., tadpole). This configuration allows researchers to examine the dynamics of lexical competition—for instance, how phonologically similar words such as carriage (cohort effect) or parrot (rhyme effect) influence online spoken-word processing. Typically, fixations to cohort or rhyme competitors persist longer or emerge earlier than fixations to unrelated distractors, reflecting transient lexical activation.

In the present study, we focus specifically on cohort competition effects in single-word spoken-word recognition using the VWP. Several studies (e.g., Geller et al., 2025; Slim et al., 2024) have observed clear cohort competition effects using webcam eye tracking. However, effect sizes are sometimes smaller than those reported in traditional lab-based studies (e.g., Slim et al., 2024), and effects reliably emerge later in time when measured with standard webcams. This pattern suggests that increased measurement noise in webcam setups primarily introduces temporal delays—and in some cases smaller detectable effects—rather than exaggerating or distorting the underlying competition dynamics.

The current research aims to inform best practices for webcam-based eye tracking, with particular attention to hardware quality and physical setup considerations (e.g., head movement). Reducing noise by manipulating hardware and head movement is predicted to make the measured gaze signal more stable and less variable across time and trials. In turn, this can make existing effects easier to detect, potentially manifesting as (a) larger and clearer competition effects, (b) earlier and more reliable detection of effect onsets in time-course analyses, and (c) lower calibration failure and attrition rates compared to standard webcams.

While these guidelines will benefit researchers conducting webcam studies in uncontrolled, online settings, they are also valuable for laboratory-based research in which webcams may serve as lower-cost alternatives to infrared eye-tracking systems. By systematically testing the role of hardware and head stabilization, this work clarifies the conditions under which webcam eye tracking can approximate lab-quality data and where its limitations remain.

Experiment 1: High-Quality Webcam vs. Standard-Quality Webcam

Both Slim & Hartsuiker (2023) and Geller et al. (2025) observed a clear relationship between webcam quality and calibration accuracy in webcam-based eye-tracking. Building on these findings, Experiment 1 tests how webcam quality influences competition effects in a single-word VWP. Specifically, we ask whether a higher-quality webcam yields (a) a greater proportion of looks to relevant interest areas (i.e., greater looks to cohorts vs. unrelated items), (b) an earlier emergence of these effects over time, and (c) lower data attrition rates relative to a lower-quality webcam.

To address this, participants will complete the same VWP task using one of two webcam types: a high-quality external webcam (e.g., Logitech Brio) or a standard external webcam designed to emulate a typical built-in laptop camera (e.g., Logitech C270). The high-quality webcam offers higher resolution, a higher sampling rate (60 Hz), greater frame-rate stability, and more consistent illumination handling—factors expected to enhance gaze precision and tracking reliability. In contrast, more standard webcams, while representative of most participants’ home setups, typically provide lower frame rates and exhibit greater variability under different lighting conditions. Comparing these two setups enables a direct assessment of how hardware quality constrains the strength, timing, and reliability of linguistic competition effects in webcam-based eye-tracking.

Hypotheses

We hypothesize several effects related to competition, onset, and attrition.

Competition Effects

(H1a) Participants will show an overall greater proportion of looks to cohort competitors than to unrelated items. (H1b) Webcam quality (high vs. standard) will influence the overall proportion of looks, with higher-quality webcams detecting a greater proportion of looks.

To quantify the effect of webcam quality on the proportion of looks within the time window of interest, we will use Cohen’s h—a standardized measure of effect size appropriate for comparisons between two proportions.
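For reference, Cohen’s h is the difference between arcsine-transformed proportions. A minimal R helper, with purely hypothetical proportions, is shown below.

# Cohen's h: difference between arcsine-transformed proportions
cohens_h <- function(p1, p2) {
  2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))
}

# e.g., 35% vs. 30% of looks to the cohort within the analysis window (hypothetical values)
cohens_h(0.35, 0.30)  # ~0.11, a small effect by Cohen's benchmarks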

Timing/Onset Effects

(H2a) Each webcam condition will show a change in the proportion of looks across time. More specifically, we hypothesize that the change in proportions will be non-linear across time and that the two conditions will differ across time. (H2b) For standard-quality webcams relative to the high-quality webcam, onsets will be detected later due to increased noise.

Attrition

(H3) Attrition rates will be lower in the high-quality webcam condition than in the standard-quality webcam condition.

Method

All stimuli (audio and images), code, and data (raw and summary) will be placed on OSF at this link: https://osf.io/cf6xr/overview. The entire experiment will be stored on Gorilla’s open materials with a link to preview the tasks. In addition, the code and manuscript will be fully reproducible using Quarto and the package manager nix (Dolstra & contributors, 2023) in combination with the R package {rix} (Rodrigues & Baumann, 2025). Together, nix and {rix} enable reproducible computational environments at both the system and package levels. This manuscript and all of the necessary files to reproduce it will be stored on GitHub.

Sampling Goal

We conducted an a priori power analysis via Monte Carlo simulation in R. Data from 21 participants, collected online using the Gorilla experimental platform during the development of the {webgazeR} package (Geller et al., 2025) and employing the same stimuli and VWP design, were used to seed the simulations. In these data, we observed a cohort effect of approximately 3%. Using this value as our seed, we collapsed the data across time bins to compute binomial counts per trial and fit a binomial generalized linear mixed model (GLMM) to obtain fixed-effect estimates. We then augmented the dataset by adding a between-subjects factor for webcam quality, with participants evenly assigned to high- and standard-quality groups. In the high-quality webcam group, we modeled both a higher overall fixation rate and a larger cohort effect, whereas in the standard-quality group the cohort effect was halved relative to the high-quality group. Simulated datasets were generated under this model, and the planned GLMM—including a condition × webcam interaction—was refit to each simulated dataset (N = 5,000). Power was estimated as the proportion of simulations in which the interaction term exceeded |z| = 1.96. The analysis script to run this power analysis is located here: https://osf.io/4trmn/files/a46g8. Results indicated that a total of 35 participants per group (N = 70) would provide approximately 90% power to detect the hypothesized reduction in the cohort effect and overall fixation rate under standard-quality webcam conditions. Because this analysis only examines overall proportions and we plan to also examine effects across time, we plan to recruit 50 participants in each group (N = 100 total). We will run our study until we have 100 usable participants (50 in each group). For the calibration analysis (see below), all participants who enter the study will be included.
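The full simulation code is available at the OSF link above. Purely as an illustration of the logic—with hypothetical effect sizes, a simplified random-effects structure, and far fewer runs than the registered 5,000—a sketch of the simulation-based power estimate might look like this:

# Simplified sketch of a simulation-based power analysis for the webcam effect
# on the cohort advantage (all parameter values below are illustrative)
library(lme4)

simulate_once <- function(n_per_group = 35, n_trials = 60, n_samples = 20,
                          b_cohort_std = 0.12,  # cohort advantage (log-odds), standard webcam
                          b_webcam     = 0.12,  # additional advantage with the high-quality webcam
                          sd_subj      = 0.30) {# between-participant SD (log-odds)
  subj   <- factor(rep(seq_len(2 * n_per_group), each = n_trials))
  webcam <- rep(c(0, 1), each = n_per_group * n_trials)      # 0 = standard, 1 = high quality
  eta    <- b_cohort_std + b_webcam * webcam +
            rnorm(2 * n_per_group, 0, sd_subj)[subj]
  p      <- plogis(eta)                                      # P(look lands on cohort vs. unrelated)
  cohort    <- rbinom(length(p), size = n_samples, prob = p) # cohort looks per trial
  unrelated <- n_samples - cohort                            # unrelated looks per trial
  dat <- data.frame(subj, webcam, cohort, unrelated)
  fit <- glmer(cbind(cohort, unrelated) ~ webcam + (1 | subj),
               family = binomial(), data = dat)
  abs(coef(summary(fit))["webcam", "z value"]) > 1.96        # group difference detected?
}

set.seed(123)
mean(replicate(200, simulate_once()))  # estimated power; use many more runs in practice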

Participants will give informed consent before participating. These studies have been approved by the relevant ethics committee.

Materials

VWP

Picture Stimuli

Stimuli were adapted from Colby & McMurray (2023). Each stimulus set comprised four images. For the webcam study, we used 30 sets (15 monosyllabic, 15 bisyllabic). Within each set, only the target and its onset competitor served as auditory targets, each presented once, yielding two trial types: TCRU trials, in which a target, cohort, rhyme, and unrelated picture were displayed (e.g., rocket, rocker, pocket, bubble), and TCUU trials, in which a target, cohort, and two unrelated images were displayed (e.g., mouth, mouse, house, chain). This design resulted in 60 trials total (30 sets × 2 target types per set). A custom MATLAB script (https://osf.io/x3frv) generated a unique randomized list for each participant, pseudo-randomizing display positions so that each image type was approximately equally likely to appear in any quadrant across subjects.
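The randomization itself was implemented in MATLAB (script linked above). Purely for illustration, an R analogue of the position shuffling could look like the following; role labels and trial counts are placeholders.

# Rough analogue of the display-position randomization: each trial's four images
# are assigned to the four quadrants at random, so each role is approximately
# equally likely to appear in any quadrant across a list
set.seed(42)
quadrants <- c("top_left", "top_right", "bottom_left", "bottom_right")
roles     <- c("target", "cohort", "third_image", "fourth_image")

one_list <- function(n_trials = 60) {
  positions <- t(replicate(n_trials, sample(quadrants)))  # random quadrant per role, per trial
  colnames(positions) <- roles
  as.data.frame(positions)
}

lst <- one_list()
table(lst$target)  # check that targets are roughly balanced across quadrants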

All 120 images came from a commercial clipart database; they were selected by a small focus group of students and edited to have a cohesive style using a standard lab protocol (McMurray et al., 2010). All images were scaled to 300 × 300 pixels.

Auditory Stimuli

Auditory stimuli were recorded by a female monolingual speaker of English in a sound-attenuated room, sampled at 44.1 kHz. Auditory tokens were edited to reduce noise and remove clicks. They were then amplitude normalized to 70 dB SPL. All .wav files were converted to .mp3 for online data collection.

Webcams

To manipulate recording quality, two webcams will be used. In the high-quality condition, we will use a Logitech Brio webcam, which records in 4K resolution (up to 4096 × 2160 px) with a 90° field of view and samples at 60 Hz. This setup provides high-fidelity video with greater spatial and temporal precision. In the standard-quality condition, we will use a Logitech C270 HD webcam, which records in 720p resolution and samples at 30 Hz, producing video quality comparable to that of a typical built-in laptop webcam and therefore simulating lower-quality online recordings (see Jarvis et al., 2025).

Both webcams will be mounted in a fixed position above the monitor to maintain consistent framing across participants. Lighting will be standardized to ensure uniform image quality across all sessions.

Experimental Setup and Procedure

All tasks will be completed in a single session lasting approximately 30 minutes. The experiment will be programmed and administered in Gorilla (Anwyl-Irvine et al., 2019). Participants will be brought into a room in the Human Neuroscience Lab at Boston College and seated in front of a 23-inch Dell U2312HM monitor (1920 × 1080 px) approximately 65 cm from the screen. Auditory stimuli will be presented over Sony ZX110 headphones to ensure consistent audio presentation and minimize background noise. The experimental tasks will be fixed and presented in this order: informed consent, single-word VWP, and a demographic questionnaire. The entire experiment can be viewed on Gorilla.
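Given this fixed setup, stimulus sizes can be expressed in degrees of visual angle. The short computation below assumes a 16:9 panel roughly 50.9 cm wide (the nominal width of a 23-inch display); exact values depend on the physical screen.

# Visual angle of the 300 x 300 px images at the planned viewing distance
# (assumes a 23-inch 16:9 panel ~50.9 cm wide; adjust for the actual display)
screen_w_cm  <- 50.9
screen_w_px  <- 1920
view_dist_cm <- 65

px_to_deg <- function(px) {
  size_cm <- px * screen_w_cm / screen_w_px
  2 * atan(size_cm / (2 * view_dist_cm)) * 180 / pi
}

px_to_deg(300)   # each image subtends roughly 7 degrees of visual angle
1 / px_to_deg(1) # roughly 43 pixels per degree, so 1 degree of error spans ~43 px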

Before the main task, an instructional video will demonstrate the calibration procedure. Calibration will occur twice—once at the start and again after 30 trials—with up to three attempts allowed each time. In each calibration phase, participants will view nine calibration targets and five validation points, looking directly at each target as instructed. Participants will then complete four practice trials to familiarize themselves with the task. Each trial begins with a 500 ms central fixation cross, followed by a preview display of four images located in the screen’s corners. After 1500 ms, a start button appears at the center; participants click it to confirm fixation before hearing the spoken word. The images remain visible throughout the trial, and participants indicate their response by clicking the image corresponding to the spoken target. A response deadline of 5 seconds will be used. Eye movements will be recorded continuously during each trial. Following the main VWP task, participants will complete a brief demographic questionnaire, after which they will be thanked for their participation.

Data Preprocessing and Exclusions

We will follow the guidelines outlined in Geller et al. (2025) and exclude participants with overall task accuracy below 80%, those who report English as not their first language, and those with non-normal or uncorrected vision. At the trial level, only correct-response trials (accuracy = 1) will be retained. Reaction times (RTs) outside ±2.5 SD of the participant-level distribution (computed within condition) will be removed. To increase signal-to-noise, participants with fewer than 40 usable trials after these exclusions will also be removed.
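A sketch of these trial- and participant-level exclusions in R, with placeholder column names for the merged behavioral data, is given below.

# Sketch of the behavioral exclusions (column names are placeholders)
library(dplyr)

trials_clean <- trials |>
  filter(accuracy == 1) |>                           # correct responses only
  group_by(participant, condition) |>
  filter(abs(rt - mean(rt)) <= 2.5 * sd(rt)) |>      # RTs within +/- 2.5 SD per participant x condition
  ungroup() |>
  group_by(participant) |>
  filter(n() >= 40) |>                               # keep participants with >= 40 usable trials
  ungroup()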

For eye-tracking preprocessing we will use the {webgazeR} package in R (Geller et al., 2025), which contains helper functions to preprocess webcam eye-tracking data. All webcam eye-tracking files and behavioral data will be merged. Data quality will be screened via sampling-rate checks, with very low-frequency recordings (e.g., < 15 Hz) excluded at both the participant and trial level (Bramlett & Wiener, 2025; Vos et al., 2022). We will quantify out-of-bounds (OOB) samples—gaze points falling outside the normalized screen area (coordinates below 0 or above 1)—and remove participants and trials with excessive OOB data (> 30%). OOB samples will be discarded prior to analysis. In addition, Gorilla provides calibration/quality metrics (“convergence” and “confidence,” both 0–1); trials with convergence > 0.5 or confidence < 0.5 will be excluded.
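These screening steps could be implemented along the following lines; the thresholds come from the text above, whereas the column names (e.g., x_norm, sampling_rate, convergence, face_conf) are placeholders rather than the exact {webgazeR} or Gorilla output names.

# Sketch of the gaze-quality screening (placeholder column names)
library(dplyr)

samples_clean <- samples |>
  group_by(participant, trial) |>
  mutate(oob = x_norm < 0 | x_norm > 1 | y_norm < 0 | y_norm > 1,
         prop_oob = mean(oob)) |>                    # proportion of out-of-bounds samples per trial
  ungroup() |>
  filter(sampling_rate >= 15,                        # drop very low-frequency recordings
         prop_oob <= 0.30,                           # drop trials with excessive OOB data
         !oob,                                       # discard remaining OOB samples
         convergence <= 0.5,                         # keep trials where the gaze model converged well
         face_conf >= 0.5)                           # keep trials with a confidently detected face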

Areas of Interest (AOIs) will be defined in normalized coordinates as the four screen quadrants, and gaze samples will be assigned to AOIs. To create a uniform time base, data will be downsampled into 50-ms bins. Trial time will be aligned to the actual stimulus onset using the audio-onset metric provided by Gorilla; we will then subtract 100 ms to account for the silence prefixed to each audio recording.
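A sketch of the AOI assignment and binning is shown below; column names are placeholders, and the y-axis convention (whether 0 is the top or the bottom of the screen) should be checked against the platform’s output.

# Sketch of quadrant AOIs, time alignment, and 50-ms binning (placeholder names)
library(dplyr)

gaze_binned <- samples_clean |>
  mutate(aoi = case_when(                            # four quadrant AOIs (assuming y = 0 at the bottom)
           x_norm <  0.5 & y_norm >= 0.5 ~ "top_left",
           x_norm >= 0.5 & y_norm >= 0.5 ~ "top_right",
           x_norm <  0.5 & y_norm <  0.5 ~ "bottom_left",
           TRUE                          ~ "bottom_right"),
         time_aligned = time - audio_onset - 100,       # align to word onset (100 ms leading silence)
         time_bin     = floor(time_aligned / 50) * 50)  # 50-ms bins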

For the analysis, we will count looks to cohort (vs. unrelated) items. We will first count looks to cohorts and unrelated items by participant × trial × time bin, and then sum these counts across trials to yield a participant × time bin dataset for each webcam condition. These look counts will serve as inputs to the statistical models and summaries.
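The counts that feed the models could then be built as follows, assuming each gaze sample has already been labeled with the role of the fixated image (target, cohort, rhyme, or unrelated) on that trial.

# Sketch of the look counts used as model input (placeholder column names);
# yields fix_cohort and fix_unrelated per webcam x participant x time bin
library(dplyr)
library(tidyr)

looks <- gaze_binned |>
  filter(role %in% c("cohort", "unrelated")) |>
  count(webcam, participant, trial, time_bin, role) |>     # counts per participant x trial x bin
  group_by(webcam, participant, time_bin, role) |>
  summarise(n = sum(n), .groups = "drop") |>               # sum over trials
  pivot_wider(names_from = role, values_from = n,
              values_fill = 0, names_prefix = "fix_")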

The display contained two types of critical trials: TCRU (target–cohort–rhyme–unrelated) and TCUU (target–cohort–unrelated–unrelated). Because some items served as a rhyme competitor (R) in TCRU trials and as an unrelated item (U) in TCUU trials, we adopted a principled scheme to define the “unrelated” baseline in our cohort analyses. For the primary analysis of cohort competition, we used a single unrelated competitor as the baseline. Concretely, for each trial, we identified one critical unrelated item that was never used as a rhyme competitor (R) anywhere in the experiment. In TCRU trials, this was the standard unrelated item. In TCUU trials, when one of the two unrelated objects also served as a rhyme competitor in other trials, we excluded that item from the C–U contrast and treated only the remaining object as the unrelated baseline.

Analysis Plan

GAMMs

To analyze overall competition effects and onset latency, we will use generalized additive mixed models (GAMMs; Wood, 2017). GAMMs extend the generalized linear modeling framework by modeling effects that are expected to vary non-linearly over time—a common feature in the VWP (Brown-Schmidt et al., 2025; Ito & Knoeferle, 2022; Mitterer, 2025; Veríssimo & Lago, 2025). These models capture non-linear effects by fitting smoothing splines to the data using data-driven, machine-learning-based methods, with the amount of non-linearity captured by how “wiggly” the time course is (one can think of this as the number of bends in a curve). A benefit of this approach is that it reduces the risk of over-fitting and eliminates the need to use polynomial terms, as required in traditional growth curve models (Mirman, 2014). Importantly, GAMMs also allow researchers to account for autocorrelation in time-series data, which is especially critical in gaze analyses where successive samples are not independent. By modeling the autocorrelation structure, GAMMs provide more accurate estimates of temporal effects and prevent inflation of Type I error rates (Rij et al., 2019). In addition, fitting GAMMs allows us to estimate the onset of the competition effect in each condition (see Veríssimo & Lago, 2025).

Competition Effects (H1a and H1b)

Gaze samples will be analyzed with a binomial (logistic) GAMM using the bam() function from the {mgcv} package (Wood, 2017). For visualization, we will employ functions from the {tidygam} package (Coretta, 2024); the {onsets} package (Veríssimo & Lago, 2025) will be used to examine onset latencies, and the {itsadug} package (van Rij et al., 2022) for AR functions and for testing differences between smooths across time (get_difference()). The dependent variable will consist of gaze counts to cohorts compared to unrelated items, for each participant and in each 50-ms time bin. All analyses will be conducted on a window ranging from stimulus onset (100 ms) to 1200 ms.

Specifying the right model in the context of GAMMs is not straightforward. Results have been shown to vary depending on whether one uses an ordered or an unordered factor scheme with time (Oltrogge et al., 2025). Because of this, we plan to fit the model in multiple ways to test for robustness. In one model (Listing 2) we will fit a model that includes a parametric term for webcam type as an unordered factor (treatment-coded such that high-quality = 1 and standard-quality = 0). To examine whether webcam type moderates the cohort effect over time, we will include smooth terms for time-by-condition interactions with camera. To account for individual differences, we will include participant-specific random smooths for time, and participant-specific random smooths for time within each level of the camera factor.

This specification allows the model to capture three components:

  1. The overall (parametric) effect of webcam type on the proportion of looks to cohorts-over-unrelated items. Here, the intercept reflects the expected proportion of looks in the standard-quality condition, and the webcam coefficient reflects the difference in looks between high- and standard-quality webcams (H1a and H1b).
  2. Condition-specific, time-varying trajectories in cohort competition (via the smooth terms), which support inferences about the timing (onsets) and dynamics of cohort effects across conditions. With an unordered factor, the model estimates time smooths for each condition separately with the p-value telling us if each smooth differs from zero (H2a).
  3. Deviations from these group-level trajectories at the individual participant level (via random smooths).

In a second instantiation of the model (see Listing 3), we will include a parametric term for webcam type treated as an ordered factor. To assess whether webcam type moderates the cohort-over-unrelated effect over time, we will incorporate smooth terms for time as well as time-by-condition interactions, as required when fitting ordered factors.

This specification allows the model to capture the same effects as above, with one key difference: by including an ordered factor, we must include separate smooths for time (the reference) and for time × webcam type. This model estimates a baseline smooth for the standard-quality webcam and a difference smooth for the high-quality webcam. Because the webcam factor is ordered, the difference smooth directly represents how the trajectory for high-quality deviates from the standard-quality trajectory. This structure enables inferences about the timing and dynamics of cohort effects across webcam conditions. We will also examine how changing the reference level (factor ordering) affects inferences.

For both models, we use default arguments for the smooth functions. We use the default basis dimension, k = 10 (i.e., nine effective basis functions), to capture the “wiggliness” of the smooths. Smooths will be fit with thin plate regression splines. This default seems to work well in general (Oltrogge et al., 2025); however, deviations from this default will be reported and robustness will be tested.

Although “maximal” random-effects structures are often recommended in linear mixed-effects models (Barr et al., 2013), such specifications can be computationally prohibitive in GAMMs. The present specification follows the recommendations of Veríssimo & Lago (2025). Statistical significance will be assessed at α = .05. When fitting difference curves for ordered factors with simultaneous CIs, we will correct for multiple comparisons using a Bonferroni correction (Krause et al., 2024).

To account for autocorrelation in the residuals, we will first fit the model without an autoregressive term in order to estimate the autocorrelation parameter (ρ). We will then re-fit the model including a first-order autoregressive process (AR(1)) to properly model temporal dependencies. Although using larger time bins can reduce autocorrelation, it does not eliminate it entirely, so explicitly modeling residual autocorrelation ensures valid statistical inference.

Onset Effects (H2b)

Once the models are fit, we will extract predicted gaze curves for each condition using the {onsets} package (Veríssimo & Lago, 2025). The {onsets} procedure first simulates gaze curves across time from the fitted GAMMs (N = 10,000). For each simulated curve, the onset of the condition effect is identified by comparing the predicted log-odds at each timepoint to a predefined criterion. Within the package, onset is defined as the earliest time at which the predicted log-odds is significantly greater than the model-predicted log-odds at the first timepoint of the analysis window (here word onset). Repeating this procedure across 10,000 simulations yields a distribution of onset estimates, from which a 95% highest density interval (HDI) can be obtained. To derive between-condition comparisons, onset times from paired simulations are subtracted, producing a corresponding distribution with median onset difference and its associated 95% HDI (see Listing 4 for the analysis code).

Listing 1: R packages to load
# load packages
library(mgcv)     # fit GAMMs with bam()
library(tidygam)  # visualization of GAMMs
library(itsadug)  # start_event(), get_difference(), AR helpers
library(onsets)   # onset estimates and differences
Listing 2: GAMM model set–up unordered camera factor
# set contrasts (treatment coding for unordered and ordered factors)
options(contrasts = rep("contr.treatment", 2))

# code webcam condition as an unordered factor (standard-quality = reference level)
dat$camera <- as.factor(dat$camera)

# mark the start of each participant x trial time series for the AR(1) model
dat <- start_event(dat, column = "time", event = c("participant", "trial"))

# quick rho estimate (fit once without AR to get residual ACF at lag 1)
m0 <- 
bam(cbind(fix_cohort, fix_unrelated) ~ 1 + 
    camera + s(time, by = camera, k = 10) + 
    s(participant, by = camera, bs = "re") +
    s(time, participant, by = camera, bs = "re"), 
    family = binomial(), method = "fREML",
  discrete = TRUE, data = dat, na.action = na.omit)
    
rho <- acf(residuals(m0, type = "pearson"), plot = FALSE)$acf[2]

# final model with AR(1) to handle within-series autocorrelation
m1 <- 
bam(cbind(fix_cohort, fix_unrelated) ~ 1 + 
    camera + s(time, by = camera, k = 10) + 
    s(participant, by = camera, bs = "re") +
    s(time, participant, by = camera, bs = "re"), 
    family = binomial(), method = "fREML",
  discrete = TRUE, data = dat, na.action = na.omit,
  rho = rho, AR.start = dat$start.event)
Listing 3: GAMM model set–up ordered camera factor
# code webcam condition as an ordered factor (difference-smooth parameterization)
dat$camera_ord <- factor(dat$camera, ordered = TRUE)

# mark the start of each participant x trial time series for the AR(1) model
dat <- start_event(dat, column = "time", event = c("participant", "trial"))

# quick rho estimate (fit once without AR to get residual ACF at lag 1)
m0 <- 
bam(cbind(fix_cohort, fix_unrelated) ~ 1 + 
    camera_ord + s(time) + 
    s(time, by = camera_ord) + 
    s(participant, bs = "re") + 
    s(participant, by = camera_ord, bs = "re") + 
    s(time, participant, bs = "re") + 
    s(time, participant, by = camera_ord, bs = "re"), 
    family = binomial(), method = "fREML",
  discrete = TRUE, data = dat, na.action = na.omit)
    
rho <- acf(residuals(m0, type = "pearson"), plot = FALSE)$acf[2]

# final model with AR(1) to handle within-series autocorrelation
m1 <- 
bam(cbind(fix_cohort, fix_unrelated) ~ 1 + 
    camera_ord + s(time) + 
    s(time, by = camera_ord) + 
    s(participant, bs = "re") + 
    s(participant, by = camera_ord, bs = "re") + 
    s(time, participant, bs = "re") + 
    s(time, participant, by = camera_ord, bs = "re"), 
    family = binomial(), method = "fREML",
  discrete = TRUE, data = dat, na.action = na.omit, 
  rho = rho, AR.start = dat$start.event)
Listing 4: Getting onset differences using the {onsets} package for unordered model
# Obtain onsets in each condition (and their differences)
onsets_comp <- get_onsets(model = m1,          # fitted GAMM (unordered-factor model)
                          time_var = "time",   # name of time variable
                          by_var = "camera",   # name of condition/group variable
                          compare = TRUE,      # obtain differences between onsets
                          n_samples = 10000,   # large number of samples (less variable results)
                          seed = 1)            # random seed for reproducibility
Listing 5: Getting onset differences using the {onsets} package for ordered model
# Obtain onsets in each condition (and their differences)
onsets_comp <- get_onsets(model = m1,              # fitted GAMM (ordered-factor model)
                          time_var = "time",       # name of time variable
                          by_var = "camera_ord",   # name of condition/group variable
                          compare = TRUE,          # obtain differences between onsets
                          n_samples = 10000,       # large number of samples (less variable results)
                          seed = 1)                # random seed for reproducibility

Calibration (H3)

To examine whether webcam type affects calibration rejection, we will fit a logistic regression model using the glm() function (see Listing 6). Calibration outcome will be coded as a binary variable, where 0 indicates that a participant failed to calibrate at either of the calibration phases, and 1 indicates that the participant successfully passed both calibration phases. This model will allow us to estimate whether the webcam condition reliably predicts the probability of successful calibration.

Listing 6: Logistic model to examine the effect of webcams on calibration
# fit logistic regression: does webcam condition predict calibration success?
# (calib_dat is a placeholder for a data frame with one row per participant)
calib_fit <- glm(calibration ~ camera, family = binomial(link = "logit"),
                 data = calib_dat)

Experiment 2: Head stabilization (chin rest) vs. no head stabilization (no chin rest)

In Experiment 2, we will use the same standard-quality external webcam as in Experiment 1, but will manipulate head stability by comparing a chin-rest condition to a no–chin-rest condition. Hessels et al. (2014) offered evidence that head movement can impact the accuracy and reliability of eye-tracking estimates, showing that even moderate deviations in head position can produce systematic calibration drift, increased data loss, and slower recovery when gaze is reacquired. Importantly, their study demonstrates that these effects are not limited to extreme movements and can arise during typical participant behavior when head position is not constrained. Some online platforms, like Labvanced (Finger et al., 2017), attempt to mitigate head motion by warning participants when they move outside a predefined region; however, it remains unclear how such motion control interacts with WebGazer.js estimates of event detection and onset latency. In laboratory-based eye-tracking, chin rests are routinely used to stabilize the head in a fixed position so that only the eyes move during the experiment. By introducing a chin-rest manipulation, we can directly assess the extent to which head stability contributes to competition effects, the timing of event detection, and participant attrition in webcam-based eye-tracking. We therefore test the following hypotheses regarding competition, onset, and attrition.

Hypotheses

We hypothesize several effects related to competition, onset, and attrition. These are generally the same as Experiment 1.

Competition Effects

(H1a) Participants will show an overall greater proportion of looks to cohort competitors than to unrelated items. (H1b) Head stabilization (chin rest vs. no chin rest) will influence the overall proportion of looks, with the chin-rest condition detecting a greater proportion of looks.

Timing/Onset Effects

(H2a) Each condition will show a change in the proportion of looks across time. More specifically, we hypothesize that the change in proportions will be non-linear across time and that the two conditions will differ across time. (H2b) For the no-chin-rest condition compared to the chin-rest condition, onsets will be detected later due to increased noise.

Attrition

(H3) Attrition rates will be lower in the chin-rest condition than in the no-chin-rest condition.

Sampling Goal, Materials, Procedure, and Analysis Plan

The sampling goal, materials, procedure, and analysis plan are the same as Experiment 1. The main difference is whether participants use a chin rest or not.

Declarations

Funding

This study was funded by the main author.

Conflicts of Interest

The authors declare no conflicts of interest.

Ethics Approval

This study was approved by the relevant ethics committee.

Availability of Data and Materials

All data and materials will be stored on OSF, GitHub, and archived on Zenodo.

Code Availability

All code will be made available on OSF, GitHub, and Zenodo.

Authors’ Contributions

  • JG - Writing—original draft
  • JV - Writing—review & editing
  • JD - Writing—review & editing

References

Anwyl-Irvine, A. L., Massonnié, J., Flitton, A., Kirkham, N., & Evershed, J. K. (2019). Gorilla in our midst: An online behavioral experiment builder. Behavior Research Methods, 52(1), 388–407. https://doi.org/10.3758/s13428-019-01237-x
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278. https://doi.org/10.1016/j.jml.2012.11.001
Bogdan, P. C., Dolcos, S., Buetti, S., Lleras, A., & Dolcos, F. (2024). Investigating the suitability of online eye tracking for psychological research: Evidence from comparisons with in-person data using emotionattention interaction tasks. Behavior Research Methods, 56(3), 2213–2226. https://doi.org/10.3758/s13428-023-02143-z
Bramlett, A. A., & Wiener, S. (2024). The art of wrangling. Linguistic Approaches to Bilingualism. https://doi.org/10.1075/lab.23071.bra
Bramlett, A. A., & Wiener, S. (2025). The art of wrangling: Working with web-based visual world paradigm eye-tracking data in language research. Linguistic Approaches to Bilingualism, 15(4), 538–570. https://doi.org/10.1075/lab.23071.bra
Brown-Schmidt, S., Cho, S.-J., Fenn, K. M., & Trude, A. M. (2025). Modeling spatio-temporal patterns in intensive binary time series eye-tracking data using Generalized Additive Mixed Models. Brain Research, 1854, 149511. https://doi.org/10.1016/j.brainres.2025.149511
Carter, B. T., & Luke, S. G. (2020). Best practices in eye tracking research. International Journal of Psychophysiology, 155, 49–62. https://doi.org/10.1016/j.ijpsycho.2020.05.010
Cheng, Y., Wang, H., Bao, Y., & Lu, F. (2024). Appearance-based Gaze Estimation With Deep Learning: A Review and Benchmark (arXiv:2104.12668). https://doi.org/10.48550/arXiv.2104.12668
Chen-Sankey, J., Elhabashy, M., Gratale, S., Geller, J., Mercincavage, M., Strasser, A. A., Delnevo, C. D., Jeong, M., & Wackowski, O. A. (2023). Examining Visual Attention to Tobacco Marketing Materials Among Young Adult Smokers: Protocol for a Remote Webcam-Based Eye-Tracking Experiment. JMIR Research Protocols, 12, e43512. https://doi.org/10.2196/43512
Colby, S. E., & McMurray, B. (2023). Efficiency of spoken word recognition slows across the adult lifespan. Cognition, 240, 105588. https://doi.org/10.1016/j.cognition.2023.105588
Cooper, R. M. (1974). The control of eye fixation by the meaning of spoken language: A new methodology for the real-time investigation of speech perception, memory, and language processing. Cognitive Psychology, 6(1), 84–107. https://doi.org/10.1016/0010-0285(74)90005-X
Coretta, S. (2024). Tidygam: Tidy prediction and plotting of generalised additive models. https://github.com/stefanocoretta/tidygam
Degen, J., Kursat, L., & Leigh, D. D. (2021). Seeing is believing: Testing an explicit linking assumption for visual world eye-tracking in psycholinguistics. Proceedings of the Annual Meeting of the Cognitive Science Society, 43.
Dolstra, E., & contributors, T. N. (2023). Nix (Version 2.15.3) [Computer software]. https://nixos.org/
Finger, H., Goeke, C., Diekamp, D., Standvoß, K., & König, P. (2017). LabVanced: A unified JavaScript framework for online studies. International Conference on Computational Social Science (IC2S2).
Geller, J., Prystauka, Y., Colby, S. E., & Drouin, J. R. (2025). Language without borders: A step-by-step guide to analyzing webcam eye-tracking data for L2 research. Research Methods in Applied Linguistics, 4(3), 100226. https://doi.org/10.1016/j.rmal.2025.100226
Hessels, R. S., Cornelissen, T. H. W., Kemner, C., & Hooge, I. T. C. (2014). Qualitative tests of remote eyetracker recovery and performance during head rotation. Behavior Research Methods, 47(3), 848–859. https://doi.org/10.3758/s13428-014-0507-6
Huettig, F., Rommers, J., & Meyer, A. S. (2011). Using the visual world paradigm to study language processing: a review and critical evaluation. Acta Psychologica, 137(2), 151–171. https://doi.org/10.1016/j.actpsy.2010.11.003
Ito, A., & Knoeferle, P. (2022). Analysing data from the psycholinguistic visual-world paradigm: Comparison of different analysis methods. Behavior Research Methods, 55(7), 3461–3493. https://doi.org/10.3758/s13428-022-01969-3
James, A. N., Ryskin, R., Hartshorne, J. K., Backs, H., Bala, N., Barcenas-Meade, L., Bhattarai, S., Charles, T., Copoulos, G., Coss, C., Eisert, A., Furuhashi, E., Ginell, K., Guttman-McCabe, A., Harrison, E. (Chaz)., Hoban, L., Hwang, W. A., Iannetta, C., Koenig, K. M., … de Leeuw, J. R. (2025). What Paradigms Can Webcam Eye-Tracking Be Used For? Attempted Replications of Five Cognitive Science Experiments. Collabra: Psychology, 11(1), 140755. https://doi.org/10.1525/collabra.140755
Jarvis, M., Vasarhelyi, A., Anderson, J., Mulley, C., Lipp, O. V., & Ney, L. J. (2025). js-mEye: An extension and plugin for the measurement of pupil size in the online platform jsPsych. Behavior Research Methods, 58(1). https://doi.org/10.3758/s13428-025-02901-1
Kaduk, T., Goeke, C., Finger, H., & König, P. (2023). Webcam eye tracking close to laboratory standards: Comparing a new webcam-based system and the EyeLink 1000. Behavior Research Methods, 56(5), 5002–5022. https://doi.org/10.3758/s13428-023-02237-8
Kandel, M., & Snedeker, J. (2024). Assessing two methods of webcam-based eye-tracking for child language research. Journal of Child Language, 52(3), 675–708. https://doi.org/10.1017/s0305000924000175
Krause, J., Rij, J. van, & Borst, J. P. (2024). Word Type and Frequency Effects on Lexical Decisions Are Process-dependent and Start Early. Journal of Cognitive Neuroscience, 36(10), 2227–2250. https://doi.org/10.1162/jocn_a_02214
Leeuw, J. R. de. (2014). jsPsych: A JavaScript library for creating behavioral experiments in a Web browser. Behavior Research Methods, 47(1), 1–12. https://doi.org/10.3758/s13428-014-0458-y
McMurray, B., Samelson, V. M., Lee, S. H., & Tomblin, J. B. (2010). Individual differences in online spoken word recognition: Implications for SLI. Cognitive Psychology, 60(1), 1–39. https://doi.org/10.1016/j.cogpsych.2009.06.003
Mirman, D. (2014). Growth curve analysis and visualization using r. Chapman; Hall/CRC.
Mitterer, H. (2025). A web-based mouse-tracking task for early perceptual language processing. Behavior Research Methods, 57(11). https://doi.org/10.3758/s13428-025-02827-8
Oltrogge, E., Veríssimo, J., Patil, U., & Lago, S. (2025). Memory retrieval and prediction interact in sentence comprehension: An experimental evaluation of a cue-based retrieval model. Journal of Memory and Language, 144, 104651. https://doi.org/10.1016/j.jml.2025.104651
Özsoy, O., Çiçek, B., Özal, Z., Gagarina, N., & Sekerina, I. A. (2023). Turkish-german heritage speakers’ predictive use of case: Webcam-based vs. In-lab eye-tracking. Frontiers in Psychology, 14, 1155585. https://doi.org/10.3389/fpsyg.2023.1155585
Papoutsaki, A., Sangkloy, P., Laskey, J., Daskalova, N., Huang, J., & Hays, J. (2016, July). WebGazer: Scalable Webcam Eye Tracking Using User Interactions. International Joint Conference on Artificial Intelligence.
Patterson, A. S., Nicklin, C., & Vitta, J. P. (2025). Methodological recommendations for webcam-based eye tracking: A scoping review. Research Methods in Applied Linguistics, 4(3), 100244. https://doi.org/10.1016/j.rmal.2025.100244
Peirce, J., Gray, J. R., Simpson, S., MacAskill, M., Höchenberger, R., Sogo, H., Kastman, E., & Lindeløv, J. K. (2019). PsychoPy2: Experiments in behavior made easy. Behavior Research Methods, 51(1), 195–203. https://doi.org/10.3758/s13428-018-01193-y
Prystauka, Y., Altmann, G. T. M., & Rothman, J. (2024). Online eye tracking and real-time sentence processing: On opportunities and efficacy for capturing psycholinguistic effects of different magnitudes and diversity. Behavior Research Methods, 56(4), 3504–3522. https://doi.org/10.3758/s13428-023-02176-4
Reips, U.-D. (2021). Web-based research in psychology: A review. Zeitschrift Für Psychologie, 229(4), 198–213. https://doi.org/10.1027/2151-2604/a000475
Rij, J. van, Hendriks, P., Rijn, H. van, Baayen, R. H., & Wood, S. N. (2019). Analyzing the Time Course of Pupillometric Data. Trends in Hearing, 23. https://doi.org/10.1177/2331216519832483
Rodrigues, B., & Baumann, P. (2025). Rix: Reproducible data science environments with ’nix’. https://docs.ropensci.org/rix/
Saxena, S., Fink, L. K., & Lange, E. B. (2024). Deep learning models for webcam eye tracking in online experiments. Behavior Research Methods, 56(4), 3487–3503. https://doi.org/10.3758/s13428-023-02190-6
Semmelmann, K., & Weigelt, S. (2018). Online webcam-based eye tracking in cognitive science: A first look. Behavior Research Methods, 50(2), 451–465. https://doi.org/10.3758/s13428-017-0913-7
Slim, M. S., & Hartsuiker, R. J. (2023). Moving visual world experiments online? A web-based replication of Dijkgraaf, Hartsuiker, and Duyck (2017) using PCIbex and WebGazer.js. Behavior Research Methods, 55(7), 3786–3804. https://doi.org/10.3758/s13428-022-01989-z
Slim, M. S., Kandel, M., Yacovone, A., & Snedeker, J. (2024). Webcams as windows to the mind? A direct comparison between in-lab and web-based eye-tracking methods. Open Mind, 8, 1369–1424. https://doi.org/10.1162/opmi_a_00171
Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K. M., & Sedivy, J. C. (1995). Integration of visual and linguistic information in spoken language comprehension. Science (New York, N.Y.), 268(5217), 1632–1634. http://www.ncbi.nlm.nih.gov/pubmed/7777863
Van der Cruyssen, I., Ben-Shakhar, G., Pertzov, Y., Guy, N., Cabooter, Q., Gunschera, L. J., & Verschuere, B. (2023). The validation of online webcam-based eye-tracking: The replication of the cascade effect, the novelty preference, and the visual world paradigm. Behavior Research Methods. https://doi.org/10.3758/s13428-023-02221-2
van Rij, J., Wieling, M., Baayen, R. H., & van Rijn, H. (2022). itsadug: Interpreting time series and autocorrelated data using GAMMs.
Veríssimo, J., & Lago, S. (2025). A novel method for detecting the onset of experimental effects in visual world eye-tracking. PsyArXiv. https://doi.org/10.31234/osf.io/yk4xb_v3
Vos, M., Minor, S., & Ramchand, G. C. (2022). Comparing infrared and webcam eye tracking in the Visual World Paradigm. Glossa Psycholinguistics, 1(1). https://doi.org/10.5070/G6011131
Wood, S. N. (2017). Generalized Additive Models: An introduction with R (2nd ed.). Chapman; Hall/CRC.
Yang, X., & Krajbich, I. (2021). Webcam-based online eye-tracking for behavioral research. Judgment and Decision Making, 16(6), 1485–1505. https://doi.org/10.1017/S1930297500008512
Zehr, J., & Schwarz, F. (2022). PennController for internet based experiments (IBEX). Open Science Framework. https://doi.org/10.17605/OSF.IO/MD832