Improving Webcam Eye-Tracking: Impact of Hardware and Head Stabilization on Data Quality

Authors
Affiliations

Jason Geller

Boston College

João Veríssimo

University of Lisbon

Julia Drouin

University of North Carolina - Chapel Hill

Abstract
Webcam-based eye-tracking offers a scalable and accessible alternative to traditional lab-based systems. While recent studies demonstrate that webcam eye-tracking can replicate canonical effects across domains such as language, memory, and decision-making, questions remain about its precision and reliability. In particular, spatial accuracy, temporal resolution, and attrition rates are often poorer than those observed with research-grade systems, raising the possibility that environmental and hardware factors introduce substantial noise. The present registered report directly tests two factors that may introduce noise into webcam data: camera quality and head stabilization. In Experiment 1, we examine the effect of external webcam quality (high vs. standard) in a single-word Visual World Paradigm (VWP) task, testing whether using a better webcam can yield stronger competition effects, earlier effect onsets, and reduced attrition. In Experiment 2, we assess the impact of head stabilization (chin rest vs. no chin rest) under identical environmental conditions. Together, these studies identify the impact of hardware and movement on webcam eye-tracking data quality. Results will inform a more complete methodological understanding of webcam-based eye-tracking, clarifying whether its current limitations are intrinsic to the technology or can be mitigated through improved hardware and experimental control. This set of studies has implications for both online and in-lab utilization of webcam eye-tracking.
Keywords

webcam eye-tracking, webcameras, VWP, Lab-based experimentation, competition, spoken word recognition


Online experimentation in the behavioral sciences has advanced considerably since its introduction at the 1996 Society for Computers in Psychology (SCiP) conference in Chicago, IL (Reips, 2021). One methodological domain that has shown particular promise in moving online is eye tracking. Traditionally, eye-tracking studies required controlled laboratory settings equipped with specialized and costly hardware—a process that is both resource- and time-intensive. More recently, however, a growing body of research has shown that eye tracking can be successfully adapted to online environments (e.g., Bogdan et al., 2024; Bramlett & Wiener, 2024; James et al., 2025; Özsoy et al., 2023; Prystauka et al., 2024; Slim et al., 2024; Slim & Hartsuiker, 2023; Van der Cruyssen et al., 2023; Vos et al., 2022; Yang & Krajbich, 2021). By leveraging computers with webcameras, researchers can now record eye movements remotely, making it possible to collect data from virtually any location at any time. This shift not only enhances scalability, but also broadens access to more diverse and representative participant samples.

Webcam-based eye tracking has become an increasingly viable and accessible method for behavioral research. Implementation typically requires only a standard computing device (e.g., laptop, desktop, tablet, or smartphone) equipped with a built-in or external webcam. Data are collected through a web browser running dedicated software capable of recording and estimating gaze position in real time. This accessibility has been further enhanced by the integration of webcam-based eye tracking into several established experimental platforms, including Gorilla (Anwyl-Irvine et al., 2019), PsychoPy/PsychoJS (Peirce et al., 2019), jsPsych (Leeuw, 2014), PCIbex (Zehr & Schwarz, 2022), and Labvanced (Kaduk et al., 2023).

To reliably estimate where users are looking, webcam-based eye tracking typically relies on appearance-based methods, which infer gaze direction directly from visual features of the eye region (e.g., pupil and iris appearance) (Cheng et al., 2024; Saxena et al., 2024). Recent work has extended these methods using deep learning to learn gaze–appearance mappings directly from data (e.g., Kaduk et al., 2023; Saxena et al., 2024). This contrasts with research-grade eye trackers, which use model-based algorithms combining infrared illumination with geometric modeling of the pupil and corneal reflections (Cheng et al., 2024).

The most widely used library for webcam eye tracking is WebGazer.js (Papoutsaki et al., 2016; Patterson et al., 2025). WebGazer.js is an open-source JavaScript library that performs real-time gaze estimation using standard webcams. It is an appearance-based method that leverages computer vision techniques to detect the face and eyes, extract image features, and map these features onto known screen coordinates during a brief calibration procedure. Once trained, gaze locations on the screen are estimated via ridge regression (Papoutsaki et al., 2016).
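For intuition, the core of this mapping can be sketched in a few lines of R: ridge regression learns a set of weights from eye-region image features to known on-screen target coordinates collected during calibration, and those weights are then used to predict gaze position for new frames. The sketch below uses simulated features and hypothetical dimensions; it illustrates the general idea only and is not WebGazer.js’s actual implementation.

# Minimal sketch of the ridge-regression idea behind appearance-based gaze
# estimation (illustrative only; simulated features, hypothetical sizes)
set.seed(1)
n_cal  <- 45    # calibration frames (e.g., 9 targets x 5 frames each)
n_feat <- 120   # hypothetical number of eye-patch image features
X      <- matrix(rnorm(n_cal * n_feat), n_cal, n_feat)   # simulated eye features
w_true <- rnorm(n_feat, sd = 0.1)                        # unknown feature-to-gaze weights
y_x    <- X %*% w_true + rnorm(n_cal, sd = 0.05)         # known target x-coordinates

# ridge solution: (X'X + lambda * I)^-1 X'y
lambda <- 1
w_hat  <- solve(crossprod(X) + lambda * diag(n_feat), crossprod(X, y_x))

# predict the gaze x-coordinate for a new video frame
new_frame <- matrix(rnorm(n_feat), 1, n_feat)
pred_x    <- new_frame %*% w_hat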

Although webcam eye-tracking is still relatively new, validation efforts are steadily accumulating and the results are encouraging. Researchers have successfully applied webcam-based methods to domains such as language (e.g., Bramlett & Wiener, 2025; Geller et al., 2025; Prystauka et al., 2024), judgment and decision-making (e.g., Yang & Krajbich, 2021), memory (e.g., James et al., 2025), and public health (Chen-Sankey et al., 2023). Collectively, this work demonstrates that webcam eye-tracking can yield interpretable and meaningful results that are comparable to those obtained with traditional lab-based systems.

However, there are several limitations associated with web-based eye tracking. First, experimental effects are typically smaller than those observed in lab-based studies (Bogdan et al., 2024; Degen et al., 2021; Kandel & Snedeker, 2024; Slim et al., 2024; Slim & Hartsuiker, 2023; Van der Cruyssen et al., 2023). Second, relative to research-grade eye trackers, both spatial accuracy/precision and temporal resolution tend to be lower with webcam eye-tracking. Spatial accuracy refers to the extent to which measured gaze positions deviate from the true gaze point, whereas precision reflects the consistency of those measurements over time (Carter & Luke, 2020). In webcam-based eye tracking, spatial accuracy and precision often exceed 1° of visual angle (Semmelmann & Weigelt, 2018). Regarding temporal resolution, sampling rates are typically more variable, with most webcams rarely exceeding 30 Hz. Consequently, detectable effects tend to span a relatively broad temporal range, with some users reporting a 1000 ms onset difference (Geller et al., 2025; Semmelmann & Weigelt, 2018; Slim et al., 2024; Slim & Hartsuiker, 2023). This variability makes webcam eye tracking less suitable for studies that require fine-grained spatial or temporal fidelity—for example, paradigms involving many or small areas of interest (AOIs) (James et al., 2025) or tasks requiring millisecond-level temporal precision (Slim et al., 2024). Lastly, webcam-based studies tend to exhibit higher attrition rates. For instance, Patterson et al. (2025) reported an average attrition rate of approximately 13% across studies, with substantial variability across individual experiments (see also Geller et al., 2025; Prystauka et al., 2024). Together, increased noise and variability often necessitate larger sample sizes to achieve comparable power and to offset high attrition.

An open question is whether the limitations of web-based eye-tracking primarily stem from the WebGazer.js algorithm itself or from environmental and hardware constraints—and, crucially, whether future improvements in the setup of webcam eye-tracking can mitigate these issues. On the algorithmic side, recent work (e.g., James et al., 2025; also see Yang & Krajbich, 2021) demonstrated that modifying WebGazer.js so that the sampling rate is polled consistently and timestamps are aligned to data acquisition (rather than completion) markedly improves temporal resolution. Implementing these changes within online experiment platforms such as Gorilla and jsPsych has brought webcam-based eye-tracking closer to the timing fidelity achieved in laboratory settings. For example, using the Gorilla platform, Prystauka et al. (2024) reported a 50 ms timing difference, while Geller et al. (2025) observed a 100 ms difference between lab-based and online effects.

To our knowledge, no study has directly tested how environmental and hardware constraints impact webcam-based eye-tracking data. Slim & Hartsuiker (2023) provided some evidence suggesting that hardware quality may underlie some of these limitations, reporting a positive correlation between sampling rate and calibration accuracy. Similarly, Geller et al. (2025) found that participants who failed calibration more often reported using standard-quality built-in webcams and working in suboptimal environments (e.g., natural lighting). Together, these findings suggest that both hardware and environmental factors may contribute to the increased noise commonly observed in online eye-tracking data. In the proposed research we plan to manipulate two factors, hardware selection (webcams) and participant stability, across two experiments.

Proposed Research

To address environmental and technical sources of noise in webcam eye-tracking, we plan to bring participants into the lab to complete a Gorilla‐hosted webcam task under standardized conditions. We manipulate two key factors (between subjects) across two experiments. Experiment 1 varies external webcam quality (high‐ vs. standard‐quality external cameras). Experiment 2 varies head stabilization (i.e., chin rest vs. no chin rest). All sessions will use identical ambient lighting, fixed viewing distance, the same display/computer model, and controlled network settings. These manipulations specifically target sources of measurement noise induced by the quality of the webcam and the amount of movement by the participant.

To examine these factors, we employ a paradigm widely used in psycholinguistics—the Visual World Paradigm (VWP) (Cooper, 1974; Tanenhaus et al., 1995). The VWP has been successfully adapted for webcam-based eye tracking (Bramlett & Wiener, 2024, 2025; Geller et al., 2025; Prystauka et al., 2024). Although implementations vary across studies (see Huettig et al., 2011), the version used here investigates phonemic competition, wherein item sets are typically constructed so that the display contains a target (e.g., carrot), a cohort competitor (e.g., carriage), a rhyme competitor (e.g., parrot), and an unrelated distractor (e.g., tadpole). This configuration allows researchers to examine the dynamics of lexical competition—for instance, how phonologically similar words such as carriage (cohort effect) or parrot (rhyme effect) influence online spoken-word processing. Typically, fixations to cohort or rhyme competitors persist longer or emerge earlier than fixations to unrelated distractors, reflecting transient lexical activation.

In the present study, we focus specifically on cohort competition effects in single-word spoken-word recognition using the VWP. Several studies (e.g., Geller et al., 2025; Slim et al., 2024) have observed clear cohort competition effects using webcam eye tracking. However, effect sizes are sometimes smaller than those reported in traditional lab-based studies (e.g., Slim et al., 2024), and effects reliably emerge later in time when measured with standard webcams. This pattern suggests that increased measurement noise in webcam setups primarily introduces temporal delays—and in some cases smaller detectable effects—rather than exaggerating or distorting the underlying competition dynamics.

The current research aims to inform best practices for webcam-based eye tracking, with particular attention to hardware quality and physical setup considerations (e.g., head movement). Reducing noise by manipulating hardware and head movement is predicted to make the measured gaze signal more stable and less variable across time and trials. In turn, this can make existing effects easier to detect, potentially manifesting as (a) larger and clearer competition effects, (b) earlier and more reliable detection of effect onsets in time-course analyses, and (c) lower calibration failure and attrition rates compared to standard webcams.

While these guidelines will benefit researchers conducting webcam studies in uncontrolled, online settings, they are also valuable for laboratory-based research in which webcams may serve as lower-cost alternatives to infrared eye-tracking systems. By systematically testing the role of hardware and head stabilization, this work clarifies the conditions under which webcam eye tracking can approximate lab-quality data and where its limitations remain.

Experiment 1: High-Quality Webcam vs. Standard-Quality Webcam

Both Slim & Hartsuiker (2023) and Geller et al. (2025) observed a clear relationship between webcam quality and calibration accuracy in webcam-based eye-tracking. Building on these findings, Experiment 1 tests how webcam quality influences competition effects in a single-word VWP. Specifically, we ask whether a higher-quality webcam yields (a) a greater proportion of looks to relevant interest areas (i.e., greater looks to cohorts vs. unrelated items), (b) an earlier emergence of these effects over time, and (c) lower data attrition rates relative to a lower-quality webcam.

To address this, participants will complete the same VWP task using one of two webcam types: a high-quality external webcam (e.g., Logitech Brio) or a standard external webcam designed to emulate a typical built-in laptop camera (e.g., Logitech C270). The high-quality webcam offers higher resolution, a higher sampling rate (60 Hz), greater frame-rate stability, and more consistent illumination handling—factors expected to enhance gaze precision and tracking reliability. In contrast, more standard webcams, while representative of most participants’ home setups, typically provide lower frame rates and exhibit greater variability under different lighting conditions. Comparing these two setups enables a direct assessment of how hardware quality constrains the strength, timing, and reliability of linguistic competition effects in webcam-based eye-tracking.

Hypotheses

We hypothesize several effects related to competition, onset, and attrition.

Competition Effects

(H1a) Participants will show an overall greater proportion of looks to cohort competitors than to unrelated items. (H1b) Webcam quality (high vs. standard) will influence the overall proportion of looks, with higher-quality webcams detecting a greater proportion of looks.

To quantify the effect of webcam quality on the proportion of looks within the time window of interest, we will use Cohen’s h—a standardized measure of effect size appropriate for comparisons between two proportions.
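For reference, Cohen’s h is the difference between arcsine-transformed proportions. A minimal R helper, with purely hypothetical proportions, is shown below.

# Cohen's h: difference between arcsine-transformed proportions
cohens_h <- function(p1, p2) {
  2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))
}

# e.g., 35% vs. 30% of looks to the cohort within the analysis window (hypothetical values)
cohens_h(0.35, 0.30)  # ~0.11, a small effect by Cohen's benchmarks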

Timing/Onset Effects

(H2a) Each webcam condition will show a change in the proportion of looks across time. More specifically, we hypothesize that the change in proportions will be non-linear across time and that the two conditions will differ across time. (H2b) For standard-quality webcams relative to the high-quality webcam, onsets will be detected later due to increased noise.

Attrition

(H3) Attrition rates will be lower in the high-quality webcam condition than in the standard-quality webcam condition.

Method

All stimuli (audio and images), code, and data (raw and summary) will be placed on OSF at this link: https://osf.io/cf6xr/overview. The entire experiment will be stored on Gorilla’s open materials with a link to preview the tasks. In addition, the code and manuscript will be fully reproducible using Quarto and the package manager nix (Dolstra & contributors, 2023) in combination with the R package {rix} (Rodrigues & Baumann, 2025). Together, nix and {rix} enable reproducible computational environments at both the system and package levels. This manuscript and all of the necessary files to reproduce it will be stored on GitHub.

Sampling Goal

We conducted an a priori power analysis via Monte Carlo simulation in R. Data from 21 participants, collected online using the Gorilla experimental platform during the development of the {webgazeR} package (Geller et al., 2025) and employing the same stimuli and VWP design, were used to seed the simulations. In these data, we observed a cohort effect of approximately 3%. Using this value as our seed, we collapsed the data across time bins to compute binomial counts per trial and fit a binomial generalized linear mixed model (GLMM) to obtain fixed-effect estimates. We then augmented the dataset by adding a between-subjects factor for webcam quality, with participants evenly assigned to high- and standard-quality groups. In the high-quality webcam group, we modeled both a higher overall fixation rate and a larger cohort effect, whereas in the standard-quality group the cohort effect was halved relative to the high-quality group. Simulated datasets were generated under this model, and the planned GLMM—including a condition × webcam interaction—was refit to each simulated dataset (N = 5,000). Power was estimated as the proportion of simulations in which the interaction term exceeded |z| = 1.96. The analysis script to run this power analysis is located here: https://osf.io/4trmn/files/a46g8. Results indicated that a total of 35 participants per group (N = 70) would provide approximately 90% power to detect the hypothesized reduction in the cohort effect and overall fixation rate under standard-quality webcam conditions. Because this analysis only examines overall proportions and we plan to also examine effects across time, we plan to recruit 50 participants in each group (N = 100 total). We will run our study until we have 100 usable participants (50 in each group). For the calibration analysis (see below), all participants who enter the study will be included.
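The full simulation code is available at the OSF link above. Purely as an illustration of the logic—with hypothetical effect sizes, a simplified random-effects structure, and far fewer runs than the registered 5,000—a sketch of the simulation-based power estimate might look like this:

# Simplified sketch of a simulation-based power analysis for the webcam effect
# on the cohort advantage (all parameter values below are illustrative)
library(lme4)

simulate_once <- function(n_per_group = 35, n_trials = 60, n_samples = 20,
                          b_cohort_std = 0.12,  # cohort advantage (log-odds), standard webcam
                          b_webcam     = 0.12,  # additional advantage with the high-quality webcam
                          sd_subj      = 0.30) {# between-participant SD (log-odds)
  subj   <- factor(rep(seq_len(2 * n_per_group), each = n_trials))
  webcam <- rep(c(0, 1), each = n_per_group * n_trials)      # 0 = standard, 1 = high quality
  eta    <- b_cohort_std + b_webcam * webcam +
            rnorm(2 * n_per_group, 0, sd_subj)[subj]
  p      <- plogis(eta)                                      # P(look lands on cohort vs. unrelated)
  cohort    <- rbinom(length(p), size = n_samples, prob = p) # cohort looks per trial
  unrelated <- n_samples - cohort                            # unrelated looks per trial
  dat <- data.frame(subj, webcam, cohort, unrelated)
  fit <- glmer(cbind(cohort, unrelated) ~ webcam + (1 | subj),
               family = binomial(), data = dat)
  abs(coef(summary(fit))["webcam", "z value"]) > 1.96        # group difference detected?
}

set.seed(123)
mean(replicate(200, simulate_once()))  # estimated power; use many more runs in practice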

Participants will give informed consent before participating. These studies have been approved by the relevant ethics committee.

Materials

VWP

Picture Stimuli

Stimuli were adapted from Colby & McMurray (2023). Each stimulus set comprised four images. For the webcam study, we used 30 sets (15 monosyllabic, 15 bisyllabic). Within each set, only the target and its onset competitor served as auditory targets, each presented once, yielding two trial types: TCRU trials, in which a target, cohort, rhyme, and unrelated picture were displayed (e.g., rocket, rocker, pocket, bubble), and TCUU trials, in which a target, cohort, and two unrelated images were displayed (e.g., mouth, mouse, house, chain). This design resulted in 60 trials total (30 sets × 2 target types per set). A custom MATLAB script (https://osf.io/x3frv) generated a unique randomized list for each participant, pseudo-randomizing display positions so that each image type was approximately equally likely to appear in any quadrant across subjects.
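The randomization itself was implemented in MATLAB (script linked above). Purely for illustration, an R analogue of the position shuffling could look like the following; role labels and trial counts are placeholders.

# Rough analogue of the display-position randomization: each trial's four images
# are assigned to the four quadrants at random, so each role is approximately
# equally likely to appear in any quadrant across a list
set.seed(42)
quadrants <- c("top_left", "top_right", "bottom_left", "bottom_right")
roles     <- c("target", "cohort", "third_image", "fourth_image")

one_list <- function(n_trials = 60) {
  positions <- t(replicate(n_trials, sample(quadrants)))  # random quadrant per role, per trial
  colnames(positions) <- roles
  as.data.frame(positions)
}

lst <- one_list()
table(lst$target)  # check that targets are roughly balanced across quadrants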

All 120 images came from a commercial clipart database; they were selected by a small focus group of students and edited to have a cohesive style using a standard lab protocol (McMurray et al., 2010). All images were scaled to 300 × 300 pixels.

Auditory Stimuli

Auditory stimuli were recorded by a female monolingual speaker of English in a sound-attenuated room, sampled at 44.1 kHz. Auditory tokens were edited to reduce noise and remove clicks. They were then amplitude normalized to 70 dB SPL. All .wav files were converted to .mp3 for online data collection.

Webcams

To manipulate recording quality, two webcams will be used. In the high-quality condition, we will use a Logitech Brio webcam, which records in 4K resolution (up to 4096 × 2160 px) with a 90° field of view and samples at 60 Hz. This setup provides high-fidelity video with greater spatial and temporal precision. In the standard-quality condition, we will use a Logitech C270 HD webcam, which records in 720p resolution and samples at 30 Hz, producing video quality comparable to that of a typical built-in laptop webcam and therefore simulating lower-quality online recordings (see Jarvis et al., 2025).

Both webcams will be mounted in a fixed position above the monitor to maintain consistent framing across participants. Lighting will be standardized to ensure uniform image quality across all sessions.

Experimental Setup and Procedure

All tasks will be completed in a single session lasting approximately 30 minutes. The experiment will be programmed and administered in Gorilla (Anwyl-Irvine et al., 2019). Participants will be brought into a room in the Human Neuroscience Lab at Boston College and seated in front of a 23-inch Dell U2312HM monitor (1920 × 1080 px) approximately 65 cm from the screen. Auditory stimuli will be presented over Sony ZX110 headphones to ensure consistent audio presentation and minimize background noise. The experimental tasks will be fixed and presented in this order: informed consent, single-word VWP, and a demographic questionnaire. The entire experiment can be viewed on Gorilla.
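Given this fixed setup, stimulus sizes can be expressed in degrees of visual angle. The short computation below assumes a 16:9 panel roughly 50.9 cm wide (the nominal width of a 23-inch display); exact values depend on the physical screen.

# Visual angle of the 300 x 300 px images at the planned viewing distance
# (assumes a 23-inch 16:9 panel ~50.9 cm wide; adjust for the actual display)
screen_w_cm  <- 50.9
screen_w_px  <- 1920
view_dist_cm <- 65

px_to_deg <- function(px) {
  size_cm <- px * screen_w_cm / screen_w_px
  2 * atan(size_cm / (2 * view_dist_cm)) * 180 / pi
}

px_to_deg(300)   # each image subtends roughly 7 degrees of visual angle
1 / px_to_deg(1) # roughly 43 pixels per degree, so 1 degree of error spans ~43 px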

Before the main task, an instructional video will demonstrate the calibration procedure. Calibration will occur twice—once at the start and again after 30 trials—with up to three attempts allowed each time. In each calibration phase, participants will view nine calibration targets and five validation points, looking directly at each target as instructed. Participants will then complete four practice trials to familiarize themselves with the task. Each trial begins with a 500 ms central fixation cross, followed by a preview display of four images located in the screen’s corners. After 1500 ms, a start button appears at the center; participants click it to confirm fixation before hearing the spoken word. The images remain visible throughout the trial, and participants indicate their response by clicking the image corresponding to the spoken target. A response deadline of 5 seconds will be used. Eye movements will be recorded continuously during each trial. Following the main VWP task, participants will complete a brief demographic questionnaire, after which they will be thanked for their participation.

Data Preprocessing and Exclusions

We will follow the guidelines outlined in Geller et al. (2025) and exclude participants with overall task accuracy below 80%, those who report English as not their first language, and those with non-normal or uncorrected vision. At the trial level, only correct-response trials (accuracy = 1) will be retained. Reaction times (RTs) outside ±2.5 SD of the participant-level distribution (computed within condition) will be removed. To increase signal-to-noise, participants with fewer than 40 usable trials after these exclusions will also be removed.
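A sketch of these trial- and participant-level exclusions in R, with placeholder column names for the merged behavioral data, is given below.

# Sketch of the behavioral exclusions (column names are placeholders)
library(dplyr)

trials_clean <- trials |>
  filter(accuracy == 1) |>                           # correct responses only
  group_by(participant, condition) |>
  filter(abs(rt - mean(rt)) <= 2.5 * sd(rt)) |>      # RTs within +/- 2.5 SD per participant x condition
  ungroup() |>
  group_by(participant) |>
  filter(n() >= 40) |>                               # keep participants with >= 40 usable trials
  ungroup()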

For eye-tracking preprocessing we will use the {webgazeR} package in R (Geller et al., 2025), which contains helper functions to preprocess webcam eye-tracking data. All webcam eye-tracking files and behavioral data will be merged. Data quality will be screened via sampling-rate checks, with very low-frequency recordings (e.g., < 15 Hz) excluded at both the participant and trial level (Bramlett & Wiener, 2025; Vos et al., 2022). We will quantify out-of-bounds (OOB) samples—gaze points falling outside the normalized screen area (coordinates below 0 or above 1)—and remove participants and trials with excessive OOB data (> 30%). OOB samples will be discarded prior to analysis. In addition, Gorilla provides calibration/quality metrics (“convergence” and “confidence,” both 0–1); trials with convergence > 0.5 or confidence < 0.5 will be excluded.
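These screening steps could be implemented along the following lines; the thresholds come from the text above, whereas the column names (e.g., x_norm, sampling_rate, convergence, face_conf) are placeholders rather than the exact {webgazeR} or Gorilla output names.

# Sketch of the gaze-quality screening (placeholder column names)
library(dplyr)

samples_clean <- samples |>
  group_by(participant, trial) |>
  mutate(oob = x_norm < 0 | x_norm > 1 | y_norm < 0 | y_norm > 1,
         prop_oob = mean(oob)) |>                    # proportion of out-of-bounds samples per trial
  ungroup() |>
  filter(sampling_rate >= 15,                        # drop very low-frequency recordings
         prop_oob <= 0.30,                           # drop trials with excessive OOB data
         !oob,                                       # discard remaining OOB samples
         convergence <= 0.5,                         # keep trials where the gaze model converged well
         face_conf >= 0.5)                           # keep trials with a confidently detected face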

Areas of Interest (AOIs) will be defined in normalized coordinates as the four screen quadrants, and gaze samples will be assigned to AOIs. To create a uniform time base, data will be downsampled into 50-ms bins. Trial time will be aligned to the actual stimulus onset using the audio-onset metric provided by Gorilla; we will then subtract 100 ms to account for the silence prefixed to each audio recording.
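A sketch of the AOI assignment and binning is shown below; column names are placeholders, and the y-axis convention (whether 0 is the top or the bottom of the screen) should be checked against the platform’s output.

# Sketch of quadrant AOIs, time alignment, and 50-ms binning (placeholder names)
library(dplyr)

gaze_binned <- samples_clean |>
  mutate(aoi = case_when(                            # four quadrant AOIs (assuming y = 0 at the bottom)
           x_norm <  0.5 & y_norm >= 0.5 ~ "top_left",
           x_norm >= 0.5 & y_norm >= 0.5 ~ "top_right",
           x_norm <  0.5 & y_norm <  0.5 ~ "bottom_left",
           TRUE                          ~ "bottom_right"),
         time_aligned = time - audio_onset - 100,       # align to word onset (100 ms leading silence)
         time_bin     = floor(time_aligned / 50) * 50)  # 50-ms bins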

For the analysis, we will count looks to cohort (vs. unrelated) items. We will first count looks to cohorts and unrelated items by participant × trial × time bin, and then sum these counts across trials to yield a participant × time bin dataset for each webcam condition. These look counts will serve as inputs to the statistical models and summaries.
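The counts that feed the models could then be built as follows, assuming each gaze sample has already been labeled with the role of the fixated image (target, cohort, rhyme, or unrelated) on that trial.

# Sketch of the look counts used as model input (placeholder column names);
# yields fix_cohort and fix_unrelated per webcam x participant x time bin
library(dplyr)
library(tidyr)

looks <- gaze_binned |>
  filter(role %in% c("cohort", "unrelated")) |>
  count(webcam, participant, trial, time_bin, role) |>     # counts per participant x trial x bin
  group_by(webcam, participant, time_bin, role) |>
  summarise(n = sum(n), .groups = "drop") |>               # sum over trials
  pivot_wider(names_from = role, values_from = n,
              values_fill = 0, names_prefix = "fix_")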

The display contained two types of critical trials: TCRU (target–cohort–rhyme–unrelated) and TCUU (target–cohort–unrelated–unrelated). Because some items served as a rhyme competitor (R) in TCRU trials and as an unrelated item (U) in TCUU trials, we adopted a principled scheme to define the “unrelated” baseline in our cohort analyses. For the primary analysis of cohort competition, we used a single unrelated competitor as the baseline. Concretely, for each trial, we identified one critical unrelated item that was never used as a rhyme competitor (R) anywhere in the experiment. In TCRU trials, this was the standard unrelated item. In TCUU trials, when one of the two unrelated objects also served as a rhyme competitor in other trials, we excluded that item from the C–U contrast and treated only the remaining object as the unrelated baseline.

Analysis Plan

GAMMs

To analyze overall competition effects and onset latency, we will use generalized additive mixed models (GAMMs; Wood, 2017). GAMMs extend the generalized linear modeling framework by modeling effects that are expected to vary non-linearly over time—a common feature in the VWP (Brown-Schmidt et al., 2025; Ito & Knoeferle, 2022; Mitterer, 2025; Veríssimo & Lago, 2025). These models capture non-linear effects by fitting smoothing splines to the data using data-driven, machine-learning-based methods, with the amount of non-linearity captured by how “wiggly” the time course is (one can think of this as the number of bends in a curve). A benefit of this approach is that it reduces the risk of over-fitting and eliminates the need to use polynomial terms, as required in traditional growth curve models (Mirman, 2014). Importantly, GAMMs also allow researchers to account for autocorrelation in time-series data, which is especially critical in gaze analyses where successive samples are not independent. By modeling the autocorrelation structure, GAMMs provide more accurate estimates of temporal effects and prevent inflation of Type I error rates (Rij et al., 2019). In addition, fitting GAMMs allows us to estimate the onset of the competition effect in each condition (see Veríssimo & Lago, 2025).

Competition Effects (H1a and H1b)

Gaze samples will be analyzed with a binomial (logistic) GAMM using the bam() function from the {mgcv} package (Wood, 2017). For visualization, we will employ functions from the {tidygam} package (Coretta, 2024); the {onsets} package (Veríssimo & Lago, 2025) will be used to examine onset latencies, and the {itsadug} package (van Rij et al., 2022) for AR functions and for testing differences between smooths across time (get_difference()). The dependent variable will consist of gaze counts to cohorts compared to unrelated items, for each participant and in each 50-ms time bin. All analyses will be conducted on a window ranging from stimulus onset (100 ms) to 1200 ms.

Specifying the right model in the context of GAMMs is not straightforward. Results have been shown to vary depending on whether one uses an ordered or an unordered factor scheme with time (Oltrogge et al., 2025). Because of this, we plan to fit the model in multiple ways to test for robustness. In one model (Listing 2) we will fit a model that includes a parametric term for webcam type as an unordered factor (treatment-coded such that high-quality = 1 and standard-quality = 0). To examine whether webcam type moderates the cohort effect over time, we will include smooth terms for time-by-condition interactions with camera. To account for individual differences, we will include participant-specific random smooths for time, and participant-specific random smooths for time within each level of the camera factor.

This specification allows the model to capture three components:

  1. The overall (parametric) effect of webcam type on the proportion of looks to cohorts-over-unrelated items. Here, the intercept reflects the expected proportion of looks in the standard-quality condition, and the webcam coefficient reflects the difference in looks between high- and standard-quality webcams (H1a and H1b).
  2. Condition-specific, time-varying trajectories in cohort competition (via the smooth terms), which support inferences about the timing (onsets) and dynamics of cohort effects across conditions. With an unordered factor, the model estimates time smooths for each condition separately with the p-value telling us if each smooth differs from zero (H2a).
  3. Deviations from these group-level trajectories at the individual participant level (via random smooths).

In a second instantiation of the model (see Listing 3), we will include a parametric term for webcam type treated as an ordered factor. To assess whether webcam type moderates the cohort-over-unrelated effect over time, we will incorporate smooth terms for time as well as time-by-condition interactions, as required when fitting ordered factors.

This specification allows the model to capture the same effects as above, with one key difference: by including an ordered factor, we must include separate smooths for time (the reference) and for time × webcam type. This model estimates a baseline smooth for the standard-quality webcam and a difference smooth for the high-quality webcam. Because the webcam factor is ordered, the difference smooth directly represents how the trajectory for high-quality deviates from the standard-quality trajectory. This structure enables inferences about the timing and dynamics of cohort effects across webcam conditions. We will also examine how changing the reference level (factor ordering) affects inferences.

For both models, we use default arguments for the smooth functions. We use the default basis dimension, k = 10 (i.e., nine effective basis functions), to capture the “wiggliness” of the smooths. Smooths will be fit with thin plate regression splines. This default seems to work well in general (Oltrogge et al., 2025); however, deviations from this default will be reported and robustness will be tested.

Although “maximal” random-effects structures are often recommended in linear mixed-effects models (Barr et al., 2013), such specifications can be computationally prohibitive in GAMMs. The present specification follows the recommendations of Veríssimo & Lago (2025). Statistical significance will be assessed at α = .05. When fitting difference curves for ordered factors with simultaneous CIs, we will correct for multiple comparisons using a Bonferroni correction (Krause et al., 2024).

To account for autocorrelation in the residuals, we will first fit the model without an autoregressive term in order to estimate the autocorrelation parameter (ρ). We will then re-fit the model including a first-order autoregressive process (AR(1)) to properly model temporal dependencies. Although using larger time bins can reduce autocorrelation, it does not eliminate it entirely, so explicitly modeling residual autocorrelation ensures valid statistical inference.

Onset Effects (H2b)

Once the models are fit, we will extract predicted gaze curves for each condition using the {onsets} package (Veríssimo & Lago, 2025). The {onsets} procedure first simulates gaze curves across time from the fitted GAMMs (N = 10,000). For each simulated curve, the onset of the condition effect is identified by comparing the predicted log-odds at each timepoint to a predefined criterion. Within the package, onset is defined as the earliest time at which the predicted log-odds is significantly greater than the model-predicted log-odds at the first timepoint of the analysis window (here word onset). Repeating this procedure across 10,000 simulations yields a distribution of onset estimates, from which a 95% highest density interval (HDI) can be obtained. To derive between-condition comparisons, onset times from paired simulations are subtracted, producing a corresponding distribution with median onset difference and its associated 95% HDI (see Listing 4 for the analysis code).

Listing 1: R packages to load
# load packages
library(mgcv)     # fit GAMMs with bam()
library(tidygam)  # visualization of GAMMs
library(itsadug)  # start_event(), get_difference(), AR helpers
library(onsets)   # onset estimates and differences
Listing 2: GAMM model set–up unordered camera factor
# set contrasts (treatment coding for unordered and ordered factors)
options(contrasts = rep("contr.treatment", 2))

# code webcam condition as an unordered factor (standard-quality = reference level)
dat$camera <- as.factor(dat$camera)

# mark the start of each participant x trial time series for the AR(1) model
dat <- start_event(dat, column = "time", event = c("participant", "trial"))

# quick rho estimate (fit once without AR to get residual ACF at lag 1)
m0 <- 
bam(cbind(fix_cohort, fix_unrelated) ~ 1 + 
    camera + s(time, by = camera, k = 10) + 
    s(participant, by = camera, bs = "re") +
    s(time, participant, by = camera, bs = "re"), 
    family = binomial(), method = "fREML",
  discrete = TRUE, data = dat, na.action = na.omit)
    
rho <- acf(residuals(m0, type = "pearson"), plot = FALSE)$acf[2]

# final model with AR(1) to handle within-series autocorrelation
m1 <- 
bam(cbind(fix_cohort, fix_unrelated) ~ 1 + 
    camera + s(time, by = camera, k = 10) + 
    s(participant, by = camera, bs = "re") +
    s(time, participant, by = camera, bs = "re"), 
    family = binomial(), method = "fREML",
  discrete = TRUE, data = dat, na.action = na.omit,
  rho = rho, AR.start = dat$start.event)
Listing 3: GAMM model set–up ordered camera factor
# code webcam condition as an ordered factor (difference-smooth parameterization)
dat$camera_ord <- factor(dat$camera, ordered = TRUE)

# mark the start of each participant x trial time series for the AR(1) model
dat <- start_event(dat, column = "time", event = c("participant", "trial"))

# quick rho estimate (fit once without AR to get residual ACF at lag 1)
m0 <- 
bam(cbind(fix_cohort, fix_unrelated) ~ 1 + 
    camera_ord + s(time) + 
    s(time, by = camera_ord) + 
    s(participant, bs = "re") + 
    s(participant, by = camera_ord, bs = "re") + 
    s(time, participant, bs = "re") + 
    s(time, participant, by = camera_ord, bs = "re"), 
    family = binomial(), method = "fREML",
  discrete = TRUE, data = dat, na.action = na.omit)
    
rho <- acf(residuals(m0, type = "pearson"), plot = FALSE)$acf[2]

# final model with AR(1) to handle within-series autocorrelation
m1 <- 
bam(cbind(fix_cohort, fix_unrelated) ~ 1 + 
    camera_ord + s(time) + 
    s(time, by = camera_ord) + 
    s(participant, bs = "re") + 
    s(participant, by = camera_ord, bs = "re") + 
    s(time, participant, bs = "re") + 
    s(time, participant, by = camera_ord, bs = "re"), 
    family = binomial(), method = "fREML",
  discrete = TRUE, data = dat, na.action = na.omit, 
  rho = rho, AR.start = dat$start.event)
Listing 4: Getting onset differences using the {onsets} package for unordered model
# Obtain onsets in each condition (and their differences)
onsets_comp <- get_onsets(model = m1,          # fitted GAMM (unordered-factor model)
                          time_var = "time",   # name of time variable
                          by_var = "camera",   # name of condition/group variable
                          compare = TRUE,      # obtain differences between onsets
                          n_samples = 10000,   # large number of samples (less variable results)
                          seed = 1)            # random seed for reproducibility
Listing 5: Getting onset differences using the {onsets} package for ordered model
# Obtain onsets in each condition (and their differences)
onsets_comp <- get_onsets(model = m1,              # fitted GAMM (ordered-factor model)
                          time_var = "time",       # name of time variable
                          by_var = "camera_ord",   # name of condition/group variable
                          compare = TRUE,          # obtain differences between onsets
                          n_samples = 10000,       # large number of samples (less variable results)
                          seed = 1)                # random seed for reproducibility

Calibration (H3)

To examine whether webcam type affects calibration rejection, we will fit a logistic regression model using the glm() function (see Listing 6). Calibration outcome will be coded as a binary variable, where 0 indicates that a participant failed to calibrate at either of the calibration phases, and 1 indicates that the participant successfully passed both calibration phases. This model will allow us to estimate whether the webcam condition reliably predicts the probability of successful calibration.

Listing 6: Logistic model to examine the effect of webcams on calibration
# fit logistic regression: does webcam condition predict calibration success?
# (calib_dat is a placeholder for a data frame with one row per participant)
calib_fit <- glm(calibration ~ camera, family = binomial(link = "logit"),
                 data = calib_dat)

Experiment 2: Head stabilization (chin rest) vs. no head stabilization (no chin rest)

In Experiment 2, we will use the same standard-quality external webcam as in Experiment 1, but will manipulate head stability by comparing a chin-rest condition to a no–chin-rest condition. Hessels et al. (2014) offered evidence that head movement can impact the accuracy and reliability of eye-tracking estimates, showing that even moderate deviations in head position can produce systematic calibration drift, increased data loss, and slower recovery when gaze is reacquired. Importantly, their study demonstrates that these effects are not limited to extreme movements and can arise during typical participant behavior when head position is not constrained. Some online platforms, like Labvanced (Finger et al., 2017), attempt to mitigate head motion by warning participants when they move outside a predefined region; however, it remains unclear how such motion control interacts with WebGazer.js estimates of event detection and onset latency. In laboratory-based eye-tracking, chin rests are routinely used to stabilize the head in a fixed position so that only the eyes move during the experiment. By introducing a chin-rest manipulation, we can directly assess the extent to which head stability contributes to competition effects, the timing of event detection, and participant attrition in webcam-based eye-tracking. We therefore test the following hypotheses regarding competition, onset, and attrition.

Hypotheses

We hypothesize several effects related to competition, onset, and attrition. These are generally the same as Experiment 1.

Competition Effects

(H1a) Participants will show an overall greater proportion of looks to cohort competitors than to unrelated items. (H1b) Head stabilization (chin rest vs. no chin rest) will influence the overall proportion of looks, with the chin-rest condition detecting a greater proportion of looks.

Timing/Onset Effects

(H2a) Each condition will show a change in the proportion of looks across time. More specifically, we hypothesize that the change in proportions will be non-linear across time and that the two conditions will differ across time. (H2b) For the no-chin-rest condition compared to the chin-rest condition, onsets will be detected later due to increased noise.

Attrition

(H3) Attrition rates will be lower in the chin-rest condition than in the no-chin-rest condition.

Sampling Goal, Materials, Procedure, and Analysis Plan

The sampling goal, materials, procedure, and analysis plan are the same as Experiment 1. The main difference is whether participants use a chin rest or not.

Declarations

Funding

This study was funded by the main author.

Conflicts of Interest

The authors declare no conflicts of interest.

Ethics Approval

This study was approved by the relevant ethics committee.

Availability of Data and Materials

All data and materials will be stored on OSF, GitHub, and archived on Zenodo.

Code Availability

All code will be made available on OSF, GitHub, and Zenodo.

Authors’ Contributions

  • JG - Writing—original draft
  • JV - Writing—review & editing
  • JD - Writing—review & editing

References

Anwyl-Irvine, A. L., Massonnié, J., Flitton, A., Kirkham, N., & Evershed, J. K. (2019). Gorilla in our midst: An online behavioral experiment builder. Behavior Research Methods, 52(1), 388–407. https://doi.org/10.3758/s13428-019-01237-x
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278. https://doi.org/10.1016/j.jml.2012.11.001
Bogdan, P. C., Dolcos, S., Buetti, S., Lleras, A., & Dolcos, F. (2024). Investigating the suitability of online eye tracking for psychological research: Evidence from comparisons with in-person data using emotionattention interaction tasks. Behavior Research Methods, 56(3), 2213–2226. https://doi.org/10.3758/s13428-023-02143-z
Bramlett, A. A., & Wiener, S. (2024). The art of wrangling. Linguistic Approaches to Bilingualism. https://doi.org/10.1075/lab.23071.bra
Bramlett, A. A., & Wiener, S. (2025). The art of wrangling: Working with web-based visual world paradigm eye-tracking data in language research. Linguistic Approaches to Bilingualism, 15(4), 538–570. https://doi.org/10.1075/lab.23071.bra
Brown-Schmidt, S., Cho, S.-J., Fenn, K. M., & Trude, A. M. (2025). Modeling spatio-temporal patterns in intensive binary time series eye-tracking data using Generalized Additive Mixed Models. Brain Research, 1854, 149511. https://doi.org/10.1016/j.brainres.2025.149511
Carter, B. T., & Luke, S. G. (2020). Best practices in eye tracking research. International Journal of Psychophysiology, 155, 49–62. https://doi.org/10.1016/j.ijpsycho.2020.05.010
Cheng, Y., Wang, H., Bao, Y., & Lu, F. (2024). Appearance-based Gaze Estimation With Deep Learning: A Review and Benchmark (arXiv:2104.12668). https://doi.org/10.48550/arXiv.2104.12668
Chen-Sankey, J., Elhabashy, M., Gratale, S., Geller, J., Mercincavage, M., Strasser, A. A., Delnevo, C. D., Jeong, M., & Wackowski, O. A. (2023). Examining Visual Attention to Tobacco Marketing Materials Among Young Adult Smokers: Protocol for a Remote Webcam-Based Eye-Tracking Experiment. JMIR Research Protocols, 12, e43512. https://doi.org/10.2196/43512
Colby, S. E., & McMurray, B. (2023). Efficiency of spoken word recognition slows across the adult lifespan. Cognition, 240, 105588. https://doi.org/10.1016/j.cognition.2023.105588
Cooper, R. M. (1974). The control of eye fixation by the meaning of spoken language: A new methodology for the real-time investigation of speech perception, memory, and language processing. Cognitive Psychology, 6(1), 84–107. https://doi.org/10.1016/0010-0285(74)90005-X
Coretta, S. (2024). Tidygam: Tidy prediction and plotting of generalised additive models. https://github.com/stefanocoretta/tidygam
Degen, J., Kursat, L., & Leigh, D. D. (2021). Seeing is believing: Testing an explicit linking assumption for visual world eye-tracking in psycholinguistics. Proceedings of the Annual Meeting of the Cognitive Science Society, 43.
Dolstra, E., & contributors, T. N. (2023). Nix (Version 2.15.3) [Computer software]. https://nixos.org/
Finger, H., Goeke, C., Diekamp, D., Standvoß, K., & König, P. (2017). LabVanced: A unified JavaScript framework for online studies. International Conference on Computational Social Science (IC2S2).
Geller, J., Prystauka, Y., Colby, S. E., & Drouin, J. R. (2025). Language without borders: A step-by-step guide to analyzing webcam eye-tracking data for L2 research. Research Methods in Applied Linguistics, 4(3), 100226. https://doi.org/10.1016/j.rmal.2025.100226
Hessels, R. S., Cornelissen, T. H. W., Kemner, C., & Hooge, I. T. C. (2014). Qualitative tests of remote eyetracker recovery and performance during head rotation. Behavior Research Methods, 47(3), 848–859. https://doi.org/10.3758/s13428-014-0507-6
Huettig, F., Rommers, J., & Meyer, A. S. (2011). Using the visual world paradigm to study language processing: a review and critical evaluation. Acta Psychologica, 137(2), 151–171. https://doi.org/10.1016/j.actpsy.2010.11.003
Ito, A., & Knoeferle, P. (2022). Analysing data from the psycholinguistic visual-world paradigm: Comparison of different analysis methods. Behavior Research Methods, 55(7), 3461–3493. https://doi.org/10.3758/s13428-022-01969-3
James, A. N., Ryskin, R., Hartshorne, J. K., Backs, H., Bala, N., Barcenas-Meade, L., Bhattarai, S., Charles, T., Copoulos, G., Coss, C., Eisert, A., Furuhashi, E., Ginell, K., Guttman-McCabe, A., Harrison, E. (Chaz)., Hoban, L., Hwang, W. A., Iannetta, C., Koenig, K. M., … de Leeuw, J. R. (2025). What Paradigms Can Webcam Eye-Tracking Be Used For? Attempted Replications of Five Cognitive Science Experiments. Collabra: Psychology, 11(1), 140755. https://doi.org/10.1525/collabra.140755
Jarvis, M., Vasarhelyi, A., Anderson, J., Mulley, C., Lipp, O. V., & Ney, L. J. (2025). js-mEye: An extension and plugin for the measurement of pupil size in the online platform jsPsych. Behavior Research Methods, 58(1). https://doi.org/10.3758/s13428-025-02901-1
Kaduk, T., Goeke, C., Finger, H., & König, P. (2023). Webcam eye tracking close to laboratory standards: Comparing a new webcam-based system and the EyeLink 1000. Behavior Research Methods, 56(5), 5002–5022. https://doi.org/10.3758/s13428-023-02237-8
Kandel, M., & Snedeker, J. (2024). Assessing two methods of webcam-based eye-tracking for child language research. Journal of Child Language, 52(3), 675–708. https://doi.org/10.1017/s0305000924000175
Krause, J., Rij, J. van, & Borst, J. P. (2024). Word Type and Frequency Effects on Lexical Decisions Are Process-dependent and Start Early. Journal of Cognitive Neuroscience, 36(10), 2227–2250. https://doi.org/10.1162/jocn_a_02214
Leeuw, J. R. de. (2014). jsPsych: A JavaScript library for creating behavioral experiments in a Web browser. Behavior Research Methods, 47(1), 1–12. https://doi.org/10.3758/s13428-014-0458-y
McMurray, B., Samelson, V. M., Lee, S. H., & Tomblin, J. B. (2010). Individual differences in online spoken word recognition: Implications for SLI. Cognitive Psychology, 60(1), 1–39. https://doi.org/10.1016/j.cogpsych.2009.06.003
Mirman, D. (2014). Growth curve analysis and visualization using r. Chapman; Hall/CRC.
Mitterer, H. (2025). A web-based mouse-tracking task for early perceptual language processing. Behavior Research Methods, 57(11). https://doi.org/10.3758/s13428-025-02827-8
Oltrogge, E., Veríssimo, J., Patil, U., & Lago, S. (2025). Memory retrieval and prediction interact in sentence comprehension: An experimental evaluation of a cue-based retrieval model. Journal of Memory and Language, 144, 104651. https://doi.org/10.1016/j.jml.2025.104651
Özsoy, O., Çiçek, B., Özal, Z., Gagarina, N., & Sekerina, I. A. (2023). Turkish-german heritage speakers’ predictive use of case: Webcam-based vs. In-lab eye-tracking. Frontiers in Psychology, 14, 1155585. https://doi.org/10.3389/fpsyg.2023.1155585
Papoutsaki, A., Sangkloy, P., Laskey, J., Daskalova, N., Huang, J., & Hays, J. (2016, July). WebGazer: Scalable Webcam Eye Tracking Using User Interactions. International Joint Conference on Artificial Intelligence.
Patterson, A. S., Nicklin, C., & Vitta, J. P. (2025). Methodological recommendations for webcam-based eye tracking: A scoping review. Research Methods in Applied Linguistics, 4(3), 100244. https://doi.org/10.1016/j.rmal.2025.100244
Peirce, J., Gray, J. R., Simpson, S., MacAskill, M., Höchenberger, R., Sogo, H., Kastman, E., & Lindeløv, J. K. (2019). PsychoPy2: Experiments in behavior made easy. Behavior Research Methods, 51(1), 195–203. https://doi.org/10.3758/s13428-018-01193-y
Prystauka, Y., Altmann, G. T. M., & Rothman, J. (2024). Online eye tracking and real-time sentence processing: On opportunities and efficacy for capturing psycholinguistic effects of different magnitudes and diversity. Behavior Research Methods, 56(4), 3504–3522. https://doi.org/10.3758/s13428-023-02176-4
Reips, U.-D. (2021). Web-based research in psychology: A review. Zeitschrift Für Psychologie, 229(4), 198–213. https://doi.org/10.1027/2151-2604/a000475
Rij, J. van, Hendriks, P., Rijn, H. van, Baayen, R. H., & Wood, S. N. (2019). Analyzing the Time Course of Pupillometric Data. Trends in Hearing, 23. https://doi.org/10.1177/2331216519832483
Rodrigues, B., & Baumann, P. (2025). Rix: Reproducible data science environments with ’nix’. https://docs.ropensci.org/rix/
Saxena, S., Fink, L. K., & Lange, E. B. (2024). Deep learning models for webcam eye tracking in online experiments. Behavior Research Methods, 56(4), 3487–3503. https://doi.org/10.3758/s13428-023-02190-6
Semmelmann, K., & Weigelt, S. (2018). Online webcam-based eye tracking in cognitive science: A first look. Behavior Research Methods, 50(2), 451–465. https://doi.org/10.3758/s13428-017-0913-7
Slim, M. S., & Hartsuiker, R. J. (2023). Moving visual world experiments online? A web-based replication of Dijkgraaf, Hartsuiker, and Duyck (2017) using PCIbex and WebGazer.js. Behavior Research Methods, 55(7), 3786–3804. https://doi.org/10.3758/s13428-022-01989-z
Slim, M. S., Kandel, M., Yacovone, A., & Snedeker, J. (2024). Webcams as windows to the mind? A direct comparison between in-lab and web-based eye-tracking methods. Open Mind, 8, 1369–1424. https://doi.org/10.1162/opmi_a_00171
Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K. M., & Sedivy, J. C. (1995). Integration of visual and linguistic information in spoken language comprehension. Science (New York, N.Y.), 268(5217), 1632–1634. http://www.ncbi.nlm.nih.gov/pubmed/7777863
Van der Cruyssen, I., Ben-Shakhar, G., Pertzov, Y., Guy, N., Cabooter, Q., Gunschera, L. J., & Verschuere, B. (2023). The validation of online webcam-based eye-tracking: The replication of the cascade effect, the novelty preference, and the visual world paradigm. Behavior Research Methods. https://doi.org/10.3758/s13428-023-02221-2
van Rij, J., Wieling, M., Baayen, R. H., & van Rijn, H. (2022). itsadug: Interpreting time series and autocorrelated data using GAMMs.
Veríssimo, J., & Lago, S. (2025). A novel method for detecting the onset of experimental effects in visual world eye-tracking. PsyArXiv. https://doi.org/10.31234/osf.io/yk4xb_v3
Vos, M., Minor, S., & Ramchand, G. C. (2022). Comparing infrared and webcam eye tracking in the Visual World Paradigm. Glossa Psycholinguistics, 1(1). https://doi.org/10.5070/G6011131
Wood, S. N. (2017). Generalized Additive Models: An introduction with R (2nd ed.). Chapman; Hall/CRC.
Yang, X., & Krajbich, I. (2021). Webcam-based online eye-tracking for behavioral research. Judgment and Decision Making, 16(6), 1485–1505. https://doi.org/10.1017/S1930297500008512
Zehr, J., & Schwarz, F. (2022). PennController for internet based experiments (IBEX). Open Science Framework. https://doi.org/10.17605/OSF.IO/MD832