Procedure

Before you start the experiment

Although lexical decision tasks are typically low-risk and low-cost experiments, you need to check that the necessary resources are available. Several practical aspects should be considered before starting data collection; they are mostly the same as for other low-risk psychological experiments:

Lab-based study: quiet lab room with a computer.
Online study: web server or hosting service funding.
Research software may require funding. However, multiple open-source options are available at no cost.
Participant reimbursement: Specify the exact amount or range you plan or are expected to pay participants. Consider local guidelines, customary rates, and any specific requirements from your institution or funding source.
Ethics permission: Ethics approval is not legally required in Germany (this might be different elsewhere), but it may be required by your university, funder, or journal. We recommend seeking ethical clearance. If your university does not have an ethics committee, you can consult the ethics committee of the German Psychological Society (DGPs).
Data safety and storage: In the European Union, data storage needs to comply with the GDPR (as of 2025). It is the researcher's responsibility to ensure that their experiment complies with current local standards.

Programming the experiment

The first step is to program the experiment: We want the experimental software to show each trial and to record the participant’s response to the trial and the reaction time. A lexical decision task can be programmed with medium-low programming skills. Templates are available for beginners on many of the existing experimental platforms (see examples below).

Possible platforms to run the experiment

Here is a nonexhaustive list of programs used by the TRUST group in recent years. Typically, all solutions have a forum for customer support and examples available. Find more information at the webpages linked below.

Program	Cost	Programming skill required	Online hosting
PsyToolkit	Free	Intermediate	Platform
PsychoPy	Free	No	Self
Pavlovia	Yes	No	Platform
jsPsych	Free	Yes	Self
OpenSesame	Free	No	Self
OpenSesame Online (OSWeb)	Free	No	Platform
EyeLink Experiment Builder	Yes	No	No
Gorilla	Yes	No	Yes
E-Prime	Yes	No	No
Presentation	Yes	Yes	No

Check the timing performance of the chosen system or platform, in particular the accuracy of stimulus presentation timing and the latency of key-press registration (see Bridges et al., 2020; Anwyl-Irvine et al., 2021).

Programming considerations

Several decisions have to be made when programming the experiment. These relate to the task structure, the response device, and data quality assurance. Typically, an experimental session consists of blocks, and each block consists of trials. A trial is the sequence of events that the participant goes through for a single item. In the classical lexical decision task, a single word or non-word is presented in each trial.

Text encoding

In principle, the text presentation is simple, but there can be nerve-wracking pitfalls (for example, when switching from a PC to a Mac computer). A special case in German is how text is encoded, meaning how letters and characters are stored in the computer's memory using specific digital codes. Some encoding systems do not allow the correct display of German-specific letters, such as the umlauts ä, ö, ü, Ä, Ö, Ü, or the sharp s ß. We highly recommend checking whether the presentation setup displays words with these letters correctly.

We recommend using UTF-8, a universal encoding for representing text digitally. UTF-8 can be set at the file level or within the presentation software. This encoding standard usually ensures that all characters, including German-specific letters, are shown correctly.
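The check described above can be sketched in a few lines of Python. This is a minimal illustration, not a feature of any particular experiment platform: the file name `stimuli.txt`, the example items, and the helper names `load_stimuli` and `contains_special` are assumptions made for this example.

```python
import tempfile
from pathlib import Path

GERMAN_SPECIAL = set("äöüÄÖÜß")

def load_stimuli(path):
    """Read one letter string per line, decoding the file explicitly as UTF-8."""
    # An explicit encoding avoids relying on the platform default,
    # which can differ between operating systems.
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]

def contains_special(items):
    """Return the items that contain umlauts or ß, for a visual check on screen."""
    return [item for item in items if GERMAN_SPECIAL & set(item)]

# Demonstration with a temporary file (hypothetical example items).
demo_path = Path(tempfile.mkdtemp()) / "stimuli.txt"
demo_path.write_text("Straße\nKäse\nHaus\nÜbung\n", encoding="utf-8")

stimuli = load_stimuli(demo_path)
to_check = contains_special(stimuli)
print(to_check)  # ['Straße', 'Käse', 'Übung']

# What a typical encoding mismatch looks like: UTF-8 bytes read as Latin-1.
garbled = "Käse".encode("utf-8").decode("latin-1")
print(garbled)  # KÃ¤se
```

Presenting the items returned by `contains_special` once on the actual lab or online setup is usually enough to spot display problems before data collection starts.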

Trial structure

A trial generally has the following structure:

Fixation cross
- The fixation cross prepares the participant for the upcoming trial. It is intended to reduce eye movements or attentional blinks that could increase response times.
- Typically used signs are “+”, “*” or “x”. These fixation crosses serve as a forward mask to the reading material, potentially influencing behavior.
- Our suggested alternative is to use vertical and horizontal lines indicating the stimulus's location, without overlap between the bars and the word. Optionally, when including multiple word lengths, include bars at the left and right to indicate word length.
- Suggested duration of fixation cross presentation: 500 ms
Letter string presentation
- Reaction time, a main measure of the task, is measured from when the item appears on the screen until the response occurs.
- We recommend that letter strings stay on screen until a response is given. A shorter presentation time can be reasonable if a specific hypothesis calls for it. Alternatively, one could show the stimulus for a fixed maximum duration to encourage participants to focus on the task, respond quickly, and avoid taking long breaks in the middle of a block. Here, the presentation duration could range from 200 ms to 1000 ms. Adequate presentation times also depend on the target participant group: children, language learners, or individuals with reading problems of any kind often need longer presentation times than the average university student. When the participant responds before the maximum presentation time is over, the stimulus disappears, the trial ends, and the next trial is initiated.
- The letter strings should be presented in a different random order for each participant.
Participant response
- Typically, participants are instructed to respond by pressing one of two keys on a keyboard, e.g., “d” as the left key and “k” as the right key.
- Alternatively, one can implement a go/no-go task in which participants press a button only for words or only for non-words.
- If the aim is to directly compare words and non-words (i.e., the so-called lexicality effect), we recommend counterbalancing the response keys between participants, such that half of them press the left key for words and the right key for non-words, and the other half the other way around.
- If the number of words and non-words is balanced, it is typical to instruct participants to press the right key for words and the left key for non-words.
- Alternatives to keyboard responses exist, such as keypads, touchscreens, or mouse clicks. For more information, see Pronk et al., 2020; Bridges et al., 2020; or Rodd, 2024.
- Inter-trial interval: A pause before the next trial reduces interference between trials. We recommend a blank screen of 500 ms between the response and the next fixation cross.

After 100 trials, we recommend a short, self-paced break (“Press the space bar when you’re ready to continue”).
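The trial structure above can be summarised as a per-participant schedule. The following Python sketch is platform-independent and only prepares the order of events; it does not present stimuli or record responses. The function names, the placeholder items, the use of the participant number for counterbalancing, and the reading of the break recommendation as "after every 100 trials" are assumptions made for illustration.

```python
import random

FIXATION_MS = 500   # fixation duration suggested in the text
ITI_MS = 500        # blank inter-trial interval suggested in the text
BREAK_EVERY = 100   # assumption: a self-paced break after every 100 trials

def key_mapping(participant_id):
    """Counterbalance keys: even ids press 'k' for words, odd ids press 'd'."""
    if participant_id % 2 == 0:
        return {"word": "k", "nonword": "d"}
    return {"word": "d", "nonword": "k"}

def build_schedule(items, participant_id, seed=None):
    """Return the ordered list of events for one participant.

    items: list of (letter_string, category), category is 'word' or 'nonword'.
    """
    rng = random.Random(seed)
    order = list(items)
    rng.shuffle(order)  # a different random order for each participant
    keys = key_mapping(participant_id)
    schedule = []
    for index, (string, category) in enumerate(order, start=1):
        schedule.append({
            "trial": index,
            "fixation_ms": FIXATION_MS,
            "stimulus": string,
            "correct_key": keys[category],
            "iti_ms": ITI_MS,
        })
        # Add a break marker after each full block, but not after the last trial.
        if index % BREAK_EVERY == 0 and index < len(order):
            schedule.append({"break": "Press the space bar when you're ready to continue"})
    return schedule

# Placeholder items, for illustration only.
items = ([(f"word{i}", "word") for i in range(125)]
         + [(f"nonword{i}", "nonword") for i in range(125)])
schedule = build_schedule(items, participant_id=7, seed=7)
```

Seeding the random generator with a value that is stored alongside the data makes each participant's order reproducible later, which can help when checking for order effects.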

Participants

The number of participants should be determined by a power calculation based on the effect size of interest (e.g., see Brysbaert & Stevens, 2018). Note that power considerations differ when investigating individual differences (e.g., see Hedge et al., 2018). The reliability of the measure of interest can also be relevant when deciding on the number of participants.
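To illustrate the logic of a simulation-based power calculation, here is a deliberately simplified Python sketch. It assumes a within-participant design in which each participant contributes a single reaction time difference score, and it uses the normal critical value as an approximation to the t distribution. The function name, the effect of 20 ms, and the standard deviation of 50 ms are hypothetical. A full analysis would typically simulate trial-level data and fit the planned statistical model.

```python
import math
import random
import statistics

def simulated_power(n_participants, effect_ms, sd_ms, n_sims=1000, seed=1):
    """Estimate power for a within-participant RT difference by simulation.

    Each simulated participant contributes one difference score (condition A
    minus condition B) drawn from a normal distribution. The test statistic is
    mean / standard error, compared against the normal critical value
    (two-sided, alpha = .05) as an approximation to the t distribution.
    """
    rng = random.Random(seed)
    critical = statistics.NormalDist().inv_cdf(0.975)
    significant = 0
    for _ in range(n_sims):
        diffs = [rng.gauss(effect_ms, sd_ms) for _ in range(n_participants)]
        standard_error = statistics.stdev(diffs) / math.sqrt(n_participants)
        if abs(statistics.mean(diffs) / standard_error) > critical:
            significant += 1
    return significant / n_sims

# Hypothetical values: a 20 ms effect with a 50 ms SD of difference scores.
for n in (20, 50, 100):
    print(n, simulated_power(n, effect_ms=20, sd_ms=50))
```

The smallest sample size at which the estimated power exceeds the desired level (often .80 or .90) gives a first orientation for planning.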

There are no fixed rules about inclusion criteria. If the aim is to minimise inter-individual differences and examine differences on the word (or non-word) level, criteria can maximise similarity between participants (e.g., age, level of education). Possible recruitment strategies could be a convenience sample (e.g., undergraduate students), online platforms, or snowball sampling.

Unless it is relevant to the research question, we strongly recommend minimizing the number of additional questions. Every additional piece of information reduces anonymity, and by law (i.e., GDPR rules) it should not be possible to identify participants from the data. For example, if location, gender, and highest education are collected, identification could already be possible in a rural region. In addition, fewer questions reduce the testing time. If a questionnaire is included, it should be embedded at the beginning or the end of the experiment.

We recommend asking for:

Language history (e.g., see Li et al., 2006)
Age
Highest education

Although gender is often reported and even requested by reviewers, we consider it less relevant, as there is no evidence of gender differences in the cognitive processes underlying reading in German adult readers. This is, however, a research gap in need of evidence.

Design options

Controlled experiment

With a small number of available participants and a strong theoretical motivation, a controlled experiment is feasible. Here, participants respond to items selected to vary on a specific characteristic while not co-varying on other characteristics that may also affect reading processes (see the relevant section, “How to determine the effect of a specific variable based on lexical decision data”, in “Materials”). The design of a lexical decision task involves a trade-off between available time and data quality. On the one hand, the likelihood of participant drop-out increases with the length of the experiment. On the other hand, a short study with only a few items is less likely to provide high data quality. Regarding time, it is reasonable to assume that a typical adult reader can complete a 30-60 minute task. Such a task length may be suitable for 500-1,000 decisions (e.g., 250 words and 250 non-words), including pre-stimulus and post-response delays and pauses between blocks.
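A rough estimate of the pure task time can be derived from the trial timings. The sketch below is only an orientation: the mean reaction time of 700 ms and the break length of 30 s are assumptions, and time for instructions, practice trials, and questionnaires is not included. Slower participant groups will need considerably longer.

```python
def session_minutes(n_trials, mean_rt_ms=700, fixation_ms=500, iti_ms=500,
                    break_every=100, break_s=30):
    """Estimate the task duration in minutes for a given number of trials.

    mean_rt_ms and break_s are assumed values; fixation_ms and iti_ms follow
    the 500 ms suggestions for the fixation cross and inter-trial interval.
    """
    trial_s = (fixation_ms + mean_rt_ms + iti_ms) / 1000
    # One break after each full block, but none after the final trial.
    n_breaks = (n_trials - 1) // break_every
    return (n_trials * trial_s + n_breaks * break_s) / 60

for n_trials in (500, 1000):
    print(n_trials, round(session_minutes(n_trials), 1))
```

Running the estimate with a mean reaction time taken from pilot data gives a more realistic figure than the assumed default.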

Lexicon Projects - Semi-controlled regression designs

Lexicon projects are, in principle, infrastructure projects that allow the exploration of new phenomena in extensive datasets with a broad stimulus and participant base. This format is particularly suitable when one wants to provide resources for investigating reading and psycholinguistic processing, for example, within one language or across many languages. Existing lexicon projects include:

Language	Reference
Mandarin	Sze et al., 2014
Mandarin	Tsang et al., 2018
Mandarin	Wang et al., 2025
Cantonese	Tse et al., 2017
Cantonese	Tse et al., 2022
Dutch	Keuleers et al., 2010
British English	Keuleers et al., 2012
American English	Balota et al., 2007
French	Ferrand et al., 2010
German	Schröter & Schroeder, 2017*
Hebrew	Stein et al., 2024
Malay	Yap et al., 2010
Persian	Nemati et al., 2022
Portuguese	Soares et al., 2019
Spanish	Aguasvivas et al., 2018
Italian	Amenta et al., 2024

* Note that this lexicon project is relatively small and focuses on children. Find updates on a larger German lexicon project here.

The aim is to provide reaction time and accuracy estimates for single words. For such studies, we recommend maximising the number of experimental items, with several tens of thousands of words and at least 30 datapoints per item. Due to time constraints, each participant generally responds to only a subset of items. Note that one can use the datasets from lexicon projects to create virtual controlled experiments (see Kuperman, 2015).
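Because each participant sees only a subset of items, the required number of participants follows from simple arithmetic. The Python sketch below assumes that every participant completes their full subset and that no data are excluded; the function name and the example values (30,000 words, 500 words per participant) are hypothetical.

```python
import math

def participants_needed(n_items, obs_per_item, items_per_participant):
    """Minimum number of participants so that every item reaches obs_per_item.

    Assumes items are distributed evenly and no responses are excluded.
    """
    return math.ceil(n_items * obs_per_item / items_per_participant)

# Hypothetical example: 30,000 words, 30 datapoints each, 500 words per person.
n_needed = participants_needed(30_000, 30, 500)
print(n_needed)  # 1800
```

In practice, the target should be set higher to allow for drop-out and for trials removed during data cleaning.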

Crowd-sourcing lexical decision projects

If the researcher aims to collect large amounts of data, they may consider crowd-sourcing a project. Here, one can use a gamified lexical decision task, in which each participant provides limited data over a very short experiment duration (e.g., 3 minutes). As this approach is more prone to noise, a much larger amount of data needs to be collected, with the recommendation to cover at least 100,000 words and 40 observations per item (Amenta et al., 2025). The study can estimate participants’ vocabulary knowledge at the end to provide an incentive for participation. For the items, one may choose a mix of higher-frequency words known to most participants and words not generally known to all participants. This will provide more informative vocabulary scores for participants. In this case, the recommendation is to minimise the number of non-words, so that the ratio is 3 non-words to 7 words.
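The 3:7 ratio and the data targets translate into a session composition and a required number of sessions. The following Python sketch assumes, purely for illustration, that a short session contains 100 trials; the function names and this trial count are not taken from the cited work.

```python
import math

def session_composition(n_trials):
    """Split a session into words and non-words at a ratio of 7 words to 3 non-words."""
    n_words = (n_trials * 7 + 5) // 10  # integer rounding to the nearest trial
    return n_words, n_trials - n_words

def sessions_needed(n_words_total, obs_per_word, words_per_session):
    """Minimum number of completed sessions to reach the target per word."""
    return math.ceil(n_words_total * obs_per_word / words_per_session)

# Assumed session length of 100 trials (hypothetical).
n_words, n_nonwords = session_composition(100)
print(n_words, n_nonwords)  # 70 30

# Targets from the text: 100,000 words with 40 observations each.
n_sessions = sessions_needed(100_000, 40, n_words)
print(n_sessions)  # 57143
```

The resulting number is a lower bound, since noisy or incomplete sessions will have to be discarded.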