Procedure
Before you start the experiment 
Although lexical decision tasks are typically low-risk and low-cost experiments, you should check that the necessary resources are available. Consider the following practical aspects before starting data collection; they are mostly the same as for other low-risk psychological experiments:
- Lab-based study: quiet lab room with a computer.
- Online study: funding for a web server or hosting service.
- Research software may require funding. However, multiple open-source options are available at no cost.
- Participant reimbursement: Specify the exact amount or range you plan or are expected to pay participants. Consider local guidelines, customary rates, and any specific requirements from your institution or funding source.
- Ethics permission: Ethics approval is not legally required in Germany (this might be different elsewhere), but may be required by your university, funder, or journal. We recommend seeking ethical clearance. If your university does not have an ethics committee, you can consult the ethics committee of the German Society of Psychology.
- Data safety and storage: In the European Union, data storage needs to comply with the GDPR (as of 2025). It is the researcher's responsibility to ensure that their experiment complies with current local standards.
Programming the experiment 
The first step is to program the experiment: We want the experimental software to show each trial and to record the participant’s response to the trial and the reaction time. A lexical decision task can be programmed with medium-low programming skills. Templates are available for beginners on many of the existing experimental platforms (see examples below).
Possible platforms to run the experiment 
Here is a nonexhaustive list of programs used by the TRUST group in recent years. Typically, all solutions have a forum for customer support and examples available. Find more information at the webpages linked below.
| Program | Cost | Programming skill required | Online hosting |
|---|---|---|---|
| PsyToolkit | Free | Intermediate | Platform |
| PsychoPy | Free | No | Self |
| Pavlovia | Paid | No | Platform |
| jsPsych | Free | Yes | Self |
| OpenSesame | Free | No | Self |
| OpenSesame Online (OSWeb) | Free | No | Platform |
| EyeLink Experiment Builder | Paid | No | No |
| Gorilla | Paid | No | Yes |
| E-Prime | Paid | No | No |
| Presentation | Paid | Yes | No |
The timing performance of each system or platform should be checked, both for stimulus presentation and for key-press latency. For more information, see Bridges et al., 2020 and Anwyl-Irvine et al., 2021.
Programming considerations 
Several decisions have to be made when programming experiments. These relate to the task structure, the use of the response device, and data quality assurance. Typically, an experimental session consists of blocks of items, and each block consists of trials. A trial is the sequence of events that the participant engages with for a single item. In the classical lexical decision task, a single word or non-word is presented in each trial.
Text encoding 
In principle, the text presentation is simple, but there can be nerve-wracking pitfalls (for example, when switching from a PC to a Mac computer). A special case in German is how text is encoded, meaning how letters and characters are stored in the computer's memory using specific digital codes. Some encoding systems do not allow the correct display of German-specific letters, such as the umlauts ä, ö, ü, Ä, Ö, Ü, or the sharp s ß. We highly recommend checking whether the presentation setup displays words with these letters correctly.
We recommend using UTF-8, a universal encoding for representing text digitally. UTF-8 can be set at the file level or within the presentation software. This encoding standard usually ensures that all characters, including German-specific letters, are shown correctly.
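The check recommended above can be sketched in a few lines of Python. This is a minimal sketch rather than a feature of any particular presentation software: the file name `stimuli.txt`, the example items, and the helper names `load_stimuli` and `with_special_letters` are assumptions made for illustration. The last lines show what typically goes wrong when UTF-8 text is read with a different encoding.

```python
import tempfile
from pathlib import Path

# German-specific letters that are most likely to be displayed incorrectly.
GERMAN_SPECIAL = set("äöüÄÖÜß")

def load_stimuli(path):
    """Read one letter string per line, decoding the file explicitly as UTF-8."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]

def with_special_letters(items):
    """Return the items containing umlauts or ß, for a visual check on screen."""
    return [item for item in items if GERMAN_SPECIAL & set(item)]

# Demonstration with a temporary stimulus file (hypothetical example items).
with tempfile.TemporaryDirectory() as tmp:
    stimulus_file = Path(tmp) / "stimuli.txt"
    stimulus_file.write_text("Straße\nHaus\nKäfer\nÜbung\n", encoding="utf-8")
    items = load_stimuli(stimulus_file)

to_check = with_special_letters(items)
print(to_check)  # items to inspect on the actual presentation setup

# The pitfall: UTF-8 bytes decoded with another encoding (here Latin-1).
garbled = "Käfer".encode("utf-8").decode("latin-1")
print(garbled)  # KÃ¤fer
```

Presenting the items in `to_check` once on the actual testing computer, before data collection, is usually enough to catch encoding problems.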
Trial structure 
A trial generally has the following structure:
- Fixation cross
  - Fixation-cross presentation is implemented to prepare the participant for the upcoming trial. This presentation prevents eye movements or attentional blinks that could increase response times.
  - Typically used signs are “+”, “*” or “x”. These fixation crosses serve as a forward mask to the reading material, potentially influencing behavior.
  - Our suggested alternative is to use vertical and horizontal lines indicating the stimulus's location, without overlap between the bars and the word. Optionally, when including multiple word lengths, include bars at the left and right to indicate word length.
  - Suggested duration of fixation cross presentation: 500 ms
- Letter string presentation
  - Reaction time, a main measure of the task, is measured from when the item appears on the screen until the response occurs.
  - We recommend that letter strings stay on screen until a response is given. A shorter presentation time can be reasonable if a specific hypothesis calls for it. One can also show the stimulus for a fixed amount of time to incentivize participants to focus on the task, respond quickly, and avoid taking long breaks in the middle of a block. Here, presentation duration could range from 200 ms to 1000 ms. Adequate presentation times also vary by the target participant group. Children, language learners, or individuals with reading problems of any kind often need longer presentation times than the average university student participant. When the participant responds before the maximum stimulus presentation time is over, the trial ends, the stimulus disappears, and the next trial is initiated.
  - The letter strings should be presented in a different random order for each participant.
- Participant response
  - Typically, participants are instructed to respond by pressing a key on a keyboard, e.g., left key: “d”, right key: “k”.
  - Alternatively, one can implement a go/no-go task in which participants press a button only for words or only for non-words.
  - If the aim is to directly compare words and non-words (i.e., the so-called lexicality effect), it is recommended to counterbalance the response key between participants, such that half of them press the left key for words and the right key for non-words, and the other half the other way around.
  - If the numbers of words and non-words are balanced, it is typical to instruct participants to press the right key for words and the left key for non-words.
  - Alternatives to keyboard responses exist, such as keypads, touchscreens, or mouse clicks. For more information, see Pronk et al., 2020, Bridges et al., 2020, or Rodd, 2024.
- Inter-trial interval: A break before the next stimulus will reduce interference effects. We recommend a blank-screen inter-trial interval of 500 ms between the response and the next fixation cross.
After 100 trials, we recommend a short, self-paced break (“Press the space bar when you’re ready to continue”).
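The trial structure described above can be sketched as a platform-independent trial list. This is a minimal sketch under stated assumptions, not the API of any of the listed programs: the function names, the dictionary fields, the use of the participant ID as random seed, and the rule "even ID: d = word, odd ID: k = word" are all illustrative choices. The sketch also assumes that a break is offered after every 100 trials.

```python
import random

FIXATION_MS = 500   # suggested fixation duration
ITI_MS = 500        # suggested blank inter-trial interval
BREAK_EVERY = 100   # assumption: a self-paced break after every 100 trials

def key_mapping(participant_id):
    """Counterbalance response keys: even IDs get d = word, odd IDs get k = word."""
    if participant_id % 2 == 0:
        return {"word": "d", "nonword": "k"}
    return {"word": "k", "nonword": "d"}

def build_session(participant_id, words, nonwords):
    """Return the ordered list of trials for one participant."""
    keys = key_mapping(participant_id)
    items = [(w, "word") for w in words] + [(n, "nonword") for n in nonwords]
    # Seeding with the participant ID gives each participant their own,
    # reproducible random order.
    rng = random.Random(participant_id)
    rng.shuffle(items)
    trials = []
    for index, (string, status) in enumerate(items, start=1):
        trials.append({
            "trial": index,
            "string": string,
            "status": status,
            "correct_key": keys[status],
            "fixation_ms": FIXATION_MS,
            "iti_ms": ITI_MS,
            # Offer a break after every BREAK_EVERY trials, but not after the last one.
            "break_after": index % BREAK_EVERY == 0 and index < len(items),
        })
    return trials

# Usage with hypothetical items.
session = build_session(1, ["Haus", "Käfer"], ["Hauf", "Köfer"])
for trial in session:
    print(trial["trial"], trial["string"], trial["correct_key"])
```

The resulting list can be exported (e.g., as a CSV file) and used as the condition file of the chosen experimental software.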
Participants 
The number of participants should be determined by a power calculation based on the effect size of interest (e.g., see Brysbaert & Stevens, 2018). Note that power considerations differ when investigating individual differences (e.g., see Hedge et al., 2018). The reliability of the measure of interest can also be relevant when deciding on the number of participants.
There are no fixed rules about inclusion criteria. If the aim is to minimise inter-individual differences and examine differences on the word (or non-word) level, criteria can maximise similarity between participants (e.g., age, level of education). Possible recruitment strategies could be a convenience sample (e.g., undergraduate students), online platforms, or snowball sampling.
Unless it is relevant to the research question, we strongly recommend minimizing the number of additional questions. Every additional piece of information reduces anonymity, and by law (i.e., GDPR rules) it should not be possible to identify participants from the data. For example, if location, gender, and highest education are collected, identification could already be possible in a rural region. In addition, fewer questions reduce the testing time. If a questionnaire is used, it should be embedded at the beginning or the end of the experiment.
We recommend asking for:
- Language history (e.g., see Li et al., 2006)
- Age
- Highest education
- Although gender is often reported and even requested by reviewers, we consider it less relevant, as there is no evidence of gender differences in the cognitive processes underlying reading in German adult readers. This is, however, a research gap in need of evidence.
Design options
Controlled experiment 
With a small number of available participants and a strong theoretical motivation, a controlled experiment is feasible. Here, participants respond to a set of items selected to vary on a specific characteristic while not co-varying on other characteristics that may also affect reading processes (see the relevant section on How to determine the effect of a specific variable based on lexical decision data in “Materials”). The design of a lexical decision task involves a trade-off between available time and data quality. On the one hand, as the length of the experiment increases, the likelihood of participant drop-out increases. On the other hand, a short study with only a couple of items is less likely to provide high data quality. Regarding time, it is reasonable to assume that a typical adult reader can complete a 30-60 minute task. Such a task length may be suitable for 500-1,000 decisions (e.g., 250 words and 250 non-words), including pre-stimulus and post-response delays and pauses between blocks.
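To see how the number of decisions translates into testing time, the trial timings can be added up. The following is a rough sketch under stated assumptions: the mean reaction time of 800 ms and the mean break length of 30 s are illustrative values, not recommendations, and the function name is hypothetical. The 500 ms fixation and 500 ms inter-trial interval follow the suggestions in the trial structure section, and a break is assumed after every 100 trials.

```python
def estimate_session_minutes(n_trials, mean_rt_ms=800, fixation_ms=500,
                             iti_ms=500, break_every=100, mean_break_ms=30_000):
    """Estimate the pure task time in minutes from per-trial timings.

    mean_rt_ms and mean_break_ms are assumed values; replace them with
    pilot data from your own participant group.
    """
    # Breaks occur between blocks, so there is none after the final trial.
    n_breaks = max(n_trials - 1, 0) // break_every
    total_ms = n_trials * (fixation_ms + mean_rt_ms + iti_ms) + n_breaks * mean_break_ms
    return total_ms / 60_000

for n in (500, 1000):
    print(n, "trials:", round(estimate_session_minutes(n), 1), "minutes")
# 500 trials: 17.0 minutes
# 1000 trials: 34.5 minutes
```

The estimate covers only the trials and breaks. Instructions, practice trials, and questionnaires come on top, and slower participant groups (e.g., children or language learners) will need a larger `mean_rt_ms`.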
Lexicon Projects - Semi-controlled regression designs 
Lexicon projects are, in principle, infrastructure projects that allow the exploration of new phenomena in extensive datasets with a broad stimulus and participant base. This format is particularly suitable when one wants to provide resources for investigating reading and psycholinguistic processing, for example, within one language or across many languages. The following table lists existing lexicon projects:
| Language | Reference |
|---|---|
| Mandarin | Sze et al., 2014 |
| Mandarin | Tsang et al., 2018 |
| Mandarin | Wang et al., 2025 |
| Cantonese | Tse et al., 2017 |
| Cantonese | Tse et al., 2022 |
| Dutch | Keuleers et al., 2010 |
| British English | Keuleers et al., 2012 |
| American English | Balota et al., 2007 |
| French | Ferrand et al., 2010 |
| German | Schröter & Schroeder, 2017* |
| Hebrew | Stein et al., 2024 |
| Malay | Yap et al., 2010 |
| Persian | Nemati et al., 2022 |
| Portuguese | Soares et al., 2019 |
| Spanish | Aguasvivas et al., 2018 |
| Italian | Amenta et al., 2024 |
* Note that this lexicon project is relatively small and focuses on children. Find updates on a larger German Lexicon project HERE
The aim is to provide reaction time and accuracy estimates for single words. For such studies, it is recommended to maximise the number of experimental items, with several tens of thousands of words and at least 30 data points per item. Due to time constraints, each participant generally responds to only a subset of the items. Note that one can use the datasets from lexicon projects to create virtual controlled experiments (see Kuperman, 2015).
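Because each participant sees only a subset of the items, the required number of participants follows from simple arithmetic. The sketch below is an illustration under stated assumptions: the function name and the example numbers (30,000 words, 1,000 words per participant) are hypothetical, and only the minimum of 30 data points per item comes from the text.

```python
def participants_needed(n_items, observations_per_item, items_per_participant):
    """Smallest number of participants that yields the target observations per item.

    Assumes items are distributed evenly across participants and that no
    data are lost through exclusions or drop-out.
    """
    total_observations = n_items * observations_per_item
    # Integer ceiling division: round up to whole participants.
    return -(-total_observations // items_per_participant)

# Hypothetical example: 30,000 words, 30 data points each, 1,000 words per participant.
print(participants_needed(30_000, 30, 1_000))  # 900
```

The count refers to word trials only. Non-word trials lengthen each session further, and planning for some data loss means recruiting more participants than this minimum.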
Crowd-sourcing lexical decision projects 
If the researcher aims to collect large amounts of data, they may consider crowd-sourcing a project. Here, one can use a gamified lexical decision task in which each participant provides a limited amount of data in a very short experiment (e.g., 3 minutes). As this approach is more prone to noise, a much larger amount of data needs to be collected, with the recommendation to cover at least 100,000 words and 40 observations per item (Amenta et al., 2025).

As an incentive for participation, the study can provide participants with an estimate of their vocabulary knowledge at the end. For the items, one may choose a mix of higher-frequency words that participants know and words that are not generally known to all participants. This will provide more informative vocabulary scores. In this case, the recommendation is to reduce the number of non-words, so that the ratio is 3 non-words to 7 words.
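The 3:7 ratio and the data targets above can be combined into a quick planning calculation. This is a sketch under stated assumptions: the session length of 100 trials is a hypothetical value for a roughly 3-minute session, and the function names are illustrative.

```python
def split_session(n_trials, nonword_parts=3, word_parts=7):
    """Split a session into words and non-words at a 3:7 non-word-to-word ratio."""
    n_nonwords = n_trials * nonword_parts // (nonword_parts + word_parts)
    n_words = n_trials - n_nonwords
    return n_words, n_nonwords

def sessions_needed(n_words_total, observations_per_word, words_per_session):
    """Minimum number of completed sessions to reach the target observations per word."""
    total = n_words_total * observations_per_word
    return -(-total // words_per_session)  # integer ceiling division

# Assumption: about 100 trials fit into one short session.
n_words, n_nonwords = split_session(100)
print(n_words, n_nonwords)                        # 70 30
print(sessions_needed(100_000, 40, n_words))      # 57143
```

With these assumed values, covering 100,000 words with 40 observations each requires tens of thousands of completed sessions, which illustrates why this design depends on crowd-sourcing.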