Systematize your systematic searches

A practical guide to building transparent and reproducible database searches
Author

Lucija Batinović

Published

November 21, 2025

Tips shared in this post are mostly aimed at making database searches more streamlined and reproducible. This post builds on the information in the Beginner’s guide to open and reproducible systematic reviews (Carlsson et al., 2024). If you’re not familiar with the systematic search methodology, start with the resources at the end of this post.

A short checklist for a reproducible search strategy:


Reporting the searches and search documentation

Reporting the conducted searches is extremely important for reproducibility and should be done in as much detail as possible. Reproducible search strategies help during review updates, make it easier to handle changes requested during peer review, and ensure transparency of the process.

To properly report and present the searches, your supplements should contain the following:

  1. Search terms
  2. Combined blocks
  3. Date and Language limitations
  4. Settings used for every database
  5. Time when searches were conducted
  6. Number of hits received
  7. Raw datafile of exported searches (.ris, .bib)

To document the exported reference files in a reproducible way:

  1. Use a reference manager (e.g., Zotero, Mendeley, EndNote)
  2. Import files and name them in a legible way: database-name_date-of-search_number-of-articles
  3. Keep each database search file in its own directory/folder
  4. Keep a separate folder for search history documents
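
The naming convention in the list above can be sketched in a few lines of Python. This is only an illustration: the function name `export_filename`, the lowercase hyphenated database name, and the ISO date format are my assumptions, not a standard.

```python
from datetime import date


def export_filename(database: str, search_date: date, n_records: int, ext: str = "ris") -> str:
    """Build a name like database-name_date-of-search_number-of-articles.ext."""
    # Lowercase the database name and replace spaces so the name works on any file system.
    db = database.strip().lower().replace(" ", "-")
    return f"{db}_{search_date.isoformat()}_{n_records}.{ext}"


# Values taken from the PubMed example search in this post.
print(export_filename("PubMed", date(2025, 8, 30), 302))  # pubmed_2025-08-30_302.ris
```

Putting the date in ISO format (year-month-day) means the files sort chronologically within each database.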

You can find many great examples of reported search strategies on a preprint platform that publishes systematic search strategies: searchRxiv. These searches can be reused, adapted, or used as inspiration for your search strategies.

Example search strategy

Let’s say you’re interested in the effects of cognitive training on inhibition in children on the autism spectrum. You want to find all relevant studies and meta-analyze the effects. To build a reproducible, easy-to-follow search strategy, you can use Excel (or other spreadsheet software of your choice) to first identify the relevant categories and search terms. To do this, let’s start with a hypothetical research question (RQ) following the PICOS framework:

What is the effect of floortime (I) on inhibition control (O) in children with autism spectrum disorder (P)?

This research question contains the population, intervention, and outcome of interest. You can also define the comparison as “children with ASD who do not receive the intervention”, and the study design as “randomized controlled trials”.

A complete PICOS RQ would then be: What is the effect of floortime (I) on inhibition control (O) in children with autism spectrum disorder (P), compared to no intervention (C), as evaluated in randomized controlled trials (S)?

These can be part of the RQ, but can also be left out. Leaving out parts of the framework depends on your aims regarding precision/sensitivity. For example, adding terms relating to the study design could increase precision, but if the design is not mentioned in the relevant article field, you will miss potentially relevant articles.

RQ: “What is the effect of Floortime therapy on inhibition control in children with autism spectrum disorder?”

| PICOS element | Terms |
|---|---|
| Population | child, minor, under 18 |
| Intervention | floortime |
| Comparison | placebo, treatment-as-usual, waitlist control |
| Outcome | inhibition control |
| Study design | RCT, randomized control trial |

| Database | Population | Intervention | Comparison | Outcome | Study design |
|---|---|---|---|---|---|
| SCOPUS | ("child*" OR "under 18" OR "minor*" OR "adolescen*" OR "student*") AND ("autism spectrum disorder" OR "ASD" OR "autism" OR "pervasive developmental disorder*") | ("DIR" OR "floortime" OR "play therapy") | ("placebo" OR "treatment as usual" OR "TAU" OR "waitlist control" OR "control group") | ("inhibition control" OR "response inhibition" OR "cognitive inhibition" OR "executive function*" OR "interference control") | ("randomized controlled trial" OR "randomised controlled trial" OR "RCT" OR "clinical trial" OR "controlled clinical trial") |
| PUBMED | ("autism"[Title/Abstract] OR "Autistic Disorder"[MeSH Terms] OR "Autism Spectrum Disorder"[MeSH Terms]) AND ("child"[Title/Abstract] OR "adolescen*"[Title/Abstract]) AND ("Executive Function"[MeSH Terms] OR "Executive Function"[Title/Abstract]) | NA | NA | NA | NA |

Final search strategies

| Database | Search strategy | Number of retrieved hits | Date of search | Additional information |
|---|---|---|---|---|
| SCOPUS | (TITLE-ABS-KEY("child*" OR "under 18" OR "minor*" OR "adolescen*" OR "student*") AND TITLE-ABS-KEY("autism spectrum disorder" OR "ASD" OR "autism" OR "pervasive developmental disorder")) AND (TITLE-ABS-KEY("DIR" OR "floortime" OR "play therapy")) AND (TITLE-ABS-KEY("inhibition control" OR "response inhibition" OR "cognitive inhibition" OR "executive function" OR "interference control")) | 3 | 2025-08-29 | No limitations |
| PUBMED | ("autism"[Title/Abstract] OR "Autistic Disorder"[MeSH Terms] OR "Autism Spectrum Disorder"[MeSH Terms]) AND ("child"[Title/Abstract] OR "adolescen*"[Title/Abstract]) AND ("Executive Function"[MeSH Terms] OR "Executive Function"[Title/Abstract]) | 302 | 2025-08-30 | No limitations |

PubMed demo search

| Search number | Query | Sort by | Filters | Search details | Results | Time | Date |
|---|---|---|---|---|---|---|---|
| 14 | (#1 OR #2 OR #3) AND (#4 OR #5) AND (#6 OR #7) | NA | NA | ("autism"[Title/Abstract] OR "Autistic Disorder"[MeSH Terms] OR "Autism Spectrum Disorder"[MeSH Terms]) AND ("child"[Title/Abstract] OR "adolescen*"[Title/Abstract]) AND ("Executive Function"[MeSH Terms] OR "Executive Function"[Title/Abstract]) | 302 | 07:53:39 | 2025/08/30 |
| 13 | (#1 OR #2 OR #3) AND (#4 OR #5) AND (#6 OR #7) AND (#8 OR #9) | NA | NA | ("autism"[Title/Abstract] OR "Autistic Disorder"[MeSH Terms] OR "Autism Spectrum Disorder"[MeSH Terms]) AND ("child"[Title/Abstract] OR "adolescen*"[Title/Abstract]) AND ("Executive Function"[MeSH Terms] OR "Executive Function"[Title/Abstract]) AND ("Floortime"[Title/Abstract] OR "DIR"[Title/Abstract]) | 0 | 07:52:42 | 2025/08/30 |
| 12 | (#1 OR #2 OR #3) AND (#4 OR #5) AND (#6 OR #7) AND (#8 OR #9 OR #10 OR #11) | NA | NA | ("autism"[Title/Abstract] OR "Autistic Disorder"[MeSH Terms] OR "Autism Spectrum Disorder"[MeSH Terms]) AND ("child"[Title/Abstract] OR "adolescen*"[Title/Abstract]) AND ("Executive Function"[MeSH Terms] OR "Executive Function"[Title/Abstract]) AND ("Floortime"[Title/Abstract] OR "DIR"[Title/Abstract]) | 0 | 07:52:29 | 2025/08/30 |
| 11 | "Developmental, Individual-differences, Relationship-based"[Title/Abstract] | NA | NA | "Developmental, Individual-differences, Relationship-based"[Title/Abstract] | 0 | 07:50:14 | 2025/08/30 |
| 10 | "Developmental, Individual-differences, Relationship-based"[Title/Abstract] - Schema: all | NA | NA | "Developmental, Individual-differences, Relationship-based"[Title/Abstract] | 0 | 07:50:14 | 2025/08/30 |
| 9 | "DIR"[Title/Abstract] | NA | NA | "DIR"[Title/Abstract] | 2,246 | 07:49:55 | 2025/08/30 |
| 8 | "Floortime"[Title/Abstract] | NA | NA | "Floortime"[Title/Abstract] | 17 | 07:49:31 | 2025/08/30 |
| 7 | "executive function"[Title/Abstract] | NA | NA | "executive function"[Title/Abstract] | 25,394 | 07:48:36 | 2025/08/30 |
| 6 | "Executive Function"[Mesh] | NA | NA | "Executive Function"[MeSH Terms] | 22,663 | 07:48:16 | 2025/08/30 |
| 5 | adolescen*[Title/Abstract] | NA | NA | "adolescen*"[Title/Abstract] | 428,618 | 07:46:20 | 2025/08/30 |
| 4 | child[Title/Abstract] | NA | NA | "child"[Title/Abstract] | 538,847 | 07:45:07 | 2025/08/30 |
| 3 | "Autism Spectrum Disorder"[Mesh] | NA | NA | "Autism Spectrum Disorder"[MeSH Terms] | 49,450 | 07:30:43 | 2025/08/30 |
| 2 | "Autistic Disorder"[Mesh] | NA | NA | "Autistic Disorder"[MeSH Terms] | 28,670 | 07:30:27 | 2025/08/30 |
| 1 | autism[Title/Abstract] | NA | NA | "autism"[Title/Abstract] | 74,859 | 07:29:22 | 2025/08/30 |

Although you should read relevant articles and systematic reviews to identify correct terms for your search, you can also use AI to help you find relevant terms. Once you’re happy with the terms, you should select relevant databases to start building and piloting the searches. AI tools can help you translate the search strategies for different databases, but I don’t recommend completely relying on AI for this step. You should always learn about the specific features of each database to avoid errors caused by incorrect combinations of operators and truncations. AI can do the manual work for you, but you must understand the structure of each search and be familiar with database-specific rules.

Librarians/information retrieval specialists can help you decide which databases are relevant. For this example, I searched SCOPUS, as it’s a generic, commonly used database that contains a large number of sources from all scientific disciplines. SCOPUS is additionally hosted on its own, not available through an interface like ProQuest. I also searched PUBMED, which is a medical database. PUBMED is more specialized than SCOPUS, but it’s relevant in this example with a clinical population and a health-oriented intervention.

Sharing your search strategy

How you access the database usually depends on what type of access you have. If you try to access a database (e.g., PsycINFO) through your university library login, you might be directed to a search interface (e.g., ProQuest or EBSCOHost). Different interfaces provide different ways to save your search history, and you should familiarize yourself with each of their features before conducting the searches.

To share the searches, you can create a document (usually any text editor works) with a summary of all database searches on the first page (look at the final search strategies tab). On the following pages, paste individual, complete searches from each database (e.g., PubMed demo search tab). This document is useful for others to quickly understand and rerun your searches. It will also be useful to you when you update your searches, or need to troubleshoot them. To make sure you have reported all relevant information, follow the PRISMA-S reporting guidelines for systematic searches.

For example, conducting searches through ProQuest and, since recently, EBSCOHost allows you to download the Search History in multiple file formats. This is very convenient for sharing reproducible search strategies. However, for databases that do not provide downloadable search strategies, a workaround can be saving the website with all conducted searches as a text file. This is the quickest and the simplest way to have a complete search history in one file, although it would be less interoperable. To save the file this way, go to the search history section of the database, select “Save page as…” in your browser, and save as text file format. You can also copy-paste each search string manually into a file of your choice to capture all relevant information, which is more time consuming, but allows you flexibility in how you save the searches.

The goal is to save your strategies locally on your machine. There are multiple ways to achieve this; the ones described above are some of the quickest and simplest ways to ensure you have the entire search history saved in one place.
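
If you go the manual copy-paste route, a small script can keep the log in a consistent, machine-readable shape. The sketch below is one possible approach, not a required workflow: the function name `write_search_log` and the column names are my assumptions, chosen to mirror the summary table earlier in this post. It writes to an in-memory buffer so it runs as-is.

```python
import csv
import io

# Column names are an assumption, mirroring the summary table in this post.
FIELDS = ["database", "search_strategy", "hits", "date", "additional_information"]


def write_search_log(rows, handle):
    """Write one row per database search to a CSV file-like object."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


rows = [
    {
        # Values copied from the PubMed example search in this post.
        "database": "PubMed",
        "search_strategy": '("autism"[Title/Abstract] OR "Autistic Disorder"[MeSH Terms] OR '
        '"Autism Spectrum Disorder"[MeSH Terms]) AND ("child"[Title/Abstract] OR '
        '"adolescen*"[Title/Abstract]) AND ("Executive Function"[MeSH Terms] OR '
        '"Executive Function"[Title/Abstract])',
        "hits": 302,
        "date": "2025-08-30",
        "additional_information": "No limitations",
    }
]

buffer = io.StringIO()
write_search_log(rows, buffer)
print(buffer.getvalue())
```

To write a real file instead of the buffer, open it with `open(path, "w", newline="", encoding="utf-8")` and pass that handle to the function.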

You can see two examples of the saved search history from EBSCOHost.

| S. | Search name | Search description | Query (user entered) | Query (expanded display term) | Search run date and time | Results (count) | Search mode | Expander(s) | Interface | Database(s) |
|---|---|---|---|---|---|---|---|---|---|---|
| S4 | NA | NA | (TI,AB("developmental individual-difference" OR "DIR model")) AND (TI,AB("inhibitory control" OR "executive function")) AND (TI,AB("autism" OR "autism spectrum disorder" OR "ASD")) AND (TI,AB(child OR toddler* OR preschool* OR"young child*")) | (TI,AB("developmental individual-difference" OR "DIR model")) AND (TI,AB("inhibitory control" OR "executive function")) AND (TI,AB("autism" OR "autism spectrum disorder" OR "ASD")) AND (TI,AB(child OR toddler* OR preschool* OR"young child*")) | 2025-11-20T14:27:08.515Z | 213 | SmartText Searching | Apply related words, Apply equivalent subjects | NA | ERIC |
| S3 | NA | NA | (TI,AB("floortime" OR "developmental individual-difference" OR "DIR model")) AND (TI,AB("inhibitory control" OR "executive function")) AND (TI,AB("autism" OR "autism spectrum disorder" OR "ASD")) AND (TI,AB(child OR toddler* OR preschool* OR"young child*")) | (TI,AB("floortime" OR "developmental individual-difference" OR "DIR model")) AND (TI,AB("inhibitory control" OR "executive function")) AND (TI,AB("autism" OR "autism spectrum disorder" OR "ASD")) AND (TI,AB(child OR toddler* OR preschool* OR"young child*")) | 2025-11-20T14:26:54.101Z | 213 | SmartText Searching | Apply related words, Apply equivalent subjects | NA | ERIC |
| S2 | NA | NA | (TI,AB("floortime" OR "developmental individual-difference" OR "DIR model")) AND (TI,AB("inhibition" OR "inhibitory control" OR "executive function")) AND (TI,AB("autism" OR "autism spectrum disorder" OR "ASD")) AND (TI,AB(child OR toddler* OR preschool* OR"young child*")) | (TI,AB("floortime" OR "developmental individual-difference" OR "DIR model")) AND (TI,AB("inhibition" OR "inhibitory control" OR "executive function")) AND (TI,AB("autism" OR "autism spectrum disorder" OR "ASD")) AND (TI,AB(child OR toddler* OR preschool* OR"young child*")) | 2025-11-20T14:26:47.712Z | 186 | SmartText Searching | Apply related words, Apply equivalent subjects | NA | ERIC |
| S1 | NA | NA | (TI,AB("floortime" OR "DIR/Floortime" OR "developmental individual-difference" OR "DIR model")) AND (TI,AB("inhibition" OR "inhibitory control" OR "executive function")) AND (TI,AB("autism" OR "autism spectrum disorder" OR "ASD")) AND (TI,AB(child OR toddler* OR preschool* OR"young child*")) | (TI,AB("floortime" OR "DIR/Floortime" OR "developmental individual-difference" OR "DIR model")) AND (TI,AB("inhibition" OR "inhibitory control" OR "executive function")) AND (TI,AB("autism" OR "autism spectrum disorder" OR "ASD")) AND (TI,AB(child OR toddler* OR preschool* OR"young child*")) | 2025-11-20T14:06:13.691Z | 12 | SmartText Searching | Apply related words, Apply equivalent subjects | NA | ERIC |

EBSCO Web

This file can then be shared as a supplement on OSF or other repositories. If you instead decide to publish your search strategies on searchRxiv, you should follow their instructions.

Sharing your reference files

Although sharing the search strategy is crucial, searches are often not 100% reproducible. While databases index articles regularly, there is often a delay in cross-referencing the studies. This means that relevant studies can be added after you have already conducted your searches, so they appear only when you rerun the search at a later time. Likewise, studies you found initially could be removed from the databases for various reasons. This is why it’s important to save and share the original reference list you retrieved in your initial search.

The easiest way to do this is to download the references when you conduct the searches and save them. You can import them and keep them in Zotero (which I mentioned as an open source alternative, but other reference managers work fine as well), or just keep them as a .bib or .ris file locally. This initial reference file is the most important to keep safe, as you can always reuse it if the next steps (i.e., screening or deduplication) go awry. In addition to sharing this file, you can also share the reference file of excluded and included references.

  1. If you screen in spreadsheets (e.g., Excel), you can tag the articles as “excluded” or “included” in Zotero, filter the tagged articles, and export the reference file for each tag (I recommend exporting both a .bib file and a .csv file, so you can more easily go through those references later).
  2. If you use screening software (like Rayyan or Covidence), it provides ways to export only excluded or included articles in multiple formats.

I recommend exporting either as .bib or .ris file. These are the most versatile file formats, easily imported to most reference managers, and compatible with R packages for systematic reviews and LaTeX.
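
A quick sanity check on an exported .ris file is to count its records and compare that number with the hits the database reported. The sketch below relies on the RIS convention that every record starts with a `TY  -` line; the function name `count_ris_records` is my own, and the two sample records are made-up placeholders, not real references.

```python
def count_ris_records(text: str) -> int:
    """Count records in RIS-formatted text by counting 'TY  -' (record type) lines."""
    # Some exports start with a byte-order mark, so strip it before checking the tag.
    return sum(
        1 for line in text.splitlines() if line.lstrip("\ufeff").startswith("TY  -")
    )


# Placeholder records for illustration only.
sample = (
    "TY  - JOUR\n"
    "TI  - Placeholder title A\n"
    "ER  - \n"
    "\n"
    "TY  - JOUR\n"
    "TI  - Placeholder title B\n"
    "ER  - \n"
)

reported_hits = 2  # the number of hits you noted when running the search
n_records = count_ris_records(sample)
print(n_records, n_records == reported_hits)  # 2 True
```

For a real export, read the file first, for example with `Path("your-file.ris").read_text(encoding="utf-8")` from `pathlib`, and pass the text to the function.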


What is a “systematic” literature search?

  1. First (and the most important) step of the literature search involves conducting searches in citation databases (e.g., ERIC), which are systematic, searchable collections of journals and their articles, along with other types of publication formats (e.g., book chapters or conference proceedings). These databases can be hosted on their own, or more often, they are available through database interfaces (e.g., ProQuest). Therefore, one would conduct a search in ERIC through the ProQuest interface, and it is not completely informative to state that the search was conducted in ProQuest.

  2. Second comes manual searching of selected journals to ensure main articles of interest are caught. This is needed because you may miss studies that do not have any of the keywords you used in your database searches. This should not really happen if your search strategies are thorough, but as the fields in social sciences have heterogeneous terminology, it is often difficult to capture all possible alternatives.

  3. Third comes the grey literature. This covers theses and dissertations, non-scientific papers (e.g., news articles), preprints (non-peer-reviewed) on preprint servers (e.g., PsyArXiv), file drawer reports (i.e., never published studies), unpublished datasets, or any other type of relevant text which is not peer-reviewed and published in a scientific journal. This is an extremely important and often overlooked step! Grey literature is found through websites that register trials (like ClinicalTrials.gov), preprint servers (found on the OSF preprint platform), preregistration websites (e.g., PROSPERO), grey literature databases (e.g., BASE), and most commonly Google Scholar (which is not a reproducible search engine, so be careful when reporting the searches).

  4. Finally, you go through forward and/or backward reference searches. A forward search means you will search for all studies that have cited your included studies. A backward reference search means retrieving all references your included studies have cited. You can do this manually by screening the references within each article, but many databases allow for forward and backward searches of selected references. There are also websites that extract the references of selected articles and collate them in one reference file for you (e.g., CitationChaser).
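
To make the difference between forward and backward searching concrete, here is a toy sketch. The citation data and study names are entirely hypothetical, and the function names are mine; real citation data would come from a database or a tool like the ones mentioned above.

```python
# Hypothetical toy data: each key cites the studies in its list.
references = {
    "included_1": ["older_A", "older_B"],
    "included_2": ["older_B", "older_C"],
    "newer_X": ["included_1"],
    "newer_Y": ["included_2", "older_A"],
}
included = {"included_1", "included_2"}


def backward_search(included, references):
    """Return everything the included studies cite."""
    found = set()
    for study in included:
        found.update(references.get(study, []))
    return found - included


def forward_search(included, references):
    """Return every study that cites at least one included study."""
    citing = {study for study, refs in references.items() if included & set(refs)}
    return citing - included


print(sorted(backward_search(included, references)))  # ['older_A', 'older_B', 'older_C']
print(sorted(forward_search(included, references)))   # ['newer_X', 'newer_Y']
```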

Forming a search strategy

  1. Come up with the terminology and eligibility criteria (with a content expert)
  2. (With information retrieval specialist/librarian assistance) form a search strategy specific to each database
  3. Pilot the searches
  4. Repeat the first three steps until you reach a consensus on a satisfactory search string

Free text vs. controlled vocabulary

Searches can contain terms that can appear in the title, abstract, full text, keywords, and other parts of the articles, and this can include any word or combination of words. Other terms can be indexed subjects (thesaurus/MeSH terms) which are standardized and assigned to reports by specialized indexers. Some databases do not contain controlled vocabulary, but if a database does contain it, it is recommended that relevant terms are included in the search strategies. For example, if you look for studies on intellectual disability, a thesaurus term might be something like “intellectual developmental disorder”, which should catch all relevant articles.

However, this stage requires testing out all potentially relevant terms and evaluating the number of hits and their precision before deciding which terms will be part of the final search strategy. Often, free text (i.e., non-thesaurus terms) seems to retrieve all relevant hits and cover the thesaurus terms, and it is essential when studies have not yet been indexed with thesaurus terms. This step requires expertise and trying out various approaches, and is usually subjective to a certain degree.

Sensitivity vs. precision

| | Retrieved reports | Not retrieved reports |
|---|---|---|
| Relevant reports | Relevant reports retrieved (a) | Relevant reports not retrieved (b) |
| Irrelevant reports | Irrelevant reports retrieved (c) | Irrelevant reports not retrieved (d) |

Sensitivity: how many relevant reports were located out of all existing relevant reports (a/(a+b)). For example, if 35 existing articles fit your criteria and your search strategy retrieved 33 of them, sensitivity is 33/35 ≈ 94%.

Precision: how many reports are relevant out of all the reports your search strategy retrieved (a/(a+c)). For example, if your search strategy retrieved 500 reports and 33 of those are relevant, precision is 33/500 = 6.6%.
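
The two formulas can be checked with a few lines of Python. The letters a, b, and c follow the table above. Treating the two examples as one and the same search (33 relevant reports retrieved, 35 relevant in total, 500 retrieved in total) is my assumption for the sake of illustration.

```python
def sensitivity(a: int, b: int) -> float:
    """Share of all relevant reports that the search retrieved: a / (a + b)."""
    return a / (a + b)


def precision(a: int, c: int) -> float:
    """Share of retrieved reports that are relevant: a / (a + c)."""
    return a / (a + c)


a = 33        # relevant reports retrieved
b = 35 - a    # relevant reports not retrieved
c = 500 - a   # irrelevant reports retrieved

print(round(sensitivity(a, b), 3))  # 0.943
print(round(precision(a, c), 3))    # 0.066
```

In practice you rarely know b exactly, so sensitivity is usually estimated against a set of known relevant articles.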

Defining search strings is a complex task, and this step takes time and piloting. It is often best to get help from a librarian/information retrieval specialist during this stage to find the best search strings for each database. It is also important to have someone who’s familiar with the research field and knows common terminology during search strategy creation. Sometimes, the offered thesaurus terms don’t fit your scope, or offer more hits than you find necessary, which makes the searches less precise. Librarians know how to devise searches in different databases, and often have access to better search tools. However, these tips can help you create a search strategy on your own. An important part of creating a search string is balancing between a very sensitive search strategy (locates all possible hits that fit your criteria but also covers irrelevant hits) and a precise search (only returns relevant hits but with the possibility of not finding all of the relevant ones).

Deciding between sensitivity and precision is also a subjective decision. After trialing the searches, it is important to consider how many articles you retrieved and how relevant they are (by piloting/skimming through the most relevant hits), and to evaluate this information against the capacity of the team doing the systematic review. For example, if there are enough resources and time to screen all records, allowing stronger sensitivity may be a better option. This decision also depends on the type of review you’re conducting: if you are doing a rapid review, precision will be your main concern, but for a scoping review, you’ll aim for sensitivity.

| PICOS | Search strings | Operators | Additional limits |
|---|---|---|---|
| Demographic | (adult* OR "over 18") | AND | Language: English |
| Diagnosis | "Intellectual disabilit*" OR "intellectual developmental disorder*" | AND | Publication date: 2005-2022 |
| Intervention | "Working memory tasks" OR "short-term memory intervention*" | AND | |
| Outcome | "Working memory" | | |

The simplest way to build a search strategy is to divide the keywords according to the frameworks for the research question (e.g., PICOS for systematic reviews or PCC for scoping reviews). Like in the example table above, keywords can be categorized under the population concepts (i.e., demographic or diagnosis), intervention concept, comparison/control, outcome, or study design concepts.

Once the keywords are categorized, you can start looking for synonyms, alternative spellings, relevant terminology that describes the concepts, and depending on the database, thesaurus terms.
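
Once the terms are sorted into concept blocks, combining them is mechanical: OR within a block, AND between blocks. Here is a minimal sketch using the terms from the example table above. The function name `build_search_string` is mine, and the output is a generic string without database-specific field tags, which you would still need to add for each database.

```python
def build_search_string(blocks: dict[str, list[str]]) -> str:
    """Join terms with OR inside each concept block, and join blocks with AND."""
    parts = []
    for terms in blocks.values():
        if terms:  # skip concepts you decided to leave out
            parts.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(parts)


blocks = {
    "demographic": ["adult*", '"over 18"'],
    "diagnosis": ['"intellectual disabilit*"', '"intellectual developmental disorder*"'],
    "intervention": ['"working memory tasks"', '"short-term memory intervention*"'],
    "outcome": ['"working memory"'],
}

print(build_search_string(blocks))
```

Keeping the blocks in a structure like this also makes it easy to rerun the search with one concept removed when you pilot for sensitivity.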

Other limitations

You can limit your searches to improve precision by restricting the document types, the publication dates, and the language of the publications. You should put as few limitations on searches as possible, unless they are justified by theory or other sensible reasons (e.g., if a certain intervention was invented in 2005, it is sensible to limit the publication date to 2005 onwards).

Search operators

Operators are symbols or words used to connect keywords to build a search that a database can properly understand and execute. Operators allow us to manipulate precision and sensitivity of our search strategies.

Example of operators in the EBSCOHost search engine:

| AND | OR | NOT |
|---|---|---|
| Each result contains all search terms. | Each result contains at least one search term. | Results do not contain the specified terms. |
| The search “child AND autism” finds items that contain both “child” and “autism”. | The search “child OR minor” finds items that contain either “child” or items that contain “minor”. | The search “autism NOT Asperger” finds items that contain “autism” but do not contain “Asperger”. |
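
If you think of each search term as the set of records that contain it, the three operators behave like set operations. The mini-index below is hypothetical (made-up record IDs), and real databases do much more than this, but the logic is the same.

```python
# Hypothetical mini-index: which made-up record IDs contain each term.
index = {
    "child": {1, 2, 3},
    "minor": {3, 4},
    "autism": {2, 3, 5},
    "Asperger": {5},
}

child_and_autism = index["child"] & index["autism"]        # AND: intersection
child_or_minor = index["child"] | index["minor"]           # OR: union
autism_not_asperger = index["autism"] - index["Asperger"]  # NOT: difference

print(sorted(child_and_autism))     # [2, 3]
print(sorted(child_or_minor))       # [1, 2, 3, 4]
print(sorted(autism_not_asperger))  # [2, 3]
```

This is also why AND tends to increase precision, while OR tends to increase sensitivity.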

Databases also use parentheses “()” to form search chunks that should go together, for example “(Child* OR minor) AND (autism OR Asperger*)”. Quotation marks are used to form phrases which will be searched verbatim, and not separately in text, e.g., “Down syndrome”. Asterisk (*) is used to truncate words, i.e., find all versions of a particular word. For example, “teach*” will retrieve “teach”, “teacher”, “teaching”, “teaches”.
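
Truncation can be approximated with a regular expression, which is a handy way to check which word forms a truncated term would cover before you run the search. This is only an approximation under my own assumptions: the helper `truncation_to_regex` is hypothetical, and actual truncation rules (minimum stem length, supported wildcards) differ between databases.

```python
import re


def truncation_to_regex(term: str) -> re.Pattern:
    """Turn a truncated term such as 'teach*' into a whole-word regular expression."""
    stem = re.escape(term.rstrip("*"))
    # A trailing asterisk allows any number of further word characters after the stem.
    suffix = r"\w*" if term.endswith("*") else ""
    return re.compile(rf"\b{stem}{suffix}\b", re.IGNORECASE)


pattern = truncation_to_regex("teach*")
words = ["teach", "teacher", "teaching", "teaches", "taught", "preteach"]
print([w for w in words if pattern.fullmatch(w)])
# ['teach', 'teacher', 'teaching', 'teaches']
```

Note that “taught” is not matched, which is a reminder that irregular word forms need to be added as separate terms.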

Footnotes

  1. You might see history and strategy used interchangeably. They usually refer to the same thing in slightly different contexts, and it’s not really detrimental if you mix the terms. A search strategy basically contains the final search string and accompanying filters/delimiters for each database you plan to search. A search history is essentially that implemented search strategy, containing all of the search strings and selected limitations, and is taken from the database search history section.