Systematic Reviews and Related Evidence Syntheses

Responsible Use & Potential Tools

Artificial Intelligence (AI) in Evidence Synthesis

This guide is based on ongoing testing of multiple AI and automation tools by experienced methodologists. AI can help increase efficiency and automate certain tasks in evidence synthesis — but it cannot replace human judgment or oversight. These findings align with the recent Artificial Intelligence (AI) Methods in Evidence Synthesis webinar series by Cochrane. We share this resource to support responsible exploration of AI tools and workflows. Use with care, transparency, and critical thinking.

Additional recommendations for responsible use:

  • Use AI tools to support, not replace, critical judgment and domain expertise.
  • Ensure outputs are transparent, reproducible, and aligned with evidence synthesis standards (available below in Explore More).
  • Validate AI-generated content before including it in your review.
  • Follow publisher and journal guidelines regarding AI use and disclosure.
  • Respect institutional data privacy and security requirements.
  • Clearly document how and where AI was used in your workflows.

Workflow Stages & Potential Tools

Plan

  • Frame research questions: Microsoft Copilot, Google Gemini
  • Summarize the literature: Google NotebookLM
  • Project/meeting notes: Microsoft Copilot

Identify

  • Generate search terms: Gemini, PubReMiner, Yale MeSH Analyzer
  • Citation searching: CitationChaser, Research Rabbit
  • Screening/deduplication: Covidence (uses active learning for study ranking and auto-deduplicates; see the sketch below)

Extract & Evaluate

  • Auto-extract from PDFs: ChatPDF
  • RCT evaluation: RobotReviewer
  • Bias-assessment visualization: robvis

Combine, Summarize & Share

  • Meta-analysis: Meta-mar
  • Writing assistants: Grammarly
  • Multi-function (search, summarize, report): Elicit, Consensus, Perplexity
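
The Covidence entry above notes that screening can be prioritized with active learning. As a rough illustration of the idea (not Covidence's actual implementation), the Python sketch below fits a simple text classifier on records already screened by human reviewers and ranks the remaining records so that likely includes are read first; the titles, labels, and model choice are all hypothetical.

    # A rough sketch of active-learning-style screening prioritization (hypothetical data).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    # Records already screened by human reviewers: 1 = include, 0 = exclude.
    screened_titles = [
        "Mindfulness-based stress reduction for nurses: a randomized trial",
        "Architectural history of hospital buildings",
    ]
    screened_labels = [1, 0]

    # Records not yet screened.
    unscreened_titles = [
        "Brief mindfulness training and burnout in physicians",
        "Hospital parking policy: an administrative review",
    ]

    vectorizer = TfidfVectorizer()
    model = LogisticRegression()
    model.fit(vectorizer.fit_transform(screened_titles), screened_labels)

    # Rank unscreened records by predicted probability of inclusion, so reviewers
    # see the most likely includes first; retrain as new decisions arrive.
    scores = model.predict_proba(vectorizer.transform(unscreened_titles))[:, 1]
    for score, title in sorted(zip(scores, unscreened_titles), reverse=True):
        print(f"{score:.2f}  {title}")

Human reviewers still make every include/exclude decision; the ranking only changes the order in which records are shown.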

Potential Issues with AI Tools

Large Language Models (LLMs) for Designing Searches

LLMs can help find synonyms, related organizations, conference names, and terms in other languages. However:

  • Results vary between prompts and platforms, reducing reproducibility. 
  • Risk of hallucinations (plausible but incorrect info) and factual errors. 
  • Subscription-based database content is often inaccessible, so key studies may be missed. 
  • Always verify outputs, and if in doubt, consult your librarian.

Multi-Function Products 

Some tools (e.g., Elicit) claim to handle the full review process. In practice, results can vary widely—even with identical prompts—and important studies may be missed (Bernard et al., 2025).

  • Tools like Consensus, Elicit, and Perplexity pull mainly from open-access sources (e.g., Semantic Scholar, the web).
  • Content from proprietary databases (e.g., Ovid, Web of Science) is often absent.

Example Prompts and Use Cases

Using LLMs to Frame a Research Question

LLMs can help researchers translate a broad interest into a structured research question using common frameworks like PICO (Population, Intervention, Comparator, Outcome), PEO (Population, Exposure, Outcome), or PCC (Population, Concept, Context).

Use Case: 

A public health researcher wants to study the impact of urban green spaces on mental health. They prompt an LLM with:

“Please help me frame a systematic review question on how green space exposure influences mental health outcomes, using PICO.”

The model suggests: 

  • Population: Adults living in urban environments.
  • Intervention/Exposure: Access to or time spent in green spaces.
  • Comparator: Adults with limited or no access to green spaces.
  • Outcome: Mental health outcomes such as depression, anxiety, or well-being.

This gives the researcher a structured starting point, but it still needs adaptation: refining the population, clarifying exposures, prioritizing outcomes, and aligning the question with the project scope.

Tip:

The prompting process is iterative. If the LLM’s response is vague or off-track, add more context (e.g., specify age groups or study designs) or rephrase the request.

Key point:

AI can accelerate idea generation, but the final framing requires human expertise for accuracy and reproducibility.
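
For teams that want the prompt and response captured verbatim (which supports the documentation recommendation above), the same request can be sent through an API rather than a chat window. The sketch below is a minimal example assuming the official OpenAI Python client and an API key in the environment; the model name is an assumption, and any chat-based LLM your institution licenses could be substituted. The output still needs the same human review described above.

    # A minimal sketch, assuming the OpenAI Python client and an OPENAI_API_KEY
    # in the environment; the model name is an assumption.
    from openai import OpenAI

    client = OpenAI()

    prompt = (
        "Please help me frame a systematic review question on how green space "
        "exposure influences mental health outcomes, using PICO. Focus on adults "
        "in urban environments and observational study designs."
    )

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed; use whatever model your institution provides
        messages=[{"role": "user", "content": prompt}],
    )

    # Save the prompt and response with your review documentation,
    # and adapt the suggested PICO elements before using them.
    print(response.choices[0].message.content)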

Use AI to Create a Table of Related Reviews

Objective: Use Google NotebookLM (or a GPT-based tool) to generate a comparative table summarizing key aspects of review articles.

Step-by-Step Instructions: 

1. Log in to Google NotebookLM using your institutional NetID and password.

2. Add content by uploading article PDFs.

3. Enter this prompt:

Create a table with the following columns, and one row per article:
- First author's last name and publication year
- Type of review
- Eligibility criteria
- Databases searched
- Years covered by the search

4. Review the AI-generated table for accuracy and missing data. Revise the prompt if needed to improve clarity or add context.

5. Click "Save to Note" of you want to keep the output in your Notebook.  

Use Gemini or Copilot to Generate Search Terms for a Research Question

Objective: Use a conversational AI (Gemini or Copilot Chat) to brainstorm search terms for a database search.

Step-by-Step Instructions:

1. Open your preferred AI tool (Gemini, Copilot, ChatGPT, etc.).

2. Enter your prompt. Here is an example:

I'm conducting a systematic review on the effectiveness of mindfulness interventions for reducing stress in healthcare workers. Please generate a list of relevant keywords I could use in a library database search. Organize them by concept (e.g., population, intervention, outcome).

3. Review the output. Look for:

  • Suggested synonyms and variant phrases.
  • Any incorrect, vague, or overly broad suggestions.
  • Any suggested MeSH terms, which should always be verified.

4. Refine or follow up with additional prompts like:

  • Include British and American spelling variations.
  • Turn this into a sample Boolean search string for PubMed.
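
For illustration only, a follow-up along those lines might produce something like the query below. It is a rough draft, not a validated strategy; field tags, MeSH terms, and synonyms should be checked with a librarian. The sketch also shows one way to check how many PubMed records a draft string retrieves, using NCBI's public E-utilities esearch endpoint from Python.

    # Illustrative draft only; verify tags, MeSH terms, and synonyms with a librarian.
    import requests

    query = (
        '(mindfulness[tiab] OR "mindfulness-based"[tiab] OR meditation[tiab]) '
        'AND (stress[tiab] OR burnout[tiab] OR "occupational stress"[tiab]) '
        'AND ("Health Personnel"[MeSH Terms] OR "healthcare workers"[tiab] '
        'OR nurses[tiab] OR physicians[tiab])'
    )

    # Ask PubMed's esearch endpoint how many records the draft string retrieves.
    resp = requests.get(
        "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
        params={"db": "pubmed", "term": query, "retmode": "json"},
        timeout=30,
    )
    print(resp.json()["esearchresult"]["count"])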

Use Yale MeSH Analyzer to Identify Controlled Vocabulary

Objective: Analyze MeSH terms across multiple relevant articles using the Yale MeSH Analyzer (an automation tool).

Step-by-Step Instructions:

1. Go to Yale MeSH Analyzer.

2. Enter your PubMed IDs into the input box. Here are sample IDs to experiment with:

35012345, 34789765, 34011229, 33567890

3. Click “Go.”

4. Examine the resulting table, which includes article titles, MeSH terms, and more.
    Note: To display abstracts, select that option on the search page.

5. Reflect:

  • Which MeSH terms are consistently used across your articles?
  • Are there any that surprise you or reveal new angles (e.g., population focus, intervention type)?

A similar activity can be done using PubReMiner, which generates frequency tables for keywords and MeSH terms.
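
If you prefer a scriptable cross-check of what the Yale MeSH Analyzer or PubReMiner display, the MeSH headings for a set of PMIDs can also be pulled directly from PubMed with NCBI's public E-utilities efetch endpoint. The Python sketch below reuses the sample IDs from step 2; it assumes the requests library is installed and that the IDs resolve to indexed records.

    # Pull MeSH headings for a few PMIDs straight from PubMed (E-utilities efetch).
    import requests
    import xml.etree.ElementTree as ET

    pmids = ["35012345", "34789765", "34011229", "33567890"]  # sample IDs from step 2

    resp = requests.get(
        "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi",
        params={"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"},
        timeout=30,
    )
    root = ET.fromstring(resp.content)

    for article in root.findall(".//PubmedArticle"):
        pmid = article.findtext(".//PMID")
        title = article.findtext(".//ArticleTitle")
        mesh = [d.text for d in article.findall(".//MeshHeading/DescriptorName")]
        print(pmid, title)
        print("  MeSH:", "; ".join(mesh))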


Explore More