
Using LLMs to Refactor Legacy PHP Codebases Safely

9 min read · By Abimael Espinoza

LLMs can be the most productive refactoring tool a PHP team has ever had, and also the easiest way to silently introduce bugs at scale. Here's the workflow I use to get the productivity gain without the regressions.

The setup: guardrails first

Before any LLM-driven refactor, you need three things in place. Without them, you're gambling.

  1. PHPStan at level 5 or higher, or Psalm at a comparably strict level (Psalm's scale is inverted: level 1 is the strictest), on the module being refactored.
  2. PHPUnit tests covering the public behavior of that module — at minimum the happy path and 2–3 edge cases.
  3. Version control + small commits — every refactor step should be revertible in one click.

Workflow: the 4-step refactor loop

1. Pin the behavior with characterization tests

Ask the LLM to write tests describing what the legacy code currently does, bugs and all, then run them against the untouched code to confirm they pass before you change anything. These tests don't validate correctness; they validate that your refactor doesn't change behavior.
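One way to pin behavior without a test framework is a golden-master (snapshot) check: record what the legacy code returns for a set of inputs, then replay those inputs against the refactored version. The sketch below assumes a hypothetical `legacyGreeting()` function; `captureGoldenMaster()` and `diffAgainstGoldenMaster()` are illustrative helpers, not library APIs. In a real project the snapshot would live in a committed file and the comparison in a PHPUnit test.

```php
<?php
declare(strict_types=1);

/**
 * Record the current output of $fn for every input tuple.
 * The snapshot is the "golden master": whatever the code does today, bugs included.
 */
function captureGoldenMaster(callable $fn, array $inputs): array
{
    $snapshot = [];
    foreach ($inputs as $args) {
        $snapshot[] = ['args' => $args, 'result' => $fn(...$args)];
    }
    return $snapshot;
}

/**
 * Replay every recorded input against $fn and list the cases whose result changed.
 */
function diffAgainstGoldenMaster(callable $fn, array $snapshot): array
{
    $mismatches = [];
    foreach ($snapshot as $case) {
        $actual = $fn(...$case['args']);
        if ($actual !== $case['result']) {
            $mismatches[] = ['args' => $case['args'], 'expected' => $case['result'], 'actual' => $actual];
        }
    }
    return $mismatches;
}

// Hypothetical legacy function. Its quirk (a blank name yields "Hello, !")
// is pinned as-is rather than fixed.
function legacyGreeting($name)
{
    return 'Hello, ' . trim($name) . '!';
}

$inputs = [['Ada'], ['  Grace '], [''], ['   ']];
$snapshot = captureGoldenMaster('legacyGreeting', $inputs);

// A refactored candidate must reproduce the snapshot exactly.
$refactored = fn(string $name): string => sprintf('Hello, %s!', trim($name));
$mismatches = diffAgainstGoldenMaster($refactored, $snapshot);
echo count($mismatches) === 0 ? "behavior preserved\n" : "behavior changed\n";
```

The empty-name output is frozen along with everything else; fixing it belongs in a separate, deliberate commit that updates the snapshot on purpose.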

2. Refactor in narrow scopes

One file, one class, one method at a time. Long prompts produce long mistakes. Good first targets: extract a service from a fat controller, replace inline SQL with an Eloquent query, convert a switch to a strategy pattern.
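As an illustration of the switch-to-strategy target, here is a minimal before/after sketch built around a hypothetical shipping-cost function. Each `case` becomes a class behind one interface, and the unknown-method branch keeps throwing the same exception, so callers see identical behavior. The loop at the end is a tiny characterization check comparing old and new on the same inputs (assertions must be enabled, e.g. `php -d zend.assertions=1`).

```php
<?php
declare(strict_types=1);

// Before: a legacy switch that grows with every new shipping method.
function legacyShippingCost(string $method, float $weightKg): float
{
    switch ($method) {
        case 'standard':
            return 5.0 + 1.0 * $weightKg;
        case 'express':
            return 15.0 + 2.5 * $weightKg;
        case 'pickup':
            return 0.0;
        default:
            throw new InvalidArgumentException("Unknown method: $method");
    }
}

// After: one strategy per former branch, behind a shared interface.
interface ShippingStrategy
{
    public function cost(float $weightKg): float;
}

final class StandardShipping implements ShippingStrategy
{
    public function cost(float $weightKg): float { return 5.0 + 1.0 * $weightKg; }
}

final class ExpressShipping implements ShippingStrategy
{
    public function cost(float $weightKg): float { return 15.0 + 2.5 * $weightKg; }
}

final class PickupShipping implements ShippingStrategy
{
    public function cost(float $weightKg): float { return 0.0; }
}

final class ShippingCalculator
{
    /** @param array<string, ShippingStrategy> $strategies */
    public function __construct(private array $strategies) {}

    public function cost(string $method, float $weightKg): float
    {
        // Same failure mode as the old `default:` branch, so behavior is preserved.
        $strategy = $this->strategies[$method]
            ?? throw new InvalidArgumentException("Unknown method: $method");
        return $strategy->cost($weightKg);
    }
}

$calculator = new ShippingCalculator([
    'standard' => new StandardShipping(),
    'express' => new ExpressShipping(),
    'pickup' => new PickupShipping(),
]);

// Characterization check: old and new must agree on every known input.
foreach (['standard', 'express', 'pickup'] as $method) {
    foreach ([0.0, 1.5, 10.0] as $weight) {
        assert($calculator->cost($method, $weight) === legacyShippingCost($method, $weight));
    }
}
```

Adding a new method now means adding a class and one array entry instead of editing a shared switch, which keeps each future LLM prompt scoped to a single small file.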

3. Run the full guardrail suite after every change

  • Static analysis (PHPStan / Psalm).
  • Tests (PHPUnit).
  • Linter (PHP-CS-Fixer).
  • Composer autoloader regeneration if you moved files.
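The suite above can be scripted so it runs identically after every change. This is a sketch, assuming the tools are installed through Composer into `vendor/bin` and the code lives in `src/`; `runGuardrails()` and `shellExec()` are hypothetical names. The executor is injected so the runner itself can be tested without invoking the real tools.

```php
<?php
declare(strict_types=1);

/**
 * Run each guardrail command in order and stop at the first failure.
 *
 * @param array<string, string> $steps label => shell command
 * @param callable(string): int $exec  runs a command and returns its exit code
 * @return array{ok: bool, failed: ?string}
 */
function runGuardrails(array $steps, callable $exec): array
{
    foreach ($steps as $label => $command) {
        echo "==> $label\n";
        if ($exec($command) !== 0) {
            return ['ok' => false, 'failed' => $label];
        }
    }
    return ['ok' => true, 'failed' => null];
}

// Default executor: run the command, stream its output, return the exit code.
function shellExec(string $command): int
{
    passthru($command, $exitCode);
    return $exitCode;
}

// Assumed project layout. The autoloader is regenerated first, because a
// moved class would otherwise make the later steps fail for the wrong reason.
$steps = [
    'autoload' => 'composer dump-autoload',
    'static analysis' => 'vendor/bin/phpstan analyse src',
    'tests' => 'vendor/bin/phpunit',
    'code style' => 'vendor/bin/php-cs-fixer fix --dry-run --diff',
];
```

A small entry script could end with `exit(runGuardrails($steps, 'shellExec')['ok'] ? 0 : 1);`, so the same check also works as a CI step or a pre-commit hook.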

4. Human review for intent, not syntax

The LLM handles syntax. You handle: 'does this still match what the business needs?' and 'will the next engineer understand this?'.

Prompt patterns that work

  • Provide the existing code + the test that must keep passing + your target pattern. 'Refactor X to pattern Y while keeping test Z green.'
  • Constrain the output: 'no new dependencies, PHP 8.2 syntax, PSR-12 style.'
  • Ask for the refactor plan first; review; then execute.
  • After each change, ask: 'what did you change and why?'
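These patterns are easier to apply consistently if the prompt is assembled from the same parts every time. A sketch, assuming a hypothetical `buildRefactorPrompt()` helper whose output you paste into whichever LLM client you use; the wording is illustrative, not a tested prompt.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: existing code + the test that must stay green +
 * the target pattern + constraints, with a plan-first instruction.
 */
function buildRefactorPrompt(
    string $existingCode,
    string $testCode,
    string $targetPattern,
    array $constraints = ['no new dependencies', 'PHP 8.2 syntax', 'PSR-12 style'],
): string {
    // Render constraints as a bulleted list, one per line.
    $constraintList = implode("\n", array_map(fn(string $c): string => "- $c", $constraints));

    return <<<PROMPT
    Step 1: propose a refactor plan only. Do not write code yet.

    Goal: refactor the code below to the {$targetPattern} pattern while keeping the test green.

    Constraints:
    {$constraintList}

    Existing code:
    {$existingCode}

    Test that must keep passing:
    {$testCode}

    After the plan is approved and you apply it, list what you changed and why.
    PROMPT;
}
```

Keeping the constraints as a default argument means every prompt in a migration carries the same rules unless you override them deliberately.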

What to never let the LLM do unsupervised

  • Database schema migrations.
  • Authentication / authorization logic.
  • Money-handling code.
  • Cryptography.
  • Anything inside a try/catch whose catch block silently swallows errors: a new failure there can be hidden from both your tests and your logs.
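One way to enforce this list is a pre-merge check that flags changed files needing line-by-line human review. The path patterns below are hypothetical and deliberately broad, so false positives are expected; the empty-catch regex is a rough heuristic that misses catch blocks containing only a comment. `reviewFlags()` is an illustrative name, not an existing tool.

```php
<?php
declare(strict_types=1);

// Hypothetical path rules; adapt them to your project's directory layout.
const SENSITIVE_PATHS = [
    'schema migration' => '#(^|/)migrations/#',
    'auth' => '#(^|/)(Auth|Security|Policies)/#i',
    'money' => '#(Payment|Billing|Invoice|Money)#i',
    'crypto' => '#(Crypt|Hash)#i',
];

/**
 * Return the reasons a changed file needs human review.
 * $contents is the file's new source, used to spot swallowed exceptions.
 */
function reviewFlags(string $path, string $contents): array
{
    $flags = [];
    foreach (SENSITIVE_PATHS as $reason => $pattern) {
        if (preg_match($pattern, $path) === 1) {
            $flags[] = $reason;
        }
    }
    // An empty catch block hides any new failure the refactor introduces.
    if (preg_match('/catch\s*\([^)]*\)\s*\{\s*\}/', $contents) === 1) {
        $flags[] = 'swallowed exception';
    }
    return $flags;
}
```

Run it over the files in each LLM-generated PR; any non-empty result means the diff gets a human reviewer before merge rather than a skim.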

A real example

Last quarter I migrated a 14k-line legacy PHP 7.2 codebase to PHP 8.3 with this workflow. Automated passes handled about 70% of the syntax-level work: Rector's rule-based upgrades plus LLM-driven file-by-file passes. Humans reviewed every PR and made the architectural decisions. Total time: ~3 weeks vs. an estimated 8 weeks without AI assistance, with fewer regression bugs than a comparable fully manual project.

When AI refactoring fails

  • When tests don't exist — the LLM has nothing to anchor against.
  • When the codebase has unique conventions the model wasn't trained on.
  • When 'refactor' really means 'redesign' — that still requires a human architect.

Need a hand?

Hiring or modernizing PHP? Let's talk.

16+ years building, scaling, and rescuing PHP applications. Direct contact, no marketplace, US time zones from LATAM.
