Know Your Accessibility Testers Before You Need To

Image: Pad on a desk with the words Strength and Weakness separated by a vertical line.

Most accessibility managers have a vague sense of who their strongest team members are and, just as vaguely, who the weakest are. Vagueness stops being good enough the moment a layoff list lands on your desk or a high-stakes audit is staffed by two people who share the same “growth opportunity,” which is a nice way of saying weakness.

Three different roles need an objective picture of where every tester on your team stands, each for a distinct reason.

  1. The first is defensive, from the Accessibility Manager’s perspective: How do you quickly determine who your strongest team members are if you are asked to reduce headcount?
  2. The second is operational, from the Accessibility Team Lead’s perspective: How do you assign team members to larger projects and avoid pairing people with the same strengths or weaknesses?
  3. The third is developmental, from the perspective of whoever is in charge of employee growth: How do you coach your testers to improve in areas where they are weak?

All three of these perspectives depend on the same thing: objective, dated, written performance data, captured consistently for every tester, every scoring period.

None of this is free. Time spent gathering performance data is time not spent on the audits the data is meant to describe. No accessibility manager has ever said they wanted to squeeze in more time-consuming personnel reviews. The rubric earns its keep when the cost of running it stays below the cost of the problems it prevents. The five-category structure described below is deliberately small for that reason: enough granularity to be defensible in a layoff file, useful in a staffing meeting, and informative in a development conversation, without becoming its own recurring project that consumes dozens of hours.

The Layoff Problem

Layoffs come from above, and they come fast. When leadership hands you a target headcount reduction, you sometimes have as little as a few hours to decide who stays and who goes. You may also be asked to defend that decision later. The first layoff I managed, I was so stressed I threw up. In hindsight, that was because I didn’t have enough objective data on the people I was managing and got caught in the “who is better?” doom spiral.

Relying on your gut is not a strategy. “They seemed less engaged” is not a defense. The EEOC, the plaintiffs’ bar, and any laid-off employee aged 40 or older will ask the same question: Where is the documentation? Not everyone has the resources of Microsoft or McDonald’s, both of which have offered voluntary retirement.

The Age Discrimination in Employment Act prohibits adverse employment actions based on age, and the Older Workers Benefit Protection Act imposes specific disclosure obligations when a reduction-in-force affects anyone 40 or older. None of those obligations is easier to meet because you liked one tester better than another. They get easier to meet when you have a scoring rubric, applied consistently across the team, with dated evaluator comments for every score of zero, every bonus, and every deduction.

Subjective feelings are also exactly the kind of data that dissolves under pressure. Managers laying off people they have worked with for years will, consciously or not, sand off the rough edges of their stronger personal relationships and sharpen the edges of the weaker ones when making their decisions. While a rubric does not eliminate that bias, it makes it more obvious and gives the next person reviewing the file something to push back on.

The Project Staffing Problem

The quieter version of the same problem appears in routine project staffing. If your two strongest testers share the same weakness, say communication, and you pair them for an audit, the audit’s gaps will mirror theirs.

The fix is not complicated. Pair complementary weaknesses, not matching ones. If you must staff two testers with overlapping gaps on the same project, plan for an extra layer of oversight up front: a third reviewer with the relevant expertise, an automated tooling pass on the area where both are weak, or a scoped second look from outside the project team. Any of those is cheaper than the alternative.

You cannot avoid pairing two people with the same weaknesses if you do not know which weaknesses each person has. Which brings us back to the rubric.
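
To make the check concrete, here is a minimal sketch, assuming each tester’s most recent scores are kept as a mapping from rubric category (defined below) to a 0-2 score. The tester data, the weakness threshold, and the function are illustrative, not part of the rubric.

    # Hypothetical pairing check: flag categories where both candidates
    # score at or below a "weakness" threshold before staffing them together.
    # Assumes both testers were scored on the same categories, 0-2 scale.

    WEAKNESS_THRESHOLD = 1  # illustrative cutoff: 0 or 1 counts as a weakness

    def overlapping_weaknesses(a: dict[str, int], b: dict[str, int]) -> list[str]:
        """Return every category where both testers score at or below the threshold."""
        return [c for c in a if a[c] <= WEAKNESS_THRESHOLD and b[c] <= WEAKNESS_THRESHOLD]

    tester_1 = {"Quality": 2, "Speed": 1, "Accuracy": 2, "Communications": 1, "Availability": 2}
    tester_2 = {"Quality": 2, "Speed": 2, "Accuracy": 2, "Communications": 0, "Availability": 2}

    shared = overlapping_weaknesses(tester_1, tester_2)
    if shared:
        print("Plan extra oversight for:", ", ".join(shared))
        # Output: Plan extra oversight for: Communications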

The Career Development Problem

The third reason is the one your testers actually care about. They want to know how they are doing, what to work on, and whether they are on track for the next step. “You’re doing great” tells them nothing. Vague encouragement is worse than vague criticism because it leaves them unable to disagree with anything specific and unable to demonstrate growth six months later.

A rubric makes the development conversation honest. A tester who scores as average in Accuracy is not a bad tester; they are a tester whose next professional goal should be to gain the discipline to improve their accuracy. The bonus categories work the same way in the other direction. Mentoring a newer tester, working on a certification, and contributing to internal playbooks: these are not extracurricular fluff. They are the activities that turn a competent tester into a senior one, and tracking them gives the tester credit for work that does not show up in audit deliverables. When the next promotion slot opens, the rubric record over four or eight quarters is the difference between “we think they are ready” and “here is what they have done to prove they are ready.”

Patterns over time matter more than any single quarter. A tester improving steadily across three quarters is a different signal than one stalled for a year, and both are different from a previously strong tester whose scores have just dropped. None of those patterns is visible without the data. Each tells you something different about what the tester needs next, and what conversation you owe them.
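
As a rough sketch of how the same records support that trend reading, assume each tester’s total scores are stored oldest first, one per scoring period. The thresholds and labels below are illustrative, not prescribed.

    # Illustrative trend read over a tester's total scores, oldest first.
    # The thresholds below are arbitrary examples, not prescribed values.

    def classify_trend(history: list[float]) -> str:
        if len(history) < 3:
            return "not enough data yet"
        last_three = history[-3:]
        if last_three[0] < last_three[1] < last_three[2]:
            return "improving steadily"
        if history[-1] <= max(history[:-1]) - 2:
            return "previously strong, recent drop: schedule a conversation"
        if max(last_three) - min(last_three) <= 0.5:
            return "stalled: agree on a new development goal"
        return "mixed: read the dated evaluator comments"

    print(classify_trend([7.0, 8.0, 9.0]))       # improving steadily
    print(classify_trend([9.5, 9.0, 9.5, 7.0]))  # previously strong, recent drop
    print(classify_trend([6.0, 6.5, 6.0, 6.0]))  # stalled: agree on a new development goal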

The Rubric

There are five core competencies, each scored on a scale from 0 to 2, so a perfect core score is 10.

1. Quality

Completeness and clarity of deliverables

0 (did not meet expectations): More than one report was missing critical elements, requiring rejection and resubmission.

1 (partially met expectations): Most required elements present; severity ratings occasionally inconsistent; minor follow-up needed before reports are usable.

2 (fully met expectations): Reports are complete, clear, and immediately actionable. Severity ratings are justified. Supporting evidence is consistently included.

2. Speed

Output pace relative to estimates and deadlines

0 (did not meet expectations): More than one deadline missed without advance notice; output well below estimates; caused downstream delays for the team or client.

1 (partially met expectations): Generally on time; occasional slippage with advance notice; output met minimum expectations; minimal downstream impact.

2 (fully met expectations): Consistently on or ahead of schedule; output met or exceeded estimates; proactively flags blockers before they affect the project.

3. Accuracy

Correct findings, WCAG citations, and impact assessments

0 (did not meet expectations): Frequent false positives or duplicates, missed issues, or incorrect WCAG citations (wrong criterion, wrong level). Accuracy rate below 75%.

1 (partially met expectations): Occasional false positives, duplicates, or citation errors; issues are generally valid but may need correction before client delivery. Accuracy 75%-90%.

2 (fully met expectations): Minimal false positives and duplicates. WCAG citations correct, including level and techniques; findings appropriate for client review. Accuracy above 90%.

4. Communications

Responsiveness, clarity, and proactive escalation

0 (did not meet expectations): Unresponsive beyond one business day; communications unclear; blockers raised after delays already occurred.

1 (partially met expectations): Responds within one business day; occasionally vague; blockers identified, but sometimes late.

2 (fully met expectations): Responds promptly; communications are clear and complete on the first exchange; blockers are raised before they cause delays.

5. Availability

Reliability during committed hours and meetings

0 (did not meet expectations): Frequently unavailable without prior notice; more than one meeting missed without notice; project work stalled or reassigned due to unplanned absences.

1 (partially met expectations): Generally available; occasional unplanned absences; advance notice provided most of the time.

2 (fully met expectations): Consistently available during all committed hours; planned absences communicated in advance with enough lead time to plan around them.

Bonus Points (Up to 2 Additional Points)

Everyone has a bad day. Like extra credit in high school, bonus points can help make up for that. Bonus points are awarded for contributions made outside the testing project during the scoring period. Each bonus can be +0.5 or +1, depending on the positive impact. The total bonus cannot exceed 2, regardless of the number of activities completed. Bonus activities can include, but are not limited to:

  • Recorded instructional or orientation videos for other testers or team members
  • Completed coursework, practice exams, or earned credit toward an accessibility certification
  • Participated in authoring or reviewing a proposal or statement of work
  • Mentored a newer tester, including reviewing their work and providing feedback
  • Contributed to internal knowledge resources such as testing checklists, playbooks, or style guides
  • Represented the organization at an accessibility conference, webinar, or community event
  • Contributed to a published article, blog post, public resource, or social media on behalf of the organization

The important thing is that the testers know the list of bonus activities.

Deductions (Up to 2 Subtracted Points)

If bonuses are the carrot, deductions are the stick. There are things testers can do that make their test managers’ lives genuinely difficult, no matter how good the tester’s other work is. Apply deductions for documented violations during the scoring period. Deductions can be -0.5 or -1, depending on the severity or number of instances. The final score cannot be reduced below 0. Document all deductions in the evaluator’s comments. As with the bonus list, this one is not exhaustive; add or remove items to fit the environment the tester is working in.

  • Late or missing timesheet; failure to follow the time-off request process
  • Failure to complete required compliance or tool training by the posted deadline
  • Failure to use required templates, filing conventions, or designated finding-tracking tools
  • Testing outside of the assigned scope or communicating directly with a client without prior authorization
  • Repeating mistakes after previous correction
  • Unprofessional conduct toward a colleague, client, or project stakeholder
  • Failure to escalate a critical finding or blocking issue within one business day of identification
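
The arithmetic of the whole rubric fits in a few lines. Here is a minimal sketch that follows the rules exactly as stated above: five core categories scored 0 to 2, total bonus capped at +2, total deduction capped at 2, and a final score floored at 0. The function and variable names are mine, not part of the rubric.

    # Scoring rules as stated above: five core categories scored 0-2
    # (a perfect core score is 10), total bonus capped at +2, total
    # deduction capped at 2, and a final score that never drops below 0.

    CATEGORIES = ("Quality", "Speed", "Accuracy", "Communications", "Availability")

    def total_score(core: dict[str, float], bonuses: list[float], deductions: list[float]) -> float:
        assert set(core) == set(CATEGORIES), "score every category, every period"
        assert all(0 <= v <= 2 for v in core.values()), "core scores run from 0 to 2"
        bonus = min(sum(bonuses), 2.0)          # bonuses are +0.5 or +1, capped at 2 total
        deduction = min(sum(deductions), 2.0)   # deductions are -0.5 or -1, capped at 2 total
        return max(sum(core.values()) + bonus - deduction, 0.0)

    core = {"Quality": 2, "Speed": 1, "Accuracy": 2, "Communications": 2, "Availability": 2}
    print(total_score(core, bonuses=[1.0, 0.5], deductions=[0.5]))  # 9 + 1.5 - 0.5 = 10.0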

What the Rubric Actually Does

With this rubric in hand, at a glance:

  1. A manager will know who their low performers are.
  2. A team lead will know whom to assign to work together.
  3. A trainer will know which classes to assign to a tester.

Where the categories really shine is coaching: A tester who scores 2 in Accuracy and 1 in Speed requires different coaching than one who scores 1 in Accuracy and 2 in Speed. The Accuracy category in the rubric explicitly tracks correct WCAG citations, including criterion and level, so the scores tell you something concrete about each tester’s technical depth, not just their level of effort. Pair two complementary profiles for a deadline-driven audit, and you cover both ends. Pair two of the same weaknesses, and somebody on the project should be reviewing every report twice. The rubric does not say that directly, but the data it generates does, the moment you sit down to staff a project and compare scores side by side.

When to Start

Monthly is the right cadence. Quarterly or annual is too long: audits can be short, and the quality drift that matters in accessibility work happens in weeks, not quarters. Score on the same day each month, file the rubrics where HR can find them, and add dated evaluator comments as incidents and accomplishments occur, rather than reconstructing them in a panic when the next evaluation cycle starts.
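
As one hedged example of what “file the rubrics where HR can find them” can look like in practice: a single dated row per tester per scoring day, appended to a plain CSV. The file name, field names, and sample data are all illustrative.

    # Illustrative storage for monthly scores: one dated row per tester,
    # appended to a CSV anyone in HR can open. All names here are examples.
    import csv
    import os
    from datetime import date

    FIELDS = ["date", "tester", "Quality", "Speed", "Accuracy",
              "Communications", "Availability", "bonus", "deduction", "comments"]

    def append_evaluation(path: str, row: dict) -> None:
        is_new = not os.path.exists(path) or os.path.getsize(path) == 0
        with open(path, "a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=FIELDS)
            if is_new:
                writer.writeheader()  # first row of a new file is the header
            writer.writerow(row)

    append_evaluation("tester_evaluations.csv", {
        "date": date.today().isoformat(), "tester": "Example Tester",
        "Quality": 2, "Speed": 1, "Accuracy": 2, "Communications": 2,
        "Availability": 2, "bonus": 1.0, "deduction": 0.0,
        "comments": "Dated note: mentored the new hire on screen reader setup.",
    })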

The worst time to start tracking tester performance is the day you need the data. By then, whatever you produce will look exactly like what it is: documentation reverse-engineered from a decision already made. That document does not protect anyone, including you.

Final Thoughts

At AccessAbility Officer, we use this rubric monthly for every tester on our team. We do it because our customers deserve auditors whose strengths and weaknesses are known, whose pairings are deliberate, and whose work product reflects active oversight rather than hope. We also do it because our testers, most of whom have disabilities, deserve clear feedback, fair evaluation, and a documented path to advancement that does not depend on who happens to be in the room when a promotion decision is made.

The rubric is not the point. The discipline of capturing objective, dated, written performance data on a regular cadence is the point. The rubric is just the structure that makes that discipline sustainable.

If you manage an accessibility team and you cannot, right now, point to written performance data for every tester for the current scoring period, start this month. Score the team on the same day. File the results where they can be found. Add evaluator comments as the next month unfolds. By the time you actually need the data, whether for a promotion decision, building a team for a new project, a coaching conversation, or a layoff, you will have it. The next time your testers ask how they are doing, you will be able to tell them, specifically, with evidence.

That is what your employees deserve. That is what your customers are paying for. And that is the difference between an accessibility team that runs on documentation and one that runs on luck.