HiredScore’s Focus on Responsible AI and Bias Mitigation

Designing and building responsible AI that our customers can trust and adopt in line with their legal compliance obligations has been part of HiredScore’s mission statement since the beginning. We take responsible AI very seriously and regularly work with outside experts in AI, data privacy, employment law, and responsible AI to ensure adherence to best practices and the highest standards.

Overview of HiredScore AI Functionality

  • Spotlight is our proprietary candidate-focused AI that matches information submitted by candidates to job requisitions.
  • Fetch identifies potential matches between open jobs and people who have applied for other jobs previously and would like to continue to be considered for jobs at the company.  
  • HiredScore parses the information contained in job requisitions and resumes and matches candidate qualifications against job requirements, reflecting the extent to which each candidate’s stated experience and relevance meet the employer-defined qualifications for the role. Candidate-submitted data is taken at face value.
  • HiredScore does not write job requisitions, nor does it edit job requisitions in any way. If a job requisition itself contains language specifying role qualifications that provides advantages to certain segments of the workforce, HiredScore has no editing abilities that would supersede the employer-defined criteria for the role. HiredScore never adds requirements or ignores requirements identified from job requisitions.
  • HiredScore takes candidate-submitted data at face value. HiredScore does not ingest or use social media or any other external data, nor does it evaluate or assess a candidate beyond the data they submit when applying for the employer’s job.

HiredScore’s Spotlight Feature

HiredScore’s explainable AI reports the extent to which a candidate’s application matches the job requirements using a simple A, B, C, D system, where A denotes the highest match between a candidate’s application and an employer-defined job requisition. Put another way, the technology acts like a “smart mirror,” reflecting back the extent to which candidates fit the job requirements set by the employer. The A, B, C, D scores are never “curved”: if a job requisition has commonly held requirements, there may be many more As and Bs than Cs and Ds; conversely, if a job requisition has very uncommonly held requirements, there may be many more Cs and Ds than As and Bs. Each candidate receives a score indicating how well they align with the requirements, and that score is independent of the pool and of other candidates’ alignment or misalignment.
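As an illustrative sketch of what an uncurved grading scheme means in practice: the actual grade boundaries are not public, so the thresholds and function below are purely hypothetical. The point is only that each grade depends on the candidate’s own alignment with the requisition, never on how the rest of the pool scored.

```python
def grade(match_fraction):
    """Map a candidate's requirement-match fraction to a letter grade.

    Thresholds here are hypothetical and illustrative only; the grade
    depends solely on the candidate's own alignment with the requisition,
    never on other candidates in the pool (i.e., no curving).
    """
    if match_fraction >= 0.9:
        return "A"
    if match_fraction >= 0.7:
        return "B"
    if match_fraction >= 0.5:
        return "C"
    return "D"

# Because each candidate is graded independently, a pool where most people
# meet commonly held requirements can legitimately contain mostly As and Bs.
pool = [0.95, 0.92, 0.75, 0.40]
grades = [grade(m) for m in pool]  # ["A", "A", "B", "D"]
```

Under this sketch, adding or removing candidates from the pool changes no one else’s grade, which is the behavior the “never curved” description above implies.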

Just as meeting job requisition requirements is not the end of the inquiry for employers selecting candidates for employment, HiredScore matching scores are not the end of the inquiry either.

The model’s output, the A, B, C, D designation of smart-mirror reflections, does not supplant human judgment, nor does it recommend any candidate. It merely helps recruiters and hiring managers expediently identify which applicants have stated that they meet which requirements of the jobs to which they apply. Recruiters and hiring managers always have complete discretion over which applicants they select and what information they consider. HiredScore does not automatically make or execute any decisions; it simply indicates which candidates match the stated job requirements and to what extent.

NYC Local Law 144 Context and Definitions

New York City’s Local Law 144, which came into effect in July 2023, regulates how automated employment decision tools, or “AEDTs” (as defined in the law), either make job decisions or influence humans to make job decisions. Is HiredScore’s Spotlight an AEDT?

Simply put, no, it is not. The Spotlight functionality was not designed as nor intended to be an AEDT as defined under the NYC Law. Under the NYC Law, a tool constitutes an AEDT if an employer uses the tool in its hiring process in such a way as to “substantially assist or replace discretionary decision making.” This phrase is defined very specifically in the NYC Law, and Workday encourages our customers to review this definition with their legal counsel. The output of the Spotlight candidate tool is not intended to be the sole criterion, to be weighted more heavily than any other criterion, or to overrule conclusions from human decision-making at our customer’s organization. Nonetheless, Workday (and HiredScore, prior to Workday’s acquisition of HiredScore) routinely conducts bias testing of its AI products that have the potential to significantly impact individuals’ economic opportunities. While Workday engaged an external auditor to conduct the bias testing of the Spotlight AI tool according to the approach described in the NYC Law, the purpose of this testing is in furtherance of Workday/HiredScore’s commitments to identify and mitigate potential bias. Customers may need to conduct their own testing and bias audits with an external auditor depending on their legal obligations.

New York City Local Law 144

The NYC Law (Local Law 144) defines the automated employment decision tools (AEDTs) that are in scope of the legislation and the term “bias audit” with the following (excerpted and summarized from 2021 N.Y.C. Local Law No. 144, N.Y.C. Admin. Code):

  • Automated employment decision tool—any computational process, derived from machine learning, statistical modeling, data analytics, or artificial intelligence, that issues simplified output, including a score, classification, or recommendation, that is used to substantially assist or replace discretionary decision-making for making employment decisions that impact natural persons.
  • Machine learning, statistical modeling, data analytics, or artificial intelligence—a group of computer-based mathematical techniques that generate a prediction of a candidate’s fit or likelihood of success or classification based on skills/aptitude. The inputs, predictor importance, and parameters of the model are identified by a computer to improve model accuracy or performance and are refined through cross-validation or by using a train/test split.
  • Simplified output—a prediction or classification that may take the form of a score, tag or categorization, recommendation, or ranking.
  • Employment decision—to screen candidates for employment or employees for promotion within the city.
  • Bias audit—an impartial evaluation by an independent auditor. Such bias audit shall include but not be limited to the testing of an automated employment decision tool to assess the tool’s disparate impact on persons of any component 1 category (race/ethnicity and sex/gender at minimum).
  • Metrics—Under LL 144, bias is assessed using impact ratios to determine whether one group is favored by the system over another. When the system results in a binary outcome (classification), the following metric is used to calculate the impact ratio:

  Impact Ratio = (Selection rate of one group) / (Selection rate of the group with the highest selection rate)

where the selection rate for group x is defined as:

  SRx = (# successful candidates in group x) / (# candidates in group x)

When the system results in a continuous outcome, such as a score or ranking, the following metric is used:

  Impact Ratio = (Average score of individuals in a group) / (Average score of individuals in the highest-scoring group)
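The two impact ratio calculations above can be sketched in a few lines. The function names and input shapes below are my own for illustration; they are not taken from the law or the auditor’s methodology.

```python
def impact_ratios_binary(outcomes):
    """Impact ratios for a binary (selection) outcome.

    `outcomes` maps group name -> (successful_count, total_count).
    Returns group -> selection_rate / highest_selection_rate.
    """
    rates = {g: successes / total for g, (successes, total) in outcomes.items()}
    best = max(rates.values())
    return {g: r / best for g, r in rates.items()}

def impact_ratios_continuous(scores):
    """Impact ratios for a continuous outcome (score or ranking).

    `scores` maps group name -> list of individual scores.
    Returns group -> mean_score / highest_group_mean_score.
    """
    means = {g: sum(vals) / len(vals) for g, vals in scores.items()}
    best = max(means.values())
    return {g: m / best for g, m in means.items()}
```

For example, with selection rates of 0.40 and 0.28 the second group’s impact ratio is 0.28 / 0.40 = 0.7, below the four-fifths (0.8) threshold discussed below.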

  • Adverse impact (4/5th rule): Under the Equal Employment Opportunity Commission’s Uniform Guidelines, adverse impact (bias) is indicated when the selection rate of one subgroup is less than four-fifths (80%) of that of the group with the highest selection rate. As such, this is the threshold used in the test results below to determine whether a system is biased based on the impact ratio metrics provided above.
  • Intersectional impact ratios. In addition to examining bias at the standalone level, where the rates of single subgroups (such as black and white) are compared, impact ratios are also calculated for intersectional groups. Here, impact ratios are calculated for groups based on their gender and ethnicity intersectional categorization.
  • Data Quality Assurance. Given that the 4/5th rule is most appropriate for larger sample sizes, the data is cleaned to remove groups with small sample sizes before the data is analyzed. This includes removing any missing responses or responses from those who have indicated they would “prefer not to say.” Data cleaning can also involve combining smaller groups into a single “other” category.
  • Small Sample Sizes. A small sample is defined here as a group representing less than 5% of the individuals or containing fewer than 3 individuals. If a group with a small sample size has the highest rate or score, it is not used as the denominator for the impact ratio metric; instead, the group with the highest score/rate that has a sufficient sample size is used. As a result, the impact ratio for the underrepresented group can be greater than 1.
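The data-quality and small-sample rules above can be sketched as follows. The thresholds (5%, 3 individuals) come from the text; the function names, response labels, and data shapes are hypothetical. Intersectional analysis reuses the same machinery with combined keys such as "Female / Black" in place of standalone groups.

```python
def clean_groups(counts, drop=("prefer not to say", "missing"),
                 min_frac=0.05, min_n=3):
    """Apply the audit's data QA rules.

    `counts` maps group -> number of individuals (standalone groups such as
    "Black", or intersectional keys such as "Female / Black"). Non-responses
    are removed; a group is flagged 'small' if it is under min_frac of the
    remaining individuals or has fewer than min_n members.
    """
    kept = {g: n for g, n in counts.items() if g.lower() not in drop}
    total = sum(kept.values())
    small = {g for g, n in kept.items() if n < min_n or n / total < min_frac}
    return kept, small

def impact_ratios(rates, small_groups):
    """Impact ratios using, as the denominator, the highest rate among groups
    with a sufficient sample; a small group that outperforms that denominator
    therefore gets a ratio above 1."""
    best = max(r for g, r in rates.items() if g not in small_groups)
    return {g: r / best for g, r in rates.items()}
```

For example, given counts {"A": 60, "B": 38, "C": 2, "Prefer not to say": 10}, the non-responses are dropped and C is flagged as small; with rates {"A": 0.50, "B": 0.40, "C": 0.60}, the denominator is A’s 0.50 rather than C’s 0.60, so C’s impact ratio comes out above 1.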

Results of Bias Testing Spotlight According to NYC’s Approach

Workday commissioned an external auditor to conduct testing of the Spotlight AI using the approach and formulas provided in the NYC Law for its “bias audits.”

Date of Testing Completion: 12 February 2024

Distribution Date: 1 February 2023–12 February 2024

Data Collected Description: HiredScore simplified output showing how closely candidates match the employer-defined job requirements for the jobs they applied to, using an A, B, C, D system. The data for this audit comprises a random sample of data from 10 global large customers over the last 12 months of requisitions and candidates, covering requisitions that were opened and/or filled between February 2023 and February 2024.

Source of Data Collected: The applicant tracking system (ATS) of the employers, which includes HiredScore grades for applicants.

Audit Outcome: There is no evidence of disparate impact based on the calculated impact ratios and the data analyzed for the standalone and intersectional analyses.

Audit Detailed Results:

[Detailed impact ratio results table not reproduced here.]

*The results shown above do not include results for Native American, Native Hawaiian, or Pacific Islander individuals, because the number of individuals in these categories was less than 0.2% of the total population.