Inspired Selections UK Ltd are a recruitment firm based in Birmingham specialising in the optometry market. Trading since 2008, the company has established an enviable reputation for the quality of their services, both in terms of clients requiring optometry staff and applicants looking for positions in the field.
Inspired Selections UK Ltd use a unique question-based system with over 65 possible questions to ascertain the true requirements of prospective clients. This approach enables the most effective method of matching applicants to positions.
The main problem Inspired Selections UK Ltd experienced was dealing with the volume of applicants they currently have on their books and matching them to potential jobs. One aspect of this challenge has been to distinguish between the different types of applicants according to levels of qualifications.
A second aspect is that jobs in the optometry sector involve a mix of highly qualified opticians and retail staff. An optometrist needs to have a degree-level qualification in optometry in order to be able to perform eye examinations and dispense medicines.
However, an optical assistant does not require a degree in optometry as the role primarily involves glasses and frames sales and booking eye examinations. Therefore, candidates with retail experience could be qualified for these positions. Separating those candidates qualified for optometrist jobs and those suitable for assistant jobs was time consuming for Inspired Selections UK Ltd staff.
A further issue faced by Inspired Selections UK Ltd is the matching of an advert to the best candidates. Typically, this is done manually but given the number of candidates on file, this process can be considerably time consuming and not always accurate.
Inspired Selections UK Ltd's main goal was to organise their CV database so that applicant optometrists could be automatically separated from optical assistants. The system also needed to be able to separate prospective candidates according to their level of education.
A second required system needed to rank applicants’ suitability for individual job advertisements by matching CVs to the advert. Both systems needed to scale up to thousands of CVs and have low computational complexity. Inspired Selections UK Ltd required software to do this, and it was supplied by Think Beyond Data in the form of C++ source code. The company provided 1,450 CVs in the form of *.doc, *.docx and *.pdf documents.
Think Beyond Data staff used text clustering to automatically organise the CVs for the first required system. The system was tailored to read text from differing file formats, including PDF files, *.doc files, XML-based *.docx files and simple TXT files.
Each CV is then translated into a document (TF-IDF) vector with one entry per unique word, weighted by its term frequency (how often the word appears in that CV) and its inverse document frequency (how rare the word is across all CVs). This gives words that appear in few CVs, and are therefore more distinctive, more weight in the vector. These vectors are then clustered based on their similarities to each other.
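The weighting described above can be illustrated with a minimal Python sketch. It is not the supplied C++ code: the tokeniser, the `tfidf_vectors` helper, the exact TF and IDF formulas (relative term frequency multiplied by the natural log of the inverse document frequency) and the sample CV texts are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse {word: weight} vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each word.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty documents
        vectors.append({
            # Relative term frequency times log inverse document frequency.
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return vectors


# Hypothetical CV snippets, invented for this example.
cvs = [
    "Optometrist with optometry degree performing eye examinations",
    "Optical assistant with retail sales experience",
    "Retail assistant with frames sales experience",
]
vectors = tfidf_vectors(cvs)
```

In this toy data the word "with" occurs in every CV, so its weight is zero, while "optometrist" occurs in only one CV and receives a comparatively high weight.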
Hierarchical clustering was chosen as it creates a tree structure whereby each individual CV is represented by a leaf node of the tree. At each branch, two groups of highly related CVs are merged into a single group until only a single group remains. This tree structure allows Inspired Selections UK Ltd to drill down through the CVs to their desired level, where they are then separated out into related groupings.
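The bottom-up merging described above can be sketched in Python as follows. This is a hedged illustration only: the source does not state which similarity measure or linkage rule was used, so cosine similarity and average linkage are assumptions, as are the function names and the toy vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse {word: weight} vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def agglomerate(vectors):
    """Repeatedly merge the two most similar clusters until one remains.

    Returns the tree as nested tuples whose leaves are document indices.
    Expects at least one vector.
    """
    # Each cluster is a (subtree, member_indices) pair; start with one leaf per CV.
    clusters = [(i, [i]) for i in range(len(vectors))]

    def average_similarity(members_a, members_b):
        # Average linkage: mean pairwise similarity between the two groups.
        total = sum(cosine(vectors[i], vectors[j])
                    for i in members_a for j in members_b)
        return total / (len(members_a) * len(members_b))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sim = average_similarity(clusters[x][1], clusters[y][1])
                if best is None or sim > best[0]:
                    best = (sim, x, y)
        _, x, y = best
        merged = ((clusters[x][0], clusters[y][0]),
                  clusters[x][1] + clusters[y][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters[0][0]


# Toy weighted vectors for four hypothetical CVs.
toy_vectors = [
    {"optometrist": 0.9, "examination": 0.4},
    {"optometrist": 0.8, "degree": 0.5},
    {"retail": 0.9, "sales": 0.3},
    {"retail": 0.7, "frames": 0.6},
]
tree = agglomerate(toy_vectors)
```

Here the two optometrist-like CVs (indices 0 and 1) merge first, then the two retail-like CVs (2 and 3), and the final merge joins the two groups at the root. This naive version recomputes every pairwise similarity on each pass, so it is only suitable for small examples.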
Using this approach, optometrist and optical assistant applicant CVs can be separated out, while still allowing Inspired Selections UK Ltd to select documents manually rather than relying on a fully automated process. Hierarchical clustering is computationally expensive, so the requirement to handle up to thousands of documents posed a particular challenge.
A second aspect of the system was to provide a description for Inspired Selections UK Ltd of each cluster in terms of the CVs held and what they represent with words such as ‘optometrist’ and ‘retail’. This was achieved by finding the top keywords in the subset of documents and presenting them to the company as they drill down through the dataset. The team were able to take this output and generate a dataset suitable for visualisation on a web page (see Figure 1).
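One plausible way to derive such keyword descriptors is sketched below in Python. The source does not say how the top keywords were scored, so summing the weights of each word across the CVs in a cluster is an assumption, and `top_keywords` and the sample vectors are hypothetical.

```python
from collections import Counter


def top_keywords(cluster_vectors, k=3):
    """Return the k words with the highest total weight in a cluster."""
    totals = Counter()
    for vector in cluster_vectors:
        for word, weight in vector.items():
            totals[word] += weight
    # most_common sorts by total weight, highest first.
    return [word for word, _ in totals.most_common(k)]


# Toy weighted vectors for two hypothetical CVs in the same cluster.
optometrist_cluster = [
    {"optometrist": 0.9, "examination": 0.4},
    {"optometrist": 0.8, "degree": 0.5},
]
keywords = top_keywords(optometrist_cluster, k=2)
```

With these toy weights the cluster would be described by "optometrist" followed by "degree".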
Figure 1: An example of hierarchical clustering of 10 CVs represented as a dendrogram showing CV similarity in each cluster with keyword descriptors for particular clusters of documents.
The second software system required by Inspired Selections UK Ltd was to enable them to rank a set of CVs in relation to a given job advert, enabling staff to quickly locate the best-suited CVs for the job vacancy. To achieve this, a similar methodology to the clustering approach was used.
A TF-IDF vector is generated for the job advert, and the vector distance between each CV and the advert is then calculated in a similar manner to the clustering, resulting in a value between zero and one. Values close to one indicate high similarity and values closer to zero indicate low similarity.
A ranked list can then be created with the most similar CVs at the top. An example of CVs ranked against an optometrist advert is shown in Figure 2. A comparison of CVs to three differing job vacancies is shown in Figure 3.
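The ranking step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the supplied implementation: the source does not name the measure used, so cosine similarity (which yields a value between zero and one for non-negative TF-IDF weights) is assumed, the advert is included when computing document frequencies, and the helper names, CV identifiers and texts are invented.

```python
import math
import re
from collections import Counter


def tfidf_vectors(documents):
    """Return one sparse {word: weight} TF-IDF vector per document."""
    tokenized = [re.findall(r"[a-z]+", doc.lower()) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    return [
        {word: (count / (len(tokens) or 1)) * math.log(n_docs / doc_freq[word])
         for word, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_cvs(advert, cvs):
    """Return (cv_name, score) pairs, most similar to the advert first."""
    names = list(cvs)
    # Vectorise the advert together with the CVs so they share one vocabulary.
    vectors = tfidf_vectors([advert] + [cvs[name] for name in names])
    advert_vector, cv_vectors = vectors[0], vectors[1:]
    scored = [(name, cosine(advert_vector, vector))
              for name, vector in zip(names, cv_vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical advert and CV texts, invented for this example.
advert = "Optometrist required to perform eye examinations"
cvs = {
    "cv_a": "Retail assistant with frames sales experience",
    "cv_b": "Qualified optometrist experienced in eye examinations",
    "cv_c": "Optical assistant booking eye examinations",
}
ranking = rank_cvs(advert, cvs)
```

In this toy example the optometrist CV ranks first, the optical assistant CV that mentions eye examinations ranks second, and the retail CV, which shares no words with the advert, scores zero.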
Figure 2: An example of an optometrist advert on the left with a set of CVs on the right ranked in terms of their similarity to the job vacancy. Please note all applicant names have been changed for the purpose of this example.
Figure 3: A visualisation of how a set of CVs match a range of job vacancies, enabling the better-matching CVs to be quickly isolated from the less suitable ones. Document distance values close to one demonstrate high similarity whilst distance values closer to zero demonstrate low similarity.
The overall result was an improved process with two bespoke software systems that enabled better organisation of Inspired Selections UK Ltd’s optician-related CVs. The supplied software acts as a back-end system from which they can create a visualisation of the clusters of CVs pertaining to differing skill sets, as well as the ranking of CVs against job vacancies.
The system provides fast clustering and the ability to handle hundreds of thousands of documents, with cluster descriptions and ranked results for a given job vacancy. This unique piece of software, which is not available commercially, enables Inspired Selections UK Ltd to provide a much better service to their optician-related recruitment clients.