The Power of Large Language Models

Behind Empath’s Employee Skills Inference

Recently, interest in OpenAI’s ChatGPT has exploded. Its ability to provide meaningful answers to a wide range of questions has dazzled users well beyond the narrow universe of AI practitioners. ChatGPT is built on OpenAI’s GPT-3 family of large language models (LLMs), the latest in a line of LLM innovation. GPT-3 was trained on a huge corpus of web text, books, and Wikipedia entries, primarily using unsupervised learning (training where no labeled data of correct or incorrect results is provided).

For specific tasks where it is feasible to provide labeled data (marking answers to questions as correct or identifying generated text as relevant), fine-tuning an LLM (performing additional training on the base model with that labeled data) is a better approach and can lead to much greater accuracy. For this kind of task-specific fine-tuning, Google’s open source BERT model (or one of its successors) has become the dominant choice. ChatGPT itself was fine-tuned with supervised learning and reinforcement learning to drive better text generation, especially to avoid harmful or biased responses.

Examples of tasks that require much higher accuracy include discussing symptoms and receiving medical advice and prescriptions, presenting a legal argument and hoping to get back the most relevant legal cases to buttress it, and describing a car’s mechanical problem and retrieving repair instructions. Simply using unsupervised LLMs will not yield the level of accuracy needed for these or similar tasks.

Let’s describe a specific task that we are very familiar with: inferring skills for employees in a company’s workforce. Empath was started in 2020 with the goal of bringing truly accurate skills inference to a company’s workforce. With a complete and accurate skills inventory in hand, employees can plan their next job and career progression, and companies can more efficiently build teams and manage their workforce.

We infer skills by taking the full “digital footprint” of each employee (any source of language to, from, or about the employee) and determining not just that the employee has a skill but also their proficiency in that skill. Proficiency levels are critical since many workforce skills are possessed by almost every employee in a company’s workforce (so inferring skills as binary tags is not actually useful). Examples include specific capabilities such as project management and Microsoft Excel, and soft skills such as teamwork and collaboration. Of course, to infer properly we need meaningful and distinct descriptions of the behavior exhibited at each proficiency level of a skill. Fortunately, companies and consulting firms such as IBM and EY have been creating richly described skills taxonomies for decades.

What we’ve done is determine which skill proficiency levels are semantically similar to the language available in an employee’s digital footprint. The meaning of a skill description can be expressed as an “embeddings vector” (a sequence of several hundred numbers that represents the meaning of a fragment of language). The text in the employee’s digital footprint (or more specifically a part of the footprint, as we will discuss shortly) can also be expressed as an embeddings vector. The cosine distance between the two vectors reflects how similar the two fragments of text are, whether or not they share any specific words. Leveraging the power of BERT and successor LLMs, we were able to perform this task with greater than 95 percent accuracy at our first customer, AT&T.
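To make the comparison concrete, here is a minimal sketch of matching a footprint fragment to the closest proficiency level by cosine similarity (cosine distance is simply one minus this value). It is an illustration only, not Empath’s production code: the function names are hypothetical, and the four-number vectors are made-up stand-ins for the several-hundred-dimension embeddings a BERT-style model would produce.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def best_proficiency_level(footprint_vec, level_vecs):
    """Return (level, similarity) for the level description closest to the footprint text."""
    scored = {level: cosine_similarity(footprint_vec, vec)
              for level, vec in level_vecs.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]

# Toy vectors (assumed values) standing in for embeddings of the three
# proficiency-level descriptions of one skill in a taxonomy.
project_management_levels = {
    "basic": [0.9, 0.1, 0.0, 0.2],
    "intermediate": [0.5, 0.6, 0.1, 0.3],
    "advanced": [0.1, 0.9, 0.4, 0.2],
}
# Toy vector standing in for the embedding of one footprint fragment,
# for example a paragraph from a performance review.
performance_review_vec = [0.2, 0.8, 0.5, 0.1]

level, score = best_proficiency_level(performance_review_vec, project_management_levels)
print(level, round(score, 2))
```

With these toy numbers the review fragment lands closest to the “advanced” description, even though no words are compared at any point; only the vectors are.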

But aren’t there other ways of doing skills inference, you might ask? For at least a decade, several other companies have claimed to infer skills, albeit usually focused on recruiting. The general approach has been to take resumes for job candidates that are filled with keywords and phrases and mark the candidate as having the relevant skills. These products maintain simple skill libraries, which are really just lists of tags (no levels or descriptions). If one of the skill tags appears often enough in the resume, the candidate is marked as having the skill. This is a reasonable approach to characterizing an unknown person and presumably made selecting candidates for interviews a bit more efficient. Unfortunately, this method cannot determine levels of proficiency in skills. It also relies on matching keywords or synonyms of skills, rather than using the underlying meaning to match employees and skills via semantic similarity, as we do with LLMs.
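The limitation of tag matching is easy to see in a small sketch. The function below is a hypothetical, simplified version of the keyword approach described above (the name, the `min_mentions` threshold, and the sample resumes are all assumptions for illustration, not any vendor’s actual product).

```python
import re

def keyword_skill_tags(resume_text, skill_tags, min_mentions=1):
    """Mark a skill as present when its tag appears at least min_mentions times."""
    text = resume_text.lower()
    found = []
    for tag in skill_tags:
        # Whole-phrase, case-insensitive match on the literal tag text.
        pattern = r"\b" + re.escape(tag.lower()) + r"\b"
        if len(re.findall(pattern, text)) >= min_mentions:
            found.append(tag)
    return found

tags = ["project management", "Microsoft Excel"]

# A resume written with the expected keywords.
keyword_resume = "Project management lead. Advanced Microsoft Excel user."
# A resume describing the same kind of work without the keyword.
descriptive_resume = "Coordinated timelines, budgets, and stakeholders across five teams."

print(keyword_skill_tags(keyword_resume, tags))
print(keyword_skill_tags(descriptive_resume, tags))
```

The first resume is tagged with both skills, while the second, which plainly describes project management work, gets no tags at all. Neither result says anything about proficiency.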

Over the last three years we have continuously tracked improvements in LLMs and, more importantly, optimized our usage of them. Among the many non-obvious improvements we have made is feeding the embeddings vectors, along with various metrics for distance, to large machine learning models. In these models each part of the employee’s digital footprint (their performance review, their interactions in work systems such as CRM or issue data, and their project descriptions) is separately weighted (indeed, we have a patent pending on this combined LLM/ML approach). With large amounts of labeled data from the companies we work with (whose employees mark the inferred skills and levels as correct or not), we have also extensively fine-tuned successor models to BERT for this specific task. This also allows us to determine which parts of the employee footprint are worth the effort to continue to extract information from.
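As a rough illustration of what “separately weighted” can mean, the sketch below trains a tiny logistic regression on one similarity score per footprint source, with labels standing in for employees confirming or rejecting an inferred skill. Everything here is an assumption made for the example: the choice of logistic regression, the feature layout, and the made-up numbers. It is not the patent-pending method, only a small stand-in for the idea that a model can learn how much to trust each source.

```python
import math

# One similarity feature per footprint source (hypothetical layout).
SOURCES = ["performance_review", "work_system", "project_descriptions"]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, features):
    """Probability that the inferred skill level is correct."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))

def train(rows, labels, epochs=2000, lr=0.5):
    """Fit logistic regression with plain batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(rows, labels):
            error = predict(weights, bias, features) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Made-up training data: similarity of each source to a skill-level description,
# and whether the employee marked the inference as correct (1) or not (0).
rows = [
    [0.90, 0.50, 0.70], [0.85, 0.40, 0.80], [0.80, 0.60, 0.60], [0.75, 0.45, 0.75],
    [0.30, 0.55, 0.40], [0.25, 0.50, 0.35], [0.20, 0.45, 0.50], [0.35, 0.60, 0.30],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train(rows, labels)
for source, weight in zip(SOURCES, weights):
    print(f"{source}: {weight:.2f}")
```

In this toy data the work-system similarity barely differs between confirmed and rejected inferences, so the model gives it less weight than the performance-review similarity. Learned weights like these are also one way to see which parts of the footprint are worth the extraction effort.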

Due to their ability to provide meaningful responses to any question, large language models trained against a corpus with unsupervised learning can yield seemingly magical results when applied to the problem of unrestricted chat. Yet, LLMs that are fine-tuned for a specific task with labeled data of correct and incorrect results are actually much more powerful in their capacity to change how work is done. As our customers know, one of the most valuable of these specific tasks is inferring skills for company employees.


