Classifying unknown gestures with modified loss function

Kornel Dylski - September 28, 2020

Image classifiers are a common concept in deep learning projects. While the theory is well researched and some state-of-the-art achievements in this area are truly impressive, there are still challenges in real-life applications.

Challenges with gestures application

We’ve recently had the opportunity to create a real-time classifier for webcam images. We created a neural network that could be trained to recognize gestures shown in front of the camera.

Some of the initial challenges in the project included an insufficient number of images for training and the lack of an explicit indication of the unknown category.

While the former is a typical issue in deep learning projects, i.e. the training dataset isn’t large enough to cover every possibility. The lack of diverse data can be handled by data augmentation (described e.g. here), which involves generating photos shot from different perspectives, with different exposure and lighting.

The latter issue was more tricky but no less critical – the classifier records continuously but because the participant may not show any gestures, the forecasts should return nothing most of the time. We thus needed to create a classifier that would not only recognize a gesture, but also be able to acknowledge that none is shown.

Following some research, we found that the most common solution to the problem was only a compromise and involved adding an additional category and training the identifier with random pictures from outside of the basic scope. This would require adding an endless number of pictures. We wanted to avoid that as it does not sufficiently cover the unknown category concept.

We did some experiments and modifications that can cover the unknown category, which will be discussed in this article. You can also check out notebook which presents experiments.

Why handling the unknown is important

For a basic network which does not consider unknown as a category, all unidentified pictures will have to fall into one of the existing known categories (almost randomly). This is really bad, because the network could make confident predictions and still be wrong.

Here are two confusion matrices which show validation results of the same network. The first for the pictures taken from known categories. The second is for pictures taken from unknown categories as well. This shows how deceptive it can get when you only validate with categories from your training dataset.

Confusion matrix for the pictures taken from known categories

Confusion matrix for the pictures taken from known categories

Classifier loss function

To address this issue you should first know how the classifier understands the concept of category and how loss function makes it to learn. If you are familiar with this you can skip to paragraph six.

To train the classifier we first gather data arranged into categories. We pass this data through a neural network. Then, loss function is applied on the output and the corresponding category.

Categories are represented by numbers, so the number 5 is the tiger.

The loss function is crucial. It’s usually a pure function that takes prediction (network output) and target (corresponding category) vectors, and returns the loss value. Loss value tells the network how far the network output indication was from the correct category.

Standard approach with categorical cross-entropy

Understanding the formula is not required to understand the concept (!)

For classification, the basic loss function is cross-entropy (CE). You can work it out later, but for now you only have to know that it compares the vectors tᵢ and sᵢ and then returns a value that represents the difference. The function can be perfectly applied in this case.

Cross entropy returns value that indicates difference between vectors

But first, both prediction and target have to be converted into vectors to fit into this function. To do so, softmax function is applied on prediction and one-hot encoding is applied on target.

One hot encoding simply takes a category represented by an integer number and returns a vector whose all values are zeroed except the one at the index corresponding to this number (see the example below).

One hot converts the fourth category into a vector made of zeros, with one at the fourth index

Softmax is a function that converts a vector of any number’s values into a vector of numbers with values between zero and one, so that the sum of all values is one. Thanks to this, the resulting converted vector can be treated as a probability that the result is a specific category.

Same here, knowing the formula is not required to understand the concept

The vector is shrinked between 0 and 1 in the way that the sum of the values equals 1

Please note the side effect – the largest value changes disproportionately to the others, taking the largest portion. This causes the function to be able to indicate only one category.

Change categorical to binary cross-entropy

Our loss function categorical cross-entropy indicates which category is the most probable. But the network should be equally able to classify multiple categories or none of them.

To allow this, you have to change softmax to sigmoid function and apply cross-entropy individually to each value, making the probabilities independent. Instead of determining which category is most probable, we determine, for each category, how probable it is that it occured. This is now called binary cross-entropy (BCE).

Each vector value shows its own loss and average is returned

Add unknown category

We divide categories from a dataset into three types.

Note the difference. The dataset category is a set of images signed with the same label. The network category is a number that can be recognized by the network as a label.

First, known categories. These are common categories which the network recognizes, and pictures from these categories can be predicted by the network as one of the categories.

Second, unknown categories. We’ve chosen a few additional categories which do not need to be classified separately. All pictures from them are labeled unknown and should be recognised by the network as unknown.

Third, unseen categories. They are similar to unknown categories and should also be recognized as such by the network. The images of these categories, however, are not used in training, but for validation. The network will have to recognize them as unknown even though it has not seen any analogous pictures before.

Categories types

To start handling pictures from unknown categories, we only add a single category at the beginning, and call it “na”

The “na” category will be used to handle unknown pictures

The naive approach (which, in fact, may prove efficient if you don’t expect exceptional pictures) would suggest to convert this to one-hot vector and, as per usual, pass it to the loss function. Pictures from unknown categories will be then learned and similar pictures will be successfully marked as “na”.

The results are much better with this category as most of the unknown and unseen categories fall into “na”. But as shown below (first row), unseen pictures spread across other categories as well.

Confusion matrix for binary cross-entropy

Modify loss function to handle unknown category

We’ve modified the loss function to consider unknown (and unseen) pictures a bit differently. Intuition suggests that the function should learn that all unknown categories are not probable at all. Network prediction for these pictures should thus be a vector of zeros.

During training, targets for unknown categories should also be changed to vectors of zeros (instead of vectors with zeros and the value 1 at beginning).

# apply sigmoid and one-hot
input = input.sigmoid()
target = F.one_hot(target, input.shape[1]).float()

# change all target category “na” (placed at 0 index) to zero
target[:, 0] = 0

# finally count bce loss
loss = F.binary_cross_entropy(input, target)

Confusion matrix for modified binary cross-entropy

The results have improved. Now unseen pictures more often tend to be predicted as unknown.

With this modification the accuracy for known categories is a bit lower, but the accuracy for unseen categories has vastly improved, along with the overall accuracy (see the results in the notebook).

While the results still depend heavily on the number of images seen by the network, it is a step in the right direction.

Do not forget about metrics

You should also modify metrics and other functions which are interpreting network output. Metrics have to know that a vector with values close to zero indicates an unknown category.

Here is an example of modified accuracy function. All vectors whose highest value is below the threshold are considered to fall into the “na” category.

def accuracy(input, target, thresh=0.4, na_idx=0):
  valm, argm = input.max(-1)

  # results below threshold are considered as "na" category
  argm[valm < thresh] = na_idx

  return (argm==target).float().mean()

Look at experiments

Notebook contains implementation, training and validation results. All was implemented in pytorch with fastai2 framework.

We performed three experiments to compare results.

  1. simple CE classification
  2. BCE classification with additional “na” category
  3. BCENa (modified) classification with additional “na” category

For experiments, we used the Imagenette dataset which is a subset of ImageNet but contains only 10 categories.

The code and notebook are available on my github. You are welcome to try the code in your projects.

Any feedback, comments and questions are welcome.

About the author

Kornel Dylski

Software Engineer

Kornel is a frontend engineer with several years of experience building robust web applications. Apart from web solutions, he participates in machine learning projects. He has always been interested in physics, which led him to explore artificial intelligence and programming languages such as Python.
His focus is on solving technical problems and providing data-driven solutions to clients' needs. He has a creative spirit and loves to make people laugh or smile while working together on complex issues.

Are you curious if AI
is something that can
change your company?

Let’s find out.

JOIN WORKSHOPS

More articles

Find us on

Need help with implementing AI in your business?

Let's talk blue circle

This site uses cookies for analytical purposes.

Accept Privacy Policy

In the interests of your safety and to implement the principle of lawful, reliable and transparent processing of your personal data when using our services, we developed this document called the Privacy Policy. This document regulates the processing and protection of Users’ personal data in connection with their use of the Website and has been prepared by Nexocode.

To ensure the protection of Users' personal data, Nexocode applies appropriate organizational and technical solutions to prevent privacy breaches. Nexocode implements measures to ensure security at the level which ensures compliance with applicable Polish and European laws such as:

  1. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) (published in the Official Journal of the European Union L 119, p 1); Act of 10 May 2018 on personal data protection (published in the Journal of Laws of 2018, item 1000);
  2. Act of 18 July 2002 on providing services by electronic means;
  3. Telecommunications Law of 16 July 2004.

The Website is secured by the SSL protocol, which provides secure data transmission on the Internet.

1. Definitions

  1. User – a person that uses the Website, i.e. a natural person with full legal capacity, a legal person, or an organizational unit which is not a legal person to which specific provisions grant legal capacity.
  2. Nexocode – NEXOCODE sp. z o.o. with its registered office in Kraków, ul. Generała Henryka Kamieńskiego 51, 30-644 Kraków, entered into the Register of Entrepreneurs of the National Court Register kept by the District Court for Kraków-Śródmieście in Kraków, 11th Commercial Department of the National Court Register, under the KRS number: 0000686992, NIP: 6762533324.
  3. Website – website run by Nexocode, at the URL: nexocode.com whose content is available to authorized persons.
  4. Cookies – small files saved by the server on the User's computer, which the server can read when when the website is accessed from the computer.
  5. SSL protocol – a special standard for transmitting data on the Internet which unlike ordinary methods of data transmission encrypts data transmission.
  6. System log – the information that the User's computer transmits to the server which may contain various data (e.g. the user’s IP number), allowing to determine the approximate location where the connection came from.
  7. IP address – individual number which is usually assigned to every computer connected to the Internet. The IP number can be permanently associated with the computer (static) or assigned to a given connection (dynamic).
  8. GDPR – Regulation 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of individuals regarding the processing of personal data and onthe free transmission of such data, repealing Directive 95/46 / EC (General Data Protection Regulation).
  9. Personal data – information about an identified or identifiable natural person ("data subject"). An identifiable natural person is a person who can be directly or indirectly identified, in particular on the basis of identifiers such as name, identification number, location data, online identifiers or one or more specific factors determining the physical, physiological, genetic, mental, economic, cultural or social identity of a natural person.
  10. Processing – any operations performed on personal data, such as collecting, recording, storing, developing, modifying, sharing, and deleting, especially when performed in IT systems.

2. Cookies

The Website is secured by the SSL protocol, which provides secure data transmission on the Internet. The Website, in accordance with art. 173 of the Telecommunications Act of 16 July 2004 of the Republic of Poland, uses Cookies, i.e. data, in particular text files, stored on the User's end device.
Cookies are used to:

  1. improve user experience and facilitate navigation on the site;
  2. help to identify returning Users who access the website using the device on which Cookies were saved;
  3. creating statistics which help to understand how the Users use websites, which allows to improve their structure and content;
  4. adjusting the content of the Website pages to specific User’s preferences and optimizing the websites website experience to the each User's individual needs.

Cookies usually contain the name of the website from which they originate, their storage time on the end device and a unique number. On our Website, we use the following types of Cookies:

  • "Session" – cookie files stored on the User's end device until the Uses logs out, leaves the website or turns off the web browser;
  • "Persistent" – cookie files stored on the User's end device for the time specified in the Cookie file parameters or until they are deleted by the User;
  • "Performance" – cookies used specifically for gathering data on how visitors use a website to measure the performance of a website;
  • "Strictly necessary" – essential for browsing the website and using its features, such as accessing secure areas of the site;
  • "Functional" – cookies enabling remembering the settings selected by the User and personalizing the User interface;
  • "First-party" – cookies stored by the Website;
  • "Third-party" – cookies derived from a website other than the Website;
  • "Facebook cookies" – You should read Facebook cookies policy: https://www.facebook.com/policy/cookies
  • "Other Google cookies" – Refer to Google cookie policy: www.google.com/policies/technologies/types/

3. How System Logs work on the Website

User's activity on the Website, including the User’s Personal Data, is recorded in System Logs. The information collected in the Logs is processed primarily for purposes related to the provision of services, i.e. for the purposes of:

  • analytics – to improve the quality of services provided by us as part of the Website and adapt its functionalities to the needs of the Users. The legal basis for processing in this case is the legitimate interest of Nexocode consisting in analyzing Users' activities and their preferences;
  • fraud detection, identification and countering threats to stability and correct operation of the Website.

4. Cookie mechanism on the Website

Our site uses basic cookies that facilitate the use of its resources. Cookies contain useful information and are stored on the User's computer – our server can read them when connecting to this computer again. Most web browsers allow cookies to be stored on the User's end device by default. Each User can change their Cookie settings in the web browser settings menu: Google ChromeOpen the menu (click the three-dot icon in the upper right corner), Settings > Advanced. In the "Privacy and security" section, click the Content Settings button. In the "Cookies and site date" section you can change the following Cookie settings:

  • Deleting cookies,
  • Blocking cookies by default,
  • Default permission for cookies,
  • Saving Cookies and website data by default and clearing them when the browser is closed,
  • Specifying exceptions for Cookies for specific websites or domains

Internet Explorer 6.0 and 7.0
From the browser menu (upper right corner): Tools > Internet Options > Privacy, click the Sites button. Use the slider to set the desired level, confirm the change with the OK button.

Mozilla Firefox
browser menu: Tools > Options > Privacy and security. Activate the “Custom” field. From there, you can check a relevant field to decide whether or not to accept cookies.

Opera
Open the browser’s settings menu: Go to the Advanced section > Site Settings > Cookies and site data. From there, adjust the setting: Allow sites to save and read cookie data

Safari
In the Safari drop-down menu, select Preferences and click the Security icon.From there, select the desired security level in the "Accept cookies" area.

Disabling Cookies in your browser does not deprive you of access to the resources of the Website. Web browsers, by default, allow storing Cookies on the User's end device. Website Users can freely adjust cookie settings. The web browser allows you to delete cookies. It is also possible to automatically block cookies. Detailed information on this subject is provided in the help or documentation of the specific web browser used by the User. The User can decide not to receive Cookies by changing browser settings. However, disabling Cookies necessary for authentication, security or remembering User preferences may impact user experience, or even make the Website unusable.

5. Additional information

External links may be placed on the Website enabling Users to directly reach other website. Also, while using the Website, cookies may also be placed on the User’s device from other entities, in particular from third parties such as Google, in order to enable the use the functionalities of the Website integrated with these third parties. Each of such providers sets out the rules for the use of cookies in their privacy policy, so for security reasons we recommend that you read the privacy policy document before using these pages. We reserve the right to change this privacy policy at any time by publishing an updated version on our Website. After making the change, the privacy policy will be published on the page with a new date. For more information on the conditions of providing services, in particular the rules of using the Website, contracting, as well as the conditions of accessing content and using the Website, please refer to the the Website’s Terms and Conditions.

Nexocode Team