Superlearning for data scientists and AI programmers

Occasionally I write posts with specific tips for programmers of different kinds. Nowadays data scientists and AI programmers are in high demand. New areas like deep neural networks, chatbots, and mixed reality pose a new level of challenge. How you treat these challenges as a superlearner is up to you. This post focuses on working with convolutional neural networks, as an example of practices that apply in other areas. The professional terminology is used as-is, without adaptation for readers who lack the relevant experience.

For those who do not want to read further, here is a short summary. Superlearning is great for data scientists, since it teaches some of the most important skills:

We can read articles faster than competitors, and reading is required.
It is important to memorize many details, and we can use the methods we already know (mindmaps and mental palaces of various sorts).
Quick visual perception and attention to detail provide an additional competitive edge. When we train speedreading, we train these skills too.

New and trendy

Convolutional neural networks became popular after some revolutionary results in 2012. Since then, neural networks have aced many tasks that were considered impossible to handle using other methods. The most popular application for this technology is autonomous cars. With rising demand, there are many self-taught specialists in neural networks. Most specialists teach existing networks to work with slightly modified inputs, and very few people understand the processes involved well enough to create a new network.

Huge data

What is it like working with a neural network on an average dataset?

Nobody works in a vacuum. We all start by reading many articles and finding which one will help us. It is good practice to start with an article that provides both code and pretrained weights.
You spend considerable time setting up the environment: finding and downloading all relevant packages may be a long process. If you work with MXNet or TensorFlow this is a matter of hours, but if you use some custom build of Caffe you may end up spending days configuring the system.
Then you download a dataset and learn how to use it. An average dataset has 50GB of data with mixed labeling methodology. Each dataset requires learning its configuration and using the right tools.
Once all the pieces are in place, the training starts. The full training can take up to 3 weeks, so it is customary to break training into shorter sessions of several hours, each focused on specific traits of the network.
During the training, the specialist is expected to prepare for the next training session and read the relevant literature.
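
Breaking a long training into short sessions only works if each session picks up where the previous one stopped. Here is a minimal sketch of that idea in plain Python. The names run_session and toy_train_step, the JSON checkpoint format, and the toy update rule are all assumptions made for illustration; a real project would use the checkpoint tools of its own framework.

```python
import json
import os
import tempfile


def run_session(state_path, steps_per_session, train_step):
    """Resume from state_path if it exists, run one short session, save the state."""
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
    else:
        # Hypothetical initial state: a step counter and a single toy weight.
        state = {"step": 0, "weights": [0.0]}

    for _ in range(steps_per_session):
        state["weights"] = train_step(state["weights"])
        state["step"] += 1

    # Write to a temporary file first, then rename it over the old checkpoint,
    # so an interrupted session does not leave a half-written file behind.
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, state_path)
    return state


def toy_train_step(weights):
    # Stand-in for a real optimizer step: move each weight halfway toward 1.0.
    return [w + 0.5 * (1.0 - w) for w in weights]


if __name__ == "__main__":
    checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
    print(run_session(checkpoint, 3, toy_train_step))  # first session
    print(run_session(checkpoint, 2, toy_train_step))  # resumes at step 3
```

Between two sessions you are free to change what the next session focuses on, which is exactly when the reading and preparation happen.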

Reading

Typically the neural network is a key element of some greater system. The system is often modified and updated by team members. Each such update has a partially unpredictable effect on the networks. It takes a serious effort to know the various parts of the system and the relevant articles in the area.

Great new articles appear every year, and their ideas should be implemented so the system does not get outdated. This means several blocks of the network need to be replaced and tested like Lego blocks. Unfortunately, neural networks are not Lego blocks and modifications can get complex.

Usually, things do not work. It is very hard to understand why. Hundreds of things can go wrong. This great article summarizes SOME of the things that need to be checked. In fact, the list is much longer and specific to each system. Simply remembering the different things that can go wrong, and which of them were already checked, requires extraordinary memory.
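
Memory can get some help from a written checklist that records what was already ruled out. The sketch below is one hypothetical way to keep such a list in Python. The class name DebugChecklist and the three example items are my own illustrations, not taken from the article mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class DebugChecklist:
    """Tracks which possible causes of failure were already checked."""

    # Maps the name of a possible problem to a note, or None while unchecked.
    items: dict = field(default_factory=dict)

    def add(self, name):
        self.items.setdefault(name, None)

    def mark_checked(self, name, note):
        if name not in self.items:
            raise KeyError(f"unknown checklist item: {name}")
        self.items[name] = note

    def pending(self):
        return [name for name, note in self.items.items() if note is None]


if __name__ == "__main__":
    checklist = DebugChecklist()
    # Illustrative items only; a real list is longer and specific to the system.
    for name in ("labels match inputs", "input normalization", "learning rate"):
        checklist.add(name)
    checklist.mark_checked("input normalization", "same mean and scale as in training")
    print(checklist.pending())
```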

Knowledge distillation

It is possible to use some networks to train others. This process is known as knowledge distillation. The idea is to train several specialist networks, occasionally using different technologies. These specialists can then train a generalist network. The process is more stable than training the generalist network directly. Each specialist needs to deal with a limited number of situations, and it is easy to check when these requirements are met.

On the side of complexity, instead of dealing with one network, a data scientist needs to deal with several networks. Often the specialist networks use different technologies and more complex architectures than the generalist network, since the training and inference speed of the “teacher” networks is less important than that of the “student” network. This practice requires multitasking skills.
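
To make the teacher and student roles concrete, here is a rough sketch of a commonly used soft-target formulation of the distillation loss: the outputs of the teachers are softened with a temperature, averaged, and the student is penalized by the cross-entropy against that average. This post does not say which loss is used in practice, so treat the formulation, the function names, and the simplification that every teacher scores the same classes as assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature gives a softer output."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtracted for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def average_teachers(teacher_logits, temperature):
    """Average the softened predictions of several teacher networks."""
    probs = [softmax(logits, temperature) for logits in teacher_logits]
    count = len(probs)
    return [sum(p[i] for p in probs) / count for i in range(len(probs[0]))]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the averaged teacher output and the student output."""
    target = average_teachers(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(target, student))


if __name__ == "__main__":
    # Hypothetical logits of two teachers for one sample with three classes.
    teachers = [[2.0, 0.0, -1.0], [1.5, 0.5, -1.0]]
    print(distillation_loss([2.0, 0.0, -1.0], teachers))  # student agrees with teachers
    print(distillation_loss([-1.0, 0.0, 2.0], teachers))  # student disagrees, higher loss
```

The student never runs the teachers at inference time. It only needs their outputs during training, which is why the teachers can afford to be slow and complex.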

Analyzing the results

Only a small part of the result analysis is statistical. Qualitative analysis by visual inspection is very important for understanding the root cause of a failure and forming a good improvement strategy. This review needs to be done fast. Quick visual perception and attention to detail are required to understand what went wrong and where.
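
One way to keep the visual review fast is to decide in advance which samples to look at first. The sketch below assumes that each evaluated sample comes with a per-sample loss, a true label, and a predicted label. The record layout and the function names are hypothetical. It picks the samples with the highest loss and counts which label confusions are most frequent.

```python
import heapq
from collections import Counter


def worst_examples(records, k):
    """Return the k records with the highest loss, worst first."""
    return heapq.nlargest(k, records, key=lambda r: r["loss"])


def confusion_pairs(records):
    """Count (true label, predicted label) pairs over the misclassified samples."""
    return Counter((r["true"], r["pred"]) for r in records if r["true"] != r["pred"])


if __name__ == "__main__":
    # Hypothetical evaluation records; real ones would also point to the image file.
    records = [
        {"id": "img_01", "loss": 0.1, "true": "car", "pred": "car"},
        {"id": "img_02", "loss": 2.3, "true": "car", "pred": "truck"},
        {"id": "img_03", "loss": 1.7, "true": "bike", "pred": "person"},
        {"id": "img_04", "loss": 2.9, "true": "car", "pred": "truck"},
    ]
    print([r["id"] for r in worst_examples(records, 2)])
    print(confusion_pairs(records).most_common(1))
```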

Tips
