Chapter 1: Key Machine Learning Terms

Machine learning is the foundation on which neural networks are built. Neural networks already power many applications in everyday life. We rely on these artificial intelligence models to assist with technical and time-consuming tasks that, because of our biological limitations, we would otherwise perform slowly or with errors.

The machines need us as well: they depend on people to supply data and to guide their learning as the technology improves. Seeing how human and machine progress go hand in hand, we can only imagine what is in store for the future of humankind. All the leaps we are currently making had to start somewhere, and that start lies not in the distant past but within the last century.

Key Machine Learning Terms and Their Definitions

This section defines 18 key machine learning terms:

  1. Model
  2. Algorithms
  3. Training
  4. Regression
  5. Classification
  6. Target
  7. Feature
  8. Label
  9. Overfitting and Generalization
  10. Regularization
  11. Parameter and hyperparameter
  12. GPU
  13. Vectors and Matrices
  14. Cloud computing with GPU
  15. CNN
  16. ReLU
  17. U-Net
  18. Backpropagation


Model

A model is a mathematical representation of a real-world process, expressed in a form the machine can work with. A model is produced by applying a learning algorithm to training data, so its quality depends on having appropriate training data to learn from.


Algorithms

An algorithm is the procedure the machine follows to turn input data into outputs. A learning algorithm also carries a set of assumptions about the data, and those assumptions shape the outputs it produces. Algorithms are close to program instructions: during training, the machine follows a specific algorithm to reach a particular outcome.


Training

Training means repeatedly running the learning algorithm over the same set of training data, adjusting the model a little each time. This repetition is how learning happens, which is why the overall process is called machine learning. The immediate purpose of training is to bring the model's outputs closer to the known outcomes in the training data. Outputs that move closer to those outcomes with each pass indicate that your machine learning process is heading in the right direction.


Regression

Regression is the task of predicting a continuous numeric value rather than a category. It applies when the expected outcome can take any value in a range, such as a length of time. For instance, a regression model might predict how many hours a job will take based on its size.
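
As a minimal sketch of regression, the snippet below fits a straight line to a handful of points using ordinary least squares and then predicts a continuous value for a new input. The function name `fit_line` and the data (job size versus hours) are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: job size (input) and hours taken (continuous output).
sizes = [1.0, 2.0, 3.0, 4.0]
hours = [2.0, 4.0, 6.0, 8.0]

slope, intercept = fit_line(sizes, hours)

# Predict a continuous value for an input the model has not seen.
predicted_hours = slope * 5.0 + intercept
```

The prediction is a number on a continuous scale, which is what separates regression from classification.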


Classification

Classification is the task of assigning each data point to one of a set of predetermined classes. The model decides on the class using the features that are relevant to telling those classes apart.
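
One simple way to classify, sketched here under made-up data, is a nearest-centroid rule: average the training points of each class, then assign a new point to the class whose average is closest. The helper names and the two classes are hypothetical, and real classifiers are usually more elaborate.

```python
def centroid(points):
    """Average each coordinate across a list of equal-length points."""
    return [sum(values) / len(points) for values in zip(*points)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(point, centroids):
    """Return the class whose centroid is closest to the point."""
    return min(centroids, key=lambda name: squared_distance(point, centroids[name]))


# Hypothetical training data: two predetermined classes of 2-D points.
training = {
    "small": [[1.0, 1.0], [1.5, 2.0], [2.0, 1.5]],
    "large": [[8.0, 8.0], [9.0, 7.5], [8.5, 9.0]],
}
centroids = {name: centroid(points) for name, points in training.items()}

prediction = classify([2.0, 2.0], centroids)
```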


Target

The target is the output value you want your model to produce for a given input. What the model actually produces depends on the input variables and the algorithm applied. In training data, the target is an already known value, so the model's output can be compared against it.


Feature

A feature is an input variable to your model. In machine learning, feature data is synonymous with input data. The number of features used as inputs is the number of dimensions of the data. Feature engineering is the act of deriving new features from existing ones. Predictions are made from features. Features are also called attributes.


Label

Labels are the counterpart of features. A label is the known, correct output attached to an example in the training data. Data that comes with these known outputs is called labeled data.
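
To make features and labels concrete, here is a small sketch of a labeled data set. The measurements and species names are invented for illustration.

```python
# Hypothetical labeled data set: each row is one example.
# Features (inputs): height in cm, weight in kg.
features = [
    [30.0, 4.0],
    [25.0, 3.5],
    [60.0, 25.0],
]
# Labels (known outputs), one per example.
labels = ["cat", "cat", "dog"]

# The number of dimensions is the number of features per example.
dimensions = len(features[0])

# Feature engineering: derive a new feature (kg per cm) from existing ones.
engineered = [row + [row[1] / row[0]] for row in features]
```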

Overfitting and Generalization

Overfitting is best described in terms of generalization. Both concepts concern how a model behaves at the end of the machine learning process. Generalization is the extent to which your model's accuracy in the real world matches the accuracy it reached during training. Remember that during training you used training data, and new data will invariably behave somewhat differently.

Generalization describes how well you can swap your training data for new data while still maintaining your accuracy. It can be strong or weak depending on the signal-to-noise ratio of your training data. A high ratio tends to support good generalization, while a low ratio tends to harm it. Overfitting is the condition in which the model matches the training data too closely, noise included, and therefore generalizes poorly to new data.
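
The gap between training and real-world performance can be sketched with two models on the same noisy data. The data, the `memorizer` model, and the helper names are assumptions chosen for illustration: the memorizer reproduces the training targets exactly, noise included, while the straight line captures only the overall trend.

```python
def mean_squared_error(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


# Hypothetical data: roughly y = 2x, with noise in the training targets.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [2.3, 3.8, 6.2, 7.9]
test_x = [1.5, 2.5, 3.5]
test_y = [3.0, 5.0, 7.0]


def memorizer(x):
    """Overfit model: return the target of the closest training input."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


slope, intercept = fit_line(train_x, train_y)


def line(x):
    """Simpler model: a straight line through the training data."""
    return slope * x + intercept


memorizer_train = mean_squared_error([memorizer(x) for x in train_x], train_y)
memorizer_test = mean_squared_error([memorizer(x) for x in test_x], test_y)
line_train = mean_squared_error([line(x) for x in train_x], train_y)
line_test = mean_squared_error([line(x) for x in test_x], test_y)
```

On this data the memorizer has zero training error but a larger error on the held-out points than the line does, which is the signature of overfitting.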


Regularization

Regularization is a way of limiting how complex your machine learning model is allowed to become. Its objectives are to improve generalization and to reduce overfitting without tipping the model into underfitting.

In practice, regularization is built into training as a penalty term added to the cost function. The penalty grows with the size of the model's parameters, so the model pays a price for complexity that the data does not justify. You should understand that regularization limits the freedom of your model: in exchange for better generalization, it will fit the training data slightly less closely.
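
A penalty on complexity can be sketched with a one-weight model. The snippet assumes an L2 (ridge) penalty, which is one common choice among several, and uses the closed-form solution for a single weight. The function name and data are made up.

```python
def fit_weight(xs, ys, penalty):
    """Minimize sum((w * x - y)^2) + penalty * w^2 for a single weight w."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)


# Hypothetical data that follows y = 2x exactly.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

unregularized = fit_weight(xs, ys, penalty=0.0)
regularized = fit_weight(xs, ys, penalty=14.0)
```

With no penalty the weight matches the data exactly; with the penalty it is pulled toward zero, trading some fit on the training data for a simpler model.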

Parameter and hyperparameter

Parameters are the internal values of a machine learning model that are learned from the training data, such as the weights of a neural network. Parameters are modifiable: the learning algorithm adjusts them during training, and they determine how the model makes its predictions. Their final values depend on the algorithm and on the training data. Parameters are also what regularization targets, by penalizing values that grow too large.

Hyperparameters are settings that you choose before training rather than values the model learns. Examples include the learning rate, the number of layers, and the strength of regularization. You often pick them using your familiarity with the problem, and training itself does not change them. Because you set them up front, your choices shape the final model, so it is common to try several values and keep the ones that perform best on data held out from training.
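
The difference shows up clearly in a small training loop. In this sketch, which assumes plain gradient descent on a one-weight model, `w` is a parameter that training updates, while `learning_rate` and `epochs` are hyperparameters fixed before training starts. The values are picked by hand for illustration.

```python
def train(xs, ys, learning_rate, epochs):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0  # parameter: learned from the data
    n = len(xs)
    for _ in range(epochs):
        gradient = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * gradient
    return w


# Hypothetical data that follows y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

# Hyperparameters: chosen by you before training, never updated by it.
w = train(xs, ys, learning_rate=0.05, epochs=200)
```

A learning rate that is too large can make the same loop diverge instead of converge, which is why hyperparameters usually need some tuning.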


GPU

GPUs (graphics processing units) are processors commonly used by neural networks and machine learning systems to carry out their computations. A GPU is comparable to a personal computer’s CPU, except that it has hundreds or thousands more processing cores per chip.

A significant characteristic of a GPU is that it is built to perform arithmetic on vectors. This means a GPU is specialized for particular kinds of work: it applies the same operation to many values in parallel. By contrast, a regular CPU is a general-purpose processor with a small number of cores that handles all kinds of operations largely in sequence, which makes it slower than a GPU for this kind of workload.

Vectors and Matrices

A vector is an ordered list of numbers, and a matrix is a rectangular grid of numbers, which you can also view as a list of vectors. The data, the weights, and the intermediate results of a neural network are all stored in these forms, so getting their shapes and contents right is crucial to the successful functioning of your network. The same applies to machine learning systems in general.

Operations on vectors and matrices break down into many small, independent calculations. This is why they suit a GPU: its cores are kept intentionally simple so that as many as possible fit on a chip, and each core can handle one piece of the calculation at the same time. Vectors and matrices are vital elements of the learning process in machine learning and artificial intelligence.
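
As a small illustration, here is a matrix-vector product written in plain Python with made-up numbers. Each output value is computed from one row of the matrix and does not depend on the others, which is the kind of independence that parallel hardware can exploit.

```python
def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


# Hypothetical 3x2 weight matrix and a 2-element input vector.
weights = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]
inputs = [10.0, 1.0]

outputs = matvec(weights, inputs)
```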

Cloud computing with GPU

You can use existing cloud computing services for machine learning as well. Instead of buying hardware, you rent GPU time from the large companies offering these services. This is often more cost-effective than purchasing a whole set of GPUs outright, since you pay only for what you use.

In addition, this is an efficient way to use the massive potential of GPUs. Only the units needed at a particular time are in use, while the others stand by for the next available task, which keeps the available GPUs well utilized. Individual enthusiasts can also rent affordable GPUs for their own learning and experiments, which promotes innovation.


CNN

CNN stands for convolutional neural network. This type of network is used in machine learning mainly on image data, where it improves the accuracy of tasks such as recognizing what an image contains. A well-trained CNN should label images with high accuracy during prediction.

A CNN works by passing an image through a series of filters to arrive at a classified, labeled output. Each convolution layer slides its filters across the image to detect patterns, and stacking several such layers lets the network detect increasingly specific patterns. Your network can then handle complicated or abstract images better than networks without convolution layers.
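
The filtering step can be sketched in plain Python. This is a minimal illustration that assumes no padding and a stride of 1, and, like most CNN layers, it slides the filter without flipping it. The image and the filter values are hand-picked here, whereas a real CNN learns its filter values during training.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4x4 image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked filter that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
```

The resulting feature map is nonzero only in the middle column, where the filter sits on top of the edge.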


ReLU

ReLU is short for rectified linear unit. It is an activation function that is added to a CNN as a layer to improve the network further. Recall that a CNN is built from convolution layers; a ReLU layer typically follows each of them.

ReLU is currently one of the most commonly used activation functions, and its purpose is loosely to mimic the on and off switching of real neurons. It eliminates negative values by setting them to zero and passes positive values through unchanged. It is also cheap to compute, which helps keep large CNNs efficient to train.
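
In code, ReLU is a one-line function. The sketch below applies it to a made-up list of values.

```python
def relu(values):
    """Set negative values to zero and keep the rest unchanged."""
    return [max(0.0, v) for v in values]


activations = relu([-2.0, -0.5, 0.0, 1.5, 3.0])
```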


U-Net

U-Net is a convolutional neural network architecture. It was designed for biomedical image segmentation and is used in fields such as radiology. U-Net has a similar number of downsampling and upsampling layers, which gives the architecture its U shape, and it is one of the ways CNNs have contributed to the medical field.


Backpropagation

Backpropagation is the core algorithm behind how neural networks learn. By learning, we mean finding the weights and biases that minimize a specific cost function. Backpropagation computes the gradient of this cost function with respect to every weight and bias. The negative gradient indicates how to alter the weights and biases to decrease the cost most efficiently.

The total cost of the network is calculated as follows. Collect all the outputs produced by the network. Compare each output to the known expected result. Take the difference for each example and square it. Finally, take the mean of the squared differences, that is, their sum divided by the number of examples. This cost, known as the mean squared error, is both what backpropagation works to reduce and a way to evaluate the network's performance.
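
The cost calculation described above can be written out directly. The outputs and expected values here are made-up numbers.

```python
def mean_squared_error(outputs, expected):
    """Mean of the squared differences between outputs and expected values."""
    differences = [o - e for o, e in zip(outputs, expected)]
    squared = [d ** 2 for d in differences]
    return sum(squared) / len(squared)


# Hypothetical network outputs and the known expected results.
outputs = [2.0, 4.0, 7.0]
expected = [1.0, 4.0, 5.0]

cost = mean_squared_error(outputs, expected)
```

The differences are 1, 0, and 2, so the squared values are 1, 0, and 4, and the cost is their mean, 5/3.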

Each round of backpropagation gives the network another chance to adjust, so the cost function, which measures how badly the predictions fail, shrinks over repeated training passes. Without backpropagation, the advancements currently achieved in machine learning would not have been possible.
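
To show the idea end to end, here is a minimal sketch of backpropagation on a tiny two-layer network with one weight per layer and no biases. The network shape, the data, the starting weights, and the learning rate are all assumptions chosen to keep the example short. The gradients come from applying the chain rule backward from the output.

```python
def forward(x, w1, w2):
    hidden = w1 * x        # first layer
    output = w2 * hidden   # second layer
    return hidden, output


def backward(x, target, w1, w2):
    """Gradients of the squared error (output - target)^2 via the chain rule."""
    hidden, output = forward(x, w1, w2)
    d_output = 2 * (output - target)  # d cost / d output
    d_w2 = d_output * hidden          # d cost / d w2
    d_hidden = d_output * w2          # propagate back to the hidden value
    d_w1 = d_hidden * x               # d cost / d w1
    return d_w1, d_w2


def cost(xs, ys, w1, w2):
    return sum((forward(x, w1, w2)[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data that follows y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
w1, w2 = 0.5, 0.5      # arbitrary starting weights
learning_rate = 0.01   # assumed value, chosen by hand

initial_cost = cost(xs, ys, w1, w2)
for _ in range(500):
    grad_w1 = grad_w2 = 0.0
    for x, y in zip(xs, ys):
        d_w1, d_w2 = backward(x, y, w1, w2)
        grad_w1 += d_w1 / len(xs)
        grad_w2 += d_w2 / len(xs)
    # Step along the negative gradient to reduce the cost.
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
final_cost = cost(xs, ys, w1, w2)
```

After training, the product of the two weights is close to 2, so the network reproduces the data and the cost is far below where it started.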
