Machine learning (ML) is one of the hottest areas of tech right now. Interestingly enough, though, it is far from new. Its origins go back to the 1950s, and the term itself was coined in 1959 by Arthur Samuel, an IBM researcher who developed a system that allowed someone to play checkers against a computer.
But the system did not rely on a complex set of pre-defined rules and conditions. Instead, Samuel did something unique: he based the actions of the computer on data. In other words, it would “learn” over time to play checkers.
Samuel called this ML, and a definition widely attributed to him describes it as a “field of study that gives computers the ability to learn without being explicitly programmed.”
This approach would grow in popularity and become a core part of the development of AI (Artificial Intelligence). For example, ML helped with the early versions of computer vision and voice recognition.
How ML Works
A good way to understand ML is to look at the workflow. It involves much more than someone just spinning up some code.
So, let’s take a look at the key steps:
1 – Data Preparation: This is often labor intensive and tedious. The process is about cleaning up a dataset, which means finding and dealing with gaps, outliers, duplications and biases. The data may also need to be labeled.
In fact, data preparation can easily take up most of the time spent creating an ML model. It is also prone to error. If the data is not of high quality, the results of the model will be off.
Inadequate data preparation is often a main reason a project fails.
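To make the cleanup step concrete, here is a minimal Python sketch. It assumes a tiny, made-up dataset of (customer_id, monthly_spend) records, and it uses a median-based outlier test with a cutoff of 3.5, which is one common rule of thumb rather than the only option. The names raw_rows and clean are hypothetical.

```python
from statistics import median

# Hypothetical toy records: (customer_id, monthly_spend); None marks a gap.
raw_rows = [
    (1, 42.0),
    (2, None),      # gap: missing value
    (3, 39.5),
    (3, 39.5),      # duplication of the previous record
    (4, 41.0),
    (5, 4000.0),    # likely outlier
    (6, 40.5),
]

def clean(rows, threshold=3.5):
    """Drop gaps and duplicates, then drop outliers by modified z-score."""
    seen = set()
    complete = []
    for row in rows:
        if row[1] is None or row in seen:
            continue
        seen.add(row)
        complete.append(row)

    values = [value for _, value in complete]
    center = median(values)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return complete
    return [
        row for row in complete
        if 0.6745 * abs(row[1] - center) / mad <= threshold
    ]

cleaned = clean(raw_rows)
print(cleaned)
```

In a real project, whether to drop, correct or fill in a suspicious record is a judgment call that usually needs someone who knows the data.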
2 – Selecting the Algorithm: There are many algorithms available, and with Python commands on an ML platform, you can easily apply them to your dataset. But selecting the right algorithm takes trial and error, although an experienced data scientist will have the skills to make a good pick.
What’s interesting is that some of the algorithms are fairly basic, such as regression analysis (this models the relationship between variables). Others are based on the proximity of data points to each other, like k-nearest neighbors. And others can be quite complex, as is the case with neural networks or deep learning systems.
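As an illustration of a proximity-based algorithm, here is a minimal sketch of k-nearest neighbors in plain Python. The training points, labels and the function name knn_predict are made up for this example, and a real project would more likely use an ML library than hand-written code.

```python
from collections import Counter
from math import dist

# Hypothetical labeled points: (feature vector, label).
training = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((2.0, 1.5), "low"),
    ((6.0, 6.5), "high"),
    ((7.0, 6.0), "high"),
    ((6.5, 7.5), "high"),
]

def knn_predict(points, query, k=3):
    """Label a query by majority vote among its k closest training points."""
    nearest = sorted(points, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict(training, (1.8, 1.2)))  # low
print(knn_predict(training, (6.2, 6.8)))  # high
```

Note that there is no separate training phase here: the algorithm simply compares a new point against the stored examples, which is part of why it counts as one of the simpler approaches.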
3 – Training and Testing: Once you have the algorithm, you will train it on about 70% of the dataset. The process may also include feature extraction, which means deciding which variables the model should use. This can lead to big problems, since a data scientist may not necessarily understand the business domain and, as a result, may select the wrong variables.
Once you have trained the model, you can test it with the remaining 30% of the dataset. It is critical that this test set is representative of the whole. If not, the results could be skewed.
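Here is a minimal sketch of a 70/30 split that tries to keep the test set representative. It assumes each record carries a label and splits every label group separately (often called a stratified split), which is one way to address the point above, not the only one. The function name stratified_split and the churn dataset are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(rows, label_of, train_fraction=0.7, seed=0):
    """Split rows so each label keeps roughly the same share in both parts."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test

# Hypothetical dataset: 80 "stay" records and 20 "churn" records.
rows = [(i, "stay") for i in range(80)] + [(i, "churn") for i in range(80, 100)]
train, test = stratified_split(rows, label_of=lambda row: row[1])
print(len(train), len(test))  # 70 30
```

With this data, churn makes up 20% of both the training set and the test set, the same share as in the full dataset.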
4 – Evaluate the Results: You want to measure the accuracy of the ML model. The benchmarks to use vary with the project. If you are detecting fraud, then the accuracy needs to be near perfect. On the other hand, if your model is about predicting customer churn, then there is much more leeway.
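A small sketch can show why the choice of benchmark matters. It assumes a made-up fraud test set and computes plain accuracy alongside precision and recall, two other commonly used measures. The function name evaluate and the data are hypothetical.

```python
def evaluate(actual, predicted, positive):
    """Return accuracy, precision, and recall for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    correct = sum(1 for a, p in pairs if a == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical test set: 95 legitimate transactions and 5 fraudulent ones.
actual = ["ok"] * 95 + ["fraud"] * 5
# A made-up model that flags only 2 of the 5 fraud cases and nothing else.
predicted = ["ok"] * 95 + ["fraud"] * 2 + ["ok"] * 3

scores = evaluate(actual, predicted, positive="fraud")
print(scores)  # accuracy 0.97, precision 1.0, recall 0.4
```

In this example the model is 97% accurate overall yet misses three of the five fraud cases, which is why a fraud project would likely look beyond a single accuracy figure.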
Note that this four-step process is a rough guide. There may be some other steps, depending on the type of project. For example, the deployment and management of the ML model can be extensive.
Specialized ML approaches have also emerged. One is AIOps, which involves the use of ML to assess IT (Information Technology) environments. Some of the use cases include detecting problems and even predicting when they may occur. The result can be much lower costs and higher efficiency. In fact, AIOps has the potential to revolutionize IT.
Conclusion
There is often confusion about ML, AI and deep learning. But there are clear differences. AI is the broad category for all analytical and predictive systems. ML is one of its subsets. And what about deep learning? Well, it’s a subset of ML.
However, in the business world, much of the development is about ML. The reason is that the systems are fairly mature and straightforward. For example, banks have been using ML for decades for tasks such as credit screening.
The good news is that more businesses have access to large amounts of data, much of it their own, and the ML development tools are affordable, if not free. There are also cloud platforms like AWS and Azure that allow for rapid development. In other words, running ML models is much easier now, allowing businesses of any size to leverage the power of sophisticated analytics to make better business decisions or automate operations.