
Unsupervised Learning

The term unsupervised learning is the opposite of supervised learning. If before we thought of learning “with help,” it logically follows that here we mean “without help.” In terms of learning, you are given study materials without a clearly defined goal in mind. The word “clearly” is used because there may still be a goal. Imagine needing to write a poem or a short story. Instead of copying others right off the bat, you read through work after work, looking at how each is structured, how its literary devices relate to its theme, how that theme was relevant at the time it was written, and so on. Your job is to look for patterns on your own.

Machines can do the same thing. Earlier we had a training and a test dataset to gauge the correctness of our A.I. models. In unsupervised learning, we only have one dataset, from which we try to uncover insights and patterns. These insights and patterns may not even be visible to the human eye, which makes this type of learning even more exciting.

If in supervised learning we were focused on approximation, in unsupervised learning we are focused on description. Our A.I. learns to draw conclusions by clustering, by figuring out a distribution for the dataset, or even by reducing dimensions so the dataset can be plotted on a 2D graph to see whether there is structure. Given the scope of unsupervised learning, there are not many models or tests we can apply. However, this does not make it useless: entering a problem without assumptions and letting the facts speak for themselves can often lead us to better conclusions. Unsupervised learning is helpful because it produces outcomes for us regardless of our preconceived notions.

Being able to tell A from B is one hallmark of intelligence. The hallmark present in supervised learning is the ability to improve one’s understanding with repetition. Supervised learning can be boiled down to cost function minimization, since the minimum cost is most likely the optimum solution to our problem. For instance, we call people “gifted” or “genius level” because they pick things up faster than usual; the key idea is that they minimize future mistakes by spotting those mistakes earlier on. Unsupervised learning, on the other hand, highlights another hallmark of intelligence: the ability to tell things apart without prior context.

If in supervised learning we concerned ourselves with approximation using regression and classification, in unsupervised learning we concern ourselves with description using clustering and dimensionality reduction. As the name implies, unsupervised learning is “without help”: it uses unlabeled datasets, contrary to what supervised learning does.

Since there is no clear target in the dataset, unsupervised learning models are left to themselves to describe the natural structures that exist in the data. Data scientists often define their own clusters beforehand based on their preconceptions and personal judgment. However, there are times when a natural structure exists in the data that we were not even aware of. This structure is uncovered by our models’ capacity to work through vast numbers of dimensions and data points, something we humans are incapable of doing.

Clustering

As the name implies, clustering is all about grouping data together based on the similarities they share. Perhaps you lump student data together based on grades? Or maybe based on extra-curricular activity?

Clustering People

In the photo above we see a clustering transformation: an unlabeled set of people is clustered into four different groups. This is the power of unsupervised learning. It enables us to see which belongs with which.

Imagine being a store owner who wants to use data science to better serve your customers. Since you know your business so well, you are well equipped to sketch customer types off the top of your head. Some might be students, some might be evening goers, some Monday goers, and so on. However, perhaps you notice that you are still not maximizing the sales you could make? That there are still untapped customers you cannot reach? With enough customer data to feed into unsupervised learning, you may be able to study who those customers are.

Two popular types of clustering methods are: 1) k-means clustering; and 2) Kohonen self-organizing maps (SOMs). Both pursue the same objective of clustering, but they differ by nature.

The k in k-means is the number of clusters you want to partition the data into. For example, if you choose four clusters, then the final output will be four clusters. The catch is that you must decide beforehand how many clusters you think exist. You might think that defeats the purpose of clustering, but that is not entirely the case: assigning five clusters instead of four might lead to an entirely different organization, and it is up to your creativity and inquisitiveness to explore. However, trying different values of k can also tempt a data scientist into wishful thinking, so proceed with caution. There are methods for finding the optimum number of clusters, but those are outside the scope of our discussion.
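To make this concrete, below is a minimal k-means sketch using scikit-learn. The customer features and numbers are invented for illustration, and k = 3 is simply a guess, not a recommendation.

```python
# Minimal k-means sketch (scikit-learn); the data here is made up.
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers: each row is (visits per week, average spend).
customers = np.array([
    [1, 5.0], [2, 7.5], [1, 6.0],     # light spenders
    [5, 20.0], [6, 22.5], [5, 18.0],  # frequent regulars
    [2, 50.0], [1, 55.0],             # rare big spenders
])

# k must be chosen beforehand; here we guess k = 3.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(customers)

print(labels)                   # which cluster each customer landed in
print(kmeans.cluster_centers_)  # the "mean" point of each cluster
```

Rerunning this with n_clusters=4 or 5 will reorganize the groups entirely, which is exactly the caution raised above.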

On the other hand, SOMs do not require assigning the number of clusters beforehand. They are more flexible than k-means in the sense that they figure out the clusters on their own. What makes SOMs superior, in a sense, is their ability to maintain the structure of the data, unlike k-means. While a SOM traverses multidimensional space, its way of clustering does not change how close one data point is to another when translating the data from one dimension to another. This preservation of structure is important because the natural structure of the data is carried through the clustering process, making your final clusters as natural as they can be.
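As a point of comparison, here is a small SOM sketch. It assumes the third-party minisom package (pip install minisom) and uses random stand-in data, so treat it as the shape of the workflow rather than a definitive recipe.

```python
# SOM sketch assuming the third-party `minisom` package; the data is
# random stand-in data, not a real dataset.
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(42)
data = rng.random((100, 4))  # 100 samples, 4 features

# A 5x5 map: each of the 25 nodes learns to represent a region of input
# space, and neighbouring nodes end up representing neighbouring regions,
# which is the structure preservation described above.
som = MiniSom(5, 5, input_len=4, sigma=1.0, learning_rate=0.5,
              random_seed=42)
som.random_weights_init(data)
som.train_random(data, num_iteration=500)

# Each sample maps to its best-matching node; samples sharing a node
# (or neighbouring nodes) can be read as a cluster.
winners = [som.winner(x) for x in data]
print(winners[:5])
```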

Dimensionality Reduction

Clustering is fine and all, but at the end of the day we humans are visual creatures. It would be great if we could actually see what the clusters look like. Recall that the number of dimensions something has depends on the number of features you give it; using two features, for instance, means being in two dimensions. But how do we visualize four, five, or even one hundred or more dimensions? As the name implies, dimensionality reduction involves reducing those dimensions to our desired number. Dimensionality reduction methods transform multidimensional data from one coordinate system to another.

Dimensionality Reduction

The importance of dimensionality reduction cannot be stressed enough. Sometimes it is best to see for ourselves what the clusters look like. By inspecting them visually, we can spend more time getting our hands dirty exploring possible clusters or combinations of clusters.

As with clustering, there are two main methods used in dimensionality reduction, namely principal component analysis (PCA) and t-distributed stochastic neighbour embedding (t-SNE). Both are useful because they let us project N-dimensional data into two- or three-dimensional space for visualization.

However, the devil is in the details, as they say. PCA mainly captures the linear structure between the features (the columns in our spreadsheet) present in our data. This is a drawback: any nonlinear structure in the higher-dimensional space is not preserved, so the transfer to lower dimensions can be inaccurate. t-SNE addresses this because it preserves local distances in the mapping from higher to lower dimensions: points that were neighbours before the reduction stay neighbours after it.
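The contrast is easy to see in code. Below is a sketch that projects stand-in high-dimensional data down to 2D with both methods using scikit-learn; the data is random and the perplexity value is just a common default.

```python
# Projecting 10-dimensional stand-in data down to 2D with PCA and t-SNE.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.random((200, 10))  # 200 samples, 10 features

# PCA: finds the linear directions of greatest variance.
X_pca = PCA(n_components=2).fit_transform(X)

# t-SNE: tries to keep points that were neighbours in 10D
# neighbours in 2D as well; perplexity is a tunable knob.
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

print(X_pca.shape, X_tsne.shape)  # both (200, 2), ready to scatter-plot
```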

Possibilities

As long as we need to find clusters or understand the structure of data, unsupervised learning will get the job done. A practical application in the business world is customer profiling. Thanks to clustering, businesses can tailor their services to each kind of customer. For example, instead of blasting the same advertisements and promotions to every customer, a business saves much more money by sending them only to customers who fit the bill. If one cluster enjoys product or service A more than B, then sending that cluster discount codes or promotions for A would be good for business and customer retention. It also means a smaller budget for the promotional blast, since the company will not be sending the discount to everyone.

Particle, star, or galaxy detection in physics requires an incredible amount of data. A faster way of looking for new types of particles, stars, or galaxies is to use unsupervised learning. The A.I. will tell you when a new cluster is forming, and you can visualize it right away. This way you avoid spending too much time investigating each new entry, since you are only interested in new discoveries.
