K-Means Clustering in the network security domain.

Shristi Agarwal
3 min read · Aug 12, 2021

As most of us know, K-Means is an unsupervised algorithm that groups data points or objects into clusters. It differs from the KNN algorithm, which is a supervised technique: KNN assigns a new point to one of a fixed set of known classes using labeled examples, whereas K-Means works directly on unlabeled data and discovers the groups itself, rather than mapping data points onto predefined ones.

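It groups the data in the following fashion: each data point is assigned to its nearest centroid, each centroid is then moved to the mean of the points assigned to it, and the two steps repeat until the assignments stop changing.

This is how K-Means labels the given data into clusters through centroids.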
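The assign-and-update loop can be sketched in a few lines of plain Python. This is a minimal illustration rather than a production implementation: the function name `kmeans`, the sample `points`, and the choice of random starting centroids are all assumptions made for the example.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Basic K-Means (Lloyd's algorithm). Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
    return centroids, labels

# Illustrative 2-D data with two visibly separate groups (made up for this sketch).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

The first three points end up sharing one label and the last three the other; which group gets label 0 depends on the random starting centroids.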

In K-Means clustering, K denotes the number of clusters formed.

Initially we pick the number of clusters arbitrarily, but this first guess is rarely the best one. To select the optimal K, we run K-Means for several values of K and compare the results using the Elbow Method.

The final number of clusters is not known in advance in K-Means; for this, we have the Elbow Method. As k increases, the average distortion (the average squared distance between each instance and its centroid) decreases: each cluster has fewer constituent instances, and the instances are closer to their respective centroids. However, the improvements in average distortion decline as k increases. The value of k at which the improvement in distortion declines the most is called the elbow, and this is where we should stop dividing the data into further clusters.
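A rough sketch of the Elbow Method follows. It computes the average distortion for k = 1 to 5 on made-up data and picks the k where the improvement drops the most. The helper names `average_distortion` and `elbow_k` are hypothetical, and the K-Means inside uses farthest-first starting centroids instead of random ones, purely so the example gives the same result on every run.

```python
import math

def average_distortion(points, k, iterations=20):
    """Run a basic K-Means and return the mean squared distance to the nearest centroid."""
    # Farthest-first initialisation (an assumption for repeatability, not random init):
    # start at the first point, then keep adding the point farthest from all chosen centroids.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iterations):
        # Assignment step: collect the points nearest to each centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (unchanged if empty).
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points) / len(points)

def elbow_k(distortions):
    """Given {k: distortion}, return the k after which the improvement declines the most."""
    ks = sorted(distortions)
    bends = {}
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        gain_before = distortions[prev_k] - distortions[k]   # improvement from reaching k
        gain_after = distortions[k] - distortions[next_k]    # improvement from going past k
        bends[k] = gain_before - gain_after
    return max(bends, key=bends.get)

# Illustrative data: three well-separated groups of four points each (made up).
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (5, 10), (5, 11), (6, 10), (6, 11),
]
distortions = {k: average_distortion(points, k) for k in range(1, 6)}
best_k = elbow_k(distortions)
print(distortions)
print("elbow at k =", best_k)
```

On this data the distortion falls sharply up to k = 3 and barely changes afterwards, so the elbow lands at 3, matching the three groups. On real data the bend is usually less pronounced and is often judged by eye from a plot.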

This is a very useful approach for clustering in the security domain, but how?

Have you heard about botnets?

Let me simplify: a botnet is a sort of robot network, made up of computers that are infected by malware and are under the control of a single attacking party, the bot-herder or bot master. The command and control (C2) server is the bot master's server; the IP address bound to its host name in DNS keeps changing so frequently that it is hard to trace the host machine. Through the C2 server, the bot master distributes malware via infected sites, social media, or spam emails, infecting other machines and their networks, which eventually form what is called a botnet.

Using a network monitoring tool such as Wireshark, we collect data from the network.

On that data we apply K-Means clustering to form K centroids. The resulting clusters, plotted as graphs, help us study the number of infected systems, the rate at which systems are being affected by the malware attack, and various other important networking conclusions.
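Before K-Means can be applied, the captured traffic has to be turned into numeric feature vectors. The sketch below shows one possible way to do that, summarising packets per source host. The record layout (source IP, destination IP, packet size in bytes), the chosen features, and the function name `host_features` are all assumptions for illustration; the addresses come from the reserved documentation ranges and are not real capture data.

```python
from collections import defaultdict

# Hypothetical packet records, e.g. rows exported from a capture as CSV:
# (source IP, destination IP, packet size in bytes).
packets = [
    ("192.0.2.10", "198.51.100.1", 400),
    ("192.0.2.10", "198.51.100.1", 600),
    ("192.0.2.10", "198.51.100.1", 500),
    ("192.0.2.20", "203.0.113.1", 60),
    ("192.0.2.20", "203.0.113.2", 60),
    ("192.0.2.20", "203.0.113.3", 60),
    ("192.0.2.20", "203.0.113.4", 60),
]

def host_features(packets):
    """Summarise each source host as [packet_count, distinct_destinations, mean_bytes]."""
    stats = defaultdict(lambda: {"count": 0, "dests": set(), "bytes": 0})
    for src, dst, size in packets:
        entry = stats[src]
        entry["count"] += 1       # how many packets this host sent
        entry["dests"].add(dst)   # which destinations it contacted
        entry["bytes"] += size    # total bytes, used for the mean below
    return {
        src: [entry["count"], len(entry["dests"]), entry["bytes"] / entry["count"]]
        for src, entry in stats.items()
    }

features = host_features(packets)
for host, vector in features.items():
    print(host, vector)
```

Each host's vector is one data point for K-Means. A host that sends many small packets to many different destinations would sit far from hosts with ordinary traffic, which is the kind of separation the clustering is meant to expose. Features on very different scales are usually normalised first so that no single one dominates the distance.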

Hence, K-Means clustering is an efficient algorithm for solving problems in the network and security domain. It is also useful in other domains, for example for human resource management teams, where employees and customers can be clustered or grouped based on some desired parameters.

Within the network security domain, this technique is useful for detecting faults and fraud in attacks such as spamming, phishing, and DDoS.

Thank you….

Happy Learning🙂…
