Similarities and differences of K-Means, PAM, and CLARA
Clustering is an unsupervised learning technique that groups similar
data points together. It is widely used in applications such as market
segmentation, image analysis, and health studies. Among the many
clustering methods, K-Means, Partitioning Around Medoids (PAM), and
Clustering Large Applications (CLARA) are three well-known partitioning
algorithms, each with its own strengths and weaknesses. This paper
compares the three, describing their advantages, their drawbacks, and
the situations in which each works best.
K-Means Clustering
K-Means is one of the most popular clustering algorithms because it is
simple and fast. It partitions the data into k groups, each represented
by a center (the mean of its points), so that the spread of the points
around their center stays low. The steps are:
1-Choose the number of groups (k).
2-Choose k initial centers at random.
3-Assign each data point to its nearest center.
4-Recompute each center as the mean of the data points assigned to it.
Repeat steps 3 and 4 until the group assignments no longer change.
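The steps above can be sketched in a few lines of plain Python. This is only a minimal illustration, not a reference implementation: the function names, the `max_iter` and `seed` parameters, and the toy data are assumptions made for this example, and the initial centers are drawn from the data points themselves.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    """Cluster `points` into k groups; returns (centers, labels)."""
    rng = random.Random(seed)
    # Step 2: choose k distinct data points as the initial centers.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 3: assign every point to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a group is empty
                centers[j] = mean_point(members)
    return centers, labels


# Toy data (hypothetical): two small, well-separated groups.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centers, labels = k_means(data, k=2)
print(centers)
print(labels)
```

On this toy data the first three points end up in one group and the last three in the other. Which group gets label 0 depends on the random start.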
Pros of K-Means:
+Fast and scales well to large datasets.
+Works well for roughly spherical groups that are far apart.
+Simple to understand and use.
Cons of K-Means:
-The number of groups (k) must be chosen in advance.
-Sensitive to outliers, which can pull the centers away from the bulk of the data.
-Performs poorly on groups that are irregular or not spherical.
Partitioning Around Medoids (PAM)
PAM groups data in much the same way as K-Means, but it handles outliers
better. Instead of using means as centers, it picks actual data points,
called medoids, to represent the groups, which makes it less sensitive
to extreme values.
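A small one-dimensional example shows why a medoid is less affected by an extreme value than a mean. The helper names and the numbers are made up for illustration. The medoid is taken here to be the data value with the smallest total absolute distance to all the other values.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def medoid(values):
    """The data value with the smallest total distance to all values."""
    return min(values, key=lambda v: sum(abs(v - w) for w in values))


# Four ordinary values and one outlier (hypothetical data).
values = [1.0, 2.0, 3.0, 4.0, 100.0]

print(mean(values))    # 22.0, pulled far toward the outlier
print(medoid(values))  # 3.0, still an actual, typical data point
```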
PAM works as follows:
1-Pick k actual data points as the initial medoids.
2-Assign every data point to its nearest medoid.
3-Swap a medoid with a non-medoid point whenever the swap lowers the
total distance between the points and their nearest medoids.
4-Stop when no swap gives a better set of medoids.
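The swap procedure can be sketched as below. This is a simplified version written for illustration only: it starts from the first k points, whereas the original algorithm chooses its starting medoids more carefully, and it accepts any swap that lowers the cost. The function names and the toy data are assumptions, and `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import product


def total_cost(points, medoid_idx, dist):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoid_idx) for p in points)


def pam(points, k, dist=math.dist):
    """Simplified PAM; returns (medoid indices, labels)."""
    # Step 1: start from the first k points (an arbitrary choice for this sketch).
    medoids = list(range(k))
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        # Step 3: try replacing each medoid with each non-medoid point.
        for pos, cand in product(range(k), range(len(points))):
            if cand in medoids:
                continue
            trial = medoids[:pos] + [cand] + medoids[pos + 1:]
            cost = total_cost(points, trial, dist)
            if cost < best:  # keep the swap only if it lowers the cost
                medoids, best, improved = trial, cost, True
        # Step 4: the loop ends after a full pass with no better swap.
    # Step 2: assign every point to its nearest medoid.
    labels = [
        min(range(k), key=lambda j: dist(p, points[medoids[j]]))
        for p in points
    ]
    return medoids, labels


# Toy data (hypothetical): two groups plus one moderate outlier at the end.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
        (8.0, 8.0), (8.5, 9.0), (9.0, 8.0), (14.0, 14.0)]
medoids, labels = pam(data, k=2)
print([data[i] for i in medoids])
print(labels)
```

On this data the medoids are the points (1.0, 1.5) and (8.5, 9.0). Both are actual observations, and the outlier is simply assigned to the nearer of the two groups.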
Advantages of PAM:
+More robust to outliers than K-Means.
+Well suited to small datasets.
+Each group is represented by an actual data point, so the results are easy to interpret.
Disadvantages of PAM:
-Computationally expensive on large datasets.
-Slower than K-Means because it has to evaluate many candidate swaps.
Clustering Large Applications (CLARA)
CLARA is an extension of PAM designed for large datasets. Rather than
running PAM on all the data, it runs PAM on samples of the data and uses
the results to approximate the full solution quickly.
The method works as follows:
1-Draw several small samples from the full dataset.
2-Run PAM on each sample to find its best medoids.
3-Assign every data point in the full dataset to its closest medoid.
4-Keep the set of medoids with the lowest cost over the full dataset.
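The sampling idea can be sketched as follows. This is a rough illustration under stated assumptions, not the implementation found in any particular library: the compact PAM helper, the parameter names `n_samples` and `sample_size`, their default values, and the synthetic data are all made up for this example. Cost is taken to be the total distance from each point to its nearest medoid.

```python
import math
import random


def nearest_cost(points, medoids):
    """Total distance from each point to its closest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def pam_medoids(sample, k):
    """Compact swap-based PAM on a small sample; returns k medoid points."""
    medoids = sample[:k]
    best = nearest_cost(sample, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in sample:
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = nearest_cost(sample, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids


def clara(points, k, n_samples=5, sample_size=20, seed=0):
    """Sketch of CLARA; returns (medoids, labels, cost on the full data)."""
    rng = random.Random(seed)
    size = min(sample_size, len(points))
    best_medoids, best_cost = None, math.inf
    for _ in range(n_samples):
        sample = rng.sample(points, size)      # step 1: draw a sample
        medoids = pam_medoids(sample, k)       # step 2: PAM on the sample
        cost = nearest_cost(points, medoids)   # step 3: cost over ALL points
        if cost < best_cost:                   # step 4: keep the cheapest set
            best_medoids, best_cost = medoids, cost
    labels = [
        min(range(k), key=lambda j: math.dist(p, best_medoids[j]))
        for p in points
    ]
    return best_medoids, labels, best_cost


# Synthetic data (hypothetical): 100 points near (0, 0) and 100 near (10, 10).
rng = random.Random(42)
blob_a = [(rng.random(), rng.random()) for _ in range(100)]
blob_b = [(10 + rng.random(), 10 + rng.random()) for _ in range(100)]
data = blob_a + blob_b

medoids, labels, cost = clara(data, k=2)
print(medoids)
```

PAM only ever sees 20 points at a time here, which is where the speed-up comes from. The full dataset is used only to score each candidate set of medoids.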
Advantages of CLARA:
+Scales well to large datasets.
+Keeps much of PAM's robustness to outliers while running faster than PAM.
+Offers a balance between the speed of K-Means and the robustness of PAM.
Disadvantages of CLARA:
-Because it works on samples, its results can be less accurate than those of full PAM.
-Requires more computation than K-Means.
Conclusion
Each clustering method has its own advantages and is best suited for
specific scenarios:
K-Means is preferred for large datasets with well-separated clusters
and when speed is a priority.
PAM is better for small datasets where robustness to outliers is
important.
CLARA offers a balance between PAM and K-Means, working well for
large datasets where full PAM computation is impractical.
The choice of the best clustering method depends on the nature of the
dataset, whether outlier resistance is needed, and available
computational resources.