
Clustering Introduction - Practical Machine Learning Tutorial with Python p.34

566 ratings | 59206 views
In this tutorial, we shift gears and introduce the concept of clustering. Clustering is a form of unsupervised machine learning, where the machine automatically determines the grouping for data. There are two major forms of clustering: flat and hierarchical. Flat clustering allows the scientist to tell the machine how many clusters to come up with, whereas hierarchical clustering allows the machine to determine the groupings. https://pythonprogramming.net https://twitter.com/sentdex https://www.facebook.com/pythonprogramming.net/ https://plus.google.com/+sentdex
Text Comments (55)
Lefteris Griparis (25 days ago)
Always a great job!!! I'm always wondering how you comment out a block of code with hashtags, like at 13:14?
Mohammad Shehada (1 month ago)
you are the man
sentdex (1 month ago)
Karan Chopra (1 month ago)
What is the difference between K nearest neighbors and K means Clustering?
Ravi Bhagat (1 month ago)
k nearest neighbors is a supervised learning algorithm used for classification problems while k means clustering is an unsupervised learning algorithm.
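The distinction in the reply above can be sketched with scikit-learn. This is only an illustration: the toy data, the class labels `y`, and the parameter values (`n_neighbors=3`, `n_clusters=2`, `n_init=10`, `random_state=0`) are assumptions, not something taken from the video.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Toy data (assumed for illustration).
X = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]])

# Supervised: K nearest neighbors needs a known label for every training point.
y = np.array([0, 0, 1, 1, 0, 1])  # hypothetical class labels
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)
knn_prediction = knn.predict([[1, 1]])  # class voted by the 3 closest labeled points

# Unsupervised: K-means only sees X and assigns its own group ids.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(X)
cluster_ids = kmeans.labels_  # arbitrary ids; they carry no class meaning
```

Note that `knn.fit` takes both `X` and `y`, while `kmeans.fit` takes only `X`. That difference in what the algorithm is given is the supervised/unsupervised split.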
Saanvi Sharma (2 months ago)
If any ads appear on your video, I'll click them, to help you in my own way.
bipin tripathi (3 months ago)
one correction plt.plot(X[i][0], X[i][1], colors[i], markersize=34) instead of colors[labels[i]] to show all those colors
Suman Baruwal (8 months ago)
@sentdex #sentdex I am working on an NLP project where I want to make clusters from the primary source of data. I have vectorized the points; how can I then convert the vectorized points into clusters?
Liangyu Min (9 months ago)
In plt.plot(), the color could be 'y.', which means yellow.
Rashid Mahmud (9 months ago)
python libraries are incredible
Duilio (1 year ago)
Hi! Off topic question. If I perform the cluster analysis with different algorithms from sklearn how can I check the best one? Thanks in advance!
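One possible way to compare clusterings from different scikit-learn algorithms, when no ground-truth labels exist, is an internal metric such as the silhouette score. The sketch below is an assumption-laden illustration: the toy data, the two candidate algorithms, and their parameters are chosen here for the example, and a higher silhouette score is only one criterion, not proof that one algorithm is "the best".

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import silhouette_score

# Toy data (assumed for illustration).
X = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]])

# Candidate algorithms; both are asked for 2 clusters here.
candidates = {
    "kmeans": KMeans(n_clusters=2, n_init=10, random_state=0),
    "agglomerative": AgglomerativeClustering(n_clusters=2),
}

scores = {}
for name, model in candidates.items():
    labels = model.fit_predict(X)
    # Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters.
    scores[name] = silhouette_score(X, labels)

best = max(scores, key=scores.get)
```

On data this cleanly separated, different algorithms may well find the same grouping and tie; the comparison becomes more informative on messier data.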
Arghyadeep giri (1 year ago)
How do you comment multiple lines? Stupid question I know... :/ but still it is kinda cool. :)
Arghyadeep giri (1 year ago)
This is awesome. Thank you so much for the reply. Huge fan here... <3
sentdex (1 year ago)
Alt+3 to comment out, and Alt+4 to undo it, in IDLE.
Orr Chen (1 year ago)
And finally we'll do a 9/11
Andrew Acreman (1 year ago)
Annnnnnd he's on a list.
Ken M (1 year ago)
I am going to turn off the adblock for all of your tutorials
Matan Bendix Shenhav (1 year ago)
Your definition of the difference between flat and hierarchical is wrong. To quote "Machine Learning: a Probabilistic Perspective" by Kevin P. Murphy: "flat clustering, also called partitional clustering, where we partition the objects into disjoint sets; and hierarchical clustering, where we create a nested tree of partitions. [...] Furthermore, _most_ hierarchical clustering algorithms are deterministic and do not require the specification of K, the number of clusters, whereas _most_ flat clustering algorithms are sensitive to the initial conditions and require some model selection method for K." [emphasis mine]
5:58 two little stickmen falling off and two little stickmen watching
sentdex (1 year ago)
Hmm, I see 2 stick men falling with 1 below with his arms up in alarm. Art, especially my art, is so beautiful and meaningful
Janio Bachmann (1 year ago)
Hello sentdex, I have a question regarding the centroids (the mean of our features in each cluster). When I take the mean of the first column of our np.array X = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]]), which is (1 + 1.5 + 5) / 3, I get 2.5; however, my result when applying the cluster classifier is 1.16. Basically, what I want to know is: is this because we repeat the mean process over and over again until things smooth out?
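A plain-Python sketch may help with the arithmetic in this question. It is a simplified k-means loop written for illustration, not scikit-learn's implementation; the helper names and the choice of the first two points as starting centroids are assumptions. It shows that a centroid is the mean of the points currently assigned to a cluster, not of the first three rows, and that the assign-then-average steps repeat until the assignments stop changing.

```python
# Data from the comment above.
X = [[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]]

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(coord) / n for coord in zip(*points)]

def simple_kmeans(data, centroids, max_iter=100):
    """Hypothetical helper: repeat assignment and averaging until labels settle."""
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in data
        ]
        if new_labels == labels:
            break  # nothing moved, so the centroids are final
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(data, labels) if lab == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = mean_point(members)
    return centroids, labels

# Assumed starting centroids: the first two data points.
centroids, labels = simple_kmeans(X, [list(X[0]), list(X[1])])
first_three_rows_mean = (1 + 1.5 + 5) / 3   # the 2.5 from the question
cluster_zero_x = centroids[0][0]            # mean of 1, 1.5 and 1, about 1.1667
```

With these starting points the loop ends with [1, 2], [1.5, 1.8] and [1, 0.6] in one cluster, so the first coordinate of that centroid is (1 + 1.5 + 1) / 3, which is the 1.16 figure (truncated) rather than 2.5.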
Heming Wang (1 year ago)
This guy fucks!
James Baxter (1 year ago)
I like you a lot.
sentdex (1 year ago)
ragvri (1 year ago)
You should have used enumerate instead of i in range(len(X)) or did you forget :wink:
pusj61 (2 years ago)
plt.plot(X[i][0], X[i][1], colors[labels[i]], .... did not work; replacing this plot call with scatter worked fine. Perhaps the new version of matplotlib has limited the use of the plot command when plotting individual points.
Sukumar H (2 years ago)
awesome tutorial on Kmeans
sentdex (2 years ago)
Thank you
Mike Liebertee (2 years ago)
nice content, you should mention fuzzy (or soft) kmeans as an alternative.
Petar King (2 years ago)
9/11 :o
Maxim Rusakov (2 years ago)
Regarding your point about k-means versus hierarchical clustering: you can also use k-means with, say, 5-6 clusters, each representing the likelihood of buying an Amazon product. Same idea as hierarchical clustering, right? I always thought of k-means as more useful because you have centroids which you can use to train/test, whereas with hierarchical clustering, training and testing are all done in the same step (so technically not really even machine learning).
Omkar Ranadive (2 years ago)
What do the 'dots' after each colour name signify? For example, 'g.', 'r.' etc. Why doesn't it plot anything if we do make that list without the dots or use something like c = 'b' inside the plt.plot function?
Omkar Ranadive (2 years ago)
sentdex (2 years ago)
'g.' means it's going to plot a green dot, rather than a green line. If you did a green line, then it wouldn't be a scatter plot.
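The format strings discussed in this thread can be tried out with a short sketch. The cluster labels and the `Agg` backend line are assumptions made so the example runs on its own without a display; in an interactive session you would drop that line and call `plt.show()`.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt

X = [[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]]
labels = [0, 0, 1, 1, 0, 1]   # assumed cluster labels, for illustration only
colors = ["g.", "r."]         # color letter followed by "." (the point marker)

fig, ax = plt.subplots()
for (x, y), label in zip(X, labels):
    # "g." draws a green point. "g" alone would request a green line,
    # and a line through a single point has nothing visible to draw.
    ax.plot(x, y, colors[label], markersize=10)
```

Each call plots one point, so the marker part of the format string is what makes the point visible; the color part is picked by the point's cluster label.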
Berk Tosun (2 years ago)
Very informative video, thanks! I got an error when importing style: line 2, in <module> from matplotlib import style. I tried uninstalling and installing again, but it doesn't work for me. Can you help me?
Berk Tosun (2 years ago)
sentdex (2 years ago)
you can just not do the style part, it's not really essential to this series.
Andrew Czeizler (2 years ago)
Hey!! big fan,😃! was wondering if reinforcement learning was in the plans. best, Andrew
Peter Simkin (2 years ago)
Very informative and simple(ish) course on machine learning - thank you Sentdex!
sentdex (2 years ago)
Great to hear, happy to share!
maria camargo (2 years ago)
Hello, I have a question: do you know how to extract data from a .csv file and then run k-means? I need to get the data from that file with Python, because those values will make up my two variables. Could you help me please? :/
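One way this might be done is to read the two columns with Python's `csv` module and pass them to scikit-learn's `KMeans`. Everything specific in this sketch is an assumption: the column names `x` and `y`, the sample rows, the number of clusters, and the in-memory `io.StringIO` object, which stands in for a real file opened with `open(..., newline="")`.

```python
import csv
import io

import numpy as np
from sklearn.cluster import KMeans

# Stand-in for a real CSV file; the column names and rows are hypothetical.
csv_file = io.StringIO("x,y\n1,2\n1.5,1.8\n5,8\n8,8\n1,0.6\n9,11\n")

# Each row becomes one [x, y] sample, converted from text to float.
reader = csv.DictReader(csv_file)
rows = [[float(row["x"]), float(row["y"])] for row in reader]
X = np.array(rows)

# Assumed settings: 2 clusters, fixed seed for repeatable results.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_               # cluster id for each row of the file
centroids = kmeans.cluster_centers_   # one [x, y] center per cluster
```

The key step is turning the text fields into a numeric array with one row per sample and one column per variable; after that, the clustering code is the same as for data typed in by hand.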
Hamid Shirdastian (2 years ago)
Hi Harrison, Thanks for your great SVM tutorials. A question for you: Could we have more than two features (not classes)? If yes, how should we handle that?!
Hamid Shirdastian (2 years ago)
Great! That makes sense.
sentdex (2 years ago)
Yes, you can. Recall that the separating boundary is NOT a line. It's a hyperplane, and it can be a hyperplane of any dimension. If the data has more than 2 dimensions, it's even more likely you will need to incorporate a kernel, but that's fine. Remember that your X (dataset) is just a vector, and so is W. Getting the dot product produces a scalar, no matter how large the vectors are, so the exact same formulas are used whether your data is 1D, 2D or 500D.
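The point about the dot product can be illustrated with a few lines of plain Python. The `decision` helper and all weight, bias, and feature values below are hypothetical; a trained SVM would supply `w` and `b`, and this sketch covers only the linear case with no kernel.

```python
def decision(w, x, b):
    """Return which side of the hyperplane w . x + b = 0 the point x falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b  # always a single scalar
    return 1 if score >= 0 else -1

# Two features: w and x each have 2 entries.
side_2d = decision([1.0, -1.0], [3.0, 1.0], -0.5)

# Five features: same function, longer vectors.
side_5d = decision([0.5, -1.0, 0.25, 0.0, 2.0], [1.0, 2.0, 4.0, 9.0, -1.0], 1.0)
```

The function body never mentions the number of features, which is the reply's point: only the lengths of `w` and `x` change as dimensions are added.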
Deino475 (2 years ago)
How do you get the values of the points in each cluster?
Deino475 (2 years ago)
I phrased my question poorly. But I didn't know the labels were a list and I just needed to iterate through the list and pair them with each data point.
sentdex (2 years ago)
+Deino475 which values? The data values would be from your sample data, whatever it is.
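The pairing described in this thread can be sketched as follows. The `labels` list is an assumed example; with scikit-learn it would typically come from a fitted model's `labels_` attribute, which has one entry per data point in the same order as the data.

```python
from collections import defaultdict

X = [[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]]
labels = [0, 0, 1, 1, 0, 1]  # assumed cluster label for each point, in order

# Walk the data and labels together, collecting points under their cluster id.
points_by_cluster = defaultdict(list)
for point, label in zip(X, labels):
    points_by_cluster[label].append(point)
```

After the loop, `points_by_cluster[0]` holds every point labeled 0, and likewise for the other clusters.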
Adama Barrow (2 years ago)
ChristFan868 (2 years ago)
Are you going to cover/ discuss what Feature Extraction is?
Suprotik Dey (1 year ago)
It is the process of extracting simpler features from complex data. For example, say you are making a cat identifier: you can make feature extractors which extract its eyes, nose, lips, whiskers, etc. This is where deep learning comes in. You don't need to do feature extraction in deep learning.
ug0ts3rvd (2 years ago)
It's basically trying to make a structure from the data, extracting elements of significance
Akshay Mallipeddi (2 years ago)
I have a list of around 36000 products and I need to cluster them into groups. For example, I have a list of Pepsi variants like Pepsi max, Pepsi diet, Pepsi cola, etc., and I want to cluster all these Pepsi variants into one cluster, and similarly for the other products in the list (all varieties of puddings, like raspberry pudding, christmas pudding, orange pudding, into a single category, pudding). Can you please help me with how to achieve this? +sentdex And I don't know the number of clusters beforehand.
Zach Does Tech (2 years ago)
Sorry I mean sendex
Zach Does Tech (2 years ago)
Hey Semtex I'm your biggest fan
Zach Does Tech (2 years ago)
