In this tutorial, we shift gears and introduce the concept of clustering. Clustering is a form of unsupervised machine learning, where the machine automatically determines the grouping for data. There are two major forms of clustering: flat and hierarchical. Flat clustering lets the scientist tell the machine how many clusters to come up with, whereas hierarchical clustering lets the machine determine the groupings.
@sentdex I am working on an NLP project where I want to build clusters from the primary source data. I have vectorized the points; how can I convert the vectorized points into clusters?
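A minimal sketch of one way to go from vectors to clusters, using scikit-learn's KMeans (the toy `vectors` array and `n_clusters=2` are made-up values for illustration; for real document vectors you would pass your own matrix):

```python
import numpy as np
from sklearn.cluster import KMeans

# Each row is one vectorized data point; these values are invented.
vectors = np.array([[0.0, 1.0], [0.1, 0.9], [5.0, 5.1], [4.9, 5.2]])

# Fit k-means on the vectors; each point gets a cluster label.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)
print(km.labels_)           # cluster id for each vector
print(km.cluster_centers_)  # the learned centroids
```

The `labels_` attribute is the cluster assignment per row, which is usually what you want downstream.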
Your definition of the difference between flat and hierarchical is wrong. To quote "Machine Learning: a Probabilistic Perspective" by Kevin P. Murphy:
"flat clustering, also called partitional clustering, where we partition the objects into disjoint sets; and hierarchical clustering, where we create a nested tree of partitions. [...] Furthermore,
_most_ hierarchical clustering algorithms are deterministic and do not require the specification of K, the number of clusters, whereas _most_ flat clustering algorithms are sensitive to the initial
conditions and require some model selection method for K." [emphasis mine]
Hello sentdex, I have a question regarding the centroids (the mean of our features in each cluster). When I compute the mean of the first column of our np.array X = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]]), which is (1 + 1.5 + 5) / 3, I get 2.5; however, my result when applying the cluster classifier is 1.16. Basically, what I want to know is: is this because we repeat the mean process over and over again until things smooth out?
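To see why the final centroid is not simply the mean of the first three rows, here is a minimal hand-rolled k-means sketch on the same X (k=2 and the starting centroids are arbitrary choices): the mean is indeed recomputed every iteration as points are re-assigned, so cluster membership changes before things smooth out.

```python
import numpy as np

X = np.array([[1, 2], [1.5, 1.8], [5, 8],
              [8, 8], [1, 0.6], [9, 11]])

centroids = X[:2].copy()  # arbitrary start: the first two points
for _ in range(10):
    # assign each point to its nearest centroid
    dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
    labels = dists.argmin(axis=1)
    # recompute each centroid as the mean of its assigned points
    centroids = np.array([X[labels == k].mean(axis=0) for k in range(2)])

print(centroids)  # one centroid settles near [1.167, 1.467]
```

After convergence, one cluster holds [1, 2], [1.5, 1.8], and [1, 0.6], whose first-column mean is (1 + 1.5 + 1) / 3 ≈ 1.167, matching the 1.16 you saw.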
plt.plot(X[i], X[i], colors[lables[i]], ...) did not work; replacing plot with scatter worked fine. Perhaps the newer version of matplotlib has limited the use of the plot command when plotting individual points.
Regarding your point about k-means versus hierarchical clustering: you can also use k-means with, say, 5-6 clusters, each representing likeliness to buy an Amazon product. Same idea as hierarchical clustering, right? I always thought of k-means as more useful because you have centroids which you can use to train/test, whereas with hierarchical clustering, training and testing are all done in the same step (so technically not really even machine learning).
What do the 'dots' after each colour name signify? For example, 'g.', 'r.', etc. Why doesn't it plot anything if we make that list without the dots, or use something like c = 'b' inside the plt.plot function?
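For reference: in a matplotlib format string like 'g.', the letter is the color and the '.' is a point marker. Without any marker, plt.plot draws lines between points, so a lone point has nothing to render. A small sketch (using the non-interactive Agg backend so it runs headless; the coordinates are arbitrary):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

plt.plot(1, 2, 'g.')                   # green point marker: visible
plt.plot(3, 4, color='b')              # line style, single point: nothing drawn
plt.plot(5, 6, color='r', marker='o')  # explicit marker also works
plt.savefig("markers.png")
```

So c='b' alone only sets the line color; to see individual points with plot, you also need a marker, which the '.' in 'g.' supplies.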
Hello, I have a question: do you know how to extract data from a .csv file and then run k-means? I need to get the data from that file with Python, because those values will make up my two variables. Could you help me please? :/
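A minimal sketch of loading two numeric columns from a CSV into the array k-means expects. The column names and values here are invented; with a real file you would pass its path instead of the StringIO buffer:

```python
import numpy as np
from io import StringIO

# Stand-in for a real file; "x" and "y" are hypothetical column names.
csv_text = StringIO("x,y\n1,2\n1.5,1.8\n5,8\n8,8\n1,0.6\n9,11\n")

# For a real file: X = np.genfromtxt("data.csv", delimiter=",", skip_header=1)
X = np.genfromtxt(csv_text, delimiter=",", skip_header=1)
print(X.shape)  # (6, 2) -- six rows, two feature columns
```

The resulting X is exactly the shape that clf.fit(X) expects in the tutorial.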
Yes, you can. Recall that the separating boundary is NOT a line; it's a hyperplane, and it can be a hyperplane of any dimension. If the data has more than 2 dimensions, it's even more likely you will need to incorporate a kernel, but that's fine.
Remember that your X (dataset) is just a vector, and so is W. Taking the dot product produces a scalar, no matter how large the vectors are, so the exact same formulas are used whether your data is 1D, 2D, or 500D.
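A quick sketch of that point, with arbitrary untrained w and b just to show the shapes: the decision value w·x + b is a single number in every dimensionality.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (1, 2, 500):
    x = rng.standard_normal(d)   # one d-dimensional sample
    w = rng.standard_normal(d)   # weight vector of the same dimension
    b = 0.5                      # scalar bias
    decision = float(np.dot(w, x) + b)  # always a single scalar
    print(d, "->", type(decision).__name__)
```

The same sign-of-(w·x + b) rule therefore classifies points identically whether the data lives in 1, 2, or 500 dimensions.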
Feature extraction is the process of deriving simpler features from complex data.
For example, if you are making a cat identifier, you can build feature extractors that pick out its eyes, nose, lips, whiskers, etc. This is where deep learning comes in: you don't need to do manual feature extraction in deep learning.
I have a list of around 36,000 products and I need to cluster them into groups. For example, I have a list of Pepsi variants like Pepsi Max, Pepsi Diet, Pepsi Cola, etc., and I want to cluster all these Pepsi variants into one cluster; similarly for the other products in the list (all varieties of puddings, like raspberry pudding, christmas pudding, orange pudding, into a single "pudding" category). Can you please help me with how to achieve this? +sentdex
And I don't know the number of clusters beforehand.
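One rough sketch when the cluster count is unknown: agglomerative-style grouping of product names by token overlap (Jaccard similarity) with a similarity cutoff instead of a fixed k. The product list and the 0.2 threshold here are made-up values; for real data you might instead run Mean Shift or hierarchical clustering on TF-IDF vectors, which likewise do not need k up front.

```python
def jaccard(a, b):
    # similarity between two names based on shared lowercase tokens
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def cluster_names(names, threshold=0.2):
    clusters = []
    for name in names:
        # attach to the first cluster with a similar-enough member
        for cluster in clusters:
            if any(jaccard(name, member) >= threshold for member in cluster):
                cluster.append(name)
                break
        else:
            clusters.append([name])  # otherwise start a new cluster
    return clusters

products = ["Pepsi Max", "Pepsi Diet", "Pepsi Cola",
            "raspberry pudding", "christmas pudding", "orange pudding"]
print(cluster_names(products))
# → [['Pepsi Max', 'Pepsi Diet', 'Pepsi Cola'],
#    ['raspberry pudding', 'christmas pudding', 'orange pudding']]
```

The threshold plays the role that the tree-cut height plays in hierarchical clustering: it decides granularity, and the number of clusters falls out of the data.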