Let's start by loading some data from a CSV file on the elemental composition of some pottery. Five elements are reported: Al, Fe, Mg, Ca, and Na.

In [65]:
data = read.csv('https://vincentarelbundock.github.io/Rdatasets/csv/car/Pottery.csv')
data[1:6,]
Out[65]:
   X  Site       Al    Fe    Mg    Ca    Na
1  1  Llanedyrn  14.4  7     4.3   0.15  0.51
2  2  Llanedyrn  13.8  7.08  3.43  0.12  0.17
3  3  Llanedyrn  14.6  7.09  3.88  0.13  0.2
4  4  Llanedyrn  11.5  6.37  5.64  0.16  0.14
5  5  Llanedyrn  13.8  7.06  5.34  0.2   0.2
6  6  Llanedyrn  10.9  6.26  3.47  0.17  0.22

First we need to trim the data, because no one likes text in their data frames: we keep only the five numeric element columns (columns 3 to 7). Notice that there are 26 rows.

In [66]:
data = data[,3:7]
head(data)
nrow(data)
Out[66]:
   Al    Fe    Mg    Ca    Na
1  14.4  7     4.3   0.15  0.51
2  13.8  7.08  3.43  0.12  0.17
3  14.6  7.09  3.88  0.13  0.2
4  11.5  6.37  5.64  0.16  0.14
5  13.8  7.06  5.34  0.2   0.2
6  10.9  6.26  3.47  0.17  0.22
Out[66]:
26

Time to load some libraries.

In [67]:
library(fpc)
library(cluster)

Clustering techniques do not care about the values in the data frame per se; they care about how far each observation is from the others. So we create a distance matrix (dist() uses Euclidean distance between rows by default), which looks like this:

In [68]:
d = dist(data)
as.matrix(d)
Out[68]:
    1  2  3  4  5  6  7  8  9  10  ⋯  17  18  19  20  21  22  23  24  25  26
1   0  1.11346306629362  0.566568618968612  3.2771176359722  1.24249748490691  3.68388382010074  5.10985322685496  3.4704322497349  3.64447527087234  3.11517254738803  ⋯  7.83263046492045  6.07630644388513  7.52389526774529  7.24702007724554  9.17011450310191  7.72415691192249  7.93386412790136  7.52997343952819  5.63120768574557  8.04627242889526
2   1.11346306629362  0  0.918477000256402  3.26813402417955  1.91201464429538  3.01479684224327  4.72628818418852  3.56147441377865  3.32445484252682  3.8278061601915  ⋯  7.8444438936103  5.81937281844014  7.51120496325323  7.22990318054122  9.34753443427731  7.68011718660594  7.94790538443935  7.40216860116007  5.24172681470525  8.10632469125189
3   0.566568618968612  0.918477000256402  0  3.63737542742016  1.66655332947974  3.81431514167353  5.32997185733658  3.85241482709222  3.90579057298263  3.56565561993864  ⋯  7.60191423261273  5.84230262824513  7.29014403149897  6.99886419356741  8.9205156801611  7.50318598996453  7.70869638784665  7.33064117250326  5.41286430644626  7.79363201594738
4   3.2771176359722  3.26813402417955  3.63737542742016  0  2.42101218501684  2.25554871372799  2.88438901675901  0.657114906237867  1.49375366108338  2.54137757918811  ⋯  9.84266224148731  7.71122558352432  9.523481506256  9.32974812093017  11.5898964620052  9.5825518521947  9.91523575110547  9.1041309305172  6.98518432111852  10.2737626992256
5   1.24249748490691  1.91201464429538  1.66655332947974  2.42101218501684  0  3.542343856827  4.7641158676086  2.60869699275328  3.23193440527496  1.93858195596678  ⋯  8.69051782116578  6.93214973871742  8.39155527896945  8.13939186917549  10.0585187776332  8.5662827410727  8.78335926624888  8.32492042004006  6.44381098419251  8.94275684562652
6   3.68388382010074  3.01479684224327  3.81431514167353  2.25554871372799  3.542343856827  0  2.29490740553949  2.58412460999852  1.32521696336864  4.56461389385784  ⋯  9.35179661883213  6.86370890991161  8.99771081998082  8.79980113411661  11.3207508584899  9.00918420280105  9.42631423197848  8.41746398863696  5.95725607977364  9.8418646607236
7   5.10985322685496  4.72628818418852  5.32997185733658  2.88438901675901  4.7641158676086  2.29490740553949  0  2.69883308116675  1.61338154197945  5.17622449281327  ⋯  9.43715529171794  7.0156610522459  9.10857837425797  8.99702728683202  11.6020429235545  9.01906868806308  9.48150831882776  8.28743024103371  6.11018003008095  10.0644175191613
8   3.4704322497349  3.56147441377865  3.85241482709222  0.657114906237867  2.60869699275328  2.58412460999852  2.69883308116675  0  1.51588258120476  2.50870484513424  ⋯  9.62481168646951  7.55339658696669  9.31530461122984  9.14145502641675  11.394064244158  9.35801795253674  9.69039214892772  8.86651002367899  6.85374350264146  10.0801289674289
9   3.64447527087234  3.32445484252682  3.90579057298263  1.49375366108338  3.23193440527496  1.32521696336864  1.61338154197945  1.51588258120476  0  3.83263616848769  ⋯  9.19386208293338  6.85405719264145  8.85937921075738  8.69101259923146  11.1561104333007  8.85714400921651  9.25745105307071  8.26473835036536  6.01706739201083  9.71071058162069
10  3.11517254738803  3.8278061601915  3.56565561993864  2.54137757918811  1.93858195596678  4.56461389385784  5.17622449281327  2.50870484513424  3.83263616848769  0  ⋯  9.94709002673646  8.3629779385097  9.67793883014354  9.46375189869219  11.2455457848875  9.83294971003106  10.0261109110163  9.58516562193893  7.90891901589591  10.2169760692682
11  2.58743115850451  2.84903492432087  3.02678046775778  1.01671038157383  1.75025712396779  2.70188822862827  3.31086091523036  0.97821265581672  1.87909552710872  2.02852162916741  ⋯  9.15582328357205  7.17253790509329  8.84713512952074  8.64966473338707  10.8083948854582  8.93101338035052  9.22909529694  8.51727068960474  6.53671936065791  9.55158102096192
12  1.83907585487929  2.24744299149055  2.26790652364686  1.63777287802674  0.842021377400836  3.02754025571915  4.03021091259502  1.78126359643934  2.51129448691307  1.76898275853667  ⋯  8.90604850649265  7.0211751153208  8.60102319494605  8.37415667395828  10.422485308217  8.73085333744646  8.98967185163063  8.40267219401066  6.45291407040261  9.23018418017756
13  1.75931804969994  1.5528683137987  2.02489505900923  1.72235304162648  1.46266879367819  2.09432566713011  3.56407070636933  2.04296353369315  2.00484413359243  2.87716874722356  ⋯  8.65918587397222  6.53918190601852  8.32923766019436  8.09367036640361  10.3143880089902  8.43813960538696  8.74639354248367  8.04514139092658  5.86162946628325  9.0183756852329
14  2.03383873500334  1.54055184917613  2.20115878573082  1.97651207939643  2.00890517446693  1.6780643611018  3.25848124131473  2.26561249996552  1.79128445535599  3.44514150652771  ⋯  8.42820265537084  6.19102576315105  8.08823219251277  7.85820590211277  10.1785411528372  8.17442964371215  8.51357151846392  7.73401577448611  5.45356763962821  8.82099767600015
15  3.09299207887767  2.64556232207824  3.25892620352164  1.96837496427891  2.93598365118064  1.32385799842732  2.10104735786702  2.01625891194559  0.946889645101266  3.94974682732957  ⋯  8.38579155476691  6.02849069004838  8.04617921749199  7.86290658726148  10.3377850625751  8.05945407580439  8.45521141072179  7.49351052578163  5.20152862147273  8.88401373254229
16  3.30682627302978  2.80226693946169  3.45648665555069  2.11910358406568  3.18143049586189  1.17341382299681  1.9466638127833  2.1803210772728  0.938136450629651  4.19192080077856  ⋯  8.45343717076078  6.04800793650273  8.11051786262752  7.93139332021808  10.4425523699908  8.11171375234605  8.52147287738452  7.52110364241845  5.19039497533666  8.96651548819273
17  7.83263046492045  7.8444438936103  7.60191423261273  9.84266224148731  8.69051782116578  9.35179661883213  9.43715529171794  9.62481168646951  9.19386208293338  9.94709002673646  ⋯  0  2.73572659452658  0.373764631820616  0.671267457873537  2.51234949797993  0.632060123722421  0.14456832294801  1.64620776331543  3.79236074233452  0.88283633817373
18  6.07630644388513  5.81937281844014  5.84230262824513  7.71122558352432  6.93214973871742  6.86370890991161  7.0156610522459  7.55339658696669  6.85405719264145  8.3629779385097  ⋯  2.73572659452658  0  2.37362591829462  2.25889353445442  5.07835603320602  2.28707236439952  2.79583618976506  1.72655726809162  1.06047159320747  3.38549848619077
19  7.52389526774529  7.51120496325323  7.29014403149897  9.523481506256  8.39155527896945  8.99771081998082  9.10857837425797  9.31530461122984  8.85937921075738  9.67793883014354  ⋯  0.373764631820616  2.37362591829462  0  0.380657326213486  2.8013925108774  0.498998997994986  0.471380949975708  1.43041951888249  3.4319236588246  1.11512331156693
20  7.24702007724554  7.22990318054122  6.99886419356741  9.32974812093017  8.13939186917549  8.79980113411661  8.99702728683202  9.14145502641675  8.69101259923146  9.46375189869219  ⋯  0.671267457873537  2.25889353445442  0.380657326213486  0  2.82589808733436  0.827586853448023  0.800187478032492  1.62302187292716  3.31363848360077  1.1323427043082
21  9.17011450310191  9.34753443427731  8.9205156801611  11.5898964620052  10.0585187776332  11.3207508584899  11.6020429235545  11.394064244158  11.1561104333007  11.2455457848875  ⋯  2.51234949797993  5.07835603320602  2.8013925108774  2.82589808733436  0  3.12880168754749  2.52824049488968  4.1473244387195  6.12531631836267  1.71087696810729
22  7.72415691192249  7.68011718660594  7.50318598996453  9.5825518521947  8.5662827410727  9.00918420280105  9.01906868806308  9.35801795253674  8.85714400921651  9.83294971003106  ⋯  0.632060123722421  2.28707236439952  0.498998997994986  0.827586853448023  3.12880168754749  0  0.610409698481276  1.02151847756171  3.32377797092405  1.49482440440341
23  7.93386412790136  7.94790538443935  7.70869638784665  9.91523575110547  8.78335926624888  9.42631423197848  9.48150831882776  9.69039214892772  9.25745105307071  10.0261109110163  ⋯  0.14456832294801  2.79583618976506  0.471380949975708  0.800187478032492  2.52824049488968  0.610409698481276  0  1.62188162330054  3.84849321163491  0.947048045243746
24  7.52997343952819  7.40216860116007  7.33064117250326  9.1041309305172  8.32492042004006  8.41746398863696  8.28743024103371  8.86651002367899  8.26473835036536  9.58516562193893  ⋯  1.64620776331543  1.72655726809162  1.43041951888249  1.62302187292716  4.1473244387195  1.02151847756171  1.62188162330054  0  2.63484344885991  2.50834606862769
25  5.63120768574557  5.24172681470525  5.41286430644626  6.98518432111852  6.44381098419251  5.95725607977364  6.11018003008095  6.85374350264146  6.01706739201083  7.90891901589591  ⋯  3.79236074233452  1.06047159320747  3.4319236588246  3.31363848360077  6.12531631836267  3.32377797092405  3.84849321163491  2.63484344885991  0  4.43961710060676
26  8.04627242889526  8.10632469125189  7.79363201594738  10.2737626992256  8.94275684562652  9.8418646607236  10.0644175191613  10.0801289674289  9.71071058162069  10.2169760692682  ⋯  0.88283633817373  3.38549848619077  1.11512331156693  1.1323427043082  1.71087696810729  1.49482440440341  0.947048045243746  2.50834606862769  4.43961710060676  0
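
To see where these numbers come from, here is a minimal sketch that recomputes one entry by hand. It assumes the default Euclidean metric of dist() and retypes the first three rows of the trimmed table into a small matrix (the name `pots` is made up for this example) so that it runs without the download.

```r
# First three rows of the trimmed data, copied from the Out[66] table
# ('pots' is a name used only for this illustration)
pots = rbind(c(Al = 14.4, Fe = 7.00, Mg = 4.30, Ca = 0.15, Na = 0.51),
             c(Al = 13.8, Fe = 7.08, Mg = 3.43, Ca = 0.12, Na = 0.17),
             c(Al = 14.6, Fe = 7.09, Mg = 3.88, Ca = 0.13, Na = 0.20))

# Euclidean distance between samples 1 and 2, written out explicitly
manual = sqrt(sum((pots[1, ] - pots[2, ])^2))

# The same entry taken from dist()
builtin = as.matrix(dist(pots))[1, 2]

print(c(manual = manual, builtin = builtin))
```

Both values should agree with the [1, 2] entry of the matrix above (about 1.1135).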

In the fpc library we have a tool called pamk(), which we shall use to estimate the best number of clusters to form. It runs partitioning around medoids (PAM) for a range of cluster counts and reports the best one in 'nc'. Here it is 2.

In [69]:
pamk(d)$nc
Out[69]:
2
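
As a rough sketch of what pamk() does under its default settings (to my understanding it compares the average silhouette width across candidate cluster counts), the loop below runs pam() for several values of k and keeps the best one. The toy data and the names `toy`, `d_toy`, `ks`, `avg_width` and `best_k` are invented for this illustration and are not the pottery data.

```r
library(cluster)

set.seed(1)
# Hypothetical toy data: two well-separated blobs of 20 points each in 2D
toy = rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
            matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2))
d_toy = dist(toy)

# Average silhouette width of the PAM solution for each candidate k
ks = 2:5
avg_width = sapply(ks, function(k) pam(d_toy, k)$silinfo$avg.width)

# Keep the k with the largest average silhouette width
best_k = ks[which.max(avg_width)]
print(data.frame(k = ks, avg_width = avg_width))
print(best_k)
```

Printing the whole table, rather than only the winner, shows how close the runner-up values of k are.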

Using 2 as the number of clusters, we generate pam.data. Plotting the results, we can see the chosen clusters on the first two principal components as well as a silhouette plot of the clusters (see below).

In [74]:
pam.data = pam(d, 2)
clusplot(pam.data)
plot(pam.data)

The 'Silhouette' width measures how well each observation fits its assigned cluster compared with the nearest other cluster, on a scale from -1 to 1. As a rule of thumb, an average width below ~0.3 indicates poor support, while a value above 0.7 indicates excellent support.
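
A small worked example may make the silhouette width concrete. For one observation, let a be its mean distance to the other members of its own cluster and b its mean distance to the members of the nearest other cluster; the width is then (b - a) / max(a, b). The sketch below uses six made-up points on a line (all names are illustrative) and compares the hand calculation with silhouette() from the cluster package.

```r
library(cluster)

# Hypothetical 1D data with two obvious groups
x  = c(1, 2, 3, 10, 11, 12)
cl = c(1L, 1L, 1L, 2L, 2L, 2L)   # cluster label of each point
dm = as.matrix(dist(x))

i = 1                                              # point to inspect
a = mean(dm[i, cl == cl[i] & seq_along(x) != i])   # mean distance within its own cluster
b = mean(dm[i, cl != cl[i]])                       # mean distance to the other cluster
s_manual = (b - a) / max(a, b)

# The same quantity from the cluster package
s_pkg = silhouette(cl, dist(x))[i, "sil_width"]

print(c(manual = s_manual, package = unname(s_pkg)))
```

For the first point, a = 1.5 and b = 10, so the width is 0.85, which falls in the well-supported range under the rule of thumb above.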

Loading another library, we will try another approach using hierarchical clustering.

In [46]:
library(pvclust)

This method, pvclust, builds a dendrogram by hierarchical clustering and uses bootstrap resampling to estimate how strongly the data support each cluster in it.

In [61]:
pv.data = pvclust(scale(d), method.dist="cor", method.hclust="average", nboot=1000)
Bootstrap (r = 0.5)... Done.
Bootstrap (r = 0.58)... Done.
Bootstrap (r = 0.69)... Done.
Bootstrap (r = 0.77)... Done.
Bootstrap (r = 0.88)... Done.
Bootstrap (r = 1.0)... Done.
Bootstrap (r = 1.08)... Done.
Bootstrap (r = 1.19)... Done.
Bootstrap (r = 1.27)... Done.
Bootstrap (r = 1.38)... Done.
In [63]:
plot(pv.data)
pvrect(pv.data, alpha=0.95)

Here the data were partitioned with pvclust, which estimates a p-value for each cluster by bootstrap resampling of the provided data. pvrect() boxes the clusters whose p-value is at least 0.95; two such clusters were found.
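
To list the members of the boxed clusters rather than read them off the plot, pvclust also provides pvpick(). The sketch below is only an illustration on made-up data: `toy`, `signal_a`, `signal_b`, `pv_toy` and `picked` are hypothetical names, and nboot is kept small so that it runs quickly.

```r
library(pvclust)

set.seed(42)  # make the bootstrap reproducible

# Hypothetical toy data (not the pottery set): 30 rows, 10 columns.
# Columns 1-5 follow one shared signal and columns 6-10 another.
signal_a = rnorm(30)
signal_b = rnorm(30)
toy = cbind(sapply(1:5, function(i) signal_a + rnorm(30, sd = 0.2)),
            sapply(1:5, function(i) signal_b + rnorm(30, sd = 0.2)))
colnames(toy) = paste0("s", 1:10)

# pvclust clusters the columns of its input
pv_toy = pvclust(toy, method.dist = "cor", method.hclust = "average", nboot = 100)

# Members of every cluster that pvrect() would box at alpha = 0.95
picked = pvpick(pv_toy, alpha = 0.95)
print(picked$clusters)
```

Applied to the notebook's own result, the analogous call would be pvpick(pv.data, alpha=0.95).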
