Let's start by loading some data on the elemental composition of some pottery from a CSV file. Five elements are reported: Al, Fe, Mg, Ca, and Na.

In [65]:
data = read.csv('https://vincentarelbundock.github.io/Rdatasets/csv/car/Pottery.csv')
data[1:6,]
Out[65]:
  X      Site   Al   Fe   Mg   Ca   Na
1 1 Llanedyrn 14.4 7.00 4.30 0.15 0.51
2 2 Llanedyrn 13.8 7.08 3.43 0.12 0.17
3 3 Llanedyrn 14.6 7.09 3.88 0.13 0.20
4 4 Llanedyrn 11.5 6.37 5.64 0.16 0.14
5 5 Llanedyrn 13.8 7.06 5.34 0.20 0.20
6 6 Llanedyrn 10.9 6.26 3.47 0.17 0.22
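First we need to trim the data, because no one likes text in their data frames: the distance calculation further down needs numeric columns only, so we drop the row index and the site name. Notice that there are 26 rows.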

In [66]:
 
# Keep only the five numeric element columns (Al, Fe, Mg, Ca, Na)
data = data[,3:7]
head(data)
nrow(data)
Out[66]:
    Al   Fe   Mg   Ca   Na
1 14.4 7.00 4.30 0.15 0.51
2 13.8 7.08 3.43 0.12 0.17
3 14.6 7.09 3.88 0.13 0.20
4 11.5 6.37 5.64 0.16 0.14
5 13.8 7.06 5.34 0.20 0.20
6 10.9 6.26 3.47 0.17 0.22
Out[66]:
26
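Selecting columns by position works here, but it depends on the CSV layout staying the same. As a small sketch of the same trim done by column name, here is a hypothetical two-row data frame `toy` built from the first rows shown above (the names `toy`, `elements`, and `trimmed` are only for illustration):

```r
# Hypothetical stand-in for the first two rows of the pottery data
toy = data.frame(
  X = 1:2,
  Site = c("Llanedyrn", "Llanedyrn"),
  Al = c(14.4, 13.8),
  Fe = c(7.00, 7.08),
  Mg = c(4.30, 3.43),
  Ca = c(0.15, 0.12),
  Na = c(0.51, 0.17)
)

# Keep only the element columns, selected by name instead of by index
elements = c("Al", "Fe", "Mg", "Ca", "Na")
trimmed = toy[, elements]

# Check that everything left is numeric before computing distances
all(sapply(trimmed, is.numeric))
```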
Time to load some libraries.

In [67]:
 
library(fpc)
library(cluster)
Clustering techniques do not care about the values in the data frame per se; rather, they care about how far each observation is from the others. So we create a distance matrix (Euclidean by default), which looks like this:

In [68]:
 
d = dist(data)
# Round for display; the notebook view shows columns 1-10 and 17-26 only
round(as.matrix(d), 2)
Out[68]:
        1     2     3     4     5     6     7     8     9    10 ...    17    18    19    20    21    22    23    24    25    26
 1  0.00  1.11  0.57  3.28  1.24  3.68  5.11  3.47  3.64  3.12 ...  7.83  6.08  7.52  7.25  9.17  7.72  7.93  7.53  5.63  8.05
 2  1.11  0.00  0.92  3.27  1.91  3.01  4.73  3.56  3.32  3.83 ...  7.84  5.82  7.51  7.23  9.35  7.68  7.95  7.40  5.24  8.11
 3  0.57  0.92  0.00  3.64  1.67  3.81  5.33  3.85  3.91  3.57 ...  7.60  5.84  7.29  7.00  8.92  7.50  7.71  7.33  5.41  7.79
 4  3.28  3.27  3.64  0.00  2.42  2.26  2.88  0.66  1.49  2.54 ...  9.84  7.71  9.52  9.33 11.59  9.58  9.92  9.10  6.99 10.27
 5  1.24  1.91  1.67  2.42  0.00  3.54  4.76  2.61  3.23  1.94 ...  8.69  6.93  8.39  8.14 10.06  8.57  8.78  8.32  6.44  8.94
 6  3.68  3.01  3.81  2.26  3.54  0.00  2.29  2.58  1.33  4.56 ...  9.35  6.86  9.00  8.80 11.32  9.01  9.43  8.42  5.96  9.84
 7  5.11  4.73  5.33  2.88  4.76  2.29  0.00  2.70  1.61  5.18 ...  9.44  7.02  9.11  9.00 11.60  9.02  9.48  8.29  6.11 10.06
 8  3.47  3.56  3.85  0.66  2.61  2.58  2.70  0.00  1.52  2.51 ...  9.62  7.55  9.32  9.14 11.39  9.36  9.69  8.87  6.85 10.08
 9  3.64  3.32  3.91  1.49  3.23  1.33  1.61  1.52  0.00  3.83 ...  9.19  6.85  8.86  8.69 11.16  8.86  9.26  8.26  6.02  9.71
10  3.12  3.83  3.57  2.54  1.94  4.56  5.18  2.51  3.83  0.00 ...  9.95  8.36  9.68  9.46 11.25  9.83 10.03  9.59  7.91 10.22
11  2.59  2.85  3.03  1.02  1.75  2.70  3.31  0.98  1.88  2.03 ...  9.16  7.17  8.85  8.65 10.81  8.93  9.23  8.52  6.54  9.55
12  1.84  2.25  2.27  1.64  0.84  3.03  4.03  1.78  2.51  1.77 ...  8.91  7.02  8.60  8.37 10.42  8.73  8.99  8.40  6.45  9.23
13  1.76  1.55  2.02  1.72  1.46  2.09  3.56  2.04  2.00  2.88 ...  8.66  6.54  8.33  8.09 10.31  8.44  8.75  8.05  5.86  9.02
14  2.03  1.54  2.20  1.98  2.01  1.68  3.26  2.27  1.79  3.45 ...  8.43  6.19  8.09  7.86 10.18  8.17  8.51  7.73  5.45  8.82
15  3.09  2.65  3.26  1.97  2.94  1.32  2.10  2.02  0.95  3.95 ...  8.39  6.03  8.05  7.86 10.34  8.06  8.46  7.49  5.20  8.88
16  3.31  2.80  3.46  2.12  3.18  1.17  1.95  2.18  0.94  4.19 ...  8.45  6.05  8.11  7.93 10.44  8.11  8.52  7.52  5.19  8.97
17  7.83  7.84  7.60  9.84  8.69  9.35  9.44  9.62  9.19  9.95 ...  0.00  2.74  0.37  0.67  2.51  0.63  0.14  1.65  3.79  0.88
18  6.08  5.82  5.84  7.71  6.93  6.86  7.02  7.55  6.85  8.36 ...  2.74  0.00  2.37  2.26  5.08  2.29  2.80  1.73  1.06  3.39
19  7.52  7.51  7.29  9.52  8.39  9.00  9.11  9.32  8.86  9.68 ...  0.37  2.37  0.00  0.38  2.80  0.50  0.47  1.43  3.43  1.12
20  7.25  7.23  7.00  9.33  8.14  8.80  9.00  9.14  8.69  9.46 ...  0.67  2.26  0.38  0.00  2.83  0.83  0.80  1.62  3.31  1.13
21  9.17  9.35  8.92 11.59 10.06 11.32 11.60 11.39 11.16 11.25 ...  2.51  5.08  2.80  2.83  0.00  3.13  2.53  4.15  6.13  1.71
22  7.72  7.68  7.50  9.58  8.57  9.01  9.02  9.36  8.86  9.83 ...  0.63  2.29  0.50  0.83  3.13  0.00  0.61  1.02  3.32  1.49
23  7.93  7.95  7.71  9.92  8.78  9.43  9.48  9.69  9.26 10.03 ...  0.14  2.80  0.47  0.80  2.53  0.61  0.00  1.62  3.85  0.95
24  7.53  7.40  7.33  9.10  8.32  8.42  8.29  8.87  8.26  9.59 ...  1.65  1.73  1.43  1.62  4.15  1.02  1.62  0.00  2.63  2.51
25  5.63  5.24  5.41  6.99  6.44  5.96  6.11  6.85  6.02  7.91 ...  3.79  1.06  3.43  3.31  6.13  3.32  3.85  2.63  0.00  4.44
26  8.05  8.11  7.79 10.27  8.94  9.84 10.06 10.08  9.71 10.22 ...  0.88  3.39  1.12  1.13  1.71  1.49  0.95  2.51  4.44  0.00
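To make the distance matrix less of a black box, here is a minimal sketch that recomputes one entry by hand. It copies the first two rows of the trimmed data from the output above into two hypothetical vectors, `row1` and `row2`, and assumes the default Euclidean method of dist():

```r
# First two rows of the trimmed data, copied from the output above
row1 = c(Al = 14.4, Fe = 7.00, Mg = 4.30, Ca = 0.15, Na = 0.51)
row2 = c(Al = 13.8, Fe = 7.08, Mg = 3.43, Ca = 0.12, Na = 0.17)

# Euclidean distance: square root of the summed squared differences
manual = sqrt(sum((row1 - row2)^2))

# The same value via dist(), which defaults to method = "euclidean"
builtin = as.numeric(dist(rbind(row1, row2)))

manual    # about 1.11, the [1, 2] entry of the matrix
builtin
```

Because every element contributes on its own scale, Al and Mg (the largest values) dominate these distances unless the columns are standardized first.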
In the fpc library we have a tool called pamk(), which runs partitioning around medoids (PAM) for a range of cluster counts and picks the count with the best average silhouette width. Here it is 2 according to 'nc'.

In [69]:
 
pamk(d)$nc
Out[69]:
2
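As a rough sketch of what pamk() automates, assuming its default criterion of average silhouette width, we can loop over candidate cluster counts with pam() from the cluster library and keep the best one. The data here is a hypothetical toy set of two well-separated blobs, not the pottery data, and the names `toy`, `ks`, `asw`, and `best.k` are made up for the example:

```r
library(cluster)

# Hypothetical toy data: two well-separated blobs of 20 points each
set.seed(1)
toy = rbind(matrix(rnorm(40, mean = 0), ncol = 2),
            matrix(rnorm(40, mean = 6), ncol = 2))
toy.d = dist(toy)

# Average silhouette width for each candidate number of clusters
ks = 2:6
asw = sapply(ks, function(k) pam(toy.d, k)$silinfo$avg.width)

# Pick the number of clusters with the highest average width
best.k = ks[which.max(asw)]
data.frame(k = ks, avg.width = round(asw, 3))
best.k
```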
Using 2 as the number of clusters, we generate pam.data. Plotting the results, we can see the chosen clusters on the first two principal axes as well as a silhouette plot of the clusters (see below).

In [74]:
pam.data = pam(d, 2)
clusplot(pam.data)
plot(pam.data)
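Since pam() was given distances rather than raw data, the two axes of the cluster plot have to be derived from the distances themselves. To our understanding, clusplot() does this with classical multidimensional scaling; the sketch below shows that idea with base R's cmdscale() on hypothetical toy data (`toy`, `coords`, and `groups` are illustrative names, and this is not a reproduction of clusplot() itself):

```r
library(cluster)

# Hypothetical toy data: two blobs in three dimensions
set.seed(1)
toy = rbind(matrix(rnorm(30, mean = 0), ncol = 3),
            matrix(rnorm(30, mean = 5), ncol = 3))
toy.d = dist(toy)

# Project the distances onto two axes with classical MDS
coords = cmdscale(toy.d, k = 2)

# Colour each point by its PAM cluster
groups = pam(toy.d, 2)$clustering
plot(coords, col = groups, pch = 19,
     xlab = "Component 1", ylab = "Component 2")
```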
The 'Silhouette' width measures how well each observation fits its assigned cluster compared with the nearest other cluster; it ranges from -1 to 1. As a rule of thumb, an average width of less than ~0.3 indicates poor support for the clustering, while a value of >0.7 is excellent support.
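For a single observation the silhouette width is s = (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its mean distance to the nearest other cluster. A minimal sketch on six hypothetical one-dimensional points (`pts` and `cl` are made up for illustration), checked against silhouette() from the cluster library:

```r
library(cluster)

# Hypothetical toy data: two obvious groups on a line
pts = c(1, 2, 3, 10, 11, 12)
cl  = c(1, 1, 1, 2, 2, 2)
dm  = as.matrix(dist(pts))

i = 1
# a: mean distance from point i to the rest of its own cluster
a = mean(dm[i, cl == cl[i] & seq_along(cl) != i])
# b: mean distance to the other cluster (only one here, so no minimum needed)
b = mean(dm[i, cl != cl[i]])

s.manual = (b - a) / max(a, b)
s.pkg = silhouette(cl, dist(pts))[i, "sil_width"]

s.manual   # (10 - 1.5) / 10 = 0.85
s.pkg
```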

Loading another library, we will try another approach using hierarchical clustering.

In [46]:
 
library(pvclust)
This method, pvclust, builds a dendrogram by hierarchical clustering and then uses bootstrap resampling to estimate how strongly the data support each cluster in it.

In [61]:
 
pv.data = pvclust(scale(d), method.dist="cor", method.hclust="average", nboot=1000)
Bootstrap (r = 0.5)... Done.
Bootstrap (r = 0.58)... Done.
Bootstrap (r = 0.69)... Done.
Bootstrap (r = 0.77)... Done.
Bootstrap (r = 0.88)... Done.
Bootstrap (r = 1.0)... Done.
Bootstrap (r = 1.08)... Done.
Bootstrap (r = 1.19)... Done.
Bootstrap (r = 1.27)... Done.
Bootstrap (r = 1.38)... Done.
In [63]:
 
plot(pv.data)
pvrect(pv.data, alpha=0.95)
Here the data was partitioned through the use of pvclust, which bootstraps significance values (p-values) by resampling the provided data. Two clusters with p > 0.95 were found.
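To give a feel for the bootstrapping idea, here is a deliberately simplified sketch in base R. It is not pvclust's actual multiscale procedure: it uses an ordinary bootstrap, resampling the rows and counting how often the same two-group split of the columns comes back. The toy data and the helpers `group.columns()` and `same.partition()` are hypothetical; correlation distance and average linkage are chosen to mirror the pvclust call above.

```r
# Hypothetical toy data: 50 observations of 6 variables in two correlated groups
set.seed(42)
n = 50
base1 = rnorm(n)
base2 = rnorm(n)
toy = cbind(
  a1 = base1 + rnorm(n, sd = 0.3),
  a2 = base1 + rnorm(n, sd = 0.3),
  a3 = base1 + rnorm(n, sd = 0.3),
  b1 = base2 + rnorm(n, sd = 0.3),
  b2 = base2 + rnorm(n, sd = 0.3),
  b3 = base2 + rnorm(n, sd = 0.3)
)

# Cluster the columns: correlation distance, average linkage, cut into 2 groups
group.columns = function(m) {
  cor.dist = as.dist(1 - cor(m))
  cutree(hclust(cor.dist, method = "average"), k = 2)
}

# Two labelings describe the same partition if they agree on every pair
same.partition = function(x, y) {
  all(outer(x, x, "==") == outer(y, y, "=="))
}

ref = group.columns(toy)

# Resample rows with replacement and check whether the split reappears
nboot = 200
hits = replicate(nboot, {
  rows = sample(n, replace = TRUE)
  same.partition(group.columns(toy[rows, ]), ref)
})

# Share of bootstrap replicates that reproduce the reference split
bp = mean(hits)
bp
```

A share close to 1 means the split is stable under resampling, which is the intuition behind the values that pvrect() thresholds at alpha=0.95.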
