Commit fb11db2

Merge pull request #111 from PyDataBlog/revert-110-revert-108-stable-release
Revert "Revert "Updated benchmark files. Table & Plot left""
2 parents 1805cb5 + e727b75 commit fb11db2

12 files changed, +330 -264 lines changed

.gitignore

+2-1
@@ -12,4 +12,5 @@
 .idea/*
 .vscode/*
 test/experiments.jl
-/extras/.ipynb_checkpoints/*
+/extras/.ipynb_checkpoints/*
+.Rproj.user

Project.toml

+1-1
@@ -1,7 +1,7 @@
 name = "ParallelKMeans"
 uuid = "42b8e9d4-006b-409a-8472-7f34b3fb58af"
 authors = ["Bernard Brenyah", "Andrey Oskin"]
-version = "0.2.2"
+version = "1.0.0"
 
 [deps]
 Distances = "b4f34e82-e78d-54a5-968a-f98e89d6e8f7"

README.md

+2-1
@@ -2,7 +2,8 @@
 
 [![Stable](https://img.shields.io/badge/docs-stable-blue.svg)](https://PyDataBlog.github.io/ParallelKMeans.jl/stable)
 [![Dev](https://img.shields.io/badge/docs-dev-blue.svg)](https://PyDataBlog.github.io/ParallelKMeans.jl/dev)
-[![Build Status](https://github.com/PyDataBlog/ParallelKMeans.jl/actions/workflows/CI.yml/badge.svg)](https://github.com/PyDataBlog/ParallelKMeans.jl/actions/workflows/CI.yml/badge.svg)
+[![ColPrac: Contributor's Guide on Collaborative Practices for Community Packages](https://img.shields.io/badge/ColPrac-Contributor's%20Guide-blueviolet)](https://github.com/SciML/ColPrac)
+[![Build Status](https://github.com/PyDataBlog/ParallelKMeans.jl/actions/workflows/CI.yml/badge.svg)](https://github.com/PyDataBlog/ParallelKMeans.jl/actions)
 [![codecov](https://codecov.io/gh/PyDataBlog/ParallelKMeans.jl/branch/master/graph/badge.svg?token=799USS6BPH)](https://codecov.io/gh/PyDataBlog/ParallelKMeans.jl)
 [![FOSSA Status](https://app.fossa.com/api/projects/git%2Bgithub.com%2FPyDataBlog%2FParallelKMeans.jl.svg?type=shield)](https://app.fossa.com/projects/git%2Bgithub.com%2FPyDataBlog%2FParallelKMeans.jl?ref=badge_shield)
 [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/PyDataBlog/ParallelKMeans.jl/master)

docs/src/benchmark_image.png

-75.1 KB

docs/src/index.md

+20-20
@@ -79,14 +79,15 @@ pkg> free ParallelKMeans
 - [X] Support for weighted K-means.
 - [X] Support of MLJ Random generation hyperparameter.
 - [X] Implementation of [Mini-batch KMeans variant](https://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf)
+- [X] Add contribution guidelines.
 - [ ] Support for other distance metrics supported by [Distances.jl](https://github.com/JuliaStats/Distances.jl#supported-distances).
 - [ ] Implementation of [Geometric methods to accelerate k-means algorithm](http://cs.baylor.edu/~hamerly/papers/sdm2016_rysavy_hamerly.pdf).
 - [ ] Native support for tabular data inputs outside of MLJModels' interface.
-- [ ] GPU support?
 - [ ] Distributed calculations support.
-- [ ] Optimization of code base.
-- [ ] Improved Documentation
+- [ ] Further optimization of code base.
+- [ ] Improved Documentation with more tutorials.
 - [ ] More benchmark tests.
+- [ ] GPU support?
 
 ## How To Use
 
@@ -127,7 +128,7 @@ r.converged # whether the procedure converged
 - [Elkan()](https://www.aaai.org/Papers/ICML/2003/ICML03-022.pdf) - Recommended for high dimensional data.
 - [Yinyang()](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/ding15.pdf) - Recommended for large dimensions and/or large number of clusters.
 - [Coreset()](http://proceedings.mlr.press/v51/lucic16-supp.pdf) - Recommended for very fast clustering of very large datasets, when extreme accuracy is not important.
-- [MiniBatch()](https://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf) - Recommended for extremely large datasets, when extreme accuracy is not important.
+- [MiniBatch()](https://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf) - Recommended for extremely large datasets, when extreme accuracy is not important. *Experimental Implementation*
 - [Geometric()](http://cs.baylor.edu/~hamerly/papers/sdm2016_rysavy_hamerly.pdf) - (Coming soon)
 
 ### Practical Usage Examples
@@ -187,17 +188,17 @@ Currently, the benchmark speed tests are based on the search for optimal number
 
 _________________________________________________________________________________________________________
 
-|1 million sample (secs)|100k sample (secs)|10k sample (secs)|1k sample (secs)|package |language |
-|:---------------------:|:----------------:|:---------------:|:--------------:|:---------------------:|:---------:|
-| 538.53100 | 33.15700 | 0.74238 | 0.01710 | Clustering.jl | Julia |
-| 220.35700 | 20.93600 | 0.82430 | 0.02639 | mlpack |C++ Wrapper|
-| 20.55400 | 2.91300 | 0.17559 | 0.00609 | Lloyd | Julia |
-| 11.51800 | 0.96637 | 0.09990 | 0.00635 | Hamerly | Julia |
-| 14.01900 | 1.13100 | 0.07912 | 0.00646 | Elkan | Julia |
-| 9.97000 | 1.14600 | 0.10834 | 0.00704 | Yinyang | Julia |
-| 1,430.00000 | 146.00000 | 5.77000 | 0.34400 | Sklearn KMeans | Python |
-| 30.10000 | 3.75000 | 0.61300 | 0.20100 |Sklearn MiniBatchKMeans| Python |
-| 218.20000 | 15.51000 | 0.73370 | 0.01947 | Knor | R |
+|1 million sample (secs)|100k sample (secs)|10k sample (secs)|1k sample (secs)|package |language |process |
+|-----------------------|------------------|-----------------|----------------|------------------------|-----------|----------|
+|282.7 |15.27 |0.7324 |0.01682 |Knor |R |full scan |
+|854 |87 |6.11 |0.000719 |Sklearn KMeans |Python |full scan |
+|11.2 |1.41 |0.000317 |0.000141 |Sklearn MiniBatch Kmeans|Python |stochastic|
+|254.481 |18.517 |0.000794956 |0.000031211 |Mlpack |C++ Wrapper|full scan |
+|653.178 |45.468 |0.000824115 |0.000017301 |Clustering.jl |Julia |full scan |
+|19.955 |2.758 |0.000166957 |0.000009206 |ParallelKMeans Lloyd |Julia |full scan |
+|11.234 |1.654 |0.000109074 |0.000012819 |ParallelKMeans Hamerly |Julia |full scan |
+|19.394 |1.436 |0.000109262 |0.000013726 |ParallelKMeans Elkan |Julia |full scan |
+|14.080 |0.000972914 |0.000095325 |0.000009802 |ParallelKMeans YingYang |Julia |full scan |
 
 _________________________________________________________________________________________________________
 
@@ -214,15 +215,14 @@ ________________________________________________________________________________
 - 0.1.8 Minor cleanup
 - 0.1.9 Added travis support for Julia 1.5
 - 0.2.0 Updated MLJ Interface
-- 0.2.1 Mini-batch implementation
+- 0.2.1 Initial Mini-batch implementation
+- 0.2.2 Updated MLJInterface
+- 1.0.0 Stable public release
 
 ## Contributing
 
 Ultimately, we see this package as potentially the one-stop-shop for everything related to KMeans algorithm and its speed up variants. We are open to new implementations and ideas from anyone interested in this project.
-
-Detailed contribution guidelines will be added in upcoming releases.
-
-<!--- TODO: Contribution Guidelines --->
+This project adopts the [ColPrac community guidelines](https://github.com/SciML/ColPrac).
 
 ```@index
 ```