NumPy is an open-source Python library for fast, efficient mathematical operations. Because its core is written in C, array operations run much faster than equivalent pure-Python loops. It supports operations on multi-dimensional arrays and matrices, so it is commonly used in science and engineering. Arbitrary data types can be defined using NumPy, and integration with a wide variety of databases is also supported.
numpy.ndarray()
numpy.zeros()
numpy.ones()
numpy.empty()
numpy.arange()
numpy.reshape()
numpy.linspace()
ndarray.ndim - array's number of dimensions
ndarray.shape - tuple of integers that represents the size of the array in each dimension
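As a small illustration of the two attributes above, the sketch below builds an array and reads its ndim and shape. The array name matrix and its values are made up for this example only.

```python
import numpy as np

# Hypothetical 2 x 3 array, used only to illustrate the attributes
matrix = np.array([[1, 2, 3], [4, 5, 6]])

# ndim: number of dimensions (axes)
print(matrix.ndim)   # 2

# shape: size of the array along each dimension
print(matrix.shape)  # (2, 3)
```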
1. Initialization.
Start with each data point as a singleton cluster, that is, a cluster of its own.
2. Compute pairwise distances.
Calculate the distance or similarity between all pairs of clusters. The distance between clusters can be computed using various metrics, such as Euclidean distance, Manhattan distance, or cosine similarity.
3. Merge closest clusters.
Identify the two clusters that are closest to each other based on the distance metric. Merge these two clusters into a single cluster.
4. Update distance matrix.
Update the distance matrix to reflect the distances between the newly formed cluster and all other clusters. Depending on the linkage criterion chosen (e.g., single linkage, complete linkage, average linkage), the distance between clusters may be computed differently.
5. Repeat.
Repeat steps 2-4 until only one cluster remains or a stopping criterion is met. This stopping criterion can be based on the number of desired clusters, a specified distance threshold, or other criteria.
The process of merging clusters continues iteratively until the desired number of clusters is obtained or until clusters become too dissimilar to merge further.
One key aspect of agglomerative clustering is the choice of linkage criterion, which determines how the distance between clusters is computed. There are several common linkage criteria:
'Single linkage' - the distance between two clusters is defined as the minimum distance between any two points in the two clusters. It tends to produce elongated clusters.
'Complete linkage' - the distance between two clusters is defined as the maximum distance between any two points in the two clusters. It tends to produce compact, spherical clusters.
'Average linkage' - the distance between two clusters is defined as the average distance between all pairs of points in the two clusters. It provides a balance between single and complete linkage.
Agglomerative clustering is intuitive and easy to understand, making it a popular choice for hierarchical clustering tasks. It can produce dendrogram visualizations that illustrate the hierarchical relationships between clusters, which can be helpful for understanding the structure of the data.
Agglomerative clustering can be used in various real-world applications across different domains. Some of the use cases include:
Market Segmentation
In marketing, agglomerative clustering can be used to segment customers based on their purchasing behavior or demographic information. By identifying groups of customers with similar characteristics, businesses can tailor their marketing strategies to target each segment more effectively.
Genomic Analysis
In bioinformatics, agglomerative clustering can be applied to analyze gene expression data or DNA sequences. By clustering genes or genomic regions with similar expression patterns or sequences, researchers can uncover insights into genetic regulation, disease mechanisms, or evolutionary relationships.
Image Segmentation
In computer vision, agglomerative clustering can be used for image segmentation tasks. By clustering pixels based on their color or intensity values, images can be partitioned into distinct regions or objects. This is useful for tasks such as object detection or content-based image retrieval.
Anomaly Detection
Agglomerative clustering can also be used for anomaly detection in various domains, such as network security, fraud detection, or equipment maintenance. By clustering data points and identifying clusters with significantly different characteristics from the rest, anomalies or outliers can be detected.
Document Clustering
In natural language processing (NLP), agglomerative clustering can be used for document clustering or topic modeling. By clustering documents based on their similarity in terms of word usage or semantic content, documents can be organized into thematic groups or topics, enabling tasks such as document classification, summarization, or recommendation.
These are just a few examples of how agglomerative clustering can be applied in practice. Its flexibility and versatility make it a valuable tool for exploratory data analysis, pattern recognition, and knowledge discovery in various fields.
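The five steps and the three linkage criteria described above can be sketched with NumPy alone. This is a minimal, unoptimized illustration and not a reference implementation. The function name agglomerative, its parameters n_clusters and linkage, and the sample data are all hypothetical. It assumes Euclidean distance and uses the number of clusters as the stopping criterion. For simplicity it recomputes cluster distances from the point-to-point distance matrix on every pass instead of updating a cluster distance matrix incrementally.

```python
import numpy as np

def agglomerative(points, n_clusters=2, linkage="single"):
    """Naive agglomerative clustering sketch; returns lists of point indices."""
    points = np.asarray(points, dtype=float)
    # Step 1: every point starts as its own singleton cluster
    clusters = [[i] for i in range(len(points))]
    # Euclidean distance between every pair of points (computed once)
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    # Linkage: how point distances are reduced to one cluster distance
    link = {"single": np.min, "complete": np.max, "average": np.mean}[linkage]

    # Step 5: repeat until the requested number of clusters remains
    while len(clusters) > n_clusters:
        best = (np.inf, None, None)
        # Steps 2 and 4: distance between every pair of current clusters
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = link(dist[np.ix_(clusters[a], clusters[b])])
                if d < best[0]:
                    best = (d, a, b)
        # Step 3: merge the two closest clusters
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Made-up sample data: three points near the origin, two near (5, 5)
data = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9], [0.2, 0.1]]
result = agglomerative(data, n_clusters=2, linkage="average")
print(result)
```

With this sample data, the points at indices 0, 1 and 4 end up in one cluster and the points at indices 2 and 3 in the other. Passing linkage="single" or linkage="complete" switches the criterion without changing the rest of the loop.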
#import NumPy library
import numpy as np
#create Numpy array
regular_array = np.array([1, 2, 3, 4])
#print the created array
print(regular_array)
#array of zeros (shape of it within parentheses)
zeros_array = np.zeros((3, 3))
print(zeros_array)
#Output:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
#array of ones
ones_array = np.ones((3, 3))
print(ones_array)
#Output:
[[1. 1. 1.]
[1. 1. 1.]
[1. 1. 1.]]
#empty array (values are uninitialized, so the printed contents are arbitrary)
empty_array = np.empty((2, 3))
print(empty_array)
#Output (example only; actual values vary):
[[0. 0. 0.]
[0. 0. 0.]]
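#arange method: evenly spaced integers from 0 up to (but not including) 12
array_arange = np.arange(12)
print(array_arange)
#Output:
[ 0  1  2  3  4  5  6  7  8  9 10 11]
#reshape the 12 elements into 3 rows and 4 columns
print(array_arange.reshape(3, 4))
#Output:
[[ 0  1  2  3]
[ 4  5  6  7]
[ 8  9 10 11]]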
#linspace - equally spaced data elements
linear_data = np.linspace(11, 20, 5)
#11 - first element
#20 - last element
#5 - number of equidistant elements
print(linear_data)
#Output:
[11. 13.25 15.5 17.75 20. ]
#One dimensional array
one_dimension = np.arange(15)
print(one_dimension)
#Output:
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14]
#Two dimensional array
two_dimensions = one_dimension.reshape(3, 5)
print(two_dimensions)
#Output:
[[ 0  1  2  3  4]
[ 5  6  7  8  9]
[10 11 12 13 14]]
#Three dimensional array
three_dimensions = np.arange(27).reshape(3, 3, 3)
print(three_dimensions)
#Output:
[[[ 0 1 2 ]
[ 3 4 5 ]
[ 6 7 8 ]]
[[ 9 10 11 ]
[ 12 13 14 ]
[ 15 16 17 ]]
[[ 18 19 20 ]
[ 21 22 23 ]
[ 24 25 26 ]]]