Python has become the first choice for data science, numerical computing, and exploratory analysis. At the heart of this ecosystem are two foundational libraries:
- NumPy, which provides high-performance arrays and mathematical operations
- SciPy, which extends NumPy with advanced statistical, scientific, and analytical tools
In this article, we’ll walk through how NumPy and SciPy can be used for statistical analysis, starting with array creation and manipulation and progressing to key descriptive statistics.
A Quick Overview of NumPy and SciPy
✅ NumPy (Numerical Python)
NumPy provides:
- Multidimensional array objects
- Fast mathematical and logical operations
- Vectorized computations that run significantly faster than pure Python
- Efficient memory usage compared to lists
A NumPy array can replace a list in most mathematical tasks, while being faster, more memory-efficient, and easier to work with at scale.
✅ SciPy (Scientific Python)
SciPy builds on NumPy by providing:
- Probability distributions
- Statistical tests
- Optimization
- Signal processing
- Linear algebra
- Interpolation
Together, NumPy and SciPy form the foundation of scientific computing in Python.
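Installing NumPy
You can install NumPy in two ways:
✅ Using pip
pip install numpy
The SciPy examples later in this article also need SciPy, which can be installed the same way:
pip install scipy
✅ Using Anaconda (recommended for data science)
NumPy and SciPy both come preinstalled.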
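Creating Arrays in NumPy
Let’s start by importing NumPy:
import numpy as np
✅ Creating a 5×5 matrix
a = np.arange(25).reshape(5, 5)
print(a)
np.arange(25) creates the integers 0 through 24, which reshape(5, 5) then arranges into a 5×5 matrix.
✅ Checking data type
print(a.dtype)
The default integer type depends on the platform and the NumPy version: it is int64 on most 64-bit Linux and macOS systems, while NumPy versions before 2.0 default to int32 on Windows.
✅ Number of elements
a.size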
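As a minimal sketch of the steps above, the snippet below builds the same 5×5 matrix and inspects its shape, size, and data type. The default integer width can vary with platform and NumPy version, so the sketch prints it rather than assuming it; the name a32 is introduced here only for illustration.

```python
import numpy as np

# Build the 5x5 matrix from the text: the integers 0..24, filled row by row.
a = np.arange(25).reshape(5, 5)

print(a.shape)  # (5, 5)
print(a.size)   # 25 elements in total
print(a.dtype)  # default integer type; depends on platform and NumPy version

# When the width matters, request it explicitly instead of relying on the default.
a32 = np.arange(25, dtype=np.int32).reshape(5, 5)
print(a32.dtype)  # int32
```

Passing dtype explicitly is a reasonable habit when results must be reproducible across machines.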
Basic Array Creation
✅ 1D array
arr = np.array([1, 2, 3, 4, 5])
✅ 2D array
b = np.array([[1, 2], [3, 4]])
✅ Higher-dimensional arrays
You can also create arrays with more dimensions (5D, for example), though they are less common in statistical analysis.
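The text gives no code for the higher-dimensional case, so here is one possible sketch. The shape (2, 3, 4, 5, 6) and the names c and d are arbitrary choices made for illustration, not values from the article.

```python
import numpy as np

# A 5D array of zeros; the shape is an arbitrary illustration.
c = np.zeros((2, 3, 4, 5, 6))
print(c.ndim)   # 5 dimensions
print(c.shape)  # (2, 3, 4, 5, 6)
print(c.size)   # 2 * 3 * 4 * 5 * 6 = 720 elements

# ndmin pads the shape with leading length-1 axes until it has 5 dimensions.
d = np.array([1, 2, 3], ndmin=5)
print(d.shape)  # (1, 1, 1, 1, 3)
```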
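Basic Operations with NumPy
Let’s define two simple arrays:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
NumPy supports vectorized operations:
a - b    # element-wise subtraction
a * b    # element-wise multiplication
a ** 2   # squaring
a > 2    # element-wise comparison, returns booleans
b < 4
These operations run element-wise and are typically much faster than equivalent Python loops, because the looping happens in compiled code.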
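To make the element-wise behaviour concrete, this small sketch evaluates the operations on the two arrays defined above and checks one of them against an explicit Python loop. The variable names diff, prod, squares, mask, and loop_prod are introduced here for illustration.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

diff = a - b      # element-wise difference: -3, -3, -3
prod = a * b      # element-wise product: 4, 10, 18
squares = a ** 2  # each element squared: 1, 4, 9
mask = a > 2      # boolean array: False, False, True

# The same product written as a plain Python loop, for comparison.
loop_prod = [x * y for x, y in zip(a.tolist(), b.tolist())]

print(diff.tolist(), prod.tolist(), squares.tolist(), mask.tolist())
print(prod.tolist() == loop_prod)  # True: same values, no explicit loop needed
```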
Indexing and Slicing in NumPy
Consider the earlier 5×5 matrix:
a = np.arange(25).reshape(5, 5)
✅ Slice the first row
a[0, :]
✅ Slice the first column
a[:, 0]
✅ Extract a specific element (2nd row, 3rd column)
a[1, 2]
Remember: NumPy uses zero-based indexing.
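The slices above can be checked with a short sketch. The final sub-block slice is an extra illustration that is not in the original text, and the names first_row, first_col, element, and block are hypothetical.

```python
import numpy as np

a = np.arange(25).reshape(5, 5)

first_row = a[0, :]  # values 0, 1, 2, 3, 4
first_col = a[:, 0]  # values 0, 5, 10, 15, 20
element = a[1, 2]    # 2nd row, 3rd column (zero-based indices 1 and 2): 7

# Slices can be combined: rows 1-2 and columns 2-3 give a 2x2 block.
block = a[1:3, 2:4]  # [[7, 8], [12, 13]]

print(first_row.tolist(), first_col.tolist(), int(element), block.tolist())
```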
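Stacking Arrays
NumPy allows you to join arrays. The examples here use the two 1D arrays from the basic operations section, a = np.array([1, 2, 3]) and b = np.array([4, 5, 6]), not the 5×5 matrix.
✅ Vertical stacking (row-wise)
np.vstack((a, b))
✅ Horizontal stacking (column-wise)
np.hstack((a, b))
Arrays must have compatible shapes to stack: vstack needs matching sizes along every axis except the first, and hstack along every axis except the second. For 1D inputs, hstack simply concatenates them end to end.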
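A brief sketch of both stacking functions on two 1D arrays follows, including what happens when the shapes are incompatible. The array named short and the flag stack_failed are introduced only for this illustration.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

v = np.vstack((a, b))  # shape (2, 3): each input becomes a row
h = np.hstack((a, b))  # shape (6,): 1D inputs are joined end to end

print(v.shape, h.shape)

# Stacking arrays of different lengths vertically raises a ValueError.
short = np.array([1, 2])
try:
    np.vstack((a, short))
    stack_failed = False
except ValueError:
    stack_failed = True

print(stack_failed)  # True
```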
Descriptive Statistics with NumPy and SciPy
Descriptive statistics summarize and describe a dataset, forming the foundation of any statistical analysis. We’ll use a 7×4 array of random integers from 1 to 9 for the examples (the upper bound of randint is exclusive):
a = np.random.randint(1, 10, (7, 4))
- Mean
The mean (average) is computed using:
np.mean(a)
✅ Mean by column
np.mean(a, axis=0)
✅ Mean by row
np.mean(a, axis=1)
The mean is widely used but sensitive to outliers.
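The axis argument is easy to mix up, so here is a small sketch on a fixed 2×3 array (chosen instead of random data so the results are reproducible). The array name scores is hypothetical.

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])

overall = np.mean(scores)          # mean of all six values: 3.5
by_column = np.mean(scores, axis=0)  # collapses the rows: 2.5, 3.5, 4.5
by_row = np.mean(scores, axis=1)     # collapses the columns: 2.0, 5.0

print(overall, by_column.tolist(), by_row.tolist())
```

A useful way to remember it: axis names the dimension that disappears from the result, so axis=0 on a 2×3 array returns 3 values.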
- Median
The median is the middle value when the data is sorted.
np.median(a)
✅ Median by rows or columns
np.median(a, axis=0)   # by column
np.median(a, axis=1)   # by row
The median is preferred over the mean when the data contains extreme values.
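The following sketch illustrates why the median is more robust than the mean. The two small datasets are invented for this example; the second replaces one value with an extreme one.

```python
import numpy as np

clean = np.array([10, 12, 11, 13, 12])
with_outlier = np.array([10, 12, 11, 13, 100])  # one extreme value

mean_clean = np.mean(clean)             # 11.6
mean_outlier = np.mean(with_outlier)    # 29.2, pulled up by the outlier
median_clean = np.median(clean)         # 12.0
median_outlier = np.median(with_outlier)  # 12.0, unchanged

print(mean_clean, mean_outlier)
print(median_clean, median_outlier)
```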
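- Mode
The mode is the most frequently occurring value. NumPy has no dedicated mode function, but one is available through SciPy:
from scipy import stats
stats.mode(a, axis=0)
The result holds both the modal values and how often each occurs. The mode is useful for categorical or discrete values.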
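Here is a minimal sketch of stats.mode on a small fixed array, invented for this example under the hypothetical name grades. The shape of the returned arrays has differed between SciPy versions, so the sketch flattens the result with np.ravel rather than assuming one layout.

```python
import numpy as np
from scipy import stats

grades = np.array([[3, 5, 2],
                   [3, 1, 2],
                   [4, 5, 2],
                   [3, 5, 7]])

result = stats.mode(grades, axis=0)

# Flatten so the code does not depend on whether the reduced axis is kept.
modes = np.ravel(result.mode)    # most frequent value per column: 3, 5, 2
counts = np.ravel(result.count)  # how often each mode occurs: 3, 3, 3

print(modes.tolist(), counts.tolist())
```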
- Range
Range = max − min
Using NumPy:
np.ptp(a)          # ptp = peak-to-peak
np.ptp(a, axis=0)
The range is easy to compute but sensitive to outliers, and it gives no information about how values are distributed between the extremes.
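As a quick check that np.ptp matches the max − min definition, consider this sketch on a small array. The values and the name x are made up for illustration.

```python
import numpy as np

x = np.array([[2, 9, 4],
              [7, 1, 8]])

overall_range = np.ptp(x)        # 9 - 1 = 8
column_range = np.ptp(x, axis=0)  # per column: 5, 8, 4

# The same result written out from the definition.
manual = x.max(axis=0) - x.min(axis=0)

print(int(overall_range), column_range.tolist(), manual.tolist())
```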
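- Variance
Variance measures the spread of data around the mean.
np.var(a)
np.var(a, axis=0)
- Standard Deviation
Standard deviation is simply the square root of the variance:
np.std(a)
np.std(a, axis=0)
By default, both functions compute the population statistic (ddof=0); pass ddof=1 for the sample estimate. Standard deviation is widely used in finance, forecasting, simulations, and probability.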
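The difference between the population and sample forms can be sketched on a small dataset chosen so the arithmetic is easy to follow: the mean is 5 and the squared deviations sum to 32. The dataset and variable names are illustrative only.

```python
import numpy as np

x = np.array([2, 4, 4, 4, 5, 5, 7, 9])

pop_var = np.var(x)            # 32 / 8 = 4.0 (divides by n)
pop_std = np.std(x)            # sqrt(4.0) = 2.0
sample_var = np.var(x, ddof=1)  # 32 / 7, about 4.571 (divides by n - 1)
sample_std = np.std(x, ddof=1)  # square root of the sample variance

print(pop_var, pop_std, sample_var, sample_std)
```

The sample form is the usual choice when the data is a sample from a larger population and the goal is to estimate that population's spread.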
- Interquartile Range (IQR)
IQR = Q3 − Q1
from scipy.stats import iqr
iqr(a, axis=0, interpolation='linear')
The IQR is central to outlier detection: boxplot whiskers are conventionally drawn at 1.5 × IQR beyond the quartiles.
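One common way to use the IQR for outlier detection is the 1.5 × IQR rule that boxplots follow. The sketch below applies it to a made-up dataset; the 1.5 multiplier is a convention rather than something fixed by SciPy, and the fence variable names are hypothetical.

```python
import numpy as np
from scipy.stats import iqr

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])

q1, q3 = np.percentile(data, [25, 75])  # 3.25 and 7.75 with linear interpolation
spread = iqr(data)                      # Q3 - Q1 = 4.5

# Conventional fences: anything beyond 1.5 * IQR from the quartiles is flagged.
lower_fence = q1 - 1.5 * spread  # -3.5
upper_fence = q3 + 1.5 * spread  # 14.5

outliers = data[(data < lower_fence) | (data > upper_fence)]
print(outliers.tolist())  # [50]
```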
- Skewness
Skewness describes the asymmetry of a distribution.
from scipy.stats import skew
skew(a, axis=0)
Positive skew → long right tail
Negative skew → long left tail
Skewness helps determine where most values lie relative to the average.
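The sign convention can be illustrated with three small made-up datasets: one with a long right tail, its mirror image, and a symmetric one. The names are chosen for this sketch only.

```python
import numpy as np
from scipy.stats import skew

right_tailed = np.array([1, 2, 2, 3, 3, 3, 10])  # one large value stretches the right tail
left_tailed = -right_tailed                      # mirror image: long left tail
symmetric = np.array([1, 2, 3, 4, 5])

s_right = skew(right_tailed)  # positive
s_left = skew(left_tailed)    # negative, same magnitude as s_right
s_sym = skew(symmetric)       # zero for perfectly symmetric data

print(s_right, s_left, s_sym)
```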
Conclusion
NumPy and SciPy together provide a powerful, efficient, and intuitive way to perform statistical analysis in Python. While descriptive statistics help summarize data, they cannot be used to generalize findings to a broader population. For that, inferential statistics (such as hypothesis testing, confidence intervals, or regression) are required. SciPy, building on NumPy arrays, supports many of these more advanced techniques, making the pair an essential part of every data scientist’s toolkit.