Data Distributions in Statistics

This post plots and discusses data distributions commonly used in statistics and data analysis.

The code below generates plots for several distributions. It uses scipy.stats (https://docs.scipy.org/doc/scipy/reference/stats.html) for the distributions and matplotlib, with seaborn styling, for plotting.

Properties of distributions

A distribution is commonly characterized by three kinds of properties: central tendency, dispersion, and shape.

Central tendency

Central tendency measures where the center of a distribution lies. Common measures include:

  • Mean (μ or x̄): The arithmetic average of all values in the distribution.
  • Median (M or x̃): The middle value when the data is ordered from least to greatest.
  • Mode (Mo): The most frequently occurring value in the distribution.
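
As a minimal sketch of these three measures, the snippet below computes them for a small made-up sample. The values are illustrative assumptions and do not come from any dataset in this post; it uses NumPy and Python's standard `statistics` module.

```python
import statistics
import numpy as np

# Hypothetical sample, chosen only for illustration
sample = [2, 3, 3, 5, 7, 10]

mean = np.mean(sample)          # arithmetic average: 30 / 6 = 5.0
median = np.median(sample)      # average of the two middle values: (3 + 5) / 2 = 4.0
mode = statistics.mode(sample)  # most frequent value: 3

print(f"mean={mean}, median={median}, mode={mode}")
```

Here the mean sits above the median because the single large value (10) pulls the average to the right, which is typical of right-skewed data.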

Dispersion

Dispersion measures how spread out the values in a distribution are. Common measures include:

  • Range (R): The difference between the maximum and minimum values.
  • Variance (σ² or s²): The average of the squared differences from the mean.
  • Standard Deviation (σ or s): The square root of the variance, which expresses the typical distance from the mean in the same units as the data.
  • Interquartile Range (IQR or Q₃ - Q₁): The range between the first quartile (25th percentile) and the third quartile (75th percentile), representing the middle 50% of the data.
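
The measures above can be sketched with NumPy as follows. The sample is a made-up illustration, and the `ddof` argument is shown because the σ² and s² notations differ in their divisor (n for a population, n - 1 for a sample).

```python
import numpy as np

# Hypothetical sample, chosen only for illustration
sample = np.array([2, 4, 4, 4, 5, 5, 7, 9])

value_range = sample.max() - sample.min()  # maximum minus minimum
pop_var = np.var(sample)                   # population variance σ² (divides by n)
sample_var = np.var(sample, ddof=1)        # sample variance s² (divides by n - 1)
pop_std = np.std(sample)                   # population standard deviation σ
q1, q3 = np.percentile(sample, [25, 75])   # first and third quartiles
iqr = q3 - q1                              # spread of the middle 50% of the data

print(f"range={value_range}, σ²={pop_var:.3f}, s²={sample_var:.3f}, "
      f"σ={pop_std:.3f}, IQR={iqr:.3f}")
```

Note that `np.percentile` interpolates linearly between data points by default, so other quartile conventions may give slightly different IQR values.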

Shape

Shape describes the overall form of the distribution, including its symmetry and the presence of tails. Common measures include:

  • Skewness (γ₁ or skew): A measure of the asymmetry of the distribution. Positive skew indicates a longer tail on the right side, while negative skew indicates a longer tail on the left side.
  • Kurtosis (γ₂ or kurt): A measure of the "tailedness" of the distribution. Higher kurtosis indicates more frequent extreme deviations from the mean.
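
As a rough illustration of both shape measures, the sketch below compares a symmetric distribution (normal) with a right-skewed one (exponential). The choice of the exponential distribution, the seed, and the sample size are assumptions made for this example. SciPy reports excess kurtosis by default, so the normal distribution scores 0.

```python
import numpy as np
from scipy.stats import norm, expon, skew, kurtosis

# Theoretical skewness and excess kurtosis of each distribution
norm_skew, norm_kurt = norm.stats(moments='sk')     # symmetric: 0 and 0
expon_skew, expon_kurt = expon.stats(moments='sk')  # right-skewed: 2 and 6

# Estimates from random draws (seed and size are arbitrary choices)
rng = np.random.default_rng(42)
draws = rng.exponential(size=50_000)
sample_skew = skew(draws)      # should land near the theoretical value of 2
sample_kurt = kurtosis(draws)  # fisher=True by default, i.e. excess kurtosis

print(f"normal:      skew={float(norm_skew):.2f}, kurtosis={float(norm_kurt):.2f}")
print(f"exponential: skew={float(expon_skew):.2f}, kurtosis={float(expon_kurt):.2f}")
print(f"sample est.: skew={sample_skew:.2f}, kurtosis={sample_kurt:.2f}")
```

Sample kurtosis is a noisy estimate because it depends heavily on the few most extreme draws, so it can differ noticeably from the theoretical value even with many observations.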
In [1]:
# Import libraries
import numpy as np
from scipy.stats import norm, skewnorm
import matplotlib.pyplot as plt
import seaborn as sns

# Set seaborn style for better-looking plots
sns.set_style("whitegrid")

# Create x values ranging from -10 to 10 with 500 points
x = np.linspace(-10, 10, 500)


# ===== PLOT 1: Normal Distributions with Different Standard Deviations =====
plt.figure(figsize=(12, 6))

# Loop through 4 different standard deviation values
for std in [1, 2, 3, 4]:
    # Create a normal distribution with mean=0 and current std
    dist = norm(loc=0, scale=std)
    
    # Calculate the probability density function (PDF) values
    y = dist.pdf(x)
    
    # Plot the distribution curve
    plt.plot(x, y, label=f'σ={std}', linewidth=2)
    
    # Shade the area within ±1 standard deviation from the mean
    x_shade = x[(x >= -std) & (x <= std)]  # Select x values within ±std
    y_shade = dist.pdf(x_shade)  # Get corresponding y values
    plt.fill_between(x_shade, y_shade, alpha=0.2)  # Fill with transparency

# Label the x-axis
plt.xlabel('x')
# Label the y-axis
plt.ylabel('Probability Density')
# Add title to the plot
plt.title('Normal Distributions (μ=0) with Increasing Standard Deviations')
# Display the legend
plt.legend()
# Show the plot
plt.show()


# ===== PLOT 2: How Skewness Changes the Shape =====
plt.figure(figsize=(12, 6))

# Create x values for skewed distributions
# (the parameters used below are all <= 0, so the long tail is on the left)
x_skew = np.linspace(-10, 6, 500)

# Loop through different skewness parameter values
# Negative values = left skew, 0 = symmetric (plain normal); positive values would give right skew
for skewness in [-5, -2, -1, -0.5, -0.25, 0]:
    # Create a skew-normal distribution
    # a = skewness parameter, loc = location, scale = spread
    dist = skewnorm(a=skewness, loc=0, scale=2)
    
    # Calculate PDF values
    y = dist.pdf(x_skew)
    
    # Get the actual mean of this distribution
    mean = dist.mean()
    
    # Get the actual skewness statistic
    actual_skew = dist.stats(moments='s')
    
    # Plot the distribution
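    # Plot the distribution; mathtext for the skewness symbol avoids the missing subscript glyph in Arial
    plt.plot(x_skew, y, label=rf'α={skewness}, μ={mean:.2f}, $\gamma_1$={actual_skew:.2f}', linewidth=2)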
    
    # Add a vertical line at the mean to show how it shifts
    # plt.axvline(mean, linestyle='--', linewidth=1, alpha=0.5)

# Label the axes
plt.xlabel('x')
plt.ylabel('Probability Density')
# Add title explaining what we're showing
plt.title('How Skewness Parameter (α) Changes the Shape of the Distribution')
# Display legend
plt.legend()
# Show the plot
plt.show()
[Figure: Normal Distributions (μ=0) with Increasing Standard Deviations]
[Figure: How Skewness Parameter (α) Changes the Shape of the Distribution]

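Normal Distribution

The normal distribution, also known as the Gaussian distribution, is a continuous distribution. In SciPy it is available as scipy.stats.norm, where loc sets the mean and scale sets the standard deviation.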
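
The first plot above shades the region within one standard deviation of the mean. As a small check, sketched here with the same four standard deviations used in that plot, the probability of falling inside that region is about 68.27% whatever the value of σ:

```python
from scipy.stats import norm

probs = []
for std in [1, 2, 3, 4]:
    dist = norm(loc=0, scale=std)
    # Probability mass between -σ and +σ, from the cumulative distribution function
    prob = dist.cdf(std) - dist.cdf(-std)
    probs.append(prob)
    print(f"σ={std}: P(-σ ≤ X ≤ σ) = {prob:.4f}")
```

The shaded areas in the plot look different only because the curves are stretched horizontally and flattened vertically as σ grows; the enclosed probability stays the same.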