Following up on the previous posts, this is the estimate of the errors on the three parameters obtained with NUTS (No-U-Turn Sampler), an advanced Markov Chain Monte Carlo (MCMC) algorithm used to sample from the posterior distribution in Bayesian inference.
y = beta*x + alpha + error, with the error term distributed as Normal(0, sigma)
alpha : error (posterior) distribution for the intercept parameter
beta : error (posterior) distribution for the slope parameter
sigma : distribution of the error term's scale
mean sd hdi_3% hdi_97% ... mcse_sd ess_bulk ess_tail r_hat
alpha 1.63 0.04 1.55 1.71 ... 0.0 3026.81 3754.21 1.0
beta -0.01 0.00 -0.01 -0.01 ... 0.0 3068.24 3519.73 1.0
sigma 0.09 0.02 0.06 0.12 ... 0.0 3733.28 3659.21 1.0
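The table above is the ArviZ summary of the NUTS trace. pm.sample picks NUTS automatically for continuous parameters; to make the choice of sampler explicit one could write something like the following (a minimal sketch, assuming the model object defined in the listing further below; the tune and target_accept values are only illustrative):

with model:
    trace = pm.sample(draws=2000, tune=1000, step=pm.NUTS(target_accept=0.9), random_seed=42)
print(az.summary(trace, round_to=2))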
date;value
2002-08-02 15:10:10; 1.7195121951219514
2002-08-12 06:55:13; 1.4329268292682926
2002-08-21 17:38:00; 1.451219512195122
2002-09-04 09:06:26; 1.274390243902439
2002-09-13 04:42:21; 1.1402439024390243
2002-09-20 02:57:43; 1.207317073170732
2002-10-01 00:56:28; 1.0121951219512195
2002-10-02 07:10:10; 0.8841463414634148
2002-10-08 09:16:24; 0.8902439024390243
2002-10-15 07:31:46; 0.7073170731707319
2002-10-21 04:35:43; 0.5548780487804876
2002-10-30 00:11:38; 0.5548780487804876
2002-11-06 23:38:24; 0.6219512195121952
2002-11-12 05:35:30; 0.5548780487804876
2002-11-21 06:13:42; 0.40853658536585336
2002-11-27 03:17:39; 0.26829268292682906
2002-12-02 04:12:27; 0.21341463414634143
2002-12-14 03:22:38; 0.13414634146341475
2002-12-20 05:28:51; 0.12195121951219523
import pandas as pd
import pymc as pm
import numpy as np
import matplotlib.pyplot as plt
import arviz as az
# Step 1: Read the CSV file containing the time series data
# The CSV is assumed to have columns 'date' and 'value'
data = pd.read_csv('dataset.csv', sep=';')
# Convert the date column to datetime format (if it's not already)
data['date'] = pd.to_datetime(data['date'])
# Convert the date to a numeric format (e.g., number of days since the first date)
data['date_numeric'] = (data['date'] - data['date'].min()).dt.days
# Step 2: Prepare the data for regression
X = data['date_numeric'].values # Independent variable (time)
y = data['value'].values # Dependent variable (value)
# Step 3: Set up the Bayesian linear regression model with PyMC
with pm.Model() as model:
    # Define the prior distributions for the coefficients (alpha, beta)
    alpha = pm.Normal('alpha', mu=0, sigma=10)
    beta = pm.Normal('beta', mu=0, sigma=10)
    # Define the likelihood (linear model with Gaussian noise)
    sigma = pm.HalfNormal('sigma', sigma=1)
    mu = alpha + beta * X  # Linear relationship
    # Observed data likelihood
    y_obs = pm.Normal('y_obs', mu=mu, sigma=sigma, observed=y)
    # Step 4: Inference with MCMC sampling (NUTS sampler by default)
    trace = pm.sample(2000, return_inferencedata=True)
    # Step 5: Posterior predictive sampling within the same model context
    posterior_predictive = pm.sample_posterior_predictive(trace, var_names=["alpha", "beta", "sigma"])
# Step 6: Posterior diagnostics and results
az.plot_trace(trace)
plt.show()
# Step 7: Summary of the posterior distribution
print(az.summary(trace, round_to=2))
# Step 8: Extract the posterior predictions
# Access posterior samples
alpha_samples = trace.posterior["alpha"].values
beta_samples = trace.posterior["beta"].values
# Ensure proper shape for broadcasting (flatten the samples)
alpha_samples = alpha_samples.flatten() # Shape: (num_samples,)
beta_samples = beta_samples.flatten() # Shape: (num_samples,)
# Make predictions using the posterior samples
y_pred_samples = alpha_samples[:, None] + beta_samples[:, None] * X[None, :]
# Calculate the mean prediction
y_pred_mean = np.mean(y_pred_samples, axis=0)
# Plot the data and the fitted regression line
plt.plot(data['date'], y, 'o', label='Data')
plt.plot(data['date'], y_pred_mean, label='Bayesian Linear Regression', color='red')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.show()
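# Optional sketch (not in the original listing): shade an approximate 94%
# equal-tailed credible band around the posterior mean prediction, reusing
# the posterior draws already collected in y_pred_samples above.
y_pred_low = np.percentile(y_pred_samples, 3, axis=0)
y_pred_high = np.percentile(y_pred_samples, 97, axis=0)
plt.plot(data['date'], y, 'o', label='Data')
plt.plot(data['date'], y_pred_mean, color='red', label='Posterior mean')
plt.fill_between(data['date'], y_pred_low, y_pred_high, color='red', alpha=0.2, label='94% credible band')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.show()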
# Step 1: Calculate the x-axis intercept (numeric form) from posterior samples
intercept_x_numeric_samples = -alpha_samples / beta_samples
# Step 2: Calculate the mean intercept (mean of the samples)
intercept_x_numeric_mean = np.mean(intercept_x_numeric_samples)
# Step 3: Convert the numeric intercept back to a date
intercept_date = data['date'].min() + pd.to_timedelta(intercept_x_numeric_mean, unit='D')
print(f"Date when the regression line crosses the x-axis: {intercept_date}")
# Step 1: Calculate the posterior means and variances of alpha and beta
alpha_var = np.var(alpha_samples)
beta_var = np.var(beta_samples)
beta_mean = np.mean(beta_samples)
alpha_mean = np.mean(alpha_samples)
# Step 2: Estimate the standard deviation of the intercept (using the delta method)
intercept_sd = np.sqrt((1 / beta_mean) ** 2 * alpha_var + (alpha_mean / beta_mean**2) ** 2 * beta_var)
print(f"Standard deviation of the intercept on the x-axis: {intercept_sd} days")
As always, for comparison, here is the ordinary least-squares linear regression:
Intercept: 1.6299999167194188
Slope: -0.01137953913185518
Intercept with x-axis (datetime): 2002-12-23 20:55:05.998311688
Standard error of intercept on x-axis: 6.648111410435301 days
import pandas as pd
import numpy as np
import statsmodels.api as sm
# Load the dataset
df = pd.read_csv('dataset.csv', sep=';')
# Convert 'date' to datetime if it's not already in datetime format
df['date'] = pd.to_datetime(df['date'], errors='coerce')
# Convert 'date' to the number of days since the earliest date
df['date_numeric'] = (df['date'] - df['date'].min()).dt.days
# Extract the numeric 'date' and 'value' for regression analysis
X = df['date_numeric'].values
Y = df['value'].values
# Add a constant (for intercept) to the X values for the regression
X = sm.add_constant(X) # Adds the intercept term
# Fit the linear regression model using Ordinary Least Squares (OLS)
model = sm.OLS(Y, X)
results = model.fit()
# Get the intercept and slope
intercept = results.params[0]
slope = results.params[1]
# Standard error of the intercept
stderr_intercept = results.bse[0]
# The regression line crosses the x-axis (y = 0) at x = -intercept / slope
intercept_x_axis_numeric = -intercept / slope
# Standard deviation of the residuals (errors)
std_dev = np.std(results.resid)
# Rough standard error of the x-axis crossing: residual standard deviation divided by |slope|
stderr_intercept_x = std_dev / abs(slope)
# Print results
print(f'Intercept: {intercept}')
print(f'Slope: {slope}')
intercept_x_axis_datetime = df['date'].min() + pd.to_timedelta(intercept_x_axis_numeric, unit='D')
print(f'Intercept with x-axis (datetime): {intercept_x_axis_datetime}')
print(f'Standard error of intercept on x-axis: {stderr_intercept_x} days')
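For a tighter comparison with the Bayesian delta-method estimate above, the uncertainty of the OLS x-axis crossing could also be propagated from the estimated parameter covariance matrix rather than from the residual scatter (a sketch reusing results from the fit above; this version includes the covariance between intercept and slope):

a_hat, b_hat = results.params
cov_ab = results.cov_params()  # 2x2 covariance matrix of (intercept, slope)
grad = np.array([-1.0 / b_hat, a_hat / b_hat**2])  # gradient of x0 = -a/b
stderr_x0_delta = np.sqrt(grad @ cov_ab @ grad)
print(f'Delta-method standard error of the x-axis crossing: {stderr_x0_delta} days')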