DHOB (IU5SGN): February 2025

Thursday, 27 February 2025

XGBoost

Reading a few articles, I saw that XGBoost is recommended in place of LSTM for time-series forecasting... so let's try it on the data already used in the previous attempts.

The plot confirms that if the model has never seen an acceleration it cannot predict one, even when a forcing variable such as rainfall is supplied.

Beyond that, I see no particular reason to prefer XGBoost over LSTM.
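One plausible reason for the missed acceleration, offered as an illustration rather than a diagnosis of the run above: tree ensembles such as XGBoost predict piecewise-constant averages of the training targets, so they cannot output values outside the range seen in training. The sketch below uses a hand-written binned-mean regressor as a toy stand-in for a tree (it is not XGBoost), on synthetic data whose acceleration starts after the training window. All names and numbers are assumptions.

```python
import numpy as np

def fit_piecewise_constant(x, y, n_bins=8):
    """Toy stand-in for a regression tree: mean of y inside equal-width bins of x."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    idx = np.clip(np.digitize(x, edges[1:-1]), 0, n_bins - 1)
    means = np.array([y[idx == k].mean() for k in range(n_bins)])
    return edges, means

def predict_piecewise_constant(model, x):
    edges, means = model
    # Points beyond the training range fall into the first or last bin
    idx = np.clip(np.digitize(x, edges[1:-1]), 0, len(means) - 1)
    return means[idx]

t = np.arange(100, dtype=float)
# Slow creep up to t = 80, then a much faster trend (synthetic)
y = np.where(t < 80, 0.1 * t, 8.0 + 1.0 * (t - 80))

model = fit_piecewise_constant(t[:80], y[:80])   # train only on the slow phase
pred = predict_piecewise_constant(model, t[80:])  # forecast the accelerating phase

print("max training target:", y[:80].max())
print("max prediction     :", pred.max())
print("max actual value   :", y[80:].max())
```

The forecast stays flat at the level of the last training bin, while the actual series keeps climbing.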

# -*- coding: utf-8 -*-
"""xgboost.ipynb

Automatically generated by Colab.

Original file is located at
https://colab.research.google.com/drive/1zmFI2Djb4bD1hVbJfvFGYgqh0EpfRnRj
"""

import pandas as pd
import matplotlib.pyplot as plt

dati = pd.read_csv('/content/prima.csv')
dati["Data"] = pd.to_datetime(dati["Data"])
print(dati)

# Chronological 80/20 split: a single cut index keeps every row in exactly one set
split = int(len(dati) * 0.8)
train = dati.iloc[:split].copy()
test = dati.iloc[split:].copy()

train = train.set_index("Data")
test = test.set_index("Data")

train["Est"].plot(style="k", figsize=(10, 5), label="train")
test["Est"].plot(style="b", figsize=(10, 5), label="test")
plt.title("Dati")
plt.legend()

X_train = train.drop('Est', axis =1)
y_train = train['Est']

X_test = test.drop('Est', axis =1)
y_test = test['Est']

# !pip install xgboost  # notebook shell magic: run it in a Colab/Jupyter cell, or install from a terminal

import xgboost as xgb

reg = xgb.XGBRegressor(n_estimators=1000)
reg.fit(X_train, y_train, verbose = False)

xgb.plot_importance(reg)

test['Est_Prediction'] = reg.predict(X_test)

train['Est'].plot(style='k', figsize=(10,5), label = 'train')
test['Est'].plot(style='b', figsize=(10,5), label = 'test')
test['Est_Prediction'].plot(style='r', figsize=(10,5), label = 'prediction')
plt.title('XGBoost')
plt.legend()

Wednesday, 26 February 2025

iPad 1

I happened to pull a couple of first-generation iPads out of the office's IT scrap pile to see whether they were still usable.

Once connected to a charger, the iPads' screens flashed about once per second, with no charging icon and no other sign of life.

I tried putting them in DFU mode and, surprisingly, the devices were detected, but the restore via iTunes failed with a generic error that says nothing about the internal state.

I found idevicerestore for Linux, which is very interesting, especially in verbose mode, for understanding how the iPad system works.

The procedure seems to work for restoring iOS, but when the device has to reboot at the end it fails miserably (the problem has been reported by other people: https://github.com/libimobiledevice/idevicerestore/issues/324)

It looks like the battery is completely dead and will not recharge.

Unfortunately I think they have become paperweights... I don't know whether I'll have the time or the inclination to open them up and see what's inside.

Tuesday, 25 February 2025

ConvLSTM (2)

Let's try a different approach.

Until now I had used only the elongation data as the dependent variable, with rainfall and temperature as independent variables. Now I try using the second derivative of the elongation as the target.
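As a minimal sketch of how such a second-derivative column could be produced (the post does not show how the Acc column of prima2.csv was built, so this is an assumption), two passes of numerical differentiation with NumPy are enough. The series below is synthetic.

```python
import numpy as np

# Synthetic daily elongation with a constant acceleration (illustrative values only)
t = np.arange(30, dtype=float)           # days
acc_true = 0.02
est = -55.0 + 0.5 * acc_true * t**2      # displacement-like series

vel = np.gradient(est, t)                # first derivative (central differences inside)
acc = np.gradient(vel, t)                # second derivative

# The two points at each end rely on one-sided differences and are less accurate
print(acc[2:-2])
```

On real measurements each differentiation amplifies noise, so some smoothing before differentiating is usually worth considering.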

Deformation

First derivative of the deformation

Second derivative of the deformation

Comparison between rainfall and the second derivative of the deformation

Forecast plot from ConvLSTM

# -*- coding: utf-8 -*-
"""convlstm.ipynb

Automatically generated by Colab.

Original file is located at
    https://colab.research.google.com/drive/1yXzW-fOBePUvgY63uMJTjSprP5vMy0tn
"""

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import ConvLSTM2D, BatchNormalization, Flatten, Dense
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

df = pd.read_csv("prima2.csv", parse_dates=["Data"])
df.set_index("Data", inplace=True)

features = df[['Temp', 'Rain']] # Input variables
target = df[['Acc']] # What we are predicting

scaler = MinMaxScaler()
features_scaled = scaler.fit_transform(features)

def create_sequences(data, labels, seq_length=10):
    X, y = [], []
    for i in range(len(data) - seq_length):
        X.append(data[i:i+seq_length]) # Take the last `seq_length` time steps
        y.append(labels[i+seq_length]) # Predict the next step
    return np.array(X), np.array(y)

seq_length = 20 # Lookback period
X, y = create_sequences(features_scaled, target.to_numpy(), seq_length)

X = X.reshape((X.shape[0], seq_length, 1, X.shape[2], 1)) # (samples, time, height, width, channels)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

model = Sequential([
    ConvLSTM2D(filters=64, kernel_size=(1, 1), activation='relu', return_sequences=True,
               input_shape=(seq_length, 1, X.shape[3], 1)),
    BatchNormalization(),
    ConvLSTM2D(filters=32, kernel_size=(1, 1), activation='relu', return_sequences=False),
    BatchNormalization(),
    Flatten(),
    Dense(32, activation='relu'),
    Dense(1)  # Linear output, assuming Acc holds the continuous second derivative
])

# Regression losses/metrics instead of the binary-classification ones
model.compile(optimizer='adam', loss='mse', metrics=['mae'])

model.fit(X_train, y_train, epochs=50, batch_size=32, validation_data=(X_test, y_test))

y_pred = model.predict(X_test)
#y_pred = (y_pred > 0.5).astype(int) # Convert probabilities to binary (0 or 1)

# Commented out IPython magic to ensure Python compatibility.
import matplotlib.pyplot as plt
# %matplotlib inline

plt.plot(y_test, label="Actual Acc")
plt.plot(y_pred, linestyle="dashed", label="Predicted Acc")
plt.legend()
plt.show()

Saturday, 22 February 2025

ConvLSTM

The exploration of time-series forecasting continues, this time with ConvLSTM (the code was adapted from a starting point produced by Gemini AI).

The dataset is again multivariate, with Est as the variable to be predicted.

Data,Est,Temp,Rain
2023-10-01,-55.7,18.7,0
2023-10-02,-55.6,19,0
2023-10-03,-55.6,19.2,0
2023-10-04,-55.5,19.5,0.2
2023-10-05,-55.9,18.8,0.2
2023-10-06,-55.7,18.1,0
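To make the sliding-window step used by the script below concrete, here is a small sketch on the six sample rows above. The helper name make_windows is hypothetical, and the window length of 3 is chosen only so the example fits; the script itself uses 20.

```python
import numpy as np

def make_windows(data, target_col, seq_length):
    """Each sample is seq_length consecutive rows; the label is the target on the next row."""
    X = np.stack([data[i:i + seq_length] for i in range(len(data) - seq_length)])
    y = data[seq_length:, target_col]
    return X, y

# Columns: Est, Temp, Rain (the six sample rows shown above)
rows = np.array([
    [-55.7, 18.7, 0.0],
    [-55.6, 19.0, 0.0],
    [-55.6, 19.2, 0.0],
    [-55.5, 19.5, 0.2],
    [-55.9, 18.8, 0.2],
    [-55.7, 18.1, 0.0],
])

X, y = make_windows(rows, target_col=0, seq_length=3)
print(X.shape)  # (samples, time steps, features)
print(y)        # Est on the day that follows each window
```

Note that each window includes past values of Est itself, so the network sees the target's own history alongside temperature and rainfall.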

import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt

# 1. Load and prepare the data
def load_and_prepare_data(filepath):
    df = pd.read_csv(filepath, parse_dates=['Data'], index_col='Data')
    df = df.ffill()  # Forward-fill missing values (fillna(method='ffill') is deprecated in recent pandas)
    scaler = MinMaxScaler()
    scaled_data = scaler.fit_transform(df)
    return scaled_data, scaler, df.columns.get_loc('Est')

# 2. Build the training sequences
def create_sequences(data, target_index, seq_length):
    xs, ys = [], []
    for i in range(len(data) - seq_length):
        x = data[i:i + seq_length]
        y = data[i + seq_length, target_index]
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)

# 3. Build the LSTM model
def build_model(input_shape):
    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(50, return_sequences=True, input_shape=input_shape),
        tf.keras.layers.LSTM(50),
        tf.keras.layers.Dense(1)
    ])
    model.compile(optimizer='adam', loss='mse')
    return model

# 4. Train and evaluate the model
def train_and_evaluate_model(model, X_train, y_train, X_test, y_test):
    model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.1, verbose=1)
    loss = model.evaluate(X_test, y_test)
    print(f'Test Loss: {loss}')
    return model

# 5. Predictions and inverse scaling
def make_predictions(model, X_test, scaler, target_index):
    predictions = model.predict(X_test)
    dummy = np.zeros((len(predictions), X_test.shape[2]))
    dummy[:, target_index] = predictions[:, 0]
    predictions_original = scaler.inverse_transform(dummy)[:, target_index]
    return predictions_original

def plot_predictions(predictions, y_test, scaler, target_index):
    # Invert the scaling of the actual data
    # (the column count comes from the scaler instead of the global X_test)
    dummy_real = np.zeros((len(y_test), scaler.n_features_in_))
    dummy_real[:, target_index] = y_test
    y_test_original = scaler.inverse_transform(dummy_real)[:, target_index]

    plt.figure(figsize=(12, 6))
    plt.plot(y_test_original, label='Dati reali')
    plt.plot(predictions, label='Previsioni')
    plt.legend()
    plt.title('Previsioni vs. Dati reali')
    plt.xlabel('Tempo')
    plt.ylabel('Valore di est')
    plt.show()

def make_single_prediction(model, scaler, seq_length, temp, rain, target_index):
    # Build a dummy input sequence (the Est column stays at 0)
    dummy_data = np.zeros((seq_length, 3))
    dummy_data[:, 1] = temp  # temp in the second column
    dummy_data[:, 2] = rain  # rain in the third column

    # Scale the input sequence
    scaled_dummy_data = scaler.transform(dummy_data)

    # Reshape the sequence for the model input
    input_data = np.reshape(scaled_dummy_data, (1, seq_length, 3))

    # Run the prediction
    prediction = model.predict(input_data)

    # Invert the scaling of the prediction
    dummy_prediction = np.zeros((1, 3))
    dummy_prediction[0, target_index] = prediction[0, 0]
    prediction_original = scaler.inverse_transform(dummy_prediction)[0, target_index]

    return prediction_original



# Main
filepath = 'prima.csv'
seq_length = 20  # Sequence length for the LSTM model
prediction_steps = 20  # number of future points to predict

scaled_data, scaler, target_index = load_and_prepare_data(filepath)
X, y = create_sequences(scaled_data, target_index, seq_length)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

model = build_model(X_train.shape[1:])
trained_model = train_and_evaluate_model(model, X_train, y_train, X_test, y_test)
predictions = make_predictions(trained_model, X_test, scaler, target_index)

print(predictions)
plot_predictions(predictions, y_test, scaler, target_index)

# Example of a single prediction
temp_input = 25.0
rain_input = 10.0
single_prediction = make_single_prediction(trained_model, scaler, seq_length, temp_input, rain_input, target_index)
print(f"Previsione per temp={temp_input} e rain={rain_input}: {single_prediction}")

Friday, 21 February 2025

(Don't) buy Epson XP810

Note to self... I must not buy Epson printers.

Today I got my hands on an Epson XP 810 that showed an error (waste ink pad) and could not be reset.

A quick search on Google and I found out that the printer keeps a counter in a chip which, once it reaches a set number, locks the printer even when there is no real hardware problem.

With the unlock code the counter can be reset in software and the device keeps working... but these codes are sold on pirate sites, so I would steer clear.

Multivariate LSTM and BiLSTM

I tried to extend the test from this post using a multivariate dataset made up of temperature and rainfall (daily data).

The data span October 2023 to March 2024. The interesting point is that the training dataset does not include the acceleration that began on 3 March, so the network in effect "does not know" the acceleration trend.

The script at the bottom was used (adapted from this example), with both LSTM and BiLSTM.

The difference is that LSTM can be used for pure forecasting, whereas BiLSTM is better suited to post-analysis.

LSTM

BiLSTM

# Commented out IPython magic to ensure Python compatibility.
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
# %matplotlib inline
import seaborn as sns
import time

df = pd.read_csv('/content/ridotto.csv')
df['date'] = pd.to_datetime(df['Data'])

del df['Data']
columns = list(df.columns)
print(columns)

df_corr = df.corr()
df_corr

look_back = 20
sample_size = len(df) - look_back
past_size = int(sample_size*0.8)
future_size = sample_size - past_size +1

def make_dataset(raw_data, look_back=20):
    _X = []
    _y = []

    for i in range(len(raw_data) - look_back):
        _X.append(raw_data[i : i + look_back])
        _y.append(raw_data[i + look_back])
    _X = np.array(_X).reshape(len(_X), look_back, 1)
    _y = np.array(_y).reshape(len(_y), 1)

    return _X, _y

from sklearn import preprocessing

Xs = []
for i in range(len(columns)):
    Xs.append(preprocessing.minmax_scale(df[columns[i]]))
Xs = np.array(Xs)

X_est, y_est = make_dataset(Xs[0], look_back=look_back)
X_temp, y_temp = make_dataset(Xs[1], look_back=look_back)
X_rain, y_rain = make_dataset(Xs[2], look_back=look_back)

X_con = np.concatenate([X_est, X_temp, X_rain], axis=2)

X = X_con
y = y_est

X.shape

y.shape

X_past = X[:past_size]
X_future = X[past_size-1:]
y_past = y[:past_size]
y_future = y[past_size-1:]

X_train = X_past
y_train = y_past

import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM, Dense, BatchNormalization, Bidirectional
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import SGD, Adam

# LSTM
def create_LSTM_model():
    input = Input(shape=(np.array(X_train).shape[1], np.array(X_train).shape[2]))
    x = LSTM(64, return_sequences=True)(input)
    x = BatchNormalization()(x)
    x = LSTM(64)(x)
    output = Dense(1, activation='relu')(x)
    model = Model(input, output)
    return model

# Bidirectional-LSTM
def create_BiLSTM_model():
    input = Input(shape=(np.array(X_train).shape[1], np.array(X_train).shape[2]))
    x = Bidirectional(LSTM(64, return_sequences=True))(input)
    x = BatchNormalization()(x)
    x = Bidirectional(LSTM(64))(x)
    output = Dense(1, activation='relu')(x)
    model = Model(input, output)
    return model

model = create_BiLSTM_model()
model.summary()
model.compile(optimizer=Adam(learning_rate=0.0001), loss='mean_squared_error')

t1 = time.time()
history = model.fit(X_train, y_train, epochs = 350, batch_size = 64, verbose = 0)
t2 = time.time()

tt = t2 - t1
t_h = tt//3600
t_m = (tt - t_h*3600)//60
t_s = (tt - t_h*3600 - t_m*60)//1
print('Training Time : '+str(t_h)+' h, '+str(t_m)+' m, '+str(t_s)+' s')

predictions = model.predict(X_past)
future_predictions = model.predict(X_future)

plt.figure(figsize=(18, 9))
plt.plot(df['date'][look_back:], y, color="b", label="Real data")
plt.plot(df['date'][look_back:look_back + past_size], predictions, color="r", linestyle="dashed", label="prediction")
plt.plot(df['date'][-future_size:], future_predictions, color="g", linestyle="dashed", label="future_prediction")
plt.legend()
plt.show()

If you want to add a validation dataset:

from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(X_past, y_past, test_size=0.2, shuffle=False)

history = model.fit(X_train, y_train, epochs=350, batch_size=64, verbose=0, validation_data=(X_val, y_val))

# ... (rest of the code)

# Plot the training and validation loss
plt.figure(figsize=(12, 6))
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.legend()
plt.show()

Wednesday, 12 February 2025

Voight Log(v)-Log(a) plot

In the previous post, the forecast of the time of the landslide event shows an error window of +/- 6 days.

Let's try a different approach.

The linear relation between inverse velocity and time is only a simplification that holds for alpha=2... but is that really the correct value?

To estimate it one can use the log10(v) vs log10(a) plot (Barry Voight, "A relation to describe rate-dependent material failure", Science 243).

The slope of the regression line gives the value of alpha (in this case 1.72).

log10(a) = log10(A) + α⋅log10(v)
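A minimal sketch of this estimate, assuming Voight's relation a = A·v^α so that the slope of log10(a) against log10(v) is α. The velocity series is synthetic, generated with α = 1.72 and arbitrary values of A and failure time; it is not the dataset used in the post.

```python
import numpy as np

# Synthetic velocities obeying a = A * v**alpha (all values are illustrative assumptions)
A_true, alpha_true, t_f = 0.05, 1.72, 100.0
t = np.linspace(0.0, 95.0, 400)
v = (A_true * (alpha_true - 1.0) * (t_f - t)) ** (-1.0 / (alpha_true - 1.0))

a = np.gradient(v, t)  # numerical acceleration

# Linear regression in log-log space: slope = alpha, intercept = log10(A)
mask = (v > 0) & (a > 0)
slope, intercept = np.polyfit(np.log10(v[mask]), np.log10(a[mask]), 1)
alpha_est, A_est = slope, 10.0 ** intercept

print(f"alpha = {alpha_est:.3f}, A = {A_est:.4f}")
```

With field data the acceleration comes from differentiating noisy velocities, so the points scatter much more than in this clean example and negative accelerations have to be discarded before taking logarithms.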

The value of alpha is very different from 2. How can this be interpreted?

The complete formula is

Although the time estimated by the model for the event does not change, the value of alpha changes the shape of the time series. With the available data and alpha=1.72, the curve would be concave.

If a linear trend is assumed while conditions are actually concave, the linear regression yields an estimate of the event that is earlier than the one given by the nonlinear model.
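The effect can be checked numerically. The sketch below assumes the usual closed form of the inverse velocity for α > 1, 1/v = [A(α−1)(t_f − t)]^(1/(α−1)), which follows from integrating a = A·v^α; the values of A, t_f and the observation window are arbitrary choices.

```python
import numpy as np

alpha, A, t_f = 1.72, 0.05, 100.0        # illustrative values; t_f is the true failure time
t = np.linspace(60.0, 90.0, 31)          # observation window ending before failure
inv_v = (A * (alpha - 1.0) * (t_f - t)) ** (1.0 / (alpha - 1.0))

# Linear inverse-velocity method: straight line through 1/v, extrapolated to zero
m, b = np.polyfit(t, inv_v, 1)
t_linear = -b / m

# With alpha known, (1/v)**(alpha - 1) is exactly linear in time
m2, b2 = np.polyfit(t, inv_v ** (alpha - 1.0), 1)
t_nonlinear = -b2 / m2

print(f"linear estimate    : {t_linear:.2f}")
print(f"nonlinear estimate : {t_nonlinear:.2f}")
print(f"true failure time  : {t_f:.2f}")
```

The straight line reaches zero several time units before the true failure time, while the fit on the transformed variable recovers it.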

To make things even more complicated, alpha should not be assumed constant a priori: it may be a function of time.

SLO (Shear Line Optimization) Method

the slope of the line corresponds to the accumulation time of

Tuesday, 11 February 2025

Error estimation for Bayesian linear regression with PyMC

Still following the previous posts, this is the estimate of the errors on the three parameters, obtained with NUTS (No-U-Turn Sampler), an advanced Markov Chain Monte Carlo (MCMC) algorithm used to sample from the posterior distribution in Bayesian inference.

y = alpha + beta*x + eps, with eps ~ Normal(0, sigma)

alpha : distribution of the error on the intercept

beta : distribution of the error on the slope

sigma : distribution of the standard deviation of the error term

        mean    sd  hdi_3%  hdi_97%  ...  mcse_sd  ess_bulk  ess_tail  r_hat
alpha   1.63  0.04    1.55     1.71  ...      0.0   3026.81   3754.21    1.0
beta   -0.01  0.00   -0.01    -0.01  ...      0.0   3068.24   3519.73    1.0
sigma   0.09  0.02    0.06     0.12  ...      0.0   3733.28   3659.21    1.0

date;value
2002-08-02 15:10:10; 1.7195121951219514
2002-08-12 06:55:13; 1.4329268292682926
2002-08-21 17:38:00; 1.451219512195122
2002-09-04 09:06:26; 1.274390243902439
2002-09-13 04:42:21; 1.1402439024390243
2002-09-20 02:57:43; 1.207317073170732
2002-10-01 00:56:28; 1.0121951219512195
2002-10-02 07:10:10; 0.8841463414634148
2002-10-08 09:16:24; 0.8902439024390243
2002-10-15 07:31:46; 0.7073170731707319
2002-10-21 04:35:43; 0.5548780487804876
2002-10-30 00:11:38; 0.5548780487804876
2002-11-06 23:38:24; 0.6219512195121952
2002-11-12 05:35:30; 0.5548780487804876
2002-11-21 06:13:42; 0.40853658536585336
2002-11-27 03:17:39; 0.26829268292682906
2002-12-02 04:12:27; 0.21341463414634143
2002-12-14 03:22:38; 0.13414634146341475
2002-12-20 05:28:51; 0.12195121951219523

import pandas as pd
import pymc as pm
import numpy as np
import matplotlib.pyplot as plt
import arviz as az

# Step 1: Read the CSV file containing the time series data
# The CSV is assumed to have columns 'date' and 'value'
data = pd.read_csv('dataset.csv', sep=';')

# Convert the date column to datetime format (if it's not already)
data['date'] = pd.to_datetime(data['date'])

# Convert the date to a numeric format (e.g., number of days since the first date)
data['date_numeric'] = (data['date'] - data['date'].min()).dt.days

# Step 2: Prepare the data for regression
X = data['date_numeric'].values  # Independent variable (time)
y = data['value'].values  # Dependent variable (value)

# Step 3: Set up the Bayesian Linear Regression model with PyMC
with pm.Model() as model:
    # Define the prior distributions for the coefficients (alpha, beta)
    alpha = pm.Normal('alpha', mu=0, sigma=10)
    beta = pm.Normal('beta', mu=0, sigma=10)

    # Define the likelihood function (linear model with Gaussian noise)
    sigma = pm.HalfNormal('sigma', sigma=1)
    mu = alpha + beta * X  # Linear relationship

    # Observed data likelihood
    y_obs = pm.Normal('y_obs', mu=mu, sigma=sigma, observed=y)

    # Step 4: Inference with MCMC sampling (e.g., NUTS sampler)
    trace = pm.sample(2000, return_inferencedata=True)

    # Posterior predictive sampling of the observed variable, within the same model context
    # (listing alpha/beta/sigma in var_names would resample the parameters instead)
    posterior_predictive = pm.sample_posterior_predictive(trace, var_names=["y_obs"])

# Step 5: Posterior diagnostics and results
az.plot_trace(trace)
plt.show()

# Step 6: Summary of the posterior distribution
print(az.summary(trace, round_to=2))

# Step 7: Extract the posterior predictions
# Access posterior samples
alpha_samples = trace.posterior["alpha"].values
beta_samples = trace.posterior["beta"].values

# Ensure proper shape for broadcasting (flatten the samples)
alpha_samples = alpha_samples.flatten()  # Shape: (num_samples,)
beta_samples = beta_samples.flatten()  # Shape: (num_samples,)

# Make predictions using the posterior samples
y_pred_samples = alpha_samples[:, None] + beta_samples[:, None] * X[None, :]

# Calculate the mean prediction
y_pred_mean = np.mean(y_pred_samples, axis=0)

# Plot the data and the fitted regression line
plt.plot(data['date'], y, 'o', label='Data')
plt.plot(data['date'], y_pred_mean, label='Bayesian Linear Regression', color='red')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.show()

# Step 1: Calculate the x-axis intercept (numeric form) from posterior samples
intercept_x_numeric_samples = -alpha_samples / beta_samples

# Step 2: Calculate the mean intercept (mean of the samples)
intercept_x_numeric_mean = np.mean(intercept_x_numeric_samples)

# Step 3: Convert the numeric intercept back to a date
intercept_date = data['date'].min() + pd.to_timedelta(intercept_x_numeric_mean, unit='D')

print(f"Date when the regression line crosses the x-axis: {intercept_date}")

# Step 1: Calculate the variance of the intercept (numerically)
alpha_var = np.var(alpha_samples)
beta_var = np.var(beta_samples)
beta_mean = np.mean(beta_samples)
alpha_mean = np.mean(alpha_samples)

# Step 2: Estimate the standard deviation of the intercept (using the delta method)
intercept_sd = np.sqrt((1 / beta_mean) ** 2 * alpha_var + (alpha_mean / beta_mean**2) ** 2 * beta_var)

print(f"Standard deviation of the intercept on the x-axis: {intercept_sd} days")
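Since posterior draws of alpha and beta are available, the spread of the x-axis crossing can also be read directly from the ratio of the samples, which keeps their correlation without any delta-method formula. The sketch below does this on synthetic draws that only mimic the summary table: the standard deviation of beta and the correlation rho are assumptions, because the table rounds them away. With the real trace, alpha_s and beta_s would be the flattened posterior arrays.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the posterior draws (assumed values, not the real trace)
mean = np.array([1.63, -0.0114])
sd = np.array([0.04, 0.0005])
rho = -0.85  # intercept and slope of a regression line are typically anti-correlated
cov = np.array([
    [sd[0] ** 2, rho * sd[0] * sd[1]],
    [rho * sd[0] * sd[1], sd[1] ** 2],
])
alpha_s, beta_s = rng.multivariate_normal(mean, cov, size=8000).T

# One x-axis crossing per draw, in days since the first measurement
x0 = -alpha_s / beta_s
x0_mean = x0.mean()
x0_sd = x0.std(ddof=1)
lo, hi = np.percentile(x0, [3, 97])  # equal-tailed 94% interval (not an HDI)

print(f"crossing: {x0_mean:.1f} +/- {x0_sd:.1f} days, 94% interval [{lo:.1f}, {hi:.1f}]")
```

The interval is computed from percentiles for simplicity; ArviZ can return a highest-density interval if that is preferred.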

Again for comparison, the ordinary least squares linear regression:

Intercept: 1.6299999167194188

Slope: -0.01137953913185518

Intercept with x-axis (datetime): 2002-12-23 20:55:05.998311688

Standard error of intercept on x-axis: 6.648111410435301 days

import pandas as pd
import numpy as np
import statsmodels.api as sm

# Load the dataset
df = pd.read_csv('dataset.csv', sep=';')

# Convert 'date' to datetime if it's not already in datetime format
df['date'] = pd.to_datetime(df['date'], errors='coerce')

# Convert 'date' to the number of days since the earliest date
df['date_numeric'] = (df['date'] - df['date'].min()).dt.days

# Extract the numeric 'date' and 'value' for regression analysis
X = df['date_numeric'].values
Y = df['value'].values

# Add a constant (for intercept) to the X values for the regression
X = sm.add_constant(X)  # Adds the intercept term

# Fit the linear regression model using Ordinary Least Squares (OLS)
model = sm.OLS(Y, X)
results = model.fit()

# Get the intercept and slope
intercept = results.params[0]
slope = results.params[1]

# Standard error of the intercept
stderr_intercept = results.bse[0]

# Calculate the intercept of the regression line with the x-axis (y = 0), which is x = -intercept / slope
intercept_x_axis = -intercept / slope

# Standard deviation of the residuals (errors)
std_dev = np.std(results.resid)

# Rough error on the x-axis crossing: residual standard deviation projected onto the time axis (not a formal standard error)
stderr_intercept_x = std_dev / abs(slope)

# Print results
print(f'Intercept: {intercept}')
print(f'Slope: {slope}')
intercept_x_axis_numeric = -intercept / slope
intercept_x_axis_datetime = df['date'].min() + pd.to_timedelta(intercept_x_axis_numeric, unit='D')
print(f'Intercept with x-axis (datetime): {intercept_x_axis_datetime}')
print(f'Standard error of intercept on x-axis: {stderr_intercept_x} days')

Monday, 10 February 2025

Bayesian linear regression Vajont

The same paper also presents the data for the Vajont landslide, which occurred on the night of 9 October 1963 (digitizing the plot, the last point falls at 22:30 on 9/10/1963; the exact time is 22:39).

The forecasts are:

Least squares method: 07/10/1963 at about 14:30

Bayesian regression: 07/10/1963 at about 15:10

In this case the two methods essentially coincide, and both miss the date of the event.

Original plot from the paper

Original plots of the Vajont reservoir. The boundary conditions are clearly not stable, and this produces the inflections in the measurement series.

Ascissa dell'intercetta con l'asse x: x = 2438310.10911
Errore stimato per l'ascissa dell'intercetta: σ_x = 83814.68063
Valore di RMSE: 0.00745
Valore di R^2: 0.99006
Coefficiente angolare (m): -0.00372
Intercetta (b): 9082.54821
Coefficiente angolare (m): -0.00372 ± 0.00009
Intercetta (b): 9082.54821 ± 220.76079

Mean Squared Error (MSE): 2.22082811850103e-06
R^2 Score: 0.999641381986245
Coefficiente: [-0.003733]
Intercetta: 9102.213990030079
Intercetta sull'asse x: 2438310.1325 ± 2608880.4621
Coefficiente angolare (m): -0.0037 ± 0.0040
Intercetta sull'asse y (b): 9102.2140 ± 0.0087

If only the last 6 points are considered, the estimate of the event becomes:

Least squares method: 10/10/1963 at about 05:30

Bayesian regression: 10/10/1963 at about 12:00

Mean Squared Error (MSE): 1.1478977160850574e-06
R^2 Score: 0.8087633959041971
Coefficiente: [-0.00254709]
Intercetta: 6210.613078964549
Intercetta sull'asse x: 2438312.9889 ± 2810932.1096
Coefficiente angolare (m): -0.0025 ± 0.0029
Intercetta sull'asse y (b): 6210.6131 ± 0.0008

Ascissa dell'intercetta con l'asse x: x = 2438312.73126
Errore stimato per l'ascissa dell'intercetta: σ_x = 203901.17932
Valore di RMSE: 0.00055
Valore di R^2: 0.98621
Coefficiente angolare (m): -0.00271
Intercetta (b): 6599.21777
Coefficiente angolare (m): -0.00271 ± 0.00016
Intercetta (b): 6599.21777 ± 390.21805

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import BayesianRidge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Build a DataFrame from the data
data = {
    "Tempo": [

2438249.6968,
2438254.7664,
2438259.6571,
2438264.6074,
2438270.0944,
2438274.3290,
2438280.2336,
2438285.4225,
2438290.1342,
2438295.3827,
2438300.0348,
2438303.3151,
2438304.1501,
2438305.2833,
2438306.4165,
2438307.4304,
2438308.2654,
2438309.2793,
2438310.4125

    ],
    "1/V": [

0.2248,
0.2060,
0.1903,
0.1770,
0.1649,
0.1310,
0.1038,
0.0802,
0.0591,
0.0452,
0.0300,
0.0246,
0.0228,
0.0210,
0.0161,
0.0143,
0.0119,
0.0095,
0.0065

    ]
}
df = pd.DataFrame(data)

# Independent variable (Tempo, time) and dependent variable (1/V)
X = df[["Tempo"]]
y = df["1/V"]

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the Bayesian Ridge model
model = BayesianRidge()

# Train the model
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Compute the metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Print the results
print("Mean Squared Error (MSE):", mse)
print("R^2 Score:", r2)
print("Coefficiente:", model.coef_)
print("Intercetta:", model.intercept_)

# Uncertainties (precision -> standard deviation)
# Note: lambda_ is the estimated precision of the weights and alpha_ the estimated
# precision of the noise, so these are rough proxies, not posterior std. devs. of m and b
sigma_m = np.sqrt(1 / model.lambda_)  # Std. dev. implied by the weight precision
sigma_b = np.sqrt(1 / model.alpha_)   # Std. dev. of the noise

# x-axis intercept
m = model.coef_[0]         # Slope
b = model.intercept_       # y-axis intercept
x_intercept = -b / m

# Error propagation for the x-axis intercept (m and b treated as independent)
sigma_x = np.sqrt((sigma_b / m) ** 2 + (b * sigma_m / m**2) ** 2)

# Results
print(f"Intercetta sull'asse x: {x_intercept:.4f} ± {sigma_x:.4f}")
print(f"Coefficiente angolare (m): {m:.4f} ± {sigma_m:.4f}")
print(f"Intercetta sull'asse y (b): {b:.4f} ± {sigma_b:.4f}")

# Plot the model fit
plt.figure(figsize=(10, 6))
plt.scatter(df["Tempo"], df["1/V"], color="blue", label="Dati originali")
plt.plot(df["Tempo"], model.predict(df[["Tempo"]]), color="red", label="Fit del modello")
intercept_x = -model.intercept_ / model.coef_[0]
plt.axvline(x=intercept_x, color="green", linestyle="--", label=f"Intersezione x={intercept_x:.2f}")
plt.xlabel("Tempo")
plt.ylabel("1/V")
plt.title("Regressione Lineare Bayesiana: Tempo vs 1/V")
plt.legend()
plt.grid()
plt.show()

Bayesian linear regression (2)

To improve on the previous post I entered the dates in Julian format (data digitized with https://apps.automeris.io/wpd4/).

The event occurred on 28 December 2002 at 10:00, corresponding to Julian date 2452636.91522.
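For readers who want to move between Julian dates and calendar dates, here is a small sketch using only the standard library. It relies on the fact that JD 2451545.0 is 1 January 2000 at 12:00; the function names are hypothetical and time zones are ignored.

```python
from datetime import datetime, timedelta

JD_J2000 = 2451545.0                      # Julian date of the reference epoch
J2000 = datetime(2000, 1, 1, 12, 0, 0)    # 1 January 2000, 12:00

def jd_to_datetime(jd):
    """Convert a Julian date to a naive datetime."""
    return J2000 + timedelta(days=jd - JD_J2000)

def datetime_to_jd(dt):
    """Convert a naive datetime to a Julian date."""
    return JD_J2000 + (dt - J2000).total_seconds() / 86400.0

print(jd_to_datetime(2452636.91522))   # event time quoted in this post
print(jd_to_datetime(2438310.10911))   # least squares x-intercept from the Vajont post
print(datetime_to_jd(datetime(2002, 12, 28, 10, 0)))
```

The first value lands on 28 December 2002 shortly before 10:00, and the second on the afternoon of 7 October 1963, in line with the dates quoted in the two posts.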

Base 1-2

The least squares method estimated the event for 24 December 2002 at about 01:00

The Bayesian regression method estimated the event for 25 December 2002 at about 06:00

Base 15-13

The least squares method estimated the event for 21 December 2002 at about 05:00

The Bayesian regression method estimated the event for 23 December 2002 at about 18:00

Regardless of which method is better, the value of σ_x deserves to be highlighted.

Base 1-2

Least squares method

Ascissa dell'intercetta con l'asse x: x = 2452632.56444
Errore stimato per l'ascissa dell'intercetta: σ_x = 136727.00244
Valore di RMSE: 0.07491
Valore di R^2: 0.97426
Coefficiente angolare (m): -0.01137
Intercetta (b): 27882.92583
Coefficiente angolare (m): -0.01137 ± 0.00045
Intercetta (b): 27882.92583 ± 1099.10446

Bayesian linear regression

Mean Squared Error (MSE): 0.009615241747958183
R^2 Score: 0.9475633566354236
Coefficiente: [-0.01111325]
Intercetta: 27256.72908552682
Intercetta sull'asse x: 2452633.7423 ± 2474970.5684
Coefficiente angolare (m): -0.0111 ± 0.0112
Intercetta sull'asse y (b): 27256.7291 ± 0.0710

Base 15-13

Least squares method

Ascissa dell'intercetta con l'asse x: x = 2452629.70635
Errore stimato per l'ascissa dell'intercetta: σ_x = 290006.65020
Valore di RMSE: 0.03778
Valore di R^2: 0.88275
Coefficiente angolare (m): -0.00254
Intercetta (b): 6217.58752
Coefficiente angolare (m): -0.00254 ± 0.00021
Intercetta (b): 6217.58752 ± 519.84916

Bayesian linear regression

Mean Squared Error (MSE): 0.002548090926111798
R^2 Score: 0.8769276021004735
Coefficiente: [-0.00238676]
Intercetta: 5853.843312339611
Intercetta sull'asse x: 2452632.2719 ± 2861671.0348
Coefficiente angolare (m): -0.0024 ± 0.0028
Intercetta sull'asse y (b): 5853.8433 ± 0.0348
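The σ_x values above, of the order of 10^5 days, appear consistent with propagating the errors of m and b as if they were independent. With Julian dates around 2.45 million, slope and intercept are almost perfectly anti-correlated, and ignoring that inflates the result. The sketch below is a plain least squares fit with NumPy on the Base 1-2 data from the script that follows, written with a centred time axis so that the covariance term vanishes. It is an alternative calculation under ordinary least squares assumptions, not the code that produced the listings.

```python
import numpy as np

# Digitized series from the post: Julian date and inverse velocity 1/V
t = np.array([2452489.13, 2452498.78, 2452508.23, 2452521.87, 2452530.69,
              2452537.62, 2452548.53, 2452549.79, 2452555.88, 2452562.81,
              2452568.69, 2452577.50, 2452585.48, 2452590.73, 2452599.75,
              2452605.63, 2452610.67, 2452622.64, 2452628.72])
y = np.array([1.71, 1.43, 1.45, 1.27, 1.14, 1.2, 1.01, 0.88, 0.89, 0.7,
              0.55, 0.55, 0.62, 0.55, 0.4, 0.26, 0.21, 0.13, 0.12])

n = len(t)
t_mean = t.mean()
tc = t - t_mean                       # centred time: slope and mean level are uncorrelated
sxx = np.sum(tc ** 2)
m = np.sum(tc * y) / sxx              # slope
y_mean = y.mean()                     # value of the fitted line at t_mean
resid = y - (y_mean + m * tc)
s2 = np.sum(resid ** 2) / (n - 2)     # residual variance
var_m = s2 / sxx
var_ymean = s2 / n

# x-axis crossing and its standard deviation (delta method, centred form)
x0 = t_mean - y_mean / m
sd_x0 = np.sqrt(var_ymean / m ** 2 + (y_mean / m ** 2) ** 2 * var_m)

# Same propagation with the uncentred intercept b and no covariance term
b = y_mean - m * t_mean
var_b = var_ymean + t_mean ** 2 * var_m
sd_x0_naive = np.sqrt(var_b / m ** 2 + (b / m ** 2) ** 2 * var_m)

print(f"x-axis crossing: {x0:.2f}")
print(f"sigma_x with correlation handled: {sd_x0:.2f} days")
print(f"sigma_x treating m and b as independent: {sd_x0_naive:.0f} days")
```

The centred form gives an uncertainty of a few days, which is the order of magnitude one would expect from the scatter of the points, while the independent-error formula reproduces the very large figure.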

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import BayesianRidge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Build a DataFrame from the data
data = {
    "Tempo": [
        2452489.13,2452498.78,2452508.23,2452521.87,2452530.69,2452537.62,2452548.53,2452549.79,2452555.88,
    2452562.81,2452568.69,2452577.50,2452585.48,2452590.73,2452599.75,2452605.63,2452610.67,2452622.64,2452628.72
    ],
    "1/V": [
        1.71,1.43,1.45,1.27,1.14,1.2,1.01,0.88,0.89,0.7,0.55,0.55,0.62,0.55,0.4,0.26,0.21,0.13,0.12
    ]
}
df = pd.DataFrame(data)

# Independent variable (Tempo, time) and dependent variable (1/V)
X = df[["Tempo"]]
y = df["1/V"]

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the Bayesian Ridge model
model = BayesianRidge()

# Train the model
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Compute the metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Print the results
print("Mean Squared Error (MSE):", mse)
print("R^2 Score:", r2)
print("Coefficiente:", model.coef_)
print("Intercetta:", model.intercept_)

# Uncertainties (precision -> standard deviation)
# Note: lambda_ is the estimated precision of the weights and alpha_ the estimated
# precision of the noise, so these are rough proxies, not posterior std. devs. of m and b
sigma_m = np.sqrt(1 / model.lambda_)  # Std. dev. implied by the weight precision
sigma_b = np.sqrt(1 / model.alpha_)   # Std. dev. of the noise

# x-axis intercept
m = model.coef_[0]         # Slope
b = model.intercept_       # y-axis intercept
x_intercept = -b / m

# Error propagation for the x-axis intercept (m and b treated as independent)
sigma_x = np.sqrt((sigma_b / m) ** 2 + (b * sigma_m / m**2) ** 2)

# Results
print(f"Intercetta sull'asse x: {x_intercept:.4f} ± {sigma_x:.4f}")
print(f"Coefficiente angolare (m): {m:.4f} ± {sigma_m:.4f}")
print(f"Intercetta sull'asse y (b): {b:.4f} ± {sigma_b:.4f}")

# Plot the model fit
plt.figure(figsize=(10, 6))
plt.scatter(df["Tempo"], df["1/V"], color="blue", label="Dati originali")
plt.plot(df["Tempo"], model.predict(df[["Tempo"]]), color="red", label="Fit del modello")
intercept_x = -model.intercept_ / model.coef_[0]
plt.axvline(x=intercept_x, color="green", linestyle="--", label=f"Intersezione x={intercept_x:.2f}")
plt.xlabel("Tempo")
plt.ylabel("1/V")
plt.title("Regressione Lineare Bayesiana: Tempo vs 1/V")
plt.legend()
plt.grid()
plt.show()