Sunday, 15 February 2026

MARIDA dataset

Kikaki K, Kakogeorgiou I, Mikeli P, Raitsos DE, Karantzalos K (2022) MARIDA: A benchmark for Marine Debris detection from Sentinel-2 remote sensing data. PLoS ONE 17(1): e0262247. https://doi.org/10.1371/journal.pone.0262247

///////////////////////////////////////////  

MARIDA is a freely downloadable dataset of 1381 labeled Sentinel-2 patches that can be used to test marine debris detection algorithms.

 

It can be downloaded from https://www.kaggle.com/datasets/weinima/marida or https://zenodo.org/records/5151941

The patches folder contains 11-band GeoTIFFs of 256x256 pixels; each patch is paired with a file of the same name ending in *_cl.tif that holds the per-pixel classification of the multispectral patch.
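A small helper can pair each patch with its classification mask, assuming only the naming convention above (the example filename is made up):

```python
from pathlib import Path

def mask_for(patch_path):
    """Return the path of the *_cl.tif classification mask for a patch."""
    p = Path(patch_path)
    return p.with_name(p.stem + "_cl.tif")

def list_patches(folder):
    """List the multispectral patches, skipping the *_cl.tif masks."""
    return sorted(f for f in Path(folder).glob("*.tif")
                  if not f.name.endswith("_cl.tif"))

print(mask_for("S2_example_0.tif"))   # S2_example_0_cl.tif
```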

Classes

1: Marine Debris
2: Dense Sargassum
3: Sparse Sargassum
4: Natural Organic Material
5: Ship
6: Clouds
7: Marine Water
8: Sediment-Laden Water
9: Foam
10: Turbid Water
11: Shallow Water
12: Waves
13: Cloud Shadows
14: Wakes
15: Mixed Water
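The class list translates directly into a lookup table that is handy when printing reports. Code 0, which shows up in the masks but not in the list above, is assumed here to mean an unlabeled pixel:

```python
# Mapping of MARIDA class codes to names, as listed above
CLASS_NAMES = {
    1: "Marine Debris",
    2: "Dense Sargassum",
    3: "Sparse Sargassum",
    4: "Natural Organic Material",
    5: "Ship",
    6: "Clouds",
    7: "Marine Water",
    8: "Sediment-Laden Water",
    9: "Foam",
    10: "Turbid Water",
    11: "Shallow Water",
    12: "Waves",
    13: "Cloud Shadows",
    14: "Wakes",
    15: "Mixed Water",
}

def class_name(code):
    # Code 0 (and any other unknown code) is treated as unlabeled
    return CLASS_NAMES.get(code, "Unlabeled")

print(class_name(1))   # Marine Debris
```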
 

 


As can be seen, the distribution of the number of samples in each class is extremely uneven.


 


To train a random forest classifier I built a CSV file in which each row contains the 11 band values of one pixel, followed by its class label.

import os
from pathlib import Path

import rasterio

# awk -F, '{counts[$NF]++} END {for (val in counts) print val, counts[val]}' yourfile.csv | sort -n

folder_path = './train/'
files = [f for f in os.listdir(folder_path)
         if f.endswith('.tif') and not f.endswith('_cl.tif')]

for filename in files:
    base_name = Path(filename).stem

    # open each patch and its classification mask once, not once per pixel
    with rasterio.open(folder_path + base_name + ".tif") as src:
        data = src.read()        # shape: (11, 256, 256)
    with rasterio.open(folder_path + base_name + "_cl.tif") as src:
        mask = src.read()        # shape: (1, 256, 256)

    # one CSV row per pixel: the 11 band values followed by the class label
    for r in range(256):
        for c in range(256):
            csv_string = ",".join(map(str, data[:, r, c]))
            csv_string += "," + str(int(mask[0, r, c]))
            print(csv_string)


One problem is that class 0 alone accounts for about 99% of the rows:

class  count
0      740486
1      73
2      40
3      42
5      200
6      16
7      223
8      13828
9      2
10     120
11     67
13     27
14     254
15     5
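One common mitigation is to undersample the dominant class before training. This is only a sketch on synthetic data (the cap of 50 rows per class and the column names are made up; in practice you would load the real spettri3.csv with pd.read_csv):

```python
import pandas as pd

# Synthetic stand-in for the real CSV: 3 fake "bands" plus a label column
df = pd.DataFrame({
    "b1": range(1000),
    "b2": range(1000),
    "b3": range(1000),
    "label": [0] * 950 + [1] * 30 + [8] * 20,
})

CAP = 50  # keep at most this many rows per class

# Sample up to CAP rows from each class; smaller classes are kept whole
balanced = (
    df.groupby("label", group_keys=False)
      .apply(lambda g: g.sample(n=min(len(g), CAP), random_state=42))
)

print(balanced["label"].value_counts().to_dict())
```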

Let's use the CSV file to train a random forest classifier (run on Google Colab).

 

from google.colab import drive
drive.mount('/content/drive')

# Install the matching pair for stability
!pip install tensorflow-decision-forests wurlitzer

import os

# Force Keras 2 usage for TF-DF compatibility (must be set before importing TF)
os.environ['TF_USE_LEGACY_KERAS'] = '1'

import pandas as pd
import tensorflow as tf
import tensorflow_decision_forests as tfdf
from sklearn.model_selection import train_test_split

# Verify the versions
print(f"TensorFlow version: {tf.__version__}")
print(f"TF-DF version: {tfdf.__version__}")

# 1. Load Data
df = pd.read_csv('/content/drive/My Drive/spettri3.csv')

# the class label is in the last column
LABEL = df.columns[-1]

# 2. Split Data
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

# 3. Convert to TF Dataset
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label=LABEL)
test_ds = tfdf.keras.pd_dataframe_to_tf_dataset(test_df, label=LABEL)

# 4. Create and Train Random Forest
model = tfdf.keras.RandomForestModel(task=tfdf.keras.Task.CLASSIFICATION)
model.compile(metrics=["accuracy"])

print("Starting Training...")
model.fit(train_ds)

# 5. Summary and Evaluation
print("\n--- Model Summary ---")
model.summary()

evaluation = model.evaluate(test_ds, return_dict=True)
print(f"\nTest Accuracy: {evaluation['accuracy']:.4f}")

# 6. Save the Model
model.save("exported_model")
print("Model saved to exported_model")
import numpy as np
from sklearn.metrics import classification_report

# Get predictions on the test dataset
# The predict method returns probabilities for each class
predictions_prob = model.predict(test_ds)

# Convert probabilities to predicted class labels
predicted_labels = np.argmax(predictions_prob, axis=1)

# Extract true labels from the test_df
# Assuming LABEL was defined as the last column of df
true_labels = test_df[LABEL].values

# Get unique class labels (categories)
class_labels = np.unique(true_labels)

# Generate a classification report to show precision, recall, and F1-score for each class
print("\n--- Classification Report per Category ---")
print(classification_report(true_labels, predicted_labels, target_names=[str(c) for c in class_labels]))

# You can also manually calculate accuracy per class if preferred
print("\n--- Accuracy per Category ---")
for cls in class_labels:
    idx = (true_labels == cls)
    correct_predictions_for_cls = (predicted_labels[idx] == cls).sum()
    total_predictions_for_cls = idx.sum()
    if total_predictions_for_cls > 0:
        accuracy_for_cls = correct_predictions_for_cls / total_predictions_for_cls
        print(f"Category {cls}: Accuracy = {accuracy_for_cls:.4f}")
    else:
        print(f"Category {cls}: No true instances in test set.")
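Beyond per-class accuracy, a confusion matrix shows which classes get mistaken for which. A minimal sketch with made-up labels (in the notebook you would pass the real true_labels and predicted_labels computed above):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Synthetic stand-in for true_labels / predicted_labels
true_labels = np.array([0, 0, 0, 8, 8, 1])
predicted_labels = np.array([0, 0, 8, 8, 8, 0])

# Fix the label order explicitly, since class codes have gaps
labels = np.unique(np.concatenate([true_labels, predicted_labels]))
cm = confusion_matrix(true_labels, predicted_labels, labels=labels)

# Rows are true classes, columns are predicted classes
for row_label, row in zip(labels, cm):
    print(row_label, row.tolist())
```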
 
 

  

--- Model Summary ---
Model: "random_forest_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
=================================================================
Total params: 1 (1.00 Byte)
Trainable params: 0 (0.00 Byte)
Non-trainable params: 1 (1.00 Byte)
_________________________________________________________________
Type: "RANDOM_FOREST"
Task: CLASSIFICATION
Label: "__LABEL"

Input Features (11):
	0.0111599155
	0.01483216
	0.01739901
	0.018406885
	0.018967822
	0.02049054
	0.020988442
	0.021692224
	0.026393838
	0.033805072
	0.036814593

No weights

Variable Importance: INV_MEAN_MIN_DEPTH:
    1.  "0.021692224"  0.278557 ################
    2.  "0.036814593"  0.164558 #####
    3.  "0.026393838"  0.146458 ###
    4.   "0.01483216"  0.134337 ##
    5.   "0.02049054"  0.125129 #
    6.  "0.018406885"  0.124548 #
    7.   "0.01739901"  0.119835 #
    8.  "0.033805072"  0.116388 #
    9.  "0.018967822"  0.114600 
   10.  "0.020988442"  0.105245 
   11. "0.0111599155"  0.105237 

Variable Importance: NUM_AS_ROOT:
    1. "0.021692224" 138.000000 ################
    2. "0.026393838" 82.000000 #########
    3. "0.033805072" 52.000000 #####
    4. "0.018406885" 17.000000 #
    5. "0.018967822" 10.000000 #
    6. "0.020988442"  1.000000 

Variable Importance: NUM_NODES:
    1.  "0.036814593" 33033.000000 ################
    2.   "0.01483216" 24259.000000 ######
    3.  "0.018406885" 23259.000000 #####
    4.  "0.020988442" 21752.000000 ###
    5.   "0.02049054" 21560.000000 ###
    6. "0.0111599155" 20757.000000 ##
    7.  "0.026393838" 20511.000000 #
    8.  "0.018967822" 20197.000000 #
    9.   "0.01739901" 19841.000000 #
   10.  "0.033805072" 18931.000000 
   11.  "0.021692224" 18752.000000 

Variable Importance: SUM_SCORE:
    1.  "0.021692224" 6680292.214150 ################
    2.  "0.026393838" 3547615.844493 ########
    3.  "0.033805072" 2038345.698294 ####
    4.  "0.036814593" 1304726.855687 ##
    5.  "0.018406885" 1110520.264613 #
    6.   "0.01483216" 836882.331232 #
    7.  "0.018967822" 783745.709187 
    8.   "0.02049054" 776627.158520 
    9.  "0.020988442" 530432.154355 
   10.   "0.01739901" 456566.598678 
   11. "0.0111599155" 401849.939701 



Winner takes all: true
Out-of-bag evaluation: accuracy:0.998054 logloss:0.0105165
Number of trees: 300
Total number of nodes: 486004

Number of nodes by tree:
Count: 300 Average: 1620.01 StdDev: 76.5775
Min: 1423 Max: 1799 Ignored: 0
----------------------------------------------
[ 1423, 1441)  2   0.67%   0.67%
[ 1441, 1460)  6   2.00%   2.67% #
[ 1460, 1479)  4   1.33%   4.00% #
[ 1479, 1498)  8   2.67%   6.67% ##
[ 1498, 1517)  9   3.00%   9.67% ##
[ 1517, 1536) 10   3.33%  13.00% ##
[ 1536, 1554) 16   5.33%  18.33% ####
[ 1554, 1573) 19   6.33%  24.67% #####
[ 1573, 1592) 26   8.67%  33.33% ######
[ 1592, 1611) 41  13.67%  47.00% ##########
[ 1611, 1630) 34  11.33%  58.33% ########
[ 1630, 1649) 24   8.00%  66.33% ######
[ 1649, 1668) 20   6.67%  73.00% #####
[ 1668, 1686) 19   6.33%  79.33% #####
[ 1686, 1705) 13   4.33%  83.67% ###
[ 1705, 1724) 18   6.00%  89.67% ####
[ 1724, 1743) 11   3.67%  93.33% ###
[ 1743, 1762) 12   4.00%  97.33% ###
[ 1762, 1781)  5   1.67%  99.00% #
[ 1781, 1799]  3   1.00% 100.00% #

Depth by leafs:
Count: 243152 Average: 12.0108 StdDev: 2.39733
Min: 3 Max: 15 Ignored: 0
----------------------------------------------
[  3,  4)   128   0.05%   0.05%
[  4,  5)   389   0.16%   0.21%
[  5,  6)  1084   0.45%   0.66%
[  6,  7)  2770   1.14%   1.80% #
[  7,  8)  6288   2.59%   4.38% #
[  8,  9) 11242   4.62%   9.01% ##
[  9, 10) 17797   7.32%  16.33% ####
[ 10, 11) 24395  10.03%  26.36% #####
[ 11, 12) 30327  12.47%  38.83% #######
[ 12, 13) 34392  14.14%  52.98% #######
[ 13, 14) 35111  14.44%  67.42% ########
[ 14, 15) 32791  13.49%  80.90% #######
[ 15, 15] 46438  19.10% 100.00% ##########

Number of training obs by leaf:
Count: 243152 Average: 745.589 StdDev: 6136.2
Min: 5 Max: 219057 Ignored: 0
----------------------------------------------
[      5,  10957) 239559  98.52%  98.52% ##########
[  10957,  21910)   1503   0.62%  99.14%
[  21910,  32862)    888   0.37%  99.51%
[  32862,  43815)    355   0.15%  99.65%
[  43815,  54768)    235   0.10%  99.75%
[  54768,  65720)    109   0.04%  99.79%
[  65720,  76673)     68   0.03%  99.82%
[  76673,  87626)     68   0.03%  99.85%
[  87626,  98578)     62   0.03%  99.87%
[  98578, 109531)    112   0.05%  99.92%
[ 109531, 120484)     56   0.02%  99.94%
[ 120484, 131436)     24   0.01%  99.95%
[ 131436, 142389)     32   0.01%  99.97%
[ 142389, 153342)     19   0.01%  99.97%
[ 153342, 164294)     18   0.01%  99.98%
[ 164294, 175247)     14   0.01%  99.99%
[ 175247, 186200)     20   0.01% 100.00%
[ 186200, 197152)      5   0.00% 100.00%
[ 197152, 208105)      2   0.00% 100.00%
[ 208105, 219057]      3   0.00% 100.00%

Attribute in nodes:
	33033 : 0.036814593 [NUMERICAL]
	24259 : 0.01483216 [NUMERICAL]
	23259 : 0.018406885 [NUMERICAL]
	21752 : 0.020988442 [NUMERICAL]
	21560 : 0.02049054 [NUMERICAL]
	20757 : 0.0111599155 [NUMERICAL]
	20511 : 0.026393838 [NUMERICAL]
	20197 : 0.018967822 [NUMERICAL]
	19841 : 0.01739901 [NUMERICAL]
	18931 : 0.033805072 [NUMERICAL]
	18752 : 0.021692224 [NUMERICAL]

Attribute in nodes with depth <= 0:
	138 : 0.021692224 [NUMERICAL]
	82 : 0.026393838 [NUMERICAL]
	52 : 0.033805072 [NUMERICAL]
	17 : 0.018406885 [NUMERICAL]
	10 : 0.018967822 [NUMERICAL]
	1 : 0.020988442 [NUMERICAL]

Attribute in nodes with depth <= 1:
	380 : 0.021692224 [NUMERICAL]
	122 : 0.026393838 [NUMERICAL]
	83 : 0.01739901 [NUMERICAL]
	72 : 0.033805072 [NUMERICAL]
	57 : 0.036814593 [NUMERICAL]
	54 : 0.018406885 [NUMERICAL]
	51 : 0.018967822 [NUMERICAL]
	38 : 0.02049054 [NUMERICAL]
	26 : 0.020988442 [NUMERICAL]
	10 : 0.01483216 [NUMERICAL]
	7 : 0.0111599155 [NUMERICAL]

Attribute in nodes with depth <= 2:
	581 : 0.021692224 [NUMERICAL]
	229 : 0.02049054 [NUMERICAL]
	219 : 0.01483216 [NUMERICAL]
	193 : 0.018406885 [NUMERICAL]
	186 : 0.01739901 [NUMERICAL]
	154 : 0.036814593 [NUMERICAL]
	150 : 0.026393838 [NUMERICAL]
	117 : 0.018967822 [NUMERICAL]
	100 : 0.020988442 [NUMERICAL]
	92 : 0.033805072 [NUMERICAL]
	79 : 0.0111599155 [NUMERICAL]

Attribute in nodes with depth <= 3:
	766 : 0.021692224 [NUMERICAL]
	633 : 0.036814593 [NUMERICAL]
	514 : 0.02049054 [NUMERICAL]
	503 : 0.01483216 [NUMERICAL]
	345 : 0.01739901 [NUMERICAL]
	300 : 0.020988442 [NUMERICAL]
	288 : 0.018406885 [NUMERICAL]
	284 : 0.018967822 [NUMERICAL]
	277 : 0.0111599155 [NUMERICAL]
	272 : 0.026393838 [NUMERICAL]
	190 : 0.033805072 [NUMERICAL]

Attribute in nodes with depth <= 5:
	2877 : 0.036814593 [NUMERICAL]
	1895 : 0.01483216 [NUMERICAL]
	1725 : 0.02049054 [NUMERICAL]
	1541 : 0.021692224 [NUMERICAL]
	1244 : 0.01739901 [NUMERICAL]
	1210 : 0.018406885 [NUMERICAL]
	1146 : 0.020988442 [NUMERICAL]
	1127 : 0.026393838 [NUMERICAL]
	1069 : 0.0111599155 [NUMERICAL]
	1041 : 0.018967822 [NUMERICAL]
	878 : 0.033805072 [NUMERICAL]

Condition type in nodes:
	242852 : HigherCondition
Condition type in nodes with depth <= 0:
	300 : HigherCondition
Condition type in nodes with depth <= 1:
	900 : HigherCondition
Condition type in nodes with depth <= 2:
	2100 : HigherCondition
Condition type in nodes with depth <= 3:
	4372 : HigherCondition
Condition type in nodes with depth <= 5:
	15753 : HigherCondition
Node format: NOT_SET

Training OOB:
	trees: 1, Out-of-bag evaluation: accuracy:0.996575 logloss:0.123458
	trees: 9, Out-of-bag evaluation: accuracy:0.997327 logloss:0.0479432
	trees: 19, Out-of-bag evaluation: accuracy:0.99767 logloss:0.0254875
	trees: 29, Out-of-bag evaluation: accuracy:0.997806 logloss:0.0193881
	trees: 39, Out-of-bag evaluation: accuracy:0.997889 logloss:0.0171792
	trees: 49, Out-of-bag evaluation: accuracy:0.997902 logloss:0.0153532
	trees: 59, Out-of-bag evaluation: accuracy:0.997955 logloss:0.0144014
	trees: 69, Out-of-bag evaluation: accuracy:0.998018 logloss:0.0137202
	trees: 79, Out-of-bag evaluation: accuracy:0.998029 logloss:0.0131981
	trees: 89, Out-of-bag evaluation: accuracy:0.998024 logloss:0.0128051
	trees: 99, Out-of-bag evaluation: accuracy:0.998036 logloss:0.0124228
	trees: 109, Out-of-bag evaluation: accuracy:0.998042 logloss:0.0120357
	trees: 119, Out-of-bag evaluation: accuracy:0.998049 logloss:0.0117484
	trees: 129, Out-of-bag evaluation: accuracy:0.998054 logloss:0.0116494
	trees: 139, Out-of-bag evaluation: accuracy:0.998069 logloss:0.0113382
	trees: 149, Out-of-bag evaluation: accuracy:0.998067 logloss:0.0112786
	trees: 159, Out-of-bag evaluation: accuracy:0.998066 logloss:0.0111747
	trees: 169, Out-of-bag evaluation: accuracy:0.998064 logloss:0.0111785
	trees: 179, Out-of-bag evaluation: accuracy:0.998071 logloss:0.0109639
	trees: 189, Out-of-bag evaluation: accuracy:0.998066 logloss:0.0108541
	trees: 199, Out-of-bag evaluation: accuracy:0.998052 logloss:0.0107001
	trees: 209, Out-of-bag evaluation: accuracy:0.998041 logloss:0.0106965
	trees: 219, Out-of-bag evaluation: accuracy:0.998056 logloss:0.0106957
	trees: 229, Out-of-bag evaluation: accuracy:0.998066 logloss:0.0106392
	trees: 239, Out-of-bag evaluation: accuracy:0.998075 logloss:0.010582
	trees: 249, Out-of-bag evaluation: accuracy:0.998064 logloss:0.0105818
	trees: 259, Out-of-bag evaluation: accuracy:0.998056 logloss:0.0105275
	trees: 269, Out-of-bag evaluation: accuracy:0.998072 logloss:0.0105194
	trees: 279, Out-of-bag evaluation: accuracy:0.998059 logloss:0.0105219
	trees: 289, Out-of-bag evaluation: accuracy:0.998056 logloss:0.0105255
	trees: 299, Out-of-bag evaluation: accuracy:0.998049 logloss:0.010518
	trees: 300, Out-of-bag evaluation: accuracy:0.998054 logloss:0.0105165

152/152 [==============================] - 10s 65ms/step - loss: 0.0000e+00 - accuracy: 0.9979

Test Accuracy: 0.9979
WARNING:absl:`0.0111599155` is not a valid tf.function parameter name. Sanitizing to `arg_0_0111599155`.
WARNING:absl:`0.01483216` is not a valid tf.function parameter name. Sanitizing to `arg_0_01483216`.
WARNING:absl:`0.01739901` is not a valid tf.function parameter name. Sanitizing to `arg_0_01739901`.
WARNING:absl:`0.018406885` is not a valid tf.function parameter name. Sanitizing to `arg_0_018406885`.
WARNING:absl:`0.018967822` is not a valid tf.function parameter name. Sanitizing to `arg_0_018967822`.
Model saved to /app/exported_model

---------------------------------------------------------------------

The overall accuracy is very good, but let's look in detail at how each class is predicted. (Note: the input feature names in the summary above are reflectance values rather than band names, because the CSV was written without a header row, so pandas promoted the first data row to column names, as the absl warnings also show.)

--------------------------------------------------------------------- 

 

--- Classification Report per Category ---
              precision    recall  f1-score   support

           0       1.00      1.00      1.00    148127
           1       1.00      0.36      0.53        11
           2       0.75      0.86      0.80         7
           3       0.75      0.60      0.67        10
           5       1.00      0.63      0.78        41
           6       0.00      0.00      0.00         4
           7       1.00      0.09      0.16        35
           8       0.96      0.97      0.97      2744
          10       1.00      0.20      0.33        25
          11       0.90      0.64      0.75        14
          13       0.50      0.33      0.40         3
          14       0.93      0.24      0.38        55
          15       0.00      0.00      0.00         1

    accuracy                           1.00    151077
   macro avg       0.75      0.46      0.52    151077
weighted avg       1.00      1.00      1.00    151077

 
