Kikaki K, Kakogeorgiou I, Mikeli P, Raitsos DE, Karantzalos K (2022) MARIDA: A benchmark for Marine Debris detection from Sentinel-2 remote sensing data. PLoS ONE 17(1): e0262247. https://doi.org/10.1371/journal.pone.0262247
///////////////////////////////////////////
MARIDA is a freely downloadable dataset of 1381 labeled Sentinel-2 patches that can be used to test classification algorithms.
It can be downloaded from https://www.kaggle.com/datasets/weinima/marida or https://zenodo.org/records/5151941
In the patches folder there are 11-band GeoTIFFs of 256x256 pixels, each paired with a file of the same name ending in *_cl.tif that contains the per-pixel class labels of the multispectral patch.
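As a quick orientation, here is a minimal sketch (assuming the patches have been extracted to ./train/; the patch name is hypothetical) that opens one patch together with its mask and prints their shapes:

import rasterio

# Hypothetical patch name: any .tif in ./train/ that is not a _cl.tif works
name = './train/S2_1-12-19_48MYU_0'

with rasterio.open(name + '.tif') as src:
    patch = src.read()    # expected shape: (11, 256, 256), one layer per band
with rasterio.open(name + '_cl.tif') as src:
    mask = src.read(1)    # expected shape: (256, 256), integer class ids

print(patch.shape, patch.dtype)
print(mask.shape, mask.dtype)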
Classes
1: Marine Debris
2: Dense Sargassum
3: Sparse Sargassum
4: Natural Organic Material
5: Ship
6: Clouds
7: Marine Water
8: Sediment-Laden Water
9: Foam
10: Turbid Water
11: Shallow Water
12: Waves
13: Cloud Shadows
14: Wakes
15: Mixed Water
As can be seen below, the number of samples per class varies enormously.
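To verify this directly from the masks, here is a minimal sketch (again assuming the masks sit in ./train/) that accumulates the per-class pixel counts over all _cl.tif files:

import os
import numpy as np
import rasterio
from collections import Counter

folder = './train/'
counts = Counter()
for f in os.listdir(folder):
    if f.endswith('_cl.tif'):
        with rasterio.open(os.path.join(folder, f)) as src:
            mask = src.read(1)
        vals, n = np.unique(mask, return_counts=True)
        for v, c in zip(vals, n):
            counts[int(v)] += int(c)   # class id -> total pixel count

for cls in sorted(counts):
    print(cls, counts[cls])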
To train a random forest model I built a CSV in which each row holds the 11 band values of one pixel, with the pixel's class id appended as the last column:
import rasterio
from pathlib import Path
import os

# Count the rows per class in the resulting CSV, e.g.:
# awk -F, '{counts[$NF]++} END {for (val in counts) print val, counts[val]}' spettri3.csv | sort -n

folder_path = './train/'
files = [f for f in os.listdir(folder_path)
         if f.endswith('.tif') and not f.endswith('_cl.tif')]

# Note: no header row is written, so pandas will later promote the first
# data row to column names (hence the numeric feature names such as
# "0.0111599155" in the model summary further down).
with open('spettri3.csv', 'w') as out:
    for filename in files:
        base_name = Path(filename).stem
        print(base_name)
        # Read all 11 bands of the patch once per file
        with rasterio.open(folder_path + base_name + ".tif") as src:
            data = src.read()              # shape: (11, 256, 256)
        # Open the class mask
        with rasterio.open(folder_path + base_name + "_cl.tif") as src:
            mask = src.read(1)             # shape: (256, 256)
        rows, cols = mask.shape
        for r in range(rows):
            for c in range(cols):
                pixel_values = data[:, r, c]
                csv_string = ",".join(map(str, pixel_values))
                csv_string = csv_string + "," + str(int(mask[r, c]))
                out.write(csv_string + "\n")
One of the problems is that class 0, which does not appear in the class list above and corresponds to pixels without any annotation, alone accounts for roughly 98% of the rows:
class   count
0       740486
1       73
2       40
3       42
5       200
6       16
7       223
8       13828
9       2
10      120
11      67
13      27
14      254
15      5
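The model below is trained on the imbalanced CSV as-is. A common mitigation, not applied here, is to subsample the dominant class before training; a minimal sketch with pandas (reading with header=None so that no data row is lost to column names):

import pandas as pd

df = pd.read_csv('spettri3.csv', header=None)   # last column is the class id
label_col = df.columns[-1]

majority = df[df[label_col] == 0]
minority = df[df[label_col] != 0]

# Keep, say, five background pixels for every labeled pixel
majority_sample = majority.sample(n=min(len(majority), 5 * len(minority)),
                                  random_state=42)
balanced = pd.concat([majority_sample, minority]).sample(frac=1, random_state=42)
balanced.to_csv('spettri3_balanced.csv', index=False, header=False)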
Let's now use the CSV file to train a random forest model (run on Google Colab):
from google.colab import drive
drive.mount('/content/drive')
# Install TF-DF; wurlitzer lets the notebook display the C++ training logs
!pip install tensorflow-decision-forests wurlitzer
import os
# Force Keras 2 usage for TF-DF compatibility (must be set before importing TensorFlow)
os.environ['TF_USE_LEGACY_KERAS'] = '1'
import pandas as pd
import tensorflow as tf
import tensorflow_decision_forests as tfdf
from sklearn.model_selection import train_test_split
# Verify the versions
print(f"TensorFlow version: {tf.__version__}")
print(f"TF-DF version: {tfdf.__version__}")
# 1. Load Data
df = pd.read_csv('/content/drive/My Drive/spettri3.csv')
# the class label is in the last column
LABEL = df.columns[-1]
# 2. Split Data
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
# 3. Convert to TF Dataset
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label=LABEL)
test_ds = tfdf.keras.pd_dataframe_to_tf_dataset(test_df, label=LABEL)
# 4. Create and Train Random Forest
model = tfdf.keras.RandomForestModel(task=tfdf.keras.Task.CLASSIFICATION)
model.compile(metrics=["accuracy"])
print("Starting Training...")
model.fit(train_ds)
# 5. Summary and Evaluation
print("\n--- Model Summary ---")
model.summary()
evaluation = model.evaluate(test_ds, return_dict=True)
print(f"\nTest Accuracy: {evaluation['accuracy']:.4f}")
# 6. Save the Model
model.save("exported_model")
print("Model saved to /app/exported_model")
import numpy as np
from sklearn.metrics import classification_report
# Get predictions on the test dataset
# The predict method returns probabilities for each class
predictions_prob = model.predict(test_ds)
# Convert probabilities to predicted class labels
predicted_labels = np.argmax(predictions_prob, axis=1)
# Extract true labels from the test_df
# Assuming LABEL was defined as the last column of df
true_labels = test_df[LABEL].values
# Get unique class labels (categories)
class_labels = np.unique(true_labels)
# Generate a classification report to show precision, recall, and F1-score for each class
print("\n--- Classification Report per Category ---")
print(classification_report(true_labels, predicted_labels, target_names=[str(c) for c in class_labels]))
# You can also manually calculate accuracy per class if preferred
print("\n--- Accuracy per Category ---")
for cls in class_labels:
    idx = (true_labels == cls)
    correct_predictions_for_cls = (predicted_labels[idx] == cls).sum()
    total_predictions_for_cls = idx.sum()
    if total_predictions_for_cls > 0:
        accuracy_for_cls = correct_predictions_for_cls / total_predictions_for_cls
        print(f"Category {cls}: Accuracy = {accuracy_for_cls:.4f}")
    else:
        print(f"Category {cls}: No true instances in test set.")
--- Model Summary ---
Model: "random_forest_model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
=================================================================
Total params: 1 (1.00 Byte)
Trainable params: 0 (0.00 Byte)
Non-trainable params: 1 (1.00 Byte)
_________________________________________________________________
Type: "RANDOM_FOREST"
Task: CLASSIFICATION
Label: "__LABEL"
Input Features (11):
0.0111599155
0.01483216
0.01739901
0.018406885
0.018967822
0.02049054
0.020988442
0.021692224
0.026393838
0.033805072
0.036814593
No weights
Variable Importance: INV_MEAN_MIN_DEPTH:
1. "0.021692224" 0.278557 ################
2. "0.036814593" 0.164558 #####
3. "0.026393838" 0.146458 ###
4. "0.01483216" 0.134337 ##
5. "0.02049054" 0.125129 #
6. "0.018406885" 0.124548 #
7. "0.01739901" 0.119835 #
8. "0.033805072" 0.116388 #
9. "0.018967822" 0.114600
10. "0.020988442" 0.105245
11. "0.0111599155" 0.105237
Variable Importance: NUM_AS_ROOT:
1. "0.021692224" 138.000000 ################
2. "0.026393838" 82.000000 #########
3. "0.033805072" 52.000000 #####
4. "0.018406885" 17.000000 #
5. "0.018967822" 10.000000 #
6. "0.020988442" 1.000000
Variable Importance: NUM_NODES:
1. "0.036814593" 33033.000000 ################
2. "0.01483216" 24259.000000 ######
3. "0.018406885" 23259.000000 #####
4. "0.020988442" 21752.000000 ###
5. "0.02049054" 21560.000000 ###
6. "0.0111599155" 20757.000000 ##
7. "0.026393838" 20511.000000 #
8. "0.018967822" 20197.000000 #
9. "0.01739901" 19841.000000 #
10. "0.033805072" 18931.000000
11. "0.021692224" 18752.000000
Variable Importance: SUM_SCORE:
1. "0.021692224" 6680292.214150 ################
2. "0.026393838" 3547615.844493 ########
3. "0.033805072" 2038345.698294 ####
4. "0.036814593" 1304726.855687 ##
5. "0.018406885" 1110520.264613 #
6. "0.01483216" 836882.331232 #
7. "0.018967822" 783745.709187
8. "0.02049054" 776627.158520
9. "0.020988442" 530432.154355
10. "0.01739901" 456566.598678
11. "0.0111599155" 401849.939701
Winner takes all: true
Out-of-bag evaluation: accuracy:0.998054 logloss:0.0105165
Number of trees: 300
Total number of nodes: 486004
Number of nodes by tree:
Count: 300 Average: 1620.01 StdDev: 76.5775
Min: 1423 Max: 1799 Ignored: 0
----------------------------------------------
[ 1423, 1441) 2 0.67% 0.67%
[ 1441, 1460) 6 2.00% 2.67% #
[ 1460, 1479) 4 1.33% 4.00% #
[ 1479, 1498) 8 2.67% 6.67% ##
[ 1498, 1517) 9 3.00% 9.67% ##
[ 1517, 1536) 10 3.33% 13.00% ##
[ 1536, 1554) 16 5.33% 18.33% ####
[ 1554, 1573) 19 6.33% 24.67% #####
[ 1573, 1592) 26 8.67% 33.33% ######
[ 1592, 1611) 41 13.67% 47.00% ##########
[ 1611, 1630) 34 11.33% 58.33% ########
[ 1630, 1649) 24 8.00% 66.33% ######
[ 1649, 1668) 20 6.67% 73.00% #####
[ 1668, 1686) 19 6.33% 79.33% #####
[ 1686, 1705) 13 4.33% 83.67% ###
[ 1705, 1724) 18 6.00% 89.67% ####
[ 1724, 1743) 11 3.67% 93.33% ###
[ 1743, 1762) 12 4.00% 97.33% ###
[ 1762, 1781) 5 1.67% 99.00% #
[ 1781, 1799] 3 1.00% 100.00% #
Depth by leafs:
Count: 243152 Average: 12.0108 StdDev: 2.39733
Min: 3 Max: 15 Ignored: 0
----------------------------------------------
[ 3, 4) 128 0.05% 0.05%
[ 4, 5) 389 0.16% 0.21%
[ 5, 6) 1084 0.45% 0.66%
[ 6, 7) 2770 1.14% 1.80% #
[ 7, 8) 6288 2.59% 4.38% #
[ 8, 9) 11242 4.62% 9.01% ##
[ 9, 10) 17797 7.32% 16.33% ####
[ 10, 11) 24395 10.03% 26.36% #####
[ 11, 12) 30327 12.47% 38.83% #######
[ 12, 13) 34392 14.14% 52.98% #######
[ 13, 14) 35111 14.44% 67.42% ########
[ 14, 15) 32791 13.49% 80.90% #######
[ 15, 15] 46438 19.10% 100.00% ##########
Number of training obs by leaf:
Count: 243152 Average: 745.589 StdDev: 6136.2
Min: 5 Max: 219057 Ignored: 0
----------------------------------------------
[ 5, 10957) 239559 98.52% 98.52% ##########
[ 10957, 21910) 1503 0.62% 99.14%
[ 21910, 32862) 888 0.37% 99.51%
[ 32862, 43815) 355 0.15% 99.65%
[ 43815, 54768) 235 0.10% 99.75%
[ 54768, 65720) 109 0.04% 99.79%
[ 65720, 76673) 68 0.03% 99.82%
[ 76673, 87626) 68 0.03% 99.85%
[ 87626, 98578) 62 0.03% 99.87%
[ 98578, 109531) 112 0.05% 99.92%
[ 109531, 120484) 56 0.02% 99.94%
[ 120484, 131436) 24 0.01% 99.95%
[ 131436, 142389) 32 0.01% 99.97%
[ 142389, 153342) 19 0.01% 99.97%
[ 153342, 164294) 18 0.01% 99.98%
[ 164294, 175247) 14 0.01% 99.99%
[ 175247, 186200) 20 0.01% 100.00%
[ 186200, 197152) 5 0.00% 100.00%
[ 197152, 208105) 2 0.00% 100.00%
[ 208105, 219057] 3 0.00% 100.00%
Attribute in nodes:
33033 : 0.036814593 [NUMERICAL]
24259 : 0.01483216 [NUMERICAL]
23259 : 0.018406885 [NUMERICAL]
21752 : 0.020988442 [NUMERICAL]
21560 : 0.02049054 [NUMERICAL]
20757 : 0.0111599155 [NUMERICAL]
20511 : 0.026393838 [NUMERICAL]
20197 : 0.018967822 [NUMERICAL]
19841 : 0.01739901 [NUMERICAL]
18931 : 0.033805072 [NUMERICAL]
18752 : 0.021692224 [NUMERICAL]
Attribute in nodes with depth <= 0:
138 : 0.021692224 [NUMERICAL]
82 : 0.026393838 [NUMERICAL]
52 : 0.033805072 [NUMERICAL]
17 : 0.018406885 [NUMERICAL]
10 : 0.018967822 [NUMERICAL]
1 : 0.020988442 [NUMERICAL]
Attribute in nodes with depth <= 1:
380 : 0.021692224 [NUMERICAL]
122 : 0.026393838 [NUMERICAL]
83 : 0.01739901 [NUMERICAL]
72 : 0.033805072 [NUMERICAL]
57 : 0.036814593 [NUMERICAL]
54 : 0.018406885 [NUMERICAL]
51 : 0.018967822 [NUMERICAL]
38 : 0.02049054 [NUMERICAL]
26 : 0.020988442 [NUMERICAL]
10 : 0.01483216 [NUMERICAL]
7 : 0.0111599155 [NUMERICAL]
Attribute in nodes with depth <= 2:
581 : 0.021692224 [NUMERICAL]
229 : 0.02049054 [NUMERICAL]
219 : 0.01483216 [NUMERICAL]
193 : 0.018406885 [NUMERICAL]
186 : 0.01739901 [NUMERICAL]
154 : 0.036814593 [NUMERICAL]
150 : 0.026393838 [NUMERICAL]
117 : 0.018967822 [NUMERICAL]
100 : 0.020988442 [NUMERICAL]
92 : 0.033805072 [NUMERICAL]
79 : 0.0111599155 [NUMERICAL]
Attribute in nodes with depth <= 3:
766 : 0.021692224 [NUMERICAL]
633 : 0.036814593 [NUMERICAL]
514 : 0.02049054 [NUMERICAL]
503 : 0.01483216 [NUMERICAL]
345 : 0.01739901 [NUMERICAL]
300 : 0.020988442 [NUMERICAL]
288 : 0.018406885 [NUMERICAL]
284 : 0.018967822 [NUMERICAL]
277 : 0.0111599155 [NUMERICAL]
272 : 0.026393838 [NUMERICAL]
190 : 0.033805072 [NUMERICAL]
Attribute in nodes with depth <= 5:
2877 : 0.036814593 [NUMERICAL]
1895 : 0.01483216 [NUMERICAL]
1725 : 0.02049054 [NUMERICAL]
1541 : 0.021692224 [NUMERICAL]
1244 : 0.01739901 [NUMERICAL]
1210 : 0.018406885 [NUMERICAL]
1146 : 0.020988442 [NUMERICAL]
1127 : 0.026393838 [NUMERICAL]
1069 : 0.0111599155 [NUMERICAL]
1041 : 0.018967822 [NUMERICAL]
878 : 0.033805072 [NUMERICAL]
Condition type in nodes:
242852 : HigherCondition
Condition type in nodes with depth <= 0:
300 : HigherCondition
Condition type in nodes with depth <= 1:
900 : HigherCondition
Condition type in nodes with depth <= 2:
2100 : HigherCondition
Condition type in nodes with depth <= 3:
4372 : HigherCondition
Condition type in nodes with depth <= 5:
15753 : HigherCondition
Node format: NOT_SET
Training OOB:
trees: 1, Out-of-bag evaluation: accuracy:0.996575 logloss:0.123458
trees: 9, Out-of-bag evaluation: accuracy:0.997327 logloss:0.0479432
trees: 19, Out-of-bag evaluation: accuracy:0.99767 logloss:0.0254875
trees: 29, Out-of-bag evaluation: accuracy:0.997806 logloss:0.0193881
trees: 39, Out-of-bag evaluation: accuracy:0.997889 logloss:0.0171792
trees: 49, Out-of-bag evaluation: accuracy:0.997902 logloss:0.0153532
trees: 59, Out-of-bag evaluation: accuracy:0.997955 logloss:0.0144014
trees: 69, Out-of-bag evaluation: accuracy:0.998018 logloss:0.0137202
trees: 79, Out-of-bag evaluation: accuracy:0.998029 logloss:0.0131981
trees: 89, Out-of-bag evaluation: accuracy:0.998024 logloss:0.0128051
trees: 99, Out-of-bag evaluation: accuracy:0.998036 logloss:0.0124228
trees: 109, Out-of-bag evaluation: accuracy:0.998042 logloss:0.0120357
trees: 119, Out-of-bag evaluation: accuracy:0.998049 logloss:0.0117484
trees: 129, Out-of-bag evaluation: accuracy:0.998054 logloss:0.0116494
trees: 139, Out-of-bag evaluation: accuracy:0.998069 logloss:0.0113382
trees: 149, Out-of-bag evaluation: accuracy:0.998067 logloss:0.0112786
trees: 159, Out-of-bag evaluation: accuracy:0.998066 logloss:0.0111747
trees: 169, Out-of-bag evaluation: accuracy:0.998064 logloss:0.0111785
trees: 179, Out-of-bag evaluation: accuracy:0.998071 logloss:0.0109639
trees: 189, Out-of-bag evaluation: accuracy:0.998066 logloss:0.0108541
trees: 199, Out-of-bag evaluation: accuracy:0.998052 logloss:0.0107001
trees: 209, Out-of-bag evaluation: accuracy:0.998041 logloss:0.0106965
trees: 219, Out-of-bag evaluation: accuracy:0.998056 logloss:0.0106957
trees: 229, Out-of-bag evaluation: accuracy:0.998066 logloss:0.0106392
trees: 239, Out-of-bag evaluation: accuracy:0.998075 logloss:0.010582
trees: 249, Out-of-bag evaluation: accuracy:0.998064 logloss:0.0105818
trees: 259, Out-of-bag evaluation: accuracy:0.998056 logloss:0.0105275
trees: 269, Out-of-bag evaluation: accuracy:0.998072 logloss:0.0105194
trees: 279, Out-of-bag evaluation: accuracy:0.998059 logloss:0.0105219
trees: 289, Out-of-bag evaluation: accuracy:0.998056 logloss:0.0105255
trees: 299, Out-of-bag evaluation: accuracy:0.998049 logloss:0.010518
trees: 300, Out-of-bag evaluation: accuracy:0.998054 logloss:0.0105165
152/152 [==============================] - 10s 65ms/step - loss: 0.0000e+00 - accuracy: 0.9979
Test Accuracy: 0.9979
WARNING:absl:`0.0111599155` is not a valid tf.function parameter name. Sanitizing to `arg_0_0111599155`.
WARNING:absl:`0.01483216` is not a valid tf.function parameter name. Sanitizing to `arg_0_01483216`.
WARNING:absl:`0.01739901` is not a valid tf.function parameter name. Sanitizing to `arg_0_01739901`.
WARNING:absl:`0.018406885` is not a valid tf.function parameter name. Sanitizing to `arg_0_018406885`.
WARNING:absl:`0.018967822` is not a valid tf.function parameter name. Sanitizing to `arg_0_018967822`.
Model saved to exported_model
---------------------------------------------------------------------
A side note on the summary above: the input features have numeric names such as "0.0111599155" because the CSV was written without a header row, so pandas promoted the first data row to column names (which also triggers the absl sanitization warnings).
The overall accuracy is very good, but let's look in detail at how each class is predicted.
---------------------------------------------------------------------
--- Classification Report per Category ---
              precision    recall  f1-score   support

           0       1.00      1.00      1.00    148127
           1       1.00      0.36      0.53        11
           2       0.75      0.86      0.80         7
           3       0.75      0.60      0.67        10
           5       1.00      0.63      0.78        41
           6       0.00      0.00      0.00         4
           7       1.00      0.09      0.16        35
           8       0.96      0.97      0.97      2744
          10       1.00      0.20      0.33        25
          11       0.90      0.64      0.75        14
          13       0.50      0.33      0.40         3
          14       0.93      0.24      0.38        55
          15       0.00      0.00      0.00         1

    accuracy                           1.00    151077
   macro avg       0.75      0.46      0.52    151077
weighted avg       1.00      1.00      1.00    151077
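The report makes the effect of the imbalance explicit: the dominant classes (0 and 8) are predicted almost perfectly, while most of the rare classes have low recall despite the 0.9979 overall accuracy. As a possible next step, here is a hedged sketch (reusing model and df from above; the patch name is hypothetical) that classifies every pixel of a full patch to obtain a prediction map:

import numpy as np
import pandas as pd
import rasterio
import tensorflow_decision_forests as tfdf

# Hypothetical patch name: substitute any 11-band patch from the dataset
with rasterio.open('./train/S2_1-12-19_48MYU_0.tif') as src:
    data = src.read()                      # (11, 256, 256)

bands, h, w = data.shape
pixels = data.reshape(bands, -1).T         # one row per pixel: (65536, 11)

# The column names must match the training dataframe (all columns except the label)
patch_df = pd.DataFrame(pixels, columns=df.columns[:-1])
patch_ds = tfdf.keras.pd_dataframe_to_tf_dataset(patch_df)

probs = model.predict(patch_ds)            # per-class probabilities
class_map = np.argmax(probs, axis=1).reshape(h, w)
print(class_map.shape, np.unique(class_map))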