Wednesday, February 18, 2026

Spectral waste dataset

This interesting project (I would have liked to build something practically identical myself) provides a set of waste images captured both with an RGB camera and with a 224-band hyperspectral camera, together with pre-trained segmentation models for several neural networks.


 

 

The goal is to perform segmentation on the following classes:

  • film
  • basket
  • videotape
  • filament
  • trashbag
  • cardboard 

The main purpose is therefore to identify the material that could jam the waste-treatment machinery.

The GitHub link is https://github.com/ferpb/spectralwaste-segmentation/tree/main while the project page is https://sites.google.com/unizar.es/spectralwaste

The paper can be consulted at this link

To run the project, Python 3.9 is required (Debian Trixie ships Python 3.13, with which it does not compile because of Cython).

A suitable environment must therefore be created first:



curl https://pyenv.run | bash
pyenv install 3.9.19
pyenv shell 3.9.19
python -m venv my_39_env
source my_39_env/bin/activate
git clone https://github.com/ferpb/spectralwaste-segmentation
cd spectralwaste-segmentation
pip install -e .


The raw images first go through a dimensionality-reduction step using PCA, FastICA, or FactorAnalysis (see dim_reduction.py).
The following neural networks are then tested: SegFormer, multimodal SegFormer, MiniNet, multimodal MiniNet, and CMX.
At that point the checkpoints of the already-trained models are available at this link. The .pth files are organized by model and by type of image pre-processing.
The .pth files can be used for inference with the Python notebook in the GitHub repository folder.
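For the dimensionality-reduction step, here is a minimal sketch of the idea using scikit-learn, assuming the hyperspectral cube has already been loaded as a (bands, rows, cols) numpy array (the actual dim_reduction.py in the repository may differ):

import numpy as np
from sklearn.decomposition import PCA  # FastICA or FactorAnalysis can be swapped in the same way

def reduce_cube(cube, n_components=3):
    """Reduce a (bands, rows, cols) hyperspectral cube to n_components pseudo-bands."""
    bands, rows, cols = cube.shape
    pixels = cube.reshape(bands, rows * cols).T          # (pixels, bands), as scikit-learn expects
    reduced = PCA(n_components=n_components).fit_transform(pixels)
    return reduced.T.reshape(n_components, rows, cols)   # back to image layout

# example: preview = reduce_cube(cube, 3) gives a 3-band image usable as a false-colour preview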
 
The hyperspectral images were acquired with a Specim FX17 (900-1700 nm).
They are included in the dataset as multi-page TIFF files, which are not straightforward to handle.

To obtain the spectrum of a point at coordinates x,y I wrote a small script:
import matplotlib.pyplot as plt
from PIL import Image

x=50
y=100

pixel_values = []

with Image.open('1.tiff') as img:
    for i in range(img.n_frames):
        img.seek(i)
        pixel = img.getpixel((x, y))
        pixel_values.append(pixel)

x_values = [900 + (i * 3.57) for i in range(len(pixel_values))]

# Plotting
plt.figure(figsize=(10, 6))
plt.plot(x_values, pixel_values, color='blue', label='Intensity')

plt.title('Pixel Value')
plt.xlabel('Lambda (nm)')
plt.ylabel('Pixel Intensity')
plt.grid(True, linestyle='--', alpha=0.6)
plt.legend()
plt.show()
 

The network training masks are in TIFF format and must be contrast-stretched to be viewed.
The pixel value corresponds to the class.
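For a quick check of a mask, a minimal sketch with PIL and numpy (the file name '1_mask.tiff' is hypothetical):

import numpy as np
from PIL import Image

# load one label mask ('1_mask.tiff' is a hypothetical name)
mask = np.array(Image.open('1_mask.tiff'))

# which classes are present and how many pixels belong to each
values, counts = np.unique(mask, return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))

# stretch the class values over 0-255 so the mask becomes visible in an ordinary viewer
stretched = (mask.astype(np.float32) / max(int(mask.max()), 1) * 255).astype(np.uint8)
Image.fromarray(stretched).save('mask_stretched.png')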
 

To make things easier, the multi-page GeoTIFF image can be converted to ENVI format.
This way the Spectrum View tool can be used in ESA SNAP.
 

 
 
import rasterio
from PIL import Image
import numpy as np

input_file = '1.tiff'
output_base = 'output_envi'
output_dat = f'{output_base}.dat'
output_hdr = f'{output_base}.hdr'
start_wavelength = 900.0
step = 3.57

with Image.open(input_file) as img:
    n_bands = img.n_frames
    width, height = img.size

wavelength_values = [start_wavelength + (i * step) for i in range(n_bands)]

with rasterio.open(
        output_dat, 'w',
        driver='ENVI',
        height=height, width=width,
        count=n_bands,
        dtype='float32') as dst:
    with Image.open(input_file) as img:
        for i in range(n_bands):
            img.seek(i)
            band_data = np.array(img).astype('float32') / 65535.0
            dst.write(band_data, i + 1)

wavelength_str = ", ".join([f"{w:.2f}" for w in wavelength_values])

hdr_content = f"""ENVI    
description = {{ Product converted for ESA SNAP }}
samples = {width}
lines = {height}
bands = {n_bands}
header offset = 0
file type = ENVI Standard
data type = 4
interleave = bsq
byte order = 0
wavelength units = nanometers
wavelength = {{
{wavelength_str}
}}
"""

with open(output_hdr, 'w') as f:
    f.write(hdr_content)

print(f"Conversione completata. Apri il file {output_hdr} in SNAP.")
 
 
 

 

Monday, February 16, 2026

Repairing a SIGMA 51-SVR mechanical calculator

I bought this SIGMA 51-SVR mechanical calculator for a few euros; it is estimated to date from the 1950s and is based on the Odhner mechanism.

  


The calculator cost so little because it was jammed.


After opening the cover and lubricating everything, the mechanisms gradually freed up.


 

 

 


Remote sensing of plastics at sea

Bibliography

Biermann, L., Clewley, D., Martinez-Vicente, V. et al. Finding Plastic Patches in Coastal Waters using Optical Satellite Data. Sci Rep 10, 5364 (2020). https://doi.org/10.1038/s41598-020-62298-z 

Moshtaghi, M., Knaeps, E., Sterckx, S. et al. Spectral reflectance of marine macroplastics in the VNIR and SWIR measured in a controlled environment. Sci Rep 11, 5436 (2021). https://doi.org/10.1038/s41598-021-84867-6

Kikaki, K., Kakogeorgiou, I., Mikeli, P., Raitsos, D.E., Karantzalos, K. MARIDA: A benchmark for Marine Debris detection from Sentinel-2 remote sensing data. PLoS ONE 17(1), e0262247 (2022). https://doi.org/10.1371/journal.pone.0262247

Microplastics in water

Monnanni, A., Rimondi, V., Morelli, G., Nannoni, A., Cincinelli, A., Martellini, T., Chelazzi, D., Laurati, M., Sforzi, L., Ciani, F., Lattanzi, P., Costagliola, P. Microplastics and microfibers contamination in the Arno River (Central Italy): Impact from urban areas and contribution to the Mediterranean Sea. Science of The Total Environment 955, 177113 (2024). ISSN 0048-9697. https://doi.org/10.1016/j.scitotenv.2024.177113 (https://www.sciencedirect.com/science/article/pii/S004896972407270X)

Microplastics in sediments

Mistri, M., Scoponi, M., Granata, T., Moruzzi, L., Massara, F., Munari, C. Types, occurrence and distribution of microplastics in sediments from the northern Tyrrhenian Sea. Marine Pollution Bulletin (2020). https://doi.org/10.1016/j.marpolbul.2020.111016

 

Note: plastics in water appear altered by their time in the liquid. Moreover, the spectral signature is strongly affected even by a thin film of water on the floating plastic. As regards the SWIR portion of the spectrum, the presence of sediment can disturb the response of the computed spectral indices.

----------------------------------------------- 

For Landsat 8, the common method for detecting plastics is based on the index

  Plastic Index  = Band 5 (NIR) / (Band 5 + Band 6 (SWIR1))


which relies on the fact that the spectral response of plastics has a peak in the near infrared and an absorption in the SWIR.

 


 

 var areaOfInterest =

    /* color: #98ff00 */
    /* shown: false */
    ee.Geometry.MultiPolygon(
        [[[[9.950883097809994, 43.88872579729536],
           [9.934403605622494, 43.48554689334167],
           [10.292832560700619, 43.51443524985007],
           [10.269486613434994, 43.53634133439625],
           [10.263993449372494, 43.62985044452412],
           [10.259873576325619, 43.7311534518937],
           [10.253007121247494, 43.76289953785138],
           [10.182969279450619, 43.910495519373605]]],
         [[[9.911900455707178, 43.92257421254923],
           [9.872075016254053, 43.45486023186587],
           [10.347233707660303, 43.44688454959106],
           [10.314274723285303, 43.50169612390543],
           [10.299168522113428, 43.51862782505474],
           [10.268956119769678, 43.55048637670663],
           [10.277195865863428, 43.565414278891666],
           [10.288182193988428, 43.57735393781841],
           [10.281315738910303, 43.60918145612981],
           [10.268956119769678, 43.656891171520144],
           [10.273075992816553, 43.70158462959755],
           [10.266209537738428, 43.73334638914901],
           [10.262089664691553, 43.78492333929173],
           [10.238743717425928, 43.83843695830995],
           [10.212651188129053, 43.89289227846965]]]]);


// 2. Pre-processing Function for Landsat 8
var processL8 = function(image) {
  var opticalBands = image.select('SR_B.*').multiply(0.0000275).add(-0.2);
  
  // Plastic Index Formula: NIR / (NIR + SWIR1)
  // Landsat 8: B5 is NIR, B6 is SWIR1
  var plasticIndex = opticalBands.expression(
    'NIR / (NIR + SWIR1)', {
      'NIR': opticalBands.select('SR_B5'),
      'SWIR1': opticalBands.select('SR_B6')
    }).rename('PI');
    
  return image.addBands(plasticIndex).select('PI');
};

// 3. Load 2025 Median Plastic Index
var pi2025 = ee.ImageCollection("LANDSAT/LC08/C02/T1_L2")
  .filterBounds(areaOfInterest)
  .filterDate('2025-01-01', '2025-12-31')
  .map(processL8)
  .median()
  .clip(areaOfInterest);

// 4. Visualization Parameters
// Plastic usually results in higher values in this index (closer to 0.4 - 0.6)
var piViz = {
  min: 0, 
  max: 0.6, 
  palette: ['#0000FF', '#00FFFF', '#FFFF00', '#FF0000'] // Blue to Red
};

// 5. Display the Map
Map.addLayer(pi2025, piViz, 'Plastic Index Median 2025');

// --- 6. ADDING THE LEGEND ---

var legend = ui.Panel({
  style: {
    position: 'bottom-left',
    padding: '8px 15px'
  }
});

var legendTitle = ui.Label({
  value: 'Plastic Index (2025)',
  style: {fontWeight: 'bold', fontSize: '16px', margin: '0 0 4px 0', padding: '0'}
});
legend.add(legendTitle);

var makeRow = function(color, name) {
  var colorBox = ui.Label({
    style: {
      backgroundColor: color,
      padding: '8px',
      margin: '0 0 4px 0'
    }
  });
  var description = ui.Label({
    value: name,
    style: {margin: '0 0 4px 6px'}
  });
  return ui.Panel({
    widgets: [colorBox, description],
    layout: ui.Panel.Layout.Flow('horizontal')
  });
};

// Labels adjusted for Plastic Index typical ranges
var names = ['Water/Low', 'Low Probability', 'Medium Probability', 'High Probability (Potential Plastic)'];
var colors = ['#0000FF', '#00FFFF', '#FFFF00', '#FF0000'];

for (var i = 0; i < 4; i++) {
  legend.add(makeRow(colors[i], names[i]));
}

Map.add(legend);

// 7. Export the result
Export.image.toDrive({
  image: pi2025,
  description: 'Plastic_Index_2025',
  folder: 'EarthEngine_Exports',
  fileNamePrefix: 'PI_2025_Site',
  region: areaOfInterest,
  scale: 30,
  crs: 'EPSG:4326',
  fileFormat: 'GeoTIFF',
  formatOptions: {
    cloudOptimized: true
  }
});

The problem is that, in the near infrared, the response of plastics is similar to that of vegetation (algae specifically).


 

To try to discriminate between them, NDVI and PI can be cross-checked.

 

/**
 * Landsat 8 Plastic vs. Algae Discrimination Script
 * 2025 Median Composite
 */

var areaOfInterest =

    /* color: #98ff00 */
    /* shown: false */
    ee.Geometry.MultiPolygon(
        [[[[9.950883097809994, 43.88872579729536],
           [9.934403605622494, 43.48554689334167],
           [10.292832560700619, 43.51443524985007],
           [10.269486613434994, 43.53634133439625],
           [10.263993449372494, 43.62985044452412],
           [10.259873576325619, 43.7311534518937],
           [10.253007121247494, 43.76289953785138],
           [10.182969279450619, 43.910495519373605]]],
         [[[9.911900455707178, 43.92257421254923],
           [9.872075016254053, 43.45486023186587],
           [10.347233707660303, 43.44688454959106],
           [10.314274723285303, 43.50169612390543],
           [10.299168522113428, 43.51862782505474],
           [10.268956119769678, 43.55048637670663],
           [10.277195865863428, 43.565414278891666],
           [10.288182193988428, 43.57735393781841],
           [10.281315738910303, 43.60918145612981],
           [10.268956119769678, 43.656891171520144],
           [10.273075992816553, 43.70158462959755],
           [10.266209537738428, 43.73334638914901],
           [10.262089664691553, 43.78492333929173],
           [10.238743717425928, 43.83843695830995],
           [10.212651188129053, 43.89289227846965]]]]);




// 2. Cloud and Water Masking
var maskCloudsAndLand = function(image) {
  var qa = image.select('QA_PIXEL');
  // Cloud/Cirrus Mask
  var cloudMask = qa.bitwiseAnd(1 << 1).eq(0)
    .and(qa.bitwiseAnd(1 << 2).eq(0))
    .and(qa.bitwiseAnd(1 << 3).eq(0))
    .and(qa.bitwiseAnd(1 << 4).eq(0));
  
  // Optional: Simple Water Mask (using QA bits)
  // This helps focus only on marine/river surfaces
  var waterMask = qa.bitwiseAnd(1 << 7).neq(0); 
  
  return image.updateMask(cloudMask).updateMask(waterMask);
};

// 3. Discrimination Processing Function
var processDiscrimination = function(image) {
  // Scaling factors for Landsat 8 C2 L2
  var optical = image.select('SR_B.*').multiply(0.0000275).add(-0.2);
  
  // A. NDVI: Highlights Chlorophyll (Algae/Vegetation)
  var ndvi = optical.normalizedDifference(['SR_B5', 'SR_B4']).rename('NDVI');
  
  // B. Plastic Index (PI): Highlights high-reflectance floating solids
  var pi = optical.expression('B5 / (B5 + B6)', {
    'B5': optical.select('SR_B5'), // NIR
    'B6': optical.select('SR_B6')  // SWIR1
  }).rename('PI');

  // C. Discrimination Logic
  // Plastic: Moderate/High PI AND relatively low NDVI (compared to algae)
  // Algae: Very High NDVI AND lower PI (due to SWIR absorption by water content)
  var plasticMask = pi.gt(0.4).and(ndvi.lt(0.2)).rename('Plastic_Likely');
  var algaeMask = ndvi.gt(0.3).and(pi.lt(0.4)).rename('Algae_Likely');
  
  return image.addBands([ndvi, pi, plasticMask, algaeMask]);
};

// 4. Load Collection and Create Median
var collection = ee.ImageCollection("LANDSAT/LC08/C02/T1_L2")
  .filterBounds(areaOfInterest)
  .filterDate('2025-01-01', '2025-12-31')
  .map(maskCloudsAndLand)
  .map(processDiscrimination);

var finalMedian = collection.median().clip(areaOfInterest);

// 5. Visualization
Map.centerObject(areaOfInterest, 12);

// Natural Color for context
Map.addLayer(finalMedian, {bands:['SR_B4','SR_B3','SR_B2'], min:0, max:0.3}, 'Natural Color RGB');

// Algae Layer (Green)
Map.addLayer(finalMedian.select('Algae_Likely').selfMask(), {palette: ['#00FF00']}, 'Detected Algae');

// Plastic Layer (Red)
Map.addLayer(finalMedian.select('Plastic_Likely').selfMask(), {palette: ['#FF0000']}, 'Detected Plastic');

// 6. Legend for UI
var legend = ui.Panel({style: {position: 'bottom-right', padding: '8px 15px'}});
legend.add(ui.Label({value: 'Classification', style: {fontWeight: 'bold'}}));
var addLegendRow = function(color, text) {
  var row = ui.Panel({layout: ui.Panel.Layout.Flow('horizontal')});
  row.add(ui.Label({style: {backgroundColor: color, padding: '8px', margin: '0 0 4px 0'}}));
  row.add(ui.Label({value: text, style: {margin: '0 0 4px 6px'}}));
  legend.add(row);
};
addLegendRow('#00FF00', 'Organic / Algae (High NDVI)');
addLegendRow('#FF0000', 'Plastic / Debris (High PI, Low NDVI)');
Map.add(legend);

// 7. Export
Export.image.toDrive({
  image: finalMedian.select(['Plastic_Likely', 'Algae_Likely']),
  description: 'Plastic_vs_Algae_2025',
  folder: 'EarthEngine_Exports',
  region: areaOfInterest,
  scale: 30
}); 

 


All very straightforward, and it is also reasonable that at the mouth of a river like the Arno there should be an accumulation of microplastics at sea. But reading the Monnanni et al. 2024 paper, the estimated quantity of fibres is 7.6 t/y, with 56,011 ± 16,411 particles per litre, mostly around 60 μm in size.

Considering that studies on Sentinel-2 have set the detection limit for plastic at 25% of the pixel surface, and that a pixel is 100 square metres (10x10 m), this means a plastic patch of at least 25 m². A similar limit (30 m² on a 30x30 m pixel) is reported for Landsat 8. It is therefore not conceivable to detect from satellite the concentrations reported in the Monnanni paper.
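As a sanity check on those figures, a minimal sketch of the arithmetic (the 25% coverage fraction is the one quoted above; min_detectable_area is just an illustrative helper):

def min_detectable_area(pixel_side_m, coverage_fraction=0.25):
    """Smallest floating-plastic patch (m^2) detectable in one square pixel."""
    return coverage_fraction * pixel_side_m ** 2

print(min_detectable_area(10))   # Sentinel-2 (10 m pixel): 0.25 * 100 m^2 = 25 m^2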

Moving to hyperspectral data, a first difference becomes apparent (note that the EnMAP data were corrected with the water-correction algorithm, which means the product downloaded from EOWeb is masked over land). In the Sentinel-2 multispectral data the reflectance is low but appreciable, while in the hyperspectral data practically only noise is recorded.

 
Sentinel 2

EnMAP (water correction)

This behaviour is caused by EnMAP's MIP (Modular Inversion Program) algorithm, which in practice forces reflectance values above 900 nm to zero. To obtain behaviour similar to Sentinel-2, the EnMAP image must therefore be processed on EOWeb with the land atmospheric correction algorithm (which uses SICOR), even though the area of interest is at sea.

 

Using the land atmospheric correction, the resulting spectrum for points over water looks like this:

The points with reflectance 6.5 are clearly errors and must be discarded. Adjusting the vertical scale of the plot,

it can be seen that EnMAP's Land algorithm does not cut off the SWIR wavelengths as the Water algorithm does (although, with reflectances this low, the instrumental noise is obviously very significant).
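To discard those spikes before plotting, the values can simply be masked; a minimal numpy sketch, assuming wavelengths and reflectance are 1-D arrays already extracted for one pixel, with an arbitrary 6.0 cutoff:

import numpy as np
import matplotlib.pyplot as plt

def plot_clean_spectrum(wavelengths, reflectance, cutoff=6.0):
    """Plot a spectrum, dropping the obviously wrong samples (e.g. the ~6.5 spikes)."""
    wavelengths = np.asarray(wavelengths)
    reflectance = np.asarray(reflectance, dtype=float)
    valid = reflectance < cutoff      # keep only plausible values
    plt.plot(wavelengths[valid], reflectance[valid])
    plt.xlabel('Lambda (nm)')
    plt.ylabel('Reflectance')
    plt.show()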

 

To replicate with the hyperspectral data what was done with the multispectral spectral indices (the only image available covers the mouth of the Serchio), a first-derivative approach is used:

Red Edge Slope (Algae): (B_705nm - B_680nm) / 25 

Hydrocarbon Slope (Plastic): (B_1720nm - B_1730nm) / 15 

In EnMAP, with Band Math, these become:

Algae = (Band054-Band050)/30  (706-679 nm) 
Hydro = (Band160-Band162)/21 (1717-1738 nm)

RGB

 

Hydro
 

Algae

 

With the following rule, the potential plastic locations are highlighted:

if (Plastic_Derivative > 0.005 && Algae_Derivative < 0.002) then 1 else 0  
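In numpy, the two derivatives and the rule above can be sketched as follows, assuming the EnMAP scene has been loaded as a (bands, rows, cols) reflectance array and using the band numbers quoted above (0-based indexing, so Band054 becomes index 53):

import numpy as np

def plastic_candidates(cube):
    """cube: EnMAP reflectance as a (bands, rows, cols) numpy array (assumed already loaded)."""
    # Red Edge slope (algae): Band054 - Band050 over ~30 nm (706 - 679 nm)
    algae = (cube[53] - cube[49]) / 30.0
    # Hydrocarbon slope (plastic): Band160 - Band162 over ~21 nm (1717 - 1738 nm)
    hydro = (cube[159] - cube[161]) / 21.0
    # decision rule from the text: 1 = potential plastic, 0 = everything else
    return ((hydro > 0.005) & (algae < 0.002)).astype(np.uint8)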

 



To finish, a multispectral method based on the FDI (Floating Debris Index):

 

var aoi = 
    /* color: #d63000 */
    /* shown: false */
    /* displayProperties: [
      {
        "type": "rectangle"
      }
    ] */
    ee.Geometry.Polygon(
        [[[8.78864704620947, 43.906970344109496],
          [8.78864704620947, 43.094138414833864],
          [10.18391071808447, 43.094138414833864],
          [10.18391071808447, 43.906970344109496]]], null, false);

 

/**
 * Detect Plastic Debris in Oceans using Remote Sensing (Sentinel-2)
 * Based on the methodology by Jonas (Medium: @escuvert)
 * Optimized for Noise Reduction in SWIR Bands
 *
 * Logic:
 * 1. FDI (Floating Debris Index) identifies floating matter (plastic/vegetation).
 * 2. NDVI (Normalized Difference Vegetation Index) identifies organic vegetation.
 * 3. Plastic is identified where FDI is high and NDVI is low.
 */

// 1. Define Area of Interest (Ensure 'aoi' is defined in your imports or geometry tools)
Map.centerObject(aoi, 9);

// 2. Load and Pre-process Sentinel-2 Data
// Using the Harmonized collection to ensure consistent scaling across time
var dataset = ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
  .filterBounds(aoi)
  .filterDate('2021-08-01', '2021-08-31')
  .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 5))
  .median() // STEP 1: Temporal Reduction (Option 1) - removes transient noise/waves
  .clip(aoi);

// 3. Remove SWIR Noise (Spatial Filtering)
// SWIR bands (B11) often have sensor noise that creates "fake" plastic pixels.
// We apply a focal median filter to smooth out these spikes.
var swirClean = dataset.select('B11')
  .focal_median({radius: 20, units: 'meters', iterations: 1})
  .rename('B11_clean');

// Add the cleaned band back into our main image
var img = dataset.addBands(swirClean, null, true);

// 4. Create a Water Mask
// This prevents the script from detecting plastic on land/buildings.
var ndwi = img.normalizedDifference(['B3', 'B8']).rename('NDWI');
var waterMask = ndwi.gt(0.1); // Keep only water pixels

// 5. Define the Bands for Indices
var nir = img.select('B8');      // 833nm
var re2 = img.select('B6');      // 740nm
var swir1 = img.select('B11_clean'); // Using our noise-reduced SWIR
var red = img.select('B4');      // 665nm

// 6. Calculate NDVI (to distinguish vegetation from plastic)
var ndvi = img.normalizedDifference(['B8', 'B4']).rename('NDVI');

// 7. Calculate FDI (Floating Debris Index)
// Formula: NIR - [RE2 + (SWIR1 - RE2) * ( (λNIR - λRE2) / (λSWIR1 - λRE2) )]
var lambdaNIR = 833;
var lambdaRE2 = 740;
var lambdaSWIR1 = 1610;

var fdi = nir.subtract(
  re2.add(
    swir1.subtract(re2).multiply(
      (lambdaNIR - lambdaRE2) / (lambdaSWIR1 - lambdaRE2)
    )
  )
).rename('FDI');

// 8. Apply Thresholds and Masking
// FDI > 0.03 (Floating debris) 
// NDVI < 0.1 (Likely non-organic/plastic)
// waterMask (Only in the ocean/bay)
var fdiThreshold = 0.03; 
var ndviThreshold = 0.1;

var plasticDebris = fdi.gt(fdiThreshold)
  .and(ndvi.lt(ndviThreshold))
  .and(waterMask);

var plasticMasked = plasticDebris.updateMask(plasticDebris);

// 9. Visualization
var rgbVis = {min: 0, max: 3000, bands: ['B4', 'B3', 'B2']};
Map.addLayer(img, rgbVis, 'Original Sentinel-2 (RGB)');
Map.addLayer(fdi.updateMask(waterMask), {min: -0.05, max: 0.1, palette: ['blue', 'yellow', 'red']}, 'FDI (Heatmap)');
Map.addLayer(plasticMasked, {palette: ['#FF00FF']}, 'Detected Plastic Debris');

// 10. Print Stats
var area = plasticDebris.multiply(ee.Image.pixelArea()).reduceRegion({
  reducer: ee.Reducer.sum(),
  geometry: aoi,
  scale: 10,
  maxPixels: 1e9
});

print('Estimated Plastic Area (sq meters):', area.get('FDI')); 

 

The problem with this approach is the noise coming from the SWIR sensor.

 

FDI

 

Plastic

 





Sunday, February 15, 2026

MARIDA dataset

Kikaki K, Kakogeorgiou I, Mikeli P, Raitsos DE, Karantzalos K (2022) MARIDA: A benchmark for Marine Debris detection from Sentinel-2 remote sensing data. PLoS ONE 17(1): e0262247. https://doi.org/10.1371/journal.pone.0262247

///////////////////////////////////////////  

MARIDA is a freely downloadable dataset of 1381 classified Sentinel-2 patches that can be used to test algorithms.

 

It can be downloaded from https://www.kaggle.com/datasets/weinima/marida or https://zenodo.org/records/5151941

The patches folder contains 11-band GeoTIFFs of 256x256 pixels, together with a file with the same name ending in *_cl.tif that holds the pixel-by-pixel classification of the multispectral patch.

Classes

1: Marine Debris
2: Dense Sargassum
3: Sparse Sargassum
4: Natural Organic Material
5: Ship
6: Clouds
7: Marine Water
8: Sediment-Laden Water
9: Foam
10: Turbid Water
11: Shallow Water
12: Waves
13: Cloud Shadows
14: Wakes
15: Mixed Water
 

 


As can be seen, the number of samples in each class varies enormously.


 


To train a random forest I built a CSV file in which each row contains the values of the 11 bands plus the class label.

import rasterio
from pathlib import Path
import os

# To count samples per class afterwards:
# awk -F, '{counts[$NF]++} END {for (val in counts) print val, counts[val]}' yourfile.csv | sort -n

folder_path = './train/'
files = [f for f in os.listdir(folder_path)
         if f.endswith('.tif') and not f.endswith('_cl.tif')]

for filename in files:
    base_name = Path(filename).stem

    # read the 11-band patch and the corresponding class mask once per file
    with rasterio.open(folder_path + base_name + ".tif") as src:
        data = src.read()          # shape: (bands, rows, cols)
    with rasterio.open(folder_path + base_name + "_cl.tif") as src:
        pixel_cat = src.read()     # shape: (1, rows, cols)

    # one CSV row per pixel: the 11 band values followed by the class label
    for r in range(256):
        for c in range(256):
            csv_string = ",".join(map(str, data[:, r, c]))
            csv_string = csv_string + "," + str(int(pixel_cat[0, r, c]))
            print(csv_string)

One of the problems is that class 0 alone accounts for 99% of the data (a possible downsampling mitigation is sketched after the counts below):

0 740486
1 73
2 40
3 42
5 200
6 16
7 223
8 13828
9 2
10 120
11 67
13 27
14 254
15 5
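One possible mitigation (not applied here) is to downsample the dominant class before training; a rough sketch with pandas, assuming the CSV built above is read without a header row and the label is in the last column (file names and the 50,000-row figure are arbitrary):

import pandas as pd

df = pd.read_csv('spettri3.csv', header=None)
label_col = df.columns[-1]

majority = df[df[label_col] == 0].sample(n=50000, random_state=42)   # keep only part of class 0
minority = df[df[label_col] != 0]                                    # keep all the other classes

balanced = pd.concat([majority, minority]).sample(frac=1, random_state=42)  # shuffle the rows
balanced.to_csv('spettri3_balanced.csv', index=False, header=False)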

Let's use the CSV file to train a random forest (on Google Colab):

 

from google.colab import drive
drive.mount('/content/drive')

# Install the matching pair for stability
!pip install tensorflow-decision-forests wurlitzer
import os

# Force Keras 2 usage for TF-DF compatibility (must be set before importing TensorFlow)
os.environ['TF_USE_LEGACY_KERAS'] = '1'

import pandas as pd
import tensorflow as tf
import tensorflow_decision_forests as tfdf
from sklearn.model_selection import train_test_split

# Verify the versions
print(f"TensorFlow version: {tf.__version__}")
print(f"TF-DF version: {tfdf.__version__}")

# 1. Load Data
df = pd.read_csv('/content/drive/My Drive/spettri3.csv')

# the class label is in the last column
LABEL = df.columns[-1]

# 2. Split Data
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

# 3. Convert to TF Dataset
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label=LABEL)
test_ds = tfdf.keras.pd_dataframe_to_tf_dataset(test_df, label=LABEL)

# 4. Create and Train Random Forest
model = tfdf.keras.RandomForestModel(task=tfdf.keras.Task.CLASSIFICATION)
model.compile(metrics=["accuracy"])

print("Starting Training...")
model.fit(train_ds)

# 5. Summary and Evaluation
print("\n--- Model Summary ---")
model.summary()

evaluation = model.evaluate(test_ds, return_dict=True)
print(f"\nTest Accuracy: {evaluation['accuracy']:.4f}")

# 6. Save the Model
model.save("exported_model")
print("Model saved to /app/exported_model")
import numpy as np
from sklearn.metrics import classification_report

# Get predictions on the test dataset
# The predict method returns probabilities for each class
predictions_prob = model.predict(test_ds)

# Convert probabilities to predicted class labels
predicted_labels = np.argmax(predictions_prob, axis=1)

# Extract true labels from the test_df
# Assuming LABEL was defined as the last column of df
true_labels = test_df[LABEL].values

# Get unique class labels (categories)
class_labels = np.unique(true_labels)

# Generate a classification report to show precision, recall, and F1-score for each class
print("\n--- Classification Report per Category ---")
print(classification_report(true_labels, predicted_labels, target_names=[str(c) for c in class_labels]))

# You can also manually calculate accuracy per class if preferred
print("\n--- Accuracy per Category ---")
for cls in class_labels:
    idx = (true_labels == cls)
    correct_predictions_for_cls = (predicted_labels[idx] == cls).sum()
    total_predictions_for_cls = idx.sum()
    if total_predictions_for_cls > 0:
        accuracy_for_cls = correct_predictions_for_cls / total_predictions_for_cls
        print(f"Category {cls}: Accuracy = {accuracy_for_cls:.4f}")
    else:
        print(f"Category {cls}: No true instances in test set.")
 
 

  

--- Model Summary ---
Model: "random_forest_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
=================================================================
Total params: 1 (1.00 Byte)
Trainable params: 0 (0.00 Byte)
Non-trainable params: 1 (1.00 Byte)
_________________________________________________________________
Type: "RANDOM_FOREST"
Task: CLASSIFICATION
Label: "__LABEL"

Input Features (11):
	0.0111599155
	0.01483216
	0.01739901
	0.018406885
	0.018967822
	0.02049054
	0.020988442
	0.021692224
	0.026393838
	0.033805072
	0.036814593

No weights

Variable Importance: INV_MEAN_MIN_DEPTH:
    1.  "0.021692224"  0.278557 ################
    2.  "0.036814593"  0.164558 #####
    3.  "0.026393838"  0.146458 ###
    4.   "0.01483216"  0.134337 ##
    5.   "0.02049054"  0.125129 #
    6.  "0.018406885"  0.124548 #
    7.   "0.01739901"  0.119835 #
    8.  "0.033805072"  0.116388 #
    9.  "0.018967822"  0.114600 
   10.  "0.020988442"  0.105245 
   11. "0.0111599155"  0.105237 

Variable Importance: NUM_AS_ROOT:
    1. "0.021692224" 138.000000 ################
    2. "0.026393838" 82.000000 #########
    3. "0.033805072" 52.000000 #####
    4. "0.018406885" 17.000000 #
    5. "0.018967822" 10.000000 #
    6. "0.020988442"  1.000000 

Variable Importance: NUM_NODES:
    1.  "0.036814593" 33033.000000 ################
    2.   "0.01483216" 24259.000000 ######
    3.  "0.018406885" 23259.000000 #####
    4.  "0.020988442" 21752.000000 ###
    5.   "0.02049054" 21560.000000 ###
    6. "0.0111599155" 20757.000000 ##
    7.  "0.026393838" 20511.000000 #
    8.  "0.018967822" 20197.000000 #
    9.   "0.01739901" 19841.000000 #
   10.  "0.033805072" 18931.000000 
   11.  "0.021692224" 18752.000000 

Variable Importance: SUM_SCORE:
    1.  "0.021692224" 6680292.214150 ################
    2.  "0.026393838" 3547615.844493 ########
    3.  "0.033805072" 2038345.698294 ####
    4.  "0.036814593" 1304726.855687 ##
    5.  "0.018406885" 1110520.264613 #
    6.   "0.01483216" 836882.331232 #
    7.  "0.018967822" 783745.709187 
    8.   "0.02049054" 776627.158520 
    9.  "0.020988442" 530432.154355 
   10.   "0.01739901" 456566.598678 
   11. "0.0111599155" 401849.939701 



Winner takes all: true
Out-of-bag evaluation: accuracy:0.998054 logloss:0.0105165
Number of trees: 300
Total number of nodes: 486004

Number of nodes by tree:
Count: 300 Average: 1620.01 StdDev: 76.5775
Min: 1423 Max: 1799 Ignored: 0
----------------------------------------------
[ 1423, 1441)  2   0.67%   0.67%
[ 1441, 1460)  6   2.00%   2.67% #
[ 1460, 1479)  4   1.33%   4.00% #
[ 1479, 1498)  8   2.67%   6.67% ##
[ 1498, 1517)  9   3.00%   9.67% ##
[ 1517, 1536) 10   3.33%  13.00% ##
[ 1536, 1554) 16   5.33%  18.33% ####
[ 1554, 1573) 19   6.33%  24.67% #####
[ 1573, 1592) 26   8.67%  33.33% ######
[ 1592, 1611) 41  13.67%  47.00% ##########
[ 1611, 1630) 34  11.33%  58.33% ########
[ 1630, 1649) 24   8.00%  66.33% ######
[ 1649, 1668) 20   6.67%  73.00% #####
[ 1668, 1686) 19   6.33%  79.33% #####
[ 1686, 1705) 13   4.33%  83.67% ###
[ 1705, 1724) 18   6.00%  89.67% ####
[ 1724, 1743) 11   3.67%  93.33% ###
[ 1743, 1762) 12   4.00%  97.33% ###
[ 1762, 1781)  5   1.67%  99.00% #
[ 1781, 1799]  3   1.00% 100.00% #

Depth by leafs:
Count: 243152 Average: 12.0108 StdDev: 2.39733
Min: 3 Max: 15 Ignored: 0
----------------------------------------------
[  3,  4)   128   0.05%   0.05%
[  4,  5)   389   0.16%   0.21%
[  5,  6)  1084   0.45%   0.66%
[  6,  7)  2770   1.14%   1.80% #
[  7,  8)  6288   2.59%   4.38% #
[  8,  9) 11242   4.62%   9.01% ##
[  9, 10) 17797   7.32%  16.33% ####
[ 10, 11) 24395  10.03%  26.36% #####
[ 11, 12) 30327  12.47%  38.83% #######
[ 12, 13) 34392  14.14%  52.98% #######
[ 13, 14) 35111  14.44%  67.42% ########
[ 14, 15) 32791  13.49%  80.90% #######
[ 15, 15] 46438  19.10% 100.00% ##########

Number of training obs by leaf:
Count: 243152 Average: 745.589 StdDev: 6136.2
Min: 5 Max: 219057 Ignored: 0
----------------------------------------------
[      5,  10957) 239559  98.52%  98.52% ##########
[  10957,  21910)   1503   0.62%  99.14%
[  21910,  32862)    888   0.37%  99.51%
[  32862,  43815)    355   0.15%  99.65%
[  43815,  54768)    235   0.10%  99.75%
[  54768,  65720)    109   0.04%  99.79%
[  65720,  76673)     68   0.03%  99.82%
[  76673,  87626)     68   0.03%  99.85%
[  87626,  98578)     62   0.03%  99.87%
[  98578, 109531)    112   0.05%  99.92%
[ 109531, 120484)     56   0.02%  99.94%
[ 120484, 131436)     24   0.01%  99.95%
[ 131436, 142389)     32   0.01%  99.97%
[ 142389, 153342)     19   0.01%  99.97%
[ 153342, 164294)     18   0.01%  99.98%
[ 164294, 175247)     14   0.01%  99.99%
[ 175247, 186200)     20   0.01% 100.00%
[ 186200, 197152)      5   0.00% 100.00%
[ 197152, 208105)      2   0.00% 100.00%
[ 208105, 219057]      3   0.00% 100.00%

Attribute in nodes:
	33033 : 0.036814593 [NUMERICAL]
	24259 : 0.01483216 [NUMERICAL]
	23259 : 0.018406885 [NUMERICAL]
	21752 : 0.020988442 [NUMERICAL]
	21560 : 0.02049054 [NUMERICAL]
	20757 : 0.0111599155 [NUMERICAL]
	20511 : 0.026393838 [NUMERICAL]
	20197 : 0.018967822 [NUMERICAL]
	19841 : 0.01739901 [NUMERICAL]
	18931 : 0.033805072 [NUMERICAL]
	18752 : 0.021692224 [NUMERICAL]

Attribute in nodes with depth <= 0:
	138 : 0.021692224 [NUMERICAL]
	82 : 0.026393838 [NUMERICAL]
	52 : 0.033805072 [NUMERICAL]
	17 : 0.018406885 [NUMERICAL]
	10 : 0.018967822 [NUMERICAL]
	1 : 0.020988442 [NUMERICAL]

Attribute in nodes with depth <= 1:
	380 : 0.021692224 [NUMERICAL]
	122 : 0.026393838 [NUMERICAL]
	83 : 0.01739901 [NUMERICAL]
	72 : 0.033805072 [NUMERICAL]
	57 : 0.036814593 [NUMERICAL]
	54 : 0.018406885 [NUMERICAL]
	51 : 0.018967822 [NUMERICAL]
	38 : 0.02049054 [NUMERICAL]
	26 : 0.020988442 [NUMERICAL]
	10 : 0.01483216 [NUMERICAL]
	7 : 0.0111599155 [NUMERICAL]

Attribute in nodes with depth <= 2:
	581 : 0.021692224 [NUMERICAL]
	229 : 0.02049054 [NUMERICAL]
	219 : 0.01483216 [NUMERICAL]
	193 : 0.018406885 [NUMERICAL]
	186 : 0.01739901 [NUMERICAL]
	154 : 0.036814593 [NUMERICAL]
	150 : 0.026393838 [NUMERICAL]
	117 : 0.018967822 [NUMERICAL]
	100 : 0.020988442 [NUMERICAL]
	92 : 0.033805072 [NUMERICAL]
	79 : 0.0111599155 [NUMERICAL]

Attribute in nodes with depth <= 3:
	766 : 0.021692224 [NUMERICAL]
	633 : 0.036814593 [NUMERICAL]
	514 : 0.02049054 [NUMERICAL]
	503 : 0.01483216 [NUMERICAL]
	345 : 0.01739901 [NUMERICAL]
	300 : 0.020988442 [NUMERICAL]
	288 : 0.018406885 [NUMERICAL]
	284 : 0.018967822 [NUMERICAL]
	277 : 0.0111599155 [NUMERICAL]
	272 : 0.026393838 [NUMERICAL]
	190 : 0.033805072 [NUMERICAL]

Attribute in nodes with depth <= 5:
	2877 : 0.036814593 [NUMERICAL]
	1895 : 0.01483216 [NUMERICAL]
	1725 : 0.02049054 [NUMERICAL]
	1541 : 0.021692224 [NUMERICAL]
	1244 : 0.01739901 [NUMERICAL]
	1210 : 0.018406885 [NUMERICAL]
	1146 : 0.020988442 [NUMERICAL]
	1127 : 0.026393838 [NUMERICAL]
	1069 : 0.0111599155 [NUMERICAL]
	1041 : 0.018967822 [NUMERICAL]
	878 : 0.033805072 [NUMERICAL]

Condition type in nodes:
	242852 : HigherCondition
Condition type in nodes with depth <= 0:
	300 : HigherCondition
Condition type in nodes with depth <= 1:
	900 : HigherCondition
Condition type in nodes with depth <= 2:
	2100 : HigherCondition
Condition type in nodes with depth <= 3:
	4372 : HigherCondition
Condition type in nodes with depth <= 5:
	15753 : HigherCondition
Node format: NOT_SET

Training OOB:
	trees: 1, Out-of-bag evaluation: accuracy:0.996575 logloss:0.123458
	trees: 9, Out-of-bag evaluation: accuracy:0.997327 logloss:0.0479432
	trees: 19, Out-of-bag evaluation: accuracy:0.99767 logloss:0.0254875
	trees: 29, Out-of-bag evaluation: accuracy:0.997806 logloss:0.0193881
	trees: 39, Out-of-bag evaluation: accuracy:0.997889 logloss:0.0171792
	trees: 49, Out-of-bag evaluation: accuracy:0.997902 logloss:0.0153532
	trees: 59, Out-of-bag evaluation: accuracy:0.997955 logloss:0.0144014
	trees: 69, Out-of-bag evaluation: accuracy:0.998018 logloss:0.0137202
	trees: 79, Out-of-bag evaluation: accuracy:0.998029 logloss:0.0131981
	trees: 89, Out-of-bag evaluation: accuracy:0.998024 logloss:0.0128051
	trees: 99, Out-of-bag evaluation: accuracy:0.998036 logloss:0.0124228
	trees: 109, Out-of-bag evaluation: accuracy:0.998042 logloss:0.0120357
	trees: 119, Out-of-bag evaluation: accuracy:0.998049 logloss:0.0117484
	trees: 129, Out-of-bag evaluation: accuracy:0.998054 logloss:0.0116494
	trees: 139, Out-of-bag evaluation: accuracy:0.998069 logloss:0.0113382
	trees: 149, Out-of-bag evaluation: accuracy:0.998067 logloss:0.0112786
	trees: 159, Out-of-bag evaluation: accuracy:0.998066 logloss:0.0111747
	trees: 169, Out-of-bag evaluation: accuracy:0.998064 logloss:0.0111785
	trees: 179, Out-of-bag evaluation: accuracy:0.998071 logloss:0.0109639
	trees: 189, Out-of-bag evaluation: accuracy:0.998066 logloss:0.0108541
	trees: 199, Out-of-bag evaluation: accuracy:0.998052 logloss:0.0107001
	trees: 209, Out-of-bag evaluation: accuracy:0.998041 logloss:0.0106965
	trees: 219, Out-of-bag evaluation: accuracy:0.998056 logloss:0.0106957
	trees: 229, Out-of-bag evaluation: accuracy:0.998066 logloss:0.0106392
	trees: 239, Out-of-bag evaluation: accuracy:0.998075 logloss:0.010582
	trees: 249, Out-of-bag evaluation: accuracy:0.998064 logloss:0.0105818
	trees: 259, Out-of-bag evaluation: accuracy:0.998056 logloss:0.0105275
	trees: 269, Out-of-bag evaluation: accuracy:0.998072 logloss:0.0105194
	trees: 279, Out-of-bag evaluation: accuracy:0.998059 logloss:0.0105219
	trees: 289, Out-of-bag evaluation: accuracy:0.998056 logloss:0.0105255
	trees: 299, Out-of-bag evaluation: accuracy:0.998049 logloss:0.010518
	trees: 300, Out-of-bag evaluation: accuracy:0.998054 logloss:0.0105165

152/152 [==============================] - 10s 65ms/step - loss: 0.0000e+00 - accuracy: 0.9979

Test Accuracy: 0.9979
WARNING:absl:`0.0111599155` is not a valid tf.function parameter name. Sanitizing to `arg_0_0111599155`.
WARNING:absl:`0.01483216` is not a valid tf.function parameter name. Sanitizing to `arg_0_01483216`.
WARNING:absl:`0.01739901` is not a valid tf.function parameter name. Sanitizing to `arg_0_01739901`.
WARNING:absl:`0.018406885` is not a valid tf.function parameter name. Sanitizing to `arg_0_018406885`.
WARNING:absl:`0.018967822` is not a valid tf.function parameter name. Sanitizing to `arg_0_018967822`.
Model saved to /app/exported_model

---------------------------------------------------------------------

The overall accuracy is very good, but let's look in detail at how each class is predicted.

--------------------------------------------------------------------- 

 

--- Classification Report per Category ---
              precision    recall  f1-score   support

           0       1.00      1.00      1.00    148127
           1       1.00      0.36      0.53        11
           2       0.75      0.86      0.80         7
           3       0.75      0.60      0.67        10
           5       1.00      0.63      0.78        41
           6       0.00      0.00      0.00         4
           7       1.00      0.09      0.16        35
           8       0.96      0.97      0.97      2744
          10       1.00      0.20      0.33        25
          11       0.90      0.64      0.75        14
          13       0.50      0.33      0.40         3
          14       0.93      0.24      0.38        55
          15       0.00      0.00      0.00         1

    accuracy                           1.00    151077
   macro avg       0.75      0.46      0.52    151077
weighted avg       1.00      1.00      1.00    151077

 
