Spoiler: the model correctly classified just over 54% of the test images; random guessing among five classes would be expected to get about 20%.
The dataset is not huge, but this is only an example (images taken from http://www.endlessforams.org/).
Only species with at least 1500 images were selected, keeping the highest-quality images; every class has the same number of images so that no class is favoured during training.
globigerinoides_ruber images: 1500
globigerina_bulloides images: 1500
globigerinita_glutinata images: 1500
globigerinoides_sacculifer images: 1500
neogloboquadrina_pachyderma images: 1500
This is therefore an image classification problem, set up with class_mode=categorical.
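With class_mode='categorical' the generator yields one-hot label vectors, one position per class, in the order of the `classes` list passed to flow_from_directory. A minimal sketch of that encoding and of the random-guess baseline follows; the helper name `one_hot` is hypothetical and only used for illustration.

```python
# Class order as passed to flow_from_directory through `classes`.
CLASSES = [
    'globigerinoides_ruber',
    'globigerina_bulloides',
    'globigerinita_glutinata',
    'globigerinoides_sacculifer',
    'neogloboquadrina_pachyderma',
]


def one_hot(class_name, classes=CLASSES):
    """Return the one-hot label vector for class_name (hypothetical helper)."""
    return [1.0 if name == class_name else 0.0 for name in classes]


# With balanced classes, guessing at random is right one time out of five.
chance_accuracy = 1 / len(CLASSES)

print(one_hot('globigerinita_glutinata'))
print(f'chance accuracy: {chance_accuracy:.0%}')
```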
The original images were cropped by 160 pixels at the bottom to remove the white band containing the captions.
-------------------------------------------
# Remove the bottom 160 pixel rows of every JPEG; the result is written to a_<name>
for f in *.jpg
do
    convert "$f" -gravity South -chop 0x160 "a_$f"
done
-------------------------------------------
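The same crop can be sketched in Python. This is only an illustration, assuming Pillow is installed; the function names `crop_box` and `crop_file` are hypothetical and not part of the original workflow. The box follows Pillow's (left, upper, right, lower) convention for `Image.crop`.

```python
def crop_box(width, height, strip=160):
    """Box (left, upper, right, lower) that drops `strip` rows at the bottom."""
    if height <= strip:
        raise ValueError('image is not taller than the strip to remove')
    return (0, 0, width, height - strip)


def crop_file(src, dst, strip=160):
    """Crop one image file; Pillow is imported lazily (assumed to be installed)."""
    from PIL import Image
    with Image.open(src) as img:
        img.crop(crop_box(*img.size, strip)).save(dst)


# Example box for a 400x640 image (width x height assumed for illustration).
print(crop_box(400, 640))
```

To process a whole folder, `crop_file` can be called for each name returned by `glob.glob('*.jpg')`.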
===========================================================
import os

import tensorflow as tf
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# One sub-directory per class under ./train
a_dir = os.path.join('./train/globigerinoides_ruber')
b_dir = os.path.join('./train/globigerina_bulloides')
c_dir = os.path.join('./train/globigerinita_glutinata')
d_dir = os.path.join('./train/globigerinoides_sacculifer')
e_dir = os.path.join('./train/neogloboquadrina_pachyderma')
print('globigerinoides_ruber images:', len(os.listdir(a_dir)))
print('globigerina_bulloides images:', len(os.listdir(b_dir)))
print('globigerinita_glutinata images:', len(os.listdir(c_dir)))
print('globigerinoides_sacculifer images:', len(os.listdir(d_dir)))
print('neogloboquadrina_pachyderma images:', len(os.listdir(e_dir)))

batch_size = 16

# Rescale pixel values from 0-255 to 0-1
train_datagen = ImageDataGenerator(rescale=1/255)
train_generator = train_datagen.flow_from_directory(
    './train',
    # all images are resized to 200x200; the originals are about 400x640
    target_size=(200, 200),
    batch_size=batch_size,
    classes=['globigerinoides_ruber', 'globigerina_bulloides',
             'globigerinita_glutinata', 'globigerinoides_sacculifer',
             'neogloboquadrina_pachyderma'],
    class_mode='categorical')

model = tf.keras.models.Sequential([
    # Five convolution + max pooling stages
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu', input_shape=(200, 200, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    # the softmax layer must have as many units as there are classes
    tf.keras.layers.Dense(5, activation='softmax')
])
model.summary()

# `learning_rate` replaces the deprecated `lr` argument
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(learning_rate=0.001),
              metrics=['acc'])

total_sample = train_generator.n
n_epochs = 15
# In TensorFlow 2.x `fit` accepts generators; `fit_generator` is deprecated
history = model.fit(
    train_generator,
    steps_per_epoch=total_sample // batch_size,
    epochs=n_epochs,
    verbose=1)
model.save('1500el_5classi_model.h5')
===========================================================
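To see what the stack of layers does to a 200x200 input, the feature map sizes and parameter counts can be worked out by hand. The sketch below assumes the Keras defaults used by the script (3x3 convolutions with 'valid' padding and stride 1, 2x2 max pooling with stride 2); the helper names are hypothetical.

```python
def conv_out(size, kernel=3):
    """Output size of a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Output size of max pooling with stride equal to the pool size."""
    return size // pool


size, channels = 200, 3
total_params = 0
for filters in (16, 32, 64, 64, 64):
    # each filter has 3*3*channels weights plus one bias
    total_params += (3 * 3 * channels + 1) * filters
    size = pool_out(conv_out(size))
    channels = filters
    print(f'after conv+pool with {filters} filters: {size}x{size}x{channels}')

flat = size * size * channels
total_params += (flat + 1) * 128   # Dense(128)
total_params += (128 + 1) * 5      # Dense(5, softmax)
print('flattened features:', flat)
print('trainable parameters:', total_params)
```

The totals can be compared with the output of model.summary().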
The trained model is saved to an .h5 file.
To test the model, trained for 15 epochs, 5 images per class that were not part of the training set were selected.
This is the prediction script, based on the model saved earlier (the name of the image file to examine is passed on the command line).
============================================================
import sys

import numpy as np
import tensorflow as tf
from PIL import Image

model = tf.keras.models.load_model('../1500el_5classi_model.h5')

# Load the image given on the command line and prepare it like the training data
image = Image.open(sys.argv[1]).convert('RGB')  # the model expects 3 channels
image = image.resize((200, 200))
pic = np.asarray(image).astype('float32')
pic /= 255.0                       # same rescaling used in training
pic = np.expand_dims(pic, axis=0)  # add the batch dimension: (1, 200, 200, 3)

prediction = model.predict(pic)
print(sys.argv[1])
# Print the class probabilities with two decimals, in the order of the `classes` list
float_formatter = "{:.2f}".format
np.set_printoptions(formatter={'float_kind': float_formatter})
print(prediction)
print()
============================================================
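The script prints the raw probability vector. To turn it into a class name, take the position of the largest value and look it up in the same class order used for training. A minimal sketch, with a hypothetical helper `top_class` and made-up example values; with the real script the argument would be `prediction[0]`, since model.predict returns one row per image.

```python
# Same order as the `classes` list used in training.
CLASSES = [
    'globigerinoides_ruber',
    'globigerina_bulloides',
    'globigerinita_glutinata',
    'globigerinoides_sacculifer',
    'neogloboquadrina_pachyderma',
]


def top_class(probabilities, classes=CLASSES):
    """Return (class name, probability) for the most likely class."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return classes[best], probabilities[best]


example = [0.25, 0.74, 0.01, 0.00, 0.00]  # illustrative values, not real output
name, confidence = top_class(example)
print(f'{name} ({confidence:.2f})')
```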
The results are as follows:
                            | Correct | Wrong
Globigerina Bulloides       | 9       | 1
Globigerinita Glutinata     | 1       | 9
Globigerinoides Ruber       | 3       | 7
Globigerinoides Sacculifer  | 9       | 0
Neogloboquadrina Pachyderma | 3       | 6
Overall                     | 52%     | 48%
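The overall percentage can be recomputed from the per-class counts. A short sketch using the numbers from the table above:

```python
# (correct, wrong) per class, copied from the table above.
results = {
    'Globigerina Bulloides': (9, 1),
    'Globigerinita Glutinata': (1, 9),
    'Globigerinoides Ruber': (3, 7),
    'Globigerinoides Sacculifer': (9, 0),
    'Neogloboquadrina Pachyderma': (3, 6),
}

correct = sum(c for c, _ in results.values())
wrong = sum(w for _, w in results.values())
total = correct + wrong

print(f'overall: {correct}/{total} = {correct / total:.0%}')
for name, (c, w) in results.items():
    print(f'{name}: {c / (c + w):.0%}')
```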
The values in the following columns are confidence scores (1 = 100%).
Globigerinoides Ruber | Globigerina Bulloides | Globigerinita Glutinata | Globigerinoides Sacculifer | Neogloboquadrina Pachyderma | |||
Prediction | |||||||
Actual images | |||||||
Globigerina Bulloides | 1 | 1.00 | |||||
2 | 1.00 | ||||||
3 | 1.00 | ||||||
4 | 1.00 | ||||||
5 | 1.00 | ||||||
6 | 0.17 | 0.93 | |||||
7 | 0.25 | 0.74 | 0.01 | ||||
8 | 1.00 | ||||||
9 | 1.00 | ||||||
10 | 1.00 | ||||||
Globigerinita Glutinata | 1 | 0.95 | 0.04 | ||||
2 | 0.32 | 0.03 | 0.65 | ||||
3 | 0.01 | 0.77 | 0.05 | 0.18 | |||
4 | 0.79 | 0.17 | 0.01 | 0.03 | |||
5 | 0.07 | 0.05 | 0.68 | 0.20 | |||
6 | 0.82 | 0.02 | 0.03 | 0.13 | |||
7 | 0.94 | 0.06 | |||||
8 | 0.00 | 0.61 | 0.04 | 0.35 | |||
9 | 0.21 | 0.11 | 0.1 | 0.58 | |||
10 | 0.12 | 0.14 | 0.01 | 0.73 | |||
Globigerinoides Ruber | 1 | 1.00 | |||||
2 | 0.02 | 0.98 | |||||
3 | 0.26 | 0.18 | 0.55 | ||||
4 | 1.00 | ||||||
5 | 0.98 | 0.01 | 0.01 | ||||
6 | 0.70 | 0.30 | |||||
7 | 0.02 | 0.98 | |||||
8 | 1.00 | ||||||
9 | 1.00 | ||||||
10 | 0.63 | 0.36 | |||||
Globigerinoides Sacculifer | 1 | 0.27 | 0.73 | ||||
2 | 0.01 | 0.99 | |||||
3 | 1.00 | ||||||
4 | 1.00 | ||||||
5 | 0.07 | 0.93 | |||||
6 | 1.00 | ||||||
7 | 1.00 | ||||||
8 | 1.00 | ||||||
9 | 1.00 | ||||||
Neogloboquadrina Pachyderma | 1 | 1.00 | |||||
2 | 1.00 | ||||||
3 | 0.89 | 0.01 | |||||
4 | 0.01 | 0.01 | 0.98 | ||||
5 | 1.00 | ||||||
6 | 1.00 | ||||||
7 | 0.95 | 0.01 | |||||
8 | 0.08 | 0.08 | 0.83 | ||||
9 | 0.02 | 0.97 |
Adding a 20% Dropout improves things, but not by much:
                            | Correct | Wrong
Globigerina Bulloides       | 10      | 0
Globigerinita Glutinata     | 4       | 6
Globigerinoides Ruber       | 5       | 5
Globigerinoides Sacculifer  | 6       | 3
Neogloboquadrina Pachyderma | 1       | 9
Overall                     | 54%     | 48%
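For reference, this is what a dropout rate of 0.2 does during training: each activation is set to zero with probability 0.2 and the surviving ones are scaled by 1/(1 - 0.2), so the expected sum stays the same. The sketch below is a plain Python illustration of that behaviour, not the Keras layer itself; the function name `dropout` is hypothetical, and where the layer was placed in the model is not stated here.

```python
import random


def dropout(values, rate=0.2, rng=random):
    """Zero each value with probability `rate`, scale the rest by 1/(1-rate)."""
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


rng = random.Random(0)            # fixed seed so the run is repeatable
activations = [1.0] * 10000
dropped = dropout(activations, rate=0.2, rng=rng)

zero_fraction = dropped.count(0.0) / len(dropped)
print(f'zeroed: {zero_fraction:.1%}')
print(f'mean after dropout: {sum(dropped) / len(dropped):.3f}')
```

At prediction time dropout is inactive and the activations pass through unchanged.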
I also tried the Adam optimizer with 3 convolutional layers, but the results were worse.