Food detection and recognition using Detectron2
Food recognition using deep learning has gained popularity in recent years, driven by efficient algorithms and an abundance of publicly available food images. By taking a snap of the food we eat, we can now identify its calorie and nutritional facts instantly.
What is Food Recognition and Why does it matter?
Food recognition is the process of detecting food items in a given image, identifying the type of each food, and returning its nutritional information to the user. A more advanced system might also estimate the volume of each food item using segmentation to retrieve more accurate nutritional information.
Some applications of Food Recognition include:
- Nutrition tracking for patients
- Weight-loss management
- Reducing unhealthy eating habits
Recognition using Detectron2
Detectron2 is FAIR's (Facebook AI Research) object detection and segmentation framework. It supports detection and segmentation at the same time, which is essential in our case: if an image contains multiple food items, we must first segment the items and then recognize each of them.
Some major reasons why we chose Detectron2:
- Segmentation and recognition in a single framework
- Freedom to choose from a range of pretrained models
Let’s Build our model
Following are the steps involved in building the model:
- Collect food images
- Preprocess the images
- Annotate the processed image collection
- Initialize Detectron2 with a pretrained model
- Custom-train it on our dataset
- Evaluate the model
- Make Predictions
Image Collection:
Images were collected from multiple sources: Google search was the primary source, and Instagram, Pinterest, and other social media were used as well. We collected around 4,000 images spanning 30 classes, most of them South Indian foods.
Preprocessing:
The collected images came in different resolutions, so a basic resizing step is applied with a small script, and for uniformity the images are renamed as well.
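The original scripts are linked rather than shown; below is a minimal sketch of what such a preprocessing step can look like. The folder names, target size, and naming pattern are assumptions for illustration.

import glob
import os
import cv2

SRC_DIR = 'raw_images'        # assumed input folder
DST_DIR = 'processed_images'  # assumed output folder
TARGET_SIZE = (512, 512)      # assumed uniform resolution (width, height)

os.makedirs(DST_DIR, exist_ok=True)
for idx, path in enumerate(sorted(glob.glob(os.path.join(SRC_DIR, '*.jpg')))):
    img = cv2.imread(path)
    if img is None:  # skip unreadable files
        continue
    img = cv2.resize(img, TARGET_SIZE)
    # rename uniformly: food_0000.jpg, food_0001.jpg, ...
    cv2.imwrite(os.path.join(DST_DIR, 'food_%04d.jpg' % idx), img)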
Annotation:
Detectron2 consumes datasets in COCO format. We use a tool called labelme to annotate our dataset, which produces a JSON annotation file for each image.
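Before training, the annotated data must be registered with Detectron2. Assuming the per-image labelme JSON files have been merged into a single COCO-style annotation file (for example with the labelme2coco utility), a minimal registration sketch looks like this; the file paths are placeholders:

from detectron2.data import MetadataCatalog
from detectron2.data.datasets import register_coco_instances

# paths below are placeholders; point them at the merged COCO JSON and image folder
register_coco_instances(
    "class45_food_indian_train", {},
    "/content/drive/MyDrive/annotations/train.json",
    "/content/drive/MyDrive/images/train",
)
food_metadata = MetadataCatalog.get("class45_food_indian_train")

The food_metadata object holds the class names and is reused later when visualizing predictions.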
Build model
First we initialize the config with the default settings and our registered dataset, starting from the mask_rcnn_R_50_FPN_3x.yaml config file in the model zoo. We also set the number of classes and the output directory, keeping the other parameters at their defaults.
import os
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("class45_food_indian_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
# start from COCO-pretrained Mask R-CNN weights
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 7000
cfg.SOLVER.STEPS = []                        # no learning-rate decay
cfg.MODEL.ROI_HEADS.NUM_CLASSES = class_num  # number of food classes
cfg.OUTPUT_DIR = '/content/drive/MyDrive/testwork/output7/'
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=True)  # resume from a checkpoint in OUTPUT_DIR if one exists
Then we train the model and save the weights and the config file.
trainer.train()
# write the config to disk so the same settings can be reloaded for inference
with open(os.path.join(cfg.OUTPUT_DIR, "cfg.yaml"), "w") as f:
    f.write(cfg.dump())
Now we have model weights custom-trained on our food image dataset, and we can evaluate the model.
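The evaluation code is not shown in the original post; a common approach is Detectron2's COCOEvaluator, sketched below under the assumption that a validation split (here called class45_food_indian_val) has been registered the same way as the training set:

from detectron2.data import build_detection_test_loader
from detectron2.engine import DefaultPredictor
from detectron2.evaluation import COCOEvaluator, inference_on_dataset

cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")  # trained weights
predictor = DefaultPredictor(cfg)

# "class45_food_indian_val" is an assumed validation split
evaluator = COCOEvaluator("class45_food_indian_val", output_dir=cfg.OUTPUT_DIR)
val_loader = build_detection_test_loader(cfg, "class45_food_indian_val")
print(inference_on_dataset(predictor.model, val_loader, evaluator))

inference_on_dataset reports the standard COCO average-precision metrics for both bounding boxes and segmentation masks.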
To make predictions, we point the config at the trained weights and run a DefaultPredictor over new images:

import glob
import cv2
from detectron2.engine import DefaultPredictor
from detectron2.utils.visualizer import Visualizer
from google.colab.patches import cv2_imshow

cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")  # trained weights
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7  # (optional) keep only confident detections
predictor = DefaultPredictor(cfg)

for imageName in glob.glob('/content/drive/MyDrive/newtestimages/*.jpg'):
    im = cv2.imread(imageName)
    outputs = predictor(im)
    v = Visualizer(im[:, :, ::-1],  # BGR to RGB for display
                   metadata=food_metadata,
                   scale=0.8)
    out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    cv2_imshow(out.get_image()[:, :, ::-1])
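As mentioned earlier, segmentation masks open the door to portion estimation. As a rough, hypothetical sketch for a single image's outputs, the pixel area of each predicted mask can serve as a crude proxy for relative portion size:

instances = outputs["instances"].to("cpu")
for cls_id, mask in zip(instances.pred_classes, instances.pred_masks):
    area_px = int(mask.sum())  # mask area in pixels: a crude portion-size proxy
    label = food_metadata.thing_classes[int(cls_id)]
    print(f"{label}: {area_px} px")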
Following are some of the results:
We have built a custom Detectron2 model on a food image dataset that can segment and recognize multiple food items in a single image.