Food recognition using deep learning has become popular recently, thanks to efficient algorithms and the abundance of publicly available food images. By taking a snapshot of the food we eat, we can now identify its calorie and nutritional content instantly.

What is Food Recognition and Why does it matter?

Food recognition is the process of detecting food items in an image, identifying the type of each food, and retrieving its nutritional information for the user. A more advanced system might also estimate the volume of each food item using segmentation, yielding more accurate nutritional information.

Some applications of Food Recognition include:

  • Nutrition tracking for patients
  • Weight-loss support
  • Reducing unhealthy eating habits

Recognition using Detectron2

Detectron2 is FAIR’s (Facebook AI Research) object detection and segmentation framework. It supports detection and segmentation at the same time, which is essential in our case: if an image contains multiple food items, we first need to segment the items and then recognize each one.

Some major reasons why we chose Detectron2:

  • It performs segmentation and recognition together
  • Its model zoo offers a choice of pretrained models

Let’s Build Our Model

Following are the steps involved in building the model:

  1. Collect food images
  2. Preprocess the images
  3. Annotate the processed image collection
  4. Initialize Detectron2 with a pretrained model
  5. Custom train using our dataset
  6. Evaluate the model
  7. Make Predictions

Image Collection:

Images were collected from multiple sources. Google image search was the primary source; Instagram, Pinterest, and other social media platforms were also used. We collected around 4,000 images spanning 30 classes, most of them South Indian foods.

Preprocessing:

The collected images came in different resolutions, so a basic resizing script was applied. For uniformity, the images were also renamed with a second script. A minimal sketch of both steps is shown below.
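The original scripts are not reproduced here; this is a minimal sketch of the resize-and-rename step using OpenCV, assuming a flat folder of .jpg files and a target size of 800×600 (both assumed values):

import os
import glob
import cv2

SRC_DIR = "raw_images"   # assumed input folder
DST_DIR = "processed"    # assumed output folder
os.makedirs(DST_DIR, exist_ok=True)

# Resize every image to a fixed resolution and rename sequentially
for i, path in enumerate(sorted(glob.glob(os.path.join(SRC_DIR, "*.jpg")))):
    img = cv2.imread(path)
    if img is None:      # skip unreadable files
        continue
    img = cv2.resize(img, (800, 600))   # assumed target size (width, height)
    cv2.imwrite(os.path.join(DST_DIR, f"food_{i:04d}.jpg"), img)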

Annotation:

Detectron2 expects datasets in the COCO format. We used a tool called labelme to annotate our dataset, which produces a JSON annotation file for each image.
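The per-image labelme JSON files are typically merged into a single COCO-format annotation file (for example with the labelme2coco tool) before the dataset is registered with Detectron2. A minimal registration sketch, where the annotation and image paths are assumed:

from detectron2.data import MetadataCatalog
from detectron2.data.datasets import register_coco_instances

# Register the training split under the name used later in cfg.DATASETS.TRAIN;
# the JSON path and image folder below are assumed
register_coco_instances(
    "class45_food_indian_train",
    {},                            # no extra metadata needed
    "annotations/train_coco.json", # assumed merged labelme output
    "processed",                   # image directory
)
food_metadata = MetadataCatalog.get("class45_food_indian_train")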

Build model

First we initialize the config with the default configuration and our registered dataset. We initialize the model with the mask_rcnn_R_50_FPN_3x.yaml config file, set the number of classes and the output directory, and keep the other parameters at their defaults.

import os
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
# Start from the Mask R-CNN (ResNet-50 FPN, 3x schedule) baseline config
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("class45_food_indian_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
# Initialize from COCO-pretrained weights
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 7000
cfg.SOLVER.STEPS = []                          # no learning-rate decay
cfg.MODEL.ROI_HEADS.NUM_CLASSES = class_num    # number of food classes
cfg.OUTPUT_DIR = '/content/drive/MyDrive/testwork/output7/'
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=True)            # resume from a checkpoint if one exists

Then we train the model and save the config file alongside the weights (Detectron2 writes its checkpoints to cfg.OUTPUT_DIR automatically).

trainer.train()
# Write the config to disk so the same settings can be reloaded at inference time
with open(os.path.join(cfg.OUTPUT_DIR, "cfg.yaml"), "w") as f:
    f.write(cfg.dump())

Now we have model weights custom-trained on our food image dataset, and we can run the model on new images and visualize its predictions.
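Before creating the predictor, the config is typically pointed at the trained checkpoint and given a test-time confidence threshold; model_final.pth is Detectron2's default name for the final checkpoint, and the 0.5 threshold here is an assumed value:

cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")  # trained checkpoint
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # assumed confidence threshold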

import glob
import cv2
from detectron2.engine import DefaultPredictor
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog
from google.colab.patches import cv2_imshow  # Colab-friendly replacement for cv2.imshow

food_metadata = MetadataCatalog.get("class45_food_indian_train")  # metadata of the registered dataset
predictor = DefaultPredictor(cfg)
for imageName in glob.glob('/content/drive/MyDrive/newtestimages/*.jpg'):
  im = cv2.imread(imageName)
  outputs = predictor(im)
  # Visualizer expects RGB while OpenCV loads BGR, hence the channel flip
  v = Visualizer(im[:, :, ::-1], metadata=food_metadata, scale=0.8)
  out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
  cv2_imshow(out.get_image()[:, :, ::-1])
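For a quantitative evaluation, Detectron2 also ships a COCOEvaluator. A minimal sketch, assuming a validation split registered under the hypothetical name class45_food_indian_val (registered the same way as the training split above):

from detectron2.data import build_detection_test_loader
from detectron2.evaluation import COCOEvaluator, inference_on_dataset

# "class45_food_indian_val" is an assumed validation split name
evaluator = COCOEvaluator("class45_food_indian_val", output_dir=cfg.OUTPUT_DIR)
val_loader = build_detection_test_loader(cfg, "class45_food_indian_val")
print(inference_on_dataset(predictor.model, val_loader, evaluator))  # reports COCO AP metrics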

Following are some of the results:

Model predictions.

We have built a custom Detectron2 model using a food image dataset and visualized its predictions on new images.