Vision with Core ML

Introduction

The Vision framework is Apple's computer vision library for iOS. It gives developers a high-level API for applying computer vision in their applications. Out of the box, you can perform face detection, landmark detection, text detection, barcode recognition, image registration, and general feature tracking. With Core ML, you can also run your own custom models for your specific use case.
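To get a feel for the API before bringing Core ML into the picture, here is a minimal sketch of one of the built-in requests, face detection. The function name is just for illustration; it follows the same request/handler pattern we will use later with our own model.

import UIKit
import Vision

// Minimal sketch of a built-in Vision request (face detection).
func detectFaces(in image: UIImage) {
    guard let cgImage = image.cgImage else { return }

    let request = VNDetectFaceRectanglesRequest { request, error in
        guard let faces = request.results as? [VNFaceObservation] else { return }
        // boundingBox is in normalized coordinates (0...1) relative to the image.
        for face in faces {
            print("Found a face at \(face.boundingBox)")
        }
    }

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try? handler.perform([request])
}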

In this example, I will demonstrate how you can run your own image classifier with the Vision framework. Before we begin, you should have a general knowledge of the iOS development stack and the Swift programming language. You can download pre-made classifiers here. For this tutorial, I will be using the ResNet50 model.


Set Up the Model

After downloading a model of your choice, add the .mlmodel file to your Xcode project so Xcode generates a Swift class for it (ResNet50 in this case). Then we create a VNCoreMLModel from it:


  // Requires: import UIKit, import Vision, and import CoreML at the top of the file.
  var model: VNCoreMLModel?

  override func viewDidLoad() {
    super.viewDidLoad()

    do {
      // Wrap the auto-generated Core ML class so Vision can drive it.
      model = try VNCoreMLModel(for: ResNet50().model)
    } catch {
      fatalError(error.localizedDescription)
    }
  }
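
Depending on your Xcode and SDK version, the bare ResNet50() initializer may be flagged as deprecated in favor of one that takes an MLModelConfiguration. If you see that warning, the configuration-based form sketched below (assuming the same generated ResNet50 class) does the same job:

  do {
    // Newer SDKs generate a throwing, configuration-based initializer (requires import CoreML).
    model = try VNCoreMLModel(for: ResNet50(configuration: MLModelConfiguration()).model)
  } catch {
    fatalError(error.localizedDescription)
  }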
              

Classify Images

Now we want to classify some images. First, we need to convert our image of type UIImage to a CGImage, and then convert that to a CVPixelBuffer.

The CVPixelBuffer conversion needs a helper method. The function below handles it for us; go ahead and copy it into your code.


// Requires: import UIKit (which brings in CoreGraphics and CoreVideo).
func pixelBuffer(forImage image: CGImage, with frameSize: CGSize) -> CVPixelBuffer? {

  var pixelBuffer: CVPixelBuffer?

  // Create an empty 32BGRA pixel buffer of the requested size.
  let status = CVPixelBufferCreate(kCFAllocatorDefault, Int(frameSize.width), Int(frameSize.height), kCVPixelFormatType_32BGRA, nil, &pixelBuffer)

  guard status == kCVReturnSuccess, let buffer = pixelBuffer else {
      return nil
  }

  CVPixelBufferLockBaseAddress(buffer, CVPixelBufferLockFlags(rawValue: 0))

  // Wrap the buffer's memory in a CGContext so we can draw the image into it.
  let data = CVPixelBufferGetBaseAddress(buffer)
  let rgbColorSpace = CGColorSpaceCreateDeviceRGB()
  let bitmapInfo = CGBitmapInfo(rawValue: CGBitmapInfo.byteOrder32Little.rawValue | CGImageAlphaInfo.premultipliedFirst.rawValue)
  let context = CGContext(data: data,
          width: Int(frameSize.width),
          height: Int(frameSize.height),
          bitsPerComponent: 8,
          bytesPerRow: CVPixelBufferGetBytesPerRow(buffer),
          space: rgbColorSpace,
          bitmapInfo: bitmapInfo.rawValue)

  // Draw into the full frame so the image is scaled to fill the buffer.
  context?.draw(image, in: CGRect(origin: .zero, size: frameSize))

  CVPixelBufferUnlockBaseAddress(buffer, CVPixelBufferLockFlags(rawValue: 0))

  return buffer
}

Now that you have that, we can proceed with the classification function.


public func classify(image: UIImage) {

    // Step 1: Convert to CGImage
    guard let cgimage = image.cgImage else { return }  

    // Step 2: Convert to CVPixelBuffer
    guard let imagePixelBuffer = pixelBuffer(forImage: cgimage, with: CGSize(width: 416, height: 416)) else { return }

}
    

Next, we need to create a VNCoreMLRequest and pass in our custom VNCoreMLModel. Since the model property is an optional, we unwrap it first.


// The model property is optional, so unwrap it before building the request.
guard let model = model else { return }

let visionRequest = VNCoreMLRequest(model: model) { (request, error) in

}
    

Notice the callback parameters request and error. The request parameter contains the results of the classification. We want to cast those results to an array of VNClassificationObservation; from there, we can pull out the information we want to display.


let visionRequest = VNCoreMLRequest(model: model) { (request, error) in
    // Observations come back sorted by confidence, highest first.
    guard let results = request.results as? [VNClassificationObservation] else { return }
    guard let topResult = results.first else { return }

    let label = topResult.identifier
    let confidenceLevel = Double(topResult.confidence)

    print(label, confidenceLevel)
}
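
If you want more than the single best guess, the same results array can be sliced for a top-N list, and VNCoreMLRequest also exposes an option for how Vision crops and scales the input image. A small sketch, as an alternative to the request above:

let visionRequest = VNCoreMLRequest(model: model) { (request, error) in
    guard let results = request.results as? [VNClassificationObservation] else { return }

    // Print the five highest-confidence labels.
    for observation in results.prefix(5) {
        print(observation.identifier, observation.confidence)
    }
}

// Optional: control how Vision fits the image to the model's expected input size.
visionRequest.imageCropAndScaleOption = .centerCrop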

Lastly, we create a VNImageRequestHandler, passing in our imagePixelBuffer, and ask it to perform the VNCoreMLRequest.


do {
    try VNImageRequestHandler(cvPixelBuffer: imagePixelBuffer, options: [:]).perform([visionRequest])
} catch {
    fatalError(error.localizedDescription)
}
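
Note that perform(_:) runs the model synchronously and blocks the calling thread. In a real app you would typically move this work to a background queue, something like the sketch below, and hop back to the main queue before updating any UI from the request's completion handler:

DispatchQueue.global(qos: .userInitiated).async {
    do {
        // The completion handler of visionRequest will also run on this background queue.
        try VNImageRequestHandler(cvPixelBuffer: imagePixelBuffer, options: [:]).perform([visionRequest])
    } catch {
        print(error.localizedDescription)
    }
}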

Conclusion

Your entire classification function should look something like this.


public func classify(image: UIImage) {

    // Make sure the model was loaded successfully in viewDidLoad.
    guard let model = model else { return }

    guard let cgimage = image.cgImage else { return }
    guard let imagePixelBuffer = pixelBuffer(forImage: cgimage, with: CGSize(width: 416, height: 416)) else { return }

    let visionRequest = VNCoreMLRequest(model: model) { (request, error) in

        guard let results = request.results as? [VNClassificationObservation] else { return }
        guard let topResult = results.first else { return }

        let label = topResult.identifier
        let confidenceLevel = Double(topResult.confidence)

        print(label, confidenceLevel)

    }

    do {
        try VNImageRequestHandler(cvPixelBuffer: imagePixelBuffer, options: [:]).perform([visionRequest])
    } catch {
        fatalError(error.localizedDescription)
    }
}
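
You can then call it with any UIImage, for example one from your asset catalog (the asset name below is just a placeholder):

if let image = UIImage(named: "dog") {
    classify(image: image)
    // Prints something like: golden retriever 0.9 (label and confidence depend on the image and model)
}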