使用机器学习套件识别图片中的文本 (iOS)

您可以使用机器学习套件识别图片或视频中的文本,例如 街道标志。此功能的主要特性包括:

文本识别 v2 API
说明识别图片或视频中的文字,支持 拉丁文、中文、梵文、日文和韩文以及 多种语言
SDK 名称GoogleMLKit/TextRecognition
GoogleMLKit/TextRecognitionChinese
GoogleMLKit/TextRecognitionDevanagari
GoogleMLKit/TextRecognitionJapanese
GoogleMLKit/TextRecognitionKorean
实现在构建时将资源静态关联到您的应用
应用大小影响每个脚本 SDK 约 38 MB
性能对于拉丁脚本 SDK,在大多数设备上实时运行;对于其他设备,则速度较慢。

试试看

准备工作

  1. 在 Podfile 中添加以下机器学习套件 Pod:
    # To recognize Latin script
    pod 'GoogleMLKit/TextRecognition', '3.2.0'
    # To recognize Chinese script
    pod 'GoogleMLKit/TextRecognitionChinese', '3.2.0'
    # To recognize Devanagari script
    pod 'GoogleMLKit/TextRecognitionDevanagari', '3.2.0'
    # To recognize Japanese script
    pod 'GoogleMLKit/TextRecognitionJapanese', '3.2.0'
    # To recognize Korean script
    pod 'GoogleMLKit/TextRecognitionKorean', '3.2.0'
    
  2. 安装或更新项目的 Pod 之后,使用 Xcode 项目的 .xcworkspace。Xcode 12.4 或更高版本支持机器学习套件。

1. 创建 TextRecognizer 实例

通过调用以下方法来创建 TextRecognizer 的实例: +textRecognizer(options:),传递与您声明的 SDK 相关的选项 以下依赖项:

Swift

// When using Latin script recognition SDK
let latinOptions = TextRecognizerOptions()
let latinTextRecognizer = TextRecognizer.textRecognizer(options:options)

// When using Chinese script recognition SDK
let chineseOptions = ChineseTextRecognizerOptions()
let chineseTextRecognizer = TextRecognizer.textRecognizer(options:options)

// When using Devanagari script recognition SDK
let devanagariOptions = DevanagariTextRecognizerOptions()
let devanagariTextRecognizer = TextRecognizer.textRecognizer(options:options)

// When using Japanese script recognition SDK
let japaneseOptions = JapaneseTextRecognizerOptions()
let japaneseTextRecognizer = TextRecognizer.textRecognizer(options:options)

// When using Korean script recognition SDK
let koreanOptions = KoreanTextRecognizerOptions()
let koreanTextRecognizer = TextRecognizer.textRecognizer(options:options)

Objective-C

// When using Latin script recognition SDK
MLKTextRecognizerOptions *latinOptions = [[MLKTextRecognizerOptions alloc] init];
MLKTextRecognizer *latinTextRecognizer = [MLKTextRecognizer textRecognizerWithOptions:options];

// When using Chinese script recognition SDK
MLKChineseTextRecognizerOptions *chineseOptions = [[MLKChineseTextRecognizerOptions alloc] init];
MLKTextRecognizer *chineseTextRecognizer = [MLKTextRecognizer textRecognizerWithOptions:options];

// When using Devanagari script recognition SDK
MLKDevanagariTextRecognizerOptions *devanagariOptions = [[MLKDevanagariTextRecognizerOptions alloc] init];
MLKTextRecognizer *devanagariTextRecognizer = [MLKTextRecognizer textRecognizerWithOptions:options];

// When using Japanese script recognition SDK
MLKJapaneseTextRecognizerOptions *japaneseOptions = [[MLKJapaneseTextRecognizerOptions alloc] init];
MLKTextRecognizer *japaneseTextRecognizer = [MLKTextRecognizer textRecognizerWithOptions:options];

// When using Korean script recognition SDK
MLKKoreanTextRecognizerOptions *koreanOptions = [[MLKKoreanTextRecognizerOptions alloc] init];
MLKTextRecognizer *koreanTextRecognizer = [MLKTextRecognizer textRecognizerWithOptions:options];

2. 准备输入图片

将图片作为 UIImageCMSampleBufferRef 传递给 TextRecognizerprocess(_:completion:) 方法:

使用 UIImageVisionImage CMSampleBuffer

如果您使用 UIImage,请按以下步骤操作:

  • 使用 UIImage 创建一个 VisionImage 对象。请务必指定正确的 .orientation

    Swift

    let image = VisionImage(image: UIImage)
    visionImage.orientation = image.imageOrientation

    Objective-C

    MLKVisionImage *visionImage = [[MLKVisionImage alloc] initWithImage:image];
    visionImage.orientation = image.imageOrientation;

如果您使用 CMSampleBuffer,请按以下步骤操作:

  • 指定 CMSampleBuffer

    如需获取图片方向,请执行以下操作:

    Swift

    func imageOrientation(
      deviceOrientation: UIDeviceOrientation,
      cameraPosition: AVCaptureDevice.Position
    ) -> UIImage.Orientation {
      switch deviceOrientation {
      case .portrait:
        return cameraPosition == .front ? .leftMirrored : .right
      case .landscapeLeft:
        return cameraPosition == .front ? .downMirrored : .up
      case .portraitUpsideDown:
        return cameraPosition == .front ? .rightMirrored : .left
      case .landscapeRight:
        return cameraPosition == .front ? .upMirrored : .down
      case .faceDown, .faceUp, .unknown:
        return .up
      }
    }
          

    Objective-C

    - (UIImageOrientation)
      imageOrientationFromDeviceOrientation:(UIDeviceOrientation)deviceOrientation
                             cameraPosition:(AVCaptureDevicePosition)cameraPosition {
      switch (deviceOrientation) {
        case UIDeviceOrientationPortrait:
          return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationLeftMirrored
                                                                : UIImageOrientationRight;
    
        case UIDeviceOrientationLandscapeLeft:
          return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationDownMirrored
                                                                : UIImageOrientationUp;
        case UIDeviceOrientationPortraitUpsideDown:
          return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationRightMirrored
                                                                : UIImageOrientationLeft;
        case UIDeviceOrientationLandscapeRight:
          return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationUpMirrored
                                                                : UIImageOrientationDown;
        case UIDeviceOrientationUnknown:
        case UIDeviceOrientationFaceUp:
        case UIDeviceOrientationFaceDown:
          return UIImageOrientationUp;
      }
    }
          
  • 使用VisionImage CMSampleBuffer 对象和方向:

    Swift

    let image = VisionImage(buffer: sampleBuffer)
    image.orientation = imageOrientation(
      deviceOrientation: UIDevice.current.orientation,
      cameraPosition: cameraPosition)

    Objective-C

     MLKVisionImage *image = [[MLKVisionImage alloc] initWithBuffer:sampleBuffer];
     image.orientation =
       [self imageOrientationFromDeviceOrientation:UIDevice.currentDevice.orientation
                                    cameraPosition:cameraPosition];

3. 处理图片

然后,将图片传递给 process(_:completion:) 方法:

Swift

textRecognizer.process(visionImage) { result, error in
  guard error == nil, let result = result else {
    // Error handling
    return
  }
  // Recognized text
}

Objective-C

[textRecognizer processImage:image
                  completion:^(MLKText *_Nullable result,
                               NSError *_Nullable error) {
  if (error != nil || result == nil) {
    // Error handling
    return;
  }
  // Recognized text
}];

4. 从识别出的文本块中提取文本

如果文本识别操作成功,它将返回一个 Text 对象。Text 对象包含完整文本 以及零个或多个 TextBlock 对象的操作。

每个 TextBlock 表示一个矩形文本块, 包含零个或零个以上的 TextLine 对象。每TextLine 对象包含零个或零个以上的 TextElement 对象, 表示字词和类似字词的实体,例如日期和数字。

对于每个 TextBlockTextLineTextElement 对象,您可以获取在 和区域的边界坐标。

例如:

Swift

let resultText = result.text
for block in result.blocks {
    let blockText = block.text
    let blockLanguages = block.recognizedLanguages
    let blockCornerPoints = block.cornerPoints
    let blockFrame = block.frame
    for line in block.lines {
        let lineText = line.text
        let lineLanguages = line.recognizedLanguages
        let lineCornerPoints = line.cornerPoints
        let lineFrame = line.frame
        for element in line.elements {
            let elementText = element.text
            let elementCornerPoints = element.cornerPoints
            let elementFrame = element.frame
        }
    }
}

Objective-C

NSString *resultText = result.text;
for (MLKTextBlock *block in result.blocks) {
  NSString *blockText = block.text;
  NSArray<MLKTextRecognizedLanguage *> *blockLanguages = block.recognizedLanguages;
  NSArray<NSValue *> *blockCornerPoints = block.cornerPoints;
  CGRect blockFrame = block.frame;
  for (MLKTextLine *line in block.lines) {
    NSString *lineText = line.text;
    NSArray<MLKTextRecognizedLanguage *> *lineLanguages = line.recognizedLanguages;
    NSArray<NSValue *> *lineCornerPoints = line.cornerPoints;
    CGRect lineFrame = line.frame;
    for (MLKTextElement *element in line.elements) {
      NSString *elementText = element.text;
      NSArray<NSValue *> *elementCornerPoints = element.cornerPoints;
      CGRect elementFrame = element.frame;
    }
  }
}

输入图片准则

  • 为了让机器学习套件准确识别文本,输入图片必须包含 用足够的像素数据表示的文本。理想情况下 每个字符至少应为 16x16 像素。通常 字符大于 24x24 像素的准确率。

    例如,640x480 的图片可能非常适合用于扫描名片 占据整个图片宽度的扫描打印的文档 信纸大小的纸张,可能需要 720x1280 像素的图片。

  • 图片聚焦不佳会影响文本识别的准确性。如果您 获得可接受的结果,请尝试让用户重新捕获图片。

  • 如果您要在实时应用中识别文本, 考虑输入图片的整体尺寸。较小 可以更快地处理图像。为了缩短延迟时间,请确保文本在 并以较低的分辨率捕获图片(但请记住 上述要求)。如需了解详情,请参阅 改善效果的提示

效果提升技巧

  • 如需处理视频帧,请使用检测器的 results(in:) 同步 API。致电 此方法(可从 获取) AVCaptureVideoDataOutputSampleBufferDelegate captureOutput(_, didOutput:from:) 函数,用于同步获取指定视频的结果 帧。保留 AVCaptureVideoDataOutput alwaysDiscardsLateVideoFrames 作为 true,以限制对检测器的调用。如果新的 视频帧在检测器运行时可用,则会丢失。
  • 如果您使用检测器的输出在图像上叠加显示 输入图片,首先从机器学习套件获取结果, 和叠加层。通过这种方式,您可以在显示屏上呈现 只对每个已处理的输入帧运行一次。请参阅 updatePreviewOverlayViewWithLastFrame
  • 建议以较低的分辨率捕获图片。但请注意 该 API 的图片尺寸要求
  • 为避免可能出现的性能下降情况,请勿多次运行 同时有 TextRecognizer 个具有不同脚本选项的实例。