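Understand the user's environment on Android NDK (C)

Learn how to use the Scene Semantics API in your own apps.

The Scene Semantics API helps developers understand the user's surroundings by providing real-time semantic information based on an ML model. Given an image of an outdoor scene, the API returns a label for each pixel across a set of useful semantic classes, such as sky, building, tree, road, sidewalk, vehicle, and person. In addition to pixel labels, the Scene Semantics API provides a confidence value for each pixel label and a simple way to query how prevalent a given label is in an outdoor scene.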

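From left to right: an example input image, the semantic image of pixel labels, and the corresponding confidence image.

Example of an input image, semantic image, and semantic confidence image.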

Prerequisites

Before proceeding, make sure that you understand fundamental AR concepts and how to configure an ARCore session.

Enable Scene Semantics

In a new ARCore session, check whether the user's device supports the Scene Semantics API. Because of processing power constraints, not all ARCore-compatible devices support it.

To save resources, Scene Semantics is disabled by default in ARCore. To use the Scene Semantics API in your app, enable semantic mode.

// Check whether the user's device supports the Scene Semantics API.
int32_t is_scene_semantics_supported = 0;
ArSession_isSemanticModeSupported(ar_session, AR_SEMANTIC_MODE_ENABLED, &is_scene_semantics_supported);

// Configure the session for AR_SEMANTIC_MODE_ENABLED, if supported.
ArConfig* ar_config = NULL;
ArConfig_create(ar_session, &ar_config);
if (is_scene_semantics_supported) {
  ArConfig_setSemanticMode(ar_session, ar_config, AR_SEMANTIC_MODE_ENABLED);
}
CHECK(ArSession_configure(ar_session, ar_config) == AR_SUCCESS);
ArConfig_destroy(ar_config);

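Get the semantic image

Once Scene Semantics is enabled, you can retrieve the semantic image. The semantic image is an AR_IMAGE_FORMAT_Y8 image in which each pixel corresponds to a semantic label defined by ArSemanticLabel.

Use ArFrame_acquireSemanticImage() to acquire the semantic image: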

// Retrieve the semantic image for the current frame, if available.
ArImage* semantic_image = NULL;
if (ArFrame_acquireSemanticImage(ar_session, ar_frame, &semantic_image) != AR_SUCCESS) {
  // No semantic image was retrieved for this frame.
  // The image may be missing for the first few frames, before the model has run.
  return;
}
// A semantic image is available: use it here, then release it.
ArImage_release(semantic_image);

Depending on the device, output semantic images should become available about 1 to 3 frames after the start of the session.
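Since each pixel of the Y8 semantic image holds one label byte, reading a label comes down to indexing into the image plane. The following is a minimal sketch, not part of the ARCore API: `semantic_label_at` is a hypothetical helper, and it assumes a single plane with a pixel stride of 1. In a real app the buffer, dimensions, and row stride would presumably come from the `ArImage` accessors (for example `ArImage_getPlaneData()` and `ArImage_getPlaneRowStride()` for plane 0), and the returned value would be compared against an `ArSemanticLabel` constant.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not an ARCore function).
// Returns the label byte stored at pixel (x, y) of a single-plane Y8 image,
// or -1 if the buffer is NULL or the coordinates fall outside the image.
// `row_stride` is the number of bytes between the starts of two consecutive
// rows; it can be larger than `width` when rows are padded.
static int semantic_label_at(const uint8_t* plane, int32_t width,
                             int32_t height, int32_t row_stride,
                             int32_t x, int32_t y) {
  if (plane == NULL || x < 0 || y < 0 || x >= width || y >= height) {
    return -1;
  }
  return plane[(size_t)y * (size_t)row_stride + (size_t)x];
}
```

Indexing by row stride rather than by width keeps the lookup correct when rows carry padding bytes.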

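Get the confidence image

In addition to the semantic image, which provides a label for each pixel, the API provides a confidence image of the corresponding per-pixel confidence values. The confidence image is an AR_IMAGE_FORMAT_Y8 image in which each pixel holds a value in the range [0, 255] that corresponds to the probability associated with that pixel's semantic label.

Use ArFrame_acquireSemanticConfidenceImage() to acquire the semantic confidence image: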

// Retrieve the semantic confidence image for the current frame, if available.
ArImage* semantic_confidence_image = NULL;
if (ArFrame_acquireSemanticConfidenceImage(ar_session, ar_frame, &semantic_confidence_image) != AR_SUCCESS) {
  // No semantic confidence image was retrieved for this frame.
  // The image may be missing for the first few frames, before the model has run.
  return;
}
// A semantic confidence image is available: use it here, then release it.
ArImage_release(semantic_confidence_image);

Depending on the device, output confidence images should become available about 1 to 3 frames after the start of the session.
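To act on the confidence values, an app might rescale each byte to a probability and ignore labels that fall below a threshold. The sketch below illustrates that idea; both helpers are hypothetical, and the linear mapping from [0, 255] to [0.0, 1.0] is an assumption based on the description above, not a documented conversion.

```c
#include <stdint.h>

// Hypothetical helper (not an ARCore function).
// Maps a confidence byte in [0, 255] to a probability in [0.0, 1.0],
// assuming the byte scales linearly with the probability.
static float confidence_to_probability(uint8_t confidence) {
  return (float)confidence / 255.0f;
}

// Hypothetical helper (not an ARCore function).
// Returns 1 if the confidence byte maps to at least `min_probability`,
// otherwise 0.
static int is_label_confident(uint8_t confidence, float min_probability) {
  return confidence_to_probability(confidence) >= min_probability ? 1 : 0;
}
```

For example, with a threshold of 0.5, a confidence byte of 200 passes and a byte of 100 does not.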

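Query the fraction of pixels for a semantic label

You can also query the fraction of pixels in the current frame that belong to a particular class, such as sky. This query is more efficient than retrieving the semantic image and searching it pixel by pixel for a specific label. The returned fraction is a float value in the range [0.0, 1.0].

Use ArFrame_getSemanticLabelFraction() to acquire the fraction for a given label: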

// Retrieve the fraction of pixels for the semantic label sky in the current frame.
float out_fraction = 0.0f;
if (ArFrame_getSemanticLabelFraction(ar_session, ar_frame, AR_SEMANTIC_LABEL_SKY, &out_fraction) != AR_SUCCESS) {
  // No fraction of semantic labels was retrieved for this frame.
}
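For comparison, the pixel-by-pixel search that this query is meant to replace might look like the sketch below. It is only an illustration of the manual approach, not a description of how ARCore computes the fraction. `label_fraction` is a hypothetical helper that assumes a single-plane Y8 buffer with a pixel stride of 1, and the `label` argument stands in for an `ArSemanticLabel` value.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not an ARCore function).
// Scans a single-plane Y8 label image and returns the fraction of pixels,
// in [0.0, 1.0], whose label byte equals `label`.
// Padding bytes beyond `width` in each row are skipped.
static float label_fraction(const uint8_t* plane, int32_t width,
                            int32_t height, int32_t row_stride,
                            uint8_t label) {
  if (plane == NULL || width <= 0 || height <= 0) {
    return 0.0f;
  }
  size_t matches = 0;
  for (int32_t y = 0; y < height; ++y) {
    const uint8_t* row = plane + (size_t)y * (size_t)row_stride;
    for (int32_t x = 0; x < width; ++x) {
      if (row[x] == label) {
        ++matches;
      }
    }
  }
  return (float)matches / ((float)width * (float)height);
}
```

This scan touches every pixel on each call, which is why the built-in query is the better choice when only the fraction is needed.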