Cloud Architecture Center


Process and analyze

3.1. Processing large-scale data

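Large volumes of data read from source systems (Google Cloud Storage, Bigtable, Google Cloud SQL) are processed, normalized, and aggregated. Because the data is so large, the work is either distributed across a cluster or handed to software tools.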
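
As a rough illustration of the process, normalize, aggregate flow described above, the following sketch splits records into partitions, normalizes each partition in a separate worker, and merges the partial counts. It uses only the Python standard library as a stand-in for a real cluster; the record format and the function names (`normalize`, `count_partition`, `aggregate`) are assumptions made for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def normalize(record: str) -> str:
    """Trim whitespace and lowercase so equal values compare equal."""
    return record.strip().lower()


def count_partition(partition: list[str]) -> Counter:
    """Normalize one partition and count its non-empty values."""
    return Counter(normalize(r) for r in partition if r.strip())


def aggregate(records: list[str], num_partitions: int = 4) -> Counter:
    """Split records into partitions, count each in parallel, merge the results."""
    partitions = [records[i::num_partitions] for i in range(num_partitions)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        for partial in pool.map(count_partition, partitions):
            total.update(partial)
    return total


if __name__ == "__main__":
    # Hypothetical sample records with inconsistent casing and spacing.
    sample = ["Error ", "info", "ERROR", " warn", "Info", ""]
    print(aggregate(sample))
```

The services below follow the same split, work, and merge shape, but run it across many machines instead of threads in one process.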

Dataproc

Runs on a cluster (autoscaling)
Integrates with existing Hadoop, Hive, and Spark applications
use case: Log processing, Reporting, On-demand Spark clusters, Machine learning
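
The log-processing use case could look roughly like the PySpark job below, the kind of script that can be submitted to a Dataproc cluster. This is a sketch under assumptions: the log format (`<timestamp> <LEVEL> <message>`), the input path passed to `run_job`, and the function names are hypothetical. `pyspark` is imported inside `run_job` so that the parsing helper can be tried without Spark installed.

```python
from operator import add
from typing import Optional, Tuple


def parse_level(line: str) -> Optional[Tuple[str, int]]:
    """Return (level, 1) for a line shaped like '<timestamp> <LEVEL> <message>'.

    The log format is an assumption made for this example.
    """
    parts = line.split(maxsplit=2)
    if len(parts) < 2:
        return None
    return (parts[1].upper(), 1)


def run_job(input_path: str) -> list:
    """Count log lines per level with Spark (requires pyspark)."""
    from pyspark.sql import SparkSession  # lazy import; only needed where Spark runs

    spark = SparkSession.builder.appName("log-level-count").getOrCreate()
    try:
        counts = (
            spark.sparkContext.textFile(input_path)
            .map(parse_level)
            .filter(lambda pair: pair is not None)  # drop lines that did not parse
            .reduceByKey(add)                       # sum the 1s per level
            .collect()
        )
    finally:
        spark.stop()
    return counts
```

On Dataproc, a script like this would typically be submitted with `gcloud dataproc jobs submit pyspark`, pointing `input_path` at a Cloud Storage location.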

Dataflow

Serverless (no cluster to manage), parallel processing (No-Ops)
For building new data pipelines rather than integrating with existing Spark or Hadoop workloads
use case: parallel processing without MapReduce, User analytics, Data science, ETL, Log processing
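
Dataflow runs pipelines written with the Apache Beam SDK. A minimal sketch of the user-analytics use case, assuming input lines shaped like `user_id,event` (a made-up format) and hypothetical function names, might be:

```python
from typing import Iterable, Tuple


def to_user_event(line: str) -> Tuple[str, int]:
    """Turn a 'user_id,event' line into a (user_id, 1) pair (assumed format)."""
    user_id, _event = line.split(",", 1)
    return (user_id.strip(), 1)


def run_pipeline(lines: Iterable[str]) -> None:
    """Count events per user with Apache Beam (requires apache-beam)."""
    import apache_beam as beam  # lazy import; not needed for the helper above

    # With no options, Beam runs locally on the DirectRunner. Running on
    # Dataflow means passing pipeline options that select the Dataflow runner.
    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "Read" >> beam.Create(list(lines))
            | "ToPairs" >> beam.Map(to_user_event)
            | "CountPerUser" >> beam.CombinePerKey(sum)
            | "Print" >> beam.Map(print)
        )
```

The same pipeline code can run locally for testing and on Dataflow for scale; only the pipeline options change, and no cluster is created by hand.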

Dataprep

UI-driven data preparation (No-Ops, scales when needed)
use case: Machine learning, Analytics

connect Dataprep to BigQuery

1) Create Flow
2) Add dataset (import)
3) Select BigQuery (left pane) & Create dataset
4) Import & Add to Flow

inspect, process data

1) Edit Recipe: a sample of the dataset can be inspected in the Transformer view (visualization)

execute job to load BigQuery

1) click Run in Transformer page
2) click Edit on Create-CSV
3) select BigQuery then create table
4) name output table
5) click Update
6) click Run

The job can be monitored from the job history.


3.2. Analyzing and querying data

BigQuery

Used not only for storing data, as covered earlier, but also for analyzing it
use case: User analysis(adtech, clickstream, game telemetry), Device and operational metrics(IoT), BI
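
A sketch of an analysis query run from Python with the `google-cloud-bigquery` client library, against the public `san_francisco_311` dataset that the practice section later in these notes also uses. The `category` column and the helper names are assumptions for illustration; running `run_query` needs the library installed and Google Cloud credentials configured.

```python
def build_category_query(limit: int = 10) -> str:
    """Build a query counting 311 requests per category (column name assumed)."""
    table = "bigquery-public-data.san_francisco_311.311_service_requests"
    return (
        "SELECT category, COUNT(*) AS request_count\n"
        f"FROM `{table}`\n"
        "GROUP BY category\n"
        "ORDER BY request_count DESC\n"
        f"LIMIT {int(limit)}"
    )


def run_query(sql: str) -> list:
    """Run the query and return rows as dictionaries (requires google-cloud-bigquery)."""
    from google.cloud import bigquery  # lazy import; only needed to execute the query

    client = bigquery.Client()  # uses application default credentials
    return [dict(row.items()) for row in client.query(sql).result()]
```

Calling `run_query(build_category_query(5))` would return the five most frequent request categories, with the aggregation done inside BigQuery rather than on the client.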

Machine learning

Can augment processed results, suggest optimizations for data collection, and predict outcomes


Explore and visualize




4.1. Explore and visualize

Datalab

Interactive web-based tool
Supports a variety of toolkits such as pandas, numpy, and scikit-learn

Data science ecosystem

Besides Datalab, the web-based tool Apache Zeppelin is supported
For R users: RStudio Server or Microsoft ML Server
For Scala or Java users: Jupyter

4.2. Visualizing business intelligence results

Looker

BI platform

BI Engine

Managed analysis service

Sheets

Spreadsheet visualization

Data Catalog

Data discovery and metadata management

Data Studio

Dashboarding and visualization
Over 200 connectors (Google Analytics, BigQuery, Sheets, and external data sources…)


Practice

1) Intro

2) getting set up

3) connecting Data Studio and BigQuery

(1) Open Data Studio
(2) Start with a Template (Blank template)
(3) Add data to the report: connect to BigQuery (allow permission to view BigQuery data)
(4) Add table “san_francisco_311” (in public datasets)
(5) Click Manage added data sources under Resources
(6) Edit table fields of “311_service_requests”
(7) Change the type of the “latitude” and “longitude” fields from Text to Latitude, Longitude under Geo

4) creating visualizations

5) building a dashboard

6) creating filters

7) test it and share it