AchrefSaidi/GCP-Batch-and-Realtime-Data-Processing-Pipeline

Architecture Analysis

Batch

Data Source

  • Finnhub API: Data is ingested from the Finnhub API and streamed in real time over WebSocket connections, keeping ingestion latency minimal (see the sketch just after this list).
  • Batch Stocks Data: Over a few weeks we collected our own stock data from various private sources, pre-processing it during the collection phase (ETL) and storing it in a Cloud Storage bucket. The batch part of this pipeline collects that data, post-processes it, and analyzes it alongside the real-time data.
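
One way the streaming leg of ingestion could look is a small WebSocket client that forwards Finnhub trade frames to the ingestion endpoint. This is a minimal sketch: the function URL, the ticker symbol, and the FINNHUB_TOKEN environment variable are illustrative placeholders, not values taken from this repository.

```python
import json
import os

import requests    # pip install requests
import websocket   # pip install websocket-client

# Placeholder endpoint for the ingestion Cloud Function (assumed name).
FUNCTION_URL = "https://REGION-PROJECT.cloudfunctions.net/ingest"

def on_open(ws):
    # Subscribe to a ticker; the symbol here is just an example.
    ws.send(json.dumps({"type": "subscribe", "symbol": "AAPL"}))

def on_message(ws, message):
    frame = json.loads(message)
    if frame.get("type") == "trade":  # skip pings and other frame types
        for trade in frame.get("data", []):
            requests.post(FUNCTION_URL, json=trade, timeout=10)

ws = websocket.WebSocketApp(
    f"wss://ws.finnhub.io?token={os.environ['FINNHUB_TOKEN']}",
    on_open=on_open,
    on_message=on_message,
)
ws.run_forever()
```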

Processing

Real-Time Stream Processing

  • Cloud Functions: Triggered by incoming data, a function performs preliminary parsing and validation before handing records to Pub/Sub (a sketch follows below).
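
A minimal sketch of such a function, assuming the functions-framework runtime and Finnhub's trade field names (s, p, t, v); the project and topic IDs are placeholders:

```python
import json

import functions_framework
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
# Assumed project and topic IDs for illustration.
topic_path = publisher.topic_path("my-gcp-project", "validated-trades")

@functions_framework.http
def ingest(request):
    """Parse and validate an incoming trade payload, then enqueue it."""
    payload = request.get_json(silent=True)
    if payload is None:
        return ("invalid JSON", 400)
    # Minimal validation: require the fields downstream stages depend on
    # (symbol, price, timestamp, volume).
    if not {"s", "p", "t", "v"}.issubset(payload):
        return ("missing fields", 400)
    publisher.publish(topic_path, data=json.dumps(payload).encode("utf-8"))
    return ("ok", 200)
```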

  • Pub/Sub: Acts as a message broker, decoupling data collection from processing. Supports scalable message queuing.
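
Provisioning the broker is a one-time step; a sketch with assumed resource names:

```python
from google.cloud import pubsub_v1

# Illustrative IDs; adjust to your project.
project = "my-gcp-project"
topic_id = "validated-trades"
subscription_id = "validated-trades-dataflow"

publisher = pubsub_v1.PublisherClient()
subscriber = pubsub_v1.SubscriberClient()

topic_path = publisher.topic_path(project, topic_id)
sub_path = subscriber.subscription_path(project, subscription_id)

# The Cloud Function publishes to the topic; the Dataflow job pulls
# from the subscription, so neither side depends on the other's pace.
publisher.create_topic(request={"name": topic_path})
subscriber.create_subscription(request={"name": sub_path, "topic": topic_path})
```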

  • Dataflow (Stream): Processes and transforms the data streams, grouping events into fixed 5-second windows for efficient aggregation.

    [Diagram: streaming Dataflow pipeline]
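
A condensed sketch of what the streaming job could look like in the Apache Beam Python SDK; the subscription, table, and field names are assumptions, and the BigQuery table is assumed to already exist:

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.window import FixedWindows

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromPubSub(
            subscription="projects/my-gcp-project/subscriptions/validated-trades-dataflow"
        )
        | "Parse" >> beam.Map(json.loads)
        | "Window" >> beam.WindowInto(FixedWindows(5))  # 5-second fixed windows
        | "KeyBySymbol" >> beam.Map(lambda t: (t["s"], t["p"]))
        | "AvgPrice" >> beam.combiners.Mean.PerKey()
        | "ToRow" >> beam.Map(lambda kv: {"symbol": kv[0], "avg_price": kv[1]})
        | "Write" >> beam.io.WriteToBigQuery(
            "my-gcp-project:market.trades",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```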

  • Error Handling: Errors are extracted, flattened, and stored in a BigQuery 'deadletter' table for later analysis.
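
One common way to express this in Beam is a DoFn with a tagged side output feeding the 'deadletter' table; the sketch below reuses the illustrative names from above:

```python
import json

import apache_beam as beam
from apache_beam import pvalue

class ParseOrDeadletter(beam.DoFn):
    """Emit parsed records on the main output; route failures aside."""

    def process(self, raw):
        try:
            yield json.loads(raw)
        except Exception as exc:
            # Flatten the failure into a simple row for the deadletter table.
            yield pvalue.TaggedOutput(
                "deadletter",
                {"raw": raw.decode("utf-8", "replace"), "error": str(exc)},
            )

# Inside a pipeline (names assumed):
# parsed = messages | beam.ParDo(ParseOrDeadletter()).with_outputs(
#     "deadletter", main="ok"
# )
# parsed.deadletter | beam.io.WriteToBigQuery("my-gcp-project:market.deadletter")
```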

Batch Data Processing

  • Cloud Storage: Stores batch data in JSON format, each file containing transactional data.
  • Dataflow (Batch): Triggered manually to process stored batch data, performing cleansing and transformation.

    [Diagram: batch Dataflow pipeline]
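
A sketch of the batch job, assuming newline-delimited JSON files, the same illustrative field names as above, and a manual launch on the DataflowRunner:

```python
import json

import apache_beam as beam

with beam.Pipeline() as p:
    (
        p
        | "ReadJSON" >> beam.io.ReadFromText("gs://my-batch-bucket/stocks/*.json")
        | "Parse" >> beam.Map(json.loads)
        | "Cleanse" >> beam.Filter(lambda r: r.get("p", 0) > 0)  # drop bad prices
        | "Transform" >> beam.Map(
            lambda r: {"symbol": r["s"], "price": r["p"], "ts": r["t"]}
        )
        | "Write" >> beam.io.WriteToBigQuery(
            "my-gcp-project:market.trades",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```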

Storage

  • BigQuery: Central data warehousing solution where both streaming and batch processed data are consolidated into a single table for analysis.
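
Once both legs land in the same table, analysis is plain SQL; a hypothetical aggregation via the BigQuery Python client, with assumed dataset, table, and column names:

```python
from google.cloud import bigquery

client = bigquery.Client()

query = """
    SELECT symbol, AVG(price) AS avg_price, COUNT(*) AS trades
    FROM `my-gcp-project.market.trades`
    GROUP BY symbol
    ORDER BY trades DESC
"""
for row in client.query(query).result():
    print(row.symbol, row.avg_price, row.trades)
```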

Data Analysis and Presentation

  • Looker Studio: Connects to BigQuery to visualize and report on the data. Reports are dynamically updated to reflect new data entries.
