Reading Notes - Large-scale Distributed Storage Systems: Principles Analysis and Architecture Practice: IX
This article was last updated on: July 24, 2024 (a.m.)
13 Big Data
13.1 Concepts
Characteristics: the four Vs
- Volume: the amount of data is extremely large
- Variety: there are many types of data
- Velocity: data grows extremely fast
- Value: value density is low (useful information is sparse relative to the total volume)
13.2 MapReduce
The user only needs to write two functions, Map and Reduce.
The MapReduce framework includes 3 roles:
- Master: performs task division, scheduling, and coordination between tasks
- Map Worker processes
- Reduce Worker processes
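To make the division of labor concrete, here is a minimal single-process sketch of a word count job. It is not how any real MapReduce framework is implemented: `run_job`, `map_fn`, `reduce_fn`, and `num_reducers` are hypothetical names chosen for illustration, and the Master and the workers are simulated by plain loops rather than separate processes.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator


def map_fn(_key: str, line: str) -> Iterator[tuple[str, int]]:
    """User-written Map: emit (word, 1) for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def reduce_fn(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """User-written Reduce: sum all counts emitted for one word."""
    return word, sum(counts)


def run_job(
    splits: list[list[str]],
    mapper: Callable[[str, str], Iterator[tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], tuple[str, int]],
    num_reducers: int = 2,
) -> dict[str, int]:
    """Hypothetical stand-in for the Master plus Map and Reduce workers."""
    # "Master": treat each input split as one Map task.
    # "Map Workers": run the mapper and partition its output by key hash,
    # so every value for a given key lands in the same reduce partition.
    partitions: list[dict[str, list[int]]] = [
        defaultdict(list) for _ in range(num_reducers)
    ]
    for split_id, lines in enumerate(splits):
        for line in lines:
            for key, value in mapper(str(split_id), line):
                partitions[hash(key) % num_reducers][key].append(value)

    # "Reduce Workers": each partition is processed independently, key by key.
    result: dict[str, int] = {}
    for partition in partitions:
        for key in sorted(partition):
            out_key, out_value = reducer(key, partition[key])
            result[out_key] = out_value
    return result


if __name__ == "__main__":
    splits = [["the quick brown fox", "the lazy dog"], ["quick quick dog"]]
    print(run_job(splits, map_fn, reduce_fn))
```

Only `map_fn` and `reduce_fn` contain application logic; everything in `run_job` is work that the framework would take over in a real deployment.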
13.3 Streaming Computing
Streaming computing places greater emphasis on low latency in data processing.
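The latency concern can be illustrated with a small sketch, assuming a simple running-average computation (the function name `running_average` is hypothetical and not taken from the book). Rather than waiting for the whole data set before producing an answer, as a batch job would, the generator emits an updated result as soon as each event arrives.

```python
from typing import Iterable, Iterator


def running_average(events: Iterable[float]) -> Iterator[float]:
    """Yield the average of all events seen so far, once per incoming event."""
    total = 0.0
    count = 0
    for value in events:
        # State is updated incrementally, so each result is available
        # immediately instead of after the full input has been read.
        total += value
        count += 1
        yield total / count


if __name__ == "__main__":
    for average in running_average([2.0, 4.0, 6.0]):
        print(average)
```

Because `events` can be any iterable, the same function would also work over an unbounded source such as a socket or message queue reader.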
13.5 Real-Time Analytics
13.5.1 MPP architecture
MPP (Massively Parallel Processing)
13.5.2 EMC Greenplum
An OLAP product built on the open-source PostgreSQL database.
https://e-whisper.com/posts/27007/