[double_happy@hadoop101 bin]$ spark-submit --help
spark-submit --master yarn
on YARN mode
Tuning Spark
The tuning chapter
Memory Management Overview
Memory usage in Spark largely falls under one of two categories: execution and storage. Execution memory refers to that used for computation in shuffles, joins, sorts and aggregations, while storage memory refers to that used for caching and propagating internal data across the cluster.
Memory is divided into:
The SparkEnv class:
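1. Open the SparkEnv class and search for memoryManager.
2. Click into StaticMemoryManager.
3. Try to click into getMaxExecutionMemory or getMaxStorageMemory. If the IDE does not jump anywhere, the method is defined in this same class.
Search for getMaxExecutionMemory.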
StaticMemoryManager: the legacy version, static memory management
Static memory management: storage and execution memory are separate
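To make the "separate regions" idea concrete, here is a minimal sketch of the sizing arithmetic behind getMaxExecutionMemory and getMaxStorageMemory. It is plain Python, not Spark's Scala source. The function name `static_memory_limits` is hypothetical, and the default fractions are the Spark 1.x configuration defaults as I recall them, so check them against your version.

```python
from typing import Tuple


def static_memory_limits(
    system_max_memory: int,
    shuffle_memory_fraction: float = 0.2,  # assumed default of spark.shuffle.memoryFraction
    shuffle_safety_fraction: float = 0.8,  # assumed default of spark.shuffle.safetyFraction
    storage_memory_fraction: float = 0.6,  # assumed default of spark.storage.memoryFraction
    storage_safety_fraction: float = 0.9,  # assumed default of spark.storage.safetyFraction
) -> Tuple[int, int]:
    """Return (max_execution_bytes, max_storage_bytes) for a given heap size.

    Each region is a fixed slice of the heap, scaled by its own safety
    fraction. Neither region can grow into the other.
    """
    max_execution = int(system_max_memory * shuffle_memory_fraction * shuffle_safety_fraction)
    max_storage = int(system_max_memory * storage_memory_fraction * storage_safety_fraction)
    return max_execution, max_storage


if __name__ == "__main__":
    one_gib = 1024 * 1024 * 1024
    execution, storage = static_memory_limits(one_gib)
    print("execution bytes:", execution)
    print("storage bytes:", storage)
```

Under these assumed defaults, execution gets roughly 16% of the heap and storage roughly 54%. If a job caches nothing, the storage slice simply sits idle.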
Unified memory management: storage and execution memory are shared, so each side can borrow memory from the other
The new memory management:
In Spark, execution and storage share a unified region (M). When no execution memory is used, storage can acquire all the available memory and vice versa.
When no execution memory is used, storage acquires all the available memory
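The borrowing rule can be sketched as a small model. This is an illustration in plain Python, not Spark's implementation. The names `UnifiedRegion` and `unified_region` are hypothetical. The 300 MB reserve and the 0.6 / 0.5 defaults for spark.memory.fraction and spark.memory.storageFraction are what I recall for Spark 2.x and should be treated as assumptions. The sub-region R, below which cached blocks are not evicted by execution, comes from the tuning guide and is not in the excerpt quoted above.

```python
from dataclasses import dataclass

RESERVED_BYTES = 300 * 1024 * 1024  # assumed reserved system memory


@dataclass
class UnifiedRegion:
    max_memory: int          # size of the unified region M
    storage_region: int      # size of R, the storage share execution cannot evict
    execution_used: int = 0
    storage_used: int = 0

    def max_storage(self) -> int:
        """Storage may take whatever execution is not currently using."""
        return self.max_memory - self.execution_used

    def max_execution(self) -> int:
        """Execution may take M minus the storage it cannot evict (at most R)."""
        return self.max_memory - min(self.storage_used, self.storage_region)


def unified_region(system_memory: int,
                   memory_fraction: float = 0.6,    # assumed spark.memory.fraction
                   storage_fraction: float = 0.5    # assumed spark.memory.storageFraction
                   ) -> UnifiedRegion:
    """Build the model from a heap size under the assumed defaults."""
    usable = system_memory - RESERVED_BYTES
    max_memory = int(usable * memory_fraction)
    return UnifiedRegion(max_memory, int(max_memory * storage_fraction))


if __name__ == "__main__":
    region = unified_region(4 * 1024 * 1024 * 1024)
    # With no execution memory in use, storage can grow to the whole region.
    print(region.max_storage() == region.max_memory)
```

The asymmetry is the point of the design: storage can fill the whole region while execution is idle, but once execution needs memory it can push storage back down to R, whereas storage never evicts execution.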
Spark 1.0:
As of Spark 1.6: