옵티마이저의 sort_cost 비용 최적화

우선 아래와 같이 MySQL 서버에 간단한 테이블 및 데이터를 생성하고 쿼리를 수행해보겠다.

CREATE TABLE `t` (
  `id` varchar(10) NOT NULL,
  `remark` varchar(10) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `ix_remark` (`remark`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

INSERT INTO t
VALUES ('AAAA', 'ZZ'), 
('BBBB', 'XX'),
('CCCC', 'YY'),
('DDD', 'VVVVV');

왼쪽 세션에서는 ORDER BY절을 명시하지 않고 테이블을 조회하고, 오른쪽 세션에서는 ORDER BY절을 명시하고 테이블을 조회한다.

쿼리를 수행하면 아래 결과와 같이 두 쿼리의 레코드 정렬 순서가 동일한 것을 확인할 수 있다.

이어서 실행 계획도 확인해보자. 실행 계획 상에서도 여전히 두 쿼리의 차이점은 없다. 두 쿼리 모두 ix_remark 인덱스를 사용하여 인덱스 풀스캔을 수행하는 것을 확인할 수 있다.

mysql> SELECT * FROM t;
+------+--------+
| id   | remark |
+------+--------+
| DDD  | VVVVV  |
| BBBB | XX     |
| CCCC | YY     |
| AAAA | ZZ     |
+------+--------+

mysql> EXPLAIN SELECT * FROM t \G
***********************************
           id: 1
  select_type: SIMPLE
        table: t
   partitions: NULL
         type: index
possible_keys: NULL
          key: ix_remark
      key_len: 43
          ref: NULL
         rows: 5
     filtered: 100.00
        Extra: Using index

mysql> SELECT * FROM t ORDER BY remark;
+------+--------+
| id   | remark |
+------+--------+
| DDD  | VVVVV  |
| BBBB | XX     |
| CCCC | YY     |
| AAAA | ZZ     |
+------+--------+

mysql> EXPLAIN SELECT * FROM t ORDER BY remark \G
***********************************
           id: 1
  select_type: SIMPLE
        table: t
   partitions: NULL
         type: index
possible_keys: NULL
          key: ix_remark
      key_len: 43
          ref: NULL
         rows: 5
     filtered: 100.00
        Extra: Using index

이번에는 옵티마이저 트레이스를 켜고, 트레이스의 비교 결과를 확인해보자.

SET OPTIMIZER_TRACE=’enabled=on’,END_MARKERS_IN_JSON=on;
(END_MARKERS_IN_JSON 옵션을 활성화 하면 좀 더 가독성 있는 형태로 트레이스 결과가 출력된다.)

아래는 출력된 옵티마이저 트레이스에서 considered_execution_plans 단계만 가져온 것이다. 왼쪽에서 21번째 줄을 보면 ORDER BY절을 명시하지 않은 쿼리는 실행 계획에 대한 비용으로 0.65가 드는 반면에, 오른쪽의 ORDER BY절을 명시한 쿼리는 실행 계획에 대한 비용으로 4.65가 드는 것을 확인할 수 있다. 사실 이것은 오른쪽 22번째 줄의 소트 비용(sort_cost)이 추가되면서 비용이 새롭게 측정되었기 때문이다. (0.65 + 4 = 4.65)

1 # SELECT * FROM t
2 "considered_execution_plans": [
3   {
4     "plan_prefix": [
5     ] /* plan_prefix */,
6     "table": "`t`",
7     "best_access_path": {
8       "considered_access_paths": [
9         {
10          "rows_to_scan": 4,
11          "access_type": "scan",
12          "resulting_rows": 4,
13          "cost": 0.65,
14          "chosen": true
15          ## use_tmp_table: false
16        }
17      ] /* considered_access_paths */
18    } /* best_access_path */,
19    "condition_filtering_pct": 100,
20    "rows_for_plan": 4,
21    "cost_for_plan": 0.65, # 실행 계획 비용(cost)
22    ## sort_cost: 0
23    ## new_cost_for_plan: 0
24    "chosen": true
25  }
26 ] /* considered_execution_plans */

1 # SELECT * FROM t ORDER BY remark
2 "considered_execution_plans": [
3   {
4     "plan_prefix": [
5     ] /* plan_prefix */,
6     "table": "`t`",
7     "best_access_path": {
8       "considered_access_paths": [
9         {
10          "rows_to_scan": 4,
11         "access_type": "scan",
12          "resulting_rows": 4,
13          "cost": 0.65,
14          "chosen": true,
15          "use_tmp_table": true
16        }
17      ] /* considered_access_paths */
18    } /* best_access_path */,
19    "condition_filtering_pct": 100,
20    "rows_for_plan": 4,
21    "cost_for_plan": 0.65,
22    "sort_cost": 4, # 추가된 코스트
23    "new_cost_for_plan": 4.65, # 실행 계획 비용(cost)
24    "chosen": true
25  }
26 ] /* considered_execution_plans */

위의 내용을 다시 정리해보면,

ORDER BY절을 사용한 쿼리의 옵티마이저 트레이스는 ORDER BY절을 사용하지 않았을 때와 비교하여 차이점이 존재한다.

(1) /* best_access_path */ 에서 “use_tmp_table”: true 추가
(2) /* considered_execution_plans */ 에서 “sort_cost”: 4 추가

best_access_path 단계에 use_tmp_table: true가 추가되면서 considered_execution_plans 단계에 소트 비용(sort_cost: 4)이 발생하며, 이로 인해 실행 계획 비용이 재계산되면서 총 비용은 0.65에서 4.65로 변경된다.

그런데,

위와 같은 실행 계획이 생성된 이유는 무엇일까? 소트가 실제로 발생하는 걸까? 의도한 것일까?

결론부터 말하면 위와 같은 실행 계획이 생성되는 이유는 sql/sql_planner.cc 소스 파일의 best_access_path() 함수와 consider_plan() 함수 때문이다. 두 함수를 통해서 쿼리의 실행 계획에 대한 비용이 재계산된다. 그럼 자세한 내막을 살펴보자.

sort_cost는 best_access_path() 함수 여하에 따라서 발생하기 때문에 best_access_path() 함수의 코드만 살펴보도록 하겠다.

sql/sql_planner.cc

best_access_path() 함수를 보면 조건문(if)에 따라서 실행 계획의 최상 액세스 경로(best access path)에 임시 테이블(temporary table) 사용이 추가(true)된다. 여러 조건들이 존재하지만 소트 비용이 발생하는 문제의 핵심은 join->query_expression()->select_limit_cnt >= rows_fetched 부분이다. 이 조건에 따라서 LIMIT절에 명시된 값이 조회하려는 레코드의 건 수보다 큰 경우에는 반드시 임시 테이블을 사용하도록 실행 계획이 생성된다.

void Optimize_table_order::best_access_path(JOIN_TAB *tab,
                                            const table_map remaining_tables,
                                            const uint idx, bool disable_jbuf,
                                            const double prefix_rowcount,
                                            POSITION *pos)
...
if (!best_ref && idx == join->const_tables && table == join->sort_by_table && join->query_expression()->select_limit_cnt >= rows_fetched) {
	trace_access_scan.add("use_tmp_table", true);
  join->sort_by_table = (TABLE *)1;  // Must use temporary table
}

그런데 분명 위의 SQL 문장에서는 SELECT * FROM t ORDER BY remark 와 같이 LIMIT절을 생략했다. 생략했는데 어떻게 limit 값이 패치된 로우(rows_fetched)보다 크다고 판단하여 use_tmp_table이 true로 설정된 걸까.

아래는 best_access_path() 함수의 호출 부분에 중단점을 찍고 lldb 디버거로 디버깅을 한 모습이다. 12~13번째 줄을 보면 조회하려는 테이블의 레코드의 총 건 수(rows_fetched)는 4건임을 확인할 수 있다. 그리고 14~15번째 줄을 보면 SELECT 문장에 LIMIT절을 명시하지 않았을 경우 select_limit_cnt는 64비트(18446744073709551615)로 고정되어 있는 것을 확인할 수 있다.

이말인 즉슨 ORDER BY를 사용할 때 LIMIT을 지정하지 않으면 무조건(64비트 만큼의 테이블을 조회하는 경우는 없을 것이므로) 임시 테이블이 사용되면서 정렬 비용이 발생한다는 말이다. 또한 LIMIT을 지정한다고 해도 조회하려는 레코드 건 수보다는 작게 지정해야 소트 비용이 발생하지 않는다.

1  Process 58726 stopped
2  * thread #40, name = 'connection', stop reason = step over
3      frame #0: 0x0000000100a84208 mysqld`Optimize_table_order::best_access_path(this=0x0000000171754e08, tab=0x0000000134812c00, remaining_tables=1, idx=0, disable_jbuf=true, prefix_rowcount=1, pos=0x0000000134812db8) at sql_planner.cc:1227:8
4     1224	  pos->loosescan_key = MAX_KEY;
5     1225	  pos->use_join_buffer = best_uses_jbuf;
6     1226
7  -> 1227	  if (!best_ref && idx == join->const_tables && table == join->sort_by_table &&
8     1228	      join->query_expression()->select_limit_cnt >= rows_fetched) {
9     1229	    trace_access_scan.add("use_tmp_table", true);
10    1230	    join->sort_by_table = (TABLE *)1;  // Must use temporary table
11 Target 0: (mysqld) stopped.
12 (lldb) p rows_fetched
13 (double) 4
14 (lldb) p join->query_expression()->select_limit_cnt
15 (ha_rows) 18446744073709551615
...

만일 아래와 같이 LIMIT절이 포함된 쿼리를 수행하면 조회하려는 레코드 건 수보다 LIMIT 값이 작기 때문에 소트 비용이 발생하는 실행 계획은 생성되지 않는다.

mysql> SET OPTIMIZER_TRACE='enabled=on',END_MARKERS_IN_JSON=on;

mysql> SELECT * FROM t ORDER BY remark LIMIT 3;
+------+--------+
| id   | remark |
+------+--------+
| DDD  | VVVVV  |
| BBBB | XX     |
| CCCC | YY     |
+------+--------+

mysql> SELECT * FROM information_schema.optimizer_trace;

{
            "considered_execution_plans": [
              {
                "plan_prefix": [
                ] /* plan_prefix */,
                "table": "`t`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "rows_to_scan": 4,
                      "access_type": "scan",
                      "resulting_rows": 4,
                      "cost": 0.65,
                      "chosen": true
                    }
                  ] /* considered_access_paths */
                } /* best_access_path */,
                "condition_filtering_pct": 100,
                "rows_for_plan": 4,
                "cost_for_plan": 0.65,
                "chosen": true
              }
            ] /* considered_execution_plans */
          }

sql/sql_optimizer.cc

그런데, 아직 옵티마이저는 실행 계획만 생성하고 최적화 작업은 수행하지 않은 상태이다. sql_planner.cc 이후 sql_optimizer.cc 소스 파일에서 실제 최적화 작업 과정을 거치게 되는데, 이 과정에서 explain_flags 인스턴스 멤버 값이 정렬 작업을 수행하지 않는 것으로 결정되었으면 sort_cost를 0으로 초기화한다.

bool JOIN::optimize(bool finalize_access_paths)
...
/*
  If we decided to not sort after all, update the cost of the JOIN.
  Windowing sorts are handled elsewhere
*/
if (sort_cost > 0.0 &&
    !explain_flags.any(ESP_USING_FILESORT, ESC_WINDOWING)) {
  best_read -= sort_cost;
  sort_cost = 0.0;
}

디버깅을 통해서 sort_cost 값이 초기화되는 과정을 살펴보도록 하겠다.

12~15번째 줄을 보면 explain_flags 인스턴스의 값을 확인하기 전, sort_cost는 4이며, best_read는 4.649인 것을 확인할 수 있다. 이것은 sql_planner.cc의 best_access_path() 함수에서 use_tmp_table이 true로 설정되면서 계산된 소트 비용과 읽기 비용이다.

1  Process 58726 stopped
2  * thread #40, name = 'connection', stop reason = step over
3      frame #0: 0x0000000100a1fcd0 mysqld`JOIN::optimize(this=0x000000010ee26710, finalize_access_paths=true) at sql_optimizer.cc:1035:8
4     1032	    Windowing sorts are handled elsewhere
5     1033	  */
6     1034	  if (sort_cost > 0.0 &&
7  -> 1035	      !explain_flags.any(ESP_USING_FILESORT, ESC_WINDOWING)) {
8     1036	    best_read -= sort_cost;
9     1037	    sort_cost = 0.0;
10    1038	  }
11 Target 0: (mysqld) stopped.
12 (lldb) p sort_cost
13 (double) 4
14 (lldb) p best_read
15 (double) 4.649

이어서 8번째 줄의 조건문을 보자. explain_flags(Explain_format_flags 클래스의 인스턴스) 인스턴스 값을 출력해보면 해당 인스턴스의 멤버인 sorts 배열에 \0과 \0x01이 순서대로 저장되어 있는 것을 확인할 수 있다. \0은 Explain_sort_clause enum 값 중 ESC_ORDER_BY에 해당하는 값이며, \0x01은 Explain_sort_property의 enum 값 중 ESP_EXISTS에 해당하는 값이다. \0이 ESC_none이 아닌 이유는 any 메소드의 반복문이 ESC_none + 1 부터 시작하기 때문이다.

좌우지간 explain_flags 인스턴스의 멤버 배열인 sorts에는 ESP_USING_FILESORT와 ESC_WINDOWING이 설정되어 있는 것이 아니다. 즉 best_access_path() 함수에서 use_tmp_table이 true로 설정된 것과는 다르게 임시 테이블(ESP_USING_TMPTABLE)도 사용하지 않고, 파일 소트(ESP_USING_FILESORT)도 사용하지 않는 것이다.

이후 조건을 만족하여 if문 안으로 들어가면 9번째 줄에서 불필요한 소트 비용(sort_cost)이 읽기 비용(best_read)에서 차감되는 것을 확인할 수 있다.

1  Process 58726 stopped
2  * thread #40, name = 'connection', stop reason = step over
3      frame #0: 0x0000000100a1fcec mysqld`JOIN::optimize(this=0x000000010ee26710, finalize_access_paths=true) at sql_optimizer.cc:1034:7
4     1031	    If we decided to not sort after all, update the cost of the JOIN.
5     1032	    Windowing sorts are handled elsewhere
6     1033	  */
7  -> 1034	  if (sort_cost > 0.0 &&
8     1035	      !explain_flags.any(ESP_USING_FILESORT, ESC_WINDOWING)) {
9     1036	    best_read -= sort_cost;
10    1037	    sort_cost = 0.0;
11 (lldb) p explain_flags
12 (Explain_format_flags)  (sorts = "\0\U00000001")

/*
enum Explain_sort_clause {
  ESC_none = 0,
  ESC_ORDER_BY = 1,
  ESC_GROUP_BY = 2,
  ESC_DISTINCT = 3,
  ESC_BUFFER_RESULT = 4,
  ESC_WINDOWING = 5,
  //-----------------
  ESC_MAX
};

enum Explain_sort_property {
  ESP_none = 0,
  ESP_EXISTS = 1 << 0,     ///< Original query has this clause
  ESP_IS_SIMPLE = 1 << 1,  ///< Clause is effective for single JOIN_TAB only
  ESP_USING_FILESORT = 1 << 2,  ///< Clause causes a filesort
  ESP_USING_TMPTABLE = 1 << 3,  ///< Clause creates an intermediate table
  ESP_DUPS_REMOVAL = 1 << 4     ///< Duplicate removal for DISTINCT
};

## opt_explain_format.h
...
  bool any(Explain_sort_property property,
           Explain_sort_clause clause = ESC_none) const {
    for (size_t i = ESC_none + 1; i <= ESC_MAX - 1; i++) {
      if (i != clause && (sorts[i] & property)) return true;
    }
    return false;
  }
...
*/

두 변수의 산술 과정에서 best_read는 sort_cost 만큼 차감되어 1.399, sort_cost는 0이 된다.

1  Process 27633 stopped
2  * thread #40, name = 'connection', stop reason = step over
3      frame #0: 0x0000000100a1fd10 mysqld`JOIN::optimize(this=0x000000010f8b70d8, finalize_access_paths=true) at sql_optimizer.cc:1038:3
4     1035	      !explain_flags.any(ESP_USING_FILESORT, ESC_WINDOWING)) {
5     1036	    best_read -= sort_cost;
6     1037	    sort_cost = 0.0;
7  -> 1038	  }
8     1039
9     1040	  count_field_types(query_block, &tmp_table_param, *fields, false, false);
10    1041
11 Target 0: (mysqld) stopped.
12 (lldb) p best_read
13 (double) 1.399
14 (lldb) p sort_cost
15 (double) 0

그리고 이후 몇 단계의 최적화를 거치면서 쿼리의 최종 읽기 비용(best_read)은 0.65가 되는 것을 확인할 수 있다.

# sql_select.cc
1 (lldb) p join->sort_cost
2 (double) 0
3 (lldb) p join->best_read
4 (double) 0.64900000000000002

쿼리에 ORDER BY절을 사용하면 옵티마이저는 임시 테이블을 사용하여 정렬 작업을 수행할 것처럼 실행 계획(트레이스 상의)을 생성하지만, 이는 계획일 뿐이며 실제로는 실행 계획에 대한 최적화 작업을 수행하면서 불필요한 비용을 감소시킨다.

옵티마이저 트레이스의 소트 비용을 보고 혼동하지 말지어다.

Database Internals