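Last night I hit a strange bug: a reference obtained from back() had been working fine, and then suddenly all of its data was gone. My diagnosis is that a re-allocation invalidated the reference that had been returned earlier. And that is how I found a way to access an invalid address in C/C++ without getting any error.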


Problem

Show me the code:

#include <iostream>
#include <vector>
using namespace std;

void vectorvector() {
    vector<vector<int> > vec;
    // vec.reserve(5);  // uncomment to avoid the reallocation below

    vector<int> v1;
    v1.push_back(1);
    vec.push_back(v1);

    vector<int>& last = vec.back(); // refers to the copy of v1 stored in vec

    cout << "Before push_back() yet another vector<int>:" << endl
         << "-- whole size is " << vec.size() << ", " << endl
         << "-- variable last's size " << last.size() << ". Its items are: ";
    for (auto i : last) {
        cout << i << "\t";
    }
    cout << " END" << endl;

    vector<int> v2;
    v2.push_back(3);
    v2.push_back(4);
    //vec.push_back(vector<int>()); // after push_back, LAST changed!!
    vec.push_back(v2); // may reallocate; from here on, last may dangle

    // Deliberately reads through last again to show the bug (undefined behavior).
    cout << "After push_back() yet another vector<int>:" << endl
         << "-- whole size is " << vec.size() << ", " << endl
         << "-- variable last's size " << last.size() << ". Its items are: ";
    for (auto i : last) {
        cout << i << "\t";
    }
    cout << " END" << endl;
}

void vectorint() {
    vector<int> nums;

    nums.push_back(1);
    int& n = nums.back(); // should get 1

    cout << "OLD: whole size is " << nums.size() << ", "
         << "saved last item: " << n << endl;

    for (int i = 2; i < 100; i++) {
        nums.push_back(i); // once this reallocates, n dangles
    }

    // Deliberately reads through n again to show the bug (undefined behavior).
    cout << "NEW: whole size is " << nums.size() << ", "
         << "saved last item: " << n << endl;
}

int main() {
    vectorvector();
    cout << endl
         << endl;
    vectorint();

    return 0;
}

On my machine, the code above printed:

Before push_back() yet another vector<int>:
-- whole size is 1,
-- variable last's size 1. Its items are: 1 END
After push_back() yet another vector<int>:
-- whole size is 2,
-- variable last's size 0. Its items are: END


OLD: whole size is 1, saved last item: 1
NEW: whole size is 99, saved last item: 0

So the reference last, saved from the first back() call, magically became invalid after the push_back()!!

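Background

The whole thing started like this: in C++, adding something to a vector stores a copy of it. For efficiency, I like to add the element first, take a reference to it, and then fill it in gradually, like this: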

vector<vector<int> > ret;
ret.push_back(vector<int>());
vector<int>& vec = ret.back();

But the code I wrote last night needed to look at the current last vector to decide what the new vector should contain, so I wrote:

auto& last = ret.back();
ret.push_back(vector<int>());
auto& vec = ret.back();

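So natural, right? But when it ran, everything I had added to last in the previous iteration was GONE once these statements had executed!!

When I then did the same steps on a vector<int> (with only one push_back()), I did not run into this problem.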

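Cause

My conclusion is that re-allocation is to blame: the memory that the returned reference pointed to is no longer valid.

First, here is what the documentation for push_back() says: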

If a reallocation happens, all iterators, pointers and references related to the container are invalidated.

Otherwise, only the end iterator is invalidated, and all iterators, pointers and references to elements are guaranteed to keep referring to the same elements they were referring to before the call.

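I tested this. It is not just back(): a reference from front() also goes invalid after push_back(). And it is not just push_back(): the same thing happens after insert().

And once I call ret.reserve(5) beforehand to set aside enough memory, this magical behavior disappears!!

To sum up, I am confident in claiming: because of the vector's internal re-allocation, a previously returned reference may become invalid!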
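To make the reallocation visible without touching a dangling reference, here is a minimal sketch that only watches size() and capacity(). The helper names next_push_back_reallocates and count_reallocations are made up for this illustration and are not part of the original code. It relies on the rule quoted above: push_back() reallocates exactly when the vector is already full.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original post): true if the next
// push_back() on v has to reallocate, i.e. the vector is already full.
template <typename T>
bool next_push_back_reallocates(const std::vector<T>& v) {
    return v.size() == v.capacity();
}

// Hypothetical helper: pushes `count` values and counts how often the
// capacity changed. Every capacity change means the elements moved to a
// new buffer, so all earlier references into v were invalidated.
inline std::size_t count_reallocations(std::vector<int>& v, int count) {
    std::size_t reallocations = 0;
    for (int i = 0; i < count; ++i) {
        const std::size_t before = v.capacity();
        v.push_back(i);
        if (v.capacity() != before) {
            ++reallocations;
        }
    }
    return reallocations;
}
```

Checking next_push_back_reallocates(ret) right before the ret.push_back(...) call would have told me whether last was about to go stale.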

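As for why the same steps on a vector<int> (with only one push_back()) did not show the problem: my guess was that ints are small, so no reallocation had happened yet. It is also possible that the vector had already reallocated and the freed memory simply still held the old value. Either way, reading through the saved reference is undefined behavior and cannot be relied on. After I push_back()'d about 100 elements, the value behind the saved int& changed as well.

Honestly, I was a bit surprised that C++ reported no error at all. Using an invalidated reference is undefined behavior, so neither the compiler nor the runtime is required to diagnose it.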

Fixes

Fix 1: as described in the previous section, reserve() enough memory in advance.

Fix 2: change the original code
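A minimal sketch of Fix 1, assuming the number of rows is known up front. The function name first_row_survives is hypothetical and exists only for this illustration. Because capacity is reserved before the first back() call, none of the later push_back() calls reallocates, so the saved reference keeps referring to the same element.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical demo function: returns true if a reference taken with back()
// still refers to the first row after `extra` more rows are appended.
inline bool first_row_survives(std::size_t extra) {
    std::vector<std::vector<int>> rows;
    rows.reserve(extra + 1);              // room for the first row plus the extras

    rows.push_back(std::vector<int>{1});
    std::vector<int>& first = rows.back();

    for (std::size_t i = 0; i < extra; ++i) {
        rows.push_back(std::vector<int>{});  // size stays within capacity: no reallocation
    }

    first.push_back(2);                   // safe: `first` was never invalidated
    return &first == &rows[0] && rows[0].size() == 2;
}
```

This only helps when an upper bound on the size is known. As soon as the vector grows past the reserved capacity, the references are invalidated again.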

auto& last = ret.back();
ret.push_back(vector<int>());
auto& vec = ret.back();

to:

ret.push_back(vector<int>());
auto& last = ret[ret.size() - 2];
auto& vec = ret.back();

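With this change the problem goes away: both references are taken after the push_back(), so that call can no longer invalidate them.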
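A variation on the same idea, as a sketch: remember an index instead of a reference, since an index stays meaningful across a reallocation. The helper append_incremented_row and its "previous row plus one" rule are invented for this example. My real code derived the new row differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: appends a new row built from the current last row
// (here: every element incremented by one, purely for illustration).
inline void append_incremented_row(std::vector<std::vector<int>>& rows) {
    assert(!rows.empty());
    const std::size_t prev = rows.size() - 1;  // an index, not a reference
    rows.push_back(std::vector<int>{});        // may reallocate; `prev` is still correct

    // The outer vector is not resized inside this loop, so rows[prev] and
    // rows.back() both stay valid while the new row is being filled.
    for (int x : rows[prev]) {
        rows.back().push_back(x + 1);
    }
}
```

Calling it repeatedly on a vector that starts as {{1, 2, 3}} keeps working however many times the outer vector reallocates.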