Description
There's a small bug in the loop index variable in the C++ port relating to signedness of the variable.
If a u4
is used as a loop variable:
seq:
- id: num_items
type: u4
- id: items
type: u1
repeat: expr
repeat-expr: num_items
- id: next
type: u4
Then if it encounters a value with the high bit set, this code in the generated C++ will be wrong:
void kaitai_int_loop_t::_read() {
m_num_items = m__io->read_u4be();
m_items = new std::vector<uint8_t>();
const int l_items = num_items();
for (int i = 0; i < l_items; i++) {
m_items->push_back(m__io->read_u1());
}
m_next = m__io->read_u4be();
}
l_items
will be typecast to a signed int
, and will become negative. So the loop will be skipped (i
always greater than l_items
) and m_next
will get the wrong value.
This is probably unlikely to be a major issue in practice, as it implies the file should be over 2GB in size even with u1 members. However, it does mean that the behaviour is not according to expectations or implied contract, in particular when the file is invalid. Rather than zooming off the end of the stream and throwing like it should, it can continue to read the file without any sign that it's parsing things in completely the wrong place. in this case, it will finish parsing happily and tell you next
is bytes 0x04-0x08 in the file.
Please find attached a reproducing case:
Build, and run with ./kaitai_int_loop ../kaitai_int_loop.bin
. It should return 0, but it returns 1:
std::unique_ptr<kaitai_int_loop_t> loop;
try
{
loop = std::make_unique<kaitai_int_loop_t>(&ks);
}
catch (...)
{
printf("Expected overrun\n");
return 0;
}
unsigned loop_num_items = loop->num_items();
unsigned loop_items_size = loop->items()->size();
printf("Loop size var: %u\n", loop_num_items);
printf("Loop items size: %u\n", loop_items_size);
printf("Variable read from wrong place: %08x\n", loop->next());
return loop_num_items == loop_items_size ? 0 : 1;
Produes
File: ../kaitai_int_loop.bin
Loop size var: 2147483648
Loop items size: 0
Variable read from wrong place: 00010203