Speed up deserialization of Intervals

When the Lexer breaks a line into tokens, it also determines the type of
each token. IntervalFactory doesn't use this type information, and
computing it slows deserialization down because dates end up being parsed
at least twice: once by the Lexer to determine that the string is a date,
then again in IntervalFactory to actually construct the Date.
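To illustrate what type-free tokenizing of a serialization line amounts to, here is a minimal sketch. It is not Timewarrior's Lexer: the function name `splitSerialization` is hypothetical, and the sketch assumes tokens are separated by whitespace and that a tag containing spaces is wrapped in double quotes without escape sequences. Every token comes back as a plain string, so a timestamp such as `20200509T194000Z` is left for the caller to parse exactly once.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not part of Timewarrior): split a serialized interval
// line into plain string tokens without classifying them. Whitespace separates
// tokens; a double-quoted run is kept together and returned without its quotes.
// Escape sequences inside quotes are deliberately not handled in this sketch.
std::vector <std::string> splitSerialization (const std::string& line)
{
  std::vector <std::string> tokens;
  std::string::size_type i = 0;
  const std::string::size_type n = line.size ();

  while (i < n)
  {
    // Skip whitespace between tokens.
    while (i < n && std::isspace (static_cast <unsigned char> (line[i])))
      ++i;
    if (i >= n)
      break;

    std::string token;
    if (line[i] == '"')
    {
      // Quoted token: take everything up to the closing quote (or end of line).
      ++i;
      while (i < n && line[i] != '"')
        token += line[i++];
      if (i < n)
        ++i;  // consume the closing quote
    }
    else
    {
      // Bare word: take everything up to the next whitespace character.
      while (i < n && ! std::isspace (static_cast <unsigned char> (line[i])))
        token += line[i++];
    }
    tokens.push_back (token);
  }
  return tokens;
}
```

For example, `splitSerialization ("inc 20200509T194000Z - 20200509T194100Z # proj_1")` yields six strings (`inc`, the two timestamps, `-`, `#`, `proj_1`) and never tries to decide whether any of them is a date.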

Below are the before and after results when exporting a database with
100 lines. On debug builds, the number of instructions executed went from
31,552,467 to 12,952,372 (about 59% fewer). Release builds saw a change
from roughly 14K to 7K instructions.

Before:

  $ rm -fr ~/.timewarrior; src/timew :yes >/dev/null; for x in {100..1}; do src/timew start ${x}sec ago proj_${x} >/dev/null; done;
  $ sudo chrt -f 99 valgrind --tool=callgrind --callgrind-out-file=callgrind.out src/timew export >/dev/null
  ==20888== Callgrind, a call-graph generating cache profiler
  ==20888== Copyright (C) 2002-2017, and GNU GPL'd, by Josef Weidendorfer et al.
  ==20888== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
  ==20888== Command: src/timew export
  ==20888==
  ==20888== For interactive control, run 'callgrind_control -h'.
  ==20888==
  ==20888== Events    : Ir
  ==20888== Collected : 31552467
  ==20888==
  ==20888== I   refs:      31,552,467

After:

  $ sudo chrt -f 99 valgrind --tool=callgrind --callgrind-out-file=callgrind.out src/timew export >/dev/null
  ==24088== Callgrind, a call-graph generating cache profiler
  ==24088== Copyright (C) 2002-2017, and GNU GPL'd, by Josef Weidendorfer et al.
  ==24088== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
  ==24088== Command: src/timew export
  ==24088==
  ==24088== For interactive control, run 'callgrind_control -h'.
  ==24088==
  ==24088== Events    : Ir
  ==24088== Collected : 12952372
  ==24088==
  ==24088== I   refs:      12,952,372

Signed-off-by: Shaun Ruffell <sruffell@sruffell.net>
Shaun Ruffell 2020-05-09 14:50:43 -05:00 committed by lauft
parent 3de53d7599
commit 85d991704b


@@ -29,21 +29,40 @@
 #include <IntervalFactory.h>
 #include <JSON.h>
 ////////////////////////////////////////////////////////////////////////////////
-// Syntax:
-// 'inc' [ <iso> [ '-' <iso> ]] [ '#' <tag> [ <tag> ... ]]
-Interval IntervalFactory::fromSerialization (const std::string& line)
+static std::vector <std::string> tokenizeSerialization (const std::string& line)
 {
-  Lexer lexer (line);
   std::vector <std::string> tokens;
+  Lexer lexer (line);
   std::string token;
   Lexer::Type type;
+  // When parsing the serialization, we only need the lexer to look for strings
+  // and words since we're not using the provided type information
+  lexer.noDate ();
+  lexer.noDuration ();
+  lexer.noUUID ();
+  lexer.noHexNumber ();
+  lexer.noURL ();
+  lexer.noPath ();
+  lexer.noPattern ();
+  lexer.noOperator ();
   while (lexer.token (token, type))
   {
     tokens.push_back (Lexer::dequote (token));
   }
+  return tokens;
+}
+////////////////////////////////////////////////////////////////////////////////
+// Syntax:
+// 'inc' [ <iso> [ '-' <iso> ]] [ '#' <tag> [ <tag> ... ]]
+Interval IntervalFactory::fromSerialization (const std::string& line)
+{
+  std::vector <std::string> tokens = tokenizeSerialization (line);
   // Minimal requirement 'inc'.
   if (!tokens.empty () && tokens[0] == "inc")
   {
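The `lexer.noDate ()` style switches above suggest a lexer that tries a chain of recognizers and can skip individual ones. The toy sketch below shows why switching a recognizer off saves work. `ToyLexer` is hypothetical and is not Timewarrior's Lexer class, and its date check is a deliberately simplified test for the `YYYYMMDDTHHMMSSZ` shape rather than real date parsing. It counts how often the date recognizer runs so the two configurations can be compared.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy lexer (not Timewarrior's Lexer): classifies words by trying
// recognizers in order. A recognizer that is switched off is skipped entirely,
// so its cost is never paid.
class ToyLexer
{
public:
  enum class Type { date, number, word };

  void noDate () { _dateEnabled = false; }

  // Number of times the (comparatively expensive) date check actually ran.
  int dateChecks () const { return _dateChecks; }

  std::vector <Type> classify (const std::vector <std::string>& words)
  {
    std::vector <Type> types;
    for (const auto& w : words)
    {
      if (_dateEnabled && isDate (w))
        types.push_back (Type::date);
      else if (isNumber (w))
        types.push_back (Type::number);
      else
        types.push_back (Type::word);
    }
    return types;
  }

private:
  // Simplified stand-in for date parsing: accepts only "YYYYMMDDTHHMMSSZ".
  bool isDate (const std::string& w)
  {
    ++_dateChecks;
    if (w.size () != 16 || w[8] != 'T' || w[15] != 'Z')
      return false;
    for (std::size_t i = 0; i < 15; ++i)
      if (i != 8 && ! std::isdigit (static_cast <unsigned char> (w[i])))
        return false;
    return true;
  }

  static bool isNumber (const std::string& w)
  {
    if (w.empty ())
      return false;
    for (char c : w)
      if (! std::isdigit (static_cast <unsigned char> (c)))
        return false;
    return true;
  }

  bool _dateEnabled {true};
  int  _dateChecks  {0};
};
```

With the default configuration, classifying the six words of `inc 20200509T194000Z - 20200509T194100Z # proj_1` runs the date check six times; after `noDate ()` it runs zero times and the timestamps simply come back as words, leaving the caller to parse them once.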