Use a multi-stage docker build so that we can cache early stages and not
need to download the apt-provided dependencies during every build (when
only later stages change).
Deleting the apt cache can be helpful for reducing the size of a
container, but since we don't push it anywhere, it only hinders our
ability to debug problems while working on the container. Keep it.
For some patches, it is not possible to derive the old and new file
paths from the patch header's first line, most importantly when they
contain spaces. In such a case, we derive both paths from the "---" and
"+++" lines, which allow for non-ambiguous parsing. We fail to use these
paths when parsing binary patches without data, though, as we always
expect the header paths to be filled in.
Fix this by using the "---"/"+++" paths by default and only fall back to
header paths if they aren't set. If neither of those paths are set, we
just return an error. Add two tests to verify this behaviour, one of
which would have previously caused a segfault.
IIRC I got a strange return once from lstat, which translated in a weird
error class/message being reported. As a safety measure, enforce a -1 return in
that case.
Similar as in commit dadbb33b6 (Fix crash if snapshotting a
config_snapshot, 2019-11-01), let's implement snapshots for in-memory
configuration entries. As this deletes more code than it adds, it
doesn't make any sense to not allow for this and allows users to treat
config backends mostly the same.
When parsing header paths from a patch, we reject any patches with empty
paths as malformed patches. We perform the check whether a path is empty
before sanitizing it, though, which may lead to a path becoming empty
after the check, e.g. if we have trimmed whitespace. This may lead to a
segfault later when any part of our patching logic actually references
such a path, which may then be a `NULL` pointer.
Fix the issue by performing the check after sanitizing. Add tests to
catch the issue as they would have produced a segfault previosuly.
When creating a configuration file iterator, then we first refresh the
backend and then afterwards duplicate all refreshed configuration
entries into the iterator in order to avoid seeing any concurrent
modifications of the entries while iterating. The duplication of entries
is not guarded, though, as we do not increase the refcount of the
entries that we duplicate right now. This opens us up for a race, as
another thread may concurrently refresh the repository configuration and
thus swap out the current set of entries. As we didn't increase the
refcount, this may lead to the entries being free'd while we iterate
over them in the first thread.
Fix the issue by properly handling the lifecycle of the backend's
entries via `config_file_entries_take` and `git_config_entries_free`,
respectively.
The function to take a reference to the config file's config entries
currently returns the reference via return value. Due to this, it's
harder than necessary to integrate into our typical coding style, as one
needs to make sure that a proper error code is set before erroring out
from the caller. This bites us in `config_file_delete`, where we call
`goto out` directly when `config_file_entries_take` returns `NULL`, but
we actually forget to set up the error code and thus return success.
Fix the issue by refactoring the function to return an error code and
pass the reference via an out-pointer.
As with the predecessing commit, this commit renames backend functions
of the configuration file backend. This helps to clearly separate
functionality and also to be able to see from backtraces which backend
is currently in use.
The configuration snapshot backend has been extracted from the old files
backend back in 2bff84ba4 (config_file: separate out read-only backend,
2019-07-26). To keep code churn manageable, the local functions weren't
renamed yet and thus still have references to the old diskfile backend.
Rename them accordingly to make them easier to understand.
There can be a significant difference between the system where we created the
buffer (if at all) and when the caller provides us with the contents of a
commit.
Verify that the commit we are being asked to create references objects which do
exist in the target repository.
There can be a significant difference between the system where we created the
buffer (if at all) and when the caller provides us with the contents of a
commit.
Provide some test cases (we have to adapt the existing ones because they refer
to trees and commits which do not exist).
When the patch contains lines close to INT_MAX, then it may happen that
we end up with an integer overflow when calculating the line of the
current diff hunk. Reject such patches as unreasonable to avoid the
integer overflow.
As the calculation is performed on integers, we introduce two new
helpers `git__add_int_overflow` and `git__sub_int_overflow` that perform
the integer overflow check in a generic way.
We've got two locations where we copy lines into the patch. The first
one is when copying normal " ", "-" or "+" lines, while the second
location gets executed when we copy "\ No newline at end of file" lines.
While the first one correctly uses `git__strndup` to copy only until the
newline, the other one doesn't. Thus, if the line occurs at the end of
the patch and if there is no terminating NUL character, then it may
result in an out-of-bounds read.
Fix the issue by using `git__strndup`, as was already done in the other
location. Furthermore, add allocation checks to both locations to detect
out-of-memory situations.
When parsing patch headers, we currently accept empty path names just
fine, e.g. a line "--- \n" would be parsed as the empty filename. This
is not a valid patch format and may cause `NULL` pointer accesses at a
later place as `git_buf_detach` will return `NULL` in that case.
Reject such patches as malformed with a nice error message.
It's currently possible to have patches with multiple old path name
headers. As we didn't check for this case, this resulted in a memory
leak when overwriting the old old path with the new old path because we
simply discarded the old pointer.
Instead of fixing this by free'ing the old pointer, we should reject
such patches altogether. It doesn't make any sense for the "---" or
"+++" markers to occur multiple times within a patch n the first place.
This also implicitly fixes the memory leak.
In previous versions, libgit2 could be coerced into writing reflog
messages with embedded newlines into the reflog by using
`git_stash_save` with a message containing newlines. While the root
cause is fixed now, it was noticed that upstream git is in fact able to
read such corrupted reflog messages just fine.
Make the reflog parser more lenient in order to just skip over
malformatted reflog lines to bring us in line with git. This requires us
to change an existing test that verified that we do indeed _fail_ to
parse such logs.
Currently, the reflog disallows any entries that have a message with
newlines, as that would effectively break the reflog format, which may
contain a single line per entry, only. Upstream git behaves a bit
differently, though, especially when considering stashes: instead of
rejecting any reflog entry with newlines, git will simply replace
newlines with spaces. E.g. executing 'git stash push -m "foo\nbar"' will
create a reflog entry with "foo bar" as entry message.
This commit adjusts our own logic to stop rejecting commit messages with
newlines. Previously, this logic was part of `git_reflog_append`, only.
There is a second place though where we add reflog entries, which is the
serialization code in the filesystem refdb. As it didn't contain any
sanity checks whatsoever, the refdb would have been perfectly happy to
write malformatted reflog entries to the disk. This is being fixed with
the same logic as for the reflog itself.