Squashed 'third_party/git/' content from commit cb71568594
git-subtree-dir: third_party/git
git-subtree-split: cb715685942260375e1eb8153b0768a376e4ece7
commit
1b593e1ea4
3629 changed files with 1139935 additions and 0 deletions
1
Documentation/technical/.gitignore
vendored
Normal file
|
|
@ -0,0 +1 @@
|
|||
api-index.txt
|
||||
39
Documentation/technical/api-allocation-growing.txt
Normal file
|
|
@ -0,0 +1,39 @@
|
|||
allocation growing API
|
||||
======================
|
||||
|
||||
Dynamically growing an array using realloc() is error prone and boring.
|
||||
|
||||
Define your array with:
|
||||
|
||||
* a pointer (`item`) that points at the array, initialized to `NULL`
|
||||
(although please name the variable based on its contents, not on its
|
||||
type);
|
||||
|
||||
* an integer variable (`alloc`) that keeps track of how big the current
|
||||
allocation is, initialized to `0`;
|
||||
|
||||
* another integer variable (`nr`) to keep track of how many elements the
|
||||
array currently has, initialized to `0`.
|
||||
|
||||
Then, before adding the `n`th element to the array, call `ALLOC_GROW(item, n,
alloc)`. This ensures that the array can hold at least `n` elements by
calling `realloc(3)` and adjusting the `alloc` variable.
|
||||
|
||||
------------
sometype *item;
size_t nr;
size_t alloc;

for (i = 0; i < nr; i++)
	if (we like item[i] already)
		return;

/* we did not like any existing one, so add one */
ALLOC_GROW(item, nr + 1, alloc);
item[nr++] = value you like;
------------
|
||||
|
||||
You are responsible for updating the `nr` variable.
|
||||
|
||||
If you need to specify the number of elements to allocate explicitly
|
||||
then use the macro `REALLOC_ARRAY(item, alloc)` instead of `ALLOC_GROW`.
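
The pattern above can be tried outside of Git with a small stand-alone sketch. The `SKETCH_ALLOC_GROW` and `SKETCH_REALLOC_ARRAY` macros, the `int_list` struct and the `int_list_add_unique()` helper below are all hypothetical stand-ins written for this illustration; the growth policy and the abort-on-failure behaviour are assumptions, not Git's definitions.

```c
#include <stdlib.h>

/*
 * Simplified stand-ins for Git's macros, so the example compiles on its
 * own.  The growth policy and the abort-on-failure behaviour here are
 * assumptions for illustration, not Git's definitions.
 */
#define SKETCH_REALLOC_ARRAY(x, n) \
	do { \
		void *p_ = realloc((x), (n) * sizeof(*(x))); \
		if (!p_) \
			abort(); \
		(x) = p_; \
	} while (0)

#define SKETCH_ALLOC_GROW(x, nr, alloc) \
	do { \
		if ((nr) > (alloc)) { \
			size_t want_ = ((alloc) + 16) * 3 / 2; \
			(alloc) = want_ < (size_t)(nr) ? (size_t)(nr) : want_; \
			SKETCH_REALLOC_ARRAY(x, alloc); \
		} \
	} while (0)

struct int_list {
	int *item;	/* starts out NULL */
	size_t nr;	/* elements in use */
	size_t alloc;	/* elements allocated */
};

/* Append `value` unless it is already present; return its index. */
static size_t int_list_add_unique(struct int_list *list, int value)
{
	size_t i;

	for (i = 0; i < list->nr; i++)
		if (list->item[i] == value)
			return i;

	/* Make room for one more element, then store it and bump nr. */
	SKETCH_ALLOC_GROW(list->item, list->nr + 1, list->alloc);
	list->item[list->nr] = value;
	return list->nr++;
}
```

Note that the macro only touches `item` and `alloc`; the caller still increments `nr` itself, as the text above requires.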
|
||||
65
Documentation/technical/api-argv-array.txt
Normal file
|
|
@ -0,0 +1,65 @@
|
|||
argv-array API
|
||||
==============
|
||||
|
||||
The argv-array API allows one to dynamically build and store
|
||||
NULL-terminated lists. An argv-array maintains the invariant that the
|
||||
`argv` member always points to a non-NULL array, and that the array is
|
||||
always NULL-terminated at the element pointed to by `argv[argc]`. This
|
||||
makes the result suitable for passing to functions expecting to receive
|
||||
argv from main(), or the link:api-run-command.html[run-command API].
|
||||
|
||||
The string-list API (documented in string-list.h) is similar, but cannot be
|
||||
used for these purposes; instead of storing a straight string pointer,
|
||||
it contains an item structure with a `util` field that is not compatible
|
||||
with the traditional argv interface.
|
||||
|
||||
Each `argv_array` manages its own memory. Any strings pushed into the
|
||||
array are duplicated, and all memory is freed by argv_array_clear().
|
||||
|
||||
Data Structures
|
||||
---------------
|
||||
|
||||
`struct argv_array`::
|
||||
|
||||
A single array. This should be initialized by assignment from
|
||||
`ARGV_ARRAY_INIT`, or by calling `argv_array_init`. The `argv`
|
||||
member contains the actual array; the `argc` member contains the
|
||||
number of elements in the array, not including the terminating
|
||||
NULL.
|
||||
|
||||
Functions
|
||||
---------
|
||||
|
||||
`argv_array_init`::
|
||||
Initialize an array. This is no different than assigning from
|
||||
`ARGV_ARRAY_INIT`.
|
||||
|
||||
`argv_array_push`::
|
||||
Push a copy of a string onto the end of the array.
|
||||
|
||||
`argv_array_pushl`::
|
||||
Push a list of strings onto the end of the array. The arguments
|
||||
should be a list of `const char *` strings, terminated by a NULL
|
||||
argument.
|
||||
|
||||
`argv_array_pushf`::
|
||||
Format a string and push it onto the end of the array. This is a
|
||||
convenience wrapper combining `strbuf_addf` and `argv_array_push`.
|
||||
|
||||
`argv_array_pushv`::
|
||||
Push a null-terminated array of strings onto the end of the array.
|
||||
|
||||
`argv_array_pop`::
|
||||
Remove the final element from the array. If there are no
|
||||
elements in the array, do nothing.
|
||||
|
||||
`argv_array_clear`::
|
||||
Free all memory associated with the array and return it to the
|
||||
initial, empty state.
|
||||
|
||||
`argv_array_detach`::
|
||||
Disconnect the `argv` member from the `argv_array` struct and
|
||||
return it. The caller is responsible for freeing the memory used
|
||||
by the array, and by the strings it references. After detaching,
|
||||
the `argv_array` is in a reinitialized state and can be pushed
|
||||
into again.
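
To make the invariant concrete (`argv` is never NULL and `argv[argc]` is always NULL), here is a miniature, stand-alone sketch. It is not Git's implementation: the `sketch_argv` struct, its `alloc` field and all `sketch_argv_*` functions are hypothetical names invented for this example, and the doubling growth policy is an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of the structure described above; not Git's code. */
struct sketch_argv {
	char **argv;	/* non-NULL once initialized, always NULL-terminated */
	int argc;	/* number of strings, excluding the terminator */
	int alloc;	/* slots allocated, including the terminator */
};

static void sketch_argv_init(struct sketch_argv *a)
{
	a->alloc = 4;
	a->argv = calloc(a->alloc, sizeof(*a->argv));
	if (!a->argv)
		abort();
	a->argc = 0;
}

/* Push a private copy of `value`, keeping argv[argc] == NULL. */
static void sketch_argv_push(struct sketch_argv *a, const char *value)
{
	size_t len = strlen(value) + 1;
	char *copy = malloc(len);

	if (!copy)
		abort();
	memcpy(copy, value, len);

	/* Need room for the new string plus the NULL terminator. */
	if (a->argc + 2 > a->alloc) {
		char **grown;

		a->alloc *= 2;
		grown = realloc(a->argv, a->alloc * sizeof(*a->argv));
		if (!grown)
			abort();
		a->argv = grown;
	}
	a->argv[a->argc++] = copy;
	a->argv[a->argc] = NULL;
}

/* Remove the final element; do nothing on an empty array. */
static void sketch_argv_pop(struct sketch_argv *a)
{
	if (!a->argc)
		return;
	free(a->argv[--a->argc]);
	a->argv[a->argc] = NULL;
}

/* Free every string and the array itself, then start over empty. */
static void sketch_argv_clear(struct sketch_argv *a)
{
	while (a->argc)
		sketch_argv_pop(a);
	free(a->argv);
	sketch_argv_init(a);
}
```

Because every push copies its argument, the caller may pass string literals or short-lived buffers without worrying about lifetime.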
|
||||
319
Documentation/technical/api-config.txt
Normal file
|
|
@ -0,0 +1,319 @@
|
|||
config API
|
||||
==========
|
||||
|
||||
The config API gives callers a way to access Git configuration files
|
||||
(and files which have the same syntax). See linkgit:git-config[1] for a
|
||||
discussion of the config file syntax.
|
||||
|
||||
General Usage
|
||||
-------------
|
||||
|
||||
Config files are parsed linearly, and each variable found is passed to a
|
||||
caller-provided callback function. The callback function is responsible
|
||||
for any actions to be taken on the config option, and is free to ignore
|
||||
some options. It is not uncommon for the configuration to be parsed
|
||||
several times during the run of a Git program, with different callbacks
|
||||
picking out different variables useful to themselves.
|
||||
|
||||
A config callback function takes three parameters:
|
||||
|
||||
- the name of the parsed variable. This is in canonical "flat" form: the
|
||||
section, subsection, and variable segments will be separated by dots,
|
||||
and the section and variable segments will be all lowercase. E.g.,
|
||||
`core.ignorecase`, `diff.SomeType.textconv`.
|
||||
|
||||
- the value of the found variable, as a string. If the variable had no
|
||||
value specified, the value will be NULL (typically this means it
|
||||
should be interpreted as boolean true).
|
||||
|
||||
- a void pointer passed in by the caller of the config API; this can
|
||||
contain callback-specific data
|
||||
|
||||
A config callback should return 0 for success, or -1 if the variable
|
||||
could not be parsed properly.
|
||||
|
||||
Basic Config Querying
|
||||
---------------------
|
||||
|
||||
Most programs will simply want to look up variables in all config files
|
||||
that Git knows about, using the normal precedence rules. To do this,
|
||||
call `git_config` with a callback function and void data pointer.
|
||||
|
||||
`git_config` will read all config sources in order of increasing
|
||||
priority. Thus a callback should typically overwrite previously-seen
|
||||
entries with new ones (e.g., if both the user-wide `~/.gitconfig` and
|
||||
repo-specific `.git/config` contain `color.ui`, the config machinery
|
||||
will first feed the user-wide one to the callback, and then the
|
||||
repo-specific one; by overwriting, the higher-priority repo-specific
|
||||
value is left at the end).
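
The callback contract and the overwrite-on-repeat behaviour described above can be sketched as a stand-alone function. The key `demo.editor`, the `demo_settings` struct and the `demo_config_cb()` name are made up for this illustration; in real Git code the function would be handed to `git_config` together with a pointer to the struct.

```c
#include <stdlib.h>
#include <string.h>

/* Caller-owned state handed to the callback through the void pointer. */
struct demo_settings {
	char *editor;	/* last value seen for "demo.editor", or NULL */
};

/*
 * Has the three-parameter callback shape described above.  The key
 * "demo.editor" and struct demo_settings are made up for this sketch.
 */
static int demo_config_cb(const char *var, const char *value, void *data)
{
	struct demo_settings *s = data;
	size_t len;
	char *copy;

	if (strcmp(var, "demo.editor"))
		return 0;	/* not ours; ignoring a variable is fine */

	if (!value)
		return -1;	/* a bare key makes no sense for a string */

	len = strlen(value) + 1;
	copy = malloc(len);
	if (!copy)
		return -1;
	memcpy(copy, value, len);

	/* Later sources have higher priority, so replace the old value. */
	free(s->editor);
	s->editor = copy;
	return 0;
}
```

Feeding the callback the same key twice leaves only the second value in place, which is how the highest-priority source ends up winning.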
|
||||
|
||||
The `config_with_options` function lets the caller examine config
|
||||
while adjusting some of the default behavior of `git_config`. It should
|
||||
almost never be used by "regular" Git code that is looking up
|
||||
configuration variables. It is intended for advanced callers like
|
||||
`git-config`, which are intentionally tweaking the normal config-lookup
|
||||
process. It takes two extra parameters:
|
||||
|
||||
`config_source`::
|
||||
If this parameter is non-NULL, it specifies the source to parse for
|
||||
configuration, rather than looking in the usual files. See `struct
|
||||
git_config_source` in `config.h` for details. Regular `git_config` defaults
|
||||
to `NULL`.
|
||||
|
||||
`opts`::
|
||||
Specify options to adjust the behavior of parsing config files. See `struct
|
||||
config_options` in `config.h` for details. As an example: regular `git_config`
|
||||
sets `opts.respect_includes` to `1` by default.
|
||||
|
||||
Reading Specific Files
|
||||
----------------------
|
||||
|
||||
To read a specific file in git-config format, use
|
||||
`git_config_from_file`. This takes the same callback and data parameters
|
||||
as `git_config`.
|
||||
|
||||
Querying For Specific Variables
|
||||
-------------------------------
|
||||
|
||||
For programs wanting to query for specific variables in a non-callback
|
||||
manner, the config API provides two functions `git_config_get_value`
|
||||
and `git_config_get_value_multi`. They both read values from an internal
|
||||
cache generated previously from reading the config files.
|
||||
|
||||
`int git_config_get_value(const char *key, const char **value)`::
|
||||
|
||||
Finds the highest-priority value for the configuration variable `key`,
|
||||
stores the pointer to it in `value` and returns 0. When the
|
||||
configuration variable `key` is not found, returns 1 without touching
|
||||
`value`. The caller should not free or modify `value`, as it is owned
|
||||
by the cache.
|
||||
|
||||
`const struct string_list *git_config_get_value_multi(const char *key)`::
|
||||
|
||||
Finds and returns the value list, sorted in order of increasing priority
|
||||
for the configuration variable `key`. When the configuration variable
|
||||
`key` is not found, returns NULL. The caller should not free or modify
|
||||
the returned pointer, as it is owned by the cache.
|
||||
|
||||
`void git_config_clear(void)`::
|
||||
|
||||
Resets and invalidates the config cache.
|
||||
|
||||
The config API also provides type specific API functions which do conversion
|
||||
as well as retrieval for the queried variable, including:
|
||||
|
||||
`int git_config_get_int(const char *key, int *dest)`::
|
||||
|
||||
Finds and parses the value to an integer for the configuration variable
|
||||
`key`. Dies on error; otherwise, stores the value of the parsed integer in
|
||||
`dest` and returns 0. When the configuration variable `key` is not found,
|
||||
returns 1 without touching `dest`.
|
||||
|
||||
`int git_config_get_ulong(const char *key, unsigned long *dest)`::
|
||||
|
||||
Similar to `git_config_get_int` but for unsigned longs.
|
||||
|
||||
`int git_config_get_bool(const char *key, int *dest)`::
|
||||
|
||||
Finds and parses the value into a boolean value, for the configuration
|
||||
variable `key` respecting keywords like "true" and "false". Integer
|
||||
values are converted into true/false values (when they are non-zero or
|
||||
zero, respectively). Other values cause a die(). If parsing is successful,
|
||||
stores the value of the parsed result in `dest` and returns 0. When the
|
||||
configuration variable `key` is not found, returns 1 without touching
|
||||
`dest`.
|
||||
|
||||
`int git_config_get_bool_or_int(const char *key, int *is_bool, int *dest)`::
|
||||
|
||||
Similar to `git_config_get_bool`, except that integers are copied as-is,
and the `is_bool` flag is unset.
|
||||
|
||||
`int git_config_get_maybe_bool(const char *key, int *dest)`::
|
||||
|
||||
Similar to `git_config_get_bool`, except that it returns -1 on error
|
||||
rather than dying.
|
||||
|
||||
`int git_config_get_string_const(const char *key, const char **dest)`::
|
||||
|
||||
Allocates and copies the retrieved string into the `dest` parameter for
the configuration variable `key`; if a NULL string is given, prints an
error message and returns -1. When the configuration variable `key` is
not found, returns 1 without touching `dest`.
|
||||
|
||||
`int git_config_get_string(const char *key, char **dest)`::
|
||||
|
||||
Similar to `git_config_get_string_const`, except that the retrieved value
copied into the `dest` parameter is a mutable string.
|
||||
|
||||
`int git_config_get_pathname(const char *key, const char **dest)`::
|
||||
|
||||
Similar to `git_config_get_string`, but expands `~` or `~user` into
|
||||
the user's home directory when found at the beginning of the path.
|
||||
|
||||
`git_die_config(const char *key, const char *err, ...)`::
|
||||
|
||||
First prints the error message specified by the caller in `err` and then
|
||||
dies printing the line number and the file name of the highest priority
|
||||
value for the configuration variable `key`.
|
||||
|
||||
`void git_die_config_linenr(const char *key, const char *filename, int linenr)`::
|
||||
|
||||
Helper function which formats the die error message according to the
|
||||
parameters entered. Used by `git_die_config()`. It can be used by callers
|
||||
handling `git_config_get_value_multi()` to print the correct error message
|
||||
for the desired value.
|
||||
|
||||
See test-config.c for usage examples.
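
The boolean conversion described for `git_config_get_bool` and `git_config_get_maybe_bool` can be approximated by the stand-alone sketch below. `sketch_maybe_bool()` is a hypothetical name. Accepting only the keywords "true" and "false", matching them case-insensitively, and treating a NULL value as true (following the callback convention described earlier) are simplifying assumptions of this example, not a statement of Git's full rules.

```c
#include <errno.h>
#include <stdlib.h>
#include <strings.h>

/*
 * Returns 1 for true, 0 for false, -1 if `value` is neither a keyword
 * nor an integer.  Only "true" and "false" are recognised as keywords
 * here; this is a simplification.
 */
static int sketch_maybe_bool(const char *value)
{
	char *end;
	long n;

	if (!value)
		return 1;	/* variable given without a value */
	if (!strcasecmp(value, "true"))
		return 1;
	if (!strcasecmp(value, "false"))
		return 0;

	/* Fall back to an integer: non-zero is true, zero is false. */
	errno = 0;
	n = strtol(value, &end, 10);
	if (errno || end == value || *end)
		return -1;
	return n != 0;
}
```

A dying variant would call `die()` where this sketch returns -1.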
|
||||
|
||||
Value Parsing Helpers
|
||||
---------------------
|
||||
|
||||
To aid in parsing string values, the config API provides callbacks with
|
||||
a number of helper functions, including:
|
||||
|
||||
`git_config_int`::
|
||||
Parse the string to an integer, including unit factors. Dies on error;
|
||||
otherwise, returns the parsed result.
|
||||
|
||||
`git_config_ulong`::
|
||||
Identical to `git_config_int`, but for unsigned longs.
|
||||
|
||||
`git_config_bool`::
|
||||
Parse a string into a boolean value, respecting keywords like "true" and
|
||||
"false". Integer values are converted into true/false values (when they
|
||||
are non-zero or zero, respectively). Other values cause a die(). If
|
||||
parsing is successful, the return value is the result.
|
||||
|
||||
`git_config_bool_or_int`::
|
||||
Same as `git_config_bool`, except that integers are returned as-is, and
the `is_bool` flag is unset.
|
||||
|
||||
`git_parse_maybe_bool`::
|
||||
Same as `git_config_bool`, except that it returns -1 on error rather
|
||||
than dying.
|
||||
|
||||
`git_config_string`::
|
||||
Allocates and copies the value string into the `dest` parameter; if no
|
||||
string is given, prints an error message and returns -1.
|
||||
|
||||
`git_config_pathname`::
|
||||
Similar to `git_config_string`, but expands `~` or `~user` into the
|
||||
user's home directory when found at the beginning of the path.
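
As an illustration of what "including unit factors" means for integer values, the following stand-alone sketch parses a number with an optional one-letter suffix. The function name `sketch_parse_int()` and its return convention are hypothetical, and the accepted suffixes (`k`, `m`, `g`, scaling by powers of 1024) are an assumption of this example; unlike `git_config_int`, it reports errors to the caller instead of dying.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a decimal integer with an optional one-letter unit suffix.
 * The suffixes k, m and g (1024, 1024^2, 1024^3) are an assumption of
 * this sketch.  Returns 0 and stores the result on success, -1 on
 * malformed input or overflow.
 */
static int sketch_parse_int(const char *value, long long *out)
{
	char *end;
	long long n, factor = 1;

	if (!value || !*value)
		return -1;

	errno = 0;
	n = strtoll(value, &end, 10);
	if (errno || end == value)
		return -1;

	switch (*end) {
	case '\0':
		break;
	case 'k': case 'K':
		factor = 1024LL;
		end++;
		break;
	case 'm': case 'M':
		factor = 1024LL * 1024;
		end++;
		break;
	case 'g': case 'G':
		factor = 1024LL * 1024 * 1024;
		end++;
		break;
	default:
		return -1;
	}
	if (*end)
		return -1;	/* trailing garbage after the unit */

	/* Refuse results that would not fit in a long long. */
	if (n > LLONG_MAX / factor || n < LLONG_MIN / factor)
		return -1;
	*out = n * factor;
	return 0;
}
```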
|
||||
|
||||
Include Directives
|
||||
------------------
|
||||
|
||||
By default, the config parser does not respect include directives.
|
||||
However, a caller can use the special `git_config_include` wrapper
|
||||
callback to support them. To do so, you simply wrap your "real" callback
|
||||
function and data pointer in a `struct config_include_data`, and pass
|
||||
the wrapper to the regular config-reading functions. For example:
|
||||
|
||||
-------------------------------------------
int read_file_with_include(const char *file, config_fn_t fn, void *data)
{
	struct config_include_data inc = CONFIG_INCLUDE_INIT;
	inc.fn = fn;
	inc.data = data;
	return git_config_from_file(git_config_include, file, &inc);
}
-------------------------------------------
|
||||
|
||||
`git_config` respects includes automatically. The lower-level
|
||||
`git_config_from_file` does not.
|
||||
|
||||
Custom Configsets
|
||||
-----------------
|
||||
|
||||
A `config_set` can be used to construct an in-memory cache for
|
||||
config-like files that the caller specifies (i.e., files like `.gitmodules`,
|
||||
`~/.gitconfig` etc.). For example,
|
||||
|
||||
----------------------------------------
struct config_set gm_config;
int b;

git_configset_init(&gm_config);
/* we add config files to the config_set */
git_configset_add_file(&gm_config, ".gitmodules");
git_configset_add_file(&gm_config, ".gitmodules_alt");

if (!git_configset_get_bool(&gm_config, "submodule.frotz.ignore", &b)) {
	/* hack hack hack */
}

/* when we are done with the configset */
git_configset_clear(&gm_config);
----------------------------------------
|
||||
|
||||
The configset API provides functions for the above-mentioned workflow, including:
|
||||
|
||||
`void git_configset_init(struct config_set *cs)`::
|
||||
|
||||
Initializes the config_set `cs`.
|
||||
|
||||
`int git_configset_add_file(struct config_set *cs, const char *filename)`::
|
||||
|
||||
Parses the file and adds the variable-value pairs to the `config_set`;
dies if there is an error in parsing the file. Returns 0 on success, or
-1 if the file does not exist or is inaccessible. When the function
returns -1, the caller has to decide whether to free the incomplete
configset or continue using it.
|
||||
|
||||
`int git_configset_get_value(struct config_set *cs, const char *key, const char **value)`::
|
||||
|
||||
Finds the highest-priority value for the configuration variable `key`
|
||||
and config set `cs`, stores the pointer to it in `value` and returns 0.
|
||||
When the configuration variable `key` is not found, returns 1 without
|
||||
touching `value`. The caller should not free or modify `value`, as it
|
||||
is owned by the cache.
|
||||
|
||||
`const struct string_list *git_configset_get_value_multi(struct config_set *cs, const char *key)`::
|
||||
|
||||
Finds and returns the value list, sorted in order of increasing priority
|
||||
for the configuration variable `key` and config set `cs`. When the
|
||||
configuration variable `key` is not found, returns NULL. The caller
|
||||
should not free or modify the returned pointer, as it is owned by the cache.
|
||||
|
||||
`void git_configset_clear(struct config_set *cs)`::
|
||||
|
||||
Clears `config_set` structure, removes all saved variable-value pairs.
|
||||
|
||||
In addition to the above functions, the `config_set` API provides type-specific
functions in the vein of `git_config_get_int` and family, but with an extra
parameter: a pointer to a `struct config_set`.
They all behave similarly to the `git_config_get*()` family described in
"Querying For Specific Variables" above.
|
||||
|
||||
Writing Config Files
|
||||
--------------------
|
||||
|
||||
Git provides multiple entry points in the config API for writing config values to
files, namely `git_config_set_in_file` and `git_config_set`, which write to
a specific config file or to `.git/config` respectively. They both take a
key/value pair as parameters.
In the end they both call `git_config_set_multivar_in_file`, which takes five
parameters:
|
||||
|
||||
- the name of the file, as a string, to which key/value pairs will be written.
|
||||
|
||||
- the name of key, as a string. This is in canonical "flat" form: the section,
|
||||
subsection, and variable segments will be separated by dots, and the section
|
||||
and variable segments will be all lowercase.
|
||||
E.g., `core.ignorecase`, `diff.SomeType.textconv`.
|
||||
|
||||
- the value of the variable, as a string. If value is equal to NULL, it will
|
||||
remove the matching key from the config file.
|
||||
|
||||
- the value regex, as a string. It will disregard key/value pairs where value
|
||||
does not match.
|
||||
|
||||
- a multi_replace value, as an int. If the value is zero, at most one
matching key/value pair is replaced; otherwise all matching key/value pairs
(regardless of how many) are removed before the new pair is written.
|
||||
|
||||
It returns 0 on success.
|
||||
|
||||
Also, there are functions `git_config_rename_section` and
`git_config_rename_section_in_file` with parameters `old_name` and `new_name`
for renaming or removing sections in the config files. If NULL is passed
as the `new_name` parameter, the section will be removed from the config file.
|
||||
271
Documentation/technical/api-credentials.txt
Normal file
|
|
@ -0,0 +1,271 @@
|
|||
credentials API
|
||||
===============
|
||||
|
||||
The credentials API provides an abstracted way of gathering username and
|
||||
password credentials from the user (even though credentials in the wider
|
||||
world can take many forms, in this document the word "credential" always
|
||||
refers to a username and password pair).
|
||||
|
||||
This document describes two interfaces: the C API that the credential
|
||||
subsystem provides to the rest of Git, and the protocol that Git uses to
|
||||
communicate with system-specific "credential helpers". If you are
|
||||
writing Git code that wants to look up or prompt for credentials, see
|
||||
the section "C API" below. If you want to write your own helper, see
|
||||
the section on "Credential Helpers" below.
|
||||
|
||||
Typical setup
|
||||
-------------
|
||||
|
||||
------------
+-----------------------+
| Git code (C)          |--- to server requiring --->
|                       |        authentication
|.......................|
| C credential API      |--- prompt ---> User
+-----------------------+
	^      |
	| pipe |
	|      v
+-----------------------+
| Git credential helper |
+-----------------------+
------------
|
||||
|
||||
The Git code (typically a remote-helper) will call the C API to obtain
credential data like a login/password pair (credential_fill). The
API will itself call a credential helper (e.g. "git credential-cache" or
"git credential-store") that may retrieve credential data from a
store. If the credential helper cannot find the information, the C API
will prompt the user. Then, the caller of the API takes care of
contacting the server, and does the actual authentication.
|
||||
|
||||
C API
|
||||
-----
|
||||
|
||||
The credential C API is meant to be called by Git code which needs to
|
||||
acquire or store a credential. It is centered around an object
|
||||
representing a single credential and provides three basic operations:
|
||||
fill (acquire credentials by calling helpers and/or prompting the user),
|
||||
approve (mark a credential as successfully used so that it can be stored
|
||||
for later use), and reject (mark a credential as unsuccessful so that it
|
||||
can be erased from any persistent storage).
|
||||
|
||||
Data Structures
|
||||
~~~~~~~~~~~~~~~
|
||||
|
||||
`struct credential`::
|
||||
|
||||
This struct represents a single username/password combination
|
||||
along with any associated context. All string fields should be
|
||||
heap-allocated (or NULL if they are not known or not applicable).
|
||||
The meaning of the individual context fields is the same as
|
||||
their counterparts in the helper protocol; see the section below
|
||||
for a description of each field.
|
||||
+
|
||||
The `helpers` member of the struct is a `string_list` of helpers. Each
|
||||
string specifies an external helper which will be run, in order, to
|
||||
either acquire or store credentials. See the section on credential
|
||||
helpers below. This list is filled-in by the API functions
|
||||
according to the corresponding configuration variables before
|
||||
consulting helpers, so there usually is no need for a caller to
|
||||
modify the helpers field at all.
|
||||
+
|
||||
This struct should always be initialized with `CREDENTIAL_INIT` or
|
||||
`credential_init`.
|
||||
|
||||
|
||||
Functions
|
||||
~~~~~~~~~
|
||||
|
||||
`credential_init`::
|
||||
|
||||
Initialize a credential structure, setting all fields to empty.
|
||||
|
||||
`credential_clear`::
|
||||
|
||||
Free any resources associated with the credential structure,
|
||||
returning it to a pristine initialized state.
|
||||
|
||||
`credential_fill`::
|
||||
|
||||
Instruct the credential subsystem to fill the username and
|
||||
password fields of the passed credential struct by first
|
||||
consulting helpers, then asking the user. After this function
|
||||
returns, the username and password fields of the credential are
|
||||
guaranteed to be non-NULL. If an error occurs, the function will
|
||||
die().
|
||||
|
||||
`credential_reject`::
|
||||
|
||||
Inform the credential subsystem that the provided credentials
|
||||
have been rejected. This will cause the credential subsystem to
|
||||
notify any helpers of the rejection (which allows them, for
|
||||
example, to purge the invalid credentials from storage). It
|
||||
will also free() the username and password fields of the
|
||||
credential and set them to NULL (readying the credential for
|
||||
another call to `credential_fill`). Any errors from helpers are
|
||||
ignored.
|
||||
|
||||
`credential_approve`::
|
||||
|
||||
Inform the credential subsystem that the provided credentials
|
||||
were successfully used for authentication. This will cause the
|
||||
credential subsystem to notify any helpers of the approval, so
|
||||
that they may store the result to be used again. Any errors
|
||||
from helpers are ignored.
|
||||
|
||||
`credential_from_url`::
|
||||
|
||||
Parse a URL into broken-down credential fields.
|
||||
|
||||
Example
|
||||
~~~~~~~
|
||||
|
||||
The example below shows how the functions of the credential API could be
|
||||
used to login to a fictitious "foo" service on a remote host:
|
||||
|
||||
-----------------------------------------------------------------------
int foo_login(struct foo_connection *f)
{
	int status;
	/*
	 * Create a credential with some context; we don't yet know the
	 * username or password.
	 */

	struct credential c = CREDENTIAL_INIT;
	c.protocol = xstrdup("foo");
	c.host = xstrdup(f->hostname);

	/*
	 * Fill in the username and password fields by contacting
	 * helpers and/or asking the user. The function will die if it
	 * fails.
	 */
	credential_fill(&c);

	/*
	 * We now have a username and password. Try to use them.
	 */
	status = send_foo_login(f, c.username, c.password);
	switch (status) {
	case FOO_OK:
		/* It worked. Store the credential for later use. */
		credential_approve(&c);
		break;
	case FOO_BAD_LOGIN:
		/* Erase the credential from storage so we don't try it
		 * again. */
		credential_reject(&c);
		break;
	default:
		/*
		 * Some other error occurred. We don't know if the
		 * credential is good or bad, so report nothing to the
		 * credential subsystem.
		 */
		break;
	}

	/* Free any associated resources. */
	credential_clear(&c);

	return status;
}
-----------------------------------------------------------------------
|
||||
|
||||
|
||||
Credential Helpers
|
||||
------------------
|
||||
|
||||
Credential helpers are programs executed by Git to fetch or save
|
||||
credentials from and to long-term storage (where "long-term" is simply
|
||||
longer than a single Git process; e.g., credentials may be stored
|
||||
in-memory for a few minutes, or indefinitely on disk).
|
||||
|
||||
Each helper is specified by a single string in the configuration
|
||||
variable `credential.helper` (and others, see linkgit:git-config[1]).
|
||||
The string is transformed by Git into a command to be executed using
|
||||
these rules:
|
||||
|
||||
1. If the helper string begins with "!", it is considered a shell
|
||||
snippet, and everything after the "!" becomes the command.
|
||||
|
||||
2. Otherwise, if the helper string begins with an absolute path, the
|
||||
verbatim helper string becomes the command.
|
||||
|
||||
3. Otherwise, the string "git credential-" is prepended to the helper
|
||||
string, and the result becomes the command.
|
||||
|
||||
The resulting command then has an "operation" argument appended to it
|
||||
(see below for details), and the result is executed by the shell.
|
||||
|
||||
Here are some example specifications:
|
||||
|
||||
----------------------------------------------------
|
||||
# run "git credential-foo"
|
||||
foo
|
||||
|
||||
# same as above, but pass an argument to the helper
|
||||
foo --bar=baz
|
||||
|
||||
# the arguments are parsed by the shell, so use shell
|
||||
# quoting if necessary
|
||||
foo --bar="whitespace arg"
|
||||
|
||||
# you can also use an absolute path, which will not use the git wrapper
|
||||
/path/to/my/helper --with-arguments
|
||||
|
||||
# or you can specify your own shell snippet
|
||||
!f() { echo "password=`cat $HOME/.secret`"; }; f
|
||||
----------------------------------------------------
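
The three transformation rules, plus the appended operation argument, can be sketched as a small stand-alone function. This is only an illustration: `sketch_helper_command()` is a hypothetical name, the fixed-size output buffer is a simplification, and treating a leading `/` as the test for an absolute path is an assumption that only holds on POSIX-style systems.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build the command line for a helper string and an operation
 * ("get", "store" or "erase") into `out`.  Returns 0 on success, -1 if
 * the result does not fit.  Treating a leading '/' as "absolute path"
 * is a POSIX-only assumption of this sketch.
 */
static int sketch_helper_command(const char *helper, const char *operation,
				 char *out, size_t outlen)
{
	int n;

	if (helper[0] == '!')		/* rule 1: shell snippet */
		n = snprintf(out, outlen, "%s %s", helper + 1, operation);
	else if (helper[0] == '/')	/* rule 2: absolute path, verbatim */
		n = snprintf(out, outlen, "%s %s", helper, operation);
	else				/* rule 3: prepend "git credential-" */
		n = snprintf(out, outlen, "git credential-%s %s",
			     helper, operation);

	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

For the helper string `foo --bar=baz` and the operation `get`, this produces `git credential-foo --bar=baz get`, which would then be handed to the shell.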
|
||||
|
||||
Generally speaking, rule (3) above is the simplest for users to specify.
|
||||
Authors of credential helpers should make an effort to assist their
|
||||
users by naming their program "git-credential-$NAME", and putting it in
|
||||
the $PATH or $GIT_EXEC_PATH during installation, which will allow a user
|
||||
to enable it with `git config credential.helper $NAME`.
|
||||
|
||||
When a helper is executed, it will have one "operation" argument
|
||||
appended to its command line, which is one of:
|
||||
|
||||
`get`::
|
||||
|
||||
Return a matching credential, if any exists.
|
||||
|
||||
`store`::
|
||||
|
||||
Store the credential, if applicable to the helper.
|
||||
|
||||
`erase`::
|
||||
|
||||
Remove a matching credential, if any, from the helper's storage.
|
||||
|
||||
The details of the credential will be provided on the helper's stdin
|
||||
stream. The exact format is the same as the input/output format of the
|
||||
`git credential` plumbing command (see the section `INPUT/OUTPUT
|
||||
FORMAT` in linkgit:git-credential[1] for a detailed specification).
|
||||
|
||||
For a `get` operation, the helper should produce a list of attributes
|
||||
on stdout in the same format. A helper is free to produce a subset, or
|
||||
even no values at all if it has nothing useful to provide. Any provided
|
||||
attributes will overwrite those already known about by Git. If a helper
|
||||
outputs a `quit` attribute with a value of `true` or `1`, no further
|
||||
helpers will be consulted, nor will the user be prompted (if no
|
||||
credential has been provided, the operation will then fail).
|
||||
|
||||
For a `store` or `erase` operation, the helper's output is ignored.
|
||||
If it fails to perform the requested operation, it may complain to
|
||||
stderr to inform the user. If it does not support the requested
|
||||
operation (e.g., a read-only store), it should silently ignore the
|
||||
request.
|
||||
|
||||
If a helper receives any other operation, it should silently ignore the
|
||||
request. This leaves room for future operations to be added (older
|
||||
helpers will just ignore the new requests).
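
A helper written in C needs to split the attribute lines it receives on stdin. The sketch below assumes the line-oriented `key=value` form, with a blank line ending the list, that linkgit:git-credential[1] specifies; check that page for the authoritative rules. `sketch_parse_attr()` is a hypothetical name, and splitting at the first `=` (so that values may themselves contain `=`) is an assumption of this example.

```c
#include <string.h>

/*
 * Split one "key=value" line in place.  A trailing newline is removed,
 * the first '=' is overwritten with NUL, and *key / *value point into
 * `line`.  Returns 1 for an attribute, 0 for the blank line that ends
 * the input, -1 for a line without '='.
 */
static int sketch_parse_attr(char *line, char **key, char **value)
{
	char *eq;

	line[strcspn(line, "\n")] = '\0';
	if (!*line)
		return 0;

	eq = strchr(line, '=');
	if (!eq)
		return -1;
	*eq = '\0';
	*key = line;
	*value = eq + 1;
	return 1;
}
```

A `get` handler would typically call this in a loop over `fgets()` until it returns 0, then print any attributes it can supply in the same format.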
|
||||
|
||||
See also
|
||||
--------
|
||||
|
||||
linkgit:gitcredentials[7]
|
||||
|
||||
linkgit:git-config[1] (See configuration variables `credential.*`)
|
||||
174
Documentation/technical/api-diff.txt
Normal file
|
|
@ -0,0 +1,174 @@
|
|||
diff API
|
||||
========
|
||||
|
||||
The diff API is for programs that compare two sets of files (e.g. two
trees, one tree and the index) and present the found differences in
various ways. The calling program is responsible for feeding the API
pairs of files, one from the "old" set and the corresponding one from
the "new" set, that are different. The library called through this API is
called diffcore, and is responsible for two things:
|
||||
|
||||
* finding total rewrites (`-B`), renames (`-M`) and copies (`-C`), and
|
||||
changes that touch a string (`-S`), as specified by the caller.
|
||||
|
||||
* outputting the differences in various formats, as specified by the
|
||||
caller.
|
||||
|
||||
Calling sequence
|
||||
----------------
|
||||
|
||||
* Prepare `struct diff_options` to record the set of diff options, and
|
||||
then call `repo_diff_setup()` to initialize this structure. This
|
||||
sets up the vanilla default.
|
||||
|
||||
* Fill in the options structure to specify desired output format, rename
|
||||
detection, etc. `diff_opt_parse()` can be used to parse options given
|
||||
from the command line in a way consistent with existing git-diff
|
||||
family of programs.
|
||||
|
||||
* Call `diff_setup_done()`; this inspects the options set up so far for
|
||||
internal consistency and makes any necessary tweaks to them (e.g. if
|
||||
textual patch output was asked, recursive behaviour is turned on);
|
||||
the callback set_default in diff_options can be used to tweak this more.
|
||||
|
||||
* As you find different pairs of files, call `diff_change()` to feed
|
||||
modified files, `diff_addremove()` to feed created or deleted files,
|
||||
or `diff_unmerge()` to feed a file whose state is 'unmerged' to the
|
||||
API. These are thin wrappers to a lower-level `diff_queue()` function
|
||||
that is flexible enough to record any of these kinds of changes.
|
||||
|
||||
* Once you finish feeding the pairs of files, call `diffcore_std()`.
|
||||
This will tell the diffcore library to go ahead and do its work.
|
||||
|
||||
* Calling `diff_flush()` will produce the output.
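The feed-then-flush shape of this calling sequence can be sketched with a toy model. This is not the diffcore implementation: every `toy_*` name is hypothetical, the real API also needs a `struct diff_options`, and the status letters printed here are only an illustration. The sketch follows this document's description of a filepair, in which a created file has NULL in `one` and a deleted file has NULL in `two`.

```c
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-ins for the structures this document describes. */
struct toy_filespec {
	const char *path;
};

struct toy_filepair {
	struct toy_filespec *one; /* preimage, NULL for a created file */
	struct toy_filespec *two; /* postimage, NULL for a deleted file */
};

struct toy_queue {
	struct toy_filepair **queue; /* array of pointers to pairs */
	size_t alloc;                /* allocated size of the array */
	size_t nr;                   /* number of pairs in use */
};

/* Feed one pair; the array doubles in size whenever it is full. */
void toy_queue_push(struct toy_queue *q,
		    struct toy_filespec *one, struct toy_filespec *two)
{
	struct toy_filepair *p = malloc(sizeof(*p));

	if (!p)
		abort();
	if (q->nr == q->alloc) {
		q->alloc = q->alloc ? q->alloc * 2 : 4;
		q->queue = realloc(q->queue, q->alloc * sizeof(*q->queue));
		if (!q->queue)
			abort();
	}
	p->one = one;
	p->two = two;
	q->queue[q->nr++] = p;
}

/* 'A' for creation, 'D' for deletion, 'M' for modification. */
char toy_status(const struct toy_filepair *p)
{
	if (!p->one)
		return 'A';
	if (!p->two)
		return 'D';
	return 'M';
}

/*
 * Print one line per queued pair, then release the queue.
 * At least one side of every pair must be non-NULL.
 */
void toy_flush(struct toy_queue *q)
{
	size_t i;

	for (i = 0; i < q->nr; i++) {
		struct toy_filepair *p = q->queue[i];

		printf("%c\t%s\n", toy_status(p),
		       p->two ? p->two->path : p->one->path);
		free(p);
	}
	free(q->queue);
	q->queue = NULL;
	q->alloc = q->nr = 0;
}
```

The toy skips the processing step between feeding and flushing, which is where the real library breaks and re-matches pairs.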
|
||||
|
||||
|
||||
Data structures
|
||||
---------------
|
||||
|
||||
* `struct diff_filespec`
|
||||
|
||||
This is the internal representation for a single file (blob). It
|
||||
records the blob object name (if known -- for a work tree file it
|
||||
typically is the null SHA-1), filemode and pathname. This is what the
|
||||
`diff_addremove()`, `diff_change()` and `diff_unmerge()` synthesize and
|
||||
feed to the `diff_queue()` function.
|
||||
|
||||
* `struct diff_filepair`
|
||||
|
||||
This records a pair of `struct diff_filespec`; the filespec for a file
|
||||
in the "old" set (i.e. preimage) is called `one`, and the filespec for a
|
||||
file in the "new" set (i.e. postimage) is called `two`. A change that
|
||||
represents file creation has NULL in `one`, and file deletion has NULL
|
||||
in `two`.
|
||||
|
||||
A `filepair` starts pointing at `one` and `two` that are from the same
|
||||
filename, but `diffcore_std()` can break pairs and match component
|
||||
filespecs with other filespecs from a different filepair to form a new
|
||||
filepair. This is called 'rename detection'.
|
||||
|
||||
* `struct diff_queue`
|
||||
|
||||
This is a collection of filepairs. Notable members are:
|
||||
|
||||
`queue`::
|
||||
|
||||
An array of pointers to `struct diff_filepair`. This
|
||||
dynamically grows as you add filepairs;
|
||||
|
||||
`alloc`::
|
||||
|
||||
The allocated size of the `queue` array;
|
||||
|
||||
`nr`::
|
||||
|
||||
The number of elements in the `queue` array.
|
||||
|
||||
|
||||
* `struct diff_options`
|
||||
|
||||
This describes the set of options the calling program wants to affect
|
||||
the operation of diffcore library with.
|
||||
|
||||
Notable members are:
|
||||
|
||||
`output_format`::
|
||||
The output format used when `diff_flush()` is run.
|
||||
|
||||
`context`::
|
||||
Number of context lines to generate in patch output.
|
||||
|
||||
`break_opt`, `detect_rename`, `rename_score`, `rename_limit`::
|
||||
Affect the detection logic for complete rewrites, renames
|
||||
and copies.
|
||||
|
||||
`abbrev`::
|
||||
Number of hexdigits to abbreviate raw format output to.
|
||||
|
||||
`pickaxe`::
|
||||
A constant string (can and typically does contain newlines to
|
||||
look for a block of text, not just a single line) to filter out
|
||||
the filepairs that do not change the number of occurrences of the
|
||||
string between their preimage and postimage in the diff_queue.
|
||||
|
||||
`flags`::
|
||||
This is mostly a collection of boolean options that affects the
|
||||
operation, but some do not have anything to do with the diffcore
|
||||
library.
|
||||
|
||||
`touched_flags`::
|
||||
Records whether a flag has been changed due to user request
|
||||
(rather than just set/unset by default).
|
||||
|
||||
`set_default`::
|
||||
Callback which allows tweaking the options in diff_setup_done().
|
||||
|
||||
BINARY, TEXT;;
|
||||
Affects how a file that is seemingly binary is treated.
|
||||
|
||||
FULL_INDEX;;
|
||||
Tells the patch output format not to use abbreviated object
|
||||
names on the "index" lines.
|
||||
|
||||
FIND_COPIES_HARDER;;
|
||||
Tells the diffcore library that the caller is feeding unchanged
|
||||
filepairs to allow copies from unmodified files to be detected.
|
||||
|
||||
COLOR_DIFF;;
|
||||
Output should be colored.
|
||||
|
||||
COLOR_DIFF_WORDS;;
|
||||
Output is a colored word-diff.
|
||||
|
||||
NO_INDEX;;
|
||||
Tells diff-files that the input is not tracked files but files
|
||||
in random locations on the filesystem.
|
||||
|
||||
ALLOW_EXTERNAL;;
|
||||
Tells output routine that it is Ok to call user specified patch
|
||||
output routine. Plumbing disables this to ensure stable output.
|
||||
|
||||
QUIET;;
|
||||
Do not show any output.
|
||||
|
||||
REVERSE_DIFF;;
|
||||
Tells the library that the calling program is feeding the
|
||||
filepairs reversed; `one` is two, and `two` is one.
|
||||
|
||||
EXIT_WITH_STATUS;;
|
||||
For communication between the calling program and the options
|
||||
parser; tell the calling program to signal the presence of
|
||||
difference using program exit code.
|
||||
|
||||
HAS_CHANGES;;
|
||||
Internal; used for optimization to see if there is any change.
|
||||
|
||||
SILENT_ON_REMOVE;;
|
||||
Affects if diff-files shows removed files.
|
||||
|
||||
RECURSIVE, TREE_IN_RECURSIVE;;
|
||||
Tells if tree traversal done by tree-diff should recursively
|
||||
descend into a tree object pair that are different in preimage
|
||||
and postimage set.
|
||||
|
||||
(JC)
|
||||
130
Documentation/technical/api-directory-listing.txt
Normal file
|
|
@ -0,0 +1,130 @@
|
|||
directory listing API
|
||||
=====================
|
||||
|
||||
The directory listing API is used to enumerate paths in the work tree,
|
||||
optionally taking `.git/info/exclude` and `.gitignore` files per
|
||||
directory into account.
|
||||
|
||||
Data structure
|
||||
--------------
|
||||
|
||||
`struct dir_struct` structure is used to pass directory traversal
|
||||
options to the library and to record the paths discovered. A single
|
||||
`struct dir_struct` is used regardless of whether or not the traversal
|
||||
recursively descends into subdirectories.
|
||||
|
||||
The notable options are:
|
||||
|
||||
`exclude_per_dir`::
|
||||
|
||||
The name of the file to be read in each directory for excluded
|
||||
files (typically `.gitignore`).
|
||||
|
||||
`flags`::
|
||||
|
||||
A bit-field of options:
|
||||
|
||||
`DIR_SHOW_IGNORED`:::
|
||||
|
||||
Return just ignored files in `entries[]`, not untracked
|
||||
files. This flag is mutually exclusive with
|
||||
`DIR_SHOW_IGNORED_TOO`.
|
||||
|
||||
`DIR_SHOW_IGNORED_TOO`:::
|
||||
|
||||
Similar to `DIR_SHOW_IGNORED`, but return ignored files in
|
||||
`ignored[]` in addition to untracked files in
|
||||
`entries[]`. This flag is mutually exclusive with
|
||||
`DIR_SHOW_IGNORED`.
|
||||
|
||||
`DIR_KEEP_UNTRACKED_CONTENTS`:::
|
||||
|
||||
Only has meaning if `DIR_SHOW_IGNORED_TOO` is also set; if this is set, the
|
||||
untracked contents of untracked directories are also returned in
|
||||
`entries[]`.
|
||||
|
||||
`DIR_SHOW_IGNORED_TOO_MODE_MATCHING`:::
|
||||
|
||||
Only has meaning if `DIR_SHOW_IGNORED_TOO` is also set; if
|
||||
this is set, returns ignored files and directories that match
|
||||
an exclude pattern. If a directory matches an exclude pattern,
|
||||
then the directory is returned and the contained paths are
|
||||
not. A directory that does not match an exclude pattern will
|
||||
not be returned even if all of its contents are ignored. In
|
||||
this case, the contents are returned as individual entries.
|
||||
+
|
||||
If this is set, files and directories that explicitly match an ignore
|
||||
pattern are reported. Implicitly ignored directories (directories that
|
||||
do not match an ignore pattern, but whose contents are all ignored)
|
||||
are not reported, instead all of the contents are reported.
|
||||
|
||||
`DIR_COLLECT_IGNORED`:::
|
||||
|
||||
Special mode for git-add. Return ignored files in `ignored[]` and
|
||||
untracked files in `entries[]`. Only returns ignored files that match
|
||||
pathspec exactly (no wildcards). Does not recurse into ignored
|
||||
directories.
|
||||
|
||||
`DIR_SHOW_OTHER_DIRECTORIES`:::
|
||||
|
||||
Include a directory that is not tracked.
|
||||
|
||||
`DIR_HIDE_EMPTY_DIRECTORIES`:::
|
||||
|
||||
Do not include a directory that is not tracked and is empty.
|
||||
|
||||
`DIR_NO_GITLINKS`:::
|
||||
|
||||
If set, recurse into a directory that looks like a Git
|
||||
directory. Otherwise it is shown as a directory.
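The constraints between these flags can be sketched as a small validity check. The bit values below are hypothetical and chosen only for illustration (the real definitions live in Git's `dir.h` and may differ), and the `toy_*` helpers are not part of the API.

```c
/*
 * Hypothetical bit values for illustration only; the real values are
 * defined in Git's dir.h and may differ.
 */
#define TOY_DIR_SHOW_IGNORED            (1u << 0)
#define TOY_DIR_SHOW_IGNORED_TOO        (1u << 1)
#define TOY_DIR_KEEP_UNTRACKED_CONTENTS (1u << 2)

/* Return 1 if the combination respects the constraint listed above. */
int toy_dir_flags_valid(unsigned flags)
{
	/* The two "show ignored" modes are mutually exclusive. */
	if ((flags & TOY_DIR_SHOW_IGNORED) &&
	    (flags & TOY_DIR_SHOW_IGNORED_TOO))
		return 0;
	return 1;
}

/*
 * KEEP_UNTRACKED_CONTENTS only has meaning when SHOW_IGNORED_TOO is
 * also set; return 1 if it would take effect.
 */
int toy_keep_untracked_effective(unsigned flags)
{
	return (flags & TOY_DIR_KEEP_UNTRACKED_CONTENTS) &&
	       (flags & TOY_DIR_SHOW_IGNORED_TOO);
}
```

A caller could run a check like this before starting a traversal to catch contradictory option combinations early.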
|
||||
|
||||
The result of the enumeration is left in these fields:
|
||||
|
||||
`entries[]`::
|
||||
|
||||
An array of `struct dir_entry`, each element of which describes
|
||||
a path.
|
||||
|
||||
`nr`::
|
||||
|
||||
The number of members in `entries[]` array.
|
||||
|
||||
`alloc`::
|
||||
|
||||
Internal use; keeps track of allocation of `entries[]` array.
|
||||
|
||||
`ignored[]`::
|
||||
|
||||
An array of `struct dir_entry`, used for ignored paths with the
|
||||
`DIR_SHOW_IGNORED_TOO` and `DIR_COLLECT_IGNORED` flags.
|
||||
|
||||
`ignored_nr`::
|
||||
|
||||
The number of members in `ignored[]` array.
|
||||
|
||||
Calling sequence
|
||||
----------------
|
||||
|
||||
Note: index may be looked at for .gitignore files that are CE_SKIP_WORKTREE
|
||||
marked. If you want to exclude files, make sure you have loaded the index first.
|
||||
|
||||
* Prepare `struct dir_struct dir` and clear it with `memset(&dir, 0,
|
||||
sizeof(dir))`.
|
||||
|
||||
* To add a single exclude pattern, call `add_exclude_list()` and then
|
||||
`add_exclude()`.
|
||||
|
||||
* To add patterns from a file (e.g. `.git/info/exclude`), call
|
||||
`add_excludes_from_file()`, and/or set `dir.exclude_per_dir`. A
|
||||
short-hand function `setup_standard_excludes()` can be used to set
|
||||
up the standard set of exclude settings.
|
||||
|
||||
* Set options described in the Data Structure section above.
|
||||
|
||||
* Call `read_directory()`.
|
||||
|
||||
* Use `dir.entries[]`.
|
||||
|
||||
* Call `clear_directory()` when none of the contained elements are in use any longer.
|
||||
|
||||
(JC)
|
||||
75
Documentation/technical/api-error-handling.txt
Normal file
|
|
@ -0,0 +1,75 @@
|
|||
Error reporting in git
|
||||
======================
|
||||
|
||||
`die`, `usage`, `error`, and `warning` report errors of various
|
||||
kinds.
|
||||
|
||||
- `die` is for fatal application errors. It prints a message to
|
||||
the user and exits with status 128.
|
||||
|
||||
- `usage` is for errors in command line usage. After printing its
|
||||
message, it exits with status 129. (See also `usage_with_options`
|
||||
in the link:api-parse-options.html[parse-options API].)
|
||||
|
||||
- `error` is for non-fatal library errors. It prints a message
|
||||
to the user and returns -1 for convenience in signaling the error
|
||||
to the caller.
|
||||
|
||||
- `warning` is for reporting situations that probably should not
|
||||
occur but which the user (and Git) can continue to work around
|
||||
without running into too many problems. Like `error`, it
|
||||
returns -1 after reporting the situation to the caller.
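The "returns -1 for convenience" convention described for `error` can be sketched with a toy analogue. `toy_error` and `toy_parse_count` are hypothetical names for this example and are not Git's implementation.

```c
#include <stdarg.h>
#include <stdio.h>

/* Toy analogue of error(): report on stderr, then return -1. */
int toy_error(const char *fmt, ...)
{
	va_list ap;

	fputs("error: ", stderr);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);
	return -1;
}

/* Hypothetical library function that fails on negative input. */
int toy_parse_count(int value)
{
	if (value < 0)
		/* Report and signal failure in a single statement. */
		return toy_error("count must not be negative: %d", value);
	return 0;
}
```

Returning the reporter's result directly keeps the failure path to one line, which is the convenience the description refers to.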
|
||||
|
||||
Customizable error handlers
|
||||
---------------------------
|
||||
|
||||
The default behavior of `die` and `error` is to write a message to
|
||||
stderr and then exit or return as appropriate. This behavior can be
|
||||
overridden using `set_die_routine` and `set_error_routine`. For
|
||||
example, "git daemon" uses set_die_routine to write the reason `die`
|
||||
was called to syslog before exiting.
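One way such an override hook might be structured is a function pointer that the reporting function calls through. This is a sketch under that assumption, with hypothetical `toy_*` names; it does not reproduce the signatures of `set_die_routine` or `set_error_routine`.

```c
#include <stdarg.h>
#include <stdio.h>

typedef void (*toy_report_fn)(const char *fmt, va_list ap);

/* Default behavior: write the message to stderr. */
static void toy_report_stderr(const char *fmt, va_list ap)
{
	fputs("error: ", stderr);
	vfprintf(stderr, fmt, ap);
	fputc('\n', stderr);
}

static toy_report_fn toy_error_routine = toy_report_stderr;

/* Replace the routine used by all later toy_error() calls. */
void toy_set_error_routine(toy_report_fn fn)
{
	toy_error_routine = fn;
}

/* Report through the current routine, then return -1. */
int toy_error(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	toy_error_routine(fmt, ap);
	va_end(ap);
	return -1;
}

/* Example replacement: count reports instead of printing them. */
int toy_report_count;

void toy_report_counting(const char *fmt, va_list ap)
{
	(void)fmt;
	(void)ap;
	toy_report_count++;
}
```

A daemon-style program could install a routine that forwards the message to a log instead of counting, without touching any call site.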
|
||||
|
||||
Library errors
|
||||
--------------
|
||||
|
||||
Functions return a negative integer on error. Details beyond that
|
||||
vary from function to function:
|
||||
|
||||
- Some functions return -1 for all errors. Others return a more
|
||||
specific value depending on how the caller might want to react
|
||||
to the error.
|
||||
|
||||
- Some functions report the error to stderr with `error`,
|
||||
while others leave that for the caller to do.
|
||||
|
||||
- errno is not meaningful on return from most functions (except
|
||||
for thin wrappers for system calls).
|
||||
|
||||
Check the function's API documentation to be sure.
|
||||
|
||||
Caller-handled errors
|
||||
---------------------
|
||||
|
||||
An increasing number of functions take a parameter 'struct strbuf *err'.
|
||||
On error, such functions append a message about what went wrong to the
|
||||
'err' strbuf. The message is meant to be complete enough to be passed
|
||||
to `die` or `error` as-is. For example:
|
||||
|
||||
if (ref_transaction_commit(transaction, &err))
|
||||
die("%s", err.buf);
|
||||
|
||||
The 'err' parameter will be untouched if no error occurred, so multiple
|
||||
function calls can be chained:
|
||||
|
||||
t = ref_transaction_begin(&err);
|
||||
if (!t ||
|
||||
ref_transaction_update(t, "HEAD", ..., &err) ||
|
||||
ref_transaction_commit(t, &err))
|
||||
die("%s", err.buf);
|
||||
|
||||
The 'err' parameter must be a pointer to a valid strbuf. To silence
|
||||
a message, pass a strbuf that is explicitly ignored:
|
||||
|
||||
if (thing_that_can_fail_in_an_ignorable_way(..., &err))
|
||||
/* This failure is okay. */
|
||||
strbuf_reset(&err);
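The chaining pattern relies on two properties: a successful call leaves the error buffer untouched, and `||` stops at the first failure. The sketch below models both with a fixed-size buffer. `struct toy_buf` and the `toy_*` functions are hypothetical stand-ins and are much simpler than a real strbuf.

```c
#include <string.h>

/* Minimal fixed-size stand-in for a strbuf; not Git's implementation. */
struct toy_buf {
	char buf[256];
	size_t len;
};

/* Append s, truncating if the buffer is full. */
void toy_buf_addstr(struct toy_buf *b, const char *s)
{
	size_t room = sizeof(b->buf) - 1 - b->len;
	size_t n = strlen(s);

	if (n > room)
		n = room;
	memcpy(b->buf + b->len, s, n);
	b->len += n;
	b->buf[b->len] = '\0';
}

/* Discard any message, as when a failure is deliberately ignored. */
void toy_buf_reset(struct toy_buf *b)
{
	b->len = 0;
	b->buf[0] = '\0';
}

/* Hypothetical step: 0 on success, -1 with a message in err on failure. */
int toy_step(const char *name, int should_fail, struct toy_buf *err)
{
	if (!should_fail)
		return 0; /* err is left untouched on success */
	toy_buf_addstr(err, "could not ");
	toy_buf_addstr(err, name);
	return -1;
}

/* Chain steps with ||; the first failure short-circuits the rest. */
int toy_run(int fail_second, struct toy_buf *err)
{
	if (toy_step("begin", 0, err) ||
	    toy_step("update HEAD", fail_second, err) ||
	    toy_step("commit", 0, err))
		return -1;
	return 0;
}
```

After a failed `toy_run()`, the buffer holds exactly one message, from the step that failed, so the caller can pass it on unchanged or reset it.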
|
||||
154
Documentation/technical/api-gitattributes.txt
Normal file
|
|
@ -0,0 +1,154 @@
|
|||
gitattributes API
|
||||
=================
|
||||
|
||||
gitattributes mechanism gives a uniform way to associate various
|
||||
attributes with a set of paths.
|
||||
|
||||
|
||||
Data Structure
|
||||
--------------
|
||||
|
||||
`struct git_attr`::
|
||||
|
||||
An attribute is an opaque object that is identified by its name.
|
||||
Pass the name to `git_attr()` function to obtain the object of
|
||||
this type. The internal representation of this structure is
|
||||
of no interest to the calling programs. The name of the
|
||||
attribute can be retrieved by calling `git_attr_name()`.
|
||||
|
||||
`struct attr_check_item`::
|
||||
|
||||
This structure represents one attribute and its value.
|
||||
|
||||
`struct attr_check`::
|
||||
|
||||
This structure represents a collection of `attr_check_item`.
|
||||
It is passed to `git_check_attr()` function, specifying the
|
||||
attributes to check, and receives their values.
|
||||
|
||||
|
||||
Attribute Values
|
||||
----------------
|
||||
|
||||
An attribute for a path can be in one of four states: Set, Unset,
|
||||
Unspecified or set to a string, and `.value` member of `struct
|
||||
attr_check_item` records it. There are three macros to check these:
|
||||
|
||||
`ATTR_TRUE()`::
|
||||
|
||||
Returns true if the attribute is Set for the path.
|
||||
|
||||
`ATTR_FALSE()`::
|
||||
|
||||
Returns true if the attribute is Unset for the path.
|
||||
|
||||
`ATTR_UNSET()`::
|
||||
|
||||
Returns true if the attribute is Unspecified for the path.
|
||||
|
||||
If none of the above returns true, `.value` member points at a string
|
||||
value of the attribute for the path.
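One plausible way to fit four states into a single `const char *` is to reserve two sentinel addresses for Set and Unset, use NULL for Unspecified, and treat any other pointer as an ordinary string. The sketch below assumes that encoding. The `toy_*` and `TOY_ATTR_*` names are hypothetical and are not the macros documented above.

```c
#include <stddef.h>
#include <string.h>

/* Sentinel objects whose addresses mark the Set and Unset states. */
static const char toy_attr_true[] = "(set)";
static const char toy_attr_false[] = "(unset)";

/* The checks compare addresses, not string contents. */
#define TOY_ATTR_TRUE(v)  ((v) == toy_attr_true)
#define TOY_ATTR_FALSE(v) ((v) == toy_attr_false)
#define TOY_ATTR_UNSET(v) ((v) == NULL)

/* Classify a value into one of the four states. */
const char *toy_describe(const char *value)
{
	if (TOY_ATTR_TRUE(value))
		return "set";
	if (TOY_ATTR_FALSE(value))
		return "unset";
	if (TOY_ATTR_UNSET(value))
		return "unspecified";
	return "string"; /* any other pointer is a real string value */
}
```

Because the comparison is on addresses, the string checks with `strcmp()` are only safe after the three state checks have all failed, which matches the order used in the example later in this document.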
|
||||
|
||||
|
||||
Querying Specific Attributes
|
||||
----------------------------
|
||||
|
||||
* Prepare `struct attr_check` using `attr_check_initl()`
|
||||
function, enumerating the names of attributes whose values you are
|
||||
interested in, terminated with a NULL pointer. Alternatively, an
|
||||
empty `struct attr_check` can be prepared by calling
|
||||
`attr_check_alloc()` function and then attributes you want to
|
||||
ask about can be added to it with `attr_check_append()`
|
||||
function.
|
||||
|
||||
* Call `git_check_attr()` to check the attributes for the path.
|
||||
|
||||
* Inspect `attr_check` structure to see how each of the
|
||||
attributes in the array is defined for the path.
|
||||
|
||||
|
||||
Example
|
||||
-------
|
||||
|
||||
To see how attributes "crlf" and "ident" are set for different paths:
|
||||
|
||||
. Prepare a `struct attr_check` with two elements (because
|
||||
we are checking two attributes):
|
||||
|
||||
------------
|
||||
static struct attr_check *check;
|
||||
static void setup_check(void)
|
||||
{
|
||||
if (check)
|
||||
return; /* already done */
|
||||
check = attr_check_initl("crlf", "ident", NULL);
|
||||
}
|
||||
------------
|
||||
|
||||
. Call `git_check_attr()` with the prepared `struct attr_check`:
|
||||
|
||||
------------
|
||||
const char *path;
|
||||
|
||||
setup_check();
|
||||
git_check_attr(path, check);
|
||||
------------
|
||||
|
||||
. Act on `.value` member of the result, left in `check->items[]`:
|
||||
|
||||
------------
|
||||
const char *value = check->items[0].value;
|
||||
|
||||
if (ATTR_TRUE(value)) {
|
||||
The attribute is Set, by listing only the name of the
|
||||
attribute in the gitattributes file for the path.
|
||||
} else if (ATTR_FALSE(value)) {
|
||||
The attribute is Unset, by listing the name of the
|
||||
attribute prefixed with a dash - for the path.
|
||||
} else if (ATTR_UNSET(value)) {
|
||||
The attribute is neither set nor unset for the path.
|
||||
} else if (!strcmp(value, "input")) {
|
||||
If none of ATTR_TRUE(), ATTR_FALSE(), or ATTR_UNSET() is
|
||||
true, the value is a string set in the gitattributes
|
||||
file for the path by saying "attr=value".
|
||||
} else if (... other check using value as string ...) {
|
||||
...
|
||||
}
|
||||
------------
|
||||
|
||||
To see how attributes in argv[] are set for different paths, only
|
||||
the first step in the above would be different.
|
||||
|
||||
------------
|
||||
static struct attr_check *check;
|
||||
static void setup_check(const char **argv)
|
||||
{
|
||||
check = attr_check_alloc();
|
||||
while (*argv) {
|
||||
const struct git_attr *attr = git_attr(*argv);
|
||||
attr_check_append(check, attr);
|
||||
argv++;
|
||||
}
|
||||
}
|
||||
------------
|
||||
|
||||
|
||||
Querying All Attributes
|
||||
-----------------------
|
||||
|
||||
To get the values of all attributes associated with a file:
|
||||
|
||||
* Prepare an empty `attr_check` structure by calling
|
||||
`attr_check_alloc()`.
|
||||
|
||||
* Call `git_all_attrs()`, which populates the `attr_check`
|
||||
with the attributes attached to the path.
|
||||
|
||||
* Iterate over the `attr_check.items[]` array to examine
|
||||
the attribute names and values. The name of the attribute
|
||||
described by an `attr_check.items[]` object can be retrieved via
|
||||
`git_attr_name(check->items[i].attr)`. (Please note that no items
|
||||
will be returned for unset attributes, so `ATTR_UNSET()` will return
|
||||
false for all returned `attr_check.items[]` objects.)
|
||||
|
||||
* Free the `attr_check` struct by calling `attr_check_free()`.
|
||||
8
Documentation/technical/api-grep.txt
Normal file
|
|
@ -0,0 +1,8 @@
|
|||
grep API
|
||||
========
|
||||
|
||||
Talk about <grep.h>, things like:
|
||||
|
||||
* grep_buffer()
|
||||
|
||||
(JC)
|
||||
173
Documentation/technical/api-history-graph.txt
Normal file
|
|
@ -0,0 +1,173 @@
|
|||
history graph API
|
||||
=================
|
||||
|
||||
The graph API is used to draw a text-based representation of the commit
|
||||
history. The API generates the graph in a line-by-line fashion.
|
||||
|
||||
Functions
|
||||
---------
|
||||
|
||||
Core functions:
|
||||
|
||||
* `graph_init()` creates a new `struct git_graph`
|
||||
|
||||
* `graph_update()` moves the graph to a new commit.
|
||||
|
||||
* `graph_next_line()` outputs the next line of the graph into a strbuf. It
|
||||
does not add a terminating newline.
|
||||
|
||||
* `graph_padding_line()` outputs a line of vertical padding in the graph. It
|
||||
is similar to `graph_next_line()`, but is guaranteed to never print the line
|
||||
containing the current commit. Where `graph_next_line()` would print the
|
||||
commit line next, `graph_padding_line()` prints a line that simply extends
|
||||
all branch lines downwards one row, leaving their positions unchanged.
|
||||
|
||||
* `graph_is_commit_finished()` determines if the graph has output all lines
|
||||
necessary for the current commit. If `graph_update()` is called before all
|
||||
lines for the current commit have been printed, the next call to
|
||||
`graph_next_line()` will output an ellipsis, to indicate that a portion of
|
||||
the graph was omitted.
|
||||
|
||||
The following utility functions are wrappers around `graph_next_line()` and
|
||||
`graph_is_commit_finished()`. They always print the output to stdout.
|
||||
They can all be called with a NULL graph argument, in which case no graph
|
||||
output will be printed.
|
||||
|
||||
* `graph_show_commit()` calls `graph_next_line()` and
|
||||
`graph_is_commit_finished()` until one of them returns non-zero. This prints
|
||||
all graph lines up to, and including, the line containing this commit.
|
||||
Output is printed to stdout. The last line printed does not contain a
|
||||
terminating newline.
|
||||
|
||||
* `graph_show_oneline()` calls `graph_next_line()` and prints the result to
|
||||
stdout. The line printed does not contain a terminating newline.
|
||||
|
||||
* `graph_show_padding()` calls `graph_padding_line()` and prints the result to
|
||||
stdout. The line printed does not contain a terminating newline.
|
||||
|
||||
* `graph_show_remainder()` calls `graph_next_line()` until
|
||||
`graph_is_commit_finished()` returns non-zero. Output is printed to stdout.
|
||||
The last line printed does not contain a terminating newline. Returns 1 if
|
||||
output was printed, and 0 if no output was necessary.
|
||||
|
||||
* `graph_show_strbuf()` prints the specified strbuf to stdout, prefixing all
|
||||
lines but the first with a graph line. The caller is responsible for
|
||||
ensuring graph output for the first line has already been printed to stdout.
|
||||
(This can be done with `graph_show_commit()` or `graph_show_oneline()`.) If
|
||||
a NULL graph is supplied, the strbuf is printed as-is.
|
||||
|
||||
* `graph_show_commit_msg()` is similar to `graph_show_strbuf()`, but it also
|
||||
prints the remainder of the graph, if more lines are needed after the strbuf
|
||||
ends. It is better than directly calling `graph_show_strbuf()` followed by
|
||||
`graph_show_remainder()` since it properly handles buffers that do not end in
|
||||
a terminating newline. The output printed by `graph_show_commit_msg()` will
|
||||
end in a newline if and only if the strbuf ends in a newline.
|
||||
|
||||
Data structure
|
||||
--------------
|
||||
`struct git_graph` is an opaque data type used to store the current graph
|
||||
state.
|
||||
|
||||
Calling sequence
|
||||
----------------
|
||||
|
||||
* Create a `struct git_graph` by calling `graph_init()`. When using the
|
||||
revision walking API, this is done automatically by `setup_revisions()` if
|
||||
the '--graph' option is supplied.
|
||||
|
||||
* Use the revision walking API to walk through a group of contiguous commits.
|
||||
The `get_revision()` function automatically calls `graph_update()` each time
|
||||
it is invoked.
|
||||
|
||||
* For each commit, call `graph_next_line()` repeatedly, until
|
||||
`graph_is_commit_finished()` returns non-zero. Each call to
|
||||
`graph_next_line()` will output a single line of the graph. The resulting
|
||||
lines will not contain any newlines. `graph_next_line()` returns 1 if the
|
||||
resulting line contains the current commit, or 0 if this is merely a line
|
||||
needed to adjust the graph before or after the current commit. This return
|
||||
value can be used to determine where to print the commit summary information
|
||||
alongside the graph output.
|
||||
|
||||
Limitations
|
||||
-----------
|
||||
|
||||
* `graph_update()` must be called with commits in topological order. It should
|
||||
not be called on a commit if it has already been invoked with an ancestor of
|
||||
that commit, or the graph output will be incorrect.
|
||||
|
||||
* `graph_update()` must be called on a contiguous group of commits. If
|
||||
`graph_update()` is called on a particular commit, it should later be called
|
||||
on all parents of that commit. Parents must not be skipped, or the graph
|
||||
output will appear incorrect.
|
||||
+
|
||||
`graph_update()` may be used on a pruned set of commits only if the parent list
|
||||
has been rewritten so as to include only ancestors from the pruned set.
|
||||
|
||||
* The graph API does not currently support reverse commit ordering. In
|
||||
order to implement reverse ordering, the graphing API needs an
|
||||
(efficient) mechanism to find the children of a commit.
|
||||
|
||||
Sample usage
|
||||
------------
|
||||
|
||||
------------
|
||||
struct commit *commit;
|
||||
struct git_graph *graph = graph_init(opts);
|
||||
|
||||
while ((commit = get_revision(opts)) != NULL) {
|
||||
while (!graph_is_commit_finished(graph))
|
||||
{
|
||||
struct strbuf sb;
|
||||
int is_commit_line;
|
||||
|
||||
strbuf_init(&sb, 0);
|
||||
is_commit_line = graph_next_line(graph, &sb);
|
||||
fputs(sb.buf, stdout);
strbuf_release(&sb); /* sb is re-initialized on every iteration */
|
||||
|
||||
if (is_commit_line)
|
||||
log_tree_commit(opts, commit);
|
||||
else
|
||||
putchar(opts->diffopt.line_termination);
|
||||
}
|
||||
}
|
||||
------------
|
||||
|
||||
Sample output
|
||||
-------------
|
||||
|
||||
The following is an example of the output from the graph API. This output does
|
||||
not include any commit summary information--callers are responsible for
|
||||
outputting that information, if desired.
|
||||
|
||||
------------
|
||||
*
|
||||
*
|
||||
*
|
||||
|\
|
||||
* |
|
||||
| | *
|
||||
| \ \
|
||||
| \ \
|
||||
*-. \ \
|
||||
|\ \ \ \
|
||||
| | * | |
|
||||
| | | | | *
|
||||
| | | | | *
|
||||
| | | | | *
|
||||
| | | | | |\
|
||||
| | | | | | *
|
||||
| * | | | | |
|
||||
| | | | | * \
|
||||
| | | | | |\ |
|
||||
| | | | * | | |
|
||||
| | | | * | | |
|
||||
* | | | | | | |
|
||||
| |/ / / / / /
|
||||
|/| / / / / /
|
||||
* | | | | | |
|
||||
|/ / / / / /
|
||||
* | | | | |
|
||||
| | | | | *
|
||||
| | | | |/
|
||||
| | | | *
|
||||
------------
|
||||
13
Documentation/technical/api-index-skel.txt
Normal file
|
|
@ -0,0 +1,13 @@
|
|||
Git API Documents
|
||||
=================
|
||||
|
||||
Git has grown a set of internal API over time. This collection
|
||||
documents them.
|
||||
|
||||
////////////////////////////////////////////////////////////////
|
||||
// table of contents begin
|
||||
////////////////////////////////////////////////////////////////
|
||||
|
||||
////////////////////////////////////////////////////////////////
|
||||
// table of contents end
|
||||
////////////////////////////////////////////////////////////////
|
||||
28
Documentation/technical/api-index.sh
Executable file
|
|
@ -0,0 +1,28 @@
|
|||
#!/bin/sh
|
||||
|
||||
(
|
||||
c=////////////////////////////////////////////////////////////////
|
||||
skel=api-index-skel.txt
|
||||
sed -e '/^\/\/ table of contents begin/q' "$skel"
|
||||
echo "$c"
|
||||
|
||||
ls api-*.txt |
|
||||
while read filename
|
||||
do
|
||||
case "$filename" in
|
||||
api-index-skel.txt | api-index.txt) continue ;;
|
||||
esac
|
||||
title=$(sed -e 1q "$filename")
|
||||
html=${filename%.txt}.html
|
||||
echo "* link:$html[$title]"
|
||||
done
|
||||
echo "$c"
|
||||
sed -n -e '/^\/\/ table of contents end/,$p' "$skel"
|
||||
) >api-index.txt+
|
||||
|
||||
if test -f api-index.txt && cmp api-index.txt api-index.txt+ >/dev/null
|
||||
then
|
||||
rm -f api-index.txt+
|
||||
else
|
||||
mv api-index.txt+ api-index.txt
|
||||
fi
|
||||
104
Documentation/technical/api-merge.txt
Normal file
|
|
@ -0,0 +1,104 @@
|
|||
merge API
|
||||
=========
|
||||
|
||||
The merge API helps a program to reconcile two competing sets of
|
||||
improvements to some files (e.g., unregistered changes from the work
|
||||
tree versus changes involved in switching to a new branch), reporting
|
||||
conflicts if found. The library called through this API is
|
||||
responsible for a few things.
|
||||
|
||||
* determining which trees to merge (recursive ancestor consolidation);
|
||||
|
||||
* lining up corresponding files in the trees to be merged (rename
|
||||
detection, subtree shifting), reporting edge cases like add/add
|
||||
and rename/rename conflicts to the user;
|
||||
|
||||
* performing a three-way merge of corresponding files, taking
|
||||
path-specific merge drivers (specified in `.gitattributes`)
|
||||
into account.
|
||||
|
||||
Data structures
|
||||
---------------
|
||||
|
||||
* `mmbuffer_t`, `mmfile_t`
|
||||
|
||||
These store data for use by the xdiff backend, for writing and
|
||||
for reading, respectively. See `xdiff/xdiff.h` for the definitions
|
||||
and `diff.c` for examples.
|
||||
|
||||
* `struct ll_merge_options`
|
||||
|
||||
This describes the set of options the calling program wants to affect
|
||||
the operation of a low-level (single file) merge. Some options:
|
||||
|
||||
`virtual_ancestor`::
|
||||
Behave as though this were part of a merge between common
|
||||
ancestors in a recursive merge.
|
||||
If a helper program is specified by the
|
||||
`[merge "<driver>"] recursive` configuration, it will
|
||||
be used (see linkgit:gitattributes[5]).
|
||||
|
||||
`variant`::
|
||||
Resolve local conflicts automatically in favor
|
||||
of one side or the other (as in 'git merge-file'
|
||||
`--ours`/`--theirs`/`--union`). Can be `0`,
|
||||
`XDL_MERGE_FAVOR_OURS`, `XDL_MERGE_FAVOR_THEIRS`, or
|
||||
`XDL_MERGE_FAVOR_UNION`.
|
||||
|
||||
`renormalize`::
|
||||
Resmudge and clean the "base", "theirs" and "ours" files
|
||||
before merging. Use this when the merge is likely to have
|
||||
overlapped with a change in smudge/clean or end-of-line
|
||||
normalization rules.
|
||||
|
||||
Low-level (single file) merge
|
||||
-----------------------------
|
||||
|
||||
`ll_merge`::
|
||||
|
||||
Perform a three-way single-file merge in core. This is
|
||||
a thin wrapper around `xdl_merge` that takes the path and
|
||||
any merge backend specified in `.gitattributes` or
|
||||
`.git/info/attributes` into account. Returns 0 for a
|
||||
clean merge.
|
||||
|
||||
Calling sequence:
|
||||
|
||||
* Prepare a `struct ll_merge_options` to record options.
|
||||
If you have no special requests, skip this and pass `NULL`
|
||||
as the `opts` parameter to use the default options.
|
||||
|
||||
* Allocate an `mmbuffer_t` variable for the result.
|
||||
|
||||
* Allocate and fill variables with the file's original content
|
||||
and two modified versions (using `read_mmfile`, for example).
|
||||
|
||||
* Call `ll_merge()`.
|
||||
|
||||
* Read the merged content from `result_buf.ptr` and `result_buf.size`.
|
||||
|
||||
* Release buffers when finished. A simple
|
||||
`free(ancestor.ptr); free(ours.ptr); free(theirs.ptr);
|
||||
free(result_buf.ptr);` will do.
|
||||
|
||||
If the modifications do not merge cleanly, `ll_merge` will return a
|
||||
nonzero value and `result_buf` will generally include a description of
|
||||
the conflict bracketed by markers such as the traditional `<<<<<<<`
|
||||
and `>>>>>>>`.
|
||||
|
||||
The `ancestor_label`, `our_label`, and `their_label` parameters are
|
||||
used to label the different sides of a conflict if the merge driver
|
||||
supports this.
|
||||
|
||||
Everything else
|
||||
---------------
|
||||
|
||||
Talk about <merge-recursive.h> and merge_file():
|
||||
|
||||
- merge_trees() to merge with rename detection
|
||||
- merge_recursive() for ancestor consolidation
|
||||
- try_merge_command() for other strategies
|
||||
- conflict format
|
||||
- merge options
|
||||
|
||||
(Daniel, Miklos, Stephan, JC)
|
||||
15
Documentation/technical/api-object-access.txt
Normal file
|
|
@ -0,0 +1,15 @@
|
|||
object access API
|
||||
=================
|
||||
|
||||
Talk about <sha1-file.c> and <object.h> family, things like
|
||||
|
||||
* read_sha1_file()
|
||||
* read_object_with_reference()
|
||||
* has_sha1_file()
|
||||
* write_sha1_file()
|
||||
* pretend_object_file()
|
||||
* lookup_{object,commit,tag,blob,tree}
|
||||
* parse_{object,commit,tag,blob,tree}
|
||||
* Use of object flags
|
||||
|
||||
(JC, Shawn, Daniel, Dscho, Linus)
|
||||
90
Documentation/technical/api-oid-array.txt
Normal file
|
|
@ -0,0 +1,90 @@
|
|||
oid-array API
|
||||
==============
|
||||
|
||||
The oid-array API provides storage and manipulation of sets of object
|
||||
identifiers. The emphasis is on storage and processing efficiency,
|
||||
making them suitable for large lists. Note that the ordering of items is
|
||||
not preserved over some operations.
|
||||
|
||||
Data Structures
|
||||
---------------
|
||||
|
||||
`struct oid_array`::
|
||||
|
||||
A single array of object IDs. This should be initialized by
|
||||
assignment from `OID_ARRAY_INIT`. The `oid` member contains
|
||||
the actual data. The `nr` member contains the number of items in
|
||||
the set. The `alloc` and `sorted` members are used internally,
|
||||
and should not be needed by API callers.
|
||||
|
||||
Functions
|
||||
---------
|
||||
|
||||
`oid_array_append`::
|
||||
Add an item to the set. The object ID will be placed at the end of
|
||||
the array (but note that some operations below may lose this
|
||||
ordering).
|
||||
|
||||
`oid_array_lookup`::
|
||||
Perform a binary search of the array for a specific object ID.
|
||||
If found, returns the offset (in number of elements) of the
|
||||
object ID. If not found, returns a negative integer. If the array
|
||||
is not sorted, this function has the side effect of sorting it.
|
||||
|
||||
`oid_array_clear`::
|
||||
Free all memory associated with the array and return it to the
|
||||
initial, empty state.
|
||||
|
||||
`oid_array_for_each`::
|
||||
Iterate over each element of the list, executing the callback
|
||||
function for each one. Does not sort the list, so any custom
|
||||
hash order is retained. If the callback returns a non-zero
|
||||
value, the iteration ends immediately and the callback's
|
||||
return is propagated; otherwise, 0 is returned.
|
||||
|
||||
`oid_array_for_each_unique`::
|
||||
Iterate over each unique element of the list in sorted order,
|
||||
but otherwise behave like `oid_array_for_each`. If the array
|
||||
is not sorted, this function has the side effect of sorting
|
||||
it.
|
||||
|
||||
`oid_array_filter`::
|
||||
Apply the callback function `want` to each entry in the array,
|
||||
retaining only the entries for which the function returns true.
|
||||
Preserve the order of the entries that are retained.
|
||||
|
||||
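The lazy sorting done by `oid_array_lookup` and the duplicate skipping done by `oid_array_for_each_unique` can be illustrated with a small standalone model. This is a sketch, not Git's implementation: `toy_id`, `toy_array` and all `toy_*` functions are hypothetical names, identifiers are shortened to 4 bytes, and a failed lookup simply returns -1.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_ID_LEN 4 /* assumption: real object IDs are longer */

struct toy_id { unsigned char hash[TOY_ID_LEN]; };

struct toy_array {
	struct toy_id *id;
	size_t nr, alloc;
	int sorted;
};

typedef int (*toy_each_fn)(const struct toy_id *id, void *data);

static int toy_cmp(const void *a, const void *b)
{
	return memcmp(a, b, sizeof(struct toy_id));
}

/* Append at the end; the array is no longer known to be sorted. */
static void toy_append(struct toy_array *a, const struct toy_id *id)
{
	if (a->nr == a->alloc) {
		a->alloc = a->alloc ? 2 * a->alloc : 8;
		a->id = realloc(a->id, a->alloc * sizeof(*a->id));
		if (!a->id)
			abort();
	}
	a->id[a->nr++] = *id;
	a->sorted = 0;
}

/* Sort only when an earlier append invalidated the order. */
static void toy_sort(struct toy_array *a)
{
	if (a->sorted)
		return;
	if (a->nr)
		qsort(a->id, a->nr, sizeof(*a->id), toy_cmp);
	a->sorted = 1;
}

/* Binary search; sorts first if needed. Returns the index or -1. */
static int toy_lookup(struct toy_array *a, const struct toy_id *id)
{
	size_t lo = 0, hi;

	toy_sort(a);
	hi = a->nr;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = toy_cmp(id, &a->id[mid]);
		if (!cmp)
			return (int)mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -1;
}

/* Visit each distinct ID once, in sorted order; stop on non-zero. */
static int toy_for_each_unique(struct toy_array *a, toy_each_fn fn, void *data)
{
	size_t i;

	toy_sort(a);
	for (i = 0; i < a->nr; i++) {
		int ret;
		if (i > 0 && !toy_cmp(&a->id[i], &a->id[i - 1]))
			continue; /* same as the previous entry: a duplicate */
		ret = fn(&a->id[i], data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback: count the entries it is called for. */
static int toy_count(const struct toy_id *id, void *data)
{
	(void)id;
	++*(int *)data;
	return 0;
}
```

Because duplicates end up adjacent after sorting, one linear pass is enough to skip them, which is why appending freely and deduplicating at iteration time is cheaper than checking for duplicates on every append.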
Examples
|
||||
--------
|
||||
|
||||
-----------------------------------------
int print_callback(const struct object_id *oid,
		    void *data)
{
	printf("%s\n", oid_to_hex(oid));
	return 0; /* always continue */
}

void some_func(void)
{
	struct oid_array hashes = OID_ARRAY_INIT;
	struct object_id oid;

	/* Read objects into our set */
	while (read_object_from_stdin(oid.hash))
		oid_array_append(&hashes, &oid);

	/* Check if some objects are in our set */
	while (read_object_from_stdin(oid.hash)) {
		if (oid_array_lookup(&hashes, &oid) >= 0)
			printf("it's in there!\n");
	}

	/*
	 * Print the unique set of objects. We could also have
	 * avoided adding duplicate objects in the first place,
	 * but we would end up re-sorting the array repeatedly.
	 * Instead, this will sort once and then skip duplicates
	 * in linear time.
	 */
	oid_array_for_each_unique(&hashes, print_callback, NULL);

	/* Free the array's memory now that we are done with it */
	oid_array_clear(&hashes);
}
-----------------------------------------
|
||||
313
Documentation/technical/api-parse-options.txt
Normal file
|
|
@ -0,0 +1,313 @@
|
|||
parse-options API
|
||||
=================
|
||||
|
||||
The parse-options API is used to parse and massage options in Git
|
||||
and to provide a usage help with consistent look.
|
||||
|
||||
Basics
|
||||
------
|
||||
|
||||
The argument vector `argv[]` may usually contain mandatory or optional
|
||||
'non-option arguments', e.g. a filename or a branch, and 'options'.
|
||||
Options are optional arguments that start with a dash and
|
||||
that allow you to change the behavior of a command.
|
||||
|
||||
* There are basically three types of options:
|
||||
'boolean' options,
|
||||
options with (mandatory) 'arguments' and
|
||||
options with 'optional arguments'
|
||||
(i.e. a boolean option that can be adjusted).
|
||||
|
||||
* There are basically two forms of options:
|
||||
'Short options' consist of one dash (`-`) and one alphanumeric
|
||||
character.
|
||||
'Long options' begin with two dashes (`--`) and some
|
||||
alphanumeric characters.
|
||||
|
||||
* Options are case-sensitive.
|
||||
Please define 'lower-case long options' only.
|
||||
|
||||
The parse-options API allows:
|
||||
|
||||
* 'stuck' and 'separate form' of options with arguments.
|
||||
`-oArg` is stuck, `-o Arg` is separate form.
|
||||
`--option=Arg` is stuck, `--option Arg` is separate form.
|
||||
|
||||
* Long options may be 'abbreviated', as long as the abbreviation
|
||||
is unambiguous.
|
||||
|
||||
* Short options may be bundled, e.g. `-a -b` can be specified as `-ab`.
|
||||
|
||||
* Boolean long options can be 'negated' (or 'unset') by prepending
|
||||
`no-`, e.g. `--no-abbrev` instead of `--abbrev`. Conversely,
|
||||
options that begin with `no-` can be 'negated' by removing it.
|
||||
Other long options can be unset (e.g., set string to NULL, set
|
||||
integer to 0) by prepending `no-`.
|
||||
|
||||
* Options and non-option arguments can clearly be separated using the `--`
|
||||
option, e.g. `-a -b --option -- --this-is-a-file` indicates that
|
||||
`--this-is-a-file` must not be processed as an option.
|
||||
|
||||
Steps to parse options
|
||||
----------------------
|
||||
|
||||
. `#include "parse-options.h"`
|
||||
|
||||
. define a NULL-terminated
|
||||
`static const char * const builtin_foo_usage[]` array
|
||||
containing alternative usage strings
|
||||
|
||||
. define `builtin_foo_options` array as described below
|
||||
in section 'Data Structure'.
|
||||
|
||||
. in `cmd_foo(int argc, const char **argv, const char *prefix)`
|
||||
call
|
||||
|
||||
argc = parse_options(argc, argv, prefix, builtin_foo_options, builtin_foo_usage, flags);
|
||||
+
|
||||
`parse_options()` will filter out the processed options of `argv[]` and leave the
|
||||
non-option arguments in `argv[]`.
|
||||
`argc` is updated appropriately because of the assignment.
|
||||
+
|
||||
You can also pass NULL instead of a usage array as the fifth parameter of
|
||||
parse_options(), to avoid displaying a help screen with usage info and
|
||||
option list. This should only be done if necessary, e.g. to implement
|
||||
a limited parser for only a subset of the options that needs to be run
|
||||
before the full parser, which in turn shows the full help message.
|
||||
+
|
||||
Flags are the bitwise-or of:
|
||||
|
||||
`PARSE_OPT_KEEP_DASHDASH`::
|
||||
Keep the `--` that usually separates options from
|
||||
non-option arguments.
|
||||
|
||||
`PARSE_OPT_STOP_AT_NON_OPTION`::
|
||||
Usually the whole argument vector is massaged and reordered.
|
||||
Using this flag, processing is stopped at the first non-option
|
||||
argument.
|
||||
|
||||
`PARSE_OPT_KEEP_ARGV0`::
|
||||
Keep the first argument, which contains the program name. It's
|
||||
removed from argv[] by default.
|
||||
|
||||
`PARSE_OPT_KEEP_UNKNOWN`::
|
||||
Keep unknown arguments instead of erroring out. This doesn't
|
||||
work for all combinations of arguments as users might expect
|
||||
it to do. E.g. if the first argument in `--unknown --known`
|
||||
takes a value (which we can't know), the second one is
|
||||
mistakenly interpreted as a known option. Similarly, if
|
||||
`PARSE_OPT_STOP_AT_NON_OPTION` is set, the second argument in
|
||||
`--unknown value` will be mistakenly interpreted as a
|
||||
non-option, not as a value belonging to the unknown option,
|
||||
stopping the parser early. That's why parse_options() errors out if
|
||||
both options are set.
|
||||
|
||||
`PARSE_OPT_NO_INTERNAL_HELP`::
|
||||
By default, parse_options() handles `-h`, `--help` and
|
||||
`--help-all` internally, by showing a help screen. This option
|
||||
turns it off and allows one to add custom handlers for these
|
||||
options, or to just leave them unknown.
|
||||
|
||||
Data Structure
|
||||
--------------
|
||||
|
||||
The main data structure is an array of the `option` struct,
|
||||
say `static struct option builtin_add_options[]`.
|
||||
There are some macros to easily define options:
|
||||
|
||||
`OPT__ABBREV(&int_var)`::
|
||||
Add `--abbrev[=<n>]`.
|
||||
|
||||
`OPT__COLOR(&int_var, description)`::
|
||||
Add `--color[=<when>]` and `--no-color`.
|
||||
|
||||
`OPT__DRY_RUN(&int_var, description)`::
|
||||
Add `-n, --dry-run`.
|
||||
|
||||
`OPT__FORCE(&int_var, description)`::
|
||||
Add `-f, --force`.
|
||||
|
||||
`OPT__QUIET(&int_var, description)`::
|
||||
Add `-q, --quiet`.
|
||||
|
||||
`OPT__VERBOSE(&int_var, description)`::
|
||||
Add `-v, --verbose`.
|
||||
|
||||
`OPT_GROUP(description)`::
|
||||
Start an option group. `description` is a short string that
|
||||
describes the group or an empty string.
|
||||
Start the description with an upper-case letter.
|
||||
|
||||
`OPT_BOOL(short, long, &int_var, description)`::
|
||||
Introduce a boolean option. `int_var` is set to one with
|
||||
`--option` and set to zero with `--no-option`.
|
||||
|
||||
`OPT_COUNTUP(short, long, &int_var, description)`::
|
||||
Introduce a count-up option.
|
||||
Each use of `--option` increments `int_var`, starting from zero
|
||||
(even if initially negative), and `--no-option` resets it to
|
||||
zero. To determine if `--option` or `--no-option` was encountered at
|
||||
all, initialize `int_var` to a negative value, and if it is still
|
||||
negative after parse_options(), then neither `--option` nor
|
||||
`--no-option` was seen.
|
||||
|
||||
`OPT_BIT(short, long, &int_var, description, mask)`::
|
||||
Introduce a boolean option.
|
||||
If used, `int_var` is bitwise-ored with `mask`.
|
||||
|
||||
`OPT_NEGBIT(short, long, &int_var, description, mask)`::
|
||||
Introduce a boolean option.
|
||||
If used, `int_var` is bitwise-anded with the inverted `mask`.
|
||||
|
||||
`OPT_SET_INT(short, long, &int_var, description, integer)`::
|
||||
Introduce an integer option.
|
||||
`int_var` is set to `integer` with `--option`, and
|
||||
reset to zero with `--no-option`.
|
||||
|
||||
`OPT_STRING(short, long, &str_var, arg_str, description)`::
|
||||
Introduce an option with string argument.
|
||||
The string argument is put into `str_var`.
|
||||
|
||||
`OPT_STRING_LIST(short, long, &struct string_list, arg_str, description)`::
|
||||
Introduce an option with string argument.
|
||||
The string argument is stored as an element in `string_list`.
|
||||
Use of `--no-option` will clear the list of preceding values.
|
||||
|
||||
`OPT_INTEGER(short, long, &int_var, description)`::
|
||||
Introduce an option with integer argument.
|
||||
The integer is put into `int_var`.
|
||||
|
||||
`OPT_MAGNITUDE(short, long, &unsigned_long_var, description)`::
|
||||
Introduce an option with a size argument. The argument must be a
|
||||
non-negative integer and may include a suffix of 'k', 'm' or 'g' to
|
||||
scale the provided value by 1024, 1024^2 or 1024^3 respectively.
|
||||
The scaled value is put into `unsigned_long_var`.
|
||||
|
||||
`OPT_EXPIRY_DATE(short, long, &timestamp_t_var, description)`::
|
||||
Introduce an option with expiry date argument, see `parse_expiry_date()`.
|
||||
The timestamp is put into `timestamp_t_var`.
|
||||
|
||||
`OPT_CALLBACK(short, long, &var, arg_str, description, func_ptr)`::
|
||||
Introduce an option with argument.
|
||||
The argument will be fed into the function given by `func_ptr`
|
||||
and the result will be put into `var`.
|
||||
See 'Option Callbacks' below for a more elaborate description.
|
||||
|
||||
`OPT_FILENAME(short, long, &var, description)`::
|
||||
Introduce an option with a filename argument.
|
||||
The filename will be prefixed by passing the filename along with
|
||||
the prefix argument of `parse_options()` to `prefix_filename()`.
|
||||
|
||||
`OPT_ARGUMENT(long, &int_var, description)`::
|
||||
Introduce a long-option argument that will be kept in `argv[]`.
|
||||
If this option was seen, `int_var` will be set to one (except
|
||||
if a `NULL` pointer was passed).
|
||||
|
||||
`OPT_NUMBER_CALLBACK(&var, description, func_ptr)`::
|
||||
Recognize numerical options like -123 and feed the integer as
|
||||
if it was an argument to the function given by `func_ptr`.
|
||||
The result will be put into `var`. There can be only one such
|
||||
option definition. It cannot be negated and it takes no
|
||||
arguments. Short options that happen to be digits take
|
||||
precedence over it.
|
||||
|
||||
`OPT_COLOR_FLAG(short, long, &int_var, description)`::
|
||||
Introduce an option that takes an optional argument that can
|
||||
have one of three values: "always", "never", or "auto". If the
|
||||
argument is not given, it defaults to "always". The `--no-` form
|
||||
works like `--long=never`; it cannot take an argument. If
|
||||
"always", set `int_var` to 1; if "never", set `int_var` to 0; if
|
||||
"auto", set `int_var` to 1 if stdout is a tty or a pager,
|
||||
0 otherwise.
|
||||
|
||||
`OPT_NOOP_NOARG(short, long)`::
|
||||
Introduce an option that has no effect and takes no arguments.
|
||||
Use it to hide deprecated options that are still to be recognized
|
||||
and ignored silently.
|
||||
|
||||
`OPT_PASSTHRU(short, long, &char_var, arg_str, description, flags)`::
|
||||
Introduce an option that will be reconstructed into a char* string,
|
||||
which must be initialized to NULL. This is useful when you need to
|
||||
pass the command-line option to another command. Any previous value
|
||||
will be overwritten, so this should only be used for options where
|
||||
the last one specified on the command line wins.
|
||||
|
||||
`OPT_PASSTHRU_ARGV(short, long, &argv_array_var, arg_str, description, flags)`::
|
||||
Introduce an option where all instances of it on the command-line will
|
||||
be reconstructed into an argv_array. This is useful when you need to
|
||||
pass the command-line option, which can be specified multiple times,
|
||||
to another command.
|
||||
|
||||
`OPT_CMDMODE(short, long, &int_var, description, enum_val)`::
|
||||
Define an "operation mode" option, only one of which in the same
|
||||
group of "operating mode" options that share the same `int_var`
|
||||
can be given by the user. `int_var` is set to `enum_val` when the
|
||||
option is used, but an error is reported if another "operating mode"
|
||||
option has already set its value to the same `int_var`.
|
||||
|
||||
|
||||
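Two of the rules above are easy to misread, so here is a standalone sketch of the `OPT_COUNTUP` counting rule and the `OPT_MAGNITUDE` suffix scaling. It models the documented behavior only and is not the parse-options code: `toy_countup` and `toy_parse_magnitude` are hypothetical helpers, and restricting the number to decimal digits is an assumption of this sketch.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * OPT_COUNTUP rule: `unset` is non-zero for --no-option.
 * A negative starting value means "not seen yet".
 */
static void toy_countup(int *int_var, int unset)
{
	if (unset)
		*int_var = 0;
	else if (*int_var < 0)
		*int_var = 1; /* counting starts from zero, so the first use yields 1 */
	else
		(*int_var)++;
}

/*
 * OPT_MAGNITUDE rule: a non-negative integer with an optional
 * k, m or g suffix. Returns 0 on success, -1 on malformed input
 * or overflow; *out is only written on success.
 */
static int toy_parse_magnitude(const char *arg, unsigned long *out)
{
	char *end;
	unsigned long value, factor = 1;

	if (*arg < '0' || *arg > '9')
		return -1; /* rejects "", signs and leading whitespace */
	errno = 0;
	value = strtoul(arg, &end, 10);
	if (errno)
		return -1;
	switch (*end) {
	case '\0':
		break;
	case 'k':
		factor = 1024UL;
		end++;
		break;
	case 'm':
		factor = 1024UL * 1024;
		end++;
		break;
	case 'g':
		factor = 1024UL * 1024 * 1024;
		end++;
		break;
	default:
		return -1;
	}
	if (*end || value > ULONG_MAX / factor)
		return -1; /* trailing junk, or the scaled value would overflow */
	*out = value * factor;
	return 0;
}
```

Initializing the counter to -1 and calling `toy_countup()` once per occurrence leaves it negative only when neither form of the option was given, which is the detection trick the `OPT_COUNTUP` entry describes.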
The last element of the array must be `OPT_END()`.
|
||||
|
||||
If not stated otherwise, interpret the arguments as follows:
|
||||
|
||||
* `short` is a character for the short option
|
||||
(e.g. `'e'` for `-e`, use `0` to omit),
|
||||
|
||||
* `long` is a string for the long option
|
||||
(e.g. `"example"` for `--example`, use `NULL` to omit),
|
||||
|
||||
* `int_var` is an integer variable,
|
||||
|
||||
* `str_var` is a string variable (`char *`),
|
||||
|
||||
* `arg_str` is the string that is shown as argument
|
||||
(e.g. `"branch"` will result in `<branch>`).
|
||||
If set to `NULL`, three dots (`...`) will be displayed.
|
||||
|
||||
* `description` is a short string to describe the effect of the option.
|
||||
It shall begin with a lower-case letter and a full stop (`.`) shall be
|
||||
omitted at the end.
|
||||
|
||||
Option Callbacks
|
||||
----------------
|
||||
|
||||
The function must be defined in this form:
|
||||
|
||||
int func(const struct option *opt, const char *arg, int unset)
|
||||
|
||||
The callback mechanism is as follows:
|
||||
|
||||
* Inside `func`, the only interesting member of the structure
|
||||
given by `opt` is the void pointer `opt->value`.
|
||||
`*opt->value` will be the value that is saved into `var`, if you
|
||||
use `OPT_CALLBACK()`.
|
||||
For example, do `*(unsigned long *)opt->value = 42;` to get 42
|
||||
into an `unsigned long` variable.
|
||||
|
||||
* Return value `0` indicates success and non-zero return
|
||||
value will invoke `usage_with_options()` and, thus, die.
|
||||
|
||||
* If the user negates the option, `arg` is `NULL` and `unset` is 1.
|
||||
|
||||
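As an illustration of the callback rules above, the following standalone sketch parses a number into an `unsigned long`. To keep it compilable outside Git it uses `struct toy_option`, a hypothetical stand-in that carries only the `value` member; the callback name `toy_depth_cb` and the `--depth` option it imagines are likewise assumptions.

```c
#include <errno.h>
#include <stdlib.h>

/* Stand-in for the one member of `struct option` a callback usually needs. */
struct toy_option {
	void *value; /* points at the variable that was passed as `&var` */
};

/* Callback in the documented form: store a non-negative number in `var`. */
static int toy_depth_cb(const struct toy_option *opt, const char *arg, int unset)
{
	unsigned long *depth = opt->value;
	unsigned long parsed;
	char *end;

	if (unset) { /* negated form: arg is NULL */
		*depth = 0;
		return 0;
	}
	if (!arg || *arg < '0' || *arg > '9')
		return -1; /* non-zero signals an error to the parser */
	errno = 0;
	parsed = strtoul(arg, &end, 10);
	if (errno || *end)
		return -1; /* out of range or trailing junk; `var` is untouched */
	*depth = parsed;
	return 0;
}
```

The callback writes through `opt->value` only after the argument has been validated, so a rejected argument leaves the caller's variable unchanged.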
Sophisticated option parsing
|
||||
----------------------------
|
||||
|
||||
If you need, for example, option callbacks with optional arguments
|
||||
or without arguments at all, or if you need other special cases,
|
||||
that are not handled by the macros above, you need to specify the
|
||||
members of the `option` structure manually.
|
||||
|
||||
This is not covered in this document, but well documented
|
||||
in `parse-options.h` itself.
|
||||
|
||||
Examples
|
||||
--------
|
||||
|
||||
See `test-parse-options.c` and
|
||||
`builtin/add.c`,
|
||||
`builtin/clone.c`,
|
||||
`builtin/commit.c`,
|
||||
`builtin/fetch.c`,
|
||||
`builtin/fsck.c`,
|
||||
`builtin/rm.c`
|
||||
for real-world examples.
|
||||
10
Documentation/technical/api-quote.txt
Normal file
|
|
@ -0,0 +1,10 @@
|
|||
quote API
|
||||
=========
|
||||
|
||||
Talk about <quote.h>, things like
|
||||
|
||||
* sq_quote and unquote
|
||||
* c_style quote and unquote
|
||||
* quoting for foreign languages
|
||||
|
||||
(JC)
|
||||
78
Documentation/technical/api-ref-iteration.txt
Normal file
|
|
@ -0,0 +1,78 @@
|
|||
ref iteration API
|
||||
=================
|
||||
|
||||
|
||||
Iteration of refs is done by using an iterate function which will call a
|
||||
callback function for every ref. The callback function has this
|
||||
signature:
|
||||
|
||||
int handle_one_ref(const char *refname, const struct object_id *oid,
|
||||
int flags, void *cb_data);
|
||||
|
||||
There are different kinds of iterate functions which all take a
|
||||
callback of this type. The callback is then called for each found ref
|
||||
until the callback returns nonzero. The returned value is then also
|
||||
returned by the iterate function.
|
||||
|
||||
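The stop-on-nonzero contract can be modelled without any Git code. In this sketch the refs are just an array of strings, and `toy_for_each_ref`, `toy_search` and `toy_find_ref` are hypothetical names; the callback drops the object ID and flags parameters to stay self-contained.

```c
#include <stddef.h>
#include <string.h>

/* Same idea as the callback above, minus the object ID and flags. */
typedef int (*toy_each_ref_fn)(const char *refname, void *cb_data);

/* Call fn for each name; stop at the first non-zero return and pass it on. */
static int toy_for_each_ref(const char *const *refs, size_t nr,
			    toy_each_ref_fn fn, void *cb_data)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		int ret = fn(refs[i], cb_data);
		if (ret)
			return ret;
	}
	return 0;
}

/* State shared between the caller and the callback through cb_data. */
struct toy_search {
	const char *needle;
	int seen; /* number of refs the callback was invoked for */
};

/* Example callback: returns 1 (ending the iteration) once needle is found. */
static int toy_find_ref(const char *refname, void *cb_data)
{
	struct toy_search *s = cb_data;

	s->seen++;
	return !strcmp(refname, s->needle);
}
```

A caller can therefore use the return value of the iterate function both as an error channel and as an early-exit signal, with any richer result passed back through `cb_data`.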
Iteration functions
|
||||
-------------------
|
||||
|
||||
* `head_ref()` just iterates the head ref.
|
||||
|
||||
* `for_each_ref()` iterates all refs.
|
||||
|
||||
* `for_each_ref_in()` iterates all refs which have a defined prefix and
|
||||
strips that prefix from the passed variable refname.
|
||||
|
||||
* `for_each_tag_ref()`, `for_each_branch_ref()`, `for_each_remote_ref()`,
|
||||
`for_each_replace_ref()` iterate refs from the respective area.
|
||||
|
||||
* `for_each_glob_ref()` iterates all refs that match the specified glob
|
||||
pattern.
|
||||
|
||||
* `for_each_glob_ref_in()` combines the previous function with `for_each_ref_in()`.
|
||||
|
||||
* Use `refs_` API for accessing submodules. The submodule ref store could
|
||||
be obtained with `get_submodule_ref_store()`.
|
||||
|
||||
* `for_each_rawref()` can be used to learn about broken ref and symref.
|
||||
|
||||
* `for_each_reflog()` iterates each reflog file.
|
||||
|
||||
Submodules
|
||||
----------
|
||||
|
||||
If you want to iterate the refs of a submodule you first need to add the
|
||||
submodule's object database. You can do this with a code snippet like
|
||||
this:
|
||||
|
||||
	const char *path = "path/to/submodule";
	if (add_submodule_odb(path))
		die("Error submodule '%s' not populated.", path);
|
||||
|
||||
`add_submodule_odb()` will return zero on success. If you
|
||||
do not do this you will get an error for each ref that does not point
|
||||
to a valid object.
|
||||
|
||||
Note: As a side-effect of this you cannot safely assume that all
|
||||
objects you look up are available in the superproject. All submodule objects
|
||||
will be available the same way as the superproject's objects.
|
||||
|
||||
Example:
|
||||
--------
|
||||
|
||||
----
static int handle_remote_ref(const char *refname,
		const struct object_id *oid, int flags, void *cb_data)
{
	struct strbuf *output = cb_data;
	strbuf_addf(output, "%s\n", refname);
	return 0;
}

...

	struct strbuf output = STRBUF_INIT;
	for_each_remote_ref(handle_remote_ref, &output);
	printf("%s", output.buf);
----
|
||||
127
Documentation/technical/api-remote.txt
Normal file
|
|
@ -0,0 +1,127 @@
|
|||
Remotes configuration API
|
||||
=========================
|
||||
|
||||
The API in remote.h gives access to the configuration related to
|
||||
remotes. It handles all three configuration mechanisms historically
|
||||
and currently used by Git, and presents the information in a uniform
|
||||
fashion. Note that the code also handles plain URLs without any
|
||||
configuration, giving them just the default information.
|
||||
|
||||
struct remote
|
||||
-------------
|
||||
|
||||
`name`::
|
||||
|
||||
The user's nickname for the remote
|
||||
|
||||
`url`::
|
||||
|
||||
An array of all of the url_nr URLs configured for the remote
|
||||
|
||||
`pushurl`::
|
||||
|
||||
An array of all of the pushurl_nr push URLs configured for the remote
|
||||
|
||||
`push`::
|
||||
|
||||
An array of refspecs configured for pushing, with
|
||||
push_refspec being the literal strings, and push_refspec_nr
|
||||
being the quantity.
|
||||
|
||||
`fetch`::
|
||||
|
||||
An array of refspecs configured for fetching, with
|
||||
fetch_refspec being the literal strings, and fetch_refspec_nr
|
||||
being the quantity.
|
||||
|
||||
`fetch_tags`::
|
||||
|
||||
The setting for whether to fetch tags (as a separate rule from
|
||||
the configured refspecs); -1 means never to fetch tags, 0
|
||||
means to auto-follow tags based on the default heuristic, 1
|
||||
means to always auto-follow tags, and 2 means to fetch all
|
||||
tags.
|
||||
|
||||
`receivepack`, `uploadpack`::
|
||||
|
||||
The configured helper programs to run on the remote side, for
|
||||
Git-native protocols.
|
||||
|
||||
`http_proxy`::
|
||||
|
||||
The proxy to use for curl (http, https, ftp, etc.) URLs.
|
||||
|
||||
`http_proxy_authmethod`::
|
||||
|
||||
The method used for authenticating against `http_proxy`.
|
||||
|
||||
struct remotes can be found by name with remote_get(), and iterated
|
||||
through with for_each_remote(). remote_get(NULL) will return the
|
||||
default remote, given the current branch and configuration.
|
||||
|
||||
struct refspec
|
||||
--------------
|
||||
|
||||
A struct refspec holds the parsed interpretation of a refspec. If it
|
||||
will force updates (starts with a '+'), force is true. If it is a
|
||||
pattern (sides end with '*'), pattern is true. src and dst are the
|
||||
two sides (including '*' characters if present); if there is only one
|
||||
side, it is src, and dst is NULL; if sides exist but are empty (i.e.,
|
||||
the refspec either starts or ends with ':'), the corresponding side is
|
||||
"".
|
||||
|
||||
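The field rules in the paragraph above can be summarized by a simplified standalone parser. This is a sketch of the description only, not the code in remote.h: `toy_refspec` and the `toy_*` functions are hypothetical, no validation of ref names is attempted, and `pattern` is derived from the source side alone.

```c
#include <stdlib.h>
#include <string.h>

struct toy_refspec {
	int force;   /* spec started with '+' */
	int pattern; /* src ends with '*' */
	char *src;
	char *dst;   /* NULL when the spec has no ':' at all */
};

/* Copy the first n bytes of s into a fresh NUL-terminated string. */
static char *toy_strndup(const char *s, size_t n)
{
	char *p = malloc(n + 1);

	if (!p)
		abort();
	memcpy(p, s, n);
	p[n] = '\0';
	return p;
}

static void toy_parse_refspec(struct toy_refspec *rs, const char *spec)
{
	const char *colon;
	size_t len;

	rs->force = 0;
	rs->dst = NULL;
	if (*spec == '+') {
		rs->force = 1;
		spec++;
	}
	colon = strchr(spec, ':');
	if (colon) {
		/* Both sides exist; either may be the empty string. */
		rs->src = toy_strndup(spec, (size_t)(colon - spec));
		rs->dst = toy_strndup(colon + 1, strlen(colon + 1));
	} else {
		/* Only one side: it is src, and dst stays NULL. */
		rs->src = toy_strndup(spec, strlen(spec));
	}
	len = strlen(rs->src);
	rs->pattern = len > 0 && rs->src[len - 1] == '*';
}

static void toy_refspec_clear(struct toy_refspec *rs)
{
	free(rs->src);
	free(rs->dst);
	rs->src = rs->dst = NULL;
	rs->force = rs->pattern = 0;
}
```

Note the difference between a missing side and an empty one: `main` yields a NULL `dst`, while `:refs/heads/old` yields an empty-string `src`.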
An array of strings can be parsed into an array of struct refspecs
|
||||
using parse_fetch_refspec() or parse_push_refspec().
|
||||
|
||||
remote_find_tracking(), given a remote and a struct refspec with
|
||||
either src or dst filled out, will fill out the other such that the
|
||||
result is in the "fetch" specification for the remote (note that this
|
||||
evaluates patterns and returns a single result).
|
||||
|
||||
struct branch
|
||||
-------------
|
||||
|
||||
Note that this may end up moving to branch.h
|
||||
|
||||
struct branch holds the configuration for a branch. It can be looked
|
||||
up with branch_get(name) for "refs/heads/{name}", or with
|
||||
branch_get(NULL) for HEAD.
|
||||
|
||||
It contains:
|
||||
|
||||
`name`::
|
||||
|
||||
The short name of the branch.
|
||||
|
||||
`refname`::
|
||||
|
||||
The full path for the branch ref.
|
||||
|
||||
`remote_name`::
|
||||
|
||||
The name of the remote listed in the configuration.
|
||||
|
||||
`merge_name`::
|
||||
|
||||
An array of the "merge" lines in the configuration.
|
||||
|
||||
`merge`::
|
||||
|
||||
An array of the struct refspecs used for the merge lines. That
|
||||
is, merge[i]->dst is a local tracking ref which should be
|
||||
merged into this branch by default.
|
||||
|
||||
`merge_nr`::
|
||||
|
||||
The number of merge configurations
|
||||
|
||||
branch_has_merge_config() returns true if the given branch has merge
|
||||
configuration given.
|
||||
|
||||
Other stuff
|
||||
-----------
|
||||
|
||||
There is other stuff in remote.h that is related, in general, to the
|
||||
process of interacting with remotes.
|
||||
|
||||
(Daniel Barkalow)
|
||||
72
Documentation/technical/api-revision-walking.txt
Normal file
|
|
@ -0,0 +1,72 @@
|
|||
revision walking API
|
||||
====================
|
||||
|
||||
The revision walking API offers functions to build a list of revisions
|
||||
and then iterate over that list.
|
||||
|
||||
Calling sequence
|
||||
----------------
|
||||
|
||||
The walking API has a given calling sequence: first you need to
|
||||
initialize a rev_info structure, then add revisions to control what kind
|
||||
of revision list you want to get, and finally you can iterate over the
|
||||
revision list.
|
||||
|
||||
Functions
|
||||
---------
|
||||
|
||||
`repo_init_revisions`::
|
||||
|
||||
Initialize a rev_info structure with default values. The third
|
||||
parameter may be NULL or can be prefix path, and then the `.prefix`
|
||||
variable will be set to it. This is typically the first function you
|
||||
want to call when you want to deal with a revision list. After calling
|
||||
this function, you are free to customize options, like set
|
||||
`.ignore_merges` to 0 if you don't want to ignore merges, and so on. See
|
||||
`revision.h` for a complete list of available options.
|
||||
|
||||
`add_pending_object`::
|
||||
|
||||
This function can be used if you want to add commit objects as revision
|
||||
information. You can use the `UNINTERESTING` object flag to indicate if
|
||||
you want to include or exclude the given commit (and commits reachable
|
||||
from the given commit) from the revision list.
|
||||
+
|
||||
NOTE: If you have the commits as a string list then you probably want to
|
||||
use setup_revisions(), instead of parsing each string and using this
|
||||
function.
|
||||
|
||||
`setup_revisions`::
|
||||
|
||||
Parse revision information, filling in the `rev_info` structure, and
|
||||
removing the used arguments from the argument list. Returns the number
|
||||
of arguments left that weren't recognized, which are also moved to the
|
||||
head of the argument list. The last parameter is used in case no
|
||||
parameter is given by the first two arguments.
|
||||
|
||||
`prepare_revision_walk`::
|
||||
|
||||
Prepares the rev_info structure for a walk. You should check if it
|
||||
returns any error (non-zero return code) and if it does not, you can
|
||||
start using get_revision() to do the iteration.
|
||||
|
||||
`get_revision`::
|
||||
|
||||
Takes a pointer to a `rev_info` structure and iterates over it,
|
||||
returning a `struct commit *` each time you call it. The end of the
|
||||
revision list is indicated by returning a NULL pointer.
|
||||
|
||||
`reset_revision_walk`::
|
||||
|
||||
Reset the flags used by the revision walking api. You can use
|
||||
this to do multiple sequential revision walks.
|
||||
|
||||
Data structures
|
||||
---------------
|
||||
|
||||
Talk about <revision.h>, things like:
|
||||
|
||||
* two diff_options, one for path limiting, another for output;
|
||||
* remaining functions;
|
||||
|
||||
(Linus, JC, Dscho)
|
||||
264
Documentation/technical/api-run-command.txt
Normal file
|
|
@ -0,0 +1,264 @@
|
|||
run-command API
|
||||
===============
|
||||
|
||||
The run-command API offers a versatile tool to run sub-processes with
|
||||
redirected input and output as well as with a modified environment
|
||||
and an alternate current directory.
|
||||
|
||||
A similar API offers the capability to run a function asynchronously,
|
||||
which is primarily used to capture the output that the function
|
||||
produces in the caller in order to process it.
|
||||
|
||||
|
||||
Functions
|
||||
---------
|
||||
|
||||
`child_process_init`::
|
||||
|
||||
Initialize a struct child_process variable.
|
||||
|
||||
`start_command`::
|
||||
|
||||
Start a sub-process. Takes a pointer to a `struct child_process`
|
||||
that specifies the details and returns pipe FDs (if requested).
|
||||
See below for details.
|
||||
|
||||
`finish_command`::
|
||||
|
||||
Wait for the completion of a sub-process that was started with
|
||||
start_command().
|
||||
|
||||
`run_command`::
|
||||
|
||||
A convenience function that encapsulates a sequence of
|
||||
start_command() followed by finish_command(). Takes a pointer
|
||||
to a `struct child_process` that specifies the details.
|
||||
|
||||
`run_command_v_opt`, `run_command_v_opt_cd_env`::
|
||||
|
||||
Convenience functions that encapsulate a sequence of
|
||||
start_command() followed by finish_command(). The argument argv
|
||||
specifies the program and its arguments. The argument opt is zero
|
||||
or more of the flags `RUN_COMMAND_NO_STDIN`, `RUN_GIT_CMD`,
|
||||
`RUN_COMMAND_STDOUT_TO_STDERR`, or `RUN_SILENT_EXEC_FAILURE`
|
||||
that correspond to the members .no_stdin, .git_cmd,
|
||||
.stdout_to_stderr, .silent_exec_failure of `struct child_process`.
|
||||
The argument dir corresponds to the member .dir. The argument env
|
||||
corresponds to the member .env.
|
||||
|
||||
`child_process_clear`::
|
||||
|
||||
Release the memory associated with the struct child_process.
|
||||
Most users of the run-command API don't need to call this
|
||||
function explicitly because `start_command` invokes it on
|
||||
failure and `finish_command` calls it automatically already.
|
||||
|
||||
The functions above do the following:
|
||||
|
||||
. If a system call failed, errno is set and -1 is returned. A diagnostic
|
||||
is printed.
|
||||
|
||||
. If the program was not found, then -1 is returned and errno is set to
|
||||
ENOENT; a diagnostic is printed only if .silent_exec_failure is 0.
|
||||
|
||||
. Otherwise, the program is run. If it terminates regularly, its exit
|
||||
code is returned. No diagnostic is printed, even if the exit code is
|
||||
non-zero.
|
||||
|
||||
. If the program terminated due to a signal, then the return value is the
|
||||
signal number + 128, i.e. the same value that a POSIX shell's $? would
|
||||
report. A diagnostic is printed.
|
||||
|
||||
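The return-value convention in the list above follows directly from the POSIX wait status. The sketch below shows the mapping on a plain `fork()`/`waitpid()` pair; it assumes a POSIX system, does not use the run-command API itself, prints no diagnostics, and all `toy_*` names are hypothetical.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map a wait status to the convention described above. */
static int toy_wait_status_to_code(int status)
{
	if (WIFEXITED(status))
		return WEXITSTATUS(status);   /* regular termination */
	if (WIFSIGNALED(status))
		return WTERMSIG(status) + 128; /* what a shell's $? would show */
	return -1;
}

/* Run fn in a forked child and report how the child ended. */
static int toy_run_in_child(int (*fn)(void))
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1; /* system call failed; errno is set */
	if (!pid)
		_exit(fn());
	while (waitpid(pid, &status, 0) < 0)
		if (errno != EINTR)
			return -1;
	return toy_wait_status_to_code(status);
}

/* Sample children: one exits with code 3, one is killed by a signal. */
static int toy_child_exit_3(void)
{
	return 3;
}

static int toy_child_sigkill(void)
{
	kill(getpid(), SIGKILL);
	return 0; /* not reached */
}
```

Adding 128 keeps signal deaths distinguishable from ordinary exit codes for the small exit codes programs normally use, while still fitting the result into a single integer.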
|
||||
`start_async`::
|
||||
|
||||
Run a function asynchronously. Takes a pointer to a `struct
|
||||
async` that specifies the details and returns a set of pipe FDs
|
||||
for communication with the function. See below for details.
|
||||
|
||||
`finish_async`::
|
||||
|
||||
Wait for the completion of an asynchronous function that was
|
||||
started with start_async().
|
||||
|
||||
`run_hook`::
|
||||
|
||||
Run a hook.
|
||||
The first argument is a pathname to an index file, or NULL
|
||||
if the hook uses the default index file or no index is needed.
|
||||
The second argument is the name of the hook.
|
||||
The further arguments correspond to the hook arguments.
|
||||
The last argument has to be NULL to terminate the arguments list.
|
||||
If the hook does not exist or is not executable, the return
|
||||
value will be zero.
|
||||
If it is executable, the hook will be executed and the exit
|
||||
status of the hook is returned.
|
||||
On execution, .stdout_to_stderr and .no_stdin will be set.
|
||||
(See below.)
|
||||
|
||||
|
||||
Data structures
|
||||
---------------
|
||||
|
||||
* `struct child_process`
|
||||
|
||||
This describes the arguments, redirections, and environment of a
|
||||
command to run in a sub-process.
|
||||
|
||||
The caller:
|
||||
|
||||
1. allocates and clears (using child_process_init() or
|
||||
CHILD_PROCESS_INIT) a struct child_process variable;
|
||||
2. initializes the members;
|
||||
3. calls start_command();
|
||||
4. processes the data;
|
||||
5. closes file descriptors (if necessary; see below);
|
||||
6. calls finish_command().
|
||||
|
||||
The .argv member is set up as an array of string pointers (NULL
|
||||
terminated), of which .argv[0] is the program name to run (usually
|
||||
without a path). If the command to run is a git command, set argv[0] to
|
||||
the command name without the 'git-' prefix and set .git_cmd = 1.
|
||||
|
||||
Note that the ownership of the memory pointed to by .argv stays with the
|
||||
caller, but it should survive until `finish_command` completes. If the
|
||||
.argv member is NULL, `start_command` will point it at the .args
|
||||
`argv_array` (so you may use one or the other, but you must use exactly
|
||||
one). The memory in .args will be cleaned up automatically during
|
||||
`finish_command` (or during `start_command` when it is unsuccessful).
|
||||
|
||||
The members .in, .out, .err are used to redirect stdin, stdout,
|
||||
stderr as follows:
|
||||
|
||||
. Specify 0 to request no special redirection. No new file descriptor
|
||||
is allocated. The child process simply inherits the channel from the
|
||||
parent.
|
||||
|
||||
. Specify -1 to have a pipe allocated; start_command() replaces -1
|
||||
by the pipe FD in the following way:
|
||||
|
||||
.in: Returns the writable pipe end into which the caller writes;
|
||||
the readable end of the pipe becomes the child's stdin.
|
||||
|
||||
.out, .err: Returns the readable pipe end from which the caller
|
||||
reads; the writable end of the pipe becomes the child's
|
||||
stdout/stderr.
|
||||
|
||||
The caller of start_command() must close the FDs returned this way
|
||||
after it has completed reading from/writing to them!
|
||||
|
||||
. Specify a file descriptor > 0 to be used by the child:
|
||||
|
||||
.in: The FD must be readable; it becomes the child's stdin.
|
||||
.out: The FD must be writable; it becomes the child's stdout.
|
||||
.err: The FD must be writable; it becomes the child's stderr.
|
||||
|
||||
The specified FD is closed by start_command(), even if it fails to
|
||||
run the sub-process!
|
||||
|
||||
. Special forms of redirection are available by setting these members
|
||||
to 1:
|
||||
|
||||
.no_stdin, .no_stdout, .no_stderr: The respective channel is
|
||||
redirected to /dev/null.
|
||||
|
||||
.stdout_to_stderr: stdout of the child is redirected to its
|
||||
stderr. This happens after stderr is itself redirected.
|
||||
So stdout will follow stderr to wherever it is
|
||||
redirected.
|
||||
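What requesting a pipe for the child's stdout amounts to can be sketched with `pipe()`, `fork()` and `dup2()`. This is only an illustration under the assumption of a POSIX system; `demo_capture_stdout` is a hypothetical helper, not part of the run-command API, and it folds the start, read, close and finish steps into one function.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] with its stdout connected to a pipe and copy up to
 * len - 1 bytes of its output into buf. Returns the number of bytes
 * captured, or -1 on failure.
 */
static ssize_t demo_capture_stdout(char *const argv[], char *buf, size_t len)
{
	int fd[2], status;
	size_t total = 0;
	ssize_t n;
	pid_t pid;

	assert(len > 0);
	if (pipe(fd) < 0)
		return -1;
	pid = fork();
	if (pid < 0) {
		close(fd[0]);
		close(fd[1]);
		return -1;
	}
	if (!pid) {
		/* child: the writable end becomes stdout */
		close(fd[0]);
		dup2(fd[1], STDOUT_FILENO);
		close(fd[1]);
		execvp(argv[0], argv);
		_exit(127);	/* exec failed */
	}
	/* parent: keep only the readable end */
	close(fd[1]);
	while (total < len - 1 &&
	       (n = read(fd[0], buf + total, len - 1 - total)) > 0)
		total += (size_t)n;
	buf[total] = '\0';
	close(fd[0]);	/* the caller's duty once it is done reading */
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return (ssize_t)total;
}
```

Closing the readable end before waiting mirrors the rule that the caller closes the returned FD once it has finished reading.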
|
||||
To modify the environment of the sub-process, specify an array of
|
||||
string pointers (NULL terminated) in .env:
|
||||
|
||||
. If the string is of the form "VAR=value", i.e. it contains '=',
|
||||
the variable is added to the child process's environment.
|
||||
|
||||
. If the string does not contain '=', it names an environment
|
||||
variable that will be removed from the child process's environment.
|
||||
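The two entry forms can be illustrated with `setenv()` and `unsetenv()`. The sketch below applies the list to the current process purely for demonstration; a real implementation would presumably do this in the child before the program is executed. `demo_apply_env` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Apply a NULL-terminated list of "VAR=value" / "VAR" entries to the
 * current process environment. Returns 0 on success, -1 on failure.
 */
static int demo_apply_env(const char *const *env)
{
	assert(env);
	for (; *env; env++) {
		const char *eq = strchr(*env, '=');

		if (!eq) {
			/* no '=': the entry names a variable to remove */
			if (unsetenv(*env) < 0)
				return -1;
		} else {
			/* "VAR=value": split at the first '=' and set it */
			size_t namelen = (size_t)(eq - *env);
			char *name = malloc(namelen + 1);
			int ret;

			if (!name)
				return -1;
			memcpy(name, *env, namelen);
			name[namelen] = '\0';
			ret = setenv(name, eq + 1, 1);
			free(name);
			if (ret < 0)
				return -1;
		}
	}
	return 0;
}
```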
|
||||
If the .env member is NULL, `start_command` will point it at the
|
||||
.env_array `argv_array` (so you may use one or the other, but not both).
|
||||
The memory in .env_array will be cleaned up automatically during
|
||||
`finish_command` (or during `start_command` when it is unsuccessful).
|
||||
|
||||
To specify a new initial working directory for the sub-process,
|
||||
specify it in the .dir member.
|
||||
|
||||
If the program cannot be found, the functions return -1 and set
|
||||
errno to ENOENT. Normally, an error message is printed, but if
|
||||
.silent_exec_failure is set to 1, no message is printed for this
|
||||
special error condition.
|
||||
|
||||
|
||||
* `struct async`
|
||||
|
||||
This describes a function to run asynchronously, whose purpose is
|
||||
to produce output that the caller reads.
|
||||
|
||||
The caller:
|
||||
|
||||
1. allocates and clears (memset(&asy, 0, sizeof(asy));) a
|
||||
struct async variable;
|
||||
2. initializes .proc and .data;
|
||||
3. calls start_async();
|
||||
4. communicates with proc through .in and .out;
|
||||
5. closes .in and .out;
|
||||
6. calls finish_async().
|
||||
|
||||
The members .in, .out are used to provide a set of fd's for
|
||||
communication between the caller and the callee as follows:
|
||||
|
||||
. Specify 0 to have no file descriptor passed. The callee will
|
||||
receive -1 in the corresponding argument.
|
||||
|
||||
. Specify < 0 to have a pipe allocated; start_async() replaces
|
||||
it with the pipe FD in the following way:
|
||||
|
||||
.in: Returns the writable pipe end into which the caller
|
||||
writes; the readable end of the pipe becomes the function's
|
||||
in argument.
|
||||
|
||||
.out: Returns the readable pipe end from which the caller
|
||||
reads; the writable end of the pipe becomes the function's
|
||||
out argument.
|
||||
|
||||
The caller of start_async() must close the returned FDs after it
|
||||
has completed reading from/writing to them.
|
||||
|
||||
. Specify a file descriptor > 0 to be used by the function:
|
||||
|
||||
.in: The FD must be readable; it becomes the function's in.
|
||||
.out: The FD must be writable; it becomes the function's out.
|
||||
|
||||
The specified FD is closed by start_async(), even if it fails to
|
||||
run the function.
|
||||
|
||||
The function pointer in .proc has the following signature:
|
||||
|
||||
int proc(int in, int out, void *data);
|
||||
|
||||
. in, out specify the file descriptors from/to which the function
|
||||
must read/write the data that it needs/produces. The function
|
||||
*must* close these descriptors before it returns. A descriptor
|
||||
may be -1 if the caller did not configure a descriptor for that
|
||||
direction.
|
||||
|
||||
. data is the value that the caller has specified in the .data member
|
||||
of struct async.
|
||||
|
||||
. The return value of the function is 0 on success and non-zero
|
||||
on failure. If the function indicates failure, finish_async() will
|
||||
report failure as well.
|
||||
|
||||
|
||||
There are serious restrictions on what the asynchronous function can do
|
||||
because this facility is implemented by a thread in the same address
|
||||
space on most platforms (when pthreads is available), but by a pipe to
|
||||
a forked process otherwise:
|
||||
|
||||
. It cannot change the program's state (global variables, environment,
|
||||
etc.) in a way that the caller notices; in other words, .in and .out
|
||||
are the only communication channels to the caller.
|
||||
|
||||
. It must not change the program's state that the caller of the
|
||||
facility also uses.
|
||||
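The forked-process variant of this facility can be sketched as follows. It is a simplified illustration that only wires up the output direction; `struct demo_async`, `demo_start_async`, `demo_finish_async` and `demo_say` are hypothetical names and do not reproduce the real `struct async`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*demo_proc_fn)(int in, int out, void *data);

struct demo_async {
	demo_proc_fn proc;
	void *data;
	int out;	/* set by demo_start_async(): readable pipe end */
	pid_t pid;
};

static int demo_start_async(struct demo_async *a)
{
	int fd[2];

	assert(a && a->proc);
	if (pipe(fd) < 0)
		return -1;
	a->pid = fork();
	if (a->pid < 0) {
		close(fd[0]);
		close(fd[1]);
		return -1;
	}
	if (!a->pid) {
		close(fd[0]);
		/* no input was configured, so the function receives in == -1 */
		_exit(!!a->proc(-1, fd[1], a->data));
	}
	close(fd[1]);
	a->out = fd[0];
	return 0;
}

/* Reports failure if the function returned non-zero or died. */
static int demo_finish_async(struct demo_async *a)
{
	int status;

	if (waitpid(a->pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) && !WEXITSTATUS(status) ? 0 : -1;
}

/* Example function: writes its data string to `out` and closes it. */
static int demo_say(int in, int out, void *data)
{
	const char *msg = data;
	size_t len = 0;
	ssize_t n;

	(void)in;
	while (msg[len])
		len++;
	n = write(out, msg, len);
	close(out);	/* the function must close its descriptors */
	return n == (ssize_t)len ? 0 : 1;
}
```

Because the function runs in a separate process here, anything it changes in memory is invisible to the caller, which is why the pipe is the only channel the sketch relies on.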
47
Documentation/technical/api-setup.txt
Normal file
|
|
@ -0,0 +1,47 @@
|
|||
setup API
|
||||
=========
|
||||
|
||||
Talk about
|
||||
|
||||
* setup_git_directory()
|
||||
* setup_git_directory_gently()
|
||||
* is_inside_git_dir()
|
||||
* is_inside_work_tree()
|
||||
* setup_work_tree()
|
||||
|
||||
(Dscho)
|
||||
|
||||
Pathspec
|
||||
--------
|
||||
|
||||
See glossary-content.txt for the syntax of pathspec. In memory, a
|
||||
pathspec set is represented by "struct pathspec" and is prepared by
|
||||
parse_pathspec(). This function takes several arguments:
|
||||
|
||||
- magic_mask specifies which features are NOT supported by the
|
||||
following code. If a user attempts to use such a feature,
|
||||
parse_pathspec() can reject it early.
|
||||
|
||||
- flags specifies other things that the caller wants parse_pathspec to
|
||||
perform.
|
||||
|
||||
- prefix and args come from cmd_* functions.
|
||||
|
||||
parse_pathspec() helps catch unsupported features and reject them
|
||||
politely. At a lower level, different pathspec-related functions may
|
||||
not support the same set of features. Such pathspec-sensitive
|
||||
functions are guarded with GUARD_PATHSPEC(), which will die in an
|
||||
unfriendly way when an unsupported feature is requested.
|
||||
|
||||
The command designers are supposed to make sure that GUARD_PATHSPEC()
|
||||
never dies. They have to make sure all unsupported features are caught
|
||||
by parse_pathspec(), not by GUARD_PATHSPEC. Grepping for GUARD_PATHSPEC()
|
||||
should give the designers all pathspec-sensitive codepaths and what
|
||||
features they support.
|
||||
|
||||
A similar process is applied when a new pathspec magic is added. The
|
||||
designer lifts the GUARD_PATHSPEC restriction in the functions that
|
||||
support the new magic. At the same time (s)he has to make sure this
|
||||
new feature will be caught at parse_pathspec() in commands that cannot
|
||||
handle the new magic in some cases. Grepping for parse_pathspec() should
|
||||
help.
|
||||
41
Documentation/technical/api-sigchain.txt
Normal file
|
|
@ -0,0 +1,41 @@
|
|||
sigchain API
|
||||
============
|
||||
|
||||
Code often wants to set a signal handler to clean up temporary files or
|
||||
other work-in-progress when we die unexpectedly. For multiple pieces of
|
||||
code to do this without conflicting, each piece of code must remember
|
||||
the old value of the handler and restore it either when:
|
||||
|
||||
1. The work-in-progress is finished, and the handler is no longer
|
||||
necessary. The handler should revert to the original behavior
|
||||
(either another handler, SIG_DFL, or SIG_IGN).
|
||||
|
||||
2. The signal is received. We should then do our cleanup, then chain
|
||||
to the next handler (or die if it is SIG_DFL).
|
||||
|
||||
Sigchain is a tiny library for keeping a stack of handlers. Your handler
|
||||
and installation code should look something like:
|
||||
|
||||
------------------------------------------
void clean_foo_on_signal(int sig)
{
	clean_foo();
	sigchain_pop(sig);
	raise(sig);
}

void other_func()
{
	sigchain_push_common(clean_foo_on_signal);
	mess_up_foo();
	clean_foo();
}
------------------------------------------
|
||||
|
||||
Handlers are given the typedef of sigchain_fun. This is the same type
|
||||
that is given to signal() or sigaction(). It is perfectly reasonable to
|
||||
push SIG_DFL or SIG_IGN onto the stack.
|
||||
|
||||
You can sigchain_push and sigchain_pop individual signals. For
|
||||
convenience, sigchain_push_common will push the handler onto the stack
|
||||
for many common signals.
|
||||
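The idea of a per-signal stack of handlers can be sketched on top of `signal()`. This is a toy model under stated assumptions (fixed-size stacks, hypothetical `demo_` names), not the sigchain implementation itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>

typedef void (*demo_sigchain_fun)(int);

#define DEMO_MAX_DEPTH 8
#define DEMO_MAX_SIG 64

/* One small stack of previously installed handlers per signal number. */
static demo_sigchain_fun demo_old[DEMO_MAX_SIG][DEMO_MAX_DEPTH];
static int demo_depth[DEMO_MAX_SIG];

static int demo_sigchain_push(int sig, demo_sigchain_fun f)
{
	demo_sigchain_fun old;

	assert(sig > 0 && sig < DEMO_MAX_SIG);
	if (demo_depth[sig] == DEMO_MAX_DEPTH)
		return -1;
	old = signal(sig, f);
	if (old == SIG_ERR)
		return -1;
	demo_old[sig][demo_depth[sig]++] = old;	/* remember what we replaced */
	return 0;
}

static int demo_sigchain_pop(int sig)
{
	assert(sig > 0 && sig < DEMO_MAX_SIG);
	if (!demo_depth[sig])
		return 0;	/* nothing to restore */
	if (signal(sig, demo_old[sig][--demo_depth[sig]]) == SIG_ERR)
		return -1;
	return 0;
}

static volatile sig_atomic_t demo_cleaned, demo_chained;

/* Stands in for whatever handler was installed before ours. */
static void demo_base_handler(int sig)
{
	(void)sig;
	demo_chained = 1;
}

/* Same shape as clean_foo_on_signal() above: clean up, pop, re-raise. */
static void demo_cleanup_handler(int sig)
{
	demo_cleaned = 1;
	demo_sigchain_pop(sig);
	raise(sig);
}
```

Popping before re-raising is what lets the signal reach the previously installed handler instead of looping back into the cleanup handler.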
66
Documentation/technical/api-submodule-config.txt
Normal file
|
|
@ -0,0 +1,66 @@
|
|||
submodule config cache API
|
||||
==========================
|
||||
|
||||
The submodule config cache API allows reading submodule
|
||||
configurations/information from specified revisions. Internally
|
||||
information is lazily read into a cache that is used to avoid
|
||||
unnecessary parsing of the same .gitmodules files. Lookups can be done by
|
||||
submodule path or name.
|
||||
|
||||
Usage
|
||||
-----
|
||||
|
||||
To initialize the cache with configurations from the worktree the caller
|
||||
typically first calls `gitmodules_config()` to read values from the
|
||||
worktree .gitmodules and then, to overlay the local git config values, uses
|
||||
`parse_submodule_config_option()` from the config parsing
|
||||
infrastructure.
|
||||
|
||||
The caller can look up information about submodules by using the
|
||||
`submodule_from_path()` or `submodule_from_name()` functions. They return
|
||||
a `struct submodule` which contains the values. The API automatically
|
||||
initializes and allocates the needed infrastructure on-demand. If the
|
||||
caller only wants to look up values from revisions, the initialization
|
||||
can be skipped.
|
||||
|
||||
If the internal cache might grow too big or when the caller is done with
|
||||
the API, all internally cached values can be freed with submodule_free().
|
||||
|
||||
Data Structures
|
||||
---------------
|
||||
|
||||
`struct submodule`::
|
||||
|
||||
This structure is used to return the information about one
|
||||
submodule for a certain revision. It is returned by the lookup
|
||||
functions.
|
||||
|
||||
Functions
|
||||
---------
|
||||
|
||||
`void submodule_free(struct repository *r)`::
|
||||
|
||||
Use this to free the internally cached values.
|
||||
|
||||
`int parse_submodule_config_option(const char *var, const char *value)`::
|
||||
|
||||
Can be passed to the config parsing infrastructure to parse
|
||||
local (worktree) submodule configurations.
|
||||
|
||||
`const struct submodule *submodule_from_path(const unsigned char *treeish_name, const char *path)`::
|
||||
|
||||
Given a tree-ish in the superproject and a path, return the
|
||||
submodule that is bound at the path in the named tree.
|
||||
|
||||
`const struct submodule *submodule_from_name(const unsigned char *treeish_name, const char *name)`::
|
||||
|
||||
The same as above, but looks up by name.
|
||||
|
||||
Whenever a submodule configuration is parsed in `parse_submodule_config_option`
|
||||
via e.g. `gitmodules_config()`, it will overwrite the null_sha1 entry.
|
||||
So in the normal case, when HEAD:.gitmodules is parsed first and then overlaid
|
||||
with the repository configuration, the null_sha1 entry contains the local
|
||||
configuration of a submodule (e.g. consolidated values from local git
|
||||
configuration and the .gitmodules file in the worktree).
|
||||
|
||||
For an example usage see test-submodule-config.c.
|
||||
140
Documentation/technical/api-trace.txt
Normal file
|
|
@ -0,0 +1,140 @@
|
|||
trace API
|
||||
=========
|
||||
|
||||
The trace API can be used to print debug messages to stderr or a file. Trace
|
||||
code is inactive unless explicitly enabled by setting `GIT_TRACE*` environment
|
||||
variables.
|
||||
|
||||
The trace implementation automatically adds `timestamp file:line ... \n` to
|
||||
all trace messages. E.g.:
|
||||
|
||||
------------
|
||||
23:59:59.123456 git.c:312 trace: built-in: git 'foo'
|
||||
00:00:00.000001 builtin/foo.c:99 foo: some message
|
||||
------------
|
||||
|
||||
Data Structures
|
||||
---------------
|
||||
|
||||
`struct trace_key`::
|
||||
|
||||
Defines a trace key (or category). The default (for API functions that
|
||||
don't take a key) is `GIT_TRACE`.
|
||||
+
|
||||
E.g. to define a trace key controlled by environment variable `GIT_TRACE_FOO`:
|
||||
+
|
||||
------------
static struct trace_key trace_foo = TRACE_KEY_INIT(FOO);

static void trace_print_foo(const char *message)
{
	trace_printf_key(&trace_foo, "%s", message);
}
------------
|
||||
+
|
||||
Note: don't use `const` as the trace implementation stores internal state in
|
||||
the `trace_key` structure.
|
||||
|
||||
Functions
|
||||
---------
|
||||
|
||||
`int trace_want(struct trace_key *key)`::
|
||||
|
||||
Checks whether the trace key is enabled. Used to prevent expensive
|
||||
string formatting before calling one of the printing APIs.
|
||||
|
||||
`void trace_disable(struct trace_key *key)`::
|
||||
|
||||
Disables tracing for the specified key, even if the environment
|
||||
variable was set.
|
||||
|
||||
`void trace_printf(const char *format, ...)`::
|
||||
`void trace_printf_key(struct trace_key *key, const char *format, ...)`::
|
||||
|
||||
Prints a formatted message, similar to printf.
|
||||
|
||||
`void trace_argv_printf(const char **argv, const char *format, ...)`::
|
||||
|
||||
Prints a formatted message, followed by a quoted list of arguments.
|
||||
|
||||
`void trace_strbuf(struct trace_key *key, const struct strbuf *data)`::
|
||||
|
||||
Prints the strbuf, without additional formatting (i.e. doesn't
|
||||
choke on `%` or even `\0`).
|
||||
|
||||
`uint64_t getnanotime(void)`::
|
||||
|
||||
Returns nanoseconds since the epoch (01/01/1970), typically used
|
||||
for performance measurements.
|
||||
+
|
||||
Currently there are high precision timer implementations for Linux (using
|
||||
`clock_gettime(CLOCK_MONOTONIC)`) and Windows (`QueryPerformanceCounter`).
|
||||
Other platforms use `gettimeofday` as time source.
|
||||
|
||||
`void trace_performance(uint64_t nanos, const char *format, ...)`::
|
||||
`void trace_performance_since(uint64_t start, const char *format, ...)`::
|
||||
|
||||
Prints the elapsed time (in nanoseconds), or elapsed time since
|
||||
`start`, followed by a formatted message. Enabled via environment
|
||||
variable `GIT_TRACE_PERFORMANCE`. Used for manual profiling, e.g.:
|
||||
+
|
||||
------------
|
||||
uint64_t start = getnanotime();
|
||||
/* code section to measure */
|
||||
trace_performance_since(start, "foobar");
|
||||
------------
|
||||
+
|
||||
------------
uint64_t t = 0;
for (;;) {
	/* ignore */
	t -= getnanotime();
	/* code section to measure */
	t += getnanotime();
	/* ignore */
}
trace_performance(t, "frotz");
------------
|
||||
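The accumulation idiom in the second example (`t -= ...; t += ...;`) relies on unsigned arithmetic being modular: the subtraction may wrap around, and the following addition brings the total back into range. A minimal sketch, assuming a POSIX `clock_gettime()`; `demo_nanotime` and `demo_accumulate` are hypothetical stand-ins and not the trace API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic clock reading in nanoseconds, or 0 if the clock is unavailable. */
static uint64_t demo_nanotime(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_MONOTONIC, &ts))
		return 0;
	return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* One "t -= start; t += end;" round of the accumulation idiom. */
static uint64_t demo_accumulate(uint64_t t, uint64_t start, uint64_t end)
{
	assert(end >= start);
	t -= start;	/* may wrap around; unsigned arithmetic is modular */
	t += end;	/* ...and this addition undoes the wrap */
	return t;
}
```

Starting from `t = 0`, a section measured from 100 to 250 leaves `t` at 150, and a further section from 1000 to 1010 brings it to 160.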
|
||||
Bugs & Caveats
|
||||
--------------
|
||||
|
||||
GIT_TRACE_* environment variables can be used to tell Git to show
|
||||
trace output to its standard error stream. Git can often spawn a pager
|
||||
internally to run its subcommand and send its standard output and
|
||||
standard error to it.
|
||||
|
||||
Because GIT_TRACE_PERFORMANCE trace is generated only at the very end
|
||||
of the program with atexit(), which happens after the pager exits, it
|
||||
would not work well if you send its log to the standard error output
|
||||
and let Git spawn the pager at the same time.
|
||||
|
||||
As a workaround, you can for example use '--no-pager', or set
|
||||
GIT_TRACE_PERFORMANCE to another file descriptor which is redirected
|
||||
to stderr, or set GIT_TRACE_PERFORMANCE to a file specified by its
|
||||
absolute path.
|
||||
|
||||
For example instead of the following command which by default may not
|
||||
print any performance information:
|
||||
|
||||
------------
|
||||
GIT_TRACE_PERFORMANCE=2 git log -1
|
||||
------------
|
||||
|
||||
you may want to use:
|
||||
|
||||
------------
|
||||
GIT_TRACE_PERFORMANCE=2 git --no-pager log -1
|
||||
------------
|
||||
|
||||
or:
|
||||
|
||||
------------
|
||||
GIT_TRACE_PERFORMANCE=3 3>&2 git log -1
|
||||
------------
|
||||
|
||||
or:
|
||||
|
||||
------------
|
||||
GIT_TRACE_PERFORMANCE=/path/to/log/file git log -1
|
||||
------------
|
||||
1378
Documentation/technical/api-trace2.txt
Normal file
File diff suppressed because it is too large
147
Documentation/technical/api-tree-walking.txt
Normal file
|
|
@ -0,0 +1,147 @@
|
|||
tree walking API
|
||||
================
|
||||
|
||||
The tree walking API is used to traverse and inspect trees.
|
||||
|
||||
Data Structures
|
||||
---------------
|
||||
|
||||
`struct name_entry`::
|
||||
|
||||
An entry in a tree. Each entry has a sha1 identifier, pathname, and
|
||||
mode.
|
||||
|
||||
`struct tree_desc`::
|
||||
|
||||
A semi-opaque data structure used to maintain the current state of the
|
||||
walk.
|
||||
+
|
||||
* `buffer` is a pointer into the memory representation of the tree. It always
|
||||
points at the current entry being visited.
|
||||
|
||||
* `size` counts the number of bytes left in the `buffer`.
|
||||
|
||||
* `entry` points to the current entry being visited.
|
||||
|
||||
`struct traverse_info`::
|
||||
|
||||
A structure used to maintain the state of a traversal.
|
||||
+
|
||||
* `prev` points to the traverse_info which was used to descend into the
|
||||
current tree. If this is the top-level tree `prev` will point to
|
||||
a dummy traverse_info.
|
||||
|
||||
* `name` is the entry for the current tree (if the tree is a subtree).
|
||||
|
||||
* `pathlen` is the length of the full path for the current tree.
|
||||
|
||||
* `conflicts` can be used by callbacks to maintain directory-file conflicts.
|
||||
|
||||
* `fn` is a callback called for each entry in the tree. See Traversing for more
|
||||
information.
|
||||
|
||||
* `data` can be anything the `fn` callback would want to use.
|
||||
|
||||
* `show_all_errors` tells whether to stop at the first error or not.
|
||||
|
||||
Initializing
|
||||
------------
|
||||
|
||||
`init_tree_desc`::
|
||||
|
||||
Initialize a `tree_desc` and decode its first entry. The buffer and
|
||||
size parameters are assumed to be the same as the buffer and size
|
||||
members of `struct tree`.
|
||||
|
||||
`fill_tree_descriptor`::
|
||||
|
||||
Initialize a `tree_desc` and decode its first entry given the
|
||||
object ID of a tree. Returns the `buffer` member if the latter
|
||||
is a valid tree identifier and NULL otherwise.
|
||||
|
||||
`setup_traverse_info`::
|
||||
|
||||
Initialize a `traverse_info` given the pathname of the tree to start
|
||||
traversing from. The `base` argument is assumed to be the `path`
|
||||
member of the `name_entry` being recursed into unless the tree is a
|
||||
top-level tree in which case the empty string ("") is used.
|
||||
|
||||
Walking
|
||||
-------
|
||||
|
||||
`tree_entry`::
|
||||
|
||||
Visit the next entry in a tree. Returns 1 when there are more entries
|
||||
left to visit and 0 when all entries have been visited. This is
|
||||
commonly used in the test of a while loop.
|
||||
|
||||
`tree_entry_len`::
|
||||
|
||||
Calculate the length of a tree entry's pathname. This utilizes the
|
||||
memory structure of a tree entry to avoid the overhead of using a
|
||||
generic strlen().
|
||||
|
||||
`update_tree_entry`::
|
||||
|
||||
Walk to the next entry in a tree. This is commonly used in conjunction
|
||||
with `tree_entry_extract` to inspect the current entry.
|
||||
|
||||
`tree_entry_extract`::
|
||||
|
||||
Decode the entry currently being visited (the one pointed to by
|
||||
`tree_desc`'s `entry` member) and return the sha1 of the entry. The
|
||||
`pathp` and `modep` arguments are set to the entry's pathname and mode
|
||||
respectively.
|
||||
|
||||
`get_tree_entry`::
|
||||
|
||||
Find an entry in a tree given a pathname and the sha1 of a tree to
|
||||
search. Returns 0 if the entry is found and -1 otherwise. The third
|
||||
and fourth parameters are set to the entry's sha1 and mode
|
||||
respectively.
|
||||
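The "memory structure of a tree entry" that `tree_entry_len` and `tree_entry_extract` rely on is the raw tree object layout: an octal mode in ASCII, a space, the pathname, a NUL byte, and then the 20-byte raw SHA-1. A minimal decoding sketch follows; `struct demo_entry` and `demo_decode_entry` are hypothetical and much simpler than the real `tree_desc` machinery.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_RAWSZ 20	/* length of a raw SHA-1 */

struct demo_entry {
	unsigned int mode;
	const char *path;		/* points into the buffer */
	size_t pathlen;
	const unsigned char *sha1;	/* points into the buffer */
};

/*
 * Decode the entry at the start of buf. Returns the number of bytes
 * the entry occupies, or 0 if the buffer is malformed or truncated.
 */
static size_t demo_decode_entry(const unsigned char *buf, size_t size,
				struct demo_entry *out)
{
	size_t i = 0;
	const unsigned char *nul;

	assert(buf && out);
	out->mode = 0;
	while (i < size && buf[i] >= '0' && buf[i] <= '7')
		out->mode = (out->mode << 3) + (unsigned int)(buf[i++] - '0');
	if (!i || i == size || buf[i] != ' ')
		return 0;
	i++;	/* skip the space after the mode */
	nul = memchr(buf + i, '\0', size - i);
	if (!nul || size - (size_t)(nul - buf) - 1 < DEMO_RAWSZ)
		return 0;
	out->path = (const char *)buf + i;
	out->pathlen = (size_t)(nul - (buf + i));
	out->sha1 = nul + 1;
	return (size_t)(nul - buf) + 1 + DEMO_RAWSZ;
}
```

Since the object name sits right after the NUL, the pathname length falls out of the pointer arithmetic and no separate strlen() pass is needed. Advancing the buffer by the returned byte count moves to the next entry.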
|
||||
Traversing
|
||||
----------
|
||||
|
||||
`traverse_trees`::
|
||||
|
||||
Traverse `n` number of trees in parallel. The `fn` callback member of
|
||||
`traverse_info` is called once for each tree entry.
|
||||
|
||||
`traverse_callback_t`::
|
||||
The arguments passed to the traverse callback are as follows:
|
||||
+
|
||||
* `n` counts the number of trees being traversed.
|
||||
|
||||
* `mask` has its nth bit set if something exists in the nth entry.
|
||||
|
||||
* `dirmask` has its nth bit set if the nth tree's entry is a directory.
|
||||
|
||||
* `entry` is an array of size `n` where the nth entry is from the nth tree.
|
||||
|
||||
* `info` maintains the state of the traversal.
|
||||
|
||||
+
|
||||
Returning a negative value will terminate the traversal. Otherwise the
|
||||
return value is treated as an update mask. If the nth bit is set the nth tree
|
||||
will be updated and if the bit is not set the nth tree entry will be the
|
||||
same in the next callback invocation.
|
||||
|
||||
`make_traverse_path`::
|
||||
|
||||
Generate the full pathname of a tree entry relative to the root of the
|
||||
traversal. For example, if the traversal has recursed into another
|
||||
tree named "bar" the pathname of an entry "baz" in the "bar"
|
||||
tree would be "bar/baz".
|
||||
|
||||
`traverse_path_len`::
|
||||
|
||||
Calculate the length of a pathname returned by `make_traverse_path`.
|
||||
This utilizes the memory structure of a tree entry to avoid the
|
||||
overhead of using a generic strlen().
|
||||
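How the `mask` and `dirmask` arguments of the traverse callback relate to the `entry` array can be sketched as below. The sketch assumes that bit 0 corresponds to the first tree and that a missing entry is marked by a NULL path; `struct demo_name_entry` and `demo_masks` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct demo_name_entry {
	const char *path;	/* NULL when the tree has no entry here */
	unsigned int mode;
};

/* Directory entries in a tree carry mode 040000. */
static int demo_is_dir(unsigned int mode)
{
	return (mode & 0170000) == 0040000;
}

/* Compute both bit masks for n parallel entries. */
static void demo_masks(const struct demo_name_entry *entry, int n,
		       unsigned long *mask, unsigned long *dirmask)
{
	int i;

	assert(n >= 0 && (size_t)n <= sizeof(unsigned long) * 8);
	*mask = *dirmask = 0;
	for (i = 0; i < n; i++) {
		if (!entry[i].path)
			continue;	/* nothing at this path in tree i */
		*mask |= 1ul << i;
		if (demo_is_dir(entry[i].mode))
			*dirmask |= 1ul << i;
	}
}
```

With three trees where the first has a file, the second has nothing and the third has a directory at the same path, `mask` is 5 and `dirmask` is 4.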
|
||||
Authors
|
||||
-------
|
||||
|
||||
Written by Junio C Hamano <gitster@pobox.com> and Linus Torvalds
|
||||
<torvalds@linux-foundation.org>
|
||||
7
Documentation/technical/api-xdiff-interface.txt
Normal file
|
|
@ -0,0 +1,7 @@
|
|||
xdiff interface API
|
||||
===================
|
||||
|
||||
Talk about our calling convention to xdiff library, including
|
||||
xdiff_emit_consume_fn.
|
||||
|
||||
(Dscho, JC)
|
||||
164
Documentation/technical/bitmap-format.txt
Normal file
|
|
@ -0,0 +1,164 @@
|
|||
GIT bitmap v1 format
|
||||
====================
|
||||
|
||||
- A header appears at the beginning:
|
||||
|
||||
4-byte signature: {'B', 'I', 'T', 'M'}
|
||||
|
||||
2-byte version number (network byte order)
|
||||
The current implementation only supports version 1
|
||||
of the bitmap index (the same one as JGit).
|
||||
|
||||
2-byte flags (network byte order)
|
||||
|
||||
The following flags are supported:
|
||||
|
||||
- BITMAP_OPT_FULL_DAG (0x1) REQUIRED
|
||||
This flag must always be present. It implies that the bitmap
|
||||
index has been generated for a packfile with full closure
|
||||
(i.e. where every single object in the packfile can find
|
||||
its parent links inside the same packfile). This is a
|
||||
requirement for the bitmap index format, also present in JGit,
|
||||
that greatly reduces the complexity of the implementation.
|
||||
|
||||
- BITMAP_OPT_HASH_CACHE (0x4)
|
||||
If present, the end of the bitmap file contains
|
||||
`N` 32-bit name-hash values, one per object in the
|
||||
pack. The format and meaning of the name-hash is
|
||||
described below.
|
||||
|
||||
4-byte entry count (network byte order)
|
||||
|
||||
The total count of entries (bitmapped commits) in this bitmap index.
|
||||
|
||||
20-byte checksum
|
||||
|
||||
The SHA1 checksum of the pack this bitmap index belongs to.
|
||||
|
||||
- 4 EWAH bitmaps that act as type indexes
|
||||
|
||||
Type indexes are serialized after the hash cache in the shape
|
||||
of four EWAH bitmaps stored consecutively (see Appendix A for
|
||||
the serialization format of an EWAH bitmap).
|
||||
|
||||
There is a bitmap for each Git object type, stored in the following
|
||||
order:
|
||||
|
||||
- Commits
|
||||
- Trees
|
||||
- Blobs
|
||||
- Tags
|
||||
|
||||
In each bitmap, the `n`th bit is set to true if the `n`th object
|
||||
in the packfile is of that type.
|
||||
|
||||
The obvious consequence is that the OR of all 4 bitmaps will result
|
||||
in a full set (all bits set), and the AND of all 4 bitmaps will
|
||||
result in an empty bitmap (no bits set).
|
||||
|
||||
- N entries with compressed bitmaps, one for each indexed commit
|
||||
|
||||
Where `N` is the total number of entries in this bitmap index.
|
||||
Each entry contains the following:
|
||||
|
||||
- 4-byte object position (network byte order)
|
||||
The position **in the index for the packfile** where the
|
||||
bitmap for this commit is found.
|
||||
|
||||
- 1-byte XOR-offset
|
||||
The xor offset used to compress this bitmap. For an entry
|
||||
in position `x`, a XOR offset of `y` means that the actual
|
||||
bitmap representing this commit is composed by XORing the
|
||||
bitmap for this entry with the bitmap in entry `x-y` (i.e.
|
||||
the bitmap `y` entries before this one).
|
||||
|
||||
Note that this compression can be recursive. In order to
|
||||
XOR this entry with a previous one, the previous entry needs
|
||||
to be decompressed first, and so on.
|
||||
|
||||
The hard-limit for this offset is 160 (an entry can only be
|
||||
xor'ed against one of the 160 entries preceding it). This
|
||||
number is always positive, and hence entries are always xor'ed
|
||||
with **previous** bitmaps, not bitmaps that will come afterwards
|
||||
in the index.
|
||||
|
||||
- 1-byte flags for this bitmap
|
||||
At the moment the only available flag is `0x1`, which hints
|
||||
that this bitmap can be re-used when rebuilding bitmap indexes
|
||||
for the repository.
|
||||
|
||||
- The compressed bitmap itself, see Appendix A.
|
||||
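The fixed-size header described at the top of this list (4-byte signature, 2-byte version, 2-byte flags, 4-byte entry count, 20-byte checksum, 32 bytes in total) can be parsed as sketched below. The `demo_` names are hypothetical; only the field layout and flag values are taken from the text above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BITMAP_HEADER_SIZE 32	/* 4 + 2 + 2 + 4 + 20 */
#define DEMO_OPT_FULL_DAG 0x1
#define DEMO_OPT_HASH_CACHE 0x4

struct demo_bitmap_header {
	uint16_t version;
	uint16_t flags;
	uint32_t entry_count;
	unsigned char checksum[20];
};

/* Read a 16-bit value stored in network byte order. */
static uint16_t demo_get_be16(const unsigned char *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Read a 32-bit value stored in network byte order. */
static uint32_t demo_get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the buffer is not an acceptable v1 header. */
static int demo_parse_bitmap_header(const unsigned char *buf, size_t len,
				    struct demo_bitmap_header *h)
{
	assert(buf && h);
	if (len < DEMO_BITMAP_HEADER_SIZE || memcmp(buf, "BITM", 4))
		return -1;
	h->version = demo_get_be16(buf + 4);
	h->flags = demo_get_be16(buf + 6);
	h->entry_count = demo_get_be32(buf + 8);
	memcpy(h->checksum, buf + 12, 20);
	if (h->version != 1 || !(h->flags & DEMO_OPT_FULL_DAG))
		return -1;	/* only v1 with the required flag is accepted */
	return 0;
}
```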
|
||||
== Appendix A: Serialization format for an EWAH bitmap
|
||||
|
||||
EWAH bitmaps are serialized in the same protocol as the JAVAEWAH
|
||||
library, making them backwards compatible with the JGit
|
||||
implementation:
|
||||
|
||||
- 4-byte number of bits of the resulting UNCOMPRESSED bitmap
|
||||
|
||||
- 4-byte number of words of the COMPRESSED bitmap, when stored
|
||||
|
||||
- N x 8-byte words, as specified by the previous field
|
||||
|
||||
This is the actual content of the compressed bitmap.
|
||||
|
||||
- 4-byte position of the current RLW for the compressed
|
||||
bitmap
|
||||
|
||||
All words are stored in network byte order for their corresponding
|
||||
sizes.
|
||||
|
||||
The compressed bitmap is stored in a form of run-length encoding, as
|
||||
follows. It consists of a concatenation of an arbitrary number of
|
||||
chunks. Each chunk consists of one or more 64-bit words
|
||||
|
||||
H L_1 L_2 L_3 .... L_M
|
||||
|
||||
H is called RLW (run length word). It consists of (from lower to higher
|
||||
order bits):
|
||||
|
||||
- 1 bit: the repeated bit B
|
||||
|
||||
- 32 bits: repetition count K (unsigned)
|
||||
|
||||
- 31 bits: literal word count M (unsigned)
|
||||
|
||||
The bitstream represented by the above chunk is then:
|
||||
|
||||
- K repetitions of B
|
||||
|
||||
- The bits stored in `L_1` through `L_M`. Within a word, bits at
|
||||
lower order come earlier in the stream than those at higher
|
||||
order.
|
||||
|
||||
The next word after `L_M` (if any) must again be a RLW, for the next
|
||||
chunk. For efficient appending to the bitstream, the EWAH stores a
|
||||
pointer to the last RLW in the stream.
|
||||
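The bit layout of the RLW (1 bit B, 32 bits K, 31 bits M, from lower to higher order) translates directly into shifts and masks. The following sketch only packs and unpacks the three fields; it does not expand the bitstream, and the `demo_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct demo_rlw {
	unsigned int bit;	/* B: the repeated bit */
	uint32_t run_len;	/* K: repetition count */
	uint32_t literals;	/* M: number of literal words that follow */
};

/* Split a 64-bit run length word into its three fields. */
static struct demo_rlw demo_decode_rlw(uint64_t word)
{
	struct demo_rlw r;

	r.bit = (unsigned int)(word & 1);
	r.run_len = (uint32_t)((word >> 1) & 0xffffffffu);
	r.literals = (uint32_t)(word >> 33);	/* top 31 bits */
	return r;
}

/* Pack the three fields back into a 64-bit word. */
static uint64_t demo_encode_rlw(struct demo_rlw r)
{
	assert(r.bit <= 1 && r.literals < (1u << 31));
	return (uint64_t)r.bit | ((uint64_t)r.run_len << 1) |
	       ((uint64_t)r.literals << 33);
}
```

A reader walking the compressed words would decode one RLW, skip `literals` words, and expect the next word (if any) to be an RLW again.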
|
||||
|
||||
== Appendix B: Optional Bitmap Sections
|
||||
|
||||
These sections may or may not be present in the `.bitmap` file; their
|
||||
presence is indicated by the header flags section described above.
|
||||
|
||||
Name-hash cache
|
||||
---------------
|
||||
|
||||
If the BITMAP_OPT_HASH_CACHE flag is set, the end of the bitmap contains
|
||||
a cache of 32-bit values, one per object in the pack. The value at
|
||||
position `i` is the hash of the pathname at which the `i`th object
|
||||
(counting in index order) in the pack can be found. This can be fed
|
||||
into the delta heuristics to compare objects with similar pathnames.
|
||||
|
||||
The hash algorithm used is:
|
||||
|
||||
    hash = 0;
    while ((c = *name++))
	    if (!isspace(c))
		    hash = (hash >> 2) + (c << 24);
|
||||
|
||||
Note that this hashing scheme is tied to the BITMAP_OPT_HASH_CACHE flag.
|
||||
If implementations want to choose a different hashing scheme, they are
|
||||
free to do so, but MUST allocate a new header flag (because comparing
|
||||
hashes made under two different schemes would be pointless).
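For experimentation, the hash loop above can be wrapped in a self-contained function. This is a sketch under assumptions: the name `name_hash_sketch` is hypothetical, the accumulator is assumed to be an unsigned 32-bit value (matching the 32-bit cache entries), and characters are assumed to be treated as unsigned bytes.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hash a pathname as described above, skipping whitespace characters. */
static uint32_t name_hash_sketch(const char *name)
{
	uint32_t hash = 0;
	unsigned char c;

	assert(name != NULL);
	while ((c = (unsigned char)*name++) != 0) {
		if (isspace(c))
			continue;
		/* older characters decay to the right; the newest lands in the top byte */
		hash = (hash >> 2) + ((uint32_t)c << 24);
	}
	return hash;
}
```

Because each step shifts the previous value right by two bits, the last characters of the pathname dominate the result, so paths with the same trailing characters get numerically close hashes.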
|
||||
104
Documentation/technical/commit-graph-format.txt
Normal file
|
|
@ -0,0 +1,104 @@
|
|||
Git commit graph format
|
||||
=======================
|
||||
|
||||
The Git commit graph stores a list of commit OIDs and some associated
|
||||
metadata, including:
|
||||
|
||||
- The generation number of the commit. Commits with no parents have
|
||||
generation number 1; commits with parents have generation number
|
||||
one more than the maximum generation number of their parents. We
|
||||
reserve zero as special; it can be used to mark a generation
|
||||
number invalid or as "not computed".
|
||||
|
||||
- The root tree OID.
|
||||
|
||||
- The commit date.
|
||||
|
||||
- The parents of the commit, stored using positional references within
|
||||
the graph file.
|
||||
|
||||
These positional references are stored as unsigned 32-bit integers
|
||||
corresponding to the array position within the list of commit OIDs. Due
|
||||
to some special constants we use to track parents, we can store at most
|
||||
(1 << 30) + (1 << 29) + (1 << 28) - 1 (around 1.8 billion) commits.
|
||||
|
||||
== Commit graph files have the following format:
|
||||
|
||||
In order to allow extensions that add extra data to the graph, we organize
|
||||
the body into "chunks" and provide a binary lookup table at the beginning
|
||||
of the body. The header includes certain values, such as number of chunks
|
||||
and hash type.
|
||||
|
||||
All 4-byte numbers are in network order.
|
||||
|
||||
HEADER:
|
||||
|
||||
4-byte signature:
|
||||
The signature is: {'C', 'G', 'P', 'H'}
|
||||
|
||||
1-byte version number:
|
||||
Currently, the only valid version is 1.
|
||||
|
||||
1-byte Hash Version (1 = SHA-1)
|
||||
We infer the hash length (H) from this value.
|
||||
|
||||
1-byte number (C) of "chunks"
|
||||
|
||||
1-byte number (B) of base commit-graphs
|
||||
We infer the length (H*B) of the Base Graphs chunk
|
||||
from this value.
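A minimal sketch of reading the 8-byte header described above. The struct and function names are hypothetical, and the 20-byte hash length for hash version 1 is the standard SHA-1 digest size rather than something stated in the header itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory form of the header fields. */
struct graph_header {
	uint8_t version;
	uint8_t hash_version;
	uint8_t num_chunks;		/* C */
	uint8_t num_base_graphs;	/* B */
	unsigned hash_len;		/* H, inferred from hash_version */
};

/* Returns 0 on success, -1 if the header is not recognized. */
static int parse_graph_header(const unsigned char *buf, size_t len,
			      struct graph_header *out)
{
	assert(buf != NULL && out != NULL);
	if (len < 8 || memcmp(buf, "CGPH", 4) != 0)
		return -1;
	if (buf[4] != 1)	/* the only valid version is 1 */
		return -1;
	if (buf[5] != 1)	/* only hash version 1 (SHA-1) is described */
		return -1;
	out->version = buf[4];
	out->hash_version = buf[5];
	out->num_chunks = buf[6];
	out->num_base_graphs = buf[7];
	out->hash_len = 20;	/* SHA-1 digests are 20 bytes */
	return 0;
}
```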
|
||||
|
||||
CHUNK LOOKUP:
|
||||
|
||||
(C + 1) * 12 bytes listing the table of contents for the chunks:
|
||||
First 4 bytes describe the chunk id. Value 0 is a terminating label.
|
||||
Other 8 bytes provide the byte-offset in current file for chunk to
|
||||
start. (Chunks are ordered contiguously in the file, so you can infer
|
||||
the length using the next chunk position if necessary.) Each chunk
|
||||
ID appears at most once.
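The lookup table can be scanned as sketched below. The helper names are hypothetical. The sketch assumes that table entries appear in the same order as the chunks in the file, and that the offset stored in the terminating entry marks the end of the last chunk, so that a chunk's length is the difference between two consecutive offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read big-endian (network order) integers from a byte buffer. */
static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t get_be64(const unsigned char *p)
{
	return ((uint64_t)get_be32(p) << 32) | get_be32(p + 4);
}

/*
 * toc points at (num_chunks + 1) entries of 12 bytes each.
 * On success, store the chunk's start offset and inferred length
 * and return 0; return -1 if the ID is not listed.
 */
static int find_chunk(const unsigned char *toc, unsigned num_chunks,
		      uint32_t id, uint64_t *offset, uint64_t *length)
{
	unsigned i;

	assert(toc != NULL && offset != NULL && length != NULL);
	for (i = 0; i < num_chunks; i++) {
		const unsigned char *entry = toc + 12 * (size_t)i;
		uint32_t entry_id = get_be32(entry);

		if (entry_id == 0)
			break;	/* terminating label reached early */
		if (entry_id == id) {
			*offset = get_be64(entry + 4);
			/* the next entry's offset is where this chunk ends */
			*length = get_be64(entry + 12 + 4) - *offset;
			return 0;
		}
	}
	return -1;
}
```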
|
||||
|
||||
The remaining data in the body is described one chunk at a time, and
|
||||
these chunks may be given in any order. Chunks are required unless
|
||||
otherwise specified.
|
||||
|
||||
CHUNK DATA:
|
||||
|
||||
OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)
|
||||
The ith entry, F[i], stores the number of OIDs with first
|
||||
byte at most i. Thus F[255] stores the total
|
||||
number of commits (N).
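To illustrate how the fanout narrows a search, the sketch below builds a fanout table from the first bytes of a sorted OID list and returns the half-open range of positions that can hold OIDs with a given first byte. Both function names are hypothetical, and the table is assumed to have been converted to host byte order already.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build F[] so that F[i] is the number of OIDs whose first byte is <= i. */
static void build_fanout(const unsigned char *first_bytes, uint32_t n,
			 uint32_t fanout[256])
{
	uint32_t i;

	assert(first_bytes != NULL || n == 0);
	memset(fanout, 0, 256 * sizeof(fanout[0]));
	for (i = 0; i < n; i++)
		fanout[first_bytes[i]]++;
	for (i = 1; i < 256; i++)
		fanout[i] += fanout[i - 1];	/* prefix sums */
}

/* OIDs starting with first_byte live at positions [*lo, *hi). */
static void fanout_range(const uint32_t fanout[256], unsigned char first_byte,
			 uint32_t *lo, uint32_t *hi)
{
	*lo = first_byte ? fanout[first_byte - 1] : 0;
	*hi = fanout[first_byte];
}
```

A binary search over the OID Lookup chunk then only needs to consider the positions in that range.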
|
||||
|
||||
OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes)
|
||||
The OIDs for all commits in the graph, sorted in ascending order.
|
||||
|
||||
Commit Data (ID: {'C', 'D', 'A', 'T' }) (N * (H + 16) bytes)
|
||||
* The first H bytes are for the OID of the root tree.
|
||||
* The next 8 bytes are for the positions of the first two parents
|
||||
of the ith commit. Stores value 0x70000000 if no parent in that
|
||||
position. If there are more than two parents, the second value
|
||||
has its most-significant bit on and the other bits store an array
|
||||
position into the Extra Edge List chunk.
|
||||
* The next 8 bytes store the generation number of the commit and
|
||||
the commit time in seconds since EPOCH. The generation number
|
||||
uses the higher 30 bits of the first 4 bytes, while the commit
|
||||
time uses the 32 bits of the second 4 bytes, along with the lowest
|
||||
2 bits of the lowest byte, storing the 33rd and 34th bit of the
|
||||
commit time.
|
||||
|
||||
Extra Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional]
|
||||
This list of 4-byte values stores the second through nth parents for
|
||||
all octopus merges. The second parent value in the commit data stores
|
||||
an array position within this list along with the most-significant bit
|
||||
on. Starting at that array position, iterate through this list of commit
|
||||
positions for the parents until reaching a value with the most-significant
|
||||
bit on. The other bits correspond to the position of the last parent.
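The iteration described above might look like the following sketch. The function name is hypothetical, and the edge list is assumed to have been converted to host byte order. The caller is expected to have already cleared the most-significant bit of the second parent value to obtain `start`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Collect parent positions from the Extra Edge List beginning at `start`.
 * Stops after the entry whose most-significant bit is set (the last parent).
 * Returns the number of positions written to parents[].
 */
static size_t read_extra_parents(const uint32_t *edges, size_t num_edges,
				 uint32_t start, uint32_t *parents,
				 size_t max_parents)
{
	size_t n = 0;
	size_t i;

	assert(edges != NULL && parents != NULL);
	for (i = start; i < num_edges && n < max_parents; i++) {
		parents[n++] = edges[i] & 0x7FFFFFFFu;	/* position bits */
		if (edges[i] & 0x80000000u)
			break;				/* last parent reached */
	}
	return n;
}
```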
|
||||
|
||||
Base Graphs List (ID: {'B', 'A', 'S', 'E'}) [Optional]
|
||||
This list of H-byte hashes describes a set of B commit-graph files that
|
||||
form a commit-graph chain. The graph position for the ith commit in this
|
||||
file's OID Lookup chunk is equal to i plus the number of commits in all
|
||||
base graphs. If B is non-zero, this chunk must exist.
|
||||
|
||||
TRAILER:
|
||||
|
||||
H-byte HASH-checksum of all of the above.
|
||||
350
Documentation/technical/commit-graph.txt
Normal file
|
|
@ -0,0 +1,350 @@
|
|||
Git Commit Graph Design Notes
|
||||
=============================
|
||||
|
||||
Git walks the commit graph for many reasons, including:
|
||||
|
||||
1. Listing and filtering commit history.
|
||||
2. Computing merge bases.
|
||||
|
||||
These operations can become slow as the commit count grows. The merge
|
||||
base calculation shows up in many user-facing commands, such as 'merge-base'
|
||||
or 'status' and can take minutes to compute depending on history shape.
|
||||
|
||||
There are two main costs here:
|
||||
|
||||
1. Decompressing and parsing commits.
|
||||
2. Walking the entire graph to satisfy topological order constraints.
|
||||
|
||||
The commit-graph file is a supplemental data structure that accelerates
|
||||
commit graph walks. If a user downgrades or disables the 'core.commitGraph'
|
||||
config setting, then the existing ODB is sufficient. The file is stored
|
||||
as "commit-graph" either in the .git/objects/info directory or in the info
|
||||
directory of an alternate.
|
||||
|
||||
The commit-graph file stores the commit graph structure along with some
|
||||
extra metadata to speed up graph walks. By listing commit OIDs in
|
||||
lexicographic order, we can identify an integer position for each commit and
|
||||
refer to the parents of a commit using those integer positions. We use
|
||||
binary search to find initial commits and then use the integer positions
|
||||
for fast lookups during the walk.
|
||||
|
||||
A consumer may load the following info for a commit from the graph:
|
||||
|
||||
1. The commit OID.
|
||||
2. The list of parents, along with their integer position.
|
||||
3. The commit date.
|
||||
4. The root tree OID.
|
||||
5. The generation number (see definition below).
|
||||
|
||||
Values 1-4 satisfy the requirements of parse_commit_gently().
|
||||
|
||||
Define the "generation number" of a commit recursively as follows:
|
||||
|
||||
* A commit with no parents (a root commit) has generation number one.
|
||||
|
||||
* A commit with at least one parent has generation number one more than
|
||||
the largest generation number among its parents.
|
||||
|
||||
Equivalently, the generation number of a commit A is one more than the
|
||||
length of a longest path from A to a root commit. The recursive definition
|
||||
is easier to use for computation and for observing the following property:
|
||||
|
||||
If A and B are commits with generation numbers N and M, respectively,
|
||||
and N <= M, then A cannot reach B. That is, we know without searching
|
||||
that B is not an ancestor of A because it is further from a root commit
|
||||
than A.
|
||||
|
||||
Conversely, when checking if A is an ancestor of B, then we only need
|
||||
to walk commits until all commits on the walk boundary have generation
|
||||
number at most N. If we walk commits using a priority queue seeded by
|
||||
generation numbers, then we always expand the boundary commit with highest
|
||||
generation number and can easily detect the stopping condition.
|
||||
|
||||
This property can be used to significantly reduce the time it takes to
|
||||
walk commits and determine topological relationships. Without generation
|
||||
numbers, the general heuristic is the following:
|
||||
|
||||
If A and B are commits with commit time X and Y, respectively, and
|
||||
X < Y, then A _probably_ cannot reach B.
|
||||
|
||||
This heuristic is currently used whenever the computation is allowed to
|
||||
violate topological relationships due to clock skew (such as "git log"
|
||||
with default order), but is not used when the topological order is
|
||||
required (such as merge base calculations, "git log --graph").
|
||||
|
||||
In practice, we expect some commits to be created recently and not stored
|
||||
in the commit graph. We can treat these commits as having "infinite"
|
||||
generation number and walk until reaching commits with known generation
|
||||
number.
|
||||
|
||||
We use the macro GENERATION_NUMBER_INFINITY = 0xFFFFFFFF to mark commits not
|
||||
in the commit-graph file. If a commit-graph file was written by a version
|
||||
of Git that did not compute generation numbers, then those commits will
|
||||
have generation number represented by the macro GENERATION_NUMBER_ZERO = 0.
|
||||
|
||||
Since the commit-graph file is closed under reachability, we can guarantee
|
||||
the following weaker condition on all commits:
|
||||
|
||||
If A and B are commits with generation numbers N and M, respectively,
|
||||
and N < M, then A cannot reach B.
|
||||
|
||||
Note how the strict inequality differs from the inequality when we have
|
||||
fully-computed generation numbers. Using strict inequality may result in
|
||||
walking a few extra commits, but the simplicity in dealing with commits
|
||||
with generation number *_INFINITY or *_ZERO is valuable.
|
||||
|
||||
We use the macro GENERATION_NUMBER_MAX = 0x3FFFFFFF for commits whose
|
||||
generation numbers are computed to be at least this value. We limit at
|
||||
this value since it is the largest value that can be stored in the
|
||||
commit-graph file using the 30 bits available to generation numbers. This
|
||||
presents another case where a commit can have generation number equal to
|
||||
that of a parent.
|
||||
|
||||
Design Details
|
||||
--------------
|
||||
|
||||
- The commit-graph file is stored in a file named 'commit-graph' in the
|
||||
.git/objects/info directory. This could be stored in the info directory
|
||||
of an alternate.
|
||||
|
||||
- The core.commitGraph config setting must be on to consume graph files.
|
||||
|
||||
- The file format includes parameters for the object ID hash function,
|
||||
so a future change of hash algorithm does not require a change in format.
|
||||
|
||||
- Commit grafts and replace objects can change the shape of the commit
|
||||
history. The latter can also be enabled/disabled on the fly using
|
||||
`--no-replace-objects`. This leads to difficulty storing both possible
|
||||
interpretations of a commit id, especially when computing generation
|
||||
numbers. The commit-graph will not be read or written when
|
||||
replace-objects or grafts are present.
|
||||
|
||||
- Shallow clones create grafts of commits by dropping their parents. This
|
||||
leads the commit-graph to think those commits have generation number 1.
|
||||
If and when those commits are made unshallow, those generation numbers
|
||||
become invalid. Since shallow clones are intended to restrict the commit
|
||||
history to a very small set of commits, the commit-graph feature is less
|
||||
helpful for these clones, anyway. The commit-graph will not be read or
|
||||
written when shallow commits are present.
|
||||
|
||||
Commit Graphs Chains
|
||||
--------------------
|
||||
|
||||
Typically, repos grow with near-constant velocity (commits per day). Over time,
|
||||
the number of commits added by a fetch operation is much smaller than the
|
||||
number of commits in the full history. By creating a "chain" of commit-graphs,
|
||||
we enable fast writes of new commit data without rewriting the entire commit
|
||||
history -- at least, most of the time.
|
||||
|
||||
## File Layout
|
||||
|
||||
A commit-graph chain uses multiple files, and we use a fixed naming convention
|
||||
to organize these files. Each commit-graph file has a name
|
||||
`$OBJDIR/info/commit-graphs/graph-{hash}.graph` where `{hash}` is the hex-
|
||||
valued hash stored in the footer of that file (which is a hash of the file's
|
||||
contents before that hash). For a chain of commit-graph files, a plain-text
|
||||
file at `$OBJDIR/info/commit-graphs/commit-graph-chain` contains the
|
||||
hashes for the files in order from "lowest" to "highest".
|
||||
|
||||
For example, if the `commit-graph-chain` file contains the lines
|
||||
|
||||
```
|
||||
{hash0}
|
||||
{hash1}
|
||||
{hash2}
|
||||
```
|
||||
|
||||
then the commit-graph chain looks like the following diagram:
|
||||
|
||||
+-----------------------+
|
||||
| graph-{hash2}.graph |
|
||||
+-----------------------+
|
||||
|
|
||||
+-----------------------+
|
||||
| |
|
||||
| graph-{hash1}.graph |
|
||||
| |
|
||||
+-----------------------+
|
||||
|
|
||||
+-----------------------+
|
||||
| |
|
||||
| |
|
||||
| |
|
||||
| graph-{hash0}.graph |
|
||||
| |
|
||||
| |
|
||||
| |
|
||||
+-----------------------+
|
||||
|
||||
Let X0 be the number of commits in `graph-{hash0}.graph`, X1 be the number of
|
||||
commits in `graph-{hash1}.graph`, and X2 be the number of commits in
|
||||
`graph-{hash2}.graph`. If a commit appears in position i in `graph-{hash2}.graph`,
|
||||
then we interpret this as being the commit in position (X0 + X1 + i), and that
|
||||
will be used as its "graph position". The commits in `graph-{hash2}.graph` use these
|
||||
positions to refer to their parents, which may be in `graph-{hash1}.graph` or
|
||||
`graph-{hash0}.graph`. We can navigate to an arbitrary commit in position j by checking
|
||||
its containment in the intervals [0, X0), [X0, X0 + X1), [X0 + X1, X0 + X1 +
|
||||
X2).
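The interval check can be written as a short loop over the layer sizes, ordered from the base file upward. The function name and parameter layout below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * layer_sizes[0] is X0 (the base file), layer_sizes[1] is X1, and so on.
 * On success, store which layer holds graph_pos and the position within
 * that layer's own commit list, and return 0. Return -1 if graph_pos is
 * past the end of the chain.
 */
static int locate_in_chain(const uint32_t *layer_sizes, size_t num_layers,
			   uint32_t graph_pos, size_t *layer,
			   uint32_t *local_pos)
{
	uint32_t base = 0;	/* graph position of the current layer's first commit */
	size_t i;

	assert(layer_sizes != NULL && layer != NULL && local_pos != NULL);
	for (i = 0; i < num_layers; i++) {
		if (graph_pos - base < layer_sizes[i]) {
			*layer = i;
			*local_pos = graph_pos - base;
			return 0;
		}
		base += layer_sizes[i];
	}
	return -1;
}
```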
|
||||
|
||||
Each commit-graph file (except the base, `graph-{hash0}.graph`) contains data
|
||||
specifying the hashes of all files in the lower layers. In the above example,
|
||||
`graph-{hash1}.graph` contains `{hash0}` while `graph-{hash2}.graph` contains
|
||||
`{hash0}` and `{hash1}`.
|
||||
|
||||
## Merging commit-graph files
|
||||
|
||||
If we only added a new commit-graph file on every write, we would run into a
|
||||
linear search problem through many commit-graph files. Instead, we use a merge
|
||||
strategy to decide when the stack should collapse some number of levels.
|
||||
|
||||
The diagram below shows such a collapse. As a set of new commits is added, it
|
||||
is determined by the merge strategy that the files should collapse to
|
||||
`graph-{hash1}`. Thus, the new commits, the commits in `graph-{hash2}` and
|
||||
the commits in `graph-{hash1}` should be combined into a new `graph-{hash3}`
|
||||
file.
|
||||
|
||||
+---------------------+
|
||||
| |
|
||||
| (new commits) |
|
||||
| |
|
||||
+---------------------+
|
||||
| |
|
||||
+-----------------------+ +---------------------+
|
||||
| graph-{hash2} |->| |
|
||||
+-----------------------+ +---------------------+
|
||||
| | |
|
||||
+-----------------------+ +---------------------+
|
||||
| | | |
|
||||
| graph-{hash1} |->| |
|
||||
| | | |
|
||||
+-----------------------+ +---------------------+
|
||||
| tmp_graphXXX
|
||||
+-----------------------+
|
||||
| |
|
||||
| |
|
||||
| |
|
||||
| graph-{hash0} |
|
||||
| |
|
||||
| |
|
||||
| |
|
||||
+-----------------------+
|
||||
|
||||
During this process, the commits to write are combined and sorted, and we write the
|
||||
contents to a temporary file, all while holding a `commit-graph-chain.lock`
|
||||
lock-file. When the file is flushed, we rename it to `graph-{hash3}`
|
||||
according to the computed `{hash3}`. Finally, we write the new chain data to
|
||||
`commit-graph-chain.lock`:
|
||||
|
||||
```
|
||||
{hash3}
|
||||
{hash0}
|
||||
```
|
||||
|
||||
We then close the lock-file.
|
||||
|
||||
## Merge Strategy
|
||||
|
||||
When writing a set of commits that do not exist in the commit-graph stack of
|
||||
height N, we default to creating a new file at level N + 1. We then decide to
|
||||
merge with the Nth level if one of two conditions holds:
|
||||
|
||||
1. `--size-multiple=<X>` is specified or X = 2, and the number of commits in
|
||||
level N is less than X times the number of commits in level N + 1.
|
||||
|
||||
2. `--max-commits=<C>` is specified with non-zero C and the number of commits
|
||||
in level N + 1 is more than C commits.
|
||||
|
||||
This decision cascades down the levels: when we merge a level we create a new
|
||||
set of commits that then compares to the next level.
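The cascading decision can be sketched as a loop over the existing levels, from the top of the stack downward. This follows the two conditions exactly as worded above (strict "less than" and "more than"); the function name and its parameters are hypothetical and do not mirror Git's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * level_commits[0] is the bottom level; level_commits[height - 1] is the top.
 * Returns how many existing levels stay untouched. *merged_commits receives
 * the number of commits that go into the newly written top file.
 * A max_commits of 0 disables the second condition.
 */
static size_t merge_levels(const uint32_t *level_commits, size_t height,
			   uint32_t new_commits, uint32_t size_multiple,
			   uint32_t max_commits, uint64_t *merged_commits)
{
	uint64_t pending = new_commits;
	size_t keep = height;

	assert(merged_commits != NULL);
	assert(level_commits != NULL || height == 0);
	while (keep > 0) {
		uint64_t below = level_commits[keep - 1];
		int too_close = below < (uint64_t)size_multiple * pending;
		int too_big = max_commits && pending > max_commits;

		if (!too_close && !too_big)
			break;
		pending += below;	/* absorb the level and compare again */
		keep--;
	}
	*merged_commits = pending;
	return keep;
}
```

With levels of 1000, 100 and 10 commits and a size multiple of 2, writing 8 new commits absorbs only the top level, while writing 60 new commits absorbs the top two.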
|
||||
|
||||
The first condition bounds the number of levels to be logarithmic in the total
|
||||
number of commits. The second condition bounds the total number of commits in
|
||||
a `graph-{hashN}` file and not in the `commit-graph` file, preventing
|
||||
significant performance issues when the stack merges and another process only
|
||||
partially reads the previous stack.
|
||||
|
||||
The merge strategy values (2 for the size multiple, 64,000 for the maximum
|
||||
number of commits) could be extracted into config settings for full
|
||||
flexibility.
|
||||
|
||||
## Deleting graph-{hash} files
|
||||
|
||||
After a new tip file is written, some `graph-{hash}` files may no longer
|
||||
be part of a chain. It is important to remove these files from disk, eventually.
|
||||
The main reason to delay removal is that another process could read the
|
||||
`commit-graph-chain` file before it is rewritten, but then look for the
|
||||
`graph-{hash}` files after they are deleted.
|
||||
|
||||
To allow holding old split commit-graphs for a while after they are unreferenced,
|
||||
we update the modified times of the files when they become unreferenced. Then,
|
||||
we scan the `$OBJDIR/info/commit-graphs/` directory for `graph-{hash}`
|
||||
files whose modified times are older than a given expiry window. This window
|
||||
defaults to zero, but can be changed using command-line arguments or a config
|
||||
setting.
|
||||
|
||||
## Chains across multiple object directories
|
||||
|
||||
In a repo with alternates, we look for the `commit-graph-chain` file starting
|
||||
in the local object directory and then in each alternate. The first file that
|
||||
exists defines our chain. As we look for the `graph-{hash}` files for
|
||||
each `{hash}` in the chain file, we follow the same pattern for the host
|
||||
directories.
|
||||
|
||||
This allows commit-graphs to be split across multiple forks in a fork network.
|
||||
The typical case is a large "base" repo with many smaller forks.
|
||||
|
||||
As the base repo advances, it will likely update and merge its commit-graph
|
||||
chain more frequently than the forks. If a fork updates their commit-graph after
|
||||
the base repo, then it should "reparent" the commit-graph chain onto the new
|
||||
chain in the base repo. When reading each `graph-{hash}` file, we track
|
||||
the object directory containing it. During a write of a new commit-graph file,
|
||||
we check for any changes in the source object directory and read the
|
||||
`commit-graph-chain` file for that source and create a new file based on those
|
||||
files. During this "reparent" operation, we necessarily need to collapse all
|
||||
levels in the fork, as all of the files are invalid against the new base file.
|
||||
|
||||
It is crucial to be careful when cleaning up "unreferenced" `graph-{hash}.graph`
|
||||
files in this scenario. It falls to the user to define the proper settings for
|
||||
their custom environment:
|
||||
|
||||
1. When merging levels in the base repo, the unreferenced files may still be
|
||||
referenced by chains from fork repos.
|
||||
|
||||
2. The expiry time should be set to a length of time such that every fork has
|
||||
time to recompute their commit-graph chain to "reparent" onto the new base
|
||||
file(s).
|
||||
|
||||
3. If the commit-graph chain is updated in the base, the fork will not have
|
||||
access to the new chain until its chain is updated to reference those files.
|
||||
(This may change in the future [5].)
|
||||
|
||||
Related Links
|
||||
-------------
|
||||
[0] https://bugs.chromium.org/p/git/issues/detail?id=8
|
||||
Chromium work item for: Serialized Commit Graph
|
||||
|
||||
[1] https://public-inbox.org/git/20110713070517.GC18566@sigill.intra.peff.net/
|
||||
An abandoned patch that introduced generation numbers.
|
||||
|
||||
[2] https://public-inbox.org/git/20170908033403.q7e6dj7benasrjes@sigill.intra.peff.net/
|
||||
Discussion about generation numbers on commits and how they interact
|
||||
with fsck.
|
||||
|
||||
[3] https://public-inbox.org/git/20170908034739.4op3w4f2ma5s65ku@sigill.intra.peff.net/
|
||||
More discussion about generation numbers and not storing them inside
|
||||
commit objects. A valuable quote:
|
||||
|
||||
"I think we should be moving more in the direction of keeping
|
||||
repo-local caches for optimizations. Reachability bitmaps have been
|
||||
a big performance win. I think we should be doing the same with our
|
||||
properties of commits. Not just generation numbers, but making it
|
||||
cheap to access the graph structure without zlib-inflating whole
|
||||
commit objects (i.e., packv4 or something like the "metapacks" I
|
||||
proposed a few years ago)."
|
||||
|
||||
[4] https://public-inbox.org/git/20180108154822.54829-1-git@jeffhostetler.com/T/#u
|
||||
A patch to remove the ahead-behind calculation from 'status'.
|
||||
|
||||
[5] https://public-inbox.org/git/f27db281-abad-5043-6d71-cbb083b1c877@gmail.com/
|
||||
A discussion of a "two-dimensional graph position" that can allow reading
|
||||
multiple commit-graph chains at the same time.
|
||||
115
Documentation/technical/directory-rename-detection.txt
Normal file
|
|
@ -0,0 +1,115 @@
|
|||
Directory rename detection
|
||||
==========================
|
||||
|
||||
Rename detection logic in diffcore-rename that checks for renames of
|
||||
individual files is aggregated and analyzed in merge-recursive for cases
|
||||
where combinations of renames indicate that a full directory has been
|
||||
renamed.
|
||||
|
||||
Scope of abilities
|
||||
------------------
|
||||
|
||||
It is perhaps easiest to start with an example:
|
||||
|
||||
* When all of x/a, x/b and x/c have moved to z/a, z/b and z/c, it is
|
||||
likely that x/d added in the meantime would also want to move to z/d by
|
||||
taking the hint that the entire directory 'x' moved to 'z'.
|
||||
|
||||
More interesting possibilities exist, though, such as:
|
||||
|
||||
* one side of history renames x -> z, and the other renames some file to
|
||||
x/e, causing the need for the merge to do a transitive rename.
|
||||
|
||||
* one side of history renames x -> z, but also renames all files within x.
|
||||
For example, x/a -> z/alpha, x/b -> z/bravo, etc.
|
||||
|
||||
* both 'x' and 'y' being merged into a single directory 'z', with a
|
||||
directory rename being detected for both x->z and y->z.
|
||||
|
||||
* not all files in a directory being renamed to the same location;
|
||||
i.e. perhaps most of the files in 'x' are now found under 'z', but a few
|
||||
are found under 'w'.
|
||||
|
||||
* a directory being renamed, which also contained a subdirectory that was
|
||||
renamed to some entirely different location. (And perhaps the inner
|
||||
directory itself contained inner directories that were renamed to yet
|
||||
other locations).
|
||||
|
||||
* combinations of the above; see t/t6043-merge-rename-directories.sh for
|
||||
various interesting cases.
|
||||
|
||||
Limitations -- applicability of directory renames
|
||||
-------------------------------------------------
|
||||
|
||||
In order to prevent edge and corner cases resulting in either conflicts
|
||||
that cannot be represented in the index or which might be too complex for
|
||||
users to try to understand and resolve, a couple of basic rules limit when
|
||||
directory rename detection applies:
|
||||
|
||||
1) If a given directory still exists on both sides of a merge, we do
|
||||
not consider it to have been renamed.
|
||||
|
||||
2) If a subset of to-be-renamed files have a file or directory in the
|
||||
way (or would be in the way of each other), "turn off" the directory
|
||||
rename for those specific sub-paths and report the conflict to the
|
||||
user.
|
||||
|
||||
3) If the other side of history did a directory rename to a path that
|
||||
your side of history renamed away, then ignore that particular
|
||||
rename from the other side of history for any implicit directory
|
||||
renames (but warn the user).
|
||||
|
||||
Limitations -- detailed rules and testcases
|
||||
-------------------------------------------
|
||||
|
||||
t/t6043-merge-rename-directories.sh contains extensive tests and commentary
|
||||
which generate and explore the rules listed above. It also lists a few
|
||||
additional rules:
|
||||
|
||||
a) If renames split a directory into two or more others, the directory
|
||||
with the most renames "wins".
|
||||
|
||||
b) Avoid directory-rename-detection for a path, if that path is the
|
||||
source of a rename on either side of a merge.
|
||||
|
||||
c) Only apply implicit directory renames to directories if the other side
|
||||
of history is the one doing the renaming.
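Rule (a) can be illustrated with a simple majority count over the destination directories of the files renamed out of one source directory. The function name is hypothetical, and returning `NULL` on a tie is an assumption made for this sketch; the rules above do not say how ties are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * dest_dirs[i] is the directory that the i-th renamed file ended up in.
 * Returns the directory that received the most files, or NULL if the list
 * is empty or two different directories share the highest count.
 */
static const char *pick_rename_target(const char *const *dest_dirs, size_t n)
{
	const char *best = NULL;
	size_t best_count = 0;
	int tie = 0;
	size_t i, j;

	assert(dest_dirs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		size_t count = 0;

		for (j = 0; j < n; j++)
			if (!strcmp(dest_dirs[i], dest_dirs[j]))
				count++;
		if (count > best_count) {
			best = dest_dirs[i];
			best_count = count;
			tie = 0;
		} else if (count == best_count && strcmp(best, dest_dirs[i])) {
			tie = 1;	/* a different directory matches the best count */
		}
	}
	return tie ? NULL : best;
}
```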
|
||||
|
||||
Limitations -- support in different commands
|
||||
--------------------------------------------
|
||||
|
||||
Directory rename detection is supported by 'merge' and 'cherry-pick'.
|
||||
Other git commands which users might be surprised to see limited or no
|
||||
directory rename detection support in:
|
||||
|
||||
* diff
|
||||
|
||||
Folks have requested in the past that `git diff` detect directory
|
||||
renames and somehow simplify its output. It is not clear whether this
|
||||
would be desirable or how the output should be simplified, so this was
|
||||
simply not implemented. Further, to implement this, directory rename
|
||||
detection logic would need to move from merge-recursive to
|
||||
diffcore-rename.
|
||||
|
||||
* am
|
||||
|
||||
git-am tries to avoid a full three way merge, instead calling
|
||||
git-apply. That prevents us from detecting renames at all, which may
|
||||
defeat the directory rename detection. There is a fallback, though; if
|
||||
the initial git-apply fails and the user has specified the -3 option,
|
||||
git-am will fall back to a three way merge. However, git-am lacks the
|
||||
necessary information to do a "real" three way merge. Instead, it has
|
||||
to use build_fake_ancestor() to get a merge base that is missing files
|
||||
whose rename may have been important to detect for directory rename
|
||||
detection to function.
|
||||
|
||||
* rebase
|
||||
|
||||
Since am-based rebases work by first generating a bunch of patches
|
||||
(which no longer record what the original commits were and thus don't
|
||||
have the necessary info from which we can find a real merge-base), and
|
||||
then calling git-am, this implies that am-based rebases will not always
|
||||
successfully detect directory renames either (see the 'am' section
|
||||
above). merge-based rebases (rebase -m) and cherry-pick-based rebases
|
||||
(rebase -i) are not affected by this shortcoming, and fully support
|
||||
directory rename detection.
|
||||
827
Documentation/technical/hash-function-transition.txt
Normal file
|
|
@ -0,0 +1,827 @@
|
|||
Git hash function transition
|
||||
============================
|
||||
|
||||
Objective
|
||||
---------
|
||||
Migrate Git from SHA-1 to a stronger hash function.
|
||||
|
||||
Background
|
||||
----------
|
||||
At its core, the Git version control system is a content addressable
|
||||
filesystem. It uses the SHA-1 hash function to name content. For
|
||||
example, files, directories, and revisions are referred to by hash
|
||||
values unlike in other traditional version control systems where files
|
||||
or versions are referred to via sequential numbers. The use of a hash
|
||||
function to address its content delivers a few advantages:
|
||||
|
||||
* Integrity checking is easy. Bit flips, for example, are easily
|
||||
detected, as the hash of corrupted content does not match its name.
|
||||
* Lookup of objects is fast.
|
||||
|
||||
Using a cryptographically secure hash function brings additional
|
||||
advantages:
|
||||
|
||||
* Object names can be signed and third parties can trust the hash to
|
||||
address the signed object and all objects it references.
|
||||
* Communication using Git protocol and out of band communication
|
||||
methods have a short reliable string that can be used to reliably
|
||||
address stored content.
|
||||
|
||||
Over time some flaws in SHA-1 have been discovered by security
|
||||
researchers. On 23 February 2017 the SHAttered attack
|
||||
(https://shattered.io) demonstrated a practical SHA-1 hash collision.
|
||||
|
||||
Git v2.13.0 and later subsequently moved to a hardened SHA-1
|
||||
implementation by default, which isn't vulnerable to the SHAttered
|
||||
attack.
|
||||
|
||||
Thus Git has in effect already migrated to a new hash that isn't SHA-1
|
||||
and doesn't share its vulnerabilities; its new hash function just
|
||||
happens to produce exactly the same output for all known inputs,
|
||||
except two PDFs published by the SHAttered researchers, and the new
|
||||
implementation (written by those researchers) claims to detect future
|
||||
cryptanalytic collision attacks.
|
||||
|
||||
Regardless, it's considered prudent to move past any variant of SHA-1
|
||||
to a new hash. There's no guarantee that future attacks on SHA-1 won't
|
||||
be published, and those attacks may not have viable
|
||||
mitigations.
|
||||
|
||||
If SHA-1 and its variants were to be truly broken, Git's hash function
|
||||
could not be considered cryptographically secure any more. This would
|
||||
impact the communication of hash values because we could not trust
|
||||
that a given hash value represented the known good version of content
|
||||
that the speaker intended.
|
||||
|
||||
SHA-1 still possesses the other properties such as fast object lookup
|
||||
and safe error checking, but other hash functions that are believed to be
|
||||
cryptographically secure are equally suitable.
|
||||
|
||||
Goals
|
||||
-----
|
||||
1. The transition to SHA-256 can be done one local repository at a time.
|
||||
a. Requiring no action by any other party.
|
||||
b. A SHA-256 repository can communicate with SHA-1 Git servers
|
||||
(push/fetch).
|
||||
c. Users can use SHA-1 and SHA-256 identifiers for objects
|
||||
interchangeably (see "Object names on the command line", below).
|
||||
d. New signed objects make use of a stronger hash function than
|
||||
SHA-1 for their security guarantees.
|
||||
2. Allow a complete transition away from SHA-1.
|
||||
a. Local metadata for SHA-1 compatibility can be removed from a
|
||||
repository if compatibility with SHA-1 is no longer needed.
|
||||
3. Maintainability throughout the process.
|
||||
a. The object format is kept simple and consistent.
|
||||
b. Creation of a generalized repository conversion tool.
|
||||
|
||||
Non-Goals
|
||||
---------
|
||||
1. Add SHA-256 support to Git protocol. This is valuable and the
|
||||
logical next step but it is out of scope for this initial design.
|
||||
2. Transparently improving the security of existing SHA-1 signed
|
||||
objects.
|
||||
3. Intermixing objects using multiple hash functions in a single
|
||||
repository.
|
||||
4. Taking the opportunity to fix other bugs in Git's formats and
|
||||
protocols.
|
||||
5. Shallow clones and fetches into a SHA-256 repository. (This will
|
||||
change when we add SHA-256 support to Git protocol.)
|
||||
6. Skip fetching some submodules of a project into a SHA-256
|
||||
repository. (This also depends on SHA-256 support in Git
|
||||
protocol.)
|
||||
|
||||
Overview
|
||||
--------
|
||||
We introduce a new repository format extension. Repositories with this
|
||||
extension enabled use SHA-256 instead of SHA-1 to name their objects.
|
||||
This affects both object names and object content --- both the names
|
||||
of objects and all references to other objects within an object are
|
||||
switched to the new hash function.
|
||||
|
||||
SHA-256 repositories cannot be read by older versions of Git.
|
||||
|
||||
Alongside the packfile, a SHA-256 repository stores a bidirectional
|
||||
mapping between SHA-256 and SHA-1 object names. The mapping is generated
|
||||
locally and can be verified using "git fsck". Object lookups use this
|
||||
mapping to allow naming objects using either their SHA-1 or SHA-256 names
|
||||
interchangeably.
|
||||
|
||||
"git cat-file" and "git hash-object" gain options to display an object
|
||||
in its sha1 form and write an object given its sha1 form. This
|
||||
requires all objects referenced by that object to be present in the
|
||||
object database so that they can be named using the appropriate name
|
||||
(using the bidirectional hash mapping).
|
||||
|
||||
Fetches from a SHA-1 based server convert the fetched objects into
|
||||
SHA-256 form and record the mapping in the bidirectional mapping table
|
||||
(see below for details). Pushes to a SHA-1 based server convert the
|
||||
objects being pushed into sha1 form so the server does not have to be
|
||||
aware of the hash function the client is using.
|
||||
|
||||
Detailed Design
|
||||
---------------
|
||||
Repository format extension
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
A SHA-256 repository uses repository format version `1` (see
|
||||
Documentation/technical/repository-version.txt) with extensions
|
||||
`objectFormat` and `compatObjectFormat`:
|
||||
|
||||
[core]
|
||||
repositoryFormatVersion = 1
|
||||
[extensions]
|
||||
objectFormat = sha256
|
||||
compatObjectFormat = sha1
|
||||
|
||||
The combination of setting `core.repositoryFormatVersion=1` and
|
||||
populating `extensions.*` ensures that all versions of Git later than
|
||||
`v0.99.9l` will die instead of trying to operate on the SHA-256
|
||||
repository, producing an error message.
|
||||
|
||||
# Between v0.99.9l and v2.7.0
|
||||
$ git status
|
||||
fatal: Expected git repo version <= 0, found 1
|
||||
# After v2.7.0
|
||||
$ git status
|
||||
fatal: unknown repository extensions found:
|
||||
objectformat
|
||||
compatobjectformat
|
||||
|
||||
See the "Transition plan" section below for more details on these
|
||||
repository extensions.
|
||||
|
||||
Object names
|
||||
~~~~~~~~~~~~
|
||||
Objects can be named by their 40 hexadecimal digit sha1-name or 64
|
||||
hexadecimal digit sha256-name, plus names derived from those (see
|
||||
gitrevisions(7)).
|
||||
|
||||
The sha1-name of an object is the SHA-1 of the concatenation of its
|
||||
type, length, a nul byte, and the object's sha1-content. This is the
|
||||
traditional <sha1> used in Git to name objects.
|
||||
|
||||
The sha256-name of an object is the SHA-256 of the concatenation of its
|
||||
type, length, a nul byte, and the object's sha256-content.
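As a minimal sketch of the two naming rules above, the Python below hashes the same framing with either algorithm. The `object_name` helper is hypothetical and not part of Git. It assumes the framing Git uses today, in which the type and the decimal length are separated by a single space. A blob is used because its sha1-content and sha256-content are identical.

```python
import hashlib


def object_name(obj_type: str, content: bytes, algo: str) -> str:
    """Hash the type, length, a NUL byte, and the content with `algo`."""
    # Assumption: type and decimal length are separated by one space.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.new(algo, header + content).hexdigest()


blob = b"hello world\n"
sha1_name = object_name("blob", blob, "sha1")      # 40 hexadecimal digits
sha256_name = object_name("blob", blob, "sha256")  # 64 hexadecimal digits
print(sha1_name)
print(sha256_name)
```

For commits, trees, and tags the `content` argument would differ between the two calls, because each form refers to other objects by names in its own format.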
|
||||
|
||||
Object format
|
||||
~~~~~~~~~~~~~
|
||||
The content as a byte sequence of a tag, commit, or tree object named
|
||||
by sha1 and sha256 differ because an object named by sha256-name refers to
|
||||
other objects by their sha256-names and an object named by sha1-name
|
||||
refers to other objects by their sha1-names.
|
||||
|
||||
The sha256-content of an object is the same as its sha1-content, except
|
||||
that objects referenced by the object are named using their sha256-names
|
||||
instead of sha1-names. Because a blob object does not refer to any
|
||||
other object, its sha1-content and sha256-content are the same.
|
||||
|
||||
The format allows round-trip conversion between sha256-content and
|
||||
sha1-content.
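The round trip can be illustrated for a tree object. This is only a sketch: `convert_tree` and the `mapping` dictionary are hypothetical stand-ins for the translation table, and the code assumes the current tree layout (mode, space, path, NUL byte, raw object name). Modes and paths are copied byte for byte, so any oddities in the original survive the conversion.

```python
import hashlib

RAW_LEN = {"sha1": 20, "sha256": 32}  # size of a raw (binary) object name


def convert_tree(content, src, dst, mapping):
    """Rewrite every raw object name in a tree from `src` to `dst` format."""
    out = bytearray()
    pos = 0
    while pos < len(content):
        nul = content.index(b"\x00", pos)      # end of the mode and path
        end = nul + 1 + RAW_LEN[src]           # end of the raw object name
        new_name = mapping[content[nul + 1:end]]  # KeyError if not translated yet
        assert len(new_name) == RAW_LEN[dst]
        out += content[pos:nul + 1] + new_name
        pos = end
    return bytes(out)


# One blob, named in both formats (its content is the same in both).
stored = b"blob 3\x00hi\n"
sha1_raw = hashlib.sha1(stored).digest()
sha256_raw = hashlib.sha256(stored).digest()

tree_sha1 = b"100644 hi.txt\x00" + sha1_raw
tree_sha256 = convert_tree(tree_sha1, "sha1", "sha256", {sha1_raw: sha256_raw})
round_trip = convert_tree(tree_sha256, "sha256", "sha1", {sha256_raw: sha1_raw})
print(round_trip == tree_sha1)
```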
|
||||
|
||||
Object storage
|
||||
~~~~~~~~~~~~~~
|
||||
Loose objects use zlib compression and packed objects use the packed
|
||||
format described in Documentation/technical/pack-format.txt, just like
|
||||
today. The content that is compressed and stored uses sha256-content
|
||||
instead of sha1-content.
|
||||
|
||||
Pack index
|
||||
~~~~~~~~~~
|
||||
Pack index (.idx) files use a new v3 format that supports multiple
|
||||
hash functions. They have the following format (all integers are in
|
||||
network byte order):
|
||||
|
||||
- A header appears at the beginning and consists of the following:
|
||||
- The 4-byte pack index signature: '\377tOc'
|
||||
- 4-byte version number: 3
|
||||
- 4-byte length of the header section, including the signature and
|
||||
version number
|
||||
- 4-byte number of objects contained in the pack
|
||||
- 4-byte number of object formats in this pack index: 2
|
||||
- For each object format:
|
||||
- 4-byte format identifier (e.g., 'sha1' for SHA-1)
|
||||
- 4-byte length in bytes of shortened object names. This is the
|
||||
shortest possible length needed to make names in the shortened
|
||||
object name table unambiguous.
|
||||
- 4-byte integer, recording where tables relating to this format
|
||||
are stored in this index file, as an offset from the beginning.
|
||||
- 4-byte offset to the trailer from the beginning of this file.
|
||||
- Zero or more additional key/value pairs (4-byte key, 4-byte
|
||||
value). Only one key is supported: 'PSRC'. See the "Loose objects
|
||||
and unreachable objects" section for supported values and how this
|
||||
is used. All other keys are reserved. Readers must ignore
|
||||
unrecognized keys.
|
||||
- Zero or more NUL bytes. This can optionally be used to improve the
|
||||
alignment of the full object name table below.
|
||||
- Tables for the first object format:
|
||||
- A sorted table of shortened object names. These are prefixes of
|
||||
the names of all objects in this pack file, packed together
|
||||
without offset values to reduce the cache footprint of the binary
|
||||
search for a specific object name.
|
||||
|
||||
- A table of full object names in pack order. This allows resolving
|
||||
a reference to "the nth object in the pack file" (from a
|
||||
reachability bitmap or from the next table of another object
|
||||
format) to its object name.
|
||||
|
||||
- A table of 4-byte values mapping object name order to pack order.
|
||||
For an object in the table of sorted shortened object names, the
|
||||
value at the corresponding index in this table is the index in the
|
||||
previous table for that same object.
|
||||
|
||||
This can be used to look up the object in reachability bitmaps or
|
||||
to look up its name in another object format.
|
||||
|
||||
- A table of 4-byte CRC32 values of the packed object data, in the
|
||||
order that the objects appear in the pack file. This is to allow
|
||||
compressed data to be copied directly from pack to pack during
|
||||
repacking without undetected data corruption.
|
||||
|
||||
- A table of 4-byte offset values. For an object in the table of
|
||||
sorted shortened object names, the value at the corresponding
|
||||
index in this table indicates where that object can be found in
|
||||
the pack file. These are usually 31-bit pack file offsets, but
|
||||
large offsets are encoded as an index into the next table with the
|
||||
most significant bit set.
|
||||
|
||||
- A table of 8-byte offset entries (empty for pack files less than
|
||||
2 GiB). Pack files are organized with heavily used objects toward
|
||||
the front, so most object references should not need to refer to
|
||||
this table.
|
||||
- Zero or more NUL bytes.
|
||||
- Tables for the second object format, with the same layout as above,
|
||||
up to and not including the table of CRC32 values.
|
||||
- Zero or more NUL bytes.
|
||||
- The trailer consists of the following:
|
||||
- A copy of the 32-byte SHA-256 checksum at the end of the
|
||||
corresponding packfile.
|
||||
|
||||
- 32-byte SHA-256 checksum of all of the above.
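The header layout described above can be sketched with Python's `struct` module. This is not Git's implementation: `build_header` and `parse_header` are hypothetical helpers, the magic bytes are assumed to be those of the existing .idx format, the format identifier `s256` for SHA-256 and the integer encoding of key/value values are assumptions, and the offsets in the example are made up.

```python
import struct

SIGNATURE = b"\xfftOc"  # assumption: magic bytes of the existing .idx format
VERSION = 3


def build_header(num_objects, formats, trailer_offset, pairs=()):
    """formats: list of (4-byte id, shortened name length, table offset)."""
    body = struct.pack(">II", num_objects, len(formats))
    for fmt_id, short_len, table_offset in formats:
        body += struct.pack(">4sII", fmt_id, short_len, table_offset)
    body += struct.pack(">I", trailer_offset)
    for key, value in pairs:
        body += struct.pack(">4sI", key, value)
    header_len = 12 + len(body)  # signature, version, and the length field itself
    return SIGNATURE + struct.pack(">II", VERSION, header_len) + body


def parse_header(data):
    sig, version, header_len, num_objects, num_formats = struct.unpack_from(
        ">4sIIII", data, 0)
    if sig != SIGNATURE or version != VERSION:
        raise ValueError("not a v3 pack index")
    pos = 20
    formats = []
    for _ in range(num_formats):
        formats.append(struct.unpack_from(">4sII", data, pos))
        pos += 12
    (trailer_offset,) = struct.unpack_from(">I", data, pos)
    pos += 4
    psrc = None
    while pos < header_len:  # key/value pairs fill the rest of the header
        key, value = struct.unpack_from(">4sI", data, pos)
        if key == b"PSRC":
            psrc = value     # every other key is ignored, as readers must
        pos += 8
    return {"header_len": header_len, "num_objects": num_objects,
            "formats": formats, "trailer_offset": trailer_offset, "psrc": psrc}


header = build_header(3, [(b"s256", 4, 64), (b"sha1", 3, 200)], 400,
                      [(b"PSRC", 1), (b"XXXX", 9)])
parsed = parse_header(header + b"\x00" * 8)  # trailing NUL padding is allowed
print(parsed)
```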
|
||||
|
||||
Loose object index
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
A new file $GIT_OBJECT_DIR/loose-object-idx contains information about
|
||||
all loose objects. Its format is
|
||||
|
||||
# loose-object-idx
|
||||
(sha256-name SP sha1-name LF)*
|
||||
|
||||
where the object names are in hexadecimal format. The file is not
|
||||
sorted.
|
||||
|
||||
The loose object index is protected against concurrent writes by a
|
||||
lock file $GIT_OBJECT_DIR/loose-object-idx.lock. To add a new loose
|
||||
object:
|
||||
|
||||
1. Write the loose object to a temporary file, like today.
|
||||
2. Open loose-object-idx.lock with O_CREAT | O_EXCL to acquire the lock.
|
||||
3. Rename the loose object into place.
|
||||
4. Open loose-object-idx with O_APPEND and write the new object's entry.
|
||||
5. Unlink loose-object-idx.lock to release the lock.
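Steps 2 to 5 above can be sketched as follows, assuming the caller has already written the temporary file (step 1). The `add_loose_object` helper and the paths in the demo are hypothetical. Creating a missing index file and writing the `# loose-object-idx` line first are assumptions of this sketch.

```python
import os
import tempfile


def add_loose_object(objdir, tmp_path, final_path, sha256_name, sha1_name):
    idx = os.path.join(objdir, "loose-object-idx")
    lock = idx + ".lock"
    # Step 2: O_CREAT | O_EXCL raises FileExistsError if the lock is held.
    lock_fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o666)
    os.close(lock_fd)
    try:
        # Step 3: move the loose object into place.
        os.rename(tmp_path, final_path)
        # Step 4: append the new entry to the index.
        fd = os.open(idx, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o666)
        try:
            if os.fstat(fd).st_size == 0:
                os.write(fd, b"# loose-object-idx\n")  # assumption: header line
            os.write(fd, f"{sha256_name} {sha1_name}\n".encode("ascii"))
        finally:
            os.close(fd)
    finally:
        # Step 5: release the lock, even if an earlier step failed.
        os.unlink(lock)


objdir = tempfile.mkdtemp()
tmp_path = os.path.join(objdir, "tmp_obj_demo")
with open(tmp_path, "wb") as f:
    f.write(b"compressed object bytes")
final_path = os.path.join(objdir, "demo-object")
add_loose_object(objdir, tmp_path, final_path, "a" * 64, "b" * 40)
with open(os.path.join(objdir, "loose-object-idx"), encoding="ascii") as f:
    index_lines = f.read().splitlines()
print(index_lines)
```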
|
||||
|
||||
To remove entries (e.g. in "git pack-refs" or "git prune"):
|
||||
|
||||
1. Open loose-object-idx.lock with O_CREAT | O_EXCL to acquire the
|
||||
lock.
|
||||
2. Write the new content to loose-object-idx.lock.
|
||||
3. Unlink any loose objects being removed.
|
||||
4. Rename to replace loose-object-idx, releasing the lock.
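A sketch of the removal procedure, under the same caveats: `remove_loose_objects` is a hypothetical helper, and the `doomed` argument (a mapping from sha256-name to the path of the loose object) is an illustration device, not a Git data structure.

```python
import os
import tempfile


def remove_loose_objects(objdir, doomed):
    """doomed: dict mapping sha256-name to the loose object's path."""
    idx = os.path.join(objdir, "loose-object-idx")
    lock = idx + ".lock"
    # Step 1: acquire the lock.
    fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o666)
    try:
        # Step 2: write the surviving entries to the lock file.
        with os.fdopen(fd, "w", encoding="ascii") as out, \
                open(idx, encoding="ascii") as old:
            for line in old:
                if line.startswith("#") or line.split(" ", 1)[0] not in doomed:
                    out.write(line)
        # Step 3: unlink the loose objects being removed.
        for path in doomed.values():
            os.unlink(path)
        # Step 4: the rename both publishes the new index and drops the lock.
        os.replace(lock, idx)
    except BaseException:
        if os.path.exists(lock):
            os.unlink(lock)
        raise


objdir = tempfile.mkdtemp()
keep_path = os.path.join(objdir, "keep-object")
drop_path = os.path.join(objdir, "drop-object")
for path in (keep_path, drop_path):
    with open(path, "wb") as f:
        f.write(b"x")
with open(os.path.join(objdir, "loose-object-idx"), "w", encoding="ascii") as f:
    f.write("# loose-object-idx\n")
    f.write("a" * 64 + " " + "b" * 40 + "\n")
    f.write("c" * 64 + " " + "d" * 40 + "\n")

remove_loose_objects(objdir, {"c" * 64: drop_path})
with open(os.path.join(objdir, "loose-object-idx"), encoding="ascii") as f:
    remaining = f.read().splitlines()
print(remaining)
```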
|
||||
|
||||
Translation table
|
||||
~~~~~~~~~~~~~~~~~
|
||||
The index files support a bidirectional mapping between sha1-names
|
||||
and sha256-names. The lookup proceeds similarly to ordinary object
|
||||
lookups. For example, to convert a sha1-name to a sha256-name:
|
||||
|
||||
1. Look for the object in idx files. If a match is present in the
|
||||
idx's sorted list of truncated sha1-names, then:
|
||||
a. Read the corresponding entry in the sha1-name order to pack
|
||||
order mapping.
|
||||
b. Read the corresponding entry in the full sha1-name table to
|
||||
verify we found the right object. If so, then
|
||||
c. Read the corresponding entry in the full sha256-name table.
|
||||
That is the object's sha256-name.
|
||||
2. Check for a loose object. Read lines from loose-object-idx until
|
||||
we find a match.
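The lookup steps above can be modelled in memory. This sketch is not the on-disk format: `build_tables` and `sha1_to_sha256` are hypothetical, names are hexadecimal strings rather than raw bytes, the shortened length is counted in hexadecimal digits, and only a single pack is considered.

```python
import bisect


def build_tables(pairs, short_len):
    """pairs: list of (sha1-name, sha256-name) in pack order."""
    full_sha1 = [sha1 for sha1, _ in pairs]
    full_sha256 = [sha256 for _, sha256 in pairs]
    # Positions in pack order, listed in sha1-name order.
    name_to_pack = sorted(range(len(pairs)), key=lambda i: full_sha1[i])
    return {
        "short_len": short_len,
        "short_sha1": [full_sha1[i][:short_len] for i in name_to_pack],
        "name_to_pack": name_to_pack,
        "full_sha1": full_sha1,
        "full_sha256": full_sha256,
    }


def sha1_to_sha256(idx, loose_lines, sha1_name):
    prefix = sha1_name[:idx["short_len"]]
    i = bisect.bisect_left(idx["short_sha1"], prefix)      # step 1: binary search
    if i < len(idx["short_sha1"]) and idx["short_sha1"][i] == prefix:
        pack_pos = idx["name_to_pack"][i]                  # step 1a
        if idx["full_sha1"][pack_pos] == sha1_name:        # step 1b
            return idx["full_sha256"][pack_pos]            # step 1c
    for line in loose_lines:                               # step 2: linear scan
        if line.startswith("#"):
            continue
        sha256_name, _, candidate = line.rstrip("\n").partition(" ")
        if candidate == sha1_name:
            return sha256_name
    return None


idx = build_tables([("c" + "0" * 39, "1" * 64),
                    ("a" + "0" * 39, "2" * 64),
                    ("b" + "0" * 39, "3" * 64)], short_len=1)
loose = ["# loose-object-idx\n", "4" * 64 + " " + "d" + "0" * 39 + "\n"]
print(sha1_to_sha256(idx, loose, "a" + "0" * 39))
```

The binary search touches only the compact table of shortened names. The full-name check in step 1b is what rejects a name that merely shares a prefix with a packed object.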
|
||||
|
||||
Step (1) takes the same amount of time as an ordinary object lookup:
|
||||
O(number of packs * log(objects per pack)). Step (2) takes O(number of
|
||||
loose objects) time. To maintain good performance it will be necessary
|
||||
to keep the number of loose objects low. See the "Loose objects and
|
||||
unreachable objects" section below for more details.
|
||||
|
||||
Since all operations that make new objects (e.g., "git commit") add
|
||||
the new objects to the corresponding index, this mapping is possible
|
||||
for all objects in the object store.
|
||||
|
||||
Reading an object's sha1-content
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
The sha1-content of an object can be read by converting all sha256-names
|
||||
that its sha256-content references to sha1-names using the translation table.
|
||||
|
||||
Fetch
|
||||
~~~~~
|
||||
Fetching from a SHA-1 based server requires translating between SHA-1
|
||||
and SHA-256 based representations on the fly.
|
||||
|
||||
SHA-1s named in the ref advertisement that are present on the client
|
||||
can be translated to SHA-256 and looked up as local objects using the
|
||||
translation table.
|
||||
|
||||
Negotiation proceeds as today. Any "have"s generated locally are
|
||||
converted to SHA-1 before being sent to the server, and SHA-1s
|
||||
mentioned by the server are converted to SHA-256 when looking them up
|
||||
locally.
|
||||
|
||||
After negotiation, the server sends a packfile containing the
|
||||
requested objects. We convert the packfile to SHA-256 format using
|
||||
the following steps:
|
||||
|
||||
1. index-pack: inflate each object in the packfile and compute its
|
||||
SHA-1. Objects can contain deltas in OBJ_REF_DELTA format against
|
||||
objects the client has locally. These objects can be looked up
|
||||
using the translation table and their sha1-content read as
|
||||
described above to resolve the deltas.
|
||||
2. topological sort: starting at the "want"s from the negotiation
|
||||
phase, walk through objects in the pack and emit a list of them,
|
||||
excluding blobs, in reverse topologically sorted order, with each
|
||||
object coming later in the list than all objects it references.
|
||||
(This list only contains objects reachable from the "wants". If the
|
||||
pack from the server contained additional extraneous objects, then
|
||||
they will be discarded.)
|
||||
3. convert to sha256: open a new (sha256) packfile. Read the topologically
|
||||
sorted list just generated. For each object, inflate its
|
||||
sha1-content, convert to sha256-content, and write it to the sha256
|
||||
pack. Record the new sha1<->sha256 mapping entry for use in the idx.
|
||||
4. sort: reorder entries in the new pack to match the order of objects
|
||||
in the pack the server generated and include blobs. Write a sha256 idx
|
||||
file.
|
||||
5. clean up: remove the SHA-1 based pack file, index, and
|
||||
topologically sorted list obtained from the server in steps 1
|
||||
and 2.
|
||||
|
||||
Step 3 requires every object referenced by the new object to be in the
|
||||
translation table. This is why the topological sort step is necessary.
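The ordering requirement can be sketched as an iterative depth-first walk. The `conversion_order` function and the toy object graph are hypothetical. Objects that are absent from `refs` are assumed to exist locally already, and therefore to be in the translation table.

```python
def conversion_order(wants, refs, is_blob):
    """List non-blob objects reachable from `wants`, each after all it references.

    refs maps an object name to the names it references, for objects in the pack.
    """
    order, seen = [], set()
    for want in wants:
        stack = [(want, False)]
        while stack:
            name, children_done = stack.pop()
            if children_done:
                order.append(name)  # everything it references was emitted already
                continue
            if name in seen or name not in refs or is_blob(name):
                continue
            seen.add(name)
            stack.append((name, True))  # revisit after the children
            for child in refs[name]:
                stack.append((child, False))
    return order


refs = {
    "commit2": ["tree2", "commit1"],
    "commit1": ["tree1"],
    "tree2": ["blobA", "tree1"],
    "tree1": ["blobA"],
    "blobA": [],
    "stray": [],  # in the pack but unreachable from the wants: discarded
}
order = conversion_order(["commit2"], refs, lambda n: n.startswith("blob"))
print(order)
```

An explicit stack is used instead of recursion because commit histories can be far deeper than Python's default recursion limit.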
|
||||
|
||||
As an optimization, step 1 could write a file describing what non-blob
|
||||
objects each object it has inflated from the packfile references. This
|
||||
makes the topological sort in step 2 possible without inflating the
|
||||
objects in the packfile for a second time. The objects need to be
|
||||
inflated again in step 3, for a total of two inflations.
|
||||
|
||||
Step 4 is probably necessary for good read-time performance. "git
|
||||
pack-objects" on the server optimizes the pack file for good data
|
||||
locality (see Documentation/technical/pack-heuristics.txt).
|
||||
|
||||
Details of this process are likely to change. It will take some
|
||||
experimenting to get this to perform well.
|
||||
|
||||
Push
|
||||
~~~~
|
||||
Push is simpler than fetch because the objects referenced by the
|
||||
pushed objects are already in the translation table. The sha1-content
|
||||
of each object being pushed can be read as described in the "Reading
|
||||
an object's sha1-content" section to generate the pack written by git
|
||||
send-pack.
|
||||
|
||||
Signed Commits
|
||||
~~~~~~~~~~~~~~
|
||||
We add a new field "gpgsig-sha256" to the commit object format to allow
|
||||
signing commits without relying on SHA-1. It is similar to the
|
||||
existing "gpgsig" field. Its signed payload is the sha256-content of the
|
||||
commit object with any "gpgsig" and "gpgsig-sha256" fields removed.
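A sketch of how the signed payload could be derived from a commit's sha256-content. The `signed_payload` helper is hypothetical. It assumes the current commit layout, in which header fields end at the first empty line and continuation lines of a multi-line field begin with a space. The object name and signatures in the example are placeholders.

```python
def signed_payload(commit_content: bytes) -> bytes:
    """Drop the gpgsig and gpgsig-sha256 fields, keeping everything else."""
    header, sep, body = commit_content.partition(b"\n\n")
    kept = []
    skipping = False
    for line in header.split(b"\n"):
        if line.startswith(b" "):
            # Continuation line: it shares the fate of the field it belongs to.
            if not skipping:
                kept.append(line)
            continue
        field = line.split(b" ", 1)[0]
        skipping = field in (b"gpgsig", b"gpgsig-sha256")
        if not skipping:
            kept.append(line)
    return b"\n".join(kept) + sep + body


commit = (
    b"tree " + b"a" * 64 + b"\n"
    b"author A U Thor <author@example.com> 0 +0000\n"
    b"committer A U Thor <author@example.com> 0 +0000\n"
    b"gpgsig -----BEGIN PGP SIGNATURE-----\n"
    b" \n"
    b" placeholder-sha1-signature\n"
    b" -----END PGP SIGNATURE-----\n"
    b"gpgsig-sha256 -----BEGIN PGP SIGNATURE-----\n"
    b" placeholder-sha256-signature\n"
    b" -----END PGP SIGNATURE-----\n"
    b"\n"
    b"message\n"
)
payload = signed_payload(commit)
print(payload.decode("ascii"))
```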
|
||||
|
||||
This means commits can be signed
|
||||
1. using SHA-1 only, as in existing signed commit objects.
|
||||
2. using both SHA-1 and SHA-256, by using both gpgsig-sha256 and gpgsig
|
||||
fields.
|
||||
3. using only SHA-256, by only using the gpgsig-sha256 field.
|
||||
|
||||
Old versions of "git verify-commit" can verify the gpgsig signature in
|
||||
cases (1) and (2) without modifications and view case (3) as an
|
||||
ordinary unsigned commit.
|
||||
|
||||
Signed Tags
|
||||
~~~~~~~~~~~
|
||||
We add a new field "gpgsig-sha256" to the tag object format to allow
|
||||
signing tags without relying on SHA-1. Its signed payload is the
|
||||
sha256-content of the tag with its gpgsig-sha256 field and "-----BEGIN PGP
|
||||
SIGNATURE-----" delimited in-body signature removed.
|
||||
|
||||
This means tags can be signed
|
||||
1. using SHA-1 only, as in existing signed tag objects.
|
||||
2. using both SHA-1 and SHA-256, by using gpgsig-sha256 and an in-body
|
||||
signature.
|
||||
3. using only SHA-256, by only using the gpgsig-sha256 field.
|
||||
|
||||
Mergetag embedding
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
The mergetag field in the sha1-content of a commit contains the
|
||||
sha1-content of a tag that was merged by that commit.
|
||||
|
||||
The mergetag field in the sha256-content of the same commit contains the
|
||||
sha256-content of the same tag.
|
||||
|
||||
Submodules
|
||||
~~~~~~~~~~
|
||||
To convert recorded submodule pointers, you need to have the converted
|
||||
submodule repository in place. The translation table of the submodule
|
||||
can be used to look up the new hash.
|
||||
|
||||
Loose objects and unreachable objects
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Fast lookups in the loose-object-idx require that the number of loose
|
||||
objects not grow too high.
|
||||
|
||||
"git gc --auto" currently waits for there to be 6700 loose objects
|
||||
present before consolidating them into a packfile. We will need to
|
||||
measure to find a more appropriate threshold for it to use.
|
||||
|
||||
"git gc --auto" currently waits for there to be 50 packs present
|
||||
before combining packfiles. Packing loose objects more aggressively
|
||||
may cause the number of pack files to grow too quickly. This can be
|
||||
mitigated by using a strategy similar to Martin Fick's exponential
|
||||
rolling garbage collection script:
|
||||
https://gerrit-review.googlesource.com/c/gerrit/+/35215
|
||||
|
||||
"git gc" currently expels any unreachable objects it encounters in
|
||||
pack files to loose objects in an attempt to prevent a race when
|
||||
pruning them (in case another process is simultaneously writing a new
|
||||
object that refers to the about-to-be-deleted object). This leads to
|
||||
an explosion in the number of loose objects present and disk space
|
||||
usage due to the objects in delta form being replaced with independent
|
||||
loose objects. Worse, the race is still present for loose objects.
|
||||
|
||||
Instead, "git gc" will need to move unreachable objects to a new
|
||||
packfile marked as UNREACHABLE_GARBAGE (using the PSRC field; see
|
||||
below). To avoid the race when writing new objects referring to an
|
||||
about-to-be-deleted object, code paths that write new objects will
|
||||
need to copy any objects from UNREACHABLE_GARBAGE packs that they
|
||||
refer to into new, non-UNREACHABLE_GARBAGE packs (or loose objects).
|
||||
UNREACHABLE_GARBAGE packs are then safe to delete if their creation time (as
|
||||
indicated by the file's mtime) is long enough ago.
|
||||
|
||||
To avoid a proliferation of UNREACHABLE_GARBAGE packs, they can be
|
||||
combined under certain circumstances. If "gc.garbageTtl" is set to
|
||||
greater than one day, then packs created within a single calendar day,
|
||||
UTC, can be coalesced together. The resulting packfile would have an
|
||||
mtime before midnight on that day, so this makes the effective maximum
|
||||
ttl the garbageTtl + 1 day. If "gc.garbageTtl" is less than one day,
|
||||
then we divide the calendar day into intervals one-third of that ttl
|
||||
in duration. Packs created within the same interval can be coalesced
|
||||
together. The resulting packfile would have an mtime before the end of
|
||||
the interval, so this makes the effective maximum ttl equal to the
|
||||
garbageTtl * 4/3.
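The coalescing rule can be worked through in a few lines. This is a sketch only: `coalesce_window` and `max_effective_ttl` are hypothetical names, times are epoch seconds, and a ttl of exactly one day (which the text does not cover) is treated like the shorter case.

```python
DAY = 24 * 60 * 60


def coalesce_window(mtime: int, ttl: int) -> tuple:
    """Return (start, end) of the window whose garbage packs may be combined."""
    day_start = mtime - mtime % DAY  # midnight UTC of the pack's calendar day
    if ttl > DAY:
        return day_start, day_start + DAY
    interval = max(ttl // 3, 1)      # intervals one-third of the ttl long
    start = day_start + (mtime - day_start) // interval * interval
    return start, min(start + interval, day_start + DAY)


def max_effective_ttl(ttl: int) -> int:
    """Longest time garbage may survive: the ttl plus one full window."""
    return ttl + (DAY if ttl > DAY else ttl // 3)


mtime = 1_000_000_000                       # 2001-09-09 01:46:40 UTC
print(coalesce_window(mtime, 2 * DAY))      # whole calendar day
print(coalesce_window(mtime, 3 * 60 * 60))  # one-hour interval for a 3 hour ttl
print(max_effective_ttl(3 * 60 * 60))       # 4 hours, i.e. ttl * 4/3
```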
|
||||
|
||||
This rule comes from Thirumala Reddy Mutchukota's JGit change
|
||||
https://git.eclipse.org/r/90465.
|
||||
|
||||
The UNREACHABLE_GARBAGE setting goes in the PSRC field of the pack
|
||||
index. More generally, that field indicates where a pack came from:
|
||||
|
||||
- 1 (PACK_SOURCE_RECEIVE) for a pack received over the network
|
||||
- 2 (PACK_SOURCE_AUTO) for a pack created by a lightweight
|
||||
"gc --auto" operation
|
||||
- 3 (PACK_SOURCE_GC) for a pack created by a full gc
|
||||
- 4 (PACK_SOURCE_UNREACHABLE_GARBAGE) for potential garbage
|
||||
discovered by gc
|
||||
- 5 (PACK_SOURCE_INSERT) for locally created objects that were
|
||||
written directly to a pack file, e.g. from "git add ."
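The values above map naturally onto an enumeration. In this sketch the `PackSource` class and the `may_prune` helper are hypothetical; the numeric values are the ones listed above, and the pruning rule follows the mtime-based test described earlier in this section.

```python
from enum import IntEnum


class PackSource(IntEnum):
    RECEIVE = 1              # received over the network
    AUTO = 2                 # lightweight "gc --auto"
    GC = 3                   # full gc
    UNREACHABLE_GARBAGE = 4  # potential garbage discovered by gc
    INSERT = 5               # locally created objects written directly to a pack


def may_prune(source: PackSource, age_seconds: int, garbage_ttl: int) -> bool:
    """Only garbage packs older than the ttl are candidates for deletion."""
    return source == PackSource.UNREACHABLE_GARBAGE and age_seconds > garbage_ttl


print(PackSource(4).name)
print(may_prune(PackSource.UNREACHABLE_GARBAGE, 3 * 86400, 86400))
print(may_prune(PackSource.GC, 3 * 86400, 86400))
```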
|
||||
|
||||
This information can be useful for debugging and for "gc --auto" to
|
||||
make appropriate choices about which packs to coalesce.
|
||||
|
||||
Caveats
|
||||
-------
|
||||
Invalid objects
|
||||
~~~~~~~~~~~~~~~
|
||||
The conversion from sha1-content to sha256-content retains any
|
||||
brokenness in the original object (e.g., tree entry modes encoded with
|
||||
leading 0, tree objects whose paths are not sorted correctly, and
|
||||
commit objects without an author or committer). This is a deliberate
|
||||
feature of the design to allow the conversion to round-trip.
|
||||
|
||||
More profoundly broken objects (e.g., a commit with a truncated "tree"
|
||||
header line) cannot be converted but were not usable by current Git
|
||||
anyway.
|
||||
|
||||
Shallow clone and submodules
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Because it requires all referenced objects to be available in the
|
||||
locally generated translation table, this design does not support
|
||||
shallow clone or unfetched submodules. Protocol improvements might
|
||||
allow lifting this restriction.
|
||||
|
||||
Alternates
|
||||
~~~~~~~~~~
|
||||
For the same reason, a sha256 repository cannot borrow objects from a
|
||||
sha1 repository using objects/info/alternates or
|
||||
$GIT_ALTERNATE_OBJECT_REPOSITORIES.
|
||||
|
||||
git notes
|
||||
~~~~~~~~~
|
||||
The "git notes" tool annotates objects using their sha1-name as key.
|
||||
This design does not describe a way to migrate notes trees to use
|
||||
sha256-names. That migration is expected to happen separately (for
|
||||
example using a file at the root of the notes tree to describe which
|
||||
hash it uses).
|
||||
|
||||
Server-side cost
|
||||
~~~~~~~~~~~~~~~~
|
||||
Until Git protocol gains SHA-256 support, using SHA-256 based storage
|
||||
on public-facing Git servers is strongly discouraged. Once Git
|
||||
protocol gains SHA-256 support, SHA-256 based servers are likely not
|
||||
to support SHA-1 compatibility, to avoid what may be a very expensive
|
||||
hash reencode during clone and to encourage peers to modernize.
|
||||
|
||||
The design described here allows fetches by SHA-1 clients of a
|
||||
personal SHA-256 repository because it's not much more difficult than
|
||||
allowing pushes from that repository. This support needs to be guarded
|
||||
by a configuration option --- servers like git.kernel.org that serve a
|
||||
large number of clients would not be expected to bear that cost.
|
||||
|
||||
Meaning of signatures
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
The signed payload for signed commits and tags does not explicitly
|
||||
name the hash used to identify objects. If some day Git adopts a new
|
||||
hash function with the same length as the current SHA-1 (40
|
||||
hexadecimal digit) or SHA-256 (64 hexadecimal digit) objects then the
|
||||
intent behind the PGP signed payload in an object signature is
|
||||
unclear:
|
||||
|
||||
object e7e07d5a4fcc2a203d9873968ad3e6bd4d7419d7
|
||||
type commit
|
||||
tag v2.12.0
|
||||
tagger Junio C Hamano <gitster@pobox.com> 1487962205 -0800
|
||||
|
||||
Git 2.12
|
||||
|
||||
Does this mean Git v2.12.0 is the commit with sha1-name
|
||||
e7e07d5a4fcc2a203d9873968ad3e6bd4d7419d7 or the commit with
|
||||
new-40-digit-hash-name e7e07d5a4fcc2a203d9873968ad3e6bd4d7419d7?
|
||||
|
||||
Fortunately SHA-256 and SHA-1 have different lengths. If Git starts
|
||||
using another hash with the same length to name objects, then it will
|
||||
need to change the format of signed payloads using that hash to
|
||||
address this issue.
|
||||
|
||||
Object names on the command line
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
To support the transition (see Transition plan below), this design
|
||||
supports four different modes of operation:
|
||||
|
||||
1. ("dark launch") Treat object names input by the user as SHA-1 and
|
||||
convert any object names written to output to SHA-1, but store
|
||||
objects using SHA-256. This allows users to test the code with no
|
||||
visible behavior change except for performance. This allows
|
||||
running even tests that assume the SHA-1 hash function, to
|
||||
sanity-check the behavior of the new mode.
|
||||
|
||||
2. ("early transition") Allow both SHA-1 and SHA-256 object names in
|
||||
input. Any object names written to output use SHA-1. This allows
|
||||
users to continue to make use of SHA-1 to communicate with peers
|
||||
(e.g. by email) that have not migrated yet and prepares for mode 3.
|
||||
|
||||
3. ("late transition") Allow both SHA-1 and SHA-256 object names in
|
||||
input. Any object names written to output use SHA-256. In this
|
||||
mode, users are using a more secure object naming method by
|
||||
default. The disruption is minimal as long as most of their peers
|
||||
are in mode 2 or mode 3.
|
||||
|
||||
4. ("post-transition") Treat object names input by the user as
|
||||
SHA-256 and write output using SHA-256. This is safer than mode 3
|
||||
because there is less risk that input is incorrectly interpreted
|
||||
using the wrong hash function.
|
||||
|
||||
The mode is specified in configuration.
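The four modes can be summarized as a small table of accepted input formats and the output format. This sketch is illustrative only: the dictionary keys are the descriptive labels used above, not configuration values, and `classify_input` is a hypothetical helper that handles full-length hexadecimal names only, telling the two formats apart by length.

```python
MODES = {
    # label: (formats accepted on input, format used for output)
    "dark launch":      ({"sha1"},           "sha1"),
    "early transition": ({"sha1", "sha256"}, "sha1"),
    "late transition":  ({"sha1", "sha256"}, "sha256"),
    "post-transition":  ({"sha256"},         "sha256"),
}

HEX_LEN = {40: "sha1", 64: "sha256"}


def classify_input(name: str, mode: str) -> str:
    """Return the format of a full object name, or raise if the mode rejects it."""
    fmt = HEX_LEN.get(len(name))
    if fmt is None or any(c not in "0123456789abcdef" for c in name):
        raise ValueError("not a full hexadecimal object name")
    accepted, _output = MODES[mode]
    if fmt not in accepted:
        raise ValueError(f"{fmt} names are not accepted in {mode} mode")
    return fmt


print(classify_input("e7e07d5a4fcc2a203d9873968ad3e6bd4d7419d7", "early transition"))
print(MODES["late transition"][1])
```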
|
||||
|
||||
The user can also explicitly specify which format to use for a
|
||||
particular revision specifier and for output, overriding the mode. For
|
||||
example:
|
||||
|
||||
git --output-format=sha1 log abac87a^{sha1}..f787cac^{sha256}
|
||||
|
||||
Choice of Hash
|
||||
--------------
|
||||
In early 2005, around the time that Git was written, Xiaoyun Wang,
|
||||
Yiqun Lisa Yin, and Hongbo Yu announced an attack finding SHA-1
|
||||
collisions in 2^69 operations. In August they published details.
|
||||
Luckily, no practical demonstrations of a collision in full SHA-1 were
|
||||
published until 12 years later, in 2017.
|
||||
|
||||
Git v2.13.0 and later moved to a hardened SHA-1
|
||||
implementation by default that mitigates the SHAttered attack, but
|
||||
SHA-1 is still believed to be weak.
|
||||
|
||||
The hash to replace this hardened SHA-1 should be stronger than SHA-1
|
||||
was: we would like it to be trustworthy and useful in practice for at
|
||||
least 10 years.
|
||||
|
||||
Some other relevant properties:
|
||||
|
||||
1. A 256-bit hash (long enough to match common security practice; not
|
||||
so long as to hurt performance and disk usage).
|
||||
|
||||
2. High quality implementations should be widely available (e.g., in
|
||||
OpenSSL and Apple CommonCrypto).
|
||||
|
||||
3. The hash function's properties should match Git's needs (e.g. Git
|
||||
requires collision and 2nd preimage resistance and does not require
|
||||
length extension resistance).
|
||||
|
||||
4. As a tiebreaker, the hash should be fast to compute (fortunately
|
||||
many contenders are faster than SHA-1).
|
||||
|
||||
We choose SHA-256.
|
||||
|
||||
Transition plan
|
||||
---------------
|
||||
Some initial steps can be implemented independently of one another:
|
||||
- adding a hash function API (vtable)
|
||||
- teaching fsck to tolerate the gpgsig-sha256 field
|
||||
- excluding gpgsig-* from the fields copied by "git commit --amend"
|
||||
- annotating tests that depend on SHA-1 values with a SHA1 test
|
||||
prerequisite
|
||||
- using "struct object_id", GIT_MAX_RAWSZ, and GIT_MAX_HEXSZ
|
||||
consistently instead of "unsigned char *" and the hardcoded
|
||||
constants 20 and 40.
|
||||
- introducing index v3
|
||||
- adding support for the PSRC field and safer object pruning
|
||||
|
||||
|
||||
The first user-visible change is the introduction of the objectFormat
|
||||
extension (without compatObjectFormat). This requires:
|
||||
- implementing the loose-object-idx
|
||||
- teaching fsck about this mode of operation
|
||||
- using the hash function API (vtable) when computing object names
|
||||
- signing objects and verifying signatures
|
||||
- rejecting attempts to fetch from or push to an incompatible
|
||||
repository
|
||||
|
||||
Next comes introduction of compatObjectFormat:
|
||||
- translating object names between object formats
|
||||
- translating object content between object formats
|
||||
- generating and verifying signatures in the compat format
|
||||
- adding appropriate index entries when adding a new object to the
|
||||
object store
|
||||
- --output-format option
|
||||
- ^{sha1} and ^{sha256} revision notation
|
||||
- configuration to specify default input and output format (see
|
||||
"Object names on the command line" above)
|
||||
|
||||
The next step is supporting fetches and pushes to SHA-1 repositories:
|
||||
- allow pushes to a repository using the compat format
|
||||
- generate a topologically sorted list of the SHA-1 names of fetched
|
||||
objects
|
||||
- convert the fetched packfile to sha256 format and generate an idx
|
||||
file
|
||||
- re-sort to match the order of objects in the fetched packfile
|
||||
|
||||
The infrastructure supporting fetch also allows converting an existing
|
||||
repository. In converted repositories and new clones, end users can
|
||||
gain support for the new hash function without any visible change in
|
||||
behavior (see "dark launch" in the "Object names on the command line"
|
||||
section). In particular this allows users to verify SHA-256 signatures
|
||||
on objects in the repository, and it should ensure the transition code
|
||||
is stable in production in preparation for using it more widely.
|
||||
|
||||
Over time projects would encourage their users to adopt the "early
|
||||
transition" and then "late transition" modes to take advantage of the
|
||||
new, more futureproof SHA-256 object names.
|
||||
|
||||
When objectFormat and compatObjectFormat are both set, commands
|
||||
generating signatures would generate both SHA-1 and SHA-256 signatures
|
||||
by default to support both new and old users.
|
||||
|
||||
In projects using SHA-256 heavily, users could be encouraged to adopt
|
||||
the "post-transition" mode to avoid accidentally making implicit use
|
||||
of SHA-1 object names.
|
||||
|
||||
Once a critical mass of users have upgraded to a version of Git that
|
||||
can verify SHA-256 signatures and have converted their existing
|
||||
repositories to support verifying them, we can add support for a
|
||||
setting to generate only SHA-256 signatures. This is expected to be at
|
||||
least a year later.
|
||||
|
||||
That is also a good moment to advertise the ability to convert
|
||||
repositories to use SHA-256 only, stripping out all SHA-1 related
|
||||
metadata. This improves performance by eliminating translation
|
||||
overhead and security by avoiding the possibility of accidentally
|
||||
relying on the safety of SHA-1.
|
||||
|
||||
Updating Git's protocols to allow a server to specify which hash
|
||||
functions it supports is also an important part of this transition. It
|
||||
is not discussed in detail in this document but this transition plan
|
||||
assumes it happens. :)
|
||||
|
||||
Alternatives considered
|
||||
-----------------------
|
||||
Upgrading everyone working on a particular project on a flag day
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Projects like the Linux kernel are large and complex enough that
|
||||
flipping the switch for all projects based on the repository at once
|
||||
is infeasible.
|
||||
|
||||
Not only would all developers and server operators supporting
|
||||
developers have to switch on the same flag day, but supporting tooling
|
||||
(continuous integration, code review, bug trackers, etc) would have to
|
||||
be adapted as well. This also makes it difficult to get early feedback
|
||||
from some project participants testing before it is time for mass
|
||||
adoption.
|
||||
|
||||
Using hash functions in parallel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
(e.g. https://public-inbox.org/git/22708.8913.864049.452252@chiark.greenend.org.uk/ )
|
||||
Objects newly created would be addressed by the new hash, but inside
|
||||
such an object (e.g. commit) it is still possible to address objects
|
||||
using the old hash function.
|
||||
* You cannot trust its history (needed for bisectability) in the
|
||||
future without further work
|
||||
* Maintenance burden as the number of supported hash functions grows
|
||||
(they will never go away, so they accumulate). In this proposal, by
|
||||
comparison, converted objects lose all references to SHA-1.
|
||||
|
||||
Signed objects with multiple hashes
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Instead of introducing the gpgsig-sha256 field in commit and tag objects
|
||||
for sha256-content based signatures, an earlier version of this design
|
||||
added "hash sha256 <sha256-name>" fields to strengthen the existing
|
||||
sha1-content based signatures.
|
||||
|
||||
In other words, a single signature was used to attest to the object
|
||||
content using both hash functions. This had some advantages:
|
||||
* Using one signature instead of two speeds up the signing process.
|
||||
* Having one signed payload with both hashes allows the signer to
|
||||
attest to the sha1-name and sha256-name referring to the same object.
|
||||
* All users consume the same signature. Broken signatures are likely
|
||||
to be detected quickly using current versions of git.
|
||||
|
||||
However, it also came with disadvantages:
|
||||
* Verifying a signed object requires access to the sha1-names of all
|
||||
objects it references, even after the transition is complete and
|
||||
translation table is no longer needed for anything else. To support
|
||||
this, the design added fields such as "hash sha1 tree <sha1-name>"
|
||||
and "hash sha1 parent <sha1-name>" to the sha256-content of a signed
|
||||
commit, complicating the conversion process.
|
||||
* Allowing signed objects without a sha1 (for after the transition is
|
||||
complete) complicated the design further, requiring a "nohash sha1"
|
||||
field to suppress including "hash sha1" fields in the sha256-content
|
||||
and signed payload.
|
||||
|
||||
Lazily populated translation table
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Some of the work of building the translation table could be deferred to
|
||||
push time, but that would significantly complicate and slow down pushes.
|
||||
Calculating the sha1-name at object creation time at the same time it is
|
||||
being streamed to disk and having its sha256-name calculated should be
|
||||
an acceptable cost.
|
||||
|
||||
Document History
|
||||
----------------
|
||||
|
||||
2017-03-03
|
||||
bmwill@google.com, jonathantanmy@google.com, jrnieder@gmail.com,
|
||||
sbeller@google.com
|
||||
|
||||
Initial version sent to
|
||||
http://public-inbox.org/git/20170304011251.GA26789@aiede.mtv.corp.google.com
|
||||
|
||||
2017-03-03 jrnieder@gmail.com
|
||||
Incorporated suggestions from jonathantanmy and sbeller:
|
||||
* describe purpose of signed objects with each hash type
|
||||
* redefine signed object verification using object content under the
|
||||
first hash function
|
||||
|
||||
2017-03-06 jrnieder@gmail.com
|
||||
* Use SHA3-256 instead of SHA2 (thanks, Linus and brian m. carlson).[1][2]
|
||||
* Make sha3-based signatures a separate field, avoiding the need for
|
||||
"hash" and "nohash" fields (thanks to peff[3]).
|
||||
* Add a sorting phase to fetch (thanks to Junio for noticing the need
|
||||
for this).
|
||||
* Omit blobs from the topological sort during fetch (thanks to peff).
|
||||
* Discuss alternates, git notes, and git servers in the caveats
|
||||
section (thanks to Junio Hamano, brian m. carlson[4], and Shawn
|
||||
Pearce).
|
||||
* Clarify language throughout (thanks to various commenters,
|
||||
especially Junio).
|
||||
|
||||
2017-09-27 jrnieder@gmail.com, sbeller@google.com
|
||||
* use placeholder NewHash instead of SHA3-256
|
||||
* describe criteria for picking a hash function.
|
||||
* include a transition plan (thanks especially to Brandon Williams
|
||||
for fleshing these ideas out)
|
||||
* define the translation table (thanks, Shawn Pearce[5], Jonathan
|
||||
Tan, and Masaya Suzuki)
|
||||
* avoid loose object overhead by packing more aggressively in
|
||||
"git gc --auto"
|
||||
|
||||
Later history:
|
||||
|
||||
See the history of this file in git.git for the history of subsequent
|
||||
edits. This document history is no longer being maintained as it
|
||||
would now be superfluous to the commit log.
|
||||
|
||||
[1] http://public-inbox.org/git/CA+55aFzJtejiCjV0e43+9oR3QuJK2PiFiLQemytoLpyJWe6P9w@mail.gmail.com/
|
||||
[2] http://public-inbox.org/git/CA+55aFz+gkAsDZ24zmePQuEs1XPS9BP_s8O7Q4wQ7LV7X5-oDA@mail.gmail.com/
|
||||
[3] http://public-inbox.org/git/20170306084353.nrns455dvkdsfgo5@sigill.intra.peff.net/
|
||||
[4] http://public-inbox.org/git/20170304224936.rqqtkdvfjgyezsht@genre.crustytoothpaste.net
|
||||
[5] https://public-inbox.org/git/CAJo=hJtoX9=AyLHHpUJS7fueV9ciZ_MNpnEPHUz8Whui6g9F0A@mail.gmail.com/
|
||||
518
Documentation/technical/http-protocol.txt
Normal file
|
|
@ -0,0 +1,518 @@
|
|||
HTTP transfer protocols
|
||||
=======================
|
||||
|
||||
Git supports two HTTP-based transfer protocols: a "dumb" protocol
|
||||
which requires only a standard HTTP server on the server end of the
|
||||
connection, and a "smart" protocol which requires a Git aware CGI
|
||||
(or server module). This document describes both protocols.
|
||||
|
||||
As a design feature smart clients can automatically upgrade "dumb"
|
||||
protocol URLs to smart URLs. This permits all users to have the
|
||||
same published URL, and the peers automatically select the most
|
||||
efficient transport available to them.
|
||||
|
||||
|
||||
URL Format
|
||||
----------
|
||||
|
||||
URLs for Git repositories accessed by HTTP use the standard HTTP
|
||||
URL syntax documented by RFC 1738, so they are of the form:
|
||||
|
||||
http://<host>:<port>/<path>?<searchpart>
|
||||
|
||||
Within this documentation the placeholder `$GIT_URL` will stand for
|
||||
the http:// repository URL entered by the end-user.
|
||||
|
||||
Servers SHOULD handle all requests to locations matching `$GIT_URL`, as
|
||||
both the "smart" and "dumb" HTTP protocols used by Git operate
|
||||
by appending additional path components onto the end of the user
|
||||
supplied `$GIT_URL` string.
|
||||
|
||||
An example of a dumb client requesting a loose object:
|
||||
|
||||
$GIT_URL: http://example.com:8080/git/repo.git
|
||||
URL request: http://example.com:8080/git/repo.git/objects/d0/49f6c27a2244e12041955e262a404c7faba355
|
||||
|
||||
An example of a smart request to a catch-all gateway:
|
||||
|
||||
$GIT_URL: http://example.com/daemon.cgi?svc=git&q=
|
||||
URL request: http://example.com/daemon.cgi?svc=git&q=/info/refs&service=git-receive-pack
|
||||
|
||||
An example of a request to a submodule:
|
||||
|
||||
$GIT_URL: http://example.com/git/repo.git/path/submodule.git
|
||||
URL request: http://example.com/git/repo.git/path/submodule.git/info/refs
|
||||
|
||||
Clients MUST strip a trailing `/`, if present, from the user supplied
|
||||
`$GIT_URL` string to prevent empty path tokens (`//`) from appearing
|
||||
in any URL sent to a server. Compatible clients MUST expand
|
||||
`$GIT_URL/info/refs` as `foo/info/refs` and not `foo//info/refs`.
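
The appending and slash-stripping rules above can be sketched in Python
as follows. The helper name `info_refs_url` is hypothetical, and choosing
`&` over `?` when `$GIT_URL` already carries a query string is an
assumption inferred from the catch-all gateway example, not a rule this
document states.

```python
def info_refs_url(git_url, service=None):
    """Build the ref-discovery URL for a user-supplied $GIT_URL (illustrative)."""
    # Strip trailing slashes so no empty path token ("//") is produced.
    # (The text requires stripping one trailing "/"; stripping all of them
    # is a slightly stricter choice made for this sketch.)
    base = git_url.rstrip("/")
    url = base + "/info/refs"
    if service is not None:
        # Assumption: extend an existing query string with "&", as in the
        # catch-all gateway example; otherwise start one with "?".
        separator = "&" if "?" in base else "?"
        url += separator + "service=" + service
    return url
```

Called with `http://example.com/git/repo.git/` and no service, the sketch
yields the dumb-protocol URL `http://example.com/git/repo.git/info/refs`.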
|
||||
|
||||
|
||||
Authentication
|
||||
--------------
|
||||
|
||||
Standard HTTP authentication is used if authentication is required
|
||||
to access a repository, and MAY be configured and enforced by the
|
||||
HTTP server software.
|
||||
|
||||
Because Git repositories are accessed by standard path components,
|
||||
server administrators MAY use directory based permissions within
|
||||
their HTTP server to control repository access.
|
||||
|
||||
Clients SHOULD support Basic authentication as described by RFC 2617.
|
||||
Servers SHOULD support Basic authentication by relying upon the
|
||||
HTTP server placed in front of the Git server software.
|
||||
|
||||
Servers SHOULD NOT require HTTP cookies for the purposes of
|
||||
authentication or access control.
|
||||
|
||||
Clients and servers MAY support other common forms of HTTP based
|
||||
authentication, such as Digest authentication.
|
||||
|
||||
|
||||
SSL
|
||||
---
|
||||
|
||||
Clients and servers SHOULD support SSL, particularly to protect
|
||||
passwords when relying on Basic HTTP authentication.
|
||||
|
||||
|
||||
Session State
|
||||
-------------
|
||||
|
||||
The Git over HTTP protocol (much like HTTP itself) is stateless
|
||||
from the perspective of the HTTP server side. All state MUST be
|
||||
retained and managed by the client process. This permits simple
|
||||
round-robin load-balancing on the server side, without needing to
|
||||
worry about state management.
|
||||
|
||||
Clients MUST NOT require state management on the server side in
|
||||
order to function correctly.
|
||||
|
||||
Servers MUST NOT require HTTP cookies in order to function correctly.
|
||||
Clients MAY store and forward HTTP cookies during request processing
|
||||
as described by RFC 2616 (HTTP/1.1). Servers SHOULD ignore any
|
||||
cookies sent by a client.
|
||||
|
||||
|
||||
General Request Processing
|
||||
--------------------------
|
||||
|
||||
Except where noted, all standard HTTP behavior SHOULD be assumed
|
||||
by both client and server. This includes (but is not necessarily
|
||||
limited to):
|
||||
|
||||
If there is no repository at `$GIT_URL`, or the resource pointed to by a
|
||||
location matching `$GIT_URL` does not exist, the server MUST NOT respond
|
||||
with a `200 OK` response. A server SHOULD respond with
|
||||
`404 Not Found`, `410 Gone`, or any other suitable HTTP status code
|
||||
which does not imply the resource exists as requested.
|
||||
|
||||
If there is a repository at `$GIT_URL`, but access is not currently
|
||||
permitted, the server MUST respond with the `403 Forbidden` HTTP
|
||||
status code.
|
||||
|
||||
Servers SHOULD support both HTTP 1.0 and HTTP 1.1.
|
||||
Servers SHOULD support chunked encoding for both request and response
|
||||
bodies.
|
||||
|
||||
Clients SHOULD support both HTTP 1.0 and HTTP 1.1.
|
||||
Clients SHOULD support chunked encoding for both request and response
|
||||
bodies.
|
||||
|
||||
Servers MAY return ETag and/or Last-Modified headers.
|
||||
|
||||
Clients MAY revalidate cached entities by including If-Modified-Since
|
||||
and/or If-None-Match request headers.
|
||||
|
||||
Servers MAY return `304 Not Modified` if the relevant headers appear
|
||||
in the request and the entity has not changed. Clients MUST treat
|
||||
`304 Not Modified` identically to `200 OK` by reusing the cached entity.
|
||||
|
||||
Clients MAY reuse a cached entity without revalidation if the
|
||||
Cache-Control and/or Expires header permits caching. Clients and
|
||||
servers MUST follow RFC 2616 for cache controls.
|
||||
|
||||
|
||||
Discovering References
|
||||
----------------------
|
||||
|
||||
All HTTP clients MUST begin either a fetch or a push exchange by
|
||||
discovering the references available on the remote repository.
|
||||
|
||||
Dumb Clients
|
||||
~~~~~~~~~~~~
|
||||
|
||||
HTTP clients that only support the "dumb" protocol MUST discover
|
||||
references by making a request for the special info/refs file of
|
||||
the repository.
|
||||
|
||||
Dumb HTTP clients MUST make a `GET` request to `$GIT_URL/info/refs`,
|
||||
without any search/query parameters.
|
||||
|
||||
C: GET $GIT_URL/info/refs HTTP/1.0
|
||||
|
||||
S: 200 OK
|
||||
S:
|
||||
S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint
|
||||
S: d049f6c27a2244e12041955e262a404c7faba355 refs/heads/master
|
||||
S: 2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0
|
||||
S: a3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{}
|
||||
|
||||
The Content-Type of the returned info/refs entity SHOULD be
|
||||
`text/plain; charset=utf-8`, but MAY be any content type.
|
||||
Clients MUST NOT attempt to validate the returned Content-Type.
|
||||
Dumb servers MUST NOT return a content type starting with
|
||||
`application/x-git-`.
|
||||
|
||||
Cache-Control headers MAY be returned to disable caching of the
|
||||
returned entity.
|
||||
|
||||
When examining the response clients SHOULD only examine the HTTP
|
||||
status code. Valid responses are `200 OK`, or `304 Not Modified`.
|
||||
|
||||
The returned content is a UNIX formatted text file describing
|
||||
each ref and its known value. The file SHOULD be sorted by name
|
||||
according to the C locale ordering. The file SHOULD NOT include
|
||||
the default ref named `HEAD`.
|
||||
|
||||
info_refs = *( ref_record )
|
||||
ref_record = any_ref / peeled_ref
|
||||
|
||||
any_ref = obj-id HTAB refname LF
|
||||
peeled_ref = obj-id HTAB refname LF
|
||||
obj-id HTAB refname "^{}" LF
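
A client-side reading of the `info_refs` grammar above might look like
the following minimal sketch. The function name `parse_dumb_info_refs`
and the choice to return two dictionaries are illustrative assumptions;
the grammar itself only fixes the `obj-id HTAB refname LF` record layout
and the `^{}` suffix for peeled refs.

```python
def parse_dumb_info_refs(text):
    """Parse a dumb-protocol info/refs body into (refs, peeled) dicts."""
    refs, peeled = {}, {}
    for line in text.split("\n"):
        if not line:
            continue  # tolerate the empty string after the final LF
        obj_id, refname = line.split("\t", 1)
        if refname.endswith("^{}"):
            # A peeled record gives the object an annotated tag points at.
            peeled[refname[:-3]] = obj_id
        else:
            refs[refname] = obj_id
    return refs, peeled
```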
|
||||
|
||||
Smart Clients
|
||||
~~~~~~~~~~~~~
|
||||
|
||||
HTTP clients that support the "smart" protocol (or both the
|
||||
"smart" and "dumb" protocols) MUST discover references by making
|
||||
a parameterized request for the info/refs file of the repository.
|
||||
|
||||
The request MUST contain exactly one query parameter,
|
||||
`service=$servicename`, where `$servicename` MUST be the service
|
||||
name the client wishes to contact to complete the operation.
|
||||
The request MUST NOT contain additional query parameters.
|
||||
|
||||
C: GET $GIT_URL/info/refs?service=git-upload-pack HTTP/1.0
|
||||
|
||||
dumb server reply:
|
||||
|
||||
S: 200 OK
|
||||
S:
|
||||
S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint
|
||||
S: d049f6c27a2244e12041955e262a404c7faba355 refs/heads/master
|
||||
S: 2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0
|
||||
S: a3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{}
|
||||
|
||||
smart server reply:
|
||||
|
||||
S: 200 OK
|
||||
S: Content-Type: application/x-git-upload-pack-advertisement
|
||||
S: Cache-Control: no-cache
|
||||
S:
|
||||
S: 001e# service=git-upload-pack\n
|
||||
S: 0000
|
||||
S: 004895dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint\0multi_ack\n
|
||||
S: 003fd049f6c27a2244e12041955e262a404c7faba355 refs/heads/master\n
|
||||
S: 003c2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0\n
|
||||
S: 003fa3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{}\n
|
||||
S: 0000
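
Each record in the smart server reply is a pkt-line: four hexadecimal
digits giving the total length (the four digits included), followed by
the payload, with `0000` acting as the flush marker. The framing can be
sketched as below; the names `pkt_line` and `FLUSH_PKT` are hypothetical
helpers for illustration, not part of any Git API.

```python
def pkt_line(payload: bytes) -> bytes:
    """Frame one payload as a pkt-line; the length counts the 4-digit prefix."""
    return b"%04x" % (len(payload) + 4) + payload


# A flush-pkt carries no payload and is always the literal "0000".
FLUSH_PKT = b"0000"
```

For instance, the 26-byte payload `# service=git-upload-pack\n` is framed
with the prefix `001e` (26 + 4 = 30 = 0x1e), matching the first line of
the reply shown above.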
|
||||
|
||||
The client may send Extra Parameters (see
|
||||
Documentation/technical/pack-protocol.txt) as a colon-separated string
|
||||
in the Git-Protocol HTTP header.
|
||||
|
||||
Dumb Server Response
|
||||
^^^^^^^^^^^^^^^^^^^^
|
||||
Dumb servers MUST respond with the dumb server reply format.
|
||||
|
||||
See the prior section under dumb clients for a more detailed
|
||||
description of the dumb server response.
|
||||
|
||||
Smart Server Response
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
If the server does not recognize the requested service name, or the
|
||||
requested service name has been disabled by the server administrator,
|
||||
the server MUST respond with the `403 Forbidden` HTTP status code.
|
||||
|
||||
Otherwise, smart servers MUST respond with the smart server reply
|
||||
format for the requested service name.
|
||||
|
||||
Cache-Control headers SHOULD be used to disable caching of the
|
||||
returned entity.
|
||||
|
||||
The Content-Type MUST be `application/x-$servicename-advertisement`.
|
||||
Clients SHOULD fall back to the dumb protocol if another content
|
||||
type is returned. When falling back to the dumb protocol clients
|
||||
SHOULD NOT make an additional request to `$GIT_URL/info/refs`, but
|
||||
instead SHOULD use the response already in hand. Clients MUST NOT
|
||||
continue if they do not support the dumb protocol.
|
||||
|
||||
Clients MUST validate the status code is either `200 OK` or
|
||||
`304 Not Modified`.
|
||||
|
||||
Clients MUST validate the first five bytes of the response entity
|
||||
matches the regex `^[0-9a-f]{4}#`. If this test fails, clients
|
||||
MUST NOT continue.
|
||||
|
||||
Clients MUST parse the entire response as a sequence of pkt-line
|
||||
records.
|
||||
|
||||
Clients MUST verify the first pkt-line is `# service=$servicename`.
|
||||
Servers MUST set $servicename to be the request parameter value.
|
||||
Servers SHOULD include an LF at the end of this line.
|
||||
Clients MUST ignore an LF at the end of the line.
|
||||
|
||||
Servers MUST terminate the response with the magic `0000` end
|
||||
pkt-line marker.
|
||||
|
||||
The returned response is a pkt-line stream describing each ref and
|
||||
its known value. The stream SHOULD be sorted by name according to
|
||||
the C locale ordering. The stream SHOULD include the default ref
|
||||
named `HEAD` as the first ref. The stream MUST include capability
|
||||
declarations behind a NUL on the first ref.
|
||||
|
||||
The returned response contains "version 1" if "version=1" was sent as an
|
||||
Extra Parameter.
|
||||
|
||||
smart_reply = PKT-LINE("# service=$servicename" LF)
|
||||
"0000"
|
||||
*1("version 1")
|
||||
ref_list
|
||||
"0000"
|
||||
ref_list = empty_list / non_empty_list
|
||||
|
||||
empty_list = PKT-LINE(zero-id SP "capabilities^{}" NUL cap-list LF)
|
||||
|
||||
non_empty_list = PKT-LINE(obj-id SP name NUL cap-list LF)
|
||||
*ref_record
|
||||
|
||||
cap-list = capability *(SP capability)
|
||||
capability = 1*(LC_ALPHA / DIGIT / "-" / "_")
|
||||
LC_ALPHA = %x61-7A
|
||||
|
||||
ref_record = any_ref / peeled_ref
|
||||
any_ref = PKT-LINE(obj-id SP name LF)
|
||||
peeled_ref = PKT-LINE(obj-id SP name LF)
|
||||
PKT-LINE(obj-id SP name "^{}" LF)
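
The client-side checks for a smart server response (five-byte prefix
test, service line, final flush, capabilities behind the NUL) can be
sketched as follows. All function names are hypothetical, and skipping
the `capabilities^{}` placeholder and any `version` line is one possible
handling chosen for this sketch.

```python
import re


def read_pkt_lines(data: bytes):
    """Yield each pkt-line payload; a flush-pkt ("0000") is yielded as None."""
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + 4], 16)
        if length == 0:
            yield None
            pos += 4
            continue
        if length < 4 or pos + length > len(data):
            raise ValueError("malformed pkt-line")
        yield data[pos + 4:pos + length]
        pos += length


def parse_smart_advertisement(body: bytes, service: str):
    """Return (refs, capabilities) from a smart info/refs response body."""
    if not re.match(rb"^[0-9a-f]{4}#", body):
        raise ValueError("not a smart response")
    packets = list(read_pkt_lines(body))
    # The first pkt-line names the service; a trailing LF is ignored.
    expected = b"# service=" + service.encode()
    if packets[0] is None or packets[0].rstrip(b"\n") != expected:
        raise ValueError("unexpected service line")
    if packets[-1] is not None:
        raise ValueError("missing final flush-pkt")
    refs, caps = [], []
    for payload in packets[1:]:
        if payload is None or payload.startswith(b"version "):
            continue
        line = payload.rstrip(b"\n")
        if b"\0" in line:  # capabilities ride behind a NUL on the first ref
            line, cap_bytes = line.split(b"\0", 1)
            caps = cap_bytes.decode().split()
        obj_id, name = line.split(b" ", 1)
        if name != b"capabilities^{}":  # placeholder used by an empty list
            refs.append((obj_id.decode(), name.decode()))
    return refs, caps
```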
|
||||
|
||||
|
||||
Smart Service git-upload-pack
|
||||
------------------------------
|
||||
This service reads from the repository pointed to by `$GIT_URL`.
|
||||
|
||||
Clients MUST first perform ref discovery with
|
||||
`$GIT_URL/info/refs?service=git-upload-pack`.
|
||||
|
||||
C: POST $GIT_URL/git-upload-pack HTTP/1.0
|
||||
C: Content-Type: application/x-git-upload-pack-request
|
||||
C:
|
||||
C: 0032want 0a53e9ddeaddad63ad106860237bbf53411d11a7\n
|
||||
C: 0032have 441b40d833fdfa93eb2908e52742248faf0ee993\n
|
||||
C: 0000
|
||||
|
||||
S: 200 OK
|
||||
S: Content-Type: application/x-git-upload-pack-result
|
||||
S: Cache-Control: no-cache
|
||||
S:
|
||||
S: ....ACK %s, continue
|
||||
S: ....NAK
|
||||
|
||||
Clients MUST NOT reuse or revalidate a cached response.
|
||||
Servers MUST include sufficient Cache-Control headers
|
||||
to prevent caching of the response.
|
||||
|
||||
Servers SHOULD support all capabilities defined here.
|
||||
|
||||
Clients MUST send at least one "want" command in the request body.
|
||||
Clients MUST NOT reference an id in a "want" command which did not
|
||||
appear in the response obtained through ref discovery unless the
|
||||
server advertises capability `allow-tip-sha1-in-want` or
|
||||
`allow-reachable-sha1-in-want`.
|
||||
|
||||
compute_request = want_list
|
||||
have_list
|
||||
request_end
|
||||
request_end = "0000" / "done"
|
||||
|
||||
want_list = PKT-LINE(want SP cap_list LF)
|
||||
*(want_pkt)
|
||||
want_pkt = PKT-LINE(want LF)
|
||||
want = "want" SP id
|
||||
cap_list = capability *(SP capability)
|
||||
|
||||
have_list = *PKT-LINE("have" SP id LF)
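
A request body following the `compute_request` grammar above could be
assembled as in this sketch. The function names are hypothetical; sending
`done` as a pkt-line with a trailing LF follows the `0009done` example
later in this document.

```python
def pkt_line(payload: bytes) -> bytes:
    """Frame one payload as a pkt-line; the length counts the 4-digit prefix."""
    return b"%04x" % (len(payload) + 4) + payload


def build_upload_pack_request(wants, haves=(), capabilities=(), done=False):
    """Serialize want/have commands in the order the grammar requires."""
    if not wants:
        raise ValueError('at least one "want" is required')
    out = []
    for i, obj_id in enumerate(wants):
        line = "want " + obj_id
        if i == 0 and capabilities:
            # Only the first want line carries the capability list.
            line += " " + " ".join(capabilities)
        out.append(pkt_line(line.encode() + b"\n"))
    for obj_id in haves:
        out.append(pkt_line(b"have " + obj_id.encode() + b"\n"))
    # request_end: a "done" command, or a flush-pkt if negotiation continues.
    out.append(pkt_line(b"done\n") if done else b"0000")
    return b"".join(out)
```

With one want, one have, and no capabilities, the output reproduces the
three-line request body shown in the example exchange above.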
|
||||
|
||||
TODO: Document this further.
|
||||
|
||||
The Negotiation Algorithm
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
The computation to select the minimal pack proceeds as follows
|
||||
(C = client, S = server):
|
||||
|
||||
'init step:'
|
||||
|
||||
C: Use ref discovery to obtain the advertised refs.
|
||||
|
||||
C: Place any object seen into set `advertised`.
|
||||
|
||||
C: Build an empty set, `common`, to hold the objects that are later
|
||||
determined to be on both ends.
|
||||
|
||||
C: Build a set, `want`, of the objects from `advertised` the client
|
||||
wants to fetch, based on what it saw during ref discovery.
|
||||
|
||||
C: Start a queue, `c_pending`, ordered by commit time (popping newest
|
||||
first). Add all client refs. When a commit is popped from
|
||||
the queue its parents SHOULD be automatically inserted back.
|
||||
Commits MUST only enter the queue once.
|
||||
|
||||
'one compute step:'
|
||||
|
||||
C: Send one `$GIT_URL/git-upload-pack` request:
|
||||
|
||||
C: 0032want <want #1>...............................
|
||||
C: 0032want <want #2>...............................
|
||||
....
|
||||
C: 0032have <common #1>.............................
|
||||
C: 0032have <common #2>.............................
|
||||
....
|
||||
C: 0032have <have #1>...............................
|
||||
C: 0032have <have #2>...............................
|
||||
....
|
||||
C: 0000
|
||||
|
||||
The stream is organized into "commands", with each command
|
||||
appearing by itself in a pkt-line. Within a command line,
|
||||
the text leading up to the first space is the command name,
|
||||
and the remainder of the line to the first LF is the value.
|
||||
Command lines are terminated with an LF as the last byte of
|
||||
the pkt-line value.
|
||||
|
||||
Commands MUST appear in the following order, if they appear
|
||||
at all in the request stream:
|
||||
|
||||
* "want"
|
||||
* "have"
|
||||
|
||||
The stream is terminated by a pkt-line flush (`0000`).
|
||||
|
||||
A single "want" or "have" command MUST have one hex formatted
|
||||
SHA-1 as its value. Multiple SHA-1s MUST be sent by sending
|
||||
multiple commands.
|
||||
|
||||
The `have` list is created by popping the first 32 commits
|
||||
from `c_pending`. Fewer can be supplied if `c_pending` empties.
|
||||
|
||||
If the client has sent 256 "have" commits and has not yet
|
||||
received one of those back from `s_common`, or the client has
|
||||
emptied `c_pending`, it SHOULD include a "done" command to let
|
||||
the server know it won't proceed:
|
||||
|
||||
C: 0009done
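
The client-side handling of `c_pending` (newest commit first, parents
re-queued, each commit queued only once, at most 32 per batch) can be
sketched with a heap. The function `pop_have_batch` and the `parents`
and `commit_time` lookup tables are hypothetical stand-ins for whatever
commit-graph access a real client has.

```python
import heapq


def pop_have_batch(pending, seen, parents, commit_time, batch_size=32):
    """Pop up to batch_size commits, newest first, queueing their parents.

    pending is a heap of (-commit_time, commit) pairs; seen holds every
    commit that has ever been queued.
    """
    batch = []
    while pending and len(batch) < batch_size:
        _, commit = heapq.heappop(pending)
        batch.append(commit)
        for parent in parents.get(commit, ()):
            if parent not in seen:  # commits enter the queue only once
                seen.add(parent)
                heapq.heappush(pending, (-commit_time[parent], parent))
    return batch
```

Each returned batch would become the `have` lines of one compute step;
an empty batch indicates that `c_pending` has been exhausted.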
|
||||
|
||||
S: Parse the git-upload-pack request:
|
||||
|
||||
Verify all objects in `want` are directly reachable from refs.
|
||||
|
||||
The server MAY walk backwards through history or through
|
||||
the reflog to permit slightly stale requests.
|
||||
|
||||
If no "want" objects are received, send an error:
|
||||
TODO: Define error if no "want" lines are requested.
|
||||
|
||||
If any "want" object is not reachable, send an error:
|
||||
TODO: Define error if an invalid "want" is requested.
|
||||
|
||||
Create an empty list, `s_common`.
|
||||
|
||||
If "have" was sent:
|
||||
|
||||
Loop through the objects in the order supplied by the client.
|
||||
|
||||
For each object, if the server has the object reachable from
|
||||
a ref, add it to `s_common`. If a commit is added to `s_common`,
|
||||
do not add any ancestors, even if they also appear in `have`.
|
||||
|
||||
S: Send the git-upload-pack response:
|
||||
|
||||
If the server has found a closed set of objects to pack or the
|
||||
request ends with "done", it replies with the pack.
|
||||
TODO: Document the pack based response
|
||||
|
||||
S: PACK...
|
||||
|
||||
The returned stream is the side-band-64k protocol supported
|
||||
by the git-upload-pack service, and the pack is embedded into
|
||||
stream 1. Progress messages from the server side MAY appear
|
||||
in stream 2.
|
||||
|
||||
Here a "closed set of objects" is defined to have at least
|
||||
one path from every "want" to at least one "common" object.
|
||||
|
||||
If the server needs more information, it replies with a
|
||||
status continue response:
|
||||
TODO: Document the non-pack response
|
||||
|
||||
C: Parse the upload-pack response:
|
||||
TODO: Document parsing response
|
||||
|
||||
'Do another compute step.'
|
||||
|
||||
|
||||
Smart Service git-receive-pack
|
||||
------------------------------
|
||||
This service writes to the repository pointed to by `$GIT_URL`.
|
||||
|
||||
Clients MUST first perform ref discovery with
|
||||
`$GIT_URL/info/refs?service=git-receive-pack`.
|
||||
|
||||
C: POST $GIT_URL/git-receive-pack HTTP/1.0
|
||||
C: Content-Type: application/x-git-receive-pack-request
|
||||
C:
|
||||
C: ....0a53e9ddeaddad63ad106860237bbf53411d11a7 441b40d833fdfa93eb2908e52742248faf0ee993 refs/heads/maint\0 report-status
|
||||
C: 0000
|
||||
C: PACK....
|
||||
|
||||
S: 200 OK
|
||||
S: Content-Type: application/x-git-receive-pack-result
|
||||
S: Cache-Control: no-cache
|
||||
S:
|
||||
S: ....
|
||||
|
||||
Clients MUST NOT reuse or revalidate a cached response.
|
||||
Servers MUST include sufficient Cache-Control headers
|
||||
to prevent caching of the response.
|
||||
|
||||
Servers SHOULD support all capabilities defined here.
|
||||
|
||||
Clients MUST send at least one command in the request body.
|
||||
Within the command portion of the request body clients SHOULD send
|
||||
the id obtained through ref discovery as old_id.
|
||||
|
||||
update_request = command_list
|
||||
"PACK" <binary data>
|
||||
|
||||
command_list = PKT-LINE(command NUL cap_list LF)
|
||||
*(command_pkt)
|
||||
command_pkt = PKT-LINE(command LF)
|
||||
cap_list = *(SP capability) SP
|
||||
|
||||
command = create / delete / update
|
||||
create = zero-id SP new_id SP name
|
||||
delete = old_id SP zero-id SP name
|
||||
update = old_id SP new_id SP name
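
The three command forms differ only in which side carries the zero-id.
A minimal sketch of that distinction follows; `classify_command` is a
hypothetical helper, the zero-id is taken to be 40 ASCII zeros (the
SHA-1 form), and rejecting a command where both ids are zero is an
assumption of this sketch rather than a rule stated here.

```python
ZERO_ID = "0" * 40  # all-zero SHA-1 object name


def classify_command(old_id: str, new_id: str) -> str:
    """Name the grammar production a receive-pack command falls under."""
    if old_id == ZERO_ID and new_id == ZERO_ID:
        # Assumption: a command that neither creates nor deletes anything
        # is treated as invalid.
        raise ValueError("old_id and new_id cannot both be the zero-id")
    if old_id == ZERO_ID:
        return "create"
    if new_id == ZERO_ID:
        return "delete"
    return "update"
```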
|
||||
|
||||
TODO: Document this further.
|
||||
|
||||
|
||||
References
|
||||
----------
|
||||
|
||||
http://www.ietf.org/rfc/rfc1738.txt[RFC 1738: Uniform Resource Locators (URL)]
|
||||
http://www.ietf.org/rfc/rfc2616.txt[RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1]
|
||||
link:technical/pack-protocol.html
|
||||
link:technical/protocol-capabilities.html
|
||||
357
Documentation/technical/index-format.txt
Normal file
|
|
@ -0,0 +1,357 @@
|
|||
Git index format
|
||||
================
|
||||
|
||||
== The Git index file has the following format
|
||||
|
||||
All binary numbers are in network byte order. Version 2 is described
|
||||
here unless stated otherwise.
|
||||
|
||||
- A 12-byte header consisting of
|
||||
|
||||
4-byte signature:
|
||||
The signature is { 'D', 'I', 'R', 'C' } (stands for "dircache")
|
||||
|
||||
4-byte version number:
|
||||
The currently supported versions are 2, 3 and 4.
|
||||
|
||||
32-bit number of index entries.
|
||||
|
||||
- A number of sorted index entries (see below).
|
||||
|
||||
- Extensions
|
||||
|
||||
Extensions are identified by signature. Optional extensions can
|
||||
be ignored if Git does not understand them.
|
||||
|
||||
Git currently supports cached tree and resolve undo extensions.
|
||||
|
||||
4-byte extension signature. If the first byte is 'A'..'Z' the
|
||||
extension is optional and can be ignored.
|
||||
|
||||
32-bit size of the extension
|
||||
|
||||
Extension data
|
||||
|
||||
- 160-bit SHA-1 over the content of the index file before this
|
||||
checksum.
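
The header and trailing checksum described above can be checked with a
few lines of Python. This is a sketch under the stated layout (network
byte order, 12-byte header, 20-byte SHA-1 trailer); the function name
`parse_index_header` is hypothetical.

```python
import hashlib
import struct


def parse_index_header(data: bytes):
    """Return (version, entry_count) after checking signature and checksum."""
    # ">4sII": big-endian 4-byte signature, 32-bit version, 32-bit count.
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("bad index signature")
    if version not in (2, 3, 4):
        raise ValueError("unsupported index version %d" % version)
    # The final 20 bytes are the SHA-1 of everything that precedes them.
    if hashlib.sha1(data[:-20]).digest() != data[-20:]:
        raise ValueError("index checksum mismatch")
    return version, entry_count
```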
|
||||
|
||||
== Index entry
|
||||
|
||||
Index entries are sorted in ascending order on the name field,
|
||||
interpreted as a string of unsigned bytes (i.e. memcmp() order, no
|
||||
localization, no special casing of directory separator '/'). Entries
|
||||
with the same name are sorted by their stage field.
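
As a small illustration of this ordering: Python compares `bytes`
objects byte by byte as unsigned values, with a shorter prefix sorting
first, which matches the order described here. The
`index_sort_key` helper is hypothetical.

```python
def index_sort_key(name: bytes, stage: int):
    """Sort key: raw name bytes first, then the stage field."""
    return (name, stage)


entries = [(b"a/b", 0), (b"a.c", 0), (b"a", 2), (b"a", 1)]
ordered = sorted(entries, key=lambda e: index_sort_key(*e))
```

Note that `a.c` sorts before `a/b` because `.` (0x2e) is smaller than
`/` (0x2f); the directory separator receives no special treatment.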
|
||||
|
||||
32-bit ctime seconds, the last time a file's metadata changed
|
||||
this is stat(2) data
|
||||
|
||||
32-bit ctime nanosecond fractions
|
||||
this is stat(2) data
|
||||
|
||||
32-bit mtime seconds, the last time a file's data changed
|
||||
this is stat(2) data
|
||||
|
||||
32-bit mtime nanosecond fractions
|
||||
this is stat(2) data
|
||||
|
||||
32-bit dev
|
||||
this is stat(2) data
|
||||
|
||||
32-bit ino
|
||||
this is stat(2) data
|
||||
|
||||
32-bit mode, split into (high to low bits)
|
||||
|
||||
4-bit object type
|
||||
valid values in binary are 1000 (regular file), 1010 (symbolic link)
|
||||
and 1110 (gitlink)
|
||||
|
||||
3-bit unused
|
||||
|
||||
9-bit unix permission. Only 0755 and 0644 are valid for regular files.
|
||||
Symbolic links and gitlinks have value 0 in this field.
|
||||
|
||||
32-bit uid
|
||||
this is stat(2) data
|
||||
|
||||
32-bit gid
|
||||
this is stat(2) data
|
||||
|
||||
32-bit file size
|
||||
This is the on-disk size from stat(2), truncated to 32-bit.
|
||||
|
||||
160-bit SHA-1 for the represented object
|
||||
|
||||
A 16-bit 'flags' field split into (high to low bits)
|
||||
|
||||
1-bit assume-valid flag
|
||||
|
||||
1-bit extended flag (must be zero in version 2)
|
||||
|
||||
2-bit stage (during merge)
|
||||
|
||||
12-bit name length if the length is less than 0xFFF; otherwise 0xFFF
|
||||
is stored in this field.
|
||||
|
||||
(Version 3 or later) A 16-bit field, only applicable if the
|
||||
"extended flag" above is 1, split into (high to low bits).
|
||||
|
||||
1-bit reserved for future use
|
||||
|
||||
1-bit skip-worktree flag (used by sparse checkout)
|
||||
|
||||
1-bit intent-to-add flag (used by "git add -N")
|
||||
|
||||
13-bit unused, must be zero
|
||||
|
||||
Entry path name (variable length) relative to top level directory
|
||||
(without leading slash). '/' is used as path separator. The special
|
||||
path components ".", ".." and ".git" (without quotes) are disallowed.
|
||||
Trailing slash is also disallowed.
|
||||
|
||||
The exact encoding is undefined, but the '.' and '/' characters
|
||||
are encoded in 7-bit ASCII and the encoding cannot contain a NUL
|
||||
byte (iow, this is a UNIX pathname).
|
||||
|
||||
(Version 4) In version 4, the entry path name is prefix-compressed
|
||||
relative to the path name for the previous entry (the very first
|
||||
entry is encoded as if the path name for the previous entry is an
|
||||
empty string). At the beginning of an entry, an integer N in the
|
||||
variable width encoding (the same encoding as the offset is encoded
|
||||
for OFS_DELTA pack entries; see pack-format.txt) is stored, followed
|
||||
by a NUL-terminated string S. Removing N bytes from the end of the
|
||||
path name for the previous entry, and replacing it with the string S
|
||||
yields the path name for this entry.
|
||||
|
||||
1-8 nul bytes as necessary to pad the entry to a multiple of eight bytes
|
||||
while keeping the name NUL-terminated.
|
||||
|
||||
(Version 4) In version 4, the padding after the pathname does not
|
||||
exist.
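
The bit layouts of the mode and flags fields can be unpacked as in the
following sketch. The function names are hypothetical, and the sketch
assumes the object type, unused bits and permissions occupy the low 16
bits of the 32-bit mode field, as they do in a stat(2) `st_mode` value.

```python
def decode_mode(mode: int):
    """Split the mode field into (object_type, permissions)."""
    object_type = (mode >> 12) & 0xF  # 0b1000 file, 0b1010 symlink, 0b1110 gitlink
    permissions = mode & 0o777        # low 9 bits
    return object_type, permissions


def decode_flags(flags: int):
    """Split the 16-bit flags field, high bits first."""
    return {
        "assume_valid": bool(flags & 0x8000),
        "extended": bool(flags & 0x4000),
        "stage": (flags >> 12) & 0x3,
        "name_length": flags & 0xFFF,  # 0xFFF means "0xFFF or longer"
    }
```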
|
||||
|
||||
Interpretation of index entries in split index mode is completely
|
||||
different. See below for details.
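
The version 4 prefix compression of path names described above can be
decoded as sketched below. The function names are hypothetical; the
variable-width integer decoding follows the OFS_DELTA offset encoding,
in which each continuation byte adds one before shifting.

```python
def decode_offset_varint(data: bytes, pos: int):
    """Decode the OFS_DELTA-style variable-width integer starting at pos."""
    byte = data[pos]
    pos += 1
    value = byte & 0x7F
    while byte & 0x80:  # high bit set: another byte follows
        byte = data[pos]
        pos += 1
        value = ((value + 1) << 7) | (byte & 0x7F)
    return value, pos


def decode_v4_path(data: bytes, pos: int, previous: bytes):
    """Return (path, next_pos) for one prefix-compressed version 4 path."""
    strip, pos = decode_offset_varint(data, pos)
    if strip > len(previous):
        raise ValueError("cannot remove more bytes than the previous path has")
    end = data.index(b"\0", pos)  # the string S is NUL-terminated
    suffix = data[pos:end]
    path = previous[:len(previous) - strip] + suffix
    return path, end + 1
```

For the very first entry, `previous` is the empty string and N is
expected to be 0, so the stored string S is the whole path.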
|
||||
|
||||
== Extensions
|
||||
|
||||
=== Cached tree
|
||||
|
||||
Cached tree extension contains pre-computed hashes for trees that can
|
||||
be derived from the index. It helps speed up tree object generation
|
||||
from index for a new commit.
|
||||
|
||||
When a path is updated in index, the path must be invalidated and
|
||||
removed from tree cache.
|
||||
|
||||
The signature for this extension is { 'T', 'R', 'E', 'E' }.
|
||||
|
||||
A series of entries fills the entire extension; each entry
|
||||
consists of:
|
||||
|
||||
- NUL-terminated path component (relative to its parent directory);
|
||||
|
||||
- ASCII decimal number of entries in the index that is covered by the
|
||||
tree this entry represents (entry_count);
|
||||
|
||||
- A space (ASCII 32);
|
||||
|
||||
- ASCII decimal number that represents the number of subtrees this
|
||||
tree has;
|
||||
|
||||
- A newline (ASCII 10); and
|
||||
|
||||
- 160-bit object name for the object that would result from writing
|
||||
this span of index as a tree.
|
||||
|
||||
An entry can be in an invalidated state and is represented by having
|
||||
a negative number in the entry_count field. In this case, there is no
|
||||
object name and the next entry starts immediately after the newline.
|
||||
When writing an invalid entry, -1 should always be used as entry_count.
|
||||
|
||||
The entries are written out in the top-down, depth-first order. The
|
||||
first entry represents the root level of the repository, followed by the
|
||||
first subtree--let's call this A--of the root level (with its name
|
||||
relative to the root level), followed by the first subtree of A (with
|
||||
its name relative to A), ...
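A reader for the entry layout above might look like the following Python sketch. `parse_cached_tree` is a hypothetical name, the input is assumed to be the extension payload only (without the signature and size that precede it), and the result is a flat list in on-disk order rather than a rebuilt tree.

```python
def parse_cached_tree(data: bytes):
    """Parse a TREE extension payload into a flat list of entries.

    Each result is (path_component, entry_count, subtree_count, object_name);
    object_name is None for an invalidated entry (negative entry_count).
    """
    entries = []
    pos = 0
    while pos < len(data):
        nul = data.index(b"\0", pos)
        path = data[pos:nul]
        newline = data.index(b"\n", nul + 1)
        count_text, subtree_text = data[nul + 1:newline].split(b" ")
        entry_count = int(count_text.decode("ascii"))
        subtree_count = int(subtree_text.decode("ascii"))
        pos = newline + 1
        object_name = None
        if entry_count >= 0:
            # Valid entries carry a 160-bit (20-byte) object name.
            object_name = data[pos:pos + 20]
            if len(object_name) != 20:
                raise ValueError("truncated object name")
            pos += 20
        entries.append((path, entry_count, subtree_count, object_name))
    return entries


# A valid root entry with one subtree, followed by an invalidated "lib".
sample = b"\0" + b"3 1\n" + bytes(range(20)) + b"lib\0" + b"-1 0\n"
entries = parse_cached_tree(sample)
```

Because the entries are written depth-first, a caller could rebuild the hierarchy by consuming `subtree_count` children after each entry.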
|
||||
|
||||
=== Resolve undo
|
||||
|
||||
A conflict is represented in the index as a set of higher stage entries.
|
||||
When a conflict is resolved (e.g. with "git add path"), these higher
|
||||
stage entries will be removed and a stage-0 entry with proper resolution
|
||||
is added.
|
||||
|
||||
When these higher stage entries are removed, they are saved in the
|
||||
resolve undo extension, so that conflicts can be recreated (e.g. with
|
||||
"git checkout -m"), in case users want to redo a conflict resolution
|
||||
from scratch.
|
||||
|
||||
The signature for this extension is { 'R', 'E', 'U', 'C' }.
|
||||
|
||||
A series of entries fill the entire extension; each of which
|
||||
consists of:
|
||||
|
||||
- NUL-terminated pathname the entry describes (relative to the root of
|
||||
the repository, i.e. full pathname);
|
||||
|
||||
- Three NUL-terminated ASCII octal numbers, entry mode of entries in
|
||||
stage 1 to 3 (a missing stage is represented by "0" in this field);
|
||||
and
|
||||
|
||||
- At most three 160-bit object names of the entry in stages from 1 to 3
|
||||
(nothing is written for a missing stage).
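The entry layout above can be read with a sketch like this one. `parse_resolve_undo` is a hypothetical name and the input is assumed to be the REUC payload only.

```python
def parse_resolve_undo(data: bytes):
    """Parse a REUC extension payload.

    Returns a list of (path, modes, names) where modes holds the three
    stage modes as integers and names holds a 20-byte object name or None
    for each of the stages 1 to 3.
    """
    entries = []
    pos = 0
    while pos < len(data):
        nul = data.index(b"\0", pos)
        path = data[pos:nul]
        pos = nul + 1

        modes = []
        for _ in range(3):
            nul = data.index(b"\0", pos)
            modes.append(int(data[pos:nul].decode("ascii"), 8))  # octal
            pos = nul + 1

        names = []
        for mode in modes:
            if mode == 0:
                names.append(None)  # missing stage: nothing was written
            else:
                name = data[pos:pos + 20]
                if len(name) != 20:
                    raise ValueError("truncated object name")
                names.append(name)
                pos += 20
        entries.append((path, modes, names))
    return entries


# One path whose stage 2 is missing.
sample = (b"a.txt\0" + b"100644\0" + b"0\0" + b"100644\0"
          + b"\x11" * 20 + b"\x33" * 20)
undo = parse_resolve_undo(sample)
```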
|
||||
|
||||
=== Split index
|
||||
|
||||
In split index mode, the majority of index entries could be stored
|
||||
in a separate file. This extension records the changes to be made on
|
||||
top of that to produce the final index.
|
||||
|
||||
The signature for this extension is { 'l', 'i', 'n', 'k' }.
|
||||
|
||||
The extension consists of:
|
||||
|
||||
- 160-bit SHA-1 of the shared index file. The shared index file path
|
||||
is $GIT_DIR/sharedindex.<SHA-1>. If all 160 bits are zero, the
|
||||
index does not require a shared index file.
|
||||
|
||||
- An ewah-encoded delete bitmap, each bit represents an entry in the
|
||||
shared index. If a bit is set, its corresponding entry in the
|
||||
shared index will be removed from the final index. Note, because
|
||||
a delete operation changes index entry positions, but we do need
|
||||
original positions in replace phase, it's best to just mark
|
||||
entries for removal, then do a mass deletion after replacement.
|
||||
|
||||
- An ewah-encoded replace bitmap, each bit represents an entry in
|
||||
the shared index. If a bit is set, its corresponding entry in the
|
||||
shared index will be replaced with an entry in this index
|
||||
file. All replaced entries are stored in sorted order in this
|
||||
index. The first "1" bit in the replace bitmap corresponds to the
|
||||
first index entry, the second "1" bit to the second entry and so
|
||||
on. Replaced entries may have empty path names to save space.
|
||||
|
||||
The remaining index entries after replaced ones will be added to the
|
||||
final index. These added entries are also sorted by entry name then
|
||||
stage.
|
||||
|
||||
== Untracked cache
|
||||
|
||||
Untracked cache saves the untracked file list and necessary data to
|
||||
verify the cache. The signature for this extension is { 'U', 'N',
|
||||
'T', 'R' }.
|
||||
|
||||
The extension starts with
|
||||
|
||||
- A sequence of NUL-terminated strings, preceded by the size of the
|
||||
sequence in variable width encoding. Each string describes the
|
||||
environment where the cache can be used.
|
||||
|
||||
- Stat data of $GIT_DIR/info/exclude. See "Index entry" section from
|
||||
ctime field until "file size".
|
||||
|
||||
- Stat data of core.excludesfile
|
||||
|
||||
- 32-bit dir_flags (see struct dir_struct)
|
||||
|
||||
- 160-bit SHA-1 of $GIT_DIR/info/exclude. Null SHA-1 means the file
|
||||
does not exist.
|
||||
|
||||
- 160-bit SHA-1 of core.excludesfile. Null SHA-1 means the file does
|
||||
not exist.
|
||||
|
||||
- NUL-terminated string of per-dir exclude file name. This usually
|
||||
is ".gitignore".
|
||||
|
||||
- The number of following directory blocks, variable width
|
||||
encoding. If this number is zero, the extension ends here with a
|
||||
following NUL.
|
||||
|
||||
- A number of directory blocks in depth-first-search order, each
|
||||
consists of
|
||||
|
||||
- The number of untracked entries, variable width encoding.
|
||||
|
||||
- The number of sub-directory blocks, variable width encoding.
|
||||
|
||||
- The directory name terminated by NUL.
|
||||
|
||||
- A number of untracked file/dir names terminated by NUL.
|
||||
|
||||
The remaining data of each directory block is grouped by type:
|
||||
|
||||
- An ewah bitmap, the n-th bit marks whether the n-th directory has
|
||||
valid untracked cache entries.
|
||||
|
||||
- An ewah bitmap, the n-th bit records "check-only" bit of
|
||||
read_directory_recursive() for the n-th directory.
|
||||
|
||||
- An ewah bitmap, the n-th bit indicates whether SHA-1 and stat data
|
||||
is valid for the n-th directory and exists in the next data.
|
||||
|
||||
- An array of stat data. The n-th data corresponds with the n-th
|
||||
"one" bit in the previous ewah bitmap.
|
||||
|
||||
- An array of SHA-1. The n-th SHA-1 corresponds with the n-th "one" bit
|
||||
in the previous ewah bitmap.
|
||||
|
||||
- One NUL.
|
||||
|
||||
== File System Monitor cache
|
||||
|
||||
The file system monitor cache tracks files for which the core.fsmonitor
|
||||
hook has told us about changes. The signature for this extension is
|
||||
{ 'F', 'S', 'M', 'N' }.
|
||||
|
||||
The extension starts with
|
||||
|
||||
- 32-bit version number: the current supported version is 1.
|
||||
|
||||
- 64-bit time: the extension data reflects all changes through the given
|
||||
time which is stored as the nanoseconds elapsed since midnight,
|
||||
January 1, 1970.
|
||||
|
||||
- 32-bit bitmap size: the size of the CE_FSMONITOR_VALID bitmap.
|
||||
|
||||
- An ewah bitmap, the n-th bit indicates whether the n-th index entry
|
||||
is not CE_FSMONITOR_VALID.
|
||||
|
||||
== End of Index Entry
|
||||
|
||||
The End of Index Entry (EOIE) is used to locate the end of the variable
|
||||
length index entries and the beginning of the extensions. Code can take
|
||||
advantage of this to quickly locate the index extensions without having
|
||||
to parse through all of the index entries.
|
||||
|
||||
Because it must be able to be loaded before the variable length cache
|
||||
entries and other index extensions, this extension must be written last.
|
||||
The signature for this extension is { 'E', 'O', 'I', 'E' }.
|
||||
|
||||
The extension consists of:
|
||||
|
||||
- 32-bit offset to the end of the index entries
|
||||
|
||||
- 160-bit SHA-1 over the extension types and their sizes (but not
|
||||
their contents). E.g. if we have "TREE" extension that is N-bytes
|
||||
long, "REUC" extension that is M-bytes long, followed by "EOIE",
|
||||
then the hash would be:
|
||||
|
||||
SHA-1("TREE" + <binary representation of N> +
|
||||
"REUC" + <binary representation of M>)
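The hash above could be computed as in the following sketch. The text only says "binary representation"; treating each size as a 32-bit integer in network byte order is an assumption of this example, and `eoie_hash` is a hypothetical name.

```python
import hashlib
import struct


def eoie_hash(extensions):
    """Hash the types and sizes of the extensions that precede EOIE.

    extensions: list of (signature, size) pairs in file order.
    """
    h = hashlib.sha1()
    for signature, size in extensions:
        h.update(signature)
        # Assumption: the size is hashed as a 32-bit big-endian integer.
        h.update(struct.pack(">I", size))
    return h.digest()


# "TREE" that is 120 bytes long followed by "REUC" that is 64 bytes long.
digest = eoie_hash([(b"TREE", 120), (b"REUC", 64)])
```

Only the extension headers are hashed, so a reader can validate the EOIE data without touching the extension contents.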
|
||||
|
||||
== Index Entry Offset Table
|
||||
|
||||
The Index Entry Offset Table (IEOT) is used to help address the CPU
|
||||
cost of loading the index by enabling multi-threading the process of
|
||||
converting cache entries from the on-disk format to the in-memory format.
|
||||
The signature for this extension is { 'I', 'E', 'O', 'T' }.
|
||||
|
||||
The extension consists of:
|
||||
|
||||
- 32-bit version (currently 1)
|
||||
|
||||
- A number of index offset entries each consisting of:
|
||||
|
||||
- 32-bit offset from the beginning of the file to the first cache entry
|
||||
in this block of entries.
|
||||
|
||||
- 32-bit count of cache entries in this block
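A minimal reader for the IEOT layout above, assuming that all 32-bit fields are in network byte order (this section does not say so explicitly) and that the input is the extension payload only. `parse_ieot` is a hypothetical name.

```python
import struct


def parse_ieot(data: bytes):
    """Return (version, [(offset, count), ...]) from an IEOT payload."""
    (version,) = struct.unpack_from(">I", data, 0)
    if version != 1:
        raise ValueError("unsupported IEOT version %d" % version)
    body = data[4:]
    if len(body) % 8:
        raise ValueError("truncated index offset entry")
    blocks = [struct.unpack_from(">II", body, i)
              for i in range(0, len(body), 8)]
    return version, blocks


# Two blocks: 1000 entries starting at byte 12, 500 entries at byte 70012.
sample = (struct.pack(">I", 1)
          + struct.pack(">II", 12, 1000)
          + struct.pack(">II", 70012, 500))
ieot = parse_ieot(sample)
```

Each `(offset, count)` pair describes one block of entries that a worker thread could convert independently of the others.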
|
||||
50
Documentation/technical/long-running-process-protocol.txt
Normal file
|
|
@ -0,0 +1,50 @@
|
|||
Long-running process protocol
|
||||
=============================
|
||||
|
||||
This protocol is used when Git needs to communicate with an external
|
||||
process throughout the entire life of a single Git command. All
|
||||
communication is in pkt-line format (see technical/protocol-common.txt)
|
||||
over standard input and standard output.
|
||||
|
||||
Handshake
|
||||
---------
|
||||
|
||||
Git starts by sending a welcome message (for example,
|
||||
"git-filter-client"), a list of supported protocol version numbers, and
|
||||
a flush packet. Git expects to read the welcome message with "server"
|
||||
instead of "client" (for example, "git-filter-server"), exactly one
|
||||
protocol version number from the previously sent list, and a flush
|
||||
packet. All further communication will be based on the selected version.
|
||||
The remaining protocol description below documents "version=2". Please
|
||||
note that "version=42" in the example below does not exist and is only
|
||||
there to illustrate what the protocol would look like with more than one
|
||||
version.
|
||||
|
||||
After the version negotiation Git sends a list of all capabilities that
|
||||
it supports and a flush packet. Git expects to read a list of desired
|
||||
capabilities, which must be a subset of the supported capabilities list,
|
||||
and a flush packet as response:
|
||||
------------------------
packet: git> git-filter-client
packet: git> version=2
packet: git> version=42
packet: git> 0000
packet: git< git-filter-server
packet: git< version=2
packet: git< 0000
packet: git> capability=clean
packet: git> capability=smudge
packet: git> capability=not-yet-invented
packet: git> 0000
packet: git< capability=clean
packet: git< capability=smudge
packet: git< 0000
------------------------
|
||||
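On the wire, each line of the handshake is a pkt-line: four hexadecimal digits giving the total length (including those four digits), followed by the payload, with "0000" as the flush packet. The sketch below builds and reads back the first message Git sends. The function names are hypothetical, and terminating each payload with LF is an assumption of this example.

```python
FLUSH = b"0000"  # flush packet: a length of zero and no payload


def pkt_line(payload: bytes) -> bytes:
    """Frame one payload as a pkt-line."""
    length = len(payload) + 4  # the length field counts itself
    if length > 65520:
        raise ValueError("payload too long for a single pkt-line")
    return b"%04x" % length + payload


def read_until_flush(stream: bytes, pos: int = 0):
    """Return (payloads, next_pos) for the packets up to the next flush."""
    payloads = []
    while True:
        length = int(stream[pos:pos + 4].decode("ascii"), 16)
        if length == 0:
            return payloads, pos + 4
        payloads.append(stream[pos + 4:pos + length])
        pos += length


# Welcome message, one supported version, then a flush packet.
hello = pkt_line(b"git-filter-client\n") + pkt_line(b"version=2\n") + FLUSH
decoded, end = read_until_flush(hello)
```

A filter process would answer in the same framing with its own welcome message and exactly one of the offered versions.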
|
||||
Shutdown
|
||||
--------
|
||||
|
||||
Git will close
|
||||
the command pipe on exit. The filter is expected to detect EOF
|
||||
and exit gracefully on its own. Git will wait until the filter
|
||||
process has stopped.
|
||||
109
Documentation/technical/multi-pack-index.txt
Normal file
|
|
@ -0,0 +1,109 @@
|
|||
Multi-Pack-Index (MIDX) Design Notes
|
||||
====================================
|
||||
|
||||
The Git object directory contains a 'pack' directory containing
|
||||
packfiles (with suffix ".pack") and pack-indexes (with suffix
|
||||
".idx"). The pack-indexes provide a way to lookup objects and
|
||||
navigate to their offset within the pack, but these must come
|
||||
in pairs with the packfiles. This pairing depends on the file
|
||||
names, as the pack-index differs only in suffix from its
|
||||
packfile. While the pack-indexes provide fast lookup per packfile,
|
||||
this performance degrades as the number of packfiles increases,
|
||||
because abbreviations need to inspect every packfile and we are
|
||||
more likely to have a miss on our most-recently-used packfile.
|
||||
For some large repositories, repacking into a single packfile
|
||||
is not feasible due to storage space or excessive repack times.
|
||||
|
||||
The multi-pack-index (MIDX for short) stores a list of objects
|
||||
and their offsets into multiple packfiles. It contains:
|
||||
|
||||
- A list of packfile names.
|
||||
- A sorted list of object IDs.
|
||||
- A list of metadata for the ith object ID including:
|
||||
- A value j referring to the jth packfile.
|
||||
- An offset within the jth packfile for the object.
|
||||
- If large offsets are required, we use another list of large
|
||||
offsets similar to version 2 pack-indexes.
|
||||
|
||||
Thus, we can provide O(log N) lookup time for any number
|
||||
of packfiles.
|
||||
|
||||
Design Details
|
||||
--------------
|
||||
|
||||
- The MIDX is stored in a file named 'multi-pack-index' in the
|
||||
.git/objects/pack directory. This could be stored in the pack
|
||||
directory of an alternate. It refers only to packfiles in that
|
||||
same directory.
|
||||
|
||||
- The pack.multiIndex config setting must be on to consume MIDX files.
|
||||
|
||||
- The file format includes parameters for the object ID hash
|
||||
function, so a future change of hash algorithm does not require
|
||||
a change in format.
|
||||
|
||||
- The MIDX keeps only one record per object ID. If an object appears
|
||||
in multiple packfiles, then the MIDX selects the copy in the most-
|
||||
recently modified packfile.
|
||||
|
||||
- If there exist packfiles in the pack directory not registered in
|
||||
the MIDX, then those packfiles are loaded into the `packed_git`
|
||||
list and `packed_git_mru` cache.
|
||||
|
||||
- The pack-indexes (.idx files) remain in the pack directory so we
|
||||
can delete the MIDX file, set core.midx to false, or downgrade
|
||||
without any loss of information.
|
||||
|
||||
- The MIDX file format uses a chunk-based approach (similar to the
|
||||
commit-graph file) that allows optional data to be added.
|
||||
|
||||
Future Work
|
||||
-----------
|
||||
|
||||
- Add a 'verify' subcommand to the 'git midx' builtin to verify the
|
||||
contents of the multi-pack-index file match the offsets listed in
|
||||
the corresponding pack-indexes.
|
||||
|
||||
- The multi-pack-index allows many packfiles, especially in a context
|
||||
where repacking is expensive (such as a very large repo), or
|
||||
unexpected maintenance time is unacceptable (such as a high-demand
|
||||
build machine). However, the multi-pack-index needs to be rewritten
|
||||
in full every time. We can extend the format to be incremental, so
|
||||
writes are fast. By storing a small "tip" multi-pack-index that
|
||||
points to large "base" MIDX files, we can keep writes fast while
|
||||
still reducing the number of binary searches required for object
|
||||
lookups.
|
||||
|
||||
- The reachability bitmap is currently paired directly with a single
|
||||
packfile, using the pack-order as the object order to hopefully
|
||||
compress the bitmaps well using run-length encoding. This could be
|
||||
extended to pair a reachability bitmap with a multi-pack-index. If
|
||||
the multi-pack-index is extended to store a "stable object order"
|
||||
(a function Order(hash) = integer that is constant for a given hash,
|
||||
even as the multi-pack-index is updated) then a reachability bitmap
|
||||
could point to a multi-pack-index and be updated independently.
|
||||
|
||||
- Packfiles can be marked as "special" using empty files that share
|
||||
the initial name but replace ".pack" with ".keep" or ".promisor".
|
||||
We can add an optional chunk of data to the multi-pack-index that
|
||||
records flags of information about the packfiles. This allows new
|
||||
states, such as 'repacked' or 'redeltified', that can help with
|
||||
pack maintenance in a multi-pack environment. It may also be
|
||||
helpful to organize packfiles by object type (commit, tree, blob,
|
||||
etc.) and use this metadata to help that maintenance.
|
||||
|
||||
- The partial clone feature records special "promisor" packs that
|
||||
may point to objects that are not stored locally, but available
|
||||
on request to a server. The multi-pack-index does not currently
|
||||
track these promisor packs.
|
||||
|
||||
Related Links
|
||||
-------------
|
||||
[0] https://bugs.chromium.org/p/git/issues/detail?id=6
|
||||
Chromium work item for: Multi-Pack Index (MIDX)
|
||||
|
||||
[1] https://public-inbox.org/git/20180107181459.222909-1-dstolee@microsoft.com/
|
||||
An earlier RFC for the multi-pack-index feature
|
||||
|
||||
[2] https://public-inbox.org/git/alpine.DEB.2.20.1803091557510.23109@alexmv-linux/
|
||||
Git Merge 2018 Contributor's summit notes (includes discussion of MIDX)
|
||||
331
Documentation/technical/pack-format.txt
Normal file
|
|
@ -0,0 +1,331 @@
|
|||
Git pack format
|
||||
===============
|
||||
|
||||
== pack-*.pack files have the following format:
|
||||
|
||||
- A header appears at the beginning and consists of the following:
|
||||
|
||||
4-byte signature:
|
||||
The signature is: {'P', 'A', 'C', 'K'}
|
||||
|
||||
4-byte version number (network byte order):
|
||||
Git currently accepts version number 2 or 3 but
|
||||
generates version 2 only.
|
||||
|
||||
4-byte number of objects contained in the pack (network byte order)
|
||||
|
||||
Observation: we cannot have more than 4G versions ;-) and
|
||||
more than 4G objects in a pack.
|
||||
|
||||
- The header is followed by a number of object entries, each of
|
||||
which looks like this:
|
||||
|
||||
(undeltified representation)
|
||||
n-byte type and length (3-bit type, (n-1)*7+4-bit length)
|
||||
compressed data
|
||||
|
||||
(deltified representation)
|
||||
n-byte type and length (3-bit type, (n-1)*7+4-bit length)
|
||||
20-byte base object name if OBJ_REF_DELTA or a negative relative
|
||||
offset from the delta object's position in the pack if this
|
||||
is an OBJ_OFS_DELTA object
|
||||
compressed delta data
|
||||
|
||||
Observation: length of each object is encoded in a variable
|
||||
length format and is not constrained to 32-bit or anything.
|
||||
|
||||
- The trailer records 20-byte SHA-1 checksum of all of the above.
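The fixed parts of this layout, the 12-byte header and the 20-byte trailer, can be checked with a short Python sketch. `check_pack` is a hypothetical name, and the object entries between header and trailer are not parsed here.

```python
import hashlib
import struct


def check_pack(data: bytes):
    """Validate the header and trailer of a pack; return (version, count)."""
    if len(data) < 12 + 20:
        raise ValueError("too short to be a pack")
    # 4-byte signature, then two 4-byte integers in network byte order.
    signature, version, count = struct.unpack_from(">4sII", data, 0)
    if signature != b"PACK":
        raise ValueError("bad signature")
    if version not in (2, 3):
        raise ValueError("unsupported pack version %d" % version)
    # The trailer is the SHA-1 of everything that precedes it.
    if hashlib.sha1(data[:-20]).digest() != data[-20:]:
        raise ValueError("checksum mismatch")
    return version, count


# The smallest possible pack: a header announcing zero objects.
header = struct.pack(">4sII", b"PACK", 2, 0)
empty_pack = header + hashlib.sha1(header).digest()
info = check_pack(empty_pack)
```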
|
||||
|
||||
=== Object types
|
||||
|
||||
Valid object types are:
|
||||
|
||||
- OBJ_COMMIT (1)
|
||||
- OBJ_TREE (2)
|
||||
- OBJ_BLOB (3)
|
||||
- OBJ_TAG (4)
|
||||
- OBJ_OFS_DELTA (6)
|
||||
- OBJ_REF_DELTA (7)
|
||||
|
||||
Type 5 is reserved for future expansion. Type 0 is invalid.
|
||||
|
||||
=== Deltified representation
|
||||
|
||||
Conceptually there are only four object types: commit, tree, tag and
|
||||
blob. However to save space, an object could be stored as a "delta" of
|
||||
another "base" object. These representations are assigned new types
|
||||
ofs-delta and ref-delta, which are only valid in a pack file.
|
||||
|
||||
Both ofs-delta and ref-delta store the "delta" to be applied to
|
||||
another object (called 'base object') to reconstruct the object. The
|
||||
difference between them is that ref-delta directly encodes the 20-byte base
|
||||
object name. If the base object is in the same pack, ofs-delta encodes
|
||||
the offset of the base object in the pack instead.
|
||||
|
||||
The base object could also be deltified if it's in the same pack.
|
||||
Ref-delta can also refer to an object outside the pack (i.e. the
|
||||
so-called "thin pack"). When stored on disk however, the pack should
|
||||
be self contained to avoid cyclic dependency.
|
||||
|
||||
The delta data is a sequence of instructions to reconstruct an object
|
||||
from the base object. If the base object is deltified, it must be
|
||||
converted to canonical form first. Each instruction appends more and
|
||||
more data to the target object until it's complete. There are two
|
||||
supported instructions so far: one for copying a byte range from the
|
||||
source object and one for inserting new data embedded in the
|
||||
instruction itself.
|
||||
|
||||
Each instruction has variable length. Instruction type is determined
|
||||
by the seventh bit of the first octet. The following diagrams follow
|
||||
the convention in RFC 1951 (Deflate compressed data format).
|
||||
|
||||
==== Instruction to copy from base object
|
||||
|
||||
  +----------+---------+---------+---------+---------+-------+-------+-------+
  | 1xxxxxxx | offset1 | offset2 | offset3 | offset4 | size1 | size2 | size3 |
  +----------+---------+---------+---------+---------+-------+-------+-------+
|
||||
|
||||
This is the instruction format to copy a byte range from the source
|
||||
object. It encodes the offset to copy from and the number of bytes to
|
||||
copy. Offset and size are in little-endian order.
|
||||
|
||||
All offset and size bytes are optional. This is to reduce the
|
||||
instruction size when encoding small offsets or sizes. The first seven
|
||||
bits in the first octet determine which of the next seven octets are
|
||||
present. If bit zero is set, offset1 is present. If bit one is set
|
||||
offset2 is present and so on.
|
||||
|
||||
Note that a more compact instruction does not change offset and size
|
||||
encoding. For example, if only offset2 is omitted like below, offset3
|
||||
still contains bits 16-23. It does not become offset2 and contains
|
||||
bits 8-15 even if it's right next to offset1.
|
||||
|
||||
  +----------+---------+---------+
  | 10000101 | offset1 | offset3 |
  +----------+---------+---------+
|
||||
|
||||
In its most compact form, this instruction only takes up one byte
|
||||
(0x80) with both offset and size omitted, which will have default
|
||||
values zero. There is another exception: size zero is automatically
|
||||
converted to 0x10000.
|
||||
|
||||
==== Instruction to add new data
|
||||
|
||||
  +----------+============+
  | 0xxxxxxx |    data    |
  +----------+============+
|
||||
|
||||
This is the instruction to construct target object without the base
|
||||
object. The following data is appended to the target object. The first
|
||||
seven bits of the first octet determine the size of data in
|
||||
bytes. The size must be non-zero.
|
||||
|
||||
==== Reserved instruction
|
||||
|
||||
  +----------+============
  | 00000000 |
  +----------+============
|
||||
|
||||
This is the instruction reserved for future expansion.
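The three instruction forms above can be interpreted with the following Python sketch. `apply_instructions` is a hypothetical name, and the input is assumed to be the instruction stream only: anything that precedes the instructions in real delta data is outside the scope of this example.

```python
def apply_instructions(base: bytes, instructions: bytes) -> bytes:
    """Rebuild a target object by running copy/insert instructions."""
    out = bytearray()
    pos = 0
    while pos < len(instructions):
        cmd = instructions[pos]
        pos += 1
        if cmd & 0x80:
            # Copy from base: bits 0-3 select offset1..offset4 and
            # bits 4-6 select size1..size3, all little-endian.
            offset = 0
            size = 0
            for i in range(4):
                if cmd & (0x01 << i):
                    offset |= instructions[pos] << (8 * i)
                    pos += 1
            for i in range(3):
                if cmd & (0x10 << i):
                    size |= instructions[pos] << (8 * i)
                    pos += 1
            if size == 0:
                size = 0x10000  # size zero means 0x10000
            if offset + size > len(base):
                raise ValueError("copy range outside the base object")
            out += base[offset:offset + size]
        elif cmd:
            # Insert: the next `cmd` bytes are literal data.
            chunk = instructions[pos:pos + cmd]
            if len(chunk) != cmd:
                raise ValueError("truncated insert instruction")
            out += chunk
            pos += cmd
        else:
            raise ValueError("instruction 0x00 is reserved")
    return bytes(out)


base = b"hello world"
# Copy 5 bytes from offset 6 (offset1 and size1 present), then insert "!!".
target = apply_instructions(base, bytes([0x91, 6, 5, 2]) + b"!!")
# offset2 omitted: the third byte is still offset3 (bits 16-23).
sparse = apply_instructions(base, bytes([0x95, 6, 0, 5]))
```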
|
||||
|
||||
== Original (version 1) pack-*.idx files have the following format:
|
||||
|
||||
- The header consists of 256 4-byte network byte order
|
||||
integers. N-th entry of this table records the number of
|
||||
objects in the corresponding pack, the first byte of whose
|
||||
object name is less than or equal to N. This is called the
|
||||
'first-level fan-out' table.
|
||||
|
||||
- The header is followed by sorted 24-byte entries, one entry
|
||||
per object in the pack. Each entry is:
|
||||
|
||||
4-byte network byte order integer, recording where the
|
||||
object is stored in the packfile as the offset from the
|
||||
beginning.
|
||||
|
||||
20-byte object name.
|
||||
|
||||
- The file is concluded with a trailer:
|
||||
|
||||
A copy of the 20-byte SHA-1 checksum at the end of
|
||||
corresponding packfile.
|
||||
|
||||
20-byte SHA-1-checksum of all of the above.
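The fan-out table narrows a lookup to the entries that share the first byte of the object name, after which a binary search finishes the job. The sketch below shows both steps on a version 1 index. `build_v1` and `lookup_v1` are hypothetical helpers, and the test data omits the trailer because the lookup does not need it.

```python
import struct

FANOUT_SIZE = 256 * 4  # 256 four-byte integers
ENTRY_SIZE = 24        # 4-byte offset followed by a 20-byte object name


def build_v1(objects):
    """Build fan-out table plus sorted entries from (name, offset) pairs."""
    objects = sorted(objects)
    fanout = [0] * 256
    for name, _ in objects:
        fanout[name[0]] += 1
    for i in range(1, 256):
        fanout[i] += fanout[i - 1]  # make the counts cumulative
    body = b"".join(struct.pack(">I", offset) + name
                    for name, offset in objects)
    return struct.pack(">256I", *fanout) + body


def lookup_v1(idx: bytes, name: bytes):
    """Return the pack offset recorded for `name`, or None if absent."""
    fanout = struct.unpack_from(">256I", idx, 0)
    first = name[0]
    lo = fanout[first - 1] if first else 0
    hi = fanout[first]
    while lo < hi:
        mid = (lo + hi) // 2
        start = FANOUT_SIZE + ENTRY_SIZE * mid
        candidate = idx[start + 4:start + ENTRY_SIZE]
        if candidate < name:
            lo = mid + 1
        elif candidate > name:
            hi = mid
        else:
            return struct.unpack_from(">I", idx, start)[0]
    return None


names = [b"\x00" + b"a" * 19, b"\x01" + b"b" * 19, b"\xff" + b"c" * 19]
idx = build_v1(list(zip(names, [12, 345, 6789])))
```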
|
||||
|
||||
Pack Idx file:
|
||||
|
||||
        --  +--------------------------------+
fanout      | fanout[0] = 2 (for example)    |-.
table       +--------------------------------+ |
            | fanout[1]                      | |
            +--------------------------------+ |
            | fanout[2]                      | |
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
            | fanout[255] = total objects    |---.
        --  +--------------------------------+ | |
main        | offset                         | | |
index       | object name 00XXXXXXXXXXXXXXXX | | |
table       +--------------------------------+ | |
            | offset                         | | |
            | object name 00XXXXXXXXXXXXXXXX | | |
            +--------------------------------+<+ |
          .-| offset                         |   |
          | | object name 01XXXXXXXXXXXXXXXX |   |
          | +--------------------------------+   |
          | | offset                         |   |
          | | object name 01XXXXXXXXXXXXXXXX |   |
          | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~   |
          | | offset                         |   |
          | | object name FFXXXXXXXXXXXXXXXX |   |
        --| +--------------------------------+<--+
trailer   | | packfile checksum              |
          | +--------------------------------+
          | | idxfile checksum               |
          | +--------------------------------+
          .-------.
                  |
Pack file entry: <+
|
||||
|
||||
packed object header:
|
||||
1-byte size extension bit (MSB)
|
||||
type (next 3 bit)
|
||||
size0 (lower 4-bit)
|
||||
n-byte sizeN (as long as MSB is set, each 7-bit)
|
||||
size0..sizeN form 4+7+7+..+7 bit integer, size0
|
||||
is the least significant part, and sizeN is the
|
||||
most significant part.
|
||||
packed object data:
|
||||
If it is not DELTA, then deflated bytes (the size above
|
||||
is the size before compression).
|
||||
If it is REF_DELTA, then
|
||||
20-byte base object name SHA-1 (the size above is the
|
||||
size of the delta data that follows).
|
||||
delta data, deflated.
|
||||
If it is OFS_DELTA, then
|
||||
n-byte offset (see below) interpreted as a negative
|
||||
offset from the type-byte of the header of the
|
||||
ofs-delta entry (the size above is the size of
|
||||
the delta data that follows).
|
||||
delta data, deflated.
|
||||
|
||||
offset encoding:
|
||||
n bytes with MSB set in all but the last one.
|
||||
The offset is then the number constructed by
|
||||
concatenating the lower 7 bit of each byte, and
|
||||
for n >= 2 adding 2^7 + 2^14 + ... + 2^(7*(n-1))
|
||||
to the result.
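Both variable-length encodings described above can be decoded in a few lines. The following Python sketch uses hypothetical function names; each function takes a buffer and a starting position and returns the position just past the bytes it consumed.

```python
def parse_entry_header(buf: bytes, pos: int = 0):
    """Decode the packed object header; return (type, size, next_pos)."""
    byte = buf[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x07  # 3-bit type
    size = byte & 0x0F             # size0: the lowest 4 bits of the size
    shift = 4
    while byte & 0x80:             # MSB set: another size byte follows
        byte = buf[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return obj_type, size, pos


def parse_ofs_delta_offset(buf: bytes, pos: int = 0):
    """Decode the OFS_DELTA offset encoding; return (offset, next_pos)."""
    byte = buf[pos]
    pos += 1
    offset = byte & 0x7F
    while byte & 0x80:
        byte = buf[pos]
        pos += 1
        # Adding 1 before each shift yields the 2^7 + 2^14 + ... bias.
        offset = ((offset + 1) << 7) | (byte & 0x7F)
    return offset, pos


# Type 3 (blob) with size 300: size0 = 12, followed by one size byte 18.
blob_header = parse_entry_header(bytes([0xBC, 0x12]))
one_byte = parse_ofs_delta_offset(bytes([0x7F]))
two_bytes = parse_ofs_delta_offset(bytes([0x80, 0x00]))
```

The bias in the offset encoding means that the smallest two-byte value is 128, one more than the largest one-byte value, so no offset has two encodings.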
|
||||
|
||||
|
||||
|
||||
== Version 2 pack-*.idx files support packs larger than 4 GiB, and
|
||||
have some other reorganizations. They have the format:
|
||||
|
||||
- A 4-byte magic number '\377tOc' which is an unreasonable
|
||||
fanout[0] value.
|
||||
|
||||
- A 4-byte version number (= 2)
|
||||
|
||||
- A 256-entry fan-out table just like v1.
|
||||
|
||||
- A table of sorted 20-byte SHA-1 object names. These are
|
||||
packed together without offset values to reduce the cache
|
||||
footprint of the binary search for a specific object name.
|
||||
|
||||
- A table of 4-byte CRC32 values of the packed object data.
|
||||
This is new in v2 so compressed data can be copied directly
|
||||
from pack to pack during repacking without undetected
|
||||
data corruption.
|
||||
|
||||
- A table of 4-byte offset values (in network byte order).
|
||||
These are usually 31-bit pack file offsets, but large
|
||||
offsets are encoded as an index into the next table with
|
||||
the msbit set.
|
||||
|
||||
- A table of 8-byte offset entries (empty for pack files less
|
||||
than 2 GiB). Pack files are organized with heavily used
|
||||
objects toward the front, so most object references should
|
||||
not need to refer to this table.
|
||||
|
||||
- The same trailer as a v1 pack-*.idx file:
|
||||
|
||||
A copy of the 20-byte SHA-1 checksum at the end of
|
||||
corresponding packfile.
|
||||
|
||||
20-byte SHA-1-checksum of all of the above.
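Resolving an object's pack offset from the two offset tables above can be sketched as follows. `object_offset` is a hypothetical name, the two tables are passed as raw byte strings, and reading the 8-byte entries in network byte order is an assumption of this example.

```python
import struct


def object_offset(small_table: bytes, large_table: bytes, index: int) -> int:
    """Return the pack offset of the object at position `index`."""
    (value,) = struct.unpack_from(">I", small_table, 4 * index)
    if value & 0x80000000:
        # msbit set: the low 31 bits index the table of 8-byte offsets.
        row = value & 0x7FFFFFFF
        (value,) = struct.unpack_from(">Q", large_table, 8 * row)
    return value


# Object 0 lives at offset 1234; object 1 is stored beyond the 31-bit range.
small = struct.pack(">II", 1234, 0x80000000 | 0)
large = struct.pack(">Q", 5 * 2**30)
offsets = [object_offset(small, large, i) for i in range(2)]
```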
|
||||
|
||||
== multi-pack-index (MIDX) files have the following format:
|
||||
|
||||
The multi-pack-index files refer to multiple pack-files and loose objects.
|
||||
|
||||
In order to allow extensions that add extra data to the MIDX, we organize
|
||||
the body into "chunks" and provide a lookup table at the beginning of the
|
||||
body. The header includes certain length values, such as the number of packs,
|
||||
the number of base MIDX files, hash lengths and types.
|
||||
|
||||
All 4-byte numbers are in network order.
|
||||
|
||||
HEADER:
|
||||
|
||||
4-byte signature:
|
||||
The signature is: {'M', 'I', 'D', 'X'}
|
||||
|
||||
1-byte version number:
|
||||
Git only writes or recognizes version 1.
|
||||
|
||||
1-byte Object Id Version
|
||||
Git only writes or recognizes version 1 (SHA1).
|
||||
|
||||
1-byte number of "chunks"
|
||||
|
||||
1-byte number of base multi-pack-index files:
|
||||
This value is currently always zero.
|
||||
|
||||
4-byte number of pack files
|
||||
|
||||
CHUNK LOOKUP:
|
||||
|
||||
(C + 1) * 12 bytes providing the chunk offsets:
|
||||
First 4 bytes describe chunk id. Value 0 is a terminating label.
|
||||
Other 8 bytes provide offset in current file for chunk to start.
|
||||
(Chunks are provided in file-order, so you can infer the length
|
||||
using the next chunk position if necessary.)
|
||||
|
||||
The remaining data in the body is described one chunk at a time, and
|
||||
these chunks may be given in any order. Chunks are required unless
|
||||
otherwise specified.
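The header and chunk lookup table described above can be read with a sketch like the following. The function names are hypothetical, reading the 8-byte chunk offsets in network byte order is an assumption (the text only states it for 4-byte numbers), and the sample file lists just two chunks for brevity.

```python
import struct

HEADER_SIZE = 12  # 4-byte signature, four 1-byte fields, 4-byte pack count


def parse_midx_header(data: bytes):
    """Return the chunk count, base MIDX count and pack count."""
    signature, version, oid_version, chunks, bases, packs = \
        struct.unpack_from(">4sBBBBI", data, 0)
    if signature != b"MIDX":
        raise ValueError("bad signature")
    if version != 1 or oid_version != 1:
        raise ValueError("unsupported version")
    return {"chunks": chunks, "base_midx_files": bases, "packs": packs}


def parse_chunk_lookup(data: bytes, chunk_count: int):
    """Return (chunk_id, offset) rows, including the terminating row."""
    rows = []
    pos = HEADER_SIZE
    for _ in range(chunk_count + 1):
        chunk_id, offset = struct.unpack_from(">4sQ", data, pos)
        rows.append((chunk_id, offset))
        pos += 12
    return rows


sample = (struct.pack(">4sBBBBI", b"MIDX", 1, 1, 2, 0, 3)
          + struct.pack(">4sQ", b"OIDF", 48)
          + struct.pack(">4sQ", b"OIDL", 1072)
          + struct.pack(">4sQ", b"\0\0\0\0", 1132))
header = parse_midx_header(sample)
rows = parse_chunk_lookup(sample, header["chunks"])
```

The terminating row's offset marks where the last chunk ends, so the length of every chunk can be inferred from consecutive rows once they are sorted by offset.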
|
||||
|
||||
CHUNK DATA:
|
||||
|
||||
Packfile Names (ID: {'P', 'N', 'A', 'M'})
|
||||
Stores the packfile names as concatenated, null-terminated strings.
|
||||
Packfiles must be listed in lexicographic order for fast lookups by
|
||||
name. This is the only chunk not guaranteed to be a multiple of four
|
||||
bytes in length, so should be the last chunk for alignment reasons.
|
||||
|
||||
OID Fanout (ID: {'O', 'I', 'D', 'F'})
|
||||
The ith entry, F[i], stores the number of OIDs with first
|
||||
byte at most i. Thus F[255] stores the total
|
||||
number of objects.
|
||||
|
||||
OID Lookup (ID: {'O', 'I', 'D', 'L'})
|
||||
The OIDs for all objects in the MIDX are stored in lexicographic
|
||||
order in this chunk.
|
||||
|
||||
Object Offsets (ID: {'O', 'O', 'F', 'F'})
|
||||
Stores two 4-byte values for every object.
|
||||
1: The pack-int-id for the pack storing this object.
|
||||
2: The offset within the pack.
|
||||
If all offsets are less than 2^31, then the large offset chunk
|
||||
will not exist and offsets are stored as in IDX v1.
|
||||
If there is at least one offset value larger than 2^32-1, then
|
||||
the large offset chunk must exist. If the large offset chunk
|
||||
exists and the 31st bit is on, then removing that bit reveals
|
||||
the row in the large offsets containing the 8-byte offset of
|
||||
this object.
|
||||
|
||||
[Optional] Object Large Offsets (ID: {'L', 'O', 'F', 'F'})
|
||||
8-byte offsets into large packfiles.
|
||||
|
||||
TRAILER:
|
||||
|
||||
20-byte SHA1-checksum of the above contents.
|
||||
460
Documentation/technical/pack-heuristics.txt
Normal file
|
|
@ -0,0 +1,460 @@
|
|||
Concerning Git's Packing Heuristics
|
||||
===================================
|
||||
|
||||
Oh, here's a really stupid question:
|
||||
|
||||
Where do I go
|
||||
to learn the details
|
||||
of Git's packing heuristics?
|
||||
|
||||
Be careful what you ask!
|
||||
|
||||
Followers of the Git, please open the Git IRC Log and turn to
|
||||
February 10, 2006.
|
||||
|
||||
It's a rare occasion, and we are joined by the King Git Himself,
|
||||
Linus Torvalds (linus). Nathaniel Smith, (njs`), has the floor
|
||||
and seeks enlightenment. Others are present, but silent.
|
||||
|
||||
Let's listen in!
|
||||
|
||||
<njs`> Oh, here's a really stupid question -- where do I go to
|
||||
learn the details of Git's packing heuristics? google avails
|
||||
me not, reading the source didn't help a lot, and wading
|
||||
through the whole mailing list seems less efficient than any
|
||||
of that.
|
||||
|
||||
It is a bold start! A plea for help combined with a simultaneous
|
||||
tri-part attack on some of the tried and true mainstays in the quest
|
||||
for enlightenment. Brash accusations of google being useless. Hubris!
|
||||
Maligning the source. Heresy! Disdain for the mailing list archives.
|
||||
Woe.
|
||||
|
||||
<pasky> yes, the packing-related delta stuff is somewhat
|
||||
mysterious even for me ;)
|
||||
|
||||
Ah! Modesty after all.
|
||||
|
||||
<linus> njs, I don't think the docs exist. That's something where
|
||||
I don't think anybody else than me even really got involved.
|
||||
Most of the rest of Git others have been busy with (especially
|
||||
Junio), but packing nobody touched after I did it.
|
||||
|
||||
It's cryptic, yet vague. Linus in style for sure. Wise men
|
||||
interpret this as an apology. A few argue it is merely a
|
||||
statement of fact.
|
||||
|
||||
<njs`> I guess the next step is "read the source again", but I
|
||||
have to build up a certain level of gumption first :-)
|
||||
|
||||
Indeed! On both points.
|
||||
|
||||
<linus> The packing heuristic is actually really really simple.
|
||||
|
||||
Bait...
|
||||
|
||||
<linus> But strange.
|
||||
|
||||
And switch. That ought to do it!
|
||||
|
||||
<linus> Remember: Git really doesn't follow files. So what it does is
|
||||
- generate a list of all objects
|
||||
- sort the list according to magic heuristics
|
||||
- walk the list, using a sliding window, seeing if an object
|
||||
can be diffed against another object in the window
|
||||
- write out the list in recency order
|
||||
|
||||
The traditional understatement:
|
||||
|
||||
<njs`> I suspect that what I'm missing is the precise definition of
|
||||
the word "magic"
|
||||
|
||||
The traditional insight:
|
||||
|
||||
<pasky> yes
|
||||
|
||||
And Babel-like confusion flowed.
|
||||
|
||||
<njs`> oh, hmm, and I'm not sure what this sliding window means either
|
||||
|
||||
<pasky> iirc, it appeared to me to be just the sha1 of the object
|
||||
when reading the code casually ...
|
||||
|
||||
... which simply doesn't sound as a very good heuristics, though ;)
|
||||
|
||||
<njs`> .....and recency order. okay, I think it's clear I didn't
|
||||
even realize how much I wasn't realizing :-)
|
||||
|
||||
Ah, grasshopper! And thus the enlightenment begins anew.
|
||||
|
||||
<linus> The "magic" is actually in theory totally arbitrary.
|
||||
ANY order will give you a working pack, but no, it's not
|
||||
ordered by SHA-1.
|
||||
|
||||
Before talking about the ordering for the sliding delta
|
||||
window, let's talk about the recency order. That's more
|
||||
important in one way.
|
||||
|
||||
<njs`> Right, but if all you want is a working way to pack things
|
||||
together, you could just use cat and save yourself some
|
||||
trouble...
|
||||
|
||||
Waaait for it....
|
||||
|
||||
<linus> The recency ordering (which is basically: put objects
|
||||
_physically_ into the pack in the order that they are
|
||||
"reachable" from the head) is important.
|
||||
|
||||
<njs`> okay
|
||||
|
||||
<linus> It's important because that's the thing that gives packs
|
||||
good locality. It keeps the objects close to the head (whether
|
||||
they are old or new, but they are _reachable_ from the head)
|
||||
at the head of the pack. So packs actually have absolutely
|
||||
_wonderful_ IO patterns.
|
||||
|
||||
Read that again, because it is important.
|
||||
|
||||
<linus> But recency ordering is totally useless for deciding how
|
||||
to actually generate the deltas, so the delta ordering is
|
||||
something else.
|
||||
|
||||
The delta ordering is (wait for it):
|
||||
- first sort by the "basename" of the object, as defined by
|
||||
the name the object was _first_ reached through when
|
||||
generating the object list
|
||||
- within the same basename, sort by size of the object
|
||||
- but always sort different types separately (commits first).
|
||||
|
||||
That's not exactly it, but it's very close.
|
||||
|
||||
<njs`> The "_first_ reached" thing is not too important, just you
|
||||
need some way to break ties since the same objects may be
|
||||
reachable many ways, yes?
|
||||
|
||||
And as if to clarify:
|
||||
|
||||
<linus> The point is that it's all really just any random
|
||||
heuristic, and the ordering is totally unimportant for
|
||||
correctness, but it helps a lot if the heuristic gives
|
||||
"clumping" for things that are likely to delta well against
|
||||
each other.
|
||||
|
||||
It is an important point, so secretly, I did my own research and have
|
||||
included my results below. To be fair, it has changed some over time.
|
||||
And through the magic of Revisionistic History, I draw upon this entry
|
||||
from The Git IRC Logs on my father's birthday, March 1:
|
||||
|
||||
<gitster> The quote from the above linus should be rewritten a
|
||||
bit (wait for it):
|
||||
- first sort by type. Different objects never delta with
|
||||
each other.
|
||||
- then sort by filename/dirname. hash of the basename
|
||||
occupies the top BITS_PER_INT-DIR_BITS bits, and bottom
|
||||
DIR_BITS are for the hash of leading path elements.
|
||||
- then if we are doing "thin" pack, the objects we are _not_
|
||||
going to pack but we know about are sorted earlier than
|
||||
other objects.
|
||||
- and finally sort by size, larger to smaller.
|
||||
|
||||
In one swell-foop, clarification and obscurification! Nonetheless,
|
||||
authoritative. Cryptic, yet concise. It even solicits notions of
|
||||
quotes from The Source Code. Clearly, more study is needed.
|
||||
|
||||
<gitster> That's the sort order. What this means is:
|
||||
- we do not delta different object types.
|
||||
- we prefer to delta the objects with the same full path, but
|
||||
allow files with the same name from different directories.
|
||||
- we always prefer to delta against objects we are not going
|
||||
to send, if there are some.
|
||||
- we prefer to delta against larger objects, so that we have
|
||||
lots of removals.
|
||||
|
||||
The penultimate rule is for "thin" packs. It is used when
|
||||
the other side is known to have such objects.
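The sort order gitster describes can be sketched as a sort key. This is only an illustration: `PackEntry`, `toy_hash` and the value `DIR_BITS = 12` are assumptions made up for the example, and the hash is not the one Git uses. Only the layout (basename hash in the top bits, directory hash in the bottom `DIR_BITS`) and the four sort criteria come from the quote above.

```python
from dataclasses import dataclass

BITS_PER_INT = 32
DIR_BITS = 12  # assumed value, for illustration only


def toy_hash(text: str) -> int:
    """Stand-in string hash; not the function Git uses."""
    h = 0
    for byte in text.encode():
        h = (h * 31 + byte) & 0xFFFFFFFF
    return h


def name_hash(path: str) -> int:
    """Basename hash in the top bits, leading-path hash in the bottom DIR_BITS."""
    directory, _, base = path.rpartition("/")
    top = toy_hash(base) & ((1 << (BITS_PER_INT - DIR_BITS)) - 1)
    bottom = toy_hash(directory) & ((1 << DIR_BITS) - 1)
    return (top << DIR_BITS) | bottom


@dataclass
class PackEntry:
    obj_type: str                 # "commit", "tree", "blob" or "tag"
    path: str                     # name the object was reached through
    size: int
    preferred_base: bool = False  # known to the other side, not packed (thin pack)


def delta_sort_key(entry: PackEntry):
    # 1. type, 2. name hash, 3. thin-pack bases first, 4. larger before smaller.
    # The relative order of the types is arbitrary in this sketch.
    return (entry.obj_type, name_hash(entry.path),
            not entry.preferred_base, -entry.size)
```

With this key, `sorted(entries, key=delta_sort_key)` puts "foo/Makefile" and "Makefile" near each other, because they share the top bits of the name hash.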
|
||||
|
||||
There it is again. "Thin" packs. I'm thinking to myself, "What
|
||||
is a 'thin' pack?" So I ask:
|
||||
|
||||
<jdl> What is a "thin" pack?
|
||||
|
||||
<gitster> Use of --objects-edge to rev-list as the upstream of
|
||||
pack-objects. The pack transfer protocol negotiates that.
|
||||
|
||||
Woo hoo! Cleared that _right_ up!
|
||||
|
||||
<gitster> There are two directions - push and fetch.
|
||||
|
||||
There! Did you see it? It is not '"push" and "pull"'! How often the
|
||||
confusion has started here. So casually mentioned, too!
|
||||
|
||||
<gitster> For push, git-send-pack invokes git-receive-pack on the
|
||||
other end. The receive-pack says "I have up to these commits".
|
||||
send-pack looks at them, and computes what are missing from
|
||||
the other end. So "thin" could be the default there.
|
||||
|
||||
In the other direction, fetch, git-fetch-pack and
|
||||
git-clone-pack invokes git-upload-pack on the other end
|
||||
(via ssh or by talking to the daemon).
|
||||
|
||||
There are two cases: fetch-pack with -k and clone-pack is one,
|
||||
fetch-pack without -k is the other. clone-pack and fetch-pack
|
||||
with -k will keep the downloaded packfile without expanded, so
|
||||
we do not use thin pack transfer. Otherwise, the generated
|
||||
pack will have delta without base object in the same pack.
|
||||
|
||||
But fetch-pack without -k will explode the received pack into
|
||||
individual objects, so we automatically ask upload-pack to
|
||||
give us a thin pack if upload-pack supports it.
|
||||
|
||||
OK then.
|
||||
|
||||
Uh.
|
||||
|
||||
Let's return to the previous conversation still in progress.
|
||||
|
||||
<njs`> and "basename" means something like "the tail of end of
|
||||
path of file objects and dir objects, as per basename(3), and
|
||||
we just declare all commit and tag objects to have the same
|
||||
basename" or something?
|
||||
|
||||
Luckily, that too is a point that gitster clarified for us!
|
||||
|
||||
If I might add, the trick is to make files that _might_ be similar be
|
||||
located close to each other in the hash buckets based on their file
|
||||
names. It used to be that "foo/Makefile", "bar/baz/quux/Makefile" and
|
||||
"Makefile" all landed in the same bucket due to their common basename,
|
||||
"Makefile". However, now they land in "close" buckets.
|
||||
|
||||
The algorithm allows not just for the _same_ bucket, but for _close_
|
||||
buckets to be considered delta candidates. The rationale is
|
||||
essentially that files, like Makefiles, often have very similar
|
||||
content no matter what directory they live in.
|
||||
|
||||
<linus> I played around with different delta algorithms, and with
|
||||
making the "delta window" bigger, but having too big of a
|
||||
sliding window makes it very expensive to generate the pack:
|
||||
you need to compare every object with a _ton_ of other objects.
|
||||
|
||||
There are a number of other trivial heuristics too, which
|
||||
basically boil down to "don't bother even trying to delta this
|
||||
pair" if we can tell before-hand that the delta isn't worth it
|
||||
(due to size differences, where we can take a previous delta
|
||||
result into account to decide that "ok, no point in trying
|
||||
that one, it will be worse").
|
||||
|
||||
End result: packing is actually very size efficient. It's
|
||||
somewhat CPU-wasteful, but on the other hand, since you're
|
||||
really only supposed to do it maybe once a month (and you can
|
||||
do it during the night), nobody really seems to care.
|
||||
|
||||
Nice Engineering Touch, there. Find when it doesn't matter, and
|
||||
proclaim it a non-issue. Good style too!
|
||||
|
||||
<njs`> So, just to repeat to see if I'm following, we start by
|
||||
getting a list of the objects we want to pack, we sort it by
|
||||
this heuristic (basically lexicographically on the tuple
|
||||
(type, basename, size)).
|
||||
|
||||
Then we walk through this list, and calculate a delta of
|
||||
each object against the last n (tunable parameter) objects,
|
||||
and pick the smallest of these deltas.
|
||||
|
||||
Vastly simplified, but the essence is there!
|
||||
|
||||
<linus> Correct.
|
||||
|
||||
<njs`> And then once we have picked a delta or fulltext to
|
||||
represent each object, we re-sort by recency, and write them
|
||||
out in that order.
|
||||
|
||||
<linus> Yup. Some other small details:
|
||||
|
||||
And of course there is the "Other Shoe" Factor too.
|
||||
|
||||
<linus> - We limit the delta depth to another magic value (right
|
||||
now both the window and delta depth magic values are just "10")
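A minimal sketch of the window walk njs` summarizes, under loud assumptions: `delta_cost` is a crude stand-in built on `difflib` (not Git's delta code), and `choose_deltas` is a hypothetical helper name. The window and depth defaults of 10 are the "magic values" quoted above.

```python
from difflib import SequenceMatcher

WINDOW = 10  # both "magic values" quoted above
DEPTH = 10


def delta_cost(base: bytes, target: bytes) -> int:
    """Crude stand-in for a delta size: bytes of target not copied from base."""
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    copied = sum(block.size for block in matcher.get_matching_blocks())
    return len(target) - copied


def choose_deltas(objects, window=WINDOW, max_depth=DEPTH):
    """objects: list of (name, data) already in delta-sort order.

    Returns {name: base_name}, with None meaning "store the full text".
    """
    chosen, depth = {}, {}
    for i, (name, data) in enumerate(objects):
        best_base, best_cost = None, len(data)  # full text is the fallback
        for base_name, base_data in objects[max(0, i - window):i]:
            if depth[base_name] >= max_depth:
                continue  # the chain through this base is already too deep
            cost = delta_cost(base_data, data)
            if cost < best_cost:
                best_base, best_cost = base_name, cost
        chosen[name] = best_base
        depth[name] = 0 if best_base is None else depth[best_base] + 1
    return chosen
```

Because the list is sorted larger-to-smaller within a name, the smaller (usually older) version ends up as a delta against the larger one that precedes it in the window.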
|
||||
|
||||
<njs`> Hrm, my intuition is that you'd end up with really _bad_ IO
|
||||
patterns, because the things you want are near by, but to
|
||||
actually reconstruct them you may have to jump all over in
|
||||
random ways.
|
||||
|
||||
<linus> - When we write out a delta, and we haven't yet written
|
||||
out the object it is a delta against, we write out the base
|
||||
object first. And no, when we reconstruct them, we actually
|
||||
get nice IO patterns, because:
|
||||
- larger objects tend to be "more recent" (Linus' law: files grow)
|
||||
- we actively try to generate deltas from a larger object to a
|
||||
smaller one
|
||||
- this means that the top-of-tree very seldom has deltas
|
||||
(i.e. deltas in _practice_ are "backwards deltas")
|
||||
|
||||
Again, we should reread that whole paragraph. Not just because
|
||||
Linus has slipped Linus's Law in there on us, but because it is
|
||||
important. Let's make sure we clarify some of the points here:
|
||||
|
||||
<njs`> So the point is just that in practice, delta order and
|
||||
recency order match each other quite well.
|
||||
|
||||
<linus> Yes. There's another nice side to this (and yes, it was
|
||||
designed that way ;):
|
||||
- the reason we generate deltas against the larger object is
|
||||
actually a big space saver too!
|
||||
|
||||
<njs`> Hmm, but your last comment (if "we haven't yet written out
|
||||
the object it is a delta against, we write out the base object
|
||||
first"), seems like it would make these facts mostly
|
||||
irrelevant because even if in practice you would not have to
|
||||
wander around much, in fact you just brute-force say that in
|
||||
the cases where you might have to wander, don't do that :-)
|
||||
|
||||
<linus> Yes and no. Notice the rule: we only write out the base
|
||||
object first if the delta against it was more recent. That
|
||||
means that you can actually have deltas that refer to a base
|
||||
object that is _not_ close to the delta object, but that only
|
||||
happens when the delta is needed to generate an _old_ object.
|
||||
|
||||
<linus> See?
|
||||
|
||||
Yeah, no. I missed that on the first two or three readings myself.
|
||||
|
||||
<linus> This keeps the front of the pack dense. The front of the
|
||||
pack never contains data that isn't relevant to a "recent"
|
||||
object. The size optimization comes from our use of xdelta
|
||||
(but is true for many other delta algorithms): removing data
|
||||
is cheaper (in size) than adding data.
|
||||
|
||||
When you remove data, you only need to say "copy bytes n--m".
|
||||
In contrast, in a delta that _adds_ data, you have to say "add
|
||||
these bytes: 'actual data goes here'"
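To make the asymmetry concrete, here is a toy delta format with only "copy" and "add" instructions. The instruction layout is an assumption invented for this sketch and is not Git's on-disk delta encoding; it only shows why a delta that removes data carries no literal bytes.

```python
def apply_delta(base: bytes, ops) -> bytes:
    """ops is a list of ("copy", start, end) or ("add", data) instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, start, end = op
            out += base[start:end]
        else:
            out += op[1]
    return bytes(out)


def delta_payload(ops) -> int:
    """Literal bytes the delta itself must carry; copies carry none."""
    return sum(len(op[1]) for op in ops if op[0] == "add")


big = b"header\n" + b"body " * 100 + b"footer\n"
small = b"header\nfooter\n"

# big -> small: two copy instructions, no literal data at all
shrink = [("copy", 0, 7), ("copy", len(big) - 7, len(big))]
# small -> big: the 500 bytes of body text have to be spelled out
grow = [("copy", 0, 7), ("add", b"body " * 100), ("copy", 7, 14)]
```

Here `delta_payload(shrink)` is 0 while `delta_payload(grow)` is 500, which is the point Linus makes about deltas from a larger object to a smaller one.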
|
||||
|
||||
*** njs` has quit: Read error: 104 (Connection reset by peer)
|
||||
|
||||
<linus> Uhhuh. I hope I didn't blow njs` mind.
|
||||
|
||||
*** njs` has joined channel #git
|
||||
|
||||
<pasky> :)
|
||||
|
||||
The silent observers are amused. Of course.
|
||||
|
||||
And as if njs` was expected to be omniscient:
|
||||
|
||||
<linus> njs - did you miss anything?
|
||||
|
||||
OK, I'll spell it out. That's Geek Humor. If njs` was not actually
|
||||
connected for a little bit there, how would he know if he missed anything
|
||||
while he was disconnected? He's a benevolent dictator with a sense of
|
||||
humor! Well noted!
|
||||
|
||||
<njs`> Stupid router. Or gremlins, or whatever.
|
||||
|
||||
It's a cheap shot at Cisco. Take 'em when you can.
|
||||
|
||||
<njs`> Yes and no. Notice the rule: we only write out the base
|
||||
object first if the delta against it was more recent.
|
||||
|
||||
I'm getting lost in all these orders, let me re-read :-)
|
||||
So the write-out order is from most recent to least recent?
|
||||
(Conceivably it could be the opposite way too, I'm not sure if
|
||||
we've said) though my connection back at home is logging, so I
|
||||
can just read what you said there :-)
|
||||
|
||||
And for those of you paying attention, the Omniscient Trick has just
|
||||
been detailed!
|
||||
|
||||
<linus> Yes, we always write out most recent first
|
||||
|
||||
<njs`> And, yeah, I got the part about deeper-in-history stuff
|
||||
having worse IO characteristics, one sort of doesn't care.
|
||||
|
||||
<linus> With the caveat that if the "most recent" needs an older
|
||||
object to delta against (hey, shrinking sometimes does
|
||||
happen), we write out the old object with the delta.
|
||||
|
||||
<njs`> (if only it happened more...)
|
||||
|
||||
<linus> Anyway, the pack-file could easily be denser still, but
|
||||
because it's used both for streaming (the Git protocol) and
|
||||
for on-disk, it has a few pessimizations.
|
||||
|
||||
Actually, it is a made-up word. But it is a made-up word being
|
||||
used as setup for a later optimization, which is a real word:
|
||||
|
||||
<linus> In particular, while the pack-file is then compressed,
|
||||
it's compressed just one object at a time, so the actual
|
||||
compression factor is less than it could be in theory. But it
|
||||
means that it's all nice random-access with a simple index to
|
||||
do "object name->location in packfile" translation.
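The trade-off Linus describes (one object compressed at a time, plus a simple index) can be sketched as follows. The byte layout and the `write_pack`/`read_object` names are assumptions for illustration; this is not the real packfile or index format.

```python
import zlib


def write_pack(objects):
    """objects: list of (name, data). Returns (pack_bytes, index)."""
    pack, index = bytearray(), {}
    for name, data in objects:
        blob = zlib.compress(data)            # each object compressed on its own
        index[name] = (len(pack), len(blob))  # offset and compressed length
        pack += blob
    return bytes(pack), index


def read_object(pack: bytes, index, name) -> bytes:
    """Random access: only the requested object is decompressed."""
    offset, length = index[name]
    return zlib.decompress(pack[offset:offset + length])
```

Compressing the whole file as one stream would usually be smaller, but then reading one object would mean decompressing everything in front of it.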
|
||||
|
||||
<njs`> I'm assuming the real win for delta-ing large->small is
|
||||
more homogeneous statistics for gzip to run over?
|
||||
|
||||
(You have to put the bytes in one place or another, but
|
||||
putting them in a larger blob wins on compression)
|
||||
|
||||
Actually, what is the compression strategy -- each delta
|
||||
individually gzipped, the whole file gzipped, somewhere in
|
||||
between, no compression at all, ....?
|
||||
|
||||
Right.
|
||||
|
||||
Reality IRC sets in. For example:
|
||||
|
||||
<pasky> I'll read the rest in the morning, I really have to go
|
||||
sleep or there's no hope whatsoever for me at the today's
|
||||
exam... g'nite all.
|
||||
|
||||
Heh.
|
||||
|
||||
<linus> pasky: g'nite
|
||||
|
||||
<njs`> pasky: 'luck
|
||||
|
||||
<linus> Right: large->small matters exactly because of compression
|
||||
behaviour. If it was non-compressed, it probably wouldn't make
|
||||
any difference.
|
||||
|
||||
<njs`> yeah
|
||||
|
||||
<linus> Anyway: I'm not even trying to claim that the pack-files
|
||||
are perfect, but they do tend to have a nice balance of
|
||||
density vs ease-of use.
|
||||
|
||||
Gasp! OK, saved. That's a fair Engineering trade off. Close call!
|
||||
In fact, Linus reflects on some Basic Engineering Fundamentals,
|
||||
design options, etc.
|
||||
|
||||
<linus> More importantly, they allow Git to still _conceptually_
|
||||
never deal with deltas at all, and be a "whole object" store.
|
||||
|
||||
Which has some problems (we discussed bad huge-file
|
||||
behaviour on the Git lists the other day), but it does mean
|
||||
that the basic Git concepts are really really simple and
|
||||
straightforward.
|
||||
|
||||
It's all been quite stable.
|
||||
|
||||
Which I think is very much a result of having very simple
|
||||
basic ideas, so that there's never any confusion about what's
|
||||
going on.
|
||||
|
||||
Bugs happen, but they are "simple" bugs. And bugs that
|
||||
actually get some object store detail wrong are almost always
|
||||
so obvious that they never go anywhere.
|
||||
|
||||
<njs`> Yeah.
|
||||
|
||||
Nuff said.
|
||||
|
||||
<linus> Anyway. I'm off for bed. It's not 6AM here, but I've got
|
||||
three kids, and have to get up early in the morning to send
|
||||
them off. I need my beauty sleep.
|
||||
|
||||
<njs`> :-)
|
||||
|
||||
<njs`> appreciate the infodump, I really was failing to find the
|
||||
details on Git packs :-)
|
||||
|
||||
And now you know the rest of the story.
|
||||
674
Documentation/technical/pack-protocol.txt
Normal file
|
|
@ -0,0 +1,674 @@
|
|||
Packfile transfer protocols
|
||||
===========================
|
||||
|
||||
Git supports transferring data in packfiles over the ssh://, git://, http:// and
|
||||
file:// transports. There exist two sets of protocols, one for pushing
|
||||
data from a client to a server and another for fetching data from a
|
||||
server to a client. The three transports (ssh, git, file) use the same
|
||||
protocol to transfer data. http is documented in http-protocol.txt.
|
||||
|
||||
The processes invoked in the canonical Git implementation are 'upload-pack'
|
||||
on the server side and 'fetch-pack' on the client side for fetching data;
|
||||
then 'receive-pack' on the server and 'send-pack' on the client for pushing
|
||||
data. The protocol functions to have a server tell a client what is
|
||||
currently on the server, then for the two to negotiate the smallest amount
|
||||
of data to send in order to fully update one or the other.
|
||||
|
||||
pkt-line Format
|
||||
---------------
|
||||
|
||||
The descriptions below build on the pkt-line format described in
|
||||
protocol-common.txt. When the grammar indicates `PKT-LINE(...)`, unless
|
||||
otherwise noted the usual pkt-line LF rules apply: the sender SHOULD
|
||||
include a LF, but the receiver MUST NOT complain if it is not present.
|
||||
|
||||
An error packet is a special pkt-line that contains an error string.
|
||||
|
||||
----
|
||||
error-line = PKT-LINE("ERR" SP explanation-text)
|
||||
----
|
||||
|
||||
Throughout the protocol, where `PKT-LINE(...)` is expected, an error packet MAY
|
||||
be sent. Once this packet is sent by a client or a server, the data transfer
|
||||
process defined in this protocol is terminated.
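As a reminder of what a pkt-line looks like on the wire, here is a small sketch. It assumes the framing from protocol-common.txt: four hex digits giving the total length (header included), followed by the payload, with "0000" as the flush-pkt. The function names are made up for the example.

```python
FLUSH_PKT = b"0000"


def pkt_line(payload: bytes) -> bytes:
    """Prefix payload with its total length (header included) as 4 hex digits."""
    length = len(payload) + 4
    if length > 0xFFFF:
        raise ValueError("four hex digits cannot express this length")
    return b"%04x" % length + payload


def error_line(explanation: str) -> bytes:
    """Build an error packet: PKT-LINE("ERR" SP explanation-text)."""
    return pkt_line(b"ERR " + explanation.encode())
```

For instance, `pkt_line(b"done\n")` yields `b"0009done\n"`.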
|
||||
|
||||
Transports
|
||||
----------
|
||||
There are three transports over which the packfile protocol is
|
||||
initiated. The Git transport is a simple, unauthenticated server that
|
||||
takes the command (almost always 'upload-pack', though Git
|
||||
servers can be configured to be globally writable, in which case
'receive-pack' initiation is also allowed) with which the client wishes to
|
||||
communicate and executes it and connects it to the requesting
|
||||
process.
|
||||
|
||||
In the SSH transport, the client just runs the 'upload-pack'
|
||||
or 'receive-pack' process on the server over the SSH protocol and then
|
||||
communicates with that invoked process over the SSH connection.
|
||||
|
||||
The file:// transport runs the 'upload-pack' or 'receive-pack'
|
||||
process locally and communicates with it over a pipe.
|
||||
|
||||
Extra Parameters
|
||||
----------------
|
||||
|
||||
The protocol provides a mechanism in which clients can send additional
|
||||
information in their first message to the server. These are called "Extra
|
||||
Parameters", and are supported by the Git, SSH, and HTTP protocols.
|
||||
|
||||
Each Extra Parameter takes the form of `<key>=<value>` or `<key>`.
|
||||
|
||||
Servers that receive any such Extra Parameters MUST ignore all
|
||||
unrecognized keys. Currently, the only Extra Parameter recognized is
|
||||
"version" with a value of '1' or '2'. See protocol-v2.txt for more
|
||||
information on protocol version 2.
|
||||
|
||||
Git Transport
|
||||
-------------
|
||||
|
||||
The Git transport starts off by sending the command and repository
|
||||
on the wire using the pkt-line format, followed by a NUL byte and a
|
||||
hostname parameter, terminated by a NUL byte.
|
||||
|
||||
0033git-upload-pack /project.git\0host=myserver.com\0
|
||||
|
||||
The transport may send Extra Parameters by adding an additional NUL
|
||||
byte, and then adding one or more NUL-terminated strings:
|
||||
|
||||
003egit-upload-pack /project.git\0host=myserver.com\0\0version=1\0
|
||||
|
||||
--
   git-proto-request = request-command SP pathname NUL
                       [ host-parameter NUL ] [ NUL extra-parameters ]
   request-command   = "git-upload-pack" / "git-receive-pack" /
                       "git-upload-archive"   ; case sensitive
   pathname          = *( %x01-ff ) ; exclude NUL
   host-parameter    = "host=" hostname [ ":" port ]
   extra-parameters  = 1*extra-parameter
   extra-parameter   = 1*( %x01-ff ) NUL
--
|
||||
|
||||
host-parameter is used for the
|
||||
git-daemon name based virtual hosting. See --interpolated-path
|
||||
option to git daemon, with the %H/%CH format characters.
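The grammar can be checked against the two example requests above with a short sketch. `git_proto_request` is a hypothetical helper name; it simply assembles the fields in grammar order and prefixes the pkt-line length.

```python
def git_proto_request(command, pathname, host=None, extra_parameters=()):
    """Build the initial request line for the git:// transport."""
    payload = command.encode() + b" " + pathname.encode() + b"\0"
    if host is not None:
        payload += b"host=" + host.encode() + b"\0"
    if extra_parameters:
        # one extra NUL, then each parameter NUL-terminated
        payload += b"\0" + b"".join(p.encode() + b"\0" for p in extra_parameters)
    return b"%04x" % (len(payload) + 4) + payload


plain = git_proto_request("git-upload-pack", "/project.git", "myserver.com")
with_version = git_proto_request("git-upload-pack", "/project.git",
                                 "myserver.com", ["version=1"])
```

`plain` starts with `0033` and `with_version` with `003e`, matching the length prefixes in the examples.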
|
||||
|
||||
Basically what the Git client is doing to connect to an 'upload-pack'
|
||||
process on the server side over the Git protocol is this:
|
||||
|
||||
   $ echo -e -n \
     "003agit-upload-pack /schacon/gitbook.git\0host=example.com\0" |
     nc -v example.com 9418
|
||||
|
||||
|
||||
SSH Transport
|
||||
-------------
|
||||
|
||||
Initiating the upload-pack or receive-pack processes over SSH is
|
||||
executing the binary on the server via SSH remote execution.
|
||||
It is basically equivalent to running this:
|
||||
|
||||
$ ssh git.example.com "git-upload-pack '/project.git'"
|
||||
|
||||
For a server to support Git pushing and pulling for a given user over
|
||||
SSH, that user needs to be able to execute one or both of those
|
||||
commands via the SSH shell that they are provided on login. On some
|
||||
systems, that shell access is limited to only being able to run those
|
||||
two commands, or even just one of them.
|
||||
|
||||
In an ssh:// format URI, the path is absolute in the URI, so the '/' after
|
||||
the host name (or port number) is sent as an argument, which is then
|
||||
read by the remote git-upload-pack exactly as is, so it's effectively
|
||||
an absolute path in the remote filesystem.
|
||||
|
||||
   git clone ssh://user@example.com/project.git
    |
    v
   ssh user@example.com "git-upload-pack '/project.git'"
|
||||
|
||||
In a "user@host:path" format URI, the path is relative to the user's home
|
||||
directory, because the Git client will run:
|
||||
|
||||
   git clone user@example.com:project.git
    |
    v
   ssh user@example.com "git-upload-pack 'project.git'"
|
||||
|
||||
The exception is if a '~' is used, in which case
|
||||
we execute it without the leading '/'.
|
||||
|
||||
   ssh://user@example.com/~alice/project.git
    |
    v
   ssh user@example.com "git-upload-pack '~alice/project.git'"
|
||||
|
||||
Depending on the value of the `protocol.version` configuration variable,
|
||||
Git may attempt to send Extra Parameters as a colon-separated string in
|
||||
the GIT_PROTOCOL environment variable. This is done only if
|
||||
the `ssh.variant` configuration variable indicates that the ssh command
|
||||
supports passing environment variables as an argument.
|
||||
|
||||
A few things to remember here:
|
||||
|
||||
- The "command name" is spelled with dash (e.g. git-upload-pack), but
|
||||
this can be overridden by the client;
|
||||
|
||||
- The repository path is always quoted with single quotes.
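The three URL-to-command mappings above can be sketched as one function. This is a simplification under stated assumptions: `ssh_invocation` is a made-up name, port numbers and the GIT_PROTOCOL environment variable are ignored, and the path is assumed to contain no single quotes.

```python
from urllib.parse import urlsplit


def ssh_invocation(url: str, command: str = "git-upload-pack"):
    """Return the argv the client would run for an SSH remote (simplified)."""
    if url.startswith("ssh://"):
        parts = urlsplit(url)
        host, path = parts.netloc, parts.path
        if path.startswith("/~"):
            path = path[1:]  # the '~' exception: drop the leading '/'
    else:
        # "user@host:path" form: the path is sent as written (relative to home)
        host, _, path = url.partition(":")
    return ["ssh", host, f"{command} '{path}'"]
```

Passing `command="git-receive-pack"` gives the push-side equivalent.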
|
||||
|
||||
Fetching Data From a Server
|
||||
---------------------------
|
||||
|
||||
When one Git repository wants to get data that a second repository
|
||||
has, the first can 'fetch' from the second. This operation determines
|
||||
what data the server has that the client does not, then streams that
|
||||
data down to the client in packfile format.
|
||||
|
||||
|
||||
Reference Discovery
|
||||
-------------------
|
||||
|
||||
When the client initially connects, the server will immediately respond
|
||||
with a version number (if "version=1" is sent as an Extra Parameter),
|
||||
and a listing of each reference it has (all branches and tags) along
|
||||
with the object name that each reference currently points to.
|
||||
|
||||
   $ echo -e -n "0045git-upload-pack /schacon/gitbook.git\0host=example.com\0\0version=1\0" |
      nc -v example.com 9418
   000eversion 1
   00887217a7c7e582c46cec22a130adf4b9d7d950fba0 HEAD\0multi_ack thin-pack
		side-band side-band-64k ofs-delta shallow no-progress include-tag
   00441d3fcd5ced445d1abc402225c0b8a1299641f497 refs/heads/integration
   003f7217a7c7e582c46cec22a130adf4b9d7d950fba0 refs/heads/master
   003cb88d2441cac0977faf98efc80305012112238d9d refs/tags/v0.9
   003c525128480b96c89e6418b1e40909bf6c5b2d580f refs/tags/v1.0
   003fe92df48743b7bc7d26bcaabfddde0a1e20cae47c refs/tags/v1.0^{}
   0000
|
||||
|
||||
The returned response is a pkt-line stream describing each ref and
|
||||
its current value. The stream MUST be sorted by name according to
|
||||
the C locale ordering.
|
||||
|
||||
If HEAD is a valid ref, HEAD MUST appear as the first advertised
|
||||
ref. If HEAD is not a valid ref, HEAD MUST NOT appear in the
|
||||
advertisement list at all, but other refs may still appear.
|
||||
|
||||
The stream MUST include capability declarations behind a NUL on the
|
||||
first ref. The peeled value of a ref (that is "ref^{}") MUST be
|
||||
immediately after the ref itself, if presented. A conforming server
|
||||
MUST peel the ref if it's an annotated tag.
|
||||
|
||||
----
  advertised-refs  =  *1("version 1")
                      (no-refs / list-of-refs)
                      *shallow
                      flush-pkt

  no-refs          =  PKT-LINE(zero-id SP "capabilities^{}"
                      NUL capability-list)

  list-of-refs     =  first-ref *other-ref
  first-ref        =  PKT-LINE(obj-id SP refname
                      NUL capability-list)

  other-ref        =  PKT-LINE(other-tip / other-peeled)
  other-tip        =  obj-id SP refname
  other-peeled     =  obj-id SP refname "^{}"

  shallow          =  PKT-LINE("shallow" SP obj-id)

  capability-list  =  capability *(SP capability)
  capability       =  1*(LC_ALPHA / DIGIT / "-" / "_")
  LC_ALPHA         =  %x61-7A
----
|
||||
|
||||
Server and client MUST use lowercase for obj-id; both MUST treat obj-id
|
||||
as case-insensitive.
|
||||
|
||||
See protocol-capabilities.txt for a list of allowed server capabilities
|
||||
and descriptions.
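A client-side reader for this advertisement might look like the sketch below. It is a simplified illustration, not the canonical implementation: the function names are invented, no validation of obj-ids or ordering is done, and the input is assumed to be the complete response held in memory.

```python
def read_pkt_lines(stream: bytes):
    """Yield pkt-line payloads; a flush-pkt ("0000") is yielded as None."""
    pos = 0
    while pos < len(stream):
        length = int(stream[pos:pos + 4], 16)
        if length == 0:
            yield None
            pos += 4
        else:
            yield stream[pos + 4:pos + length]
            pos += length


def parse_advertisement(stream: bytes):
    """Return (version, capabilities, refs, shallow) from a ref advertisement."""
    version, capabilities, refs, shallow = None, [], [], []
    first_ref_seen = False
    for payload in read_pkt_lines(stream):
        if payload is None:              # flush-pkt ends the advertisement
            break
        line = payload.rstrip(b"\n").decode()
        if line.startswith("version ") and not first_ref_seen:
            version = int(line.split(" ", 1)[1])
        elif line.startswith("shallow "):
            shallow.append(line.split(" ", 1)[1])
        else:
            if not first_ref_seen:       # capabilities sit behind a NUL on the first ref
                line, _, caps = line.partition("\0")
                capabilities = caps.split(" ")
                first_ref_seen = True
            obj_id, refname = line.split(" ", 1)
            if refname != "capabilities^{}":   # placeholder used by no-refs
                refs.append((obj_id, refname))
    return version, capabilities, refs, shallow
```

Peeled entries arrive as ordinary refs whose name ends in `^{}`, so a caller can pair them with the preceding tag.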
|
||||
|
||||
Packfile Negotiation
|
||||
--------------------
|
||||
After reference and capabilities discovery, the client can decide to
|
||||
terminate the connection by sending a flush-pkt, telling the server it can
|
||||
now gracefully terminate, and disconnect, when it does not need any pack
|
||||
data. This can happen with the ls-remote command, and also can happen when
|
||||
the client already is up to date.
|
||||
|
||||
Otherwise, it enters the negotiation phase, where the client and
server determine the minimal packfile necessary for transport. The
client tells the server what objects it wants, its shallow objects
(if any), and the maximum commit depth it wants (if any). With the
first 'want' line, the client also sends a list of the capabilities it
wants to be in effect, out of what the server said it could do.
|
||||
|
||||
----
  upload-request    =  want-list
                       *shallow-line
                       *1depth-request
                       [filter-request]
                       flush-pkt

  want-list         =  first-want
                       *additional-want

  shallow-line      =  PKT-LINE("shallow" SP obj-id)

  depth-request     =  PKT-LINE("deepen" SP depth) /
                       PKT-LINE("deepen-since" SP timestamp) /
                       PKT-LINE("deepen-not" SP ref)

  first-want        =  PKT-LINE("want" SP obj-id SP capability-list)
  additional-want   =  PKT-LINE("want" SP obj-id)

  depth             =  1*DIGIT

  filter-request    =  PKT-LINE("filter" SP filter-spec)
----
|
||||
|
||||
Clients MUST send all the obj-ids they want from the reference
|
||||
discovery phase as 'want' lines. Clients MUST send at least one
|
||||
'want' command in the request body. Clients MUST NOT mention an
|
||||
obj-id in a 'want' command which did not appear in the response
|
||||
obtained through ref discovery.
|
||||
|
||||
The client MUST write all obj-ids which it only has shallow copies
|
||||
of (meaning that it does not have the parents of a commit) as
|
||||
'shallow' lines so that the server is aware of the limitations of
|
||||
the client's history.
|
||||
|
||||
The client now sends the maximum commit history depth it wants for
|
||||
this transaction, which is the number of commits it wants from the
|
||||
tip of the history, if any, as a 'deepen' line. A depth of 0 is the
|
||||
same as not making a depth request. The client does not want to receive
|
||||
any commits beyond this depth, nor does it want objects needed only to
|
||||
complete those commits. Commits whose parents are not received as a
|
||||
result are defined as shallow and marked as such in the server. This
|
||||
information is sent back to the client in the next step.
|
||||
|
||||
The client can optionally request that pack-objects omit various
|
||||
objects from the packfile using one of several filtering techniques.
|
||||
These are intended for use with partial clone and partial fetch
|
||||
operations. An object that does not meet a filter-spec value is
|
||||
omitted unless explicitly requested in a 'want' line. See `rev-list`
|
||||
for possible filter-spec values.
|
||||
|
||||
Once all the 'want's and 'shallow's (and optional 'deepen') are
|
||||
transferred, clients MUST send a flush-pkt, to tell the server side
|
||||
that it is done sending the list.
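Putting the pieces of an upload-request together, a minimal sketch could look like this. The helper names are hypothetical, only the plain 'deepen' form of depth-request is covered, and filter requests are left out.

```python
def _pkt(text: str) -> bytes:
    """Encode one text line as a pkt-line, LF included."""
    data = text.encode() + b"\n"
    return b"%04x" % (len(data) + 4) + data


def upload_request(wants, capabilities=(), shallows=(), depth=None):
    """Build want lines, shallow lines, an optional deepen line and a flush-pkt."""
    if not wants:
        raise ValueError("at least one 'want' is required")
    lines = []
    for i, obj_id in enumerate(wants):
        if i == 0 and capabilities:
            # capabilities ride on the first 'want' line only
            lines.append(_pkt(f"want {obj_id} {' '.join(capabilities)}"))
        else:
            lines.append(_pkt(f"want {obj_id}"))
    lines += [_pkt(f"shallow {obj_id}") for obj_id in shallows]
    if depth:  # a depth of 0 is the same as not making a depth request
        lines.append(_pkt(f"deepen {depth}"))
    lines.append(b"0000")  # flush-pkt: the list is complete
    return b"".join(lines)
```

The caller is still responsible for only listing obj-ids that appeared in the ref advertisement.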
|
||||
|
||||
If the client sent a positive depth request, the server
|
||||
will determine which commits will and will not be shallow and
|
||||
send this information to the client. If the client did not request
|
||||
a positive depth, this step is skipped.
|
||||
|
||||
----
  shallow-update   =  *shallow-line
                      *unshallow-line
                      flush-pkt

  shallow-line     =  PKT-LINE("shallow" SP obj-id)

  unshallow-line   =  PKT-LINE("unshallow" SP obj-id)
----
|
||||
|
||||
If the client has requested a positive depth, the server will compute
|
||||
the set of commits which are no deeper than the desired depth. The set
|
||||
of commits starts at the client's wants.
|
||||
|
||||
The server writes 'shallow' lines for each
|
||||
commit whose parents will not be sent as a result. The server writes
|
||||
an 'unshallow' line for each commit which the client has indicated is
|
||||
shallow, but is no longer shallow at the currently requested depth
|
||||
(that is, its parents will now be sent). The server MUST NOT mark
|
||||
as unshallow anything which the client has not indicated was shallow.
|
||||
|
||||
Now the client will send a list of the obj-ids it has using 'have'
|
||||
lines, so the server can make a packfile that only contains the objects
|
||||
that the client needs. In multi_ack mode, the canonical implementation
|
||||
will send up to 32 of these at a time, then will send a flush-pkt. The
|
||||
canonical implementation will skip ahead and send the next 32 immediately,
|
||||
so that there is always a block of 32 "in-flight on the wire" at a time.
|
||||
|
||||
----
|
||||
upload-haves = have-list
|
||||
compute-end
|
||||
|
||||
have-list = *have-line
|
||||
have-line = PKT-LINE("have" SP obj-id)
|
||||
compute-end = flush-pkt / PKT-LINE("done")
|
||||
----
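The batching described above can be sketched as follows. This is a minimal illustration, not the canonical client: the helper names `pkt_line` and `have_rounds` are hypothetical, and each round is assumed to end with a flush-pkt rather than 'done'.

```python
from itertools import islice


def pkt_line(payload: bytes) -> bytes:
    """Frame a payload: 4 hex digits of total length (prefix included), then the payload."""
    return b"%04x" % (len(payload) + 4) + payload


def have_rounds(obj_ids, batch_size=32):
    """Yield one byte string per round: up to batch_size 'have' lines, then a flush-pkt.

    batch_size=32 mirrors the block size the text attributes to the
    canonical implementation; nothing here negotiates or reads ACKs.
    """
    it = iter(obj_ids)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        lines = b"".join(
            pkt_line(b"have " + oid.encode("ascii") + b"\n") for oid in batch
        )
        yield lines + b"0000"  # flush-pkt closes the round
```

A real client would interleave these rounds with reading the server's ACK/NAK responses and would stop early once it decides to send 'done'.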
|
||||
|
||||
If the server reads 'have' lines, it then will respond by ACKing any
|
||||
of the obj-ids the client said it had that the server also has. The
|
||||
server will ACK obj-ids differently depending on which ack mode is
|
||||
chosen by the client.
|
||||
|
||||
In multi_ack mode:
|
||||
|
||||
* the server will respond with 'ACK obj-id continue' for any common
|
||||
commits.
|
||||
|
||||
* once the server has found an acceptable common base commit and is
|
||||
ready to make a packfile, it will blindly ACK all 'have' obj-ids
|
||||
back to the client.
|
||||
|
||||
* the server will then send a 'NAK' and then wait for another response
|
||||
from the client - either a 'done' or another list of 'have' lines.
|
||||
|
||||
In multi_ack_detailed mode:
|
||||
|
||||
* the server will differentiate the ACKs where it is signaling
|
||||
that it is ready to send data with 'ACK obj-id ready' lines, and
|
||||
signals the identified common commits with 'ACK obj-id common' lines.
|
||||
|
||||
Without either multi_ack or multi_ack_detailed:
|
||||
|
||||
* upload-pack sends "ACK obj-id" on the first common object it finds.
|
||||
After that it says nothing until the client gives it a "done".
|
||||
|
||||
* upload-pack sends "NAK" on a flush-pkt if no common object
|
||||
has been found yet. If one has been found, and thus an ACK
|
||||
was already sent, it's silent on the flush-pkt.
|
||||
|
||||
After the client has gotten enough ACK responses that it can determine
|
||||
that the server has enough information to send an efficient packfile
|
||||
(in the canonical implementation, this is determined when it has received
|
||||
enough ACKs that it can color everything left in the --date-order queue
|
||||
as common with the server, or the --date-order queue is empty), or the
|
||||
client determines that it wants to give up (in the canonical implementation,
|
||||
this is determined when the client sends 256 'have' lines without getting
|
||||
any of them ACKed by the server - meaning there is nothing in common and
|
||||
the server should just send all of its objects), then the client will send
|
||||
a 'done' command. The 'done' command signals to the server that the client
|
||||
is ready to receive its packfile data.
|
||||
|
||||
However, the 256 limit *only* turns on in the canonical client
|
||||
implementation if we have received at least one "ACK %s continue"
|
||||
during a prior round. This helps to ensure that at least one common
|
||||
ancestor is found before we give up entirely.
|
||||
|
||||
Once the 'done' line is read from the client, the server will either
|
||||
send a final 'ACK obj-id' or it will send a 'NAK'. 'obj-id' is the object
|
||||
name of the last commit determined to be common. The server only sends
|
||||
ACK after 'done' if there is at least one common base and multi_ack or
|
||||
multi_ack_detailed is enabled. The server always sends NAK after 'done'
|
||||
if there is no common base found.
|
||||
|
||||
Instead of 'ACK' or 'NAK', the server may send an error message (for
|
||||
example, if it does not recognize an object in a 'want' line received
|
||||
from the client).
|
||||
|
||||
Then the server will start sending its packfile data.
|
||||
|
||||
----
|
||||
server-response = *ack_multi ack / nak
|
||||
ack_multi = PKT-LINE("ACK" SP obj-id ack_status)
|
||||
ack_status = "continue" / "common" / "ready"
|
||||
ack = PKT-LINE("ACK" SP obj-id)
|
||||
nak = PKT-LINE("NAK")
|
||||
----
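As a rough illustration of the grammar above, the payload of a single response pkt-line could be classified like this. The `Ack` type and `parse_server_response` function are hypothetical names, and the sketch assumes the length prefix has already been stripped and that the status word is separated from the obj-id by a single space, as in the examples in this document.

```python
from typing import NamedTuple, Optional


class Ack(NamedTuple):
    obj_id: str
    status: Optional[str]  # "continue", "common", "ready", or None for a plain ACK


def parse_server_response(payload: bytes) -> Optional[Ack]:
    """Return an Ack for an ACK line, or None for a NAK line."""
    text = payload.decode("ascii").rstrip("\n")
    if text == "NAK":
        return None
    parts = text.split(" ")
    if parts[0] != "ACK" or len(parts) not in (2, 3):
        raise ValueError("unexpected server response: %r" % text)
    status = parts[2] if len(parts) == 3 else None
    if status not in (None, "continue", "common", "ready"):
        raise ValueError("unknown ack status: %r" % status)
    return Ack(parts[1], status)
```

Error messages, which the server may send instead of 'ACK' or 'NAK', are not handled here and would surface as a `ValueError`.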
|
||||
|
||||
A simple clone may look like this (with no 'have' lines):
|
||||
|
||||
----
|
||||
C: 0054want 74730d410fcb6603ace96f1dc55ea6196122532d multi_ack \
|
||||
side-band-64k ofs-delta\n
|
||||
C: 0032want 7d1665144a3a975c05f1f43902ddaf084e784dbe\n
|
||||
C: 0032want 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a\n
|
||||
C: 0032want 7e47fe2bd8d01d481f44d7af0531bd93d3b21c01\n
|
||||
C: 0032want 74730d410fcb6603ace96f1dc55ea6196122532d\n
|
||||
C: 0000
|
||||
C: 0009done\n
|
||||
|
||||
S: 0008NAK\n
|
||||
S: [PACKFILE]
|
||||
----
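The length prefixes in the exchange above can be reproduced with a few lines of code. This is a minimal sketch; `pkt_line` and `want_line` are illustrative names, and the 65520-byte ceiling is taken from the largest pkt-line size this document mentions (side-band-64k mode).

```python
FLUSH_PKT = b"0000"  # special packet that terminates a list


def pkt_line(payload: bytes) -> bytes:
    """Prefix payload with its total length as 4 hex digits.

    The length counts the 4 prefix bytes themselves, so an empty
    payload would encode as b"0004", not b"0000".
    """
    length = len(payload) + 4
    if length > 65520:
        raise ValueError("payload too large for one pkt-line")
    return b"%04x" % length + payload


def want_line(obj_id: str, capabilities=()) -> bytes:
    """Build a 'want' line; capabilities belong on the first want only."""
    line = "want " + obj_id
    if capabilities:
        line += " " + " ".join(capabilities)
    return pkt_line(line.encode("ascii") + b"\n")
```

For instance, `pkt_line(b"done\n")` yields `b"0009done\n"`: five payload bytes plus the four-byte prefix.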
|
||||
|
||||
An incremental update (fetch) response might look like this:
|
||||
|
||||
----
|
||||
C: 0054want 74730d410fcb6603ace96f1dc55ea6196122532d multi_ack \
|
||||
side-band-64k ofs-delta\n
|
||||
C: 0032want 7d1665144a3a975c05f1f43902ddaf084e784dbe\n
|
||||
C: 0032want 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a\n
|
||||
C: 0000
|
||||
C: 0032have 7e47fe2bd8d01d481f44d7af0531bd93d3b21c01\n
|
||||
C: [30 more have lines]
|
||||
C: 0032have 74730d410fcb6603ace96f1dc55ea6196122532d\n
|
||||
C: 0000
|
||||
|
||||
S: 003aACK 7e47fe2bd8d01d481f44d7af0531bd93d3b21c01 continue\n
|
||||
S: 003aACK 74730d410fcb6603ace96f1dc55ea6196122532d continue\n
|
||||
S: 0008NAK\n
|
||||
|
||||
C: 0009done\n
|
||||
|
||||
S: 0031ACK 74730d410fcb6603ace96f1dc55ea6196122532d\n
|
||||
S: [PACKFILE]
|
||||
----
|
||||
|
||||
|
||||
Packfile Data
|
||||
-------------
|
||||
|
||||
Now that the client and server have finished negotiating the minimal set of
|
||||
data that needs to be sent to the client, the server
|
||||
will construct and send the required data in packfile format.
|
||||
|
||||
See pack-format.txt for what the packfile itself actually looks like.
|
||||
|
||||
If 'side-band' or 'side-band-64k' capabilities have been specified by
|
||||
the client, the server will send the packfile data multiplexed.
|
||||
|
||||
Each packet starts with the packet-line length of the amount of data
|
||||
that follows, followed by a single byte specifying the sideband the
|
||||
following data is coming in on.
|
||||
|
||||
In 'side-band' mode, it will send up to 999 data bytes plus 1 control
|
||||
code, for a total of up to 1000 bytes in a pkt-line. In 'side-band-64k'
|
||||
mode it will send up to 65519 data bytes plus 1 control code, for a
|
||||
total of up to 65520 bytes in a pkt-line.
|
||||
|
||||
The sideband byte will be a '1', '2' or a '3'. Sideband '1' will contain
|
||||
packfile data, sideband '2' will be used for progress information that the
|
||||
client will generally print to stderr and sideband '3' is used for error
|
||||
information.
|
||||
|
||||
If no 'side-band' capability was specified, the server will stream the
|
||||
entire packfile without multiplexing.
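A receiver for the multiplexed stream might be sketched as below. The function names are hypothetical, the whole stream is assumed to be in memory already, and the sideband selector is assumed to be a raw byte with value 1, 2 or 3 (not an ASCII digit).

```python
def read_pkt_lines(stream: bytes):
    """Yield pkt-line payloads from stream, stopping at the first flush-pkt."""
    pos = 0
    while pos < len(stream):
        length = int(stream[pos:pos + 4], 16)
        if length == 0:  # flush-pkt: end of the multiplexed data
            return
        if length < 4:
            raise ValueError("invalid pkt-line length")
        yield stream[pos + 4:pos + length]
        pos += length


def demux_sideband(stream: bytes):
    """Split a sideband stream into (packfile bytes, list of progress messages)."""
    pack = bytearray()
    progress = []
    for payload in read_pkt_lines(stream):
        band, data = payload[0], payload[1:]
        if band == 1:      # packfile data
            pack += data
        elif band == 2:    # progress, normally shown on stderr
            progress.append(data)
        elif band == 3:    # error reported by the server
            raise RuntimeError(data.decode("utf-8", errors="replace"))
        else:
            raise ValueError("unknown sideband %d" % band)
    return bytes(pack), progress
```

A streaming implementation would read the 4-byte length from the socket first and then exactly that many remaining bytes, instead of slicing a buffer.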
|
||||
|
||||
|
||||
Pushing Data To a Server
|
||||
------------------------
|
||||
|
||||
Pushing data to a server will invoke the 'receive-pack' process on the
|
||||
server, which will allow the client to tell it which references it should
|
||||
update and then send all the data the server will need for those new
|
||||
references to be complete. Once all the data is received and validated,
|
||||
the server will then update its references to what the client specified.
|
||||
|
||||
Authentication
|
||||
--------------
|
||||
|
||||
The protocol itself contains no authentication mechanisms. That is to be
|
||||
handled by the transport, such as SSH, before the 'receive-pack' process is
|
||||
invoked. If 'receive-pack' is configured over the Git transport, those
|
||||
repositories will be writable by anyone who can access that port (9418) as
|
||||
that transport is unauthenticated.
|
||||
|
||||
Reference Discovery
|
||||
-------------------
|
||||
|
||||
The reference discovery phase is done nearly the same way as it is in the
|
||||
fetching protocol. Each reference obj-id and name on the server is sent
|
||||
in packet-line format to the client, followed by a flush-pkt. The only
|
||||
real difference is that the capability listing is different - the only
|
||||
possible values are 'report-status', 'delete-refs', 'ofs-delta' and
|
||||
'push-options'.
|
||||
|
||||
Reference Update Request and Packfile Transfer
|
||||
----------------------------------------------
|
||||
|
||||
Once the client knows what references the server is at, it can send a
|
||||
list of reference update requests. For each reference on the server
|
||||
that it wants to update, it sends a line listing the obj-id currently on
|
||||
the server, the obj-id the client would like to update it to and the name
|
||||
of the reference.
|
||||
|
||||
This list is followed by a flush-pkt.
|
||||
|
||||
----
|
||||
update-requests = *shallow ( command-list | push-cert )
|
||||
|
||||
shallow = PKT-LINE("shallow" SP obj-id)
|
||||
|
||||
command-list = PKT-LINE(command NUL capability-list)
|
||||
*PKT-LINE(command)
|
||||
flush-pkt
|
||||
|
||||
command = create / delete / update
|
||||
create = zero-id SP new-id SP name
|
||||
delete = old-id SP zero-id SP name
|
||||
update = old-id SP new-id SP name
|
||||
|
||||
old-id = obj-id
|
||||
new-id = obj-id
|
||||
|
||||
push-cert = PKT-LINE("push-cert" NUL capability-list LF)
|
||||
PKT-LINE("certificate version 0.1" LF)
|
||||
PKT-LINE("pusher" SP ident LF)
|
||||
PKT-LINE("pushee" SP url LF)
|
||||
PKT-LINE("nonce" SP nonce LF)
|
||||
*PKT-LINE("push-option" SP push-option LF)
|
||||
PKT-LINE(LF)
|
||||
*PKT-LINE(command LF)
|
||||
*PKT-LINE(gpg-signature-lines LF)
|
||||
PKT-LINE("push-cert-end" LF)
|
||||
|
||||
push-option = 1*( VCHAR | SP )
|
||||
----
|
||||
|
||||
If the server has advertised the 'push-options' capability and the client has
|
||||
specified 'push-options' as part of the capability list above, the client then
|
||||
sends its push options followed by a flush-pkt.
|
||||
|
||||
----
|
||||
push-options = *PKT-LINE(push-option) flush-pkt
|
||||
----
|
||||
|
||||
For backwards compatibility with older Git servers, if the client sends a push
|
||||
cert and push options, it MUST send its push options both embedded within the
|
||||
push cert and after the push cert. (Note that the push options within the cert
|
||||
are prefixed, but the push options after the cert are not.) Both these lists
|
||||
MUST be the same, modulo the prefix.
|
||||
|
||||
After that, the client sends the packfile, which
|
||||
should contain all the objects that the server will need to complete the new
|
||||
references.
|
||||
|
||||
----
|
||||
packfile = "PACK" 28*(OCTET)
|
||||
----
|
||||
|
||||
If the receiving end does not support delete-refs, the sending end MUST
|
||||
NOT send a delete command.
|
||||
|
||||
If the receiving end does not support push-cert, the sending end
|
||||
MUST NOT send a push-cert command. When a push-cert command is
|
||||
sent, command-list MUST NOT be sent; the commands recorded in the
|
||||
push certificate are used instead.
|
||||
|
||||
The packfile MUST NOT be sent if the only command used is 'delete'.
|
||||
|
||||
A packfile MUST be sent if either create or update command is used,
|
||||
even if the server already has all the necessary objects. In this
|
||||
case the client MUST send an empty packfile. The only time this
|
||||
is likely to happen is if the client is creating
|
||||
a new branch or a tag that points to an existing obj-id.
|
||||
|
||||
The server will receive the packfile, unpack it, then validate each
|
||||
reference that is being updated to confirm that it hasn't changed while the request
|
||||
was being processed (the obj-id is still the same as the old-id), and
|
||||
it will run any update hooks to make sure that the update is acceptable.
|
||||
If all of that is fine, the server will then update the references.
|
||||
|
||||
Push Certificate
|
||||
----------------
|
||||
|
||||
A push certificate begins with a set of header lines. After the
|
||||
header and an empty line, the protocol commands follow, one per
|
||||
line. Note that the trailing LF in push-cert PKT-LINEs is _not_
|
||||
optional; it must be present.
|
||||
|
||||
Currently, the following header fields are defined:
|
||||
|
||||
`pusher` ident::
|
||||
Identify the GPG key in "Human Readable Name <email@address>"
|
||||
format.
|
||||
|
||||
`pushee` url::
|
||||
The repository URL (anonymized, if the URL contains
|
||||
authentication material) the user who ran `git push`
|
||||
intended to push into.
|
||||
|
||||
`nonce` nonce::
|
||||
The 'nonce' string the receiving repository asked the
|
||||
pushing user to include in the certificate, to prevent
|
||||
replay attacks.
|
||||
|
||||
The GPG signature lines are a detached signature for the contents
|
||||
recorded in the push certificate before the signature block begins.
|
||||
The detached signature is used to certify that the commands were
|
||||
given by the pusher, who must be the signer.
|
||||
|
||||
Report Status
|
||||
-------------
|
||||
|
||||
After receiving the pack data from the sender, the receiver sends a
|
||||
report if 'report-status' capability is in effect.
|
||||
It is a short listing of what happened in that update. It will first
|
||||
list the status of the packfile unpacking as either 'unpack ok' or
|
||||
'unpack [error]'. Then it will list the status for each of the references
|
||||
that it tried to update. Each line is either 'ok [refname]' if the
|
||||
update was successful, or 'ng [refname] [error]' if the update was not.
|
||||
|
||||
----
|
||||
report-status = unpack-status
|
||||
1*(command-status)
|
||||
flush-pkt
|
||||
|
||||
unpack-status = PKT-LINE("unpack" SP unpack-result)
|
||||
unpack-result = "ok" / error-msg
|
||||
|
||||
command-status = command-ok / command-fail
|
||||
command-ok = PKT-LINE("ok" SP refname)
|
||||
command-fail = PKT-LINE("ng" SP refname SP error-msg)
|
||||
|
||||
error-msg = 1*(OCTET) ; where not "ok"
|
||||
----
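A client could turn such a report into a simple result structure along these lines. This is only a sketch: `parse_report_status` is a hypothetical name, and its input is assumed to be the list of pkt-line payloads that precede the flush-pkt, with length prefixes already removed.

```python
def parse_report_status(payloads):
    """Return (unpack_result, per_ref) from report-status payloads.

    unpack_result is "ok" or the server's error message.
    per_ref maps each refname to None on success, or to the
    error message from its 'ng' line on failure.
    """
    lines = [p.decode("utf-8").rstrip("\n") for p in payloads]
    if not lines or not lines[0].startswith("unpack "):
        raise ValueError("report-status must start with an unpack line")
    unpack_result = lines[0][len("unpack "):]
    per_ref = {}
    for line in lines[1:]:
        if line.startswith("ok "):
            per_ref[line[3:]] = None
        elif line.startswith("ng "):
            refname, _, message = line[3:].partition(" ")
            per_ref[refname] = message
        else:
            raise ValueError("unexpected status line: %r" % line)
    return unpack_result, per_ref
```

Because refnames cannot contain spaces, splitting an 'ng' line at the first space after the prefix separates the refname from the error message.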
|
||||
|
||||
Updates can be unsuccessful for a number of reasons. The reference can have
|
||||
changed since the reference discovery phase was originally sent, meaning
|
||||
someone pushed in the meantime. The reference being pushed could be a
|
||||
non-fast-forward reference and the update hooks or configuration could be
|
||||
set to not allow that, etc. Also, some references can be updated while others
|
||||
can be rejected.
|
||||
|
||||
An example client/server communication might look like this:
|
||||
|
||||
----
|
||||
S: 006274730d410fcb6603ace96f1dc55ea6196122532d refs/heads/local\0report-status delete-refs ofs-delta\n
|
||||
S: 003e7d1665144a3a975c05f1f43902ddaf084e784dbe refs/heads/debug\n
|
||||
S: 003f74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/master\n
|
||||
S: 003d74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/team\n
|
||||
S: 0000
|
||||
|
||||
C: 00677d1665144a3a975c05f1f43902ddaf084e784dbe 74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/debug\n
|
||||
C: 006874730d410fcb6603ace96f1dc55ea6196122532d 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a refs/heads/master\n
|
||||
C: 0000
|
||||
C: [PACKDATA]
|
||||
|
||||
S: 000eunpack ok\n
|
||||
S: 0018ok refs/heads/debug\n
|
||||
S: 002ang refs/heads/master non-fast-forward\n
|
||||
----
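The client side of this exchange, building the command list, can be sketched as follows. The helper names are hypothetical, the zero-id is assumed to be 40 ASCII zeros (SHA-1 sized object names), and the capability list is attached to the first command after a NUL byte as the command-list grammar requires, even though the example above omits it.

```python
ZERO_ID = "0" * 40  # assumption: SHA-1 sized object names


def pkt_line(payload: bytes) -> bytes:
    """Frame a payload with its 4-hex-digit total length."""
    return b"%04x" % (len(payload) + 4) + payload


def update_request(commands, capabilities):
    """Encode (old_id, new_id, refname) tuples as a command-list.

    Only the first command carries the NUL-separated capability list;
    the list is terminated by a flush-pkt.
    """
    out = bytearray()
    for index, (old_id, new_id, name) in enumerate(commands):
        line = "%s %s %s" % (old_id, new_id, name)
        if index == 0:
            line += "\0" + " ".join(capabilities)
        out += pkt_line(line.encode("utf-8") + b"\n")
    out += b"0000"
    return bytes(out)


def needs_packfile(commands):
    """A packfile follows unless every command is a delete."""
    return any(new_id != ZERO_ID for _, new_id, _ in commands)
```

A create command passes `ZERO_ID` as the old id and a delete command passes it as the new id, matching the `create` and `delete` productions in the grammar.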
|
||||
324
Documentation/technical/partial-clone.txt
Normal file
|
|
@ -0,0 +1,324 @@
|
|||
Partial Clone Design Notes
|
||||
==========================
|
||||
|
||||
The "Partial Clone" feature is a performance optimization for Git that
|
||||
allows Git to function without having a complete copy of the repository.
|
||||
The goal of this work is to allow Git to better handle extremely large
|
||||
repositories.
|
||||
|
||||
During clone and fetch operations, Git downloads the complete contents
|
||||
and history of the repository. This includes all commits, trees, and
|
||||
blobs for the complete life of the repository. For extremely large
|
||||
repositories, clones can take hours (or days) and consume 100+GiB of disk
|
||||
space.
|
||||
|
||||
Often in these repositories there are many blobs and trees that the user
|
||||
does not need such as:
|
||||
|
||||
1. files outside of the user's work area in the tree. For example, in
|
||||
a repository with 500K directories and 3.5M files in every commit,
|
||||
we can avoid downloading many objects if the user only needs a
|
||||
narrow "cone" of the source tree.
|
||||
|
||||
2. large binary assets. For example, in a repository where large build
|
||||
artifacts are checked into the tree, we can avoid downloading all
|
||||
previous versions of these non-mergeable binary assets and only
|
||||
download versions that are actually referenced.
|
||||
|
||||
Partial clone allows us to avoid downloading such unneeded objects *in
|
||||
advance* during clone and fetch operations and thereby reduce download
|
||||
times and disk usage. Missing objects can later be "demand fetched"
|
||||
if/when needed.
|
||||
|
||||
Use of partial clone requires that the user be online and the origin
|
||||
remote be available for on-demand fetching of missing objects. This may
|
||||
or may not be problematic for the user. For example, if the user can
|
||||
stay within the pre-selected subset of the source tree, they may not
|
||||
encounter any missing objects. Alternatively, the user could try to
|
||||
pre-fetch various objects if they know that they are going offline.
|
||||
|
||||
|
||||
Non-Goals
|
||||
---------
|
||||
|
||||
Partial clone is a mechanism to limit the number of blobs and trees downloaded
|
||||
*within* a given range of commits -- and is therefore independent of and not
|
||||
intended to conflict with existing DAG-level mechanisms to limit the set of
|
||||
requested commits (i.e. shallow clone, single branch, or fetch '<refspec>').
|
||||
|
||||
|
||||
Design Overview
|
||||
---------------
|
||||
|
||||
Partial clone logically consists of the following parts:
|
||||
|
||||
- A mechanism for the client to describe unneeded or unwanted objects to
|
||||
the server.
|
||||
|
||||
- A mechanism for the server to omit such unwanted objects from packfiles
|
||||
sent to the client.
|
||||
|
||||
- A mechanism for the client to gracefully handle missing objects (that
|
||||
were previously omitted by the server).
|
||||
|
||||
- A mechanism for the client to backfill missing objects as needed.
|
||||
|
||||
|
||||
Design Details
|
||||
--------------
|
||||
|
||||
- A new pack-protocol capability "filter" is added to the fetch-pack and
|
||||
upload-pack negotiation.
|
||||
+
|
||||
This uses the existing capability discovery mechanism.
|
||||
See "filter" in Documentation/technical/pack-protocol.txt.
|
||||
|
||||
- Clients pass a "filter-spec" to clone and fetch which is passed to the
|
||||
server to request filtering during packfile construction.
|
||||
+
|
||||
There are various filters available to accommodate different situations.
|
||||
See "--filter=<filter-spec>" in Documentation/rev-list-options.txt.
|
||||
|
||||
- On the server pack-objects applies the requested filter-spec as it
|
||||
creates "filtered" packfiles for the client.
|
||||
+
|
||||
These filtered packfiles are *incomplete* in the traditional sense because
|
||||
they may contain objects that reference objects not contained in the
|
||||
packfile and that the client doesn't already have. For example, the
|
||||
filtered packfile may contain trees or tags that reference missing blobs
|
||||
or commits that reference missing trees.
|
||||
|
||||
- On the client these incomplete packfiles are marked as "promisor packfiles"
|
||||
and treated differently by various commands.
|
||||
|
||||
- On the client a repository extension is added to the local config to
|
||||
prevent older versions of git from failing mid-operation because of
|
||||
missing objects that they cannot handle.
|
||||
See "extensions.partialClone" in Documentation/technical/repository-version.txt.
|
||||
|
||||
|
||||
Handling Missing Objects
|
||||
------------------------
|
||||
|
||||
- An object may be missing due to a partial clone or fetch, or missing due
|
||||
to repository corruption. To differentiate these cases, the local
|
||||
repository specially indicates such filtered packfiles obtained from the
|
||||
promisor remote as "promisor packfiles".
|
||||
+
|
||||
These promisor packfiles consist of a "<name>.promisor" file with
|
||||
arbitrary contents (like the "<name>.keep" files), in addition to
|
||||
their "<name>.pack" and "<name>.idx" files.
|
||||
|
||||
- The local repository considers a "promisor object" to be an object that
|
||||
it knows (to the best of its ability) that the promisor remote has promised
|
||||
that it has, either because the local repository has that object in one of
|
||||
its promisor packfiles, or because another promisor object refers to it.
|
||||
+
|
||||
When Git encounters a missing object, Git can see if it is a promisor object
|
||||
and handle it appropriately. If not, Git can report a corruption.
|
||||
+
|
||||
This means that there is no need for the client to explicitly maintain an
|
||||
expensive-to-modify list of missing objects.[a]
|
||||
|
||||
- Since almost all Git code currently expects any referenced object to be
|
||||
present locally and because we do not want to force every command to do
|
||||
a dry-run first, a fallback mechanism is added to allow Git to attempt
|
||||
to dynamically fetch missing objects from the promisor remote.
|
||||
+
|
||||
When the normal object lookup fails to find an object, Git invokes
|
||||
fetch-object to try to get the object from the server and then retry
|
||||
the object lookup. This allows objects to be "faulted in" without
|
||||
complicated prediction algorithms.
|
||||
+
|
||||
For efficiency reasons, no check as to whether the missing object is
|
||||
actually a promisor object is performed.
|
||||
+
|
||||
Dynamic object fetching tends to be slow as objects are fetched one at
|
||||
a time.
|
||||
|
||||
- `checkout` (and any other command using `unpack-trees`) has been taught
|
||||
to bulk pre-fetch all required missing blobs in a single batch.
|
||||
|
||||
- `rev-list` has been taught to print missing objects.
|
||||
+
|
||||
This can be used by other commands to bulk prefetch objects.
|
||||
For example, a "git log -p A..B" may internally want to first do
|
||||
something like "git rev-list --objects --quiet --missing=print A..B"
|
||||
and prefetch those objects in bulk.
|
||||
|
||||
- `fsck` has been updated to be fully aware of promisor objects.
|
||||
|
||||
- `repack` in GC has been updated to not touch promisor packfiles at all,
|
||||
and to only repack other objects.
|
||||
|
||||
- The global variable "fetch_if_missing" is used to control whether an
|
||||
object lookup will attempt to dynamically fetch a missing object or
|
||||
report an error.
|
||||
+
|
||||
We are not happy with this global variable and would like to remove it,
|
||||
but that requires significant refactoring of the object code to pass an
|
||||
additional flag. We hope that concurrent efforts to add an ODB API can
|
||||
encompass this.
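The "fault in" fallback described in this section can be illustrated with a toy object store. This is a conceptual sketch only, not how Git's object code is structured: `ObjectStore`, `MissingObject` and the `fetch_object` callback are hypothetical, and the per-store `fetch_if_missing` attribute stands in for the global variable mentioned above.

```python
class MissingObject(KeyError):
    """Raised when an object is absent locally and cannot be fetched."""


class ObjectStore:
    def __init__(self, fetch_object, fetch_if_missing=True):
        self._objects = {}                 # oid -> object data
        self._fetch_object = fetch_object  # callable: oid -> {oid: data, ...}
        self.fetch_if_missing = fetch_if_missing

    def lookup(self, oid):
        try:
            return self._objects[oid]      # normal lookup
        except KeyError:
            pass
        if not self.fetch_if_missing:
            raise MissingObject(oid)       # report instead of fetching
        # Ask the promisor remote, store whatever it sends (it may send
        # more than was requested), then retry the lookup once.
        self._objects.update(self._fetch_object(oid))
        if oid not in self._objects:
            raise MissingObject(oid)
        return self._objects[oid]
```

As in the text, the sketch does not check whether the missing object is actually a promisor object before fetching, and it fetches one object per call, which is the slow path that bulk prefetching is meant to avoid.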
|
||||
|
||||
|
||||
Fetching Missing Objects
|
||||
------------------------
|
||||
|
||||
- Fetching of objects is done through the existing transport mechanism using
|
||||
transport_fetch_refs(), setting a new transport option
|
||||
TRANS_OPT_NO_DEPENDENTS to indicate that only the objects themselves are
|
||||
desired, not any object that they refer to.
|
||||
+
|
||||
Because some transports invoke fetch_pack() in the same process, fetch_pack()
|
||||
has been updated to not use any object flags when the corresponding argument
|
||||
(no_dependents) is set.
|
||||
|
||||
- The local repository sends a request with the hashes of all requested
|
||||
objects as "want" lines, and does not perform any packfile negotiation.
|
||||
It then receives a packfile.
|
||||
|
||||
- Because we are reusing the existing fetch-pack mechanism, fetching
|
||||
currently fetches all objects referred to by the requested objects, even
|
||||
though they are not necessary.
|
||||
|
||||
|
||||
Current Limitations
|
||||
-------------------
|
||||
|
||||
- The remote used for a partial clone (or the first partial fetch
|
||||
following a regular clone) is marked as the "promisor remote".
|
||||
+
|
||||
We are currently limited to a single promisor remote and only that
|
||||
remote may be used for subsequent partial fetches.
|
||||
+
|
||||
We accept this limitation because we believe initial users of this
|
||||
feature will be using it on repositories with a strong single central
|
||||
server.
|
||||
|
||||
- Dynamic object fetching will only ask the promisor remote for missing
|
||||
objects. We assume that the promisor remote has a complete view of the
|
||||
repository and can satisfy all such requests.
|
||||
|
||||
- Repack essentially treats promisor and non-promisor packfiles as 2
|
||||
distinct partitions and does not mix them. Repack currently only works
|
||||
on non-promisor packfiles and loose objects.
|
||||
|
||||
- Dynamic object fetching invokes fetch-pack once *for each item*
|
||||
because most algorithms stumble upon a missing object and need to have
|
||||
it resolved before continuing their work. This may incur significant
|
||||
overhead -- and multiple authentication requests -- if many objects are
|
||||
needed.
|
||||
|
||||
- Dynamic object fetching currently uses the existing pack protocol V0
|
||||
which means that each object is requested via fetch-pack. The server
|
||||
will send a full set of info/refs when the connection is established.
|
||||
If there is a large number of refs, this may incur significant overhead.
|
||||
|
||||
|
||||
Future Work
|
||||
-----------
|
||||
|
||||
- Allow more than one promisor remote and define a strategy for fetching
|
||||
missing objects from specific promisor remotes or for iterating over the
|
||||
set of promisor remotes until a missing object is found.
|
||||
+
|
||||
A user might want to have multiple geographically-close cache servers
|
||||
for fetching missing blobs while continuing to do filtered `git-fetch`
|
||||
commands from the central server, for example.
|
||||
+
|
||||
Or the user might want to work in a triangular work flow with multiple
|
||||
promisor remotes that each have an incomplete view of the repository.
|
||||
|
||||
- Allow repack to work on promisor packfiles (while keeping them distinct
|
||||
from non-promisor packfiles).
|
||||
|
||||
- Allow non-pathname-based filters to make use of packfile bitmaps (when
|
||||
present). This was just an omission during the initial implementation.
|
||||
|
||||
- Investigate use of a long-running process to dynamically fetch a series
|
||||
of objects, such as proposed in [5,6] to reduce process startup and
|
||||
overhead costs.
|
||||
+
|
||||
It would be nice if pack protocol V2 could allow that long-running
|
||||
process to make a series of requests over a single long-running
|
||||
connection.
|
||||
|
||||
- Investigate pack protocol V2 to avoid the info/refs broadcast on
|
||||
each connection with the server to dynamically fetch missing objects.
|
||||
|
||||
- Investigate the need to handle loose promisor objects.
|
||||
+
|
||||
Objects in promisor packfiles are allowed to reference missing objects
|
||||
that can be dynamically fetched from the server. An assumption was
|
||||
made that loose objects are only created locally and therefore should
|
||||
not reference a missing object. We may need to revisit that assumption
|
||||
if, for example, we dynamically fetch a missing tree and store it as a
|
||||
loose object rather than a single object packfile.
|
||||
+
|
||||
This does not necessarily mean we need to mark loose objects as promisor;
|
||||
it may be sufficient to relax the object lookup or is-promisor functions.
|
||||
|
||||
|
||||
Non-Tasks
|
||||
---------
|
||||
|
||||
- Every time the subject of "demand loading blobs" comes up it seems
|
||||
that someone suggests that the server be allowed to "guess" and send
|
||||
additional objects that may be related to the requested objects.
|
||||
+
|
||||
No work has gone into actually doing that; we're just documenting that
|
||||
it is a common suggestion. We're not sure how it would work and have
|
||||
no plans to work on it.
|
||||
+
|
||||
It is valid for the server to send more objects than requested (even
|
||||
for a dynamic object fetch), but we are not building on that.
|
||||
|
||||
|
||||
Footnotes
|
||||
---------
|
||||
|
||||
[a] expensive-to-modify list of missing objects: Earlier in the design of
|
||||
partial clone we discussed the need for a single list of missing objects.
|
||||
This would essentially be a sorted linear list of OIDs that were
|
||||
omitted by the server during a clone or subsequent fetches.
|
||||
|
||||
This file would need to be loaded into memory on every object lookup.
|
||||
It would need to be read, updated, and re-written (like the .git/index)
|
||||
on every explicit "git fetch" command *and* on any dynamic object fetch.
|
||||
|
||||
The cost to read, update, and write this file could add significant
|
||||
overhead to every command if there are many missing objects. For example,
|
||||
if there are 100M missing blobs, this file would be at least 2GiB on disk.
|
||||
|
||||
With the "promisor" concept, we *infer* a missing object based upon the
|
||||
type of packfile that references it.
|
||||
|
||||
|
||||
Related Links
|
||||
-------------
|
||||
[0] https://crbug.com/git/2
|
||||
Bug#2: Partial Clone
|
||||
|
||||
[1] https://public-inbox.org/git/20170113155253.1644-1-benpeart@microsoft.com/ +
|
||||
Subject: [RFC] Add support for downloading blobs on demand +
|
||||
Date: Fri, 13 Jan 2017 10:52:53 -0500
|
||||
|
||||
[2] https://public-inbox.org/git/cover.1506714999.git.jonathantanmy@google.com/ +
|
||||
Subject: [PATCH 00/18] Partial clone (from clone to lazy fetch in 18 patches) +
|
||||
Date: Fri, 29 Sep 2017 13:11:36 -0700
|
||||
|
||||
[3] https://public-inbox.org/git/20170426221346.25337-1-jonathantanmy@google.com/ +
|
||||
Subject: Proposal for missing blob support in Git repos +
|
||||
Date: Wed, 26 Apr 2017 15:13:46 -0700
|
||||
|
||||
[4] https://public-inbox.org/git/1488999039-37631-1-git-send-email-git@jeffhostetler.com/ +
|
||||
Subject: [PATCH 00/10] RFC Partial Clone and Fetch +
|
||||
Date: Wed, 8 Mar 2017 18:50:29 +0000
|
||||
|
||||
[5] https://public-inbox.org/git/20170505152802.6724-1-benpeart@microsoft.com/ +
|
||||
Subject: [PATCH v7 00/10] refactor the filter process code into a reusable module +
|
||||
Date: Fri, 5 May 2017 11:27:52 -0400
|
||||
|
||||
[6] https://public-inbox.org/git/20170714132651.170708-1-benpeart@microsoft.com/ +
|
||||
Subject: [RFC/PATCH v2 0/1] Add support for downloading blobs on demand +
|
||||
Date: Fri, 14 Jul 2017 09:26:50 -0400
|
||||
337
Documentation/technical/protocol-capabilities.txt
Normal file
|
|
@ -0,0 +1,337 @@
|
|||
Git Protocol Capabilities
|
||||
=========================
|
||||
|
||||
NOTE: this document describes capabilities for versions 0 and 1 of the pack
|
||||
protocol. For version 2, please refer to the link:protocol-v2.html[protocol-v2]
|
||||
doc.
|
||||
|
||||
Servers SHOULD support all capabilities defined in this document.
|
||||
|
||||
On the very first line of the initial server response of either
|
||||
receive-pack or upload-pack, the first reference is followed by
|
||||
a NUL byte and then a list of space delimited server capabilities.
|
||||
These allow the server to declare what it can and cannot support
|
||||
to the client.
|
||||
|
||||
Client will then send a space separated list of capabilities it wants
|
||||
to be in effect. The client MUST NOT ask for capabilities the server
|
||||
did not say it supports.
|
||||
|
||||
Server MUST diagnose and abort if capabilities it does not understand
|
||||
were sent. Server MUST NOT ignore capabilities that client requested
|
||||
and server advertised. As a consequence of these rules, server MUST
|
||||
NOT advertise capabilities it does not understand.
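On the client side, these rules amount to intersecting the capabilities it would like with those the server advertised. A minimal sketch follows, with hypothetical function names, assuming the payload of the first advertised reference line has already had its pkt-line length prefix removed.

```python
def parse_first_advertisement(payload: bytes):
    """Split the first ref line into (obj_id, refname, set of capabilities)."""
    ref_part, _, caps = payload.rstrip(b"\n").partition(b"\0")
    obj_id, _, refname = ref_part.partition(b" ")
    return obj_id.decode("ascii"), refname.decode("utf-8"), set(caps.decode("utf-8").split())


def choose_capabilities(advertised, wanted):
    """Keep only wanted capabilities the server advertised, in the client's order."""
    return [cap for cap in wanted if cap in advertised]
```

Capabilities that carry a value, such as 'agent', appear in the set as the full `name=value` token, so a client matching them would need to compare on the name part; that refinement is left out here.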
|
||||
|
||||
The 'atomic', 'report-status', 'delete-refs', 'quiet', and 'push-cert'
|
||||
capabilities are sent and recognized by the receive-pack (push to server)
|
||||
process.
|
||||
|
||||
The 'ofs-delta' and 'side-band-64k' capabilities are sent and recognized
|
||||
by both upload-pack and receive-pack protocols. The 'agent' capability
|
||||
may optionally be sent in both protocols.
|
||||
|
||||
All other capabilities are only recognized by the upload-pack (fetch
|
||||
from server) process.
|
||||
|
||||
multi_ack
|
||||
---------
|
||||
|
||||
The 'multi_ack' capability allows the server to return "ACK obj-id
|
||||
continue" as soon as it finds a commit that it can use as a common
|
||||
base, between the client's wants and the client's have set.
|
||||
|
||||
By sending this early, the server can potentially head off the client
|
||||
from walking any further down that particular branch of the client's
|
||||
repository history. The client may still need to walk down other
|
||||
branches, sending have lines for those, until the server has a
|
||||
complete cut across the DAG, or the client has said "done".
|
||||
|
||||
Without multi_ack, a client sends have lines in --date-order until
|
||||
the server has found a common base. That means the client will send
|
||||
have lines that are already known by the server to be common, because
|
||||
they overlap in time with another branch that the server hasn't found
|
||||
a common base on yet.
|
||||
|
||||
For example, suppose the client has commits in caps that the server
|
||||
doesn't and the server has commits in lower case that the client
|
||||
doesn't, as in the following diagram:
|
||||
|
||||
       +---- u ---------------------- x
      /              +----- y
     /              /
    a -- b -- c -- d -- E -- F
       \
        +--- Q -- R -- S
|
||||
|
||||
If the client wants x,y and starts out by saying have F,S, the server
|
||||
doesn't know what F,S is. Eventually the client says "have d" and
|
||||
the server sends "ACK d continue" to let the client know to stop
|
||||
walking down that line (so don't send c-b-a), but it's not done yet,
|
||||
it needs a base for x. The client keeps going with S-R-Q, until a
|
||||
gets reached, at which point the server has a clear base and it all
|
||||
ends.
|
||||
|
||||
Without multi_ack the client would have sent that c-b-a chain anyway,
|
||||
interleaved with S-R-Q.
|
||||
|
||||
multi_ack_detailed
|
||||
------------------
|
||||
This is an extension of multi_ack that permits client to better
|
||||
understand the server's in-memory state. See pack-protocol.txt,
|
||||
section "Packfile Negotiation" for more information.
|
||||
|
||||
no-done
|
||||
-------
|
||||
This capability should only be used with the smart HTTP protocol. If
|
||||
multi_ack_detailed and no-done are both present, then the sender is
|
||||
free to immediately send a pack following its first "ACK obj-id ready"
|
||||
message.
|
||||
|
||||
Without no-done in the smart HTTP protocol, the server session would
|
||||
end and the client has to make another trip to send "done" before
|
||||
the server can send the pack. no-done removes the last round and
|
||||
thus slightly reduces latency.
|
||||
|
||||
thin-pack
|
||||
---------
|
||||
|
||||
A thin pack is one with deltas which reference base objects not
|
||||
contained within the pack (but are known to exist at the receiving
|
||||
end). This can reduce the network traffic significantly, but it
|
||||
requires the receiving end to know how to "thicken" these packs by
|
||||
adding the missing bases to the pack.
|
||||
|
||||
The upload-pack server advertises 'thin-pack' when it can generate
|
||||
and send a thin pack. A client requests the 'thin-pack' capability
|
||||
when it understands how to "thicken" it, notifying the server that
|
||||
it can receive such a pack. A client MUST NOT request the
|
||||
'thin-pack' capability if it cannot turn a thin pack into a
|
||||
self-contained pack.
|
||||
|
||||
Receive-pack, on the other hand, is assumed by default to be able to
|
||||
handle thin packs, but can ask the client not to use the feature by
|
||||
advertising the 'no-thin' capability. A client MUST NOT send a thin
|
||||
pack if the server advertises the 'no-thin' capability.
|
||||
|
||||
The reasons for this asymmetry are historical. The receive-pack
|
||||
program did not exist until after the invention of thin packs, so
|
||||
historically the reference implementation of receive-pack always
|
||||
understood thin packs. Adding 'no-thin' later allowed receive-pack
|
||||
to disable the feature in a backwards-compatible manner.
|
||||
|
||||
|
||||
side-band, side-band-64k
|
||||
------------------------
|
||||
|
||||
This capability means that the server can send, and the client can understand, multiplexed
|
||||
progress reports and error info interleaved with the packfile itself.
|
||||
|
||||
These two options are mutually exclusive. A modern client always
|
||||
favors 'side-band-64k'.
|
||||
|
||||
Either mode indicates that the packfile data will be streamed broken
|
||||
up into packets of up to either 1000 bytes in the case of 'side-band',
|
||||
or 65520 bytes in the case of 'side-band-64k'. Each packet is made up
|
||||
of a leading 4-byte pkt-line length of how much data is in the packet,
|
||||
followed by a 1-byte stream code, followed by the actual data.
|
||||
|
||||
The stream code can be one of:
|
||||
|
||||
1 - pack data
|
||||
2 - progress messages
|
||||
3 - fatal error message just before stream aborts
|
||||
|
||||
The "side-band-64k" capability came about as a way for newer clients
|
||||
that can handle much larger packets to request packets that are
|
||||
actually crammed nearly full, while maintaining backward compatibility
|
||||
for the older clients.
|
||||
|
||||
Further, with side-band and its up to 1000-byte messages, it's actually
|
||||
999 bytes of payload and 1 byte for the stream code. With side-band-64k,
|
||||
same deal, you have up to 65519 bytes of data and 1 byte for the stream
|
||||
code.
|
||||
|
||||
The client MUST send at most one of "side-band" and
|
||||
"side-band-64k". Server MUST diagnose it as an error if client requests
|
||||
both.
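
The stream codes described in this section can be demultiplexed along the following lines. This is only a sketch: `demux_sideband` is a hypothetical helper, it takes packet payloads whose 4-byte pkt-line length has already been removed, and raising an exception on stream 3 is one possible way to surface the fatal error.

```python
def demux_sideband(payloads):
    """Split side-band packet payloads into (pack data, progress text)."""
    pack, progress = bytearray(), bytearray()
    for payload in payloads:
        if not payload:
            raise ValueError("side-band packet without a stream code")
        code, data = payload[0], payload[1:]
        if code == 1:        # pack data
            pack += data
        elif code == 2:      # progress messages
            progress += data
        elif code == 3:      # fatal error message just before the stream aborts
            raise RuntimeError("remote error: " + data.decode("utf-8", "replace"))
        else:
            raise ValueError("unknown stream code %d" % code)
    return bytes(pack), bytes(progress)
```

A caller would typically write the first element to the incoming packfile and show the second on the terminal.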
|
||||
|
||||
ofs-delta
|
||||
---------
|
||||
|
||||
Server can send, and client understand PACKv2 with delta referring to
|
||||
its base by position in pack rather than by an obj-id. That is, they can
|
||||
send/read OBJ_OFS_DELTA (aka type 6) in a packfile.
|
||||
|
||||
agent
|
||||
-----
|
||||
|
||||
The server may optionally send a capability of the form `agent=X` to
|
||||
notify the client that the server is running version `X`. The client may
|
||||
optionally return its own agent string by responding with an `agent=Y`
|
||||
capability (but it MUST NOT do so if the server did not mention the
|
||||
agent capability). The `X` and `Y` strings may contain any printable
|
||||
ASCII characters except space (i.e., the byte range 32 < x < 127), and
|
||||
are typically of the form "package/version" (e.g., "git/1.8.3.1"). The
|
||||
agent strings are purely informative for statistics and debugging
|
||||
purposes, and MUST NOT be used to programmatically assume the presence
|
||||
or absence of particular features.
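
The character restriction on agent strings is small enough to check directly. The helper below is a hypothetical sketch; rejecting the empty string is an assumption, since the text above does not say whether an empty agent value is allowed.

```python
def is_valid_agent(value: str) -> bool:
    """True if every character is printable ASCII other than space (32 < x < 127)."""
    return bool(value) and all(32 < ord(ch) < 127 for ch in value)
```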
|
||||
|
||||
symref
|
||||
------
|
||||
|
||||
This parameterized capability is used to inform the receiver which symbolic ref
|
||||
points to which ref; for example, "symref=HEAD:refs/heads/master" tells the
|
||||
receiver that HEAD points to master. This capability can be repeated to
|
||||
represent multiple symrefs.
|
||||
|
||||
Servers SHOULD include this capability for the HEAD symref if it is one of the
|
||||
refs being sent.
|
||||
|
||||
Clients MAY use the parameters from this capability to select the proper initial
|
||||
branch when cloning a repository.
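
A client that wants to pick the initial branch could collect the symref capabilities roughly as follows. The function name is hypothetical and the input is assumed to be the already-split list of capability strings.

```python
def parse_symrefs(capabilities):
    """Map each symbolic ref to its target from 'symref=<name>:<target>' entries."""
    symrefs = {}
    for cap in capabilities:
        if cap.startswith("symref="):
            name, _, target = cap[len("symref="):].partition(":")
            symrefs[name] = target
    return symrefs
```

With the example from the text, `parse_symrefs(["symref=HEAD:refs/heads/master"])` yields a mapping from `HEAD` to `refs/heads/master`.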
|
||||
|
||||
shallow
|
||||
-------
|
||||
|
||||
This capability adds "deepen", "shallow" and "unshallow" commands to
|
||||
the fetch-pack/upload-pack protocol so clients can request shallow
|
||||
clones.
|
||||
|
||||
deepen-since
|
||||
------------
|
||||
|
||||
This capability adds "deepen-since" command to fetch-pack/upload-pack
|
||||
protocol so the client can request shallow clones that are cut at a
|
||||
specific time, instead of depth. Internally it's equivalent of doing
|
||||
"rev-list --max-age=<timestamp>" on the server side. "deepen-since"
|
||||
cannot be used with "deepen".
|
||||
|
||||
deepen-not
|
||||
----------
|
||||
|
||||
This capability adds "deepen-not" command to fetch-pack/upload-pack
|
||||
protocol so the client can request shallow clones that are cut at a
|
||||
specific revision, instead of depth. Internally it's equivalent of
|
||||
doing "rev-list --not <rev>" on the server side. "deepen-not"
|
||||
cannot be used with "deepen", but can be used with "deepen-since".
|
||||
|
||||
deepen-relative
|
||||
---------------
|
||||
|
||||
If this capability is requested by the client, the semantics of
|
||||
"deepen" command is changed. The "depth" argument is the depth from
|
||||
the current shallow boundary, instead of the depth from remote refs.
|
||||
|
||||
no-progress
|
||||
-----------
|
||||
|
||||
The client was started with "git clone -q" or something, and doesn't
|
||||
want that side band 2. Basically the client just says "I do not
|
||||
wish to receive stream 2 on sideband, so do not send it to me, and if
|
||||
you did, I will drop it on the floor anyway". However, the sideband
|
||||
channel 3 is still used for error responses.
|
||||
|
||||
include-tag
|
||||
-----------
|
||||
|
||||
The 'include-tag' capability is about sending annotated tags if we are
|
||||
sending objects they point to. If we pack an object to the client, and
|
||||
a tag object points exactly at that object, we pack the tag object too.
|
||||
In general this allows a client to get all new annotated tags when it
|
||||
fetches a branch, in a single network connection.
|
||||
|
||||
Clients MAY always send include-tag, hardcoding it into a request when
|
||||
the server advertises this capability. The decision for a client to
|
||||
request include-tag only has to do with the client's desires for tag
|
||||
data, whether or not a server had advertised objects in the
|
||||
refs/tags/* namespace.
|
||||
|
||||
Servers MUST pack the tags if their referent is packed and the client
|
||||
has requested include-tag.
|
||||
|
||||
Clients MUST be prepared for the case where a server has ignored
|
||||
include-tag and has not actually sent tags in the pack. In such
|
||||
cases the client SHOULD issue a subsequent fetch to acquire the tags
|
||||
that include-tag would have otherwise given the client.
|
||||
|
||||
The server SHOULD send include-tag, if it supports it, regardless
|
||||
of whether or not there are tags available.
|
||||
|
||||
report-status
|
||||
-------------
|
||||
|
||||
The receive-pack process can receive a 'report-status' capability,
|
||||
which tells it that the client wants a report of what happened after
|
||||
a packfile upload and reference update. If the pushing client requests
|
||||
this capability, after unpacking and updating references the server
|
||||
will respond with whether the packfile unpacked successfully and if
|
||||
each reference was updated successfully. If any of those were not
|
||||
successful, it will send back an error message. See pack-protocol.txt
|
||||
for example messages.
|
||||
|
||||
delete-refs
|
||||
-----------
|
||||
|
||||
If the server sends back the 'delete-refs' capability, it means that
|
||||
it is capable of accepting a zero-id value as the target
|
||||
value of a reference update. It is not sent back by the client, it
|
||||
simply informs the client that it can be sent zero-id values
|
||||
to delete references.
|
||||
|
||||
quiet
|
||||
-----
|
||||
|
||||
If the receive-pack server advertises the 'quiet' capability, it is
|
||||
capable of silencing human-readable progress output which otherwise may
|
||||
be shown when processing the received pack. A send-pack client should
|
||||
respond with the 'quiet' capability to suppress server-side progress
|
||||
reporting if the local progress reporting is also being suppressed
|
||||
(e.g., via `push -q`, or if stderr does not go to a tty).
|
||||
|
||||
atomic
|
||||
------
|
||||
|
||||
If the server sends the 'atomic' capability it is capable of accepting
|
||||
atomic pushes. If the pushing client requests this capability, the server
|
||||
will update the refs in one atomic transaction. Either all refs are
|
||||
updated or none.
|
||||
|
||||
push-options
|
||||
------------
|
||||
|
||||
If the server sends the 'push-options' capability it is able to accept
|
||||
push options after the update commands have been sent, but before the
|
||||
packfile is streamed. If the pushing client requests this capability,
|
||||
the server will pass the options to the pre- and post- receive hooks
|
||||
that process this push request.
|
||||
|
||||
allow-tip-sha1-in-want
|
||||
----------------------
|
||||
|
||||
If the upload-pack server advertises this capability, fetch-pack may
|
||||
send "want" lines with SHA-1s that exist at the server but are not
|
||||
advertised by upload-pack.
|
||||
|
||||
allow-reachable-sha1-in-want
|
||||
----------------------------
|
||||
|
||||
If the upload-pack server advertises this capability, fetch-pack may
|
||||
send "want" lines with SHA-1s that exist at the server but are not
|
||||
advertised by upload-pack.
|
||||
|
||||
push-cert=<nonce>
|
||||
-----------------
|
||||
|
||||
The receive-pack server that advertises this capability is willing
|
||||
to accept a signed push certificate, and asks the <nonce> to be
|
||||
included in the push certificate. A send-pack client MUST NOT
|
||||
send a push-cert packet unless the receive-pack server advertises
|
||||
this capability.
|
||||
|
||||
filter
|
||||
------
|
||||
|
||||
If the upload-pack server advertises the 'filter' capability,
|
||||
fetch-pack may send "filter" commands to request a partial clone
|
||||
or partial fetch and request that the server omit various objects
|
||||
from the packfile.
|
||||
99
Documentation/technical/protocol-common.txt
Normal file
|
|
@ -0,0 +1,99 @@
|
|||
Documentation Common to Pack and Http Protocols
|
||||
===============================================
|
||||
|
||||
ABNF Notation
|
||||
-------------
|
||||
|
||||
ABNF notation as described by RFC 5234 is used within the protocol documents,
|
||||
except the following replacement core rules are used:
|
||||
----
|
||||
HEXDIG = DIGIT / "a" / "b" / "c" / "d" / "e" / "f"
|
||||
----
|
||||
|
||||
We also define the following common rules:
|
||||
----
|
||||
NUL = %x00
|
||||
zero-id = 40*"0"
|
||||
obj-id = 40*(HEXDIG)
|
||||
|
||||
refname = "HEAD"
|
||||
refname /= "refs/" <see discussion below>
|
||||
----
|
||||
|
||||
A refname is a hierarchical octet string beginning with "refs/" and
|
||||
not violating the 'git-check-ref-format' command's validation rules.
|
||||
More specifically:
|
||||
|
||||
. They can include slash `/` for hierarchical (directory)
|
||||
grouping, but no slash-separated component can begin with a
|
||||
dot `.`.
|
||||
|
||||
. They must contain at least one `/`. This enforces the presence of a
|
||||
category like `heads/`, `tags/` etc. but the actual names are not
|
||||
restricted.
|
||||
|
||||
. They cannot have two consecutive dots `..` anywhere.
|
||||
|
||||
. They cannot have ASCII control characters (i.e. bytes whose
|
||||
values are lower than \040, or \177 `DEL`), space, tilde `~`,
|
||||
caret `^`, colon `:`, question-mark `?`, asterisk `*`,
|
||||
or open bracket `[` anywhere.
|
||||
|
||||
. They cannot end with a slash `/` or a dot `.`.
|
||||
|
||||
. They cannot end with the sequence `.lock`.
|
||||
|
||||
. They cannot contain a sequence `@{`.
|
||||
|
||||
. They cannot contain a `\\`.
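
The rules listed above translate fairly directly into a checker. The sketch below covers only the rules spelled out in this section; `is_valid_refname` is a hypothetical name, and the real 'git-check-ref-format' command may apply checks that are not listed here.

```python
import re

# ASCII control characters, space, DEL, and the individually forbidden characters.
_FORBIDDEN = re.compile(r"[\x00-\x20\x7f~^:?*\[\\]")


def is_valid_refname(name: str) -> bool:
    """Check a refname against the rules listed in this section."""
    if name == "HEAD":
        return True
    if not name.startswith("refs/"):      # also guarantees at least one '/'
        return False
    if _FORBIDDEN.search(name):
        return False
    if ".." in name or "@{" in name:
        return False
    if name.endswith(("/", ".", ".lock")):
        return False
    # No slash-separated component may begin with a dot.
    return all(not part.startswith(".") for part in name.split("/"))
```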
|
||||
|
||||
|
||||
pkt-line Format
|
||||
---------------
|
||||
|
||||
Much (but not all) of the payload is described around pkt-lines.
|
||||
|
||||
A pkt-line is a variable length binary string. The first four bytes
|
||||
of the line, the pkt-len, indicates the total length of the line,
|
||||
in hexadecimal. The pkt-len includes the 4 bytes used to contain
|
||||
the length's hexadecimal representation.
|
||||
|
||||
A pkt-line MAY contain binary data, so implementors MUST ensure
|
||||
pkt-line parsing/formatting routines are 8-bit clean.
|
||||
|
||||
A non-binary line SHOULD be terminated by an LF, which if present
|
||||
MUST be included in the total length. Receivers MUST treat pkt-lines
|
||||
with non-binary data the same whether or not they contain the trailing
|
||||
LF (stripping the LF if present, and not complaining when it is
|
||||
missing).
|
||||
|
||||
The maximum length of a pkt-line's data component is 65516 bytes.
|
||||
Implementations MUST NOT send a pkt-line whose length exceeds 65520
|
||||
(65516 bytes of payload + 4 bytes of length data).
|
||||
|
||||
Implementations SHOULD NOT send an empty pkt-line ("0004").
|
||||
|
||||
A pkt-line with a length field of 0 ("0000"), called a flush-pkt,
|
||||
is a special case and MUST be handled differently than an empty
|
||||
pkt-line ("0004").
|
||||
|
||||
----
|
||||
pkt-line = data-pkt / flush-pkt
|
||||
|
||||
data-pkt = pkt-len pkt-payload
|
||||
pkt-len = 4*(HEXDIG)
|
||||
pkt-payload = (pkt-len - 4)*(OCTET)
|
||||
|
||||
flush-pkt = "0000"
|
||||
----
|
||||
|
||||
Examples (as C-style strings):
|
||||
|
||||
----
|
||||
pkt-line actual value
|
||||
---------------------------------
|
||||
"0006a\n" "a\n"
|
||||
"0005a" "a"
|
||||
"000bfoobar\n" "foobar\n"
|
||||
"0004" ""
|
||||
----
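
The framing rules and the examples above can be reproduced with a few lines of Python. This is a minimal sketch, not Git's own implementation: the function names are hypothetical, the decoder reports a flush-pkt as `None`, and it assumes the whole stream is already in memory.

```python
MAX_PAYLOAD = 65516   # maximum length of a pkt-line's data component
FLUSH_PKT = b"0000"


def encode_pkt_line(payload: bytes) -> bytes:
    """Prefix `payload` with its 4-digit hexadecimal pkt-len."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("pkt-line payload too long")
    # The pkt-len counts its own 4 bytes as well as the payload.
    return b"%04x" % (len(payload) + 4) + payload


def decode_pkt_lines(stream: bytes):
    """Yield each payload found in `stream`; a flush-pkt is yielded as None."""
    pos = 0
    while pos < len(stream):
        length = int(stream[pos:pos + 4], 16)
        if length == 0:                     # flush-pkt
            yield None
            pos += 4
            continue
        if length < 4 or pos + length > len(stream):
            raise ValueError("invalid or truncated pkt-line")
        yield stream[pos + 4:pos + length]
        pos += length
```

Encoding `b"foobar\n"` gives `b"000bfoobar\n"`, matching the third row of the table above.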
|
||||
455
Documentation/technical/protocol-v2.txt
Normal file
|
|
@ -0,0 +1,455 @@
|
|||
Git Wire Protocol, Version 2
|
||||
============================
|
||||
|
||||
This document presents a specification for a version 2 of Git's wire
|
||||
protocol. Protocol v2 will improve upon v1 in the following ways:
|
||||
|
||||
* Instead of multiple service names, multiple commands will be
|
||||
supported by a single service
|
||||
* Easily extendable as capabilities are moved into their own section
|
||||
of the protocol, no longer being hidden behind a NUL byte and
|
||||
limited by the size of a pkt-line
|
||||
* Separate out other information hidden behind NUL bytes (e.g. agent
|
||||
string as a capability and symrefs can be requested using 'ls-refs')
|
||||
* Reference advertisement will be omitted unless explicitly requested
|
||||
* ls-refs command to explicitly request some refs
|
||||
* Designed with http and stateless-rpc in mind. With clear flush
|
||||
semantics the http remote helper can simply act as a proxy
|
||||
|
||||
In protocol v2 communication is command oriented. When first contacting a
|
||||
server a list of capabilities will be advertised. Some of these capabilities
|
||||
will be commands which a client can request be executed. Once a command
|
||||
has completed, a client can reuse the connection and request that other
|
||||
commands be executed.
|
||||
|
||||
Packet-Line Framing
|
||||
-------------------
|
||||
|
||||
All communication is done using packet-line framing, just as in v1. See
|
||||
`Documentation/technical/pack-protocol.txt` and
|
||||
`Documentation/technical/protocol-common.txt` for more information.
|
||||
|
||||
In protocol v2 these special packets will have the following semantics:
|
||||
|
||||
* '0000' Flush Packet (flush-pkt) - indicates the end of a message
|
||||
* '0001' Delimiter Packet (delim-pkt) - separates sections of a message
|
||||
|
||||
Initial Client Request
|
||||
----------------------
|
||||
|
||||
In general a client can request to speak protocol v2 by sending
|
||||
`version=2` through the respective side-channel for the transport being
|
||||
used which inevitably sets `GIT_PROTOCOL`. More information can be
|
||||
found in `pack-protocol.txt` and `http-protocol.txt`. In all cases the
|
||||
response from the server is the capability advertisement.
|
||||
|
||||
Git Transport
|
||||
~~~~~~~~~~~~~
|
||||
|
||||
When using the git:// transport, you can request to use protocol v2 by
|
||||
sending "version=2" as an extra parameter:
|
||||
|
||||
003egit-upload-pack /project.git\0host=myserver.com\0\0version=2\0
|
||||
|
||||
SSH and File Transport
|
||||
~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
When using either the ssh:// or file:// transport, the GIT_PROTOCOL
|
||||
environment variable must be set explicitly to include "version=2".
|
||||
|
||||
HTTP Transport
|
||||
~~~~~~~~~~~~~~
|
||||
|
||||
When using the http:// or https:// transport a client makes a "smart"
|
||||
info/refs request as described in `http-protocol.txt` and requests that
|
||||
v2 be used by supplying "version=2" in the `Git-Protocol` header.
|
||||
|
||||
C: GET $GIT_URL/info/refs?service=git-upload-pack HTTP/1.0
|
||||
C: Git-Protocol: version=2
|
||||
|
||||
A v2 server would reply:
|
||||
|
||||
S: 200 OK
|
||||
S: <Some headers>
|
||||
S: ...
|
||||
S:
|
||||
S: 000eversion 2\n
|
||||
S: <capability-advertisement>
|
||||
|
||||
Subsequent requests are then made directly to the service
|
||||
`$GIT_URL/git-upload-pack`. (This works the same for git-receive-pack).
|
||||
|
||||
Capability Advertisement
|
||||
------------------------
|
||||
|
||||
A server which decides to communicate (based on a request from a client)
|
||||
using protocol version 2, notifies the client by sending a version string
|
||||
in its initial response followed by an advertisement of its capabilities.
|
||||
Each capability is a key with an optional value. Clients must ignore all
|
||||
unknown keys. Semantics of unknown values are left to the definition of
|
||||
each key. Some capabilities will describe commands which can be requested
|
||||
to be executed by the client.
|
||||
|
||||
capability-advertisement = protocol-version
|
||||
capability-list
|
||||
flush-pkt
|
||||
|
||||
protocol-version = PKT-LINE("version 2" LF)
|
||||
capability-list = *capability
|
||||
capability = PKT-LINE(key[=value] LF)
|
||||
|
||||
key = 1*(ALPHA | DIGIT | "-_")
|
||||
value = 1*(ALPHA | DIGIT | " -_.,?\/{}[]()<>!@#$%^&*+=:;")
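
A client might turn the advertisement into a lookup table as sketched below. The function name is hypothetical, and the input is assumed to be the list of pkt-line payloads received before the flush-pkt, with length prefixes already removed. Unknown keys are simply stored, which leaves the caller free to ignore them as required above.

```python
def parse_capability_advertisement(payloads):
    """Return {key: value or None} for a protocol v2 capability advertisement."""
    lines = [p.decode("utf-8").rstrip("\n") for p in payloads]
    if not lines or lines[0] != "version 2":
        raise ValueError("not a protocol v2 advertisement")
    capabilities = {}
    for line in lines[1:]:
        key, sep, value = line.partition("=")
        capabilities[key] = value if sep else None
    return capabilities
```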
|
||||
|
||||
Command Request
|
||||
---------------
|
||||
|
||||
After receiving the capability advertisement, a client can then issue a
|
||||
request to select the command it wants with any particular capabilities
|
||||
or arguments. There is then an optional section where the client can
|
||||
provide any command specific parameters or queries. Only a single
|
||||
command can be requested at a time.
|
||||
|
||||
request = empty-request | command-request
|
||||
empty-request = flush-pkt
|
||||
command-request = command
|
||||
capability-list
|
||||
[command-args]
|
||||
flush-pkt
|
||||
command = PKT-LINE("command=" key LF)
|
||||
command-args = delim-pkt
|
||||
*command-specific-arg
|
||||
|
||||
command-specific-args are packet line framed arguments defined by
|
||||
each individual command.
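
The grammar above can be exercised with a small request builder. This is a sketch under stated assumptions: the helper names are hypothetical, every line is sent with a trailing LF, and the delim-pkt is emitted only when there are command-specific arguments.

```python
FLUSH_PKT = b"0000"
DELIM_PKT = b"0001"


def pkt_line(text: str) -> bytes:
    """Frame one LF-terminated text line as a pkt-line."""
    data = (text + "\n").encode("utf-8")
    return b"%04x" % (len(data) + 4) + data


def command_request(command, capabilities=(), args=()):
    """Serialize command, capability-list, optional command-args and flush-pkt."""
    out = [pkt_line("command=" + command)]
    out += [pkt_line(cap) for cap in capabilities]
    if args:
        out.append(DELIM_PKT)
        out += [pkt_line(arg) for arg in args]
    out.append(FLUSH_PKT)
    return b"".join(out)


request = command_request("ls-refs", ["agent=example/0.1"], ["peel", "ref-prefix refs/heads/"])
```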
|
||||
|
||||
The server will then check to ensure that the client's request is
|
||||
comprised of a valid command as well as valid capabilities which were
|
||||
advertised. If the request is valid the server will then execute the
|
||||
command. A server MUST wait till it has received the client's entire
|
||||
request before issuing a response. The format of the response is
|
||||
determined by the command being executed, but in all cases a flush-pkt
|
||||
indicates the end of the response.
|
||||
|
||||
When a command has finished, and the client has received the entire
|
||||
response from the server, a client can either request that another
|
||||
command be executed or can terminate the connection. A client may
|
||||
optionally send an empty request consisting of just a flush-pkt to
|
||||
indicate that no more requests will be made.
|
||||
|
||||
Capabilities
|
||||
------------
|
||||
|
||||
There are two different types of capabilities: normal capabilities,
|
||||
which can be used to convey information or alter the behavior of a
|
||||
request, and commands, which are the core actions that a client wants to
|
||||
perform (fetch, push, etc).
|
||||
|
||||
Protocol version 2 is stateless by default. This means that all commands
|
||||
must only last a single round and be stateless from the perspective of the
|
||||
server side, unless the client has requested a capability indicating that
|
||||
state should be maintained by the server. Clients MUST NOT require state
|
||||
management on the server side in order to function correctly. This
|
||||
permits simple round-robin load-balancing on the server side, without
|
||||
needing to worry about state management.
|
||||
|
||||
agent
|
||||
~~~~~
|
||||
|
||||
The server can advertise the `agent` capability with a value `X` (in the
|
||||
form `agent=X`) to notify the client that the server is running version
|
||||
`X`. The client may optionally send its own agent string by including
|
||||
the `agent` capability with a value `Y` (in the form `agent=Y`) in its
|
||||
request to the server (but it MUST NOT do so if the server did not
|
||||
advertise the agent capability). The `X` and `Y` strings may contain any
|
||||
printable ASCII characters except space (i.e., the byte range 32 < x <
|
||||
127), and are typically of the form "package/version" (e.g.,
|
||||
"git/1.8.3.1"). The agent strings are purely informative for statistics
|
||||
and debugging purposes, and MUST NOT be used to programmatically assume
|
||||
the presence or absence of particular features.
|
||||
|
||||
ls-refs
|
||||
~~~~~~~
|
||||
|
||||
`ls-refs` is the command used to request a reference advertisement in v2.
|
||||
Unlike the current reference advertisement, ls-refs takes in arguments
|
||||
which can be used to limit the refs sent from the server.
|
||||
|
||||
Additional features not supported in the base command will be advertised
|
||||
as the value of the command in the capability advertisement in the form
|
||||
of a space separated list of features: "<command>=<feature 1> <feature 2>"
|
||||
|
||||
ls-refs takes in the following arguments:
|
||||
|
||||
symrefs
|
||||
In addition to the object pointed by it, show the underlying ref
|
||||
pointed by it when showing a symbolic ref.
|
||||
peel
|
||||
Show peeled tags.
|
||||
ref-prefix <prefix>
|
||||
When specified, only references having a prefix matching one of
|
||||
the provided prefixes are displayed.
|
||||
|
||||
The output of ls-refs is as follows:
|
||||
|
||||
output = *ref
|
||||
flush-pkt
|
||||
ref = PKT-LINE(obj-id SP refname *(SP ref-attribute) LF)
|
||||
ref-attribute = (symref | peeled)
|
||||
symref = "symref-target:" symref-target
|
||||
peeled = "peeled:" obj-id
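
Each `ref` line of the output can be taken apart as sketched below. The function name is hypothetical, and the payload is assumed to have had its pkt-line length removed. Attributes are split at the first colon, so a `symref-target:` or `peeled:` attribute ends up as a dictionary entry.

```python
def parse_ls_refs_line(payload: bytes):
    """Parse 'obj-id SP refname *(SP ref-attribute) LF' into its parts."""
    fields = payload.decode("utf-8").rstrip("\n").split(" ")
    obj_id, refname = fields[0], fields[1]
    attributes = {}
    for field in fields[2:]:
        key, _, value = field.partition(":")
        attributes[key] = value
    return obj_id, refname, attributes
```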
|
||||
|
||||
fetch
|
||||
~~~~~
|
||||
|
||||
`fetch` is the command used to fetch a packfile in v2. It can be looked
|
||||
at as a modified version of the v1 fetch where the ref-advertisement is
|
||||
stripped out (since the `ls-refs` command fills that role) and the
|
||||
message format is tweaked to eliminate redundancies and permit easy
|
||||
addition of future extensions.
|
||||
|
||||
Additional features not supported in the base command will be advertised
|
||||
as the value of the command in the capability advertisement in the form
|
||||
of a space separated list of features: "<command>=<feature 1> <feature 2>"
|
||||
|
||||
A `fetch` request can take the following arguments:
|
||||
|
||||
want <oid>
|
||||
Indicates to the server an object which the client wants to
|
||||
retrieve. Wants can be anything and are not limited to
|
||||
advertised objects.
|
||||
|
||||
have <oid>
|
||||
Indicates to the server an object which the client has locally.
|
||||
This allows the server to make a packfile which only contains
|
||||
the objects that the client needs. Multiple 'have' lines can be
|
||||
supplied.
|
||||
|
||||
done
|
||||
Indicates to the server that negotiation should terminate (or
|
||||
not even begin if performing a clone) and that the server should
|
||||
use the information supplied in the request to construct the
|
||||
packfile.
|
||||
|
||||
thin-pack
|
||||
Request that a thin pack be sent, which is a pack with deltas
|
||||
which reference base objects not contained within the pack (but
|
||||
are known to exist at the receiving end). This can reduce the
|
||||
network traffic significantly, but it requires the receiving end
|
||||
to know how to "thicken" these packs by adding the missing bases
|
||||
to the pack.
|
||||
|
||||
no-progress
|
||||
Request that progress information that would normally be sent on
|
||||
side-band channel 2, during the packfile transfer, should not be
|
||||
sent. However, the side-band channel 3 is still used for error
|
||||
responses.
|
||||
|
||||
include-tag
|
||||
Request that annotated tags should be sent if the objects they
|
||||
point to are being sent.
|
||||
|
||||
ofs-delta
|
||||
Indicate that the client understands PACKv2 with delta referring
|
||||
to its base by position in pack rather than by an oid. That is,
|
||||
they can read OBJ_OFS_DELTA (aka type 6) in a packfile.
|
||||
|
||||
If the 'shallow' feature is advertised the following arguments can be
|
||||
included in the client's request as well as the potential addition of the
|
||||
'shallow-info' section in the server's response as explained below.
|
||||
|
||||
shallow <oid>
|
||||
A client must notify the server of all commits for which it only
|
||||
has shallow copies (meaning that it doesn't have the parents of
|
||||
a commit) by supplying a 'shallow <oid>' line for each such
|
||||
object so that the server is aware of the limitations of the
|
||||
client's history. This is so that the server is aware that the
|
||||
client may not have all objects reachable from such commits.
|
||||
|
||||
deepen <depth>
|
||||
Requests that the fetch/clone should be shallow having a commit
|
||||
depth of <depth> relative to the remote side.
|
||||
|
||||
deepen-relative
|
||||
Requests that the semantics of the "deepen" command be changed
|
||||
to indicate that the depth requested is relative to the client's
|
||||
current shallow boundary, instead of relative to the requested
|
||||
commits.
|
||||
|
||||
deepen-since <timestamp>
|
||||
Requests that the shallow clone/fetch should be cut at a
|
||||
specific time, instead of depth. Internally it's equivalent to
|
||||
doing "git rev-list --max-age=<timestamp>". Cannot be used with
|
||||
"deepen".
|
||||
|
||||
deepen-not <rev>
|
||||
Requests that the shallow clone/fetch should be cut at a
|
||||
specific revision specified by '<rev>', instead of a depth.
|
||||
Internally it's equivalent of doing "git rev-list --not <rev>".
|
||||
Cannot be used with "deepen", but can be used with
|
||||
"deepen-since".
|
||||
|
||||
If the 'filter' feature is advertised, the following argument can be
|
||||
included in the client's request:
|
||||
|
||||
filter <filter-spec>
|
||||
Request that various objects from the packfile be omitted
|
||||
using one of several filtering techniques. These are intended
|
||||
for use with partial clone and partial fetch operations. See
|
||||
`rev-list` for possible "filter-spec" values. When communicating
|
||||
with other processes, senders SHOULD translate scaled integers
|
||||
(e.g. "1k") into a fully-expanded form (e.g. "1024") to aid
|
||||
interoperability with older receivers that may not understand
|
||||
newly-invented scaling suffixes. However, receivers SHOULD
|
||||
accept the following suffixes: 'k', 'm', and 'g' for 1024,
|
||||
1048576, and 1073741824, respectively.
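
Expanding a scaled integer before sending it is a one-liner per suffix. The sketch below handles only the three lowercase suffixes named above; the function name is hypothetical, and how other suffixes or uppercase letters should be treated is left open here.

```python
_SCALE = {"k": 1024, "m": 1048576, "g": 1073741824}


def expand_scaled_int(text: str) -> int:
    """Turn '1k' into 1024, '2m' into 2097152, and leave plain integers alone."""
    suffix = text[-1:]
    if suffix in _SCALE:
        return int(text[:-1]) * _SCALE[suffix]
    return int(text)
```

A sender would then put the expanded number, for example `1024` rather than `1k`, into the filter-spec it transmits.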
|
||||
|
||||
If the 'ref-in-want' feature is advertised, the following argument can
|
||||
be included in the client's request as well as the potential addition of
|
||||
the 'wanted-refs' section in the server's response as explained below.
|
||||
|
||||
want-ref <ref>
|
||||
Indicates to the server that the client wants to retrieve a
|
||||
particular ref, where <ref> is the full name of a ref on the
|
||||
server.
|
||||
|
||||
If the 'sideband-all' feature is advertised, the following argument can be
|
||||
included in the client's request:
|
||||
|
||||
sideband-all
|
||||
Instruct the server to send the whole response multiplexed, not just
|
||||
the packfile section. All non-flush and non-delim PKT-LINE in the
|
||||
response (not only in the packfile section) will then start with a byte
|
||||
indicating its sideband (1, 2, or 3), and the server may send "0005\2"
|
||||
(a PKT-LINE of sideband 2 with no payload) as a keepalive packet.
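The framing described above can be sketched as follows. This is an illustrative encoder and decoder with hypothetical function names, not Git's implementation; it only assumes the pkt-line layout of four hexadecimal length digits followed by the data.

```python
def encode_sideband_pkt(band, payload=b""):
    """Frame payload as one PKT-LINE whose first data byte is the sideband."""
    if band not in (1, 2, 3):
        raise ValueError("sideband must be 1, 2, or 3")
    length = 4 + 1 + len(payload)  # length digits + band byte + payload
    return b"%04x" % length + bytes([band]) + payload

def decode_sideband_pkt(pkt):
    """Return (band, payload), or None for a flush-pkt or delim-pkt."""
    length = int(pkt[:4], 16)
    if length in (0, 1):
        # flush-pkt (0000) and delim-pkt (0001) carry no sideband byte.
        return None
    if length < 5 or length != len(pkt):
        raise ValueError("malformed sideband PKT-LINE")
    return pkt[4], pkt[5:]
```

With these helpers, `encode_sideband_pkt(2)` produces the keepalive packet `0005` followed by the byte 2.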
|
||||
|
||||
The response of `fetch` is broken into a number of sections separated by
|
||||
delimiter packets (0001), with each section beginning with its section
|
||||
header.
|
||||
|
||||
output = *section
|
||||
section = (acknowledgments | shallow-info | wanted-refs | packfile)
|
||||
(flush-pkt | delim-pkt)
|
||||
|
||||
acknowledgments = PKT-LINE("acknowledgments" LF)
|
||||
(nak | *ack)
|
||||
(ready)
|
||||
ready = PKT-LINE("ready" LF)
|
||||
nak = PKT-LINE("NAK" LF)
|
||||
ack = PKT-LINE("ACK" SP obj-id LF)
|
||||
|
||||
shallow-info = PKT-LINE("shallow-info" LF)
|
||||
*PKT-LINE((shallow | unshallow) LF)
|
||||
shallow = "shallow" SP obj-id
|
||||
unshallow = "unshallow" SP obj-id
|
||||
|
||||
wanted-refs = PKT-LINE("wanted-refs" LF)
|
||||
*PKT-LINE(wanted-ref LF)
|
||||
wanted-ref = obj-id SP refname
|
||||
|
||||
packfile = PKT-LINE("packfile" LF)
|
||||
*PKT-LINE(%x01-03 *%x00-ff)
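To make the grammar concrete, the sketch below groups already-decoded packets into sections. It is an assumption-laden illustration: `split_sections`, `FLUSH`, and `DELIM` are hypothetical names, and packets are represented as text payloads with the trailing LF already removed.

```python
FLUSH = object()  # stands for a flush-pkt (0000)
DELIM = object()  # stands for a delim-pkt (0001)
KNOWN_SECTIONS = ("acknowledgments", "shallow-info", "wanted-refs", "packfile")

def split_sections(packets):
    """Group packets into a dict mapping section header to its lines."""
    sections = {}
    current = None  # list of lines for the section being read, if any
    for pkt in packets:
        if pkt is DELIM or pkt is FLUSH:
            current = None  # the next packet must be a section header
            if pkt is FLUSH:
                break       # a flush-pkt ends the response
        elif current is None:
            if pkt not in KNOWN_SECTIONS:
                raise ValueError(f"unknown section header: {pkt!r}")
            current = sections.setdefault(pkt, [])
        else:
            current.append(pkt)
    return sections
```

A response carrying acknowledgments followed by shallow information would be passed in as `["acknowledgments", "ACK <obj-id>", "ready", DELIM, "shallow-info", ..., FLUSH]`.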
|
||||
|
||||
acknowledgments section
|
||||
* If the client determines that it is finished with negotiations
|
||||
by sending a "done" line, the acknowledgments sections MUST be
|
||||
omitted from the server's response.
|
||||
|
||||
* Always begins with the section header "acknowledgments"
|
||||
|
||||
* The server will respond with "NAK" if none of the object ids sent
|
||||
as have lines were common.
|
||||
|
||||
* The server will respond with "ACK obj-id" for all of the
|
||||
object ids sent as have lines which are common.
|
||||
|
||||
* A response cannot have both "ACK" lines as well as a "NAK"
|
||||
line.
|
||||
|
||||
* The server will respond with a "ready" line indicating that
|
||||
the server has found an acceptable common base and is ready to
|
||||
make and send a packfile (which will be found in the packfile
|
||||
section of the same response)
|
||||
|
||||
* If the server has found a suitable cut point and has decided
|
||||
to send a "ready" line, then the server can decide to (as an
|
||||
optimization) omit any "ACK" lines it would have sent during
|
||||
its response. This is because the server will have already
|
||||
determined the objects it plans to send to the client and no
|
||||
further negotiation is needed.
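A client-side reading of these rules might look like the sketch below. `parse_acknowledgments` is a hypothetical helper; it takes the lines that follow the "acknowledgments" header with their LF removed.

```python
def parse_acknowledgments(lines):
    """Return (common_object_ids, ready) for one acknowledgments section."""
    common, nak, ready = [], False, False
    for line in lines:
        if line == "NAK":
            nak = True
        elif line == "ready":
            ready = True
        elif line.startswith("ACK "):
            common.append(line[4:])
        else:
            raise ValueError(f"unexpected line: {line!r}")
    if nak and common:
        raise ValueError("a response cannot contain both ACK and NAK lines")
    return common, ready
```

Because the server may omit "ACK" lines once it sends "ready", a section containing only `ready` is valid and yields `([], True)`.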
|
||||
|
||||
shallow-info section
|
||||
* If the client has requested a shallow fetch/clone, a shallow
|
||||
client requests a fetch or the server is shallow then the
|
||||
server's response may include a shallow-info section. The
|
||||
shallow-info section will be included if (due to one of the
|
||||
above conditions) the server needs to inform the client of any
|
||||
shallow boundaries or adjustments to the client's already
|
||||
existing shallow boundaries.
|
||||
|
||||
* Always begins with the section header "shallow-info"
|
||||
|
||||
* If a positive depth is requested, the server will compute the
|
||||
set of commits which are no deeper than the desired depth.
|
||||
|
||||
* The server sends a "shallow obj-id" line for each commit whose
|
||||
parents will not be sent in the following packfile.
|
||||
|
||||
* The server sends an "unshallow obj-id" line for each commit
|
||||
which the client has indicated is shallow, but is no longer
|
||||
shallow as a result of the fetch (due to its parents being
|
||||
sent in the following packfile).
|
||||
|
||||
* The server MUST NOT send any "unshallow" lines for anything
|
||||
which the client has not indicated was shallow as a part of
|
||||
its request.
|
||||
|
||||
* This section is only included if a packfile section is also
|
||||
included in the response.
|
||||
|
||||
wanted-refs section
|
||||
* This section is only included if the client has requested a
|
||||
ref using a 'want-ref' line and if a packfile section is also
|
||||
included in the response.
|
||||
|
||||
* Always begins with the section header "wanted-refs".
|
||||
|
||||
* The server will send a ref listing ("<oid> <refname>") for
|
||||
each reference requested using 'want-ref' lines.
|
||||
|
||||
* The server MUST NOT send any refs which were not requested
|
||||
using 'want-ref' lines.
|
||||
|
||||
packfile section
|
||||
* This section is only included if the client has sent 'want'
|
||||
lines in its request and either requested that no more
|
||||
negotiation be done by sending 'done' or if the server has
|
||||
decided it has found a sufficient cut point to produce a
|
||||
packfile.
|
||||
|
||||
* Always begins with the section header "packfile"
|
||||
|
||||
* The transmission of the packfile begins immediately after the
|
||||
section header
|
||||
|
||||
* The data transfer of the packfile is always multiplexed, using
|
||||
the same semantics of the 'side-band-64k' capability from
|
||||
protocol version 1. This means that each packet, during the
|
||||
packfile data stream, is made up of a leading 4-byte pkt-line
|
||||
length (typical of the pkt-line format), followed by a 1-byte
|
||||
stream code, followed by the actual data.
|
||||
|
||||
The stream code can be one of:
|
||||
1 - pack data
|
||||
2 - progress messages
|
||||
3 - fatal error message just before stream aborts
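The stream codes can be handled with a small demultiplexer. The following is a sketch under the assumption that each packet has already had its 4-byte length stripped, so that the first byte is the stream code; `demultiplex` and `RemoteError` are hypothetical names.

```python
class RemoteError(Exception):
    """Raised when the server reports a fatal error on stream 3."""

def demultiplex(packets):
    """Return (pack_bytes, progress_text) from an iterable of packets."""
    pack = bytearray()
    progress = []
    for pkt in packets:
        if not pkt:
            raise ValueError("packet without a stream code")
        code, data = pkt[0], pkt[1:]
        if code == 1:
            pack += data                                   # pack data
        elif code == 2:
            progress.append(data.decode("utf-8", "replace"))  # progress
        elif code == 3:
            raise RemoteError(data.decode("utf-8", "replace"))  # fatal
        else:
            raise ValueError(f"unknown stream code: {code}")
    return bytes(pack), "".join(progress)
```

Progress text would typically be shown to the user, while the pack bytes are fed to whatever consumes the packfile.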
|
||||
|
||||
server-option
|
||||
~~~~~~~~~~~~~
|
||||
|
||||
If advertised, indicates that any number of server specific options can be
|
||||
included in a request. This is done by sending each option as a
|
||||
"server-option=<option>" capability line in the capability-list section of
|
||||
a request.
|
||||
|
||||
The provided options must not contain a NUL or LF character.
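A client could build these capability lines as in the sketch below. `server_option_lines` is a hypothetical helper; it only enforces the character restriction stated above.

```python
def server_option_lines(options):
    """Return one "server-option=<option>" capability line per option."""
    lines = []
    for option in options:
        if "\0" in option or "\n" in option:
            raise ValueError("server options must not contain NUL or LF")
        lines.append(f"server-option={option}")
    return lines
```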
|
||||
201
Documentation/technical/racy-git.txt
Normal file
|
|
@ -0,0 +1,201 @@
|
|||
Use of index and Racy Git problem
|
||||
=================================
|
||||
|
||||
Background
|
||||
----------
|
||||
|
||||
The index is one of the most important data structures in Git.
|
||||
It represents a virtual working tree state by recording a list of
|
||||
paths and their object names and serves as a staging area to
|
||||
write out the next tree object to be committed. The state is
|
||||
"virtual" in the sense that it does not necessarily have to, and
|
||||
often does not, match the files in the working tree.
|
||||
|
||||
There are cases where Git needs to examine the differences between the
|
||||
virtual working tree state in the index and the files in the
|
||||
working tree. The most obvious case is when the user asks `git
|
||||
diff` (or its low level implementation, `git diff-files`) or
|
||||
`git ls-files --modified`. In addition, Git internally checks
|
||||
if the files in the working tree are different from what is
|
||||
recorded in the index to avoid stomping on local changes in them
|
||||
during patch application, switching branches, and merging.
|
||||
|
||||
In order to speed up this comparison between the files in the
|
||||
working tree and the index entries, the index entries record the
|
||||
information obtained from the filesystem via `lstat(2)` system
|
||||
call when they were last updated. When checking if they differ,
|
||||
Git first runs `lstat(2)` on the files and compares the result
|
||||
with this information (this is what was originally done by the
|
||||
`ce_match_stat()` function, but the current code does it in
|
||||
`ce_match_stat_basic()` function). If some of these "cached
|
||||
stat information" fields do not match, Git can tell that the
|
||||
files are modified without even looking at their contents.
|
||||
|
||||
Note: not all members in `struct stat` obtained via `lstat(2)`
|
||||
are used for this comparison. For example, `st_atime` obviously
|
||||
is not useful. Currently, Git compares the file type (regular
|
||||
files vs symbolic links) and executable bits (only for regular
|
||||
files) from `st_mode` member, `st_mtime` and `st_ctime`
|
||||
timestamps, `st_uid`, `st_gid`, `st_ino`, and `st_size` members.
|
||||
With a `USE_STDEV` compile-time option, `st_dev` is also
|
||||
compared, but this is not enabled by default because this member
|
||||
is not stable on network filesystems. With `USE_NSEC`
|
||||
compile-time option, `st_mtim.tv_nsec` and `st_ctim.tv_nsec`
|
||||
members are also compared. On Linux, this is not enabled by default
|
||||
because in-core timestamps can have finer granularity than
|
||||
on-disk timestamps, resulting in meaningless changes when an
|
||||
inode is evicted from the inode cache. See commit 8ce13b0
|
||||
of git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
|
||||
([PATCH] Sync in core time granularity with filesystems,
|
||||
2005-01-04). This patch is included in kernel 2.6.11 and newer, but
|
||||
only fixes the issue for file systems with exactly 1 ns or 1 s
|
||||
resolution. Other file systems are still broken in current Linux
|
||||
kernels (e.g. CEPH, CIFS, NTFS, UDF), see
|
||||
https://lkml.org/lkml/2015/6/9/714
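The cached stat comparison can be illustrated outside Git with `os.lstat`. This is a sketch, not Git's code: `CachedStat`, `cache_stat`, `stat_changed`, and `demo` are hypothetical names, and the fields mirror the default set listed in the note above (whole-second timestamps, no `st_dev`).

```python
import os
import stat
import tempfile
from collections import namedtuple

CachedStat = namedtuple(
    "CachedStat", "file_type executable mtime ctime uid gid ino size")

def cache_stat(path):
    """Record the lstat fields used for the comparison."""
    st = os.lstat(path)
    return CachedStat(
        stat.S_IFMT(st.st_mode),          # regular file vs symbolic link
        bool(st.st_mode & stat.S_IXUSR),  # executable bit
        int(st.st_mtime), int(st.st_ctime),
        st.st_uid, st.st_gid, st.st_ino, st.st_size,
    )

def stat_changed(path, cached):
    """True if any recorded field differs; file contents are never read."""
    return cache_stat(path) != cached

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "foo")
        with open(path, "w") as f:
            f.write("hello\n")
        cached = cache_stat(path)
        untouched = stat_changed(path, cached)   # nothing was modified
        with open(path, "a") as f:
            f.write("more\n")                    # the size grows
        return untouched, stat_changed(path, cached)
```

The second check reports a change because `st_size` differs, without looking at the contents.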
|
||||
|
||||
Racy Git
|
||||
--------
|
||||
|
||||
There is one slight problem with the optimization based on the
|
||||
cached stat information. Consider this sequence:
|
||||
|
||||
: modify 'foo'
|
||||
$ git update-index 'foo'
|
||||
: modify 'foo' again, in-place, without changing its size
|
||||
|
||||
The first `update-index` computes the object name of the
|
||||
contents of file `foo` and updates the index entry for `foo`
|
||||
along with the `struct stat` information. If the modification
|
||||
that follows it happens very fast so that the file's `st_mtime`
|
||||
timestamp does not change, after this sequence, the cached stat
|
||||
information the index entry records still exactly matches what you
|
||||
would see in the filesystem, even though the file `foo` is now
|
||||
different.
|
||||
This way, Git can incorrectly think files in the working tree
|
||||
are unmodified even though they actually are. This is called
|
||||
the "racy Git" problem (discovered by Pasky), and the entries
|
||||
that appear clean when they may not be because of this problem
|
||||
are called "racily clean".
|
||||
|
||||
To avoid this problem, Git does two things:
|
||||
|
||||
. When the cached stat information says the file has not been
|
||||
modified, and the `st_mtime` is the same as (or newer than)
|
||||
the timestamp of the index file itself (which is the time `git
|
||||
update-index foo` finished running in the above example), it
|
||||
also compares the contents with the object registered in the
|
||||
index entry to make sure they match.
|
||||
|
||||
. When the index file is updated that contains racily clean
|
||||
entries, cached `st_size` information is truncated to zero
|
||||
before writing a new version of the index file.
|
||||
|
||||
Because the index file itself is written after collecting all
|
||||
the stat information from updated paths, `st_mtime` timestamp of
|
||||
it is usually the same as or newer than any of the paths the
|
||||
index contains. And no matter how quick the modification that
|
||||
follows `git update-index foo` finishes, the resulting
|
||||
`st_mtime` timestamp on `foo` cannot get a value earlier
|
||||
than the index file. Therefore, index entries that can be
|
||||
racily clean are limited to the ones that have the same
|
||||
timestamp as the index file itself.
|
||||
|
||||
The callers that want to check if an index entry matches the
|
||||
corresponding file in the working tree continue to call
|
||||
`ce_match_stat()`, but with this change, `ce_match_stat()` uses
|
||||
`ce_modified_check_fs()` to see if racily clean ones are
|
||||
actually clean after comparing the cached stat information using
|
||||
`ce_match_stat_basic()`.
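The control flow of the first safeguard can be sketched as follows. This is a simplified model, not Git's implementation: `Entry` and `entry_matches` are hypothetical, only `mtime` and `size` stand in for the cached stat information, and `hash_file` is an assumed callable that returns the object name of the file's current contents.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    mtime: int        # cached st_mtime of the file
    size: int         # cached st_size of the file
    object_name: str  # object name recorded in the index entry

def entry_matches(entry, file_mtime, file_size, index_mtime, hash_file):
    """Return True if the index entry matches the working tree file."""
    # Step 1: cheap comparison of cached stat information.
    if (entry.mtime, entry.size) != (file_mtime, file_size):
        return False
    # Step 2: an entry as new as (or newer than) the index file may be
    # racily clean, so fall back to comparing the contents.
    if entry.mtime >= index_mtime:
        return hash_file() == entry.object_name
    return True
```

Note that `hash_file` is only called for entries whose timestamp is not older than the index file, which is what keeps the common case cheap.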
|
||||
|
||||
The problem the latter solves is this sequence:
|
||||
|
||||
$ git update-index 'foo'
|
||||
: modify 'foo' in-place without changing its size
|
||||
: wait for enough time
|
||||
$ git update-index 'bar'
|
||||
|
||||
Without the latter, the timestamp of the index file gets a newer
|
||||
value, and falsely clean entry `foo` would not be caught by the
|
||||
timestamp comparison check done with the former logic anymore.
|
||||
The latter makes sure that the cached stat information for `foo`
|
||||
would never match with the file in the working tree, so later
|
||||
checks by `ce_match_stat_basic()` would report that the index entry
|
||||
does not match the file and Git does not have to fall back on more
|
||||
expensive `ce_modified_check_fs()`.
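The second safeguard can be modeled as a pass over the entries just before a new index is written. This sketch is hypothetical: `smudge_for_write` is not a Git function, entries are plain dictionaries, and the `is_modified` callable (the expensive content check) is an assumption based on the "Runtime penalty" discussion, where entries that are actually clean are left intact.

```python
def smudge_for_write(entries, index_mtime, is_modified):
    """Return copies of entries, ready to be written to a new index file.

    entries     -- list of dicts with at least "path", "mtime" and "size"
    index_mtime -- timestamp of the index file that was read
    is_modified -- callable(entry) -> bool performing the content check
    """
    written = []
    for entry in entries:
        entry = dict(entry)  # never mutate the caller's data
        racy = entry["mtime"] >= index_mtime
        if racy and is_modified(entry):
            # A zero cached size makes later stat comparisons report a
            # mismatch for any non-empty file.
            entry["size"] = 0
        written.append(entry)
    return written
```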
|
||||
|
||||
|
||||
Runtime penalty
|
||||
---------------
|
||||
|
||||
The runtime penalty of falling back to `ce_modified_check_fs()`
|
||||
from `ce_match_stat()` can be very expensive when there are many
|
||||
racily clean entries. An obvious way to artificially create
|
||||
this situation is to give the same timestamp to all the files in
|
||||
the working tree in a large project, run `git update-index` on
|
||||
them, and give the same timestamp to the index file:
|
||||
|
||||
$ date >.datestamp
|
||||
$ git ls-files | xargs touch -r .datestamp
|
||||
$ git ls-files | git update-index --stdin
|
||||
$ touch -r .datestamp .git/index
|
||||
|
||||
This will make all index entries racily clean. In the linux project, for
|
||||
example, there are over 20,000 files in the working tree. On my
|
||||
Athlon 64 X2 3800+, after the above:
|
||||
|
||||
$ /usr/bin/time git diff-files
|
||||
1.68user 0.54system 0:02.22elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
|
||||
0inputs+0outputs (0major+67111minor)pagefaults 0swaps
|
||||
$ git update-index MAINTAINERS
|
||||
$ /usr/bin/time git diff-files
|
||||
0.02user 0.12system 0:00.14elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
|
||||
0inputs+0outputs (0major+935minor)pagefaults 0swaps
|
||||
|
||||
Running `git update-index` in the middle checked the racily
|
||||
clean entries, and left the cached `st_mtime` for all the paths
|
||||
intact because they were actually clean (so this step took about
|
||||
the same amount of time as the first `git diff-files`). After
|
||||
that, they are not racily clean anymore but are truly clean, so
|
||||
the second invocation of `git diff-files` fully took advantage
|
||||
of the cached stat information.
|
||||
|
||||
|
||||
Avoiding runtime penalty
|
||||
------------------------
|
||||
|
||||
In order to avoid the above runtime penalty, post 1.4.2 Git used
|
||||
to have code that made sure the index file
|
||||
got a timestamp newer than the youngest files in the index when
|
||||
there are many young files with the same timestamp as the
|
||||
resulting index file would otherwise have, by waiting
|
||||
before finishing writing the index file out.
|
||||
|
||||
I suspected that in practice the situation where many paths in the
|
||||
index are all racily clean was quite rare. The only code paths
|
||||
that can record a recent timestamp for a large number of paths are:
|
||||
|
||||
. Initial `git add .` of a large project.
|
||||
|
||||
. `git checkout` of a large project from an empty index into an
|
||||
unpopulated working tree.
|
||||
|
||||
Note: switching branches with `git checkout` keeps the cached
|
||||
stat information of existing working tree files that are the
|
||||
same between the current branch and the new branch, which are
|
||||
all older than the resulting index file, and they will not
|
||||
become racily clean. Only the files that are actually checked
|
||||
out can become racily clean.
|
||||
|
||||
In a large project where raciness avoidance cost really matters,
|
||||
however, the initial computation of all object names in the
|
||||
index takes more than one second, and the index file is written
|
||||
out after all that happens. Therefore the timestamp of the
|
||||
index file will be more than one second later than the
|
||||
youngest file in the working tree. This means that in these
|
||||
cases there actually will not be any racily clean entry in
|
||||
the resulting index.
|
||||
|
||||
Based on this discussion, the current code does not use the
|
||||
"workaround" to avoid the runtime penalty that does not exist in
|
||||
practice anymore. This was done with commit 0fc82cff on Aug 15,
|
||||
2006.
|
||||
102
Documentation/technical/repository-version.txt
Normal file
|
|
@ -0,0 +1,102 @@
|
|||
== Git Repository Format Versions
|
||||
|
||||
Every git repository is marked with a numeric version in the
|
||||
`core.repositoryformatversion` key of its `config` file. This version
|
||||
specifies the rules for operating on the on-disk repository data. An
|
||||
implementation of git which does not understand a particular version
|
||||
advertised by an on-disk repository MUST NOT operate on that repository;
|
||||
doing so risks not only producing wrong results, but actually losing
|
||||
data.
|
||||
|
||||
Because of this rule, version bumps should be kept to an absolute
|
||||
minimum. Instead, we generally prefer these strategies:
|
||||
|
||||
- bumping format version numbers of individual data files (e.g.,
|
||||
index, packfiles, etc). This restricts the incompatibilities only to
|
||||
those files.
|
||||
|
||||
- introducing new data that gracefully degrades when used by older
|
||||
clients (e.g., pack bitmap files are ignored by older clients, which
|
||||
simply do not take advantage of the optimization they provide).
|
||||
|
||||
A whole-repository format version bump should only be part of a change
|
||||
that cannot be independently versioned. For instance, if one were to
|
||||
change the reachability rules for objects, or the rules for locking
|
||||
refs, that would require a bump of the repository format version.
|
||||
|
||||
Note that this applies only to accessing the repository's disk contents
|
||||
directly. An older client which understands only format `0` may still
|
||||
connect via `git://` to a repository using format `1`, as long as the
|
||||
server process understands format `1`.
|
||||
|
||||
The preferred strategy for rolling out a version bump (whether whole
|
||||
repository or for a single file) is to teach git to read the new format,
|
||||
and allow writing the new format with a config switch or command line
|
||||
option (for experimentation or for those who do not care about backwards
|
||||
compatibility with older gits). Then after a long period to allow the
|
||||
reading capability to become common, we may switch to writing the new
|
||||
format by default.
|
||||
|
||||
The currently defined format versions are:
|
||||
|
||||
=== Version `0`
|
||||
|
||||
This is the format defined by the initial version of git, including but
|
||||
not limited to the format of the repository directory, the repository
|
||||
configuration file, and the object and ref storage. Specifying the
|
||||
complete behavior of git is beyond the scope of this document.
|
||||
|
||||
=== Version `1`
|
||||
|
||||
This format is identical to version `0`, with the following exceptions:
|
||||
|
||||
1. When reading the `core.repositoryformatversion` variable, a git
|
||||
implementation which supports version 1 MUST also read any
|
||||
configuration keys found in the `extensions` section of the
|
||||
configuration file.
|
||||
|
||||
2. If a version-1 repository specifies any `extensions.*` keys that
|
||||
the running git has not implemented, the operation MUST NOT
|
||||
proceed. Similarly, if the value of any known key is not understood
|
||||
by the implementation, the operation MUST NOT proceed.
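The two rules above amount to a gate that runs before any operation on the repository. The sketch below is illustrative only: `check_repository_format` is a hypothetical helper, the set of known extensions is taken from the list later in this document, and the sketch does not validate extension values.

```python
# Extension names from this document, lower-cased because configuration
# variable names are case-insensitive.
KNOWN_EXTENSIONS = {"noop", "preciousobjects", "partialclone", "worktreeconfig"}

def check_repository_format(version, extensions, known=KNOWN_EXTENSIONS):
    """Return None if the repository may be used, else a reason string.

    version    -- integer value of core.repositoryformatversion
    extensions -- dict of keys found in the "extensions" section
    """
    if version == 0:
        return None  # version 0 does not consult the extensions section
    if version == 1:
        unknown = sorted(k for k in extensions if k.lower() not in known)
        if unknown:
            return "unknown extensions: " + ", ".join(unknown)
        return None
    return f"unknown repository format version {version}"
```

A caller would refuse to proceed whenever the function returns a reason.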
|
||||
|
||||
Note that if no extensions are specified in the config file, then
|
||||
`core.repositoryformatversion` SHOULD be set to `0` (setting it to `1`
|
||||
provides no benefit, and makes the repository incompatible with older
|
||||
implementations of git).
|
||||
|
||||
This document will serve as the master list for extensions. Any
|
||||
implementation wishing to define a new extension should make a note of
|
||||
it here, in order to claim the name.
|
||||
|
||||
The defined extensions are:
|
||||
|
||||
==== `noop`
|
||||
|
||||
This extension does not change git's behavior at all. It is useful only
|
||||
for testing format-1 compatibility.
|
||||
|
||||
==== `preciousObjects`
|
||||
|
||||
When the config key `extensions.preciousObjects` is set to `true`,
|
||||
objects in the repository MUST NOT be deleted (e.g., by `git-prune` or
|
||||
`git repack -d`).
|
||||
|
||||
==== `partialclone`
|
||||
|
||||
When the config key `extensions.partialclone` is set, it indicates
|
||||
that the repo was created with a partial clone (or later performed
|
||||
a partial fetch) and that the remote may have omitted sending
|
||||
certain unwanted objects. Such a remote is called a "promisor remote"
|
||||
and it promises that all such omitted objects can be fetched from it
|
||||
in the future.
|
||||
|
||||
The value of this key is the name of the promisor remote.
|
||||
|
||||
==== `worktreeConfig`
|
||||
|
||||
If set, by default "git config" reads from both "config" and
|
||||
"config.worktree" file from GIT_DIR in that order. In
|
||||
multiple working directory mode, "config" file is shared while
|
||||
"config.worktree" is per-working directory (i.e., it's in
|
||||
GIT_COMMON_DIR/worktrees/<id>/config.worktree).
|
||||
186
Documentation/technical/rerere.txt
Normal file
|
|
@ -0,0 +1,186 @@
|
|||
Rerere
|
||||
======
|
||||
|
||||
This document describes the rerere logic.
|
||||
|
||||
Conflict normalization
|
||||
----------------------
|
||||
|
||||
To ensure recorded conflict resolutions can be looked up in the rerere
|
||||
database, even when branches are merged in a different order,
|
||||
different branches are merged that result in the same conflict, or
|
||||
when different conflict style settings are used, rerere normalizes the
|
||||
conflicts before writing them to the rerere database.
|
||||
|
||||
Different conflict styles and branch names are normalized by stripping
|
||||
the labels from the conflict markers, and removing the common ancestor
|
||||
version from the `diff3` conflict style. Branches that are merged
|
||||
in different order are normalized by sorting the conflict hunks. More
|
||||
on each of those steps in the following sections.
|
||||
|
||||
Once these two normalization operations are applied, a conflict ID is
|
||||
calculated based on the normalized conflict, which is later used by
|
||||
rerere to look up the conflict in the rerere database.
|
||||
|
||||
Removing the common ancestor version
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Say we have three branches AB, AC and AC2. The common ancestor of
|
||||
these branches has a file with a line containing the string "A" (for
|
||||
brevity this is called "line A" in the rest of the document). In
|
||||
branch AB this line is changed to "B", in AC, this line is changed to
|
||||
"C", and branch AC2 is forked off of AC, after the line was changed to
|
||||
"C".
|
||||
|
||||
Forking a branch ABAC off of branch AB and then merging AC into it, we
|
||||
get a conflict like the following:
|
||||
|
||||
<<<<<<< HEAD
|
||||
B
|
||||
=======
|
||||
C
|
||||
>>>>>>> AC
|
||||
|
||||
Doing the analogous with AC2 (forking a branch ABAC2 off of branch AB
|
||||
and then merging branch AC2 into it), using the diff3 conflict style,
|
||||
we get a conflict like the following:
|
||||
|
||||
<<<<<<< HEAD
|
||||
B
|
||||
||||||| merged common ancestors
|
||||
A
|
||||
=======
|
||||
C
|
||||
>>>>>>> AC2
|
||||
|
||||
By resolving this conflict, to leave line D, the user declares:
|
||||
|
||||
After examining what branches AB and AC did, I believe that making
|
||||
line A into line D is the best thing to do that is compatible with
|
||||
what AB and AC wanted to do.
|
||||
|
||||
As branch AC2 refers to the same commit as AC, the above implies that
|
||||
this is also compatible with what AB and AC2 wanted to do.
|
||||
|
||||
By extension, this means that rerere should recognize that the above
|
||||
conflicts are the same. To do this, the labels on the conflict
|
||||
markers are stripped, and the common ancestor version is removed. The above
|
||||
examples would both result in the following normalized conflict:
|
||||
|
||||
<<<<<<<
|
||||
B
|
||||
=======
|
||||
C
|
||||
>>>>>>>
|
||||
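The label and common-ancestor stripping can be sketched in a few lines. This is not rerere's implementation: `normalize_conflict` is a hypothetical helper that assumes the default 7-character conflict markers and a single, non-nested conflict.

```python
def normalize_conflict(lines):
    """Strip marker labels and drop the common ancestor version."""
    out = []
    in_ancestor = False
    for line in lines:
        if line.startswith("<<<<<<<"):
            out.append("<<<<<<<")        # label removed
        elif line.startswith("|||||||"):
            in_ancestor = True           # diff3 ancestor part starts
        elif line.startswith("======="):
            in_ancestor = False
            out.append("=======")
        elif line.startswith(">>>>>>>"):
            out.append(">>>>>>>")        # label removed
        elif not in_ancestor:
            out.append(line)             # keep only the two sides
    return out
```

Both example conflicts above reduce to the same five lines, which is what lets a recorded resolution be found again.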
|
||||
Sorting hunks
|
||||
~~~~~~~~~~~~~
|
||||
|
||||
As before, let's imagine that a common ancestor had a file with line A
|
||||
in its early part, and line X in its late part. And then four branches
|
||||
are forked that do these things:
|
||||
|
||||
- AB: changes A to B
|
||||
- AC: changes A to C
|
||||
- XY: changes X to Y
|
||||
- XZ: changes X to Z
|
||||
|
||||
Now, forking a branch ABAC off of branch AB and then merging AC into
|
||||
it, and forking a branch ACAB off of branch AC and then merging AB
|
||||
into it, would yield the conflict in a different order. The former
|
||||
would say "A became B or C, what now?" while the latter would say "A
|
||||
became C or B, what now?"
|
||||
|
||||
As a reminder, the act of merging AC into ABAC and resolving the
|
||||
conflict to leave line D means that the user declares:
|
||||
|
||||
After examining what branches AB and AC did, I believe that
|
||||
making line A into line D is the best thing to do that is
|
||||
compatible with what AB and AC wanted to do.
|
||||
|
||||
So the conflict we would see when merging AB into ACAB should be
|
||||
resolved the same way---it is the resolution that is in line with that
|
||||
declaration.
|
||||
|
||||
Similarly, imagine that a branch XYXZ was previously forked from XY,
|
||||
and XZ was merged into it, and resolved "X became Y or Z" into "X
|
||||
became W".
|
||||
|
||||
Now, if a branch ABXY was forked from AB and then merged XY, then ABXY
|
||||
would have line B in its early part and line Y in its later part.
|
||||
Such a merge would be quite clean. We can construct 4 combinations
|
||||
using these four branches ((AB, AC) x (XY, XZ)).
|
||||
|
||||
Merging ABXY and ACXZ would make "an early A became B or C, a late X
|
||||
became Y or Z" conflict, while merging ACXY and ABXZ would make "an
|
||||
early A became C or B, a late X became Y or Z". We can see there are
|
||||
4 combinations of ("B or C", "C or B") x ("Y or Z", "Z or Y").
|
||||
|
||||
By sorting, the conflict is given its canonical name, namely, "an
|
||||
early part became B or C, a late part became Y or Z", and whenever
|
||||
any of these four patterns appear, we can get to the same conflict
|
||||
and resolution that we saw earlier.
|
||||
|
||||
Without the sorting, we'd have to somehow find a previous resolution
|
||||
from combinatorial explosion.
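The effect of sorting can be shown with a toy model in which each hunk is a pair of sides. `canonical` is a hypothetical helper, and comparing the sides as plain strings is an assumption made for illustration.

```python
def canonical(hunks):
    """Sort the two sides of every hunk so merge order no longer matters."""
    return [tuple(sorted(sides)) for sides in hunks]

# The four orderings that can arise from the branches in the text.
variants = [
    [("B", "C"), ("Y", "Z")],
    [("B", "C"), ("Z", "Y")],
    [("C", "B"), ("Y", "Z")],
    [("C", "B"), ("Z", "Y")],
]
canonical_forms = [canonical(v) for v in variants]
```

All four entries of `canonical_forms` are identical, so a single recorded resolution covers every ordering.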
|
||||
|
||||
Conflict ID calculation
|
||||
~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Once the conflict normalization is done, the conflict ID is calculated
|
||||
as the sha1 hash of the conflict hunks appended to each other,
|
||||
separated by <NUL> characters. The conflict markers are stripped out
|
||||
before the sha1 is calculated. So in the example above, where we
|
||||
merge branch AC which changes line A to line C, into branch AB, which
|
||||
changes line A to line B, the conflict ID would be
|
||||
SHA1('B<NUL>C<NUL>').
|
||||
|
||||
If there are multiple conflicts in one file, the sha1 is calculated
|
||||
the same way with all hunks appended to each other, in the order in
|
||||
which they appear in the file, separated by a <NUL> character.
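The calculation can be reproduced with `hashlib`. This sketch follows the description above literally; `conflict_id` is a hypothetical helper, each hunk is given as a pair of sides, and how line terminators enter the hash is not covered by the text, so they are left out here.

```python
import hashlib

def conflict_id(hunks):
    """Return the hex SHA-1 over the sorted sides, each followed by NUL."""
    digest = hashlib.sha1()
    for sides in hunks:                   # hunks in file order
        for side in sorted(sides):        # sides in canonical order
            digest.update(side.encode() + b"\0")
    return digest.hexdigest()
```

For the single-hunk example, `conflict_id([("B", "C")])` and `conflict_id([("C", "B")])` give the same value, the SHA-1 of the bytes `B`, NUL, `C`, NUL.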
|
||||
|
||||
Nested conflicts
|
||||
~~~~~~~~~~~~~~~~
|
||||
|
||||
Nested conflicts are handled very similarly to "simple" conflicts.
|
||||
Similar to simple conflicts, the conflict is first normalized by
|
||||
stripping the labels from conflict markers, stripping the common ancestor
|
||||
version, and then sorting the conflict hunks, both for the outer and the
|
||||
inner conflict. This is done recursively, so any number of nested
|
||||
conflicts can be handled.
|
||||
|
||||
Note that this only works for conflict markers that "cleanly nest". If
|
||||
there are any unmatched conflict markers, rerere will fail to handle
|
||||
the conflict and will not record a conflict resolution.
|
||||
|
||||
The only difference is in how the conflict ID is calculated. For the
|
||||
inner conflict, the conflict markers themselves are not stripped out
|
||||
before calculating the sha1.
|
||||
|
||||
Say we have the following conflict for example:
|
||||
|
||||
<<<<<<< HEAD
|
||||
1
|
||||
=======
|
||||
<<<<<<< HEAD
|
||||
3
|
||||
=======
|
||||
2
|
||||
>>>>>>> branch-2
|
||||
>>>>>>> branch-3~
|
||||
|
||||
After stripping out the labels of the conflict markers, and sorting
|
||||
the hunks, the conflict would look as follows:
|
||||
|
||||
<<<<<<<
|
||||
1
|
||||
=======
|
||||
<<<<<<<
|
||||
2
|
||||
=======
|
||||
3
|
||||
>>>>>>>
|
||||
>>>>>>>
|
||||
|
||||
and finally the conflict ID would be calculated as:
|
||||
`sha1('1<NUL><<<<<<<\n3\n=======\n2\n>>>>>>><NUL>')`
|
||||
63
Documentation/technical/send-pack-pipeline.txt
Normal file
|
|
@ -0,0 +1,63 @@
|
|||
Git-send-pack internals
|
||||
=======================
|
||||
|
||||
Overall operation
|
||||
-----------------
|
||||
|
||||
. Connects to the remote side and invokes git-receive-pack.
|
||||
|
||||
. Learns what refs the remote has and what commit they point at.
|
||||
Matches them to the refspecs we are pushing.
|
||||
|
||||
. Checks if there are non-fast-forwards. Unlike fetch-pack,
|
||||
the repository send-pack runs in is supposed to be a superset
|
||||
of the recipient in fast-forward cases, so there is no need
|
||||
for want/have exchanges, and fast-forward check can be done
|
||||
locally. Tell the result to the other end.
|
||||
|
||||
. Calls pack_objects() which generates a packfile and sends it
|
||||
over to the other end.
|
||||
|
||||
. If the remote side is new enough (v1.1.0 or later), wait for
|
||||
the unpack and hook status from the other end.
|
||||
|
||||
. Exit with appropriate error codes.
|
||||
|
||||
|
||||
Pack_objects pipeline
|
||||
---------------------
|
||||
|
||||
This function gets one file descriptor (`fd`) which is either a
|
||||
socket (over the network) or a pipe (local). What's written to
|
||||
this fd goes to git-receive-pack to be unpacked.
|
||||
|
||||
send-pack ---> fd ---> receive-pack
|
||||
|
||||
The function pack_objects creates a pipe and then forks. The
|
||||
forked child execs pack-objects with --revs to receive revision
|
||||
parameters from its standard input. This process will write the
|
||||
packfile to the other end.
|
||||
|
||||
send-pack
|
||||
|
|
||||
pack_objects() ---> fd ---> receive-pack
|
||||
| ^ (pipe)
|
||||
v |
|
||||
(child)
|
||||
|
||||
The child dup2's to arrange its standard output to go back to
|
||||
the other end, and read its standard input to come from the
|
||||
pipe. After that it exec's pack-objects. On the other hand,
|
||||
the parent process, before starting to feed the child pipeline,
|
||||
closes the reading side of the pipe and fd to receive-pack.
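The plumbing described above can be imitated with `subprocess`. This is only a sketch for a POSIX system: the child here is a stand-in script that copies its input to its output, not `git pack-objects`, and the function name is reused purely for illustration.

```python
import os
import subprocess
import sys

# Stand-in for "pack-objects --revs": copies stdin to stdout unchanged.
CHILD = "import sys; sys.stdout.write(sys.stdin.read())"

def pack_objects(out_fd, revisions):
    """Feed revisions to a child whose stdout is out_fd; return its status."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,  # parent keeps only the write end of the pipe
        stdout=out_fd,          # child's stdout goes to the other end
    )
    os.close(out_fd)            # parent no longer needs the outgoing fd
    for rev in revisions:
        child.stdin.write(rev.encode() + b"\n")
    child.stdin.close()         # EOF tells the child the list is complete
    return child.wait()
```

`Popen` performs the dup2-style redirection and closes the pipe's reading side in the parent, which corresponds to the steps listed in the text.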
|
||||
|
||||
send-pack
|
||||
|
|
||||
pack_objects(parent)
|
||||
|
|
||||
v [0]
|
||||
pack-objects [0] ---> receive-pack
|
||||
|
||||
|
||||
[jc: the pipeline was much more complex and needed documentation before
|
||||
I understood an earlier bug, but now it is trivial and straightforward.]
|
||||
60
Documentation/technical/shallow.txt
Normal file
|
|
@ -0,0 +1,60 @@
|
|||
Shallow commits
|
||||
===============
|
||||
|
||||
.Definition
|
||||
*********************************************************
|
||||
Shallow commits do have parents, but not in the shallow
|
||||
repo, and therefore grafts are introduced pretending that
|
||||
these commits have no parents.
|
||||
*********************************************************
|
||||
|
||||
$GIT_DIR/shallow lists commit object names and tells Git to
|
||||
pretend as if they are root commits (e.g. "git log" traversal
|
||||
stops after showing them; "git fsck" does not complain saying
|
||||
the commits listed on their "parent" lines do not exist).
|
||||
|
||||
Each line contains exactly one SHA-1. When read, a commit_graft
|
||||
will be constructed, which has nr_parent < 0 to make it easier
|
||||
to discern from user provided grafts.
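
As a rough illustration of the file format just described, the following
Python sketch reads the contents of a shallow file into a mapping from
commit name to a parent count of -1. The function name `parse_shallow` and
the use of a plain dict in place of `commit_graft` are assumptions made for
this example; the real implementation is in C.

```python
import re

# One full SHA-1 object name per line, in lowercase hexadecimal.
SHA1_RE = re.compile(r"[0-9a-f]{40}")


def parse_shallow(text):
    """Map each listed commit to -1, mimicking nr_parent < 0."""
    grafts = {}
    for lineno, line in enumerate(text.splitlines(), 1):
        if not SHA1_RE.fullmatch(line):
            raise ValueError("bad shallow line %d: %r" % (lineno, line))
        grafts[line] = -1
    return grafts
```

A negative parent count is what lets later code tell these entries apart
from user-provided grafts, whose parent counts are zero or more.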
|
||||
|
||||
Note that the shallow feature could not be changed easily to
|
||||
use replace refs: a commit containing a `mergetag` is not allowed
|
||||
to be replaced, not even by a root commit. Such a commit can be
|
||||
made shallow, though. Also, having a `shallow` file explicitly
|
||||
listing all the commits made shallow makes it a *lot* easier to
|
||||
do shallow-specific things such as to deepen the history.
|
||||
|
||||
Since fsck-objects relies on the library to read the objects,
|
||||
it honours shallow commits automatically.
|
||||
|
||||
There are some unfinished ends of the whole shallow business:
|
||||
|
||||
- Maybe we have to force non-thin packs when fetching into a
  shallow repo (at the moment they are forced non-thin).
|
||||
|
||||
- A special handling of a shallow upstream is needed. At some
|
||||
stage, upload-pack has to check if it sends a shallow commit,
|
||||
and it should send that information early (or fail, if the
|
||||
client does not support shallow repositories). There is no
|
||||
support at all for this in this patch series.
|
||||
|
||||
- Instead of locking $GIT_DIR/shallow at the start, only its
  timestamp is noted; when it comes to writing the file, a check
  is performed that the mtime is still the same, dying if it is
  not.
|
||||
|
||||
- It is unclear how "push into/from a shallow repo" should behave.
|
||||
|
||||
- If you deepen a history, you'd want to get the tags of the
|
||||
newly stored (but older!) commits. This does not work right now.
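
The timestamp check mentioned in the list above (note the mtime first,
compare it again before writing) can be sketched as follows. This is a
minimal Python illustration, not Git's code: the names `note_mtime` and
`write_shallow_if_unchanged` are hypothetical, and a `RuntimeError` stands
in for dying.

```python
import os


def note_mtime(path):
    """Return the file's mtime in nanoseconds, or None if it is missing."""
    try:
        return os.stat(path).st_mtime_ns
    except FileNotFoundError:
        return None


def write_shallow_if_unchanged(path, noted_mtime, commits):
    """Rewrite the file only if its mtime still matches the noted one."""
    if note_mtime(path) != noted_mtime:
        raise RuntimeError("%s was changed during the operation" % path)
    with open(path, "w") as f:
        for oid in commits:
            f.write(oid + "\n")
```

Unlike a lock, this leaves a small window between the check and the write
in which another process could still modify the file.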
|
||||
|
||||
To make a shallow clone, you can call "git-clone --depth 20 repo".
|
||||
The result contains only commit chains with a length of at most 20.
|
||||
It also writes an appropriate $GIT_DIR/shallow.
|
||||
|
||||
You can deepen a shallow repository with "git-fetch --depth 20
|
||||
repo branch", which will fetch branch from repo, but stop at depth
|
||||
20, updating $GIT_DIR/shallow.
|
||||
|
||||
The special depth 2147483647 (or 0x7fffffff, the largest positive
|
||||
number a signed 32-bit integer can contain) means infinite depth.
|
||||
186
Documentation/technical/signature-format.txt
Normal file
|
|
@ -0,0 +1,186 @@
|
|||
Git signature format
|
||||
====================
|
||||
|
||||
== Overview
|
||||
|
||||
Git uses cryptographic signatures in various places, currently objects (tags,
|
||||
commits, mergetags) and transactions (pushes). In every case, the command which
|
||||
is about to create an object or transaction determines a payload from that,
|
||||
calls gpg to obtain a detached signature for the payload (`gpg -bsa`) and
|
||||
embeds the signature into the object or transaction.
|
||||
|
||||
Signatures always begin with `-----BEGIN PGP SIGNATURE-----`
|
||||
and end with `-----END PGP SIGNATURE-----`, unless gpg is told to
|
||||
produce RFC1991 signatures which use `MESSAGE` instead of `SIGNATURE`.
|
||||
|
||||
The signed payload and the way the signature is embedded depend
on the type of the object or transaction.
|
||||
|
||||
== Tag signatures
|
||||
|
||||
- created by: `git tag -s`
|
||||
- payload: annotated tag object
|
||||
- embedding: append the signature to the unsigned tag object
|
||||
- example: tag `signedtag` with subject `signed tag`
|
||||
|
||||
----
|
||||
object 04b871796dc0420f8e7561a895b52484b701d51a
|
||||
type commit
|
||||
tag signedtag
|
||||
tagger C O Mitter <committer@example.com> 1465981006 +0000
|
||||
|
||||
signed tag
|
||||
|
||||
signed tag message body
|
||||
-----BEGIN PGP SIGNATURE-----
|
||||
Version: GnuPG v1
|
||||
|
||||
iQEcBAABAgAGBQJXYRhOAAoJEGEJLoW3InGJklkIAIcnhL7RwEb/+QeX9enkXhxn
|
||||
rxfdqrvWd1K80sl2TOt8Bg/NYwrUBw/RWJ+sg/hhHp4WtvE1HDGHlkEz3y11Lkuh
|
||||
8tSxS3qKTxXUGozyPGuE90sJfExhZlW4knIQ1wt/yWqM+33E9pN4hzPqLwyrdods
|
||||
q8FWEqPPUbSJXoMbRPw04S5jrLtZSsUWbRYjmJCHzlhSfFWW4eFd37uquIaLUBS0
|
||||
rkC3Jrx7420jkIpgFcTI2s60uhSQLzgcCwdA2ukSYIRnjg/zDkj8+3h/GaROJ72x
|
||||
lZyI6HWixKJkWw8lE9aAOD9TmTW9sFJwcVAzmAuFX2kUreDUKMZduGcoRYGpD7E=
|
||||
=jpXa
|
||||
-----END PGP SIGNATURE-----
|
||||
----
|
||||
|
||||
- verify with: `git verify-tag [-v]` or `git tag -v`
|
||||
|
||||
----
|
||||
gpg: Signature made Wed Jun 15 10:56:46 2016 CEST using RSA key ID B7227189
|
||||
gpg: Good signature from "Eris Discordia <discord@example.net>"
|
||||
gpg: WARNING: This key is not certified with a trusted signature!
|
||||
gpg: There is no indication that the signature belongs to the owner.
|
||||
Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA 29A4 6109 2E85 B722 7189
|
||||
object 04b871796dc0420f8e7561a895b52484b701d51a
|
||||
type commit
|
||||
tag signedtag
|
||||
tagger C O Mitter <committer@example.com> 1465981006 +0000
|
||||
|
||||
signed tag
|
||||
|
||||
signed tag message body
|
||||
----
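
Because the signature is simply appended, a signed tag can be split back
into payload and signature by looking for the armor header. The Python
sketch below is only an illustration: `split_signed_tag` is a hypothetical
name, and treating the last armor header line as the start of the
signature is an assumption of this example, not a statement about Git's
own parser.

```python
SIGNATURE_MARKERS = (
    "-----BEGIN PGP SIGNATURE-----",
    "-----BEGIN PGP MESSAGE-----",  # RFC1991-style signatures
)


def split_signed_tag(raw):
    """Return (payload, signature); signature is "" if none is found."""
    lines = raw.splitlines(keepends=True)
    start = None
    for i, line in enumerate(lines):
        if line.rstrip("\n") in SIGNATURE_MARKERS:
            start = i  # keep the last marker seen
    if start is None:
        return raw, ""
    return "".join(lines[:start]), "".join(lines[start:])
```

The payload returned here corresponds to the tag text that
`git verify-tag -v` prints after the gpg output.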
|
||||
|
||||
== Commit signatures
|
||||
|
||||
- created by: `git commit -S`
|
||||
- payload: commit object
|
||||
- embedding: header entry `gpgsig`
|
||||
(content is preceded by a space)
|
||||
- example: commit with subject `signed commit`
|
||||
|
||||
----
|
||||
tree eebfed94e75e7760540d1485c740902590a00332
|
||||
parent 04b871796dc0420f8e7561a895b52484b701d51a
|
||||
author A U Thor <author@example.com> 1465981137 +0000
|
||||
committer C O Mitter <committer@example.com> 1465981137 +0000
|
||||
gpgsig -----BEGIN PGP SIGNATURE-----
|
||||
Version: GnuPG v1
|
||||
|
||||
iQEcBAABAgAGBQJXYRjRAAoJEGEJLoW3InGJ3IwIAIY4SA6GxY3BjL60YyvsJPh/
|
||||
HRCJwH+w7wt3Yc/9/bW2F+gF72kdHOOs2jfv+OZhq0q4OAN6fvVSczISY/82LpS7
|
||||
DVdMQj2/YcHDT4xrDNBnXnviDO9G7am/9OE77kEbXrp7QPxvhjkicHNwy2rEflAA
|
||||
zn075rtEERDHr8nRYiDh8eVrefSO7D+bdQ7gv+7GsYMsd2auJWi1dHOSfTr9HIF4
|
||||
HJhWXT9d2f8W+diRYXGh4X0wYiGg6na/soXc+vdtDYBzIxanRqjg8jCAeo1eOTk1
|
||||
EdTwhcTZlI0x5pvJ3H0+4hA2jtldVtmPM4OTB0cTrEWBad7XV6YgiyuII73Ve3I=
|
||||
=jKHM
|
||||
-----END PGP SIGNATURE-----
|
||||
|
||||
signed commit
|
||||
|
||||
signed commit message body
|
||||
----
|
||||
|
||||
- verify with: `git verify-commit [-v]` (or `git show --show-signature`)
|
||||
|
||||
----
|
||||
gpg: Signature made Wed Jun 15 10:58:57 2016 CEST using RSA key ID B7227189
|
||||
gpg: Good signature from "Eris Discordia <discord@example.net>"
|
||||
gpg: WARNING: This key is not certified with a trusted signature!
|
||||
gpg: There is no indication that the signature belongs to the owner.
|
||||
Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA 29A4 6109 2E85 B722 7189
|
||||
tree eebfed94e75e7760540d1485c740902590a00332
|
||||
parent 04b871796dc0420f8e7561a895b52484b701d51a
|
||||
author A U Thor <author@example.com> 1465981137 +0000
|
||||
committer C O Mitter <committer@example.com> 1465981137 +0000
|
||||
|
||||
signed commit
|
||||
|
||||
signed commit message body
|
||||
----
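
The `gpgsig` header spans several lines, each continuation line being
preceded by a space. The following Python sketch separates such a commit
into the text without the header and the signature itself. It is an
illustration under stated assumptions: `split_signed_commit` is a
hypothetical name, and the sketch assumes the signed payload is the commit
text with the `gpgsig` header removed, which matches what
`git verify-commit -v` prints above.

```python
def split_signed_commit(raw):
    """Return (payload, signature) for the text of a commit object."""
    payload, signature = [], []
    in_headers = True
    in_sig = False
    for line in raw.split("\n"):
        if in_headers and line == "":
            # The first empty line ends the header section.
            in_headers = False
        if in_headers and line.startswith("gpgsig "):
            in_sig = True
            signature.append(line[len("gpgsig "):])
        elif in_headers and in_sig and line.startswith(" "):
            # Continuation line: drop the single leading space.
            signature.append(line[1:])
        else:
            in_sig = False
            payload.append(line)
    return "\n".join(payload), "\n".join(signature)
```

Note that the blank line inside the armored signature is stored as a line
containing a single space, so it does not end the header section.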
|
||||
|
||||
== Mergetag signatures
|
||||
|
||||
- created by: `git merge` on signed tag
|
||||
- payload/embedding: the whole signed tag object is embedded into
|
||||
the (merge) commit object as header entry `mergetag`
|
||||
- example: merge of the signed tag `signedtag` as above
|
||||
|
||||
----
|
||||
tree c7b1cff039a93f3600a1d18b82d26688668c7dea
|
||||
parent c33429be94b5f2d3ee9b0adad223f877f174b05d
|
||||
parent 04b871796dc0420f8e7561a895b52484b701d51a
|
||||
author A U Thor <author@example.com> 1465982009 +0000
|
||||
committer C O Mitter <committer@example.com> 1465982009 +0000
|
||||
mergetag object 04b871796dc0420f8e7561a895b52484b701d51a
|
||||
type commit
|
||||
tag signedtag
|
||||
tagger C O Mitter <committer@example.com> 1465981006 +0000
|
||||
|
||||
signed tag
|
||||
|
||||
signed tag message body
|
||||
-----BEGIN PGP SIGNATURE-----
|
||||
Version: GnuPG v1
|
||||
|
||||
iQEcBAABAgAGBQJXYRhOAAoJEGEJLoW3InGJklkIAIcnhL7RwEb/+QeX9enkXhxn
|
||||
rxfdqrvWd1K80sl2TOt8Bg/NYwrUBw/RWJ+sg/hhHp4WtvE1HDGHlkEz3y11Lkuh
|
||||
8tSxS3qKTxXUGozyPGuE90sJfExhZlW4knIQ1wt/yWqM+33E9pN4hzPqLwyrdods
|
||||
q8FWEqPPUbSJXoMbRPw04S5jrLtZSsUWbRYjmJCHzlhSfFWW4eFd37uquIaLUBS0
|
||||
rkC3Jrx7420jkIpgFcTI2s60uhSQLzgcCwdA2ukSYIRnjg/zDkj8+3h/GaROJ72x
|
||||
lZyI6HWixKJkWw8lE9aAOD9TmTW9sFJwcVAzmAuFX2kUreDUKMZduGcoRYGpD7E=
|
||||
=jpXa
|
||||
-----END PGP SIGNATURE-----
|
||||
|
||||
Merge tag 'signedtag' into downstream
|
||||
|
||||
signed tag
|
||||
|
||||
signed tag message body
|
||||
|
||||
# gpg: Signature made Wed Jun 15 08:56:46 2016 UTC using RSA key ID B7227189
|
||||
# gpg: Good signature from "Eris Discordia <discord@example.net>"
|
||||
# gpg: WARNING: This key is not certified with a trusted signature!
|
||||
# gpg: There is no indication that the signature belongs to the owner.
|
||||
# Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA 29A4 6109 2E85 B722 7189
|
||||
----
|
||||
|
||||
- verify with: verification is embedded in merge commit message by default,
|
||||
alternatively with `git show --show-signature`:
|
||||
|
||||
----
|
||||
commit 9863f0c76ff78712b6800e199a46aa56afbcbd49
|
||||
merged tag 'signedtag'
|
||||
gpg: Signature made Wed Jun 15 10:56:46 2016 CEST using RSA key ID B7227189
|
||||
gpg: Good signature from "Eris Discordia <discord@example.net>"
|
||||
gpg: WARNING: This key is not certified with a trusted signature!
|
||||
gpg: There is no indication that the signature belongs to the owner.
|
||||
Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA 29A4 6109 2E85 B722 7189
|
||||
Merge: c33429b 04b8717
|
||||
Author: A U Thor <author@example.com>
|
||||
Date: Wed Jun 15 09:13:29 2016 +0000
|
||||
|
||||
Merge tag 'signedtag' into downstream
|
||||
|
||||
signed tag
|
||||
|
||||
signed tag message body
|
||||
|
||||
# gpg: Signature made Wed Jun 15 08:56:46 2016 UTC using RSA key ID B7227189
|
||||
# gpg: Good signature from "Eris Discordia <discord@example.net>"
|
||||
# gpg: WARNING: This key is not certified with a trusted signature!
|
||||
# gpg: There is no indication that the signature belongs to the owner.
|
||||
# Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA 29A4 6109 2E85 B722 7189
|
||||
----
|
||||
121
Documentation/technical/trivial-merge.txt
Normal file
|
|
@ -0,0 +1,121 @@
|
|||
Trivial merge rules
|
||||
===================
|
||||
|
||||
This document describes the outcomes of the trivial merge logic in read-tree.
|
||||
|
||||
One-way merge
|
||||
-------------
|
||||
|
||||
This replaces the index with a different tree, keeping the stat info
|
||||
for entries that don't change, and allowing -u to make the minimum
|
||||
required changes to the working tree to have it match.
|
||||
|
||||
Entries marked '+' have stat information. Spaces marked '*' don't
|
||||
affect the result.
|
||||
|
||||
   index   tree    result
   -----------------------
   *       (empty) (empty)
   (empty) tree    tree
   index+  tree    tree
   index+  index   index+
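
The one-way table can be read as a small decision function. The Python
sketch below is an illustration, not the read-tree code: the name
`one_way_merge` and the representation of an index entry as an
`(oid, has_stat)` pair are assumptions of this example.

```python
def one_way_merge(index, tree):
    """index: (oid, has_stat) or None; tree: oid or None.

    Returns the resulting index entry, or None for (empty).
    """
    if tree is None:
        return None                 # row 1: the entry disappears
    if index is not None and index[0] == tree:
        return index                # row 4: unchanged, stat info kept
    return (tree, False)            # rows 2 and 3: take the tree entry
```

Only the last table row keeps the stat information, which is what lets
`-u` limit itself to the minimum required working tree changes.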
|
||||
|
||||
Two-way merge
|
||||
-------------
|
||||
|
||||
It is permitted for the index to lack an entry; this does not prevent
|
||||
any case from applying.
|
||||
|
||||
If the index exists, it is an error for it not to match either the old
|
||||
or the result.
|
||||
|
||||
If multiple cases apply, the one used is listed first.
|
||||
|
||||
A result which changes the index is an error if the index is not empty
|
||||
and not up to date.
|
||||
|
||||
Entries marked '+' have stat information. Spaces marked '*' don't
|
||||
affect the result.
|
||||
|
||||
   case  index   old     new     result
   -------------------------------------
   0/2   (empty) *       (empty) (empty)
   1/3   (empty) *       new     new
   4/5   index+  (empty) (empty) index+
   6/7   index+  (empty) index   index+
   10    index+  index   (empty) (empty)
   14/15 index+  old     old     index+
   18/19 index+  old     index   index+
   20    index+  index   new     new
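
The two-way table can likewise be sketched as a function that tries the
cases in the order listed. This Python example is an illustration only:
`two_way_merge` and `MergeError` are hypothetical names, entries are plain
object names with `None` for (empty), and the stat information and the
up-to-date check are left out.

```python
class MergeError(Exception):
    """Raised when no table row applies."""


def two_way_merge(index, old, new):
    """Return the resulting entry, or None for (empty)."""
    if index is None:
        return new                  # cases 0/2 and 1/3
    if old is None and new is None:
        return index                # case 4/5
    if old is None and new == index:
        return index                # case 6/7
    if index == old and new is None:
        return None                 # case 10
    if old == new:
        return index                # case 14/15
    if new == index:
        return index                # case 18/19
    if index == old:
        return new                  # case 20
    raise MergeError("index matches neither the old nor the result")
```

The final `raise` corresponds to the rule that an existing index entry
must match either the old tree or the result.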
|
||||
|
||||
Three-way merge
|
||||
---------------
|
||||
|
||||
It is permitted for the index to lack an entry; this does not prevent
|
||||
any case from applying.
|
||||
|
||||
If the index exists, it is an error for it not to match either the
|
||||
head or (if the merge is trivial) the result.
|
||||
|
||||
If multiple cases apply, the one used is listed first.
|
||||
|
||||
A result of "no merge" means that index is left in stage 0, ancest in
|
||||
stage 1, head in stage 2, and remote in stage 3 (if any of these are
|
||||
empty, no entry is left for that stage). Otherwise, the given entry is
|
||||
left in stage 0, and there are no other entries.
|
||||
|
||||
A result of "no merge" is an error if the index is not empty and not
|
||||
up to date.
|
||||
|
||||
*empty* means that the tree must not have a directory-file conflict
|
||||
with the entry.
|
||||
|
||||
For multiple ancestors, a '+' means that this case applies even if
|
||||
only one ancestor or remote fits; a '^' means all of the ancestors
|
||||
must be the same.
|
||||
|
||||
   case  ancest    head    remote    result
   ----------------------------------------
   1     (empty)+  (empty) (empty)   (empty)
   2ALT  (empty)+  *empty* remote    remote
   2     (empty)^  (empty) remote    no merge
   3ALT  (empty)+  head    *empty*   head
   3     (empty)^  head    (empty)   no merge
   4     (empty)^  head    remote    no merge
   5ALT  *         head    head      head
   6     ancest+   (empty) (empty)   no merge
   8     ancest^   (empty) ancest    no merge
   7     ancest+   (empty) remote    no merge
   10    ancest^   ancest  (empty)   no merge
   9     ancest+   head    (empty)   no merge
   16    anc1/anc2 anc1    anc2      no merge
   13    ancest+   head    ancest    head
   14    ancest+   ancest  remote    remote
   11    ancest+   head    remote    no merge
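
For the common situation of a single ancestor, the table above reduces to
a short function. The Python sketch below is a simplification made for
illustration: `three_way_merge` and `NO_MERGE` are hypothetical names,
`None` stands for (empty), directory-file conflicts (the `*empty*`
condition) are assumed not to occur, and case #16, which needs more than
one ancestor, is not modelled.

```python
NO_MERGE = "no merge"


def three_way_merge(ancest, head, remote):
    """Return the stage-0 result, None for (empty), or NO_MERGE."""
    if ancest is None:
        if head is None:
            return remote           # cases 1 and 2ALT
        if remote is None:
            return head             # case 3ALT
    if head is not None and head == remote:
        return head                 # case 5ALT
    if ancest is None:
        return NO_MERGE             # case 4
    if head is None or remote is None:
        return NO_MERGE             # cases 6, 8, 7, 10, 9
    if remote == ancest:
        return head                 # case 13
    if head == ancest:
        return remote               # case 14
    return NO_MERGE                 # case 11
```

A `NO_MERGE` result means the entries are left in their stages for a
later, non-trivial merge to resolve.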
|
||||
|
||||
Only #2ALT and #3ALT use *empty*, because these are the only cases
|
||||
where there can be conflicts that didn't exist before. Note that we
|
||||
allow directory-file conflicts between things in different stages
|
||||
after the trivial merge.
|
||||
|
||||
A possible alternative for #6 is (empty), which would make it like
|
||||
#1. This is not used, due to the likelihood that it arises due to
|
||||
moving the file to multiple different locations or moving and deleting
|
||||
it in different branches.
|
||||
|
||||
Case #1 is included for completeness, and also in case we decide to
|
||||
put on '+' markings; any path that is never mentioned at all isn't
|
||||
handled.
|
||||
|
||||
Note that #16 is when both #13 and #14 apply; in this case, we refuse
|
||||
the trivial merge, because we can't tell from this data which is
|
||||
right. This is a case of a reverted patch (in some direction, maybe
|
||||
multiple times), and the right answer depends on looking at crossings
|
||||
of history or common ancestors of the ancestors.
|
||||
|
||||
Note that, between #6, #7, #9, and #11, all cases not otherwise
|
||||
covered are handled in this table.
|
||||
|
||||
For #8 and #10, there is alternative behavior, not currently
|
||||
implemented, where the result is (empty). As currently implemented,
|
||||
the automatic merge will generally give this effect.
|
||||