Recipes
Getting added and removed lines from Subversion log
Unlike git, the Subversion log does not provide the number of lines added and removed by each commit, so codemetrics resorts to a two-pass process. The first pass retrieves the log:
import codemetrics as cm
log = cm.get_svn_log()
At this point, the added and removed columns are NaN. The second pass populates them. It is slow because it calls svn diff -c once for each revision:
log.loc[:, ['added', 'removed']] = log.groupby('revision').apply(cm.get_diff_stats)
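The second pass boils down to counting the "+" and "-" lines in the output of svn diff -c for a revision. The sketch below is a hypothetical, standard-library-only illustration of that idea, not codemetrics' own implementation; count_added_removed, FILE_MARKERS and SAMPLE_DIFF are made-up names. It only counts lines inside a hunk (after an "@@" header), so the "---"/"+++" file headers are not mistaken for changed lines, even when a removed line itself starts with "--".

```python
from typing import Tuple

# Lines that start a new file section in `svn diff` output (assumed markers).
FILE_MARKERS = ("Index:", "diff ", "Property changes on:")


def count_added_removed(diff_text: str) -> Tuple[int, int]:
    """Count added and removed lines in unified diff text (hypothetical helper).

    Only lines inside a hunk (after an '@@' header) are counted, so the
    '---'/'+++' file headers are never mistaken for changed lines.
    """
    added = removed = 0
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("@@"):
            in_hunk = True  # hunk header: changed lines follow
        elif line.startswith(FILE_MARKERS):
            in_hunk = False  # new file section: '---'/'+++' headers come next
        elif in_hunk and line.startswith("+"):
            added += 1
        elif in_hunk and line.startswith("-"):
            removed += 1
    return added, removed


SAMPLE_DIFF = """\
Index: a.py
===================================================================
--- a.py\t(revision 41)
+++ a.py\t(revision 42)
@@ -1,2 +1,3 @@
 keep
-old
+new
+extra
"""

print(count_added_removed(SAMPLE_DIFF))  # (2, 1)
```

Tracking the hunk state costs one boolean and avoids the classic bug of skipping every line that starts with "---" or "+++".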
Note that passing chunks=True to get_diff_stats returns diff stats with one row per diff chunk.
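To make "one row per diff chunk" concrete, here is a hypothetical stdlib-only sketch that splits unified diff text at its "@@" hunk headers and counts added and removed lines per hunk. The ChunkStats record and chunk_stats function are invented for illustration; the actual columns returned by get_diff_stats may differ.

```python
from typing import List, NamedTuple, Optional

# Lines that start a new file section in `svn diff` output (assumed markers).
FILE_MARKERS = ("Index:", "diff ", "Property changes on:")


class ChunkStats(NamedTuple):
    """Added/removed counts for one diff chunk (hypothetical record type)."""
    header: str  # the '@@ ... @@' line that opens the chunk
    added: int
    removed: int


def chunk_stats(diff_text: str) -> List[ChunkStats]:
    """Return one ChunkStats row per '@@' hunk in unified diff text."""
    rows: List[ChunkStats] = []
    header: Optional[str] = None  # header of the chunk being counted, if any
    added = removed = 0
    for line in diff_text.splitlines():
        starts_chunk = line.startswith("@@")
        if starts_chunk or line.startswith(FILE_MARKERS):
            if header is not None:  # close the chunk in progress
                rows.append(ChunkStats(header, added, removed))
            header = line if starts_chunk else None
            added = removed = 0
        elif header is not None and line.startswith("+"):
            added += 1
        elif header is not None and line.startswith("-"):
            removed += 1
    if header is not None:  # close the last chunk
        rows.append(ChunkStats(header, added, removed))
    return rows


SAMPLE_DIFF = """\
Index: a.py
===================================================================
--- a.py\t(revision 41)
+++ a.py\t(revision 42)
@@ -1,2 +1,2 @@
-old
+new
 keep
@@ -10,2 +10,4 @@
 ctx
+added1
+added2
 ctx2
"""

for row in chunk_stats(SAMPLE_DIFF):
    print(row.header, row.added, row.removed)
# @@ -1,2 +1,2 @@ 1 1
# @@ -10,2 +10,4 @@ 2 0
```

Summing the per-chunk rows gives the per-revision totals, which is why the chunk view is a refinement rather than a different measure.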
See also
Function codemetrics.svn.get_diff_stats
Leverage dask to speed up retrieval of added and removed lines with Subversion
Retrieving added and removed lines with Subversion can be slow because codemetrics makes repeated calls to svn diff --git -c XXX to count the number of pluses and minuses. To speed up the process somewhat, one can try leveraging dask like so:
import multiprocessing as mp
import codemetrics as cm
import dask.dataframe as dd
import dask.diagnostics as ddiags
n_cpus = mp.cpu_count()
log = cm.get_svn_log().reset_index()
# Empty frame with the output columns and dtypes, so dask knows the result schema.
meta = cm.get_diff_stats(log[-1:], chunks=False).iloc[0:0]
partitioned_log = dd.from_pandas(log, npartitions=n_cpus)
wf = partitioned_log.groupby('revision').apply(cm.get_diff_stats, chunks=False, meta=meta)
with ddiags.ProgressBar():  # optional
    addrem_df = wf.compute()  # returns a pandas.DataFrame
Note that starting the parallel computation has a significant overhead, so the speed-up is most likely to pay off on logs with many revisions.
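If pulling in dask is not desirable, one hypothetical alternative, not part of codemetrics, is to map a per-revision function over the revision numbers with the standard library's concurrent.futures.ThreadPoolExecutor. Threads are likely sufficient because each call spends most of its time waiting on the external svn process rather than running Python code. The names diff_stats_parallel and fake_stats are made up for this sketch; fake_stats stands in for a function that would run svn diff -c for one revision and count its lines.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional, Tuple, TypeVar

R = TypeVar("R")


def diff_stats_parallel(
    revisions: Iterable[int],
    stats_for_revision: Callable[[int], R],
    max_workers: Optional[int] = None,
) -> Dict[int, R]:
    """Compute stats_for_revision(rev) for each revision using a thread pool.

    Threads are likely enough here: each call mostly waits on an external
    `svn diff` process, which does not hold Python's GIL.
    """
    revision_list = list(revisions)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so zip pairs them correctly.
        return dict(zip(revision_list, pool.map(stats_for_revision, revision_list)))


# Hypothetical stand-in for a function that would run `svn diff -c REV`.
def fake_stats(revision: int) -> Tuple[int, int]:
    return (revision * 2, revision)


print(diff_stats_parallel([101, 102, 103], fake_stats, max_workers=4))
# {101: (202, 101), 102: (204, 102), 103: (206, 103)}
```

Keying the result by revision number makes it straightforward to join the counts back onto the log afterwards.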