External Integration

General

Nino-hist is a systematic tool based on boost-histogram. The HEP analysis is a complicated process and relies on many other (Sckit-HEP) tools. Thus, it’s intuitive for Nino-hist to support the integration of these external packages and modules. In this part, we will demonstrate how Nino-hist works and its interaction with other (Sckit-HEP) tools.

A brief workflow of HEP analysis is as below:

image0

De/serialization

If we want to analysis data, what do we need to do first? To load the data files, of course! How do we do it? It denpends on the file, of course!

pickle Integration

We could use pickle to deal with it like this:

[1]:
import hist
import pickle
from pathlib import Path

h1 = hist.Hist(hist.axis.Regular(2, -1, 1, name='h_1'))
h2 = h1.copy()
h2.name = 'h_2'

h1.fill(-0.5)
h2.fill(0.5)

# Arithmetic operators
h3 = h1 + h2
h2.name = 'h_3'
h4 = h3 * 2
h2.name = 'h_4'

print(f"{h4[0]}, {h4[1]}")

h4_saved = Path("pickle_file.pkl")

# Now save the histogram
with h4_saved.open("wb") as f:
    pickle.dump(h4, f, protocol=-1)

# And load
with h4_saved.open("rb") as f:
    h5 = pickle.load(f)

assert h4 == h5
print("Succeeded in pickling a histogram!")

# Delete the file to keep things tidy
# h4_saved.unlink()
2.0, 2.0
Succeeded in pickling a histogram!

aghast Integration

Though pickle is capable for serialization and deserialization, we sometimes need to use ROOT files for loading and saving data in HEP analysis. Initially, we expect to use aghast for Nino-hist to interactive with ROOT files. While it’s not pure Pythonic and not installable on PyPI, currently. Thus, here we first demonstrate the availability of aghast and then show how to save and load our histograms by aghast’s underlying uproot. aghast is a histogramming library that does not fill histograms and does not plot them. Its role is behind the scenes, to provide better communication between histogramming libraries.

As we need to make Nino-hist available to communicate with aghast, that is, to change aghast’s histograms to aghast’s histograms.

[2]:
import aghast
import numpy as np
import matplotlib.pyplot as plt

h = hist.Hist(hist.axis.Regular(50, -3, 3, name='x'))
h.fill(np.random.normal(size=1_000_000))
ghastly_hist = aghast.from_numpy(h.to_numpy())
ghastly_hist.dump()
Histogram(
  axis=[
    Axis(binning=RegularBinning(num=50, interval=RealInterval(low=-3.0, high=2.9999999999999996)))
  ],
  counts=
    WeightedCounts(
      sumw=
        InterpretedInlineFloat64Buffer(
          buffer=
              [  633.   841.  1273.  1817.  2324.  3077.  4077.  5304.  6775.  8417.
               10582. 12991. 15554. 18490. 21941. 25120. 28839. 31609. 35316. 38458.
               41635. 43466. 45603. 46843. 47655. 47754. 47388. 45594. 43733. 41031.
               38711. 35442. 32205. 28463. 24793. 21814. 18531. 15621. 12792. 10405.
                8297.  6747.  5341.  4026.  3103.  2294.  1776.  1233.   893.   627.])))

See, it works! We are looking forward to aghast’s update, and we can use the ghastly_hist for better use.

I am sure it will be a shortcut for file serialization and deserialization.

uproot Integration

Now that aghast need the more detailed development to support serialization and deserialization for our Nino-hist, we have to use uproot to demonstrate how can we save and load histograms in Nino-hist by ROOT files.

uproot is the most classical ROOT I/O in Python. It is a reader and a writer of the ROOT file format using only Python and Numpy. Unlike the standard C++ ROOT implementation, uproot is only an I/O library, primarily intended to stream data into machine learning libraries in Python. Unlike PyROOT and root_numpy, uproot does not depend on C++ ROOT. Instead, it uses Numpy to cast blocks of data from the ROOT file as Numpy arrays.

The first thing we need to do is to create Nino-hist’s histograms as normal.

[3]:
import hist
import uproot
import numpy as np
import matplotlib.pyplot as plt

file = uproot.recreate("demo_root_file.root")
h = hist.Hist(hist.axis.Regular(20, -3, 3, name='x'))
h.fill(np.random.normal(size=1_000))

w, data = h.to_numpy()

Here, we use a little tricks. First, we convert the Nino-hist’s histogram to a Numpy histogram. And then write the histogram as well as a title to the demo_root_file.root. We load the demo to another file, printing the title, showing the histogram.

[4]:
file['title'] = 'A histogram'
file['h'] = np.histogram(data[:-1], bins=data, weights=w)
file2 = uproot.open("demo_root_file.root")
print(file2['title'])
file2['h'].show()
A histogram
                 0                                                         134.4
                 +-------------------------------------------------------------+
[-inf, -3)   0   |                                                             |
[-3, -2.7)   3   |*                                                            |
[-2.7, -2.4) 6   |***                                                          |
[-2.4, -2.1) 7   |***                                                          |
[-2.1, -1.8) 19  |*********                                                    |
[-1.8, -1.5) 31  |**************                                               |
[-1.5, -1.2) 51  |***********************                                      |
[-1.2, -0.9) 57  |**************************                                   |
[-0.9, -0.6) 97  |********************************************                 |
[-0.6, -0.3) 104 |***********************************************              |
[-0.3, 0)    117 |*****************************************************        |
[0, 0.3)     128 |**********************************************************   |
[0.3, 0.6)   108 |*************************************************            |
[0.6, 0.9)   101 |**********************************************               |
[0.9, 1.2)   59  |***************************                                  |
[1.2, 1.5)   48  |**********************                                       |
[1.5, 1.8)   26  |************                                                 |
[1.8, 2.1)   20  |*********                                                    |
[2.1, 2.4)   8   |****                                                         |
[2.4, 2.7)   4   |**                                                           |
[2.7, 3)     2   |*                                                            |
[3, inf]     0   |                                                             |
                 +-------------------------------------------------------------+

Note that uproot is a pretty large package which supports many functionalities. Here, we just demonstate how to save and load ROOT files. You can go to the origin to see more information.

Histogram

Nino-hist

Nino-hist acts as the histogramming part, i.e., “Data” to “Histogram”, in HEP analysis workflow. It is based on boost-histogram, meaning that you can do whatever you like in boost-histogram with histograms of Nino-hist. Let’s see some examples.

First, we will re-use the 2D-density histogram example of boost-histogram.

[5]:
import hist
import numpy as np
import matplotlib.pyplot as plt

# Make a 2D histogram
h = hist.Hist(hist.axis.Regular(50, -3, 3, name='x'), hist.axis.Regular(50, -3, 3, name='y'))

# Fill with Gaussian random values
h.fill(np.random.normal(size=1_000_000), np.random.normal(size=1_000_000))

# Compute the areas of each bin
areas = np.prod(h.axes.widths, axis=0)

# Compute the density
density = h.view() / h.sum() / areas

# Get the edges
X, Y = h.axes.edges

# Make the plot
fig, ax = plt.subplots()
mesh = ax.pcolormesh(X.T, Y.T, density.T)
fig.colorbar(mesh)
plt.show()
../_images/user-guide_integration_17_0.png

Nice! This is good, because this means that you can migrate your boost-histogram codes to Nino-hist directly.

Another example:

[6]:
import hist
import boost_histogram as bh

# Make 1-d histogram with 5 logarithmic bins from 1e0 to 1e5
h = hist.Hist(
    hist.axis.Regular(5, 1e0, 1e5, name="x", transform=bh.axis.transform.log),
    storage=bh.storage.Weight(),
)

# Fill histogram with numbers
x = (2e0, 2e1, 2e2, 2e3, 2e4)

# Doing this several times so the variance is more interesting
h.fill(x, weight=1)
h.fill(x, weight=1)
h.fill(x, weight=1)
h.fill(x, weight=1)

# Iterate over bins and access bin counter
for idx, (lower, upper) in enumerate(h.axes[0]):
    val = h[idx]
    print(val)
    print(f"bin {idx} in [{lower:g}, {upper:g}): {val.value} +/- {val.variance**.5}")
WeightedSum(value=4, variance=4)
bin 0 in [1, 10): 4.0 +/- 2.0
WeightedSum(value=4, variance=4)
bin 1 in [10, 100): 4.0 +/- 2.0
WeightedSum(value=4, variance=4)
bin 2 in [100, 1000): 4.0 +/- 2.0
WeightedSum(value=4, variance=4)
bin 3 in [1000, 10000): 4.0 +/- 2.0
WeightedSum(value=4, variance=4)
bin 4 in [10000, 100000): 4.0 +/- 2.0

We haven’t define Storage and transform for Nino-hist yet, so you cannot directly use them in our package, but Nino-hist’s compatibility with boost-histogram is good, so you can use whatever you need by citing boost-histogram’s corresponding modules.

To sum up, Nino-hist acts as the core part in HEP analysis workflow, allowing us to histogram data. This is valuable for “Data” to “Histogram” in HEP analyse workflow!

Analysis

Here we are going to show how some (Sckit-HEP) analysis tools are integrated to our package.

iminuit Integration

iminuit is most commonly used for likelihood fits of models to data, and to get model parameter error estimates from likelihood profile analysis. It’s a numerical minimizer and error calculator. You provide it an analytical function, which accepts one or several parameters, and an initial guess of the parameter values. It will then find a local minimum of this function starting from the initial guess. In that regard, iminuit is similar to other minimizers, like those in scipy.optimize.

Let’s use an example to see how can iminuit fits the data in Nino-hist.

[7]:
import hist
import numpy as np
import matplotlib.pyplot as plt

k = 3.
b = 2.
data_x = np.arange(0, 10, .2)
offset = np.random.normal(size=50)
data_y = k * data_x + b + offset

plt.bar(data_x, data_y, width=0.1)

h = hist.Hist(
    hist.axis.Regular(20, 0, 10, name='iminuit', underflow=False, overflow=False)
)

h.fill(data_x, weight=data_y)
[7]:
Hist(Regular(20, 0, 10, underflow=False, overflow=False, metadata={'name': 'iminuit', 'title': None}), storage=Double()) # Sum: 844.3786162013363
../_images/user-guide_integration_24_1.png

By this way, we create and fill a histogram with the linear distribution, i.e., the weight of each bin is the multiple of its index. Of course you can also use a histogram loading by serialization. Back to the point, it’s a linear distribution, and we want to use a fitter to get its slope k and intercept b.

First, we need to access the data (the weight of histogram) and define the error func to be minimized.

[8]:
y, x = h.to_numpy()

def least_squares_np(par):  # par is a numpy array here
    mu = np.polyval(par, data_x)  # for par = (a, b) this is a line
    yvar = 0.01
    return np.sum((data_y - mu) ** 2 / yvar)

Then, we will initialized the parameters in an array (two elements in this linear case), the step size, and the error definition (errordef = 0.5 for negative log-likelihood functions, errordef = 1 for least-squares functions). You are able to see the minuit initial states use get_param_states().

[9]:
from iminuit import Minuit

m = Minuit.from_array_func(least_squares_np, (5, 5), error=(0.1, 0.1), errordef=1, name=('k', 'b'))
m.get_param_states()
[9]:
Name Value Hesse Error Minos Error- Minos Error+ Limit- Limit+ Fixed
0 k 5.00 0.10
1 b 5.00 0.10

And it’s time for fitting! It’s pretty handy in iminuit, just m.migrad() is ok. Of course you can use more specific methods like fixed, limits, fast fitting methods, etc. This is just the simplest way to fit.

[10]:
m.migrad()
# m.get_param_states()
[10]:
FCN = 5340 Ncalls = 34 (34 total)
EDM = 1.47E-19 (Goal: 1E-05) up = 1.0
Valid Min. Valid Param. Above EDM Reached call limit
True True False False
Hesse failed Has cov. Accurate Pos. def. Forced
False True True True False
Name Value Hesse Error Minos Error- Minos Error+ Limit- Limit+ Fixed
0 k 3.000 0.005
1 b 2.189 0.028

Nice! It’s almost true! Next, we want to visualized our fitting (the correlationship of two params and the params’ scopes).

We can analysis the correlationship of the params to eliminate some of them.

[11]:
m.matrix(correlation=True)
[11]:
k b
k 1.00 -0.86
b -0.86 1.00

We can also use contour to represent it.

[12]:
m.draw_mncontour('k','b', nsigma=5);  # draw five contours from sigma=1 to 5
# m.draw_contour('k','b');
../_images/user-guide_integration_36_0.png

We can also view the scope of those params.

[13]:
m.draw_profile('k');
# m.draw_profile('b');
../_images/user-guide_integration_38_0.png

To sum up, Nino-hist allow user to fit data by interacting with iminuit. Sepcifically, you need to convert your histogram to numpy format, and then iminuit will deal with that. With numpy arrays as the media, it’s flexible. You can do whatever you want in iminuit with the histograms of Nino-hist. This is valuable for “Histogram” to “Analysis” in HEP analyse workflow!

Visualization

You can use many tool for Nino-hist’s visualization, because it can be converted to numpy arrays and there are many visualization tools for numpy arrays I firmly believe. But there, our primary focus is its interation with mplhep.

mplhep Integration

Although we can use Matplotlib to visualize our histograms, it should be a better way to do the same thing in a, umm… (Sckit-HEP) manner. mplhep is a set of helpers for matplotlib to more easily produce plots typically needed in HEP as well as style them in way that’s compatible with current collaboration requirements (ROOT).

Let’s use an example to see how mplhep visualizes the histograms in Nino-hist.

First, create 1-D and 2-D histograms.

[14]:
import hist
import numpy as np
import matplotlib.pyplot as plt

data_x = np.random.normal(loc=0., scale=1., size=1_000_000)
data_y = np.random.normal(loc=0., scale=1., size=1_000_000)

h_1d = hist.Hist(
    hist.axis.Regular(500, -3, 3, name='x_1d', underflow=False, overflow=False)
)

h_1d.fill(data_x)

h_2d = hist.Hist(
    hist.axis.Regular(500, -3, 3, name='x_2d', underflow=False, overflow=False),
    hist.axis.Regular(500, -3, 3, name='y_2d', underflow=False, overflow=False)
)

h_2d.fill(data_x, data_y)
[14]:
Hist(
  Regular(500, -3, 3, underflow=False, overflow=False, metadata={'name': 'x_2d', 'title': None}),
  Regular(500, -3, 3, underflow=False, overflow=False, metadata={'name': 'y_2d', 'title': None}),
  storage=Double()) # Sum: 994660.0

Then, we can plot our histogram. (mplhep is in its infancy, with a few very single features, but I’m sure there will be better interfaces later.)

[15]:
import mplhep

# plt.style.use(mplhep.style.LHCb)
plt.subplot(121)
w, xbins = h_1d.to_numpy()
mplhep.histplot(w.T, xbins)

plt.subplot(122)
w, xbins, ybins = h_2d.to_numpy()
mplhep.hist2dplot(w.T, xbins, ybins)
[15]:
<matplotlib.axes._subplots.AxesSubplot at 0x12d2a5790>
../_images/user-guide_integration_44_1.png

To sum up, Nino-hist allow user to visualize data by interacting with mplhep. Sepcifically, you need to convert your histogram to numpy format, and then mplhep will deal with that. With numpy arrays as the media, it’s flexible. You can do whatever you want in mplhep with the histograms of Nino-hist. This is valuable for “Analysis” to “Visualization” in HEP analyse workflow!