Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Variance extractor #2543

Merged
merged 20 commits into from
Jul 31, 2024
Merged

Variance extractor #2543

merged 20 commits into from
Jul 31, 2024

Conversation

ctoennis
Copy link

This pull request is to add a new ImageExtractor subclass to generate images from the variance of waveforms that will be needed for the pointing calibration using stars. This request is related to issue #2542 .

@ctoennis ctoennis requested a review from fwerner April 24, 2024 15:14
@ctoennis ctoennis self-assigned this Apr 24, 2024
@kosack
Copy link
Contributor

kosack commented Apr 24, 2024

If I understand correctly, what you are basically doing here is computing an event-wise pedestal variance. In the data model (see the original top-level data model document), there are a few places you can put pedestals:

  • dl1.event.telescope.pedestal: pedestals measured for each event (usually in a time window as @maxnoe mentioned, since you can for example just use time slices outside the trace integration window, even for Cherenkov events). This is also computed by some cameras directly in hardware as dl0 information.
  • dl1.monitoring.telescope.pedestal: is the pedestal computed from an ensemble of pedestal events, or from Cherenkov events with the shower cut out (this is also in the DL0.monitoring.telescope.pedestal, computed by the cameras) computed every few seconds.
  • dl1.service.telescope.pedestal: pedestals measured from some standard run that are applied for e.g. RTA

So I think what you are computing is the first one, and you should store it as pedestal information: It is in fact a second measurement of the pedestal and its variance, and could be compared to the other methods (even if the use case you are describing here is for doing star tracking, star tracking is computed from the pedestal variances, which can be over many time scales)

@fwerner
Copy link
Contributor

fwerner commented Apr 25, 2024

In addition to what has been said I'd suggest to clarify the purpose of this extractor: it is calculating the waveform sample variance, which should be reflected in the name/docstring to communicate that it has a complicated relationship to charge.

@ctoennis
Copy link
Author

The thing we want to calculate is the variance of the waveforms specifically of interleaved pedestal events and not the variance of the pedestal in units of charge.

@kosack : What we get is a sort of image based on the variance of the waveform in each pixel. What we get is not a pedestal value, so putting it in a pedestal container is not the best option i think.

I think it is best to write a container appropriate for these variance images similar to the DL1CameraContainer

@ctoennis
Copy link
Author

I just added a custom continer for the variance images. I also compared the variance calculation using just a plain variance and a variance of integrated chunks of the waveforms using the FixedWindowSum extractor in the plot below.
comparemethod

The x axis shows the number of images used in calculating an average variance value that is shown on the y axis.

Though the methods are generally statistically equivalent the average variance becomes stable with fewer images when using a simple variance over the variance from integrated chunks.

@maxnoe
Copy link
Member

maxnoe commented Apr 26, 2024

Though the methods are generally statistically equivalent the average variance becomes stable with fewer images when using a simple variance over the variance from integrated chunks.

That's not surprising, since you have just more samples. But you change of what you compute the variance

@ctoennis
Copy link
Author

Hi all, can we restart the discussion here? I made a small update to the extractor to contain timing information that will later be needed in the corresponding subclass of StatisticsExtractor that i am working on with Tjark.

There was a decision needed as to where the variance images will be attached to. Should this find a place somewhere in dl1.event.telescope.pedestal or something similar?

@maxnoe
Copy link
Member

maxnoe commented May 29, 2024

Hi @ctoennis the lint step in the CI is failing due to some typos and wrong formatting.

To check locallly, make sure to install the pre-commit hooks:
https://ctapipe.readthedocs.io/en/latest/developer-guide/getting-started.html#installing-ctapipe-in-development-mode

This comment has been minimized.

This comment has been minimized.

@ctoennis
Copy link
Author

ctoennis commented Jun 3, 2024

i fixed all the isses with the spelling, code checks etc.

src/ctapipe/containers.py Outdated Show resolved Hide resolved
src/ctapipe/containers.py Outdated Show resolved Hide resolved
src/ctapipe/containers.py Outdated Show resolved Hide resolved
src/ctapipe/containers.py Outdated Show resolved Hide resolved
@ctoennis
Copy link
Author

i can remove trigger time. But i am not sure how to proceed on the difference to the ImageExtractor API. I could make it that the variance image is stored in a DL1ImageContainer and that it complies with the API or make it a separate class that doesn't inherit from the ImageExtractor.

I think what i generate here is a type of image as i try to look for stars in the camera FOV with them. But if it makes more sense to you it can be separate from the ImageExtractors.

@maxnoe
Copy link
Member

maxnoe commented Jul 10, 2024

The question is if it is possible to adhere to the API that is expected from the image extractors: if yes, no problem, let it be an image extractor. But that means for example the call above should work and correctly write out the data to the HDF5 file.

If you need the different API, it should be its own thing and we need to think about how to integrate it into the process tool for example.

@kosack
Copy link
Contributor

kosack commented Jul 10, 2024

The question is if it is possible to adhere to the API that is expected from the image extractors: if yes, no problem, let it be an image extractor. But that means for example the call above should work and correctly write out the data to the HDF5 file.

I think what this class does is not the same as an ImageExtractor, or am I missing something? ImageExtractors take a "movie" (a set of waveforms per pixel for one event) for a single event and convert them to set of time-independent images that must include the peak time and integral over a window, but can also optionally inlcude any other parameters of the waveform like the pulse width or variabce. Generally, there is only one ImageExtractor that is run at a during processing. That doesn't seem to be what you have here.

The idea of the ImageExtractors was that low-level routines like those used to compute peak time or integral are implemeted as basic functions taking numpy inpus, and the ImageExtractor composes several of those methods into one set of algorithms. So what I would expect here is more like a function to compute the variance, and then that function is added to one or more of the existing ImageExtractors, perhaps as an option. So that then the chosen ImageExtractor generates the required peak time and integral, but also the variance

@maxnoe
Copy link
Member

maxnoe commented Jul 10, 2024

I think what this class does is not the same as an ImageExtractor, or am I missing something?

You could think of this as an ImageExtractor that uses the variance of each waveform to produce the image and doesn't produce a peak_time.

@mexanick
Copy link
Contributor

I believe, it is an ImageExtractor, the __call__() signature is of course wrong here. It should copy the one of the base extractor:

@abstractmethod
    def __call__(
        self, waveforms, tel_id, selected_gain_channel, broken_pixels
    ) -> DL1CameraContainer:

Only waveforms are then used, the other parameters are simply ignored.

The only issue, that has to be resolved, is the return type. DL1CameraContainer in its current shape contains a number of fields that are not needed in this case (actually, everything except image).

I'd propose to add an intermediate container:

class CameraImageContainer(Container):
    """
    Storage of output of generic camera images (any reduction of waveforms).
    """

    image = Field(
        None,
        "Numpy array of camera image, after waveform extraction."
        "Shape: (n_pixel) if n_channels is 1 or data is gain selected"
        "else: (n_channels, n_pixel)",
    )

class DL1CameraContainer(CameraImageContainer):
    """
    Storage of output of camera calibration e.g the final calibrated
    image in intensity units and the pulse time.
    """
    peak_time = Field(
        None,
        "Numpy array containing position of the peak of the pulse as determined by "
        "the extractor."
        "Shape: (n_pixel) if n_channels is 1 or data is gain selected"
        "else: (n_channels, n_pixel)",
    )
    # rest of the DL1CameraContainer fields
    
class VarianceContainer(CameraImageContainer):
    """
    Storage of output of camera variance image.

    Stores the variance of waveform in each pixel, composed as an image.
    """
    variance_method = Field(
        VarianceType.SIMPLE,
        "Method by which the variance was calculated"
        "This can either be a plain variance"
        "or a variance of integrated samples",
        type=VarianceType,
    )
    # I don't think you need anything else here. The trigger time you extract from elsewhere in the event tree.

and change the return type of base ImageExtractor to CameraImageContainer

@maxnoe
Copy link
Member

maxnoe commented Jul 12, 2024

If it's just that the VarianceExtractor doesn't provide a peak_time, I think that's fine, just leave it None (the default value).

Adding an additional field to the DL1CameraContainer that's an "image_type" sounds like a good idea.

I would avoid having extractors that have different return types.

maxnoe
maxnoe previously approved these changes Jul 19, 2024

This comment has been minimized.

@ctoennis
Copy link
Author

So, is this request then good to merge?

@maxnoe
Copy link
Member

maxnoe commented Jul 22, 2024

@ctoennis From my side yes, but we require two approvals and @kosack is on vacation right now. Since he had comments / requests for changes before, I'd like to wait for his approval here.

@maxnoe maxnoe requested a review from kosack July 22, 2024 08:44
@kosack
Copy link
Contributor

kosack commented Jul 29, 2024

I'll take a look tomorrow or today hopefully! Might take a bit of time since there are a lot of changes.

Copy link
Contributor

@kosack kosack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. but I would like to have a test that checks for correctness. It is a simple algorithm, but the test will also ensure that nobody breaks it in the future. E.g. a test that runs the extractor on some dummy data with a known variance.

@ctoennis
Copy link
Author

i added a test to check that an actual variance is being calculated.

Copy link

Passed

Analysis Details

0 Issues

  • Bug 0 Bugs
  • Vulnerability 0 Vulnerabilities
  • Code Smell 0 Code Smells

Coverage and Duplications

  • Coverage 100.00% Coverage (93.60% Estimated after merge)
  • Duplications 0.00% Duplicated Code (0.60% Estimated after merge)

Project ID: cta-observatory_ctapipe_AY52EYhuvuGcMFidNyUs

View in SonarQube

@kosack kosack merged commit cce22f3 into main Jul 31, 2024
12 checks passed
@kosack kosack deleted the VarianceExtractor branch July 31, 2024 14:44
@maxnoe maxnoe added this to the 0.22.0 milestone Sep 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants