
Federated ProxQuasiNewton

Necessary mixins and utils for the proximal Newton method.

fed_PQN

FedProxQuasiNewton

Bases: LocMakeFedPQNFisherGradientNLL, AggChooseStepComputeAscentDirection

Mixin class to implement a Prox Newton method for box constraints.

It implements the method presented here:
https://www.cs.utexas.edu/~inderjit/public_papers/pqnj_sisc10.pdf
More context can be found here:
https://optml.mit.edu/papers/sksChap.pdf
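
The core idea, per gene, is a box-constrained quasi-Newton update with a backtracking line search. The sketch below is a minimal, single-gene illustration of the candidate updates evaluated during the line search (mirroring the source code further down); the helper name pqn_candidate_betas is hypothetical, and the actual implementation evaluates the NLL, gradient and Fisher information for all candidates in a federated fashion.

import numpy as np

# Minimal single-gene sketch (hypothetical helper, not the federated implementation).
# `beta` has shape (n_params,), `ascent_direction` is positively correlated with the
# gradient of the negative log-likelihood, and betas are constrained to the box
# [-max_beta, max_beta].
def pqn_candidate_betas(beta, ascent_direction, max_beta, n_ls_steps=20):
    # Backtracking step sizes 0.5 ** i, as in the line search of the source code below.
    step_sizes = 0.5 ** np.arange(n_ls_steps)
    # Move against the ascent direction (i.e. descend) and project onto the box.
    return np.clip(
        beta[None, :] - step_sizes[:, None] * ascent_direction[None, :],
        -max_beta,
        max_beta,
    )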

Methods:

- run_fed_PQN: The method to run the Prox Quasi Newton algorithm. It relies on the methods inherited from the LocMakeFedPQNFisherGradientNLL and AggChooseStepComputeAscentDirection classes.

Source code in fedpydeseq2/core/fed_algorithms/fed_PQN/fed_PQN.py
class FedProxQuasiNewton(
    LocMakeFedPQNFisherGradientNLL, AggChooseStepComputeAscentDirection
):
    """Mixin class to implement a Prox Newton method for box constraints.

    It implements the method presented here:
    https://www.cs.utexas.edu/~inderjit/public_papers/pqnj_sisc10.pdf
    More context can be found here
    https://optml.mit.edu/papers/sksChap.pdf

    Methods
    -------
    run_fed_PQN
        The method to run the Prox Quasi Newton algorithm.
        It relies on the methods inherited from the LocMakeFedPQNFisherGradientNLL and
        AggChooseStepComputeAscentDirection classes.

    """

    PQN_num_iters: int

    def run_fed_PQN(
        self,
        train_data_nodes,
        aggregation_node,
        local_states,
        PQN_shared_state,
        first_iteration_mode: Literal["irls_catch"] | None,
        round_idx,
        clean_models,
        refit_mode: bool = False,
    ):
        """Run the Prox Quasi Newton  algorithm.

        Parameters
        ----------
        train_data_nodes: list
            List of TrainDataNode.

        aggregation_node: AggregationNode
            The aggregation node.

        local_states: dict
            Local states. Required to propagate intermediate results.

        PQN_shared_state: dict
            The input shared state.
            The requirements for this shared state are defined in the
            LocMakeFedPQNFisherGradientNLL class and depend on the
            first_iteration_mode.

        first_iteration_mode: Optional[Literal["irls_catch"]]
            The first iteration mode.
            This defines the input requirements for the algorithm, and is passed
            to the make_local_fisher_gradient_nll method at the first iteration.

        round_idx: int
            The current round.

        clean_models: bool
            If True, the models are cleaned.

        refit_mode: bool
            Whether to run on `refit_adata`s instead of `local_adata`s.
            (default: False).

        Returns
        -------
        local_states: dict
            Local states. Required to propagate intermediate results.

        PQN_shared_state: dict
            The final PQN shared state, containing the final beta values
            and the divergence masks.

        round_idx: int
            The updated round index.

        """
        #### ---- Main training loop ---- #####

        for pqn_iter in range(self.PQN_num_iters + 1):
            # ---- Compute local PQN summands and nlls ---- #

            (
                local_states,
                local_fisher_gradient_nlls_shared_states,
                round_idx,
            ) = local_step(
                local_method=self.make_local_fisher_gradient_nll,
                method_params={
                    "first_iteration_mode": first_iteration_mode
                    if pqn_iter == 0
                    else None,
                    "refit_mode": refit_mode,
                },
                train_data_nodes=train_data_nodes,
                output_local_states=local_states,
                round_idx=round_idx,
                input_local_states=local_states,
                input_shared_state=PQN_shared_state,
                aggregation_id=aggregation_node.organization_id,
                description="Compute local Prox Newton summands and nlls.",
                clean_models=clean_models,
            )

            # ---- Compute global PQN update and nlls ---- #

            PQN_shared_state, round_idx = aggregation_step(
                aggregation_method=self.choose_step_and_compute_ascent_direction,
                train_data_nodes=train_data_nodes,
                aggregation_node=aggregation_node,
                input_shared_states=local_fisher_gradient_nlls_shared_states,
                round_idx=round_idx,
                description="Update the log fold changes and nlls in IRLS.",
                clean_models=clean_models,
            )

        #### ---- End of training ---- ####

        return local_states, PQN_shared_state, round_idx

run_fed_PQN(train_data_nodes, aggregation_node, local_states, PQN_shared_state, first_iteration_mode, round_idx, clean_models, refit_mode=False)

Run the Prox Quasi Newton algorithm.

Parameters:

- train_data_nodes (list, required): List of TrainDataNode.
- aggregation_node (AggregationNode, required): The aggregation node.
- local_states (dict, required): Local states. Required to propagate intermediate results.
- PQN_shared_state (dict, required): The input shared state. The requirements for this shared state are defined in the LocMakeFedPQNFisherGradientNLL class and depend on the first_iteration_mode.
- first_iteration_mode (Literal["irls_catch"] | None, required): The first iteration mode. This defines the input requirements for the algorithm, and is passed to the make_local_fisher_gradient_nll method at the first iteration.
- round_idx (int, required): The current round.
- clean_models (bool, required): If True, the models are cleaned.
- refit_mode (bool, default: False): Whether to run on refit_adatas instead of local_adatas.

Returns:

- local_states (dict): Local states. Required to propagate intermediate results.
- PQN_shared_state (dict): The final PQN shared state, containing the final beta values and the divergence masks.
- round_idx (int): The updated round index.

Source code in fedpydeseq2/core/fed_algorithms/fed_PQN/fed_PQN.py
def run_fed_PQN(
    self,
    train_data_nodes,
    aggregation_node,
    local_states,
    PQN_shared_state,
    first_iteration_mode: Literal["irls_catch"] | None,
    round_idx,
    clean_models,
    refit_mode: bool = False,
):
    """Run the Prox Quasi Newton  algorithm.

    Parameters
    ----------
    train_data_nodes: list
        List of TrainDataNode.

    aggregation_node: AggregationNode
        The aggregation node.

    local_states: dict
        Local states. Required to propagate intermediate results.

    PQN_shared_state: dict
        The input shared state.
        The requirements for this shared state are defined in the
        LocMakeFedPQNFisherGradientNLL class and depend on the
        first_iteration_mode.

    first_iteration_mode: Optional[Literal["irls_catch"]]
        The first iteration mode.
        This defines the input requirements for the algorithm, and is passed
        to the make_local_fisher_gradient_nll method at the first iteration.

    round_idx: int
        The current round.

    clean_models: bool
        If True, the models are cleaned.

    refit_mode: bool
        Whether to run on `refit_adata`s instead of `local_adata`s.
        (default: False).

    Returns
    -------
    local_states: dict
        Local states. Required to propagate intermediate results.

    PQN_shared_state: dict
        The final PQN shared state, containing the final beta values
        and the divergence masks.

    round_idx: int
        The updated round index.

    """
    #### ---- Main training loop ---- #####

    for pqn_iter in range(self.PQN_num_iters + 1):
        # ---- Compute local PQN summands and nlls ---- #

        (
            local_states,
            local_fisher_gradient_nlls_shared_states,
            round_idx,
        ) = local_step(
            local_method=self.make_local_fisher_gradient_nll,
            method_params={
                "first_iteration_mode": first_iteration_mode
                if pqn_iter == 0
                else None,
                "refit_mode": refit_mode,
            },
            train_data_nodes=train_data_nodes,
            output_local_states=local_states,
            round_idx=round_idx,
            input_local_states=local_states,
            input_shared_state=PQN_shared_state,
            aggregation_id=aggregation_node.organization_id,
            description="Compute local Prox Newton summands and nlls.",
            clean_models=clean_models,
        )

        # ---- Compute global PQN update and nlls ---- #

        PQN_shared_state, round_idx = aggregation_step(
            aggregation_method=self.choose_step_and_compute_ascent_direction,
            train_data_nodes=train_data_nodes,
            aggregation_node=aggregation_node,
            input_shared_states=local_fisher_gradient_nlls_shared_states,
            round_idx=round_idx,
            description="Update the log fold changes and nlls in IRLS.",
            clean_models=clean_models,
        )

    #### ---- End of training ---- ####

    return local_states, PQN_shared_state, round_idx

substeps

AggChooseStepComputeAscentDirection

Mixin class to compute the right ascent direction.

An ascent direction is a direction that is positively correlated with the gradient. This direction will be used to compute the next iterate in the proximal quasi Newton algorithm. As our aim is to minimize the negative log likelihood, we will move in the opposite direction, that is, in the direction of minus the ascent direction.
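
For intuition, the sketch below shows one way an ascent direction and a Newton decrement can be derived per gene from the aggregated gradient and regularized Fisher information. It is an unconstrained simplification with a hypothetical helper name; the actual compute_ascent_direction_decrement additionally accounts for the box constraint given by max_beta.

import numpy as np

# Unconstrained, per-gene sketch (assumption: the real helper also handles the
# box constraint [-max_beta, max_beta]).
def ascent_direction_and_decrement(fisher, gradient):
    # Ascent direction: the Fisher-scaled gradient, positively correlated with it.
    direction = np.linalg.solve(fisher, gradient)  # shape (n_params,)
    # Newton decrement: expected decrease of the objective when moving along -direction.
    decrement = float(gradient @ direction)
    return direction, decrement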

Attributes:

- num_jobs (int): The number of cpus to use.
- joblib_verbosity (int): The joblib verbosity.
- joblib_backend (str): The backend to use for the IRLS algorithm.
- irls_batch_size (int): The batch size to use for the IRLS algorithm.
- max_beta (float): The maximum value for the beta parameter.
- beta_tol (float): The tolerance for the beta parameter.
- PQN_num_iters_ls (int): The number of iterations to use for the line search.
- PQN_c1 (float): The c1 parameter for the line search.
- PQN_ftol (float): The ftol parameter for the line search.
- PQN_num_iters (int): The number of iterations to use for the proximal quasi newton algorithm.

Methods:

- choose_step_and_compute_ascent_direction: A remote method. Choose the best step size and compute the next ascent direction.

Source code in fedpydeseq2/core/fed_algorithms/fed_PQN/substeps.py
class AggChooseStepComputeAscentDirection:
    """Mixin class to compute the right ascent direction.

    An ascent direction is a direction that is positively correlated to the gradient.
    This direction will be used to compute the next iterate in the proximal quasi newton
    algorithm. As our aim will be to minimize the negative log likelihood, we will
    move in the opposite direction, that is in the direction of minus the
    ascent direction.

    Attributes
    ----------
    num_jobs : int
        The number of cpus to use.
    joblib_verbosity : int
        The joblib verbosity.
    joblib_backend : str
        The backend to use for the IRLS algorithm.
    irls_batch_size : int
        The batch size to use for the IRLS algorithm.
    max_beta : float
        The maximum value for the beta parameter.
    beta_tol : float
        The tolerance for the beta parameter.
    PQN_num_iters_ls : int
        The number of iterations to use for the line search.
    PQN_c1 : float
        The c1 parameter for the line search.
    PQN_ftol : float
        The ftol parameter for the line search.
    PQN_num_iters : int
        The number of iterations to use for the proximal quasi newton algorithm.

    Methods
    -------
    choose_step_and_compute_ascent_direction
        A remote method.
        Choose the best step size and compute the next ascent direction.
    """

    num_jobs: int
    joblib_verbosity: int
    joblib_backend: str
    irls_batch_size: int
    max_beta: float
    beta_tol: float
    PQN_num_iters_ls: int
    PQN_c1: float
    PQN_ftol: float
    PQN_num_iters: int

    @remote
    @log_remote
    def choose_step_and_compute_ascent_direction(
        self, shared_states: list[dict]
    ) -> dict[str, Any]:
        """Choose best step size and compute next ascent direction.

        By "ascent direction", we mean the direction that is positively correlated
        with the gradient.

        The role of this function is twofold.

        1) It chooses the best step size for each gene, and updates the beta values
        as well as the nll values. This allows to define the next iterate.
        Note that at the first iterate, it simply computes the nll, gradient and fisher
        information at the current beta values, to define the next ascent direction.

        2) For this new iterate (or the current one if we are at the first round),
        it computes the gradient scaling matrix, which is used to scale the gradient
        in the proximal newton algorithm. From this gradient scaling matrix, and the
        gradient, it computes the ascent direction (and the newton decrement).


        Parameters
        ----------
        shared_states: list[dict]
            A list of dictionaries containing the following
            keys:
            - beta: ndarray
                The log fold changes, of shape (n_non_zero_genes, n_params).
            - local_nll: ndarray
                The local nll, of shape (n_genes,), where
                n_genes is the current number of genes that are active (True
                in the PQN_mask).
            - local_fisher: ndarray
                The local fisher matrix, of shape (n_genes, n_params, n_params).
            - local_gradient: ndarray
                The local gradient, of shape (n_genes, n_params).
            - PQN_diverged_mask: ndarray
                A boolean mask indicating if the gene has diverged in the prox newton
                algorithm, of shape (n_non_zero_genes,).
            - PQN_mask: ndarray
                A boolean mask indicating if the gene should be used for the
                proximal newton step, of shape (n_non_zero_genes,).
            - global_reg_nll: ndarray
                The global regularized nll, of shape (n_non_zero_genes,).
            - newton_decrement_on_mask: Optional[ndarray]
                The newton decrement, of shape (n_genes,).
                This is None at the first round of the prox newton algorithm.
            - round_number_PQN: int
                The current round number of the prox newton algorithm.
            - ascent_direction_on_mask: Optional[ndarray]
                The ascent direction, of shape (n_genes, n_params), where
                n_genes is the current number of genes that are active (True
                in the PQN_mask).
            - irls_diverged_mask: ndarray
                A boolean mask indicating if the gene has diverged in the IRLS
                algorithm.

        Returns
        -------
        dict[str, Any]
            A dictionary containing all the necessary info to run the method.
            If we are not at the last iteration, it contains the following fields:
            - beta: ndarray
                The log fold changes, of shape (n_non_zero_genes, n_params).
            - PQN_mask: ndarray
                A boolean mask indicating if the gene should be used for the
                proximal newton step.
                It is of shape (n_non_zero_genes,)
            - PQN_diverged_mask: ndarray
                A boolean mask indicating if the gene has diverged in the prox newton
                algorithm. It is of shape (n_non_zero_genes,)
            - ascent_direction_on_mask: np.ndarray
                The ascent direction, of shape (n_genes, n_params), where
                n_genes is the current number of genes that are active (True
                in the PQN_mask).
            - newton_decrement_on_mask: np.ndarray
                The newton decrement, of shape (n_genes,).
            - round_number_PQN: int
                The current round number of the prox newton algorithm.
            - global_reg_nll: ndarray
                The global regularized nll, of shape (n_non_zero_genes,).
            - irls_diverged_mask: ndarray
                A boolean mask indicating if the gene has diverged in the IRLS
                algorithm.
            If we are at the last iteration, it contains the following fields:
            - beta: ndarray
                The log fold changes, of shape (n_non_zero_genes, n_params).
            - PQN_diverged_mask: ndarray
                A boolean mask indicating if the gene has diverged in the prox newton
                algorithm. It is of shape (n_non_zero_genes,)
            - irls_diverged_mask: ndarray
                A boolean mask indicating if the gene has diverged in the IRLS
                algorithm.

        """
        # We use the following naming convention: when we say "on mask", we mean
        # that we restrict the quantity to the genes that are active in the
        # proximal newton algorithm. We therefore need to ensure that these
        # quantities are readjusted when we change the proximal quasi newton mask.

        # Load main params from the first state
        beta = shared_states[0]["beta"]
        PQN_diverged_mask = shared_states[0]["PQN_diverged_mask"]
        PQN_mask = shared_states[0]["PQN_mask"]
        reg_nll = shared_states[0]["global_reg_nll"]
        ascent_direction_on_mask = shared_states[0]["ascent_direction_on_mask"]
        newton_decrement_on_mask = shared_states[0]["newton_decrement_on_mask"]
        round_number_PQN = shared_states[0]["round_number_PQN"]

        reg_parameter = 1e-6

        # ---- Step 0: Aggregate the nll, gradient and fisher info ---- #

        new_fisher_options_on_mask = sum(
            [state["local_fisher"] for state in shared_states]
        )

        new_gradient_options_on_mask = sum(
            [state["local_gradient"] for state in shared_states]
        )
        new_reg_nll_options_on_mask = sum(
            [state["local_nll"] for state in shared_states]
        )

        # ---- Step 1: Add the regularization term ---- #

        # ---- Step 1a: Compute the new beta options ---- #

        # In order to regularize, we have to compute the beta values at which
        # the nll, gradient and Fisher information were evaluated in the local steps.

        beta_on_mask = beta[PQN_mask]

        if round_number_PQN == 0:
            # In this case, there is no line search, and only
            # beta is considered in the local steps.
            new_beta_options_on_mask = beta_on_mask[None, :]

        else:
            # In this case, there is a line search, and we have to
            # compute the new beta options
            assert ascent_direction_on_mask is not None
            step_sizes = 0.5 ** np.arange(self.PQN_num_iters_ls)
            new_beta_options_on_mask = np.clip(
                beta_on_mask[None, :, :]
                - step_sizes[:, None, None] * ascent_direction_on_mask[None, :, :],
                -self.max_beta,
                self.max_beta,
            )

        # ---- Step 1b: Add the regularization ---- #

        # Add a regularization term to fisher info

        if new_fisher_options_on_mask is not None:
            # Add the regularization term to construct the Fisher info with prior
            # from the Fisher info without prior
            cross_term = (
                new_gradient_options_on_mask[:, :, :, None]
                @ new_beta_options_on_mask[:, :, None, :]
            )
            beta_term = (
                new_beta_options_on_mask[:, :, :, None]
                @ new_beta_options_on_mask[:, :, None, :]
            )
            new_fisher_options_on_mask += (
                reg_parameter * (cross_term + cross_term.transpose(0, 1, 3, 2))
                + reg_parameter**2 * beta_term
            )

            # Furthermore, add a ridge term to the Fisher info for numerical
            # stability. This factor decreases log-linearly between an initial and a
            # final regularization value. The decreasing schedule ensures that the
            # first steps correspond to gradient descent steps, as we are too far
            # from the optimum to use the Fisher info.
            # Note that other schemes seem to work as well: 1 for 20 iterations then
            # 1e-6; 1 for 20 iterations then 1e-2 (to confirm); or 1 for 20
            # iterations then 1/n_samples.
            initial_reg_fisher = 1
            final_reg_fisher = 1e-6
            reg_fisher = initial_reg_fisher * (
                final_reg_fisher / initial_reg_fisher
            ) ** (round_number_PQN / self.PQN_num_iters)

            new_fisher_options_on_mask = (
                new_fisher_options_on_mask
                + np.diag(np.repeat(reg_fisher, new_fisher_options_on_mask.shape[-1]))[
                    None, None, :, :
                ]
            )

        # Add regularization term to gradient
        new_gradient_options_on_mask += reg_parameter * new_beta_options_on_mask

        # Add regularization term to the nll
        new_reg_nll_options_on_mask += (
            0.5 * reg_parameter * np.sum(new_beta_options_on_mask**2, axis=2)
        )

        # ---- Step 2: Compute best step size, and new values for this step size ---- #

        # This is only done if we are not at the first round of the prox newton
        # algorithm, as the first round serves only to evaluate the nll, gradient
        # and fisher info at the current beta values, and compute the first
        # ascent direction.

        if round_number_PQN > 0:
            # ---- Step 2a: See which step sizes pass the selection criteria ---- #

            assert reg_nll is not None
            reg_nll_on_mask = reg_nll[PQN_mask]

            obj_diff_options_on_mask = (
                reg_nll_on_mask[None, :] - new_reg_nll_options_on_mask
            )  # of shape n_steps, n_PQN_genes

            step_sizes = 0.5 ** np.arange(self.PQN_num_iters_ls)

            # Condition 1: Armijo condition
            # This condition is also called the first Wolfe condition.
            # Reference https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf
            admissible_step_size_options_mask = (
                obj_diff_options_on_mask
                >= self.PQN_c1 * step_sizes[:, None] * newton_decrement_on_mask[None, :]
            )

            # ---- Step 2b: Identify genes that have diverged, and remove them ---- #

            # For each gene, we check if there is at least one step size that satisfies
            # the selection criteria. If there is none, we consider that the gene has
            # diverged: we remove all such genes from the PQN_mask
            # and add them to the PQN_diverged_mask.

            diverged_gene_mask_in_current_PQN_mask = np.all(
                ~admissible_step_size_options_mask, axis=0
            )

            # Remove these diverged genes for which we cannot find
            # a correct step size

            PQN_diverged_mask[PQN_mask] = diverged_gene_mask_in_current_PQN_mask
            PQN_mask[PQN_mask] = ~diverged_gene_mask_in_current_PQN_mask

            # Restrict all the quantities defined on the prox newton
            # mask to the new prox newton mask

            obj_diff_options_on_mask = obj_diff_options_on_mask[
                :, ~diverged_gene_mask_in_current_PQN_mask
            ]
            reg_nll_on_mask = reg_nll_on_mask[~diverged_gene_mask_in_current_PQN_mask]
            beta_on_mask = beta_on_mask[~diverged_gene_mask_in_current_PQN_mask]
            admissible_step_size_options_mask = admissible_step_size_options_mask[
                :, ~diverged_gene_mask_in_current_PQN_mask
            ]
            new_reg_nll_options_on_mask = new_reg_nll_options_on_mask[
                :, ~diverged_gene_mask_in_current_PQN_mask
            ]
            new_gradient_options_on_mask = new_gradient_options_on_mask[
                :, ~diverged_gene_mask_in_current_PQN_mask, :
            ]
            new_beta_options_on_mask = new_beta_options_on_mask[
                :, ~diverged_gene_mask_in_current_PQN_mask, :
            ]

            new_fisher_options_on_mask = new_fisher_options_on_mask[
                :, ~diverged_gene_mask_in_current_PQN_mask, :, :
            ]

            # ---- Step 2c: Find the best step size for each gene ---- #

            # Here, we find the best step size for each gene that satisfies the
            # selection criteria (i.e. the largest).
            # We do this by finding the first index for which
            # the admissible step size mask is True.
            # We then create the new beta, gradient, fisher info and reg nll by
            # taking the option corresponding to the best step size

            new_step_size_index = np.argmax(admissible_step_size_options_mask, axis=0)
            arange_PQN = np.arange(len(new_step_size_index))

            new_beta_on_mask = new_beta_options_on_mask[new_step_size_index, arange_PQN]
            new_gradient_on_mask = new_gradient_options_on_mask[
                new_step_size_index, arange_PQN
            ]
            new_fisher_on_mask = new_fisher_options_on_mask[
                new_step_size_index, arange_PQN
            ]

            obj_diff_on_mask = obj_diff_options_on_mask[new_step_size_index, arange_PQN]

            new_reg_nll_on_mask = new_reg_nll_options_on_mask[
                new_step_size_index, arange_PQN
            ]

            # ---- Step 2d: Update the beta values and the reg_nll values ---- #

            beta[PQN_mask] = new_beta_on_mask
            reg_nll[PQN_mask] = new_reg_nll_on_mask

            # ---- Step 2e: Check for convergence of the method ---- #

            convergence_mask = (
                np.abs(obj_diff_on_mask)
                / (
                    np.maximum(
                        np.maximum(
                            np.abs(new_reg_nll_on_mask),
                            np.abs(reg_nll_on_mask),
                        ),
                        1,
                    )
                )
                < self.PQN_ftol
            )

            # ---- Step 2f: Remove converged genes from the mask ---- #
            PQN_mask[PQN_mask] = ~convergence_mask

            # If we reach the max number of iterations, we stop
            if round_number_PQN == self.PQN_num_iters:
                # In this case, we are finished.
                return {
                    "beta": beta,
                    "PQN_diverged_mask": PQN_diverged_mask | PQN_mask,
                    "irls_diverged_mask": shared_states[0]["irls_diverged_mask"],
                }

            # We restrict all quantities to the new mask

            new_gradient_on_mask = new_gradient_on_mask[~convergence_mask]
            new_beta_on_mask = new_beta_on_mask[~convergence_mask]
            new_fisher_on_mask = new_fisher_on_mask[~convergence_mask]

            # Note, this is the old beta
            beta_on_mask = beta_on_mask[~convergence_mask]

        else:
            # In this case, we are at the first round of the prox newton algorithm,
            # and we simply instantiate the new values to the first
            # values that were computed in the local steps, to be able to compute
            # the first ascent direction.
            beta_on_mask = None
            new_gradient_on_mask = new_gradient_options_on_mask[0]
            new_beta_on_mask = new_beta_options_on_mask[0]
            new_fisher_on_mask = new_fisher_options_on_mask[0]

            # Set the nll
            reg_nll[PQN_mask] = new_reg_nll_options_on_mask[0]

        # ---- Step 3: Compute the gradient scaling matrix ---- #

        gradient_scaling_matrix_on_mask = compute_gradient_scaling_matrix_fisher(
            fisher=new_fisher_on_mask,
            backend=self.joblib_backend,
            num_jobs=self.num_jobs,
            joblib_verbosity=self.joblib_verbosity,
            batch_size=self.irls_batch_size,
        )

        # ---- Step 4: Compute the ascent direction and the newton decrement ---- #

        (
            ascent_direction_on_mask,
            newton_decrement_on_mask,
        ) = compute_ascent_direction_decrement(
            gradient_scaling_matrix=gradient_scaling_matrix_on_mask,
            gradient=new_gradient_on_mask,
            beta=new_beta_on_mask,
            max_beta=self.max_beta,
        )

        round_number_PQN += 1

        return {
            "beta": beta,
            "PQN_mask": PQN_mask,
            "PQN_diverged_mask": PQN_diverged_mask,
            "ascent_direction_on_mask": ascent_direction_on_mask,
            "newton_decrement_on_mask": newton_decrement_on_mask,
            "round_number_PQN": round_number_PQN,
            "global_reg_nll": reg_nll,
            "irls_diverged_mask": shared_states[0]["irls_diverged_mask"],
        }

choose_step_and_compute_ascent_direction(shared_states)

Choose best step size and compute next ascent direction.

By "ascent direction", we mean the direction that is positively correlated with the gradient.

The role of this function is twofold.

1) It chooses the best step size for each gene, and updates the beta values as well as the nll values. This defines the next iterate. Note that at the first iteration, it simply computes the nll, gradient and Fisher information at the current beta values, to define the next ascent direction.

2) For this new iterate (or the current one if we are at the first round), it computes the gradient scaling matrix, which is used to scale the gradient in the proximal Newton algorithm. From this gradient scaling matrix and the gradient, it computes the ascent direction (and the Newton decrement).
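
The step-size selection in 1) follows the backtracking Armijo rule visible in the source below. The following condensed sketch (a hypothetical helper, vectorized over genes) restates that logic: among step sizes 0.5 ** i, keep those whose objective decrease is at least c1 * step * newton_decrement, flag genes with no admissible step as diverged, and pick the largest admissible step for the rest.

import numpy as np

# Condensed per-gene restatement of Steps 2a-2c of the source below (hypothetical helper).
def select_step_sizes(old_reg_nll, new_reg_nll_options, newton_decrement, c1, n_ls_steps):
    step_sizes = 0.5 ** np.arange(n_ls_steps)                # shape (n_steps,)
    obj_diff = old_reg_nll[None, :] - new_reg_nll_options    # shape (n_steps, n_genes)
    # Armijo (first Wolfe) condition: decrease >= c1 * step * Newton decrement.
    admissible = obj_diff >= c1 * step_sizes[:, None] * newton_decrement[None, :]
    diverged = np.all(~admissible, axis=0)      # genes with no admissible step size
    best_index = np.argmax(admissible, axis=0)  # first True, i.e. largest admissible step
    return best_index, diverged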

Parameters:

- shared_states (list[dict], required): A list of dictionaries containing the following keys:
  - beta (ndarray): The log fold changes, of shape (n_non_zero_genes, n_params).
  - local_nll (ndarray): The local nll, of shape (n_genes,), where n_genes is the current number of genes that are active (True in the PQN_mask).
  - local_fisher (ndarray): The local fisher matrix, of shape (n_genes, n_params, n_params).
  - local_gradient (ndarray): The local gradient, of shape (n_genes, n_params).
  - PQN_diverged_mask (ndarray): A boolean mask indicating if the gene has diverged in the prox newton algorithm, of shape (n_non_zero_genes,).
  - PQN_mask (ndarray): A boolean mask indicating if the gene should be used for the proximal newton step, of shape (n_non_zero_genes,).
  - global_reg_nll (ndarray): The global regularized nll, of shape (n_non_zero_genes,).
  - newton_decrement_on_mask (Optional[ndarray]): The newton decrement, of shape (n_genes,). This is None at the first round of the prox newton algorithm.
  - round_number_PQN (int): The current round number of the prox newton algorithm.
  - ascent_direction_on_mask (Optional[ndarray]): The ascent direction, of shape (n_genes, n_params), where n_genes is the current number of genes that are active (True in the PQN_mask).
  - irls_diverged_mask (ndarray): A boolean mask indicating if the gene has diverged in the IRLS algorithm.

Returns:

- dict[str, Any]: A dictionary containing all the necessary info to run the method.
  If we are not at the last iteration, it contains the following fields:
  - beta (ndarray): The log fold changes, of shape (n_non_zero_genes, n_params).
  - PQN_mask (ndarray): A boolean mask indicating if the gene should be used for the proximal newton step, of shape (n_non_zero_genes,).
  - PQN_diverged_mask (ndarray): A boolean mask indicating if the gene has diverged in the prox newton algorithm, of shape (n_non_zero_genes,).
  - ascent_direction_on_mask (np.ndarray): The ascent direction, of shape (n_genes, n_params), where n_genes is the current number of genes that are active (True in the PQN_mask).
  - newton_decrement_on_mask (np.ndarray): The newton decrement, of shape (n_genes,).
  - round_number_PQN (int): The current round number of the prox newton algorithm.
  - global_reg_nll (ndarray): The global regularized nll, of shape (n_non_zero_genes,).
  - irls_diverged_mask (ndarray): A boolean mask indicating if the gene has diverged in the IRLS algorithm.
  If we are at the last iteration, it contains the following fields:
  - beta (ndarray): The log fold changes, of shape (n_non_zero_genes, n_params).
  - PQN_diverged_mask (ndarray): A boolean mask indicating if the gene has diverged in the prox newton algorithm, of shape (n_non_zero_genes,).
  - irls_diverged_mask (ndarray): A boolean mask indicating if the gene has diverged in the IRLS algorithm.

Source code in fedpydeseq2/core/fed_algorithms/fed_PQN/substeps.py
@remote
@log_remote
def choose_step_and_compute_ascent_direction(
    self, shared_states: list[dict]
) -> dict[str, Any]:
    """Choose best step size and compute next ascent direction.

    By "ascent direction", we mean the direction that is positively correlated
    with the gradient.

    The role of this function is twofold.

    1) It chooses the best step size for each gene, and updates the beta values
    as well as the nll values. This allows to define the next iterate.
    Note that at the first iterate, it simply computes the nll, gradient and fisher
    information at the current beta values, to define the next ascent direction.

    2) For this new iterate (or the current one if we are at the first round),
    it computes the gradient scaling matrix, which is used to scale the gradient
    in the proximal newton algorithm. From this gradient scaling matrix, and the
    gradient, it computes the ascent direction (and the newton decrement).


    Parameters
    ----------
    shared_states: list[dict]
        A list of dictionaries containing the following
        keys:
        - beta: ndarray
            The log fold changes, of shape (n_non_zero_genes, n_params).
        - local_nll: ndarray
            The local nll, of shape (n_genes,), where
            n_genes is the current number of genes that are active (True
            in the PQN_mask).
        - local_fisher: ndarray
            The local fisher matrix, of shape (n_genes, n_params, n_params).
        - local_gradient: ndarray
            The local gradient, of shape (n_genes, n_params).
        - PQN_diverged_mask: ndarray
            A boolean mask indicating if the gene has diverged in the prox newton
            algorithm, of shape (n_non_zero_genes,).
        - PQN_mask: ndarray
            A boolean mask indicating if the gene should be used for the
            proximal newton step, of shape (n_non_zero_genes,).
        - global_reg_nll: ndarray
            The global regularized nll, of shape (n_non_zero_genes,).
        - newton_decrement_on_mask: Optional[ndarray]
            The newton decrement, of shape (n_genes,).
            This is None at the first round of the prox newton algorithm.
        - round_number_PQN: int
            The current round number of the prox newton algorithm.
        - ascent_direction_on_mask: Optional[ndarray]
            The ascent direction, of shape (n_genes, n_params), where
            n_genes is the current number of genes that are active (True
            in the PQN_mask).
        - irls_diverged_mask: ndarray
            A boolean mask indicating if the gene has diverged in the IRLS
            algorithm.

    Returns
    -------
    dict[str, Any]
        A dictionary containing all the necessary info to run the method.
        If we are not at the last iteration, it contains the following fields:
        - beta: ndarray
            The log fold changes, of shape (n_non_zero_genes, n_params).
        - PQN_mask: ndarray
            A boolean mask indicating if the gene should be used for the
            proximal newton step.
            It is of shape (n_non_zero_genes,)
        - PQN_diverged_mask: ndarray
            A boolean mask indicating if the gene has diverged in the prox newton
            algorithm. It is of shape (n_non_zero_genes,)
        - ascent_direction_on_mask: np.ndarray
            The ascent direction, of shape (n_genes, n_params), where
            n_genes is the current number of genes that are active (True
            in the PQN_mask).
        - newton_decrement_on_mask: np.ndarray
            The newton decrement, of shape (n_genes,).
        - round_number_PQN: int
            The current round number of the prox newton algorithm.
        - global_reg_nll: ndarray
            The global regularized nll, of shape (n_non_zero_genes,).
        - irls_diverged_mask: ndarray
            A boolean mask indicating if the gene has diverged in the IRLS
            algorithm.
        If we are at the last iteration, it contains the following fields:
        - beta: ndarray
            The log fold changes, of shape (n_non_zero_genes, n_params).
        - PQN_diverged_mask: ndarray
            A boolean mask indicating if the gene has diverged in the prox newton
            algorithm. It is of shape (n_non_zero_genes,)
        - irls_diverged_mask: ndarray
            A boolean mask indicating if the gene has diverged in the IRLS
            algorithm.

    """
    # We use the following naming convention: when we say "on mask", we mean
    # that we restrict the quantity to the genes that are active in the
    # proximal newton algorithm. We therefore need to ensure that these
    # quantities are readjusted when we change the proximal quasi newton mask.

    # Load main params from the first state
    beta = shared_states[0]["beta"]
    PQN_diverged_mask = shared_states[0]["PQN_diverged_mask"]
    PQN_mask = shared_states[0]["PQN_mask"]
    reg_nll = shared_states[0]["global_reg_nll"]
    ascent_direction_on_mask = shared_states[0]["ascent_direction_on_mask"]
    newton_decrement_on_mask = shared_states[0]["newton_decrement_on_mask"]
    round_number_PQN = shared_states[0]["round_number_PQN"]

    reg_parameter = 1e-6

    # ---- Step 0: Aggregate the nll, gradient and fisher info ---- #

    new_fisher_options_on_mask = sum(
        [state["local_fisher"] for state in shared_states]
    )

    new_gradient_options_on_mask = sum(
        [state["local_gradient"] for state in shared_states]
    )
    new_reg_nll_options_on_mask = sum(
        [state["local_nll"] for state in shared_states]
    )

    # ---- Step 1: Add the regularization term ---- #

    # ---- Step 1a: Compute the new beta options ---- #

    # In order to regularize, we have to compute the beta values at which
    # the nll, gradient and Fisher information were evaluated in the local steps.

    beta_on_mask = beta[PQN_mask]

    if round_number_PQN == 0:
        # In this case, there is no line search, and only
        # beta is considered in the local steps.
        new_beta_options_on_mask = beta_on_mask[None, :]

    else:
        # In this case, there is a line search, and we have to
        # compute the new beta options
        assert ascent_direction_on_mask is not None
        step_sizes = 0.5 ** np.arange(self.PQN_num_iters_ls)
        new_beta_options_on_mask = np.clip(
            beta_on_mask[None, :, :]
            - step_sizes[:, None, None] * ascent_direction_on_mask[None, :, :],
            -self.max_beta,
            self.max_beta,
        )

    # ---- Step 1b: Add the regularization ---- #

    # Add a regularization term to fisher info

    if new_fisher_options_on_mask is not None:
        # Add the regularization term to construct the Fisher info with prior
        # from the Fisher info without prior
        cross_term = (
            new_gradient_options_on_mask[:, :, :, None]
            @ new_beta_options_on_mask[:, :, None, :]
        )
        beta_term = (
            new_beta_options_on_mask[:, :, :, None]
            @ new_beta_options_on_mask[:, :, None, :]
        )
        new_fisher_options_on_mask += (
            reg_parameter * (cross_term + cross_term.transpose(0, 1, 3, 2))
            + reg_parameter**2 * beta_term
        )

        # Furthermore, add a ridge term to the Fisher info for numerical
        # stability. This factor decreases log-linearly between an initial and a
        # final regularization value. The decreasing schedule ensures that the
        # first steps correspond to gradient descent steps, as we are too far
        # from the optimum to use the Fisher info.
        # Note that other schemes seem to work as well: 1 for 20 iterations then
        # 1e-6; 1 for 20 iterations then 1e-2 (to confirm); or 1 for 20
        # iterations then 1/n_samples.
        initial_reg_fisher = 1
        final_reg_fisher = 1e-6
        reg_fisher = initial_reg_fisher * (
            final_reg_fisher / initial_reg_fisher
        ) ** (round_number_PQN / self.PQN_num_iters)

        new_fisher_options_on_mask = (
            new_fisher_options_on_mask
            + np.diag(np.repeat(reg_fisher, new_fisher_options_on_mask.shape[-1]))[
                None, None, :, :
            ]
        )

    # Add regularization term to gradient
    new_gradient_options_on_mask += reg_parameter * new_beta_options_on_mask

    # Add regularization term to the nll
    new_reg_nll_options_on_mask += (
        0.5 * reg_parameter * np.sum(new_beta_options_on_mask**2, axis=2)
    )

    # ---- Step 2: Compute best step size, and new values for this step size ---- #

    # This is only done if we are not at the first round of the prox newton
    # algorithm, as the first round serves only to evaluate the nll, gradient
    # and fisher info at the current beta values, and compute the first
    # ascent direction.

    if round_number_PQN > 0:
        # ---- Step 2a: See which step sizes pass the selection criteria ---- #

        assert reg_nll is not None
        reg_nll_on_mask = reg_nll[PQN_mask]

        obj_diff_options_on_mask = (
            reg_nll_on_mask[None, :] - new_reg_nll_options_on_mask
        )  # of shape n_steps, n_PQN_genes

        step_sizes = 0.5 ** np.arange(self.PQN_num_iters_ls)

        # Condition 1: Armijo condition
        # This condition is also called the first Wolfe condition.
        # Reference https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf
        admissible_step_size_options_mask = (
            obj_diff_options_on_mask
            >= self.PQN_c1 * step_sizes[:, None] * newton_decrement_on_mask[None, :]
        )

        # ---- Step 2b: Identify genes that have diverged, and remove them ---- #

        # For each gene, we check if there is at least one step size that satisfies
        # the selection criteria. If there is none, we consider that the gene has
        # diverged: we remove all such genes from the PQN_mask
        # and add them to the PQN_diverged_mask.

        diverged_gene_mask_in_current_PQN_mask = np.all(
            ~admissible_step_size_options_mask, axis=0
        )

        # Remove these diverged genes for which we cannot find
        # a correct step size

        PQN_diverged_mask[PQN_mask] = diverged_gene_mask_in_current_PQN_mask
        PQN_mask[PQN_mask] = ~diverged_gene_mask_in_current_PQN_mask

        # Restrict all the quantities defined on the prox newton
        # mask to the new prox newton mask

        obj_diff_options_on_mask = obj_diff_options_on_mask[
            :, ~diverged_gene_mask_in_current_PQN_mask
        ]
        reg_nll_on_mask = reg_nll_on_mask[~diverged_gene_mask_in_current_PQN_mask]
        beta_on_mask = beta_on_mask[~diverged_gene_mask_in_current_PQN_mask]
        admissible_step_size_options_mask = admissible_step_size_options_mask[
            :, ~diverged_gene_mask_in_current_PQN_mask
        ]
        new_reg_nll_options_on_mask = new_reg_nll_options_on_mask[
            :, ~diverged_gene_mask_in_current_PQN_mask
        ]
        new_gradient_options_on_mask = new_gradient_options_on_mask[
            :, ~diverged_gene_mask_in_current_PQN_mask, :
        ]
        new_beta_options_on_mask = new_beta_options_on_mask[
            :, ~diverged_gene_mask_in_current_PQN_mask, :
        ]

        new_fisher_options_on_mask = new_fisher_options_on_mask[
            :, ~diverged_gene_mask_in_current_PQN_mask, :, :
        ]

        # ---- Step 2c: Find the best step size for each gene ---- #

        # Here, we find the best step size for each gene that satisfies the
        # selection criteria (i.e. the largest).
        # We do this by finding the first index for which
        # the admissible step size mask is True.
        # We then create the new beta, gradient, fisher info and reg nll by
        # taking the option corresponding to the best step size

        new_step_size_index = np.argmax(admissible_step_size_options_mask, axis=0)
        arange_PQN = np.arange(len(new_step_size_index))

        new_beta_on_mask = new_beta_options_on_mask[new_step_size_index, arange_PQN]
        new_gradient_on_mask = new_gradient_options_on_mask[
            new_step_size_index, arange_PQN
        ]
        new_fisher_on_mask = new_fisher_options_on_mask[
            new_step_size_index, arange_PQN
        ]

        obj_diff_on_mask = obj_diff_options_on_mask[new_step_size_index, arange_PQN]

        new_reg_nll_on_mask = new_reg_nll_options_on_mask[
            new_step_size_index, arange_PQN
        ]

        # ---- Step 2d: Update the beta values and the reg_nll values ---- #

        beta[PQN_mask] = new_beta_on_mask
        reg_nll[PQN_mask] = new_reg_nll_on_mask

        # ---- Step 2e: Check for convergence of the method ---- #

        convergence_mask = (
            np.abs(obj_diff_on_mask)
            / (
                np.maximum(
                    np.maximum(
                        np.abs(new_reg_nll_on_mask),
                        np.abs(reg_nll_on_mask),
                    ),
                    1,
                )
            )
            < self.PQN_ftol
        )

        # ---- Step 2f: Remove converged genes from the mask ---- #
        PQN_mask[PQN_mask] = ~convergence_mask

        # If we reach the max number of iterations, we stop
        if round_number_PQN == self.PQN_num_iters:
            # In this case, we are finished.
            return {
                "beta": beta,
                "PQN_diverged_mask": PQN_diverged_mask | PQN_mask,
                "irls_diverged_mask": shared_states[0]["irls_diverged_mask"],
            }

        # We restrict all quantities to the new mask

        new_gradient_on_mask = new_gradient_on_mask[~convergence_mask]
        new_beta_on_mask = new_beta_on_mask[~convergence_mask]
        new_fisher_on_mask = new_fisher_on_mask[~convergence_mask]

        # Note, this is the old beta
        beta_on_mask = beta_on_mask[~convergence_mask]

    else:
        # In this case, we are at the first round of the prox newton algorithm,
        # and we simply instantiate the new values to the first
        # values that were computed in the local steps, to be able to compute
        # the first ascent direction.
        beta_on_mask = None
        new_gradient_on_mask = new_gradient_options_on_mask[0]
        new_beta_on_mask = new_beta_options_on_mask[0]
        new_fisher_on_mask = new_fisher_options_on_mask[0]

        # Set the nll
        reg_nll[PQN_mask] = new_reg_nll_options_on_mask[0]

    # ---- Step 3: Compute the gradient scaling matrix ---- #

    gradient_scaling_matrix_on_mask = compute_gradient_scaling_matrix_fisher(
        fisher=new_fisher_on_mask,
        backend=self.joblib_backend,
        num_jobs=self.num_jobs,
        joblib_verbosity=self.joblib_verbosity,
        batch_size=self.irls_batch_size,
    )

    # ---- Step 4: Compute the ascent direction and the newton decrement ---- #

    (
        ascent_direction_on_mask,
        newton_decrement_on_mask,
    ) = compute_ascent_direction_decrement(
        gradient_scaling_matrix=gradient_scaling_matrix_on_mask,
        gradient=new_gradient_on_mask,
        beta=new_beta_on_mask,
        max_beta=self.max_beta,
    )

    round_number_PQN += 1

    return {
        "beta": beta,
        "PQN_mask": PQN_mask,
        "PQN_diverged_mask": PQN_diverged_mask,
        "ascent_direction_on_mask": ascent_direction_on_mask,
        "newton_decrement_on_mask": newton_decrement_on_mask,
        "round_number_PQN": round_number_PQN,
        "global_reg_nll": reg_nll,
        "irls_diverged_mask": shared_states[0]["irls_diverged_mask"],
    }

LocMakeFedPQNFisherGradientNLL

Mixin to compute the local NLL values, gradient and Fisher information.

Attributes:

- local_adata (AnnData): The local AnnData.
- num_jobs (int): The number of cpus to use.
- joblib_verbosity (int): The joblib verbosity.
- joblib_backend (str): The backend to use for the IRLS algorithm.
- irls_batch_size (int): The batch size to use for the IRLS algorithm.
- max_beta (float): The maximum value for the beta parameter.
- PQN_num_iters_ls (int): The number of iterations to use for the line search.
- PQN_min_mu (float): The min_mu parameter for the Proximal Quasi Newton algorithm.

Methods:

- make_local_fisher_gradient_nll: A remote_data method. Make the local nll, gradient and fisher matrix.
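
As a rough illustration of what is computed locally, the sketch below gives the per-gene NLL, gradient and Fisher information under the assumption of a standard DESeq2-style negative binomial GLM with log link and fixed dispersion. The helper name and the min_mu flooring shown here are illustrative; the actual method is batched over genes and candidate step sizes, and the PQN_min_mu attribute presumably plays a similar flooring role.

import numpy as np

# Illustrative per-gene sketch, assuming a negative binomial GLM with log link and
# fixed dispersion `alpha` (hypothetical helper; not the batched federated method).
def local_nll_gradient_fisher(counts, design, size_factors, beta, alpha, min_mu=0.5):
    # NB mean per sample, floored for numerical stability (min_mu value is illustrative).
    mu = np.maximum(size_factors * np.exp(design @ beta), min_mu)
    # Negative log-likelihood, up to terms that do not depend on beta.
    nll = -np.sum(counts * np.log(mu) - (counts + 1.0 / alpha) * np.log(1.0 + alpha * mu))
    # Gradient of the NLL with respect to beta.
    weights = 1.0 / (1.0 + alpha * mu)
    gradient = design.T @ ((mu - counts) * weights)
    # Fisher information: X^T diag(mu / (1 + alpha * mu)) X.
    fisher = (design * (mu * weights)[:, None]).T @ design
    return nll, gradient, fisher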

Source code in fedpydeseq2/core/fed_algorithms/fed_PQN/substeps.py
class LocMakeFedPQNFisherGradientNLL:
    """Mixin to compute local values, gradient and Fisher information of the NLL.

    Attributes
    ----------
    local_adata : AnnData
        The local AnnData.
    num_jobs : int
        The number of cpus to use.
    joblib_verbosity : int
        The joblib verbosity.
    joblib_backend : str
        The backend to use for the IRLS algorithm.
    irls_batch_size : int
        The batch size to use for the IRLS algorithm.
    max_beta : float
        The maximum value for the beta parameter.
    PQN_num_iters_ls : int
        The number of iterations to use for the line search.
    PQN_min_mu : float
        The min_mu parameter for the Proximal Quasi Newton algorithm.

    Methods
    -------
    make_local_fisher_gradient_nll
        A remote_data method.
        Make the local nll, gradient and fisher matrix.

    """

    local_adata: AnnData
    refit_adata: AnnData
    num_jobs: int
    joblib_verbosity: int
    joblib_backend: str
    irls_batch_size: int
    max_beta: float
    PQN_num_iters_ls: int
    PQN_min_mu: float

    @remote_data
    @log_remote_data
    @reconstruct_adatas
    def make_local_fisher_gradient_nll(
        self,
        data_from_opener: AnnData,
        shared_state: dict[str, Any],
        first_iteration_mode: Literal["irls_catch"] | None = None,
        refit_mode: bool = False,
    ):
        r"""Make the local nll, gradient and fisher information matrix.

        Given an ascent direction :math:`d` (an ascent direction being positively
        correlated to the gradient of the starting point) and a starting point
        :math:`beta`, this function
        computes the nll, gradient and Fisher information at the points
        :math:`beta + t * d`,
        for :math:`t` in step_sizes
        (step sizes are :math:`0.5^i` for :math:`i` in :math:`0,...,19`).


        Moreover, if the iteration is the first one, the step sizes are not used,
        and instead, the nll, gradient and fisher information are computed at the
        current beta values.

        Parameters
        ----------
        data_from_opener : AnnData
            Not used.

        shared_state : dict
            A dictionary containing the following
            keys:
            - PQN_mask: ndarray
                A boolean mask indicating if the gene should be used for the
                proximal newton step.
                It is of shape (n_non_zero_genes,)
                Used, but not modified.
            - round_number_PQN: int
                The current round number of the prox newton algorithm.
                Used but not modified.
            - ascent_direction_on_mask: Optional[ndarray]
                The ascent direction, of shape (n_genes, n_params), where
                n_genes is the current number of genes that are active (True
                in the PQN_mask).
                Used but not modified.
            - beta: ndarray
                The log fold changes, of shape (n_non_zero_genes, n_params).
                Used but not modified.
            - global_reg_nll: ndarray
                The global regularized nll, of shape (n_non_zero_genes,).
                Not used and not modified.
            - newton_decrement_on_mask: Optional[ndarray]
                The newton decrement, of shape (n_genes,).
                It is None at the first round of the prox newton algorithm.
                Not used and not modified.
            - irls_diverged_mask: ndarray
                A boolean mask indicating if the gene has diverged in the IRLS
                algorithm.
                Not used and not modified.
            - PQN_diverged_mask: ndarray
                A boolean mask indicating if the gene has diverged in the prox newton
                algorithm.
                Not used and not modified.

        first_iteration_mode : Optional[Literal["irls_catch"]]
            For the first iteration, this function behaves differently. If
            first_iteration_mode is None, then we are not at the first iteration.
            If first_iteration_mode is not None, the function will expect a
            different shared state than the one described above, and will construct
            the initial shared state from it.
            If first_iteration_mode is "irls_catch", then we assume that
            we are using the PQN algorithm as a method to catch IRLS when it fails
            The function will expect a
            shared state that contains the following fields:
            - beta: ndarray
                The log fold changes, of shape (n_non_zero_genes, n_params).
            - irls_diverged_mask: ndarray
                A boolean mask indicating if the gene has diverged in the IRLS
                algorithm.
            - irls_mask : ndarray
                The mask of genes that were still active for the IRLS algorithm.

        refit_mode : bool
            Whether to run on `refit_adata`s instead of `local_adata`s.
            (default: False).

        Returns
        -------
        dict
            The state to share to the server.
            It contains the following fields:
            - beta: ndarray
                The log fold changes, of shape (n_non_zero_genes, n_params).
            - local_nll: ndarray
                The local nll, of shape (n_step_sizes, n_genes,), where
                n_genes is the current number of genes that are active (True
                in the PQN_mask). n_step_sizes is the number of step sizes
                considered, which is `PQN_num_iters_ls` if we are not at the
                first round, and 1 otherwise.
                This is created during this step.
            - local_fisher: ndarray
                The local fisher matrix,
                of shape (n_step_sizes, n_genes, n_params, n_params).
                This is created during this step.
            - local_gradient: ndarray
                The local gradient, of shape (n_step_sizes, n_genes, n_params).
                This is created during this step.
            - PQN_diverged_mask: ndarray
                A boolean mask indicating if the gene has diverged in the prox newton
                algorithm.
            - PQN_mask: ndarray
                A boolean mask indicating if the gene should be used for the
                proximal newton step, of shape (n_non_zero_genes,).
            - global_reg_nll: ndarray
                The global regularized nll, of shape (n_non_zero_genes,).
            - newton_decrement_on_mask: Optional[ndarray]
                The newton decrement, of shape (n_genes,).
                This is None at the first round of the prox newton algorithm.
            - round_number_PQN: int
                The current round number of the prox newton algorithm.
            - irls_diverged_mask: ndarray
                A boolean mask indicating if the gene has diverged in the IRLS
                algorithm.
            - ascent_direction_on_mask: Optional[ndarray]
                The ascent direction, of shape (n_genes, n_params), where
                n_genes is the current number of genes that are active (True
                in the PQN_mask).

        Raises
        ------
        ValueError
            If first_iteration_mode is neither None nor "irls_catch".
        """
        if refit_mode:
            adata = self.refit_adata
        else:
            adata = self.local_adata

        # Distinguish between the first iteration and the rest
        if first_iteration_mode is not None and first_iteration_mode == "irls_catch":
            beta = shared_state["beta"]
            irls_diverged_mask = shared_state["irls_diverged_mask"]
            irls_mask = shared_state["irls_mask"]
            PQN_mask = irls_mask | irls_diverged_mask
            irls_diverged_mask = PQN_mask.copy()
            round_number_PQN = 0
            ascent_direction_on_mask = None
            newton_decrement_on_mask = None
            PQN_diverged_mask = np.zeros_like(irls_mask, dtype=bool)
            global_reg_nll = np.nan * np.ones_like(irls_mask, dtype=float)
        elif first_iteration_mode is None:
            # If we are not at the first iteration, we use the shared state
            PQN_mask = shared_state["PQN_mask"]
            round_number_PQN = shared_state["round_number_PQN"]
            ascent_direction_on_mask = shared_state["ascent_direction_on_mask"]
            beta = shared_state["beta"]
            PQN_diverged_mask = shared_state["PQN_diverged_mask"]
            newton_decrement_on_mask = shared_state["newton_decrement_on_mask"]
            global_reg_nll = shared_state["global_reg_nll"]
            irls_diverged_mask = shared_state["irls_diverged_mask"]
        else:
            raise ValueError("first_iteration_mode should be None or 'irls_catch'")

        if round_number_PQN == 0:
            # First PQN round: initialize beta from the IRLS starting values
            # and evaluate at the current beta only (no line-search step sizes)
            beta[PQN_mask] = adata.uns["_irls_beta_init"][PQN_mask]
            step_sizes: np.ndarray | None = None

        else:
            step_sizes = 0.5 ** np.arange(self.PQN_num_iters_ls)

        # Get the quantities stored in the adata
        disp_param_name = adata.uns["_irls_disp_param_name"]

        (
            PQN_gene_names,
            design_matrix,
            size_factors,
            counts,
            dispersions,
            beta_on_mask,
        ) = get_lfc_utils_from_gene_mask_adata(
            adata,
            PQN_mask,
            disp_param_name,
            beta=beta,
        )

        # ---- Compute local nll, gradient and Fisher information ---- #

        with parallel_backend(self.joblib_backend):
            res = Parallel(n_jobs=self.num_jobs, verbose=self.joblib_verbosity)(
                delayed(make_fisher_gradient_nll_step_sizes_batch)(
                    design_matrix=design_matrix,
                    size_factors=size_factors,
                    beta=beta_on_mask[i : i + self.irls_batch_size],
                    dispersions=dispersions[i : i + self.irls_batch_size],
                    counts=counts[:, i : i + self.irls_batch_size],
                    ascent_direction=ascent_direction_on_mask[
                        i : i + self.irls_batch_size
                    ]
                    if ascent_direction_on_mask is not None
                    else None,
                    step_sizes=step_sizes,
                    beta_min=-self.max_beta,
                    beta_max=self.max_beta,
                    min_mu=self.PQN_min_mu,
                )
                for i in range(0, len(beta_on_mask), self.irls_batch_size)
            )

        n_step_sizes = len(step_sizes) if step_sizes is not None else 1
        if len(res) == 0:
            H = np.zeros((n_step_sizes, 0, beta.shape[1], beta.shape[1]))
            gradient = np.zeros((n_step_sizes, 0, beta.shape[1]))
            local_nll = np.zeros((n_step_sizes, 0))
        else:
            H = np.concatenate([r[0] for r in res], axis=1)
            gradient = np.concatenate([r[1] for r in res], axis=1)
            local_nll = np.concatenate([r[2] for r in res], axis=1)

        # Create the shared state
        return {
            "beta": beta,
            "local_nll": local_nll,
            "local_fisher": H,
            "local_gradient": gradient,
            "PQN_diverged_mask": PQN_diverged_mask,
            "PQN_mask": PQN_mask,
            "global_reg_nll": global_reg_nll,
            "newton_decrement_on_mask": newton_decrement_on_mask,
            "round_number_PQN": round_number_PQN,
            "irls_diverged_mask": irls_diverged_mask,
            "ascent_direction_on_mask": ascent_direction_on_mask,
        }

make_local_fisher_gradient_nll(data_from_opener, shared_state, first_iteration_mode=None, refit_mode=False)

Make the local nll, gradient and fisher information matrix.

Given an ascent direction d (an ascent direction being positively correlated with the gradient at the starting point) and a starting point beta, this function computes the nll, gradient and Fisher information at the points beta - t * d, for t in step_sizes (the step sizes being 0.5^i for i in range(PQN_num_iters_ls)).

Moreover, if the iteration is the first one, the step sizes are not used, and instead, the nll, gradient and fisher information are computed at the current beta values.
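To make the line-search grid concrete, here is a minimal NumPy sketch (toy dimensions and values, not the library's code) of how the candidate points are built from beta, the ascent direction and the step sizes, mirroring the clipping to the box [-max_beta, max_beta] done in the source below:

import numpy as np

# Toy dimensions and values (hypothetical, for illustration only)
n_genes, n_params = 3, 2
PQN_num_iters_ls = 20
max_beta = 30.0

rng = np.random.default_rng(0)
beta = rng.normal(size=(n_genes, n_params))
ascent_direction = rng.normal(size=(n_genes, n_params))

# One step size 0.5**i per line-search iteration
step_sizes = 0.5 ** np.arange(PQN_num_iters_ls)

# Candidate points beta - t * d, clipped to the box constraints
beta_grid = np.clip(
    beta[None, :, :] - step_sizes[:, None, None] * ascent_direction[None, :, :],
    -max_beta,
    max_beta,
)
print(beta_grid.shape)  # (20, 3, 2): (n_step_sizes, n_genes, n_params)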

Parameters:

Name Type Description Default
data_from_opener AnnData

Not used.

required
shared_state dict

A dictionary containing the following keys:

- PQN_mask: ndarray
    A boolean mask indicating if the gene should be used for the proximal
    newton step. It is of shape (n_non_zero_genes,). Used, but not modified.
- round_number_PQN: int
    The current round number of the prox newton algorithm. Used but not modified.
- ascent_direction_on_mask: Optional[ndarray]
    The ascent direction, of shape (n_genes, n_params), where n_genes is the
    current number of genes that are active (True in the PQN_mask).
    Used but not modified.
- beta: ndarray
    The log fold changes, of shape (n_non_zero_genes, n_params).
    Used but not modified.
- global_reg_nll: ndarray
    The global regularized nll, of shape (n_non_zero_genes,).
    Not used and not modified.
- newton_decrement_on_mask: Optional[ndarray]
    The newton decrement, of shape (n_genes,). It is None at the first round
    of the prox newton algorithm. Not used and not modified.
- irls_diverged_mask: ndarray
    A boolean mask indicating if the gene has diverged in the IRLS algorithm.
    Not used and not modified.
- PQN_diverged_mask: ndarray
    A boolean mask indicating if the gene has diverged in the prox newton
    algorithm. Not used and not modified.

required
first_iteration_mode Optional[Literal['irls_catch']]

For the first iteration, this function behaves differently. If first_iteration_mode is None, then we are not at the first iteration. If first_iteration_mode is not None, the function will expect a different shared state than the one described above, and will construct the initial shared state from it. If first_iteration_mode is "irls_catch", then we assume that we are using the PQN algorithm as a method to catch IRLS when it fails. The function will then expect a shared state that contains the following fields:

- beta: ndarray
    The log fold changes, of shape (n_non_zero_genes, n_params).
- irls_diverged_mask: ndarray
    A boolean mask indicating if the gene has diverged in the IRLS algorithm.
- irls_mask: ndarray
    The mask of genes that were still active for the IRLS algorithm.

None
refit_mode bool

Whether to run on refit_adatas instead of local_adatas. (default: False).

False

Returns:

Type Description
dict

The state to share to the server. It contains the following fields:

- beta: ndarray
    The log fold changes, of shape (n_non_zero_genes, n_params).
- local_nll: ndarray
    The local nll, of shape (n_step_sizes, n_genes), where n_genes is the
    current number of genes that are active (True in the PQN_mask).
    n_step_sizes is the number of step sizes considered, which is
    PQN_num_iters_ls if we are not at the first round, and 1 otherwise.
    This is created during this step.
- local_fisher: ndarray
    The local fisher matrix, of shape (n_step_sizes, n_genes, n_params, n_params).
    This is created during this step.
- local_gradient: ndarray
    The local gradient, of shape (n_step_sizes, n_genes, n_params).
    This is created during this step.
- PQN_diverged_mask: ndarray
    A boolean mask indicating if the gene has diverged in the prox newton algorithm.
- PQN_mask: ndarray
    A boolean mask indicating if the gene should be used for the proximal newton
    step, of shape (n_non_zero_genes,).
- global_reg_nll: ndarray
    The global regularized nll, of shape (n_non_zero_genes,).
- newton_decrement_on_mask: Optional[ndarray]
    The newton decrement, of shape (n_genes,). This is None at the first round
    of the prox newton algorithm.
- round_number_PQN: int
    The current round number of the prox newton algorithm.
- irls_diverged_mask: ndarray
    A boolean mask indicating if the gene has diverged in the IRLS algorithm.
- ascent_direction_on_mask: Optional[ndarray]
    The ascent direction, of shape (n_genes, n_params), where n_genes is the
    current number of genes that are active (True in the PQN_mask).

Raises:

Type Description
ValueError

If first_iteration_mode is neither None nor "irls_catch".

Source code in fedpydeseq2/core/fed_algorithms/fed_PQN/substeps.py
@remote_data
@log_remote_data
@reconstruct_adatas
def make_local_fisher_gradient_nll(
    self,
    data_from_opener: AnnData,
    shared_state: dict[str, Any],
    first_iteration_mode: Literal["irls_catch"] | None = None,
    refit_mode: bool = False,
):
    r"""Make the local nll, gradient and fisher information matrix.

    Given an ascent direction :math:`d` (an ascent direction being positively
    correlated with the gradient at the starting point) and a starting point
    :math:`beta`, this function
    computes the nll, gradient and Fisher information at the points
    :math:`beta - t * d`,
    for :math:`t` in step_sizes
    (the step sizes being :math:`0.5^i` for :math:`i` in `range(PQN_num_iters_ls)`).


    Moreover, if the iteration is the first one, the step sizes are not used,
    and instead, the nll, gradient and fisher information are computed at the
    current beta values.

    Parameters
    ----------
    data_from_opener : AnnData
        Not used.

    shared_state : dict
        A dictionary containing the following
        keys:
        - PQN_mask: ndarray
            A boolean mask indicating if the gene should be used for the
            proximal newton step.
            It is of shape (n_non_zero_genes,)
            Used, but not modified.
        - round_number_PQN: int
            The current round number of the prox newton algorithm.
            Used but not modified.
        - ascent_direction_on_mask: Optional[ndarray]
            The ascent direction, of shape (n_genes, n_params), where
            n_genes is the current number of genes that are active (True
            in the PQN_mask).
            Used but not modified.
        - beta: ndarray
            The log fold changes, of shape (n_non_zero_genes, n_params).
            Used but not modified.
        - global_reg_nll: ndarray
            The global regularized nll, of shape (n_non_zero_genes,).
            Not used and not modified.
        - newton_decrement_on_mask: Optional[ndarray]
            The newton decrement, of shape (n_genes,).
            It is None at the first round of the prox newton algorithm.
            Not used and not modified.
        - irls_diverged_mask: ndarray
            A boolean mask indicating if the gene has diverged in the IRLS
            algorithm.
            Not used and not modified.
        - PQN_diverged_mask: ndarray
            A boolean mask indicating if the gene has diverged in the prox newton
            algorithm.
            Not used and not modified.

    first_iteration_mode : Optional[Literal["irls_catch"]]
        For the first iteration, this function behaves differently. If
        first_iteration_mode is None, then we are not at the first iteration.
        If first_iteration_mode is not None, the function will expect a
        different shared state than the one described above, and will construct
        the initial shared state from it.
        If first_iteration_mode is "irls_catch", then we assume that
        we are using the PQN algorithm as a method to catch IRLS when it fails
        The function will expect a
        shared state that contains the following fields:
        - beta: ndarray
            The log fold changes, of shape (n_non_zero_genes, n_params).
        - irls_diverged_mask: ndarray
            A boolean mask indicating if the gene has diverged in the IRLS
            algorithm.
        - irls_mask : ndarray
            The mask of genes that were still active for the IRLS algorithm.

    refit_mode : bool
        Whether to run on `refit_adata`s instead of `local_adata`s.
        (default: False).

    Returns
    -------
    dict
        The state to share to the server.
        It contains the following fields:
        - beta: ndarray
            The log fold changes, of shape (n_non_zero_genes, n_params).
        - local_nll: ndarray
            The local nll, of shape (n_step_sizes, n_genes,), where
            n_genes is the current number of genes that are active (True
            in the PQN_mask). n_step_sizes is the number of step sizes
            considered, which is `PQN_num_iters_ls` if we are not at the
            first round, and 1 otherwise.
            This is created during this step.
        - local_fisher: ndarray
            The local fisher matrix,
            of shape (n_step_sizes, n_genes, n_params, n_params).
            This is created during this step.
        - local_gradient: ndarray
            The local gradient, of shape (n_step_sizes, n_genes, n_params).
            This is created during this step.
        - PQN_diverged_mask: ndarray
            A boolean mask indicating if the gene has diverged in the prox newton
            algorithm.
        - PQN_mask: ndarray
            A boolean mask indicating if the gene should be used for the
            proximal newton step, of shape (n_non_zero_genes,).
        - global_reg_nll: ndarray
            The global regularized nll, of shape (n_non_zero_genes,).
        - newton_decrement_on_mask: Optional[ndarray]
            The newton decrement, of shape (n_genes,).
            This is None at the first round of the prox newton algorithm.
        - round_number_PQN: int
            The current round number of the prox newton algorithm.
        - irls_diverged_mask: ndarray
            A boolean mask indicating if the gene has diverged in the IRLS
            algorithm.
        - ascent_direction_on_mask: Optional[ndarray]
            The ascent direction, of shape (n_genes, n_params), where
            n_genes is the current number of genes that are active (True
            in the PQN_mask).

    Raises
    ------
    ValueError
        If first_iteration_mode is neither None nor "irls_catch".
    """
    if refit_mode:
        adata = self.refit_adata
    else:
        adata = self.local_adata

    # Distinguish between the first iteration and the rest
    if first_iteration_mode is not None and first_iteration_mode == "irls_catch":
        beta = shared_state["beta"]
        irls_diverged_mask = shared_state["irls_diverged_mask"]
        irls_mask = shared_state["irls_mask"]
        PQN_mask = irls_mask | irls_diverged_mask
        irls_diverged_mask = PQN_mask.copy()
        round_number_PQN = 0
        ascent_direction_on_mask = None
        newton_decrement_on_mask = None
        PQN_diverged_mask = np.zeros_like(irls_mask, dtype=bool)
        global_reg_nll = np.nan * np.ones_like(irls_mask, dtype=float)
    elif first_iteration_mode is None:
        # If we are not at the first iteration, we use the shared state
        PQN_mask = shared_state["PQN_mask"]
        round_number_PQN = shared_state["round_number_PQN"]
        ascent_direction_on_mask = shared_state["ascent_direction_on_mask"]
        beta = shared_state["beta"]
        PQN_diverged_mask = shared_state["PQN_diverged_mask"]
        newton_decrement_on_mask = shared_state["newton_decrement_on_mask"]
        global_reg_nll = shared_state["global_reg_nll"]
        irls_diverged_mask = shared_state["irls_diverged_mask"]
    else:
        raise ValueError("first_iteration_mode should be None or 'irls_catch'")

    if round_number_PQN == 0:
        # First PQN round: initialize beta from the IRLS starting values
        # and evaluate at the current beta only (no line-search step sizes)
        beta[PQN_mask] = adata.uns["_irls_beta_init"][PQN_mask]
        step_sizes: np.ndarray | None = None

    else:
        step_sizes = 0.5 ** np.arange(self.PQN_num_iters_ls)

    # Get the quantities stored in the adata
    disp_param_name = adata.uns["_irls_disp_param_name"]

    (
        PQN_gene_names,
        design_matrix,
        size_factors,
        counts,
        dispersions,
        beta_on_mask,
    ) = get_lfc_utils_from_gene_mask_adata(
        adata,
        PQN_mask,
        disp_param_name,
        beta=beta,
    )

    # ---- Compute local nll, gradient and Fisher information ---- #

    with parallel_backend(self.joblib_backend):
        res = Parallel(n_jobs=self.num_jobs, verbose=self.joblib_verbosity)(
            delayed(make_fisher_gradient_nll_step_sizes_batch)(
                design_matrix=design_matrix,
                size_factors=size_factors,
                beta=beta_on_mask[i : i + self.irls_batch_size],
                dispersions=dispersions[i : i + self.irls_batch_size],
                counts=counts[:, i : i + self.irls_batch_size],
                ascent_direction=ascent_direction_on_mask[
                    i : i + self.irls_batch_size
                ]
                if ascent_direction_on_mask is not None
                else None,
                step_sizes=step_sizes,
                beta_min=-self.max_beta,
                beta_max=self.max_beta,
                min_mu=self.PQN_min_mu,
            )
            for i in range(0, len(beta_on_mask), self.irls_batch_size)
        )

    n_step_sizes = len(step_sizes) if step_sizes is not None else 1
    if len(res) == 0:
        H = np.zeros((n_step_sizes, 0, beta.shape[1], beta.shape[1]))
        gradient = np.zeros((n_step_sizes, 0, beta.shape[1]))
        local_nll = np.zeros((n_step_sizes, 0))
    else:
        H = np.concatenate([r[0] for r in res], axis=1)
        gradient = np.concatenate([r[1] for r in res], axis=1)
        local_nll = np.concatenate([r[2] for r in res], axis=1)

    # Create the shared state
    return {
        "beta": beta,
        "local_nll": local_nll,
        "local_fisher": H,
        "local_gradient": gradient,
        "PQN_diverged_mask": PQN_diverged_mask,
        "PQN_mask": PQN_mask,
        "global_reg_nll": global_reg_nll,
        "newton_decrement_on_mask": newton_decrement_on_mask,
        "round_number_PQN": round_number_PQN,
        "irls_diverged_mask": irls_diverged_mask,
        "ascent_direction_on_mask": ascent_direction_on_mask,
    }
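For orientation, the shared state expected when first_iteration_mode="irls_catch" can be pictured with toy arrays as below (a hypothetical sketch of the input contract, not output produced by the library); the PQN step then operates on the union of the two masks, as in the source above:

import numpy as np

n_non_zero_genes, n_params = 5, 2

# Toy stand-in for the shared state handed over by the IRLS step
irls_catch_shared_state = {
    # Current log fold changes for all non-zero genes
    "beta": np.zeros((n_non_zero_genes, n_params)),
    # Genes on which IRLS diverged
    "irls_diverged_mask": np.array([False, True, False, False, True]),
    # Genes still active (not converged) when IRLS stopped
    "irls_mask": np.array([True, False, False, True, False]),
}

# Genes handled by the proximal quasi-Newton step
PQN_mask = (
    irls_catch_shared_state["irls_mask"]
    | irls_catch_shared_state["irls_diverged_mask"]
)
print(PQN_mask)  # [ True  True False  True  True]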

utils

Utility functions for the proximal Newton optimization.

This optimization is used as a fallback to catch genes for which the IRLS algorithm fails to converge.

compute_ascent_direction_decrement(gradient_scaling_matrix, gradient, beta, max_beta)

Compute the ascent direction and decrement.

The ascent direction and Newton decrement are computed from the gradient scaling matrix, the gradient, beta, and max_beta, which encodes the box constraints.

See this paper for the precise equations referenced below: https://www.cs.utexas.edu/~inderjit/public_papers/pqnj_sisc10.pdf

By ascent direction, we mean that the direction we compute is positively correlated with the gradient. As our aim is to minimize the function, we move in the direction opposite to the ascent direction; computing the ascent direction is simply a convention that avoids sign errors.

Parameters:

Name Type Description Default
gradient_scaling_matrix ndarray

The gradient scaling matrix, of shape (n_genes, n_params, n_params).

required
gradient ndarray

The gradient per gene, of shape (n_genes, n_params).

required
beta ndarray

Beta on those genes, of shape (n_genes, n_params).

required
max_beta float

The max absolute value for beta.

required

Returns:

Name Type Description
ascent_direction ndarray

The new ascent direction, of shape (n_genes, n_params).

newton_decrement ndarray

The newton decrement associated with these ascent directions, of shape (n_genes,).

Source code in fedpydeseq2/core/fed_algorithms/fed_PQN/utils.py
def compute_ascent_direction_decrement(
    gradient_scaling_matrix: np.ndarray,
    gradient: np.ndarray,
    beta: np.ndarray,
    max_beta: float,
):
    """Compute the ascent direction and decrement.

    The ascent direction and Newton decrement are computed from the gradient
    scaling matrix, the gradient, beta, and max_beta, which encodes the box
    constraints.

    See this paper for the precise equations referenced below:
    https://www.cs.utexas.edu/~inderjit/public_papers/pqnj_sisc10.pdf

    By ascent direction, we mean that the direction we compute is positively
    correlated with the gradient. As our aim is to minimize the function, we
    move in the direction opposite to the ascent direction; computing the
    ascent direction is simply a convention that avoids sign errors.

    Parameters
    ----------
    gradient_scaling_matrix : np.ndarray
        The gradient scaling matrix, of shape (n_genes, n_params, n_params).
    gradient : np.ndarray
        The gradient per gene, of shape (n_genes, n_params).
    beta : np.ndarray
        Beta on those genes, of shape (n_genes, n_params).
    max_beta : float
        The max absolute value for beta.

    Returns
    -------
    ascent_direction : np.ndarray
        The new ascent direction, of shape (n_genes, n_params).
    newton_decrement : np.ndarray
        The newton decrement associated with these ascent directions,
        of shape (n_genes,).

    """
    # ---- Step 1: compute first index set ---- #
    # See https://www.cs.utexas.edu/~inderjit/public_papers/pqnj_sisc10.pdf
    # equation 2.2

    lower_binding = (beta < -max_beta + 1e-14) & (gradient > 0)
    upper_binding = (beta > max_beta - 1e-14) & (gradient < 0)
    first_index_mask = lower_binding | upper_binding  # of shape (n_genes, n_params)

    # Set to zero the gradient scaling matrix on the first index

    n_params = beta.shape[1]

    gradient_scaling_matrix[
        np.repeat(first_index_mask[:, :, None], repeats=n_params, axis=2)
    ] = 0
    gradient_scaling_matrix[
        np.repeat(first_index_mask[:, None, :], repeats=n_params, axis=1)
    ] = 0

    ascent_direction = (gradient_scaling_matrix @ gradient[:, :, None]).squeeze(
        axis=2
    )  # of shape (n_genes, n_params)

    # ---- Step 2: Compute the second index set ---- #
    # See https://www.cs.utexas.edu/~inderjit/public_papers/pqnj_sisc10.pdf
    # equation 2.3

    lower_binding = (beta < -max_beta + 1e-14) & (ascent_direction > 0)
    upper_binding = (beta > max_beta - 1e-14) & (ascent_direction < 0)
    second_index_mask = lower_binding | upper_binding

    # Set to zero the gradient scaling matrix on the second index

    gradient_scaling_matrix[
        np.repeat(second_index_mask[:, :, None], repeats=n_params, axis=2)
    ] = 0
    gradient_scaling_matrix[
        np.repeat(second_index_mask[:, None, :], repeats=n_params, axis=1)
    ] = 0

    # ---- Step 3: Compute the ascent direction and Newton decrement ---- #

    ascent_direction = gradient_scaling_matrix @ gradient[:, :, None]
    newton_decrement = (gradient[:, None, :] @ ascent_direction).squeeze(axis=(1, 2))

    ascent_direction = ascent_direction.squeeze(axis=2)

    return ascent_direction, newton_decrement
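A small usage sketch of this function on toy inputs (the import path is assumed from the source location above, and the values are made up); with identity gradient scaling matrices and no binding box constraints, the ascent direction reduces to the gradient itself:

import numpy as np
# Import path assumed from the source file shown above
from fedpydeseq2.core.fed_algorithms.fed_PQN.utils import (
    compute_ascent_direction_decrement,
)

n_genes, n_params = 4, 2
rng = np.random.default_rng(0)

# Identity scaling matrices: the ascent direction is then the (projected) gradient
gradient_scaling_matrix = np.tile(np.eye(n_params), (n_genes, 1, 1))
gradient = rng.normal(size=(n_genes, n_params))
beta = rng.uniform(-1.0, 1.0, size=(n_genes, n_params))  # far from the box bounds

ascent_direction, newton_decrement = compute_ascent_direction_decrement(
    gradient_scaling_matrix=gradient_scaling_matrix,
    gradient=gradient,
    beta=beta,
    max_beta=30.0,  # toy value
)
print(ascent_direction.shape, newton_decrement.shape)  # (4, 2) (4,)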

compute_gradient_scaling_matrix_fisher(fisher, backend, num_jobs, joblib_verbosity, batch_size)

Compute the gradient scaling matrix using the Fisher information.

In this case, we simply invert the provided Fisher matrix to get the gradient scaling matrix.

Parameters:

Name Type Description Default
fisher ndarray

The Fisher matrix, of shape (n_genes, n_params, n_params)

required
backend str

The backend to use for parallelization

required
num_jobs int

The number of cpus to use

required
joblib_verbosity int

The verbosity level of joblib

required
batch_size int

The batch size to use for the computation

required

Returns:

Type Description
ndarray

The gradient scaling matrix, of shape (n_genes, n_params, n_params)

Source code in fedpydeseq2/core/fed_algorithms/fed_PQN/utils.py
def compute_gradient_scaling_matrix_fisher(
    fisher: np.ndarray,
    backend: str,
    num_jobs: int,
    joblib_verbosity: int,
    batch_size: int,
):
    """Compute the gradient scaling matrix using the Fisher information.

    In this case, we simply invert the provided Fisher matrix to get the gradient
    scaling matrix.

    Parameters
    ----------
    fisher : ndarray
        The Fisher matrix, of shape (n_genes, n_params, n_params)
    backend : str
        The backend to use for parallelization
    num_jobs : int
        The number of cpus to use
    joblib_verbosity : int
        The verbosity level of joblib
    batch_size : int
        The batch size to use for the computation

    Returns
    -------
    ndarray
        The gradient scaling matrix, of shape (n_genes, n_params, n_params)
    """
    with parallel_backend(backend):
        res = Parallel(n_jobs=num_jobs, verbose=joblib_verbosity)(
            delayed(np.linalg.inv)(
                fisher[i : i + batch_size],
            )
            for i in range(0, len(fisher), batch_size)
        )
    if len(res) > 0:
        gradient_scaling_matrix = np.concatenate(res)
    else:
        gradient_scaling_matrix = np.zeros_like(fisher)

    return gradient_scaling_matrix
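As a usage sketch (toy symmetric positive definite matrices, an assumed import path, and the standard joblib "loky" backend), the result is, up to batching, a batched matrix inverse:

import numpy as np
from fedpydeseq2.core.fed_algorithms.fed_PQN.utils import (
    compute_gradient_scaling_matrix_fisher,
)

n_genes, n_params = 6, 2
rng = np.random.default_rng(0)

# Toy symmetric positive definite "Fisher" matrices
A = rng.normal(size=(n_genes, n_params, n_params))
fisher = A @ A.transpose(0, 2, 1) + np.eye(n_params)[None, :, :]

scaling = compute_gradient_scaling_matrix_fisher(
    fisher=fisher,
    backend="loky",  # standard joblib backend
    num_jobs=2,
    joblib_verbosity=0,
    batch_size=4,
)

# Up to numerical precision, this equals a direct batched inverse
assert np.allclose(scaling, np.linalg.inv(fisher))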

make_fisher_gradient_nll_step_sizes_batch(design_matrix, size_factors, beta, dispersions, counts, ascent_direction, step_sizes, beta_min, beta_max, min_mu=0.5)

Make local gradient, fisher matrix, and nll for multiple steps.

Parameters:

Name Type Description Default
design_matrix ndarray

The design matrix, of shape (n_obs, n_params).

required
size_factors ndarray

The size factors, of shape (n_obs).

required
beta ndarray

The log fold change matrix, of shape (batch_size, n_params).

required
dispersions ndarray

The dispersions, of shape (batch_size).

required
counts ndarray

The counts, of shape (n_obs, batch_size).

required
ascent_direction ndarray

The ascent direction, of shape (batch_size, n_params).

required
step_sizes ndarray | None

A list of step sizes to evaluate, of size (n_steps, ).

required
beta_min float | None

The minimum value tolerated for beta.

required
beta_max float | None

The maximum value tolerated for beta.

required
min_mu float

Lower bound on estimated means, to ensure numerical stability.

0.5

Returns:

Name Type Description
H ndarray

The Fisher information matrix, of shape (n_steps, batch_size, n_params, n_params).

gradient ndarray

The gradient, of shape (n_steps, batch_size, n_params).

nll ndarray

The nll evaluations on all steps, of size (n_steps, batch_size).

Source code in fedpydeseq2/core/fed_algorithms/fed_PQN/utils.py
def make_fisher_gradient_nll_step_sizes_batch(
    design_matrix: np.ndarray,
    size_factors: np.ndarray,
    beta: np.ndarray,
    dispersions: np.ndarray,
    counts: np.ndarray,
    ascent_direction: np.ndarray | None,
    step_sizes: np.ndarray | None,
    beta_min: float | None,
    beta_max: float | None,
    min_mu: float = 0.5,
) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Make local gradient, fisher matrix, and nll for multiple steps.

    Parameters
    ----------
    design_matrix : ndarray
        The design matrix, of shape (n_obs, n_params).
    size_factors : ndarray
        The size factors, of shape (n_obs).
    beta : ndarray
        The log fold change matrix, of shape (batch_size, n_params).
    dispersions : ndarray
        The dispersions, of shape (batch_size).
    counts : ndarray
        The counts, of shape (n_obs, batch_size).
    ascent_direction : np.ndarray or None
        The ascent direction, of shape (batch_size, n_params).
        None at the first iteration, when no ascent direction is available.
    step_sizes : np.ndarray or None
        The step sizes to evaluate, of size (n_steps,).
        None at the first iteration.
    beta_min : float or None
        The minimum value tolerated for beta.
    beta_max : float or None
        The maximum value tolerated for beta.
    min_mu : float
        Lower bound on estimated means, to ensure numerical stability.

    Returns
    -------
    H : ndarray
        The Fisher information matrix, of shape
        (n_steps, batch_size, n_params, n_params).
    gradient : ndarray
        The gradient, of shape (n_steps, batch_size, n_params).
    nll : ndarray
        The nll evaluations on all steps, of size (n_steps, batch_size).
    """
    # If no ascent direction is provided, we do not need to compute the grid
    # of beta values, but only the current beta value, where we unsqueeze the
    # first dimension to make it compatible with the rest of the code
    # This is the case when we are at the first iteration of the optimization
    if ascent_direction is None and step_sizes is None:
        beta_grid = np.clip(
            beta[None, :, :],
            beta_min,
            beta_max,
        )  # of shape (n_steps, batch_size, n_params)

    # In this case, we compute the grid of beta values, by moving in the direction
    # of the ascent direction, by the step sizes
    else:
        assert isinstance(step_sizes, np.ndarray) and isinstance(
            ascent_direction, np.ndarray
        )
        beta_grid = np.clip(
            beta[None, :, :] - step_sizes[:, None, None] * ascent_direction[None, :, :],
            beta_min,
            beta_max,
        )  # of shape (n_steps, batch_size, n_params)

    mu_grid = size_factors[None, None, :] * np.exp(
        (design_matrix[None, None, :, :] @ beta_grid[:, :, :, None]).squeeze(axis=3)
    )  # of shape (n_steps, batch_size, n_obs)
    mu_grid = np.maximum(
        mu_grid,
        min_mu,
    )

    # --- Step 1: Compute the gradient ----#

    gradient_term_1 = -(design_matrix.T @ counts).T[
        None, :, :
    ]  # shape (1, batch_size, n_params)
    gradient_term_2 = (
        design_matrix.T[None, None, :, :]
        @ (
            (1 / dispersions[None, :, None] + counts.T[None, :, :])
            * mu_grid
            / (1 / dispersions[None, :, None] + mu_grid)  # n_steps, batch_size, n_obs
        )[:, :, :, None]
    ).squeeze(
        3
    )  # Shape n_steps, batch_size, n_params
    gradient = gradient_term_1 + gradient_term_2

    # ---- Step 2: Compute the Fisher matrix  ----#

    W = mu_grid / (1.0 + mu_grid * dispersions[None, :, None])
    expanded_design = design_matrix[
        None, None, :, :
    ]  # of shape (1, 1, n_obs, n_params)
    assert W is not None
    H = (expanded_design * W[:, :, :, None]).transpose(0, 1, 3, 2) @ expanded_design
    # H of size (n_steps, batch_size, n_params, n_params)

    # Get the mu_grid
    nll = mu_grid_nb_nll(counts, mu_grid, dispersions)

    return H, gradient, nll
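As a reading aid, and consistent with the code above (which additionally floors mu at min_mu and clips beta to [beta_min, beta_max]), for counts y, design matrix X, size factors s, dispersion alpha and mean mu = s * exp(X beta), the per-gene quantities evaluated at each candidate beta are:

\nabla_{\beta}\,\mathrm{nll}(\beta)
    = X^\top \left( \frac{\mu\,(1/\alpha + y)}{1/\alpha + \mu} - y \right),
\qquad
H(\beta) = X^\top \operatorname{diag}(W)\, X,
\qquad
W = \frac{\mu}{1 + \alpha\,\mu}.

The first display matches the sum of gradient_term_1 and gradient_term_2 in the code, and the second matches the construction of H from the weights W.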