Multi-Antenna Interference Management for Coded Caching

A multi-antenna broadcast channel scenario is considered where a base station delivers contents to cache-enabled user terminals. A joint design of coded caching (CC) and multigroup multicast beamforming is proposed to benefit from spatial multiplexing gain, improved interference management and the global CC gain, simultaneously. The developed general content delivery strategies utilize the multiantenna multicasting opportunities provided by the CC technique while optimally balancing the detrimental impact of both noise and inter-stream interference from coded messages transmitted in parallel. Flexible resource allocation schemes for CC are introduced where the multicast beamformer design and the receiver complexity are controlled by varying the size of the subset of users served during a given time interval, and the overlap among the multicast messages transmitted in parallel, indicated by parameters $\alpha$ and $\beta$, respectively. Degrees of freedom (DoF) analysis is provided showing that the DoF only depends on $\alpha$ while it is independent of $\beta$. The proposed schemes are shown to provide the same degrees of freedom at high signal-to-noise ratio (SNR) as the state-of-the-art methods and, in general, to perform significantly better, especially in the finite SNR regime, than several baseline schemes.


I. INTRODUCTION
Video delivery will be responsible for about 80 percent of the mobile traffic by 2021 according to the Cisco traffic forecast report [1], which draws attention to content caching technology as a key element of next-generation networks [2]. Content caching involves prefetching the most popular contents at the network edge during low-congestion hours, mitigating network overcrowding when the actual user requests arrive. This idea has been widely investigated in various wireless network scenarios such as using cache-enabled helpers [3], device-to-device collaboration [4], [5], small cell networks [2], multi-hop networks [6], and Cooperative Multi-Point (CoMP) [7].
While the above works clearly demonstrate the benefits of caching in wireless networks, the pioneering work of [8] considers an information-theoretic framework for the caching problem, through which a novel coded caching (CC) scheme is proposed. In the coded caching scheme the idea is that, instead of simply replicating high-popularity contents near or at the end users (in the cache content placement phase), one should spread different contents over different caches. This way, in the content delivery phase, common coded messages can be broadcast to users with different demands, benefiting all of them and resulting in substantial gains in large networks. This global caching gain relies on the observation that, in almost all communication scenarios, broadcasting is much simpler than unicasting. Also, as proved later, the performance of this CC scheme is optimal under the assumption of uncoded prefetching, i.e., when coding is allowed only in the delivery phase [9]. Follow-up works extend the coded caching scheme proposed in [8] to other setups such as online coded caching [10], hierarchical coded caching [11], and multi-server scenarios [12]. All these works suggest that the same kind of CC gain is achievable under various network models.
In order to examine the CC approach in wireless networks, the specific characteristics of the wireless medium (such as its broadcast nature, fading, and interference) must be investigated before the original idea of [8] can be implemented in mobile delivery scenarios. To this end, in this paper, we investigate the potential of applying CC to a single-cell multiple-input single-output (MISO) broadcast channel (BC). In such a scenario, a multi-antenna base station (BS) transmitter, which has access to the content library, satisfies the content requests of single-antenna users (mobile devices) via a shared wireless medium. The users are cache-enabled, and thus, before the delivery phase begins, they have cached relevant data from the library during off-peak hours. We focus on a joint design of the beamforming scheme used at the BS and the CC design of multicast messages such that the achievable delivery rate is maximized in the finite SNR regime. The main goal of our paper is to employ the multiple antennas at the transmitter to manage the interaction between noise and interference between coded messages (i.e., inter-stream interference) and at the same time to benefit from the gains promised by the CC paradigm.

A. Related Work
In the context of benefiting from CC gains in wireless networks, the authors in [13] consider the effect of delayed channel state information at the transmitter (CSIT) and demonstrate a synergy between CSIT and caching. Moreover, the work [14] investigates wireless interference channels where both the transmitters and receivers are cache-enabled. They show that, considering one-shot transmission schemes, the caches at the receive and transmit sides are of equal value in the sense of network DoF, which is also confirmed to be the case in cellular networks [15]. In contrast, [16] treats the same setup with mixed CSIT and unveils the importance of receiver-side memory in such a scenario. Cache-enabled interference channels are also investigated by other works such as [17]-[20], which do not restrict the schemes to be one-shot, and thus benefit from practically more complex interference alignment (IA) schemes. Also, the authors in [21] investigate CC schemes in wireless device-to-device networks and adapt the original CC scheme to a serverless setup, while [22] shows the benefit of device mobility in such scenarios. Furthermore, cache-enabled cloud radio access networks (C-RAN) are studied in [23].
All the aforementioned papers consider wireless networks in the high signal-to-noise-ratio (SNR) regime, expressing their performance in terms of degrees of freedom (DoF). As high SNR analysis is not always a good indicator of the performance of practical implementations, there is still a gap that should be filled with a finite SNR analysis of the CC idea. The papers [24] and [25] propose different CC schemes in a wireless MISO-BC model and provide a finite SNR analysis in different system operating regimes. While the main idea in [24] is to use rate-splitting along with CC, the authors in [25] propose a joint design of CC and zero-forcing (ZF) to benefit from the spatial multiplexing gain and the global gain of CC at the same time. While the ideas in [25] originally came from adapting the multi-server CC scheme of [12] (which is almost optimal in terms of DoF as shown in [14]) to a Gaussian MISO-BC, the interesting observations in [25] reveal that careful code and beamformer design modifications have significant effects on the finite SNR performance.
Moreover, it should be noted that [26] also considers using rate-splitting along with CC and proposes schemes benefiting from spatial multiplexing and CC gains in a MISO-BC setup. However, as shown in [27], the resulting DoF performance is worse than that of the zero-forcing proposal in [25] and, consequently, is inferior to our scheme as well. Although the works [28] and [29] consider the finite SNR performance of coded caching in broadcast channels, they assume a single-antenna transmitter, and thus, in contrast to our paper, the interference management potential of the transmitter via its multiple antennas is not investigated. Finally, the authors in [30] addressed the subpacketization bottleneck in the multicast CC schemes [8], [12], [25], and proposed a simple ZF-based multiantenna transmission scheme substantially reducing the required subpacketization, while providing the same high SNR DoF as [12], [25].

B. Main Contributions
In this paper, extending the joint interference nulling and CC concept originally proposed in [25], [27], a joint design of CC and generic multicast beamforming is introduced to simultaneously benefit from spatial multiplexing gain, improved management of inter-stream interference from coded messages transmitted in parallel, and the global caching gain. Our proposal results in a general content delivery scheme for any values of the problem parameters, i.e., the number of users K, library size N, cache size M, and number of transmit antennas L, such that t = KM/N is an integer. The general signal-to-interference-plus-noise ratio (SINR) expressions are handled directly to optimally balance the detrimental impact of both noise and inter-stream interference at low SNR. As the resulting optimization problems are not necessarily convex, successive convex approximation (SCA) methods are used to devise efficient iterative algorithms, similarly to existing multicast beamformer design solutions [31].
Moreover, reduced-complexity alternatives are introduced to control the size t + α of the subset of users served during a given time interval, and the overlap β among the multicast messages transmitted in parallel. Depending on the available spatial degrees of freedom, several multicast messages are transmitted in parallel to distinct subsets of users. The benefits of such schemes are twofold. First, the complexity of the beamformer design is managed by controlling the number of constraints and variables in the corresponding optimization problem. Second, a better rate performance is attained by exploiting the transmit antennas to achieve multiplexing gain and at the same time compensating for the worst users' channel effects, at each specific SNR. In the least complex form of implementation with β = 1, the multicast messages do not overlap at all. This results in a linear receiver implementation, which does not require successive interference cancellation (SIC), unlike in the general case. Thus, with a small loss in performance, the complexity of both the receiver and transmitter implementation can be significantly reduced.
Finally, a DoF analysis of the proposed schemes is provided, showing that the DoF only depends on α and is independent of β.
Parts of this paper have been published in the conference publications [32]-[34]. Considering simple 3- and 4-user scenarios, the basic idea of combining multi-group multicast beamformer design and CC was first introduced in [32], while the reduced-complexity multicast mode selection idea and the simple linear multicast beamforming strategy were introduced in [33] and [34], respectively. In this paper, in addition to the simple scenarios considered in [32]-[34], a general content delivery scheme applicable to a wide range of problem parameter values is provided along with a corresponding DoF analysis.
In this paper we use the following notation. We use $(\cdot)^H$ to denote the Hermitian (conjugate transpose) of a complex matrix. Let $\mathbb{C}$ and $\mathbb{N}$ denote the sets of complex and natural numbers, and let $\|\cdot\|$ be the norm of a complex vector. Also, $[m]$ denotes the set of integers $\{1, \ldots, m\}$, and ⊕ represents addition in the corresponding finite field. For any vector $\mathbf{v}$, we define $\mathbf{v}^{\perp}$ such that $\mathbf{v}^H \mathbf{v}^{\perp} = 0$.
Moreover, $\mathcal{A}$ and $|\mathcal{A}|$ denote a set of indexes and its cardinality, while a collection of sets and the number of such sets are indicated by $\mathcal{B}$ and $|\mathcal{B}|$, respectively.

II. SYSTEM MODEL
Downlink transmission from a single L-antenna BS serving K cache-enabled single-antenna users is considered. The BS is assumed to have access to a library of N files $\{W_1, \ldots, W_N\}$, each of size F bits. Each user k is equipped with a cache memory of MF bits and has a message $Z_k = Z_k(W_1, \ldots, W_N)$ stored in its cache, where $Z_k(\cdot)$ denotes a function of the library files with entropy not larger than MF bits. This operation is referred to as the cache content placement, and it is performed once and at no cost, e.g., during network off-peak hours.
Upon a set of requests $d_k \in [1:N]$ at the content delivery phase, the BS multicasts coded signals such that, at the end of the transmission, all users can reliably decode their requested files.
Notice that the decoder of user k, in order to produce the decoded file $W_{d_k}$, makes use of its own cache content $Z_k$ as well as its own received signal from the wireless channel.
The received signal at user terminal k at time instant i, i = 1, . . . , n, can be written as
$$y_k(i) = \mathbf{h}_k^H \sum_{\mathcal{T} \subseteq \mathcal{S}} \mathbf{w}_{\mathcal{T}}^{\mathcal{S}} \tilde{X}_{\mathcal{T}}^{\mathcal{S}}(i) + z_k(i), \qquad (1)$$
where the channel vector between the BS and UE k is denoted by $\mathbf{h}_k \in \mathbb{C}^L$, $\mathbf{w}_{\mathcal{T}}^{\mathcal{S}}$ is the multicast beamformer dedicated to the users in subset $\mathcal{T}$ of the set $\mathcal{S} \subseteq [1:K]$ of users, and $\tilde{X}_{\mathcal{T}}^{\mathcal{S}}(i)$ is the corresponding multicast message chosen from a unit-power complex Gaussian codebook at time instant i. The size of $\mathcal{T}$ depends on the parameters K, M and N such that $|\mathcal{T}| = t + 1$, where $t \triangleq KM/N$ [12], [27]. The main idea in CC is (by careful cache content placement) to provide multicasting opportunities to groups of size t + 1, in which a common coded message is useful for all the members of the multicast group. This is called the global coded caching gain, and it is proportional to the total memory of the users, i.e., KM, normalized by the library size, i.e., N (for more details refer to [8]). In the following, the time index i is omitted for simplicity.
The receiver noise is assumed to be circularly symmetric with zero mean, $z_k \sim \mathcal{CN}(0, N_0)$. Finally, the CSIT of all K users is assumed to be perfectly known at the BS. Note that (1) is defined for a given set of users $k \in \mathcal{S}$ served at time instant i. Depending on the chosen transmission strategy and parametrization, the delivery of the requested files $W_{d_k}\ \forall k$ may require multiple time intervals/slots, carried out for all possible partitionings and subsets $\mathcal{S} \subseteq [1:K]$.

III. MULTICAST BEAMFORMING FOR CODED CACHING
In this work, we focus on the worst-case (over the users) delivery rate at which the system can serve all users requesting any file of the library. Multicasting opportunities due to coded caching [8], [12], [25] are utilized to devise an efficient multiantenna multicast beamforming method that performs well over the entire SNR region. In this section, we first introduce the proposed concept and its variations in four simple scenarios, and then discuss the generalization of the proposed schemes in Section IV. The basic multigroup multicast beamformer design for the classical 3-user case [8], [12], [25] is first described in Scenario 1, which in turn is extended to the 4-user case in Scenario 2 to demonstrate how the size and complexity of the problem quickly increase for larger values of K. Reduced-complexity alternatives to Scenario 2 are introduced in Scenarios 3 and 4 by controlling the size of the subset $\mathcal{S} \subseteq [K]$ served during a given time interval, and the overlap among the multicast messages transmitted in parallel, respectively.

A. Scenario 1: L ≥ 2, K = 3, N = 3, and M = 1
Consider the content delivery scenario illustrated in Fig. 1, where a transmitter with L ≥ 2 antennas should deliver requests arising at K = 3 users from a library $\mathcal{W} = \{A, B, C\}$ of N = 3 files, each of F bits. Suppose that in the cache content placement phase each user can cache M = 1 file of F bits, without knowing the actual requests beforehand. In the content delivery phase we suppose each user requests one file from the library. Following the same cache content placement strategy as in [8], the cache contents of the users are
$$Z_1 = \{A_1, B_1, C_1\}, \quad Z_2 = \{A_2, B_2, C_2\}, \quad Z_3 = \{A_3, B_3, C_3\},$$
where each file is divided into 3 equal-sized subfiles.
At the content delivery phase, suppose that the 1st, the 2nd, and the 3rd user request files A, B, and C, respectively. In the simple broadcast scenario in [8], the following coded messages are sent to users $\mathcal{S} = \{1, 2, 3\}$ by the transmitter one after another:
$$X_{1,2} = A_2 \oplus B_1, \quad X_{1,3} = A_3 \oplus C_1, \quad X_{2,3} = B_3 \oplus C_2,$$
where ⊕ represents summation in the corresponding finite field, and the superscript $\mathcal{S}$ is omitted for ease of presentation. In such a coding scheme, each coded message is received by all 3 users, but is only beneficial to 2 of them. For example, $X_{1,2}$ is useful for the 1st and 2nd user only. It can be easily checked that after the transmission is concluded, all users can decode their requested files. Moreover, for every possible combination of the user requests, the scheme works with the same cache content placement, but with another set of coded delivery messages.
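The placement and delivery steps of this 3-user example can be checked with a short script. The following is a minimal sketch, assuming byte-string files with random contents and bytewise XOR standing in for ⊕; the helper names `xor`, `split3`, and `decode` are illustrative and do not come from [8].

```python
import os

def xor(a, b):
    """Bitwise XOR of two equal-length byte strings (stands in for the field sum)."""
    return bytes(x ^ y for x, y in zip(a, b))

def split3(data):
    """Split a file into three equal-sized subfiles W_1, W_2, W_3."""
    n = len(data) // 3
    return [data[i * n:(i + 1) * n] for i in range(3)]

# Library of N = 3 files (random contents, 12 bytes each: an assumption).
files = {name: os.urandom(12) for name in "ABC"}
sub = {name: split3(data) for name, data in files.items()}  # sub["A"][0] is A_1

# Placement: user k stores subfile k of every file, Z_k = {A_k, B_k, C_k}.
cache = {k: {name: sub[name][k - 1] for name in "ABC"} for k in (1, 2, 3)}

# Demands: users 1, 2, 3 request files A, B, C.
demand = {1: "A", 2: "B", 3: "C"}

# Delivery: X_{i,j} = (subfile j of user i's file) XOR (subfile i of user j's file).
coded = {(i, j): xor(sub[demand[i]][j - 1], sub[demand[j]][i - 1])
         for i, j in [(1, 2), (1, 3), (2, 3)]}

def decode(k):
    """Recover user k's file from its cache and the two useful coded messages."""
    parts = {k: cache[k][demand[k]]}  # subfile k is already cached
    for (i, j), message in coded.items():
        if k in (i, j):
            other = j if k == i else i
            # User k holds subfile k of the other user's file and cancels it.
            parts[other] = xor(message, cache[k][demand[other]])
    return b"".join(parts[m] for m in (1, 2, 3))

decoded = {k: decode(k) for k in (1, 2, 3)}
print(all(decoded[k] == files[demand[k]] for k in (1, 2, 3)))
```

Each user combines one cached subfile with two decoded ones, which is why every coded message is useful to exactly two users.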
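In Scenario 1, the spatial multiplexing gain and the global caching gain can be combined following the scheme in [25] (see also [12], [14]). In [25], the unwanted messages at each user are forced to zero by sending
$$\mathbf{h}_3^{\perp} \tilde{X}_{1,2} + \mathbf{h}_2^{\perp} \tilde{X}_{1,3} + \mathbf{h}_1^{\perp} \tilde{X}_{2,3},$$
where $\tilde{X}$ stands for the modulated X, chosen from a unit-power complex Gaussian codebook [25].
The key point here is that, although this scheme is order-optimal in terms of DoF [14], it is suboptimal in the low SNR regime [25], [27]. Therefore, in this paper, instead of nulling the interference at unwanted users, general multicast beamforming vectors $\mathbf{w}_{\mathcal{T}}^{\mathcal{S}}$ are used and the transmitted signal is defined as
$$\sum_{\mathcal{T} \subseteq [K],\, |\mathcal{T}| = 2} \mathbf{w}_{\mathcal{T}} \tilde{X}_{\mathcal{T}} = \mathbf{w}_{1,2}\tilde{X}_{1,2} + \mathbf{w}_{1,3}\tilde{X}_{1,3} + \mathbf{w}_{2,3}\tilde{X}_{2,3},$$
where [K] denotes the set of integers {1, ..., K} and the superscript $\mathcal{S}$ is omitted for simplicity. As a result, the received signals at users 1-3 will be
$$y_1 = \underline{(\mathbf{h}_1^H\mathbf{w}_{1,2})\tilde{X}_{1,2}} + \underline{(\mathbf{h}_1^H\mathbf{w}_{1,3})\tilde{X}_{1,3}} + (\mathbf{h}_1^H\mathbf{w}_{2,3})\tilde{X}_{2,3} + z_1,$$
$$y_2 = \underline{(\mathbf{h}_2^H\mathbf{w}_{1,2})\tilde{X}_{1,2}} + (\mathbf{h}_2^H\mathbf{w}_{1,3})\tilde{X}_{1,3} + \underline{(\mathbf{h}_2^H\mathbf{w}_{2,3})\tilde{X}_{2,3}} + z_2,$$
$$y_3 = (\mathbf{h}_3^H\mathbf{w}_{1,2})\tilde{X}_{1,2} + \underline{(\mathbf{h}_3^H\mathbf{w}_{1,3})\tilde{X}_{1,3}} + \underline{(\mathbf{h}_3^H\mathbf{w}_{2,3})\tilde{X}_{2,3}} + z_3,$$
where the desired terms for each user are underlined.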
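Let us focus on user 1, who is interested in decoding both $\tilde{X}_{1,2}$ and $\tilde{X}_{1,3}$, while $\tilde{X}_{2,3}$ appears as Gaussian interference. Thus, from the perspective of receiver 1, $y_1$ is a Gaussian multiple access channel (MAC). Suppose now that user 1 can decode both of its required messages $\tilde{X}_{1,2}$ and $\tilde{X}_{1,3}$ with the equal rate
$$R^1_{MAC} = \min\left(\tfrac{1}{2} R^1_{sum},\, R^1_1,\, R^1_2\right),$$
where
$$R^1_1 = \log\left(1 + \frac{|\mathbf{h}_1^H\mathbf{w}_{1,2}|^2}{|\mathbf{h}_1^H\mathbf{w}_{2,3}|^2 + N_0}\right), \quad R^1_2 = \log\left(1 + \frac{|\mathbf{h}_1^H\mathbf{w}_{1,3}|^2}{|\mathbf{h}_1^H\mathbf{w}_{2,3}|^2 + N_0}\right), \quad R^1_{sum} = \log\left(1 + \frac{|\mathbf{h}_1^H\mathbf{w}_{1,2}|^2 + |\mathbf{h}_1^H\mathbf{w}_{1,3}|^2}{|\mathbf{h}_1^H\mathbf{w}_{2,3}|^2 + N_0}\right),$$
and the rates $R^1_1$ and $R^1_2$ correspond to $\tilde{X}_{1,2}$ and $\tilde{X}_{1,3}$, respectively. Thus, the total useful rate is $2R^1_{MAC}$. Since user 1 must receive the missing $\tfrac{2}{3}F$ bits ($A_2$ and $A_3$), the time needed to decode file A is $T_1 = \frac{2F}{3}\frac{1}{2R^1_{MAC}}$. As all the users decode their files in parallel, the time needed to complete the decoding process is constrained by the worst user as
$$T = \max(T_1, T_2, T_3) = \frac{F}{3}\,\frac{1}{\min_{k=1,2,3} R^k_{MAC}}. \qquad (6)$$
Then, the symmetric rate (goodput) per user will be
$$R_C = \frac{F}{T} = 3 \min_{k=1,2,3} R^k_{MAC}, \qquad (7)$$
which, when optimized with respect to the beamforming vectors, can be found as
$$\max_{\mathbf{w}_{1,2},\, \mathbf{w}_{1,3},\, \mathbf{w}_{2,3}}\ \min_{k=1,2,3} R^k_{MAC} \quad \text{s.t.} \quad \sum_{\mathcal{T}} \|\mathbf{w}_{\mathcal{T}}\|^2 \le \mathrm{SNR}.$$
Finally, the symmetric rate maximization for K = 3 is given as
$$\begin{aligned} \max_{R^k_j,\, \gamma^k_j,\, \mathbf{w}_{\mathcal{T}}} \quad & \min_{k=1,2,3}\ \min\left(\tfrac{1}{2}R^k_{sum},\, R^k_1,\, R^k_2\right) \\ \text{s.t.} \quad & R^k_1 \le \log(1+\gamma^k_1),\ R^k_2 \le \log(1+\gamma^k_2),\ R^k_{sum} \le \log(1+\gamma^k_1+\gamma^k_2),\quad k = 1, 2, 3, \\ & \gamma^1_1 \le \frac{|\mathbf{h}_1^H\mathbf{w}_{1,2}|^2}{|\mathbf{h}_1^H\mathbf{w}_{2,3}|^2 + N_0},\quad \gamma^1_2 \le \frac{|\mathbf{h}_1^H\mathbf{w}_{1,3}|^2}{|\mathbf{h}_1^H\mathbf{w}_{2,3}|^2 + N_0}, \\ & \text{and the analogous SINR constraints for users 2 and 3,} \\ & \sum_{\mathcal{T}} \|\mathbf{w}_{\mathcal{T}}\|^2 \le \mathrm{SNR}, \end{aligned} \qquad (9)$$
which can be equally presented in epigraph form as
$$\begin{aligned} \max_{r,\, \gamma^k_j,\, \mathbf{w}_{\mathcal{T}}} \quad & r \\ \text{s.t.} \quad & r \le \tfrac{1}{2}\log(1+\gamma^k_1+\gamma^k_2),\ r \le \log(1+\gamma^k_1),\ r \le \log(1+\gamma^k_2),\quad k = 1, 2, 3, \\ & \text{the rest of the constraints as in (9).} \end{aligned} \qquad (10)$$
Problem (10) is non-convex due to the SINR constraints. Similarly to [31], a successive convex approximation (SCA) approach can be used to devise an iterative algorithm that converges to a local solution. To begin with, the SINR constraint for $\gamma^1_1$,
$$\gamma^1_1 \le \frac{|\mathbf{h}_1^H\mathbf{w}_{1,2}|^2}{|\mathbf{h}_1^H\mathbf{w}_{2,3}|^2 + N_0}, \qquad (11)$$
can be reformulated as
$$|\mathbf{h}_1^H\mathbf{w}_{2,3}|^2 + N_0 \le \frac{|\mathbf{h}_1^H\mathbf{w}_{1,2}|^2}{\gamma^1_1}. \qquad (12)$$
Now, the R.H.S. of (12) is a convex quadratic-over-linear function, and it can be linearly approximated (lower bounded) by its first-order expansion
$$\mathcal{L}(\mathbf{w}_{1,2}, \bar{\mathbf{w}}_{1,2}, \mathbf{h}_1, \gamma^1_1) \triangleq \frac{2\,\mathrm{Re}\{\bar{\mathbf{w}}_{1,2}^H \mathbf{h}_1 \mathbf{h}_1^H \mathbf{w}_{1,2}\}}{\bar{\gamma}^1_1} - \frac{|\mathbf{h}_1^H\bar{\mathbf{w}}_{1,2}|^2}{(\bar{\gamma}^1_1)^2}\,\gamma^1_1, \qquad (13)$$
where $\bar{\mathbf{w}}_{k,i}$ and $\bar{\gamma}^1_1$ denote the fixed values (points of approximation) for the corresponding variables from the previous iteration. Using (13) and reformulating the objective in the epigraph form, the approximated problem is written as
$$\begin{aligned} \max_{r,\, \gamma^k_j,\, \mathbf{w}_{\mathcal{T}}} \quad & r \\ \text{s.t.} \quad & r \le \tfrac{1}{2}\log(1+\gamma^k_1+\gamma^k_2),\ r \le \log(1+\gamma^k_1),\ r \le \log(1+\gamma^k_2),\quad k = 1, 2, 3, \\ & |\mathbf{h}_1^H\mathbf{w}_{2,3}|^2 + N_0 \le \mathcal{L}(\mathbf{w}_{1,2}, \bar{\mathbf{w}}_{1,2}, \mathbf{h}_1, \gamma^1_1), \\ & |\mathbf{h}_1^H\mathbf{w}_{2,3}|^2 + N_0 \le \mathcal{L}(\mathbf{w}_{1,3}, \bar{\mathbf{w}}_{1,3}, \mathbf{h}_1, \gamma^1_2), \\ & \text{and the analogous approximated SINR constraints for users 2 and 3,} \\ & \sum_{\mathcal{T}} \|\mathbf{w}_{\mathcal{T}}\|^2 \le \mathrm{SNR}. \end{aligned} \qquad (14)$$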
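The lower-bounding property that SCA relies on can be checked numerically. The sketch below assumes the SINR constraint is written with the quadratic-over-linear term $|\mathbf{h}^H\mathbf{w}|^2/\gamma$ on its right-hand side and uses the standard first-order expansion of that term around a point $(\bar{\mathbf{w}}, \bar{\gamma})$; the function names and the random test setup are illustrative assumptions.

```python
import random

def inner(h, w):
    """Hermitian inner product h^H w for sequences of complex numbers."""
    return sum(complex(a).conjugate() * b for a, b in zip(h, w))

def quad_over_lin(h, w, gamma):
    """Convex function f(w, gamma) = |h^H w|^2 / gamma, for gamma > 0."""
    return abs(inner(h, w)) ** 2 / gamma

def linear_lower_bound(h, w, gamma, w_bar, gamma_bar):
    """First-order expansion of f around (w_bar, gamma_bar)."""
    g_bar = inner(h, w_bar)                           # h^H w_bar
    cross = (g_bar.conjugate() * inner(h, w)).real    # Re{w_bar^H h h^H w}
    return 2.0 * cross / gamma_bar - abs(g_bar) ** 2 * gamma / gamma_bar ** 2

def random_vector(rng, n):
    """Vector of n complex Gaussian entries."""
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

def max_bound_violation(trials=1000, n_antennas=3, seed=0):
    """Largest value of (bound - f) over random points; should not be positive."""
    rng = random.Random(seed)
    worst = float("-inf")
    for _ in range(trials):
        h = random_vector(rng, n_antennas)
        w = random_vector(rng, n_antennas)
        w_bar = random_vector(rng, n_antennas)
        gamma = rng.uniform(0.1, 10.0)
        gamma_bar = rng.uniform(0.1, 10.0)
        gap = (linear_lower_bound(h, w, gamma, w_bar, gamma_bar)
               - quad_over_lin(h, w, gamma))
        worst = max(worst, gap)
    return worst

print(max_bound_violation() <= 1e-6)
```

The gap between the function and its bound equals $\gamma\,|\mathbf{h}^H\mathbf{w}/\gamma - \mathbf{h}^H\bar{\mathbf{w}}/\bar{\gamma}|^2 \ge 0$, so the approximated constraint is always a restriction of the original one and is tight at the approximation point.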
This is a convex problem that can be readily solved using existing convex solvers. However, the logarithmic functions require further approximations before conventional convex programming algorithms can be applied. Problem (14) can be equally formulated as a computationally efficient second-order cone program (SOCP). To this end, the sum rate constraint is first bounded, after which the equivalent SOCP reformulation (15) follows, with the rest of the constraints as in (14).
Finally, a solution for the original problem (9) can be found by solving (14) in an iterative manner using SCA, i.e., by updating the points of approximation $\bar{\mathbf{w}}_{k,i}$ and $\bar{\gamma}^l_j$ in (13) after each iteration. As each difference-of-convex constraint in (12) is lower bounded by (13), the monotonic convergence of the objective of (14) is guaranteed. Note that the final symmetric rates are achieved by time sharing between the rate allocations corresponding to different points (decoding orders) in the sum rate region of the MAC.
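For fixed beamformers, the symmetric rate of Scenario 1 is simple to evaluate. The sketch below assumes base-2 logarithms, unit noise power by default, and the rate structure $\min(\tfrac{1}{2}R_{sum}, R_1, R_2)$ with goodput $3\min_k R^k_{MAC}$; the example channels and beamformers are arbitrary illustrative values, not optimized, and are not scaled to any power constraint.

```python
import math

def inner(h, w):
    """Hermitian inner product h^H w for sequences of complex numbers."""
    return sum(complex(a).conjugate() * b for a, b in zip(h, w))

def mac_rate(h, w_first, w_second, w_interf, n0=1.0):
    """Equal per-message rate min(R_sum / 2, R_1, R_2) at one user."""
    noise_plus_interf = abs(inner(h, w_interf)) ** 2 + n0
    g1 = abs(inner(h, w_first)) ** 2
    g2 = abs(inner(h, w_second)) ** 2
    r1 = math.log2(1 + g1 / noise_plus_interf)
    r2 = math.log2(1 + g2 / noise_plus_interf)
    r_sum = math.log2(1 + (g1 + g2) / noise_plus_interf)
    return min(0.5 * r_sum, r1, r2)

def symmetric_goodput(channels, beams, n0=1.0):
    """3 * min_k R_MAC^k for K = 3; beams maps user pairs (i, j) to vectors."""
    rates = []
    for k in (1, 2, 3):
        desired = [w for pair, w in sorted(beams.items()) if k in pair]
        interf = [w for pair, w in sorted(beams.items()) if k not in pair]
        rates.append(mac_rate(channels[k], desired[0], desired[1], interf[0], n0))
    return 3 * min(rates)

# Toy setup with L = 3 antennas (illustrative values only).
channels = {1: [1, 0, 0], 2: [0, 1, 0], 3: [0, 0, 1]}
beams = {(1, 2): [1, 1, 0], (1, 3): [1, 0, 1], (2, 3): [0, 1, 1]}
goodput = symmetric_goodput(channels, beams)
print(goodput)
```

In this toy setup every user sees two desired streams of unit gain and no interference, so the sum-rate bound is the active one.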
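As a lower-complexity alternative, a zero-forcing solution, denoted as CC with ZF, is also proposed. By assigning the beamformers along the null-space directions, $\mathbf{w}_{1,2} \propto \mathbf{h}_3^{\perp}$, $\mathbf{w}_{1,3} \propto \mathbf{h}_2^{\perp}$ and $\mathbf{w}_{2,3} \propto \mathbf{h}_1^{\perp}$, the interference terms are canceled and (9) reduces to a power allocation over the three multicast streams. This is readily a convex power optimization problem with three real-valued variables, and hence it can be solved in an optimal manner.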
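The zero-forcing directions are easy to construct for L = 2, where the vector orthogonal to a channel is unique up to scaling. The sketch below assumes L = 2 and takes the per-stream powers as given inputs (the power optimization itself is not shown); the function names are illustrative.

```python
import math

def inner(h, w):
    """Hermitian inner product h^H w for sequences of complex numbers."""
    return sum(complex(a).conjugate() * b for a, b in zip(h, w))

def null_vector_2d(h):
    """Unit-norm v with h^H v = 0 for a 2-antenna channel h = [h1, h2]."""
    h1, h2 = complex(h[0]), complex(h[1])
    v = [h2.conjugate(), -h1.conjugate()]
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]

def zf_beamformers(channels, powers):
    """Beamformer of each pair is nulled at the one user outside the pair."""
    unwanted = {(1, 2): 3, (1, 3): 2, (2, 3): 1}
    return {pair: [math.sqrt(powers[pair]) * x
                   for x in null_vector_2d(channels[user])]
            for pair, user in unwanted.items()}

# Illustrative channels and equal powers (assumed values).
channels = {1: [1 + 1j, 2], 2: [0.5, -1j], 3: [1, 1j]}
powers = {(1, 2): 1.0, (1, 3): 1.0, (2, 3): 1.0}
beams = zf_beamformers(channels, powers)
leak = max(abs(inner(channels[u], beams[p]))
           for p, u in {(1, 2): 3, (1, 3): 2, (2, 3): 1}.items())
print(leak < 1e-12)
```

With the directions fixed, only the three stream powers remain as variables, which matches the three real-valued variables of the ZF problem.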
In the following, three baseline reference cases for the proposed multiantenna caching scheme are introduced.
2) 2nd Baseline Scheme: MaxMinSNR Multicasting: The message $X_{1,2}$ is multicast to users 1 and 2, without any interference (orthogonally), by sending the signal $\mathbf{w}\tilde{X}_{1,2}$. A single transmit beamformer is found to minimize the time needed for multicasting the common message:
$$T_{1,2} = \frac{F}{3}\,\frac{1}{\max_{\|\mathbf{w}\|^2 \le \mathrm{SNR}}\ \min_{k=1,2} \log\left(1 + |\mathbf{h}_k^H\mathbf{w}|^2 / N_0\right)}. \qquad (17)$$
Similarly, the messages $X_{1,3}$ and $X_{2,3}$ should be delivered to the users with corresponding times $T_{1,3}$ and $T_{2,3}$. Finally, the resulting symmetric rate (goodput) per user will be
$$\frac{F}{T_{1,2} + T_{1,3} + T_{2,3}}. \qquad (18)$$
Note that, in this scheme, only the coded caching gain is exploited, while the multiple transmit antennas are used just for the beamforming gain.
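The time-sharing arithmetic of this baseline can be sketched as follows. The code assumes base-2 logarithms and a given (not optimized) beamformer per message, and it normalizes the file size F to one so that the goodput is the reciprocal of the total delivery time; all names are illustrative.

```python
import math

def inner(h, w):
    """Hermitian inner product h^H w for sequences of complex numbers."""
    return sum(complex(a).conjugate() * b for a, b in zip(h, w))

def multicast_rate(h_a, h_b, w, n0=1.0):
    """Common rate of one interference-free multicast stream to two users."""
    return min(math.log2(1 + abs(inner(h, w)) ** 2 / n0) for h in (h_a, h_b))

def orthogonal_multicast_goodput(message_rates):
    """F / (T_12 + T_13 + T_23) with T = (F / 3) / rate and F normalized to 1."""
    total_time = sum((1.0 / 3.0) / r for r in message_rates)
    return 1.0 / total_time

# Illustrative example: one beamformer serving two orthogonal channels.
rate_12 = multicast_rate([1, 0], [0, 1], [1, 1])
print(rate_12, orthogonal_multicast_goodput([rate_12, rate_12, rate_12]))
```

Because the three messages are sent one after another, the slowest message dominates the total time, which is why each beamformer is chosen to maximize the weaker of its two users.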
3) 3rd Baseline Scheme: MaxMinRate Unicast: In this scheme, only the local caching gain is exploited and the CC gain is ignored altogether. The BS simply sends min(K, L) parallel independent streams to the users at each time instant. All the users can be served in parallel if L ≥ K. On the other hand, if L < K, the users need to be divided into subsets of size L served in distinct time slots. Now, let us consider the case L = 2 and K = 3, and focus on users 1 and 2 in time slot 1. The transmitted signal to deliver $A_2$ and $B_1$ to users 1 and 2, respectively, is given as $\mathbf{w}_1\tilde{A}_2 + \mathbf{w}_2\tilde{B}_1$. Thus, the delivery time of F/3 bits is
$$T_{1,2} = \frac{F}{3}\,\frac{1}{\max_{\|\mathbf{w}_1\|^2 + \|\mathbf{w}_2\|^2 \le \mathrm{SNR}}\ \min(R_1, R_2)}, \qquad (19)$$
where
$$R_k = \log\left(1 + \frac{|\mathbf{h}_k^H\mathbf{w}_k|^2}{\sum_{j \ne k} |\mathbf{h}_k^H\mathbf{w}_j|^2 + N_0}\right). \qquad (20)$$
The minimum delivery time in (19) can be equivalently formulated as a max-min SINR problem and solved optimally. By repeating the same procedure for the subsets {1, 3} and {2, 3}, the symmetric rate expression is equivalent to (18).

B. Scenario 2: L ≥ 3, K = 4, N = 4, and M = 1
In this scenario, the number of users K and files N is further increased in order to demonstrate how the size and complexity of the problem quickly increase for larger values of K. We assume that the BS transmitter has L ≥ 3 antennas, and there are K = 4 users, each with cache size M = 1, requesting files from a library $\mathcal{W} = \{A, B, C, D\}$ of N = 4 files. Following the same cache content placement strategy as in [8], the cache contents of the users are
$$Z_k = \{A_k, B_k, C_k, D_k\}, \quad k = 1, \ldots, 4,$$
where each file is divided into four non-overlapping equal-sized subfiles.
At the content delivery phase, suppose that users 1-4 request files A-D, respectively.
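Here, we have $t \triangleq KM/N = 1$, and the subsets $\mathcal{S}$ and $\mathcal{T}$ will be of size 4 and t + 1 = 2, respectively (for details see [12], [27] and Section IV). Following the approach of Scenario 1, the transmit signal vector is
$$\sum_{\mathcal{T} \subseteq \mathcal{S},\, |\mathcal{T}| = 2} \mathbf{w}_{\mathcal{T}}^{\mathcal{S}} \tilde{X}_{\mathcal{T}}^{\mathcal{S}}, \qquad (21)$$
where
$$X_{1,2} = A_2 \oplus B_1,\ X_{1,3} = A_3 \oplus C_1,\ X_{1,4} = A_4 \oplus D_1,\ X_{2,3} = B_3 \oplus C_2,\ X_{2,4} = B_4 \oplus D_2,\ X_{3,4} = C_4 \oplus D_3.$$
It can be easily verified that if each multicast message $\tilde{X}_{\mathcal{T}}$ is delivered to all the members of $\mathcal{T}$, then all users can decode their requested files.
The received signal at user k = 1, 2, 3, 4 is written as $y_k = \mathbf{h}_k^H \sum_{\mathcal{T}} \mathbf{w}_{\mathcal{T}}\tilde{X}_{\mathcal{T}} + z_k$, i.e., for user 1,
$$y_1 = \underline{\mathbf{h}_1^H\mathbf{w}_{1,2}\tilde{X}_{1,2}} + \underline{\mathbf{h}_1^H\mathbf{w}_{1,3}\tilde{X}_{1,3}} + \underline{\mathbf{h}_1^H\mathbf{w}_{1,4}\tilde{X}_{1,4}} + \mathbf{h}_1^H\mathbf{w}_{2,3}\tilde{X}_{2,3} + \mathbf{h}_1^H\mathbf{w}_{2,4}\tilde{X}_{2,4} + \mathbf{h}_1^H\mathbf{w}_{3,4}\tilde{X}_{3,4} + z_1,$$
where, as an example, the desired terms of user 1 are underlined. As in Scenario 1, each user faces a MAC, now with three desired signals, three Gaussian interference terms, and one noise term. Suppose that user k can decode each of its desired signals with the rate $R^k_{MAC}$. Consequently, this user receives useful information with the rate $3R^k_{MAC}$, and the time required to fetch the entire file is $T_k = \frac{3F}{4}\frac{1}{3R^k_{MAC}}$. Following the same steps as in (6)-(7), the symmetric rate per user can be found as
$$R_C = 4 \max_{\{\mathbf{w}_{\mathcal{T}}\}}\ \min_{k=1,\ldots,4} R^k_{MAC},$$
where
$$R^k_{MAC} = \min\left(R^k_1,\, R^k_2,\, R^k_3,\, \tfrac{1}{2}R^k_4,\, \tfrac{1}{2}R^k_5,\, \tfrac{1}{2}R^k_6,\, \tfrac{1}{3}R^k_7\right),$$
and where the rate bounds $R^1_1$, $R^1_2$ and $R^1_3$ of user 1, for example, correspond to $\tilde{X}_{1,2}$, $\tilde{X}_{1,3}$ and $\tilde{X}_{1,4}$, respectively. Each bound is the logarithm of one plus the ratio of the summed received powers of the messages involved to the interference-plus-noise power, e.g.,
$$R^1_1 = \log\left(1 + \frac{|\mathbf{h}_1^H\mathbf{w}_{1,2}|^2}{|\mathbf{h}_1^H\mathbf{w}_{2,3}|^2 + |\mathbf{h}_1^H\mathbf{w}_{2,4}|^2 + |\mathbf{h}_1^H\mathbf{w}_{3,4}|^2 + N_0}\right), \quad R^1_7 = \log\left(1 + \frac{|\mathbf{h}_1^H\mathbf{w}_{1,2}|^2 + |\mathbf{h}_1^H\mathbf{w}_{1,3}|^2 + |\mathbf{h}_1^H\mathbf{w}_{1,4}|^2}{|\mathbf{h}_1^H\mathbf{w}_{2,3}|^2 + |\mathbf{h}_1^H\mathbf{w}_{2,4}|^2 + |\mathbf{h}_1^H\mathbf{w}_{3,4}|^2 + N_0}\right).$$
The bounds $R^1_4$, $R^1_5$ and $R^1_6$ limit the sum rate of any combination of two transmitted multicast signals, and finally $R^1_7$ is the sum rate bound for all 3 messages. As the 3-dimensional MAC rate region for each user is formed by 7 rate constraints, the following optimization problem is solved to find the symmetric rate per stream:
$$\max_{\{\mathbf{w}_{\mathcal{T}}\}}\ \min_{k=1,\ldots,4}\ \min\left(R^k_1,\, R^k_2,\, R^k_3,\, \tfrac{1}{2}R^k_4,\, \tfrac{1}{2}R^k_5,\, \tfrac{1}{2}R^k_6,\, \tfrac{1}{3}R^k_7\right) \quad \text{s.t.} \quad \sum_{\mathcal{T}} \|\mathbf{w}_{\mathcal{T}}\|^2 \le \mathrm{SNR}. \qquad (24)$$
In order to solve the above non-convex problem, the SCA method is again used and the SINR constraints are approximated similarly to (12)-(13).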
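The seven MAC constraints of one user can be enumerated mechanically. The sketch below assumes the received powers of the desired streams and the total interference-plus-noise power are already known, uses base-2 logarithms, and returns both the symmetric per-message rate and the number of constraints visited; the function name is illustrative.

```python
from itertools import combinations
import math

def mac_symmetric_rate(desired_gains, interference_plus_noise):
    """Largest equal per-message rate inside the MAC region of one user.

    Every non-empty subset of the desired messages gives one sum-rate bound;
    dividing that bound by the subset size gives the equal-rate limit.
    """
    best = math.inf
    count = 0
    for size in range(1, len(desired_gains) + 1):
        for subset in combinations(desired_gains, size):
            sum_rate = math.log2(1 + sum(subset) / interference_plus_noise)
            best = min(best, sum_rate / size)
            count += 1
    return best, count

# Three desired streams of unit received power, unit interference plus noise.
rate, num_constraints = mac_symmetric_rate([1.0, 1.0, 1.0], 1.0)
print(rate, num_constraints)
```

With m desired messages the loop visits $2^m - 1$ subsets, which is the source of the exponential growth in the number of rate constraints.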
The Second Baseline Scheme is similar to Scenario 1. In summary, each $\tilde{X}_{\mathcal{T}}$ is delivered to the users in the subset $\mathcal{T}$ without interference. Thus, in total six time slots are needed to transmit the corresponding multicast messages.
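The Third Unicasting Baseline Scheme is also the same as in Scenario 1. Let us first consider the beamformed symbols $\mathbf{w}_1\tilde{A}_2 + \mathbf{w}_2\tilde{B}_1 + \mathbf{w}_3\tilde{C}_1$ transmitted to the first subset of users {1, 2, 3}, with the corresponding minimum transmission time
$$T_{1,2,3} = \frac{F}{4}\,\frac{1}{\max_{\{\mathbf{w}_k\}} \min(R_1, R_2, R_3)},$$
where the rates $R_k$ are given as in (20).

C. Scenario 3: Reduced Size of the Served User Subset
In this scenario, the same setting as in Scenario 2 is considered, but the users are served in subsets $\mathcal{S}$ of size 3 in distinct time slots. If the subset size were further limited to $|\mathcal{S}| = t + 1 = 2$, the scheme would reduce to the baseline max-min SNR scheme (see (17) for K = 3).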
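The cache content placement works similarly, except that each subfile is split into 2 minifiles (indicated by superscripts) in order to allow different contents to be transmitted in each subset $\mathcal{S}$. As a result, the following content is stored in the user cache memories:
$$Z_k = \{A_k^1, A_k^2, B_k^1, B_k^2, C_k^1, C_k^2, D_k^1, D_k^2\}, \quad k = 1, \ldots, 4.$$
Subsequently, we focus on the users $\mathcal{S} = \{1, 2, 3\}$. Let us send them the transmit vector
$$\mathbf{w}_{1,2}^{\mathcal{S}}\tilde{X}_{1,2}^{\mathcal{S}} + \mathbf{w}_{1,3}^{\mathcal{S}}\tilde{X}_{1,3}^{\mathcal{S}} + \mathbf{w}_{2,3}^{\mathcal{S}}\tilde{X}_{2,3}^{\mathcal{S}},$$
where each coded message $X_{\mathcal{T}}^{\mathcal{S}}$ is the ⊕-combination of two minifiles, one requested by each user in $\mathcal{T}$ and cached at the other (for example, $X_{1,2}^{\mathcal{S}} = A_2^1 \oplus B_1^1$). This transmission should be such that $\tilde{X}_{\mathcal{T}}$ is received correctly at all users in $\mathcal{T} \subset \{1, 2, 3\}$, $|\mathcal{T}| = 2$. Let us call the corresponding common rate for coding each $\tilde{X}_{\mathcal{T}}$ as $R_{1,2,3}$. Then, since each minifile is of length F/8, the time needed for this transmission is $T_{1,2,3} = \frac{F}{8}\frac{1}{R_{1,2,3}}$. Similarly, the coded messages for the subsets {1, 2, 4}, {1, 3, 4} and {2, 3, 4}, formed analogously from the remaining minifiles, are coded with the rates $R_{1,2,4}$, $R_{1,3,4}$ and $R_{2,3,4}$, and the corresponding transmission times are $T_{1,2,4} = \frac{F}{8}\frac{1}{R_{1,2,4}}$, $T_{1,3,4} = \frac{F}{8}\frac{1}{R_{1,3,4}}$ and $T_{2,3,4} = \frac{F}{8}\frac{1}{R_{2,3,4}}$, respectively. Since these transmissions are done in different time slots, the symmetric rate per user of this example is
$$R_C = \frac{F}{T_{1,2,3} + T_{1,2,4} + T_{1,3,4} + T_{2,3,4}}.$$
The beamforming vectors are optimized separately to maximize the symmetric rate for each transmission interval. For each subset $\mathcal{S}$ the formulation is exactly the same as the one in Scenario 1. The difference is that in this scenario we have potentially more antennas available (L ≥ 3), allowing for further improved multicast beamforming performance.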
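The bookkeeping of this scenario (which subsets are served, how many minifiles each user collects, and how the per-subset rates combine) can be sketched as follows. The per-subset rates are assumed to be given, the file size F is normalized to one, and the function names are illustrative.

```python
from itertools import combinations

def served_subsets(num_users=4, subset_size=3):
    """All user subsets S of the given size, each served in its own slot."""
    return list(combinations(range(1, num_users + 1), subset_size))

def minifiles_received(subsets, user):
    """In each subset containing the user, one minifile arrives from each of
    the (|S| - 1) pairwise coded messages the user takes part in."""
    return sum(len(S) - 1 for S in subsets if user in S)

def symmetric_rate(subset_rates, minifile_fraction=1.0 / 8.0):
    """F / sum_S T_S with T_S = (F / 8) / R_S and F normalized to 1."""
    total_time = sum(minifile_fraction / r for r in subset_rates.values())
    return 1.0 / total_time

subsets = served_subsets()
rates = {S: 1.0 for S in subsets}   # assumed common rate of 1 in every slot
print(subsets, minifiles_received(subsets, 1), symmetric_rate(rates))
```

Each user appears in three of the four subsets and collects two minifiles per subset, i.e., the six minifiles (3F/4 bits) missing from its cache.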

D. Scenario 4: Simple Linear TX-RX strategy
In Scenarios 1-3, each user is allocated a number of parallel streams that need to be decoded using a SIC receiver structure. In this example, in contrast, we consider the same setting as in Scenarios 2-3 with L ≥ 3, K = 4, N = 4, M = 1, but no overlap is allowed among the user groups served by multiple multicast messages transmitted in parallel. This leads to a simpler TX-RX strategy where all 6 multicast streams introduced in Scenario 2 are delivered across three orthogonal time intervals/slots, instead of transmitting all in parallel as in (21). In time slots 1-3, the transmitted signals are generated as
$$\mathbf{w}_{1,2}(A_2 \oplus B_1) + \mathbf{w}_{3,4}(C_4 \oplus D_3), \quad \mathbf{w}_{1,3}(A_3 \oplus C_1) + \mathbf{w}_{2,4}(B_4 \oplus D_2), \quad \mathbf{w}_{1,4}(A_4 \oplus D_1) + \mathbf{w}_{2,3}(B_3 \oplus C_2),$$
respectively. In each time slot, all 4 users are served with 2 parallel multicast streams. Each stream causes inter-stream interference to the 2 other users not included in the given multicast group. Therefore, the BS, equipped with at least 3 antennas, has enough spatial degrees of freedom to manage the inter-stream interference between the multicast streams. The beamforming vectors are optimized separately to maximize the symmetric rate $R_C(i)$ for each transmission interval i. Thus, the corresponding time to deliver the multicast messages containing F/4 fractions of the files in time slot i is $T(i) = \frac{F}{4}\frac{1}{R_C(i)}$. Since these transmissions are done in 3 different time slots, the overall symmetric rate per user of this scheme is
$$R_C = \frac{F}{T(1) + T(2) + T(3)}.$$
As will be shown in Section V, the scheme provides the same overall DoF (slope) as the original scheme in Scenario 2, but with a constant gap at high SNR due to the simplified TX-RX processing.
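As no overlap is allowed, each user decodes a single multicast message in a given time slot. Therefore, neither a SIC receiver nor MAC rate region constraints are needed in the problem formulation, unlike in Scenario 2. As a result, the achievable rate is uniquely defined by the SINR of the received data stream. Let us define $\gamma_C(i)$ to be the common symmetric SINR for all users served in time slot i such that $R_C(i) = \log(1 + \gamma_C(i))$. The multigroup multicast beamformer optimization problem for the ith time slot can then be expressed as the following common SINR maximization problem:
$$\begin{aligned} \max_{\gamma_C(i),\, \{\mathbf{w}_{\mathcal{T}}\}} \quad & \gamma_C(i) \\ \text{s.t.} \quad & \gamma_C(i) \le \frac{|\mathbf{h}_k^H\mathbf{w}_{\mathcal{T}}|^2}{\sum_{\bar{\mathcal{T}} \in \mathcal{P}(i) \setminus \mathcal{T}} |\mathbf{h}_k^H\mathbf{w}_{\bar{\mathcal{T}}}|^2 + N_0}, \quad \forall k \in \mathcal{T},\ \mathcal{T} \in \mathcal{P}(i), \\ & \sum_{\mathcal{T} \in \mathcal{P}(i)} \|\mathbf{w}_{\mathcal{T}}\|^2 \le \mathrm{SNR}, \end{aligned} \qquad (31)$$
where $\mathcal{P}(1) = \{\{1, 2\}, \{3, 4\}\}$, $\mathcal{P}(2) = \{\{1, 3\}, \{2, 4\}\}$ and $\mathcal{P}(3) = \{\{1, 4\}, \{2, 3\}\}$. The resulting problem is a multi-group multicast beamforming problem for common SINR maximization, and several solutions exist, for example via semidefinite relaxation (SDR) of the beamformers and solving (iteratively via bisection) as a semidefinite program (SDP) [35]. Here, instead, we adopt the SCA solution from [31], based on which (31) can be solved efficiently as a series of second-order cone programs. Unlike the SDP-based designs, the SCA technique solves for the beamformers directly, thereby avoiding the need for any randomization procedure if rank-1 beamformers are to be recovered from the SDR solutions [31].
For example, by approximating the SINR constraints as in (12)-(13), the common SINR for time slot 1, $\gamma_C(1)$, can be solved (for a given approximation point $\bar{\mathbf{w}}_{1,2}$, $\bar{\mathbf{w}}_{3,4}$, $\bar{\gamma}_C(1)$ and by omitting the slot index i) as
$$\begin{aligned} \max_{\gamma_C,\, \mathbf{w}_{1,2},\, \mathbf{w}_{3,4}} \quad & \gamma_C \\ \text{s.t.} \quad & |\mathbf{h}_k^H\mathbf{w}_{3,4}|^2 + N_0 \le \mathcal{L}(\mathbf{w}_{1,2}, \bar{\mathbf{w}}_{1,2}, \mathbf{h}_k, \gamma_C), \quad k = 1, 2, \\ & |\mathbf{h}_k^H\mathbf{w}_{1,2}|^2 + N_0 \le \mathcal{L}(\mathbf{w}_{3,4}, \bar{\mathbf{w}}_{3,4}, \mathbf{h}_k, \gamma_C), \quad k = 3, 4, \\ & \|\mathbf{w}_{1,2}\|^2 + \|\mathbf{w}_{3,4}\|^2 \le \mathrm{SNR}, \end{aligned}$$
where $\mathcal{L}(\mathbf{w}_{\mathcal{T}}, \bar{\mathbf{w}}_{\mathcal{T}}, \mathbf{h}_k, \gamma_C)$ is given in (13).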
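The slot structure of Scenario 4 can be reproduced with a few helper functions. The sketch below enumerates the pairings used in the three time slots, evaluates the common SINR of a slot for given (not optimized) beamformers, and combines per-slot SINRs into the overall rate with F normalized to one and base-2 logarithms; all names and numerical values are illustrative assumptions.

```python
import math

def inner(h, w):
    """Hermitian inner product h^H w for sequences of complex numbers."""
    return sum(complex(a).conjugate() * b for a, b in zip(h, w))

def pairings(users):
    """All ways to split an even-sized list of users into disjoint pairs."""
    if not users:
        return [[]]
    first, rest = users[0], users[1:]
    result = []
    for idx, partner in enumerate(rest):
        remaining = rest[:idx] + rest[idx + 1:]
        for tail in pairings(remaining):
            result.append([(first, partner)] + tail)
    return result

def common_sinr(channels, beams, n0=1.0):
    """Worst SINR over all users, each decoding only its own group's stream."""
    worst = math.inf
    for group, w in beams.items():
        for k in group:
            signal = abs(inner(channels[k], w)) ** 2
            interference = sum(abs(inner(channels[k], w2)) ** 2
                               for g2, w2 in beams.items() if g2 != group)
            worst = min(worst, signal / (interference + n0))
    return worst

def overall_rate(slot_sinrs):
    """F / sum_i T(i) with T(i) = (F / 4) / log2(1 + gamma_C(i)), F = 1."""
    return 1.0 / sum(0.25 / math.log2(1 + g) for g in slot_sinrs)

slots = pairings([1, 2, 3, 4])
channels = {1: [1, 0], 2: [1, 0], 3: [0, 1], 4: [0, 1]}   # toy channels
sinr = common_sinr(channels, {(1, 2): [1, 0], (3, 4): [0, 1]})
print(slots, sinr, overall_rate([sinr, sinr, sinr]))
```

Since every user decodes a single stream per slot, one SINR expression per user replaces the MAC rate region of the overlapping schemes.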

IV. GENERAL CASE FORMULATION, ALGORITHM, AND RATE ANALYSIS
In general, the number of parallel multicast streams to be decoded at each user grows linearly when K, L, N are increased with the same ratio. Extending the fully overlapping approach introduced in Scenario 2, the number of rate constraints in the user-specific MAC region grows exponentially, i.e., by $2^{K-1} - 1$ per user if L ≥ N = K. For example, the case L = 4, K = 5, N = 5 and M = 1 would require altogether $\binom{5}{2} = 10$ multicast messages, and each user should be able to decode 4 multicast messages. Thus, the total number of rate constraints would be $K \times (2^{K-1} - 1) = 5 \times 15 = 75$, while the number of SINR constraints to be approximated would be 5 × 4 = 20. As an efficient way to reduce the complexity of the problem both at the transmitter and the receivers (with a certain performance loss at high SNR), we may limit the size of the user subsets benefiting from multicast messages transmitted in parallel as in Scenario 3, or limit the overlap among the multicast messages as in Scenario 4, reflected in parameters α and β, respectively, introduced later in this section.
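The counts quoted above follow from simple combinatorics. The sketch below reproduces them for the fully overlapping case; the extension of the per-user stream count to $\binom{K-1}{t}$ for t > 1 is an assumption made here for generality (the text only states the t = 1 case), and the function name is illustrative.

```python
from math import comb

def full_overlap_complexity(K, t):
    """Problem size when all K users are served in parallel with full overlap."""
    messages = comb(K, t + 1)              # one message per (t+1)-subset
    streams_per_user = comb(K - 1, t)      # subsets that contain a given user
    mac_per_user = 2 ** streams_per_user - 1
    return {
        "messages": messages,
        "streams_per_user": streams_per_user,
        "mac_constraints": K * mac_per_user,
        "sinr_constraints": K * streams_per_user,
    }

print(full_overlap_complexity(K=5, t=1))
```

For K = 4 and t = 1 the same function gives 6 messages and 7 MAC constraints per user, matching Scenario 2.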
In the following, the general delivery-phase algorithm (Algorithm 1) is described for any set of parameters K, L, N and M. Let us first provide a light description of the algorithm. The cache content placement phase is the same as in the scheme proposed in [8], where each file is split into $\binom{K}{t}$ subfiles, and we do not repeat it here. The only difference is that here we further split each subfile in [8] into minifiles, such that the total number of minifiles is $\binom{K}{t}$ times a factor that depends on α, β and $\delta := \frac{t+\alpha}{t+\beta} \in \mathbb{N}$. This further splitting is needed in order to allow different content to be transmitted in each additional time interval introduced due to parameters α and β, similarly to Scenario 3.
In the generalized scheme, instead of fixing the size of the subsets $\mathcal{S} \subseteq [K]$ to be min(t + L, K) as in Scenario 2, we introduce a new integer parameter α bounded by
$$1 \le \alpha \le \min(L, K - t)$$
and define the size of the subsets $\mathcal{S} \subseteq [K]$ to be t + α. The parameter α has two roles. First, it manages the trade-off between the multiplexing and multicast beamforming/diversity gains due to multiple transmit antennas, and thus should be designed carefully at each SNR to result in the maximum throughput. Second, it enables us to control the size of the MAC elements with respect to each user, and in turn, to control the complexity of the optimization problem for determining the beamforming vectors, as will be explained later. This generalization reduces to the baseline max-min SNR beamforming scheme if α = 1 (see (17) for K = 3). At the other extreme, when all K users are served at once (t + α = K) with fully overlapping multicast groups, the cases K = 3 and K = 4 with t = 1 reduce to Scenario 1 or Scenario 2, respectively, and the optimal multicast beamformers can be found by solving (14) or (24) (for the corresponding $k \in \mathcal{S}$).
After the required initializations, the algorithm contains an outer loop which goes over all the (t + α)-subsets of all the users [K].Let us now consider a scenario with K = 8 users with t = 1 and α = 5 depicted in Fig. 3 and focus on one particular realization of these (t + α)-subsets S = {1, 2, 3, 4, 5, 6}.For this specific set S, the second loop goes over all possible partitionings of S into (t + β)-groups, which are collected in P. Here, β, bounded by 1 ≤ β ≤ α, is another design parameter which controls the overlap among the multicast messages, i.e., the complexity of the beamformer design problem.
In this paper, the number of $(t + \beta)$-sets within each $(t + \alpha)$-set is restricted to integer values, i.e., $\delta := \frac{t+\alpha}{t+\beta} \in \mathbb{N}$. In this example, with $\beta = 2$, $P_1 = \{1, 2, 3\}$ and $P_2 = \{4, 5, 6\}$, there are 3 parallel coded messages, one for every pair of users inside $P_1$, and 3 coded messages, one for every pair in $P_2$, resulting in a total of 6 coded messages.
It should be noted that common messages for users in different groups are not allowed, which is the main ingredient behind controlling the complexity of the beamformer design. In general, assuming $\delta \in \mathbb{N}$, there will be $\delta \binom{t+\beta}{t+1}$ coded messages involved for a fixed partitioning $P$, while the corresponding $(t + 1)$-subsets for multicast beamforming are collected in the collection of sets $\Omega^{S,P} := \bigcup_{i=1,\dots,\delta} \{T \subseteq P_i, |T| = t + 1\}$ for a specific $S$ and $P$. The transmit vector $X(S, P)$ consists of all these coded messages multiplied by their corresponding beamformers. In the example shown in Fig. 3, the transmit vector $X(S, P)$ for $S = \{1, 2, 3, 4, 5, 6\}$, $P_1 = \{1, 2, 3\}$ and $P_2 = \{4, 5, 6\}$ is generated as

$$X(S, P) = w^{S,P}_{1,2} \tilde{X}_{1,2} + w^{S,P}_{1,3} \tilde{X}_{1,3} + w^{S,P}_{2,3} \tilde{X}_{2,3} + w^{S,P}_{4,5} \tilde{X}_{4,5} + w^{S,P}_{4,6} \tilde{X}_{4,6} + w^{S,P}_{5,6} \tilde{X}_{5,6}.$$

Note that here $\beta$ controls the number of coded messages aimed at each user. For example, if we allow $\beta = \alpha = 5$, then there will be a total of $\binom{6}{2} = 15$ coded messages transmitted in parallel, of which every user would need to decode 5. By contrast, in the example scenario for $\beta = 2$, there are in total 6 parallel coded messages, of which every user needs to decode 2.
Finally, the beamformers are optimized to deliver each coded message to its intended users at the highest common rate, considering interference from the other terms as well as noise. The optimum beamformers are denoted by $\{w^{S,P}_T, T \in \Omega^{S,P}\}^*$ for a specific partitioning $P$ of the set $S$. The inner loop in the algorithm (line 8) ensures that the above procedure is repeated for all possible partitionings of a given $S$ in a TDMA manner (for example, in Fig. 3, all the 10 possible partitionings in the table should be considered), and finally the outer loop repeats this process for all possible $(t + \alpha)$-subsets $S$.
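The two loops described above can be illustrated with a small enumeration. The following is a minimal sketch, not the authors' implementation: the helper names `partitions_into_groups` and `multicast_sets` are hypothetical, and the example only reproduces the counts quoted in the text for $t = 1$, $\alpha = 5$, $\beta = 2$.

```python
from itertools import combinations


def partitions_into_groups(items, group_size):
    """Yield every partition of `items` into unordered groups of `group_size`."""
    items = sorted(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    # Pinning the smallest element to the first group avoids listing the
    # same partition once per ordering of its groups.
    for mates in combinations(rest, group_size - 1):
        group = (first,) + mates
        remaining = [x for x in rest if x not in mates]
        for tail in partitions_into_groups(remaining, group_size):
            yield [group] + tail


def multicast_sets(partition, t):
    """Omega^{S,P}: all (t+1)-subsets lying entirely inside one group."""
    return [T for group in partition for T in combinations(group, t + 1)]


t, alpha, beta = 1, 5, 2
S = [1, 2, 3, 4, 5, 6]  # one (t + alpha)-subset of the K = 8 users
all_partitions = list(partitions_into_groups(S, t + beta))
print(len(all_partitions))  # number of partitionings of S (inner loop length)

example = [(1, 2, 3), (4, 5, 6)]
print(multicast_sets(example, t))  # user pairs served in parallel for this P
```

For the example set, the enumeration yields the 10 partitionings mentioned for Fig. 3, and the partitioning $P_1 = \{1,2,3\}$, $P_2 = \{4,5,6\}$ yields 6 multicast sets.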
The following theorem characterizes the achievable delivery rate of this algorithm. A detailed analysis of the algorithm elements and the corresponding performance analysis are provided in the proof that follows.
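Algorithm 1
 1: [...]
 2: [...]
 3: for all $S \subseteq [K]$, $|S| = t + \alpha$ do
 4:   for all $P = \{P_i\}_{i=1,\dots,\delta}$ : $\dot{\bigcup}_{i=1,\dots,\delta} P_i = S$, $|P_i| = t + \beta$ do
 5:     for all $T \in \Omega^{S,P}$ do
 6:       [...]
 7:     end for
 8:     [...]
 9:     $\{w^{S,P}_T, T \in \Omega^{S,P}\}^* = \arg\max_{\{w^{S,P}_T, T \in \Omega^{S,P}\}} \min_{k \in S} R^k_{MAC}\big(S, P, \{w^{S,P}_T, T \in \Omega^{S,P}\}\big)$
10:     $X(S, P) \leftarrow \sum_{T \in \Omega^{S,P}} w^{S,P}_T \tilde{X}_T$
11:     transmit $X(S, P)$ with the rate $\min_{k \in S} R^k_{MAC}\big(S, P, \{w^{S,P}_T, T \in \Omega^{S,P}\}^*\big)$
12:   end for
13: end for
14: end procedure

Theorem 1. Algorithm 1 will result in the following symmetric rate:

$$R_{sym} = \frac{F}{\sum_{S \subseteq [K],\, |S| = t+\alpha} \; \sum_{P = \{P_i\}_{i=1,\dots,\delta} :\, \dot{\bigcup}_{i=1,\dots,\delta} P_i = S,\, |P_i| = t+\beta} T^*_C(S, P)} \qquad (36)$$

where $T^*_C(S, P)$ is the optimized transmission time to the subset $S$ for a specific partitioning $P$. The outer sum is over all possible $(t + \alpha)$-sets $S$, while the inner sum collects all disjoint unions $\dot{\bigcup} P_i$ of $S$ such that $|P_i| = t + \beta$, given $\delta := \frac{t+\alpha}{t+\beta} \in \mathbb{N}$. Each $T^*_C(S, P)$ is optimized over a set of multicast beamformers $\{w^{S,P}_T, T \in \Omega^{S,P}\}$:

$$T^*_C(S, P) = \frac{F / \big(\binom{K}{t}\Gamma\big)}{\max_{\{w^{S,P}_T, T \in \Omega^{S,P}\}} \min_{k \in S} R^k_{MAC}\big(S, P, \{w^{S,P}_T, T \in \Omega^{S,P}\}\big)} \qquad (37)$$

where $R^k_{MAC}$ is the generalized stream-specific rate expression for user $k$, given as

$$R^k_{MAC}\big(S, P, \{w^{S,P}_T, T \in \Omega^{S,P}\}\big) = \min_{B \subseteq \Omega^{S,P}_k,\, B \neq \emptyset} \frac{1}{|B|} \log\Big(1 + \sum_{T \in B} \gamma^k_T\Big) \qquad (38)$$

and where

$$\gamma^k_T = \frac{|h_k^H w^{S,P}_T|^2}{\sum_{\bar{T} \in \bar{\Omega}^{S,P}_k} |h_k^H w^{S,P}_{\bar{T}}|^2 + N_0} \qquad (39)$$

$$\Omega^{S,P}_k := \{T \in \Omega^{S,P} : k \in T\}, \qquad \bar{\Omega}^{S,P}_k := \Omega^{S,P} \setminus \Omega^{S,P}_k \qquad (40)$$

and $\Omega^{S,P} := \bigcup_{i=1,\dots,\delta} \{T \subseteq P_i, |T| = t + 1\}$.

The SINR expressions in (39) are non-convex, and hence, they need to be relaxed and approximated in a successive manner, similarly to (12)-(13). First, (39) is relaxed into the inequality (41). The R.H.S. of (41) is a convex quadratic-over-linear function, which can be linearly approximated and lower bounded (sets $S$, $P$ omitted) around $\bar{w}_T$ and $\bar{\gamma}^k_T$, which denote the fixed values (points of approximation) of the corresponding variables from the previous iteration.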
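The linearization step mentioned in the preceding paragraph follows a standard first-order argument, sketched here for reference. The sketch assumes that the relaxed constraint compares the interference-plus-noise power with the quadratic-over-linear term $|h_k^H w_T|^2 / \gamma^k_T$; this form and the channel symbol $h_k$ are assumptions and may differ from the paper's own (41)-(42).

```latex
% Assumed form of the relaxed SINR constraint (sets S, P omitted):
\sum_{\bar{T} \in \bar{\Omega}_k} |h_k^{H} w_{\bar{T}}|^2 + N_0
  \;\le\; \frac{|h_k^{H} w_T|^2}{\gamma_T^k}

% f(w, \gamma) = |h^H w|^2 / \gamma is jointly convex for \gamma > 0, so its
% first-order Taylor expansion at (\bar{w}_T, \bar{\gamma}_T^k) is a global
% lower bound:
\frac{|h_k^{H} w_T|^2}{\gamma_T^k} \;\ge\;
  \frac{2\,\mathrm{Re}\{\bar{w}_T^{H} h_k h_k^{H} w_T\}}{\bar{\gamma}_T^k}
  - \frac{|h_k^{H} \bar{w}_T|^2}{(\bar{\gamma}_T^k)^2}\,\gamma_T^k
```

Replacing the quadratic-over-linear term by this affine lower bound yields a convex inner approximation of the constraint, which is tight at the point of approximation.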
Before going to the proof of Theorem 1, let us revisit the simple scenarios introduced in Section III and relate each of them to the generic algorithm above. By inserting the parameters listed below into (36)-(38), the corresponding scenario-specific symmetric rate expressions given in Section III can be recovered. The proof of Theorem 1 is given in the following.
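Proof. In the cache content placement phase, each file is divided into $\binom{K}{t}$ subfiles as follows

$$W_n = \{W_{n,\tau} : \tau \subseteq [K], |\tau| = t\}, \quad \forall n \in [N] \qquad (43)$$

and each subfile is further divided into $\Gamma$ mini-files

$$W_{n,\tau} = \{W^j_{n,\tau} : j = 1, \dots, \Gamma\} \qquad (44)$$

where $\Gamma$ is given in (45). In the original coded caching scheme of [8], there are $\binom{K}{t+1}$ coded messages (called coded subfiles, each of size equal to a subfile) which should be delivered to all $(t+1)$-subsets of the users $[K]$, i.e., $X_T := \oplus_{k \in T} W_{d_k, T \setminus \{k\}}$ should be delivered to all members of $T$ for all $T \subseteq [K]$, $|T| = t + 1$.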
Since in our construction (inner and outer loops in Algorithm 1) each $(t + 1)$-subset appears multiple times, we need to transmit smaller coded messages (called coded mini-files, each of size equal to a mini-file) in each appearance, which ensures that delivering each coded mini-file provides the targeted users with fresh (not transmitted before) mini-files they require. This is the main reason behind dividing each subfile into $\Gamma$ mini-files. To this end, we define the operator NEW(.), which, when applied to a subfile, returns the next fresh mini-file of that subfile, which is then used in forming the coded mini-files. More specifically, we have

$$\mathrm{NEW}(W_{n,\tau}) = W^{j+1}_{n,\tau} \qquad (46)$$

if the last application of NEW on the subfile $W_{n,\tau}$ had returned $W^j_{n,\tau}$. Next, we describe how these tasks are fulfilled with the help of multi-antenna interference management.
Let us focus on a specific $(t + \alpha)$-subset of the users, namely $S$, and a specific partitioning of this subset, namely $P = \{P_i\}_{i=1,\dots,\delta}$ : $\dot{\bigcup}_{i=1,\dots,\delta} P_i = S$, $|P_i| = t + \beta$. Then, $\Omega^{S,P}$ is the collection of all $(t + 1)$-subsets of $S$ such that each subset is contained inside a group $P_i$ of the partition.
Then, the sum of the coded mini-files of these $(t + 1)$-subsets, multiplied by the corresponding beamformers, will be transmitted to the users in $S$ in the form of the transmit signal

$$X(S, P) \leftarrow \sum_{T \in \Omega^{S,P}} w^{S,P}_T \tilde{X}^{S,P}_T \qquad (47)$$

where $\tilde{X}^{S,P}_T$ is ensured to be a coded mini-file composed of fresh mini-files for each involved user.
Assume that all the involved coded mini-files are successfully received at their intended users.
Then, all the subsets $T \in \Omega^{S,P}$ will receive one coded mini-file, containing a fresh mini-file for each user in $T$. It can be easily verified that, if we go over all the possible $(t + \alpha)$-subsets and their corresponding partitionings, each $(t + 1)$-subset of $[K]$ will appear $\Gamma$ times (given in (45)), and due to the appropriate mini-file indexing, each user will be able to decode a fresh mini-file in each transmission shot. Thus, these coded mini-files constitute the whole coded subfile. As this is true for all the $(t + 1)$-subsets of $[K]$, all the original tasks of [8] are fulfilled.
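The claim that every $(t + 1)$-subset appears the same number of times can be checked by brute force. The sketch below is illustrative only: the function names are hypothetical, and the closed-form count in `gamma_closed_form` comes from a direct counting argument (choose the rest of $S$, then the rest of the group containing $T$, then partition the remaining users), so it is an assumed expression that may be written differently in (45).

```python
from collections import Counter
from itertools import combinations
from math import comb, factorial


def partitions_into_groups(items, group_size):
    """Yield every partition of `items` into unordered groups of `group_size`."""
    items = sorted(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for mates in combinations(rest, group_size - 1):
        remaining = [x for x in rest if x not in mates]
        for tail in partitions_into_groups(remaining, group_size):
            yield [(first,) + mates] + tail


def appearance_counts(K, t, alpha, beta):
    """Count how often each (t+1)-subset is served over all S and P."""
    counts = Counter()
    for S in combinations(range(1, K + 1), t + alpha):
        for P in partitions_into_groups(S, t + beta):
            for group in P:
                for T in combinations(group, t + 1):
                    counts[T] += 1
    return counts


def gamma_closed_form(K, t, alpha, beta):
    """Assumed closed form for the number of appearances of a fixed T."""
    delta = (t + alpha) // (t + beta)
    rest = (t + alpha) - (t + beta)  # users of S outside the group holding T
    ways_rest = factorial(rest) // (
        factorial(delta - 1) * factorial(t + beta) ** (delta - 1)
    )
    return comb(K - t - 1, alpha - 1) * comb(alpha - 1, beta - 1) * ways_rest


counts = appearance_counts(K=8, t=1, alpha=5, beta=2)
print(len(counts), set(counts.values()), gamma_closed_form(8, 1, 5, 2))
```

If the brute-force counts are all equal and match the closed form, each subfile can be split into that many mini-files and every transmission carries a fresh one.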
It just remains to be proven that by transmitting $X(S, P)$ with the rate stated in Theorem 1, all the users in $S$ will be able to decode their desired coded mini-files. Consider a user $k \in S$, which happens to be in the group $P_i$ of the partitioning $P$. Then, it is clear that this user will be interested in the coded mini-files $\tilde{X}^{S,P}_T$ such that $T \in \Omega^{S,P}_k$, and all the remaining coded mini-files $\tilde{X}^{S,P}_T$, $T \in \bar{\Omega}^{S,P}_k$, will appear as interference to this user. Thus, this user faces a Gaussian MAC with $|\Omega^{S,P}_k|$ desired terms, $|\bar{\Omega}^{S,P}_k|$ interference terms, and a noise term. Clearly, by restricting the transmission rate to the achievable Gaussian MAC rate in (38), this user can decode all the desired terms with an equal rate. Since we are transmitting a common message of size $F / \big(\binom{K}{t}\Gamma\big)$ to the users in $S$ at the rate of the worst user, all of them will be able to decode it within the minimum delivery time given in (37).
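The equal-rate decoding argument can be made concrete with a short numerical sketch. It assumes the standard symmetric-rate point of a Gaussian MAC with interference treated as noise, i.e., the minimum over all non-empty subsets of desired streams of the subset sum rate divided by the subset size; the function name, the base-2 logarithm and the toy channel values are illustrative assumptions rather than the paper's exact expression.

```python
from itertools import combinations
from math import log2


def inner(h, w):
    """Hermitian inner product h^H w for equal-length complex sequences."""
    return sum(hi.conjugate() * wi for hi, wi in zip(h, w))


def mac_symmetric_rate(h_k, desired, interfering, noise=1.0):
    """Equal rate at which user k can decode every desired stream.

    `desired` and `interfering` are lists of beamforming vectors; at least
    one desired beamformer is required.
    """
    interference = sum(abs(inner(h_k, w)) ** 2 for w in interfering)
    sinr = [abs(inner(h_k, w)) ** 2 / (noise + interference) for w in desired]
    rates = []
    # One constraint per non-empty subset of the desired streams.
    for size in range(1, len(sinr) + 1):
        for subset in combinations(sinr, size):
            rates.append(log2(1 + sum(subset)) / size)
    return min(rates)


# Toy example (made-up numbers): two desired streams, one interfering stream.
h_k = [1 + 0j, 1j]
desired = [[1 + 0j, 0j], [0j, 1 + 0j]]
interfering = [[0.5 + 0j, 0j]]
rate = mac_symmetric_rate(h_k, desired, interfering)
print(round(rate, 4))
```

The number of subsets grows as $2^{|\Omega^{S,P}_k|} - 1$, which is why limiting the overlap through $\beta$ keeps the per-user rate region small.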
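Finally, since each user decodes one requested file at the end, the symmetric (per-user) rate of the proposed scheme will be $R_{sym} = F/T$, where the total time $T$ can be derived as

$$T = \sum_{S \subseteq [K],\, |S| = t+\alpha} \; \sum_{P = \{P_i\}_{i=1,\dots,\delta} :\, \dot{\bigcup}_{i=1,\dots,\delta} P_i = S,\, |P_i| = t+\beta} T^*_C(S, P) \qquad (48)$$

where $T^*_C(S, P)$ is the transmission time for the given subset $S$ and partitioning $P$.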
The following Degrees of Freedom (DoF) analysis of the proposed scheme shows that the DoF depends only on $\alpha$ and is independent of $\beta$. By choosing $\alpha = L$, the order-optimal DoF can be achieved (see [27]).
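Corollary 1. The DoF of the rate derived in the above theorem is

$$DoF = \frac{t + \alpha}{K - t}. \qquad (50)$$

Proof. The DoF is defined as

$$DoF := \lim_{SNR \to \infty} \frac{R_{sym}}{\log SNR},$$

which is evaluated through a chain of equalities (a)-(c), where (a) is due to the number of terms in the inner and outer summations of (36), which concludes the proof.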
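The independence of the DoF from $\beta$ can be verified numerically with exact arithmetic. The sketch below assumes that, at high SNR with interference suppressed, the equal MAC rate of each user scales as $\log SNR / |\Omega^{S,P}_k|$ with $|\Omega^{S,P}_k| = \binom{t+\beta-1}{t}$, and it obtains $\Gamma$ by double counting the triples $(S, P, T)$; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def high_snr_dof(K, t, alpha, beta):
    """Per-user DoF implied by the counting argument (exact fraction)."""
    if (t + alpha) % (t + beta):
        raise ValueError("t + beta must divide t + alpha")
    delta = (t + alpha) // (t + beta)
    n_subsets = comb(K, t + alpha)  # outer-sum terms
    n_partitions = factorial(t + alpha) // (
        factorial(delta) * factorial(t + beta) ** delta
    )  # inner-sum terms
    streams_per_user = comb(t + beta - 1, t)  # |Omega_k|
    # Every (t+1)-subset of [K] is served Gamma times in total.
    total_messages = n_subsets * n_partitions * delta * comb(t + beta, t + 1)
    gamma = Fraction(total_messages, comb(K, t + 1))
    # Time per transmission, in units of F / log(SNR).
    time_per_shot = Fraction(streams_per_user) / (comb(K, t) * gamma)
    total_time = n_subsets * n_partitions * time_per_shot
    return 1 / total_time


for beta in (1, 2, 5):
    print(beta, high_snr_dof(K=6, t=1, alpha=5, beta=beta))
print(high_snr_dof(K=6, t=1, alpha=1, beta=1))
```

For $K = 6$ and $t = 1$, the result is the same for every admissible $\beta$ and reproduces the per-user slopes $2/5$ ($\alpha = 1$) and $6/5$ ($\alpha = 5$) quoted for the six-user example.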
Also, we characterize the results in [25] and the max-min SNR beamforming baseline scheme as special cases of Theorem 1 in the following remark.
Remark 1. In Theorem 1, if we set $\alpha = \beta = \min(L, K - t)$ and the beamforming vectors are chosen based on the zero-forcing principle, the interference terms vanish, and the result reduces to that of [25]. Furthermore, if we set $\alpha = \beta = 1$, the result reduces to the baseline max-min SNR beamforming scheme.
Moreover, the complexity of the optimization problem is characterized in the following remark.
Remark 2. All of the constraints involved can be rewritten as second-order cones (SOCs).
The SINR and transmit power constraints are readily in SOC form. However, the MAC sum rate constraints involving exponents (as seen, e.g., in (15)) require some additional steps for the complete SOC formulation [36]. In the general case, the complexity of the beamformer design (36) is largely dominated by the number of simultaneously transmitted messages, that is, the partitioning size $|\Omega^{S,P}_k| = \binom{t+\beta-1}{t}$. The number of MAC rate region constraints increases exponentially with $\beta + t$. However, the size of each SOC constraint involved with the MAC region is fairly small. On the other hand, the complexity of the SINR constraints scales quadratically with $\alpha + 1$ and $L$ [37]. It should be noted that the beamformer design can be split into $\binom{K}{\alpha+t}$ parallel problems, which greatly improves the optimization latency and the individual problem complexity as $\alpha$ is decreased. The receiver complexity is mostly affected by the parameter $\beta$, i.e., whether or not SIC is needed. From the receiver perspective, $\beta > 1$ indicates the number of desired multicast messages decoded at each user using the SIC receiver structure.

Remark 3. The above discussion is for parameter values such that $t + \beta$ divides $t + \alpha$. In general, one can vary $\alpha$ and $\beta$ such that this condition holds; however, if this cannot be ensured, a readily available option is always to set $\beta = \alpha$, which, by choosing $\alpha = L$, will achieve the full DoF. For other cases where $t + \beta$ does not divide $t + \alpha$, extending the above techniques to arrive at satisfactory finite-SNR performance is challenging due to the asymmetries arising in the combinatorial nature of the problem. Therefore, this problem is left as an interesting topic for further research.
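The divisibility condition of Remark 3 restricts which $(\alpha, \beta)$ pairs are usable for a given setup. The following sketch lists the admissible pairs; the function name is hypothetical, and the upper limit $\alpha \le \min(L, K - t)$ is an assumption taken from the subset-size limit $\min(t + L, K)$ discussed earlier.

```python
def feasible_pairs(K, L, t):
    """List all (alpha, beta) with 1 <= beta <= alpha and (t+beta) | (t+alpha)."""
    alpha_max = min(L, K - t)  # assumed upper limit on alpha
    return [
        (alpha, beta)
        for alpha in range(1, alpha_max + 1)
        for beta in range(1, alpha + 1)
        if (t + alpha) % (t + beta) == 0
    ]


pairs = feasible_pairs(K=6, L=5, t=1)
print(pairs)
```

For $K = 6$, $L = 5$ and $t = 1$, the choice $\alpha = 5$ admits $\beta \in \{1, 2, 5\}$, and $\beta = \alpha$ is always admissible, in line with the fallback option mentioned in the remark.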

V. NUMERICAL EXAMPLES
The numerical examples are generated for various combinations of the parameters $L$, $K$, $N$, $M$ and $|S|$, including Scenarios 1-4. The channels are considered to be i.i.d. complex Gaussian.
The average performance is obtained over 500 independent channel realizations. The SNR is defined as $P/N_0$, where $P$ is the power budget and $N_0 = 1$ is the fixed noise floor. All the Matlab codes are available online at https://github.com/kalesan/sim-cc-miso-bc. Fig. 4 shows the performance of the interference coordination with CC in Scenario 1, with $K = 3$ users and $L = 2$ antennas. It can be seen that the proposed CC multicast beamforming scheme via SCA, denoted as CC-SCA, achieves a gain of 3 to 5 dB at low SNR as compared to ZF with equal power loading [25]. At high SNR, ZF with optimal power loading in (16) achieves comparable performance, while the other schemes have a significant performance gap. In the low SNR regime, the simple MaxMin SNR multicasting with CC (labelled as 'CC-SCA ($\alpha = \beta = 1$)') has similar performance to the CC-SCA scheme with full overlap between multicast streams ($\alpha = \beta = 2$). This is due to the fact that, at low SNR, an efficient strategy for beamforming is to concentrate all available power on a single (multicast) stream at a time and to serve different users/streams in a TDMA fashion. Due to the simultaneous global CC gain and inter-stream interference handling, both the CC-SCA and CC-ZF schemes achieve an additional DoF, which was already shown (for high SNR) in [12], [25]. The unicasting scheme does not perform well in this scenario as it does not utilize the global caching gain (only the local cache).
In Fig. 5, the number of transmit antennas is increased to $L = 3$. This provides more than 3 dB of additional gain for the CC-BF at low SNR, when compared to the $L = 2$ antenna scenario, while the DoF is the same for all the compared schemes. The optimal ZF multicast beamformer solution is no longer trivial, as the additional antenna makes the interference-free signal space two-dimensional for the ZF schemes. A heuristic solution is used, where orthogonal projection is first employed to obtain the interference-free signal space, and then the strongest eigenvector of the stacked user channel matrix, projected onto the null space, is used to obtain a sufficiently good direction within the interference-free signal space. It can be seen that the ZF scheme does achieve the same DoF as the CC-BF method, but there is a constant performance gap at high SNR. Interestingly, the CC-BF scheme with $L = 2$ antennas has better performance than MaxMin SINR unicast with $L = 3$ antennas. Both schemes have the same DoF, but the global caching gain is more beneficial than the additional spatial DoF of the unicast method. The results demonstrate that, at low SNR, the performance loss from using the highly suboptimal ZF criterion can be more than 10 dB. Furthermore, all 3 parametrizations provide almost identical performance. In the high SNR region, the asymptotic DoF (slope) is the same for all cases, while the low-complexity $\beta = 1$ case suffers from an SNR penalty of about 5 dB, which in turn can be alleviated by using a higher overlap ($\beta = 2$) among the parallel multicast streams.

VI. CONCLUSIONS
Multicasting opportunities provided by caching at the user terminals were utilized to devise an efficient multiantenna transmission scheme with CC. General multicast beamforming strategies for content delivery with any values of the problem parameters, i.e., the number of users $K$, library size $N$, cache size $M$, number of antennas $L$, size of the user subset $t + \alpha$, and the overlap among the multicast messages $\beta$, were employed, optimally balancing the detrimental impact of both noise and inter-stream interference from coded messages transmitted in parallel. Furthermore, the DoF was shown to depend only on $\alpha$ while being independent of $\beta$. The schemes were shown to perform significantly better than several baseline schemes over the entire SNR region.
A. Tölli and J. Kaleva are with the Centre for Wireless Communications, University of Oulu, P.O. Box 4500, FIN-90014 University of Oulu, Finland ({antti.tolli,jarkko.kaleva}@oulu.fi). S. P. Shariatpanahi is with the Department of Electrical and Computer Engineering, University of Tehran, Tehran 14176-14418, Iran, and also with the School of Computer Science, Institute for Research in Fundamental Sciences, Tehran 19538-33511, Iran (pooya@ipm.ir). B. Khalaj is with the Department of Electrical Engineering, Sharif University of Technology, Tehran, Iran (khalaj@sharif.edu). Parts of this work have been published in the 2018 IEEE International Symposium on Information Theory, the 2018 International Workshop on Content Caching and Delivery in Wireless Networks and the 2018 Asilomar Conference on Signals, Systems and Computers. This work was supported in part by the Academy of Finland grants No. 279101 and 319059, as well as the 6Genesis Flagship grant No. 318927. arXiv:1711.03364v3 [cs.IT] 22 Dec 2018

Similarly, the transmitted signals to the three remaining user subsets $\{1, 2, 4\}$, $\{1, 3, 4\}$ and $\{2, 3, 4\}$ are $w_1 \tilde{A}_3 + w_2 B_3 + w_4 D_1$, $w_1 \tilde{A}_4 + w_3 C_2 + w_4 D_2$ and $w_2 B_4 + w_3 C_4 + w_4 D_3$, respectively. Finally, the symmetric rate will be $F/(T_{1,2,3} + T_{1,2,4} + T_{1,3,4} + T_{2,3,4})$.

C. Scenario 3: $L \ge 3$, $K = 4$, $N = 4$, $M = 1$ and $|S| = 3$

In this example, a reduced-complexity alternative for Scenario 2 is considered. Instead of fixing the size of the served user set to $|S| = 4$ as in Scenario 2, we restrict the size of the subsets $S \subset [4]$ benefiting from a common transmitted signal to $|S| = 3$. Thus, the size of the MAC channel for each user is reduced from 3 to 2, and each user needs to decode just 2 multicast streams. This, in turn, reduces the complexity of the problem of determining the beamforming vectors for each subset $S \subset [4]$. As will be shown later, besides complexity reduction, controlling the size of each subset allows us to handle the trade-off between the multiplexing and multicast beamforming gains due to multiple transmit antennas, resulting in even better rate performance at certain SNR values. Note that for $|S| = 2$, the beamformer design for each subset $S$ reduces

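In the proof of Corollary 1, the numbers of terms in the inner and outer summations are $\frac{(t+\alpha)!}{\delta! \, ((t+\beta)!)^{\delta}}$ and $\binom{K}{t+\alpha}$, respectively, and since $\lim_{SNR \to \infty} \big(\log SNR \times T^*_C(S, P)\big)$ does not depend on the particular $S$ and $P$, (a) is valid for any of the $S$ and $P$ indexed in the summations. Also, (b) follows from (37), and (c) follows from evaluating the limit $\lim_{SNR \to \infty} R^k_{MAC}\big(S, P, \{w^{S,P}_T, T \in \Omega^{S,P}\}\big) / \log SNR$.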

Fig. 2 illustrates some subset selection possibilities for $K = 5$. For the six-user ($K = 6$) scenario shown in Fig. 7, there are four possible subset sizes $|S| \in \{2, 3, 4, 5\}$ that can be used to reduce the serving set size for multicast transmission in $S$. From Fig. 7, we can observe again that, by reducing the subset size to $|S| = 5$ or 4, the average symmetric rate per user can even be improved at medium SNR as compared to the case where all users are served simultaneously, i.e., $|S| = 6$. In the high SNR region, however, the reduced subset cases become highly suboptimal, as the spatial DoF for transmitting parallel streams is limited by $\alpha$. The high SNR slope of each curve in Fig. 7 is equivalent to the user-specific DoF given in (50), ranging from $2/5$ ($\alpha = 1$) to $6/5$ ($\alpha = 5$). From the complexity reduction perspective, the multicast mode with the smallest subset size providing close-to-optimal performance should be selected. In Fig. 7, for example, the subset sizes $|S| = 3$, $|S| = 4$ and $|S| = 5$ could be used up to 0 dB, 10 dB and 30 dB, respectively, for an optimal performance-complexity trade-off. In Fig. 8, the impact of the parameter $\beta \in \{1, 2, 5\}$, controlling the overlap among the parallel multicast messages, is assessed with a fixed $\alpha = 5$ for both the SCA and ZF methods. The CC-ZF plots are generated by imposing a zero-interference constraint similarly to (16). The results with $\alpha = 5$ and $\beta = 1$ represent the case with no overlap, and hence, SIC is not required at the receivers. The multicast transmission is split into a total of 15 time slots to cover all disjoint