block, bfq: update to latest bfq-v8-v4.4 state
BFQ-v8r12 up to 887cf43acdb1d5415fa678e4a63be8fe1bab2d3a

Change-Id: I4725397969026ff9fa969d598c4378f24800c31d
Signed-off-by: Alexander Martinz <alex@amartinz.at>

parent 6b7d5107ba
commit 833ce657e4

7 changed files with 4706 additions and 2280 deletions
Documentation/block/00-INDEX
@@ -1,5 +1,7 @@
00-INDEX
	- This file
bfq-iosched.txt
	- BFQ IO scheduler and its tunables
biodoc.txt
	- Notes on the Generic Block Layer Rewrite in Linux 2.5
capability.txt
545	Documentation/block/bfq-iosched.txt (new file)
@@ -0,0 +1,545 @@
BFQ (Budget Fair Queueing)
==========================

BFQ is a proportional-share I/O scheduler, with some extra
low-latency capabilities. In addition to cgroups support (blkio or io
controllers), BFQ's main features are:
- BFQ guarantees a high system and application responsiveness, and a
  low latency for time-sensitive applications, such as audio or video
  players;
- BFQ distributes bandwidth, and not just time, among processes or
  groups (switching back to time distribution when needed to keep
  throughput high).

In its default configuration, BFQ privileges latency over
throughput. So, when needed for achieving a lower latency, BFQ builds
schedules that may lead to a lower throughput. If your main or only
goal, for a given device, is to achieve the maximum-possible
throughput at all times, then do switch off all low-latency heuristics
for that device, by setting low_latency to 0. Full details in Section 3.
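
As an illustration, a minimal C sketch along these lines could switch
the heuristics off for a device named sda (the path below assumes the
usual per-device sysfs layout for legacy I/O-scheduler tunables; adapt
the device name to your system):

  /* Illustrative sketch: disable BFQ's low-latency heuristics on sda. */
  #include <stdio.h>

  int main(void)
  {
	FILE *f = fopen("/sys/block/sda/queue/iosched/low_latency", "w");

	if (!f) {
		perror("low_latency");
		return 1;
	}
	fputs("0\n", f);	/* 0 = pure throughput, 1 = low latency (default) */
	return fclose(f) ? 1 : 0;
  }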

On average CPUs, the current version of BFQ can handle devices
performing at most ~30K IOPS; at most ~50 KIOPS on faster CPUs. As a
reference, 30-50 KIOPS correspond to very high bandwidths with
sequential I/O (e.g., 8-12 GB/s if I/O requests are 256 KB large), and
to 120-200 MB/s with 4KB random I/O.

The table of contents follows. Impatient readers can jump straight to
Section 3.

CONTENTS

1. When may BFQ be useful?
 1-1 Personal systems
 1-2 Server systems
2. How does BFQ work?
3. What are BFQ's tunables?
4. BFQ group scheduling
 4-1 Service guarantees provided
 4-2 Interface

1. When may BFQ be useful?
==========================

BFQ provides the following benefits on personal and server systems.

1-1 Personal systems
--------------------

Low latency for interactive applications

Regardless of the actual background workload, BFQ guarantees that, for
interactive tasks, the storage device is virtually as responsive as if
it were idle. For example, even if one or more of the following
background workloads are being executed:
- one or more large files are being read, written or copied,
- a tree of source files is being compiled,
- one or more virtual machines are performing I/O,
- a software update is in progress,
- indexing daemons are scanning filesystems and updating their
  databases,
starting an application or loading a file from within an application
takes about the same time as if the storage device were idle. As a
comparison, with CFQ, NOOP or DEADLINE, and in the same conditions,
applications experience high latencies, or even become unresponsive
until the background workload terminates (also on SSDs).

Low latency for soft real-time applications

Soft real-time applications, such as audio and video
players/streamers, also enjoy a low latency and a low drop rate,
regardless of the background I/O workload. As a consequence, these
applications suffer almost no glitches due to the background workload.

Higher speed for code-development tasks

If some additional workload happens to be executed in parallel, then
BFQ executes the I/O-related components of typical code-development
tasks (compilation, checkout, merge, ...) much more quickly than CFQ,
NOOP or DEADLINE.

High throughput

On hard disks, BFQ achieves up to 30% higher throughput than CFQ, and
up to 150% higher throughput than DEADLINE and NOOP, with all the
sequential workloads considered in our tests. With random workloads,
and with all the workloads on flash-based devices, BFQ achieves,
instead, about the same throughput as the other schedulers.

Strong fairness, bandwidth and delay guarantees

BFQ distributes the device throughput, and not just the device time,
among I/O-bound applications in proportion to their weights, with any
workload and regardless of the device parameters. From these bandwidth
guarantees, it is possible to compute tight per-I/O-request delay
guarantees by a simple formula. If not configured for strict service
guarantees, BFQ switches to time-based resource sharing (only) for
applications that would otherwise cause a throughput loss.

1-2 Server systems
------------------

Most benefits for server systems follow from the same service
properties as above. In particular, regardless of whether additional,
possibly heavy workloads are being served, BFQ guarantees:

. audio and video-streaming with zero or very low jitter and drop
  rate;

. fast retrieval of web pages and embedded objects;

. real-time recording of data in live-dumping applications (e.g.,
  packet logging);

. responsiveness in local and remote access to a server.


2. How does BFQ work?
=====================

BFQ is a proportional-share I/O scheduler, whose general structure,
plus a lot of code, are borrowed from CFQ.

- Each process doing I/O on a device is associated with a weight and a
  (bfq_)queue.

- BFQ grants exclusive access to the device, for a while, to one queue
  (process) at a time, and implements this service model by
  associating every queue with a budget, measured in number of
  sectors.

- After a queue is granted access to the device, the budget of the
  queue is decremented, on each request dispatch, by the size of the
  request.

- The in-service queue is expired, i.e., its service is suspended,
  only if one of the following events occurs: 1) the queue finishes
  its budget, 2) the queue empties, 3) a "budget timeout" fires.

  - The budget timeout prevents processes doing random I/O from
    holding the device for too long and dramatically reducing
    throughput.

  - Actually, as in CFQ, a queue associated with a process issuing
    sync requests may not be expired immediately when it empties. In
    contrast, BFQ may idle the device for a short time interval,
    giving the process the chance to go on being served if it issues
    a new request in time. Device idling typically boosts the
    throughput on rotational devices, if processes do synchronous
    and sequential I/O. In addition, under BFQ, device idling is
    also instrumental in guaranteeing the desired throughput
    fraction to processes issuing sync requests (see the description
    of the slice_idle tunable in this document, or [1, 2], for more
    details).

    - With respect to idling for service guarantees, if several
      processes are competing for the device at the same time, but
      all processes (and groups, after the following commit) have
      the same weight, then BFQ guarantees the expected throughput
      distribution without ever idling the device. Throughput is
      thus as high as possible in this common scenario.

- If low-latency mode is enabled (default configuration), BFQ
  executes some special heuristics to detect interactive and soft
  real-time applications (e.g., video or audio players/streamers),
  and to reduce their latency. The most important action taken to
  achieve this goal is to give the queues associated with these
  applications more than their fair share of the device
  throughput. For brevity, we call "weight-raising" the whole set
  of actions taken by BFQ to privilege these queues. In
  particular, BFQ provides a milder form of weight-raising for
  interactive applications, and a stronger form for soft real-time
  applications.

- BFQ automatically deactivates idling for queues born in a burst of
  queue creations. In fact, these queues are usually associated with
  the processes of applications and services that benefit mostly
  from a high throughput. Examples are systemd during boot, or git
  grep.

- As in CFQ, BFQ merges queues performing interleaved I/O, i.e.,
  performing random I/O that becomes mostly sequential if
  merged. Differently from CFQ, BFQ achieves this goal with a more
  reactive mechanism, called Early Queue Merge (EQM). EQM is so
  responsive in detecting interleaved I/O (cooperating processes)
  that it enables BFQ to achieve a high throughput, by queue
  merging, even for queues for which CFQ needs a different
  mechanism, preemption, to get a high throughput. As such, EQM is a
  unified mechanism to achieve a high throughput with interleaved
  I/O.

- Queues are scheduled according to a variant of WF2Q+, named
  B-WF2Q+, and implemented using an augmented rb-tree to preserve an
  O(log N) overall complexity. See [2] for more details. B-WF2Q+ is
  also ready for hierarchical scheduling. However, for a cleaner
  logical breakdown, the code that enables and completes
  hierarchical support is provided in the next commit, which focuses
  exactly on this feature.

- B-WF2Q+ guarantees a tight deviation with respect to an ideal,
  perfectly fair, and smooth service. In particular, B-WF2Q+
  guarantees that each queue receives a fraction of the device
  throughput proportional to its weight, even if the throughput
  fluctuates, and regardless of: the device parameters, the current
  workload and the budgets assigned to the queue.

- The last, budget-independence, property (although probably
  counterintuitive in the first place) is definitely beneficial, for
  the following reasons:

  - First, with any proportional-share scheduler, the maximum
    deviation with respect to an ideal service is proportional to
    the maximum budget (slice) assigned to queues. As a consequence,
    BFQ can keep this deviation tight not only because of the
    accurate service of B-WF2Q+, but also because BFQ *does not*
    need to assign a larger budget to a queue to let the queue
    receive a higher fraction of the device throughput.

  - Second, BFQ is free to choose, for every process (queue), the
    budget that best fits the needs of the process, or best
    leverages the I/O pattern of the process. In particular, BFQ
    updates queue budgets with a simple feedback-loop algorithm that
    allows a high throughput to be achieved, while still providing
    tight latency guarantees to time-sensitive applications. When
    the in-service queue expires, this algorithm computes the next
    budget of the queue so as to:

    - Let large budgets be eventually assigned to the queues
      associated with I/O-bound applications performing sequential
      I/O: in fact, the longer these applications are served once
      they get access to the device, the higher the throughput is.

    - Let small budgets be eventually assigned to the queues
      associated with time-sensitive applications (which typically
      perform sporadic and short I/O), because, the smaller the
      budget assigned to a queue waiting for service is, the sooner
      B-WF2Q+ will serve that queue (Subsec 3.3 in [2]).

    (A toy sketch of this feedback idea is given at the end of this
    section.)

- If several processes are competing for the device at the same time,
  but all processes and groups have the same weight, then BFQ
  guarantees the expected throughput distribution without ever idling
  the device. It uses preemption instead. Throughput is then much
  higher in this common scenario.

- ioprio classes are served in strict priority order, i.e.,
  lower-priority queues are not served as long as there are
  higher-priority queues. Among queues in the same class, the
  bandwidth is distributed in proportion to the weight of each
  queue. A very thin extra bandwidth is however guaranteed to
  the Idle class, to prevent it from starving.
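
As promised above, here is a toy, user-space C sketch of the budget
feedback idea. It is not BFQ's actual code (the in-kernel heuristic is
considerably more elaborate); it only illustrates the direction of the
adjustment: a queue that keeps exhausting its budget looks sequential
and gets a larger budget, while a queue that keeps hitting the budget
timeout looks seeky and gets its budget shrunk.

  /* Toy model of per-queue budget feedback; not BFQ's actual algorithm. */
  enum expire_reason { BUDGET_EXHAUSTED, BUDGET_TIMEOUT, QUEUE_EMPTY };

  struct toy_queue {
	unsigned long budget;	/* sectors the queue may consume per slot */
	unsigned long served;	/* sectors actually served in the last slot */
  };

  #define TOY_MAX_BUDGET	16384UL
  #define TOY_MIN_BUDGET	256UL

  static void toy_update_budget(struct toy_queue *q, enum expire_reason why)
  {
	switch (why) {
	case BUDGET_EXHAUSTED:	/* sequential I/O: serve it longer next time */
		q->budget = q->budget * 2 > TOY_MAX_BUDGET ?
			    TOY_MAX_BUDGET : q->budget * 2;
		break;
	case BUDGET_TIMEOUT:	/* seeky I/O: do not let it hold the device */
		q->budget = q->served > TOY_MIN_BUDGET ?
			    q->served : TOY_MIN_BUDGET;
		break;
	case QUEUE_EMPTY:	/* short, sporadic I/O: keep the budget small */
	default:
		break;
	}
  }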


3. What are BFQ's tunables?
===========================

The tunables back_seek_max, back_seek_penalty, fifo_expire_async and
fifo_expire_sync below are the same as in CFQ. Their description is
just copied from that for CFQ. Some considerations in the description
of slice_idle are copied from CFQ too.

per-process ioprio and weight
-----------------------------

Unless the cgroups interface is used (see "4. BFQ group scheduling"),
weights can be assigned to processes only indirectly, through I/O
priorities, and according to the relation:
weight = (IOPRIO_BE_NR - ioprio) * 10.
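
As a quick illustration (IOPRIO_BE_NR is 8 in current kernels),
best-effort priority 0 maps to weight 80, the default priority 4 to
weight 40, and priority 7 to weight 10. A hypothetical helper mirroring
the relation:

  /* Hypothetical helper for the relation above (IOPRIO_BE_NR == 8). */
  static inline int bfq_ioprio_to_weight_example(int ioprio)
  {
	return (8 /* IOPRIO_BE_NR */ - ioprio) * 10;
  }
  /* ioprio 0 -> 80, ioprio 4 (default) -> 40, ioprio 7 -> 10 */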

Beware that, if low-latency is set, then BFQ automatically raises the
weight of the queues associated with interactive and soft real-time
applications. Unset this tunable if you need/want to control weights.

slice_idle
----------

This parameter specifies how long BFQ should idle for the next I/O
request, when certain sync BFQ queues become empty. By default
slice_idle is a non-zero value. Idling has a double purpose: boosting
throughput and making sure that the desired throughput distribution is
respected (see the description of how BFQ works, and, if needed, the
papers referred to there).

As for throughput, idling can be very helpful on highly seeky media
like single-spindle SATA/SAS disks, where we can cut down on the
overall number of seeks and see improved throughput.

Setting slice_idle to 0 will remove all the idling on queues and one
should see an overall improved throughput on faster storage devices
like multiple SATA/SAS disks in a hardware RAID configuration.

So depending on storage and workload, it might be useful to set
slice_idle=0. In general, for SATA/SAS disks and software RAID of
SATA/SAS disks, keeping slice_idle enabled should be useful. For any
configuration where there are multiple spindles behind a single LUN
(host-based hardware RAID controller or storage arrays), setting
slice_idle=0 may result in better throughput and acceptable
latencies.

Idling is however necessary to have service guarantees enforced in
case of differentiated weights or differentiated I/O-request lengths.
To see why, suppose that a given BFQ queue A must get several I/O
requests served for each request served for another queue B. Idling
ensures that, if A makes a new I/O request slightly after becoming
empty, then no request of B is dispatched in the middle, and thus A
does not lose the possibility to get more than one request dispatched
before the next request of B is dispatched. Note that idling
guarantees the desired differentiated treatment of queues only in
terms of I/O-request dispatches. To guarantee that the actual service
order then corresponds to the dispatch order, the strict_guarantees
tunable must be set too.

There is an important flipside for idling: apart from the above cases
where it is beneficial also for throughput, idling can severely impact
throughput. One important case is random workloads. Because of this
issue, BFQ tends to avoid idling as much as possible, when it is not
beneficial also for throughput. As a consequence of this behavior, and
of further issues described for the strict_guarantees tunable,
short-term service guarantees may be occasionally violated. And, in
some cases, these guarantees may be more important than guaranteeing
maximum throughput. For example, in video playing/streaming, a very
low drop rate may be more important than maximum throughput. In these
cases, consider setting the strict_guarantees parameter.

strict_guarantees
-----------------

If this parameter is set (default: unset), then BFQ

- always performs idling when the in-service queue becomes empty;

- forces the device to serve one I/O request at a time, by dispatching a
  new request only if there is no outstanding request.

In the presence of differentiated weights or I/O-request sizes, both
the above conditions are needed to guarantee that every BFQ queue
receives its allotted share of the bandwidth. The first condition is
needed for the reasons explained in the description of the slice_idle
tunable. The second condition is needed because all modern storage
devices reorder internally-queued requests, which may trivially break
the service guarantees enforced by the I/O scheduler.

Setting strict_guarantees may evidently affect throughput.

back_seek_max
-------------

This specifies, in Kbytes, the maximum "distance" for backward seeking.
The distance is the amount of space from the current head location to
the sectors that are backward in terms of distance.

This parameter allows the scheduler to anticipate requests in the
"backward" direction and consider them as being the "next" if they are
within this distance from the current head location.

back_seek_penalty
-----------------

This parameter is used to compute the cost of backward seeking. If the
backward distance of a request is just 1/back_seek_penalty of the
distance of a "front" request, then the seek costs of the two requests
are considered equivalent.

So the scheduler will not bias toward one or the other request
(otherwise it would bias toward the front request). The default value
of back_seek_penalty is 2.
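
Illustratively, with the default back_seek_penalty of 2, a backward
request wins only when its penalized distance is smaller than the
distance of the best forward candidate. A simplified comparison (this
is not the scheduler's actual selection code) could look like:

  /* Toy request-selection rule for backward seeks (simplified). */
  static int toy_prefer_backward(unsigned long back_dist,
				 unsigned long front_dist,
				 unsigned long back_seek_max,
				 unsigned int back_seek_penalty)
  {
	if (back_dist > back_seek_max)	/* too far behind the head */
		return 0;
	/* equal cost when back_dist == front_dist / back_seek_penalty */
	return back_dist * back_seek_penalty < front_dist;
  }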

fifo_expire_async
-----------------

This parameter is used to set the timeout of asynchronous requests. Its
default value is 248ms.

fifo_expire_sync
----------------

This parameter is used to set the timeout of synchronous requests. Its
default value is 124ms. To favor synchronous requests over asynchronous
ones, this value should be decreased relative to fifo_expire_async.

low_latency
-----------

This parameter is used to enable/disable BFQ's low latency mode. By
default, low latency mode is enabled. If enabled, interactive and soft
real-time applications are privileged and experience a lower latency,
as explained in more detail in the description of how BFQ works.

DISABLE this mode if you need full control over bandwidth
distribution. In fact, if it is enabled, then BFQ automatically
increases the bandwidth share of privileged applications, as the main
means to guarantee them a lower latency.

In addition, as already highlighted at the beginning of this document,
DISABLE this mode if your only goal is to achieve a high throughput.
In fact, privileging the I/O of some application over the rest may
entail a lower throughput. To achieve the highest-possible throughput
on a non-rotational device, setting slice_idle to 0 may be needed too
(at the cost of giving up any strong guarantee on fairness and low
latency).

timeout_sync
------------

Maximum amount of device time that can be given to a task (queue) once
it has been selected for service. On devices with costly seeks,
increasing this time usually increases maximum throughput. On the
opposite end, increasing this time coarsens the granularity of the
short-term bandwidth and latency guarantees, especially if the
following parameter is set to zero.

max_budget
----------

Maximum amount of service, measured in sectors, that can be provided
to a BFQ queue once it is set in service (of course within the limits
of the above timeout). As explained in the description of the
algorithm, larger values increase the throughput in proportion to
the percentage of sequential I/O requests issued. The price of larger
values is that they coarsen the granularity of short-term bandwidth
and latency guarantees.

The default value is 0, which enables auto-tuning: BFQ sets max_budget
to the maximum number of sectors that can be served during
timeout_sync, according to the estimated peak rate.
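
For a rough feel of the auto-tuning, assume an estimated peak rate of
150 MB/s and a timeout_sync of 125 ms (both values are only assumptions
for this example): about 18.75 MB, i.e. roughly 36600 512-byte sectors,
can be served within one timeout. A back-of-the-envelope sketch of the
idea (the in-kernel formula differs in its details):

  /* Back-of-the-envelope sketch of the auto-tuning idea, not kernel code. */
  static unsigned long toy_auto_max_budget(unsigned long peak_rate_bps,
					   unsigned long timeout_sync_ms)
  {
	unsigned long long bytes =
		(unsigned long long)peak_rate_bps * timeout_sync_ms / 1000;

	return bytes >> 9;	/* convert bytes to 512-byte sectors */
  }
  /* e.g. toy_auto_max_budget(150000000, 125) ~= 36621 sectors */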

weights
-------

Read-only parameter, used to show the weights of the currently active
BFQ queues.


wr_ tunables
------------

BFQ exports a few parameters to control/tune the behavior of
low-latency heuristics.

wr_coeff

Factor by which the weight of a weight-raised queue is multiplied. If
the queue is deemed soft real-time, then the weight is further
multiplied by an additional, constant factor.

wr_max_time

Maximum duration of a weight-raising period for an interactive task
(ms). If set to zero (default value), then this value is computed
automatically, as a function of the peak rate of the device. In any
case, when the value of this parameter is read, it always reports the
current duration, regardless of whether it has been set manually or
computed automatically.

wr_max_softrt_rate

Maximum service rate below which a queue is deemed to be associated
with a soft real-time application, and is then weight-raised
accordingly (sectors/sec).

wr_min_idle_time

Minimum idle period after which interactive weight-raising may be
reactivated for a queue (in ms).

wr_rt_max_time

Maximum weight-raising duration for soft real-time queues (in ms). The
start time from which this duration is considered is automatically
moved forward if the queue is detected to be still soft real-time
before the current soft real-time weight-raising period finishes.

wr_min_inter_arr_async

Minimum period between I/O request arrivals after which weight-raising
may be reactivated for an already busy async queue (in ms).


4. Group scheduling with BFQ
============================

BFQ supports both cgroups-v1 and cgroups-v2 io controllers, namely
blkio and io. In particular, BFQ supports weight-based proportional
share. To activate cgroups support, set BFQ_GROUP_IOSCHED.

4-1 Service guarantees provided
-------------------------------

With BFQ, proportional share means true proportional share of the
device bandwidth, according to group weights. For example, a group
with weight 200 gets twice the bandwidth, and not just twice the time,
of a group with weight 100.

BFQ supports hierarchies (group trees) of any depth. Bandwidth is
distributed among groups and processes in the expected way: for each
group, the children of the group share the whole bandwidth of the
group in proportion to their weights. In particular, this implies
that, for each leaf group, every process of the group receives the
same share of the whole group bandwidth, unless the ioprio of the
process is modified.

The resource-sharing guarantee for a group may partially or totally
switch from bandwidth to time, if providing bandwidth guarantees to
the group lowers the throughput too much. This switch occurs on a
per-process basis: if a process of a leaf group causes a throughput
loss when served in such a way as to receive its share of the
bandwidth, then BFQ switches back to just time-based proportional
share for that process.

4-2 Interface
-------------

To get proportional sharing of bandwidth with BFQ for a given device,
BFQ must of course be the active scheduler for that device.

Within each group directory, the names of the files associated with
BFQ-specific cgroup parameters and stats begin with the "bfq."
prefix. So, with cgroups-v1 or cgroups-v2, the full prefix for
BFQ-specific files is "blkio.bfq." or "io.bfq." For example, the group
parameter to set the weight of a group with BFQ is blkio.bfq.weight
or io.bfq.weight.

Parameters to set
-----------------

For each group, there is only the following parameter to set.

weight (namely blkio.bfq.weight or io.bfq.weight): the weight of the
group inside its parent. Available values: 1..10000 (default 100). The
linear mapping between ioprio and weights, described at the beginning
of the tunables section, is still valid, but all weights higher than
IOPRIO_BE_NR*10 are mapped to ioprio 0.
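
For instance, assuming a cgroups-v1 blkio hierarchy mounted at
/sys/fs/cgroup/blkio and an already-created group named "example"
(both the mount point and the group name are only assumptions for this
sketch), the weight could be set from C roughly as follows:

  /* Illustrative only: give the hypothetical group "example" weight 500. */
  #include <stdio.h>

  int main(void)
  {
	FILE *f = fopen("/sys/fs/cgroup/blkio/example/blkio.bfq.weight", "w");

	if (!f) {
		perror("blkio.bfq.weight");
		return 1;
	}
	fprintf(f, "%d\n", 500);	/* valid range: 1..10000, default 100 */
	return fclose(f) ? 1 : 0;
  }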

Recall that, if low-latency is set, then BFQ automatically raises the
weight of the queues associated with interactive and soft real-time
applications. Unset this tunable if you need/want to control weights.


[1] P. Valente, A. Avanzini, "Evolution of the BFQ Storage I/O
    Scheduler", Proceedings of the First Workshop on Mobile System
    Technologies (MST-2015), May 2015.
    http://algogroup.unimore.it/people/paolo/disk_sched/mst-2015.pdf

[2] P. Valente and M. Andreolini, "Improving Application
    Responsiveness with the BFQ Disk I/O Scheduler", Proceedings of
    the 5th Annual International Systems and Storage Conference
    (SYSTOR '12), June 2012.
    Slightly extended version:
    http://algogroup.unimore.it/people/paolo/disk_sched/bfq-v1-suite-results.pdf
block/Kconfig.iosched
@@ -54,20 +54,20 @@ config IOSCHED_BFQ
	tristate "BFQ I/O scheduler"
	default n
	---help---
	  The BFQ I/O scheduler tries to distribute bandwidth among
	  all processes according to their weights.
	  It aims at distributing the bandwidth as desired, independently of
	  the disk parameters and with any workload. It also tries to
	  guarantee low latency to interactive and soft real-time
	  applications. If compiled built-in (saying Y here), BFQ can
	  be configured to support hierarchical scheduling.
	  The BFQ I/O scheduler distributes bandwidth among all
	  processes according to their weights, regardless of the
	  device parameters and with any workload. It also guarantees
	  a low latency to interactive and soft real-time applications.
	  Details in Documentation/block/bfq-iosched.txt

config BFQ_GROUP_IOSCHED
	bool "BFQ hierarchical scheduling support"
	depends on CGROUPS && IOSCHED_BFQ=y
	depends on IOSCHED_BFQ && BLK_CGROUP
	default n
	---help---
	  Enable hierarchical scheduling in BFQ, using the blkio controller.

	  Enable hierarchical scheduling in BFQ, using the blkio
	  (cgroups-v1) or io (cgroups-v2) controller.

choice
	prompt "Default I/O scheduler"
block/bfq-cgroup.c
@@ -7,7 +7,9 @@
 * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
 *		      Paolo Valente <paolo.valente@unimore.it>
 *
 * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
 * Copyright (C) 2015 Paolo Valente <paolo.valente@unimore.it>
 *
 * Copyright (C) 2016 Paolo Valente <paolo.valente@linaro.org>
 *
 * Licensed under the GPL-2 as detailed in the accompanying COPYING.BFQ
 * file.

@@ -162,7 +164,7 @@ static struct blkcg_gq *bfqg_to_blkg(struct bfq_group *bfqg)
static struct bfq_group *blkg_to_bfqg(struct blkcg_gq *blkg)
{
	struct blkg_policy_data *pd = blkg_to_pd(blkg, &blkcg_policy_bfq);
	BUG_ON(!pd);

	return pd_to_bfqg(pd);
}
@@ -224,14 +226,6 @@ static void bfqg_stats_update_io_merged(struct bfq_group *bfqg, int rw)
	blkg_rwstat_add(&bfqg->stats.merged, rw, 1);
}

static void bfqg_stats_update_dispatch(struct bfq_group *bfqg,
				       uint64_t bytes, int rw)
{
	blkg_stat_add(&bfqg->stats.sectors, bytes >> 9);
	blkg_rwstat_add(&bfqg->stats.serviced, rw, 1);
	blkg_rwstat_add(&bfqg->stats.service_bytes, rw, bytes);
}

static void bfqg_stats_update_completion(struct bfq_group *bfqg,
			uint64_t start_time, uint64_t io_start_time, int rw)
{

@@ -248,17 +242,11 @@ static void bfqg_stats_update_completion(struct bfq_group *bfqg,
/* @stats = 0 */
static void bfqg_stats_reset(struct bfqg_stats *stats)
{
	if (!stats)
		return;

	/* queued stats shouldn't be cleared */
	blkg_rwstat_reset(&stats->service_bytes);
	blkg_rwstat_reset(&stats->serviced);
	blkg_rwstat_reset(&stats->merged);
	blkg_rwstat_reset(&stats->service_time);
	blkg_rwstat_reset(&stats->wait_time);
	blkg_stat_reset(&stats->time);
	blkg_stat_reset(&stats->unaccounted_time);
	blkg_stat_reset(&stats->avg_queue_size_sum);
	blkg_stat_reset(&stats->avg_queue_size_samples);
	blkg_stat_reset(&stats->dequeue);
@@ -268,19 +256,16 @@ static void bfqg_stats_reset(struct bfqg_stats *stats)
}

/* @to += @from */
static void bfqg_stats_merge(struct bfqg_stats *to, struct bfqg_stats *from)
static void bfqg_stats_add_aux(struct bfqg_stats *to, struct bfqg_stats *from)
{
	if (!to || !from)
		return;

	/* queued stats shouldn't be cleared */
	blkg_rwstat_add_aux(&to->service_bytes, &from->service_bytes);
	blkg_rwstat_add_aux(&to->serviced, &from->serviced);
	blkg_rwstat_add_aux(&to->merged, &from->merged);
	blkg_rwstat_add_aux(&to->service_time, &from->service_time);
	blkg_rwstat_add_aux(&to->wait_time, &from->wait_time);
	blkg_stat_add_aux(&from->time, &from->time);
	blkg_stat_add_aux(&to->unaccounted_time, &from->unaccounted_time);
	blkg_stat_add_aux(&to->avg_queue_size_sum, &from->avg_queue_size_sum);
	blkg_stat_add_aux(&to->avg_queue_size_samples, &from->avg_queue_size_samples);
	blkg_stat_add_aux(&to->dequeue, &from->dequeue);

@@ -308,10 +293,8 @@ static void bfqg_stats_xfer_dead(struct bfq_group *bfqg)
	if (unlikely(!parent))
		return;

	bfqg_stats_merge(&parent->dead_stats, &bfqg->stats);
	bfqg_stats_merge(&parent->dead_stats, &bfqg->dead_stats);
	bfqg_stats_add_aux(&parent->stats, &bfqg->stats);
	bfqg_stats_reset(&bfqg->stats);
	bfqg_stats_reset(&bfqg->dead_stats);
}

static void bfq_init_entity(struct bfq_entity *entity,
@@ -326,21 +309,17 @@ static void bfq_init_entity(struct bfq_entity *entity,
		bfqq->ioprio_class = bfqq->new_ioprio_class;
		bfqg_get(bfqg);
	}
	entity->parent = bfqg->my_entity;
	entity->parent = bfqg->my_entity; /* NULL for root group */
	entity->sched_data = &bfqg->sched_data;
}

static void bfqg_stats_exit(struct bfqg_stats *stats)
{
	blkg_rwstat_exit(&stats->service_bytes);
	blkg_rwstat_exit(&stats->serviced);
	blkg_rwstat_exit(&stats->merged);
	blkg_rwstat_exit(&stats->service_time);
	blkg_rwstat_exit(&stats->wait_time);
	blkg_rwstat_exit(&stats->queued);
	blkg_stat_exit(&stats->sectors);
	blkg_stat_exit(&stats->time);
	blkg_stat_exit(&stats->unaccounted_time);
	blkg_stat_exit(&stats->avg_queue_size_sum);
	blkg_stat_exit(&stats->avg_queue_size_samples);
	blkg_stat_exit(&stats->dequeue);

@@ -351,15 +330,11 @@ static void bfqg_stats_exit(struct bfqg_stats *stats)

static int bfqg_stats_init(struct bfqg_stats *stats, gfp_t gfp)
{
	if (blkg_rwstat_init(&stats->service_bytes, gfp) ||
	    blkg_rwstat_init(&stats->serviced, gfp) ||
	    blkg_rwstat_init(&stats->merged, gfp) ||
	if (blkg_rwstat_init(&stats->merged, gfp) ||
	    blkg_rwstat_init(&stats->service_time, gfp) ||
	    blkg_rwstat_init(&stats->wait_time, gfp) ||
	    blkg_rwstat_init(&stats->queued, gfp) ||
	    blkg_stat_init(&stats->sectors, gfp) ||
	    blkg_stat_init(&stats->time, gfp) ||
	    blkg_stat_init(&stats->unaccounted_time, gfp) ||
	    blkg_stat_init(&stats->avg_queue_size_sum, gfp) ||
	    blkg_stat_init(&stats->avg_queue_size_samples, gfp) ||
	    blkg_stat_init(&stats->dequeue, gfp) ||
@@ -383,11 +358,27 @@ static struct bfq_group_data *blkcg_to_bfqgd(struct blkcg *blkcg)
	return cpd_to_bfqgd(blkcg_to_cpd(blkcg, &blkcg_policy_bfq));
}

static struct blkcg_policy_data *bfq_cpd_alloc(gfp_t gfp)
{
	struct bfq_group_data *bgd;

	bgd = kzalloc(sizeof(*bgd), GFP_KERNEL);
	if (!bgd)
		return NULL;
	return &bgd->pd;
}

static void bfq_cpd_init(struct blkcg_policy_data *cpd)
{
	struct bfq_group_data *d = cpd_to_bfqgd(cpd);

	d->weight = BFQ_DEFAULT_GRP_WEIGHT;
	d->weight = cgroup_subsys_on_dfl(io_cgrp_subsys) ?
		CGROUP_WEIGHT_DFL : BFQ_WEIGHT_LEGACY_DFL;
}

static void bfq_cpd_free(struct blkcg_policy_data *cpd)
{
	kfree(cpd_to_bfqgd(cpd));
}

static struct blkg_policy_data *bfq_pd_alloc(gfp_t gfp, int node)

@@ -398,8 +389,7 @@ static struct blkg_policy_data *bfq_pd_alloc(gfp_t gfp, int node)
	if (!bfqg)
		return NULL;

	if (bfqg_stats_init(&bfqg->stats, gfp) ||
	    bfqg_stats_init(&bfqg->dead_stats, gfp)) {
	if (bfqg_stats_init(&bfqg->stats, gfp)) {
		kfree(bfqg);
		return NULL;
	}
@@ -407,27 +397,20 @@ static struct blkg_policy_data *bfq_pd_alloc(gfp_t gfp, int node)
	return &bfqg->pd;
}

static void bfq_group_set_parent(struct bfq_group *bfqg,
				 struct bfq_group *parent)
{
	struct bfq_entity *entity;

	BUG_ON(!parent);
	BUG_ON(!bfqg);
	BUG_ON(bfqg == parent);

	entity = &bfqg->entity;
	entity->parent = parent->my_entity;
	entity->sched_data = &parent->sched_data;
}

static void bfq_pd_init(struct blkg_policy_data *pd)
{
	struct blkcg_gq *blkg = pd_to_blkg(pd);
	struct bfq_group *bfqg = blkg_to_bfqg(blkg);
	struct bfq_data *bfqd = blkg->q->elevator->elevator_data;
	struct bfq_entity *entity = &bfqg->entity;
	struct bfq_group_data *d = blkcg_to_bfqgd(blkg->blkcg);
	struct blkcg_gq *blkg;
	struct bfq_group *bfqg;
	struct bfq_data *bfqd;
	struct bfq_entity *entity;
	struct bfq_group_data *d;

	blkg = pd_to_blkg(pd);
	BUG_ON(!blkg);
	bfqg = blkg_to_bfqg(blkg);
	bfqd = blkg->q->elevator->elevator_data;
	entity = &bfqg->entity;
	d = blkcg_to_bfqgd(blkg->blkcg);

	entity->orig_weight = entity->weight = entity->new_weight = d->weight;
	entity->my_sched_data = &bfqg->sched_data;
@@ -445,70 +428,53 @@ static void bfq_pd_free(struct blkg_policy_data *pd)
	struct bfq_group *bfqg = pd_to_bfqg(pd);

	bfqg_stats_exit(&bfqg->stats);
	bfqg_stats_exit(&bfqg->dead_stats);

	return kfree(bfqg);
}

/* offset delta from bfqg->stats to bfqg->dead_stats */
static const int dead_stats_off_delta = offsetof(struct bfq_group, dead_stats) -
					offsetof(struct bfq_group, stats);

/* to be used by recursive prfill, sums live and dead stats recursively */
static u64 bfqg_stat_pd_recursive_sum(struct blkg_policy_data *pd, int off)
{
	u64 sum = 0;

	sum += blkg_stat_recursive_sum(pd_to_blkg(pd), &blkcg_policy_bfq, off);
	sum += blkg_stat_recursive_sum(pd_to_blkg(pd), &blkcg_policy_bfq,
				       off + dead_stats_off_delta);
	return sum;
}

/* to be used by recursive prfill, sums live and dead rwstats recursively */
static struct blkg_rwstat bfqg_rwstat_pd_recursive_sum(struct blkg_policy_data *pd,
						       int off)
{
	struct blkg_rwstat a, b;

	a = blkg_rwstat_recursive_sum(pd_to_blkg(pd), &blkcg_policy_bfq, off);
	b = blkg_rwstat_recursive_sum(pd_to_blkg(pd), &blkcg_policy_bfq,
				      off + dead_stats_off_delta);
	blkg_rwstat_add_aux(&a, &b);
	return a;
}

static void bfq_pd_reset_stats(struct blkg_policy_data *pd)
{
	struct bfq_group *bfqg = pd_to_bfqg(pd);

	bfqg_stats_reset(&bfqg->stats);
	bfqg_stats_reset(&bfqg->dead_stats);
}

static struct bfq_group *bfq_find_alloc_group(struct bfq_data *bfqd,
					      struct blkcg *blkcg)
static void bfq_group_set_parent(struct bfq_group *bfqg,
				 struct bfq_group *parent)
{
	struct request_queue *q = bfqd->queue;
	struct bfq_group *bfqg = NULL, *parent;
	struct bfq_entity *entity = NULL;
	struct bfq_entity *entity;
	BUG_ON(!parent);
	BUG_ON(!bfqg);
	BUG_ON(bfqg == parent);

	entity = &bfqg->entity;
	entity->parent = parent->my_entity;
	entity->sched_data = &parent->sched_data;
}

static struct bfq_group *bfq_lookup_bfqg(struct bfq_data *bfqd,
					 struct blkcg *blkcg)
{
	struct blkcg_gq *blkg;

	blkg = blkg_lookup(blkcg, bfqd->queue);
	if (likely(blkg))
		return blkg_to_bfqg(blkg);
	return NULL;
}

static struct bfq_group *bfq_find_set_group(struct bfq_data *bfqd,
					    struct blkcg *blkcg)
{
	struct bfq_group *bfqg, *parent;
	struct bfq_entity *entity;

	assert_spin_locked(bfqd->queue->queue_lock);

	/* avoid lookup for the common case where there's no blkcg */
	if (blkcg == &blkcg_root) {
		bfqg = bfqd->root_group;
	} else {
		struct blkcg_gq *blkg;
	bfqg = bfq_lookup_bfqg(bfqd, blkcg);

		blkg = blkg_lookup_create(blkcg, q);
		if (!IS_ERR(blkg))
			bfqg = blkg_to_bfqg(blkg);
		else /* fallback to root_group */
			bfqg = bfqd->root_group;
	}

	BUG_ON(!bfqg);
	if (unlikely(!bfqg))
		return NULL;

	/*
	 * Update chain of bfq_groups as we might be handling a leaf group
@@ -531,13 +497,18 @@ static struct bfq_group *bfq_find_alloc_group(struct bfq_data *bfqd,
	return bfqg;
}

static void bfq_pos_tree_add_move(struct bfq_data *bfqd, struct bfq_queue *bfqq);
static void bfq_pos_tree_add_move(struct bfq_data *bfqd,
				  struct bfq_queue *bfqq);

static void bfq_bfqq_expire(struct bfq_data *bfqd,
			    struct bfq_queue *bfqq,
			    bool compensate,
			    enum bfqq_expiration reason);

/**
 * bfq_bfqq_move - migrate @bfqq to @bfqg.
 * @bfqd: queue descriptor.
 * @bfqq: the queue to move.
 * @entity: @bfqq's entity.
 * @bfqg: the group to move to.
 *
 * Move @bfqq to @bfqg, deactivating it from its old group and reactivating
@@ -548,26 +519,40 @@ static void bfq_pos_tree_add_move(struct bfq_data *bfqd, struct bfq_queue *bfqq)
 * rcu_read_lock()).
 */
static void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
			  struct bfq_entity *entity, struct bfq_group *bfqg)
			  struct bfq_group *bfqg)
{
	int busy, resume;
	struct bfq_entity *entity = &bfqq->entity;

	busy = bfq_bfqq_busy(bfqq);
	resume = !RB_EMPTY_ROOT(&bfqq->sort_list);

	BUG_ON(resume && !entity->on_st);
	BUG_ON(busy && !resume && entity->on_st &&
	BUG_ON(!bfq_bfqq_busy(bfqq) && !RB_EMPTY_ROOT(&bfqq->sort_list));
	BUG_ON(!RB_EMPTY_ROOT(&bfqq->sort_list) && !entity->on_st);
	BUG_ON(bfq_bfqq_busy(bfqq) && RB_EMPTY_ROOT(&bfqq->sort_list)
	       && entity->on_st &&
	       bfqq != bfqd->in_service_queue);
	BUG_ON(!bfq_bfqq_busy(bfqq) && bfqq == bfqd->in_service_queue);

	if (busy) {
		BUG_ON(atomic_read(&bfqq->ref) < 2);
		/* If bfqq is empty, then bfq_bfqq_expire also invokes
		 * bfq_del_bfqq_busy, thereby removing bfqq and its entity
		 * from data structures related to current group. Otherwise we
		 * need to remove bfqq explicitly with bfq_deactivate_bfqq, as
		 * we do below.
		 */
		if (bfqq == bfqd->in_service_queue)
			bfq_bfqq_expire(bfqd, bfqd->in_service_queue,
					false, BFQ_BFQQ_PREEMPTED);

		if (!resume)
			bfq_del_bfqq_busy(bfqd, bfqq, 0);
		else
			bfq_deactivate_bfqq(bfqd, bfqq, 0);
	} else if (entity->on_st)
	BUG_ON(entity->on_st && !bfq_bfqq_busy(bfqq)
	       && &bfq_entity_service_tree(entity)->idle !=
	       entity->tree);

	BUG_ON(RB_EMPTY_ROOT(&bfqq->sort_list) && bfq_bfqq_busy(bfqq));

	if (bfq_bfqq_busy(bfqq))
		bfq_deactivate_bfqq(bfqd, bfqq, false, false);
	else if (entity->on_st) {
		BUG_ON(&bfq_entity_service_tree(entity)->idle !=
		       entity->tree);
		bfq_put_idle_entity(bfq_entity_service_tree(entity), entity);
	}
	bfqg_put(bfqq_group(bfqq));

	/*
@@ -579,14 +564,17 @@ static void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
	entity->sched_data = &bfqg->sched_data;
	bfqg_get(bfqg);

	if (busy) {
		BUG_ON(RB_EMPTY_ROOT(&bfqq->sort_list) && bfq_bfqq_busy(bfqq));
	if (bfq_bfqq_busy(bfqq)) {
		bfq_pos_tree_add_move(bfqd, bfqq);
		if (resume)
			bfq_activate_bfqq(bfqd, bfqq);
		bfq_activate_bfqq(bfqd, bfqq);
	}

	if (!bfqd->in_service_queue && !bfqd->rq_in_driver)
		bfq_schedule_dispatch(bfqd);
	BUG_ON(entity->on_st && !bfq_bfqq_busy(bfqq)
	       && &bfq_entity_service_tree(entity)->idle !=
	       entity->tree);
}

/**
@@ -613,7 +601,11 @@ static struct bfq_group *__bfq_bic_change_cgroup(struct bfq_data *bfqd,

	lockdep_assert_held(bfqd->queue->queue_lock);

	bfqg = bfq_find_alloc_group(bfqd, blkcg);
	bfqg = bfq_find_set_group(bfqd, blkcg);

	if (unlikely(!bfqg))
		bfqg = bfqd->root_group;

	if (async_bfqq) {
		entity = &async_bfqq->entity;

@@ -621,7 +613,8 @@ static struct bfq_group *__bfq_bic_change_cgroup(struct bfq_data *bfqd,
		bic_set_bfqq(bic, NULL, 0);
		bfq_log_bfqq(bfqd, async_bfqq,
			     "bic_change_group: %p %d",
			     async_bfqq, atomic_read(&async_bfqq->ref));
			     async_bfqq,
			     async_bfqq->ref);
		bfq_put_queue(async_bfqq);
	}
}
@@ -629,7 +622,7 @@ static struct bfq_group *__bfq_bic_change_cgroup(struct bfq_data *bfqd,
	if (sync_bfqq) {
		entity = &sync_bfqq->entity;
		if (entity->sched_data != &bfqg->sched_data)
			bfq_bfqq_move(bfqd, sync_bfqq, entity, bfqg);
			bfq_bfqq_move(bfqd, sync_bfqq, bfqg);
	}

	return bfqg;

@@ -638,25 +631,23 @@ static struct bfq_group *__bfq_bic_change_cgroup(struct bfq_data *bfqd,
static void bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio)
{
	struct bfq_data *bfqd = bic_to_bfqd(bic);
	struct blkcg *blkcg;
	struct bfq_group *bfqg = NULL;
	uint64_t id;
	uint64_t serial_nr;

	rcu_read_lock();
	blkcg = bio_blkcg(bio);
	id = blkcg->css.serial_nr;
	rcu_read_unlock();
	serial_nr = bio_blkcg(bio)->css.serial_nr;

	/*
	 * Check whether blkcg has changed. The condition may trigger
	 * spuriously on a newly created cic but there's no harm.
	 */
	if (unlikely(!bfqd) || likely(bic->blkcg_id == id))
		return;
	if (unlikely(!bfqd) || likely(bic->blkcg_serial_nr == serial_nr))
		goto out;

	bfqg = __bfq_bic_change_cgroup(bfqd, bic, blkcg);
	BUG_ON(!bfqg);
	bic->blkcg_id = id;
	bfqg = __bfq_bic_change_cgroup(bfqd, bic, bio_blkcg(bio));
	bic->blkcg_serial_nr = serial_nr;
out:
	rcu_read_unlock();
}

/**
@@ -668,7 +659,7 @@ static void bfq_flush_idle_tree(struct bfq_service_tree *st)
	struct bfq_entity *entity = st->first_idle;

	for (; entity ; entity = st->first_idle)
		__bfq_deactivate_entity(entity, 0);
		__bfq_deactivate_entity(entity, false);
}

/**

@@ -682,7 +673,7 @@ static void bfq_reparent_leaf_entity(struct bfq_data *bfqd,
	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);

	BUG_ON(!bfqq);
	bfq_bfqq_move(bfqd, bfqq, entity, bfqd->root_group);
	bfq_bfqq_move(bfqd, bfqq, bfqd->root_group);
	return;
}


@@ -716,11 +707,12 @@ static void bfq_reparent_active_entities(struct bfq_data *bfqd,
}

/**
 * bfq_destroy_group - destroy @bfqg.
 * @bfqg: the group being destroyed.
 * bfq_pd_offline - deactivate the entity associated with @pd,
 *		    and reparent its children entities.
 * @pd: descriptor of the policy going offline.
 *
 * Destroy @bfqg, making sure that it is not referenced from its parent.
 * blkio already grabs the queue_lock for us, so no need to use RCU-based magic
 * blkio already grabs the queue_lock for us, so no need to use
 * RCU-based magic
 */
static void bfq_pd_offline(struct blkg_policy_data *pd)
{
@@ -775,10 +767,15 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
	BUG_ON(bfqg->sched_data.next_in_service);
	BUG_ON(bfqg->sched_data.in_service_entity);

	__bfq_deactivate_entity(entity, 0);
	__bfq_deactivate_entity(entity, false);
	bfq_put_async_queues(bfqd, bfqg);
	BUG_ON(entity->tree);

	/*
	 * @blkg is going offline and will be ignored by
	 * blkg_[rw]stat_recursive_sum(). Transfer stats to the parent so
	 * that they don't get lost. If IOs complete after this point, the
	 * stats for them will be lost. Oh well...
	 */
	bfqg_stats_xfer_dead(bfqg);
}

@@ -788,46 +785,35 @@ static void bfq_end_wr_async(struct bfq_data *bfqd)

	list_for_each_entry(blkg, &bfqd->queue->blkg_list, q_node) {
		struct bfq_group *bfqg = blkg_to_bfqg(blkg);
		BUG_ON(!bfqg);

		bfq_end_wr_async_queues(bfqd, bfqg);
	}
	bfq_end_wr_async_queues(bfqd, bfqd->root_group);
}

static u64 bfqio_cgroup_weight_read(struct cgroup_subsys_state *css,
				    struct cftype *cftype)
{
	struct blkcg *blkcg = css_to_blkcg(css);
	struct bfq_group_data *bfqgd = blkcg_to_bfqgd(blkcg);
	int ret = -EINVAL;

	spin_lock_irq(&blkcg->lock);
	ret = bfqgd->weight;
	spin_unlock_irq(&blkcg->lock);

	return ret;
}

static int bfqio_cgroup_weight_read_dfl(struct seq_file *sf, void *v)
static int bfq_io_show_weight(struct seq_file *sf, void *v)
{
	struct blkcg *blkcg = css_to_blkcg(seq_css(sf));
	struct bfq_group_data *bfqgd = blkcg_to_bfqgd(blkcg);
	unsigned int val = 0;

	spin_lock_irq(&blkcg->lock);
	seq_printf(sf, "%u\n", bfqgd->weight);
	spin_unlock_irq(&blkcg->lock);
	if (bfqgd)
		val = bfqgd->weight;

	seq_printf(sf, "%u\n", val);

	return 0;
}

static int bfqio_cgroup_weight_write(struct cgroup_subsys_state *css,
				     struct cftype *cftype,
				     u64 val)
static int bfq_io_set_weight_legacy(struct cgroup_subsys_state *css,
				    struct cftype *cftype,
				    u64 val)
{
	struct blkcg *blkcg = css_to_blkcg(css);
	struct bfq_group_data *bfqgd = blkcg_to_bfqgd(blkcg);
	struct blkcg_gq *blkg;
	int ret = -EINVAL;
	int ret = -ERANGE;

	if (val < BFQ_MIN_WEIGHT || val > BFQ_MAX_WEIGHT)
		return ret;
@@ -871,13 +857,18 @@ static int bfqio_cgroup_weight_write(struct cgroup_subsys_state *css,
	return ret;
}

static ssize_t bfqio_cgroup_weight_write_dfl(struct kernfs_open_file *of,
					     char *buf, size_t nbytes,
					     loff_t off)
static ssize_t bfq_io_set_weight(struct kernfs_open_file *of,
				 char *buf, size_t nbytes,
				 loff_t off)
{
	u64 weight;
	/* First unsigned long found in the file is used */
	return bfqio_cgroup_weight_write(of_css(of), NULL,
					 simple_strtoull(strim(buf), NULL, 0));
	int ret = kstrtoull(strim(buf), 0, &weight);

	if (ret)
		return ret;

	return bfq_io_set_weight_legacy(of_css(of), NULL, weight);
}

static int bfqg_print_stat(struct seq_file *sf, void *v)

@@ -897,16 +888,17 @@ static int bfqg_print_rwstat(struct seq_file *sf, void *v)
static u64 bfqg_prfill_stat_recursive(struct seq_file *sf,
				      struct blkg_policy_data *pd, int off)
{
	u64 sum = bfqg_stat_pd_recursive_sum(pd, off);

	u64 sum = blkg_stat_recursive_sum(pd_to_blkg(pd),
					  &blkcg_policy_bfq, off);
	return __blkg_prfill_u64(sf, pd, sum);
}

static u64 bfqg_prfill_rwstat_recursive(struct seq_file *sf,
					struct blkg_policy_data *pd, int off)
{
	struct blkg_rwstat sum = bfqg_rwstat_pd_recursive_sum(pd, off);

	struct blkg_rwstat sum = blkg_rwstat_recursive_sum(pd_to_blkg(pd),
							   &blkcg_policy_bfq,
							   off);
	return __blkg_prfill_rwstat(sf, pd, &sum);
}
@@ -926,6 +918,41 @@ static int bfqg_print_rwstat_recursive(struct seq_file *sf, void *v)
	return 0;
}

static u64 bfqg_prfill_sectors(struct seq_file *sf, struct blkg_policy_data *pd,
			       int off)
{
	u64 sum = blkg_rwstat_total(&pd->blkg->stat_bytes);

	return __blkg_prfill_u64(sf, pd, sum >> 9);
}

static int bfqg_print_stat_sectors(struct seq_file *sf, void *v)
{
	blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)),
			  bfqg_prfill_sectors, &blkcg_policy_bfq, 0, false);
	return 0;
}

static u64 bfqg_prfill_sectors_recursive(struct seq_file *sf,
					 struct blkg_policy_data *pd, int off)
{
	struct blkg_rwstat tmp = blkg_rwstat_recursive_sum(pd->blkg, NULL,
					offsetof(struct blkcg_gq, stat_bytes));
	u64 sum = atomic64_read(&tmp.aux_cnt[BLKG_RWSTAT_READ]) +
		atomic64_read(&tmp.aux_cnt[BLKG_RWSTAT_WRITE]);

	return __blkg_prfill_u64(sf, pd, sum >> 9);
}

static int bfqg_print_stat_sectors_recursive(struct seq_file *sf, void *v)
{
	blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)),
			  bfqg_prfill_sectors_recursive, &blkcg_policy_bfq, 0,
			  false);
	return 0;
}


static u64 bfqg_prfill_avg_queue_size(struct seq_file *sf,
				      struct blkg_policy_data *pd, int off)
{
@@ -961,38 +988,14 @@ static struct bfq_group *bfq_create_group_hierarchy(struct bfq_data *bfqd, int n
	return blkg_to_bfqg(bfqd->queue->root_blkg);
}

static struct blkcg_policy_data *bfq_cpd_alloc(gfp_t gfp)
{
	struct bfq_group_data *bgd;

	bgd = kzalloc(sizeof(*bgd), GFP_KERNEL);
	if (!bgd)
		return NULL;
	return &bgd->pd;
}

static void bfq_cpd_free(struct blkcg_policy_data *cpd)
{
	kfree(cpd_to_bfqgd(cpd));
}

static struct cftype bfqio_files_dfl[] = {
	{
		.name = "weight",
		.flags = CFTYPE_NOT_ON_ROOT,
		.seq_show = bfqio_cgroup_weight_read_dfl,
		.write = bfqio_cgroup_weight_write_dfl,
	},
	{} /* terminate */
};

static struct cftype bfqio_files[] = {
static struct cftype bfq_blkcg_legacy_files[] = {
	{
		.name = "bfq.weight",
		.read_u64 = bfqio_cgroup_weight_read,
		.write_u64 = bfqio_cgroup_weight_write,
		.flags = CFTYPE_NOT_ON_ROOT,
		.seq_show = bfq_io_show_weight,
		.write_u64 = bfq_io_set_weight_legacy,
	},
	/* statistics, cover only the tasks in the bfqg */
	/* statistics, covers only the tasks in the bfqg */
	{
		.name = "bfq.time",
		.private = offsetof(struct bfq_group, stats.time),
@@ -1000,18 +1003,17 @@ static struct cftype bfqio_files[] = {
	},
	{
		.name = "bfq.sectors",
		.private = offsetof(struct bfq_group, stats.sectors),
		.seq_show = bfqg_print_stat,
		.seq_show = bfqg_print_stat_sectors,
	},
	{
		.name = "bfq.io_service_bytes",
		.private = offsetof(struct bfq_group, stats.service_bytes),
		.seq_show = bfqg_print_rwstat,
		.private = (unsigned long)&blkcg_policy_bfq,
		.seq_show = blkg_print_stat_bytes,
	},
	{
		.name = "bfq.io_serviced",
		.private = offsetof(struct bfq_group, stats.serviced),
		.seq_show = bfqg_print_rwstat,
		.private = (unsigned long)&blkcg_policy_bfq,
		.seq_show = blkg_print_stat_ios,
	},
	{
		.name = "bfq.io_service_time",

@@ -1042,18 +1044,17 @@ static struct cftype bfqio_files[] = {
	},
	{
		.name = "bfq.sectors_recursive",
		.private = offsetof(struct bfq_group, stats.sectors),
		.seq_show = bfqg_print_stat_recursive,
		.seq_show = bfqg_print_stat_sectors_recursive,
	},
	{
		.name = "bfq.io_service_bytes_recursive",
		.private = offsetof(struct bfq_group, stats.service_bytes),
		.seq_show = bfqg_print_rwstat_recursive,
		.private = (unsigned long)&blkcg_policy_bfq,
		.seq_show = blkg_print_stat_bytes_recursive,
	},
	{
		.name = "bfq.io_serviced_recursive",
		.private = offsetof(struct bfq_group, stats.serviced),
		.seq_show = bfqg_print_rwstat_recursive,
		.private = (unsigned long)&blkcg_policy_bfq,
		.seq_show = blkg_print_stat_ios_recursive,
	},
	{
		.name = "bfq.io_service_time_recursive",
@@ -1099,32 +1100,41 @@ static struct cftype bfqio_files[] = {
		.private = offsetof(struct bfq_group, stats.dequeue),
		.seq_show = bfqg_print_stat,
	},
	{
		.name = "bfq.unaccounted_time",
		.private = offsetof(struct bfq_group, stats.unaccounted_time),
		.seq_show = bfqg_print_stat,
	},
	{ } /* terminate */
};

static struct blkcg_policy blkcg_policy_bfq = {
	.dfl_cftypes = bfqio_files_dfl,
	.legacy_cftypes = bfqio_files,

	.pd_alloc_fn = bfq_pd_alloc,
	.pd_init_fn = bfq_pd_init,
	.pd_offline_fn = bfq_pd_offline,
	.pd_free_fn = bfq_pd_free,
	.pd_reset_stats_fn = bfq_pd_reset_stats,

	.cpd_alloc_fn = bfq_cpd_alloc,
	.cpd_init_fn = bfq_cpd_init,
	.cpd_bind_fn = bfq_cpd_init,
	.cpd_free_fn = bfq_cpd_free,

static struct cftype bfq_blkg_files[] = {
	{
		.name = "bfq.weight",
		.flags = CFTYPE_NOT_ON_ROOT,
		.seq_show = bfq_io_show_weight,
		.write = bfq_io_set_weight,
	},
	{} /* terminate */
};

#else
#else /* CONFIG_BFQ_GROUP_IOSCHED */

static inline void bfqg_stats_update_io_add(struct bfq_group *bfqg,
			struct bfq_queue *bfqq, int rw) { }
static inline void bfqg_stats_update_io_remove(struct bfq_group *bfqg,
			int rw) { }
static inline void bfqg_stats_update_io_merged(struct bfq_group *bfqg,
			int rw) { }
static inline void bfqg_stats_update_completion(struct bfq_group *bfqg,
			uint64_t start_time, uint64_t io_start_time, int rw) { }
static inline void
bfqg_stats_set_start_group_wait_time(struct bfq_group *bfqg,
				     struct bfq_group *curr_bfqg) { }
static inline void bfqg_stats_end_empty_time(struct bfqg_stats *stats) { }
static inline void bfqg_stats_update_dequeue(struct bfq_group *bfqg) { }
static inline void bfqg_stats_set_start_empty_time(struct bfq_group *bfqg) { }
static inline void bfqg_stats_update_idle_time(struct bfq_group *bfqg) { }
static inline void bfqg_stats_set_start_idle_time(struct bfq_group *bfqg) { }
static inline void bfqg_stats_update_avg_queue_size(struct bfq_group *bfqg) { }

static void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
			  struct bfq_group *bfqg) {}

static void bfq_init_entity(struct bfq_entity *entity,
			    struct bfq_group *bfqg)
@@ -1139,37 +1149,26 @@ static void bfq_init_entity(struct bfq_entity *entity,
	entity->sched_data = &bfqg->sched_data;
}

static struct bfq_group *
bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio)
{
	struct bfq_data *bfqd = bic_to_bfqd(bic);
	return bfqd->root_group;
}

static void bfq_bfqq_move(struct bfq_data *bfqd,
			  struct bfq_queue *bfqq,
			  struct bfq_entity *entity,
			  struct bfq_group *bfqg)
{
}
static void bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio) {}

static void bfq_end_wr_async(struct bfq_data *bfqd)
{
	bfq_end_wr_async_queues(bfqd, bfqd->root_group);
}

static void bfq_disconnect_groups(struct bfq_data *bfqd)
{
	bfq_put_async_queues(bfqd, bfqd->root_group);
}

static struct bfq_group *bfq_find_alloc_group(struct bfq_data *bfqd,
					      struct blkcg *blkcg)
static struct bfq_group *bfq_find_set_group(struct bfq_data *bfqd,
					    struct blkcg *blkcg)
{
	return bfqd->root_group;
}

static struct bfq_group *bfq_create_group_hierarchy(struct bfq_data *bfqd, int node)
static struct bfq_group *bfqq_group(struct bfq_queue *bfqq)
{
	return bfqq->bfqd->root_group;
}

static struct bfq_group *
bfq_create_group_hierarchy(struct bfq_data *bfqd, int node)
{
	struct bfq_group *bfqg;
	int i;
3590	block/bfq-iosched.c (diff too large to show)
1497	block/bfq-sched.c (diff too large to show)
821	block/bfq.h (diff too large to show)