Given a number of places in the tree that need to calculate this value
explicitly, might as well just create a macro for it.
(akpm: must be implemented as a macro for callee typeof() usage)
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a proper prototype for vty_init() in include/linux/vt_kern.h
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ad a proper prototype for migration_init() in include/linux/fs.h
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a proper prototype for signals_init() in include/linux/signal.h
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- All implementations can be __devinit
- The function prototypes were in asm/timex.h but they all must be the same,
so create a single declaration in linux/timex.h.
- uninline the sparc64 version to match the other architectures
- Don't bother #defining ARCH_HAS_READ_CURRENT_TIMER to a particular value.
[ezk@cs.sunysb.edu: fix build]
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch contains the scheduled removal of OSS drivers whose config
options have been removed in 2.6.23.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a proper prototype for show_interrupts() in include/linux/interrupt.h
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This allows a flag to be set on loop devices so that when they are
closed for the last time, they'll self-destruct.
In general, so that we can automatically allocate loop devices (as with
losetup -f) and have them disappear when we're done with them.
In particular, right now, so that we can stop relying on the hackish
special-case in umount(8) which kills off loop devices which were set up by
'mount -oloop'. That means we can stop putting crap in /etc/mtab which
doesn't belong there, which means it can be a symlink to /proc/mounts, which
means yet another writable file on the root filesystem is eliminated and the
'stateless' folks get happier... and OLPC trac #356 can be closed.
The mount(8) side of that is at
http://marc.info/?l=util-linux-ng&m=119362955431694&w=2
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Cc: Bernardo Innocenti <bernie@codewiz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The 32-bit version is more efficient (and apparently gives better hash
results than the 64-bit version), so users who are only hashing a 32-bit
quantity can now opt to use the 32-bit version explicitly, rather than
promoting to a long.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The source and destination addresses are included to allow channel
selection based on address alignment.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Pass a full set of flags to drivers' per-operation 'prep' routines.
Currently the only flag passed is DMA_PREP_INTERRUPT. The expectation is
that arch-specific async_tx_find_channel() implementations can exploit this
capability to find the best channel for an operation.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Reviewed-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
The tx_set_src and tx_set_dest methods were originally implemented to allow
an array of addresses to be passed down from async_xor to the dmaengine
driver while minimizing stack overhead. Removing these methods allows
drivers to have all transaction parameters available at 'prep' time, saves
two function pointers in struct dma_async_tx_descriptor, and reduces the
number of indirect branches..
A consequence of moving this data to the 'prep' routine is that
multi-source routines like async_xor need temporary storage to convert an
array of linear addresses into an array of dma addresses. In order to keep
the same stack footprint of the previous implementation the input array is
reused as storage for the dma addresses. This requires that
sizeof(dma_addr_t) be less than or equal to sizeof(void *). As a
consequence CONFIG_DMADEVICES now depends on !CONFIG_HIGHMEM64G. It also
requires that drivers be able to make descriptor resources available when
the 'prep' routine is polled.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Remove the unused ASYNC_TX_ASSUME_COHERENT flag. Async_tx is
meant to hide the difference between asynchronous hardware and synchronous
software operations, this flag requires clients to understand cache
coherency consequences of the async path.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
qc->n_iter was used for libata's own sg walking before sg chaining
replaced it. During conversion, the field and its usage in sata_fsl
were left behind. Kill the filed and update sata_fsl.
tj: This was part of James's libata-use-block-layer-padding patch.
Separated out by me.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Li Yang <leoli@freescale.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Marvell's Orion SoC includes SATA controllers based on Marvell's
PCI-to-SATA 88SX controllers. This patch extends the libATA sata_mv
driver to support those controllers.
[edited to use linux/ata_platform.h -jg]
Signed-off-by: Saeed Bishara <saeed@marvell.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Iterating through a device node's parents is simple enough, but dealing
with the refcounts properly is a little ugly, and replicating that logic
is asking for someone to get it wrong or forget it all together, eg:
while (dn != NULL) {
/* loop body */
tmp = of_get_parent(dn);
of_node_put(dn);
dn = tmp;
}
So add of_get_next_parent(), inspired by of_get_next_child(). The
contract is that it returns the parent and drops the reference on the
current node, this makes the loop look like:
while (dn != NULL) {
/* loop body */
dn = of_get_next_parent(dn);
}
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
egrep serial /proc/acpi/battery/BAT0/info
serial number: 32090
serial number can tell you from the imminent danger
of beeing set on fire.
Signed-off-by: maximilian attems <max@stro.at>
Acked-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
ide-cris.c:
* Add cris_setup_ports() helper and use it instead of ide_setup_ports()
(fixes random value being set in ->io_ports[IDE_IRQ_OFFSET]).
buddha.c:
* Add buddha_setup_ports() helper and use it instead of ide_setup_ports().
falconide.c:
* Add falconide_setup_ports() helper and use it instead of ide_setup_ports(),
also fix return value of falconide_init() while at it.
gayle.c:
* Add gayle_setup_ports() helper and use it instead of ide_setup_ports().
macide.c:
* Add macide_setup_ports() helper and use it instead of ide_setup_ports()
(fixes incorrect value being set in ->io_ports[IDE_IRQ_OFFSET]).
q40ide.c:
* Fix q40_ide_setup_ports() comments.
ide.c:
* Remove no longer needed ide_setup_ports().
Cc: Mikael Starvik <starvik@axis.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
This is Palmchip BK3710 IDE controller support.
The IDE controller logic supports PIO, MultiWord-DMA and Ultra-DMA modes.
Supports interface to Compact Flash (CF) configured in True-IDE mode.
Bart:
- remove dead code
- fix ide_hwif_setup_dma() build problem
Signed-off-by: Anton Salnikov <asalnikov@ru.mvista.com>
Reviewed-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Reviewed-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Required by next patch to use it from the flow classifier.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following is an implementation of the Windows Management
Instrumentation (WMI) ACPI interface mapper (PNP0C14).
What it does:
Parses the _WDG method and exports functions to process WMI method calls,
data block query/ set commands (both based on GUID) and does basic event
handling.
How: WMI presents an in kernel interface here (essentially, a minimal
wrapper around ACPI)
(const char *guid assume the 36 character ASCII representation of
a GUID - e.g. 67C3371D-95A3-4C37-BB61-DD47B491DAAB)
wmi_evaluate_method(const char *guid, u8 instance, u32 method_id,
const struct acpi_buffer *in, struct acpi_buffer *out)
wmi_query_block(const char *guid, u8 instance,
struct acpi_buffer *out)
wmi_set_block(const char *guid, u38 instance,
const struct acpi_buffer *in)
wmi_install_notify_handler(acpi_notify_handler handler);
wmi_remove_notify_handler(void);
wmi_get_event_data(u32 event, struct acpi_buffer *out)
wmi_has_guid(const char guid*)
wmi_has_guid() is a helper function to find if a GUID exists or not on the
system (a quick and easy way for WMI dependant drivers to see if the
the method/ block they want exists, since GUIDs are supposed to be unique).
Event handling - allow a WMI based driver to register a notifier handler
for each GUID with WMI. When a notification is sent to a GUID in WMI, the
handler registered with WMI is then called (it is left to the caller to
ask for the WMI event data associated with the GUID, if needed).
What it won't do:
Unicode - The MS article[1] calls for converting between ASCII and Unicode (or
vice versa) if a GUID is marked as "string". This is left up to the calling
driver.
Handle a MOF[1] - the WMI mapper just exports methods, data and events to
userspace. MOF handling is down to userspace.
Userspace interface - this will be added later.
[1] http://www.microsoft.com/whdc/system/pnppwr/wmi/wmi-acpi.mspx
===
ChangeLog
==
v1 (2007-10-02):
* Initial release
v2 (2007-10-05):
* Cleaned up code - split up super "wmi_evaluate_block" -> each external
symbol now handles its own ACPI calls, rather than handing off to
a "super" method (and in turn, is a lot simpler to read)
* Added a find_guid() symbol - return true if a given GUID exists on
the system
* wmi_* functions now return type acpi_status (since they are just
fancy wrappers around acpi_evaluate_object())
* Removed extra debug code
v3 (2007-10-27)
* More code clean up - now passes checkpatch.pl
* Change data block calls - ref MS spec, method ID is not required for
them, so drop it from the function parameters.
* Const'ify guid in the function call parameters.
* Fix _WDG buffer handling - copy the data to our own private structure.
* Change WMI from tristate to bool - otherwise the external functions are
not exported in linux/acpi.h if you try to build WMI as a module.
* Fix more flag comparisons.
* Add a maintainers entry - since I wrote this, I should take the blame
for it.
v4 (2007-10-30)
* Add missing brace from after fixing checkpatch errors.
* Rewrote event handling - allow external drivers to register with WMI to
handle WMI events
* Clean up flags and sanitise flag handling
v5 (2007-11-03)
* Add sysfs interface for userspace. Export events over netlink again.
* Remove module left overs, fully convert to built-in driver.
* Tweak in-kernel API to use u8 for instance, since this is what the GUID
blocks use (so instance cannot be greater than u8).
* Export wmi_get_event_data() for in kernel WMI drivers.
v6 (2007-11-07)
* Split out userspace into a different patch
v7 (2007-11-20)
* Fix driver to handle multiple PNP0C14 devices - store all GUIDs using
the kernel's built in list functions, and just keep adding to the list
every time we handle a PNP0C14 devices - GUIDs will always be unique,
and WMI callers do not know or care about different devices.
* Change WMI event handler registration to use its' own event handling
struct; we should not pass an acpi_handle down to any WMI based drivers
- they should be able to function with only the calls provided in WMI.
* Update my e-mail address
v8 (2007-11-28)
* Convert back to a module.
* Update Kconfig to default to building as a module.
* Remove an erroneous printk.
* Simply comments for string flag (since we now leave the handling to the
caller).
v9 (2007-12-07)
* Add back missing MODULE_DEVICE_TABLE for autoloading
* Checkpatch fixes
v10 (2007-12-12)
* Workaround broken GUIDs declared expensive without a WCxx method.
* Minor cleanups
v11 (2007-12-17)
* More fixing for broken GUIDs declared expensive without a WCxx method.
* Add basic EmbeddedControl region handling.
v12 (2007-12-18)
* Changed EC region handling code, as per Alexey's comments.
v13 (2007-12-27)
* Changed event handling so that we can have one event handler registered
per GUID, as per Matthew Garrett's suggestion.
v14 (2008-01-12)
* Remove ACPI debug statements
v15 (2008-02-01)
* Replace two remaining 'x == NULL' type tests with '!x'
v16 (2008-02-05)
* Change MAINTAINERS entry, as I am not, and never have been, paid to work
on WMI
* Remove 'default' line from Kconfig
Signed-off-by: Carlos Corbacho <carlos@strangeworlds.co.uk>
CC: Matthew Garrett <mjg59@srcf.ucam.org>
CC: Alexey Starikovskiy <aystarik@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
m68k allmodconfig gives
drivers/net/wireless/b43/main.c:251: error: implicit declaration of function 'mmiowb'
because CONFIG_B43=m, CONFIG_SSB_PCIHOST=n.
Might be Kconfig bustage, but this works...
Cc: Michael Buesch <mb@bu3sch.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* 'agp-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6:
agp: remove flush_agp_mappings calls from new flush handling code
intel-agp: introduce IS_I915 and do some cleanups..
[intel_agp] fix name for G35 chipset
intel-agp: fixup resource handling in flush code.
intel-agp: add new chipset ID
agp: remove unnecessary pci_dev_put
agp: remove uid comparison as security check
fix AGP warning
agp/intel: Add chipset flushing support for i8xx chipsets.
intel-agp: add chipset flushing support
agp: add chipset flushing support to AGP interface
Add some new card definitions and fix a typo (from Eugen Paiuc).
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make it possible to unregister a led classdev object in a safe way during a
suspend/resume cycle.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Michael Buesch <mb@bu3sch.de>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg KH <greg@kroah.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make it possible to unregister a Hardware Random Number Generator
device object in a safe way during a suspend/resume cycle.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Michael Buesch <mb@bu3sch.de>
Cc: Michael Buesch <mb@bu3sch.de>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg KH <greg@kroah.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make it possible to unregister a misc device object in a safe way during a
suspend/resume cycle.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Michael Buesch <mb@bu3sch.de>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg KH <greg@kroah.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace latency.c use with pm_qos_params use.
Signed-off-by: mark gross <mgross@linux.intel.com>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Jaroslav Kysela <perex@suse.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The following patch is a generalization of the latency.c implementation done
by Arjan last year. It provides infrastructure for more than one parameter,
and exposes a user mode interface for processes to register pm_qos
expectations of processes.
This interface provides a kernel and user mode interface for registering
performance expectations by drivers, subsystems and user space applications on
one of the parameters.
Currently we have {cpu_dma_latency, network_latency, network_throughput} as
the initial set of pm_qos parameters.
The infrastructure exposes multiple misc device nodes one per implemented
parameter. The set of parameters implement is defined by pm_qos_power_init()
and pm_qos_params.h. This is done because having the available parameters
being runtime configurable or changeable from a driver was seen as too easy to
abuse.
For each parameter a list of performance requirements is maintained along with
an aggregated target value. The aggregated target value is updated with
changes to the requirement list or elements of the list. Typically the
aggregated target value is simply the max or min of the requirement values
held in the parameter list elements.
>From kernel mode the use of this interface is simple:
pm_qos_add_requirement(param_id, name, target_value):
Will insert a named element in the list for that identified PM_QOS
parameter with the target value. Upon change to this list the new target is
recomputed and any registered notifiers are called only if the target value
is now different.
pm_qos_update_requirement(param_id, name, new_target_value):
Will search the list identified by the param_id for the named list element
and then update its target value, calling the notification tree if the
aggregated target is changed. with that name is already registered.
pm_qos_remove_requirement(param_id, name):
Will search the identified list for the named element and remove it, after
removal it will update the aggregate target and call the notification tree
if the target was changed as a result of removing the named requirement.
>From user mode:
Only processes can register a pm_qos requirement. To provide for
automatic cleanup for process the interface requires the process to register
its parameter requirements in the following way:
To register the default pm_qos target for the specific parameter, the
process must open one of /dev/[cpu_dma_latency, network_latency,
network_throughput]
As long as the device node is held open that process has a registered
requirement on the parameter. The name of the requirement is
"process_<PID>" derived from the current->pid from within the open system
call.
To change the requested target value the process needs to write a s32
value to the open device node. This translates to a
pm_qos_update_requirement call.
To remove the user mode request for a target value simply close the device
node.
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: fix build again]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: mark gross <mgross@linux.intel.com>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Jaroslav Kysela <perex@suse.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Venki Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Adam Belay <abelay@novell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kernel_shutdown_prepare() can now become static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Smack is the Simplified Mandatory Access Control Kernel.
Smack implements mandatory access control (MAC) using labels
attached to tasks and data containers, including files, SVIPC,
and other tasks. Smack is a kernel based scheme that requires
an absolute minimum of application support and a very small
amount of configuration data.
Smack uses extended attributes and
provides a set of general mount options, borrowing technics used
elsewhere. Smack uses netlabel for CIPSO labeling. Smack provides
a pseudo-filesystem smackfs that is used for manipulation of
system Smack attributes.
The patch, patches for ls and sshd, a README, a startup script,
and x86 binaries for ls and sshd are also available on
http://www.schaufler-ca.com
Development has been done using Fedora Core 7 in a virtual machine
environment and on an old Sony laptop.
Smack provides mandatory access controls based on the label attached
to a task and the label attached to the object it is attempting to
access. Smack labels are deliberately short (1-23 characters) text
strings. Single character labels using special characters are reserved
for system use. The only operation applied to Smack labels is equality
comparison. No wildcards or expressions, regular or otherwise, are
used. Smack labels are composed of printable characters and may not
include "/".
A file always gets the Smack label of the task that created it.
Smack defines and uses these labels:
"*" - pronounced "star"
"_" - pronounced "floor"
"^" - pronounced "hat"
"?" - pronounced "huh"
The access rules enforced by Smack are, in order:
1. Any access requested by a task labeled "*" is denied.
2. A read or execute access requested by a task labeled "^"
is permitted.
3. A read or execute access requested on an object labeled "_"
is permitted.
4. Any access requested on an object labeled "*" is permitted.
5. Any access requested by a task on an object with the same
label is permitted.
6. Any access requested that is explicitly defined in the loaded
rule set is permitted.
7. Any other access is denied.
Rules may be explicitly defined by writing subject,object,access
triples to /smack/load.
Smack rule sets can be easily defined that describe Bell&LaPadula
sensitivity, Biba integrity, and a variety of interesting
configurations. Smack rule sets can be modified on the fly to
accommodate changes in the operating environment or even the time
of day.
Some practical use cases:
Hierarchical levels. The less common of the two usual uses
for MLS systems is to define hierarchical levels, often
unclassified, confidential, secret, and so on. To set up smack
to support this, these rules could be defined:
C Unclass rx
S C rx
S Unclass rx
TS S rx
TS C rx
TS Unclass rx
A TS process can read S, C, and Unclass data, but cannot write it.
An S process can read C and Unclass. Note that specifying that
TS can read S and S can read C does not imply TS can read C, it
has to be explicitly stated.
Non-hierarchical categories. This is the more common of the
usual uses for an MLS system. Since the default rule is that a
subject cannot access an object with a different label no
access rules are required to implement compartmentalization.
A case that the Bell & LaPadula policy does not allow is demonstrated
with this Smack access rule:
A case that Bell&LaPadula does not allow that Smack does:
ESPN ABC r
ABC ESPN r
On my portable video device I have two applications, one that
shows ABC programming and the other ESPN programming. ESPN wants
to show me sport stories that show up as news, and ABC will
only provide minimal information about a sports story if ESPN
is covering it. Each side can look at the other's info, neither
can change the other. Neither can see what FOX is up to, which
is just as well all things considered.
Another case that I especially like:
SatData Guard w
Guard Publish w
A program running with the Guard label opens a UDP socket and
accepts messages sent by a program running with a SatData label.
The Guard program inspects the message to ensure it is wholesome
and if it is sends it to a program running with the Publish label.
This program then puts the information passed in an appropriate
place. Note that the Guard program cannot write to a Publish
file system object because file system semanitic require read as
well as write.
The four cases (categories, levels, mutual read, guardbox) here
are all quite real, and problems I've been asked to solve over
the years. The first two are easy to do with traditonal MLS systems
while the last two you can't without invoking privilege, at least
for a while.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Cc: Joshua Brindle <method@manicmethod.com>
Cc: Paul Moore <paul.moore@hp.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: James Morris <jmorris@namei.org>
Cc: "Ahmed S. Darwish" <darwish.07@gmail.com>
Cc: Andrew G. Morgan <morgan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The capability bounding set is a set beyond which capabilities cannot grow.
Currently cap_bset is per-system. It can be manipulated through sysctl,
but only init can add capabilities. Root can remove capabilities. By
default it includes all caps except CAP_SETPCAP.
This patch makes the bounding set per-process when file capabilities are
enabled. It is inherited at fork from parent. Noone can add elements,
CAP_SETPCAP is required to remove them.
One example use of this is to start a safer container. For instance, until
device namespaces or per-container device whitelists are introduced, it is
best to take CAP_MKNOD away from a container.
The bounding set will not affect pP and pE immediately. It will only
affect pP' and pE' after subsequent exec()s. It also does not affect pI,
and exec() does not constrain pI'. So to really start a shell with no way
of regain CAP_MKNOD, you would do
prctl(PR_CAPBSET_DROP, CAP_MKNOD);
cap_t cap = cap_get_proc();
cap_value_t caparray[1];
caparray[0] = CAP_MKNOD;
cap_set_flag(cap, CAP_INHERITABLE, 1, caparray, CAP_DROP);
cap_set_proc(cap);
cap_free(cap);
The following test program will get and set the bounding
set (but not pI). For instance
./bset get
(lists capabilities in bset)
./bset drop cap_net_raw
(starts shell with new bset)
(use capset, setuid binary, or binary with
file capabilities to try to increase caps)
************************************************************
cap_bound.c
************************************************************
#include <sys/prctl.h>
#include <linux/capability.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#ifndef PR_CAPBSET_READ
#define PR_CAPBSET_READ 23
#endif
#ifndef PR_CAPBSET_DROP
#define PR_CAPBSET_DROP 24
#endif
int usage(char *me)
{
printf("Usage: %s get\n", me);
printf(" %s drop <capability>\n", me);
return 1;
}
#define numcaps 32
char *captable[numcaps] = {
"cap_chown",
"cap_dac_override",
"cap_dac_read_search",
"cap_fowner",
"cap_fsetid",
"cap_kill",
"cap_setgid",
"cap_setuid",
"cap_setpcap",
"cap_linux_immutable",
"cap_net_bind_service",
"cap_net_broadcast",
"cap_net_admin",
"cap_net_raw",
"cap_ipc_lock",
"cap_ipc_owner",
"cap_sys_module",
"cap_sys_rawio",
"cap_sys_chroot",
"cap_sys_ptrace",
"cap_sys_pacct",
"cap_sys_admin",
"cap_sys_boot",
"cap_sys_nice",
"cap_sys_resource",
"cap_sys_time",
"cap_sys_tty_config",
"cap_mknod",
"cap_lease",
"cap_audit_write",
"cap_audit_control",
"cap_setfcap"
};
int getbcap(void)
{
int comma=0;
unsigned long i;
int ret;
printf("i know of %d capabilities\n", numcaps);
printf("capability bounding set:");
for (i=0; i<numcaps; i++) {
ret = prctl(PR_CAPBSET_READ, i);
if (ret < 0)
perror("prctl");
else if (ret==1)
printf("%s%s", (comma++) ? ", " : " ", captable[i]);
}
printf("\n");
return 0;
}
int capdrop(char *str)
{
unsigned long i;
int found=0;
for (i=0; i<numcaps; i++) {
if (strcmp(captable[i], str) == 0) {
found=1;
break;
}
}
if (!found)
return 1;
if (prctl(PR_CAPBSET_DROP, i)) {
perror("prctl");
return 1;
}
return 0;
}
int main(int argc, char *argv[])
{
if (argc<2)
return usage(argv[0]);
if (strcmp(argv[1], "get")==0)
return getbcap();
if (strcmp(argv[1], "drop")!=0 || argc<3)
return usage(argv[0]);
if (capdrop(argv[2])) {
printf("unknown capability\n");
return 1;
}
return execl("/bin/bash", "/bin/bash", NULL);
}
************************************************************
[serue@us.ibm.com: fix typo]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Casey Schaufler <casey@schaufler-ca.com>a
Signed-off-by: "Serge E. Hallyn" <serue@us.ibm.com>
Tested-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KaiGai Kohei observed that this line in the linux header is not needed.
Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
Cc: KaiGai Kohei <kaigai@kaigai.gr.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The patch supports legacy (32-bit) capability userspace, and where possible
translates 32-bit capabilities to/from userspace and the VFS to 64-bit
kernel space capabilities. If a capability set cannot be compressed into
32-bits for consumption by user space, the system call fails, with -ERANGE.
FWIW libcap-2.00 supports this change (and earlier capability formats)
http://www.kernel.org/pub/linux/libs/security/linux-privs/kernel-2.6/
[akpm@linux-foundation.org: coding-syle fixes]
[akpm@linux-foundation.org: use get_task_comm()]
[ezk@cs.sunysb.edu: build fix]
[akpm@linux-foundation.org: do not initialise statics to 0 or NULL]
[akpm@linux-foundation.org: unused var]
[serue@us.ibm.com: export __cap_ symbols]
Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: James Morris <jmorris@namei.org>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert b68680e473 to make way for the next
patch: "Add 64-bit capability support to the kernel".
We want to keep the vfs_cap_data.data[] structure, using two 'data's for
64-bit caps (and later three for 96-bit caps), whereas
b68680e473 had gotten rid of the 'data' struct
made its members inline.
The 64-bit caps patch keeps the stack abuse fix at get_file_caps(), which was
the more important part of that patch.
[akpm@linux-foundation.org: coding-style fixes]
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: James Morris <jmorris@namei.org>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: Andrew Morgan <morgan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch modifies the interface to inode_getsecurity to have the function
return a buffer containing the security blob and its length via parameters
instead of relying on the calling function to give it an appropriately sized
buffer.
Security blobs obtained with this function should be freed using the
release_secctx LSM hook. This alleviates the problem of the caller having to
guess a length and preallocate a buffer for this function allowing it to be
used elsewhere for Labeled NFS.
The patch also removed the unused err parameter. The conversion is similar to
the one performed by Al Viro for the security_getprocattr hook.
Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Chris Wright <chrisw@sous-sol.org>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After making dirty a 100M file, the normal behavior is to start the
writeback for all data after 30s delays. But sometimes the following
happens instead:
- after 30s: ~4M
- after 5s: ~4M
- after 5s: all remaining 92M
Some analyze shows that the internal io dispatch queues goes like this:
s_io s_more_io
-------------------------
1) 100M,1K 0
2) 1K 96M
3) 0 96M
1) initial state with a 100M file and a 1K file
2) 4M written, nr_to_write <= 0, so write more
3) 1K written, nr_to_write > 0, no more writes(BUG)
nr_to_write > 0 in (3) fools the upper layer to think that data have all
been written out. The big dirty file is actually still sitting in
s_more_io. We cannot simply splice s_more_io back to s_io as soon as s_io
becomes empty, and let the loop in generic_sync_sb_inodes() continue: this
may starve newly expired inodes in s_dirty. It is also not an option to
draw inodes from both s_more_io and s_dirty, an let the loop go on: this
might lead to live locks, and might also starve other superblocks in sync
time(well kupdate may still starve some superblocks, that's another bug).
We have to return when a full scan of s_io completes. So nr_to_write > 0
does not necessarily mean that "all data are written". This patch
introduces a flag writeback_control.more_io to indicate that more io should
be done. With it the big dirty file no longer has to wait for the next
kupdate invokation 5s later.
In sync_sb_inodes() we only set more_io on super_blocks we actually
visited. This avoids the interaction between two pdflush deamons.
Also in __sync_single_inode() we don't blindly keep requeuing the io if the
filesystem cannot progress. Failing to do so may lead to 100% iowait.
Tested-by: Mike Snitzer <snitzer@gmail.com>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: Michael Rubin <mrubin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After running SetPageUptodate, preceeding stores to the page contents to
actually bring it uptodate may not be ordered with the store to set the
page uptodate.
Therefore, another CPU which checks PageUptodate is true, then reads the
page contents can get stale data.
Fix this by having an smp_wmb before SetPageUptodate, and smp_rmb after
PageUptodate.
Many places that test PageUptodate, do so with the page locked, and this
would be enough to ensure memory ordering in those places if
SetPageUptodate were only called while the page is locked. Unfortunately
that is not always the case for some filesystems, but it could be an idea
for the future.
Also bring the handling of anonymous page uptodateness in line with that of
file backed page management, by marking anon pages as uptodate when they
_are_ uptodate, rather than when our implementation requires that they be
marked as such. Doing allows us to get rid of the smp_wmb's in the page
copying functions, which were especially added for anonymous pages for an
analogous memory ordering problem. Both file and anonymous pages are
handled with the same barriers.
FAQ:
Q. Why not do this in flush_dcache_page?
A. Firstly, flush_dcache_page handles only one side (the smb side) of the
ordering protocol; we'd still need smp_rmb somewhere. Secondly, hiding away
memory barriers in a completely unrelated function is nasty; at least in the
PageUptodate macros, they are located together with (half) the operations
involved in the ordering. Thirdly, the smp_wmb is only required when first
bringing the page uptodate, wheras flush_dcache_page should be called each time
it is written to through the kernel mapping. It is logically the wrong place to
put it.
Q. Why does this increase my text size / reduce my performance / etc.
A. Because it is adding the necessary instructions to eliminate the data-race.
Q. Can it be improved?
A. Yes, eg. if you were to create a rule that all SetPageUptodate operations
run under the page lock, we could avoid the smp_rmb places where PageUptodate
is queried under the page lock. Requires audit of all filesystems and at least
some would need reworking. That's great you're interested, I'm eagerly awaiting
your patches.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add vm.highmem_is_dirtyable toggle
A 32 bit machine with HIGHMEM64 enabled running DCC has an MMAPed file of
approximately 2Gb size which contains a hash format that is written
randomly by the dbclean process. On 2.6.16 this process took a few
minutes. With lowmem only accounting of dirty ratios, this takes about 12
hours of 100% disk IO, all random writes.
Include a toggle in /proc/sys/vm/highmem_is_dirtyable which can be set to 1 to
add the highmem back to the total available memory count.
[akpm@linux-foundation.org: Fix the CONFIG_DETECT_SOFTLOCKUP=y build]
Signed-off-by: Bron Gondwana <brong@fastmail.fm>
Cc: Ethan Solomita <solo@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: WU Fengguang <wfg@mail.ustc.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>