Controlling CPU speed

hgoel
Member
Posts: 89
Joined: Sun Feb 09, 2014 7:11 pm
Libera.chat IRC: hgoel
Location: Within a meter of a computer

Controlling CPU speed

Post by hgoel »

I've been looking for information on changing the CPU speed on recent x86_64 processors in protected mode. My OS runs really well in QEMU but extremely slowly on bare metal (the same CPU). I can only think of two possible causes: either the CPU is underclocked at power-on (this is a laptop, so I think that's possible as a power-saving measure), or the caches are disabled at startup. It doesn't seem like the caches are the problem, since otherwise I should see a speedup once all the code does is a single multiply in a loop (which should obviously fit in cache), but I don't.
"If the truth is a cruel mistress, than a lie must be a nice girl"
Working on Cardinal
Find me at [url=irc://chat.freenode.net:6697/Cardinal-OS]#Cardinal-OS[/url] on freenode!
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Controlling CPU speed

Post by Brendan »

Hi,
hgoel0974 wrote:I've been looking for information on changing the CPU speed on recent x86_64 processors in protected mode. I've got this issue where the OS runs really well in qemu but runs extremely slowly on bare metal (the same CPU), I can only think of two possible issues, either the CPU is under clocked at power on (this is a laptop so I think that's possible as a power saving measure) and maybe the caches are disabled at startup.
Unfortunately ACPI killed sanity. By providing a (hideously ugly and over-complicated) standard interface (ACPI's AML) for software to use, ACPI made it possible for CPU manufacturers to avoid doing it in a simple/clean/architectural way, which mostly resulted in CPU manufacturers all doing "random CPU specific idiocy" with no usable documentation. ;)
hgoel0974 wrote:It doesn't seem like it could be that the caches are disabled since otherwise I should see a speed up once all the code is doing is a single multiply in a loop (which should obviously fit in cache) but I don't.
When caches are disabled, nothing fits in cache and no loop sees a speedup.

Fortunately it's relatively easy to check whether caches are enabled: start with the two cache-control flags in CR0 (CD, bit 30, and NW, bit 29), then check the MTRR MSRs.
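As a rough sketch of that check (not code from this thread), the two steps could look like the C below. The function names are hypothetical, and the decode helpers take raw register values so they can also be exercised outside a kernel. The `read_cr0` and `read_msr` helpers assume GCC-style inline assembly and only work in ring 0. `assert.h` is assumed to be available, which is true in a hosted build but not necessarily in a kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR0_NW (1ul << 29)                 /* Not Write-through */
#define CR0_CD (1ul << 30)                 /* Cache Disable */

#define MSR_IA32_MTRR_DEF_TYPE 0x2FFu
#define MTRR_DEF_TYPE_ENABLE   (1u << 11)  /* E flag: MTRRs enabled */
#define MTRR_DEF_TYPE_MASK     0xFFu       /* default memory type field */
#define MTRR_TYPE_UC 0u                    /* uncacheable */
#define MTRR_TYPE_WB 6u                    /* write-back */

#if defined(__x86_64__) || defined(__i386__)
/* Ring 0 only: both helpers fault if executed in user mode. */
static inline unsigned long read_cr0(void)
{
    unsigned long value;
    __asm__ volatile("mov %%cr0, %0" : "=r"(value));
    return value;
}

static inline uint64_t read_msr(uint32_t msr)
{
    uint32_t lo, hi;
    __asm__ volatile("rdmsr" : "=a"(lo), "=d"(hi) : "c"(msr));
    return ((uint64_t)hi << 32) | lo;
}
#endif

/* Caching is in its normal mode only when both CD and NW are clear. */
bool cr0_caches_enabled(unsigned long cr0)
{
    return (cr0 & (CR0_CD | CR0_NW)) == 0;
}

/* With the MTRRs disabled (E = 0), all physical memory is treated as UC. */
unsigned mtrr_default_type(uint64_t def_type)
{
    if (!(def_type & MTRR_DEF_TYPE_ENABLE))
        return MTRR_TYPE_UC;
    return (unsigned)(def_type & MTRR_DEF_TYPE_MASK);
}

/* Sanity checks on made-up register images (illustrative values only). */
void cache_check_selftest(void)
{
    assert(cr0_caches_enabled(0x80000011ul));            /* PG, ET, PE set */
    assert(!cr0_caches_enabled(0x80000011ul | CR0_CD));  /* CD set */
    assert(mtrr_default_type(0xC06u) == MTRR_TYPE_WB);   /* E, FE, type 6 */
    assert(mtrr_default_type(0x006u) == MTRR_TYPE_UC);   /* E clear */
}
```

In a kernel, the calls would presumably be `cr0_caches_enabled(read_cr0())` and `mtrr_default_type(read_msr(MSR_IA32_MTRR_DEF_TYPE))`. The default type is only part of the picture: the fixed and variable range MTRRs can override it for specific address ranges, including the framebuffer.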


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.

Re: Controlling CPU speed

Post by hgoel »

I've read about ACPI; having an entire bytecode language to parse just for power management seems like overkill. I decided to actually read the IA-32 manual and found information on both controlling CPU speed in an Intel-specific way and enabling the caches. I should have read the manual before asking. Thanks for your answer though!
jnc100
Member
Posts: 775
Joined: Mon Apr 09, 2007 12:10 pm
Location: London, UK

Re: Controlling CPU speed

Post by jnc100 »

Out of interest, how do you know it is running slowly? I presume you dump something to the screen, serial port, etc. at regular intervals, and it is the rate of this that is slower on the actual machine. Could it be your display routine (i.e. interfacing with hardware) that is slower on the physical machine, rather than the CPU itself?

Regards,
John.

Re: Controlling CPU speed

Post by hgoel »

Yes, that's exactly it: the display is updated extremely slowly on hardware compared to the VM.

Re: Controlling CPU speed

Post by jnc100 »

In that case, is the slowdown in the CPU or in the writes to video memory? In other words, do you do lots of calculations on the CPU and then output a character very occasionally, or do you try to output characters really quickly (with few calculations in between)?

If the latter (i.e. your task is I/O-bound), there are a number of possible explanations for why it is slower on real hardware. How are you updating the display? If you are in text mode, are you writing directly to video memory or using BIOS interrupts? Are you reading from video memory before writing back (e.g. to implement scrolling)? Have you configured the paging settings appropriately for the video memory? The MTRRs should already be set up properly by the BIOS, but it doesn't hurt to check.
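To make the paging question concrete, here is a minimal sketch of how the cache-related bits of a 4 KiB page table entry select a memory type through the PAT MSR (IA32_PAT, 0x277). The function and enum names are hypothetical, and the helpers work on plain integer values rather than live page tables. `PAT_POWER_ON_DEFAULT` is the architectural reset value; the second PAT value in the self-test is an assumed example in which entry 1 has been reprogrammed to write-combining, a common choice for framebuffers.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PWT (1ull << 3)  /* page-level write-through */
#define PTE_PCD (1ull << 4)  /* page-level cache disable */
#define PTE_PAT (1ull << 7)  /* PAT bit for 4 KiB pages (large pages use bit 12) */

/* IA32_PAT after reset: entries 0..3 = WB, WT, UC-, UC, repeated for 4..7. */
#define PAT_POWER_ON_DEFAULT 0x0007040600070406ull

enum mem_type {
    MEM_UC = 0, MEM_WC = 1, MEM_WT = 4,
    MEM_WP = 5, MEM_WB = 6, MEM_UC_MINUS = 7
};

/* The three PTE bits form an index into the eight PAT entries. */
unsigned pte_pat_index(uint64_t pte)
{
    return ((pte & PTE_PAT) ? 4u : 0u) |
           ((pte & PTE_PCD) ? 2u : 0u) |
           ((pte & PTE_PWT) ? 1u : 0u);
}

/*
 * Memory type selected by the PTE alone. The effective type also depends
 * on the MTRR covering the physical address, which is not modelled here.
 */
unsigned pte_memory_type(uint64_t pte, uint64_t pat_msr)
{
    return (unsigned)((pat_msr >> (8u * pte_pat_index(pte))) & 0x7u);
}

/* Sanity checks against the reset PAT and an assumed WC-enabled PAT. */
void pat_selftest(void)
{
    uint64_t pat_with_wc = 0x0007040600070106ull;  /* entry 1 = WC */

    assert(pte_memory_type(0, PAT_POWER_ON_DEFAULT) == MEM_WB);
    assert(pte_memory_type(PTE_PCD | PTE_PWT, PAT_POWER_ON_DEFAULT) == MEM_UC);
    assert(pte_memory_type(PTE_PWT, pat_with_wc) == MEM_WC);
}
```

With the reset PAT, no combination of these bits yields write-combining, so a framebuffer mapped through paging ends up write-combined only if the PAT has been reprogrammed or an MTRR already marks the range as WC.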

Regards,
John.

Re: Controlling CPU speed

Post by hgoel »

jnc100 wrote:In that case, is the slow down in the cpu or in the writing to video memory? In other words, do you do lots of calculations on the cpu then output a character very occasionally, or do you try and output them really quickly (with few calculations in between)?
For now I just try to output stuff really quickly, so it's possible you're right and I'm I/O-bound (I'm having GRUB set up a 1920x1080x32 display mode for me :P ).
jnc100 wrote: If the latter (i.e. your task is io bound), there are a number of explanations why it is slower on real hardware. How are you updating the display? If text mode are you writing directly to video memory or using BIOS interrupts? Are you attempting to read from video memory before writing back (e.g. to implement scrolling)? Have you configured the paging settings appropriately for the video memory? MTRRs should be properly set up for you by the BIOS, but it doesn't hurt to check.

Regards,
John.
At the moment I update the display by drawing everything into local (non-framebuffer) memory and then, once I'm done, copying it over with a memcpy. I haven't touched the MTRRs yet because I've been switching over from the PIC and PIT to the APIC, I/O APIC and HPET. I just finished that, so I'll set up the MTRRs now. I think I have paging set up correctly for video memory; however, the original post was written before I had set up paging.
I'm not sure how much work it is to write a driver for the Intel HD Graphics BLT engine. Looking through the docs, it doesn't seem as complicated as I expected, but it's still probably a little way off, since I'm still setting up basic things like multitasking and minimal ACPI.

Re: Controlling CPU speed

Post by jnc100 »

You may wish to optimize your screen-updating routines before trying to implement hardware blitting. Firstly, you probably want to update video memory with only those parts of the back buffer that have actually changed, by keeping track of the updated areas as 'dirty rectangles'. Secondly, when copying large areas of memory, your memcpy routine can probably be improved by using MMX/SSE loads and stores, if support for them is available.
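A minimal sketch of the first suggestion follows, simplified to a single bounding box rather than a list of rectangles. All type and function names here are hypothetical. It assumes 32 bits per pixel, a back buffer and framebuffer of equal dimensions, and a framebuffer pitch that may be larger than `width * 4`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BYTES_PER_PIXEL 4u

/* Half-open rectangle: covers [x0, x1) by [y0, y1). Empty if x0 >= x1. */
struct rect { uint32_t x0, y0, x1, y1; };

struct surface {
    uint8_t *pixels;
    uint32_t width, height;  /* in pixels */
    uint32_t pitch;          /* bytes per row, may include padding */
};

static int rect_is_empty(const struct rect *r)
{
    return r->x0 >= r->x1 || r->y0 >= r->y1;
}

/* Grow the dirty region to the bounding box of itself and r. */
void dirty_add(struct rect *dirty, struct rect r)
{
    if (rect_is_empty(&r))
        return;
    if (rect_is_empty(dirty)) {
        *dirty = r;
        return;
    }
    if (r.x0 < dirty->x0) dirty->x0 = r.x0;
    if (r.y0 < dirty->y0) dirty->y0 = r.y0;
    if (r.x1 > dirty->x1) dirty->x1 = r.x1;
    if (r.y1 > dirty->y1) dirty->y1 = r.y1;
}

/* Copy only the dirty rows and columns, then reset. Returns bytes copied. */
size_t flush_dirty(struct surface *front, const struct surface *back,
                   struct rect *dirty)
{
    struct rect r = *dirty;
    size_t copied = 0;

    assert(front->width == back->width && front->height == back->height);

    /* Clamp so a bad rectangle cannot run past either buffer. */
    if (r.x1 > back->width)  r.x1 = back->width;
    if (r.y1 > back->height) r.y1 = back->height;

    if (!rect_is_empty(&r)) {
        size_t x_off = (size_t)r.x0 * BYTES_PER_PIXEL;
        size_t row_bytes = (size_t)(r.x1 - r.x0) * BYTES_PER_PIXEL;

        for (uint32_t y = r.y0; y < r.y1; y++) {
            memcpy(front->pixels + (size_t)y * front->pitch + x_off,
                   back->pixels + (size_t)y * back->pitch + x_off,
                   row_bytes);
            copied += row_bytes;
        }
    }
    *dirty = (struct rect){0, 0, 0, 0};
    return copied;
}
```

Drawing code would call `dirty_add` for every region it touches, and the frame routine would call `flush_dirty` once per frame. A single bounding box wastes bandwidth when two small changes are far apart, which is where a list of rectangles starts to pay off.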

Regards,
John.

Re: Controlling CPU speed

Post by hgoel »

I spent some time looking through the Intel HD Graphics docs, and it seems very clear that it will be a while before implementing hardware acceleration is a good idea. For now I'm going to follow your suggestion and optimize my drawing routines. I do have the VFPU enabled, so I should really be putting it to use where I can. Currently my memcpy uses rep movsb from glibc, but I can create a version optimized for the graphics code.
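One possible shape for such a graphics-oriented copy is sketched below using SSE2 intrinsics. This is an illustration, not the routine from this thread: the name `copy_to_framebuffer` is hypothetical, and it assumes the kernel has already enabled SSE and saves the SSE state where needed. It falls back to a plain `memcpy` when SSE2 is not available at compile time. Whether non-temporal stores actually help depends on the memory type of the destination, so it would need measuring on the real machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Copy len bytes from a cached back buffer to the framebuffer. */
void *copy_to_framebuffer(void *dst, const void *src, size_t len)
{
    uint8_t *d = dst;
    const uint8_t *s = src;

    assert(dst != NULL && src != NULL);

#if defined(__SSE2__)
    /* Byte-copy until the destination is 16-byte aligned. */
    while (len > 0 && ((uintptr_t)d & 15u) != 0) {
        *d++ = *s++;
        len--;
    }

    /* Unaligned 16-byte loads, non-temporal (streaming) 16-byte stores. */
    while (len >= 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)s);
        _mm_stream_si128((__m128i *)d, chunk);
        d += 16;
        s += 16;
        len -= 16;
    }

    /* Make the streaming stores globally visible before later stores. */
    _mm_sfence();
#endif

    /* Remaining tail bytes, or the whole copy when SSE2 is unavailable. */
    memcpy(d, s, len);
    return dst;
}
```

The alignment loop is there because `_mm_stream_si128` requires a 16-byte-aligned destination, while the source is read with unaligned loads so the back buffer needs no special alignment.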