TCP/IP

Solar · Post by **Solar** » Tue Apr 06, 2004 6:34 am

DennisCGc wrote: If I use a sniffing program, the format of the package is the same as described in the rfcs.

You mean you never encountered an unspecified source address, invalid CRC, too-large / too-small package, or any other inconsistency with what the RFC's tell you?

There are many broken implementations out there that are widely used. You may comply with every RFC there is, but "compliant" doesn't equate to "functional". Think SAMBA: SMB is another protocol that's based on "this is how it works" instead of "this is how it's specified".

DennisCGc wrote: Where is it used? How could the OS work with it?

SMB is a protocol for drive and printer sharing among systems. SMB implementations had a reputation of being buggy. The specs were released to the public only partially, and not at all during a time when Microsoft became the primary vendor for SMB software.

Which meant that, when SAMBA stepped up to the plate to provide an Open Source implementation of the SMB protocol, they had two options:

* strictly following the spec's (and being incompatible with Windows machines); or
* reverse-engineering all the bugs, catches, and assumptions wrought into Windows SMB.

They chose the latter. (Hardly surprising.)

I have to admit that a quick Google for issues with TCP/IP came up empty. But I still strongly remember Holger's statement on "if all you had for implementing TCP/IP were the RFC's, you likely won't even get a connection". I didn't make this one up. If I come across any old records, I'll post them here.

DennisCGc · Post by **DennisCGc** » Wed Apr 07, 2004 7:42 am

Solar wrote:
DennisCGc wrote: If I use a sniffing program, the format of the package is the same as described in the rfcs.
You mean you never encountered an unspecified source address, invalid CRC, too-large / too-small package, or any other inconsistency with what the RFC's tell you?

No, the format is the same as said in the rfcs, the "fields" are the same.

Solar wrote: SMB is a protocol for drive and printer sharing among systems. SMB implementations had a reputation of being buggy. The specs were released to the public only partially, and not at all during a time when Microsoft became the primary vendor for SMB software.

Which meant that, when SAMBA stepped up to the plate to provide an Open Source implementation of the SMB protocol, they had two options:

* strictly following the spec's (and being incompatible with Windows machines); or
* reverse-engineering all the bugs, catches, and assumptions wrought into Windows SMB.

They chose the latter. (Hardly surprising.)

I didn't mean that ::) but the changes in TCP/IP: how could an OS work with them?
Since TCP/IP is the most widely used protocol, changes to the protocol should not be tolerated, that's what I think.

Candy · Post by **Candy** » Wed Apr 07, 2004 8:36 am

DennisCGc wrote:
Solar wrote: You mean you never encountered an unspecified source address, invalid CRC, too-large / too-small package, or any other inconsistency with what the RFC's tell you?
No, the format is the same as said in the rfcs, the "fields" are the same.

That's true. The specification describes packets that conform to TCP; it does not say what you should do with anything else, which includes TCP packets with errors and non-TCP packets. Also, a correct checksum doesn't mean the packet is valid TCP. The point of the checksum is not to label everything else as non-TCP; it's to more or less guarantee the integrity of this packet as a TCP packet with those fields. Most, if not all, of the other "deviations" are caused by bugs or by software not actually implementing TCP.

The only thing you should learn from this style of packet sending is that you should check, double-check and triple-check every byte you get for validity. Is the source address not from a network explicitly connected to a different cable (IP spoofing), is there an invalid combination of flags (or better, check against all the valid combinations and drop the packet if none matches), is the checksum correct, is this the Nth SYN packet from a certain host within X seconds, is this packet part of a connection, and if it isn't, how did it get here, and if it is, does the sequence number correspond, and then check for holes, go-back-N and SR implementation choices, fast retransmission, delay timers...

The point of this sentence was to show you TCP is hard. The only thing you should really learn from it is that you must always distrust things you cannot be 100% sure of, and keep every possible input value within reasonable bounds.
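Two of the checks listed above, the checksum and the flag combination, can be sketched in a few lines. This is a minimal sketch, not a complete validator: the function names are made up for illustration, the checksum follows the one's-complement sum described in RFC 1071 but leaves out the pseudo-header that a real TCP checksum also covers, and the flag rules are just two examples of combinations a stack might choose to drop.

```python
import struct

# TCP flag bits (low six bits of the flags byte).
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


def internet_checksum(data: bytes) -> int:
    """One's-complement checksum over 16-bit big-endian words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def flags_plausible(flags: int) -> bool:
    """Illustrative policy only: reject two obviously bogus combinations."""
    if flags & SYN and flags & (FIN | RST):
        return False                         # SYN never travels with FIN or RST
    if not flags & (SYN | ACK | RST | FIN):
        return False                         # "null" or PSH/URG-only segment
    return True
```

A segment that already carries a correct checksum sums to zero, so `internet_checksum(segment) == 0` is the receive-side test. Passing both checks says nothing about sequence numbers or connection state; those still need their own validation.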

Also, you have a lot of implementation-defined points, and you need a high-quality timer (with very little overhead) for the packet sending.
You might want to make it a dedicated TCP-only timer (one that keeps a list and always appends to the end, which is faster than searching through it).

I've got an idea how to do it, so follow me:

You implement a timer with an add function and a remove function. Each timeout is a static 30 seconds, that is, 3000 hundredths of a second. You keep an int in there. It represents the most recent time you added a packet, or if there is none, the current time minus 3000 (or zero, make that a kludge). You have a function that can tell you the current hundredths count.

Adding a packet takes the current time minus the last-time-added time, giving you the time between the last packet added and this one in hundredths of a second. Add this one to the end of the list with that delay.

Removing a packet is hard. You'd have to traverse the list, which makes it impractical, because it's not O(1). This timer should scale very well, so let's not add an O(n) function to it.
Instead of actually removing it, you add it to a list of removed-packets. This sounds really crooked, but it works nicely.

Every time you get a timer pulse you loop through all the packets with time_to_finish < 10 (nobody will notice a tenth of a second, and this prevents you from requesting a hundredth-of-a-second delay). You compare its identifier with the one at the top of the removed-stack. If they are equal, you remove both and don't resend. If they are not equal, the packet was not responded to within the 30s limit. Resend the packet, increase its count and add it to the back of the list with the add function.

Note that this algorithm assumes you use go-back-N. In selective repeat the acks can appear in any order, so the order of the remove-list can not be guaranteed. However, it's only really useful to use SR on low-bandwidth lines on which the order can be changed.
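The append-only timer with lazy removal described above might look roughly like this. It is a sketch under stated assumptions rather than the exact scheme from the post: the class and method names are invented, it stores absolute deadlines instead of deltas between entries, and it swaps the ordered removed-list for a set of cancelled ids, so it does not depend on acks arriving in order. Packet ids are assumed to be unique among pending entries.

```python
from collections import deque

TIMEOUT = 3000  # 30 s in hundredths of a second, as in the post


class RetransmitTimer:
    def __init__(self):
        self.pending = deque()   # (deadline, packet_id), oldest first
        self.cancelled = set()   # ids acked before their deadline

    def add(self, packet_id, now):
        # Every timeout has the same length, so deadlines only grow and
        # appending keeps the queue sorted without any searching.
        self.pending.append((now + TIMEOUT, packet_id))

    def remove(self, packet_id):
        # O(1): only remember the id; the queue entry is dropped on expiry.
        self.cancelled.add(packet_id)

    def tick(self, now):
        """Return the ids whose timeout expired without being cancelled."""
        expired = []
        while self.pending and self.pending[0][0] <= now:
            _, packet_id = self.pending.popleft()
            if packet_id in self.cancelled:
                self.cancelled.discard(packet_id)   # acked in time
            else:
                expired.append(packet_id)           # caller should resend
        return expired
```

The caller would resend every id returned by `tick()` and call `add()` again for it. One cost of the lazy approach is that a cancelled entry stays in memory until its original deadline passes.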

DennisCGc wrote: I didn't mean that ::) , but the changes in TCP/IP, how could the OS work with it ?
Since TCP/IP is the most used protocol, a change of the protocol should not be tolerated, that's what I think.

Let's try using some of the reserved fields. Add an option field. Change the meaning of one of the fields. There are so many examples of people doing this... (think Cisco with their QoS ideas).

Solar · Post by **Solar** » Wed Apr 07, 2004 8:49 am

Candy wrote:
DennisCGc wrote: I didn't mean that ::) , but the changes in TCP/IP, how could the OS work with it ?
Since TCP/IP is the most used protocol, a change of the protocol should not be tolerated, that's what I think.
Let's try using some of the reserved fields. Add an option field. Change the meaning of one of the fields. There are so many examples of people doing this... (think Cisco with their QoS ideas).

Exactly. TCP/IP has been implemented many, many times over, by many different vendors, and every vendor has made a mistake or two. Mostly in the past, when TCP/IP was young, but it still happens today.

Now, if you're Pro-POS, Clicker, or MenuetOS, making a mistake in TCP/IP is harming no-one but yourself.

But Cisco, IBM and Microsoft have also been making mistakes, and sometimes, instead of going for "clean protocol", others went for "compatibility" - and implemented workarounds for popular TCP/IP bugs. Some of these buggy implementations are still around.

Now, you can step up and cleanly implement the RFC's on TCP/IP, but you might still experience troubles when "talking" to other TCP/IP implementations.

That's basically the case with every data exchange protocol. They might be cleanly specified, but you should still be prepared for the case when you have to accept buggy data and make the best of it.

HTML is another excellent example of a "polluted" protocol. Just implementing the W3C spec's won't give you that much of a browser...

Candy · Post by **Candy** » Wed Apr 07, 2004 10:59 am

Solar wrote: But Cisco, IBM and Microsoft have also been making mistakes, and sometimes, instead of going for "clean protocol", others went for "compatibility" - and implemented workarounds for popular TCP/IP bugs. Some of these buggy implementations are still around.

I believe some old BSD stack reused the TTL from the incoming packet on the outgoing packet. That caused really weird traceroutes: you'd get replies up to hop 13, then nothing until 27, and 28 was another reply. If it was a router, you'd never get a reply to a traceroute packet (TTL = 0, let's send back a timeout message; hey, it's got TTL = 0, let's throw that packet away).

Now, you can step up and cleanly implement the RFC's on TCP/IP, but you might still experience troubles when "talking" to other TCP/IP implementations.

Well, yes. As far as I know, those TCP/IP implementations never had a serious bug in any of the more commonly occurring cases. If they did, they'd never be able to converse with any of the more correct (different) implementations, so they'd have to fall in line. Any minor bug can be quietly ignored or handled through the user interface.

That's basically the case with every data exchange protocol. They might be cleanly specified, but you should still be prepared for the case when you have to accept buggy data and make the best of it.

You always have to accept buggy data. You can get any data, you must be able to handle any data. If you don't you /will/ get an exploit on it.

HTML is another excellent example of a "polluted" protocol. Just implementing the W3C spec's won't give you that much of a browser...

Uhm... HTML was not so much polluted by the browsers or anything. HTML was polluted by the people using it. I've seen people not even understanding what "nesting" is supposed to be, and closing tags in the same order as opening them (hint: should be inverse order). Aside from that, nobody cares, because IE eats it. If you choke, they don't care.

Hate HTML for that... I'd like to make the world an XHTML-only place... (translation for non-geeks: a better place)

DennisCGc · Post by **DennisCGc** » Wed Apr 07, 2004 11:20 am

Ah... I see, I know there's an option field

But I don't mind it.
But TCP/IP is not "polluted" for normal surfing, emailing, downloading, telnetting, etc.
Routers like Cisco's use "different" implementations, as far as I can gather from the posts.

Now, you can step up and cleanly implement the RFC's on TCP/IP, but you might still experience troubles when "talking" to other TCP/IP implementations.

Could be true, that's why I'll implement it first, and then, after heavy debugging, it should be able to communicate with every computer.

PS. I never knew that this existed. I had to do a school project about the Internet for my English class.
Even though I examined the relevant information, I never noticed it ::)

Ozguxxx · Post by **Ozguxxx** » Thu Apr 08, 2004 4:17 am

Oh, people changed this thread very much since I left... TCP being polluted might be correct in the sense that the RFCs leave lots of actions to be taken blank, so every person can implement them in a different way, but I don't believe that it CAN be the most polluted protocol.

The Internet is based on the tier 1 backbones, most of which are universities like UCSB and some old labs like AT&T Bell in the US, and I don't believe that they change their implementations; they most probably change the RFCs, or write new ones, as soon as they see some performance lack in algorithms, and fix them. Since tier 1 backbones are theoretically conservative institutions, and all other Internet traffic is based on them, everybody has to be compatible with their implementations. So you have to at least follow the RFCs while implementing TCP. Consequently I think there is no way you can fail if you follow the RFCs.

The connection between some local ISP and a host (a dial-up connection for instance) is a very LOCAL problem, that is, it depends very highly on network, hardware etc., which can be solved in many different ways, and the RFCs might not tell you how it is done (this is probably wrong), but there will surely be a minimal standard, I think, so that we (I mean amateur people) can get the connection established with low-level programming.

Candy · Post by **Candy** » Thu Apr 08, 2004 4:33 am

Ozgunh82 wrote: The Internet is based on the tier 1 backbones, most of which are universities like UCSB and some old labs like AT&T Bell in the US, and I don't believe that they change their implementations; they most probably change the RFCs, or write new ones, as soon as they see some performance lack in algorithms, and fix them. Since tier 1 backbones are theoretically conservative institutions, and all other Internet traffic is based on them, everybody has to be compatible with their implementations. So you have to at least follow the RFCs while implementing TCP. Consequently I think there is no way you can fail if you follow the RFCs.

The idea behind a backbone is that it operates on the lowest level it can (IP in this case) so the speed you can handle things with is the highest you can get. They don't implement TCP.

The connection between some local ISP and a host (a dial-up connection for instance) is a very LOCAL problem, that is, it depends very highly on network, hardware etc., which can be solved in many different ways, and the RFCs might not tell you how it is done (this is probably wrong), but there will surely be a minimal standard, I think, so that we (I mean amateur people) can get the connection established with low-level programming.

The connection between your ISP and you is both a hardware problem (stick the right connectors in the right holes) and a software problem. The software problem is more of a perceived simple problem. It is based on a complex problem, built up from a lot of simple problems.

The simple problem you perceive is, you don't have an internet connection. The complex problem is that the problem may be in some part because of your TCP/IP/UDP/DHCP/DNS/HTTP/HTML implementation, in part because the other side of the line appears to have a different idea about them, and in part because the line itself is dead. Also, some of these protocols can fail, such as DHCP, because there's no server for instance.

Establishing a connection is a high-level problem, both in getting the protocols right and in getting the user to plug in the cable. Being able to send out bits on a cable in a way you define is a problem that you can solve with low-level programming.

The RFC's give an accurate representation of the protocols, and what parameters they should have, what their legal values are etc. If you don't check them for being legal, then don't be surprised if they aren't. You can theoretically get any bit combination from the cables, and in practice, they're never random. They're 99.99999% (your test data) normal packets, and in only 0.00001% are they specifically constructed to test the code to the limits (my test data). If you fail my tests, you have a TCP implementation that is fully compliant and can do anything, but is not usable. That's the problem we're really talking about.

Solar · Post by **Solar** » Thu Apr 08, 2004 4:44 am

Candy wrote: Uhm... HTML was not so much polluted by the browsers or anything. HTML was polluted by the people using it...

...which would not have been possible if the browsers would have implemented HTML in a "pure" way, i.e. similar to TCP/IP-by-RFC.

People used HTML wrongly, but they were not punished for it because browsers accepted their code anyway.

That's what I was talking about: Not everything that's not 100% pure by-the-RFC-book is rejected by other implementations. Of course you can implement a TCP/IP stack that only accepts data that's by-the-RFC-book, but that's probably like building a browser accepting only "valid HTML"...

Anyways, since I can't back up my claims on TCP/IP with real case studies, I'll shut up. Just keep it in the back of your mind that going after spec's only is something that really works only until your world gets heterogeneous...

Candy · Post by **Candy** » Thu Apr 08, 2004 5:26 am

Solar wrote: People used HTML wrongly, but they were not punished for it because browsers accepted their code anyway.

The point of the user agents is that they should be able to parse all correct data 100% correctly (the original browsers below v5 stumble here), while making the best of data that is not 100% correct. The point for the users is that the user agent turns what they type into what they want to see. Since the users do not aim for compliance, and the user agents do not enforce it, nobody bothers with it. The path of least resistance.

Anyways, since I can't back up my claims on TCP/IP with real case studies, I'll shut up. Just keep it in the back of your mind that going after spec's only is something that really works only until your world gets heterogeneous...

You don't have to shut up about it. The only valid way to make a usable implementation is to build it in a few parts: one section that determines whether the input is what it should be, one that correctly processes all correct packets, and one that processes incorrect packets. Determining what it is and parsing the correct ones are both very doable. The third will keep you debugging for years to come.

Note that this goes for all byte streams and blocks you will parse, ever. For example: if you find the mouse sending way too high X movement counts (i.e. going from a complete stop to max speed), mark the packet as bad. Then handle all normal packets with a fast handler that doesn't check the content, and parse the rest with a heuristic-like function.

HTML-specific, one of the simplest algorithms I've seen to make a non-compliant doc compliant is by patching around the pitfalls. You have a stack that contains the currently open tags, and if the one you try to close is not the most recently opened, but within the N most recently opened, you close all the tags that are in between, close the tag the user wants to close and reopen all the other tags. If you don't find it within N tags, ignore the close. This converts :

 something something something 

to

 something something something 

which you can verify as being both legal and having the same target for the content. I'm not sure how this works out on real-life bad-HTML sites, I don't write them.
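The stack-based repair described above can be sketched as follows. This is a toy under loud assumptions: the function name, the `max_depth` parameter (the "N" from the description) and the regular expression are all made up for illustration, tags with attributes and void elements such as `<br>` are not handled, and the tag names in the usage example are hypothetical rather than taken from the post.

```python
import re

# Matches bare tags only, e.g. <b> or </b>; attributes are not supported.
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)>")


def repair_nesting(html: str, max_depth: int = 4) -> str:
    out = []
    open_tags = []                     # stack of currently open tag names
    pos = 0
    for m in TAG.finditer(html):
        out.append(html[pos:m.start()])
        pos = m.end()
        closing, name = m.group(1) == "/", m.group(2).lower()
        if not closing:
            open_tags.append(name)
            out.append(f"<{name}>")
            continue
        if name not in open_tags[-max_depth:]:
            continue                   # stray close tag: ignore it
        # Close everything opened after the matching tag, innermost first.
        reopened = []
        while open_tags[-1] != name:
            inner = open_tags.pop()
            out.append(f"</{inner}>")
            reopened.append(inner)
        open_tags.pop()
        out.append(f"</{name}>")
        # Reopen the interrupted tags in their original order.
        for inner in reversed(reopened):
            open_tags.append(inner)
            out.append(f"<{inner}>")
    out.append(html[pos:])
    return "".join(out)
```

For instance, `repair_nesting("<b>one <i>two</b> three</i>")` returns `"<b>one <i>two</i></b><i> three</i>"`: the italic run is closed before `</b>` and reopened after it, so the text keeps the same formatting but the nesting becomes legal.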

DennisCGc · Post by **DennisCGc** » Thu Apr 08, 2004 8:02 am

So.. you're saying HTML is wrongly used ?
That could be true, since it's human.
But I can't imagine that happening in TCP/IP, except between routers, because routers might add additional information.

And you could be right about the reserved fields, but a "normal" OS (which means that it's not specially created for the internet, unlike OpenBSD) just ignores them.

Pype.Clicker · Post by **Pype.Clicker** » Fri Apr 09, 2004 3:35 am

If you read RFCs as a daily part of your job, you'll see that it's clearly said that some things *MUST* be done, while others *SHOULD* be done (not shouting here... the RFCs themselves capitalize these words), and still others *MAY* be done.

When you come to implement an RFC, you must be 100% conformant to what MUST be done, and allow things that MAY happen to happen. It's just another state of mind: you're working on 'risky data' and should not trust the other party to be correct. Logging weird things may be the solution for a nice implementation.

Note also that IP options are ... optional.

Thus if your TCP/IP stack does not support QoS, you're not to blame, nor if you don't support multicast or something ...

Some of you said something like

If you just do what are in the RFC, you don't get a working TCP stack, the RFC defines what a packet/connection should look like on the network, but not how it is handled/generated locally

Okay, that's a bit uncomfortable at first sight, but that's normal. As some RFCs are Standards, they restrict themselves to what all protocol implementations should agree on. Being too restrictive would lead to something no one but the initiator can implement and sell.

Candy · Post by **Candy** » Fri Apr 09, 2004 11:24 am

Pype.Clicker wrote: Note also that IP options are ... optional. Thus if your TCP/IP stack does not support QoS, you're not to blame, nor if you don't support multicast or something ...

Note also that both the QoS field and the source IP / destination IP fields are not optional. You have to accept them and be able to send them out again. They might be optional in TCP/IP itself though...

Pype.Clicker · Post by **Pype.Clicker** » Sat Apr 10, 2004 2:44 am

Candy wrote: Note also that both the QoS field

Are we both talking about the 'Type Of Service' field in the IP header? Indeed it's not optional, though you're not required to set it to any value, nor are you required to copy the value you received when responding (according to http://www.faqs.org/rfcs/rfc1349.html).
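To make the distinction concrete: the Type Of Service byte sits in the fixed 20-byte part of the IPv4 header, while IP options only appear when the header length field says the header is longer than that. The sketch below is illustrative only; the function name and the returned dictionary keys are assumptions, and it does no checksum verification.

```python
import struct


def parse_ipv4_header(packet: bytes) -> dict:
    """Split an IPv4 header into its fixed fields and its (optional) options."""
    if len(packet) < 20:
        raise ValueError("truncated IPv4 header")
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    version, ihl = ver_ihl >> 4, ver_ihl & 0x0F
    if version != 4 or ihl < 5 or len(packet) < ihl * 4:
        raise ValueError("bad version or header length")
    return {
        "tos": tos,                      # always present, even if unused
        "ttl": ttl,
        "protocol": proto,
        "total_length": total_len,
        "options": packet[20:ihl * 4],   # empty unless IHL > 5
        "src": src,
        "dst": dst,
    }
```

A stack that ignores QoS can read `tos` and do nothing with it, but it still has to step over the byte, and over any options, to find the payload.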

and the source IP / destination IP fields are not optional.

I hope so!! Did I ever say something f00lish like 'source/destinations are optional'?? Setting the IP source to something other than the emitting machine's own IP address is an attack, and ISPs should enforce rules in their routers to prevent this from happening ...

Candy · Post by **Candy** » Sat Apr 10, 2004 2:48 am

Pype.Clicker wrote: Are we both talking about the 'Type Of Service' field in the IP header? Indeed it's not optional, though you're not required to set it to any value, nor are you required to copy the value you received when responding (according to http://www.faqs.org/rfcs/rfc1349.html).

Ok, didn't know that. Still, lots of routers will give you crappy performance (less usable implementation!) if you do not set it.

and the source IP / destination IP fields are not optional.
I hope so!! Did I ever say something f00lish like 'source/destinations are optional'?? Setting the IP source to something other than the emitting machine's own IP address is an attack, and ISPs should enforce rules in their routers to prevent this from happening ...

You implied that if you didn't want to support multicast that you wouldn't get packets with multicast addresses. I think you will get them / see them pass by, so you must at least be able to handle them. Multicast isn't required, but the fields of multicast are NOT optional, so you cannot ignore it.
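A stack without multicast support still has to recognise multicast destinations in order to discard them cleanly. A minimal sketch using Python's standard `ipaddress` module follows; the function name, the string return values and the `multicast_enabled` switch are assumptions made for illustration, and broadcast addresses are deliberately left out.

```python
import ipaddress


def classify_destination(dst: str, local: str,
                         multicast_enabled: bool = False) -> str:
    """Decide what to do with a packet based on its destination address."""
    addr = ipaddress.ip_address(dst)
    if addr.is_multicast:
        # Recognised either way; only delivered if the stack supports it.
        return "deliver" if multicast_enabled else "drop"
    if addr == ipaddress.ip_address(local):
        return "deliver"
    return "drop"                      # not addressed to this host
```

With `local="192.0.2.10"`, a packet for `224.0.0.1` is dropped deliberately instead of falling through to code that assumes a unicast address.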
