A window into military affairs - yes, the process was followed to a T in the DoD, as it always is, and it was very similar to the process at CDC, just with less visibility, because DHA (the Defense Health Agency) is part of the DoD. Data calls are a regular feature in the DoD, followed by organizations asking for clarifications, delaying, ducking, and trying to figure out what will keep them off the red lists that get discussed in private at higher levels. A call will come out from one organization to its subordinates, but then there will be a cross-cutting call from another organization asking for related things in a slightly different format, maybe for a slightly different purpose. Perhaps Public Affairs wants to show how well things are going, but Defense Logistics needs to anticipate class 8 (medical materiel) requirements, and the MTFs (military treatment facilities) need to gauge patient load...
Anyway, this is all pretty similar to the way CDC and HHS squabbled over COVID data calls going out to states, localities, hospitals, clinics, physicians... the question of what counts as a reportable illness, what the timelines are, what the diagnostic standards are, who gets to set them, the format of the reports, whether personal information can be included - or whether including it means a sudden-death referral to the Office for Civil Rights for a HIPAA violation, plus an investigation. It's all about dynamics inside very large organizations, organizations so large that in many ways they have ceased to be a single organization.
The DoD components and agencies try hard to keep their sights aligned, but for quite a while a large fraction of COVID-related orders were about data reporting requirements: subtle and not-so-subtle shifts in what got reported, to whom, how, and when. This also dogged public health, where the press additionally turned it into a political football. In the DoD, the same data is sometimes weaponized as an intra-organizational tool, so people are similarly cautious and feisty about the data calls.
All that rings true to me, and it's much the same in other departments. My view is that the whole idea of 'reporting' in the USG sense is both obsolete and fraught with a large variety of irresistible opportunities for mischief, manipulation, and abuse. It's just crazy that reporting requirements keep proliferating even though no one reads the reports and everyone knows they are prone to being full of bogus, "results-oriented" info. Over time everyone should be moving to something like 'surveillance': unavoidable, automatic, passive, tolerably transparent, and 'corroboratable' against other data or a later audit. Metaphorically, bodycam everyone and everything.
The trouble is, every leader wants this to be true of their own subordinates, but not of themselves; they insist on maximum opportunity to act as a filter for information, especially to make the numbers come out 'right'. And you can't fudge your own reports unless your subordinates can also fudge the data after you lean on them to "check the numbers again".
I'd make one modification to your statement - every leader wants it to be true of someone else's subordinates, but not themselves or their own...
Yeah, but. IIRC you've remarked several times on Kling's old blog about how the surveillability of e-mail (not to mention paper documents) pushes all important communications into informal cliques using alternative, non-surveillable channels. And one reason for this is that it's impossible to have a frank discussion on serious topics unless it's private. Maybe there could be cultural norms that would preclude the use of surveillance data to 'whistle-blow' on frank discussions and thereby make them possible even under surveillance, but it's not easy to imagine how that would work.
My position is that there are two kinds of surveillance here which are really different in their character and effect.
There is the "Internal Signals Intelligence" of recording all messages on an organization's IT equipment, but in a repository to which very few have any general and direct access, and then only for special purposes. Other people must request info in a particularized way, and they can't query the data directly as they rely on insiders to search, retrieve, copy, and redact the documents for them, which, ah, isn't always perfectly reliable. This recording causes a lot of the problems which I've written about previously, and which is worse than useless (for the particular purpose of public supervision and auditing) and indeed, counterproductive as it does a lot more harm than good.
Then there is the MIS approach of automatically collecting all the stats about all the 'cases' for the typical kind of government activity involving repeated procedures and not just some special unique project. Then everyone with access to the system (which, if it's not classified or privileged, should be literally everyone - at a minimum, all members of Congress) has both real-time awareness anytime they want it and a comprehensive historical archive for analysis. They never have to rely on anyone else to write a 'report' - a step that creates hazards of delay and, more importantly and more frequently, 'massaging' of the data, up to and including the nuclear option of just refusing to draft or release certain reports anymore.
One of my favorites is the estimate of the unauthorized population, which was 11.4 million in 2006, but by the hot-economy year of 2018 after several major border surge crises had grown all the way to ... lol, jk, it's still exactly 11.4 million. And if you believe that, I've got an International Bridge in Del Rio I'd like to sell you. Lots of funny stories about that one if you ask the right people.
So, what I mean by 'surveillance' more broadly is something more like automated databasing, the sort of thing that feeds into modern Management Information Systems. I'm sure you're familiar with all this, but I'll run through it anyway.
Let's say you are a Walmart exec and you want to know, "How many tubes of Crest toothpaste do we have on retail shelves right now?" One way to do it would be a manual data call: send a message to every regional manager to order people to count them up. Then the regional managers relay it to the district managers, then the store managers, and so on until finally you get to the poor schlub who is tasked to literally count the tubes and 'report'.
Then all the little reports get added up, sent up, added again, sent up again, and finally, maybe days or more later, HQ gets a number that is out of date, was full of counting errors even when it was fresh, and, if you are lucky, is also not full of lies, er, I mean, 'filtering'. And that was just one item at one moment in time. If you had asked people to tell you the number of tubes which *had been* on the shelves ten weeks ago, they would just have to make rough, bad guesses based on other bulk records.
Alternatively, you could have a computerized inventory system with scans at every step of movement. Then the guy at HQ not only has automatic and near-perfect visibility of the exact status and coordinates of every tube of toothpaste in the whole system, but also a comprehensive record of the whole history and path of every tube going back to when the system was installed. The exec may have to query a database, but he never has to ask any human to 'report' anything. Sure, there are imperfections and failure modes and fallible human inputs and potential for database manipulation in this model too, but it's generally a lot better than the former one.
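To make the contrast concrete, here's a toy sketch of that second model in Python with SQLite. Everything in it - the scans table, the columns, the event names - is invented for illustration (and the window function assumes a reasonably recent SQLite); it's the shape of the thing, not any real retailer's schema.

```python
import sqlite3

# Toy event-log inventory: every scan appends a row, nothing is overwritten.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scans (unit_id TEXT, item TEXT, event TEXT, ts TEXT)")
conn.executemany("INSERT INTO scans VALUES (?, ?, ?, ?)", [
    ("u1", "crest", "received", "2021-09-01"),
    ("u1", "crest", "shelved",  "2021-09-02"),
    ("u2", "crest", "shelved",  "2021-09-03"),
    ("u2", "crest", "sold",     "2021-09-10"),
])

# "How many tubes are on the shelf right now?" means: units whose most
# recent scan event is 'shelved'. No data call, no adding up little reports.
count = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT unit_id, event,
               ROW_NUMBER() OVER (PARTITION BY unit_id ORDER BY ts DESC) AS rn
        FROM scans WHERE item = 'crest'
    ) WHERE rn = 1 AND event = 'shelved'
""").fetchone()[0]
print(count)  # 1: u1 is on the shelf; u2 was sold

# The ten-weeks-ago question is the same query with "AND ts <= ?" added to
# the inner SELECT; the full event history makes any past moment answerable.
```

Note what the manual data call can never match: the retrospective question costs exactly as much as the current one.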
I simply cannot overstate how much closer typical business at most government entities is to the former type of approach than to the latter, even when the matters are routinely 'reported', and even when - get this - all the information is already recorded in a variety of digital information systems, which are of course mutually incompatible and which - the CIOs will insist - can't be made to connect to each other, for reasons.
There are two primary ways Congress exercises its sacred oversight function, which, in principle anyway, should rely heavily on this kind of data. One is to put 'reporting requirements' into statutes, which end up producing mostly useless, politically skewed, voluminous novellas nobody reads. You ever hear of anyone being prosecuted, held in contempt, or otherwise disciplined for failing to write a report the way Congress intended it to be written - when, of course, it's always written in a way at least 40% of the Congress supports? The other is to politely request the data in writing (which includes the performative theater of hearings, via questions for the record). It takes ages for them to get responses which, predictably, don't usually contain the information they really wanted.
Which, in my opinion, is simply absurd, *if* you really want to get the data and really want to prevent people from lying about it. If you think it's really important that your friends, when they're in charge, still have the option of lying about it, then you will absolutely make sure the government keeps falling behind best practices in the business world and never, ever converges to real-time databasing with transparency and free querying for anyone with oversight authority. "You can't manage what you can't measure" becomes "You can't manipulate what you can't do manually."
> automated collecting all the stats about all the 'cases' for the typical kind of government activity involving repeated procedures and not just some special unique project
Oh. I see what you mean, but calling this "surveillance" is very confusing and is probably going to wrong-foot almost everyone (as it did me). In the IT world, which overwhelmingly manages processes rather than stocks (stocks being what your toothpaste example is about), this is called logging and tracing, as you may know, and it is now ubiquitous and very useful. There is a whole genre of specialized databases to store such data, query languages and SQL extensions geared to retrieving it, software suites to analyze it, protocols to enable different services that participate in the same business process to emit their logs and traces in such a way that they can be correlated later, and so on.

There is some tension between the amount of logging and efficiency, but it usually pertains to interior processing, which can generate arbitrarily large amounts of logs (imagine logging all network packets with full content - sometimes this is useful, but the overhead and storage requirements are usually prohibitive, so it's only enabled briefly to debug specific problems). With business-level stuff, the best bet as a rule is to log everything that can't be trivially reconstructed by looking at other logs and source code.
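The correlation trick itself is tiny. A toy sketch in Python (service names and log fields invented; real systems use standardized tracing protocols, but the idea is the same):

```python
import json
import time
import uuid

# Every service touching the same business 'case' stamps its structured
# log lines with a shared correlation id.
def emit(service: str, message: str, correlation_id: str) -> None:
    print(json.dumps({
        "ts": time.time(),
        "service": service,
        "correlation_id": correlation_id,
        "message": message,
    }))

case_id = str(uuid.uuid4())  # one id follows the case across services
emit("intake", "application received", case_id)
emit("review", "routed for review", case_id)
emit("adjudication", "approved", case_id)

# Later, filtering the aggregated logs on correlation_id reconstructs the
# whole path of the case: the process analogue of tracking each tube.
```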
That's a fair criticism of the term, so maybe a modifier / qualifier would help.
One thing to keep in mind is that this kind of automatic panopticon system differs from the normal business-management case - a system used internally by the typical company to understand and monitor its own operations - in that it would be used by, indeed perhaps preferably owned and operated by, an *external* entity for purposes of supervision, audit, oversight, etc. of the subject entity.
The watching and recording of everything you do by some external entity, without the control that can be exercised by a 'supervisor', feels a lot like 'surveillance'.
Since the term "surveillance capitalism" has achieved some prominence and currency, maybe something like "surveillance oversight"?
I suspect anything with the word "surveillance" in it will encounter too much pushback. Why not call it "automated logging"? Logbooks are familiar and used all over the place, from the military and the navy to hobby diving and ham radio. They are also used by external entities for purposes of supervision, audit, and oversight.
Beyond the nefarious reasons, the overriding issue is that what can superficially look like the same data can end up being different and incompatible for a particular purpose, for a variety of completely legitimate reasons.
Three cheers for Emily Oster. Data quality is always terrible in any large organization. What boggles me is that people keep trying to use terrible data and build meaningful arguments on "it is the best data available" - as though the answer to having bad, spoiled ingredients for a cake is to just go ahead and make the cake.
Can any number of government agencies at various levels - federal organizations and fifty states, sitting on top of city- and county-level organizations - actually collect coherent data and make it available in a consistent and meaningful format? I am not optimistic; businesses have trouble doing that at a much smaller scale, for a more focused audience, with a more limited scope, and a profit motive behind it. I think that in many cases we need to get used to the uncertainty and vagaries of what we can know, and at least acknowledge we are mostly blind at the large scale.
The counterpoint is that data doesn't have to be perfect, just good enough to accurately answer a question.
As someone with a fair amount of experience with these things, I'd say the real problem is that our ability to collect data far outstrips our ability to systematize and describe it.
It's like having a huge library with no Dewey Decimal System (which evolved into the more general UDC). In fact, there's basically no prospect of something like a UDC for all data in general.
Without this, people essentially re-write the same book over and over, re-collecting basically the same data to answer slightly different questions. Sometimes it's even cheaper to do so than to re-format the data to answer that slightly different question.
Well, kind of. Accuracy is a function of data quality, so how accurate you want to get matters. The real problem is that, as you say, we don't have good descriptions of the data we have. One example would be the "COVID Deaths" numbers we see all the time. Are they counting people who died because of COVID, or merely people who had COVID when they died (from, say, a wildebeest stampede)? That's kind of important, but the answer no doubt varies across every collection point in the US, and the numbers just get dumped into one big pile when the CDC reports them. To the CDC's credit, they report the numbers as "deaths with COVID" or "involving COVID", but they are not careful to disabuse people of the notion that every one of those people died only because of COVID.
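A three-row toy example (made-up fields, made-up records, nothing resembling real data) shows how much daylight there can be between the two definitions:

```python
# Each record: was the person COVID-positive, and what was the certified
# underlying cause of death? Both fields and values are invented.
deaths = [
    {"id": 1, "covid_positive": True,  "underlying_cause": "COVID-19"},
    {"id": 2, "covid_positive": True,  "underlying_cause": "trauma"},
    {"id": 3, "covid_positive": False, "underlying_cause": "cardiac"},
]

died_of_covid   = sum(d["underlying_cause"] == "COVID-19" for d in deaths)  # 1
died_with_covid = sum(d["covid_positive"] for d in deaths)                  # 2

# Dump both definitions into one national pile and the headline number
# silently mixes the two, and nobody downstream can un-mix it.
```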
In a more mundane example, with corporate data, just having a bloody Data Dictionary to tell people "Hey, this is what this field named WARBLARGLFLARGLE is supposed to contain" is like asking IT for a personal Gutenberg Bible for your home office. If you are lucky they have something from the Gideons to send you, but more likely they say "Why do you need that? Just ask so-and-so, they know all about those fields." It is so freaking depressing how much data is collected only to sit there while people say "Why can't we get information on X?!" Do we collect it? Who knows? Maybe we do, maybe we don't, maybe we collect it five slightly different ways.
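The maddening part is how little it would take. Even something as crude as this sketch (the field and everything about it invented, obviously) beats tribal knowledge:

```python
# A checked-in data dictionary: one entry per field, answering "what is
# this, where does it come from, and who owns it?"
DATA_DICTIONARY = {
    "WARBLARGLFLARGLE": {
        "description": "Customer credit-hold flag, 'Y' or 'N'",
        "source": "nightly billing batch job",
        "owner": "accounts receivable team",
    },
}

def describe(field: str) -> str:
    entry = DATA_DICTIONARY.get(field)
    return entry["description"] if entry else "Undocumented; go ask so-and-so."

print(describe("WARBLARGLFLARGLE"))  # Customer credit-hold flag, 'Y' or 'N'
```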
I think perhaps we have oversold ourselves on the promise of big data, and even medium data.
Definitely oversold. Even if one reconciles the slightly different data (a task from which this conversation is currently a welcome diversion for me), most of the time the answer is: so what? If datasets A and B have a lot of overlap, then A ∪ B usually doesn't have much additional value.
The big efficiencies to be gained (if they exist) come from the tedious but theoretically one-time process of documenting and joining A and B. That way, going forward, there's only the cost of maintaining one set of data. Or simply knowing that A and B exist, so you don't go out and create dataset C, which is another reinvention of A and B.
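In toy form (all names and values hypothetical), the whole argument fits in a few lines:

```python
# Two overlapping datasets keyed on a shared id.
dataset_a = {"001": {"name": "Acme"}, "002": {"name": "Burco"}}
dataset_b = {"002": {"region": "Southwest"}, "003": {"name": "Cylex"}}

# The one-time documented join; afterwards there is a single dataset to
# maintain instead of two drifting copies.
merged = {
    key: {**dataset_a.get(key, {}), **dataset_b.get(key, {})}
    for key in dataset_a.keys() | dataset_b.keys()  # A ∪ B over the keys
}

print(merged["002"])  # {'name': 'Burco', 'region': 'Southwest'}
# With heavy overlap, the union adds few new records; the value is in the
# documented join and in not building a redundant dataset C later.
```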
We might work for the same company... :P If I were a drinker, this would have me on my second liver by now.
On the shot clock concept, I know Brisbane (Australia) has something like this. Don't know the details, but I have a friend who is an architect/developer and had a development proposal automatically approved because the planning department didn't object in time. And yet it remains a very livable city. Who'd a thunk?
That last point is already the law in California. Permits that don't require environmental review are supposed to be automatically approved within 90 days of submission if no determination has been made before then. The city of San Francisco has admitted to being out of compliance with this law for 20 years. Who's going to enforce it? https://sfbos.org/permit-streamlining-act