Stata - Precisions and counts


I am working with an educational dataset called IPEDS, from the National Center for Education Statistics. It tracks students in college by major, degree completion, etc. My problem is that in Stata I am trying to determine the total count of degrees obtained in a specific major.

They have a variable, cipcode, which contains values that serve as "majors". For example, cipcode might be 14.2501 for "petroleum engineering", 16.0102 for "linguistics", and so forth.

When I write this particular code

tab cipcode if cipcode==14.2501  

it reports no observations. What code will give me the totals?

/* convert float variable to string variable, and use force replace */
tostring cipcode, gen(cipcode_str) format(%6.4f) force
replace cipcode_str = reverse(substr(reverse(cipcode_str), indexnot(reverse(cipcode_str), "0"), .))
replace cipcode_str = reverse(substr(reverse(cipcode_str), indexnot(reverse(cipcode_str), "."), .))

/* created a total variable called total_t1 for the total count of STEM majors listed in Table 1 */
gen total_t1 = cipcode_str== "14.2501" + "14.3901" + "15.0999" + "40.0601"

This minimal example confirms your problem. (See, by the way, https://stackoverflow.com/help/mcve for advice on good examples.)

* code
clear
input code
14.2501
14.2501
14.2501
end

tab code if code == 14.2501
tab code if code == float(14.2501)

* results
. tab code if code == 14.2501
no observations

. tab code if code == float(14.2501)

       code |      Freq.     Percent        Cum.
------------+-----------------------------------
    14.2501 |          3      100.00      100.00
------------+-----------------------------------
      Total |          3      100.00

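The keyword is one you use, precision. In Stata, search precision for resources, starting with the blog posts by William Gould. A decimal like 14.2501 is hard (in fact impossible) to hold exactly in binary, and the details of holding a variable as type float can bite: the value stored in a float is rounded to float precision, while the literal 14.2501 in your comparison is evaluated in double precision, so the two are not equal.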
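To see the rounding directly, here is a small sketch that stores the same decimal as a float and as a double and compares each with the literal. The variable names f and d are just for illustration and are not in your data.

```stata
* Sketch: why f == 14.2501 fails when f is stored as float
clear
set obs 1
gen float  f = 14.2501
gen double d = 14.2501

* Show more decimal places than the default display format reveals
format f d %20.15f
list f d

* The literal 14.2501 is evaluated in double precision
count if f == 14.2501          // 0: the float was rounded when stored
count if f == float(14.2501)   // 1: the literal is rounded the same way
count if d == 14.2501          // 1: a double variable matches the literal
```

The listing should show f and d differing after the first several decimal places, which is all that the failed comparison amounts to.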

It's hard to see what you're doing with your last block of code, and you don't explain it. Your last statement looks puzzling, as you're adding strings. Consider what happens with

. gen whatever =  "14.2501" + "14.3901" + "15.0999" + "40.0601"

. di whatever[1]
14.250114.390115.099940.0601

The result is a long string that cannot be a valid cipcode. I suspect that you are reaching towards

 ... if inlist(cipcode_str, "14.2501", "14.3901", "15.0999", "40.0601")  

which is quite different.

But using float() is the minimal trick for this problem.
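As a rough sketch of how that could look for your totals, assuming cipcode is stored as float (which the behaviour you report suggests) and using stem_t1 as a made-up variable name:

```stata
* Count degrees for a single major
count if cipcode == float(14.2501)

* Flag the four codes from your last statement, then count the flagged rows
gen byte stem_t1 = inlist(cipcode, float(14.2501), float(14.3901), float(15.0999), float(40.0601))
count if stem_t1
```

This works on the numeric variable directly, so the conversion to string is not needed. If each observation in your file carries a count of awards rather than representing one degree, you would sum that count variable over the flagged rows instead of counting observations.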

