sql - Store hashes of User-Agent strings in a MySQL table: insert if not exists, return id -


inspired following 2 answers on stackoverflow tried implement table goal store user-agent strings in it:

here's table structure:

create table if not exists ua_strings (     ua_id integer primary key auto_increment,      ua_hash binary(16),      ua text,      unique key ua_hash (ua_hash) ); 

i'd achieve following:

  • input: user-agent string should inserted in table if doesn't exist yet

  • output: ua_id

so far i've come solution:

insert ignore ua_strings (ua_hash, ua) values (unhex(md5('test')), 'test'); select ua_id ua_strings ua_hash = unhex(md5('test')); 
  • is possible make 1 query out of 2 queries?
  • how can improve table structure or queries in terms of speed , elegance?

the important thing rid of insert ignore. discovered increment primary key if fails. can burn through 4 billion keys way. select first, it's going common case anyway.

my first thought put logic database function. gives benefits of encapsulation. can change how works later.

my second rid of hash. it's taking place of index on ua. since need equivalence checks performant, hash index ideal, not supported on mysql table formats.

in place use index on first 255 bytes of user-agent, should plenty mysql job. gives benefit of full index should need more simple gets.

create table if not exists ua_strings (     ua_id integer unsigned primary key auto_increment,      ua text,     key(ua(255)) ); 

the function (note, not best @ writing mysql functions).

delimiter // create function get_ua_id (ua_string text) returns integer begin     declare ret integer;      select ua_id ret ua_strings ua = ua_string;      /* it's not in table, put in table */     case when row_count() = 0         insert ua_strings (ua) values (ua_string);         select last_insert_id() ret;     else begin end;     end case;      return ret; end// delimiter ; 

a function hash similar. hide implementation details in function , benchmark two.

and don't use md5. it's not going affect performance use sha1, can spare 4 bytes per entry, , avoid hidden problems down road. using md5 saying "even though there's locks better in every way, i'll use crappy lock because don't think door important right now". you're not security expert (neither i) , don't know parts important , aren't. put decent locks on everything. if sha1 turns out huge performance problem, can change encapsulation of function.

no matter benchmarking results, willing bet profiling reveal choice has no effect on performance of whatever system part of. go simpler , more flexible indexed option, optimize if turns problem later.


Comments

Popular posts from this blog

node.js - Mongoose: Cast to ObjectId failed for value on newly created object after setting the value -

gradle error "Cannot convert the provided notation to a File or URI" -

python - NameError: name 'subprocess' is not defined -